Sample records for voice recognition software

  1. Voice Recognition Software Accuracy with Second Language Speakers of English.

    ERIC Educational Resources Information Center

    Coniam, D.

    1999-01-01

    Explores the potential of voice-recognition technology with second-language speakers of English. Involves the analysis of the output produced by a small group of very competent second-language subjects reading a text into Dragon Systems' voice-recognition software "Dragon NaturallySpeaking." (Author/VWL)

  2. Embodied Transcription: A Creative Method for Using Voice-Recognition Software

    ERIC Educational Resources Information Center

    Brooks, Christine

    2010-01-01

    Voice-recognition software is designed to be used by one user (voice) at a time, requiring a researcher to speak all of the words of a recorded interview to achieve transcription. Thus, the researcher becomes a conduit through which interview material is inscribed as written word. Embodied Transcription acknowledges performative and interpretative…

  3. Voice recognition software for clinical use.

    PubMed

    Korn, K

    1998-11-01

    The current generation of voice recognition products truly offers the promise of voice recognition systems that are financially and operationally acceptable for use in a health care facility. Although the initial capital outlay for the purchase of such equipment may be substantial, the long-term benefit is felt to outweigh the expense. The ability to utilize computer equipment for educational purposes and information management alone helps to rationalize the cost. In addition, it is important to remember that the Internet has become a substantial source of information, which provides another functional use for this equipment. Although one can readily see the implication for such a program in clinical practice, other uses for the program should not be overlooked. Uses far beyond the writing of clinic notes and correspondence can be easily envisioned. Utilization of voice recognition software offers clinical practices the ability to produce quality printed records in a timely and cost-effective manner. After learning procedures for the selected product and appropriately formatting word processing software and printers, printed progress notes should be able to be produced in less time than with traditional dictation and transcription methods. Although certain procedures and practices may need to be altered, or may preclude optimal utilization of this type of system, many advantages are apparent. It is recommended that facilities consider utilization of voice recognition products such as Dragon Systems' NaturallySpeaking software, or at least consider a trial of this method with one of the limited-feature products, if current dictation practices are unsatisfactory or excessively costly. Free downloadable trial software or single-user software can provide a reduced-cost method for trial evaluation of such products if a major commitment is not desired. A list of voice recognition software manufacturer web sites may be accessed through the following: http://www.dragonsys.com/, http://www.software.ibm.com/is/voicetype/, and http://www.lhs.com/

  4. Voice-Recognition Augmented Performance Tools in Performance Poetry Pedagogy

    ERIC Educational Resources Information Center

    Devanny, David; McGowan, Jack

    2016-01-01

    This provocation shares findings from the use of bespoke voice-recognition performance software in a number of seminars (which took place in the 2014-2016 academic years at Glasgow School of Art, University of Warwick, and Falmouth University). The software, made available through this publication, is a web-app which uses Google Chrome's native…

  5. An automatic speech recognition system with speaker-independent identification support

    NASA Astrophysics Data System (ADS)

    Caranica, Alexandru; Burileanu, Corneliu

    2015-02-01

    The novelty of this work lies in the application of an open-source research software toolkit (CMU Sphinx) to train, build and evaluate a speech recognition system, with speaker-independent support, for voice-controlled hardware applications. Moreover, we propose to use the trained acoustic model to successfully decode offline voice commands on embedded hardware, such as an ARMv6 low-cost SoC, the Raspberry Pi. This type of single-board computer, mainly used for educational and research activities, can serve as a proof of concept for a software and hardware stack for low-cost voice automation systems.
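
    A minimal sketch of offline command spotting in the spirit of this record, assuming the pocketsphinx Python bindings (which wrap the CMU Sphinx engine the authors used) and the default acoustic model; the keyphrase and detection threshold are illustrative assumptions, not values from the paper:

      # Offline keyword spotting with the pocketsphinx Python bindings;
      # runs on a Raspberry Pi or desktop with a microphone attached.
      # Keyphrase and threshold below are examples, not the paper's values.
      from pocketsphinx import LiveSpeech

      speech = LiveSpeech(
          lm=False,               # disable the full language model
          keyphrase='lights on',  # the single command to spot
          kws_threshold=1e-20,    # detection sensitivity; tune per command
      )
      for phrase in speech:       # blocks, yielding one detection at a time
          print('Command detected:', phrase)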

  6. What does voice-processing technology support today?

    PubMed Central

    Nakatsu, R; Suzuki, Y

    1995-01-01

    This paper describes the state of the art in applications of voice-processing technologies. In the first part, technologies concerning the implementation of speech recognition and synthesis algorithms are described. Hardware technologies such as microprocessors and DSPs (digital signal processors) are discussed. The software development environment, a key technology in developing application software ranging from DSP software to support software, is also described. In the second part, the state of the art of algorithms from the standpoint of applications is discussed. Several issues concerning evaluation of speech recognition/synthesis algorithms are covered, as well as issues concerning the robustness of algorithms in adverse conditions. PMID:7479720

  7. How well does voice interaction work in space?

    NASA Technical Reports Server (NTRS)

    Morris, Randy B.; Whitmore, Mihriban; Adam, Susan C.

    1993-01-01

    The methods and results of an evaluation of the Voice Navigator software package are discussed. The first phase or ground phase of the study consisted of creating, or training, computer voice files of specific commands. This consisted of repeating each of six commands eight times. The files were then tested for recognition accuracy by the software aboard the microgravity aircraft. During the second phase, both voice training and testing were performed in microgravity. Inflight training was done due to problems encountered in phase one which were believed to be caused by ambient noise levels. Both quantitative and qualitative data were collected. Only one of the commands was found to offer consistently high recognition rates across subjects during the second phase.

  8. Researching the Use of Voice Recognition Writing Software.

    ERIC Educational Resources Information Center

    Honeycutt, Lee

    2003-01-01

    Notes that voice recognition technology (VRT) has become accurate and fast enough to be useful in a variety of writing scenarios. Contends that little is known about how this technology might affect the writing process or perceptions of silent writing. Explores future use of VRT by examining past research in the technology of dictation. (PM)

  9. (Almost) Word for Word: As Voice Recognition Programs Improve, Students Reap the Benefits

    ERIC Educational Resources Information Center

    Smith, Mark

    2006-01-01

    Voice recognition software is hardly new--attempts at capturing spoken words and turning them into written text have been available to consumers for about two decades. But what was once an expensive and highly unreliable tool has made great strides in recent years, perhaps most recognized in programs such as Nuance's Dragon NaturallySpeaking…

  10. Voice reaction times with recognition for Commodore computers

    NASA Technical Reports Server (NTRS)

    Washburn, David A.; Putney, R. Thompson

    1990-01-01

    Hardware and software modifications are presented that allow for collection and recognition by a Commodore computer of spoken responses. Responses are timed with millisecond accuracy and automatically analyzed and scored. Accuracy data for this device from several experiments are presented. Potential applications and suggestions for improving recognition accuracy are also discussed.

  11. Comparison of voice-automated transcription and human transcription in generating pathology reports.

    PubMed

    Al-Aynati, Maamoun M; Chorneyko, Katherine A

    2003-06-01

    Software that can convert spoken words into written text has been available since the early 1980s. Early continuous speech systems were developed in 1994, with the latest commercially available editions claiming speech-recognition accuracy of up to 98% at natural speech rates. To evaluate the efficacy of one commercially available voice-recognition software system with pathology vocabulary in generating pathology reports and to compare this with human transcription. To draw cost analysis conclusions regarding human versus computer-based transcription. Two hundred six routine pathology reports from the surgical pathology material handled at St Joseph's Healthcare, Hamilton, Ontario, were generated simultaneously using computer-based transcription and human transcription. The following hardware and software were used: a desktop 450-MHz Intel Pentium III processor with 192 MB of RAM, a speech-quality sound card (Sound Blaster), a noise-canceling headset microphone, and IBM ViaVoice Pro version 8 with pathology vocabulary support (Voice Automated, Huntington Beach, Calif). The cost of the hardware and software used was approximately Can $2250. A total of 23 458 words were transcribed using both methods, with a mean of 114 words per report. The mean accuracy rate was 93.6% (range, 87.4%-96%) using the computer software, compared to a mean accuracy of 99.6% (range, 99.4%-99.8%) for human transcription (P < .001). Time needed to edit documents by the primary evaluator (M.A.) using the computer was on average twice that needed for editing the documents produced by human transcriptionists (range, 1.4-3.5 times). The extra time needed to edit documents was 67 minutes per week (13 minutes per day). Computer-based continuous speech-recognition systems in pathology can be successfully used in pathology practice, even during the handling of gross pathology specimens. The relatively low accuracy rate of this voice-recognition software, with a resultant increased editing burden on pathologists, may not encourage its application on a wide scale in pathology departments with sufficient human transcription services, despite significant potential financial savings. However, computer-based transcription represents an attractive and relatively inexpensive alternative to human transcription in departments where there is a shortage of transcription services, and will no doubt become more commonly used in pathology departments in the future.
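
    The accuracy rates quoted here are word-level figures. A minimal sketch of how such a figure can be computed from a human reference transcript and a recognizer hypothesis, using standard word-level edit distance (the study's exact error-counting rules are not specified here); the example strings are invented:

      # Word-level accuracy via edit distance between a human reference
      # transcript and a recognizer hypothesis (illustrative only).
      def word_error_rate(reference: str, hypothesis: str) -> float:
          ref, hyp = reference.split(), hypothesis.split()
          # d[i][j] = edit distance between ref[:i] and hyp[:j]
          d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
          for i in range(len(ref) + 1):
              d[i][0] = i
          for j in range(len(hyp) + 1):
              d[0][j] = j
          for i in range(1, len(ref) + 1):
              for j in range(1, len(hyp) + 1):
                  sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                  d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
          return d[len(ref)][len(hyp)] / len(ref)

      wer = word_error_rate("no evidence of malignancy seen",
                            "no evidence of malignancy scene")
      print(f"accuracy = {100 * (1 - wer):.1f}%")  # 80.0% for this toy pair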

  12. Robotics control using isolated word recognition of voice input

    NASA Technical Reports Server (NTRS)

    Weiner, J. M.

    1977-01-01

    A speech input/output system is presented that can be used to communicate with a task-oriented system. Human speech commands and synthesized voice output extend conventional information exchange capabilities between man and machine by utilizing audio input and output channels. The speech input facility is comprised of a hardware feature extractor and a microprocessor-implemented isolated word or phrase recognition system. The recognizer offers a medium-sized (100 commands), syntactically constrained vocabulary, and exhibits close to real-time performance. The major portion of the recognition processing required is accomplished through software, minimizing the complexity of the hardware feature extractor.
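
    The abstract does not name the matching algorithm; dynamic time warping (DTW) over per-frame feature vectors was a standard technique for small-vocabulary isolated-word recognizers of this era. The sketch below illustrates that classical approach, not the paper's implementation:

      # Isolated-word template matching via dynamic time warping (DTW),
      # a classical approach for small-vocabulary recognizers.
      import numpy as np

      def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
          """a, b: (frames, features) arrays of per-frame feature vectors."""
          n, m = len(a), len(b)
          cost = np.full((n + 1, m + 1), np.inf)
          cost[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  d = np.linalg.norm(a[i - 1] - b[j - 1])
                  cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                       cost[i, j - 1],      # deletion
                                       cost[i - 1, j - 1])  # match
          return float(cost[n, m])

      def recognize(utterance: np.ndarray, templates: dict) -> str:
          """templates maps word -> stored feature array; returns best match."""
          return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))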

  13. United States Homeland Security and National Biometric Identification

    DTIC Science & Technology

    2002-04-09

    security number. Biometrics is the use of unique individual traits such as fingerprints, iris eye patterns, voice recognition, and facial recognition to...technology to control access onto their military bases using a Defense Manpower Management Command developed software application. Facial recognition systems...installed facial recognition systems in conjunction with a series of 200 cameras to fight street crime and identify terrorists. The cameras, which are

  14. Voice recognition products-an occupational risk for users with ULDs?

    PubMed

    Williams, N R

    2003-10-01

    Voice recognition systems (VRS) allow speech both to be converted directly into text, which appears on the screen of a computer, and to direct equipment to perform specific functions. Suggested applications are many and varied, including increasing efficiency in the reporting of radiographs, allowing directed surgery and enabling individuals with upper limb disorders (ULDs) who cannot use other input devices, such as keyboards and mice, to carry out word processing and other activities. Aim: This paper describes four cases of vocal dysfunction related to the use of such software, which have been identified from the database of the Voice and Speech Laboratory of the Massachusetts Eye and Ear Infirmary (MEEI). The database was searched using the key words 'voice recognition' and four cases were identified from a total of 4800. In all cases, the VRS was supplied to assist individuals with ULDs who could not use conventional input devices. Case reports illustrate time of onset and symptoms experienced. The cases illustrate the need for risk assessment and consideration of the ergonomic aspects of voice use prior to such adaptations being used, particularly in those who already experience work-related ULDs.

  15. The Voice Transcription Technique: Use of Voice Recognition Software to Transcribe Digital Interview Data in Qualitative Research

    ERIC Educational Resources Information Center

    Matheson, Jennifer L.

    2007-01-01

    Transcribing interview data is a time-consuming task that most qualitative researchers dislike. Transcribing is even more difficult for people with physical limitations because traditional transcribing requires manual dexterity and the ability to sit at a computer for long stretches of time. Researchers have begun to explore using an automated…

  16. Scientific bases of human-machine communication by voice.

    PubMed Central

    Schafer, R W

    1995-01-01

    The scientific bases for human-machine communication by voice are in the fields of psychology, linguistics, acoustics, signal processing, computer science, and integrated circuit technology. The purpose of this paper is to highlight the basic scientific and technological issues in human-machine communication by voice and to point out areas of future research opportunity. The discussion is organized around the following major issues in implementing human-machine voice communication systems: (i) hardware/software implementation of the system, (ii) speech synthesis for voice output, (iii) speech recognition and understanding for voice input, and (iv) usability factors related to how humans interact with machines. PMID:7479802

  17. Voice input/output capabilities at Perception Technology Corporation

    NASA Technical Reports Server (NTRS)

    Ferber, Leon A.

    1977-01-01

    Condensed resumes of key company personnel at the Perception Technology Corporation are presented. The staff possesses expertise in speech recognition, speech synthesis, speaker authentication, and language identification. The capabilities of hardware and software engineers are included.

  18. What Are Some Types of Assistive Devices and How Are They Used?

    MedlinePlus

    ... in persons with hearing problems. Cognitive assistance, including computer or electrical assistive devices, can help people function following brain injury. Computer software and hardware, such as voice recognition programs, ...

  19. [Research on Control System of an Exoskeleton Upper-limb Rehabilitation Robot].

    PubMed

    Wang, Lulu; Hu, Xin; Hu, Jie; Fang, Youfang; He, Rongrong; Yu, Hongliu

    2016-12-01

    In order to help patients with upper-limb dysfunction carry out rehabilitation training, this paper proposed an upper-limb exoskeleton rehabilitation robot with four degrees of freedom (DOF) and realized two control schemes, i.e., voice control and electromyography control. The hardware and software design of the voice control system was completed based on RSC-4128 chips, which realized speaker-specific speech recognition. Besides, this study adopted self-made surface electromyogram (sEMG) signal extraction electrodes to collect sEMG signals and realized pattern recognition by conducting sEMG signal processing, extracting time-domain features and applying a fixed-threshold algorithm. In addition, the pulse-width modulation (PWM) algorithm was used to realize the speed adjustment of the system. Voice control and electromyography control experiments were then carried out, and the results showed that the mean recognition rates of voice control and electromyography control reached 93.1% and 90.9%, respectively. The results proved the feasibility of the control system. This study is expected to lay a theoretical foundation for the further improvement of the control system of the upper-limb rehabilitation robot.
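
    As an illustration of the PWM speed-adjustment idea only (not the authors' RSC-4128 implementation), the sketch below maps recognized command words to motor duty cycles, using the RPi.GPIO library as a stand-in platform; the pin number, carrier frequency, and command set are assumptions:

      # Mapping recognized commands to motor speed via PWM duty cycles.
      # RPi.GPIO is a stand-in platform; pin, frequency, and commands are
      # assumptions for illustration.
      import RPi.GPIO as GPIO

      MOTOR_PIN = 18
      SPEEDS = {'slow': 30, 'medium': 60, 'fast': 90}  # duty cycle, percent

      GPIO.setmode(GPIO.BCM)
      GPIO.setup(MOTOR_PIN, GPIO.OUT)
      pwm = GPIO.PWM(MOTOR_PIN, 1000)  # 1 kHz carrier
      pwm.start(0)                     # start stopped

      def on_command(word: str) -> None:
          """Call with the recognizer's output, e.g. 'slow' or 'stop'."""
          if word == 'stop':
              pwm.ChangeDutyCycle(0)
          elif word in SPEEDS:
              pwm.ChangeDutyCycle(SPEEDS[word])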

  20. Top 10 "Smart" Technologies for Schools.

    ERIC Educational Resources Information Center

    Fodeman, Doug; Holzberg, Carol S.; Kennedy, Kristen; McIntire, Todd; McLester, Susan; Ohler, Jason; Parham, Charles; Poftak, Amy; Schrock, Kathy; Warlick, David

    2002-01-01

    Describes 10 smart technologies for education, including voice to text software; mobile computing; hybrid computing; virtual reality; artificial intelligence; telementoring; assessment methods; digital video production; fingerprint recognition; and brain functions. Lists pertinent Web sites for each technology. (LRW)

  21. When the face fits: recognition of celebrities from matching and mismatching faces and voices.

    PubMed

    Stevenage, Sarah V; Neil, Greg J; Hamlin, Iain

    2014-01-01

    The results of two experiments are presented in which participants engaged in a face-recognition or a voice-recognition task. The stimuli were face-voice pairs in which the face and voice were co-presented and were either "matched" (same person), "related" (two highly associated people), or "mismatched" (two unrelated people). Analysis in both experiments confirmed that accuracy and confidence in face recognition were consistently high regardless of the identity of the accompanying voice. However, accuracy of voice recognition was increasingly affected as the relationship between voice and accompanying face declined. Moreover, when considering self-reported confidence in voice recognition, confidence remained high for correct responses despite the proportion of these responses declining across conditions. These results converge with existing evidence indicating the vulnerability of voice recognition as a relatively weak signaller of identity, and the results are discussed in the context of a person-recognition framework.

  22. Obligatory and facultative brain regions for voice-identity recognition

    PubMed Central

    Roswandowitz, Claudia; Kappes, Claudia; Obrig, Hellmuth; von Kriegstein, Katharina

    2018-01-01

    Recognizing the identity of others by their voice is an important skill for social interactions. To date, it remains controversial which parts of the brain are critical structures for this skill. Based on neuroimaging findings, standard models of person-identity recognition suggest that the right temporal lobe is the hub for voice-identity recognition. Neuropsychological case studies, however, reported selective deficits of voice-identity recognition in patients predominantly with right inferior parietal lobe lesions. Here, our aim was to work towards resolving the discrepancy between neuroimaging studies and neuropsychological case studies to find out which brain structures are critical for voice-identity recognition in humans. We performed a voxel-based lesion-behaviour mapping study in a cohort of patients (n = 58) with unilateral focal brain lesions. The study included a comprehensive behavioural test battery on voice-identity recognition of newly learned (voice-name, voice-face association learning) and familiar voices (famous voice recognition) as well as visual (face-identity recognition) and acoustic control tests (vocal-pitch and vocal-timbre discrimination). The study also comprised clinically established tests (neuropsychological assessment, audiometry) and high-resolution structural brain images. The three key findings were: (i) a strong association between voice-identity recognition performance and right posterior/mid temporal and right inferior parietal lobe lesions; (ii) a selective association between right posterior/mid temporal lobe lesions and voice-identity recognition performance when face-identity recognition performance was factored out; and (iii) an association of right inferior parietal lobe lesions with tasks requiring the association between voices and faces but not voices and names. The results imply that the right posterior/mid temporal lobe is an obligatory structure for voice-identity recognition, while the inferior parietal lobe is only a facultative component of voice-identity recognition in situations where additional face-identity processing is required. PMID:29228111

  23. Processing Electromyographic Signals to Recognize Words

    NASA Technical Reports Server (NTRS)

    Jorgensen, C. C.; Lee, D. D.

    2009-01-01

    A recently invented speech-recognition method applies to words that are articulated by means of the tongue and throat muscles but are otherwise not voiced or, at most, are spoken sotto voce. This method could satisfy a need for speech recognition under circumstances in which normal audible speech is difficult, poses a hazard, is disturbing to listeners, or compromises privacy. The method could also be used to augment traditional speech recognition by providing an additional source of information about articulator activity. The method can be characterized as intermediate between (1) conventional speech recognition through processing of voice sounds and (2) a method, not yet developed, of processing electroencephalographic signals to extract unspoken words directly from thoughts. This method involves computational processing of digitized electromyographic (EMG) signals from muscle innervation acquired by surface electrodes under a subject's chin near the tongue and on the side of the subject's throat near the larynx. After preprocessing, digitization, and feature extraction, EMG signals are processed by a neural-network pattern classifier, implemented in software, that performs the bulk of the recognition task as described.
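
    A compact sketch of the pipeline as described (per-window time-domain EMG features fed to a neural-network pattern classifier), using scikit-learn's MLPClassifier on stand-in data; the window size, feature set, and labels are illustrative choices, not NASA's implementation:

      # Per-window time-domain EMG features fed to a neural-network
      # classifier; stand-in data and illustrative feature choices.
      import numpy as np
      from sklearn.neural_network import MLPClassifier

      def emg_features(window: np.ndarray) -> np.ndarray:
          """Simple features per channel for one (samples, channels) window."""
          rms = np.sqrt(np.mean(window ** 2, axis=0))                   # energy
          mav = np.mean(np.abs(window), axis=0)                         # mean abs value
          zc = np.mean(np.diff(np.sign(window), axis=0) != 0, axis=0)   # zero crossings
          return np.concatenate([rms, mav, zc])

      rng = np.random.default_rng(0)
      windows = [rng.normal(size=(256, 2)) for _ in range(100)]  # fake 2-channel EMG
      X = np.stack([emg_features(w) for w in windows])
      y = rng.integers(0, 4, size=100)                           # four toy "words"

      clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)
      print("training accuracy:", clf.score(X, y))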

  24. Exploring expressivity and emotion with artificial voice and speech technologies.

    PubMed

    Pauletto, Sandra; Balentine, Bruce; Pidcock, Chris; Jones, Kevin; Bottaci, Leonardo; Aretoulaki, Maria; Wells, Jez; Mundy, Darren P; Balentine, James

    2013-10-01

    Emotion in audio-voice signals, as synthesized by text-to-speech (TTS) technologies, was investigated to formulate a theory of expression for user interface design. Emotional parameters were specified with markup tags, and the resulting audio was further modulated with post-processing techniques. Software was then developed to link a selected TTS synthesizer with an automatic speech recognition (ASR) engine, producing a chatbot that could speak and listen. Using these two artificial voice subsystems, investigators explored both artistic and psychological implications of artificial speech emotion. Goals of the investigation were interdisciplinary, with interest in musical composition, augmentative and alternative communication (AAC), commercial voice announcement applications, human-computer interaction (HCI), and artificial intelligence (AI). The work-in-progress points towards an emerging interdisciplinary ontology for artificial voices. As one study output, HCI tools are proposed for future collaboration.

  25. Graphics with Special Interfaces for Disabled People.

    ERIC Educational Resources Information Center

    Tronconi, A.; And Others

    The paper describes new software and special input devices to allow physically impaired children to utilize the graphic capabilities of personal computers. Special input devices for computer graphics access--the voice recognition card, the single switch, or the mouse emulator--can be used either singly or in combination by the disabled to control…

  26. Literature review of voice recognition and generation technology for Army helicopter applications

    NASA Astrophysics Data System (ADS)

    Christ, K. A.

    1984-08-01

    This report is a literature review on the topics of voice recognition and generation. Areas covered are: manual versus vocal data input, vocabulary, stress and workload, noise, protective masks, feedback, and voice warning systems. Results of the studies presented in this report indicate that voice data entry has less of an impact on a pilot's flight performance, during low-level flying and other difficult missions, than manual data entry. However, the stress resulting from such missions may cause the pilot's voice to change, reducing the recognition accuracy of the system. The noise present in helicopter cockpits also causes the recognition accuracy to decrease. Noise-cancelling devices are being developed and improved upon to increase the recognition performance in noisy environments. Future research in the fields of voice recognition and generation should be conducted in the areas of stress and workload, vocabulary, and the types of voice generation best suited for the helicopter cockpit. Also, specific tasks should be studied to determine whether voice recognition and generation can be effectively applied.

  27. Temporal voice areas exist in autism spectrum disorder but are dysfunctional for voice identity recognition

    PubMed Central

    Borowiak, Kamila; von Kriegstein, Katharina

    2016-01-01

    The ability to recognise the identity of others is a key requirement for successful communication. Brain regions that respond selectively to voices exist in humans from early infancy on. Currently, it is unclear whether dysfunction of these voice-sensitive regions can explain voice identity recognition impairments. Here, we used two independent functional magnetic resonance imaging studies to investigate voice processing in a population that has been reported to have no voice-sensitive regions: autism spectrum disorder (ASD). Our results refute the earlier report that individuals with ASD have no responses in voice-sensitive regions: Passive listening to vocal, compared to non-vocal, sounds elicited typical responses in voice-sensitive regions in the high-functioning ASD group and controls. In contrast, the ASD group had a dysfunction in voice-sensitive regions during voice identity but not speech recognition in the right posterior superior temporal sulcus/gyrus (STS/STG)—a region implicated in processing complex spectrotemporal voice features and unfamiliar voices. The right anterior STS/STG correlated with voice identity recognition performance in controls but not in the ASD group. The findings suggest that right STS/STG dysfunction is critical for explaining voice recognition impairments in high-functioning ASD and show that ASD is not characterised by a general lack of voice-sensitive responses. PMID:27369067

  28. Writing with voice: an investigation of the use of a voice recognition system as a writing aid for a man with aphasia.

    PubMed

    Bruce, Carolyn; Edmundson, Anne; Coleman, Michael

    2003-01-01

    People with aphasia may experience difficulties that prevent them from demonstrating in writing what they know and can produce orally. Voice recognition systems that allow the user to speak into a microphone and see their words appear on a computer screen have the potential to assist written communication. This study investigated whether a man with fluent aphasia could learn to use Dragon NaturallySpeaking to write. A single case study of a man with acquired writing difficulties is reported. A detailed account is provided of the stages involved in teaching him to use the software. The therapy tasks carried out to develop his functional use of the system are then described. Outcomes included the percentage of words accurately recognized by the system over time, the quantitative and qualitative changes in written texts produced with and without the use of the speech-recognition system, and the functional benefits the man described. The treatment programme was successful and resulted in a marked improvement in the subject's written work. It also had effects in the functional life domain as the subject could use writing for communication purposes. The results suggest that the technology might benefit others with acquired writing difficulties.

  29. The Voice as Computer Interface: A Look at Tomorrow's Technologies.

    ERIC Educational Resources Information Center

    Lange, Holley R.

    1991-01-01

    Discussion of voice as the communications device for computer-human interaction focuses on voice recognition systems for use within a library environment. Voice technologies are described, including voice response and voice recognition; examples of voice systems in use in libraries are examined; and further possibilities, including use with…

  30. Input and Output Mechanisms and Devices. Phase I: Adding Voice Output to a Speaker-Independent Recognition System.

    ERIC Educational Resources Information Center

    Scott Instruments Corp., Denton, TX.

    This project was designed to develop techniques for adding low-cost speech synthesis to educational software. Four tasks were identified for the study: (1) select a microcomputer with a built-in analog-to-digital converter that is currently being used in educational environments; (2) determine the feasibility of implementing expansion and playback…

  31. Voice recognition software can be used for scientific articles.

    PubMed

    Pommergaard, Hans-Christian; Huang, Chenxi; Burcharth, Jacob; Rosenberg, Jacob

    2015-02-01

    Dictation of scientific articles has been recognised as an efficient method for producing high-quality first article drafts. However, standardised transcription service by a secretary may not be available for all researchers, and voice recognition software (VRS) may therefore be an alternative. The purpose of this study was to evaluate the out-of-the-box accuracy of VRS. Eleven young researchers without dictation experience dictated the first draft of their own scientific article after thorough preparation according to a pre-defined schedule. The dictation transcribed by VRS was compared with the same dictation transcribed by an experienced research secretary, and the effect of adding words to the vocabulary of the VRS was investigated. The number of errors per hundred words was used as the outcome. Furthermore, three experienced researchers assessed the subjective readability using a Likert scale (0-10). Dragon Nuance Premium version 12.5 was used as the VRS. The median number of errors per hundred words was 18 (range: 8.5-24.3), which improved when 15,000 words were added to the vocabulary. Subjective readability assessment showed that the texts were understandable, with a median score of five (range: 3-9), which improved with the addition of 5,000 words. The out-of-the-box performance of VRS was acceptable and improved after additional words were added. Further studies are needed to investigate the effect of additional software accuracy training.

  32. Voice Recognition in Face-Blind Patients

    PubMed Central

    Liu, Ran R.; Pancaroglu, Raika; Hills, Charlotte S.; Duchaine, Brad; Barton, Jason J. S.

    2016-01-01

    Right or bilateral anterior temporal damage can impair face recognition, but whether this is an associative variant of prosopagnosia or part of a multimodal disorder of person recognition is an unsettled question, with implications for cognitive and neuroanatomic models of person recognition. We assessed voice perception and short-term recognition of recently heard voices in 10 subjects with impaired face recognition acquired after cerebral lesions. All 4 subjects with apperceptive prosopagnosia due to lesions limited to fusiform cortex had intact voice discrimination and recognition. One subject with bilateral fusiform and anterior temporal lesions had a combined apperceptive prosopagnosia and apperceptive phonagnosia, the first such described case. Deficits indicating a multimodal syndrome of person recognition were found only in 2 subjects with bilateral anterior temporal lesions. All 3 subjects with right anterior temporal lesions had normal voice perception and recognition, 2 of whom performed normally on perceptual discrimination of faces. This confirms that such lesions can cause a modality-specific associative prosopagnosia. PMID:25349193

  33. Famous faces and voices: Differential profiles in early right and left semantic dementia and in Alzheimer's disease.

    PubMed

    Luzzi, Simona; Baldinelli, Sara; Ranaldi, Valentina; Fabi, Katia; Cafazzo, Viviana; Fringuelli, Fabio; Silvestrini, Mauro; Provinciali, Leandro; Reverberi, Carlo; Gainotti, Guido

    2017-01-08

    Famous face and voice recognition is reported to be impaired both in semantic dementia (SD) and in Alzheimer's disease (AD), although more severely in the former. In AD a coexistence of perceptual impairment in face and voice processing has also been reported, and this could contribute to the altered performance in complex semantic tasks. On the other hand, in SD both face and voice recognition disorders could be related to the prevalence of atrophy in the right temporal lobe (RTL). The aim of the present study was twofold: (1) to investigate famous face and voice recognition in SD and AD to verify if the two diseases show a differential pattern of impairment, resulting from disruption of different cognitive mechanisms; (2) to check if face and voice recognition disorders prevail in patients with atrophy mainly affecting the RTL. To avoid the potential influence of primary perceptual problems in face and voice recognition, a pool of patients suffering from early SD and AD were administered a detailed set of tests exploring face and voice perception. Thirteen SD patients (8 with prevalence of right and 5 with prevalence of left temporal atrophy) and 25 AD patients, who did not show visual or auditory perceptual impairment, were finally selected and were administered an experimental battery exploring famous face and voice recognition and naming. Twelve SD patients underwent cerebral PET imaging and were classified as right or left SD according to the onset modality and to the prevalent decrease in FDG uptake in the right or left temporal lobe, respectively. Correlation of PET imaging with famous face and voice recognition was performed. Results showed a differential performance profile in the two diseases, because AD patients were significantly impaired in the naming tests but showed preserved recognition, whereas SD patients were profoundly impaired both in naming and in recognition of famous faces and voices. Furthermore, face and voice recognition disorders prevailed in SD patients with RTL atrophy, who also showed a conceptual impairment on the Pyramids and Palm Trees test that was more marked in the pictorial than in the verbal modality. Finally, in the 12 SD patients in whom PET was available, a strong correlation between FDG uptake and face-to-name and voice-to-name matching data was found in the right but not in the left temporal lobe. The data support the hypothesis of a different cognitive basis for the impairment of face and voice recognition in the two dementias and suggest that the pattern of impairment in SD may be due to a loss of semantic representations, while a defect of semantic control, with impaired naming and preserved recognition, might be hypothesized in AD. Furthermore, the correlation between face and voice recognition disorders and RTL damage is consistent with the hypothesis that in the RTL person-specific knowledge may be mainly based upon non-verbal representations. Copyright © 2016 Elsevier Ltd. All rights reserved.

  34. Improving Automated Lexical and Discourse Analysis of Online Chat Dialog

    DTIC Science & Technology

    2007-09-01

    include spelling- and grammar-checking on our word processing software; voice-recognition in our automobiles; and telephone-based conversational agents ...conversational agents can help customers make purchases on-line [3]. In addition, discourse analyzers can automatically separate multiple, interleaved...telephone-based conversational agent needs to know if it was asked a question or tasked to do something. Indeed, Stolcke et al demonstrated that

  35. An estimate of the prevalence of developmental phonagnosia.

    PubMed

    Shilowich, Bryan E; Biederman, Irving

    2016-08-01

    A web-based survey estimated the distribution of voice recognition abilities with a focus on determining the prevalence of developmental phonagnosia, the inability to identify a familiar person based on their voice. Participants matched clips of 50 celebrity voices to 1-4 named headshots of celebrities whose voices they had previously rated for familiarity. Given a strong correlation between rated familiarity and recognition performance, a residual was calculated based on the average familiarity rating on each trial, which thus constituted each respondent's voice recognition ability that could not be accounted for by familiarity. 3.2% of the respondents (23 of 730 participants) had residual recognition scores 2.28 SDs below the mean (whereas 8, or 1.1%, would have been expected from a normal distribution). Participants also judged whether they could imagine the voices of five familiar celebrities. Individuals who had difficulty in imagining voices were also generally below average in their accuracy of recognition. Copyright © 2016 Elsevier Inc. All rights reserved.

  36. Voice Recognition: A New Assessment Tool?

    ERIC Educational Resources Information Center

    Jones, Darla

    2005-01-01

    This article presents the results of a study conducted in Anchorage, Alaska, that evaluated the accuracy and efficiency of using voice recognition (VR) technology to collect oral reading fluency data for classroom-based assessments. The primary research question was as follows: Is voice recognition technology a valid and reliable alternative to…

  37. Children's Recognition of Cartoon Voices.

    ERIC Educational Resources Information Center

    Spence, Melanie J.; Rollins, Pamela R.; Jerger, Susan

    2002-01-01

    A study examined developmental changes in talker recognition skills by assessing 72 children's (ages 3-5) recognition of 20 cartoon characters' voices. Four- and 5-year-old children recognized more of the voices than did 3-year-olds. All children were more accurate at recognizing more familiar characters than less familiar characters. (Contains…

  38. It doesn't matter what you say: FMRI correlates of voice learning and recognition independent of speech content.

    PubMed

    Zäske, Romi; Awwad Shiekh Hasan, Bashar; Belin, Pascal

    2017-09-01

    Listeners can recognize newly learned voices from previously unheard utterances, suggesting the acquisition of high-level speech-invariant voice representations during learning. Using functional magnetic resonance imaging (fMRI) we investigated the anatomical basis underlying the acquisition of voice representations for unfamiliar speakers independent of speech, and their subsequent recognition among novel voices. Specifically, listeners studied voices of unfamiliar speakers uttering short sentences and subsequently classified studied and novel voices as "old" or "new" in a recognition test. To investigate "pure" voice learning, i.e., independent of sentence meaning, we presented German sentence stimuli to non-German speaking listeners. To disentangle stimulus-invariant and stimulus-dependent learning, during the test phase we contrasted a "same sentence" condition in which listeners heard speakers repeating the sentences from the preceding study phase, with a "different sentence" condition. Voice recognition performance was above chance in both conditions although, as expected, performance was higher for same than for different sentences. During study phases activity in the left inferior frontal gyrus (IFG) was related to subsequent voice recognition performance and same versus different sentence condition, suggesting an involvement of the left IFG in the interactive processing of speaker and speech information during learning. Importantly, at test reduced activation for voices correctly classified as "old" compared to "new" emerged in a network of brain areas including temporal voice areas (TVAs) of the right posterior superior temporal gyrus (pSTG), as well as the right inferior/middle frontal gyrus (IFG/MFG), the right medial frontal gyrus, and the left caudate. This effect of voice novelty did not interact with sentence condition, suggesting a role of temporal voice-selective areas and extra-temporal areas in the explicit recognition of learned voice identity, independent of speech content. Copyright © 2017 Elsevier Ltd. All rights reserved.

  39. Evaluation of a voice recognition system for the MOTAS pseudo pilot station function

    NASA Technical Reports Server (NTRS)

    Houck, J. A.

    1982-01-01

    The Langley Research Center has undertaken a technology development activity to provide a capability, the mission oriented terminal area simulation (MOTAS), wherein terminal area and aircraft systems studies can be performed. An experiment was conducted to evaluate state-of-the-art voice recognition technology and specifically, the Threshold 600 voice recognition system to serve as an aircraft control input device for the MOTAS pseudo pilot station function. The results of the experiment using ten subjects showed a recognition error of 3.67 percent for a 48-word vocabulary tested against a programmed vocabulary of 103 words. After the ten subjects retrained the Threshold 600 system for the words which were misrecognized or rejected, the recognition error decreased to 1.96 percent. The rejection rates for both cases were less than 0.70 percent. Based on the results of the experiment, voice recognition technology and specifically the Threshold 600 voice recognition system were chosen to fulfill this MOTAS function.

  40. Using voice to create hospital progress notes: Description of a mobile application and supporting system integrated with a commercial electronic health record.

    PubMed

    Payne, Thomas H; Alonso, W David; Markiel, J Andrew; Lybarger, Kevin; White, Andrew A

    2018-01-01

    We describe the development and design of a smartphone app-based system for creating inpatient progress notes using voice: commercial automatic speech recognition software, text processing to recognize spoken voice commands and format the note, and integration with a commercial EHR. This new system fits hospital rounding workflow and was used to support a randomized clinical trial testing whether use of voice to create notes improves timeliness of note availability, note quality, and physician satisfaction with the note creation process. The system was used to create 709 notes, which were placed in the corresponding patients' EHR records. The median time from pressing the Send button to appearance of the formatted note in the Inbox was 8.8 min. The system was generally very reliable, accepted by physician users, and secure. This approach provides an alternative to the use of keyboard and templates to create progress notes and may appeal to physicians who prefer voice to typing. Copyright © 2017 Elsevier Inc. All rights reserved.
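
    The paper does not publish its command grammar; the sketch below illustrates the general text-processing step with an invented command vocabulary, rewriting spoken formatting commands embedded in raw speech-recognition output into note structure:

      # Rewriting spoken formatting commands in raw ASR output into note
      # structure; the command vocabulary is invented for illustration.
      import re

      COMMANDS = {
          r'\bnew paragraph\b': '\n\n',
          r'\bnext heading assessment\b': '\n\nASSESSMENT:\n',
          r'\bnext heading plan\b': '\n\nPLAN:\n',
      }

      def format_note(raw_transcript: str) -> str:
          text = raw_transcript
          for pattern, replacement in COMMANDS.items():
              text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
          return re.sub(r'[ \t]+\n', '\n', text).strip()  # tidy line ends

      print(format_note("patient improving new paragraph next heading plan "
                        "continue antibiotics"))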

  41. A Joint Time-Frequency and Matrix Decomposition Feature Extraction Methodology for Pathological Voice Classification

    NASA Astrophysics Data System (ADS)

    Ghoraani, Behnaz; Krishnan, Sridhar

    2009-12-01

    The number of people affected by speech problems is increasing as the modern world places increasing demands on the human voice via mobile telephones, voice recognition software, and interpersonal verbal communications. In this paper, we propose a novel methodology for automatic pattern classification of pathological voices. The main contribution of this paper is the extraction of meaningful and unique features using an adaptive time-frequency distribution (TFD) and nonnegative matrix factorization (NMF). We construct the adaptive TFD as an effective signal analysis domain to dynamically track the nonstationarity in the speech and utilize NMF as a matrix decomposition (MD) technique to quantify the constructed TFD. The proposed method extracts meaningful and unique features from the joint TFD of the speech, and automatically identifies and measures the abnormality of the signal. Depending on the abnormality measure of each signal, we classify the signal as normal or pathological. The proposed method is applied to the Massachusetts Eye and Ear Infirmary (MEEI) voice disorders database, which consists of 161 pathological and 51 normal speakers, and an overall classification accuracy of 98.6% was achieved.
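
    A simplified sketch of the TFD-plus-matrix-decomposition idea: an ordinary STFT spectrogram stands in for the paper's adaptive TFD, and scikit-learn's NMF factorizes it into a few spectral bases and time activations. The toy signal and component count are assumptions, not the paper's configuration:

      # STFT spectrogram (standing in for the adaptive TFD) factorized by
      # NMF into spectral bases W and time activations H.
      import numpy as np
      from scipy.signal import spectrogram
      from sklearn.decomposition import NMF

      fs = 16000
      t = np.arange(fs) / fs
      x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 800 * t)  # toy "voice"

      f, frames, S = spectrogram(x, fs=fs, nperseg=256)  # S >= 0: (freqs, frames)
      model = NMF(n_components=4, init='nndsvd', max_iter=500)
      W = model.fit_transform(S)   # spectral bases: (freqs, 4)
      H = model.components_        # activations over time: (4, frames)
      features = H.mean(axis=1)    # one coarse feature per component
      print(features)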

  42. Effects of emotional and perceptual-motor stress on a voice recognition system's accuracy: An applied investigation

    NASA Astrophysics Data System (ADS)

    Poock, G. K.; Martin, B. J.

    1984-02-01

    This was an applied investigation examining the ability of a speech recognition system to recognize speakers' inputs when the speakers were under different stress levels. Subjects were asked to speak to a voice recognition system under three conditions: (1) normal office environment, (2) emotional stress, and (3) perceptual-motor stress. Results indicate a definite relationship between voice recognition system performance and the type of low stress reference patterns used to achieve recognition.

  43. Benefits for Voice Learning Caused by Concurrent Faces Develop over Time.

    PubMed

    Zäske, Romi; Mühl, Constanze; Schweinberger, Stefan R

    2015-01-01

    Recognition of personally familiar voices benefits from the concurrent presentation of the corresponding speakers' faces. This effect of audiovisual integration is most pronounced for voices combined with dynamic articulating faces. However, it is unclear if learning unfamiliar voices also benefits from audiovisual face-voice integration or, alternatively, is hampered by attentional capture of faces, i.e., "face-overshadowing". In six study-test cycles we compared the recognition of newly-learned voices following unimodal voice learning vs. bimodal face-voice learning with either static (Exp. 1) or dynamic articulating faces (Exp. 2). Voice recognition accuracies significantly increased for bimodal learning across study-test cycles while remaining stable for unimodal learning, as reflected in numerical costs of bimodal relative to unimodal voice learning in the first two study-test cycles and benefits in the last two cycles. This was independent of whether faces were static images (Exp. 1) or dynamic videos (Exp. 2). In both experiments, slower reaction times to voices previously studied with faces compared to voices only may result from visual search for faces during memory retrieval. A general decrease of reaction times across study-test cycles suggests facilitated recognition with more speaker repetitions. Overall, our data suggest two simultaneous and opposing mechanisms during bimodal face-voice learning: while attentional capture of faces may initially impede voice learning, audiovisual integration may facilitate it thereafter.

  44. Use of speech-to-text technology for documentation by healthcare providers.

    PubMed

    Ajami, Sima

    2016-01-01

    Medical records are a critical component of a patient's treatment. However, documentation of patient-related information is considered a secondary activity in the provision of healthcare services, often leading to incomplete medical records and patient data of low quality. Advances in information technology (IT) in the health system and registration of information in electronic health records (EHR) using speech-to-text conversion software have facilitated service delivery. This narrative review is based on a literature search with the help of libraries, books, conference proceedings, the databases of Science Direct, PubMed, Proquest, Springer, and SID (Scientific Information Database), and search engines such as Yahoo and Google. I used the following keywords and their combinations: speech recognition, automatic report documentation, voice to text software, healthcare, information, and voice recognition. Due to lack of knowledge of other languages, I searched all texts in English or Persian with no time limits. Of a total of 70, only 42 articles were selected. Speech-to-text conversion technology offers opportunities to improve the documentation process of medical records, reduce the cost and time of recording information, enhance the quality of documentation, improve the quality of services provided to patients, and support healthcare providers in legal matters. Healthcare providers should recognize the impact of this technology on service delivery.

  45. Influence of Smartphones and Software on Acoustic Voice Measures

    PubMed Central

    GRILLO, ELIZABETH U.; BROSIOUS, JENNA N.; SORRELL, STACI L.; ANAND, SUPRAJA

    2016-01-01

    This study assessed the within-subject variability of voice measures captured using different recording devices (i.e., smartphones and a head-mounted microphone) and software programs (i.e., Analysis of Dysphonia in Speech and Voice (ADSV), Multi-Dimensional Voice Program (MDVP), and Praat). Correlations between the software programs that calculated the voice measures were also analyzed. Results demonstrated no significant within-subject variability across devices and software, and that some of the measures were highly correlated across software programs. The study suggests that certain smartphones may be appropriate for recording daily voice measures representing the effects of vocal loading within individuals. In addition, even though different algorithms are used to compute voice measures across software programs, some of the programs and measures share a similar relationship. PMID:28775797

  46. Digital signal processing algorithms for automatic voice recognition

    NASA Technical Reports Server (NTRS)

    Botros, Nazeih M.

    1987-01-01

    Current digital signal analysis algorithms implemented in automatic voice recognition systems are investigated. Automatic voice recognition means the capability of a computer to recognize and interact with verbal commands. The focus is on the digital signal analysis, rather than the linguistic analysis, of the speech signal. Several digital signal processing algorithms are available for voice recognition, among them Linear Predictive Coding (LPC), short-time Fourier analysis, and cepstrum analysis. Of these, LPC is the most widely used. This algorithm has a short execution time and does not require large memory storage. However, it has several limitations due to the assumptions used to develop it. The other two algorithms are frequency-domain algorithms that make fewer assumptions, but they are not widely implemented or investigated. However, with recent advances in digital technology, namely signal processors, these two frequency-domain algorithms may be investigated in order to implement them in voice recognition. This research is concerned with real-time, microprocessor-based recognition algorithms.
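
    As a worked illustration of the LPC analysis singled out above, the sketch below implements the autocorrelation method with the Levinson-Durbin recursion in plain numpy; the frame length and model order are typical values, not taken from this report:

      # LPC by the autocorrelation method (Levinson-Durbin recursion).
      import numpy as np

      def lpc(frame: np.ndarray, order: int) -> np.ndarray:
          """Return LPC polynomial coefficients a[0..order], with a[0] = 1."""
          n = len(frame)
          r = np.correlate(frame, frame, mode='full')[n - 1:n + order]
          a = np.zeros(order + 1)
          a[0] = 1.0
          err = r[0]
          for i in range(1, order + 1):
              acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
              k = -acc / err               # reflection coefficient
              a[1:i] += k * a[i - 1:0:-1]  # update previous coefficients
              a[i] = k
              err *= 1.0 - k * k           # residual prediction error
          return a

      # One 25 ms frame of a synthetic signal at 16 kHz, Hamming-windowed
      sr = 16000
      t = np.arange(int(0.025 * sr)) / sr
      frame = np.sin(2 * np.pi * 220 * t) * np.hamming(len(t))
      print(lpc(frame, order=12))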

  47. Memory strength and specificity revealed by pupillometry

    PubMed Central

    Papesh, Megan H.; Goldinger, Stephen D.; Hout, Michael C.

    2011-01-01

    Voice-specificity effects in recognition memory were investigated using both behavioral data and pupillometry. Volunteers initially heard spoken words and nonwords in two voices; they later provided confidence-based old/new classifications to items presented in their original voices, changed (but familiar) voices, or entirely new voices. Recognition was more accurate for old-voice items, replicating prior research. Pupillometry was used to gauge cognitive demand during both encoding and testing: Enlarged pupils revealed that participants devoted greater effort to encoding items that were subsequently recognized. Further, pupil responses were sensitive to the cue match between encoding and retrieval voices, as well as memory strength. Strong memories, and those with the closest encoding-retrieval voice matches, resulted in the highest peak pupil diameters. The results are discussed with respect to episodic memory models and Whittlesea’s (1997) SCAPE framework for recognition memory. PMID:22019480

  48. Study to determine potential flight applications and human factors design guidelines for voice recognition and synthesis systems

    NASA Astrophysics Data System (ADS)

    White, R. W.; Parks, D. L.

    1985-07-01

    A study was conducted to determine potential commercial aircraft flight deck applications and implementation guidelines for voice recognition and synthesis. At first, a survey of voice recognition and synthesis technology was undertaken to develop a working knowledge base. Then, numerous potential aircraft and simulator flight deck voice applications were identified and each proposed application was rated on a number of criteria in order to achieve an overall payoff rating. The potential voice recognition applications fell into five general categories: programming, interrogation, data entry, switch and mode selection, and continuous/time-critical action control. The ratings of the first three categories showed the most promise of being beneficial to flight deck operations. Possible applications of voice synthesis systems were categorized as automatic or pilot selectable and many were rated as being potentially beneficial. In addition, voice system implementation guidelines and pertinent performance criteria are proposed. Finally, the findings of this study are compared with those made in a recent NASA study of a 1995 transport concept.

  10. Study to determine potential flight applications and human factors design guidelines for voice recognition and synthesis systems

    NASA Technical Reports Server (NTRS)

    White, R. W.; Parks, D. L.

    1985-01-01

    A study was conducted to determine potential commercial aircraft flight deck applications and implementation guidelines for voice recognition and synthesis. First, a survey of voice recognition and synthesis technology was undertaken to develop a working knowledge base. Then, numerous potential aircraft and simulator flight deck voice applications were identified and each proposed application was rated on a number of criteria in order to achieve an overall payoff rating. The potential voice recognition applications fell into five general categories: programming, interrogation, data entry, switch and mode selection, and continuous/time-critical action control. The ratings of the first three categories showed the most promise of being beneficial to flight deck operations. Possible applications of voice synthesis systems were categorized as automatic or pilot selectable and many were rated as being potentially beneficial. In addition, voice system implementation guidelines and pertinent performance criteria are proposed. Finally, the findings of this study are compared with those made in a recent NASA study of a 1995 transport concept.

  11. Pilot study on the feasibility of a computerized speech recognition charting system.

    PubMed

    Feldman, C A; Stevens, D

    1990-08-01

    The objective of this study was to determine the feasibility of developing and using a voice recognition computerized charting system to record dental clinical examination data. More specifically, the study was designed to analyze the time and error differential between the traditional examiner/recorder method (ASSISTANT) and the computerized voice recognition method (VOICE). DMFS examinations were performed twice on 20 patients using the traditional ASSISTANT and the VOICE charting system. A statistically significant difference was found when comparing the mean ASSISTANT time of 2.69 min to the VOICE time of 3.72 min (P < 0.001). No statistically significant difference was found when comparing the mean ASSISTANT recording errors of 0.1 to the VOICE recording errors of 0.6 (P = 0.059). Ninety percent of the patients indicated they felt comfortable with the dentist talking to a computer, and only 5% of the sample indicated they opposed VOICE. Results from this pilot study indicate that a charting system utilizing voice recognition technology could be considered a viable alternative to traditional examiner/recorder methods of clinical charting.

  12. Familiar Person Recognition: Is Autonoetic Consciousness More Likely to Accompany Face Recognition Than Voice Recognition?

    NASA Astrophysics Data System (ADS)

    Barsics, Catherine; Brédart, Serge

    2010-11-01

    Autonoetic consciousness is a fundamental property of human memory, enabling us to experience mental time travel, to recollect past events with a feeling of self-involvement, and to project ourselves into the future. Autonoetic consciousness is a characteristic of episodic memory. By contrast, awareness of the past associated with a mere feeling of familiarity or knowing relies on noetic consciousness, depending on semantic memory integrity. The present research aimed to evaluate whether conscious recollection of episodic memories is more likely to occur following the recognition of a familiar face than following the recognition of a familiar voice. Recall of semantic information (biographical information) was also assessed. Previous studies that investigated the recall of biographical information following person recognition used faces and voices of famous people as stimuli. In this study, the participants were presented with personally familiar people's voices and faces, thus avoiding the presence of identity cues in the spoken extracts and allowing a stricter control of frequency of exposure for both types of stimuli (voices and faces). In the present study, the rate of retrieved episodic memories, associated with autonoetic awareness, was significantly higher for familiar faces than for familiar voices, even though the level of overall recognition was similar for both stimulus domains. The same pattern was observed for semantic information retrieval. These results and their implications for current Interactive Activation and Competition person recognition models are discussed.

  13. Frequency and analysis of non-clinical errors made in radiology reports using the National Integrated Medical Imaging System voice recognition dictation software.

    PubMed

    Motyer, R E; Liddy, S; Torreggiani, W C; Buckley, O

    2016-11-01

    Voice recognition (VR) dictation of radiology reports has become the mainstay of reporting in many institutions worldwide. Despite its benefits, such software is not without limitations, and transcription errors have been widely reported. The aim was to evaluate the frequency and nature of non-clinical transcription errors using VR dictation software, via a retrospective audit of 378 finalised radiology reports. Errors were counted and categorised by significance, error type and sub-type; data regarding imaging modality, report length and dictation time were also collected. 67 (17.72%) reports contained ≥1 error, with 7 (1.85%) containing 'significant' and 9 (2.38%) containing 'very significant' errors. A total of 90 errors were identified from the 378 reports analysed, with 74 (82.22%) classified as 'insignificant', 7 (7.78%) as 'significant', and 9 (10%) as 'very significant'. 68 (75.56%) errors were 'spelling and grammar', 20 (22.22%) 'missense' and 2 (2.22%) 'nonsense'. 'Punctuation' was the most common error sub-type, accounting for 27 errors (30%). More complex imaging modalities had higher error rates per report and per sentence: computed tomography reports contained 0.040 errors per sentence, compared to 0.030 for plain film. Longer reports had a higher error rate, with reports of >25 sentences containing an average of 1.23 errors per report, compared to 0.09 for reports of 0-5 sentences. These findings highlight the limitations of VR dictation software. While most errors were deemed insignificant, some had the potential to alter report interpretation and patient management. Longer reports and reports on more complex imaging had higher error rates, and this should be taken into account by the reporting radiologist.

  14. Robotic air vehicle. Blending artificial intelligence with conventional software

    NASA Technical Reports Server (NTRS)

    Mcnulty, Christa; Graham, Joyce; Roewer, Paul

    1987-01-01

    The Robotic Air Vehicle (RAV) system is described. The program's objectives were to design, implement, and demonstrate cooperating expert systems for piloting robotic air vehicles. The development of this system merges conventional programming used in passive navigation with Artificial Intelligence techniques such as voice recognition, spatial reasoning, and expert systems. The individual components of the RAV system are discussed as well as their interactions with each other and how they operate as a system.

  15. Superior voice recognition in a patient with acquired prosopagnosia and object agnosia.

    PubMed

    Hoover, Adria E N; Démonet, Jean-François; Steeves, Jennifer K E

    2010-11-01

    Anecdotally, it has been reported that individuals with acquired prosopagnosia compensate for their inability to recognize faces by using other person-identity cues such as hair, gait, or the voice. Are they therefore superior at using non-face cues, specifically voices, for person identity? Here, we empirically measure person and object identity recognition in a patient with acquired prosopagnosia and object agnosia. We quantify person identity (face and voice) and object identity (car and horn) recognition for visual, auditory, and bimodal (visual and auditory) stimuli. The patient is unable to recognize faces or cars, consistent with his prosopagnosia and object agnosia, respectively. He is perfectly able to recognize people's voices, car horns, and the bimodal stimuli. These data show a reverse shift in the typical weighting of visual over auditory information for audiovisual stimuli in a compromised visual recognition system. Moreover, the patient shows selectively superior voice recognition compared to the controls, revealing that two different stimulus domains, persons and objects, may not be equally affected by sensory adaptation effects. This also implies that person and object identity recognition are processed in separate pathways. These data demonstrate that an individual with acquired prosopagnosia and object agnosia can compensate for the visual impairment and become quite skilled at using spared aspects of sensory processing. In the case of acquired prosopagnosia, it is advantageous to develop a superior use of voices for person identity recognition in everyday life. Copyright © 2010 Elsevier Ltd. All rights reserved.

  16. Do What I Say! Voice Recognition Makes Major Advances.

    ERIC Educational Resources Information Center

    Ruley, C. Dorsey

    1994-01-01

    Explains voice recognition technology applications in the workplace, schools, and libraries. Highlights include a voice-controlled work station using the DragonDictate system that can be used with dyslexic students, converting text to speech, and converting speech to text. (LRW)

  17. The recognition of female voice based on voice registers in singing techniques in real time using the Hankel transform method and Macdonald function

    NASA Astrophysics Data System (ADS)

    Meiyanti, R.; Subandi, A.; Fuqara, N.; Budiman, M. A.; Siahaan, A. P. U.

    2018-03-01

    A singer does not simply recite the lyrics of a song; particular vocal techniques are also used to make the performance more beautiful. In singing technique, female voices typically span a more diverse set of registers than male voices. The human voice has many registers, but those used while singing include, among others, chest voice, head voice, falsetto, and vocal fry. This research on speech recognition based on female voice registers in singing technique was built using Borland Delphi 7.0. The speech recognition process was performed both on recorded voice samples and in real time. The voice input yields weight energy values computed using the Hankel transform method and Macdonald functions. The results showed that the accuracy of the system depends on the accuracy of the vocal technique trained and tested: the average recognition rate for voice registers reached 48.75 percent on recordings, and 57 percent in real time.
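
    The abstract does not spell out the exact formulation, so the following Python fragment is only a loose illustration of the named ingredients: a discrete Hankel-type transform of a voice frame (Bessel J kernel, `scipy.special.jv`) and the Macdonald function K_v (modified Bessel function of the second kind, `scipy.special.kv`) used as a weighting to produce a single "weight energy" value. All parameter choices and names here are assumptions, not the paper's method.

    ```python
    import numpy as np
    from scipy.special import jv, kv  # Bessel J_v and Macdonald function K_v

    def weight_energy(frame: np.ndarray, order: int = 0, n_k: int = 32) -> float:
        """Loose illustration: discrete Hankel-type transform of a frame,
        weighted by the Macdonald function, reduced to one energy value."""
        r = np.linspace(1e-3, 1.0, len(frame))        # normalized sample axis
        k = np.linspace(0.1, 10.0, n_k)               # transform variable
        # Discrete Hankel transform: F(k) = sum_r f(r) J_order(k r) r dr
        H = (frame * r) @ jv(order, np.outer(r, k)) * (r[1] - r[0])
        return float(np.sum((H ** 2) * kv(1, k)))     # K_1-weighted energy

    frame = np.random.randn(256)                      # stand-in voice frame
    print(weight_energy(frame))
    ```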

  18. Evaluation of Speech Recognition of Cochlear Implant Recipients Using Adaptive, Digital Remote Microphone Technology and a Speech Enhancement Sound Processing Algorithm.

    PubMed

    Wolfe, Jace; Morais, Mila; Schafer, Erin; Agrawal, Smita; Koch, Dawn

    2015-05-01

    Cochlear implant recipients often experience difficulty with understanding speech in the presence of noise. Cochlear implant manufacturers have developed sound processing algorithms designed to improve speech recognition in noise, and research has shown these technologies to be effective. Remote microphone technology utilizing adaptive, digital wireless radio transmission has also been shown to provide significant improvement in speech recognition in noise. There are no studies examining the potential improvement in speech recognition in noise when these two technologies are used simultaneously. The goal of this study was to evaluate the potential benefits and limitations associated with the simultaneous use of a sound processing algorithm designed to improve performance in noise (Advanced Bionics ClearVoice) and a remote microphone system that incorporates adaptive, digital wireless radio transmission (Phonak Roger). A two-by-two repeated-measures design was used to examine performance differences obtained without these technologies compared to the use of each technology separately as well as the simultaneous use of both technologies. Eleven Advanced Bionics (AB) cochlear implant recipients, ages 11 to 68 yr, participated. AzBio sentence recognition was measured in quiet and in the presence of classroom noise ranging in level from 50 to 80 dBA in 5-dB steps. Performance was evaluated in four conditions: (1) no ClearVoice and no Roger, (2) ClearVoice enabled without the use of Roger, (3) ClearVoice disabled with Roger enabled, and (4) simultaneous use of ClearVoice and Roger. Speech recognition in quiet was better than speech recognition in noise for all conditions. Use of ClearVoice and Roger each provided significant improvement in speech recognition in noise. The best performance in noise was obtained with the simultaneous use of ClearVoice and Roger. ClearVoice and Roger technology each improves speech recognition in noise, particularly when used at the same time. Because ClearVoice does not degrade performance in quiet settings, clinicians should consider recommending ClearVoice for routine, full-time use for AB implant recipients. Roger should be used in all instances in which remote microphone technology may assist the user in understanding speech in the presence of noise. American Academy of Audiology.

  19. Voice gender and the segregation of competing talkers: Perceptual learning in cochlear implant simulations

    PubMed Central

    Sullivan, Jessica R.; Assmann, Peter F.; Hossain, Shaikat; Schafer, Erin C.

    2017-01-01

    Two experiments explored the role of differences in voice gender in the recognition of speech masked by a competing talker in cochlear implant simulations. Experiment 1 confirmed that listeners with normal hearing receive little benefit from differences in voice gender between a target and masker sentence in four- and eight-channel simulations, consistent with previous findings that cochlear implants deliver an impoverished representation of the cues for voice gender. However, gender differences led to small but significant improvements in word recognition with 16 and 32 channels. Experiment 2 assessed the benefits of perceptual training on the use of voice gender cues in an eight-channel simulation. Listeners were assigned to one of four groups: (1) word recognition training with target and masker differing in gender; (2) word recognition training with same-gender target and masker; (3) gender recognition training; or (4) control with no training. Significant improvements in word recognition were observed from pre- to post-test sessions for all three training groups compared to the control group. These improvements were maintained at the late session (one week following the last training session) for all three groups. There was an overall improvement in masked word recognition performance provided by gender mismatch following training, but the amount of benefit did not differ as a function of the type of training. The training effects observed here are consistent with a form of rapid perceptual learning that contributes to the segregation of competing voices but does not specifically enhance the benefits provided by voice gender cues. PMID:28372046

  20. Voice, Schooling, Inequality, and Scale

    ERIC Educational Resources Information Center

    Collins, James

    2013-01-01

    The rich studies in this collection show that the investigation of voice requires analysis of "recognition" across layered spatial-temporal and sociolinguistic scales. I argue that the concepts of voice, recognition, and scale provide insight into contemporary educational inequality and that their study benefits, in turn, from paying attention to…

  1. Motorcycle Start-stop System based on Intelligent Biometric Voice Recognition

    NASA Astrophysics Data System (ADS)

    Winda, A.; E Byan, W. R.; Sofyan; Armansyah; Zariantin, D. L.; Josep, B. G.

    2017-03-01

    The mechanical key currently used on motorcycles is prone to burglary, theft, or being misplaced. Intelligent biometric voice recognition is proposed as an alternative to this mechanism. The proposed system decides whether a voice belongs to the user and whether the word uttered is 'On' or 'Off'; the decision is then sent to an Arduino in order to start or stop the engine. The recorded voice is processed to extract features that are later used as input to the proposed system. The Mel-Frequency Cepstral Coefficient (MFCC) technique is adopted for feature extraction, and the extracted features are then used as input to an SVM-based identifier. Experimental results confirm the effectiveness of the proposed intelligent voice recognition and word recognition system, with good training and testing accuracy of 99.31% and 99.43%, respectively. Moreover, the proposed system achieves a false rejection rate (FRR) of 0.18% and a false acceptance rate (FAR) of 17.58%. For intelligent word recognition, the training and testing accuracy are 100% and 96.3%, respectively.
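
    For orientation, here is a minimal sketch of an MFCC-plus-SVM pipeline of the kind described, assuming `librosa` for feature extraction and `scikit-learn` for the classifier. The file names, labels, and the averaging of MFCCs over frames are illustrative assumptions, not the paper's procedure.

    ```python
    import librosa
    import numpy as np
    from sklearn.svm import SVC

    def mfcc_vector(path: str) -> np.ndarray:
        """Load a clip and summarize it as the mean MFCC vector over frames."""
        y, sr = librosa.load(path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # (13, n_frames)
        return mfcc.mean(axis=1)

    # Hypothetical training clips of the owner and of other speakers
    X = np.stack([mfcc_vector(p) for p in ["owner_on.wav", "owner_off.wav",
                                           "other1.wav", "other2.wav"]])
    y = np.array([1, 1, 0, 0])  # 1 = owner, 0 = not owner

    clf = SVC(kernel="rbf").fit(X, y)
    print(clf.predict(mfcc_vector("test_clip.wav").reshape(1, -1)))
    ```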

  2. Voice tracking and spoken word recognition in the presence of other voices

    NASA Astrophysics Data System (ADS)

    Litong-Palima, Marisciel; Violanda, Renante; Saloma, Caesar

    2004-12-01

    We study the human hearing process by modeling the hair cell as a thresholded Hopf bifurcator and comparing our calculations with experimental results involving human subjects in two different multi-source listening tasks: voice tracking and spoken-word recognition. In the model, we observed noise suppression by destructive interference between noise sources, which weakens the effective noise strength acting on the hair cell. Different success-rate characteristics were observed for the two tasks. Hair cell performance at low threshold levels agrees well with results from voice-tracking experiments, while the word-recognition results are consistent with a linear model of the hearing process. The ability of humans to track a target voice is robust against cross-talk interference, unlike word-recognition performance, which deteriorates quickly with the number of uncorrelated noise sources in the environment, a response behavior associated with linear systems.
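
    A hedged sketch of the kind of model invoked here: the Hopf normal form, dz/dt = (mu + i*w0)z - |z|^2 z + F(t), is a standard idealization of hair-cell dynamics, and the threshold on the response amplitude is an assumption added for illustration. Parameters and names below are not taken from the paper.

    ```python
    import numpy as np

    def hopf_response(forcing: np.ndarray, mu: float = -0.05,
                      w0: float = 2 * np.pi * 1000.0, dt: float = 1e-5,
                      threshold: float = 0.1) -> np.ndarray:
        """Euler-integrate the forced Hopf normal form
        dz/dt = (mu + i*w0) z - |z|^2 z + F(t), then threshold |z|."""
        z = 0j
        out = np.empty(len(forcing))
        for n, f in enumerate(forcing):
            z += dt * ((mu + 1j * w0) * z - abs(z) ** 2 * z + f)
            out[n] = 1.0 if abs(z) > threshold else 0.0  # thresholded output
        return out

    # Usage: a 1 kHz tone as the forcing signal
    t = np.arange(0, 0.01, 1e-5)
    print(hopf_response(np.sin(2 * np.pi * 1000 * t)).mean())
    ```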

  3. Cockpit voice recognition program at Princeton University

    NASA Technical Reports Server (NTRS)

    Huang, C. Y.

    1983-01-01

    Voice recognition technology (VRT) is applied to aeronautics, particularly to pilot workload alleviation. VRT no longer has to prove its maturity. The feasibility of voice tuning of the radio and DME is demonstrated, since these applications offer immediate advantages to the pilot and can be completed in a reasonable time.

  4. The processing of auditory and visual recognition of self-stimuli.

    PubMed

    Hughes, Susan M; Nicholson, Shevon E

    2010-12-01

    This study examined self-recognition processing in both the auditory and visual modalities by determining how comparable hearing a recording of one's own voice is to seeing a photograph of one's own face. We also investigated whether the simultaneous presentation of auditory and visual self-stimuli would either facilitate or inhibit self-identification. Ninety-one participants completed reaction-time tasks of self-recognition when presented with their own faces, own voices, and combinations of the two. Reaction time and errors made when responding with both the right and left hand were recorded to determine if there were lateralization effects on these tasks. Our findings showed that visual self-recognition for facial photographs appears to be superior to auditory self-recognition for voice recordings. Furthermore, a combined presentation of one's own face and voice appeared to inhibit rather than facilitate self-recognition, and there was a left-hand advantage for reaction time on the combined-presentation tasks. Copyright © 2010 Elsevier Inc. All rights reserved.

  5. Impact of a voice recognition system on report cycle time and radiologist reading time

    NASA Astrophysics Data System (ADS)

    Melson, David L.; Brophy, Robert; Blaine, G. James; Jost, R. Gilbert; Brink, Gary S.

    1998-07-01

    Because of its exciting potential to improve clinical service, as well as reduce costs, a voice recognition system for radiological dictation was recently installed at our institution. This system will be clinically successful if it dramatically reduces radiology report turnaround time without substantially affecting radiologist dictation and editing time. This report summarizes an observer study currently under way in which radiologist reporting times using the traditional transcription system and the voice recognition system are compared. Four radiologists are observed interpreting portable intensive care unit (ICU) chest examinations at a workstation in the chest reading area. Data are recorded with the radiologists using the transcription system and using the voice recognition system. The measurements distinguish between time spent performing clerical tasks and time spent actually dictating the report. Editing time and the number of corrections made are recorded. Additionally, statistics are gathered to assess the voice recognition system's impact on the report cycle time -- the time from report dictation to availability of an edited and finalized report -- and the length of reports.

  6. Micro-Based Speech Recognition: Instructional Innovation for Handicapped Learners.

    ERIC Educational Resources Information Center

    Horn, Carin E.; Scott, Brian L.

    A new voice based learning system (VBLS), which allows the handicapped user to interact with a microcomputer by voice commands, is described. Speech or voice recognition is the computerized process of identifying a spoken word or phrase, including those resulting from speech impediments. This new technology is helpful to the severely physically…

  7. Investigations of Hemispheric Specialization of Self-Voice Recognition

    ERIC Educational Resources Information Center

    Rosa, Christine; Lassonde, Maryse; Pinard, Claudine; Keenan, Julian Paul; Belin, Pascal

    2008-01-01

    Three experiments investigated functional asymmetries related to self-recognition in the domain of voices. In Experiment 1, participants were asked to identify one of three presented voices (self, familiar or unknown) by responding with either the right or the left-hand. In Experiment 2, participants were presented with auditory morphs between the…

  8. [Computed assisted voice recognition. A dream or reality in the pathologist's routine work?].

    PubMed

    Delling, G; Delling, D

    1999-03-01

    During the last 30 years, the analysis of human speech with powerful computers has taken great strides; cost-effective, comfortable solutions are therefore now available for professional routine work. The advantages of using voice recognition are the creation of new documentation and archives, reduced personnel costs and, last but not least, independence when staff are unexpectedly absent through illness or annual leave. For voice recognition systems to be used effectively, a considerable amount of time must be invested during the first 3 months. Younger colleagues in particular will be motivated to dictate more precisely and in more detail following the introduction of voice recognition. The effects on other sectors of medical training, quality control, and histology report preparation and transmission remain a matter of speculation.

  9. The program complex for vocal recognition

    NASA Astrophysics Data System (ADS)

    Konev, Anton; Kostyuchenko, Evgeny; Yakimuk, Alexey

    2017-01-01

    This article discusses the possibility of applying a pitch-frequency determination algorithm to note recognition problems. A preliminary study of analogous programs with a "music recognition" function was carried out. A software package based on the algorithm for pitch frequency calculation was implemented and tested. It was shown that the algorithm can recognize the notes in a user's vocal performance. The sound source can be a single musical instrument, a set of musical instruments, or a human voice humming a tune. The input file is initially presented in the .wav format or is recorded in this format from a microphone. Processing is performed by sequentially determining the pitch frequency and converting its values to notes. Based on the test results, modification of the algorithms used in the complex was planned.
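
    The pitch-to-note conversion step is standard and worth making concrete: a frequency f maps to an equal-temperament MIDI note number via 69 + 12*log2(f/440). A minimal Python sketch follows; the note-name table and rounding policy are the usual conventions, not details taken from the paper.

    ```python
    import math

    NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
                  "F#", "G", "G#", "A", "A#", "B"]

    def frequency_to_note(f_hz: float) -> str:
        """Map a pitch frequency to the nearest equal-temperament note,
        using A4 = 440 Hz as MIDI note 69."""
        midi = round(69 + 12 * math.log2(f_hz / 440.0))
        octave = midi // 12 - 1            # MIDI octave convention
        return f"{NOTE_NAMES[midi % 12]}{octave}"

    print(frequency_to_note(261.6))  # -> C4
    print(frequency_to_note(440.0))  # -> A4
    ```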

  10. Normal voice processing after posterior superior temporal sulcus lesion.

    PubMed

    Jiahui, Guo; Garrido, Lúcia; Liu, Ran R; Susilo, Tirta; Barton, Jason J S; Duchaine, Bradley

    2017-10-01

    The right posterior superior temporal sulcus (pSTS) shows a strong response to voices, but the cognitive processes generating this response are unclear. One possibility is that this activity reflects basic voice processing. However, several fMRI and magnetoencephalography findings suggest instead that pSTS serves as an integrative hub that combines voice and face information. Here we investigate whether right pSTS contributes to basic voice processing by testing Faith, a patient whose right pSTS was resected, with eight behavioral tasks assessing voice identity perception and recognition, voice sex perception, and voice expression perception. Faith performed normally on all the tasks. Her normal performance indicates right pSTS is not necessary for intact voice recognition and suggests that pSTS activations to voices reflect higher-level processes. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Enhancing and Customizing Laboratory Information Systems to Improve/Enhance Pathologist Workflow.

    PubMed

    Hartman, Douglas J

    2015-06-01

    Optimizing pathologist workflow can be difficult because it is affected by many variables. Surgical pathologists must complete many tasks that culminate in a final pathology report. Several software systems can be used to enhance/improve pathologist workflow. These include voice recognition software, pre-sign-out quality assurance, image utilization, and computerized provider order entry. Recent changes in the diagnostic coding and the more prominent role of centralized electronic health records represent potential areas for increased ways to enhance/improve the workflow for surgical pathologists. Additional unforeseen changes to the pathologist workflow may accompany the introduction of whole-slide imaging technology to the routine diagnostic work. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. Enhancing and Customizing Laboratory Information Systems to Improve/Enhance Pathologist Workflow.

    PubMed

    Hartman, Douglas J

    2016-03-01

    Optimizing pathologist workflow can be difficult because it is affected by many variables. Surgical pathologists must complete many tasks that culminate in a final pathology report. Several software systems can be used to enhance/improve pathologist workflow. These include voice recognition software, pre-sign-out quality assurance, image utilization, and computerized provider order entry. Recent changes in the diagnostic coding and the more prominent role of centralized electronic health records represent potential areas for increased ways to enhance/improve the workflow for surgical pathologists. Additional unforeseen changes to the pathologist workflow may accompany the introduction of whole-slide imaging technology to the routine diagnostic work. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. Noise Robust Speech Recognition Applied to Voice-Driven Wheelchair

    NASA Astrophysics Data System (ADS)

    Sasou, Akira; Kojima, Hiroaki

    2009-12-01

    Conventional voice-driven wheelchairs usually employ headset microphones that are capable of achieving sufficient recognition accuracy, even in the presence of surrounding noise. However, such interfaces require users to wear sensors such as a headset microphone, which can be an impediment, especially for hand-disabled users. Conversely, it is well known that speech recognition accuracy degrades drastically when the microphone is placed far from the user. In this paper, we develop a noise-robust speech recognition system for a voice-driven wheelchair. We verified the effectiveness of our system in experiments in different environments and confirmed that it achieves almost the same recognition accuracy as a headset microphone, without the user wearing any sensors.

  14. Impact of Fetal-Neonatal Iron Deficiency on Recognition Memory at 2 Months of Age.

    PubMed

    Geng, Fengji; Mai, Xiaoqin; Zhan, Jianying; Xu, Lin; Zhao, Zhengyan; Georgieff, Michael; Shao, Jie; Lozoff, Betsy

    2015-12-01

    To assess the effects of fetal-neonatal iron deficiency on recognition memory in early infancy. Perinatal iron deficiency delays or disrupts hippocampal development in animal models and thus may impair related neural functions in human infants, such as recognition memory. Event-related potentials were used in an auditory recognition memory task to compare 2-month-old Chinese infants with iron sufficiency or deficiency at birth. Fetal-neonatal iron deficiency was defined in 2 ways: a high zinc protoporphyrin/heme ratio (ZPP/H > 118 μmol/mol) or low serum ferritin (<75 μg/L) in cord blood. The late slow wave was used to measure infant recognition of the mother's voice. Event-related potential patterns differed significantly for fetal-neonatal iron deficiency as defined by high cord ZPP/H but not by low ferritin. Comparing 35 infants with iron deficiency (ZPP/H > 118 μmol/mol) to 92 with lower ZPP/H (iron-sufficient), only infants with iron sufficiency showed larger late slow wave amplitude for a stranger's voice than for the mother's voice in frontal-central and parietal-occipital locations, indicating recognition of the mother's voice. Infants with iron sufficiency showed electrophysiological evidence of recognizing their mother's voice, whereas infants with fetal-neonatal iron deficiency did not. Their poorer auditory recognition memory at 2 months of age is consistent with effects of fetal-neonatal iron deficiency on the developing hippocampus. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. Speech therapy and voice recognition instrument

    NASA Technical Reports Server (NTRS)

    Cohen, J.; Babcock, M. L.

    1972-01-01

    Characteristics of an electronic circuit for examining variations in vocal excitation for diagnostic purposes and in speech recognition for determining voice patterns and pitch changes are described. Operation of the circuit is discussed and a circuit diagram is provided.

  16. In the Beginning Was the Familiar Voice: Personally Familiar Voices in the Evolutionary and Contemporary Biology of Communication

    PubMed Central

    Sidtis, Diana; Kreiman, Jody

    2011-01-01

    The human voice is described in dialogic linguistics as an embodiment of self in a social context, contributing to expression, perception, and mutual exchange of self, consciousness, inner life, and personhood. While these approaches are subjective and arise from phenomenological perspectives, scientific facts about personal vocal identity, and its role in biological development, support these views. It is our purpose to review studies of the biology of personal vocal identity (the familiar voice pattern) as providing an empirical foundation for the view that the human voice is an embodiment of self in the social context. Recent developments in the biology and evolution of communication are concordant with these notions, revealing that familiar voice recognition (also known as vocal identity recognition or individual vocal recognition) contributed to survival in the earliest vocalizing species. Contemporary ethology documents the crucial role of familiar voices across animal species in signaling and perceiving internal states and personal identities. Neuropsychological studies of voice reveal multimodal cerebral associations arising across brain structures involved in memory, emotion, attention, and arousal in vocal perception and production, such that the voice represents the whole person. Although its roots are in evolutionary biology, human competence for processing layered social and personal meanings in the voice, as well as personal identity in a large repertory of familiar voice patterns, has achieved an immense sophistication. PMID:21710374

  17. Emotionally conditioning the target-speech voice enhances recognition of the target speech under "cocktail-party" listening conditions.

    PubMed

    Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang

    2018-05-01

    Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners for enhancing target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, (skin conductance) electrodermal responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase of listening efforts when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.

  18. Writing with Voice: An Investigation of the Use of a Voice Recognition System as a Writing Aid for a Man with Aphasia

    ERIC Educational Resources Information Center

    Bruce, Carolyn; Edmundson, Anne; Coleman, Michael

    2003-01-01

    Background: People with aphasia may experience difficulties that prevent them from demonstrating in writing what they know and can produce orally. Voice recognition systems that allow the user to speak into a microphone and see their words appear on a computer screen have the potential to assist written communication. Aim: This study investigated…

  19. Cost-sensitive learning for emotion robust speaker recognition.

    PubMed

    Li, Dongdong; Yang, Yingchun; Dai, Weihui

    2014-01-01

    In the field of information security, voice is one of the most important parts of biometrics. Especially with the development of voice communication through the Internet and telephone systems, huge voice data resources have become accessible. In speaker recognition, the voiceprint can be applied as a unique password for the user to prove his/her identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper deals with this problem by introducing a cost-sensitive learning technology to reweight the probability of test affective utterances at the pitch envelope level, which can effectively enhance robustness in emotion-dependent speaker recognition. Based on that technology, a new architecture of the recognition system, as well as its components, is proposed in this paper. The experiment conducted on the Mandarin Affective Speech Corpus shows that an improvement of 8% in identification rate over traditional speaker recognition is achieved.
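
    A heavily hedged sketch of the general idea of cost-sensitive score reweighting (the paper's pitch-envelope-level formulation is not reproduced here): per-speaker, per-emotion penalties learned on development data rescale each speaker model's score before the arg-max decision. All shapes and values below are illustrative.

    ```python
    import numpy as np

    def rescore(scores: np.ndarray, emotion_probs: np.ndarray,
                cost: np.ndarray) -> int:
        """Cost-sensitive reweighting sketch.
        scores: (n_speakers,) raw log-likelihoods for one test utterance.
        emotion_probs: (n_emotions,) soft emotion estimate for the utterance.
        cost: (n_speakers, n_emotions) penalties learned on development data.
        Returns the identified speaker index after reweighting."""
        adjusted = scores - cost @ emotion_probs   # expected per-speaker penalty
        return int(np.argmax(adjusted))

    # Toy usage: 3 speakers, 2 emotions (neutral, angry)
    scores = np.array([1.2, 1.1, 0.4])
    emotion_probs = np.array([0.2, 0.8])           # utterance sounds angry
    cost = np.array([[0.0, 0.3], [0.0, 0.0], [0.0, 0.1]])
    print(rescore(scores, emotion_probs, cost))    # -> 1
    ```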

  20. Cost-Sensitive Learning for Emotion Robust Speaker Recognition

    PubMed Central

    Li, Dongdong; Yang, Yingchun

    2014-01-01

    In the field of information security, voice is one of the most important parts of biometrics. Especially with the development of voice communication through the Internet and telephone systems, huge voice data resources have become accessible. In speaker recognition, the voiceprint can be applied as a unique password for the user to prove his/her identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper deals with this problem by introducing a cost-sensitive learning technology to reweight the probability of test affective utterances at the pitch envelope level, which can effectively enhance robustness in emotion-dependent speaker recognition. Based on that technology, a new architecture of the recognition system, as well as its components, is proposed in this paper. The experiment conducted on the Mandarin Affective Speech Corpus shows that an improvement of 8% in identification rate over traditional speaker recognition is achieved. PMID:24999492

  1. The role of spectral and temporal cues in voice gender discrimination by normal-hearing listeners and cochlear implant users.

    PubMed

    Fu, Qian-Jie; Chinchilla, Sherol; Galvin, John J

    2004-09-01

    The present study investigated the relative importance of temporal and spectral cues in voice gender discrimination and vowel recognition by normal-hearing subjects listening to an acoustic simulation of cochlear implant speech processing and by cochlear implant users. In the simulation, the number of speech processing channels ranged from 4 to 32, thereby varying the spectral resolution; the cutoff frequencies of the channels' envelope filters ranged from 20 to 320 Hz, thereby manipulating the available temporal cues. For normal-hearing subjects, results showed that both voice gender discrimination and vowel recognition scores improved as the number of spectral channels was increased. When only 4 spectral channels were available, voice gender discrimination significantly improved as the envelope filter cutoff frequency was increased from 20 to 320 Hz. For all spectral conditions, increasing the amount of temporal information had no significant effect on vowel recognition. Both voice gender discrimination and vowel recognition scores were highly variable among implant users. The performance of cochlear implant listeners was similar to that of normal-hearing subjects listening to comparable speech processing (4-8 spectral channels). The results suggest that both spectral and temporal cues contribute to voice gender discrimination and that temporal cues are especially important for cochlear implant users to identify the voice gender when there is reduced spectral resolution.
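
    To make the simulation concrete: acoustic simulations of implant processing like the one described are typically noise-excited channel vocoders, in which the signal is split into N bands, each band's envelope is extracted with a lowpass filter (the 20-320 Hz cutoffs varied above), and the envelopes re-modulate noise bands. A minimal Python sketch under those assumptions, not the authors' exact processor:

    ```python
    import numpy as np
    from scipy.signal import butter, sosfilt

    def noise_vocoder(x, fs, n_channels=8, env_cutoff=160.0):
        """Noise-excited channel vocoder: N log-spaced bands, rectified
        lowpass-filtered envelopes, envelopes re-modulating bandpass noise."""
        edges = np.logspace(np.log10(300), np.log10(6000), n_channels + 1)
        env_sos = butter(2, env_cutoff, btype="low", fs=fs, output="sos")
        out = np.zeros_like(x)
        for lo, hi in zip(edges[:-1], edges[1:]):
            band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
            env = sosfilt(env_sos, np.abs(sosfilt(band_sos, x)))  # envelope
            noise = sosfilt(band_sos, np.random.randn(len(x)))    # carrier
            out += env * noise
        return out

    fs = 16000
    x = np.random.randn(fs)  # stand-in for one second of speech
    y = noise_vocoder(x, fs, n_channels=8, env_cutoff=160.0)
    ```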

  2. DLMS Voice Data Entry.

    DTIC Science & Technology

    1980-06-01

    [OCR fragments from the scanned report; only partial text is recoverable.] List of illustrations: Fig. 1, Block Diagram of DLMS Voice Recognition System; Fig. 2, Flowchart (title truncated). From the text: the principal components are a speech preprocessor and a minicomputer; in the VRS, as shown in the block diagram of Fig. 1, the preprocessor is a TTI model 8040. Equipment noted in Fig. 1 includes a Data General 6026 magnetic tape unit, a display, an equipment cabinet, and a flexible disk unit.

  3. ``The perceptual bases of speaker identity'' revisited

    NASA Astrophysics Data System (ADS)

    Voiers, William D.

    2003-10-01

    A series of experiments begun 40 years ago [W. D. Voiers, J. Acoust. Soc. Am. 36, 1065-1073 (1964)] was concerned with identifying the perceived voice traits (PVTs) on which human recognition of voices depends. It culminated in the development of a voice taxonomy based on 20 PVTs and a set of highly reliable rating scales for classifying voices with respect to those PVTs. The development of a perceptual voice taxonomy was motivated by the need for a practical method of evaluating speaker recognizability in voice communication systems. The Diagnostic Speaker Recognition Test (DSRT) evaluates the effects of systems on speaker recognizability as reflected in changes in the inter-listener reliability of voice ratings on the 20 PVTs. The DSRT thus provides a qualitative, as well as quantitative, evaluation of the effects of a system on speaker recognizability. A fringe benefit of this project is PVT rating data for a sample of 680 voices. [Work partially supported by USAFRL.]

  4. The Cambridge Mindreading (CAM) Face-Voice Battery: Testing complex emotion recognition in adults with and without Asperger syndrome.

    PubMed

    Golan, Ofer; Baron-Cohen, Simon; Hill, Jacqueline

    2006-02-01

    Adults with Asperger Syndrome (AS) can recognise simple emotions and pass basic theory of mind tasks, but have difficulties recognising more complex emotions and mental states. This study describes a new battery of tasks, testing recognition of 20 complex emotions and mental states from faces and voices. The battery was given to males and females with AS and matched controls. Results showed the AS group performed worse than controls overall, on emotion recognition from faces and voices and on 12/20 specific emotions. Females recognised faces better than males regardless of diagnosis, and males with AS had more difficulties recognising emotions from faces than from voices. The implications of these results are discussed in relation to social functioning in AS.

  5. Applications of artificial intelligence to space station: General purpose intelligent sensor interface

    NASA Technical Reports Server (NTRS)

    Mckee, James W.

    1988-01-01

    This final report describes the accomplishments of the General Purpose Intelligent Sensor Interface task of the Applications of Artificial Intelligence to Space Station grant for the period from October 1, 1987 through September 30, 1988. Portions of the First Biannual Report not revised will not be included but only referenced. The goal is to develop an intelligent sensor system that will simplify the design and development of expert systems using sensors of the physical phenomena as a source of data. This research will concentrate on the integration of image processing sensors and voice processing sensors with a computer designed for expert system development. The result of this research will be the design and documentation of a system in which the user will not need to be an expert in such areas as image processing algorithms, local area networks, image processor hardware selection or interfacing, television camera selection, voice recognition hardware selection, or analog signal processing. The user will be able to access data from video or voice sensors through standard LISP statements without any need to know about the sensor hardware or software.

  6. Is it me? Self-recognition bias across sensory modalities and its relationship to autistic traits.

    PubMed

    Chakraborty, Anya; Chakrabarti, Bhismadev

    2015-01-01

    Atypical self-processing is an emerging theme in autism research, suggested by a lower self-reference effect in memory and atypical neural responses to visual self-representations. Most research on physical self-processing in autism uses visual stimuli. However, the self is a multimodal construct, and it is therefore essential to test self-recognition in other sensory modalities as well. Self-recognition in the auditory modality remains relatively unexplored and has not been tested in relation to autism and related traits. This study investigates self-recognition in the auditory and visual domains in the general population and tests whether it is associated with autistic traits. Thirty-nine neurotypical adults participated in a two-part study. In the first session, each participant's voice was recorded and face photographed, and these were then morphed with voices and faces from unfamiliar identities, respectively. In the second session, participants performed a 'self-identification' task, classifying each morph as a 'self' voice (or face) or an 'other' voice (or face). All participants also completed the Autism Spectrum Quotient (AQ). For each sensory modality, the slope of the self-recognition curve was used as the individual self-recognition metric. These two self-recognition metrics were tested for association with each other and with autistic traits. The 50% 'self' response point was reached at a higher percentage of self in the auditory domain than in the visual domain (t = 3.142; P < 0.01). No significant correlation was noted between self-recognition bias across sensory modalities (τ = -0.165, P = 0.204). A higher recognition bias for self-voice was observed in individuals higher in autistic traits (τ AQ = 0.301, P = 0.008). No such correlation was observed between recognition bias for self-face and autistic traits (τ AQ = -0.020, P = 0.438). Our data show that recognition bias for physical self-representation is not related across sensory modalities. Further, individuals with higher autistic traits were better able to discriminate self from other voices, but this relation was not observed with self-face. The narrow self-other overlap in the auditory domain seen in individuals with high autistic traits could arise from the enhanced perceptual processing of auditory stimuli often observed in individuals with autism.

  7. Task-Oriented, Naturally Elicited Speech (TONE) Database for the Force Requirements Expert System, Hawaii (FRESH)

    DTIC Science & Technology

    1988-09-01

    [OCR fragments from the report documentation page.] Subject terms: command and control; computational linguistics; expert systems; voice recognition; man-machine interface; U.S. Government. Abstract: the database simulates the characteristics of FRESH on a smaller scale. This study assisted NOSC in developing a voice-recognition, man-machine interface that could be used with TONE and upgraded at a later date.

  8. Gender recognition from vocal source

    NASA Astrophysics Data System (ADS)

    Sorokin, V. N.; Makarov, I. S.

    2008-07-01

    Efficiency of automatic recognition of male and female voices based on solving the inverse problem for glottis area dynamics and for waveform of the glottal airflow volume velocity pulse is studied. The inverse problem is regularized through the use of analytical models of the voice excitation pulse and of the dynamics of the glottis area, as well as the model of one-dimensional glottal airflow. Parameters of these models and spectral parameters of the volume velocity pulse are considered. The following parameters are found to be most promising: the instant of maximum glottis area, the maximum derivative of the area, the slope of the spectrum of the glottal airflow volume velocity pulse, the amplitude ratios of harmonics of this spectrum, and the pitch. On the plane of the first two main components in the space of these parameters, an almost twofold decrease in the classification error relative to that for the pitch alone is attained. The male voice recognition probability is found to be 94.7%, and the female voice recognition probability is 95.9%.
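
    A hedged sketch of the classification stage as described (parameters projected onto the first two principal components, then a simple classifier). The feature columns below are placeholders for the acoustic parameters listed above, the data are synthetic, and the logistic-regression classifier is an assumption, not the paper's method.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Rows: voices; columns (placeholders): instant of max glottis area,
    # max derivative of the area, spectral slope, harmonic amplitude
    # ratio, pitch
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = rng.integers(0, 2, size=200)        # 0 = male, 1 = female (toy labels)

    clf = make_pipeline(PCA(n_components=2), LogisticRegression())
    clf.fit(X, y)
    print(clf.score(X, y))                  # training accuracy on toy data
    ```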

  9. Recognition of voice commands using adaptation of foreign language speech recognizer via selection of phonetic transcriptions

    NASA Astrophysics Data System (ADS)

    Maskeliunas, Rytis; Rudzionis, Vytautas

    2011-06-01

    In recent years various commercial speech recognizers have become available. These recognizers make it easy and quick to develop applications incorporating various speech recognition techniques. All of these commercial recognizers are typically targeted at widely spoken languages with large market potential; however, it may be possible to adapt them for use in environments where less widely spoken languages are used. Since most commercial recognition engines are closed systems, the single avenue for adaptation is to devise ways of selecting proper phonetic transcriptions between the two languages. This paper deals with methods for finding phonetic transcriptions that allow Lithuanian voice commands to be recognized using English speech engines. The experimental evaluation showed that it is possible to find phonetic transcriptions that enable the recognition of Lithuanian voice commands with an accuracy of over 90%.
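
    A hedged sketch of the selection idea, not the authors' algorithm: each Lithuanian command gets several candidate English-phoneme transcriptions, and the variant the (closed) English engine recognizes most reliably on held-out recordings is kept. The candidate entries are illustrative, and `recognize` is a hypothetical stand-in for whatever engine API is used.

    ```python
    from collections import defaultdict

    # Candidate English-oriented transcriptions for Lithuanian commands
    # (illustrative entries, not from the paper)
    CANDIDATES = {
        "ijungti":  ["ee-yoong-tee", "e-jung-ty", "eeyoongtee"],
        "isjungti": ["ees-yoong-tee", "is-jung-ty"],
    }

    def recognize(audio_clip: bytes, grammar: list[str]) -> str:
        """Hypothetical wrapper around a closed commercial English
        recognizer constrained to the given word list."""
        raise NotImplementedError

    def best_transcriptions(recordings: dict[str, list[bytes]]) -> dict[str, str]:
        """Keep, per command, the candidate recognized most often."""
        best = {}
        for command, clips in recordings.items():
            hits = defaultdict(int)
            grammar = CANDIDATES[command]
            for clip in clips:
                hits[recognize(clip, grammar)] += 1
            best[command] = max(grammar, key=lambda c: hits[c])
        return best
    ```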

  10. Audiovisual speech facilitates voice learning.

    PubMed

    Sheffert, Sonya M; Olson, Elizabeth

    2004-02-01

    In this research, we investigated the effects of voice and face information on the perceptual learning of talkers and on long-term memory for spoken words. In the first phase, listeners were trained over several days to identify voices from words presented auditorily or audiovisually. The training data showed that visual information about speakers enhanced voice learning, revealing cross-modal connections in talker processing akin to those observed in speech processing. In the second phase, the listeners completed an auditory or audiovisual word recognition memory test in which equal numbers of words were spoken by familiar and unfamiliar talkers. The data showed that words presented by familiar talkers were more likely to be retrieved from episodic memory, regardless of modality. Together, these findings provide new information about the representational code underlying familiar talker recognition and the role of stimulus familiarity in episodic word recognition.

  11. Generation of surgical pathology report using a 5,000-word speech recognizer.

    PubMed

    Tischler, A S; Martin, M R

    1989-10-01

    Pressures to decrease both turnaround time and operating costs simultaneously have placed conflicting demands on traditional forms of medical transcription. The new technology of voice recognition extends the promise of enabling the pathologist or other medical professional to dictate a correct report and have it printed and/or transmitted to a database immediately. The usefulness of voice recognition systems depends on several factors, including ease of use, reliability, speed, and accuracy. These in turn depend on the general underlying design of the systems and inclusion in the systems of a specific knowledge base appropriate for each application. Development of a good knowledge base requires close collaboration between a domain expert and a knowledge engineer with expertise in voice recognition. The authors have recently completed a knowledge base for surgical pathology using the Kurzweil VoiceReport 5,000-word system.

  12. Understanding the mechanisms of familiar voice-identity recognition in the human brain.

    PubMed

    Maguinness, Corrina; Roswandowitz, Claudia; von Kriegstein, Katharina

    2018-03-31

    Humans have a remarkable skill for voice-identity recognition: most of us can remember many voices that surround us as 'unique'. In this review, we explore the computational and neural mechanisms which may support our ability to represent and recognise a unique voice-identity. We examine the functional architecture of voice-sensitive regions in the superior temporal gyrus/sulcus, and bring together findings on how these regions may interact with each other, and additional face-sensitive regions, to support voice-identity processing. We also contrast findings from studies on neurotypicals and clinical populations which have examined the processing of familiar and unfamiliar voices. Taken together, the findings suggest that representations of familiar and unfamiliar voices might dissociate in the human brain. Such an observation does not fit well with current models for voice-identity processing, which by-and-large assume a common sequential analysis of the incoming voice signal, regardless of voice familiarity. We provide a revised audio-visual integrative model of voice-identity processing which brings together traditional and prototype models of identity processing. This revised model includes a mechanism of how voice-identity representations are established and provides a novel framework for understanding and examining the potential differences in familiar and unfamiliar voice processing in the human brain. Copyright © 2018 Elsevier Ltd. All rights reserved.

  13. Definition of problems of persons in sheltered care environments

    NASA Technical Reports Server (NTRS)

    Fetzner, W. N.

    1979-01-01

    Innovations in health care using aerospace technologies are described. Voice synthesizer and voice recognition technologies were used in developing voice controlled wheel chairs and optacons. Telephone interface modules are also described.

  14. Speech recognition systems on the Cell Broadband Engine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Y; Jones, H; Vaidya, S

    In this paper we describe our design, implementation, and first results of a prototype connected-phoneme-based speech recognition system on the Cell Broadband Engine™ (Cell/B.E.). Automatic speech recognition decodes speech samples into plain text (other representations are possible) and must process samples at real-time rates. Fortunately, the computational tasks involved in this pipeline are highly data-parallel and can receive significant hardware acceleration from vector-streaming architectures such as the Cell/B.E. Identifying and exploiting these parallelism opportunities is challenging, but also critical to improving system performance. We observed, from our initial performance timings, that a single Cell/B.E. processor can recognize speech from thousands of simultaneous voice channels in real time--a channel density that is orders-of-magnitude greater than the capacity of existing software speech recognizers based on CPUs (central processing units). This result emphasizes the potential for Cell/B.E.-based speech recognition and will likely lead to the future development of production speech systems using Cell/B.E. clusters.

  15. The cognitive neuroscience of person identification.

    PubMed

    Biederman, Irving; Shilowich, Bryan E; Herald, Sarah B; Margalit, Eshed; Maarek, Rafael; Meschke, Emily X; Hacker, Catrina M

    2018-02-14

    We compare and contrast five differences between person identification by voice and face. 1. There is little or no cost when a familiar face is to be recognized from an unrestricted set of possible faces, even at Rapid Serial Visual Presentation (RSVP) rates, but the accuracy of familiar voice recognition declines precipitously when the set of possible speakers is increased from one to a mere handful. 2. Whereas deficits in face recognition are typically perceptual in origin, those with normal perception of voices can manifest severe deficits in their identification. 3. Congenital prosopagnosics (CPros) and congenital phonagnosics (CPhon) are generally unable to imagine familiar faces and voices, respectively. Only in CPros, however, is this deficit a manifestation of a general inability to form visual images of any kind. CPhons report no deficit in imaging non-voice sounds. 4. The prevalence of CPhons of 3.2% is somewhat higher than the reported prevalence of approximately 2.0% for CPros in the population. There is evidence that CPhon represents a distinct condition statistically and not just normal variation. 5. Face and voice recognition proficiency are uncorrelated rather than reflecting limitations of a general capacity for person individuation. Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. The Glasgow Voice Memory Test: Assessing the ability to memorize and recognize unfamiliar voices.

    PubMed

    Aglieri, Virginia; Watson, Rebecca; Pernet, Cyril; Latinus, Marianne; Garrido, Lúcia; Belin, Pascal

    2017-02-01

    One thousand one hundred and twenty subjects as well as a developmental phonagnosic subject (KH) along with age-matched controls performed the Glasgow Voice Memory Test, which assesses the ability to encode and immediately recognize, through an old/new judgment, both unfamiliar voices (delivered as vowels, making language requirements minimal) and bell sounds. The inclusion of non-vocal stimuli allows the detection of significant dissociations between the two categories (vocal vs. non-vocal stimuli). The distributions of accuracy and sensitivity scores (d') reflected a wide range of individual differences in voice recognition performance in the population. As expected, KH showed a dissociation between the recognition of voices and bell sounds, her performance being significantly poorer than matched controls for voices but not for bells. By providing normative data of a large sample and by testing a developmental phonagnosic subject, we demonstrated that the Glasgow Voice Memory Test, available online and accessible from all over the world, can be a valid screening tool (~5 min) for a preliminary detection of potential cases of phonagnosia and of "super recognizers" for voices.
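
    Since the test reports sensitivity as d': in an old/new design, d' is computed from the hit rate H and false-alarm rate FA as d' = z(H) - z(FA), where z is the inverse of the standard normal CDF. A small worked Python example follows; the 1/(2N) correction for rates of 0 or 1 is one common convention, not necessarily the one used by the test's authors.

    ```python
    from scipy.stats import norm

    def d_prime(hits: int, misses: int, false_alarms: int,
                correct_rejections: int) -> float:
        """Old/new sensitivity: d' = z(hit rate) - z(false-alarm rate),
        with a 1/(2N) correction to avoid infinite z-scores."""
        n_old = hits + misses
        n_new = false_alarms + correct_rejections
        h = min(max(hits / n_old, 1 / (2 * n_old)), 1 - 1 / (2 * n_old))
        fa = min(max(false_alarms / n_new, 1 / (2 * n_new)),
                 1 - 1 / (2 * n_new))
        return norm.ppf(h) - norm.ppf(fa)

    # e.g. 20 'old' and 20 'new' voices: 16 hits, 4 misses, 6 false alarms
    print(d_prime(16, 4, 6, 14))  # ~ 1.37
    ```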

  17. Implicit multisensory associations influence voice recognition.

    PubMed

    von Kriegstein, Katharina; Giraud, Anne-Lise

    2006-10-01

    Natural objects provide partially redundant information to the brain through different sensory modalities. For example, voices and faces both give information about the speech content, age, and gender of a person. Thanks to this redundancy, multimodal recognition is fast, robust, and automatic. In unimodal perception, however, only part of the information about an object is available. Here, we addressed whether, even under conditions of unimodal sensory input, crossmodal neural circuits that have been shaped by previous associative learning become activated and underpin a performance benefit. We measured brain activity with functional magnetic resonance imaging before, while, and after participants learned to associate either sensory redundant stimuli, i.e. voices and faces, or arbitrary multimodal combinations, i.e. voices and written names, ring tones, and cell phones or brand names of these cell phones. After learning, participants were better at recognizing unimodal auditory voices that had been paired with faces than those paired with written names, and association of voices with faces resulted in an increased functional coupling between voice and face areas. No such effects were observed for ring tones that had been paired with cell phones or names. These findings demonstrate that brief exposure to ecologically valid and sensory redundant stimulus pairs, such as voices and faces, induces specific multisensory associations. Consistent with predictive coding theories, associative representations become thereafter available for unimodal perception and facilitate object recognition. These data suggest that for natural objects effective predictive signals can be generated across sensory systems and proceed by optimization of functional connectivity between specialized cortical sensory modules.

  18. Artificially intelligent recognition of Arabic speaker using voice print-based local features

    NASA Astrophysics Data System (ADS)

    Mahmood, Awais; Alsulaiman, Mansour; Muhammad, Ghulam; Akram, Sheeraz

    2016-11-01

    Local features for any pattern recognition system are based on the information extracted locally. In this paper, a local feature extraction technique was developed. This feature was extracted in the time-frequency plane by taking the moving average along the diagonal directions of the time-frequency plane. This feature captured the time-frequency events producing a unique pattern for each speaker that can be viewed as a voice print of the speaker. Hence, we referred to this technique as a voice print-based local feature. The proposed feature was compared to other features, including the mel-frequency cepstral coefficient (MFCC), for speaker recognition using two different databases. One of the databases used in the comparison is a subset of an LDC database that consisted of two short sentences uttered by 182 speakers. The proposed feature attained a 98.35% recognition rate compared to 96.7% for MFCC using the LDC subset.
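
    The abstract leaves the exact windowing unspecified; the sketch below shows one plausible reading of a diagonal moving average over a log spectrogram. The window length and spectrogram settings are illustrative guesses, not the authors' parameters.

    ```python
    # A minimal sketch of the "voice print" idea: a moving average taken
    # along the diagonals of a time-frequency representation.
    import numpy as np
    from scipy.signal import spectrogram

    def diagonal_moving_average(S, win=5):
        """Average each time-frequency bin with its neighbours along the
        main diagonal direction (one step in time and frequency at once)."""
        F, T = S.shape
        out = np.zeros_like(S)
        for k in range(-(win // 2), win // 2 + 1):
            # shift simultaneously in frequency and time, zero-padding edges
            shifted = np.zeros_like(S)
            f0, f1 = max(k, 0), F + min(k, 0)
            t0, t1 = max(k, 0), T + min(k, 0)
            shifted[f0:f1, t0:t1] = S[f0 - k:f1 - k, t0 - k:t1 - k]
            out += shifted
        return out / win

    fs = 16000
    x = np.random.randn(fs)            # stand-in for one second of speech
    _, _, S = spectrogram(x, fs=fs, nperseg=256)
    voice_print = diagonal_moving_average(np.log(S + 1e-10))
    print(voice_print.shape)
    ```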

  19. Emotional Recognition in Autism Spectrum Conditions from Voices and Faces

    ERIC Educational Resources Information Center

    Stewart, Mary E.; McAdam, Clair; Ota, Mitsuhiko; Peppe, Sue; Cleland, Joanne

    2013-01-01

    The present study reports on a new vocal emotion recognition task and assesses whether people with autism spectrum conditions (ASC) perform differently from typically developed individuals on tests of emotional identification from both the face and the voice. The new test of vocal emotion contained trials in which the vocal emotion of the sentence…

  20. Cultural In-Group Advantage: Emotion Recognition in African American and European American Faces and Voices

    ERIC Educational Resources Information Center

    Wickline, Virginia B.; Bailey, Wendy; Nowicki, Stephen

    2009-01-01

    The authors explored whether there were in-group advantages in emotion recognition of faces and voices by culture or geographic region. Participants were 72 African American students (33 men, 39 women), 102 European American students (30 men, 72 women), 30 African international students (16 men, 14 women), and 30 European international students…

  1. Keys to the Adoption and Use of Voice Recognition Technology in Organizations.

    ERIC Educational Resources Information Center

    Goette, Tanya

    2000-01-01

    Presents results from a field study of individuals with disabilities who used voice recognition technology (VRT). Results indicated that task-technology fit, training, the environment, and disability limitations were the differentiating items, and that using VRT for a trial period may be the major factor in successful adoption of the technology.…

  2. Practical applications of interactive voice technologies: Some accomplishments and prospects

    NASA Technical Reports Server (NTRS)

    Grady, Michael W.; Hicklin, M. B.; Porter, J. E.

    1977-01-01

    A technology assessment of the application of computers and electronics to complex systems is presented. Three existing systems which utilize voice technology (speech recognition and speech generation) are described. Future directions in voice technology are also described.

  3. Educational Technology and Student Voice: Examining Teacher Candidates' Perceptions

    ERIC Educational Resources Information Center

    Byker, Erik Jon; Putman, S. Michael; Handler, Laura; Polly, Drew

    2017-01-01

    Student Voice is a term that honors the participatory roles that students have when they enter learning spaces like classrooms. Student Voice is the recognition of students' choice, creativity, and freedom. Seminal educationists--like Dewey and Montessori--centered the purposes of education in the flourishing and valuing of Student Voice. This…

  4. Age- and gender-related variations of emotion recognition in pseudowords and faces.

    PubMed

    Demenescu, Liliana R; Mathiak, Krystyna A; Mathiak, Klaus

    2014-01-01

    BACKGROUND/STUDY CONTEXT: The ability to interpret emotionally salient stimuli is an important skill for successful social functioning at any age. The objective of the present study was to disentangle age and gender effects on emotion recognition ability in voices and faces. Three age groups of participants (young, age range: 18-35 years; middle-aged, age range: 36-55 years; and older, age range: 56-75 years) identified basic emotions presented in voices and faces in a forced-choice paradigm. Five emotions (angry, fearful, sad, disgusted, and happy) and a nonemotional category (neutral) were shown as encoded in color photographs of facial expressions and pseudowords spoken in affective prosody. Overall, older participants had a lower accuracy rate in categorizing emotions than young and middle-aged participants. Females performed better than males in recognizing emotions from voices, and this gender difference emerged in middle-aged and older participants. The performance of emotion recognition in faces was significantly correlated with the performance in voices. The current study provides further evidence for a general age and gender effect on emotion recognition; the advantage of females seems to be age- and stimulus modality-dependent.

  5. Implicit Multisensory Associations Influence Voice Recognition

    PubMed Central

    von Kriegstein, Katharina; Giraud, Anne-Lise

    2006-01-01

    Natural objects provide partially redundant information to the brain through different sensory modalities. For example, voices and faces both give information about the speech content, age, and gender of a person. Thanks to this redundancy, multimodal recognition is fast, robust, and automatic. In unimodal perception, however, only part of the information about an object is available. Here, we addressed whether, even under conditions of unimodal sensory input, crossmodal neural circuits that have been shaped by previous associative learning become activated and underpin a performance benefit. We measured brain activity with functional magnetic resonance imaging before, while, and after participants learned to associate either sensory redundant stimuli, i.e. voices and faces, or arbitrary multimodal combinations, i.e. voices and written names, ring tones, and cell phones or brand names of these cell phones. After learning, participants were better at recognizing unimodal auditory voices that had been paired with faces than those paired with written names, and association of voices with faces resulted in an increased functional coupling between voice and face areas. No such effects were observed for ring tones that had been paired with cell phones or names. These findings demonstrate that brief exposure to ecologically valid and sensory redundant stimulus pairs, such as voices and faces, induces specific multisensory associations. Consistent with predictive coding theories, associative representations become thereafter available for unimodal perception and facilitate object recognition. These data suggest that for natural objects effective predictive signals can be generated across sensory systems and proceed by optimization of functional connectivity between specialized cortical sensory modules. PMID:17002519

  6. Voice parameters and videonasolaryngoscopy in children with vocal nodules: a longitudinal study, before and after voice therapy.

    PubMed

    Valadez, Victor; Ysunza, Antonio; Ocharan-Hernandez, Esther; Garrido-Bustamante, Norma; Sanchez-Valerio, Araceli; Pamplona, Ma C

    2012-09-01

    Vocal Nodules (VN) are a functional voice disorder associated with voice misuse and abuse in children. There are few reports addressing vocal parameters in children with VN, especially after a period of vocal rehabilitation. The purpose of this study is to describe measurements of vocal parameters including Fundamental Frequency (FF), Shimmer (S), and Jitter (J), videonasolaryngoscopy examination and clinical perceptual assessment, before and after voice therapy in children with VN. Voice therapy was provided using visual support through Speech-Viewer software. Twenty patients with VN were studied. An acoustical analysis of voice was performed and compared with data from control-group subjects matched by age and gender. Also, clinical perceptual assessment of voice and videonasolaryngoscopy were performed on all patients with VN. After a period of voice therapy, provided with visual support using Speech Viewer-III (SV-III-IBM) software, new acoustical analyses, perceptual assessments and videonasolaryngoscopies were performed. Before the onset of voice therapy, there was a significant difference (p<0.05) in mean FF, S, and J between the patients with VN and subjects from the control group. After the voice therapy period, a significant improvement (p<0.05) was found in all acoustic voice parameters. Moreover, perceptual voice analysis demonstrated improvement in all cases. Finally, videonasolaryngoscopy demonstrated that vocal nodules were no longer discernible on the vocal folds in any of the cases. SV-III software seems to be a safe and reliable method for providing voice therapy in children with VN. Acoustic voice parameters, perceptual data and videonasolaryngoscopy were significantly improved after the speech therapy period was completed. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
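
    For readers unfamiliar with the acoustic parameters mentioned, local jitter and shimmer are conventionally defined as the mean absolute difference between consecutive cycle periods (or peak amplitudes) divided by the mean. A minimal sketch with hypothetical per-cycle values, not output from the study's software:

    ```python
    # Sketch of the standard definitions of local jitter and shimmer,
    # computed from hypothetical per-cycle period and amplitude estimates.
    import numpy as np

    def local_jitter(periods):
        """Mean absolute difference of consecutive periods / mean period."""
        periods = np.asarray(periods, dtype=float)
        return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

    def local_shimmer(amplitudes):
        """Mean absolute difference of consecutive peak amplitudes / mean amplitude."""
        amplitudes = np.asarray(amplitudes, dtype=float)
        return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

    periods = [0.0041, 0.0040, 0.0042, 0.0041, 0.0043]  # seconds per glottal cycle
    amps = [0.81, 0.78, 0.83, 0.80, 0.79]               # arbitrary units
    print(f"jitter  = {local_jitter(periods):.2%}")
    print(f"shimmer = {local_shimmer(amps):.2%}")
    ```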

  7. Voice Based City Panic Button System

    NASA Astrophysics Data System (ADS)

    Febriansyah; Zainuddin, Zahir; Bachtiar Nappu, M.

    2018-03-01

    The voice-activated panic button application was developed to provide faster early notification of hazardous conditions in the community to the nearest police officer, using speech as the trigger; current applications still rely on touch combinations on the screen and on orders coordinated from a control center, so early notification takes longer. The method used in this research was voice recognition for detecting the user's voice and the haversine formula for computing the closest distance between the user and the police. The application is also equipped with automatic SMS, which sends notifications to the victim's relatives, and is integrated with the Google Maps application (GMaps) to map the victim's location. The results show that voice registration on the application reaches 100%, incident detection using speech recognition while the application is running averages 94.67%, and automatic SMS delivery to the victim's relatives reaches 100%.
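
    The haversine formula mentioned above computes great-circle distance from latitude/longitude pairs. A minimal sketch, with invented coordinates for the user and police units:

    ```python
    # Sketch of the haversine distance used to find the police unit
    # nearest to the caller; coordinates are illustrative placeholders.
    from math import radians, sin, cos, asin, sqrt

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance between two (lat, lon) points in km."""
        R = 6371.0  # mean Earth radius in kilometres
        dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
        a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
        return 2 * R * asin(sqrt(a))

    officers = {"unit_1": (-5.135, 119.423), "unit_2": (-5.160, 119.436)}
    user = (-5.147, 119.432)
    nearest = min(officers, key=lambda k: haversine_km(*user, *officers[k]))
    print(nearest)
    ```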

  8. Development of Efficient Authoring Software for e-Learning Contents

    NASA Astrophysics Data System (ADS)

    Kozono, Kazutake; Teramoto, Akemi; Akiyama, Hidenori

    Content creation has become an important problem in e-Learning systems. E-Learning content should include figures and voice media to achieve a strong educational effect; however, the use of figures and voice considerably complicates the operation of authoring software. New authoring software that can build e-Learning content efficiently has been developed to solve this problem. This paper reports the development results of this authoring software.

  9. Project planning, training, measurement and sustainment: the successful implementation of voice recognition.

    PubMed

    Antiles, S; Couris, J; Schweitzer, A; Rosenthal, D; Da Silva, R Q

    2000-01-01

    Computerized voice recognition systems (VR) can reduce costs and enhance service. The capital outlay required for conversion to a VR system is significant; therefore, it is incumbent on radiology departments to provide cost and service justifications to administrators. Massachusetts General Hospital (MGH) in Boston implemented VR over a two-year period and achieved annual savings of $530,000 and a 50% decrease in report turnaround time. Those accomplishments required solid planning and implementation strategies, training and sustainment programs. This article walks through the process, step by step, in the hope of providing a tool set for future implementations. Because VR has dramatic implications for workflow, a solid operational plan is needed when assessing vendors and planning for implementation. The goals for implementation should be to minimize operational disruptions and capitalize on efficiencies of the technology. Senior leadership--the department chair or vice-chair--must select the goals to be accomplished and oversee, manage and direct the VR initiative. The importance of this point cannot be overstated, since implementation will require behavior changes from radiologists and others who may not perceive any personal benefits. Training is the pivotal factor affecting the success of voice recognition, and practice is the only way for radiologists to enhance their skills. Through practice, radiologists will discover shortcuts, and their speed and comfort will improve. Measurement and data analysis are critical to changing and improving the voice recognition application and are vital to decision-making. Some of the issues about which valuable data can be collected are technical and educational problems, VR penetration, report turnaround time and annual cost savings. Sustained effort is indispensable to the maintenance of voice recognition. Finally, all efforts made and gains achieved may prove to be futile without ongoing sustainment of the system through retraining, education and technical support.

  10. Selective attention in perceptual adjustments to voice.

    PubMed

    Mullennix, J W; Howe, J N

    1999-10-01

    The effects of perceptual adjustments to voice information on the perception of isolated spoken words were examined. In two experiments, spoken target words were preceded or followed within a trial by a neutral word spoken in the same voice or in a different voice as the target. Over-all, words were reproduced more accurately on trials on which the voice of the neutral word matched the voice of the spoken target word, suggesting that perceptual adjustments to voice interfere with word processing. This result, however, was mediated by selective attention to voice. The results provide further evidence of a close processing relationship between perceptual adjustments to voice and spoken word recognition.

  11. Whispering - The hidden side of auditory communication.

    PubMed

    Frühholz, Sascha; Trost, Wiebke; Grandjean, Didier

    2016-11-15

    Whispering is a unique expression mode that is specific to auditory communication. Individuals switch their vocalization mode to whispering especially when affected by inner emotions in certain social contexts, such as in intimate relationships or intimidating social interactions. Although this context-dependent whispering is adaptive, whispered voices are acoustically far less rich than phonated voices and thus impose higher hearing and neural auditory decoding demands for recognizing their socio-affective value by listeners. The neural dynamics underlying this recognition especially from whispered voices are largely unknown. Here we show that whispered voices in humans are considerably impoverished as quantified by an entropy measure of spectral acoustic information, and this missing information needs large-scale neural compensation in terms of auditory and cognitive processing. Notably, recognizing the socio-affective information from voices was slightly more difficult from whispered voices, probably based on missing tonal information. While phonated voices elicited extended activity in auditory regions for decoding of relevant tonal and time information and the valence of voices, whispered voices elicited activity in a complex auditory-frontal brain network. Our data suggest that a large-scale multidirectional brain network compensates for the impoverished sound quality of socially meaningful environmental signals to support their accurate recognition and valence attribution. Copyright © 2016 Elsevier Inc. All rights reserved.
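
    The entropy measure referenced above is not specified in detail; the sketch below shows one common variant, the Shannon entropy of the normalized power spectral density, which indeed comes out lower for tonal (phonated-like) signals than for noise-like (whispered-like) ones. The signal choices and analysis settings are illustrative assumptions.

    ```python
    # Sketch of a spectral entropy measure of the kind used to quantify
    # how spectrally impoverished whispered voices are.
    import numpy as np
    from scipy.signal import welch

    def spectral_entropy(x, fs):
        """Shannon entropy of the normalized power spectral density."""
        _, psd = welch(x, fs=fs, nperseg=512)
        p = psd / psd.sum()
        return -np.sum(p * np.log2(p + 1e-12))

    fs = 16000
    t = np.arange(fs) / fs
    phonated = np.sin(2 * np.pi * 150 * t)   # tonal stand-in: low entropy
    whispered = np.random.randn(fs)          # noise-like stand-in: high entropy
    print(spectral_entropy(phonated, fs), spectral_entropy(whispered, fs))
    ```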

  12. Crossmodal plasticity in the fusiform gyrus of late blind individuals during voice recognition.

    PubMed

    Hölig, Cordula; Föcker, Julia; Best, Anna; Röder, Brigitte; Büchel, Christian

    2014-12-01

    Blind individuals are trained in identifying other people through voices. In congenitally blind adults the anterior fusiform gyrus has been shown to be active during voice recognition. Such crossmodal changes have been associated with a superiority of blind adults in voice perception. The key question of the present functional magnetic resonance imaging (fMRI) study was whether visual deprivation that occurs in adulthood is followed by similar adaptive changes of the voice identification system. Late blind individuals and matched sighted participants were tested in a priming paradigm, in which two voice stimuli were subsequently presented. The prime (S1) and the target (S2) were either from the same speaker (person-congruent voices) or from two different speakers (person-incongruent voices). Participants had to classify the S2 as either coming from an old or a young person. Only in late blind but not in matched sighted controls, the activation in the anterior fusiform gyrus was modulated by voice identity: late blind volunteers showed an increase of the BOLD signal in response to person-incongruent compared with person-congruent trials. These results suggest that the fusiform gyrus adapts to input of a new modality even in the mature brain and thus demonstrate an adult type of crossmodal plasticity. Copyright © 2014 Elsevier Inc. All rights reserved.

  13. Vocal Parameters and Self-Perception in Individuals With Adductor Spasmodic Dysphonia.

    PubMed

    Rojas, Gleidy Vannesa E; Ricz, Hilton; Tumas, Vitor; Rodrigues, Guilherme R; Toscano, Patrícia; Aguiar-Ricz, Lílian

    2017-05-01

    The study aimed to compare and correlate perceptual-auditory analysis of vocal parameters and self-perception in individuals with adductor spasmodic dysphonia before and after the application of botulinum toxin. This is a prospective cohort study. Sixteen individuals with a diagnosis of adductor spasmodic dysphonia were submitted to the application of botulinum toxin in the thyroarytenoid muscle, to the recording of a voice signal, and to the Voice Handicap Index (VHI) questionnaire before the application and at two time points after application. Two judges performed a perceptual-auditory analysis of eight vocal parameters with the aid of the Praat software for the visualization of narrow-band spectrography, pitch, and intensity contour. Comparison of the vocal parameters before toxin application and on the first return revealed a reduction of oscillation intensity (P = 0.002), voice breaks (P = 0.002), and vocal tremor (P = 0.002). The same parameters increased on the second return. The degree of severity, strained-strangled voice, roughness, breathiness, and asthenia was unchanged. The total score and the emotional domain score of the VHI were reduced on the first return. There was a moderate correlation between the degree of voice severity and the total VHI score before application and on the second return, and a weak correlation on the first return. Perceptual-auditory analysis and self-perception proved to be efficient in the recognition of vocal changes and of the vocal impact on individuals with adductor spasmodic dysphonia under treatment with botulinum toxin, permitting the quantitation of changes over time. Copyright © 2017. Published by Elsevier Inc.

  14. Cross-cultural emotional prosody recognition: evidence from Chinese and British listeners.

    PubMed

    Paulmann, Silke; Uskul, Ayse K

    2014-01-01

    This cross-cultural study of emotional tone-of-voice recognition tests the in-group advantage hypothesis (Elfenbein & Ambady, 2002) employing a quasi-balanced design. Individuals of Chinese and British background were asked to recognise pseudosentences produced by Chinese and British native speakers, displaying one of seven emotions (anger, disgust, fear, happy, neutral tone of voice, sad, and surprise). Findings reveal that emotional displays were recognised at rates higher than predicted by chance; however, members of each cultural group were more accurate in recognising the displays communicated by a member of their own cultural group than a member of the other cultural group. Moreover, the evaluation of error matrices indicates that both culture groups relied on similar mechanisms when recognising emotional displays from the voice. Overall, the study reveals evidence for both universal and culture-specific principles in vocal emotion recognition.

  15. Processing voiceless vowels in Japanese: Effects of language-specific phonological knowledge

    NASA Astrophysics Data System (ADS)

    Ogasawara, Naomi

    2005-04-01

    There has been little research on processing allophonic variation in the field of psycholinguistics. This study focuses on processing the voiced/voiceless allophonic alternation of high vowels in Japanese. Three perception experiments were conducted to explore how listeners parse out vowels with the voicing alternation from other segments in the speech stream and how the different voicing statuses of the vowel affect listeners' word recognition process. The results from the three experiments show that listeners use phonological knowledge of their native language for phoneme processing and for word recognition. However, interactions of the phonological and acoustic effects are observed to be different in each process. The facilitatory phonological effect and the inhibitory acoustic effect cancel out one another in phoneme processing; while in word recognition, the facilitatory phonological effect overrides the inhibitory acoustic effect.

  16. Integrating voice evaluation: correlation between acoustic and audio-perceptual measures.

    PubMed

    Vaz Freitas, Susana; Melo Pestana, Pedro; Almeida, Vítor; Ferreira, Aníbal

    2015-05-01

    This article aims to establish correlations between acoustic and audio-perceptual measures using the GRBAS scale with respect to four different voice analysis software programs. Exploratory, transversal study. A total of 90 voice records were collected and analyzed with the Dr. Speech (Tiger Electronics, Seattle, WA), Multidimensional Voice Program (Kay Elemetrics, NJ, USA), PRAAT (University of Amsterdam, The Netherlands), and Voice Studio (Seegnal, Oporto, Portugal) software programs. The acoustic measures were correlated to the audio-perceptual parameters of the GRBAS and rated by 10 experts. The predictive value of the acoustic measurements related to the audio-perceptual parameters exhibited magnitudes ranging from weak (adjusted R² = 0.17) to moderate (adjusted R² = 0.71). The parameter exhibiting the highest correlation magnitude is B (Breathiness), whereas the weakest correlation magnitudes were found for A (Asthenia) and S (Strain). The acoustic measures with stronger predictive values were local Shimmer, harmonics-to-noise ratio, APQ5 shimmer, and PPQ5 jitter, with different magnitudes for each one of the studied software programs. Some acoustic measures are pointed to as significant predictors of GRBAS parameters, but they differ among software programs. B (Breathiness) was the parameter exhibiting the highest correlation magnitude. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
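
    As a toy illustration of relating an acoustic measure to a GRBAS rating, the sketch below uses a rank correlation in place of the study's regression-based predictive values; all numbers are invented placeholders, not the study's data.

    ```python
    # Sketch of an acoustic-to-perceptual correlation: relating a shimmer
    # measure to 0-3 GRBAS Breathiness ratings. Values are invented.
    import numpy as np
    from scipy.stats import spearmanr

    shimmer_local = np.array([2.1, 3.8, 5.5, 4.9, 7.2, 8.8, 3.1, 6.4])  # %
    breathiness   = np.array([0,   1,   2,   1,   2,   3,   1,   2  ])  # GRBAS B

    rho, p = spearmanr(shimmer_local, breathiness)
    print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
    ```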

  17. Separation of Singing Voice from Music Accompaniment for Monaural Recordings

    DTIC Science & Technology

    Li, Yipeng

    2005-09-01

    Separating singing voice from music accompaniment is very useful in many applications, such as lyrics recognition and alignment, singer identification, and music information retrieval. Although speech separation has been extensively studied for decades, singing voice separation has been little studied.

  18. Functional connectivity between face-movement and speech-intelligibility areas during auditory-only speech perception.

    PubMed

    Schall, Sonja; von Kriegstein, Katharina

    2014-01-01

    It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers' voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker's face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas.

  19. Improving Speaker Recognition by Biometric Voice Deconstruction

    PubMed Central

    Mazaira-Fernandez, Luis Miguel; Álvarez-Marquina, Agustín; Gómez-Vilda, Pedro

    2015-01-01

    Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have been forcedly replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved during last years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers combined with the use of a set of features derived from the components, resulting from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description about the main hypothesis and the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a highly controlled acoustic condition database, and on a mobile phone network recorded under non-controlled acoustic conditions. PMID:26442245

  20. Improving Speaker Recognition by Biometric Voice Deconstruction.

    PubMed

    Mazaira-Fernandez, Luis Miguel; Álvarez-Marquina, Agustín; Gómez-Vilda, Pedro

    2015-01-01

    Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have been forcedly replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved during last years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers combined with the use of a set of features derived from the components, resulting from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description about the main hypothesis and the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a highly controlled acoustic condition database, and on a mobile phone network recorded under non-controlled acoustic conditions.

  1. Postlingual adult performance in noise with HiRes 120 and ClearVoice Low, Medium, and High.

    PubMed

    Holden, Laura K; Brenner, Christine; Reeder, Ruth M; Firszt, Jill B

    2013-11-01

    The study's objectives were to evaluate speech recognition in multiple listening conditions using several noise types with HiRes 120 and ClearVoice (Low, Medium, High) and to determine which ClearVoice program was most beneficial for everyday use. Fifteen postlingual adults attended four sessions; speech recognition was assessed at sessions 1 and 3 with HiRes 120 and at sessions 2 and 4 with all ClearVoice programs. Test measures included sentences presented in restaurant noise (R-SPACE), in speech-spectrum noise, in four- and eight-talker babble, and connected discourse presented in 12-talker babble. Participants completed a questionnaire comparing ClearVoice programs. Significant group differences in performance between HiRes 120 and ClearVoice were present only in the R-SPACE; performance was better with ClearVoice High than HiRes 120. Among ClearVoice programs, no significant group differences were present for any measure. Individual results revealed most participants performed better in the R-SPACE with ClearVoice than HiRes 120. For other measures, significant individual differences between HiRes 120 and ClearVoice were not prevalent. Individual results among ClearVoice programs differed and overall preferences varied. Questionnaire data indicated increased understanding with High and Medium in certain environments. R-SPACE and questionnaire results indicated an advantage for ClearVoice High and Medium. Individual test and preference data showed mixed results between ClearVoice programs making global recommendations difficult; however, results suggest providing ClearVoice High and Medium and HiRes 120 as processor options for adults willing to change settings. For adults unwilling or unable to change settings, ClearVoice Medium is a practical choice for daily listening.

  2. Real-Time Reconfigurable Adaptive Speech Recognition Command and Control Apparatus and Method

    NASA Technical Reports Server (NTRS)

    Salazar, George A. (Inventor); Haynes, Dena S. (Inventor); Sommers, Marc J. (Inventor)

    1998-01-01

    An adaptive speech recognition and control system and method for controlling various mechanisms and systems in response to spoken instructions, in which spoken commands direct the system into appropriate memory nodes and to the respective memory templates corresponding to the voiced command, is discussed. Spoken commands from any of a group of operators for which the system is trained may be identified, and voice templates are updated as required in response to changes in pronunciation and voice characteristics over time of any of the operators for which the system is trained. Provisions are made both for near-real-time retraining of the system with respect to individual terms which are determined not to be positively identified, and for an overall system training and updating process in which recognition of each command and vocabulary term is checked, and in which the memory templates are retrained if necessary for respective commands or vocabulary terms with respect to an operator currently using the system. In one embodiment, the system includes input circuitry connected to a microphone and including signal processing and control sections for sensing the level of vocabulary recognition over a given period and, if recognition performance falls below a given level, processing audio-derived signals for enhancing recognition performance of the system.

  3. Biometric Fusion Demonstration System Scientific Report

    DTIC Science & Technology

    2004-03-01

    …verification and facial recognition, and searching watchlist databases comprised of full or partial facial images or voice recordings. Multiple-biometric fusion combinations covered in the report include fingerprint with facial recognition and iris with facial recognition (DRDC Ottawa CR 2004-056).

  4. Freedom to Grow: Children's Perspectives of Student Voice

    ERIC Educational Resources Information Center

    Quinn, Sarah; Owen, Susanne

    2014-01-01

    This article explores the power of student voice, in recognition of the child's right to be treated as a capable, competent social actor involved in the education process. In this study, student voice is considered in the light of improving students' engagement and personal and social development at the primary school level. It emphasizes the…

  5. Recognition of facial, auditory, and bodily emotions in older adults.

    PubMed

    Ruffman, Ted; Halberstadt, Jamin; Murray, Janice

    2009-11-01

    Understanding older adults' social functioning difficulties requires insight into their recognition of emotion processing in voices and bodies, not just faces, the focus of most prior research. We examined 60 young and 61 older adults' recognition of basic emotions in facial, vocal, and bodily expressions, and when matching faces and bodies to voices, using 120 emotion items. Older adults were worse than young adults in 17 of 30 comparisons, with consistent difficulties in recognizing both positive (happy) and negative (angry and sad) vocal and bodily expressions. Nearly three quarters of older adults functioned at a level similar to the lowest one fourth of young adults, suggesting that age-related changes are common. In addition, we found that older adults' difficulty in matching emotions was not explained by difficulty on the component sources (i.e., faces or voices on their own), suggesting an additional problem of integration.

  6. Secure Recognition of Voice-Less Commands Using Videos

    NASA Astrophysics Data System (ADS)

    Yau, Wai Chee; Kumar, Dinesh Kant; Weghorn, Hans

    Interest in voice recognition technologies for internet applications is growing due to the flexibility of speech-based communication. The major drawback with the use of sound for internet access with computers is that the commands will be audible to other people in the vicinity. This paper examines a secure and voice-less method for recognition of speech-based commands using video without evaluating sound signals. The proposed approach represents mouth movements in the video data using 2D spatio-temporal templates (STT). Zernike moments (ZM) are computed from STT and fed into support vector machines (SVM) to be classified into one of the utterances. The experimental results demonstrate that the proposed technique produces a high accuracy of 98% in a phoneme classification task. The proposed technique is demonstrated to be invariant to global variations of illumination level. Such a system is useful for securely interpreting user commands for internet applications on mobile devices.
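
    The final classification stage described above (Zernike moments of spatio-temporal templates fed to an SVM) can be sketched as follows. The feature vectors below are random stand-ins for precomputed Zernike moments, and the class count and data sizes are assumptions, not the paper's setup.

    ```python
    # Sketch of the SVM classification stage of the voiceless-command
    # pipeline; placeholder features stand in for Zernike moments of STTs.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(3)
    n_utterances, n_moments = 400, 25
    X = rng.normal(size=(n_utterances, n_moments))  # stand-in Zernike moments
    y = rng.integers(0, 10, size=n_utterances)      # 10 hypothetical commands

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    clf = SVC(kernel="rbf").fit(X_tr, y_tr)
    print(f"held-out accuracy: {clf.score(X_te, y_te):.2%}")
    ```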

  7. Early Detection of Severe Apnoea through Voice Analysis and Automatic Speaker Recognition Techniques

    NASA Astrophysics Data System (ADS)

    Fernández, Ruben; Blanco, Jose Luis; Díaz, David; Hernández, Luis A.; López, Eduardo; Alcázar, José

    This study is part of an on-going collaborative effort between the medical and the signal processing communities to promote research on applying voice analysis and Automatic Speaker Recognition techniques (ASR) for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based diagnosis could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we present and discuss the possibilities of using generative Gaussian Mixture Models (GMMs), generally used in ASR systems, to model distinctive apnoea voice characteristics (i.e. abnormal nasalization). Finally, we present experimental findings regarding the discriminative power of speaker recognition techniques applied to severe apnoea detection. We have achieved an 81.25 % correct classification rate, which is very promising and underpins the interest in this line of inquiry.
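
    A minimal sketch of the generative-GMM detection idea described above: fit one mixture model per class and classify a recording by comparing average per-frame log-likelihoods. The arrays below are random stand-ins for MFCC-like feature frames, and the component counts are guesses, not the study's configuration.

    ```python
    # Sketch: one GMM per class (apnoea vs. control), decision by
    # comparing mean per-frame log-likelihoods on a test recording.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X_apnoea = rng.normal(0.5, 1.0, size=(500, 13))   # stand-in feature frames
    X_control = rng.normal(-0.5, 1.0, size=(500, 13))

    gmm_apnoea = GaussianMixture(n_components=8, covariance_type="diag").fit(X_apnoea)
    gmm_control = GaussianMixture(n_components=8, covariance_type="diag").fit(X_control)

    X_test = rng.normal(0.5, 1.0, size=(200, 13))     # frames from one test speaker
    # score() returns the mean per-frame log-likelihood under each model
    label = "apnoea" if gmm_apnoea.score(X_test) > gmm_control.score(X_test) else "control"
    print(label)
    ```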

  8. Remote voice training: A case study on space shuttle applications, appendix C

    NASA Technical Reports Server (NTRS)

    Mollakarimi, Cindy; Hamid, Tamin

    1990-01-01

    The Tile Automation System includes applications of automation and robotics technology to all aspects of the Shuttle tile processing and inspection system. An integrated set of rapid prototyping testbeds was developed which include speech recognition and synthesis, laser imaging systems, distributed Ada programming environments, distributed relational data base architectures, distributed computer network architectures, multi-media workbenches, and human factors considerations. Remote voice training in the Tile Automation System is discussed. The user is prompted over a headset by synthesized speech for the training sequences. The voice recognition units and the voice output units are remote from the user and are connected by Ethernet to the main computer system. A supervisory channel is used to monitor the training sequences. Discussions include the training approaches as well as the human factors problems and solutions for this system utilizing remote training techniques.

  9. Initial Progress Toward Development of a Voice-Based Computer-Delivered Motivational Intervention for Heavy Drinking College Students: An Experimental Study

    PubMed Central

    Lechner, William J; MacGlashan, James; Wray, Tyler B; Littman, Michael L

    2017-01-01

    Background Computer-delivered interventions have been shown to be effective in reducing alcohol consumption in heavy drinking college students. However, these computer-delivered interventions rely on mouse, keyboard, or touchscreen responses for interactions between the users and the computer-delivered intervention. The principles of motivational interviewing suggest that in-person interventions may be effective, in part, because they encourage individuals to think through and speak aloud their motivations for changing a health behavior, which current computer-delivered interventions do not allow. Objective The objective of this study was to take the initial steps toward development of a voice-based computer-delivered intervention that can ask open-ended questions and respond appropriately to users’ verbal responses, more closely mirroring a human-delivered motivational intervention. Methods We developed (1) a voice-based computer-delivered intervention that was run by a human controller and that allowed participants to speak their responses to scripted prompts delivered by speech generation software and (2) a text-based computer-delivered intervention that relied on the mouse, keyboard, and computer screen for all interactions. We randomized 60 heavy drinking college students to interact with the voice-based computer-delivered intervention and 30 to interact with the text-based computer-delivered intervention and compared their ratings of the systems as well as their motivation to change drinking and their drinking behavior at 1-month follow-up. Results Participants reported that the voice-based computer-delivered intervention engaged positively with them in the session and delivered content in a manner consistent with motivational interviewing principles. At 1-month follow-up, participants in the voice-based computer-delivered intervention condition reported significant decreases in quantity, frequency, and problems associated with drinking, and increased perceived importance of changing drinking behaviors. In comparison to the text-based computer-delivered intervention condition, those assigned to voice-based computer-delivered intervention reported significantly fewer alcohol-related problems at the 1-month follow-up (incident rate ratio 0.60, 95% CI 0.44-0.83, P=.002). The conditions did not differ significantly on perceived importance of changing drinking or on measures of drinking quantity and frequency of heavy drinking. Conclusions Results indicate that it is feasible to construct a series of open-ended questions and a bank of responses and follow-up prompts that can be used in a future fully automated voice-based computer-delivered intervention that may mirror more closely human-delivered motivational interventions to reduce drinking. Such efforts will require using advanced speech recognition capabilities and machine-learning approaches to train a program to mirror the decisions made by human controllers in the voice-based computer-delivered intervention used in this study. In addition, future studies should examine enhancements that can increase the perceived warmth and empathy of voice-based computer-delivered intervention, possibly through greater personalization, improvements in the speech generation software, and embodying the computer-delivered intervention in a physical form. PMID:28659259

  10. Statistical Evaluation of Biometric Evidence in Forensic Automatic Speaker Recognition

    NASA Astrophysics Data System (ADS)

    Drygajlo, Andrzej

    Forensic speaker recognition is the process of determining if a specific individual (suspected speaker) is the source of a questioned voice recording (trace). This paper aims at presenting forensic automatic speaker recognition (FASR) methods that provide a coherent way of quantifying and presenting recorded voice as biometric evidence. In such methods, the biometric evidence consists of the quantified degree of similarity between speaker-dependent features extracted from the trace and speaker-dependent features extracted from recorded speech of a suspect. The interpretation of recorded voice as evidence in the forensic context presents particular challenges, including within-speaker (within-source) variability and between-speakers (between-sources) variability. Consequently, FASR methods must provide a statistical evaluation which gives the court an indication of the strength of the evidence given the estimated within-source and between-sources variabilities. This paper reports on the first ENFSI evaluation campaign through a fake case, organized by the Netherlands Forensic Institute (NFI), as an example, where an automatic method using the Gaussian mixture models (GMMs) and the Bayesian interpretation (BI) framework were implemented for the forensic speaker recognition task.
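
    In its simplest GMM form, the evidence evaluation described above reduces to a log-likelihood ratio between a suspect model and a background (relevant-population) model. A sketch on synthetic data; in a real FASR system the score would be further calibrated within the Bayesian interpretation framework rather than used raw.

    ```python
    # Sketch of the GMM likelihood-ratio idea behind forensic automatic
    # speaker recognition. All data below are synthetic placeholders.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(1)
    suspect_frames = rng.normal(1.0, 1.0, size=(1000, 13))
    background_frames = rng.normal(0.0, 1.5, size=(5000, 13))
    trace_frames = rng.normal(1.0, 1.0, size=(300, 13))   # questioned recording

    suspect_gmm = GaussianMixture(n_components=16, covariance_type="diag").fit(suspect_frames)
    background_gmm = GaussianMixture(n_components=16, covariance_type="diag").fit(background_frames)

    # mean per-frame log-likelihood ratio; > 0 supports the same-source
    # hypothesis, < 0 the different-source hypothesis
    llr = suspect_gmm.score(trace_frames) - background_gmm.score(trace_frames)
    print(f"mean log-likelihood ratio per frame: {llr:.3f}")
    ```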

  11. Application of Pattern Recognition Techniques to the Classification of Full-Term and Preterm Infant Cry.

    PubMed

    Orlandi, Silvia; Reyes Garcia, Carlos Alberto; Bandini, Andrea; Donzelli, Gianpaolo; Manfredi, Claudia

    2016-11-01

    Scientific and clinical advances in perinatology and neonatology have enhanced the chances of survival of preterm and very low weight neonates. Infant cry analysis is a suitable noninvasive complementary tool to assess the neurologic state of infants particularly important in the case of preterm neonates. This article aims at exploiting differences between full-term and preterm infant cry with robust automatic acoustical analysis and data mining techniques. Twenty-two acoustical parameters are estimated in more than 3000 cry units from cry recordings of 28 full-term and 10 preterm newborns. Feature extraction is performed through the BioVoice dedicated software tool, developed at the Biomedical Engineering Lab, University of Firenze, Italy. Classification and pattern recognition is based on genetic algorithms for the selection of the best attributes. Training is performed comparing four classifiers: Logistic Curve, Multilayer Perceptron, Support Vector Machine, and Random Forest and three different testing options: full training set, 10-fold cross-validation, and 66% split. Results show that the best feature set is made up by 10 parameters capable to assess differences between preterm and full-term newborns with about 87% of accuracy. Best results are obtained with the Random Forest method (receiver operating characteristic area, 0.94). These 10 cry features might convey important additional information to assist the clinical specialist in the diagnosis and follow-up of possible delays or disorders in the neurologic development due to premature birth in this extremely vulnerable population of patients. The proposed approach is a first step toward an automatic infant cry recognition system for fast and proper identification of risk in preterm babies. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
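
    The classification stage described above can be sketched with a Random Forest evaluated under 10-fold cross-validation; the feature matrix below is a synthetic stand-in for the 10 selected BioVoice parameters, not the study's data.

    ```python
    # Sketch: Random Forest with 10-fold cross-validation, as in the
    # full-term vs. preterm cry classification described above.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)
    X = rng.normal(size=(300, 10))             # 300 cry units, 10 features
    y = rng.integers(0, 2, size=300)           # 0 = full-term, 1 = preterm

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
    print(f"mean accuracy: {scores.mean():.2%}")
    ```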

  12. Onset and Maturation of Fetal Heart Rate Response to the Mother's Voice over Late Gestation

    ERIC Educational Resources Information Center

    Kisilevsky, Barbara S.; Hains, Sylvia M. J.

    2011-01-01

    Background: Term fetuses discriminate their mother's voice from a female stranger's, suggesting recognition/learning of some property of her voice. Identification of the onset and maturation of the response would increase our understanding of the influence of environmental sounds on the development of sensory abilities and identify the period when…

  13. Functional Connectivity between Face-Movement and Speech-Intelligibility Areas during Auditory-Only Speech Perception

    PubMed Central

    Schall, Sonja; von Kriegstein, Katharina

    2014-01-01

    It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers’ voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker’s face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas. PMID:24466026

  14. Implementation of the Intelligent Voice System for Kazakh

    NASA Astrophysics Data System (ADS)

    Yessenbayev, Zh; Saparkhojayev, N.; Tibeyev, T.

    2014-04-01

    Modern speech technologies are highly advanced and widely used in day-to-day applications. However, this mostly concerns the languages of well-developed countries, such as English, German, Japanese, Russian, etc. As for Kazakh, the situation is less prominent and research in this field is only starting to evolve. In this research and application-oriented project, we introduce an intelligent voice system for the fast deployment of call centers and information desks supporting Kazakh speech. The demand for such a system is obvious if the country's large size and small population are considered. Landline and cell phones become the only means of communication for distant villages and suburbs. The system features Kazakh speech recognition and synthesis modules as well as a web-GUI for efficient dialog management. For speech recognition we use the CMU Sphinx engine, and for speech synthesis, MaryTTS. The web-GUI is implemented in Java, enabling operators to quickly create and manage the dialogs in a user-friendly graphical environment. The call routines are handled by Asterisk PBX and JBoss Application Server. The system supports such technologies and protocols as VoIP, VoiceXML, FastAGI, Java SpeechAPI and J2EE. For the speech recognition experiments we compiled and used the first Kazakh speech corpus, with utterances from 169 native speakers. The performance of the speech recognizer is 4.1% WER on isolated word recognition and 6.9% WER on clean continuous speech recognition tasks. The speech synthesis experiments include the training of male and female voices.
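
    The WER figures quoted above come from the word-level edit distance between reference and hypothesis transcripts. A minimal sketch of the standard computation (the example strings are romanized words chosen purely for illustration):

    ```python
    # Sketch of the word error rate (WER) metric, computed with the
    # standard edit-distance recursion over word sequences.
    def wer(reference, hypothesis):
        """WER = (substitutions + deletions + insertions) / reference length."""
        r, h = reference.split(), hypothesis.split()
        # d[i][j] = edit distance between the first i ref and j hyp words
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i
        for j in range(len(h) + 1):
            d[0][j] = j
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
                d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
        return d[len(r)][len(h)] / len(r)

    print(wer("barlyq adamdar tumysynan azat", "barlyq adamdar azat"))  # 0.25
    ```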

  15. Incorporating Speech Recognition into a Natural User Interface

    NASA Technical Reports Server (NTRS)

    Chapa, Nicholas

    2017-01-01

    The Augmented/Virtual Reality (AVR) Lab has been working to study the applicability of recent virtual and augmented reality hardware and software to KSC operations, including the Oculus Rift, HTC Vive, Microsoft HoloLens, and the Unity game engine. My project in this lab is to integrate voice recognition and voice commands into an easy-to-modify system that can be added to an existing portion of a Natural User Interface (NUI). A NUI is an intuitive and simple-to-use interface incorporating visual, touch, and speech recognition. The inclusion of speech recognition capability will allow users to perform actions or make inquiries using only their voice. The simplicity of needing only to speak to control an on-screen object or enact some digital action means that any user can quickly become accustomed to using this system. Multiple programs were tested for use in a speech command and recognition system. Sphinx4 translates speech to text using a Hidden Markov Model (HMM)-based language model, an acoustic model, and a word dictionary, running on Java. PocketSphinx offers similar functionality to Sphinx4 but runs on C. However, neither of these programs was ideal, as building a Java or C wrapper slowed performance. The most suitable speech recognition system tested was the Unity engine's Grammar Recognizer. A Context-Free Grammar (CFG) structure is written in an XML file to specify the structure of phrases and words that will be recognized by the Unity Grammar Recognizer. Using Speech Recognition Grammar Specification (SRGS) 1.0 makes modifying the recognized combinations of words and phrases very simple and quick. With SRGS 1.0, semantic information can also be added to the XML file, which allows even more control over how spoken words and phrases are interpreted by Unity. Additionally, using a CFG with SRGS 1.0 produces Finite State Machine (FSM) behavior, limiting the potential for incorrectly heard words or phrases. The purpose of my project was to investigate options for a speech recognition system. To that end, I attempted to integrate Sphinx4 into a user interface. Sphinx4 had great accuracy and is the only free program tested that can perform offline speech dictation. However, it had a limited dictionary of words that could be recognized, single-syllable words were almost impossible for it to hear, and since it ran on Java it could not be integrated into the Unity-based NUI. PocketSphinx ran much faster than Sphinx4, which would have made it ideal as a plugin to the Unity NUI; unfortunately, creating a C# wrapper for the C code made the program unusable with Unity because the wrapper slowed code execution and class files became unreachable. The Unity Grammar Recognizer is the ideal speech recognition interface: it is flexible in recognizing multiple variations of the same command, and it is the most accurate program tested at recognizing speech because it uses an XML grammar to specify speech structure instead of relying solely on a dictionary and language model. The Unity Grammar Recognizer will be used with the NUI for these reasons, as well as because it is written in C#, which further simplifies its incorporation.
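
    The point about a CFG yielding FSM-like behavior can be made concrete with a toy matcher: only word sequences the grammar generates are accepted, so mis-recognized sequences outside the grammar are rejected. This is an illustration of the idea only, not Unity's or SRGS's actual API; the grammar and vocabulary below are invented.

    ```python
    # Toy command grammar expanded into the finite set of phrases it
    # generates; anything outside that set is rejected.
    import itertools

    GRAMMAR = {
        "command":  [["<verb>", "<object>"]],
        "<verb>":   [["open"], ["close"], ["move"]],
        "<object>": [["the", "panel"], ["the", "hatch"]],
    }

    def expansions(symbol):
        """Yield every word sequence a grammar symbol can produce."""
        if symbol not in GRAMMAR:          # terminal word
            yield [symbol]
            return
        for rule in GRAMMAR[symbol]:
            parts = [list(expansions(s)) for s in rule]
            for combo in itertools.product(*parts):
                yield [w for part in combo for w in part]

    VALID = {tuple(seq) for seq in expansions("command")}

    print(tuple("open the hatch".split()) in VALID)  # True: grammar accepts it
    print(tuple("open a hatch".split()) in VALID)    # False: not in the grammar
    ```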

  16. Neurocognition and symptoms identify links between facial recognition and emotion processing in schizophrenia: Meta-analytic findings

    PubMed Central

    Ventura, Joseph; Wood, Rachel C.; Jimenez, Amy M.; Hellemann, Gerhard S.

    2014-01-01

    Background In schizophrenia patients, one of the most commonly studied deficits of social cognition is emotion processing (EP), which has documented links to facial recognition (FR). But, how are deficits in facial recognition linked to emotion processing deficits? Can neurocognitive and symptom correlates of FR and EP help differentiate the unique contribution of FR to the domain of social cognition? Methods A meta-analysis of 102 studies (combined n = 4826) in schizophrenia patients was conducted to determine the magnitude and pattern of relationships between facial recognition, emotion processing, neurocognition, and type of symptom. Results Meta-analytic results indicated that facial recognition and emotion processing are strongly interrelated (r = .51). In addition, the relationship between FR and EP through voice prosody (r = .58) is as strong as the relationship between FR and EP based on facial stimuli (r = .53). Further, the relationship between emotion recognition, neurocognition, and symptoms is independent of the emotion processing modality – facial stimuli and voice prosody. Discussion The association between FR and EP that occurs through voice prosody suggests that FR is a fundamental cognitive process. The observed links between FR and EP might be due to bottom-up associations between neurocognition and EP, and not simply because most emotion recognition tasks use visual facial stimuli. In addition, links with symptoms, especially negative symptoms and disorganization, suggest possible symptom mechanisms that contribute to FR and EP deficits. PMID:24268469

  18. A voice-input voice-output communication aid for people with severe speech impairment.

    PubMed

    Hawley, Mark S; Cunningham, Stuart P; Green, Phil D; Enderby, Pam; Palmer, Rebecca; Sehgal, Siddharth; O'Neill, Peter

    2013-01-01

    A new form of augmentative and alternative communication (AAC) device for people with severe speech impairment, the voice-input voice-output communication aid (VIVOCA), is described. The VIVOCA recognizes the disordered speech of the user and builds messages, which are converted into synthetic speech. System development was carried out employing user-centered design and development methods, which identified and refined key requirements for the device. A novel methodology for building small-vocabulary, speaker-dependent automatic speech recognizers with reduced amounts of training data was applied. Experiments showed that this method is successful in generating good recognition performance (mean accuracy 96%) on highly disordered speech, even when recognition perplexity is increased. The selected message-building technique traded off various factors including speed of message construction and range of available message outputs. The VIVOCA was evaluated in a field trial by individuals with moderate to severe dysarthria and confirmed that they can make use of the device to produce intelligible speech output from disordered speech input. The trial highlighted some issues which limit the performance and usability of the device when applied in real usage situations, with mean recognition accuracy of 67% in these circumstances. These limitations will be addressed in future work.
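
    The abstract does not describe the recognizer internals, but the textbook route to a small-vocabulary, speaker-dependent recognizer trained from very little data is template matching with dynamic time warping (DTW): store one feature sequence per word and pick the nearest template. A minimal Python sketch under that assumption (feature extraction is presumed done elsewhere; `templates` maps each word to a frames-by-coefficients array):

      import numpy as np

      def dtw_distance(a, b):
          """Dynamic-time-warping distance between two feature sequences,
          using Euclidean distances between frames."""
          n, m = len(a), len(b)
          D = np.full((n + 1, m + 1), np.inf)
          D[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  cost = np.linalg.norm(a[i - 1] - b[j - 1])
                  D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
          return D[n, m]

      def recognize(utterance, templates):
          """Return the vocabulary word whose stored template is DTW-closest."""
          return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))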

  19. Evaluating a voice recognition system: finding the right product for your department.

    PubMed

    Freeh, M; Dewey, M; Brigham, L

    2001-06-01

    The Department of Radiology at the University of Utah Health Sciences Center has been in the process of transitioning from a traditional film-based department to a digital imaging department for the past 2 years. The department is now transitioning from the traditional method of dictating reports (dictation by the radiologist, transcription, then review and signing by the radiologist) to a voice recognition system. The transition to digital operations will not be complete until we have the ability to directly interface the dictation process with the image review process. Voice recognition technology has advanced to the level where it can and should be an integral part of the new way of working in radiology and of an efficient digital imaging department. The transition to voice recognition requires identifying the product and the company that will best meet a department's needs. This report introduces the methods we used to evaluate the vendors and the products available as we made our purchasing decision. We discuss our evaluation method and provide a checklist that other departments can use to assist with their evaluation process. The criteria used in the evaluation process fall into the following major categories: user operations, technical infrastructure, medical dictionary, system interfaces, service support, cost, and company strength. Conclusions drawn from our evaluation process are detailed, with the intention of shortening the process for others as they embark on a similar venture. As more and more organizations investigate the many products and services now offered to enhance the operations of a radiology department, it becomes increasingly important to use solid methods to evaluate new products effectively. This report should help others complete the task of evaluating a voice recognition system and may be adaptable to other products as well.

  20. Speech to Text Translation for Malay Language

    NASA Astrophysics Data System (ADS)

    Al-khulaidi, Rami Ali; Akmeliawati, Rini

    2017-11-01

    A speech recognition system is a front-end and back-end process that receives an audio signal uttered by a speaker and converts it into a text transcription. Speech systems can be used in several fields, including therapeutic technology, education, social robotics, and computer entertainment. Control tasks, the intended application of the proposed system, place a premium on speed of performance and response, since the system should integrate with other control platforms such as voice-controlled robots. This creates a need for flexible platforms that can be easily edited to fit the functionality of the surroundings, unlike software such as MATLAB and Phoenix that requires recorded audio and multiple rounds of training for every entry. In this paper, a speech recognition system for the Malay language is implemented using Microsoft Visual Studio C#. Ninety Malay phrases were tested by 10 speakers of both genders in different contexts. The results show that the overall accuracy (calculated from the confusion matrix) is a satisfactory 92.69%.
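
    For reference, the overall accuracy "calculated from the confusion matrix" is the sum of the diagonal (correct recognitions) divided by the total number of trials. A small Python sketch with hypothetical counts:

      import numpy as np

      def overall_accuracy(confusion):
          """Overall accuracy from a confusion matrix whose rows are spoken
          phrases and whose columns are recognized phrases."""
          confusion = np.asarray(confusion, float)
          return np.trace(confusion) / confusion.sum()

      # Hypothetical 3-phrase confusion matrix; diagonal entries are correct
      C = [[28, 1, 1],
           [2, 27, 1],
           [0, 2, 28]]
      print(f"{overall_accuracy(C):.2%}")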

  1. A Voice Enabled Procedure Browser for the International Space Station

    NASA Technical Reports Server (NTRS)

    Rayner, Manny; Chatzichrisafis, Nikos; Hockey, Beth Ann; Farrell, Kim; Renders, Jean-Michel

    2005-01-01

    Clarissa, an experimental voice-enabled procedure browser that has recently been deployed on the International Space Station (ISS), is to the best of our knowledge the first spoken dialog system in space. This paper gives background on the system and the ISS procedures, then discusses the research developed to address three key problems: grammar-based speech recognition using the Regulus toolkit; SVM-based methods for open-microphone speech recognition; and robust, side-effect-free dialogue management for handling undos, corrections and confirmations.

  2. Voice emotion recognition by cochlear-implanted children and their normally-hearing peers

    PubMed Central

    Chatterjee, Monita; Zion, Danielle; Deroche, Mickael L.; Burianek, Brooke; Limb, Charles; Goren, Alison; Kulkarni, Aditya M.; Christensen, Julie A.

    2014-01-01

    Despite their remarkable success in bringing spoken language to hearing impaired listeners, the signal transmitted through cochlear implants (CIs) remains impoverished in spectro-temporal fine structure. As a consequence, pitch-dominant information, such as voice emotion, is diminished. For young children, the ability to correctly identify the mood/intent of the speaker (which may not always be visible in their facial expression) is an important aspect of social and linguistic development. Previous work in the field has shown that children with cochlear implants (cCI) have significant deficits in voice emotion recognition relative to their normally hearing peers (cNH). Here, we report on voice emotion recognition by a cohort of 36 school-aged cCI. Additionally, we provide, for the first time, a comparison of their performance to that of cNH and NH adults (aNH) listening to CI simulations of the same stimuli. We also provide comparisons to the performance of adult listeners with CIs (aCI), most of whom learned language primarily through normal acoustic hearing. Results indicate that, despite strong variability, on average, cCI perform similarly to their adult counterparts; that both groups' mean performance is similar to aNHs' performance with 8-channel noise-vocoded speech; that cNH achieve excellent scores in voice emotion recognition with full-spectrum speech, but on average, show significantly poorer scores than aNH with 8-channel noise-vocoded speech. A strong developmental effect was observed in the cNH with noise-vocoded speech in this task. These results point to the considerable benefit obtained by cochlear-implanted children from their devices, but also underscore the need for further research and development in this important and neglected area. PMID:25448167
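
    The CI simulations mentioned here rely on noise-vocoded speech: the signal is split into a small number of frequency bands, each band's temporal envelope is extracted, and the envelopes modulate band-limited noise carriers that are summed. A rough 8-channel sketch in Python; the band edges, filter orders, and envelope cutoff are illustrative guesses, not the study's parameters, and a sampling rate of at least 16 kHz is assumed.

      import numpy as np
      from scipy.signal import butter, sosfilt, sosfiltfilt

      def noise_vocode(x, fs, n_channels=8, lo=100.0, hi=7000.0):
          """Crude n-channel noise vocoder: band-pass analysis, envelope
          extraction, and envelope-modulated noise synthesis."""
          edges = np.geomspace(lo, hi, n_channels + 1)  # log-spaced band edges
          env_lp = butter(2, 50.0, btype="low", fs=fs, output="sos")
          rng = np.random.default_rng(0)
          out = np.zeros(len(x))
          for f1, f2 in zip(edges[:-1], edges[1:]):
              band = butter(4, [f1, f2], btype="band", fs=fs, output="sos")
              env = sosfiltfilt(env_lp, np.abs(sosfilt(band, x)))  # band envelope
              noise = sosfilt(band, rng.standard_normal(len(x)))   # noise carrier
              out += env * noise
          return out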

  3. STS-41 Voice Command System Flight Experiment Report

    NASA Technical Reports Server (NTRS)

    Salazar, George A.

    1981-01-01

    This report presents the results of the Voice Command System (VCS) flight experiment on the five-day STS-41 mission. Two mission specialists, Bill Shepherd and Bruce Melnick, used the speaker-dependent system to evaluate the operational effectiveness of using voice to control a spacecraft system. In addition, data were gathered to analyze the effects of microgravity on speech recognition performance.

  4. Automated speech recognition for time recording in out-of-hospital emergency medicine-an experimental approach.

    PubMed

    Gröschel, J; Philipp, F; Skonetzki, St; Genzwürker, H; Wetter, Th; Ellinger, K

    2004-02-01

    Precise documentation of medical treatment in emergency medical missions and for resuscitation is essential from a medical, legal and quality assurance point of view [Anästhesiologie und Intensivmedizin, 41 (2000) 737]. All conventional methods of time recording are either too inaccurate or too elaborate for routine application. Automated speech recognition may offer a solution. A special erase programme for the documentation of all time events was developed. Standard speech recognition software (IBM ViaVoice 7.0) was adapted and installed on two different computer systems. One was a stationary PC (500 MHz Pentium III, 128 MB RAM, Soundblaster PCI 128 soundcard, Win NT 4.0); the other was a mobile pen-PC that had already proven its value during emergency missions [Der Notarzt 16, p. 177] (Fujitsu Stylistic 2300, 230 MHz MMX processor, 160 MB RAM, embedded soundcard with ESS 1879 chipset, Win98 2nd ed.). Two different microphones were tested on both computers: a standard headset that came with the recognition software, and a small microphone (Lavalier condenser microphone EM 116 from Vivanco) that could be attached to the operator's collar. Seven women and 15 men spoke a text with 29 phrases to be recognised. Two emergency physicians tested the system in a simulated emergency setting using the collar microphone and the pen-PC with an analogue wireless connection. Overall recognition was best for the PC with a headset (89%), followed by the pen-PC with a headset (85%), the PC with the collar microphone (84%) and the pen-PC with the collar microphone (80%); nevertheless, the differences were not statistically significant. Recognition became significantly worse (89.5% versus 82.3%, P<0.0001) when numbers had to be recognised. The gender of the speaker and the number of words in a sentence had no influence. Average recognition in the simulated emergency setting was 75%. At no time did false recognition appear. Time recording with automated speech recognition seems to be possible in emergency medical missions. Although the results show an average recognition of only 75%, it is possible that missing elements may be reconstructed more precisely. Future technology should integrate a secure wireless connection between microphone and mobile computer. The system could then prove its value for real out-of-hospital emergencies.
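
    The abstract does not name the significance test used; a two-proportion z-test is one conventional choice for comparing recognition rates such as 89.5% versus 82.3%. A Python sketch with hypothetical trial counts:

      from math import sqrt
      from statistics import NormalDist

      def two_proportion_z(p1, n1, p2, n2):
          """Two-sided z-test for the difference between two proportions."""
          pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
          se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
          z = (p1 - p2) / se
          return z, 2 * (1 - NormalDist().cdf(abs(z)))

      # Hypothetical counts consistent with the reported percentages
      z, p = two_proportion_z(0.895, 2000, 0.823, 2000)
      print(f"z = {z:.2f}, p = {p:.2e}")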

  5. "Who" is saying "what"? Brain-based decoding of human voice and speech.

    PubMed

    Formisano, Elia; De Martino, Federico; Bonte, Milene; Goebel, Rainer

    2008-11-07

    Can we decipher speech content ("what" is being said) and speaker identity ("who" is saying it) from observations of brain activity of a listener? Here, we combine functional magnetic resonance imaging with a data-mining algorithm and retrieve what and whom a person is listening to from the neural fingerprints that speech and voice signals elicit in the listener's auditory cortex. These cortical fingerprints are spatially distributed and insensitive to acoustic variations of the input so as to permit the brain-based recognition of learned speech from unknown speakers and of learned voices from previously unheard utterances. Our findings unravel the detailed cortical layout and computational properties of the neural populations at the basis of human speech recognition and speaker identification.

  6. Current trends in small vocabulary speech recognition for equipment control

    NASA Astrophysics Data System (ADS)

    Doukas, Nikolaos; Bardis, Nikolaos G.

    2017-09-01

    Speech recognition systems allow human-machine communication to acquire an intuitive nature that approaches the simplicity of inter-human communication. Small-vocabulary speech recognition is a subset of the overall speech recognition problem, in which only a small number of words need to be recognized. Speaker-independent small-vocabulary recognition can find significant applications in field equipment used by military personnel. Such equipment may typically be controlled by a small number of commands that need to be given quickly and accurately, under conditions where delicate manual operations are difficult to achieve. This type of application could hence significantly benefit from the use of robust voice-operated control components, as they would facilitate the interaction with their users and render it much more reliable in times of crisis. This paper presents current challenges involved in attaining efficient and robust small-vocabulary speech recognition. These challenges concern feature selection, classification techniques, speaker diversity and noise effects. A state machine approach is presented that facilitates the voice guidance of different equipment in a variety of situations.
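
    The state-machine approach mentioned at the end can be pictured as a lookup table of (state, command) transitions in which unrecognized words leave the state unchanged, limiting the damage of a misrecognition. A toy Python sketch with a hypothetical command set:

      from enum import Enum, auto

      class Mode(Enum):
          IDLE = auto()
          ARMED = auto()
          ACTIVE = auto()

      # Hypothetical transition table keyed by (current mode, spoken command)
      TRANSITIONS = {
          (Mode.IDLE, "wake"): Mode.ARMED,
          (Mode.ARMED, "start"): Mode.ACTIVE,
          (Mode.ARMED, "cancel"): Mode.IDLE,
          (Mode.ACTIVE, "stop"): Mode.IDLE,
      }

      def step(mode, word):
          """Advance the controller; unknown words leave the mode unchanged."""
          return TRANSITIONS.get((mode, word), mode)

      mode = Mode.IDLE
      for word in ["wake", "start", "noise", "stop"]:
          mode = step(mode, word)
      print(mode)  # Mode.IDLE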

  7. Emotion Recognition From Singing Voices Using Contemporary Commercial Music and Classical Styles.

    PubMed

    Hakanpää, Tua; Waaramaa, Teija; Laukkanen, Anne-Maria

    2018-02-22

    This study examines the recognition of emotion in contemporary commercial music (CCM) and classical styles of singing. This information may be useful in improving the training of interpretation in singing. This is an experimental comparative study. Thirteen singers (11 female, 2 male) with a minimum of 3 years' professional-level singing studies (in CCM or classical technique or both) participated. They sang at three pitches (females: a, e1, a1, males: one octave lower) expressing anger, sadness, joy, tenderness, and a neutral state. Twenty-nine listeners listened to 312 short (0.63- to 4.8-second) voice samples, 135 of which were sung using a classical singing technique and 165 of which were sung in a CCM style. The listeners were asked which emotion they heard. Activity and valence were derived from the chosen emotions. The percentage of correct recognitions out of all the answers in the listening test (N = 9048) was 30.2%. The recognition percentage for the CCM-style singing technique was higher (34.5%) than for the classical-style technique (24.5%). Valence and activation were better perceived than the emotions themselves, and activity was better recognized than valence. A higher pitch was more likely to be perceived as joy or anger, and a lower pitch as sorrow. Both valence and activation were better recognized in the female CCM samples than in the other samples. There are statistically significant differences in the recognition of emotions between classical and CCM styles of singing. Furthermore, in the singing voice, pitch affects the perception of emotions, and valence and activity are more easily recognized than emotions. Copyright © 2018 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  8. Event identification by acoustic signature recognition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dress, W.B.; Kercel, S.W.

    1995-07-01

    Many events of interest to the security community produce acoustic emissions that are, in principle, identifiable as to cause. Some obvious examples are gunshots, breaking glass, takeoffs and landings of small aircraft, vehicular engine noises, footsteps (high frequencies when on gravel, very low frequencies when on soil), and voices (whispers to shouts). We are investigating wavelet-based methods to extract unique features of such events for classification and identification. We also discuss methods of classification and pattern recognition specifically tailored for acoustic signatures obtained by wavelet analysis. The paper is divided into three parts: completed work, work in progress, and future applications. The completed phase has led to the successful recognition of aircraft types on landing and takeoff. Both small aircraft (twin-engine turboprop) and large aircraft (commercial airliners) were included in the study. The project considered the design of a small, field-deployable, inexpensive device. The techniques developed during the aircraft identification phase were then adapted to a multispectral electromagnetic interference monitoring device now deployed in a nuclear power plant. This is a general-purpose wavelet analysis engine, spanning 14 octaves, and can be adapted for other specific tasks. Work in progress is focused on applying the methods previously developed to speaker identification. Some of the problems to be overcome include recognition of sounds as voice patterns, as distinct from possible background noises (e.g., music), as well as identification of the speaker from a short-duration voice sample. A generalization of the completed work and the work in progress is a device capable of classifying any number of acoustic events, particularly quasi-stationary events such as engine noises and voices, and singular events such as gunshots and breaking glass. We will show examples of both kinds of events and discuss their recognition likelihood.
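
    A typical wavelet-based signature of the kind described is the per-sub-band energy of a multilevel wavelet decomposition, used as a feature vector for a downstream classifier. A minimal Python sketch using the PyWavelets package; the wavelet choice and decomposition depth are illustrative, not the authors' configuration.

      import numpy as np
      import pywt  # PyWavelets

      def wavelet_signature(x, wavelet="db4", level=6):
          """Log-energy per wavelet sub-band as a compact acoustic feature
          vector for events such as gunshots or engine noise."""
          coeffs = pywt.wavedec(np.asarray(x, float), wavelet, level=level)
          return np.array([np.log1p(np.sum(c ** 2)) for c in coeffs])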

  9. Culture/Religion and Identity: Social Justice versus Recognition

    ERIC Educational Resources Information Center

    Bekerman, Zvi

    2012-01-01

    Recognition is the main word attached to multicultural perspectives. The multicultural call for recognition, the one calling for the recognition of cultural minorities and identities, the one now voiced by liberal states all over, and also in Israel, was a more difficult one. It took the author some time to realize that calling for the recognition…

  10. Initial Progress Toward Development of a Voice-Based Computer-Delivered Motivational Intervention for Heavy Drinking College Students: An Experimental Study.

    PubMed

    Kahler, Christopher W; Lechner, William J; MacGlashan, James; Wray, Tyler B; Littman, Michael L

    2017-06-28

    Computer-delivered interventions have been shown to be effective in reducing alcohol consumption in heavy drinking college students. However, these computer-delivered interventions rely on mouse, keyboard, or touchscreen responses for interactions between the users and the computer-delivered intervention. The principles of motivational interviewing suggest that in-person interventions may be effective, in part, because they encourage individuals to think through and speak aloud their motivations for changing a health behavior, which current computer-delivered interventions do not allow. The objective of this study was to take the initial steps toward development of a voice-based computer-delivered intervention that can ask open-ended questions and respond appropriately to users' verbal responses, more closely mirroring a human-delivered motivational intervention. We developed (1) a voice-based computer-delivered intervention that was run by a human controller and that allowed participants to speak their responses to scripted prompts delivered by speech generation software and (2) a text-based computer-delivered intervention that relied on the mouse, keyboard, and computer screen for all interactions. We randomized 60 heavy drinking college students to interact with the voice-based computer-delivered intervention and 30 to interact with the text-based computer-delivered intervention and compared their ratings of the systems as well as their motivation to change drinking and their drinking behavior at 1-month follow-up. Participants reported that the voice-based computer-delivered intervention engaged positively with them in the session and delivered content in a manner consistent with motivational interviewing principles. At 1-month follow-up, participants in the voice-based computer-delivered intervention condition reported significant decreases in quantity, frequency, and problems associated with drinking, and increased perceived importance of changing drinking behaviors. In comparison to the text-based computer-delivered intervention condition, those assigned to voice-based computer-delivered intervention reported significantly fewer alcohol-related problems at the 1-month follow-up (incident rate ratio 0.60, 95% CI 0.44-0.83, P=.002). The conditions did not differ significantly on perceived importance of changing drinking or on measures of drinking quantity and frequency of heavy drinking. Results indicate that it is feasible to construct a series of open-ended questions and a bank of responses and follow-up prompts that can be used in a future fully automated voice-based computer-delivered intervention that may mirror more closely human-delivered motivational interventions to reduce drinking. Such efforts will require using advanced speech recognition capabilities and machine-learning approaches to train a program to mirror the decisions made by human controllers in the voice-based computer-delivered intervention used in this study. In addition, future studies should examine enhancements that can increase the perceived warmth and empathy of voice-based computer-delivered intervention, possibly through greater personalization, improvements in the speech generation software, and embodying the computer-delivered intervention in a physical form. ©Christopher W Kahler, William J Lechner, James MacGlashan, Tyler B Wray, Michael L Littman. Originally published in JMIR Mental Health (http://mental.jmir.org), 28.06.2017.

  11. Social power and recognition of emotional prosody: High power is associated with lower recognition accuracy than low power.

    PubMed

    Uskul, Ayse K; Paulmann, Silke; Weick, Mario

    2016-02-01

    Listeners have to pay close attention to a speaker's tone of voice (prosody) during daily conversations. This is particularly important when trying to infer the emotional state of the speaker. Although a growing body of research has explored how emotions are processed from speech in general, little is known about how psychosocial factors such as social power can shape the perception of vocal emotional attributes. Thus, the present studies explored how social power affects emotional prosody recognition. In a correlational study (Study 1) and an experimental study (Study 2), we show that high power is associated with lower accuracy in emotional prosody recognition than low power. These results, for the first time, suggest that individuals experiencing high or low power perceive emotional tone of voice differently. (c) 2016 APA, all rights reserved.

  12. Auditory emotion recognition impairments in Schizophrenia: Relationship to acoustic features and cognition

    PubMed Central

    Gold, Rinat; Butler, Pamela; Revheim, Nadine; Leitman, David; Hansen, John A.; Gur, Ruben; Kantrowitz, Joshua T.; Laukka, Petri; Juslin, Patrik N.; Silipo, Gail S.; Javitt, Daniel C.

    2013-01-01

    Objective Schizophrenia is associated with deficits in the ability to perceive emotion based upon tone of voice. The basis for this deficit, however, remains unclear, and assessment batteries remain limited. We evaluated performance in schizophrenia on a novel voice emotion recognition battery with well-characterized physical features, relative to impairments in more general emotional and cognitive function. Methods We studied a primary sample of 92 patients relative to 73 controls. Stimuli were characterized according to both intended emotion and physical features (e.g., pitch, intensity) that contributed to the emotional percept. Parallel measures of visual emotion recognition, pitch perception, general cognition, and overall outcome were obtained. More limited measures were obtained in an independent replication sample of 36 patients, 31 age-matched controls, and 188 general comparison subjects. Results Patients showed significant, large effect size deficits in voice emotion recognition (F=25.4, p<.00001, d=1.1), and were preferentially impaired in recognition of emotion based upon pitch features, but not intensity features (group X feature interaction: F=7.79, p=.006). Emotion recognition deficits were significantly correlated with pitch perception impairments both across (r=.56, p<.0001) and within (r=.47, p<.0001) group. Path analysis showed both sensory-specific and general cognitive contributions to auditory emotion recognition deficits in schizophrenia. Similar patterns of results were observed in the replication sample. Conclusions The present study demonstrates impairments in auditory emotion recognition in schizophrenia relative to acoustic features of underlying stimuli. Furthermore, it provides tools and highlights the need for greater attention to physical features of stimuli used for the study of social cognition in neuropsychiatric disorders. PMID:22362394

  13. The influence of nationality on the accuracy of face and voice recognition.

    PubMed

    Doty, N D

    1998-01-01

    Sixty English and U.S. citizens were tested to determine the effect of nationality on accuracy in recognizing previously witnessed faces and voices. Subjects viewed a frontal facial photograph and were then asked to select that face from a set of 10 oblique facial photographs. Subjects listened to a recorded voice and were then asked to select the same voice from a set of 10 voice recordings. This process was repeated 7 more times, such that subjects identified a male and female face and voice from England, France, Belize, and the United States. Subjects demonstrated better accuracy recognizing the faces and voices of their own nationality. Subgroup analyses further supported the other-nationality effect as well as the previously documented other-race effect.

  14. Assessment of voice, speech, and related quality of life in advanced head and neck cancer patients 10-years+ after chemoradiotherapy.

    PubMed

    Kraaijenga, S A C; Oskam, I M; van Son, R J J H; Hamming-Vrieze, O; Hilgers, F J M; van den Brekel, M W M; van der Molen, L

    2016-04-01

    Assessment of long-term objective and subjective voice, speech, articulation, and quality of life in patients with head and neck cancer (HNC) treated with concurrent chemoradiotherapy (CRT) for advanced, stage IV disease. Twenty-two disease-free survivors, treated with cisplatin-based CRT for inoperable HNC (1999-2004), were evaluated 10 years post-treatment. A standard Dutch text was recorded. Perceptual analysis of voice, speech, and articulation was conducted by two expert listeners (SLPs). An experimental expert system based on automatic speech recognition was also used. Patients' perception of voice and speech and related quality of life was assessed with the Voice Handicap Index (VHI) and Speech Handicap Index (SHI) questionnaires. At a median follow-up of 11 years, perceptual evaluation showed abnormal scores in up to 64% of cases, depending on the outcome parameter analyzed. Automatic assessment of voice and speech parameters correlated moderately to strongly with perceptual outcome scores. Patient-reported problems with voice (VHI>15) and speech (SHI>6) in daily life were present in 68% and 77% of patients, respectively. Patients treated with IMRT showed significantly less impairment than those treated with conventional radiotherapy. More than 10 years after organ-preservation treatment, voice and speech problems are common in this patient cohort, as assessed with perceptual evaluation, automatic speech recognition, and validated structured questionnaires. There were fewer complaints in patients treated with IMRT than with conventional radiotherapy. Copyright © 2016 Elsevier Ltd. All rights reserved.

  15. DigitalVHI--a freeware open-source software application to capture the Voice Handicap Index and other questionnaire data in various languages.

    PubMed

    Herbst, Christian T; Oh, Jinook; Vydrová, Jitka; Švec, Jan G

    2015-07-01

    In this short report we introduce DigitalVHI, a free open-source software application for obtaining Voice Handicap Index (VHI) and other questionnaire data, which can be installed on clinic computers and used in clinical practice. The software can simplify clinical studies, since it makes VHI scores directly available for analysis in digital form. It can be downloaded from http://www.christian-herbst.org/DigitalVHI/.

  16. Interpreting Chicken-Scratch: Lexical Access for Handwritten Words

    ERIC Educational Resources Information Center

    Barnhart, Anthony S.; Goldinger, Stephen D.

    2010-01-01

    Handwritten word recognition is a field of study that has largely been neglected in the psychological literature, despite its prevalence in society. Whereas studies of spoken word recognition almost exclusively employ natural, human voices as stimuli, studies of visual word recognition use synthetic typefaces, thus simplifying the process of word…

  17. Towards Real-Time Speech Emotion Recognition for Affective E-Learning

    ERIC Educational Resources Information Center

    Bahreini, Kiavash; Nadolski, Rob; Westera, Wim

    2016-01-01

    This paper presents the voice emotion recognition part of the FILTWAM framework for real-time emotion recognition in affective e-learning settings. FILTWAM (Framework for Improving Learning Through Webcams And Microphones) intends to offer timely and appropriate online feedback based upon learner's vocal intonations and facial expressions in order…

  18. A self-teaching image processing and voice-recognition-based, intelligent and interactive system to educate visually impaired children

    NASA Astrophysics Data System (ADS)

    Iqbal, Asim; Farooq, Umar; Mahmood, Hassan; Asad, Muhammad Usman; Khan, Akrama; Atiq, Hafiz Muhammad

    2010-02-01

    A self-teaching image processing and voice recognition based system is developed to educate visually impaired children, chiefly in their primary education. The system comprises a computer, a vision camera, an ear speaker and a microphone. The camera, attached to the computer, is mounted on the ceiling opposite (at the required angle to) the desk on which the book is placed. Sample images and voices, in the form of instructions and commands for English and Urdu alphabets, numeric digits, operators and shapes, are already stored in the database. A blind child first reads the embossed character (object) with the help of fingers, then speaks the answer (the name of the character, shape, etc.) into the microphone. On the voice command of the blind child received by the microphone, an image is taken by the camera and processed by a MATLAB® program, developed with the help of the Image Acquisition and Image Processing toolboxes, which generates a response or the required set of instructions for the child via the ear speaker, resulting in self-education of the visually impaired child. The speech recognition program is also developed in MATLAB® with the help of the Data Acquisition and Signal Processing toolboxes, and records and processes the commands of the blind child.

  19. Cultural in-group advantage: emotion recognition in African American and European American faces and voices.

    PubMed

    Wickline, Virginia B; Bailey, Wendy; Nowicki, Stephen

    2009-03-01

    The authors explored whether there were in-group advantages in emotion recognition of faces and voices by culture or geographic region. Participants were 72 African American students (33 men, 39 women), 102 European American students (30 men, 72 women), 30 African international students (16 men, 14 women), and 30 European international students (15 men, 15 women). The participants determined emotions in African American and European American faces and voices. Results showed an in-group advantage, sometimes by culture, less often by race, in recognizing facial and vocal emotional expressions. African international students were generally less accurate at interpreting American nonverbal stimuli than were European American, African American, and European international peers. Results suggest that, although partly universal, emotional expressions have subtle differences across cultures that persons must learn.

  20. Progressive Associative Phonagnosia: A Neuropsychological Analysis

    ERIC Educational Resources Information Center

    Hailstone, Julia C.; Crutch, Sebastian J.; Vestergaard, Martin D.; Patterson, Roy D.; Warren, Jason D.

    2010-01-01

    There are few detailed studies of impaired voice recognition, or phonagnosia. Here we describe two patients with progressive phonagnosia in the context of frontotemporal lobar degeneration. Patient QR presented with behavioural decline and increasing difficulty recognising familiar voices, while patient KL presented with progressive prosopagnosia.…

  1. Voice Enabled Framework to Support Post-Surgical Discharge Monitoring

    PubMed Central

    Blansit, Kevin; Marmor, Rebecca; Zhao, Beiqun; Tien, Dan

    2017-01-01

    Unplanned surgical readmissions pose a challenging problem for the American healthcare system. We propose to combine consumer electronic voice recognition technology with the FHIR standard to create a post-surgical discharge monitoring app to identify and alert physicians to a patient’s deteriorating status. PMID:29854267

  2. Cerebral Processing of Voice Gender Studied Using a Continuous Carryover fMRI Design

    PubMed Central

    Pernet, Cyril; Latinus, Marianne; Crabbe, Frances; Belin, Pascal

    2013-01-01

    Normal listeners effortlessly determine a person's gender by voice, but the cerebral mechanisms underlying this ability remain unclear. Here, we demonstrate 2 stages of cerebral processing during voice gender categorization. Using voice morphing along with an adaptation-optimized functional magnetic resonance imaging design, we found that secondary auditory cortex, including the anterior part of the temporal voice areas in the right hemisphere, responded primarily to acoustical distance from the previously heard stimulus. In contrast, a network of bilateral regions involving inferior prefrontal and anterior and posterior cingulate cortex reflected perceived stimulus ambiguity. These findings suggest that voice gender recognition involves neuronal populations along the auditory ventral stream responsible for auditory feature extraction, functioning in concert with the prefrontal cortex in voice gender perception. PMID:22490550

  3. V2S: Voice to Sign Language Translation System for Malaysian Deaf People

    NASA Astrophysics Data System (ADS)

    Mean Foong, Oi; Low, Tang Jung; La, Wai Wan

    The process of learning and understanding sign language may be cumbersome to some, and therefore this paper proposes a solution to the problem by providing a voice (English language) to sign language translation system using speech and image processing techniques. Speech processing, which includes speech recognition, is the study of recognizing the words being spoken regardless of who the speaker is. This project uses template-based recognition as the main approach, in which the V2S system first needs to be trained with speech patterns based on a generic spectral parameter set. These spectral parameter sets are then stored as templates in a database. The system performs recognition by matching the parameter set of the input speech against the stored templates, and finally displays the sign language in video format. Empirical results show that the system has an 80.3% recognition rate.
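
    Template-based recognition of this kind reduces to a nearest-template search over the stored spectral parameter sets. The paper does not specify its matching score, so the toy Python sketch below uses cosine similarity as one plausible choice; the vectors and vocabulary are hypothetical.

      import numpy as np

      def match_template(params, templates):
          """Return the stored word whose spectral parameter vector is most
          similar (by cosine similarity) to the input parameters."""
          def cosine(u, v):
              return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
          return max(templates, key=lambda w: cosine(params, templates[w]))

      templates = {"hello": np.array([0.9, 0.1, 0.3]),
                   "thanks": np.array([0.2, 0.8, 0.5])}
      print(match_template(np.array([0.85, 0.15, 0.25]), templates))  # hello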

  4. The voice conveys emotion in ten globalized cultures and one remote village in Bhutan.

    PubMed

    Cordaro, Daniel T; Keltner, Dacher; Tshering, Sumjay; Wangchuk, Dorji; Flynn, Lisa M

    2016-02-01

    With data from 10 different globalized cultures and 1 remote, isolated village in Bhutan, we examined universals and cultural variations in the recognition of 16 nonverbal emotional vocalizations. College students in 10 nations (Study 1) and villagers in remote Bhutan (Study 2) were asked to match emotional vocalizations to 1-sentence stories of the same valence. Guided by previous conceptualizations of recognition accuracy, across both studies, 7 of the 16 vocal burst stimuli were found to have strong or very strong recognition in all 11 cultures, 6 vocal bursts were found to have moderate recognition, and 4 were not universally recognized. All vocal burst stimuli varied significantly in terms of the degree to which they were recognized across the 11 cultures. Our discussion focuses on the implications of these results for current debates concerning the emotion conveyed in the voice. (c) 2016 APA, all rights reserved.

  5. A study of voice production characteristics of astronaut speech during Apollo 11 for speaker modeling in space.

    PubMed

    Yu, Chengzhu; Hansen, John H L

    2017-03-01

    Human physiology has evolved to accommodate environmental conditions, including temperature, pressure, and air chemistry unique to Earth. However, the environment in space varies significantly compared to that on Earth and, therefore, variability is expected in astronauts' speech production mechanism. In this study, the variations of astronaut voice characteristics during the NASA Apollo 11 mission are analyzed. Specifically, acoustical features such as fundamental frequency and phoneme formant structure that are closely related to the speech production system are studied. For a further understanding of astronauts' vocal tract spectrum variation in space, a maximum likelihood frequency warping based analysis is proposed to detect the vocal tract spectrum displacement during space conditions. The results from fundamental frequency, formant structure, as well as vocal spectrum displacement indicate that astronauts change their speech production mechanism when in space. Moreover, the experimental results for astronaut voice identification tasks indicate that current speaker recognition solutions are highly vulnerable to astronaut voice production variations in space conditions. Future recommendations from this study suggest that successful applications of speaker recognition during extended space missions require robust speaker modeling techniques that could effectively adapt to voice production variation caused by diverse space conditions.
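
    The frequency-warping analysis can be pictured as a search over a warp factor that best aligns a measured spectrum with a reference, the winning factor indicating the vocal tract spectrum displacement. A simplified Python sketch in which squared spectral error stands in for the paper's maximum-likelihood criterion:

      import numpy as np

      def best_warp(spec, ref, alphas=np.linspace(0.88, 1.12, 25)):
          """Grid-search a linear frequency-warp factor that minimizes the
          distance between the warped spectrum and a reference spectrum."""
          bins = np.arange(len(spec), dtype=float)
          def warp(s, a):  # resample the spectrum at warped bin positions
              return np.interp(np.clip(bins * a, 0, len(s) - 1), bins, s)
          errors = [np.sum((warp(spec, a) - ref) ** 2) for a in alphas]
          return alphas[int(np.argmin(errors))]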

  6. DTO-675: Voice Control of the Closed Circuit Television System

    NASA Technical Reports Server (NTRS)

    Salazar, George; Gaston, Darilyn M.; Haynes, Dena S.

    1996-01-01

    This report presents the results of the Detail Test Object (DTO)-675 "Voice Control of the Closed Circuit Television (CCTV)" system. The DTO is a follow-on flight of the Voice Command System (VCS) that flew as a secondary payload on STS-41. Several design changes were made to the VCS for the STS-78 mission. This report discusses those design changes, the data collected during the mission, recognition problems encountered, and findings.

  7. Hands-free human-machine interaction with voice

    NASA Astrophysics Data System (ADS)

    Juang, B. H.

    2004-05-01

    Voice is a natural communication interface between a human and a machine. The machine, when placed in today's communication networks, may be configured to provide automation to save substantial operating cost, as demonstrated in AT&T's VRCP (Voice Recognition Call Processing), or to facilitate intelligent services, such as virtual personal assistants, to enhance individual productivity. These intelligent services often need to be accessible anytime, anywhere (e.g., in cars when the user is in a hands-busy-eyes-busy situation, or during meetings where constantly talking to a microphone is either undesirable or impossible), and thus call for advanced signal processing and automatic speech recognition techniques which support what we call "hands-free" human-machine communication. These techniques entail a broad spectrum of technical ideas, ranging from the use of directional microphones and acoustic echo cancellation to robust speech recognition. In this talk, we highlight a number of key techniques that were developed for hands-free human-machine communication in the mid-1990s after Bell Labs became a unit of Lucent Technologies. A video clip will be played to demonstrate the accomplishment.
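
    Acoustic echo cancellation, one of the techniques named above, is classically implemented with a normalized-LMS adaptive filter that learns the echo path from the far-end (loudspeaker) signal and subtracts the echo estimate from the microphone signal. A compact Python sketch; the tap count and step size are illustrative values.

      import numpy as np

      def nlms_echo_cancel(far, mic, taps=128, mu=0.5, eps=1e-8):
          """Normalized-LMS echo canceller: adapt FIR weights so that the
          filtered far-end signal tracks the echo in the microphone signal."""
          w = np.zeros(taps)
          out = np.zeros(len(mic))
          for n in range(taps, len(mic)):
              x = far[n - taps:n][::-1]        # most recent far-end samples
              e = mic[n] - w @ x               # residual after echo estimate
              w += mu * e * x / (x @ x + eps)  # NLMS weight update
              out[n] = e
          return out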

  8. Military applications of automatic speech recognition and future requirements

    NASA Technical Reports Server (NTRS)

    Beek, Bruno; Cupples, Edward J.

    1977-01-01

    An updated summary of the state-of-the-art of automatic speech recognition and its relevance to military applications is provided. A number of potential systems for military applications are under development. These include: (1) digital narrowband communication systems; (2) automatic speech verification; (3) on-line cartographic processing unit; (4) word recognition for militarized tactical data system; and (5) voice recognition and synthesis for aircraft cockpit.

  10. Voice Technologies in Libraries: A Look into the Future.

    ERIC Educational Resources Information Center

    Lange, Holley R., Ed.; And Others

    1991-01-01

    Discussion of synthesized speech and voice recognition focuses on a forum that addressed the potential for speech technologies in libraries. Topics discussed by three contributors include possible library applications in technical processing, book receipt, circulation control, and database access; use by disabled and illiterate users; and problems…

  11. Objective voice and speech analysis of persons with chronic hoarseness by prosodic analysis of speech samples.

    PubMed

    Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar

    2016-10-01

    Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
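
    The human-machine correlations reported here come from regressing perceptual ratings onto automatically extracted features, then correlating predictions with held-out ratings. A minimal Python sketch of that train/test procedure; ordinary least squares stands in for whatever regression the authors used.

      import numpy as np

      def fit_and_correlate(X_train, y_train, X_test, y_test):
          """Least-squares mapping from prosodic/recognition features to
          perceptual ratings, scored by human-machine correlation r."""
          A = np.column_stack([X_train, np.ones(len(X_train))])  # add bias term
          coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
          pred = np.column_stack([X_test, np.ones(len(X_test))]) @ coef
          return coef, np.corrcoef(pred, y_test)[0, 1]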

  12. The "VoiceForum" Platform for Spoken Interaction

    ERIC Educational Resources Information Center

    Fynn, John; Wigham, Chiara R.

    2011-01-01

    Showcased in the courseware exhibition, "VoiceForum" is a web-based software platform for asynchronous learner interaction in threaded discussions using voice and text. A dedicated space is provided for the tutor who can give feedback on a posted message and dialogue with the participants at a separate level from the main interactional…

  13. Real time analysis of voiced sounds

    NASA Technical Reports Server (NTRS)

    Hong, J. P. (Inventor)

    1976-01-01

    A power spectrum analysis of the harmonic content of a voiced sound signal is conducted in real time by phase-lock-loop tracking of the fundamental frequency f0 of the signal and successive harmonics h1 through hn of the fundamental frequency. The analysis also includes measuring the quadrature power and phase of each frequency tracked, differentiating the power measurements of the harmonics in adjacent pairs, and analyzing successive differentials to determine peak power points in the power spectrum for display or use in analysis of voiced sound, such as for voice recognition.
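
    Measuring quadrature power and phase at each tracked frequency amounts to correlating the signal with a cosine and a sine at that frequency, lock-in style. A simplified Python sketch over a single analysis frame, with the patent's hardware tracking loop replaced by a fixed f0 for illustration:

      import numpy as np

      def harmonic_power_phase(x, fs, f0, n_harmonics=10):
          """Quadrature (I/Q) power and phase of f0 and its harmonics."""
          t = np.arange(len(x)) / fs
          features = []
          for k in range(1, n_harmonics + 1):
              i = np.mean(x * np.cos(2 * np.pi * k * f0 * t))  # in-phase
              q = np.mean(x * np.sin(2 * np.pi * k * f0 * t))  # quadrature
              features.append((i * i + q * q, np.arctan2(q, i)))
          return features  # list of (power, phase) per harmonic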

  14. Voice Response System Statistics Program : Operational Handbook.

    DOT National Transportation Integrated Search

    1980-06-01

    This report documents the Voice Response System (VRS) Statistics Program developed for the preflight weather briefing VRS. It describes the VRS statistical report format and contents, the software program structure, and the program operation.

  15. Neural Processing of Vocal Emotion and Identity

    ERIC Educational Resources Information Center

    Spreckelmeyer, Katja N.; Kutas, Marta; Urbach, Thomas; Altenmuller, Eckart; Munte, Thomas F.

    2009-01-01

    The voice is a marker of a person's identity which allows individual recognition even if the person is not in sight. Listening to a voice also affords inferences about the speaker's emotional state. Both these types of personal information are encoded in characteristic acoustic feature patterns analyzed within the auditory cortex. In the present…

  16. Voice Interactive Analysis System Study. Final Report, August 28, 1978 through March 23, 1979.

    ERIC Educational Resources Information Center

    Harry, D. P.; And Others

    The Voice Interactive Analysis System study continued research and development of the LISTEN real-time, minicomputer-based connected speech recognition system, within NAVTRAEQUIPCEN's program of developing automatic speech technology in support of training. An attempt was made to identify the most effective features detected by the TTI-500 model…

  17. Social support and substitute voice acquisition on psychological adjustment among patients after laryngectomy.

    PubMed

    Kotake, Kumiko; Suzukamo, Yoshimi; Kai, Ichiro; Iwanaga, Kazuyo; Takahashi, Aya

    2017-03-01

    The objective is to clarify whether social support and acquisition of alternative voice enhance the psychological adjustment of laryngectomized patients and which part of the psychological adjustment structure would be influenced by social support. We contacted 1445 patients enrolled in a patient association using mail surveys and 679 patients agreed to participate in the study. The survey items included age, sex, occupation, post-surgery duration, communication method, psychological adjustment (by the Nottingham Adjustment Scale Japanese Laryngectomy Version: NAS-J-L), and the formal support (by Hospital Patient Satisfaction Questionnaire-25: HPSQ-25). Social support and communication methods were added to the three-tier structural model of psychological adjustment shown in our previous study, and a covariance structure analysis was conducted. Formal/informal supports and acquisition of alternative voice influence only the "recognition of oneself as voluntary agent", the first tier of the three-tier structure of psychological adjustment. The results suggest that social support and acquisition of alternative voice may enhance the recognition of oneself as voluntary agent and promote the psychological adjustment.

  18. A Comparison of Text, Voice, and Screencasting Feedback to Online Students

    ERIC Educational Resources Information Center

    Orlando, John

    2016-01-01

    The emergence of simple video and voice recording software has allowed faculty to deliver online course content in a variety of rich formats. But most faculty are still using traditional text comments for feedback to students. The author launched a study comparing student and faculty perceptions of text, voice, and screencasting feedback. The…

  19. Visual face-movement sensitive cortex is relevant for auditory-only speech recognition.

    PubMed

    Riedel, Philipp; Ragert, Patrick; Schelinski, Stefanie; Kiebel, Stefan J; von Kriegstein, Katharina

    2015-07-01

    It is commonly assumed that the recruitment of visual areas during audition is not relevant for performing auditory tasks ('auditory-only view'). According to an alternative view, however, the recruitment of visual cortices is thought to optimize auditory-only task performance ('auditory-visual view'). This alternative view is based on functional magnetic resonance imaging (fMRI) studies. These studies have shown, for example, that even if there is only auditory input available, face-movement sensitive areas within the posterior superior temporal sulcus (pSTS) are involved in understanding what is said (auditory-only speech recognition). This is particularly the case when speakers are known audio-visually, that is, after brief voice-face learning. Here we tested whether the left pSTS involvement is causally related to performance in auditory-only speech recognition when speakers are known by face. To test this hypothesis, we applied cathodal transcranial direct current stimulation (tDCS) to the pSTS during (i) visual-only speech recognition of a speaker known only visually to participants and (ii) auditory-only speech recognition of speakers they learned by voice and face. We defined the cathode as the active electrode to down-regulate cortical excitability by hyperpolarization of neurons. tDCS to the pSTS interfered with visual-only speech recognition performance compared to a control group without pSTS stimulation (tDCS to BA6/44 or sham). Critically, compared to controls, pSTS stimulation additionally decreased auditory-only speech recognition performance selectively for voice-face learned speakers. These results are important in two ways. First, they provide direct evidence that the pSTS is causally involved in visual-only speech recognition; this confirms a long-standing prediction of current face-processing models. Second, they show that visual face-sensitive pSTS is causally involved in optimizing auditory-only speech recognition. These results are in line with the 'auditory-visual view' of auditory speech perception, which assumes that auditory speech recognition is optimized by using predictions from previously encoded speaker-specific audio-visual internal models. Copyright © 2015 Elsevier Ltd. All rights reserved.

  20. Valuing autonomy, struggling for an identity and a collective voice, and seeking role recognition: community mental health nurses' perceptions of their roles.

    PubMed

    White, Jane H; Kudless, Mary

    2008-10-01

    Leaders in this community mental health system approached the problem of job frustration, morale issues, and turnover concerns of their Community Mental Health Nurses (CMHNs) by designing a qualitative study using Participant Action Research (PAR) methodology based on the philosophy of Habermas. Six focus groups were conducted to address the nurses' concerns. The themes of Valuing Autonomy, Struggling for an Identity and Collective Voice, and Seeking Role Recognition best explained the participants' concerns. The study concluded with an action plan, the implementation of the plan, and a discussion of the plan's final outcomes.

  1. Design of a digital voice data compression technique for orbiter voice channels

    NASA Technical Reports Server (NTRS)

    1975-01-01

    Candidate techniques were investigated for digital voice compression to a transmission rate of 8 kbps. Good voice quality, speaker recognition, and robustness in the presence of error bursts were considered. The technique of delayed-decision adaptive predictive coding is described and compared with conventional adaptive predictive coding. Results include a set of experimental simulations recorded on analog tape. The two FM broadcast segments produced show the delayed-decision technique to be virtually undegraded or minimally degraded at .001 and .01 Viterbi decoder bit error rates. Preliminary estimates of the hardware complexity of this technique indicate potential for implementation in space shuttle orbiters.
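
    Adaptive predictive coding rests on short-term linear prediction: each frame is modeled by coefficients that predict every sample from its predecessors, so only the smaller prediction residual need be quantized and transmitted. A Python sketch of the standard autocorrelation/Levinson-Durbin computation (the delayed-decision search itself is well beyond a few lines):

      import numpy as np

      def lpc_coefficients(frame, order=10):
          """Linear-prediction coefficients a[0..order] (a[0] = 1) via the
          autocorrelation method and the Levinson-Durbin recursion."""
          r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
          a = np.zeros(order + 1)
          a[0] = 1.0
          err = r[0] + 1e-12
          for i in range(1, order + 1):
              k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coeff
              a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1][:i]     # order update
              err *= 1.0 - k * k                                 # residual energy
          return a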

  2. Voiced Excitations

    DTIC Science & Technology

    2004-12-01

    [Fragmentary DTIC record on radar/EM sensing of voiced speech excitations. Recoverable citations: "New Ideas for Speech Recognition and Related Technologies", Lawrence Livermore National Laboratory Report UCRL-UR-120310, 1995; Holzrichter, J.F., Kobler, J.B., Rosowski, J.J., and Burke, G.J. (2003), "EM wave…", Lawrence Livermore Laboratory report UCRL-JC-134775.]

  3. Children's Recognition of Their Own Recorded Voice: Influence of Age and Phonological Impairment

    ERIC Educational Resources Information Center

    Strombergsson, Sofia

    2013-01-01

    Children with phonological impairment (PI) often have difficulties perceiving insufficiencies in their own speech. The use of recordings has been suggested as a way of directing the child's attention toward his/her own speech, despite a lack of evidence that children actually recognize their recorded voice as their own. We present two studies of…

  4. Vocal Identity Recognition in Autism Spectrum Disorder

    PubMed Central

    Lin, I-Fan; Yamada, Takashi; Komine, Yoko; Kato, Nobumasa; Kato, Masaharu; Kashino, Makio

    2015-01-01

    Voices can convey information about a speaker. When forming an abstract representation of a speaker, it is important to extract relevant features from acoustic signals that are invariant to the modulation of these signals. This study investigated the way in which individuals with autism spectrum disorder (ASD) recognize and memorize vocal identity. The ASD group and control group performed similarly in a task when asked to choose the name of the newly-learned speaker based on his or her voice, and the ASD group outperformed the control group in a subsequent familiarity test when asked to discriminate the previously trained voices and untrained voices. These findings suggest that individuals with ASD recognized and memorized voices as well as the neurotypical individuals did, but they categorized voices in a different way: individuals with ASD categorized voices quantitatively based on the exact acoustic features, while neurotypical individuals categorized voices qualitatively based on the acoustic patterns correlated to the speakers' physical and mental properties. PMID:26070199

  5. Vocal Identity Recognition in Autism Spectrum Disorder.

    PubMed

    Lin, I-Fan; Yamada, Takashi; Komine, Yoko; Kato, Nobumasa; Kato, Masaharu; Kashino, Makio

    2015-01-01

    Voices can convey information about a speaker. When forming an abstract representation of a speaker, it is important to extract relevant features from acoustic signals that are invariant to the modulation of these signals. This study investigated the way in which individuals with autism spectrum disorder (ASD) recognize and memorize vocal identity. The ASD group and control group performed similarly in a task when asked to choose the name of the newly-learned speaker based on his or her voice, and the ASD group outperformed the control group in a subsequent familiarity test when asked to discriminate the previously trained voices and untrained voices. These findings suggest that individuals with ASD recognized and memorized voices as well as the neurotypical individuals did, but they categorized voices in a different way: individuals with ASD categorized voices quantitatively based on the exact acoustic features, while neurotypical individuals categorized voices qualitatively based on the acoustic patterns correlated to the speakers' physical and mental properties.

  6. The electronic, 'paperless' medical office; has it arrived?

    PubMed

    Gates, P; Urquhart, J

    2007-02-01

    Modern information technology offers efficiencies in medical practice, with a reduction in secretarial time in maintaining, filing and retrieving the paper medical record. Electronic requesting of investigations allows tracking of outstanding results. Less storage space is required and telephone calls from pharmacies, pathology and medical imaging service providers to clarify the hand-written request are abolished. Voice recognition software reduces secretarial typing time per letter. These combined benefits can lead to significantly reduced costs and improved patient care. The paperless office is possible, but requires commitment and training of all staff; it is preferable but not absolutely essential that at least one member of the practice has an interest and some expertise in computers. More importantly, back-up from information technology providers and back-up of the electronic data are absolutely crucial and a paperless environment should not be considered without them.

  7. Automated speech understanding: the next generation

    NASA Astrophysics Data System (ADS)

    Picone, J.; Ebel, W. J.; Deshmukh, N.

    1995-04-01

    Modern speech understanding systems merge interdisciplinary technologies from Signal Processing, Pattern Recognition, Natural Language, and Linguistics into a unified statistical framework. These systems, which have applications in a wide range of signal processing problems, represent a revolution in Digital Signal Processing (DSP). Once a field dominated by vector-oriented processors and linear algebra-based mathematics, DSP now encompasses systems that rely on sophisticated statistical models implemented using a complex software paradigm. Such systems are now capable of understanding continuous speech input for vocabularies of several thousand words in operational environments. The current generation of deployed systems, based on small vocabularies of isolated words, will soon be replaced by a new technology offering natural language access to vast information resources such as the Internet, and providing completely automated voice interfaces for mundane tasks such as travel planning and directory assistance.

  8. Detecting paroxysmal coughing from pertussis cases using voice recognition technology.

    PubMed

    Parker, Danny; Picone, Joseph; Harati, Amir; Lu, Shuang; Jenkyns, Marion H; Polgreen, Philip M

    2013-01-01

    Pertussis is highly contagious; thus, prompt identification of cases is essential to control outbreaks. Clinicians experienced with the disease can easily identify classic cases, where patients have bursts of rapid coughing followed by gasps, and a characteristic whooping sound. However, many clinicians have never seen a case, and thus may miss initial cases during an outbreak. The purpose of this project was to use voice-recognition software to distinguish pertussis coughs from croup and other coughs. We collected a series of recordings representing pertussis, croup and miscellaneous coughing by children. We manually categorized coughs as either pertussis or non-pertussis, and extracted features for each category. We used Mel-frequency cepstral coefficients (MFCC), a sampling rate of 16 kHz, a frame duration of 25 msec, and a frame rate of 10 msec. The coughs were filtered. Each cough was divided into 3 sections of proportion 3-4-3. The average of the 13 MFCCs for each section was computed and concatenated into a 39-element feature vector used for the classification. We used the following machine learning algorithms: Neural Networks, K-Nearest Neighbor (KNN), and a 200-tree Random Forest (RF). Data were reserved for cross-validation of the KNN and RF. The Neural Network was trained 100 times, and the averaged results are presented. After categorization, we had 16 examples of non-pertussis coughs and 31 examples of pertussis coughs. Over 90% of all pertussis coughs were properly classified as pertussis. The error rates were: Type I errors of 7%, 12%, and 25% and Type II errors of 8%, 0%, and 0%, using the Neural Network, Random Forest, and KNN, respectively. Our results suggest that we can build a robust classifier to assist clinicians and the public to help identify pertussis cases in children presenting with typical symptoms.
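
    The pipeline above maps every cough, whatever its duration, onto a fixed-length vector before classification. Below is a minimal sketch of that pipeline, assuming librosa for MFCC extraction and scikit-learn for the 200-tree Random Forest; the file list `cough_paths` and label array `labels` are hypothetical placeholders.

    ```python
    import numpy as np
    import librosa
    from sklearn.ensemble import RandomForestClassifier

    def cough_features(path, sr=16000):
        """39-element vector: mean of 13 MFCCs over three sections
        of the cough in proportion 3-4-3, as described in the study."""
        y, _ = librosa.load(path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                    n_fft=int(0.025 * sr),       # 25 ms frames
                                    hop_length=int(0.010 * sr))  # 10 ms frame rate
        n = mfcc.shape[1]
        bounds = [0, int(0.3 * n), int(0.7 * n), n]              # 3-4-3 split
        return np.concatenate([mfcc[:, bounds[i]:bounds[i + 1]].mean(axis=1)
                               for i in range(3)])

    # cough_paths (wav files) and labels (pertussis / non-pertussis) are
    # hypothetical training inputs.
    X = np.array([cough_features(p) for p in cough_paths])
    clf = RandomForestClassifier(n_estimators=200).fit(X, labels)
    ```

    Averaging within fixed proportional sections keeps the vector length constant across coughs of different durations, which is what allows a standard classifier to be applied directly.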

  9. Bilingual Computerized Speech Recognition Screening for Depression Symptoms

    ERIC Educational Resources Information Center

    Gonzalez, Gerardo; Carter, Colby; Blanes, Erika

    2007-01-01

    The Voice-Interactive Depression Assessment System (VIDAS) is a computerized speech recognition application for screening depression based on the Center for Epidemiological Studies--Depression scale in English and Spanish. Study 1 included 50 English and 47 Spanish speakers. Study 2 involved 108 English and 109 Spanish speakers. Participants…

  10. The Army word recognition system

    NASA Technical Reports Server (NTRS)

    Hadden, David R.; Haratz, David

    1977-01-01

    The application of speech recognition technology in the Army command and control area is presented. The problems associated with this program are described, as well as its relevance in terms of man/machine interactions, voice inflexions, and the amount of training needed to interact with and utilize the automated system.

  11. Hearing the Unheard: An Interdisciplinary, Mixed Methodology Study of Women's Experiences of Hearing Voices (Auditory Verbal Hallucinations).

    PubMed

    McCarthy-Jones, Simon; Castro Romero, Maria; McCarthy-Jones, Roseline; Dillon, Jacqui; Cooper-Rompato, Christine; Kieran, Kathryn; Kaufman, Milissa; Blackman, Lisa

    2015-01-01

    This paper explores the experiences of women who "hear voices" (auditory verbal hallucinations). We begin by examining historical understandings of women hearing voices, showing that these have been driven by androcentric theories of how women's bodies functioned, leading to women being viewed as requiring that their voices be interpreted by men. We show the twentieth century was associated with recognition that the mental violation of women's minds (represented by some voice-hearing) was often a consequence of the physical violation of women's bodies. We next report the results of a qualitative study into voice-hearing women's experiences (n = 8). This found similarities between women's relationships with their voices and their relationships with others and the wider social context. Finally, we present results from a quantitative study comparing voice-hearing in women (n = 65) and men (n = 132) in a psychiatric setting. Women were more likely than men to have certain forms of voice-hearing (voices conversing) and to have antecedent events of trauma, physical illness, and relationship problems. Voices identified as female may have more positive affect than male voices. We conclude that women voice-hearers have faced, and continue to face, specific challenges necessitating research and activism, and hope this paper will act as a stimulus to such work.

  12. A memory like a female Fur Seal: long-lasting recognition of pup's voice by mothers.

    PubMed

    Mathevon, Nicolas; Charrier, Isabelle; Aubin, Thierry

    2004-06-01

    In colonial mammals like fur seals, mutual vocal recognition between mothers and their pup is of primary importance for breeding success. Females alternate feeding sea-trips with suckling periods on land, and when coming back from the ocean, they have to vocally find their offspring among numerous similar-looking pups. Young fur seals emit a 'mother-attraction call' that presents individual characteristics. In this paper, we review the perceptual process of pup call recognition by Subantarctic Fur Seal Arctocephalus tropicalis mothers. To identify their progeny, females rely on the frequency modulation pattern and spectral features of this call. As the acoustic characteristics of a pup's call change throughout the lactation period due to the growing process, mothers thus have to refine their memorization of their pup's voice. Field experiments show that female Fur Seals are able to retain all the successive versions of their pup's call.

  13. Assessment of Severe Apnoea through Voice Analysis, Automatic Speech, and Speaker Recognition Techniques

    NASA Astrophysics Data System (ADS)

    Fernández Pozo, Rubén; Blanco Murillo, Jose Luis; Hernández Gómez, Luis; López Gonzalo, Eduardo; Alcázar Ramírez, José; Toledano, Doroteo T.

    2009-12-01

    This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry.
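
    A common way to realise the GMM stage described above is to fit one mixture model per class to short-term spectral features and classify a speaker by which model assigns the higher average log-likelihood. Below is a minimal sketch assuming librosa and scikit-learn; the file lists, MFCC features, and component count are illustrative assumptions rather than the study's exact configuration.

    ```python
    import numpy as np
    import librosa
    from sklearn.mixture import GaussianMixture

    def frames(path, sr=16000):
        """Short-term spectral features (MFCC frames) for one recording."""
        y, _ = librosa.load(path, sr=sr)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # (n_frames, 13)

    # apnoea_paths and healthy_paths are hypothetical lists of training recordings.
    gmm_apnoea = GaussianMixture(n_components=16).fit(
        np.vstack([frames(p) for p in apnoea_paths]))
    gmm_healthy = GaussianMixture(n_components=16).fit(
        np.vstack([frames(p) for p in healthy_paths]))

    def classify(path):
        x = frames(path)
        # score() returns the average log-likelihood per frame; higher wins.
        return "apnoea" if gmm_apnoea.score(x) > gmm_healthy.score(x) else "healthy"
    ```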

  14. Voice Identification: Levels-of-Processing and the Relationship between Prior Description Accuracy and Recognition Accuracy.

    ERIC Educational Resources Information Center

    Walter, Todd J.

    A study examined whether a person's ability to accurately identify a voice is influenced by factors similar to those proposed by the Supreme Court for eyewitness identification accuracy. In particular, the Supreme Court has suggested that a person's prior description accuracy of a suspect, degree of attention to a suspect, and confidence in…

  15. Some effects of stress on users of a voice recognition system: A preliminary inquiry

    NASA Astrophysics Data System (ADS)

    French, B. A.

    1983-03-01

    Recent work with Automatic Speech Recognition has focused on applications and productivity considerations in the man-machine interface. This thesis is an attempt to see if placing users of such equipment under time-induced stress has an effect on their percent correct recognition rates. Subjects were given a message-handling task of fixed length and allowed progressively shorter times to attempt to complete it. Questionnaire responses indicate stress levels increased with decreased time-allowance; recognition rates decreased as time was reduced.

  16. Alexa, Siri, Cortana, and More: An Introduction to Voice Assistants.

    PubMed

    Hoy, Matthew B

    2018-01-01

    Voice assistants are software agents that can interpret human speech and respond via synthesized voices. Apple's Siri, Amazon's Alexa, Microsoft's Cortana, and Google's Assistant are the most popular voice assistants and are embedded in smartphones or dedicated home speakers. Users can ask their assistants questions, control home automation devices and media playback via voice, and manage other basic tasks such as email, to-do lists, and calendars with verbal commands. This column will explore the basic workings and common features of today's voice assistants. It will also discuss some of the privacy and security issues inherent to voice assistants and some potential future uses for these devices. As voice assistants become more widely used, librarians will want to be familiar with their operation and perhaps consider them as a means to deliver library services and materials.

  17. Voice control of the space shuttle video system

    NASA Technical Reports Server (NTRS)

    Bejczy, A. K.; Dotson, R. S.; Brown, J. W.; Lewis, J. L.

    1981-01-01

    A pilot voice control system developed at the Jet Propulsion Laboratory (JPL) to test and evaluate the feasibility of controlling the shuttle TV cameras and monitors by voice commands utilizes a commercially available discrete word speech recognizer which can be trained to the individual utterances of each operator. Successful ground tests were conducted using a simulated full-scale space shuttle manipulator. The test configuration involved berthing, maneuvering, and deploying a simulated science payload in the shuttle bay. The handling task typically required 15 to 20 minutes and 60 to 80 commands to 4 TV cameras and 2 TV monitors. The best test runs show 96 to 100 percent voice recognition accuracy.

  18. The role of voice input for human-machine communication.

    PubMed Central

    Cohen, P R; Oviatt, S L

    1995-01-01

    Optimism is growing that the near future will witness rapid growth in human-computer interaction using voice. System prototypes have recently been built that demonstrate speaker-independent real-time speech recognition, and understanding of naturally spoken utterances with vocabularies of 1000 to 2000 words, and larger. Already, computer manufacturers are building speech recognition subsystems into their new product lines. However, before this technology can be broadly useful, a substantial knowledge base is needed about human spoken language and performance during computer-based spoken interaction. This paper reviews application areas in which spoken interaction can play a significant role, assesses potential benefits of spoken interaction with machines, and compares voice with other modalities of human-computer interaction. It also discusses information that will be needed to build a firm empirical foundation for the design of future spoken and multimodal interfaces. Finally, it argues for a more systematic and scientific approach to investigating spoken input and performance with future language technology. PMID:7479803

  19. Sound specificity effects in spoken word recognition: The effect of integrality between words and sounds.

    PubMed

    Strori, Dorina; Zaar, Johannes; Cooke, Martin; Mattys, Sven L

    2018-01-01

    Recent evidence has shown that nonlinguistic sounds co-occurring with spoken words may be retained in memory and affect later retrieval of the words. This sound-specificity effect shares many characteristics with the classic voice-specificity effect. In this study, we argue that the sound-specificity effect is conditional upon the context in which the word and sound coexist. Specifically, we argue that, besides co-occurrence, integrality between words and sounds is a crucial factor in the emergence of the effect. In two recognition-memory experiments, we compared the emergence of voice and sound specificity effects. In Experiment 1, we examined two conditions where integrality is high. Namely, the classic voice-specificity effect (Exp. 1a) was compared with a condition in which the intensity envelope of a background sound was modulated along the intensity envelope of the accompanying spoken word (Exp. 1b). Results revealed a robust voice-specificity effect and, critically, a comparable sound-specificity effect: A change in the paired sound from exposure to test led to a decrease in word-recognition performance. In the second experiment, we sought to disentangle the contribution of integrality from a mere co-occurrence context effect by removing the intensity modulation. The absence of integrality led to the disappearance of the sound-specificity effect. Taken together, the results suggest that the assimilation of background sounds into memory cannot be reduced to a simple context effect. Rather, it is conditioned by the extent to which words and sounds are perceived as integral as opposed to distinct auditory objects.

  20. Effects of Within-Talker Variability on Speech Intelligibility in Mandarin-Speaking Adult and Pediatric Cochlear Implant Patients

    PubMed Central

    Su, Qiaotong; Galvin, John J.; Zhang, Guoping; Li, Yongxin

    2016-01-01

    Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users. PMID:27363714

  1. Multimodal approaches for emotion recognition: a survey

    NASA Astrophysics Data System (ADS)

    Sebe, Nicu; Cohen, Ira; Gevers, Theo; Huang, Thomas S.

    2004-12-01

    Recent technological advances have enabled human users to interact with computers in ways previously unimaginable. Beyond the confines of the keyboard and mouse, new modalities for human-computer interaction such as voice, gesture, and force-feedback are emerging. Despite important advances, one necessary ingredient for natural interaction is still missing: emotions. Emotions play an important role in human-to-human communication and interaction, allowing people to express themselves beyond the verbal domain. The ability to understand human emotions is desirable for the computer in several applications. This paper explores new ways of human-computer interaction that enable the computer to be more aware of the user's emotional and attentional expressions. We present the basic research in the field and the recent advances in emotion recognition from facial, voice, and physiological signals, where the different modalities are treated independently. We then describe the challenging problem of multimodal emotion recognition and we advocate the use of probabilistic graphical models when fusing the different modalities. We also discuss the difficult issues of obtaining reliable affective data, obtaining ground truth for emotion recognition, and the use of unlabeled data.

  2. Multimodal approaches for emotion recognition: a survey

    NASA Astrophysics Data System (ADS)

    Sebe, Nicu; Cohen, Ira; Gevers, Theo; Huang, Thomas S.

    2005-01-01

    Recent technological advances have enabled human users to interact with computers in ways previously unimaginable. Beyond the confines of the keyboard and mouse, new modalities for human-computer interaction such as voice, gesture, and force-feedback are emerging. Despite important advances, one necessary ingredient for natural interaction is still missing: emotions. Emotions play an important role in human-to-human communication and interaction, allowing people to express themselves beyond the verbal domain. The ability to understand human emotions is desirable for the computer in several applications. This paper explores new ways of human-computer interaction that enable the computer to be more aware of the user's emotional and attentional expressions. We present the basic research in the field and the recent advances in emotion recognition from facial, voice, and physiological signals, where the different modalities are treated independently. We then describe the challenging problem of multimodal emotion recognition and we advocate the use of probabilistic graphical models when fusing the different modalities. We also discuss the difficult issues of obtaining reliable affective data, obtaining ground truth for emotion recognition, and the use of unlabeled data.

  3. Voice gender discrimination provides a measure of more than pitch-related perception in cochlear implant users.

    PubMed

    Li, Tianhao; Fu, Qian-Jie

    2011-08-01

    (1) To investigate whether voice gender discrimination (VGD) could be a useful indicator of the spectral and temporal processing abilities of individual cochlear implant (CI) users; (2) To examine the relationship between VGD and speech recognition with CI when comparable acoustic cues are used for both perception processes. VGD was measured using two talker sets with different inter-gender fundamental frequencies (F0), as well as different acoustic CI simulations. Vowel and consonant recognition in quiet and noise were also measured and compared with VGD performance. Participants were eleven postlingually deaf CI users. The results showed that (1) mean VGD performance differed for different stimulus sets, (2) VGD and speech recognition performance varied among individual CI users, and (3) individual VGD performance was significantly correlated with speech recognition performance under certain conditions. VGD measured with selected stimulus sets might be useful for assessing not only pitch-related perception, but also spectral and temporal processing by individual CI users. In addition to improvements in spectral resolution and modulation detection, the improvement in higher modulation frequency discrimination might be particularly important for CI users in noisy environments.

  4. The Bangor Voice Matching Test: A standardized test for the assessment of voice perception ability.

    PubMed

    Mühl, Constanze; Sheil, Orla; Jarutytė, Lina; Bestelmeyer, Patricia E G

    2017-11-09

    Recognising the identity of conspecifics is an important yet highly variable skill. Approximately 2% of the population suffers from a socially debilitating deficit in face recognition. More recently, evidence has emerged of a similar deficit in voice perception (phonagnosia). Face perception tests have been readily available for years, advancing our understanding of underlying mechanisms in face perception. In contrast, voice perception has received less attention, and the construction of standardized voice perception tests has been neglected. Here we report the construction of the first standardized test for voice perception ability. Participants make a same/different identity decision after hearing two voice samples. Item Response Theory guided item selection to ensure the test discriminates between a range of abilities. The test provides a starting point for the systematic exploration of the cognitive and neural mechanisms underlying voice perception. With a high test-retest reliability (r = .86) and short assessment duration (~10 min), this test examines individual abilities reliably and quickly, and therefore also has potential for use in developmental and neuropsychological populations.

  5. Recognizing famous voices: influence of stimulus duration and different types of retrieval cues.

    PubMed

    Schweinberger, S R; Herholz, A; Sommer, W

    1997-04-01

    The current investigation measured the effects of increasing stimulus duration on listeners' ability to recognize famous voices. In addition, the investigation studied the influence of different types of cues on the naming of voices that could not be named before. Participants were presented with samples of famous and unfamiliar voices and were asked to decide whether or not the samples were spoken by a famous person. The duration of each sample increased in seven steps from 0.25 s up to a maximum of 2 s. Voice recognition improved with increasing stimulus duration, following a growth function. Gains were most rapid within the first second and less pronounced thereafter. When participants were unable to name a famous voice, they were cued with either a second voice sample, the occupation, or the initials of the celebrity. Initials were most effective in eliciting the name only when semantic information about the speaker had been accessed prior to cue presentation. Paralleling previous research on face naming, this may indicate that voice naming is contingent on previous activation of person-specific semantic information.

  6. Assistive Software for Disabled Learners

    ERIC Educational Resources Information Center

    Clark, Sharon; Baggaley, Jon

    2004-01-01

    Previous reports in this series (#32 and 36) have discussed online software features of value to disabled learners in distance education. The current report evaluates four specific assistive software products with useful features for visually and hearing impaired learners: "ATutor", "ACollab", "Natural Voice", and "Just Vanilla". The evaluative…

  7. Predicting Voice Disorder Status From Smoothed Measures of Cepstral Peak Prominence Using Praat and Analysis of Dysphonia in Speech and Voice (ADSV).

    PubMed

    Sauder, Cara; Bretl, Michelle; Eadie, Tanya

    2017-09-01

    The purposes of this study were (1) to determine and compare the diagnostic accuracy of a single acoustic measure, smoothed cepstral peak prominence (CPPS), to predict voice disorder status from connected speech samples using two software systems: Analysis of Dysphonia in Speech and Voice (ADSV) and Praat; and (2) to determine the relationship between measures of CPPS generated from these programs. This is a retrospective cross-sectional study. Measures of CPPS were obtained from connected speech recordings of 100 subjects with voice disorders and 70 nondysphonic subjects without vocal complaints using commercially available ADSV and freely downloadable Praat software programs. Logistic regression and receiver operating characteristic (ROC) analyses were used to evaluate and compare the diagnostic accuracy of CPPS measures. Relationships between CPPS measures from the programs were determined. Results showed acceptable overall accuracy rates (75% accuracy, ADSV; 82% accuracy, Praat) and area under the ROC curves (area under the curve [AUC] = 0.81, ADSV; AUC = 0.91, Praat) for predicting voice disorder status, with slight differences in sensitivity and specificity. CPPS measures derived from Praat were uniquely predictive of disorder status above and beyond CPPS measures from ADSV (χ2(1) = 40.71, P < 0.001). CPPS measures from both programs were significantly and highly correlated (r = 0.88, P < 0.001). A single acoustic measure of CPPS was highly predictive of voice disorder status using either program. Clinicians may consider using CPPS to complement clinical voice evaluation and screening protocols. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
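
    The evaluation described above amounts to fitting a single-predictor logistic regression and reading off ROC statistics. Below is a minimal scikit-learn sketch, assuming hypothetical arrays `cpps_values` (one CPPS value per speaker, in dB) and `disordered` (1 = voice disorder, 0 = control); for brevity it scores in-sample rather than with the study's full design.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    X = np.asarray(cpps_values).reshape(-1, 1)  # single CPPS predictor
    y = np.asarray(disordered)

    model = LogisticRegression().fit(X, y)
    auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
    print(f"AUC = {auc:.2f}, overall accuracy = {model.score(X, y):.0%}")
    ```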

  8. A preliminary comparison of speech recognition functionality in dental practice management systems.

    PubMed

    Irwin, Jeannie Y; Schleyer, Titus

    2008-11-06

    In this study, we examined speech recognition functionality in four leading dental practice management systems. Twenty dental students used voice to chart a simulated patient with 18 findings in each system. Results show it can take over a minute to chart one finding and that users frequently have to repeat commands. Limited functionality, poor usability and a high error rate appear to retard adoption of speech recognition in dentistry.

  9. [Acoustic voice analysis using the Praat program: comparative study with the Dr. Speech program].

    PubMed

    Núñez Batalla, Faustino; González Márquez, Rocío; Peláez González, M Belén; González Laborda, Irene; Fernández Fernández, María; Morato Galán, Marta

    2014-01-01

    The European Laryngological Society (ELS) basic protocol for functional assessment of voice pathology includes 5 different approaches: perception, videostroboscopy, acoustics, aerodynamics and subjective rating by the patient. In this study we focused on acoustic voice analysis. The purpose of the present study was to correlate the results obtained by the commercial software Dr. Speech and the free software Praat in 2 fields: 1. Narrow-band spectrogram (the presence of noise according to Yanagihara, and the presence of subharmonics) (semi-quantitative). 2. Voice acoustic parameters (jitter, shimmer, harmonics-to-noise ratio, fundamental frequency) (quantitative). We studied a total of 99 voice samples from individuals with Reinke's oedema diagnosed using videostroboscopy. One independent observer used Dr. Speech 3.0 and a second one used the Praat program (Phonetic Sciences, University of Amsterdam). The spectrographic analysis consisted of the 2 independent observers obtaining a narrow-band spectrogram from the previously digitised voice samples. They then determined the presence of noise in the spectrogram, using the Yanagihara grades, as well as the presence of subharmonics. Finally, the acoustic parameters of jitter, shimmer, harmonics-to-noise ratio and fundamental frequency were obtained from the 2 acoustic analysis programs. The results indicated that the sound spectrogram and the numerical values obtained for shimmer and jitter were similar for both computer programs, even though types 1, 2 and 3 voice samples were analysed. The Praat and Dr. Speech programs provide similar results in the acoustic analysis of pathological voices. Copyright © 2013 Elsevier España, S.L. All rights reserved.
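
    For readers scripting the Praat side of such comparisons, the same four parameters can be extracted programmatically. Below is a minimal sketch using the praat-parselmouth Python bridge; the input file is hypothetical and the argument values shown are common Praat defaults, not necessarily the settings used in this study.

    ```python
    import parselmouth
    from parselmouth.praat import call

    snd = parselmouth.Sound("voice_sample.wav")  # hypothetical recording

    # Fundamental frequency from a standard pitch analysis (75-600 Hz range).
    pitch = call(snd, "To Pitch", 0.0, 75, 600)
    f0 = call(pitch, "Get mean", 0, 0, "Hertz")

    # Jitter and shimmer from a point process of glottal pulses.
    pulses = call(snd, "To PointProcess (periodic, cc)", 75, 600)
    jitter = call(pulses, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
    shimmer = call([snd, pulses], "Get shimmer (local)",
                   0, 0, 0.0001, 0.02, 1.3, 1.6)

    # Harmonics-to-noise ratio from a harmonicity object.
    harm = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
    hnr = call(harm, "Get mean", 0, 0)

    print(f"F0 = {f0:.1f} Hz, jitter = {jitter:.4f}, "
          f"shimmer = {shimmer:.4f}, HNR = {hnr:.1f} dB")
    ```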

  10. Utilization of Internet Protocol-Based Voice Systems in Remote Payload Operations

    NASA Technical Reports Server (NTRS)

    Best, Susan; Nichols, Kelvin; Bradford, Robert

    2003-01-01

    This viewgraph presentation provides an overview of a proposed voice communication system for use in remote payload operations performed on the International Space Station. The system, Internet Voice Distribution System (IVoDS), would make use of existing Internet protocols, and offer a number of advantages over the system currently in use. Topics covered include: system description and operation, system software and hardware, system architecture, project status, and technology transfer applications.

  11. Double Fourier analysis for Emotion Identification in Voiced Speech

    NASA Astrophysics Data System (ADS)

    Sierra-Sosa, D.; Bastidas, M.; Ortiz P., D.; Quintero, O. L.

    2016-04-01

    We propose a novel analysis alternative, based on two Fourier Transforms, for emotion recognition from speech. Fourier analysis allows the display and synthesis of different signals in terms of power spectral density distributions. A spectrogram of the voice signal is obtained by performing a short-time Fourier Transform with Gaussian windows; this spectrogram portrays frequency-related features, such as vocal tract resonances and quasi-periodic excitations during voiced sounds. Emotions induce such characteristics in speech, which become apparent in the spectrogram's time-frequency distribution. The time-frequency representation from the spectrogram is then treated as an image and processed through a 2-dimensional Fourier Transform, performing a spatial Fourier analysis on it. Finally, features related to emotions in voiced speech are extracted and presented.
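
    Below is a minimal sketch of the two-stage analysis, assuming SciPy: a short-time Fourier transform with a Gaussian window yields the spectrogram, whose magnitude is then treated as an image and passed through a 2-dimensional Fourier transform. The input file, window length, and overlap are illustrative assumptions.

    ```python
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import stft, get_window

    sr, y = wavfile.read("speech.wav")            # hypothetical mono recording
    y = y.astype(float)

    # Stage 1: short-time Fourier transform with a Gaussian analysis window.
    win = get_window(("gaussian", 64), 512)
    f, t, S = stft(y, fs=sr, window=win, nperseg=512, noverlap=384)
    spec = np.abs(S)                              # time-frequency "image"

    # Stage 2: spatial 2D Fourier transform of the spectrogram image.
    double_fourier = np.fft.fftshift(np.fft.fft2(spec))
    features = np.abs(double_fourier)             # magnitudes as candidate features
    ```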

  12. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include deconvolving the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.
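
    The deconvolution in the claims can be pictured, per time frame, as a spectral division: given an EM-derived excitation estimate and the recorded acoustic output, the vocal-tract transfer function is the ratio of their spectra. The sketch below is a generic regularised (Wiener-style) division under that reading; the patent does not prescribe this exact numerical form.

    ```python
    import numpy as np

    def transfer_function(output_frame, excitation_frame, eps=1e-8):
        """Estimate H(f) = Output(f) / Excitation(f) for one frame,
        regularised to avoid division by near-zero spectral values."""
        w = np.hanning(len(output_frame))
        S = np.fft.rfft(output_frame * w)      # acoustic speech output spectrum
        E = np.fft.rfft(excitation_frame * w)  # EM-derived excitation spectrum
        return S * np.conj(E) / (np.abs(E) ** 2 + eps)
    ```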

  13. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include deconvolving the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.

  14. Reasons for non-adherence to cardiometabolic medications, and acceptability of an interactive voice response intervention in patients with hypertension and type 2 diabetes in primary care: a qualitative study

    PubMed Central

    Sutton, Stephen

    2017-01-01

    Objectives This study explored the reasons for patients’ non-adherence to cardiometabolic medications, and tested the acceptability of the interactive voice response (IVR) as a way to address these reasons, and support patients, between primary care consultations. Design, method, participants and setting The study included face-to-face interviews with 19 patients with hypertension and/or type 2 diabetes mellitus, selected from primary care databases, and presumed to be non-adherent. Thirteen of these patients pretested elements of the IVR intervention a few months later, using a think-aloud protocol. Five practice nurses were interviewed. Data were analysed using multiperspective and longitudinal thematic analysis. Results Negative beliefs about taking medications, the complexity of prescribed medication regimens, and the limited ability to cope with the underlying affective state, within challenging contexts, were mentioned as important reasons for non-adherence. Nurses reported time constraints to address each patient’s different reasons for non-adherence, and limited efficacy to support patients, between primary care consultations. Patients gave positive experiential feedback about the IVR messages as a way to support them in taking their medicines, and provided recommendations for intervention content and delivery mode. Specifically, they liked the voice delivering the messages and the voice recognition software. For intervention content, they preferred messages that were tailored, and included messages with ‘information about health consequences’, ‘action plans’, or simple reminders for performing the behaviour. Conclusions Patients with hypertension and/or type 2 diabetes, and practice nurses, suggested messages tailored to each patient’s reasons for non-adherence. Participants recommended IVR as an acceptable platform to support adherence to cardiometabolic medications between primary care consultations. Future studies could usefully test the acceptability, and feasibility, of tailored IVR interventions to support medication adherence, as an adjunct to primary care. PMID:28801402

  15. Human-computer interaction for alert warning and attention allocation systems of the multimodal watchstation

    NASA Astrophysics Data System (ADS)

    Obermayer, Richard W.; Nugent, William A.

    2000-11-01

    The SPAWAR Systems Center San Diego is currently developing an advanced Multi-Modal Watchstation (MMWS); design concepts and software from this effort are intended for transition to future United States Navy surface combatants. The MMWS features multiple flat panel displays and several modes of user interaction, including voice input and output, natural language recognition, 3D audio, stylus and gestural inputs. In 1999, an extensive literature review was conducted on basic and applied research concerned with alerting and warning systems. After summarizing that literature, a human computer interaction (HCI) designer's guide was prepared to support the design of an attention allocation subsystem (AAS) for the MMWS. The resultant HCI guidelines are being applied in the design of a fully interactive AAS prototype. An overview of key findings from the literature review, a proposed design methodology with illustrative examples, and an assessment of progress made in implementing the HCI designer's guide are presented.

  16. Hearing the Unheard: An Interdisciplinary, Mixed Methodology Study of Women’s Experiences of Hearing Voices (Auditory Verbal Hallucinations)

    PubMed Central

    McCarthy-Jones, Simon; Castro Romero, Maria; McCarthy-Jones, Roseline; Dillon, Jacqui; Cooper-Rompato, Christine; Kieran, Kathryn; Kaufman, Milissa; Blackman, Lisa

    2015-01-01

    This paper explores the experiences of women who “hear voices” (auditory verbal hallucinations). We begin by examining historical understandings of women hearing voices, showing that these have been driven by androcentric theories of how women’s bodies functioned, leading to women being viewed as requiring that their voices be interpreted by men. We show the twentieth century was associated with recognition that the mental violation of women’s minds (represented by some voice-hearing) was often a consequence of the physical violation of women’s bodies. We next report the results of a qualitative study into voice-hearing women’s experiences (n = 8). This found similarities between women’s relationships with their voices and their relationships with others and the wider social context. Finally, we present results from a quantitative study comparing voice-hearing in women (n = 65) and men (n = 132) in a psychiatric setting. Women were more likely than men to have certain forms of voice-hearing (voices conversing) and to have antecedent events of trauma, physical illness, and relationship problems. Voices identified as female may have more positive affect than male voices. We conclude that women voice-hearers have faced, and continue to face, specific challenges necessitating research and activism, and hope this paper will act as a stimulus to such work. PMID:26779041

  17. Telemedicine using free voice over internet protocol (VoIP) technology.

    PubMed

    Miller, David J; Miljkovic, Nikola; Chiesa, Chad; Callahan, John B; Webb, Brad; Boedeker, Ben H

    2011-01-01

    Though dedicated videoteleconference (VTC) systems deliver high quality, low-latency audio and video for telemedical applications, they require expensive hardware and extensive infrastructure. The purpose of this study was to investigate free commercially available Voice over Internet Protocol (VoIP) software as a low cost alternative for telemedicine.

  18. Within-Category VOT Affects Recovery from "Lexical" Garden-Paths: Evidence against Phoneme-Level Inhibition

    ERIC Educational Resources Information Center

    McMurray, Bob; Tanenhaus, Michael K.; Aslin, Richard N.

    2009-01-01

    Spoken word recognition shows gradient sensitivity to within-category voice onset time (VOT), as predicted by several current models of spoken word recognition, including TRACE (McClelland, J., & Elman, J. (1986). The TRACE model of speech perception. "Cognitive Psychology," 18, 1-86). It remains unclear, however, whether this sensitivity is…

  19. Some Thoughts on the Meaning of and Values that Influence Degree Recognition in Canada

    ERIC Educational Resources Information Center

    Skolnik, Michael L.

    2006-01-01

    What has been called "degree recognition" has become the subject of considerable attention in Canadian higher education within the past decade. While concerns similar to those that are being voiced today have arisen occasionally in the past, the scale of this phenomenon today is unprecedented historically. In response to the increased…

  20. Suggestions for Layout and Functional Behavior of Software-Based Voice Switch Keysets

    NASA Technical Reports Server (NTRS)

    Scott, David W.

    2010-01-01

    Marshall Space Flight Center (MSFC) provides communication services for a number of real time environments, including Space Shuttle Propulsion support and International Space Station (ISS) payload operations. In such settings, control team members speak with each other via multiple voice circuits or loops. Each loop has a particular purpose and constituency, and users are assigned listen and/or talk capabilities for a given loop based on their role in fulfilling the purpose. A voice switch is a given facility's hardware and software that supports such communication, and may be interconnected with other facilities' switches to create a large network that, from an end user perspective, acts like a single system. Since users typically monitor and/or respond to several voice loops concurrently for hours on end and real time operations can be very dynamic and intense, it's vital that a control panel or keyset for interfacing with the voice switch be a servant that reduces stress, not a master that adds it. Implementing the visual interface on a computer screen provides tremendous flexibility and configurability, but there's a very real risk of overcomplication. (Remember how office automation made life easier, which led to a deluge of documents that made life harder?) This paper a) discusses some basic human factors considerations related to keysets implemented as application software windows, b) suggests what to standardize at the facility level and what to leave to the user's preference, and c) provides screen shot mockups for a robust but reasonably simple user experience. Concepts apply to keyset needs in almost any type of operations control or support center.

  1. Objective Identification of Prepubertal Female Singers and Non-singers by Singing Power Ratio Using Matlab.

    PubMed

    Usha, M; Geetha, Y V; Darshan, Y S

    2017-03-01

    The field of music is increasingly gaining scope and attracting researchers from varied fields interested in improving the art of voice modulation in singing. Competition has intensified, and young budding singers are emerging with more talent. This study aimed to develop software that differentiates a prepubertal voice as that of a singer or a non-singer using the singing power ratio (SPR), an objective measure that quantifies resonant voice quality. Recordings of singing and phonation were obtained from 30 singer and 30 non-singer girls (8-10 years). Three professional singers perceptually evaluated all samples using a rating scale and categorized them as singers or non-singers. Using Matlab, a program was developed to automatically calculate the SPR of a given sample and classify it into either group based on normative SPR values developed manually. Objective SPR values for both phonation and singing correlated positively with the perceptual and manual ratings. The software automatically computes SPR values for input samples and classifies them as singer or non-singer. Researchers thus need not depend on professional singers or musicians to judge voices for research purposes. The software uses an objective tool to judge singing talent from children's singing and phonation samples. It can also serve as a first line of judgment in any singing audition process, easing the work of professionals. Copyright © 2017 The Voice Foundation. All rights reserved.
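
    For orientation, here is a minimal sketch of an SPR computation, assuming the commonly used definition of singing power ratio: the level difference, in dB, between the strongest spectral peak in the 2-4 kHz band and the strongest peak in the 0-2 kHz band. The input file and threshold are hypothetical, and this Python version stands in for the study's Matlab program.

    ```python
    import numpy as np
    from scipy.io import wavfile

    def singing_power_ratio(path):
        sr, y = wavfile.read(path)                 # mono recording assumed
        y = y.astype(float) * np.hanning(len(y))
        spectrum = np.abs(np.fft.rfft(y))
        freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
        low = spectrum[(freqs >= 0) & (freqs < 2000)].max()
        high = spectrum[(freqs >= 2000) & (freqs < 4000)].max()
        return 20 * np.log10(high / low)  # higher = stronger 2-4 kHz band

    # spr_threshold is a hypothetical stand-in for the manually derived norm.
    label = "singer" if singing_power_ratio("child.wav") > spr_threshold else "non-singer"
    ```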

  2. [Detection of endpoint for segmentation between consonants and vowels in aphasia rehabilitation software based on artificial intelligence scheduling].

    PubMed

    Deng, Xingjuan; Chen, Ji; Shuai, Jie

    2009-08-01

    To improve the efficiency of aphasia rehabilitation training, an artificial intelligence scheduling function was added to the aphasia rehabilitation software, improving its performance. Taking into account the characteristics of aphasia patients' voices as well as the needs of the artificial intelligence scheduling function, the authors designed an endpoint detection algorithm. It first determines reference endpoints, then uses them to extract each word and to locate reasonable segmentation points between consonants and vowels. Experimental results show that the algorithm attains its detection objectives with a high accuracy rate and is therefore applicable to endpoint detection in aphasia patients' speech.

  3. Intentional Voice Command Detection for Trigger-Free Speech Interface

    NASA Astrophysics Data System (ADS)

    Obuchi, Yasunari; Sumiyoshi, Takashi

    In this paper we introduce a new framework of audio processing, which is essential to achieve a trigger-free speech interface for home appliances. If the speech interface works continually in real environments, it must extract occasional voice commands and reject everything else. It is extremely important to reduce the number of false alarms because the number of irrelevant inputs is much larger than the number of voice commands even for heavy users of appliances. The framework, called Intentional Voice Command Detection, is based on voice activity detection, but enhanced by various speech/audio processing techniques such as emotion recognition. The effectiveness of the proposed framework is evaluated using a newly-collected large-scale corpus. The advantages of combining various features were tested and confirmed, and the simple LDA-based classifier demonstrated acceptable performance. The effectiveness of various methods of user adaptation is also discussed.
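
    The classification stage mentioned above can be approximated in a few lines: heterogeneous per-segment features are concatenated and fed to a linear discriminant classifier that accepts intentional commands and rejects everything else. Below is a minimal scikit-learn sketch; the feature arrays and labels are hypothetical placeholders, not the paper's feature set.

    ```python
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # Hypothetical per-segment features: voice-activity statistics concatenated
    # with prosodic/emotion-related measures, one row per audio segment.
    X = np.hstack([vad_features, prosody_features])
    y = is_command  # 1 = intentional voice command, 0 = everything else

    clf = LinearDiscriminantAnalysis().fit(X, y)
    decisions = clf.predict(X_new)  # segments classified 0 are rejected
    ```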

  4. Automatic assessment of voice quality according to the GRBAS scale.

    PubMed

    Sáenz-Lechón, Nicolás; Godino-Llorente, Juan I; Osma-Ruiz, Víctor; Blanco-Velasco, Manuel; Cruz-Roldán, Fernando

    2006-01-01

    Nowadays, the most widespread techniques for measuring voice quality are based on perceptual evaluation by well-trained professionals. The GRBAS scale is a widely used method for perceptual evaluation of voice quality; it is well established in Japan and there is increasing interest in both Europe and the United States. However, this technique needs well-trained experts, is based on the evaluator's expertise, and depends heavily on the evaluator's own psycho-physical state. Furthermore, great variability is observed between the assessments of different evaluators. Therefore, an objective method to provide such measurement of voice quality would be very valuable. In this paper, the automatic assessment of voice quality is addressed by means of short-term Mel cepstral parameters (MFCC) and learning vector quantization (LVQ) in a pattern recognition stage. Results show that this approach provides acceptable results for this purpose, with accuracy around 65% at best.
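
    scikit-learn has no built-in LVQ, so as a rough illustration of the MFCC + LVQ stage, here is a minimal LVQ1 training loop in NumPy; the per-voice feature matrix X, the GRBAS-grade labels y, and all hyperparameters are hypothetical inputs.

    ```python
    import numpy as np

    def train_lvq1(X, y, protos_per_class=4, lr=0.05, epochs=50, seed=0):
        rng = np.random.default_rng(seed)
        classes = np.unique(y)
        # Initialise prototypes from random samples of each class.
        protos = np.vstack([X[y == c][rng.choice((y == c).sum(),
                                                 protos_per_class, replace=False)]
                            for c in classes]).astype(float)
        proto_y = np.repeat(classes, protos_per_class)
        for _ in range(epochs):
            for i in rng.permutation(len(X)):
                j = np.argmin(((protos - X[i]) ** 2).sum(axis=1))  # nearest prototype
                step = lr if proto_y[j] == y[i] else -lr           # attract or repel
                protos[j] += step * (X[i] - protos[j])
        return protos, proto_y

    def predict(protos, proto_y, X):
        d = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
        return proto_y[np.argmin(d, axis=1)]
    ```

    Each test voice is assigned the grade of its nearest prototype, mirroring the pattern-recognition stage described in the abstract.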

  5. [Applicability of voice acoustic analysis with vocal loading test to diagnostics of occupational voice diseases].

    PubMed

    Niebudek-Bogusz, Ewa; Sliwińska-Kowalska, Mariola

    2006-01-01

    An assessment of the vocal system, as a part of the medical certification of occupational diseases, should be objective and reliable. Therefore, interest in the method of acoustic voice analysis enabling objective assessment of voice parameters is still growing. The aim of the present study was to evaluate the applicability of acoustic analysis with a vocal loading test to the diagnostics of occupational voice disorders. The results of acoustic voice analysis were compared using IRIS software for phoniatrics, before and after a 30-min vocal loading test in 35 female teachers with diagnosed occupational voice disorders (group I) and in 31 female teachers with functional dysphonia (group II). In group I, vocal effort produced significant abnormalities in voice acoustic parameters, compared to group II. These included significantly increased mean fundamental frequency (Fo) value (by 11 Hz) and worsened jitter, shimmer and NHR parameters. Also, the percentage of subjects showing abnormalities in voice acoustic analysis was higher in this group. Conducting voice acoustic analysis before and after the vocal loading test makes it possible to objectively confirm irreversible voice impairments in persons with work-related pathologies of the larynx, which is essential for medical certification of occupational voice diseases.

  6. Smartphones Offer New Opportunities in Clinical Voice Research.

    PubMed

    Manfredi, C; Lebacq, J; Cantarella, G; Schoentgen, J; Orlandi, S; Bandini, A; DeJonckere, P H

    2017-01-01

    Smartphone technology provides new opportunities for recording standardized voice samples of patients and sending the files by e-mail to the voice laboratory. This drastically improves the collection of baseline data, as used in research on efficiency of voice treatments. However, the basic requirement is the suitability of smartphones for recording and digitizing pathologic voices (mainly characterized by period perturbations and noise) without significant distortion. In this experiment, two smartphones (a very inexpensive one and a high-level one) were tested and compared with direct microphone recordings in a soundproof room. The voice stimuli consisted of synthesized deviant voice samples (median fundamental frequencies: 120 and 200 Hz) with three levels of jitter and three levels of added noise. All voice samples were analyzed using PRAAT software. The results show high correlations between the jitter, shimmer, and noise-to-harmonics ratio values measured from the recordings made via both smartphones and the microphone, and those measured directly on the sound files from the synthesizer. Smartphones thus appear adequate for reliable recording and digitizing of pathologic voices. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  7. Comparison of Voice Handicap Index Scores Between Female Students of Speech Therapy and Other Health Professions.

    PubMed

    Tafiadis, Dionysios; Chronopoulos, Spyridon K; Siafaka, Vassiliki; Drosos, Konstantinos; Kosma, Evangelia I; Toki, Eugenia I; Ziavra, Nausica

    2017-09-01

    Certain student groups (e.g., teachers, speech-language pathologists) are presumably at risk of developing a voice disorder due to misuse of their voice, which can affect their way of living. Multidisciplinary voice assessment of student populations is now widespread, along with the use of self-report questionnaires. This study compared the Voice Handicap Index domain and item scores between female students of speech and language therapy and female students of other health professions in Greece. We also examined the probability of speech and language therapy students developing any vocal symptom. Two hundred female non-dysphonic students (aged 18-31) were recruited. Participants answered the Voice Evaluation Form and the Greek adaptation of the Voice Handicap Index. Significant differences were observed between the two groups (students of speech therapy and of other health professions) in the Voice Handicap Index total score and the functional and physical domains, but not in the emotional domain. Furthermore, significant differences between subgroups were observed for specific Voice Handicap Index items. In conclusion, speech and language therapy students had higher Voice Handicap Index scores, which could probably serve as an indicator for avoiding profession-related dysphonia at a later stage. The Voice Handicap Index could also serve, at first glance, as an assessment tool for recognizing potential voice disorder development in students. In turn, the results could be used for indirect therapy approaches, such as providing methods for maintaining vocal health in different student populations. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  8. Vocal recognition of owners by domestic cats (Felis catus).

    PubMed

    Saito, Atsuko; Shinozuka, Kazutaka

    2013-07-01

    Domestic cats have had a 10,000-year history of cohabitation with humans and seem to have the ability to communicate with humans. However, this has not been widely examined. We studied 20 domestic cats to investigate whether they could recognize their owners by using voices that called out the subjects' names, with a habituation-dishabituation method. While the owner was out of the cat's sight, we played three different strangers' voices serially, followed by the owner's voice. We recorded the cat's reactions to the voices and categorized them into six behavioral categories. In addition, ten naive raters rated the cats' response magnitudes. The cats responded to human voices not by communicative behavior (vocalization and tail movement), but by orienting behavior (ear movement and head movement). This tendency did not change even when they were called by their owners. Of the 20 cats, 15 demonstrated a lower response magnitude to the third voice than to the first voice. These habituated cats showed a significant rebound in response to the subsequent presentation of their owners' voices. This result indicates that cats are able to use vocal cues alone to distinguish between humans.

  9. Optimized delivery of radiological reports: applying Six Sigma methodology to a radiology department.

    PubMed

    Cavagna, Enrico; Berletti, Riccardo; Schiavon, Francesco; Scarsi, Barbara; Barbato, Giuseppe

    2003-03-01

    To optimise the process of reporting and delivering radiological examinations with a view to achieving 100% service delivery within 72 hours to outpatients and 36 hours to inpatients. To this end, we used the Six Sigma method, which adopts a systematic approach and rigorous statistical analysis to analyse and improve processes, by reducing variability and minimising errors. More specifically, our study focused on the process of radiological report creation, from the end of the examination to the time when the report is made available to the patient, to examine the bottlenecks and identify the measures to be taken to improve the process. Six Sigma uses a five-step problem-solving process called DMAIC, an acronym for Define, Measure, Analyze, Improve and Control. The first step is to define the problem and the elements crucial to quality, in terms of Total Quality Control. Next, the situation is analysed to identify the root causes of the problem and determine which of these is most influential. The situation is then improved by implementing change. Finally, to make sure that the change is long-lasting, measures are taken to sustain the improvements and obtain long-term control. In our case we analysed all of the phases the report passes through before reaching the user, and studied the impact of voice-recognition reporting on the speed of the report creation process. Analysis of the information collected showed that the tools available for report creation (dictaphone, voice-recognition system) and the transport of films and reports were the two critical elements on which to focus our efforts. Of all the phases making up the process, reporting (from end of examination to end of reporting) and distribution (from the report available to administrative staff to report available to the patient) account for 90% of process variability (73% and 17%, respectively). We further found that the reports dictated into a voice-recognition reporting system are delivered in 45 hours (median), whereas those dictated using a dictaphone take 96 hours: voice-recognition reporting systems therefore improve performance by 50 hours. Unfortunately, 38% of our reports are delivered outside the timeframes of 72 hours for outpatients and 36 hours for inpatients agreed with the service users. Reports for inpatients have much faster delivery times and lower variability, as 95% of these examinations are reported using voice-recognition reporting (as a result of the greater sensitivity of physicians to the problem of inpatient waiting times). For conventional radiology examinations, which outnumber CT or MRI, there is a stronger tendency to use the dictaphone, which allows for faster dictation as it is unburdened by administrative tasks such as entering examination codes, correcting errors, etc. Freelance status has no impact on report delivery times, service delivery being the same as in the institutional setting. The subprocess of reporting is strongly affected by the choice of reporting method (voice-recognition system or dictaphone), whereas report delivery is affected by the individual's behaviour patterns and ultimately by habits generated by the lack of a clearly charted process (lack of synchronisation among the various phases); these delays are therefore potentially avoidable. The analytical study of the various phases of examination reporting, from writing to delivery, allowed us to identify the process bottlenecks and take corrective measures.
Regardless of imaging modality and individual physician, examination reporting consistently takes longer when a dictaphone is used instead of a voice-recognition reporting system, as the dictaphone makes the process more complex. To improve the two critical subprocesses whilst maintaining constant resources, a first step is to abandon the dictaphone in favour of the voice-recognition system. In addition, we are experimenting with other measures to improve the collection and sorting of examinations and the delivery of reports: the technical staff take the films from the examination rooms to the reporting rooms three times a day; the radiologists collect their examinations and prepare the reports, where possible on the same day; the radiologists leave their signed reports on the table in the central reporting room; and the administrative staff collect the signed reports three times a day, in the morning and afternoon, so as to deliver them on the same day. This project has allowed us to become familiar with the principles of total quality, to better understand our internal processes and to take effective measures to optimise them. This has resulted in enhanced satisfaction among all the department staff and has laid the groundwork for further measures in the future.
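
    As a rough illustration of the Measure/Analyze steps described above, the sketch below computes the median delivery time and the share of late reports for each reporting method from a delivery log. The log structure, field names, and values are illustrative assumptions; only the 72-/36-hour targets come from the abstract.

```python
from statistics import median

# Hypothetical delivery log: (reporting_method, patient_type, hours_to_delivery).
# These records are invented placeholders, not the authors' data.
reports = [
    ("voice_recognition", "outpatient", 45),
    ("voice_recognition", "inpatient", 20),
    ("dictaphone", "outpatient", 96),
    ("dictaphone", "outpatient", 110),
    ("dictaphone", "inpatient", 40),
]

# Service-level targets agreed with users (from the abstract).
TARGET_HOURS = {"outpatient": 72, "inpatient": 36}

def summarize(method):
    """Median delivery time and fraction of late reports for one method."""
    rows = [r for r in reports if r[0] == method]
    hours = [h for _, _, h in rows]
    late = [h for _, ptype, h in rows if h > TARGET_HOURS[ptype]]
    return median(hours), len(late) / len(rows)

for method in ("voice_recognition", "dictaphone"):
    med, late_share = summarize(method)
    print(f"{method}: median {med} h, {late_share:.0%} late")
```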

  10. Voices to reckon with: perceptions of voice identity in clinical and non-clinical voice hearers

    PubMed Central

    Badcock, Johanna C.; Chhabra, Saruchi

    2013-01-01

    The current review focuses on the perception of voice identity in clinical and non-clinical voice hearers. Identity perception in auditory verbal hallucinations (AVH) is grounded in the mechanisms of human (i.e., real, external) voice perception, and shapes the emotional (distress) and behavioral (help-seeking) response to the experience. Yet the phenomenological assessment of voice identity is often limited, for example to the gender of the voice, and has failed to take advantage of recent models and evidence on human voice perception. In this paper we aim to synthesize the literature on identity in real and hallucinated voices and begin by providing a comprehensive overview of the features used to judge voice identity in healthy individuals and in people with schizophrenia. The findings suggest some subtle but possibly systematic biases across different levels of voice identity in clinical hallucinators that are associated with higher levels of distress. Next we provide a critical evaluation of voice processing abilities in clinical and non-clinical voice hearers, including recent data collected in our laboratory. Our studies used diverse methods, assessing recognition and binding of words and voices in memory as well as multidimensional scaling of voice dissimilarity judgments. The findings overall point to significant difficulties recognizing familiar speakers and discriminating between unfamiliar speakers in people with schizophrenia, both with and without AVH. In contrast, these voice processing abilities appear to be generally intact in non-clinical hallucinators. The review highlights some important avenues for future research and treatment of AVH associated with a need for care, and suggests some novel insights into other symptoms of psychosis. PMID:23565088

  11. The "Reading the Mind in the Voice" Test-Revised: A Study of Complex Emotion Recognition in Adults with and Without Autism Spectrum Conditions

    ERIC Educational Resources Information Center

    Golan, Ofer; Baron-Cohen, Simon; Hill, Jacqueline J.; Rutherford, M. D.

    2007-01-01

    This study reports a revised version of the "Reading the Mind in the Voice" (RMV) task. The original task (Rutherford et al., 2002, "Journal of Autism and Developmental Disorders," 32, 189-194) suffered from ceiling effects and limited sensitivity. To address this, the task was shortened and two more foils were added to each of the remaining…

  12. A Software Defined Integrated T1 Digital Network for Voice, Data and Video.

    ERIC Educational Resources Information Center

    Hill, James R.

    The Dallas County Community College District developed and implemented a strategic plan for communications that utilizes a county-wide integrated network to carry voice, data, and video information to nine locations within the district. The network, which was installed and operational by March 1987, utilizes microwave, fiber optics, digital cross…

  13. Characterization versus Narration: Drama's Role in Multimedia Instructional Software

    ERIC Educational Resources Information Center

    Cates, Ward Mitchell; Bishop, M. J.; Hung, Woei

    2005-01-01

    As part of an ongoing research program, the authors investigated the use of single-voiced narration and multi-voiced characterizations/monologues in a formative evaluation study of an instructional lesson on information processing. That lesson employed a design based on the use of content-related metaphors and a metaphorical graphical user…

  14. The Effect of Background Traffic Packet Size to VoIP Speech Quality

    NASA Astrophysics Data System (ADS)

    Triyason, Tuul; Kanthamanon, Prasert; Warasup, Kittipong; Yamsaengsung, Siam; Supattatham, Montri

    VoIP is gaining acceptance in the corporate world, especially among small and medium-sized businesses that want to cut costs to gain an advantage over their competitors. Good voice quality is one of the challenging goals of any deployment plan, because VoIP voice quality is affected by packet loss and jitter delay. In this paper, we study the effect of background traffic packet size on voice quality. The background traffic was generated with the Bricks software, and speech quality was assessed using the mean opinion score (MOS). The results show an interesting relationship between voice quality and the number and size of TCP packets: for the same amount of data, smaller packets degrade voice quality more than larger packets.
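
    For context on the MOS metric used above: a common way to estimate a MOS from network impairments such as packet loss and delay (jitter is usually folded into delay or loss by the de-jitter buffer) is the ITU-T G.107 E-model, whose R-factor maps to MOS with a standard cubic formula. The sketch below uses that standard R-to-MOS mapping; the impairment terms in estimate_r are deliberately coarse simplifications of G.107, not the authors' measurement setup.

```python
def r_to_mos(r):
    """ITU-T G.107 mapping from transmission rating factor R to estimated MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)

def estimate_r(packet_loss_pct, one_way_delay_ms):
    """Very coarse E-model sketch: default R minus simplified delay and
    loss impairments. The constants are illustrative, not the standard's."""
    r = 93.2                                              # default base rating
    r -= 0.024 * one_way_delay_ms                         # simplified delay term
    r -= 30 * (packet_loss_pct / (packet_loss_pct + 10))  # simplified loss term
    return r

# Example: 1% loss and 150 ms one-way delay -> MOS around 4.2.
print(f"MOS = {r_to_mos(estimate_r(1.0, 150)):.2f}")
```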

  15. A perspective on early commercial applications of voice-processing technology for telecommunications and aids for the handicapped.

    PubMed Central

    Seelbach, C

    1995-01-01

    The Colloquium on Human-Machine Communication by Voice highlighted the global technical community's focus on the problems and promise of voice-processing technology, particularly speech recognition and speech synthesis. Clearly, there are many areas in both the research and development of these technologies that can be advanced significantly. However, it is also true that many applications of these technologies are capable of commercialization now. Early successful commercialization of new technology is vital to ensure continuing interest in its development. This paper addresses efforts to commercialize speech technologies in two markets: telecommunications and aids for the handicapped. PMID:7479814

  16. The Use of Voice Cues for Speaker Gender Recognition in Cochlear Implant Recipients

    ERIC Educational Resources Information Center

    Meister, Hartmut; Fürsen, Katrin; Streicher, Barbara; Lang-Roth, Ruth; Walger, Martin

    2016-01-01

    Purpose: The focus of this study was to examine the influence of fundamental frequency (F0) and vocal tract length (VTL) modifications on speaker gender recognition in cochlear implant (CI) recipients for different stimulus types. Method: Single words and sentences were manipulated using isolated or combined F0 and VTL cues. Using an 11-point…

  17. Voice gender discrimination provides a measure of more than pitch-related perception in cochlear implant users

    PubMed Central

    Li, Tianhao; Fu, Qian-Jie

    2013-01-01

    Objectives (1) To investigate whether voice gender discrimination (VGD) could be a useful indicator of the spectral and temporal processing abilities of individual cochlear implant (CI) users; (2) To examine the relationship between VGD and speech recognition with CI when comparable acoustic cues are used for both perception processes. Design VGD was measured using two talker sets with different inter-gender fundamental frequencies (F0), as well as different acoustic CI simulations. Vowel and consonant recognition in quiet and noise were also measured and compared with VGD performance. Study sample Eleven postlingually deaf CI users. Results The results showed that (1) mean VGD performance differed for different stimulus sets, (2) VGD and speech recognition performance varied among individual CI users, and (3) individual VGD performance was significantly correlated with speech recognition performance under certain conditions. Conclusions VGD measured with selected stimulus sets might be useful for assessing not only pitch-related perception, but also spectral and temporal processing by individual CI users. In addition to improvements in spectral resolution and modulation detection, the improvement in higher modulation frequency discrimination might be particularly important for CI users in noisy environments. PMID:21696330

  18. Using Ambulatory Voice Monitoring to Investigate Common Voice Disorders: Research Update

    PubMed Central

    Mehta, Daryush D.; Van Stan, Jarrad H.; Zañartu, Matías; Ghassemi, Marzyeh; Guttag, John V.; Espinoza, Víctor M.; Cortés, Juan P.; Cheyne, Harold A.; Hillman, Robert E.

    2015-01-01

    Many common voice disorders are chronic or recurring conditions that are likely to result from inefficient and/or abusive patterns of vocal behavior, referred to as vocal hyperfunction. The clinical management of hyperfunctional voice disorders would be greatly enhanced by the ability to monitor and quantify detrimental vocal behaviors during an individual’s activities of daily life. This paper provides an update on ongoing work that uses a miniature accelerometer on the neck surface below the larynx to collect a large set of ambulatory data on patients with hyperfunctional voice disorders (before and after treatment) and matched-control subjects. Three types of analysis approaches are being employed in an effort to identify the best set of measures for differentiating among hyperfunctional and normal patterns of vocal behavior: (1) ambulatory measures of voice use that include vocal dose and voice quality correlates, (2) aerodynamic measures based on glottal airflow estimates extracted from the accelerometer signal using subject-specific vocal system models, and (3) classification based on machine learning and pattern recognition approaches that have been used successfully in analyzing long-term recordings of other physiological signals. Preliminary results demonstrate the potential for ambulatory voice monitoring to improve the diagnosis and treatment of common hyperfunctional voice disorders. PMID:26528472
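
    Of the three analysis approaches listed, the first relies on vocal dose measures, which have standard definitions in the literature (e.g., Titze's time dose, the accumulated phonation time, and cycle dose, the accumulated number of vocal-fold oscillation cycles). A minimal sketch of those two doses, assuming per-frame voicing decisions and F0 estimates are already available from upstream analysis of the accelerometer signal:

```python
import numpy as np

def vocal_doses(f0_hz, voiced, frame_s=0.05):
    """Time dose (seconds of phonation) and cycle dose (number of
    vocal-fold cycles), following the standard vocal-dose definitions.
    `f0_hz` and `voiced` are per-frame arrays from upstream analysis."""
    f0_hz = np.asarray(f0_hz, dtype=float)
    voiced = np.asarray(voiced, dtype=bool)
    time_dose = voiced.sum() * frame_s              # total voiced time
    cycle_dose = np.sum(f0_hz[voiced]) * frame_s    # integral of F0 over voiced time
    return time_dose, cycle_dose

# Toy example: four 50 ms frames, two voiced at ~200 Hz.
td, cd = vocal_doses([0, 200, 210, 0], [False, True, True, False])
print(td, cd)   # 0.1 s, 20.5 cycles
```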

  19. Brain systems mediating voice identity processing in blind humans.

    PubMed

    Hölig, Cordula; Föcker, Julia; Best, Anna; Röder, Brigitte; Büchel, Christian

    2014-09-01

    Blind people rely more on vocal cues when they recognize a person's identity than sighted people. Indeed, a number of studies have reported better voice recognition skills in blind than in sighted adults. The present functional magnetic resonance imaging study investigated changes in the functional organization of neural systems involved in voice identity processing following congenital blindness. A group of congenitally blind individuals and matched sighted control participants were tested in a priming paradigm, in which two voice stimuli (S1, S2) were presented in succession. The prime (S1) and the target (S2) were either from the same speaker (person-congruent voices) or from two different speakers (person-incongruent voices). Participants had to classify the S2 voice as belonging to either an old or a young person. Person-incongruent voices (S2) compared with person-congruent voices elicited increased activation in the right anterior fusiform gyrus in congenitally blind individuals but not in matched sighted control participants. In contrast, only the matched sighted controls showed higher activation in response to person-incongruent compared with person-congruent voices (S2) in the right posterior superior temporal sulcus. These results provide evidence for crossmodal plastic changes in the person identification system of the brain after visual deprivation. Copyright © 2014 Wiley Periodicals, Inc.

  20. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzrichter, J.F.; Ng, L.C.

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced speech, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  1. The Effects of Size and Type of Vocal Fold Polyp on Some Acoustic Voice Parameters.

    PubMed

    Akbari, Elaheh; Seifpanahi, Sadegh; Ghorbani, Ali; Izadi, Farzad; Torabinezhad, Farhad

    2018-03-01

    Vocal abuse and misuse can result in vocal fold polyps. Certain features define the extent to which vocal fold polyps affect acoustic voice parameters. The present study aimed to define the effects of polyp size on acoustic voice parameters, and to compare these parameters in hemorrhagic and non-hemorrhagic polyps. In the present retrospective study, 28 individuals with hemorrhagic or non-hemorrhagic polyps of the true vocal folds were recruited to investigate acoustic voice parameters of the vowel /æ/ computed with the Praat software. The data were analyzed using the SPSS software, version 17.0. According to the type and size of the polyps, mean acoustic differences and correlations were analyzed by the t test and the Pearson correlation test, respectively, with the significance level set at 0.05. The results indicated that jitter had a significant positive correlation, and the harmonics-to-noise ratio a significant negative correlation, with polyp size (P=0.01). In addition, both of these parameters differed significantly between the two types of polyp investigated. Both the type and size of polyps affect acoustic voice characteristics. In the present study, a novel method to measure polyp size was introduced. Further confirmation of this method as a tool to compare polyp sizes requires additional investigation.
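
    For reference, the perturbation measures reported here have standard definitions: local jitter is the mean absolute difference between consecutive glottal periods divided by the mean period, and local shimmer is the analogous ratio for consecutive peak amplitudes. A minimal sketch, assuming the period and amplitude sequences have already been extracted (e.g., by Praat); the toy values are invented:

```python
import numpy as np

def local_jitter(periods_s):
    """Mean absolute difference of consecutive periods / mean period."""
    p = np.asarray(periods_s, dtype=float)
    return np.mean(np.abs(np.diff(p))) / np.mean(p)

def local_shimmer(amplitudes):
    """Mean absolute difference of consecutive peak amplitudes / mean amplitude."""
    a = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(a))) / np.mean(a)

# Toy sequences for a ~150 Hz voice (values are illustrative placeholders).
periods = [0.00667, 0.00671, 0.00665, 0.00670]
amps = [0.81, 0.80, 0.83, 0.79]
print(f"jitter  = {local_jitter(periods):.4%}")
print(f"shimmer = {local_shimmer(amps):.4%}")
```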

  2. The Effects of Size and Type of Vocal Fold Polyp on Some Acoustic Voice Parameters

    PubMed Central

    Akbari, Elaheh; Seifpanahi, Sadegh; Ghorbani, Ali; Izadi, Farzad; Torabinezhad, Farhad

    2018-01-01

    Background Vocal abuse and misuse can result in vocal fold polyps. Certain features define the extent to which vocal fold polyps affect acoustic voice parameters. The present study aimed to define the effects of polyp size on acoustic voice parameters, and to compare these parameters in hemorrhagic and non-hemorrhagic polyps. Methods In the present retrospective study, 28 individuals with hemorrhagic or non-hemorrhagic polyps of the true vocal folds were recruited to investigate acoustic voice parameters of the vowel /æ/ computed with the Praat software. The data were analyzed using the SPSS software, version 17.0. According to the type and size of the polyps, mean acoustic differences and correlations were analyzed by the t test and the Pearson correlation test, respectively, with the significance level set at 0.05. Results The results indicated that jitter had a significant positive correlation, and the harmonics-to-noise ratio a significant negative correlation, with polyp size (P=0.01). In addition, both of these parameters differed significantly between the two types of polyp investigated. Conclusion Both the type and size of polyps affect acoustic voice characteristics. In the present study, a novel method to measure polyp size was introduced. Further confirmation of this method as a tool to compare polyp sizes requires additional investigation. PMID:29749984

  3. Speech recognition technology: an outlook for human-to-machine interaction.

    PubMed

    Erdel, T; Crooks, S

    2000-01-01

    Speech recognition, as an enabling technology in healthcare-systems computing, is a topic that has been discussed for quite some time but is just now coming to fruition. Traditionally, speech-recognition software has been constrained by hardware, but improved processors and increased memory capacities are starting to remove some of these limitations. With these barriers removed, companies that create software for the healthcare setting have the opportunity to write more successful applications. Among the criticisms of speech-recognition applications are high error rates and steep training curves. However, even in the face of such negative perceptions, there remain significant opportunities for speech recognition to allow healthcare providers and, more specifically, physicians, to work more efficiently and ultimately spend more time with their patients and less time completing necessary documentation. This article will identify opportunities for inclusion of speech-recognition technology in the healthcare setting and examine the major categories of speech-recognition software--continuous speech recognition, command and control, and text-to-speech. We will discuss the advantages and disadvantages of each area, the limitations of the software today, and how future trends might affect them.

  4. Multidimensional assessment of strongly irregular voices such as in substitution voicing and spasmodic dysphonia: a compilation of own research.

    PubMed

    Moerman, Mieke; Martens, Jean-Pierre; Dejonckere, Philippe

    2015-04-01

    This article is a compilation of own research performed during the European COoperation in Science and Technology (COST) action 2103: 'Advance Voice Function Assessment', an initiative of voice and speech processing teams consisting of physicists, engineers, and clinicians. This manuscript concerns analyzing largely irregular voicing types, namely substitution voicing (SV) and adductor spasmodic dysphonia (AdSD). A specific perceptual rating scale (IINFVo) was developed, and the Auditory Model Based Pitch Extractor (AMPEX), a piece of software that automatically analyses running speech and generates pitch values in background noise, was applied. The IINFVo perceptual rating scale has been shown to be useful in evaluating SV. The analysis of strongly irregular voices stimulated a modification of the European Laryngological Society's assessment protocol which was originally designed for the common types of (less severe) dysphonia. Acoustic analysis with AMPEX demonstrates that the most informative features are, for SV, the voicing-related acoustic features and, for AdSD, the perturbation measures. Poor correlations between self-assessment and acoustic and perceptual dimensions in the assessment of highly irregular voices argue for a multidimensional approach.

  5. Real-time interactive speech technology at Threshold Technology, Incorporated

    NASA Technical Reports Server (NTRS)

    Herscher, Marvin B.

    1977-01-01

    Basic real-time isolated-word recognition techniques are reviewed. Industrial applications of voice technology are described in chronological order of their development. Future research efforts are also discussed.

  6. Multi-Dimensional Voice Program (MDVP) vs Praat for Assessing Euphonic Subjects: A Preliminary Study on the Gender-discriminating Power of Acoustic Analysis Software.

    PubMed

    Lovato, Andrea; De Colle, Wladimiro; Giacomelli, Luciano; Piacente, Alessandro; Righetto, Lara; Marioni, Gino; de Filippis, Cosimo

    2016-11-01

    The aim of this study was to compare the discriminatory power of the Multi-Dimensional Voice Program (MDVP) and Praat in distinguishing the gender of euphonic adults. This is a cross-sectional study. The recordings of 100 euphonic volunteers (50 males and 50 females) producing a sustained vowel /a/ were analyzed with the MDVP and Praat software. Both computer programs identified significant differences between male and female volunteers in absolute jitter (MDVP P < 0.00001 and Praat P < 0.00001) and in shimmer in decibels (dB) (MDVP P = 0.006 and Praat P = 0.001). Using the scale proposed by Hosmer and Lemeshow, we found no gender discrimination for shimmer in dB with either the MDVP (area under the receiver operating characteristics curve [AUC] = 0.658) or Praat (AUC = 0.682). In our series, on the other hand, MDVP absolute jitter achieved acceptable discrimination between males and females (AUC = 0.752), and Praat absolute jitter achieved outstanding discrimination (AUC = 0.901). The discriminatory power of Praat absolute jitter was significantly higher than that of the MDVP (P = 0.003). Absolute jitter sensitivity and specificity were also higher for Praat (83% and 80%) than for the MDVP (74% and 49%). Differences attributable to a subject's gender and to the software used to measure acoustic parameters should be carefully considered in both research and clinical settings. Further studies are needed to test the discriminatory power of different voice analysis programs in differentiating between normal and dysphonic voices. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
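
    The comparison above rests on the area under the ROC curve (AUC), interpreted on the Hosmer-Lemeshow scale. A minimal sketch of how such an AUC could be computed for an acoustic measure with scikit-learn; the jitter values below are invented placeholders, not the study's measurements:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# 1 = male, 0 = female; absolute-jitter values are invented placeholders.
gender = np.array([1, 1, 1, 1, 0, 0, 0, 0])
absolute_jitter = np.array([52.1, 48.7, 61.3, 55.0, 30.2, 41.5, 28.9, 35.6])

# Treating higher jitter as evidence for "male" is an assumption of this toy data.
auc = roc_auc_score(gender, absolute_jitter)
print(f"AUC = {auc:.3f}")  # 1.0 would be perfect discrimination, 0.5 chance level
```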

  7. A Development of a System Enables Character Input and PC Operation via Voice for a Physically Disabled Person with a Speech Impediment

    NASA Astrophysics Data System (ADS)

    Tanioka, Toshimasa; Egashira, Hiroyuki; Takata, Mayumi; Okazaki, Yasuhisa; Watanabe, Kenzi; Kondo, Hiroki

    We have designed and implemented a voice-operated PC support system for a physically disabled person with a speech impediment. Voice operation is an effective method for a physically disabled person with involuntary movement of the limbs and head. For practical purposes, we built our system on a commercial speech recognition engine: adopting a commercial engine reduces development cost and will help make our system useful to other people with speech impediments. We customized the engine so that it can recognize the utterances of a person with a speech impediment. To avoid misrecognition, we restricted the words the recognition engine recognizes and separated target words from similar-sounding words. The huge number of words registered in commercial speech recognition engines causes frequent misrecognition of speech-impaired users' utterances, because those utterances are unclear and unstable. We solved this problem by narrowing the choice of inputs down to a small number and by registering ambiguous pronunciations in addition to the original ones. To realize full character input and full PC operation with a small vocabulary, we designed multiple input modes with categorized dictionaries and introduced two-step input in each mode except numeral input. The system we have developed is at a practical level. The first author of this paper is physically disabled with a speech impediment. Using this system, he is able not only to input characters into the PC but also to operate the Windows system smoothly, and he uses it in his daily life. This paper was written by him with this system. At present, the speech recognition is customized to him; it is, however, possible to customize the system for other users by changing the registered words and pronunciations according to each user's utterances.
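
    A minimal sketch of the two ideas the abstract describes: each input mode exposes only a small dictionary to the recognizer, and several registered pronunciations (including deliberately "ambiguous" variants) resolve to one command. The mode names, words, and commands below are invented for illustration, not the paper's actual dictionaries:

```python
# Per-mode dictionaries: the active mode narrows the recognizer's vocabulary,
# and multiple registered pronunciations map to the same command.
MODES = {
    "navigation": {
        "up": "KEY_UP", "ahp": "KEY_UP",        # extra pronunciation variant
        "down": "KEY_DOWN", "daun": "KEY_DOWN",
    },
    "character": {
        "a-row": "SELECT_A_ROW",                # step 1: pick a character group
        "ka-row": "SELECT_KA_ROW",              # step 2 would pick within the group
    },
}

def interpret(mode, recognized_word):
    """Resolve a recognized word within the active mode's small vocabulary.
    Out-of-vocabulary words are rejected rather than guessed at."""
    return MODES[mode].get(recognized_word)     # None -> ask the user to repeat

print(interpret("navigation", "ahp"))    # -> KEY_UP
print(interpret("character", "banana"))  # -> None (rejected)
```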

  8. Using Continuous Voice Recognition Technology as an Input Medium to the Naval Warfare Interactive Simulation System (NWISS).

    DTIC Science & Technology

    1984-06-01

    [Abstract illegible in the source record; the recoverable fragments reference continuous voice recognition, the VERBEX 3000 Speech Application Development System (SPADS), and the Naval Warfare Interactive Simulation System (NWISS).]

  9. A Multidimensional Approach to the Study of Emotion Recognition in Autism Spectrum Disorders

    PubMed Central

    Xavier, Jean; Vignaud, Violaine; Ruggiero, Rosa; Bodeau, Nicolas; Cohen, David; Chaby, Laurence

    2015-01-01

    Although deficits in emotion recognition have been widely reported in autism spectrum disorder (ASD), experiments have been restricted to either facial or vocal expressions. Here, we explored multimodal emotion processing in children with ASD (N = 19) and with typical development (TD, N = 19), considering unimodal (faces or voices) and multimodal (faces and voices simultaneously) stimuli and developmental comorbidities (neuro-visual, language and motor impairments). Compared to TD controls, children with ASD had rather high and heterogeneous emotion recognition scores, but also showed several significant differences: lower emotion recognition scores for visual stimuli and for neutral emotion, and a greater number of saccades during the visual task. Multivariate analyses showed that: (1) the difficulties they experienced with visual stimuli were partially alleviated with multimodal stimuli; (2) developmental age was significantly associated with emotion recognition in TD children, whereas this was the case only for the multimodal task in children with ASD; (3) language impairments tended to be associated with the emotion recognition scores of children with ASD in the auditory modality, whereas in the visual or bimodal (visuo-auditory) tasks no impact of developmental coordination disorder or neuro-visual impairments was found. We conclude that impaired emotion processing constitutes a dimension to explore in the field of ASD, as research has the potential to define more homogeneous subgroups and tailored interventions. However, it is clear that developmental age, the nature of the stimuli, and other developmental comorbidities must also be taken into account when studying this dimension. PMID:26733928

  10. Visual abilities are important for auditory-only speech recognition: evidence from autism spectrum disorder.

    PubMed

    Schelinski, Stefanie; Riedel, Philipp; von Kriegstein, Katharina

    2014-12-01

    In auditory-only conditions, for example when we listen to someone on the phone, it is essential to recognize quickly and accurately what is said (speech recognition). Previous studies have shown that speech recognition performance in auditory-only conditions is better if the speaker is known not only by voice, but also by face. Here, we tested the hypothesis that such an improvement in auditory-only speech recognition depends on the ability to lip-read. To test this, we recruited a group of adults with autism spectrum disorder (ASD), a condition associated with difficulties in lip-reading, and typically developing controls. All participants were trained to identify six speakers by name and voice. Three speakers were learned by a video showing their face, and three others were learned in a matched control condition without a face. After training, participants performed an auditory-only speech recognition test that consisted of sentences spoken by the trained speakers. As a control condition, the test also included speaker identity recognition on the same auditory material. The results showed that, in the control group, performance in speech recognition was improved for speakers known by face in comparison to speakers learned in the matched control condition without a face. The ASD group lacked such a performance benefit. For the ASD group, auditory-only speech recognition was even worse for speakers known by face compared to speakers not known by face. In speaker identity recognition, the ASD group performed worse than the control group, independent of whether the speakers were learned with or without a face. Two additional visual experiments showed that the ASD group performed worse in lip-reading, whereas face identity recognition was within the normal range. The findings support the view that auditory-only communication involves specific visual mechanisms. Further, they indicate that in ASD, speaker-specific dynamic visual information is not available to optimize auditory-only speech recognition. Copyright © 2014 Elsevier Ltd. All rights reserved.

  11. Adaptive Suppression of Noise in Voice Communications

    NASA Technical Reports Server (NTRS)

    Kozel, David; DeVault, James A.; Birr, Richard B.

    2003-01-01

    A subsystem for the adaptive suppression of noise in a voice communication system effects a high level of reduction of noise that enters the system through microphones. The subsystem includes a digital signal processor (DSP) plus circuitry that implements voice-recognition and spectral- manipulation techniques. The development of the adaptive noise-suppression subsystem was prompted by the following considerations: During processing of the space shuttle at Kennedy Space Center, voice communications among test team members have been significantly impaired in several instances because some test participants have had to communicate from locations with high ambient noise levels. Ear protection for the personnel involved is commercially available and is used in such situations. However, commercially available noise-canceling microphones do not provide sufficient reduction of noise that enters through microphones and thus becomes transmitted on outbound communication links.
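
    The abstract does not detail the DSP algorithm. One common spectral-manipulation technique for this kind of microphone-borne noise is spectral subtraction, sketched below with SciPy as a generic illustration, not the subsystem's actual method: the noise spectrum is estimated from a stretch of signal assumed to contain no speech, then subtracted from each frame's magnitude spectrum.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(x, noise, fs, floor=0.05):
    """Generic spectral subtraction: subtract the average noise magnitude
    spectrum from each frame's magnitude, keep the noisy phase."""
    _, _, X = stft(x, fs=fs, nperseg=512)
    _, _, N = stft(noise, fs=fs, nperseg=512)
    noise_mag = np.abs(N).mean(axis=1, keepdims=True)            # noise estimate
    mag = np.maximum(np.abs(X) - noise_mag, floor * np.abs(X))   # spectral floor
    _, y = istft(mag * np.exp(1j * np.angle(X)), fs=fs, nperseg=512)
    return y

# Toy demonstration with synthetic signals (placeholders, not real audio).
fs = 8000
rng = np.random.default_rng(0)
noise = 0.1 * rng.standard_normal(fs)                    # 1 s noise-only segment
speech = np.sin(2 * np.pi * 300 * np.arange(fs) / fs)    # toy "speech" tone
cleaned = spectral_subtraction(speech + 0.1 * rng.standard_normal(fs), noise, fs)
```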

  12. Evolving Spiking Neural Networks for Recognition of Aged Voices.

    PubMed

    Silva, Marco; Vellasco, Marley M B R; Cataldo, Edson

    2017-01-01

    The aging of the voice, known as presbyphonia, is a natural process that can cause great change in the vocal quality of the individual. This is a relevant problem for people who use their voices professionally, and its early identification can help determine a suitable treatment to avoid its progress or even to eliminate the problem. This work focuses on the development of a new model for the identification of aging voices (independently of their chronological age), using parameters extracted from the voice and glottal signals as input attributes. The proposed model, named Quantum binary-real evolving Spiking Neural Network (QbrSNN), is based on spiking neural networks (SNNs), with an unsupervised training algorithm, and a Quantum-Inspired Evolutionary Algorithm that automatically determines the most relevant attributes and the optimal parameters that configure the SNN. The QbrSNN model was evaluated on a database of 120 records containing samples from three groups of speakers. The results obtained indicate that the proposed model provides better accuracy than other approaches, with fewer input attributes. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  13. Memory for faces and voices varies as a function of sex and expressed emotion.

    PubMed

    S Cortes, Diana; Laukka, Petri; Lindahl, Christina; Fischer, Håkan

    2017-01-01

    We investigated how memory for faces and voices (presented separately and in combination) varies as a function of sex and emotional expression (anger, disgust, fear, happiness, sadness, and neutral). At encoding, participants judged the expressed emotion of items in forced-choice tasks, followed by incidental Remember/Know recognition tasks. Results from 600 participants showed that accuracy (hits minus false alarms) was consistently higher for neutral compared to emotional items, whereas accuracy for specific emotions varied across the presentation modalities (i.e., faces, voices, and face-voice combinations). For the subjective sense of recollection ("remember" hits), neutral items received the highest hit rates only for faces, whereas for voices and face-voice combinations anger and fear expressions instead received the highest recollection rates. We also observed better accuracy for items by female expressers, and an own-sex bias whereby female participants displayed a memory advantage for female faces and face-voice combinations. Results further suggest that the own-sex bias can be explained by recollection rates rather than familiarity rates. Overall, results show that memory for faces and voices may be influenced by the expressions that they carry, as well as by the sex of both items and participants. Emotion expressions may also enhance the subjective sense of recollection without enhancing memory accuracy.

  14. Memory for faces and voices varies as a function of sex and expressed emotion

    PubMed Central

    Laukka, Petri; Lindahl, Christina; Fischer, Håkan

    2017-01-01

    We investigated how memory for faces and voices (presented separately and in combination) varies as a function of sex and emotional expression (anger, disgust, fear, happiness, sadness, and neutral). At encoding, participants judged the expressed emotion of items in forced-choice tasks, followed by incidental Remember/Know recognition tasks. Results from 600 participants showed that accuracy (hits minus false alarms) was consistently higher for neutral compared to emotional items, whereas accuracy for specific emotions varied across the presentation modalities (i.e., faces, voices, and face-voice combinations). For the subjective sense of recollection (“remember” hits), neutral items received the highest hit rates only for faces, whereas for voices and face-voice combinations anger and fear expressions instead received the highest recollection rates. We also observed better accuracy for items by female expressers, and an own-sex bias whereby female participants displayed a memory advantage for female faces and face-voice combinations. Results further suggest that the own-sex bias can be explained by recollection rates rather than familiarity rates. Overall, results show that memory for faces and voices may be influenced by the expressions that they carry, as well as by the sex of both items and participants. Emotion expressions may also enhance the subjective sense of recollection without enhancing memory accuracy. PMID:28570691

  15. More than Just Two Sexes: The Neural Correlates of Voice Gender Perception in Gender Dysphoria

    PubMed Central

    Junger, Jessica; Habel, Ute; Bröhr, Sabine; Neulen, Josef; Neuschaefer-Rube, Christiane; Birkholz, Peter; Kohler, Christian; Schneider, Frank; Derntl, Birgit; Pauly, Katharina

    2014-01-01

    Gender dysphoria (also known as “transsexualism”) is characterized as a discrepancy between anatomical sex and gender identity. Research points towards neurobiological influences. Due to the sexually dimorphic characteristics of the human voice, voice gender perception provides a biologically relevant function, e.g. in the context of mate selection. There is evidence for better recognition of voices of the opposite sex and a differentiation of the sexes in its underlying functional cerebral correlates, namely the prefrontal and middle temporal areas. This fMRI study investigated the neural correlates of voice gender perception in 32 male-to-female gender dysphoric individuals (MtFs) compared to 20 non-gender dysphoric men and 19 non-gender dysphoric women. Participants indicated the sex of 240 voice stimuli modified in semitone steps in the direction of the other gender. Compared to men and women, MtFs showed differences in a neural network including the medial prefrontal gyrus, the insula, and the precuneus when responding to male vs. female voices. With increased voice morphing, men recruited more prefrontal areas compared to women and MtFs, while MtFs revealed a pattern more similar to women. On a behavioral and neuronal level, our results support MtFs' reports that they cannot identify with their assigned sex. PMID:25375171

  16. Extending the Capture Volume of an Iris Recognition System Using Wavefront Coding and Super-Resolution.

    PubMed

    Hsieh, Sheng-Hsun; Li, Yung-Hui; Tien, Chung-Hao; Chang, Chin-Chen

    2016-12-01

    Iris recognition has gained increasing popularity over the last few decades; however, the stand-off distance in a conventional iris recognition system is too short, which limits its application. In this paper, we propose a novel hardware-software hybrid method to increase the stand-off distance in an iris recognition system. When designing the system hardware, we use an optimized wavefront coding technique to extend the depth of field. To compensate for the blurring of the image caused by wavefront coding, on the software side, the proposed system uses a local patch-based super-resolution method to restore the blurred image to its clear version. The collaborative effect of the new hardware design and software post-processing showed great potential in our experiment. The experimental results showed that such improvement cannot be achieved by a hardware- or software-only design. The proposed system can increase the capture volume of a conventional iris recognition system by three times and maintain the system's high recognition rate.

  17. Human factors issues associated with the use of speech technology in the cockpit

    NASA Technical Reports Server (NTRS)

    Kersteen, Z. A.; Damos, D.

    1983-01-01

    The human factors issues associated with the use of voice technology in the cockpit are summarized. The formulation of the LHX avionics suite is described and the allocation of tasks to voice in the cockpit is discussed. State-of-the-art speech recognition technology is reviewed. Finally, a questionnaire designed to tap pilot opinions concerning the allocation of tasks to voice input and output in the cockpit is presented. This questionnaire was designed to be administered to operational AH-1G Cobra gunship pilots. Half of the questionnaire deals specifically with the AH-1G cockpit and the types of tasks pilots would like to have performed by voice in this existing rotorcraft. The remaining portion of the questionnaire deals with an undefined rotorcraft of the future and is aimed at determining what types of tasks these pilots would like to have performed by voice technology if anything was possible, i.e. if there were no technological constraints.

  18. Voice recognition technology implementation in surgical pathology: advantages and limitations.

    PubMed

    Singh, Meenakshi; Pal, Timothy R

    2011-11-01

    Voice recognition technology (VRT) has been in use for medical transcription outside of laboratories for many years, and in recent years it has evolved to a level where it merits consideration by surgical pathologists. To determine the feasibility and impact of making a transition from a transcriptionist-based service to VRT in surgical pathology. We have evaluated VRT in a phased manner for sign out of general and subspecialty surgical pathology cases after conducting a pilot study. We evaluated the effect on turnaround time, workflow, staffing, typographical error rates, and the overall ability of VRT to be adapted for use in surgical pathology. The stepwise implementation of VRT has resulted in real-time sign out of cases and improvement in average turnaround time from 4 to 3 days. The percentage of cases signed out in 1 day improved from 22% to 37%. Amendment rates for typographical errors have decreased. Use of templates and synoptic reports has been facilitated. The transcription staff has been reassigned to other duties and is successfully assisting in other areas. Resident involvement and exposure to complete case sign out has been achieved resulting in a positive impact on resident education. Voice recognition technology allows for a seamless workflow in surgical pathology, with improvements in turnaround time and a positive impact on competency-based resident education. Individual practices may assess the value of VRT and decide to implement it, potentially with gains in many aspects of their practice.

  19. Science 101: How Does Speech-Recognition Software Work?

    ERIC Educational Resources Information Center

    Robertson, Bill

    2016-01-01

    This column provides background science information for elementary teachers. Many innovations with computer software begin with analysis of how humans do a task. This article takes a look at how humans recognize spoken words and explains the origins of speech-recognition software.

  20. Similar representations of emotions across faces and voices.

    PubMed

    Kuhn, Lisa Katharina; Wydell, Taeko; Lavan, Nadine; McGettigan, Carolyn; Garrido, Lúcia

    2017-09-01

    Emotions are a vital component of social communication, carried across a range of modalities and via different perceptual signals such as specific muscle contractions in the face and in the upper respiratory system. Previous studies have found that emotion recognition impairments after brain damage depend on the modality of presentation: recognition from faces may be impaired whereas recognition from voices remains preserved, and vice versa. On the other hand, there is also evidence for shared neural activation during emotion processing in both modalities. In a behavioral study, we investigated whether there are shared representations in the recognition of emotions from faces and voices. We used a within-subjects design in which participants rated the intensity of facial expressions and nonverbal vocalizations for each of the 6 basic emotion labels. For each participant and each modality, we then computed a representation matrix with the intensity ratings of each emotion. These matrices allowed us to examine the patterns of confusions between emotions and to characterize the representations of emotions within each modality. We then compared the representations across modalities by computing the correlations of the representation matrices across faces and voices. We found highly correlated matrices across modalities, which suggest similar representations of emotions across faces and voices. We also showed that these results could not be explained by commonalities between low-level visual and acoustic properties of the stimuli. We thus propose that there are similar or shared coding mechanisms for emotions which may act independently of modality, despite their distinct perceptual inputs. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
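
    A minimal sketch of the analysis logic described above, assuming each modality yields a 6x6 representation matrix (expressed emotion of the stimulus by rated emotion label) that is then correlated across modalities. The random data are placeholders, correlated by construction purely for demonstration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Placeholder intensity ratings: rows = expressed emotion of the stimulus,
# columns = the six emotion labels rated (values on a 0-10 scale).
faces = rng.uniform(0, 10, size=(6, 6))
voices = faces + rng.normal(0, 1, size=(6, 6))  # correlated by construction

# Compare the two representation matrices by correlating their entries.
r = np.corrcoef(faces.ravel(), voices.ravel())[0, 1]
print(f"face-voice representation correlation r = {r:.2f}")
```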

  1. Detecting Paroxysmal Coughing from Pertussis Cases Using Voice Recognition Technology

    PubMed Central

    Parker, Danny; Picone, Joseph; Harati, Amir; Lu, Shuang; Jenkyns, Marion H.; Polgreen, Philip M.

    2013-01-01

    Background Pertussis is highly contagious; thus, prompt identification of cases is essential to control outbreaks. Clinicians experienced with the disease can easily identify classic cases, where patients have bursts of rapid coughing followed by gasps and a characteristic whooping sound. However, many clinicians have never seen a case and thus may miss initial cases during an outbreak. The purpose of this project was to use voice-recognition software to distinguish pertussis coughs from croup and other coughs. Methods We collected a series of recordings representing pertussis, croup and miscellaneous coughing by children. We manually categorized coughs as either pertussis or non-pertussis, and extracted features for each category. We used Mel-frequency cepstral coefficients (MFCC), a sampling rate of 16 KHz, a frame duration of 25 msec, and a frame rate of 10 msec. The coughs were filtered. Each cough was divided into 3 sections of proportion 3-4-3. The average of the 13 MFCCs for each section was computed and combined into a 39-element feature vector used for the classification. We used the following machine learning algorithms: Neural Networks, K-Nearest Neighbor (KNN), and a 200-tree Random Forest (RF). Data were reserved for cross-validation of the KNN and RF. The Neural Network was trained 100 times, and the averaged results are presented. Results After categorization, we had 16 examples of non-pertussis coughs and 31 examples of pertussis coughs. Over 90% of all pertussis coughs were properly classified as pertussis. The error rates were: Type I errors of 7%, 12%, and 25% and Type II errors of 8%, 0%, and 0%, using the Neural Network, Random Forest, and KNN, respectively. Conclusion Our results suggest that we can build a robust classifier to assist clinicians and the public in identifying pertussis cases in children presenting with typical symptoms. PMID:24391730
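
    A minimal sketch of the feature extraction described (13 MFCCs at 16 kHz, 25 ms frames, 10 ms hop, cough split into 3-4-3 sections, section means concatenated into a 39-element vector), using librosa. The filtering step and classifier settings are omitted, and the file name is hypothetical:

```python
import numpy as np
import librosa

def cough_features(y, sr=16000):
    """13 MFCCs with a 25 ms window (400 samples) and 10 ms hop (160 samples),
    averaged over three sections in 3-4-3 proportion -> 39-element vector."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=400, hop_length=160)   # shape (13, frames)
    n = mfcc.shape[1]
    b1, b2 = round(0.3 * n), round(0.7 * n)                  # 3-4-3 boundaries
    sections = [mfcc[:, :b1], mfcc[:, b1:b2], mfcc[:, b2:]]
    return np.concatenate([s.mean(axis=1) for s in sections])

y, sr = librosa.load("cough.wav", sr=16000)   # hypothetical cough recording
print(cough_features(y, sr).shape)            # (39,)
```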

  2. Strength Is in Numbers: Can Concordant Artificial Listeners Improve Prediction of Emotion from Speech?

    PubMed

    Martinelli, Eugenio; Mencattini, Arianna; Daprati, Elena; Di Natale, Corrado

    2016-01-01

    Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Although numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet fast and reliable applications for emotion recognition are the obvious next step for present 'intelligent personal assistants', and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performance on a public database of spontaneous speech is reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc.) and for applications in the research domain (such as real-time pairing of stimuli to participants' emotional state, selective/differential data collection based on emotional content, etc.).

  3. Integrative Lifecourse and Genetic Analysis of Military Working Dogs

    DTIC Science & Technology

    2012-10-01

    ...Recognition), ICR (Intelligent Character Recognition) and HWR (Handwriting Recognition). A number of software packages were evaluated and we have...the third-party software is able to recognize check-boxes and columns and do a reasonable job with handwriting, which it does. This workflow will...

  4. User Committees Give NCI and Frederick National Lab Employees a Voice | Poster

    Cancer.gov

    Do you want a wider selection of food options at the Discovery Cafeteria? Do you wish Purchasing and Logistics would enhance the current software for more efficient processing? Maybe you’d like to see better defined service availability times at Occupational Health Services (OHS). Whatever your suggestion, you can make your voice heard by contacting the appropriate user

  5. Chaos tool implementation for non-singer and singer voice comparison (preliminary study)

    NASA Astrophysics Data System (ADS)

    Dajer, Me; Pereira, Jc; Maciel, Cd

    2007-11-01

    The voice waveform is shaped by the stretching, shortening, widening, or constricting of the vocal tract. The articulation effects of the singer's vocal tract modify the voice's acoustical characteristics, which differ from those of non-singer voices. In recent decades, chaos theory has shown that it is possible to explore the dynamic nature of voice signals from a different point of view. The purpose of this paper is to apply the chaos technique of phase space reconstruction to analyze non-singer and singer voices in order to explore the signals' nonlinear dynamics and correlate them with traditional acoustic parameters. Eight voice samples of the sustained vowel /i/ from non-singers and eight from singers were analyzed with the "ANL" software. The samples were also acoustically analyzed with "Analise de Voz 5.0" in order to extract the acoustic perturbation measures jitter and shimmer and the coefficient of excess (EX). The results showed different visual patterns for the two groups, correlated with different jitter, shimmer, and coefficient of excess values. We conclude that these results clearly indicate the potential of the phase space reconstruction technique for the analysis and comparison of non-singer and singer voices. They also show a promising tool for voice-training applications.
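
    Phase space reconstruction here refers to time-delay embedding: plotting x(t) against x(t+τ) and x(t+2τ) to recover the attractor's geometry from a single signal. A minimal sketch with NumPy, where the delay and dimension are illustrative choices rather than the paper's settings:

```python
import numpy as np

def delay_embed(x, dim=3, tau=10):
    """Time-delay embedding: rows are [x(t), x(t+tau), ..., x(t+(dim-1)*tau)]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

# Toy quasi-periodic "vowel" signal; in practice tau is often chosen from the
# first minimum of the autocorrelation or of the mutual information.
t = np.arange(4000) / 16000
signal = np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 440 * t)
trajectory = delay_embed(signal, dim=3, tau=10)
print(trajectory.shape)   # (3980, 3): points tracing the reconstructed orbit
```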

  6. Talker and accent variability effects on spoken word recognition

    NASA Astrophysics Data System (ADS)

    Nyang, Edna E.; Rogers, Catherine L.; Nishi, Kanae

    2003-04-01

    A number of studies have shown that words in a list are recognized less accurately in noise and with longer response latencies when they are spoken by multiple talkers, rather than a single talker. These results have been interpreted as support for an exemplar-based model of speech perception, in which it is assumed that detailed information regarding the speaker's voice is preserved in memory and used in recognition, rather than being eliminated via normalization. In the present study, the effects of varying both accent and talker are investigated using lists of words spoken by (a) a single native English speaker, (b) six native English speakers, (c) three native English speakers and three Japanese-accented English speakers. Twelve /hVd/ words were mixed with multi-speaker babble at three signal-to-noise ratios (+10, +5, and 0 dB) to create the word lists. Native English-speaking listeners' percent-correct recognition for words produced by native English speakers across the three talker conditions (single talker native, multi-talker native, and multi-talker mixed native and non-native) and three signal-to-noise ratios will be compared to determine whether sources of speaker variability other than voice alone add to the processing demands imposed by simple (i.e., single accent) speaker variability in spoken word recognition.
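
    Mixing words with multi-speaker babble at a target SNR follows directly from the definition SNR_dB = 10·log10(P_signal / P_noise): the babble is scaled so the power ratio matches the target. A minimal sketch with NumPy, using synthetic placeholder signals rather than the study's stimuli:

```python
import numpy as np

def mix_at_snr(speech, babble, snr_db):
    """Scale `babble` so that 10*log10(P_speech / P_noise) equals snr_db."""
    p_speech = np.mean(speech ** 2)
    p_babble = np.mean(babble ** 2)
    scale = np.sqrt(p_speech / (p_babble * 10 ** (snr_db / 10)))
    return speech + scale * babble

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 200 * np.arange(16000) / 16000)  # toy 1 s "word"
babble = rng.standard_normal(16000)                          # toy babble noise
for snr in (10, 5, 0):                                       # SNRs from the abstract
    mixed = mix_at_snr(speech, babble, snr)
```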

  7. Automatic speech recognition in air-ground data link

    NASA Technical Reports Server (NTRS)

    Armstrong, Herbert B.

    1989-01-01

    In the present air traffic system, information presented to the transport aircraft cockpit crew may originate from a variety of sources and may be presented to the crew in visual or aural form, either through cockpit instrument displays or, most often, through voice communication. Voice radio communication is the most error-prone method of air-ground data link: voice messages can be misstated or misunderstood, and radio frequency congestion can delay or obscure important messages. To prevent a proliferation of separate displays, a multiplexed data link display can be designed to present information from multiple data link sources on a shared cockpit display unit (CDU) or multi-function display (MFD), or some future combination of flight management and data link information. An aural data link which incorporates an automatic speech recognition (ASR) system for crew response offers several advantages over visual displays. The possibility of applying ASR to the air-ground data link was investigated. The first step was to review current efforts in ASR applications in the cockpit and in air traffic control and to evaluate their possible data link application. Next, a series of preliminary research questions is to be developed for possible future collaboration.

  8. Internet Based Remote Operations

    NASA Technical Reports Server (NTRS)

    Chamberlain, James

    1999-01-01

    This is the final report for the Internet Based Remote Operations contract, under which payload operations research support tasks were performed from March 1999 through September 1999. These tasks support the GSD goal of developing a secure, inexpensive data, voice, and video mission communications capability between remote payload investigators and the NASA payload operations team in the International Space Station (ISS) era. AZTek has provided feedback from the NASA payload community by utilizing its extensive payload development and operations experience to test and evaluate remote payload operations systems. AZTek has focused on use of the "public Internet" and inexpensive, commercial-off-the-shelf (COTS) Internet-based tools that would most benefit "small" (e.g., $2 million or less) payloads and small developers without permanent remote operations facilities. Such projects have limited budgets to support installation and development of high-speed dedicated communications links and high-end, custom ground support equipment and software. The primary conclusions of the study are as follows: (1) The trend of using Internet technology for "live" collaborative applications such as telescience will continue; the GSD-developed data and voice capabilities continued to work well over the "public" Internet during this period. (2) Transmitting multiple voice streams from a voice-conferencing server to a client PC, to be mixed and played on the PC, is feasible. (3) There are two classes of voice vendors in the market: large traditional phone equipment vendors pursuing integration of the PSTN with the Internet, and small Internet startups. The key to selecting a vendor will be to find a company sufficiently large and established to provide a base voice-conferencing software product line for the next several years.

  9. Speech perception and communication ability over the telephone by Mandarin-speaking children with cochlear implants.

    PubMed

    Wu, Che-Ming; Liu, Tien-Chen; Wang, Nan-Mai; Chao, Wei-Chieh

    2013-08-01

    (1) To understand speech perception and communication ability through real telephone calls by Mandarin-speaking children with cochlear implants and compare them to live-voice perception, (2) to report the general condition of telephone use in this population, and (3) to investigate the factors that correlate with telephone speech perception performance. Fifty-six children with over 4 years of implant use (aged 6.8-13.6 years, mean duration 8.0 years) took three speech perception tests, administered by telephone and by live voice, examining sentence, monosyllabic-word and Mandarin tone perception. The children also filled out a questionnaire survey investigating everyday telephone use. The Wilcoxon signed-rank test was used to compare the scores between live-voice and telephone tests, and correlation tests to examine the relationships between them. The mean scores were 86.4%, 69.8% and 70.5%, respectively, for sentence, word and tone recognition over the telephone. The corresponding live-voice mean scores were 94.3%, 84.0% and 70.8%. The Wilcoxon signed-rank test showed that the sentence and word scores differed significantly between the telephone and live-voice tests, while the tone recognition scores did not, indicating that tone perception was less degraded by telephone transmission than word and sentence perception. Correlation testing showed that chronological age and duration of implant use were weakly correlated with the perception test scores. The questionnaire survey showed 78% of the children could initiate phone calls and 59% could use the telephone 2 years after implantation. Implanted children are potentially capable of using the telephone 2 years after implantation, and communication ability over the telephone becomes satisfactory 4 years after implantation. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  10. Gay- and Lesbian-Sounding Auditory Cues Elicit Stereotyping and Discrimination.

    PubMed

    Fasoli, Fabio; Maass, Anne; Paladino, Maria Paola; Sulpizio, Simone

    2017-07-01

    The growing body of literature on the recognition of sexual orientation from voice ("auditory gaydar") is silent on the cognitive and social consequences of having a gay-/lesbian- versus heterosexual-sounding voice. We investigated this issue in four studies (overall N = 276), conducted in Italian, in which heterosexual listeners were exposed to single-sentence voice samples of gay/lesbian and heterosexual speakers. In all four studies, listeners were found to make gender-typical inferences about traits and preferences of heterosexual speakers, but gender-atypical inferences about those of gay or lesbian speakers. Behavioral intention measures showed that listeners considered lesbian and gay speakers less suitable for a leadership position, and male (but not female) listeners distanced themselves from gay speakers. Together, this research demonstrates that having a gay-/lesbian- rather than heterosexual-sounding voice has tangible consequences for stereotyping and discrimination.

  11. Style and Usage Software: Mentor, not Judge.

    ERIC Educational Resources Information Center

    Smye, Randy

    Computer software style and usage checkers can encourage students' recursive revision strategies. For example, HOMER is based on the revision pedagogy presented in Richard Lanham's "Revising Prose," while Grammatik II focuses on readability, passive voice, and possibly misused words or phrases. Writer's Workbench "Style" (a UNIX program) provides…

  12. Air Traffic Control Experimentation and Evaluation with the NASA ATS-6 Satellite : Volume 4. Data Reduction and Analysis Software.

    DOT National Transportation Integrated Search

    1976-09-01

    Software used for the reduction and analysis of the multipath prober, modem evaluation (voice, digital data, and ranging), and antenna evaluation data acquired during the ATS-6 field test program is described. Multipath algorithms include reformattin...

  13. Basic and complex emotion recognition in children with autism: cross-cultural findings.

    PubMed

    Fridenson-Hayo, Shimrit; Berggren, Steve; Lassalle, Amandine; Tal, Shahar; Pigat, Delia; Bölte, Sven; Baron-Cohen, Simon; Golan, Ofer

    2016-01-01

    Children with autism spectrum conditions (ASC) have emotion recognition deficits when tested in different expression modalities (face, voice, body). However, these findings usually focus on basic emotions, using one or two expression modalities. In addition, cultural similarities and differences in emotion recognition patterns in children with ASC have not been explored before. The current study examined the similarities and differences in the recognition of basic and complex emotions by children with ASC and typically developing (TD) controls across three cultures: Israel, Britain, and Sweden. Fifty-five children with high-functioning ASC, aged 5-9, were compared to 58 TD children. At each site, groups were matched on age, sex, and IQ. Children were tested using four tasks, examining recognition of basic and complex emotions from voice recordings, videos of facial and bodily expressions, and emotional video scenarios including all modalities in context. Compared to their TD peers, children with ASC showed emotion recognition deficits in both basic and complex emotions on all three modalities and their integration in context. Complex emotions were harder to recognize than basic emotions for the entire sample. Cross-cultural agreement was found for all major findings, with minor deviations on the face and body tasks. Our findings highlight the multimodal nature of emotion recognition deficits in ASC, which exist for basic as well as complex emotions and are relatively stable cross-culturally. Cross-cultural research has the potential to reveal both autism-specific universal deficits and the role that specific cultures play in the way empathy operates in different countries.

  14. Impact of a Telehealth Program With Voice Recognition Technology in Patients With Chronic Heart Failure: Feasibility Study

    PubMed Central

    Lee, Heesun; Choi, Sae Won; Yoon, Yeonyee E; Park, Hyo Eun; Lee, Sang Eun; Lee, Seung-Pyo; Kim, Hyung-Kwan; Cho, Hyun-Jai; Choi, Su-Yeon; Lee, Hae-Young; Choi, Jonghyuk; Lee, Young-Joon; Kim, Yong-Jin; Cho, Goo-Yeong; Choi, Jinwook; Sohn, Dae-Won

    2017-01-01

    Background Despite the advances in the diagnosis and treatment of heart failure (HF), the current hospital-oriented framework for HF management does not appear to be sufficient to maintain the stability of HF patients in the long term. The importance of self-care management is increasingly being emphasized as a promising long-term treatment strategy for patients with chronic HF. Objective The objective of this study was to evaluate whether a new information communication technology (ICT)–based telehealth program with voice recognition technology could improve clinical or laboratory outcomes in HF patients. Methods In this prospective single-arm pilot study, we recruited 31 consecutive patients with chronic HF who were referred to our institute. An ICT-based telehealth program with voice recognition technology was developed and used by patients with HF for 12 weeks. Patients were educated on the use of this program via mobile phone, landline, or the Internet for the purpose of improving communication and data collection. Using these systems, we collected comprehensive data elements related to the risk of HF self-care management such as weight, diet, exercise, medication adherence, overall symptom change, and home blood pressure. The study endpoints were the changes observed in urine sodium concentration (uNa), Minnesota Living with Heart Failure Questionnaire (MLHFQ) scores, the 6-min walk test, and N-terminal prohormone of brain natriuretic peptide (NT-proBNP) as surrogate markers of appropriate HF management. Results Among the 31 enrolled patients, 27 (87%) completed the study, and 10 (10/27, 37%) showed good adherence to the ICT-based telehealth program with voice recognition technology, defined as using the program 100 times or more during the study period. Nearly three-fourths of the patients had been hospitalized at least once because of HF before enrollment (20/27, 74%): 14 patients had 1 previous HF hospitalization, 2 had 2, and 4 had 3 or more. In the total study population, there was no significant interval change in laboratory and functional outcome variables after 12 weeks of the ICT-based telehealth program. In patients with good adherence to the program, there was a significant improvement in mean uNa (103.1 to 78.1; P=.01) but not in those without good adherence (85.4 to 96.9; P=.49). Similarly, a marginal improvement in MLHFQ scores was observed only in patients with good adherence (27.5 to 21.4; P=.08) and not in their counterparts (19.0 to 19.7; P=.73). The mean 6-min walk distance and NT-proBNP did not change significantly in patients regardless of their adherence. Conclusions Short-term application of an ICT-based telehealth program with voice recognition technology showed the potential to improve uNa values and MLHFQ scores in HF patients, suggesting that better control of sodium intake and greater quality of life can be achieved with this program. PMID:28970189

  15. User Committees Give NCI and Frederick National Lab Employees a Voice | Poster

    Cancer.gov

    Do you want a wider selection of food options at the Discovery Cafeteria? Do you wish Purchasing and Logistics would enhance the current software for more efficient processing? Maybe you’d like to see better defined service availability times at Occupational Health Services (OHS). Whatever your suggestion, you can make your voice heard by contacting the appropriate user committee online.

  16. A Dual-Mode Human Computer Interface Combining Speech and Tongue Motion for People with Severe Disabilities

    PubMed Central

    Huo, Xueliang; Park, Hangue; Kim, Jeonghee; Ghovanloo, Maysam

    2015-01-01

    We present a new wireless and wearable human-computer interface called the dual-mode Tongue Drive System (dTDS), which is designed to allow people with severe disabilities to use computers more effectively, with increased speed, flexibility, usability, and independence, through their tongue motion and speech. The dTDS detects users' tongue motion using a magnetic tracer and an array of magnetic sensors embedded in a compact and ergonomic wireless headset. It also captures the users' voice wirelessly using a small microphone embedded in the same headset. Preliminary evaluation results based on 14 able-bodied subjects and three individuals with high-level spinal cord injuries at level C3–C5 indicated that the dTDS headset, combined with commercially available speech recognition (SR) software, can provide end users with significantly higher performance than either unimodal form based on tongue motion or speech alone, particularly in completing tasks that require both pointing and text entry. PMID:23475380

  17. Applied virtual reality at the Research Triangle Institute

    NASA Technical Reports Server (NTRS)

    Montoya, R. Jorge

    1994-01-01

    Virtual Reality (VR) is a way for humans to use computers in visualizing, manipulating and interacting with large geometric data bases. This paper describes a VR infrastructure and its application to marketing, modeling, architectural walk-through, and training problems. VR integration techniques used in these applications are based on a uniform approach which promotes portability and reusability of developed modules. For each problem, a 3D object data base is created using data captured by hand or electronically. The object's realism is enhanced through either procedural or photo textures. The virtual environment is created and populated with the data base using software tools which also support interaction with, and immersion in, the environment. These capabilities are augmented by other sensory channels such as voice recognition, 3D sound, and tracking. Four applications are presented: a virtual furniture showroom, virtual reality models of the North Carolina Global TransPark, a walk through the Dresden Frauenkirche, and a maintenance training simulator for the National Guard.

  18. 78 FR 58305 - Honeywell International, Inc.; Analysis of Agreement Containing Consent Order To Aid Public Comment

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-09-23

    ..., formulas, patterns, devices, manufacturing processes, or customer names. If you want the Commission to give... barcode scanners, barcode printers, RFID systems and voice recognition systems. III. Scan Engines The...

  19. Who Goes There?

    ERIC Educational Resources Information Center

    Vail, Kathleen

    1995-01-01

    Biometrics (hand geometry, iris and retina scanners, voice and facial recognition, signature dynamics, facial thermography, and fingerprint readers) identifies people based on physical characteristics. Administrators worried about kidnapping, vandalism, theft, and violent intruders might welcome these security measures when they become more…

  20. Exploring the feasibility of smart phone microphone for measurement of acoustic voice parameters and voice pathology screening.

    PubMed

    Uloza, Virgilijus; Padervinskis, Evaldas; Vegiene, Aurelija; Pribuisiene, Ruta; Saferis, Viktoras; Vaiciukynas, Evaldas; Gelzinis, Adas; Verikas, Antanas

    2015-11-01

    The objective of this study is to evaluate the reliability of acoustic voice parameters obtained using smart phone (SP) microphones and to investigate the utility of SP voice recordings for voice screening. Voice samples of the sustained vowel /a/ obtained from 118 subjects (34 normal and 84 pathological voices) were recorded simultaneously through two microphones: an oral AKG Perception 220 microphone and an SP Samsung Galaxy Note3 microphone. Acoustic voice signal data were measured for fundamental frequency, jitter and shimmer, normalized noise energy (NNE), signal-to-noise ratio and harmonic-to-noise ratio using Dr. Speech software. Discriminant analysis-based Correct Classification Rate (CCR) and Random Forest Classifier (RFC) based Equal Error Rate (EER) were used to evaluate the feasibility of acoustic voice parameters in classifying normal and pathological voice classes. The Lithuanian version of the Glottal Function Index (LT_GFI) questionnaire was utilized for self-assessment of the severity of voice disorder. The correlations of acoustic voice parameters obtained with the two types of microphones were statistically significant and strong (r = 0.73-1.0) across the entire set of measurements. When classifying into normal/pathological voice classes, the oral-NNE revealed a CCR of 73.7% and the pair of SP-NNE and SP-shimmer parameters revealed a CCR of 79.5%. However, fusion of the results obtained from SP voice recordings and GFI data provided a CCR of 84.6%, and RFC revealed an EER of 7.9%. In conclusion, measurements of acoustic voice parameters using the SP microphone were shown to be reliable in clinical settings, demonstrating high CCR and low EER when distinguishing normal and pathological voice classes, and validating the suitability of the SP microphone signal for the task of automatic voice analysis and screening.
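
    The acoustic measures listed above (fundamental frequency, jitter, shimmer, noise ratios) are standard and can be computed with open tools. Below is a minimal sketch using the parselmouth bindings to Praat rather than the Dr. Speech package used in the study; the file name is a placeholder and the analysis settings are Praat defaults, not the study's configuration.

    ```python
    import parselmouth
    from parselmouth.praat import call

    snd = parselmouth.Sound("sustained_a.wav")  # placeholder recording
    f0 = call(snd.to_pitch(), "Get mean", 0, 0, "Hertz")

    # Glottal pulse marks, then period/amplitude perturbation measures.
    points = call(snd, "To PointProcess (periodic, cc)", 75, 500)
    jitter = call(points, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
    shimmer = call([snd, points], "Get shimmer (local)",
                   0, 0, 0.0001, 0.02, 1.3, 1.6)
    hnr = call(snd.to_harmonicity_cc(), "Get mean", 0, 0)

    print(f"F0={f0:.1f} Hz, jitter={jitter:.4f}, "
          f"shimmer={shimmer:.4f}, HNR={hnr:.1f} dB")
    ```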

  1. Effects of cross-language voice training on speech perception: Whose familiar voices are more intelligible?

    PubMed Central

    Levi, Susannah V.; Winters, Stephen J.; Pisoni, David B.

    2011-01-01

    Previous research has shown that familiarity with a talker’s voice can improve linguistic processing (herein, “Familiar Talker Advantage”), but this benefit is constrained by the context in which the talker’s voice is familiar. The current study examined how familiarity affects intelligibility by manipulating the type of talker information available to listeners. One group of listeners learned to identify bilingual talkers’ voices from English words, where they learned language-specific talker information. A second group of listeners learned the same talkers from German words, and thus only learned language-independent talker information. After voice training, both groups of listeners completed a word recognition task with English words produced by both familiar and unfamiliar talkers. Results revealed that English-trained listeners perceived more phonemes correct for familiar than unfamiliar talkers, while German-trained listeners did not show improved intelligibility for familiar talkers. The absence of a processing advantage in speech intelligibility for the German-trained listeners demonstrates limitations on the Familiar Talker Advantage, which crucially depends on the language context in which the talkers’ voices were learned; knowledge of how a talker produces linguistically relevant contrasts in a particular language is necessary to increase speech intelligibility for words produced by familiar talkers. PMID:22225059

  2. Systems concept for speech technology application in general aviation

    NASA Technical Reports Server (NTRS)

    North, R. A.; Bergeron, H.

    1984-01-01

    The application potential of voice recognition and synthesis circuits for general aviation, single-pilot IFR (SPIFR) situations is examined. The viewpoint of the pilot was central to the workload analyses and the assessment of the effectiveness of the voice systems. A twin-engine, high-performance general aviation aircraft on a cross-country fixed route was employed as the study model. No actual control movements were considered, and other possible functions were scored by three IFR-rated instructors. Voice recognition was concluded to be helpful in alleviating visual and manual SPIFR workloads during take-off, approach and landing, particularly for data retrieval and entry tasks. Voice synthesis was an aid in alerting the pilot to in-flight problems. It is expected that usable systems will be available within 5 years.

  3. Detection of Terrorist Preparations by an Artificial Intelligence Expert System Employing Fuzzy Signal Detection Theory

    DTIC Science & Technology

    2004-10-25

    FUSEDOT does not require facial recognition , or video surveillance of public areas, both of which are apparently a component of TIA ([26], pp...does not use fuzzy signal detection. Involves facial recognition and video surveillance of public areas. Involves monitoring the content of voice...fuzzy signal detection, which TIA does not. Second, FUSEDOT would be easier to develop, because it does not require the development of facial

  4. Vocal training in an anthropometrical aspect.

    PubMed

    Wyganowska-Świątkowska, Marzena; Kowalkowska, Iwona; Flicińska-Pamfil, Grażyna; Dąbrowski, Mikołaj; Kopczyński, Przemysław; Wiskirska-Woźnica, Bożena

    2017-12-01

    As shown in our previous paper, the dimensions of the cerebral parts of the cranium and face of the vocal students were greater than those of the non-singing students. The aim of the present study was to analyse voice type and voice development in relation to selected dimensions. A total of 56 vocal students - 36 women and 20 men - who underwent anthropometric measurements were divided into groups according to their voice type. Two professors of singing made subjective, independent evaluations of individual students' vocal development progress during the four years of training. The findings were analysed statistically with current licensed versions of Statistica software. We found statistically significant positive correlations between head length, head and face width, depth of the upper and middle face, and nose length on the one hand and the students' voice development on the other. The dimensions of the head and the face have no impact on voice type; however, some anatomical characteristics may have an impact on voice development.

  5. Robot Command Interface Using an Audio-Visual Speech Recognition System

    NASA Astrophysics Data System (ADS)

    Ceballos, Alexánder; Gómez, Juan; Prieto, Flavio; Redarce, Tanneguy

    In recent years, audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This document presents an automatic command-recognition system using audio-visual information. The system is expected to control the da Vinci laparoscopic robot. The audio signal is processed using the Mel Frequency Cepstral Coefficients parametrization method. In addition, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used to extract the visual speech information.

  6. Intra-oral pressure-based voicing control of electrolaryngeal speech with intra-oral vibrator.

    PubMed

    Takahashi, Hirokazu; Nakao, Masayuki; Kikuchi, Yataro; Kaga, Kimitaka

    2008-07-01

    In normal speech, coordinated activities of the intrinsic laryngeal muscles suspend the glottal sound during the utterance of voiceless consonants, automatically realizing a voicing control. In electrolaryngeal speech, however, the lack of voicing control is one of the causes of unclear voice, with voiceless consonants tending to be misheard as the corresponding voiced consonants. In the present work, we developed an intra-oral vibrator with an intra-oral pressure sensor that detected the utterance of voiceless phonemes during intra-oral electrolaryngeal speech, and demonstrated that an intra-oral pressure-based voicing control could improve the intelligibility of the speech. The test voices were obtained from one electrolaryngeal speaker and one normal speaker. We first investigated, using speech analysis software, how the voice onset time (VOT) and first formant (F1) transition of the test consonant-vowel syllables contributed to voiceless/voiced contrasts, and developed an adequate voicing control strategy. We then compared the intelligibility of consonant-vowel syllables in intra-oral electrolaryngeal speech with and without online voicing control. The increase of intra-oral pressure, typically with a peak ranging from 10 to 50 gf/cm2, could reliably identify the utterance of voiceless consonants. The speech analysis and intelligibility test then demonstrated that a short VOT caused misidentification as voiced consonants due to a clear F1 transition. Finally, taking these results together, the online voicing control, which suspended the prosthetic tone while the intra-oral pressure exceeded 2.5 gf/cm2 and during the 35 milliseconds that followed, proved efficient in improving the voiceless/voiced contrast.
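
    The control strategy described above is a threshold with a hold-off window: the tone is suspended while pressure exceeds 2.5 gf/cm2 and for 35 ms afterwards. A minimal sample-level sketch of that gate follows; the sensor sampling rate is an assumption, and sensor I/O and tone generation are outside the sketch.

    ```python
    SAMPLE_RATE_HZ = 1000                       # assumed sensor sampling rate
    THRESHOLD_GF_CM2 = 2.5                      # suspension threshold from the paper
    HOLD_SAMPLES = int(0.035 * SAMPLE_RATE_HZ)  # 35 ms post-threshold hold

    def voicing_gate(pressure_samples):
        """Return per-sample True (tone on) / False (tone suspended)."""
        gate, hold = [], 0
        for p in pressure_samples:
            if p > THRESHOLD_GF_CM2:
                hold = HOLD_SAMPLES   # restart the 35 ms suspension window
                gate.append(False)
            elif hold > 0:
                hold -= 1             # still inside the hold window
                gate.append(False)
            else:
                gate.append(True)     # voiced segment: tone on
        return gate
    ```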

  7. An innovative multimodal virtual platform for communication with devices in a natural way

    NASA Astrophysics Data System (ADS)

    Kinkar, Chhayarani R.; Golash, Richa; Upadhyay, Akhilesh R.

    2012-03-01

    As technology advances, people are increasingly interested in communicating with machines and computers naturally. Doing so makes devices more compact and portable by eliminating remotes, keyboards and similar peripherals, and it helps users live in an environment freer of electromagnetic emissions. This trend has made recognition of natural modalities in human-computer interaction an appealing and promising research field. At the same time, it has been observed that using a single mode of interaction limits the full utilization of commands as well as data flow. In this paper, a multimodal platform is proposed in which, out of the many natural modalities available (eye gaze, speech, face, etc.), human gestures are combined with human voice in order to minimize the mean square error. This loosens the strict environment needed for accurate and robust interaction with a single mode. Gestures complement speech: gestures are ideal for direct object manipulation, while natural language is suited to descriptive tasks. Human-computer interaction basically requires two broad stages, recognition and interpretation. Recognition and interpretation of natural modalities in complex binary instructions is a difficult task, as it connects the real world to a virtual environment. The main idea of the paper is to develop an efficient model for fusing data coming from heterogeneous sensors, camera and microphone. Through this paper we show that efficiency increases if heterogeneous data (image and voice) are combined at the feature level using artificial intelligence. The long-term goal of this work is to design a robust system for users who are physically disabled or have limited technical knowledge.

  8. On the definition and interpretation of voice selective activation in the temporal cortex

    PubMed Central

    Bethmann, Anja; Brechmann, André

    2014-01-01

    Regions along the superior temporal sulci and in the anterior temporal lobes have been found to be involved in voice processing. It has even been argued that parts of the temporal cortices serve as voice-selective areas. Yet, evidence for voice-selective activation in the strict sense is still missing. The current fMRI study aimed at assessing the degree of voice-specific processing in different parts of the superior and middle temporal cortices. To this end, voices of famous persons were contrasted with widely different categories, which were sounds of animals and musical instruments. The argumentation was that only brain regions with statistically proven absence of activation by the control stimuli may be considered as candidates for voice-selective areas. Neural activity was found to be stronger in response to human voices in all analyzed parts of the temporal lobes except for the middle and posterior STG. More importantly, the activation differences between voices and the other environmental sounds increased continuously from the mid-posterior STG to the anterior MTG. Here, only voices but not the control stimuli excited an increase of the BOLD response above a resting baseline level. The findings are discussed with reference to the function of the anterior temporal lobes in person recognition and the general question on how to define selectivity of brain regions for a specific class of stimuli or tasks. In addition, our results corroborate recent assumptions about the hierarchical organization of auditory processing building on a processing stream from the primary auditory cortices to anterior portions of the temporal lobes. PMID:25071527

  9. Accommodation and Compliance Series: Employees with Arthritis

    MedlinePlus

    ... handed keyboard, an articulating keyboard tray, speech recognition software, a trackball, and office equipment for a workstation ... space heater, additional window insulation, and speech recognition software. An insurance clerk with arthritis from systemic lupus ...

  10. Should visual speech cues (speechreading) be considered when fitting hearing aids?

    NASA Astrophysics Data System (ADS)

    Grant, Ken

    2002-05-01

    When talker and listener are face-to-face, visual speech cues become an important part of the communication environment, and yet, these cues are seldom considered when designing hearing aids. Models of auditory-visual speech recognition highlight the importance of complementary versus redundant speech information for predicting auditory-visual recognition performance. Thus, for hearing aids to work optimally when visual speech cues are present, it is important to know whether the cues provided by amplification and the cues provided by speechreading complement each other. In this talk, data will be reviewed that show nonmonotonicity between auditory-alone speech recognition and auditory-visual speech recognition, suggesting that efforts designed solely to improve auditory-alone recognition may not always result in improved auditory-visual recognition. Data will also be presented showing that one of the most important speech cues for enhancing auditory-visual speech recognition performance, voicing, is often the cue that benefits least from amplification.

  11. Wavelet-based associative memory

    NASA Astrophysics Data System (ADS)

    Jones, Katharine J.

    2004-04-01

    Faces provide important characteristics for a person's identification. In security checks, face recognition remains in continuous use despite other approaches (e.g., fingerprints, voice recognition, pupil contraction, DNA scanners). With an associative memory, the output data is recalled directly using the input data. This can be achieved with a Nonlinear Holographic Associative Memory (NHAM). This approach can also distinguish between strongly correlated images and images that are partially or totally enclosed by others. Adaptive wavelet lifting has been used for Content-Based Image Retrieval. In this paper, adaptive wavelet lifting will be applied to face recognition to achieve an associative memory.

  12. A posteriori error estimates in voice source recovery

    NASA Astrophysics Data System (ADS)

    Leonov, A. S.; Sorokin, V. N.

    2017-12-01

    The inverse problem of voice source pulse recovery from a segment of a speech signal is under consideration. A special mathematical model relating these quantities is used for the solution. A variational method is proposed for solving the inverse problem of voice source recovery for a new parametric class of sources, namely piecewise-linear sources (PWL-sources). A technique for a posteriori numerical error estimation of the obtained solutions is also presented. A computer study of the adequacy of the adopted speech production model with PWL-sources is performed by solving the inverse problem for various types of voice signals, together with a corresponding study of the a posteriori error estimates. Numerical experiments on speech signals show satisfactory properties of the proposed a posteriori error estimates, which represent upper bounds on the possible errors in solving the inverse problem. The estimate of the most probable error in determining the source-pulse shapes is about 7-8% for the investigated speech material. It is noted that a posteriori error estimates can be used as a quality criterion for the obtained voice source pulses in speaker recognition applications.

  13. Experimental study on GMM-based speaker recognition

    NASA Astrophysics Data System (ADS)

    Ye, Wenxing; Wu, Dapeng; Nucci, Antonio

    2010-04-01

    Speaker recognition plays a very important role in the field of biometric security. In order to improve recognition performance, many pattern recognition techniques have been explored in the literature. Among these techniques, the Gaussian Mixture Model (GMM) has proved to be an effective statistical model for speaker recognition and is used in most state-of-the-art speaker recognition systems. The GMM is used to represent the 'voice print' of a speaker by modeling the spectral characteristics of the speaker's speech signals. In this paper, we implement a speaker recognition system consisting of preprocessing, Mel-Frequency Cepstrum Coefficients (MFCC) based feature extraction, and GMM-based classification. We test our system with the TIDIGITS data set (325 speakers) and our own recordings of more than 200 speakers; our system achieves a 100% correct recognition rate. Moreover, we also test our system under the scenario in which training samples are from one language but test samples are from a different language; our system again achieves a 100% correct recognition rate, which indicates that it is language independent.
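
    The MFCC-plus-GMM pipeline described above is standard, and a minimal sketch can be written with librosa and scikit-learn (the paper does not name its tools, so these libraries are this sketch's choice); file names, the number of mixture components, and the MFCC settings are placeholders, not the paper's configuration.

    ```python
    import librosa
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def mfcc_features(path):
        """Load audio and return an (n_frames, 13) MFCC matrix."""
        y, sr = librosa.load(path, sr=16000)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

    def enroll(train_files):
        """Fit one GMM per speaker from a {speaker: wav_path} dict."""
        models = {}
        for speaker, path in train_files.items():
            gmm = GaussianMixture(n_components=16, covariance_type="diag")
            models[speaker] = gmm.fit(mfcc_features(path))
        return models

    def identify(models, path):
        """Pick the speaker whose GMM gives the highest mean log-likelihood."""
        feats = mfcc_features(path)
        return max(models, key=lambda s: models[s].score(feats))

    # models = enroll({"alice": "alice_train.wav", "bob": "bob_train.wav"})
    # print(identify(models, "unknown.wav"))
    ```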

  14. Assessment of Spectral and Temporal Resolution in Cochlear Implant Users Using Psychoacoustic Discrimination and Speech Cue Categorization.

    PubMed

    Winn, Matthew B; Won, Jong Ho; Moon, Il Joon

    This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). The authors hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. The authors further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Nineteen cochlear implant listeners and 10 listeners with normal hearing participated in a suite of tasks that included spectral ripple discrimination, temporal modulation detection, and syllable categorization, which was split into a spectral cue-based task (targeting the /ba/-/da/ contrast) and a timing cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for cochlear implant listeners. Cochlear implant users were generally less successful at utilizing both spectral and temporal cues for categorization compared with listeners with normal hearing. For the cochlear implant listener group, spectral ripple discrimination was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. Temporal modulation detection using 100- and 10-Hz-modulated noise was not correlated either with the cochlear implant subjects' categorization of voice onset time or with word recognition. Word recognition was correlated more closely with categorization of the controlled speech cues than with performance on the psychophysical discrimination tasks. When evaluating people with cochlear implants, controlled speech-based stimuli are feasible to use in tests of auditory cue categorization, to complement traditional measures of auditory discrimination. Stimuli based on specific speech cues correspond to counterpart nonlinguistic measures of discrimination, but potentially show better correspondence with speech perception more generally. The ubiquity of the spectral (formant transition) and temporal (voice onset time) stimulus dimensions across languages highlights the potential to use this testing approach even in cases where English is not the native language.
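
    The categorization analysis described above fits listener responses along a cue continuum with logistic regression, where the fitted slope quantifies perceptual sensitivity to the cue. A minimal sketch follows, with hypothetical VOT steps and responses rather than the study's stimuli.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    vot_ms = np.array([0, 10, 20, 30, 40, 50, 60]).reshape(-1, 1)  # /b/-/p/ steps
    resp_p = np.array([0, 0, 0, 1, 1, 1, 1])  # 1 = listener heard voiceless /p/

    model = LogisticRegression().fit(vot_ms, resp_p)
    slope = model.coef_[0][0]                 # steeper = sharper categorization
    boundary = -model.intercept_[0] / slope   # VOT at the 50% crossover
    print(f"slope={slope:.3f} per ms, category boundary ~{boundary:.1f} ms")
    ```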

  15. Combined Use of Standard and Throat Microphones for Measurement of Acoustic Voice Parameters and Voice Categorization.

    PubMed

    Uloza, Virgilijus; Padervinskis, Evaldas; Uloziene, Ingrida; Saferis, Viktoras; Verikas, Antanas

    2015-09-01

    The aim of the present study was to evaluate the reliability of measurements of acoustic voice parameters obtained simultaneously using oral and contact (throat) microphones, and to investigate the utility of the combined use of these microphones for voice categorization. Voice samples of the sustained vowel /a/ obtained from 157 subjects (105 healthy and 52 pathological voices) were recorded in a soundproof booth simultaneously through two microphones: an oral AKG Perception 220 microphone (AKG Acoustics, Vienna, Austria) and a contact (throat) Triumph PC microphone (Clearer Communications, Inc, Burnaby, Canada) placed on the lamina of the thyroid cartilage. Acoustic voice signal data were measured for fundamental frequency, percent of jitter and shimmer, normalized noise energy, signal-to-noise ratio, and harmonic-to-noise ratio using Dr. Speech software (Tiger Electronics, Seattle, WA). The correlations of acoustic voice parameters were statistically significant and strong (r = 0.71-1.0) across the entire set of measurements obtained with the two microphones. When classifying into healthy-pathological voice classes, the oral-shimmer revealed a correct classification rate (CCR) of 75.2% and the throat-jitter a CCR of 70.7%. However, the combination of both throat and oral microphones allowed identification of a set of three voice parameters: throat-signal-to-noise ratio, oral-shimmer, and oral-normalized noise energy, which provided a CCR of 80.3%. The measurements of acoustic voice parameters using a combination of oral and throat microphones were shown to be reliable in clinical settings and demonstrated high CCRs when distinguishing the healthy and pathological voice patient groups. Our study validates the suitability of the throat microphone signal for the task of automatic voice analysis for the purpose of voice screening. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
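
    The agreement check reported above compares the same parameter measured simultaneously through the two microphones. A minimal sketch of that correlation step follows, with hypothetical jitter values rather than the study's data.

    ```python
    from scipy.stats import pearsonr

    oral_jitter   = [0.31, 0.45, 0.28, 0.90, 1.20, 0.35, 0.60]  # jitter (%), oral mic
    throat_jitter = [0.33, 0.41, 0.30, 0.85, 1.10, 0.38, 0.64]  # same subjects, throat mic

    r, p = pearsonr(oral_jitter, throat_jitter)
    print(f"r={r:.2f}, p={p:.4f}")  # r near 1 indicates strong agreement
    ```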

  16. Perceptual, auditory and acoustic vocal analysis of speech and singing in choir conductors.

    PubMed

    Rehder, Maria Inês Beltrati Cornacchioni; Behlau, Mara

    2008-01-01

    The voice of choir conductors. The aim was to evaluate the vocal quality of choir conductors based on the production of a sustained vowel during singing and when speaking, in order to observe auditory and acoustic differences. Participants in this study were 100 choir conductors, with an equal distribution between genders. Participants were asked to produce the sustained vowel "é" using a singing and a speaking voice. Speech samples were analyzed based on auditory-perceptive and acoustic parameters. The auditory-perceptive analysis was carried out by two speech-language pathologists, specialists in this field. The acoustic analysis was carried out with the support of the computer software Doctor Speech (Tiger Electronics, SRD, USA, version 4.0), using the Real Analysis module. The auditory-perceptive analysis of vocal quality indicated that most conductors have adapted voices, presenting more alterations in their speaking voice. The acoustic analysis indicated different values between genders and between the different production modalities. The fundamental frequency was higher in the singing voice, as were the values for the first formant; the second formant presented lower values in the singing voice, with statistically significant results only for women. The voice of choir conductors is adapted, presenting fewer deviations in the singing voice when compared to the speaking voice. Productions differ based on the voice modality, singing or speaking.

  17. Strength Is in Numbers: Can Concordant Artificial Listeners Improve Prediction of Emotion from Speech?

    PubMed Central

    Martinelli, Eugenio; Mencattini, Arianna; Di Natale, Corrado

    2016-01-01

    Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Although numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet, fast and reliable applications for emotion recognition are the obvious advancement of present ‘intelligent personal assistants’, and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performances relative to a public database of spontaneous speech are reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc.) and for applications in the research domain (such as real-time pairing of stimuli to participants’ emotional state, selective/differential data collection based on emotional content, etc.). PMID:27563724

  18. Nature and extent of person recognition impairments associated with Capgras syndrome in Lewy body dementia

    PubMed Central

    Fiacconi, Chris M.; Barkley, Victoria; Finger, Elizabeth C.; Carson, Nicole; Duke, Devin; Rosenbaum, R. Shayna; Gilboa, Asaf; Köhler, Stefan

    2014-01-01

    Patients with Capgras syndrome (CS) adopt the delusional belief that persons well-known to them have been replaced by an imposter. Several current theoretical models of CS attribute such misidentification problems to deficits in covert recognition processes related to the generation of appropriate affective autonomic signals. These models assume intact overt recognition processes for the imposter and, more broadly, for other individuals. As such, it has been suggested that CS could reflect the “mirror-image” of prosopagnosia. The purpose of the current study was to determine whether overt person recognition abilities are indeed always spared in CS. Furthermore, we examined whether CS might be associated with any impairments in overt affective judgments of facial expressions. We pursued these goals by studying a patient with Dementia with Lewy bodies (DLB) who showed clear signs of CS, and by comparing him to another patient with DLB who did not experience CS, as well as to a group of healthy control participants. Clinical magnetic resonance imaging scans revealed medial prefrontal cortex (mPFC) atrophy that appeared to be uniquely associated with the presence of CS. We assessed overt person recognition with three fame recognition tasks, using faces, voices, and names as cues. We also included measures of confidence and probed pertinent semantic knowledge. In addition, participants rated the intensity of fearful facial expressions. We found that CS was associated with overt person recognition deficits when probed with faces and voices, but not with names. Critically, these deficits were not present in the DLB patient without CS. In addition, CS was associated with impairments in overt judgments of affect intensity. Taken together, our findings cast doubt on the traditional view that CS is the mirror-image of prosopagnosia and that it spares overt recognition abilities. These findings can still be accommodated by models of CS that emphasize deficits in autonomic responding, to the extent that the potential role of interoceptive awareness in overt judgments is taken into account. PMID:25309399

  19. We'll Take It from Here: Further Developments We'd Like To See in Virtual Reference Software.

    ERIC Educational Resources Information Center

    Coffman, Steven

    2001-01-01

    Discussion of virtual reference services focuses on software that is currently available and further developments that are needed. Topics include co-browsing and collaboration capabilities; communications technology, including chat technology and voice over Internet protocol (VoIP); networked reference services; and online reference collections…

  20. Schools and Software: What's Now and What's Next

    ERIC Educational Resources Information Center

    Freeland, Julia; Hernandez, Alex

    2014-01-01

    What software tools do school systems actually want? Demand-side analyses typically reflect the loudest voices in the market that companies are eager to please--in the case of education technology, the largest urban districts with the largest technology budgets. But half of the nation's 48 million public school students attend approximately 3,700…

  1. The Effects of Certain Background Noises on the Performance of a Voice Recognition System.

    DTIC Science & Technology

    1980-09-01

  2. Lend Me Your Voice - A Constructivist Approach to Augmentative Communication

    NASA Astrophysics Data System (ADS)

    Mangiatordi, Andrea; Acosta, Micaela; Castellano, Roxana

    This paper envisions a software project dedicated to children with communication impairments or restrictions. The idea is to develop a free and functional communication aid capable of voice output. This software could be a very convenient solution if used on the laptops provided by One Laptop Per Child to children in developing countries. These laptops, together with their operating system, are designed following constructionist ideas, focused on cooperative learning and "learning learning". This work discusses the creation of a tool which is both an aid to children with disabilities and a base on which inclusive contexts can be built in school classes. An ideal lesson plan is discussed which could be used as a guideline for teachers.

  3. Investigation of air transportation technology at Princeton University, 1983

    NASA Technical Reports Server (NTRS)

    Stengel, Robert F.

    1987-01-01

    Progress is discussed for each of the following areas: voice recognition technology for flight control; guidance and control strategies for penetration of microbursts and wind shear; application of artificial intelligence in flight control systems; and computer-aided aircraft design.

  4. Federal Barriers to Innovation

    ERIC Educational Resources Information Center

    Miller, Raegen; Lake, Robin

    2012-01-01

    With educational outcomes inadequate, resources tight, and students' academic needs growing more complex, America's education system is certainly ready for technological innovation. And technology itself is ripe to be exploited. Devices harnessing cheap computing power have become smart and connected. Voice recognition, artificial intelligence,…

  5. Defining the ATC Controller Interface for Data Link Clearances

    NASA Technical Reports Server (NTRS)

    Rankin, James

    1998-01-01

    The Controller Interface (CI) is the primary method for Air Traffic Controllers to communicate with aircraft via Controller-Pilot Data Link Communications (CPDLC). The controller, wearing a microphone/headset, aurally gives instructions to aircraft as he/she would with today's voice radio systems. The CI's voice recognition system converts the instructions to digitized messages that are formatted according to the RTCA DO-219 Operational Performance Standards for ATC Two-Way Data Link Communications. The DO-219 messages are transferred via RS-232 to the ATIDS system for uplink using a Mode-S datalink. Pilot acknowledgments of controller messages are downlinked to the ATIDS system and transferred to the CI. A computer monitor is used to convey information to the controller. Aircraft data from the ARTS database are displayed on flight strips. The flight strips are electronic versions of the strips currently used in the ATC system. Outgoing controller messages cause the respective strip to change color to indicate an unacknowledged transmission. The message text is shown on the flight strips for reference. When the pilot acknowledges the message, the strip returns to its normal color. A map of the airport can also be displayed on the monitor. In addition to voice recognition, the controller can enter messages using the monitor's touch screen or by mouse/keyboard.

  6. Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction.

    PubMed

    Nass, C; Lee, K M

    2001-09-01

    Would people exhibit similarity-attraction and consistency-attraction toward unambiguously computer-generated speech even when personality is clearly not relevant? In Experiment 1, participants (extrovert or introvert) heard a synthesized voice (extrovert or introvert) on a book-buying Web site. Participants accurately recognized personality cues in text to speech and showed similarity-attraction in their evaluation of the computer voice, the book reviews, and the reviewer. Experiment 2, in a Web auction context, added personality of the text to the previous design. The results replicated Experiment 1 and demonstrated consistency (voice and text personality)-attraction. To maximize liking and trust, designers should set parameters, for example, words per minute or frequency range, that create a personality that is consistent with the user and the content being presented.

  7. Newly learned word forms are abstract and integrated immediately after acquisition

    PubMed Central

    Kapnoula, Efthymia C.; McMurray, Bob

    2015-01-01

    A hotly debated question in word learning concerns the conditions under which newly learned words compete or interfere with familiar words during spoken word recognition. This has recently been described as a key marker of the integration of a new word into the lexicon and was thought to require consolidation (Dumay & Gaskell, Psychological Science, 18, 35–39, 2007; Gaskell & Dumay, Cognition, 89, 105–132, 2003). Recently, however, Kapnoula, Packard, Gupta, and McMurray (Cognition, 134, 85–99, 2015) showed that interference can be observed immediately after a word is first learned, implying very rapid integration of new words into the lexicon. It is an open question whether these kinds of effects derive from episodic traces of novel words or from more abstract and lexicalized representations. Here we addressed this question by testing inhibition for newly learned words using training and test stimuli presented in different talker voices. During training, participants were exposed to a set of nonwords spoken by a female speaker. Immediately after training, we assessed the ability of the novel word forms to inhibit familiar words, using a variant of the visual world paradigm. Crucially, the test items were produced by a male speaker. An analysis of fixations showed that even with a change in voice, newly learned words interfered with the recognition of similar known words. These findings show that lexical competition effects from newly learned words spread across different talker voices, which suggests that newly learned words can be sufficiently lexicalized, and abstract with respect to talker voice, without consolidation. PMID:26202702

  8. The Temporal Lobes Differentiate between the Voices of Famous and Unknown People: An Event-Related fMRI Study on Speaker Recognition

    PubMed Central

    Bethmann, Anja; Scheich, Henning; Brechmann, André

    2012-01-01

    It is widely accepted that the perception of human voices is supported by neural structures located along the superior temporal sulci. However, there is an ongoing discussion to what extent the activations found in fMRI studies are evoked by the vocal features themselves or are the result of phonetic processing. To show that the temporal lobes are indeed engaged in voice processing, short utterances spoken by famous and unknown people were presented to healthy young participants whose task it was to identify the familiar speakers. In two event-related fMRI experiments, the temporal lobes were found to differentiate between familiar and unfamiliar voices such that named voices elicited higher BOLD signal intensities than unfamiliar voices. Yet, the temporal cortices did not only discriminate between familiar and unfamiliar voices. Experiment 2, which required overtly spoken responses and allowed us to distinguish between four familiarity grades, revealed that there was a fine-grained differentiation between all of these familiarity levels, with higher familiarity being associated with larger BOLD signal amplitudes. Finally, we observed a gradual response change such that the BOLD signal differences between unfamiliar and highly familiar voices increased with the distance of an area from the transverse temporal gyri, especially towards the anterior temporal cortex and the middle temporal gyri. Therefore, the results suggest that (the anterior and non-superior portions of) the temporal lobes participate in voice-specific processing independent from phonetic components also involved in spoken speech material. PMID:23112826

  9. The non-trusty clown attack on model-based speaker recognition systems

    NASA Astrophysics Data System (ADS)

    Farrokh Baroughi, Alireza; Craver, Scott

    2015-03-01

    Biometric detectors for speaker identification commonly employ a statistical model of a subject's voice, such as a Gaussian Mixture Model, that combines multiple means to improve detector performance. This allows a malicious insider to amend or append a component of a subject's statistical model so that a detector behaves normally except under a carefully engineered circumstance, letting an attacker force a misclassification of his or her voice only when desired by smuggling data into a database far in advance of an attack. Note that the attack is possible if the attacker has access to the database, even for a limited time, to modify the victim's model. We exhibit such an attack on a speaker identification system, in which an attacker can force a misclassification by speaking in an unusual voice and replacing the least-weighted component of the victim's model with the most-weighted component of a model of the attacker's unusual voice. The attacker makes his or her voice unusual during the attack because his or her normal voice model may already be in the database; by attacking with an unusual voice, the attacker retains the option to be recognized as himself or herself when talking normally, or as the victim when talking in the unusual manner. By attaching an appropriately weighted vector to a victim's model, we can impersonate all users in our simulations, while avoiding unwanted false rejections.
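
    The model-tampering step described above can be sketched in a few lines. The sketch below is illustrative only (not the authors' code) and represents each GMM as plain arrays rather than a particular library's model object: the victim's weakest component is overwritten with the attacker's strongest, and the weights are renormalized.

    ```python
    import numpy as np

    def tamper(victim, attacker):
        """victim/attacker: dicts with 'weights', 'means', 'covs' arrays."""
        i = np.argmin(victim["weights"])    # victim's least-weighted component
        j = np.argmax(attacker["weights"])  # attacker's most-weighted component
        victim["means"][i] = attacker["means"][j]
        victim["covs"][i] = attacker["covs"][j]
        victim["weights"][i] = attacker["weights"][j]
        victim["weights"] /= victim["weights"].sum()  # weights must sum to 1
        return victim
    ```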

  10. A preliminary analysis of human factors affecting the recognition accuracy of a discrete word recognizer for C3 systems

    NASA Astrophysics Data System (ADS)

    Yellen, H. W.

    1983-03-01

    Literature pertaining to voice recognition abounds with information relevant to the assessment of transitory speech recognition devices. In the past, engineering requirements have dictated the path this technology followed. But other factors exist that influence recognition accuracy. This thesis explores the impact of human factors on the successful recognition of speech, principally addressing the differences, or variability, among users. A Threshold Technology T-600 was used with a 100-utterance vocabulary to test 44 subjects. A statistical analysis was conducted on 5 generic categories of human factors: occupational, operational, psychological, physiological and personal. How the equipment is trained and the experience level of the speaker were found to be key characteristics influencing recognition accuracy. To a lesser extent, computer experience, time of week, accent, vital capacity and rate of air flow, and speaker cooperativeness and anxiety were found to affect overall error rates.

  11. Virtual Observer Controller (VOC) for Small Unit Infantry Laser Simulation Training

    DTIC Science & Technology

    2007-04-01

    per-seat license when deployed. As a result, ViaVoice was abandoned early in development. Next, the SPHINX engine from Carnegie Mellon University was...examined. Sphinx is Java-based software, providing cross-platform functionality, and it is also free, open-source software. Software developers at...IST had experience using SPHINX, so it was initially selected to be the VOC speech engine. After implementing a small portion of the VOC grammar

  12. Accuracy of computer-assisted navigation: significant augmentation by facial recognition software.

    PubMed

    Glicksman, Jordan T; Reger, Christine; Parasher, Arjun K; Kennedy, David W

    2017-09-01

    Over the past 20 years, image guidance navigation has been used with increasing frequency as an adjunct during sinus and skull base surgery. These devices commonly utilize surface registration, where varying pressure of the registration probe and loss of contact with the face during the skin tracing process can lead to registration inaccuracies, and the number of registration points incorporated is necessarily limited. The aim of this study was to evaluate the use of novel facial recognition software for image guidance registration. Consecutive adults undergoing endoscopic sinus surgery (ESS) were prospectively studied. Patients underwent image guidance registration via both conventional surface registration and facial recognition software. The accuracy of both registration processes were measured at the head of the middle turbinate (MTH), middle turbinate axilla (MTA), anterior wall of sphenoid sinus (SS), and nasal tip (NT). Forty-five patients were included in this investigation. Facial recognition was accurate to within a mean of 0.47 mm at the MTH, 0.33 mm at the MTA, 0.39 mm at the SS, and 0.36 mm at the NT. Facial recognition was more accurate than surface registration at the MTH by an average of 0.43 mm (p = 0.002), at the MTA by an average of 0.44 mm (p < 0.001), and at the SS by an average of 0.40 mm (p < 0.001). The integration of facial recognition software did not adversely affect registration time. In this prospective study, automated facial recognition software significantly improved the accuracy of image guidance registration when compared to conventional surface registration. © 2017 ARS-AAOA, LLC.

  13. Classifying Human Voices by Using Hybrid SFX Time-Series Preprocessing and Ensemble Feature Selection

    PubMed Central

    Wong, Raymond

    2013-01-01

    Voice biometrics is a physiological characteristic: each person's voice is unique. Due to this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), emotional state, identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features for training a classification model, based on piecewise transformation treating an audio waveform as a time-series. Using SFX we can faithfully remodel the statistical characteristics of the time-series; together with spectral analysis, a substantial number of features are extracted in combination. An ensemble is utilized in selecting only the influential features to be used in classification model induction. We focus on comparing the effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the fact that heuristically choosing significant features from both time and frequency domains indeed produces better performance in voice classification than traditional signal processing techniques alone, like wavelets and LPC-to-CC. PMID:24288684

  14. Near-infrared face recognition utilizing OpenCV software

    NASA Astrophysics Data System (ADS)

    Sellami, Louiza; Ngo, Hau; Fowler, Chris J.; Kearney, Liam M.

    2014-06-01

    Commercially available hardware, freely available algorithms, and software developed by the authors are synergized successfully to detect and recognize subjects in an environment without visible light. This project integrates three major components: an illumination device operating in the near infrared (NIR) spectrum, a NIR-capable camera, and a software algorithm capable of performing image manipulation, facial detection, and recognition. Focusing our efforts on the near infrared spectrum allows the low-budget system to operate covertly while still allowing for accurate face recognition. In doing so, a valuable capability has been developed which presents potential benefits in future civilian and military security and surveillance operations.
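
    For readers wanting a concrete starting point, OpenCV's bundled Haar cascade detector runs on grayscale input, which suits single-channel NIR frames; the following is a minimal sketch (file names hypothetical, and not the authors' actual pipeline):

```python
import cv2

# Load OpenCV's bundled frontal-face Haar cascade (ships with opencv-python).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# NIR frames are effectively single-channel, so read as grayscale.
frame = cv2.imread("nir_frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
faces = cascade.detectMultiScale(frame, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x + w, y + h), 255, 2)
cv2.imwrite("detected.png", frame)
```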

  15. Higher-order neural network software for distortion invariant object recognition

    NASA Technical Reports Server (NTRS)

    Reid, Max B.; Spirkovska, Lilly

    1991-01-01

    The state-of-the-art in pattern recognition for such applications as automatic target recognition and industrial robotic vision relies on digital image processing. We present a higher-order neural network model and software which performs the complete feature extraction-pattern classification paradigm required for automatic pattern recognition. Using a third-order neural network, we demonstrate complete, 100 percent accurate invariance to distortions of scale, position, and in-plane rotation. In a higher-order neural network, feature extraction is built into the network, and does not have to be learned. Only the relatively simple classification step must be learned. This is key to achieving very rapid training. The training set is much smaller than with standard neural network software because the higher-order network only has to be shown one view of each object to be learned, not every possible view. The software and graphical user interface run on any Sun workstation. Results of the use of the neural software in autonomous robotic vision systems are presented. Such a system could have extensive application in robotic manufacturing.
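
    The invariance described comes from weight sharing over pixel triples whose triangles have the same interior angles, and interior angles are unchanged by translation, scale, and in-plane rotation. A toy sketch of that idea (my own illustration, not the authors' software) builds an invariant signature by histogramming sorted triangle angles over sampled triples of active pixels:

```python
import numpy as np

def triangle_angle_signature(points, n_triples=2000, bins=12, seed=0):
    """Histogram the sorted interior angles of randomly sampled pixel
    triples. Interior angles are invariant to translation, scale and
    in-plane rotation -- the property a third-order network exploits
    through weight sharing over pixel triples."""
    rng = np.random.default_rng(seed)
    hist = np.zeros((bins, bins))
    for _ in range(n_triples):
        a, b, c = points[rng.choice(len(points), 3, replace=False)]
        la = np.linalg.norm(b - c)  # side opposite vertex a
        lb = np.linalg.norm(a - c)
        lc = np.linalg.norm(a - b)
        if min(la, lb, lc) < 1e-9:  # skip degenerate triples
            continue
        # Interior angles via the law of cosines; the third is determined.
        A = np.arccos(np.clip((lb**2 + lc**2 - la**2) / (2 * lb * lc), -1, 1))
        B = np.arccos(np.clip((la**2 + lc**2 - lb**2) / (2 * la * lc), -1, 1))
        a1, a2, _ = sorted((A, B, np.pi - A - B))
        i = min(int(a1 / np.pi * bins), bins - 1)
        j = min(int(a2 / np.pi * bins), bins - 1)
        hist[i, j] += 1
    return hist / max(hist.sum(), 1)

# Signatures of an object and of its rotated or rescaled copy should match.
pts = np.argwhere(np.ones((8, 8))).astype(float)  # stand-in for 'on' pixels
print(triangle_angle_signature(pts).shape)        # (12, 12)
```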

  16. Information system for diagnosis of respiratory system diseases

    NASA Astrophysics Data System (ADS)

    Abramov, G. V.; Korobova, L. A.; Ivashin, A. L.; Matytsina, I. A.

    2018-05-01

    An information system is presented for the diagnosis of patients with lung diseases. The main problem solved by this system is the determination of the parameters of cough fragments in monitoring recordings made with a voice recorder. The authors give the recognition criteria for recorded cough events and an analysis of the audio records. The results of the research are systematized. The cough recognition system can be used by medical specialists to diagnose the condition of patients and to monitor the process of their treatment.

  17. A Multimodal Emotion Detection System during Human-Robot Interaction

    PubMed Central

    Alonso-Martín, Fernando; Malfaz, María; Sequeira, João; Gorostiza, Javier F.; Salichs, Miguel A.

    2013-01-01

    In this paper, a multimodal user-emotion detection system for social robots is presented. This system is intended to be used during human–robot interaction, and it is integrated as part of the overall interaction system of the robot: the Robotics Dialog System (RDS). Two modes are used to detect emotions: voice and facial expression analysis. In order to analyze the voice of the user, a new component has been developed: Gender and Emotion Voice Analysis (GEVA), which is written in the ChucK language. For emotion detection in facial expressions, the system Gender and Emotion Facial Analysis (GEFA) has also been developed. This last system integrates two third-party solutions: Sophisticated High-speed Object Recognition Engine (SHORE) and Computer Expression Recognition Toolbox (CERT). Once these new components (GEVA and GEFA) give their results, a decision rule is applied in order to combine the information given by both of them. The result of this rule, the detected emotion, is integrated into the dialog system through communicative acts. Hence, each communicative act gives, among other things, the detected emotion of the user to the RDS so it can adapt its strategy in order to achieve a greater degree of satisfaction during the human–robot dialog. Each of the new components, GEVA and GEFA, can also be used individually. Moreover, they are integrated with the robotic control platform ROS (Robot Operating System). Several experiments with real users were performed to determine the accuracy of each component and to set the final decision rule. The results obtained from applying this decision rule in these experiments show a high success rate in automatic user emotion recognition, improving the results given by the two information channels (audio and visual) separately. PMID:24240598

  18. Microcomputers: Independence and Information Access for the Physically Handicapped.

    ERIC Educational Resources Information Center

    Regen, Shari S.; Chen, Ching-chih

    1984-01-01

    Provides overview of recent technological developments in microcomputer technology for the physically disabled, including discussion of view expansion, "talking terminals," voice recognition, and price and convenience of micro-based products. Equipment manufacturers and training centers for the physically disabled are listed and microcomputer…

  19. Sperry Univac speech communications technology

    NASA Technical Reports Server (NTRS)

    Medress, Mark F.

    1977-01-01

    Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word spotting system to locate key words in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.

  20. Relationship Between Voice and Motor Disabilities of Parkinson's Disease.

    PubMed

    Majdinasab, Fatemeh; Karkheiran, Siamak; Soltani, Majid; Moradi, Negin; Shahidi, Gholamali

    2016-11-01

    To evaluate the voice of Iranian patients with Parkinson's disease (PD) and find any relationship between motor disabilities and acoustic voice parameters as speech motor components. We evaluated 27 Farsi-speaking PD patients and 21 age- and sex-matched healthy persons as controls. Motor performance was assessed by the Unified Parkinson's Disease Rating Scale part III and the Hoehn and Yahr rating scale in the "on" state. Acoustic voice evaluation, including fundamental frequency (f0), standard deviation of f0, minimum of f0, maximum of f0, shimmer, jitter, and harmonic to noise ratio, was done using the Praat software via /a/ prolongation. No difference was seen between the voice of the patients and the voice of the controls. f0 and its variation had a significant correlation with the duration of the disease but did not have any relationship with the Unified Parkinson's Disease Rating Scale part III. Only a limited relationship was observed between voice and motor disabilities. Tremor is a main feature of PD that affects both the motor and phonation systems. Females had an older age at onset, more prolonged disease, and more severe motor disabilities (not statistically significant), but phonation disorders were more frequent in males and showed a stronger relationship with the severity of motor disabilities. Voice is affected by PD earlier than many other motor components and is more sensitive to disease progression. Tremor is the PD feature with the greatest impact on voice. PD affects the voices of male patients more than those of female patients. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
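
    Acoustic measures of this kind (f0 statistics, jitter, shimmer, HNR) can be scripted against Praat from Python via the parselmouth package; a minimal sketch, assuming a recorded /a/ prolongation in a hypothetical voice.wav and Praat's usual default parameters:

```python
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("voice.wav")  # hypothetical /a/ prolongation
pitch = snd.to_pitch()
f0_mean = call(pitch, "Get mean", 0, 0, "Hertz")
f0_sd = call(pitch, "Get standard deviation", 0, 0, "Hertz")

# Glottal pulse marks, then Praat's standard jitter/shimmer measures.
point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)
jitter = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer = call([snd, point_process], "Get shimmer (local)",
               0, 0, 0.0001, 0.02, 1.3, 1.6)
harmonicity = snd.to_harmonicity_cc()
hnr = call(harmonicity, "Get mean", 0, 0)

print(f"f0 {f0_mean:.1f} Hz (SD {f0_sd:.1f}), jitter {jitter:.4f}, "
      f"shimmer {shimmer:.4f}, HNR {hnr:.1f} dB")
```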

  1. Predicting fundamental frequency from mel-frequency cepstral coefficients to enable speech reconstruction.

    PubMed

    Shao, Xu; Milner, Ben

    2005-08-01

    This work proposes a method to reconstruct an acoustic speech signal solely from a stream of mel-frequency cepstral coefficients (MFCCs) as may be encountered in a distributed speech recognition (DSR) system. Previous methods for speech reconstruction have required, in addition to the MFCC vectors, fundamental frequency and voicing components. In this work the voicing classification and fundamental frequency are predicted from the MFCC vectors themselves using two maximum a posteriori (MAP) methods. The first method enables fundamental frequency prediction by modeling the joint density of MFCCs and fundamental frequency using a single Gaussian mixture model (GMM). The second scheme uses a set of hidden Markov models (HMMs) to link together a set of state-dependent GMMs, which enables a more localized modeling of the joint density of MFCCs and fundamental frequency. Experimental results on speaker-independent male and female speech show that accurate voicing classification and fundamental frequency prediction is attained when compared to hand-corrected reference fundamental frequency measurements. The use of the predicted fundamental frequency and voicing for speech reconstruction is shown to give very similar speech quality to that obtained using the reference fundamental frequency and voicing.
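
    The first method's MAP prediction reduces to the conditional expectation of F0 given the MFCC vector under the joint GMM: a responsibility-weighted sum of per-component linear regressions. A sketch of that step with scikit-learn's GaussianMixture, using synthetic stand-in data rather than the paper's corpus:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

# Synthetic stand-in data: x plays the role of an MFCC vector, y of log-F0.
rng = np.random.default_rng(1)
x = rng.normal(size=(2000, 4))
y = x @ np.array([0.5, -0.2, 0.1, 0.3]) + 0.1 * rng.normal(size=2000)
gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0)
gmm.fit(np.column_stack([x, y]))
d = x.shape[1]

def predict_f0(x_new):
    """E[y | x]: responsibilities over the marginal GMM of x, times the
    per-component conditional means mu_y + S_yx S_xx^{-1} (x - mu_x)."""
    w = np.array([
        gmm.weights_[k] * multivariate_normal.pdf(
            x_new, gmm.means_[k, :d], gmm.covariances_[k][:d, :d])
        for k in range(gmm.n_components)])
    w /= w.sum()
    cond = [
        gmm.means_[k, d] + gmm.covariances_[k][d, :d] @ np.linalg.solve(
            gmm.covariances_[k][:d, :d], x_new - gmm.means_[k, :d])
        for k in range(gmm.n_components)]
    return float(w @ cond)

print(predict_f0(np.zeros(4)))
```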

  2. Measures of voiced frication for automatic classification

    NASA Astrophysics Data System (ADS)

    Jackson, Philip J. B.; Jesus, Luis M. T.; Shadle, Christine H.; Pincas, Jonathan

    2004-05-01

    As an approach to understanding the characteristics of the acoustic sources in voiced fricatives, it seems apt to draw on knowledge of vowels and voiceless fricatives, which have been relatively well studied. However, the presence of both phonation and frication in these mixed-source sounds offers the possibility of mutual interaction effects, with variations across place of articulation. This paper examines the acoustic and articulatory consequences of these interactions and explores automatic techniques for finding parametric and statistical descriptions of these phenomena. A reliable and consistent set of such acoustic cues could be used for phonetic classification or speech recognition. Following work on devoicing of European Portuguese voiced fricatives [Jesus and Shadle, in Mamede et al. (eds.) (Springer-Verlag, Berlin, 2003), pp. 1-8]. and the modulating effect of voicing on frication [Jackson and Shadle, J. Acoust. Soc. Am. 108, 1421-1434 (2000)], the present study focuses on three types of information: (i) sequences and durations of acoustic events in VC transitions, (ii) temporal, spectral and modulation measures from the periodic and aperiodic components of the acoustic signal, and (iii) voicing activity derived from simultaneous EGG data. Analysis of interactions observed in British/American English and European Portuguese speech corpora will be compared, and the principal findings discussed.

  3. The Next Era: Deep Learning in Pharmaceutical Research.

    PubMed

    Ekins, Sean

    2016-11-01

    Over the past decade we have witnessed the increasing sophistication of machine learning algorithms applied in daily use from internet searches, voice recognition, social network software to machine vision software in cameras, phones, robots and self-driving cars. Pharmaceutical research has also seen its fair share of machine learning developments. For example, applying such methods to mine the growing datasets that are created in drug discovery not only enables us to learn from the past but to predict a molecule's properties and behavior in future. The latest machine learning algorithm garnering significant attention is deep learning, which is an artificial neural network with multiple hidden layers. Publications over the last 3 years suggest that this algorithm may have advantages over previous machine learning methods and offer a slight but discernable edge in predictive performance. The time has come for a balanced review of this technique but also to apply machine learning methods such as deep learning across a wider array of endpoints relevant to pharmaceutical research for which the datasets are growing such as physicochemical property prediction, formulation prediction, absorption, distribution, metabolism, excretion and toxicity (ADME/Tox), target prediction and skin permeation, etc. We also show that there are many potential applications of deep learning beyond cheminformatics. It will be important to perform prospective testing (which has been carried out rarely to date) in order to convince skeptics that there will be benefits from investing in this technique.

  4. Do Listeners Store in Memory a Speaker's Habitual Utterance-Final Phonation Type?

    PubMed Central

    Bőhm, Tamás; Shattuck-Hufnagel, Stefanie

    2009-01-01

    Earlier studies report systematic differences across speakers in the occurrence of utterance-final irregular phonation; the work reported here investigated whether human listeners remember this speaker-specific information and can access it when necessary (a prerequisite for using this cue in speaker recognition). Listeners personally familiar with the voices of the speakers were presented with pairs of speech samples: one with the original and the other with transformed final phonation type. Asked to select the member of the pair that was closer to the talker's voice, most listeners tended to choose the unmanipulated token (even though they judged them to sound essentially equally natural). This suggests that utterance-final pitch period irregularity is part of the mental representation of individual speaker voices, although this may depend on the individual speaker and listener to some extent. PMID:19776665

  5. The effect of voice onset time differences on lexical access in Dutch.

    PubMed

    van Alphen, Petra M; McQueen, James M

    2006-02-01

    Effects on spoken-word recognition of prevoicing differences in Dutch initial voiced plosives were examined. In 2 cross-modal identity-priming experiments, participants heard prime words and nonwords beginning with voiced plosives with 12, 6, or 0 periods of prevoicing or matched items beginning with voiceless plosives and made lexical decisions to visual tokens of those items. Six-period primes had the same effect as 12-period primes. Zero-period primes had a different effect, but only when their voiceless counterparts were real words. Listeners could nevertheless discriminate the 6-period primes from the 12- and 0-period primes. Phonetic detail appears to influence lexical access only to the extent that it is useful: In Dutch, presence versus absence of prevoicing is more informative than amount of prevoicing. ((c) 2006 APA, all rights reserved).

  6. Reliable jitter and shimmer measurements in voice clinics: the relevance of vowel, gender, vocal intensity, and fundamental frequency effects in a typical clinical task.

    PubMed

    Brockmann, Meike; Drinnan, Michael J; Storck, Claudio; Carding, Paul N

    2011-01-01

    The aims of this study were to examine vowel and gender effects on jitter and shimmer in a typical clinical voice task while correcting for the confounding effects of voice sound pressure level (SPL) and fundamental frequency (F(0)). Furthermore, the relative effect sizes of vowel, gender, voice SPL, and F(0) were assessed, and recommendations for clinical measurements were derived. With this cross-sectional single-cohort study, 57 healthy adults (28 women, 29 men) aged 20-40 years were investigated. Three phonations of /a/, /o/, and /i/ at "normal" voice loudness were analyzed using the Praat software. The effects of vowel, gender, voice SPL, and F(0) on jitter and shimmer were assessed using descriptive and inferential (analysis of covariance) statistics. The effect sizes were determined with the eta-squared statistic. Vowel, gender, voice SPL, and F(0) each had significant effects on jitter, on shimmer, or on both. Voice SPL was the most important factor, whereas vowel, gender, and F(0) effects were comparatively small. Because men had systematically higher voice SPL, the gender effects on jitter and shimmer were smaller when correcting for SPL and F(0). Surprisingly, in clinical assessments, voice SPL has the single biggest impact on jitter and shimmer. Vowel and gender effects were clinically important, whereas fundamental frequency had a relatively small influence. Phonations at a predefined voice SPL (80 dB minimum) and vowel (/a/) would enhance measurement reliability. Furthermore, gender-specific thresholds applying these guidelines should be established. However, the efficiency of these measures should be verified and tested with patients. Copyright © 2011 The Voice Foundation. All rights reserved.
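
    The eta-squared statistic used here is simply the share of total variance attributable to a factor, SS_effect / SS_total. A toy sketch (illustrative numbers, not the study's data) for a one-way decomposition:

```python
import numpy as np

def eta_squared(groups):
    """One-way eta-squared: SS_between / SS_total."""
    all_vals = np.concatenate(groups)
    grand = all_vals.mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_total = ((all_vals - grand) ** 2).sum()
    return ss_between / ss_total

# Illustrative jitter values (%) for phonations of /a/, /o/ and /i/.
rng = np.random.default_rng(2)
vowels = [rng.normal(m, 0.15, 30) for m in (0.55, 0.50, 0.45)]
print(f"vowel effect size (eta-squared): {eta_squared(vowels):.3f}")
```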

  7. Prevalence of Hearing Loss in Teachers of Singing and Voice Students.

    PubMed

    Isaac, Mitchell J; McBroom, Deanna H; Nguyen, Shaun A; Halstead, Lucinda A

    2017-05-01

    Singers and voice teachers are exposed to a range of noise levels during a normal working day. This study aimed to assess the hearing thresholds in a large sample of generally healthy professional voice teachers and voice students to determine the prevalence of hearing loss in this population. A cross-sectional study was carried out. Voice teachers and vocal students had the option to volunteer for a hearing screening of six standard frequencies in a quiet room with the Shoebox audiometer (Clearwater Clinical Limited) and to fill out a brief survey. Data were analyzed for the prevalence and severity of hearing loss in teachers and students based on several parameters assessed in the surveys. All data were analyzed using Microsoft Excel (Microsoft Corp.) and SPSS Statistics Software (IBM Corp.). A total of 158 participants were included: 58 self-identified as voice teachers, 106 as voice students, and 6 as both. The 6 participants who identified as both were included in both categories for statistical purposes. Of the 158 participants, 36 had some level of hearing loss: 51.7% of voice teachers had hearing loss, and 7.5% of voice students had hearing loss. Several parameters of noise exposure were found to positively correlate with hearing loss and tinnitus (P < 0.05). Years as a voice teacher and age were both predictors of hearing loss (P < 0.05). Hearing loss in a cohort of voice teachers appears to be more prevalent and severe than previously thought. There is a significant association between years teaching and hearing loss. Raising awareness in this population may prompt teachers and students to adopt strategies to protect their hearing. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  8. Integrative Lifecourse and Genetic Analysis of Military Working Dogs

    DTIC Science & Technology

    2012-10-01

    Intelligent Character Recognition) and HWR (Handwriting Recognition). A number of software packages were evaluated and we have settled on a...third-party software is able to recognize check-boxes and columns and do a reasonable job with handwriting – which it does. This workflow will

  9. High Tech and Library Access for People with Disabilities.

    ERIC Educational Resources Information Center

    Roatch, Mary A.

    1992-01-01

    Describes tools that enable people with disabilities to access print information, including optical character recognition, synthetic voice output, other input devices, Braille access devices, large print displays, television and video, TDD (Telecommunications Devices for the Deaf), and Telebraille. Use of technology by libraries to meet mandates…

  10. ERP evidence of preserved early memory function in term infants with neonatal encephalopathy following therapeutic hypothermia.

    PubMed

    Pfister, Katie M; Zhang, Lei; Miller, Neely C; Hultgren, Solveig; Boys, Chris J; Georgieff, Michael K

    2016-12-01

    Neonatal encephalopathy (NE) carries high risk for neurodevelopmental impairments. Therapeutic hypothermia (TH) reduces this risk, particularly for moderate encephalopathy (ME). Nevertheless, these infants often have subtle functional deficits, including abnormal memory function. Detection of deficits at the earliest possible time-point would allow for intervention during a period of maximal brain plasticity. Recognition memory function in 22 infants with NE treated with TH was compared to 23 healthy controls using event-related potentials (ERPs) at 2 wk of age. ERPs were recorded to mother's voice alternating with a stranger's voice to assess attentional responses (P2), novelty detection (slow wave), and discrimination between familiar and novel (difference wave). Development was tested at 12 mo using the Bayley Scales of Infant Development, Third Edition (BSID-III). The NE group showed similar ERP components and BSID-III scores to controls. However, infants with NE showed discrimination at midline leads (P = 0.01), whereas controls showed discrimination in the left hemisphere (P = 0.05). Normal MRI (P = 0.05) and seizure-free electroencephalogram (EEG) (P = 0.04) correlated positively with outcomes. Infants with NE have preserved recognition memory function after TH. The spatially different recognition memory processing after early brain injury may represent compensatory changes in the brain circuitry and reflect a benefit of TH.

  11. Speech-recognition interfaces for music information retrieval

    NASA Astrophysics Data System (ADS)

    Goto, Masataka

    2005-09-01

    This paper describes two hands-free music information retrieval (MIR) systems that enable a user to retrieve and play back a musical piece by saying its title or the artist's name. Although various interfaces for MIR have been proposed, speech-recognition interfaces suitable for retrieving musical pieces have not been studied. Our MIR-based jukebox systems employ two different speech-recognition interfaces for MIR, speech completion and speech spotter, which exploit intentionally controlled nonverbal speech information in original ways. The first is a music retrieval system with the speech-completion interface that is suitable for music stores and car-driving situations. When a user only remembers part of the name of a musical piece or an artist and utters only a remembered fragment, the system helps the user recall and enter the name by completing the fragment. The second is a background-music playback system with the speech-spotter interface that can enrich human-human conversation. When a user is talking to another person, the system allows the user to enter voice commands for music playback control by spotting a special voice-command utterance in face-to-face or telephone conversations. Experimental results from use of these systems have demonstrated the effectiveness of the speech-completion and speech-spotter interfaces. (Video clips: http://staff.aist.go.jp/m.goto/MIR/speech-if.html)

  12. Phase effects in masking by harmonic complexes: speech recognition.

    PubMed

    Deroche, Mickael L D; Culling, John F; Chatterjee, Monita

    2013-12-01

    Harmonic complexes that generate highly modulated temporal envelopes on the basilar membrane (BM) mask a tone less effectively than complexes that generate relatively flat temporal envelopes, because the non-linear active gain of the BM selectively amplifies a low-level tone in the dips of a modulated masker envelope. The present study examines a similar effect in speech recognition. Speech reception thresholds (SRTs) were measured for a voice masked by harmonic complexes with partials in sine phase (SP) or in random phase (RP). The masker's fundamental frequency (F0) was 50, 100 or 200 Hz. SRTs were considerably lower for SP than for RP maskers at 50-Hz F0, but the two converged at 100-Hz F0, while at 200-Hz F0, SRTs were a little higher for SP than RP maskers. The results were similar whether the target voice was male or female and whether the masker's spectral profile was flat or speech-shaped. Although listening in the masker dips has been shown to play a large role for artificial stimuli such as Schroeder-phase complexes at high levels, it contributes weakly to speech recognition in the presence of harmonic maskers with different crest factors at more moderate sound levels (65 dB SPL). Copyright © 2013 Elsevier B.V. All rights reserved.

  13. Leave it to Florence.

    PubMed

    Whyatt, Jane

    2014-12-31

    There are four practice nurses at Heatherlands Medical Centre in Woodchurch, Cheshire--and one 'intelligent system' named Florence. With a voice like a car satnav, 'she' is a software robot, or Artificial Intelligence (AI).

  14. [The investigation of formant on different artistic voice].

    PubMed

    Wang, Jianqun; Gao, Xia; Liu, Xiaozhou; Feng, Yulin; Shen, Xiaohui; Yu, Chenjie; Yang, Ye

    2008-08-01

    To explore the characteristics of the formant, a very important parameter in the spectrogram, across three types of artistic voice (Western mode, Chinese mode, Beijing opera). We used MATLAB software to perform short-time Fourier transform and spectrogram analysis on steady-state vowel samples of the three types. The Western mode showed different representations of the "singer's formant" (Fs) depending on the voice part; the Chinese mode's notable features were that F1, F2, and F3 were continuous and their energy changed smoothly; Beijing opera commonly showed a very wide formant with smooth transitions between the formants and the various harmonics, and in addition it showed a component similar to the Fs (normally two connected formants). Each type of artistic voice showed its own characteristic formant pattern in the spectrogram, which has important value for identification, objective evaluation, and prediction.
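
    The spectrogram analysis described is a short-time Fourier transform of a sustained vowel; scipy provides the same computation the authors scripted in MATLAB. A minimal sketch with a hypothetical file name:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

fs, x = wavfile.read("sustained_vowel.wav")  # hypothetical recording
if x.ndim > 1:                               # mixdown if stereo
    x = x.mean(axis=1)

# Short-time Fourier transform: 25 ms Hann windows, 10 ms hop.
f, t, Sxx = spectrogram(x, fs=fs, window="hann",
                        nperseg=int(0.025 * fs),
                        noverlap=int(0.015 * fs))
log_S = 10 * np.log10(Sxx + 1e-12)           # dB scale for inspection
print(log_S.shape)  # (frequency bins, time frames); formants appear as energy ridges
```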

  15. F0 Characteristics of Newsreaders on Varied Emotional Texts in Tamil Language.

    PubMed

    Gunasekaran, Nishanthi; Boominathan, Prakash; Seethapathy, Jayashree

    2017-12-26

    The objective of this study was to profile speaking F0 and its variations in newsreaders on varied emotional texts. This was a prospective case-control study. Fifteen professional newsreaders and 15 non-newsreaders participated. The participants read news bulletins that conveyed different emotions (shock, neutral, happy, and sad) in both a habitual and a "newsreading" voice. Speaking fundamental frequency (SFF) and F0 variations were extracted from 1620 tokens using Praat software (version 5.2.32) on the opening lines, headlines, news stories, and closing lines of each news item. Paired t test, independent t test, and Friedman test were used for statistical analysis. Both male and female newsreaders had significantly (P ≤ 0.05) higher SFFs and standard deviations (SDs) of SFF in newsreading voice than in speaking voice. Female non-newsreaders demonstrated significantly higher SFF and SD of SFF in newsreading voice, whereas no significant differences were noticed in the frequency parameters for male non-newsreaders. No significant difference was noted in the frequency parameters of speaking and newsreading voice between male newsreaders and male non-newsreaders. A significant difference in the SD of SFF was noticed between female newsreaders and female non-newsreaders in newsreading voice. Female newsreaders had a higher frequency range in both speaking voice and newsreading voice when compared with non-newsreaders. F0 characteristics and frequency range determine the amount of frequency change exercised by newsreaders while reading bulletins. This information is highly pedagogic for training voices in this profession. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  16. A Qualitative Examination of Situational Risk Recognition Among Female Victims of Physical Intimate Partner Violence.

    PubMed

    Sherrill, Andrew M; Bell, Kathryn M; Wyngarden, Nicole

    2016-07-01

    Little is known about intimate partner violence (IPV) victims' situational risk recognition, defined as the ability to identify situational factors that signal imminent risk of victimization. Using semi-structured interviews, qualitative data were collected from a community sample of 31 female victims of IPV episodes involving substance use. Thirteen themes were identified, the most prevalent being related to the partner's verbal behavior, tone of voice, motor behavior, alcohol or drug use, and facial expression. Participants reporting at least some anticipation of physical aggression (61.3% of the sample) tended to identify multiple factors (M = 3.47), suggesting numerous situational features often contribute to situational risk recognition. © The Author(s) 2015.

  17. Adapting a Computerized Medical Dictation System to Prepare Academic Papers in Radiology.

    PubMed

    Sánchez, Yadiel; Prabhakar, Anand M; Uppot, Raul N

    2017-09-14

    Every day, radiologists use dictation software to compose clinical reports of imaging findings. The dictation software is tailored for medical use and to the speech pattern of each radiologist. Over the past 10 years we have used dictation software to compose academic manuscripts, correspondence letters, and texts of educational exhibits. The main advantage of using voice dictation is faster composition of manuscripts. However, use of such software requires preparation. The purpose of this article is to review the steps of adapting clinical dictation software for dictating academic manuscripts and detail the advantages and limitations of this technique. Copyright © 2017 Elsevier Inc. All rights reserved.

  18. Automatic Assessment of Acoustic Parameters of the Singing Voice: Application to Professional Western Operatic and Jazz Singers.

    PubMed

    Manfredi, Claudia; Barbagallo, Davide; Baracca, Giovanna; Orlandi, Silvia; Bandini, Andrea; Dejonckere, Philippe H

    2015-07-01

    The obvious perceptual differences between various singing styles like Western operatic and jazz rely on specific dissimilarities in vocal technique. The present study focuses on differences in vibrato acoustics and in the singer's formant as analyzed by a novel software tool, named BioVoice, based on robust high-resolution and adaptive techniques whose validity has been proven on synthetic voice signals. A total of 48 professional singers were investigated (29 females; 19 males; 29 Western operatic; and 19 jazz). They were asked to sing "a cappella," but with artistic expression, a well-known musical phrase from Gershwin's Porgy and Bess, in their own style: either operatic or jazz. A specific sustained note was extracted for detailed vibrato analysis. Besides rate (s(-1)) and extent (cents), duration (seconds) and regularity were computed. Two new concepts are introduced: vibrato jitter and vibrato shimmer, by analogy with the traditional jitter and shimmer of voice signals. For the singer's formant, on the same sustained tone, the ratio of the acoustic energy in formants 1-2 to the energy in formants 3, 4, and 5 was automatically computed, providing a quality ratio (QR). Vibrato rates did not differ among groups. Extent was significantly larger in operatic singers, particularly females. Vibrato jitter and vibrato shimmer were significantly smaller in operatic singers. Duration of vibrato was also significantly longer in operatic singers. QR was significantly lower in male operatic singers. Some vibrato characteristics (extent, regularity, and duration) very clearly differentiate the Western operatic singing style from the jazz singing style. The singer's formant is typical of male operatic singers. The new software tool is well suited to provide useful feedback in a pedagogical context. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
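
    Vibrato rate and extent can be estimated from an f0 contour of a sustained note: the rate is the dominant modulation frequency of the contour, and the extent is its depth in cents. A back-of-the-envelope sketch (my own simplification, not the BioVoice algorithm):

```python
import numpy as np

def vibrato_rate_extent(f0_hz, frame_rate=100.0):
    """Estimate vibrato rate (Hz) and extent (cents) from an f0 track
    sampled at `frame_rate` frames per second."""
    cents = 1200 * np.log2(f0_hz / np.mean(f0_hz))  # deviation from mean, in cents
    spectrum = np.abs(np.fft.rfft(cents - cents.mean()))
    freqs = np.fft.rfftfreq(len(cents), d=1.0 / frame_rate)
    band = (freqs >= 3) & (freqs <= 10)             # plausible vibrato band
    rate = freqs[band][np.argmax(spectrum[band])]
    extent = (cents.max() - cents.min()) / 2        # half peak-to-peak depth
    return rate, extent

# Synthetic check: 5.5 Hz vibrato, 80-cent half-extent around 440 Hz.
t = np.arange(0, 2, 0.01)
f0 = 440 * 2 ** (80 * np.sin(2 * np.pi * 5.5 * t) / 1200)
print(vibrato_rate_extent(f0))  # approximately (5.5, 80)
```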

  19. Children's Recognition of Emotions from Vocal Cues

    ERIC Educational Resources Information Center

    Sauter, Disa A.; Panattoni, Charlotte; Happe, Francesca

    2013-01-01

    Emotional cues contain important information about the intentions and feelings of others. Despite a wealth of research into children's understanding of facial signals of emotions, little research has investigated the developmental trajectory of interpreting affective cues in the voice. In this study, 48 children ranging between 5 and 10 years were…

  20. Must One Be "In Recovery" To Help?

    ERIC Educational Resources Information Center

    Trimpey, Jack

    Rational Recovery (RR) and the Addictive Voice Recognition Technique (AVRT) are described. Rational recovery is a young organization which views alcohol and drug dependency differently from the traditional field which sees addiction as a symptom of something, of a disease, of spiritual bankruptcy, of irrational thinking, of unhappiness, of…

  1. Morphosyntactic Neural Analysis for Generalized Lexical Normalization

    ERIC Educational Resources Information Center

    Leeman-Munk, Samuel Paul

    2016-01-01

    The phenomenal growth of social media, web forums, and online reviews has spurred a growing interest in automated analysis of user-generated text. At the same time, a proliferation of voice recordings and efforts to archive culture heritage documents are fueling demand for effective automatic speech recognition (ASR) and optical character…

  2. Web Surveys to Digital Movies: Technological Tools of the Trade.

    ERIC Educational Resources Information Center

    Fetterman, David M.

    2002-01-01

    Highlights some of the technological tools used by educational researchers today, focusing on data collection related tools such as Web surveys, digital photography, voice recognition and transcription, file sharing and virtual office, videoconferencing on the Internet, instantaneous chat and chat rooms, reporting and dissemination, and digital…

  3. Impact of a Telehealth Program With Voice Recognition Technology in Patients With Chronic Heart Failure: Feasibility Study.

    PubMed

    Lee, Heesun; Park, Jun-Bean; Choi, Sae Won; Yoon, Yeonyee E; Park, Hyo Eun; Lee, Sang Eun; Lee, Seung-Pyo; Kim, Hyung-Kwan; Cho, Hyun-Jai; Choi, Su-Yeon; Lee, Hae-Young; Choi, Jonghyuk; Lee, Young-Joon; Kim, Yong-Jin; Cho, Goo-Yeong; Choi, Jinwook; Sohn, Dae-Won

    2017-10-02

    Despite the advances in the diagnosis and treatment of heart failure (HF), the current hospital-oriented framework for HF management does not appear to be sufficient to maintain the stability of HF patients in the long term. The importance of self-care management is increasingly being emphasized as a promising long-term treatment strategy for patients with chronic HF. The objective of this study was to evaluate whether a new information communication technology (ICT)-based telehealth program with voice recognition technology could improve clinical or laboratory outcomes in HF patients. In this prospective single-arm pilot study, we recruited 31 consecutive patients with chronic HF who were referred to our institute. An ICT-based telehealth program with voice recognition technology was developed and used by patients with HF for 12 weeks. Patients were educated on the use of this program via mobile phone, landline, or the Internet for the purpose of improving communication and data collection. Using these systems, we collected comprehensive data elements related to the risk of HF self-care management such as weight, diet, exercise, medication adherence, overall symptom change, and home blood pressure. The study endpoints were the changes observed in urine sodium concentration (uNa), Minnesota Living with Heart Failure (MLHFQ) scores, the 6-min walk test, and N-terminal prohormone of brain natriuretic peptide (NT-proBNP) as surrogate markers for appropriate HF management. Among the 31 enrolled patients, 27 (87%) completed the study, and 10 (10/27, 37%) showed good adherence to the ICT-based telehealth program with voice recognition technology, defined as use of the program 100 times or more during the study period. Nearly three-fourths of the patients had been hospitalized at least once because of HF before enrollment (20/27, 74%); 14 patients had 1, 2 patients had 2, and 4 patients had 3 or more previous HF hospitalizations. In the total study population, there was no significant interval change in laboratory and functional outcome variables after 12 weeks of the ICT-based telehealth program. In patients with good adherence to the ICT-based telehealth program, there was a significant improvement in mean uNa (103.1 to 78.1; P=.01) but not in those without good adherence (85.4 to 96.9; P=.49). Similarly, a marginal improvement in MLHFQ scores was only observed in patients with good adherence (27.5 to 21.4; P=.08) but not in their counterparts (19.0 to 19.7; P=.73). The mean 6-min walk distance and NT-proBNP were not significantly improved in patients regardless of their adherence. Short-term application of an ICT-based telehealth program with voice recognition technology showed the potential to improve uNa values and MLHFQ scores in HF patients, suggesting that better control of sodium intake and greater quality of life can be achieved by this program. ©Heesun Lee, Jun-Bean Park, Sae Won Choi, Yeonyee E Yoon, Hyo Eun Park, Sang Eun Lee, Seung-Pyo Lee, Hyung-Kwan Kim, Hyun-Jai Cho, Su-Yeon Choi, Hae-Young Lee, Jonghyuk Choi, Young-Joon Lee, Yong-Jin Kim, Goo-Yeong Cho, Jinwook Choi, Dae-Won Sohn. Originally published in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 02.10.2017.

  4. VOIP for Telerehabilitation: A Risk Analysis for Privacy, Security, and HIPAA Compliance

    PubMed Central

    Watzlaf, Valerie J.M.; Moeini, Sohrab; Firouzan, Patti

    2010-01-01

    Voice over the Internet Protocol (VoIP) systems such as Adobe ConnectNow, Skype, ooVoo, etc. may include the use of software applications for telerehabilitation (TR) therapy that can provide voice and video teleconferencing between patients and therapists. Privacy and security applications as well as HIPAA compliance within these protocols have been questioned by information technologists, providers of care and other health care entities. This paper develops a privacy and security checklist that can be used within a VoIP system to determine if it meets privacy and security procedures and whether it is HIPAA compliant. Based on this analysis, specific HIPAA criteria that therapists and health care facilities should follow are outlined and discussed, and therapists must weigh the risks and benefits when deciding to use VoIP software for TR. PMID:25945172

  5. VOIP for Telerehabilitation: A Risk Analysis for Privacy, Security, and HIPAA Compliance.

    PubMed

    Watzlaf, Valerie J M; Moeini, Sohrab; Firouzan, Patti

    2010-01-01

    Voice over the Internet Protocol (VoIP) systems such as Adobe ConnectNow, Skype, ooVoo, etc. may include the use of software applications for telerehabilitation (TR) therapy that can provide voice and video teleconferencing between patients and therapists. Privacy and security applications as well as HIPAA compliance within these protocols have been questioned by information technologists, providers of care and other health care entities. This paper develops a privacy and security checklist that can be used within a VoIP system to determine if it meets privacy and security procedures and whether it is HIPAA compliant. Based on this analysis, specific HIPAA criteria that therapists and health care facilities should follow are outlined and discussed, and therapists must weigh the risks and benefits when deciding to use VoIP software for TR.

  6. Economic Evaluation of Voice Recognition (VR) for the Clinician’s Desktop at the Naval Hospital Roosevelt Roads

    DTIC Science & Technology

    1997-09-01

    first PC-based, very large vocabulary dictation system with a continuous natural language free flow approach to speech recognition. (This system allows...indicating the likelihood that a particular stored HMM reference model is the best match for the input. This approach is called the Baum-Welch...InfoCentral, and Envoy 1.0; and Lotus Development Corp.’s SmartSuite 3, Approach 3.0, and Organizer. 2. IBM At a press conference in New York in June 1997, IBM

  7. Voice technology and BBN

    NASA Technical Reports Server (NTRS)

    Wolf, Jared J.

    1977-01-01

    The following research was discussed: (1) speech signal processing; (2) automatic speech recognition; (3) continuous speech understanding; (4) speaker recognition; (5) speech compression; (6) subjective and objective evaluation of speech communication systems; (7) measurement of the intelligibility and quality of speech when degraded by noise or other masking stimuli; (8) speech synthesis; (9) instructional aids for second-language learning and for training of the deaf; and (10) investigation of speech correlates of psychological stress. Experimental psychology, control systems, and human factors engineering, which are often relevant to the proper design and operation of speech systems, are described.

  8. Speech recognition for embedded automatic positioner for laparoscope

    NASA Astrophysics Data System (ADS)

    Chen, Xiaodong; Yin, Qingyun; Wang, Yi; Yu, Daoyin

    2014-07-01

    In this paper a novel speech recognition methodology based on the Hidden Markov Model (HMM) is proposed for an embedded Automatic Positioner for Laparoscope (APL), which includes a fixed-point ARM processor as its core. The APL system is designed to assist the doctor in laparoscopic surgery by implementing the specific doctor's vocal control of the laparoscope. Real-time response to voice commands calls for an efficient speech recognition algorithm in the APL. In order to reduce computation cost without significant loss in recognition accuracy, both arithmetic and algorithmic optimizations are applied in the presented method. First, relying mostly on arithmetic optimizations, a fixed-point frontend for speech feature analysis is built to suit the ARM processor's characteristics. Then a fast likelihood computation algorithm is used to reduce the computational complexity of the HMM-based recognition algorithm. The experimental results show that the method keeps recognition time under 0.5 s with accuracy higher than 99%, demonstrating its ability to achieve real-time vocal control of the APL.
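
    One common form of fast likelihood computation for HMM recognizers with diagonal-covariance Gaussian states is to precompute each state's constant and inverse-variance terms so that scoring a frame is a single multiply-accumulate pass. A generic sketch of that optimization (not the authors' implementation):

```python
import numpy as np

class DiagGaussianScorer:
    """Per-state log-likelihoods for diagonal-covariance Gaussians with
    the constant terms precomputed, so scoring a frame reduces to one
    multiply-accumulate pass over the feature dimensions."""

    def __init__(self, means, variances):  # shapes: (n_states, dim)
        self.means = means
        self.inv_var = 1.0 / variances
        # Precomputed: -0.5 * (dim*log(2*pi) + sum(log(var))) per state.
        self.const = -0.5 * (means.shape[1] * np.log(2 * np.pi)
                             + np.log(variances).sum(axis=1))

    def score(self, frame):  # frame shape: (dim,)
        diff = frame - self.means
        return self.const - 0.5 * np.einsum("sd,sd->s", diff * diff, self.inv_var)

# Example: 10 HMM states over 13-dimensional MFCC frames.
rng = np.random.default_rng(3)
scorer = DiagGaussianScorer(rng.normal(size=(10, 13)),
                            rng.uniform(0.5, 2.0, (10, 13)))
print(scorer.score(rng.normal(size=13)).shape)  # one log-likelihood per state
```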

  9. Experimental evidence of vocal recognition in young and adult black-legged kittiwakes

    USGS Publications Warehouse

    Mulard, Hervé; Aubin, T.; White, J.F.; Hatch, Shyla A.; Danchin, E.

    2008-01-01

    Individual recognition is required in most social interactions, and its presence has been confirmed in many species. In birds, vocal cues appear to be a major component of recognition. Curiously, vocal recognition seems absent or limited in some highly social species such as the black-legged kittiwake, Rissa tridactyla. Using playback experiments, we found that kittiwake chicks recognized their parents vocally, this capacity being detectable as early as 20 days after hatching, the youngest age tested. Mates also recognized each other's long calls. Some birds reacted to their partner's voice when only a part of the long call was played back. Nevertheless, only about a third of the tested birds reacted to their mate's or parents' call and we were unable to detect recognition among neighbours. We discuss the low reactivity of kittiwakes in relation to their cliff-nesting habit and compare our results with evidence of vocal recognition in other larids. ?? 2008 The Association for the Study of Animal Behaviour.

  10. Construction site Voice Operated Information System (VOIS) test

    NASA Astrophysics Data System (ADS)

    Lawrence, Debbie J.; Hettchen, William

    1991-01-01

    The Voice Activated Information System (VAIS), developed by USACERL, allows inspectors to verbally log on-site inspection reports on a hand held tape recorder. The tape is later processed by the VAIS, which enters the information into the system's database and produces a written report. The Voice Operated Information System (VOIS), developed by USACERL and Automated Sciences Group through a USACERL cooperative research and development agreement (CRDA), is an improved voice recognition system based on the concepts and function of the VAIS. To determine the applicability of the VOIS to Corps of Engineers construction projects, Technology Transfer Test Bed (T3B) funds were provided to the Corps of Engineers National Security Agency (NSA) Area Office (Fort Meade) to procure and implement the VOIS, and to train personnel in its use. This report summarizes the NSA application of the VOIS to quality assurance inspection of radio frequency shielding and to progress payment logs, and concludes that the VOIS is an easily implemented system that can offer improvements when applied to repetitive inspection procedures. Use of VOIS can save time during inspection, improve documentation storage, and provide flexible retrieval of stored information.

  11. Feasibility of automated speech sample collection with stuttering children using interactive voice response (IVR) technology.

    PubMed

    Vogel, Adam P; Block, Susan; Kefalianos, Elaina; Onslow, Mark; Eadie, Patricia; Barth, Ben; Conway, Laura; Mundt, James C; Reilly, Sheena

    2015-04-01

    To investigate the feasibility of adopting automated interactive voice response (IVR) technology for remotely capturing standardized speech samples from stuttering children. Participants were ten 6-year-old stuttering children. Their parents called a toll-free number from their homes and were prompted to elicit speech from their children using a standard protocol involving conversation, picture description, and games. The automated IVR system was implemented using an off-the-shelf telephony software program and delivered by a standard desktop computer. The software infrastructure utilizes voice over internet protocol. Speech samples were automatically recorded during the calls. Video recordings were simultaneously acquired in the home at the time of the call to evaluate the fidelity of the telephone-collected samples. Key outcome measures included syllables spoken, percentage of syllables stuttered, and an overall rating of stuttering severity using a 10-point scale. Data revealed a high level of relative reliability in terms of intra-class correlation between the video and telephone acquired samples on all outcome measures during the conversation task. Findings were less consistent for speech samples during picture description and games. Results suggest that IVR technology can be used successfully to automate remote capture of child speech samples.

  12. Sensing of Particular Speakers for the Construction of Voice Interface Utilized in Noisy Environment

    NASA Astrophysics Data System (ADS)

    Sawada, Hideyuki; Ohkado, Minoru

    Humans are able to exchange information smoothly by voice in a variety of situations, such as noisy, crowded environments and in the presence of multiple speakers. We can detect the position of a sound source in 3D space, extract a particular sound from a mixture of sounds, and recognize who is talking. By realizing this mechanism with a computer, new applications become possible: recording sound with high quality by reducing noise, presenting a clarified sound, and realizing microphone-free speech recognition by extracting a particular sound. This paper introduces real-time detection and identification of a particular speaker in a noisy environment using a microphone array, based on the speaker's location and individual voice characteristics. The study will be applied to develop an adaptive auditory system for a mobile robot that collaborates with a factory worker.
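
    A standard building block for locating a speaker with a microphone array is time-delay estimation between microphone pairs via generalized cross-correlation with phase transform (GCC-PHAT); the abstract does not name its method, so the following is a generic sketch:

```python
import numpy as np

def gcc_phat_delay(sig, ref, fs):
    """Estimate the delay (seconds) of `sig` relative to `ref` using
    generalized cross-correlation with the phase transform (GCC-PHAT)."""
    n = 2 * max(len(sig), len(ref))
    X = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    cc = np.fft.irfft(X / (np.abs(X) + 1e-12), n)
    cc = np.concatenate((cc[-n // 2:], cc[:n // 2]))  # center zero lag
    return (np.argmax(np.abs(cc)) - n // 2) / fs

# Synthetic check: the same noise burst arriving 2 ms later at mic 2.
fs = 16000
rng = np.random.default_rng(4)
mic1 = rng.normal(size=fs)
mic2 = np.roll(mic1, int(0.002 * fs))
print(f"estimated delay: {gcc_phat_delay(mic2, mic1, fs) * 1000:.2f} ms")
```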

  13. Audio feature extraction using probability distribution function

    NASA Astrophysics Data System (ADS)

    Suhaib, A.; Wan, Khairunizam; Aziz, Azri A.; Hazry, D.; Razlan, Zuradzman M.; Shahriman A., B.

    2015-05-01

    Voice recognition has been one of the popular applications in the robotics field. It has also recently been used in biometric and multimedia information retrieval systems. This technology builds on successive research in audio feature extraction analysis. The Probability Distribution Function (PDF) is a statistical method usually used as one of the processing steps in complex feature extraction methods such as GMM and PCA. In this paper, a new method for audio feature extraction is proposed in which the PDF is used as a feature extraction method by itself for speech analysis purposes. Certain pre-processing techniques are performed prior to the proposed feature extraction method. Subsequently, the PDF values for each frame of the sampled voice signals, obtained from a number of individuals, are plotted. From the experimental results, it can be seen visually from the plotted data that each individual's voice has comparable PDF values and shapes.
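
    The core of the proposal — an empirical probability distribution computed per frame — can be sketched as a normalized amplitude histogram, frame by frame; the frame sizes and bin count below are illustrative guesses, not the authors' parameters:

```python
import numpy as np

def pdf_features(signal, frame_len=2048, hop=1024, bins=32):
    """Empirical PDF (normalized amplitude histogram) for each frame."""
    edges = np.linspace(-1.0, 1.0, bins + 1)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        hist, _ = np.histogram(frame, bins=edges)
        feats.append(hist / hist.sum())  # normalize counts to a PDF
    return np.array(feats)               # shape: (n_frames, bins)

# Two speakers' sustained vowels would be compared via these per-frame PDFs.
x = 0.8 * np.sin(2 * np.pi * 150 * np.arange(32000) / 16000)
print(pdf_features(x).shape)
```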

  14. Learning Media Application Based On Microcontroller Chip Technology In Early Age

    NASA Astrophysics Data System (ADS)

    Ika Hidayati, Permata

    2018-04-01

    In early childhood, cognitive development requires the right learning media to help develop a child's cognitive intelligence quickly. The purpose of this study is to design a learning medium in the form of a doll that can be used to introduce human anatomy in early childhood. This educational doll uses voice recognition technology from the EasyVR module to receive commands from the user and introduce body parts on the doll, with an LED used as an indicator. In addition to introducing human anatomy, the doll allows a user to play back previously stored voice recordings through its sound recorder module. The results of this study show that the educational doll can detect more than one voice and that spoken commands can be detected in random order. The effective distance at which this doll detects sound is up to 2.5 meters.

  15. Vocal fold nodules in adult singers: regional opinions about etiologic factors, career impact, and treatment. A survey of otolaryngologists, speech pathologists, and teachers of singing.

    PubMed

    Hogikyan, N D; Appel, S; Guinn, L W; Haxer, M J

    1999-03-01

    This study was undertaken to better understand current regional opinions regarding vocal fold nodules in adult singers. A questionnaire was sent to 298 persons representing the 3 professional groups most involved with the care of singers with vocal nodules: otolaryngologists, speech pathologists, and teachers of singing. The questionnaire queried respondents about their level of experience with this problem, and their beliefs about causative factors, career impact, and optimum treatment. Responses within and between groups were similar, with differences between groups primarily in the magnitude of positive or negative responses, rather than in the polarity of the responses. Prevailing opinions included: recognition of causative factors in both singing and speaking voice practices, optimism about responsiveness to appropriate treatment, enthusiasm for coordinated voice therapy and voice training as first-line treatment, and acceptance of microsurgical management as appropriate treatment if behavioral management fails.

  16. Developing and Evaluating an Oral Skills Training Website Supported by Automatic Speech Recognition Technology

    ERIC Educational Resources Information Center

    Chen, Howard Hao-Jan

    2011-01-01

    Oral communication ability has become increasingly important to many EFL students. Several commercial software programs based on automatic speech recognition (ASR) technologies are available but their prices are not affordable for many students. This paper will demonstrate how the Microsoft Speech Application Software Development Kit (SASDK), a…

  17. Speech Recognition Software for Language Learning: Toward an Evaluation of Validity and Student Perceptions

    ERIC Educational Resources Information Center

    Cordier, Deborah

    2009-01-01

    A renewed focus on foreign language (FL) learning and speech for communication has resulted in computer-assisted language learning (CALL) software developed with Automatic Speech Recognition (ASR). ASR features for FL pronunciation (Lafford, 2004) are functional components of CALL designs used for FL teaching and learning. The ASR features…

  18. Automatic Speech Recognition: Reliability and Pedagogical Implications for Teaching Pronunciation

    ERIC Educational Resources Information Center

    Kim, In-Seok

    2006-01-01

    This study examines the reliability of automatic speech recognition (ASR) software used to teach English pronunciation, focusing on one particular piece of software, "FluSpeak," as a typical example. Thirty-six Korean English as a Foreign Language (EFL) college students participated in an experiment in which they listened to 15 sentences…

  19. Interpreting Chicken-Scratch: Lexical Access for Handwritten Words

    PubMed Central

    Barnhart, Anthony S.; Goldinger, Stephen D.

    2014-01-01

    Handwritten word recognition is a field of study that has largely been neglected in the psychological literature, despite its prevalence in society. Whereas studies of spoken word recognition almost exclusively employ natural, human voices as stimuli, studies of visual word recognition use synthetic typefaces, thus simplifying the process of word recognition. The current study examined the effects of handwriting on a series of lexical variables thought to influence bottom-up and top-down processing, including word frequency, regularity, bidirectional consistency, and imageability. The results suggest that the natural physical ambiguity of handwritten stimuli forces a greater reliance on top-down processes, because almost all effects were magnified, relative to conditions with computer print. These findings suggest that processes of word perception naturally adapt to handwriting, compensating for physical ambiguity by increasing top-down feedback. PMID:20695708

  20. [Research on Barrier-free Home Environment System Based on Speech Recognition].

    PubMed

    Zhu, Husheng; Yu, Hongliu; Shi, Ping; Fang, Youfang; Jian, Zhuo

    2015-10-01

    The number of people with physical disabilities is increasing year by year, and the trend of population aging is more and more serious. In order to improve quality of life, a control system for an accessible home environment was developed for patients with serious disabilities, allowing them to control home electrical devices with their voice. The control system includes a central control platform, a speech recognition module, a terminal operation module, etc. The system combines speech recognition control technology and wireless information transmission technology with embedded mobile computing technology, and interconnects lamps, electronic locks, alarms, TVs, and other electrical devices in the home environment into a whole system through wireless network nodes. The experimental results showed that the speech recognition success rate was more than 84% in the home environment.

  1. Toward A Simulation-Based Tool for the Treatment of Vocal Fold Paralysis

    PubMed Central

    Mittal, Rajat; Zheng, Xudong; Bhardwaj, Rajneesh; Seo, Jung Hee; Xue, Qian; Bielamowicz, Steven

    2011-01-01

    Advances in high-performance computing are enabling a new generation of software tools that employ computational modeling for surgical planning. Surgical management of laryngeal paralysis is one area where such computational tools could have a significant impact. The current paper describes a comprehensive effort to develop a software tool for planning medialization laryngoplasty where a prosthetic implant is inserted into the larynx in order to medialize the paralyzed vocal fold (VF). While this is one of the most common procedures used to restore voice in patients with VF paralysis, it has a relatively high revision rate, and the tool being developed is expected to improve surgical outcomes. This software tool models the biomechanics of airflow-induced vibration in the human larynx and incorporates sophisticated approaches for modeling the turbulent laryngeal flow, the complex dynamics of the VFs, as well as the production of voiced sound. The current paper describes the key elements of the modeling approach, presents computational results that demonstrate the utility of the approach and also describes some of the limitations and challenges. PMID:21556320

  2. Success with voice recognition.

    PubMed

    Sferrella, Sheila M

    2003-01-01

    You need a compelling reason to implement voice recognition technology. At my institution, the compelling reason was a turnaround time for Radiology results of more than two days. Only 41 percent of our reports were transcribed and signed within 24 hours. In November 1998, a team from Lehigh Valley Hospital went to RSNA and reviewed every voice system on the market. The evaluation was done with the radiologist workflow in mind, and we came back from the meeting with the vendor selection completed. The next steps included developing a business plan, approval of funds, reference calls to more than 15 sites and contract negotiation, all of which took about six months. The department of Radiology at Lehigh Valley Hospital and Health Network (LVHHN) is a multi-site center that performs over 360,000 procedures annually. The department handles all modalities of radiology: general diagnosis, neuroradiology, ultrasound, CT Scan, MRI, interventional radiology, arthrography, myelography, bone densitometry, nuclear medicine, PET imaging, vascular lab and other advanced procedures. The department consists of 200 FTEs and a medical staff of more than 40 radiologists. The budget is in the $10.3 million range. There are three hospital sites and four outpatient imaging center sites where services are provided. At Lehigh Valley Hospital, radiologists are not dedicated to one subspecialty, so implementing a voice system by modality was not an option. Because transcription was so far behind, we needed to eliminate that part of the process. As a result, we decided to deploy the system all at once and with the radiologists as editors. The planning and testing phase took about four months, and the implementation took two weeks. We deployed over 40 workstations and trained close to 50 physicians. The radiologists brought in an extra radiologist from our group for the two weeks of training. That allowed us to train without taking a radiologist out of the department. We trained three to six radiologists a day. I projected a savings of 5.0 FTEs over two years. The actual savings were 8.0 FTEs within three weeks for the first phase and an additional 4.3 FTEs within two weeks of the second phase. The transcription staff was retained to perform other types of transcription and not displaced. The goal was to reduce Medical Records' outsourcing expenses by $670,000 over three years. The actual savings are in excess of $900,000. The proposed payback period was 17 months, and the actual was less than 12 months. For two years prior to implementing the voice system, the turnaround time at Lehigh Valley was 41 percent within 24 hours. One week after implementation, the turnaround time was 78 percent within 24 hours. Today it ranges between 85 percent and 92 percent. Overall, the radiologists at Lehigh Valley Hospital did an excellent job with the cultural change to voice recognition. It has made a major impact on our ability to get reports to physicians in a timely manner so they can make treatment decisions.

  3. Vocal parameters and voice-related quality of life in adult women with and without ovarian function.

    PubMed

    Ferraz, Pablo Rodrigo Rocha; Bertoldo, Simão Veras; Costa, Luanne Gabrielle Morais; Serra, Emmeliny Cristini Nogueira; Silva, Eduardo Magalhães; Brito, Luciane Maria Oliveira; Chein, Maria Bethânia da Costa

    2013-05-01

    To identify the perceptual and acoustic parameters of voice in adult women with and without ovarian function and its impact on quality of life related to voice. Cross-sectional and analytical study with 106 women divided into two groups: G1, with ovarian function (n=43) and G2, without physiological ovarian function (n=63). The women were instructed to sustain the vowel "a" and the sounds of /s/ and /z/ in habitual pitch and loudness. They were also asked to classify their voices and answer the voice-related quality of life (V-RQOL) questionnaire. The perceptual analysis of the vocal samples was performed by three speech-language pathologists using the GRBASI (G: grade; R: roughness; B: breathiness; A: asthenia; S: strain; I: instability) scale. The acoustic analysis was carried out with the software VoxMetria 2.7h (CTS Informatica). The data were analyzed using descriptive statistics. In the perceptual analysis, both groups showed a mild deviation for the parameters roughness, strain, and instability, but only G2 showed a mild impact for the overall degree of dysphonia. The mean fundamental frequency was significantly lower for G2, with a difference of 17.41 Hz between the two groups. There was no impact on any of the V-RQOL domains for this group. With menopause, there is a change in women's voices, impacting some voice parameters. However, there is no direct impact on their quality of life related to voice. Copyright © 2013 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  4. Error-dependent modulation of speech-induced auditory suppression for pitch-shifted voice feedback.

    PubMed

    Behroozmand, Roozbeh; Larson, Charles R

    2011-06-06

    The motor-driven predictions about expected sensory feedback (efference copies) have been proposed to play an important role in recognition of sensory consequences of self-produced motor actions. In the auditory system, this effect was suggested to result in suppression of sensory neural responses to self-produced voices that are predicted by the efference copies during vocal production in comparison with passive listening to the playback of the identical self-vocalizations. In the present study, event-related potentials (ERPs) were recorded in response to upward pitch shift stimuli (PSS) with five different magnitudes (0, +50, +100, +200 and +400 cents) at voice onset during active vocal production and passive listening to the playback. Results indicated that the suppression of the N1 component during vocal production was largest for unaltered voice feedback (PSS: 0 cents), became smaller as the magnitude of PSS increased to 200 cents, and was almost completely eliminated in response to 400 cents stimuli. Findings of the present study suggest that the brain utilizes the motor predictions (efference copies) to determine the source of incoming stimuli and maximally suppresses the auditory responses to unaltered feedback of self-vocalizations. The reduction of suppression for 50, 100 and 200 cents and its elimination for 400 cents pitch-shifted voice auditory feedback support the idea that motor-driven suppression of voice feedback leads to distinctly different sensory neural processing of self vs. non-self vocalizations. This characteristic may enable the audio-vocal system to more effectively detect and correct for unexpected errors in the feedback of self-produced voice pitch compared with externally-generated sounds.
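
    The cents scale used for the pitch-shift stimuli is logarithmic: a shift of c cents multiplies the fundamental frequency by 2^(c/1200), so +100 cents is one semitone and +400 cents raises F0 by about 26%. A minimal sketch (Python; the mapping is standard, the printout is ours):

        def pss_ratio(cents):
            # musical-cents relation: frequency ratio = 2 ** (cents / 1200)
            return 2 ** (cents / 1200)

        for c in (0, 50, 100, 200, 400):
            print(f"+{c} cents -> F0 ratio {pss_ratio(c):.3f}")
        # +400 cents -> F0 ratio 1.260, i.e. roughly a major third above the original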

  5. Layered approach to workstation design for medical image viewing

    NASA Astrophysics Data System (ADS)

    Haynor, David R.; Zick, Gregory L.; Heritage, Marcus B.; Kim, Yongmin

    1992-07-01

    Software engineering principles suggest that complex software systems are best constructed from independent, self-contained modules, thereby maximizing the portability, maintainability and modifiability of the produced code. This principle is important in the design of medical imaging workstations, where further developments in technology (CPU, memory, interface devices, displays, network connections) are required for clinically acceptable workstations, and it is desirable to provide different hardware platforms with the "same look and feel" for the user. In addition, the set of desired functions is relatively well understood, but the optimal user interface for delivering these functions on a clinically acceptable workstation still differs by department, specialty, and individual preference. At the University of Washington, we are developing a viewing station based on the IBM RISC/6000 computer and on new technologies that are just becoming commercially available. These include advanced voice recognition systems and an ultra-high-speed network. We are developing a set of specifications and a conceptual design for the workstation, and will be producing a prototype. This paper presents our current concepts concerning the architecture and software system design of the future prototype. Our conceptual design specifies requirements for a Database Application Programming Interface (DBAPI) and for a User API (UAPI). The DBAPI consists of a set of subroutine calls that define the admissible transactions between the workstation and an image archive. The UAPI describes the requests a user interface program can make of the workstation. It incorporates basic display and image processing functions, yet is specifically designed to allow extensions to the basic set at the application level. We will discuss the fundamental elements of the two APIs and illustrate their application to workstation design.
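
    The abstract names the two interfaces but not their calls. A hypothetical sketch of what such a separation might look like (all method names invented for illustration, in Python):

        from abc import ABC, abstractmethod

        class DBAPI(ABC):
            """Admissible transactions between workstation and image archive."""
            @abstractmethod
            def query_studies(self, patient_id: str) -> list: ...
            @abstractmethod
            def fetch_image(self, study_id: str, series: int) -> bytes: ...

        class UAPI(ABC):
            """Requests a user-interface program can make of the workstation."""
            @abstractmethod
            def display(self, image: bytes, window: float, level: float) -> None: ...
            @abstractmethod
            def register_extension(self, name: str, handler) -> None: ...

    Keeping the user interface behind the UAPI is what lets different departments or specialties swap front ends without touching the archive transactions.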

  6. Internet-Based System for Voice Communication With the ISS

    NASA Technical Reports Server (NTRS)

    Chamberlain, James; Myers, Gerry; Clem, David; Speir, Terri

    2005-01-01

    The Internet Voice Distribution System (IVoDS) is a voice-communication system that comprises mainly computer hardware and software. The IVoDS was developed to supplement and eventually replace the Enhanced Voice Distribution System (EVoDS), which, heretofore, has constituted the terrestrial subsystem of a system for voice communications among crewmembers of the International Space Station (ISS), workers at the Payloads Operations Center at Marshall Space Flight Center, principal investigators at diverse locations who are responsible for specific payloads, and others. The IVoDS utilizes a communication infrastructure of NASA and NASA-related intranets in addition to, as its name suggests, the Internet. Whereas the EVoDS utilizes traditional circuit-switched telephony, the IVoDS is a packet-data system that utilizes a voice over Internet protocol (VOIP). Relative to the EVoDS, the IVoDS offers advantages of greater flexibility and lower cost for expansion and reconfiguration. The IVoDS is an extended version of a commercial Internet-based voice conferencing system that enables each user to participate in only one conference at a time. In the IVoDS, a user can receive audio from as many as eight conferences simultaneously while sending audio to one of them. The IVoDS also incorporates administrative controls, beyond those of the commercial system, that provide greater security and control of the capabilities and authorizations for talking and listening afforded to each user.

  7. Facial recognition software success rates for the identification of 3D surface reconstructed facial images: implications for patient privacy and security.

    PubMed

    Mazura, Jan C; Juluru, Krishna; Chen, Joseph J; Morgan, Tara A; John, Majnu; Siegel, Eliot L

    2012-06-01

    Image de-identification has focused on the removal of textual protected health information (PHI). Surface reconstructions of the face have the potential to reveal a subject's identity even when textual PHI is absent. This study assessed the ability of a computer application to match research subjects' 3D facial reconstructions with conventional photographs of their face. In a prospective study, 29 subjects underwent CT scans of the head and had frontal digital photographs of their face taken. Facial reconstructions of each CT dataset were generated on a 3D workstation. In phase 1, photographs of the 29 subjects undergoing CT scans were added to a digital directory and tested for recognition using facial recognition software. In phases 2-4, additional photographs were added in groups of 50 to increase the pool of possible matches and the test for recognition was repeated. As an internal control, photographs of all subjects were tested for recognition against an identical photograph. Of 3D reconstructions, 27.5% were matched correctly to corresponding photographs (95% upper CL, 40.1%). All study subject photographs were matched correctly to identical photographs (95% lower CL, 88.6%). Of 3D reconstructions, 96.6% were recognized simply as a face by the software (95% lower CL, 83.5%). Facial recognition software has the potential to recognize features on 3D CT surface reconstructions and match these with photographs, with implications for PHI.
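
    The reported confidence limits are consistent with a binomial interval on 8 of 29 matches (8/29 ≈ 27.6%). The paper does not state its interval method, so the following Clopper-Pearson sketch (Python with SciPy) need not reproduce the 40.1% figure exactly:

        from scipy.stats import beta

        k, n = 8, 29  # assumed count: 27.5% of 29 reconstructions matched
        # one-sided 95% upper confidence limit (Clopper-Pearson / exact binomial)
        upper = beta.ppf(0.95, k + 1, n - k)
        print(f"{100 * upper:.1f}% upper limit on the true match rate")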

  8. Failing To Marvel: The Nuances, Complexities, and Challenges of Multicultural Education.

    ERIC Educational Resources Information Center

    Simoes de Carvalho, Paulo M.

    1998-01-01

    Reviews the complex nature of multicultural education, which, as it advocates recognition of the values of many cultures, is nevertheless grounded in a Western culture and subject to Western deconstruction. Considers the challenge of the multicultural educator to recognize his or her own voice as representative of the dominant culture. (SLD)

  9. Blood Memory and the Arts: Indigenous Genealogies and Imagined Truths

    ERIC Educational Resources Information Center

    Mithlo, Nancy Marie

    2011-01-01

    Contemporary Native arts are rarely included in global arts settings that highlight any number of other disenfranchised artists seeking to gain recognition and a voice in the form of critical exhibition practice or scholarship. This article argues that Native artists can benefit from an increased participation in these broader arts networks, given…

  10. Effect of Technological Changes in Information Transfer on the Delivery of Pharmacy Services.

    ERIC Educational Resources Information Center

    Barker, Kenneth N.; And Others

    1989-01-01

    Personal computer technology has arrived in health care. Specific technological advances are optical disc storage, smart cards, voice recognition, and robotics. This paper discusses computers in medicine, in nursing, in conglomerates, and with patients. Future health care will be delivered in primary care centers, medical supermarkets, specialized…

  11. Dysphonia Detected by Pattern Recognition of Spectral Composition.

    ERIC Educational Resources Information Center

    Leinonen, Lea; And Others

    1992-01-01

    This study analyzed production of a long vowel sound within Finnish words by normal or dysphonic voices, using the Self-Organizing Map, the artificial neural network algorithm of T. Kohonen which produces two-dimensional representations of speech. The method was found to be both sensitive and specific in the detection of dysphonia. (Author/JDD)
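
    The record names the algorithm but gives no detail. A minimal self-organizing map in the Kohonen style (Python/NumPy; grid size, decay schedule, and iteration count are illustrative choices, not the study's):

        import numpy as np

        def train_som(data, grid=(10, 10), iters=2000, lr0=0.5, sigma0=3.0, seed=0):
            rng = np.random.default_rng(seed)
            h, w = grid
            weights = rng.random((h, w, data.shape[1]))
            coords = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                          indexing="ij"), axis=-1)
            for t in range(iters):
                x = data[rng.integers(len(data))]
                # best-matching unit: node whose weight vector is closest to x
                bmu = np.unravel_index(
                    np.argmin(np.linalg.norm(weights - x, axis=-1)), (h, w))
                frac = t / iters
                lr = lr0 * (1 - frac)                 # decaying learning rate
                sigma = sigma0 * (1 - frac) + 0.5     # shrinking neighbourhood
                d2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
                g = np.exp(-d2 / (2 * sigma ** 2))[..., None]
                weights += lr * g * (x - weights)     # pull neighbourhood toward x
            return weights

    Spectral feature vectors from normal and dysphonic vowels would then map to different regions of the trained two-dimensional grid.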

  12. Speech Recognition in Fluctuating and Continuous Maskers: Effects of Hearing Loss and Presentation Level.

    ERIC Educational Resources Information Center

    Summers, Van; Molis, Michelle R.

    2004-01-01

    Listeners with normal-hearing sensitivity recognize speech more accurately in the presence of fluctuating background sounds, such as a single competing voice, than in unmodulated noise at the same overall level. These performance differences are greatly reduced in listeners with hearing impairment, who generally receive little benefit from…

  13. 78 FR 17276 - Agency Information Collection Activities: Proposed Request and Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-03-20

    ... information collection in field offices via personal contact (face-to-face or telephone interview) using the... voice recognition technology, or by keying in responses using a telephone key pad. The SSIMWR allows... Development Worksheets: Face-to-Face Interview and Telephone Interview--20 CFR 416.204(b) and 422.135--0960...

  14. 77 FR 76591 - Agency Information Collection Activities: Proposed Request and Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-12-28

    ... voice recognition technology, or by keying in responses using a telephone key pad. The SSIMWR allows... Worksheets: Face-to-Face Interview and Telephone Interview--20 CFR 416.204(b) and 422.135--0960- 0780. SSA... each interview either over the telephone or through a face-to-face discussion with the centenarian...

  15. Automatic Speech Acquisition and Recognition for Spacesuit Audio Systems

    NASA Technical Reports Server (NTRS)

    Ye, Sherry

    2015-01-01

    NASA has a widely recognized but unmet need for novel human-machine interface technologies that can facilitate communication during astronaut extravehicular activities (EVAs), when loud noises and strong reverberations inside spacesuits make communication challenging. WeVoice, Inc., has developed a multichannel signal-processing method for speech acquisition in noisy and reverberant environments that enables automatic speech recognition (ASR) technology inside spacesuits. The technology reduces noise by exploiting differences between the statistical nature of signals (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, ASR accuracy can be improved to the level at which crewmembers will find the speech interface useful. System components and features include beam forming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, and ASR decoding. Arithmetic complexity models were developed and will help designers of real-time ASR systems select proper tasks when confronted with constraints in computational resources. In Phase I of the project, WeVoice validated the technology. The company further refined the technology in Phase II and developed a prototype for testing and use by suited astronauts.
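
    Of the listed components, beamforming is the most self-contained to illustrate. A toy delay-and-sum beamformer (Python/NumPy; real systems estimate the delays and handle edge effects, which this sketch ignores):

        import numpy as np

        def delay_and_sum(channels, delays):
            """channels: (n_mics, n_samples) array; delays: integer sample
            delays of the target's wavefront at each mic. Advancing each
            channel by its delay aligns the speech, which then adds
            coherently, while diffuse noise adds incoherently and is
            attenuated."""
            out = np.zeros(channels.shape[1])
            for ch, d in zip(channels, delays):
                out += np.roll(ch, -int(d))  # wrap-around at edges ignored here
            return out / len(channels)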

  16. Pupils dilate for vocal or familiar music.

    PubMed

    Weiss, Michael W; Trehub, Sandra E; Schellenberg, E Glenn; Habashi, Peter

    2016-08-01

    Previous research reveals that vocal melodies are remembered better than instrumental renditions. Here we explored the possibility that the voice, as a highly salient stimulus, elicits greater arousal than nonvocal stimuli, resulting in greater pupil dilation for vocal than for instrumental melodies. We also explored the possibility that pupil dilation indexes memory for melodies. We tracked pupil dilation during a single exposure to 24 unfamiliar folk melodies (half sung to la la, half piano) and during a subsequent recognition test in which the previously heard melodies were intermixed with 24 novel melodies (half sung, half piano) from the same corpus. Pupil dilation was greater for vocal melodies than for piano melodies in the exposure phase and in the test phase. It was also greater for previously heard melodies than for novel melodies. Our findings provide the first evidence that pupillometry can be used to measure recognition of stimuli that unfold over several seconds. They also provide the first evidence of enhanced arousal to vocal melodies during encoding and retrieval, thereby supporting the more general notion of the voice as a privileged signal. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  17. Remediation of Deficits in Recognition of Facial Emotions in Children with Autism Spectrum Disorders

    ERIC Educational Resources Information Center

    Weinger, Paige M.; Depue, Richard A.

    2011-01-01

    This study evaluated the efficacy of the Mind Reading interactive computer software to remediate emotion recognition deficits in children with autism spectrum disorders (ASD). Six unmedicated children with ASD and 11 unmedicated non-clinical control subjects participated in the study. The clinical sample used the software for five sessions. The…

  18. EduSpeak[R]: A Speech Recognition and Pronunciation Scoring Toolkit for Computer-Aided Language Learning Applications

    ERIC Educational Resources Information Center

    Franco, Horacio; Bratt, Harry; Rossier, Romain; Rao Gadde, Venkata; Shriberg, Elizabeth; Abrash, Victor; Precoda, Kristin

    2010-01-01

    SRI International's EduSpeak[R] system is a software development toolkit that enables developers of interactive language education software to use state-of-the-art speech recognition and pronunciation scoring technology. Automatic pronunciation scoring allows the computer to provide feedback on the overall quality of pronunciation and to point to…

  19. Automatic forensic face recognition from digital images.

    PubMed

    Peacock, C; Goode, A; Brett, A

    2004-01-01

    Digital image evidence is now widely available from criminal investigations and surveillance operations, often captured by security and surveillance CCTV. This has resulted in a growing demand from law enforcement agencies for automatic person-recognition based on image data. In forensic science, a fundamental requirement for such automatic face recognition is to evaluate the weight that can justifiably be attached to this recognition evidence in a scientific framework. This paper describes a pilot study carried out by the Forensic Science Service (UK) which explores the use of digital facial images in forensic investigation. For the purpose of the experiment a specific software package was chosen (Image Metrics Optasia). The paper does not describe the techniques used by the software to reach its decision of probabilistic matches to facial images, but accepts the output of the software as though it were a 'black box'. In this way, the paper lays a foundation for how face recognition systems can be compared in a forensic framework. The aim of the paper is to explore how reliably and under what conditions digital facial images can be presented in evidence.

  20. The Next Era: Deep Learning in Pharmaceutical Research

    PubMed Central

    Ekins, Sean

    2016-01-01

    Over the past decade we have witnessed the increasing sophistication of machine learning algorithms applied in daily use from internet searches, voice recognition, social network software to machine vision software in cameras, phones, robots and self-driving cars. Pharmaceutical research has also seen its fair share of machine learning developments. For example, applying such methods to mine the growing datasets that are created in drug discovery not only enables us to learn from the past but to predict a molecule’s properties and behavior in future. The latest machine learning algorithm garnering significant attention is deep learning, which is an artificial neural network with multiple hidden layers. Publications over the last 3 years suggest that this algorithm may have advantages over previous machine learning methods and offer a slight but discernable edge in predictive performance. The time has come for a balanced review of this technique but also to apply machine learning methods such as deep learning across a wider array of endpoints relevant to pharmaceutical research for which the datasets are growing such as physicochemical property prediction, formulation prediction, absorption, distribution, metabolism, excretion and toxicity (ADME/Tox), target prediction and skin permeation, etc. We also show that there are many potential applications of deep learning beyond cheminformatics. It will be important to perform prospective testing (which has been carried out rarely to date) in order to convince skeptics that there will be benefits from investing in this technique. PMID:27599991
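
    For readers new to the term, "multiple hidden layers" is literal. A bare-bones forward pass (Python/NumPy; sizes arbitrary):

        import numpy as np

        def forward(x, layers):
            """layers: list of (W, b) pairs; all but the last use a ReLU."""
            for W, b in layers[:-1]:
                x = np.maximum(0, W @ x + b)   # hidden layer with ReLU
            W, b = layers[-1]
            return W @ x + b                   # linear output, e.g. a property value

        rng = np.random.default_rng(0)
        dims = [16, 32, 32, 1]                 # input, two hidden layers, output
        layers = [(rng.normal(0, 0.1, (o, i)), np.zeros(o))
                  for i, o in zip(dims, dims[1:])]
        print(forward(rng.normal(size=16), layers))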

  1. Hemispheric association and dissociation of voice and speech information processing in stroke.

    PubMed

    Jones, Anna B; Farrall, Andrew J; Belin, Pascal; Pernet, Cyril R

    2015-10-01

    As we listen to someone speaking, we extract both linguistic and non-linguistic information. Knowing how these two sets of information are processed in the brain is fundamental for the general understanding of social communication, speech recognition and therapy of language impairments. We investigated the pattern of performances in phoneme versus gender categorization in left and right hemisphere stroke patients, and found an anatomo-functional dissociation in the right frontal cortex, establishing a new syndrome in voice discrimination abilities. In addition, phoneme and gender performances were more often associated than dissociated in left hemisphere patients, suggesting common neural underpinnings. Copyright © 2015 Elsevier Ltd. All rights reserved.

  2. Robust matching for voice recognition

    NASA Astrophysics Data System (ADS)

    Higgins, Alan; Bahler, L.; Porter, J.; Blais, P.

    1994-10-01

    This paper describes an automated method of comparing a voice sample of an unknown individual with samples from known speakers in order to establish or verify the individual's identity. The method is based on a statistical pattern matching approach that employs a simple training procedure, requires no human intervention (transcription, word or phonetic marking, etc.), and makes no assumptions regarding the expected form of the statistical distributions of the observations. The content of the speech material (vocabulary, grammar, etc.) is not assumed to be constrained in any way. An algorithm is described which incorporates frame pruning and channel equalization processes designed to achieve robust performance with reasonable computational resources. An experimental implementation demonstrating the feasibility of the concept is described.
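
    The record describes the ingredients (frame pruning, channel equalization) without formulas. A hedged sketch of how those pieces often combine in frame-based speaker matching (Python/NumPy; a generic reconstruction, not the paper's algorithm):

        import numpy as np

        def equalize(frames):
            # cepstral-mean subtraction: removing the per-utterance mean is a
            # common way to compensate for a stationary channel
            return frames - frames.mean(axis=0, keepdims=True)

        def match_score(test, ref, keep=0.8):
            test, ref = equalize(test), equalize(ref)
            # distance from each test frame to its nearest reference frame
            d = np.linalg.norm(test[:, None] - ref[None, :], axis=-1).min(axis=1)
            d.sort()
            # frame pruning: drop the worst-matching frames before averaging
            return d[: int(keep * len(d))].mean()   # lower score = better match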

  3. Voice response system of color and pattern on clothes for visually handicapped person.

    PubMed

    Miyake, Masao; Manabe, Yoshitsugu; Uranishi, Yuki; Imura, Masataka; Oshiro, Osamu

    2013-01-01

    For visually handicapped people, mental support is important for independent daily life and participation in society. A system that can recognize the colors and patterns on clothes is expected to let them go out with fewer concerns. We have carried out a basic study of such a system and developed a prototype that can stably recognize colors and patterns and immediately report this information by voice when the user points it at clothing. Evaluation experiments show that the prototype outperforms the basic-study system in recognition accuracy for color and pattern.

  4. Speech processing using maximum likelihood continuity mapping

    DOEpatents

    Hogden, John E.

    2000-01-01

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  5. Speech processing using maximum likelihood continuity mapping

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hogden, J.E.

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  6. Illumination-invariant hand gesture recognition

    NASA Astrophysics Data System (ADS)

    Mendoza-Morales, América I.; Miramontes-Jaramillo, Daniel; Kober, Vitaly

    2015-09-01

    In recent years, human-computer interaction (HCI) has received a lot of interest in industry and science because it provides new ways to interact with modern devices through voice, body, and facial/hand gestures. The application range of HCI extends from easy control of home appliances to entertainment. Hand gesture recognition is a particularly interesting problem because the shape and movement of hands are complex and flexible enough to encode many different signs. In this work we propose a three-step algorithm: first, detection of hands in the current frame is carried out; second, hand tracking across the video sequence is performed; finally, robust recognition of gestures across subsequent frames is made. Recognition rate depends strongly on non-uniform illumination of the scene and occlusion of hands. In order to overcome these issues we use two Microsoft Kinect devices, combining information from the RGB and infrared sensors. The algorithm's performance is tested in terms of recognition rate and processing time.

  7. Automatic speech recognition research at NASA-Ames Research Center

    NASA Technical Reports Server (NTRS)

    Coler, Clayton R.; Plummer, Robert P.; Huff, Edward M.; Hitchcock, Myron H.

    1977-01-01

    A trainable acoustic pattern recognizer manufactured by Scope Electronics is presented. The voice command system VCS encodes speech by sampling 16 bandpass filters with center frequencies in the range from 200 to 5000 Hz. Variations in speaking rate are compensated for by a compression algorithm that subdivides each utterance into eight subintervals in such a way that the amount of spectral change within each subinterval is the same. The recorded filter values within each subinterval are then reduced to a 15-bit representation, giving a 120-bit encoding for each utterance. The VCS incorporates a simple recognition algorithm that utilizes five training samples of each word in a vocabulary of up to 24 words. The recognition rate of approximately 85 percent correct for untrained speakers and 94 percent correct for trained speakers was not considered adequate for flight systems use. Therefore, the built-in recognition algorithm was disabled, and the VCS was modified to transmit 120-bit encodings to an external computer for recognition.
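
    The compression step is described precisely enough to sketch: divide the utterance so each of the eight subintervals contains the same amount of spectral change. One way to realize that (Python/NumPy; the distance measure is our assumption):

        import numpy as np

        def equal_change_boundaries(frames, n_seg=8):
            """frames: (n_frames, 16) bandpass-filter samples over the utterance.
            Returns n_seg + 1 frame indices splitting the utterance so that each
            segment holds ~1/8 of the cumulative frame-to-frame spectral change."""
            step = np.linalg.norm(np.diff(frames, axis=0), axis=1)
            cum = np.concatenate([[0.0], np.cumsum(step)])
            targets = np.linspace(0.0, cum[-1], n_seg + 1)
            return np.searchsorted(cum, targets)

    Averaging the 16 filter values within each subinterval and quantizing each subinterval to 15 bits then yields the 8 x 15 = 120-bit utterance code described above.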

  8. [Creating a language model of the forensic medicine domain for developing an autopsy recording system by automatic speech recognition].

    PubMed

    Niijima, H; Ito, N; Ogino, S; Takatori, T; Iwase, H; Kobayashi, M

    2000-11-01

    For the practical use of speech recognition technology in recording forensic autopsies, a language model specialized for the forensic autopsy domain was developed. A trigram (3-gram) language model for forensic autopsy was created and combined with a Hidden Markov Model acoustic model for Japanese speech recognition to customize the speech recognition engine for forensic autopsy. A forensic vocabulary set of over 10,000 words was compiled and some 300,000 sentence patterns were made to create the forensic language model, which was then mixed in appropriate proportion with a general language model to attain high accuracy. When tested by dictating autopsy findings, this speech recognition system achieved a recognition rate of about 95%, which appears to reach practical usability as speech recognition software, though room remains for improving its hardware and application-layer software.
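
    "Mixing with a general language model" usually means linear interpolation of the two models' probabilities. A minimal sketch (Python; the weight is an illustrative assumption, not the paper's value):

        def mixed_prob(word, history, p_forensic, p_general, lam=0.8):
            """p_forensic / p_general: functions returning each trigram model's
            probability P(word | history). lam weights the domain model."""
            return lam * p_forensic(word, history) + (1 - lam) * p_general(word, history)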

  9. Acoustic-Perceptual Correlates of Voice in Indian Hindu Purohits.

    PubMed

    Balasubramanium, Radish Kumar; Karuppali, Sudhin; Bajaj, Gagan; Shastry, Anuradha; Bhat, Jayashree

    2018-05-16

    Purohit, in the Indian religious context (Hindu), means priest. Purohits are professional voice users who use their voice while performing regular worships and rituals in temples and homes. Any deviations in their voice can have an impact on their profession. Hence, there is a need to investigate the voice characteristics of purohits using perceptual and acoustic analyses. A total of 44 men in the age range of 18-30 years were divided into two groups. Group 1 consisted of purohits who were trained since childhood (n = 22) in the traditional gurukul system. Group 2 (n = 22) consisted of normal controls. Phonation and spontaneous speech samples were obtained from all the participants at a comfortable pitch and loudness. The Praat software (Version 5.3.31) and the Speech tool were used to analyze the traditional acoustic and cepstral parameters, respectively, whereas GRBAS was used to perceptually evaluate the voice. Results of the independent t test revealed no significant differences across the groups for perceptual and traditional acoustic measures except for intensity, which was significantly higher in purohits' voices at P < 0.05. However, the cepstral values (cepstral peak prominence and smoothened cepstral peak prominence) were much higher in purohits than in controls at P < 0.05. CONCLUSIONS: Results revealed that purohits did not exhibit vocal deviations as analyzed through perceptual and acoustic parameters. In contrast, cepstral measures were higher in Indian Hindu purohits in comparison with normal controls, suggestive of a higher degree of harmonic organization in purohits. Further studies are required to analyze the physiological correlates of increased cepstral measures in purohits' voices. Copyright © 2018 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
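
    Cepstral peak prominence, the measure that separated the groups, can be sketched compactly. A bare-bones version (Python/NumPy); published CPP/CPPS implementations differ in windowing, averaging, and regression details, so treat this as orientation only:

        import numpy as np

        def cpp(x, fs, fmin=60.0, fmax=330.0):
            n = len(x)
            log_spec = 20 * np.log10(np.abs(np.fft.rfft(x * np.hanning(n))) + 1e-12)
            cep = np.fft.irfft(log_spec)         # real cepstrum; index k ~ quefrency k/fs s
            q = np.arange(len(cep)) / fs
            lo, hi = int(fs / fmax), int(fs / fmin)  # quefrency range of plausible pitch
            peak = lo + np.argmax(cep[lo:hi])
            a, b = np.polyfit(q[lo:hi], cep[lo:hi], 1)  # regression line under the peak
            return cep[peak] - (a * q[peak] + b)  # peak height above the trend line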

  10. Error-dependent modulation of speech-induced auditory suppression for pitch-shifted voice feedback

    PubMed Central

    2011-01-01

    Background: The motor-driven predictions about expected sensory feedback (efference copies) have been proposed to play an important role in recognition of sensory consequences of self-produced motor actions. In the auditory system, this effect was suggested to result in suppression of sensory neural responses to self-produced voices that are predicted by the efference copies during vocal production in comparison with passive listening to the playback of the identical self-vocalizations. In the present study, event-related potentials (ERPs) were recorded in response to upward pitch shift stimuli (PSS) with five different magnitudes (0, +50, +100, +200 and +400 cents) at voice onset during active vocal production and passive listening to the playback. Results: Results indicated that the suppression of the N1 component during vocal production was largest for unaltered voice feedback (PSS: 0 cents), became smaller as the magnitude of PSS increased to 200 cents, and was almost completely eliminated in response to 400 cents stimuli. Conclusions: Findings of the present study suggest that the brain utilizes the motor predictions (efference copies) to determine the source of incoming stimuli and maximally suppresses the auditory responses to unaltered feedback of self-vocalizations. The reduction of suppression for 50, 100 and 200 cents and its elimination for 400 cents pitch-shifted voice auditory feedback support the idea that motor-driven suppression of voice feedback leads to distinctly different sensory neural processing of self vs. non-self vocalizations. This characteristic may enable the audio-vocal system to more effectively detect and correct for unexpected errors in the feedback of self-produced voice pitch compared with externally-generated sounds. PMID:21645406

  11. The computer in office medical practice.

    PubMed

    Dowdle, John

    2002-04-01

    There will continue to be change and evolution in the medical office environment. As voice recognition systems continue to improve, instant creation of office notes with the absence of dictation may be commonplace. As medical and computer technology evolves, we must continue to evaluate the many new computer systems that can assist us in our clinical office practice.

  12. Exploring the Use of Emoji as a Visual Research Method for Eliciting Young Children's Voices in Childhood Research

    ERIC Educational Resources Information Center

    Fane, Jennifer; MacDougall, Colin; Jovanovic, Jessie; Redmond, Gerry; Gibbs, Lisa

    2018-01-01

    Recognition of the need to move from research "on" children to research "with" children has prompted significant theoretical and methodological debate as to how young children can be positioned as active participants in the research process. Visual research methods such as drawing, photography, and videography have received…

  13. Effects of Familiarity and Feeding on Newborn Speech-Voice Recognition

    ERIC Educational Resources Information Center

    Valiante, A. Grace; Barr, Ronald G.; Zelazo, Philip R.; Brant, Rollin; Young, Simon N.

    2013-01-01

    Newborn infants preferentially orient to familiar over unfamiliar speech sounds. They are also better at remembering unfamiliar speech sounds for short periods of time if learning and retention occur after a feed than before. It is unknown whether short-term memory for speech is enhanced when the sound is familiar (versus unfamiliar) and, if so,…

  14. The Sound-to-Speech Translations Utilizing Graphics Mediation Interface for Students with Severe Handicaps. Final Report.

    ERIC Educational Resources Information Center

    Brown, Carrie; And Others

    This final report describes activities and outcomes of a research project on a sound-to-speech translation system utilizing a graphic mediation interface for students with severe disabilities. The STS/Graphics system is a voice recognition, computer-based system designed to allow individuals with mental retardation and/or severe physical…

  15. Killing Curiosity? An Analysis of Celebrated Identity Performances among Teachers and Students in Nine London Secondary Science Classrooms

    ERIC Educational Resources Information Center

    Archer, Louise; Dawson, Emily; DeWitt, Jennifer; Godec, Spela; King, Heather; Mau, Ada; Nomikou, Effrosyni; Seakins, Amy

    2017-01-01

    In this paper, we take the view that school classrooms are spaces that are constituted by complex power struggles (for voice, authenticity, and recognition), involving multiple layers of resistance and contestation between the "institution," teachers and students, which can have profound implications for students' science identity and…

  16. Building Biases in Infancy: The Influence of Race on Face and Voice Emotion Matching

    ERIC Educational Resources Information Center

    Vogel, Margaret; Monesson, Alexandra; Scott, Lisa S.

    2012-01-01

    Early in the first year of life infants exhibit equivalent performance distinguishing among people within their own race and within other races. However, with development and experience, their face recognition skills become tuned to groups of people they interact with the most. This developmental tuning is hypothesized to be the origin of adult…

  17. Moving beyond the Hype: What Does the Celebration of Student Writing Do for Students?

    ERIC Educational Resources Information Center

    Carter, Genesea M.; Gallegos, Erin Penner

    2017-01-01

    Over the last decade celebrations of student writing (CSWs) have been instituted at universities across the nation as a public way to celebrate students' voices, identities, and literacies. Often touted as a way to gain campus-wide recognition and support for first-year composition courses, this event also purportedly fosters agency and authority…

  18. The Suitability of Cloud-Based Speech Recognition Engines for Language Learning

    ERIC Educational Resources Information Center

    Daniels, Paul; Iwago, Koji

    2017-01-01

    As online automatic speech recognition (ASR) engines become more accurate and more widely implemented with CALL (computer-assisted language learning) software, it becomes important to evaluate the effectiveness and the accuracy of these recognition engines using authentic speech samples. This study investigates two of the most prominent cloud-based speech recognition engines--Apple's…

  19. Fundamental frequency and voice perturbation measures in smokers and non-smokers: An acoustic and perceptual study

    NASA Astrophysics Data System (ADS)

    Freeman, Allison

    This research examined the fundamental frequency and perturbation (jitter % and shimmer %) measures in young adult (20-30 year-old) and middle-aged adult (40-55 year-old) smokers and non-smokers; there were 36 smokers and 36 non-smokers. Acoustic analysis was carried out utilizing one task: production of sustained /a/. These voice samples were analyzed utilizing Multi-Dimensional Voice Program (MDVP) software, which provided values for fundamental frequency, jitter %, and shimmer %. These values were analyzed for trends regarding smoking status, age, and gender. Statistical significance was found regarding the fundamental frequency, jitter %, and shimmer % for smokers as compared to non-smokers; smokers were found to have significantly lower fundamental frequency values, and significantly higher jitter % and shimmer % values. Statistical significance was not found regarding fundamental frequency, jitter %, and shimmer % for age group comparisons. With regard to gender, statistical significance was found regarding fundamental frequency; females were found to have statistically higher fundamental frequencies as compared to males. However, the relationships between gender and jitter % and shimmer % lacked statistical significance. These results indicate that smoking negatively affects voice quality. This study also examined the ability of untrained listeners to identify smokers and non-smokers based on their voices. Results of this voice perception task suggest that listeners are not accurately able to identify smokers and non-smokers, as statistical significance was not reached. However, despite a lack of significance, trends in data suggest that listeners are able to utilize voice quality to identify smokers and non-smokers.
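
    The perturbation measures have simple "local" definitions: jitter % is the mean absolute difference between consecutive pitch periods relative to the mean period, and shimmer % is the analogue computed on cycle amplitudes. A sketch (Python/NumPy; MDVP's exact formulas may differ in detail):

        import numpy as np

        def jitter_percent(periods):
            p = np.asarray(periods, float)     # consecutive pitch periods, seconds
            return 100 * np.mean(np.abs(np.diff(p))) / p.mean()

        def shimmer_percent(amplitudes):
            a = np.asarray(amplitudes, float)  # peak amplitude of each cycle
            return 100 * np.mean(np.abs(np.diff(a))) / a.mean()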

  20. Voice and gesture-based 3D multimedia presentation tool

    NASA Astrophysics Data System (ADS)

    Fukutake, Hiromichi; Akazawa, Yoshiaki; Okada, Yoshihiro

    2007-09-01

    This paper proposes a 3D multimedia presentation tool that the user can manipulate intuitively through voice and gesture input alone, without a standard keyboard or mouse device. The authors developed this system as a presentation tool for use in a presentation room equipped with a large screen, such as an exhibition room in a museum, because in such an environment voice commands and gesture-based pointing work better than a keyboard or mouse. The system was developed using IntelligentBox, a component-based 3D graphics software development system. IntelligentBox provides various types of 3D visible, reactive functional components called boxes, e.g., a voice input component and various multimedia handling components. IntelligentBox also provides a dynamic data linkage mechanism called slot-connection that allows the user to develop 3D graphics applications by combining existing boxes through direct manipulation on a computer screen. Using IntelligentBox, the 3D multimedia presentation tool proposed in this paper was likewise built from combined components through direct manipulation on screen. The authors have already proposed a 3D multimedia presentation tool using a stage metaphor and its voice input interface. This time, the system was extended to accept gesture input in addition to voice commands. This paper explains the details of the proposed 3D multimedia presentation tool and, in particular, describes its component-based voice and gesture input interfaces.

  1. IBM techexplorer and MathML: Interactive Multimodal Scientific Documents

    NASA Astrophysics Data System (ADS)

    Diaz, Angel

    2001-06-01

    The World Wide Web provides a standard publishing platform for disseminating scientific and technical articles, books, journals, courseware, or even homework on the internet; the transition from paper to web-based publishing has brought new opportunities for creating interactive content. Students, scientists, and engineers are now faced with the task of rendering the 2D presentational structure of mathematics, harnessing the wealth of scientific and technical software, and creating truly accessible scientific portals across international boundaries and markets. The recent emergence of World Wide Web Consortium (W3C) standards such as the Mathematical Markup Language (MathML), the Extensible Stylesheet Language (XSL), and Aural CSS (ACSS) provides a foundation whereby mathematics can be displayed, enlivened, computed, and audio formatted. With interoperability ensured by standards, software applications can be easily brought together to create extensible and interactive scientific content. In this presentation we will provide an overview of the IBM techexplorer Hypermedia Browser, a web browser plug-in and ActiveX control aimed at bringing interactive mathematics to the masses across platforms and applications. We will demonstrate "live" mathematics where documents that contain MathML expressions can be edited and computed right inside your favorite web browser. This demonstration will be generalized as we show how MathML can be used to enliven even PowerPoint presentations. Finally, we will close the loop by demonstrating a novel approach to spoken mathematics based on MathML, DOM, XSL, ACSS, techexplorer, and IBM ViaVoice. By making use of techexplorer as the glue that binds the rendered content to the web browser, the back-end computation software, the Java applets that augment the exposition, and voice-rendering systems such as ViaVoice, authors can indeed create truly extensible and interactive scientific content. For more information see: [http://www.software.ibm.com/techexplorer] [http://www.alphaworks.ibm.com] [http://www.w3.org]
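
    For readers who have not seen it, MathML encodes the 2D structure of an expression rather than its appearance. A small fragment (our illustration, not from the talk) marking up x^2 + 1:

        <math xmlns="http://www.w3.org/1998/Math/MathML">
          <mrow>
            <msup><mi>x</mi><mn>2</mn></msup>
            <mo>+</mo>
            <mn>1</mn>
          </mrow>
        </math>

    Because the markup separates identifiers, numbers, and operators, the same source can be rendered visually, evaluated by back-end computation software, or spoken by a voice-rendering system such as ViaVoice.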

  2. On the recognition of complex structures: Computer software using artificial intelligence applied to pattern recognition

    NASA Technical Reports Server (NTRS)

    Yakimovsky, Y.

    1974-01-01

    An approach to simultaneous interpretation of objects in complex structures so as to maximize a combined utility function is presented. Results of the application of a computer software system to assign meaning to regions in a segmented image based on the principles described in this paper and on a special interactive sequential classification learning system, which is referenced, are demonstrated.

  3. Iris Cryptography for Security Purpose

    NASA Astrophysics Data System (ADS)

    Ajith, Srighakollapu; Balaji Ganesh Kumar, M.; Latha, S.; Samiappan, Dhanalakshmi; Muthu, P.

    2018-04-01

    Security has become a major concern for everyone. Hacking is pervasive, and even as technology develops there remain many areas where it fails to deliver adequate security. Engineers and scientists have introduced new security products such as biometric sensors for face recognition, pattern recognition, gesture recognition, voice authentication, and so on, but these devices often fall short of expected results. In this work we present an approach to generating a unique secure key from an iris template. The iris templates are processed using well-defined processing techniques, and are stored, traversed, and utilized through encryption and decryption. We conclude that iris cryptography gives the expected results for securing data from eavesdroppers.
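
    The abstract does not specify the key-generation step. One plausible reading: hash a binary iris template into a fixed-length key (Python sketch; function name and parameters are our assumptions). A deployable scheme would also need a fuzzy extractor or similar, since two captures of the same iris differ in some bits and plain hashing tolerates no noise:

        import hashlib

        def key_from_iris_code(iris_bits: bytes, salt: bytes) -> bytes:
            # derive a 256-bit key from an (assumed error-free) iris template
            return hashlib.pbkdf2_hmac("sha256", iris_bits, salt, 100_000)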

  4. Using voice input and audio feedback to enhance the reality of a virtual experience

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miner, N.E.

    1994-04-01

    Virtual Reality (VR) is a rapidly emerging technology which allows participants to experience a virtual environment through stimulation of the participant's senses. Intuitive and natural interactions with the virtual world help to create a realistic experience. Typically, a participant is immersed in a virtual environment through the use of a 3-D viewer. Realistic, computer-generated environment models and accurate tracking of a participant's view are important factors for adding realism to a virtual experience. Stimulating a participant's sense of sound and providing a natural form of communication for interacting with the virtual world are equally important. This paper discusses the advantages and importance of incorporating voice recognition and audio feedback capabilities into a virtual world experience. Various approaches and levels of complexity are discussed. Examples of the use of voice and sound are presented through the description of a research application developed in the VR laboratory at Sandia National Laboratories.

  5. A voice-actuated wind tunnel model leak checking system

    NASA Technical Reports Server (NTRS)

    Larson, William E.

    1989-01-01

    A computer program has been developed that improves the efficiency of wind tunnel model leak checking. The program uses a voice recognition unit to relay a technician's commands to the computer. The computer, after receiving a command, can respond to the technician via a voice response unit. Information about the model pressure orifice being checked is displayed on a gas-plasma terminal. On command, the program records up to 30 seconds of pressure data. After the recording is complete, the raw data and a straight line fit of the data are plotted on the terminal. This allows the technician to make a decision on the integrity of the orifice being checked. All results of the leak check program are stored in a database file that can be listed on the line printer for record keeping purposes or displayed on the terminal to help the technician find unchecked orifices. This program allows one technician to check a model for leaks instead of the two or three previously required.
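
    The pass/fail judgement rests on the straight-line fit of the recorded pressure trace. A minimal version of that step (Python/NumPy; the leak threshold is an invented illustration):

        import numpy as np

        def leak_check(times_s, pressures):
            """Fit pressure vs. time over the ~30 s record; a strongly
            negative slope suggests the orifice leaks."""
            slope, _ = np.polyfit(times_s, pressures, 1)
            return slope, slope < -0.01  # threshold units: pressure per second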

  6. Design and development of a Space Station proximity operations research and development mockup

    NASA Technical Reports Server (NTRS)

    Haines, Richard F.

    1986-01-01

    Proximity operations (Prox-Ops) on-orbit refers to all activities taking place within one km of the Space Station. Designing a Prox-Ops control station calls for a comprehensive systems approach which takes into account structural constraints, orbital dynamics including approach/departure flight paths, myriad human factors and other topics. This paper describes a reconfigurable full-scale mock-up of a Prox-Ops station constructed at Ames incorporating an array of windows (with dynamic star field, target vehicle(s), and head-up symbology), head-down perspective display of manned and unmanned vehicles, voice-actuated 'electronic checklist', computer-generated voice system, expert system (to help diagnose subsystem malfunctions), and other displays and controls. The facility is used for demonstrations of selected Prox-Ops approach scenarios, human factors research (workload assessment, determining external vision envelope requirements, head-down and head-up symbology design, voice synthesis and recognition research, etc.) and development of engineering design guidelines for future module interiors.

  7. Representational specificity of within-category phonetic variation in the mental lexicon

    NASA Astrophysics Data System (ADS)

    Ju, Min; Luce, Paul A.

    2003-10-01

    This study examines (1) whether within-category phonetic variation in voice onset time (VOT) is encoded in long-term memory and has consequences for subsequent word recognition and, if so, (2) whether such effects are greater in words with voiced counterparts (pat/bat) than those without (cow/*gow), given that VOT information is more critical for lexical discrimination in the former. Two long-term repetition priming experiments were conducted using words containing word-initial voiceless stops varying in VOT. Reaction times to a lexical decision were compared between the same and different VOT conditions in words with or without voiced counterparts. If veridical representations of each episode are preserved in memory, variation in VOT should have demonstrable effects on the magnitude of priming. However, if within-category variation is discarded and form-based representations are abstract, the variation in VOT should not mediate priming. The implications of these results for the specificity and abstractness of phonetic representations in long-term memory will be discussed.

  8. The written voice: implicit memory effects of voice characteristics following silent reading and auditory presentation.

    PubMed

    Abramson, Marianne

    2007-12-01

    After participants were familiarized with two voices, either implicit memory (auditory lexical decision) or explicit memory (auditory recognition) for words from silently read sentences was assessed in 32 men and 32 women volunteers. In the silently read sentences, the sex of the speaker was implied in the initial words, e.g., "He said, ..." or "She said...". Tone (question versus statement) was also manipulated by appropriate punctuation. Auditory lexical decision priming was found for sex- and tone-consistent items following silent reading, but only up to 5 min. after silent reading. In a second study, similar lexical decision priming was found following listening to the sentences, although these effects remained reliable after a 2-day delay. The effect sizes for lexical decision priming showed that tone-consistency and sex-consistency were strong following both silent reading and listening 5 min. after studying. These results suggest that readers create episodic traces of text from auditory images of silently read sentences, as they do during listening.

  9. Computerized literature reference system: use of an optical scanner and optical character recognition software.

    PubMed

    Lossef, S V; Schwartz, L H

    1990-09-01

    A computerized reference system for radiology journal articles was developed by using an IBM-compatible personal computer with a hand-held optical scanner and optical character recognition software. This allows direct entry of scanned text from printed material into word processing or data-base files. Additionally, line diagrams and photographs of radiographs can be incorporated into these files. A text search and retrieval software program enables rapid searching for keywords in scanned documents. The hand scanner and software programs are commercially available, relatively inexpensive, and easily used. This permits construction of a personalized radiology literature file of readily accessible text and images requiring minimal typing or keystroke entry.

  10. Visual and auditory socio-cognitive perception in unilateral temporal lobe epilepsy in children and adolescents: a prospective controlled study.

    PubMed

    Laurent, Agathe; Arzimanoglou, Alexis; Panagiotakaki, Eleni; Sfaello, Ignacio; Kahane, Philippe; Ryvlin, Philippe; Hirsch, Edouard; de Schonen, Scania

    2014-12-01

    A high rate of abnormal social behavioural traits or perceptual deficits is observed in children with unilateral temporal lobe epilepsy. In the present study, perception of auditory and visual social signals, carried by faces and voices, was evaluated in children and adolescents with temporal lobe epilepsy. We prospectively investigated a sample of 62 children with focal non-idiopathic epilepsy early in the course of the disorder. The present analysis included 39 children with a confirmed diagnosis of temporal lobe epilepsy. Seventy-two control participants, distributed across 10 age groups, served as the comparison group. Our socio-perceptual evaluation protocol comprised three socio-visual tasks (face identity, facial emotion and gaze direction recognition), two socio-auditory tasks (voice identity and emotional prosody recognition), and three control tasks (lip reading, geometrical pattern and linguistic intonation recognition). All 39 patients also underwent a neuropsychological examination. As a group, children with temporal lobe epilepsy performed at a significantly lower level than the control group with regard to recognition of facial identity, direction of eye gaze, and emotional facial expressions. We found no relationship between the type of visual deficit and age at first seizure, duration of epilepsy, or the epilepsy-affected cerebral hemisphere. Deficits in socio-perceptual tasks could be found independently of the presence of deficits in visual or auditory episodic memory, visual non-facial pattern processing (control tasks), or speech perception. A normal FSIQ did not exempt some of the patients from an underlying deficit in some of the socio-perceptual tasks. Temporal lobe epilepsy not only impairs development of emotion recognition, but can also impair development of perception of other socio-perceptual signals in children with or without intellectual deficiency. Prospective studies need to be designed to evaluate the results of appropriate re-education programs in children presenting with deficits in social cue processing.

  11. Duenna-An experimental language teaching application

    NASA Astrophysics Data System (ADS)

    Horváth, Balázs Zsigmond; Blaske, Bence; Szabó, Anita

    The presented TTS (text-to-speech) application is an auxiliary tool for language teaching. It utilizes computer-generated voices to simulate dialogs representing different grammatical problems or speech contexts. The software is capable of producing as many example dialogs as required to enhance the language learning experience, thus serving curriculum representation, grammar contextualization and pronunciation at the same time. It is designed to be used on a regular basis in the language classroom, and students gladly write materials for listening comprehension tasks with it. A pilot study involving 26 students (divided into control and trial groups) practicing for their school-leaving exam indicates that computer-generated voices are adequate for recreating audio course book materials as well. The voices used were able to engage the students as effectively as if they were listening to recorded human speech.

  12. [Approach to the Development of Mind and Persona].

    PubMed

    Sawaguchi, Toshiko

    2018-01-01

    To help health specialists working in the regional health field connect patients with medical specialists, we investigated the possibility of using a voice-based approach for dissociative identity disorder (DID) patients as a health assessment for medical access (HAMA). The first step is to investigate whether the plural personae in a single DID patient can be discriminated by voice analysis. Voices of DID patients, including those with different personae, were extracted from YouTube and analysed using the software PRAAT in terms of fundamental frequency and oral, chin, and tongue factors. In addition, RAKUGO storyteller voices, produced artificially and dramatically, were analysed in the same manner. Quantitative and qualitative analyses were carried out, and a nested logistic regression and a nested generalized linear model were developed. The voices of different personae in one DID patient could be easily distinguished visually using the fundamental frequency curve, cluster analysis, and factor analysis. In the canonical analysis, only Roy's maximum root was <0.01. Among the nested generalized linear models, the model using a standard deviation (SD) indicator fit best, and some other possibilities are shown here. In DID patients, a short transition time among plural personae could signal a risky situation such as suicide, so if the voice approach can reveal the time threshold of changes between different personae, it would be useful as an access assessment in the form of a simple HAMA.

  13. A text input system developed by using lips image recognition based LabVIEW for the seriously disabled.

    PubMed

    Chen, S C; Shao, C L; Liang, C K; Lin, S W; Huang, T H; Hsieh, M C; Yang, C H; Luo, C H; Wuo, C M

    2004-01-01

    In this paper, we present a text input system for the seriously disabled that uses lips image recognition based on LabVIEW. The system can be divided into a software subsystem and a hardware subsystem. In the software subsystem, we adopted image processing techniques to recognize whether the mouth is open or closed, based on the relative distance between the upper lip and the lower lip. In the hardware subsystem, the parallel port built into the PC is used to transmit the recognized mouth status to the Morse-code text input system. By integrating the software subsystem with the hardware subsystem, we implemented a text input system using lips image recognition programmed in the LabVIEW language. We hope the system can help the seriously disabled communicate with others more easily.
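
    The following minimal sketch illustrates the mouth-state logic described above: each video frame is classified as mouth-open or mouth-closed from the lip distance, and the duration of each opening is mapped to a Morse dot or dash. The threshold values and the lip-distance source are assumptions; the original system measured lip distance with LabVIEW image processing.

        from typing import Iterable

        OPEN_THRESHOLD = 12.0  # pixels between lips; tuned per camera setup (assumed)
        DASH_FRAMES = 15       # openings at least this long count as a dash (assumed)

        def lip_distances_to_morse(distances: Iterable[float]) -> str:
            """Convert a per-frame lip-distance series into a Morse string."""
            code, open_run = [], 0
            for d in distances:
                if d > OPEN_THRESHOLD:
                    open_run += 1
                elif open_run:  # mouth just closed: emit one symbol
                    code.append("-" if open_run >= DASH_FRAMES else ".")
                    open_run = 0
            if open_run:  # trailing opening at end of stream
                code.append("-" if open_run >= DASH_FRAMES else ".")
            return "".join(code)

        # Simulated trace: one short opening, then one long opening -> ".-" ("A")
        frames = [2.0] * 5 + [20.0] * 6 + [2.0] * 5 + [20.0] * 20 + [2.0] * 5
        print(lip_distances_to_morse(frames))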

  14. 75 FR 43365 - Revision of Delegations of Authority

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-07-23

    ... Judges (``OALJ''). Pursuant to this new reporting structure, the Secretary has delegated to the ASA...: Feasibility studies; economic analyses; systems design; acquisition of equipment, software, services, and..., development, acquisition, and use of equipment and systems for voice, data, and communications, excluding the...

  15. Google Glass-Directed Monitoring and Control of Microfluidic Biosensors and Actuators

    PubMed Central

    Zhang, Yu Shrike; Busignani, Fabio; Ribas, João; Aleman, Julio; Rodrigues, Talles Nascimento; Shaegh, Seyed Ali Mousavi; Massa, Solange; Rossi, Camilla Baj; Taurino, Irene; Shin, Su-Ryon; Calzone, Giovanni; Amaratunga, Givan Mark; Chambers, Douglas Leon; Jabari, Saman; Niu, Yuxi; Manoharan, Vijayan; Dokmeci, Mehmet Remzi; Carrara, Sandro; Demarchi, Danilo; Khademhosseini, Ali

    2016-01-01

    Google Glass is a recently designed wearable device capable of displaying information in a smartphone-like hands-free format by wireless communication. The Glass also provides convenient control over remote devices, primarily enabled by voice recognition commands. These unique features of the Google Glass make it useful for medical and biomedical applications where hands-free experiences are strongly preferred. Here, we report, for the first time, an integral set of hardware, firmware, software, and Glassware that enabled wireless transmission of sensor data onto the Google Glass for on-demand data visualization and real-time analysis. Additionally, the platform allowed the user to control outputs entered through the Glass, therefore achieving bi-directional Glass-device interfacing. Using this versatile platform, we demonstrated its capability in monitoring physical and physiological parameters such as temperature, pH, and morphology of liver- and heart-on-chips. Furthermore, we showed the capability to remotely introduce pharmaceutical compounds into a microfluidic human primary liver bioreactor at desired time points while monitoring their effects through the Glass. We believe that such an innovative platform, along with its concept, has established a foundation for wearable monitoring and control technology for a wide variety of applications in biomedicine. PMID:26928456
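
    A minimal sketch of the server side of such a sensor pathway, under stated assumptions: the bioreactor host exposes its latest readings as JSON over HTTP, and a wearable client polls the endpoint for display. Flask, the endpoint name, and the placeholder readings are all illustrative; the paper's actual stack is custom firmware plus Glassware.

        import random
        import time

        from flask import Flask, jsonify

        app = Flask(__name__)

        @app.route("/sensors")
        def sensors():
            # Placeholder values standing in for real temperature/pH probes.
            return jsonify({
                "timestamp": time.time(),
                "temperature_c": 37.0 + random.uniform(-0.2, 0.2),
                "ph": 7.4 + random.uniform(-0.05, 0.05),
            })

        if __name__ == "__main__":
            app.run(host="0.0.0.0", port=5000)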

  16. Google Glass-Directed Monitoring and Control of Microfluidic Biosensors and Actuators

    NASA Astrophysics Data System (ADS)

    Zhang, Yu Shrike; Busignani, Fabio; Ribas, João; Aleman, Julio; Rodrigues, Talles Nascimento; Shaegh, Seyed Ali Mousavi; Massa, Solange; Rossi, Camilla Baj; Taurino, Irene; Shin, Su-Ryon; Calzone, Giovanni; Amaratunga, Givan Mark; Chambers, Douglas Leon; Jabari, Saman; Niu, Yuxi; Manoharan, Vijayan; Dokmeci, Mehmet Remzi; Carrara, Sandro; Demarchi, Danilo; Khademhosseini, Ali

    2016-03-01

    Google Glass is a recently designed wearable device capable of displaying information in a smartphone-like hands-free format by wireless communication. The Glass also provides convenient control over remote devices, primarily enabled by voice recognition commands. These unique features of the Google Glass make it useful for medical and biomedical applications where hands-free experiences are strongly preferred. Here, we report, for the first time, an integral set of hardware, firmware, software, and Glassware that enabled wireless transmission of sensor data onto the Google Glass for on-demand data visualization and real-time analysis. Additionally, the platform allowed the user to control outputs entered through the Glass, therefore achieving bi-directional Glass-device interfacing. Using this versatile platform, we demonstrated its capability in monitoring physical and physiological parameters such as temperature, pH, and morphology of liver- and heart-on-chips. Furthermore, we showed the capability to remotely introduce pharmaceutical compounds into a microfluidic human primary liver bioreactor at desired time points while monitoring their effects through the Glass. We believe that such an innovative platform, along with its concept, has established a foundation for wearable monitoring and control technology for a wide variety of applications in biomedicine.

  17. Google Glass-Directed Monitoring and Control of Microfluidic Biosensors and Actuators.

    PubMed

    Zhang, Yu Shrike; Busignani, Fabio; Ribas, João; Aleman, Julio; Rodrigues, Talles Nascimento; Shaegh, Seyed Ali Mousavi; Massa, Solange; Baj Rossi, Camilla; Taurino, Irene; Shin, Su-Ryon; Calzone, Giovanni; Amaratunga, Givan Mark; Chambers, Douglas Leon; Jabari, Saman; Niu, Yuxi; Manoharan, Vijayan; Dokmeci, Mehmet Remzi; Carrara, Sandro; Demarchi, Danilo; Khademhosseini, Ali

    2016-03-01

    Google Glass is a recently designed wearable device capable of displaying information in a smartphone-like hands-free format by wireless communication. The Glass also provides convenient control over remote devices, primarily enabled by voice recognition commands. These unique features of the Google Glass make it useful for medical and biomedical applications where hands-free experiences are strongly preferred. Here, we report, for the first time, an integral set of hardware, firmware, software, and Glassware that enabled wireless transmission of sensor data onto the Google Glass for on-demand data visualization and real-time analysis. Additionally, the platform allowed the user to control outputs entered through the Glass, therefore achieving bi-directional Glass-device interfacing. Using this versatile platform, we demonstrated its capability in monitoring physical and physiological parameters such as temperature, pH, and morphology of liver- and heart-on-chips. Furthermore, we showed the capability to remotely introduce pharmaceutical compounds into a microfluidic human primary liver bioreactor at desired time points while monitoring their effects through the Glass. We believe that such an innovative platform, along with its concept, has established a foundation for wearable monitoring and control technology for a wide variety of applications in biomedicine.

  18. Deaf-And-Mute Sign Language Generation System

    NASA Astrophysics Data System (ADS)

    Kawai, Hideo; Tamura, Shinichi

    1984-08-01

    We have developed a system that can recognize speech and generate the corresponding animation-like sign language sequence. The system is implemented on a popular personal computer with three video RAMs and a voice recognition board that can recognize only the registered voice of a specific speaker. Presently, forty sign language patterns and fifty finger spellings are stored on two floppy disks. Each sign pattern is composed of one to four sub-patterns: if a pattern is composed of one sub-pattern, it is displayed as a still pattern; otherwise, it is displayed as a motion pattern. This system will help communication between deaf-and-mute persons and hearing persons. To achieve high display speed, almost all programs are written in machine language.

  19. Effect of adenoid hypertrophy on the voice and laryngeal mucosa in children.

    PubMed

    Gomaa, Mohammed A; Mohammed, Haitham M; Abdalla, Adel A; Nasr, Dalia M

    2013-12-01

    The adenoids, or pharyngeal tonsils, are lymphatic tissue localized in the mucous layer of the roof and posterior wall of the nasopharynx. Dysphonia is defined as a perceptually audible change in a patient's habitual voice, as judged by the patient or by his or her listeners. The diagnosis of dysphonia relies on clinical judgment based on phoniatric symptoms, auditory perceptual assessment of voice (APA), and full laryngeal examination. Our study was conducted to evaluate the effect of adenoid hypertrophy on voice and laryngeal mucosa. The study sample comprised sixty children: forty with adenoid hypertrophy (patient group) and twenty healthy children (control group). The patient group comprised 17 boys (42.5%) and 23 girls (57.5%), while the control group consisted of 8 boys (40%) and 12 girls (60%). All patients and controls were subjected to history taking, clinical examination, lateral soft tissue X-ray of the nasopharynx, APA based on the modified GRBAS scale, and full laryngeal examination. The data were collected and analyzed statistically using SPSS software. Our results showed a significant association between adenoid hypertrophy and degree of dysphonia, leaky voice, pitch of voice, and laryngeal lesions. Adenoid hypertrophy was not associated with loudness of voice or voice character (irregular, breathy, and strained). Laryngeal lesions were detected in thirteen children from the patient group (32.5%): nodules (n = 6), thickening (n = 5), and congestion (n = 2), while only one child out of the 20 children in the control group had congestion (5.0%). Our results show the importance of voice assessment and laryngeal examination in patients with adenoid hypertrophy; treatment of the minimal mucosal lesions that result from adenoid hypertrophy should also be taken into consideration. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  20. Intimate Geographies: Reclaiming Citizenship and Community in "The Autobiography of Delfina Cuero" and Bonita Nunez's "Diaries"

    ERIC Educational Resources Information Center

    Fitzgerald, Stephanie

    2006-01-01

    American Indian women's autobiographies recount a specific type of life experience that has often been overlooked, one that is equally important in understanding the genre and in developing ways of reading these texts that balance the recovery and recognition of the Native voice and agency contained within them with the processes of creation and the…

  1. Advanced Productivity Analysis Methods for Air Traffic Control Operations

    DTIC Science & Technology

    1976-12-01

    Excerpt (fragments from the report's table of contents and text): Routine Work; Surveillance Work; Conflict Processing Work. Conflict processing (crossing and overtake conflicts) includes potential-conflict recognition, assessment, and resolution decision making and A/N voice communications. The report also discusses enabling decision makers to utilize quantitative and dynamic analysis as a tool for decision-making, and types of simulation models.

  2. The Complexity of Literacy in Kenya: Narrative Analysis of Maasai Women's Experiences

    ERIC Educational Resources Information Center

    Taeko, Takayanagi

    2014-01-01

    This paper aims to challenge limited notions of literacy and argues for the recognition of Maasai women's self-determined learning in order to bring about human development in Kenya. It also seeks to construct a complex picture of literacy, drawing on postcolonial feminist theory as a framework to ensure that the woman's voice is heard. Through…

  3. Enhancement of temporal periodicity cues in cochlear implants: Effects on prosodic perception and vowel identification

    NASA Astrophysics Data System (ADS)

    Green, Tim; Faulkner, Andrew; Rosen, Stuart; Macherey, Olivier

    2005-07-01

    Standard continuous interleaved sampling processing, and a modified processing strategy designed to enhance temporal cues to voice pitch, were compared on tests of intonation perception and vowel perception, both in implant users and in acoustic simulations. In standard processing, 400 Hz low-pass envelopes modulated either pulse trains (implant users) or noise carriers (simulations). In the modified strategy, slow-rate envelope modulations, which convey dynamic spectral variation crucial for speech understanding, were extracted by low-pass filtering (32 Hz). In addition, during voiced speech, higher-rate temporal modulation in each channel was provided by 100% amplitude modulation with a sawtooth-like waveform whose periodicity followed the fundamental frequency (F0) of the input. Channel levels were determined by the product of the lower- and higher-rate modulation components. Both in acoustic simulations and in implant users, the ability to use intonation information to identify sentences as question or statement was significantly better with modified processing. However, while there was no difference in vowel recognition in the acoustic simulation, implant users performed worse with modified processing both in vowel recognition and in formant frequency discrimination. It appears that, while enhancing pitch perception, modified processing harmed the transmission of spectral information.
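
    A numerical sketch of the modified channel processing described above, under stated assumptions about sample rate and F0: a channel envelope is low-pass filtered at 32 Hz, a 0-to-1 sawtooth at the voice fundamental frequency provides the higher-rate modulation, and the channel level is the product of the two components.

        import numpy as np
        from scipy.signal import butter, filtfilt, sawtooth

        fs = 16000          # sample rate in Hz (assumed)
        f0 = 120.0          # voice fundamental frequency in Hz (assumed)
        t = np.arange(fs) / fs

        envelope = np.abs(np.random.randn(fs))  # stand-in for one channel envelope

        # Slow-rate component: 32 Hz low-pass conveys dynamic spectral variation.
        b, a = butter(4, 32 / (fs / 2))
        slow_env = filtfilt(b, a, envelope)

        # Higher-rate component: 100% amplitude modulation at F0 during voicing.
        f0_mod = 0.5 * (1 + sawtooth(2 * np.pi * f0 * t))  # sawtooth scaled to 0..1

        channel_level = slow_env * f0_mod  # product of the two components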

  4. On combining multi-normalization and ancillary measures for the optimal score level fusion of fingerprint and voice biometrics

    NASA Astrophysics Data System (ADS)

    Mohammed Anzar, Sharafudeen Thaha; Sathidevi, Puthumangalathu Savithri

    2014-12-01

    In this paper, we have considered the utility of multi-normalization and ancillary measures for the optimal score-level fusion of fingerprint and voice biometrics. An efficient matching score preprocessing technique based on multi-normalization is employed to improve the performance of the multimodal system under various noise conditions. Ancillary measures derived from the feature space and the score space are used in addition to the matching score vectors to weight the modalities according to their relative degradation. Reliability (dispersion) and separability (inter-/intra-class distance and d-prime statistics) measures under various noise conditions are estimated from the individual modalities during the training/validation stage. The 'best' integration weights are then computed by algebraically combining these measures using the weighted sum rule. The computed integration weights are then optimized against recognition accuracy using techniques such as grid search, genetic algorithms, and particle swarm optimization. The experimental results show that the proposed biometric solution leads to considerable improvement in recognition performance even under low signal-to-noise ratio (SNR) conditions and reduces the false acceptance rate (FAR) and false rejection rate (FRR), making the system useful for security as well as forensic applications.
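
    A minimal sketch of weighted-sum score fusion with a grid search over the integration weight, under simplifying assumptions: min-max normalized scores, one weight w for the fingerprint modality (1 - w for voice), synthetic scores, and raw accuracy as the optimization target. The paper additionally incorporates reliability and separability measures and evaluates genetic-algorithm and particle-swarm optimizers.

        import numpy as np

        def minmax(s):
            return (s - s.min()) / (s.max() - s.min())

        rng = np.random.default_rng(0)
        labels = rng.integers(0, 2, 200)  # 1 = genuine, 0 = impostor (synthetic)
        finger = minmax(labels * 0.6 + rng.normal(0, 0.3, 200))
        voice = minmax(labels * 0.4 + rng.normal(0, 0.4, 200))

        best_w, best_acc = 0.0, 0.0
        for w in np.linspace(0, 1, 101):  # grid search over the fusion weight
            fused = w * finger + (1 - w) * voice
            acc = ((fused > 0.5).astype(int) == labels).mean()
            if acc > best_acc:
                best_w, best_acc = w, acc

        print(f"best weight {best_w:.2f}, accuracy {best_acc:.2%}")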

  5. Major depression is associated with impaired processing of emotion in music as well as in facial and vocal stimuli.

    PubMed

    Naranjo, C; Kornreich, C; Campanella, S; Noël, X; Vandriette, Y; Gillain, B; de Longueville, X; Delatte, B; Verbanck, P; Constant, E

    2011-02-01

    The processing of emotional stimuli is thought to be negatively biased in major depression. This study investigates this issue using musical, vocal, and facial affective stimuli. 23 depressed in-patients and 23 matched healthy controls were recruited. Affective information processing was assessed through musical, vocal, and facial emotion recognition tasks. Depression, anxiety level, and attention capacity were controlled. The depressed participants identified emotions less accurately than the control group in all three kinds of emotion-recognition task. The depressed group also gave higher intensity ratings than the controls when scoring negative emotions, and they were more likely to attribute negative emotions to neutral voices and faces. Our in-patient group might differ from the more general population of depressed adults: they were all taking anti-depressant medication, which may have influenced their emotional information processing. Major depression is associated with a general negative bias in the processing of emotional stimuli. The emotional processing impairment in depression is not confined to interpersonal stimuli (faces and voices); it also extends to the accurate perception of emotion in music. © 2010 Elsevier B.V. All rights reserved.

  6. Flexible and wearable electronic silk fabrics for human physiological monitoring

    NASA Astrophysics Data System (ADS)

    Mao, Cuiping; Zhang, Huihui; Lu, Zhisong

    2017-09-01

    The development of textile-based devices for human physiological monitoring has attracted tremendous interest in recent years. However, flexible physiological sensing elements based on silk fabrics have not been realized. In this paper, ZnO nanorod arrays are grown in situ on reduced graphene oxide-coated silk fabrics via a facile electro-deposition method for the fabrication of silk-fabric-based mechanical sensing devices. The data show that well-aligned ZnO nanorods with hexagonal wurtzite crystalline structures are synthesized on the conductive silk fabric surface. After magnetron sputtering of gold electrodes, silk-fabric-based devices are produced and applied to detect periodic bending and twisting. Based on the electric signals, the deformation and release processes can be easily differentiated. Human arterial pulse and respiration can also be monitored in real time to calculate the pulse rate and respiration frequency, respectively. Throat vibrations during coughing and singing are detected to demonstrate the voice recognition capability. This work may not only help develop silk-fabric-based mechanical sensing elements for potential applications in clinical diagnosis, daily healthcare monitoring, and voice recognition, but also provide a versatile method for fabricating textile-based flexible electronic devices.

  7. Florida manatee avoidance technology: A pilot program by the Florida Fish and Wildlife Conservation Commission

    NASA Astrophysics Data System (ADS)

    Frisch, Katherine; Haubold, Elsa

    2003-10-01

    Since 1976, approximately 25% of the annual Florida manatee (Trichechus manatus latirostris) mortality has been attributed to collisions with watercraft. In 2001, the Florida Legislature appropriated $200,000 in funds for research projects using technological solutions to directly address the problem of collisions between manatees and watercraft. The Florida Fish & Wildlife Conservation Commission initially funded seven projects for the first two fiscal years. The selected proposals were designed to explore technology that had not previously been applied to the manatee/boat collision problem and included many acoustic concepts related to voice recognition, sonar, and an alerting device to be put on boats to warn manatees. The most promising results to date are from projects employing voice-recognition techniques to identify manatee vocalizations and warn boaters of the manatees' presence. Sonar technology, much like that used in fish finders, is promising but has met with regulatory problems regarding permitting and remains to be tested, as does the manatee-alerting device. The state of Florida found the results of the initial years of funding compelling and plans to fund further manatee avoidance technology research in a continued effort to mitigate the problem of manatee/boat collisions.

  8. Design and performance of a large vocabulary discrete word recognition system. Volume 1: Technical report. [real time computer technique for voice data processing

    NASA Technical Reports Server (NTRS)

    1973-01-01

    The development, construction, and test of a 100-word vocabulary near-real-time word recognition system are reported. Features include replacement of any one or all 100 words in the vocabulary; rapid learning of a new speaker; storage and retrieval of training sets; verbal or manual single-word deletion; continuous adaptation with verbal or manual error correction; on-line verification of vocabulary as spoken; system modes selectable via the verification display keyboard; reporting of the relationship of a classified word to neighboring words; and a versatile input/output interface to accommodate a variety of applications.

  9. Designing of Intelligent Multilingual Patient Reported Outcome System (IMPROS)

    PubMed Central

    Pourasghar, Faramarz; Partovi, Yeganeh

    2015-01-01

    Background: In self-reported outcome procedures, patients themselves record disease symptoms outside medical centers and then report them to medical staff at specific intervals. One self-reporting method is the application of interactive voice response (IVR), in which pre-designed questions in the form of voice tracks are played and the caller responds to the questions by pressing the phone's keypad buttons. Aim: The present research explains the main framework of the design of such a system based on IVR technology, designed and administered for the first time in Iran. Methods: The interactive voice response system was composed of two main parts: hardware and software. The hardware section includes one or several digital phone lines, a modem card with voice playback capability, and a PC. The IVR software, on the other hand, acts as an intelligent control center, records call information, and controls incoming data. Results: The main features of the system are its ability to run on common PCs with simple and cheap modems, its high speed in collecting responses, and its suitability for low-literacy patients. The system is applicable for monitoring chronic diseases, cancer, and psychological diseases, and can be suitable for the care of elders and children who require long-term care. Other features include a user-friendly interface, a decrease in the direct and indirect costs of disease treatment, and a high level of security for access to patients' profiles. Conclusions: The intelligent multilingual patient-reported outcome system (IMPROS) gives patients the opportunity to participate more actively during treatment and improves the mutual interaction between patient and medical staff. Moreover, it increases the quality of medical services, in addition to empowering patients and their caregivers. PMID:26635441

  10. Research on realization scheme of interactive voice response (IVR) system

    NASA Astrophysics Data System (ADS)

    Jin, Xin; Zhu, Guangxi

    2003-12-01

    In this paper, a novel interactive voice response (IVR) system is proposed that differs markedly from traditional designs. Using software operation and network control, the IVR system depends only on software in the server where the system resides and on the hardware of network terminals on the user side, such as a gateway (GW), personal gateway (PG), or PC. The system transmits audio using the Real-time Transport Protocol (RTP) over the Internet to the network terminals and controls call flow using a finite state machine (FSM) driven by H.245 messages sent from the user side and by system control factors. Compared with other existing schemes, this IVR system offers several advantages: it greatly reduces system cost, fully utilizes existing network resources, and enhances flexibility. The system can be deployed on any service server anywhere on the Internet and is even suitable for wireless applications based on packet-switched communication. The IVR system has been implemented and has passed system testing.
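
    A minimal sketch of a message-driven finite state machine of the kind at the core of such an IVR system. The states, events, and menu are illustrative assumptions; the real system is driven by H.245 messages and plays audio over RTP.

        TRANSITIONS = {
            ("idle", "call_setup"): "playing_menu",
            ("playing_menu", "digit_1"): "playing_balance",
            ("playing_menu", "digit_2"): "playing_hours",
            ("playing_balance", "playback_done"): "playing_menu",
            ("playing_hours", "playback_done"): "playing_menu",
            ("playing_menu", "hangup"): "idle",
        }

        def run_ivr(events):
            state = "idle"
            for event in events:
                # Unknown (state, event) pairs leave the state unchanged.
                state = TRANSITIONS.get((state, event), state)
                print(f"{event:14s} -> {state}")
            return state

        run_ivr(["call_setup", "digit_1", "playback_done", "hangup"])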

  11. Improving the Capture and Re-Use of Data with Wearable Computers

    NASA Technical Reports Server (NTRS)

    Pfarr, Barbara; Fating, Curtis C.; Green, Daniel; Powers, Edward I. (Technical Monitor)

    2001-01-01

    At the Goddard Space Flight Center, members of the Real-Time Software Engineering Branch are developing a wearable, wireless, voice-activated computer for use in a wide range of crosscutting space applications that would benefit from instant Internet, network, and computer access with complete mobility and hands-free operations. These capabilities apply across many fields and disciplines, including spacecraft fabrication, integration and testing (including environmental testing), and astronaut on-orbit control and monitoring of experiments with ground-based experimenters. To satisfy the needs of NASA customers, this wearable computer needs to be connected to a wireless network, to transmit and receive real-time video over the network, and to receive updated documents via the Internet or NASA servers. The voice-activated computer, with a unique vocabulary, will allow users to access documentation hands-free and interact in real time with remote users. We will discuss wearable computer development, hardware and software issues, wireless network limitations, video/audio solutions, and difficulties in language development.

  12. Software for roof defects recognition on aerial photographs

    NASA Astrophysics Data System (ADS)

    Yudin, D.; Naumov, A.; Dolzhenko, A.; Patrakova, E.

    2018-05-01

    The article presents information on software for recognizing roof defects in aerial photographs taken by air drones. An aerial image segmentation mechanism is described. It detects roof defects: unsmooth areas that cause water stagnation after rain. It is shown that an HSV-transformation approach allows quick detection of stagnation areas, together with their size and perimeter, but is sensitive to shadows and to changes of roofing type. A deep fully convolutional network (FCN) software solution eliminates this drawback. The test data set consists of roofing photos with defects and corresponding binary masks. The FCN approach gave acceptable image segmentation results in terms of the average Dice coefficient. This software can be used to automate the inspection of roof conditions in the production sector and in housing and utilities infrastructure.
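
    A minimal sketch of the HSV-based stagnation detection described above, using OpenCV. The HSV bounds are illustrative assumptions; the article tuned its transformation for its own roof imagery and notes the sensitivity to shadows.

        import cv2
        import numpy as np

        img = cv2.imread("roof.jpg")  # hypothetical aerial photo
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

        lower = np.array([90, 40, 40])    # assumed bounds for dark, bluish water
        upper = np.array([130, 255, 200])
        mask = cv2.inRange(hsv, lower, upper)

        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            area, perimeter = cv2.contourArea(c), cv2.arcLength(c, True)
            if area > 100:  # ignore small specks
                print(f"stagnation: area {area:.0f} px^2, perimeter {perimeter:.0f} px")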

  13. Voice Signals Produced With Jitter Through a Stochastic One-mass Mechanical Model.

    PubMed

    Cataldo, Edson; Soize, Christian

    2017-01-01

    The quasiperiodic oscillation of the vocal folds causes perturbations in the length of the glottal cycles, which are known as jitter. Observation of the glottal cycle variations suggests that jitter is a random phenomenon described by random deviations of the glottal cycle lengths from a corresponding mean value; in general, its values are expressed as a percentage of the duration of the glottal pulse. The objective of this paper is the construction of a stochastic model of jitter using a one-mass mechanical model of the vocal folds that assumes complete right-left symmetry of the vocal folds and considers motion only in the horizontal direction. Jitter has attracted research interest owing to important applications such as the identification of pathological voices (vocal fold nodules, vocal fold paralysis, or even vocal aging, among others); large jitter variations can indicate a pathological characteristic of the voice. The corresponding stiffness of each vocal fold is considered as a stochastic process, and its modeling is proposed. The probability density functions of the fundamental frequency of the voice signals produced are constructed and compared for different levels of jitter, and samples of synthesized voices are obtained for these cases. It is shown that jitter can be reproduced with the proposed model. The Praat software was also used to verify the jitter measures in the synthesized voice signals. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
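
    A worked sketch of the jitter definition used above, under stated assumptions: glottal periods fluctuate randomly around a mean value, and local jitter is the mean absolute difference between consecutive periods expressed as a percentage of the mean period.

        import numpy as np

        rng = np.random.default_rng(1)
        T0 = 1 / 120.0                                 # mean glottal period (120 Hz)
        periods = T0 * (1 + rng.normal(0, 0.01, 200))  # ~1% random perturbation

        # Local jitter: mean |T[i+1] - T[i]| over the mean period, in percent.
        jitter_pct = np.mean(np.abs(np.diff(periods))) / np.mean(periods) * 100
        print(f"jitter = {jitter_pct:.2f}%")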

  14. Behavioral biometrics for verification and recognition of malicious software agents

    NASA Astrophysics Data System (ADS)

    Yampolskiy, Roman V.; Govindaraju, Venu

    2008-04-01

    Homeland security requires technologies capable of positive and reliable identification of humans for law enforcement, government, and commercial applications. As artificially intelligent agents improve in their abilities and become a part of our everyday life, the possibility of using such programs to undermine homeland security increases. Virtual assistants, shopping bots, and game-playing programs are used daily by millions of people. We propose applying the statistical behavior modeling techniques we developed for recognition of humans to the identification and verification of intelligent and potentially malicious software agents. Our experimental results demonstrate the feasibility of such methods for artificial agent verification and even for recognition purposes.

  15. Foundations for a syntactic pattern recognition system for genomic DNA sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Searles, D.B.

    1993-03-01

    The goal of the proposed work is the creation of a software system that will perform sophisticated pattern recognition and related functions at a level of abstraction and with expressive power beyond current general-purpose pattern-matching systems for biological sequences, with a more uniform language, environment, and graphical user interface, and with greater flexibility, extensibility, embeddability, and ability to incorporate other algorithms than current special-purpose analytic software.

  16. Acoustical analysis of the underlying voice differences between two groups of professional singers: opera and country and western.

    PubMed

    Burns, P

    1986-05-01

    An acoustical analysis of the speaking and singing voices of two types of professional singers was conducted. The vowels /i/, /a/, and /o/ were spoken and sung ten times each by seven opera and seven country and western singers. Vowel spectra were derived by computer software techniques allowing quantitative assessment of formant structure (F1-F4), relative amplitude of resonance peaks (F1-F4), fundamental frequency, and harmonic high frequency energy. Formant analysis was the most effective parameter differentiating the two groups. Only opera singers lowered their fourth formant creating a wide-band resonance area (approximately 2,800 Hz) corresponding to the well-known "singing formant." Country and western singers revealed similar resonatory voice characteristics for both spoken and sung output. These results implicate faulty vocal technique in country and western singers as a contributory reason for vocal abuse/fatigue.

  17. Low-Budget, Cost-Effective OCR: Optical Character Recognition for MS-DOS Micros.

    ERIC Educational Resources Information Center

    Perez, Ernest

    1990-01-01

    Discusses optical character recognition (OCR) for use with MS-DOS microcomputers. Cost effectiveness is considered, three types of software approaches to character recognition are explained, hardware and operation requirements are described, possible library applications are discussed, future OCR developments are suggested, and a list of OCR…

  18. Emotional Cues during Simultaneous Face and Voice Processing: Electrophysiological Insights

    PubMed Central

    Liu, Taosheng; Pinheiro, Ana; Zhao, Zhongxin; Nestor, Paul G.; McCarley, Robert W.; Niznikiewicz, Margaret A.

    2012-01-01

    Both facial expression and tone of voice represent key signals of emotional communication, but their brain processing correlates remain unclear. Accordingly, we constructed a novel implicit emotion recognition task consisting of simultaneously presented human faces and voices with neutral, happy, and angry valence, within the context of a monkey face and voice recognition task. To investigate the temporal unfolding of the processing of affective information from human face-voice pairings, we recorded event-related potentials (ERPs) to these audiovisual test stimuli in 18 normal healthy subjects; N100, P200, N250, and P300 components were observed at electrodes in the frontal-central region, while P100, N170, and P270 were observed at electrodes in the parietal-occipital region. Results indicated a significant audiovisual stimulus effect on the amplitudes and latencies of components in the frontal-central region (P200, P300, and N250) but not the parietal-occipital region (P100, N170, and P270). Specifically, P200 and P300 amplitudes were more positive for emotional relative to neutral audiovisual stimuli, irrespective of valence, whereas N250 amplitude was more negative for neutral relative to emotional stimuli. No differentiation was observed between angry and happy conditions. The results suggest that the general effect of emotion on audiovisual processing can emerge as early as 200 msec (P200 peak latency) post stimulus onset, in spite of implicit affective processing task demands, and that this effect is mainly distributed in the frontal-central region. PMID:22383987

  19. Can blind persons accurately assess body size from the voice?

    PubMed

    Pisanski, Katarzyna; Oleszkiewicz, Anna; Sorokowska, Agnieszka

    2016-04-01

    Vocal tract resonances provide reliable information about a speaker's body size that human listeners use for biosocial judgements as well as speech recognition. Although humans can accurately assess men's relative body size from the voice alone, how this ability is acquired remains unknown. In this study, we test the prediction that accurate voice-based size estimation is possible without prior audiovisual experience linking low frequencies to large bodies. Ninety-one healthy congenitally or early blind, late blind and sighted adults (aged 20-65) participated in the study. On the basis of vowel sounds alone, participants assessed the relative body sizes of male pairs of varying heights. Accuracy of voice-based body size assessments significantly exceeded chance and did not differ among participants who were sighted, or congenitally blind or who had lost their sight later in life. Accuracy increased significantly with relative differences in physical height between men, suggesting that both blind and sighted participants used reliable vocal cues to size (i.e. vocal tract resonances). Our findings demonstrate that prior visual experience is not necessary for accurate body size estimation. This capacity, integral to both nonverbal communication and speech perception, may be present at birth or may generalize from broader cross-modal correspondences. © 2016 The Author(s).

  20. Can blind persons accurately assess body size from the voice?

    PubMed Central

    Oleszkiewicz, Anna; Sorokowska, Agnieszka

    2016-01-01

    Vocal tract resonances provide reliable information about a speaker's body size that human listeners use for biosocial judgements as well as speech recognition. Although humans can accurately assess men's relative body size from the voice alone, how this ability is acquired remains unknown. In this study, we test the prediction that accurate voice-based size estimation is possible without prior audiovisual experience linking low frequencies to large bodies. Ninety-one healthy congenitally or early blind, late blind and sighted adults (aged 20–65) participated in the study. On the basis of vowel sounds alone, participants assessed the relative body sizes of male pairs of varying heights. Accuracy of voice-based body size assessments significantly exceeded chance and did not differ among participants who were sighted, or congenitally blind or who had lost their sight later in life. Accuracy increased significantly with relative differences in physical height between men, suggesting that both blind and sighted participants used reliable vocal cues to size (i.e. vocal tract resonances). Our findings demonstrate that prior visual experience is not necessary for accurate body size estimation. This capacity, integral to both nonverbal communication and speech perception, may be present at birth or may generalize from broader cross-modal correspondences. PMID:27095264

  1. Software for Partly Automated Recognition of Targets

    NASA Technical Reports Server (NTRS)

    Opitz, David; Blundell, Stuart; Bain, William; Morris, Matthew; Carlson, Ian; Mangrich, Mark; Selinsky, T.

    2002-01-01

    The Feature Analyst is a computer program for assisted (partially automated) recognition of targets in images. This program was developed to accelerate the processing of high-resolution satellite image data for incorporation into geographic information systems (GIS). The program provides an advanced user interface that embeds proprietary machine-learning algorithms in commercial image-processing and GIS software. A human analyst provides samples of target features from multiple sets of data, then the software develops a data-fusion model that automatically extracts the remaining features from selected sets of data. The program thus leverages the natural ability of humans to recognize objects in complex scenes, without requiring the user to encode the human visual recognition process in lengthy software. Two major subprograms are the reactive agent and the thinking agent. The reactive agent strives to quickly learn the user's tendencies while the user is selecting targets and to increase the user's productivity by immediately suggesting the next set of pixels that the user may wish to select. The thinking agent utilizes all available resources, taking as much time as needed, to produce the most accurate autonomous feature-extraction model possible.

  2. You can't touch this: touch-free navigation through radiological images.

    PubMed

    Ebert, Lars C; Hatch, Gary; Ampanozi, Garyfalia; Thali, Michael J; Ross, Steffen

    2012-09-01

    Keyboards, mice, and touch screens are a potential source of infection or contamination in operating rooms, intensive care units, and autopsy suites. The authors present a low-cost prototype of a system that allows touch-free control of a medical image viewer. This touch-free navigation system consists of a computer system (iMac, OS X 10.6, Apple, USA) with a medical image viewer (OsiriX, OsiriX Foundation, Switzerland) and a depth camera (Kinect, Microsoft, USA). They implemented software that translates the data delivered by the camera, together with voice recognition software, into keyboard and mouse commands, which are then passed to OsiriX. In this feasibility study, the authors introduced 10 medical professionals to the system and asked them to re-create 12 images from a CT data set. They evaluated response times and usability of the system compared with standard mouse/keyboard control. Users felt comfortable with the system after approximately 10 minutes. Response time was 120 ms. Users required 1.4 times more time to re-create an image with gesture control. Users with OsiriX experience were significantly faster using the mouse/keyboard and faster than users without prior experience. They rated the system 3.4 out of 5 for ease of use in comparison to the mouse/keyboard. The touch-free, gesture-controlled system performs favorably and removes a potential vector for infection, protecting both patients and staff. Because the camera can be quickly and easily integrated into existing systems, requires no calibration, and is low cost, the barriers to using this technology are low.
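
    A minimal sketch of the translation layer described above: recognized gesture (or voice) events are mapped to keystrokes for the image viewer. The event names, key bindings, and use of pyautogui are illustrative assumptions standing in for the authors' custom event injection into OsiriX.

        import pyautogui

        GESTURE_TO_KEY = {
            "swipe_left": "left",    # previous image (binding assumed)
            "swipe_right": "right",  # next image (binding assumed)
            "push_forward": "+",     # zoom in (binding assumed)
            "pull_back": "-",        # zoom out (binding assumed)
        }

        def handle_event(event: str) -> None:
            key = GESTURE_TO_KEY.get(event)
            if key is not None:
                pyautogui.press(key)  # inject the keystroke into the viewer

        for e in ["swipe_right", "push_forward", "swipe_left"]:
            handle_event(e)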

  3. The software peculiarities of pattern recognition in track detectors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Starkov, N.

    Different kinds of nuclear track recognition algorithms are presented. Several complex examples of their use in physics experiments are considered, and some processing methods for complicated images are described.

  4. Voice/Natural Language Interfacing for Robotic Control.

    DTIC Science & Technology

    1987-11-01

    Excerpt (fragments): ...until major computing power can be profitably allocated to the speech recognition process, off-the-shelf units will never have sufficient intelligence to... coordinate transformation for a location, and opening or closing the gripper's toggles. External to world operations, each joint may be rotated...

  5. The U.S.-China E-Language Project: A Study of a Gaming Approach to English Language Learning for Middle School Students

    ERIC Educational Resources Information Center

    Green, Patricia J.; Sha, Mandy; Liu, Lu

    2011-01-01

    In 2001, the U.S. Department of Education and the Ministry of Education in China entered into a bilateral partnership to develop a technology-driven approach to foreign language learning that integrated gaming, immersion, voice recognition, problem-based learning tasks, and other features that made it a significant research and development pilot…

  6. Detection of possible restriction sites for type II restriction enzymes in DNA sequences.

    PubMed

    Gagniuc, P; Cimponeriu, D; Ionescu-Tîrgovişte, C; Mihai, Andrada; Stavarachi, Monica; Mihai, T; Gavrilă, L

    2011-01-01

    In order to make a step forward in the knowledge of the mechanisms operating in complex polygenic disorders such as diabetes and obesity, this paper proposes a new algorithm (PRSD, possible restriction site detection) and its implementation in the Applied Genetics software. This software can be used for in silico detection of potential (hidden) recognition sites for endonucleases and for the identification of nucleotide repeats. The recognition sites for endonucleases may result from hidden sequences through deletion or insertion of a specific number of nucleotides. Tests were conducted on DNA sequences downloaded from NCBI servers using the specific recognition sites of common type II restriction enzymes included in the software database (n = 126). Each possible recognition site indicated by the PRSD algorithm implemented in Applied Genetics was checked and confirmed with the NEBcutter V2.0 and Webcutter 2.0 software. In the sequence NG_008724.1 (which comprises 63632 nucleotides) we found a high number of potential restriction sites for EcoRI that may be produced by deletion (n = 43 sites) or insertion (n = 591 sites) of one nucleotide. The second module of Applied Genetics has been designed to find simple repeat sizes, with potential value for understanding the role of SNPs (single nucleotide polymorphisms) in the pathogenesis of complex metabolic disorders. We tested for the presence of simple repetitive sequences in five DNA sequences. The software indicated the exact position of each repeat detected in the tested sequences. Future development of Applied Genetics can provide an alternative to powerful tools used to search for restriction sites or repetitive sequences, or improve genotyping methods.
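
    A minimal sketch of the PRSD idea for the deletion case: report each position where removing a single nucleotide creates an EcoRI site (GAATTC) that the unmodified sequence lacks. The function and example sequence are illustrative; the published software covers 126 type II enzymes and handles insertions as well.

        SITE = "GAATTC"  # EcoRI recognition site

        def hidden_sites_by_deletion(seq: str, site: str = SITE) -> list[int]:
            """Positions whose deletion creates a site absent from the original."""
            return [i for i in range(len(seq))
                    if site in (seq[:i] + seq[i + 1:]) and site not in seq]

        # Deleting any one of the three Ts in GGAATTTCGA yields GAATTC.
        print(hidden_sites_by_deletion("GGAATTTCGA"))  # [4, 5, 6]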

  7. TreeRipper web application: towards a fully automated optical tree recognition software.

    PubMed

    Hughes, Joseph

    2011-05-20

    Relationships between species, genes, and genomes have been printed as trees for over a century. Whilst this may have been the best format for exchanging and sharing phylogenetic hypotheses during the 20th century, the worldwide web now provides faster and automated ways of transferring and sharing phylogenetic knowledge. However, novel software is needed to defrost these published phylogenies for the 21st century. TreeRipper is a simple website for the fully automated recognition of multifurcating phylogenetic trees (http://linnaeus.zoology.gla.ac.uk/~jhughes/treeripper/). The program accepts a range of input image formats (PNG, JPG/JPEG or GIF). The underlying command-line C++ program follows a number of cleaning steps to detect lines, remove node labels, patch up broken lines and corners, and detect line edges. The edge contour is then determined to detect the branch lengths, tip label positions, and the topology of the tree. Optical character recognition (OCR) is used to convert the tip labels into text with the freely available tesseract-ocr software. 32% of images meeting the prerequisites for TreeRipper were successfully recognised; the largest tree had 115 leaves. Although the diversity of ways phylogenies have been illustrated makes the design of fully automated tree recognition software difficult, TreeRipper is a step towards automating the digitization of past phylogenies. We also provide a dataset of 100 tree images and associated tree files for training and/or benchmarking future software. TreeRipper is an open source project licensed under the GNU General Public Licence v3.

  8. Software tool for data mining and its applications

    NASA Astrophysics Data System (ADS)

    Yang, Jie; Ye, Chenzhou; Chen, Nianyi

    2002-03-01

    A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher discriminant analysis, clustering, hyperenvelope, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough sets, support vector machines), and computational intelligence (neural networks, genetic algorithms, fuzzy systems). It consists of nine function models: pattern recognition, decision trees, association rules, fuzzy rules, neural networks, genetic algorithms, hyperenvelope, support vector machines, and visualization. The principles and knowledge representation of some of the data mining function models are described. The tool is implemented in Visual C++ under Windows 2000. Nonmonotony in data mining is dealt with by concept hierarchies and layered mining. The tool has been satisfactorily applied to the prediction of regularities of the formation of ternary intermetallic compounds in alloy systems and to the diagnosis of brain glioma.
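
    A minimal sketch of one path through such a toolkit: PCA for feature reduction followed by a decision tree classifier. scikit-learn and the iris data set stand in here for the authors' Visual C++ implementation; the pipeline shape, not the specific models, is the point.

        from sklearn.datasets import load_iris
        from sklearn.decomposition import PCA
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.tree import DecisionTreeClassifier

        X, y = load_iris(return_X_y=True)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        # Reduce to two principal components, then classify with a shallow tree.
        model = make_pipeline(PCA(n_components=2), DecisionTreeClassifier(max_depth=3))
        model.fit(X_tr, y_tr)
        print(f"test accuracy: {model.score(X_te, y_te):.2f}")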

  9. Optical gesture sensing and depth mapping technologies for head-mounted displays: an overview

    NASA Astrophysics Data System (ADS)

    Kress, Bernard; Lee, Johnny

    2013-05-01

    Head Mounted Displays (HMDs), and especially see-through HMDs, have gained renewed interest recently, and for the first time outside the traditional military and defense realm, as several high-profile consumer electronics companies prepare to bring products to market. Consumer electronics HMDs have quite different requirements and constraints from their military counterparts. Voice commands are the de facto interface for such devices, but when voice recognition does not work (no connection to the cloud, for example), trackpad and gesture sensing technologies have to be used to communicate information to the device. We review in this paper the various technologies developed to date that integrate optical gesture sensing in a small footprint, as well as the various related 3D depth-mapping sensors.

  10. CD-ROM: A New Light for the Blind and Visually Impaired.

    ERIC Educational Resources Information Center

    Mates, Barbara T.

    1990-01-01

    Describes ways of using CD-ROM technology for the benefit of blind and visually impaired library patrons. Science, reference, and American historical documents that can be converted to braille, large print, or voice output from CD-ROMs are described, and hardware, software, and staff considerations are discussed. (LRW)

  11. Using SysML to model complex systems for security.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cano, Lester Arturo

    2010-08-01

    As security systems integrate more information technology, their design has tended to become more complex. Some of the most difficult issues in designing complex security systems (CSS) are capturing requirements, defining hardware interfaces, defining software interfaces, and integrating technologies such as radio systems, voice-over-IP systems, and situational awareness systems.

  12. ASERA: A Spectrum Eye Recognition Assistant

    NASA Astrophysics Data System (ADS)

    Yuan, Hailong; Zhang, Haotong; Zhang, Yanxia; Lei, Yajuan; Dong, Yiqiao; Zhao, Yongheng

    2018-04-01

    ASERA, A Spectrum Eye Recognition Assistant, aids in quasar spectral recognition and redshift measurement and can also be used to recognize various types of spectra of stars, galaxies, and AGNs (Active Galactic Nuclei). This interactive software allows users to visualize observed spectra, superimpose template spectra from the Sloan Digital Sky Survey (SDSS), and interactively access related spectral line information. ASERA is an efficient and user-friendly semi-automated toolkit for the accurate classification of spectra observed by LAMOST (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope) and is available as a standalone Java application and as a Java applet. The software offers several functions, including wavelength and flux scale settings, zoom in and out, redshift estimation, and spectral line identification.
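
    A worked sketch of the redshift arithmetic behind template superposition: once an observed spectral line is matched to a rest-frame template line, z = lambda_observed / lambda_rest - 1. The wavelengths below are illustrative values, not data from the paper.

        # C IV emission line: rest-frame wavelength in Angstroms.
        lambda_rest = 1549.1
        lambda_obs = 4647.3  # where the matched line appears in the observed spectrum

        z = lambda_obs / lambda_rest - 1
        print(f"estimated redshift z = {z:.3f}")  # ~2.000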

  13. Study on intelligent processing system of man-machine interactive garment frame model

    NASA Astrophysics Data System (ADS)

    Chen, Shuwang; Yin, Xiaowei; Chang, Ruijiang; Pan, Peiyun; Wang, Xuedi; Shi, Shuze; Wei, Zhongqian

    2018-05-01

    A man-machine interactive garment frame model intelligent processing system is studied in this paper. The system consists of several sensor devices, a voice processing module, mechanical parts, and a centralized data acquisition device. The sensor devices collect information on the environmental changes caused by a body approaching the garment frame model; the data acquisition device gathers the information sensed by the sensor devices; the voice processing module performs speaker-independent speech recognition to achieve human-machine interaction; and the mechanical moving parts produce the corresponding mechanical responses to the information processed by the data acquisition device. The sensor devices have a one-way connection to the data acquisition device, the data acquisition device has a two-way connection with the voice processing module, and the data acquisition device has a one-way connection to the mechanical moving parts. The intelligent processing system can judge whether it needs to interact with the customer, realizing man-machine interaction in place of the current rigid frame model.

  14. Military and Government Applications of Human-Machine Communication by Voice

    NASA Astrophysics Data System (ADS)

    Weinstein, Clifford J.

    1995-10-01

    This paper describes a range of opportunities for military and government applications of human-machine communication by voice, based on visits and contacts with numerous user organizations in the United States. The applications include some that appear to be feasible by careful integration of current state-of-the-art technology and others that will require a varying mix of advances in speech technology and in integration of the technology into applications environments. Applications that are described include (1) speech recognition and synthesis for mobile command and control; (2) speech processing for a portable multifunction soldier's computer; (3) speech- and language-based technology for naval combat team tactical training; (4) speech technology for command and control on a carrier flight deck; (5) control of auxiliary systems, and alert and warning generation, in fighter aircraft and helicopters; and (6) voice check-in, report entry, and communication for law enforcement agents or special forces. A phased approach for transfer of the technology into applications is advocated, where integration of applications systems is pursued in parallel with advanced research to meet future needs.

  15. Vocabulary Learning in a Yorkshire Terrier: Slow Mapping of Spoken Words

    PubMed Central

    Griebel, Ulrike; Oller, D. Kimbrough

    2012-01-01

    Rapid vocabulary learning in children has been attributed to “fast mapping”, with new words often claimed to be learned through a single presentation. As reported in 2004 in Science, a border collie (Rico) not only learned to identify more than 200 words, but fast mapped the new words, remembering meanings after just one presentation. Our research tests the fast-mapping interpretation of the Science paper based on Rico's results, while extending the demonstration of large vocabulary recognition to a lap dog. We tested a Yorkshire terrier (Bailey) with the same procedures as Rico, showing that Bailey accurately retrieved randomly selected toys from a set of 117 on voice command of the owner. Second, we tested her retrieval based on two additional voices, one male, one female, with different accents that had never been involved in her training, again showing she was capable of recognition by voice command. Third, we conducted both exclusion-based training of new items (toys she had never seen before, with names she had never heard before) embedded in a set of known items, and subsequent retention tests designed as in the Rico experiment. After Bailey succeeded on exclusion and retention tests, a crucial evaluation of true mapping tested items previously successfully retrieved in exclusion and retention, but now pitted against each other in a two-choice task. Bailey failed the true mapping task repeatedly, illustrating that the claim of fast mapping in Rico had not been proven, because no true mapping task had ever been conducted with him. It appears that the task called retention in the Rico study only demonstrated success in retrieval by a process of extended exclusion. PMID:22363421

  16. Constraints on the Transfer of Perceptual Learning in Accented Speech

    PubMed Central

    Eisner, Frank; Melinger, Alissa; Weber, Andrea

    2013-01-01

    The perception of speech sounds can be re-tuned through a mechanism of lexically driven perceptual learning after exposure to instances of atypical speech production. This study asked whether this re-tuning is sensitive to the position of the atypical sound within the word. We investigated perceptual learning using English voiced stop consonants, which are commonly devoiced in word-final position by Dutch learners of English. After exposure to a Dutch learner’s productions of devoiced stops in word-final position (but not in any other positions), British English (BE) listeners showed evidence of perceptual learning in a subsequent cross-modal priming task, where auditory primes with devoiced final stops (e.g., “seed”, pronounced [siːtʰ]), facilitated recognition of visual targets with voiced final stops (e.g., SEED). In Experiment 1, this learning effect generalized to test pairs where the critical contrast was in word-initial position, e.g., auditory primes such as “town” facilitated recognition of visual targets like DOWN. Control listeners, who had not heard any stops by the speaker during exposure, showed no learning effects. The generalization to word-initial position did not occur when participants had also heard correctly voiced, word-initial stops during exposure (Experiment 2), and when the speaker was a native BE speaker who mimicked the word-final devoicing (Experiment 3). The readiness of the perceptual system to generalize a previously learned adjustment to other positions within the word thus appears to be modulated by distributional properties of the speech input, as well as by the perceived sociophonetic characteristics of the speaker. The results suggest that the transfer of pre-lexical perceptual adjustments that occur through lexically driven learning can be affected by a combination of acoustic, phonological, and sociophonetic factors. PMID:23554598

  17. Blind speech separation system for humanoid robot with FastICA for audio filtering and separation

    NASA Astrophysics Data System (ADS)

    Budiharto, Widodo; Santoso Gunawan, Alexander Agung

    2016-07-01

    There have recently been many developments in building intelligent humanoid robots, mainly to handle voice and images. In this research, we propose a blind speech separation system using FastICA for audio filtering and separation that can be used in education or entertainment. Our main problem is to separate multiple speech sources and to filter out irrelevant noise. After the speech separation step, the results are integrated with our previous speech and face recognition system, which is based on the Bioloid GP robot with a Raspberry Pi 2 as controller. The experimental results show that the accuracy of our blind speech separation system is about 88% in command and query recognition cases.
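
    A minimal sketch of FastICA-based blind source separation as used above, with scikit-learn standing in for the robot's implementation: two synthetic sources are mixed into two microphone channels and then unmixed.

        import numpy as np
        from sklearn.decomposition import FastICA

        t = np.linspace(0, 1, 8000)
        s1 = np.sin(2 * np.pi * 220 * t)          # stand-ins for two voices
        s2 = np.sign(np.sin(2 * np.pi * 137 * t))
        S = np.c_[s1, s2]

        A = np.array([[1.0, 0.6], [0.4, 1.0]])    # unknown mixing matrix
        X = S @ A.T                               # two mixed microphone signals

        ica = FastICA(n_components=2, random_state=0)
        S_est = ica.fit_transform(X)              # recovered (unordered) sources
        print(S_est.shape)                        # (8000, 2)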

  18. Speech and Voice Response to a Levodopa Challenge in Late-Stage Parkinson's Disease.

    PubMed

    Fabbri, Margherita; Guimarães, Isabel; Cardoso, Rita; Coelho, Miguel; Guedes, Leonor Correia; Rosa, Mario M; Godinho, Catarina; Abreu, Daisy; Gonçalves, Nilza; Antonini, Angelo; Ferreira, Joaquim J

    2017-01-01

    Parkinson's disease (PD) patients are affected by hypokinetic dysarthria, characterized by hypophonia and dysprosody, which worsens with disease progression. Evidence for levodopa's (l-dopa) effect on quality of speech is inconclusive, and no data are currently available for late-stage PD (LSPD). To assess the modifications of speech and voice in LSPD following an acute l-dopa challenge, LSPD patients [Schwab and England score <50/Hoehn and Yahr stage >3 (MED ON)] performed several vocal tasks before and after an acute l-dopa challenge. The following were assessed: respiratory support for speech, voice quality, stability and variability, speech rate, and motor performance (MDS-UPDRS-III). All voice samples were recorded and analyzed by a speech and language therapist, blinded to the patients' therapeutic condition, using Praat 5.1 software. 24 of 27 LSPD patients (14 men) succeeded in performing the voice tasks. Median age and disease duration were 79 [IQR: 71.5-81.7] and 14.5 [IQR: 11-15.7] years, respectively. In MED OFF, respiratory breath support and pitch break time of LSPD patients were worse than normative values for non-parkinsonian speakers. Disease duration correlated with voice quality (R = 0.51; p = 0.013) and speech rate (R = -0.55; p = 0.008). l-Dopa significantly improved the MDS-UPDRS-III score (by 20%), with no effect on speech as assessed by clinical rating scales and automated analysis. Speech is severely affected in LSPD. Although l-dopa had some effect on motor performance, including axial signs, speech and voice did not improve. The applicability and efficacy of non-pharmacological treatment for speech impairment should be considered for speech disorder management in PD.
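
    The acoustic measures above were obtained with Praat 5.1; as a hedged illustration of how comparable measures can be scripted, the sketch below uses parselmouth, a third-party Python interface to Praat. The filename and the 75-500 Hz pitch range are assumptions, not the study's settings.

      # Sketch: Praat-style voice measures via parselmouth (Python wrapper for Praat).
      import parselmouth
      from parselmouth.praat import call

      snd = parselmouth.Sound("sample.wav")             # hypothetical recording
      pitch = snd.to_pitch()
      mean_f0 = call(pitch, "Get mean", 0, 0, "Hertz")  # mean fundamental frequency

      # Jitter/shimmer are computed from a PointProcess of glottal pulses.
      pulses = call(snd, "To PointProcess (periodic, cc)", 75, 500)
      jitter = call(pulses, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
      shimmer = call([snd, pulses], "Get shimmer (local)",
                     0, 0, 0.0001, 0.02, 1.3, 1.6)
      print(mean_f0, jitter, shimmer)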

  19. Period for Normalization of Voice Acoustic Parameters in Indian Pediatric Cochlear Implantees.

    PubMed

    Joy, Jeena V; Deshpande, Shweta; Vaid, Dr Neelam

    2017-05-01

    The purpose of this study was to investigate the duration required by children with cochlear implants to approximate the norms for voice acoustic parameters. The study design is retrospective. Thirty children with cochlear implants (chronological ages between 4.1 and 6.7 years) were divided into three groups based on postimplantation duration. Ten normal-hearing children (chronological ages between 4 and 7 years) served as the control group. All implanted children underwent objective voice analysis using Dr. Speech software (Tiger DRS, Inc., Seattle, WA, USA) at 6 months and at 1 and 2 years of implant use. Voice analysis was also done for the children in the control group, and means were derived for all parameters analyzed to obtain normal values. Habitual fundamental frequency (HFF), jitter (frequency variation), and shimmer (amplitude variation) were the voice acoustic parameters analyzed for the vowels |a|, |i|, and |u|; the obtained values were then compared with the norms. HFF for children with 6 months and 1 year of implant use differed significantly from the control group, whereas no significant difference (P > 0.05) was observed in children with 2 years of implant use, thus matching the norms. Jitter and shimmer showed a significant difference (P < 0.05) even at 2 years of implant use when compared with the control group. The findings reveal that children with cochlear implants approximate age-matched normal-hearing children with respect to HFF by 2 years of implant use; however, jitter and shimmer were not found to stabilize within the duration studied. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
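
    The three parameters have simple cycle-level definitions; the sketch below shows the common "local" jitter and shimmer formulas on toy per-cycle data. Dr. Speech's exact algorithms are proprietary, so this only approximates what such software computes.

      # Illustrative definitions of HFF, local jitter, and local shimmer, assuming
      # per-cycle periods (seconds) and peak amplitudes were extracted beforehand.
      import numpy as np

      def habitual_f0(periods):
          """HFF: mean fundamental frequency over the sample, in Hz."""
          return 1.0 / periods.mean()

      def jitter_percent(periods):
          """Local jitter: mean cycle-to-cycle period change / mean period."""
          return np.abs(np.diff(periods)).mean() / periods.mean() * 100

      def shimmer_percent(amps):
          """Local shimmer: mean cycle-to-cycle amplitude change / mean amplitude."""
          return np.abs(np.diff(amps)).mean() / amps.mean() * 100

      periods = np.array([0.0040, 0.0041, 0.0040, 0.0042])  # toy vowel cycles (s)
      amps = np.array([0.52, 0.50, 0.53, 0.51])             # toy peak amplitudes
      print(habitual_f0(periods), jitter_percent(periods), shimmer_percent(amps))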

  20. Correlation of VHI-30 to Acoustic Measurements Across Three Common Voice Disorders.

    PubMed

    Dehqan, Ali; Yadegari, Fariba; Scherer, Ronald C; Dabirmoghadam, Peyman

    2017-01-01

    Voice disorders that affect the quality of the voice also result in varying degrees of psychological and social problems. The research question here is whether the correlations between Voice Handicap Index (VHI)-30 scores and objective acoustic measures differ among patients with different types of voice disorders. The subjects were divided into three groups: muscle tension dysphonia (MTD), benign mid-membranous vocal fold lesions, and unilateral vocal fold paralysis (UVFP). All participants were male. The mean ages were 32.85 ± 8.6 years in the MTD group, 33.24 ± 7.32 years in the benign lesions group, and 34.24 ± 7.51 years in the UVFP group. The participants completed the Persian VHI-30 questionnaire, and PRAAT software was used for the acoustic analyses. In the MTD group, there were significant correlations between both the physical subscale and the total score of the VHI-30 and maximum phonation time (MPT). In the group with benign lesions such as nodules and polyps, there were relatively strong and significant correlations between the physical subscale of the VHI-30 and jitter, shimmer, and harmonics-to-noise ratio (HNR), as well as a significant correlation between the total VHI-30 score and jitter. In the unilateral paralysis group, the physical subscale had strong and significant correlations with jitter, shimmer, and HNR. The findings suggest that although the VHI-30 and acoustic measurements of voice provide independent information, they are associated to some extent. Copyright © 2017 The Voice Foundation. All rights reserved.
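
    A minimal sketch of the reported correlation analysis, using scipy.stats.pearsonr on invented toy values rather than the study's data:

      # Hypothetical example: correlate VHI-30 total scores with jitter values.
      from scipy.stats import pearsonr

      vhi_total = [34, 58, 41, 72, 25, 63]       # toy VHI-30 total scores
      jitter = [0.8, 1.9, 1.1, 2.4, 0.6, 2.0]    # toy jitter values (%)

      r, p = pearsonr(vhi_total, jitter)
      print(f"r = {r:.2f}, p = {p:.3f}")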

  1. Foundations for a syntactic pattern recognition system for genomic DNA sequences. [Annual] report, 1 December 1991--31 March 1993

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Searles, D.B.

    1993-03-01

    The goal of the proposed work is the creation of a software system that will perform sophisticated pattern recognition and related functions on biological sequences, at a level of abstraction and with expressive power beyond current general-purpose pattern-matching systems; it will provide a more uniform language, environment, and graphical user interface, together with greater flexibility, extensibility, embeddability, and ability to incorporate other algorithms than current special-purpose analytic software.

  2. How Psychological Stress Affects Emotional Prosody.

    PubMed

    Paulmann, Silke; Furnes, Desire; Bøkenes, Anne Ming; Cozzolino, Philip J

    2016-01-01

    We explored how experimentally induced psychological stress affects the production and recognition of vocal emotions. In Study 1a, we demonstrate that sentences spoken by stressed speakers are judged by naïve listeners as sounding more stressed than sentences uttered by non-stressed speakers. In Study 1b, negative emotions produced by stressed speakers are generally less well recognized than the same emotions produced by non-stressed speakers. Multiple mediation analyses suggest this poorer recognition of negative stimuli was due to a mismatch between the variation of volume voiced by speakers and the range of volume expected by listeners. Together, this suggests that the stress level of the speaker affects judgments made by the receiver. In Study 2, we demonstrate that participants who were induced with a feeling of stress before carrying out an emotional prosody recognition task performed worse than non-stressed participants. Overall, findings suggest detrimental effects of induced stress on interpersonal sensitivity.

  3. How Psychological Stress Affects Emotional Prosody

    PubMed Central

    Paulmann, Silke; Furnes, Desire; Bøkenes, Anne Ming; Cozzolino, Philip J.

    2016-01-01

    We explored how experimentally induced psychological stress affects the production and recognition of vocal emotions. In Study 1a, we demonstrate that sentences spoken by stressed speakers are judged by naïve listeners as sounding more stressed than sentences uttered by non-stressed speakers. In Study 1b, negative emotions produced by stressed speakers are generally less well recognized than the same emotions produced by non-stressed speakers. Multiple mediation analyses suggest this poorer recognition of negative stimuli was due to a mismatch between the variation of volume voiced by speakers and the range of volume expected by listeners. Together, this suggests that the stress level of the speaker affects judgments made by the receiver. In Study 2, we demonstrate that participants who were induced with a feeling of stress before carrying out an emotional prosody recognition task performed worse than non-stressed participants. Overall, findings suggest detrimental effects of induced stress on interpersonal sensitivity. PMID:27802287

  4. Recognition memory and awareness: occurrence of perceptual effects in remembering or in knowing depends on conscious resources at encoding, but not at retrieval.

    PubMed

    Gardiner, John M; Gregg, Vernon H; Karayianni, Irene

    2006-03-01

    We report four experiments in which a remember-know paradigm was combined with a response deadline procedure in order to assess memory awareness in fast, as compared with slow, recognition judgments. In the experiments, we also investigated the perceptual effects of study-test congruence, either for picture size or for speaker's voice, following either full or divided attention at study. These perceptual effects occurred in remembering with full attention and in knowing with divided attention, but they were uninfluenced by recognition speed, indicating that their occurrence in remembering or knowing depends more on conscious resources at encoding than on those at retrieval. The results have implications for theoretical accounts of remembering and knowing that assume that remembering is more consciously controlled and effortful, whereas knowing is more automatic and faster.

  5. Sensitivity-Enhanced Wearable Active Voiceprint Sensor Based on Cellular Polypropylene Piezoelectret.

    PubMed

    Li, Wenbo; Zhao, Sheng; Wu, Nan; Zhong, Junwen; Wang, Bo; Lin, Shizhe; Chen, Shuwen; Yuan, Fang; Jiang, Hulin; Xiao, Yongjun; Hu, Bin; Zhou, Jun

    2017-07-19

    Wearable active sensors have extensive applications in mobile biosensing and human-machine interaction but require good flexibility, high sensitivity, excellent stability, and self-powered operation. In this work, cellular polypropylene (PP) piezoelectret was chosen as the core material of a sensitivity-enhanced wearable active voiceprint sensor (SWAVS) for voiceprint recognition. By virtue of a dipole-orientation control method, the air layers in the piezoelectret were efficiently utilized, and the current sensitivity was enhanced from 1.98 pA/Hz to 5.81 pA/Hz at 115 dB. The SWAVS exhibited high sensitivity, accurate frequency response, and excellent stability. The voiceprint recognition system could react correctly to human voices by judging both the password and the speaker. This study presents a voiceprint sensor with potential applications in noncontact biometric recognition and safety guarantee systems, promoting the progress of wearable sensor networks.

  6. GeoPad: Innovative Applications of Information Technology in Field Science Education

    NASA Astrophysics Data System (ADS)

    Knoop, P. A.; van der Pluijm, B.

    2003-12-01

    A core requirement for most undergraduate degrees in the Earth sciences is a course in field geology, which provides students with training in field science methodologies, including geologic mapping. The University of Michigan Geological Sciences curriculum includes a seven-week summer field course, GS-440, based out of the university's Camp Davis Geologic Field Station, near Jackson, WY. Such field-based courses stand to benefit tremendously from recent innovations in Information Technology (IT), especially in the form of increasing portability, new haptic interfaces for personal computers, and advancements in Geographic Information System (GIS) software. Such innovations are enabling in-the-field, real-time access to powerful data collection, analysis, visualization, and interpretation tools. The benefits of these innovations, however, can only be realized on a broad basis when the IT reaches a level of maturity at which users can easily employ it to enhance their learning experience and scientific activities, rather than the IT itself being a primary focus of the curriculum or a constraint on field activities. The GeoPad represents a combination of these novel technologies that achieves that goal. The GeoPad concept integrates a ruggedized Windows XP TabletPC equipped with wireless networking, a portable GPS receiver, digital camera, microphone-headset, voice-recognition software, GIS, and supporting digital geo-referenced datasets. A key advantage of the GeoPad is enabling field-based usage of visualization software and data focusing on 3D geospatial relationships (developed as part of the complementary GeoWall initiative), which provides a powerful new tool for enhancing and facilitating undergraduate field geology education, as demonstrated during the summer 2003 session of GS-440. In addition to an education in field methodologies, students also gain practical experience using IT that they will encounter during their continued educational, research, or professional careers. This approach is immediately applicable to field geology courses elsewhere, and indeed to other field-oriented programs (e.g., in biology, archeology, ecology), given similar needs.

  7. Listening to the student voice to improve educational software.

    PubMed

    van Wyk, Mari; van Ryneveld, Linda

    2017-01-01

    Academics often develop software for teaching and learning purposes with the best of intentions, only to be disappointed by the low acceptance rate of the software by their students once it is implemented. In this study, the focus is on software that was designed to enable veterinary students to record their clinical skills. A pilot of the software clearly showed that the program had not been received as well as had been anticipated, and therefore the researchers used a group interview and a questionnaire with closed-ended and open-ended questions to obtain the students' feedback. The open-ended questions were analysed with conceptual content analysis, and themes were identified. Students made valuable suggestions about what they regarded as important considerations when a new software program is introduced. The most important lesson learnt was that students cannot always predict their needs accurately if they are asked for input prior to the development of software. For that reason student input should be obtained on a continuous and regular basis throughout the design and development phases.

  8. Morphing Images: A Potential Tool for Teaching Word Recognition to Children with Severe Learning Difficulties

    ERIC Educational Resources Information Center

    Sheehy, Kieron

    2005-01-01

    Children with severe learning difficulties who fail to begin word recognition can learn to recognise pictures and symbols relatively easily. However, finding an effective means of using pictures to teach word recognition has proved problematic. This research explores the use of morphing software to support the transition from picture to word…

  9. Pattern recognition: A basis for remote sensing data analysis

    NASA Technical Reports Server (NTRS)

    Swain, P. H.

    1973-01-01

    The theoretical basis for the pattern-recognition-oriented algorithms used in the multispectral data analysis software system is discussed. A model of a general pattern recognition system is presented. The receptor or sensor is usually a multispectral scanner. For each ground resolution element the receptor produces n numbers or measurements corresponding to the n channels of the scanner.
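
    As a concrete, hypothetical instance of this model, the sketch below classifies one ground resolution element's n-channel measurement vector with a Gaussian maximum-likelihood rule, a classic choice for multispectral data; the classes, means, and covariances are invented.

      # Per-pixel Gaussian maximum-likelihood classification (illustrative only).
      import numpy as np
      from scipy.stats import multivariate_normal

      n_channels = 4  # one measurement per scanner channel
      classes = {
          "water":  multivariate_normal(mean=[20, 15, 10, 5],
                                        cov=np.eye(n_channels) * 4.0),
          "forest": multivariate_normal(mean=[30, 45, 40, 60],
                                        cov=np.eye(n_channels) * 9.0),
      }

      def classify(pixel):
          """Assign an n-channel measurement vector to the most likely class."""
          return max(classes, key=lambda c: classes[c].logpdf(pixel))

      print(classify(np.array([22, 17, 11, 6])))  # -> "water"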

  10. Joint Sparse Representation for Robust Multimodal Biometrics Recognition

    DTIC Science & Technology

    2014-01-01

    comprehensive multimodal dataset and a face database are described in section V. Finally, in section VI, we discuss the computational complexity of...fingerprint, iris, palmprint, hand geometry and voice from subjects of different age, gender and ethnicity as described in Table I. It is a...Taylor, "Constructing nonlinear discriminants from multiple data views," Machine Learning and Knowledge Discovery in Databases, pp. 328–343, 2010

  11. A Preliminary Analysis of Human Factors Affecting the Recognition Accuracy of a Discrete Word Recognizer for C3 Systems.

    DTIC Science & Technology

    1983-03-01

    acoustic wave pattern and, if so, word recognition would be a simple matter of the voice recognition system scanning the pattern, comparing the simple... [The remainder of this snippet is OCR residue of a week-1 training-session table (word number, utterance, CRT prompt), with entries such as THREE, EUROPE, MOVE IT LEFT, CARRIAGE RETURN, and LOGOUT.]

  12. A Cross-Lingual Mobile Medical Communication System Prototype for Foreigners and Subjects with Speech, Hearing, and Mental Disabilities Based on Pictograms

    PubMed Central

    Wołk, Krzysztof; Wołk, Agnieszka; Glinkowski, Wojciech

    2017-01-01

    People with speech, hearing, or mental impairment require special communication assistance, especially for medical purposes. Automatic solutions for speech recognition and voice synthesis from text are poor fits for communication in the medical domain because they are dependent on error-prone statistical models. Systems dependent on manual text input are insufficient. Recently introduced systems for automatic sign language recognition are dependent on statistical models as well as on image and gesture quality. Such systems remain in early development and are based mostly on minimal hand gestures unsuitable for medical purposes. Furthermore, solutions that rely on the Internet cannot be used after disasters that require humanitarian aid. We propose a high-speed, intuitive, Internet-free, voice-free, and text-free tool suited for emergency medical communication. Our solution is a pictogram-based application that provides easy communication for individuals who have speech or hearing impairment or mental health issues that impair communication, as well as foreigners who do not speak the local language. It provides support and clarification in communication by using intuitive icons and interactive symbols that are easy to use on a mobile device. Such pictogram-based communication can be quite effective and ultimately make people's lives happier, easier, and safer. PMID:29230254

  13. Interactive voice technology: Variations in the vocal utterances of speakers performing a stress-inducing task

    NASA Astrophysics Data System (ADS)

    Mosko, J. D.; Stevens, K. N.; Griffin, G. R.

    1983-08-01

    Acoustical analyses were conducted of words produced by four speakers in a motion stress-inducing situation. The aim of the analyses was to document the kinds of changes that occur in the vocal utterances of speakers who are exposed to motion stress and to comment on the implications of these results for the design and development of voice-interactive systems. The speakers differed markedly in the types and magnitudes of the changes that occurred in their speech. For some speakers, the stress-inducing experimental condition caused an increase in fundamental frequency, changes in the pattern of vocal fold vibration, shifts in vowel production, and changes in the relative amplitudes of sounds containing turbulence noise. All speakers showed greater variability in the experimental condition than in a more relaxed control situation. The variability was manifested in the acoustical characteristics of individual phonetic elements, particularly in unstressed syllables. The kinds of changes and variability observed serve to emphasize the limitations of speech recognition systems based on template matching of patterns that are stored in the system during a training phase. There is a need for a better understanding of these phonetic modifications and for developing ways of incorporating knowledge about these changes within a speech recognition system.

  14. A Cross-Lingual Mobile Medical Communication System Prototype for Foreigners and Subjects with Speech, Hearing, and Mental Disabilities Based on Pictograms.

    PubMed

    Wołk, Krzysztof; Wołk, Agnieszka; Glinkowski, Wojciech

    2017-01-01

    People with speech, hearing, or mental impairment require special communication assistance, especially for medical purposes. Automatic solutions for speech recognition and voice synthesis from text are poor fits for communication in the medical domain because they are dependent on error-prone statistical models. Systems dependent on manual text input are insufficient. Recently introduced systems for automatic sign language recognition are dependent on statistical models as well as on image and gesture quality. Such systems remain in early development and are based mostly on minimal hand gestures unsuitable for medical purposes. Furthermore, solutions that rely on the Internet cannot be used after disasters that require humanitarian aid. We propose a high-speed, intuitive, Internet-free, voice-free, and text-free tool suited for emergency medical communication. Our solution is a pictogram-based application that provides easy communication for individuals who have speech or hearing impairment or mental health issues that impair communication, as well as foreigners who do not speak the local language. It provides support and clarification in communication by using intuitive icons and interactive symbols that are easy to use on a mobile device. Such pictogram-based communication can be quite effective and ultimately make people's lives happier, easier, and safer.

  15. Software for Partly Automated Recognition of Targets

    NASA Technical Reports Server (NTRS)

    Opitz, David; Blundell, Stuart; Bain, William; Morris, Matthew; Carlson, Ian; Mangrich, Mark

    2003-01-01

    The Feature Analyst is a computer program for assisted (partially automated) recognition of targets in images. This program was developed to accelerate the processing of high-resolution satellite image data for incorporation into geographic information systems (GIS). It creates an advanced user interface that embeds proprietary machine-learning algorithms in commercial image-processing and GIS software. A human analyst provides samples of target features from multiple sets of data, then the software develops a data-fusion model that automatically extracts the remaining features from selected sets of data. The program thus leverages the natural ability of humans to recognize objects in complex scenes, without requiring the user to explain the human visual recognition process by means of lengthy software code. Two major subprograms are the reactive agent and the thinking agent. The reactive agent strives to quickly learn the user's tendencies while the user is selecting targets and to increase the user's productivity by immediately suggesting the next set of pixels that the user may wish to select. The thinking agent utilizes all available resources, taking as much time as needed, to produce the most accurate autonomous feature-extraction model possible.

  16. Asymmetric cultural effects on perceptual expertise underlie an own-race bias for voices

    PubMed Central

    Perrachione, Tyler K.; Chiao, Joan Y.; Wong, Patrick C.M.

    2009-01-01

    The own-race bias in memory for faces has been a rich source of empirical work on the mechanisms of person perception. This effect is thought to arise because the face-perception system differentially encodes the relevant structural dimensions of features and their configuration based on experiences with different groups of faces. However, the effects of sociocultural experiences on person perception abilities in other identity-conveying modalities like audition have not been explored. Investigating an own-race bias in the auditory domain provides a unique opportunity for studying whether person identification is a modality-independent construct and how it is sensitive to asymmetric cultural experiences. Here we show that an own-race bias in talker identification arises from asymmetric experience with different spoken dialects. When listeners categorized voices by race (White or Black), a subset of the Black voices were categorized as sounding White, while the opposite case was unattested. Acoustic analyses indicated listeners' perceptions about race were consistent with differences in specific phonetic and phonological features. In a subsequent person-identification experiment, the Black voices initially categorized as sounding White elicited an own-race bias from White listeners, but not from Black listeners. These effects are inconsistent with person-perception models that strictly analogize faces and voices based on recognition from only structural features. Our results demonstrate that asymmetric exposure to spoken dialect, independent from talkers' physical characteristics, affects auditory perceptual expertise for talker identification. Person perception thus additionally relies on socioculturally-acquired dynamic information, which may be represented by different mechanisms in different sensory modalities. PMID:19782970

  17. Syntactic and semantic errors in radiology reports associated with speech recognition software.

    PubMed

    Ringler, Michael D; Goss, Brian C; Bartholmai, Brian J

    2017-03-01

    Speech recognition software can increase the frequency of errors in radiology reports, which may affect patient care. We retrieved 213,977 speech recognition software-generated reports from 147 different radiologists and proofread them for errors. Errors were classified as "material" if they were believed to alter interpretation of the report. "Immaterial" errors were subclassified as intrusion/omission or spelling errors. The proportion of errors and the error types were compared among individual radiologists, imaging subspecialties, and time periods. In all, 20,759 reports (9.7%) contained errors, of which 3992 (1.9%) were material errors. Among immaterial errors, spelling errors were more common than intrusion/omission errors (p < .001). The proportion of errors and the fraction of material errors varied significantly among radiologists and between imaging subspecialties (p < .001). Errors were more common in cross-sectional reports, reports reinterpreting results of outside examinations, and procedural studies (all p < .001). The error rate decreased over time (p < .001), which suggests that a quality control program with regular feedback may reduce errors.

  18. Comparison of speech perception performance between Sprint/Esprit 3G and Freedom processors in children implanted with nucleus cochlear implants.

    PubMed

    Santarelli, Rosamaria; Magnavita, Vincenzo; De Filippi, Roberta; Ventura, Laura; Genovese, Elisabetta; Arslan, Edoardo

    2009-04-01

    To compare speech perception performance in children fitted with a previous-generation Nucleus sound processor, the Sprint or Esprit 3G, and with the Freedom, the most recently released system from the Cochlear Corporation, which features a larger input dynamic range. Prospective intrasubject comparative study. University Medical Center. Seventeen prelingually deafened children who had received the Nucleus 24 cochlear implant and used the Sprint or Esprit 3G sound processor. Cochlear implantation with a Cochlear device. Speech perception was evaluated at baseline (Sprint, n = 11; Esprit 3G, n = 6) and after 1 month's experience with the Freedom sound processor. Identification and recognition of disyllabic words and identification of vowels were performed via recorded voice in quiet (70 dB [A]), in the presence of background noise at various signal-to-noise ratios (+10, +5, 0, -5), and at a soft presentation level (60 dB [A]). Consonant identification and recognition of disyllabic words, trisyllabic words, and sentences were evaluated in live voice. Frequency discrimination was measured in a subset of subjects (n = 5) using an adaptive, 3-interval, 3-alternative, forced-choice procedure. Identification of disyllabic words administered at a soft presentation level showed a significant increase when switching to the Freedom compared with the previously worn processor in children using the Sprint or Esprit 3G. Identification and recognition of disyllabic words in the presence of background noise, as well as consonant identification and sentence recognition, increased significantly with the Freedom compared with the previously worn device only in children fitted with the Sprint. Frequency discrimination was significantly better when switching to the Freedom from the previously worn processor. Serial comparisons revealed that speech perception performance evaluated in children aged 5 to 15 years was superior with the Freedom than with previous generations of Nucleus sound processors. These differences are deemed to result from the increased input dynamic range, a feature that offers potentially enhanced phonemic discrimination.

  19. Distance Learners' Perspective on User-Friendly Instructional Materials at the University of Zambia

    ERIC Educational Resources Information Center

    Simui, F.; Thompson, L. C.; Mundende, K.; Mwewa, G.; Kakana, F.; Chishiba, A.; Namangala, B.

    2017-01-01

    This case study focuses on print-based instructional materials available to distance education learners at the University of Zambia. Using the Visual Paradigm Software, we model distance education learners' voices into sociograms to make a contribution to the ongoing discourse on quality distance learning in poorly resourced communities. Emerging…

  20. Social Software: Participants' Experience Using Social Networking for Learning

    ERIC Educational Resources Information Center

    Batchelder, Cecil W.

    2010-01-01

    Social networking tools used in learning provides instructional design with tools for transformative change in education. This study focused on defining the meanings and essences of social networking through the lived common experiences of 7 college students. The problem of the study was a lack of learner voice in understanding the value of social…

  1. Make Your Voice Heard!

    ERIC Educational Resources Information Center

    Branzburg, Jeffrey

    2006-01-01

    A podcast is a method of distributing multimedia files, usually (but not limited to) audio in the MP3 format, over the Internet to subscribers. Anybody can be a subscriber--one only needs the proper software to receive the subscription. In this article, the author discusses how to create one's own podcast. Before creating the podcast, one needs a…

  2. Adult Students' Perceptions of Automated Writing Assessment Software: Does It Foster Engagement?

    ERIC Educational Resources Information Center

    LaGuerre, Joselle L.

    2013-01-01

    Generally, this descriptive study endeavored to include the voice of adult learners to the scholarly body of research regarding automated writing assessment tools (AWATs). Specifically, the study sought to determine the extent to which students perceive that the AWAT named Criterion fosters learning and if students' opinions differ depending on…

  3. Helping Students Express Their Passion

    ERIC Educational Resources Information Center

    Mann, Michelle

    2011-01-01

    Adobe Youth Voices (AYV) is a global educational program sponsored by the Adobe Foundation, the philanthropic arm of software maker Adobe. The education-based initiative teaches underserved kids aged 13-18 how to use digital media to comment on their world, share ideas, and take action on the social issues that are important to them. The AYV…

  4. Installing an Integrated Information System in a Centralized Network.

    ERIC Educational Resources Information Center

    Mendelson, Andrew D.

    1992-01-01

    Many schools are looking at ways to centralize the distribution and retrieval of video, voice, and data transmissions in an integrated information system (IIS). A centralized system offers greater control of hardware and software. Describes media network planning to retrofit an Illinois high school with a fiber optic-based IIS. (MLF)

  5. An Innovative Spreadsheet Application to Teach Management Science Decision Criteria

    ERIC Educational Resources Information Center

    Hozak, Kurt

    2018-01-01

    This article describes a Microsoft Excel-based application that uses humorous voice synthesis and timed competition to make it more fun and engaging to learn management science decision criteria. In addition to providing immediate feedback and easily customizable tips that facilitate self-learning, the software randomly generates both the problem…

  6. Student Privacy and Educational Data Mining: Perspectives from Industry

    ERIC Educational Resources Information Center

    Sabourin, Jennifer; Kosturko, Lucy; FitzGerald, Clare; McQuiggan, Scott

    2015-01-01

    While the field of educational data mining (EDM) has generated many innovations for improving educational software and student learning, the mining of student data has recently come under a great deal of scrutiny. Many stakeholder groups, including public officials, media outlets, and parents, have voiced concern over the privacy of student data…

  7. The Teaching Voice on the Learning Platform: Seeking Classroom Climates within a Virtual Learning Environment

    ERIC Educational Resources Information Center

    Crook, Charles; Cluley, Robert

    2009-01-01

    University staff are now encouraged to supplement their classroom activity with computer-based tools and resources accessible through virtual learning environments (VLEs). Meanwhile, university students increasingly make recreational use of computer networks in the form of various social software applications. This paper explores tensions of…

  8. Internet Telephony: The Next Killer Application? (Or, How I Cut My Long-Distance Phone Bill to Nothing!).

    ERIC Educational Resources Information Center

    Learn, Larry L., Ed.

    1995-01-01

    Discusses the evolution of real-time telephony and broadcast applications using the Internet; resulting issues and opportunities; and future implications for regulators, Internet users, and service providers. Topics covered include bandpass, packetized voice, IP structures, class D datagrams, software, technical parameters, legal and regulatory…

  9. Technology in the Public Library: Results from the 1992 PLDS Survey of Technology.

    ERIC Educational Resources Information Center

    Fidler, Linda M.; Johnson, Debra Wilcox

    1994-01-01

    Discusses and compares the incorporation of technology by larger public libraries in Canada and the United States. Technology mentioned includes online public access catalogs; remote and local online database searching; microcomputers and software for public use; and fax, voice mail, and Telecommunication Devices for the Deaf and Teletype writer…

  10. Plastic reorganization of neural systems for perception of others in the congenitally blind.

    PubMed

    Fairhall, S L; Porter, K B; Bellucci, C; Mazzetti, M; Cipolli, C; Gobbini, M I

    2017-09-01

    Recent evidence suggests that the function of the core system for face perception might extend beyond visual face perception to a broader role in person perception. To critically test this broader role, we examined the core system during the perception of others in 7 congenitally blind individuals and 15 sighted subjects, measuring their neural responses with fMRI while they listened to voices and performed identity and emotion recognition tasks. We hypothesised that in people who have had no visual experience of faces, core face-system areas may assume a role in the perception of others via voices. Results showed that emotions conveyed by voices could be decoded in homologues of the core face system only in the blind. Moreover, there was a specific enhancement of the response to verbal as compared to non-verbal stimuli in the bilateral fusiform face areas and the right posterior superior temporal sulcus, showing that the core system also assumes some language-related functions in the blind. These results indicate that, in individuals with no history of visual experience, areas of the core system for face perception may assume a role in aspects of voice perception that are relevant to social cognition and the perception of others' emotions. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.

  11. Speech-based Class Attendance

    NASA Astrophysics Data System (ADS)

    Faizel Amri, Umar; Nur Wahidah Nik Hashim, Nik; Hazrin Hany Mohamad Hanif, Noor

    2017-11-01

    In the department of engineering, students are required to attend at least 80 percent of classes. The conventional method requires each student to sign his or her initials on an attendance sheet; however, this method is prone to cheating, with one student signing for an absent classmate. We develop our hypothesis according to a verse in the Holy Qur’an (95:4), “We have created men in the best of mould”. Based on the verse, we believe each psychological characteristic of a human being is unique, and thus their speech characteristics should be unique. In this paper we present the development of a speech biometric-based attendance system. The system requires the user’s voice to be enrolled as training data, which is saved in the system to register the user. Subsequent voice samples from the user serve as test data, verified against the trained data stored in the system. The system uses PSD (power spectral density) and transition parameters as the features extracted from the voices, and Euclidean and Mahalanobis distances to verify the user’s voice. For this research, ten subjects, five female and five male, were tested for system performance. The system performance, in terms of recognition rate, was found to be 60% correct identification of individuals.
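
    A rough sketch of the verification step as described, assuming Welch-PSD features and scipy's Euclidean and Mahalanobis distances; the enrollment signals, feature settings, and acceptance threshold are placeholders, and the paper's transition-parameter feature is omitted.

      # Hedged sketch: PSD features plus Euclidean/Mahalanobis verification.
      import numpy as np
      from scipy.signal import welch
      from scipy.spatial.distance import euclidean, mahalanobis

      fs = 16000  # assumed sampling rate

      def psd_features(signal):
          """Log power spectral density of one utterance (Welch's method)."""
          _, pxx = welch(signal, fs=fs, nperseg=512)
          return 10 * np.log10(pxx + 1e-12)

      rng = np.random.default_rng(1)  # noise stands in for real recordings
      enrolled = np.stack([psd_features(rng.standard_normal(fs))
                           for _ in range(5)])            # training utterances
      template = enrolled.mean(axis=0)
      vi = np.linalg.pinv(np.cov(enrolled, rowvar=False)) # inverse covariance

      test = psd_features(rng.standard_normal(fs))        # utterance to verify
      d_euc = euclidean(test, template)
      d_mah = mahalanobis(test, template, vi)
      print(d_euc, d_mah)  # accept if below an empirically chosen threshold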

  12. The structural neuroanatomy of music emotion recognition: Evidence from frontotemporal lobar degeneration

    PubMed Central

    Omar, Rohani; Henley, Susie M.D.; Bartlett, Jonathan W.; Hailstone, Julia C.; Gordon, Elizabeth; Sauter, Disa A.; Frost, Chris; Scott, Sophie K.; Warren, Jason D.

    2011-01-01

    Despite growing clinical and neurobiological interest in the brain mechanisms that process emotion in music, these mechanisms remain incompletely understood. Patients with frontotemporal lobar degeneration (FTLD) frequently exhibit clinical syndromes that illustrate the effects of breakdown in emotional and social functioning. Here we investigated the neuroanatomical substrate for recognition of musical emotion in a cohort of 26 patients with FTLD (16 with behavioural variant frontotemporal dementia, bvFTD, 10 with semantic dementia, SemD) using voxel-based morphometry. On neuropsychological evaluation, patients with FTLD showed deficient recognition of canonical emotions (happiness, sadness, anger and fear) from music as well as faces and voices compared with healthy control subjects. Impaired recognition of emotions from music was specifically associated with grey matter loss in a distributed cerebral network including insula, orbitofrontal cortex, anterior cingulate and medial prefrontal cortex, anterior temporal and more posterior temporal and parietal cortices, amygdala and the subcortical mesolimbic system. This network constitutes an essential brain substrate for recognition of musical emotion that overlaps with brain regions previously implicated in coding emotional value, behavioural context, conceptual knowledge and theory of mind. Musical emotion recognition may probe the interface of these processes, delineating a profile of brain damage that is essential for the abstraction of complex social emotions. PMID:21385617

  13. The structural neuroanatomy of music emotion recognition: evidence from frontotemporal lobar degeneration.

    PubMed

    Omar, Rohani; Henley, Susie M D; Bartlett, Jonathan W; Hailstone, Julia C; Gordon, Elizabeth; Sauter, Disa A; Frost, Chris; Scott, Sophie K; Warren, Jason D

    2011-06-01

    Despite growing clinical and neurobiological interest in the brain mechanisms that process emotion in music, these mechanisms remain incompletely understood. Patients with frontotemporal lobar degeneration (FTLD) frequently exhibit clinical syndromes that illustrate the effects of breakdown in emotional and social functioning. Here we investigated the neuroanatomical substrate for recognition of musical emotion in a cohort of 26 patients with FTLD (16 with behavioural variant frontotemporal dementia, bvFTD, 10 with semantic dementia, SemD) using voxel-based morphometry. On neuropsychological evaluation, patients with FTLD showed deficient recognition of canonical emotions (happiness, sadness, anger and fear) from music as well as faces and voices compared with healthy control subjects. Impaired recognition of emotions from music was specifically associated with grey matter loss in a distributed cerebral network including insula, orbitofrontal cortex, anterior cingulate and medial prefrontal cortex, anterior temporal and more posterior temporal and parietal cortices, amygdala and the subcortical mesolimbic system. This network constitutes an essential brain substrate for recognition of musical emotion that overlaps with brain regions previously implicated in coding emotional value, behavioural context, conceptual knowledge and theory of mind. Musical emotion recognition may probe the interface of these processes, delineating a profile of brain damage that is essential for the abstraction of complex social emotions. Copyright © 2011 Elsevier Inc. All rights reserved.

  14. Processing F0 with cochlear implants: Modulation frequency discrimination and speech intonation recognition.

    PubMed

    Chatterjee, Monita; Peng, Shu-Chen

    2008-01-01

    Fundamental frequency (F0) processing by cochlear implant (CI) listeners was measured using a psychophysical task and a speech intonation recognition task. Listeners' Weber fractions for modulation frequency discrimination were measured using an adaptive, 3-interval, forced-choice paradigm: stimuli were presented through a custom research interface. In the speech intonation recognition task, listeners were asked to indicate whether resynthesized bisyllabic words, when presented in the free field through the listeners' everyday speech processor, were question-like or statement-like. The resynthesized tokens were systematically manipulated to have different initial-F0s to represent male vs. female voices, and different F0 contours (i.e., falling, flat, and rising). Although the CI listeners showed considerable variation in performance on both tasks, significant correlations were observed between the CI listeners' sensitivity to modulation frequency in the psychophysical task and their performance in intonation recognition. Consistent with their greater reliance on temporal cues, the CI listeners' performance in the intonation recognition task was significantly poorer with the higher initial-F0 stimuli than with the lower initial-F0 stimuli. Similar results were obtained with normal hearing listeners attending to noiseband-vocoded CI simulations with reduced spectral resolution.

  15. Processing F0 with Cochlear Implants: Modulation Frequency Discrimination and Speech Intonation Recognition

    PubMed Central

    Chatterjee, Monita; Peng, Shu-Chen

    2008-01-01

    Fundamental frequency (F0) processing by cochlear implant (CI) listeners was measured using a psychophysical task and a speech intonation recognition task. Listeners’ Weber fractions for modulation frequency discrimination were measured using an adaptive, 3-interval, forced-choice paradigm: stimuli were presented through a custom research interface. In the speech intonation recognition task, listeners were asked to indicate whether resynthesized bisyllabic words, when presented in the free field through the listeners’ everyday speech processor, were question-like or statement-like. The resynthesized tokens were systematically manipulated to have different initial F0s to represent male vs. female voices, and different F0 contours (i.e., falling, flat, and rising). Although the CI listeners showed considerable variation in performance on both tasks, significant correlations were observed between the CI listeners’ sensitivity to modulation frequency in the psychophysical task and their performance in intonation recognition. Consistent with their greater reliance on temporal cues, the CI listeners’ performance in the intonation recognition task was significantly poorer with the higher initial-F0 stimuli than with the lower initial-F0 stimuli. Similar results were obtained with normal hearing listeners attending to noiseband-vocoded CI simulations with reduced spectral resolution. PMID:18093766

  16. Recognition of handprinted characters for automated cartography A progress report

    NASA Technical Reports Server (NTRS)

    Lybanon, M.; Brown, R. M.; Gronmeyer, L. K.

    1980-01-01

    A research program for developing handwritten character recognition techniques is reported. An overview is given of the generation of cartographic/hydrographic manuscripts. The performance of hardware/software systems is discussed, along with future research problem areas and planned approaches.

  17. Identification and tracking of particular speaker in noisy environment

    NASA Astrophysics Data System (ADS)

    Sawada, Hideyuki; Ohkado, Minoru

    2004-10-01

    Humans are able to exchange information smoothly by voice in a variety of situations, such as noisy environments in a crowd or in the presence of multiple speakers. We can detect the position of a sound source in 3D space, extract a particular sound from a mixture of sounds, and recognize who is talking. Realizing this mechanism with a computer would enable new applications: recording sound with high quality by reducing noise, presenting a clarified sound, and achieving microphone-free speech recognition by extracting a particular sound. This paper introduces real-time detection and identification of a particular speaker in a noisy environment using a microphone array, based on the location of the speaker and individual voice characteristics. The study will be applied to developing an adaptive auditory system for a mobile robot that collaborates with a factory worker.

  18. Imaging Systems: What, When, How.

    ERIC Educational Resources Information Center

    Lunin, Lois F.; And Others

    1992-01-01

    The three articles in this special section on document image files discuss intelligent character recognition, including comparison with optical character recognition; selection of displays for document image processing, focusing on paperlike displays; and imaging hardware, software, and vendors, including guidelines for system selection. (MES)

  19. Effect of Acting Experience on Emotion Expression and Recognition in Voice: Non-Actors Provide Better Stimuli than Expected.

    PubMed

    Jürgens, Rebecca; Grass, Annika; Drolet, Matthis; Fischer, Julia

    Both in the performative arts and in emotion research, professional actors are assumed to be capable of delivering emotions comparable to spontaneous emotional expressions. This study examines the effects of acting training on vocal emotion depiction and recognition. We predicted that professional actors express emotions in a more realistic fashion than non-professional actors. However, professional acting training may lead to a particular speech pattern; this might account for vocal expressions by actors that are less comparable to authentic samples than the ones by non-professional actors. We compared 80 emotional speech tokens from radio interviews with 80 re-enactments by professional and inexperienced actors, respectively. We analyzed recognition accuracies for emotion and authenticity ratings and compared the acoustic structure of the speech tokens. Both play-acted conditions yielded similar recognition accuracies and possessed more variable pitch contours than the spontaneous recordings. However, professional actors exhibited signs of different articulation patterns compared to non-trained speakers. Our results indicate that for emotion research, emotional expressions by professional actors are not better suited than those from non-actors.

  20. Automatic voice recognition using traditional and artificial neural network approaches

    NASA Technical Reports Server (NTRS)

    Botros, Nazeih M.

    1989-01-01

    The main objective of this research is to develop an algorithm for isolated-word recognition. This research is focused on digital signal analysis rather than linguistic analysis of speech. Feature extraction is carried out by applying a Linear Predictive Coding (LPC) algorithm of order 10. Continuous-word and speaker-independent recognition will be considered in a future study after this isolated-word research is accomplished. To examine the similarity between the reference and the training sets, two approaches are explored. The first implements traditional pattern recognition techniques, in which a dynamic time warping algorithm is applied to align the two sets and the probability of matching is calculated by measuring the Euclidean distance between them. The second implements a backpropagation artificial neural network model with three layers as the pattern classifier; the adaptation rule implemented in this network is the generalized least mean square (LMS) rule. The first approach has been accomplished: a vocabulary of 50 words was selected and tested, and the accuracy of the algorithm was found to be around 85 percent. The second approach is in progress at the present time.
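
    The first approach combines standard building blocks; the sketch below computes order-10 LPC coefficients per frame via the Levinson-Durbin recursion and compares feature sequences with dynamic time warping under a Euclidean local cost. The frame length, windowing, and toy signal are assumptions, and the backpropagation classifier of the second approach is not shown.

      # Sketch: order-10 LPC features compared by dynamic time warping (DTW).
      import numpy as np

      def lpc(frame, order=10):
          """LPC coefficients via Levinson-Durbin on the frame autocorrelation."""
          r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
          a = np.zeros(order + 1)
          a[0] = 1.0
          err = r[0]
          for i in range(1, order + 1):
              k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
              a[1:i] = a[1:i] + k * a[i - 1:0:-1]  # reflect previous coefficients
              a[i] = k
              err *= 1.0 - k * k
          return a[1:]

      def dtw(seq_a, seq_b):
          """DTW distance between two (frames x coefficients) sequences."""
          n, m = len(seq_a), len(seq_b)
          D = np.full((n + 1, m + 1), np.inf)
          D[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  cost = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])  # Euclidean
                  D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
          return D[n, m]

      rng = np.random.default_rng(0)
      word = rng.standard_normal(4000)             # toy utterance
      frames = word.reshape(-1, 400)               # 10 fixed-length frames
      feats = np.array([lpc(f * np.hamming(400)) for f in frames])
      print(dtw(feats, feats))                     # 0.0 for identical utterances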
