voice recognition system: Topics by Science.gov

Sample records for voice recognition system

The Voice as Computer Interface: A Look at Tomorrow's Technologies.

ERIC Educational Resources Information Center

Lange, Holley R.

1991-01-01

Discussion of voice as the communications device for computer-human interaction focuses on voice recognition systems for use within a library environment. Voice technologies are described, including voice response and voice recognition; examples of voice systems in use in libraries are examined; and further possibilities, including use with…
Literature review of voice recognition and generation technology for Army helicopter applications

NASA Astrophysics Data System (ADS)

Christ, K. A.

1984-08-01

This report is a literature review on the topics of voice recognition and generation. Areas covered are: manual versus vocal data input, vocabulary, stress and workload, noise, protective masks, feedback, and voice warning systems. Results of the studies presented in this report indicate that voice data entry has less of an impact on a pilot's flight performance, during low-level flying and other difficult missions, than manual data entry. However, the stress resulting from such missions may cause the pilot's voice to change, reducing the recognition accuracy of the system. The noise present in helicopter cockpits also causes the recognition accuracy to decrease. Noise-cancelling devices are being developed and improved upon to increase the recognition performance in noisy environments. Future research in the fields of voice recognition and generation should be conducted in the areas of stress and workload, vocabulary, and the types of voice generation best suited for the helicopter cockpit. Also, specific tasks should be studied to determine whether voice recognition and generation can be effectively applied.
Effects of emotional and perceptual-motor stress on a voice recognition system's accuracy: An applied investigation

NASA Astrophysics Data System (ADS)

Poock, G. K.; Martin, B. J.

1984-02-01

This was an applied investigation examining the ability of a speech recognition system to recognize speakers' inputs when the speakers were under different stress levels. Subjects were asked to speak to a voice recognition system under three conditions: (1) normal office environment, (2) emotional stress, and (3) perceptual-motor stress. Results indicate a definite relationship between voice recognition system performance and the type of low stress reference patterns used to achieve recognition.
Evaluation of a voice recognition system for the MOTAS pseudo pilot station function

NASA Technical Reports Server (NTRS)

Houck, J. A.

1982-01-01

The Langley Research Center has undertaken a technology development activity to provide a capability, the mission oriented terminal area simulation (MOTAS), wherein terminal area and aircraft systems studies can be performed. An experiment was conducted to evaluate state-of-the-art voice recognition technology and specifically, the Threshold 600 voice recognition system to serve as an aircraft control input device for the MOTAS pseudo pilot station function. The results of the experiment using ten subjects showed a recognition error of 3.67 percent for a 48-word vocabulary tested against a programmed vocabulary of 103 words. After the ten subjects retrained the Threshold 600 system for the words which were misrecognized or rejected, the recognition error decreased to 1.96 percent. The rejection rates for both cases were less than 0.70 percent. Based on the results of the experiment, voice recognition technology and specifically the Threshold 600 voice recognition system were chosen to fulfill this MOTAS function.
Pilot study on the feasibility of a computerized speech recognition charting system.

PubMed

Feldman, C A; Stevens, D

1990-08-01

The objective of this study was to determine the feasibility of developing and using a voice recognition computerized charting system to record dental clinical examination data. More specifically, the study was designed to analyze the time and error differential between the traditional examiner/recorder method (ASSISTANT) and computerized voice recognition method (VOICE). DMFS examinations were performed twice on 20 patients using the traditional ASSISTANT and the VOICE charting system. A statistically significant difference was found when comparing the mean ASSISTANT time of 2.69 min to the VOICE time of 3.72 min (P less than 0.001). No statistically significant difference was found when comparing the mean ASSISTANT recording errors of 0.1 to VOICE recording errors of 0.6 (P = 0.059). 90% of the patients indicated they felt comfortable with the dentist talking to a computer and only 5% of the sample indicated they opposed VOICE. Results from this pilot study indicate that a charting system utilizing voice recognition technology could be considered a viable alternative to traditional examiner/recorder methods of clinical charting.
Impact of a voice recognition system on report cycle time and radiologist reading time

NASA Astrophysics Data System (ADS)

Melson, David L.; Brophy, Robert; Blaine, G. James; Jost, R. Gilbert; Brink, Gary S.

1998-07-01

Because of its exciting potential to improve clinical service, as well as reduce costs, a voice recognition system for radiological dictation was recently installed at our institution. This system will be clinically successful if it dramatically reduces radiology report turnaround time without substantially affecting radiologist dictation and editing time. This report summarizes an observer study currently under way in which radiologist reporting times using the traditional transcription system and the voice recognition system are compared. Four radiologists are observed interpreting portable intensive care unit (ICU) chest examinations at a workstation in the chest reading area. Data are recorded with the radiologists using the transcription system and using the voice recognition system. The measurements distinguish between time spent performing clerical tasks and time spent actually dictating the report. Editing time and the number of corrections made are recorded. Additionally, statistics are gathered to assess the voice recognition system's impact on the report cycle time -- the time from report dictation to availability of an edited and finalized report -- and the length of reports.
Study to determine potential flight applications and human factors design guidelines for voice recognition and synthesis systems

NASA Astrophysics Data System (ADS)

White, R. W.; Parks, D. L.

1985-07-01

A study was conducted to determine potential commercial aircraft flight deck applications and implementation guidelines for voice recognition and synthesis. At first, a survey of voice recognition and synthesis technology was undertaken to develop a working knowledge base. Then, numerous potential aircraft and simulator flight deck voice applications were identified and each proposed application was rated on a number of criteria in order to achieve an overall payoff rating. The potential voice recognition applications fell into five general categories: programming, interrogation, data entry, switch and mode selection, and continuous/time-critical action control. The ratings of the first three categories showed the most promise of being beneficial to flight deck operations. Possible applications of voice synthesis systems were categorized as automatic or pilot selectable and many were rated as being potentially beneficial. In addition, voice system implementation guidelines and pertinent performance criteria are proposed. Finally, the findings of this study are compared with those made in a recent NASA study of a 1995 transport concept.
Study to determine potential flight applications and human factors design guidelines for voice recognition and synthesis systems

NASA Technical Reports Server (NTRS)

White, R. W.; Parks, D. L.

1985-01-01

A study was conducted to determine potential commercial aircraft flight deck applications and implementation guidelines for voice recognition and synthesis. At first, a survey of voice recognition and synthesis technology was undertaken to develop a working knowledge base. Then, numerous potential aircraft and simulator flight deck voice applications were identified and each proposed application was rated on a number of criteria in order to achieve an overall payoff rating. The potential voice recognition applications fell into five general categories: programming, interrogation, data entry, switch and mode selection, and continuous/time-critical action control. The ratings of the first three categories showed the most promise of being beneficial to flight deck operations. Possible applications of voice synthesis systems were categorized as automatic or pilot selectable and many were rated as being potentially beneficial. In addition, voice system implementation guidelines and pertinent performance criteria are proposed. Finally, the findings of this study are compared with those made in a recent NASA study of a 1995 transport concept.
Motorcycle Start-stop System based on Intelligent Biometric Voice Recognition

NASA Astrophysics Data System (ADS)

Winda, A.; E Byan, W. R.; Sofyan; Armansyah; Zariantin, D. L.; Josep, B. G.

2017-03-01

The mechanical key currently used in motorcycles is prone to burglary and to being stolen or misplaced. Intelligent biometric voice recognition is proposed as an alternative to this mechanism. The proposed system decides whether the voice belongs to the user and whether the word uttered by the user is ‘On’ or ‘Off’. The decision is sent to an Arduino in order to start or stop the engine. The recorded voice is processed to extract features, which are later used as input to the proposed system. The Mel-Frequency Cepstral Coefficient (MFCC) is adopted as the feature extraction technique. The extracted features are then used as input to the SVM-based identifier. Experimental results confirm the effectiveness of the proposed intelligent voice recognition and word recognition system. They show that the proposed method produces good training and testing accuracy, 99.31% and 99.43%, respectively. Moreover, the proposed system shows a false rejection rate (FRR) and false acceptance rate (FAR) of 0.18% and 17.58%, respectively. For intelligent word recognition, the training and testing accuracy are 100% and 96.3%, respectively.
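The false rejection rate (FRR) and false acceptance rate (FAR) quoted in the abstract above are conventionally defined as the fraction of genuine attempts that are rejected and the fraction of impostor attempts that are accepted. A minimal Python sketch of that arithmetic follows; the function name `far_frr` and the trial counts are hypothetical and chosen only for illustration, not taken from the paper.

```python
def far_frr(genuine_accepted, genuine_rejected, impostor_accepted, impostor_rejected):
    """Return (FAR, FRR) as fractions computed from verification trial counts."""
    genuine_total = genuine_accepted + genuine_rejected
    impostor_total = impostor_accepted + impostor_rejected
    if genuine_total == 0 or impostor_total == 0:
        raise ValueError("need at least one genuine and one impostor trial")
    far = impostor_accepted / impostor_total  # impostors wrongly let in
    frr = genuine_rejected / genuine_total    # legitimate users wrongly kept out
    return far, frr


# Hypothetical counts, for illustration only (not the paper's data).
far, frr = far_frr(genuine_accepted=95, genuine_rejected=5,
                   impostor_accepted=2, impostor_rejected=98)
print(f"FAR = {far:.2%}, FRR = {frr:.2%}")
```

For an ignition lock, a high FAR is the more serious failure, since it measures how often an unauthorized voice would be accepted.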
Voice Recognition Software Accuracy with Second Language Speakers of English.

ERIC Educational Resources Information Center

Coniam, D.

1999-01-01

Explores the potential of the use of voice-recognition technology with second-language speakers of English. Involves the analysis of the output produced by a small group of very competent second-language subjects reading a text into the voice recognition software Dragon Systems "Dragon NaturallySpeaking." (Author/VWL)
Noise Robust Speech Recognition Applied to Voice-Driven Wheelchair

NASA Astrophysics Data System (ADS)

Sasou, Akira; Kojima, Hiroaki

2009-12-01

Conventional voice-driven wheelchairs usually employ headset microphones that are capable of achieving sufficient recognition accuracy, even in the presence of surrounding noise. However, such interfaces require users to wear sensors such as a headset microphone, which can be an impediment, especially for users with hand disabilities. Conversely, it is also well known that speech recognition accuracy degrades drastically when the microphone is placed far from the user. In this paper, we develop a noise-robust speech recognition system for a voice-driven wheelchair. This system can achieve almost the same recognition accuracy as a headset microphone without requiring the user to wear sensors. Experiments in different environments confirmed the effectiveness of the system.
Do What I Say! Voice Recognition Makes Major Advances.

ERIC Educational Resources Information Center

Ruley, C. Dorsey

1994-01-01

Explains voice recognition technology applications in the workplace, schools, and libraries. Highlights include a voice-controlled work station using the DragonDictate system that can be used with dyslexic students, converting text to speech, and converting speech to text. (LRW)
Writing with Voice: An Investigation of the Use of a Voice Recognition System as a Writing Aid for a Man with Aphasia

ERIC Educational Resources Information Center

Bruce, Carolyn; Edmundson, Anne; Coleman, Michael

2003-01-01

Background: People with aphasia may experience difficulties that prevent them from demonstrating in writing what they know and can produce orally. Voice recognition systems that allow the user to speak into a microphone and see their words appear on a computer screen have the potential to assist written communication. Aim: This study investigated…
Cost-sensitive learning for emotion robust speaker recognition.

PubMed

Li, Dongdong; Yang, Yingchun; Dai, Weihui

2014-01-01

In the field of information security, voice is one of the most important biometrics. In particular, with the development of voice communication over the Internet and telephone systems, huge voice data resources are now accessible. In speaker recognition, a voiceprint can serve as a unique password with which users prove their identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper addresses the problem by introducing a cost-sensitive learning technique that reweights the probability of test affective utterances at the pitch envelope level, which can effectively enhance robustness in emotion-dependent speaker recognition. Based on that technique, a new architecture for the recognition system, along with its components, is proposed. An experiment conducted on the Mandarin Affective Speech Corpus shows an improvement of 8% in identification rate over traditional speaker recognition.
Cost-Sensitive Learning for Emotion Robust Speaker Recognition

PubMed Central

Li, Dongdong; Yang, Yingchun

2014-01-01

In the field of information security, voice is one of the most important biometrics. In particular, with the development of voice communication over the Internet and telephone systems, huge voice data resources are now accessible. In speaker recognition, a voiceprint can serve as a unique password with which users prove their identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper addresses the problem by introducing a cost-sensitive learning technique that reweights the probability of test affective utterances at the pitch envelope level, which can effectively enhance robustness in emotion-dependent speaker recognition. Based on that technique, a new architecture for the recognition system, along with its components, is proposed. An experiment conducted on the Mandarin Affective Speech Corpus shows an improvement of 8% in identification rate over traditional speaker recognition. PMID:24999492
DLMS Voice Data Entry.

DTIC Science & Technology

1980-06-01

List of illustrations: Figure 1, Block Diagram of DLMS Voice Recognition System; Figure 2, Flowchart of DefaulV...particular are a speech preprocessor and a minicomputer. In the VRS, as shown in the block diagram of Fig. 1, the preprocessor is a TTI model 8040 and...Data General 6026 Magnetic Tape Unit...Display...Equipment Cabinet...Fig. 1 Block Diagram of DLMS Voice Recognition System...2. Flexible Disk
Generation of surgical pathology report using a 5,000-word speech recognizer.

PubMed

Tischler, A S; Martin, M R

1989-10-01

Pressures to decrease both turnaround time and operating costs simultaneously have placed conflicting demands on traditional forms of medical transcription. The new technology of voice recognition extends the promise of enabling the pathologist or other medical professional to dictate a correct report and have it printed and/or transmitted to a database immediately. The usefulness of voice recognition systems depends on several factors, including ease of use, reliability, speed, and accuracy. These in turn depend on the general underlying design of the systems and inclusion in the systems of a specific knowledge base appropriate for each application. Development of a good knowledge base requires close collaboration between a domain expert and a knowledge engineer with expertise in voice recognition. The authors have recently completed a knowledge base for surgical pathology using the Kurzweil VoiceReport 5,000-word system.
Micro-Based Speech Recognition: Instructional Innovation for Handicapped Learners.

ERIC Educational Resources Information Center

Horn, Carin E.; Scott, Brian L.

A new voice based learning system (VBLS), which allows the handicapped user to interact with a microcomputer by voice commands, is described. Speech or voice recognition is the computerized process of identifying a spoken word or phrase, including those resulting from speech impediments. This new technology is helpful to the severely physically…
"The perceptual bases of speaker identity" revisited

NASA Astrophysics Data System (ADS)

Voiers, William D.

2003-10-01

A series of experiments begun 40 years ago [W. D. Voiers, J. Acoust. Soc. Am. 36, 1065-1073 (1964)] was concerned with identifying the perceived voice traits (PVTs) on which human recognition of voices depends. It culminated with the development of a voice taxonomy based on 20 PVTs and a set of highly reliable rating scales for classifying voices with respect to those PVTs. The development of a perceptual voice taxonomy was motivated by the need for a practical method of evaluating speaker recognizability in voice communication systems. The Diagnostic Speaker Recognition Test (DSRT) evaluates the effects of systems on speaker recognizability as reflected in changes in the inter-listener reliability of voice ratings on the 20 PVTs. The DSRT thus provides a qualitative, as well as quantitative, evaluation of the effects of a system on speaker recognizability. A fringe benefit of this project is PVT rating data for a sample of 680 voices. [Work partially supported by USAFRL.]
Task-Oriented, Naturally Elicited Speech (TONE) Database for the Force Requirements Expert System, Hawaii (FRESH)

DTIC Science & Technology

1988-09-01

Command and control; Computational linguistics; expert system voice recognition; man-machine interface...simulates the characteristics of FRESH on a smaller scale. This study assisted NOSC in developing a voice-recognition, man-machine interface that could be used with TONE and upgraded at a later date

The recognition of female voice based on voice registers in singing techniques in real-time using hankel transform method and macdonald function

NASA Astrophysics Data System (ADS)

Meiyanti, R.; Subandi, A.; Fuqara, N.; Budiman, M. A.; Siahaan, A. P. U.

2018-03-01

A singer does not just recite the lyrics of a song but also uses particular vocal techniques to make it more beautiful. In singing, female voices tend to have more diverse registers than male voices. The human voice has many registers; those used in singing include chest voice, head voice, falsetto, and vocal fry. This research on speech recognition based on female voice registers in singing technique was built using Borland Delphi 7.0. Recognition is performed on recorded voice samples and also in real time. Voice input yields weighted energy values calculated using the Hankel transform method and Macdonald functions. The results showed that the accuracy of the system depends on the accuracy of the sound technique in the samples that were trained and tested. The average rate of successful recognition of voice registers reached 48.75 percent for recorded samples and 57 percent in real time.
Practical applications of interactive voice technologies: Some accomplishments and prospects

NASA Technical Reports Server (NTRS)

Grady, Michael W.; Hicklin, M. B.; Porter, J. E.

1977-01-01

A technology assessment of the application of computers and electronics to complex systems is presented. Three existing systems which utilize voice technology (speech recognition and speech generation) are described. Future directions in voice technology are also described.
Voice tracking and spoken word recognition in the presence of other voices

NASA Astrophysics Data System (ADS)

Litong-Palima, Marisciel; Violanda, Renante; Saloma, Caesar

2004-12-01

We study the human hearing process by modeling the hair cell as a thresholded Hopf bifurcator and compare our calculations with experimental results involving human subjects in two different multi-source listening tasks: voice tracking and spoken-word recognition. In the model, we observed noise suppression by destructive interference between noise sources, which weakens the effective noise strength acting on the hair cell. Different success rate characteristics were observed for the two tasks. Hair cell performance at low threshold levels agrees well with results from voice-tracking experiments, while results from word-recognition experiments are consistent with a linear model of the hearing process. The ability of humans to track a target voice is robust against cross-talk interference, unlike word-recognition performance, which deteriorates quickly with the number of uncorrelated noise sources in the environment, a response behavior associated with linear systems.
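For readers unfamiliar with the term, a Hopf bifurcator is commonly written in normal form as dz/dt = (mu + i*omega) z - |z|^2 z, where the complex state z decays to rest for mu < 0 and settles onto a limit cycle of radius sqrt(mu) for mu > 0. The Python sketch below integrates that generic normal form with forward Euler. It is an illustration under stated assumptions only: the threshold and the multi-source noise inputs of the authors' hair-cell model are not reproduced, and the function name and parameter values are hypothetical.

```python
def simulate_hopf(mu, omega=1.0, z0=0.1 + 0.0j, dt=0.001, steps=20000):
    """Integrate dz/dt = (mu + i*omega) z - |z|^2 z with forward Euler.

    Returns the final oscillation amplitude |z|. This is the generic Hopf
    normal form, not the thresholded hair-cell model from the abstract.
    """
    z = z0
    for _ in range(steps):
        z += dt * ((mu + 1j * omega) * z - (abs(z) ** 2) * z)
    return abs(z)


# Below the bifurcation (mu < 0) the oscillation dies out;
# above it (mu > 0) the amplitude approaches sqrt(mu).
print(simulate_hopf(-1.0))
print(simulate_hopf(0.25))
```

Operating near mu = 0 is what makes such an oscillator attractive as a hearing model: small inputs there produce disproportionately large responses.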
Real-Time Reconfigurable Adaptive Speech Recognition Command and Control Apparatus and Method

NASA Technical Reports Server (NTRS)

Salazar, George A. (Inventor); Haynes, Dena S. (Inventor); Sommers, Marc J. (Inventor)

1998-01-01

An adaptive speech recognition and control system and method are discussed for controlling various mechanisms and systems in response to spoken instructions, in which spoken commands are effective to direct the system into appropriate memory nodes and to the respective memory templates corresponding to the voiced command. Spoken commands from any of a group of operators for which the system is trained may be identified, and voice templates are updated as required in response to changes in pronunciation and voice characteristics over time of any of the operators for which the system is trained. Provisions are made for both near-real-time retraining of the system with respect to individual terms which are determined not to be positively identified, and for an overall system training and updating process in which recognition of each command and vocabulary term is checked, and in which the memory templates are retrained if necessary for respective commands or vocabulary terms with respect to an operator currently using the system. In one embodiment, the system includes input circuitry connected to a microphone and including signal processing and control sections for sensing the level of vocabulary recognition over a given period and, if recognition performance falls below a given level, processing audio-derived signals for enhancing recognition performance of the system.
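The idea of sensing the level of recognition over a given period and reacting when it falls below a given level can be sketched as a sliding-window monitor. The Python below is only an illustration of that idea, not the patented apparatus: the class name `RecognitionMonitor`, the window size, and the threshold are all hypothetical choices.

```python
from collections import deque


class RecognitionMonitor:
    """Track recent recognition outcomes and flag when retraining looks warranted."""

    def __init__(self, window=20, min_rate=0.8):
        # Keep only the most recent `window` outcomes (True = recognized).
        self.outcomes = deque(maxlen=window)
        self.min_rate = min_rate

    def record(self, recognized):
        self.outcomes.append(bool(recognized))

    def rate(self):
        # With no data yet, report a perfect rate so nothing is triggered.
        if not self.outcomes:
            return 1.0
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        # Judge only once the window is full, to avoid reacting to a few early misses.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.rate() < self.min_rate


# Hypothetical usage: three of the last five commands were recognized.
monitor = RecognitionMonitor(window=5, min_rate=0.8)
for ok in [True, True, False, False, True]:
    monitor.record(ok)
print(monitor.rate(), monitor.needs_retraining())
```

A per-operator or per-vocabulary-term monitor would follow the same pattern, with one instance kept for each operator or term.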
[Computer-assisted voice recognition. A dream or reality in the pathologist's routine work?].

PubMed

Delling, G; Delling, D

1999-03-01

During the last 30 years the analysis of human speech with powerful computers has taken great strides; therefore, cost-effective, comfortable solutions are now available for use in professional routine work. The advantages of using voice recognition are the creation of new documentation or archives, reduced personnel costs and, last but not least, independence from staff absences due to unforeseen illness or annual leave. For voice recognition systems to be used easily, a considerable amount of time must be invested during the first 3 months. Younger colleagues in particular will be more motivated to dictate more precisely and in more detail because of the introduction of voice recognition. The effects on other sectors of medical training, quality control, histology report preparation, and transmission can only be speculated.
When the face fits: recognition of celebrities from matching and mismatching faces and voices.

PubMed

Stevenage, Sarah V; Neil, Greg J; Hamlin, Iain

2014-01-01

The results of two experiments are presented in which participants engaged in a face-recognition or a voice-recognition task. The stimuli were face-voice pairs in which the face and voice were co-presented and were either "matched" (same person), "related" (two highly associated people), or "mismatched" (two unrelated people). Analysis in both experiments confirmed that accuracy and confidence in face recognition were consistently high regardless of the identity of the accompanying voice. However, accuracy of voice recognition was increasingly affected as the relationship between voice and accompanying face declined. Moreover, when considering self-reported confidence in voice recognition, confidence remained high for correct responses despite the proportion of these responses declining across conditions. These results converged with existing evidence indicating the vulnerability of voice recognition as a relatively weak signaller of identity, and results are discussed in the context of a person-recognition framework.
Superior voice recognition in a patient with acquired prosopagnosia and object agnosia.

PubMed

Hoover, Adria E N; Démonet, Jean-François; Steeves, Jennifer K E

2010-11-01

Anecdotally, it has been reported that individuals with acquired prosopagnosia compensate for their inability to recognize faces by using other person identity cues such as hair, gait or the voice. Are they therefore superior at the use of non-face cues, specifically voices, to person identity? Here, we empirically measure person and object identity recognition in a patient with acquired prosopagnosia and object agnosia. We quantify person identity (face and voice) and object identity (car and horn) recognition for visual, auditory, and bimodal (visual and auditory) stimuli. The patient is unable to recognize faces or cars, consistent with his prosopagnosia and object agnosia, respectively. He is perfectly able to recognize people's voices and car horns and bimodal stimuli. These data show a reverse shift in the typical weighting of visual over auditory information for audiovisual stimuli in a compromised visual recognition system. Moreover, the patient shows selectively superior voice recognition compared to the controls revealing that two different stimulus domains, persons and objects, may not be equally affected by sensory adaptation effects. This also implies that person and object identity recognition are processed in separate pathways. These data demonstrate that an individual with acquired prosopagnosia and object agnosia can compensate for the visual impairment and become quite skilled at using spared aspects of sensory processing. In the case of acquired prosopagnosia it is advantageous to develop a superior use of voices for person identity recognition in everyday life. Copyright © 2010 Elsevier Ltd. All rights reserved.
Obligatory and facultative brain regions for voice-identity recognition

PubMed Central

Roswandowitz, Claudia; Kappes, Claudia; Obrig, Hellmuth; von Kriegstein, Katharina

2018-01-01

Recognizing the identity of others by their voice is an important skill for social interactions. To date, it remains controversial which parts of the brain are critical structures for this skill. Based on neuroimaging findings, standard models of person-identity recognition suggest that the right temporal lobe is the hub for voice-identity recognition. Neuropsychological case studies, however, reported selective deficits of voice-identity recognition in patients predominantly with right inferior parietal lobe lesions. Here, our aim was to work towards resolving the discrepancy between neuroimaging studies and neuropsychological case studies to find out which brain structures are critical for voice-identity recognition in humans. We performed a voxel-based lesion-behaviour mapping study in a cohort of patients (n = 58) with unilateral focal brain lesions. The study included a comprehensive behavioural test battery on voice-identity recognition of newly learned (voice-name, voice-face association learning) and familiar voices (famous voice recognition) as well as visual (face-identity recognition) and acoustic control tests (vocal-pitch and vocal-timbre discrimination). The study also comprised clinically established tests (neuropsychological assessment, audiometry) and high-resolution structural brain images. The three key findings were: (i) a strong association between voice-identity recognition performance and right posterior/mid temporal and right inferior parietal lobe lesions; (ii) a selective association between right posterior/mid temporal lobe lesions and voice-identity recognition performance when face-identity recognition performance was factored out; and (iii) an association of right inferior parietal lobe lesions with tasks requiring the association between voices and faces but not voices and names.
The results imply that the right posterior/mid temporal lobe is an obligatory structure for voice-identity recognition, while the inferior parietal lobe is only a facultative component of voice-identity recognition in situations where additional face-identity processing is required. PMID:29228111
Obligatory and facultative brain regions for voice-identity recognition.

PubMed

Roswandowitz, Claudia; Kappes, Claudia; Obrig, Hellmuth; von Kriegstein, Katharina

2018-01-01

Recognizing the identity of others by their voice is an important skill for social interactions. To date, it remains controversial which parts of the brain are critical structures for this skill. Based on neuroimaging findings, standard models of person-identity recognition suggest that the right temporal lobe is the hub for voice-identity recognition. Neuropsychological case studies, however, reported selective deficits of voice-identity recognition in patients predominantly with right inferior parietal lobe lesions. Here, our aim was to work towards resolving the discrepancy between neuroimaging studies and neuropsychological case studies to find out which brain structures are critical for voice-identity recognition in humans. We performed a voxel-based lesion-behaviour mapping study in a cohort of patients (n = 58) with unilateral focal brain lesions. The study included a comprehensive behavioural test battery on voice-identity recognition of newly learned (voice-name, voice-face association learning) and familiar voices (famous voice recognition) as well as visual (face-identity recognition) and acoustic control tests (vocal-pitch and vocal-timbre discrimination). The study also comprised clinically established tests (neuropsychological assessment, audiometry) and high-resolution structural brain images. The three key findings were: (i) a strong association between voice-identity recognition performance and right posterior/mid temporal and right inferior parietal lobe lesions; (ii) a selective association between right posterior/mid temporal lobe lesions and voice-identity recognition performance when face-identity recognition performance was factored out; and (iii) an association of right inferior parietal lobe lesions with tasks requiring the association between voices and faces but not voices and names. 
The results imply that the right posterior/mid temporal lobe is an obligatory structure for voice-identity recognition, while the inferior parietal lobe is only a facultative component of voice-identity recognition in situations where additional face-identity processing is required. © The Author (2017). Published by Oxford University Press on behalf of the Guarantors of Brain.
An automatic speech recognition system with speaker-independent identification support

NASA Astrophysics Data System (ADS)

Caranica, Alexandru; Burileanu, Corneliu

2015-02-01

The novelty of this work lies in the application of an open source research software toolkit (CMU Sphinx) to train, build and evaluate a speech recognition system, with speaker-independent support, for voice-controlled hardware applications. Moreover, we propose to use the trained acoustic model to decode offline voice commands on embedded hardware, such as an ARMv6 low-cost SoC, the Raspberry Pi. This type of single-board computer, mainly used for educational and research activities, can serve as a proof-of-concept software and hardware stack for low cost voice automation systems.
Remote voice training: A case study on space shuttle applications, appendix C

NASA Technical Reports Server (NTRS)

Mollakarimi, Cindy; Hamid, Tamin

1990-01-01

The Tile Automation System includes applications of automation and robotics technology to all aspects of the Shuttle tile processing and inspection system. An integrated set of rapid prototyping testbeds was developed which include speech recognition and synthesis, laser imaging systems, distributed Ada programming environments, distributed relational data base architectures, distributed computer network architectures, multi-media workbenches, and human factors considerations. Remote voice training in the Tile Automation System is discussed. The user is prompted over a headset by synthesized speech for the training sequences. The voice recognition units and the voice output units are remote from the user and are connected by Ethernet to the main computer system. A supervisory channel is used to monitor the training sequences. Discussions include the training approaches as well as the human factors problems and solutions for this system utilizing remote training techniques.
[Research on Control System of an Exoskeleton Upper-limb Rehabilitation Robot].

PubMed

Wang, Lulu; Hu, Xin; Hu, Jie; Fang, Youfang; He, Rongrong; Yu, Hongliu

2016-12-01

In order to help patients with upper-limb dysfunction carry out rehabilitation training, this paper proposed an upper-limb exoskeleton rehabilitation robot with four degrees of freedom (DOF), and realized two control schemes, i.e., voice control and electromyography control. The hardware and software design of the voice control system was completed based on RSC-4128 chips, which realized speech recognition for a specific person (speaker-dependent recognition). Besides, this study adopted self-made surface electromyogram (sEMG) signal extraction electrodes to collect sEMG signals and realized pattern recognition by processing the sEMG signals, extracting time domain features and applying a fixed threshold algorithm. In addition, the pulse-width modulation (PWM) algorithm was used to realize the speed adjustment of the system. Voice control and electromyography control experiments were then carried out, and the results showed that the mean recognition rate of the voice control and electromyography control reached 93.1% and 90.9%, respectively. The results proved the feasibility of the control system. This study is expected to lay a theoretical foundation for the further improvement of the control system of the upper-limb rehabilitation robot.
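The sEMG pipeline described above (time domain features followed by a fixed threshold) can be illustrated with a minimal sketch. The abstract does not say which time domain features were used, so the mean absolute value, the window size and the threshold below are all assumptions chosen for illustration, not the authors' method.

```python
def mean_absolute_value(window):
    """Mean absolute value (MAV), a common time-domain sEMG feature."""
    return sum(abs(x) for x in window) / len(window)


def detect_activation(samples, window_size, threshold):
    """Return one boolean per non-overlapping window.

    A window is flagged as active when its MAV exceeds the fixed
    threshold. Trailing samples that do not fill a window are ignored.
    """
    flags = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        flags.append(mean_absolute_value(window) > threshold)
    return flags


# Hypothetical signal: a quiet segment followed by a muscle contraction.
signal = [0.01, -0.02, 0.01, -0.01, 0.5, -0.6, 0.4, -0.5]
activation = detect_activation(signal, window_size=4, threshold=0.1)
```

In a real controller the threshold would have to be calibrated per user and per electrode placement, which is one reason fixed-threshold schemes are usually treated as a baseline.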
Robotics control using isolated word recognition of voice input

NASA Technical Reports Server (NTRS)

Weiner, J. M.

1977-01-01

A speech input/output system is presented that can be used to communicate with a task oriented system. Human speech commands and synthesized voice output extend conventional information exchange capabilities between man and machine by utilizing audio input and output channels. The speech input facility is comprised of a hardware feature extractor and a microprocessor implemented isolated word or phrase recognition system. The recognizer offers a medium sized (100 commands), syntactically constrained vocabulary, and exhibits close to real time performance. The major portion of the recognition processing required is accomplished through software, minimizing the complexity of the hardware feature extractor.
STS-41 Voice Command System Flight Experiment Report

NASA Technical Reports Server (NTRS)

Salazar, George A.

1981-01-01

This report presents the results of the Voice Command System (VCS) flight experiment on the five-day STS-41 mission. Two mission specialists, Bill Shepherd and Bruce Melnick, used the speaker-dependent system to evaluate the operational effectiveness of using voice to control a spacecraft system. In addition, data was gathered to analyze the effects of microgravity on speech recognition performance.
Voice recognition software for clinical use.

PubMed

Korn, K

1998-11-01

The current generation of voice recognition products truly offers the promise of voice recognition systems that are financially and operationally acceptable for use in a health care facility. Although the initial capital outlay for the purchase of such equipment may be substantial, the long-term benefit is felt to outweigh the expense. The ability to utilize computer equipment for educational purposes and information management alone helps to rationalize the cost. In addition, it is important to remember that the Internet has become a substantial source of information which provides another functional use for this equipment. Although one can readily see the implication for such a program in clinical practice, other uses for the program should not be overlooked. Uses far beyond the writing of clinic notes and correspondence can be easily envisioned. Utilization of voice recognition software offers clinical practices the ability to produce quality printed records in a timely and cost-effective manner. After learning procedures for the selected product and appropriately formatting word processing software and printers, printed progress notes should be able to be produced in less time than traditional dictation and transcription methods. Although certain procedures and practices may need to be altered, or may preclude optimal utilization of this type of system, many advantages are apparent. It is recommended that facilities consider utilization of Voice Recognition products such as Dragon Systems Naturally Speaking Software, or at least consider a trial of this method with one of the limited-feature products, if current dictation practices are unsatisfactory or excessively costly. Free downloadable trial software or single user software can provide a reduced-cost method for trial evaluation of such products if a major commitment is not felt to be desired.
A list of voice recognition software manufacturer web sites may be accessed through the following: http://www.dragonsys.com/ http://www.software.ibm/com/is/voicetype/ http://www.lhs.com/
Evaluation of Speech Recognition of Cochlear Implant Recipients Using Adaptive, Digital Remote Microphone Technology and a Speech Enhancement Sound Processing Algorithm.

PubMed

Wolfe, Jace; Morais, Mila; Schafer, Erin; Agrawal, Smita; Koch, Dawn

2015-05-01

Cochlear implant recipients often experience difficulty with understanding speech in the presence of noise. Cochlear implant manufacturers have developed sound processing algorithms designed to improve speech recognition in noise, and research has shown these technologies to be effective. Remote microphone technology utilizing adaptive, digital wireless radio transmission has also been shown to provide significant improvement in speech recognition in noise. There are no studies examining the potential improvement in speech recognition in noise when these two technologies are used simultaneously. The goal of this study was to evaluate the potential benefits and limitations associated with the simultaneous use of a sound processing algorithm designed to improve performance in noise (Advanced Bionics ClearVoice) and a remote microphone system that incorporates adaptive, digital wireless radio transmission (Phonak Roger). A two-by-two repeated measures design was used to examine performance differences obtained without these technologies compared to the use of each technology separately as well as the simultaneous use of both technologies. The participants were eleven Advanced Bionics (AB) cochlear implant recipients, ages 11 to 68 yr. AzBio sentence recognition was measured in quiet and in the presence of classroom noise ranging in level from 50 to 80 dBA in 5-dB steps. Performance was evaluated in four conditions: (1) No ClearVoice and no Roger, (2) ClearVoice enabled without the use of Roger, (3) ClearVoice disabled with Roger enabled, and (4) simultaneous use of ClearVoice and Roger. Speech recognition in quiet was better than speech recognition in noise for all conditions. Use of ClearVoice and Roger each provided significant improvement in speech recognition in noise. The best performance in noise was obtained with the simultaneous use of ClearVoice and Roger. ClearVoice and Roger technology each improves speech recognition in noise, particularly when used at the same time.
Because ClearVoice does not degrade performance in quiet settings, clinicians should consider recommending ClearVoice for routine, full-time use for AB implant recipients. Roger should be used in all instances in which remote microphone technology may assist the user in understanding speech in the presence of noise. American Academy of Audiology.
Temporal voice areas exist in autism spectrum disorder but are dysfunctional for voice identity recognition

PubMed Central

Borowiak, Kamila; von Kriegstein, Katharina

2016-01-01

The ability to recognise the identity of others is a key requirement for successful communication. Brain regions that respond selectively to voices exist in humans from early infancy on. Currently, it is unclear whether dysfunction of these voice-sensitive regions can explain voice identity recognition impairments. Here, we used two independent functional magnetic resonance imaging studies to investigate voice processing in a population that has been reported to have no voice-sensitive regions: autism spectrum disorder (ASD). Our results refute the earlier report that individuals with ASD have no responses in voice-sensitive regions: Passive listening to vocal, compared to non-vocal, sounds elicited typical responses in voice-sensitive regions in the high-functioning ASD group and controls. In contrast, the ASD group had a dysfunction in voice-sensitive regions during voice identity but not speech recognition in the right posterior superior temporal sulcus/gyrus (STS/STG)—a region implicated in processing complex spectrotemporal voice features and unfamiliar voices. The right anterior STS/STG correlated with voice identity recognition performance in controls but not in the ASD group. The findings suggest that right STS/STG dysfunction is critical for explaining voice recognition impairments in high-functioning ASD and show that ASD is not characterised by a general lack of voice-sensitive responses. PMID:27369067
Implementation of the Intelligent Voice System for Kazakh

NASA Astrophysics Data System (ADS)

Yessenbayev, Zh; Saparkhojayev, N.; Tibeyev, T.

2014-04-01

Modern speech technologies are highly advanced and widely used in day-to-day applications. However, this mostly concerns the languages of well-developed countries, such as English, German, Japanese, Russian, etc. As for Kazakh, the situation is less advanced and research in this field is only starting to evolve. In this research and application-oriented project, we introduce an intelligent voice system for the fast deployment of call-centers and information desks supporting Kazakh speech. The demand for such a system is obvious if the country's large size and small population are considered. Landline and cell phones become the only means of communication for the distant villages and suburbs. The system features Kazakh speech recognition and synthesis modules as well as a web-GUI for efficient dialog management. For speech recognition we use the CMU Sphinx engine and for speech synthesis, MaryTTS. The web-GUI is implemented in Java, enabling operators to quickly create and manage the dialogs in a user-friendly graphical environment. The call routines are handled by Asterisk PBX and JBoss Application Server. The system supports such technologies and protocols as VoIP, VoiceXML, FastAGI, Java SpeechAPI and J2EE. For the speech recognition experiments we compiled and used the first Kazakh speech corpus with the utterances from 169 native speakers. The performance of the speech recognizer is 4.1% WER on isolated word recognition and 6.9% WER on clean continuous speech recognition tasks. The speech synthesis experiments include the training of male and female voices.
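For readers unfamiliar with the WER figures quoted above, word error rate is conventionally the word-level edit distance between the recognizer output and the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name and the example strings are illustrative and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
wer = word_error_rate("a b c d", "a x c")
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.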
Recognition of voice commands using adaptation of foreign language speech recognizer via selection of phonetic transcriptions

NASA Astrophysics Data System (ADS)

Maskeliunas, Rytis; Rudzionis, Vytautas

2011-06-01

In recent years various commercial speech recognizers have become available. These recognizers make it possible to develop applications incorporating various speech recognition techniques easily and quickly. These commercial recognizers are typically targeted at widely spoken languages with large market potential; however, it may be possible to adapt available commercial recognizers for use in environments where less widely spoken languages are used. Since most commercial recognition engines are closed systems, the only avenue for adaptation is to devise ways of selecting proper phonetic transcriptions between the two languages. This paper deals with methods for finding phonetic transcriptions of Lithuanian voice commands to be recognized using English speech engines. The experimental evaluation showed that it is possible to find phonetic transcriptions that will enable the recognition of Lithuanian voice commands with recognition accuracy of over 90%.
Project planning, training, measurement and sustainment: the successful implementation of voice recognition.

PubMed

Antiles, S; Couris, J; Schweitzer, A; Rosenthal, D; Da Silva, R Q

2000-01-01

Computerized voice recognition systems (VR) can reduce costs and enhance service. The capital outlay required for conversion to a VR system is significant; therefore, it is incumbent on radiology departments to provide cost and service justifications to administrators. Massachusetts General Hospital (MGH) in Boston implemented VR over a two-year period and achieved annual savings of $530,000 and a 50% decrease in report throughput. Those accomplishments required solid planning and implementation strategies, training and sustainment programs. This article walks through the process, step by step, in the hope of providing a tool set for future implementations. Because VR has dramatic implications for workflow, a solid operational plan is needed when assessing vendors and planning for implementation. The goals for implementation should be to minimize operational disruptions and capitalize on efficiencies of the technology. Senior leadership--the department chair or vice-chair--must select the goals to be accomplished and oversee, manage and direct the VR initiative. The importance of this point cannot be overstated, since implementation will require behavior changes from radiologists and others who may not perceive any personal benefits. Training is the pivotal factor affecting the success of voice recognition, and practice is the only way for radiologists to enhance their skills. Through practice, radiologists will discover shortcuts, and their speed and comfort will improve. Measurement and data analysis are critical to changing and improving the voice recognition application and are vital to decision-making. Some of the issues about which valuable data can be collected are technical and educational problems, VR penetration, report turnaround time and annual cost savings. Sustained effort is indispensable to the maintenance of voice recognition.
Finally, all efforts made and gains achieved may prove to be futile without ongoing sustainment of the system through retraining, education and technical support.

V2S: Voice to Sign Language Translation System for Malaysian Deaf People

NASA Astrophysics Data System (ADS)

Mean Foong, Oi; Low, Tang Jung; La, Wai Wan

The process of learning and understanding sign language may be cumbersome to some; therefore, this paper proposes a solution to this problem by providing a voice (English language) to sign language translation system using speech and image processing techniques. Speech processing, which includes speech recognition, is the study of recognizing the words being spoken, regardless of who the speaker is. This project uses template-based recognition as the main approach, in which the V2S system first needs to be trained with speech patterns based on some generic spectral parameter set. These spectral parameter sets are then stored as templates in a database. The system performs the recognition process by matching the parameter set of the input speech with the stored templates, and finally displays the sign language in video format. Empirical results show that the system has an 80.3% recognition rate.
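The template-matching step described above can be sketched as a nearest-template lookup. The abstract does not specify the distance measure or the feature layout, so this sketch assumes fixed-length feature vectors compared by Euclidean distance; the template labels and values are hypothetical. Practical template-based recognizers often align variable-length utterances first (for example with dynamic time warping), which is omitted here.

```python
import math


def nearest_template(features, templates):
    """Return (label, distance) of the stored template closest to `features`.

    `templates` maps a word label to a feature vector of the same length
    as `features`. Distance is plain Euclidean distance (an assumption).
    """
    best_label, best_dist = None, math.inf
    for label, template in templates.items():
        dist = math.dist(features, template)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Hypothetical stored templates, one spectral parameter set per word.
templates = {
    "hello": [0.9, 0.1, 0.3],
    "thanks": [0.2, 0.8, 0.5],
}
label, dist = nearest_template([0.85, 0.15, 0.35], templates)
```

In a system like the one described, the returned label would then select which sign language video to display.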
A Voice Enabled Procedure Browser for the International Space Station

NASA Technical Reports Server (NTRS)

Rayner, Manny; Chatzichrisafis, Nikos; Hockey, Beth Ann; Farrell, Kim; Renders, Jean-Michel

2005-01-01

Clarissa, an experimental voice enabled procedure browser that has recently been deployed on the International Space Station (ISS), is to the best of our knowledge the first spoken dialog system in space. This paper gives background on the system and the ISS procedures, then discusses the research developed to address three key problems: grammar-based speech recognition using the Regulus toolkit; SVM based methods for open microphone speech recognition; and robust side-effect free dialogue management for handling undos, corrections and confirmations.
Military applications of automatic speech recognition and future requirements

NASA Technical Reports Server (NTRS)

Beek, Bruno; Cupples, Edward J.

1977-01-01

An updated summary of the state-of-the-art of automatic speech recognition and its relevance to military applications is provided. A number of potential systems for military applications are under development. These include: (1) digital narrowband communication systems; (2) automatic speech verification; (3) on-line cartographic processing unit; (4) word recognition for militarized tactical data system; and (5) voice recognition and synthesis for aircraft cockpit.
Biometric Fusion Demonstration System Scientific Report

DTIC Science & Technology

2004-03-01

verification and facial recognition, searching watchlist databases comprised of full or partial facial images or voice recordings. Multiple-biometric... 2.2.1.1 Fingerprint and Facial Recognition... DRDC Ottawa CR 2004 – 056... 2.2.1.2 Iris Recognition and Facial Recognition...
DTO-675: Voice Control of the Closed Circuit Television System

NASA Technical Reports Server (NTRS)

Salazar, George; Gaston, Darilyn M.; Haynes, Dena S.

1996-01-01

This report presents the results of the Detail Test Object (DTO)-675 "Voice Control of the Closed Circuit Television (CCTV)" system. The DTO is a follow-on flight of the Voice Command System (VCS) that flew as a secondary payload on STS-41. Several design changes were made to the VCS for the STS-78 mission. This report discusses those design changes, the data collected during the mission, recognition problems encountered, and findings.
Evaluating a voice recognition system: finding the right product for your department.

PubMed

Freeh, M; Dewey, M; Brigham, L

2001-06-01

The Department of Radiology at the University of Utah Health Sciences Center has been in the process of transitioning from the traditional film-based department to a digital imaging department for the past 2 years. The department is now transitioning from the traditional method of dictating reports (dictation by radiologist to transcription to review and signing by radiologist) to a voice recognition system. The transition to digital operations will not be complete until we have the ability to directly interface the dictation process with the image review process. Voice recognition technology has advanced to the level where it can and should be an integral part of the new way of working in radiology and is an integral part of an efficient digital imaging department. The transition to voice recognition requires the task of identifying the product and the company that will best meet a department's needs. This report introduces the methods we used to evaluate the vendors and the products available as we made our purchasing decision. We discuss our evaluation method and provide a checklist that can be used by other departments to assist with their evaluation process. The criteria used in the evaluation process fall into the following major categories: user operations, technical infrastructure, medical dictionary, system interfaces, service support, cost, and company strength. Conclusions drawn from our evaluation process will be detailed, with the intention being to shorten the process for others as they embark on a similar venture. As more and more organizations investigate the many products and services that are now being offered to enhance the operations of a radiology department, it becomes increasingly important that solid methods are used to most effectively evaluate the new products. This report should help others complete the task of evaluating a voice recognition system and may be adaptable to other products as well.
Voice Recognition in Face-Blind Patients

PubMed Central

Liu, Ran R.; Pancaroglu, Raika; Hills, Charlotte S.; Duchaine, Brad; Barton, Jason J. S.

2016-01-01

Right or bilateral anterior temporal damage can impair face recognition, but whether this is an associative variant of prosopagnosia or part of a multimodal disorder of person recognition is an unsettled question, with implications for cognitive and neuroanatomic models of person recognition. We assessed voice perception and short-term recognition of recently heard voices in 10 subjects with impaired face recognition acquired after cerebral lesions. All 4 subjects with apperceptive prosopagnosia due to lesions limited to fusiform cortex had intact voice discrimination and recognition. One subject with bilateral fusiform and anterior temporal lesions had a combined apperceptive prosopagnosia and apperceptive phonagnosia, the first such described case. Deficits indicating a multimodal syndrome of person recognition were found only in 2 subjects with bilateral anterior temporal lesions. All 3 subjects with right anterior temporal lesions had normal voice perception and recognition, 2 of whom performed normally on perceptual discrimination of faces. This confirms that such lesions can cause a modality-specific associative prosopagnosia. PMID:25349193
A self-teaching image processing and voice-recognition-based, intelligent and interactive system to educate visually impaired children

NASA Astrophysics Data System (ADS)

Iqbal, Asim; Farooq, Umar; Mahmood, Hassan; Asad, Muhammad Usman; Khan, Akrama; Atiq, Hafiz Muhammad

2010-02-01

A self-teaching image processing and voice recognition based system is developed to educate visually impaired children, chiefly in their primary education. The system comprises a computer, a vision camera, an ear speaker and a microphone. The camera, attached to the computer system, is mounted on the ceiling opposite (at the required angle) the desk on which the book is placed. Sample images and voices in the form of instructions and commands for English and Urdu alphabets, numeric digits, operators and shapes are already stored in the database. A blind child first reads the embossed character (object) with the help of fingers, then speaks the answer (the name of the character, shape, etc.) into the microphone. When the microphone receives the child's voice command, the camera takes an image, which is processed by a MATLAB® program developed with the help of the Image Acquisition and Image Processing toolboxes; the program generates a response or the required set of instructions for the child via the ear speaker, resulting in self-education of a visually impaired child. A speech recognition program is also developed in MATLAB® with the help of the Data Acquisition and Signal Processing toolboxes, which records and processes the command of the blind child.
Scientific bases of human-machine communication by voice.

PubMed Central

Schafer, R W

1995-01-01

The scientific bases for human-machine communication by voice are in the fields of psychology, linguistics, acoustics, signal processing, computer science, and integrated circuit technology. The purpose of this paper is to highlight the basic scientific and technological issues in human-machine communication by voice and to point out areas of future research opportunity. The discussion is organized around the following major issues in implementing human-machine voice communication systems: (i) hardware/software implementation of the system, (ii) speech synthesis for voice output, (iii) speech recognition and understanding for voice input, and (iv) usability factors related to how humans interact with machines. PMID:7479802
Artificially intelligent recognition of Arabic speaker using voice print-based local features

NASA Astrophysics Data System (ADS)

Mahmood, Awais; Alsulaiman, Mansour; Muhammad, Ghulam; Akram, Sheeraz

2016-11-01

Local features for any pattern recognition system are based on information extracted locally. In this paper, a local feature extraction technique was developed. The feature was extracted in the time-frequency plane by taking the moving average along the diagonal directions of the plane. This feature captured the time-frequency events, producing a unique pattern for each speaker that can be viewed as a voice print of the speaker. Hence, we referred to this technique as the voice print-based local feature. The proposed feature was compared to other features, including the mel-frequency cepstral coefficient (MFCC), for speaker recognition using two different databases. One of the databases used in the comparison is a subset of an LDC database that consisted of two short sentences uttered by 182 speakers. The proposed feature attained a 98.35% recognition rate compared to 96.7% for MFCC using the LDC subset.
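To make the idea of a moving average along diagonal directions of a time-frequency representation concrete, here is a rough sketch. It is not the authors' feature: the abstract gives no window length, normalization or choice of diagonals, so the function name, the `window` parameter and the two directions below are assumptions for illustration only.

```python
def diagonal_moving_average(spec, window, direction=1):
    """Average `window` cells along a diagonal of a 2-D grid.

    spec[f][t] is the magnitude at frequency bin f and time frame t.
    direction=+1 follows diagonals where frequency rises with time
    (cells f+k, t+k); direction=-1 follows diagonals where frequency
    falls with time (cells f-k, t+k). Only positions where the full
    window fits inside the grid are returned.
    """
    n_freq, n_time = len(spec), len(spec[0])
    out = []
    for f in range(n_freq):
        last = f + direction * (window - 1)
        if not 0 <= last < n_freq:
            continue  # window would leave the frequency range
        row = []
        for t in range(n_time - window + 1):
            total = sum(spec[f + direction * k][t + k] for k in range(window))
            row.append(total / window)
        out.append(row)
    return out


# Tiny 3x3 grid standing in for a spectrogram.
grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
rising = diagonal_moving_average(grid, window=2, direction=1)
falling = diagonal_moving_average(grid, window=2, direction=-1)
```

Averaging along diagonals responds to energy that moves in frequency over time (for example formant transitions), which is one plausible reading of why such a feature could be speaker-specific.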
Voice Interactive Analysis System Study. Final Report, August 28, 1978 through March 23, 1979.

ERIC Educational Resources Information Center

Harry, D. P.; And Others

The Voice Interactive Analysis System study continued research and development of the LISTEN real-time, minicomputer based connected speech recognition system, within NAVTRAEQUIPCEN'S program of developing automatic speech technology in support of training. An attempt was made to identify the most effective features detected by the TTI-500 model…
Famous faces and voices: Differential profiles in early right and left semantic dementia and in Alzheimer's disease.

PubMed

Luzzi, Simona; Baldinelli, Sara; Ranaldi, Valentina; Fabi, Katia; Cafazzo, Viviana; Fringuelli, Fabio; Silvestrini, Mauro; Provinciali, Leandro; Reverberi, Carlo; Gainotti, Guido

2017-01-08

Famous face and voice recognition is reported to be impaired both in semantic dementia (SD) and in Alzheimer's Disease (AD), although more severely in the former. In AD a coexistence of perceptual impairment in face and voice processing has also been reported and this could contribute to the altered performance in complex semantic tasks. On the other hand, in SD both face and voice recognition disorders could be related to the prevalence of atrophy in the right temporal lobe (RTL). The aim of the present study was twofold: (1) to investigate famous faces and voices recognition in SD and AD to verify if the two diseases show a differential pattern of impairment, resulting from disruption of different cognitive mechanisms; (2) to check if face and voice recognition disorders prevail in patients with atrophy mainly affecting the RTL. To avoid the potential influence of primary perceptual problems in face and voice recognition, a pool of patients suffering from early SD and AD were administered a detailed set of tests exploring face and voice perception. Thirteen SD (8 with prevalence of right and 5 with prevalence of left temporal atrophy) and 25 AD patients, who did not show visual and auditory perceptual impairment, were finally selected and were administered an experimental battery exploring famous face and voice recognition and naming. Twelve SD patients underwent cerebral PET imaging and were classified in right and left SD according to the onset modality and to the prevalent decrease in FDG uptake in right or left temporal lobe respectively. Correlation of PET imaging and famous face and voice recognition was performed. Results showed a differential performance profile in the two diseases, because AD patients were significantly impaired in the naming tests, but showed preserved recognition, whereas SD patients were profoundly impaired both in naming and in recognition of famous faces and voices.
Furthermore, face and voice recognition disorders prevailed in SD patients with RTL atrophy, who also showed a conceptual impairment on the Pyramids and Palm Trees test more important in the pictorial than in the verbal modality. Finally, in the 12 SD patients in whom PET was available, a strong correlation between FDG uptake and face-to-name and voice-to-name matching data was found in the right but not in the left temporal lobe. The data support the hypothesis of a different cognitive basis for impairment of face and voice recognition in the two dementias and suggest that the pattern of impairment in SD may be due to a loss of semantic representations, while a defect of semantic control, with impaired naming and preserved recognition, might be hypothesized in AD. Furthermore, the correlation between face and voice recognition disorders and RTL damage is consistent with the hypothesis assuming that in the RTL person-specific knowledge may be mainly based upon non-verbal representations. Copyright © 2016 Elsevier Ltd. All rights reserved.
Voice recognition products-an occupational risk for users with ULDs?

PubMed

Williams, N R

2003-10-01

Voice recognition systems (VRS) allow speech both to be converted directly into text, which appears on the screen of a computer, and to direct equipment to perform specific functions. Suggested applications are many and varied, including increasing efficiency in the reporting of radiographs, allowing directed surgery and enabling individuals with upper limb disorders (ULDs) who cannot use other input devices, such as keyboards and mice, to carry out word processing and other activities. The aim of this paper is to describe four cases of vocal dysfunction related to the use of such software, which have been identified from the database of the Voice and Speech Laboratory of the Massachusetts Eye and Ear Infirmary (MEEI). The database was searched using the key words 'voice recognition' and four cases were identified from a total of 4800. In all cases, the VRS was supplied to assist individuals with ULDs who could not use conventional input devices. Case reports illustrate time of onset and symptoms experienced. The cases illustrate the need for risk assessment and consideration of the ergonomic aspects of voice use prior to such adaptations being used, particularly in those who already experience work-related ULDs.
An estimate of the prevalence of developmental phonagnosia.

PubMed

Shilowich, Bryan E; Biederman, Irving

2016-08-01

A web-based survey estimated the distribution of voice recognition abilities with a focus on determining the prevalence of developmental phonagnosia, the inability to identify a familiar person based on their voice. Participants matched clips of 50 celebrity voices to 1-4 named headshots of celebrities whose voices they had previously rated for familiarity. Given a strong correlation between rated familiarity and recognition performance, a residual was calculated based on the average familiarity rating on each trial, which thus constituted each respondent's voice recognition ability that could not be accounted for by familiarity. 3.2% of the respondents (23 of 730 participants) had residual recognition scores 2.28 SDs below the mean (whereas 8 or 1.1% would have been expected from a normal distribution). They also judged whether they could imagine the voice of five familiar celebrities. Individuals who had difficulty in imagining voices were also generally below average in their accuracy of recognition. Copyright © 2016 Elsevier Inc. All rights reserved.
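The expected count quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below assumes the residual scores follow a standard normal distribution and uses only the figures stated in the abstract (730 participants, a cutoff 2.28 SDs below the mean, 23 observed); the function and variable names are illustrative and not taken from the study.

```python
import math


def normal_lower_tail(z):
    """P(Z <= z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


n_participants = 730   # from the abstract
cutoff_sd = -2.28      # cutoff in SDs relative to the mean, from the abstract
observed = 23          # respondents below the cutoff, from the abstract

tail_probability = normal_lower_tail(cutoff_sd)
expected = n_participants * tail_probability
observed_share = observed / n_participants

print(f"tail probability under normality: {tail_probability:.4f}")
print(f"expected respondents below cutoff: {expected:.2f}")
print(f"observed share below cutoff: {observed_share:.1%}")
```

Under the normality assumption the tail probability is a little over 1%, which is where the abstract's comparison figure of about 8 respondents comes from; the observed 23 is close to three times that.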
United States Homeland Security and National Biometric Identification

DTIC Science & Technology

2002-04-09

security number. Biometrics is the use of unique individual traits such as fingerprints, iris eye patterns, voice recognition, and facial recognition to...technology to control access onto their military bases using a Defense Manpower Management Command developed software application. FACIAL Facial recognition systems...installed facial recognition systems in conjunction with a series of 200 cameras to fight street crime and identify terrorists. The cameras, which are
Voice Recognition: A New Assessment Tool?

ERIC Educational Resources Information Center

Jones, Darla

2005-01-01

This article presents the results of a study conducted in Anchorage, Alaska, that evaluated the accuracy and efficiency of using voice recognition (VR) technology to collect oral reading fluency data for classroom-based assessments. The primary research question was as follows: Is voice recognition technology a valid and reliable alternative to…
Children's Recognition of Cartoon Voices.

ERIC Educational Resources Information Center

Spence, Melanie J.; Rollins, Pamela R.; Jerger, Susan

2002-01-01

A study examined developmental changes in talker recognition skills by assessing 72 children's (ages 3-5) recognition of 20 cartoon characters' voices. Four- and 5-year-old children recognized more of the voices than did 3-year-olds. All children were more accurate at recognizing more familiar characters than less familiar characters. (Contains…
It doesn't matter what you say: FMRI correlates of voice learning and recognition independent of speech content.

PubMed

Zäske, Romi; Awwad Shiekh Hasan, Bashar; Belin, Pascal

2017-09-01

Listeners can recognize newly learned voices from previously unheard utterances, suggesting the acquisition of high-level speech-invariant voice representations during learning. Using functional magnetic resonance imaging (fMRI) we investigated the anatomical basis underlying the acquisition of voice representations for unfamiliar speakers independent of speech, and their subsequent recognition among novel voices. Specifically, listeners studied voices of unfamiliar speakers uttering short sentences and subsequently classified studied and novel voices as "old" or "new" in a recognition test. To investigate "pure" voice learning, i.e., independent of sentence meaning, we presented German sentence stimuli to non-German speaking listeners. To disentangle stimulus-invariant and stimulus-dependent learning, during the test phase we contrasted a "same sentence" condition in which listeners heard speakers repeating the sentences from the preceding study phase, with a "different sentence" condition. Voice recognition performance was above chance in both conditions although, as expected, performance was higher for same than for different sentences. During study phases activity in the left inferior frontal gyrus (IFG) was related to subsequent voice recognition performance and same versus different sentence condition, suggesting an involvement of the left IFG in the interactive processing of speaker and speech information during learning. Importantly, at test reduced activation for voices correctly classified as "old" compared to "new" emerged in a network of brain areas including temporal voice areas (TVAs) of the right posterior superior temporal gyrus (pSTG), as well as the right inferior/middle frontal gyrus (IFG/MFG), the right medial frontal gyrus, and the left caudate. 
This effect of voice novelty did not interact with sentence condition, suggesting a role of temporal voice-selective areas and extra-temporal areas in the explicit recognition of learned voice identity, independent of speech content. Copyright © 2017 Elsevier Ltd. All rights reserved.
Voice Based City Panic Button System

NASA Astrophysics Data System (ADS)

Febriansyah; Zainuddin, Zahir; Bachtiar Nappu, M.

2018-03-01

The development of a voice-activated panic button application aims to provide faster early notification of hazardous conditions in the community to the nearest police by using speech as the trigger. The current application still relies on a touch combination on the screen and on coordination of orders from a control center, so early notification takes longer. The method used in this research was voice recognition for detecting the user's voice and the haversine formula for finding the closest distance between the user and the police. The application was equipped with automatic SMS, which sent a notification to the victim’s relatives, and was integrated with the Google Maps application (GMaps) as the map to the victim’s location. The results show that voice registration on the application reaches 100%, incident detection using speech recognition while the application is running averages 94.67%, and the automatic SMS to the victim’s relatives reaches 100%.
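The haversine formula named in the abstract above computes the great-circle distance between two latitude/longitude points. A minimal sketch of how it could be used to pick the closest police station follows; the Earth radius constant, the station names and all coordinates are assumptions made for illustration and do not come from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_station(user, stations):
    """Return the name of the station with the smallest haversine distance to the user."""
    return min(stations, key=lambda name: haversine_km(*user, *stations[name]))


# Hypothetical user position and station coordinates (latitude, longitude).
user = (-5.1477, 119.4327)
stations = {
    "station_a": (-5.1350, 119.4200),
    "station_b": (-5.2000, 119.5000),
}

closest = nearest_station(user, stations)
print(closest, round(haversine_km(*user, *stations[closest]), 2), "km")
```

The haversine distance is a straight-line distance over the sphere, so it ignores roads and travel time; it only ranks candidates by geographic proximity.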
Benefits for Voice Learning Caused by Concurrent Faces Develop over Time.

PubMed

Zäske, Romi; Mühl, Constanze; Schweinberger, Stefan R

2015-01-01

Recognition of personally familiar voices benefits from the concurrent presentation of the corresponding speakers' faces. This effect of audiovisual integration is most pronounced for voices combined with dynamic articulating faces. However, it is unclear if learning unfamiliar voices also benefits from audiovisual face-voice integration or, alternatively, is hampered by attentional capture of faces, i.e., "face-overshadowing". In six study-test cycles we compared the recognition of newly-learned voices following unimodal voice learning vs. bimodal face-voice learning with either static (Exp. 1) or dynamic articulating faces (Exp. 2). Voice recognition accuracies significantly increased for bimodal learning across study-test cycles while remaining stable for unimodal learning, as reflected in numerical costs of bimodal relative to unimodal voice learning in the first two study-test cycles and benefits in the last two cycles. This was independent of whether faces were static images (Exp. 1) or dynamic videos (Exp. 2). In both experiments, slower reaction times to voices previously studied with faces compared to voices only may result from visual search for faces during memory retrieval. A general decrease of reaction times across study-test cycles suggests facilitated recognition with more speaker repetitions. Overall, our data suggest two simultaneous and opposing mechanisms during bimodal face-voice learning: while attentional capture of faces may initially impede voice learning, audiovisual integration may facilitate it thereafter.

Implicit multisensory associations influence voice recognition.

PubMed

von Kriegstein, Katharina; Giraud, Anne-Lise

2006-10-01

Natural objects provide partially redundant information to the brain through different sensory modalities. For example, voices and faces both give information about the speech content, age, and gender of a person. Thanks to this redundancy, multimodal recognition is fast, robust, and automatic. In unimodal perception, however, only part of the information about an object is available. Here, we addressed whether, even under conditions of unimodal sensory input, crossmodal neural circuits that have been shaped by previous associative learning become activated and underpin a performance benefit. We measured brain activity with functional magnetic resonance imaging before, while, and after participants learned to associate either sensory redundant stimuli, i.e. voices and faces, or arbitrary multimodal combinations, i.e. voices and written names, ring tones, and cell phones or brand names of these cell phones. After learning, participants were better at recognizing unimodal auditory voices that had been paired with faces than those paired with written names, and association of voices with faces resulted in an increased functional coupling between voice and face areas. No such effects were observed for ring tones that had been paired with cell phones or names. These findings demonstrate that brief exposure to ecologically valid and sensory redundant stimulus pairs, such as voices and faces, induces specific multisensory associations. Consistent with predictive coding theories, associative representations become thereafter available for unimodal perception and facilitate object recognition. These data suggest that for natural objects effective predictive signals can be generated across sensory systems and proceed by optimization of functional connectivity between specialized cortical sensory modules.
Digital signal processing algorithms for automatic voice recognition

NASA Technical Reports Server (NTRS)

Botros, Nazeih M.

1987-01-01

This work investigates the digital signal analysis algorithms currently implemented in automatic voice recognition systems. Automatic voice recognition means the capability of a computer to recognize and interact with verbal commands. The focus is on the digital signal analysis, rather than the linguistic analysis, of the speech signal. Several digital signal processing algorithms are available for voice recognition, among them Linear Predictive Coding (LPC), Short-time Fourier Analysis, and Cepstrum Analysis. Of these, LPC is the most widely used. It has a short execution time and does not require large memory storage. However, it has several limitations due to the assumptions used to develop it. The other two algorithms are frequency-domain algorithms that rest on few assumptions, but they are not widely implemented or investigated. With recent advances in digital technology, namely signal processors, these two frequency-domain algorithms may now be investigated with a view to implementing them in voice recognition. This research is concerned with real-time, microprocessor-based recognition algorithms.
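To make the LPC idea concrete, here is a minimal sketch of the autocorrelation method with the Levinson-Durbin recursion, one common way to estimate LPC coefficients. It is not taken from the report: the function names, the predictor order and the test signal are assumptions chosen for illustration, and a real recognizer would apply this to short windowed frames of sampled speech.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite frame of samples."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(signal, order):
    """Estimate LPC coefficients with the Levinson-Durbin recursion.

    Returns (a, err) such that x[n] is predicted by
    sum(a[k] * x[n - 1 - k] for k in range(order)), and err is the
    residual prediction error energy.
    """
    r = autocorrelation(signal, order)
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err == 0.0:
            break  # silent or perfectly predicted frame
        # Reflection coefficient for predictor order i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(i):
            updated[j] = a[j] - k * a[i - 1 - j]
        a = updated
        err *= 1.0 - k * k
    return a, err


# Hypothetical test frame: a decaying exponential, where each sample is
# 0.9 times the previous one, so a first-order predictor should find ~0.9.
frame = [0.9 ** n for n in range(200)]
coefficients, residual = lpc(frame, order=2)
print(coefficients, residual)
```

The recursion needs only the first `order + 1` autocorrelation values and a handful of multiply-adds per order, which is consistent with the abstract's point about short execution time and small memory footprint.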
Memory strength and specificity revealed by pupillometry

PubMed Central

Papesh, Megan H.; Goldinger, Stephen D.; Hout, Michael C.

2011-01-01

Voice-specificity effects in recognition memory were investigated using both behavioral data and pupillometry. Volunteers initially heard spoken words and nonwords in two voices; they later provided confidence-based old/new classifications to items presented in their original voices, changed (but familiar) voices, or entirely new voices. Recognition was more accurate for old-voice items, replicating prior research. Pupillometry was used to gauge cognitive demand during both encoding and testing: Enlarged pupils revealed that participants devoted greater effort to encoding items that were subsequently recognized. Further, pupil responses were sensitive to the cue match between encoding and retrieval voices, as well as memory strength. Strong memories, and those with the closest encoding-retrieval voice matches, resulted in the highest peak pupil diameters. The results are discussed with respect to episodic memory models and Whittlesea’s (1997) SCAPE framework for recognition memory. PMID:22019480
Familiar Person Recognition: Is Autonoetic Consciousness More Likely to Accompany Face Recognition Than Voice Recognition?

NASA Astrophysics Data System (ADS)

Barsics, Catherine; Brédart, Serge

2010-11-01

Autonoetic consciousness is a fundamental property of human memory, enabling us to experience mental time travel, to recollect past events with a feeling of self-involvement, and to project ourselves in the future. Autonoetic consciousness is a characteristic of episodic memory. By contrast, awareness of the past associated with a mere feeling of familiarity or knowing relies on noetic consciousness, depending on semantic memory integrity. Present research was aimed at evaluating whether conscious recollection of episodic memories is more likely to occur following the recognition of a familiar face than following the recognition of a familiar voice. Recall of semantic information (biographical information) was also assessed. Previous studies that investigated the recall of biographical information following person recognition used faces and voices of famous people as stimuli. In this study, the participants were presented with personally familiar people's voices and faces, thus avoiding the presence of identity cues in the spoken extracts and allowing a stricter control of frequency exposure with both types of stimuli (voices and faces). In the present study, the rate of retrieved episodic memories, associated with autonoetic awareness, was significantly higher from familiar faces than familiar voices even though the level of overall recognition was similar for both these stimuli domains. The same pattern was observed regarding semantic information retrieval. These results and their implications for current Interactive Activation and Competition person recognition models are discussed.
The Army word recognition system

NASA Technical Reports Server (NTRS)

Hadden, David R.; Haratz, David

1977-01-01

The application of speech recognition technology in the Army command and control area is presented. The problems associated with this program are described, as well as its relevance in terms of man/machine interactions, voice inflexions, and the amount of training needed to interact with and utilize the automated system.
Secure Recognition of Voice-Less Commands Using Videos

NASA Astrophysics Data System (ADS)

Yau, Wai Chee; Kumar, Dinesh Kant; Weghorn, Hans

Interest in voice recognition technologies for internet applications is growing due to the flexibility of speech-based communication. The major drawback with the use of sound for internet access with computers is that the commands will be audible to other people in the vicinity. This paper examines a secure and voice-less method for recognition of speech-based commands using video without evaluating sound signals. The proposed approach represents mouth movements in the video data using 2D spatio-temporal templates (STT). Zernike moments (ZM) are computed from STT and fed into support vector machines (SVM) to be classified into one of the utterances. The experimental results demonstrate that the proposed technique produces a high accuracy of 98% in a phoneme classification task. The proposed technique is demonstrated to be invariant to global variations of illumination level. Such a system is useful for securely interpreting user commands for internet applications on mobile devices.
Early Detection of Severe Apnoea through Voice Analysis and Automatic Speaker Recognition Techniques

NASA Astrophysics Data System (ADS)

Fernández, Ruben; Blanco, Jose Luis; Díaz, David; Hernández, Luis A.; López, Eduardo; Alcázar, José

This study is part of an on-going collaborative effort between the medical and the signal processing communities to promote research on applying voice analysis and Automatic Speaker Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based diagnosis could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we present and discuss the possibilities of using generative Gaussian Mixture Models (GMMs), generally used in ASR systems, to model distinctive apnoea voice characteristics (i.e. abnormal nasalization). Finally, we present experimental findings regarding the discriminative power of speaker recognition techniques applied to severe apnoea detection. We have achieved an 81.25% correct classification rate, which is very promising and underpins the interest in this line of inquiry.
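The GMM-based classification described in the abstract above can be illustrated with a toy log-likelihood comparison between two mixture models. This is only a sketch under strong simplifying assumptions: the features are one-dimensional, and every weight, mean and variance below is hypothetical. It does not reproduce the study's models, features or data.

```python
import math


def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of scalar feature frames under a 1-D Gaussian mixture."""
    total = 0.0
    for x in frames:
        density = sum(
            w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(density)
    return total


# Hypothetical two-component models (weights, means, variances), one per class.
CONTROL_MODEL = ([0.5, 0.5], [0.0, 1.0], [0.25, 0.25])
APNOEA_MODEL = ([0.5, 0.5], [2.0, 3.0], [0.25, 0.25])


def classify(frames):
    """Pick the class whose model assigns the higher log-likelihood to the frames."""
    log_ratio = (gmm_log_likelihood(frames, *APNOEA_MODEL)
                 - gmm_log_likelihood(frames, *CONTROL_MODEL))
    return "apnoea" if log_ratio > 0 else "control"


print(classify([2.1, 2.8, 3.2]))
print(classify([0.2, 0.9, -0.1]))
```

In practice the mixture parameters would be trained on labelled recordings and the features would be multidimensional vectors rather than scalars; the decision rule, comparing likelihoods under competing models, stays the same.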
Voice Enabled Framework to Support Post-Surgical Discharge Monitoring

PubMed Central

Blansit, Kevin; Marmor, Rebecca; Zhao, Beiqun; Tien, Dan

2017-01-01

Unplanned surgical readmissions pose a challenging problem for the American healthcare system. We propose to combine consumer electronic voice recognition technology with the FHIR standard to create a post-surgical discharge monitoring app to identify and alert physicians to a patient’s deteriorating status. PMID:29854267
Crossmodal plasticity in the fusiform gyrus of late blind individuals during voice recognition.

PubMed

Hölig, Cordula; Föcker, Julia; Best, Anna; Röder, Brigitte; Büchel, Christian

2014-12-01

Blind individuals are trained in identifying other people through voices. In congenitally blind adults the anterior fusiform gyrus has been shown to be active during voice recognition. Such crossmodal changes have been associated with a superiority of blind adults in voice perception. The key question of the present functional magnetic resonance imaging (fMRI) study was whether visual deprivation that occurs in adulthood is followed by similar adaptive changes of the voice identification system. Late blind individuals and matched sighted participants were tested in a priming paradigm, in which two voice stimuli were subsequently presented. The prime (S1) and the target (S2) were either from the same speaker (person-congruent voices) or from two different speakers (person-incongruent voices). Participants had to classify the S2 as either coming from an old or a young person. Only in late blind but not in matched sighted controls, the activation in the anterior fusiform gyrus was modulated by voice identity: late blind volunteers showed an increase of the BOLD signal in response to person-incongruent compared with person-congruent trials. These results suggest that the fusiform gyrus adapts to input of a new modality even in the mature brain and thus demonstrate an adult type of crossmodal plasticity. Copyright © 2014 Elsevier Inc. All rights reserved.
A preliminary comparison of speech recognition functionality in dental practice management systems.

PubMed

Irwin, Jeannie Y; Schleyer, Titus

2008-11-06

In this study, we examined speech recognition functionality in four leading dental practice management systems. Twenty dental students used voice to chart a simulated patient with 18 findings in each system. Results show it can take over a minute to chart one finding and that users frequently have to repeat commands. Limited functionality, poor usability and a high error rate appear to retard adoption of speech recognition in dentistry.
Implicit Multisensory Associations Influence Voice Recognition

PubMed Central

von Kriegstein, Katharina; Giraud, Anne-Lise

2006-01-01

Natural objects provide partially redundant information to the brain through different sensory modalities. For example, voices and faces both give information about the speech content, age, and gender of a person. Thanks to this redundancy, multimodal recognition is fast, robust, and automatic. In unimodal perception, however, only part of the information about an object is available. Here, we addressed whether, even under conditions of unimodal sensory input, crossmodal neural circuits that have been shaped by previous associative learning become activated and underpin a performance benefit. We measured brain activity with functional magnetic resonance imaging before, while, and after participants learned to associate either sensory redundant stimuli, i.e. voices and faces, or arbitrary multimodal combinations, i.e. voices and written names, ring tones, and cell phones or brand names of these cell phones. After learning, participants were better at recognizing unimodal auditory voices that had been paired with faces than those paired with written names, and association of voices with faces resulted in an increased functional coupling between voice and face areas. No such effects were observed for ring tones that had been paired with cell phones or names. These findings demonstrate that brief exposure to ecologically valid and sensory redundant stimulus pairs, such as voices and faces, induces specific multisensory associations. Consistent with predictive coding theories, associative representations become thereafter available for unimodal perception and facilitate object recognition. These data suggest that for natural objects effective predictive signals can be generated across sensory systems and proceed by optimization of functional connectivity between specialized cortical sensory modules. PMID:17002519
Writing with voice: an investigation of the use of a voice recognition system as a writing aid for a man with aphasia.

PubMed

Bruce, Carolyn; Edmundson, Anne; Coleman, Michael

2003-01-01

People with aphasia may experience difficulties that prevent them from demonstrating in writing what they know and can produce orally. Voice recognition systems that allow the user to speak into a microphone and see their words appear on a computer screen have the potential to assist written communication. This study investigated whether a man with fluent aphasia could learn to use Dragon NaturallySpeaking to write. A single case study of a man with acquired writing difficulties is reported. A detailed account is provided of the stages involved in teaching him to use the software. The therapy tasks carried out to develop his functional use of the system are then described. Outcomes included the percentage of words accurately recognized by the system over time, the quantitative and qualitative changes in written texts produced with and without the use of the speech-recognition system, and the functional benefits the man described. The treatment programme was successful and resulted in a marked improvement in the subject's written work. It also had effects in the functional life domain as the subject could use writing for communication purposes. The results suggest that the technology might benefit others with acquired writing difficulties.
Voice gender and the segregation of competing talkers: Perceptual learning in cochlear implant simulations

PubMed Central

Sullivan, Jessica R.; Assmann, Peter F.; Hossain, Shaikat; Schafer, Erin C.

2017-01-01

Two experiments explored the role of differences in voice gender in the recognition of speech masked by a competing talker in cochlear implant simulations. Experiment 1 confirmed that listeners with normal hearing receive little benefit from differences in voice gender between a target and masker sentence in four- and eight-channel simulations, consistent with previous findings that cochlear implants deliver an impoverished representation of the cues for voice gender. However, gender differences led to small but significant improvements in word recognition with 16 and 32 channels. Experiment 2 assessed the benefits of perceptual training on the use of voice gender cues in an eight-channel simulation. Listeners were assigned to one of four groups: (1) word recognition training with target and masker differing in gender; (2) word recognition training with same-gender target and masker; (3) gender recognition training; or (4) control with no training. Significant improvements in word recognition were observed from pre- to post-test sessions for all three training groups compared to the control group. These improvements were maintained at the late session (one week following the last training session) for all three groups. There was an overall improvement in masked word recognition performance provided by gender mismatch following training, but the amount of benefit did not differ as a function of the type of training. The training effects observed here are consistent with a form of rapid perceptual learning that contributes to the segregation of competing voices but does not specifically enhance the benefits provided by voice gender cues. PMID:28372046
Voice, Schooling, Inequality, and Scale

ERIC Educational Resources Information Center

Collins, James

2013-01-01

The rich studies in this collection show that the investigation of voice requires analysis of "recognition" across layered spatial-temporal and sociolinguistic scales. I argue that the concepts of voice, recognition, and scale provide insight into contemporary educational inequality and that their study benefits, in turn, from paying attention to…
Cockpit voice recognition program at Princeton University

NASA Technical Reports Server (NTRS)

Huang, C. Y.

1983-01-01

Voice recognition technology (VRT) is applied to aeronautics, particularly to the alleviation of pilot workload. VRT no longer has to prove its maturity. The feasibility of voice tuning of radio and DME is demonstrated, since it offers immediate advantages to the pilot and can be completed in a reasonable time.
A Development of a System Enables Character Input and PC Operation via Voice for a Physically Disabled Person with a Speech Impediment

NASA Astrophysics Data System (ADS)

Tanioka, Toshimasa; Egashira, Hiroyuki; Takata, Mayumi; Okazaki, Yasuhisa; Watanabe, Kenzi; Kondo, Hiroki

We have designed and implemented a voice-based PC operation support system for a physically disabled person with a speech impediment. Voice operation is an effective method for a physically disabled person with involuntary movement of the limbs and the head. We applied a commercial speech recognition engine to develop a system for practical use. Adopting a commercial engine reduces development cost and will help make our system useful to other people with speech impediments. We customized the commercial speech recognition engine so that it can recognize the utterances of a person with a speech impediment. We restricted the words that the engine recognizes and separated target words from words with similar pronunciation to avoid misrecognition. The huge number of words registered in commercial speech recognition engines causes frequent misrecognition of utterances by people with speech impediments, because their utterances are unclear and unstable. We solved this problem by narrowing the input choices down to a small number and by registering their ambiguous pronunciations in addition to the original ones. To realize all character input and all PC operations with a small number of words, we designed multiple input modes with categorized dictionaries and introduced two-step input in each mode except numeral input. The system we have developed is at a practical level. The first author of this paper is physically disabled with a speech impediment. Using this system, he has been able not only to input characters into a PC but also to operate Windows smoothly. He uses this system in his daily life, and this paper was written by him with this system. At present, the speech recognition is customized for him. It is, however, possible to customize it for other users by changing words and registering new pronunciations according to each user's utterances.
Voice control of the space shuttle video system

NASA Technical Reports Server (NTRS)

Bejczy, A. K.; Dotson, R. S.; Brown, J. W.; Lewis, J. L.

1981-01-01

A pilot voice control system developed at the Jet Propulsion Laboratory (JPL) to test and evaluate the feasibility of controlling the shuttle TV cameras and monitors by voice commands utilizes a commercially available discrete word speech recognizer which can be trained to the individual utterances of each operator. Successful ground tests were conducted using a simulated full-scale space shuttle manipulator. The test configuration involved the berthing, maneuvering and deploying a simulated science payload in the shuttle bay. The handling task typically required 15 to 20 minutes and 60 to 80 commands to 4 TV cameras and 2 TV monitors. The best test runs show 96 to 100 percent voice recognition accuracy.
Current trends in small vocabulary speech recognition for equipment control

NASA Astrophysics Data System (ADS)

Doukas, Nikolaos; Bardis, Nikolaos G.

2017-09-01

Speech recognition systems allow human-machine communication to acquire an intuitive nature that approaches the simplicity of inter-human communication. Small vocabulary speech recognition is a subset of the overall speech recognition problem, where only a small number of words need to be recognized. Speaker-independent small vocabulary recognition can find significant applications in field equipment used by military personnel. Such equipment may typically be controlled by a small number of commands that need to be given quickly and accurately, under conditions where delicate manual operations are difficult to achieve. This type of application could hence benefit significantly from the use of robust voice-operated control components, as they would facilitate the interaction with their users and render it much more reliable in times of crisis. This paper presents current challenges involved in attaining efficient and robust small vocabulary speech recognition. These challenges concern feature selection, classification techniques, speaker diversity and noise effects. A state machine approach is presented that facilitates the voice guidance of different equipment in a variety of situations.
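The abstract above does not detail its state machine, so the following is only a generic sketch of how a small command vocabulary might drive equipment through a finite state machine. The states, command words and transitions are all hypothetical, and the recognizer itself is out of scope: the code assumes it already receives recognized words.

```python
# Hypothetical transition table: (current state, recognized word) -> next state.
TRANSITIONS = {
    ("idle", "power"): "ready",
    ("ready", "scan"): "scanning",
    ("scanning", "stop"): "ready",
    ("ready", "shutdown"): "idle",
}


def valid_commands(state):
    """Words that are meaningful in the given state (the active vocabulary)."""
    return sorted(word for (s, word) in TRANSITIONS if s == state)


def run_commands(words, state="idle"):
    """Apply recognized words in order, ignoring any word not valid in the current state."""
    history = [state]
    for word in words:
        next_state = TRANSITIONS.get((state, word))
        if next_state is None:
            continue  # out-of-context word: treated as a misrecognition and dropped
        state = next_state
        history.append(state)
    return state, history


final_state, visited = run_commands(["scan", "power", "scan", "stop"])
print(final_state, visited)
print(valid_commands("ready"))
```

One possible benefit of such a design is that only the commands valid in the current state need to be considered at any moment, which keeps the active vocabulary small; whether the paper's approach exploits this is not stated in the abstract.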
A voice-input voice-output communication aid for people with severe speech impairment.

PubMed

Hawley, Mark S; Cunningham, Stuart P; Green, Phil D; Enderby, Pam; Palmer, Rebecca; Sehgal, Siddharth; O'Neill, Peter

2013-01-01

A new form of augmentative and alternative communication (AAC) device for people with severe speech impairment, the voice-input voice-output communication aid (VIVOCA), is described. The VIVOCA recognizes the disordered speech of the user and builds messages, which are converted into synthetic speech. System development was carried out employing user-centered design and development methods, which identified and refined key requirements for the device. A novel methodology for building small vocabulary, speaker-dependent automatic speech recognizers with reduced amounts of training data was applied. Experiments showed that this method is successful in generating good recognition performance (mean accuracy 96%) on highly disordered speech, even when recognition perplexity is increased. The selected message-building technique traded off various factors including speed of message construction and range of available message outputs. The VIVOCA was evaluated in a field trial by individuals with moderate to severe dysarthria and confirmed that they can make use of the device to produce intelligible speech output from disordered speech input. The trial highlighted some issues which limit the performance and usability of the device when applied in real usage situations, with mean recognition accuracy of 67% in these circumstances. These limitations will be addressed in future work.
The processing of auditory and visual recognition of self-stimuli.

PubMed

Hughes, Susan M; Nicholson, Shevon E

2010-12-01

This study examined self-recognition processing in both the auditory and visual modalities by determining how comparable hearing a recording of one's own voice was to seeing a photograph of one's own face. We also investigated whether the simultaneous presentation of auditory and visual self-stimuli would either facilitate or inhibit self-identification. Ninety-one participants completed reaction-time tasks of self-recognition when presented with their own faces, own voices, and combinations of the two. Reaction time and errors made when responding with both the right and left hand were recorded to determine if there were lateralization effects on these tasks. Our findings showed that visual self-recognition for facial photographs appears to be superior to auditory self-recognition for voice recordings. Furthermore, a combined presentation of one's own face and voice appeared to inhibit rather than facilitate self-recognition and there was a left-hand advantage for reaction time on the combined-presentation tasks. Copyright © 2010 Elsevier Inc. All rights reserved.

Embodied Transcription: A Creative Method for Using Voice-Recognition Software

ERIC Educational Resources Information Center

Brooks, Christine

2010-01-01

Voice-recognition software is designed to be used by one user (voice) at a time, requiring a researcher to speak all of the words of a recorded interview to achieve transcription. Thus, the researcher becomes a conduit through which interview material is inscribed as written word. Embodied Transcription acknowledges performative and interpretative…
Investigations of Hemispheric Specialization of Self-Voice Recognition

ERIC Educational Resources Information Center

Rosa, Christine; Lassonde, Maryse; Pinard, Claudine; Keenan, Julian Paul; Belin, Pascal

2008-01-01

Three experiments investigated functional asymmetries related to self-recognition in the domain of voices. In Experiment 1, participants were asked to identify one of three presented voices (self, familiar or unknown) by responding with either the right or the left-hand. In Experiment 2, participants were presented with auditory morphs between the…
Using Continuous Voice Recognition Technology as an Input Medium to the Naval Warfare Interactive Simulation System (NWISS).

DTIC Science & Technology

1984-06-01

ABSTRACT A great deal of research has been conducted an... 2. Continuous Voice Recognition ... B. VERBEX 3000 Speech Application Development System (SPADS) ... C. Naval Warfare Interactive Simulation System (NWISS) ... D. Purpose ... 1. A Past
Normal voice processing after posterior superior temporal sulcus lesion.

PubMed

Jiahui, Guo; Garrido, Lúcia; Liu, Ran R; Susilo, Tirta; Barton, Jason J S; Duchaine, Bradley

2017-10-01

The right posterior superior temporal sulcus (pSTS) shows a strong response to voices, but the cognitive processes generating this response are unclear. One possibility is that this activity reflects basic voice processing. However, several fMRI and magnetoencephalography findings suggest instead that pSTS serves as an integrative hub that combines voice and face information. Here we investigate whether right pSTS contributes to basic voice processing by testing Faith, a patient whose right pSTS was resected, with eight behavioral tasks assessing voice identity perception and recognition, voice sex perception, and voice expression perception. Faith performed normally on all the tasks. Her normal performance indicates right pSTS is not necessary for intact voice recognition and suggests that pSTS activations to voices reflect higher-level processes. Copyright © 2017 Elsevier Ltd. All rights reserved.
Bilingual Computerized Speech Recognition Screening for Depression Symptoms

ERIC Educational Resources Information Center

Gonzalez, Gerardo; Carter, Colby; Blanes, Erika

2007-01-01

The Voice-Interactive Depression Assessment System (VIDAS) is a computerized speech recognition application for screening depression based on the Center for Epidemiological Studies--Depression scale in English and Spanish. Study 1 included 50 English and 47 Spanish speakers. Study 2 involved 108 English and 109 Spanish speakers. Participants…
Impact of Fetal-Neonatal Iron Deficiency on Recognition Memory at 2 Months of Age.

PubMed

Geng, Fengji; Mai, Xiaoqin; Zhan, Jianying; Xu, Lin; Zhao, Zhengyan; Georgieff, Michael; Shao, Jie; Lozoff, Betsy

2015-12-01

To assess the effects of fetal-neonatal iron deficiency on recognition memory in early infancy. Perinatal iron deficiency delays or disrupts hippocampal development in animal models and thus may impair related neural functions in human infants, such as recognition memory. Event-related potentials were used in an auditory recognition memory task to compare 2-month-old Chinese infants with iron sufficiency or deficiency at birth. Fetal-neonatal iron deficiency was defined 2 ways: high zinc protoporphyrin/heme ratio (ZPP/H > 118 μmol/mol) or low serum ferritin (<75 μg/L) in cord blood. Late slow wave was used to measure infant recognition of mother's voice. Event related potentials patterns differed significantly for fetal-neonatal iron deficiency as defined by high cord ZPP/H but not low ferritin. Comparing 35 infants with iron deficiency (ZPP/H > 118 μmol/mol) to 92 with lower ZPP/H (iron-sufficient), only infants with iron sufficiency showed larger late slow wave amplitude for stranger's voice than mother's voice in frontal-central and parietal-occipital locations, indicating the recognition of mother's voice. Infants with iron sufficiency showed electrophysiological evidence of recognizing their mother's voice, whereas infants with fetal-neonatal iron deficiency did not. Their poorer auditory recognition memory at 2 months of age is consistent with effects of fetal-neonatal iron deficiency on the developing hippocampus. Copyright © 2015 Elsevier Inc. All rights reserved.
Speech therapy and voice recognition instrument

NASA Technical Reports Server (NTRS)

Cohen, J.; Babcock, M. L.

1972-01-01

Characteristics of an electronic circuit for examining variations in vocal excitation for diagnostic purposes and in speech recognition for determining voice patterns and pitch changes are described. Operation of the circuit is discussed and a circuit diagram is provided.
Experimental study on GMM-based speaker recognition

NASA Astrophysics Data System (ADS)

Ye, Wenxing; Wu, Dapeng; Nucci, Antonio

2010-04-01

Speaker recognition plays a very important role in the field of biometric security. In order to improve the recognition performance, many pattern recognition techniques have been explored in the literature. Among these techniques, the Gaussian Mixture Model (GMM) has proved to be an effective statistical model for speaker recognition and is used in most state-of-the-art speaker recognition systems. The GMM is used to represent the 'voice print' of a speaker by modeling the spectral characteristics of the speaker's speech signals. In this paper, we implement a speaker recognition system, which consists of preprocessing, Mel-Frequency Cepstrum Coefficients (MFCCs) based feature extraction, and GMM based classification. We test our system with the TIDIGITS data set (325 speakers) and our own recordings of more than 200 speakers; our system achieves 100% correct recognition rate. Moreover, we also test our system under the scenario that training samples are from one language but test samples are from a different language; our system also achieves 100% correct recognition rate, which indicates that our system is language independent.
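The GMM classification step described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes diagonal covariances, uses toy two-dimensional features in place of MFCCs, and the mixture parameters in MODELS are set by hand rather than trained. A test utterance is attributed to the speaker whose mixture gives the highest total log-likelihood over its frames.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, gmm):
    """Log density of x under a mixture given as (weight, mean, var) triples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify(frames, speaker_models):
    """Return the speaker whose GMM best explains all frames."""
    scores = {
        name: sum(log_gmm(frame, gmm) for frame in frames)
        for name, gmm in speaker_models.items()
    }
    return max(scores, key=scores.get)


# Hand-set toy models (assumptions for illustration, not trained values).
MODELS = {
    "alice": [(0.5, (0.0, 0.0), (1.0, 1.0)), (0.5, (1.0, 1.0), (1.0, 1.0))],
    "bob": [(0.5, (5.0, 5.0), (1.0, 1.0)), (0.5, (6.0, 6.0), (1.0, 1.0))],
}
```

In a real system the mixture parameters would be estimated from enrollment speech, typically with the expectation-maximization algorithm, and the frames would be MFCC vectors.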
In the Beginning Was the Familiar Voice: Personally Familiar Voices in the Evolutionary and Contemporary Biology of Communication

PubMed Central

Sidtis, Diana; Kreiman, Jody

2011-01-01

The human voice is described in dialogic linguistics as an embodiment of self in a social context, contributing to expression, perception and mutual exchange of self, consciousness, inner life, and personhood. While these approaches are subjective and arise from phenomenological perspectives, scientific facts about personal vocal identity, and its role in biological development, support these views. It is our purpose to review studies of the biology of personal vocal identity (the familiar voice pattern) as providing an empirical foundation for the view that the human voice is an embodiment of self in the social context. Recent developments in the biology and evolution of communication are concordant with these notions, revealing that familiar voice recognition (also known as vocal identity recognition or individual vocal recognition) contributed to survival in the earliest vocalizing species. Contemporary ethology documents the crucial role of familiar voices across animal species in signaling and perceiving internal states and personal identities. Neuropsychological studies of voice reveal multimodal cerebral associations arising across brain structures involved in memory, emotion, attention, and arousal in vocal perception and production, such that the voice represents the whole person. Although its roots are in evolutionary biology, human competence for processing layered social and personal meanings in the voice, as well as personal identity in a large repertory of familiar voice patterns, has achieved an immense sophistication. PMID:21710374
Valuing autonomy, struggling for an identity and a collective voice, and seeking role recognition: community mental health nurses' perceptions of their roles.

PubMed

White, Jane H; Kudless, Mary

2008-10-01

Leaders in this community mental health system approached the problem of job frustration, morale issues, and turnover concerns of their Community Mental Health Nurses (CMHNs) by designing a qualitative study using Participant Action Research (PAR) methodology based on the philosophy of Habermas. Six focus groups were conducted to address the nurses' concerns. The themes of Valuing Autonomy, Struggling for an Identity and Collective Voice, and Seeking Role Recognition best explained the participants' concerns. The study concluded with an action plan, the implementation of the plan, and a discussion of the plan's final outcomes.
Emotionally conditioning the target-speech voice enhances recognition of the target speech under "cocktail-party" listening conditions.

PubMed

Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang

2018-05-01

Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners for enhancing target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, electrodermal (skin conductance) responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase in listening effort when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.
Assessment of voice, speech, and related quality of life in advanced head and neck cancer patients 10-years+ after chemoradiotherapy.

PubMed

Kraaijenga, S A C; Oskam, I M; van Son, R J J H; Hamming-Vrieze, O; Hilgers, F J M; van den Brekel, M W M; van der Molen, L

2016-04-01

Assessment of long-term objective and subjective voice, speech, articulation, and quality of life in patients with head and neck cancer (HNC) treated with concurrent chemoradiotherapy (CRT) for advanced, stage IV disease. Twenty-two disease-free survivors, treated with cisplatin-based CRT for inoperable HNC (1999-2004), were evaluated at 10 years post-treatment. A standard Dutch text was recorded. Perceptual analysis of voice, speech, and articulation was conducted by two expert listeners (SLPs). An experimental expert system based on automatic speech recognition was also used. Patients' perception of voice and speech and related quality of life was assessed with the Voice Handicap Index (VHI) and Speech Handicap Index (SHI) questionnaires. At a median follow-up of 11 years, perceptual evaluation showed abnormal scores in up to 64% of cases, depending on the outcome parameter analyzed. Automatic assessment of voice and speech parameters correlated moderately to strongly with perceptual outcome scores. Patient-reported problems with voice (VHI>15) and speech (SHI>6) in daily life were present in 68% and 77% of patients, respectively. Patients treated with IMRT showed significantly less impairment compared to those treated with conventional radiotherapy. More than 10 years after organ-preservation treatment, voice and speech problems are common in this patient cohort, as assessed with perceptual evaluation, automatic speech recognition, and validated structured questionnaires. There were fewer complaints in patients treated with IMRT than with conventional radiotherapy. Copyright © 2016 Elsevier Ltd. All rights reserved.
Brain systems mediating voice identity processing in blind humans.

PubMed

Hölig, Cordula; Föcker, Julia; Best, Anna; Röder, Brigitte; Büchel, Christian

2014-09-01

Blind people rely more on vocal cues when they recognize a person's identity than sighted people. Indeed, a number of studies have reported better voice recognition skills in blind than in sighted adults. The present functional magnetic resonance imaging study investigated changes in the functional organization of neural systems involved in voice identity processing following congenital blindness. A group of congenitally blind individuals and matched sighted control participants were tested in a priming paradigm, in which two voice stimuli (S1, S2) were presented in succession. The prime (S1) and the target (S2) were either from the same speaker (person-congruent voices) or from two different speakers (person-incongruent voices). Participants had to classify the S2 as either an old or a young person. Person-incongruent voices (S2) compared with person-congruent voices elicited an increased activation in the right anterior fusiform gyrus in congenitally blind individuals but not in matched sighted control participants. In contrast, only matched sighted controls showed a higher activation in response to person-incongruent compared with person-congruent voices (S2) in the right posterior superior temporal sulcus. These results provide evidence for crossmodal plastic changes of the person identification system in the brain after visual deprivation. Copyright © 2014 Wiley Periodicals, Inc.
Some effects of stress on users of a voice recognition system: A preliminary inquiry

NASA Astrophysics Data System (ADS)

French, B. A.

1983-03-01

Recent work with Automatic Speech Recognition has focused on applications and productivity considerations in the man-machine interface. This thesis is an attempt to see if placing users of such equipment under time-induced stress has an effect on their percent correct recognition rates. Subjects were given a message-handling task of fixed length and allowed progressively shorter times to attempt to complete it. Questionnaire responses indicate stress levels increased with decreased time-allowance; recognition rates decreased as time was reduced.
78 FR 58305 - Honeywell International, Inc.; Analysis of Agreement Containing Consent Order To Aid Public Comment

Federal Register 2010, 2011, 2012, 2013, 2014

2013-09-23

..., formulas, patterns, devices, manufacturing processes, or customer names. If you want the Commission to give... barcode scanners, barcode printers, RFID systems and voice recognition systems. III. Scan Engines The...
A study of voice production characteristics of astronaut speech during Apollo 11 for speaker modeling in space.

PubMed

Yu, Chengzhu; Hansen, John H L

2017-03-01

Human physiology has evolved to accommodate environmental conditions, including temperature, pressure, and air chemistry unique to Earth. However, the environment in space varies significantly compared to that on Earth and, therefore, variability is expected in astronauts' speech production mechanism. In this study, the variations of astronaut voice characteristics during the NASA Apollo 11 mission are analyzed. Specifically, acoustical features such as fundamental frequency and phoneme formant structure that are closely related to the speech production system are studied. For a further understanding of astronauts' vocal tract spectrum variation in space, a maximum likelihood frequency warping based analysis is proposed to detect the vocal tract spectrum displacement during space conditions. The results from fundamental frequency, formant structure, as well as vocal spectrum displacement indicate that astronauts change their speech production mechanism when in space. Moreover, the experimental results for astronaut voice identification tasks indicate that current speaker recognition solutions are highly vulnerable to astronaut voice production variations in space conditions. Future recommendations from this study suggest that successful applications of speaker recognition during extended space missions require robust speaker modeling techniques that could effectively adapt to voice production variation caused by diverse space conditions.
The role of spectral and temporal cues in voice gender discrimination by normal-hearing listeners and cochlear implant users.

PubMed

Fu, Qian-Jie; Chinchilla, Sherol; Galvin, John J

2004-09-01

The present study investigated the relative importance of temporal and spectral cues in voice gender discrimination and vowel recognition by normal-hearing subjects listening to an acoustic simulation of cochlear implant speech processing and by cochlear implant users. In the simulation, the number of speech processing channels ranged from 4 to 32, thereby varying the spectral resolution; the cutoff frequencies of the channels' envelope filters ranged from 20 to 320 Hz, thereby manipulating the available temporal cues. For normal-hearing subjects, results showed that both voice gender discrimination and vowel recognition scores improved as the number of spectral channels was increased. When only 4 spectral channels were available, voice gender discrimination significantly improved as the envelope filter cutoff frequency was increased from 20 to 320 Hz. For all spectral conditions, increasing the amount of temporal information had no significant effect on vowel recognition. Both voice gender discrimination and vowel recognition scores were highly variable among implant users. The performance of cochlear implant listeners was similar to that of normal-hearing subjects listening to comparable speech processing (4-8 spectral channels). The results suggest that both spectral and temporal cues contribute to voice gender discrimination and that temporal cues are especially important for cochlear implant users to identify the voice gender when there is reduced spectral resolution.
Information system for diagnosis of respiratory system diseases

NASA Astrophysics Data System (ADS)

Abramov, G. V.; Korobova, L. A.; Ivashin, A. L.; Matytsina, I. A.

2018-05-01

An information system for the diagnosis of patients with lung diseases is described. The main problem solved by this system is determining the parameters of cough fragments in monitoring recordings made with a voice recorder. The authors give the criteria for recognizing recorded cough episodes and analyze the audio records. The results of the research are systematized. The cough recognition system can be used by medical specialists to diagnose the condition of patients and to monitor the course of their treatment.
Sperry Univac speech communications technology

NASA Technical Reports Server (NTRS)

Medress, Mark F.

1977-01-01

Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word spotting system to locate key words in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.
Systems concept for speech technology application in general aviation

NASA Technical Reports Server (NTRS)

North, R. A.; Bergeron, H.

1984-01-01

The application potential of voice recognition and synthesis circuits for general aviation, single-pilot IFR (SPIFR) situations is examined. The viewpoint of the pilot was central to workload analyses and assessment of the effectiveness of the voice systems. A twin-engine, high performance general aviation aircraft on a cross-country fixed route was employed as the study model. No actual control movements were considered and other possible functions were scored by three IFR-rated instructors. The SPIFR was concluded helpful in alleviating visual and manual workloads during take-off, approach and landing, particularly for data retrieval and entry tasks. Voice synthesis was an aid in alerting a pilot to in-flight problems. It is expected that usable systems will be available within 5 yr.

A Multimodal Emotion Detection System during Human-Robot Interaction

PubMed Central

Alonso-Martín, Fernando; Malfaz, María; Sequeira, João; Gorostiza, Javier F.; Salichs, Miguel A.

2013-01-01

In this paper, a multimodal user-emotion detection system for social robots is presented. This system is intended to be used during human–robot interaction, and it is integrated as part of the overall interaction system of the robot: the Robotics Dialog System (RDS). Two modes are used to detect emotions: voice analysis and facial expression analysis. In order to analyze the voice of the user, a new component has been developed: Gender and Emotion Voice Analysis (GEVA), which is written using the ChucK language. For emotion detection in facial expressions, a second system, Gender and Emotion Facial Analysis (GEFA), has also been developed. This last system integrates two third-party solutions: Sophisticated High-speed Object Recognition Engine (SHORE) and Computer Expression Recognition Toolbox (CERT). Once these new components (GEVA and GEFA) give their results, a decision rule is applied in order to combine the information given by both of them. The result of this rule, the detected emotion, is integrated into the dialog system through communicative acts. Hence, each communicative act gives, among other things, the detected emotion of the user to the RDS so it can adapt its strategy to achieve a greater degree of satisfaction during the human–robot dialog. Each of the new components, GEVA and GEFA, can also be used individually. Moreover, they are integrated with the robotic control platform ROS (Robot Operating System). Several experiments with real users were performed to determine the accuracy of each component and to set the final decision rule. The results obtained from applying this decision rule in these experiments show a high success rate in automatic user emotion recognition, improving the results given by the two information channels (audio and visual) separately. PMID:24240598
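The abstract above does not state the decision rule used to combine the voice and face channels. Purely as an illustration of what such a fusion step can look like, the sketch below uses a hypothetical rule: if both channels agree, their label is kept; otherwise the label with the larger weighted confidence wins. The function name, the (label, confidence) format, and the weights are all assumptions and are not taken from the paper.

```python
def fuse(voice, face, voice_weight=0.5, face_weight=0.5):
    """Combine two (label, confidence) estimates into a single label.

    Hypothetical rule: if the channels agree, keep the shared label;
    otherwise keep the label with the larger weighted confidence
    (ties go to the voice channel).
    """
    voice_label, voice_conf = voice
    face_label, face_conf = face
    if voice_label == face_label:
        return voice_label
    if voice_conf * voice_weight >= face_conf * face_weight:
        return voice_label
    return face_label
```

The weights could, for example, be set from each channel's measured accuracy, which would be in the spirit of the experiments the abstract mentions for setting the final rule.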
The Cambridge Mindreading (CAM) Face-Voice Battery: Testing complex emotion recognition in adults with and without Asperger syndrome.

PubMed

Golan, Ofer; Baron-Cohen, Simon; Hill, Jacqueline

2006-02-01

Adults with Asperger Syndrome (AS) can recognise simple emotions and pass basic theory of mind tasks, but have difficulties recognising more complex emotions and mental states. This study describes a new battery of tasks, testing recognition of 20 complex emotions and mental states from faces and voices. The battery was given to males and females with AS and matched controls. Results showed the AS group performed worse than controls overall, on emotion recognition from faces and voices and on 12/20 specific emotions. Females recognised faces better than males regardless of diagnosis, and males with AS had more difficulties recognising emotions from faces than from voices. The implications of these results are discussed in relation to social functioning in AS.
Robot Command Interface Using an Audio-Visual Speech Recognition System

NASA Astrophysics Data System (ADS)

Ceballos, Alexánder; Gómez, Juan; Prieto, Flavio; Redarce, Tanneguy

In recent years audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This document presents an automatic command recognition system using audio-visual information. The system is expected to control the da Vinci laparoscopic robot. The audio signal is treated using the Mel Frequency Cepstral Coefficients parametrization method. In addition, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used to extract the visual speech information.
Is it me? Self-recognition bias across sensory modalities and its relationship to autistic traits.

PubMed

Chakraborty, Anya; Chakrabarti, Bhismadev

2015-01-01

Atypical self-processing is an emerging theme in autism research, suggested by lower self-reference effect in memory, and atypical neural responses to visual self-representations. Most research on physical self-processing in autism uses visual stimuli. However, the self is a multimodal construct, and therefore, it is essential to test self-recognition in other sensory modalities as well. Self-recognition in the auditory modality remains relatively unexplored and has not been tested in relation to autism and related traits. This study investigates self-recognition in the auditory and visual domains in the general population and tests if it is associated with autistic traits. Thirty-nine neurotypical adults participated in a two-part study. In the first session, each participant's voice was recorded and face was photographed, and these were morphed respectively with voices and faces from unfamiliar identities. In the second session, participants performed a 'self-identification' task, classifying each morph as 'self' voice (or face) or an 'other' voice (or face). All participants also completed the Autism Spectrum Quotient (AQ). For each sensory modality, the slope of the self-recognition curve was used as the individual self-recognition metric. These two self-recognition metrics were tested for association between each other, and with autistic traits. Fifty percent 'self' response was reached for a higher percentage of self in the auditory domain compared to the visual domain (t = 3.142; P < 0.01). No significant correlation was noted between self-recognition bias across sensory modalities (τ = -0.165, P = 0.204). Higher recognition bias for self-voice was observed in individuals higher in autistic traits (τ AQ = 0.301, P = 0.008). No such correlation was observed between recognition bias for self-face and autistic traits (τ AQ = -0.020, P = 0.438). Our data show that recognition bias for physical self-representation is not related across sensory modalities. Further, individuals with higher autistic traits were better able to discriminate self from other voices, but this relation was not observed with self-face. A narrow self-other overlap in the auditory domain seen in individuals with high autistic traits could arise due to enhanced perceptual processing of auditory stimuli often observed in individuals with autism.
Using voice to create hospital progress notes: Description of a mobile application and supporting system integrated with a commercial electronic health record.

PubMed

Payne, Thomas H; Alonso, W David; Markiel, J Andrew; Lybarger, Kevin; White, Andrew A

2018-01-01

We describe the development and design of a smartphone app-based system to create inpatient progress notes using voice, commercial automatic speech recognition software, with text processing to recognize spoken voice commands and format the note, and integration with a commercial EHR. This new system fits hospital rounding workflow and was used to support a randomized clinical trial testing whether use of voice to create notes improves timeliness of note availability, note quality, and physician satisfaction with the note creation process. The system was used to create 709 notes which were placed in the corresponding patient's EHR record. The median time from pressing the Send button to appearance of the formatted note in the Inbox was 8.8 min. It was generally very reliable, accepted by physician users, and secure. This approach provides an alternative to use of keyboard and templates to create progress notes and may appeal to physicians who prefer voice to typing. Copyright © 2017 Elsevier Inc. All rights reserved.
Gender recognition from vocal source

NASA Astrophysics Data System (ADS)

Sorokin, V. N.; Makarov, I. S.

2008-07-01

Efficiency of automatic recognition of male and female voices based on solving the inverse problem for glottis area dynamics and for waveform of the glottal airflow volume velocity pulse is studied. The inverse problem is regularized through the use of analytical models of the voice excitation pulse and of the dynamics of the glottis area, as well as the model of one-dimensional glottal airflow. Parameters of these models and spectral parameters of the volume velocity pulse are considered. The following parameters are found to be most promising: the instant of maximum glottis area, the maximum derivative of the area, the slope of the spectrum of the glottal airflow volume velocity pulse, the amplitude ratios of harmonics of this spectrum, and the pitch. On the plane of the first two main components in the space of these parameters, an almost twofold decrease in the classification error relative to that for the pitch alone is attained. The male voice recognition probability is found to be 94.7%, and the female voice recognition probability is 95.9%.
Defining the ATC Controller Interface for Data Link Clearances

NASA Technical Reports Server (NTRS)

Rankin, James

1998-01-01

The Controller Interface (CI) is the primary method for Air Traffic Controllers to communicate with aircraft via Controller-Pilot Data Link Communications (CPDLC). The controller, wearing a microphone/headset, aurally gives instructions to aircraft as he/she would with today's voice radio systems. The CI's voice recognition system converts the instructions to digitized messages that are formatted according to the RTCA DO-219 Operational Performance Standards for ATC Two-Way Data Link Communications. The DO-219 messages are transferred via RS-232 to the ATIDS system for uplink using a Mode-S datalink. Pilot acknowledgments of controller messages are downlinked to the ATIDS system and transferred to the CI. A computer monitor is used to convey information to the controller. Aircraft data from the ARTS database are displayed on flight strips. The flight strips are electronic versions of the strips currently used in the ATC system. Outgoing controller messages cause the respective strip to change color to indicate an unacknowledged transmission. The message text is shown on the flight strips for reference. When the pilot acknowledges the message, the strip returns to its normal color. A map of the airport can also be displayed on the monitor. In addition to voice recognition, the controller can enter messages using the monitor's touch screen or by mouse/keyboard.
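The flight strip behavior described above (a strip changes color on an outgoing message and returns to normal on pilot acknowledgment) can be sketched as a small state holder. The class name, color labels, callsign, and message text below are hypothetical; the abstract does not give the actual colors, data structures, or DO-219 message encoding.

```python
from dataclasses import dataclass, field

NORMAL = "normal"
PENDING = "highlighted"  # hypothetical label for an unacknowledged uplink


@dataclass
class FlightStrip:
    """Minimal electronic flight strip: tracks color and message text."""

    callsign: str
    color: str = NORMAL
    messages: list = field(default_factory=list)

    def send(self, text):
        """Record an outgoing controller message and mark it unacknowledged."""
        self.messages.append(text)
        self.color = PENDING

    def acknowledge(self):
        """Pilot acknowledgment received: restore the normal color."""
        self.color = NORMAL
```

A short usage example: create `FlightStrip("N123AB")`, call `send("CLIMB TO FL240")` to highlight the strip, then `acknowledge()` once the downlinked acknowledgment arrives.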
Adaptive Suppression of Noise in Voice Communications

NASA Technical Reports Server (NTRS)

Kozel, David; DeVault, James A.; Birr, Richard B.

2003-01-01

A subsystem for the adaptive suppression of noise in a voice communication system effects a high level of reduction of noise that enters the system through microphones. The subsystem includes a digital signal processor (DSP) plus circuitry that implements voice-recognition and spectral-manipulation techniques. The development of the adaptive noise-suppression subsystem was prompted by the following considerations: During processing of the space shuttle at Kennedy Space Center, voice communications among test team members have been significantly impaired in several instances because some test participants have had to communicate from locations with high ambient noise levels. Ear protection for the personnel involved is commercially available and is used in such situations. However, commercially available noise-canceling microphones do not provide sufficient reduction of noise that enters through microphones and thus becomes transmitted on outbound communication links.
Optimized delivery radiological reports: applying Six Sigma methodology to a radiology department.

PubMed

Cavagna, Enrico; Berletti, Riccardo; Schiavon, Francesco; Scarsi, Barbara; Barbato, Giuseppe

2003-03-01

To optimise the process of reporting and delivering radiological examinations with a view to achieving 100% service delivery within 72 hours to outpatients and 36 hours to inpatients. To this end, we used the Six Sigma method which adopts a systematic approach and rigorous statistical analysis to analyse and improve processes, by reducing variability and minimising errors. More specifically, our study focused on the process of radiological report creation, from the end of the examination to the time when the report is made available to the patient, to examine the bottlenecks and identify the measures to be taken to improve the process. Six Sigma uses a five-step problem-solving process called DMAIC, an acronym for Define, Measure, Analyze, Improve and Control. The first step is to define the problem and the elements crucial to quality, in terms of Total Quality Control. Next, the situation is analysed to identify the root causes of the problem and determine which of these is most influential. The situation is then improved by implementing change. Finally, to make sure that the change is long-lasting, measures are taken to sustain the improvements and obtain long-term control. In our case we analysed all of the phases the report passes through before reaching the user, and studied the impact of voice-recognition reporting on the speed of the report creation process. Analysis of the information collected showed that the tools available for report creation (dictaphone, voice-recognition system) and the transport of films and reports were the two critical elements on which to focus our efforts. Of all the phases making up the process, reporting (from end of examination to end of reporting) and distribution (from the report available to administrative staff to report available to the patient) account for 90% of process variability (73% and 17%, respectively). 
We further found that the reports dictated into a voice-recognition reporting system are delivered in 45 hours (median), whereas those dictated using a dictaphone take 96 hours: voice-recognition reporting systems therefore improve performance by 50 hours. Unfortunately, 38% of our reports are delivered within longer timeframes than the 72h for outpatients and 36h for inpatients agreed with the service users. Reports for inpatients have much faster delivery times and lower variability, as 95% of these examinations are reported using voice-recognition reporting (as a result of the greater sensitivity of physicians to the problem of inpatient waiting times). For conventional radiology examinations, numerically greater than CT or MRI, there is a stronger tendency to use the dictaphone which allows for faster dictation as it is unburdened by administrative tasks such as entering examination codes, correcting errors, etc. Freelance status has no impact on report delivery times, service delivery being the same as in the institutional setting. The subprocess of reporting is strongly affected by the choice of reporting method (voice-recognition system or dictaphone), whereas report delivery is affected by the individual's behaviour patterns and ultimately by habits generated by the lack of a clearly charted process (lack of synchronisation among the various phases), and therefore potentially avoidable. The analytical study of the various phases of examination reporting, from writing to delivery, allowed us to identify the process bottlenecks and take corrective measures. Regardless of imaging modality and individual physician, examination reporting consistently takes longer when a dictaphone is used instead of a voice-recognition reporting system, as this makes the process more complex. To improve the two critical subprocesses whilst maintaining constant resources, a first step is to abandon the dictaphone in favour of the voice-recognition system. 
In addition, we are experimenting with other measures to improve the collection and sorting of examinations and the delivery of reports: the technical staff take the films from the examination rooms to the reporting rooms three times a day; the radiologists collect their examinations and prepare the reports, possibly on the same day; the radiologists leave their signed reports on the table in the central reporting room; the administrative staff collect the signed reports three times a day in the morning and afternoon to be able to deliver them on the same day. This project has allowed us to become familiar with the principles of total quality, to better understand our internal processes and to take effective measures to optimise them. This has resulted in enhanced satisfaction of all the department staff and has laid the grounds for further measures in the future.
Comparison of voice-automated transcription and human transcription in generating pathology reports.

PubMed

Al-Aynati, Maamoun M; Chorneyko, Katherine A

2003-06-01

Software that can convert spoken words into written text has been available since the early 1980s. Early continuous speech systems were developed in 1994, with the latest commercially available editions having a claimed accuracy of up to 98% of speech recognition at natural speech rates. To evaluate the efficacy of one commercially available voice-recognition software system with pathology vocabulary in generating pathology reports and to compare this with human transcription. To draw cost analysis conclusions regarding human versus computer-based transcription. Two hundred six routine pathology reports from the surgical pathology material handled at St Joseph's Healthcare, Hamilton, Ontario, were generated simultaneously using computer-based transcription and human transcription. The following hardware and software were used: a desktop 450-MHz Intel Pentium III processor with 192 MB of RAM, a speech-quality sound card (Sound Blaster), noise-canceling headset microphone, and IBM ViaVoice Pro version 8 with pathology vocabulary support (Voice Automated, Huntington Beach, Calif). The cost of the hardware and software used was approximately Can 2250 dollars. A total of 23 458 words were transcribed using both methods with a mean of 114 words per report. The mean accuracy rate was 93.6% (range, 87.4%-96%) using the computer software, compared to a mean accuracy of 99.6% (range, 99.4%-99.8%) for human transcription (P <.001). Time needed to edit documents by the primary evaluator (M.A.) using the computer was on average twice that needed for editing the documents produced by human transcriptionists (range, 1.4-3.5 times). The extra time needed to edit documents was 67 minutes per week (13 minutes per day). Computer-based continuous speech-recognition systems in pathology can be successfully used in pathology practice even during the handling of gross pathology specimens. 
The relatively low accuracy rate of this voice-recognition software with resultant increased editing burden on pathologists may not encourage its application on a wide scale in pathology departments with sufficient human transcription services, despite significant potential financial savings. However, computer-based transcription represents an attractive and relatively inexpensive alternative to human transcription in departments where there is a shortage of transcription services, and will no doubt become more commonly used in pathology departments in the future.
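The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes a five-day working week and assumes the reported accuracy rates are per-word rates; neither assumption is stated in the abstract, so the per-report error estimates are illustrative only.

```python
total_words = 23458          # words transcribed by both methods (from the abstract)
reports = 206                # number of pathology reports (from the abstract)
mean_words = total_words / reports   # about 113.9, reported as 114

extra_minutes_per_week = 67  # extra editing time (from the abstract)
workdays_per_week = 5        # assumption: five-day working week
extra_minutes_per_day = extra_minutes_per_week / workdays_per_week  # 13.4

# Assumption: accuracy is measured per word, so (1 - accuracy) * words
# approximates the number of words needing correction in an average report.
software_errors_per_report = mean_words * (1 - 0.936)
human_errors_per_report = mean_words * (1 - 0.996)

print(f"mean words per report: {mean_words:.1f}")
print(f"extra editing per day: {extra_minutes_per_day:.1f} min")
print(f"words to correct per report, software: {software_errors_per_report:.1f}")
print(f"words to correct per report, human: {human_errors_per_report:.1f}")
```

Under these assumptions the software output needs roughly seven corrections per average report against fewer than one for human transcription, which is consistent with the reported doubling of editing time.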
Audiovisual speech facilitates voice learning.

PubMed

Sheffert, Sonya M; Olson, Elizabeth

2004-02-01

In this research, we investigated the effects of voice and face information on the perceptual learning of talkers and on long-term memory for spoken words. In the first phase, listeners were trained over several days to identify voices from words presented auditorily or audiovisually. The training data showed that visual information about speakers enhanced voice learning, revealing cross-modal connections in talker processing akin to those observed in speech processing. In the second phase, the listeners completed an auditory or audiovisual word recognition memory test in which equal numbers of words were spoken by familiar and unfamiliar talkers. The data showed that words presented by familiar talkers were more likely to be retrieved from episodic memory, regardless of modality. Together, these findings provide new information about the representational code underlying familiar talker recognition and the role of stimulus familiarity in episodic word recognition.
Speech-recognition interfaces for music information retrieval

NASA Astrophysics Data System (ADS)

Goto, Masataka

2005-09-01

This paper describes two hands-free music information retrieval (MIR) systems that enable a user to retrieve and play back a musical piece by saying its title or the artist's name. Although various interfaces for MIR have been proposed, speech-recognition interfaces suitable for retrieving musical pieces have not been studied. Our MIR-based jukebox systems employ two different speech-recognition interfaces for MIR, speech completion and speech spotter, which exploit intentionally controlled nonverbal speech information in original ways. The first is a music retrieval system with the speech-completion interface that is suitable for music stores and car-driving situations. When a user only remembers part of the name of a musical piece or an artist and utters only a remembered fragment, the system helps the user recall and enter the name by completing the fragment. The second is a background-music playback system with the speech-spotter interface that can enrich human-human conversation. When a user is talking to another person, the system allows the user to enter voice commands for music playback control by spotting a special voice-command utterance in face-to-face or telephone conversations. Experimental results from use of these systems have demonstrated the effectiveness of the speech-completion and speech-spotter interfaces. (Video clips: http://staff.aist.go.jp/m.goto/MIR/speech-if.html)
Understanding the mechanisms of familiar voice-identity recognition in the human brain.

PubMed

Maguinness, Corrina; Roswandowitz, Claudia; von Kriegstein, Katharina

2018-03-31

Humans have a remarkable skill for voice-identity recognition: most of us can remember many voices that surround us as 'unique'. In this review, we explore the computational and neural mechanisms which may support our ability to represent and recognise a unique voice-identity. We examine the functional architecture of voice-sensitive regions in the superior temporal gyrus/sulcus, and bring together findings on how these regions may interact with each other, and additional face-sensitive regions, to support voice-identity processing. We also contrast findings from studies on neurotypicals and clinical populations which have examined the processing of familiar and unfamiliar voices. Taken together, the findings suggest that representations of familiar and unfamiliar voices might dissociate in the human brain. Such an observation does not fit well with current models for voice-identity processing, which by-and-large assume a common sequential analysis of the incoming voice signal, regardless of voice familiarity. We provide a revised audio-visual integrative model of voice-identity processing which brings together traditional and prototype models of identity processing. This revised model includes a mechanism of how voice-identity representations are established and provides a novel framework for understanding and examining the potential differences in familiar and unfamiliar voice processing in the human brain. Copyright © 2018 Elsevier Ltd. All rights reserved.
Definition of problems of persons in sheltered care environments

NASA Technical Reports Server (NTRS)

Fetzner, W. N.

1979-01-01

Innovations in health care using aerospace technologies are described. Voice synthesizer and voice recognition technologies were used in developing voice controlled wheel chairs and optacons. Telephone interface modules are also described.
Construction site Voice Operated Information System (VOIS) test

NASA Astrophysics Data System (ADS)

Lawrence, Debbie J.; Hettchen, William

1991-01-01

The Voice Activated Information System (VAIS), developed by USACERL, allows inspectors to verbally log on-site inspection reports on a hand-held tape recorder. The tape is later processed by the VAIS, which enters the information into the system's database and produces a written report. The Voice Operated Information System (VOIS), developed by USACERL and Automated Sciences Group, through a USACERL cooperative research and development agreement (CRDA), is an improved voice recognition system based on the concepts and function of the VAIS. To determine the applicability of the VOIS to Corps of Engineers construction projects, Technology Transfer Test Bed (T3B) funds were provided to the Corps of Engineers National Security Agency (NSA) Area Office (Fort Meade) to procure and implement the VOIS, and to train personnel in its use. This report summarizes the NSA application of the VOIS to quality assurance inspection of radio frequency shielding and to progress payment logs, and concludes that the VOIS is an easily implemented system that can offer improvements when applied to repetitive inspection procedures. Use of VOIS can save time during inspection, improve documentation storage, and provide flexible retrieval of stored information.
The cognitive neuroscience of person identification.

PubMed

Biederman, Irving; Shilowich, Bryan E; Herald, Sarah B; Margalit, Eshed; Maarek, Rafael; Meschke, Emily X; Hacker, Catrina M

2018-02-14

We compare and contrast five differences between person identification by voice and face. 1. There is little or no cost when a familiar face is to be recognized from an unrestricted set of possible faces, even at Rapid Serial Visual Presentation (RSVP) rates, but the accuracy of familiar voice recognition declines precipitously when the set of possible speakers is increased from one to a mere handful. 2. Whereas deficits in face recognition are typically perceptual in origin, those with normal perception of voices can manifest severe deficits in their identification. 3. Congenital prosopagnosics (CPros) and congenital phonagnosics (CPhon) are generally unable to imagine familiar faces and voices, respectively. Only in CPros, however, is this deficit a manifestation of a general inability to form visual images of any kind. CPhons report no deficit in imaging non-voice sounds. 4. The prevalence of CPhons of 3.2% is somewhat higher than the reported prevalence of approximately 2.0% for CPros in the population. There is evidence that CPhon represents a distinct condition statistically and not just normal variation. 5. Face and voice recognition proficiency are uncorrelated rather than reflecting limitations of a general capacity for person individuation. Copyright © 2018 Elsevier Ltd. All rights reserved.
The Glasgow Voice Memory Test: Assessing the ability to memorize and recognize unfamiliar voices.

PubMed

Aglieri, Virginia; Watson, Rebecca; Pernet, Cyril; Latinus, Marianne; Garrido, Lúcia; Belin, Pascal

2017-02-01

One thousand one hundred and twenty subjects as well as a developmental phonagnosic subject (KH) along with age-matched controls performed the Glasgow Voice Memory Test, which assesses the ability to encode and immediately recognize, through an old/new judgment, both unfamiliar voices (delivered as vowels, making language requirements minimal) and bell sounds. The inclusion of non-vocal stimuli allows the detection of significant dissociations between the two categories (vocal vs. non-vocal stimuli). The distributions of accuracy and sensitivity scores (d') reflected a wide range of individual differences in voice recognition performance in the population. As expected, KH showed a dissociation between the recognition of voices and bell sounds, her performance being significantly poorer than matched controls for voices but not for bells. By providing normative data of a large sample and by testing a developmental phonagnosic subject, we demonstrated that the Glasgow Voice Memory Test, available online and accessible from all over the world, can be a valid screening tool (~5 min) for a preliminary detection of potential cases of phonagnosia and of "super recognizers" for voices.
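The sensitivity score d' mentioned in this abstract is the standard signal-detection measure for old/new judgments: the z-transformed hit rate minus the z-transformed false-alarm rate. A minimal sketch follows; the function name and the trial counts are made up for illustration, and the log-linear correction (adding 0.5 to each count) is one common convention that the abstract does not say the authors used.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction (an assumption here): keeps both rates strictly
    # between 0 and 1 so the inverse normal CDF is always defined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical listener: 8 old voices and 8 new voices.
chance = d_prime(hits=4, misses=4, false_alarms=4, correct_rejections=4)
good = d_prime(hits=6, misses=2, false_alarms=2, correct_rejections=6)
perfect = d_prime(hits=8, misses=0, false_alarms=0, correct_rejections=8)
```

A listener at chance scores a d' of zero, and d' grows as old and new items are told apart more reliably, which is why it separates true sensitivity from a bias toward answering "old".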
The role of voice input for human-machine communication.

PubMed Central

Cohen, P R; Oviatt, S L

1995-01-01

Optimism is growing that the near future will witness rapid growth in human-computer interaction using voice. System prototypes have recently been built that demonstrate speaker-independent real-time speech recognition, and understanding of naturally spoken utterances with vocabularies of 1000 to 2000 words, and larger. Already, computer manufacturers are building speech recognition subsystems into their new product lines. However, before this technology can be broadly useful, a substantial knowledge base is needed about human spoken language and performance during computer-based spoken interaction. This paper reviews application areas in which spoken interaction can play a significant role, assesses potential benefits of spoken interaction with machines, and compares voice with other modalities of human-computer interaction. It also discusses information that will be needed to build a firm empirical foundation for the design of future spoken and multimodal interfaces. Finally, it argues for a more systematic and scientific approach to investigating spoken input and performance with future language technology. PMID:7479803
Voice-Recognition Augmented Performance Tools in Performance Poetry Pedagogy

ERIC Educational Resources Information Center

Devanny, David; McGowan, Jack

2016-01-01

This provocation shares findings from the use of bespoke voice-recognition performance software in a number of seminars (which took place in the 2014-2016 academic years at Glasgow School of Art, University of Warwick, and Falmouth University). The software, made available through this publication, is a web-app which uses Google Chrome's native…
Researching the Use of Voice Recognition Writing Software.

ERIC Educational Resources Information Center

Honeycutt, Lee

2003-01-01

Notes that voice recognition technology (VRT) has become accurate and fast enough to be useful in a variety of writing scenarios. Contends that little is known about how this technology might affect writing process or perceptions of silent writing. Explores future use of VRT by examining past research in the technology of dictation. (PM)

Strength Is in Numbers: Can Concordant Artificial Listeners Improve Prediction of Emotion from Speech?

PubMed

Martinelli, Eugenio; Mencattini, Arianna; Daprati, Elena; Di Natale, Corrado

2016-01-01

Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Albeit numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet, fast and reliable applications for emotion recognition are the obvious advancement of present 'intelligent personal assistants', and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performances relative to a public database of spontaneous speeches are reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc.) and for applications in the research domain (such as real-time pairing of stimuli to participants' emotional state, selective/differential data collection based on emotional content, etc.).
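The abstract gives no algorithmic detail, so the following is not the authors' model. It is only a generic sketch of the underlying idea of consensus among several "listeners" (classifiers): a label is accepted only when a large enough share of them agree. The function name, the two-thirds threshold, and the emotion labels are all assumptions.

```python
from collections import Counter


def consensus_label(votes, min_agreement=2 / 3):
    """Return the most common label if enough listeners agree, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Accept the label only when the agreeing share reaches the threshold.
    return label if count / len(votes) >= min_agreement else None


agreed = consensus_label(["angry", "angry", "sad"])
no_consensus = consensus_label(["angry", "sad", "happy"])
```

In a semi-supervised setting, samples that reach consensus could be added to the training pool with the agreed label, while the rest stay unlabeled; that use is an assumption about how such a scheme is typically applied.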
[Research on Barrier-free Home Environment System Based on Speech Recognition].

PubMed

Zhu, Husheng; Yu, Hongliu; Shi, Ping; Fang, Youfang; Jian, Zhuo

2015-10-01

The number of people with physical disabilities is increasing year by year, and population aging is becoming increasingly pronounced. To improve quality of life, a control system for an accessible home environment was developed that lets patients with serious disabilities control home electrical devices by voice. The control system includes a central control platform, a speech recognition module, a terminal operation module, etc. The system combines speech recognition control technology and wireless information transmission technology with embedded mobile computing technology, and interconnects the lamps, electronic locks, alarms, TV and other electrical devices in the home environment into a whole system through wireless network nodes. The experimental results showed that the speech recognition success rate was more than 84% in the home environment.
Voice response system of color and pattern on clothes for visually handicapped person.

PubMed

Miyake, Masao; Manabe, Yoshitsugu; Uranishi, Yuki; Imura, Masataka; Oshiro, Osamu

2013-01-01

For visually handicapped people, mental support is important for independent daily life and participation in society. A system that can recognize the colors and patterns of clothes is expected to let them go out with fewer concerns. We carried out a basic study of such a system and developed a prototype that stably recognizes colors and patterns and immediately reports this information by voice when the user points it at clothes. The evaluation experiments show that the prototype achieves a higher accuracy rate for color and pattern recognition than the system from the basic study.
Voice technology and BBN

NASA Technical Reports Server (NTRS)

Wolf, Jared J.

1977-01-01

The following research was discussed: (1) speech signal processing; (2) automatic speech recognition; (3) continuous speech understanding; (4) speaker recognition; (5) speech compression; (6) subjective and objective evaluation of speech communication system; (7) measurement of the intelligibility and quality of speech when degraded by noise or other masking stimuli; (8) speech synthesis; (9) instructional aids for second-language learning and for training of the deaf; and (10) investigation of speech correlates of psychological stress. Experimental psychology, control systems, and human factors engineering, which are often relevant to the proper design and operation of speech systems are described.
Automatic speech recognition in air-ground data link

NASA Technical Reports Server (NTRS)

Armstrong, Herbert B.

1989-01-01

In the present air traffic system, information presented to the transport aircraft cockpit crew may originate from a variety of sources and may be presented to the crew in visual or aural form, either through cockpit instrument displays or, most often, through voice communication. Voice radio communications are the most error-prone method for air-ground data link. Voice messages can be misstated or misunderstood and radio frequency congestion can delay or obscure important messages. To prevent proliferation, a multiplexed data link display can be designed to present information from multiple data link sources on a shared cockpit display unit (CDU) or multi-function display (MFD) or some future combination of flight management and data link information. An aural data link which incorporates an automatic speech recognition (ASR) system for crew response offers several advantages over visual displays. The possibility of applying ASR to the air-ground data link was investigated. The first step was to review current efforts in ASR applications in the cockpit and in air traffic control and evaluate their possible data link application. Next, a series of preliminary research questions is to be developed for possible future collaboration.
Emotional Recognition in Autism Spectrum Conditions from Voices and Faces

ERIC Educational Resources Information Center

Stewart, Mary E.; McAdam, Clair; Ota, Mitsuhiko; Peppe, Sue; Cleland, Joanne

2013-01-01

The present study reports on a new vocal emotion recognition task and assesses whether people with autism spectrum conditions (ASC) perform differently from typically developed individuals on tests of emotional identification from both the face and the voice. The new test of vocal emotion contained trials in which the vocal emotion of the sentence…
Cultural In-Group Advantage: Emotion Recognition in African American and European American Faces and Voices

ERIC Educational Resources Information Center

Wickline, Virginia B.; Bailey, Wendy; Nowicki, Stephen

2009-01-01

The authors explored whether there were in-group advantages in emotion recognition of faces and voices by culture or geographic region. Participants were 72 African American students (33 men, 39 women), 102 European American students (30 men, 72 women), 30 African international students (16 men, 14 women), and 30 European international students…
(Almost) Word for Word: As Voice Recognition Programs Improve, Students Reap the Benefits

ERIC Educational Resources Information Center

Smith, Mark

2006-01-01

Voice recognition software is hardly new--attempts at capturing spoken words and turning them into written text have been available to consumers for about two decades. But what was once an expensive and highly unreliable tool has made great strides in recent years, perhaps most recognized in programs such as Nuance's Dragon NaturallySpeaking…
Keys to the Adoption and Use of Voice Recognition Technology in Organizations.

ERIC Educational Resources Information Center

Goette, Tanya

2000-01-01

Presents results from a field study of individuals with disabilities who used voice recognition technology (VRT). Results indicated that task-technology fit, training, the environment, and disability limitations were the differentiating items, and that using VRT for a trial period may be the major factor in successful adoption of the technology.…
Educational Technology and Student Voice: Examining Teacher Candidates' Perceptions

ERIC Educational Resources Information Center

Byker, Erik Jon; Putman, S. Michael; Handler, Laura; Polly, Drew

2017-01-01

Student Voice is a term that honors the participatory roles that students have when they enter learning spaces like classrooms. Student Voice is the recognition of students' choice, creativity, and freedom. Seminal educationists--like Dewey and Montessori--centered the purposes of education in the flourishing and valuing of Student Voice. This…
Age- and gender-related variations of emotion recognition in pseudowords and faces.

PubMed

Demenescu, Liliana R; Mathiak, Krystyna A; Mathiak, Klaus

2014-01-01

BACKGROUND/STUDY CONTEXT: The ability to interpret emotionally salient stimuli is an important skill for successful social functioning at any age. The objective of the present study was to disentangle age and gender effects on emotion recognition ability in voices and faces. Three age groups of participants (young, age range: 18-35 years; middle-aged, age range: 36-55 years; and older, age range: 56-75 years) identified basic emotions presented in voices and faces in a forced-choice paradigm. Five emotions (angry, fearful, sad, disgusted, and happy) and a nonemotional category (neutral) were shown as encoded in color photographs of facial expressions and pseudowords spoken in affective prosody. Overall, older participants had a lower accuracy rate in categorizing emotions than young and middle-aged participants. Females performed better than males in recognizing emotions from voices, and this gender difference emerged in middle-aged and older participants. The performance of emotion recognition in faces was significantly correlated with the performance in voices. The current study provides further evidence for a general age and gender effect on emotion recognition; the advantage of females seems to be age- and stimulus modality-dependent.
Investigation of air transportation technology at Princeton University, 1983

NASA Technical Reports Server (NTRS)

Stengel, Robert F.

1987-01-01

Progress is discussed for each of the following areas: voice recognition technology for flight control; guidance and control strategies for penetration of microbursts and wind shear; application of artificial intelligence in flight control systems; and computer-aided aircraft design.
Federal Barriers to Innovation

ERIC Educational Resources Information Center

Miller, Raegen; Lake, Robin

2012-01-01

With educational outcomes inadequate, resources tight, and students' academic needs growing more complex, America's education system is certainly ready for technological innovation. And technology itself is ripe to be exploited. Devices harnessing cheap computing power have become smart and connected. Voice recognition, artificial intelligence,…
The Sound-to-Speech Translations Utilizing Graphics Mediation Interface for Students with Severe Handicaps. Final Report.

ERIC Educational Resources Information Center

Brown, Carrie; And Others

This final report describes activities and outcomes of a research project on a sound-to-speech translation system utilizing a graphic mediation interface for students with severe disabilities. The STS/Graphics system is a voice recognition, computer-based system designed to allow individuals with mental retardation and/or severe physical…
How well does voice interaction work in space?

NASA Technical Reports Server (NTRS)

Morris, Randy B.; Whitmore, Mihriban; Adam, Susan C.

1993-01-01

The methods and results of an evaluation of the Voice Navigator software package are discussed. The first phase or ground phase of the study consisted of creating, or training, computer voice files of specific commands. This consisted of repeating each of six commands eight times. The files were then tested for recognition accuracy by the software aboard the microgravity aircraft. During the second phase, both voice training and testing were performed in microgravity. Inflight training was done due to problems encountered in phase one which were believed to be caused by ambient noise levels. Both quantitative and qualitative data were collected. Only one of the commands was found to offer consistently high recognition rates across subjects during the second phase.
Detection of Terrorist Preparations by an Artificial Intelligence Expert System Employing Fuzzy Signal Detection Theory

DTIC Science & Technology

2004-10-25

FUSEDOT does not require facial recognition, or video surveillance of public areas, both of which are apparently a component of TIA ([26], pp...does not use fuzzy signal detection. Involves facial recognition and video surveillance of public areas. Involves monitoring the content of voice...fuzzy signal detection, which TIA does not. Second, FUSEDOT would be easier to develop, because it does not require the development of facial
Economic Evaluation of Voice Recognition (VR) for the Clinician’s Desktop at the Naval Hospital Roosevelt Roads

DTIC Science & Technology

1997-09-01

first PC-based, very large vocabulary dictation system with a continuous natural language free flow approach to speech recognition. (This system allows...indicating the likelihood that a particular stored HMM reference model is the best match for the input. This approach is called the Baum-Welch...InfoCentral, and Envoy 1.0; and Lotus Development Corp.’s SmartSuite 3, Approach 3.0, and Organizer. 2. IBM At a press conference in New York in June 1997, IBM
Strength Is in Numbers: Can Concordant Artificial Listeners Improve Prediction of Emotion from Speech?

PubMed Central

Martinelli, Eugenio; Mencattini, Arianna; Di Natale, Corrado

2016-01-01

Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Albeit numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet, fast and reliable applications for emotion recognition are the obvious advancement of present ‘intelligent personal assistants’, and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performances relative to a public database of spontaneous speeches are reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc.) and for applications in the research domain (such as real-time pairing of stimuli to participants’ emotional state, selective/differential data collection based on emotional content, etc.). PMID:27563724
Selective attention in perceptual adjustments to voice.

PubMed

Mullennix, J W; Howe, J N

1999-10-01

The effects of perceptual adjustments to voice information on the perception of isolated spoken words were examined. In two experiments, spoken target words were preceded or followed within a trial by a neutral word spoken in the same voice or in a different voice as the target. Overall, words were reproduced more accurately on trials on which the voice of the neutral word matched the voice of the spoken target word, suggesting that perceptual adjustments to voice interfere with word processing. This result, however, was mediated by selective attention to voice. The results provide further evidence of a close processing relationship between perceptual adjustments to voice and spoken word recognition.
Whispering - The hidden side of auditory communication.

PubMed

Frühholz, Sascha; Trost, Wiebke; Grandjean, Didier

2016-11-15

Whispering is a unique expression mode that is specific to auditory communication. Individuals switch their vocalization mode to whispering especially when affected by inner emotions in certain social contexts, such as in intimate relationships or intimidating social interactions. Although this context-dependent whispering is adaptive, whispered voices are acoustically far less rich than phonated voices and thus impose higher hearing and neural auditory decoding demands for recognizing their socio-affective value by listeners. The neural dynamics underlying this recognition especially from whispered voices are largely unknown. Here we show that whispered voices in humans are considerably impoverished as quantified by an entropy measure of spectral acoustic information, and this missing information needs large-scale neural compensation in terms of auditory and cognitive processing. Notably, recognizing the socio-affective information from voices was slightly more difficult from whispered voices, probably based on missing tonal information. While phonated voices elicited extended activity in auditory regions for decoding of relevant tonal and time information and the valence of voices, whispered voices elicited activity in a complex auditory-frontal brain network. Our data suggest that a large-scale multidirectional brain network compensates for the impoverished sound quality of socially meaningful environmental signals to support their accurate recognition and valence attribution. Copyright © 2016 Elsevier Inc. All rights reserved.

Cross-cultural emotional prosody recognition: evidence from Chinese and British listeners.

PubMed

Paulmann, Silke; Uskul, Ayse K

2014-01-01

This cross-cultural study of emotional tone of voice recognition tests the in-group advantage hypothesis (Elfenbein & Ambady, 2002) employing a quasi-balanced design. Individuals of Chinese and British background were asked to recognise pseudosentences produced by Chinese and British native speakers, displaying one of seven emotions (anger, disgust, fear, happy, neutral tone of voice, sad, and surprise). Findings reveal that emotional displays were recognised at rates higher than predicted by chance; however, members of each cultural group were more accurate in recognising the displays communicated by a member of their own cultural group than a member of the other cultural group. Moreover, the evaluation of error matrices indicates that both culture groups relied on similar mechanisms when recognising emotional displays from the voice. Overall, the study reveals evidence for both universal and culture-specific principles in vocal emotion recognition.
Processing voiceless vowels in Japanese: Effects of language-specific phonological knowledge

NASA Astrophysics Data System (ADS)

Ogasawara, Naomi

2005-04-01

There has been little research on processing allophonic variation in the field of psycholinguistics. This study focuses on processing the voiced/voiceless allophonic alternation of high vowels in Japanese. Three perception experiments were conducted to explore how listeners parse out vowels with the voicing alternation from other segments in the speech stream and how the different voicing statuses of the vowel affect listeners' word recognition process. The results from the three experiments show that listeners use phonological knowledge of their native language for phoneme processing and for word recognition. However, interactions of the phonological and acoustic effects are observed to be different in each process. The facilitatory phonological effect and the inhibitory acoustic effect cancel out one another in phoneme processing; while in word recognition, the facilitatory phonological effect overrides the inhibitory acoustic effect.
The Effects of Certain Background Noises on the Performance of a Voice Recognition System.

DTIC Science & Technology

1980-09-01

Principles in Experimental Design. New York: McGraw-Hill, 1962. Woodworth, R.S. and H. Schlosberg, Experimental Psychology, (Revised edition), New...collection sheet APPENDIX II EXPERIMENTAL PROTOCOL AND SUBJECTS’ INSTRUCTIONS THIS IS AN EXPERIMENT DESIGNED TO EVALUATE SOME VOICE RECOGNITION EQUIPMENT. I...37. CDR Paul Chatelier OUSD R&E Room 3D129 Pentagon Washington, D.C. 20301 38. Ralph Cleveland NFMSO Code 9333 Mechanicsburg, PA 17055 39. Clay Coler
Separation of Singing Voice from Music Accompaniment for Monaural Recordings

DTIC Science & Technology

2005-09-01

Directory: pub/tech-report/2005 File in pdf format: TR61.pdf Separation of Singing Voice from Music Accompaniment for Monaural Recordings Yipeng Li...Abstract Separating singing voice from music accompaniment is very useful in many applications, such as lyrics recognition and alignment, singer...identification, and music information retrieval. Although speech separation has been extensively studied for decades, singing voice separation has been little
Functional connectivity between face-movement and speech-intelligibility areas during auditory-only speech perception.

PubMed

Schall, Sonja; von Kriegstein, Katharina

2014-01-01

It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers' voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker's face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas.
Deaf-And-Mute Sign Language Generation System

NASA Astrophysics Data System (ADS)

Kawai, Hideo; Tamura, Shinichi

1984-08-01

We have developed a system that recognizes speech and generates the corresponding animation-like sign language sequence. The system is implemented on a popular personal computer equipped with three video RAMs and a voice recognition board that recognizes only the registered voice of a specific speaker. Presently, forty sign language patterns and fifty finger spellings are stored on two floppy disks. Each sign pattern is composed of one to four sub-patterns: a pattern with a single sub-pattern is displayed as a still pattern, and any other pattern is displayed as a motion pattern. The system is intended to help communication between deaf-and-mute persons and hearing persons. To achieve high display speed, almost all of the programs are written in machine language.
Using Ambulatory Voice Monitoring to Investigate Common Voice Disorders: Research Update

PubMed Central

Mehta, Daryush D.; Van Stan, Jarrad H.; Zañartu, Matías; Ghassemi, Marzyeh; Guttag, John V.; Espinoza, Víctor M.; Cortés, Juan P.; Cheyne, Harold A.; Hillman, Robert E.

2015-01-01

Many common voice disorders are chronic or recurring conditions that are likely to result from inefficient and/or abusive patterns of vocal behavior, referred to as vocal hyperfunction. The clinical management of hyperfunctional voice disorders would be greatly enhanced by the ability to monitor and quantify detrimental vocal behaviors during an individual’s activities of daily life. This paper provides an update on ongoing work that uses a miniature accelerometer on the neck surface below the larynx to collect a large set of ambulatory data on patients with hyperfunctional voice disorders (before and after treatment) and matched-control subjects. Three types of analysis approaches are being employed in an effort to identify the best set of measures for differentiating among hyperfunctional and normal patterns of vocal behavior: (1) ambulatory measures of voice use that include vocal dose and voice quality correlates, (2) aerodynamic measures based on glottal airflow estimates extracted from the accelerometer signal using subject-specific vocal system models, and (3) classification based on machine learning and pattern recognition approaches that have been used successfully in analyzing long-term recordings of other physiological signals. Preliminary results demonstrate the potential for ambulatory voice monitoring to improve the diagnosis and treatment of common hyperfunctional voice disorders. PMID:26528472
The computer in office medical practice.

PubMed

Dowdle, John

2002-04-01

There will continue to be change and evolution in the medical office environment. As voice recognition systems continue to improve, instant creation of office notes with the absence of dictation may be commonplace. As medical and computer technology evolves, we must continue to evaluate the many new computer systems that can assist us in our clinical office practice.
Improving Speaker Recognition by Biometric Voice Deconstruction

PubMed Central

Mazaira-Fernandez, Luis Miguel; Álvarez-Marquina, Agustín; Gómez-Vilda, Pedro

2015-01-01

Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have had to be replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved in recent years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with a set of features derived from the components obtained by deconstructing the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description of the main hypothesis and of the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on a mobile phone network database recorded under non-controlled acoustic conditions. PMID:26442245
Postlingual adult performance in noise with HiRes 120 and ClearVoice Low, Medium, and High.

PubMed

Holden, Laura K; Brenner, Christine; Reeder, Ruth M; Firszt, Jill B

2013-11-01

The study's objectives were to evaluate speech recognition in multiple listening conditions using several noise types with HiRes 120 and ClearVoice (Low, Medium, High) and to determine which ClearVoice program was most beneficial for everyday use. Fifteen postlingual adults attended four sessions; speech recognition was assessed at sessions 1 and 3 with HiRes 120 and at sessions 2 and 4 with all ClearVoice programs. Test measures included sentences presented in restaurant noise (R-SPACE), in speech-spectrum noise, in four- and eight-talker babble, and connected discourse presented in 12-talker babble. Participants completed a questionnaire comparing ClearVoice programs. Significant group differences in performance between HiRes 120 and ClearVoice were present only in the R-SPACE; performance was better with ClearVoice High than HiRes 120. Among ClearVoice programs, no significant group differences were present for any measure. Individual results revealed most participants performed better in the R-SPACE with ClearVoice than HiRes 120. For other measures, significant individual differences between HiRes 120 and ClearVoice were not prevalent. Individual results among ClearVoice programs differed and overall preferences varied. Questionnaire data indicated increased understanding with High and Medium in certain environments. R-SPACE and questionnaire results indicated an advantage for ClearVoice High and Medium. Individual test and preference data showed mixed results between ClearVoice programs making global recommendations difficult; however, results suggest providing ClearVoice High and Medium and HiRes 120 as processor options for adults willing to change settings. For adults unwilling or unable to change settings, ClearVoice Medium is a practical choice for daily listening.
Automatic Speech Acquisition and Recognition for Spacesuit Audio Systems

NASA Technical Reports Server (NTRS)

Ye, Sherry

2015-01-01

NASA has a widely recognized but unmet need for novel human-machine interface technologies that can facilitate communication during astronaut extravehicular activities (EVAs), when loud noises and strong reverberations inside spacesuits make communication challenging. WeVoice, Inc., has developed a multichannel signal-processing method for speech acquisition in noisy and reverberant environments that enables automatic speech recognition (ASR) technology inside spacesuits. The technology reduces noise by exploiting differences between the statistical nature of signals (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, ASR accuracy can be improved to the level at which crewmembers will find the speech interface useful. System components and features include beam forming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, and ASR decoding. Arithmetic complexity models were developed and will help designers of real-time ASR systems select proper tasks when confronted with constraints in computational resources. In Phase I of the project, WeVoice validated the technology. The company further refined the technology in Phase II and developed a prototype for testing and use by suited astronauts.
Freedom to Grow: Children's Perspectives of Student Voice

ERIC Educational Resources Information Center

Quinn, Sarah; Owen, Susanne

2014-01-01

This article explores the power of student voice, in recognition of the child's right to be treated as a capable, competent social actor involved in the education process. In this study, student voice is considered in the light of improving students' engagement and personal and social development at the primary school level. It emphasizes the…
Recognition of facial, auditory, and bodily emotions in older adults.

PubMed

Ruffman, Ted; Halberstadt, Jamin; Murray, Janice

2009-11-01

Understanding older adults' social functioning difficulties requires insight into their recognition of emotion processing in voices and bodies, not just faces, the focus of most prior research. We examined 60 young and 61 older adults' recognition of basic emotions in facial, vocal, and bodily expressions, and when matching faces and bodies to voices, using 120 emotion items. Older adults were worse than young adults in 17 of 30 comparisons, with consistent difficulties in recognizing both positive (happy) and negative (angry and sad) vocal and bodily expressions. Nearly three quarters of older adults functioned at a level similar to the lowest one fourth of young adults, suggesting that age-related changes are common. In addition, we found that older adults' difficulty in matching emotions was not explained by difficulty on the component sources (i.e., faces or voices on their own), suggesting an additional problem of integration.
Design and performance of a large vocabulary discrete word recognition system. Volume 1: Technical report. [real time computer technique for voice data processing]

NASA Technical Reports Server (NTRS)

1973-01-01

The development, construction, and test of a 100-word vocabulary near real time word recognition system are reported. Included are reasonable replacement of any one or all 100 words in the vocabulary, rapid learning of a new speaker, storage and retrieval of training sets, verbal or manual single word deletion, continuous adaptation with verbal or manual error correction, on-line verification of vocabulary as spoken, system modes selectable via verification display keyboard, relationship of classified word to neighboring word, and a versatile input/output interface to accommodate a variety of applications.
Robotic air vehicle. Blending artificial intelligence with conventional software

NASA Technical Reports Server (NTRS)

Mcnulty, Christa; Graham, Joyce; Roewer, Paul

1987-01-01

The Robotic Air Vehicle (RAV) system is described. The program's objectives were to design, implement, and demonstrate cooperating expert systems for piloting robotic air vehicles. The development of this system merges conventional programming used in passive navigation with Artificial Intelligence techniques such as voice recognition, spatial reasoning, and expert systems. The individual components of the RAV system are discussed as well as their interactions with each other and how they operate as a system.
Statistical Evaluation of Biometric Evidence in Forensic Automatic Speaker Recognition

NASA Astrophysics Data System (ADS)

Drygajlo, Andrzej

Forensic speaker recognition is the process of determining if a specific individual (suspected speaker) is the source of a questioned voice recording (trace). This paper aims at presenting forensic automatic speaker recognition (FASR) methods that provide a coherent way of quantifying and presenting recorded voice as biometric evidence. In such methods, the biometric evidence consists of the quantified degree of similarity between speaker-dependent features extracted from the trace and speaker-dependent features extracted from recorded speech of a suspect. The interpretation of recorded voice as evidence in the forensic context presents particular challenges, including within-speaker (within-source) variability and between-speakers (between-sources) variability. Consequently, FASR methods must provide a statistical evaluation which gives the court an indication of the strength of the evidence given the estimated within-source and between-sources variabilities. This paper reports on the first ENFSI evaluation campaign through a fake case, organized by the Netherlands Forensic Institute (NFI), as an example, where an automatic method using the Gaussian mixture models (GMMs) and the Bayesian interpretation (BI) framework were implemented for the forensic speaker recognition task.
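The strength-of-evidence idea in the abstract above can be illustrated with a likelihood ratio: the density of the observed similarity score under the within-source model divided by its density under the between-sources model. The following is a minimal sketch only. It assumes each score distribution is a single Gaussian fitted to sample scores, which is a simplification of the GMM-based method the paper describes; the function name and the sample scores are hypothetical.

```python
from statistics import NormalDist


def likelihood_ratio(evidence, within_scores, between_scores):
    """Ratio of the evidence density under the within-source model to its
    density under the between-sources model.

    Assumption: each set of scores is modelled by one Gaussian fitted
    with NormalDist.from_samples, not by a Gaussian mixture.
    """
    within = NormalDist.from_samples(within_scores)
    between = NormalDist.from_samples(between_scores)
    return within.pdf(evidence) / between.pdf(evidence)


if __name__ == "__main__":
    # Hypothetical similarity scores, for illustration only.
    same_speaker = [9.0, 10.0, 11.0]
    other_speakers = [1.0, 2.0, 3.0]
    print(likelihood_ratio(8.5, same_speaker, other_speakers))
```

A ratio above 1 supports the hypothesis that the suspect is the source of the trace, and a ratio below 1 supports the alternative; how far the value lies from 1 indicates the strength of the evidence.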
Design and development of a Space Station proximity operations research and development mockup

NASA Technical Reports Server (NTRS)

Haines, Richard F.

1986-01-01

Proximity operations (Prox-Ops) on-orbit refers to all activities taking place within one km of the Space Station. Designing a Prox-Ops control station calls for a comprehensive systems approach which takes into account structural constraints, orbital dynamics including approach/departure flight paths, myriad human factors and other topics. This paper describes a reconfigurable full-scale mock-up of a Prox-Ops station constructed at Ames incorporating an array of windows (with dynamic star field, target vehicle(s), and head-up symbology), head-down perspective display of manned and unmanned vehicles, voice-actuated 'electronic checklist', computer-generated voice system, expert system (to help diagnose subsystem malfunctions), and other displays and controls. The facility is used for demonstrations of selected Prox-Ops approach scenarios, human factors research (work-load assessment, determining external vision envelope requirements, head-down and head-up symbology design, voice synthesis and recognition research, etc.) and development of engineering design guidelines for future module interiors.
Multifunctional microcontrollable interface module

NASA Astrophysics Data System (ADS)

Spitzer, Mark B.; Zavracky, Paul M.; Rensing, Noa M.; Crawford, J.; Hockman, Angela H.; Aquilino, P. D.; Girolamo, Henry J.

2001-08-01

This paper reports the development of a complete eyeglass-mounted computer interface system including display, camera and audio subsystems. The display system provides an SVGA image with a 20 degree horizontal field of view. The camera system has been optimized for face recognition and provides a 19 degree horizontal field of view. A microphone and built-in pre-amp optimized for voice recognition and a speaker on an articulated arm are included for audio. An important feature of the system is a high degree of adjustability and reconfigurability. The system has been developed for testing by the Military Police, in a complete system comprising the eyeglass-mounted interface, a wearable computer, and an RF link. Details of the design, construction, and performance of the eyeglass-based system are discussed.
Onset and Maturation of Fetal Heart Rate Response to the Mother's Voice over Late Gestation

ERIC Educational Resources Information Center

Kisilevsky, Barbara S.; Hains, Sylvia M. J.

2011-01-01

Background: Term fetuses discriminate their mother's voice from a female stranger's, suggesting recognition/learning of some property of her voice. Identification of the onset and maturation of the response would increase our understanding of the influence of environmental sounds on the development of sensory abilities and identify the period when…

Functional Connectivity between Face-Movement and Speech-Intelligibility Areas during Auditory-Only Speech Perception

PubMed Central

Schall, Sonja; von Kriegstein, Katharina

2014-01-01

It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers’ voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker’s face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas. PMID:24466026
Blind speech separation system for humanoid robot with FastICA for audio filtering and separation

NASA Astrophysics Data System (ADS)

Budiharto, Widodo; Santoso Gunawan, Alexander Agung

2016-07-01

Nowadays, there are many developments in building intelligent humanoid robots, mainly for handling voice and images. In this research, we propose a blind speech separation system using FastICA for audio filtering and separation that can be used in education or entertainment. Our main problem is to separate multiple speech sources and to filter out irrelevant noise. After the speech separation step, the results are integrated with our previous speech and face recognition system, which is based on a Bioloid GP robot with a Raspberry Pi 2 as controller. The experimental results show that the accuracy of our blind speech separation system is about 88% in command and query recognition cases.
Incorporating Speech Recognition into a Natural User Interface

NASA Technical Reports Server (NTRS)

Chapa, Nicholas

2017-01-01

The Augmented/Virtual Reality (AVR) Lab has been working to study the applicability of recent virtual and augmented reality hardware and software to KSC operations. This includes the Oculus Rift, HTC Vive, Microsoft HoloLens, and Unity game engine. My project in this lab is to integrate voice recognition and voice commands into an easy-to-modify system that can be added to an existing portion of a Natural User Interface (NUI). A NUI is an intuitive and simple-to-use interface incorporating visual, touch, and speech recognition. The inclusion of speech recognition capability will allow users to perform actions or make inquiries using only their voice. The simplicity of needing only to speak to control an on-screen object or enact some digital action means that any user can quickly become accustomed to using this system. Multiple programs were tested for use in a speech command and recognition system. Sphinx4 translates speech to text using a Hidden Markov Model (HMM) based Language Model, an Acoustic Model, and a word Dictionary running on Java. PocketSphinx had similar functionality to Sphinx4 but instead ran on C. However, neither of these programs was ideal, as building a Java or C wrapper slowed performance. The most suitable speech recognition system tested was the Unity Engine Grammar Recognizer. A Context Free Grammar (CFG) structure is written in an XML file to specify the structure of phrases and words that will be recognized by Unity Grammar Recognizer. Using Speech Recognition Grammar Specification (SRGS) 1.0 makes modifying the recognized combinations of words and phrases very simple and quick to do. With SRGS 1.0, semantic information can also be added to the XML file, which allows for even more control over how spoken words and phrases are interpreted by Unity. Additionally, using a CFG with SRGS 1.0 produces a Finite State Machine (FSM) functionality, limiting the potential for incorrectly heard words or phrases.
The purpose of my project was to investigate options for a Speech Recognition System. To that end, I attempted to integrate Sphinx4 into a user interface. Sphinx4 had great accuracy and is the only free program able to perform offline speech dictation. However, it had a limited dictionary of recognizable words, single-syllable words were almost impossible for it to hear, and because it ran on Java it could not be integrated into the Unity-based NUI. PocketSphinx ran much faster than Sphinx4, which would have made it ideal as a plugin to the Unity NUI. Unfortunately, creating a C# wrapper for the C code made the program unusable with Unity, because the wrapper slowed code execution and class files became unreachable. Unity Grammar Recognizer is the ideal speech recognition interface: it is flexible in recognizing multiple variations of the same command. It is also the most accurate program in recognizing speech because it uses an XML grammar to specify speech structure instead of relying solely on a Dictionary and Language model. The Unity Grammar Recognizer will be used with the NUI for these reasons, and because it is written in C#, which further simplifies the incorporation.
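To make the grammar-based approach described above more concrete, the sketch below embeds a small SRGS 1.0 XML grammar and enumerates the finite set of phrases it accepts. Everything here is illustrative: the rule names and commands ("open the hatch" and so on) are hypothetical and do not come from the project, and the Python walker is only a stand-in for what a recognizer does with the grammar. It handles the `rule`, `item`, `one-of`, and `ruleref` elements and ignores repeats, weights, and semantic `tag` elements.

```python
import xml.etree.ElementTree as ET

# Namespace defined by SRGS 1.0 for grammar documents.
NS = "{http://www.w3.org/2001/06/grammar}"

# Hypothetical command grammar: <action> "the" <target>.
GRAMMAR_XML = """
<grammar version="1.0" xml:lang="en-US" root="command"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="command" scope="public">
    <ruleref uri="#action"/>
    <item>the</item>
    <ruleref uri="#target"/>
  </rule>
  <rule id="action">
    <one-of>
      <item>open</item>
      <item>close</item>
    </one-of>
  </rule>
  <rule id="target">
    <one-of>
      <item>hatch</item>
      <item>valve</item>
    </one-of>
  </rule>
</grammar>
"""


def expand(node, rules):
    """Return every word sequence (list of words) the element accepts."""
    tag = node.tag.replace(NS, "")
    if tag == "ruleref":
        # Local rule reference of the form uri="#ruleid".
        return expand(rules[node.get("uri").lstrip("#")], rules)
    if tag == "one-of":
        # Alternatives: the union of what each child accepts.
        return [p for child in node for p in expand(child, rules)]
    # rule or item: a sequence of literal text and child expansions.
    phrases = [node.text.split() if node.text else []]
    for child in node:
        options = expand(child, rules)
        phrases = [p + o for p in phrases for o in options]
        if child.tail and child.tail.split():
            phrases = [p + child.tail.split() for p in phrases]
    return phrases


def accepted_phrases(xml_text):
    """Set of all phrases accepted by the grammar's root rule."""
    root = ET.fromstring(xml_text.strip())
    rules = {rule.get("id"): rule for rule in root.findall(NS + "rule")}
    return {" ".join(p) for p in expand(rules[root.get("root")], rules)}


if __name__ == "__main__":
    for phrase in sorted(accepted_phrases(GRAMMAR_XML)):
        print(phrase)
```

Because the grammar admits only a fixed set of phrases, an utterance outside that set cannot be returned as a match, which is the restriction the abstract credits with limiting incorrectly heard words or phrases.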
Plastic reorganization of neural systems for perception of others in the congenitally blind.

PubMed

Fairhall, S L; Porter, K B; Bellucci, C; Mazzetti, M; Cipolli, C; Gobbini, M I

2017-09-01

Recent evidence suggests that the function of the core system for face perception might extend beyond visual face-perception to a broader role in person perception. To critically test the broader role of core face-system in person perception, we examined the role of the core system during the perception of others in 7 congenitally blind individuals and 15 sighted subjects by measuring their neural responses using fMRI while they listened to voices and performed identity and emotion recognition tasks. We hypothesised that in people who have had no visual experience of faces, core face-system areas may assume a role in the perception of others via voices. Results showed that emotions conveyed by voices can be decoded in homologues of the core face system only in the blind. Moreover, there was a specific enhancement of response to verbal as compared to non-verbal stimuli in bilateral fusiform face areas and the right posterior superior temporal sulcus showing that the core system also assumes some language-related functions in the blind. These results indicate that, in individuals with no history of visual experience, areas of the core system for face perception may assume a role in aspects of voice perception that are relevant to social cognition and perception of others' emotions. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.
Speech-based Class Attendance

NASA Astrophysics Data System (ADS)

Faizel Amri, Umar; Nur Wahidah Nik Hashim, Nik; Hazrin Hany Mohamad Hanif, Noor

2017-11-01

In the department of engineering, students are required to fulfil at least 80 percent of class attendance. The conventional method requires each student to initial the attendance sheet. However, this method is prone to cheating, because another student can sign for a classmate who is absent. We develop our hypothesis according to a verse in the Holy Qur’an (95:4), “We have created men in the best of mould”. Based on the verse, we believe each psychological characteristic of a human being is unique and thus their speech characteristics should be unique. In this paper we present the development of a speech biometric-based attendance system. The system requires the user’s voice to be enrolled as training data, which is saved in the system to register the user. Subsequent voice samples from the user serve as test data to be verified against the training data stored in the system. The system uses PSD (Power Spectral Density) and Transition Parameter as the methods for feature extraction of the voices. Euclidean and Mahalanobis distances are used to verify the user’s voice. For this research, ten subjects (five females and five males) were chosen to test the performance of the system. The system performance in terms of recognition rate is found to be 60% correct identification of individuals.
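As a rough illustration of distance-based verification of the kind this abstract mentions, the Python sketch below compares a test feature vector with enrolled speaker models using Euclidean and Mahalanobis distances. It is not the authors' implementation: the feature vectors are assumed to be precomputed (for example, PSD values), the function names are hypothetical, and the Mahalanobis distance assumes a diagonal covariance matrix, which is a simplification made here to keep the example dependency-free.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mahalanobis_diag(x, mean, variances):
    """Mahalanobis distance under an ASSUMED diagonal covariance matrix."""
    return math.sqrt(sum((a - m) ** 2 / v for a, m, v in zip(x, mean, variances)))


def enroll(samples):
    """Per-feature mean and sample variance from at least two enrollment vectors."""
    n = len(samples)
    dims = len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dims)]
    # A small floor keeps every variance strictly positive.
    variances = [
        max(sum((s[d] - mean[d]) ** 2 for s in samples) / (n - 1), 1e-9)
        for d in range(dims)
    ]
    return mean, variances


def identify(test_vector, enrolled):
    """Return the enrolled speaker whose model is closest to the test vector."""
    return min(
        enrolled,
        key=lambda name: mahalanobis_diag(test_vector, *enrolled[name]),
    )


# Hypothetical, made-up feature vectors used only to exercise the functions.
ENROLLED = {
    "speaker_a": enroll([[1.0, 2.0, 3.0], [1.2, 1.8, 3.1], [0.9, 2.1, 2.9]]),
    "speaker_b": enroll([[4.0, 0.5, 1.0], [4.2, 0.4, 1.1], [3.9, 0.6, 0.9]]),
}
```

Calling `identify([1.1, 2.0, 3.0], ENROLLED)` selects the closest model. A deployed verifier would also need a rejection threshold so that unknown speakers are not matched to the nearest enrolled one.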
Neurocognition and symptoms identify links between facial recognition and emotion processing in schizophrenia: Meta-analytic findings

PubMed Central

Ventura, Joseph; Wood, Rachel C.; Jimenez, Amy M.; Hellemann, Gerhard S.

2014-01-01

Background: In schizophrenia patients, one of the most commonly studied deficits of social cognition is emotion processing (EP), which has documented links to facial recognition (FR). But how are deficits in facial recognition linked to emotion processing deficits? Can neurocognitive and symptom correlates of FR and EP help differentiate the unique contribution of FR to the domain of social cognition? Methods: A meta-analysis of 102 studies (combined n = 4826) in schizophrenia patients was conducted to determine the magnitude and pattern of relationships between facial recognition, emotion processing, neurocognition, and type of symptom. Results: Meta-analytic results indicated that facial recognition and emotion processing are strongly interrelated (r = .51). In addition, the relationship between FR and EP through voice prosody (r = .58) is as strong as the relationship between FR and EP based on facial stimuli (r = .53). Further, the relationship between emotion recognition, neurocognition, and symptoms is independent of the emotion processing modality – facial stimuli and voice prosody. Discussion: The association between FR and EP that occurs through voice prosody suggests that FR is a fundamental cognitive process. The observed links between FR and EP might be due to bottom-up associations between neurocognition and EP, and not simply because most emotion recognition tasks use visual facial stimuli. In addition, links with symptoms, especially negative symptoms and disorganization, suggest possible symptom mechanisms that contribute to FR and EP deficits. PMID:24268469
Neurocognition and symptoms identify links between facial recognition and emotion processing in schizophrenia: meta-analytic findings.

PubMed

Ventura, Joseph; Wood, Rachel C; Jimenez, Amy M; Hellemann, Gerhard S

2013-12-01

In schizophrenia patients, one of the most commonly studied deficits of social cognition is emotion processing (EP), which has documented links to facial recognition (FR). But, how are deficits in facial recognition linked to emotion processing deficits? Can neurocognitive and symptom correlates of FR and EP help differentiate the unique contribution of FR to the domain of social cognition? A meta-analysis of 102 studies (combined n=4826) in schizophrenia patients was conducted to determine the magnitude and pattern of relationships between facial recognition, emotion processing, neurocognition, and type of symptom. Meta-analytic results indicated that facial recognition and emotion processing are strongly interrelated (r=.51). In addition, the relationship between FR and EP through voice prosody (r=.58) is as strong as the relationship between FR and EP based on facial stimuli (r=.53). Further, the relationship between emotion recognition, neurocognition, and symptoms is independent of the emotion processing modality - facial stimuli and voice prosody. The association between FR and EP that occurs through voice prosody suggests that FR is a fundamental cognitive process. The observed links between FR and EP might be due to bottom-up associations between neurocognition and EP, and not simply because most emotion recognition tasks use visual facial stimuli. In addition, links with symptoms, especially negative symptoms and disorganization, suggest possible symptom mechanisms that contribute to FR and EP deficits. © 2013 Elsevier B.V. All rights reserved.
Voice emotion recognition by cochlear-implanted children and their normally-hearing peers

PubMed Central

Chatterjee, Monita; Zion, Danielle; Deroche, Mickael L.; Burianek, Brooke; Limb, Charles; Goren, Alison; Kulkarni, Aditya M.; Christensen, Julie A.

2014-01-01

Despite their remarkable success in bringing spoken language to hearing-impaired listeners, the signal transmitted through cochlear implants (CIs) remains impoverished in spectro-temporal fine structure. As a consequence, pitch-dominant information, such as voice emotion, is diminished. For young children, the ability to correctly identify the mood/intent of the speaker (which may not always be visible in their facial expression) is an important aspect of social and linguistic development. Previous work in the field has shown that children with cochlear implants (cCI) have significant deficits in voice emotion recognition relative to their normally hearing peers (cNH). Here, we report on voice emotion recognition by a cohort of 36 school-aged cCI. Additionally, we provide, for the first time, a comparison of their performance to that of cNH and NH adults (aNH) listening to CI simulations of the same stimuli. We also provide comparisons to the performance of adult listeners with CIs (aCI), most of whom learned language primarily through normal acoustic hearing. Results indicate that, despite strong variability, on average, cCI perform similarly to their adult counterparts; that both groups’ mean performance is similar to aNHs’ performance with 8-channel noise-vocoded speech; and that cNH achieve excellent scores in voice emotion recognition with full-spectrum speech but, on average, show significantly poorer scores than aNH with 8-channel noise-vocoded speech. A strong developmental effect was observed in the cNH with noise-vocoded speech in this task. These results point to the considerable benefit obtained by cochlear-implanted children from their devices, but also underscore the need for further research and development in this important and neglected area. PMID:25448167
Success with voice recognition.

PubMed

Sferrella, Sheila M

2003-01-01

You need a compelling reason to implement voice recognition technology. At my institution, the compelling reason was a turnaround time for Radiology results of more than two days. Only 41 percent of our reports were transcribed and signed within 24 hours. In November 1998, a team from Lehigh Valley Hospital went to RSNA and reviewed every voice system on the market. The evaluation was done with the radiologist workflow in mind, and we came back from the meeting with the vendor selection completed. The next steps included developing a business plan, approval of funds, reference calls to more than 15 sites and contract negotiation, all of which took about six months. The department of Radiology at Lehigh Valley Hospital and Health Network (LVHHN) is a multi-site center that performs over 360,000 procedures annually. The department handles all modalities of radiology: general diagnosis, neuroradiology, ultrasound, CT scan, MRI, interventional radiology, arthrography, myelography, bone densitometry, nuclear medicine, PET imaging, vascular lab and other advanced procedures. The department consists of 200 FTEs and a medical staff of more than 40 radiologists. The budget is in the $10.3 million range. There are three hospital sites and four outpatient imaging center sites where services are provided. At Lehigh Valley Hospital, radiologists are not dedicated to one subspecialty, so implementing a voice system by modality was not an option. Because transcription was so far behind, we needed to eliminate that part of the process. As a result, we decided to deploy the system all at once and with the radiologists as editors. The planning and testing phase took about four months, and the implementation took two weeks. We deployed over 40 workstations and trained close to 50 physicians. The radiologists brought in an extra radiologist from our group for the two weeks of training. That allowed us to train without taking a radiologist out of the department.
We trained three to six radiologists a day. I projected a savings of 5.0 FTEs over two years. The actual savings were 8.0 FTEs within three weeks for the first phase and an additional 4.3 FTEs within two weeks of the second phase. The transcription staff was retained to perform other types of transcription and not displaced. The goal was to reduce Medical Records' outsourcing expenses by $670,000 over three years. The actual savings are in excess of $900,000. The proposed payback period was 17 months, and the actual was less than 12 months. For two years prior to implementing the voice system, the turnaround time at Lehigh Valley was 41 percent within 24 hours. One week after implementation, the turnaround time was 78 percent within 24 hours. Today it ranges between 85 percent and 92 percent. Overall, the radiologists at Lehigh Valley Hospital did an excellent job with the cultural change to voice recognition. It has made a major impact on our ability to get reports to physicians in a timely manner so they can make treatment decisions.
Study on intelligent processing system of man-machine interactive garment frame model

NASA Astrophysics Data System (ADS)

Chen, Shuwang; Yin, Xiaowei; Chang, Ruijiang; Pan, Peiyun; Wang, Xuedi; Shi, Shuze; Wei, Zhongqian

2018-05-01

A man-machine interactive garment frame model intelligent processing system is studied in this paper. The system consists of several sensor devices, a voice processing module, mechanical parts, and centralized data acquisition devices. The sensor devices collect information on the environmental changes caused by a body near the clothes frame model; the data collection device collects the environmental-change information sensed by the sensor devices; the voice processing module performs speaker-independent speech recognition to achieve human-machine interaction; and the mechanical moving parts make corresponding mechanical responses to the information processed by the data collection device, to which they are connected by a one-way connection. There is a one-way connection between the sensor devices and the data collection device, and a two-way connection between the data acquisition device and the voice processing module. The intelligent processing system can judge whether it needs to interact with the customer and realize man-machine interaction, replacing the current rigid frame model.
Applications of artificial intelligence to space station: General purpose intelligent sensor interface

NASA Technical Reports Server (NTRS)

Mckee, James W.

1988-01-01

This final report describes the accomplishments of the General Purpose Intelligent Sensor Interface task of the Applications of Artificial Intelligence to Space Station grant for the period from October 1, 1987 through September 30, 1988. Portions of the First Biannual Report not revised will not be included but only referenced. The goal is to develop an intelligent sensor system that will simplify the design and development of expert systems using sensors of the physical phenomena as a source of data. This research will concentrate on the integration of image processing sensors and voice processing sensors with a computer designed for expert system development. The result of this research will be the design and documentation of a system in which the user will not need to be an expert in such areas as image processing algorithms, local area networks, image processor hardware selection or interfacing, television camera selection, voice recognition hardware selection, or analog signal processing. The user will be able to access data from video or voice sensors through standard LISP statements without any need to know about the sensor hardware or software.
"Who" is saying "what"? Brain-based decoding of human voice and speech.

PubMed

Formisano, Elia; De Martino, Federico; Bonte, Milene; Goebel, Rainer

2008-11-07

Can we decipher speech content ("what" is being said) and speaker identity ("who" is saying it) from observations of brain activity of a listener? Here, we combine functional magnetic resonance imaging with a data-mining algorithm and retrieve what and whom a person is listening to from the neural fingerprints that speech and voice signals elicit in the listener's auditory cortex. These cortical fingerprints are spatially distributed and insensitive to acoustic variations of the input so as to permit the brain-based recognition of learned speech from unknown speakers and of learned voices from previously unheard utterances. Our findings unravel the detailed cortical layout and computational properties of the neural populations at the basis of human speech recognition and speaker identification.
A preliminary analysis of human factors affecting the recognition accuracy of a discrete word recognizer for C3 systems

NASA Astrophysics Data System (ADS)

Yellen, H. W.

1983-03-01

Literature pertaining to Voice Recognition abounds with information relevant to the assessment of transitory speech recognition devices. In the past, engineering requirements have dictated the path this technology followed. But other factors do exist that influence recognition accuracy. This thesis explores the impact of Human Factors on the successful recognition of speech, principally addressing the differences or variability among users. A Threshold Technology T-600 was used with a 100-utterance vocabulary to test 44 subjects. A statistical analysis was conducted on 5 generic categories of Human Factors: Occupational, Operational, Psychological, Physiological and Personal. How the equipment is trained and the experience level of the speaker were found to be key characteristics influencing recognition accuracy. To a lesser extent computer experience, time or week, accent, vital capacity and rate of air flow, speaker cooperativeness and anxiety were found to affect overall error rates.
Automatic speech recognition research at NASA-Ames Research Center

NASA Technical Reports Server (NTRS)

Coler, Clayton R.; Plummer, Robert P.; Huff, Edward M.; Hitchcock, Myron H.

1977-01-01

A trainable acoustic pattern recognizer manufactured by Scope Electronics is presented. The voice command system (VCS) encodes speech by sampling 16 bandpass filters with center frequencies in the range from 200 to 5000 Hz. Variations in speaking rate are compensated for by a compression algorithm that subdivides each utterance into eight subintervals in such a way that the amount of spectral change within each subinterval is the same. The recorded filter values within each subinterval are then reduced to a 15-bit representation, giving a 120-bit encoding for each utterance. The VCS incorporates a simple recognition algorithm that utilizes five training samples of each word in a vocabulary of up to 24 words. The recognition rate of approximately 85 percent correct for untrained speakers and 94 percent correct for trained speakers was not considered adequate for flight systems use. Therefore, the built-in recognition algorithm was disabled, and the VCS was modified to transmit 120-bit encodings to an external computer for recognition.
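The rate compensation described above (eight subintervals with equal spectral change, each reduced to 15 bits, for 8 × 15 = 120 bits) can be sketched in Python as follows. This is an illustrative reconstruction, not the VCS algorithm: the spectral-change measure (sum of absolute differences between consecutive frames) and, in particular, the 15-bit reduction (one bit per adjacent filter pair) are assumptions, since the abstract does not say how 16 filter values become 15 bits.

```python
N_SEGMENTS = 8   # subintervals per utterance (from the abstract)
N_FILTERS = 16   # bandpass filters per frame (from the abstract)


def spectral_change(prev, cur):
    """ASSUMED change measure: sum of absolute filter differences."""
    return sum(abs(c - p) for p, c in zip(prev, cur))


def segment_frames(frames, n_segments=N_SEGMENTS):
    """Group frame indices into n_segments bins of equal cumulative spectral change."""
    cumulative = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        cumulative.append(cumulative[-1] + spectral_change(prev, cur))
    total = cumulative[-1]
    segments = [[] for _ in range(n_segments)]
    for i, c in enumerate(cumulative):
        # With no spectral change at all, every frame falls into bin 0.
        k = min(int(n_segments * c / total), n_segments - 1) if total > 0 else 0
        segments[k].append(i)
    return segments


def encode_segment(mean_spectrum):
    """ASSUMED 15-bit reduction: 1 where energy rises between adjacent filters."""
    return [1 if b > a else 0 for a, b in zip(mean_spectrum, mean_spectrum[1:])]


def encode_utterance(frames):
    """Return a 120-bit list (8 subintervals x 15 bits) for one utterance."""
    bits = []
    last = list(frames[0])
    for indices in segment_frames(frames):
        if indices:  # an empty bin reuses the previous subinterval's spectrum
            last = [
                sum(frames[i][f] for i in indices) / len(indices)
                for f in range(N_FILTERS)
            ]
        bits.extend(encode_segment(last))
    return bits
```

Because the subintervals are defined by spectral change rather than by elapsed time, repeating every frame (a crude model of slower speech) leaves the encoding unchanged in this sketch.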
Emotion Recognition From Singing Voices Using Contemporary Commercial Music and Classical Styles.

PubMed

Hakanpää, Tua; Waaramaa, Teija; Laukkanen, Anne-Maria

2018-02-22

This study examines the recognition of emotion in contemporary commercial music (CCM) and classical styles of singing. This information may be useful in improving the training of interpretation in singing. This is an experimental comparative study. Thirteen singers (11 female, 2 male) with a minimum of 3 years' professional-level singing studies (in CCM or classical technique or both) participated. They sang at three pitches (females: a, e1, a1, males: one octave lower) expressing anger, sadness, joy, tenderness, and a neutral state. Twenty-nine listeners listened to 312 short (0.63- to 4.8-second) voice samples, 135 of which were sung using a classical singing technique and 165 of which were sung in a CCM style. The listeners were asked which emotion they heard. Activity and valence were derived from the chosen emotions. The percentage of correct recognitions out of all the answers in the listening test (N = 9048) was 30.2%. The recognition percentage for the CCM-style singing technique was higher (34.5%) than for the classical-style technique (24.5%). Valence and activation were better perceived than the emotions themselves, and activity was better recognized than valence. A higher pitch was more likely to be perceived as joy or anger, and a lower pitch as sorrow. Both valence and activation were better recognized in the female CCM samples than in the other samples. There are statistically significant differences in the recognition of emotions between classical and CCM styles of singing. Furthermore, in the singing voice, pitch affects the perception of emotions, and valence and activity are more easily recognized than emotions. Copyright © 2018 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Event identification by acoustic signature recognition

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dress, W.B.; Kercel, S.W.

1995-07-01

Many events of interest to the security community produce acoustic emissions that are, in principle, identifiable as to cause. Some obvious examples are gunshots, breaking glass, takeoffs and landings of small aircraft, vehicular engine noises, footsteps (high frequencies when on gravel, very low frequencies when on soil), and voices (whispers to shouts). We are investigating wavelet-based methods to extract unique features of such events for classification and identification. We also discuss methods of classification and pattern recognition specifically tailored for acoustic signatures obtained by wavelet analysis. The paper is divided into three parts: completed work, work in progress, and future applications. The completed phase has led to the successful recognition of aircraft types on landing and takeoff. Both small aircraft (twin-engine turboprop) and large (commercial airliners) were included in the study. The project considered the design of a small, field-deployable, inexpensive device. The techniques developed during the aircraft identification phase were then adapted to a multispectral electromagnetic interference monitoring device now deployed in a nuclear power plant. This is a general-purpose wavelet analysis engine, spanning 14 octaves, and can be adapted for other specific tasks. Work in progress is focused on applying the methods previously developed to speaker identification. Some of the problems to be overcome include recognition of sounds as voice patterns and as distinct from possible background noises (e.g., music), as well as identification of the speaker from a short-duration voice sample. A generalization of the completed work and the work in progress is a device capable of classifying any number of acoustic events, particularly quasi-stationary events such as engine noises and voices and singular events such as gunshots and breaking glass. We will show examples of both kinds of events and discuss their recognition likelihood.
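To make the idea of wavelet-based acoustic features concrete, here is a minimal Python sketch that computes the signal energy in each octave band of a discrete wavelet decomposition. The abstract does not name the wavelet or the features used, so the Haar wavelet and the per-octave energy feature are assumptions chosen for simplicity, and the function names are hypothetical.

```python
import math


def haar_step(signal):
    """One Haar transform level: (approximation, detail), each half as long."""
    scale = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return approx, detail


def octave_energies(signal, levels):
    """Energy of the detail coefficients per octave, finest scale first."""
    energies = []
    current = list(signal)
    for _ in range(levels):
        if len(current) < 2:
            break  # nothing left to decompose
        current, detail = haar_step(current)
        energies.append(sum(d * d for d in detail))
    return energies


def dominant_octave(signal, levels):
    """Index of the octave carrying the most energy (0 = highest frequencies)."""
    energies = octave_energies(signal, levels)
    return max(range(len(energies)), key=lambda k: energies[k])
```

A rapidly alternating signal concentrates its energy in octave 0, while a slowly varying one concentrates it in a coarse octave. A vector of such energies is one simple candidate feature for separating events such as footsteps on gravel from footsteps on soil.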
Culture/Religion and Identity: Social Justice versus Recognition

ERIC Educational Resources Information Center

Bekerman, Zvi

2012-01-01

Recognition is the main word attached to multicultural perspectives. The multicultural call for recognition, the one calling for the recognition of cultural minorities and identities, the one now voiced by liberal states all over and also in Israel was a more difficult one. It took the author some time to realize that calling for the recognition…
Integrated voice and visual systems research topics

NASA Technical Reports Server (NTRS)

Williams, Douglas H.; Simpson, Carol A.

1986-01-01

A series of studies was performed to investigate factors of helicopter speech and visual system design and measure the effects of these factors on human performance, both for pilots and non-pilots. The findings and conclusions of these studies were applied by the U.S. Army to the design of the Army's next generation threat warning system for helicopters and to the linguistic functional requirements for a joint Army/NASA flightworthy, experimental speech generation and recognition system.
Social power and recognition of emotional prosody: High power is associated with lower recognition accuracy than low power.

PubMed

Uskul, Ayse K; Paulmann, Silke; Weick, Mario

2016-02-01

Listeners have to pay close attention to a speaker's tone of voice (prosody) during daily conversations. This is particularly important when trying to infer the emotional state of the speaker. Although a growing body of research has explored how emotions are processed from speech in general, little is known about how psychosocial factors such as social power can shape the perception of vocal emotional attributes. Thus, the present studies explored how social power affects emotional prosody recognition. In a correlational study (Study 1) and an experimental study (Study 2), we show that high power is associated with lower accuracy in emotional prosody recognition than low power. These results, for the first time, suggest that individuals experiencing high or low power perceive emotional tone of voice differently. (c) 2016 APA, all rights reserved.
Auditory emotion recognition impairments in Schizophrenia: Relationship to acoustic features and cognition

PubMed Central

Gold, Rinat; Butler, Pamela; Revheim, Nadine; Leitman, David; Hansen, John A.; Gur, Ruben; Kantrowitz, Joshua T.; Laukka, Petri; Juslin, Patrik N.; Silipo, Gail S.; Javitt, Daniel C.

2013-01-01

Objective: Schizophrenia is associated with deficits in the ability to perceive emotion based upon tone of voice. The basis for this deficit, however, remains unclear and assessment batteries remain limited. We evaluated performance in schizophrenia on a novel voice emotion recognition battery with well characterized physical features, relative to impairments in more general emotional and cognitive function. Methods: We studied a primary sample of 92 patients relative to 73 controls. Stimuli were characterized according to both intended emotion and physical features (e.g., pitch, intensity) that contributed to the emotional percept. Parallel measures of visual emotion recognition, pitch perception, general cognition, and overall outcome were obtained. More limited measures were obtained in an independent replication sample of 36 patients, 31 age-matched controls, and 188 general comparison subjects. Results: Patients showed significant, large effect size deficits in voice emotion recognition (F=25.4, p<.00001, d=1.1), and were preferentially impaired in recognition of emotion based upon pitch-, but not intensity-features (group X feature interaction: F=7.79, p=.006). Emotion recognition deficits were significantly correlated with pitch perception impairments both across (r=.56, p<.0001) and within (r=.47, p<.0001) group. Path analysis showed both sensory-specific and general cognitive contributions to auditory emotion recognition deficits in schizophrenia. Similar patterns of results were observed in the replication sample. Conclusions: The present study demonstrates impairments in auditory emotion recognition in schizophrenia relative to acoustic features of underlying stimuli. Furthermore, it provides tools and highlights the need for greater attention to physical features of stimuli used for study of social cognition in neuropsychiatric disorders. PMID:22362394

The influence of nationality on the accuracy of face and voice recognition.

PubMed

Doty, N D

1998-01-01

Sixty English and U.S. citizens were tested to determine the effect of nationality on accuracy in recognizing previously witnessed faces and voices. Subjects viewed a frontal facial photograph and were then asked to select that face from a set of 10 oblique facial photographs. Subjects listened to a recorded voice and were then asked to select the same voice from a set of 10 voice recordings. This process was repeated 7 more times, such that subjects identified a male and female face and voice from England, France, Belize, and the United States. Subjects demonstrated better accuracy recognizing the faces and voices of their own nationality. Subgroup analysis further supported the other-nationality effect as well as the previously documented other-race effect.
Voice reaction times with recognition for Commodore computers

NASA Technical Reports Server (NTRS)

Washburn, David A.; Putney, R. Thompson

1990-01-01

Hardware and software modifications are presented that allow for collection and recognition by a Commodore computer of spoken responses. Responses are timed with millisecond accuracy and automatically analyzed and scored. Accuracy data for this device from several experiments are presented. Potential applications and suggestions for improving recognition accuracy are also discussed.
Interpreting Chicken-Scratch: Lexical Access for Handwritten Words

ERIC Educational Resources Information Center

Barnhart, Anthony S.; Goldinger, Stephen D.

2010-01-01

Handwritten word recognition is a field of study that has largely been neglected in the psychological literature, despite its prevalence in society. Whereas studies of spoken word recognition almost exclusively employ natural, human voices as stimuli, studies of visual word recognition use synthetic typefaces, thus simplifying the process of word…
Towards Real-Time Speech Emotion Recognition for Affective E-Learning

ERIC Educational Resources Information Center

Bahreini, Kiavash; Nadolski, Rob; Westera, Wim

2016-01-01

This paper presents the voice emotion recognition part of the FILTWAM framework for real-time emotion recognition in affective e-learning settings. FILTWAM (Framework for Improving Learning Through Webcams And Microphones) intends to offer timely and appropriate online feedback based upon learner's vocal intonations and facial expressions in order…
A Preliminary Analysis of Human Factors Affecting the Recognition Accuracy of a Discrete Word Recognizer for C3 Systems.

DTIC Science & Technology

1983-03-01

acoustic wave pattern and, if so, word recognition would be a simple matter of the voice recognition system scanning the pattern, comparing the simple... TRAINING WEEK - WEEK #1: WORD#, UTTERANCE, CRT PROMPT; THREE, THREE; EUROPE, ERP; MOVE IT LEFT, MOVE IT LEFT; CARRIAGE RETURN, CAER RETURN; LOGOUT, LOGOUT
Error-dependent modulation of speech-induced auditory suppression for pitch-shifted voice feedback.

PubMed

Behroozmand, Roozbeh; Larson, Charles R

2011-06-06

The motor-driven predictions about expected sensory feedback (efference copies) have been proposed to play an important role in recognition of sensory consequences of self-produced motor actions. In the auditory system, this effect was suggested to result in suppression of sensory neural responses to self-produced voices that are predicted by the efference copies during vocal production in comparison with passive listening to the playback of the identical self-vocalizations. In the present study, event-related potentials (ERPs) were recorded in response to upward pitch shift stimuli (PSS) with five different magnitudes (0, +50, +100, +200 and +400 cents) at voice onset during active vocal production and passive listening to the playback. Results indicated that the suppression of the N1 component during vocal production was largest for unaltered voice feedback (PSS: 0 cents), became smaller as the magnitude of PSS increased to 200 cents, and was almost completely eliminated in response to 400 cents stimuli. Findings of the present study suggest that the brain utilizes the motor predictions (efference copies) to determine the source of incoming stimuli and maximally suppresses the auditory responses to unaltered feedback of self-vocalizations. The reduction of suppression for 50, 100 and 200 cents and its elimination for 400 cents pitch-shifted voice auditory feedback support the idea that motor-driven suppression of voice feedback leads to distinctly different sensory neural processing of self vs. non-self vocalizations. This characteristic may enable the audio-vocal system to more effectively detect and correct for unexpected errors in the feedback of self-produced voice pitch compared with externally-generated sounds.
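The pitch-shift magnitudes in this abstract are given in cents, a logarithmic unit in which 1200 cents equal one octave, so a shift of c cents multiplies frequency by 2^(c/1200). The short Python sketch below converts the study's shift magnitudes into frequency ratios. The function names and the 200 Hz example voice pitch are illustrative assumptions, not values from the study.

```python
def cents_to_ratio(cents):
    """Frequency ratio for a pitch shift in cents (1200 cents = 1 octave)."""
    return 2.0 ** (cents / 1200.0)


def shift_frequency(f0_hz, cents):
    """Fundamental frequency after shifting f0_hz upward by the given cents."""
    return f0_hz * cents_to_ratio(cents)


# Shift magnitudes reported in the abstract; the 200 Hz voice is hypothetical.
PSS_CENTS = [0, 50, 100, 200, 400]
SHIFTED_HZ = {c: shift_frequency(200.0, c) for c in PSS_CENTS}
```

For a hypothetical 200 Hz voice, the largest shift (+400 cents, a major third) raises the feedback to about 252 Hz, whereas +50 cents changes it by less than 6 Hz.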
A Cross-Lingual Mobile Medical Communication System Prototype for Foreigners and Subjects with Speech, Hearing, and Mental Disabilities Based on Pictograms

PubMed Central

Wołk, Agnieszka; Glinkowski, Wojciech

2017-01-01

People with speech, hearing, or mental impairment require special communication assistance, especially for medical purposes. Automatic solutions for speech recognition and voice synthesis from text are poor fits for communication in the medical domain because they are dependent on error-prone statistical models. Systems dependent on manual text input are insufficient. Recently introduced systems for automatic sign language recognition are dependent on statistical models as well as on image and gesture quality. Such systems remain in early development and are based mostly on minimal hand gestures unsuitable for medical purposes. Furthermore, solutions that rely on the Internet cannot be used after disasters that require humanitarian aid. We propose a high-speed, intuitive, Internet-free, voice-free, and text-free tool suited for emergency medical communication. Our solution is a pictogram-based application that provides easy communication for individuals who have speech or hearing impairment or mental health issues that impair communication, as well as foreigners who do not speak the local language. It provides support and clarification in communication by using intuitive icons and interactive symbols that are easy to use on a mobile device. Such pictogram-based communication can be quite effective and ultimately make people's lives happier, easier, and safer. PMID:29230254
Interactive voice technology: Variations in the vocal utterances of speakers performing a stress-inducing task

NASA Astrophysics Data System (ADS)

Mosko, J. D.; Stevens, K. N.; Griffin, G. R.

1983-08-01

Acoustical analyses were conducted of words produced by four speakers in a motion stress-inducing situation. The aim of the analyses was to document the kinds of changes that occur in the vocal utterances of speakers who are exposed to motion stress and to comment on the implications of these results for the design and development of voice interactive systems. The speakers differed markedly in the types and magnitudes of the changes that occurred in their speech. For some speakers, the stress-inducing experimental condition caused an increase in fundamental frequency, changes in the pattern of vocal fold vibration, shifts in vowel production and changes in the relative amplitudes of sounds containing turbulence noise. All speakers showed greater variability in the experimental condition than in the more relaxed control situation. The variability was manifested in the acoustical characteristics of individual phonetic elements, particularly in unstressed syllables. The kinds of changes and variability observed serve to emphasize the limitations of speech recognition systems based on template matching of patterns that are stored in the system during a training phase. There is a need for a better understanding of these phonetic modifications and for developing ways of incorporating knowledge about these changes within a speech recognition system.
A Cross-Lingual Mobile Medical Communication System Prototype for Foreigners and Subjects with Speech, Hearing, and Mental Disabilities Based on Pictograms.

PubMed

Wołk, Krzysztof; Wołk, Agnieszka; Glinkowski, Wojciech

2017-01-01

People with speech, hearing, or mental impairment require special communication assistance, especially for medical purposes. Automatic solutions for speech recognition and voice synthesis from text are poor fits for communication in the medical domain because they are dependent on error-prone statistical models. Systems dependent on manual text input are insufficient. Recently introduced systems for automatic sign language recognition are dependent on statistical models as well as on image and gesture quality. Such systems remain in early development and are based mostly on minimal hand gestures unsuitable for medical purposes. Furthermore, solutions that rely on the Internet cannot be used after disasters that require humanitarian aid. We propose a high-speed, intuitive, Internet-free, voice-free, and text-free tool suited for emergency medical communication. Our solution is a pictogram-based application that provides easy communication for individuals who have speech or hearing impairment or mental health issues that impair communication, as well as foreigners who do not speak the local language. It provides support and clarification in communication by using intuitive icons and interactive symbols that are easy to use on a mobile device. Such pictogram-based communication can be quite effective and ultimately make people's lives happier, easier, and safer.
Cultural in-group advantage: emotion recognition in African American and European American faces and voices.

PubMed

Wickline, Virginia B; Bailey, Wendy; Nowicki, Stephen

2009-03-01

The authors explored whether there were in-group advantages in emotion recognition of faces and voices by culture or geographic region. Participants were 72 African American students (33 men, 39 women), 102 European American students (30 men, 72 women), 30 African international students (16 men, 14 women), and 30 European international students (15 men, 15 women). The participants determined emotions in African American and European American faces and voices. Results showed an in-group advantage, sometimes by culture and less often by race, in recognizing facial and vocal emotional expressions. African international students were generally less accurate at interpreting American nonverbal stimuli than were European American, African American, and European international peers. Results suggest that, although partly universal, emotional expressions have subtle differences across cultures that persons must learn.
Progressive Associative Phonagnosia: A Neuropsychological Analysis

ERIC Educational Resources Information Center

Hailstone, Julia C.; Crutch, Sebastian J.; Vestergaard, Martin D.; Patterson, Roy D.; Warren, Jason D.

2010-01-01

There are few detailed studies of impaired voice recognition, or phonagnosia. Here we describe two patients with progressive phonagnosia in the context of frontotemporal lobar degeneration. Patient QR presented with behavioural decline and increasing difficulty recognising familiar voices, while patient KL presented with progressive prosopagnosia.…
An innovative multimodal virtual platform for communication with devices in a natural way

NASA Astrophysics Data System (ADS)

Kinkar, Chhayarani R.; Golash, Richa; Upadhyay, Akhilesh R.

2012-03-01

As technology grows, people are increasingly interested in communicating with machines or computers naturally. This will make machines more compact and portable by avoiding remotes, keyboards, etc.; it will also help people live in an environment free from electromagnetic waves. This thought has made 'recognition of natural modality in human computer interaction' a most appealing and promising research field. At the same time, it has been observed that using a single mode of interaction limits the complete utilization of commands as well as data flow. This paper proposes a multimodal platform in which, out of many natural modalities such as eye gaze, speech, voice, and face, human gestures are combined with human voice, which will minimize the mean square error. This will loosen the strict environment needed for accurate and robust interaction when using a single mode. Gestures complement speech: gestures are ideal for direct object manipulation, and natural language is used for descriptive tasks. Human computer interaction basically requires two broad sections, recognition and interpretation. Recognition and interpretation of natural modality in complex binary instruction is a tough task, as it integrates the real world with the virtual environment. The main idea of the paper is to develop an efficient model for fusing data from heterogeneous sensors, a camera and a microphone. In this paper we find that efficiency increases if heterogeneous data (image and voice) are combined at the feature level using artificial intelligence. The long-term goal of this paper is to design a robust system for people who are physically impaired or have less technical knowledge.
Speech recognition systems on the Cell Broadband Engine

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Y; Jones, H; Vaidya, S

In this paper we describe our design, implementation, and first results of a prototype connected-phoneme-based speech recognition system on the Cell Broadband Engine™ (Cell/B.E.). Automatic speech recognition decodes speech samples into plain text (other representations are possible) and must process samples at real-time rates. Fortunately, the computational tasks involved in this pipeline are highly data-parallel and can receive significant hardware acceleration from vector-streaming architectures such as the Cell/B.E. Identifying and exploiting these parallelism opportunities is challenging, but also critical to improving system performance. We observed, from our initial performance timings, that a single Cell/B.E. processor can recognize speech from thousands of simultaneous voice channels in real time, a channel density that is orders of magnitude greater than the capacity of existing software speech recognizers based on CPUs (central processing units). This result emphasizes the potential for Cell/B.E.-based speech recognition and will likely lead to the future development of production speech systems using Cell/B.E. clusters.
Chair alarm for patient fall prevention based on gesture recognition and interactivity.

PubMed

Knight, Heather; Lee, Jae-Kyu; Ma, Hongshen

2008-01-01

The Gesture Recognition Interactive Technology (GRiT) Chair Alarm aims to prevent patient falls from chairs and wheelchairs by recognizing the gesture of a patient attempting to stand. Patient falls are one of the greatest causes of injury in hospitals. Current chair and bed exit alarm systems are inadequate because of insufficient notification, high false-alarm rate, and long trigger delays. The GRiT chair alarm uses an array of capacitive proximity sensors and pressure sensors to create a map of the patient's sitting position, which is then processed using gesture recognition algorithms to determine when a patient is attempting to stand and to alarm the care providers. This system also uses a range of voice and light feedback to encourage the patient to remain seated and/or to make use of the system's integrated nurse-call function. This system can be seamlessly integrated into existing hospital WiFi networks to send notifications and approximate patient location through existing nurse call systems.
Similar representations of emotions across faces and voices.

PubMed

Kuhn, Lisa Katharina; Wydell, Taeko; Lavan, Nadine; McGettigan, Carolyn; Garrido, Lúcia

2017-09-01

[Correction Notice: An Erratum for this article was reported in Vol 17(6) of Emotion (see record 2017-18585-001). In the article, the copyright attribution was incorrectly listed and the Creative Commons CC-BY license disclaimer was incorrectly omitted from the author note. The correct copyright is "© 2017 The Author(s)" and the omitted disclaimer is below. All versions of this article have been corrected. "This article has been published under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Copyright for this article is retained by the author(s). Author(s) grant(s) the American Psychological Association the exclusive right to publish the article and identify itself as the original publisher."] Emotions are a vital component of social communication, carried across a range of modalities and via different perceptual signals such as specific muscle contractions in the face and in the upper respiratory system. Previous studies have found that emotion recognition impairments after brain damage depend on the modality of presentation: recognition from faces may be impaired whereas recognition from voices remains preserved, and vice versa. On the other hand, there is also evidence for shared neural activation during emotion processing in both modalities. In a behavioral study, we investigated whether there are shared representations in the recognition of emotions from faces and voices. We used a within-subjects design in which participants rated the intensity of facial expressions and nonverbal vocalizations for each of the 6 basic emotion labels. For each participant and each modality, we then computed a representation matrix with the intensity ratings of each emotion. These matrices allowed us to examine the patterns of confusions between emotions and to characterize the representations of emotions within each modality. We then compared the representations across modalities by computing the correlations of the representation matrices across faces and voices. We found highly correlated matrices across modalities, which suggest similar representations of emotions across faces and voices. We also showed that these results could not be explained by commonalities between low-level visual and acoustic properties of the stimuli. We thus propose that there are similar or shared coding mechanisms for emotions which may act independently of modality, despite their distinct perceptual inputs. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Cerebral Processing of Voice Gender Studied Using a Continuous Carryover fMRI Design

PubMed Central

Pernet, Cyril; Latinus, Marianne; Crabbe, Frances; Belin, Pascal

2013-01-01

Normal listeners effortlessly determine a person's gender by voice, but the cerebral mechanisms underlying this ability remain unclear. Here, we demonstrate 2 stages of cerebral processing during voice gender categorization. Using voice morphing along with an adaptation-optimized functional magnetic resonance imaging design, we found that secondary auditory cortex including the anterior part of the temporal voice areas in the right hemisphere responded primarily to acoustical distance with the previously heard stimulus. In contrast, a network of bilateral regions involving inferior prefrontal and anterior and posterior cingulate cortex reflected perceived stimulus ambiguity. These findings suggest that voice gender recognition involves neuronal populations along the auditory ventral stream responsible for auditory feature extraction, functioning in pair with the prefrontal cortex in voice gender perception. PMID:22490550
The non-trusty clown attack on model-based speaker recognition systems

NASA Astrophysics Data System (ADS)

Farrokh Baroughi, Alireza; Craver, Scott

2015-03-01

Biometric detectors for speaker identification commonly employ a statistical model for a subject's voice, such as a Gaussian Mixture Model, that combines multiple means to improve detector performance. This allows a malicious insider to amend or append a component of a subject's statistical model so that a detector behaves normally except under a carefully engineered circumstance. An attacker can thus force a misclassification of his or her voice only when desired, by smuggling data into a database far in advance of an attack. Note that the attack is possible if the attacker has access to the database, even for a limited time, to modify the victim's model. We exhibit such an attack on a speaker identification system, in which an attacker can force a misclassification by speaking in an unusual voice and replacing the least weighted component of the victim's model with the most weighted component of the attacker's unusual-voice model. The attacker makes his or her voice unusual during the attack because his or her normal voice model may be in the database; by attacking with an unusual voice, the attacker has the option to be recognized as himself or herself when talking normally or as the victim when talking in the unusual manner. By attaching an appropriately weighted vector to a victim's model, we can impersonate all users in our simulations, while avoiding unwanted false rejections.
The voice conveys emotion in ten globalized cultures and one remote village in Bhutan.

PubMed

Cordaro, Daniel T; Keltner, Dacher; Tshering, Sumjay; Wangchuk, Dorji; Flynn, Lisa M

2016-02-01

With data from 10 different globalized cultures and 1 remote, isolated village in Bhutan, we examined universals and cultural variations in the recognition of 16 nonverbal emotional vocalizations. College students in 10 nations (Study 1) and villagers in remote Bhutan (Study 2) were asked to match emotional vocalizations to 1-sentence stories of the same valence. Guided by previous conceptualizations of recognition accuracy, across both studies, 7 of the 16 vocal burst stimuli were found to have strong or very strong recognition in all 11 cultures, 6 vocal bursts were found to have moderate recognition, and 4 were not universally recognized. All vocal burst stimuli varied significantly in terms of the degree to which they were recognized across the 11 cultures. Our discussion focuses on the implications of these results for current debates concerning the emotion conveyed in the voice. (c) 2016 APA, all rights reserved.
Hands-free human-machine interaction with voice

NASA Astrophysics Data System (ADS)

Juang, B. H.

2004-05-01

Voice is a natural communication interface between a human and a machine. The machine, when placed in today's communication networks, may be configured to provide automation to save substantial operating cost, as demonstrated in AT&T's VRCP (Voice Recognition Call Processing), or to facilitate intelligent services, such as virtual personal assistants, to enhance individual productivity. These intelligent services often need to be accessible anytime, anywhere (e.g., in cars when the user is in a hands-busy-eyes-busy situation or during meetings where constantly talking to a microphone is either undesirable or impossible), and thus call for advanced signal processing and automatic speech recognition techniques which support what we call "hands-free" human-machine communication. These techniques entail a broad spectrum of technical ideas, ranging from use of directional microphones and acoustic echo cancellation to robust speech recognition. In this talk, we highlight a number of key techniques that were developed for hands-free human-machine communication in the mid-1990s after Bell Labs became a unit of Lucent Technologies. A video clip will be played to demonstrate the accomplishment.
Predicting fundamental frequency from mel-frequency cepstral coefficients to enable speech reconstruction.

PubMed

Shao, Xu; Milner, Ben

2005-08-01

This work proposes a method to reconstruct an acoustic speech signal solely from a stream of mel-frequency cepstral coefficients (MFCCs) as may be encountered in a distributed speech recognition (DSR) system. Previous methods for speech reconstruction have required, in addition to the MFCC vectors, fundamental frequency and voicing components. In this work the voicing classification and fundamental frequency are predicted from the MFCC vectors themselves using two maximum a posteriori (MAP) methods. The first method enables fundamental frequency prediction by modeling the joint density of MFCCs and fundamental frequency using a single Gaussian mixture model (GMM). The second scheme uses a set of hidden Markov models (HMMs) to link together a set of state-dependent GMMs, which enables a more localized modeling of the joint density of MFCCs and fundamental frequency. Experimental results on speaker-independent male and female speech show that accurate voicing classification and fundamental frequency prediction is attained when compared to hand-corrected reference fundamental frequency measurements. The use of the predicted fundamental frequency and voicing for speech reconstruction is shown to give very similar speech quality to that obtained using the reference fundamental frequency and voicing.
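The first method in the abstract above, predicting fundamental frequency from a joint GMM over MFCCs and fundamental frequency, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the posterior-weighted conditional mean, a common estimator for joint-density GMMs, and whether that matches the paper's exact MAP formulation is an assumption. The function names, the use of NumPy, and the convention that fundamental frequency is the last dimension of each joint vector are likewise assumptions.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def predict_f0(x, weights, means, covs):
    """Predict f0 from feature vector x using a joint GMM over [x, f0].

    x:       (d,) feature vector (MFCCs in the abstract's setting)
    weights: K mixture weights
    means:   (K, d+1) joint means, f0 in the last position (assumed layout)
    covs:    (K, d+1, d+1) joint covariances
    """
    d = x.shape[0]
    # Posterior probability of each component given x alone.
    log_post = np.array([
        np.log(w) + gaussian_logpdf(x, m[:d], c[:d, :d])
        for w, m, c in zip(weights, means, covs)
    ])
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # Conditional mean of f0 given x within each component.
    cond_means = np.array([
        m[d] + c[d, :d] @ np.linalg.solve(c[:d, :d], x - m[:d])
        for m, c in zip(means, covs)
    ])
    return float(post @ cond_means)
```

The second scheme in the abstract would select among several such state-dependent GMMs using an HMM state sequence; that step is not shown here.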

Sensing of Particular Speakers for the Construction of Voice Interface Utilized in Noisy Environment

NASA Astrophysics Data System (ADS)

Sawada, Hideyuki; Ohkado, Minoru

Humans are able to exchange information smoothly using voice under different situations, such as in a noisy crowd or in the presence of multiple speakers. We are able to detect the position of a sound source in 3D space, extract a particular sound from mixed sounds, and recognize who is talking. Realizing this mechanism with a computer will enable new applications: high-quality sound recording with reduced noise, presentation of a clarified sound, and microphone-free speech recognition by extracting a particular sound. The paper introduces real-time detection and identification of a particular speaker in a noisy environment using a microphone array, based on the location of the speaker and individual voice characteristics. The study will be applied to develop an adaptive auditory system for a mobile robot that collaborates with a factory worker.
Audio feature extraction using probability distribution function

NASA Astrophysics Data System (ADS)

Suhaib, A.; Wan, Khairunizam; Aziz, Azri A.; Hazry, D.; Razlan, Zuradzman M.; Shahriman A., B.

2015-05-01

Voice recognition has been one of the popular applications in the robotics field. It has also recently been used in biometric and multimedia information retrieval systems. This technology has emerged from successive research on audio feature extraction analysis. The Probability Distribution Function (PDF) is a statistical method that is usually used as one of the processes in complex feature extraction methods such as GMM and PCA. In this paper, a new method for audio feature extraction is proposed that uses only the PDF as the feature extraction method itself for speech analysis. Certain pre-processing techniques are performed prior to the proposed feature extraction method. Subsequently, the PDF values for each frame of sampled voice signals obtained from a certain number of individuals are plotted. From the experimental results, it can be seen visually from the plotted data that each individual's voice has comparable PDF values and shapes.
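The idea of using a per-frame PDF as the feature can be sketched as below. The abstract does not say how the PDF is estimated or which pre-processing is applied, so the normalized histogram, the bin count, the amplitude range of [-1, 1], and the function names are all assumptions made for illustration.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into overlapping frames of frame_len samples."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def histogram_pdf(frame, bins=8, lo=-1.0, hi=1.0):
    """Estimate a frame's amplitude PDF with a normalized histogram.

    Returns one density value per bin; the values times the bin width sum to 1.
    Samples outside [lo, hi] are clamped into the first or last bin.
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for sample in frame:
        index = int((sample - lo) / width)
        index = min(max(index, 0), bins - 1)
        counts[index] += 1
    return [count / (len(frame) * width) for count in counts]


def pdf_features(samples, frame_len, hop, bins=8):
    """One PDF feature vector per frame of the signal."""
    return [histogram_pdf(f, bins) for f in frame_signal(samples, frame_len, hop)]
```

Plotting the rows returned by `pdf_features` for recordings from different speakers would give the kind of visual comparison the abstract describes.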
Voice emotion recognition by cochlear-implanted children and their normally-hearing peers.

PubMed

Chatterjee, Monita; Zion, Danielle J; Deroche, Mickael L; Burianek, Brooke A; Limb, Charles J; Goren, Alison P; Kulkarni, Aditya M; Christensen, Julie A

2015-04-01

Despite their remarkable success in bringing spoken language to hearing impaired listeners, the signal transmitted through cochlear implants (CIs) remains impoverished in spectro-temporal fine structure. As a consequence, pitch-dominant information such as voice emotion, is diminished. For young children, the ability to correctly identify the mood/intent of the speaker (which may not always be visible in their facial expression) is an important aspect of social and linguistic development. Previous work in the field has shown that children with cochlear implants (cCI) have significant deficits in voice emotion recognition relative to their normally hearing peers (cNH). Here, we report on voice emotion recognition by a cohort of 36 school-aged cCI. Additionally, we provide for the first time, a comparison of their performance to that of cNH and NH adults (aNH) listening to CI simulations of the same stimuli. We also provide comparisons to the performance of adult listeners with CIs (aCI), most of whom learned language primarily through normal acoustic hearing. Results indicate that, despite strong variability, on average, cCI perform similarly to their adult counterparts; that both groups' mean performance is similar to aNHs' performance with 8-channel noise-vocoded speech; that cNH achieve excellent scores in voice emotion recognition with full-spectrum speech, but on average, show significantly poorer scores than aNH with 8-channel noise-vocoded speech. A strong developmental effect was observed in the cNH with noise-vocoded speech in this task. These results point to the considerable benefit obtained by cochlear-implanted children from their devices, but also underscore the need for further research and development in this important and neglected area. This article is part of a Special Issue entitled . Copyright © 2014 Elsevier B.V. All rights reserved.
Voice Technologies in Libraries: A Look into the Future.

ERIC Educational Resources Information Center

Lange, Holley R., Ed.; And Others

1991-01-01

Discussion of synthesized speech and voice recognition focuses on a forum that addressed the potential for speech technologies in libraries. Topics discussed by three contributors include possible library applications in technical processing, book receipt, circulation control, and database access; use by disabled and illiterate users; and problems…
Input and Output Mechanisms and Devices. Phase I: Adding Voice Output to a Speaker-Independent Recognition System.

ERIC Educational Resources Information Center

Scott Instruments Corp., Denton, TX.

This project was designed to develop techniques for adding low-cost speech synthesis to educational software. Four tasks were identified for the study: (1) select a microcomputer with a built-in analog-to-digital converter that is currently being used in educational environments; (2) determine the feasibility of implementing expansion and playback…
On combining multi-normalization and ancillary measures for the optimal score level fusion of fingerprint and voice biometrics

NASA Astrophysics Data System (ADS)

Mohammed Anzar, Sharafudeen Thaha; Sathidevi, Puthumangalathu Savithri

2014-12-01

In this paper, we have considered the utility of multi-normalization and ancillary measures for the optimal score level fusion of fingerprint and voice biometrics. An efficient matching score preprocessing technique based on multi-normalization is employed for improving the performance of the multimodal system under various noise conditions. Ancillary measures derived from the feature space and the score space are used in addition to the matching score vectors, for weighing the modalities based on their relative degradation. Reliability (dispersion) and separability (inter-/intra-class distance and d-prime statistics) measures under various noise conditions are estimated from the individual modalities during the training/validation stage. The 'best integration weights' are then computed by algebraically combining these measures using the weighted sum rule. The computed integration weights are then optimized against the recognition accuracy using techniques such as grid search, genetic algorithm and particle swarm optimization. The experimental results show that the proposed biometric solution leads to considerable improvement in the recognition performance even under low signal-to-noise ratio (SNR) conditions and reduces the false acceptance rate (FAR) and false rejection rate (FRR), making the system useful for security as well as forensic applications.
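A much-reduced sketch of score-level fusion with the weighted sum rule and a grid search over the integration weight is given below. It is an illustration under stated assumptions, not the paper's method: it uses a single min-max normalization instead of multi-normalization, omits the ancillary reliability and separability measures, and assumes a fixed 0.5 decision threshold. All function names are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale matching scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(fingerprint_scores, voice_scores, weight):
    """Weighted sum rule: weight on fingerprint, (1 - weight) on voice."""
    return [
        weight * f + (1.0 - weight) * v
        for f, v in zip(fingerprint_scores, voice_scores)
    ]


def accuracy(fused_scores, labels, threshold=0.5):
    """Fraction of trials where the accept/reject decision matches the label."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(fused_scores, labels))
    return correct / len(labels)


def grid_search_weight(fingerprint_scores, voice_scores, labels, steps=11):
    """Try evenly spaced weights in [0, 1] and keep the most accurate one."""
    best_weight, best_accuracy = 0.0, -1.0
    for i in range(steps):
        weight = i / (steps - 1)
        acc = accuracy(fuse(fingerprint_scores, voice_scores, weight), labels)
        if acc > best_accuracy:
            best_weight, best_accuracy = weight, acc
    return best_weight, best_accuracy
```

With a noisier voice channel, the search tends to shift weight toward the fingerprint scores, which is the qualitative behavior the abstract attributes to weighting modalities by their relative degradation.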
Objective voice and speech analysis of persons with chronic hoarseness by prosodic analysis of speech samples.

PubMed

Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar

2016-10-01

Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
Speech recognition for embedded automatic positioner for laparoscope

NASA Astrophysics Data System (ADS)

Chen, Xiaodong; Yin, Qingyun; Wang, Yi; Yu, Daoyin

2014-07-01

In this paper a novel speech recognition methodology based on the Hidden Markov Model (HMM) is proposed for an embedded Automatic Positioner for Laparoscope (APL), which includes a fixed-point ARM processor as the core. The APL system is designed to assist the doctor in laparoscopic surgery by implementing the specific doctor's vocal control of the laparoscope. Real-time response to voice commands requires a more efficient speech recognition algorithm for the APL. In order to reduce computation cost without significant loss in recognition accuracy, both arithmetic and algorithmic optimizations are applied in the method presented. First, relying mostly on arithmetic optimizations, a fixed-point front end for speech feature analysis is built according to the ARM processor's characteristics. Then a fast likelihood computation algorithm is used to reduce the computational complexity of the HMM-based recognition algorithm. The experimental results show that the method shortens the recognition time to within 0.5 s while keeping accuracy above 99%, demonstrating its ability to achieve real-time vocal control of the APL.
Error-dependent modulation of speech-induced auditory suppression for pitch-shifted voice feedback

PubMed Central

2011-01-01

Background: The motor-driven predictions about expected sensory feedback (efference copies) have been proposed to play an important role in recognition of sensory consequences of self-produced motor actions. In the auditory system, this effect was suggested to result in suppression of sensory neural responses to self-produced voices that are predicted by the efference copies during vocal production in comparison with passive listening to the playback of the identical self-vocalizations. In the present study, event-related potentials (ERPs) were recorded in response to upward pitch shift stimuli (PSS) with five different magnitudes (0, +50, +100, +200 and +400 cents) at voice onset during active vocal production and passive listening to the playback. Results: Results indicated that the suppression of the N1 component during vocal production was largest for unaltered voice feedback (PSS: 0 cents), became smaller as the magnitude of PSS increased to 200 cents, and was almost completely eliminated in response to 400 cents stimuli. Conclusions: Findings of the present study suggest that the brain utilizes the motor predictions (efference copies) to determine the source of incoming stimuli and maximally suppresses the auditory responses to unaltered feedback of self-vocalizations. The reduction of suppression for 50, 100 and 200 cents and its elimination for 400 cents pitch-shifted voice auditory feedback support the idea that motor-driven suppression of voice feedback leads to distinctly different sensory neural processing of self vs. non-self vocalizations. This characteristic may enable the audio-vocal system to more effectively detect and correct for unexpected errors in the feedback of self-produced voice pitch compared with externally-generated sounds. PMID:21645406
Real time analysis of voiced sounds

NASA Technical Reports Server (NTRS)

Hong, J. P. (Inventor)

1976-01-01

A power spectrum analysis of the harmonic content of a voiced sound signal is conducted in real time by phase-lock-loop tracking of the fundamental frequency (f0) of the signal and successive harmonics (h1 through hn) of the fundamental frequency. The analysis also includes measuring the quadrature power and phase of each frequency tracked, differentiating the power measurements of the harmonics in adjacent pairs, and analyzing successive differentials to determine peak power points in the power spectrum for display or use in analysis of voiced sound, such as for voice recognition.
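The last two steps described above, differencing harmonic powers in adjacent pairs and examining successive differentials to find peak power points, can be sketched in software as follows. This is only an illustration of the arithmetic: the record describes a real-time phase-lock-loop apparatus, and the function names and the list of per-harmonic power values are assumptions.

```python
def adjacent_differences(powers):
    """Difference in power between each pair of adjacent harmonics."""
    return [later - earlier for earlier, later in zip(powers, powers[1:])]


def peak_indices(powers):
    """Indices of harmonics where power stops rising and starts falling.

    A peak is reported where one differential is positive and the next
    is zero or negative.
    """
    diffs = adjacent_differences(powers)
    return [
        i + 1
        for i in range(len(diffs) - 1)
        if diffs[i] > 0 and diffs[i + 1] <= 0
    ]


if __name__ == "__main__":
    # Hypothetical power readings for the fundamental and five harmonics.
    harmonic_powers = [1.0, 3.0, 2.0, 5.0, 7.0, 4.0]
    print(adjacent_differences(harmonic_powers))
    print(peak_indices(harmonic_powers))
```

Multiplying each reported index by the tracked fundamental frequency would give an approximate frequency for each spectral peak.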
Sensitivity-Enhanced Wearable Active Voiceprint Sensor Based on Cellular Polypropylene Piezoelectret.

PubMed

Li, Wenbo; Zhao, Sheng; Wu, Nan; Zhong, Junwen; Wang, Bo; Lin, Shizhe; Chen, Shuwen; Yuan, Fang; Jiang, Hulin; Xiao, Yongjun; Hu, Bin; Zhou, Jun

2017-07-19

Wearable active sensors have extensive applications in mobile biosensing and human-machine interaction but require good flexibility, high sensitivity, excellent stability, and self-powered feature. In this work, cellular polypropylene (PP) piezoelectret was chosen as the core material of a sensitivity-enhanced wearable active voiceprint sensor (SWAVS) to realize voiceprint recognition. By virtue of the dipole orientation control method, the air layers in the piezoelectret were efficiently utilized, and the current sensitivity was enhanced (from 1.98 pA/Hz to 5.81 pA/Hz at 115 dB). The SWAVS exhibited the superiorities of high sensitivity, accurate frequency response, and excellent stability. The voiceprint recognition system could make correct reactions to human voices by judging both the password and speaker. This study presented a voiceprint sensor with potential applications in noncontact biometric recognition and safety guarantee systems, promoting the progress of wearable sensor networks.
A simulation system for Space Station extravehicular activity

NASA Technical Reports Server (NTRS)

Marmolejo, Jose A.; Shepherd, Chip

1993-01-01

America's next major step into space will be the construction of a permanently manned Space Station which is currently under development and scheduled for full operation in the mid-1990's. Most of the construction of the Space Station will be performed over several flights by suited crew members during an extravehicular activity (EVA) from the Space Shuttle. Once fully operational, EVA's will be performed from the Space Station on a routine basis to provide, among other services, maintenance and repair operations of satellites currently in Earth orbit. Both voice recognition and helmet-mounted display technologies can improve the productivity of workers in space by potentially reducing the time, risk, and cost involved in performing EVA. NASA has recognized this potential and is currently developing a voice-controlled information system for Space Station EVA. Two bench-model helmet-mounted displays and an EVA simulation program have been developed to demonstrate the functionality and practicality of the system.
Neural Processing of Vocal Emotion and Identity

ERIC Educational Resources Information Center

Spreckelmeyer, Katja N.; Kutas, Marta; Urbach, Thomas; Altenmuller, Eckart; Munte, Thomas F.

2009-01-01

The voice is a marker of a person's identity which allows individual recognition even if the person is not in sight. Listening to a voice also affords inferences about the speaker's emotional state. Both these types of personal information are encoded in characteristic acoustic feature patterns analyzed within the auditory cortex. In the present…
Social support and substitute voice acquisition on psychological adjustment among patients after laryngectomy.

PubMed

Kotake, Kumiko; Suzukamo, Yoshimi; Kai, Ichiro; Iwanaga, Kazuyo; Takahashi, Aya

2017-03-01

The objective is to clarify whether social support and acquisition of alternative voice enhance the psychological adjustment of laryngectomized patients and which part of the psychological adjustment structure would be influenced by social support. We contacted 1445 patients enrolled in a patient association using mail surveys and 679 patients agreed to participate in the study. The survey items included age, sex, occupation, post-surgery duration, communication method, psychological adjustment (by the Nottingham Adjustment Scale Japanese Laryngectomy Version: NAS-J-L), and the formal support (by Hospital Patient Satisfaction Questionnaire-25: HPSQ-25). Social support and communication methods were added to the three-tier structural model of psychological adjustment shown in our previous study, and a covariance structure analysis was conducted. Formal/informal supports and acquisition of alternative voice influence only the "recognition of oneself as voluntary agent", the first tier of the three-tier structure of psychological adjustment. The results suggest that social support and acquisition of alternative voice may enhance the recognition of oneself as voluntary agent and promote the psychological adjustment.
A Method for Determining the Timing of Displaying the Speaker's Face and Captions for a Real-Time Speech-to-Caption System

NASA Astrophysics Data System (ADS)

Kuroki, Hayato; Ino, Shuichi; Nakano, Satoko; Hori, Kotaro; Ifukube, Tohru

The authors of this paper have been studying a real-time speech-to-caption system using speech recognition technology with a “repeat-speaking” method. In this system, they used a “repeat-speaker” who listens to a lecturer's voice and then speaks back the lecturer's speech utterances into a speech recognition computer. The thoroughgoing system showed that the accuracy of the captions is about 97% in Japanese-Japanese conversion and the conversion time from voices to captions is about 4 seconds in English-English conversion in some international conferences. Of course, achieving this level of performance was costly. In human communications, speech understanding depends not only on verbal information but also on non-verbal information such as the speaker's gestures and face and mouth movements. So the authors arrived at the idea of briefly storing the information in a computer and then displaying the captions and the speaker's face movement images in a suitable way to achieve higher comprehension. In this paper, we investigate the relationship of the display sequence and display timing between captions that have speech recognition errors and the speaker's face movement images. The results show that the sequence “to display the caption before the speaker's face image” improves the comprehension of the captions. The sequence “to display both simultaneously” shows an improvement only a few percent higher than the question sentence, and the sequence “to display the speaker's face image before the caption” shows almost no change. In addition, the sequence “to display the caption 1 second before the speaker's face” shows the most significant improvement of all the conditions.
Impact of a Telehealth Program With Voice Recognition Technology in Patients With Chronic Heart Failure: Feasibility Study

PubMed Central

Lee, Heesun; Choi, Sae Won; Yoon, Yeonyee E; Park, Hyo Eun; Lee, Sang Eun; Lee, Seung-Pyo; Kim, Hyung-Kwan; Cho, Hyun-Jai; Choi, Su-Yeon; Lee, Hae-Young; Choi, Jonghyuk; Lee, Young-Joon; Kim, Yong-Jin; Cho, Goo-Yeong; Choi, Jinwook; Sohn, Dae-Won

2017-01-01

Background Despite the advances in the diagnosis and treatment of heart failure (HF), the current hospital-oriented framework for HF management does not appear to be sufficient to maintain the stability of HF patients in the long term. The importance of self-care management is increasingly being emphasized as a promising long-term treatment strategy for patients with chronic HF. Objective The objective of this study was to evaluate whether a new information communication technology (ICT)–based telehealth program with voice recognition technology could improve clinical or laboratory outcomes in HF patients. Methods In this prospective single-arm pilot study, we recruited 31 consecutive patients with chronic HF who were referred to our institute. An ICT-based telehealth program with voice recognition technology was developed and used by patients with HF for 12 weeks. Patients were educated on the use of this program via mobile phone, landline, or the Internet for the purpose of improving communication and data collection. Using these systems, we collected comprehensive data elements related to the risk of HF self-care management such as weight, diet, exercise, medication adherence, overall symptom change, and home blood pressure. The study endpoints were the changes observed in urine sodium concentration (uNa), Minnesota Living with Heart Failure (MLHFQ) scores, 6-min walk test, and N-terminal prohormone of brain natriuretic peptide (NT-proBNP) as surrogate markers for appropriate HF management. Results Among the 31 enrolled patients, 27 (87%) patients completed the study, and 10 (10/27, 37%) showed good adherence to ICT-based telehealth program with voice recognition technology, which was defined as the use of the program for 100 times or more during the study period. 
Nearly three-fourths of the patients had been hospitalized at least once because of HF before the enrollment (20/27, 74%); 14 patients had 1, 2 patients had 2, and 4 patients had 3 or more previous HF hospitalizations. In the total study population, there was no significant interval change in laboratory and functional outcome variables after 12 weeks of ICT-based telehealth program. In patients with good adherence to ICT-based telehealth program, there was a significant improvement in the mean uNa (103.1 to 78.1; P=.01) but not in those without (85.4 to 96.9; P=.49). Similarly, a marginal improvement in MLHFQ scores was only observed in patients with good adherence (27.5 to 21.4; P=.08) but not in their counterparts (19.0 to 19.7; P=.73). The mean 6-min walk distance and NT-proBNP were not significantly increased in patients regardless of their adherence. Conclusions Short-term application of ICT-based telehealth program with voice recognition technology showed the potential to improve uNa values and MLHFQ scores in HF patients, suggesting that better control of sodium intake and greater quality of life can be achieved by this program. PMID:28970189
Military and Government Applications of Human-Machine Communication by Voice

NASA Astrophysics Data System (ADS)

Weinstein, Clifford J.

1995-10-01

This paper describes a range of opportunities for military and government applications of human-machine communication by voice, based on visits and contacts with numerous user organizations in the United States. The applications include some that appear to be feasible by careful integration of current state-of-the-art technology and others that will require a varying mix of advances in speech technology and in integration of the technology into applications environments. Applications that are described include (1) speech recognition and synthesis for mobile command and control; (2) speech processing for a portable multifunction soldier's computer; (3) speech- and language-based technology for naval combat team tactical training; (4) speech technology for command and control on a carrier flight deck; (5) control of auxiliary systems, and alert and warning generation, in fighter aircraft and helicopters; and (6) voice check-in, report entry, and communication for law enforcement agents or special forces. A phased approach for transfer of the technology into applications is advocated, where integration of applications systems is pursued in parallel with advanced research to meet future needs.
Processing Electromyographic Signals to Recognize Words

NASA Technical Reports Server (NTRS)

Jorgensen, C. C.; Lee, D. D.

2009-01-01

A recently invented speech-recognition method applies to words that are articulated by means of the tongue and throat muscles but are otherwise not voiced or, at most, are spoken sotto voce. This method could satisfy a need for speech recognition under circumstances in which normal audible speech is difficult, poses a hazard, is disturbing to listeners, or compromises privacy. The method could also be used to augment traditional speech recognition by providing an additional source of information about articulator activity. The method can be characterized as intermediate between (1) conventional speech recognition through processing of voice sounds and (2) a method, not yet developed, of processing electroencephalographic signals to extract unspoken words directly from thoughts. This method involves computational processing of digitized electromyographic (EMG) signals from muscle innervation acquired by surface electrodes under a subject's chin near the tongue and on the side of the subject's throat near the larynx. After preprocessing, digitization, and feature extraction, EMG signals are processed by a neural-network pattern classifier, implemented in software, that performs the bulk of the recognition task as described.
Visual face-movement sensitive cortex is relevant for auditory-only speech recognition.

PubMed

Riedel, Philipp; Ragert, Patrick; Schelinski, Stefanie; Kiebel, Stefan J; von Kriegstein, Katharina

2015-07-01

It is commonly assumed that the recruitment of visual areas during audition is not relevant for performing auditory tasks ('auditory-only view'). According to an alternative view, however, the recruitment of visual cortices is thought to optimize auditory-only task performance ('auditory-visual view'). This alternative view is based on functional magnetic resonance imaging (fMRI) studies. These studies have shown, for example, that even if there is only auditory input available, face-movement sensitive areas within the posterior superior temporal sulcus (pSTS) are involved in understanding what is said (auditory-only speech recognition). This is particularly the case when speakers are known audio-visually, that is, after brief voice-face learning. Here we tested whether the left pSTS involvement is causally related to performance in auditory-only speech recognition when speakers are known by face. To test this hypothesis, we applied cathodal transcranial direct current stimulation (tDCS) to the pSTS during (i) visual-only speech recognition of a speaker known only visually to participants and (ii) auditory-only speech recognition of speakers they learned by voice and face. We defined the cathode as active electrode to down-regulate cortical excitability by hyperpolarization of neurons. tDCS to the pSTS interfered with visual-only speech recognition performance compared to a control group without pSTS stimulation (tDCS to BA6/44 or sham). Critically, compared to controls, pSTS stimulation additionally decreased auditory-only speech recognition performance selectively for voice-face learned speakers. These results are important in two ways. First, they provide direct evidence that the pSTS is causally involved in visual-only speech recognition; this confirms a long-standing prediction of current face-processing models. Secondly, they show that visual face-sensitive pSTS is causally involved in optimizing auditory-only speech recognition. 
These results are in line with the 'auditory-visual view' of auditory speech perception, which assumes that auditory speech recognition is optimized by using predictions from previously encoded speaker-specific audio-visual internal models. Copyright © 2015 Elsevier Ltd. All rights reserved.
What does voice-processing technology support today?

PubMed Central

Nakatsu, R; Suzuki, Y

1995-01-01

This paper describes the state of the art in applications of voice-processing technologies. In the first part, technologies concerning the implementation of speech recognition and synthesis algorithms are described. Hardware technologies such as microprocessors and DSPs (digital signal processors) are discussed. The software development environment, a key technology for developing applications software ranging from DSP software to support software, is also described. In the second part, the state of the art of algorithms from the standpoint of applications is discussed. Several issues concerning evaluation of speech recognition/synthesis algorithms are covered, as well as issues concerning the robustness of algorithms in adverse conditions. PMID:7479720

Design of a digital voice data compression technique for orbiter voice channels

NASA Technical Reports Server (NTRS)

1975-01-01

Candidate techniques were investigated for digital voice compression to a transmission rate of 8 kbps. Good voice quality, speaker recognition, and robustness in the presence of error bursts were considered. The technique of delayed-decision adaptive predictive coding is described and compared with conventional adaptive predictive coding. Results include a set of experimental simulations recorded on analog tape. The two FM broadcast segments produced show the delayed-decision technique to be virtually undegraded or minimally degraded at .001 and .01 Viterbi decoder bit error rates. Preliminary estimates of the hardware complexity of this technique indicate potential for implementation in space shuttle orbiters.
The Voice Transcription Technique: Use of Voice Recognition Software to Transcribe Digital Interview Data in Qualitative Research

ERIC Educational Resources Information Center

Matheson, Jennifer L.

2007-01-01

Transcribing interview data is a time-consuming task that most qualitative researchers dislike. Transcribing is even more difficult for people with physical limitations because traditional transcribing requires manual dexterity and the ability to sit at a computer for long stretches of time. Researchers have begun to explore using an automated…
Voiced Excitations

DTIC Science & Technology

2004-12-01

3701 North Fairfax Drive Arlington, VA 22203-1714 NA NA NA Radar & EM Speech, Voiced Speech Excitations 61 UL UNCLASSIFIED UNCLASSIFIED UNCLASSIFIED...New Ideas for Speech Recognition and Related Technologies”, Lawrence Livermore National Laboratory Report, UCRL-UR-120310, 1995. Available from...Livermore Laboratory report UCRL-JC-134775M Holzrichter 2003, Holzrichter J.F., Kobler, J. B., Rosowski, J.J., Burke, G.J., (2003) “EM wave
Children's Recognition of Their Own Recorded Voice: Influence of Age and Phonological Impairment

ERIC Educational Resources Information Center

Strombergsson, Sofia

2013-01-01

Children with phonological impairment (PI) often have difficulties perceiving insufficiencies in their own speech. The use of recordings has been suggested as a way of directing the child's attention toward his/her own speech, despite a lack of evidence that children actually recognize their recorded voice as their own. We present two studies of…
Vocal Identity Recognition in Autism Spectrum Disorder

PubMed Central

Lin, I-Fan; Yamada, Takashi; Komine, Yoko; Kato, Nobumasa; Kato, Masaharu; Kashino, Makio

2015-01-01

Voices can convey information about a speaker. When forming an abstract representation of a speaker, it is important to extract relevant features from acoustic signals that are invariant to the modulation of these signals. This study investigated the way in which individuals with autism spectrum disorder (ASD) recognize and memorize vocal identity. The ASD group and control group performed similarly in a task when asked to choose the name of the newly-learned speaker based on his or her voice, and the ASD group outperformed the control group in a subsequent familiarity test when asked to discriminate the previously trained voices and untrained voices. These findings suggest that individuals with ASD recognized and memorized voices as well as the neurotypical individuals did, but they categorized voices in a different way: individuals with ASD categorized voices quantitatively based on the exact acoustic features, while neurotypical individuals categorized voices qualitatively based on the acoustic patterns correlated to the speakers' physical and mental properties. PMID:26070199
A meta-analysis of in-vehicle and nomadic voice-recognition system interaction and driving performance.

PubMed

Simmons, Sarah M; Caird, Jeff K; Steel, Piers

2017-09-01

Driver distraction is a growing and pervasive issue that requires multiple solutions. Voice-recognition (V-R) systems may decrease the visual-manual (V-M) demands of a wide range of in-vehicle system and smartphone interactions. However, the degree that V-R systems integrated into vehicles or available in mobile phone applications affect driver distraction is incompletely understood. A comprehensive meta-analysis of experimental studies was conducted to address this knowledge gap. To meet study inclusion criteria, drivers had to interact with a V-R system while driving and doing everyday V-R tasks such as dialing, initiating a call, texting, emailing, destination entry or music selection. Coded dependent variables included detection, reaction time, lateral position, speed and headway. Comparisons of V-R systems with baseline driving and/or a V-M condition were also coded. Of 817 identified citations, 43 studies involving 2000 drivers and 183 effect sizes (r) were analyzed in the meta-analysis. Compared to baseline, driving while interacting with a V-R system is associated with increases in reaction time and lane positioning, and decreases in detection. When V-M systems were compared to V-R systems, drivers had slightly better performance with the latter system on reaction time, lane positioning and headway. Although V-R systems have some driving performance advantages over V-M systems, they have a distraction cost relative to driving without any system at all. The pattern of results indicates that V-R systems impose moderate distraction costs on driving. In addition, drivers minimally engage in compensatory performance adjustments such as reducing speed and increasing headway while using V-R systems. Implications of the results for theory, design guidelines and future research are discussed. Copyright © 2017 Elsevier Ltd. All rights reserved.
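The record above reports that 183 correlation effect sizes (r) were combined but does not say how. One common way to pool correlations is a fixed-effect average on the Fisher z scale, sketched here. The pooling method, the function name, and the sample values are all assumptions made for illustration; they are not the authors' procedure or data.

```python
import math


def pooled_correlation(effects):
    """Pool (r, n) pairs with a fixed-effect Fisher z average.

    Each r is transformed with z = atanh(r), weighted by n - 3
    (the inverse of the sampling variance of z), averaged,
    and transformed back with tanh.
    """
    weighted_sum = sum((n - 3) * math.atanh(r) for r, n in effects)
    total_weight = sum(n - 3 for _, n in effects)
    return math.tanh(weighted_sum / total_weight)


# Made-up (r, sample size) pairs, for illustration only.
example_effects = [(0.10, 40), (0.25, 60), (0.30, 24)]
print(round(pooled_correlation(example_effects), 3))
```

A published meta-analysis would more likely use a random-effects model, which also estimates between-study variance. This sketch leaves that out for brevity.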
A voice-actuated wind tunnel model leak checking system

NASA Technical Reports Server (NTRS)

Larson, William E.

1989-01-01

A computer program has been developed that improves the efficiency of wind tunnel model leak checking. The program uses a voice recognition unit to relay a technician's commands to the computer. The computer, after receiving a command, can respond to the technician via a voice response unit. Information about the model pressure orifice being checked is displayed on a gas-plasma terminal. On command, the program records up to 30 seconds of pressure data. After the recording is complete, the raw data and a straight line fit of the data are plotted on the terminal. This allows the technician to make a decision on the integrity of the orifice being checked. All results of the leak check program are stored in a database file that can be listed on the line printer for record keeping purposes or displayed on the terminal to help the technician find unchecked orifices. This program allows one technician to check a model for leaks instead of the two or three previously required.
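The straight-line fit of recorded pressure data mentioned above can be illustrated with an ordinary least-squares fit, where the slope estimates the pressure decay rate at the orifice. This is a sketch only: the function name, the one-sample-per-second rate, and the synthetic pressure values are hypothetical, and the original program's fitting method is not described in the record.

```python
def fit_line(times, pressures):
    """Ordinary least-squares fit of pressure = slope * time + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_p = sum(pressures) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_tp = sum((t - mean_t) * (p - mean_p) for t, p in zip(times, pressures))
    slope = s_tp / s_tt
    intercept = mean_p - slope * mean_t
    return slope, intercept


# Synthetic 30-second recording (assumed: one sample per second,
# a slow linear pressure decay standing in for a leaking orifice).
times = [float(t) for t in range(31)]
pressures = [14.7 - 0.01 * t for t in times]

slope, intercept = fit_line(times, pressures)
print(f"decay rate: {slope:.4f} per second, initial pressure: {intercept:.2f}")
```

A slope near zero would suggest a sound orifice. What decay rate counts as a leak is a judgment the record leaves to the technician.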
Hearing the Unheard: An Interdisciplinary, Mixed Methodology Study of Women's Experiences of Hearing Voices (Auditory Verbal Hallucinations).

PubMed

McCarthy-Jones, Simon; Castro Romero, Maria; McCarthy-Jones, Roseline; Dillon, Jacqui; Cooper-Rompato, Christine; Kieran, Kathryn; Kaufman, Milissa; Blackman, Lisa

2015-01-01

This paper explores the experiences of women who "hear voices" (auditory verbal hallucinations). We begin by examining historical understandings of women hearing voices, showing these have been driven by androcentric theories of how women's bodies functioned leading to women being viewed as requiring their voices be interpreted by men. We show the twentieth century was associated with recognition that the mental violation of women's minds (represented by some voice-hearing) was often a consequence of the physical violation of women's bodies. We next report the results of a qualitative study into voice-hearing women's experiences (n = 8). This found similarities between women's relationships with their voices and their relationships with others and the wider social context. Finally, we present results from a quantitative study comparing voice-hearing in women (n = 65) and men (n = 132) in a psychiatric setting. Women were more likely than men to have certain forms of voice-hearing (voices conversing) and to have antecedent events of trauma, physical illness, and relationship problems. Voices identified as female may have more positive affect than male voices. We conclude that women voice-hearers have and continue to face specific challenges necessitating research and activism, and hope this paper will act as a stimulus to such work.
A memory like a female Fur Seal: long-lasting recognition of pup's voice by mothers.

PubMed

Mathevon, Nicolas; Charrier, Isabelle; Aubin, Thierry

2004-06-01

In colonial mammals like fur seals, mutual vocal recognition between mothers and their pup is of primary importance for breeding success. Females alternate feeding sea-trips with suckling periods on land, and when coming back from the ocean, they have to vocally find their offspring among numerous similar-looking pups. Young fur seals emit a 'mother-attraction call' that presents individual characteristics. In this paper, we review the perceptual process of pup's call recognition by Subantarctic Fur Seal Arctocephalus tropicalis mothers. To identify their progeny, females rely on the frequency modulation pattern and spectral features of this call. As the acoustic characteristics of a pup's call change throughout the lactation period due to the growing process, mothers have thus to refine their memorization of their pup's voice. Field experiments show that female Fur Seals are able to retain all the successive versions of their pup's call.
Assessment of Severe Apnoea through Voice Analysis, Automatic Speech, and Speaker Recognition Techniques

NASA Astrophysics Data System (ADS)

Fernández Pozo, Rubén; Blanco Murillo, Jose Luis; Hernández Gómez, Luis; López Gonzalo, Eduardo; Alcázar Ramírez, José; Toledano, Doroteo T.

2009-12-01

This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry.
Voice Identification: Levels-of-Processing and the Relationship between Prior Description Accuracy and Recognition Accuracy.

ERIC Educational Resources Information Center

Walter, Todd J.

A study examined whether a person's ability to accurately identify a voice is influenced by factors similar to those proposed by the Supreme Court for eyewitness identification accuracy. In particular, the Supreme Court has suggested that a person's prior description accuracy of a suspect, degree of attention to a suspect, and confidence in…
Administration of Neuropsychological Tests Using Interactive Voice Response Technology in the Elderly: Validation and Limitations

PubMed Central

Miller, Delyana Ivanova; Talbot, Vincent; Gagnon, Michèle; Messier, Claude

2013-01-01

Interactive voice response (IVR) systems are computer programs, which interact with people to provide a number of services from business to health care. We examined the ability of an IVR system to administer and score a verbal fluency task (fruits) and the digit span forward and backward in 158 community dwelling people aged between 65 and 92 years of age (full scale IQ of 68–134). Only six participants could not complete all tasks mostly due to early technical problems in the study. Participants were also administered the Wechsler Intelligence Scale fourth edition (WAIS-IV) and Wechsler Memory Scale fourth edition subtests. The IVR system correctly recognized 90% of the fruits in the verbal fluency task and 93–95% of the number sequences in the digit span. The IVR system typically underestimated the performance of participants because of voice recognition errors. In the digit span, these errors led to the erroneous discontinuation of the test: however the correlation between IVR scoring and clinical scoring was still high (93–95%). The correlation between the IVR verbal fluency and the WAIS-IV Similarities subtest was 0.31. The correlation between the IVR digit span forward and backward and the in-person administration was 0.46. We discuss how valid and useful IVR systems are for neuropsychological testing in the elderly. PMID:23950755
Military and government applications of human-machine communication by voice.

PubMed Central

Weinstein, C J

1995-01-01

This paper describes a range of opportunities for military and government applications of human-machine communication by voice, based on visits and contacts with numerous user organizations in the United States. The applications include some that appear to be feasible by careful integration of current state-of-the-art technology and others that will require a varying mix of advances in speech technology and in integration of the technology into applications environments. Applications that are described include (1) speech recognition and synthesis for mobile command and control; (2) speech processing for a portable multifunction soldier's computer; (3) speech- and language-based technology for naval combat team tactical training; (4) speech technology for command and control on a carrier flight deck; (5) control of auxiliary systems, and alert and warning generation, in fighter aircraft and helicopters; and (6) voice check-in, report entry, and communication for law enforcement agents or special forces. A phased approach for transfer of the technology into applications is advocated, where integration of applications systems is pursued in parallel with advanced research to meet future needs. PMID:7479718
Sound specificity effects in spoken word recognition: The effect of integrality between words and sounds.

PubMed

Strori, Dorina; Zaar, Johannes; Cooke, Martin; Mattys, Sven L

2018-01-01

Recent evidence has shown that nonlinguistic sounds co-occurring with spoken words may be retained in memory and affect later retrieval of the words. This sound-specificity effect shares many characteristics with the classic voice-specificity effect. In this study, we argue that the sound-specificity effect is conditional upon the context in which the word and sound coexist. Specifically, we argue that, besides co-occurrence, integrality between words and sounds is a crucial factor in the emergence of the effect. In two recognition-memory experiments, we compared the emergence of voice and sound specificity effects. In Experiment 1, we examined two conditions where integrality is high. Namely, the classic voice-specificity effect (Exp. 1a) was compared with a condition in which the intensity envelope of a background sound was modulated along the intensity envelope of the accompanying spoken word (Exp. 1b). Results revealed a robust voice-specificity effect and, critically, a comparable sound-specificity effect: A change in the paired sound from exposure to test led to a decrease in word-recognition performance. In the second experiment, we sought to disentangle the contribution of integrality from a mere co-occurrence context effect by removing the intensity modulation. The absence of integrality led to the disappearance of the sound-specificity effect. Taken together, the results suggest that the assimilation of background sounds into memory cannot be reduced to a simple context effect. Rather, it is conditioned by the extent to which words and sounds are perceived as integral as opposed to distinct auditory objects.
Effects of Within-Talker Variability on Speech Intelligibility in Mandarin-Speaking Adult and Pediatric Cochlear Implant Patients

PubMed Central

Su, Qiaotong; Galvin, John J.; Zhang, Guoping; Li, Yongxin

2016-01-01

Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users. PMID:27363714
Multimodal approaches for emotion recognition: a survey

NASA Astrophysics Data System (ADS)

Sebe, Nicu; Cohen, Ira; Gevers, Theo; Huang, Thomas S.

2004-12-01

Recent technological advances have enabled human users to interact with computers in ways previously unimaginable. Beyond the confines of the keyboard and mouse, new modalities for human-computer interaction such as voice, gesture, and force-feedback are emerging. Despite important advances, one necessary ingredient for natural interaction is still missing: emotions. Emotions play an important role in human-to-human communication and interaction, allowing people to express themselves beyond the verbal domain. The ability to understand human emotions is desirable for the computer in several applications. This paper explores new ways of human-computer interaction that enable the computer to be more aware of the user's emotional and attentional expressions. We present the basic research in the field and recent advances in emotion recognition from facial, voice, and physiological signals, where the different modalities are treated independently. We then describe the challenging problem of multimodal emotion recognition and we advocate the use of probabilistic graphical models when fusing the different modalities. We also discuss the difficult issues of obtaining reliable affective data, obtaining ground truth for emotion recognition, and the use of unlabeled data.
Voice gender discrimination provides a measure of more than pitch-related perception in cochlear implant users.

PubMed

Li, Tianhao; Fu, Qian-Jie

2011-08-01

The objectives were (1) to investigate whether voice gender discrimination (VGD) could be a useful indicator of the spectral and temporal processing abilities of individual cochlear implant (CI) users, and (2) to examine the relationship between VGD and speech recognition with CI when comparable acoustic cues are used for both perception processes. VGD was measured using two talker sets with different inter-gender fundamental frequencies (F0), as well as different acoustic CI simulations. Vowel and consonant recognition in quiet and noise were also measured and compared with VGD performance. Participants were eleven postlingually deaf CI users. The results showed that (1) mean VGD performance differed for different stimulus sets, (2) VGD and speech recognition performance varied among individual CI users, and (3) individual VGD performance was significantly correlated with speech recognition performance under certain conditions. VGD measured with selected stimulus sets might be useful for assessing not only pitch-related perception, but also spectral and temporal processing by individual CI users. In addition to improvements in spectral resolution and modulation detection, the improvement in higher modulation frequency discrimination might be particularly important for CI users in noisy environments.
The Bangor Voice Matching Test: A standardized test for the assessment of voice perception ability.

PubMed

Mühl, Constanze; Sheil, Orla; Jarutytė, Lina; Bestelmeyer, Patricia E G

2017-11-09

Recognising the identity of conspecifics is an important yet highly variable skill. Approximately 2% of the population suffers from a socially debilitating deficit in face recognition. More recently the existence of a similar deficit in voice perception has emerged (phonagnosia). Face perception tests have been readily available for years, advancing our understanding of underlying mechanisms in face perception. In contrast, voice perception has received less attention, and the construction of standardized voice perception tests has been neglected. Here we report the construction of the first standardized test for voice perception ability. Participants make a same/different identity decision after hearing two voice samples. Item Response Theory guided item selection to ensure the test discriminates between a range of abilities. The test provides a starting point for the systematic exploration of the cognitive and neural mechanisms underlying voice perception. With a high test-retest reliability (r=.86) and short assessment duration (~10 min) this test examines individual abilities reliably and quickly and therefore also has potential for use in developmental and neuropsychological populations.

Recognizing famous voices: influence of stimulus duration and different types of retrieval cues.

PubMed

Schweinberger, S R; Herholz, A; Sommer, W

1997-04-01

The current investigation measured the effects of increasing stimulus duration on listeners' ability to recognize famous voices. In addition, the investigation studied the influence of different types of cues on the naming of voices that could not be named before. Participants were presented with samples of famous and unfamiliar voices and were asked to decide whether or not the samples were spoken by a famous person. The duration of each sample increased in seven steps from 0.25 s up to a maximum of 2 s. Voice recognition improvements with stimulus duration followed a growth function. Gains were most rapid within the first second and less pronounced thereafter. When participants were unable to name a famous voice, they were cued with either a second voice sample, the occupation, or the initials of the celebrity. Initials were most effective in eliciting the name only when semantic information about the speaker had been accessed prior to cue presentation. Paralleling previous research on face naming, this may indicate that voice naming is contingent on previous activation of person-specific semantic information.
Exploring expressivity and emotion with artificial voice and speech technologies.

PubMed

Pauletto, Sandra; Balentine, Bruce; Pidcock, Chris; Jones, Kevin; Bottaci, Leonardo; Aretoulaki, Maria; Wells, Jez; Mundy, Darren P; Balentine, James

2013-10-01

Emotion in audio-voice signals, as synthesized by text-to-speech (TTS) technologies, was investigated to formulate a theory of expression for user interface design. Emotional parameters were specified with markup tags, and the resulting audio was further modulated with post-processing techniques. Software was then developed to link a selected TTS synthesizer with an automatic speech recognition (ASR) engine, producing a chatbot that could speak and listen. Using these two artificial voice subsystems, investigators explored both artistic and psychological implications of artificial speech emotion. Goals of the investigation were interdisciplinary, with interest in musical composition, augmentative and alternative communication (AAC), commercial voice announcement applications, human-computer interaction (HCI), and artificial intelligence (AI). The work-in-progress points towards an emerging interdisciplinary ontology for artificial voices. As one study output, HCI tools are proposed for future collaboration.
[Design and implementation of mobile terminal data acquisition for Chinese materia medica resources survey].

PubMed

Qi, Yuan-Hua; Wang, Hui; Zhang, Xiao-Bo; Jin, Yan; Ge, Xiao-Guang; Jing, Zhi-Xian; Wang, Ling; Zhao, Yu-Ping; Guo, Lan-Ping; Huang, Lu-Qi

2017-11-01

In this paper, a data acquisition system based on a mobile terminal, combining GPS, offset correction, automatic speech recognition and database networking technology, was designed and implemented. Its functions include quickly locating latitude and elevation information and conveniently taking various types of photos (Chinese herbal plants, samples, habitats and so on). The mobile system automatically associates records with Chinese medicine source information; through the voice recognition function it records plant characteristics, environmental characteristics and relevant plant specimen information. The data processing platform, based on the Chinese medicine resources survey data reporting client, can effectively assist in indoor data processing and exports the mobile terminal data to a computer terminal. The established data acquisition system provides strong technical support for the fourth national survey of the Chinese materia medica resources (CMMR). Copyright© by the Chinese Pharmaceutical Association.
Biometric iris image acquisition system with wavefront coding technology

NASA Astrophysics Data System (ADS)

Hsieh, Sheng-Hsun; Yang, Hsi-Wen; Huang, Shao-Hung; Li, Yung-Hui; Tien, Chung-Hao

2013-09-01

Biometric signatures for identity recognition have been practiced for centuries. Basically, the personal attributes used for a biometric identification system can be classified into two areas: one is based on physiological attributes, such as DNA, facial features, retinal vasculature, fingerprint, hand geometry, iris texture and so on; the other depends on individual behavioral attributes, such as signature, keystroke, voice and gait style. Among these features, iris recognition is one of the most attractive approaches due to its randomness, texture stability over a lifetime, high entropy density and non-invasive acquisition. While the performance of iris recognition on high-quality images is well investigated, few studies have addressed how iris recognition performs on non-ideal image data, especially data acquired in challenging conditions, such as long working distance, dynamic movement of subjects, uncontrolled illumination and so on. There are three main contributions in this paper. First, the optical system parameters, such as magnification and field of view, were optimally designed through first-order optics. Second, the irradiance constraints were derived from the optical conservation theorem. Through the relationship between the subject and the detector, we could estimate the limit of the working distance when the camera lens and CCD sensor were known. The working distance is set to 3 m in our system, with pupil diameter 86 mm and CCD irradiance 0.3 mW/cm². Finally, we employed a hybrid scheme combining eye tracking with a pan and tilt system, wavefront coding technology, filter optimization and post signal recognition to implement a robust iris recognition system in dynamic operation. The blurred image was restored to ensure recognition accuracy over the 3 m working distance with 400 mm focal length and aperture F/6.3 optics. The simulation results as well as the experiments validate the proposed code apertured imaging system, in which the imaging volume was extended 2.57 times over traditional optics while keeping sufficient recognition accuracy.
Double Fourier analysis for Emotion Identification in Voiced Speech

NASA Astrophysics Data System (ADS)

Sierra-Sosa, D.; Bastidas, M.; Ortiz P., D.; Quintero, O. L.

2016-04-01

We propose a novel analysis alternative, based on two Fourier Transforms, for emotion recognition from speech. Fourier analysis allows different signals to be displayed and synthesized in terms of power spectral density distributions. A spectrogram of the voice signal is obtained by performing a short-time Fourier Transform with Gaussian windows; this spectrogram portrays frequency-related features, such as vocal tract resonances and quasi-periodic excitations during voiced sounds. Emotions induce such characteristics in speech, which become apparent in spectrogram time-frequency distributions. Next, the time-frequency representation of the signal given by the spectrogram is treated as an image and processed through a 2-dimensional Fourier Transform in order to perform spatial Fourier analysis on it. Finally, features related to emotions in voiced speech are extracted and presented.
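The two-stage analysis described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the frame length, hop size, Gaussian width, the log compression step, and the synthetic harmonic test signal are all assumptions chosen for the example, and the function names are hypothetical.

```python
import numpy as np

def gaussian_spectrogram(signal, frame_len=256, hop=64, sigma=32.0):
    """Power spectrogram via a short-time Fourier transform with a Gaussian window."""
    n = np.arange(frame_len)
    window = np.exp(-0.5 * ((n - (frame_len - 1) / 2) / sigma) ** 2)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] * window for s in starts])
    # FFT of each frame; transpose so rows are frequency bins and columns are time frames
    return np.abs(np.fft.rfft(frames, axis=1)).T ** 2

def second_fourier(spectrogram):
    """Treat the spectrogram as an image and return the magnitude of its 2-D Fourier transform."""
    image = np.log1p(spectrogram)  # dynamic-range compression: an assumption, not from the abstract
    return np.abs(np.fft.fftshift(np.fft.fft2(image)))

# Hypothetical stand-in for a voiced sound: five harmonics of a 150 Hz fundamental, 1 s at 8 kHz
fs = 8000
t = np.arange(fs) / fs
toy_voice = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in range(1, 6))

spec = gaussian_spectrogram(toy_voice)
spec2d = second_fourier(spec)
print(spec.shape, spec2d.shape)
```

The abstract does not say which features are extracted from the second transform, so the sketch stops at the 2-D magnitude spectrum.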
Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

DOEpatents

Holzrichter, J.F.; Ng, L.C.

1998-03-17

The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.
Investigation of air transportation technology at Princeton University, 1985

NASA Technical Reports Server (NTRS)

Stengel, Robert F.

1987-01-01

The program proceeded along five avenues during 1985. Guidance and control strategies for penetration of microbursts and wind shear, application of artificial intelligence in flight control and air traffic control systems, the use of voice recognition in the cockpit, the effects of control saturation on closed-loop stability and response of open-loop unstable aircraft, and computer aided control system design are among the topics briefly considered. Areas of investigation relate to guidance and control of commercial transports as well as general aviation aircraft. Interaction between the flight crew and automatic systems is the subject of principal concern.
Hearing the Unheard: An Interdisciplinary, Mixed Methodology Study of Women’s Experiences of Hearing Voices (Auditory Verbal Hallucinations)

PubMed Central

McCarthy-Jones, Simon; Castro Romero, Maria; McCarthy-Jones, Roseline; Dillon, Jacqui; Cooper-Rompato, Christine; Kieran, Kathryn; Kaufman, Milissa; Blackman, Lisa

2015-01-01

This paper explores the experiences of women who “hear voices” (auditory verbal hallucinations). We begin by examining historical understandings of women hearing voices, showing these have been driven by androcentric theories of how women’s bodies functioned leading to women being viewed as requiring their voices be interpreted by men. We show the twentieth century was associated with recognition that the mental violation of women’s minds (represented by some voice-hearing) was often a consequence of the physical violation of women’s bodies. We next report the results of a qualitative study into voice-hearing women’s experiences (n = 8). This found similarities between women’s relationships with their voices and their relationships with others and the wider social context. Finally, we present results from a quantitative study comparing voice-hearing in women (n = 65) and men (n = 132) in a psychiatric setting. Women were more likely than men to have certain forms of voice-hearing (voices conversing) and to have antecedent events of trauma, physical illness, and relationship problems. Voices identified as female may have more positive affect than male voices. We conclude that women voice-hearers have faced and continue to face specific challenges necessitating research and activism, and hope this paper will act as a stimulus to such work. PMID:26779041
Within-Category VOT Affects Recovery from "Lexical" Garden-Paths: Evidence against Phoneme-Level Inhibition

ERIC Educational Resources Information Center

McMurray, Bob; Tanenhaus, Michael K.; Aslin, Richard N.

2009-01-01

Spoken word recognition shows gradient sensitivity to within-category voice onset time (VOT), as predicted by several current models of spoken word recognition, including TRACE (McClelland, J., & Elman, J. (1986). The TRACE model of speech perception. "Cognitive Psychology," 18, 1-86). It remains unclear, however, whether this sensitivity is…
Some Thoughts on the Meaning of and Values that Influence Degree Recognition in Canada

ERIC Educational Resources Information Center

Skolnik, Michael L.

2006-01-01

What has been called "degree recognition" has become the subject of considerable attention in Canadian higher education within the past decade. While concerns similar to those that are being voiced today have arisen occasionally in the past, the scale of this phenomenon today is unprecedented historically. In response to the increased…
Automatic translation among spoken languages

NASA Technical Reports Server (NTRS)

Walter, Sharon M.; Costigan, Kelly

1994-01-01

The Machine Aided Voice Translation (MAVT) system was developed in response to the shortage of experienced military field interrogators with both foreign language proficiency and interrogation skills. Combining speech recognition, machine translation, and speech generation technologies, the MAVT accepts an interrogator's spoken English question and translates it into spoken Spanish. The spoken Spanish response of the potential informant can then be translated into spoken English. Potential military and civilian applications for automatic spoken language translation technology are discussed in this paper.
Speech to Text Translation for Malay Language

NASA Astrophysics Data System (ADS)

Al-khulaidi, Rami Ali; Akmeliawati, Rini

2017-11-01

A speech recognition system consists of a front-end and a back-end process that receives an audio signal uttered by a speaker and converts it into a text transcription. Such a system can be used in several fields, including therapeutic technology, education, social robotics and computer entertainment. Our system is proposed mainly for control tasks, where speed of performance and response matters because the system must integrate with other control platforms, such as voice-controlled robots. This creates a need for flexible platforms that can be easily edited to jibe with the functionality of the surroundings, unlike other software programs, such as MATLAB and Phoenix, that require recording audio and multiple training sessions for every entry. In this paper, a speech recognition system for the Malay language is implemented using Microsoft Visual Studio C#. 90 (ninety) Malay phrases were tested by 10 (ten) speakers of both genders in different contexts. The result shows that the overall accuracy (calculated from the confusion matrix) is satisfactory, at 92.69%.
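For readers unfamiliar with the metric, overall accuracy from a confusion matrix is the sum of the diagonal (correctly recognized items) divided by the total number of trials. The sketch below illustrates that calculation only; the 3x3 matrix is a hypothetical toy example and is not data from the study, whose 92.69% figure cannot be reproduced from the abstract.

```python
import numpy as np

def overall_accuracy(confusion):
    """Overall accuracy = correctly classified trials (diagonal) / all trials."""
    confusion = np.asarray(confusion, dtype=float)
    return np.trace(confusion) / confusion.sum()

# Hypothetical toy matrix: rows are spoken phrases, columns are recognized phrases
toy_confusion = np.array([
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
])

accuracy = overall_accuracy(toy_confusion)
print(f"overall accuracy: {accuracy:.2%}")
```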
Intentional Voice Command Detection for Trigger-Free Speech Interface

NASA Astrophysics Data System (ADS)

Obuchi, Yasunari; Sumiyoshi, Takashi

In this paper we introduce a new framework of audio processing, which is essential to achieve a trigger-free speech interface for home appliances. If the speech interface works continually in real environments, it must extract occasional voice commands and reject everything else. It is extremely important to reduce the number of false alarms because the number of irrelevant inputs is much larger than the number of voice commands even for heavy users of appliances. The framework, called Intentional Voice Command Detection, is based on voice activity detection, but enhanced by various speech/audio processing techniques such as emotion recognition. The effectiveness of the proposed framework is evaluated using a newly-collected large-scale corpus. The advantages of combining various features were tested and confirmed, and the simple LDA-based classifier demonstrated acceptable performance. The effectiveness of various methods of user adaptation is also discussed.
Automatic assessment of voice quality according to the GRBAS scale.

PubMed

Sáenz-Lechón, Nicolás; Godino-Llorente, Juan I; Osma-Ruiz, Víctor; Blanco-Velasco, Manuel; Cruz-Roldán, Fernando

2006-01-01

Nowadays, the most widespread techniques for measuring voice quality are based on perceptual evaluation by well-trained professionals. The GRBAS scale is a widely used method for perceptual evaluation of voice quality; it is well established in Japan, and there is increasing interest in both Europe and the United States. However, this technique needs well-trained experts, relies on the evaluator's expertise, and depends heavily on the evaluator's own psycho-physical state. Furthermore, great variability is observed between the assessments of different evaluators. Therefore, an objective method to provide such a measurement of voice quality would be very valuable. In this paper, the automatic assessment of voice quality is addressed by means of short-term Mel cepstral parameters (MFCC) and learning vector quantization (LVQ) in a pattern recognition stage. Results show that this approach provides acceptable results for this purpose, with accuracy around 65% at best.
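As background for the pattern recognition stage, the basic LVQ1 update rule can be sketched as below: the prototype nearest to a training vector is pulled toward it when their labels match and pushed away otherwise. This is a generic illustration and not the authors' configuration. The abstract does not specify the LVQ variant, number of prototypes, or learning rate, and the 2-D synthetic points here merely stand in for MFCC feature vectors; all names are hypothetical.

```python
import numpy as np

def train_lvq1(X, y, prototypes, proto_labels, lr=0.1, epochs=20, seed=0):
    """LVQ1: move the nearest prototype toward same-class samples, away from others."""
    rng = np.random.default_rng(seed)
    W = np.array(prototypes, dtype=float)
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate (assumption)
        for i in rng.permutation(len(X)):
            j = np.argmin(np.linalg.norm(W - X[i], axis=1))  # index of nearest prototype
            sign = 1.0 if proto_labels[j] == y[i] else -1.0
            W[j] += sign * rate * (X[i] - W[j])
    return W

def predict_lvq(X, W, proto_labels):
    """Assign each sample the label of its nearest prototype."""
    dists = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return np.asarray(proto_labels)[np.argmin(dists, axis=1)]

# Synthetic stand-in for feature vectors of two voice-quality classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)), rng.normal(2.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
proto_labels = np.array([0, 1])

W = train_lvq1(X, y, prototypes=X[[0, 50]], proto_labels=proto_labels)
accuracy = np.mean(predict_lvq(X, W, proto_labels) == y)
print(f"training accuracy on toy data: {accuracy:.2f}")
```

The toy classes are deliberately well separated, so the accuracy here says nothing about the roughly 65% reported for real GRBAS ratings.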
Comparison of Voice Handicap Index Scores Between Female Students of Speech Therapy and Other Health Professions.

PubMed

Tafiadis, Dionysios; Chronopoulos, Spyridon K; Siafaka, Vassiliki; Drosos, Konstantinos; Kosma, Evangelia I; Toki, Eugenia I; Ziavra, Nausica

2017-09-01

Students' groups (eg, teachers, speech language pathologists) are presumably at risk of developing a voice disorder due to misuse of their voice, which will affect their way of living. Multidisciplinary voice assessment of student populations is currently spread widely along with the use of self-reported questionnaires. This study compared the Voice Handicap Index domains and item scores between female students of speech and language therapy and of other health professions in Greece. We also examined the probability of speech language therapy students developing any vocal symptom. Two hundred female non-dysphonic students (aged 18-31) were recruited. Participants answered the Voice Evaluation Form and the Greek adaptation of the Voice Handicap Index. Significant differences were observed between the two groups (students of speech therapy and other health professions) through Voice Handicap Index (total score, functional and physical domains), excluding the emotional domain. Furthermore, significant differences for specific Voice Handicap Index items, between subgroups, were observed. In conclusion, speech language therapy students had higher Voice Handicap Index scores, which probably could be an indicator for avoiding profession-related dysphonia at a later stage. Also, Voice Handicap Index could be at a first glance an assessment tool for the recognition of potential voice disorder development in students. In turn, the results could be used for indirect therapy approaches, such as providing methods for maintaining vocal health in different student populations. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Vocal recognition of owners by domestic cats (Felis catus).

PubMed

Saito, Atsuko; Shinozuka, Kazutaka

2013-07-01

Domestic cats have had a 10,000-year history of cohabitation with humans and seem to have the ability to communicate with humans. However, this has not been widely examined. We studied 20 domestic cats to investigate whether they could recognize their owners by using voices that called out the subjects' names, with a habituation-dishabituation method. While the owner was out of the cat's sight, we played three different strangers' voices serially, followed by the owner's voice. We recorded the cat's reactions to the voices and categorized them into six behavioral categories. In addition, ten naive raters rated the cats' response magnitudes. The cats responded to human voices not by communicative behavior (vocalization and tail movement), but by orienting behavior (ear movement and head movement). This tendency did not change even when they were called by their owners. Of the 20 cats, 15 demonstrated a lower response magnitude to the third voice than to the first voice. These habituated cats showed a significant rebound in response to the subsequent presentation of their owners' voices. This result indicates that cats are able to use vocal cues alone to distinguish between humans.
Identification and tracking of particular speaker in noisy environment

NASA Astrophysics Data System (ADS)

Sawada, Hideyuki; Ohkado, Minoru

2004-10-01

Humans are able to exchange information smoothly by voice in different situations, such as in a noisy crowd or in the presence of multiple speakers. We are able to detect the position of a sound source in 3D space, extract a particular sound from mixed sounds, and recognize who is talking. Realizing this mechanism with a computer will enable new applications: high-quality sound recording with reduced noise, presentation of a clarified sound, and microphone-free speech recognition by extracting a particular sound. The paper introduces real-time detection and identification of a particular speaker in a noisy environment using a microphone array, based on the location of the speaker and individual voice characteristics. The study will be applied to develop an adaptive auditory system for a mobile robot that collaborates with a factory worker.
Impact of a Telehealth Program With Voice Recognition Technology in Patients With Chronic Heart Failure: Feasibility Study.

PubMed

Lee, Heesun; Park, Jun-Bean; Choi, Sae Won; Yoon, Yeonyee E; Park, Hyo Eun; Lee, Sang Eun; Lee, Seung-Pyo; Kim, Hyung-Kwan; Cho, Hyun-Jai; Choi, Su-Yeon; Lee, Hae-Young; Choi, Jonghyuk; Lee, Young-Joon; Kim, Yong-Jin; Cho, Goo-Yeong; Choi, Jinwook; Sohn, Dae-Won

2017-10-02

Despite the advances in the diagnosis and treatment of heart failure (HF), the current hospital-oriented framework for HF management does not appear to be sufficient to maintain the stability of HF patients in the long term. The importance of self-care management is increasingly being emphasized as a promising long-term treatment strategy for patients with chronic HF. The objective of this study was to evaluate whether a new information communication technology (ICT)-based telehealth program with voice recognition technology could improve clinical or laboratory outcomes in HF patients. In this prospective single-arm pilot study, we recruited 31 consecutive patients with chronic HF who were referred to our institute. An ICT-based telehealth program with voice recognition technology was developed and used by patients with HF for 12 weeks. Patients were educated on the use of this program via mobile phone, landline, or the Internet for the purpose of improving communication and data collection. Using these systems, we collected comprehensive data elements related to the risk of HF self-care management such as weight, diet, exercise, medication adherence, overall symptom change, and home blood pressure. The study endpoints were the changes observed in urine sodium concentration (uNa), Minnesota Living with Heart Failure (MLHFQ) scores, 6-min walk test, and N-terminal prohormone of brain natriuretic peptide (NT-proBNP) as surrogate markers for appropriate HF management. Among the 31 enrolled patients, 27 (87%) patients completed the study, and 10 (10/27, 37%) showed good adherence to ICT-based telehealth program with voice recognition technology, which was defined as the use of the program for 100 times or more during the study period. Nearly three-fourths of the patients had been hospitalized at least once because of HF before the enrollment (20/27, 74%); 14 patients had 1, 2 patients had 2, and 4 patients had 3 or more previous HF hospitalizations. 
In the total study population, there was no significant interval change in laboratory and functional outcome variables after 12 weeks of ICT-based telehealth program. In patients with good adherence to ICT-based telehealth program, there was a significant improvement in the mean uNa (103.1 to 78.1; P=.01) but not in those without (85.4 to 96.9; P=.49). Similarly, a marginal improvement in MLHFQ scores was only observed in patients with good adherence (27.5 to 21.4; P=.08) but not in their counterparts (19.0 to 19.7; P=.73). The mean 6-min walk distance and NT-proBNP were not significantly increased in patients regardless of their adherence. Short-term application of ICT-based telehealth program with voice recognition technology showed the potential to improve uNa values and MLHFQ scores in HF patients, suggesting that better control of sodium intake and greater quality of life can be achieved by this program. ©Heesun Lee, Jun-Bean Park, Sae Won Choi, Yeonyee E Yoon, Hyo Eun Park, Sang Eun Lee, Seung-Pyo Lee, Hyung-Kwan Kim, Hyun-Jai Cho, Su-Yeon Choi, Hae-Young Lee, Jonghyuk Choi, Young-Joon Lee, Yong-Jin Kim, Goo-Yeong Cho, Jinwook Choi, Dae-Won Sohn. Originally published in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 02.10.2017.
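The participant counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above (31 enrolled, 27 completers, 10 with good adherence, and 14 + 2 + 4 patients with prior hospitalizations); the variable names are illustrative.

```python
enrolled = 31
completed = 27
good_adherence = 10
prior_hospitalizations = {"1": 14, "2": 2, "3 or more": 4}

completion_pct = 100 * completed / enrolled            # reported as 87%
adherence_pct = 100 * good_adherence / completed       # reported as 37%
hospitalized = sum(prior_hospitalizations.values())    # reported as 20 of 27
hospitalized_pct = 100 * hospitalized / completed      # reported as 74%

print(f"completed: {completion_pct:.1f}%")
print(f"good adherence: {adherence_pct:.1f}%")
print(f"previously hospitalized: {hospitalized} ({hospitalized_pct:.1f}%)")
```

The rounded values agree with the percentages given in the abstract.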
Voices to reckon with: perceptions of voice identity in clinical and non-clinical voice hearers

PubMed Central

Badcock, Johanna C.; Chhabra, Saruchi

2013-01-01

The current review focuses on the perception of voice identity in clinical and non-clinical voice hearers. Identity perception in auditory verbal hallucinations (AVH) is grounded in the mechanisms of human (i.e., real, external) voice perception, and shapes the emotional (distress) and behavioral (help-seeking) response to the experience. Yet, the phenomenological assessment of voice identity is often limited, for example to the gender of the voice, and has failed to take advantage of recent models and evidence on human voice perception. In this paper we aim to synthesize the literature on identity in real and hallucinated voices and begin by providing a comprehensive overview of the features used to judge voice identity in healthy individuals and in people with schizophrenia. The findings suggest some subtle, but possibly systematic biases across different levels of voice identity in clinical hallucinators that are associated with higher levels of distress. Next we provide a critical evaluation of voice processing abilities in clinical and non-clinical voice hearers, including recent data collected in our laboratory. Our studies used diverse methods, assessing recognition and binding of words and voices in memory as well as multidimensional scaling of voice dissimilarity judgments. The findings overall point to significant difficulties recognizing familiar speakers and discriminating between unfamiliar speakers in people with schizophrenia, both with and without AVH. In contrast, these voice processing abilities appear to be generally intact in non-clinical hallucinators. The review highlights some important avenues for future research and treatment of AVH associated with a need for care, and suggests some novel insights into other symptoms of psychosis. PMID:23565088

The "Reading the Mind in the Voice" Test-Revised: A Study of Complex Emotion Recognition in Adults with and Without Autism Spectrum Conditions

ERIC Educational Resources Information Center

Golan, Ofer; Baron-Cohen, Simon; Hill, Jacqueline J.; Rutherford, M. D.

2007-01-01

This study reports a revised version of the "Reading the Mind in the Voice" (RMV) task. The original task (Rutherford et al., (2002), "Journal of Autism and Developmental Disorders, 32," 189-194) suffered from ceiling effects and limited sensitivity. To improve that, the task was shortened and two more foils were added to each of the remaining…
Real-time speech gisting for ATC applications

NASA Astrophysics Data System (ADS)

Dunkelberger, Kirk A.

1995-06-01

Command and control within the ATC environment remains primarily voice-based. Hence, automatic real-time, speaker-independent, continuous speech recognition (CSR) has many obvious applications and implied benefits to the ATC community: automated target tagging, aircraft compliance monitoring, controller training, automatic alarm disabling, display management, and many others. However, while current state-of-the-art CSR systems provide upwards of 98% word accuracy in laboratory environments, recent low-intrusion experiments in the ATCT environments demonstrated less than 70% word accuracy in spite of significant investments in recognizer tuning. Acoustic channel irregularities and controller/pilot grammar varieties impact current CSR algorithms at their weakest points. It will be shown herein, however, that real-time context- and environment-sensitive gisting can provide key command phrase recognition rates of greater than 95% using the same low-intrusion approach. The combination of real-time inexact syntactic pattern recognition techniques and a tight integration of CSR, gisting, and ATC database accessor system components is the key to these high phrase recognition rates. A system concept for real-time gisting in the ATC context is presented herein. After establishing an application context, the discussion presents a minimal CSR technology context, then focuses on the gisting mechanism, desirable interfaces into the ATCT database environment, and data and control flow within the prototype system. Results of recent tests for a subset of the functionality are presented together with suggestions for further research.
A perspective on early commercial applications of voice-processing technology for telecommunications and aids for the handicapped.

PubMed Central

Seelbach, C

1995-01-01

The Colloquium on Human-Machine Communication by Voice highlighted the global technical community's focus on the problems and promise of voice-processing technology, particularly, speech recognition and speech synthesis. Clearly, there are many areas in both the research and development of these technologies that can be advanced significantly. However, it is also true that there are many applications of these technologies that are capable of commercialization now. Early successful commercialization of new technology is vital to ensure continuing interest in its development. This paper addresses efforts to commercialize speech technologies in two markets: telecommunications and aids for the handicapped. PMID:7479814
The Use of Voice Cues for Speaker Gender Recognition in Cochlear Implant Recipients

ERIC Educational Resources Information Center

Meister, Hartmut; Fürsen, Katrin; Streicher, Barbara; Lang-Roth, Ruth; Walger, Martin

2016-01-01

Purpose: The focus of this study was to examine the influence of fundamental frequency (F0) and vocal tract length (VTL) modifications on speaker gender recognition in cochlear implant (CI) recipients for different stimulus types. Method: Single words and sentences were manipulated using isolated or combined F0 and VTL cues. Using an 11-point…
Voice gender discrimination provides a measure of more than pitch-related perception in cochlear implant users

PubMed Central

Li, Tianhao; Fu, Qian-Jie

2013-01-01

Objectives: (1) To investigate whether voice gender discrimination (VGD) could be a useful indicator of the spectral and temporal processing abilities of individual cochlear implant (CI) users; (2) To examine the relationship between VGD and speech recognition with CI when comparable acoustic cues are used for both perception processes. Design: VGD was measured using two talker sets with different inter-gender fundamental frequencies (F0), as well as different acoustic CI simulations. Vowel and consonant recognition in quiet and noise were also measured and compared with VGD performance. Study sample: Eleven postlingually deaf CI users. Results: The results showed that (1) mean VGD performance differed for different stimulus sets, (2) VGD and speech recognition performance varied among individual CI users, and (3) individual VGD performance was significantly correlated with speech recognition performance under certain conditions. Conclusions: VGD measured with selected stimulus sets might be useful for assessing not only pitch-related perception, but also spectral and temporal processing by individual CI users. In addition to improvements in spectral resolution and modulation detection, the improvement in higher modulation frequency discrimination might be particularly important for CI users in noisy environments. PMID:21696330
Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

DOE Office of Scientific and Technical Information (OSTI.GOV)

Holzrichter, J.F.; Ng, L.C.

The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.
Engineering Evaluation and Assessment (EE and A) Report for the Symbolic and Sub-symbolic Robotics Intelligence Control System (SS-RICS)

DTIC Science & Technology

2018-04-01

Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions...2006. Since that time, SS-RICS has been the integration platform for many robotics algorithms using a variety of different disciplines from cognitive...voice recognition. Each noise level was run 10 times per gender, yielding 60 total runs. Two paths were chosen for testing (Paths A and B) of
A Password System Based on Sketches

DTIC Science & Technology

2016-07-12

than traditional passwords. Biometrics include biological properties such as fingerprints, voices, faces, and even handwriting. Fingerprints have been...perturbation of the sketch results in a corresponding change in the model, which is exactly what we imply when we say that model is (locally...Conf. on Frontiers in Handwriting Recognition (2010) 339–344. [29] M. Martinez-Diaz, J. Fierrez, J. Galbally, The DooDB Graphical Password Database: Data Analysis and Benchmark Results, IEEE Access 1 (2013) 596–605.
On the opposing views of the self–nonself discrimination by the immune system

PubMed Central

Cohn, Melvin

2010-01-01

Today’s generally accepted view of the self–nonself discrimination was voiced by Miller1 in 2004 in a thought-provoking essay. In spite of its popularity, this position has its limitations, which are analyzed here with a view toward establishing an interactive discussion that hopefully will culminate in agreed upon decisive experiments. The inadequacies of Miller’s view of the self–nonself discrimination and their resolution under the associative recognition of antigen model are analyzed. PMID:19048020
Some Effects of Stress on Users of a Voice Recognition System: A Preliminary Inquiry.

DTIC Science & Technology

1983-03-01

criterion of face validity is also imposed (i.e., the tasks are considered to be acceptable to target populations, e.g., pilots ... Ref. 11: pp. 22-25...being the highest level of eacrt. It was thought that these individual response levels might somehow be related to recognition rates. L. CUNCI-kTAL...generalizable phenomenon, it would imply that after some few training sessions with a recognizer, the distinction vanishes. If so, faced with a
Voice Recognition Vocabulary Lists for the Army’s TACFIRE System.

DTIC Science & Technology

1983-01-01

reasons for considering the implementation of voice control to TACFIRE. Threshold Inc. was contacted and the researchers were told that there was nothing...the section on the Tactical Fire Control Function. The next section will establish the vocabulary for the message associated with the Non-nuclear Fire...Professor Department of Operations Research E. F. Roland Rolands and Associates Reviewed by: Released by: K. T. Marshall, Chairman William M. Tolles Department of
Real-time interactive speech technology at Threshold Technology, Incorporated

NASA Technical Reports Server (NTRS)

Herscher, Marvin B.

1977-01-01

Basic real-time isolated-word recognition techniques are reviewed. Industrial applications of voice technology are described in chronological order of their development. Future research efforts are also discussed.
Three input concepts for flight crew interaction with information presented on a large-screen electronic cockpit display

NASA Technical Reports Server (NTRS)

Jones, Denise R.

1990-01-01

A piloted simulation study was conducted comparing three different input methods for interfacing to a large-screen, multiwindow, whole-flight-deck display for management of transport aircraft systems. The thumball concept utilized a miniature trackball embedded in a conventional side-arm controller. The touch screen concept provided data entry through a capacitive touch screen. The voice concept utilized a speech recognition system with input through a head-worn microphone. No single input concept emerged as the most desirable method of interacting with the display. Subjective results, however, indicate that the voice concept was the most preferred method of data entry and had the most potential for future applications. The objective results indicate that, overall, the touch screen concept was the most effective input method. There were also significant differences between the time required to perform specific tasks and the input concept employed, with each concept providing better performance relative to a specific task. These results suggest that a system combining all three input concepts might provide the most effective method of interaction.
A Multidimensional Approach to the Study of Emotion Recognition in Autism Spectrum Disorders

PubMed Central

Xavier, Jean; Vignaud, Violaine; Ruggiero, Rosa; Bodeau, Nicolas; Cohen, David; Chaby, Laurence

2015-01-01

Although deficits in emotion recognition have been widely reported in autism spectrum disorder (ASD), experiments have been restricted to either facial or vocal expressions. Here, we explored multimodal emotion processing in children with ASD (N = 19) and with typical development (TD, N = 19), considering uni (faces and voices) and multimodal (faces/voices simultaneously) stimuli and developmental comorbidities (neuro-visual, language and motor impairments). Compared to TD controls, children with ASD had rather high and heterogeneous emotion recognition scores but showed also several significant differences: lower emotion recognition scores for visual stimuli, for neutral emotion, and a greater number of saccades during visual task. Multivariate analyses showed that: (1) the difficulties they experienced with visual stimuli were partially alleviated with multimodal stimuli. (2) Developmental age was significantly associated with emotion recognition in TD children, whereas it was the case only for the multimodal task in children with ASD. (3) Language impairments tended to be associated with emotion recognition scores of ASD children in the auditory modality. Conversely, in the visual or bimodal (visuo-auditory) tasks, the impact of developmental coordination disorder or neuro-visual impairments was not found. We conclude that impaired emotion processing constitutes a dimension to explore in the field of ASD, as research has the potential to define more homogeneous subgroups and tailored interventions. However, it is clear that developmental age, the nature of the stimuli, and other developmental comorbidities must also be taken into account when studying this dimension. PMID:26733928
Visual abilities are important for auditory-only speech recognition: evidence from autism spectrum disorder.

PubMed

Schelinski, Stefanie; Riedel, Philipp; von Kriegstein, Katharina

2014-12-01

In auditory-only conditions, for example when we listen to someone on the phone, it is essential to recognize quickly and accurately what is said (speech recognition). Previous studies have shown that speech recognition performance in auditory-only conditions is better if the speaker is known not only by voice, but also by face. Here, we tested the hypothesis that such an improvement in auditory-only speech recognition depends on the ability to lip-read. To test this we recruited a group of adults with autism spectrum disorder (ASD), a condition associated with difficulties in lip-reading, and typically developed controls. All participants were trained to identify six speakers by name and voice. Three speakers were learned by a video showing their face and three others were learned in a matched control condition without face. After training, participants performed an auditory-only speech recognition test that consisted of sentences spoken by the trained speakers. As a control condition, the test also included speaker identity recognition on the same auditory material. The results showed that, in the control group, performance in speech recognition was improved for speakers known by face in comparison to speakers learned in the matched control condition without face. The ASD group lacked such a performance benefit. For the ASD group auditory-only speech recognition was even worse for speakers known by face compared to speakers not known by face. In speaker identity recognition, the ASD group performed worse than the control group independent of whether the speakers were learned with or without face. Two additional visual experiments showed that the ASD group performed worse in lip-reading whereas face identity recognition was within the normal range. The findings support the view that auditory-only communication involves specific visual mechanisms. Further, they indicate that in ASD, speaker-specific dynamic visual information is not available to optimize auditory-only speech recognition. Copyright © 2014 Elsevier Ltd. All rights reserved.
Wireless Augmented Reality Prototype (WARP)

NASA Technical Reports Server (NTRS)

Devereaux, A. S.

1999-01-01

Initiated in January, 1997, under NASA's Office of Life and Microgravity Sciences and Applications, the Wireless Augmented Reality Prototype (WARP) is a means to leverage recent advances in communications, displays, imaging sensors, biosensors, voice recognition and microelectronics to develop a hands-free, tetherless system capable of real-time personal display and control of computer system resources. Using WARP, an astronaut may efficiently operate and monitor any computer-controllable activity inside or outside the vehicle or station. The WARP concept is a lightweight, unobtrusive heads-up display with a wireless wearable control unit. Connectivity to the external system is achieved through a high-rate radio link from the WARP personal unit to a base station unit installed into any system PC. The radio link has been specially engineered to operate within the high- interference, high-multipath environment of a space shuttle or space station module. Through this virtual terminal, the astronaut will be able to view and manipulate imagery, text or video, using voice commands to control the terminal operations. WARP's hands-free access to computer-based instruction texts, diagrams and checklists replaces juggling manuals and clipboards, and tetherless computer system access allows free motion throughout a cabin while monitoring and operating equipment.
Use of speech-to-text technology for documentation by healthcare providers.

PubMed

Ajami, Sima

2016-01-01

Medical records are a critical component of a patient's treatment. However, documentation of patient-related information is considered a secondary activity in the provision of healthcare services, often leading to incomplete medical records and patient data of low quality. Advances in information technology (IT) in the health system and registration of information in electronic health records (EHR) using speech-to-text conversion software have facilitated service delivery. This narrative review is a literature search with the help of libraries, books, conference proceedings, databases of Science Direct, PubMed, ProQuest, Springer, SID (Scientific Information Database), and search engines such as Yahoo and Google. I used the following keywords and their combinations: speech recognition, automatic report documentation, voice to text software, healthcare, information, and voice recognition. Due to lack of knowledge of other languages, I searched all texts in English or Persian with no time limits. Of a total of 70, only 42 articles were selected. Speech-to-text conversion technology offers opportunities to improve the documentation process of medical records, reduce cost and time of recording information, enhance the quality of documentation, improve the quality of services provided to patients, and support healthcare providers in legal matters. Healthcare providers should recognize the impact of this technology on service delivery.
Voice input/output capabilities at Perception Technology Corporation

NASA Technical Reports Server (NTRS)

Ferber, Leon A.

1977-01-01

Condensed resumes of key company personnel at the Perception Technology Corporation are presented. The staff possesses expertise in speech recognition, speech synthesis, speaker authentication, and language identification. Hardware and software engineers' capabilities are included.
Evolving Spiking Neural Networks for Recognition of Aged Voices.

PubMed

Silva, Marco; Vellasco, Marley M B R; Cataldo, Edson

2017-01-01

The aging of the voice, known as presbyphonia, is a natural process that can cause great change in vocal quality of the individual. This is a relevant problem to those people who use their voices professionally, and its early identification can help determine a suitable treatment to avoid its progress or even to eliminate the problem. This work focuses on the development of a new model for the identification of aging voices (independently of their chronological age), using as input attributes parameters extracted from the voice and glottal signals. The proposed model, named Quantum binary-real evolving Spiking Neural Network (QbrSNN), is based on spiking neural networks (SNNs), with an unsupervised training algorithm, and a Quantum-Inspired Evolutionary Algorithm that automatically determines the most relevant attributes and the optimal parameters that configure the SNN. The QbrSNN model was evaluated in a database composed of 120 records, containing samples from three groups of speakers. The results obtained indicate that the proposed model provides better accuracy than other approaches, with fewer input attributes. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
PACS administrators' and radiologists' perspective on the importance of features for PACS selection.

PubMed

Joshi, Vivek; Narra, Vamsi R; Joshi, Kailash; Lee, Kyootai; Melson, David

2014-08-01

Picture archiving and communication systems (PACS) play a critical role in radiology. This paper presents the criteria important to PACS administrators for selecting a PACS. A set of criteria is identified and organized into an integrative hierarchical framework. Survey responses from 48 administrators are used to identify the relative weights of these criteria through an analytical hierarchy process. The five main dimensions for PACS selection in order of importance are system continuity and functionality, system performance and architecture, user interface for workflow management, user interface for image manipulation, and display quality. Among the subdimensions, the highest weights were assessed for security, backup, and continuity; tools for continuous performance monitoring; support for multispecialty images; and voice recognition/transcription. PACS administrators' preferences were generally in line with previously reported results for radiologists. Both groups assigned the highest priority to ensuring business continuity and preventing loss of data through features such as security, backup, downtime prevention, and tools for continuous PACS performance monitoring. PACS administrators' next high priorities were support for multispecialty images, image retrieval speeds from short-term and long-term storage, real-time monitoring, and architectural issues of compatibility and integration with other products. Thus, next to ensuring business continuity, administrators' focus was on issues that impact their ability to deliver services and support. On the other hand, radiologists gave high priorities to voice recognition, transcription, and reporting; structured reporting; and convenience and responsiveness in manipulation of images. Thus, radiologists' focus appears to be on issues that may impact their productivity, effort, and accuracy.
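The abstract does not say how the criterion weights were computed beyond naming the analytic hierarchy process. As a rough illustration only, the sketch below derives priority weights from a pairwise comparison matrix using the row geometric-mean approximation, which is one common AHP variant and not necessarily the one the authors used. The function name and the matrix values are hypothetical and are not taken from the survey.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method.

    pairwise[i][j] states how much more important criterion i is than
    criterion j; a reciprocal matrix has pairwise[j][i] == 1 / pairwise[i][j].
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical, perfectly consistent comparison of three criteria
# (values invented for illustration; not data from the paper).
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)
print(weights)
```

For a perfectly consistent matrix like this one, the geometric-mean weights coincide with the principal-eigenvector weights; for inconsistent survey judgments the two can differ slightly.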

Memory for faces and voices varies as a function of sex and expressed emotion.

PubMed

Cortes, Diana S; Laukka, Petri; Lindahl, Christina; Fischer, Håkan

2017-01-01

We investigated how memory for faces and voices (presented separately and in combination) varies as a function of sex and emotional expression (anger, disgust, fear, happiness, sadness, and neutral). At encoding, participants judged the expressed emotion of items in forced-choice tasks, followed by incidental Remember/Know recognition tasks. Results from 600 participants showed that accuracy (hits minus false alarms) was consistently higher for neutral compared to emotional items, whereas accuracy for specific emotions varied across the presentation modalities (i.e., faces, voices, and face-voice combinations). For the subjective sense of recollection ("remember" hits), neutral items received the highest hit rates only for faces, whereas for voices and face-voice combinations anger and fear expressions instead received the highest recollection rates. We also observed better accuracy for items by female expressers, and own-sex bias where female participants displayed memory advantage for female faces and face-voice combinations. Results further suggest that own-sex bias can be explained by recollection, rather than familiarity, rates. Overall, results show that memory for faces and voices may be influenced by the expressions that they carry, as well as by the sex of both items and participants. Emotion expressions may also enhance the subjective sense of recollection without enhancing memory accuracy.
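The accuracy measure named in this abstract, hits minus false alarms, is usually computed on rates rather than raw counts. The sketch below assumes that reading; the function name and the example counts are hypothetical and are not data from the study.

```python
def corrected_recognition(hits, misses, false_alarms, correct_rejections):
    """Return hit rate minus false-alarm rate for an old/new recognition test.

    hits and misses are responses to previously studied (old) items;
    false_alarms and correct_rejections are responses to new items.
    """
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_rejections)
    return hit_rate - false_alarm_rate


# Hypothetical participant: 24 old items and 24 new items.
score = corrected_recognition(hits=18, misses=6, false_alarms=4, correct_rejections=20)
print(round(score, 3))
```

A score of 0 corresponds to responding "old" equally often to old and new items, and 1 corresponds to perfect discrimination.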
Memory for faces and voices varies as a function of sex and expressed emotion

PubMed Central

Laukka, Petri; Lindahl, Christina; Fischer, Håkan

2017-01-01

We investigated how memory for faces and voices (presented separately and in combination) varies as a function of sex and emotional expression (anger, disgust, fear, happiness, sadness, and neutral). At encoding, participants judged the expressed emotion of items in forced-choice tasks, followed by incidental Remember/Know recognition tasks. Results from 600 participants showed that accuracy (hits minus false alarms) was consistently higher for neutral compared to emotional items, whereas accuracy for specific emotions varied across the presentation modalities (i.e., faces, voices, and face-voice combinations). For the subjective sense of recollection (“remember” hits), neutral items received the highest hit rates only for faces, whereas for voices and face-voice combinations anger and fear expressions instead received the highest recollection rates. We also observed better accuracy for items by female expressers, and own-sex bias where female participants displayed memory advantage for female faces and face-voice combinations. Results further suggest that own-sex bias can be explained by recollection, rather than familiarity, rates. Overall, results show that memory for faces and voices may be influenced by the expressions that they carry, as well as by the sex of both items and participants. Emotion expressions may also enhance the subjective sense of recollection without enhancing memory accuracy. PMID:28570691
More than Just Two Sexes: The Neural Correlates of Voice Gender Perception in Gender Dysphoria

PubMed Central

Junger, Jessica; Habel, Ute; Bröhr, Sabine; Neulen, Josef; Neuschaefer-Rube, Christiane; Birkholz, Peter; Kohler, Christian; Schneider, Frank; Derntl, Birgit; Pauly, Katharina

2014-01-01

Gender dysphoria (also known as “transsexualism”) is characterized as a discrepancy between anatomical sex and gender identity. Research points towards neurobiological influences. Due to the sexually dimorphic characteristics of the human voice, voice gender perception provides a biologically relevant function, e.g. in the context of mating selection. There is evidence for a better recognition of voices of the opposite sex and a differentiation of the sexes in its underlying functional cerebral correlates, namely the prefrontal and middle temporal areas. This fMRI study investigated the neural correlates of voice gender perception in 32 male-to-female gender dysphoric individuals (MtFs) compared to 20 non-gender dysphoric men and 19 non-gender dysphoric women. Participants indicated the sex of 240 voice stimuli modified in semitone steps in the direction to the other gender. Compared to men and women, MtFs showed differences in a neural network including the medial prefrontal gyrus, the insula, and the precuneus when responding to male vs. female voices. With increased voice morphing men recruited more prefrontal areas compared to women and MtFs, while MtFs revealed a pattern more similar to women. On a behavioral and neuronal level, our results support the feeling of MtFs reporting they cannot identify with their assigned sex. PMID:25375171
[Information technology in learning sign language].

PubMed

Hernández, Cesar; Pulido, Jose L; Arias, Jorge E

2015-01-01

To develop a technological tool that improves the initial learning of sign language in hearing impaired children. The development of this research was conducted in three phases: requirements gathering, design and development of the proposed device, and validation and evaluation of the device. Through the use of information technology and with the advice of special education professionals, we were able to develop an electronic device that facilitates the learning of sign language in deaf children. This is formed mainly by a graphic touch screen, a voice synthesizer, and a voice recognition system. Validation was performed with the deaf children in the Filadelfia School of the city of Bogotá. A learning methodology was established that improves learning times through a small, portable, lightweight, and educational technological prototype. Tests showed the effectiveness of this prototype, achieving a 32% reduction in the initial learning time for sign language in deaf children.
What Are Some Types of Assistive Devices and How Are They Used?

MedlinePlus

... in persons with hearing problems. Cognitive assistance, including computer or electrical assistive devices, can help people function following brain injury. Computer software and hardware, such as voice recognition programs, ...
Human factors issues associated with the use of speech technology in the cockpit

NASA Technical Reports Server (NTRS)

Kersteen, Z. A.; Damos, D.

1983-01-01

The human factors issues associated with the use of voice technology in the cockpit are summarized. The formulation of the LHX avionics suite is described and the allocation of tasks to voice in the cockpit is discussed. State-of-the-art speech recognition technology is reviewed. Finally, a questionnaire designed to tap pilot opinions concerning the allocation of tasks to voice input and output in the cockpit is presented. This questionnaire was designed to be administered to operational AH-1G Cobra gunship pilots. Half of the questionnaire deals specifically with the AH-1G cockpit and the types of tasks pilots would like to have performed by voice in this existing rotorcraft. The remaining portion of the questionnaire deals with an undefined rotorcraft of the future and is aimed at determining what types of tasks these pilots would like to have performed by voice technology if anything was possible, i.e. if there were no technological constraints.
The structural neuroanatomy of music emotion recognition: Evidence from frontotemporal lobar degeneration

PubMed Central

Omar, Rohani; Henley, Susie M.D.; Bartlett, Jonathan W.; Hailstone, Julia C.; Gordon, Elizabeth; Sauter, Disa A.; Frost, Chris; Scott, Sophie K.; Warren, Jason D.

2011-01-01

Despite growing clinical and neurobiological interest in the brain mechanisms that process emotion in music, these mechanisms remain incompletely understood. Patients with frontotemporal lobar degeneration (FTLD) frequently exhibit clinical syndromes that illustrate the effects of breakdown in emotional and social functioning. Here we investigated the neuroanatomical substrate for recognition of musical emotion in a cohort of 26 patients with FTLD (16 with behavioural variant frontotemporal dementia, bvFTD, 10 with semantic dementia, SemD) using voxel-based morphometry. On neuropsychological evaluation, patients with FTLD showed deficient recognition of canonical emotions (happiness, sadness, anger and fear) from music as well as faces and voices compared with healthy control subjects. Impaired recognition of emotions from music was specifically associated with grey matter loss in a distributed cerebral network including insula, orbitofrontal cortex, anterior cingulate and medial prefrontal cortex, anterior temporal and more posterior temporal and parietal cortices, amygdala and the subcortical mesolimbic system. This network constitutes an essential brain substrate for recognition of musical emotion that overlaps with brain regions previously implicated in coding emotional value, behavioural context, conceptual knowledge and theory of mind. Musical emotion recognition may probe the interface of these processes, delineating a profile of brain damage that is essential for the abstraction of complex social emotions. PMID:21385617
The structural neuroanatomy of music emotion recognition: evidence from frontotemporal lobar degeneration.

PubMed

Omar, Rohani; Henley, Susie M D; Bartlett, Jonathan W; Hailstone, Julia C; Gordon, Elizabeth; Sauter, Disa A; Frost, Chris; Scott, Sophie K; Warren, Jason D

2011-06-01

Despite growing clinical and neurobiological interest in the brain mechanisms that process emotion in music, these mechanisms remain incompletely understood. Patients with frontotemporal lobar degeneration (FTLD) frequently exhibit clinical syndromes that illustrate the effects of breakdown in emotional and social functioning. Here we investigated the neuroanatomical substrate for recognition of musical emotion in a cohort of 26 patients with FTLD (16 with behavioural variant frontotemporal dementia, bvFTD, 10 with semantic dementia, SemD) using voxel-based morphometry. On neuropsychological evaluation, patients with FTLD showed deficient recognition of canonical emotions (happiness, sadness, anger and fear) from music as well as faces and voices compared with healthy control subjects. Impaired recognition of emotions from music was specifically associated with grey matter loss in a distributed cerebral network including insula, orbitofrontal cortex, anterior cingulate and medial prefrontal cortex, anterior temporal and more posterior temporal and parietal cortices, amygdala and the subcortical mesolimbic system. This network constitutes an essential brain substrate for recognition of musical emotion that overlaps with brain regions previously implicated in coding emotional value, behavioural context, conceptual knowledge and theory of mind. Musical emotion recognition may probe the interface of these processes, delineating a profile of brain damage that is essential for the abstraction of complex social emotions. Copyright © 2011 Elsevier Inc. All rights reserved.
Voice recognition technology implementation in surgical pathology: advantages and limitations.

PubMed

Singh, Meenakshi; Pal, Timothy R

2011-11-01

Voice recognition technology (VRT) has been in use for medical transcription outside of laboratories for many years, and in recent years it has evolved to a level where it merits consideration by surgical pathologists. To determine the feasibility and impact of making a transition from a transcriptionist-based service to VRT in surgical pathology. We have evaluated VRT in a phased manner for sign out of general and subspecialty surgical pathology cases after conducting a pilot study. We evaluated the effect on turnaround time, workflow, staffing, typographical error rates, and the overall ability of VRT to be adapted for use in surgical pathology. The stepwise implementation of VRT has resulted in real-time sign out of cases and improvement in average turnaround time from 4 to 3 days. The percentage of cases signed out in 1 day improved from 22% to 37%. Amendment rates for typographical errors have decreased. Use of templates and synoptic reports has been facilitated. The transcription staff has been reassigned to other duties and is successfully assisting in other areas. Resident involvement and exposure to complete case sign out has been achieved resulting in a positive impact on resident education. Voice recognition technology allows for a seamless workflow in surgical pathology, with improvements in turnaround time and a positive impact on competency-based resident education. Individual practices may assess the value of VRT and decide to implement it, potentially with gains in many aspects of their practice.
Off the Shelf Cloud Robotics for the Smart Home: Empowering a Wireless Robot through Cloud Computing.

PubMed

Ramírez De La Pinta, Javier; Maestre Torreblanca, José María; Jurado, Isabel; Reyes De Cozar, Sergio

2017-03-06

In this paper, we explore the possibilities offered by the integration of home automation systems and service robots. In particular, we examine how advanced computationally expensive services can be provided by using a cloud computing approach to overcome the limitations of the hardware available at the user's home. To this end, we integrate two wireless low-cost, off-the-shelf systems in this work, namely, the service robot Rovio and the home automation system Z-wave. Cloud computing is used to enhance the capabilities of these systems so that advanced sensing and interaction services based on image processing and voice recognition can be offered.
Off the Shelf Cloud Robotics for the Smart Home: Empowering a Wireless Robot through Cloud Computing

PubMed Central

Ramírez De La Pinta, Javier; Maestre Torreblanca, José María; Jurado, Isabel; Reyes De Cozar, Sergio

2017-01-01

In this paper, we explore the possibilities offered by the integration of home automation systems and service robots. In particular, we examine how advanced computationally expensive services can be provided by using a cloud computing approach to overcome the limitations of the hardware available at the user’s home. To this end, we integrate two wireless low-cost, off-the-shelf systems in this work, namely, the service robot Rovio and the home automation system Z-wave. Cloud computing is used to enhance the capabilities of these systems so that advanced sensing and interaction services based on image processing and voice recognition can be offered. PMID:28272305
Constraints on the Transfer of Perceptual Learning in Accented Speech

PubMed Central

Eisner, Frank; Melinger, Alissa; Weber, Andrea

2013-01-01

The perception of speech sounds can be re-tuned through a mechanism of lexically driven perceptual learning after exposure to instances of atypical speech production. This study asked whether this re-tuning is sensitive to the position of the atypical sound within the word. We investigated perceptual learning using English voiced stop consonants, which are commonly devoiced in word-final position by Dutch learners of English. After exposure to a Dutch learner’s productions of devoiced stops in word-final position (but not in any other positions), British English (BE) listeners showed evidence of perceptual learning in a subsequent cross-modal priming task, where auditory primes with devoiced final stops (e.g., “seed”, pronounced [si:th]), facilitated recognition of visual targets with voiced final stops (e.g., SEED). In Experiment 1, this learning effect generalized to test pairs where the critical contrast was in word-initial position, e.g., auditory primes such as “town” facilitated recognition of visual targets like DOWN. Control listeners, who had not heard any stops by the speaker during exposure, showed no learning effects. The generalization to word-initial position did not occur when participants had also heard correctly voiced, word-initial stops during exposure (Experiment 2), and when the speaker was a native BE speaker who mimicked the word-final devoicing (Experiment 3). The readiness of the perceptual system to generalize a previously learned adjustment to other positions within the word thus appears to be modulated by distributional properties of the speech input, as well as by the perceived sociophonetic characteristics of the speaker. The results suggest that the transfer of pre-lexical perceptual adjustments that occur through lexically driven learning can be affected by a combination of acoustic, phonological, and sociophonetic factors. PMID:23554598
Talker and accent variability effects on spoken word recognition

NASA Astrophysics Data System (ADS)

Nyang, Edna E.; Rogers, Catherine L.; Nishi, Kanae

2003-04-01

A number of studies have shown that words in a list are recognized less accurately in noise and with longer response latencies when they are spoken by multiple talkers, rather than a single talker. These results have been interpreted as support for an exemplar-based model of speech perception, in which it is assumed that detailed information regarding the speaker's voice is preserved in memory and used in recognition, rather than being eliminated via normalization. In the present study, the effects of varying both accent and talker are investigated using lists of words spoken by (a) a single native English speaker, (b) six native English speakers, (c) three native English speakers and three Japanese-accented English speakers. Twelve /hVd/ words were mixed with multi-speaker babble at three signal-to-noise ratios (+10, +5, and 0 dB) to create the word lists. Native English-speaking listeners' percent-correct recognition for words produced by native English speakers across the three talker conditions (single talker native, multi-talker native, and multi-talker mixed native and non-native) and three signal-to-noise ratios will be compared to determine whether sources of speaker variability other than voice alone add to the processing demands imposed by simple (i.e., single accent) speaker variability in spoken word recognition.
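The mixing step described above (words combined with multi-speaker babble at +10, +5, and 0 dB) can be illustrated with a minimal sketch. The abstract does not say how the authors scaled their signals; the function names `mean_power` and `mix_at_snr` are hypothetical, and the sketch assumes SNR is defined as the ratio of mean signal power to mean noise power over the whole word.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech-to-noise power ratio equals `snr_db`, then add.

    Returns (mixture, scaled_noise). Assumption: SNR is computed from mean
    power over the full duration of the speech token.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    # Target noise power is p_speech / 10^(snr_db / 10); gain applies to amplitude.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled = [gain * v for v in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled
```

Scaling the babble rather than the word keeps the speech level constant across the three conditions; that is a design choice of this sketch, not something the abstract specifies.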
EMG-based speech recognition using hidden Markov models with global control variables.

PubMed

Lee, Ki-Seung

2008-03-01

It is well known that a strong relationship exists between human voices and the movement of articulatory facial muscles. In this paper, we utilize this knowledge to implement an automatic speech recognition scheme which uses solely surface electromyogram (EMG) signals. The sequence of EMG signals for each word is modelled by a hidden Markov model (HMM) framework. The main objective of the work involves building a model for state observation density when multichannel observation sequences are given. The proposed model reflects the dependencies between each of the EMG signals, which are described by introducing a global control variable. We also develop an efficient model training method, based on a maximum likelihood criterion. In a preliminary study, 60 isolated words were used as recognition variables. EMG signals were acquired from three articulatory facial muscles. The findings indicate that such a system may have the capacity to recognize speech signals with an accuracy of up to 87.07%, which is superior to the independent probabilistic model.
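As a rough illustration of isolated-word recognition with one HMM per word, the sketch below scores an observation sequence against each word model with the forward algorithm and picks the most likely word. It is a simplification under stated assumptions: observations are discrete symbols rather than multichannel EMG features, and the paper's global control variable and training method are not implemented. All names (`forward_log_likelihood`, `recognize`, the model layout) are hypothetical.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete HMM via the scaled forward algorithm.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability of emitting symbol k in state i
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_like = 0.0
    for symbol in obs[1:]:
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        # Normalize to avoid underflow and accumulate the log of the scale.
        log_like += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    total = sum(alpha)
    return log_like + math.log(total) if total > 0.0 else float("-inf")


def recognize(obs, word_models):
    """Return the word whose HMM assigns the highest likelihood to `obs`.

    word_models maps word -> (start, trans, emit).
    """
    return max(
        word_models,
        key=lambda w: forward_log_likelihood(obs, *word_models[w]),
    )
```

In a system like the one described, each of the 60 isolated words would have its own trained model, and the emission term would be replaced by a density over the multichannel EMG observations.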
Asymmetric cultural effects on perceptual expertise underlie an own-race bias for voices

PubMed Central

Perrachione, Tyler K.; Chiao, Joan Y.; Wong, Patrick C.M.

2009-01-01

The own-race bias in memory for faces has been a rich source of empirical work on the mechanisms of person perception. This effect is thought to arise because the face-perception system differentially encodes the relevant structural dimensions of features and their configuration based on experiences with different groups of faces. However, the effects of sociocultural experiences on person perception abilities in other identity-conveying modalities like audition have not been explored. Investigating an own-race bias in the auditory domain provides a unique opportunity for studying whether person identification is a modality-independent construct and how it is sensitive to asymmetric cultural experiences. Here we show that an own-race bias in talker identification arises from asymmetric experience with different spoken dialects. When listeners categorized voices by race (White or Black), a subset of the Black voices were categorized as sounding White, while the opposite case was unattested. Acoustic analyses indicated listeners' perceptions about race were consistent with differences in specific phonetic and phonological features. In a subsequent person-identification experiment, the Black voices initially categorized as sounding White elicited an own-race bias from White listeners, but not from Black listeners. These effects are inconsistent with person-perception models that strictly analogize faces and voices based on recognition from only structural features. Our results demonstrate that asymmetric exposure to spoken dialect, independent from talkers' physical characteristics, affects auditory perceptual expertise for talker identification. Person perception thus additionally relies on socioculturally-acquired dynamic information, which may be represented by different mechanisms in different sensory modalities. PMID:19782970
Speech perception and communication ability over the telephone by Mandarin-speaking children with cochlear implants.

PubMed

Wu, Che-Ming; Liu, Tien-Chen; Wang, Nan-Mai; Chao, Wei-Chieh

2013-08-01

(1) To understand speech perception and communication ability through real telephone calls by Mandarin-speaking children with cochlear implants and compare them to live-voice perception, (2) to report the general condition of telephone use of this population, and (3) to investigate the factors that correlate with telephone speech perception performance. Fifty-six children with over 4 years of implant use (aged 6.8-13.6 years, mean duration 8.0 years) took three speech perception tests administered using telephone and live voice to examine sentence, monosyllabic-word and Mandarin tone perception. The children also filled out a questionnaire survey investigating everyday telephone use. Wilcoxon signed-rank test was used to compare the scores between live-voice and telephone tests, and Pearson's test to examine the correlation between them. The mean scores were 86.4%, 69.8% and 70.5% respectively for sentence, word and tone recognition over the telephone. The corresponding live-voice mean scores were 94.3%, 84.0% and 70.8%. Wilcoxon signed-rank test showed the sentence and word scores were significantly different between telephone and live voice test, while the tone recognition scores were not, indicating tone perception was less worsened by telephone transmission than words and sentences. Spearman's test showed that chronological age and duration of implant use were weakly correlated with the perception test scores. The questionnaire survey showed 78% of the children could initiate phone calls and 59% could use the telephone 2 years after implantation. Implanted children are potentially capable of using the telephone 2 years after implantation, and communication ability over the telephone becomes satisfactory 4 years after implantation. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Processing of speech signals for physical and sensory disabilities.

PubMed Central

Levitt, H

1995-01-01

Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities. PMID:7479816
Processing of Speech Signals for Physical and Sensory Disabilities

NASA Astrophysics Data System (ADS)

Levitt, Harry

1995-10-01

Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities.
Gay- and Lesbian-Sounding Auditory Cues Elicit Stereotyping and Discrimination.

PubMed

Fasoli, Fabio; Maass, Anne; Paladino, Maria Paola; Sulpizio, Simone

2017-07-01

The growing body of literature on the recognition of sexual orientation from voice ("auditory gaydar") is silent on the cognitive and social consequences of having a gay-/lesbian- versus heterosexual-sounding voice. We investigated this issue in four studies (overall N = 276), conducted in Italian language, in which heterosexual listeners were exposed to single-sentence voice samples of gay/lesbian and heterosexual speakers. In all four studies, listeners were found to make gender-typical inferences about traits and preferences of heterosexual speakers, but gender-atypical inferences about those of gay or lesbian speakers. Behavioral intention measures showed that listeners considered lesbian and gay speakers as less suitable for a leadership position, and male (but not female) listeners took distance from gay speakers. Together, this research demonstrates that having a gay/lesbian rather than heterosexual-sounding voice has tangible consequences for stereotyping and discrimination.
Basic and complex emotion recognition in children with autism: cross-cultural findings.

PubMed

Fridenson-Hayo, Shimrit; Berggren, Steve; Lassalle, Amandine; Tal, Shahar; Pigat, Delia; Bölte, Sven; Baron-Cohen, Simon; Golan, Ofer

2016-01-01

Children with autism spectrum conditions (ASC) have emotion recognition deficits when tested in different expression modalities (face, voice, body). However, these findings usually focus on basic emotions, using one or two expression modalities. In addition, cultural similarities and differences in emotion recognition patterns in children with ASC have not been explored before. The current study examined the similarities and differences in the recognition of basic and complex emotions by children with ASC and typically developing (TD) controls across three cultures: Israel, Britain, and Sweden. Fifty-five children with high-functioning ASC, aged 5-9, were compared to 58 TD children. On each site, groups were matched on age, sex, and IQ. Children were tested using four tasks, examining recognition of basic and complex emotions from voice recordings, videos of facial and bodily expressions, and emotional video scenarios including all modalities in context. Compared to their TD peers, children with ASC showed emotion recognition deficits in both basic and complex emotions on all three modalities and their integration in context. Complex emotions were harder to recognize, compared to basic emotions for the entire sample. Cross-cultural agreement was found for all major findings, with minor deviations on the face and body tasks. Our findings highlight the multimodal nature of ER deficits in ASC, which exist for basic as well as complex emotions and are relatively stable cross-culturally. Cross-cultural research has the potential to reveal both autism-specific universal deficits and the role that specific cultures play in the way empathy operates in different countries.

Google Home: smart speaker as environmental control unit.

PubMed

Noda, Kenichiro

2017-08-23

Environmental Control Units (ECUs) are devices or systems that allow a person to control appliances in their home or work environment. Such systems can be used by clients with physical and/or functional disabilities to enhance their ability to control their environment, promote independence, and improve their quality of life. Over the last several years, several inexpensive, commercially available, voice-activated smart speakers, such as Google Home and Amazon Echo, have emerged on the market. These smart speakers are equipped with far-field microphones that support voice recognition and allow completely hands-free operation for various purposes, including playing music, information retrieval, and, most importantly, environmental control. Clients with disabilities could use these features to turn the unit into a simple ECU that is completely voice activated and wirelessly connected to appliances. Smart speakers, with their ease of setup, low cost, and versatility, may be a more affordable and accessible alternative to the traditional ECU. Implications for Rehabilitation: ECUs enable independence for physically and functionally disabled clients and reduce the burden and frequency of demands on carers. Traditional ECUs can be costly and may require clients to learn specialized skills. Smart speakers have the potential to serve as a new-age ECU by overcoming these barriers, and can be used by a wider range of clients.
A telecommunications journey rural health network.

PubMed

Moore, Joe

2012-01-01

Utilizing a multi-gigabit statewide fiber healthcare network, Radiology Consultants of Iowa (RCI) set out to provide instantaneous service to their rural, critical access hospital partners. RCI's idea was to assemble a collection of technologies and services that would even out workflow, reduce time on the road, and provide superior service. These technologies included PACS, voice recognition enabled dictation, HL7 interface technology, an imaging system for digitizing paper and prior films, and modern communication networks. The Iowa Rural Health Telecommunication Project was undertaken to form a system that all critical access hospitals would participate in, allowing RCI radiologists the efficiency of "any image, anywhere, anytime".
Top 10 "Smart" Technologies for Schools.

ERIC Educational Resources Information Center

Fodeman, Doug; Holzberg, Carol S.; Kennedy, Kristen; McIntire, Todd; McLester, Susan; Ohler, Jason; Parham, Charles; Poftak, Amy; Schrock, Kathy; Warlick, David

2002-01-01

Describes 10 smart technologies for education, including voice to text software; mobile computing; hybrid computing; virtual reality; artificial intelligence; telementoring; assessment methods; digital video production; fingerprint recognition; and brain functions. Lists pertinent Web sites for each technology. (LRW)
Who Goes There?

ERIC Educational Resources Information Center

Vail, Kathleen

1995-01-01

Biometrics (hand geometry, iris and retina scanners, voice and facial recognition, signature dynamics, facial thermography, and fingerprint readers) identifies people based on physical characteristics. Administrators worried about kidnapping, vandalism, theft, and violent intruders might welcome these security measures when they become more…
CNN: a speaker recognition system using a cascaded neural network.

PubMed

Zaki, M; Ghalwash, A; Elkouny, A A

1996-05-01

The main emphasis of this paper is to present an approach for combining supervised and unsupervised neural network models for speaker recognition. To enhance the overall operation and performance of recognition, the proposed strategy integrates the two techniques, forming one global model called the cascaded model. We first present a simple conventional technique based on the distance measured between a test vector and a reference vector for different speakers in the population. This particular distance metric has the property of weighting down the components in those directions along which the intraspeaker variance is large. The reason for presenting this method is to clarify the discrepancy in performance between the conventional and neural network approaches. We then introduce the idea of using an unsupervised learning technique, represented by the winner-take-all model, as a means of recognition. Following several tests, and in order to enhance the performance of this model when dealing with noisy patterns, we have preceded it with a supervised learning model--the pattern association model--which acts as a filtration stage. This work includes the design and implementation of both conventional and neural network approaches to recognize the speakers' templates--which are introduced to the system via a voice master card and preprocessed before extracting the features used in the recognition. The conclusion indicates that system performance with the neural network is better than with the conventional approach, achieving a smooth degradation with respect to noisy patterns and higher performance with respect to noise-free patterns.
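The conventional baseline above uses a distance that down-weights feature components with large intraspeaker variance. The abstract does not give the formula; the sketch below assumes one common form, a squared difference divided by a per-component variance (a diagonal Mahalanobis-style distance). The names `weighted_distance` and `identify_speaker` and the data layout are hypothetical.

```python
def weighted_distance(test, reference, intra_var):
    """Variance-weighted squared distance between two feature vectors.

    Each squared component difference is divided by the intraspeaker
    variance of that component, so unreliable (high-variance) components
    contribute less. Assumed form; the paper's exact metric is not stated.
    """
    return sum(
        (t - r) ** 2 / v for t, r, v in zip(test, reference, intra_var)
    )


def identify_speaker(test, references, intra_var):
    """Return the speaker whose reference vector is closest to `test`.

    references maps speaker name -> reference feature vector.
    """
    return min(
        references,
        key=lambda name: weighted_distance(test, references[name], intra_var),
    )
```

For example, with references `{"A": [1.0, 0.0], "B": [3.0, 5.0]}`, a test vector `[1.0, 5.0]`, and variances `[1.0, 100.0]`, the weighted distance favors speaker A because the second component varies widely within a speaker, whereas an unweighted Euclidean distance would favor B.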
Effects of cross-language voice training on speech perception: Whose familiar voices are more intelligible?

PubMed Central

Levi, Susannah V.; Winters, Stephen J.; Pisoni, David B.

2011-01-01

Previous research has shown that familiarity with a talker’s voice can improve linguistic processing (herein, “Familiar Talker Advantage”), but this benefit is constrained by the context in which the talker’s voice is familiar. The current study examined how familiarity affects intelligibility by manipulating the type of talker information available to listeners. One group of listeners learned to identify bilingual talkers’ voices from English words, where they learned language-specific talker information. A second group of listeners learned the same talkers from German words, and thus only learned language-independent talker information. After voice training, both groups of listeners completed a word recognition task with English words produced by both familiar and unfamiliar talkers. Results revealed that English-trained listeners perceived more phonemes correct for familiar than unfamiliar talkers, while German-trained listeners did not show improved intelligibility for familiar talkers. The absence of a processing advantage in speech intelligibility for the German-trained listeners demonstrates limitations on the Familiar Talker Advantage, which crucially depends on the language context in which the talkers’ voices were learned; knowledge of how a talker produces linguistically relevant contrasts in a particular language is necessary to increase speech intelligibility for words produced by familiar talkers. PMID:22225059
Developing an automated speech-recognition telephone diabetes intervention.

PubMed

Goldman, Roberta E; Sanchez-Hernandez, Maya; Ross-Degnan, Dennis; Piette, John D; Trinacty, Connie Mah; Simon, Steven R

2008-08-01

Many patients do not receive guideline-recommended care for diabetes and other chronic conditions. Automated speech-recognition telephone outreach to supplement in-person physician-patient communication may enhance patient care for chronic illness. We conducted this study to inform the development of an automated telephone outreach intervention for improving diabetes care among members of a large, not-for-profit health plan. In-depth telephone interviews with qualitative analysis. Participants: individuals with diabetes (n=36) enrolled in a large regional health plan in the USA. Main outcome measure: patients' opinions about automated speech-recognition telephone technology. Patients who were recently diagnosed with diabetes and some with diabetes for a decade or more expressed basic informational needs. While most would prefer to speak with a live person rather than a computer-recorded voice, many felt that the automated system could successfully supplement the information they receive from their physicians and could serve as an integral part of their care. Patients suggested that such a system could provide specific dietary advice, information about diabetes and its self-care, a call-in menu of information topics, reminders about laboratory test results and appointments, tracking of personal laboratory results and feedback about their self-monitoring. While some patients expressed negative attitudes toward automated speech recognition telephone systems generally, most felt that a variety of functions of such a system could be beneficial to their diabetes care. In-depth interviews resulted in substantive input from health plan members for the design of an automated telephone outreach system to supplement in-person physician-patient communication in this population.
Evaluation of MPEG-7-Based Audio Descriptors for Animal Voice Recognition over Wireless Acoustic Sensor Networks.

PubMed

Luque, Joaquín; Larios, Diego F; Personal, Enrique; Barbancho, Julio; León, Carlos

2016-05-18

Environmental audio monitoring is a huge area of interest for biologists all over the world. For this reason, several audio monitoring systems have been proposed in the literature, which can be classified into two different approaches: acquisition and compression of all audio patterns in order to send them as raw data to a main server; or specific recognition systems based on audio patterns. The first approach has the drawback of a large amount of information to be stored on a main server. Moreover, this information requires considerable effort to analyze. The second approach has the drawback of limited scalability when new patterns need to be detected. To overcome these limitations, this paper proposes an environmental Wireless Acoustic Sensor Network architecture focused on the use of generic descriptors based on the MPEG-7 standard. These descriptors are shown to be suitable for recognizing different patterns, allowing high scalability. The proposed parameters have been tested on recognizing different behaviors of two anuran species that live in Spanish natural parks, the Epidalea calamita and Alytes obstetricans toads, and show high classification performance.
Evaluation of MPEG-7-Based Audio Descriptors for Animal Voice Recognition over Wireless Acoustic Sensor Networks

PubMed Central

Luque, Joaquín; Larios, Diego F.; Personal, Enrique; Barbancho, Julio; León, Carlos

2016-01-01

Environmental audio monitoring is a huge area of interest for biologists all over the world. For this reason, several audio monitoring systems have been proposed in the literature, which can be classified into two different approaches: acquisition and compression of all audio patterns in order to send them as raw data to a main server; or specific recognition systems based on audio patterns. The first approach has the drawback of a large amount of information to be stored on a main server. Moreover, this information requires considerable effort to analyze. The second approach has the drawback of limited scalability when new patterns need to be detected. To overcome these limitations, this paper proposes an environmental Wireless Acoustic Sensor Network architecture focused on the use of generic descriptors based on the MPEG-7 standard. These descriptors are shown to be suitable for recognizing different patterns, allowing high scalability. The proposed parameters have been tested on recognizing different behaviors of two anuran species that live in Spanish natural parks, the Epidalea calamita and Alytes obstetricans toads, and show high classification performance. PMID:27213375
Comparison of speech perception performance between Sprint/Esprit 3G and Freedom processors in children implanted with nucleus cochlear implants.

PubMed

Santarelli, Rosamaria; Magnavita, Vincenzo; De Filippi, Roberta; Ventura, Laura; Genovese, Elisabetta; Arslan, Edoardo

2009-04-01

To compare speech perception performance in children fitted with previous generation Nucleus sound processor, Sprint or Esprit 3G, and the Freedom, the most recently released system from the Cochlear Corporation that features a larger input dynamic range. Prospective intrasubject comparative study. University Medical Center. Seventeen prelingually deafened children who had received the Nucleus 24 cochlear implant and used the Sprint or Esprit 3G sound processor. Cochlear implantation with Cochlear device. Speech perception was evaluated at baseline (Sprint, n = 11; Esprit 3G, n = 6) and after 1 month's experience with the Freedom sound processor. Identification and recognition of disyllabic words and identification of vowels were performed via recorded voice in quiet (70 dB [A]), in the presence of background noise at various levels of signal-to-noise ratio (+10, +5, 0, -5) and at a soft presentation level (60 dB [A]). Consonant identification and recognition of disyllabic words, trisyllabic words, and sentences were evaluated in live voice. Frequency discrimination was measured in a subset of subjects (n = 5) by using an adaptive, 3-interval, 3-alternative, forced-choice procedure. Identification of disyllabic words administered at a soft presentation level showed a significant increase when switching to the Freedom compared with the previously worn processor in children using the Sprint or Esprit 3G. Identification and recognition of disyllabic words in the presence of background noise as well as consonant identification and sentence recognition increased significantly for the Freedom compared with the previously worn device only in children fitted with the Sprint. Frequency discrimination was significantly better when switching to the Freedom compared with the previously worn processor. 
Serial comparisons revealed that speech perception performance evaluated in children aged 5 to 15 years was superior with the Freedom compared with previous generations of Nucleus sound processors. These differences are deemed to ensue from an increased input dynamic range, a feature that offers potentially enhanced phonemic discrimination.
On the definition and interpretation of voice selective activation in the temporal cortex

PubMed Central

Bethmann, Anja; Brechmann, André

2014-01-01

Regions along the superior temporal sulci and in the anterior temporal lobes have been found to be involved in voice processing. It has even been argued that parts of the temporal cortices serve as voice-selective areas. Yet, evidence for voice-selective activation in the strict sense is still missing. The current fMRI study aimed at assessing the degree of voice-specific processing in different parts of the superior and middle temporal cortices. To this end, voices of famous persons were contrasted with widely different categories, which were sounds of animals and musical instruments. The argumentation was that only brain regions with statistically proven absence of activation by the control stimuli may be considered as candidates for voice-selective areas. Neural activity was found to be stronger in response to human voices in all analyzed parts of the temporal lobes except for the middle and posterior STG. More importantly, the activation differences between voices and the other environmental sounds increased continuously from the mid-posterior STG to the anterior MTG. Here, only voices but not the control stimuli excited an increase of the BOLD response above a resting baseline level. The findings are discussed with reference to the function of the anterior temporal lobes in person recognition and the general question on how to define selectivity of brain regions for a specific class of stimuli or tasks. In addition, our results corroborate recent assumptions about the hierarchical organization of auditory processing building on a processing stream from the primary auditory cortices to anterior portions of the temporal lobes. PMID:25071527
Should visual speech cues (speechreading) be considered when fitting hearing aids?

NASA Astrophysics Data System (ADS)

Grant, Ken

2002-05-01

When talker and listener are face-to-face, visual speech cues become an important part of the communication environment, and yet, these cues are seldom considered when designing hearing aids. Models of auditory-visual speech recognition highlight the importance of complementary versus redundant speech information for predicting auditory-visual recognition performance. Thus, for hearing aids to work optimally when visual speech cues are present, it is important to know whether the cues provided by amplification and the cues provided by speechreading complement each other. In this talk, data will be reviewed that show nonmonotonicity between auditory-alone speech recognition and auditory-visual speech recognition, suggesting that efforts designed solely to improve auditory-alone recognition may not always result in improved auditory-visual recognition. Data will also be presented showing that one of the most important speech cues for enhancing auditory-visual speech recognition performance, voicing, is often the cue that benefits least from amplification.
Wavelet-based associative memory

NASA Astrophysics Data System (ADS)

Jones, Katharine J.

2004-04-01

Faces provide important characteristics for identifying a person. In security checks, face recognition remains the method in continuous use despite other approaches (e.g., fingerprints, voice recognition, pupil contraction, DNA scanners). With an associative memory, the output data is recalled directly using the input data. This can be achieved with a Nonlinear Holographic Associative Memory (NHAM). This approach can also distinguish between strongly correlated images and images that are partially or totally enclosed by others. Adaptive wavelet lifting has been used for Content-Based Image Retrieval. In this paper, adaptive wavelet lifting will be applied to face recognition to achieve an associative memory.
Studies in automatic speech recognition and its application in aerospace

NASA Astrophysics Data System (ADS)

Taylor, Michael Robinson

Human communication is characterized in terms of the spectral and temporal dimensions of speech waveforms. Electronic speech recognition strategies based on Dynamic Time Warping and Markov Model algorithms are described and typical digit recognition error rates are tabulated. The application of Direct Voice Input (DVI) as an interface between man and machine is explored within the context of civil and military aerospace programmes. Sources of physical and emotional stress affecting speech production within military high performance aircraft are identified. Experimental results are reported which quantify fundamental frequency and coarse temporal dimensions of male speech as a function of the vibration, linear acceleration and noise levels typical of aerospace environments; preliminary indications of acoustic phonetic variability reported by other researchers are summarized. Connected whole-word pattern recognition error rates are presented for digits spoken under controlled Gz sinusoidal whole-body vibration. Correlations are made between significant increases in recognition error rate and resonance of the abdomen-thorax and head subsystems of the body. The phenomenon of vibrato style speech produced under low frequency whole-body Gz vibration is also examined. Interactive DVI system architectures and avionic data bus integration concepts are outlined together with design procedures for the efficient development of pilot-vehicle command and control protocols.
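To make the Dynamic Time Warping strategy named in this abstract concrete, here is a minimal sketch. It is not the report's implementation: it assumes one-dimensional feature sequences and an absolute-difference local cost, and the names `dtw_distance`, `recognize`, and the toy templates are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical one-dimensional "feature tracks" for two whole-word templates.
templates = {"one": [1, 3, 4, 3, 1], "two": [5, 5, 2, 1, 1]}
slow_one = [1, 1, 3, 3, 4, 4, 3, 1]  # same shape as "one", spoken more slowly
print(recognize(slow_one, templates))
```

Because the warping path may repeat frames, a slower rendition of the same word can still align with its template at zero cost, which is the property that makes DTW attractive for whole-word matching.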
A posteriori error estimates in voice source recovery

NASA Astrophysics Data System (ADS)

Leonov, A. S.; Sorokin, V. N.

2017-12-01

The inverse problem of voice source pulse recovery from a segment of a speech signal is considered. A special mathematical model relating these quantities is used for the solution. A variational method is proposed for solving the inverse problem of voice source recovery for a new parametric class of sources, namely piecewise-linear sources (PWL-sources). A technique for a posteriori numerical error estimation of the obtained solutions is also presented. A computer study of the adequacy of the adopted speech production model with PWL-sources is performed by solving the inverse problems for various types of voice signals, together with a corresponding study of the a posteriori error estimates. Numerical experiments for speech signals show satisfactory properties of the proposed a posteriori error estimates, which represent upper bounds on the possible errors in solving the inverse problem. The estimate of the most probable error in determining the source-pulse shapes is about 7-8% for the investigated speech material. It is noted that the a posteriori error estimates can be used as a quality criterion for the obtained voice source pulses in application to speaker recognition.
A Joint Time-Frequency and Matrix Decomposition Feature Extraction Methodology for Pathological Voice Classification

NASA Astrophysics Data System (ADS)

Ghoraani, Behnaz; Krishnan, Sridhar

2009-12-01

The number of people affected by speech problems is increasing as the modern world places increasing demands on the human voice via mobile telephones, voice recognition software, and interpersonal verbal communications. In this paper, we propose a novel methodology for automatic pattern classification of pathological voices. The main contribution of this paper is extraction of meaningful and unique features using Adaptive time-frequency distribution (TFD) and nonnegative matrix factorization (NMF). We construct Adaptive TFD as an effective signal analysis domain to dynamically track the nonstationarity in the speech and utilize NMF as a matrix decomposition (MD) technique to quantify the constructed TFD. The proposed method extracts meaningful and unique features from the joint TFD of the speech, and automatically identifies and measures the abnormality of the signal. Depending on the abnormality measure of each signal, we classify the signal into normal or pathological. The proposed method is applied on the Massachusetts Eye and Ear Infirmary (MEEI) voice disorders database which consists of 161 pathological and 51 normal speakers, and an overall classification accuracy of 98.6% was achieved.
Bihippocampal damage with emotional dysfunction: impaired auditory recognition of fear.

PubMed

Ghika-Schmid, F; Ghika, J; Vuilleumier, P; Assal, G; Vuadens, P; Scherer, K; Maeder, P; Uske, A; Bogousslavsky, J

1997-01-01

A right-handed man developed a sudden transient, amnestic syndrome associated with bilateral hemorrhage of the hippocampi, probably due to Urbach-Wiethe disease. In the 3rd month, despite significant hippocampal structural damage on imaging, only a milder degree of retrograde and anterograde amnesia persisted on detailed neuropsychological examination. On systematic testing of recognition of facial and vocal expression of emotion, we found an impairment of the vocal perception of fear, but not that of other emotions, such as joy, sadness and anger. Such selective impairment of fear perception was not present in the recognition of facial expression of emotion. Thus emotional perception varies according to the different aspects of emotions and the different modality of presentation (faces versus voices). This is consistent with the idea that there may be multiple emotion systems. The study of emotional perception in this unique case of bilateral involvement of hippocampus suggests that this structure may play a critical role in the recognition of fear in vocal expression, possibly dissociated from that of other emotions and from that of fear in facial expression. In regard of recent data suggesting that the amygdala is playing a role in the recognition of fear in the auditory as well as in the visual modality this could suggest that the hippocampus may be part of the auditory pathway of fear recognition.
Tongue prints: A novel biometric and potential forensic tool.

PubMed

Radhika, T; Jeddy, Nadeem; Nithya, S

2016-01-01

The tongue is a vital internal organ well encased within the oral cavity and protected from the environment. It has unique features which differ from individual to individual and even between identical twins. The color, shape, and surface features are characteristic of every individual, and this serves as a tool for identification. Many modes of biometric systems have come into existence such as fingerprint, iris scan, skin color, signature verification, voice recognition, and face recognition. The search for a new, secure personal identification method has led to the use of the lingual impression or the tongue print as a method of biometric authentication. Tongue characteristics exhibit sexual dimorphism, thus aiding in the identification of the person. Emerging as a novel biometric tool, tongue prints also hold promise as a potential forensic tool. This review highlights the uniqueness of tongue prints and their superiority over other biometric identification systems. The various methods of tongue print collection and the classification of tongue features are also elucidated.
Assessment of Spectral and Temporal Resolution in Cochlear Implant Users Using Psychoacoustic Discrimination and Speech Cue Categorization.

PubMed

Winn, Matthew B; Won, Jong Ho; Moon, Il Joon

This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). The authors hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. The authors further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Nineteen cochlear implant listeners and 10 listeners with normal hearing participated in a suite of tasks that included spectral ripple discrimination, temporal modulation detection, and syllable categorization, which was split into a spectral cue-based task (targeting the /ba/-/da/ contrast) and a timing cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for cochlear implant listeners. Cochlear implant users were generally less successful at utilizing both spectral and temporal cues for categorization compared with listeners with normal hearing. For the cochlear implant listener group, spectral ripple discrimination was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. 
Temporal modulation detection using 100- and 10-Hz-modulated noise was not correlated either with the cochlear implant subjects' categorization of voice onset time or with word recognition. Word recognition was correlated more closely with categorization of the controlled speech cues than with performance on the psychophysical discrimination tasks. When evaluating people with cochlear implants, controlled speech-based stimuli are feasible to use in tests of auditory cue categorization, to complement traditional measures of auditory discrimination. Stimuli based on specific speech cues correspond to counterpart nonlinguistic measures of discrimination, but potentially show better correspondence with speech perception more generally. The ubiquity of the spectral (formant transition) and temporal (voice onset time) stimulus dimensions across languages highlights the potential to use this testing approach even in cases where English is not the native language.
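The abstract states that categorization responses were quantified with logistic regression to assess sensitivity to acoustic cues. The following is a minimal sketch of that idea, not the authors' analysis: it assumes a seven-step cue continuum coded from -3 to 3, ten trials per step, and invented response counts, and it fits the model by plain gradient ascent. The names `fit_logistic` and `expand` are hypothetical.

```python
import math


def fit_logistic(xs, ys, lr=0.2, iterations=3000):
    """Fit p = 1 / (1 + exp(-(b0 + b1 * x))) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # gradient with respect to the intercept
            g1 += (y - p) * x    # gradient with respect to the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def expand(steps, counts, trials=10):
    """Turn per-step counts of one response category into trial-level data."""
    xs, ys = [], []
    for x, c in zip(steps, counts):
        xs += [x] * trials
        ys += [1] * c + [0] * (trials - c)
    return xs, ys


steps = [-3, -2, -1, 0, 1, 2, 3]                      # assumed cue continuum
sharp = expand(steps, [0, 0, 1, 5, 9, 10, 10])        # invented: crisp category boundary
shallow = expand(steps, [2, 3, 4, 5, 6, 7, 8])        # invented: weak use of the cue
_, slope_sharp = fit_logistic(*sharp)
_, slope_shallow = fit_logistic(*shallow)
print(slope_sharp > slope_shallow)
```

In this kind of analysis the fitted slope serves as the sensitivity measure: a steeper slope means the listener's responses change more sharply as the cue changes.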
Application of Pattern Recognition Techniques to the Classification of Full-Term and Preterm Infant Cry.

PubMed

Orlandi, Silvia; Reyes Garcia, Carlos Alberto; Bandini, Andrea; Donzelli, Gianpaolo; Manfredi, Claudia

2016-11-01

Scientific and clinical advances in perinatology and neonatology have enhanced the chances of survival of preterm and very low weight neonates. Infant cry analysis is a suitable noninvasive complementary tool to assess the neurologic state of infants, particularly important in the case of preterm neonates. This article aims at exploiting differences between full-term and preterm infant cry with robust automatic acoustical analysis and data mining techniques. Twenty-two acoustical parameters are estimated in more than 3000 cry units from cry recordings of 28 full-term and 10 preterm newborns. Feature extraction is performed through the BioVoice dedicated software tool, developed at the Biomedical Engineering Lab, University of Firenze, Italy. Classification and pattern recognition is based on genetic algorithms for the selection of the best attributes. Training is performed comparing four classifiers: Logistic Curve, Multilayer Perceptron, Support Vector Machine, and Random Forest, and three different testing options: full training set, 10-fold cross-validation, and 66% split. Results show that the best feature set is made up of 10 parameters capable of assessing differences between preterm and full-term newborns with about 87% accuracy. Best results are obtained with the Random Forest method (receiver operating characteristic area, 0.94). These 10 cry features might convey important additional information to assist the clinical specialist in the diagnosis and follow-up of possible delays or disorders in the neurologic development due to premature birth in this extremely vulnerable population of patients. The proposed approach is a first step toward an automatic infant cry recognition system for fast and proper identification of risk in preterm babies. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
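Two of the testing options named above, 10-fold cross-validation and a 66% split, can be sketched as index partitions. This is an illustrative sketch only: the abstract does not say how samples were assigned to folds, so the shuffling, the seed, and the function names `k_fold_indices` and `percentage_split` are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def percentage_split(n_samples, train_fraction=0.66, seed=0):
    """Return (train, test) index lists for a single percentage split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]


# Hypothetical dataset size, chosen only to mirror "more than 3000 cry units".
n = 3000
for train, test in k_fold_indices(n):
    assert not set(train) & set(test)  # each fold is held out in full
train, test = percentage_split(n)
print(len(train), len(test))
```

Cross-validation tests every sample exactly once, whereas the percentage split gives a single held-out estimate; evaluating on the full training set, the third option, needs no partition at all.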

Prosody recognition and audiovisual emotion matching in schizophrenia: the contribution of cognition and psychopathology.

PubMed

Castagna, Filomena; Montemagni, Cristiana; Maria Milani, Anna; Rocca, Giuseppe; Rocca, Paola; Casacchia, Massimo; Bogetto, Filippo

2013-02-28

This study aimed to evaluate the ability to decode emotion in the auditory and audiovisual modality in a group of patients with schizophrenia, and to explore the role of cognition and psychopathology in affecting these emotion recognition abilities. Ninety-four outpatients in a stable phase and 51 healthy subjects were recruited. Patients were assessed through a psychiatric evaluation and a wide neuropsychological battery. All subjects completed the comprehensive affect testing system (CATS), a group of computerized tests designed to evaluate emotion perception abilities. With respect to the controls, patients were not impaired in the CATS tasks involving discrimination of nonemotional prosody, naming of emotional stimuli expressed by voice and judging the emotional content of a sentence, whereas they showed a specific impairment in decoding emotion in a conflicting auditory condition and in the multichannel modality. Prosody impairment was affected by executive functions, attention and negative symptoms, while deficit in multisensory emotion recognition was affected by executive functions and negative symptoms. These emotion recognition deficits, rather than being associated purely with emotion perception disturbances in schizophrenia, are affected by core symptoms of the illness. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Nature and extent of person recognition impairments associated with Capgras syndrome in Lewy body dementia

PubMed Central

Fiacconi, Chris M.; Barkley, Victoria; Finger, Elizabeth C.; Carson, Nicole; Duke, Devin; Rosenbaum, R. Shayna; Gilboa, Asaf; Köhler, Stefan

2014-01-01

Patients with Capgras syndrome (CS) adopt the delusional belief that persons well-known to them have been replaced by an imposter. Several current theoretical models of CS attribute such misidentification problems to deficits in covert recognition processes related to the generation of appropriate affective autonomic signals. These models assume intact overt recognition processes for the imposter and, more broadly, for other individuals. As such, it has been suggested that CS could reflect the “mirror-image” of prosopagnosia. The purpose of the current study was to determine whether overt person recognition abilities are indeed always spared in CS. Furthermore, we examined whether CS might be associated with any impairments in overt affective judgments of facial expressions. We pursued these goals by studying a patient with Dementia with Lewy bodies (DLB) who showed clear signs of CS, and by comparing him to another patient with DLB who did not experience CS, as well as to a group of healthy control participants. Clinical magnetic resonance imaging scans revealed medial prefrontal cortex (mPFC) atrophy that appeared to be uniquely associated with the presence of CS. We assessed overt person recognition with three fame recognition tasks, using faces, voices, and names as cues. We also included measures of confidence and probed pertinent semantic knowledge. In addition, participants rated the intensity of fearful facial expressions. We found that CS was associated with overt person recognition deficits when probed with faces and voices, but not with names. Critically, these deficits were not present in the DLB patient without CS. In addition, CS was associated with impairments in overt judgments of affect intensity. Taken together, our findings cast doubt on the traditional view that CS is the mirror-image of prosopagnosia and that it spares overt recognition abilities.
These findings can still be accommodated by models of CS that emphasize deficits in autonomic responding, to the extent that the potential role of interoceptive awareness in overt judgments is taken into account. PMID:25309399
Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction.

PubMed

Nass, C; Lee, K M

2001-09-01

Would people exhibit similarity-attraction and consistency-attraction toward unambiguously computer-generated speech even when personality is clearly not relevant? In Experiment 1, participants (extrovert or introvert) heard a synthesized voice (extrovert or introvert) on a book-buying Web site. Participants accurately recognized personality cues in text to speech and showed similarity-attraction in their evaluation of the computer voice, the book reviews, and the reviewer. Experiment 2, in a Web auction context, added personality of the text to the previous design. The results replicated Experiment 1 and demonstrated consistency (voice and text personality)-attraction. To maximize liking and trust, designers should set parameters, for example, words per minute or frequency range, that create a personality that is consistent with the user and the content being presented.
Distributed cooperating processes in a mobile robot control system

NASA Technical Reports Server (NTRS)

Skillman, Thomas L., Jr.

1988-01-01

A mobile inspection robot has been proposed for the NASA Space Station. It will be a free flying autonomous vehicle that will leave a berthing unit to accomplish a variety of inspection tasks around the Space Station, and then return to its berth to recharge, refuel, and transfer information. The Flying Eye robot will receive voice communication to change its attitude, move at a constant velocity, and move to a predefined location along a self generated path. This mobile robot control system requires integration of traditional command and control techniques with a number of AI technologies. Speech recognition, natural language understanding, task and path planning, sensory abstraction and pattern recognition are all required for successful implementation. The interface between the traditional numeric control techniques and the symbolic processing to the AI technologies must be developed, and a distributed computing approach will be needed to meet the real time computing requirements. To study the integration of the elements of this project, a novel mobile robot control architecture and simulation based on the blackboard architecture was developed. The control system operation and structure is discussed.
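The blackboard architecture mentioned in this abstract can be illustrated with a toy control loop. This is a sketch under stated assumptions, not the system described in the report: audio is stood in for by a plain string, and the knowledge sources `speech_recognizer`, `language_understanding`, and `path_planner`, along with the "move to" command format, are hypothetical.

```python
class Blackboard:
    """Shared data store that knowledge sources read from and write to."""

    def __init__(self):
        self.entries = {}
        self.log = []

    def post(self, key, value, source):
        self.entries[key] = value
        self.log.append((source, key))


def speech_recognizer(bb):
    # Hypothetical stand-in: "recognizes" audio by lowercasing a string.
    if "audio" in bb.entries and "text" not in bb.entries:
        bb.post("text", bb.entries["audio"].lower(), "speech_recognizer")
        return True
    return False


def language_understanding(bb):
    # Hypothetical grammar: only "move to <place>" is understood.
    if "text" in bb.entries and "goal" not in bb.entries:
        words = bb.entries["text"].split()
        if len(words) >= 3 and words[:2] == ["move", "to"]:
            bb.post("goal", words[2], "language_understanding")
            return True
    return False


def path_planner(bb):
    # Hypothetical planner: a straight hop from the berth to the goal.
    if "goal" in bb.entries and "path" not in bb.entries:
        bb.post("path", ["berth", bb.entries["goal"]], "path_planner")
        return True
    return False


def run(bb, sources):
    """Control loop: keep firing sources until none can contribute."""
    progress = True
    while progress:
        progress = any(source(bb) for source in sources)
    return bb


bb = Blackboard()
bb.post("audio", "MOVE TO AIRLOCK", "microphone")
# The order of the sources does not matter; each fires when its input appears.
run(bb, [path_planner, language_understanding, speech_recognizer])
print(bb.entries["path"])
```

The point of the pattern is that symbolic and numeric components never call each other directly; they cooperate only through the shared store, which is one way to integrate heterogeneous processes of the kind the abstract lists.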
Newly learned word forms are abstract and integrated immediately after acquisition

PubMed Central

Kapnoula, Efthymia C.; McMurray, Bob

2015-01-01

A hotly debated question in word learning concerns the conditions under which newly learned words compete or interfere with familiar words during spoken word recognition. This has recently been described as a key marker of the integration of a new word into the lexicon and was thought to require consolidation (Dumay & Gaskell, Psychological Science, 18, 35–39, 2007; Gaskell & Dumay, Cognition, 89, 105–132, 2003). Recently, however, Kapnoula, Packard, Gupta, and McMurray (Cognition, 134, 85–99, 2015) showed that interference can be observed immediately after a word is first learned, implying very rapid integration of new words into the lexicon. It is an open question whether these kinds of effects derive from episodic traces of novel words or from more abstract and lexicalized representations. Here we addressed this question by testing inhibition for newly learned words using training and test stimuli presented in different talker voices. During training, participants were exposed to a set of nonwords spoken by a female speaker. Immediately after training, we assessed the ability of the novel word forms to inhibit familiar words, using a variant of the visual world paradigm. Crucially, the test items were produced by a male speaker. An analysis of fixations showed that even with a change in voice, newly learned words interfered with the recognition of similar known words. These findings show that lexical competition effects from newly learned words spread across different talker voices, which suggests that newly learned words can be sufficiently lexicalized, and abstract with respect to talker voice, without consolidation. PMID:26202702
The Temporal Lobes Differentiate between the Voices of Famous and Unknown People: An Event-Related fMRI Study on Speaker Recognition

PubMed Central

Bethmann, Anja; Scheich, Henning; Brechmann, André

2012-01-01

It is widely accepted that the perception of human voices is supported by neural structures located along the superior temporal sulci. However, there is an ongoing discussion about the extent to which the activations found in fMRI studies are evoked by the vocal features themselves or are the result of phonetic processing. To show that the temporal lobes are indeed engaged in voice processing, short utterances spoken by famous and unknown people were presented to healthy young participants whose task it was to identify the familiar speakers. In two event-related fMRI experiments, the temporal lobes were found to differentiate between familiar and unfamiliar voices such that named voices elicited higher BOLD signal intensities than unfamiliar voices. Yet, the temporal cortices did not only discriminate between familiar and unfamiliar voices. Experiment 2, which required overtly spoken responses and allowed us to distinguish between four familiarity grades, revealed that there was a fine-grained differentiation between all of these familiarity levels, with higher familiarity being associated with larger BOLD signal amplitudes. Finally, we observed a gradual response change such that the BOLD signal differences between unfamiliar and highly familiar voices increased with the distance of an area from the transverse temporal gyri, especially towards the anterior temporal cortex and the middle temporal gyri. Therefore, the results suggest that (the anterior and non-superior portions of) the temporal lobes participate in voice-specific processing independent of phonetic components also involved in spoken speech material. PMID:23112826
Enhancing and Customizing Laboratory Information Systems to Improve/Enhance Pathologist Workflow.

PubMed

Hartman, Douglas J

2015-06-01

Optimizing pathologist workflow can be difficult because it is affected by many variables. Surgical pathologists must complete many tasks that culminate in a final pathology report. Several software systems can be used to enhance/improve pathologist workflow. These include voice recognition software, pre-sign-out quality assurance, image utilization, and computerized provider order entry. Recent changes in the diagnostic coding and the more prominent role of centralized electronic health records represent potential areas for increased ways to enhance/improve the workflow for surgical pathologists. Additional unforeseen changes to the pathologist workflow may accompany the introduction of whole-slide imaging technology to the routine diagnostic work. Copyright © 2015 Elsevier Inc. All rights reserved.
Enhancing and Customizing Laboratory Information Systems to Improve/Enhance Pathologist Workflow.

PubMed

Hartman, Douglas J

2016-03-01

Optimizing pathologist workflow can be difficult because it is affected by many variables. Surgical pathologists must complete many tasks that culminate in a final pathology report. Several software systems can be used to enhance/improve pathologist workflow. These include voice recognition software, pre-sign-out quality assurance, image utilization, and computerized provider order entry. Recent changes in the diagnostic coding and the more prominent role of centralized electronic health records represent potential areas for increased ways to enhance/improve the workflow for surgical pathologists. Additional unforeseen changes to the pathologist workflow may accompany the introduction of whole-slide imaging technology to the routine diagnostic work. Copyright © 2016 Elsevier Inc. All rights reserved.
Graphic overlays in high-precision teleoperation: Current and future work at JPL

NASA Technical Reports Server (NTRS)

Diner, Daniel B.; Venema, Steven C.

1989-01-01

In space teleoperation additional problems arise, including signal transmission time delays. These can greatly reduce operator performance. Recent advances in graphics open new possibilities for addressing these and other problems. Currently a multi-camera system with normal 3-D TV and video graphics capabilities is being developed. Trained and untrained operators will be tested for high precision performance using two force reflecting hand controllers and a voice recognition system to control two robot arms and up to 5 movable stereo or non-stereo TV cameras. A number of new techniques of integrating TV and video graphics displays to improve operator training and performance in teleoperation and supervised automation are evaluated.
Classifying Human Voices by Using Hybrid SFX Time-Series Preprocessing and Ensemble Feature Selection

PubMed Central

Wong, Raymond

2013-01-01

Voice is a physiological biometric characteristic that differs from person to person. Due to this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), emotion states, identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features in training a classification model, based on piecewise transformation treating an audio waveform as a time-series. Using SFX we can faithfully remodel statistical characteristics of the time-series; together with spectral analysis, a substantial number of features are extracted in combination. An ensemble is utilized in selecting only the influential features to be used in classification model induction. We focus on the comparison of effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely, Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the claim that heuristically choosing significant features from both time and frequency domains indeed produces better performance in voice classification than traditional signal processing techniques alone, like wavelets and LPC-to-CC. PMID:24288684
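The abstract describes SFX only as a piecewise transformation that treats the waveform as a time-series and captures its statistical characteristics. The sketch below shows one plausible reading and is not the paper's method: the choice of equal-length segments and of mean, standard deviation, minimum, and maximum as the per-segment statistics is an assumption, as is the function name `piecewise_features`.

```python
import statistics


def piecewise_features(waveform, n_segments=4):
    """Summarize each equal-length piece of a waveform with simple statistics.

    Assumes len(waveform) >= n_segments; the last piece absorbs any remainder.
    """
    size = len(waveform) // n_segments
    features = []
    for k in range(n_segments):
        start = k * size
        end = (k + 1) * size if k < n_segments - 1 else len(waveform)
        segment = waveform[start:end]
        features.extend([
            statistics.fmean(segment),   # average level of the piece
            statistics.pstdev(segment),  # spread within the piece
            min(segment),
            max(segment),
        ])
    return features


# Hypothetical toy waveform; real input would be audio samples.
samples = [0, 0, 1, 1, 2, 2, 3, 3]
print(piecewise_features(samples))
```

A fixed-length vector of this kind can be concatenated with spectral features and passed to a feature selector, which matches the overall pipeline the abstract outlines.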
Communication Robots for Elderly People and Their Families to Support Their Daily Lives - Case Study of Two Families Living with the Communication Robot.

PubMed

Inoue, Kaoru; Sasaki, Chihiro; Nakamura, Mio

2015-01-01

The aim of this project is to analyze how two families (one living with a senior with physical disabilities and the other living with seniors) feel about using the human-type communication robot "Palro" and what improvements they request after 3 weeks of use. All of them liked Palro and its programs but wanted some new programs. They pointed out that Palro sometimes had problems with its facial or voice recognition systems. Palro is useful in the areas of self-care and social isolation.
Applications of Artificial Intelligence in Voice Recognition Systems in Micro-Computers.

DTIC Science & Technology

1982-03-01

DELTA" THEN 1290 1050 IF ANS$(I) = "MAIN MENU" THEN 320 1060 IF ANS$(I) = "ABORT" THEN 3150 1070 IF ANS$(I) = "GO BACK" THEN 3590 1080 NEXT I 1090... ABORT" THEN 3150 1660 NEXT I 1670 GOTO 3350 : REM ERROR PACK 1680 STOP 1690 REM SHIPS MENU 1700 REM------------ 1710 HOME : VTAB 5 : HTAB 15 : PRINT...IF ANS$(I) = "PROFILES" THEN 3100 2470 IF ANS$(I) = "MAIN MENU" THEN 320 2480 IF ANS$(I) = "GO BACK" THEN 3590 2490 IF ANS$(I) = "ABORT" THEN 3150 2500
Interactive Voice Technology: Variations in the Vocal Utterances of Speakers Performing a Stress-Inducing Task,

DTIC Science & Technology

1983-08-16

[Spectrogram panels: BOGEY 5D, BOGEY 12D; frequency axis in kHz] Figure 10. Spectrograms of two versions of the word...MF5852801B 0001 Reviewed by Ashton Graybiel, M.D., Chief Scientific Advisor. Approved and Released by Captain W. M. Houk, MC, USN, Commanding Officer. 16 August...incorporating knowledge about these changes into speech recognition systems.
An investigation of potential applications of OP-SAPS: Operational sampled analog processors

NASA Technical Reports Server (NTRS)

Parrish, E. A.; Mcvey, E. S.

1976-01-01

The impact of charge-coupled device (CCD) processors on future instrumentation was investigated. The CCD devices studied process sampled analog data and are referred to as OP-SAPS - operational sampled analog processors. Preliminary studies into various architectural configurations for systems composed of OP-SAPS show that they have potential in such diverse applications as pattern recognition and automatic control. It appears probable that OP-SAPS may be used to construct computing structures which can serve as special peripherals to large-scale computer complexes used in real time flight simulation. The research was limited to the following benchmark programs: (1) face recognition, (2) voice command and control, (3) terrain classification, and (4) terrain identification. A small amount of effort was spent on examining a method by which OP-SAPS may be used to decrease the limiting ground sampling distance encountered in remote sensing from satellites.
Human and animal sounds influence recognition of body language.

PubMed

Van den Stock, Jan; Grèzes, Julie; de Gelder, Beatrice

2008-11-25

In naturalistic settings emotional events have multiple correlates and are simultaneously perceived by several sensory systems. Recent studies have shown that recognition of facial expressions is biased towards the emotion expressed by a simultaneously presented emotional expression in the voice even if attention is directed to the face only. So far, no study examined whether this phenomenon also applies to whole body expressions, although there is no obvious reason why this crossmodal influence would be specific for faces. Here we investigated whether perception of emotions expressed in whole body movements is influenced by affective information provided by human and by animal vocalizations. Participants were instructed to attend to the action displayed by the body and to categorize the expressed emotion. The results indicate that recognition of body language is biased towards the emotion expressed by the simultaneously presented auditory information, whether it consists of human or of animal sounds. Our results show that a crossmodal influence from auditory to visual emotional information obtains for whole body video images with the facial expression blanked and includes human as well as animal sounds.
Spoken Language Processing in the Clarissa Procedure Browser

NASA Technical Reports Server (NTRS)

Rayner, M.; Hockey, B. A.; Renders, J.-M.; Chatzichrisafis, N.; Farrell, K.

2005-01-01

Clarissa, an experimental voice enabled procedure browser that has recently been deployed on the International Space Station, is as far as we know the first spoken dialog system in space. We describe the objectives of the Clarissa project and the system's architecture. In particular, we focus on three key problems: grammar-based speech recognition using the Regulus toolkit; methods for open mic speech recognition; and robust side-effect free dialogue management for handling undos, corrections and confirmations. We first describe the grammar-based recogniser we have built using Regulus, and report experiments where we compare it against a class N-gram recogniser trained off the same 3297 utterance dataset. We obtained a 15% relative improvement in WER and a 37% improvement in semantic error rate. The grammar-based recogniser moreover outperforms the class N-gram version for utterances of all lengths from 1 to 9 words inclusive. The central problem in building an open-mic speech recognition system is being able to distinguish between commands directed at the system, and other material (cross-talk), which should be rejected. Most spoken dialogue systems make the accept/reject decision by applying a threshold to the recognition confidence score. We show how a simple and general method, based on standard approaches to document classification using Support Vector Machines, can give substantially better performance, and report experiments showing a relative reduction in the task-level error rate by about 25% compared to the baseline confidence threshold method. Finally, we describe a general side-effect free dialogue management architecture that we have implemented in Clarissa, which extends the "update semantics" framework by including task as well as dialogue information in the information state. We show that this enables elegant treatments of several dialogue management problems, including corrections, confirmations, querying of the environment, and regression testing.
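For readers unfamiliar with the metrics quoted in this abstract, the following sketch shows how a word error rate (WER) and a relative improvement are conventionally computed. It uses the standard word-level edit distance; the function names, example utterances and the error rates 0.20 and 0.17 are hypothetical and are not taken from the Clarissa experiments.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed part of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline, improved):
    """Relative improvement of an error rate over a baseline error rate."""
    return (baseline - improved) / baseline


# Hypothetical example: one of three reference words is dropped.
example_wer = word_error_rate("open the valve", "open valve")

# Hypothetical error rates: going from 20% to 17% is a 15% relative reduction.
example_gain = relative_reduction(0.20, 0.17)
```

A relative figure such as "15% relative improvement in WER" therefore describes the fraction of the baseline error that was removed, not an absolute difference in percentage points.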
Microcomputers: Independence and Information Access for the Physically Handicapped.

ERIC Educational Resources Information Center

Regen, Shari S.; Chen, Ching-chih

1984-01-01

Provides overview of recent technological developments in microcomputer technology for the physically disabled, including discussion of view expansion, "talking terminals," voice recognition, and price and convenience of micro-based products. Equipment manufacturers and training centers for the physically disabled are listed and microcomputer…
Measures of voiced frication for automatic classification

NASA Astrophysics Data System (ADS)

Jackson, Philip J. B.; Jesus, Luis M. T.; Shadle, Christine H.; Pincas, Jonathan

2004-05-01

As an approach to understanding the characteristics of the acoustic sources in voiced fricatives, it seems apt to draw on knowledge of vowels and voiceless fricatives, which have been relatively well studied. However, the presence of both phonation and frication in these mixed-source sounds offers the possibility of mutual interaction effects, with variations across place of articulation. This paper examines the acoustic and articulatory consequences of these interactions and explores automatic techniques for finding parametric and statistical descriptions of these phenomena. A reliable and consistent set of such acoustic cues could be used for phonetic classification or speech recognition. Following work on devoicing of European Portuguese voiced fricatives [Jesus and Shadle, in Mamede et al. (eds.) (Springer-Verlag, Berlin, 2003), pp. 1-8] and the modulating effect of voicing on frication [Jackson and Shadle, J. Acoust. Soc. Am. 108, 1421-1434 (2000)], the present study focuses on three types of information: (i) sequences and durations of acoustic events in VC transitions, (ii) temporal, spectral and modulation measures from the periodic and aperiodic components of the acoustic signal, and (iii) voicing activity derived from simultaneous EGG data. Analysis of interactions observed in British/American English and European Portuguese speech corpora will be compared, and the principal findings discussed.
Do Listeners Store in Memory a Speaker's Habitual Utterance-Final Phonation Type?

PubMed Central

Bőhm, Tamás; Shattuck-Hufnagel, Stefanie

2009-01-01

Earlier studies report systematic differences across speakers in the occurrence of utterance-final irregular phonation; the work reported here investigated whether human listeners remember this speaker-specific information and can access it when necessary (a prerequisite for using this cue in speaker recognition). Listeners personally familiar with the voices of the speakers were presented with pairs of speech samples: one with the original and the other with transformed final phonation type. Asked to select the member of the pair that was closer to the talker's voice, most listeners tended to choose the unmanipulated token (even though they judged them to sound essentially equally natural). This suggests that utterance-final pitch period irregularity is part of the mental representation of individual speaker voices, although this may depend on the individual speaker and listener to some extent. PMID:19776665
The effect of voice onset time differences on lexical access in Dutch.

PubMed

van Alphen, Petra M; McQueen, James M

2006-02-01

Effects on spoken-word recognition of prevoicing differences in Dutch initial voiced plosives were examined. In 2 cross-modal identity-priming experiments, participants heard prime words and nonwords beginning with voiced plosives with 12, 6, or 0 periods of prevoicing or matched items beginning with voiceless plosives and made lexical decisions to visual tokens of those items. Six-period primes had the same effect as 12-period primes. Zero-period primes had a different effect, but only when their voiceless counterparts were real words. Listeners could nevertheless discriminate the 6-period primes from the 12- and 0-period primes. Phonetic detail appears to influence lexical access only to the extent that it is useful: In Dutch, presence versus absence of prevoicing is more informative than amount of prevoicing. ((c) 2006 APA, all rights reserved).

Development of miniaturized light endoscope-holder robot for laparoscopic surgery.

PubMed

Long, Jean-Alexandre; Cinquin, Philippe; Troccaz, Jocelyne; Voros, Sandrine; Berkelman, Peter; Descotes, Jean-Luc; Letoublon, Christian; Rambeaud, Jean-Jacques

2007-08-01

We have conducted experiments with an innovatively designed robot endoscope holder for laparoscopic surgery that is small and low cost. A compact light endoscope robot (LER) that is placed on the patient's skin and can be used with the patient in the lateral or dorsal supine position was tested on cadavers and laboratory pigs in order to allow successive modifications. The current control system is based on voice recognition. The range of vision is 360 degrees with an angle of 160 degrees. Twenty-three procedures were performed. The tests made it possible to advance the prototype on a variety of aspects, including reliability, steadiness, ergonomics, and dimensions. The ease of installation of the robot, which takes only 5 minutes, and the easy handling made it possible for 21 of the 23 procedures to be performed without an assistant. The LER is a camera holder guided by the surgeon's voice that can eliminate the need for an assistant during laparoscopic surgery. The ease of installation and manufacture should make it an effective and inexpensive system for use on patients in the lateral and dorsal supine positions. Randomized clinical trials will soon validate a new version of this robot prior to marketing.
High Tech and Library Access for People with Disabilities.

ERIC Educational Resources Information Center

Roatch, Mary A.

1992-01-01

Describes tools that enable people with disabilities to access print information, including optical character recognition, synthetic voice output, other input devices, Braille access devices, large print displays, television and video, TDD (Telecommunications Devices for the Deaf), and Telebraille. Use of technology by libraries to meet mandates…
ERP evidence of preserved early memory function in term infants with neonatal encephalopathy following therapeutic hypothermia.

PubMed

Pfister, Katie M; Zhang, Lei; Miller, Neely C; Hultgren, Solveig; Boys, Chris J; Georgieff, Michael K

2016-12-01

Neonatal encephalopathy (NE) carries high risk for neurodevelopmental impairments. Therapeutic hypothermia (TH) reduces this risk, particularly for moderate encephalopathy (ME). Nevertheless, these infants often have subtle functional deficits, including abnormal memory function. Detection of deficits at the earliest possible time-point would allow for intervention during a period of maximal brain plasticity. Recognition memory function in 22 infants with NE treated with TH was compared to 23 healthy controls using event-related potentials (ERPs) at 2 wk of age. ERPs were recorded to mother's voice alternating with a stranger's voice to assess attentional responses (P2), novelty detection (slow wave), and discrimination between familiar and novel (difference wave). Development was tested at 12 mo using the Bayley Scales of Infant Development, Third Edition (BSID-III). The NE group showed similar ERP components and BSID-III scores to controls. However, infants with NE showed discrimination at midline leads (P = 0.01), whereas controls showed discrimination in the left hemisphere (P = 0.05). Normal MRI (P = 0.05) and seizure-free electroencephalogram (EEG) (P = 0.04) correlated positively with outcomes. Infants with NE have preserved recognition memory function after TH. The spatially different recognition memory processing after early brain injury may represent compensatory changes in the brain circuitry and reflect a benefit of TH.
Phase effects in masking by harmonic complexes: speech recognition.

PubMed

Deroche, Mickael L D; Culling, John F; Chatterjee, Monita

2013-12-01

Harmonic complexes that generate highly modulated temporal envelopes on the basilar membrane (BM) mask a tone less effectively than complexes that generate relatively flat temporal envelopes, because the non-linear active gain of the BM selectively amplifies a low-level tone in the dips of a modulated masker envelope. The present study examines a similar effect in speech recognition. Speech reception thresholds (SRTs) were measured for a voice masked by harmonic complexes with partials in sine phase (SP) or in random phase (RP). The masker's fundamental frequency (F0) was 50, 100 or 200 Hz. SRTs were considerably lower for SP than for RP maskers at 50-Hz F0, but the two converged at 100-Hz F0, while at 200-Hz F0, SRTs were a little higher for SP than RP maskers. The results were similar whether the target voice was male or female and whether the masker's spectral profile was flat or speech-shaped. Although listening in the masker dips has been shown to play a large role for artificial stimuli such as Schroeder-phase complexes at high levels, it contributes weakly to speech recognition in the presence of harmonic maskers with different crest factors at more moderate sound levels (65 dB SPL). Copyright © 2013 Elsevier B.V. All rights reserved.
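The role of component phase mentioned in this abstract can be illustrated numerically. The sketch below builds two harmonic complexes with the same partials, one in sine phase (SP) and one in random phase (RP), and compares their crest factors (peak divided by RMS). The number of partials, sample rate, duration and random seed are assumptions chosen for illustration, not the stimulus parameters of the study.

```python
import math
import random


def harmonic_complex(f0, n_partials, phases, sample_rate=16000, duration=0.1):
    """Sum of equal-amplitude harmonics of f0 with the given starting phases."""
    n_samples = int(sample_rate * duration)
    return [
        sum(
            math.sin(2 * math.pi * f0 * (k + 1) * t / sample_rate + phases[k])
            for k in range(n_partials)
        )
        for t in range(n_samples)
    ]


def crest_factor(signal):
    """Peak absolute amplitude divided by root-mean-square amplitude."""
    peak = max(abs(x) for x in signal)
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return peak / rms


N_PARTIALS = 40            # assumed: harmonics of 50 Hz up to 2 kHz
rng = random.Random(0)     # fixed seed so the random-phase example is repeatable

sine_phases = [0.0] * N_PARTIALS
random_phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(N_PARTIALS)]

sp = harmonic_complex(50, N_PARTIALS, sine_phases)
rp = harmonic_complex(50, N_PARTIALS, random_phases)

print("crest factor, sine phase:  ", round(crest_factor(sp), 2))
print("crest factor, random phase:", round(crest_factor(rp), 2))
```

Both signals have the same long-term power because they contain the same partials; only the phase relationship differs, which is what makes the sine-phase waveform peakier and leaves dips between its peaks.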
Automated speech understanding: the next generation

NASA Astrophysics Data System (ADS)

Picone, J.; Ebel, W. J.; Deshmukh, N.

1995-04-01

Modern speech understanding systems merge interdisciplinary technologies from Signal Processing, Pattern Recognition, Natural Language, and Linguistics into a unified statistical framework. These systems, which have applications in a wide range of signal processing problems, represent a revolution in Digital Signal Processing (DSP). Once a field dominated by vector-oriented processors and linear algebra-based mathematics, the current generation of DSP-based systems relies on sophisticated statistical models implemented using a complex software paradigm. Such systems are now capable of understanding continuous speech input for vocabularies of several thousand words in operational environments. The current generation of deployed systems, based on small vocabularies of isolated words, will soon be replaced by a new technology offering natural language access to vast information resources such as the Internet, and providing completely automated voice interfaces for mundane tasks such as travel planning and directory assistance.
A DFT-Based Method of Feature Extraction for Palmprint Recognition

NASA Astrophysics Data System (ADS)

Choge, H. Kipsang; Karungaru, Stephen G.; Tsuge, Satoru; Fukumi, Minoru

Over the last quarter century, research in biometric systems has developed at a breathtaking pace and what started with the focus on the fingerprint has now expanded to include face, voice, iris, and behavioral characteristics such as gait. Palmprint is one of the most recent additions, and is currently the subject of great research interest due to its inherent uniqueness, stability, user-friendliness and ease of acquisition. This paper describes an effective and procedurally simple method of palmprint feature extraction specifically for palmprint recognition, although verification experiments are also conducted. This method takes advantage of the correspondences that exist between prominent palmprint features or objects in the spatial domain with those in the frequency or Fourier domain. Multi-dimensional feature vectors are formed by extracting a GA-optimized set of points from the 2-D Fourier spectrum of the palmprint images. The feature vectors are then used for palmprint recognition, before and after dimensionality reduction via the Karhunen-Loeve Transform (KLT). Experiments performed using palmprint images from the ‘PolyU Palmprint Database’ indicate that using a compact set of DFT coefficients, combined with KLT and data preprocessing, produces a recognition accuracy of more than 98% and can provide a fast and effective technique for personal identification.
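The general idea of taking feature vectors from selected points of a 2-D Fourier spectrum can be sketched as follows. This is not the authors' pipeline: the paper selects points with a genetic algorithm and applies the KLT, whereas the sketch uses a fixed, hand-picked list of points and omits dimensionality reduction. The tiny striped "image" and the names `dft2` and `dft_features` are hypothetical.

```python
import cmath


def dft2(image):
    """Direct 2-D discrete Fourier transform of a small grayscale image."""
    rows, cols = len(image), len(image[0])
    spectrum = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for x in range(rows):
                for y in range(cols):
                    angle = -2 * cmath.pi * (u * x / rows + v * y / cols)
                    total += image[x][y] * cmath.exp(1j * angle)
            spectrum[u][v] = total
    return spectrum


def dft_features(image, points):
    """Feature vector: spectrum magnitudes at the chosen (u, v) points."""
    spectrum = dft2(image)
    return [abs(spectrum[u][v]) for u, v in points]


# Toy 4x4 image with horizontal stripes, standing in for palm lines.
toy_image = [
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

# Assumed, fixed selection of spectrum points (the paper optimizes this set).
selected_points = [(0, 0), (2, 0), (0, 1)]
toy_features = dft_features(toy_image, selected_points)
```

The stripes repeat every two rows, so the energy concentrates at the zero-frequency point and at the vertical frequency (2, 0), while the horizontal frequency (0, 1) stays near zero. This is the kind of correspondence between spatial structure and spectrum location that the abstract refers to. The direct transform is quadratic in the number of pixels and is only suitable for toy inputs; a real implementation would use an FFT.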
Automated speech recognition for time recording in out-of-hospital emergency medicine-an experimental approach.

PubMed

Gröschel, J; Philipp, F; Skonetzki, St; Genzwürker, H; Wetter, Th; Ellinger, K

2004-02-01

Precise documentation of medical treatment in emergency medical missions and for resuscitation is essential from a medical, legal and quality assurance point of view [Anästhesiologie und Intensivmedizin, 41 (2000) 737]. All conventional methods of time recording are either too inaccurate or too elaborate for routine application. Automated speech recognition may offer a solution. A special erase programme for the documentation of all time events was developed. Standard speech recognition software (IBM ViaVoice 7.0) was adapted and installed on two different computer systems. One was a stationary PC (500MHz Pentium III, 128MB RAM, Soundblaster PCI 128 Soundcard, Win NT 4.0), the other was a mobile pen-PC that had already proven its value during emergency missions [Der Notarzt 16, p. 177] (Fujitsu Stylistic 2300, 230MHz MMX Processor, 160MB RAM, embedded soundcard ESS 1879 chipset, Win98 2nd ed.). On both computers two different microphones were tested. One was a standard headset that came with the recognition software, the other was a small microphone (lavalier condenser microphone EM 116 from Vivanco) that could be attached to the operator's collar. Seven women and 15 men spoke a text with 29 phrases to be recognised. Two emergency physicians tested the system in a simulated emergency setting using the collar microphone and the pen-PC with an analogue wireless connection. Overall recognition was best for the PC with a headset (89%) followed by the pen-PC with a headset (85%), the PC with a microphone (84%) and the pen-PC with a microphone (80%). Nevertheless, the difference was not statistically significant. Recognition became significantly worse (89.5% versus 82.3%, P<0.0001) when numbers had to be recognised. The gender of the speaker and the number of words in a sentence had no influence. Average recognition in the simulated emergency setting was 75%. At no time did false recognition appear. 
Time recording with automated speech recognition seems to be possible in emergency medical missions. Although results show an average recognition of only 75%, it is possible that missing elements may be reconstructed more precisely. Future technology should integrate a secure wireless connection between microphone and mobile computer. The system could then prove its value for real out-of-hospital emergencies.
Voice-enabled Knowledge Engine using Flood Ontology and Natural Language Processing

NASA Astrophysics Data System (ADS)

Sermet, M. Y.; Demir, I.; Krajewski, W. F.

2015-12-01

The Iowa Flood Information System (IFIS) is a web-based platform developed by the Iowa Flood Center (IFC) to provide access to flood inundation maps, real-time flood conditions, flood forecasts, flood-related data, information and interactive visualizations for communities in Iowa. The IFIS is designed for use by the general public, often people with no domain knowledge and limited general science background. To improve effective communication with such an audience, we have introduced a voice-enabled knowledge engine on flood related issues in IFIS. Instead of requiring users to navigate the many features and interfaces of the information system and web-based sources, the system provides dynamic computations based on a collection of built-in data, analysis, and methods. The IFIS Knowledge Engine connects to real-time stream gauges, in-house data sources, analysis and visualization tools to answer natural language questions. Our goal is the systematization of data and modeling results on flood related issues in Iowa, and to provide an interface for definitive answers to factual queries. The goal of the knowledge engine is to make all flood related knowledge in Iowa easily accessible to everyone, and to support voice-enabled natural language input. We aim to integrate and curate all flood related data, implement analytical and visualization tools, and make it possible to compute answers from questions. The IFIS explicitly implements analytical methods and models as algorithms, and curates all flood related data and resources so that all these resources are computable. The IFIS Knowledge Engine computes the answer by deriving it from its computational knowledge base. The knowledge engine processes the statement, accesses the data warehouse, runs complex database queries on the server side and returns outputs in various formats. 
This presentation provides an overview of the IFIS Knowledge Engine, its unique information interface and functionality as an educational tool, and discusses the future plans for providing knowledge on flood related issues and resources. The IFIS Knowledge Engine provides an alternative access method to the comprehensive set of tools and data resources available in IFIS. The current implementation of the system accepts free-form input and offers voice recognition capabilities within browser and mobile applications.
A Qualitative Examination of Situational Risk Recognition Among Female Victims of Physical Intimate Partner Violence.

PubMed

Sherrill, Andrew M; Bell, Kathryn M; Wyngarden, Nicole

2016-07-01

Little is known about intimate partner violence (IPV) victims' situational risk recognition, defined as the ability to identify situational factors that signal imminent risk of victimization. Using semi-structured interviews, qualitative data were collected from a community sample of 31 female victims of IPV episodes involving substance use. Thirteen themes were identified, the most prevalent being related to the partner's verbal behavior, tone of voice, motor behavior, alcohol or drug use, and facial expression. Participants reporting at least some anticipation of physical aggression (61.3% of the sample) tended to identify multiple factors (M = 3.47), suggesting numerous situational features often contribute to situational risk recognition. © The Author(s) 2015.
Effect of Dopamine Therapy on Nonverbal Affect Burst Recognition in Parkinson's Disease

PubMed Central

Péron, Julie; Grandjean, Didier; Drapier, Sophie; Vérin, Marc

2014-01-01

Background Parkinson's disease (PD) provides a model for investigating the involvement of the basal ganglia and mesolimbic dopaminergic system in the recognition of emotions from voices (i.e., emotional prosody). Although previous studies of emotional prosody recognition in PD have reported evidence of impairment, none of them compared PD patients at different stages of the disease, or ON and OFF dopamine replacement therapy, making it difficult to determine whether their impairment was due to general cognitive deterioration or to a more specific dopaminergic deficit. Methods We explored the involvement of the dopaminergic pathways in the recognition of nonverbal affect bursts (onomatopoeias) in 15 newly diagnosed PD patients in the early stages of the disease, 15 PD patients in the advanced stages of the disease and 15 healthy controls. The early PD group was studied in two conditions: ON and OFF dopaminergic therapy. Results Results showed that the early PD patients performed more poorly in the ON condition than in the OFF one, for overall emotion recognition, as well as for the recognition of anger, disgust and fear. Additionally, for anger, the early PD ON patients performed more poorly than controls. For overall emotion recognition, both advanced PD patients and early PD ON patients performed more poorly than controls. Analysis of continuous ratings on target and nontarget visual analog scales confirmed these patterns of results, showing a systematic emotional bias in both the advanced PD and early PD ON (but not OFF) patients compared with controls. Conclusions These results i) confirm the involvement of the dopaminergic pathways and basal ganglia in emotional prosody recognition, and ii) suggest a possibly deleterious effect of dopatherapy on affective abilities in the early stages of PD. PMID:24651759
47 CFR 25.259 - Time sharing between NOAA meteorological satellite systems and non-voice, non-geostationary...

Code of Federal Regulations, 2014 CFR

2014-10-01

... satellite systems and non-voice, non-geostationary satellite systems in the 137-138 MHz band. 25.259 Section... systems and non-voice, non-geostationary satellite systems in the 137-138 MHz band. (a) The space stations of a non-voice, non-geostationary Mobile-Satellite Service (NVNG MSS) system time-sharing downlink...
47 CFR 25.259 - Time sharing between NOAA meteorological satellite systems and non-voice, non-geostationary...

Code of Federal Regulations, 2013 CFR

2013-10-01

... satellite systems and non-voice, non-geostationary satellite systems in the 137-138 MHz band. 25.259 Section... systems and non-voice, non-geostationary satellite systems in the 137-138 MHz band. (a) The space stations of a non-voice, non-geostationary Mobile-Satellite Service (NVNG MSS) system time-sharing downlink...
75 FR 30845 - Request Voucher for Grant Payment and Line of Credit Control System (LOCCS) Voice Response System...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-06-02

... request vouchers for distribution of grant funds using the automated Voice Response System (VRS). An... Payment and Line of Credit Control System (LOCCS) Voice Response System Access Authorization AGENCY... subject proposal. Payment request vouchers for distribution of grant funds using the automated Voice...
Radiological reporting that combines continuous speech recognition with error correction by transcriptionists.

PubMed

Ichikawa, Tamaki; Kitanosono, Takashi; Koizumi, Jun; Ogushi, Yoichi; Tanaka, Osamu; Endo, Jun; Hashimoto, Takeshi; Kawada, Shuichi; Saito, Midori; Kobayashi, Makiko; Imai, Yutaka

2007-12-20

We evaluated the usefulness of radiological reporting that combines continuous speech recognition (CSR) and error correction by transcriptionists. Four transcriptionists (two with more than 10 years' and two with less than 3 months' transcription experience) listened to the same 100 dictation files and created radiological reports using conventional transcription and a method that combined CSR with manual error correction by the transcriptionists. We compared the 2 groups using the 2 methods for accuracy and report creation time and evaluated the transcriptionists' inter-personal dependence on accuracy rate and report creation time. We used a CSR system that did not require the training of the system to recognize the user's voice. We observed no significant difference in accuracy between the 2 groups and 2 methods that we tested, though transcriptionists with greater experience transcribed faster than those with less experience using conventional transcription. Using the combined method, error correction speed was not significantly different between two groups of transcriptionists with different levels of experience. Combining CSR and manual error correction by transcriptionists enabled convenient and accurate radiological reporting.
Children's Recognition of Emotions from Vocal Cues

ERIC Educational Resources Information Center

Sauter, Disa A.; Panattoni, Charlotte; Happe, Francesca

2013-01-01

Emotional cues contain important information about the intentions and feelings of others. Despite a wealth of research into children's understanding of facial signals of emotions, little research has investigated the developmental trajectory of interpreting affective cues in the voice. In this study, 48 children ranging between 5 and 10 years were…
Must One Be "In Recovery" To Help?

ERIC Educational Resources Information Center

Trimpey, Jack

Rational Recovery (RR) and the Addictive Voice Recognition Technique (AVRT) are described. Rational recovery is a young organization which views alcohol and drug dependency differently from the traditional field which sees addiction as a symptom of something, of a disease, of spiritual bankruptcy, of irrational thinking, of unhappiness, of…
Morphosyntactic Neural Analysis for Generalized Lexical Normalization

ERIC Educational Resources Information Center

Leeman-Munk, Samuel Paul

2016-01-01

The phenomenal growth of social media, web forums, and online reviews has spurred a growing interest in automated analysis of user-generated text. At the same time, a proliferation of voice recordings and efforts to archive culture heritage documents are fueling demand for effective automatic speech recognition (ASR) and optical character…
Web Surveys to Digital Movies: Technological Tools of the Trade.

ERIC Educational Resources Information Center

Fetterman, David M.

2002-01-01

Highlights some of the technological tools used by educational researchers today, focusing on data collection related tools such as Web surveys, digital photography, voice recognition and transcription, file sharing and virtual office, videoconferencing on the Internet, instantaneous chat and chat rooms, reporting and dissemination, and digital…
Experimental evidence of vocal recognition in young and adult black-legged kittiwakes

USGS Publications Warehouse

Mulard, Hervé; Aubin, T.; White, J.F.; Hatch, Shyla A.; Danchin, E.

2008-01-01

Individual recognition is required in most social interactions, and its presence has been confirmed in many species. In birds, vocal cues appear to be a major component of recognition. Curiously, vocal recognition seems absent or limited in some highly social species such as the black-legged kittiwake, Rissa tridactyla. Using playback experiments, we found that kittiwake chicks recognized their parents vocally, this capacity being detectable as early as 20 days after hatching, the youngest age tested. Mates also recognized each other's long calls. Some birds reacted to their partner's voice when only a part of the long call was played back. Nevertheless, only about a third of the tested birds reacted to their mate's or parents' call and we were unable to detect recognition among neighbours. We discuss the low reactivity of kittiwakes in relation to their cliff-nesting habit and compare our results with evidence of vocal recognition in other larids. © 2008 The Association for the Study of Animal Behaviour.
Learning Media Application Based On Microcontroller Chip Technology In Early Age

NASA Astrophysics Data System (ADS)

Ika Hidayati, Permata

2018-04-01

In early childhood, cognitive development needs the right learning media to help a child's cognitive intelligence develop quickly. The purpose of this study is to design a learning medium in the form of a puppet that can be used to introduce human anatomy during early childhood. This educational doll uses voice recognition technology from the EasyVR module to receive commands from the user to introduce body parts on the doll, with an LED used as an indicator. In addition to introducing human anatomy, the doll lets a user play back sounds previously stored in the voice module's sound recorder. The results of this study show that the educational doll can detect more than one voice and that spoken commands can be detected in random order. The doll can detect sound at a distance of up to 2.5 meters.

Vocal fold nodules in adult singers: regional opinions about etiologic factors, career impact, and treatment. A survey of otolaryngologists, speech pathologists, and teachers of singing.

PubMed

Hogikyan, N D; Appel, S; Guinn, L W; Haxer, M J

1999-03-01

This study was undertaken to better understand current regional opinions regarding vocal fold nodules in adult singers. A questionnaire was sent to 298 persons representing the 3 professional groups most involved with the care of singers with vocal nodules: otolaryngologists, speech pathologists, and teachers of singing. The questionnaire queried respondents about their level of experience with this problem, and their beliefs about causative factors, career impact, and optimum treatment. Responses within and between groups were similar, with differences between groups primarily in the magnitude of positive or negative responses, rather than in the polarity of the responses. Prevailing opinions included: recognition of causative factors in both singing and speaking voice practices, optimism about responsiveness to appropriate treatment, enthusiasm for coordinated voice therapy and voice training as first-line treatment, and acceptance of microsurgical management as appropriate treatment if behavioral management fails.
Technical experiences of implementing a wireless tracking and facial biometric verification system for a clinical environment

NASA Astrophysics Data System (ADS)

Liu, Brent; Lee, Jasper; Documet, Jorge; Guo, Bing; King, Nelson; Huang, H. K.

2006-03-01

By implementing a tracking and verification system, clinical facilities can effectively monitor workflow and heighten information security in today's growing demand towards digital imaging informatics. This paper presents the technical design and implementation experiences encountered during the development of a Location Tracking and Verification System (LTVS) for a clinical environment. LTVS integrates facial biometrics with wireless tracking so that administrators can manage and monitor patient and staff through a web-based application. Implementation challenges fall into three main areas: 1) Development and Integration, 2) Calibration and Optimization of Wi-Fi Tracking System, and 3) Clinical Implementation. An initial prototype LTVS has been implemented within USC's Healthcare Consultation Center II Outpatient Facility, which currently has a fully digital imaging department environment with integrated HIS/RIS/PACS/VR (Voice Recognition).
47 CFR 25.260 - Time sharing between DoD meteorological satellite systems and non-voice, non-geostationary...

Code of Federal Regulations, 2014 CFR

2014-10-01

... satellite systems and non-voice, non-geostationary satellite systems in the 400.15-401 MHz band. 25.260... systems and non-voice, non-geostationary satellite systems in the 400.15-401 MHz band. (a) The space stations of a non-voice, non-geostationary Mobile-Satellite Service (NVNG MSS) system time-sharing downlink...
47 CFR 25.260 - Time sharing between DoD meteorological satellite systems and non-voice, non-geostationary...

Code of Federal Regulations, 2010 CFR

2010-10-01

... satellite systems and non-voice, non-geostationary satellite systems in the 400.15-401 MHz band. 25.260... systems and non-voice, non-geostationary satellite systems in the 400.15-401 MHz band. (a) A non-voice, non-geostationary mobile-satellite service system licensee (“NVNG licensee”) time-sharing spectrum in...
47 CFR 25.260 - Time sharing between DoD meteorological satellite systems and non-voice, non-geostationary...

Code of Federal Regulations, 2013 CFR

2013-10-01

... satellite systems and non-voice, non-geostationary satellite systems in the 400.15-401 MHz band. 25.260... systems and non-voice, non-geostationary satellite systems in the 400.15-401 MHz band. (a) The space stations of a non-voice, non-geostationary Mobile-Satellite Service (NVNG MSS) system time-sharing downlink...
47 CFR 25.260 - Time sharing between DoD meteorological satellite systems and non-voice, non-geostationary...

Code of Federal Regulations, 2011 CFR

2011-10-01

... satellite systems and non-voice, non-geostationary satellite systems in the 400.15-401 MHz band. 25.260... systems and non-voice, non-geostationary satellite systems in the 400.15-401 MHz band. (a) A non-voice, non-geostationary mobile-satellite service system licensee (“NVNG licensee”) time-sharing spectrum in...
47 CFR 25.259 - Time sharing between NOAA meteorological satellite systems and non-voice, non-geostationary...

Code of Federal Regulations, 2010 CFR

2010-10-01

... satellite systems and non-voice, non-geostationary satellite systems in the 137-138 MHz band. 25.259 Section... systems and non-voice, non-geostationary satellite systems in the 137-138 MHz band. (a) A non-voice, non-geostationary mobile-satellite service system licensee (“NVNG licensee”) time-sharing spectrum in the 137-138...
47 CFR 25.260 - Time sharing between DoD meteorological satellite systems and non-voice, non-geostationary...

Code of Federal Regulations, 2012 CFR

2012-10-01

... satellite systems and non-voice, non-geostationary satellite systems in the 400.15-401 MHz band. 25.260... systems and non-voice, non-geostationary satellite systems in the 400.15-401 MHz band. (a) A non-voice, non-geostationary mobile-satellite service system licensee (“NVNG licensee”) time-sharing spectrum in...
47 CFR 25.259 - Time sharing between NOAA meteorological satellite systems and non-voice, non-geostationary...

Code of Federal Regulations, 2011 CFR

2011-10-01

... satellite systems and non-voice, non-geostationary satellite systems in the 137-138 MHz band. 25.259 Section... systems and non-voice, non-geostationary satellite systems in the 137-138 MHz band. (a) A non-voice, non-geostationary mobile-satellite service system licensee (“NVNG licensee”) time-sharing spectrum in the 137-138...
47 CFR 25.259 - Time sharing between NOAA meteorological satellite systems and non-voice, non-geostationary...

Code of Federal Regulations, 2012 CFR

2012-10-01

... satellite systems and non-voice, non-geostationary satellite systems in the 137-138 MHz band. 25.259 Section... systems and non-voice, non-geostationary satellite systems in the 137-138 MHz band. (a) A non-voice, non-geostationary mobile-satellite service system licensee (“NVNG licensee”) time-sharing spectrum in the 137-138...
Interpreting Chicken-Scratch: Lexical Access for Handwritten Words

PubMed Central

Barnhart, Anthony S.; Goldinger, Stephen D.

2014-01-01

Handwritten word recognition is a field of study that has largely been neglected in the psychological literature, despite its prevalence in society. Whereas studies of spoken word recognition almost exclusively employ natural, human voices as stimuli, studies of visual word recognition use synthetic typefaces, thus simplifying the process of word recognition. The current study examined the effects of handwriting on a series of lexical variables thought to influence bottom-up and top-down processing, including word frequency, regularity, bidirectional consistency, and imageability. The results suggest that the natural physical ambiguity of handwritten stimuli forces a greater reliance on top-down processes, because almost all effects were magnified, relative to conditions with computer print. These findings suggest that processes of word perception naturally adapt to handwriting, compensating for physical ambiguity by increasing top-down feedback. PMID:20695708
Biometrics: Accessibility challenge or opportunity?

PubMed

Blanco-Gonzalo, Ramon; Lunerti, Chiara; Sanchez-Reillo, Raul; Guest, Richard Michael

2018-01-01

Biometric recognition is currently implemented in several authentication contexts, most recently in mobile devices where it is expected to complement or even replace traditional authentication modalities such as PIN (Personal Identification Number) or passwords. The assumed convenience characteristics of biometrics are transparency, reliability and ease of use; however, the question of whether biometric recognition is as intuitive and straightforward to use is open to debate. Can biometric systems make some tasks easier for people with accessibility concerns? To investigate this question, an accessibility evaluation of a mobile app was conducted where test subjects withdraw money from a fictitious ATM (Automated Teller Machine) scenario. The biometric authentication mechanisms used include face, voice, and fingerprint. Furthermore, we employed traditional modalities of PIN and pattern in order to check if biometric recognition is indeed a real improvement. The trial test subjects within this work were people with real-life accessibility concerns. A group of people without accessibility concerns also participated, providing a baseline performance. Experimental results are presented concerning performance, HCI (Human-Computer Interaction) and accessibility, grouped according to category of accessibility concern. Our results reveal links between individual modalities and user category, establishing guidelines for future accessible biometric products.
Biometrics: Accessibility challenge or opportunity?

PubMed Central

Lunerti, Chiara; Sanchez-Reillo, Raul; Guest, Richard Michael

2018-01-01

Biometric recognition is currently implemented in several authentication contexts, most recently in mobile devices where it is expected to complement or even replace traditional authentication modalities such as PIN (Personal Identification Number) or passwords. The assumed convenience characteristics of biometrics are transparency, reliability and ease of use; however, the question of whether biometric recognition is as intuitive and straightforward to use is open to debate. Can biometric systems make some tasks easier for people with accessibility concerns? To investigate this question, an accessibility evaluation of a mobile app was conducted where test subjects withdraw money from a fictitious ATM (Automated Teller Machine) scenario. The biometric authentication mechanisms used include face, voice, and fingerprint. Furthermore, we employed traditional modalities of PIN and pattern in order to check if biometric recognition is indeed a real improvement. The trial test subjects within this work were people with real-life accessibility concerns. A group of people without accessibility concerns also participated, providing a baseline performance. Experimental results are presented concerning performance, HCI (Human-Computer Interaction) and accessibility, grouped according to category of accessibility concern. Our results reveal links between individual modalities and user category, establishing guidelines for future accessible biometric products. PMID:29565989
FELIN: tailored optronics and systems solutions for dismounted combat

NASA Astrophysics Data System (ADS)

Milcent, A. M.

2009-05-01

The FELIN French modernization program for dismounted combat provides the Armies with info-centric systems which dramatically enhance the performances of the soldier and the platoon. Sagem now has available a portfolio of various equipments, providing C4I, data and voice digital communication, and enhanced vision for day and night operations, through compact high performance electro-optics. The FELIN system provides the infantryman with a high-tech integrated and modular system which increases significantly their detection, recognition, identification capabilities, their situation awareness and information sharing, and this in any dismounted close combat situation. Among the key technologies used in this system, infrared and intensified vision provide a significant improvement in capability, observation performance and protection of the ground soldiers. This paper presents in detail the developed equipments, with an emphasis on lessons learned from the technical and operational feedback from dismounted close combat field tests.
Female voice communications in high level aircraft cockpit noises--part II: vocoder and automatic speech recognition systems.

PubMed

Nixon, C; Anderson, T; Morris, L; McCavitt, A; McKinley, R; Yeager, D; McDaniel, M

1998-11-01

The intelligibility of female and male speech is equivalent under most ordinary living conditions. However, due to small differences between their acoustic speech signals, called speech spectra, one can be more or less intelligible than the other in certain situations such as high levels of noise. Anecdotal information, supported by some empirical observations, suggests that some of the high intensity noise spectra of military aircraft cockpits may degrade the intelligibility of female speech more than that of male speech. In an applied research study, the intelligibility of female and male speech was measured in several high level aircraft cockpit noise conditions experienced in military aviation. In Part I, (Nixon CW, et al. Aviat Space Environ Med 1998; 69:675-83) female speech intelligibility measured in the spectra and levels of aircraft cockpit noises and with noise-canceling microphones was lower than that of the male speech in all conditions. However, the differences were small and only those at some of the highest noise levels were significant. Although speech intelligibility of both genders was acceptable during normal cruise noises, improvements are required in most of the highest levels of noise created during maximum aircraft operating conditions. These results are discussed in a Part I technical report. This Part II report examines the intelligibility in the same aircraft cockpit noises of vocoded female and male speech and the accuracy with which female and male speech in some of the cockpit noises were understood by automatic speech recognition systems. The intelligibility of vocoded female speech was generally the same as that of vocoded male speech. No significant differences were measured between the recognition accuracy of male and female speech by the automatic speech recognition systems. The intelligibility of female and male speech was equivalent for these conditions.
Automatic Speech Recognition from Neural Signals: A Focused Review.

PubMed

Herff, Christian; Schultz, Tanja

2016-01-01

Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to loud environments, the risk of bothering bystanders, or an inability to produce speech (e.g., in patients suffering from locked-in syndrome). For these reasons it would be highly desirable not to speak but simply to imagine oneself saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques applied to neural signals, we discuss the Brain-to-text system.
Graphics with Special Interfaces for Disabled People.

ERIC Educational Resources Information Center

Tronconi, A.; And Others

The paper describes new software and special input devices to allow physically impaired children to utilize the graphic capabilities of personal computers. Special input devices for computer graphics access--the voice recognition card, the single switch, or the mouse emulator--can be used either singly or in combination by the disabled to control…
Failing To Marvel: The Nuances, Complexities, and Challenges of Multicultural Education.

ERIC Educational Resources Information Center

Simoes de Carvalho, Paulo M.

1998-01-01

Reviews the complex nature of multicultural education, which as it advocates recognition of the values of many cultures, is nevertheless grounded in a Western culture and subject to Western deconstruction. Considers the challenge of the multicultural educator to recognize his or her own voice as representative of the dominant culture. (SLD)
Blood Memory and the Arts: Indigenous Genealogies and Imagined Truths

ERIC Educational Resources Information Center

Mithlo, Nancy Marie

2011-01-01

Contemporary Native arts are rarely included in global arts settings that highlight any number of other disenfranchised artists seeking to gain recognition and a voice in the form of critical exhibition practice or scholarship. This article argues that Native artists can benefit from an increased participation in these broader arts networks, given…
Effect of Technological Changes in Information Transfer on the Delivery of Pharmacy Services.

ERIC Educational Resources Information Center

Barker, Kenneth N.; And Others

1989-01-01

Personal computer technology has arrived in health care. Specific technological advances are optical disc storage, smart cards, voice recognition, and robotics. This paper discusses computers in medicine, in nursing, in conglomerates, and with patients. Future health care will be delivered in primary care centers, medical supermarkets, specialized…

Dysphonia Detected by Pattern Recognition of Spectral Composition.

ERIC Educational Resources Information Center

Leinonen, Lea; And Others

1992-01-01

This study analyzed production of a long vowel sound within Finnish words by normal or dysphonic voices, using the Self-Organizing Map, the artificial neural network algorithm of T. Kohonen which produces two-dimensional representations of speech. The method was found to be both sensitive and specific in the detection of dysphonia. (Author/JDD)
Speech Recognition in Fluctuating and Continuous Maskers: Effects of Hearing Loss and Presentation Level.

ERIC Educational Resources Information Center

Summers, Van; Molis, Michelle R.

2004-01-01

Listeners with normal-hearing sensitivity recognize speech more accurately in the presence of fluctuating background sounds, such as a single competing voice, than in unmodulated noise at the same overall level. These performance differences are greatly reduced in listeners with hearing impairment, who generally receive little benefit from…
78 FR 17276 - Agency Information Collection Activities: Proposed Request and Comment Request

Federal Register 2010, 2011, 2012, 2013, 2014

2013-03-20

... information collection in field offices via personal contact (face-to-face or telephone interview) using the... voice recognition technology, or by keying in responses using a telephone key pad. The SSIMWR allows... Development Worksheets: Face-to-Face Interview and Telephone Interview--20 CFR 416.204(b) and 422.135--0960...
77 FR 76591 - Agency Information Collection Activities: Proposed Request and Comment Request

Federal Register 2010, 2011, 2012, 2013, 2014

2012-12-28

... voice recognition technology, or by keying in responses using a telephone key pad. The SSIMWR allows... Worksheets: Face-to-Face Interview and Telephone Interview--20 CFR 416.204(b) and 422.135--0960- 0780. SSA... each interview either over the telephone or through a face-to-face discussion with the centenarian...
Pupils dilate for vocal or familiar music.

PubMed

Weiss, Michael W; Trehub, Sandra E; Schellenberg, E Glenn; Habashi, Peter

2016-08-01

Previous research reveals that vocal melodies are remembered better than instrumental renditions. Here we explored the possibility that the voice, as a highly salient stimulus, elicits greater arousal than nonvocal stimuli, resulting in greater pupil dilation for vocal than for instrumental melodies. We also explored the possibility that pupil dilation indexes memory for melodies. We tracked pupil dilation during a single exposure to 24 unfamiliar folk melodies (half sung to la la, half piano) and during a subsequent recognition test in which the previously heard melodies were intermixed with 24 novel melodies (half sung, half piano) from the same corpus. Pupil dilation was greater for vocal melodies than for piano melodies in the exposure phase and in the test phase. It was also greater for previously heard melodies than for novel melodies. Our findings provide the first evidence that pupillometry can be used to measure recognition of stimuli that unfold over several seconds. They also provide the first evidence of enhanced arousal to vocal melodies during encoding and retrieval, thereby supporting the more general notion of the voice as a privileged signal. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Multiresolution analysis (discrete wavelet transform) through Daubechies family for emotion recognition in speech.

NASA Astrophysics Data System (ADS)

Campo, D.; Quintero, O. L.; Bastidas, M.

2016-04-01

We propose a study of the mathematical properties of voice as an audio signal. This work includes signals in which the channel conditions are not ideal for emotion recognition. Multiresolution analysis (discrete wavelet transform) was performed using the Daubechies wavelet family (Db1-Haar, Db6, Db8, Db10), allowing the decomposition of the initial audio signal into sets of coefficients from which a set of features was extracted and analyzed statistically in order to differentiate emotional states. ANNs proved to be a system that allows an appropriate classification of such states. This study shows that the features extracted using wavelet decomposition are enough to analyze and extract emotional content in audio signals, giving a high accuracy rate in classification of emotional states without the need to use other kinds of classical frequency-time features. Accordingly, this paper seeks to characterize mathematically the six basic emotions in humans (boredom, disgust, happiness, anxiety, anger and sadness) plus neutrality, for a total of seven states to identify.
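
The decomposition step described in this abstract can be illustrated with a minimal sketch. It assumes the simplest member of the family (Db1, the Haar wavelet), a toy signal, and a hypothetical per-band feature set (mean, standard deviation, energy); the abstract does not state which features or decomposition depth the authors used, and the function names below are illustrative only.

```python
import math
import statistics

def haar_dwt(signal):
    """One level of the Haar (Db1) discrete wavelet transform.

    Returns (approximation, detail) coefficient lists.
    The signal length is assumed to be even.
    """
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def wavedec(signal, levels):
    """Multilevel decomposition: [detail_1, ..., detail_L, approx_L].

    The signal length is assumed to be divisible by 2**levels.
    """
    bands = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        bands.append(detail)
    bands.append(current)
    return bands

def band_features(bands):
    """Hypothetical feature set: (mean, std dev, energy) per coefficient band."""
    return [
        (statistics.fmean(b), statistics.pstdev(b), sum(c * c for c in b))
        for b in bands
    ]

# Toy signal standing in for a short audio frame (assumption, not real data).
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
bands = wavedec(signal, levels=2)
features = band_features(bands)
```

Because the Haar transform is orthonormal, the band energies sum to the energy of the original signal, which is a convenient sanity check. In a pipeline of the kind the abstract outlines, the resulting feature vector would be passed to a classifier such as an ANN.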
Hemispheric association and dissociation of voice and speech information processing in stroke.

PubMed

Jones, Anna B; Farrall, Andrew J; Belin, Pascal; Pernet, Cyril R

2015-10-01

As we listen to someone speaking, we extract both linguistic and non-linguistic information. Knowing how these two sets of information are processed in the brain is fundamental for the general understanding of social communication, speech recognition and therapy of language impairments. We investigated the pattern of performances in phoneme versus gender categorization in left and right hemisphere stroke patients, and found an anatomo-functional dissociation in the right frontal cortex, establishing a new syndrome in voice discrimination abilities. In addition, phoneme and gender performances were more often associated than dissociated in the left hemisphere patients, suggesting common neural underpinnings. Copyright © 2015 Elsevier Ltd. All rights reserved.
Robust matching for voice recognition

NASA Astrophysics Data System (ADS)

Higgins, Alan; Bahler, L.; Porter, J.; Blais, P.

1994-10-01

This paper describes an automated method of comparing a voice sample of an unknown individual with samples from known speakers in order to establish or verify the individual's identity. The method is based on a statistical pattern matching approach that employs a simple training procedure, requires no human intervention (transcription, word or phonetic marking, etc.), and makes no assumptions regarding the expected form of the statistical distributions of the observations. The content of the speech material (vocabulary, grammar, etc.) is not assumed to be constrained in any way. An algorithm is described which incorporates frame pruning and channel equalization processes designed to achieve robust performance with reasonable computational resources. An experimental implementation demonstrating the feasibility of the concept is described.
Speech processing using maximum likelihood continuity mapping

DOEpatents

Hogden, John E.

2000-01-01

Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
Speech processing using maximum likelihood continuity mapping

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hogden, J.E.

Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
Illumination-invariant hand gesture recognition

NASA Astrophysics Data System (ADS)

Mendoza-Morales, América I.; Miramontes-Jaramillo, Daniel; Kober, Vitaly

2015-09-01

In recent years, human-computer interaction (HCI) has received a lot of interest in industry and science because it provides new ways to interact with modern devices through voice, body, and facial/hand gestures. The application range of the HCI is from easy control of home appliances to entertainment. Hand gesture recognition is a particularly interesting problem because the shape and movement of hands usually are complex and flexible to be able to codify many different signs. In this work we propose a three step algorithm: first, detection of hands in the current frame is carried out; second, hand tracking across the video sequence is performed; finally, robust recognition of gestures across subsequent frames is made. Recognition rate highly depends on non-uniform illumination of the scene and occlusion of hands. In order to overcome these issues we use two Microsoft Kinect devices utilizing combined information from RGB and infrared sensors. The algorithm performance is tested in terms of recognition rate and processing time.
The impact of compression of speech signal, background noise and acoustic disturbances on the effectiveness of speaker identification

NASA Astrophysics Data System (ADS)

Kamiński, K.; Dobrowolski, A. P.

2017-04-01

The paper presents the architecture and the results of optimization of selected elements of the Automatic Speaker Recognition (ASR) system that uses Gaussian Mixture Models (GMM) in the classification process. Optimization was performed on the process of selection of individual characteristics using the genetic algorithm and on the parameters of the Gaussian distributions used to describe individual voices. The system that was developed was tested in order to evaluate the impact of different compression methods used, among others, in landline, mobile, and VoIP telephony systems, on the effectiveness of speaker identification. The paper also presents the effectiveness of speaker identification at specific noise levels and in the presence of other disturbances that could appear during phone calls, which made it possible to specify the spectrum of applications of the presented ASR system.
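
The GMM classification step mentioned in this abstract can be sketched as follows. This is a generic illustration of GMM-based speaker identification under stated assumptions (diagonal covariances, hand-set toy model parameters, two-dimensional feature frames), not the authors' system; it omits training, the genetic-algorithm feature selection, and any codec or noise handling. All names and values are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of the frames under one GMM.

    gmm is a list of (weight, mean, var) components.
    """
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(comps)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total / len(frames)

def identify(frames, models):
    """Return the speaker whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))

# Toy speaker models with made-up parameters (assumption, not trained models).
models = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
best = identify(frames, models)
```

Averaging the log-likelihood over frames keeps scores comparable between utterances of different lengths, which matters when the same models are scored against recordings passed through different telephony codecs.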
Single-channel voice-response-system program documentation volume I : system description

DOT National Transportation Integrated Search

1977-01-01

This report documents the design and implementation of a Voice Response System (VRS) using Adaptive Differential Pulse Code Modulation (ADPCM) voice coding. Implemented on a Digital Equipment Corporation PDP-11/20, this VRS system supports a single ...
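
For readers unfamiliar with ADPCM, the idea can be shown with a deliberately simplified sketch: each sample is coded as a small quantized difference from the previous reconstructed value, and the quantizer step size adapts to the size of recent codes. The 4-bit code range, the adaptation factors, and the initial step below are assumptions chosen for illustration; they are not taken from the report and do not follow any particular ADPCM standard.

```python
def _step_update(step, code):
    """Grow the step after large codes, shrink it after small ones."""
    factor = 1.5 if abs(code) >= 4 else 0.9
    return min(max(step * factor, 1.0), 2048.0)

def adpcm_encode(samples, step=16.0):
    """Encode samples as 4-bit signed codes in the range [-8, 7]."""
    codes, predicted = [], 0.0
    for s in samples:
        code = max(-8, min(7, round((s - predicted) / step)))
        codes.append(code)
        predicted += code * step  # track what the decoder will reconstruct
        step = _step_update(step, code)
    return codes

def adpcm_decode(codes, step=16.0):
    """Rebuild the signal by mirroring the encoder's predictor and step rule."""
    out, predicted = [], 0.0
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = _step_update(step, code)
    return out

# Toy waveform (assumption, not real speech data).
samples = [0, 40, 90, 120, 100, 60, 10, -30]
codes = adpcm_encode(samples)
decoded = adpcm_decode(codes)
```

The encoder and decoder share the same step-update rule, so the decoder needs only the code stream and the initial step to stay in sync. That is what allows speech to be stored at a few bits per sample.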
Exploring the Use of Emoji as a Visual Research Method for Eliciting Young Children's Voices in Childhood Research

ERIC Educational Resources Information Center

Fane, Jennifer; MacDougall, Colin; Jovanovic, Jessie; Redmond, Gerry; Gibbs, Lisa

2018-01-01

Recognition of the need to move from research "on" children to research "with" children has prompted significant theoretical and methodological debate as to how young children can be positioned as active participants in the research process. Visual research methods such as drawing, photography, and videography have received…
Effects of Familiarity and Feeding on Newborn Speech-Voice Recognition

ERIC Educational Resources Information Center

Valiante, A. Grace; Barr, Ronald G.; Zelazo, Philip R.; Brant, Rollin; Young, Simon N.

2013-01-01

Newborn infants preferentially orient to familiar over unfamiliar speech sounds. They are also better at remembering unfamiliar speech sounds for short periods of time if learning and retention occur after a feed than before. It is unknown whether short-term memory for speech is enhanced when the sound is familiar (versus unfamiliar) and, if so,…
Killing Curiosity? An Analysis of Celebrated Identity Performances among Teachers and Students in Nine London Secondary Science Classrooms

ERIC Educational Resources Information Center

Archer, Louise; Dawson, Emily; DeWitt, Jennifer; Godec, Spela; King, Heather; Mau, Ada; Nomikou, Effrosyni; Seakins, Amy

2017-01-01

In this paper, we take the view that school classrooms are spaces that are constituted by complex power struggles (for voice, authenticity, and recognition), involving multiple layers of resistance and contestation between the "institution," teachers and students, which can have profound implications for students' science identity and…
Building Biases in Infancy: The Influence of Race on Face and Voice Emotion Matching

ERIC Educational Resources Information Center

Vogel, Margaret; Monesson, Alexandra; Scott, Lisa S.

2012-01-01

Early in the first year of life infants exhibit equivalent performance distinguishing among people within their own race and within other races. However, with development and experience, their face recognition skills become tuned to groups of people they interact with the most. This developmental tuning is hypothesized to be the origin of adult…
Moving beyond the Hype: What Does the Celebration of Student Writing Do for Students?

ERIC Educational Resources Information Center

Carter, Genesea M.; Gallegos, Erin Penner

2017-01-01

Over the last decade celebrations of student writing (CSWs) have been instituted at universities across the nation as a public way to celebrate students' voices, identities, and literacies. Often touted as a way to gain campus-wide recognition and support for first-year composition courses, this event also purportedly fosters agency and authority…
Voice responses to changes in pitch of voice or tone auditory feedback

NASA Astrophysics Data System (ADS)

Sivasankar, Mahalakshmi; Bauer, Jay J.; Babu, Tara; Larson, Charles R.

2005-02-01

The present study was undertaken to examine if a subject's voice F0 responded not only to perturbations in pitch of voice feedback but also to changes in pitch of a side tone presented congruent with voice feedback. Small magnitude brief duration perturbations in pitch of voice or tone auditory feedback were randomly introduced during sustained vowel phonations. Results demonstrated a higher rate and larger magnitude of voice F0 responses to changes in pitch of the voice compared with a triangular-shaped tone (experiment 1) or a pure tone (experiment 2). However, response latencies did not differ across voice or tone conditions. Data suggest that subjects responded to the change in F0 rather than harmonic frequencies of auditory feedback because voice F0 response prevalence, magnitude, or latency did not statistically differ across triangular-shaped tone or pure-tone feedback. Results indicate the audio-vocal system is sensitive to the change in pitch of a variety of sounds, which may represent a flexible system capable of adapting to changes in the subject's voice. However, lower prevalence and smaller responses to tone pitch-shifted signals suggest that the audio-vocal system may resist changes to the pitch of other environmental sounds when voice feedback is present.
Iris Cryptography for Security Purpose

NASA Astrophysics Data System (ADS)

Ajith, Srighakollapu; Balaji Ganesh Kumar, M.; Latha, S.; Samiappan, Dhanalakshmi; Muthu, P.

2018-04-01

In today's world, security has become a major concern for everyone. Hacking is a major issue: hackers are everywhere, and although technology has advanced, there are still many cases where it fails to provide adequate security. Engineers and scientists have been developing new security products based on biometric sensing, such as face recognition, pattern recognition, gesture recognition, and voice authentication. However, these devices fail to deliver the expected results. In this work, we present an approach to generate a unique secure key using the iris template. The iris templates are processed using well-defined processing techniques. Using the encryption and decryption process, they are stored, traversed, and utilized. From this work, we conclude that iris cryptography gives the expected results for securing data from eavesdroppers.

One approach to design of speech emotion database

NASA Astrophysics Data System (ADS)

Uhrin, Dominik; Chmelikova, Zdenka; Tovarek, Jaromir; Partila, Pavol; Voznak, Miroslav

2016-05-01

This article describes a system for evaluating the credibility of recordings with emotional character. The sound recordings form a Czech-language database for training and testing speech emotion recognition systems. These systems are designed to detect human emotions from the voice. Knowing a person's emotional state is useful in the security forces and emergency call services. A person in action (soldier, police officer, or firefighter) is often exposed to stress. Information about the emotional state (from the voice) helps the dispatcher adapt control commands for the intervention procedure. Call agents of emergency call services must recognize the mental state of the caller to adjust the mood of the conversation. In this case, the evaluation of the psychological state is the key factor for successful intervention. A quality database of sound recordings is essential for the creation of such systems. Quality databases exist, such as the Berlin Database of Emotional Speech or Humaine. Actors created these databases in an audio studio, which means that the recordings contain simulated emotions, not real ones. Our research aims at creating a database of Czech emotional recordings of real human speech. Collecting sound samples for the database is only one of the tasks. Another, no less important, is to evaluate the significance of recordings from the perspective of emotional states. The design of a methodology for evaluating the credibility of emotional recordings is described in this article. The results describe the advantages and applicability of the developed method.
Using voice input and audio feedback to enhance the reality of a virtual experience

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miner, N.E.

1994-04-01

Virtual Reality (VR) is a rapidly emerging technology which allows participants to experience a virtual environment through stimulation of the participant's senses. Intuitive and natural interactions with the virtual world help to create a realistic experience. Typically, a participant is immersed in a virtual environment through the use of a 3-D viewer. Realistic, computer-generated environment models and accurate tracking of a participant's view are important factors for adding realism to a virtual experience. Stimulating a participant's sense of sound and providing a natural form of communication for interacting with the virtual world are equally important. This paper discusses the advantages and importance of incorporating voice recognition and audio feedback capabilities into a virtual world experience. Various approaches and levels of complexity are discussed. Examples of the use of voice and sound are presented through the description of a research application developed in the VR laboratory at Sandia National Laboratories.
Representational specificity of within-category phonetic variation in the mental lexicon

NASA Astrophysics Data System (ADS)

Ju, Min; Luce, Paul A.

2003-10-01

This study examines (1) whether within-category phonetic variation in voice onset time (VOT) is encoded in long-term memory and has consequences for subsequent word recognition and, if so, (2) whether such effects are greater in words with voiced counterparts (pat/bat) than those without (cow/*gow), given that VOT information is more critical for lexical discrimination in the former. Two long-term repetition priming experiments were conducted using words containing word-initial voiceless stops varying in VOT. Reaction times to a lexical decision were compared between the same and different VOT conditions in words with or without voiced counterparts. If veridical representations of each episode are preserved in memory, variation in VOT should have demonstrable effects on the magnitude of priming. However, if within-category variation is discarded and form-based representations are abstract, the variation in VOT should not mediate priming. The implications of these results for the specificity and abstractness of phonetic representations in long-term memory will be discussed.
The written voice: implicit memory effects of voice characteristics following silent reading and auditory presentation.

PubMed

Abramson, Marianne

2007-12-01

After being familiarized with two voices, either implicit (auditory lexical decision) or explicit memory (auditory recognition) for words from silently read sentences was assessed among 32 men and 32 women volunteers. In the silently read sentences, the sex of speaker was implied in the initial words, e.g., "He said, ..." or "She said...". Tone in question versus statement was also manipulated by appropriate punctuation. Auditory lexical decision priming was found for sex- and tone-consistent items following silent reading, but only up to 5 min. after silent reading. In a second study, similar lexical decision priming was found following listening to the sentences, although these effects remained reliable after a 2-day delay. The effect sizes for lexical decision priming showed that tone-consistency and sex-consistency were strong following both silent reading and listening 5 min. after studying. These results suggest that readers create episodic traces of text from auditory images of silently read sentences as they do during listening.
Implementation and preliminary evaluation of 'C-tone': A novel algorithm to improve lexical tone recognition in Mandarin-speaking cochlear implant users.

PubMed

Ping, Lichuan; Wang, Ningyuan; Tang, Guofang; Lu, Thomas; Yin, Li; Tu, Wenhe; Fu, Qian-Jie

2017-09-01

Because of limited spectral resolution, Mandarin-speaking cochlear implant (CI) users have difficulty perceiving fundamental frequency (F0) cues that are important to lexical tone recognition. To improve Mandarin tone recognition in CI users, we implemented and evaluated a novel real-time algorithm (C-tone) to enhance the amplitude contour, which is strongly correlated with the F0 contour. The C-tone algorithm was implemented in clinical processors and evaluated in eight users of the Nurotron NSP-60 CI system. Subjects were given 2 weeks of experience with C-tone. Recognition of Chinese tones, monosyllables, and disyllables in quiet was measured with and without the C-tone algorithm. Subjective quality ratings were also obtained for C-tone. After 2 weeks of experience with C-tone, there were small but significant improvements in recognition of lexical tones, monosyllables, and disyllables (P < 0.05 in all cases). Among lexical tones, the largest improvements were observed for Tone 3 (falling-rising) and the smallest for Tone 4 (falling). Improvements with C-tone were greater for disyllables than for monosyllables. Subjective quality ratings showed no strong preference for or against C-tone, except for perception of own voice, where C-tone was preferred. The real-time C-tone algorithm provided small but significant improvements for speech performance in quiet with no change in sound quality. Pre-processing algorithms to reduce noise and better real-time F0 extraction would improve the benefits of C-tone in complex listening environments. Chinese CI users' speech recognition in quiet can be significantly improved by modifying the amplitude contour to better resemble the F0 contour.
Visual and auditory socio-cognitive perception in unilateral temporal lobe epilepsy in children and adolescents: a prospective controlled study.

PubMed

Laurent, Agathe; Arzimanoglou, Alexis; Panagiotakaki, Eleni; Sfaello, Ignacio; Kahane, Philippe; Ryvlin, Philippe; Hirsch, Edouard; de Schonen, Scania

2014-12-01

A high rate of abnormal social behavioural traits or perceptual deficits is observed in children with unilateral temporal lobe epilepsy. In the present study, perception of auditory and visual social signals, carried by faces and voices, was evaluated in children or adolescents with temporal lobe epilepsy. We prospectively investigated a sample of 62 children with focal non-idiopathic epilepsy early in the course of the disorder. The present analysis included 39 children with a confirmed diagnosis of temporal lobe epilepsy. Control participants (72), distributed across 10 age groups, served as a control group. Our socio-perceptual evaluation protocol comprised three socio-visual tasks (face identity, facial emotion and gaze direction recognition), two socio-auditory tasks (voice identity and emotional prosody recognition), and three control tasks (lip reading, geometrical pattern and linguistic intonation recognition). All 39 patients also benefited from a neuropsychological examination. As a group, children with temporal lobe epilepsy performed at a significantly lower level compared to the control group with regards to recognition of facial identity, direction of eye gaze, and emotional facial expressions. We found no relationship between the type of visual deficit and age at first seizure, duration of epilepsy, or the epilepsy-affected cerebral hemisphere. Deficits in socio-perceptual tasks could be found independently of the presence of deficits in visual or auditory episodic memory, visual non-facial pattern processing (control tasks), or speech perception. A normal FSIQ did not exempt some of the patients from an underlying deficit in some of the socio-perceptual tasks. Temporal lobe epilepsy not only impairs development of emotion recognition, but can also impair development of perception of other socio-perceptual signals in children with or without intellectual deficiency. 
Prospective studies need to be designed to evaluate the results of appropriate re-education programs in children presenting with deficits in social cue processing.
Initial Progress Toward Development of a Voice-Based Computer-Delivered Motivational Intervention for Heavy Drinking College Students: An Experimental Study

PubMed Central

Lechner, William J; MacGlashan, James; Wray, Tyler B; Littman, Michael L

2017-01-01

Background: Computer-delivered interventions have been shown to be effective in reducing alcohol consumption in heavy drinking college students. However, these computer-delivered interventions rely on mouse, keyboard, or touchscreen responses for interactions between the users and the computer-delivered intervention. The principles of motivational interviewing suggest that in-person interventions may be effective, in part, because they encourage individuals to think through and speak aloud their motivations for changing a health behavior, which current computer-delivered interventions do not allow. Objective: The objective of this study was to take the initial steps toward development of a voice-based computer-delivered intervention that can ask open-ended questions and respond appropriately to users’ verbal responses, more closely mirroring a human-delivered motivational intervention. Methods: We developed (1) a voice-based computer-delivered intervention that was run by a human controller and that allowed participants to speak their responses to scripted prompts delivered by speech generation software and (2) a text-based computer-delivered intervention that relied on the mouse, keyboard, and computer screen for all interactions. We randomized 60 heavy drinking college students to interact with the voice-based computer-delivered intervention and 30 to interact with the text-based computer-delivered intervention and compared their ratings of the systems as well as their motivation to change drinking and their drinking behavior at 1-month follow-up. Results: Participants reported that the voice-based computer-delivered intervention engaged positively with them in the session and delivered content in a manner consistent with motivational interviewing principles.
At 1-month follow-up, participants in the voice-based computer-delivered intervention condition reported significant decreases in quantity, frequency, and problems associated with drinking, and increased perceived importance of changing drinking behaviors. In comparison to the text-based computer-delivered intervention condition, those assigned to voice-based computer-delivered intervention reported significantly fewer alcohol-related problems at the 1-month follow-up (incident rate ratio 0.60, 95% CI 0.44-0.83, P=.002). The conditions did not differ significantly on perceived importance of changing drinking or on measures of drinking quantity and frequency of heavy drinking. Conclusions: Results indicate that it is feasible to construct a series of open-ended questions and a bank of responses and follow-up prompts that can be used in a future fully automated voice-based computer-delivered intervention that may mirror more closely human-delivered motivational interventions to reduce drinking. Such efforts will require using advanced speech recognition capabilities and machine-learning approaches to train a program to mirror the decisions made by human controllers in the voice-based computer-delivered intervention used in this study. In addition, future studies should examine enhancements that can increase the perceived warmth and empathy of voice-based computer-delivered intervention, possibly through greater personalization, improvements in the speech generation software, and embodying the computer-delivered intervention in a physical form. PMID:28659259
Voice Response Systems Technology.

ERIC Educational Resources Information Center

Gerald, Jeanette

1984-01-01

Examines two methods of generating synthetic speech in voice response systems, which allow computers to communicate in human terms (speech), using human interface devices (ears): phoneme and reconstructed voice systems. Considerations prior to implementation, current and potential applications, glossary, directory, and introduction to Input Output…
Human-computer interaction for alert warning and attention allocation systems of the multimodal watchstation

NASA Astrophysics Data System (ADS)

Obermayer, Richard W.; Nugent, William A.

2000-11-01

The SPAWAR Systems Center San Diego is currently developing an advanced Multi-Modal Watchstation (MMWS); design concepts and software from this effort are intended for transition to future United States Navy surface combatants. The MMWS features multiple flat panel displays and several modes of user interaction, including voice input and output, natural language recognition, 3D audio, stylus and gestural inputs. In 1999, an extensive literature review was conducted on basic and applied research concerned with alerting and warning systems. After summarizing that literature, a human computer interaction (HCI) designer's guide was prepared to support the design of an attention allocation subsystem (AAS) for the MMWS. The resultant HCI guidelines are being applied in the design of a fully interactive AAS prototype. An overview of key findings from the literature review, a proposed design methodology with illustrative examples, and an assessment of progress made in implementing the HCI designers guide are presented.
Alternative Speech Communication System for Persons with Severe Speech Disorders

NASA Astrophysics Data System (ADS)

Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas

2009-12-01

Assistive speech-enabled systems are proposed to help both French and English speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. An improvement of the Perceptual Evaluation of the Speech Quality (PESQ) value of 5% and more than 20% is achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.
Intimate Geographies: Reclaiming Citizenship and Community in "The Autobiography of Delfina Cuero" and Bonita Nunez's "Diaries"

ERIC Educational Resources Information Center

Fitzgerald, Stephanie

2006-01-01

American Indian women's autobiographies recount a specific type of life experience that has often been overlooked, one that is equally important in understanding the genre and to develop ways of reading these texts that balance the recovery and recognition of the Native voice and agency contained within them with the processes of creation and the…
Advanced Productivity Analysis Methods for Air Traffic Control Operations

DTIC Science & Technology

1976-12-01

Routine Work; 4.2.2. Surveillance Work; 4.2.3. Conflict Processing Work ... crossing and overtake conflicts) includes potential-conflict recognition, assessment, and resolution decision making and A/N voice communications ... makers to utilize quantitative and dynamic analysis as a tool for decision-making. 1.1.3 Types of Simulation Models: Although there are many ways to
The Complexity of Literacy in Kenya: Narrative Analysis of Maasai Women's Experiences

ERIC Educational Resources Information Center

Taeko, Takayanagi

2014-01-01

This paper aims to challenge limited notions of literacy and argues for the recognition of Maasai women's self-determined learning in order to bring about human development in Kenya. It also seeks to construct a complex picture of literacy, drawing on postcolonial feminist theory as a framework to ensure that the woman's voice is heard. Through…
Enhancement of temporal periodicity cues in cochlear implants: Effects on prosodic perception and vowel identification

NASA Astrophysics Data System (ADS)

Green, Tim; Faulkner, Andrew; Rosen, Stuart; Macherey, Olivier

2005-07-01

Standard continuous interleaved sampling processing, and a modified processing strategy designed to enhance temporal cues to voice pitch, were compared on tests of intonation perception, and vowel perception, both in implant users and in acoustic simulations. In standard processing, 400 Hz low-pass envelopes modulated either pulse trains (implant users) or noise carriers (simulations). In the modified strategy, slow-rate envelope modulations, which convey dynamic spectral variation crucial for speech understanding, were extracted by low-pass filtering (32 Hz). In addition, during voiced speech, higher-rate temporal modulation in each channel was provided by 100% amplitude-modulation by a sawtooth-like wave form whose periodicity followed the fundamental frequency (F0) of the input. Channel levels were determined by the product of the lower- and higher-rate modulation components. Both in acoustic simulations and in implant users, the ability to use intonation information to identify sentences as question or statement was significantly better with modified processing. However, while there was no difference in vowel recognition in the acoustic simulation, implant users performed worse with modified processing both in vowel recognition and in formant frequency discrimination. It appears that, while enhancing pitch perception, modified processing harmed the transmission of spectral information.
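The modified strategy described above combines two modulation components per channel. A minimal sketch of that product for the voiced case, in plain Python, follows. It is an illustration under stated assumptions, not the authors' implementation: the first-order filter, the falling-ramp sawtooth shape, and the names `one_pole_lowpass`, `sawtooth_modulator`, and `channel_level` are all hypothetical, since the abstract gives only the 32 Hz cutoff, the 100% modulation depth, and the F0-locked periodicity.

```python
import math


def one_pole_lowpass(signal, cutoff_hz, sample_rate_hz):
    """Smooth a signal with a first-order low-pass filter.

    Assumption: the abstract does not name the filter type; a one-pole
    filter is used here only as a simple stand-in.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out, state = [], 0.0
    for x in signal:
        state += alpha * (x - state)
        out.append(state)
    return out


def sawtooth_modulator(n_samples, f0_hz, sample_rate_hz):
    """Falling ramp from 1 toward 0 that repeats every 1/F0 seconds.

    Assumption: the exact "sawtooth-like" shape is not specified in the
    abstract; the ramp spans the full range to model 100% modulation depth.
    """
    return [1.0 - ((n * f0_hz / sample_rate_hz) % 1.0) for n in range(n_samples)]


def channel_level(band_signal, f0_hz, sample_rate_hz, cutoff_hz=32.0):
    """Channel level = slow envelope (low-pass at cutoff_hz) x F0-rate modulator."""
    rectified = [abs(x) for x in band_signal]
    slow_envelope = one_pole_lowpass(rectified, cutoff_hz, sample_rate_hz)
    modulator = sawtooth_modulator(len(band_signal), f0_hz, sample_rate_hz)
    return [e * m for e, m in zip(slow_envelope, modulator)]


if __name__ == "__main__":
    sample_rate = 16000
    # Hypothetical test input: 0.1 s of a 1 kHz band signal, voiced at F0 = 100 Hz.
    band = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate) for n in range(1600)]
    levels = channel_level(band, f0_hz=100.0, sample_rate_hz=sample_rate)
    print(len(levels), max(levels))
```

Keeping the slow envelope and the F0-rate modulator as separate factors mirrors the abstract's statement that channel levels were the product of the lower- and higher-rate components. Unvoiced segments, where no F0 modulation applies, are outside this sketch.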
Major depression is associated with impaired processing of emotion in music as well as in facial and vocal stimuli.

PubMed

Naranjo, C; Kornreich, C; Campanella, S; Noël, X; Vandriette, Y; Gillain, B; de Longueville, X; Delatte, B; Verbanck, P; Constant, E

2011-02-01

The processing of emotional stimuli is thought to be negatively biased in major depression. This study investigates this issue using musical, vocal and facial affective stimuli. 23 depressed in-patients and 23 matched healthy controls were recruited. Affective information processing was assessed through musical, vocal and facial emotion recognition tasks. Depression, anxiety level and attention capacity were controlled. The depressed participants demonstrated less accurate identification of emotions than the control group in all three sorts of emotion-recognition tasks. The depressed group also gave higher intensity ratings than the controls when scoring negative emotions, and they were more likely to attribute negative emotions to neutral voices and faces. Our in-patient group might differ from the more general population of depressed adults. They were all taking anti-depressant medication, which may have had an influence on their emotional information processing. Major depression is associated with a general negative bias in the processing of emotional stimuli. Emotional processing impairment in depression is not confined to interpersonal stimuli (faces and voices), being also present in the ability to feel music accurately. © 2010 Elsevier B.V. All rights reserved.
Flexible and wearable electronic silk fabrics for human physiological monitoring

NASA Astrophysics Data System (ADS)

Mao, Cuiping; Zhang, Huihui; Lu, Zhisong

2017-09-01

The development of textile-based devices for human physiological monitoring has attracted tremendous interest in recent years. However, flexible physiological sensing elements based on silk fabrics have not been realized. In this paper, ZnO nanorod arrays are grown in situ on reduced graphene oxide-coated silk fabrics via a facile electro-deposition method for the fabrication of silk-fabric-based mechanical sensing devices. The data show that well-aligned ZnO nanorods with hexagonal wurtzite crystalline structures are synthesized on the conductive silk fabric surface. After magnetron sputtering of gold electrodes, silk-fabric-based devices are produced and applied to detect periodic bending and twisting. Based on the electric signals, the deformation and release processes can be easily differentiated. Human arterial pulse and respiration can also be real-time monitored to calculate the pulse rate and respiration frequency, respectively. Throat vibrations during coughing and singing are detected to demonstrate the voice recognition capability. This work may not only help develop silk-fabric-based mechanical sensing elements for potential applications in clinical diagnosis, daily healthcare monitoring and voice recognition, but also provide a versatile method for fabricating textile-based flexible electronic devices.
Florida manatee avoidance technology: A pilot program by the Florida Fish and Wildlife Conservation Commission

NASA Astrophysics Data System (ADS)

Frisch, Katherine; Haubold, Elsa

2003-10-01

Since 1976, approximately 25% of the annual Florida manatee (Trichechus manatus latirostris) mortality has been attributed to collisions with watercraft. In 2001, the Florida Legislature appropriated $200,000 in funds for research projects using technological solutions to directly address the problem of collisions between manatees and watercraft. The Florida Fish & Wildlife Conservation Commission initially funded seven projects for the first two fiscal years. The selected proposals were designed to explore technology that had not previously been applied to the manatee/boat collision problem and included many acoustic concepts related to voice recognition, sonar, and an alerting device to be put on boats to warn manatees. The most promising results to date are from projects employing voice-recognition techniques to identify manatee vocalizations and warn boaters of the manatees' presence. Sonar technology, much like that used in fish finders, is promising but has met with regulatory problems regarding permitting and remains to be tested, as has the manatee-alerting device. The state of Florida found results of the initial years of funding compelling and plans to fund further manatee avoidance technology research in a continued effort to mitigate the problem of manatee/boat collisions.
Voice-on-Target: A New Approach to Tactical Networking and Unmanned Systems Control via the Voice Interface to the SA Environment

DTIC Science & Technology

2009-06-01

Blackberry handheld) device. After each voice command activation, the medic provided voice comments to be recorded in Observer Notepad over Voice...vial (up-right corner of picture) upon voice activation from the medic’s Blackberry handheld. The NPS UAS which was controlled by voice commands...Voice Portal using a standard Blackberry handheld with a head set. The results demonstrated sufficient accuracy for controlling the tactical sensor
Frequency and analysis of non-clinical errors made in radiology reports using the National Integrated Medical Imaging System voice recognition dictation software.

PubMed

Motyer, R E; Liddy, S; Torreggiani, W C; Buckley, O

2016-11-01

Voice recognition (VR) dictation of radiology reports has become the mainstay of reporting in many institutions worldwide. Despite benefit, such software is not without limitations, and transcription errors have been widely reported. Evaluate the frequency and nature of non-clinical transcription error using VR dictation software. Retrospective audit of 378 finalised radiology reports. Errors were counted and categorised by significance, error type and sub-type. Data regarding imaging modality, report length and dictation time was collected. 67 (17.72 %) reports contained ≥1 errors, with 7 (1.85 %) containing 'significant' and 9 (2.38 %) containing 'very significant' errors. A total of 90 errors were identified from the 378 reports analysed, with 74 (82.22 %) classified as 'insignificant', 7 (7.78 %) as 'significant', 9 (10 %) as 'very significant'. 68 (75.56 %) errors were 'spelling and grammar', 20 (22.22 %) 'missense' and 2 (2.22 %) 'nonsense'. 'Punctuation' error was most common sub-type, accounting for 27 errors (30 %). Complex imaging modalities had higher error rates per report and sentence. Computed tomography contained 0.040 errors per sentence compared to plain film with 0.030. Longer reports had a higher error rate, with reports >25 sentences containing an average of 1.23 errors per report compared to 0-5 sentences containing 0.09. These findings highlight the limitations of VR dictation software. While most error was deemed insignificant, there were occurrences of error with potential to alter report interpretation and patient management. Longer reports and reports on more complex imaging had higher error rates and this should be taken into account by the reporting radiologist.
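The percentages quoted in this abstract can be recomputed from its raw counts. The short Python sketch below does so as an arithmetic check; the variable and function names are illustrative only, and the counts are taken directly from the abstract (378 reports, 90 errors).

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


REPORTS_AUDITED = 378
ERRORS_FOUND = 90

# Counts quoted in the abstract, keyed by the abstract's own labels.
reports_by_severity = {"any error": 67, "significant": 7, "very significant": 9}
errors_by_severity = {"insignificant": 74, "significant": 7, "very significant": 9}
errors_by_type = {"spelling and grammar": 68, "missense": 20, "nonsense": 2}

report_rates = {k: percent(v, REPORTS_AUDITED) for k, v in reports_by_severity.items()}
severity_shares = {k: percent(v, ERRORS_FOUND) for k, v in errors_by_severity.items()}
type_shares = {k: percent(v, ERRORS_FOUND) for k, v in errors_by_type.items()}

# Both error breakdowns should account for every counted error.
assert sum(errors_by_severity.values()) == ERRORS_FOUND
assert sum(errors_by_type.values()) == ERRORS_FOUND

print(report_rates)
print(severity_shares)
print(type_shares)
```

The recomputed values agree with the figures in the abstract, and both the severity and the type breakdowns sum to the 90 errors reported.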
The persuasiveness of synthetic speech versus human speech.

PubMed

Stern, S E; Mullennix, J W; Dyson, C; Wilson, S J

1999-12-01

Is computer-synthesized speech as persuasive as the human voice when presenting an argument? After completing an attitude pretest, 193 participants were randomly assigned to listen to a persuasive appeal under three conditions: a high-quality synthesized speech system (DECtalk Express), a low-quality synthesized speech system (Monologue), and a tape recording of a human voice. Following the appeal, participants completed a posttest attitude survey and a series of questionnaires designed to assess perceptions of speech qualities, perceptions of the speaker, and perceptions of the message. The human voice was generally perceived more favorably than the computer-synthesized voice, and the speaker was perceived more favorably when the voice was a human voice than when it was computer synthesized. There was, however, no evidence that computerized speech, as compared with the human voice, affected persuasion or perceptions of the message. Actual or potential applications of this research include issues that should be considered when designing synthetic speech systems.

Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

NASA Technical Reports Server (NTRS)

Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

2010-01-01

A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It can enjoy a fast rate of data/text entry, small overall size, and can be lightweight. In addition, this design will free the hands and eyes of a suited crewmember. The system components and steps include beam forming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks when in the face of constraints in computational resources.
A Voice-Based E-Examination Framework for Visually Impaired Students in Open and Distance Learning

ERIC Educational Resources Information Center

Azeta, Ambrose A.; Inam, Itorobong A.; Daramola, Olawande

2018-01-01

Voice-based systems allow users access to information on the internet over a voice interface. Prior studies on Open and Distance Learning (ODL) e-examination systems that make use of voice interface do not sufficiently exhibit intelligent form of assessment, which diminishes the rigor of examination. The objective of this paper is to improve on…
An Analysis of Content Delivery Systems Using Speaking Voice, Speaking with Repetition Voice, Chanting Voice, and Singing Voice.

ERIC Educational Resources Information Center

Foster, Karen R.; Kersh, Mildred E.; Masztal, Nancy B.

This study investigated the way kindergarten classroom teachers delivered information to students to see if it affected the amount of information students could remember about the solar system. The study also examined whether this difference would be related to the degree of musical aptitude possessed by each student. The students were pretested…
The program complex for vocal recognition

NASA Astrophysics Data System (ADS)

Konev, Anton; Kostyuchenko, Evgeny; Yakimuk, Alexey

2017-01-01

This article discusses the possibility of applying a pitch-frequency determination algorithm to note recognition problems. A preliminary study of analogous programs offering a "music recognition" function was carried out. A software package based on the pitch-frequency calculation algorithm was implemented and tested. It was shown that the algorithm can recognize the notes in a user's vocal performance. The sound source can be a single musical instrument, a set of musical instruments, or a human voice humming a tune. The input file is either supplied in .wav format or recorded in this format from a microphone. Processing is performed by sequentially determining the pitch frequency and converting its values to notes. Based on the test results, modifications of the algorithms used in the complex are planned.
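The conversion step described above (pitch frequency to note) can be sketched as follows. This is a minimal illustration assuming twelve-tone equal temperament with A4 = 440 Hz; the abstract does not state the tuning reference or the mapping the authors used, and the function names are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_note(freq_hz, a4_hz=440.0):
    """Map a pitch frequency in Hz to the nearest equal-tempered note name."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # MIDI convention: note 69 is A4, with 12 semitones per octave.
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"


def pitch_track_to_notes(pitches_hz):
    """Turn per-frame pitch estimates into a note sequence.

    Unvoiced frames (0 or None) are skipped and consecutive
    repeats of the same note are merged into one entry.
    """
    notes = []
    for f in pitches_hz:
        if not f:
            continue
        name = frequency_to_note(f)
        if not notes or notes[-1] != name:
            notes.append(name)
    return notes


if __name__ == "__main__":
    print(pitch_track_to_notes([440.0, 441.0, 0, 261.63, 262.0]))
```

Pitch estimation itself (from the .wav samples) is outside this sketch; only the frequency-to-note mapping is shown.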
Design of digital voice storage and playback system

NASA Astrophysics Data System (ADS)

Tang, Chao

2018-03-01

Based on the STC89C52 chip, this paper presents a single-chip microcomputer minimum system that implements the logic control of a digital speech storage and playback system. Compared with a traditional tape voice recording system, the system has the advantages of small size and low power consumption, and it effectively addresses the limited usefulness of traditional voice recording systems in electronic and information processing applications.
Emotional Cues during Simultaneous Face and Voice Processing: Electrophysiological Insights

PubMed Central

Liu, Taosheng; Pinheiro, Ana; Zhao, Zhongxin; Nestor, Paul G.; McCarley, Robert W.; Niznikiewicz, Margaret A.

2012-01-01

Both facial expression and tone of voice represent key signals of emotional communication but their brain processing correlates remain unclear. Accordingly, we constructed a novel implicit emotion recognition task consisting of simultaneously presented human faces and voices with neutral, happy, and angry valence, within the context of recognizing monkey faces and voices task. To investigate the temporal unfolding of the processing of affective information from human face-voice pairings, we recorded event-related potentials (ERPs) to these audiovisual test stimuli in 18 normal healthy subjects; N100, P200, N250, P300 components were observed at electrodes in the frontal-central region, while P100, N170, P270 were observed at electrodes in the parietal-occipital region. Results indicated a significant audiovisual stimulus effect on the amplitudes and latencies of components in frontal-central (P200, P300, and N250) but not the parietal occipital region (P100, N170 and P270). Specifically, P200 and P300 amplitudes were more positive for emotional relative to neutral audiovisual stimuli, irrespective of valence, whereas N250 amplitude was more negative for neutral relative to emotional stimuli. No differentiation was observed between angry and happy conditions. The results suggest that the general effect of emotion on audiovisual processing can emerge as early as 200 msec (P200 peak latency) post stimulus onset, in spite of implicit affective processing task demands, and that such effect is mainly distributed in the frontal-central region. PMID:22383987
Andreas Vesalius' 500th Anniversary: Initial Integral Understanding of Voice Production.

PubMed

Brinkman, Romy J; Hage, J Joris

2017-01-01

Voice production relies on the integrated functioning of a three-part system: respiration, phonation and resonance, and articulation. To commemorate the 500th anniversary of the great anatomist Andreas Vesalius (1515-1564), we report on his understanding of this integral system. The text of Vesalius' masterpiece De Humani Corporis Fabrica Libri Septem and an eyewitness report of the public dissection of three corpses by Vesalius in Bologna, Italy, in 1540, were searched for references to the voice-producing anatomical structures and their function. We clustered the traced, separate parts for the first time. We found that Vesalius recognized the importance for voice production of many details of the respiratory system, the voice box, and various structures of resonance and articulation. He stressed that voice production was a cerebral function and extensively recorded the innervation of the voice-producing organs by the cranial nerves. Vesalius was the first to publicly record the concept of voice production as an integrated and cerebrally directed function of respiration, phonation and resonance, and articulation. In doing so nearly 500 years ago, he laid a firm basis for the understanding of the physiology of voice production and speech and its management as we know it today. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Can blind persons accurately assess body size from the voice?

PubMed

Pisanski, Katarzyna; Oleszkiewicz, Anna; Sorokowska, Agnieszka

2016-04-01

Vocal tract resonances provide reliable information about a speaker's body size that human listeners use for biosocial judgements as well as speech recognition. Although humans can accurately assess men's relative body size from the voice alone, how this ability is acquired remains unknown. In this study, we test the prediction that accurate voice-based size estimation is possible without prior audiovisual experience linking low frequencies to large bodies. Ninety-one healthy congenitally or early blind, late blind and sighted adults (aged 20-65) participated in the study. On the basis of vowel sounds alone, participants assessed the relative body sizes of male pairs of varying heights. Accuracy of voice-based body size assessments significantly exceeded chance and did not differ among participants who were sighted, or congenitally blind or who had lost their sight later in life. Accuracy increased significantly with relative differences in physical height between men, suggesting that both blind and sighted participants used reliable vocal cues to size (i.e. vocal tract resonances). Our findings demonstrate that prior visual experience is not necessary for accurate body size estimation. This capacity, integral to both nonverbal communication and speech perception, may be present at birth or may generalize from broader cross-modal correspondences. © 2016 The Author(s).
Can blind persons accurately assess body size from the voice?

PubMed Central

Oleszkiewicz, Anna; Sorokowska, Agnieszka

2016-01-01

Vocal tract resonances provide reliable information about a speaker's body size that human listeners use for biosocial judgements as well as speech recognition. Although humans can accurately assess men's relative body size from the voice alone, how this ability is acquired remains unknown. In this study, we test the prediction that accurate voice-based size estimation is possible without prior audiovisual experience linking low frequencies to large bodies. Ninety-one healthy congenitally or early blind, late blind and sighted adults (aged 20–65) participated in the study. On the basis of vowel sounds alone, participants assessed the relative body sizes of male pairs of varying heights. Accuracy of voice-based body size assessments significantly exceeded chance and did not differ among participants who were sighted, or congenitally blind or who had lost their sight later in life. Accuracy increased significantly with relative differences in physical height between men, suggesting that both blind and sighted participants used reliable vocal cues to size (i.e. vocal tract resonances). Our findings demonstrate that prior visual experience is not necessary for accurate body size estimation. This capacity, integral to both nonverbal communication and speech perception, may be present at birth or may generalize from broader cross-modal correspondences. PMID:27095264
Automation of Command and Data Entry in a Glovebox Work Volume: An Evaluation of Data Entry Devices

NASA Technical Reports Server (NTRS)

Steele, Marianne K.; Nakamura, Gail; Havens, Cindy; LeMay, Moira

1996-01-01

The present study was designed to examine the human-computer interface for data entry while performing experimental procedures within a glovebox work volume in order to make a recommendation to the Space Station Biological Research Project for a data entry system to be used within the Life Sciences Glovebox. Test subjects entered data using either a manual keypad, similar to a standard computer numerical keypad located within the glovebox work volume, or a voice input system using a speech recognition program with a microphone headset. Numerical input and commands were programmed in an identical manner between the two systems. With both electronic systems, a small trackball was available within the work volume for cursor control. Data, such as sample vial identification numbers, sample tissue weights, and health check parameters of the specimen, were entered directly into procedures that were electronically displayed on a video monitor within the glovebox. A pen and paper system with a 'flip-chart' format for procedure display, similar to that currently in use on the Space Shuttle, was used as a baseline data entry condition. Procedures were performed by a single operator; eight test subjects were used in the study. The electronic systems were tested under both a 'nominal' or 'anomalous' condition. The anomalous condition was introduced into the experimental procedure to increase the probability of finding limitations or problems with human interactions with the electronic systems. Each subject performed five test runs during a test day: two procedures each with voice and keypad, one with and one without anomalies, and one pen and paper procedure. The data collected were both quantitative (times, errors) and qualitative (subjective ratings of the subjects).
Utilization of Internet Protocol-Based Voice Systems in Remote Payload Operations

NASA Technical Reports Server (NTRS)

Best, Susan; Nichols, Kelvin; Bradford, Robert

2003-01-01

This viewgraph presentation provides an overview of a proposed voice communication system for use in remote payload operations performed on the International Space Station. The system, Internet Voice Distribution System (IVoDS), would make use of existing Internet protocols, and offer a number of advantages over the system currently in use. Topics covered include: system description and operation, system software and hardware, system architecture, project status, and technology transfer applications.
Acoustical and Intelligibility Test of the Vocera(Copyright) B3000 Communication Badge

NASA Technical Reports Server (NTRS)

Archer, Ronald; Litaker, Harry; Chu, Shao-Sheng R.; Simon, Cory; Romero, Andy; Moses, Haifa

2012-01-01

To communicate with each other or ground support, crew members on board the International Space Station (ISS) currently use the Audio Terminal Units (ATU), which are located in each ISS module. However, to use the ATU, crew members must stop their current activity, travel to a panel, and speak into a wall-mounted microphone, or use either a handheld microphone or a Crew Communication Headset that is connected to a panel. These actions may unnecessarily increase task times, lower productivity, create cable management issues, and thus increase crew frustration. Therefore, the Habitability and Human Factors and Human Interface Branches at the NASA Johnson Space Center (JSC) are currently investigating a commercial-off-the-shelf (COTS) wireless communication system, Vocera(C), as a near-term solution for ISS communication. The objectives of the acoustics and intelligibility testing of this system were to answer the following questions: 1. How intelligibly can a human hear the transmitted message from a Vocera(C) badge in three different noise environments (Baseline = 20 dB, US Lab Module = 58 dB, Russian Module = 70.6 dB)? 2. How accurate is the Vocera(C) badge at recognizing voice commands in three different noise environments? 3. What body location (chest, upper arm, or shoulder) is optimal for speech intelligibility and voice recognition accuracy of the Vocera(C) badge on a human in three different noise environments?
Influence of phonetic context on the dysphonic event: contribution of new methodologies for the analysis of pathological voice.

PubMed

Revis, J; Galant, C; Fredouille, C; Ghio, A; Giovanni, A

2012-01-01

Widely studied in terms of perception, acoustics or aerodynamics, dysphonia nevertheless remains a speech phenomenon, closely related to the phonetic composition of the message conveyed by the voice. In this paper, we present a series of three studies that aim to understand the implications of the phonetic manifestation of dysphonia. Our first study proposes a new approach to the perceptual analysis of dysphonia (phonetic labeling), whose principle is to listen to and evaluate each phoneme in a sentence separately. This study confirms Laver's hypothesis that dysphonia is not a constant noise added to the speech signal, but a discontinuous phenomenon, occurring on certain phonemes, depending on the phonetic context. However, the burden of executing the task led us to look to the techniques of automatic speaker recognition (ASR) to automate the procedure. In collaboration with the LIA, we developed a system for the automatic classification of dysphonia based on ASR techniques. This is the subject of our second study. The first results obtained with this system suggest that the unvoiced consonants show predominant performance in the task of automatic classification of dysphonia. This result is surprising since it is often assumed that dysphonia occurs only on laryngeal vibration. We began looking for explanations of this phenomenon, and we present our hypotheses and experiments in the third study.
Secure access to patient's health records using SpeechXRays a multi-channel biometrics platform for user authentication.

PubMed

Spanakis, Emmanouil G; Spanakis, Marios; Karantanas, Apostolos; Marias, Kostas

2016-08-01

The most commonly used method for user authentication in ICT services or systems is the application of identification tools such as passwords or personal identification numbers (PINs). The rapid development of ICT for smart devices (laptops, tablets and smartphones) has also allowed the advance of hardware components that capture several biometric traits such as fingerprints and voice. These components aim, among other things, to overcome the weaknesses and flaws of password usage by improving user authentication with a higher level of security, privacy and usability. In this respect, the application of biometrics for secure user authentication for access to systems with sensitive data (i.e. patient data from electronic health records) shows great potential. SpeechXRays aims to provide a user recognition platform based on biometrics of voice acoustics analysis and audio-visual identity verification. Among other uses, the platform is intended as an authentication tool for medical personnel to gain specific access to patients' electronic health records. In this work, a short description of the SpeechXRays implementation tool for eHealth is provided and analyzed. This study explores security and privacy issues, and offers a comprehensive overview of biometrics technology applications in addressing eHealth security challenges. We present and describe the necessary requirements for an eHealth platform concerning biometric security.
Internet-Based System for Voice Communication With the ISS

NASA Technical Reports Server (NTRS)

Chamberlain, James; Myers, Gerry; Clem, David; Speir, Terri

2005-01-01

The Internet Voice Distribution System (IVoDS) is a voice-communication system that comprises mainly computer hardware and software. The IVoDS was developed to supplement and eventually replace the Enhanced Voice Distribution System (EVoDS), which, heretofore, has constituted the terrestrial subsystem of a system for voice communications among crewmembers of the International Space Station (ISS), workers at the Payloads Operations Center at Marshall Space Flight Center, principal investigators at diverse locations who are responsible for specific payloads, and others. The IVoDS utilizes a communication infrastructure of NASA and NASA-related intranets in addition to, as its name suggests, the Internet. Whereas the EVoDS utilizes traditional circuit-switched telephony, the IVoDS is a packet-data system that utilizes a voice over Internet protocol (VOIP). Relative to the EVoDS, the IVoDS offers advantages of greater flexibility and lower cost for expansion and reconfiguration. The IVoDS is an extended version of a commercial Internet-based voice conferencing system that enables each user to participate in only one conference at a time. In the IVoDS, a user can receive audio from as many as eight conferences simultaneously while sending audio to one of them. The IVoDS also incorporates administrative controls, beyond those of the commercial system, that provide greater security and control of the capabilities and authorizations for talking and listening afforded to each user.
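The listen/talk rule stated above (audio received from up to eight conferences, audio sent to one of them) can be modeled with a small data structure. The sketch below is purely illustrative: the class `UserSession`, its method names, and the conference identifiers are hypothetical and say nothing about how the IVoDS software is actually built.

```python
MAX_LISTEN = 8  # from the abstract: audio from as many as eight conferences


class UserSession:
    """Hypothetical model of one user's conference membership."""

    def __init__(self):
        self.listening = set()  # conferences the user receives audio from
        self.talking = None     # the single conference the user sends audio to

    def listen(self, conference):
        """Start receiving audio from a conference, up to the limit."""
        if conference not in self.listening and len(self.listening) >= MAX_LISTEN:
            raise ValueError(f"cannot monitor more than {MAX_LISTEN} conferences")
        self.listening.add(conference)

    def talk(self, conference):
        """Send audio to exactly one of the monitored conferences."""
        if conference not in self.listening:
            raise ValueError("can only talk on a conference being monitored")
        self.talking = conference  # replaces any previous talk selection


if __name__ == "__main__":
    session = UserSession()
    for i in range(MAX_LISTEN):
        session.listen(f"conf-{i}")
    session.talk("conf-3")
    print(sorted(session.listening), session.talking)
```

The administrative controls mentioned in the abstract (per-user talk and listen authorizations) would sit on top of checks like these.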
Voice/Natural Language Interfacing for Robotic Control.

DTIC Science & Technology

1987-11-01

...until major computing power can be profitably allocated to the speech recognition process, off-the-shelf units will never have sufficient intelligence to...coordinate transformation for a location, and opening or closing the gripper’s toggles. External to world operations, each joint may be rotated
The U.S.-China E-Language Project: A Study of a Gaming Approach to English Language Learning for Middle School Students

ERIC Educational Resources Information Center

Green, Patricia J.; Sha, Mandy; Liu, Lu

2011-01-01

In 2001, the U.S. Department of Education and the Ministry of Education in China entered into a bilateral partnership to develop a technology-driven approach to foreign language learning that integrated gaming, immersion, voice recognition, problem-based learning tasks, and other features that made it a significant research and development pilot…
Improving Automated Lexical and Discourse Analysis of Online Chat Dialog

DTIC Science & Technology

2007-09-01

include spelling- and grammar-checking on our word processing software; voice-recognition in our automobiles; and telephone-based conversational agents ...conversational agents can help customers make purchases on-line [3]. In addition, discourse analyzers can automatically separate multiple, interleaved...telephone-based conversational agent needs to know if it was asked a question or tasked to do something. Indeed, Stolcke et al demonstrated that
NWR (National Weather Service) voice synthesis project, phase 1

NASA Astrophysics Data System (ADS)

Sampson, G. W.

1986-01-01

The purpose of the NOAA Weather Radio (NWR) Voice Synthesis Project is to provide a demonstration of the current voice synthesis technology. Phase 1 of this project is presented, providing a complete automation of an hourly surface aviation observation for broadcast over NWR. In examining the products currently available on the market, the decision was made that synthetic voice technology does not have the high quality speech required for broadcast over the NWR. Therefore the system presented uses the phrase concatenation type of technology for a very high quality, versatile, voice synthesis system.
Identification and human condition analysis based on the human voice analysis

NASA Astrophysics Data System (ADS)

Mieshkov, Oleksandr Yu.; Novikov, Oleksandr O.; Novikov, Vsevolod O.; Fainzilberg, Leonid S.; Kotyra, Andrzej; Smailova, Saule; Kozbekova, Ainur; Imanbek, Baglan

2017-08-01

The paper presents a two-stage biotechnical system for human condition analysis that is based on analysis of human voice signal. At the initial stage, the voice signal is pre-processed and its characteristics in time domain are determined. At the first stage, the developed system is capable of identifying the person in the database on the basis of the extracted characteristics. At the second stage, the model of a human voice is built on the basis of the real voice signals after clustering the whole database.

Using Natural Language to Enhance Mission Effectiveness

NASA Technical Reports Server (NTRS)

Trujillo, Anna C.; Meszaros, Erica

2016-01-01

The availability of highly capable, yet relatively cheap, unmanned aerial vehicles (UAVs) is opening up new areas of use for hobbyists and for professional-related activities. The driving function of this research is allowing a non-UAV pilot, an operator, to define and manage a mission. This paper describes the preliminary usability measures of an interface that allows an operator to define the mission using speech to make inputs. An experiment was conducted to begin to enumerate the efficacy and user acceptance of using voice commands to define a multi-UAV mission and to provide high-level vehicle control commands such as "takeoff." The primary independent variable was input type - voice or mouse. The primary dependent variables consisted of the correctness of the mission parameter inputs and the time needed to make all inputs. Other dependent variables included NASA-TLX workload ratings and subjective ratings on a final questionnaire. The experiment required each subject to fill in an online form that contained comparable required information that would be needed for a package dispatcher to deliver packages. For each run, subjects typed in a simple numeric code for the package code. They then defined the initial starting position, the delivery location, and the return location using either pull-down menus or voice input. Voice input was accomplished using CMU Sphinx4-5prealpha for speech recognition. They then inputted the length of the package. These were the option fields. The subject had the system "Calculate Trajectory" and then "Takeoff" once the trajectory was calculated. Later, the subject used "Land" to finish the run. After the voice and mouse input blocked runs, subjects completed a NASA-TLX. At the conclusion of all runs, subjects completed a questionnaire asking them about their experience in inputting the mission parameters, and starting and stopping the mission using mouse and voice input. In general, the usability of voice commands is acceptable. 
With a relatively well-defined and simple vocabulary, the operator can input the vast majority of the mission parameters using simple, intuitive voice commands. However, voice input may be more applicable to initial mission specification rather than for critical commands such as the need to land immediately due to time and feedback constraints. It would also be convenient to retrieve relevant mission information using voice input. Therefore, further on-going research is looking at using intent from operator utterances to provide the relevant mission information to the operator. The information displayed will be inferred from the operator's utterances just before key phrases are spoken. Linguistic analysis of the context of verbal communication provides insight into the intended meaning of commonly heard phrases such as "What's it doing now?" Analyzing the semantic sphere surrounding these common phrases enables us to predict the operator's intent and supply the operator's desired information to the interface. This paper also describes preliminary investigations into the generation of the semantic space of UAV operation and the success at providing information to the interface based on the operator's utterances.
Initial Progress Toward Development of a Voice-Based Computer-Delivered Motivational Intervention for Heavy Drinking College Students: An Experimental Study.

PubMed

Kahler, Christopher W; Lechner, William J; MacGlashan, James; Wray, Tyler B; Littman, Michael L

2017-06-28

Computer-delivered interventions have been shown to be effective in reducing alcohol consumption in heavy drinking college students. However, these computer-delivered interventions rely on mouse, keyboard, or touchscreen responses for interactions between the users and the computer-delivered intervention. The principles of motivational interviewing suggest that in-person interventions may be effective, in part, because they encourage individuals to think through and speak aloud their motivations for changing a health behavior, which current computer-delivered interventions do not allow. The objective of this study was to take the initial steps toward development of a voice-based computer-delivered intervention that can ask open-ended questions and respond appropriately to users' verbal responses, more closely mirroring a human-delivered motivational intervention. We developed (1) a voice-based computer-delivered intervention that was run by a human controller and that allowed participants to speak their responses to scripted prompts delivered by speech generation software and (2) a text-based computer-delivered intervention that relied on the mouse, keyboard, and computer screen for all interactions. We randomized 60 heavy drinking college students to interact with the voice-based computer-delivered intervention and 30 to interact with the text-based computer-delivered intervention and compared their ratings of the systems as well as their motivation to change drinking and their drinking behavior at 1-month follow-up. Participants reported that the voice-based computer-delivered intervention engaged positively with them in the session and delivered content in a manner consistent with motivational interviewing principles. 
At 1-month follow-up, participants in the voice-based computer-delivered intervention condition reported significant decreases in quantity, frequency, and problems associated with drinking, and increased perceived importance of changing drinking behaviors. In comparison to the text-based computer-delivered intervention condition, those assigned to voice-based computer-delivered intervention reported significantly fewer alcohol-related problems at the 1-month follow-up (incident rate ratio 0.60, 95% CI 0.44-0.83, P=.002). The conditions did not differ significantly on perceived importance of changing drinking or on measures of drinking quantity and frequency of heavy drinking. Results indicate that it is feasible to construct a series of open-ended questions and a bank of responses and follow-up prompts that can be used in a future fully automated voice-based computer-delivered intervention that may mirror more closely human-delivered motivational interventions to reduce drinking. Such efforts will require using advanced speech recognition capabilities and machine-learning approaches to train a program to mirror the decisions made by human controllers in the voice-based computer-delivered intervention used in this study. In addition, future studies should examine enhancements that can increase the perceived warmth and empathy of voice-based computer-delivered intervention, possibly through greater personalization, improvements in the speech generation software, and embodying the computer-delivered intervention in a physical form. ©Christopher W Kahler, William J Lechner, James MacGlashan, Tyler B Wray, Michael L Littman. Originally published in JMIR Mental Health (http://mental.jmir.org), 28.06.2017.
Optical gesture sensing and depth mapping technologies for head-mounted displays: an overview

NASA Astrophysics Data System (ADS)

Kress, Bernard; Lee, Johnny

2013-05-01

Head Mounted Displays (HMDs), and especially see-through HMDs, have gained renewed interest in recent times, and for the first time outside the traditional military and defense realm, as several high-profile consumer electronics companies have presented products about to hit the market. Consumer electronics HMDs have quite different requirements and constraints from their military counterparts. Voice commands are the de facto interface for such devices, but when voice recognition does not work (for example, when there is no connection to the cloud), trackpad and gesture sensing technologies have to be used to communicate information to the device. In this paper we review the various technologies developed today that integrate optical gesture sensing in a small footprint, as well as the various related 3D depth mapping sensors.
Design and realization of intelligent tourism service system based on voice interaction

NASA Astrophysics Data System (ADS)

Hu, Lei-di; Long, Yi; Qian, Cheng-yang; Zhang, Ling; Lv, Guo-nian

2008-10-01

Voice technology is an important means of making tourism service systems more intelligent and user-friendly. Incorporating voice technology, the paper focuses on application needs and system composition to present an overall framework for an intelligent tourism service system, consisting of a presentation layer, a Web services layer, and a tourism application service layer. On this basis, the paper further elaborates the implementation of the system and its key technologies, including intelligent voice interaction, seamless integration of multiple data sources, location-aware guide services, and tourism safety control. Finally, a prototype Tourism Services System is implemented for the case of Nanjing tourism.
Vocabulary Learning in a Yorkshire Terrier: Slow Mapping of Spoken Words

PubMed Central

Griebel, Ulrike; Oller, D. Kimbrough

2012-01-01

Rapid vocabulary learning in children has been attributed to “fast mapping”, with new words often claimed to be learned through a single presentation. As reported in 2004 in Science a border collie (Rico) not only learned to identify more than 200 words, but fast mapped the new words, remembering meanings after just one presentation. Our research tests the fast mapping interpretation of the Science paper based on Rico's results, while extending the demonstration of large vocabulary recognition to a lap dog. We tested a Yorkshire terrier (Bailey) with the same procedures as Rico, illustrating that Bailey accurately retrieved randomly selected toys from a set of 117 on voice command of the owner. Second we tested her retrieval based on two additional voices, one male, one female, with different accents that had never been involved in her training, again showing she was capable of recognition by voice command. Third, we did both exclusion-based training of new items (toys she had never seen before with names she had never heard before) embedded in a set of known items, with subsequent retention tests designed as in the Rico experiment. After Bailey succeeded on exclusion and retention tests, a crucial evaluation of true mapping tested items previously successfully retrieved in exclusion and retention, but now pitted against each other in a two-choice task. Bailey failed on the true mapping task repeatedly, illustrating that the claim of fast mapping in Rico had not been proven, because no true mapping task had ever been conducted with him. It appears that the task called retention in the Rico study only demonstrated success in retrieval by a process of extended exclusion. PMID:22363421
A long distance voice transmission system based on the white light LED

NASA Astrophysics Data System (ADS)

Tian, Chunyu; Wei, Chang; Wang, Yulian; Wang, Dachi; Yu, Benli; Xu, Feng

2017-10-01

A long distance voice transmission system based on visible light communication technology (VLCT) is proposed in the paper. Our proposed system includes a transmitter, a receiver, and voice signal processing on a single-chip microcomputer. In the compact-sized LED transmitter, we use on-off keying with non-return-to-zero coding (OOK-NRZ) to easily realize high speed modulation, which reduces system complexity. A voice transmission system with low noise and a wide modulation band is achieved by designing a high-efficiency receiving optical path and using filters to reduce noise from the surrounding light. To improve the speed of the signal processing, we use a single-chip microcomputer to code and decode the voice signal. Furthermore, a serial peripheral interface (SPI) is adopted to accurately transmit voice signal data. The test results of our proposed system show that the transmission distance of this system is more than 100 meters with a maximum data rate of 1.5 Mbit/s and an SNR of 30 dB. This system has many advantages, such as simple construction, low cost and strong practicality. Therefore, it has extensive application prospects in fields such as emergency communication and indoor wireless communication.
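As a rough illustration of the OOK-NRZ scheme named in this abstract, the sketch below maps each bit to a constant LED level held for the whole bit period and recovers the bits by averaging and thresholding. It is not the authors' firmware: the function names, the `samples_per_bit` parameter, and the 0.5 threshold are all assumptions chosen for the example.

```python
def bytes_to_bits(data):
    """Unpack bytes MSB-first into a list of 0/1 integers."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_bytes(bits):
    """Pack a list of 0/1 integers (MSB-first) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def ook_nrz_modulate(bits, samples_per_bit=4):
    """Hold the LED fully on (1.0) or off (0.0) for each whole bit period."""
    signal = []
    for bit in bits:
        level = 1.0 if bit else 0.0  # NRZ: no return to zero inside a bit
        signal.extend([level] * samples_per_bit)
    return signal


def ook_nrz_demodulate(signal, samples_per_bit=4, threshold=0.5):
    """Average each bit period and compare with a threshold (assumed 0.5)."""
    bits = []
    for start in range(0, len(signal), samples_per_bit):
        chunk = signal[start:start + samples_per_bit]
        bits.append(1 if sum(chunk) / len(chunk) > threshold else 0)
    return bits
```

On a noiseless channel, `bits_to_bytes(ook_nrz_demodulate(ook_nrz_modulate(bytes_to_bits(payload))))` returns the original payload. A real receiver would also need clock recovery and ambient-light filtering, which this sketch leaves out.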
78 FR 71676 - Submission for Review: 3206-0201, Federal Employees Health Benefits (FEHB) Open Season Express...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-11-29

... (FEHB) Open Season Express Interactive Voice Response (IVR) System and Open Season Web site AGENCY: U.S... Benefits (FEHB) Open Season Express Interactive Voice Response (IVR) System and the Open Season Web site... Season Express Interactive Voice Response (IVR) System, and the Open Season Web site, Open Season Online...
Effects of the Voice over Internet Protocol on Perturbation Analysis of Normal and Pathological Phonation

PubMed Central

Zhu, Yanmei; Witt, Rachel E.; MacCallum, Julia K.; Jiang, Jack J.

2010-01-01

Objective: In this study, a Voice over Internet Protocol (VoIP) communication based on G.729 protocol was simulated to determine the effects of this system on acoustic perturbation parameters of normal and pathological voice signals. Patients and Methods: Fifty recordings of normal voice and 48 recordings of pathological voice affected by laryngeal paralysis were transmitted through a VoIP communication system. The acoustic analysis programs of CSpeech and MDVP were used to determine the percent jitter and percent shimmer from the voice samples before and after VoIP transmission. The effects of three frequently used audio compression protocols (MP3, WMA, and FLAC) on the perturbation measures were also studied. Results: It was found that VoIP transmission disrupts the waveform and increases the percent jitter and percent shimmer of voice samples. However, after VoIP transmission, significant discrimination between normal and pathological voices affected by laryngeal paralysis was still possible. It was found that the lossless compression method FLAC does not exert any influence on the perturbation measures. The lossy compression methods MP3 and WMA increase percent jitter and percent shimmer values. Conclusion: This study validates the feasibility of these transmission and compression protocols in developing remote voice signal data collection and assessment systems. PMID:20588051
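For readers unfamiliar with the perturbation measures in this abstract, the sketch below uses a common textbook definition. Percent jitter is the mean absolute difference between consecutive glottal periods as a percentage of the mean period, and percent shimmer applies the same formula to cycle peak amplitudes. This is an assumption about the general form only; CSpeech and MDVP may differ in detail, and the function name and sample values are hypothetical.

```python
def percent_perturbation(values):
    """Mean absolute cycle-to-cycle difference as a percentage of the mean.

    Pass glottal periods (seconds) to get percent jitter, or per-cycle
    peak amplitudes to get percent shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical cycle measurements, for illustration only.
periods = [0.0100, 0.0101, 0.0099, 0.0100]   # seconds per cycle
amplitudes = [0.80, 0.82, 0.79, 0.81]        # arbitrary linear units
jitter = percent_perturbation(periods)
shimmer = percent_perturbation(amplitudes)
```

A perfectly periodic signal gives 0 for both measures. Anything that distorts cycle boundaries or peak levels, such as lossy coding, tends to raise them.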
How Psychological Stress Affects Emotional Prosody.

PubMed

Paulmann, Silke; Furnes, Desire; Bøkenes, Anne Ming; Cozzolino, Philip J

2016-01-01

We explored how experimentally induced psychological stress affects the production and recognition of vocal emotions. In Study 1a, we demonstrate that sentences spoken by stressed speakers are judged by naïve listeners as sounding more stressed than sentences uttered by non-stressed speakers. In Study 1b, negative emotions produced by stressed speakers are generally less well recognized than the same emotions produced by non-stressed speakers. Multiple mediation analyses suggest this poorer recognition of negative stimuli was due to a mismatch between the variation of volume voiced by speakers and the range of volume expected by listeners. Together, this suggests that the stress level of the speaker affects judgments made by the receiver. In Study 2, we demonstrate that participants who were induced with a feeling of stress before carrying out an emotional prosody recognition task performed worse than non-stressed participants. Overall, findings suggest detrimental effects of induced stress on interpersonal sensitivity.
How Psychological Stress Affects Emotional Prosody

PubMed Central

Paulmann, Silke; Furnes, Desire; Bøkenes, Anne Ming; Cozzolino, Philip J.

2016-01-01

We explored how experimentally induced psychological stress affects the production and recognition of vocal emotions. In Study 1a, we demonstrate that sentences spoken by stressed speakers are judged by naïve listeners as sounding more stressed than sentences uttered by non-stressed speakers. In Study 1b, negative emotions produced by stressed speakers are generally less well recognized than the same emotions produced by non-stressed speakers. Multiple mediation analyses suggest this poorer recognition of negative stimuli was due to a mismatch between the variation of volume voiced by speakers and the range of volume expected by listeners. Together, this suggests that the stress level of the speaker affects judgments made by the receiver. In Study 2, we demonstrate that participants who were induced with a feeling of stress before carrying out an emotional prosody recognition task performed worse than non-stressed participants. Overall, findings suggest detrimental effects of induced stress on interpersonal sensitivity. PMID:27802287
Recognition memory and awareness: occurrence of perceptual effects in remembering or in knowing depends on conscious resources at encoding, but not at retrieval.

PubMed

Gardiner, John M; Gregg, Vernon H; Karayianni, Irene

2006-03-01

We report four experiments in which a remember-know paradigm was combined with a response deadline procedure in order to assess memory awareness in fast, as compared with slow, recognition judgments. In the experiments, we also investigated the perceptual effects of study-test congruence, either for picture size or for speaker's voice, following either full or divided attention at study. These perceptual effects occurred in remembering with full attention and in knowing with divided attention, but they were uninfluenced by recognition speed, indicating that their occurrence in remembering or knowing depends more on conscious resources at encoding than on those at retrieval. The results have implications for theoretical accounts of remembering and knowing that assume that remembering is more consciously controlled and effortful, whereas knowing is more automatic and faster.
Apollo experience report: Voice communications techniques and performance

NASA Technical Reports Server (NTRS)

Dabbs, J. H.; Schmidt, O. L.

1972-01-01

The primary performance requirement of the spaceborne Apollo voice communications system is percent word intelligibility, which is related to other link/channel parameters. The effect of percent word intelligibility on voice channel design and a description of the verification procedures are included. Development and testing performance problems and the techniques used to solve the problems are also discussed. Voice communications performance requirements should be comprehensive and verified easily; the total system must be considered in component design, and the necessity of voice processing and the associated effect on noise, distortion, and cross talk should be examined carefully.
Intelligent electrical harness connector assembly using Bell Helicopter Textron's 'Wire Harness Automated Manufacturing System'

NASA Astrophysics Data System (ADS)

Springer, D. W.

Bell Helicopter Textron, Incorporated (BHTI) installed two Digital Equipment Corporation PDP-11 computers and an American Can Inc. Ink Jet printer in 1980 as the cornerstone of the Wire Harness Automated Manufacturing System (WHAMS). WHAMS is based upon the electrical assembly philosophy of continuous filament harness forming. This installation provided BHTI with a 3 to 1 return-on-investment by reducing wire and cable identification cycle time by 80 percent and harness forming, on dedicated layout tooling, by 40 percent. Yet, this improvement in harness forming created a bottleneck in connector assembly. To remove this bottleneck, BHTI has installed a prototype connector assembly cell that integrates the WHAMS database and innovative computer technologies to cut harness connector assembly cycle time. This novel connector assembly cell uses voice recognition, laser identification, and animated computer graphics to help the electrician in the correct assembly of harness connectors.
Multi-modal assessment of on-road demand of voice and manual phone calling and voice navigation entry across two embedded vehicle systems.

PubMed

Mehler, Bruce; Kidd, David; Reimer, Bryan; Reagan, Ian; Dobres, Jonathan; McCartt, Anne

2016-03-01

One purpose of integrating voice interfaces into embedded vehicle systems is to reduce drivers' visual and manual distractions with 'infotainment' technologies. However, there is scant research on actual benefits in production vehicles or how different interface designs affect attentional demands. Driving performance, visual engagement, and indices of workload (heart rate, skin conductance, subjective ratings) were assessed in 80 drivers randomly assigned to drive a 2013 Chevrolet Equinox or Volvo XC60. The Chevrolet MyLink system allowed completing tasks with one voice command, while the Volvo Sensus required multiple commands to navigate the menu structure. When calling a phone contact, both voice systems reduced visual demand relative to the visual-manual interfaces, with reductions for drivers in the Equinox being greater. The Equinox 'one-shot' voice command showed advantages during contact calling but had significantly higher error rates than Sensus during destination address entry. For both secondary tasks, neither voice interface entirely eliminated visual demand. Practitioner Summary: The findings reinforce the observation that most, if not all, automotive auditory-vocal interfaces are multi-modal interfaces in which the full range of potential demands (auditory, vocal, visual, manipulative, cognitive, tactile, etc.) need to be considered in developing optimal implementations and evaluating drivers' interaction with the systems. Social Media: In-vehicle voice-interfaces can reduce visual demand but do not eliminate it and all types of demand need to be taken into account in a comprehensive evaluation.
Multi-modal assessment of on-road demand of voice and manual phone calling and voice navigation entry across two embedded vehicle systems

PubMed Central

Mehler, Bruce; Kidd, David; Reimer, Bryan; Reagan, Ian; Dobres, Jonathan; McCartt, Anne

2016-01-01

One purpose of integrating voice interfaces into embedded vehicle systems is to reduce drivers’ visual and manual distractions with ‘infotainment’ technologies. However, there is scant research on actual benefits in production vehicles or how different interface designs affect attentional demands. Driving performance, visual engagement, and indices of workload (heart rate, skin conductance, subjective ratings) were assessed in 80 drivers randomly assigned to drive a 2013 Chevrolet Equinox or Volvo XC60. The Chevrolet MyLink system allowed completing tasks with one voice command, while the Volvo Sensus required multiple commands to navigate the menu structure. When calling a phone contact, both voice systems reduced visual demand relative to the visual–manual interfaces, with reductions for drivers in the Equinox being greater. The Equinox ‘one-shot’ voice command showed advantages during contact calling but had significantly higher error rates than Sensus during destination address entry. For both secondary tasks, neither voice interface entirely eliminated visual demand. Practitioner Summary: The findings reinforce the observation that most, if not all, automotive auditory–vocal interfaces are multi-modal interfaces in which the full range of potential demands (auditory, vocal, visual, manipulative, cognitive, tactile, etc.) need to be considered in developing optimal implementations and evaluating drivers’ interaction with the systems. Social Media: In-vehicle voice-interfaces can reduce visual demand but do not eliminate it and all types of demand need to be taken into account in a comprehensive evaluation. PMID:26269281
White House Communications Agency (WHCA) Presidential Voice Communications Rack Mount System Mechanical Drawing Package

DTIC Science & Technology

2015-12-01

Rack Mount System Mechanical Drawing Package by Steven P Callaway Approved for public release; distribution unlimited...Laboratory White House Communications Agency (WHCA) Presidential Voice Communications Rack Mount System Mechanical Drawing Package by Steven P...Note 3. DATES COVERED (From - To) 04/2013 4. TITLE AND SUBTITLE White House Communications Agency (WHCA) Presidential Voice Communications Rack
Central Nervous System Control of Voice and Swallowing

PubMed Central

Ludlow, Christy L.

2015-01-01

This review of the central nervous control systems for voice and swallowing has suggested that the traditional concepts of a separation between cortical and limbic and brain stem control should be refined and more integrative. For voice production, a separation of the non-human vocalization system from the human learned voice production system has been posited based primarily on studies of non-human primates. However, recent human studies of emotionally based vocalizations and human volitional voice production have shown more integration between these two systems than previously proposed. Recent human studies have shown that reflexive vocalization as well as learned voice production not involving speech involve a common integrative system. On the other hand, recent studies of non-human primates have provided evidence of some cortical activity during vocalization and cortical changes with training during vocal behavior. For swallowing, evidence from the macaque and functional brain imaging in humans indicates that the control for the pharyngeal phase of swallowing is not primarily under brain stem mechanisms as previously proposed. Studies suggest that the initiation and patterning of swallowing for the pharyngeal phase is also under active cortical control for both spontaneous and volitional swallowing in awake humans and non-human primates. PMID:26241238
Neurobiological correlates of emotional intelligence in voice and face perception networks

PubMed Central

Karle, Kathrin N; Ethofer, Thomas; Jacob, Heike; Brück, Carolin; Erb, Michael; Lotze, Martin; Nizielski, Sophia; Schütz, Astrid; Wildgruber, Dirk; Kreifelts, Benjamin

2018-01-01

Facial expressions and voice modulations are among the most important communicational signals to convey emotional information. The ability to correctly interpret this information is highly relevant for successful social interaction and represents an integral component of emotional competencies that have been conceptualized under the term emotional intelligence. Here, we investigated the relationship of emotional intelligence as measured with the Salovey-Caruso-Emotional-Intelligence-Test (MSCEIT) with cerebral voice and face processing using functional and structural magnetic resonance imaging. MSCEIT scores were positively correlated with increased voice-sensitivity and gray matter volume of the insula accompanied by voice-sensitivity enhanced connectivity between the insula and the temporal voice area, indicating generally increased salience of voices. Conversely, in the face processing system, higher MSCEIT scores were associated with decreased face-sensitivity and gray matter volume of the fusiform face area. Taken together, these findings point to an alteration in the balance of cerebral voice and face processing systems in the form of an attenuated face-vs-voice bias as one potential factor underpinning emotional intelligence. PMID:29365199
Neurobiological correlates of emotional intelligence in voice and face perception networks.

PubMed

Karle, Kathrin N; Ethofer, Thomas; Jacob, Heike; Brück, Carolin; Erb, Michael; Lotze, Martin; Nizielski, Sophia; Schütz, Astrid; Wildgruber, Dirk; Kreifelts, Benjamin

2018-02-01

Facial expressions and voice modulations are among the most important communicational signals to convey emotional information. The ability to correctly interpret this information is highly relevant for successful social interaction and represents an integral component of emotional competencies that have been conceptualized under the term emotional intelligence. Here, we investigated the relationship of emotional intelligence as measured with the Salovey-Caruso-Emotional-Intelligence-Test (MSCEIT) with cerebral voice and face processing using functional and structural magnetic resonance imaging. MSCEIT scores were positively correlated with increased voice-sensitivity and gray matter volume of the insula accompanied by voice-sensitivity enhanced connectivity between the insula and the temporal voice area, indicating generally increased salience of voices. Conversely, in the face processing system, higher MSCEIT scores were associated with decreased face-sensitivity and gray matter volume of the fusiform face area. Taken together, these findings point to an alteration in the balance of cerebral voice and face processing systems in the form of an attenuated face-vs-voice bias as one potential factor underpinning emotional intelligence.
Joint Sparse Representation for Robust Multimodal Biometrics Recognition

DTIC Science & Technology

2014-01-01

comprehensive multimodal dataset and a face database are described in section V. Finally, in section VI, we discuss the computational complexity of...fingerprint, iris, palmprint, hand geometry and voice from subjects of different age, gender and ethnicity as described in Table I. It is a...Taylor, “Constructing nonlinear discriminants from multiple data views,” Machine Learning and Knowledge Discovery in Databases, pp. 328–343, 2010

Vocal Parameters and Self-Perception in Individuals With Adductor Spasmodic Dysphonia.

PubMed

Rojas, Gleidy Vannesa E; Ricz, Hilton; Tumas, Vitor; Rodrigues, Guilherme R; Toscano, Patrícia; Aguiar-Ricz, Lílian

2017-05-01

The study aimed to compare and correlate perceptual-auditory analysis of vocal parameters and self-perception in individuals with adductor spasmodic dysphonia before and after the application of botulinum toxin. This is a prospective cohort study. Sixteen individuals with a diagnosis of adductor spasmodic dysphonia underwent application of botulinum toxin in the thyroarytenoid muscle, recording of a voice signal, and the Voice Handicap Index (VHI) questionnaire before the application and at two time points after application. Two judges performed a perceptual-auditory analysis of eight vocal parameters with the aid of the Praat software for the visualization of narrow band spectrography, pitch, and intensity contour. Comparison of the vocal parameters before toxin application and on the first return revealed a reduction of oscillation intensity (P = 0.002), voice breaks (P = 0.002), and vocal tremor (P = 0.002). The same parameters increased on the second return. The degree of severity, strained-strangled voice, roughness, breathiness, and asthenia was unchanged. The total score and the emotional domain score of the VHI were reduced on the first return. There was a moderate correlation between the degree of voice severity and the total VHI score before application and on the second return, and a weak correlation on the first return. Perceptual-auditory analysis and self-perception proved to be efficient in the recognition of vocal changes and of the vocal impact on individuals with adductor spasmodic dysphonia under treatment with botulinum toxin, permitting the quantitation of changes over time. Copyright © 2017. Published by Elsevier Inc.
Processing F0 with cochlear implants: Modulation frequency discrimination and speech intonation recognition.

PubMed

Chatterjee, Monita; Peng, Shu-Chen

2008-01-01

Fundamental frequency (F0) processing by cochlear implant (CI) listeners was measured using a psychophysical task and a speech intonation recognition task. Listeners' Weber fractions for modulation frequency discrimination were measured using an adaptive, 3-interval, forced-choice paradigm: stimuli were presented through a custom research interface. In the speech intonation recognition task, listeners were asked to indicate whether resynthesized bisyllabic words, when presented in the free field through the listeners' everyday speech processor, were question-like or statement-like. The resynthesized tokens were systematically manipulated to have different initial-F0s to represent male vs. female voices, and different F0 contours (i.e., falling, flat, and rising). Although the CI listeners showed considerable variation in performance on both tasks, significant correlations were observed between the CI listeners' sensitivity to modulation frequency in the psychophysical task and their performance in intonation recognition. Consistent with their greater reliance on temporal cues, the CI listeners' performance in the intonation recognition task was significantly poorer with the higher initial-F0 stimuli than with the lower initial-F0 stimuli. Similar results were obtained with normal hearing listeners attending to noiseband-vocoded CI simulations with reduced spectral resolution.
Processing F0 with Cochlear Implants: Modulation Frequency Discrimination and Speech Intonation Recognition

PubMed Central

Chatterjee, Monita; Peng, Shu-Chen

2008-01-01

Fundamental frequency (F0) processing by cochlear implant (CI) listeners was measured using a psychophysical task and a speech intonation recognition task. Listeners’ Weber fractions for modulation frequency discrimination were measured using an adaptive, 3-interval, forced-choice paradigm: stimuli were presented through a custom research interface. In the speech intonation recognition task, listeners were asked to indicate whether resynthesized bisyllabic words, when presented in the free field through the listeners’ everyday speech processor, were question-like or statement-like. The resynthesized tokens were systematically manipulated to have different initial F0s to represent male vs. female voices, and different F0 contours (i.e., falling, flat, and rising). Although the CI listeners showed considerable variation in performance on both tasks, significant correlations were observed between the CI listeners’ sensitivity to modulation frequency in the psychophysical task and their performance in intonation recognition. Consistent with their greater reliance on temporal cues, the CI listeners’ performance in the intonation recognition task was significantly poorer with the higher initial-F0 stimuli than with the lower initial-F0 stimuli. Similar results were obtained with normal hearing listeners attending to noiseband-vocoded CI simulations with reduced spectral resolution. PMID:18093766
14 CFR 25.1457 - Cockpit voice recorders.

Code of Federal Regulations, 2014 CFR

2014-01-01

... 14 Aeronautics and Space 1 2014-01-01 2014-01-01 false Cockpit voice recorders. 25.1457 Section 25... recorders. (a) Each cockpit voice recorder required by the operating rules of this chapter must be approved... interphone system. (4) Voice or audio signals identifying navigation or approach aids introduced into a...
14 CFR 25.1457 - Cockpit voice recorders.

Code of Federal Regulations, 2013 CFR

2013-01-01

... 14 Aeronautics and Space 1 2013-01-01 2013-01-01 false Cockpit voice recorders. 25.1457 Section 25... recorders. (a) Each cockpit voice recorder required by the operating rules of this chapter must be approved... interphone system. (4) Voice or audio signals identifying navigation or approach aids introduced into a...
14 CFR 29.1457 - Cockpit voice recorders.

Code of Federal Regulations, 2012 CFR

2012-01-01

... 14 Aeronautics and Space 1 2012-01-01 2012-01-01 false Cockpit voice recorders. 29.1457 Section 29... recorders. (a) Each cockpit voice recorder required by the operating rules of this chapter must be approved... interphone system. (4) Voice or audio signals identifying navigation or approach aids introduced into a...
14 CFR 29.1457 - Cockpit voice recorders.

Code of Federal Regulations, 2013 CFR

2013-01-01

... 14 Aeronautics and Space 1 2013-01-01 2013-01-01 false Cockpit voice recorders. 29.1457 Section 29... recorders. (a) Each cockpit voice recorder required by the operating rules of this chapter must be approved... interphone system. (4) Voice or audio signals identifying navigation or approach aids introduced into a...
14 CFR 25.1457 - Cockpit voice recorders.

Code of Federal Regulations, 2012 CFR

2012-01-01

... 14 Aeronautics and Space 1 2012-01-01 2012-01-01 false Cockpit voice recorders. 25.1457 Section 25... recorders. (a) Each cockpit voice recorder required by the operating rules of this chapter must be approved... interphone system. (4) Voice or audio signals identifying navigation or approach aids introduced into a...
Network Speech Systems Technology Program

NASA Astrophysics Data System (ADS)

Weinstein, C. J.

1980-09-01

This report documents work performed during FY 1980 on the DCA-sponsored Network Speech Systems Technology Program. The areas of work reported are: (1) communication systems studies in Demand-Assignment Multiple Access (DAMA), voice/data integration, and adaptive routing, in support of the evolving Defense Communications System (DCS) and Defense Switched Network (DSN); (2) a satellite/terrestrial integration design study including the functional design of voice and data interfaces to interconnect terrestrial and satellite network subsystems; and (3) voice-conferencing efforts dealing with support of the Secure Voice and Graphics Conferencing (SVGC) Test and Evaluation Program. Progress in definition and planning of experiments for the Experimental Integrated Switched Network (EISN) is detailed separately in an FY 80 Experiment Plan Supplement.
Traffic signs recognition for driving assistance

NASA Astrophysics Data System (ADS)

Sai Sangram Reddy, Yatham; Karthik, Devareddy; Rana, Nikunj; Jasmine Pemeena Priyadarsini, M.; Rajini, G. K.; Naseera, Shaik

2017-11-01

Given current technological progress, it should be possible to assist drivers in recognising traffic signs on the road. Many studies are now directed towards intelligent traffic systems. One field of this research is driving support systems, and many studies aim to create frameworks that detect and recognise street signs in front of the vehicle and then use the information to advise the driver, or even to control the vehicle when the system is implemented on self-driving vehicles. In this paper we propose a method to detect the traffic sign board in a frame using Haar cascades and then identify the sign on it. The output may be either given by voice or displayed, as per the driver’s convenience. Each traffic sign is recognised by a KNN classifier trained on a database of images of symbols, using the OpenCV libraries.
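The KNN classification step mentioned in this abstract can be sketched in a few lines. The version below is a plain-Python stand-in, not the OpenCV-based implementation the paper describes: the two-dimensional feature vectors, the sign labels, and the choice of k = 3 are hypothetical, and a real system would extract features from the image region found by the detector.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D features standing in for flattened sign images.
training_set = [
    ((0.0, 0.0), "stop"), ((0.1, 0.2), "stop"), ((0.2, 0.1), "stop"),
    ((1.0, 1.0), "yield"), ((0.9, 1.1), "yield"), ((1.1, 0.9), "yield"),
]
predicted = knn_classify(training_set, (0.15, 0.15))
```

Because KNN stores the training examples rather than fitting parameters, new sign types can be added by extending the database, at the cost of slower lookups as it grows.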
The Effect of Hydration on the Voice Quality of Future Professional Vocal Performers.

PubMed

van Wyk, Liezl; Cloete, Mariaan; Hattingh, Danel; van der Linde, Jeannie; Geertsema, Salome

2017-01-01

The application of systemic hydration as an instrument for optimal voice quality has been a common practice by several professional voice users over the years. Although the physiological action has been determined, the benefits on acoustic and perceptual characteristics are relatively unknown. The present study aimed to determine whether systemic hydration has beneficial outcomes on the voice quality of future professional voice users. A within-subject, pretest-posttest design was applied to determine quantitative research results of female singing students between 18 and 32 years of age without a history of voice pathology. Acoustic and perceptual data were collected before and after a 2-hour singing rehearsal. The difference between the hypohydrated condition (controlled) and the hydrated condition (experimental) and the relationship between adequate hydration and acoustic and perceptual parameters of voice were then investigated. A statistically significant (P = 0.041) increase in jitter values was obtained for the hypohydrated condition. Increased maximum phonation time (MPT/z/) and higher maximum frequency for hydration indicated further statistically significant changes in voice quality (P = 0.028 and P = 0.015, respectively). Systemic hydration has positive outcomes on perceptual and acoustic parameters of voice quality for future professional singers. The singer's ability to sustain notes for longer and reach higher frequencies may reflect well in performances. Any positive change in voice quality may benefit the singer's occupational success and subsequently their social, emotional, and vocational well-being. More research evidence is needed to determine the parameters for implementing adequate hydration in vocal hygiene programs. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Effect of Acting Experience on Emotion Expression and Recognition in Voice: Non-Actors Provide Better Stimuli than Expected.

PubMed

Jürgens, Rebecca; Grass, Annika; Drolet, Matthis; Fischer, Julia

Both in the performative arts and in emotion research, professional actors are assumed to be capable of delivering emotions comparable to spontaneous emotional expressions. This study examines the effects of acting training on vocal emotion depiction and recognition. We predicted that professional actors express emotions in a more realistic fashion than non-professional actors. However, professional acting training may lead to a particular speech pattern; this might account for vocal expressions by actors that are less comparable to authentic samples than the ones by non-professional actors. We compared 80 emotional speech tokens from radio interviews with 80 re-enactments by professional and inexperienced actors, respectively. We analyzed recognition accuracies for emotion and authenticity ratings and compared the acoustic structure of the speech tokens. Both play-acted conditions yielded similar recognition accuracies and possessed more variable pitch contours than the spontaneous recordings. However, professional actors exhibited signs of different articulation patterns compared to non-trained speakers. Our results indicate that for emotion research, emotional expressions by professional actors are not better suited than those from non-actors.
Automatic voice recognition using traditional and artificial neural network approaches

NASA Technical Reports Server (NTRS)

Botros, Nazeih M.

1989-01-01

The main objective of this research is to develop an algorithm for isolated-word recognition. This research is focused on digital signal analysis rather than linguistic analysis of speech. Feature extraction is carried out by applying a Linear Predictive Coding (LPC) algorithm of order 10. Continuous-word and speaker-independent recognition will be considered in future study after accomplishing this isolated-word research. To examine the similarity between the reference and the training sets, two approaches are explored. The first is implementing traditional pattern recognition techniques where a dynamic time warping algorithm is applied to align the two sets and calculate the probability of matching by measuring the Euclidean distance between the two sets. The second is implementing a backpropagation artificial neural net model with three layers as the pattern classifier. The adaptation rule implemented in this network is the generalized least mean square (LMS) rule. The first approach has been accomplished. A vocabulary of 50 words was selected and tested. The accuracy of the algorithm was found to be around 85 percent. The second approach is in progress at the present time.
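The first approach in this abstract, dynamic time warping with a Euclidean frame distance, can be illustrated with the minimal sketch below. It is not the report's implementation: the short one-dimensional frames stand in for the order-10 LPC vectors, the word templates are hypothetical, and the step pattern (match, insertion, deletion with equal weights) is an assumed textbook choice.

```python
import math


def dtw_distance(reference, test):
    """Accumulated Euclidean cost of the best time alignment of two sequences.

    Each sequence is a list of equal-length feature vectors (frames).
    """
    n, m = len(reference), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(reference[i - 1], test[j - 1])
            # Extend the cheapest of: deletion, insertion, or diagonal match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(templates, utterance):
    """Pick the vocabulary word whose template has the smallest DTW distance."""
    return min(templates, key=lambda word: dtw_distance(templates[word], utterance))


# Hypothetical templates; real frames would be 10-dimensional LPC vectors.
templates = {
    "yes": [(0.0,), (1.0,), (2.0,)],
    "no": [(2.0,), (1.0,), (0.0,)],
}
word = recognize(templates, [(2.0,), (2.0,), (1.0,), (0.0,)])
```

The warping lets a slowly spoken word match its template even though the two sequences differ in length, which is why the stretched utterance above still aligns with zero cost to the "no" template.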
Auditory word recognition: extrinsic and intrinsic effects of word frequency.

PubMed

Connine, C M; Titone, D; Wang, J

1993-01-01

Two experiments investigated the influence of word frequency in a phoneme identification task. Speech voicing continua were constructed so that one endpoint was a high-frequency word and the other endpoint was a low-frequency word (e.g., best-pest). Experiment 1 demonstrated that ambiguous tokens were labeled such that a high-frequency word was formed (intrinsic frequency effect). Experiment 2 manipulated the frequency composition of the list (extrinsic frequency effect). A high-frequency list bias produced an exaggerated influence of frequency; a low-frequency list bias showed a reverse frequency effect. Reaction time effects were discussed in terms of activation and postaccess decision models of frequency coding. The results support a late use of frequency in auditory word recognition.
Quantitative evaluation of the voice range profile in patients with voice disorder.

PubMed

Ikeda, Y; Masuda, T; Manako, H; Yamashita, H; Yamamoto, T; Komiyama, S

1999-01-01

In 1953, Calvet first displayed the fundamental frequency (pitch) and sound pressure level (intensity) of a voice on a two-dimensional plane and created a voice range profile. This profile has been used to evaluate clinically various vocal disorders, although such evaluations to date have been subjective without quantitative assessment. In the present study, a quantitative system was developed to evaluate the voice range profile utilizing a personal computer. The area of the voice range profile was defined as the voice volume. This volume was analyzed in 137 males and 175 females who were treated for various dysphonias at Kyushu University between 1984 and 1990. Ten normal subjects served as controls. The voice volume in cases with voice disorders significantly decreased irrespective of the disease and sex. Furthermore, cases having better improvement after treatment showed a tendency for the voice volume to increase. These findings illustrated the voice volume as a useful clinical test for evaluating voice control in cases with vocal disorders.
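The "voice volume" in this record is defined as the area of the voice range profile. A minimal sketch of one way to compute such an area follows. It assumes the profile is sampled as (pitch, minimum intensity, maximum intensity) triples and integrates the intensity range over pitch with the trapezoidal rule. The units (semitones and dB), the sampling, and the function name are assumptions, since the abstract does not state how the area was calculated.

```python
def voice_range_area(profile):
    """Trapezoidal area between the upper and lower intensity contours.

    profile: list of (pitch_semitones, min_db, max_db), sorted by pitch.
    Returns the area in semitone * dB (units are an assumption).
    """
    area = 0.0
    for (p0, lo0, hi0), (p1, lo1, hi1) in zip(profile, profile[1:]):
        # Average dynamic range of two neighbouring pitches times their spacing.
        area += 0.5 * ((hi0 - lo0) + (hi1 - lo1)) * (p1 - p0)
    return area

# Hypothetical three-point profile, for illustration only.
example_profile = [(0, 50, 70), (2, 50, 90), (4, 60, 80)]
example_area = voice_range_area(example_profile)
```

A narrower pitch range or a smaller gap between the softest and loudest phonation both shrink this area, which is consistent with the reported decrease in voice volume in patients with voice disorders.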
Knowledge Discovery, Integration and Communication for Extreme Weather and Flood Resilience Using Artificial Intelligence: Flood AI Alpha

NASA Astrophysics Data System (ADS)

Demir, I.; Sermet, M. Y.

2016-12-01

Nobody is immune from extreme events or natural hazards that can lead to large-scale consequences for the nation and the public. One way to reduce the impacts of extreme events is to invest in improving resilience: the ability to better prepare for, plan for, recover from, and adapt to disasters. The National Research Council (NRC) report discusses how to increase resilience to extreme events through a vision of a resilient nation in the year 2030. The report highlights the importance of data, information, gaps, and knowledge challenges that need to be addressed, and suggests that every individual have access to risk and vulnerability information to make their communities more resilient. This abstract presents our project on developing a resilience framework for flooding to improve societal preparedness, with the following objectives: (a) develop a generalized ontology for extreme events with a primary focus on flooding; (b) develop a knowledge engine with voice recognition, artificial intelligence, natural language processing, and an inference engine. The knowledge engine will utilize the flood ontology and concepts to connect user input to relevant knowledge discovery outputs on flooding; (c) develop a data acquisition and processing framework from existing environmental observations, forecast models, and social networks. The system will utilize the framework, capabilities, and user base of the Iowa Flood Information System (IFIS) to populate and test the system; (d) develop a communication framework to support user interaction and delivery of information to users. The interaction and delivery channels will include voice and text input via web-based systems (e.g. IFIS), agent-based bots (e.g. Microsoft Skype, Facebook Messenger), smartphone and augmented reality applications (e.g. smart assistant), and automated web workflows (e.g. IFTTT, CloudWork) to open the knowledge discovery for flooding to thousands of community extensible web workflows.
Multisensory emotion perception in congenitally, early, and late deaf CI users

PubMed Central

Nava, Elena; Villwock, Agnes K.; Büchner, Andreas; Lenarz, Thomas; Röder, Brigitte

2017-01-01

Emotions are commonly recognized by combining auditory and visual signals (i.e., vocal and facial expressions). Yet it is unknown whether the ability to link emotional signals across modalities depends on early experience with audio-visual stimuli. In the present study, we investigated the role of auditory experience at different stages of development for auditory, visual, and multisensory emotion recognition abilities in three groups of adolescent and adult cochlear implant (CI) users. CI users had a different deafness onset and were compared to three groups of age- and gender-matched hearing control participants. We hypothesized that congenitally deaf (CD) but not early deaf (ED) and late deaf (LD) CI users would show reduced multisensory interactions and a higher visual dominance in emotion perception than their hearing controls. The CD (n = 7), ED (deafness onset: <3 years of age; n = 7), and LD (deafness onset: >3 years; n = 13) CI users and the control participants performed an emotion recognition task with auditory, visual, and audio-visual emotionally congruent and incongruent nonsense speech stimuli. In different blocks, participants judged either the vocal (Voice task) or the facial expressions (Face task). In the Voice task, all three CI groups performed overall less efficiently than their respective controls and experienced higher interference from incongruent facial information. Furthermore, the ED CI users benefitted more than their controls from congruent faces and the CD CI users showed an analogous trend. In the Face task, recognition efficiency of the CI users and controls did not differ. Our results suggest that CI users acquire multisensory interactions to some degree, even after congenital deafness. When judging affective prosody they appear impaired and more strongly biased by concurrent facial information than typically hearing individuals. We speculate that limitations inherent to the CI contribute to these group differences. PMID:29023525
Multisensory emotion perception in congenitally, early, and late deaf CI users.

PubMed

Fengler, Ineke; Nava, Elena; Villwock, Agnes K; Büchner, Andreas; Lenarz, Thomas; Röder, Brigitte

2017-01-01

Emotions are commonly recognized by combining auditory and visual signals (i.e., vocal and facial expressions). Yet it is unknown whether the ability to link emotional signals across modalities depends on early experience with audio-visual stimuli. In the present study, we investigated the role of auditory experience at different stages of development for auditory, visual, and multisensory emotion recognition abilities in three groups of adolescent and adult cochlear implant (CI) users. CI users had a different deafness onset and were compared to three groups of age- and gender-matched hearing control participants. We hypothesized that congenitally deaf (CD) but not early deaf (ED) and late deaf (LD) CI users would show reduced multisensory interactions and a higher visual dominance in emotion perception than their hearing controls. The CD (n = 7), ED (deafness onset: <3 years of age; n = 7), and LD (deafness onset: >3 years; n = 13) CI users and the control participants performed an emotion recognition task with auditory, visual, and audio-visual emotionally congruent and incongruent nonsense speech stimuli. In different blocks, participants judged either the vocal (Voice task) or the facial expressions (Face task). In the Voice task, all three CI groups performed overall less efficiently than their respective controls and experienced higher interference from incongruent facial information. Furthermore, the ED CI users benefitted more than their controls from congruent faces and the CD CI users showed an analogous trend. In the Face task, recognition efficiency of the CI users and controls did not differ. Our results suggest that CI users acquire multisensory interactions to some degree, even after congenital deafness. When judging affective prosody they appear impaired and more strongly biased by concurrent facial information than typically hearing individuals. We speculate that limitations inherent to the CI contribute to these group differences.
Speaking in Character: Voice Communication in Virtual Worlds

NASA Astrophysics Data System (ADS)

Wadley, Greg; Gibbs, Martin R.

This chapter summarizes 5 years of research on the implications of introducing voice communication systems to virtual worlds. Voice introduces both benefits and problems for players of fast-paced team games, from better coordination of groups and greater social presence of fellow players on the positive side, to negative features such as channel congestion, transmission of noise, and an unwillingness by some to use voice with strangers online. Similarly, in non-game worlds like Second Life, issues related to identity and impression management play important roles, as voice may build greater trust that is especially important for business users, yet it erodes the anonymity and ability to conceal social attributes like gender that are important for other users. A very different mixture of problems and opportunities exists when users conduct several simultaneous conversations in multiple text and voice channels. Technical difficulties still exist with current systems, including the challenge of debugging and harmonizing all the participants' voice setups. Different groups use virtual worlds for very different purposes, so a single modality may not suit all.
Building VoiceXML-Based Applications

DTIC Science & Technology

2002-01-01

basketball games. The Busline systems were primarily developed using an early implementation of VoiceXML. The NBA Update Line was developed using VoiceXML...traveling in and out of Pittsburgh’s rsity neighborhood. The second project is the NBA Update Line, which provides callers with real-time information NBA ... NBA UPDATE LINE The target user of this system is a fairly knowledgeable basketball fan; the system must therefore be able to provide detailed

Voice stress analysis and evaluation

NASA Astrophysics Data System (ADS)

Haddad, Darren M.; Ratley, Roy J.

2001-02-01

Voice Stress Analysis (VSA) systems are marketed as computer-based systems capable of measuring stress in a person's voice as an indicator of deception. They are advertised as being less expensive, easier to use, less invasive in use, and less constrained in their operation than polygraph technology. The National Institute of Justice has asked the Air Force Research Laboratory for assistance in evaluating voice stress analysis technology. Law enforcement officials have also been asking questions about this technology. If VSA technology proves to be effective, its value for military and law enforcement application is tremendous.
Common neural systems associated with the recognition of famous faces and names: An event-related fMRI study

PubMed Central

Nielson, Kristy A.; Seidenberg, Michael; Woodard, John L.; Durgerian, Sally; Zhang, Qi; Gross, William L.; Gander, Amelia; Guidotti, Leslie M.; Antuono, Piero; Rao, Stephen M.

2010-01-01

Person recognition can be accomplished through several modalities (face, name, voice). Lesion, neurophysiology and neuroimaging studies have been conducted in an attempt to determine the similarities and differences in the neural networks associated with person identity via different modality inputs. The current study used event-related functional-MRI in 17 healthy participants to directly compare activation in response to randomly presented famous and non-famous names and faces (25 stimuli in each of the four categories). Findings indicated distinct areas of activation that differed for faces and names in regions typically associated with pre-semantic perceptual processes. In contrast, overlapping brain regions were activated in areas associated with the retrieval of biographical knowledge and associated social affective features. Specifically, activation for famous faces was primarily right lateralized and famous names were left lateralized. However, for both stimuli, similar areas of bilateral activity were observed in the early phases of perceptual processing. Activation for fame, irrespective of stimulus modality, activated an extensive left hemisphere network, with bilateral activity observed in the hippocampi, posterior cingulate, and middle temporal gyri. Findings are discussed within the framework of recent proposals concerning the neural network of person identification. PMID:20167415
Voice and choice in health care in England: understanding citizen responses to dissatisfaction.

PubMed

Dowding, Keith; John, Peter

2011-01-01

Using data from a five-year online survey, the paper examines the effects of relative satisfaction with health services on individuals' voice-and-choice activity in the English public health care system. Voice is considered in three parts – individual voice (complaints), collective voice (voting) and participation (collective action). Exercising choice is seen in terms of complete exit (not using health care), internal exit (choosing another public service provider) and private exit (using private health care). The interaction of satisfaction and forms of voice and choice is analysed over time. Both voice and choice are correlated with dissatisfaction, with those who are unhappy with the NHS more likely to privately voice and to plan to take up private health care. Those unable to choose private provision are likely to use private voice. These factors are not affected by items associated with social capital – indeed, being more trusting leads to lower voice activity.
Intra- and Inter-database Study for Arabic, English, and German Databases: Do Conventional Speech Features Detect Voice Pathology?

PubMed

Ali, Zulfiqar; Alsulaiman, Mansour; Muhammad, Ghulam; Elamvazuthi, Irraivan; Al-Nasheri, Ahmed; Mesallam, Tamer A; Farahat, Mohamed; Malki, Khalid H

2017-05-01

A large population around the world has voice complications. Various approaches for subjective and objective evaluations have been suggested in the literature. The subjective approach strongly depends on the experience and area of expertise of a clinician, and human error cannot be neglected. On the other hand, the objective or automatic approach is noninvasive. Automatic systems can provide complementary information that may be helpful for a clinician in the early screening of a voice disorder. At the same time, automatic systems can be deployed in remote areas where a general practitioner can use them and may refer the patient to a specialist to avoid complications that may be life threatening. Many automatic systems for disorder detection have been developed by applying different types of conventional speech features such as the linear prediction coefficients, linear prediction cepstral coefficients, and Mel-frequency cepstral coefficients (MFCCs). This study aims to ascertain whether conventional speech features detect voice pathology reliably, and whether they can be correlated with voice quality. To investigate this, an automatic detection system based on MFCC was developed, and three different voice disorder databases were used in this study. The experimental results suggest that the accuracy of the MFCC-based system varies from database to database. The detection rate for the intra-database experiments ranges from 72% to 95%, and that for the inter-database experiments from 47% to 82%. The results indicate that conventional speech features are not correlated with voice quality, and hence are not reliable in pathology detection. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

DOEpatents

Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

2002-01-01

Low power EM waves are used to detect motions of vocal tract tissues of the human speech system before, during, and after voiced speech. A voiced excitation function is derived. The excitation function provides speech production information to enhance speech characterization and to enable noise removal from human speech.
Performance of wavelet analysis and neural networks for pathological voices identification

NASA Astrophysics Data System (ADS)

Salhi, Lotfi; Talbi, Mourad; Abid, Sabeur; Cherif, Adnane

2011-09-01

Within the medical environment, diverse techniques exist to assess the state of the voice of the patient. The inspection technique is inconvenient for a number of reasons, such as its high cost, the duration of the inspection, and above all, the fact that it is an invasive technique. This study focuses on a robust, rapid and accurate system for automatic identification of pathological voices. This system employs a non-invasive, inexpensive and fully automated method based on a hybrid approach: wavelet transform analysis and a neural network classifier. First, we present the results obtained in our previous study using classic feature parameters. These results allow visual identification of pathological voices. Second, quantified parameters derived from the wavelet analysis are proposed to characterise the speech sample. In addition, a system of multilayer neural networks (MNNs) has been developed which carries out the automatic detection of pathological voices. The developed method was evaluated using a voice database composed of recorded voice samples (continuous speech) from normophonic or dysphonic speakers. The dysphonic speakers were patients of the National Hospital 'RABTA' of Tunis, Tunisia and a University Hospital in Brussels, Belgium. Experimental results indicate a success rate ranging between 75% and 98.61% for discrimination of normal and pathological voices using the proposed parameters and neural network classifier. We also compared the average classification rate based on the MNN, Gaussian mixture model and support vector machines.
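The abstract above does not say which wavelet or which derived parameters were used. As a purely illustrative sketch of how wavelet analysis can turn a speech frame into a short feature vector for a classifier, the following uses a Haar wavelet and per-level detail energies. Both choices, and all function names, are assumptions and not the authors' method.

```python
import math

def haar_step(signal):
    # One level of the orthonormal Haar transform on an even-length signal:
    # pairwise scaled sums (approximation) and differences (detail).
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def wavelet_energy_features(signal, levels):
    # Energy of the detail coefficients at each level, followed by the
    # energy of the final approximation (coarsest band).
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features

# Hypothetical 8-sample frame; real frames would be far longer.
frame = [1.0, 3.0, 2.0, 0.0, -1.0, 1.0, 4.0, 2.0]
frame_features = wavelet_energy_features(frame, levels=2)
```

Since the Haar transform is orthonormal, the band energies sum to the energy of the frame when its length is divisible by 2 raised to the number of levels, so the features describe how signal energy is distributed across scales.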
[The comparative assessment of the vocal function in the professional voice users and non-occupational voice users in the late adulthood].

PubMed

Pavlikhin, O G; Romanenko, S G; Krasnikova, D I; Lesogorova, E V; Yakovlev, V S

The objective of the present study was to evaluate the clinical and functional condition of the voice apparatus in elderly patients and to elaborate recommendations for the prevention of disturbances of the vocal function in professional voice users. This comprehensive study involved 95 patients, including active professional voice users (n=48) and 45 non-occupational voice users, aged 61 to 82 years, with an employment history varying from 32 to 51 years. The study was designed to obtain the voice characteristics by means of subjective auditory assessment, microlaryngoscopy, video laryngostroboscopy, determination of maximum phonation time (MPT), and computer-assisted acoustic analysis of the voice with the use of the MDVP Kay Pentax system. The level of anxiety of the patients was estimated based on the results of the HADS questionnaire study. It is concluded that the majority of the disturbances of the vocal function in professional voice users are functional in nature. It is concluded that the method of neuro-muscular electrophonopedic stimulation (NMEPS) of laryngeal muscles is the method of choice for the diagnostics of the vocal function of voice users in late adulthood. It is recommended that the professional vocal load for such subjects should not exceed 12-14 hours per week. Rational psychotherapy must constitute an important component of the system of measures intended to support the working capacity of the voice users belonging to this age group.
QM/PSK Voice/Data Modem

DOT National Transportation Integrated Search

1976-03-01

Two Quadrature Modulation/Phase Shift Keyed (QM/PSK) Voice/Data Modem systems have been developed as part of the satellite communications hardware for advanced air traffic control systems. These systems consist of a modulator and demodulator unit whi...
Infant face interest is associated with voice information and maternal psychological health.

PubMed

Taylor, Gemma; Slade, Pauline; Herbert, Jane S

2014-11-01

Early infant interest in their mother's face is driven by an experience-based face processing system, and is associated with maternal psychological health, even within a non-clinical community sample. The present study examined the role of the voice in eliciting infants' interest in mother and stranger faces and in the association between infant face interest and maternal psychological health. Infants aged 3.5 months were shown photographs of their mother's and a stranger's face paired with an audio recording of their mother's and a stranger's voice that was either matched (e.g., mother's face and voice) or mismatched (e.g., mother's face and stranger's voice). Infants spent more time attending to the stranger's matched face and voice than the mother's matched face and voice and the mismatched faces and voices. Thus, infants demonstrated an earlier preference for a stranger's face when given voice information than when the face is presented alone. In the present sample, maternal psychological health varied, with 56.7% of mothers reporting mild mood symptoms (depression, anxiety or stress response to childbirth). Infants of mothers with significant mild maternal mood symptoms looked longer at the faces and voices compared to infants of mothers who did not report mild maternal mood symptoms. In sum, infants' experience-based face processing system is sensitive to their mothers' psychological health and the multimodal nature of faces. Copyright © 2014 Elsevier Inc. All rights reserved.
Voice loops as coordination aids in space shuttle mission control.

PubMed

Patterson, E S; Watts-Perotti, J; Woods, D D

1999-01-01

Voice loops, an auditory groupware technology, are essential coordination support tools for experienced practitioners in domains such as air traffic management, aircraft carrier operations and space shuttle mission control. They support synchronous communication on multiple channels among groups of people who are spatially distributed. In this paper, we suggest reasons for why the voice loop system is a successful medium for supporting coordination in space shuttle mission control based on over 130 hours of direct observation. Voice loops allow practitioners to listen in on relevant communications without disrupting their own activities or the activities of others. In addition, the voice loop system is structured around the mission control organization, and therefore directly supports the demands of the domain. By understanding how voice loops meet the particular demands of the mission control environment, insight can be gained for the design of groupware tools to support cooperative activity in other event-driven domains.
A survey of the state-of-the-art and focused research in range systems, task 1

NASA Technical Reports Server (NTRS)

Omura, J. K.

1986-01-01

This final report presents the latest research activity in voice compression. We have designed a non-real-time simulation system implemented around the IBM-PC, where the IBM-PC is used as a speech workstation for data acquisition and analysis of voice samples. A real-time implementation is also proposed. This real-time Voice Compression Board (VCB) is built around the Texas Instruments TMS-3220. The voice compression algorithm investigated here was described in an earlier report titled Low Cost Voice Compression for Mobile Digital Radios, by the author. We will assume the reader is familiar with the voice compression algorithm discussed in that report. The VCB compresses speech waveforms at data rates ranging from 4.8 kbps to 16 kbps. This board interfaces to the IBM-PC 8-bit bus and plugs into a single expansion slot on the motherboard.
Voice loops as coordination aids in space shuttle mission control

NASA Technical Reports Server (NTRS)

Patterson, E. S.; Watts-Perotti, J.; Woods, D. D.

1999-01-01

Voice loops, an auditory groupware technology, are essential coordination support tools for experienced practitioners in domains such as air traffic management, aircraft carrier operations and space shuttle mission control. They support synchronous communication on multiple channels among groups of people who are spatially distributed. In this paper, we suggest reasons for why the voice loop system is a successful medium for supporting coordination in space shuttle mission control based on over 130 hours of direct observation. Voice loops allow practitioners to listen in on relevant communications without disrupting their own activities or the activities of others. In addition, the voice loop system is structured around the mission control organization, and therefore directly supports the demands of the domain. By understanding how voice loops meet the particular demands of the mission control environment, insight can be gained for the design of groupware tools to support cooperative activity in other event-driven domains.
Twenty-Channel Voice Response System

DOT National Transportation Integrated Search

1981-06-01

This report documents the design and implementation of a Voice Response System, which provides Direct-User Access to the FAA's aviation-weather data base. This system supports 20 independent audio channels, and as of this report, speaks three weather...
Talker familiarity and spoken word recognition in school-age children

PubMed Central

Levi, Susannah V.

2014-01-01

Research with adults has shown that spoken language processing is improved when listeners are familiar with talkers’ voices, known as the familiar talker advantage. The current study explored whether this ability extends to school-age children, who are still acquiring language. Children were familiarized with the voices of three German–English bilingual talkers and were tested on the speech of six bilinguals, three of whom were familiar. Results revealed that children do show improved spoken language processing when they are familiar with the talkers, but this improvement was limited to highly familiar lexical items. This restriction of the familiar talker advantage is attributed to differences in the representation of highly familiar and less familiar lexical items. In addition, children did not exhibit accent-general learning; despite having been exposed to German-accented talkers during training, there was no improvement for novel German-accented talkers. PMID:25159173
Research on oral test modeling based on multi-feature fusion

NASA Astrophysics Data System (ADS)

Shi, Yuliang; Tao, Yiyue; Lei, Jun

2018-04-01

In this paper, the spectrogram of the speech signal is taken as the input to feature extraction. The strengths of PCNN in image segmentation and related processing are used to process the speech spectrogram and extract features, exploring a new method that combines speech signal processing with image processing. In addition to the spectrogram features, MFCCs are used to build spectral features, which are fused with the spectrogram features to further improve the accuracy of spoken language recognition. Because the input features are complex and discriminative, we use a Support Vector Machine (SVM) to construct the classifier, and then compare the extracted test voice features with the standard voice features to detect whether speech meets the spoken standard. Experiments show that extracting features from spectrograms using PCNN is feasible, and that the fusion of image features and spectral features can improve detection accuracy.
Human voice quality measurement in noisy environments.

PubMed

Ueng, Shyh-Kuang; Luo, Cheng-Ming; Tsai, Tsung-Yu; Yeh, Hsuan-Chen

2015-01-01

Computerized acoustic voice measurement is essential for the diagnosis of vocal pathologies. Previous studies showed that ambient noise has a significant influence on the accuracy of voice quality assessment. This paper presents a voice quality assessment system that can accurately measure the quality of voice signals even when the input voice data are contaminated by low-frequency noise. Ambient noise in our living rooms and laboratories was collected and its frequencies were analyzed. Based on the analysis, a filter was designed to reduce the noise level of the input voice signal. Then, improved numerical algorithms are employed to extract voice parameters from the voice signal to reveal the health of the voice. The proposed method outperforms MDVP and Praat, two widely used programs, in measuring fundamental frequency and harmonic-to-noise ratio, and its performance is comparable to theirs in computing jitter and shimmer. The proposed voice quality assessment method is resistant to low-frequency noise, and it can measure human voice quality in environments filled with noise from air-conditioners, ceiling fans and cooling fans of computers.
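Jitter and shimmer, two of the voice parameters named above, are commonly reported in their "local" relative form: the mean absolute difference between consecutive glottal periods (or peak amplitudes) divided by the mean period (or amplitude). The sketch below computes that common definition from period and amplitude sequences that are assumed to have been extracted already. It does not reproduce the paper's improved algorithms, and the function and variable names are hypothetical.

```python
def local_perturbation(values):
    # Mean absolute difference between consecutive values, relative to the mean value.
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

def jitter_local(periods_s):
    # Cycle-to-cycle variation of the glottal period (seconds in, ratio out).
    return local_perturbation(periods_s)

def shimmer_local(amplitudes):
    # Cycle-to-cycle variation of the peak amplitude (linear units in, ratio out).
    return local_perturbation(amplitudes)

# Hypothetical measurements from a sustained vowel, for illustration only.
periods = [0.010, 0.012, 0.010]
amplitudes = [0.80, 0.80, 0.80]
jitter = jitter_local(periods)
shimmer = shimmer_local(amplitudes)
```

Both measures depend on accurate detection of individual cycles, which is one reason low-frequency ambient noise can degrade them.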
Speech Recognition: Acoustic-Phonetic Knowledge Acquisition and Representation.

DTIC Science & Technology

1987-09-25

the release duration is the voice onset time, or VOT. For the purpose of this investigation, alveolar flaps (as in "butter") and glottalized /t/’s...Cambridge, Massachusetts 02139 Abstract females and 8 males. The other sentence was said by 7 females We discuss a framework for an acoustic-phonetic...tained a number of semivowels. One sentence was said by 6 vowels Supported by a Xerox Fellowship Table It Features which characterize
Using Voice Recognition Equipment to Run the Warfare Environmental Simulator (WES),

DTIC Science & Technology

1981-03-01

simulations and models are often used. War games are a type of simulation frequently used by the military to evaluate C3 effectiveness. Through the use of a...to 162 words or short phrases (Appendix B). B. EQUIPMENT USED 1. Hardware Description [13] For the experiment a Threshold Model T600 discrete... Model T600 terminal used in this experiment consists of an analog speech preprocessor, microcomputer, CRT/keyboard unit, magnetic tape cartridge unit
Poetry and Neuroscience:

PubMed Central

Wilkes, James; Scott, Sophie K

2016-01-01

Dialogues and collaborations between scientists and non-scientists are now widely understood as important elements of scientific research and public engagement with science. In recognition of this, the authors, a neuroscientist and a poet, use a dialogical approach to extend questions and ideas first shared during a lab-based poetry residency. They recorded a conversation and then expanded it into an essayistic form, allowing divergent disciplinary understandings and uses of experiment, noise, voice and emotion to be articulated, shared and questioned. PMID:27885317
Effects of Voice Coding and Speech Rate on a Synthetic Speech Display in a Telephone Information System

DTIC Science & Technology

1988-05-01

Figure 2. Original limited-capacity channel model (From Broadbent, 1958) ... Figure 3. Experimental...unlimited variety of human voices for digital recording sources. Synthesis by Analysis Analysis-synthesis methods electronically model the human voice
