Sample records for voice recognition algorithms

  1. Digital signal processing algorithms for automatic voice recognition

    NASA Technical Reports Server (NTRS)

    Botros, Nazeih M.

    1987-01-01

    Current digital signal analysis algorithms implemented in automatic voice recognition are investigated. Automatic voice recognition means the capability of a computer to recognize and interact with verbal commands. The focus is on digital signal analysis of the speech signal rather than on linguistic analysis. Several digital signal processing algorithms are available for voice recognition, among them Linear Predictive Coding (LPC), short-time Fourier analysis, and cepstrum analysis. Of these, LPC is the most widely used: it has a short execution time and does not require large memory storage. However, it has several limitations due to the assumptions used to develop it. The other two algorithms are frequency-domain algorithms that rest on fewer assumptions, but they are not widely implemented or investigated. With recent advances in digital technology, namely signal processors, these two frequency-domain algorithms may now be investigated for implementation in voice recognition. This research is concerned with real-time, microprocessor-based recognition algorithms.
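
    As an illustration of the LPC approach mentioned above, here is a minimal sketch of coefficient estimation via the autocorrelation method and the Levinson-Durbin recursion. This is a generic textbook formulation in pure Python, not the report's implementation; the function names and the AR(1) check below are illustrative.

```python
# Minimal LPC sketch: autocorrelation method + Levinson-Durbin recursion.
# Illustrative only -- not the implementation described in the report.
def autocorr(x, lag):
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

def lpc(x, order):
    """Return the prediction polynomial a (a[0] == 1) and residual energy."""
    r = [autocorr(x, k) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        a = [a[j] + k * a[i - j] if j <= i else a[j]
             for j in range(order + 1)]
        err *= 1.0 - k * k                  # prediction error shrinks each step
    return a, err
```

For a synthetic AR(1) signal x[n] = 0.9 x[n-1] + noise, the estimated a[1] comes out near -0.9, which is the expected inverse-filter coefficient.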

  2. Evaluation of Speech Recognition of Cochlear Implant Recipients Using Adaptive, Digital Remote Microphone Technology and a Speech Enhancement Sound Processing Algorithm.

    PubMed

    Wolfe, Jace; Morais, Mila; Schafer, Erin; Agrawal, Smita; Koch, Dawn

    2015-05-01

    Cochlear implant recipients often experience difficulty with understanding speech in the presence of noise. Cochlear implant manufacturers have developed sound processing algorithms designed to improve speech recognition in noise, and research has shown these technologies to be effective. Remote microphone technology utilizing adaptive, digital wireless radio transmission has also been shown to provide significant improvement in speech recognition in noise. There are no studies examining the potential improvement in speech recognition in noise when these two technologies are used simultaneously. The goal of this study was to evaluate the potential benefits and limitations associated with the simultaneous use of a sound processing algorithm designed to improve performance in noise (Advanced Bionics ClearVoice) and a remote microphone system that incorporates adaptive, digital wireless radio transmission (Phonak Roger). A two-by-two repeated-measures design was used to examine performance differences obtained without these technologies compared to the use of each technology separately as well as the simultaneous use of both technologies. Eleven Advanced Bionics (AB) cochlear implant recipients, ages 11 to 68 yr. AzBio sentence recognition was measured in quiet and in the presence of classroom noise ranging in level from 50 to 80 dBA in 5-dB steps. Performance was evaluated in four conditions: (1) No ClearVoice and no Roger, (2) ClearVoice enabled without the use of Roger, (3) ClearVoice disabled with Roger enabled, and (4) simultaneous use of ClearVoice and Roger. Speech recognition in quiet was better than speech recognition in noise for all conditions. Use of ClearVoice and Roger each provided significant improvement in speech recognition in noise. The best performance in noise was obtained with the simultaneous use of ClearVoice and Roger. ClearVoice and Roger technology each improve speech recognition in noise, particularly when used at the same time. 
Because ClearVoice does not degrade performance in quiet settings, clinicians should consider recommending ClearVoice for routine, full-time use for AB implant recipients. Roger should be used in all instances in which remote microphone technology may assist the user in understanding speech in the presence of noise. American Academy of Audiology.

  3. What does voice-processing technology support today?

    PubMed Central

    Nakatsu, R; Suzuki, Y

    1995-01-01

    This paper describes the state of the art in applications of voice-processing technologies. In the first part, technologies concerning the implementation of speech recognition and synthesis algorithms are described. Hardware technologies such as microprocessors and DSPs (digital signal processors) are discussed. The software development environment, a key technology in developing application software ranging from DSP software to support software, is also described. In the second part, the state of the art of algorithms from the standpoint of applications is discussed. Several issues concerning evaluation of speech recognition/synthesis algorithms are covered, as well as issues concerning the robustness of algorithms in adverse conditions. PMID:7479720

  4. [Research on Control System of an Exoskeleton Upper-limb Rehabilitation Robot].

    PubMed

    Wang, Lulu; Hu, Xin; Hu, Jie; Fang, Youfang; He, Rongrong; Yu, Hongliu

    2016-12-01

    In order to help patients with upper-limb dysfunction undergo rehabilitation training, this paper proposes an upper-limb exoskeleton rehabilitation robot with four degrees of freedom (DOF) and realizes two control schemes, i.e., voice control and electromyography control. The hardware and software design of the voice control system was completed based on the RSC-4128 chip, which realized speaker-specific speech recognition. Besides, this study adopted self-made surface electromyogram (sEMG) signal extraction electrodes to collect sEMG signals and realized pattern recognition by processing the sEMG signals, extracting time-domain features, and applying a fixed-threshold algorithm. In addition, a pulse-width modulation (PWM) algorithm was used to realize speed adjustment of the system. Voice control and electromyography control experiments were then carried out, and the results showed that the mean recognition rates of voice control and electromyography control reached 93.1% and 90.9%, respectively. The results proved the feasibility of the control system. This study is expected to lay a theoretical foundation for further improvement of the control system of the upper-limb rehabilitation robot.

  5. Speech recognition for embedded automatic positioner for laparoscope

    NASA Astrophysics Data System (ADS)

    Chen, Xiaodong; Yin, Qingyun; Wang, Yi; Yu, Daoyin

    2014-07-01

    In this paper a novel speech recognition methodology based on the Hidden Markov Model (HMM) is proposed for an embedded Automatic Positioner for Laparoscope (APL), built around a fixed-point ARM processor. The APL system is designed to assist the doctor in laparoscopic surgery by responding to the specific doctor's voice commands to control the laparoscope. Real-time response to voice commands calls for a more efficient speech recognition algorithm for the APL. In order to reduce computational cost without significant loss in recognition accuracy, both arithmetic and algorithmic optimizations are applied in the method presented. First, relying mainly on arithmetic optimizations, a fixed-point front end for speech feature analysis is built to match the ARM processor's characteristics. Then a fast likelihood computation algorithm is used to reduce the computational complexity of the HMM-based recognition algorithm. The experimental results show that the method keeps recognition time under 0.5 s while maintaining accuracy above 99%, demonstrating its ability to achieve real-time vocal control of the APL.
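
    The likelihood computation that such optimizations target can be sketched as the standard HMM forward algorithm in the log domain. This is a generic textbook sketch, not the authors' fixed-point code; all names and the tiny one-state example are illustrative.

```python
# Generic sketch of HMM log-likelihood via the forward algorithm.
# Illustrative only -- not the paper's fixed-point implementation.
import math

def _logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def forward_log_likelihood(log_pi, log_A, log_B, obs):
    # log_pi[i]: log initial prob of state i
    # log_A[i][j]: log transition prob i -> j
    # log_B[i][o]: log emission prob of symbol o in state i
    n = len(log_pi)
    alpha = [log_pi[i] + log_B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            _logsumexp([alpha[i] + log_A[i][j] for i in range(n)]) + log_B[j][o]
            for j in range(n)
        ]
    return _logsumexp(alpha)
```

Fast-likelihood schemes typically reduce the cost of exactly this inner double loop, e.g. by pruning low-probability states or precomputing emission scores.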

  6. "Who" is saying "what"? Brain-based decoding of human voice and speech.

    PubMed

    Formisano, Elia; De Martino, Federico; Bonte, Milene; Goebel, Rainer

    2008-11-07

    Can we decipher speech content ("what" is being said) and speaker identity ("who" is saying it) from observations of brain activity of a listener? Here, we combine functional magnetic resonance imaging with a data-mining algorithm and retrieve what and whom a person is listening to from the neural fingerprints that speech and voice signals elicit in the listener's auditory cortex. These cortical fingerprints are spatially distributed and insensitive to acoustic variations of the input so as to permit the brain-based recognition of learned speech from unknown speakers and of learned voices from previously unheard utterances. Our findings unravel the detailed cortical layout and computational properties of the neural populations at the basis of human speech recognition and speaker identification.

  7. The program complex for vocal recognition

    NASA Astrophysics Data System (ADS)

    Konev, Anton; Kostyuchenko, Evgeny; Yakimuk, Alexey

    2017-01-01

    This article discusses the possibility of applying a pitch-frequency determination algorithm to note recognition problems. A preliminary survey of analogous programs offering a "music recognition" function was carried out. A software package based on the pitch-frequency calculation algorithm was implemented and tested. It was shown that the algorithm can recognize the notes in a user's vocal performance. The sound source can be a single musical instrument, a set of musical instruments, or a human voice humming a tune. The input file is initially presented in .wav format or is recorded in this format from a microphone. Processing is performed by sequentially determining the pitch frequency and converting its values to notes. Based on the test results, modifications of the algorithms used in the complex were planned.
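
    The final pitch-to-note conversion step described above can be sketched as mapping an estimated fundamental frequency to the nearest equal-tempered note. This is a standard formula, not the package's own code; the function name is illustrative.

```python
# Sketch of the pitch-to-note step: map a fundamental frequency (Hz)
# to the nearest equal-tempered note name. Illustrative only.
import math

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note(f_hz, a4=440.0):
    midi = round(69 + 12 * math.log2(f_hz / a4))   # nearest MIDI note number
    return NAMES[midi % 12] + str(midi // 12 - 1)  # e.g. 440 Hz -> "A4"
```

A detected pitch of 261.63 Hz, for instance, maps to "C4" (middle C).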

  8. Evolving Spiking Neural Networks for Recognition of Aged Voices.

    PubMed

    Silva, Marco; Vellasco, Marley M B R; Cataldo, Edson

    2017-01-01

    The aging of the voice, known as presbyphonia, is a natural process that can cause great changes in the vocal quality of the individual. This is a relevant problem to those people who use their voices professionally, and its early identification can help determine a suitable treatment to avoid its progress or even to eliminate the problem. This work focuses on the development of a new model for the identification of aging voices (independently of their chronological age), using as input attributes parameters extracted from the voice and glottal signals. The proposed model, named Quantum binary-real evolving Spiking Neural Network (QbrSNN), is based on spiking neural networks (SNNs), with an unsupervised training algorithm, and a Quantum-Inspired Evolutionary Algorithm that automatically determines the most relevant attributes and the optimal parameters that configure the SNN. The QbrSNN model was evaluated on a database of 120 records, containing samples from three groups of speakers. The results obtained indicate that the proposed model provides better accuracy than other approaches, with fewer input attributes. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
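
    The basic unit of the spiking neural networks underlying models like the one above is the leaky integrate-and-fire (LIF) neuron. The sketch below is a generic LIF illustration only; it does not reproduce the QbrSNN model, its encoding, or its parameters.

```python
# Generic leaky integrate-and-fire (LIF) neuron sketch -- the building
# block of spiking neural networks. Illustrative only; not the QbrSNN.
def lif_spikes(inputs, tau=0.9, threshold=1.0):
    v, spikes = 0.0, []
    for x in inputs:
        v = tau * v + x          # leaky integration of input current
        if v >= threshold:
            spikes.append(1)     # membrane potential crossed threshold
            v = 0.0              # reset after spike
        else:
            spikes.append(0)
    return spikes
```

A constant sub-threshold input accumulates until the neuron fires, then the potential resets and integration starts over.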

  9. Automatic speech recognition research at NASA-Ames Research Center

    NASA Technical Reports Server (NTRS)

    Coler, Clayton R.; Plummer, Robert P.; Huff, Edward M.; Hitchcock, Myron H.

    1977-01-01

    A trainable acoustic pattern recognizer manufactured by Scope Electronics is presented. The voice command system (VCS) encodes speech by sampling 16 bandpass filters with center frequencies in the range from 200 to 5000 Hz. Variations in speaking rate are compensated for by a compression algorithm that subdivides each utterance into eight subintervals in such a way that the amount of spectral change within each subinterval is the same. The recorded filter values within each subinterval are then reduced to a 15-bit representation, giving a 120-bit encoding for each utterance. The VCS incorporates a simple recognition algorithm that utilizes five training samples of each word in a vocabulary of up to 24 words. The recognition rate of approximately 85 percent correct for untrained speakers and 94 percent correct for trained speakers was not considered adequate for flight systems use. Therefore, the built-in recognition algorithm was disabled, and the VCS was modified to transmit 120-bit encodings to an external computer for recognition.
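
    The equal-spectral-change subdivision described above can be sketched as follows. This is a hypothetical reconstruction of the idea, not the VCS firmware: it splits a list of filter-bank frames into a fixed number of subintervals whose cumulative spectral change is roughly equal, then averages the frames in each subinterval.

```python
# Hypothetical sketch of equal-spectral-change time normalisation:
# split an utterance into n_sub segments of similar cumulative spectral
# change and average the frames in each. Not the actual VCS algorithm.
def compress(frames, n_sub=8):
    # frame-to-frame spectral change (L1 distance between filter vectors)
    change = [sum(abs(a - b) for a, b in zip(frames[i], frames[i + 1]))
              for i in range(len(frames) - 1)]
    total = sum(change) or 1.0
    out, seg, acc, target = [], [], 0.0, total / n_sub
    for i, frame in enumerate(frames):
        seg.append(frame)
        if i < len(change):
            acc += change[i]
        # close a segment at each equal-change boundary, or at the end
        if (acc >= target * (len(out) + 1) and len(out) < n_sub - 1) \
                or i == len(frames) - 1:
            out.append([sum(c) / len(seg) for c in zip(*seg)])
            seg = []
    return out
```

With 16 filter frames and n_sub=8, each utterance collapses to 8 averaged spectra regardless of speaking rate, matching the fixed-length encoding idea.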

  10. When the face fits: recognition of celebrities from matching and mismatching faces and voices.

    PubMed

    Stevenage, Sarah V; Neil, Greg J; Hamlin, Iain

    2014-01-01

    The results of two experiments are presented in which participants engaged in a face-recognition or a voice-recognition task. The stimuli were face-voice pairs in which the face and voice were co-presented and were either "matched" (same person), "related" (two highly associated people), or "mismatched" (two unrelated people). Analysis in both experiments confirmed that accuracy and confidence in face recognition were consistently high regardless of the identity of the accompanying voice. However, accuracy of voice recognition was increasingly affected as the relationship between the voice and accompanying face declined. Moreover, when considering self-reported confidence in voice recognition, confidence remained high for correct responses despite the proportion of these responses declining across conditions. These results converged with existing evidence indicating the vulnerability of voice recognition as a relatively weak signaller of identity, and results are discussed in the context of a person-recognition framework.

  11. Obligatory and facultative brain regions for voice-identity recognition

    PubMed Central

    Roswandowitz, Claudia; Kappes, Claudia; Obrig, Hellmuth; von Kriegstein, Katharina

    2018-01-01

    Recognizing the identity of others by their voice is an important skill for social interactions. To date, it remains controversial which parts of the brain are critical structures for this skill. Based on neuroimaging findings, standard models of person-identity recognition suggest that the right temporal lobe is the hub for voice-identity recognition. Neuropsychological case studies, however, reported selective deficits of voice-identity recognition in patients predominantly with right inferior parietal lobe lesions. Here, our aim was to work towards resolving the discrepancy between neuroimaging studies and neuropsychological case studies to find out which brain structures are critical for voice-identity recognition in humans. We performed a voxel-based lesion-behaviour mapping study in a cohort of patients (n = 58) with unilateral focal brain lesions. The study included a comprehensive behavioural test battery on voice-identity recognition of newly learned (voice-name, voice-face association learning) and familiar voices (famous voice recognition) as well as visual (face-identity recognition) and acoustic control tests (vocal-pitch and vocal-timbre discrimination). The study also comprised clinically established tests (neuropsychological assessment, audiometry) and high-resolution structural brain images. The three key findings were: (i) a strong association between voice-identity recognition performance and right posterior/mid temporal and right inferior parietal lobe lesions; (ii) a selective association between right posterior/mid temporal lobe lesions and voice-identity recognition performance when face-identity recognition performance was factored out; and (iii) an association of right inferior parietal lobe lesions with tasks requiring the association between voices and faces but not voices and names. 
The results imply that the right posterior/mid temporal lobe is an obligatory structure for voice-identity recognition, while the inferior parietal lobe is only a facultative component of voice-identity recognition in situations where additional face-identity processing is required. PMID:29228111

  13. Automatic voice recognition using traditional and artificial neural network approaches

    NASA Technical Reports Server (NTRS)

    Botros, Nazeih M.

    1989-01-01

    The main objective of this research is to develop an algorithm for isolated-word recognition. This research is focused on digital signal analysis rather than linguistic analysis of speech. Feature extraction is carried out by applying a Linear Predictive Coding (LPC) algorithm of order 10. Continuous-word and speaker-independent recognition will be considered in a future study after accomplishing this isolated-word research. To examine the similarity between the reference and the training sets, two approaches are explored. The first is implementing traditional pattern recognition techniques where a dynamic time warping algorithm is applied to align the two sets and calculate the probability of matching by measuring the Euclidean distance between the two sets. The second is implementing a backpropagation artificial neural net model with three layers as the pattern classifier. The adaptation rule implemented in this network is the generalized least mean square (LMS) rule. The first approach has been accomplished. A vocabulary of 50 words was selected and tested. The accuracy of the algorithm was found to be around 85 percent. The second approach is in progress at the present time.
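
    The alignment step in the first approach can be sketched with the classic dynamic time warping (DTW) recurrence over frame-wise Euclidean distances. This is a textbook sketch, not the author's implementation; the feature vectors in the example are illustrative.

```python
# Minimal dynamic time warping (DTW) sketch: align two feature sequences
# and accumulate Euclidean frame distances. Illustrative only.
import math

def dtw(a, b):
    INF = float("inf")
    n, m = len(a), len(b)
    # D[i][j]: best cumulative cost aligning a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

Identical sequences align with zero cost, and a repeated frame in one sequence can be absorbed by the warping path, which is exactly why DTW tolerates speaking-rate variation.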

  14. Literature review of voice recognition and generation technology for Army helicopter applications

    NASA Astrophysics Data System (ADS)

    Christ, K. A.

    1984-08-01

    This report is a literature review on the topics of voice recognition and generation. Areas covered are: manual versus vocal data input, vocabulary, stress and workload, noise, protective masks, feedback, and voice warning systems. Results of the studies presented in this report indicate that voice data entry has less of an impact on a pilot's flight performance, during low-level flying and other difficult missions, than manual data entry. However, the stress resulting from such missions may cause the pilot's voice to change, reducing the recognition accuracy of the system. The noise present in helicopter cockpits also causes the recognition accuracy to decrease. Noise-cancelling devices are being developed and improved upon to increase the recognition performance in noisy environments. Future research in the fields of voice recognition and generation should be conducted in the areas of stress and workload, vocabulary, and the types of voice generation best suited for the helicopter cockpit. Also, specific tasks should be studied to determine whether voice recognition and generation can be effectively applied.

  15. Illumination-invariant hand gesture recognition

    NASA Astrophysics Data System (ADS)

    Mendoza-Morales, América I.; Miramontes-Jaramillo, Daniel; Kober, Vitaly

    2015-09-01

    In recent years, human-computer interaction (HCI) has received a lot of interest in industry and science because it provides new ways to interact with modern devices through voice, body, and facial/hand gestures. The application range of HCI extends from easy control of home appliances to entertainment. Hand gesture recognition is a particularly interesting problem because the shape and movement of hands are complex and flexible enough to codify many different signs. In this work we propose a three-step algorithm: first, hands are detected in the current frame; second, hands are tracked across the video sequence; finally, gestures are robustly recognized across subsequent frames. The recognition rate depends strongly on non-uniform illumination of the scene and occlusion of hands. In order to overcome these issues we use two Microsoft Kinect devices, combining information from their RGB and infrared sensors. The algorithm performance is tested in terms of recognition rate and processing time.

  16. Dysphonia Detected by Pattern Recognition of Spectral Composition.

    ERIC Educational Resources Information Center

    Leinonen, Lea; And Others

    1992-01-01

    This study analyzed production of a long vowel sound within Finnish words by normal or dysphonic voices, using the Self-Organizing Map, T. Kohonen's artificial neural network algorithm, which produces two-dimensional representations of speech. The method was found to be both sensitive and specific in the detection of dysphonia. (Author/JDD)

  17. Temporal voice areas exist in autism spectrum disorder but are dysfunctional for voice identity recognition

    PubMed Central

    Borowiak, Kamila; von Kriegstein, Katharina

    2016-01-01

    The ability to recognise the identity of others is a key requirement for successful communication. Brain regions that respond selectively to voices exist in humans from early infancy on. Currently, it is unclear whether dysfunction of these voice-sensitive regions can explain voice identity recognition impairments. Here, we used two independent functional magnetic resonance imaging studies to investigate voice processing in a population that has been reported to have no voice-sensitive regions: autism spectrum disorder (ASD). Our results refute the earlier report that individuals with ASD have no responses in voice-sensitive regions: Passive listening to vocal, compared to non-vocal, sounds elicited typical responses in voice-sensitive regions in the high-functioning ASD group and controls. In contrast, the ASD group had a dysfunction in voice-sensitive regions during voice identity but not speech recognition in the right posterior superior temporal sulcus/gyrus (STS/STG)—a region implicated in processing complex spectrotemporal voice features and unfamiliar voices. The right anterior STS/STG correlated with voice identity recognition performance in controls but not in the ASD group. The findings suggest that right STS/STG dysfunction is critical for explaining voice recognition impairments in high-functioning ASD and show that ASD is not characterised by a general lack of voice-sensitive responses. PMID:27369067

  18. The Voice as Computer Interface: A Look at Tomorrow's Technologies.

    ERIC Educational Resources Information Center

    Lange, Holley R.

    1991-01-01

    Discussion of voice as the communications device for computer-human interaction focuses on voice recognition systems for use within a library environment. Voice technologies are described, including voice response and voice recognition; examples of voice systems in use in libraries are examined; and further possibilities, including use with…

  19. Voice Recognition in Face-Blind Patients

    PubMed Central

    Liu, Ran R.; Pancaroglu, Raika; Hills, Charlotte S.; Duchaine, Brad; Barton, Jason J. S.

    2016-01-01

    Right or bilateral anterior temporal damage can impair face recognition, but whether this is an associative variant of prosopagnosia or part of a multimodal disorder of person recognition is an unsettled question, with implications for cognitive and neuroanatomic models of person recognition. We assessed voice perception and short-term recognition of recently heard voices in 10 subjects with impaired face recognition acquired after cerebral lesions. All 4 subjects with apperceptive prosopagnosia due to lesions limited to fusiform cortex had intact voice discrimination and recognition. One subject with bilateral fusiform and anterior temporal lesions had a combined apperceptive prosopagnosia and apperceptive phonagnosia, the first such described case. Deficits indicating a multimodal syndrome of person recognition were found only in 2 subjects with bilateral anterior temporal lesions. All 3 subjects with right anterior temporal lesions had normal voice perception and recognition, 2 of whom performed normally on perceptual discrimination of faces. This confirms that such lesions can cause a modality-specific associative prosopagnosia. PMID:25349193

  20. Implementation and preliminary evaluation of 'C-tone': A novel algorithm to improve lexical tone recognition in Mandarin-speaking cochlear implant users.

    PubMed

    Ping, Lichuan; Wang, Ningyuan; Tang, Guofang; Lu, Thomas; Yin, Li; Tu, Wenhe; Fu, Qian-Jie

    2017-09-01

    Because of limited spectral resolution, Mandarin-speaking cochlear implant (CI) users have difficulty perceiving fundamental frequency (F0) cues that are important to lexical tone recognition. To improve Mandarin tone recognition in CI users, we implemented and evaluated a novel real-time algorithm (C-tone) to enhance the amplitude contour, which is strongly correlated with the F0 contour. The C-tone algorithm was implemented in clinical processors and evaluated in eight users of the Nurotron NSP-60 CI system. Subjects were given 2 weeks of experience with C-tone. Recognition of Chinese tones, monosyllables, and disyllables in quiet was measured with and without the C-tone algorithm. Subjective quality ratings were also obtained for C-tone. After 2 weeks of experience with C-tone, there were small but significant improvements in recognition of lexical tones, monosyllables, and disyllables (P < 0.05 in all cases). Among lexical tones, the largest improvements were observed for Tone 3 (falling-rising) and the smallest for Tone 4 (falling). Improvements with C-tone were greater for disyllables than for monosyllables. Subjective quality ratings showed no strong preference for or against C-tone, except for perception of own voice, where C-tone was preferred. The real-time C-tone algorithm provided small but significant improvements for speech performance in quiet with no change in sound quality. Pre-processing algorithms to reduce noise and better real-time F0 extraction would improve the benefits of C-tone in complex listening environments. Chinese CI users' speech recognition in quiet can be significantly improved by modifying the amplitude contour to better resemble the F0 contour.
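
    C-tone's specific processing is not published in this abstract, so as a generic illustration only: one simple way to "enhance" a frame-level amplitude contour is to expand it around its mean, exaggerating the rises and falls that correlate with the F0 contour. The function below is hypothetical and is not the C-tone algorithm.

```python
# Hypothetical amplitude-contour enhancement sketch (NOT the C-tone
# algorithm): expand the envelope around its mean to steepen the
# rises and falls that correlate with the F0 contour.
def enhance_contour(env, gain=1.5):
    mean = sum(env) / len(env)
    # push each frame amplitude away from the mean, clamped at zero
    return [max(0.0, mean + gain * (e - mean)) for e in env]
```

With gain > 1 the contour's excursions grow while its mean level is preserved, which is the kind of cue amplification the abstract describes.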

  1. Famous faces and voices: Differential profiles in early right and left semantic dementia and in Alzheimer's disease.

    PubMed

    Luzzi, Simona; Baldinelli, Sara; Ranaldi, Valentina; Fabi, Katia; Cafazzo, Viviana; Fringuelli, Fabio; Silvestrini, Mauro; Provinciali, Leandro; Reverberi, Carlo; Gainotti, Guido

    2017-01-08

    Famous face and voice recognition is reported to be impaired both in semantic dementia (SD) and in Alzheimer's Disease (AD), although more severely in the former. In AD a coexistence of perceptual impairment in face and voice processing has also been reported and this could contribute to the altered performance in complex semantic tasks. On the other hand, in SD both face and voice recognition disorders could be related to the prevalence of atrophy in the right temporal lobe (RTL). The aim of the present study was twofold: (1) to investigate famous face and voice recognition in SD and AD to verify if the two diseases show a differential pattern of impairment, resulting from disruption of different cognitive mechanisms; (2) to check if face and voice recognition disorders prevail in patients with atrophy mainly affecting the RTL. To avoid the potential influence of primary perceptual problems in face and voice recognition, a pool of patients suffering from early SD and AD were administered a detailed set of tests exploring face and voice perception. Thirteen SD (8 with prevalence of right and 5 with prevalence of left temporal atrophy) and 25 AD patients, who did not show visual and auditory perceptual impairment, were finally selected and were administered an experimental battery exploring famous face and voice recognition and naming. Twelve SD patients underwent cerebral PET imaging and were classified as right or left SD according to the onset modality and to the prevalent decrease in FDG uptake in the right or left temporal lobe, respectively. Correlation of PET imaging and famous face and voice recognition was performed. Results showed a differential performance profile in the two diseases, because AD patients were significantly impaired in the naming tests, but showed preserved recognition, whereas SD patients were profoundly impaired both in naming and in recognition of famous faces and voices. 
Furthermore, face and voice recognition disorders prevailed in SD patients with RTL atrophy, who also showed a conceptual impairment on the Pyramids and Palm Trees test, more marked in the pictorial than in the verbal modality. Finally, in the 12 SD patients in whom PET was available, a strong correlation between FDG uptake and face-to-name and voice-to-name matching data was found in the right but not in the left temporal lobe. The data support the hypothesis of a different cognitive basis for impairment of face and voice recognition in the two dementias and suggest that the pattern of impairment in SD may be due to a loss of semantic representations, while a defect of semantic control, with impaired naming and preserved recognition, might be hypothesized in AD. Furthermore, the correlation between face and voice recognition disorders and RTL damage is consistent with the hypothesis that in the RTL person-specific knowledge may be mainly based upon non-verbal representations. Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. An estimate of the prevalence of developmental phonagnosia.

    PubMed

    Shilowich, Bryan E; Biederman, Irving

    2016-08-01

    A web-based survey estimated the distribution of voice recognition abilities with a focus on determining the prevalence of developmental phonagnosia, the inability to identify a familiar person based on their voice. Participants matched clips of 50 celebrity voices to 1-4 named headshots of celebrities whose voices they had previously rated for familiarity. Given a strong correlation between rated familiarity and recognition performance, a residual score was calculated from the average familiarity rating on each trial, representing the portion of each respondent's voice recognition ability that could not be accounted for by familiarity. 3.2% of the respondents (23 of 730 participants) had residual recognition scores 2.28 SDs below the mean (whereas 8, or 1.1%, would have been expected from a normal distribution). Participants also judged whether they could imagine the voice of five familiar celebrities. Individuals who had difficulty in imagining voices were also generally below average in their accuracy of recognition. Copyright © 2016 Elsevier Inc. All rights reserved.
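
    The residual logic described above amounts to regressing recognition accuracy on familiarity and standardizing each respondent's residual. The sketch below illustrates this with ordinary least squares in pure Python; the function name and the toy data in the example are hypothetical, not the study's data.

```python
# Sketch of familiarity-adjusted recognition scoring: simple linear
# regression of accuracy on familiarity, then z-scored residuals.
# Illustrative only; data and names are hypothetical.
def residual_zscores(familiarity, accuracy):
    n = len(familiarity)
    mx = sum(familiarity) / n
    my = sum(accuracy) / n
    sxx = sum((x - mx) ** 2 for x in familiarity)
    sxy = sum((x - mx) * (y - my) for x, y in zip(familiarity, accuracy))
    slope = sxy / sxx
    # residual: accuracy not explained by familiarity
    resid = [y - (my + slope * (x - mx)) for x, y in zip(familiarity, accuracy)]
    sd = (sum(r * r for r in resid) / (n - 1)) ** 0.5
    return [r / sd for r in resid]
```

A respondent whose z-scored residual falls far below zero recognizes voices worse than their familiarity ratings predict, which is the study's criterion for flagging possible phonagnosia.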

  3. Voice Recognition: A New Assessment Tool?

    ERIC Educational Resources Information Center

    Jones, Darla

    2005-01-01

    This article presents the results of a study conducted in Anchorage, Alaska, that evaluated the accuracy and efficiency of using voice recognition (VR) technology to collect oral reading fluency data for classroom-based assessments. The primary research question was as follows: Is voice recognition technology a valid and reliable alternative to…

  4. Children's Recognition of Cartoon Voices.

    ERIC Educational Resources Information Center

    Spence, Melanie J.; Rollins, Pamela R.; Jerger, Susan

    2002-01-01

    A study examined developmental changes in talker recognition skills by assessing 72 children's (ages 3-5) recognition of 20 cartoon characters' voices. Four- and 5-year-old children recognized more of the voices than did 3-year-olds. All children were more accurate at recognizing more familiar characters than less familiar characters. (Contains…

  5. Voice Recognition Software Accuracy with Second Language Speakers of English.

    ERIC Educational Resources Information Center

    Coniam, D.

    1999-01-01

    Explores the potential of the use of voice-recognition technology with second-language speakers of English. Involves the analysis of the output produced by a small group of very competent second-language subjects reading a text into the voice recognition software Dragon Systems "Dragon NaturallySpeaking." (Author/VWL)

  6. Strength Is in Numbers: Can Concordant Artificial Listeners Improve Prediction of Emotion from Speech?

    PubMed

    Martinelli, Eugenio; Mencattini, Arianna; Daprati, Elena; Di Natale, Corrado

    2016-01-01

    Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Although numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet, fast and reliable applications for emotion recognition are the obvious advancement of present 'intelligent personal assistants', and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performances relative to a public database of spontaneous speeches are reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc.) and for applications in the research domain (such as real-time pairing of stimuli to participants' emotional state, selective/differential data collection based on emotional content, etc.).
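
    The abstract does not spell out the consensus mechanism, so the following is only a hedged illustration of the general idea (not the authors' model): several independent "artificial listeners" label an utterance, and a consensus step retains a label only when enough listeners agree.

```python
# Illustrative sketch: combine the emotion labels proposed by several
# "artificial listeners" (independent classifiers) via majority vote,
# keeping only labels with sufficient agreement. The threshold value
# and listener outputs below are hypothetical.
from collections import Counter

def consensus_label(predictions, min_agreement=0.6):
    """Return the majority emotion label if enough listeners agree,
    otherwise None (the utterance is left unlabeled)."""
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    if votes / len(predictions) >= min_agreement:
        return label
    return None

# Three hypothetical listeners label the same utterance:
print(consensus_label(["happy", "happy", "neutral"]))  # -> happy
print(consensus_label(["happy", "sad", "neutral"]))    # -> None
```

    In a semi-supervised setting, utterances that reach consensus could then be added to the training pool, while contested ones are left out.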

  7. It doesn't matter what you say: FMRI correlates of voice learning and recognition independent of speech content.

    PubMed

    Zäske, Romi; Awwad Shiekh Hasan, Bashar; Belin, Pascal

    2017-09-01

    Listeners can recognize newly learned voices from previously unheard utterances, suggesting the acquisition of high-level speech-invariant voice representations during learning. Using functional magnetic resonance imaging (fMRI) we investigated the anatomical basis underlying the acquisition of voice representations for unfamiliar speakers independent of speech, and their subsequent recognition among novel voices. Specifically, listeners studied voices of unfamiliar speakers uttering short sentences and subsequently classified studied and novel voices as "old" or "new" in a recognition test. To investigate "pure" voice learning, i.e., independent of sentence meaning, we presented German sentence stimuli to non-German speaking listeners. To disentangle stimulus-invariant and stimulus-dependent learning, during the test phase we contrasted a "same sentence" condition in which listeners heard speakers repeating the sentences from the preceding study phase, with a "different sentence" condition. Voice recognition performance was above chance in both conditions although, as expected, performance was higher for same than for different sentences. During study phases activity in the left inferior frontal gyrus (IFG) was related to subsequent voice recognition performance and same versus different sentence condition, suggesting an involvement of the left IFG in the interactive processing of speaker and speech information during learning. Importantly, at test reduced activation for voices correctly classified as "old" compared to "new" emerged in a network of brain areas including temporal voice areas (TVAs) of the right posterior superior temporal gyrus (pSTG), as well as the right inferior/middle frontal gyrus (IFG/MFG), the right medial frontal gyrus, and the left caudate. 
This effect of voice novelty did not interact with sentence condition, suggesting a role of temporal voice-selective areas and extra-temporal areas in the explicit recognition of learned voice identity, independent of speech content. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. Evaluation of a voice recognition system for the MOTAS pseudo pilot station function

    NASA Technical Reports Server (NTRS)

    Houck, J. A.

    1982-01-01

    The Langley Research Center has undertaken a technology development activity to provide a capability, the mission oriented terminal area simulation (MOTAS), wherein terminal area and aircraft systems studies can be performed. An experiment was conducted to evaluate state-of-the-art voice recognition technology, specifically the Threshold 600 voice recognition system, as an aircraft control input device for the MOTAS pseudo pilot station function. The results of the experiment using ten subjects showed a recognition error of 3.67 percent for a 48-word vocabulary tested against a programmed vocabulary of 103 words. After the ten subjects retrained the Threshold 600 system on the words that were misrecognized or rejected, the recognition error decreased to 1.96 percent. The rejection rates for both cases were less than 0.70 percent. Based on the results of the experiment, voice recognition technology, and specifically the Threshold 600 voice recognition system, was chosen to fulfill this MOTAS function.

  9. Effects of emotional and perceptual-motor stress on a voice recognition system's accuracy: An applied investigation

    NASA Astrophysics Data System (ADS)

    Poock, G. K.; Martin, B. J.

    1984-02-01

    This was an applied investigation examining the ability of a speech recognition system to recognize speakers' inputs when the speakers were under different stress levels. Subjects were asked to speak to a voice recognition system under three conditions: (1) normal office environment, (2) emotional stress, and (3) perceptual-motor stress. Results indicate a definite relationship between voice recognition system performance and the type of low stress reference patterns used to achieve recognition.

  10. Benefits for Voice Learning Caused by Concurrent Faces Develop over Time.

    PubMed

    Zäske, Romi; Mühl, Constanze; Schweinberger, Stefan R

    2015-01-01

    Recognition of personally familiar voices benefits from the concurrent presentation of the corresponding speakers' faces. This effect of audiovisual integration is most pronounced for voices combined with dynamic articulating faces. However, it is unclear if learning unfamiliar voices also benefits from audiovisual face-voice integration or, alternatively, is hampered by attentional capture of faces, i.e., "face-overshadowing". In six study-test cycles we compared the recognition of newly-learned voices following unimodal voice learning vs. bimodal face-voice learning with either static (Exp. 1) or dynamic articulating faces (Exp. 2). Voice recognition accuracies significantly increased for bimodal learning across study-test cycles while remaining stable for unimodal learning, as reflected in numerical costs of bimodal relative to unimodal voice learning in the first two study-test cycles and benefits in the last two cycles. This was independent of whether faces were static images (Exp. 1) or dynamic videos (Exp. 2). In both experiments, slower reaction times to voices previously studied with faces compared to voices only may result from visual search for faces during memory retrieval. A general decrease of reaction times across study-test cycles suggests facilitated recognition with more speaker repetitions. Overall, our data suggest two simultaneous and opposing mechanisms during bimodal face-voice learning: while attentional capture of faces may initially impede voice learning, audiovisual integration may facilitate it thereafter.

  11. Memory strength and specificity revealed by pupillometry

    PubMed Central

    Papesh, Megan H.; Goldinger, Stephen D.; Hout, Michael C.

    2011-01-01

    Voice-specificity effects in recognition memory were investigated using both behavioral data and pupillometry. Volunteers initially heard spoken words and nonwords in two voices; they later provided confidence-based old/new classifications to items presented in their original voices, changed (but familiar) voices, or entirely new voices. Recognition was more accurate for old-voice items, replicating prior research. Pupillometry was used to gauge cognitive demand during both encoding and testing: Enlarged pupils revealed that participants devoted greater effort to encoding items that were subsequently recognized. Further, pupil responses were sensitive to the cue match between encoding and retrieval voices, as well as memory strength. Strong memories, and those with the closest encoding-retrieval voice matches, resulted in the highest peak pupil diameters. The results are discussed with respect to episodic memory models and Whittlesea’s (1997) SCAPE framework for recognition memory. PMID:22019480

  12. Study to determine potential flight applications and human factors design guidelines for voice recognition and synthesis systems

    NASA Astrophysics Data System (ADS)

    White, R. W.; Parks, D. L.

    1985-07-01

    A study was conducted to determine potential commercial aircraft flight deck applications and implementation guidelines for voice recognition and synthesis. First, a survey of voice recognition and synthesis technology was undertaken to develop a working knowledge base. Then, numerous potential aircraft and simulator flight deck voice applications were identified, and each proposed application was rated on a number of criteria to produce an overall payoff rating. The potential voice recognition applications fell into five general categories: programming, interrogation, data entry, switch and mode selection, and continuous/time-critical action control. The ratings of the first three categories showed the most promise of being beneficial to flight deck operations. Possible applications of voice synthesis systems were categorized as automatic or pilot selectable, and many were rated as potentially beneficial. In addition, voice system implementation guidelines and pertinent performance criteria are proposed. Finally, the findings of this study are compared with those made in a recent NASA study of a 1995 transport concept.

  13. Study to determine potential flight applications and human factors design guidelines for voice recognition and synthesis systems

    NASA Technical Reports Server (NTRS)

    White, R. W.; Parks, D. L.

    1985-01-01

    A study was conducted to determine potential commercial aircraft flight deck applications and implementation guidelines for voice recognition and synthesis. First, a survey of voice recognition and synthesis technology was undertaken to develop a working knowledge base. Then, numerous potential aircraft and simulator flight deck voice applications were identified, and each proposed application was rated on a number of criteria to produce an overall payoff rating. The potential voice recognition applications fell into five general categories: programming, interrogation, data entry, switch and mode selection, and continuous/time-critical action control. The ratings of the first three categories showed the most promise of being beneficial to flight deck operations. Possible applications of voice synthesis systems were categorized as automatic or pilot selectable, and many were rated as potentially beneficial. In addition, voice system implementation guidelines and pertinent performance criteria are proposed. Finally, the findings of this study are compared with those made in a recent NASA study of a 1995 transport concept.

  14. Pilot study on the feasibility of a computerized speech recognition charting system.

    PubMed

    Feldman, C A; Stevens, D

    1990-08-01

    The objective of this study was to determine the feasibility of developing and using a voice recognition computerized charting system to record dental clinical examination data. More specifically, the study was designed to analyze the time and error differential between the traditional examiner/recorder method (ASSISTANT) and the computerized voice recognition method (VOICE). DMFS examinations were performed twice on 20 patients using the traditional ASSISTANT and the VOICE charting system. A statistically significant difference was found when comparing the mean ASSISTANT time of 2.69 min to the VOICE time of 3.72 min (P less than 0.001). No statistically significant difference was found when comparing the mean ASSISTANT recording errors of 0.1 to the VOICE recording errors of 0.6 (P = 0.059). Ninety percent of the patients indicated they felt comfortable with the dentist talking to a computer, and only 5% of the sample indicated they opposed VOICE. Results from this pilot study indicate that a charting system utilizing voice recognition technology could be considered a viable alternative to traditional examiner/recorder methods of clinical charting.

  15. Familiar Person Recognition: Is Autonoetic Consciousness More Likely to Accompany Face Recognition Than Voice Recognition?

    NASA Astrophysics Data System (ADS)

    Barsics, Catherine; Brédart, Serge

    2010-11-01

    Autonoetic consciousness is a fundamental property of human memory, enabling us to experience mental time travel, to recollect past events with a feeling of self-involvement, and to project ourselves into the future. Autonoetic consciousness is a characteristic of episodic memory. By contrast, awareness of the past associated with a mere feeling of familiarity or knowing relies on noetic consciousness, which depends on semantic memory integrity. The present research was aimed at evaluating whether conscious recollection of episodic memories is more likely to occur following the recognition of a familiar face than following the recognition of a familiar voice. Recall of semantic (biographical) information was also assessed. Previous studies that investigated the recall of biographical information following person recognition used faces and voices of famous people as stimuli. In this study, the participants were presented with personally familiar people's voices and faces, thus avoiding the presence of identity cues in the spoken extracts and allowing stricter control of frequency of exposure for both types of stimuli (voices and faces). The rate of retrieved episodic memories, associated with autonoetic awareness, was significantly higher for familiar faces than for familiar voices, even though the level of overall recognition was similar for both stimulus domains. The same pattern was observed for semantic information retrieval. These results and their implications for current Interactive Activation and Competition person recognition models are discussed.

  16. Robust matching for voice recognition

    NASA Astrophysics Data System (ADS)

    Higgins, Alan; Bahler, L.; Porter, J.; Blais, P.

    1994-10-01

    This paper describes an automated method of comparing a voice sample of an unknown individual with samples from known speakers in order to establish or verify the individual's identity. The method is based on a statistical pattern matching approach that employs a simple training procedure, requires no human intervention (transcription, word or phonetic marking, etc.), and makes no assumptions regarding the expected form of the statistical distributions of the observations. The content of the speech material (vocabulary, grammar, etc.) is not assumed to be constrained in any way. An algorithm is described which incorporates frame pruning and channel equalization processes designed to achieve robust performance with reasonable computational resources. An experimental implementation demonstrating the feasibility of the concept is described.
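
    The frame pruning and channel equalization steps named above can be sketched under common conventions. Cepstral mean subtraction is one standard channel-equalization technique, and energy-based selection is one common form of frame pruning; the paper's exact algorithm is not reproduced here, and the array shapes and threshold below are illustrative.

```python
# Sketch of two standard ingredients of robust speaker matching:
# cepstral mean subtraction (channel equalization) and energy-based
# frame pruning. Random data stands in for real cepstral features.
import numpy as np

def equalize_channel(cepstra):
    """Subtract the per-coefficient mean over time. A stationary
    (convolutional) channel adds a constant offset in the cepstral
    domain, so mean subtraction removes it."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def prune_frames(cepstra, energies, keep_fraction=0.7):
    """Keep only the highest-energy frames (e.g. drop silence),
    preserving their time order."""
    n_keep = max(1, int(len(energies) * keep_fraction))
    idx = np.argsort(energies)[-n_keep:]
    return cepstra[np.sort(idx)]

rng = np.random.default_rng(0)
cep = rng.normal(size=(100, 12)) + 3.0      # constant "channel" offset
eq = equalize_channel(cep)
pruned = prune_frames(eq, rng.random(100))
print(eq.mean(axis=0).round(6))              # ~0 per coefficient
print(pruned.shape)                          # (70, 12)
```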

  17. Superior voice recognition in a patient with acquired prosopagnosia and object agnosia.

    PubMed

    Hoover, Adria E N; Démonet, Jean-François; Steeves, Jennifer K E

    2010-11-01

    Anecdotally, it has been reported that individuals with acquired prosopagnosia compensate for their inability to recognize faces by using other person identity cues such as hair, gait or the voice. Are they therefore superior at using non-face cues, specifically voices, for person identity? Here, we empirically measure person and object identity recognition in a patient with acquired prosopagnosia and object agnosia. We quantify person identity (face and voice) and object identity (car and horn) recognition for visual, auditory, and bimodal (visual and auditory) stimuli. The patient is unable to recognize faces or cars, consistent with his prosopagnosia and object agnosia, respectively. He is perfectly able to recognize people's voices, car horns, and bimodal stimuli. These data show a reversal of the typical weighting of visual over auditory information for audiovisual stimuli in a compromised visual recognition system. Moreover, the patient shows selectively superior voice recognition compared to the controls, revealing that two different stimulus domains, persons and objects, may not be equally affected by sensory adaptation effects. This also implies that person and object identity recognition are processed in separate pathways. These data demonstrate that an individual with acquired prosopagnosia and object agnosia can compensate for the visual impairment and become quite skilled at using spared aspects of sensory processing. In the case of acquired prosopagnosia, it is advantageous to develop superior use of voices for person identity recognition in everyday life. Copyright © 2010 Elsevier Ltd. All rights reserved.

  18. Do What I Say! Voice Recognition Makes Major Advances.

    ERIC Educational Resources Information Center

    Ruley, C. Dorsey

    1994-01-01

    Explains voice recognition technology applications in the workplace, schools, and libraries. Highlights include a voice-controlled work station using the DragonDictate system that can be used with dyslexic students, converting text to speech, and converting speech to text. (LRW)

  19. The recognition of female voice based on voice registers in singing techniques in real-time using hankel transform method and macdonald function

    NASA Astrophysics Data System (ADS)

    Meiyanti, R.; Subandi, A.; Fuqara, N.; Budiman, M. A.; Siahaan, A. P. U.

    2018-03-01

    A singer does not merely recite the lyrics of a song but also uses particular vocal techniques to make it more beautiful. In singing technique, female singers have a more diverse set of voice registers than male singers. The human voice has many registers, but those used while singing include the chest voice, head voice, falsetto, and vocal fry. A recognition system for female voice registers in singing technique was built using Borland Delphi 7.0. Recognition is performed both on recorded voice samples and in real time. The voice input yields weighted energy values calculated with the Hankel transform method and Macdonald functions. The results showed that the accuracy of the system depends on the accuracy of the vocal technique that is trained and tested; the average recognition rate reached 48.75 percent for recorded voice registers and 57 percent for voice registers recognized in real time.

  20. Strength Is in Numbers: Can Concordant Artificial Listeners Improve Prediction of Emotion from Speech?

    PubMed Central

    Martinelli, Eugenio; Mencattini, Arianna; Di Natale, Corrado

    2016-01-01

    Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Although numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet, fast and reliable applications for emotion recognition are the obvious advancement of present ‘intelligent personal assistants’, and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performances relative to a public database of spontaneous speeches are reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc.) and for applications in the research domain (such as real-time pairing of stimuli to participants’ emotional state, selective/differential data collection based on emotional content, etc.). PMID:27563724

  1. Voice gender and the segregation of competing talkers: Perceptual learning in cochlear implant simulations

    PubMed Central

    Sullivan, Jessica R.; Assmann, Peter F.; Hossain, Shaikat; Schafer, Erin C.

    2017-01-01

    Two experiments explored the role of differences in voice gender in the recognition of speech masked by a competing talker in cochlear implant simulations. Experiment 1 confirmed that listeners with normal hearing receive little benefit from differences in voice gender between a target and masker sentence in four- and eight-channel simulations, consistent with previous findings that cochlear implants deliver an impoverished representation of the cues for voice gender. However, gender differences led to small but significant improvements in word recognition with 16 and 32 channels. Experiment 2 assessed the benefits of perceptual training on the use of voice gender cues in an eight-channel simulation. Listeners were assigned to one of four groups: (1) word recognition training with target and masker differing in gender; (2) word recognition training with same-gender target and masker; (3) gender recognition training; or (4) control with no training. Significant improvements in word recognition were observed from pre- to post-test sessions for all three training groups compared to the control group. These improvements were maintained at the late session (one week following the last training session) for all three groups. There was an overall improvement in masked word recognition performance provided by gender mismatch following training, but the amount of benefit did not differ as a function of the type of training. The training effects observed here are consistent with a form of rapid perceptual learning that contributes to the segregation of competing voices but does not specifically enhance the benefits provided by voice gender cues. PMID:28372046

  2. Voice, Schooling, Inequality, and Scale

    ERIC Educational Resources Information Center

    Collins, James

    2013-01-01

    The rich studies in this collection show that the investigation of voice requires analysis of "recognition" across layered spatial-temporal and sociolinguistic scales. I argue that the concepts of voice, recognition, and scale provide insight into contemporary educational inequality and that their study benefits, in turn, from paying attention to…

  3. Motorcycle Start-stop System based on Intelligent Biometric Voice Recognition

    NASA Astrophysics Data System (ADS)

    Winda, A.; E Byan, W. R.; Sofyan; Armansyah; Zariantin, D. L.; Josep, B. G.

    2017-03-01

    The current mechanical key in a motorcycle is prone to burglary, theft, or misplacement. Intelligent biometric voice recognition is proposed as an alternative to replace this mechanism. The proposed system decides whether the voice belongs to the user and whether the word uttered is ‘On’ or ‘Off’. The decision is sent to an Arduino to start or stop the engine. The recorded voice is processed to extract features that are later used as input to the proposed system. The Mel-Frequency Cepstral Coefficient (MFCC) is adopted as the feature extraction technique, and the extracted features are used as input to an SVM-based identifier. Experimental results confirm the effectiveness of the proposed intelligent voice recognition and word recognition system: the proposed method produces good training and testing accuracies of 99.31% and 99.43%, respectively. Moreover, the proposed system shows a false rejection rate (FRR) of 0.18% and a false acceptance rate (FAR) of 17.58%. For intelligent word recognition, the training and testing accuracies are 100% and 96.3%, respectively.
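
    A minimal sketch of the MFCC front end named in the abstract can be written in NumPy. The pipeline (framing, windowing, FFT, mel filterbank, log, DCT) is standard; the parameter values below are common defaults, not values from the paper, and a real system would feed these coefficients to a tuned SVM.

```python
# Minimal MFCC sketch: frame -> Hamming window -> power spectrum ->
# triangular mel filterbank -> log -> DCT-II, keeping the first
# coefficients. Parameter defaults are illustrative assumptions.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_mels=26, n_ceps=13):
    # Slice the signal into overlapping Hamming-windowed frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len) + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular mel filterbank, equally spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_mels, spec.shape[1]))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(spec @ fbank.T + 1e-10)
    # DCT-II decorrelates the log-mel energies; keep the first n_ceps.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T

tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
print(mfcc(tone).shape)  # (n_frames, n_ceps)
```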

  4. Voice tracking and spoken word recognition in the presence of other voices

    NASA Astrophysics Data System (ADS)

    Litong-Palima, Marisciel; Violanda, Renante; Saloma, Caesar

    2004-12-01

    We study the human hearing process by modeling the hair cell as a thresholded Hopf bifurcator and compare our calculations with experimental results involving human subjects in two different multi-source listening tasks: voice tracking and spoken-word recognition. In the model, we observed noise suppression by destructive interference between noise sources, which weakens the effective noise strength acting on the hair cell. Different success-rate characteristics were observed for the two tasks. Hair cell performance at low threshold levels agrees well with results from voice-tracking experiments, while the word-recognition results are consistent with a linear model of the hearing process. The ability of humans to track a target voice is robust against cross-talk interference, unlike word-recognition performance, which deteriorates quickly with the number of uncorrelated noise sources in the environment, a response behavior associated with linear systems.

  5. Cockpit voice recognition program at Princeton University

    NASA Technical Reports Server (NTRS)

    Huang, C. Y.

    1983-01-01

    Voice recognition technology (VRT) is applied to aeronautics, particularly to pilot workload alleviation. VRT no longer has to prove its maturity. The feasibility of voice tuning of the radio and DME is demonstrated, since it offers immediate advantages to the pilot and can be completed in a reasonable time.

  6. The processing of auditory and visual recognition of self-stimuli.

    PubMed

    Hughes, Susan M; Nicholson, Shevon E

    2010-12-01

    This study examined self-recognition processing in both the auditory and visual modalities by determining how comparable hearing a recording of one's own voice was to seeing a photograph of one's own face. We also investigated whether the simultaneous presentation of auditory and visual self-stimuli would either facilitate or inhibit self-identification. Ninety-one participants completed reaction-time tasks of self-recognition when presented with their own faces, own voices, and combinations of the two. Reaction time and errors made when responding with both the right and left hand were recorded to determine if there were lateralization effects on these tasks. Our findings showed that visual self-recognition for facial photographs appears to be superior to auditory self-recognition for voice recordings. Furthermore, a combined presentation of one's own face and voice appeared to inhibit rather than facilitate self-recognition, and there was a left-hand advantage for reaction time on the combined-presentation tasks. Copyright © 2010 Elsevier Inc. All rights reserved.

  7. Classifying Human Voices by Using Hybrid SFX Time-Series Preprocessing and Ensemble Feature Selection

    PubMed Central

    Wong, Raymond

    2013-01-01

    Voice biometrics is a physiological characteristic: each individual's voice is unique. Due to this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), emotional state, identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features in training a classification model, based on piecewise transformation treating an audio waveform as a time-series. Using SFX we can faithfully remodel the statistical characteristics of the time-series; together with spectral analysis, a substantial number of features are extracted in combination. An ensemble is utilized to select only the influential features for use in classification model induction. We focus on comparing the effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the fact that heuristically choosing significant features from both the time and frequency domains indeed produces better performance in voice classification than traditional signal processing techniques alone, like wavelets and LPC-to-CC. PMID:24288684
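
    As a rough illustration of piecewise statistical feature extraction from a waveform treated as a time-series: split the signal into equal segments and summarize each with simple statistics. This only sketches the general idea; the paper's actual SFX feature set, segment count, and statistics are not reproduced here.

```python
# Illustrative piecewise time-series features: per-segment mean,
# standard deviation, min, and max, concatenated into one vector.
# Segment count and statistic choices are assumptions for the sketch.
import numpy as np

def piecewise_stats(signal, n_segments=8):
    feats = []
    for seg in np.array_split(np.asarray(signal, float), n_segments):
        feats.extend([seg.mean(), seg.std(), seg.min(), seg.max()])
    return np.array(feats)

x = np.sin(np.linspace(0, 20 * np.pi, 4000))
print(piecewise_stats(x).shape)  # 8 segments x 4 statistics = (32,)
```

    Such a fixed-length vector can then be passed to any standard classifier, optionally after an ensemble-based feature selection step as the abstract describes.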

  8. Impact of a voice recognition system on report cycle time and radiologist reading time

    NASA Astrophysics Data System (ADS)

    Melson, David L.; Brophy, Robert; Blaine, G. James; Jost, R. Gilbert; Brink, Gary S.

    1998-07-01

    Because of its exciting potential to improve clinical service, as well as reduce costs, a voice recognition system for radiological dictation was recently installed at our institution. This system will be clinically successful if it dramatically reduces radiology report turnaround time without substantially affecting radiologist dictation and editing time. This report summarizes an observer study currently under way in which radiologist reporting times using the traditional transcription system and the voice recognition system are compared. Four radiologists are observed interpreting portable intensive care unit (ICU) chest examinations at a workstation in the chest reading area. Data are recorded with the radiologists using the transcription system and using the voice recognition system. The measurements distinguish between time spent performing clerical tasks and time spent actually dictating the report. Editing time and the number of corrections made are recorded. Additionally, statistics are gathered to assess the voice recognition system's impact on the report cycle time -- the time from report dictation to availability of an edited and finalized report -- and the length of reports.

  9. A Novel Fast and Secure Approach for Voice Encryption Based on DNA Computing

    NASA Astrophysics Data System (ADS)

    Kakaei Kate, Hamidreza; Razmara, Jafar; Isazadeh, Ayaz

    2018-06-01

    Today, in the world of information communication, voice data has particular importance. One way to protect voice data from attack is voice encryption. Encryption algorithms use various techniques such as hashing, chaotic maps, mixing, and many others. In this paper, an algorithm is proposed for voice encryption based on three different schemes to increase the flexibility and strength of the algorithm. The proposed algorithm uses an innovative encoding scheme, the DNA encryption technique, and a permutation function to provide a secure and fast solution for voice encryption. The algorithm is evaluated based on various measures including signal to noise ratio, peak signal to noise ratio, correlation coefficient, signal similarity, and signal frequency content. The results demonstrate the applicability of the proposed method for secure and fast encryption of voice files.
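The paper's concrete encoding and permutation are not reproduced in the abstract. The following is only an illustrative sketch of the general DNA-computing idea (bytes mapped to the four bases, then scrambled by a keyed permutation); all names are invented for this example and it is not a secure cipher:

```python
import random

BASES = "ACGT"

def bytes_to_dna(data):
    """Encode each byte as four DNA bases (2 bits per base, MSB first)."""
    return "".join(BASES[(b >> s) & 0b11] for b in data for s in (6, 4, 2, 0))

def dna_to_bytes(seq):
    """Inverse of bytes_to_dna."""
    vals = [BASES.index(c) for c in seq]
    return bytes((vals[i] << 6) | (vals[i + 1] << 4) | (vals[i + 2] << 2) | vals[i + 3]
                 for i in range(0, len(vals), 4))

def permute(seq, key):
    """Shuffle the base sequence with a key-seeded permutation."""
    rng = random.Random(key)
    order = list(range(len(seq)))
    rng.shuffle(order)
    return "".join(seq[i] for i in order), order

def unpermute(seq, order):
    """Undo permute() given the same permutation order."""
    out = [""] * len(seq)
    for dst, src in enumerate(order):
        out[src] = seq[dst]
    return "".join(out)
```

A real scheme would derive the permutation from a proper key schedule and add a substitution stage; this sketch only shows the encode-permute-decode round trip.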

  10. Micro-Based Speech Recognition: Instructional Innovation for Handicapped Learners.

    ERIC Educational Resources Information Center

    Horn, Carin E.; Scott, Brian L.

    A new voice-based learning system (VBLS), which allows the handicapped user to interact with a microcomputer by voice commands, is described. Speech or voice recognition is the computerized process of identifying a spoken word or phrase, including those resulting from speech impediments. This new technology is helpful to the severely physically…

  11. Embodied Transcription: A Creative Method for Using Voice-Recognition Software

    ERIC Educational Resources Information Center

    Brooks, Christine

    2010-01-01

    Voice-recognition software is designed to be used by one user (voice) at a time, requiring a researcher to speak all of the words of a recorded interview to achieve transcription. Thus, the researcher becomes a conduit through which interview material is inscribed as written word. Embodied Transcription acknowledges performative and interpretative…

  12. Investigations of Hemispheric Specialization of Self-Voice Recognition

    ERIC Educational Resources Information Center

    Rosa, Christine; Lassonde, Maryse; Pinard, Claudine; Keenan, Julian Paul; Belin, Pascal

    2008-01-01

    Three experiments investigated functional asymmetries related to self-recognition in the domain of voices. In Experiment 1, participants were asked to identify one of three presented voices (self, familiar or unknown) by responding with either the right or the left hand. In Experiment 2, participants were presented with auditory morphs between the…

  13. [Computed assisted voice recognition. A dream or reality in the pathologist's routine work?].

    PubMed

    Delling, G; Delling, D

    1999-03-01

    During the last 30 years, the analysis of human speech with powerful computers has taken great strides; cost-effective, comfortable solutions are therefore now available for professional routine work. The advantages of using voice recognition are the creation of new documentation and archives, reduced personnel costs and, last but not least, independence from transcription staff in cases of unforeseen illness or annual leave. For voice recognition systems to be used effectively, a considerable amount of training time must be invested during the first 3 months. Younger colleagues in particular will be motivated to dictate more precisely and in more detail because of the introduction of voice recognition. The effects on other sectors of medical training, quality control, and histology report preparation and transmission can only be speculated upon.

  14. Normal voice processing after posterior superior temporal sulcus lesion.

    PubMed

    Jiahui, Guo; Garrido, Lúcia; Liu, Ran R; Susilo, Tirta; Barton, Jason J S; Duchaine, Bradley

    2017-10-01

    The right posterior superior temporal sulcus (pSTS) shows a strong response to voices, but the cognitive processes generating this response are unclear. One possibility is that this activity reflects basic voice processing. However, several fMRI and magnetoencephalography findings suggest instead that pSTS serves as an integrative hub that combines voice and face information. Here we investigate whether right pSTS contributes to basic voice processing by testing Faith, a patient whose right pSTS was resected, with eight behavioral tasks assessing voice identity perception and recognition, voice sex perception, and voice expression perception. Faith performed normally on all the tasks. Her normal performance indicates right pSTS is not necessary for intact voice recognition and suggests that pSTS activations to voices reflect higher-level processes. Copyright © 2017 Elsevier Ltd. All rights reserved.

  15. Noise Robust Speech Recognition Applied to Voice-Driven Wheelchair

    NASA Astrophysics Data System (ADS)

    Sasou, Akira; Kojima, Hiroaki

    2009-12-01

    Conventional voice-driven wheelchairs usually employ headset microphones that are capable of achieving sufficient recognition accuracy, even in the presence of surrounding noise. However, such interfaces require users to wear a sensor such as a headset microphone, which can be an impediment, especially for the hand disabled. Conversely, it is well known that speech recognition accuracy degrades drastically when the microphone is placed far from the user. In this paper, we develop a noise robust speech recognition system for a voice-driven wheelchair that achieves almost the same recognition accuracy as a headset microphone without requiring the user to wear sensors. We verified the effectiveness of our system in experiments in different environments.

  16. Impact of Fetal-Neonatal Iron Deficiency on Recognition Memory at 2 Months of Age.

    PubMed

    Geng, Fengji; Mai, Xiaoqin; Zhan, Jianying; Xu, Lin; Zhao, Zhengyan; Georgieff, Michael; Shao, Jie; Lozoff, Betsy

    2015-12-01

    To assess the effects of fetal-neonatal iron deficiency on recognition memory in early infancy. Perinatal iron deficiency delays or disrupts hippocampal development in animal models and thus may impair related neural functions in human infants, such as recognition memory. Event-related potentials were used in an auditory recognition memory task to compare 2-month-old Chinese infants with iron sufficiency or deficiency at birth. Fetal-neonatal iron deficiency was defined in 2 ways: high zinc protoporphyrin/heme ratio (ZPP/H > 118 μmol/mol) or low serum ferritin (<75 μg/L) in cord blood. The late slow wave was used to measure infant recognition of the mother's voice. Event-related potential patterns differed significantly for fetal-neonatal iron deficiency as defined by high cord ZPP/H but not low ferritin. Comparing 35 infants with iron deficiency (ZPP/H > 118 μmol/mol) to 92 with lower ZPP/H (iron-sufficient), only infants with iron sufficiency showed larger late slow wave amplitude for a stranger's voice than the mother's voice in frontal-central and parietal-occipital locations, indicating recognition of the mother's voice. Infants with iron sufficiency showed electrophysiological evidence of recognizing their mother's voice, whereas infants with fetal-neonatal iron deficiency did not. Their poorer auditory recognition memory at 2 months of age is consistent with effects of fetal-neonatal iron deficiency on the developing hippocampus. Copyright © 2015 Elsevier Inc. All rights reserved.

  17. Speech therapy and voice recognition instrument

    NASA Technical Reports Server (NTRS)

    Cohen, J.; Babcock, M. L.

    1972-01-01

    Characteristics of an electronic circuit for examining variations in vocal excitation for diagnostic purposes and in speech recognition for determining voice patterns and pitch changes are described. Operation of the circuit is discussed and a circuit diagram is provided.

  18. Engineering Evaluation and Assessment (EE and A) Report for the Symbolic and Sub-symbolic Robotics Intelligence Control System (SS-RICS)

    DTIC Science & Technology

    2018-04-01

    Excerpted snippet: since 2006, SS-RICS has been the integration platform for many robotics algorithms drawing on a variety of different disciplines, from cognitive approaches to voice recognition. In the voice-recognition evaluation, each noise level was run 10 times per gender, yielding 60 total runs, and two paths (Paths A and B) were chosen for testing.

  19. In the Beginning Was the Familiar Voice Personally Familiar Voices in the Evolutionary and Contemporary Biology of Communication

    PubMed Central

    Sidtis, Diana; Kreiman, Jody

    2011-01-01

    The human voice is described in dialogic linguistics as an embodiment of self in a social context, contributing to expression, perception, and mutual exchange of self, consciousness, inner life, and personhood. While these approaches are subjective and arise from phenomenological perspectives, scientific facts about personal vocal identity, and its role in biological development, support these views. It is our purpose to review studies of the biology of personal vocal identity -- the familiar voice pattern -- as providing an empirical foundation for the view that the human voice is an embodiment of self in the social context. Recent developments in the biology and evolution of communication are concordant with these notions, revealing that familiar voice recognition (also known as vocal identity recognition or individual vocal recognition) contributed to survival in the earliest vocalizing species. Contemporary ethology documents the crucial role of familiar voices across animal species in signaling and perceiving internal states and personal identities. Neuropsychological studies of voice reveal multimodal cerebral associations arising across brain structures involved in memory, emotion, attention, and arousal in vocal perception and production, such that the voice represents the whole person. Although its roots are in evolutionary biology, human competence for processing layered social and personal meanings in the voice, as well as personal identity in a large repertory of familiar voice patterns, has achieved an immense sophistication. PMID:21710374

  20. Emotionally conditioning the target-speech voice enhances recognition of the target speech under "cocktail-party" listening conditions.

    PubMed

    Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang

    2018-05-01

    Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners for enhancing target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, (skin conductance) electrodermal responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase of listening efforts when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.

  1. Writing with Voice: An Investigation of the Use of a Voice Recognition System as a Writing Aid for a Man with Aphasia

    ERIC Educational Resources Information Center

    Bruce, Carolyn; Edmundson, Anne; Coleman, Michael

    2003-01-01

    Background: People with aphasia may experience difficulties that prevent them from demonstrating in writing what they know and can produce orally. Voice recognition systems that allow the user to speak into a microphone and see their words appear on a computer screen have the potential to assist written communication. Aim: This study investigated…

  2. Cost-sensitive learning for emotion robust speaker recognition.

    PubMed

    Li, Dongdong; Yang, Yingchun; Dai, Weihui

    2014-01-01

    In the field of information security, voice is one of the most important modalities in biometrics. Especially with the growth of voice communication over the Internet and telephone systems, huge voice data resources have become accessible. In speaker recognition, the voiceprint can be applied as a unique password for the user to prove his or her identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper deals with this problem by introducing a cost-sensitive learning technique to reweight the probability of test affective utterances at the pitch envelope level, which can effectively enhance the robustness of emotion-dependent speaker recognition. Based on that technique, a new architecture of the recognition system as well as its components is proposed in this paper. An experiment conducted on the Mandarin Affective Speech Corpus shows an 8% improvement in identification rate over traditional speaker recognition.
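The abstract describes reweighting affective test utterances, but not the exact formulation. One hedged way to sketch the general idea (not the paper's actual method) is to score each speaker under emotion-dependent models and combine the scores using the test utterance's emotion posterior:

```python
import numpy as np

def emotion_weighted_identify(model_scores, emotion_probs):
    """Combine emotion-dependent speaker scores by the emotion posterior.

    model_scores  : (n_emotions, n_speakers) scores, one row per
                    emotion-dependent speaker model (names illustrative).
    emotion_probs : (n_emotions,) posterior probability of each emotion for
                    the test utterance, e.g. from a pitch-envelope classifier.
    Returns (best speaker index, fused per-speaker scores).
    """
    fused = np.asarray(emotion_probs) @ np.asarray(model_scores)
    return int(np.argmax(fused)), fused
```

The point of the weighting is that an utterance classified as, say, angry leans on the angry-speech speaker models rather than the neutral ones.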

  3. Cost-Sensitive Learning for Emotion Robust Speaker Recognition

    PubMed Central

    Li, Dongdong; Yang, Yingchun

    2014-01-01

    In the field of information security, voice is one of the most important modalities in biometrics. Especially with the growth of voice communication over the Internet and telephone systems, huge voice data resources have become accessible. In speaker recognition, the voiceprint can be applied as a unique password for the user to prove his or her identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper deals with this problem by introducing a cost-sensitive learning technique to reweight the probability of test affective utterances at the pitch envelope level, which can effectively enhance the robustness of emotion-dependent speaker recognition. Based on that technique, a new architecture of the recognition system as well as its components is proposed in this paper. An experiment conducted on the Mandarin Affective Speech Corpus shows an 8% improvement in identification rate over traditional speaker recognition. PMID:24999492

  4. The role of spectral and temporal cues in voice gender discrimination by normal-hearing listeners and cochlear implant users.

    PubMed

    Fu, Qian-Jie; Chinchilla, Sherol; Galvin, John J

    2004-09-01

    The present study investigated the relative importance of temporal and spectral cues in voice gender discrimination and vowel recognition by normal-hearing subjects listening to an acoustic simulation of cochlear implant speech processing and by cochlear implant users. In the simulation, the number of speech processing channels ranged from 4 to 32, thereby varying the spectral resolution; the cutoff frequencies of the channels' envelope filters ranged from 20 to 320 Hz, thereby manipulating the available temporal cues. For normal-hearing subjects, results showed that both voice gender discrimination and vowel recognition scores improved as the number of spectral channels was increased. When only 4 spectral channels were available, voice gender discrimination significantly improved as the envelope filter cutoff frequency was increased from 20 to 320 Hz. For all spectral conditions, increasing the amount of temporal information had no significant effect on vowel recognition. Both voice gender discrimination and vowel recognition scores were highly variable among implant users. The performance of cochlear implant listeners was similar to that of normal-hearing subjects listening to comparable speech processing (4-8 spectral channels). The results suggest that both spectral and temporal cues contribute to voice gender discrimination and that temporal cues are especially important for cochlear implant users to identify the voice gender when there is reduced spectral resolution.
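The acoustic simulation described (a variable number of spectral channels plus envelope filters of varying cutoff) is a channel vocoder. A rough numpy-only noise-vocoder sketch follows; the FFT-based filtering and the default parameters are assumptions for illustration, not the study's exact processing:

```python
import numpy as np

def noise_vocode(x, sr, n_channels=4, env_cutoff=160.0, f_lo=100.0, f_hi=4000.0):
    """FFT-based noise vocoder: split x into log-spaced bands, extract each
    band's envelope (rectify + lowpass at env_cutoff), and use it to modulate
    band-limited noise. More channels -> finer spectral resolution;
    env_cutoff limits how much temporal (envelope) cue survives."""
    n = len(x)
    freqs = np.fft.rfftfreq(n, 1.0 / sr)
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    rng = np.random.default_rng(0)
    out = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_mask = (freqs >= lo) & (freqs < hi)
        # band-limit the input signal
        band = np.fft.irfft(np.fft.rfft(x) * band_mask, n)
        # envelope: full-wave rectify, then FFT lowpass at env_cutoff
        env = np.fft.irfft(np.fft.rfft(np.abs(band)) * (freqs <= env_cutoff), n)
        # band-limited noise carrier, modulated by the envelope
        noise = np.fft.irfft(np.fft.rfft(rng.standard_normal(n)) * band_mask, n)
        out += np.clip(env, 0, None) * noise
    return out
```

Sweeping `n_channels` from 4 to 32 and `env_cutoff` from 20 to 320 Hz reproduces the kind of spectral/temporal trade-off the study manipulated.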

  5. DLMS Voice Data Entry.

    DTIC Science & Technology

    1980-06-01

    Excerpted snippet: the principal components are a speech preprocessor and a minicomputer. In the VRS, as shown in Fig. 1 (Block Diagram of DLMS Voice Recognition System), the preprocessor is a TTI model 8040, supported by a Data General minicomputer, a magnetic tape unit, a display, and a flexible disk unit housed in an equipment cabinet.

  6. ``The perceptual bases of speaker identity'' revisited

    NASA Astrophysics Data System (ADS)

    Voiers, William D.

    2003-10-01

    A series of experiments begun 40 years ago [W. D. Voiers, J. Acoust. Soc. Am. 36, 1065-1073 (1964)] was concerned with identifying the perceived voice traits (PVTs) on which human recognition of voices depends. It culminated with the development of a voice taxonomy based on 20 PVTs and a set of highly reliable rating scales for classifying voices with respect to those PVTs. The development of a perceptual voice taxonomy was motivated by the need for a practical method of evaluating speaker recognizability in voice communication systems. The Diagnostic Speaker Recognition Test (DSRT) evaluates the effects of systems on speaker recognizability as reflected in changes in the inter-listener reliability of voice ratings on the 20 PVTs. The DSRT thus provides a qualitative, as well as quantitative, evaluation of the effects of a system on speaker recognizability. A fringe benefit of this project is PVT rating data for a sample of 680 voices. [Work partially supported by USAFRL.]

  7. The Cambridge Mindreading (CAM) Face-Voice Battery: Testing complex emotion recognition in adults with and without Asperger syndrome.

    PubMed

    Golan, Ofer; Baron-Cohen, Simon; Hill, Jacqueline

    2006-02-01

    Adults with Asperger Syndrome (AS) can recognise simple emotions and pass basic theory of mind tasks, but have difficulties recognising more complex emotions and mental states. This study describes a new battery of tasks, testing recognition of 20 complex emotions and mental states from faces and voices. The battery was given to males and females with AS and matched controls. Results showed the AS group performed worse than controls overall, on emotion recognition from faces and voices and on 12/20 specific emotions. Females recognised faces better than males regardless of diagnosis, and males with AS had more difficulties recognising emotions from faces than from voices. The implications of these results are discussed in relation to social functioning in AS.

  8. Is it me? Self-recognition bias across sensory modalities and its relationship to autistic traits.

    PubMed

    Chakraborty, Anya; Chakrabarti, Bhismadev

    2015-01-01

    Atypical self-processing is an emerging theme in autism research, suggested by a lower self-reference effect in memory and atypical neural responses to visual self-representations. Most research on physical self-processing in autism uses visual stimuli. However, the self is a multimodal construct, and therefore it is essential to test self-recognition in other sensory modalities as well. Self-recognition in the auditory modality remains relatively unexplored and has not been tested in relation to autism and related traits. This study investigates self-recognition in the auditory and visual domains in the general population and tests whether it is associated with autistic traits. Thirty-nine neurotypical adults participated in a two-part study. In the first session, each participant's voice was recorded and face photographed, and these were morphed, respectively, with voices and faces from unfamiliar identities. In the second session, participants performed a 'self-identification' task, classifying each morph as a 'self' voice (or face) or an 'other' voice (or face). All participants also completed the Autism Spectrum Quotient (AQ). For each sensory modality, the slope of the self-recognition curve was used as the individual self-recognition metric. These two self-recognition metrics were tested for association with each other and with autistic traits. The fifty percent 'self' response was reached at a higher percentage of self in the auditory domain than in the visual domain (t = 3.142; P < 0.01). No significant correlation was noted between self-recognition bias across sensory modalities (τ = -0.165, P = 0.204). A higher recognition bias for self-voice was observed in individuals higher in autistic traits (τ AQ = 0.301, P = 0.008). No such correlation was observed between recognition bias for self-face and autistic traits (τ AQ = -0.020, P = 0.438). Our data show that recognition bias for physical self-representation is not related across sensory modalities.
Further, individuals with higher autistic traits were better able to discriminate self from other voices, but this relation was not observed with self-face. A narrow self-other overlap in the auditory domain seen in individuals with high autistic traits could arise due to enhanced perceptual processing of auditory stimuli often observed in individuals with autism.
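The slope-of-the-self-recognition-curve metric can be illustrated with a small logistic fit of the 'self' response rate against the morph's self percentage. The centring, scaling, and gradient-descent fit below are assumptions for illustration, not the study's actual analysis:

```python
import numpy as np

def recognition_slope(morph_pct, p_self, lr=0.05, steps=5000):
    """Fit p('self') = sigmoid(a * x + b) to morph-level response rates,
    where x is the self percentage centred at 50 and scaled to [-1, 1].
    Returns (a, b); the slope a serves as the per-modality metric."""
    x = (np.asarray(morph_pct, float) - 50.0) / 50.0
    y = np.asarray(p_self, float)
    a, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * x + b)))  # logistic prediction
        a -= lr * np.mean((p - y) * x)          # cross-entropy gradient step
        b -= lr * np.mean(p - y)
    return a, b
```

A sharp self/other boundary yields a steep slope; a gradual, uncertain boundary yields a shallow one, which is what makes the slope usable as a discrimination metric.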

  9. Task-Oriented, Naturally Elicited Speech (TONE) Database for the Force Requirements Expert System, Hawaii (FRESH)

    DTIC Science & Technology

    1988-09-01

    Subject terms: command and control; computational linguistics; expert systems; voice recognition; man-machine interface. Abstract excerpt: …simulates the characteristics of FRESH on a smaller scale. This study assisted NOSC in developing a voice-recognition, man-machine interface that could be used with TONE and upgraded at a later date.

  10. Gender recognition from vocal source

    NASA Astrophysics Data System (ADS)

    Sorokin, V. N.; Makarov, I. S.

    2008-07-01

    Efficiency of automatic recognition of male and female voices based on solving the inverse problem for glottis area dynamics and for waveform of the glottal airflow volume velocity pulse is studied. The inverse problem is regularized through the use of analytical models of the voice excitation pulse and of the dynamics of the glottis area, as well as the model of one-dimensional glottal airflow. Parameters of these models and spectral parameters of the volume velocity pulse are considered. The following parameters are found to be most promising: the instant of maximum glottis area, the maximum derivative of the area, the slope of the spectrum of the glottal airflow volume velocity pulse, the amplitude ratios of harmonics of this spectrum, and the pitch. On the plane of the first two main components in the space of these parameters, an almost twofold decrease in the classification error relative to that for the pitch alone is attained. The male voice recognition probability is found to be 94.7%, and the female voice recognition probability is 95.9%.

  11. Recognition of voice commands using adaptation of foreign language speech recognizer via selection of phonetic transcriptions

    NASA Astrophysics Data System (ADS)

    Maskeliunas, Rytis; Rudzionis, Vytautas

    2011-06-01

    In recent years various commercial speech recognizers have become available. These recognizers make it possible to develop applications incorporating various speech recognition techniques easily and quickly. These commercial recognizers are typically targeted at widely spoken languages with large market potential; however, it may be possible to adapt them for use in environments where less widely spoken languages are used. Since most commercial recognition engines are closed systems, the single avenue for adaptation is to establish ways of selecting proper phonetic transcriptions that map between the two languages. This paper deals with methods for finding phonetic transcriptions that allow Lithuanian voice commands to be recognized using English speech engines. The experimental evaluation showed that it is possible to find phonetic transcriptions that enable the recognition of Lithuanian voice commands with an accuracy of over 90%.
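A hedged sketch of the transcription-selection idea: expand each Lithuanian word into candidate phone strings that an English engine could be asked to match, then keep whichever candidate recognizes best. The mapping table below is entirely hypothetical; the paper's actual transcription rules are not given here:

```python
# Hypothetical mapping from Lithuanian graphemes to approximate
# English-engine phone labels (illustrative only).
LT_TO_EN = {
    "š": ["sh"], "č": ["ch"], "ž": ["zh", "z"], "ė": ["eh"],
    "ų": ["oo"], "ū": ["oo"], "y": ["ee"], "i": ["ih", "ee"],
}

def candidate_transcriptions(word):
    """Expand a Lithuanian word into all candidate phone strings an
    English recognizer could be tested against."""
    candidates = [""]
    for ch in word.lower():
        options = LT_TO_EN.get(ch, [ch])  # unmapped graphemes pass through
        candidates = [c + o for c in candidates for o in options]
    return candidates
```

In an adaptation workflow, each candidate would be registered as a grammar entry with the closed recognition engine, and the best-recognized variant retained per command.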

  12. Structure and weights optimisation of a modified Elman network emotion classifier using hybrid computational intelligence algorithms: a comparative study

    NASA Astrophysics Data System (ADS)

    Sheikhan, Mansour; Abbasnezhad Arabi, Mahdi; Gharavian, Davood

    2015-10-01

    Artificial neural networks are efficient models in pattern recognition applications, but their performance is dependent on employing suitable structure and connection weights. This study used a hybrid method for obtaining the optimal weight set and architecture of a recurrent neural emotion classifier based on gravitational search algorithm (GSA) and its binary version (BGSA), respectively. By considering the features of speech signal that were related to prosody, voice quality, and spectrum, a rich feature set was constructed. To select more efficient features, a fast feature selection method was employed. The performance of the proposed hybrid GSA-BGSA method was compared with similar hybrid methods based on particle swarm optimisation (PSO) algorithm and its binary version, PSO and discrete firefly algorithm, and hybrid of error back-propagation and genetic algorithm that were used for optimisation. Experimental tests on Berlin emotional database demonstrated the superior performance of the proposed method using a lighter network structure.
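GSA itself is well defined in the literature: agents attract each other with "masses" derived from fitness, under a gravitational constant that decays over iterations. A minimal sketch of the continuous variant on a toy objective follows; the bounds, population size, and decay parameters are illustrative choices, not the study's settings:

```python
import numpy as np

def gsa_minimize(f, dim, n_agents=20, iters=200, g0=100.0, alpha=20.0, seed=0):
    """Minimal gravitational search algorithm for minimizing f over R^dim."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, (n_agents, dim))   # agent positions
    V = np.zeros_like(X)                      # agent velocities
    best_x, best_f = None, np.inf
    for t in range(iters):
        fit = np.array([f(x) for x in X])
        i_best = int(np.argmin(fit))
        if fit[i_best] < best_f:
            best_f, best_x = fit[i_best], X[i_best].copy()
        # masses: best agent gets mass 1, worst gets 0, then normalize
        m = (fit.max() - fit) / (fit.max() - fit.min() + 1e-12)
        M = m / (m.sum() + 1e-12)
        G = g0 * np.exp(-alpha * t / iters)   # decaying gravitational constant
        A = np.zeros_like(X)
        for i in range(n_agents):
            for j in range(n_agents):
                if i == j:
                    continue
                diff = X[j] - X[i]
                dist = np.linalg.norm(diff) + 1e-12
                A[i] += rng.random() * G * M[j] * diff / dist
        V = rng.random(X.shape) * V + A       # stochastic inertia + attraction
        X = X + V
    return best_x, best_f
```

In the study's setting, the position vector would encode network weights (continuous GSA) or architecture bits (BGSA) rather than a toy 2-D point.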

  13. On combining multi-normalization and ancillary measures for the optimal score level fusion of fingerprint and voice biometrics

    NASA Astrophysics Data System (ADS)

    Mohammed Anzar, Sharafudeen Thaha; Sathidevi, Puthumangalathu Savithri

    2014-12-01

    In this paper, we have considered the utility of multi-normalization and ancillary measures, for the optimal score level fusion of fingerprint and voice biometrics. An efficient matching score preprocessing technique based on multi-normalization is employed for improving the performance of the multimodal system, under various noise conditions. Ancillary measures derived from the feature space and the score space are used in addition to the matching score vectors, for weighing the modalities, based on their relative degradation. Reliability (dispersion) and the separability (inter-/intra-class distance and d-prime statistics) measures under various noise conditions are estimated from the individual modalities, during the training/validation stage. The `best integration weights' are then computed by algebraically combining these measures using the weighted sum rule. The computed integration weights are then optimized against the recognition accuracy using techniques such as grid search, genetic algorithm and particle swarm optimization. The experimental results show that, the proposed biometric solution leads to considerable improvement in the recognition performance even under low signal-to-noise ratio (SNR) conditions and reduces the false acceptance rate (FAR) and false rejection rate (FRR), making the system useful for security as well as forensic applications.
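The weighted sum rule with min-max score normalization can be sketched briefly. The quality-derived weighting below is a simplification of the paper's reliability/separability measures, and all names are hypothetical:

```python
import numpy as np

def min_max_norm(scores):
    """Map a matching-score vector to [0, 1]."""
    s = np.asarray(scores, float)
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

def fuse(fp_scores, voice_scores, fp_quality, voice_quality):
    """Weighted-sum score-level fusion of fingerprint and voice scores.
    fp_quality / voice_quality stand in for per-modality ancillary
    measures (e.g. separability or reliability under the current noise),
    so the less degraded modality receives the larger weight."""
    w_fp = fp_quality / (fp_quality + voice_quality)
    return w_fp * min_max_norm(fp_scores) + (1.0 - w_fp) * min_max_norm(voice_scores)
```

In the paper the integration weights are further tuned against recognition accuracy (grid search, genetic algorithm, or particle swarm optimization); this sketch stops at the algebraic combination step.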

  14. Audio-based, unsupervised machine learning reveals cyclic changes in earthquake mechanisms in the Geysers geothermal field, California

    NASA Astrophysics Data System (ADS)

    Holtzman, B. K.; Paté, A.; Paisley, J.; Waldhauser, F.; Repetto, D.; Boschi, L.

    2017-12-01

    The earthquake process reflects complex interactions of stress, fracture, and frictional properties. New machine learning methods reveal patterns in time-dependent spectral properties of seismic signals and enable identification of changes in faulting processes. Our methods are based closely on those developed for music information retrieval and voice recognition, using the spectrogram instead of the waveform directly. Unsupervised learning involves identification of patterns based on differences among signals without any additional information provided to the algorithm. Clustering of 46,000 earthquakes of magnitude 0.3…

  15. Audiovisual speech facilitates voice learning.

    PubMed

    Sheffert, Sonya M; Olson, Elizabeth

    2004-02-01

    In this research, we investigated the effects of voice and face information on the perceptual learning of talkers and on long-term memory for spoken words. In the first phase, listeners were trained over several days to identify voices from words presented auditorily or audiovisually. The training data showed that visual information about speakers enhanced voice learning, revealing cross-modal connections in talker processing akin to those observed in speech processing. In the second phase, the listeners completed an auditory or audiovisual word recognition memory test in which equal numbers of words were spoken by familiar and unfamiliar talkers. The data showed that words presented by familiar talkers were more likely to be retrieved from episodic memory, regardless of modality. Together, these findings provide new information about the representational code underlying familiar talker recognition and the role of stimulus familiarity in episodic word recognition.

  16. Generation of surgical pathology report using a 5,000-word speech recognizer.

    PubMed

    Tischler, A S; Martin, M R

    1989-10-01

    Pressures to decrease both turnaround time and operating costs simultaneously have placed conflicting demands on traditional forms of medical transcription. The new technology of voice recognition extends the promise of enabling the pathologist or other medical professional to dictate a correct report and have it printed and/or transmitted to a database immediately. The usefulness of voice recognition systems depends on several factors, including ease of use, reliability, speed, and accuracy. These in turn depend on the general underlying design of the systems and inclusion in the systems of a specific knowledge base appropriate for each application. Development of a good knowledge base requires close collaboration between a domain expert and a knowledge engineer with expertise in voice recognition. The authors have recently completed a knowledge base for surgical pathology using the Kurzweil VoiceReport 5,000-word system.

  17. Applications of artificial intelligence to space station: General purpose intelligent sensor interface

    NASA Technical Reports Server (NTRS)

    Mckee, James W.

    1988-01-01

    This final report describes the accomplishments of the General Purpose Intelligent Sensor Interface task of the Applications of Artificial Intelligence to Space Station grant for the period from October 1, 1987 through September 30, 1988. Portions of the First Biannual Report not revised will not be included but only referenced. The goal is to develop an intelligent sensor system that will simplify the design and development of expert systems using sensors of the physical phenomena as a source of data. This research will concentrate on the integration of image processing sensors and voice processing sensors with a computer designed for expert system development. The result of this research will be the design and documentation of a system in which the user will not need to be an expert in such areas as image processing algorithms, local area networks, image processor hardware selection or interfacing, television camera selection, voice recognition hardware selection, or analog signal processing. The user will be able to access data from video or voice sensors through standard LISP statements without any need to know about the sensor hardware or software.

  18. Understanding the mechanisms of familiar voice-identity recognition in the human brain.

    PubMed

    Maguinness, Corrina; Roswandowitz, Claudia; von Kriegstein, Katharina

    2018-03-31

    Humans have a remarkable skill for voice-identity recognition: most of us can remember many voices that surround us as 'unique'. In this review, we explore the computational and neural mechanisms which may support our ability to represent and recognise a unique voice-identity. We examine the functional architecture of voice-sensitive regions in the superior temporal gyrus/sulcus, and bring together findings on how these regions may interact with each other, and additional face-sensitive regions, to support voice-identity processing. We also contrast findings from studies on neurotypicals and clinical populations which have examined the processing of familiar and unfamiliar voices. Taken together, the findings suggest that representations of familiar and unfamiliar voices might dissociate in the human brain. Such an observation does not fit well with current models for voice-identity processing, which by-and-large assume a common sequential analysis of the incoming voice signal, regardless of voice familiarity. We provide a revised audio-visual integrative model of voice-identity processing which brings together traditional and prototype models of identity processing. This revised model includes a mechanism of how voice-identity representations are established and provides a novel framework for understanding and examining the potential differences in familiar and unfamiliar voice processing in the human brain. Copyright © 2018 Elsevier Ltd. All rights reserved.

  19. Definition of problems of persons in sheltered care environments

    NASA Technical Reports Server (NTRS)

    Fetzner, W. N.

    1979-01-01

    Innovations in health care using aerospace technologies are described. Voice synthesizer and voice recognition technologies were used in developing voice controlled wheel chairs and optacons. Telephone interface modules are also described.

  20. The cognitive neuroscience of person identification.

    PubMed

    Biederman, Irving; Shilowich, Bryan E; Herald, Sarah B; Margalit, Eshed; Maarek, Rafael; Meschke, Emily X; Hacker, Catrina M

    2018-02-14

    We compare and contrast five differences between person identification by voice and face. 1. There is little or no cost when a familiar face is to be recognized from an unrestricted set of possible faces, even at Rapid Serial Visual Presentation (RSVP) rates, but the accuracy of familiar voice recognition declines precipitously when the set of possible speakers is increased from one to a mere handful. 2. Whereas deficits in face recognition are typically perceptual in origin, those with normal perception of voices can manifest severe deficits in their identification. 3. Congenital prosopagnosics (CPros) and congenital phonagnosics (CPhon) are generally unable to imagine familiar faces and voices, respectively. Only in CPros, however, is this deficit a manifestation of a general inability to form visual images of any kind. CPhons report no deficit in imaging non-voice sounds. 4. The prevalence of CPhons of 3.2% is somewhat higher than the reported prevalence of approximately 2.0% for CPros in the population. There is evidence that CPhon represents a distinct condition statistically and not just normal variation. 5. Face and voice recognition proficiency are uncorrelated rather than reflecting limitations of a general capacity for person individuation. Copyright © 2018 Elsevier Ltd. All rights reserved.

  1. The Glasgow Voice Memory Test: Assessing the ability to memorize and recognize unfamiliar voices.

    PubMed

    Aglieri, Virginia; Watson, Rebecca; Pernet, Cyril; Latinus, Marianne; Garrido, Lúcia; Belin, Pascal

    2017-02-01

    One thousand one hundred and twenty subjects as well as a developmental phonagnosic subject (KH) along with age-matched controls performed the Glasgow Voice Memory Test, which assesses the ability to encode and immediately recognize, through an old/new judgment, both unfamiliar voices (delivered as vowels, making language requirements minimal) and bell sounds. The inclusion of non-vocal stimuli allows the detection of significant dissociations between the two categories (vocal vs. non-vocal stimuli). The distributions of accuracy and sensitivity scores (d') reflected a wide range of individual differences in voice recognition performance in the population. As expected, KH showed a dissociation between the recognition of voices and bell sounds, her performance being significantly poorer than matched controls for voices but not for bells. By providing normative data of a large sample and by testing a developmental phonagnosic subject, we demonstrated that the Glasgow Voice Memory Test, available online and accessible from all over the world, can be a valid screening tool (~5 min) for a preliminary detection of potential cases of phonagnosia and of "super recognizers" for voices.
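    The sensitivity scores (d') reported above come from standard signal detection theory applied to old/new judgments. A minimal sketch in Python, using hypothetical hit and false-alarm counts (not the study's data) and a conventional clamping correction for extreme rates:

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' for an old/new recognition task.

    Rates are clamped away from 0 and 1 (a common correction) so the
    inverse normal CDF stays finite.
    """
    z = NormalDist().inv_cdf
    n_old = hits + misses
    n_new = false_alarms + correct_rejections
    hit_rate = min(max(hits / n_old, 0.5 / n_old), 1 - 0.5 / n_old)
    fa_rate = min(max(false_alarms / n_new, 0.5 / n_new), 1 - 0.5 / n_new)
    return z(hit_rate) - z(fa_rate)

# Hypothetical listener: recognises 14/16 old voices, false-alarms on 3/16 new ones.
score = d_prime(hits=14, misses=2, false_alarms=3, correct_rejections=13)
```

    A listener responding at chance (equal hit and false-alarm rates) gets d' = 0; a dissociation like KH's would show up as a much lower d' for voices than for bells.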

  2. Voice-Recognition Augmented Performance Tools in Performance Poetry Pedagogy

    ERIC Educational Resources Information Center

    Devanny, David; McGowan, Jack

    2016-01-01

    This provocation shares findings from the use of bespoke voice-recognition performance software in a number of seminars (which took place in the 2014-2016 academic years at Glasgow School of Art, University of Warwick, and Falmouth University). The software, made available through this publication, is a web-app which uses Google Chrome's native…

  3. Researching the Use of Voice Recognition Writing Software.

    ERIC Educational Resources Information Center

    Honeycutt, Lee

    2003-01-01

    Notes that voice recognition technology (VRT) has become accurate and fast enough to be useful in a variety of writing scenarios. Contends that little is known about how this technology might affect writing process or perceptions of silent writing. Explores future use of VRT by examining past research in the technology of dictation. (PM)

  4. Voice recognition software for clinical use.

    PubMed

    Korn, K

    1998-11-01

    The current generation of voice recognition products truly offers the promise of voice recognition systems that are financially and operationally acceptable for use in a health care facility. Although the initial capital outlay for the purchase of such equipment may be substantial, the long-term benefit is felt to outweigh the expense. The ability to utilize computer equipment for educational purposes and information management alone helps to rationalize the cost. In addition, it is important to remember that the Internet has become a substantial source of information which provides another functional use for this equipment. Although one can readily see the implication for such a program in clinical practice, other uses for the program should not be overlooked. Uses far beyond the writing of clinic notes and correspondence can be easily envisioned. Utilization of voice recognition software offers clinical practices the ability to produce quality printed records in a timely and cost-effective manner. After learning procedures for the selected product and appropriately formatting word processing software and printers, printed progress notes can be produced in less time than with traditional dictation and transcription methods. Although certain procedures and practices may need to be altered, or may preclude optimal utilization of this type of system, many advantages are apparent. It is recommended that facilities consider utilization of voice recognition products such as Dragon Systems Naturally Speaking software, or at least consider a trial of this method with one of the limited-feature products, if current dictation practices are unsatisfactory or excessively costly. Free downloadable trial software or single-user software can provide a reduced-cost method for trial evaluation of such products if a major commitment is not desired.
A list of voice recognition software manufacturer web sites may be accessed through the following: http://www.dragonsys.com/ http://www.software.ibm.com/is/voicetype/ http://www.lhs.com/

  5. Voice recognition products-an occupational risk for users with ULDs?

    PubMed

    Williams, N R

    2003-10-01

    Voice recognition systems (VRS) allow speech both to be converted directly into text, which appears on the screen of a computer, and to direct equipment to perform specific functions. Suggested applications are many and varied, including increasing efficiency in the reporting of radiographs, allowing directed surgery and enabling individuals with upper limb disorders (ULDs) who cannot use other input devices, such as keyboards and mice, to carry out word processing and other activities. Aim: This paper describes four cases of vocal dysfunction related to the use of such software, which have been identified from the database of the Voice and Speech Laboratory of the Massachusetts Eye and Ear Infirmary (MEEI). The database was searched using the key words 'voice recognition' and four cases were identified from a total of 4800. In all cases, the VRS was supplied to assist individuals with ULDs who could not use conventional input devices. Case reports illustrate time of onset and symptoms experienced. The cases illustrate the need for risk assessment and consideration of the ergonomic aspects of voice use prior to such adaptations being used, particularly in those who already experience work-related ULDs.

  6. Implicit multisensory associations influence voice recognition.

    PubMed

    von Kriegstein, Katharina; Giraud, Anne-Lise

    2006-10-01

    Natural objects provide partially redundant information to the brain through different sensory modalities. For example, voices and faces both give information about the speech content, age, and gender of a person. Thanks to this redundancy, multimodal recognition is fast, robust, and automatic. In unimodal perception, however, only part of the information about an object is available. Here, we addressed whether, even under conditions of unimodal sensory input, crossmodal neural circuits that have been shaped by previous associative learning become activated and underpin a performance benefit. We measured brain activity with functional magnetic resonance imaging before, while, and after participants learned to associate either sensory redundant stimuli, i.e. voices and faces, or arbitrary multimodal combinations, i.e. voices and written names, ring tones, and cell phones or brand names of these cell phones. After learning, participants were better at recognizing unimodal auditory voices that had been paired with faces than those paired with written names, and association of voices with faces resulted in an increased functional coupling between voice and face areas. No such effects were observed for ring tones that had been paired with cell phones or names. These findings demonstrate that brief exposure to ecologically valid and sensory redundant stimulus pairs, such as voices and faces, induces specific multisensory associations. Consistent with predictive coding theories, associative representations become thereafter available for unimodal perception and facilitate object recognition. These data suggest that for natural objects effective predictive signals can be generated across sensory systems and proceed by optimization of functional connectivity between specialized cortical sensory modules.

  7. Artificially intelligent recognition of Arabic speaker using voice print-based local features

    NASA Astrophysics Data System (ADS)

    Mahmood, Awais; Alsulaiman, Mansour; Muhammad, Ghulam; Akram, Sheeraz

    2016-11-01

    Local features for any pattern recognition system are based on the information extracted locally. In this paper, a local feature extraction technique was developed. This feature was extracted in the time-frequency plane by taking the moving average on the diagonal directions of the time-frequency plane. This feature captured the time-frequency events producing a unique pattern for each speaker that can be viewed as a voice print of the speaker. Hence, we referred to this technique as voice print-based local feature. The proposed feature was compared to other features including mel-frequency cepstral coefficient (MFCC) for speaker recognition using two different databases. One of the databases used in the comparison is a subset of an LDC database that consisted of two short sentences uttered by 182 speakers. The proposed feature attained 98.35% recognition rate compared to 96.7% for MFCC using the LDC subset.
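    The diagonal moving average can be illustrated in a few lines of plain Python. This sketch averages along only the main diagonal with an assumed window length k; the published method takes the average along both diagonal directions of the time-frequency plane, so treat this as an illustration of the idea rather than the paper's exact feature:

```python
def diagonal_moving_average(tf, k=3):
    """Average each time-frequency cell with its k-1 predecessors along the
    main diagonal (one step back in time and one step down in frequency).

    tf is a list of time frames, each a list of frequency-bin magnitudes.
    """
    T, F = len(tf), len(tf[0])
    out = [[0.0] * F for _ in range(T)]
    for t in range(T):
        for f in range(F):
            vals = [tf[t - i][f - i] for i in range(k) if t - i >= 0 and f - i >= 0]
            out[t][f] = sum(vals) / len(vals)
    return out

# A diagonal ridge in a toy 3x3 spectrogram survives the averaging intact.
ridge = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
smoothed = diagonal_moving_average(ridge, k=3)
```

    A pattern aligned with the diagonal (a time-frequency event sweeping across bins) is reinforced by this averaging, while off-diagonal noise is attenuated, which is what makes the result speaker-discriminative.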

  8. Emotional Recognition in Autism Spectrum Conditions from Voices and Faces

    ERIC Educational Resources Information Center

    Stewart, Mary E.; McAdam, Clair; Ota, Mitsuhiko; Peppe, Sue; Cleland, Joanne

    2013-01-01

    The present study reports on a new vocal emotion recognition task and assesses whether people with autism spectrum conditions (ASC) perform differently from typically developed individuals on tests of emotional identification from both the face and the voice. The new test of vocal emotion contained trials in which the vocal emotion of the sentence…

  9. Cultural In-Group Advantage: Emotion Recognition in African American and European American Faces and Voices

    ERIC Educational Resources Information Center

    Wickline, Virginia B.; Bailey, Wendy; Nowicki, Stephen

    2009-01-01

    The authors explored whether there were in-group advantages in emotion recognition of faces and voices by culture or geographic region. Participants were 72 African American students (33 men, 39 women), 102 European American students (30 men, 72 women), 30 African international students (16 men, 14 women), and 30 European international students…

  10. (Almost) Word for Word: As Voice Recognition Programs Improve, Students Reap the Benefits

    ERIC Educational Resources Information Center

    Smith, Mark

    2006-01-01

    Voice recognition software is hardly new--attempts at capturing spoken words and turning them into written text have been available to consumers for about two decades. But what was once an expensive and highly unreliable tool has made great strides in recent years, perhaps most recognized in programs such as Nuance's Dragon NaturallySpeaking…

  11. Keys to the Adoption and Use of Voice Recognition Technology in Organizations.

    ERIC Educational Resources Information Center

    Goette, Tanya

    2000-01-01

    Presents results from a field study of individuals with disabilities who used voice recognition technology (VRT). Results indicated that task-technology fit, training, the environment, and disability limitations were the differentiating items, and that using VRT for a trial period may be the major factor in successful adoption of the technology.…

  12. Practical applications of interactive voice technologies: Some accomplishments and prospects

    NASA Technical Reports Server (NTRS)

    Grady, Michael W.; Hicklin, M. B.; Porter, J. E.

    1977-01-01

    A technology assessment of the application of computers and electronics to complex systems is presented. Three existing systems which utilize voice technology (speech recognition and speech generation) are described. Future directions in voice technology are also described.

  13. Detection of Pathological Voice Using Cepstrum Vectors: A Deep Learning Approach.

    PubMed

    Fang, Shih-Hau; Tsao, Yu; Hsiao, Min-Jing; Chen, Ji-Ying; Lai, Ying-Hui; Lin, Feng-Chuan; Wang, Chi-Te

    2018-03-19

    Computerized detection of voice disorders has attracted considerable academic and clinical interest in the hope of providing an effective screening method for voice diseases before endoscopic confirmation. This study proposes a deep-learning-based approach to detect pathological voice and examines its performance and utility compared with other automatic classification algorithms. This study retrospectively collected 60 normal voice samples and 402 pathological voice samples of 8 common clinical voice disorders in a voice clinic of a tertiary teaching hospital. We extracted Mel frequency cepstral coefficients from 3-second samples of a sustained vowel. The performances of three machine learning algorithms, namely, deep neural network (DNN), support vector machine, and Gaussian mixture model, were evaluated based on a fivefold cross-validation. Collective cases from the voice disorder database of MEEI (Massachusetts Eye and Ear Infirmary) were used to verify the performance of the classification mechanisms. The experimental results demonstrated that DNN outperforms Gaussian mixture model and support vector machine. Its accuracy in detecting voice pathologies reached 94.26% and 90.52% in male and female subjects, based on three representative Mel frequency cepstral coefficient features. When applied to the MEEI database for validation, the DNN also achieved a higher accuracy (99.32%) than the other two classification algorithms. By stacking several layers of neurons with optimized weights, the proposed DNN algorithm can fully utilize the acoustic features and efficiently differentiate between normal and pathological voice samples. Based on this pilot study, future research may proceed to explore more applications of DNN from laboratory and clinical perspectives. Copyright © 2018 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
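    The fivefold cross-validation protocol used in this evaluation can be sketched as below. The nearest-centroid classifier here is a deliberately simple stand-in for illustration only; the study's actual classifiers were a DNN, an SVM and a GMM over MFCC features:

```python
import random

def five_fold_accuracy(samples, labels, train_fn, predict_fn, seed=0):
    """Mean accuracy over a fivefold cross-validation.

    samples: list of feature vectors; labels: parallel list of class labels.
    train_fn(X, y) -> model; predict_fn(model, x) -> predicted label.
    """
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::5] for i in range(5)]
    accs = []
    for held_out in folds:
        train = [i for i in idx if i not in held_out]
        model = train_fn([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in held_out)
        accs.append(correct / len(held_out))
    return sum(accs) / len(accs)

# Toy stand-in classifier: nearest centroid on one-dimensional "features".
def train_centroids(X, y):
    c0 = [x[0] for x, t in zip(X, y) if t == 0]
    c1 = [x[0] for x, t in zip(X, y) if t == 1]
    return (sum(c0) / len(c0), sum(c1) / len(c1))

def predict_centroid(model, x):
    return 0 if abs(x[0] - model[0]) <= abs(x[0] - model[1]) else 1

# Perfectly separable toy data: the cross-validated accuracy should be 1.0.
samples = [[float(i % 2)] for i in range(40)]
labels = [i % 2 for i in range(40)]
acc = five_fold_accuracy(samples, labels, train_centroids, predict_centroid)
```

    The point of the protocol is that every sample is scored exactly once by a model that never saw it during training, which is how the reported 94.26% and 90.52% accuracies were obtained.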

  14. Educational Technology and Student Voice: Examining Teacher Candidates' Perceptions

    ERIC Educational Resources Information Center

    Byker, Erik Jon; Putman, S. Michael; Handler, Laura; Polly, Drew

    2017-01-01

    Student Voice is a term that honors the participatory roles that students have when they enter learning spaces like classrooms. Student Voice is the recognition of students' choice, creativity, and freedom. Seminal educationists--like Dewey and Montessori--centered the purposes of education in the flourishing and valuing of Student Voice. This…

  15. Chair alarm for patient fall prevention based on gesture recognition and interactivity.

    PubMed

    Knight, Heather; Lee, Jae-Kyu; Ma, Hongshen

    2008-01-01

    The Gesture Recognition Interactive Technology (GRiT) Chair Alarm aims to prevent patient falls from chairs and wheelchairs by recognizing the gesture of a patient attempting to stand. Patient falls are one of the greatest causes of injury in hospitals. Current chair and bed exit alarm systems are inadequate because of insufficient notification, high false-alarm rate, and long trigger delays. The GRiT chair alarm uses an array of capacitive proximity sensors and pressure sensors to create a map of the patient's sitting position, which is then processed using gesture recognition algorithms to determine when a patient is attempting to stand and to alarm the care providers. This system also uses a range of voice and light feedback to encourage the patient to remain seated and/or to make use of the system's integrated nurse-call function. This system can be seamlessly integrated into existing hospital WiFi networks to send notifications and approximate patient location through existing nurse call systems.
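    The gesture recognition step described above, turning a pressure map into a stand-attempt decision, can be sketched with a simple heuristic. The grid layout, threshold and frame count below are illustrative guesses, not the GRiT system's actual parameters:

```python
def detect_stand_attempt(frames, forward_fraction=0.7, consecutive=3):
    """Flag a stand attempt when the front-row share of total seat pressure
    stays above a threshold for several consecutive sensor frames.

    frames: list of 2-row pressure grids [[back row...], [front row...]],
    sampled over time. Requiring a streak of frames reduces false alarms
    from momentary shifts in sitting position.
    """
    streak = 0
    for grid in frames:
        total = sum(grid[0]) + sum(grid[1])
        front = sum(grid[1])
        if total > 0 and front / total >= forward_fraction:
            streak += 1
            if streak >= consecutive:
                return True
        else:
            streak = 0
    return False
```

    A real system would fuse the capacitive proximity readings as well and tune the thresholds per patient, but the structure (per-frame position map, temporal smoothing, then an alarm decision) matches the description above.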

  16. Age- and gender-related variations of emotion recognition in pseudowords and faces.

    PubMed

    Demenescu, Liliana R; Mathiak, Krystyna A; Mathiak, Klaus

    2014-01-01

    BACKGROUND/STUDY CONTEXT: The ability to interpret emotionally salient stimuli is an important skill for successful social functioning at any age. The objective of the present study was to disentangle age and gender effects on emotion recognition ability in voices and faces. Three age groups of participants (young, age range: 18-35 years; middle-aged, age range: 36-55 years; and older, age range: 56-75 years) identified basic emotions presented in voices and faces in a forced-choice paradigm. Five emotions (angry, fearful, sad, disgusted, and happy) and a nonemotional category (neutral) were shown as encoded in color photographs of facial expressions and pseudowords spoken in affective prosody. Overall, older participants had a lower accuracy rate in categorizing emotions than young and middle-aged participants. Females performed better than males in recognizing emotions from voices, and this gender difference emerged in middle-aged and older participants. The performance of emotion recognition in faces was significantly correlated with the performance in voices. The current study provides further evidence for a general age and gender effect on emotion recognition; the advantage of females seems to be age- and stimulus modality-dependent.

  17. Implicit Multisensory Associations Influence Voice Recognition

    PubMed Central

    von Kriegstein, Katharina; Giraud, Anne-Lise

    2006-01-01

    Natural objects provide partially redundant information to the brain through different sensory modalities. For example, voices and faces both give information about the speech content, age, and gender of a person. Thanks to this redundancy, multimodal recognition is fast, robust, and automatic. In unimodal perception, however, only part of the information about an object is available. Here, we addressed whether, even under conditions of unimodal sensory input, crossmodal neural circuits that have been shaped by previous associative learning become activated and underpin a performance benefit. We measured brain activity with functional magnetic resonance imaging before, while, and after participants learned to associate either sensory redundant stimuli, i.e. voices and faces, or arbitrary multimodal combinations, i.e. voices and written names, ring tones, and cell phones or brand names of these cell phones. After learning, participants were better at recognizing unimodal auditory voices that had been paired with faces than those paired with written names, and association of voices with faces resulted in an increased functional coupling between voice and face areas. No such effects were observed for ring tones that had been paired with cell phones or names. These findings demonstrate that brief exposure to ecologically valid and sensory redundant stimulus pairs, such as voices and faces, induces specific multisensory associations. Consistent with predictive coding theories, associative representations become thereafter available for unimodal perception and facilitate object recognition. These data suggest that for natural objects effective predictive signals can be generated across sensory systems and proceed by optimization of functional connectivity between specialized cortical sensory modules. PMID:17002519

  18. How well does voice interaction work in space?

    NASA Technical Reports Server (NTRS)

    Morris, Randy B.; Whitmore, Mihriban; Adam, Susan C.

    1993-01-01

    The methods and results of an evaluation of the Voice Navigator software package are discussed. The first phase or ground phase of the study consisted of creating, or training, computer voice files of specific commands. This consisted of repeating each of six commands eight times. The files were then tested for recognition accuracy by the software aboard the microgravity aircraft. During the second phase, both voice training and testing were performed in microgravity. Inflight training was done due to problems encountered in phase one which were believed to be caused by ambient noise levels. Both quantitative and qualitative data were collected. Only one of the commands was found to offer consistently high recognition rates across subjects during the second phase.

  19. Two-voice fundamental frequency estimation

    NASA Astrophysics Data System (ADS)

    de Cheveigné, Alain

    2002-05-01

    An algorithm is presented that estimates the fundamental frequencies of two concurrent voices or instruments. The algorithm models each voice as a periodic function of time, and jointly estimates both periods by cancellation according to a previously proposed method [de Cheveigné and Kawahara, Speech Commun. 27, 175-185 (1999)]. The new algorithm improves on the old in several respects: it allows an unrestricted search range, effectively avoids harmonic and subharmonic errors, is more accurate (it uses two-dimensional parabolic interpolation), and is computationally less costly. It remains subject to unavoidable errors when periods are in certain simple ratios and the task is inherently ambiguous. The algorithm is evaluated on a small database including speech, singing voice, and instrumental sounds. It can be extended in several ways: to decide the number of voices, to handle amplitude variations, and to estimate more than two voices (at the expense of increased processing cost and decreased reliability). It makes no use of instrument models, learned or otherwise, although it could usefully be combined with such models. [Work supported by the Cognitique programme of the French Ministry of Research and Technology.]
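    The cancellation principle behind this family of algorithms can be shown in a brute-force form: a cancellation filter y(t) = x(t) - x(t - T) nulls any component of period T, so cascading two such filters and minimising the residual power jointly estimates both periods. This toy version uses exhaustive search with arbitrary range limits and omits the parabolic interpolation and pruning of the published algorithm:

```python
import math

def joint_cancellation_f0(x, sr, tmin, tmax):
    """Estimate two concurrent periods by cascaded cancellation filtering.

    Applies y(t) = x(t) - x(t - T1), then z(t) = y(t) - y(t - T2), and
    returns the frequencies sr/T1, sr/T2 for the (T1, T2) pair (in samples)
    that minimises the residual power of z.
    """
    best, best_pair = float("inf"), None
    n = len(x)
    for t1 in range(tmin, tmax + 1):
        y = [x[t] - x[t - t1] for t in range(t1, n)]
        for t2 in range(t1, tmax + 1):
            z = [y[t] - y[t - t2] for t in range(t2, len(y))]
            power = sum(v * v for v in z) / max(len(z), 1)
            if power < best:
                best, best_pair = power, (t1, t2)
    return sr / best_pair[0], sr / best_pair[1]

# Two concurrent periodic "voices": periods of 50 and 80 samples at 8 kHz.
x = [math.sin(2 * math.pi * t / 50) + 0.8 * math.sin(2 * math.pi * t / 80)
     for t in range(600)]
f1, f2 = joint_cancellation_f0(x, 8000, 40, 90)  # 160.0 Hz and 100.0 Hz
```

    The ambiguity noted in the abstract is visible here: if the two periods were in a simple ratio (say 40 and 80), a single lag would cancel both components and the pair could not be recovered uniquely.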

  20. An automatic speech recognition system with speaker-independent identification support

    NASA Astrophysics Data System (ADS)

    Caranica, Alexandru; Burileanu, Corneliu

    2015-02-01

    The novelty of this work relies on the application of an open source research software toolkit (CMU Sphinx) to train, build and evaluate a speech recognition system, with speaker-independent support, for voice-controlled hardware applications. Moreover, we propose to use the trained acoustic model to successfully decode offline voice commands on embedded hardware, such as an ARMv6 low-cost SoC, the Raspberry Pi. This type of single-board computer, mainly used for educational and research activities, can serve as a proof-of-concept software and hardware stack for low-cost voice automation systems.

  1. Voice Based City Panic Button System

    NASA Astrophysics Data System (ADS)

    Febriansyah; Zainuddin, Zahir; Bachtiar Nappu, M.

    2018-03-01

    The voice-activated panic button application was developed to provide faster early notification of hazardous conditions in the community to the nearest police station, using speech as the trigger; current applications still rely on touch combinations on the screen and on orders coordinated through a control center, so early notification takes longer. The method used in this research combined voice recognition for detecting the user's voice with the haversine formula for finding the shortest distance between the user and the police. The application was also equipped with auto SMS, which sent notifications to the victim's relatives, and was integrated with the Google Maps application (GMaps) to map the victim's location. The results show that voice registration on the application succeeds 100% of the time, incident detection using speech recognition while the application is running averages 94.67%, and the auto SMS to the victim's relatives succeeds 100% of the time.
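    The haversine comparison of user-to-station distances can be sketched as follows; the station names and coordinates are hypothetical, not taken from the paper:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearest_station(user_lat, user_lon, stations):
    """Pick the (name, lat, lon) station closest to the user's position."""
    return min(stations, key=lambda s: haversine_km(user_lat, user_lon, s[1], s[2]))

# Hypothetical police stations around a city centre.
stations = [("North", -5.10, 119.42),
            ("Harbour", -5.13, 119.40),
            ("East", -5.15, 119.48)]
closest = nearest_station(-5.14, 119.41, stations)
```

    Once the closest station is chosen, the rest of the pipeline (auto SMS and the GMaps link to the victim's coordinates) only needs the user's position and the winning station's contact details.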

  2. Project planning, training, measurement and sustainment: the successful implementation of voice recognition.

    PubMed

    Antiles, S; Couris, J; Schweitzer, A; Rosenthal, D; Da Silva, R Q

    2000-01-01

    Computerized voice recognition systems (VR) can reduce costs and enhance service. The capital outlay required for conversion to a VR system is significant; therefore, it is incumbent on radiology departments to provide cost and service justifications to administrators. Massachusetts General Hospital (MGH) in Boston implemented VR over a two-year period and achieved annual savings of $530,000 and a 50% decrease in report throughput. Those accomplishments required solid planning and implementation strategies, training and sustainment programs. This article walks through the process, step by step, in the hope of providing a tool set for future implementations. Because VR has dramatic implications for workflow, a solid operational plan is needed when assessing vendors and planning for implementation. The goals for implementation should be to minimize operational disruptions and capitalize on efficiencies of the technology. Senior leadership--the department chair or vice-chair--must select the goals to be accomplished and oversee, manage and direct the VR initiative. The importance of this point cannot be overstated, since implementation will require behavior changes from radiologists and others who may not perceive any personal benefits. Training is the pivotal factor affecting the success of voice recognition, and practice is the only way for radiologists to enhance their skills. Through practice, radiologists will discover shortcuts, and their speed and comfort will improve. Measurement and data analysis are critical to changing and improving the voice recognition application and are vital to decision-making. Some of the issues about which valuable data can be collected are technical and educational problems, VR penetration, report turnaround time and annual cost savings. Sustained effort is indispensable to the maintenance of voice recognition. Finally, all efforts made and gains achieved may prove to be futile without ongoing sustainment of the system through retraining, education and technical support.

  3. Selective attention in perceptual adjustments to voice.

    PubMed

    Mullennix, J W; Howe, J N

    1999-10-01

    The effects of perceptual adjustments to voice information on the perception of isolated spoken words were examined. In two experiments, spoken target words were preceded or followed within a trial by a neutral word spoken in the same voice or in a different voice as the target. Over-all, words were reproduced more accurately on trials on which the voice of the neutral word matched the voice of the spoken target word, suggesting that perceptual adjustments to voice interfere with word processing. This result, however, was mediated by selective attention to voice. The results provide further evidence of a close processing relationship between perceptual adjustments to voice and spoken word recognition.

  4. Whispering - The hidden side of auditory communication.

    PubMed

    Frühholz, Sascha; Trost, Wiebke; Grandjean, Didier

    2016-11-15

    Whispering is a unique expression mode that is specific to auditory communication. Individuals switch their vocalization mode to whispering especially when affected by inner emotions in certain social contexts, such as in intimate relationships or intimidating social interactions. Although this context-dependent whispering is adaptive, whispered voices are acoustically far less rich than phonated voices and thus impose higher hearing and neural auditory decoding demands for recognizing their socio-affective value by listeners. The neural dynamics underlying this recognition especially from whispered voices are largely unknown. Here we show that whispered voices in humans are considerably impoverished as quantified by an entropy measure of spectral acoustic information, and this missing information needs large-scale neural compensation in terms of auditory and cognitive processing. Notably, recognizing the socio-affective information from voices was slightly more difficult from whispered voices, probably based on missing tonal information. While phonated voices elicited extended activity in auditory regions for decoding of relevant tonal and time information and the valence of voices, whispered voices elicited activity in a complex auditory-frontal brain network. Our data suggest that a large-scale multidirectional brain network compensates for the impoverished sound quality of socially meaningful environmental signals to support their accurate recognition and valence attribution. Copyright © 2016 Elsevier Inc. All rights reserved.
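    The entropy quantification mentioned above can be illustrated generically: a signal whose power is concentrated in few spectral bins (like a phonated, tonal voice) has low spectral entropy, while one whose power is spread out (like a whispered, noise-like voice) has high entropy. The naive DFT and normalisation below are a generic illustration, not the paper's specific measure:

```python
import cmath
import math
import random

def spectral_entropy(signal):
    """Shannon entropy (bits) of the normalised power spectrum.

    Uses a naive DFT over the first half of the spectrum; fine for short
    illustrative signals, too slow for real audio.
    """
    n = len(signal)
    power = []
    for k in range(n // 2):
        s = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2)
    total = sum(power)
    probs = [p / total for p in power if p > 0]
    return -sum(p * math.log2(p) for p in probs)

# A pure tone concentrates spectral power; noise spreads it across bins.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
rnd = random.Random(0)
noise = [rnd.uniform(-1, 1) for _ in range(64)]
```

    Under this measure the tone scores near zero bits while the noise approaches the maximum for 32 bins (5 bits), matching the intuition that whispered (noisier) voices carry less concentrated spectral information.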

  5. Crossmodal plasticity in the fusiform gyrus of late blind individuals during voice recognition.

    PubMed

    Hölig, Cordula; Föcker, Julia; Best, Anna; Röder, Brigitte; Büchel, Christian

    2014-12-01

    Blind individuals are well practiced in identifying other people by their voices. In congenitally blind adults, the anterior fusiform gyrus has been shown to be active during voice recognition. Such crossmodal changes have been associated with a superiority of blind adults in voice perception. The key question of the present functional magnetic resonance imaging (fMRI) study was whether visual deprivation that occurs in adulthood is followed by similar adaptive changes in the voice identification system. Late blind individuals and matched sighted participants were tested in a priming paradigm in which two voice stimuli were presented in succession. The prime (S1) and the target (S2) were either from the same speaker (person-congruent voices) or from two different speakers (person-incongruent voices). Participants had to classify the S2 as coming from either an old or a young person. Only in late blind individuals, but not in matched sighted controls, was activation in the anterior fusiform gyrus modulated by voice identity: late blind volunteers showed an increase of the BOLD signal in response to person-incongruent compared with person-congruent trials. These results suggest that the fusiform gyrus adapts to input from a new modality even in the mature brain and thus demonstrate an adult form of crossmodal plasticity. Copyright © 2014 Elsevier Inc. All rights reserved.

  6. Cross-cultural emotional prosody recognition: evidence from Chinese and British listeners.

    PubMed

    Paulmann, Silke; Uskul, Ayse K

    2014-01-01

    This cross-cultural study of emotional tone of voice recognition tests the in-group advantage hypothesis (Elfenbein & Ambady, 2002) employing a quasi-balanced design. Individuals of Chinese and British background were asked to recognise pseudosentences produced by Chinese and British native speakers, displaying one of seven emotions (anger, disgust, fear, happiness, neutral tone of voice, sadness, and surprise). Findings reveal that emotional displays were recognised at rates higher than predicted by chance; however, members of each cultural group were more accurate in recognising the displays communicated by a member of their own cultural group than a member of the other cultural group. Moreover, the evaluation of error matrices indicates that both cultural groups relied on similar mechanisms when recognising emotional displays from the voice. Overall, the study reveals evidence for both universal and culture-specific principles in vocal emotion recognition.

  7. Processing voiceless vowels in Japanese: Effects of language-specific phonological knowledge

    NASA Astrophysics Data System (ADS)

    Ogasawara, Naomi

    2005-04-01

    There has been little research on processing allophonic variation in the field of psycholinguistics. This study focuses on processing the voiced/voiceless allophonic alternation of high vowels in Japanese. Three perception experiments were conducted to explore how listeners parse out vowels with the voicing alternation from other segments in the speech stream and how the different voicing statuses of the vowel affect listeners' word recognition process. The results from the three experiments show that listeners use phonological knowledge of their native language for phoneme processing and for word recognition. However, interactions of the phonological and acoustic effects differ between the two processes. The facilitatory phonological effect and the inhibitory acoustic effect cancel each other out in phoneme processing, while in word recognition the facilitatory phonological effect overrides the inhibitory acoustic effect.

  8. Separation of Singing Voice from Music Accompaniment for Monaural Recordings

    DTIC Science & Technology

    2005-09-01

    Yipeng Li. Separating singing voice from music accompaniment is very useful in many applications, such as lyrics recognition and alignment, singer identification, and music information retrieval. Although speech separation has been extensively studied for decades, singing voice separation has been little studied. (DTIC directory pub/tech-report/2005, file TR61.pdf)

  9. Functional connectivity between face-movement and speech-intelligibility areas during auditory-only speech perception.

    PubMed

    Schall, Sonja; von Kriegstein, Katharina

    2014-01-01

    It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers' voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker's face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas.

  10. Improving Speaker Recognition by Biometric Voice Deconstruction

    PubMed Central

    Mazaira-Fernandez, Luis Miguel; Álvarez-Marquina, Agustín; Gómez-Vilda, Pedro

    2015-01-01

    Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have been forcibly replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved in recent years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with a set of features derived from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description of the main hypothesis and of the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on one recorded over a mobile phone network under non-controlled acoustic conditions. PMID:26442245

  11. Improving Speaker Recognition by Biometric Voice Deconstruction.

    PubMed

    Mazaira-Fernandez, Luis Miguel; Álvarez-Marquina, Agustín; Gómez-Vilda, Pedro

    2015-01-01

    Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have been forcibly replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved in recent years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with a set of features derived from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description of the main hypothesis and of the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on one recorded over a mobile phone network under non-controlled acoustic conditions.

  12. Postlingual adult performance in noise with HiRes 120 and ClearVoice Low, Medium, and High.

    PubMed

    Holden, Laura K; Brenner, Christine; Reeder, Ruth M; Firszt, Jill B

    2013-11-01

    The study's objectives were to evaluate speech recognition in multiple listening conditions using several noise types with HiRes 120 and ClearVoice (Low, Medium, High) and to determine which ClearVoice program was most beneficial for everyday use. Fifteen postlingual adults attended four sessions; speech recognition was assessed at sessions 1 and 3 with HiRes 120 and at sessions 2 and 4 with all ClearVoice programs. Test measures included sentences presented in restaurant noise (R-SPACE), in speech-spectrum noise, in four- and eight-talker babble, and connected discourse presented in 12-talker babble. Participants completed a questionnaire comparing ClearVoice programs. Significant group differences in performance between HiRes 120 and ClearVoice were present only in the R-SPACE; performance was better with ClearVoice High than HiRes 120. Among ClearVoice programs, no significant group differences were present for any measure. Individual results revealed that most participants performed better in the R-SPACE with ClearVoice than with HiRes 120. For other measures, significant individual differences between HiRes 120 and ClearVoice were not prevalent. Individual results among ClearVoice programs differed, and overall preferences varied. Questionnaire data indicated increased understanding with High and Medium in certain environments. R-SPACE and questionnaire results indicated an advantage for ClearVoice High and Medium. Individual test and preference data showed mixed results between ClearVoice programs, making global recommendations difficult; however, results suggest providing ClearVoice High and Medium and HiRes 120 as processor options for adults willing to change settings. For adults unwilling or unable to change settings, ClearVoice Medium is a practical choice for daily listening.

  13. Robotics control using isolated word recognition of voice input

    NASA Technical Reports Server (NTRS)

    Weiner, J. M.

    1977-01-01

    A speech input/output system is presented that can be used to communicate with a task-oriented system. Human speech commands and synthesized voice output extend conventional information exchange capabilities between man and machine by utilizing audio input and output channels. The speech input facility comprises a hardware feature extractor and a microprocessor-implemented isolated word or phrase recognition system. The recognizer offers a medium-sized (100-command), syntactically constrained vocabulary and exhibits close to real-time performance. The major portion of the recognition processing is accomplished through software, minimizing the complexity of the hardware feature extractor.

  14. Real-Time Reconfigurable Adaptive Speech Recognition Command and Control Apparatus and Method

    NASA Technical Reports Server (NTRS)

    Salazar, George A. (Inventor); Haynes, Dena S. (Inventor); Sommers, Marc J. (Inventor)

    1998-01-01

    An adaptive speech recognition and control system and method is discussed for controlling various mechanisms and systems in response to spoken instructions; spoken commands direct the system to the appropriate memory nodes and to the memory templates corresponding to the voiced command. Spoken commands from any of a group of operators for which the system is trained may be identified, and voice templates are updated as required in response to changes in any trained operator's pronunciation and voice characteristics over time. Provisions are made both for near-real-time retraining of the system with respect to individual terms that are determined not to be positively identified, and for an overall system training and updating process in which recognition of each command and vocabulary term is checked and the memory templates are retrained, if necessary, for respective commands or vocabulary terms with respect to the operator currently using the system. In one embodiment, the system includes input circuitry connected to a microphone and including signal processing and control sections for sensing the level of vocabulary recognition over a given period and, if recognition performance falls below a given level, processing audio-derived signals to enhance recognition performance of the system.

  15. Studies in automatic speech recognition and its application in aerospace

    NASA Astrophysics Data System (ADS)

    Taylor, Michael Robinson

    Human communication is characterized in terms of the spectral and temporal dimensions of speech waveforms. Electronic speech recognition strategies based on Dynamic Time Warping and Markov Model algorithms are described and typical digit recognition error rates are tabulated. The application of Direct Voice Input (DVI) as an interface between man and machine is explored within the context of civil and military aerospace programmes. Sources of physical and emotional stress affecting speech production within military high performance aircraft are identified. Experimental results are reported which quantify fundamental frequency and coarse temporal dimensions of male speech as a function of the vibration, linear acceleration and noise levels typical of aerospace environments; preliminary indications of acoustic phonetic variability reported by other researchers are summarized. Connected whole-word pattern recognition error rates are presented for digits spoken under controlled Gz sinusoidal whole-body vibration. Correlations are made between significant increases in recognition error rate and resonance of the abdomen-thorax and head subsystems of the body. The phenomenon of vibrato style speech produced under low frequency whole-body Gz vibration is also examined. Interactive DVI system architectures and avionic data bus integration concepts are outlined together with design procedures for the efficient development of pilot-vehicle command and control protocols.
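    The record above mentions recognizers based on Dynamic Time Warping (DTW). As a minimal, generic illustration (plain Python, not taken from the thesis), DTW aligns two feature sequences of unequal length and returns the cost of the best alignment; an isolated digit is then recognized by picking the template with the smallest distance:

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic Time Warping: cost of the best monotonic alignment of two
    sequences, allowing one element to match several in the other, which
    absorbs differences in speaking rate."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = dist(a[i - 1], b[j - 1])
            # extend the cheapest of: insertion, deletion, diagonal match
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

def recognize(utterance, templates):
    """Pick the template word with the smallest DTW distance."""
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))
```

    Real recognizers compare frame-by-frame spectral feature vectors rather than scalars, but the alignment recursion is the same.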

  16. Biometric Fusion Demonstration System Scientific Report

    DTIC Science & Technology

    2004-03-01

    Verification and facial recognition, searching watchlist databases comprised of full or partial facial images or voice recordings. Multiple-biometric fusion is covered in sections on fingerprint and facial recognition (2.2.1.1) and iris recognition and facial recognition (2.2.1.2). (DRDC Ottawa CR 2004-056)

  17. Freedom to Grow: Children's Perspectives of Student Voice

    ERIC Educational Resources Information Center

    Quinn, Sarah; Owen, Susanne

    2014-01-01

    This article explores the power of student voice, in recognition of the child's right to be treated as a capable, competent social actor involved in the education process. In this study, student voice is considered in the light of improving students' engagement and personal and social development at the primary school level. It emphasizes the…

  18. Recognition of facial, auditory, and bodily emotions in older adults.

    PubMed

    Ruffman, Ted; Halberstadt, Jamin; Murray, Janice

    2009-11-01

    Understanding older adults' social functioning difficulties requires insight into their recognition of emotion in voices and bodies, not just faces, the focus of most prior research. We examined 60 young and 61 older adults' recognition of basic emotions in facial, vocal, and bodily expressions, and when matching faces and bodies to voices, using 120 emotion items. Older adults were worse than young adults in 17 of 30 comparisons, with consistent difficulties in recognizing both positive (happy) and negative (angry and sad) vocal and bodily expressions. Nearly three quarters of older adults functioned at a level similar to the lowest one fourth of young adults, suggesting that age-related changes are common. In addition, we found that older adults' difficulty in matching emotions was not explained by difficulty with the component sources (i.e., faces or voices on their own), suggesting an additional problem of integration.

  19. Secure Recognition of Voice-Less Commands Using Videos

    NASA Astrophysics Data System (ADS)

    Yau, Wai Chee; Kumar, Dinesh Kant; Weghorn, Hans

    Interest in voice recognition technologies for internet applications is growing due to the flexibility of speech-based communication. The major drawback with the use of sound for internet access with computers is that the commands will be audible to other people in the vicinity. This paper examines a secure and voice-less method for recognition of speech-based commands using video without evaluating sound signals. The proposed approach represents mouth movements in the video data using 2D spatio-temporal templates (STT). Zernike moments (ZM) are computed from STT and fed into support vector machines (SVM) to be classified into one of the utterances. The experimental results demonstrate that the proposed technique produces a high accuracy of 98% in a phoneme classification task. The proposed technique is demonstrated to be invariant to global variations of illumination level. Such a system is useful for securely interpreting user commands for internet applications on mobile devices.
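    The pipeline above computes Zernike moments (ZM) from spatio-temporal templates and classifies them with an SVM. The sketch below (plain Python, written for this summary rather than taken from the paper) computes a single Zernike moment magnitude, the rotation-invariant feature the method relies on:

```python
from math import atan2, cos, factorial, pi, sin, sqrt

def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho); requires n - |m| even."""
    m = abs(m)
    return sum(
        (-1) ** s * factorial(n - s)
        / (factorial(s) * factorial((n + m) // 2 - s)
           * factorial((n - m) // 2 - s))
        * rho ** (n - 2 * s)
        for s in range((n - m) // 2 + 1)
    )

def zernike_magnitude(img, n, m):
    """|Z_nm| of a square grayscale image mapped onto the unit disc.

    The magnitude is invariant to rotation of the image about its
    centre, which is what makes these features attractive for
    template classification."""
    size = len(img)
    c = (size - 1) / 2.0
    re = im = 0.0
    for y in range(size):
        for x in range(size):
            xn, yn = (x - c) / c, (y - c) / c  # pixel -> unit-disc coords
            rho = sqrt(xn * xn + yn * yn)
            if rho > 1.0:
                continue  # ignore pixels outside the disc
            theta = atan2(yn, xn)
            v = radial_poly(n, m, rho)
            re += img[y][x] * v * cos(m * theta)
            im -= img[y][x] * v * sin(m * theta)
    return (n + 1) / pi * sqrt(re * re + im * im)
```

    In the paper's pipeline, a vector of such magnitudes over several (n, m) orders would then be fed to a support vector machine (e.g., scikit-learn's SVC) for utterance classification.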

  20. Early Detection of Severe Apnoea through Voice Analysis and Automatic Speaker Recognition Techniques

    NASA Astrophysics Data System (ADS)

    Fernández, Ruben; Blanco, Jose Luis; Díaz, David; Hernández, Luis A.; López, Eduardo; Alcázar, José

    This study is part of an on-going collaborative effort between the medical and the signal processing communities to promote research on applying voice analysis and Automatic Speaker Recognition techniques (ASR) for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based diagnosis could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we present and discuss the possibilities of using generative Gaussian Mixture Models (GMMs), generally used in ASR systems, to model distinctive apnoea voice characteristics (i.e. abnormal nasalization). Finally, we present experimental findings regarding the discriminative power of speaker recognition techniques applied to severe apnoea detection. We have achieved an 81.25 % correct classification rate, which is very promising and underpins the interest in this line of inquiry.

  1. Remote voice training: A case study on space shuttle applications, appendix C

    NASA Technical Reports Server (NTRS)

    Mollakarimi, Cindy; Hamid, Tamin

    1990-01-01

    The Tile Automation System includes applications of automation and robotics technology to all aspects of the Shuttle tile processing and inspection system. An integrated set of rapid prototyping testbeds was developed which include speech recognition and synthesis, laser imaging systems, distributed Ada programming environments, distributed relational data base architectures, distributed computer network architectures, multi-media workbenches, and human factors considerations. Remote voice training in the Tile Automation System is discussed. The user is prompted over a headset by synthesized speech for the training sequences. The voice recognition units and the voice output units are remote from the user and are connected by Ethernet to the main computer system. A supervisory channel is used to monitor the training sequences. Discussions include the training approaches as well as the human factors problems and solutions for this system utilizing remote training techniques.

  2. Statistical Evaluation of Biometric Evidence in Forensic Automatic Speaker Recognition

    NASA Astrophysics Data System (ADS)

    Drygajlo, Andrzej

    Forensic speaker recognition is the process of determining if a specific individual (suspected speaker) is the source of a questioned voice recording (trace). This paper aims at presenting forensic automatic speaker recognition (FASR) methods that provide a coherent way of quantifying and presenting recorded voice as biometric evidence. In such methods, the biometric evidence consists of the quantified degree of similarity between speaker-dependent features extracted from the trace and speaker-dependent features extracted from recorded speech of a suspect. The interpretation of recorded voice as evidence in the forensic context presents particular challenges, including within-speaker (within-source) variability and between-speakers (between-sources) variability. Consequently, FASR methods must provide a statistical evaluation which gives the court an indication of the strength of the evidence given the estimated within-source and between-sources variabilities. This paper reports on the first ENFSI evaluation campaign through a fake case, organized by the Netherlands Forensic Institute (NFI), as an example, where an automatic method using the Gaussian mixture models (GMMs) and the Bayesian interpretation (BI) framework were implemented for the forensic speaker recognition task.
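    As a schematic of the GMM/Bayesian-interpretation approach described above (a generic sketch in plain Python, not the implementation used in the NFI campaign), the strength of evidence is a likelihood ratio comparing how well the trace features fit a suspect-specific GMM versus a background-population GMM:

```python
import math

def gmm_loglik(x, components):
    """Log-likelihood of one feature vector under a diagonal-covariance
    GMM given as a list of (weight, means, variances) tuples."""
    logs = []
    for w, mu, var in components:
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(ll)
    top = max(logs)  # log-sum-exp over mixture components
    return top + math.log(sum(math.exp(l - top) for l in logs))

def likelihood_ratio(trace, suspect_gmm, background_gmm):
    """LR = p(trace | suspect model) / p(trace | background model).
    Values above 1 support the same-source hypothesis."""
    llr = sum(gmm_loglik(x, suspect_gmm) - gmm_loglik(x, background_gmm)
              for x in trace)
    return math.exp(llr)
```

    In practice both models are trained on cepstral features with many mixture components, and the within-source and between-sources variabilities estimated from reference populations calibrate how the ratio is reported to the court.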

  3. Onset and Maturation of Fetal Heart Rate Response to the Mother's Voice over Late Gestation

    ERIC Educational Resources Information Center

    Kisilevsky, Barbara S.; Hains, Sylvia M. J.

    2011-01-01

    Background: Term fetuses discriminate their mother's voice from a female stranger's, suggesting recognition/learning of some property of her voice. Identification of the onset and maturation of the response would increase our understanding of the influence of environmental sounds on the development of sensory abilities and identify the period when…

  4. Functional Connectivity between Face-Movement and Speech-Intelligibility Areas during Auditory-Only Speech Perception

    PubMed Central

    Schall, Sonja; von Kriegstein, Katharina

    2014-01-01

    It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers’ voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker’s face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas. PMID:24466026

  5. Implementation of the Intelligent Voice System for Kazakh

    NASA Astrophysics Data System (ADS)

    Yessenbayev, Zh; Saparkhojayev, N.; Tibeyev, T.

    2014-04-01

    Modern speech technologies are highly advanced and widely used in day-to-day applications. However, this mostly concerns the languages of well-developed countries, such as English, German, Japanese, and Russian. For Kazakh, the situation is less prominent and research in this field is only starting to evolve. In this research- and application-oriented project, we introduce an intelligent voice system for the fast deployment of call-centers and information desks supporting Kazakh speech. The demand for such a system is obvious when the country's large size and small population are considered: landline and cell phones are the only means of communication for distant villages and suburbs. The system features Kazakh speech recognition and synthesis modules as well as a web GUI for efficient dialog management. For speech recognition we use the CMU Sphinx engine, and for speech synthesis, MaryTTS. The web GUI is implemented in Java, enabling operators to quickly create and manage dialogs in a user-friendly graphical environment. The call routines are handled by Asterisk PBX and JBoss Application Server. The system supports such technologies and protocols as VoIP, VoiceXML, FastAGI, Java Speech API, and J2EE. For the speech recognition experiments we compiled and used the first Kazakh speech corpus, with utterances from 169 native speakers. The performance of the speech recognizer is 4.1% WER on isolated word recognition and 6.9% WER on clean continuous speech recognition tasks. The speech synthesis experiments include the training of male and female voices.
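    The word error rates quoted above (4.1% and 6.9% WER) follow the standard definition: the minimum number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. A minimal sketch in plain Python:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / N,
    computed as word-level Levenshtein distance over the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

    For example, `wer("a b c d", "a x c")` counts one substitution and one deletion against four reference words, giving 0.5; note that WER can exceed 1.0 when the hypothesis contains many insertions.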

  6. Convolutional Neural Network-Based Embarrassing Situation Detection under Camera for Social Robot in Smart Homes

    PubMed Central

    Sheng, Weihua; Junior, Francisco Erivaldo Fernandes; Li, Shaobo

    2018-01-01

    Recent research has shown that the ubiquitous use of cameras and voice monitoring equipment in a home environment can raise privacy concerns and affect human mental health. This can be a major obstacle to the deployment of smart home systems for elderly or disabled care. This study uses a social robot to detect embarrassing situations. Firstly, we designed an improved neural network structure based on the You Only Look Once (YOLO) model to obtain feature information. By focusing on reducing area redundancy and computation time, we proposed a bounding-box merging algorithm based on region proposal networks (B-RPN), to merge the areas that have similar features and determine the borders of the bounding box. Thereafter, we designed a feature extraction algorithm based on our improved YOLO and B-RPN, called F-YOLO, for our training datasets, and then proposed a real-time object detection algorithm based on F-YOLO (RODA-FY). We implemented RODA-FY and compared models on our MAT social robot. Secondly, we considered six types of situations in smart homes, and developed training and validation datasets containing 2580 and 360 images, respectively. Meanwhile, we designed three types of experiments with four types of test datasets composed of 960 sample images. Thirdly, we analyzed how the number of training iterations affects prediction accuracy, and then explored the relationship between recognition accuracy and learning rates. Our results show that our proposed privacy detection system can recognize designed situations in the smart home with an acceptable recognition accuracy of 94.48%. Finally, we compared the results among RODA-FY, Inception V3, and YOLO, which indicate that our proposed RODA-FY outperforms the other comparison models in recognition accuracy. PMID:29757211
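    The bounding-box merging step (B-RPN) is not detailed in this record; the sketch below (plain Python, illustrative names, not the authors' code) shows the generic idea such merging builds on: proposals whose intersection-over-union exceeds a threshold are collapsed into a single enclosing box:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def merge_boxes(boxes, thresh=0.5):
    """Greedily merge overlapping proposals into their enclosing hull."""
    merged = []
    for box in sorted(boxes):
        for i, m in enumerate(merged):
            if iou(box, m) >= thresh:
                merged[i] = (min(m[0], box[0]), min(m[1], box[1]),
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(box)
    return merged
```

    The actual B-RPN additionally weighs region-proposal feature similarity, per the abstract, rather than geometry alone.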

  7. Convolutional Neural Network-Based Embarrassing Situation Detection under Camera for Social Robot in Smart Homes.

    PubMed

    Yang, Guanci; Yang, Jing; Sheng, Weihua; Junior, Francisco Erivaldo Fernandes; Li, Shaobo

    2018-05-12

    Recent research has shown that the ubiquitous use of cameras and voice monitoring equipment in a home environment can raise privacy concerns and affect human mental health. This can be a major obstacle to the deployment of smart home systems for elderly or disabled care. This study uses a social robot to detect embarrassing situations. Firstly, we designed an improved neural network structure based on the You Only Look Once (YOLO) model to obtain feature information. By focusing on reducing area redundancy and computation time, we proposed a bounding-box merging algorithm based on region proposal networks (B-RPN), to merge the areas that have similar features and determine the borders of the bounding box. Thereafter, we designed a feature extraction algorithm based on our improved YOLO and B-RPN, called F-YOLO, for our training datasets, and then proposed a real-time object detection algorithm based on F-YOLO (RODA-FY). We implemented RODA-FY and compared models on our MAT social robot. Secondly, we considered six types of situations in smart homes, and developed training and validation datasets containing 2580 and 360 images, respectively. Meanwhile, we designed three types of experiments with four types of test datasets composed of 960 sample images. Thirdly, we analyzed how the number of training iterations affects prediction accuracy, and then explored the relationship between recognition accuracy and learning rates. Our results show that our proposed privacy detection system can recognize designed situations in the smart home with an acceptable recognition accuracy of 94.48%. Finally, we compared the results among RODA-FY, Inception V3, and YOLO, which indicate that our proposed RODA-FY outperforms the other comparison models in recognition accuracy.

  8. Neurocognition and symptoms identify links between facial recognition and emotion processing in schizophrenia: Meta-analytic findings

    PubMed Central

    Ventura, Joseph; Wood, Rachel C.; Jimenez, Amy M.; Hellemann, Gerhard S.

    2014-01-01

    Background In schizophrenia patients, one of the most commonly studied deficits of social cognition is emotion processing (EP), which has documented links to facial recognition (FR). But, how are deficits in facial recognition linked to emotion processing deficits? Can neurocognitive and symptom correlates of FR and EP help differentiate the unique contribution of FR to the domain of social cognition? Methods A meta-analysis of 102 studies (combined n = 4826) in schizophrenia patients was conducted to determine the magnitude and pattern of relationships between facial recognition, emotion processing, neurocognition, and type of symptom. Results Meta-analytic results indicated that facial recognition and emotion processing are strongly interrelated (r = .51). In addition, the relationship between FR and EP through voice prosody (r = .58) is as strong as the relationship between FR and EP based on facial stimuli (r = .53). Further, the relationship between emotion recognition, neurocognition, and symptoms is independent of the emotion processing modality – facial stimuli and voice prosody. Discussion The association between FR and EP that occurs through voice prosody suggests that FR is a fundamental cognitive process. The observed links between FR and EP might be due to bottom-up associations between neurocognition and EP, and not simply because most emotion recognition tasks use visual facial stimuli. In addition, links with symptoms, especially negative symptoms and disorganization, suggest possible symptom mechanisms that contribute to FR and EP deficits. PMID:24268469

  9. Neurocognition and symptoms identify links between facial recognition and emotion processing in schizophrenia: meta-analytic findings.

    PubMed

    Ventura, Joseph; Wood, Rachel C; Jimenez, Amy M; Hellemann, Gerhard S

    2013-12-01

    In schizophrenia patients, one of the most commonly studied deficits of social cognition is emotion processing (EP), which has documented links to facial recognition (FR). But, how are deficits in facial recognition linked to emotion processing deficits? Can neurocognitive and symptom correlates of FR and EP help differentiate the unique contribution of FR to the domain of social cognition? A meta-analysis of 102 studies (combined n=4826) in schizophrenia patients was conducted to determine the magnitude and pattern of relationships between facial recognition, emotion processing, neurocognition, and type of symptom. Meta-analytic results indicated that facial recognition and emotion processing are strongly interrelated (r=.51). In addition, the relationship between FR and EP through voice prosody (r=.58) is as strong as the relationship between FR and EP based on facial stimuli (r=.53). Further, the relationship between emotion recognition, neurocognition, and symptoms is independent of the emotion processing modality - facial stimuli and voice prosody. The association between FR and EP that occurs through voice prosody suggests that FR is a fundamental cognitive process. The observed links between FR and EP might be due to bottom-up associations between neurocognition and EP, and not simply because most emotion recognition tasks use visual facial stimuli. In addition, links with symptoms, especially negative symptoms and disorganization, suggest possible symptom mechanisms that contribute to FR and EP deficits. © 2013 Elsevier B.V. All rights reserved.

  10. A voice-input voice-output communication aid for people with severe speech impairment.

    PubMed

    Hawley, Mark S; Cunningham, Stuart P; Green, Phil D; Enderby, Pam; Palmer, Rebecca; Sehgal, Siddharth; O'Neill, Peter

    2013-01-01

    A new form of augmentative and alternative communication (AAC) device for people with severe speech impairment, the voice-input voice-output communication aid (VIVOCA), is described. The VIVOCA recognizes the disordered speech of the user and builds messages, which are converted into synthetic speech. System development was carried out employing user-centered design and development methods, which identified and refined key requirements for the device. A novel methodology for building small vocabulary, speaker-dependent automatic speech recognizers with reduced amounts of training data was applied. Experiments showed that this method is successful in generating good recognition performance (mean accuracy 96%) on highly disordered speech, even when recognition perplexity is increased. The selected message-building technique traded off various factors including speed of message construction and range of available message outputs. A field trial with individuals with moderate to severe dysarthria confirmed that they could make use of the device to produce intelligible speech output from disordered speech input. The trial highlighted some issues which limit the performance and usability of the device when applied in real usage situations, with mean recognition accuracy of 67% in these circumstances. These limitations will be addressed in future work.

  11. Evaluating a voice recognition system: finding the right product for your department.

    PubMed

    Freeh, M; Dewey, M; Brigham, L

    2001-06-01

    The Department of Radiology at the University of Utah Health Sciences Center has been in the process of transitioning from the traditional film-based department to a digital imaging department for the past 2 years. The department is now transitioning from the traditional method of dictating reports (dictation by radiologist to transcription to review and signing by radiologist) to a voice recognition system. The transition to digital operations will not be complete until we have the ability to directly interface the dictation process with the image review process. Voice recognition technology has advanced to the level where it can and should be an integral part of the new way of working in radiology and is an integral part of an efficient digital imaging department. The transition to voice recognition requires the task of identifying the product and the company that will best meet a department's needs. This report introduces the methods we used to evaluate the vendors and the products available as we made our purchasing decision. We discuss our evaluation method and provide a checklist that can be used by other departments to assist with their evaluation process. The criteria used in the evaluation process fall into the following major categories: user operations, technical infrastructure, medical dictionary, system interfaces, service support, cost, and company strength. Conclusions drawn from our evaluation process will be detailed, with the intention being to shorten the process for others as they embark on a similar venture. As more and more organizations investigate the many products and services that are now being offered to enhance the operations of a radiology department, it becomes increasingly important that solid methods are used to most effectively evaluate the new products. This report should help others complete the task of evaluating a voice recognition system and may be adaptable to other products as well.

  12. Application of Pattern Recognition Techniques to the Classification of Full-Term and Preterm Infant Cry.

    PubMed

    Orlandi, Silvia; Reyes Garcia, Carlos Alberto; Bandini, Andrea; Donzelli, Gianpaolo; Manfredi, Claudia

    2016-11-01

    Scientific and clinical advances in perinatology and neonatology have enhanced the chances of survival of preterm and very low weight neonates. Infant cry analysis is a suitable noninvasive complementary tool to assess the neurologic state of infants, particularly important in the case of preterm neonates. This article aims at exploiting differences between full-term and preterm infant cry with robust automatic acoustical analysis and data mining techniques. Twenty-two acoustical parameters are estimated in more than 3000 cry units from cry recordings of 28 full-term and 10 preterm newborns. Feature extraction is performed through the BioVoice dedicated software tool, developed at the Biomedical Engineering Lab, University of Firenze, Italy. Classification and pattern recognition are based on genetic algorithms for the selection of the best attributes. Training is performed by comparing four classifiers (Logistic Curve, Multilayer Perceptron, Support Vector Machine, and Random Forest) and three different testing options (full training set, 10-fold cross-validation, and 66% split). Results show that the best feature set is made up of 10 parameters capable of assessing differences between preterm and full-term newborns with about 87% accuracy. Best results are obtained with the Random Forest method (receiver operating characteristic area, 0.94). These 10 cry features might convey important additional information to assist the clinical specialist in the diagnosis and follow-up of possible delays or disorders in the neurologic development due to premature birth in this extremely vulnerable population of patients. The proposed approach is a first step toward an automatic infant cry recognition system for fast and proper identification of risk in preterm babies. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
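    The paper wraps its classifiers in a genetic algorithm that searches for the best feature subset. The sketch below keeps the wrapper idea but swaps the genetic search for an exhaustive one, which is feasible only for a handful of features; the scoring function here is a stand-in for cross-validated classifier accuracy, and the feature names are invented for illustration.

```python
from itertools import combinations

def best_subset(features, score):
    """Wrapper feature selection: evaluate every non-empty subset of
    features with the supplied scoring function and keep the best.
    A genetic algorithm performs the same search stochastically when
    exhaustive enumeration is infeasible."""
    best, best_score = None, float("-inf")
    for r in range(1, len(features) + 1):
        for subset in combinations(features, r):
            s = score(subset)
            if s > best_score:
                best, best_score = subset, s
    return best, best_score
```

    A toy score that rewards two informative features and penalizes subset size would, for example, pick exactly those two features out of a larger candidate list.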

  13. A Voice Enabled Procedure Browser for the International Space Station

    NASA Technical Reports Server (NTRS)

    Rayner, Manny; Chatzichrisafis, Nikos; Hockey, Beth Ann; Farrell, Kim; Renders, Jean-Michel

    2005-01-01

    Clarissa, an experimental voice enabled procedure browser that has recently been deployed on the International Space Station (ISS), is to the best of our knowledge the first spoken dialog system in space. This paper gives background on the system and the ISS procedures, then discusses the research developed to address three key problems: grammar-based speech recognition using the Regulus toolkit; SVM based methods for open microphone speech recognition; and robust side-effect free dialogue management for handling undos, corrections and confirmations.

  14. Voice emotion recognition by cochlear-implanted children and their normally-hearing peers

    PubMed Central

    Chatterjee, Monita; Zion, Danielle; Deroche, Mickael L.; Burianek, Brooke; Limb, Charles; Goren, Alison; Kulkarni, Aditya M.; Christensen, Julie A.

    2014-01-01

    Despite their remarkable success in bringing spoken language to hearing impaired listeners, the signal transmitted through cochlear implants (CIs) remains impoverished in spectro-temporal fine structure. As a consequence, pitch-dominant information such as voice emotion, is diminished. For young children, the ability to correctly identify the mood/intent of the speaker (which may not always be visible in their facial expression) is an important aspect of social and linguistic development. Previous work in the field has shown that children with cochlear implants (cCI) have significant deficits in voice emotion recognition relative to their normally hearing peers (cNH). Here, we report on voice emotion recognition by a cohort of 36 school-aged cCI. Additionally, we provide for the first time, a comparison of their performance to that of cNH and NH adults (aNH) listening to CI simulations of the same stimuli. We also provide comparisons to the performance of adult listeners with CIs (aCI), most of whom learned language primarily through normal acoustic hearing. Results indicate that, despite strong variability, on average, cCI perform similarly to their adult counterparts; that both groups’ mean performance is similar to aNHs’ performance with 8-channel noise-vocoded speech; that cNH achieve excellent scores in voice emotion recognition with full-spectrum speech, but on average, show significantly poorer scores than aNH with 8-channel noise-vocoded speech. A strong developmental effect was observed in the cNH with noise-vocoded speech in this task. These results point to the considerable benefit obtained by cochlear-implanted children from their devices, but also underscore the need for further research and development in this important and neglected area. PMID:25448167

  15. STS-41 Voice Command System Flight Experiment Report

    NASA Technical Reports Server (NTRS)

    Salazar, George A.

    1981-01-01

    This report presents the results of the Voice Command System (VCS) flight experiment on the five-day STS-41 mission. Two mission specialists, Bill Shepherd and Bruce Melnick, used the speaker-dependent system to evaluate the operational effectiveness of using voice to control a spacecraft system. In addition, data was gathered to analyze the effects of microgravity on speech recognition performance.

  16. Speech enhancement on smartphone voice recording

    NASA Astrophysics Data System (ADS)

    Tris Atmaja, Bagus; Nur Farid, Mifta; Arifianto, Dhany

    2016-11-01

    Speech enhancement is a challenging task in audio signal processing: enhancing the quality of a targeted speech signal while suppressing other noises. Early on, speech enhancement algorithms developed rapidly, from spectral subtraction and Wiener filtering through the spectral amplitude MMSE estimator to Non-negative Matrix Factorization (NMF). The smartphone, as a revolutionary device, is now used in all aspects of life, including journalism, both personally and professionally. Although many smartphones have two microphones (main and rear), only the main microphone is widely used for voice recording; this is why the NMF algorithm is widely used for this purpose of speech enhancement. This paper evaluates speech enhancement on smartphone voice recordings using the algorithms mentioned previously. We also extend the NMF algorithm to Kullback-Leibler NMF with supervised separation. The last algorithm shows improved results compared to the others in spectrogram and PESQ score evaluations.
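    Of the algorithms listed above, spectral subtraction is the simplest to state: subtract a scaled estimate of the noise magnitude spectrum from the noisy magnitude spectrum, bin by bin, and floor the result. A minimal sketch follows; the over-subtraction factor alpha and spectral floor beta are illustrative defaults, not values from the paper.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=2.0, beta=0.02):
    """Subtract a scaled noise-magnitude estimate from each noisy
    magnitude bin, flooring the result at a fraction of the noisy
    magnitude rather than at zero (hard zeroing is the classic
    cause of 'musical noise' artifacts)."""
    enhanced = []
    for y, n in zip(noisy_mag, noise_mag):
        s = y - alpha * n
        enhanced.append(max(s, beta * y))
    return enhanced
```

    In a complete enhancer the magnitudes come from a short-time Fourier transform of each frame, the noise estimate is averaged over speech-free frames, and the enhanced magnitude is recombined with the noisy phase before inverse transformation.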

  17. A 4.8 kbps code-excited linear predictive coder

    NASA Technical Reports Server (NTRS)

    Tremain, Thomas E.; Campbell, Joseph P., Jr.; Welch, Vanoy C.

    1988-01-01

    A secure voice system, the STU-3, capable of providing end-to-end secure voice communications was developed in 1984. The terminal for the new system will be built around the standard LPC-10 voice processor algorithm. While the performance of the present STU-3 processor is considered to be good, its response to nonspeech sounds such as whistles, coughs, and impulse-like noises may not be completely acceptable. Speech in noisy environments also causes problems with the LPC-10 voice algorithm. In addition, there is always a demand for something better. It is hoped that LPC-10's 2.4 kbps voice performance will be complemented with a very high quality speech coder operating at a higher data rate. This new coder is one of a number of candidate algorithms being considered for an upgraded version of the STU-3 in late 1989. The problems of designing a code-excited linear predictive (CELP) coder that provides very high quality speech at a 4.8 kbps data rate and can be implemented on today's hardware are considered.
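    Both LPC-10 and CELP rest on short-term linear prediction: each speech sample is modeled as a weighted sum of the previous p samples, with the weights obtained per frame from the autocorrelation sequence via the Levinson-Durbin recursion. A minimal, pure-Python sketch of that shared analysis step (not the STU-3 implementation):

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a frame x."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for LPC coefficients a[j],
    where x[n] is predicted as sum_j a[j] * x[n-1-j].
    Returns (coefficients, residual prediction-error energy)."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err                     # reflection coefficient
        a_new = a[:]
        a_new[i] = k
        for j in range(i):
            a_new[j] = a[j] - k * a[i - 1 - j]
        a = a_new
        err *= (1.0 - k * k)
    return a, err

# A decaying exponential x[n] = 0.9**n is an AR(1) signal, so the
# first LPC coefficient should come out close to 0.9.
frame = [0.9 ** n for n in range(200)]
lpc, err = levinson_durbin(autocorr(frame, 2), 2)
```

    In LPC-10 the residual is modeled with a crude voiced/unvoiced excitation; CELP instead spends its extra bit rate on picking the excitation from a codebook by analysis-by-synthesis, which is where the quality gain at 4.8 kbps comes from.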

  18. The rise of deep learning in drug discovery.

    PubMed

    Chen, Hongming; Engkvist, Ola; Wang, Yinhai; Olivecrona, Marcus; Blaschke, Thomas

    2018-06-01

    Over the past decade, deep learning has achieved remarkable success in various artificial intelligence research areas. Evolved from the previous research on artificial neural networks, this technology has shown superior performance to other machine learning algorithms in areas such as image and voice recognition and natural language processing, among others. The first wave of applications of deep learning in pharmaceutical research has emerged in recent years, and its utility has gone beyond bioactivity predictions and has shown promise in addressing diverse problems in drug discovery. Examples will be discussed covering bioactivity prediction, de novo molecular design, synthesis prediction and biological image analysis. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.

  19. Scientific bases of human-machine communication by voice.

    PubMed Central

    Schafer, R W

    1995-01-01

    The scientific bases for human-machine communication by voice are in the fields of psychology, linguistics, acoustics, signal processing, computer science, and integrated circuit technology. The purpose of this paper is to highlight the basic scientific and technological issues in human-machine communication by voice and to point out areas of future research opportunity. The discussion is organized around the following major issues in implementing human-machine voice communication systems: (i) hardware/software implementation of the system, (ii) speech synthesis for voice output, (iii) speech recognition and understanding for voice input, and (iv) usability factors related to how humans interact with machines. PMID:7479802

  20. Current trends in small vocabulary speech recognition for equipment control

    NASA Astrophysics Data System (ADS)

    Doukas, Nikolaos; Bardis, Nikolaos G.

    2017-09-01

    Speech recognition systems allow human-machine communication to acquire an intuitive nature that approaches the simplicity of inter-human communication. Small vocabulary speech recognition is a subset of the overall speech recognition problem, where only a small number of words need to be recognized. Speaker-independent small vocabulary recognition can find significant applications in field equipment used by military personnel. Such equipment may typically be controlled by a small number of commands that need to be given quickly and accurately, under conditions where delicate manual operations are difficult to achieve. This type of application could hence significantly benefit from the use of robust voice-operated control components, as they would facilitate the interaction with their users and render it much more reliable in times of crisis. This paper presents current challenges involved in attaining efficient and robust small vocabulary speech recognition. These challenges concern feature selection, classification techniques, speaker diversity and noise effects. A state machine approach is presented that facilitates the voice guidance of different equipment in a variety of situations.
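    The state machine approach mentioned in the closing sentence can be illustrated with a small sketch: each state exposes only a few valid command words, and out-of-vocabulary words are ignored rather than force-matched. The states and vocabulary below are invented for illustration, not taken from the paper.

```python
# Transition table: (state, recognized word) -> next state.
# Restricting the active vocabulary per state is what keeps
# small-vocabulary recognition robust: fewer confusable words
# are live at any moment.
TRANSITIONS = {
    ("idle", "radio"):   "radio",
    ("radio", "on"):     "radio_on",
    ("radio", "cancel"): "idle",
    ("radio_on", "off"): "radio",
}

def run_commands(words, state="idle"):
    """Feed a sequence of recognized words through the machine,
    ignoring words that are not valid in the current state."""
    for w in words:
        state = TRANSITIONS.get((state, w), state)
    return state
```

    In practice the recognizer would also be told the active vocabulary of the current state, so the acoustic search itself only has to discriminate among a handful of words.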

  1. Emotion Recognition From Singing Voices Using Contemporary Commercial Music and Classical Styles.

    PubMed

    Hakanpää, Tua; Waaramaa, Teija; Laukkanen, Anne-Maria

    2018-02-22

    This study examines the recognition of emotion in contemporary commercial music (CCM) and classical styles of singing. This information may be useful in improving the training of interpretation in singing. This is an experimental comparative study. Thirteen singers (11 female, 2 male) with a minimum of 3 years' professional-level singing studies (in CCM or classical technique or both) participated. They sang at three pitches (females: a, e1, a1, males: one octave lower) expressing anger, sadness, joy, tenderness, and a neutral state. Twenty-nine listeners listened to 312 short (0.63- to 4.8-second) voice samples, 135 of which were sung using a classical singing technique and 165 of which were sung in a CCM style. The listeners were asked which emotion they heard. Activity and valence were derived from the chosen emotions. The percentage of correct recognitions out of all the answers in the listening test (N = 9048) was 30.2%. The recognition percentage for the CCM-style singing technique was higher (34.5%) than for the classical-style technique (24.5%). Valence and activation were better perceived than the emotions themselves, and activity was better recognized than valence. A higher pitch was more likely to be perceived as joy or anger, and a lower pitch as sorrow. Both valence and activation were better recognized in the female CCM samples than in the other samples. There are statistically significant differences in the recognition of emotions between classical and CCM styles of singing. Furthermore, in the singing voice, pitch affects the perception of emotions, and valence and activity are more easily recognized than emotions. Copyright © 2018 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  2. Event identification by acoustic signature recognition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dress, W.B.; Kercel, S.W.

    1995-07-01

    Many events of interest to the security community produce acoustic emissions that are, in principle, identifiable as to cause. Some obvious examples are gunshots, breaking glass, takeoffs and landings of small aircraft, vehicular engine noises, footsteps (high frequencies when on gravel, very low frequencies when on soil), and voices (whispers to shouts). We are investigating wavelet-based methods to extract unique features of such events for classification and identification. We also discuss methods of classification and pattern recognition specifically tailored for acoustic signatures obtained by wavelet analysis. The paper is divided into three parts: completed work, work in progress, and future applications. The completed phase has led to the successful recognition of aircraft types on landing and takeoff. Both small aircraft (twin-engine turboprop) and large (commercial airliners) were included in the study. The project considered the design of a small, field-deployable, inexpensive device. The techniques developed during the aircraft identification phase were then adapted to a multispectral electromagnetic interference monitoring device now deployed in a nuclear power plant. This is a general-purpose wavelet analysis engine, spanning 14 octaves, and can be adapted for other specific tasks. Work in progress is focused on applying the methods previously developed to speaker identification. Some of the problems to be overcome include recognition of sounds as voice patterns and as distinct from possible background noises (e.g., music), as well as identification of the speaker from a short-duration voice sample. A generalization of the completed work and the work in progress is a device capable of classifying any number of acoustic events, particularly quasi-stationary events such as engine noises and voices, and singular events such as gunshots and breaking glass. We will show examples of both kinds of events and discuss their recognition likelihood.
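    The abstract does not identify the wavelet basis or the exact features used; a common minimal choice for acoustic signatures is the energy in each octave band of a discrete wavelet transform. The Haar wavelet below is the simplest stand-in for whatever basis the authors actually chose (their engine spans 14 octaves; this sketch produces one energy per available octave).

```python
import math

def haar_dwt(x):
    """Orthonormal Haar DWT of a power-of-two-length signal.
    Returns detail coefficients per octave (finest first) plus
    the final approximation."""
    s = math.sqrt(2.0)
    levels, approx = [], list(x)
    while len(approx) > 1:
        avg, det = [], []
        for a, b in zip(approx[0::2], approx[1::2]):
            avg.append((a + b) / s)
            det.append((a - b) / s)
        levels.append(det)
        approx = avg
    return levels, approx

def octave_energies(x):
    """Per-octave energy feature vector for an acoustic-signature
    classifier: sum of squared detail coefficients per level, plus
    the energy of the final approximation."""
    levels, approx = haar_dwt(x)
    feats = [sum(c * c for c in det) for det in levels]
    feats.append(approx[0] ** 2)
    return feats
```

    Because the transform is orthonormal, the feature vector redistributes the signal's total energy across octaves without losing any of it, which makes the features directly comparable between events.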

  3. Culture/Religion and Identity: Social Justice versus Recognition

    ERIC Educational Resources Information Center

    Bekerman, Zvi

    2012-01-01

    Recognition is the main word attached to multicultural perspectives. The multicultural call for recognition, the one calling for the recognition of cultural minorities and identities, the one now voiced by liberal states all over and also in Israel, was a more difficult one. It took the author some time to realize that calling for the recognition…

  4. Social power and recognition of emotional prosody: High power is associated with lower recognition accuracy than low power.

    PubMed

    Uskul, Ayse K; Paulmann, Silke; Weick, Mario

    2016-02-01

    Listeners have to pay close attention to a speaker's tone of voice (prosody) during daily conversations. This is particularly important when trying to infer the emotional state of the speaker. Although a growing body of research has explored how emotions are processed from speech in general, little is known about how psychosocial factors such as social power can shape the perception of vocal emotional attributes. Thus, the present studies explored how social power affects emotional prosody recognition. In a correlational study (Study 1) and an experimental study (Study 2), we show that high power is associated with lower accuracy in emotional prosody recognition than low power. These results, for the first time, suggest that individuals experiencing high or low power perceive emotional tone of voice differently. (c) 2016 APA, all rights reserved.

  5. Auditory emotion recognition impairments in Schizophrenia: Relationship to acoustic features and cognition

    PubMed Central

    Gold, Rinat; Butler, Pamela; Revheim, Nadine; Leitman, David; Hansen, John A.; Gur, Ruben; Kantrowitz, Joshua T.; Laukka, Petri; Juslin, Patrik N.; Silipo, Gail S.; Javitt, Daniel C.

    2013-01-01

    Objective Schizophrenia is associated with deficits in ability to perceive emotion based upon tone of voice. The basis for this deficit, however, remains unclear and assessment batteries remain limited. We evaluated performance in schizophrenia on a novel voice emotion recognition battery with well characterized physical features, relative to impairments in more general emotional and cognitive function. Methods We studied a primary sample of 92 patients relative to 73 controls. Stimuli were characterized according to both intended emotion and physical features (e.g., pitch, intensity) that contributed to the emotional percept. Parallel measures of visual emotion recognition, pitch perception, general cognition, and overall outcome were obtained. More limited measures were obtained in an independent replication sample of 36 patients, 31 age-matched controls, and 188 general comparison subjects. Results Patients showed significant, large effect size deficits in voice emotion recognition (F=25.4, p<.00001, d=1.1), and were preferentially impaired in recognition of emotion based upon pitch-, but not intensity-features (group X feature interaction: F=7.79, p=.006). Emotion recognition deficits were significantly correlated with pitch perception impairments both across (r=.56, p<.0001) and within (r=.47, p<.0001) group. Path analysis showed both sensory-specific and general cognitive contributions to auditory emotion recognition deficits in schizophrenia. Similar patterns of results were observed in the replication sample. Conclusions The present study demonstrates impairments in auditory emotion recognition in schizophrenia relative to acoustic features of underlying stimuli. Furthermore, it provides tools and highlights the need for greater attention to physical features of stimuli used for study of social cognition in neuropsychiatric disorders. PMID:22362394

  6. The influence of nationality on the accuracy of face and voice recognition.

    PubMed

    Doty, N D

    1998-01-01

    Sixty English and U.S. citizens were tested to determine the effect of nationality on accuracy in recognizing previously witnessed faces and voices. Subjects viewed a frontal facial photograph and were then asked to select that face from a set of 10 oblique facial photographs. Subjects listened to a recorded voice and were then asked to select the same voice from a set of 10 voice recordings. This process was repeated 7 more times, such that subjects identified a male and female face and voice from England, France, Belize, and the United States. Subjects demonstrated better accuracy recognizing the faces and voices of their own nationality. Subgroup analysis further supported the other-nationality effect as well as the previously documented other-race effect.

  7. A survey of the state-of-the-art and focused research in range systems, task 1

    NASA Technical Reports Server (NTRS)

    Omura, J. K.

    1986-01-01

    This final report presents the latest research activity in voice compression. We have designed a non-real-time simulation system that is implemented around the IBM-PC where the IBM-PC is used as a speech work station for data acquisition and analysis of voice samples. A real-time implementation is also proposed. This real-time Voice Compression Board (VCB) is built around the Texas Instruments TMS-3220. The voice compression algorithm investigated here was described in an earlier report titled, Low Cost Voice Compression for Mobile Digital Radios, by the author. We will assume the reader is familiar with the voice compression algorithm discussed in this report. The VCB compresses speech waveforms at data rates ranging from 4.8 kbps to 16 kbps. This board interfaces to the IBM-PC 8-bit bus, and plugs into a single expansion slot on the mother board.

  8. Assessment of voice, speech, and related quality of life in advanced head and neck cancer patients 10-years+ after chemoradiotherapy.

    PubMed

    Kraaijenga, S A C; Oskam, I M; van Son, R J J H; Hamming-Vrieze, O; Hilgers, F J M; van den Brekel, M W M; van der Molen, L

    2016-04-01

    Assessment of long-term objective and subjective voice, speech, articulation, and quality of life in patients with head and neck cancer (HNC) treated with concurrent chemoradiotherapy (CRT) for advanced, stage IV disease. Twenty-two disease-free survivors, treated with cisplatin-based CRT for inoperable HNC (1999-2004), were evaluated at 10 years post-treatment. A standard Dutch text was recorded. Perceptual analysis of voice, speech, and articulation was conducted by two expert listeners (SLPs). An experimental expert system based on automatic speech recognition was also used. Patients' perception of voice and speech and related quality of life was assessed with the Voice Handicap Index (VHI) and Speech Handicap Index (SHI) questionnaires. At a median follow-up of 11 years, perceptual evaluation showed abnormal scores in up to 64% of cases, depending on the outcome parameter analyzed. Automatic assessment of voice and speech parameters correlated moderately to strongly with perceptual outcome scores. Patient-reported problems with voice (VHI>15) and speech (SHI>6) in daily life were present in 68% and 77% of patients, respectively. Patients treated with IMRT showed significantly less impairment compared to those treated with conventional radiotherapy. More than 10 years after organ-preservation treatment, voice and speech problems are common in this patient cohort, as assessed with perceptual evaluation, automatic speech recognition, and validated structured questionnaires. There were fewer complaints in patients treated with IMRT than with conventional radiotherapy. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. Voice reaction times with recognition for Commodore computers

    NASA Technical Reports Server (NTRS)

    Washburn, David A.; Putney, R. Thompson

    1990-01-01

    Hardware and software modifications are presented that allow for collection and recognition by a Commodore computer of spoken responses. Responses are timed with millisecond accuracy and automatically analyzed and scored. Accuracy data for this device from several experiments are presented. Potential applications and suggestions for improving recognition accuracy are also discussed.

  10. Interpreting Chicken-Scratch: Lexical Access for Handwritten Words

    ERIC Educational Resources Information Center

    Barnhart, Anthony S.; Goldinger, Stephen D.

    2010-01-01

    Handwritten word recognition is a field of study that has largely been neglected in the psychological literature, despite its prevalence in society. Whereas studies of spoken word recognition almost exclusively employ natural, human voices as stimuli, studies of visual word recognition use synthetic typefaces, thus simplifying the process of word…

  11. Towards Real-Time Speech Emotion Recognition for Affective E-Learning

    ERIC Educational Resources Information Center

    Bahreini, Kiavash; Nadolski, Rob; Westera, Wim

    2016-01-01

    This paper presents the voice emotion recognition part of the FILTWAM framework for real-time emotion recognition in affective e-learning settings. FILTWAM (Framework for Improving Learning Through Webcams And Microphones) intends to offer timely and appropriate online feedback based upon learner's vocal intonations and facial expressions in order…

  12. A self-teaching image processing and voice-recognition-based, intelligent and interactive system to educate visually impaired children

    NASA Astrophysics Data System (ADS)

    Iqbal, Asim; Farooq, Umar; Mahmood, Hassan; Asad, Muhammad Usman; Khan, Akrama; Atiq, Hafiz Muhammad

    2010-02-01

    A self-teaching image processing and voice recognition based system is developed to educate visually impaired children, chiefly in their primary education. The system comprises a computer, a vision camera, an ear speaker, and a microphone. The camera, attached to the computer system, is mounted on the ceiling opposite (at the required angle) the desk on which the book is placed. Sample images and voices, in the form of instructions and commands for English and Urdu alphabets, numeric digits, operators, and shapes, are already stored in the database. A blind child first reads the embossed character (object) with the help of his fingers and then speaks the answer, the name of the character, its shape, etc. into the microphone. Once the voice command of the blind child is received by the microphone, an image is taken by the camera and processed by a MATLAB® program, developed with the help of the Image Acquisition and Image Processing toolboxes, which generates a response or the required set of instructions for the child via the ear speaker, resulting in the self-education of a visually impaired child. The speech recognition program is also developed in MATLAB®, with the help of the Data Acquisition and Signal Processing toolboxes, to record and process the commands of the blind child.

  13. Cultural in-group advantage: emotion recognition in African American and European American faces and voices.

    PubMed

    Wickline, Virginia B; Bailey, Wendy; Nowicki, Stephen

    2009-03-01

    The authors explored whether there were in-group advantages in emotion recognition of faces and voices by culture or geographic region. Participants were 72 African American students (33 men, 39 women), 102 European American students (30 men, 72 women), 30 African international students (16 men, 14 women), and 30 European international students (15 men, 15 women). The participants determined emotions in African American and European American faces and voices. Results showed an in-group advantage (sometimes by culture, less often by race) in recognizing facial and vocal emotional expressions. African international students were generally less accurate at interpreting American nonverbal stimuli than were European American, African American, and European international peers. Results suggest that, although partly universal, emotional expressions have subtle differences across cultures that persons must learn.

  14. Progressive Associative Phonagnosia: A Neuropsychological Analysis

    ERIC Educational Resources Information Center

    Hailstone, Julia C.; Crutch, Sebastian J.; Vestergaard, Martin D.; Patterson, Roy D.; Warren, Jason D.

    2010-01-01

    There are few detailed studies of impaired voice recognition, or phonagnosia. Here we describe two patients with progressive phonagnosia in the context of frontotemporal lobar degeneration. Patient QR presented with behavioural decline and increasing difficulty recognising familiar voices, while patient KL presented with progressive prosopagnosia.…

  15. Monitoring daily affective symptoms and memory function using interactive voice response in outpatients receiving electroconvulsive therapy.

    PubMed

    Fazzino, Tera L; Rabinowitz, Terry; Althoff, Robert R; Helzer, John E

    2013-12-01

    Recently, there has been a gradual shift from inpatient-only electroconvulsive therapy (ECT) toward outpatient administration. Potential advantages include convenience and reduced cost, but providers do not have the same opportunity to monitor treatment response and adverse effects as they do with inpatients. This can offset some of the potential advantages of outpatient ECT, such as tailoring treatment intervals to clinical response; scheduling is typically algorithmic rather than empirically based. Daily monitoring through automated telephone interactive voice response (IVR) is a potential solution to this quandary. To test the feasibility of clinical monitoring via IVR, we recruited 26 patients (69% female; mean age, 51 years) receiving outpatient ECT to make daily IVR reports of affective symptoms and subjective memory for 60 days. The IVR also administered a daily word recognition task to test objective memory. Every seventh day, a longer weekly IVR interview included questions about suicidal ideation. Overall daily call compliance was high (mean, 80%). Most participants (96%) did not consider the calls to be time-consuming. Longitudinal regression analysis using generalized estimating equations revealed that participants' objective memory functioning significantly improved during the study (P < 0.05). Of 123 weekly IVR interviews, 41 reports (33%) in 14 patients endorsed suicidal ideation during the previous week. Interactive voice response monitoring of outpatient ECT can provide more detailed clinical information than standard outpatient ECT assessment. Interactive voice response data offer providers a comprehensive, longitudinal picture of patient treatment response and adverse effects as a basis for treatment scheduling and ongoing clinical management.

  16. Voice Enabled Framework to Support Post-Surgical Discharge Monitoring

    PubMed Central

    Blansit, Kevin; Marmor, Rebecca; Zhao, Beiqun; Tien, Dan

    2017-01-01

    Unplanned surgical readmissions pose a challenging problem for the American healthcare system. We propose to combine consumer electronic voice recognition technology with the FHIR standard to create a post-surgical discharge monitoring app to identify and alert physicians to a patient’s deteriorating status. PMID:29854267

  17. Cerebral Processing of Voice Gender Studied Using a Continuous Carryover fMRI Design

    PubMed Central

    Pernet, Cyril; Latinus, Marianne; Crabbe, Frances; Belin, Pascal

    2013-01-01

    Normal listeners effortlessly determine a person's gender by voice, but the cerebral mechanisms underlying this ability remain unclear. Here, we demonstrate 2 stages of cerebral processing during voice gender categorization. Using voice morphing along with an adaptation-optimized functional magnetic resonance imaging design, we found that secondary auditory cortex including the anterior part of the temporal voice areas in the right hemisphere responded primarily to acoustical distance from the previously heard stimulus. In contrast, a network of bilateral regions involving inferior prefrontal and anterior and posterior cingulate cortex reflected perceived stimulus ambiguity. These findings suggest that voice gender recognition involves neuronal populations along the auditory ventral stream responsible for auditory feature extraction, functioning in concert with the prefrontal cortex in voice gender perception. PMID:22490550

  18. V2S: Voice to Sign Language Translation System for Malaysian Deaf People

    NASA Astrophysics Data System (ADS)

    Mean Foong, Oi; Low, Tang Jung; La, Wai Wan

    The process of learning and understanding sign language may be cumbersome to some, and therefore this paper proposes a solution to this problem by providing a voice (English language) to sign language translation system using speech and image processing techniques. Speech processing, which includes speech recognition, is the study of recognizing the words being spoken regardless of who the speaker is. This project uses template-based recognition as the main approach, in which the V2S system first needs to be trained with speech patterns based on some generic spectral parameter set. These spectral parameter sets are then stored as templates in a database. The system performs the recognition process by matching the parameter set of the input speech with the stored templates to finally display the sign language in video format. Empirical results show that the system has an 80.3% recognition rate.
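    The template-matching step described above can be sketched as a nearest-template search over stored spectral parameter sets. A minimal illustration with hypothetical four-dimensional feature vectors and Euclidean distance (the actual V2S parameter set and distance measure are not specified in the record):

```python
import numpy as np

# Hypothetical spectral parameter sets (e.g. averaged filter-bank energies)
# stored as templates during the training phase, one per vocabulary word.
templates = {
    "hello":   np.array([0.8, 0.1, 0.3, 0.5]),
    "thanks":  np.array([0.2, 0.9, 0.4, 0.1]),
    "goodbye": np.array([0.5, 0.4, 0.8, 0.2]),
}

def recognize(features):
    """Return the template word closest to the input feature vector."""
    return min(templates, key=lambda w: np.linalg.norm(templates[w] - features))

# An utterance whose features lie nearest the "thanks" template
print(recognize(np.array([0.25, 0.85, 0.35, 0.15])))  # -> thanks
```

    A real system would extract the feature vectors from recorded speech frames; here they are supplied directly to keep the matching logic visible.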

  19. The voice conveys emotion in ten globalized cultures and one remote village in Bhutan.

    PubMed

    Cordaro, Daniel T; Keltner, Dacher; Tshering, Sumjay; Wangchuk, Dorji; Flynn, Lisa M

    2016-02-01

    With data from 10 different globalized cultures and 1 remote, isolated village in Bhutan, we examined universals and cultural variations in the recognition of 16 nonverbal emotional vocalizations. College students in 10 nations (Study 1) and villagers in remote Bhutan (Study 2) were asked to match emotional vocalizations to 1-sentence stories of the same valence. Guided by previous conceptualizations of recognition accuracy, across both studies, 7 of the 16 vocal burst stimuli were found to have strong or very strong recognition in all 11 cultures, 6 vocal bursts were found to have moderate recognition, and 4 were not universally recognized. All vocal burst stimuli varied significantly in terms of the degree to which they were recognized across the 11 cultures. Our discussion focuses on the implications of these results for current debates concerning the emotion conveyed in the voice. (c) 2016 APA, all rights reserved.

  20. Voice Conversion Using Pitch Shifting Algorithm by Time Stretching with PSOLA and Re-Sampling

    NASA Astrophysics Data System (ADS)

    Mousa, Allam

    2010-01-01

    Voice changing has many applications in industry and commerce. This paper emphasizes voice conversion using a pitch shifting method which depends on detecting the pitch of the signal (fundamental frequency) using Simplified Inverse Filter Tracking (SIFT) and changing it according to the target pitch period using time stretching with the Pitch Synchronous Overlap-Add (PSOLA) algorithm, then resampling the signal in order to restore the original play rate. The same study was performed to see the effect of voice conversion when Arabic speech signals are considered. Treatment of certain Arabic voiced vowels and the conversion between male and female speech has shown some expansion or compression in the resulting speech. A comparison in terms of pitch shifting is presented here. Analysis was performed for a single frame and for a full segmentation of speech.
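    PSOLA itself needs pitch marks so that pitch periods can be overlap-added during the time stretch; the final resampling step, however, is easy to illustrate. A minimal sketch, with a pure tone standing in for a voiced signal: reading the signal back faster via interpolation shrinks the duration and scales every frequency, including the pitch, by the same factor:

```python
import numpy as np

def resample(signal, factor):
    """Read the signal 'factor' times faster via linear interpolation;
    duration shrinks and every frequency scales up by 'factor'."""
    idx = np.arange(0, len(signal) - 1, factor)
    lo = idx.astype(int)
    frac = idx - lo
    return signal[lo] * (1 - frac) + signal[lo + 1] * frac

def peak_hz(x, fs):
    """Frequency of the largest FFT magnitude peak."""
    return np.argmax(np.abs(np.fft.rfft(x))) * fs / len(x)

fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 100 * t)   # stand-in for a 100 Hz voiced signal
shifted = resample(tone, 1.5)        # pitch raised by a factor of 1.5

print(round(peak_hz(tone, fs)), round(peak_hz(shifted, fs)))  # -> 100 150
```

    In the full method the preceding PSOLA time stretch compensates for the duration change, so only the pitch shift survives.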

  1. A study of voice production characteristics of astronaut speech during Apollo 11 for speaker modeling in space.

    PubMed

    Yu, Chengzhu; Hansen, John H L

    2017-03-01

    Human physiology has evolved to accommodate environmental conditions, including temperature, pressure, and air chemistry unique to Earth. However, the environment in space varies significantly compared to that on Earth and, therefore, variability is expected in astronauts' speech production mechanism. In this study, the variations of astronaut voice characteristics during the NASA Apollo 11 mission are analyzed. Specifically, acoustical features such as fundamental frequency and phoneme formant structure that are closely related to the speech production system are studied. For a further understanding of astronauts' vocal tract spectrum variation in space, a maximum likelihood frequency warping based analysis is proposed to detect the vocal tract spectrum displacement during space conditions. The results from fundamental frequency, formant structure, as well as vocal spectrum displacement indicate that astronauts change their speech production mechanism when in space. Moreover, the experimental results for astronaut voice identification tasks indicate that current speaker recognition solutions are highly vulnerable to astronaut voice production variations in space conditions. Future recommendations from this study suggest that successful applications of speaker recognition during extended space missions require robust speaker modeling techniques that could effectively adapt to voice production variation caused by diverse space conditions.
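    The fundamental-frequency analysis underlying such studies can be illustrated with a basic autocorrelation pitch estimator; this is a simplification, not the paper's actual feature extraction or its maximum likelihood frequency-warping analysis:

```python
import numpy as np

def estimate_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Autocorrelation pitch estimate: the lag of the strongest
    self-similarity peak inside a plausible voice range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

fs = 16000
t = np.arange(int(0.05 * fs)) / fs
# A crude voiced-speech stand-in: fundamental at 125 Hz plus two harmonics
voiced = sum(np.sin(2 * np.pi * 125 * k * t) / k for k in (1, 2, 3))
print(round(estimate_f0(voiced, fs)))  # -> 125
```

    Tracking such estimates over an astronaut's utterances is one simple way to observe fundamental-frequency variation across recording conditions.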

  2. DTO-675: Voice Control of the Closed Circuit Television System

    NASA Technical Reports Server (NTRS)

    Salazar, George; Gaston, Darilyn M.; Haynes, Dena S.

    1996-01-01

    This report presents the results of the Detail Test Object (DTO)-675 "Voice Control of the Closed Circuit Television (CCTV)" system. The DTO is a follow-on flight of the Voice Command System (VCS) that flew as a secondary payload on STS-41. Several design changes were made to the VCS for the STS-78 mission. This report discusses those design changes, the data collected during the mission, recognition problems encountered, and findings.

  3. Hands-free human-machine interaction with voice

    NASA Astrophysics Data System (ADS)

    Juang, B. H.

    2004-05-01

    Voice is a natural communication interface between a human and a machine. The machine, when placed in today's communication networks, may be configured to provide automation to save substantial operating cost, as demonstrated in AT&T's VRCP (Voice Recognition Call Processing), or to facilitate intelligent services, such as virtual personal assistants, to enhance individual productivity. These intelligent services often need to be accessible anytime, anywhere (e.g., in cars when the user is in a hands-busy-eyes-busy situation, or during meetings where constantly talking to a microphone is either undesirable or impossible), and thus call for advanced signal processing and automatic speech recognition techniques which support what we call ``hands-free'' human-machine communication. These techniques entail a broad spectrum of technical ideas, ranging from the use of directional microphones and acoustic echo cancellation to robust speech recognition. In this talk, we highlight a number of key techniques that were developed for hands-free human-machine communication in the mid-1990s after Bell Labs became a unit of Lucent Technologies. A video clip will be played to demonstrate the accomplishment.

  4. United States Homeland Security and National Biometric Identification

    DTIC Science & Technology

    2002-04-09

    security number. Biometrics is the use of unique individual traits such as fingerprints, iris eye patterns, voice recognition, and facial recognition to...technology to control access onto their military bases using a Defense Manpower Management Command developed software application. FACIAL Facial recognition systems...installed facial recognition systems in conjunction with a series of 200 cameras to fight street crime and identify terrorists. The cameras, which are

  5. Military applications of automatic speech recognition and future requirements

    NASA Technical Reports Server (NTRS)

    Beek, Bruno; Cupples, Edward J.

    1977-01-01

    An updated summary of the state-of-the-art of automatic speech recognition and its relevance to military applications is provided. A number of potential systems for military applications are under development. These include: (1) digital narrowband communication systems; (2) automatic speech verification; (3) on-line cartographic processing unit; (4) word recognition for militarized tactical data system; and (5) voice recognition and synthesis for aircraft cockpit.

  6. Voice emotion recognition by cochlear-implanted children and their normally-hearing peers.

    PubMed

    Chatterjee, Monita; Zion, Danielle J; Deroche, Mickael L; Burianek, Brooke A; Limb, Charles J; Goren, Alison P; Kulkarni, Aditya M; Christensen, Julie A

    2015-04-01

    Despite their remarkable success in bringing spoken language to hearing impaired listeners, the signal transmitted through cochlear implants (CIs) remains impoverished in spectro-temporal fine structure. As a consequence, pitch-dominant information such as voice emotion is diminished. For young children, the ability to correctly identify the mood/intent of the speaker (which may not always be visible in their facial expression) is an important aspect of social and linguistic development. Previous work in the field has shown that children with cochlear implants (cCI) have significant deficits in voice emotion recognition relative to their normally hearing peers (cNH). Here, we report on voice emotion recognition by a cohort of 36 school-aged cCI. Additionally, we provide, for the first time, a comparison of their performance to that of cNH and NH adults (aNH) listening to CI simulations of the same stimuli. We also provide comparisons to the performance of adult listeners with CIs (aCI), most of whom learned language primarily through normal acoustic hearing. Results indicate that, despite strong variability, on average, cCI perform similarly to their adult counterparts; that both groups' mean performance is similar to aNHs' performance with 8-channel noise-vocoded speech; that cNH achieve excellent scores in voice emotion recognition with full-spectrum speech, but on average, show significantly poorer scores than aNH with 8-channel noise-vocoded speech. A strong developmental effect was observed in the cNH with noise-vocoded speech in this task. These results point to the considerable benefit obtained by cochlear-implanted children from their devices, but also underscore the need for further research and development in this important and neglected area. This article is part of a Special Issue. Copyright © 2014 Elsevier B.V. All rights reserved.

  7. Novel Blind Recognition Algorithm of Frame Synchronization Words Based on Soft-Decision in Digital Communication Systems.

    PubMed

    Qin, Jiangyi; Huang, Zhiping; Liu, Chunwu; Su, Shaojing; Zhou, Jing

    2015-01-01

    A novel blind recognition algorithm is proposed to recognize frame synchronization word parameters in digital communication systems. In this paper, a blind recognition method for frame synchronization words based on hard decisions is derived in detail, and the criteria for parameter recognition are given. Compared with blind recognition based on hard decisions, soft decisions improve recognition accuracy. Therefore, drawing on the characteristics of the Quadrature Phase Shift Keying (QPSK) signal, an improved blind recognition algorithm based on soft decisions is proposed; the improved algorithm can also be extended to other modulation formats. The complete recognition steps of both the hard-decision and soft-decision algorithms are given in detail. Finally, simulation results show that both algorithms can blindly recognize the parameters of frame synchronization words, and that the improved algorithm markedly enhances recognition accuracy.
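    A toy illustration of why soft decisions help: correlating a known sync word against hard decisions discards symbol reliability, while correlating against the soft values keeps it. The sync word, payload, and channel values below are hypothetical, and the sketch ignores the paper's QPSK-specific derivation:

```python
import numpy as np

# Hypothetical bipolar sync word, embedded at offset 12 in an alternating
# payload. Two of its +1 symbols arrive badly attenuated (-0.2), so their
# hard decisions come out wrong while the soft values stay informative.
sync = np.array([1, -1, 1, 1, -1, -1, 1, -1], dtype=float)
stream = np.array([(-1.0) ** i for i in range(32)])
stream[12:20] = [1, -1, 1, -0.2, -1, -1, -0.2, -1]

def locate(s, word):
    """Correlate the known word at every offset; return the best match."""
    scores = [float(s[i:i + len(word)] @ word)
              for i in range(len(s) - len(word) + 1)]
    return int(np.argmax(scores))

print(locate(np.sign(stream), sync), locate(stream, sync))  # -> 0 12
```

    With hard decisions the corrupted symbols flip sign, the true offset only ties with spurious payload offsets, and the correlator returns the first (wrong) one; the soft correlator still peaks at the true offset 12.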

  8. Voice Technologies in Libraries: A Look into the Future.

    ERIC Educational Resources Information Center

    Lange, Holley R., Ed.; And Others

    1991-01-01

    Discussion of synthesized speech and voice recognition focuses on a forum that addressed the potential for speech technologies in libraries. Topics discussed by three contributors include possible library applications in technical processing, book receipt, circulation control, and database access; use by disabled and illiterate users; and problems…

  9. Objective voice and speech analysis of persons with chronic hoarseness by prosodic analysis of speech samples.

    PubMed

    Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar

    2016-10-01

    Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
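    The regression between automatic measures and perceptual ratings can be sketched with ordinary least squares. The feature values and ratings below are fabricated stand-ins, not the study's data or its actual ten-feature sets:

```python
import numpy as np

# Hypothetical data: 6 speakers, 2 automatic measures each
# (e.g. recognition accuracy, normalized F0 variability), and a
# perceptual severity rating on a 5-point scale from expert listeners.
features = np.array([[0.9, 0.8], [0.8, 0.7], [0.6, 0.5],
                     [0.5, 0.6], [0.3, 0.2], [0.2, 0.3]])
ratings = np.array([1.0, 1.5, 2.5, 3.0, 4.5, 5.0])  # 1 = normal, 5 = severe

# Least-squares fit of ratings ~ features (with intercept)
X = np.column_stack([features, np.ones(len(features))])
coeffs, *_ = np.linalg.lstsq(X, ratings, rcond=None)

predicted = X @ coeffs
r = np.corrcoef(predicted, ratings)[0, 1]
print(round(r, 2))  # human-machine correlation on this toy data (near 1)
```

    The study's human-machine correlations (r = 0.40 to 0.82) come from exactly this kind of comparison between predicted and perceptual ratings, on held-out speakers rather than the training set.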

  10. Real time analysis of voiced sounds

    NASA Technical Reports Server (NTRS)

    Hong, J. P. (Inventor)

    1976-01-01

    A power spectrum analysis of the harmonic content of a voiced sound signal is conducted in real time by phase-lock-loop tracking of the fundamental frequency (f0) of the signal and successive harmonics (h1 through hn) of the fundamental frequency. The analysis also includes measuring the quadrature power and phase of each frequency tracked, differentiating the power measurements of the harmonics in adjacent pairs, and analyzing successive differentials to determine peak power points in the power spectrum for display or use in analysis of voiced sound, such as for voice recognition.
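    The quadrature power and phase measurement can be sketched digitally by correlating the signal with in-phase and quadrature references at each tracked frequency; here the reference frequencies are fixed rather than obtained from the phase-lock-loop tracking the invention describes:

```python
import numpy as np

def quadrature_power_phase(signal, fs, freq):
    """Correlate the signal with an in-phase (cosine) and a quadrature
    (sine) reference at 'freq'; return (power, phase) of that component."""
    t = np.arange(len(signal)) / fs
    i = 2 * np.mean(signal * np.cos(2 * np.pi * freq * t))
    q = 2 * np.mean(signal * np.sin(2 * np.pi * freq * t))
    return i * i + q * q, np.arctan2(q, i)

fs = 8000
t = np.arange(fs) / fs
# A voiced-sound stand-in: fundamental at 100 Hz plus a weaker 2nd harmonic
x = 1.0 * np.cos(2 * np.pi * 100 * t) + 0.5 * np.cos(2 * np.pi * 200 * t)

p1, _ = quadrature_power_phase(x, fs, 100.0)   # fundamental power
p2, _ = quadrature_power_phase(x, fs, 200.0)   # 2nd-harmonic power
print(round(p1, 2), round(p2, 2))  # -> 1.0 0.25
```

    Differencing such power measurements across adjacent harmonics, as the patent describes, then locates the peaks of the power spectrum envelope.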

  11. Real-time speech gisting for ATC applications

    NASA Astrophysics Data System (ADS)

    Dunkelberger, Kirk A.

    1995-06-01

    Command and control within the ATC environment remains primarily voice-based. Hence, automatic real time, speaker independent, continuous speech recognition (CSR) has many obvious applications and implied benefits to the ATC community: automated target tagging, aircraft compliance monitoring, controller training, automatic alarm disabling, display management, and many others. However, while current state-of-the-art CSR systems provide upwards of 98% word accuracy in laboratory environments, recent low-intrusion experiments in ATCT environments demonstrated less than 70% word accuracy in spite of significant investments in recognizer tuning. Acoustic channel irregularities and controller/pilot grammar varieties impact current CSR algorithms at their weakest points. It will be shown herein, however, that real time context- and environment-sensitive gisting can provide key command phrase recognition rates of greater than 95% using the same low-intrusion approach. The combination of real time inexact syntactic pattern recognition techniques and a tight integration of CSR, gisting, and ATC database accessor system components is the key to these high phrase recognition rates. A system concept for real time gisting in the ATC context is presented herein. After establishing an application context, the discussion presents a minimal CSR technology context, then focuses on the gisting mechanism, desirable interfaces into the ATCT database environment, and data and control flow within the prototype system. Results of recent tests for a subset of the functionality are presented together with suggestions for further research.
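    The inexact matching at the heart of gisting can be illustrated with a word-level edit distance against a small set of key command phrases. The phrases and error budget below are hypothetical, not the prototype's actual grammar:

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token sequences."""
    d = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, y in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (x != y))
    return d[-1]

# Hypothetical ATC key phrases; recognizer output may contain word errors
phrases = ["cleared for takeoff runway two seven",
           "contact departure one two four point five",
           "hold short of runway two seven"]

def gist(hypothesis, max_dist=2):
    """Return the closest key phrase if it is within the error budget."""
    words = hypothesis.split()
    best = min(phrases, key=lambda p: edit_distance(words, p.split()))
    return best if edit_distance(words, best.split()) <= max_dist else None

print(gist("cleared takeoff runway two seven"))
# -> cleared for takeoff runway two seven
```

    Even with a dropped word, the hypothesis still maps to the intended key phrase; an utterance far from every phrase maps to nothing, which is how phrase-level rates can exceed raw word accuracy.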

  12. Neural Processing of Vocal Emotion and Identity

    ERIC Educational Resources Information Center

    Spreckelmeyer, Katja N.; Kutas, Marta; Urbach, Thomas; Altenmuller, Eckart; Munte, Thomas F.

    2009-01-01

    The voice is a marker of a person's identity which allows individual recognition even if the person is not in sight. Listening to a voice also affords inferences about the speaker's emotional state. Both these types of personal information are encoded in characteristic acoustic feature patterns analyzed within the auditory cortex. In the present…

  13. Voice Interactive Analysis System Study. Final Report, August 28, 1978 through March 23, 1979.

    ERIC Educational Resources Information Center

    Harry, D. P.; And Others

    The Voice Interactive Analysis System study continued research and development of the LISTEN real-time, minicomputer based connected speech recognition system, within NAVTRAEQUIPCEN's program of developing automatic speech technology in support of training. An attempt was made to identify the most effective features detected by the TTI-500 model…

  14. Social support and substitute voice acquisition on psychological adjustment among patients after laryngectomy.

    PubMed

    Kotake, Kumiko; Suzukamo, Yoshimi; Kai, Ichiro; Iwanaga, Kazuyo; Takahashi, Aya

    2017-03-01

    The objective is to clarify whether social support and acquisition of alternative voice enhance the psychological adjustment of laryngectomized patients and which part of the psychological adjustment structure would be influenced by social support. We contacted 1445 patients enrolled in a patient association using mail surveys and 679 patients agreed to participate in the study. The survey items included age, sex, occupation, post-surgery duration, communication method, psychological adjustment (by the Nottingham Adjustment Scale Japanese Laryngectomy Version: NAS-J-L), and the formal support (by Hospital Patient Satisfaction Questionnaire-25: HPSQ-25). Social support and communication methods were added to the three-tier structural model of psychological adjustment shown in our previous study, and a covariance structure analysis was conducted. Formal/informal supports and acquisition of alternative voice influence only the "recognition of oneself as voluntary agent", the first tier of the three-tier structure of psychological adjustment. The results suggest that social support and acquisition of alternative voice may enhance the recognition of oneself as voluntary agent and promote the psychological adjustment.

  15. Processing Electromyographic Signals to Recognize Words

    NASA Technical Reports Server (NTRS)

    Jorgensen, C. C.; Lee, D. D.

    2009-01-01

    A recently invented speech-recognition method applies to words that are articulated by means of the tongue and throat muscles but are otherwise not voiced or, at most, are spoken sotto voce. This method could satisfy a need for speech recognition under circumstances in which normal audible speech is difficult, poses a hazard, is disturbing to listeners, or compromises privacy. The method could also be used to augment traditional speech recognition by providing an additional source of information about articulator activity. The method can be characterized as intermediate between (1) conventional speech recognition through processing of voice sounds and (2) a method, not yet developed, of processing electroencephalographic signals to extract unspoken words directly from thoughts. This method involves computational processing of digitized electromyographic (EMG) signals from muscle innervation acquired by surface electrodes under a subject's chin near the tongue and on the side of the subject's throat near the larynx. After preprocessing, digitization, and feature extraction, EMG signals are processed by a neural-network pattern classifier, implemented in software, that performs the bulk of the recognition task as described.

  16. Visual face-movement sensitive cortex is relevant for auditory-only speech recognition.

    PubMed

    Riedel, Philipp; Ragert, Patrick; Schelinski, Stefanie; Kiebel, Stefan J; von Kriegstein, Katharina

    2015-07-01

    It is commonly assumed that the recruitment of visual areas during audition is not relevant for performing auditory tasks ('auditory-only view'). According to an alternative view, however, the recruitment of visual cortices is thought to optimize auditory-only task performance ('auditory-visual view'). This alternative view is based on functional magnetic resonance imaging (fMRI) studies. These studies have shown, for example, that even if there is only auditory input available, face-movement sensitive areas within the posterior superior temporal sulcus (pSTS) are involved in understanding what is said (auditory-only speech recognition). This is particularly the case when speakers are known audio-visually, that is, after brief voice-face learning. Here we tested whether the left pSTS involvement is causally related to performance in auditory-only speech recognition when speakers are known by face. To test this hypothesis, we applied cathodal transcranial direct current stimulation (tDCS) to the pSTS during (i) visual-only speech recognition of a speaker known only visually to participants and (ii) auditory-only speech recognition of speakers they learned by voice and face. We defined the cathode as active electrode to down-regulate cortical excitability by hyperpolarization of neurons. tDCS to the pSTS interfered with visual-only speech recognition performance compared to a control group without pSTS stimulation (tDCS to BA6/44 or sham). Critically, compared to controls, pSTS stimulation additionally decreased auditory-only speech recognition performance selectively for voice-face learned speakers. These results are important in two ways. First, they provide direct evidence that the pSTS is causally involved in visual-only speech recognition; this confirms a long-standing prediction of current face-processing models. Secondly, they show that visual face-sensitive pSTS is causally involved in optimizing auditory-only speech recognition. These results are in line with the 'auditory-visual view' of auditory speech perception, which assumes that auditory speech recognition is optimized by using predictions from previously encoded speaker-specific audio-visual internal models. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Valuing autonomy, struggling for an identity and a collective voice, and seeking role recognition: community mental health nurses' perceptions of their roles.

    PubMed

    White, Jane H; Kudless, Mary

    2008-10-01

    Leaders in this community mental health system approached the problem of job frustration, morale issues, and turnover concerns of their Community Mental Health Nurses (CMHNs) by designing a qualitative study using Participant Action Research (PAR) methodology based on the philosophy of Habermas. Six focus groups were conducted to address the nurses' concerns. The themes of Valuing Autonomy, Struggling for an Identity and Collective Voice, and Seeking Role Recognition best explained the participants' concerns. The study concluded with an action plan, the implementation of the plan, and a discussion of the plan's final outcomes.

  18. Understanding the Convolutional Neural Networks with Gradient Descent and Backpropagation

    NASA Astrophysics Data System (ADS)

    Zhou, XueFei

    2018-04-01

    With the development of computer technology, the applications of machine learning are increasingly extensive, and machine learning is providing endless opportunities to develop new applications. One of those applications is image recognition using Convolutional Neural Networks (CNNs). CNNs are among the most common algorithms in image recognition, and it is important for every scholar interested in this field to understand their theory and structure. CNNs are mainly used in computer recognition tasks, especially voice and text recognition, among other applications. They utilize a hierarchical structure with different layers to accelerate computing speed. In addition, the greatest features of CNNs are weight sharing and dimension reduction, which together underpin the high effectiveness and efficiency of CNNs in computing speed and error rate. With the help of other learning algorithms, CNNs can be used in several machine learning scenarios, especially deep learning. Based on a general introduction to the background and the core method, CNN, this paper focuses on summarizing how gradient descent and backpropagation work, and how they contribute to the high performance of CNNs. Some practical applications are also discussed in the following parts. The last section presents the conclusion and some perspectives on future work.
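    The interplay the paper summarizes, backpropagation (computing gradients layer by layer via the chain rule) and gradient descent (stepping the weights against those gradients), can be shown on a tiny fully-connected network; a CNN adds convolutional weight sharing but trains the same way. A minimal sketch on the XOR problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 2-8-1 network trained on XOR by batch gradient descent,
# with gradients obtained via backpropagation.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)
lr = 0.5

for _ in range(5000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)                  # hidden activations
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))      # sigmoid output
    # Backward pass: chain rule, layer by layer
    dout = (p - y) / len(X)                   # dLoss/dlogit (cross-entropy)
    dW2 = h.T @ dout; db2 = dout.sum(0)
    dh = dout @ W2.T * (1 - h ** 2)           # back through tanh
    dW1 = X.T @ dh; db1 = dh.sum(0)
    # Gradient descent update
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print((p > 0.5).astype(int).ravel())  # predicted labels for the 4 inputs
```

    In a CNN the backward pass additionally sums each shared convolutional weight's gradient over every position where the filter was applied; the update rule itself is unchanged.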

  19. Design of a digital voice data compression technique for orbiter voice channels

    NASA Technical Reports Server (NTRS)

    1975-01-01

    Candidate techniques were investigated for digital voice compression to a transmission rate of 8 kbps. Good voice quality, speaker recognition, and robustness in the presence of error bursts were considered. The technique of delayed-decision adaptive predictive coding is described and compared with conventional adaptive predictive coding. Results include a set of experimental simulations recorded on analog tape. The two FM broadcast segments produced show the delayed-decision technique to be virtually undegraded or minimally degraded at .001 and .01 Viterbi decoder bit error rates. Preliminary estimates of the hardware complexity of this technique indicate potential for implementation in space shuttle orbiters.
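    Delayed-decision APC searches several candidate quantizations before committing to one; the closed-loop predictive coder underneath it is simpler to sketch. A first-order illustration of conventional (non-delayed) predictive coding, with a hypothetical predictor coefficient and quantizer step:

```python
import numpy as np

def encode(signal, a=0.9, step=0.05):
    """First-order predictive coder: transmit only the quantized
    difference between each sample and a one-tap prediction."""
    codes, pred = [], 0.0
    for s in signal:
        r = np.round((s - a * pred) / step)   # quantized residual
        codes.append(int(r))
        pred = a * pred + r * step            # decoder-side reconstruction
    return codes

def decode(codes, a=0.9, step=0.05):
    out, pred = [], 0.0
    for r in codes:
        pred = a * pred + r * step
        out.append(pred)
    return np.array(out)

t = np.arange(200) / 8000
x = np.sin(2 * np.pi * 200 * t)
err = np.max(np.abs(decode(encode(x)) - x))
print(err < 0.05)  # -> True: error stays within half a quantizer step
```

    Because the encoder predicts from its own reconstruction, quantization error cannot accumulate; the delayed-decision variant improves on this by evaluating several residual sequences a few samples ahead before choosing which to transmit.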

  20. The Voice Transcription Technique: Use of Voice Recognition Software to Transcribe Digital Interview Data in Qualitative Research

    ERIC Educational Resources Information Center

    Matheson, Jennifer L.

    2007-01-01

    Transcribing interview data is a time-consuming task that most qualitative researchers dislike. Transcribing is even more difficult for people with physical limitations because traditional transcribing requires manual dexterity and the ability to sit at a computer for long stretches of time. Researchers have begun to explore using an automated…

  1. Voiced Excitations

    DTIC Science & Technology

    2004-12-01

    Radar & EM Speech, Voiced Speech Excitations. ...“New Ideas for Speech Recognition and Related Technologies”, Lawrence Livermore National Laboratory Report, UCRL-UR-120310, 1995. Available from...Livermore Laboratory report UCRL-JC-134775. Holzrichter 2003: Holzrichter, J.F., Kobler, J.B., Rosowski, J.J., Burke, G.J. (2003) “EM wave

  2. Children's Recognition of Their Own Recorded Voice: Influence of Age and Phonological Impairment

    ERIC Educational Resources Information Center

    Strombergsson, Sofia

    2013-01-01

    Children with phonological impairment (PI) often have difficulties perceiving insufficiencies in their own speech. The use of recordings has been suggested as a way of directing the child's attention toward his/her own speech, despite a lack of evidence that children actually recognize their recorded voice as their own. We present two studies of…

  3. Vocal Identity Recognition in Autism Spectrum Disorder

    PubMed Central

    Lin, I-Fan; Yamada, Takashi; Komine, Yoko; Kato, Nobumasa; Kato, Masaharu; Kashino, Makio

    2015-01-01

    Voices can convey information about a speaker. When forming an abstract representation of a speaker, it is important to extract relevant features from acoustic signals that are invariant to the modulation of these signals. This study investigated the way in which individuals with autism spectrum disorder (ASD) recognize and memorize vocal identity. The ASD group and control group performed similarly in a task when asked to choose the name of the newly-learned speaker based on his or her voice, and the ASD group outperformed the control group in a subsequent familiarity test when asked to discriminate the previously trained voices and untrained voices. These findings suggest that individuals with ASD recognized and memorized voices as well as the neurotypical individuals did, but they categorized voices in a different way: individuals with ASD categorized voices quantitatively based on the exact acoustic features, while neurotypical individuals categorized voices qualitatively based on the acoustic patterns correlated to the speakers' physical and mental properties. PMID:26070199

  4. Vocal Identity Recognition in Autism Spectrum Disorder.

    PubMed

    Lin, I-Fan; Yamada, Takashi; Komine, Yoko; Kato, Nobumasa; Kato, Masaharu; Kashino, Makio

    2015-01-01

    Voices can convey information about a speaker. When forming an abstract representation of a speaker, it is important to extract relevant features from acoustic signals that are invariant to the modulation of these signals. This study investigated the way in which individuals with autism spectrum disorder (ASD) recognize and memorize vocal identity. The ASD group and control group performed similarly in a task when asked to choose the name of the newly-learned speaker based on his or her voice, and the ASD group outperformed the control group in a subsequent familiarity test when asked to discriminate the previously trained voices and untrained voices. These findings suggest that individuals with ASD recognized and memorized voices as well as the neurotypical individuals did, but they categorized voices in a different way: individuals with ASD categorized voices quantitatively based on the exact acoustic features, while neurotypical individuals categorized voices qualitatively based on the acoustic patterns correlated to the speakers' physical and mental properties.

  5. Bilingual Computerized Speech Recognition Screening for Depression Symptoms

    ERIC Educational Resources Information Center

    Gonzalez, Gerardo; Carter, Colby; Blanes, Erika

    2007-01-01

    The Voice-Interactive Depression Assessment System (VIDAS) is a computerized speech recognition application for screening depression based on the Center for Epidemiological Studies--Depression scale in English and Spanish. Study 1 included 50 English and 47 Spanish speakers. Study 2 involved 108 English and 109 Spanish speakers. Participants…

  6. The Army word recognition system

    NASA Technical Reports Server (NTRS)

    Hadden, David R.; Haratz, David

    1977-01-01

    The application of speech recognition technology in the Army command and control area is presented. The problems associated with this program are described, as well as its relevance in terms of man/machine interactions, voice inflections, and the amount of training needed to interact with and utilize the automated system.

  7. Hearing the Unheard: An Interdisciplinary, Mixed Methodology Study of Women's Experiences of Hearing Voices (Auditory Verbal Hallucinations).

    PubMed

    McCarthy-Jones, Simon; Castro Romero, Maria; McCarthy-Jones, Roseline; Dillon, Jacqui; Cooper-Rompato, Christine; Kieran, Kathryn; Kaufman, Milissa; Blackman, Lisa

    2015-01-01

    This paper explores the experiences of women who "hear voices" (auditory verbal hallucinations). We begin by examining historical understandings of women hearing voices, showing these have been driven by androcentric theories of how women's bodies functioned, leading to women being viewed as requiring that their voices be interpreted by men. We show the twentieth century was associated with recognition that the mental violation of women's minds (represented by some voice-hearing) was often a consequence of the physical violation of women's bodies. We next report the results of a qualitative study into voice-hearing women's experiences (n = 8). This found similarities between women's relationships with their voices and their relationships with others and the wider social context. Finally, we present results from a quantitative study comparing voice-hearing in women (n = 65) and men (n = 132) in a psychiatric setting. Women were more likely than men to have certain forms of voice-hearing (voices conversing) and to have antecedent events of trauma, physical illness, and relationship problems. Voices identified as female may have more positive affect than male voices. We conclude that women voice-hearers have faced, and continue to face, specific challenges necessitating research and activism, and hope this paper will act as a stimulus to such work.

  8. A memory like a female Fur Seal: long-lasting recognition of pup's voice by mothers.

    PubMed

    Mathevon, Nicolas; Charrier, Isabelle; Aubin, Thierry

    2004-06-01

    In colonial mammals like fur seals, mutual vocal recognition between mothers and their pups is of primary importance for breeding success. Females alternate feeding trips at sea with suckling periods on land, and when coming back from the ocean, they have to find their offspring vocally among numerous similar-looking pups. Young fur seals emit a 'mother-attraction call' that presents individual characteristics. In this paper, we review the perceptual process by which Subantarctic Fur Seal Arctocephalus tropicalis mothers recognize their pup's call. To identify their progeny, females rely on the frequency modulation pattern and spectral features of this call. As the acoustic characteristics of a pup's call change throughout the lactation period as the pup grows, mothers have to refine their memorization of their pup's voice. Field experiments show that female Fur Seals are able to retain all the successive versions of their pup's call.

  9. Assessment of Severe Apnoea through Voice Analysis, Automatic Speech, and Speaker Recognition Techniques

    NASA Astrophysics Data System (ADS)

    Fernández Pozo, Rubén; Blanco Murillo, Jose Luis; Hernández Gómez, Luis; López Gonzalo, Eduardo; Alcázar Ramírez, José; Toledano, Doroteo T.

    2009-12-01

    This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry.
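    The GMM-based detection described above can be illustrated with a minimal class-conditional likelihood model. This is a hedged sketch only: the function names are invented, and a single diagonal-covariance Gaussian per class stands in for the multi-component Gaussian Mixture Models the study fitted to speech spectra.

```python
import numpy as np

def fit_diag_gaussian(X):
    """Fit a diagonal-covariance Gaussian to the rows of X (one row per sample)."""
    mu = X.mean(axis=0)
    var = X.var(axis=0) + 1e-6  # small variance floor for numerical stability
    return mu, var

def log_likelihood(x, mu, var):
    """Log density of feature vector x under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def classify(x, model_healthy, model_apnoea):
    """Assign the class whose model scores the feature vector higher."""
    return ("apnoea"
            if log_likelihood(x, *model_apnoea) > log_likelihood(x, *model_healthy)
            else "healthy")
```

    Comparing per-class log-likelihoods is the same decision rule a GMM classifier applies; a real system would fit several mixture components per class rather than one Gaussian.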

  10. Voice Identification: Levels-of-Processing and the Relationship between Prior Description Accuracy and Recognition Accuracy.

    ERIC Educational Resources Information Center

    Walter, Todd J.

    A study examined whether a person's ability to accurately identify a voice is influenced by factors similar to those proposed by the Supreme Court for eyewitness identification accuracy. In particular, the Supreme Court has suggested that a person's prior description accuracy of a suspect, degree of attention to a suspect, and confidence in…

  11. Some effects of stress on users of a voice recognition system: A preliminary inquiry

    NASA Astrophysics Data System (ADS)

    French, B. A.

    1983-03-01

    Recent work with Automatic Speech Recognition has focused on applications and productivity considerations in the man-machine interface. This thesis is an attempt to see if placing users of such equipment under time-induced stress has an effect on their percent correct recognition rates. Subjects were given a message-handling task of fixed length and allowed progressively shorter times to attempt to complete it. Questionnaire responses indicate stress levels increased with decreased time-allowance; recognition rates decreased as time was reduced.

  12. Voice control of the space shuttle video system

    NASA Technical Reports Server (NTRS)

    Bejczy, A. K.; Dotson, R. S.; Brown, J. W.; Lewis, J. L.

    1981-01-01

    A pilot voice control system developed at the Jet Propulsion Laboratory (JPL) to test and evaluate the feasibility of controlling the shuttle TV cameras and monitors by voice commands utilizes a commercially available discrete-word speech recognizer that can be trained to the individual utterances of each operator. Successful ground tests were conducted using a simulated full-scale space shuttle manipulator. The test configuration involved berthing, maneuvering, and deploying a simulated science payload in the shuttle bay. The handling task typically required 15 to 20 minutes and 60 to 80 commands to 4 TV cameras and 2 TV monitors. The best test runs show 96 to 100 percent voice recognition accuracy.

  13. Stress reaction process-based hierarchical recognition algorithm for continuous intrusion events in optical fiber prewarning system

    NASA Astrophysics Data System (ADS)

    Qu, Hongquan; Yuan, Shijiao; Wang, Yanping; Yang, Dan

    2018-04-01

    To improve the recognition performance of optical fiber prewarning systems (OFPS), this study proposed a hierarchical recognition algorithm (HRA). Traditional methods raise the recognition rate with a single complex algorithm, combining multiple extracted features and complex classifiers, at a considerable cost in recognition speed. HRA instead takes advantage of the continuity of intrusion events, creating a staged recognition flow inspired by the stress reaction, and is expected to achieve high recognition accuracy with less time consumption. First, this work analyzed the continuity of intrusion events and then presented the algorithm based on the stress-reaction mechanism. Finally, it verified the time consumption through theoretical analysis and experiments, and the recognition accuracy was obtained through experiments. Experimental results show that HRA processes data 3.3 times faster than a traditional complicated algorithm while achieving a similar recognition rate of 98%. The study is of great significance to fast intrusion event recognition in OFPS.
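    The staged flow that a hierarchical recognizer uses, where a fast screening stage dismisses most segments and a costlier classifier runs only on flagged ones, can be sketched as follows. The thresholds, feature choices, and function names here are purely illustrative, not the paper's actual stages.

```python
def cheap_gate(segment, threshold=0.5):
    """Stage 1: fast screening -- mean absolute amplitude against a threshold."""
    return sum(abs(x) for x in segment) / len(segment) > threshold

def expensive_classifier(segment):
    """Stage 2 stand-in for a costly multi-feature classifier.
    Here: 'intrusion' if peak amplitude exceeds 1.0 (illustrative only)."""
    return "intrusion" if max(abs(x) for x in segment) > 1.0 else "noise"

def hierarchical_recognize(segments):
    """Run the cheap gate on every segment; invoke the costly stage only
    when the gate fires, so quiet segments are dismissed quickly."""
    results = []
    for seg in segments:
        results.append(expensive_classifier(seg) if cheap_gate(seg) else "quiet")
    return results
```

    The speedup comes from the gate rejecting the bulk of segments before any expensive feature extraction runs, mirroring the staged "stress reaction" idea.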

  14. Recognition of strong earthquake-prone areas with a single learning class

    NASA Astrophysics Data System (ADS)

    Gvishiani, A. D.; Agayan, S. M.; Dzeboev, B. A.; Belov, I. O.

    2017-05-01

    This article presents a new recognition algorithm with learning, Barrier, designed for recognition of earthquake-prone areas. In comparison to the Crust (Kora) algorithm used by the classical EPA approach, the Barrier algorithm learns from just one "pure" high-seismic class. The new algorithm operates in the space of absolute values of the geological-geophysical parameters of the objects. The algorithm is applied to recognition of areas prone to earthquakes with M ≥ 6.0 in the Caucasus region. Comparative analysis of the Crust and Barrier algorithms justifies their productive coherence.

  15. The role of voice input for human-machine communication.

    PubMed Central

    Cohen, P R; Oviatt, S L

    1995-01-01

    Optimism is growing that the near future will witness rapid growth in human-computer interaction using voice. System prototypes have recently been built that demonstrate speaker-independent real-time speech recognition, and understanding of naturally spoken utterances with vocabularies of 1000 to 2000 words, and larger. Already, computer manufacturers are building speech recognition subsystems into their new product lines. However, before this technology can be broadly useful, a substantial knowledge base is needed about human spoken language and performance during computer-based spoken interaction. This paper reviews application areas in which spoken interaction can play a significant role, assesses potential benefits of spoken interaction with machines, and compares voice with other modalities of human-computer interaction. It also discusses information that will be needed to build a firm empirical foundation for the design of future spoken and multimodal interfaces. Finally, it argues for a more systematic and scientific approach to investigating spoken input and performance with future language technology. PMID:7479803

  16. A TDM link with channel coding and digital voice.

    NASA Technical Reports Server (NTRS)

    Jones, M. W.; Tu, K.; Harton, P. L.

    1972-01-01

    The features of a TDM (time-division multiplexed) link model are described. A PCM telemetry sequence was coded for error correction and multiplexed with a digitized voice channel. An all-digital implementation of a variable-slope delta modulation algorithm was used to digitize the voice channel. The results of extensive testing are reported. The measured coding gain and the system performance over a Gaussian channel are compared with theoretical predictions and computer simulations. Word intelligibility scores are reported as a measure of voice channel performance.
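    The variable-slope delta modulation mentioned above adapts its step size to the signal: the step grows during runs of identical output bits (slope overload) and shrinks otherwise (granular noise). A minimal illustrative encoder/decoder pair follows; the adaptation constants and three-bit run rule are generic textbook choices, not the parameters of the paper's implementation.

```python
def _adapt(step, history, step_min, step_max, growth, decay):
    """Grow the step on a run of 3 identical bits, otherwise decay it."""
    if len(history) == 3 and len(set(history)) == 1:
        return min(step * growth, step_max)
    return max(step * decay, step_min)

def cvsd_encode(samples, step_min=0.01, step_max=1.0, growth=1.5, decay=0.9):
    """Emit one bit per sample: 1 if the input is above the running estimate."""
    bits, estimate, step, history = [], 0.0, step_min, []
    for x in samples:
        bit = 1 if x > estimate else 0
        bits.append(bit)
        history = (history + [bit])[-3:]
        step = _adapt(step, history, step_min, step_max, growth, decay)
        estimate += step if bit else -step
    return bits

def cvsd_decode(bits, step_min=0.01, step_max=1.0, growth=1.5, decay=0.9):
    """Reconstruct the waveform by mirroring the encoder's step adaptation."""
    out, estimate, step, history = [], 0.0, step_min, []
    for bit in bits:
        history = (history + [bit])[-3:]
        step = _adapt(step, history, step_min, step_max, growth, decay)
        estimate += step if bit else -step
        out.append(estimate)
    return out
```

    Because encoder and decoder apply the identical adaptation rule to the same bit stream, no step-size side information needs to be transmitted: the one-bit-per-sample stream is the entire channel payload.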

  17. Sound specificity effects in spoken word recognition: The effect of integrality between words and sounds.

    PubMed

    Strori, Dorina; Zaar, Johannes; Cooke, Martin; Mattys, Sven L

    2018-01-01

    Recent evidence has shown that nonlinguistic sounds co-occurring with spoken words may be retained in memory and affect later retrieval of the words. This sound-specificity effect shares many characteristics with the classic voice-specificity effect. In this study, we argue that the sound-specificity effect is conditional upon the context in which the word and sound coexist. Specifically, we argue that, besides co-occurrence, integrality between words and sounds is a crucial factor in the emergence of the effect. In two recognition-memory experiments, we compared the emergence of voice and sound specificity effects. In Experiment 1, we examined two conditions where integrality is high. Namely, the classic voice-specificity effect (Exp. 1a) was compared with a condition in which the intensity envelope of a background sound was modulated along the intensity envelope of the accompanying spoken word (Exp. 1b). Results revealed a robust voice-specificity effect and, critically, a comparable sound-specificity effect: A change in the paired sound from exposure to test led to a decrease in word-recognition performance. In the second experiment, we sought to disentangle the contribution of integrality from a mere co-occurrence context effect by removing the intensity modulation. The absence of integrality led to the disappearance of the sound-specificity effect. Taken together, the results suggest that the assimilation of background sounds into memory cannot be reduced to a simple context effect. Rather, it is conditioned by the extent to which words and sounds are perceived as integral as opposed to distinct auditory objects.

  18. Writing with voice: an investigation of the use of a voice recognition system as a writing aid for a man with aphasia.

    PubMed

    Bruce, Carolyn; Edmundson, Anne; Coleman, Michael

    2003-01-01

    People with aphasia may experience difficulties that prevent them from demonstrating in writing what they know and can produce orally. Voice recognition systems that allow the user to speak into a microphone and see their words appear on a computer screen have the potential to assist written communication. This study investigated whether a man with fluent aphasia could learn to use Dragon NaturallySpeaking to write. A single case study of a man with acquired writing difficulties is reported. A detailed account is provided of the stages involved in teaching him to use the software. The therapy tasks carried out to develop his functional use of the system are then described. Outcomes included the percentage of words accurately recognized by the system over time, the quantitative and qualitative changes in written texts produced with and without the use of the speech-recognition system, and the functional benefits the man described. The treatment programme was successful and resulted in a marked improvement in the subject's written work. It also had effects in the functional life domain as the subject could use writing for communication purposes. The results suggest that the technology might benefit others with acquired writing difficulties.

  19. Effects of Within-Talker Variability on Speech Intelligibility in Mandarin-Speaking Adult and Pediatric Cochlear Implant Patients

    PubMed Central

    Su, Qiaotong; Galvin, John J.; Zhang, Guoping; Li, Yongxin

    2016-01-01

    Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users. PMID:27363714

  20. Detecting paroxysmal coughing from pertussis cases using voice recognition technology.

    PubMed

    Parker, Danny; Picone, Joseph; Harati, Amir; Lu, Shuang; Jenkyns, Marion H; Polgreen, Philip M

    2013-01-01

    Pertussis is highly contagious; thus, prompt identification of cases is essential to control outbreaks. Clinicians experienced with the disease can easily identify classic cases, where patients have bursts of rapid coughing followed by gasps and a characteristic whooping sound. However, many clinicians have never seen a case and thus may miss initial cases during an outbreak. The purpose of this project was to use voice-recognition software to distinguish pertussis coughs from croup and other coughs. We collected a series of recordings representing pertussis, croup, and miscellaneous coughing by children. We manually categorized coughs as either pertussis or non-pertussis, and extracted features for each category. We used Mel-frequency cepstral coefficients (MFCCs), a sampling rate of 16 kHz, a frame duration of 25 ms, and a frame rate of 10 ms. The coughs were filtered. Each cough was divided into 3 sections of proportion 3-4-3. The average of the 13 MFCCs for each section was computed and assembled into a 39-element feature vector used for the classification. We used the following machine learning algorithms: Neural Networks, K-Nearest Neighbor (KNN), and a 200-tree Random Forest (RF). Data were reserved for cross-validation of the KNN and RF. The Neural Network was trained 100 times, and the averaged results are presented. After categorization, we had 16 examples of non-pertussis coughs and 31 examples of pertussis coughs. Over 90% of all pertussis coughs were properly classified as pertussis. The error rates were: Type I errors of 7%, 12%, and 25% and Type II errors of 8%, 0%, and 0%, using the Neural Network, Random Forest, and KNN, respectively. Our results suggest that we can build a robust classifier to assist clinicians and the public in identifying pertussis cases in children presenting with typical symptoms.
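    The 3-4-3 sectioning and averaging step described above can be sketched directly. This assumes the 13 MFCCs per frame have already been computed by some MFCC front end; the function name and the rounding used for the section boundaries are illustrative choices, not the authors' code.

```python
import numpy as np

def cough_feature_vector(mfcc):
    """Collapse a (13, n_frames) MFCC matrix into a 39-element vector:
    split the frames into sections of proportion 3-4-3 along time and
    average the 13 coefficients within each section."""
    n = mfcc.shape[1]
    # Section boundaries at 30% and 70% of the cough's duration (3-4-3 split).
    b1, b2 = round(0.3 * n), round(0.7 * n)
    sections = [mfcc[:, :b1], mfcc[:, b1:b2], mfcc[:, b2:]]
    return np.concatenate([s.mean(axis=1) for s in sections])
```

    The resulting 39-element vector is what would then be fed to a classifier such as KNN or a random forest.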

  1. Multimodal approaches for emotion recognition: a survey

    NASA Astrophysics Data System (ADS)

    Sebe, Nicu; Cohen, Ira; Gevers, Theo; Huang, Thomas S.

    2004-12-01

    Recent technological advances have enabled human users to interact with computers in ways previously unimaginable. Beyond the confines of the keyboard and mouse, new modalities for human-computer interaction such as voice, gesture, and force-feedback are emerging. Despite important advances, one necessary ingredient for natural interaction is still missing-emotions. Emotions play an important role in human-to-human communication and interaction, allowing people to express themselves beyond the verbal domain. The ability to understand human emotions is desirable for the computer in several applications. This paper explores new ways of human-computer interaction that enable the computer to be more aware of the user's emotional and attentional expressions. We present the basic research in the field and the recent advances into the emotion recognition from facial, voice, and physiological signals, where the different modalities are treated independently. We then describe the challenging problem of multimodal emotion recognition and we advocate the use of probabilistic graphical models when fusing the different modalities. We also discuss the difficult issues of obtaining reliable affective data, obtaining ground truth for emotion recognition, and the use of unlabeled data.

  2. Multimodal approaches for emotion recognition: a survey

    NASA Astrophysics Data System (ADS)

    Sebe, Nicu; Cohen, Ira; Gevers, Theo; Huang, Thomas S.

    2005-01-01

    Recent technological advances have enabled human users to interact with computers in ways previously unimaginable. Beyond the confines of the keyboard and mouse, new modalities for human-computer interaction such as voice, gesture, and force-feedback are emerging. Despite important advances, one necessary ingredient for natural interaction is still missing-emotions. Emotions play an important role in human-to-human communication and interaction, allowing people to express themselves beyond the verbal domain. The ability to understand human emotions is desirable for the computer in several applications. This paper explores new ways of human-computer interaction that enable the computer to be more aware of the user's emotional and attentional expressions. We present the basic research in the field and the recent advances into the emotion recognition from facial, voice, and physiological signals, where the different modalities are treated independently. We then describe the challenging problem of multimodal emotion recognition and we advocate the use of probabilistic graphical models when fusing the different modalities. We also discuss the difficult issues of obtaining reliable affective data, obtaining ground truth for emotion recognition, and the use of unlabeled data.

  3. Voice gender discrimination provides a measure of more than pitch-related perception in cochlear implant users.

    PubMed

    Li, Tianhao; Fu, Qian-Jie

    2011-08-01

    (1) To investigate whether voice gender discrimination (VGD) could be a useful indicator of the spectral and temporal processing abilities of individual cochlear implant (CI) users; (2) To examine the relationship between VGD and speech recognition with CI when comparable acoustic cues are used for both perception processes. VGD was measured using two talker sets with different inter-gender fundamental frequencies (F(0)), as well as different acoustic CI simulations. Vowel and consonant recognition in quiet and noise were also measured and compared with VGD performance. Eleven postlingually deaf CI users. The results showed that (1) mean VGD performance differed for different stimulus sets, (2) VGD and speech recognition performance varied among individual CI users, and (3) individual VGD performance was significantly correlated with speech recognition performance under certain conditions. VGD measured with selected stimulus sets might be useful for assessing not only pitch-related perception, but also spectral and temporal processing by individual CI users. In addition to improvements in spectral resolution and modulation detection, the improvement in higher modulation frequency discrimination might be particularly important for CI users in noisy environments.

  4. The Bangor Voice Matching Test: A standardized test for the assessment of voice perception ability.

    PubMed

    Mühl, Constanze; Sheil, Orla; Jarutytė, Lina; Bestelmeyer, Patricia E G

    2017-11-09

    Recognising the identity of conspecifics is an important yet highly variable skill. Approximately 2% of the population suffers from a socially debilitating deficit in face recognition. More recently, evidence of a similar deficit in voice perception (phonagnosia) has emerged. Face perception tests have been readily available for years, advancing our understanding of the underlying mechanisms of face perception. In contrast, voice perception has received less attention, and the construction of standardized voice perception tests has been neglected. Here we report the construction of the first standardized test of voice perception ability. Participants make a same/different identity decision after hearing two voice samples. Item Response Theory guided item selection to ensure the test discriminates between a range of abilities. The test provides a starting point for the systematic exploration of the cognitive and neural mechanisms underlying voice perception. With a high test-retest reliability (r = .86) and short assessment duration (~10 min), this test examines individual abilities reliably and quickly and therefore also has potential for use in developmental and neuropsychological populations.

  5. Recognizing famous voices: influence of stimulus duration and different types of retrieval cues.

    PubMed

    Schweinberger, S R; Herholz, A; Sommer, W

    1997-04-01

    The current investigation measured the effects of increasing stimulus duration on listeners' ability to recognize famous voices. In addition, the investigation studied the influence of different types of cues on the naming of voices that could not be named before. Participants were presented with samples of famous and unfamiliar voices and were asked to decide whether or not the samples were spoken by a famous person. The duration of each sample increased in seven steps from 0.25 s up to a maximum of 2 s. Voice recognition improved with stimulus duration in accordance with a growth function: gains were most rapid within the first second and less pronounced thereafter. When participants were unable to name a famous voice, they were cued with either a second voice sample, the occupation, or the initials of the celebrity. Initials were most effective in eliciting the name only when semantic information about the speaker had been accessed prior to cue presentation. Paralleling previous research on face naming, this may indicate that voice naming is contingent on previous activation of person-specific semantic information.

  6. Exploring expressivity and emotion with artificial voice and speech technologies.

    PubMed

    Pauletto, Sandra; Balentine, Bruce; Pidcock, Chris; Jones, Kevin; Bottaci, Leonardo; Aretoulaki, Maria; Wells, Jez; Mundy, Darren P; Balentine, James

    2013-10-01

    Emotion in audio-voice signals, as synthesized by text-to-speech (TTS) technologies, was investigated to formulate a theory of expression for user interface design. Emotional parameters were specified with markup tags, and the resulting audio was further modulated with post-processing techniques. Software was then developed to link a selected TTS synthesizer with an automatic speech recognition (ASR) engine, producing a chatbot that could speak and listen. Using these two artificial voice subsystems, investigators explored both artistic and psychological implications of artificial speech emotion. Goals of the investigation were interdisciplinary, with interest in musical composition, augmentative and alternative communication (AAC), commercial voice announcement applications, human-computer interaction (HCI), and artificial intelligence (AI). The work-in-progress points towards an emerging interdisciplinary ontology for artificial voices. As one study output, HCI tools are proposed for future collaboration.

  7. A preliminary comparison of speech recognition functionality in dental practice management systems.

    PubMed

    Irwin, Jeannie Y; Schleyer, Titus

    2008-11-06

    In this study, we examined speech recognition functionality in four leading dental practice management systems. Twenty dental students used voice to chart a simulated patient with 18 findings in each system. Results show it can take over a minute to chart one finding and that users frequently have to repeat commands. Limited functionality, poor usability and a high error rate appear to retard adoption of speech recognition in dentistry.

  8. A Survey on Sentiment Classification in Face Recognition

    NASA Astrophysics Data System (ADS)

    Qian, Jingyu

    2018-01-01

    Face recognition has been an important topic for both industry and academia for a long time. K-means clustering, autoencoders, and convolutional neural networks, each representing a design idea for face recognition methods, are three popular algorithms for face recognition problems. It is worthwhile to summarize and compare these three different algorithms. This paper focuses on one specific face recognition problem: sentiment classification from images. Three different algorithms for sentiment classification are summarized: k-means clustering, autoencoders, and convolutional neural networks. An experiment applying these algorithms to a specific dataset of human faces is conducted to illustrate how they are applied and their accuracy. Finally, the three algorithms are compared based on the accuracy results.

  9. Double Fourier analysis for Emotion Identification in Voiced Speech

    NASA Astrophysics Data System (ADS)

    Sierra-Sosa, D.; Bastidas, M.; Ortiz P., D.; Quintero, O. L.

    2016-04-01

    We propose a novel analysis alternative, based on two Fourier Transforms, for emotion recognition from speech. Fourier analysis allows different signals to be displayed and synthesized in terms of power spectral density distributions. A spectrogram of the voice signal is obtained by performing a short-time Fourier Transform with Gaussian windows; this spectrogram portrays frequency-related features, such as vocal tract resonances and quasi-periodic excitations during voiced sounds. Emotions induce such characteristics in speech, which become apparent in the spectrogram's time-frequency distribution. The signal's time-frequency representation from the spectrogram is then treated as an image and processed through a 2-dimensional Fourier Transform to perform spatial Fourier analysis on it. Finally, features related to emotions in voiced speech are extracted and presented.
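    The two-stage transform the abstract describes, a Gaussian-windowed short-time Fourier transform followed by a 2-D Fourier transform of the resulting spectrogram image, can be sketched as below. The frame length, hop, and window width are illustrative placeholders, not the authors' settings.

```python
import numpy as np

def gaussian_stft(signal, frame_len=64, hop=16, sigma=10.0):
    """Short-time Fourier transform with a Gaussian analysis window:
    one magnitude spectrum per hop, stacked into a spectrogram."""
    t = np.arange(frame_len) - (frame_len - 1) / 2
    window = np.exp(-0.5 * (t / sigma) ** 2)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames).T  # rows: frequency bins, columns: time frames

def double_fourier(signal):
    """Treat the spectrogram as an image and take its 2-D Fourier
    transform, yielding the spatial-frequency representation from
    which emotion-related features would be extracted."""
    spec = gaussian_stft(signal)
    return np.fft.fft2(spec)
```

    Note that np.fft.rfft keeps only the non-negative frequencies, so a 64-sample frame yields 64 // 2 + 1 = 33 bins per column of the spectrogram.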

  10. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced speech, as well as for combined voiced and unvoiced speech. The methods also describe how to deconvolve the speech excitation function from the acoustic speech output to obtain the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  11. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced speech, as well as for combined voiced and unvoiced speech. The methods also describe how to deconvolve the speech excitation function from the acoustic speech output to obtain the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.

  12. Influence of Smartphones and Software on Acoustic Voice Measures

    PubMed Central

    GRILLO, ELIZABETH U.; BROSIOUS, JENNA N.; SORRELL, STACI L.; ANAND, SUPRAJA

    2016-01-01

    This study assessed the within-subject variability of voice measures captured using different recording devices (i.e., smartphones and a head-mounted microphone) and software programs (i.e., Analysis of Dysphonia in Speech and Voice (ADSV), Multi-Dimensional Voice Program (MDVP), and Praat). Correlations between the software programs that calculated the voice measures were also analyzed. Results demonstrated no significant within-subject variability across devices and software, and that some of the measures were highly correlated across software programs. The study suggests that certain smartphones may be appropriate for recording daily voice measures representing the effects of vocal loading within individuals. In addition, even though different algorithms are used to compute voice measures across software programs, some of the programs and measures share a similar relationship. PMID:28775797

  13. Handwritten digits recognition based on immune network

    NASA Astrophysics Data System (ADS)

    Li, Yangyang; Wu, Yunhui; Jiao, Lc; Wu, Jianshe

    2011-11-01

    Handwritten digit recognition has been widely applied in industry and daily life, yet it remains a difficult problem in the field of pattern recognition. In this paper, a new method for handwritten digit recognition is presented. The digit samples are first preprocessed and their features extracted. Based on these features, a novel immune network classification algorithm is designed and applied to handwritten digit recognition. The proposed algorithm combines Jerne's immune network model for feature selection with a KNN method for classification; its distinguishing characteristic is a novel network with parallel computation and learning. The performance of the proposed method is evaluated on the MNIST handwritten digit dataset and compared with other recognition algorithms: KNN, ANN, and SVM. The results show that the immune-network-based classification algorithm gives promising performance and stable behavior for handwritten digit recognition.
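
The classification stage described above uses KNN over the selected features. A minimal majority-vote KNN sketch (the toy feature vectors and digit labels below are hypothetical, not MNIST data) might look like:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training vectors.
    `train` is a list of (feature_vector, label) pairs."""
    neighbours = sorted(train, key=lambda fv: math.dist(fv[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

In the paper's pipeline the immune network would first select the feature subset; the vote above then runs in that reduced feature space.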

  14. Hearing the Unheard: An Interdisciplinary, Mixed Methodology Study of Women’s Experiences of Hearing Voices (Auditory Verbal Hallucinations)

    PubMed Central

    McCarthy-Jones, Simon; Castro Romero, Maria; McCarthy-Jones, Roseline; Dillon, Jacqui; Cooper-Rompato, Christine; Kieran, Kathryn; Kaufman, Milissa; Blackman, Lisa

    2015-01-01

    This paper explores the experiences of women who “hear voices” (auditory verbal hallucinations). We begin by examining historical understandings of women hearing voices, showing these have been driven by androcentric theories of how women’s bodies functioned leading to women being viewed as requiring their voices be interpreted by men. We show the twentieth century was associated with recognition that the mental violation of women’s minds (represented by some voice-hearing) was often a consequence of the physical violation of women’s bodies. We next report the results of a qualitative study into voice-hearing women’s experiences (n = 8). This found similarities between women’s relationships with their voices and their relationships with others and the wider social context. Finally, we present results from a quantitative study comparing voice-hearing in women (n = 65) and men (n = 132) in a psychiatric setting. Women were more likely than men to have certain forms of voice-hearing (voices conversing) and to have antecedent events of trauma, physical illness, and relationship problems. Voices identified as female may have more positive affect than male voices. We conclude that women voice-hearers have and continue to face specific challenges necessitating research and activism, and hope this paper will act as a stimulus to such work. PMID:26779041

  15. Within-Category VOT Affects Recovery from "Lexical" Garden-Paths: Evidence against Phoneme-Level Inhibition

    ERIC Educational Resources Information Center

    McMurray, Bob; Tanenhaus, Michael K.; Aslin, Richard N.

    2009-01-01

    Spoken word recognition shows gradient sensitivity to within-category voice onset time (VOT), as predicted by several current models of spoken word recognition, including TRACE (McClelland, J., & Elman, J. (1986). The TRACE model of speech perception. "Cognitive Psychology," 18, 1-86). It remains unclear, however, whether this sensitivity is…

  16. Some Thoughts on the Meaning of and Values that Influence Degree Recognition in Canada

    ERIC Educational Resources Information Center

    Skolnik, Michael L.

    2006-01-01

    What has been called "degree recognition" has become the subject of considerable attention in Canadian higher education within the past decade. While concerns similar to those that are being voiced today have arisen occasionally in the past, the scale of this phenomenon today is unprecedented historically. In response to the increased…

  17. An Improved Iris Recognition Algorithm Based on Hybrid Feature and ELM

    NASA Astrophysics Data System (ADS)

    Wang, Juan

    2018-03-01

    Iris images are easily degraded by noise and uneven lighting. This paper proposes an improved iris recognition algorithm based on an extreme learning machine (ELM) with hybrid features. 2D Gabor filters and the gray-level co-occurrence matrix (GLCM) are employed to generate a multi-granularity hybrid feature vector; they capture low-to-intermediate-frequency and high-frequency texture information, respectively. An extreme learning machine then performs the recognition. Experimental results show that the proposed ELM-based multi-granularity iris recognition algorithm (ELM-MGIR) achieves an accuracy of 99.86% and an equal error rate (EER) of 0.12% while maintaining real-time performance, outperforming other mainstream iris recognition algorithms.
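
Of the two feature extractors named above, the GLCM is straightforward to sketch in pure Python. The pixel offset, number of gray levels, and the contrast feature shown are common illustrative choices, not necessarily the paper's exact configuration:

```python
from collections import defaultdict

def glcm(image, dx=1, dy=0, levels=4):
    """Normalized gray-level co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = defaultdict(int)
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                counts[(image[y][x], image[ny][nx])] += 1
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]

def glcm_contrast(p):
    """Contrast texture feature: co-occurrence mass weighted by squared level difference."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))
```

Features such as contrast, energy, or homogeneity computed from this matrix would form the high-frequency part of the hybrid feature vector.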

  18. Comparison of voice-automated transcription and human transcription in generating pathology reports.

    PubMed

    Al-Aynati, Maamoun M; Chorneyko, Katherine A

    2003-06-01

    Software that can convert spoken words into written text has been available since the early 1980s. Early continuous speech systems were developed in 1994, with the latest commercially available editions having a claimed speech recognition accuracy of up to 98% at natural speech rates. To evaluate the efficacy of one commercially available voice-recognition software system with pathology vocabulary in generating pathology reports and to compare this with human transcription. To draw cost analysis conclusions regarding human versus computer-based transcription. Two hundred six routine pathology reports from the surgical pathology material handled at St Joseph's Healthcare, Hamilton, Ontario, were generated simultaneously using computer-based transcription and human transcription. The following hardware and software were used: a desktop 450-MHz Intel Pentium III processor with 192 MB of RAM, a speech-quality sound card (Sound Blaster), a noise-canceling headset microphone, and IBM ViaVoice Pro version 8 with pathology vocabulary support (Voice Automated, Huntington Beach, Calif). The cost of the hardware and software used was approximately Can$2250. A total of 23,458 words were transcribed using both methods, with a mean of 114 words per report. The mean accuracy rate was 93.6% (range, 87.4%-96%) using the computer software, compared with a mean accuracy of 99.6% (range, 99.4%-99.8%) for human transcription (P < .001). The time needed to edit documents by the primary evaluator (M.A.) using the computer was on average twice that needed for editing the documents produced by human transcriptionists (range, 1.4-3.5 times). The extra time needed to edit documents was 67 minutes per week (13 minutes per day). Computer-based continuous speech-recognition systems can be successfully used in pathology practice, even during the handling of gross pathology specimens.
The relatively low accuracy rate of this voice-recognition software with resultant increased editing burden on pathologists may not encourage its application on a wide scale in pathology departments with sufficient human transcription services, despite significant potential financial savings. However, computer-based transcription represents an attractive and relatively inexpensive alternative to human transcription in departments where there is a shortage of transcription services, and will no doubt become more commonly used in pathology departments in the future.
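
The accuracy figures above are word-level rates. One simple way such a rate might be computed from a reference transcript and a machine transcript (an assumption for illustration; the paper does not specify its exact scoring procedure) is via sequence alignment:

```python
import difflib

def word_accuracy(reference, hypothesis):
    """Fraction of reference words matched in an optimal alignment of the two texts."""
    ref, hyp = reference.split(), hypothesis.split()
    matcher = difflib.SequenceMatcher(None, ref, hyp)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)
```

A substitution such as "chronic" for "acute" in a five-word report line would score 4/5 = 80% by this definition.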

  19. Human voice quality measurement in noisy environments.

    PubMed

    Ueng, Shyh-Kuang; Luo, Cheng-Ming; Tsai, Tsung-Yu; Yeh, Hsuan-Chen

    2015-01-01

    Computerized acoustic voice measurement is essential for the diagnosis of vocal pathologies. Previous studies showed that ambient noises have a significant influence on the accuracy of voice quality assessment. This paper presents a voice quality assessment system that can accurately measure the quality of voice signals even when the input voice data are contaminated by low-frequency noise. Ambient noises in living rooms and laboratories were collected and their frequencies analyzed. Based on this analysis, a filter was designed to reduce the noise level of the input voice signal. Improved numerical algorithms are then employed to extract voice parameters that reveal the health of the voice. Compared with MDVP and Praat, the proposed method outperforms these two widely used programs in measuring fundamental frequency and harmonics-to-noise ratio, and its performance is comparable to theirs in computing jitter and shimmer. The proposed voice quality assessment method is resistant to low-frequency noise and can measure human voice quality in environments filled with noise from air conditioners, ceiling fans, and the cooling fans of computers.
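
The paper's filter is designed from measured noise spectra. As a simplified stand-in, a first-order high-pass filter illustrates the principle of attenuating low-frequency ambient noise before parameter extraction (the cutoff frequency and sampling rate below are illustrative assumptions):

```python
import math

def high_pass(samples, fs, cutoff):
    """First-order (single-pole) high-pass filter:
    y[n] = a * (y[n-1] + x[n] - x[n-1]), with a set by the RC cutoff."""
    rc = 1.0 / (2 * math.pi * cutoff)
    dt = 1.0 / fs
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[n] - samples[n - 1]))
    return out
```

A constant (DC or very-low-frequency) component decays toward zero, while components well above the cutoff pass almost unchanged.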

  20. Intentional Voice Command Detection for Trigger-Free Speech Interface

    NASA Astrophysics Data System (ADS)

    Obuchi, Yasunari; Sumiyoshi, Takashi

    In this paper we introduce a new framework of audio processing, which is essential to achieve a trigger-free speech interface for home appliances. If the speech interface works continually in real environments, it must extract occasional voice commands and reject everything else. It is extremely important to reduce the number of false alarms because the number of irrelevant inputs is much larger than the number of voice commands even for heavy users of appliances. The framework, called Intentional Voice Command Detection, is based on voice activity detection, but enhanced by various speech/audio processing techniques such as emotion recognition. The effectiveness of the proposed framework is evaluated using a newly-collected large-scale corpus. The advantages of combining various features were tested and confirmed, and the simple LDA-based classifier demonstrated acceptable performance. The effectiveness of various methods of user adaptation is also discussed.
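
The "simple LDA-based classifier" mentioned above can be sketched for the two-class case (voice command vs. everything else) with 2-D features; the feature values in the example below are hypothetical:

```python
def lda_train(pos, neg):
    """Fisher LDA for 2-D features: w = S_w^{-1} (mu_pos - mu_neg),
    with the decision threshold at the midpoint of the projected class means."""
    def mean(xs):
        return [sum(v[i] for v in xs) / len(xs) for i in range(2)]

    def scatter(xs, mu):
        s = [[0.0, 0.0], [0.0, 0.0]]
        for v in xs:
            d = [v[0] - mu[0], v[1] - mu[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s

    mp, mn = mean(pos), mean(neg)
    sp, sn = scatter(pos, mp), scatter(neg, mn)
    sw = [[sp[i][j] + sn[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]     # explicit 2x2 inverse
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mp[0] - mn[0], mp[1] - mn[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    thresh = 0.5 * (w[0] * (mp[0] + mn[0]) + w[1] * (mp[1] + mn[1]))
    return w, thresh

def lda_classify(w, thresh, x):
    """True if x projects onto the 'command' side of the threshold."""
    return w[0] * x[0] + w[1] * x[1] > thresh
```

In the paper's setting the two feature dimensions would be speech/audio features (e.g., prosodic or spectral scores) rather than the raw coordinates used here.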

  1. Automatic assessment of voice quality according to the GRBAS scale.

    PubMed

    Sáenz-Lechón, Nicolás; Godino-Llorente, Juan I; Osma-Ruiz, Víctor; Blanco-Velasco, Manuel; Cruz-Roldán, Fernando

    2006-01-01

    Nowadays, the most widespread techniques for measuring voice quality are based on perceptual evaluation by well-trained professionals. The GRBAS scale is a widely used method for perceptual evaluation of voice quality in Japan, and interest in it is growing in both Europe and the United States. However, this technique requires well-trained experts, is based on the evaluator's expertise, and depends heavily on the evaluator's own psycho-physical state; great variability is also observed between the assessments of different evaluators. An objective method providing such a measurement of voice quality would therefore be very valuable. In this paper, the automatic assessment of voice quality is addressed by means of short-term Mel-frequency cepstral coefficients (MFCC) and learning vector quantization (LVQ) in a pattern recognition stage. Results show that this approach provides acceptable results for this purpose, with accuracy of around 65% at best.
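
The LVQ stage adjusts labeled prototype vectors toward or away from training samples. A single LVQ1 update step might be sketched as follows (the prototype labels and feature values are hypothetical, not from the paper):

```python
import math

def lvq1_step(prototypes, x, label, lr=0.1):
    """One LVQ1 update: pull the winning prototype toward x if its label
    matches, push it away otherwise. `prototypes` is a list of
    (vector, label) pairs, modified in place."""
    i = min(range(len(prototypes)),
            key=lambda j: math.dist(prototypes[j][0], x))
    proto, proto_label = prototypes[i]
    sign = 1.0 if proto_label == label else -1.0
    updated = tuple(p + sign * lr * (xk - p) for p, xk in zip(proto, x))
    prototypes[i] = (updated, proto_label)
    return prototypes
```

Repeating this step over labeled MFCC frames would gradually move the codebook toward class-representative positions.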

  2. Human activity recognition based on feature selection in smart home using back-propagation algorithm.

    PubMed

    Fang, Hongqing; He, Lei; Si, Hao; Liu, Peng; Xie, Xiaolei

    2014-09-01

    In this paper, the back-propagation (BP) algorithm is used to train a feed-forward neural network for human activity recognition in smart home environments, and an inter-class distance method for feature selection over observed motion-sensor events is discussed and tested. The activity recognition performance of the BP-trained neural network is then evaluated and compared with that of two probabilistic algorithms: the Naïve Bayes (NB) classifier and the Hidden Markov Model (HMM). The results show that different feature datasets yield different activity recognition accuracy, and that selecting unsuitable feature datasets increases computational complexity and degrades recognition accuracy. Furthermore, the BP-trained neural network achieves better human activity recognition performance than the NB classifier and the HMM. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
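
A minimal back-propagation training loop for a small feed-forward network can be sketched in pure Python. This toy example trains a 2-2-1 sigmoid network on XOR rather than on sensor-event features, purely to show the mechanics of the BP weight updates; the architecture and learning rate are illustrative, not the paper's:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(epochs=3000, lr=0.5):
    """2-2-1 sigmoid network trained with plain back-propagation on XOR."""
    # Fixed asymmetric initial weights keep the run reproducible
    w1 = [[0.5, -0.4], [-0.3, 0.6]]   # hidden-layer weights
    b1 = [0.1, -0.1]
    w2 = [0.4, -0.5]                  # output-layer weights
    b2 = 0.2
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    def forward(x):
        h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(2)]
        return h, sigmoid(w2[0] * h[0] + w2[1] * h[1] + b2)

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    start = mse()
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x)
            dy = (y - t) * y * (1 - y)                       # output delta
            dh = [dy * w2[j] * h[j] * (1 - h[j]) for j in range(2)]
            for j in range(2):                               # gradient descent step
                w2[j] -= lr * dy * h[j]
                b1[j] -= lr * dh[j]
                for i in range(2):
                    w1[j][i] -= lr * dh[j] * x[i]
            b2 -= lr * dy
    return start, mse(), lambda x: forward(x)[1]
```

The same delta-rule updates scale directly to the larger input layer a motion-sensor feature vector would require.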

  3. Comparison of Voice Handicap Index Scores Between Female Students of Speech Therapy and Other Health Professions.

    PubMed

    Tafiadis, Dionysios; Chronopoulos, Spyridon K; Siafaka, Vassiliki; Drosos, Konstantinos; Kosma, Evangelia I; Toki, Eugenia I; Ziavra, Nausica

    2017-09-01

    Certain student groups (e.g., future teachers and speech-language pathologists) are presumably at risk of developing voice disorders due to vocal misuse, which can affect their way of living. Multidisciplinary voice assessment of student populations is now widespread, along with the use of self-reported questionnaires. This study compared the Voice Handicap Index domains and item scores between female students of speech and language therapy and female students of other health professions in Greece. We also examined the probability of speech language therapy students developing any vocal symptom. Two hundred female non-dysphonic students (aged 18-31) were recruited. Participants answered the Voice Evaluation Form and the Greek adaptation of the Voice Handicap Index. Significant differences were observed between the two groups (students of speech therapy and of other health professions) in the Voice Handicap Index total score and its functional and physical domains, but not in the emotional domain. Significant differences between the subgroups were also observed for specific Voice Handicap Index items. In conclusion, speech language therapy students had higher Voice Handicap Index scores, which could serve as an early indicator for preventing profession-related dysphonia at a later stage. The Voice Handicap Index could also serve as a first-glance assessment tool for recognizing potential voice disorder development in students. In turn, the results could be used for indirect therapy approaches, such as providing methods for maintaining vocal health in different student populations. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  4. Vocal recognition of owners by domestic cats (Felis catus).

    PubMed

    Saito, Atsuko; Shinozuka, Kazutaka

    2013-07-01

    Domestic cats have had a 10,000-year history of cohabitation with humans and seem to have the ability to communicate with humans. However, this has not been widely examined. We studied 20 domestic cats to investigate whether they could recognize their owners by using voices that called out the subjects' names, with a habituation-dishabituation method. While the owner was out of the cat's sight, we played three different strangers' voices serially, followed by the owner's voice. We recorded the cat's reactions to the voices and categorized them into six behavioral categories. In addition, ten naive raters rated the cats' response magnitudes. The cats responded to human voices not by communicative behavior (vocalization and tail movement), but by orienting behavior (ear movement and head movement). This tendency did not change even when they were called by their owners. Of the 20 cats, 15 demonstrated a lower response magnitude to the third voice than to the first voice. These habituated cats showed a significant rebound in response to the subsequent presentation of their owners' voices. This result indicates that cats are able to use vocal cues alone to distinguish between humans.

  5. The Pandora multi-algorithm approach to automated pattern recognition in LAr TPC detectors

    NASA Astrophysics Data System (ADS)

    Marshall, J. S.; Blake, A. S. T.; Thomson, M. A.; Escudero, L.; de Vries, J.; Weston, J.; MicroBooNE Collaboration

    2017-09-01

    The development and operation of Liquid Argon Time Projection Chambers (LAr TPCs) for neutrino physics has created a need for new approaches to pattern recognition, in order to fully exploit the superb imaging capabilities offered by this technology. The Pandora Software Development Kit provides functionality to aid the process of designing, implementing and running pattern recognition algorithms. It promotes the use of a multi-algorithm approach to pattern recognition: individual algorithms each address a specific task in a particular topology; a series of many tens of algorithms then carefully builds up a picture of the event. The input to the Pandora pattern recognition is a list of 2D Hits. The output from the chain of over 70 algorithms is a hierarchy of reconstructed 3D Particles, each with an identified particle type, vertex and direction.

  6. Kernel-Based Sensor Fusion With Application to Audio-Visual Voice Activity Detection

    NASA Astrophysics Data System (ADS)

    Dov, David; Talmon, Ronen; Cohen, Israel

    2016-12-01

    In this paper, we address the problem of multiple view data fusion in the presence of noise and interferences. Recent studies have approached this problem using kernel methods, by relying particularly on a product of kernels constructed separately for each view. From a graph theory point of view, we analyze this fusion approach in a discrete setting. More specifically, based on a statistical model for the connectivity between data points, we propose an algorithm for the selection of the kernel bandwidth, a parameter, which, as we show, has important implications on the robustness of this fusion approach to interferences. Then, we consider the fusion of audio-visual speech signals measured by a single microphone and by a video camera pointed to the face of the speaker. Specifically, we address the task of voice activity detection, i.e., the detection of speech and non-speech segments, in the presence of structured interferences such as keyboard taps and office noise. We propose an algorithm for voice activity detection based on the audio-visual signal. Simulation results show that the proposed algorithm outperforms competing fusion and voice activity detection approaches. In addition, we demonstrate that a proper selection of the kernel bandwidth indeed leads to improved performance.
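
The kernel-product fusion the paper builds on can be sketched directly: construct one Gaussian affinity kernel per view, then take their entrywise product, so a pair of samples is considered similar only if it is similar in both views. The feature values and bandwidths below are illustrative assumptions:

```python
import math

def gaussian_kernel(points, bandwidth):
    """Affinity matrix K[i][j] = exp(-||x_i - x_j||^2 / bandwidth)."""
    n = len(points)
    return [[math.exp(-math.dist(points[i], points[j]) ** 2 / bandwidth)
             for j in range(n)] for i in range(n)]

def fused_kernel(audio_feats, video_feats, bw_audio, bw_video):
    """Entrywise product of the per-view kernels (kernel-product fusion)."""
    ka = gaussian_kernel(audio_feats, bw_audio)
    kv = gaussian_kernel(video_feats, bw_video)
    n = len(ka)
    return [[ka[i][j] * kv[i][j] for j in range(n)] for i in range(n)]
```

The bandwidth parameters control how quickly affinity decays with distance in each view, which is exactly the knob whose selection the paper analyzes for robustness to interferences.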

  7. Optimized delivery of radiological reports: applying Six Sigma methodology to a radiology department.

    PubMed

    Cavagna, Enrico; Berletti, Riccardo; Schiavon, Francesco; Scarsi, Barbara; Barbato, Giuseppe

    2003-03-01

    To optimise the process of reporting and delivering radiological examinations with a view to achieving 100% service delivery within 72 hours to outpatients and 36 hours to inpatients. To this end, we used the Six Sigma method which adopts a systematic approach and rigorous statistical analysis to analyse and improve processes, by reducing variability and minimising errors. More specifically, our study focused on the process of radiological report creation, from the end of the examination to the time when the report is made available to the patient, to examine the bottlenecks and identify the measures to be taken to improve the process. Six Sigma uses a five-step problem-solving process called DMAIC, an acronym for Define, Measure, Analyze, Improve and Control. The first step is to define the problem and the elements crucial to quality, in terms of Total Quality Control. Next, the situation is analysed to identify the root causes of the problem and determine which of these is most influential. The situation is then improved by implementing change. Finally, to make sure that the change is long-lasting, measures are taken to sustain the improvements and obtain long-term control. In our case we analysed all of the phases the report passes through before reaching the user, and studied the impact of voice-recognition reporting on the speed of the report creation process. Analysis of the information collected showed that the tools available for report creation (dictaphone, voice-recognition system) and the transport of films and reports were the two critical elements on which to focus our efforts. Of all the phases making up the process, reporting (from end of examination to end of reporting) and distribution (from the report available to administrative staff to report available to the patient) account for 90% of process variability (73% and 17%, respectively). 
We further found that the reports dictated into a voice-recognition reporting system are delivered in 45 hours (median), whereas those dictated using a dictaphone take 96 hours: voice-recognition reporting systems therefore improve performance by 50 hours. Unfortunately, 38% of our reports are delivered within longer timeframes than the 72h for outpatients and 36h for inpatients agreed with the service users. Reports for inpatients have much faster delivery times and lower variability, as 95% of these examinations are reported using voice-recognition reporting (as a result of the greater sensitivity of physicians to the problem of inpatient waiting times). For conventional radiology examinations, numerically greater than CT or MRI, there is a stronger tendency to use the dictaphone which allows for faster dictation as it is unburdened by administrative tasks such as entering examination codes, correcting errors, etc. Freelance status has no impact on report delivery times, service delivery being the same as in the institutional setting. The subprocess of reporting is strongly affected by the choice of reporting method (voice-recognition system or dictaphone), whereas report delivery is affected by the individual's behaviour patterns and ultimately by habits generated by the lack of a clearly charted process (lack of synchronisation among the various phases), and therefore potentially avoidable. The analytical study of the various phases of examination reporting, from writing to delivery, allowed us to identify the process bottlenecks and take corrective measures. Regardless of imaging modality and individual physician, examination reporting consistently takes longer when a dictaphone is used instead of a voice-recognition reporting system, as this makes the process more complex. To improve the two critical subprocesses whilst maintaining constant resources, a first step is to abandon the dictaphone in favour of the voice-recognition system. 
In addition, we are experimenting other measures to improve the collection and sorting of examinations and the delivery of reports: the technical staff take the films from the examination rooms to the reporting rooms three times a day; the radiologists collect their examinations and prepare the reports, possibly on the same day; the radiologists leave their signed reports on the table in the central reporting room; the administrative staff collect the signed reports three times a day in the morning and afternoon to be able to deliver them on the same day. This project has allowed us to become familiar with the principles of total quality, to better understand our internal processes and to take effective measures to optimise them. This has resulted in enhanced satisfaction of all the department staff and has laid the grounds for further measures in the future.

  8. The Pandora multi-algorithm approach to automated pattern recognition of cosmic-ray muon and neutrino events in the MicroBooNE detector

    NASA Astrophysics Data System (ADS)

    Acciarri, R.; Adams, C.; An, R.; Anthony, J.; Asaadi, J.; Auger, M.; Bagby, L.; Balasubramanian, S.; Baller, B.; Barnes, C.; Barr, G.; Bass, M.; Bay, F.; Bishai, M.; Blake, A.; Bolton, T.; Camilleri, L.; Caratelli, D.; Carls, B.; Castillo Fernandez, R.; Cavanna, F.; Chen, H.; Church, E.; Cianci, D.; Cohen, E.; Collin, G. H.; Conrad, J. M.; Convery, M.; Crespo-Anadón, J. I.; Del Tutto, M.; Devitt, D.; Dytman, S.; Eberly, B.; Ereditato, A.; Escudero Sanchez, L.; Esquivel, J.; Fadeeva, A. A.; Fleming, B. T.; Foreman, W.; Furmanski, A. P.; Garcia-Gamez, D.; Garvey, G. T.; Genty, V.; Goeldi, D.; Gollapinni, S.; Graf, N.; Gramellini, E.; Greenlee, H.; Grosso, R.; Guenette, R.; Hackenburg, A.; Hamilton, P.; Hen, O.; Hewes, J.; Hill, C.; Ho, J.; Horton-Smith, G.; Hourlier, A.; Huang, E.-C.; James, C.; Jan de Vries, J.; Jen, C.-M.; Jiang, L.; Johnson, R. A.; Joshi, J.; Jostlein, H.; Kaleko, D.; Karagiorgi, G.; Ketchum, W.; Kirby, B.; Kirby, M.; Kobilarcik, T.; Kreslo, I.; Laube, A.; Li, Y.; Lister, A.; Littlejohn, B. R.; Lockwitz, S.; Lorca, D.; Louis, W. C.; Luethi, M.; Lundberg, B.; Luo, X.; Marchionni, A.; Mariani, C.; Marshall, J.; Martinez Caicedo, D. A.; Meddage, V.; Miceli, T.; Mills, G. B.; Moon, J.; Mooney, M.; Moore, C. D.; Mousseau, J.; Murrells, R.; Naples, D.; Nienaber, P.; Nowak, J.; Palamara, O.; Paolone, V.; Papavassiliou, V.; Pate, S. F.; Pavlovic, Z.; Piasetzky, E.; Porzio, D.; Pulliam, G.; Qian, X.; Raaf, J. L.; Rafique, A.; Rochester, L.; Rudolf von Rohr, C.; Russell, B.; Schmitz, D. W.; Schukraft, A.; Seligman, W.; Shaevitz, M. H.; Sinclair, J.; Smith, A.; Snider, E. L.; Soderberg, M.; Söldner-Rembold, S.; Soleti, S. R.; Spentzouris, P.; Spitz, J.; St. John, J.; Strauss, T.; Szelc, A. M.; Tagg, N.; Terao, K.; Thomson, M.; Toups, M.; Tsai, Y.-T.; Tufanli, S.; Usher, T.; Van De Pontseele, W.; Van de Water, R. G.; Viren, B.; Weber, M.; Wickremasinghe, D. A.; Wolbers, S.; Wongjirad, T.; Woodruff, K.; Yang, T.; Yates, L.; Zeller, G. 
P.; Zennamo, J.; Zhang, C.

    2018-01-01

    The development and operation of liquid-argon time-projection chambers for neutrino physics has created a need for new approaches to pattern recognition in order to fully exploit the imaging capabilities offered by this technology. Whereas the human brain can excel at identifying features in the recorded events, it is a significant challenge to develop an automated, algorithmic solution. The Pandora Software Development Kit provides functionality to aid the design and implementation of pattern-recognition algorithms. It promotes the use of a multi-algorithm approach to pattern recognition, in which individual algorithms each address a specific task in a particular topology. Many tens of algorithms then carefully build up a picture of the event and, together, provide a robust automated pattern-recognition solution. This paper describes details of the chain of over one hundred Pandora algorithms and tools used to reconstruct cosmic-ray muon and neutrino events in the MicroBooNE detector. Metrics that assess the current pattern-recognition performance are presented for simulated MicroBooNE events, using a selection of final-state event topologies.

  9. Reference-free automatic quality assessment of tracheoesophageal speech.

    PubMed

    Huang, Andy; Falk, Tiago H; Chan, Wai-Yip; Parsa, Vijay; Doyle, Philip

    2009-01-01

    Evaluation of the quality of tracheoesophageal (TE) speech using machines instead of human experts can enhance the voice rehabilitation process for patients who have undergone total laryngectomy and voice restoration. Towards the goal of devising a reference-free TE speech quality estimation algorithm, we investigate the efficacy of speech signal features that are used in standard telephone-speech quality assessment algorithms, in conjunction with a recently introduced speech modulation spectrum measure. Tests performed on two TE speech databases demonstrate that the modulation spectral measure and a subset of features in the standard ITU-T P.563 algorithm estimate TE speech quality with better correlation (up to 0.9) than previously proposed features.
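
A modulation spectrum can be sketched crudely as the spectrum of a smoothed amplitude envelope. This is a simplified illustration of the general idea only, not the specific modulation spectral measure used in the paper (the envelope window and test signal are arbitrary choices):

```python
import cmath
import math

def modulation_spectrum(signal, env_win=8):
    """Crude modulation spectrum: rectify, smooth with a moving average to get
    an amplitude envelope, then take the magnitude DFT of that envelope."""
    rect = [abs(s) for s in signal]
    env = [sum(rect[max(0, n - env_win + 1):n + 1]) /
           len(rect[max(0, n - env_win + 1):n + 1]) for n in range(len(rect))]
    n = len(env)
    # Return only the non-redundant half of the spectrum
    return [abs(sum(env[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2)]
```

For an amplitude-modulated tone, the envelope oscillates at the modulation rate, so the spectrum peaks at the corresponding modulation-frequency bin.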

  10. Voices to reckon with: perceptions of voice identity in clinical and non-clinical voice hearers

    PubMed Central

    Badcock, Johanna C.; Chhabra, Saruchi

    2013-01-01

    The current review focuses on the perception of voice identity in clinical and non-clinical voice hearers. Identity perception in auditory verbal hallucinations (AVH) is grounded in the mechanisms of human (i.e., real, external) voice perception, and shapes the emotional (distress) and behavioral (help-seeking) response to the experience. Yet, the phenomenological assessment of voice identity is often limited, for example to the gender of the voice, and has failed to take advantage of recent models and evidence on human voice perception. In this paper we aim to synthesize the literature on identity in real and hallucinated voices and begin by providing a comprehensive overview of the features used to judge voice identity in healthy individuals and in people with schizophrenia. The findings suggest some subtle, but possibly systematic biases across different levels of voice identity in clinical hallucinators that are associated with higher levels of distress. Next we provide a critical evaluation of voice processing abilities in clinical and non-clinical voice hearers, including recent data collected in our laboratory. Our studies used diverse methods, assessing recognition and binding of words and voices in memory as well as multidimensional scaling of voice dissimilarity judgments. The findings overall point to significant difficulties recognizing familiar speakers and discriminating between unfamiliar speakers in people with schizophrenia, both with and without AVH. In contrast, these voice processing abilities appear to be generally intact in non-clinical hallucinators. The review highlights some important avenues for future research and treatment of AVH associated with a need for care, and suggests some novel insights into other symptoms of psychosis. PMID:23565088

  11. The "Reading the Mind in the Voice" Test-Revised: A Study of Complex Emotion Recognition in Adults with and Without Autism Spectrum Conditions

    ERIC Educational Resources Information Center

    Golan, Ofer; Baron-Cohen, Simon; Hill, Jacqueline J.; Rutherford, M. D.

    2007-01-01

    This study reports a revised version of the "Reading the Mind in the Voice" (RMV) task. The original task (Rutherford et al., (2002), "Journal of Autism and Developmental Disorders, 32," 189-194) suffered from ceiling effects and limited sensitivity. To address these limitations, the task was shortened and two more foils were added to each of the remaining…

  12. A perspective on early commercial applications of voice-processing technology for telecommunications and aids for the handicapped.

    PubMed Central

    Seelbach, C

    1995-01-01

    The Colloquium on Human-Machine Communication by Voice highlighted the global technical community's focus on the problems and promise of voice-processing technology, particularly, speech recognition and speech synthesis. Clearly, there are many areas in both the research and development of these technologies that can be advanced significantly. However, it is also true that there are many applications of these technologies that are capable of commercialization now. Early successful commercialization of new technology is vital to ensure continuing interest in its development. This paper addresses efforts to commercialize speech technologies in two markets: telecommunications and aids for the handicapped. PMID:7479814

  13. The Use of Voice Cues for Speaker Gender Recognition in Cochlear Implant Recipients

    ERIC Educational Resources Information Center

    Meister, Hartmut; Fürsen, Katrin; Streicher, Barbara; Lang-Roth, Ruth; Walger, Martin

    2016-01-01

    Purpose: The focus of this study was to examine the influence of fundamental frequency (F0) and vocal tract length (VTL) modifications on speaker gender recognition in cochlear implant (CI) recipients for different stimulus types. Method: Single words and sentences were manipulated using isolated or combined F0 and VTL cues. Using an 11-point…

  14. Voice gender discrimination provides a measure of more than pitch-related perception in cochlear implant users

    PubMed Central

    Li, Tianhao; Fu, Qian-Jie

    2013-01-01

    Objectives (1) To investigate whether voice gender discrimination (VGD) could be a useful indicator of the spectral and temporal processing abilities of individual cochlear implant (CI) users; (2) To examine the relationship between VGD and speech recognition with CI when comparable acoustic cues are used for both perception processes. Design VGD was measured using two talker sets with different inter-gender fundamental frequencies (F0), as well as different acoustic CI simulations. Vowel and consonant recognition in quiet and noise were also measured and compared with VGD performance. Study sample Eleven postlingually deaf CI users. Results The results showed that (1) mean VGD performance differed for different stimulus sets, (2) VGD and speech recognition performance varied among individual CI users, and (3) individual VGD performance was significantly correlated with speech recognition performance under certain conditions. Conclusions VGD measured with selected stimulus sets might be useful for assessing not only pitch-related perception, but also spectral and temporal processing by individual CI users. In addition to improvements in spectral resolution and modulation detection, the improvement in higher modulation frequency discrimination might be particularly important for CI users in noisy environments. PMID:21696330

  15. Using Ambulatory Voice Monitoring to Investigate Common Voice Disorders: Research Update

    PubMed Central

    Mehta, Daryush D.; Van Stan, Jarrad H.; Zañartu, Matías; Ghassemi, Marzyeh; Guttag, John V.; Espinoza, Víctor M.; Cortés, Juan P.; Cheyne, Harold A.; Hillman, Robert E.

    2015-01-01

    Many common voice disorders are chronic or recurring conditions that are likely to result from inefficient and/or abusive patterns of vocal behavior, referred to as vocal hyperfunction. The clinical management of hyperfunctional voice disorders would be greatly enhanced by the ability to monitor and quantify detrimental vocal behaviors during an individual’s activities of daily life. This paper provides an update on ongoing work that uses a miniature accelerometer on the neck surface below the larynx to collect a large set of ambulatory data on patients with hyperfunctional voice disorders (before and after treatment) and matched-control subjects. Three types of analysis approaches are being employed in an effort to identify the best set of measures for differentiating among hyperfunctional and normal patterns of vocal behavior: (1) ambulatory measures of voice use that include vocal dose and voice quality correlates, (2) aerodynamic measures based on glottal airflow estimates extracted from the accelerometer signal using subject-specific vocal system models, and (3) classification based on machine learning and pattern recognition approaches that have been used successfully in analyzing long-term recordings of other physiological signals. Preliminary results demonstrate the potential for ambulatory voice monitoring to improve the diagnosis and treatment of common hyperfunctional voice disorders. PMID:26528472

  16. Brain systems mediating voice identity processing in blind humans.

    PubMed

    Hölig, Cordula; Föcker, Julia; Best, Anna; Röder, Brigitte; Büchel, Christian

    2014-09-01

    Blind people rely more on vocal cues when they recognize a person's identity than sighted people. Indeed, a number of studies have reported better voice recognition skills in blind than in sighted adults. The present functional magnetic resonance imaging study investigated changes in the functional organization of neural systems involved in voice identity processing following congenital blindness. A group of congenitally blind individuals and matched sighted control participants were tested in a priming paradigm, in which two voice stimuli (S1, S2) were presented in succession. The prime (S1) and the target (S2) were either from the same speaker (person-congruent voices) or from two different speakers (person-incongruent voices). Participants had to classify the S2 as either an old or a young person. Person-incongruent voices (S2) compared with person-congruent voices elicited an increased activation in the right anterior fusiform gyrus in congenitally blind individuals but not in matched sighted control participants. In contrast, only matched sighted controls showed a higher activation in response to person-incongruent compared with person-congruent voices (S2) in the right posterior superior temporal sulcus. These results provide evidence for crossmodal plastic changes of the person identification system in the brain after visual deprivation. Copyright © 2014 Wiley Periodicals, Inc.

  17. The impact of compression of speech signal, background noise and acoustic disturbances on the effectiveness of speaker identification

    NASA Astrophysics Data System (ADS)

    Kamiński, K.; Dobrowolski, A. P.

    2017-04-01

    The paper presents the architecture and the optimization results for selected elements of an Automatic Speaker Recognition (ASR) system that uses Gaussian Mixture Models (GMM) in the classification process. Optimization covered the selection of individual features, carried out with a genetic algorithm, and the parameters of the Gaussian distributions used to describe individual voices. The developed system was tested to evaluate how different compression methods used in, among others, landline, mobile, and VoIP telephony affect the effectiveness of speaker identification. Results are also presented on identification effectiveness at specific levels of noise accompanying the speech signal and in the presence of other disturbances that can occur during phone calls, which made it possible to characterize the range of applications of the presented ASR system.
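
    The GMM-based classification idea can be sketched minimally (this is not the paper's optimized system: for brevity each speaker is modeled by a one-component, diagonal-covariance Gaussian, and synthetic 2-D frames stand in for real cepstral features):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_speaker_model(frames):
    # A one-component, diagonal-covariance "GMM" per speaker; real
    # systems fit multi-component mixtures to cepstral feature frames.
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6

def log_likelihood(frames, model):
    mean, var = model
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var)
                                + (frames - mean) ** 2 / var)))

# Enrollment: two hypothetical speakers with different feature statistics.
train = {
    "speaker_a": rng.normal(loc=[0.0, 0.0], scale=0.5, size=(500, 2)),
    "speaker_b": rng.normal(loc=[3.0, 3.0], scale=0.5, size=(500, 2)),
}
models = {name: fit_speaker_model(f) for name, f in train.items()}

def identify(frames):
    # Assign the utterance to the speaker model with the highest
    # total log-likelihood over all frames.
    return max(models, key=lambda name: log_likelihood(frames, models[name]))

probe = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(200, 2))
predicted = identify(probe)
```

    In a real system the per-speaker models would be multi-component mixtures trained on spectral features, but the decision rule (maximum average log-likelihood over frames) is the same.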

  18. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzrichter, J.F.; Ng, L.C.

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced, speech. The methods also describe how to deconvolve the speech excitation function from the acoustic speech output to obtain the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.
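
    The deconvolution step described above can be sketched as frequency-domain division with a small regularizer (a generic textbook formulation, not the authors' exact method; the short filter and frame length below are invented for the round-trip check):

```python
import numpy as np

def estimate_transfer_function(excitation, output, eps=1e-8):
    # If a frame is modeled as output = excitation (*) h (circular
    # convolution), the transfer function is H(f) = Y(f) / X(f); the
    # eps term guards bins where the excitation spectrum is near zero.
    X = np.fft.rfft(excitation)
    Y = np.fft.rfft(output)
    return Y * np.conj(X) / (np.abs(X) ** 2 + eps)

# Round-trip check on a synthetic frame: convolve a known short
# impulse response with an excitation, then recover it.
rng = np.random.default_rng(1)
n = 256
excitation = rng.standard_normal(n)
h = np.zeros(n)
h[:4] = [1.0, -0.6, 0.3, -0.1]  # short hypothetical vocal-tract filter
output = np.fft.irfft(np.fft.rfft(excitation) * np.fft.rfft(h), n)
h_est = np.fft.irfft(estimate_transfer_function(excitation, output), n)
err = np.max(np.abs(h_est - h))
```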

  19. A GPU-paralleled implementation of an enhanced face recognition algorithm

    NASA Astrophysics Data System (ADS)

    Chen, Hao; Liu, Xiyang; Shao, Shuai; Zan, Jiguo

    2013-03-01

    Face recognition based on compressed sensing and sparse representation has attracted wide attention in recent years. This approach improves the recognition rate as well as noise robustness. However, its computational cost is high and has become a main restricting factor for real-world applications. In this paper, we introduce a GPU-accelerated hybrid variant of the face recognition algorithm, named parallel face recognition algorithm (pFRA). We describe how to carry out a parallel optimization design that takes full advantage of the many-core structure of a GPU. The pFRA is tested and compared with several other implementations under different data sample sizes. Our pFRA, implemented on an NVIDIA GPU with the Compute Unified Device Architecture (CUDA) programming model, achieves a significant speedup over traditional CPU implementations.
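
    The sparse-representation classification scheme this line of work builds on can be illustrated with a simplified, residual-based variant (full SRC solves a single l1-regularized problem over all classes at once; per-class least squares is used here as a stand-in, with random low-dimensional vectors in place of vectorized face images):

```python
import numpy as np

rng = np.random.default_rng(0)

# Dictionary of training samples per class: columns would be vectorized,
# downsampled face images; random stand-ins are used here.
d, n_per_class = 20, 8
bases = {c: rng.standard_normal((d, n_per_class)) for c in ("alice", "bob")}

def classify(y):
    # Represent the probe with each class's training samples and pick
    # the class with the smallest reconstruction residual.
    residuals = {}
    for c, A in bases.items():
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residuals[c] = np.linalg.norm(y - A @ coef)
    return min(residuals, key=residuals.get)

# A probe lying (nearly) in alice's column space should reconstruct
# with a much smaller residual from alice's samples.
probe = bases["alice"] @ rng.standard_normal(n_per_class) \
        + 0.01 * rng.standard_normal(d)
predicted = classify(probe)
```

    The per-class least-squares solves are exactly the kind of dense linear algebra that maps well onto a GPU, which is why this family of algorithms benefits from the parallelization the paper describes.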

  20. Real-time interactive speech technology at Threshold Technology, Incorporated

    NASA Technical Reports Server (NTRS)

    Herscher, Marvin B.

    1977-01-01

    Basic real-time isolated-word recognition techniques are reviewed. Industrial applications of voice technology are described in chronological order of their development. Future research efforts are also discussed.

  1. Objective speech quality assessment and the RPE-LTP coding algorithm in different noise and language conditions.

    PubMed

    Hansen, J H; Nandkumar, S

    1995-01-01

    The formulation of reliable signal processing algorithms for speech coding and synthesis requires the selection of an a priori criterion of performance. Though coding efficiency (bits/second) or computational requirements can be used, a final performance measure must always include speech quality. In this paper, three objective speech quality measures are considered with respect to quality assessment for American English, noisy American English, and noise-free versions of seven languages. The purpose is to determine whether objective quality measures can be used to quantify changes in quality for a given voice coding method, with a known subjective performance level, as background noise or language conditions are changed. The speech coding algorithm chosen is regular-pulse excitation with long-term prediction (RPE-LTP), which has been chosen as the standard voice compression algorithm for the European Digital Mobile Radio system. Three areas are considered for objective quality assessment which include: (i) vocoder performance for American English in a noise-free environment, (ii) speech quality variation for three additive background noise sources, and (iii) noise-free performance for seven languages which include English, Japanese, Finnish, German, Hindi, Spanish, and French. It is suggested that although existing objective quality measures will never replace subjective testing, they can be a useful means of assessing changes in performance, identifying areas for improvement in algorithm design, and augmenting subjective quality tests for voice coding/compression algorithms in noise-free, noisy, and/or non-English applications.

  2. Target recognition of ladar range images using slice image: comparison of four improved algorithms

    NASA Astrophysics Data System (ADS)

    Xia, Wenze; Han, Shaokun; Cao, Jingya; Wang, Liang; Zhai, Yu; Cheng, Yang

    2017-07-01

    Compared with traditional 3-D shape data, ladar range images possess properties of strong noise, shape degeneracy, and sparsity, which make feature extraction and representation difficult. The slice image is an effective feature descriptor for resolving this problem. We propose four improved algorithms for target recognition of ladar range images using the slice image. To improve the resolution invariance of the slice image, mean value detection instead of maximum value detection is applied in all four improved algorithms. To improve the rotation invariance of the slice image, three new feature descriptors (feature slice image, slice-Zernike moments, and slice-Fourier moments) are applied to the last three improved algorithms, respectively. Backpropagation neural networks are used as feature classifiers in the last two improved algorithms. The performance of these four improved recognition systems is analyzed comprehensively with respect to the three invariances, recognition rate, and execution time. The final experiment results show that the improvements in these four algorithms achieve the desired effect, that the three invariances of the feature descriptors are not directly related to the final recognition performance of the recognition systems, and that the four improved recognition systems perform differently under different conditions.

  3. A Development of a System Enables Character Input and PC Operation via Voice for a Physically Disabled Person with a Speech Impediment

    NASA Astrophysics Data System (ADS)

    Tanioka, Toshimasa; Egashira, Hiroyuki; Takata, Mayumi; Okazaki, Yasuhisa; Watanabe, Kenzi; Kondo, Hiroki

    We have designed and implemented a voice-driven PC operation support system for a physically disabled person with a speech impediment. Voice operation is an effective input method for a physically disabled person with involuntary movement of the limbs and head. We have applied a commercial speech recognition engine to develop our system for practical purposes. Adopting a commercial engine reduces development cost and helps make the system useful to other people with speech impediments. We have customized the engine so that it can recognize the utterances of a person with a speech impediment. We have restricted the vocabulary that the recognition engine accepts and separated target words from similar-sounding words to avoid misrecognition. The huge vocabularies registered in commercial speech recognition engines cause frequent misrecognition of speech-impaired users' utterances, because those utterances are unclear and unstable. We have solved this problem by narrowing the input choices down to a small number and by registering users' ambiguous pronunciations in addition to the original ones. To realize full character input and PC operation with a small vocabulary, we have designed multiple input modes with categorized dictionaries and have introduced two-step input in each mode except numeral input, enabling correct operation with a small number of words. The system we have developed is at a practical level. The first author of this paper is physically disabled with a speech impediment. Using this system, he has been able not only to input characters into a PC but also to operate the Windows system smoothly, and he uses it in his daily life. This paper was written by him with this system. At present, the speech recognition is customized to him; it is, however, possible to customize the system for other users by changing the vocabulary and registering new pronunciations according to each user's utterances.

  4. Using Continuous Voice Recognition Technology as an Input Medium to the Naval Warfare Interactive Simulation System (NWISS).

    DTIC Science & Technology

    1984-06-01

    [Abstract illegible in the scanned original; the recoverable fragments reference continuous voice recognition, the VERBEX 3000 Speech Application Development System (SPADS), and the Naval Warfare Interactive Simulation System (NWISS).]

  5. The Pandora multi-algorithm approach to automated pattern recognition of cosmic-ray muon and neutrino events in the MicroBooNE detector

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Acciarri, R.; Adams, C.; An, R.

    The development and operation of Liquid-Argon Time-Projection Chambers for neutrino physics has created a need for new approaches to pattern recognition in order to fully exploit the imaging capabilities offered by this technology. Whereas the human brain can excel at identifying features in the recorded events, it is a significant challenge to develop an automated, algorithmic solution. The Pandora Software Development Kit provides functionality to aid the design and implementation of pattern-recognition algorithms. It promotes the use of a multi-algorithm approach to pattern recognition, in which individual algorithms each address a specific task in a particular topology. Many tens of algorithms then carefully build up a picture of the event and, together, provide a robust automated pattern-recognition solution. This paper describes details of the chain of over one hundred Pandora algorithms and tools used to reconstruct cosmic-ray muon and neutrino events in the MicroBooNE detector. Metrics that assess the current pattern-recognition performance are presented for simulated MicroBooNE events, using a selection of final-state event topologies.

  6. The Pandora multi-algorithm approach to automated pattern recognition of cosmic-ray muon and neutrino events in the MicroBooNE detector

    DOE PAGES

    Acciarri, R.; Adams, C.; An, R.; ...

    2018-01-29

    The development and operation of Liquid-Argon Time-Projection Chambers for neutrino physics has created a need for new approaches to pattern recognition in order to fully exploit the imaging capabilities offered by this technology. Whereas the human brain can excel at identifying features in the recorded events, it is a significant challenge to develop an automated, algorithmic solution. The Pandora Software Development Kit provides functionality to aid the design and implementation of pattern-recognition algorithms. It promotes the use of a multi-algorithm approach to pattern recognition, in which individual algorithms each address a specific task in a particular topology. Many tens of algorithms then carefully build up a picture of the event and, together, provide a robust automated pattern-recognition solution. This paper describes details of the chain of over one hundred Pandora algorithms and tools used to reconstruct cosmic-ray muon and neutrino events in the MicroBooNE detector. Metrics that assess the current pattern-recognition performance are presented for simulated MicroBooNE events, using a selection of final-state event topologies.

  7. A Multidimensional Approach to the Study of Emotion Recognition in Autism Spectrum Disorders

    PubMed Central

    Xavier, Jean; Vignaud, Violaine; Ruggiero, Rosa; Bodeau, Nicolas; Cohen, David; Chaby, Laurence

    2015-01-01

    Although deficits in emotion recognition have been widely reported in autism spectrum disorder (ASD), experiments have been restricted to either facial or vocal expressions. Here, we explored multimodal emotion processing in children with ASD (N = 19) and with typical development (TD, N = 19), considering unimodal (faces or voices) and multimodal (faces and voices simultaneously) stimuli and developmental comorbidities (neuro-visual, language and motor impairments). Compared to TD controls, children with ASD had rather high and heterogeneous emotion recognition scores but also showed several significant differences: lower emotion recognition scores for visual stimuli, for neutral emotion, and a greater number of saccades during the visual task. Multivariate analyses showed that: (1) the difficulties they experienced with visual stimuli were partially alleviated with multimodal stimuli. (2) Developmental age was significantly associated with emotion recognition in TD children, whereas it was the case only for the multimodal task in children with ASD. (3) Language impairments tended to be associated with emotion recognition scores of ASD children in the auditory modality. Conversely, in the visual or bimodal (visuo-auditory) tasks, the impact of developmental coordination disorder or neuro-visual impairments was not found. We conclude that impaired emotion processing constitutes a dimension to explore in the field of ASD, as research has the potential to define more homogeneous subgroups and tailored interventions. However, it is clear that developmental age, the nature of the stimuli, and other developmental comorbidities must also be taken into account when studying this dimension. PMID:26733928

  8. Visual abilities are important for auditory-only speech recognition: evidence from autism spectrum disorder.

    PubMed

    Schelinski, Stefanie; Riedel, Philipp; von Kriegstein, Katharina

    2014-12-01

    In auditory-only conditions, for example when we listen to someone on the phone, it is essential to recognize quickly and accurately what is said (speech recognition). Previous studies have shown that speech recognition performance in auditory-only conditions is better if the speaker is known not only by voice, but also by face. Here, we tested the hypothesis that such an improvement in auditory-only speech recognition depends on the ability to lip-read. To test this we recruited a group of adults with autism spectrum disorder (ASD), a condition associated with difficulties in lip-reading, and typically developed controls. All participants were trained to identify six speakers by name and voice. Three speakers were learned by a video showing their face and three others were learned in a matched control condition without face. After training, participants performed an auditory-only speech recognition test that consisted of sentences spoken by the trained speakers. As a control condition, the test also included speaker identity recognition on the same auditory material. The results showed that, in the control group, performance in speech recognition was improved for speakers known by face in comparison to speakers learned in the matched control condition without face. The ASD group lacked such a performance benefit. For the ASD group auditory-only speech recognition was even worse for speakers known by face compared to speakers not known by face. In speaker identity recognition, the ASD group performed worse than the control group independent of whether the speakers were learned with or without face. Two additional visual experiments showed that the ASD group performed worse in lip-reading whereas face identity recognition was within the normal range. The findings support the view that auditory-only communication involves specific visual mechanisms. 
Further, they indicate that in ASD, speaker-specific dynamic visual information is not available to optimize auditory-only speech recognition. Copyright © 2014 Elsevier Ltd. All rights reserved.

  9. Score-Level Fusion of Phase-Based and Feature-Based Fingerprint Matching Algorithms

    NASA Astrophysics Data System (ADS)

    Ito, Koichi; Morita, Ayumi; Aoki, Takafumi; Nakajima, Hiroshi; Kobayashi, Koji; Higuchi, Tatsuo

    This paper proposes an efficient fingerprint recognition algorithm combining phase-based image matching and feature-based matching. In our previous work, we have already proposed an efficient fingerprint recognition algorithm using Phase-Only Correlation (POC), and developed commercial fingerprint verification units for access control applications. The use of Fourier phase information of fingerprint images makes it possible to achieve robust recognition for weakly impressed, low-quality fingerprint images. This paper presents an approach to improving the performance of POC-based fingerprint matching by combining it with feature-based matching, which is introduced to improve recognition performance for images with nonlinear distortion. Experimental evaluation using two different types of fingerprint image databases demonstrates the efficient recognition performance of the combined POC-based and feature-based algorithm.
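
    Phase-Only Correlation itself can be sketched in a few lines (a textbook formulation, not the commercial units' implementation; the test image and shift below are synthetic): the cross-power spectrum is normalized so that only phase information remains, and its inverse FFT has a sharp peak whose location gives the translation between the images and whose height indicates match quality.

```python
import numpy as np

def poc(f, g, eps=1e-12):
    # Phase-Only Correlation of two equal-size 2-D arrays: normalize the
    # cross-power spectrum to unit magnitude (phase only), then invert.
    F = np.fft.fft2(f)
    G = np.fft.fft2(g)
    R = F * np.conj(G)
    R /= np.abs(R) + eps
    return np.real(np.fft.ifft2(R))

# For an image matched against a circularly shifted copy of itself,
# the POC surface peaks exactly at the shift.
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
shifted = np.roll(img, shift=(5, 12), axis=(0, 1))
surface = poc(shifted, img)
peak = np.unravel_index(np.argmax(surface), surface.shape)
```

    The robustness noted in the abstract comes from discarding magnitude information: low-quality impressions change spectral magnitudes far more than spectral phase.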

  10. Voice input/output capabilities at Perception Technology Corporation

    NASA Technical Reports Server (NTRS)

    Ferber, Leon A.

    1977-01-01

    Condensed resumes of key company personnel at the Perception Technology Corporation are presented. The staff possesses expertise in speech recognition, speech synthesis, speaker authentication, and language identification. The capabilities of the company's hardware and software engineers are also summarized.

  11. Adaptive Suppression of Noise in Voice Communications

    NASA Technical Reports Server (NTRS)

    Kozel, David; DeVault, James A.; Birr, Richard B.

    2003-01-01

    A subsystem for the adaptive suppression of noise in a voice communication system effects a high level of reduction of noise that enters the system through microphones. The subsystem includes a digital signal processor (DSP) plus circuitry that implements voice-recognition and spectral- manipulation techniques. The development of the adaptive noise-suppression subsystem was prompted by the following considerations: During processing of the space shuttle at Kennedy Space Center, voice communications among test team members have been significantly impaired in several instances because some test participants have had to communicate from locations with high ambient noise levels. Ear protection for the personnel involved is commercially available and is used in such situations. However, commercially available noise-canceling microphones do not provide sufficient reduction of noise that enters through microphones and thus becomes transmitted on outbound communication links.

  12. Model and algorithmic framework for detection and correction of cognitive errors.

    PubMed

    Feki, Mohamed Ali; Biswas, Jit; Tolstikov, Andrei

    2009-01-01

    This paper outlines an approach that we are taking for elder-care applications in the smart home, involving cognitive errors and their compensation. Our approach involves high level modeling of daily activities of the elderly by breaking down these activities into smaller units, which can then be automatically recognized at a low level by collections of sensors placed in the homes of the elderly. This separation allows us to employ plan recognition algorithms and systems at a high level, while developing stand-alone activity recognition algorithms and systems at a low level. It also allows the mixing and matching of multi-modality sensors of various kinds that go to support the same high level requirement. Currently our plan recognition algorithms are still at a conceptual stage, whereas a number of low level activity recognition algorithms and systems have been developed. Herein we present our model for plan recognition, providing a brief survey of the background literature. We also present some concrete results that we have achieved for activity recognition, emphasizing how these results are incorporated into the overall plan recognition system.

  13. Physical environment virtualization for human activities recognition

    NASA Astrophysics Data System (ADS)

    Poshtkar, Azin; Elangovan, Vinayak; Shirkhodaie, Amir; Chan, Alex; Hu, Shuowen

    2015-05-01

    Human activity recognition research relies heavily on extensive datasets to verify and validate the performance of activity recognition algorithms. However, obtaining real datasets is expensive and highly time-consuming. A physics-based virtual simulation can accelerate the development of context-based human activity recognition algorithms and techniques by generating relevant training and testing videos simulating diverse operational scenarios. In this paper, we discuss in detail the requisite capabilities of a virtual environment to serve as a test bed for evaluating and enhancing activity recognition algorithms. To demonstrate the numerous advantages of virtual environment development, a newly developed virtual environment simulation modeling (VESM) environment is presented here to generate calibrated multisource imagery datasets suitable for development and testing of recognition algorithms for context-based human activities. The VESM environment serves as a versatile test bed to generate a vast amount of realistic data for training and testing of sensor processing algorithms. To demonstrate the effectiveness of the VESM environment, we present various simulated scenarios and processed results to infer proper semantic annotations from the high-fidelity imagery data for human-vehicle activity recognition under different operational contexts.

  14. Vehicle logo recognition using multi-level fusion model

    NASA Astrophysics Data System (ADS)

    Ming, Wei; Xiao, Jianli

    2018-04-01

    Vehicle logo recognition plays an important role in manufacturer identification and vehicle recognition. This paper proposes a new vehicle logo recognition algorithm with a hierarchical framework consisting of two fusion levels. At the first level, a feature fusion model is employed to map the original features to a higher-dimensional feature space, in which the vehicle logos become more recognizable. At the second level, a weighted voting strategy is proposed to improve the accuracy and robustness of the recognition results. To evaluate the performance of the proposed algorithm, extensive experiments are performed, which demonstrate that the proposed algorithm can achieve high recognition accuracy and work robustly.
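
    The second-level weighted-voting idea can be sketched as follows (the classifier names, labels, and weights here are invented for illustration, not taken from the paper): each base classifier votes for a logo class, votes are weighted by a per-classifier reliability score, and the class with the largest weighted sum wins.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    # Sum each classifier's weight into the class it voted for,
    # then return the class with the highest total.
    scores = defaultdict(float)
    for clf, label in predictions.items():
        scores[label] += weights[clf]
    return max(scores, key=scores.get)

# Hypothetical base classifiers and reliability weights: two weaker
# classifiers agreeing can outvote one stronger dissenter.
predictions = {"knn": "toyota", "svm": "honda", "cnn": "toyota"}
weights = {"knn": 0.6, "svm": 0.9, "cnn": 0.8}
winner = weighted_vote(predictions, weights)
```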

  15. Memory for faces and voices varies as a function of sex and expressed emotion.

    PubMed

    S Cortes, Diana; Laukka, Petri; Lindahl, Christina; Fischer, Håkan

    2017-01-01

    We investigated how memory for faces and voices (presented separately and in combination) varies as a function of sex and emotional expression (anger, disgust, fear, happiness, sadness, and neutral). At encoding, participants judged the expressed emotion of items in forced-choice tasks, followed by incidental Remember/Know recognition tasks. Results from 600 participants showed that accuracy (hits minus false alarms) was consistently higher for neutral compared to emotional items, whereas accuracy for specific emotions varied across the presentation modalities (i.e., faces, voices, and face-voice combinations). For the subjective sense of recollection ("remember" hits), neutral items received the highest hit rates only for faces, whereas for voices and face-voice combinations anger and fear expressions instead received the highest recollection rates. We also observed better accuracy for items by female expressers, and own-sex bias where female participants displayed memory advantage for female faces and face-voice combinations. Results further suggest that own-sex bias can be explained by recollection, rather than familiarity, rates. Overall, results show that memory for faces and voices may be influenced by the expressions that they carry, as well as by the sex of both items and participants. Emotion expressions may also enhance the subjective sense of recollection without enhancing memory accuracy.

  16. Memory for faces and voices varies as a function of sex and expressed emotion

    PubMed Central

    Laukka, Petri; Lindahl, Christina; Fischer, Håkan

    2017-01-01

    We investigated how memory for faces and voices (presented separately and in combination) varies as a function of sex and emotional expression (anger, disgust, fear, happiness, sadness, and neutral). At encoding, participants judged the expressed emotion of items in forced-choice tasks, followed by incidental Remember/Know recognition tasks. Results from 600 participants showed that accuracy (hits minus false alarms) was consistently higher for neutral compared to emotional items, whereas accuracy for specific emotions varied across the presentation modalities (i.e., faces, voices, and face-voice combinations). For the subjective sense of recollection (“remember” hits), neutral items received the highest hit rates only for faces, whereas for voices and face-voice combinations anger and fear expressions instead received the highest recollection rates. We also observed better accuracy for items by female expressers, and an own-sex bias whereby female participants displayed a memory advantage for female faces and face-voice combinations. Results further suggest that the own-sex bias can be explained by recollection rates rather than familiarity rates. Overall, results show that memory for faces and voices may be influenced by the expressions that they carry, as well as by the sex of both items and participants. Emotion expressions may also enhance the subjective sense of recollection without enhancing memory accuracy. PMID:28570691

  17. More than Just Two Sexes: The Neural Correlates of Voice Gender Perception in Gender Dysphoria

    PubMed Central

    Junger, Jessica; Habel, Ute; Bröhr, Sabine; Neulen, Josef; Neuschaefer-Rube, Christiane; Birkholz, Peter; Kohler, Christian; Schneider, Frank; Derntl, Birgit; Pauly, Katharina

    2014-01-01

    Gender dysphoria (also known as “transsexualism”) is characterized as a discrepancy between anatomical sex and gender identity. Research points towards neurobiological influences. Due to the sexually dimorphic characteristics of the human voice, voice gender perception provides a biologically relevant function, e.g. in the context of mating selection. There is evidence for a better recognition of voices of the opposite sex and a differentiation of the sexes in its underlying functional cerebral correlates, namely the prefrontal and middle temporal areas. This fMRI study investigated the neural correlates of voice gender perception in 32 male-to-female gender dysphoric individuals (MtFs) compared to 20 non-gender dysphoric men and 19 non-gender dysphoric women. Participants indicated the sex of 240 voice stimuli modified in semitone steps in the direction of the other gender. Compared to men and women, MtFs showed differences in a neural network including the medial prefrontal gyrus, the insula, and the precuneus when responding to male vs. female voices. With increased voice morphing men recruited more prefrontal areas compared to women and MtFs, while MtFs revealed a pattern more similar to women. On a behavioral and neuronal level, our results support MtFs' reports that they cannot identify with their assigned sex. PMID:25375171

  18. Automated Electroglottographic Inflection Events Detection. A Pilot Study.

    PubMed

    Codino, Juliana; Torres, María Eugenia; Rubin, Adam; Jackson-Menaldi, Cristina

    2016-11-01

    Vocal-fold vibration can be analyzed in a noninvasive way by registering impedance changes within the glottis through electroglottography. The morphology of the electroglottographic (EGG) signal is related to different vibratory patterns. In the literature, a characteristic knee in the descending portion of the signal has been reported. Some EGG signals do not exhibit this particular knee and have other types of events (inflection events) throughout the ascending and/or descending portion of the vibratory cycle. The goal of this work is to propose an automatic method to identify and classify these events. A computational algorithm was developed based on the mathematical properties of the EGG signal, which detects and reports events throughout the contact phase. Retrospective analysis of EGG signals obtained during routine voice evaluation of adult individuals with a variety of voice disorders was performed using the algorithm as well as by human raters. Two judges, both experts in clinical voice analysis, and three general speech pathologists performed manual and visual evaluation of the sample set. The results obtained by the automatic method were compared with those of the human raters. Statistical analysis revealed a significant level of agreement. This automatic tool could allow professionals in the clinical setting to obtain an automatic quantitative and qualitative report of such events present in a voice sample, without having to manually analyze the whole EGG signal. In addition, it might provide the speech pathologist with more information that would complement the standard voice evaluation. It could also be a valuable tool in voice research. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
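
    As a rough illustration of derivative-based event detection, the sketch below flags candidate inflection events as sign changes in the second difference (discrete curvature) of a sampled signal; the paper's actual criteria for EGG contact-phase events are more elaborate, and the signal here is synthetic:

```python
# Flag candidate inflection events in a sampled signal as sign changes
# of the second difference (a discrete proxy for curvature).

def second_difference(x):
    # Central second difference, defined for interior samples only.
    return [x[i - 1] - 2 * x[i] + x[i + 1] for i in range(1, len(x) - 1)]

def inflection_indices(x):
    d2 = second_difference(x)
    events = []
    for i in range(1, len(d2)):
        if d2[i - 1] == 0:
            continue
        if d2[i] * d2[i - 1] < 0:      # curvature changed sign
            events.append(i + 1)       # index into the original signal
    return events

signal = [0.0, 1.0, 2.5, 3.2, 3.0, 2.0, 1.8, 1.9, 1.0, 0.0]
print(inflection_indices(signal))  # [2, 5, 7]
```

    A real detector would also smooth the signal first and restrict the search to the contact phase of each cycle.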

  19. What Are Some Types of Assistive Devices and How Are They Used?

    MedlinePlus

    ... in persons with hearing problems. Cognitive assistance, including computer or electrical assistive devices, can help people function following brain injury. Computer software and hardware, such as voice recognition programs, ...

  20. Human factors issues associated with the use of speech technology in the cockpit

    NASA Technical Reports Server (NTRS)

    Kersteen, Z. A.; Damos, D.

    1983-01-01

    The human factors issues associated with the use of voice technology in the cockpit are summarized. The formulation of the LHX avionics suite is described and the allocation of tasks to voice in the cockpit is discussed. State-of-the-art speech recognition technology is reviewed. Finally, a questionnaire designed to tap pilot opinions concerning the allocation of tasks to voice input and output in the cockpit is presented. This questionnaire was designed to be administered to operational AH-1G Cobra gunship pilots. Half of the questionnaire deals specifically with the AH-1G cockpit and the types of tasks pilots would like to have performed by voice in this existing rotorcraft. The remaining portion of the questionnaire deals with an undefined rotorcraft of the future and is aimed at determining what types of tasks these pilots would like to have performed by voice technology if anything was possible, i.e. if there were no technological constraints.

  1. Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar

    PubMed Central

    Shin, Young Hoon; Seo, Jiwon

    2016-01-01

    People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker’s vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing. PMID:27801867
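
    The speech-activity-detection step described above — selecting speech segments from a continuous input so that recognition runs only on them — can be sketched with a simple frame-energy threshold; the frame length, threshold, and input below are illustrative stand-ins, not the paper's method or radar data:

```python
# Frame the input, keep frames whose mean energy exceeds a threshold,
# and merge runs of active frames into (start, end) sample segments.

def active_segments(samples, frame_len=4, threshold=1.0):
    segments, start = [], None
    for f in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[f:f + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy >= threshold:
            if start is None:
                start = f              # a new active run begins
        elif start is not None:
            segments.append((start, f))
            start = None
    if start is not None:              # run extends to the end
        segments.append((start, len(samples) // frame_len * frame_len))
    return segments

quiet, loud = [0.1] * 8, [2.0] * 8
print(active_segments(quiet + loud + quiet))  # [(8, 16)]
```

    Downstream recognition would then be applied only to the returned segments.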

  2. Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar.

    PubMed

    Shin, Young Hoon; Seo, Jiwon

    2016-10-29

    People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker's vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing.

  3. The Next Era: Deep Learning in Pharmaceutical Research.

    PubMed

    Ekins, Sean

    2016-11-01

    Over the past decade we have witnessed the increasing sophistication of machine learning algorithms applied in daily use, from internet searches, voice recognition, and social network software to machine vision software in cameras, phones, robots and self-driving cars. Pharmaceutical research has also seen its fair share of machine learning developments. For example, applying such methods to mine the growing datasets that are created in drug discovery not only enables us to learn from the past but also to predict a molecule's properties and behavior in the future. The latest machine learning algorithm garnering significant attention is deep learning, which is an artificial neural network with multiple hidden layers. Publications over the last 3 years suggest that this algorithm may have advantages over previous machine learning methods and offer a slight but discernible edge in predictive performance. The time has come for a balanced review of this technique, but also to apply machine learning methods such as deep learning across a wider array of endpoints relevant to pharmaceutical research for which the datasets are growing, such as physicochemical property prediction, formulation prediction, absorption, distribution, metabolism, excretion and toxicity (ADME/Tox), target prediction and skin permeation, etc. We also show that there are many potential applications of deep learning beyond cheminformatics. It will be important to perform prospective testing (which has been carried out rarely to date) in order to convince skeptics that there will be benefits from investing in this technique.

  4. Voice recognition technology implementation in surgical pathology: advantages and limitations.

    PubMed

    Singh, Meenakshi; Pal, Timothy R

    2011-11-01

    Voice recognition technology (VRT) has been in use for medical transcription outside of laboratories for many years, and in recent years it has evolved to a level where it merits consideration by surgical pathologists. To determine the feasibility and impact of making a transition from a transcriptionist-based service to VRT in surgical pathology. We have evaluated VRT in a phased manner for sign out of general and subspecialty surgical pathology cases after conducting a pilot study. We evaluated the effect on turnaround time, workflow, staffing, typographical error rates, and the overall ability of VRT to be adapted for use in surgical pathology. The stepwise implementation of VRT has resulted in real-time sign out of cases and improvement in average turnaround time from 4 to 3 days. The percentage of cases signed out in 1 day improved from 22% to 37%. Amendment rates for typographical errors have decreased. Use of templates and synoptic reports has been facilitated. The transcription staff has been reassigned to other duties and is successfully assisting in other areas. Resident involvement and exposure to complete case sign out has been achieved resulting in a positive impact on resident education. Voice recognition technology allows for a seamless workflow in surgical pathology, with improvements in turnaround time and a positive impact on competency-based resident education. Individual practices may assess the value of VRT and decide to implement it, potentially with gains in many aspects of their practice.

  5. Similar representations of emotions across faces and voices.

    PubMed

    Kuhn, Lisa Katharina; Wydell, Taeko; Lavan, Nadine; McGettigan, Carolyn; Garrido, Lúcia

    2017-09-01

    [Correction Notice: An Erratum for this article was reported in Vol 17(6) of Emotion (see record 2017-18585-001). In the article, the copyright attribution was incorrectly listed and the Creative Commons CC-BY license disclaimer was incorrectly omitted from the author note. The correct copyright is "© 2017 The Author(s)" and the omitted disclaimer is below. All versions of this article have been corrected. "This article has been published under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Copyright for this article is retained by the author(s). Author(s) grant(s) the American Psychological Association the exclusive right to publish the article and identify itself as the original publisher."] Emotions are a vital component of social communication, carried across a range of modalities and via different perceptual signals such as specific muscle contractions in the face and in the upper respiratory system. Previous studies have found that emotion recognition impairments after brain damage depend on the modality of presentation: recognition from faces may be impaired whereas recognition from voices remains preserved, and vice versa. On the other hand, there is also evidence for shared neural activation during emotion processing in both modalities. In a behavioral study, we investigated whether there are shared representations in the recognition of emotions from faces and voices. We used a within-subjects design in which participants rated the intensity of facial expressions and nonverbal vocalizations for each of the 6 basic emotion labels. For each participant and each modality, we then computed a representation matrix with the intensity ratings of each emotion. These matrices allowed us to examine the patterns of confusions between emotions and to characterize the representations of emotions within each modality. We then compared the representations across modalities by computing the correlations of the representation matrices across faces and voices. We found highly correlated matrices across modalities, which suggest similar representations of emotions across faces and voices. We also showed that these results could not be explained by commonalities between low-level visual and acoustic properties of the stimuli. We thus propose that there are similar or shared coding mechanisms for emotions which may act independently of modality, despite their distinct perceptual inputs. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
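
    The cross-modal comparison step — correlating the representation matrices across faces and voices — might look like the following sketch; the intensity ratings below are made-up numbers, not the study's data:

```python
# Flatten each modality's stimulus-by-emotion intensity matrix and
# compute a Pearson correlation between the two flattened vectors.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

faces = [[4.1, 1.2], [0.9, 3.8]]    # rows: stimuli, cols: rated emotion
voices = [[3.9, 1.5], [1.1, 3.5]]

flat_f = [v for row in faces for v in row]
flat_v = [v for row in voices for v in row]
r = pearson(flat_f, flat_v)
print(round(r, 3))  # high correlation: similar representations
```

    A high correlation between the two matrices is what the study interprets as evidence for similar emotion representations across modalities.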

  6. High-speed cell recognition algorithm for ultrafast flow cytometer imaging system.

    PubMed

    Zhao, Wanyue; Wang, Chao; Chen, Hongwei; Chen, Minghua; Yang, Sigang

    2018-04-01

    An optical time-stretch flow imaging system enables high-throughput examination of cells/particles with unprecedented speed and resolution, producing a significant amount of raw image data. A high-speed cell recognition algorithm is therefore needed to analyze such large amounts of data efficiently. A high-speed cell recognition algorithm consisting of two-stage cascaded detection and Gaussian mixture model (GMM) classification is proposed. The first stage of detection extracts cell regions. The second stage integrates the distance transform and the watershed algorithm to separate clustered cells. Finally, the detected cells are classified by the GMM. We compared the performance of our algorithm with a support vector machine. Results show that our algorithm increases the running speed by over 150% without sacrificing recognition accuracy. This algorithm provides a promising solution for high-throughput, automated cell imaging and classification in the ultrafast flow cytometer imaging platform. (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).
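
    The final GMM classification stage can be illustrated in one dimension: each detected cell is assigned to the mixture component with the highest likelihood for its feature value. The class parameters below are hypothetical stand-ins for what a fitted mixture would supply:

```python
# Assign a scalar cell feature (e.g., region size) to the class whose
# Gaussian component gives it the highest likelihood.
import math

def gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def classify(feature, components):
    """components: {label: (mean, variance)}; returns most likely label."""
    return max(components, key=lambda c: gauss_pdf(feature, *components[c]))

classes = {"small": (5.0, 1.0), "large": (12.0, 2.0)}
print([classify(x, classes) for x in [4.6, 11.3, 6.1]])
```

    In the real pipeline the features are multi-dimensional and the component weights, means, and covariances come from fitting the GMM to training data.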

  7. High-speed cell recognition algorithm for ultrafast flow cytometer imaging system

    NASA Astrophysics Data System (ADS)

    Zhao, Wanyue; Wang, Chao; Chen, Hongwei; Chen, Minghua; Yang, Sigang

    2018-04-01

    An optical time-stretch flow imaging system enables high-throughput examination of cells/particles with unprecedented speed and resolution, producing a significant amount of raw image data. A high-speed cell recognition algorithm is therefore needed to analyze such large amounts of data efficiently. A high-speed cell recognition algorithm consisting of two-stage cascaded detection and Gaussian mixture model (GMM) classification is proposed. The first stage of detection extracts cell regions. The second stage integrates the distance transform and the watershed algorithm to separate clustered cells. Finally, the detected cells are classified by the GMM. We compared the performance of our algorithm with a support vector machine. Results show that our algorithm increases the running speed by over 150% without sacrificing recognition accuracy. This algorithm provides a promising solution for high-throughput, automated cell imaging and classification in the ultrafast flow cytometer imaging platform.

  8. MoNET: media over net gateway processor for next-generation network

    NASA Astrophysics Data System (ADS)

    Elabd, Hammam; Sundar, Rangarajan; Dedes, John

    2001-12-01

    The MoNET™ (Media over Net) SX000 product family is designed using a scalable voice, video and packet-processing platform to address applications with channel densities from a few voice channels to four OC3 per card. This platform is developed for bridging the public circuit-switched network to the next-generation packet telephony and data network. The platform consists of a DSP farm, RISC processors and interface modules. The DSP farm executes voice compression, image compression and line echo cancellation algorithms for a large number of voice, video, fax, and modem or data channels. RISC CPUs perform various packetizations based on RTP, UDP/IP and ATM encapsulations. In addition, the RISC CPUs also participate in DSP farm load management and communication with the host and other MoP devices. The MoNET™ S1000 communications device is designed for voice processing and for bridging TDM to ATM and IP packet networks. The S1000 consists of a DSP farm based on the Carmel DSP core and a 32-bit RISC CPU, along with Ethernet, Utopia, PCI, and TDM interfaces. In this paper, we describe the VoIP infrastructure, the building blocks of the S500, S1000 and S3000 devices, the algorithms executed on these devices and the associated channel densities, the detailed DSP architecture, memory architecture, data flow and scheduling.

  9. The implementation of aerial object recognition algorithm based on contour descriptor in FPGA-based on-board vision system

    NASA Astrophysics Data System (ADS)

    Babayan, Pavel; Smirnov, Sergey; Strotov, Valery

    2017-10-01

    This paper describes an aerial object recognition algorithm for on-board and stationary vision systems. The suggested algorithm is intended to recognize objects of a specific kind using a set of reference objects defined by 3D models. The proposed algorithm is based on building an outer contour descriptor. The algorithm consists of two stages: learning and recognition. The learning stage explores the reference objects: using the 3D models, a database of training images is built by rendering each model from viewpoints evenly distributed on a sphere, with the sphere points distributed by the geosphere principle. The gathered training image set is used to calculate descriptors, which are used in the recognition stage of the algorithm. The recognition stage estimates the similarity of the captured object and the reference objects by matching an observed image descriptor against the reference object descriptors. The experimental research was performed using a set of models of aircraft of different types (airplanes, helicopters, UAVs). The proposed algorithm showed good accuracy in all case studies. Real-time performance of the algorithm in an FPGA-based vision system was demonstrated.
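
    A drastically simplified sketch of descriptor-based matching: a scale-normalized radial-distance descriptor is computed from contour points and matched to the nearest reference descriptor. The real system builds descriptors from rendered 3D-model views; the toy shapes and the descriptor form below are illustrative assumptions only:

```python
# Build a scale-invariant radial-distance descriptor from ordered
# contour points and match a query contour to the nearest reference.

def radial_descriptor(contour):
    cx = sum(p[0] for p in contour) / len(contour)
    cy = sum(p[1] for p in contour) / len(contour)
    dists = [((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 for x, y in contour]
    peak = max(dists)
    return [d / peak for d in dists]          # normalize for scale invariance

def match(query, references):
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    qd = radial_descriptor(query)
    return min(references,
               key=lambda name: dist(qd, radial_descriptor(references[name])))

# References: a square sampled at corners and edge midpoints, and a
# regular octagon (whose radial profile is flat).
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
octagon = [(1, 0), (0.707, 0.707), (0, 1), (-0.707, 0.707),
           (-1, 0), (-0.707, -0.707), (0, -1), (0.707, -0.707)]
refs = {"square": square, "octagon": octagon}

big_square = [(0, 0), (3, 0), (6, 0), (6, 3), (6, 6), (3, 6), (0, 6), (0, 3)]
print(match(big_square, refs))  # square
```

    The normalization step is what lets the scaled-up query still match its reference; rotation invariance would additionally require aligning or cyclically shifting the descriptor.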

  10. Talker and accent variability effects on spoken word recognition

    NASA Astrophysics Data System (ADS)

    Nyang, Edna E.; Rogers, Catherine L.; Nishi, Kanae

    2003-04-01

    A number of studies have shown that words in a list are recognized less accurately in noise and with longer response latencies when they are spoken by multiple talkers, rather than a single talker. These results have been interpreted as support for an exemplar-based model of speech perception, in which it is assumed that detailed information regarding the speaker's voice is preserved in memory and used in recognition, rather than being eliminated via normalization. In the present study, the effects of varying both accent and talker are investigated using lists of words spoken by (a) a single native English speaker, (b) six native English speakers, (c) three native English speakers and three Japanese-accented English speakers. Twelve /hVd/ words were mixed with multi-speaker babble at three signal-to-noise ratios (+10, +5, and 0 dB) to create the word lists. Native English-speaking listeners' percent-correct recognition for words produced by native English speakers across the three talker conditions (single talker native, multi-talker native, and multi-talker mixed native and non-native) and three signal-to-noise ratios will be compared to determine whether sources of speaker variability other than voice alone add to the processing demands imposed by simple (i.e., single accent) speaker variability in spoken word recognition.

  11. Automatic speech recognition in air-ground data link

    NASA Technical Reports Server (NTRS)

    Armstrong, Herbert B.

    1989-01-01

    In the present air traffic system, information presented to the transport aircraft cockpit crew may originate from a variety of sources and may be presented to the crew in visual or aural form, either through cockpit instrument displays or, most often, through voice communication. Voice radio communications are the most error-prone method of air-ground data link. Voice messages can be misstated or misunderstood, and radio frequency congestion can delay or obscure important messages. To prevent proliferation, a multiplexed data link display can be designed to present information from multiple data link sources on a shared cockpit display unit (CDU) or multi-function display (MFD) or some future combination of flight management and data link information. An aural data link which incorporates an automatic speech recognition (ASR) system for crew response offers several advantages over visual displays. The possibility of applying ASR to the air-ground data link was investigated. The first step was to review current efforts in ASR applications in the cockpit and in air traffic control and to evaluate their possible data link application. Next, a series of preliminary research questions is to be developed for possible future collaboration.

  12. Chinese License Plates Recognition Method Based on A Robust and Efficient Feature Extraction and BPNN Algorithm

    NASA Astrophysics Data System (ADS)

    Zhang, Ming; Xie, Fei; Zhao, Jing; Sun, Rui; Zhang, Lei; Zhang, Yue

    2018-04-01

    The prosperity of license plate recognition technology has made a great contribution to the development of Intelligent Transport Systems (ITS). In this paper, a robust and efficient license plate recognition method is proposed, based on a combined feature extraction model and a BPNN (Back Propagation Neural Network) algorithm. First, a method for detecting and segmenting the candidate region of the license plate is developed. Second, a new feature extraction model is designed that combines three sets of features. Third, the license plate classification and recognition method using the combined feature model and the BPNN algorithm is presented. Finally, the experimental results indicate that both license plate segmentation and recognition can be achieved effectively by the proposed algorithm. Compared with three traditional methods, the recognition accuracy of the proposed method has increased to 95.7% and the processing time has decreased to 51.4 ms.
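
    A minimal back-propagation (BPNN) training loop can be sketched in plain Python on an XOR-style toy task; the paper's network operates on the combined character features and has entirely different sizes, so everything below (layer widths, learning rate, epochs) is an illustrative assumption:

```python
# One hidden layer of sigmoid units trained by gradient descent on
# squared error: the essential BPNN update rule, nothing more.
import math, random

random.seed(1)
sig = lambda z: 1.0 / (1.0 + math.exp(-z))

n_hidden = 3
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]  # 2 inputs + bias
w_o = [random.uniform(-1, 1) for _ in range(n_hidden + 1)]                  # hidden + bias

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # XOR

def forward(x):
    h = [sig(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    o = sig(sum(w_o[j] * h[j] for j in range(n_hidden)) + w_o[-1])
    return h, o

def loss():
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

before, lr = loss(), 0.5
for _ in range(4000):
    for x, t in data:
        h, o = forward(x)
        d_o = (o - t) * o * (1 - o)            # output-layer delta
        for j in range(n_hidden):
            d_h = d_o * w_o[j] * h[j] * (1 - h[j])  # back-propagated delta
            w_h[j][0] -= lr * d_h * x[0]
            w_h[j][1] -= lr * d_h * x[1]
            w_h[j][2] -= lr * d_h
            w_o[j] -= lr * d_o * h[j]
        w_o[-1] -= lr * d_o
print(before, "->", loss())
```

    For real character features one would use a library implementation rather than hand-rolled loops, but the gradient flow is the same.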

  13. Tensor Rank Preserving Discriminant Analysis for Facial Recognition.

    PubMed

    Tao, Dapeng; Guo, Yanan; Li, Yaotang; Gao, Xinbo

    2017-10-12

    Facial recognition, one of the basic topics in computer vision and pattern recognition, has received substantial attention in recent years. However, in traditional facial recognition algorithms, the facial images are reshaped into a long vector, thereby losing part of the original spatial constraints of each pixel. In this paper, a new tensor-based feature extraction algorithm termed tensor rank preserving discriminant analysis (TRPDA) for facial image recognition is proposed; the proposed method involves two stages: in the first stage, the low-dimensional tensor subspace of the original input tensor samples is obtained; in the second stage, discriminative locality alignment is utilized to obtain the ultimate vector feature representation for subsequent facial recognition. On the one hand, the proposed TRPDA algorithm fully utilizes the natural structure of the input samples, and it applies an optimization criterion that can directly handle the tensor spectral analysis problem, thereby decreasing the computation cost compared with traditional tensor-based feature selection algorithms. On the other hand, the proposed TRPDA algorithm extracts features by finding a tensor subspace that preserves most of the rank order information of the intra-class input samples. Experiments on three facial databases are performed to determine the effectiveness of the proposed TRPDA algorithm.

  14. Speech perception and communication ability over the telephone by Mandarin-speaking children with cochlear implants.

    PubMed

    Wu, Che-Ming; Liu, Tien-Chen; Wang, Nan-Mai; Chao, Wei-Chieh

    2013-08-01

    (1) To understand speech perception and communication ability through real telephone calls by Mandarin-speaking children with cochlear implants and compare them to live-voice perception, (2) to report the general condition of telephone use of this population, and (3) to investigate the factors that correlate with telephone speech perception performance. Fifty-six children with over 4 years of implant use (aged 6.8-13.6 years, mean duration 8.0 years) took three speech perception tests administered using telephone and live voice to examine sentence, monosyllabic-word and Mandarin tone perception. The children also filled out a questionnaire survey investigating everyday telephone use. Wilcoxon signed-rank test was used to compare the scores between live-voice and telephone tests, and Pearson's test to examine the correlation between them. The mean scores were 86.4%, 69.8% and 70.5% respectively for sentence, word and tone recognition over the telephone. The corresponding live-voice mean scores were 94.3%, 84.0% and 70.8%. Wilcoxon signed-rank test showed the sentence and word scores were significantly different between telephone and live voice test, while the tone recognition scores were not, indicating that tone perception was less degraded by telephone transmission than word and sentence perception. Spearman's test showed that chronological age and duration of implant use were weakly correlated with the perception test scores. The questionnaire survey showed 78% of the children could initiate phone calls and 59% could use the telephone 2 years after implantation. Implanted children are potentially capable of using the telephone 2 years after implantation, and communication ability over the telephone becomes satisfactory 4 years after implantation. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
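
    The Wilcoxon signed-rank comparison between live-voice and telephone scores works on paired differences; the sketch below computes the test statistic with invented scores (not the study's data), dropping zero differences and averaging tied ranks as in the standard procedure:

```python
# Compute the Wilcoxon signed-rank statistic W = min(W+, W-) for
# paired samples a and b.

def signed_rank_stat(a, b):
    diffs = [x - y for x, y in zip(a, b) if x != y]   # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1                                     # extend run of ties
        avg = (i + j) / 2 + 1                          # average tied rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus)

live = [94, 90, 88, 85, 92, 96]
phone = [86, 84, 89, 70, 82, 90]
print(signed_rank_stat(live, phone))  # 1.0
```

    The statistic is then compared against a critical value from a signed-rank table (or a normal approximation for larger samples) to decide significance.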

  15. Multifeature-based high-resolution palmprint recognition.

    PubMed

    Dai, Jifeng; Zhou, Jie

    2011-05-01

    Palmprint is a promising biometric feature for use in access control and forensic applications. Previous research on palmprint recognition mainly concentrates on low-resolution (about 100 ppi) palmprints. But for high-security applications (e.g., forensic usage), high-resolution palmprints (500 ppi or higher) are required, from which more useful information can be extracted. In this paper, we propose a novel recognition algorithm for high-resolution palmprint. The main contributions of the proposed algorithm include the following: 1) the use of multiple features, namely, minutiae, density, orientation, and principal lines, for palmprint recognition, which significantly improves the matching performance of the conventional algorithm; 2) the design of a quality-based and adaptive orientation field estimation algorithm which performs better than the existing algorithm in regions with a large number of creases; 3) the use of a novel fusion scheme for identification which performs better than conventional fusion methods, e.g., the weighted sum rule, SVMs, or the Neyman-Pearson rule. Besides, we analyze the discriminative power of different feature combinations and find that density is very useful for palmprint recognition. Experimental results on a database containing 14,576 full palmprints show that the proposed algorithm achieves good performance. In the verification case, the recognition system's False Rejection Rate (FRR) is 16 percent, which is 17 percent lower than that of the best existing algorithm at a False Acceptance Rate (FAR) of 10⁻⁵, while in the identification experiment, the rank-1 live-scan partial palmprint recognition rate is improved from 82.0 to 91.7 percent.
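
    For contrast with the paper's novel fusion scheme, the conventional weighted-sum baseline it is compared against can be sketched as follows; the per-feature match scores and weights are invented for illustration:

```python
# Score-level fusion: combine per-feature match scores into one fused
# score with a normalized weighted sum.

def weighted_sum_fusion(scores, weights):
    """scores/weights: dicts keyed by feature name; returns fused score."""
    total_w = sum(weights.values())
    return sum(scores[f] * weights[f] for f in scores) / total_w

candidate = {"minutiae": 0.82, "density": 0.65,
             "orientation": 0.71, "principal_lines": 0.40}
weights = {"minutiae": 0.5, "density": 0.2,
           "orientation": 0.2, "principal_lines": 0.1}

fused = weighted_sum_fusion(candidate, weights)
print(round(fused, 3))  # 0.722
```

    The fused score would then be thresholded (verification) or used to rank candidates (identification); the paper's contribution is a fusion rule that outperforms this simple baseline.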

  16. Gay- and Lesbian-Sounding Auditory Cues Elicit Stereotyping and Discrimination.

    PubMed

    Fasoli, Fabio; Maass, Anne; Paladino, Maria Paola; Sulpizio, Simone

    2017-07-01

    The growing body of literature on the recognition of sexual orientation from voice ("auditory gaydar") is silent on the cognitive and social consequences of having a gay-/lesbian- versus heterosexual-sounding voice. We investigated this issue in four studies (overall N = 276), conducted in Italian, in which heterosexual listeners were exposed to single-sentence voice samples of gay/lesbian and heterosexual speakers. In all four studies, listeners were found to make gender-typical inferences about traits and preferences of heterosexual speakers, but gender-atypical inferences about those of gay or lesbian speakers. Behavioral intention measures showed that listeners considered lesbian and gay speakers less suitable for a leadership position, and male (but not female) listeners distanced themselves from gay speakers. Together, this research demonstrates that having a gay/lesbian rather than heterosexual-sounding voice has tangible consequences for stereotyping and discrimination.

  17. Basic and complex emotion recognition in children with autism: cross-cultural findings.

    PubMed

    Fridenson-Hayo, Shimrit; Berggren, Steve; Lassalle, Amandine; Tal, Shahar; Pigat, Delia; Bölte, Sven; Baron-Cohen, Simon; Golan, Ofer

    2016-01-01

    Children with autism spectrum conditions (ASC) have emotion recognition deficits when tested in different expression modalities (face, voice, body). However, these findings usually focus on basic emotions, using one or two expression modalities. In addition, cultural similarities and differences in emotion recognition patterns in children with ASC have not been explored before. The current study examined the similarities and differences in the recognition of basic and complex emotions by children with ASC and typically developing (TD) controls across three cultures: Israel, Britain, and Sweden. Fifty-five children with high-functioning ASC, aged 5-9, were compared to 58 TD children. At each site, groups were matched on age, sex, and IQ. Children were tested using four tasks examining recognition of basic and complex emotions from voice recordings, videos of facial and bodily expressions, and emotional video scenarios including all modalities in context. Compared to their TD peers, children with ASC showed emotion recognition deficits in both basic and complex emotions in all three modalities and in their integration in context. Complex emotions were harder to recognize than basic emotions for the entire sample. Cross-cultural agreement was found for all major findings, with minor deviations on the face and body tasks. Our findings highlight the multimodal nature of emotion recognition deficits in ASC, which exist for basic as well as complex emotions and are relatively stable cross-culturally. Cross-cultural research has the potential to reveal both autism-specific universal deficits and the role that specific cultures play in the way empathy operates in different countries.

  18. Detecting Paroxysmal Coughing from Pertussis Cases Using Voice Recognition Technology

    PubMed Central

    Parker, Danny; Picone, Joseph; Harati, Amir; Lu, Shuang; Jenkyns, Marion H.; Polgreen, Philip M.

    2013-01-01

    Background Pertussis is highly contagious; thus, prompt identification of cases is essential to control outbreaks. Clinicians experienced with the disease can easily identify classic cases, in which patients have bursts of rapid coughing followed by gasps and a characteristic whooping sound. However, many clinicians have never seen a case and thus may miss initial cases during an outbreak. The purpose of this project was to use voice-recognition software to distinguish pertussis coughs from croup and other coughs. Methods We collected a series of recordings representing pertussis, croup, and miscellaneous coughing by children. We manually categorized coughs as either pertussis or non-pertussis and extracted features for each category. We used Mel-frequency cepstral coefficients (MFCCs), a sampling rate of 16 kHz, a frame duration of 25 msec, and a frame rate of 10 msec. The coughs were filtered. Each cough was divided into 3 sections of proportion 3-4-3. The average of the 13 MFCCs for each section was computed and assembled into a 39-element feature vector used for the classification. We used the following machine learning algorithms: Neural Networks, K-Nearest Neighbor (KNN), and a 200-tree Random Forest (RF). Data were reserved for cross-validation of the KNN and RF. The Neural Network was trained 100 times, and the averaged results are presented. Results After categorization, we had 16 examples of non-pertussis coughs and 31 examples of pertussis coughs. Over 90% of all pertussis coughs were properly classified as pertussis. The error rates were: Type I errors of 7%, 12%, and 25% and Type II errors of 8%, 0%, and 0%, using the Neural Network, Random Forest, and KNN, respectively. Conclusion Our results suggest that we can build a robust classifier to assist clinicians and the public in identifying pertussis cases in children presenting with typical symptoms. PMID:24391730
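    The 39-element feature construction described in the Methods (splitting each cough's frames into three sections of proportion 3-4-3 and averaging the 13 MFCCs per section) can be sketched as follows; the MFCC matrix here is a stand-in, since the MFCC front end itself (16 kHz, 25-msec frames, 10-msec frame rate) is not shown:

```python
# Build a 39-element feature vector from a cough's MFCC frames by splitting
# the frames into three sections of proportion 3-4-3 and averaging the
# 13 coefficients within each section, as the abstract describes.

def section_bounds(n_frames):
    """Split n_frames into three sections of proportion 3:4:3."""
    a = round(n_frames * 0.3)
    b = round(n_frames * 0.7)
    return [(0, a), (a, b), (b, n_frames)]

def cough_feature_vector(mfcc_frames):
    """mfcc_frames: list of frames, each a list of 13 MFCCs -> 39 features."""
    features = []
    for start, end in section_bounds(len(mfcc_frames)):
        section = mfcc_frames[start:end]
        for c in range(13):
            features.append(sum(frame[c] for frame in section) / len(section))
    return features

# Stand-in MFCC matrix: 10 frames x 13 coefficients (real values would come
# from an MFCC front end running at 16 kHz).
frames = [[float(i + c) for c in range(13)] for i in range(10)]
vec = cough_feature_vector(frames)
```

The resulting fixed-length vector is what a KNN, Random Forest, or neural network classifier can consume regardless of how long each cough recording was.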

  19. Impact of a Telehealth Program With Voice Recognition Technology in Patients With Chronic Heart Failure: Feasibility Study

    PubMed Central

    Lee, Heesun; Choi, Sae Won; Yoon, Yeonyee E; Park, Hyo Eun; Lee, Sang Eun; Lee, Seung-Pyo; Kim, Hyung-Kwan; Cho, Hyun-Jai; Choi, Su-Yeon; Lee, Hae-Young; Choi, Jonghyuk; Lee, Young-Joon; Kim, Yong-Jin; Cho, Goo-Yeong; Choi, Jinwook; Sohn, Dae-Won

    2017-01-01

    Background Despite advances in the diagnosis and treatment of heart failure (HF), the current hospital-oriented framework for HF management does not appear to be sufficient to maintain the stability of HF patients in the long term. The importance of self-care management is increasingly being emphasized as a promising long-term treatment strategy for patients with chronic HF. Objective The objective of this study was to evaluate whether a new information communication technology (ICT)–based telehealth program with voice recognition technology could improve clinical or laboratory outcomes in HF patients. Methods In this prospective single-arm pilot study, we recruited 31 consecutive patients with chronic HF who were referred to our institute. An ICT-based telehealth program with voice recognition technology was developed and used by patients with HF for 12 weeks. Patients were educated in the use of this program via mobile phone, landline, or the Internet for the purpose of improving communication and data collection. Using these systems, we collected comprehensive data elements related to HF self-care management, such as weight, diet, exercise, medication adherence, overall symptom change, and home blood pressure. The study endpoints were the changes observed in urine sodium concentration (uNa), Minnesota Living with Heart Failure Questionnaire (MLHFQ) scores, the 6-min walk test, and N-terminal prohormone of brain natriuretic peptide (NT-proBNP) as surrogate markers of appropriate HF management. Results Among the 31 enrolled patients, 27 (87%) completed the study, and 10 (10/27, 37%) showed good adherence to the ICT-based telehealth program with voice recognition technology, defined as use of the program 100 times or more during the study period.
Nearly three-fourths of the patients had been hospitalized at least once because of HF before enrollment (20/27, 74%); 14 patients had 1, 2 patients had 2, and 4 patients had 3 or more previous HF hospitalizations. In the total study population, there was no significant interval change in laboratory or functional outcome variables after 12 weeks of the ICT-based telehealth program. In patients with good adherence to the ICT-based telehealth program, there was a significant improvement in mean uNa (103.1 to 78.1; P=.01) but not in those without good adherence (85.4 to 96.9; P=.49). Similarly, a marginal improvement in MLHFQ scores was observed only in patients with good adherence (27.5 to 21.4; P=.08) and not in their counterparts (19.0 to 19.7; P=.73). The mean 6-min walk distance and NT-proBNP were not significantly changed in patients regardless of their adherence. Conclusions Short-term application of an ICT-based telehealth program with voice recognition technology showed the potential to improve uNa values and MLHFQ scores in HF patients, suggesting that better control of sodium intake and greater quality of life can be achieved by this program. PMID:28970189

  20. Design method of ARM based embedded iris recognition system

    NASA Astrophysics Data System (ADS)

    Wang, Yuanbo; He, Yuqing; Hou, Yushi; Liu, Ting

    2008-03-01

    With the advantages of non-invasiveness, uniqueness, stability, and a low false recognition rate, iris recognition has been successfully applied in many fields. To date, however, most iris recognition systems have been PC-based; a PC is not portable and requires more power. In this paper, we propose an embedded iris recognition system based on ARM. Considering the requirements of iris image acquisition and the recognition algorithm, we analyzed the design of the iris image acquisition module, designed the ARM processing module and its peripherals, studied the Linux platform and the recognition algorithm running on it, and finally realized the design of an ARM-based iris imaging and recognition system. Experimental results show that the ARM platform we used is fast enough to run the iris recognition algorithm, and that the data stream flows smoothly between the camera and the ARM chip under the embedded Linux system. This is an effective way to use ARM to build a portable embedded iris recognition system.

  1. Hybrid simulated annealing and its application to optimization of hidden Markov models for visual speech recognition.

    PubMed

    Lee, Jong-Seok; Park, Cheol Hoon

    2010-08-01

    We propose a novel stochastic optimization algorithm, hybrid simulated annealing (SA), to train hidden Markov models (HMMs) for visual speech recognition. In our algorithm, SA is combined with a local optimization operator that substitutes a better solution for the current one to improve the convergence speed and the quality of solutions. We mathematically prove that the sequence of objective values in the algorithm converges in probability to the global optimum. The algorithm is applied to train HMMs that are used as visual speech recognizers. While the popular training method for HMMs, the expectation-maximization algorithm, achieves only local optima in the parameter space, the proposed method can perform global optimization of the HMM parameters and thereby obtain solutions yielding improved recognition performance. The superiority of the proposed algorithm over conventional ones is demonstrated via isolated word recognition experiments.
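    The hybrid scheme (simulated annealing plus a local optimization operator that substitutes a better neighboring solution for the current candidate) can be illustrated on a toy one-dimensional objective; everything below is a minimal sketch under that simplification, not the paper's HMM training code:

```python
import math
import random

def hybrid_sa(objective, x0, n_iters=3000, t0=1.0, cooling=0.995, step=0.5, seed=1):
    """Minimize `objective` by simulated annealing combined with a greedy
    local optimization operator, in the spirit of the hybrid scheme above."""
    rng = random.Random(seed)
    x, fx = x0, objective(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(n_iters):
        cand = x + rng.uniform(-step, step)          # SA random proposal
        # Local operator: probe small neighbors and substitute a better
        # solution for the candidate if one is found.
        for dx in (-0.05, 0.05):
            if objective(cand + dx) < objective(cand):
                cand += dx
        fc = objective(cand)
        # Metropolis acceptance: always accept improvements, occasionally
        # accept worse solutions to escape local optima.
        if fc < fx or rng.random() < math.exp((fx - fc) / max(t, 1e-12)):
            x, fx = cand, fc
        if fx < best_f:
            best_x, best_f = x, fx
        t *= cooling                                  # geometric cooling
    return best_x, best_f

# Toy multimodal objective standing in for an HMM likelihood surface;
# its ripples create local minima that pure greedy descent can get stuck in.
f = lambda x: (x - 2.0) ** 2 + 0.3 * math.sin(8.0 * x)
best_x, best_f = hybrid_sa(f, x0=-4.0)
```

In the paper the "solution" is the full HMM parameter set rather than a scalar, but the accept/cool/locally-improve loop has the same shape.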

  2. 78 FR 58305 - Honeywell International, Inc.; Analysis of Agreement Containing Consent Order To Aid Public Comment

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-09-23

    ..., formulas, patterns, devices, manufacturing processes, or customer names. If you want the Commission to give... barcode scanners, barcode printers, RFID systems and voice recognition systems. III. Scan Engines The...

  3. Top 10 "Smart" Technologies for Schools.

    ERIC Educational Resources Information Center

    Fodeman, Doug; Holzberg, Carol S.; Kennedy, Kristen; McIntire, Todd; McLester, Susan; Ohler, Jason; Parham, Charles; Poftak, Amy; Schrock, Kathy; Warlick, David

    2002-01-01

    Describes 10 smart technologies for education, including voice to text software; mobile computing; hybrid computing; virtual reality; artificial intelligence; telementoring; assessment methods; digital video production; fingerprint recognition; and brain functions. Lists pertinent Web sites for each technology. (LRW)

  4. Who Goes There?

    ERIC Educational Resources Information Center

    Vail, Kathleen

    1995-01-01

    Biometrics (hand geometry, iris and retina scanners, voice and facial recognition, signature dynamics, facial thermography, and fingerprint readers) identifies people based on physical characteristics. Administrators worried about kidnapping, vandalism, theft, and violent intruders might welcome these security measures when they become more…

  5. Effects of cross-language voice training on speech perception: Whose familiar voices are more intelligible?

    PubMed Central

    Levi, Susannah V.; Winters, Stephen J.; Pisoni, David B.

    2011-01-01

    Previous research has shown that familiarity with a talker’s voice can improve linguistic processing (herein, “Familiar Talker Advantage”), but this benefit is constrained by the context in which the talker’s voice is familiar. The current study examined how familiarity affects intelligibility by manipulating the type of talker information available to listeners. One group of listeners learned to identify bilingual talkers’ voices from English words, where they learned language-specific talker information. A second group of listeners learned the same talkers from German words, and thus only learned language-independent talker information. After voice training, both groups of listeners completed a word recognition task with English words produced by both familiar and unfamiliar talkers. Results revealed that English-trained listeners perceived more phonemes correct for familiar than unfamiliar talkers, while German-trained listeners did not show improved intelligibility for familiar talkers. The absence of a processing advantage in speech intelligibility for the German-trained listeners demonstrates limitations on the Familiar Talker Advantage, which crucially depends on the language context in which the talkers’ voices were learned; knowledge of how a talker produces linguistically relevant contrasts in a particular language is necessary to increase speech intelligibility for words produced by familiar talkers. PMID:22225059

  6. Systems concept for speech technology application in general aviation

    NASA Technical Reports Server (NTRS)

    North, R. A.; Bergeron, H.

    1984-01-01

    The application potential of voice recognition and synthesis circuits for general aviation, single-pilot IFR (SPIFR) operations is examined. The viewpoint of the pilot was central to the workload analyses and the assessment of the effectiveness of the voice systems. A twin-engine, high-performance general aviation aircraft on a cross-country fixed route was employed as the study model. No actual control movements were considered, and other possible functions were scored by three IFR-rated instructors. The voice systems were concluded to be helpful in alleviating visual and manual workload in SPIFR operations during takeoff, approach, and landing, particularly for data retrieval and entry tasks. Voice synthesis was an aid in alerting the pilot to in-flight problems. It is expected that usable systems will be available within 5 years.

  7. Research on autonomous identification of airport targets based on Gabor filtering and Radon transform

    NASA Astrophysics Data System (ADS)

    Yi, Juan; Du, Qingyu; Zhang, Hong jiang; Zhang, Yao lei

    2017-11-01

    Target recognition is a key enabling technology in intelligent image processing and its applications. With the growth of computer processing power, autonomous target recognition algorithms have gradually become more intelligent and have shown good adaptability. Taking airports as the research object, we analyze airport layout characteristics, construct a knowledge model, and design an autonomous target recognition algorithm based on Gabor filtering and the Radon transform for image processing and feature extraction of airports. The algorithm was verified and achieved good recognition results.
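    A 2-D Gabor filter of the kind the algorithm relies on is a sinusoidal carrier under a Gaussian envelope; the kernel below is a generic textbook construction with illustrative parameters (isotropic envelope, real part only), not the paper's implementation:

```python
import math

def gabor_kernel(size=9, sigma=2.0, theta=0.0, wavelength=4.0):
    """Real part of a 2-D Gabor kernel: Gaussian envelope x cosine carrier."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier runs along orientation theta,
            # which is what makes the filter respond to edges at that angle
            # (e.g., runway stripes in an airport image).
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            carrier = math.cos(2.0 * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

k = gabor_kernel()
# The kernel peaks at its center, where envelope = carrier = 1.
```

Convolving an image with a bank of such kernels at several orientations, then applying a Radon transform to the responses, is one standard way to expose the long straight structures that characterize runways.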

  8. Detection of Terrorist Preparations by an Artificial Intelligence Expert System Employing Fuzzy Signal Detection Theory

    DTIC Science & Technology

    2004-10-25

    FUSEDOT does not require facial recognition , or video surveillance of public areas, both of which are apparently a component of TIA ([26], pp...does not use fuzzy signal detection. Involves facial recognition and video surveillance of public areas. Involves monitoring the content of voice...fuzzy signal detection, which TIA does not. Second, FUSEDOT would be easier to develop, because it does not require the development of facial

  9. Robot Command Interface Using an Audio-Visual Speech Recognition System

    NASA Astrophysics Data System (ADS)

    Ceballos, Alexánder; Gómez, Juan; Prieto, Flavio; Redarce, Tanneguy

    In recent years, audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing, and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This paper presents an automatic command recognition system using audio-visual information. The system is intended to control the da Vinci laparoscopic surgical robot. The audio signal is processed using the Mel Frequency Cepstral Coefficients parametrization method. In addition, features based on the points that define the outer contour of the mouth according to the MPEG-4 standard are used to extract the visual speech information.

  10. An innovative multimodal virtual platform for communication with devices in a natural way

    NASA Astrophysics Data System (ADS)

    Kinkar, Chhayarani R.; Golash, Richa; Upadhyay, Akhilesh R.

    2012-03-01

    As technology advances, people are increasingly interested in communicating with machines and computers naturally. This makes machines more compact and portable by avoiding remotes, keyboards, etc., and it helps users live in an environment freer of electromagnetic waves. This idea has made recognition of natural modalities in human-computer interaction an appealing and promising research field. At the same time, it has been observed that using a single mode of interaction limits the full utilization of commands as well as data flow. In this paper, a multimodal platform is proposed in which, out of the many natural modalities (eye gaze, speech, voice, face, etc.), human gestures are combined with human voice to minimize the mean square error. This loosens the strict environment needed for accurate and robust interaction with a single mode. Gestures complement speech: gestures are ideal for direct object manipulation, while natural language is suited to descriptive tasks. Human-computer interaction basically requires two broad stages: recognition and interpretation. Recognition and interpretation of natural modalities in complex binary instructions is a difficult task, as it integrates the real world with a virtual environment. The main idea of the paper is to develop an efficient model for fusing data coming from heterogeneous sensors, a camera and a microphone. Through this work we have found that efficiency increases when heterogeneous data (image and voice) are combined at the feature level using artificial intelligence. The long-term goal of this work is to design a robust system for users who are physically impaired or have little technical knowledge.

  11. On the definition and interpretation of voice selective activation in the temporal cortex

    PubMed Central

    Bethmann, Anja; Brechmann, André

    2014-01-01

    Regions along the superior temporal sulci and in the anterior temporal lobes have been found to be involved in voice processing. It has even been argued that parts of the temporal cortices serve as voice-selective areas. Yet, evidence for voice-selective activation in the strict sense is still missing. The current fMRI study aimed at assessing the degree of voice-specific processing in different parts of the superior and middle temporal cortices. To this end, voices of famous persons were contrasted with widely different categories, which were sounds of animals and musical instruments. The argumentation was that only brain regions with statistically proven absence of activation by the control stimuli may be considered as candidates for voice-selective areas. Neural activity was found to be stronger in response to human voices in all analyzed parts of the temporal lobes except for the middle and posterior STG. More importantly, the activation differences between voices and the other environmental sounds increased continuously from the mid-posterior STG to the anterior MTG. Here, only voices but not the control stimuli excited an increase of the BOLD response above a resting baseline level. The findings are discussed with reference to the function of the anterior temporal lobes in person recognition and the general question on how to define selectivity of brain regions for a specific class of stimuli or tasks. In addition, our results corroborate recent assumptions about the hierarchical organization of auditory processing building on a processing stream from the primary auditory cortices to anterior portions of the temporal lobes. PMID:25071527

  12. Should visual speech cues (speechreading) be considered when fitting hearing aids?

    NASA Astrophysics Data System (ADS)

    Grant, Ken

    2002-05-01

    When talker and listener are face-to-face, visual speech cues become an important part of the communication environment, and yet, these cues are seldom considered when designing hearing aids. Models of auditory-visual speech recognition highlight the importance of complementary versus redundant speech information for predicting auditory-visual recognition performance. Thus, for hearing aids to work optimally when visual speech cues are present, it is important to know whether the cues provided by amplification and the cues provided by speechreading complement each other. In this talk, data will be reviewed that show nonmonotonicity between auditory-alone speech recognition and auditory-visual speech recognition, suggesting that efforts designed solely to improve auditory-alone recognition may not always result in improved auditory-visual recognition. Data will also be presented showing that one of the most important speech cues for enhancing auditory-visual speech recognition performance, voicing, is often the cue that benefits least from amplification.

  13. Wavelet-based associative memory

    NASA Astrophysics Data System (ADS)

    Jones, Katharine J.

    2004-04-01

    Faces provide important characteristics for a person's identification. In security checks, face recognition remains in continuous use despite other approaches (e.g., fingerprints, voice recognition, pupil contraction, DNA scanners). With an associative memory, the output data is recalled directly using the input data. This can be achieved with a Nonlinear Holographic Associative Memory (NHAM). This approach can also distinguish between strongly correlated images and images that are partially or totally enclosed by others. Adaptive wavelet lifting has been used for Content-Based Image Retrieval. In this paper, adaptive wavelet lifting is applied to face recognition to achieve an associative memory.

  14. Outcomes Measurement in Voice Disorders: Application of an Acoustic Index of Dysphonia Severity

    ERIC Educational Resources Information Center

    Awan, Shaheen N.; Roy, Nelson

    2009-01-01

    Purpose: The purpose of this experiment was to assess the ability of an acoustic model composed of both time-based and spectral-based measures to track change following voice disorder treatment and to serve as a possible treatment outcomes measure. Method: A weighted, four-factor acoustic algorithm consisting of shimmer, pitch sigma, the ratio of…

  15. Start/End Delays of Voiced and Unvoiced Speech Signals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Herrnstein, A

    Recent experiments using low-power EM-radar-like sensors (e.g., GEMs) have demonstrated a new method for measuring vocal fold activity and the onset times of voiced speech, as vocal fold contact begins to take place. Similarly, the end time of a voiced speech segment can be measured. Second, it appears that in most normal uses of American English speech, unvoiced-speech segments directly precede or directly follow voiced-speech segments. For many applications, it is useful to know typical duration times of these unvoiced speech segments. A corpus of spoken TIMIT words, phrases, and sentences, assembled earlier and recorded using simultaneously measured acoustic and EM-sensor glottal signals from 16 male speakers, was used for this study. By inspecting the onset (or end) of unvoiced speech using the acoustic signal, and the onset (or end) of voiced speech using the EM-sensor signal, the average duration times for unvoiced segments preceding the onset of vocalization were found to be 300 ms, and for following segments, 500 ms. An unvoiced speech period is then defined in time, first by using the onset of the EM-sensed glottal signal as the onset-time marker for the voiced speech segment and the end marker for the unvoiced segment. Then, by subtracting 300 ms from the onset time mark of voicing, the unvoiced speech segment start time is found. Similarly, the times for a following unvoiced speech segment can be found. While data of this nature have proven to be useful for work in our laboratory, a great deal of additional work remains to validate such data for use with general populations of users. These procedures have been useful for applying optimal processing algorithms over time segments of unvoiced, voiced, and non-speech acoustic signals. For example, these data appear to be of use in speaker validation, in vocoding, and in denoising algorithms.
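    The bookkeeping described above (back off 300 ms from the EM-sensed voicing onset, extend 500 ms past the voicing end) can be sketched as follows; the function name and segment representation are illustrative:

```python
# Estimate unvoiced-speech segment boundaries around a voiced segment,
# using the average durations reported above: unvoiced speech precedes
# voicing by ~300 ms and follows it by ~500 ms.

PRECEDING_UNVOICED_S = 0.300
FOLLOWING_UNVOICED_S = 0.500

def unvoiced_segments(voiced_onset, voiced_end):
    """Return (preceding, following) unvoiced segments as (start, end) tuples.

    voiced_onset / voiced_end: times in seconds taken from the EM-sensor
    glottal signal, which marks where vocal fold contact begins and ends.
    """
    preceding = (max(0.0, voiced_onset - PRECEDING_UNVOICED_S), voiced_onset)
    following = (voiced_end, voiced_end + FOLLOWING_UNVOICED_S)
    return preceding, following

pre, post = unvoiced_segments(1.20, 1.85)
```

Segments delimited this way can then be handed to the segment-specific processing (speaker validation, vocoding, denoising) the abstract mentions.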

  16. Automated Recognition of 3D Features in GPIR Images

    NASA Technical Reports Server (NTRS)

    Park, Han; Stough, Timothy; Fijany, Amir

    2007-01-01

    A method of automated recognition of three-dimensional (3D) features in images generated by ground-penetrating imaging radar (GPIR) is undergoing development. GPIR 3D images can be analyzed to detect and identify such subsurface features as pipes and other utility conduits. Until now, much of the analysis of GPIR images has been performed manually by expert operators who must visually identify and track each feature. The present method is intended to satisfy a need for more efficient and accurate analysis by means of algorithms that can automatically identify and track subsurface features, with minimal supervision by human operators. In this method, data from multiple sources (for example, data on different features extracted by different algorithms) are fused together for identifying subsurface objects. The algorithms of this method can be classified in several different ways. In one classification, the algorithms fall into three classes: (1) image-processing algorithms, (2) feature- extraction algorithms, and (3) a multiaxis data-fusion/pattern-recognition algorithm that includes a combination of machine-learning, pattern-recognition, and object-linking algorithms. The image-processing class includes preprocessing algorithms for reducing noise and enhancing target features for pattern recognition. The feature-extraction algorithms operate on preprocessed data to extract such specific features in images as two-dimensional (2D) slices of a pipe. Then the multiaxis data-fusion/ pattern-recognition algorithm identifies, classifies, and reconstructs 3D objects from the extracted features. In this process, multiple 2D features extracted by use of different algorithms and representing views along different directions are used to identify and reconstruct 3D objects. 
In object linking, which is an essential part of this process, features identified in successive 2D slices and located within a threshold radius of identical features in adjacent slices are linked in a directed-graph data structure. Relative to past approaches, this multiaxis approach offers the advantages of more reliable detections, better discrimination of objects, and provision of redundant information, which can be helpful in filling gaps in feature recognition by one of the component algorithms. The image-processing class also includes postprocessing algorithms that enhance identified features to prepare them for further scrutiny by human analysts (see figure). Enhancement of images as a postprocessing step is a significant departure from traditional practice, in which enhancement of images is a preprocessing step.
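    The object-linking step (connecting features in successive 2D slices that lie within a threshold radius of each other into a directed-graph data structure) can be sketched as follows; the feature representation and radius are illustrative:

```python
import math

def link_features(slices, radius=1.5):
    """Link features across successive 2D slices into directed edges.

    slices: list of slices; each slice is a list of (x, y) feature centers.
    Returns edges ((slice_i, feat_j), (slice_i+1, feat_k)) whenever two
    features in adjacent slices lie within `radius` of each other.
    """
    edges = []
    for i in range(len(slices) - 1):
        for j, (x1, y1) in enumerate(slices[i]):
            for k, (x2, y2) in enumerate(slices[i + 1]):
                if math.hypot(x2 - x1, y2 - y1) <= radius:
                    edges.append(((i, j), (i + 1, k)))
    return edges

# Three slices of a hypothetical pipe cross-section drifting slowly in x,
# plus one unrelated feature that should not be linked.
slices = [[(0.0, 0.0)], [(0.5, 0.1), (8.0, 8.0)], [(1.0, 0.2)]]
edges = link_features(slices)
# Chained edges (0,0)->(1,0)->(2,0) trace one 3D object through the volume.
```

Following chains of such edges through the directed graph is what reconstructs a 3D object (e.g., a pipe) from its per-slice 2D detections.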

  17. Using voice to create hospital progress notes: Description of a mobile application and supporting system integrated with a commercial electronic health record.

    PubMed

    Payne, Thomas H; Alonso, W David; Markiel, J Andrew; Lybarger, Kevin; White, Andrew A

    2018-01-01

    We describe the development and design of a smartphone app-based system for creating inpatient progress notes using voice: commercial automatic speech recognition software, text processing to recognize spoken voice commands and format the note, and integration with a commercial EHR. This new system fits hospital rounding workflow and was used to support a randomized clinical trial testing whether the use of voice to create notes improves the timeliness of note availability, note quality, and physician satisfaction with the note creation process. The system was used to create 709 notes, which were placed in the corresponding patients' EHR records. The median time from pressing the Send button to the appearance of the formatted note in the Inbox was 8.8 min. The system was generally very reliable, accepted by physician users, and secure. This approach provides an alternative to the use of a keyboard and templates to create progress notes and may appeal to physicians who prefer voice to typing. Copyright © 2017 Elsevier Inc. All rights reserved.

  18. A posteriori error estimates in voice source recovery

    NASA Astrophysics Data System (ADS)

    Leonov, A. S.; Sorokin, V. N.

    2017-12-01

    The inverse problem of voice source pulse recovery from a segment of a speech signal is considered. A special mathematical model relating these quantities is used for the solution. A variational method for solving the inverse problem of voice source recovery is proposed for a new parametric class of sources: piecewise-linear sources (PWL-sources). A technique for a posteriori numerical error estimation of the obtained solutions is also presented. A computer study of the adequacy of the adopted speech production model with PWL-sources is performed by solving the inverse problem for various types of voice signals, along with a corresponding study of the a posteriori error estimates. Numerical experiments on speech signals show satisfactory properties of the proposed a posteriori error estimates, which represent upper bounds on the possible errors in solving the inverse problem. The estimate of the most probable error in determining the source-pulse shapes is about 7-8% for the investigated speech material. A posteriori error estimates can also be used as a quality criterion for the obtained voice source pulses in speaker recognition applications.

  19. A Joint Time-Frequency and Matrix Decomposition Feature Extraction Methodology for Pathological Voice Classification

    NASA Astrophysics Data System (ADS)

    Ghoraani, Behnaz; Krishnan, Sridhar

    2009-12-01

    The number of people affected by speech problems is increasing as the modern world places increasing demands on the human voice via mobile telephones, voice recognition software, and interpersonal verbal communication. In this paper, we propose a novel methodology for automatic pattern classification of pathological voices. The main contribution of this paper is the extraction of meaningful and unique features using an adaptive time-frequency distribution (TFD) and nonnegative matrix factorization (NMF). We construct the adaptive TFD as an effective signal analysis domain to dynamically track nonstationarity in the speech, and we utilize NMF as a matrix decomposition (MD) technique to quantify the constructed TFD. The proposed method extracts meaningful and unique features from the joint TFD of the speech and automatically identifies and measures the abnormality of the signal. Depending on the abnormality measure of each signal, we classify the signal as normal or pathological. The proposed method is applied to the Massachusetts Eye and Ear Infirmary (MEEI) voice disorders database, which consists of 161 pathological and 51 normal speakers; an overall classification accuracy of 98.6% was achieved.
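    The NMF step (factorizing a nonnegative TFD matrix V ≈ WH) is commonly computed with Lee-Seung multiplicative updates; the sketch below is a generic implementation of that textbook algorithm on a toy matrix, not the authors' code:

```python
import numpy as np

def nmf(V, rank, n_iters=500, seed=0, eps=1e-9):
    """Factor nonnegative V (m x n) as W @ H with W (m x rank), H (rank x n),
    using Lee-Seung multiplicative updates for the Frobenius objective."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H; stays nonnegative
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W; stays nonnegative
    return W, H

# A small nonnegative "TFD-like" matrix of exact rank 2, so a rank-2
# factorization can reconstruct it almost perfectly.
V = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 0.0],
              [0.0, 0.0, 3.0]])
W, H = nmf(V, rank=2)
reconstruction_error = np.linalg.norm(V - W @ H)
```

Because both factors stay nonnegative, the columns of W act as additive time-frequency "parts" and the rows of H as their activations, which is what makes the decomposition usable as a quantification of the TFD.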

  20. Tracking and recognition face in videos with incremental local sparse representation model

    NASA Astrophysics Data System (ADS)

    Wang, Chao; Wang, Yunhong; Zhang, Zhaoxiang

    2013-10-01

    This paper addresses the problem of tracking and recognizing faces in videos via incremental local sparse representation. First, a robust face tracking algorithm is proposed that employs a local sparse appearance model and a covariance pooling method. In the subsequent face recognition stage, a novel template update strategy that incorporates incremental subspace learning allows the recognition algorithm to adapt the template to appearance changes and to reduce the influence of occlusion and illumination variation. This leads to robust video-based face tracking and recognition with desirable performance. In the experiments, we test the quality of face recognition in real-world noisy videos on the YouTube database, which includes 47 celebrities. Our proposed method achieves a high face recognition rate on 95% of all videos. The proposed face tracking and recognition algorithms are also tested on a set of noisy videos under heavy occlusion and illumination variation. The tracking results on challenging benchmark videos demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods. On the challenging dataset in which faces undergo occlusion and illumination variation, and in tracking and recognition experiments under significant pose variation on the University of California, San Diego (Honda/UCSD) database, our proposed method also consistently demonstrates a high recognition rate.

  1. Experimental study on GMM-based speaker recognition

    NASA Astrophysics Data System (ADS)

    Ye, Wenxing; Wu, Dapeng; Nucci, Antonio

    2010-04-01

    Speaker recognition plays a very important role in the field of biometric security. In order to improve recognition performance, many pattern recognition techniques have been explored in the literature. Among these techniques, the Gaussian Mixture Model (GMM) has proved to be an effective statistical model for speaker recognition and is used in most state-of-the-art speaker recognition systems. The GMM is used to represent the 'voice print' of a speaker by modeling the spectral characteristics of the speaker's speech signals. In this paper, we implement a speaker recognition system that consists of preprocessing, Mel-Frequency Cepstrum Coefficient (MFCC) based feature extraction, and GMM-based classification. We test our system with the TIDIGITS data set (325 speakers) and our own recordings of more than 200 speakers; our system achieves a 100% correct recognition rate. Moreover, we also test our system under the scenario in which training samples are from one language but test samples are from a different language; our system again achieves a 100% correct recognition rate, which indicates that it is language independent.
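    The classification stage described above (a per-speaker GMM scored by log-likelihood on an utterance's frames) can be sketched in numpy. This assumes MFCC features have already been extracted; it uses a small hand-rolled EM for a diagonal-covariance GMM, as an illustration rather than the paper's implementation.

    ```python
    import numpy as np

    def fit_diag_gmm(X, k=2, iters=50, seed=0):
        """Fit a diagonal-covariance GMM to feature rows X via EM."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        mu = X[rng.choice(n, size=k, replace=False)]
        var = np.tile(X.var(axis=0) + 1e-6, (k, 1))
        pi = np.full(k, 1.0 / k)
        for _ in range(iters):
            # E-step: responsibilities from per-component log densities
            logp = (np.log(pi)
                    - 0.5 * (np.log(2 * np.pi * var)
                             + (X[:, None, :] - mu) ** 2 / var).sum(axis=2))
            logp -= logp.max(axis=1, keepdims=True)
            r = np.exp(logp)
            r /= r.sum(axis=1, keepdims=True)
            # M-step: re-estimate weights, means, variances
            nk = r.sum(axis=0) + 1e-12
            pi = nk / n
            mu = (r.T @ X) / nk[:, None]
            var = (r.T @ X**2) / nk[:, None] - mu**2 + 1e-6
        return mu, var, pi

    def loglik(X, model):
        """Total log-likelihood of X under a fitted model (log-sum-exp)."""
        mu, var, pi = model
        logp = (np.log(pi)
                - 0.5 * (np.log(2 * np.pi * var)
                         + (X[:, None, :] - mu) ** 2 / var).sum(axis=2))
        m = logp.max(axis=1)
        return float((m + np.log(np.exp(logp - m[:, None]).sum(axis=1))).sum())

    # Toy 4-dimensional "MFCC" features for two speakers.
    rng = np.random.default_rng(1)
    models = {
        "A": fit_diag_gmm(rng.normal(0.0, 1.0, (300, 4))),
        "B": fit_diag_gmm(rng.normal(4.0, 1.0, (300, 4))),
    }
    test = rng.normal(0.0, 1.0, (50, 4))          # utterance by speaker A
    print(max(models, key=lambda s: loglik(test, models[s])))  # prints "A"
    ```

    The decision rule is exactly the one implied by the abstract: the claimed speaker is the model under which the utterance's features are most likely.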

  2. Voice-enabled Knowledge Engine using Flood Ontology and Natural Language Processing

    NASA Astrophysics Data System (ADS)

    Sermet, M. Y.; Demir, I.; Krajewski, W. F.

    2015-12-01

    The Iowa Flood Information System (IFIS) is a web-based platform developed by the Iowa Flood Center (IFC) to provide access to flood inundation maps, real-time flood conditions, flood forecasts, flood-related data, information, and interactive visualizations for communities in Iowa. The IFIS is designed for use by the general public, often people with no domain knowledge and limited general science background. To communicate effectively with such an audience, we have introduced a voice-enabled knowledge engine on flood-related issues in IFIS. Instead of requiring users to navigate the many features and interfaces of the information system and web-based sources, the system provides dynamic computations based on a collection of built-in data, analyses, and methods. The IFIS Knowledge Engine connects to real-time stream gauges, in-house data sources, and analysis and visualization tools to answer natural language questions. Our goal is the systematization of data and modeling results on flood-related issues in Iowa, and to provide an interface for definitive answers to factual queries. The aim of the knowledge engine is to make all flood-related knowledge in Iowa easily accessible to everyone and to support voice-enabled natural language input. We aim to integrate and curate all flood-related data, implement analytical and visualization tools, and make it possible to compute answers from questions. The IFIS explicitly implements analytical methods and models as algorithms and curates all flood-related data and resources so that all these resources are computable. The IFIS Knowledge Engine computes an answer by deriving it from its computational knowledge base: it processes the statement, accesses the data warehouse, runs complex database queries on the server side, and returns outputs in various formats. 
This presentation provides an overview of the IFIS Knowledge Engine, its unique information interface, and its functionality as an educational tool, and discusses future plans for providing knowledge on flood-related issues and resources. The IFIS Knowledge Engine provides an alternative access method to the comprehensive set of tools and data resources available in IFIS. The current implementation accepts free-form input and supports voice recognition within browser and mobile applications.

  3. Assessment of Spectral and Temporal Resolution in Cochlear Implant Users Using Psychoacoustic Discrimination and Speech Cue Categorization.

    PubMed

    Winn, Matthew B; Won, Jong Ho; Moon, Il Joon

    This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). The authors hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. The authors further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Nineteen cochlear implant listeners and 10 listeners with normal hearing participated in a suite of tasks that included spectral ripple discrimination, temporal modulation detection, and syllable categorization, which was split into a spectral cue-based task (targeting the /ba/-/da/ contrast) and a timing cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for cochlear implant listeners. Cochlear implant users were generally less successful at utilizing both spectral and temporal cues for categorization compared with listeners with normal hearing. For the cochlear implant listener group, spectral ripple discrimination was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. 
Temporal modulation detection using 100- and 10-Hz-modulated noise was not correlated either with the cochlear implant subjects' categorization of voice onset time or with word recognition. Word recognition was correlated more closely with categorization of the controlled speech cues than with performance on the psychophysical discrimination tasks. When evaluating people with cochlear implants, controlled speech-based stimuli are feasible to use in tests of auditory cue categorization, to complement traditional measures of auditory discrimination. Stimuli based on specific speech cues correspond to counterpart nonlinguistic measures of discrimination, but potentially show better correspondence with speech perception more generally. The ubiquity of the spectral (formant transition) and temporal (voice onset time) stimulus dimensions across languages highlights the potential to use this testing approach even in cases where English is not the native language.
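    The logistic-regression quantification of categorization responses mentioned above can be sketched as fitting a psychometric curve: the fitted slope indexes how strongly a listener's responses track the acoustic cue. Below is a minimal numpy sketch with simulated data, not the study's actual model, which may include additional terms.

    ```python
    import numpy as np

    def fit_logistic(x, y, lr=0.5, iters=5000):
        """Fit P(response) = 1/(1+exp(-(a + b*x))) by gradient ascent on
        the Bernoulli log-likelihood; the slope b indexes cue sensitivity."""
        a, b = 0.0, 0.0
        for _ in range(iters):
            p = 1.0 / (1.0 + np.exp(-(a + b * x)))
            a += lr * (y - p).mean()
            b += lr * ((y - p) * x).mean()
        return a, b

    # Simulated categorization: the cue is a normalized formant value;
    # a sensitive listener's "da" responses track the cue closely.
    rng = np.random.default_rng(0)
    cue = np.linspace(-1, 1, 200)
    p_true = 1.0 / (1.0 + np.exp(-4.0 * cue))
    resp = (rng.random(200) < p_true).astype(float)
    a, b = fit_logistic(cue, resp)
    print(b > 1.0)  # steep slope: strong reliance on the spectral cue
    ```

    A shallow fitted slope, by contrast, would correspond to the weaker cue use the study reports for many cochlear implant listeners.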

  4. Face sketch recognition based on edge enhancement via deep learning

    NASA Astrophysics Data System (ADS)

    Xie, Zhenzhu; Yang, Fumeng; Zhang, Yuming; Wu, Congzhong

    2017-11-01

    In this paper, we address the face sketch recognition problem. First, we utilize the eigenface algorithm to convert a sketch image into a synthesized face image. Then, to address the low-level vision problems in the synthesized face image, a super-resolution reconstruction algorithm based on a CNN (convolutional neural network) is employed to improve the visual quality. Specifically, we use a lightweight super-resolution structure to learn a residual mapping instead of directly mapping the feature maps from the low-level space to high-level patch representations, which makes the network easier to optimize and lowers its computational complexity. Finally, we adopt the LDA (Linear Discriminant Analysis) algorithm to perform face sketch recognition on the synthesized face images before and after super resolution, respectively. Extensive experiments on the face sketch database (CUFS) from CUHK demonstrate that the recognition rate of the SVM (Support Vector Machine) algorithm improves from 65% to 69% and the recognition rate of the LDA algorithm improves from 69% to 75%. What is more, the synthesized face image after super resolution not only better describes image details such as the hair, nose, and mouth, but also effectively improves the recognition accuracy.
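    The eigenface step the abstract relies on is, at its core, PCA on vectorized images followed by nearest-neighbor matching in the projected space. A toy numpy sketch, with random 16-pixel "images" standing in for real faces and none of the paper's synthesis or CNN stages, is:

    ```python
    import numpy as np

    def eigenfaces(train, k):
        """train: (n_images, n_pixels). Returns the mean image and the
        top-k principal axes (the 'eigenfaces') via SVD."""
        mean = train.mean(axis=0)
        U, s, Vt = np.linalg.svd(train - mean, full_matrices=False)
        return mean, Vt[:k]

    def project(x, mean, comps):
        """Project image(s) onto the eigenface subspace."""
        return (x - mean) @ comps.T

    # Toy gallery: 6 "images" of 16 pixels covering two identities.
    rng = np.random.default_rng(0)
    id1, id2 = rng.random(16), rng.random(16)
    gallery = np.stack([id1 + 0.05 * rng.standard_normal(16) for _ in range(3)]
                       + [id2 + 0.05 * rng.standard_normal(16) for _ in range(3)])
    labels = [1, 1, 1, 2, 2, 2]
    mean, comps = eigenfaces(gallery, k=3)
    probe = id2 + 0.05 * rng.standard_normal(16)  # new image of identity 2
    feats = project(gallery, mean, comps)
    d = np.linalg.norm(feats - project(probe, mean, comps), axis=1)
    print(labels[int(d.argmin())])  # nearest neighbor recovers identity 2
    ```

    Replacing the nearest-neighbor rule with LDA or an SVM in this projected space mirrors the classifiers compared in the abstract.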

  5. Cognitive object recognition system (CORS)

    NASA Astrophysics Data System (ADS)

    Raju, Chaitanya; Varadarajan, Karthik Mahesh; Krishnamurthi, Niyant; Xu, Shuli; Biederman, Irving; Kelley, Troy

    2010-04-01

    We have developed a framework, the Cognitive Object Recognition System (CORS), inspired by current neurocomputational models and psychophysical research, in which multiple recognition algorithms (shape-based geometric primitives, 'geons,' and non-geometric feature-based algorithms) are integrated to provide a comprehensive solution to object recognition and landmarking. Objects are defined as a combination of geons, corresponding to their simple parts, and the relations among the parts. However, those objects that are not easily decomposable into geons, such as bushes and trees, are recognized by CORS using "feature-based" algorithms. The unique interaction between these algorithms is a novel approach that combines the effectiveness of both and takes us closer to a generalized approach to object recognition. CORS allows recognition of objects through a larger range of poses using geometric primitives and performs well under heavy occlusion - about 35% of the object surface is sufficient. Furthermore, the geon composition of an object allows image understanding and reasoning even with novel objects. With reliable landmarking capability, the system improves vision-based robot navigation in GPS-denied environments. Feasibility of the CORS system was demonstrated with real stereo images captured from a Pioneer robot. The system can currently identify doors, door handles, staircases, trashcans and other relevant landmarks in the indoor environment.

  6. Face recognition algorithm using extended vector quantization histogram features.

    PubMed

    Yan, Yan; Lee, Feifei; Wu, Xueqian; Chen, Qiu

    2018-01-01

    In this paper, we propose a face recognition algorithm based on a combination of vector quantization (VQ) and Markov stationary features (MSF). The VQ algorithm has been shown to be an effective method for generating features; it extracts a codevector histogram as a facial feature representation for face recognition. Still, the VQ histogram features are unable to convey spatial structural information, which to some extent limits their usefulness in discrimination. To alleviate this limitation of VQ histograms, we utilize Markov stationary features (MSF) to extend the VQ histogram-based features so as to add spatial structural information. We demonstrate the effectiveness of our proposed algorithm by achieving recognition results superior to those of several state-of-the-art methods on publicly available face databases.
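    The VQ histogram feature itself is straightforward to sketch: learn a codebook with k-means over image blocks, then histogram the codevector assignments. The MSF extension that adds spatial structure is not shown; this is a minimal numpy illustration, not the authors' implementation.

    ```python
    import numpy as np

    def kmeans(X, k, iters=30, seed=0):
        """Plain Lloyd's k-means; returns the learned codebook."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(iters):
            d = ((X[:, None, :] - centers) ** 2).sum(-1)
            assign = d.argmin(axis=1)
            for j in range(k):
                pts = X[assign == j]
                if len(pts):
                    centers[j] = pts.mean(axis=0)
        return centers

    def vq_histogram(blocks, codebook):
        """Assign each image block to its nearest codevector and return
        the normalized codevector histogram (the VQ feature)."""
        d = ((blocks[:, None, :] - codebook) ** 2).sum(-1)
        idx = d.argmin(axis=1)
        hist = np.bincount(idx, minlength=len(codebook)).astype(float)
        return hist / hist.sum()

    rng = np.random.default_rng(0)
    blocks = rng.random((500, 8))        # 500 small image blocks, 8-dim each
    codebook = kmeans(blocks, k=16)
    feature = vq_histogram(blocks, codebook)
    print(len(feature), round(feature.sum(), 6))  # a 16-bin unit histogram
    ```

    As the abstract notes, such a histogram discards where in the image each codevector occurred, which is precisely the limitation the MSF extension targets.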

  7. A Random Forest-based ensemble method for activity recognition.

    PubMed

    Feng, Zengtao; Mo, Lingfei; Li, Meng

    2015-01-01

    This paper presents a multi-sensor ensemble approach to human physical activity (PA) recognition using random forests. We designed an ensemble learning algorithm that integrates several independent Random Forest classifiers, each based on a different sensor feature set, to build a more stable, more accurate, and faster classifier for human activity recognition. To evaluate the algorithm, PA data collected from PAMAP (Physical Activity Monitoring for Aging People), a standard, publicly available database, were used for training and testing. The experimental results show that the algorithm correctly recognizes 19 PA types with an accuracy of 93.44%, while training is faster than with other methods. The ensemble classifier system based on the RF (Random Forest) algorithm achieves high recognition accuracy with fast computation.
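    The ensemble combination step (several classifiers, one per sensor feature set, fused per sample) can be sketched in a few lines of Python. The base Random Forests are assumed to exist already and are replaced here by hard-coded toy predictions; majority voting is one plausible fusion rule, not necessarily the paper's exact one.

    ```python
    from collections import Counter

    def majority_vote(predictions):
        """predictions: list of label sequences, one per base classifier.
        Returns the per-sample majority label."""
        return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

    # Toy per-classifier outputs for four time windows.
    clf_accel = ["walk", "run", "walk", "sit"]   # accelerometer-feature classifier
    clf_gyro  = ["walk", "walk", "walk", "sit"]  # gyroscope-feature classifier
    clf_hr    = ["run",  "run",  "walk", "lie"]  # heart-rate-feature classifier
    print(majority_vote([clf_accel, clf_gyro, clf_hr]))
    # -> ['walk', 'run', 'walk', 'sit']
    ```

    Because each base classifier sees only one sensor's features, a sensor-specific error (like the heart-rate classifier's "lie") is outvoted by the others.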

  8. The fast iris image clarity evaluation based on Tenengrad and ROI selection

    NASA Astrophysics Data System (ADS)

    Gao, Shuqin; Han, Min; Cheng, Xu

    2018-04-01

    In an iris recognition system, the clarity of the iris image is an important factor that influences recognition performance. During recognition, a blurred image may be rejected by the automatic iris recognition system, leading to a failure of identification; it is therefore necessary to evaluate iris image definition before recognition. Considering the existing evaluation methods for iris image definition, we propose a fast algorithm to evaluate the definition of an iris image. In our algorithm, an ROI (Region of Interest) is first extracted based on a reference point, which is determined using the light spots within the pupil; the Tenengrad operator is then used to evaluate the definition of the iris image. Experimental results show that the proposed algorithm accurately distinguishes iris images of different clarity, with the merits of low computational complexity and good effectiveness.
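    The Tenengrad operator used for the definition score is essentially the sum (or mean) of squared Sobel gradient magnitudes. A minimal numpy sketch, applied here to a whole image rather than the paper's light-spot-anchored ROI, is:

    ```python
    import numpy as np

    SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    SOBEL_Y = SOBEL_X.T

    def tenengrad(img):
        """Mean squared Sobel gradient magnitude over the image interior:
        higher values indicate a sharper (better-focused) image."""
        h, w = img.shape
        gx = np.zeros((h - 2, w - 2))
        gy = np.zeros((h - 2, w - 2))
        for i in range(3):
            for j in range(3):
                patch = img[i:i + h - 2, j:j + w - 2]
                gx += SOBEL_X[i, j] * patch
                gy += SOBEL_Y[i, j] * patch
        return float((gx ** 2 + gy ** 2).mean())

    # A sharp vertical edge scores higher than a featureless (blurred) field.
    edge = np.tile((np.arange(32) >= 16).astype(float), (32, 1))
    flat = np.full((32, 32), 0.5)
    print(tenengrad(edge) > tenengrad(flat))  # -> True
    ```

    Restricting this computation to a small pupil-anchored ROI, as the paper proposes, is what keeps the evaluation fast.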

  9. Optimal pattern synthesis for speech recognition based on principal component analysis

    NASA Astrophysics Data System (ADS)

    Korsun, O. N.; Poliyev, A. V.

    2018-02-01

    An algorithm for building an optimal pattern for automatic speech recognition, which increases the probability of correct recognition, is developed and presented in this work. The optimal pattern is formed by decomposing an initial pattern into principal components, which makes it possible to reduce the dimensionality of the multi-parameter optimization problem. In the next step, training samples are introduced and optimal estimates of the principal-component decomposition coefficients are obtained by a numerical parameter optimization algorithm. Finally, we present experimental results that show the improvement in speech recognition achieved by the proposed optimization algorithm.

  10. Automatic measurement of voice onset time using discriminative structured prediction.

    PubMed

    Sonderegger, Morgan; Keshet, Joseph

    2012-12-01

    A discriminative large-margin algorithm for automatic measurement of voice onset time (VOT) is described, considered as a case of predicting structured output from speech. Manually labeled data are used to train a function that takes as input a speech segment of an arbitrary length containing a voiceless stop, and outputs its VOT. The function is explicitly trained to minimize the difference between predicted and manually measured VOT; it operates on a set of acoustic feature functions designed based on spectral and temporal cues used by human VOT annotators. The algorithm is applied to initial voiceless stops from four corpora, representing different types of speech. Using several evaluation methods, the algorithm's performance is near human intertranscriber reliability, and compares favorably with previous work. Furthermore, the algorithm's performance is minimally affected by training and testing on different corpora, and remains essentially constant as the amount of training data is reduced to 50-250 manually labeled examples, demonstrating the method's practical applicability to new datasets.

  11. Nature and extent of person recognition impairments associated with Capgras syndrome in Lewy body dementia

    PubMed Central

    Fiacconi, Chris M.; Barkley, Victoria; Finger, Elizabeth C.; Carson, Nicole; Duke, Devin; Rosenbaum, R. Shayna; Gilboa, Asaf; Köhler, Stefan

    2014-01-01

    Patients with Capgras syndrome (CS) adopt the delusional belief that persons well known to them have been replaced by an imposter. Several current theoretical models of CS attribute such misidentification problems to deficits in covert recognition processes related to the generation of appropriate affective autonomic signals. These models assume intact overt recognition processes for the imposter and, more broadly, for other individuals. As such, it has been suggested that CS could reflect the “mirror-image” of prosopagnosia. The purpose of the current study was to determine whether overt person recognition abilities are indeed always spared in CS. Furthermore, we examined whether CS might be associated with any impairments in overt affective judgments of facial expressions. We pursued these goals by studying a patient with Dementia with Lewy bodies (DLB) who showed clear signs of CS, and by comparing him to another patient with DLB who did not experience CS, as well as to a group of healthy control participants. Clinical magnetic resonance imaging scans revealed medial prefrontal cortex (mPFC) atrophy that appeared to be uniquely associated with the presence of CS. We assessed overt person recognition with three fame recognition tasks, using faces, voices, and names as cues. We also included measures of confidence and probed pertinent semantic knowledge. In addition, participants rated the intensity of fearful facial expressions. We found that CS was associated with overt person recognition deficits when probed with faces and voices, but not with names. Critically, these deficits were not present in the DLB patient without CS. In addition, CS was associated with impairments in overt judgments of affect intensity. Taken together, our findings cast doubt on the traditional view that CS is the mirror-image of prosopagnosia and that it spares overt recognition abilities. 
These findings can still be accommodated by models of CS that emphasize deficits in autonomic responding, to the extent that the potential role of interoceptive awareness in overt judgments is taken into account. PMID:25309399

  12. The Effects of Certain Background Noises on the Performance of a Voice Recognition System.

    DTIC Science & Technology

    1980-09-01


  13. Alternative Speech Communication System for Persons with Severe Speech Disorders

    NASA Astrophysics Data System (ADS)

    Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas

    2009-12-01

    Assistive speech-enabled systems are proposed to help both French- and English-speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim to improve the intelligibility of pathological speech, making it as natural as possible and close to the speaker's original voice. The resynthesized utterances use new basic units, a new concatenation algorithm, and a grafting technique to correct poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. Improvements in the Perceptual Evaluation of Speech Quality (PESQ) value of 5% and of more than 20% are achieved by the speech synthesis systems that deal with SSDs and dysarthria, respectively.

  14. Multi-frame knowledge based text enhancement for mobile phone captured videos

    NASA Astrophysics Data System (ADS)

    Ozarslan, Suleyman; Eren, P. Erhan

    2014-02-01

    In this study, we explore automated text recognition and enhancement using mobile phone captured videos of store receipts. We propose a method that combines Optical Character Recognition (OCR) with our proposed Row Based Multiple Frame Integration (RB-MFI) and Knowledge Based Correction (KBC) algorithms. In this method, the trained OCR engine is first used for recognition; then, the RB-MFI algorithm is applied to the output of the OCR. The RB-MFI algorithm determines and combines the most accurate rows of the text outputs extracted by OCR from multiple frames of the video. After RB-MFI, the KBC algorithm is applied to these rows to correct erroneous characters. Results of the experiments show that the proposed video-based approach, which includes the RB-MFI and KBC algorithms, increases the word recognition rate to 95% and the character recognition rate to 98%.
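    A toy illustration of the multi-frame idea: if each video frame yields an OCR'd list of receipt rows, voting across frames at each row position suppresses frame-specific recognition errors. This simple voter is only a stand-in for RB-MFI, which selects the most accurate rows by its own criteria.

    ```python
    from collections import Counter

    def fuse_rows(frames):
        """frames: per-video-frame lists of OCR'd receipt rows (same length).
        Keep, for each row position, the text read most often across frames."""
        return [Counter(rows).most_common(1)[0][0] for rows in zip(*frames)]

    # Each frame misreads a different row; fusion recovers all three.
    frame1 = ["M1LK 2.49", "BREAD 1.99", "TOTAL 4.48"]
    frame2 = ["MILK 2.49", "BREAD 1.99", "T0TAL 4.48"]
    frame3 = ["MILK 2.49", "8READ 1.99", "TOTAL 4.48"]
    print(fuse_rows([frame1, frame2, frame3]))
    # -> ['MILK 2.49', 'BREAD 1.99', 'TOTAL 4.48']
    ```

    A dictionary- or price-format-aware pass over the fused rows would then play the role of the KBC stage.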

  15. Investigation of air transportation technology at Princeton University, 1983

    NASA Technical Reports Server (NTRS)

    Stengel, Robert F.

    1987-01-01

    Progress is discussed for each of the following areas: voice recognition technology for flight control; guidance and control strategies for penetration of microbursts and wind shear; application of artificial intelligence in flight control systems; and computer-aided aircraft design.

  16. Federal Barriers to Innovation

    ERIC Educational Resources Information Center

    Miller, Raegen; Lake, Robin

    2012-01-01

    With educational outcomes inadequate, resources tight, and students' academic needs growing more complex, America's education system is certainly ready for technological innovation. And technology itself is ripe to be exploited. Devices harnessing cheap computing power have become smart and connected. Voice recognition, artificial intelligence,…

  17. [A new peak detection algorithm of Raman spectra].

    PubMed

    Jiang, Cheng-Zhi; Sun, Qiang; Liu, Ying; Liang, Jing-Qiu; An, Yan; Liu, Bing

    2014-01-01

    The authors propose a new Raman peak recognition method, named the bi-scale correlation algorithm. The algorithm uses a combination of the correlation coefficient and the local signal-to-noise ratio at two scales to identify Raman peaks. We compared the performance of the proposed algorithm with that of the traditional continuous wavelet transform method in MATLAB, and then tested the algorithm on real Raman spectra. The results show that the average time for identifying a Raman spectrum is 0.51 s with the proposed algorithm, versus 0.71 s with the continuous wavelet transform. When the signal-to-noise ratio of a Raman peak is greater than or equal to 6 (modern Raman spectrometers feature an excellent signal-to-noise ratio), the recognition accuracy of the algorithm is higher than 99%, while it is less than 84% with the continuous wavelet transform method. The mean and standard deviation of the peak-position identification error are both smaller for the proposed algorithm than for the continuous wavelet transform method. Simulation analysis and experimental verification show that the new algorithm requires no human intervention and no de-noising or background-removal operations, and offers higher recognition speed and accuracy. The proposed algorithm is practical for Raman peak identification.
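    A far simpler single-scale relative of this idea: flag local maxima whose height exceeds a multiple of a locally estimated noise level. The sketch below illustrates SNR-thresholded peak picking only, not the bi-scale correlation algorithm itself; the threshold of 6 echoes the SNR figure quoted in the abstract.

    ```python
    import numpy as np

    def find_peaks(signal, noise_win=20, snr_min=6.0):
        """Flag local maxima whose height above the local baseline exceeds
        snr_min times the noise level estimated in a surrounding window."""
        peaks = []
        for i in range(1, len(signal) - 1):
            if signal[i] <= signal[i - 1] or signal[i] < signal[i + 1]:
                continue  # not a local maximum
            lo, hi = max(0, i - noise_win), min(len(signal), i + noise_win)
            neighborhood = np.concatenate([signal[lo:i - 1], signal[i + 2:hi]])
            noise = neighborhood.std() + 1e-12
            if signal[i] - neighborhood.mean() >= snr_min * noise:
                peaks.append(i)
        return peaks

    # Synthetic spectrum: baseline noise, one strong line, one sub-threshold bump.
    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 0.1, 400)
    x[100] += 5.0                      # strong Raman line
    x[250] += 0.25                     # bump below the SNR threshold
    print(find_peaks(x))               # the line at index 100 is detected
    ```

    Evaluating such a criterion at two scales and correlating the results is, in outline, what distinguishes the authors' bi-scale approach from this toy.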

  18. Defining the ATC Controller Interface for Data Link Clearances

    NASA Technical Reports Server (NTRS)

    Rankin, James

    1998-01-01

    The Controller Interface (CI) is the primary method for Air Traffic Controllers to communicate with aircraft via Controller-Pilot Data Link Communications (CPDLC). The controller, wearing a microphone/headset, aurally gives instructions to aircraft as he/she would with today's voice radio systems. The CI's voice recognition system converts the instructions to digitized messages that are formatted according to the RTCA DO-219 Operational Performance Standards for ATC Two-Way Data Link Communications. The DO-219 messages are transferred via RS-232 to the ATIDS system for uplink using a Mode-S data link. Pilot acknowledgments of controller messages are downlinked to the ATIDS system and transferred to the CI. A computer monitor is used to convey information to the controller. Aircraft data from the ARTS database are displayed on flight strips. The flight strips are electronic versions of the strips currently used in the ATC system. Outgoing controller messages cause the respective strip to change color to indicate an unacknowledged transmission. The message text is shown on the flight strips for reference. When the pilot acknowledges the message, the strip returns to its normal color. A map of the airport can also be displayed on the monitor. In addition to voice recognition, the controller can enter messages using the monitor's touch screen or by mouse/keyboard.

  19. An improved finger-vein recognition algorithm based on template matching

    NASA Astrophysics Data System (ADS)

    Liu, Yueyue; Di, Si; Jin, Jian; Huang, Daoping

    2016-10-01

    Finger-vein recognition has become one of the most popular biometric identification methods, and research on recognition algorithms remains the key issue in this field. Many applicable algorithms have been developed so far. However, some problems remain in practice: variance in finger position may lead to image distortion and shifting, and matching parameters determined from experience during the identification process may reduce the adaptability of an algorithm. Focusing on the problems mentioned above, this paper proposes an improved finger-vein recognition algorithm based on template matching. To enhance the robustness of the algorithm to image distortion, the least-squares error method is adopted to correct for an oblique finger. During feature extraction, a local adaptive threshold method is adopted. As regards the matching scores, we optimize the translation preferences as well as the matching distance between the input images and registered images on the basis of the Naoto Miura algorithm. Experimental results indicate that the proposed method effectively improves robustness under finger shifting and rotation.
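    The core of Miura-style template matching (searching over small translations for the best agreement between two binary vein maps) can be sketched in numpy. The preprocessing, oblique-finger correction, and adaptive thresholding described above are not shown; the agreement score here is a simple pixel-match fraction, used for illustration.

    ```python
    import numpy as np

    def match_score(probe, template, max_shift=4):
        """Best normalized agreement between two equal-size binary vein maps
        over a range of horizontal/vertical translations."""
        best = 0.0
        h, w = template.shape
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                # Overlapping windows of probe and template at this offset
                a = probe[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
                b = template[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
                best = max(best, float((a == b).mean()))
        return best

    rng = np.random.default_rng(0)
    tpl = rng.random((20, 20)) > 0.5
    probe = np.roll(tpl, (2, 1), axis=(0, 1))       # same finger, shifted
    other = rng.random((20, 20)) > 0.5              # different finger
    print(match_score(probe, tpl) > match_score(other, tpl))  # -> True
    ```

    Searching over offsets is exactly what makes the score tolerant of the finger-position shifts the abstract identifies as a practical problem.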

  20. Modular biometric system

    NASA Astrophysics Data System (ADS)

    Hsu, Charles; Viazanko, Michael; O'Looney, Jimmy; Szu, Harold

    2009-04-01

    The Modular Biometric System (MBS) is an approach to support aided target recognition (AiTR) of cooperative and/or non-cooperative standoff biometrics in persistent area surveillance. Advanced active and passive EO/IR and RF sensor suites are not considered here, nor are ROC characteristics (PD versus FAR) as a function of standoff POT. Our goal is to catch the two dozen "most wanted" (MW), further separated ad hoc into a woman MW class and a man MW class, given an archival sparse frontal-face database, by means of various new instantaneous inputs called probing faces. We present an advanced algorithm: a mini-max classifier, a sparse-sample realization of the Cramer-Rao Fisher bound of the maximum likelihood classifier, which minimizes the dispersion within the same woman classes and maximizes the separation among different man and woman classes, based on the simple feature space of MIT Pentland eigenfaces. The original aspect consists of a modular structured design approach at the system level, with multi-level architectures, multiple computing paradigms, and adaptable/evolvable techniques, allowing a scalable structure in terms of biometric algorithms, identification quality, sensors, database complexity, database integration, and component heterogeneity. The MBS comprises a number of biometric technologies, including fingerprints, vein maps, and voice and face recognition with innovative DSP algorithms, together with their hardware implementations, such as Field Programmable Gate Arrays (FPGAs). Biometric technologies and the composed modular biometric system are significant for governmental agencies, enterprises, banks, and all other organizations that need to protect people or control access to critical resources.

  1. The Next Era: Deep Learning in Pharmaceutical Research

    PubMed Central

    Ekins, Sean

    2016-01-01

    Over the past decade we have witnessed the increasing sophistication of machine learning algorithms applied in daily use, from internet searches, voice recognition, and social network software to machine vision software in cameras, phones, robots, and self-driving cars. Pharmaceutical research has also seen its fair share of machine learning developments. For example, applying such methods to mine the growing datasets that are created in drug discovery not only enables us to learn from the past but also to predict a molecule’s properties and behavior in the future. The latest machine learning algorithm garnering significant attention is deep learning, an artificial neural network with multiple hidden layers. Publications over the last 3 years suggest that this algorithm may have advantages over previous machine learning methods and offer a slight but discernible edge in predictive performance. The time has come for a balanced review of this technique, but also to apply machine learning methods such as deep learning across a wider array of endpoints relevant to pharmaceutical research for which the datasets are growing, such as physicochemical property prediction, formulation prediction, absorption, distribution, metabolism, excretion and toxicity (ADME/Tox), target prediction, skin permeation, etc. We also show that there are many potential applications of deep learning beyond cheminformatics. It will be important to perform prospective testing (which has been carried out rarely to date) in order to convince skeptics that there will be benefits from investing in this technique. PMID:27599991

  2. Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction.

    PubMed

    Nass, C; Lee, K M

    2001-09-01

    Would people exhibit similarity-attraction and consistency-attraction toward unambiguously computer-generated speech even when personality is clearly not relevant? In Experiment 1, participants (extrovert or introvert) heard a synthesized voice (extrovert or introvert) on a book-buying Web site. Participants accurately recognized personality cues in text to speech and showed similarity-attraction in their evaluation of the computer voice, the book reviews, and the reviewer. Experiment 2, in a Web auction context, added personality of the text to the previous design. The results replicated Experiment 1 and demonstrated consistency (voice and text personality)-attraction. To maximize liking and trust, designers should set parameters, for example, words per minute or frequency range, that create a personality that is consistent with the user and the content being presented.

  3. Development of a mobile satellite communication unit

    NASA Technical Reports Server (NTRS)

    Suzuki, Ryutaro; Ikegami, Tetsushi; Hamamoto, Naokazu; Taguchi, Tetsu; Endo, Nobuhiro; Yamamoto, Osamu; Ichiyoshi, Osamu

    1988-01-01

    A compact 210(W) x 280(H) x 330(D) mm mobile terminal capable of transmitting voice and data through L-band mobile satellites is described. The voice codec can convert analog voice to or from digital codes at rates of 9.6, 8, and 4.8 kb/s using an MPC algorithm. The terminal operates from a single 12 V vehicle battery. The equipment can operate at any L-band frequency allocated for mobile use in full duplex mode and will soon be put into a field test via Japan's ETS-V satellite.

  4. Newly learned word forms are abstract and integrated immediately after acquisition

    PubMed Central

    Kapnoula, Efthymia C.; McMurray, Bob

    2015-01-01

    A hotly debated question in word learning concerns the conditions under which newly learned words compete or interfere with familiar words during spoken word recognition. This has recently been described as a key marker of the integration of a new word into the lexicon and was thought to require consolidation (Dumay & Gaskell, Psychological Science, 18, 35–39, 2007; Gaskell & Dumay, Cognition, 89, 105–132, 2003). Recently, however, Kapnoula, Packard, Gupta, and McMurray (Cognition, 134, 85–99, 2015) showed that interference can be observed immediately after a word is first learned, implying very rapid integration of new words into the lexicon. It is an open question whether these kinds of effects derive from episodic traces of novel words or from more abstract and lexicalized representations. Here we addressed this question by testing inhibition for newly learned words using training and test stimuli presented in different talker voices. During training, participants were exposed to a set of nonwords spoken by a female speaker. Immediately after training, we assessed the ability of the novel word forms to inhibit familiar words, using a variant of the visual world paradigm. Crucially, the test items were produced by a male speaker. An analysis of fixations showed that even with a change in voice, newly learned words interfered with the recognition of similar known words. These findings show that lexical competition effects from newly learned words spread across different talker voices, which suggests that newly learned words can be sufficiently lexicalized, and abstract with respect to talker voice, without consolidation. PMID:26202702

  5. Dragon Stream Cipher for Secure Blackbox Cockpit Voice Recorder

    NASA Astrophysics Data System (ADS)

    Akmal, Fadira; Michrandi Nasution, Surya; Azmi, Fairuz

    2017-11-01

    An aircraft blackbox is a device used to record all aircraft information; it consists of a Flight Data Recorder (FDR) and a Cockpit Voice Recorder (CVR). The Cockpit Voice Recorder contains the conversations in the aircraft during the flight. Investigations of aircraft crashes usually take a long time because it is difficult to find the aircraft blackbox. The blackbox should therefore have the ability to send information to other places, and it must have a data security system: data security is a very important part of the information exchange process. The system in this research performs encryption and decryption of Cockpit Voice Recorder data, by authorized parties only, using the Dragon stream cipher algorithm. The tests performed measure encryption and decryption time and the avalanche effect. The results show encryption and decryption times of 0.85 and 1.84 seconds, respectively, for 30 seconds of Cockpit Voice Recorder data, with an avalanche effect of 48.67%.
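
    The avalanche-effect figure reported above can be measured generically: flip one key bit, re-encrypt, and count how many ciphertext bits change. A minimal sketch in Python, using a SHA-256 counter-mode keystream as a stand-in for the actual Dragon cipher (the keystream construction here is illustrative only, not the Dragon algorithm itself):

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (stand-in for Dragon)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Stream-cipher encryption: XOR the plaintext with the keystream."""
    return bytes(p ^ k for p, k in zip(plaintext, keystream(key, len(plaintext))))

def avalanche_effect(key: bytes, plaintext: bytes) -> float:
    """Fraction of ciphertext bits that flip when one key bit is flipped."""
    c1 = encrypt(key, plaintext)
    flipped = bytes([key[0] ^ 0x01]) + key[1:]  # flip the lowest bit of byte 0
    c2 = encrypt(flipped, plaintext)
    changed = sum(bin(a ^ b).count("1") for a, b in zip(c1, c2))
    return changed / (8 * len(c1))
```

    For a well-designed stream cipher the measured value should hover near 50%, which is consistent with the 48.67% the authors report.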

  6. The Temporal Lobes Differentiate between the Voices of Famous and Unknown People: An Event-Related fMRI Study on Speaker Recognition

    PubMed Central

    Bethmann, Anja; Scheich, Henning; Brechmann, André

    2012-01-01

    It is widely accepted that the perception of human voices is supported by neural structures located along the superior temporal sulci. However, there is an ongoing discussion as to what extent the activations found in fMRI studies are evoked by the vocal features themselves or are the result of phonetic processing. To show that the temporal lobes are indeed engaged in voice processing, short utterances spoken by famous and unknown people were presented to healthy young participants whose task it was to identify the familiar speakers. In two event-related fMRI experiments, the temporal lobes were found to differentiate between familiar and unfamiliar voices such that named voices elicited higher BOLD signal intensities than unfamiliar voices. Yet, the temporal cortices did not only discriminate between familiar and unfamiliar voices. Experiment 2, which required overtly spoken responses and allowed us to distinguish between four familiarity grades, revealed that there was a fine-grained differentiation between all of these familiarity levels, with higher familiarity being associated with larger BOLD signal amplitudes. Finally, we observed a gradual response change such that the BOLD signal differences between unfamiliar and highly familiar voices increased with the distance of an area from the transverse temporal gyri, especially towards the anterior temporal cortex and the middle temporal gyri. Therefore, the results suggest that (the anterior and non-superior portions of) the temporal lobes participate in voice-specific processing independent of the phonetic components also involved in processing spoken speech material. PMID:23112826

  7. The non-trusty clown attack on model-based speaker recognition systems

    NASA Astrophysics Data System (ADS)

    Farrokh Baroughi, Alireza; Craver, Scott

    2015-03-01

    Biometric detectors for speaker identification commonly employ a statistical model of a subject's voice, such as a Gaussian mixture model (GMM), that combines multiple means to improve detector performance. This allows a malicious insider to amend or append a component of a subject's statistical model so that the detector behaves normally except under a carefully engineered circumstance. An attacker can thus force a misclassification of his or her voice only when desired, by smuggling data into a database far in advance of an attack. Note that the attack is possible if the attacker has access to the database, even for a limited time, to modify the victim's model. We exhibit such an attack on a speaker identification system, in which an attacker forces a misclassification by speaking in an unusual voice and replacing the least-weighted component of the victim's model with the most-weighted component of the attacker's unusual-voice model. The attacker makes his or her voice unusual during the attack because the attacker's normal voice model may already be in the database; by attacking with an unusual voice, the attacker retains the option to be recognized as himself or herself when talking normally, or as the victim when talking in the unusual manner. By attaching an appropriately weighted vector to a victim's model, we can impersonate all users in our simulations while avoiding unwanted false rejections.
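
    The tampering step described above, swapping the victim's least-weighted mixture component for the attacker's dominant one, can be sketched with a diagonal-covariance GMM in NumPy. All model shapes and names below are illustrative assumptions of this sketch, not the authors' implementation:

```python
import numpy as np

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of a feature vector x under a diagonal-covariance GMM."""
    comp = -0.5 * np.sum((x - means) ** 2 / variances
                         + np.log(2 * np.pi * variances), axis=1)
    return np.log(np.sum(weights * np.exp(comp)))  # logsumexp is safer in practice

def poison_model(victim, attacker):
    """Replace the victim's least-weighted component with the attacker's
    most-weighted one, then renormalise the mixture weights."""
    w, m, v = (a.copy() for a in victim)
    aw, am, av = attacker
    drop = int(np.argmin(w))    # victim's least-weighted component
    keep = int(np.argmax(aw))   # attacker's dominant component
    m[drop], v[drop], w[drop] = am[keep], av[keep], aw[keep]
    w /= w.sum()
    return w, m, v
```

    After poisoning, feature vectors drawn from the attacker's "unusual voice" score highly under the victim's model, while the victim's normal voice region is left largely intact.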

  8. Mothers Consistently Alter Their Unique Vocal Fingerprints When Communicating with Infants.

    PubMed

    Piazza, Elise A; Iordan, Marius Cătălin; Lew-Williams, Casey

    2017-10-23

    The voice is the most direct link we have to others' minds, allowing us to communicate using a rich variety of speech cues [1, 2]. This link is particularly critical early in life as parents draw infants into the structure of their environment using infant-directed speech (IDS), a communicative code with unique pitch and rhythmic characteristics relative to adult-directed speech (ADS) [3, 4]. To begin breaking into language, infants must discern subtle statistical differences about people and voices in order to direct their attention toward the most relevant signals. Here, we uncover a new defining feature of IDS: mothers significantly alter statistical properties of vocal timbre when speaking to their infants. Timbre, the tone color or unique quality of a sound, is a spectral fingerprint that helps us instantly identify and classify sound sources, such as individual people and musical instruments [5-7]. We recorded 24 mothers' naturalistic speech while they interacted with their infants and with adult experimenters in their native language. Half of the participants were English speakers, and half were not. Using a support vector machine classifier, we found that mothers consistently shifted their timbre between ADS and IDS. Importantly, this shift was similar across languages, suggesting that such alterations of timbre may be universal. These findings have theoretical implications for understanding how infants tune in to their local communicative environments. Moreover, our classification algorithm for identifying infant-directed timbre has direct translational implications for speech recognition technology. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. A preliminary analysis of human factors affecting the recognition accuracy of a discrete word recognizer for C3 systems

    NASA Astrophysics Data System (ADS)

    Yellen, H. W.

    1983-03-01

    Literature pertaining to voice recognition abounds with information relevant to the assessment of transitory speech recognition devices. In the past, engineering requirements have dictated the path this technology followed, but other factors do exist that influence recognition accuracy. This thesis explores the impact of human factors on the successful recognition of speech, principally addressing the differences or variability among users. A Threshold Technology T-600 with a 100-utterance vocabulary was used to test 44 subjects. A statistical analysis was conducted on 5 generic categories of human factors: occupational, operational, psychological, physiological and personal. How the equipment is trained and the experience level of the speaker were found to be key characteristics influencing recognition accuracy. To a lesser extent, computer experience, time of week, accent, vital capacity and rate of air flow, speaker cooperativeness and anxiety were found to affect overall error rates.

  10. Automated Field-of-View, Illumination, and Recognition Algorithm Design of a Vision System for Pick-and-Place Considering Colour Information in Illumination and Images

    PubMed Central

    Chen, Yibing; Ogata, Taiki; Ueyama, Tsuyoshi; Takada, Toshiyuki; Ota, Jun

    2018-01-01

    Machine vision is playing an increasingly important role in industrial applications, and the automated design of image recognition systems has been a subject of intense research. This study has proposed a system for automatically designing the field-of-view (FOV) of a camera, the illumination strength and the parameters in a recognition algorithm. We formulated the design problem as an optimisation problem and used an experiment based on a hierarchical algorithm to solve it. The evaluation experiments using translucent plastics objects showed that the use of the proposed system resulted in an effective solution with a wide FOV, recognition of all objects and 0.32 mm and 0.4° maximal positional and angular errors when all the RGB (red, green and blue) for illumination and R channel image for recognition were used. Though all the RGB illumination and grey scale images also provided recognition of all the objects, only a narrow FOV was selected. Moreover, full recognition was not achieved by using only G illumination and a grey-scale image. The results showed that the proposed method can automatically design the FOV, illumination and parameters in the recognition algorithm and that tuning all the RGB illumination is desirable even when single-channel or grey-scale images are used for recognition. PMID:29786665

  11. Automated Field-of-View, Illumination, and Recognition Algorithm Design of a Vision System for Pick-and-Place Considering Colour Information in Illumination and Images.

    PubMed

    Chen, Yibing; Ogata, Taiki; Ueyama, Tsuyoshi; Takada, Toshiyuki; Ota, Jun

    2018-05-22

    Machine vision is playing an increasingly important role in industrial applications, and the automated design of image recognition systems has been a subject of intense research. This study has proposed a system for automatically designing the field-of-view (FOV) of a camera, the illumination strength and the parameters in a recognition algorithm. We formulated the design problem as an optimisation problem and used an experiment based on a hierarchical algorithm to solve it. The evaluation experiments using translucent plastics objects showed that the use of the proposed system resulted in an effective solution with a wide FOV, recognition of all objects and 0.32 mm and 0.4° maximal positional and angular errors when all the RGB (red, green and blue) for illumination and R channel image for recognition were used. Though all the RGB illumination and grey scale images also provided recognition of all the objects, only a narrow FOV was selected. Moreover, full recognition was not achieved by using only G illumination and a grey-scale image. The results showed that the proposed method can automatically design the FOV, illumination and parameters in the recognition algorithm and that tuning all the RGB illumination is desirable even when single-channel or grey-scale images are used for recognition.

  12. Locality constrained joint dynamic sparse representation for local matching based face recognition.

    PubMed

    Wang, Jianzhong; Yi, Yugen; Zhou, Wei; Shi, Yanjiao; Qi, Miao; Zhang, Ming; Zhang, Baoxue; Kong, Jun

    2014-01-01

    Recently, Sparse Representation-based Classification (SRC) has attracted a lot of attention for its applications to various tasks, especially in biometric techniques such as face recognition. However, factors such as lighting, expression, pose and disguise variations in face images will degrade the performance of SRC and most other face recognition techniques. In order to overcome these limitations, we propose a robust face recognition method named Locality Constrained Joint Dynamic Sparse Representation-based Classification (LCJDSRC) in this paper. In our method, a face image is first partitioned into several smaller sub-images. Then, these sub-images are sparsely represented using the proposed locality constrained joint dynamic sparse representation algorithm. Finally, the representation results for all sub-images are aggregated to obtain the final recognition result. Compared with other algorithms which process each sub-image of a face image independently, the proposed algorithm regards local matching-based face recognition as a multi-task learning problem. Thus, the latent relationships among the sub-images from the same face image are taken into account. Meanwhile, the locality information of the data is also considered in our algorithm. We evaluate our algorithm by comparing it with other state-of-the-art approaches. Extensive experiments on four benchmark face databases (ORL, Extended YaleB, AR and LFW) demonstrate the effectiveness of LCJDSRC.

  13. Information system for diagnosis of respiratory system diseases

    NASA Astrophysics Data System (ADS)

    Abramov, G. V.; Korobova, L. A.; Ivashin, A. L.; Matytsina, I. A.

    2018-05-01

    An information system is presented for the diagnosis of patients with lung diseases. The main problem solved by this system is the determination of the parameters of cough fragments in monitoring recordings made with a voice recorder. The authors give the recognition criteria for recorded cough moments and the analysis of the audio records. The results of the research are systematized. The cough recognition system can be used by medical specialists to diagnose the condition of patients and to monitor the process of their treatment.

  14. Full Duplex, Spread Spectrum Radio System

    NASA Technical Reports Server (NTRS)

    Harvey, Bruce A.

    2000-01-01

    The goal of this project was to support the development of a full duplex, spread spectrum voice communications system. The assembly and testing of a prototype system consisting of a Harris PRISM spread spectrum radio, a TMS320C54x signal processing development board and a Zilog Z80180 microprocessor was underway at the start of this project. The efforts under this project were the development of multiple access schemes, analysis of full duplex voice feedback delays, and the development and analysis of forward error correction (FEC) algorithms. The multiple access analysis involved the selection between code division multiple access (CDMA), frequency division multiple access (FDMA) and time division multiple access (TDMA). Full duplex voice feedback analysis involved the analysis of packet size and delays associated with full loop voice feedback for confirmation of radio system performance. FEC analysis included studies of the performance under the expected burst error scenario with the relatively short packet lengths, and analysis of implementation in the TMS320C54x digital signal processor. When the capabilities and the limitations of the components used were considered, the multiple access scheme chosen was a combination TDMA/FDMA scheme that will provide up to eight users on each of three separate frequencies. Packets to and from each user will consist of 16 samples at a rate of 8,000 samples per second for a total of 2 ms of voice information. The resulting voice feedback delay will therefore be 4 - 6 ms. The most practical FEC algorithm for implementation was a convolutional code with a Viterbi decoder. Interleaving of the bits of each packet will be required to offset the effects of burst errors.
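
    The packet timing above follows directly from the numbers given: 16 samples at 8,000 samples per second is 2 ms of voice per packet, and one reading of the 4 - 6 ms feedback figure is two to three packet times for the full loop. A minimal check of the arithmetic:

```python
SAMPLE_RATE_HZ = 8_000      # voice sampling rate used by the system
SAMPLES_PER_PACKET = 16     # samples carried per TDMA packet

packet_duration_ms = 1_000 * SAMPLES_PER_PACKET / SAMPLE_RATE_HZ  # 2.0 ms of voice
# Full-loop voice feedback spans two to three packet times:
feedback_delay_ms = (2 * packet_duration_ms, 3 * packet_duration_ms)
```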

  15. A Multimodal Emotion Detection System during Human-Robot Interaction

    PubMed Central

    Alonso-Martín, Fernando; Malfaz, María; Sequeira, João; Gorostiza, Javier F.; Salichs, Miguel A.

    2013-01-01

    In this paper, a multimodal user-emotion detection system for social robots is presented. This system is intended to be used during human–robot interaction, and it is integrated as part of the overall interaction system of the robot: the Robotics Dialog System (RDS). Two modes are used to detect emotions: the voice and face expression analysis. In order to analyze the voice of the user, a new component has been developed: Gender and Emotion Voice Analysis (GEVA), which is written using the Chuck language. For emotion detection in facial expressions, the system, Gender and Emotion Facial Analysis (GEFA), has been also developed. This last system integrates two third-party solutions: Sophisticated High-speed Object Recognition Engine (SHORE) and Computer Expression Recognition Toolbox (CERT). Once these new components (GEVA and GEFA) give their results, a decision rule is applied in order to combine the information given by both of them. The result of this rule, the detected emotion, is integrated into the dialog system through communicative acts. Hence, each communicative act gives, among other things, the detected emotion of the user to the RDS so it can adapt its strategy in order to get a greater satisfaction degree during the human–robot dialog. Each of the new components, GEVA and GEFA, can also be used individually. Moreover, they are integrated with the robotic control platform ROS (Robot Operating System). Several experiments with real users were performed to determine the accuracy of each component and to set the final decision rule. The results obtained from applying this decision rule in these experiments show a high success rate in automatic user emotion recognition, improving the results given by the two information channels (audio and visual) separately. PMID:24240598

  16. A Human Activity Recognition System Using Skeleton Data from RGBD Sensors.

    PubMed

    Cippitelli, Enea; Gasparrini, Samuele; Gambi, Ennio; Spinsante, Susanna

    2016-01-01

    The aim of Active and Assisted Living is to develop tools to promote the ageing in place of elderly people, and human activity recognition algorithms can help to monitor aged people in home environments. Different types of sensors can be used to address this task and the RGBD sensors, especially the ones used for gaming, are cost-effective and provide much information about the environment. This work aims to propose an activity recognition algorithm exploiting skeleton data extracted by RGBD sensors. The system is based on the extraction of key poses to compose a feature vector, and a multiclass Support Vector Machine to perform classification. Computation and association of key poses are carried out using a clustering algorithm, without the need of a learning algorithm. The proposed approach is evaluated on five publicly available datasets for activity recognition, showing promising results especially when applied for the recognition of AAL related actions. Finally, the current applicability of this solution in AAL scenarios and the future improvements needed are discussed.
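
    The key-pose pipeline described above can be sketched as follows. The clustering granularity K, the temporal ordering of the centroids, and the use of k-means are assumptions of this sketch, not details taken from the paper (Python, scikit-learn):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

K = 3  # number of key poses per activity (a free parameter in this sketch)

def activity_feature(frames: np.ndarray) -> np.ndarray:
    """Cluster skeleton frames (n_frames x n_coords) into K key poses and
    concatenate the centroids, ordered by first time of appearance, into
    one fixed-length feature vector."""
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(frames)
    labels = km.predict(frames)
    first_seen = [int(np.argmax(labels == c)) for c in range(K)]
    return km.cluster_centers_[np.argsort(first_seen)].ravel()

def train_activity_svm(sequences, labels):
    """Multiclass SVM over key-pose feature vectors."""
    X = np.array([activity_feature(s) for s in sequences])
    return SVC(kernel="linear").fit(X, labels)
```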

  17. Gaussian mixture models-based ship target recognition algorithm in remote sensing infrared images

    NASA Astrophysics Data System (ADS)

    Yao, Shoukui; Qin, Xiaojuan

    2018-02-01

    Since the resolution of remote sensing infrared images is low, the features of ship targets become unstable. The issue of how to recognize ships with fuzzy features is an open problem. In this paper, we propose a novel ship target recognition algorithm based on Gaussian mixture models (GMMs). In the proposed algorithm, there are mainly two steps. At the first step, the Hu moments of these ship target images are calculated, and the GMMs are trained on the moment features of ships. At the second step, the moment feature of each ship image is assigned to the trained GMMs for recognition. Because of the scale, rotation, translation invariance property of Hu moments and the power feature-space description ability of GMMs, the GMMs-based ship target recognition algorithm can recognize ship reliably. Experimental results of a large simulating image set show that our approach is effective in distinguishing different ship types, and obtains a satisfactory ship recognition performance.
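
    The two-step scheme above, per-class GMMs trained on Hu-moment features followed by maximum-likelihood assignment, can be sketched with scikit-learn. The feature vectors are assumed precomputed (e.g. via OpenCV's `cv2.HuMoments`), and the class names and component count below are illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ship_gmms(features_by_class, n_components=2, seed=0):
    """Fit one GMM per ship class on its Hu-moment feature vectors
    (step 1 of the abstract's scheme)."""
    return {cls: GaussianMixture(n_components=n_components, random_state=seed).fit(X)
            for cls, X in features_by_class.items()}

def classify_ship(gmms, feature):
    """Assign a Hu-moment feature vector to the class whose GMM yields
    the highest log-likelihood (step 2)."""
    return max(gmms, key=lambda cls: gmms[cls].score(feature.reshape(1, -1)))
```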

  18. Word recognition using a lexicon constrained by first/last character decisions

    NASA Astrophysics Data System (ADS)

    Zhao, Sheila X.; Srihari, Sargur N.

    1995-03-01

    In lexicon based recognition of machine-printed word images, the size of the lexicon can be quite extensive. The recognition performance is closely related to the size of the lexicon: performance drops quickly as lexicon size increases. Here, we present an algorithm to improve word recognition performance by reducing the size of the given lexicon. The algorithm utilizes the information provided by the first and last characters of a word to reduce the size of the given lexicon. Given a word image and a lexicon that contains the word in the image, the first and last characters are segmented and then recognized by a character classifier. The possible candidates based on the results given by the classifier are selected, which gives us the sub-lexicon. Then a word shape analysis algorithm is applied to produce the final ranking of the given lexicon. The algorithm was tested on a set of machine-printed gray-scale word images which includes a wide range of print types and qualities.
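
    The reduction step is straightforward to sketch: given the candidate sets returned by the first- and last-character classifier, keep only the lexicon entries consistent with both. A minimal illustration (function and variable names are this sketch's, not the paper's):

```python
def reduce_lexicon(lexicon, first_candidates, last_candidates):
    """Keep only words whose first and last characters are among the
    character classifier's candidates for those positions."""
    return [w for w in lexicon
            if w and w[0] in first_candidates and w[-1] in last_candidates]
```

    The surviving sub-lexicon is then passed to the word shape analysis stage for final ranking.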

  19. Vision-based posture recognition using an ensemble classifier and a vote filter

    NASA Astrophysics Data System (ADS)

    Ji, Peng; Wu, Changcheng; Xu, Xiaonong; Song, Aiguo; Li, Huijun

    2016-10-01

    Posture recognition is a very important Human-Robot Interaction (HRI) modality. To segment an effective posture from an image, we propose an improved region-grow algorithm combined with a single-Gaussian color model. Experiments show that the improved region-grow algorithm obtains a more complete and accurate posture than the traditional single-Gaussian model and region-grow algorithm, while at the same time eliminating similar regions from the background. In the posture recognition part, in order to improve the recognition rate, we propose a CNN ensemble classifier, and in order to reduce misjudgments during continuous gesture control, a vote filter is proposed and applied to the sequence of recognition results. The proposed CNN ensemble classifier yields a 96.27% recognition rate, which is better than that of a single CNN classifier, and the proposed vote filter improves the recognition result and reduces misjudgments during consecutive gesture switches.
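
    A vote filter of the kind described can be sketched as a sliding-window majority vote over the frame-level classifier outputs; the window length here is an assumption of this sketch, not the paper's setting:

```python
from collections import Counter, deque

class VoteFilter:
    """Majority vote over the last `window` recognition results, smoothing
    isolated misjudgments during continuous gesture control."""
    def __init__(self, window: int = 5):
        self.buffer = deque(maxlen=window)

    def update(self, label):
        """Add one frame-level result and return the smoothed label."""
        self.buffer.append(label)
        return Counter(self.buffer).most_common(1)[0][0]
```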

  20. Microcomputers: Independence and Information Access for the Physically Handicapped.

    ERIC Educational Resources Information Center

    Regen, Shari S.; Chen, Ching-chih

    1984-01-01

    Provides overview of recent technological developments in microcomputer technology for the physically disabled, including discussion of view expansion, "talking terminals," voice recognition, and price and convenience of micro-based products. Equipment manufacturers and training centers for the physically disabled are listed and microcomputer…

  1. Sperry Univac speech communications technology

    NASA Technical Reports Server (NTRS)

    Medress, Mark F.

    1977-01-01

    Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word spotting system to locate key words in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.

  2. An Indoor Pedestrian Positioning Method Using HMM with a Fuzzy Pattern Recognition Algorithm in a WLAN Fingerprint System

    PubMed Central

    Ni, Yepeng; Liu, Jianbo; Liu, Shan; Bai, Yaxin

    2016-01-01

    With the rapid development of smartphones and wireless networks, indoor location-based services have become more and more prevalent. Due to the sophisticated propagation of radio signals, the Received Signal Strength Indicator (RSSI) shows a significant variation during pedestrian walking, which introduces critical errors in deterministic indoor positioning. To solve this problem, we present a novel method to improve the indoor pedestrian positioning accuracy by embedding a fuzzy pattern recognition algorithm into a Hidden Markov Model. The fuzzy pattern recognition algorithm follows the rule that the RSSI fading has a positive correlation to the distance between the measuring point and the AP location even during a dynamic positioning measurement. Through this algorithm, we use the RSSI variation trend to replace the specific RSSI value to achieve a fuzzy positioning. The transition probability of the Hidden Markov Model is trained by the fuzzy pattern recognition algorithm with pedestrian trajectories. Using the Viterbi algorithm with the trained model, we can obtain a set of hidden location states. In our experiments, we demonstrate that, compared with the deterministic pattern matching algorithm, our method can greatly improve the positioning accuracy and shows robust environmental adaptability. PMID:27618053
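
    The decoding step named above is the standard Viterbi recursion. A minimal log-domain sketch over a discrete-emission HMM (the model matrices below are illustrative, not the paper's trained WLAN-fingerprint model):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most probable hidden state sequence for a discrete-emission HMM.
    pi: initial probs (S,); A: transition (S, S); B: emission (S, n_obs)."""
    S, T = len(pi), len(obs)
    logd = np.log(pi) + np.log(B[:, obs[0]])     # log delta at t = 0
    back = np.zeros((T, S), dtype=int)           # best predecessor pointers
    for t in range(1, T):
        scores = logd[:, None] + np.log(A)       # scores[i, j]: reach j via i
        back[t] = np.argmax(scores, axis=0)
        logd = scores[back[t], range(S)] + np.log(B[:, obs[t]])
    path = [int(np.argmax(logd))]
    for t in range(T - 1, 0, -1):                # backtrack through pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

    In the paper's setting the hidden states are candidate locations and the emissions come from the fuzzy RSSI-trend matching; here a two-state toy model stands in for both.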

  3. Predicting fundamental frequency from mel-frequency cepstral coefficients to enable speech reconstruction.

    PubMed

    Shao, Xu; Milner, Ben

    2005-08-01

    This work proposes a method to reconstruct an acoustic speech signal solely from a stream of mel-frequency cepstral coefficients (MFCCs) as may be encountered in a distributed speech recognition (DSR) system. Previous methods for speech reconstruction have required, in addition to the MFCC vectors, fundamental frequency and voicing components. In this work the voicing classification and fundamental frequency are predicted from the MFCC vectors themselves using two maximum a posteriori (MAP) methods. The first method enables fundamental frequency prediction by modeling the joint density of MFCCs and fundamental frequency using a single Gaussian mixture model (GMM). The second scheme uses a set of hidden Markov models (HMMs) to link together a set of state-dependent GMMs, which enables a more localized modeling of the joint density of MFCCs and fundamental frequency. Experimental results on speaker-independent male and female speech show that accurate voicing classification and fundamental frequency prediction is attained when compared to hand-corrected reference fundamental frequency measurements. The use of the predicted fundamental frequency and voicing for speech reconstruction is shown to give very similar speech quality to that obtained using the reference fundamental frequency and voicing.
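
    The single-Gaussian special case of the first MAP scheme reduces to the conditional mean of a joint Gaussian over [MFCC, f0]; the paper's GMM and HMM variants apply this same formula per mixture component or state. A minimal sketch under that simplifying assumption:

```python
import numpy as np

def fit_joint_gaussian(mfcc, f0):
    """Fit a single joint Gaussian over stacked [MFCC, f0] observations."""
    Z = np.column_stack([mfcc, f0])
    return Z.mean(axis=0), np.cov(Z, rowvar=False)

def predict_f0(mu, cov, x):
    """MAP (= conditional-mean) estimate of f0 given one MFCC vector x."""
    d = len(x)
    mu_x, mu_y = mu[:d], mu[d]
    Sxx, Sxy = cov[:d, :d], cov[:d, d]
    return mu_y + Sxy @ np.linalg.solve(Sxx, x - mu_x)
```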

  4. Measures of voiced frication for automatic classification

    NASA Astrophysics Data System (ADS)

    Jackson, Philip J. B.; Jesus, Luis M. T.; Shadle, Christine H.; Pincas, Jonathan

    2004-05-01

    As an approach to understanding the characteristics of the acoustic sources in voiced fricatives, it seems apt to draw on knowledge of vowels and voiceless fricatives, which have been relatively well studied. However, the presence of both phonation and frication in these mixed-source sounds offers the possibility of mutual interaction effects, with variations across place of articulation. This paper examines the acoustic and articulatory consequences of these interactions and explores automatic techniques for finding parametric and statistical descriptions of these phenomena. A reliable and consistent set of such acoustic cues could be used for phonetic classification or speech recognition. Following work on devoicing of European Portuguese voiced fricatives [Jesus and Shadle, in Mamede et al. (eds.) (Springer-Verlag, Berlin, 2003), pp. 1-8]. and the modulating effect of voicing on frication [Jackson and Shadle, J. Acoust. Soc. Am. 108, 1421-1434 (2000)], the present study focuses on three types of information: (i) sequences and durations of acoustic events in VC transitions, (ii) temporal, spectral and modulation measures from the periodic and aperiodic components of the acoustic signal, and (iii) voicing activity derived from simultaneous EGG data. Analysis of interactions observed in British/American English and European Portuguese speech corpora will be compared, and the principal findings discussed.

  5. Do Listeners Store in Memory a Speaker's Habitual Utterance-Final Phonation Type?

    PubMed Central

    Bőhm, Tamás; Shattuck-Hufnagel, Stefanie

    2009-01-01

    Earlier studies report systematic differences across speakers in the occurrence of utterance-final irregular phonation; the work reported here investigated whether human listeners remember this speaker-specific information and can access it when necessary (a prerequisite for using this cue in speaker recognition). Listeners personally familiar with the voices of the speakers were presented with pairs of speech samples: one with the original and the other with transformed final phonation type. Asked to select the member of the pair that was closer to the talker's voice, most listeners tended to choose the unmanipulated token (even though they judged them to sound essentially equally natural). This suggests that utterance-final pitch period irregularity is part of the mental representation of individual speaker voices, although this may depend on the individual speaker and listener to some extent. PMID:19776665

  6. The effect of voice onset time differences on lexical access in Dutch.

    PubMed

    van Alphen, Petra M; McQueen, James M

    2006-02-01

    Effects on spoken-word recognition of prevoicing differences in Dutch initial voiced plosives were examined. In 2 cross-modal identity-priming experiments, participants heard prime words and nonwords beginning with voiced plosives with 12, 6, or 0 periods of prevoicing or matched items beginning with voiceless plosives and made lexical decisions to visual tokens of those items. Six-period primes had the same effect as 12-period primes. Zero-period primes had a different effect, but only when their voiceless counterparts were real words. Listeners could nevertheless discriminate the 6-period primes from the 12- and 0-period primes. Phonetic detail appears to influence lexical access only to the extent that it is useful: In Dutch, presence versus absence of prevoicing is more informative than amount of prevoicing. ((c) 2006 APA, all rights reserved).

  7. Optical character recognition of handwritten Arabic using hidden Markov models

    NASA Astrophysics Data System (ADS)

    Aulama, Mohannad M.; Natsheh, Asem M.; Abandah, Gheith A.; Olama, Mohammed M.

    2011-04-01

    The problem of optical character recognition (OCR) of handwritten Arabic has not received a satisfactory solution yet. In this paper, an Arabic OCR algorithm is developed based on Hidden Markov Models (HMMs) combined with the Viterbi algorithm, which results in an improved and more robust recognition of characters at the sub-word level. Integrating the HMMs represents another step of the overall OCR trends being currently researched in the literature. The proposed approach exploits the structure of characters in the Arabic language in addition to their extracted features to achieve improved recognition rates. Useful statistical information of the Arabic language is initially extracted and then used to estimate the probabilistic parameters of the mathematical HMM. A new custom implementation of the HMM is developed in this study, where the transition matrix is built based on the collected large corpus, and the emission matrix is built based on the results obtained via the extracted character features. The recognition process is triggered using the Viterbi algorithm which employs the most probable sequence of sub-words. The model was implemented to recognize the sub-word unit of Arabic text raising the recognition rate from being linked to the worst recognition rate for any character to the overall structure of the Arabic language. Numerical results show that there is a potentially large recognition improvement by using the proposed algorithms.

  8. Optical character recognition of handwritten Arabic using hidden Markov models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aulama, Mohannad M.; Natsheh, Asem M.; Abandah, Gheith A.

    2011-01-01

    The problem of optical character recognition (OCR) of handwritten Arabic has not received a satisfactory solution yet. In this paper, an Arabic OCR algorithm is developed based on Hidden Markov Models (HMMs) combined with the Viterbi algorithm, which results in an improved and more robust recognition of characters at the sub-word level. Integrating the HMMs represents another step of the overall OCR trends being currently researched in the literature. The proposed approach exploits the structure of characters in the Arabic language in addition to their extracted features to achieve improved recognition rates. Useful statistical information of the Arabic language is initially extracted and then used to estimate the probabilistic parameters of the mathematical HMM. A new custom implementation of the HMM is developed in this study, where the transition matrix is built based on the collected large corpus, and the emission matrix is built based on the results obtained via the extracted character features. The recognition process is triggered using the Viterbi algorithm which employs the most probable sequence of sub-words. The model was implemented to recognize the sub-word unit of Arabic text raising the recognition rate from being linked to the worst recognition rate for any character to the overall structure of the Arabic language. Numerical results show that there is a potentially large recognition improvement by using the proposed algorithms.

  9. Hyperspectral face recognition with spatiospectral information fusion and PLS regression.

    PubMed

    Uzair, Muhammad; Mahmood, Arif; Mian, Ajmal

    2015-03-01

    Hyperspectral imaging offers new opportunities for face recognition via improved discrimination along the spectral dimension. However, it poses new challenges, including low signal-to-noise ratio, interband misalignment, and high data dimensionality. Due to these challenges, the literature on hyperspectral face recognition is not only sparse but limited to ad hoc dimensionality reduction techniques, and it lacks comprehensive evaluation. We propose a hyperspectral face recognition algorithm using a spatiospectral covariance for band fusion and partial least squares regression for classification. Moreover, we extend 13 existing face recognition techniques, for the first time, to perform hyperspectral face recognition. We formulate hyperspectral face recognition as an image-set classification problem and evaluate the performance of seven state-of-the-art image-set classification techniques. We also test six state-of-the-art grayscale and RGB (color) face recognition algorithms after applying fusion techniques on hyperspectral images. Comparison with the 13 extended and five existing hyperspectral face recognition techniques on three standard data sets shows that the proposed algorithm outperforms all by a significant margin. Finally, we perform band selection experiments to find the most discriminative bands in the visible and near-infrared response spectrum.

  10. [Detection of endpoint for segmentation between consonants and vowels in aphasia rehabilitation software based on artificial intelligence scheduling].

    PubMed

    Deng, Xingjuan; Chen, Ji; Shuai, Jie

    2009-08-01

    To improve the efficiency of aphasia rehabilitation training, an artificial intelligence scheduling function was added to the aphasia rehabilitation software, improving the software's performance. Taking into account the characteristics of aphasia patients' voices as well as the needs of the artificial intelligence scheduling functions, the authors designed an endpoint detection algorithm. It determines reference endpoints, then extracts every word and locates reasonable segmentation points between consonants and vowels using those reference endpoints. Experimental results show that the algorithm achieves its detection goals with a high accuracy rate and is therefore applicable to endpoint detection in the voices of aphasia patients.
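Endpoint detection of the kind described above is commonly based on short-time energy. The following is a hedged, minimal sketch of that idea; the frame length and threshold here are illustrative assumptions, not the authors' values:

```python
# Energy-based endpoint detection sketch: mark the voiced region as the
# span of frames whose short-time energy exceeds a threshold.

def detect_endpoints(signal, frame_len=4, threshold=0.5):
    """Return (start, end) sample indices of the voiced region, or None."""
    n_frames = len(signal) // frame_len
    energies = [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ]
    voiced = [i for i, e in enumerate(energies) if e > threshold]
    if not voiced:
        return None
    return voiced[0] * frame_len, (voiced[-1] + 1) * frame_len
```

A real consonant/vowel segmenter would add features such as zero-crossing rate on top of energy, since unvoiced consonants carry little energy.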

  11. Collegial Activity Learning between Heterogeneous Sensors.

    PubMed

    Feuz, Kyle D; Cook, Diane J

    2017-11-01

    Activity recognition algorithms have matured and become more ubiquitous in recent years. However, these algorithms are typically customized for a particular sensor platform. In this paper we introduce PECO, a Personalized activity ECOsystem, that transfers learned activity information seamlessly between sensor platforms in real time so that any available sensor can continue to track activities without requiring its own extensive labeled training data. We introduce a multi-view transfer learning algorithm that facilitates this information handoff between sensor platforms and provide theoretical performance bounds for the algorithm. In addition, we empirically evaluate PECO using datasets that utilize heterogeneous sensor platforms to perform activity recognition. These results indicate that not only can activity recognition algorithms transfer important information to new sensor platforms, but any number of platforms can work together as colleagues to boost performance.

  12. High Tech and Library Access for People with Disabilities.

    ERIC Educational Resources Information Center

    Roatch, Mary A.

    1992-01-01

    Describes tools that enable people with disabilities to access print information, including optical character recognition, synthetic voice output, other input devices, Braille access devices, large print displays, television and video, TDD (Telecommunications Devices for the Deaf), and Telebraille. Use of technology by libraries to meet mandates…

  13. Pulmonary Nodule Recognition Based on Multiple Kernel Learning Support Vector Machine-PSO

    PubMed Central

    Zhu, Zhichuan; Zhao, Qingdong; Liu, Liwei; Zhang, Lijuan

    2018-01-01

    Pulmonary nodule recognition is the core module of lung CAD. The Support Vector Machine (SVM) algorithm has been widely used in pulmonary nodule recognition, and the Multiple Kernel Learning Support Vector Machine (MKL-SVM) algorithm has achieved good results. However, when based on grid search, the MKL-SVM algorithm requires a long optimization time for parameter tuning, and its identification accuracy depends on the fineness of the grid. In this paper, swarm intelligence is introduced: Particle Swarm Optimization (PSO) is combined with the MKL-SVM algorithm into an MKL-SVM-PSO algorithm that rapidly achieves global parameter optimization. To obtain the global optimal solution, different inertia weights, including constant, linear, and nonlinear inertia weights, are applied to pulmonary nodule recognition. The experimental results show that the model training time of the proposed MKL-SVM-PSO algorithm is only 1/7 of the training time of the MKL-SVM grid-search algorithm, while achieving a better recognition effect. Moreover, the Euclidean norm of the normalized error vector is proposed to measure the proximity between the average fitness curve and the optimal fitness curve after convergence. Statistical analysis of the averages of 20 runs with different inertia weights shows that dynamic inertia weights are superior to the constant inertia weight in the MKL-SVM-PSO algorithm. Among the dynamic inertia weights, the nonlinear inertia weight yields a shorter parameter optimization time and an average fitness value after convergence much closer to the optimal fitness value, outperforming the linear inertia weight. In addition, a better nonlinear inertia weight is verified. PMID:29853983
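The PSO component can be sketched in a few lines. In this hedged illustration, minimizing a toy 1-D function stands in for the MKL-SVM kernel-parameter search, and a constant inertia weight `w` is used (all values are illustrative, not the paper's settings):

```python
import random

# Minimal particle swarm optimization (PSO) sketch with constant inertia weight.

def pso(f, lo, hi, n_particles=10, iters=60, w=0.5, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]                       # each particle's best-known position
    gbest = min(pos, key=f)              # swarm's best-known position
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))   # clamp to bounds
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i]
            if f(pos[i]) < f(gbest):
                gbest = pos[i]
    return gbest
```

The linear and nonlinear inertia-weight variants discussed in the abstract would replace the constant `w` with a schedule that decays over iterations.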

  15. ERP evidence of preserved early memory function in term infants with neonatal encephalopathy following therapeutic hypothermia.

    PubMed

    Pfister, Katie M; Zhang, Lei; Miller, Neely C; Hultgren, Solveig; Boys, Chris J; Georgieff, Michael K

    2016-12-01

    Neonatal encephalopathy (NE) carries high risk for neurodevelopmental impairments. Therapeutic hypothermia (TH) reduces this risk, particularly for moderate encephalopathy (ME). Nevertheless, these infants often have subtle functional deficits, including abnormal memory function. Detection of deficits at the earliest possible time-point would allow for intervention during a period of maximal brain plasticity. Recognition memory function in 22 infants with NE treated with TH was compared to 23 healthy controls using event-related potentials (ERPs) at 2 wk of age. ERPs were recorded to mother's voice alternating with a stranger's voice to assess attentional responses (P2), novelty detection (slow wave), and discrimination between familiar and novel (difference wave). Development was tested at 12 mo using the Bayley Scales of Infant Development, Third Edition (BSID-III). The NE group showed similar ERP components and BSID-III scores to controls. However, infants with NE showed discrimination at midline leads (P = 0.01), whereas controls showed discrimination in the left hemisphere (P = 0.05). Normal MRI (P = 0.05) and seizure-free electroencephalogram (EEG) (P = 0.04) correlated positively with outcomes. Infants with NE have preserved recognition memory function after TH. The spatially different recognition memory processing after early brain injury may represent compensatory changes in the brain circuitry and reflect a benefit of TH.

  16. Speech-recognition interfaces for music information retrieval

    NASA Astrophysics Data System (ADS)

    Goto, Masataka

    2005-09-01

    This paper describes two hands-free music information retrieval (MIR) systems that enable a user to retrieve and play back a musical piece by saying its title or the artist's name. Although various interfaces for MIR have been proposed, speech-recognition interfaces suitable for retrieving musical pieces have not been studied. Our MIR-based jukebox systems employ two different speech-recognition interfaces for MIR, speech completion and speech spotter, which exploit intentionally controlled nonverbal speech information in original ways. The first is a music retrieval system with the speech-completion interface that is suitable for music stores and car-driving situations. When a user only remembers part of the name of a musical piece or an artist and utters only a remembered fragment, the system helps the user recall and enter the name by completing the fragment. The second is a background-music playback system with the speech-spotter interface that can enrich human-human conversation. When a user is talking to another person, the system allows the user to enter voice commands for music playback control by spotting a special voice-command utterance in face-to-face or telephone conversations. Experimental results from use of these systems have demonstrated the effectiveness of the speech-completion and speech-spotter interfaces. (Video clips: http://staff.aist.go.jp/m.goto/MIR/speech-if.html)

  17. Phase effects in masking by harmonic complexes: speech recognition.

    PubMed

    Deroche, Mickael L D; Culling, John F; Chatterjee, Monita

    2013-12-01

    Harmonic complexes that generate highly modulated temporal envelopes on the basilar membrane (BM) mask a tone less effectively than complexes that generate relatively flat temporal envelopes, because the non-linear active gain of the BM selectively amplifies a low-level tone in the dips of a modulated masker envelope. The present study examines a similar effect in speech recognition. Speech reception thresholds (SRTs) were measured for a voice masked by harmonic complexes with partials in sine phase (SP) or in random phase (RP). The masker's fundamental frequency (F0) was 50, 100 or 200 Hz. SRTs were considerably lower for SP than for RP maskers at 50-Hz F0, but the two converged at 100-Hz F0, while at 200-Hz F0, SRTs were a little higher for SP than RP maskers. The results were similar whether the target voice was male or female and whether the masker's spectral profile was flat or speech-shaped. Although listening in the masker dips has been shown to play a large role for artificial stimuli such as Schroeder-phase complexes at high levels, it contributes weakly to speech recognition in the presence of harmonic maskers with different crest factors at more moderate sound levels (65 dB SPL). Copyright © 2013 Elsevier B.V. All rights reserved.

  18. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians

    NASA Astrophysics Data System (ADS)

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y.; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-09-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74%, compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34%, versus 50% for clinicians (p = 0.04). The false negative rate for the algorithm was 23%, versus 68% for physicians (p = 0.0002). We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and could be used to screen for PH and encourage earlier specialist referral.
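The classification step, scoring features under per-class generative models and choosing the more likely class, can be sketched as follows. Real systems use multivariate Gaussian mixtures over MFCC vectors; the 1-D single-Gaussian class models here are a deliberate simplification for illustration:

```python
import math

# Likelihood-based classification sketch: score a feature sequence under
# per-class Gaussian models and pick the class with higher log-likelihood.

def gauss_loglik(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify(features, models):
    """models: {label: (mean, var)}; return the most likely label."""
    scores = {
        label: sum(gauss_loglik(x, m, v) for x in features)
        for label, (m, v) in models.items()
    }
    return max(scores, key=scores.get)
```

Summing per-frame log-likelihoods assumes frames are independent given the class, the same working assumption a GMM classifier over MFCC frames makes.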

  19. A novel speech processing algorithm based on harmonicity cues in cochlear implant

    NASA Astrophysics Data System (ADS)

    Wang, Jian; Chen, Yousheng; Zhang, Zongping; Chen, Yan; Zhang, Weifeng

    2017-08-01

    This paper proposes a novel speech processing algorithm for cochlear implants, which uses harmonicity cues to enhance tonal information in Mandarin Chinese speech recognition. The input speech was filtered by a 4-channel band-pass filter bank. The frequency ranges for the four bands were: 300-621, 621-1285, 1285-2657, and 2657-5499 Hz. In each pass band, temporal envelope and periodicity cues (TEPCs) below 400 Hz were extracted by full-wave rectification and low-pass filtering. The TEPCs were modulated by a sinusoidal carrier whose frequency was the fundamental frequency (F0) or the harmonic of F0 closest to the center frequency of each band. Signals from all bands were combined to obtain the output speech. Mandarin tone, word, and sentence recognition in quiet listening conditions was tested for the widely used continuous interleaved sampling (CIS) strategy and the novel F0-harmonic algorithm. The results showed that the F0-harmonic algorithm performed consistently better than the CIS strategy in Mandarin tone, word, and sentence recognition. In addition, the sentence recognition rate was higher than the word recognition rate, owing to contextual information in the sentences. Moreover, tones 3 and 4 were recognized better than tones 1 and 2, due to the more easily identified features of the former. In conclusion, the F0-harmonic algorithm can enhance tonal information in cochlear implant speech processing through the use of harmonicity cues, thereby improving Mandarin tone, word, and sentence recognition. Further work will focus on testing the F0-harmonic algorithm in noisy listening conditions.
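The per-band envelope-extraction step (full-wave rectification followed by low-pass filtering) can be sketched minimally. The moving-average window here stands in for a proper low-pass filter with a 400-Hz cutoff, and the band-pass stage is omitted; both are assumptions for illustration:

```python
# Temporal-envelope extraction sketch: full-wave rectification followed by
# a moving-average low-pass filter (window size is illustrative).

def extract_envelope(signal, window=5):
    rectified = [abs(x) for x in signal]          # full-wave rectification
    half = window // 2
    env = []
    for i in range(len(rectified)):
        lo, hi = max(0, i - half), min(len(rectified), i + half + 1)
        env.append(sum(rectified[lo:hi]) / (hi - lo))  # moving average
    return env
```

In the F0-harmonic algorithm, each band's envelope would then modulate a sinusoidal carrier at F0 or the nearest harmonic of F0.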

  20. A Qualitative Examination of Situational Risk Recognition Among Female Victims of Physical Intimate Partner Violence.

    PubMed

    Sherrill, Andrew M; Bell, Kathryn M; Wyngarden, Nicole

    2016-07-01

    Little is known about intimate partner violence (IPV) victims' situational risk recognition, defined as the ability to identify situational factors that signal imminent risk of victimization. Using semi-structured interviews, qualitative data were collected from a community sample of 31 female victims of IPV episodes involving substance use. Thirteen themes were identified, the most prevalent being related to the partner's verbal behavior, tone of voice, motor behavior, alcohol or drug use, and facial expression. Participants reporting at least some anticipation of physical aggression (61.3% of the sample) tended to identify multiple factors (M = 3.47), suggesting numerous situational features often contribute to situational risk recognition. © The Author(s) 2015.

  1. Feature extraction for face recognition via Active Shape Model (ASM) and Active Appearance Model (AAM)

    NASA Astrophysics Data System (ADS)

    Iqtait, M.; Mohamad, F. S.; Mamat, M.

    2018-03-01

    Biometrics is a pattern recognition approach used for the automatic recognition of persons based on the characteristics and features of an individual. Face recognition with a high recognition rate is still a challenging task, usually accomplished in three phases: face detection, feature extraction, and expression classification. Precise and robust localization of feature points is a complicated and difficult issue in face recognition. Cootes proposed the Multi-Resolution Active Shape Model (ASM) algorithm, which can extract a specified shape accurately and efficiently. Furthermore, as an improvement on ASM, the Active Appearance Model (AAM) algorithm was proposed to extract both the shape and texture of a specified object simultaneously. In this paper we give more details about the two algorithms and report the results of experiments testing their performance on one dataset of faces. We found that ASM is faster and achieves more accurate feature-point localization than AAM, but AAM achieves a better match to the texture.

  2. Infrared vehicle recognition using unsupervised feature learning based on K-feature

    NASA Astrophysics Data System (ADS)

    Lin, Jin; Tan, Yihua; Xia, Haijiao; Tian, Jinwen

    2018-02-01

    Owing to the complex battlefield environment, it is difficult to establish a complete knowledge base in practical applications of vehicle recognition algorithms. Infrared vehicle recognition remains difficult and challenging, and it plays an important role in remote sensing. In this paper we propose a new unsupervised feature learning method based on K-feature to recognize vehicles in infrared images. First, we use a saliency-based target detection algorithm to detect candidates in the initial image. Then, unsupervised feature learning based on K-feature, generated by a K-means clustering algorithm that extracts features by learning a visual dictionary from a large number of unlabeled samples, is applied to suppress false alarms and improve accuracy. Finally, the vehicle recognition image is produced after some post-processing. Extensive experiments demonstrate that the proposed method achieves satisfactory recognition effectiveness and robustness for vehicle recognition in infrared images under complex backgrounds, and that it also improves reliability.
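The dictionary-learning step is essentially K-means over unlabeled feature vectors. As a hedged sketch, the toy version below clusters scalars rather than image-patch descriptors; the initialization and data are illustrative only:

```python
# Minimal 1-D k-means sketch: learn a small "visual dictionary" of cluster
# centers from unlabeled samples.

def kmeans_1d(data, k, iters=20):
    # Spread initial centers across the sorted data (simple heuristic)
    centers = sorted(data)[:: max(1, len(data) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            idx = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            clusters[idx].append(x)
        # Move each center to the mean of its assigned points
        centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
    return sorted(centers)
```

In the K-feature pipeline, each learned center would act as a dictionary atom, and a candidate region would be encoded by its distances (or assignments) to the atoms.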

  3. Wide-threat detection: recognition of adversarial missions and activity patterns in Empire Challenge 2009

    NASA Astrophysics Data System (ADS)

    Levchuk, Georgiy; Shabarekh, Charlotte; Furjanic, Caitlin

    2011-06-01

    In this paper, we present results of adversarial activity recognition using data collected in the Empire Challenge (EC 09) exercise. The EC09 experiment provided an opportunity to evaluate our probabilistic spatiotemporal mission recognition algorithms using data from live airborne and ground sensors. Using ambiguous and noisy data about the locations of entities and motion events on the ground, the algorithms inferred the types and locations of OPFOR activities, including reconnaissance, cache runs, IED emplacements, logistics, and planning meetings. In this paper, we present a detailed summary of the validation study and recognition accuracy results. Our algorithms were able to detect the locations and types of over 75% of hostile activities in EC09 while producing 25% false alarms.

  4. An adaptive deep Q-learning strategy for handwritten digit recognition.

    PubMed

    Qiao, Junfei; Wang, Gongming; Li, Wenjing; Chen, Min

    2018-02-22

    Handwritten digit recognition has been a challenging problem in recent years. Although many deep learning-based classification algorithms have been studied for handwritten digit recognition, recognition accuracy and running time still need to be further improved. In this paper, an adaptive deep Q-learning strategy is proposed to improve accuracy and shorten running time for handwritten digit recognition. The adaptive deep Q-learning strategy combines the feature-extracting capability of deep learning and the decision-making of reinforcement learning to form an adaptive Q-learning deep belief network (Q-ADBN). First, Q-ADBN extracts the features of original images using an adaptive deep auto-encoder (ADAE), and the extracted features are considered as the current states of the Q-learning algorithm. Second, Q-ADBN receives the Q-function (reward signal) during recognition of the current states, and the final handwritten digit recognition is implemented by maximizing the Q-function using the Q-learning algorithm. Finally, experimental results on the well-known MNIST dataset show that the proposed Q-ADBN is superior to other similar methods in terms of accuracy and running time. Copyright © 2018 Elsevier Ltd. All rights reserved.
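The decision-making half of such a strategy is Q-learning. As a hedged sketch of that component alone, the toy below runs tabular Q-learning on a 4-state corridor (moving right reaches a reward); the paper's version couples Q-learning with auto-encoder features rather than integer states:

```python
import random

# Tabular Q-learning sketch on a toy 4-state corridor; state 3 is terminal
# and yields reward 1. Actions: 0 = left, 1 = right.

def train_q(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=1):
    rng = random.Random(seed)
    n_states, actions = 4, (0, 1)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != 3:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[s][x])
            s2 = min(3, s + 1) if a == 1 else max(0, s - 1)
            r = 1.0 if s2 == 3 else 0.0
            # Q-learning update toward the bootstrapped target
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy (argmax over Q) moves right from every non-terminal state, which is the analogue of "maximizing the Q-function" during recognition.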

  5. Appearance-based face recognition and light-fields.

    PubMed

    Gross, Ralph; Matthews, Iain; Baker, Simon

    2004-04-01

    Arguably the most important decision to be made when developing an object recognition algorithm is selecting the scene measurements or features on which to base the algorithm. In appearance-based object recognition, the features are chosen to be the pixel intensity values in an image of the object. These pixel intensities correspond directly to the radiance of light emitted from the object along certain rays in space. The set of all such radiance values over all possible rays is known as the plenoptic function or light-field. In this paper, we develop a theory of appearance-based object recognition from light-fields. This theory leads directly to an algorithm for face recognition across pose that uses as many images of the face as are available, from one upwards. All of the pixels, whichever image they come from, are treated equally and used to estimate the (eigen) light-field of the object. The eigen light-field is then used as the set of features on which to base recognition, analogously to how the pixel intensities are used in appearance-based face and object recognition.
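Estimating an eigen light-field, like eigenfaces, rests on an eigen-decomposition of the measurement covariance. As a generic, hedged sketch of that building block (not the authors' estimation procedure), power iteration recovers the dominant eigenvector of a small symmetric matrix:

```python
# Power iteration sketch: find the dominant eigenvalue/eigenvector of a
# small symmetric (covariance-like) matrix in pure Python.

def power_iteration(A, iters=100):
    n = len(A)
    v = [1.0] * n
    for _ in range(iters):
        # Multiply A @ v, then renormalize
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient gives the corresponding eigenvalue
    eigenvalue = sum(v[i] * sum(A[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigenvalue, v
```

In practice one deflates and repeats (or uses an SVD) to obtain the full eigenbasis onto which pixel measurements are projected.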

  6. A Fault Recognition System for Gearboxes of Wind Turbines

    NASA Astrophysics Data System (ADS)

    Yang, Zhiling; Huang, Haiyue; Yin, Zidong

    2017-12-01

    Costs of maintenance and loss of power generation caused by faults of wind turbine gearboxes are the main components of operation costs for a wind farm. Therefore, the technology of condition monitoring and fault recognition for wind turbine gearboxes is becoming a hot topic. A condition monitoring and fault recognition system (CMFRS) is presented for condition-based maintenance (CBM) of wind turbine gearboxes in this paper. The vibration signals from acceleration sensors at different locations of the gearbox and the data from the supervisory control and data acquisition (SCADA) system are collected by CMFRS. Then a feature extraction and optimization algorithm is applied to these operational data. Furthermore, to recognize gearbox faults, the GSO-LSSVR algorithm is proposed, combining the least squares support vector regression machine (LSSVR) with the Glowworm Swarm Optimization (GSO) algorithm. Finally, the results show that the fault recognition system used in this paper has a high rate of identifying three states of wind turbines' gears; moreover, the combination of data features affects the identification rate, and the feature-selection optimization algorithm presented in this paper yields a good data feature subset for fault recognition.

  7. Classifying performance impairment in response to sleep loss using pattern recognition algorithms on single session testing

    PubMed Central

    St. Hilaire, Melissa A.; Sullivan, Jason P.; Anderson, Clare; Cohen, Daniel A.; Barger, Laura K.; Lockley, Steven W.; Klerman, Elizabeth B.

    2012-01-01

    There is currently no “gold standard” marker of cognitive performance impairment resulting from sleep loss. We utilized pattern recognition algorithms to determine which features of data collected under controlled laboratory conditions could most reliably identify cognitive performance impairment in response to sleep loss using data from only one testing session, such as would occur in the “real world” or field conditions. A training set for testing the pattern recognition algorithms was developed using objective Psychomotor Vigilance Task (PVT) and subjective Karolinska Sleepiness Scale (KSS) data collected from laboratory studies during which subjects were sleep deprived for 26 – 52 hours. The algorithm was then tested in data from both laboratory and field experiments. The pattern recognition algorithm was able to identify performance impairment with a single testing session in individuals studied under laboratory conditions using PVT, KSS, length of time awake and time of day information with sensitivity and specificity as high as 82%. When this algorithm was tested on data collected under real-world conditions from individuals whose data were not in the training set, accuracy of predictions for individuals categorized with low performance impairment were as high as 98%. Predictions for medium and severe performance impairment were less accurate. We conclude that pattern recognition algorithms may be a promising method for identifying performance impairment in individuals using only current information about the individual’s behavior. Single testing features (e.g., number of PVT lapses) with high correlation with performance impairment in the laboratory setting may not be the best indicators of performance impairment under real-world conditions. 
Pattern recognition algorithms should be further tested for their ability to be used in conjunction with other assessments of sleepiness in real-world conditions to quantify performance impairment in response to sleep loss. PMID:22959616
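The reported sensitivity/specificity figures come from evaluating a classifier on binary impairment labels. As a hedged sketch, the toy below thresholds a single feature (standing in for, e.g., PVT lapse count; the cutoff and data are invented) and computes both metrics:

```python
# Evaluation sketch: threshold one feature to predict impairment, then
# compute sensitivity (true-positive rate) and specificity (true-negative rate).

def sensitivity_specificity(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp / (tp + fn), tn / (tn + fp)

def threshold_classifier(feature_values, cutoff=5):
    """Predict 'impaired' when the feature meets or exceeds the cutoff."""
    return [x >= cutoff for x in feature_values]
```

The study's actual classifiers combined several features (PVT, KSS, time awake, time of day); a single-feature threshold is the simplest instance of the same evaluation pipeline.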

  8. Children's Recognition of Emotions from Vocal Cues

    ERIC Educational Resources Information Center

    Sauter, Disa A.; Panattoni, Charlotte; Happe, Francesca

    2013-01-01

    Emotional cues contain important information about the intentions and feelings of others. Despite a wealth of research into children's understanding of facial signals of emotions, little research has investigated the developmental trajectory of interpreting affective cues in the voice. In this study, 48 children ranging between 5 and 10 years were…

  9. Must One Be "In Recovery" To Help?

    ERIC Educational Resources Information Center

    Trimpey, Jack

    Rational Recovery (RR) and the Addictive Voice Recognition Technique (AVRT) are described. Rational recovery is a young organization which views alcohol and drug dependency differently from the traditional field which sees addiction as a symptom of something, of a disease, of spiritual bankruptcy, of irrational thinking, of unhappiness, of…

  10. Morphosyntactic Neural Analysis for Generalized Lexical Normalization

    ERIC Educational Resources Information Center

    Leeman-Munk, Samuel Paul

    2016-01-01

    The phenomenal growth of social media, web forums, and online reviews has spurred a growing interest in automated analysis of user-generated text. At the same time, a proliferation of voice recordings and efforts to archive culture heritage documents are fueling demand for effective automatic speech recognition (ASR) and optical character…

  11. Web Surveys to Digital Movies: Technological Tools of the Trade.

    ERIC Educational Resources Information Center

    Fetterman, David M.

    2002-01-01

    Highlights some of the technological tools used by educational researchers today, focusing on data collection related tools such as Web surveys, digital photography, voice recognition and transcription, file sharing and virtual office, videoconferencing on the Internet, instantaneous chat and chat rooms, reporting and dissemination, and digital…

  12. Online Feature Transformation Learning for Cross-Domain Object Category Recognition.

    PubMed

    Zhang, Xuesong; Zhuang, Yan; Wang, Wei; Pedrycz, Witold

    2017-06-09

    In this paper, we introduce a new research problem termed online feature transformation learning in the context of multiclass object category recognition. The learning of a feature transformation is viewed as learning a global similarity metric function in an online manner. We first consider the problem of online learning a feature transformation matrix expressed in the original feature space and propose an online passive-aggressive feature transformation algorithm. Then these original features are mapped to kernel space, and an online single kernel feature transformation (OSKFT) algorithm is developed to learn a nonlinear feature transformation. Based on the OSKFT and the existing Hedge algorithm, a novel online multiple kernel feature transformation algorithm is also proposed, which can further improve the performance of online feature transformation learning in large-scale applications. The classifier is trained with the k-nearest-neighbor algorithm together with the learned similarity metric function. Finally, we experimentally examine the effect of setting different parameter values in the proposed algorithms and evaluate the model performance on several multiclass object recognition data sets. The experimental results demonstrate the validity and good performance of our methods in cross-domain and multiclass object recognition applications.
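The core passive-aggressive update can be sketched in its simplest form. This hedged illustration shows the classic PA update for a linear binary classifier (labels ±1); the paper's metric-learning and kernelized extensions build on the same closed-form step but are omitted here:

```python
# Online passive-aggressive sketch: update only when the hinge loss is
# positive, and move just far enough to satisfy the margin ("aggressive").

def pa_train(samples, n_features):
    w = [0.0] * n_features
    for x, y in samples:                       # y in {+1, -1}
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        loss = max(0.0, 1.0 - margin)          # hinge loss
        if loss > 0.0:
            norm_sq = sum(xi * xi for xi in x)
            tau = loss / norm_sq               # closed-form step size
            w = [wi + tau * y * xi for wi, xi in zip(w, x)]
    return w

def pa_predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```

Each update is "passive" on correctly margin-classified examples and otherwise projects the weights onto the set of solutions with zero hinge loss, which is what makes the online setting cheap.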

  13. Impact of a Telehealth Program With Voice Recognition Technology in Patients With Chronic Heart Failure: Feasibility Study.

    PubMed

    Lee, Heesun; Park, Jun-Bean; Choi, Sae Won; Yoon, Yeonyee E; Park, Hyo Eun; Lee, Sang Eun; Lee, Seung-Pyo; Kim, Hyung-Kwan; Cho, Hyun-Jai; Choi, Su-Yeon; Lee, Hae-Young; Choi, Jonghyuk; Lee, Young-Joon; Kim, Yong-Jin; Cho, Goo-Yeong; Choi, Jinwook; Sohn, Dae-Won

    2017-10-02

    Despite the advances in the diagnosis and treatment of heart failure (HF), the current hospital-oriented framework for HF management does not appear to be sufficient to maintain the stability of HF patients in the long term. The importance of self-care management is increasingly being emphasized as a promising long-term treatment strategy for patients with chronic HF. The objective of this study was to evaluate whether a new information communication technology (ICT)-based telehealth program with voice recognition technology could improve clinical or laboratory outcomes in HF patients. In this prospective single-arm pilot study, we recruited 31 consecutive patients with chronic HF who were referred to our institute. An ICT-based telehealth program with voice recognition technology was developed and used by patients with HF for 12 weeks. Patients were educated on the use of this program via mobile phone, landline, or the Internet for the purpose of improving communication and data collection. Using these systems, we collected comprehensive data elements related to the risk of HF self-care management such as weight, diet, exercise, medication adherence, overall symptom change, and home blood pressure. The study endpoints were the changes observed in urine sodium concentration (uNa), Minnesota Living with Heart Failure (MLHFQ) scores, 6-min walk test, and N-terminal prohormone of brain natriuretic peptide (NT-proBNP) as surrogate markers for appropriate HF management. Among the 31 enrolled patients, 27 (87%) patients completed the study, and 10 (10/27, 37%) showed good adherence to ICT-based telehealth program with voice recognition technology, which was defined as the use of the program for 100 times or more during the study period. Nearly three-fourths of the patients had been hospitalized at least once because of HF before the enrollment (20/27, 74%); 14 patients had 1, 2 patients had 2, and 4 patients had 3 or more previous HF hospitalizations. 
In the total study population, there was no significant interval change in laboratory or functional outcome variables after 12 weeks of the ICT-based telehealth program. In patients with good adherence to the program, there was a significant improvement in mean uNa (103.1 to 78.1; P=.01) but not in those without (85.4 to 96.9; P=.49). Similarly, a marginal improvement in MLHFQ scores was observed only in patients with good adherence (27.5 to 21.4; P=.08), not in their counterparts (19.0 to 19.7; P=.73). Mean 6-min walk distance and NT-proBNP were not significantly changed regardless of adherence. Short-term application of an ICT-based telehealth program with voice recognition technology showed the potential to improve uNa values and MLHFQ scores in HF patients, suggesting that better control of sodium intake and greater quality of life can be achieved with this program. ©Heesun Lee, Jun-Bean Park, Sae Won Choi, Yeonyee E Yoon, Hyo Eun Park, Sang Eun Lee, Seung-Pyo Lee, Hyung-Kwan Kim, Hyun-Jai Cho, Su-Yeon Choi, Hae-Young Lee, Jonghyuk Choi, Young-Joon Lee, Yong-Jin Kim, Goo-Yeong Cho, Jinwook Choi, Dae-Won Sohn. Originally published in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 02.10.2017.

  14. Fast and accurate face recognition based on image compression

    NASA Astrophysics Data System (ADS)

    Zheng, Yufeng; Blasch, Erik

    2017-05-01

    Image compression is desirable for many image-related applications, especially network-based applications with bandwidth and storage constraints. Reports from the face recognition community typically concentrate on the maximal compression rate that does not decrease recognition accuracy. In general, wavelet-based face recognition methods such as EBGM (elastic bunch graph matching) and FPB (face pattern byte) achieve high performance but run slowly due to their high computation demands, while the PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis) algorithms run fast but perform poorly in face recognition. In this paper, we propose a novel face recognition method based on a standard image compression algorithm, termed compression-based (CPB) face recognition. First, all gallery images are compressed by the selected compression algorithm. Second, a mixed image is formed from the probe and gallery images and then compressed. Third, a composite compression ratio (CCR) is computed from three compression ratios calculated from the probe, gallery, and mixed images. Finally, the CCR values are compared, and the largest CCR corresponds to the matched face. The time cost of each face match is about the time of compressing the mixed face image. We tested the proposed CPB method on the "ASUMSS face database" (visible and thermal images) from 105 subjects. The face recognition accuracy with visible images is 94.76% when using JPEG compression; on the same face dataset, the accuracy of the FPB algorithm was reported as 91.43%. The JPEG-compression-based (JPEG-CPB) face recognition is standard and fast, and may be integrated into a real-time imaging device.
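
    The abstract does not give the exact CCR formula; the sketch below (Python, with assumed function names) uses a zlib-based proxy in the spirit of the method: a probe and a gallery image that share content compress better jointly than separately, so the identity with the largest composite score is the match.

```python
import zlib

def csize(data: bytes) -> int:
    """Compressed size in bytes, standing in for a real image codec."""
    return len(zlib.compress(data, 9))

def composite_score(probe: bytes, gallery_img: bytes) -> float:
    """Proxy for the paper's composite compression ratio (CCR): the more
    redundancy the probe shares with a gallery image, the smaller the
    compressed size of the mixed data and the larger this score."""
    return (csize(probe) + csize(gallery_img)) / csize(probe + gallery_img)

def match(probe: bytes, gallery: dict) -> str:
    """Return the gallery identity whose image yields the largest score."""
    return max(gallery, key=lambda name: composite_score(probe, gallery[name]))
```

    The per-match cost is dominated by compressing the mixed data, mirroring the time-cost claim in the abstract.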

  15. A Palmprint Recognition Algorithm Using Phase-Only Correlation

    NASA Astrophysics Data System (ADS)

    Ito, Koichi; Aoki, Takafumi; Nakajima, Hiroshi; Kobayashi, Koji; Higuchi, Tatsuo

    This paper presents a palmprint recognition algorithm using Phase-Only Correlation (POC). The use of phase components of the 2D (two-dimensional) discrete Fourier transforms of palmprint images makes it possible to achieve highly robust image registration and matching. In the proposed algorithm, POC is used to align scaling, rotation, and translation between two palmprint images, and to evaluate the similarity between them. Experimental evaluation using a palmprint image database clearly demonstrates the efficient matching performance of the proposed algorithm.
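
    A minimal NumPy sketch of the core POC operation (function names are illustrative; a full palmprint matcher adds windowing, band-limiting, and the scale/rotation alignment the paper describes): the cross-power spectrum is normalized to unit magnitude so only phase remains, and the inverse transform yields a sharp peak whose height measures similarity and whose position gives the translation.

```python
import numpy as np

def poc_surface(f, g, eps=1e-12):
    """Phase-Only Correlation surface between two same-size images."""
    F, G = np.fft.fft2(f), np.fft.fft2(g)
    cross = F * np.conj(G)
    # Keep only the phase of the cross-power spectrum, then invert.
    return np.real(np.fft.ifft2(cross / (np.abs(cross) + eps)))

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
shifted = np.roll(img, (5, 3), axis=(0, 1))   # translate by (5, 3)
surface = poc_surface(shifted, img)
dy, dx = np.unravel_index(np.argmax(surface), surface.shape)
```

    For identical images under pure translation the peak approaches 1.0; dissimilar palmprints give a flat, low surface.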

  16. Simulation and performance of an artificial retina for 40 MHz track reconstruction

    DOE PAGES

    Abba, A.; Bedeschi, F.; Citterio, M.; ...

    2015-03-05

    We present the results of a detailed simulation of the artificial retina pattern-recognition algorithm, designed to reconstruct events with hundreds of charged-particle tracks in pixel and silicon detectors at LHCb at the LHC crossing frequency of 40 MHz. The performance of the artificial retina algorithm is assessed using the official Monte Carlo samples of the LHCb experiment and found to be comparable with that of the full LHCb reconstruction algorithm.

  17. Economic Evaluation of Voice Recognition (VR) for the Clinician’s Desktop at the Naval Hospital Roosevelt Roads

    DTIC Science & Technology

    1997-09-01

    first PC-based, very large vocabulary dictation system with a continuous natural language free flow approach to speech recognition. (This system allows...indicating the likelihood that a particular stored HMM reference model is the best match for the input. This approach is called the Baum-Welch...InfoCentral, and Envoy 1.0; and Lotus Development Corp.’s SmartSuite 3, Approach 3.0, and Organizer. 2. IBM At a press conference in New York in June 1997, IBM

  18. Voice technology and BBN

    NASA Technical Reports Server (NTRS)

    Wolf, Jared J.

    1977-01-01

    The following research was discussed: (1) speech signal processing; (2) automatic speech recognition; (3) continuous speech understanding; (4) speaker recognition; (5) speech compression; (6) subjective and objective evaluation of speech communication systems; (7) measurement of the intelligibility and quality of speech when degraded by noise or other masking stimuli; (8) speech synthesis; (9) instructional aids for second-language learning and for training of the deaf; and (10) investigation of speech correlates of psychological stress. Also described is work in experimental psychology, control systems, and human factors engineering, which is often relevant to the proper design and operation of speech systems.

  19. Experimental evidence of vocal recognition in young and adult black-legged kittiwakes

    USGS Publications Warehouse

    Mulard, Hervé; Aubin, T.; White, J.F.; Hatch, Shyla A.; Danchin, E.

    2008-01-01

    Individual recognition is required in most social interactions, and its presence has been confirmed in many species. In birds, vocal cues appear to be a major component of recognition. Curiously, vocal recognition seems absent or limited in some highly social species such as the black-legged kittiwake, Rissa tridactyla. Using playback experiments, we found that kittiwake chicks recognized their parents vocally, this capacity being detectable as early as 20 days after hatching, the youngest age tested. Mates also recognized each other's long calls. Some birds reacted to their partner's voice when only a part of the long call was played back. Nevertheless, only about a third of the tested birds reacted to their mate's or parents' call and we were unable to detect recognition among neighbours. We discuss the low reactivity of kittiwakes in relation to their cliff-nesting habit and compare our results with evidence of vocal recognition in other larids. © 2008 The Association for the Study of Animal Behaviour.

  20. Construction site Voice Operated Information System (VOIS) test

    NASA Astrophysics Data System (ADS)

    Lawrence, Debbie J.; Hettchen, William

    1991-01-01

    The Voice Activated Information System (VAIS), developed by USACERL, allows inspectors to verbally log on-site inspection reports on a hand-held tape recorder. The tape is later processed by the VAIS, which enters the information into the system's database and produces a written report. The Voice Operated Information System (VOIS), developed by USACERL and Automated Sciences Group through a USACERL cooperative research and development agreement (CRDA), is an improved voice recognition system based on the concepts and function of the VAIS. To determine the applicability of the VOIS to Corps of Engineers construction projects, Technology Transfer Test Bed (T3B) funds were provided to the Corps of Engineers National Security Agency (NSA) Area Office (Fort Meade) to procure and implement the VOIS, and to train personnel in its use. This report summarizes the NSA application of the VOIS to quality assurance inspection of radio frequency shielding and to progress payment logs, and concludes that the VOIS is an easily implemented system that can offer improvements when applied to repetitive inspection procedures. Use of the VOIS can save time during inspection, improve documentation storage, and provide flexible retrieval of stored information.

  1. Sensing of Particular Speakers for the Construction of Voice Interface Utilized in Noisy Environment

    NASA Astrophysics Data System (ADS)

    Sawada, Hideyuki; Ohkado, Minoru

    Humans are able to exchange information smoothly by voice under difficult conditions, such as a noisy crowd or the presence of several speakers. We are able to detect the position of a sound source in 3D space, extract a particular sound from mixed sounds, and recognize who is talking. Realizing this mechanism with a computer would enable new applications: recording sound with high quality by reducing noise, presenting a clarified sound, and achieving microphone-free speech recognition by extracting a particular sound. This paper introduces real-time detection and identification of a particular speaker in a noisy environment using a microphone array, based on the location of the speaker and individual voice characteristics. The study will be applied to develop an adaptive auditory system for a mobile robot that collaborates with a factory worker.
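
    The abstract does not specify the array-processing method; a common choice for localizing a speaker with a microphone pair is GCC-PHAT time-delay estimation, sketched below under that assumption (names are illustrative). The phase transform whitens the cross-spectrum so only timing information survives, which is what makes the method robust in reverberant, noisy rooms.

```python
import numpy as np

def gcc_phat_delay(x, y, eps=1e-12):
    """Delay of signal y relative to x, in samples, via GCC-PHAT."""
    n = len(x) + len(y)
    X, Y = np.fft.rfft(x, n), np.fft.rfft(y, n)
    cross = np.conj(X) * Y
    # Phase transform: discard magnitude, keep only phase (timing).
    r = np.fft.irfft(cross / (np.abs(cross) + eps), n)
    shift = int(np.argmax(np.abs(r)))
    return shift if shift < n // 2 else shift - n   # wrap negative delays

rng = np.random.default_rng(1)
src = rng.standard_normal(1024)                       # source at microphone 1
delayed = np.concatenate([np.zeros(7), src])[:1024]   # microphone 2: 7 samples later
```

    With a known microphone spacing, the estimated delay converts to an angle of arrival, giving the speaker's direction.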

  2. Audio feature extraction using probability distribution function

    NASA Astrophysics Data System (ADS)

    Suhaib, A.; Wan, Khairunizam; Aziz, Azri A.; Hazry, D.; Razlan, Zuradzman M.; Shahriman A., B.

    2015-05-01

    Voice recognition has been one of the popular applications in the robotics field and has also recently been used in biometric and multimedia information retrieval systems. This technology builds on successive research into audio feature extraction analysis. The Probability Distribution Function (PDF) is a statistical method usually used as one of the steps in complex feature extraction methods such as GMM and PCA. In this paper, a new method for audio feature extraction is proposed that uses the PDF alone as the feature extraction method for speech analysis. Certain pre-processing techniques are performed prior to the proposed feature extraction. Subsequently, the PDF values for each frame of the sampled voice signals, obtained from a number of individuals, are plotted. From the experimental results, it can be seen visually from the plotted data that each individual's voice has comparable PDF values and shapes.
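
    A minimal sketch of the idea, assuming a frame-wise normalized histogram as the PDF estimate (the paper's exact pre-processing and bin choices are not stated): each frame of the signal is reduced to the empirical probability distribution of its sample amplitudes, used directly as the feature vector.

```python
import numpy as np

def pdf_features(signal, frame_len=256, bins=32, amp_range=(-1.0, 1.0)):
    """Per-frame amplitude PDF estimate: a normalized histogram of the
    sample values in each frame, so every row sums to 1."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    feats = np.empty((n_frames, bins))
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=bins, range=amp_range)
        feats[i] = hist / frame_len
    return feats

t = np.arange(4096)
voiced = 0.8 * np.sin(2 * np.pi * 0.01 * t)   # toy stand-in for a voice signal
feats = pdf_features(voiced)
```

    Plotting the rows of `feats` for different speakers reproduces the kind of per-speaker PDF shapes the abstract describes.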

  3. Learning Media Application Based On Microcontroller Chip Technology In Early Age

    NASA Astrophysics Data System (ADS)

    Ika Hidayati, Permata

    2018-04-01

    In early childhood, cognitive development needs the right learning media to help a child's cognitive intelligence develop quickly. The purpose of this study was to design a learning medium, in the form of a doll, that can be used to introduce human anatomy in early childhood. This educational doll uses voice recognition technology, via an EasyVR module, to receive commands from the user to introduce the body parts of the doll, with an LED used as an indicator. In addition to introducing human anatomy, the doll allows the user to play back previously stored voices through its sound recorder module. The results of this study show that the educational doll can detect more than one voice, that spoken commands can be detected in random order, and that the doll can detect sound at distances of up to 2.5 meters.

  4. Vocal fold nodules in adult singers: regional opinions about etiologic factors, career impact, and treatment. A survey of otolaryngologists, speech pathologists, and teachers of singing.

    PubMed

    Hogikyan, N D; Appel, S; Guinn, L W; Haxer, M J

    1999-03-01

    This study was undertaken to better understand current regional opinions regarding vocal fold nodules in adult singers. A questionnaire was sent to 298 persons representing the 3 professional groups most involved with the care of singers with vocal nodules: otolaryngologists, speech pathologists, and teachers of singing. The questionnaire queried respondents about their level of experience with this problem, and their beliefs about causative factors, career impact, and optimum treatment. Responses within and between groups were similar, with differences between groups primarily in the magnitude of positive or negative responses, rather than in the polarity of the responses. Prevailing opinions included: recognition of causative factors in both singing and speaking voice practices, optimism about responsiveness to appropriate treatment, enthusiasm for coordinated voice therapy and voice training as first-line treatment, and acceptance of microsurgical management as appropriate treatment if behavioral management fails.

  5. DCL System Research Using Advanced Approaches for Land-based or Ship-based Real-Time Recognition and Localization of Marine Mammals

    DTIC Science & Technology

    2012-09-30

    recognition. Algorithm design and statistical analysis and feature analysis. Post-Doctoral Associate, Cornell University, Bioacoustics Research...short. The HPC-ADA was designed based on fielded systems [1-4, 6] that offer a variety of desirable attributes, specifically dynamic resource...The software package was designed to utilize parallel and distributed processing for running recognition and other advanced algorithms. DeLMA

  6. Using an Improved SIFT Algorithm and Fuzzy Closed-Loop Control Strategy for Object Recognition in Cluttered Scenes

    PubMed Central

    Nie, Haitao; Long, Kehui; Ma, Jun; Yue, Dan; Liu, Jinguo

    2015-01-01

    Partial occlusions, large pose variations, and extreme ambient illumination conditions generally degrade the performance of object recognition systems. This paper therefore presents a novel approach for fast and robust object recognition in cluttered scenes based on an improved scale invariant feature transform (SIFT) algorithm and a fuzzy closed-loop control method. First, a fast SIFT algorithm is proposed that classifies SIFT features into several clusters based on attributes computed from the sub-orientation histogram (SOH); in the feature matching phase, only features that share nearly the same attributes are compared. Second, feature matching is performed in a prioritized order based on the scale factor calculated between the object image and the target object image, guaranteeing robust feature matching. Finally, a fuzzy closed-loop control strategy is applied to increase the accuracy of object recognition; this is essential for an autonomous object manipulation process. Compared with the original SIFT algorithm for object recognition, the proposed method significantly increases the number of SIFT features extracted from an object, and the computing speed of the object recognition process increases by more than 40%. The experimental results confirm that the proposed method performs effectively and accurately in cluttered scenes. PMID:25714094
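
    The SOH attributes themselves are not given in the abstract; the sketch below illustrates the bucketing idea with a made-up attribute (the index of a descriptor's strongest component) and Lowe's ratio test. Only descriptors sharing the same coarse attribute are ever compared, which is the source of the speed-up the paper reports.

```python
import numpy as np
from collections import defaultdict

def attribute(desc, n_bins=8):
    """Coarse attribute of a descriptor (a stand-in for the paper's
    SOH-derived attributes): index of its largest component, folded."""
    return int(np.argmax(desc)) % n_bins

def bucketed_match(descs_a, descs_b, ratio=0.7):
    """Match A-descriptors to B-descriptors, comparing only features that
    share the same attribute, then applying the ratio test."""
    buckets = defaultdict(list)
    for j, d in enumerate(descs_b):
        buckets[attribute(d)].append(j)
    matches = []
    for i, d in enumerate(descs_a):
        cand = buckets.get(attribute(d), [])
        if len(cand) < 2:
            continue
        dists = sorted((float(np.linalg.norm(d - descs_b[j])), j) for j in cand)
        if dists[0][0] < ratio * dists[1][0]:   # nearest much closer than 2nd
            matches.append((i, dists[0][1]))
    return matches

rng = np.random.default_rng(0)
descs_b = rng.random((40, 32))
descs_a = descs_b + rng.normal(0, 0.005, descs_b.shape)   # noisy copies
matches = bucketed_match(descs_a, descs_b)
```

    Because each query is compared against roughly 1/n_bins of the database, the matching phase scales down proportionally while the ratio test suppresses wrong-bucket matches.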

  7. False match elimination for face recognition based on SIFT algorithm

    NASA Astrophysics Data System (ADS)

    Gu, Xuyuan; Shi, Ping; Shao, Meide

    2011-06-01

    SIFT (Scale Invariant Feature Transform) is a well-known algorithm used to detect and describe local features in images. It is invariant to image scale and rotation, and robust to noise and illumination. In this paper, a novel method for face recognition based on SIFT is proposed that combines SIFT optimization, mutual matching, and Progressive Sample Consensus (PROSAC), and can effectively eliminate the false matches of face recognition. Experiments on the ORL face database show that many false matches can be eliminated and a better recognition rate is achieved.

  8. Interpreting Chicken-Scratch: Lexical Access for Handwritten Words

    PubMed Central

    Barnhart, Anthony S.; Goldinger, Stephen D.

    2014-01-01

    Handwritten word recognition is a field of study that has largely been neglected in the psychological literature, despite its prevalence in society. Whereas studies of spoken word recognition almost exclusively employ natural, human voices as stimuli, studies of visual word recognition use synthetic typefaces, thus simplifying the process of word recognition. The current study examined the effects of handwriting on a series of lexical variables thought to influence bottom-up and top-down processing, including word frequency, regularity, bidirectional consistency, and imageability. The results suggest that the natural physical ambiguity of handwritten stimuli forces a greater reliance on top-down processes, because almost all effects were magnified, relative to conditions with computer print. These findings suggest that processes of word perception naturally adapt to handwriting, compensating for physical ambiguity by increasing top-down feedback. PMID:20695708

  9. [Research on Barrier-free Home Environment System Based on Speech Recognition].

    PubMed

    Zhu, Husheng; Yu, Hongliu; Shi, Ping; Fang, Youfang; Jian, Zhuo

    2015-10-01

    The number of people with physical disabilities is increasing year by year, and the trend of population aging is more and more serious. To improve quality of life, a control system for an accessible home environment was developed to let patients with serious disabilities control home electrical devices with their voice. The control system includes a central control platform, a speech recognition module, a terminal operation module, etc. The system combines speech recognition control technology and wireless information transmission technology with embedded mobile computing technology, and interconnects the lamps, electronic locks, alarms, TV, and other electrical devices in the home environment into a whole system through wireless network nodes. The experimental results showed a speech recognition success rate of more than 84% in the home environment.

  10. Neural system for heartbeats recognition using genetically integrated ensemble of classifiers.

    PubMed

    Osowski, Stanislaw; Siwek, Krzysztof; Siroic, Robert

    2011-03-01

    This paper presents the application of a genetic algorithm to the integration of neural classifiers combined in an ensemble for the accurate recognition of heartbeat types on the basis of ECG registration. The idea presented in this paper is that using many classifiers arranged in an ensemble increases recognition accuracy. In such an ensemble, the important problem is the integration of all classifiers into one effective classification system, for which this paper proposes a genetic algorithm. It was shown that the genetic algorithm is very efficient and significantly reduces the total error of heartbeat recognition. This was confirmed by numerical experiments performed on the MIT-BIH Arrhythmia Database. Copyright © 2011 Elsevier Ltd. All rights reserved.
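
    The paper's chromosome encoding and fitness function are not reproduced here; this is a toy sketch of the integration idea under the assumption that the genetic algorithm evolves the weights of a weighted vote over simulated classifier outputs (all data and names below are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated 2-class problem: five classifiers of varying accuracy.
y_true = rng.integers(0, 2, 300)
accuracies = [0.90, 0.85, 0.60, 0.55, 0.52]
preds = np.stack([np.where(rng.random(300) < a, y_true, 1 - y_true)
                  for a in accuracies], axis=1)

def ensemble_error(w):
    """Misclassification rate of the weighted-vote ensemble."""
    y_hat = (preds @ w > w.sum() / 2).astype(int)
    return float(np.mean(y_hat != y_true))

def evolve_weights(pop_size=30, gens=40, sigma=0.1):
    """Tiny GA: truncation selection, blend crossover, Gaussian mutation,
    keeping the best-ever weight vector as the result (elitism)."""
    pop = rng.random((pop_size, preds.shape[1]))
    best_w, best_err = None, np.inf
    for _ in range(gens):
        fit = np.array([ensemble_error(w) for w in pop])
        if fit.min() < best_err:
            best_err, best_w = float(fit.min()), pop[np.argmin(fit)].copy()
        elite = pop[np.argsort(fit)[:pop_size // 2]]        # keep best half
        pa = elite[rng.integers(0, len(elite), pop_size)]
        pb = elite[rng.integers(0, len(elite), pop_size)]
        mix = rng.random((pop_size, 1))
        pop = np.clip(mix * pa + (1 - mix) * pb
                      + rng.normal(0, sigma, (pop_size, preds.shape[1])), 0, None)
    return best_w, best_err

best_w, best_err = evolve_weights()
```

    On this toy data the evolved weighting downweights the weak classifiers and beats a naive equal-weight vote, which is the effect the paper reports for heartbeat recognition.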

  11. Incorporating Speech Recognition into a Natural User Interface

    NASA Technical Reports Server (NTRS)

    Chapa, Nicholas

    2017-01-01

    The Augmented/Virtual Reality (AVR) Lab has been working to study the applicability of recent virtual and augmented reality hardware and software to KSC operations. This includes the Oculus Rift, HTC Vive, Microsoft HoloLens, and the Unity game engine. My project in this lab is to integrate voice recognition and voice commands into an easy-to-modify system that can be added to an existing portion of a Natural User Interface (NUI). A NUI is an intuitive and simple-to-use interface incorporating visual, touch, and speech recognition. The inclusion of speech recognition capability will allow users to perform actions or make inquiries using only their voice. The simplicity of needing only to speak to control an on-screen object or enact some digital action means that any user can quickly become accustomed to using this system. Multiple programs were tested for use in a speech command and recognition system. Sphinx4 translates speech to text using a Hidden Markov Model (HMM)-based language model, an acoustic model, and a word dictionary, running on Java. PocketSphinx had similar functionality to Sphinx4 but instead ran on C. However, neither of these programs was ideal, as building a Java or C wrapper slowed performance. The most suitable speech recognition system tested was the Unity Engine Grammar Recognizer. A Context Free Grammar (CFG) structure is written in an XML file to specify the structure of phrases and words that will be recognized by the Unity Grammar Recognizer. Using Speech Recognition Grammar Specification (SRGS) 1.0 makes modifying the recognized combinations of words and phrases very simple and quick. With SRGS 1.0, semantic information can also be added to the XML file, which allows even more control over how spoken words and phrases are interpreted by Unity. Additionally, using a CFG with SRGS 1.0 produces Finite State Machine (FSM) behavior, limiting the potential for incorrectly heard words or phrases. 
The purpose of my project was to investigate options for a speech recognition system. To that end, I attempted to integrate Sphinx4 into a user interface. Sphinx4 had great accuracy and is the only free program able to perform offline speech dictation. However, it had a limited dictionary of words that could be recognized, single-syllable words were almost impossible for it to hear, and since it ran on Java it could not be integrated into the Unity-based NUI. PocketSphinx ran much faster than Sphinx4, which would have made it ideal as a plugin to the Unity NUI; unfortunately, creating a C# wrapper for the C code made the program unusable with Unity, as the wrapper slowed code execution and class files became unreachable. The Unity Grammar Recognizer is the most suitable speech recognition interface: it is flexible in recognizing multiple variations of the same command, and it is also the most accurate program at recognizing speech because it uses an XML grammar to specify speech structure instead of relying solely on a dictionary and language model. The Unity Grammar Recognizer will be used with the NUI for these reasons, and being written in C# further simplifies its incorporation.
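
    The report's SRGS grammar is not reproduced here; as a toy illustration of why a fixed phrase grammar constrains recognition (the rule and vocabulary below are hypothetical), a slot-per-word grammar can be matched like this: an utterance is accepted only if every word falls in its slot's allowed set, so out-of-grammar phrases are rejected outright rather than mis-heard.

```python
# Toy phrase grammar in the spirit of a CFG/SRGS rule: each slot lists the
# words it accepts; an utterance is recognized only if every slot matches.
GRAMMAR = {
    "move_object": [("move", "shift"), ("the",), ("cube", "panel"), ("left", "right")],
}

def recognize(utterance):
    """Return the matching rule name, or None if no rule accepts the phrase."""
    words = utterance.lower().split()
    for name, slots in GRAMMAR.items():
        if len(words) == len(slots) and all(w in s for w, s in zip(words, slots)):
            return name
    return None
```

    This is the FSM-like behavior the report attributes to SRGS 1.0: the recognizer can only land on phrases the grammar enumerates.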

  12. Success with voice recognition.

    PubMed

    Sferrella, Sheila M

    2003-01-01

    You need a compelling reason to implement voice recognition technology. At my institution, the compelling reason was a turnaround time for Radiology results of more than two days. Only 41 percent of our reports were transcribed and signed within 24 hours. In November 1998, a team from Lehigh Valley Hospital went to RSNA and reviewed every voice system on the market. The evaluation was done with the radiologist workflow in mind, and we came back from the meeting with the vendor selection completed. The next steps included developing a business plan, approval of funds, reference calls to more than 15 sites and contract negotiation, all of which took about six months. The department of Radiology at Lehigh Valley Hospital and Health Network (LVHHN) is a multi-site center that performs over 360,000 procedures annually. The department handles all modalities of radiology: general diagnosis, neuroradiology, ultrasound, CT Scan, MRI, interventional radiology, arthrography, myelography, bone densitometry, nuclear medicine, PET imaging, vascular lab and other advanced procedures. The department consists of 200 FTEs and a medical staff of more than 40 radiologists. The budget is in the $10.3 million range. There are three hospital sites and four outpatient imaging center sites where services are provided. At Lehigh Valley Hospital, radiologists are not dedicated to one subspecialty, so implementing a voice system by modality was not an option. Because transcription was so far behind, we needed to eliminate that part of the process. As a result, we decided to deploy the system all at once and with the radiologists as editors. The planning and testing phase took about four months, and the implementation took two weeks. We deployed over 40 workstations and trained close to 50 physicians. The radiologists brought in an extra radiologist from our group for the two weeks of training. That allowed us to train without taking a radiologist out of the department. 
We trained three to six radiologists a day. I projected a savings of 5.0 FTEs over two years. The actual savings were 8.0 FTEs within three weeks for the first phase and an additional 4.3 FTEs within two weeks of the second phase. The transcription staff was retained to perform other types of transcription and not displaced. The goal was to reduce Medical Records' outsourcing expenses by $670,000 over three years. The actual savings are in excess of $900,000. The proposed payback period was 17 months, and the actual was less than 12 months. For two years prior to implementing the voice system, the turnaround time at Lehigh Valley was 41 percent within 24 hours. One week after implementation, the turnaround time was 78 percent within 24 hours. Today it ranges between 85 percent and 92 percent. Overall, the radiologists at Lehigh Valley Hospital did an excellent job with the cultural change to voice recognition. It has made a major impact on our ability to get reports to physicians in a timely manner so they can make treatment decisions.

  13. Error-dependent modulation of speech-induced auditory suppression for pitch-shifted voice feedback.

    PubMed

    Behroozmand, Roozbeh; Larson, Charles R

    2011-06-06

    The motor-driven predictions about expected sensory feedback (efference copies) have been proposed to play an important role in recognition of sensory consequences of self-produced motor actions. In the auditory system, this effect was suggested to result in suppression of sensory neural responses to self-produced voices that are predicted by the efference copies during vocal production in comparison with passive listening to the playback of the identical self-vocalizations. In the present study, event-related potentials (ERPs) were recorded in response to upward pitch shift stimuli (PSS) with five different magnitudes (0, +50, +100, +200 and +400 cents) at voice onset during active vocal production and passive listening to the playback. Results indicated that the suppression of the N1 component during vocal production was largest for unaltered voice feedback (PSS: 0 cents), became smaller as the magnitude of PSS increased to 200 cents, and was almost completely eliminated in response to 400 cents stimuli. Findings of the present study suggest that the brain utilizes the motor predictions (efference copies) to determine the source of incoming stimuli and maximally suppresses the auditory responses to unaltered feedback of self-vocalizations. The reduction of suppression for 50, 100 and 200 cents and its elimination for 400 cents pitch-shifted voice auditory feedback support the idea that motor-driven suppression of voice feedback leads to distinctly different sensory neural processing of self vs. non-self vocalizations. This characteristic may enable the audio-vocal system to more effectively detect and correct for unexpected errors in the feedback of self-produced voice pitch compared with externally-generated sounds.

  14. Real-time polarization imaging algorithm for camera-based polarization navigation sensors.

    PubMed

    Lu, Hao; Zhao, Kaichun; You, Zheng; Huang, Kaoli

    2017-04-10

    Biologically inspired polarization navigation is a promising approach due to its autonomous nature, high precision, and robustness. Many researchers have built point-source-based and camera-based polarization navigation prototypes in recent years. Camera-based prototypes benefit from high spatial resolution but incur a heavy computation load, and the pattern recognition step in most polarization imaging algorithms involves several nonlinear calculations that impose a significant computation burden. In this paper, the polarization imaging and pattern recognition algorithms are optimized by reducing them to several linear calculations, exploiting the orthogonality of the Stokes parameters together with the features of the solar meridian and the patterns of the polarized skylight, without affecting precision. The algorithm contains a pattern recognition step based on a Hough transform as well as orientation measurement algorithms. The algorithm was loaded and run on a digital signal processing system to test its computational complexity; the running time decreased from several thousand milliseconds to several tens of milliseconds. Simulations and experiments showed that the algorithm can measure orientation without reducing precision, and it can hence satisfy the practical demands of low computational load and high precision for use in embedded systems.
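
    A common four-channel formulation consistent with the "linear calculations" claim (the paper's exact channel layout is not given here): from intensities behind linear polarizers at 0°, 45°, 90°, and 135°, the linear Stokes parameters follow from additions and subtractions only, and the angle of polarization follows from a single arctangent per pixel.

```python
import math

def stokes_from_quad(i0, i45, i90, i135):
    """Linear Stokes parameters from four polarizer-channel intensities.
    Only additions/subtractions: this is the linear step being exploited."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)
    s1 = i0 - i90
    s2 = i45 - i135
    return s0, s1, s2

def aop_dolp(s0, s1, s2):
    """Angle and degree of linear polarization from the Stokes vector."""
    aop = 0.5 * math.atan2(s2, s1)
    dolp = math.hypot(s1, s2) / s0
    return aop, dolp

def channel(intensity, dolp, phi, theta):
    """Ideal intensity behind a polarizer at angle theta (Malus's law)."""
    return 0.5 * intensity * (1 + dolp * math.cos(2 * (theta - phi)))
```

    The per-pixel angle-of-polarization map is what the Hough transform then searches for the solar meridian line.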

  15. Image-algebraic design of multispectral target recognition algorithms

    NASA Astrophysics Data System (ADS)

    Schmalz, Mark S.; Ritter, Gerhard X.

    1994-06-01

    In this paper, we discuss methods for multispectral ATR (Automated Target Recognition) of small targets that are sensed under suboptimal conditions, such as haze, smoke, and low light levels. In particular, we discuss our ongoing development of algorithms and software that effect intelligent object recognition by selecting ATR filter parameters according to ambient conditions. Our algorithms are expressed in terms of IA (image algebra), a concise, rigorous notation that unifies linear and nonlinear mathematics in the image processing domain. IA has been implemented on a variety of parallel computers, with preprocessors available for the Ada and FORTRAN languages. An image algebra C++ class library has recently been made available. Thus, our algorithms are both feasible implementationally and portable to numerous machines. Analyses emphasize the aspects of image algebra that aid the design of multispectral vision algorithms, such as parameterized templates that facilitate the flexible specification of ATR filters.

  16. Design of Phoneme MIDI Codes Using the MIDI Encoding Tool “Auto-F” and Realizing Voice Synthesizing Functions Based on Musical Sounds

    NASA Astrophysics Data System (ADS)

    Modegi, Toshio

    Using our previously developed audio-to-MIDI code converter tool "Auto-F", we can create MIDI data from given vocal acoustic signals, enabling playback of voice-like signals on a standard MIDI synthesizer. Applying this tool, we are constructing a MIDI database consisting of simple harmonic-structured MIDI codes converted from recordings of 71 Japanese syllables spoken by male and female speakers. We are also developing a novel voice synthesizing system based on harmonically synthesizing musical sounds, which can generate MIDI data and play back voice signals on a MIDI synthesizer from Japanese plain (kana) text by referring to the syllable MIDI code database. In this paper, we propose an improved MIDI converter tool that can produce temporally higher-resolution MIDI codes. We then propose an algorithm that separates a set of 20 consonant and vowel phoneme MIDI codes from the 71 converted syllable MIDI codes in order to construct a voice synthesizing system. Finally, we present evaluation results comparing the voice synthesis quality of these separated phoneme MIDI codes with that of their original syllable MIDI codes, using our 4-syllable word listening tests.
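
    The pitch-to-note mapping underlying any audio-to-MIDI converter is the standard equal-temperament formula (these helpers are illustrative, not the Auto-F implementation): note = 69 + 12·log2(f/440), with A4 = 440 Hz assigned MIDI note 69.

```python
import math

def hz_to_midi(freq_hz: float) -> int:
    """Nearest MIDI note number for a frequency (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))

def midi_to_hz(note: int) -> float:
    """Center frequency of a MIDI note number."""
    return 440.0 * 2 ** ((note - 69) / 12)
```

    Quantizing a voiced pitch track through this mapping is what limits temporal and pitch resolution, motivating the higher-resolution MIDI codes the paper proposes.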

  17. Impact of the codec and various QoS methods on the final quality of the transferred voice in an IP network

    NASA Astrophysics Data System (ADS)

    Slavata, Oldřich; Holub, Jan

    2015-02-01

    This paper deals with an analysis of the relation between the codec that is used, the QoS method, and the final voice transmission quality. The Cisco 2811 router is used for adjusting QoS. VoIP client Linphone is used for adjusting the codec. The criterion for transmission quality is the MOS parameter investigated with the ITU-T P.862 PESQ and P.863 POLQA algorithms.

  18. Speaker Recognition Through NLP and CWT Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown-VanHoozer, S.A.; Kercel, S.W.; Tucker, R.W.

    The objective of this research is to develop a system capable of identifying speakers on wiretaps from a large database (>500 speakers) with a short search time (<30 seconds), and with better than 90% accuracy. Much previous research in speaker recognition has led to algorithms that produced encouraging preliminary results, but were overwhelmed when applied to populations of more than a dozen or so different speakers. The authors are investigating a solution to the "large population" problem by seeking two completely different kinds of characterizing features. These features are extracted using the techniques of Neuro-Linguistic Programming (NLP) and the continuous wavelet transform (CWT). NLP extracts precise neurological, verbal and non-verbal information, and assimilates the information into useful patterns. These patterns are based on specific cues demonstrated by each individual, and provide ways of determining congruency between verbal and non-verbal cues. The primary NLP modalities are characterized through word spotting (verbal predicate cues, e.g., see, sound, feel), while the secondary modalities are characterized through the speech transcription used by the individual. This has the practical effect of reducing the size of the search space, and greatly speeding up the process of identifying an unknown speaker. The wavelet-based line of investigation concentrates on using vowel phonemes and non-verbal cues, such as tempo. The rationale for concentrating on vowels is that there are a limited number of vowel phonemes, and at least one of them usually appears in even the shortest of speech segments. Using the fast CWT algorithm, the details of both the formant frequency and the glottal excitation characteristics can be easily extracted from voice waveforms. The differences in the glottal excitation waveforms as well as the formant frequency are evident in the CWT output. More significantly, the CWT reveals significant detail of the glottal excitation waveform.

  19. Speaker recognition through NLP and CWT modeling.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown-VanHoozer, A.; Kercel, S. W.; Tucker, R. W.

    The objective of this research is to develop a system capable of identifying speakers on wiretaps from a large database (>500 speakers) with a short search time (<30 seconds), and with better than 90% accuracy. Much previous research in speaker recognition has led to algorithms that produced encouraging preliminary results, but were overwhelmed when applied to populations of more than a dozen or so different speakers. The authors are investigating a solution to the "huge population" problem by seeking two completely different kinds of characterizing features. These features are extracted using the techniques of Neuro-Linguistic Programming (NLP) and the continuous wavelet transform (CWT). NLP extracts precise neurological, verbal and non-verbal information, and assimilates the information into useful patterns. These patterns are based on specific cues demonstrated by each individual, and provide ways of determining congruency between verbal and non-verbal cues. The primary NLP modalities are characterized through word spotting (verbal predicate cues, e.g., see, sound, feel), while the secondary modalities are characterized through the speech transcription used by the individual. This has the practical effect of reducing the size of the search space, and greatly speeding up the process of identifying an unknown speaker. The wavelet-based line of investigation concentrates on using vowel phonemes and non-verbal cues, such as tempo. The rationale for concentrating on vowels is that there are a limited number of vowel phonemes, and at least one of them usually appears in even the shortest of speech segments. Using the fast CWT algorithm, the details of both the formant frequency and the glottal excitation characteristics can be easily extracted from voice waveforms. The differences in the glottal excitation waveforms as well as the formant frequency are evident in the CWT output. More significantly, the CWT reveals significant detail of the glottal excitation waveform.

  20. Graphics with Special Interfaces for Disabled People.

    ERIC Educational Resources Information Center

    Tronconi, A.; And Others

    The paper describes new software and special input devices to allow physically impaired children to utilize the graphic capabilities of personal computers. Special input devices for computer graphics access--the voice recognition card, the single switch, or the mouse emulator--can be used either singly or in combination by the disabled to control…

  1. Failing To Marvel: The Nuances, Complexities, and Challenges of Multicultural Education.

    ERIC Educational Resources Information Center

    Simoes de Carvalho, Paulo M.

    1998-01-01

    Reviews the complex nature of multicultural education, which, even as it advocates recognition of the values of many cultures, is nevertheless grounded in Western culture and subject to Western deconstruction. Considers the challenge of the multicultural educator to recognize his or her own voice as representative of the dominant culture. (SLD)

  2. Blood Memory and the Arts: Indigenous Genealogies and Imagined Truths

    ERIC Educational Resources Information Center

    Mithlo, Nancy Marie

    2011-01-01

    Contemporary Native arts are rarely included in global arts settings that highlight any number of other disenfranchised artists seeking to gain recognition and a voice in the form of critical exhibition practice or scholarship. This article argues that Native artists can benefit from an increased participation in these broader arts networks, given…

  3. Effect of Technological Changes in Information Transfer on the Delivery of Pharmacy Services.

    ERIC Educational Resources Information Center

    Barker, Kenneth N.; And Others

    1989-01-01

    Personal computer technology has arrived in health care. Specific technological advances are optical disc storage, smart cards, voice recognition, and robotics. This paper discusses computers in medicine, in nursing, in conglomerates, and with patients. Future health care will be delivered in primary care centers, medical supermarkets, specialized…

  4. Speech Recognition in Fluctuating and Continuous Maskers: Effects of Hearing Loss and Presentation Level.

    ERIC Educational Resources Information Center

    Summers, Van; Molis, Michelle R.

    2004-01-01

    Listeners with normal-hearing sensitivity recognize speech more accurately in the presence of fluctuating background sounds, such as a single competing voice, than in unmodulated noise at the same overall level. These performance differences are greatly reduced in listeners with hearing impairment, who generally receive little benefit from…

  5. 78 FR 17276 - Agency Information Collection Activities: Proposed Request and Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-03-20

    ... information collection in field offices via personal contact (face-to-face or telephone interview) using the... voice recognition technology, or by keying in responses using a telephone key pad. The SSIMWR allows... Development Worksheets: Face-to-Face Interview and Telephone Interview--20 CFR 416.204(b) and 422.135--0960...

  6. 77 FR 76591 - Agency Information Collection Activities: Proposed Request and Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-12-28

    ... voice recognition technology, or by keying in responses using a telephone key pad. The SSIMWR allows... Worksheets: Face-to-Face Interview and Telephone Interview--20 CFR 416.204(b) and 422.135--0960- 0780. SSA... each interview either over the telephone or through a face-to-face discussion with the centenarian...

  7. Automatic Speech Acquisition and Recognition for Spacesuit Audio Systems

    NASA Technical Reports Server (NTRS)

    Ye, Sherry

    2015-01-01

    NASA has a widely recognized but unmet need for novel human-machine interface technologies that can facilitate communication during astronaut extravehicular activities (EVAs), when loud noises and strong reverberations inside spacesuits make communication challenging. WeVoice, Inc., has developed a multichannel signal-processing method for speech acquisition in noisy and reverberant environments that enables automatic speech recognition (ASR) technology inside spacesuits. The technology reduces noise by exploiting differences between the statistical nature of signals (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, ASR accuracy can be improved to the level at which crewmembers will find the speech interface useful. System components and features include beam forming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, and ASR decoding. Arithmetic complexity models were developed and will help designers of real-time ASR systems select proper tasks when confronted with constraints in computational resources. In Phase I of the project, WeVoice validated the technology. The company further refined the technology in Phase II and developed a prototype for testing and use by suited astronauts.
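    The beamforming/multichannel noise-reduction component can be illustrated by the classic delay-and-sum idea, sketched below in plain Python. This is a simplified stand-in for the described system, not WeVoice's algorithm; the integer sample delays are an assumption.

```python
def delay_and_sum(channels, delays):
    """Delay-and-sum beamforming with integer sample delays.

    channels: equal-length sample lists from the microphone array;
    delays[i]: how many samples channel i lags the reference. Shifting
    each channel by its delay aligns the speech, which then adds
    coherently, while spatially uncorrelated noise averages toward zero.
    """
    n = min(len(c) - d for c, d in zip(channels, delays))
    return [sum(c[t + d] for c, d in zip(channels, delays)) / len(channels)
            for t in range(n)]
```

    Real systems estimate the delays from the array geometry or from cross-correlation and operate on fractional-sample, frequency-domain filters, but the SNR gain comes from the same alignment principle.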

  8. Pupils dilate for vocal or familiar music.

    PubMed

    Weiss, Michael W; Trehub, Sandra E; Schellenberg, E Glenn; Habashi, Peter

    2016-08-01

    Previous research reveals that vocal melodies are remembered better than instrumental renditions. Here we explored the possibility that the voice, as a highly salient stimulus, elicits greater arousal than nonvocal stimuli, resulting in greater pupil dilation for vocal than for instrumental melodies. We also explored the possibility that pupil dilation indexes memory for melodies. We tracked pupil dilation during a single exposure to 24 unfamiliar folk melodies (half sung to la la, half piano) and during a subsequent recognition test in which the previously heard melodies were intermixed with 24 novel melodies (half sung, half piano) from the same corpus. Pupil dilation was greater for vocal melodies than for piano melodies in the exposure phase and in the test phase. It was also greater for previously heard melodies than for novel melodies. Our findings provide the first evidence that pupillometry can be used to measure recognition of stimuli that unfold over several seconds. They also provide the first evidence of enhanced arousal to vocal melodies during encoding and retrieval, thereby supporting the more general notion of the voice as a privileged signal.

  9. Approximated mutual information training for speech recognition using myoelectric signals.

    PubMed

    Guo, Hua J; Chan, A D C

    2006-01-01

    A new training algorithm called the approximated maximum mutual information (AMMI) is proposed to improve the accuracy of myoelectric speech recognition using hidden Markov models (HMMs). Previous studies have demonstrated that automatic speech recognition can be performed using myoelectric signals from articulatory muscles of the face. Classification of facial myoelectric signals can be performed using HMMs that are trained using the maximum likelihood (ML) algorithm; however, this algorithm maximizes the likelihood of the observations in the training sequence, which is not directly associated with optimal classification accuracy. The AMMI training algorithm attempts to maximize the mutual information, thereby training the HMMs to optimize their parameters for discrimination. Our results show that AMMI training consistently reduces the error rates compared to those obtained with ML training, increasing the accuracy by approximately 3% on average.

  10. A Review on State-of-the-Art Face Recognition Approaches

    NASA Astrophysics Data System (ADS)

    Mahmood, Zahid; Muhammad, Nazeer; Bibi, Nargis; Ali, Tauseef

    Automatic Face Recognition (FR) presents a challenging task in the field of pattern recognition, and despite decades of intensive research it remains an open problem. This is primarily due to variability in facial images, such as non-uniform illumination, low resolution, occlusion, and/or variation in pose. Due to its non-intrusive nature, FR is an attractive biometric modality and has gained a lot of attention in the biometric research community. Driven by the enormous number of potential application domains, many algorithms have been proposed for FR. This paper presents an overview of state-of-the-art FR algorithms, focusing on their performance on publicly available databases. We highlight the conditions of the image databases with regard to the recognition rate of each approach. This is useful as a quick research overview and helps practitioners choose an algorithm for their specific FR application. To provide a comprehensive survey, the paper divides FR algorithms into three categories: (1) intensity-based, (2) video-based, and (3) 3D-based FR algorithms. In each category, the most commonly used algorithms and their performance are reported on standard face databases, and a brief critical discussion is carried out.

  11. Toward open set recognition.

    PubMed

    Scheirer, Walter J; de Rezende Rocha, Anderson; Sapkota, Archana; Boult, Terrance E

    2013-07-01

    To date, almost all experimental evaluations of machine learning-based recognition algorithms in computer vision have taken the form of "closed set" recognition, whereby all testing classes are known at training time. A more realistic scenario for vision applications is "open set" recognition, where incomplete knowledge of the world is present at training time, and unknown classes can be submitted to an algorithm during testing. This paper explores the nature of open set recognition and formalizes its definition as a constrained minimization problem. The open set recognition problem is not well addressed by existing algorithms because it requires strong generalization. As a step toward a solution, we introduce a novel "1-vs-set machine," which sculpts a decision space from the marginal distances of a 1-class or binary SVM with a linear kernel. This methodology applies to several different applications in computer vision where open set recognition is a challenging problem, including object recognition and face verification. We consider both in this work, with large scale cross-dataset experiments performed over the Caltech 256 and ImageNet sets, as well as face matching experiments performed over the Labeled Faces in the Wild set. The experiments highlight the effectiveness of machines adapted for open set evaluation compared to existing 1-class and binary SVMs for the same tasks.
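    The reject-option behavior that distinguishes open set recognition can be sketched minimally in Python. This thresholding stand-in is far simpler than the paper's 1-vs-set machine; the class names, scores, and threshold are illustrative assumptions.

```python
def open_set_predict(scores, threshold):
    """Closed-set argmax with an open-set reject option.

    scores: class -> decision score (e.g. an SVM margin). The best class
    is returned only if its score clears the threshold; otherwise the
    sample is labeled as an unknown class, the behavior that separates
    open set recognition from conventional closed-set classification.
    """
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "unknown"
```

    A closed-set classifier would always return `best`; the rejection branch is what lets the system cope with classes never seen at training time.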

  12. Canadian Association of Radiologists White Paper on Artificial Intelligence in Radiology.

    PubMed

    Tang, An; Tam, Roger; Cadrin-Chênevert, Alexandre; Guest, Will; Chong, Jaron; Barfett, Joseph; Chepelev, Leonid; Cairns, Robyn; Mitchell, J Ross; Cicero, Mark D; Poudrette, Manuel Gaudreau; Jaremko, Jacob L; Reinhold, Caroline; Gallix, Benoit; Gray, Bruce; Geis, Raym

    2018-05-01

    Artificial intelligence (AI) is rapidly moving from an experimental phase to an implementation phase in many fields, including medicine. The combination of improved availability of large datasets, increasing computing power, and advances in learning algorithms has created major performance breakthroughs in the development of AI applications. In the last 5 years, AI techniques known as deep learning have delivered rapidly improving performance in image recognition, caption generation, and speech recognition. Radiology, in particular, is a prime candidate for early adoption of these techniques. It is anticipated that the implementation of AI in radiology over the next decade will significantly improve the quality, value, and depth of radiology's contribution to patient care and population health, and will revolutionize radiologists' workflows. The Canadian Association of Radiologists (CAR) is the national voice of radiology committed to promoting the highest standards in patient-centered imaging, lifelong learning, and research. The CAR has created an AI working group with the mandate to discuss and deliberate on practice, policy, and patient care issues related to the introduction and implementation of AI in imaging. This white paper provides recommendations for the CAR derived from deliberations between members of the AI working group. This white paper on AI in radiology will inform CAR members and policymakers on key terminology, educational needs of members, research and development, partnerships, potential clinical applications, implementation, structure and governance, role of radiologists, and potential impact of AI on radiology in Canada.

  13. Object Recognition and Localization: The Role of Tactile Sensors

    PubMed Central

    Aggarwal, Achint; Kirchner, Frank

    2014-01-01

    Tactile sensors, because of their intrinsic insensitivity to lighting conditions and water turbidity, provide promising opportunities for augmenting the capabilities of vision sensors in applications involving object recognition and localization. This paper presents two approaches for haptic object recognition and localization for ground and underwater environments. The first approach called Batch Ransac and Iterative Closest Point augmented Particle Filter (BRICPPF) is based on an innovative combination of particle filters, Iterative-Closest-Point algorithm, and a feature-based Random Sampling and Consensus (RANSAC) algorithm for database matching. It can handle a large database of 3D-objects of complex shapes and performs a complete six-degree-of-freedom localization of static objects. The algorithms are validated by experimentation in ground and underwater environments using real hardware. To our knowledge this is the first instance of haptic object recognition and localization in underwater environments. The second approach is biologically inspired, and provides a close integration between exploration and recognition. An edge following exploration strategy is developed that receives feedback from the current state of recognition. A recognition by parts approach is developed which uses the BRICPPF for object sub-part recognition. Object exploration is either directed to explore a part until it is successfully recognized, or is directed towards new parts to endorse the current recognition belief. This approach is validated by simulation experiments. PMID:24553087

  14. An Autonomous Star Identification Algorithm Based on One-Dimensional Vector Pattern for Star Sensors

    PubMed Central

    Luo, Liyan; Xu, Luping; Zhang, Hua

    2015-01-01

    In order to enhance the robustness and accelerate the recognition speed of star identification, an autonomous star identification algorithm for star sensors is proposed based on the one-dimensional vector pattern (one_DVP). In the proposed algorithm, the space geometry information of the observed stars is used to form the one-dimensional vector pattern of the observed star. The one-dimensional vector pattern of the same observed star remains unchanged when the stellar image rotates, so the problem of star identification is simplified as the comparison of the two feature vectors. The one-dimensional vector pattern is adopted to build the feature vector of the star pattern, which makes it possible to identify the observed stars robustly. The characteristics of the feature vector and the proposed search strategy for the matching pattern make it possible to achieve the recognition result as quickly as possible. The simulation results demonstrate that the proposed algorithm can effectively accelerate the star identification. Moreover, the recognition accuracy and robustness by the proposed algorithm are better than those by the pyramid algorithm, the modified grid algorithm, and the LPT algorithm. The theoretical analysis and experimental results show that the proposed algorithm outperforms the other three star identification algorithms. PMID:26198233
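    The rotation-invariance argument can be illustrated with a small sketch: distances between star centroids do not change when the frame rolls, so sorting them yields a feature vector that survives rotation. This is an illustrative simplification using plain inter-star distances, not the paper's exact one_DVP construction; the catalog identifiers are hypothetical.

```python
import math

def one_d_vector_pattern(center, neighbors):
    """Sorted distances from a reference star to its neighbor stars.

    Distances between star centroids are unchanged when the stellar
    image rotates, so the same observed star always yields the same
    one-dimensional feature vector.
    """
    return sorted(math.dist(center, p) for p in neighbors)

def identify(pattern, catalog, tol=1e-3):
    """Return the catalog entry whose stored pattern matches within tol."""
    best, best_err = None, tol
    for star_id, cat_pattern in catalog.items():
        if len(cat_pattern) != len(pattern):
            continue
        err = max(abs(a - b) for a, b in zip(pattern, cat_pattern))
        if err < best_err:
            best, best_err = star_id, err
    return best
```

    Star identification then reduces to comparing two feature vectors, which is what makes the fast search strategy described above possible.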

  15. An Autonomous Star Identification Algorithm Based on One-Dimensional Vector Pattern for Star Sensors.

    PubMed

    Luo, Liyan; Xu, Luping; Zhang, Hua

    2015-07-07

    In order to enhance the robustness and accelerate the recognition speed of star identification, an autonomous star identification algorithm for star sensors is proposed based on the one-dimensional vector pattern (one_DVP). In the proposed algorithm, the space geometry information of the observed stars is used to form the one-dimensional vector pattern of the observed star. The one-dimensional vector pattern of the same observed star remains unchanged when the stellar image rotates, so the problem of star identification is simplified as the comparison of the two feature vectors. The one-dimensional vector pattern is adopted to build the feature vector of the star pattern, which makes it possible to identify the observed stars robustly. The characteristics of the feature vector and the proposed search strategy for the matching pattern make it possible to achieve the recognition result as quickly as possible. The simulation results demonstrate that the proposed algorithm can effectively accelerate the star identification. Moreover, the recognition accuracy and robustness by the proposed algorithm are better than those by the pyramid algorithm, the modified grid algorithm, and the LPT algorithm. The theoretical analysis and experimental results show that the proposed algorithm outperforms the other three star identification algorithms.

  16. An effective approach for iris recognition using phase-based image matching.

    PubMed

    Miyazawa, Kazuyuki; Ito, Koichi; Aoki, Takafumi; Kobayashi, Koji; Nakajima, Hiroshi

    2008-10-01

    This paper presents an efficient algorithm for iris recognition using phase-based image matching--an image matching technique using phase components in 2D Discrete Fourier Transforms (DFTs) of given images. Experimental evaluation using the CASIA iris image databases (versions 1.0 and 2.0) and the Iris Challenge Evaluation (ICE) 2005 database clearly demonstrates that the use of phase components of iris images makes it possible to achieve highly accurate iris recognition with a simple matching algorithm. This paper also discusses major implementation issues of our algorithm. In order to reduce the size of iris data and to prevent the visibility of iris images, we introduce the idea of a 2D Fourier Phase Code (FPC) for representing iris information. The 2D FPC is particularly useful for implementing compact iris recognition devices using state-of-the-art Digital Signal Processing (DSP) technology.
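    The core of phase-based matching, phase-only correlation, can be sketched in one dimension (a didactic O(n²) DFT version, not the paper's 2D implementation):

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform (didactic, not an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def phase_correlation(a, b):
    """Phase-only correlation of two equal-length 1D signals.

    The cross-spectrum is normalized to unit magnitude so that only the
    phase difference (which encodes translation) survives; the inverse
    transform then peaks at the shift of b relative to a.
    """
    n = len(a)
    fa, fb = dft(a), dft(b)
    cross = []
    for u, v in zip(fa, fb):
        c = u.conjugate() * v
        cross.append(c / abs(c) if abs(c) > 1e-12 else 0j)
    # inverse DFT via the conjugation trick: IDFT(X) = conj(DFT(conj(X))) / n
    inv = [z.conjugate() for z in dft([z.conjugate() for z in cross])]
    corr = [z.real / n for z in inv]
    return max(range(n), key=corr.__getitem__)
```

    Discarding magnitude and keeping only phase is what makes the correlation peak sharp and the matcher robust to brightness and contrast changes.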

  17. Theoretical Aspects of the Patterns Recognition Statistical Theory Used for Developing the Diagnosis Algorithms for Complicated Technical Systems

    NASA Astrophysics Data System (ADS)

    Obozov, A. A.; Serpik, I. N.; Mihalchenko, G. S.; Fedyaeva, G. A.

    2017-01-01

    In this article, the application of pattern recognition (a relatively young area of engineering cybernetics) to the analysis of complicated technical systems is examined. It is shown that a statistical approach can be the most effective for hard-to-distinguish situations. The various recognition algorithms are based on the Bayes approach, which estimates the posterior probability of a certain event and the assumed error. Application of the statistical approach to pattern recognition makes it possible to solve the problem of technical diagnosis of complicated systems, particularly large marine diesel engines.
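    The Bayes posterior computation underlying such diagnosis algorithms can be sketched as follows (the fault states and probability values are illustrative, not from the article):

```python
def bayes_posterior(priors, likelihoods):
    """Posterior probability of each system state given an observation.

    priors: P(state); likelihoods: P(observation | state). The diagnosis
    decision picks the state with the maximal posterior, which is the
    Bayes rule minimizing the expected classification error.
    """
    joint = {s: priors[s] * likelihoods[s] for s in priors}
    z = sum(joint.values())  # total probability of the observation
    return {s: p / z for s, p in joint.items()}
```

    For example, a symptom that is four times more likely under a "worn" state than a "healthy" one may still yield a "healthy" diagnosis if wear is rare a priori, which is exactly the trade-off the statistical approach formalizes.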

  18. Iris recognition based on key image feature extraction.

    PubMed

    Ren, X; Tian, Q; Zhang, J; Wu, S; Zeng, Y

    2008-01-01

    In iris recognition, feature extraction can be influenced by factors such as illumination and contrast, and thus the features extracted may be unreliable, which can cause a high rate of false results in iris pattern recognition. In order to obtain stable features, an algorithm was proposed in this paper to extract key features of a pattern from multiple images. The proposed algorithm built an iris feature template by extracting key features and performed iris identity enrolment. Simulation results showed that the selected key features have high recognition accuracy on the CASIA Iris Set, where both contrast and illumination variance exist.
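    The key-feature idea, keeping only features that recur across multiple images of the same iris, can be sketched as below (a simplified set-based stand-in for the paper's algorithm; the feature labels are illustrative):

```python
from collections import Counter

def key_features(feature_sets, min_fraction=0.5):
    """Stable 'key' features across several images of the same iris.

    feature_sets: one set of quantized features per enrollment image.
    A feature is kept only if it appears in at least min_fraction of the
    images; one-off features caused by illumination or contrast changes
    are discarded, yielding a more reliable template.
    """
    counts = Counter(f for s in feature_sets for f in set(s))
    need = min_fraction * len(feature_sets)
    return {f for f, c in counts.items() if c >= need}
```

    Enrolment then stores only this stable subset as the template, so matching is less sensitive to the capture conditions of any single image.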

  19. Mandarin Chinese Tone Identification in Cochlear Implants: Predictions from Acoustic Models

    PubMed Central

    Morton, Kenneth D.; Torrione, Peter A.; Throckmorton, Chandra S.; Collins, Leslie M.

    2015-01-01

    It has been established that current cochlear implants do not supply adequate spectral information for perception of tonal languages. Comprehension of a tonal language, such as Mandarin Chinese, requires recognition of lexical tones. New strategies of cochlear stimulation such as variable stimulation rate and current steering may provide the means of delivering more spectral information and thus may provide the auditory fine structure required for tone recognition. Several cochlear implant signal processing strategies are examined in this study, the continuous interleaved sampling (CIS) algorithm, the frequency amplitude modulation encoding (FAME) algorithm, and the multiple carrier frequency algorithm (MCFA). These strategies provide different types and amounts of spectral information. Pattern recognition techniques can be applied to data from Mandarin Chinese tone recognition tasks using acoustic models as a means of testing the abilities of these algorithms to transmit the changes in fundamental frequency indicative of the four lexical tones. The ability of processed Mandarin Chinese tones to be correctly classified may predict trends in the effectiveness of different signal processing algorithms in cochlear implants. The proposed techniques can predict trends in performance of the signal processing techniques in quiet conditions but fail to do so in noise. PMID:18706497

  20. HWDA: A coherence recognition and resolution algorithm for hybrid web data aggregation

    NASA Astrophysics Data System (ADS)

    Guo, Shuhang; Wang, Jian; Wang, Tong

    2017-09-01

    To address the object-conflict recognition and resolution problem in hybrid distributed data stream aggregation, a distributed data stream object coherence technology is proposed. First, a framework for object coherence conflict recognition and resolution, named HWDA, is defined. Second, an object coherence recognition technology is proposed based on formal-language description logic and the hierarchical dependency relationships between logic rules. Third, a conflict traversal recognition algorithm is proposed based on the defined dependency graph. Next, a conflict resolution technology is presented based on resolution pattern matching, including the definition of three types of conflict, the conflict resolution matching pattern, and the arbitration resolution method. Finally, experiments on two kinds of web test data sets validate the effectiveness of the conflict recognition and resolution technology of HWDA.
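    The conflict-traversal step can be sketched as a depth-first walk over a dependency graph. This is a simplified stand-in for HWDA's algorithm; the node names and the equality-based conflict test are assumptions for illustration.

```python
def find_conflicts(graph, values, start):
    """Depth-first traversal of a rule-dependency graph.

    graph: node -> list of dependency nodes; values: the aggregated
    value attached to each node. An edge whose two endpoints disagree
    is reported as a coherence conflict for later resolution.
    """
    conflicts, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for dep in graph.get(node, []):
            if values.get(node) != values.get(dep):
                conflicts.append((node, dep))
            stack.append(dep)
    return conflicts
```

    The resolution stage would then match each reported edge against a conflict-type pattern and apply the corresponding arbitration rule.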

  1. Online graphic symbol recognition using neural network and ARG matching

    NASA Astrophysics Data System (ADS)

    Yang, Bing; Li, Changhua; Xie, Weixing

    2001-09-01

    This paper proposes a novel method for online recognition of line-based graphic symbols. The input strokes are usually warped into a cursive form by varied drawing styles, and classifying them is very difficult. To deal with this, an ART-2 neural network is used to classify the input strokes. It has the advantages of a high recognition rate, short recognition time, and forming classes in a self-organized manner. The symbol recognition is achieved by an Attributed Relational Graph (ARG) matching algorithm. The ARG is very efficient for representing complex objects, but its computation cost is very high. To overcome this, we suggest a fast graph matching algorithm using symbol structure information. The experimental results show that the proposed method is effective for recognition of symbols with hierarchical structure.

  2. Using rate of divergence as an objective measure to differentiate between voice signal types based on the amount of disorder in the signal

    PubMed Central

    Calawerts, William M; Lin, Liyu; Sprott, JC; Jiang, Jack J

    2016-01-01

    Objective/Hypothesis The purpose of this paper is to introduce rate of divergence as an objective measure to differentiate between the four voice types based on the amount of disorder present in a signal. We hypothesized that rate of divergence would provide an objective measure that can quantify all four voice types. Study Design 150 acoustic voice recordings were randomly selected and analyzed using traditional perturbation, nonlinear, and rate of divergence analysis methods. Methods We developed a new parameter, rate of divergence, which uses a modified version of Wolf’s algorithm for calculating Lyapunov exponents of a system. The outcome of this calculation is not a Lyapunov exponent, but rather a description of the divergence of two nearby data points for the next three points in the time series, followed in three time-delayed embedding dimensions. This measure was compared to currently existing perturbation and nonlinear dynamic methods of distinguishing between voice signals. Results There was a direct relationship between voice type and rate of divergence. This calculation is especially effective at differentiating between type 3 and type 4 voices (p<0.001), and is equally effective at differentiating type 1, type 2, and type 3 signals as currently existing methods. Conclusion The rate of divergence calculation introduced is an objective measure that can be used to distinguish between all four voice types based on the amount of disorder present, leading to quicker and more accurate voice typing as well as an improved understanding of the nonlinear dynamics involved in phonation. PMID:26920858
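    The divergence computation described above can be sketched as follows. This is a simplified reading of the stated procedure (follow each point's nearest neighbor for a few steps in a time-delay embedding and average the log-distance growth), not the authors' exact modified Wolf algorithm; the embedding parameters are assumptions.

```python
import math

def embed(x, dim=3, tau=1):
    """Time-delay embedding of a scalar series into dim-dimensional points."""
    n = len(x) - (dim - 1) * tau
    return [tuple(x[i + j * tau] for j in range(dim)) for i in range(n)]

def rate_of_divergence(x, steps=3, dim=3, tau=1):
    """Average log-distance growth of nearby embedded points over the
    next few steps: a Wolf-style divergence rate, not a true Lyapunov
    exponent. Higher values indicate a more disordered signal.
    """
    pts = embed(x, dim, tau)
    total, count = 0.0, 0
    for i in range(len(pts) - steps):
        # nearest neighbor of pts[i] among the other followable points
        j = min((k for k in range(len(pts) - steps) if k != i),
                key=lambda k: math.dist(pts[i], pts[k]))
        d0 = math.dist(pts[i], pts[j])
        d1 = math.dist(pts[i + steps], pts[j + steps])
        if d0 > 0 and d1 > 0:
            total += math.log(d1 / d0)
            count += 1
    return total / count if count else 0.0
```

    A strictly periodic signal yields no net divergence, while a chaotic or noisy one yields a positive rate, which is the property the voice-typing measure exploits.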

  3. A New Method of Facial Expression Recognition Based on SPE Plus SVM

    NASA Astrophysics Data System (ADS)

    Ying, Zilu; Huang, Mingwei; Wang, Zhen; Wang, Zhewei

    A novel method of facial expression recognition (FER) is presented, which uses stochastic proximity embedding (SPE) for data dimension reduction and a support vector machine (SVM) for expression classification. The proposed algorithm is applied to the Japanese Female Facial Expression (JAFFE) database for FER, and better performance is obtained compared with traditional algorithms such as PCA and LDA. The results further prove the effectiveness of the proposed algorithm.

  4. Container-code recognition system based on computer vision and deep neural networks

    NASA Astrophysics Data System (ADS)

    Liu, Yi; Li, Tianjian; Jiang, Li; Liang, Xiaoyao

    2018-04-01

    Automatic container-code recognition has become a crucial requirement for the ship transportation industry in recent years. In this paper, an automatic container-code recognition system based on computer vision and deep neural networks is proposed. The system consists of two modules: a detection module and a recognition module. The detection module applies both computer-vision and neural-network algorithms, and combines their results to avoid the drawbacks of either method alone. The combined detection results are also collected for online training of the neural networks. The recognition module exploits both character segmentation and end-to-end recognition, and outputs the recognition result that passes verification. When the recognition module generates a false recognition, the result is corrected and collected for online training of the end-to-end recognition sub-module. By combining several algorithms, the system is able to deal with more situations, and the online training mechanism improves the performance of the neural networks at runtime. The proposed system achieves 93% overall recognition accuracy.
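
    The verification step can exploit the ISO 6346 check digit that every container code carries. A minimal sketch (the function name, and the assumption that the recognized string is the 4-letter owner/equipment prefix plus 6-digit serial, are ours):

```python
import string

def iso6346_check_digit(code):
    """ISO 6346 check digit for a 10-character container code.
    Letters map to 10..38 skipping multiples of 11; each character is
    weighted by 2**position; the weighted sum is taken mod 11, then mod 10."""
    values, v = {}, 10
    for ch in string.ascii_uppercase:
        if v % 11 == 0:          # letter values skip multiples of 11
            v += 1
        values[ch] = v
        v += 1
    for d in string.digits:
        values[d] = int(d)
    total = sum(values[ch] * (2 ** i) for i, ch in enumerate(code[:10]))
    return (total % 11) % 10

digit = iso6346_check_digit("CSQU305438")   # widely cited example; → 3
```

    A recognized code whose final digit disagrees with this computation can be rejected and routed to the correction/online-training path the abstract describes.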

  5. Using Rate of Divergence as an Objective Measure to Differentiate between Voice Signal Types Based on the Amount of Disorder in the Signal.

    PubMed

    Calawerts, William M; Lin, Liyu; Sprott, J C; Jiang, Jack J

    2017-01-01

    The purpose of this paper is to introduce the rate of divergence as an objective measure to differentiate between the four voice types based on the amount of disorder present in a signal. We hypothesized that rate of divergence would provide an objective measure that can quantify all four voice types. A total of 150 acoustic voice recordings were randomly selected and analyzed using traditional perturbation, nonlinear, and rate of divergence analysis methods. We developed a new parameter, rate of divergence, which uses a modified version of Wolf's algorithm for calculating Lyapunov exponents of a system. The outcome of this calculation is not a Lyapunov exponent, but rather a description of the divergence of two nearby data points for the next three points in the time series, followed in three time-delayed embedding dimensions. This measure was compared to currently existing perturbation and nonlinear dynamic methods of distinguishing between voice signals. There was a direct relationship between voice type and rate of divergence. This calculation is especially effective at differentiating between type 3 and type 4 voices (P < 0.001) and is equally effective at differentiating type 1, type 2, and type 3 signals as currently existing methods. The rate of divergence calculation introduced is an objective measure that can be used to distinguish between all four voice types based on the amount of disorder present, leading to quicker and more accurate voice typing as well as an improved understanding of the nonlinear dynamics involved in phonation. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
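
    The divergence-tracking idea can be sketched as follows: embed the signal in time-delayed coordinates, pair each point with a nearby neighbor, and average how the pair separates over the next few samples. This is only a rough illustration of the concept; the authors' modified Wolf procedure differs in its details, and all parameter values here are assumptions:

```python
import numpy as np

def rate_of_divergence(x, dim=3, tau=5, steps=3):
    """Average log growth of the separation between each embedded point
    and its nearest (non-adjacent) neighbor over the next few samples."""
    n = len(x) - (dim - 1) * tau
    emb = np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])
    usable = n - steps
    rates = []
    for i in range(usable):
        d0 = np.linalg.norm(emb[:usable] - emb[i], axis=1)
        d0[max(0, i - 1) : i + 2] = np.inf    # exclude self and adjacent points
        j = int(np.argmin(d0))
        base = d0[j]
        if not np.isfinite(base) or base == 0:
            continue
        grown = np.linalg.norm(emb[i + steps] - emb[j + steps])
        rates.append(np.log(grown / base + 1e-12) / steps)
    return float(np.mean(rates))

# a nearly periodic signal should diverge more slowly than a noisy one
rng = np.random.default_rng(0)
t = np.arange(4000)
periodic = np.sin(2 * np.pi * t / 50)
noisy = periodic + 0.5 * rng.standard_normal(t.size)
r_periodic = rate_of_divergence(periodic)
r_noisy = rate_of_divergence(noisy)
```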

  6. Hemispheric association and dissociation of voice and speech information processing in stroke.

    PubMed

    Jones, Anna B; Farrall, Andrew J; Belin, Pascal; Pernet, Cyril R

    2015-10-01

    As we listen to someone speaking, we extract both linguistic and non-linguistic information. Knowing how these two sets of information are processed in the brain is fundamental for the general understanding of social communication, speech recognition, and therapy of language impairments. We investigated the pattern of performances in phoneme versus gender categorization in left and right hemisphere stroke patients, and found an anatomo-functional dissociation in the right frontal cortex, establishing a new syndrome in voice discrimination abilities. In addition, phoneme and gender performances were more often associated than dissociated in the left hemisphere patients, suggesting common neural underpinnings. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. Voice response system of color and pattern on clothes for visually handicapped person.

    PubMed

    Miyake, Masao; Manabe, Yoshitsugu; Uranishi, Yuki; Imura, Masataka; Oshiro, Osamu

    2013-01-01

    For visually handicapped people, mental support is important for independent daily life and participation in society. A system that can recognize the colors and patterns on clothes would let them go out with fewer concerns. We have carried out a basic study of such a system and developed a prototype that can stably recognize colors and patterns and immediately report this information by voice when a user points it at clothing. Evaluation experiments show that the prototype system surpasses the system from the basic study in recognition accuracy for color and pattern.

  8. Speech processing using maximum likelihood continuity mapping

    DOEpatents

    Hogden, John E.

    2000-01-01

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  9. Speech processing using maximum likelihood continuity mapping

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hogden, J.E.

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  10. Llamas: Large-area microphone arrays and sensing systems

    NASA Astrophysics Data System (ADS)

    Sanz-Robinson, Josue

    Large-area electronics (LAE) provides a platform to build sensing systems, based on distributing large numbers of densely spaced sensors over a physically-expansive space. Due to their flexible, "wallpaper-like" form factor, these systems can be seamlessly deployed in everyday spaces. They go beyond just supplying sensor readings, but rather they aim to transform the wealth of data from these sensors into actionable inferences about our physical environment. This requires vertically integrated systems that span the entirety of the signal processing chain, including transducers and devices, circuits, and signal processing algorithms. To this end we develop hybrid LAE / CMOS systems, which exploit the complementary strengths of LAE, enabling spatially distributed sensors, and CMOS ICs, providing computational capacity for signal processing. To explore the development of hybrid sensing systems, based on vertical integration across the signal processing chain, we focus on two main drivers: (1) thin-film diodes, and (2) microphone arrays for blind source separation: 1) Thin-film diodes are a key building block for many applications, such as RFID tags or power transfer over non-contact inductive links, which require rectifiers for AC-to-DC conversion. We developed hybrid amorphous / nanocrystalline silicon diodes, which are fabricated at low temperatures (<200 °C) to be compatible with processing on plastic, and have high current densities (5 A/cm2 at 1 V) and high frequency operation (cutoff frequency of 110 MHz). 2) We designed a system for separating the voices of multiple simultaneous speakers, which can ultimately be fed to a voice-command recognition engine for controlling electronic systems. On a device level, we developed flexible PVDF microphones, which were used to create a large-area microphone array. On a circuit level we developed localized a-Si TFT amplifiers, and a custom CMOS IC, for system control, sensor readout and digitization. 
    On a signal processing level, we developed an algorithm for blind source separation in a real, reverberant room, based on beamforming and binary masking. It requires no knowledge of the locations of the speakers or microphones; instead, it uses cluster analysis techniques to determine the time delays for beamforming, thus adapting to the unique acoustic environment of the room.
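
    The beamforming step hinges on estimating inter-microphone time delays. As a minimal stand-in for the clustering-based estimation described above, the dominant delay between two microphone signals can be found by cross-correlation (function and variable names are ours):

```python
import numpy as np

def estimate_delay(mic_a, mic_b, max_lag=100):
    """Delay (in samples) of the dominant source between two microphones,
    found as the lag that maximizes the cross-correlation."""
    lags = np.arange(-max_lag, max_lag + 1)
    core = slice(max_lag, len(mic_a) - max_lag)   # avoid wrap-around edges
    corr = [np.dot(mic_a[core], np.roll(mic_b, -lag)[core]) for lag in lags]
    return int(lags[int(np.argmax(corr))])

# a source reaching the second microphone 7 samples later
src = np.random.default_rng(0).standard_normal(8000)
mic_a = src
mic_b = np.concatenate([np.zeros(7), src[:-7]])
delay = estimate_delay(mic_a, mic_b)   # → 7
```

    With several simultaneous speakers, the per-frame delay estimates form clusters, one per speaker, which is where the abstract's cluster analysis comes in.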

  11. Membership-degree preserving discriminant analysis with applications to face recognition.

    PubMed

    Yang, Zhangjing; Liu, Chuancai; Huang, Pu; Qian, Jianjun

    2013-01-01

    In pattern recognition, feature extraction techniques have been widely employed to reduce the dimensionality of high-dimensional data. In this paper, we propose a novel feature extraction algorithm called membership-degree preserving discriminant analysis (MPDA) based on the fisher criterion and fuzzy set theory for face recognition. In the proposed algorithm, the membership degree of each sample to particular classes is firstly calculated by the fuzzy k-nearest neighbor (FKNN) algorithm to characterize the similarity between each sample and class centers, and then the membership degree is incorporated into the definition of the between-class scatter and the within-class scatter. The feature extraction criterion via maximizing the ratio of the between-class scatter to the within-class scatter is applied. Experimental results on the ORL, Yale, and FERET face databases demonstrate the effectiveness of the proposed algorithm.
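
    The membership-degree computation can be sketched with the standard fuzzy k-NN weighting (Keller et al.), in which a sample's membership in a class mixes its crisp label with the class composition of its k nearest neighbors. This illustrates the standard formulation, not the paper's code:

```python
import numpy as np

def fknn_memberships(X, y, k=3):
    """Membership degree of each sample in each class: 0.51 base weight
    for the sample's own class plus 0.49 times the class fraction among
    its k nearest neighbors."""
    classes = np.unique(y)
    U = np.zeros((len(y), len(classes)))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)            # a sample is not its own neighbor
    for i in range(len(y)):
        nbrs = y[np.argsort(D[i])[:k]]
        for c_idx, c in enumerate(classes):
            frac = np.mean(nbrs == c)
            U[i, c_idx] = (0.51 + 0.49 * frac) if y[i] == c else 0.49 * frac
    return U

# two tight, well-separated clusters give crisp memberships
X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
y = np.array([0] * 5 + [1] * 5)
U = fknn_memberships(X, y)
```

    In MPDA these degrees would then weight the samples' contributions to the between-class and within-class scatter matrices.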

  12. Research on gait-based human identification

    NASA Astrophysics Data System (ADS)

    Li, Youguo

    Gait recognition refers to the automatic identification of an individual based on his or her style of walking. This paper proposes a gait recognition method based on a continuous hidden Markov model with a mixture of Gaussians (G-CHMM). First, we initialize a Gaussian mixture model for the training image sequence with the K-means algorithm, then train the HMM parameters using the Baum-Welch algorithm. The gait feature sequences are trained to obtain a continuous HMM for every person, so that the 7 key frames and the obtained HMM represent each person's gait sequence. Finally, recognition is achieved with the forward algorithm. Experiments on the CASIA gait databases obtain a comparatively high correct identification rate and comparatively strong robustness across a variety of body angles.
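
    The K-means initialization step described above can be sketched as follows; the per-state Gaussian means found this way would then be refined by Baum-Welch. Data and parameter choices are illustrative:

```python
import numpy as np

def kmeans_init(frames, n_states, n_iter=50, seed=0):
    """Plain K-means over feature frames; the resulting cluster means can
    seed the per-state Gaussians of a continuous HMM before Baum-Welch."""
    rng = np.random.default_rng(seed)
    means = frames[rng.choice(len(frames), n_states, replace=False)].copy()
    labels = np.zeros(len(frames), dtype=int)
    for _ in range(n_iter):
        # assign each frame to its nearest state mean, then re-estimate
        labels = np.argmin(
            np.linalg.norm(frames[:, None] - means[None, :], axis=2), axis=1)
        for s in range(n_states):
            if np.any(labels == s):
                means[s] = frames[labels == s].mean(axis=0)
    return means, labels

# toy gait feature frames drawn around three distinct postures
rng = np.random.default_rng(1)
frames = np.vstack([rng.standard_normal((20, 2)) * 0.3 + c
                    for c in ([0, 0], [10, 0], [0, 10])])
means, labels = kmeans_init(frames, n_states=3)
```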

  13. Artificial intelligence tools for pattern recognition

    NASA Astrophysics Data System (ADS)

    Acevedo, Elena; Acevedo, Antonio; Felipe, Federico; Avilés, Pedro

    2017-06-01

    In this work, we present a system for pattern recognition that combines the problem-solving power of genetic algorithms with the efficiency of morphological associative memories. We use a set of 48 tire prints divided into 8 brands of tires. The images have dimensions of 200 x 200 pixels. We applied the Hough transform to obtain lines as the main features; 449 lines were obtained. The genetic algorithm reduces the feature set to ten suitable lines, which yield 100% recognition. Morphological associative memories were used as the evaluation function. The selection algorithms were tournament and roulette-wheel selection. For reproduction, we applied one-point, two-point and uniform crossover.

  14. Exercise recognition for Kinect-based telerehabilitation.

    PubMed

    Antón, D; Goñi, A; Illarramendi, A

    2015-01-01

    An aging population and people's increased survival of diseases and traumas that leave physical consequences are challenging aspects of efficient health management. This is why telerehabilitation systems are being developed, to allow monitoring and support of physiotherapy sessions at home, which could reduce healthcare costs while also improving the quality of life of the users. Our goal is the development of a Kinect-based algorithm that provides very accurate real-time monitoring of physical rehabilitation exercises and that also provides a friendly interface oriented both to users and physiotherapists. The two main constituents of our algorithm are the posture classification method and the exercise recognition method. The exercises consist of series of movements. Each movement is composed of an initial posture, a final posture and the angular trajectories of the limbs involved in the movement. The algorithm was designed and tested with datasets of real movements performed by volunteers. We also explain how we obtained the optimal trade-off values for posture and trajectory recognition. Two relevant aspects of the algorithm were evaluated in our tests: classification accuracy and real-time data processing. We achieved 91.9% accuracy in posture classification and 93.75% accuracy in trajectory recognition. We also checked whether the algorithm was able to process the data in real time, and found that it could process more than 20,000 postures per second and all the required trajectory data series in real time, which in practice guarantees no perceptible delays. Later on, we carried out two clinical trials with real patients who suffered shoulder disorders, obtaining an exercise monitoring accuracy of 95.16%. We present an exercise recognition algorithm that handles the data provided by Kinect efficiently. The algorithm has been validated in a real scenario where we have verified its suitability. Moreover, we have received positive feedback from both the users and the physiotherapists who took part in the tests.
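
    A posture classifier of this kind typically works on limb angles derived from the skeleton joints that Kinect reports. A minimal sketch of such an angular feature (the function is our illustration, not the authors' implementation):

```python
import numpy as np

def joint_angle(parent, joint, child):
    """Angle (degrees) at `joint` formed by the segments to the `parent`
    and `child` 3-D joint positions, e.g. the elbow angle from the
    shoulder, elbow, and wrist positions."""
    u = np.asarray(parent, float) - np.asarray(joint, float)
    v = np.asarray(child, float) - np.asarray(joint, float)
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))

# e.g. an arm fully extended vs. bent at a right angle
straight = joint_angle([0, 0, 0], [1, 0, 0], [2, 0, 0])   # → 180.0
bent = joint_angle([0, 0, 0], [1, 0, 0], [1, 1, 0])       # → 90.0
```

    A posture can then be represented as a vector of such angles, and a movement as the initial posture, final posture, and the angle trajectories in between, matching the decomposition in the abstract.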

  15. SU-F-T-20: Novel Catheter Lumen Recognition Algorithm for Rapid Digitization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dise, J; McDonald, D; Ashenafi, M

    Purpose: Manual catheter recognition remains a time-consuming aspect of high-dose-rate brachytherapy (HDR) treatment planning. In this work, a novel catheter lumen recognition algorithm was created for accurate and rapid digitization. Methods: MatLab v8.5 was used to create the catheter recognition algorithm. Initially, the algorithm searches the patient CT dataset using an intensity-based k-means filter designed to locate catheters. Once the catheters have been located, seed points are manually selected to initialize digitization of each catheter. From each seed point, the algorithm searches locally in order to automatically digitize the remaining catheter. This digitization is accomplished by finding pixels with similar image curvature and divergence parameters compared to the seed pixel. Newly digitized pixels are treated as new seed positions, and Hessian image analysis is used to direct the algorithm toward neighboring catheter pixels, and to make the algorithm insensitive to adjacent catheters that are unresolvable on CT, air pockets, and high-Z artifacts. The algorithm was tested using 11 HDR treatment plans, including the Syed template, tandem and ovoid applicator, and multi-catheter lung brachytherapy. Digitization error was calculated by comparing manually determined catheter positions to those determined by the algorithm. Results: The digitization error was 0.23 mm ± 0.14 mm axially and 0.62 mm ± 0.13 mm longitudinally at the tip. The digitization time following initial seed placement was less than 1 second per catheter. The maximum total time required to digitize all tested applicators was 4 minutes (Syed template with 15 needles). Conclusion: This algorithm successfully digitizes HDR catheters for a variety of applicators with or without CT markers. The minimal axial error demonstrates the accuracy of the algorithm and its insensitivity to image artifacts and challenging catheter positioning. Future work to automatically place initial seed positions would improve the algorithm's speed.
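
    The local search from each seed point follows a classic region-growing pattern. The sketch below uses a simple intensity tolerance as the accept test, whereas the abstract's algorithm uses curvature/divergence similarity and Hessian analysis; everything here is an illustrative assumption:

```python
import numpy as np
from collections import deque

def grow_from_seed(img, seed, tol=50):
    """Breadth-first region growing from a manually placed seed pixel:
    accepted neighbors become new seeds, as in the abstract's control flow."""
    h, w = img.shape
    ref = float(img[seed])
    grown = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    grown[seed] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < h and 0 <= cc < w and not grown[rr, cc]
                    and abs(float(img[rr, cc]) - ref) <= tol):
                grown[rr, cc] = True       # accepted pixel becomes a new seed
                queue.append((rr, cc))
    return grown

# toy "CT slice": a bright catheter track through a dark background
img = np.zeros((40, 40))
img[20, 5:35] = 1000.0
mask = grow_from_seed(img, (20, 5), tol=50)
```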

  16. Error-dependent modulation of speech-induced auditory suppression for pitch-shifted voice feedback

    PubMed Central

    2011-01-01

    Background The motor-driven predictions about expected sensory feedback (efference copies) have been proposed to play an important role in recognition of sensory consequences of self-produced motor actions. In the auditory system, this effect was suggested to result in suppression of sensory neural responses to self-produced voices that are predicted by the efference copies during vocal production in comparison with passive listening to the playback of the identical self-vocalizations. In the present study, event-related potentials (ERPs) were recorded in response to upward pitch shift stimuli (PSS) with five different magnitudes (0, +50, +100, +200 and +400 cents) at voice onset during active vocal production and passive listening to the playback. Results Results indicated that the suppression of the N1 component during vocal production was largest for unaltered voice feedback (PSS: 0 cents), became smaller as the magnitude of PSS increased to 200 cents, and was almost completely eliminated in response to 400 cents stimuli. Conclusions Findings of the present study suggest that the brain utilizes the motor predictions (efference copies) to determine the source of incoming stimuli and maximally suppresses the auditory responses to unaltered feedback of self-vocalizations. The reduction of suppression for 50, 100 and 200 cents and its elimination for 400 cents pitch-shifted voice auditory feedback support the idea that motor-driven suppression of voice feedback leads to distinctly different sensory neural processing of self vs. non-self vocalizations. This characteristic may enable the audio-vocal system to more effectively detect and correct for unexpected errors in the feedback of self-produced voice pitch compared with externally-generated sounds. PMID:21645406

  17. The computer in office medical practice.

    PubMed

    Dowdle, John

    2002-04-01

    There will continue to be change and evolution in the medical office environment. As voice recognition systems continue to improve, instant creation of office notes with the absence of dictation may be commonplace. As medical and computer technology evolves, we must continue to evaluate the many new computer systems that can assist us in our clinical office practice.

  18. Exploring the Use of Emoji as a Visual Research Method for Eliciting Young Children's Voices in Childhood Research

    ERIC Educational Resources Information Center

    Fane, Jennifer; MacDougall, Colin; Jovanovic, Jessie; Redmond, Gerry; Gibbs, Lisa

    2018-01-01

    Recognition of the need to move from research "on" children to research "with" children has prompted significant theoretical and methodological debate as to how young children can be positioned as active participants in the research process. Visual research methods such as drawing, photography, and videography have received…

  19. Effects of Familiarity and Feeding on Newborn Speech-Voice Recognition

    ERIC Educational Resources Information Center

    Valiante, A. Grace; Barr, Ronald G.; Zelazo, Philip R.; Brant, Rollin; Young, Simon N.

    2013-01-01

    Newborn infants preferentially orient to familiar over unfamiliar speech sounds. They are also better at remembering unfamiliar speech sounds for short periods of time if learning and retention occur after a feed than before. It is unknown whether short-term memory for speech is enhanced when the sound is familiar (versus unfamiliar) and, if so,…

  20. The Sound-to-Speech Translations Utilizing Graphics Mediation Interface for Students with Severe Handicaps. Final Report.

    ERIC Educational Resources Information Center

    Brown, Carrie; And Others

    This final report describes activities and outcomes of a research project on a sound-to-speech translation system utilizing a graphic mediation interface for students with severe disabilities. The STS/Graphics system is a voice recognition, computer-based system designed to allow individuals with mental retardation and/or severe physical…

  1. Killing Curiosity? An Analysis of Celebrated Identity Performances among Teachers and Students in Nine London Secondary Science Classrooms

    ERIC Educational Resources Information Center

    Archer, Louise; Dawson, Emily; DeWitt, Jennifer; Godec, Spela; King, Heather; Mau, Ada; Nomikou, Effrosyni; Seakins, Amy

    2017-01-01

    In this paper, we take the view that school classrooms are spaces that are constituted by complex power struggles (for voice, authenticity, and recognition), involving multiple layers of resistance and contestation between the "institution," teachers and students, which can have profound implications for students' science identity and…

  2. Building Biases in Infancy: The Influence of Race on Face and Voice Emotion Matching

    ERIC Educational Resources Information Center

    Vogel, Margaret; Monesson, Alexandra; Scott, Lisa S.

    2012-01-01

    Early in the first year of life infants exhibit equivalent performance distinguishing among people within their own race and within other races. However, with development and experience, their face recognition skills become tuned to groups of people they interact with the most. This developmental tuning is hypothesized to be the origin of adult…

  3. Moving beyond the Hype: What Does the Celebration of Student Writing Do for Students?

    ERIC Educational Resources Information Center

    Carter, Genesea M.; Gallegos, Erin Penner

    2017-01-01

    Over the last decade celebrations of student writing (CSWs) have been instituted at universities across the nation as a public way to celebrate students' voices, identities, and literacies. Often touted as a way to gain campus-wide recognition and support for first-year composition courses, this event also purportedly fosters agency and authority…

  4. Facial Emotions Recognition using Gabor Transform and Facial Animation Parameters with Neural Networks

    NASA Astrophysics Data System (ADS)

    Harit, Aditya; Joshi, J. C., Col; Gupta, K. K.

    2018-03-01

    This paper proposes an automatic facial emotion recognition algorithm comprising two main components: feature extraction and expression recognition. The algorithm uses a Gabor filter bank on fiducial points to find the facial expression features. The resulting magnitudes of the Gabor transforms, along with 14 chosen FAPs (Facial Animation Parameters), compose the feature space. There are two stages: a training phase and a recognition phase. First, for the 6 different emotions considered, the system classifies all training expressions into 6 classes (one for each emotion) during the training stage. In the recognition phase, it recognizes the emotion by applying the Gabor bank to a face image, finding the fiducial points, and feeding the features to the trained neural architecture.
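
    The Gabor-magnitude features can be sketched directly: build a complex Gabor kernel and take the magnitude of its inner product with the image patch around a fiducial point. Parameter values below are illustrative, not the paper's:

```python
import numpy as np

def gabor_kernel(ksize=15, theta=0.0, lam=6.0, sigma=3.0):
    """Complex Gabor kernel: a Gaussian envelope times a complex
    sinusoidal carrier oriented at angle `theta`."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    carrier = np.exp(1j * 2 * np.pi * xr / lam)
    return envelope * carrier

def gabor_magnitude_at(img, row, col, kernel):
    """Magnitude of the Gabor response at one fiducial point."""
    half = kernel.shape[0] // 2
    patch = img[row - half : row + half + 1, col - half : col + half + 1]
    return float(np.abs(np.sum(patch * np.conj(kernel))))

# a vertical grating matching the kernel's wavelength responds strongly
img = np.cos(2 * np.pi * np.arange(64) / 6.0)[None, :].repeat(64, axis=0)
resp = gabor_magnitude_at(img, 32, 32, gabor_kernel())
```

    In the paper's setting, responses for a bank of orientations and scales at each fiducial point, concatenated with the 14 FAPs, would form the feature vector.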

  5. Face recognition algorithm based on Gabor wavelet and locality preserving projections

    NASA Astrophysics Data System (ADS)

    Liu, Xiaojie; Shen, Lin; Fan, Honghui

    2017-07-01

    In order to reduce the effects of illumination changes and differences in personal features on the face recognition rate, this paper presents a new face recognition algorithm based on Gabor wavelets and Locality Preserving Projections (LPP). The problem of the high dimensionality of Gabor filter banks was solved effectively, and LPP's sensitivity to illumination changes was also overcome. First, global image features were extracted, exploiting the good spatial locality and orientation selectivity of Gabor wavelet filters. The dimensionality was then reduced with LPP, which preserves the local information of the image well. The experimental results show that this algorithm can effectively extract features relating to facial expression, pose, and other information, and that it effectively reduces the influence of illumination changes and differences in personal features, improving the face recognition rate to 99.2%.

  6. Analysis of objects in binary images. M.S. Thesis - Old Dominion Univ.

    NASA Technical Reports Server (NTRS)

    Leonard, Desiree M.

    1991-01-01

    Digital image processing techniques are typically used to produce improved digital images through the application of successive enhancement techniques to a given image or to generate quantitative data about the objects within that image. In support of and to assist researchers in a wide range of disciplines, e.g., interferometry, heavy rain effects on aerodynamics, and structure recognition research, it is often desirable to count objects in an image and compute their geometric properties. Therefore, an image analysis application package, focusing on a subset of image analysis techniques used for object recognition in binary images, was developed. This report describes the techniques and algorithms utilized in three main phases of the application and are categorized as: image segmentation, object recognition, and quantitative analysis. Appendices provide supplemental formulas for the algorithms employed as well as examples and results from the various image segmentation techniques and the object recognition algorithm implemented.

  7. A star recognition method based on the Adaptive Ant Colony algorithm for star sensors.

    PubMed

    Quan, Wei; Fang, Jiancheng

    2010-01-01

    A new star recognition method based on the Adaptive Ant Colony (AAC) algorithm has been developed to increase the star recognition speed and success rate for star sensors. This method draws circles, with the center of each one being a bright star point and the radius being a special angular distance, and uses the parallel processing ability of the AAC algorithm to calculate the angular distance of any pair of star points in the circle. The angular distance of two star points in the circle is solved as the path of the AAC algorithm, and the path optimization feature of the AAC is employed to search for the optimal (shortest) path in the circle. This optimal path is used to recognize the stellar map and enhance the recognition success rate and speed. The experimental results show that when the position error is about 50″, the identification success rate of this method is 98% while the Delaunay identification method is only 94%. The identification time of this method is up to 50 ms.
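
    The angular distance between two star points, the basic pairwise quantity computed inside each circle, follows from the dot product of their unit direction vectors:

```python
import numpy as np

def angular_distance(v1, v2):
    """Angular distance (degrees) between two star direction vectors."""
    u = np.asarray(v1, float) / np.linalg.norm(v1)
    w = np.asarray(v2, float) / np.linalg.norm(v2)
    return float(np.degrees(np.arccos(np.clip(np.dot(u, w), -1.0, 1.0))))

# two stars 90 degrees apart on the celestial sphere
d = angular_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

    The AAC method treats such pairwise distances as path segments and searches for the shortest path among them; the routine above is only the underlying geometric primitive.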

  8. A fingerprint classification algorithm based on combination of local and global information

    NASA Astrophysics Data System (ADS)

    Liu, Chongjin; Fu, Xiang; Bian, Junjie; Feng, Jufu

    2011-12-01

    Fingerprint recognition is one of the most important technologies in biometric identification and has been widely applied in commercial and forensic areas. Fingerprint classification, as the fundamental procedure in fingerprint recognition, can sharply decrease the number of candidates for fingerprint matching and improve the efficiency of fingerprint recognition. Most fingerprint classification algorithms are based on the number and position of singular points. Because singular-point detection commonly considers only local information, such classification algorithms are sensitive to noise. In this paper, we propose a novel fingerprint classification algorithm combining the local and global information of a fingerprint. First, we use local information to detect singular points and measure their quality, considering the orientation structure and image texture in adjacent areas. Then a global orientation model is adopted to measure the reliability of the group of singular points. Finally, the local quality and global reliability are weighted to classify the fingerprint. Experiments demonstrate the accuracy and effectiveness of our algorithm, especially for poor-quality fingerprint images.

  9. Face recognition using total margin-based adaptive fuzzy support vector machines.

    PubMed

    Liu, Yi-Hung; Chen, Yen-Ting

    2007-01-01

    This paper presents a new classifier called total margin-based adaptive fuzzy support vector machines (TAF-SVM) that deals with several problems that may occur when support vector machines (SVMs) are applied to face recognition. The proposed TAF-SVM not only solves the overfitting problem caused by outliers through fuzzification of the penalty, but also corrects the skew of the optimal separating hyperplane caused by very imbalanced data sets by using a different-cost algorithm. In addition, by introducing a total margin algorithm to replace the conventional soft margin algorithm, a lower generalization error bound can be obtained. These three functions are embodied in the traditional SVM, and the TAF-SVM is formulated in both linear and nonlinear cases. Using two databases, the Chung Yuan Christian University (CYCU) multiview and the facial recognition technology (FERET) face databases, and using the kernel Fisher discriminant analysis (KFDA) algorithm to extract discriminating face features, experimental results show that the proposed TAF-SVM is superior to the SVM in terms of face-recognition accuracy. The results also indicate that the proposed TAF-SVM achieves smaller error variances than the SVM over a number of tests, so that better recognition stability can be obtained.

  10. Pattern recognition for passive polarimetric data using nonparametric classifiers

    NASA Astrophysics Data System (ADS)

    Thilak, Vimal; Saini, Jatinder; Voelz, David G.; Creusere, Charles D.

    2005-08-01

    Passive polarization-based imaging is a useful tool in computer vision and pattern recognition. A passive polarization imaging system forms a polarimetric image from the reflection of ambient light that contains useful information for computer vision tasks such as object detection (classification) and recognition. Applications of polarization-based pattern recognition include material classification and automatic shape recognition. In this paper, we present two target detection algorithms for images captured by a passive polarimetric imaging system. The proposed detection algorithms are based on Bayesian decision theory. In these approaches, an object can belong to one of any given number of classes, and classification involves making decisions that minimize the average probability of making incorrect decisions. This minimum is achieved by assigning an object to the class that maximizes the a posteriori probability. Computing a posteriori probabilities requires estimates of the class-conditional probability density functions (likelihoods) and prior probabilities. A probabilistic neural network (PNN), which is a nonparametric method that can compute Bayes-optimal boundaries, and a k-nearest neighbor (KNN) classifier are used for density estimation and classification. The proposed algorithms are applied to polarimetric image data gathered in the laboratory with a liquid crystal-based system. The experimental results validate the effectiveness of the above algorithms for target detection from polarimetric data.
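
    A PNN of this kind is essentially a Parzen-window density estimate per class followed by a maximum a posteriori decision. A minimal sketch with equal priors assumed (data and names are illustrative):

```python
import numpy as np

def pnn_classify(X_train, y_train, x, sigma=1.0):
    """Probabilistic neural network: a Gaussian-kernel Parzen-window
    likelihood estimate per class, then a maximum a posteriori decision
    under equal class priors."""
    scores = {}
    for c in np.unique(y_train):
        d2 = np.sum((X_train[y_train == c] - x) ** 2, axis=1)
        scores[c] = np.mean(np.exp(-d2 / (2.0 * sigma ** 2)))  # class likelihood
    return max(scores, key=scores.get)

# toy two-class "polarimetric" feature vectors
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((30, 2)),
               rng.standard_normal((30, 2)) + 5.0])
y = np.array([0] * 30 + [1] * 30)
label = pnn_classify(X, y, np.array([5.0, 5.0]))   # → 1
```

    Unequal priors would simply multiply each class likelihood before the argmax, which is how the Bayes decision rule in the abstract generalizes.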

  11. Multispectral iris recognition based on group selection and game theory

    NASA Astrophysics Data System (ADS)

    Ahmad, Foysal; Roy, Kaushik

    2017-05-01

    A commercially available iris recognition system uses only a narrow band of the near infrared spectrum (700-900 nm) while iris images captured in the wide range of 405 nm to 1550 nm offer potential benefits to enhance recognition performance of an iris biometric system. The novelty of this research is that a group selection algorithm based on coalition game theory is explored to select the best patch subsets. In this algorithm, patches are divided into several groups based on their maximum contribution in different groups. Shapley values are used to evaluate the contribution of patches in different groups. Results show that this group selection based iris recognition
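
    The Shapley-value computation underlying the group selection step can be sketched exactly for a toy characteristic function; the patch "worths" below are hypothetical, not values from the paper.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution of each player
    over all orderings in which the coalition can be assembled."""
    phi = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            phi[p] += value(frozenset(coalition)) - before
    for p in players:
        phi[p] /= len(perms)
    return phi

# Hypothetical worth of a patch subset: patch 'a' alone scores 0.5,
# 'b' alone 0.3, and together 0.6 (sub-additive: they overlap).
worth = {frozenset(): 0.0, frozenset('a'): 0.5, frozenset('b'): 0.3,
         frozenset('ab'): 0.6}
phi = shapley_values(['a', 'b'], lambda s: worth[s])
# Shapley values sum to the grand-coalition worth (efficiency axiom).
assert abs(phi['a'] + phi['b'] - 0.6) < 1e-9
```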

  12. Wavelet decomposition based principal component analysis for face recognition using MATLAB

    NASA Astrophysics Data System (ADS)

    Sharma, Mahesh Kumar; Sharma, Shashikant; Leeprechanon, Nopbhorn; Ranjan, Aashish

    2016-03-01

    For the realization of face recognition systems in both static and real-time settings, algorithms such as principal component analysis, independent component analysis, linear discriminant analysis, neural networks, and genetic algorithms have been used for decades. This paper discusses a wavelet decomposition based principal component analysis approach for face recognition. Principal component analysis is chosen over other algorithms for its relative simplicity, efficiency, and robustness. The term face recognition refers to identifying a person from his or her facial features, and the approach resembles factor analysis in the sense of extracting the principal components of an image. Principal component analysis suffers from some drawbacks, mainly poor discriminatory power and, in particular, the large computational load of finding eigenvectors. These drawbacks can be greatly reduced by combining wavelet transform decomposition for feature extraction with principal component analysis for pattern representation and classification, analyzing the facial images in both the spatial and frequency domains. The experimental results show that this face recognition method achieves a significant percentage improvement in recognition rate as well as better computational efficiency.
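
    A minimal sketch of the pipeline, assuming a one-level Haar approximation as the wavelet step (the abstract does not specify the wavelet) followed by PCA via the SVD; the 16x16 random "faces" are stand-in data.

```python
import numpy as np

def haar_approx(img):
    """One-level 2-D Haar approximation: average 2x2 blocks, quartering
    the pixel count before PCA (assumes even image dimensions)."""
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] +
                   img[0::2, 1::2] + img[1::2, 1::2])

def pca_project(X, k):
    """Project row-vector samples onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
faces = rng.random((8, 16, 16))                    # 8 toy 16x16 "face" images
coeffs = np.stack([haar_approx(f).ravel() for f in faces])
assert coeffs.shape == (8, 64)                     # 16x16 -> 8x8 approximation
feats = pca_project(coeffs, k=4)
assert feats.shape == (8, 4)
```

Running PCA on the 8x8 wavelet approximation rather than the full image is what cuts the eigenvector computation the abstract complains about.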

  13. Bag-of-visual-phrases and hierarchical deep models for traffic sign detection and recognition in mobile laser scanning data

    NASA Astrophysics Data System (ADS)

    Yu, Yongtao; Li, Jonathan; Wen, Chenglu; Guan, Haiyan; Luo, Huan; Wang, Cheng

    2016-03-01

    This paper presents a novel algorithm for detection and recognition of traffic signs in mobile laser scanning (MLS) data for intelligent transportation-related applications. The traffic sign detection task is accomplished based on 3-D point clouds by using bag-of-visual-phrases representations; whereas the recognition task is achieved based on 2-D images by using a Gaussian-Bernoulli deep Boltzmann machine-based hierarchical classifier. To exploit high-order feature encodings of feature regions, a deep Boltzmann machine-based feature encoder is constructed. For detecting traffic signs in 3-D point clouds, the proposed algorithm achieves an average recall, precision, quality, and F-score of 0.956, 0.946, 0.907, and 0.951, respectively, on the four selected MLS datasets. For on-image traffic sign recognition, a recognition accuracy of 97.54% is achieved by using the proposed hierarchical classifier. Comparative studies with the existing traffic sign detection and recognition methods demonstrate that our algorithm obtains promising, reliable, and high performance in both detecting traffic signs in 3-D point clouds and recognizing traffic signs on 2-D images.

  14. Indonesian Sign Language Number Recognition using SIFT Algorithm

    NASA Astrophysics Data System (ADS)

    Mahfudi, Isa; Sarosa, Moechammad; Andrie Asmara, Rosa; Azrino Gustalika, M.

    2018-04-01

    Indonesian sign language (ISL) is generally used by deaf individuals for communication. They use sign language as their primary language, which consists of two types of action: signs and finger spelling. However, not all people understand their sign language, which becomes a problem when they communicate with hearing people and also contributes to their feeling of isolation from social life. A solution is needed that can help them interact with hearing people. Much research offers a variety of methods for solving the problem of sign language recognition based on image processing. The SIFT (Scale Invariant Feature Transform) algorithm is one method that can be used to identify an object; SIFT is claimed to be very resistant to scaling, rotation, illumination, and noise. Using the SIFT algorithm for Indonesian sign language number recognition yields a recognition rate of 82% on a dataset of 100 sample images, consisting of 50 images for training and 50 images for testing. Changing the threshold value affects the recognition result; the best threshold value is 0.45, with a recognition rate of 94%.
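
    SIFT-style recognition typically matches descriptor sets under a distance-ratio threshold; the sketch below illustrates that matching step on random toy descriptors (the 0.45 default echoes the paper's best threshold, though its exact role in their matcher is an assumption here).

```python
import numpy as np

def ratio_test_matches(desc1, desc2, threshold=0.45):
    """Match descriptor sets with a ratio test: keep a match only when the
    nearest neighbour is clearly better than the second nearest."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j, j2 = np.argsort(dists)[:2]
        if dists[j] < threshold * dists[j2]:
            matches.append((i, j))
    return matches

rng = np.random.default_rng(1)
train = rng.random((10, 8))                 # 10 toy 8-D descriptors
query = train[:3] + 1e-4                    # slightly perturbed copies of the first three
m = ratio_test_matches(query, train)
assert m == [(0, 0), (1, 1), (2, 2)]        # each query finds its source descriptor
```

Raising the threshold admits more (and noisier) matches; lowering it keeps only unambiguous ones, which is why the threshold choice moves the recognition rate.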

  15. Improving iris recognition performance using segmentation, quality enhancement, match score fusion, and indexing.

    PubMed

    Vatsa, Mayank; Singh, Richa; Noore, Afzel

    2008-08-01

    This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image. The global textural feature is extracted using the 1-D log polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.
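
    Match-score fusion of the textural and topological scores can be sketched with a simple min-max-normalised weighted sum; the paper's fusion is an intelligent (learned) algorithm, so the fixed weight here is only an illustrative stand-in.

```python
import numpy as np

def fuse_scores(textural, topological, w=0.6):
    """Weighted sum-rule fusion of two match-score lists after min-max
    normalisation (a simple stand-in for a learned fusion rule)."""
    def norm(s):
        s = np.asarray(s, dtype=float)
        return (s - s.min()) / (s.max() - s.min())
    return w * norm(textural) + (1 - w) * norm(topological)

# The genuine comparison (index 0) scores high on both modalities,
# so fusion ranks it first even if either score alone is borderline.
fused = fuse_scores([0.9, 0.2, 0.1], [0.8, 0.3, 0.2])
assert fused.argmax() == 0
```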

  16. Iris Cryptography for Security Purpose

    NASA Astrophysics Data System (ADS)

    Ajith, Srighakollapu; Balaji Ganesh Kumar, M.; Latha, S.; Samiappan, Dhanalakshmi; Muthu, P.

    2018-04-01

    In today's world, security has become a major issue for everyone. Hacking is a major concern, as hackers are everywhere; even as technology develops, there remain many situations in which it fails to provide security. Engineers and scientists have been developing new security products such as biometric sensors for face recognition, pattern recognition, gesture recognition, voice authentication, and so on, but these devices fail to achieve the expected results. In this work, we present an approach to generate a unique secure key from the iris template. The iris templates are processed using well-defined processing techniques, and they are stored, traversed, and utilized through encryption and decryption. From this work, we conclude that iris cryptography gives the expected results for securing data from eavesdroppers.
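
    The key-generation step can be sketched by hashing a binary iris template. This is a deliberately simplified illustration: a practical biometric cryptosystem must tolerate bit errors between captures of the same iris (e.g. with fuzzy extractors), which plain hashing does not.

```python
import hashlib
import numpy as np

def iris_key(template_bits, salt=b"session-salt"):
    """Derive a fixed-length key by hashing a packed binary iris template.

    The salt is a hypothetical parameter; real systems would bind the key
    to an error-tolerant encoding of the template, not the raw bits.
    """
    packed = np.packbits(template_bits).tobytes()
    return hashlib.sha256(salt + packed).hexdigest()

template = np.array([1, 0, 1, 1, 0, 0, 1, 0] * 32, dtype=np.uint8)  # 256-bit toy template
key = iris_key(template)
assert len(key) == 64             # 256-bit key as 64 hex characters
assert key == iris_key(template)  # deterministic for an identical template
```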

  17. Using voice input and audio feedback to enhance the reality of a virtual experience

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miner, N.E.

    1994-04-01

    Virtual Reality (VR) is a rapidly emerging technology which allows participants to experience a virtual environment through stimulation of the participant's senses. Intuitive and natural interactions with the virtual world help to create a realistic experience. Typically, a participant is immersed in a virtual environment through the use of a 3-D viewer. Realistic, computer-generated environment models and accurate tracking of a participant's view are important factors for adding realism to a virtual experience. Stimulating a participant's sense of sound and providing a natural form of communication for interacting with the virtual world are equally important. This paper discusses the advantages and importance of incorporating voice recognition and audio feedback capabilities into a virtual world experience. Various approaches and levels of complexity are discussed. Examples of the use of voice and sound are presented through the description of a research application developed in the VR laboratory at Sandia National Laboratories.

  18. A voice-actuated wind tunnel model leak checking system

    NASA Technical Reports Server (NTRS)

    Larson, William E.

    1989-01-01

    A computer program has been developed that improves the efficiency of wind tunnel model leak checking. The program uses a voice recognition unit to relay a technician's commands to the computer. The computer, after receiving a command, can respond to the technician via a voice response unit. Information about the model pressure orifice being checked is displayed on a gas-plasma terminal. On command, the program records up to 30 seconds of pressure data. After the recording is complete, the raw data and a straight line fit of the data are plotted on the terminal. This allows the technician to make a decision on the integrity of the orifice being checked. All results of the leak check program are stored in a database file that can be listed on the line printer for record keeping purposes or displayed on the terminal to help the technician find unchecked orifices. This program allows one technician to check a model for leaks instead of the two or three previously required.
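
    The straight-line fit used to judge an orifice can be sketched with a least-squares fit to the recorded pressures; interpreting the slope against a threshold is an inference here, and the threshold value below is an assumed acceptance limit, not one from the paper.

```python
import numpy as np

# A leaking orifice shows pressure decaying over the 30 s record; the
# slope of a least-squares line fit summarizes the trend for the technician.
t = np.linspace(0.0, 30.0, 31)             # sample times, s
pressure = 100.0 - 0.8 * t                 # toy readings, psi (leaking orifice)
slope, intercept = np.polyfit(t, pressure, 1)

LEAK_THRESHOLD = -0.1                      # psi/s, hypothetical acceptance limit
assert slope < LEAK_THRESHOLD              # this orifice would fail the leak check
```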

  19. Design and development of a Space Station proximity operations research and development mockup

    NASA Technical Reports Server (NTRS)

    Haines, Richard F.

    1986-01-01

    Proximity operations (Prox-Ops) on-orbit refers to all activities taking place within one km of the Space Station. Designing a Prox-Ops control station calls for a comprehensive systems approach which takes into account structural constraints, orbital dynamics including approach/departure flight paths, myriad human factors and other topics. This paper describes a reconfigurable full-scale mock-up of a Prox-Ops station constructed at Ames incorporating an array of windows (with dynamic star field, target vehicle(s), and head-up symbology), head-down perspective display of manned and unmanned vehicles, voice-actuated 'electronic checklist', computer-generated voice system, expert system (to help diagnose subsystem malfunctions), and other displays and controls. The facility is used for demonstrations of selected Prox-Ops approach scenarios, human factors research (work-load assessment, determining external vision envelope requirements, head-down and head-up symbology design, voice synthesis and recognition research, etc.) and development of engineering design guidelines for future module interiors.

  20. Representational specificity of within-category phonetic variation in the mental lexicon

    NASA Astrophysics Data System (ADS)

    Ju, Min; Luce, Paul A.

    2003-10-01

    This study examines (1) whether within-category phonetic variation in voice onset time (VOT) is encoded in long-term memory and has consequences for subsequent word recognition and, if so, (2) whether such effects are greater in words with voiced counterparts (pat/bat) than those without (cow/*gow), given that VOT information is more critical for lexical discrimination in the former. Two long-term repetition priming experiments were conducted using words containing word-initial voiceless stops varying in VOT. Reaction times to a lexical decision were compared between the same and different VOT conditions in words with or without voiced counterparts. If veridical representations of each episode are preserved in memory, variation in VOT should have demonstrable effects on the magnitude of priming. However, if within-category variation is discarded and form-based representations are abstract, the variation in VOT should not mediate priming. The implications of these results for the specificity and abstractness of phonetic representations in long-term memory will be discussed.

  1. The written voice: implicit memory effects of voice characteristics following silent reading and auditory presentation.

    PubMed

    Abramson, Marianne

    2007-12-01

    After being familiarized with two voices, either implicit (auditory lexical decision) or explicit memory (auditory recognition) for words from silently read sentences was assessed among 32 men and 32 women volunteers. In the silently read sentences, the sex of speaker was implied in the initial words, e.g., "He said, ..." or "She said...". Tone in question versus statement was also manipulated by appropriate punctuation. Auditory lexical decision priming was found for sex- and tone-consistent items following silent reading, but only up to 5 min. after silent reading. In a second study, similar lexical decision priming was found following listening to the sentences, although these effects remained reliable after a 2-day delay. The effect sizes for lexical decision priming showed that tone-consistency and sex-consistency were strong following both silent reading and listening 5 min. after studying. These results suggest that readers create episodic traces of text from auditory images of silently read sentences as they do during listening.

  2. Probabilistic Open Set Recognition

    NASA Astrophysics Data System (ADS)

    Jain, Lalit Prithviraj

    Real-world tasks in computer vision, pattern recognition and machine learning often touch upon the open set recognition problem: multi-class recognition with incomplete knowledge of the world and many unknown inputs. An obvious way to approach such problems is to develop a recognition system that thresholds probabilities to reject unknown classes. Traditional rejection techniques are not about the unknown; they are about the uncertain boundary and rejection around that boundary. Thus traditional techniques only represent the "known unknowns". However, a proper open set recognition algorithm is needed to reduce the risk from the "unknown unknowns". This dissertation examines this concept and finds existing probabilistic multi-class recognition approaches ineffective for true open set recognition. We hypothesize that the cause is weak ad hoc assumptions combined with the closed-world assumptions made by existing calibration techniques. Intuitively, if we could accurately model just the positive data for any known class without overfitting, we could reject the large set of unknown classes even under this assumption of incomplete class knowledge. For this, we formulate the problem as one of modeling positive training data by invoking statistical extreme value theory (EVT) near the decision boundary of positive data with respect to negative data. We provide a new algorithm called the PI-SVM for estimating the unnormalized posterior probability of class inclusion. This dissertation also introduces a new open set recognition model called Compact Abating Probability (CAP), where the probability of class membership decreases in value (abates) as points move from known data toward open space. We show that CAP models improve open set recognition for multiple algorithms.
Leveraging the CAP formulation, we go on to describe the novel Weibull-calibrated SVM (W-SVM) algorithm, which combines the useful properties of statistical EVT for score calibration with one-class and binary support vector machines. Building from the success of statistical EVT based recognition methods such as PI-SVM and W-SVM on the open set problem, we present a new general supervised learning algorithm for multi-class classification and multi-class open set recognition called the Extreme Value Local Basis (EVLB). The design of this algorithm is motivated by the observation that extrema from known negative class distributions are the closest negative points to any positive sample during training, and thus should be used to define the parameters of a probabilistic decision model. In the EVLB, the kernel distribution for each positive training sample is estimated via an EVT distribution fit over the distances to the separating hyperplane between the positive training sample and the closest negative samples, with a subset of the overall positive training data retained to form a probabilistic decision boundary. Using this subset as a frame of reference, the probability of a sample at test time decreases as it moves away from the positive class. Possessing this property, the EVLB is well-suited to open set recognition problems where samples from unknown or novel classes are encountered at test time. Our experimental evaluation shows that the EVLB provides a substantial improvement in scalability compared to standard radial basis function kernel machines, as well as PI-SVM and W-SVM, with improved accuracy in many cases. We evaluate our algorithm on open set variations of the standard visual learning benchmarks, as well as with an open subset of classes from Caltech 256 and ImageNet. Our experiments show that PI-SVM, W-SVM and EVLB provide significant advances over the previous state-of-the-art solutions for the same tasks.
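
    The Compact Abating Probability idea can be sketched with a simple exponentially abating score; the dissertation fits EVT/Weibull models rather than this fixed kernel, so the decay function and scale below are illustrative assumptions.

```python
import numpy as np

def cap_probability(x, positives, tau=1.0):
    """Compact Abating Probability sketch: class-membership score decays
    with distance from the nearest known positive sample, so it abates
    toward zero in open space far from all training data."""
    d = np.linalg.norm(positives - x, axis=1).min()
    return float(np.exp(-d / tau))

train = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]])   # known-class samples
near, far = np.array([0.1, 0.1]), np.array([10.0, 10.0])
assert cap_probability(near, train) > 0.8                # inside the known region
assert cap_probability(far, train) < 0.01                # open space: reject as unknown
```

Thresholding this abating score is what lets an open set recognizer reject the "unknown unknowns" instead of forcing them into a known class.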

  3. Visual and auditory socio-cognitive perception in unilateral temporal lobe epilepsy in children and adolescents: a prospective controlled study.

    PubMed

    Laurent, Agathe; Arzimanoglou, Alexis; Panagiotakaki, Eleni; Sfaello, Ignacio; Kahane, Philippe; Ryvlin, Philippe; Hirsch, Edouard; de Schonen, Scania

    2014-12-01

    A high rate of abnormal social behavioural traits or perceptual deficits is observed in children with unilateral temporal lobe epilepsy. In the present study, perception of auditory and visual social signals, carried by faces and voices, was evaluated in children or adolescents with temporal lobe epilepsy. We prospectively investigated a sample of 62 children with focal non-idiopathic epilepsy early in the course of the disorder. The present analysis included 39 children with a confirmed diagnosis of temporal lobe epilepsy. Control participants (72), distributed across 10 age groups, served as a control group. Our socio-perceptual evaluation protocol comprised three socio-visual tasks (face identity, facial emotion and gaze direction recognition), two socio-auditory tasks (voice identity and emotional prosody recognition), and three control tasks (lip reading, geometrical pattern and linguistic intonation recognition). All 39 patients also benefited from a neuropsychological examination. As a group, children with temporal lobe epilepsy performed at a significantly lower level compared to the control group with regards to recognition of facial identity, direction of eye gaze, and emotional facial expressions. We found no relationship between the type of visual deficit and age at first seizure, duration of epilepsy, or the epilepsy-affected cerebral hemisphere. Deficits in socio-perceptual tasks could be found independently of the presence of deficits in visual or auditory episodic memory, visual non-facial pattern processing (control tasks), or speech perception. A normal FSIQ did not exempt some of the patients from an underlying deficit in some of the socio-perceptual tasks. Temporal lobe epilepsy not only impairs development of emotion recognition, but can also impair development of perception of other socio-perceptual signals in children with or without intellectual deficiency. 
Prospective studies need to be designed to evaluate the results of appropriate re-education programs in children presenting with deficits in social cue processing.

  4. Automated target recognition and tracking using an optical pattern recognition neural network

    NASA Technical Reports Server (NTRS)

    Chao, Tien-Hsin

    1991-01-01

    The on-going development of an automatic target recognition and tracking system at the Jet Propulsion Laboratory is presented. This system is an optical pattern recognition neural network (OPRNN) that is an integration of an innovative optical parallel processor and a feature extraction based neural net training algorithm. The parallel optical processor provides high speed and vast parallelism as well as full shift invariance. The neural network algorithm enables simultaneous discrimination of multiple noisy targets in spite of their scales, rotations, perspectives, and various deformations. This fully developed OPRNN system can be effectively utilized for the automated spacecraft recognition and tracking that will lead to success in the Automated Rendezvous and Capture (AR&C) of the unmanned Cargo Transfer Vehicle (CTV). One of the most powerful optical parallel processors for automatic target recognition is the multichannel correlator. With the inherent advantages of parallel processing capability and shift invariance, multiple objects can be simultaneously recognized and tracked using this multichannel correlator. This target tracking capability can be greatly enhanced by utilizing a powerful feature extraction based neural network training algorithm such as the neocognitron. The OPRNN, currently under investigation at JPL, is constructed with an optical multichannel correlator where holographic filters have been prepared using the neocognitron training algorithm. The computation speed of the neocognitron-type OPRNN is up to 10^14 analog connections/sec, enabling the OPRNN to outperform its state-of-the-art electronic counterpart by at least two orders of magnitude.
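
    The shift-invariant matching an optical multichannel correlator performs in parallel corresponds digitally to FFT-based cross-correlation, sketched here on a toy scene (the template and scene are illustrative, not JPL data).

```python
import numpy as np

def correlate2d_fft(scene, template):
    """Shift-invariant matching via FFT-based circular cross-correlation;
    the correlation peak locates the template in the scene."""
    F = np.fft.fft2(scene)
    G = np.fft.fft2(template, s=scene.shape)
    corr = np.fft.ifft2(F * np.conj(G)).real
    return np.unravel_index(corr.argmax(), corr.shape)

scene = np.zeros((32, 32))
target = np.ones((4, 4))
scene[10:14, 20:24] = target               # embed the target at (10, 20)
assert correlate2d_fft(scene, target) == (10, 20)
```

Because correlation responds at every shift simultaneously, the target is found wherever it sits in the field of view, which is the shift-invariance property the abstract emphasizes.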

  5. Recognizing Age-Separated Face Images: Humans and Machines

    PubMed Central

    Yadav, Daksha; Singh, Richa; Vatsa, Mayank; Noore, Afzel

    2014-01-01

    Humans utilize facial appearance, gender, expression, aging pattern, and other ancillary information to recognize individuals. It is interesting to observe how humans perceive facial age. Analyzing these properties can help in understanding the phenomenon of facial aging and incorporating the findings can help in designing effective algorithms. Such a study has two components - facial age estimation and age-separated face recognition. Age estimation involves predicting the age of an individual given his/her facial image. On the other hand, age-separated face recognition consists of recognizing an individual given his/her age-separated images. In this research, we investigate which facial cues are utilized by humans for estimating the age of people belonging to various age groups along with analyzing the effect of one's gender, age, and ethnicity on age estimation skills. We also analyze how various facial regions such as binocular and mouth regions influence age estimation and recognition capabilities. Finally, we propose an age-invariant face recognition algorithm that incorporates the knowledge learned from these observations. Key observations of our research are: (1) the age group of newborns and toddlers is easiest to estimate, (2) gender and ethnicity do not affect the judgment of age group estimation, (3) face as a global feature, is essential to achieve good performance in age-separated face recognition, and (4) the proposed algorithm yields improved recognition performance compared to existing algorithms and also outperforms a commercial system in the young image as probe scenario. PMID:25474200

  6. Recognizing age-separated face images: humans and machines.

    PubMed

    Yadav, Daksha; Singh, Richa; Vatsa, Mayank; Noore, Afzel

    2014-01-01

    Humans utilize facial appearance, gender, expression, aging pattern, and other ancillary information to recognize individuals. It is interesting to observe how humans perceive facial age. Analyzing these properties can help in understanding the phenomenon of facial aging and incorporating the findings can help in designing effective algorithms. Such a study has two components--facial age estimation and age-separated face recognition. Age estimation involves predicting the age of an individual given his/her facial image. On the other hand, age-separated face recognition consists of recognizing an individual given his/her age-separated images. In this research, we investigate which facial cues are utilized by humans for estimating the age of people belonging to various age groups along with analyzing the effect of one's gender, age, and ethnicity on age estimation skills. We also analyze how various facial regions such as binocular and mouth regions influence age estimation and recognition capabilities. Finally, we propose an age-invariant face recognition algorithm that incorporates the knowledge learned from these observations. Key observations of our research are: (1) the age group of newborns and toddlers is easiest to estimate, (2) gender and ethnicity do not affect the judgment of age group estimation, (3) face as a global feature, is essential to achieve good performance in age-separated face recognition, and (4) the proposed algorithm yields improved recognition performance compared to existing algorithms and also outperforms a commercial system in the young image as probe scenario.

  7. Key features for ATA / ATR database design in missile systems

    NASA Astrophysics Data System (ADS)

    Özertem, Kemal Arda

    2017-05-01

    Automatic target acquisition (ATA) and automatic target recognition (ATR) are two vital tasks for missile systems, and having a robust detection and recognition algorithm is crucial for overall system performance. In order to have a robust target detection and recognition algorithm, an extensive image database is required. Automatic target recognition algorithms use the image database in the training and testing steps of the algorithm. This directly affects the recognition performance, since the training accuracy is driven by the quality of the image database. In addition, the performance of an automatic target detection algorithm can be measured effectively by using an image database. There are two main ways to design an ATA / ATR database. The first and easier way is to use a scene generator. A scene generator can model objects by considering material information, atmospheric conditions, detector type, and the territory. Designing an image database using a scene generator is inexpensive, and it allows creating many different scenarios quickly and easily. However, the major drawback of using a scene generator is its low fidelity, since the images are created virtually. The second and more difficult way is to design the database using real-world images. Designing an image database with real-world images is far more costly and time consuming; however, it offers high fidelity, which is critical for missile algorithms. In this paper, critical concepts in ATA / ATR database design with real-world images are discussed. Each concept is discussed from the perspectives of ATA and ATR separately. For the implementation stage, some possible solutions and trade-offs for creating the database are proposed, and all proposed approaches are compared with regard to their pros and cons.

  8. A 2D range Hausdorff approach to 3D facial recognition.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Koch, Mark William; Russ, Trina Denise; Little, Charles Quentin

    2004-11-01

    This paper presents a 3D facial recognition algorithm based on the Hausdorff distance metric. The standard 3D formulation of the Hausdorff matching algorithm has been modified to operate on a 2D range image, enabling a reduction in computation from O(N^2) to O(N) without large storage requirements. The Hausdorff distance is known for its robustness to data outliers and inconsistent data between two data sets, making it a suitable choice for dealing with the inherent problems in many 3D datasets due to sensor noise and object self-occlusion. For optimal performance, the algorithm assumes a good initial alignment between probe and template datasets. However, to minimize the error between two faces, the alignment can be iteratively refined. Results from the algorithm are presented using 3D face images from the Face Recognition Grand Challenge database version 1.0.
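
    The underlying metric can be sketched with a brute-force directed Hausdorff distance on point sets; note this naive version is O(N^2), whereas the paper's 2D range-image formulation is what achieves the O(N) reduction. The toy points below stand in for range-image pixels lifted to (row, col, depth).

```python
import numpy as np

def directed_hausdorff(A, B):
    """Directed Hausdorff distance max_a min_b ||a - b|| between point sets."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.min(axis=1).max()

def hausdorff(A, B):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

probe    = np.array([[0, 0, 1.0], [0, 1, 1.2], [1, 0, 1.1]])
template = np.array([[0, 0, 1.0], [0, 1, 1.2], [1, 0, 1.6]])
# Only one point's depth differs (by 0.5), and that worst-case mismatch
# is exactly what the Hausdorff distance reports.
assert abs(hausdorff(probe, template) - 0.5) < 1e-9
```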

  9. Autoregressive statistical pattern recognition algorithms for damage detection in civil structures

    NASA Astrophysics Data System (ADS)

    Yao, Ruigen; Pakzad, Shamim N.

    2012-08-01

    Statistical pattern recognition has recently emerged as a promising set of complementary methods to system identification for automatic structural damage assessment. Its essence is to use well-known concepts in statistics for boundary definition of different pattern classes, such as those for damaged and undamaged structures. In this paper, several statistical pattern recognition algorithms using autoregressive models, including statistical control charts and hypothesis testing, are reviewed as potentially competitive damage detection techniques. To enhance the performance of statistical methods, new feature extraction techniques using model spectra and residual autocorrelation, together with resampling-based threshold construction methods, are proposed. Subsequently, simulated acceleration data from a multi degree-of-freedom system is generated to test and compare the efficiency of the existing and proposed algorithms. Data from laboratory experiments conducted on a truss and a large-scale bridge slab model are then used to further validate the damage detection methods and demonstrate the superior performance of proposed algorithms.
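
    The autoregressive feature-extraction step can be sketched as a least-squares AR fit; shifts in the estimated coefficients (or in residual statistics tracked by a control chart) would then signal damage. The AR(2) process below is a toy stand-in for measured accelerations, not data from the paper's truss or bridge-slab experiments.

```python
import numpy as np

def fit_ar(x, p=4):
    """Least-squares AR(p) fit; the coefficients (or residual statistics)
    serve as damage-sensitive features. Returns (coefficients, residuals)."""
    X = np.column_stack([x[p - i - 1:len(x) - i - 1] for i in range(p)])
    y = x[p:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs, y - X @ coeffs

rng = np.random.default_rng(2)
e = rng.normal(size=500)
x = np.zeros(500)
for t in range(2, 500):                    # AR(2) "healthy structure" response
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + e[t]

coeffs, resid = fit_ar(x, p=2)
# The fit recovers the generating coefficients (0.6, -0.3) to within noise;
# a damaged structure would shift these features away from the baseline.
assert abs(coeffs[0] - 0.6) < 0.2 and abs(coeffs[1] + 0.3) < 0.2
```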

  10. Wheezing recognition algorithm using recordings of respiratory sounds at the mouth in a pediatric population.

    PubMed

    Bokov, Plamen; Mahut, Bruno; Flaud, Patrice; Delclaux, Christophe

    2016-03-01

    Respiratory diseases in children are a common reason for physician visits. A diagnostic difficulty arises when parents hear wheezing that is no longer present during the medical consultation. Thus, an outpatient objective tool for recognition of wheezing is of clinical value. We developed a wheezing recognition algorithm from respiratory sounds recorded with a smartphone placed near the mouth. A total of 186 recordings were obtained in a pediatric emergency department, mostly in toddlers (mean age 20 months). After exclusion of recordings with artefacts and those with a single clinical operator auscultation, 95 recordings with the agreement of two operators on auscultation diagnosis (27 with wheezing and 68 without) were subjected to a two-phase algorithm (signal analysis and a pattern classifier using machine learning algorithms) to classify the records. The best performance (71.4% sensitivity and 88.9% specificity) was observed with a Support Vector Machine-based algorithm. We further tested the algorithm on a set of 39 recordings with a single operator and found fair agreement (kappa = 0.28, 95% CI [0.12, 0.45]) between the algorithm and the operator. The main advantage of such an algorithm is its contact-free sound recording, which is valuable in the pediatric population. Copyright © 2016 Elsevier Ltd. All rights reserved.
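
    The agreement statistic reported above, Cohen's kappa, corrects raw agreement for chance; a minimal computation on toy binary ratings (algorithm vs. clinician) looks like this.

```python
def cohens_kappa(a, b):
    """Chance-corrected agreement between two binary raters:
    kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    p_yes = (sum(a) / n) * (sum(b) / n)                 # chance both say "wheeze"
    p_no = (1 - sum(a) / n) * (1 - sum(b) / n)          # chance both say "no wheeze"
    pe = p_yes + p_no
    return (po - pe) / (1 - pe)

algo     = [1, 1, 0, 0, 1, 0, 0, 0]    # toy algorithm decisions
operator = [1, 0, 0, 0, 1, 0, 1, 0]    # toy clinician auscultation
k = cohens_kappa(algo, operator)
assert 0 < k < 1                       # partial, better-than-chance agreement
```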

  11. Spatially Invariant Vector Quantization: A pattern matching algorithm for multiple classes of image subject matter including pathology.

    PubMed

    Hipp, Jason D; Cheng, Jerome Y; Toner, Mehmet; Tompkins, Ronald G; Balis, Ulysses J

    2011-02-26

    Historically, effective clinical utilization of image analysis and pattern recognition algorithms in pathology has been hampered by two critical limitations: 1) the availability of digital whole slide imagery data sets and 2) a relative domain knowledge deficit in terms of application of such algorithms, on the part of practicing pathologists. With the advent of the recent and rapid adoption of whole slide imaging solutions, the former limitation has been largely resolved. However, with the expectation that it is unlikely for the general cohort of contemporary pathologists to gain advanced image analysis skills in the short term, the latter problem remains, thus underscoring the need for a class of algorithm that has the concurrent properties of image domain (or organ system) independence and extreme ease of use, without the need for specialized training or expertise. In this report, we present a novel, general case pattern recognition algorithm, Spatially Invariant Vector Quantization (SIVQ), that overcomes the aforementioned knowledge deficit. Fundamentally based on conventional Vector Quantization (VQ) pattern recognition approaches, SIVQ gains its superior performance and essentially zero-training workflow model from its use of ring vectors, which exhibit continuous symmetry, as opposed to square or rectangular vectors, which do not. By use of the stochastic matching properties inherent in continuous symmetry, a single ring vector can exhibit as much as a millionfold improvement in matching possibilities, as opposed to conventional VQ vectors. SIVQ was utilized to demonstrate rapid and highly precise pattern recognition capability in a broad range of gross and microscopic use-case settings. 
With the performance of SIVQ observed thus far, we find evidence that indeed there exist classes of image analysis/pattern recognition algorithms suitable for deployment in settings where pathologists alone can effectively incorporate their use into clinical workflow, as a turnkey solution. We anticipate that SIVQ, and other related class-independent pattern recognition algorithms, will become part of the overall armamentarium of digital image analysis approaches that are immediately available to practicing pathologists, without the need for the immediate availability of an image analysis expert.
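
The ring-vector idea can be illustrated with a toy sketch: sample pixels on a circle around a candidate point, and score a match as the minimum distance over all circular rotations of the stored ring, a discrete stand-in for the rotational symmetry SIVQ exploits. This is an illustration only, not the published implementation:

```python
import math

def ring_vector(img, cx, cy, r, n=16):
    """Sample n pixels (nearest neighbor) on a circle of radius r around (cx, cy)."""
    vec = []
    for k in range(n):
        t = 2 * math.pi * k / n
        x = int(round(cx + r * math.cos(t)))
        y = int(round(cy + r * math.sin(t)))
        vec.append(img[y][x])
    return vec

def ring_distance(a, b):
    """Best (smallest) L2 distance between ring a and any circular rotation of ring b."""
    n = len(b)
    best = float("inf")
    for s in range(n):
        rotated = b[s:] + b[:s]
        d = sum((x - y) ** 2 for x, y in zip(a, rotated)) ** 0.5
        best = min(best, d)
    return best

# A rotated copy of a ring matches itself exactly under the shift search.
v = [1, 5, 2, 8, 3, 9, 4, 7]
assert ring_distance(v, v[3:] + v[:3]) == 0.0
```

A square patch, by contrast, would need explicit re-sampling for every candidate rotation, which is the asymmetry the abstract alludes to.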

  12. Aided target recognition processing of MUDSS sonar data

    NASA Astrophysics Data System (ADS)

    Lau, Brian; Chao, Tien-Hsin

    1998-09-01

    The Mobile Underwater Debris Survey System (MUDSS) is a collaborative effort by the Navy and the Jet Propulsion Lab to demonstrate multi-sensor, real-time survey of underwater sites for ordnance and explosive waste (OEW). We describe the sonar processing algorithm, a novel target recognition algorithm incorporating wavelets, morphological image processing, expansion by Hermite polynomials, and neural networks. This algorithm has found all planted targets in MUDSS tests and has achieved spectacular success on another Coastal Systems Station (CSS) sonar image database.

  13. Classifier dependent feature preprocessing methods

    NASA Astrophysics Data System (ADS)

    Rodriguez, Benjamin M., II; Peterson, Gilbert L.

    2008-04-01

    In mobile applications, computational complexity is an issue that limits sophisticated algorithms from being implemented on these devices. This paper provides an initial solution to applying pattern recognition systems on mobile devices by combining existing preprocessing algorithms for recognition. In pattern recognition systems, it is essential to properly apply feature preprocessing tools prior to training classification models in an attempt to reduce computational complexity and improve the overall classification accuracy. The feature preprocessing tools extended for the mobile environment are feature ranking, feature extraction, data preparation and outlier removal. Most desktop systems today are capable of processing a majority of the available classification algorithms without concern for processing time, while the same is not true on mobile platforms. As an application of pattern recognition for mobile devices, the recognition system targets the problem of steganalysis, determining if an image contains hidden information. The measure of performance shows that feature preprocessing increases the overall steganalysis classification accuracy by an average of 22%. The methods in this paper are tested on a workstation and a Nokia 6620 (Symbian operating system) camera phone with similar results.
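
Two of the preprocessing steps named above can be sketched in a few lines: feature ranking via a Fisher-style separation score, and z-score outlier removal. This is a generic illustration of the two techniques, not the authors' exact tools:

```python
def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def fisher_score(feat_a, feat_b):
    """Separation of one feature across two classes: between-class over within-class spread."""
    return (mean(feat_a) - mean(feat_b)) ** 2 / (var(feat_a) + var(feat_b) + 1e-12)

def remove_outliers(xs, z=1.5):
    """Drop values more than z standard deviations from the mean (loose threshold for tiny samples)."""
    m, s = mean(xs), var(xs) ** 0.5
    return [x for x in xs if abs(x - m) <= z * s]

# A feature that separates the classes ranks above one that does not.
good = fisher_score([1.0, 1.1, 0.9], [3.0, 3.2, 2.8])
bad = fisher_score([1.0, 3.0, 2.0], [1.1, 2.9, 2.1])
assert good > bad
```

Ranking features by such a score lets a mobile classifier keep only the top few, which is exactly the complexity reduction the abstract argues for.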

  14. Recognition of plant parts with problem-specific algorithms

    NASA Astrophysics Data System (ADS)

    Schwanke, Joerg; Brendel, Thorsten; Jensch, Peter F.; Megnet, Roland

    1994-06-01

    Automatic micropropagation is necessary to produce large amounts of biomass cost-effectively. Juvenile plants are dissected in a clean-room environment at particular points on the stem or the leaves. A vision system detects possible cutting points and controls a specialized robot. This contribution is directed to the pattern-recognition algorithms used to detect structural parts of the plant.

  15. Enhanced facial texture illumination normalization for face recognition.

    PubMed

    Luo, Yong; Guan, Ye-Peng

    2015-08-01

    An uncontrolled lighting condition is one of the most critical challenges for practical face recognition applications. An enhanced facial texture illumination normalization method is put forward to resolve this challenge. An adaptive relighting algorithm is developed to improve the brightness uniformity of face images. Facial texture is extracted by using an illumination estimation difference algorithm. An anisotropic histogram-stretching algorithm is proposed to minimize the intraclass distance of facial skin and maximize the dynamic range of facial texture distribution. Compared with the existing methods, the proposed method can more effectively eliminate the redundant information of facial skin and illumination. Extensive experiments show that the proposed method has superior performance in normalizing illumination variation and enhancing facial texture features for illumination-insensitive face recognition.

  16. Computer Recognition of Facial Profiles

    DTIC Science & Technology

    1974-08-01

    This report describes a system for the recognition of human faces from facial profiles, covering classification algorithms, facial profile recognition, and automatic training. The work of Goldstein, Harmon, and Lesk [8] indicates that for facial recognition, a ten-class problem provides a fair test of the classification system.

  17. Research on Palmprint Identification Method Based on Quantum Algorithms

    PubMed Central

    Zhang, Zhanzhan

    2014-01-01

    Quantum image recognition is a technology that uses quantum algorithms to process image information, and it can obtain better results than classical algorithms. In this paper, four different quantum algorithms are used in the three stages of palmprint recognition. First, a quantum adaptive median filtering algorithm is presented for palmprint filtering; comparison shows that the quantum filtering algorithm achieves a better result than the classical algorithm. Next, the quantum Fourier transform (QFT) is used to extract pattern features in only one operation thanks to quantum parallelism. The proposed algorithm exhibits an exponential speed-up compared with the discrete Fourier transform in feature extraction. Finally, quantum set operations and Grover's algorithm are used in palmprint matching. According to the experimental results, the quantum algorithm only needs on the order of the square root of N operations to find the target palmprint, whereas the traditional method needs N calculations. At the same time, the matching accuracy of the quantum algorithm is almost 100%. PMID:25105165
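
The quadratic speed-up claim (roughly √N oracle queries versus N classical checks) can be verified with a tiny statevector simulation of Grover's iteration. This sketch is illustrative only and is unrelated to the palmprint-specific circuits in the paper:

```python
import math

def grover_success_probability(n, target):
    """Simulate Grover search over n items; return P(target) after the optimal iteration count."""
    amps = [1 / math.sqrt(n)] * n                 # uniform superposition
    iterations = int(math.pi / 4 * math.sqrt(n))  # ~ (pi/4)*sqrt(N) oracle queries
    for _ in range(iterations):
        amps[target] = -amps[target]              # oracle: flip the marked amplitude's sign
        mean = sum(amps) / n
        amps = [2 * mean - a for a in amps]       # inversion about the mean (diffusion)
    return amps[target] ** 2

# 16 items: 3 oracle queries instead of ~16 classical checks, with >90% success.
p = grover_success_probability(16, target=5)
assert p > 0.9
```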

  18. Face Recognition Using Local Quantized Patterns and Gabor Filters

    NASA Astrophysics Data System (ADS)

    Khryashchev, V.; Priorov, A.; Stepanova, O.; Nikitin, A.

    2015-05-01

    The problem of face recognition in a natural or artificial environment has received a great deal of researchers' attention over the last few years. Many methods for accurate face recognition have been proposed. Nevertheless, these methods often fail to accurately recognize the person in difficult scenarios, e.g. low resolution, low contrast, pose variations, etc. We therefore propose an approach for accurate and robust face recognition by using local quantized patterns and Gabor filters. The estimation of the eye centers is used as a preprocessing stage. The evaluation of our algorithm on different samples from a standardized FERET database shows that our method is invariant to the general variations of lighting, expression, occlusion and aging. The proposed approach yields an increase of about 20% in correct recognition accuracy compared with the known face recognition algorithms from the OpenCV library. The additional use of Gabor filters can significantly improve the robustness to changes in lighting conditions.
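
A Gabor filter is a Gaussian envelope modulating a sinusoidal carrier; a single kernel value can be computed directly. This is the generic textbook form with assumed parameter values, not the exact parameterization used in the paper:

```python
import math

def gabor(x, y, theta, sigma=2.0, lam=4.0, gamma=0.5, psi=0.0):
    """Value of a 2-D Gabor kernel at (x, y) for orientation theta."""
    xr = x * math.cos(theta) + y * math.sin(theta)    # rotate into the filter's frame
    yr = -x * math.sin(theta) + y * math.cos(theta)
    envelope = math.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
    carrier = math.cos(2 * math.pi * xr / lam + psi)
    return envelope * carrier

# At the kernel center the envelope and carrier are both 1.
assert gabor(0, 0, theta=0.0) == 1.0

# A filter bank samples several orientations, e.g. 0, 45, 90, 135 degrees.
bank = [lambda x, y, t=k * math.pi / 4: gabor(x, y, t) for k in range(4)]
```

Convolving a face image with such a bank yields responses that are comparatively stable under lighting changes, which is the robustness the abstract attributes to the Gabor stage.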

  19. A Lightweight Hierarchical Activity Recognition Framework Using Smartphone Sensors

    PubMed Central

    Han, Manhyung; Bang, Jae Hun; Nugent, Chris; McClean, Sally; Lee, Sungyoung

    2014-01-01

    Activity recognition for the purposes of recognizing a user's intentions using multimodal sensors is becoming a widely researched topic largely based on the prevalence of the smartphone. Previous studies have reported the difficulty in recognizing life-logs by only using a smartphone due to the challenges with activity modeling and real-time recognition. In addition, recognizing life-logs is difficult due to the absence of an established framework which enables the use of different sources of sensor data. In this paper, we propose a smartphone-based Hierarchical Activity Recognition Framework which extends the Naïve Bayes approach for the processing of activity modeling and real-time activity recognition. The proposed algorithm demonstrates higher accuracy than the Naïve Bayes approach and also enables the recognition of a user's activities within a mobile environment. The proposed algorithm has the ability to classify fifteen activities with an average classification accuracy of 92.96%. PMID:25184486
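
The Naïve Bayes core that the framework extends can be sketched from scratch: per-class Gaussian likelihoods over sensor features, combined under the independence assumption. This is an illustration of the base classifier only; the paper's hierarchical extension adds activity modeling on top, and the feature names below are invented:

```python
import math

def fit(samples):
    """samples: {label: [feature_vectors]} -> per-class per-feature (mean, var)."""
    model = {}
    for label, vecs in samples.items():
        stats = []
        for feat in zip(*vecs):
            m = sum(feat) / len(feat)
            v = sum((x - m) ** 2 for x in feat) / len(feat) + 1e-6  # variance floor
            stats.append((m, v))
        model[label] = stats
    return model

def predict(model, x):
    """Pick the class maximizing the sum of per-feature Gaussian log-likelihoods."""
    def log_lik(stats):
        return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                   for xi, (m, v) in zip(x, stats))
    return max(model, key=lambda label: log_lik(model[label]))

# Toy accelerometer-like features: (mean magnitude, variance of magnitude).
train = {"walking": [(1.2, 0.8), (1.1, 0.9), (1.3, 0.7)],
         "sitting": [(0.1, 0.05), (0.2, 0.04), (0.1, 0.06)]}
model = fit(train)
assert predict(model, (1.25, 0.85)) == "walking"
```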

  20. Deaf-And-Mute Sign Language Generation System

    NASA Astrophysics Data System (ADS)

    Kawai, Hideo; Tamura, Shinichi

    1984-08-01

    We have developed a system which can recognize speech and generate the corresponding animation-like sign language sequence. The system is implemented on a popular personal computer with three video-RAMs and a voice recognition board which can recognize only the registered voice of a specific speaker. Presently, forty sign language patterns and fifty finger spellings are stored on two floppy disks. Each sign pattern is composed of one to four sub-patterns: if a pattern is composed of one sub-pattern, it is displayed as a still pattern; if not, it is displayed as a motion pattern. This system will help communication between deaf-and-mute persons and hearing persons. In order to display at high speed, most of the programs are written in machine language.

  1. Approach to recognition of flexible form for credit card expiration date recognition as example

    NASA Astrophysics Data System (ADS)

    Sheshkus, Alexander; Nikolaev, Dmitry P.; Ingacheva, Anastasia; Skoryukina, Natalya

    2015-12-01

    In this paper we consider the task of locating information fields within a document with a flexible form, using the credit card expiration date field as an example. We discuss the main difficulties and suggest possible solutions. In our case this task must be solved on mobile devices, so computational complexity has to be as low as possible. We provide results of the analysis of the suggested algorithm. The error distribution of the recognition system shows that the suggested algorithm solves the task with the required accuracy.

  2. Comparison of crisp and fuzzy character networks in handwritten word recognition

    NASA Technical Reports Server (NTRS)

    Gader, Paul; Mohamed, Magdi; Chiang, Jung-Hsien

    1992-01-01

    Experiments involving handwritten word recognition on words taken from images of handwritten address blocks from the United States Postal Service mailstream are described. The word recognition algorithm relies on the use of neural networks at the character level. The neural networks are trained using crisp and fuzzy desired outputs. The fuzzy outputs were defined using a fuzzy k-nearest neighbor algorithm. The crisp networks slightly outperformed the fuzzy networks at the character level but the fuzzy networks outperformed the crisp networks at the word level.
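
The fuzzy desired outputs can be produced with the classic fuzzy k-nearest-neighbor labeling: each training sample receives soft class memberships from the class mix of its k nearest neighbors. This sketch uses the common 0.51/0.49 weighting and may differ in detail from the authors' setup:

```python
def fuzzy_labels(points, labels, k, classes):
    """Soft membership per sample: 0.51 + 0.49*(n_c/k) for its own class, 0.49*(n_c/k) otherwise."""
    out = []
    for i, p in enumerate(points):
        # k nearest neighbors of p, excluding p itself
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])))[:k]
        counts = {c: 0 for c in classes}
        for j in others:
            counts[labels[j]] += 1
        memb = {}
        for c in classes:
            memb[c] = 0.49 * counts[c] / k + (0.51 if c == labels[i] else 0.0)
        out.append(memb)
    return out

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
soft = fuzzy_labels(points, labels, k=2, classes=["a", "b"])
assert abs(soft[0]["a"] - 1.0) < 1e-9 and soft[0]["b"] == 0.0  # deep inside class "a"
```

Training a character network against these soft targets, instead of crisp 0/1 targets, is what lets ambiguous characters carry graded evidence up to the word level.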

  3. Intimate Geographies: Reclaiming Citizenship and Community in "The Autobiography of Delfina Cuero" and Bonita Nunez's "Diaries"

    ERIC Educational Resources Information Center

    Fitzgerald, Stephanie

    2006-01-01

    American Indian women's autobiographies recount a specific type of life experience that has often been overlooked, one that is equally important in understanding the genre and to develop ways of reading these texts that balance the recovery and recognition of the Native voice and agency contained within them with the processes of creation and the…

  4. Input and Output Mechanisms and Devices. Phase I: Adding Voice Output to a Speaker-Independent Recognition System.

    ERIC Educational Resources Information Center

    Scott Instruments Corp., Denton, TX.

    This project was designed to develop techniques for adding low-cost speech synthesis to educational software. Four tasks were identified for the study: (1) select a microcomputer with a built-in analog-to-digital converter that is currently being used in educational environments; (2) determine the feasibility of implementing expansion and playback…

  5. Advanced Productivity Analysis Methods for Air Traffic Control Operations

    DTIC Science & Technology

    1976-12-01

    This report on air traffic control productivity analysis covers routine work, surveillance work, and conflict processing work. Conflict processing (crossing and overtake conflicts) includes potential-conflict recognition, assessment, and resolution decision making, together with A/N voice communications. Simulation models enable decision makers to utilize quantitative and dynamic analysis as a tool for decision making, and several types of simulation models are discussed.

  6. The Complexity of Literacy in Kenya: Narrative Analysis of Maasai Women's Experiences

    ERIC Educational Resources Information Center

    Taeko, Takayanagi

    2014-01-01

    This paper aims to challenge limited notions of literacy and argues for the recognition of Maasai women's self-determined learning in order to bring about human development in Kenya. It also seeks to construct a complex picture of literacy, drawing on postcolonial feminist theory as a framework to ensure that the woman's voice is heard. Through…

  7. Enhancement of temporal periodicity cues in cochlear implants: Effects on prosodic perception and vowel identification

    NASA Astrophysics Data System (ADS)

    Green, Tim; Faulkner, Andrew; Rosen, Stuart; Macherey, Olivier

    2005-07-01

    Standard continuous interleaved sampling processing, and a modified processing strategy designed to enhance temporal cues to voice pitch, were compared on tests of intonation perception, and vowel perception, both in implant users and in acoustic simulations. In standard processing, 400 Hz low-pass envelopes modulated either pulse trains (implant users) or noise carriers (simulations). In the modified strategy, slow-rate envelope modulations, which convey dynamic spectral variation crucial for speech understanding, were extracted by low-pass filtering (32 Hz). In addition, during voiced speech, higher-rate temporal modulation in each channel was provided by 100% amplitude-modulation by a sawtooth-like wave form whose periodicity followed the fundamental frequency (F0) of the input. Channel levels were determined by the product of the lower- and higher-rate modulation components. Both in acoustic simulations and in implant users, the ability to use intonation information to identify sentences as question or statement was significantly better with modified processing. However, while there was no difference in vowel recognition in the acoustic simulation, implant users performed worse with modified processing both in vowel recognition and in formant frequency discrimination. It appears that, while enhancing pitch perception, modified processing harmed the transmission of spectral information.
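
The modified strategy's channel construction (slow-rate envelope multiplied by an F0-rate sawtooth-like modulator during voicing) can be sketched numerically. The filtering chain is simplified away here and all parameter values are assumed for illustration:

```python
def sawtooth(t, f0):
    """Sawtooth-like waveform in [0, 1] whose periodicity follows f0 (Hz)."""
    return (t * f0) % 1.0

def channel_level(slow_env, t, f0, voiced):
    """Channel level = slow-rate envelope times the higher-rate modulator (during voicing)."""
    mod = sawtooth(t, f0) if voiced else 1.0   # unvoiced frames keep the plain envelope
    return slow_env * mod

# During voicing, a constant envelope of 0.8 is 100% amplitude-modulated at F0 = 100 Hz.
levels = [channel_level(0.8, t / 1000.0, f0=100.0, voiced=True) for t in range(20)]
assert min(levels) == 0.0 and max(levels) < 0.8      # full-depth modulation
```

The full-depth (100%) modulation is what makes the F0 periodicity salient to the listener, at the cost, as the results show, of some spectral fidelity.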

  8. Major depression is associated with impaired processing of emotion in music as well as in facial and vocal stimuli.

    PubMed

    Naranjo, C; Kornreich, C; Campanella, S; Noël, X; Vandriette, Y; Gillain, B; de Longueville, X; Delatte, B; Verbanck, P; Constant, E

    2011-02-01

    The processing of emotional stimuli is thought to be negatively biased in major depression. This study investigates this issue using musical, vocal and facial affective stimuli. 23 depressed in-patients and 23 matched healthy controls were recruited. Affective information processing was assessed through musical, vocal and facial emotion recognition tasks. Depression, anxiety level and attention capacity were controlled. The depressed participants demonstrated less accurate identification of emotions than the control group in all three types of emotion-recognition task. The depressed group also gave higher intensity ratings than the controls when scoring negative emotions, and they were more likely to attribute negative emotions to neutral voices and faces. Our in-patient group might differ from the more general population of depressed adults. They were all taking anti-depressant medication, which may have had an influence on their emotional information processing. Major depression is associated with a general negative bias in the processing of emotional stimuli. Emotional processing impairment in depression is not confined to interpersonal stimuli (faces and voices): it is also present in the ability to perceive emotion in music accurately. © 2010 Elsevier B.V. All rights reserved.

  9. Flexible and wearable electronic silk fabrics for human physiological monitoring

    NASA Astrophysics Data System (ADS)

    Mao, Cuiping; Zhang, Huihui; Lu, Zhisong

    2017-09-01

    The development of textile-based devices for human physiological monitoring has attracted tremendous interest in recent years. However, flexible physiological sensing elements based on silk fabrics have not been realized. In this paper, ZnO nanorod arrays are grown in situ on reduced graphene oxide-coated silk fabrics via a facile electro-deposition method for the fabrication of silk-fabric-based mechanical sensing devices. The data show that well-aligned ZnO nanorods with hexagonal wurtzite crystalline structures are synthesized on the conductive silk fabric surface. After magnetron sputtering of gold electrodes, silk-fabric-based devices are produced and applied to detect periodic bending and twisting. Based on the electric signals, the deformation and release processes can be easily differentiated. Human arterial pulse and respiration can also be monitored in real time to calculate the pulse rate and respiration frequency, respectively. Throat vibrations during coughing and singing are detected to demonstrate the voice recognition capability. This work may not only help develop silk-fabric-based mechanical sensing elements for potential applications in clinical diagnosis, daily healthcare monitoring and voice recognition, but also provide a versatile method for fabricating textile-based flexible electronic devices.

  10. Florida manatee avoidance technology: A pilot program by the Florida Fish and Wildlife Conservation Commission

    NASA Astrophysics Data System (ADS)

    Frisch, Katherine; Haubold, Elsa

    2003-10-01

    Since 1976, approximately 25% of the annual Florida manatee (Trichechus manatus latirostris) mortality has been attributed to collisions with watercraft. In 2001, the Florida Legislature appropriated $200,000 in funds for research projects using technological solutions to directly address the problem of collisions between manatees and watercraft. The Florida Fish & Wildlife Conservation Commission initially funded seven projects for the first two fiscal years. The selected proposals were designed to explore technology that had not previously been applied to the manatee/boat collision problem and included many acoustic concepts related to voice recognition, sonar, and an alerting device to be put on boats to warn manatees. The most promising results to date are from projects employing voice-recognition techniques to identify manatee vocalizations and warn boaters of the manatees' presence. Sonar technology, much like that used in fish finders, is promising but has met with regulatory problems regarding permitting and remains to be tested, as has the manatee-alerting device. The state of Florida found results of the initial years of funding compelling and plans to fund further manatee avoidance technology research in a continued effort to mitigate the problem of manatee/boat collisions.

  11. Design and performance of a large vocabulary discrete word recognition system. Volume 1: Technical report. [real time computer technique for voice data processing

    NASA Technical Reports Server (NTRS)

    1973-01-01

    The development, construction, and test of a 100-word vocabulary near-real-time word recognition system are reported. Features include: replacement of any one or all 100 words in the vocabulary; rapid learning of a new speaker; storage and retrieval of training sets; verbal or manual single-word deletion; continuous adaptation with verbal or manual error correction; on-line verification of vocabulary as spoken; system modes selectable via the verification display keyboard; relationship of a classified word to neighboring words; and a versatile input/output interface to accommodate a variety of applications.

  12. Fusion of Visible and Thermal Descriptors Using Genetic Algorithms for Face Recognition Systems.

    PubMed

    Hermosilla, Gabriel; Gallardo, Francisco; Farias, Gonzalo; San Martin, Cesar

    2015-07-23

    The aim of this article is to present a new face recognition system based on the fusion of visible and thermal features obtained from the most current local matching descriptors by maximizing face recognition rates through the use of genetic algorithms. The article considers a comparison of the performance of the proposed fusion methodology against five current face recognition methods and classic fusion techniques used commonly in the literature. These were selected by considering their performance in face recognition. The five local matching methods and the proposed fusion methodology are evaluated using the standard visible/thermal database, the Equinox database, along with a new database, the PUCV-VTF, designed for visible-thermal studies in face recognition and described for the first time in this work. The latter is created considering visible and thermal image sensors with different real-world conditions, such as variations in illumination, facial expression, pose, occlusion, etc. The main conclusions of this article are that two variants of the proposed fusion methodology surpass current face recognition methods and the classic fusion techniques reported in the literature, attaining recognition rates of over 97% and 99% for the Equinox and PUCV-VTF databases, respectively. The fusion methodology is very robust to illumination and expression changes, as it combines thermal and visible information efficiently by using genetic algorithms, thus allowing it to choose optimal face areas where one spectrum is more representative than the other.
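
The genetic search over fusion parameters can be sketched with a toy GA that evolves a single visible/thermal mixing weight to maximize a stand-in recognition-rate function. Everything here, the fitness function, population size and mutation scheme, is an assumed illustration, not the paper's configuration:

```python
import random

def evolve(fitness, pop_size=20, generations=40, seed=1):
    """Tiny real-valued GA on [0, 1]: tournament selection, blend crossover, Gaussian mutation."""
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        nxt = []
        for _ in range(pop_size):
            a = max(rng.sample(pop, 3), key=fitness)       # tournament parent 1
            b = max(rng.sample(pop, 3), key=fitness)       # tournament parent 2
            child = (a + b) / 2 + rng.gauss(0.0, 0.05)     # blend crossover + mutation
            nxt.append(min(1.0, max(0.0, child)))
        pop = nxt
    return max(pop, key=fitness)

# Stand-in "recognition rate": peaks when the visible weight is 0.7 (thermal gets 0.3).
rate = lambda w: 1.0 - (w - 0.7) ** 2
best = evolve(rate)
assert abs(best - 0.7) < 0.1
```

In the paper the chromosome would encode per-region fusion choices and the fitness would be an actual recognition rate on a validation set; the search dynamics are the same.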

  13. Study on recognition algorithm for paper currency numbers based on neural network

    NASA Astrophysics Data System (ADS)

    Li, Xiuyan; Liu, Tiegen; Li, Yuanyao; Zhang, Zhongchuan; Deng, Shichao

    2008-12-01

    Based on their unique characteristics, paper currency numbers can be put on record, and automatic identification equipment for paper currency numbers can be supplied to the currency circulation market in order to make it convenient for financial sectors to trace fiduciary circulation and to provide effective supervision of paper currency. It is also favorable for identifying forged notes, blacklisting forged note numbers, and solving major social problems such as armored cash carrier robbery and money laundering. For the purpose of recognizing paper currency numbers, a recognition algorithm based on neural networks is presented in this paper. Number lines in original paper currency images are extracted through image processing, including image de-noising, skew correction, segmentation, and image normalization. According to the different characteristics of digits and letters in the serial number, two kinds of classifiers are designed. With its characteristics of associative memory, optimized computation and rapid convergence, the Discrete Hopfield Neural Network (DHNN) is utilized to recognize the letters; with its simple structure, quick learning and global optimality, the Radial-Basis Function Neural Network (RBFNN) is adopted to identify the digits. The final recognition results are then obtained by combining the two kinds of recognition results in regular sequence. Simulation tests confirm that the combined recognition algorithm achieves both a high recognition rate and fast recognition, and is worthy of a broad application prospect.
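
The DHNN's associative recall can be sketched in miniature: store binary (±1) patterns with the Hebbian rule and recover a stored pattern from a corrupted probe. This illustrates the mechanism only, not the paper's letter-recognition setup:

```python
def train_hopfield(patterns):
    """Hebbian weights W[i][j] = sum_p p_i * p_j / n, with zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, probe, steps=5):
    """Synchronous sign updates until the state is stable (or steps are exhausted)."""
    x = list(probe)
    for _ in range(steps):
        nxt = [1 if sum(w[i][j] * x[j] for j in range(len(x))) >= 0 else -1
               for i in range(len(x))]
        if nxt == x:
            break
        x = nxt
    return x

p1 = [1] * 8 + [-1] * 8            # two orthogonal 16-bit patterns
p2 = [1, -1] * 8
w = train_hopfield([p1, p2])
noisy = list(p1)
noisy[0] = -noisy[0]               # flip one bit
assert recall(w, noisy) == p1      # the network settles back to the stored pattern
```

Recognizing a serial-number letter then amounts to recalling the stored template closest to the segmented, binarized glyph.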

  14. Fusion of Visible and Thermal Descriptors Using Genetic Algorithms for Face Recognition Systems

    PubMed Central

    Hermosilla, Gabriel; Gallardo, Francisco; Farias, Gonzalo; San Martin, Cesar

    2015-01-01

    The aim of this article is to present a new face recognition system based on the fusion of visible and thermal features obtained from the most current local matching descriptors by maximizing face recognition rates through the use of genetic algorithms. The article considers a comparison of the performance of the proposed fusion methodology against five current face recognition methods and classic fusion techniques used commonly in the literature. These were selected by considering their performance in face recognition. The five local matching methods and the proposed fusion methodology are evaluated using the standard visible/thermal database, the Equinox database, along with a new database, the PUCV-VTF, designed for visible-thermal studies in face recognition and described for the first time in this work. The latter is created considering visible and thermal image sensors with different real-world conditions, such as variations in illumination, facial expression, pose, occlusion, etc. The main conclusions of this article are that two variants of the proposed fusion methodology surpass current face recognition methods and the classic fusion techniques reported in the literature, attaining recognition rates of over 97% and 99% for the Equinox and PUCV-VTF databases, respectively. The fusion methodology is very robust to illumination and expression changes, as it combines thermal and visible information efficiently by using genetic algorithms, thus allowing it to choose optimal face areas where one spectrum is more representative than the other. PMID:26213932

  15. Basics of identification measurement technology

    NASA Astrophysics Data System (ADS)

    Klikushin, Yu N.; Kobenko, V. Yu; Stepanov, P. P.

    2018-01-01

    None of the algorithms available and suitable for pattern recognition gives a 100% guarantee, so this remains an active and relevant field of scientific research. It is proposed to develop existing pattern recognition technologies through the application of identification measurements. The purpose of the study is to assess the possibility of recognizing images using identification measurement technologies. In solving pattern recognition problems, neural networks and hidden Markov models are mainly used. A fundamentally new approach to pattern recognition, based on the technology of identification signal measurements (IIS), is proposed. The essence of IIS technology is the quantitative evaluation of the shape of images using special tools and algorithms.

  16. A study of speech emotion recognition based on hybrid algorithm

    NASA Astrophysics Data System (ADS)

    Zhu, Ju-xia; Zhang, Chao; Lv, Zhao; Rao, Yao-quan; Wu, Xiao-pei

    2011-10-01

    To effectively improve the recognition accuracy of a speech emotion recognition system, a hybrid algorithm which combines the Continuous Hidden Markov Model (CHMM), the All-Class-in-One Neural Network (ACON) and the Support Vector Machine (SVM) is proposed. In the SVM and ACON methods, global statistics are used as emotional features, while in the CHMM method, instantaneous features are employed. The recognition rate of the proposed method is 92.25%, with a rejection rate of 0.78%. Furthermore, it achieves relative improvements of 8.53%, 4.69% and 0.78% over the ACON, CHMM and SVM methods, respectively. The experimental results confirm its efficiency in distinguishing the anger, happiness, neutral and sadness emotional states.

  17. Analysis and Recognition of Curve Type as The Basis of Object Recognition in Image

    NASA Astrophysics Data System (ADS)

    Nugraha, Nurma; Madenda, Sarifuddin; Indarti, Dina; Dewi Agushinta, R.; Ernastuti

    2016-06-01

    An object in an image, when analyzed further, shows characteristics that distinguish it from other objects in the image. Characteristics used for object recognition in an image can be color, shape, pattern, texture and spatial information that represent objects in the digital image. A method has recently been developed for image feature extraction based on the analysis of simple curve characteristics and a chain-code search of the object. This study develops an algorithm for the analysis and recognition of curve types as the basis for object recognition in images, proposing the addition of complex curve characteristics with at most four branches to be used in the recognition process. A complex curve is defined as a curve that has a point of intersection. Using several edge-detected images, the algorithm was able to analyze and recognize complex curve shapes well.

  18. Automatic detection and recognition of traffic signs in stereo images based on features and probabilistic neural networks

    NASA Astrophysics Data System (ADS)

    Sheng, Yehua; Zhang, Ka; Ye, Chun; Liang, Cheng; Li, Jian

    2008-04-01

    Considering the problem of automatic traffic sign detection and recognition in stereo images captured under motion conditions, a new algorithm for traffic sign detection and recognition based on features and probabilistic neural networks (PNN) is proposed in this paper. Firstly, global statistical color features of the left image are computed based on statistics theory. Then, for red, yellow and blue traffic signs, the left image is segmented into three binary images by a self-adaptive color segmentation method. Secondly, gray-value projection and shape analysis are used to confirm traffic sign regions in the left image, and stereo image matching is used to locate the corresponding traffic signs in the right image. Thirdly, self-adaptive image segmentation is used to extract the binary inner core shapes of the detected traffic signs, and one-dimensional feature vectors of the inner core shapes are computed by central projection transformation. Fourthly, these vectors are input to the trained probabilistic neural networks for traffic sign recognition. Lastly, recognition results in the left image are compared with recognition results in the right image; if the results in the stereo images are identical, they are confirmed as final recognition results. The new algorithm was applied to 220 real images of natural scenes taken by the vehicle-borne mobile photogrammetry system in Nanjing at different times. Experimental results show a detection and recognition rate of over 92%. The algorithm is thus not only simple, but also reliable and fast in real traffic sign detection and recognition. Furthermore, it can obtain geometrical information of traffic signs at the same time as recognizing their types.

  19. Identification of Alfalfa Leaf Diseases Using Image Recognition Technology

    PubMed Central

    Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang

    2016-01-01

    Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnosis of these diseases is critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image containing one or more typical lesions was obtained by manually cropping each acquired digital disease image. The sub-images were then segmented using twelve lesion segmentation methods that combined clustering algorithms (K_means clustering, fuzzy C-means clustering and K_median clustering) with supervised classification algorithms (logistic regression analysis, the Naive Bayes algorithm, classification and regression trees, and linear discriminant analysis). After a comprehensive comparison, the segmentation method combining the K_median clustering algorithm with linear discriminant analysis was chosen to obtain lesion images. After lesion segmentation with this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods: random forest, support vector machine (SVM) and K-nearest neighbor. The recognition results of these models were compared. When the ReliefF method was used for feature selection, the SVM model built with the 45 most important features (selected from the 129 features) was the optimal model. For this SVM model, the recognition accuracies on the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were also built based on the 45 effective features used for the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies on both the training set and the testing set were approximately 80%. These results indicate that image-based recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf diseases. PMID:27977767
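
    The winning segmentation pipeline pairs K_median clustering with linear discriminant analysis. As an illustration of the clustering half only, the following is a minimal K-median sketch in Python (NumPy only) applied to a synthetic green-tissue/brown-lesion image; the colors, image size, and farthest-point initialization are assumptions made for the demo, not details taken from the paper.

    ```python
    import numpy as np

    def k_median_segment(pixels, k=2, iters=20):
        """K-median clustering of pixel colors: L1 (Manhattan) assignment,
        per-channel median centers. Farthest-point initialization keeps
        this sketch deterministic."""
        pixels = pixels.astype(float)
        centers = [pixels[0]]
        for _ in range(1, k):
            # next center: the pixel farthest (L1) from all existing centers
            d = np.min([np.abs(pixels - c).sum(axis=1) for c in centers], axis=0)
            centers.append(pixels[d.argmax()])
        centers = np.array(centers)
        for _ in range(iters):
            # assign each pixel to the nearest center by Manhattan distance
            dist = np.abs(pixels[:, None, :] - centers[None, :, :]).sum(axis=2)
            labels = dist.argmin(axis=1)
            for j in range(k):
                members = pixels[labels == j]
                if len(members):
                    centers[j] = np.median(members, axis=0)
        return labels, centers

    # Synthetic stand-in for a cropped sub-image: green tissue, brown lesion patch.
    img = np.zeros((20, 20, 3))
    img[:, :] = (60, 160, 60)        # healthy green tissue
    img[5:10, 5:10] = (120, 80, 40)  # lesion-like brown patch
    labels, centers = k_median_segment(img.reshape(-1, 3), k=2)
    mask = labels.reshape(20, 20)    # cluster-label mask separating lesion from tissue
    ```

    In the study itself the clustering output is combined with linear discriminant analysis before feature extraction; here the raw cluster mask simply stands in for that segmentation result.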

  20. Identification of Alfalfa Leaf Diseases Using Image Recognition Technology.

    PubMed

    Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang

    2016-01-01

    Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnosis of these diseases is critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image containing one or more typical lesions was obtained by manually cropping each acquired digital disease image. The sub-images were then segmented using twelve lesion segmentation methods that combined clustering algorithms (K_means clustering, fuzzy C-means clustering and K_median clustering) with supervised classification algorithms (logistic regression analysis, the Naive Bayes algorithm, classification and regression trees, and linear discriminant analysis). After a comprehensive comparison, the segmentation method combining the K_median clustering algorithm with linear discriminant analysis was chosen to obtain lesion images. After lesion segmentation with this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods: random forest, support vector machine (SVM) and K-nearest neighbor. The recognition results of these models were compared. When the ReliefF method was used for feature selection, the SVM model built with the 45 most important features (selected from the 129 features) was the optimal model. For this SVM model, the recognition accuracies on the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were also built based on the 45 effective features used for the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies on both the training set and the testing set were approximately 80%. These results indicate that image-based recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf diseases.
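
    ReliefF-based feature selection followed by an SVM gave the best model here. As a rough, self-contained illustration of the underlying idea, the sketch below implements basic Relief (the simpler binary-class ancestor of ReliefF, not the exact algorithm used in the paper): each feature is rewarded when it differs at a sample's nearest miss and agrees at its nearest hit. The toy data and all names are invented for the demo.

    ```python
    import numpy as np

    def relief_weights(X, y):
        """Basic Relief feature weighting for binary classes:
        w[f] grows when feature f separates nearest misses and
        matches nearest hits (L1 distances, range-normalized)."""
        n, d = X.shape
        span = np.ptp(X, axis=0)          # per-feature range for normalization
        span[span == 0] = 1.0
        w = np.zeros(d)
        for i in range(n):
            diff = np.abs(X - X[i]) / span
            dist = diff.sum(axis=1)
            dist[i] = np.inf              # exclude the sample itself
            same = y == y[i]
            hit = np.where(same, dist, np.inf).argmin()   # nearest same-class sample
            miss = np.where(~same, dist, np.inf).argmin() # nearest other-class sample
            w += diff[miss] - diff[hit]
        return w / n

    # Toy data: feature 0 separates the two classes, feature 1 is pure noise.
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 20)
    X = np.column_stack([y + 0.05 * rng.standard_normal(40),   # informative
                         rng.standard_normal(40)])             # noise
    w = relief_weights(X, y)
    ranking = np.argsort(w)[::-1]   # highest-weight features first
    ```

    In the study, a ranking like this (ReliefF over 129 features) was truncated to the top 45 features before training the SVM; the informative feature should rank above the noise feature here.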
