Sample records for automated speech recognition

  1. Automated Assessment of Speech Fluency for L2 English Learners

    ERIC Educational Resources Information Center

    Yoon, Su-Youn

    2009-01-01

    This dissertation provides an automated scoring method of speech fluency for second language learners of English (L2 learners) based that uses speech recognition technology. Non-standard pronunciation, frequent disfluencies, faulty grammar, and inappropriate lexical choices are crucial characteristics of L2 learners' speech. Due to the ease of…

  2. Liberated Learning: Analysis of University Students' Perceptions and Experiences with Continuous Automated Speech Recognition

    ERIC Educational Resources Information Center

    Ryba, Ken; McIvor, Tom; Shakir, Maha; Paez, Di

    2006-01-01

    This study examined continuous automated speech recognition in the university lecture theatre. The participants were both native speakers of English (L1) and English as a second language students (L2) enrolled in an information systems course (Total N=160). After an initial training period, an L2 lecturer in information systems delivered three…

  3. Acoustic diagnosis of pulmonary hypertension: automated speech- recognition-inspired classification algorithm outperforms physicians

    NASA Astrophysics Data System (ADS)

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y.; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-09-01

    We hypothesized that an automated speech- recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p  < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% and 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians that could be used to screen for PH and encourage earlier specialist referral.

  4. Developing an automated speech-recognition telephone diabetes intervention.

    PubMed

    Goldman, Roberta E; Sanchez-Hernandez, Maya; Ross-Degnan, Dennis; Piette, John D; Trinacty, Connie Mah; Simon, Steven R

    2008-08-01

    Many patients do not receive guideline-recommended care for diabetes and other chronic conditions. Automated speech-recognition telephone outreach to supplement in-person physician-patient communication may enhance patient care for chronic illness. We conducted this study to inform the development of an automated telephone outreach intervention for improving diabetes care among members of a large, not-for-profit health plan. In-depth telephone interviews with qualitative analysis. participants Individuals with diabetes (n=36) enrolled in a large regional health plan in the USA. Main outcome measure Patients' opinions about automated speech-recognition telephone technology. Patients who were recently diagnosed with diabetes and some with diabetes for a decade or more expressed basic informational needs. While most would prefer to speak with a live person rather than a computer-recorded voice, many felt that the automated system could successfully supplement the information they receive from their physicians and could serve as an integral part of their care. Patients suggested that such a system could provide specific dietary advice, information about diabetes and its self-care, a call-in menu of information topics, reminders about laboratory test results and appointments, tracking of personal laboratory results and feedback about their self-monitoring. While some patients expressed negative attitudes toward automated speech recognition telephone systems generally, most felt that a variety of functions of such a system could be beneficial to their diabetes care. In-depth interviews resulted in substantive input from health plan members for the design of an automated telephone outreach system to supplement in-person physician-patient communication in this population.

  5. Performance of Automated Speech Scoring on Different Low- to Medium-Entropy Item Types for Low-Proficiency English Learners. Research Report. ETS RR-17-12

    ERIC Educational Resources Information Center

    Loukina, Anastassia; Zechner, Klaus; Yoon, Su-Youn; Zhang, Mo; Tao, Jidong; Wang, Xinhao; Lee, Chong Min; Mulholland, Matthew

    2017-01-01

    This report presents an overview of the "SpeechRater"? automated scoring engine model building and evaluation process for several item types with a focus on a low-English-proficiency test-taker population. We discuss each stage of speech scoring, including automatic speech recognition, filtering models for nonscorable responses, and…

  6. Speech Recognition for A Digital Video Library.

    ERIC Educational Resources Information Center

    Witbrock, Michael J.; Hauptmann, Alexander G.

    1998-01-01

    Production of the meta-data supporting the Informedia Digital Video Library interface is automated using techniques derived from artificial intelligence research. Speech recognition and natural-language processing, information retrieval, and image analysis are applied to produce an interface that helps users locate information and navigate more…

  7. Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

    NASA Astrophysics Data System (ADS)

    Kayasith, Prakasith; Theeramunkong, Thanaruk

    It is a tedious and subjective task to measure severity of a dysarthria by manually evaluating his/her speech using available standard assessment methods based on human perception. This paper presents an automated approach to assess speech quality of a dysarthric speaker with cerebral palsy. With the consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce consistent speech signal for a certain word and distinguished speech signal for different words. As an application, it can be used to assess speech quality and forecast speech recognition rate of speech made by an individual dysarthric speaker before actual exhaustive implementation of an automatic speech recognition system for the speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations had been done by comparing its predicted recognition rates with ones predicted by the standard methods called the articulatory and intelligibility tests based on the two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting recognition rate of dysarthric speech. All experiments had been done on speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.

  8. Is talking to an automated teller machine natural and fun?

    PubMed

    Chan, F Y; Khalid, H M

    Usability and affective issues of using automatic speech recognition technology to interact with an automated teller machine (ATM) are investigated in two experiments. The first uncovered dialogue patterns of ATM users for the purpose of designing the user interface for a simulated speech ATM system. Applying the Wizard-of-Oz methodology, multiple mapping and word spotting techniques, the speech driven ATM accommodates bilingual users of Bahasa Melayu and English. The second experiment evaluates the usability of a hybrid speech ATM, comparing it with a simulated manual ATM. The aim is to investigate how natural and fun can talking to a speech ATM be for these first-time users. Subjects performed the withdrawal and balance enquiry tasks. The ANOVA was performed on the usability and affective data. The results showed significant differences between systems in the ability to complete the tasks as well as in transaction errors. Performance was measured on the time taken by subjects to complete the task and the number of speech recognition errors that occurred. On the basis of user emotions, it can be said that the hybrid speech system enabled pleasurable interaction. Despite the limitations of speech recognition technology, users are set to talk to the ATM when it becomes available for public use.

  9. Automatic concept extraction from spoken medical reports.

    PubMed

    Happe, André; Pouliquen, Bruno; Burgun, Anita; Cuggia, Marc; Le Beux, Pierre

    2003-07-01

    The objective of this project is to investigate methods whereby a combination of speech recognition and automated indexing methods substitute for current transcription and indexing practices. We based our study on existing speech recognition software programs and on NOMINDEX, a tool that extracts MeSH concepts from medical text in natural language and that is mainly based on a French medical lexicon and on the UMLS. For each document, the process consists of three steps: (1) dictation and digital audio recording, (2) speech recognition, (3) automatic indexing. The evaluation consisted of a comparison between the set of concepts extracted by NOMINDEX after the speech recognition phase and the set of keywords manually extracted from the initial document. The method was evaluated on a set of 28 patient discharge summaries extracted from the MENELAS corpus in French, corresponding to in-patients admitted for coronarography. The overall precision was 73% and the overall recall was 90%. Indexing errors were mainly due to word sense ambiguity and abbreviations. A specific issue was the fact that the standard French translation of MeSH terms lacks diacritics. A preliminary evaluation of speech recognition tools showed that the rate of accurate recognition was higher than 98%. Only 3% of the indexing errors were generated by inadequate speech recognition. We discuss several areas to focus on to improve this prototype. However, the very low rate of indexing errors due to speech recognition errors highlights the potential benefits of combining speech recognition techniques and automatic indexing.

  10. Automated smartphone audiometry: Validation of a word recognition test app.

    PubMed

    Dewyer, Nicholas A; Jiradejvong, Patpong; Henderson Sabes, Jennifer; Limb, Charles J

    2018-03-01

    Develop and validate an automated smartphone word recognition test. Cross-sectional case-control diagnostic test comparison. An automated word recognition test was developed as an app for a smartphone with earphones. English-speaking adults with recent audiograms and various levels of hearing loss were recruited from an audiology clinic and were administered the smartphone word recognition test. Word recognition scores determined by the smartphone app and the gold standard speech audiometry test performed by an audiologist were compared. Test scores for 37 ears were analyzed. Word recognition scores determined by the smartphone app and audiologist testing were in agreement, with 86% of the data points within a clinically acceptable margin of error and a linear correlation value between test scores of 0.89. The WordRec automated smartphone app accurately determines word recognition scores. 3b. Laryngoscope, 128:707-712, 2018. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.

  11. Watch what you say, your computer might be listening: A review of automated speech recognition

    NASA Technical Reports Server (NTRS)

    Degennaro, Stephen V.

    1991-01-01

    Spoken language is the most convenient and natural means by which people interact with each other and is, therefore, a promising candidate for human-machine interactions. Speech also offers an additional channel for hands-busy applications, complementing the use of motor output channels for control. Current speech recognition systems vary considerably across a number of important characteristics, including vocabulary size, speaking mode, training requirements for new speakers, robustness to acoustic environments, and accuracy. Algorithmically, these systems range from rule-based techniques through more probabilistic or self-learning approaches such as hidden Markov modeling and neural networks. This tutorial begins with a brief summary of the relevant features of current speech recognition systems and the strengths and weaknesses of the various algorithmic approaches.

  12. Remote voice training: A case study on space shuttle applications, appendix C

    NASA Technical Reports Server (NTRS)

    Mollakarimi, Cindy; Hamid, Tamin

    1990-01-01

    The Tile Automation System includes applications of automation and robotics technology to all aspects of the Shuttle tile processing and inspection system. An integrated set of rapid prototyping testbeds was developed which include speech recognition and synthesis, laser imaging systems, distributed Ada programming environments, distributed relational data base architectures, distributed computer network architectures, multi-media workbenches, and human factors considerations. Remote voice training in the Tile Automation System is discussed. The user is prompted over a headset by synthesized speech for the training sequences. The voice recognition units and the voice output units are remote from the user and are connected by Ethernet to the main computer system. A supervisory channel is used to monitor the training sequences. Discussions include the training approaches as well as the human factors problems and solutions for this system utilizing remote training techniques.

  13. Natural user interface as a supplement of the holographic Raman tweezers

    NASA Astrophysics Data System (ADS)

    Tomori, Zoltan; Kanka, Jan; Kesa, Peter; Jakl, Petr; Sery, Mojmir; Bernatova, Silvie; Antalik, Marian; Zemánek, Pavel

    2014-09-01

    Holographic Raman tweezers (HRT) manipulates with microobjects by controlling the positions of multiple optical traps via the mouse or joystick. Several attempts have appeared recently to exploit touch tablets, 2D cameras or Kinect game console instead. We proposed a multimodal "Natural User Interface" (NUI) approach integrating hands tracking, gestures recognition, eye tracking and speech recognition. For this purpose we exploited "Leap Motion" and "MyGaze" low-cost sensors and a simple speech recognition program "Tazti". We developed own NUI software which processes signals from the sensors and sends the control commands to HRT which subsequently controls the positions of trapping beams, micropositioning stage and the acquisition system of Raman spectra. System allows various modes of operation proper for specific tasks. Virtual tools (called "pin" and "tweezers") serving for the manipulation with particles are displayed on the transparent "overlay" window above the live camera image. Eye tracker identifies the position of the observed particle and uses it for the autofocus. Laser trap manipulation navigated by the dominant hand can be combined with the gestures recognition of the secondary hand. Speech commands recognition is useful if both hands are busy. Proposed methods make manual control of HRT more efficient and they are also a good platform for its future semi-automated and fully automated work.

  14. Automated speech recognition for time recording in out-of-hospital emergency medicine-an experimental approach.

    PubMed

    Gröschel, J; Philipp, F; Skonetzki, St; Genzwürker, H; Wetter, Th; Ellinger, K

    2004-02-01

    Precise documentation of medical treatment in emergency medical missions and for resuscitation is essential from a medical, legal and quality assurance point of view [Anästhesiologie und Intensivmedizin, 41 (2000) 737]. All conventional methods of time recording are either too inaccurate or elaborate for routine application. Automated speech recognition may offer a solution. A special erase programme for the documentation of all time events was developed. Standard speech recognition software (IBM ViaVoice 7.0) was adapted and installed on two different computer systems. One was a stationary PC (500MHz Pentium III, 128MB RAM, Soundblaster PCI 128 Soundcard, Win NT 4.0), the other was a mobile pen-PC that had already proven its value during emergency missions [Der Notarzt 16, p. 177] (Fujitsu Stylistic 2300, 230Mhz MMX Processor, 160MB RAM, embedded soundcard ESS 1879 chipset, Win98 2nd ed.). On both computers two different microphones were tested. One was a standard headset that came with the recognition software, the other was a small microphone (Lavalier-Kondensatormikrofon EM 116 from Vivanco), that could be attached to the operators collar. Seven women and 15 men spoke a text with 29 phrases to be recognised. Two emergency physicians tested the system in a simulated emergency setting using the collar microphone and the pen-PC with an analogue wireless connection. Overall recognition was best for the PC with a headset (89%) followed by the pen-PC with a headset (85%), the PC with a microphone (84%) and the pen-PC with a microphone (80%). Nevertheless, the difference was not statistically significant. Recognition became significantly worse (89.5% versus 82.3%, P<0.0001 ) when numbers had to be recognised. The gender of speaker and the number of words in a sentence had no influence. Average recognition in the simulated emergency setting was 75%. At no time did false recognition appear. Time recording with automated speech recognition seems to be possible in emergency medical missions. Although results show an average recognition of only 75%, it is possible that missing elements may be reconstructed more precisely. Future technology should integrate a secure wireless connection between microphone and mobile computer. The system could then prove its value for real out-of-hospital emergencies.

  15. A discrete speech recognition system for dermatology: 8 years of daily experience in a medical dermatology office.

    PubMed

    Smith, Kevin C

    2002-09-01

    A discrete speech recognition system was customized to recognize approximately 9,000 phrases and terms commonly used in dermatology, and has been used on a daily basis since 1984 to produce as many as 65 consult and follow up letters daily. The systems and procedures necessary for the practical use of this system are described, together with a discussion of the advantages and disadvantages of using this system to automate the transcription of dermatological dictation.

  16. Validation of Automated Scoring of Oral Reading

    ERIC Educational Resources Information Center

    Balogh, Jennifer; Bernstein, Jared; Cheng, Jian; Van Moere, Alistair; Townshend, Brent; Suzuki, Masanori

    2012-01-01

    A two-part experiment is presented that validates a new measurement tool for scoring oral reading ability. Data collected by the U.S. government in a large-scale literacy assessment of adults were analyzed by a system called VersaReader that uses automatic speech recognition and speech processing technologies to score oral reading fluency. In the…

  17. How Accurately Can the Google Web Speech API Recognize and Transcribe Japanese L2 English Learners' Oral Production?

    ERIC Educational Resources Information Center

    Ashwell, Tim; Elam, Jesse R.

    2017-01-01

    The ultimate aim of our research project was to use the Google Web Speech API to automate scoring of elicited imitation (EI) tests. However, in order to achieve this goal, we had to take a number of preparatory steps. We needed to assess how accurate this speech recognition tool is in recognizing native speakers' production of the test items; we…

  18. The Army word recognition system

    NASA Technical Reports Server (NTRS)

    Hadden, David R.; Haratz, David

    1977-01-01

    The application of speech recognition technology in the Army command and control area is presented. The problems associated with this program are described as well as as its relevance in terms of the man/machine interactions, voice inflexions, and the amount of training needed to interact with and utilize the automated system.

  19. An automatic speech recognition system with speaker-independent identification support

    NASA Astrophysics Data System (ADS)

    Caranica, Alexandru; Burileanu, Corneliu

    2015-02-01

    The novelty of this work relies on the application of an open source research software toolkit (CMU Sphinx) to train, build and evaluate a speech recognition system, with speaker-independent support, for voice-controlled hardware applications. Moreover, we propose to use the trained acoustic model to successfully decode offline voice commands on embedded hardware, such as an ARMv6 low-cost SoC, Raspberry PI. This type of single-board computer, mainly used for educational and research activities, can serve as a proof-of-concept software and hardware stack for low cost voice automation systems.

  20. The software for automatic creation of the formal grammars used by speech recognition, computer vision, editable text conversion systems, and some new functions

    NASA Astrophysics Data System (ADS)

    Kardava, Irakli; Tadyszak, Krzysztof; Gulua, Nana; Jurga, Stefan

    2017-02-01

    For more flexibility of environmental perception by artificial intelligence it is needed to exist the supporting software modules, which will be able to automate the creation of specific language syntax and to make a further analysis for relevant decisions based on semantic functions. According of our proposed approach, of which implementation it is possible to create the couples of formal rules of given sentences (in case of natural languages) or statements (in case of special languages) by helping of computer vision, speech recognition or editable text conversion system for further automatic improvement. In other words, we have developed an approach, by which it can be achieved to significantly improve the training process automation of artificial intelligence, which as a result will give us a higher level of self-developing skills independently from us (from users). At the base of our approach we have developed a software demo version, which includes the algorithm and software code for the entire above mentioned component's implementation (computer vision, speech recognition and editable text conversion system). The program has the ability to work in a multi - stream mode and simultaneously create a syntax based on receiving information from several sources.

  1. Automated Intelligibility Assessment of Pathological Speech Using Phonological Features

    NASA Astrophysics Data System (ADS)

    Middag, Catherine; Martens, Jean-Pierre; Van Nuffelen, Gwen; De Bodt, Marc

    2009-12-01

    It is commonly acknowledged that word or phoneme intelligibility is an important criterion in the assessment of the communication efficiency of a pathological speaker. People have therefore put a lot of effort in the design of perceptual intelligibility rating tests. These tests usually have the drawback that they employ unnatural speech material (e.g., nonsense words) and that they cannot fully exclude errors due to listener bias. Therefore, there is a growing interest in the application of objective automatic speech recognition technology to automate the intelligibility assessment. Current research is headed towards the design of automated methods which can be shown to produce ratings that correspond well with those emerging from a well-designed and well-performed perceptual test. In this paper, a novel methodology that is built on previous work (Middag et al., 2008) is presented. It utilizes phonological features, automatic speech alignment based on acoustic models that were trained on normal speech, context-dependent speaker feature extraction, and intelligibility prediction based on a small model that can be trained on pathological speech samples. The experimental evaluation of the new system reveals that the root mean squared error of the discrepancies between perceived and computed intelligibilities can be as low as 8 on a scale of 0 to 100.

  2. Advances in natural language processing.

    PubMed

    Hirschberg, Julia; Manning, Christopher D

    2015-07-17

    Natural language processing employs computational techniques for the purpose of learning, understanding, and producing human language content. Early computational approaches to language research focused on automating the analysis of the linguistic structure of language and developing basic technologies such as machine translation, speech recognition, and speech synthesis. Today's researchers refine and make use of such tools in real-world applications, creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance, and identifying sentiment and emotion toward products and services. We describe successes and challenges in this rapidly advancing area. Copyright © 2015, American Association for the Advancement of Science.

  3. Automated speech understanding: the next generation

    NASA Astrophysics Data System (ADS)

    Picone, J.; Ebel, W. J.; Deshmukh, N.

    1995-04-01

    Modern speech understanding systems merge interdisciplinary technologies from Signal Processing, Pattern Recognition, Natural Language, and Linguistics into a unified statistical framework. These systems, which have applications in a wide range of signal processing problems, represent a revolution in Digital Signal Processing (DSP). Once a field dominated by vector-oriented processors and linear algebra-based mathematics, the current generation of DSP-based systems rely on sophisticated statistical models implemented using a complex software paradigm. Such systems are now capable of understanding continuous speech input for vocabularies of several thousand words in operational environments. The current generation of deployed systems, based on small vocabularies of isolated words, will soon be replaced by a new technology offering natural language access to vast information resources such as the Internet, and provide completely automated voice interfaces for mundane tasks such as travel planning and directory assistance.

  4. Implementing ICAO Language Proficiency Requirements in the Versant Aviation English Test

    ERIC Educational Resources Information Center

    Van Moere, Alistair; Suzuki, Masanori; Downey, Ryan; Cheng, Jian

    2009-01-01

    This paper discusses the development of an assessment to satisfy the International Civil Aviation Organization (ICAO) Language Proficiency Requirements. The Versant Aviation English Test utilizes speech recognition technology and a computerized testing platform, such that test administration and scoring are fully automated. Developed in…

  5. Speech and gesture interfaces for squad-level human-robot teaming

    NASA Astrophysics Data System (ADS)

    Harris, Jonathan; Barber, Daniel

    2014-06-01

    As the military increasingly adopts semi-autonomous unmanned systems for military operations, utilizing redundant and intuitive interfaces for communication between Soldiers and robots is vital to mission success. Currently, Soldiers use a common lexicon to verbally and visually communicate maneuvers between teammates. In order for robots to be seamlessly integrated within mixed-initiative teams, they must be able to understand this lexicon. Recent innovations in gaming platforms have led to advancements in speech and gesture recognition technologies, but the reliability of these technologies for enabling communication in human robot teaming is unclear. The purpose for the present study is to investigate the performance of Commercial-Off-The-Shelf (COTS) speech and gesture recognition tools in classifying a Squad Level Vocabulary (SLV) for a spatial navigation reconnaissance and surveillance task. The SLV for this study was based on findings from a survey conducted with Soldiers at Fort Benning, GA. The items of the survey focused on the communication between the Soldier and the robot, specifically in regards to verbally instructing them to execute reconnaissance and surveillance tasks. Resulting commands, identified from the survey, were then converted to equivalent arm and hand gestures, leveraging existing visual signals (e.g. U.S. Army Field Manual for Visual Signaling). A study was then run to test the ability of commercially available automated speech recognition technologies and a gesture recognition glove to classify these commands in a simulated intelligence, surveillance, and reconnaissance task. This paper presents classification accuracy of these devices for both speech and gesture modalities independently.

  6. Artificial intelligence, expert systems, computer vision, and natural language processing

    NASA Technical Reports Server (NTRS)

    Gevarter, W. B.

    1984-01-01

    An overview of artificial intelligence (AI), its core ingredients, and its applications is presented. The knowledge representation, logic, problem solving approaches, languages, and computers pertaining to AI are examined, and the state of the art in AI is reviewed. The use of AI in expert systems, computer vision, natural language processing, speech recognition and understanding, speech synthesis, problem solving, and planning is examined. Basic AI topics, including automation, search-oriented problem solving, knowledge representation, and computational logic, are discussed.

  7. Morphosyntactic Neural Analysis for Generalized Lexical Normalization

    ERIC Educational Resources Information Center

    Leeman-Munk, Samuel Paul

    2016-01-01

    The phenomenal growth of social media, web forums, and online reviews has spurred a growing interest in automated analysis of user-generated text. At the same time, a proliferation of voice recordings and efforts to archive culture heritage documents are fueling demand for effective automatic speech recognition (ASR) and optical character…

  8. Mathematical Modelling for the Evaluation of Automated Speech Recognition Systems--Research Area 3.3.1 (c)

    DTIC Science & Technology

    2016-01-07

    news. Both of these resemble typical activities of intelligence analysts in OSINT processing and production applications. We assessed two task...intelligence analysts in a number of OSINT processing and production applications. (5) Summary of the most important results In both settings

  9. Automation study for space station subsystems and mission ground support

    NASA Technical Reports Server (NTRS)

    1985-01-01

    An automation concept for the autonomous operation of space station subsystems, i.e., electric power, thermal control, and communications and tracking are discussed. To assure that functions essential for autonomous operations are not neglected, an operations function (systems monitoring and control) is included in the discussion. It is recommended that automated speech recognition and synthesis be considered a basic mode of man/machine interaction for space station command and control, and that the data management system (DMS) and other systems on the space station be designed to accommodate fully automated fault detection, isolation, and recovery within the system monitoring function of the DMS.

  10. Capturing patient information at nursing shift changes: methodological evaluation of speech recognition and information extraction

    PubMed Central

    Suominen, Hanna; Johnson, Maree; Zhou, Liyuan; Sanchez, Paula; Sirel, Raul; Basilakis, Jim; Hanlen, Leif; Estival, Dominique; Dawson, Linda; Kelly, Barbara

    2015-01-01

    Objective We study the use of speech recognition and information extraction to generate drafts of Australian nursing-handover documents. Methods Speech recognition correctness and clinicians’ preferences were evaluated using 15 recorder–microphone combinations, six documents, three speakers, Dragon Medical 11, and five survey/interview participants. Information extraction correctness evaluation used 260 documents, six-class classification for each word, two annotators, and the CRF++ conditional random field toolkit. Results A noise-cancelling lapel-microphone with a digital voice recorder gave the best correctness (79%). This microphone was also the most preferred option by all but one participant. Although the participants liked the small size of this recorder, their preference was for tablets that can also be used for document proofing and sign-off, among other tasks. Accented speech was harder to recognize than native language and a male speaker was detected better than a female speaker. Information extraction was excellent in filtering out irrelevant text (85% F1) and identifying text relevant to two classes (87% and 70% F1). Similarly to the annotators’ disagreements, there was confusion between the remaining three classes, which explains the modest 62% macro-averaged F1. Discussion We present evidence for the feasibility of speech recognition and information extraction to support clinicians’ in entering text and unlock its content for computerized decision-making and surveillance in healthcare. Conclusions The benefits of this automation include storing all information; making the drafts available and accessible almost instantly to everyone with authorized access; and avoiding information loss, delays, and misinterpretations inherent to using a ward clerk or transcription services. PMID:25336589

  11. MARTI: man-machine animation real-time interface

    NASA Astrophysics Data System (ADS)

    Jones, Christian M.; Dlay, Satnam S.

    1997-05-01

    The research introduces MARTI (man-machine animation real-time interface) for the realization of natural human-machine interfacing. The system uses simple vocal sound-tracks of human speakers to provide lip synchronization of computer graphical facial models. We present novel research in a number of engineering disciplines, which include speech recognition, facial modeling, and computer animation. This interdisciplinary research utilizes the latest, hybrid connectionist/hidden Markov model, speech recognition system to provide very accurate phone recognition and timing for speaker independent continuous speech, and expands on knowledge from the animation industry in the development of accurate facial models and automated animation. The research has many real-world applications which include the provision of a highly accurate and 'natural' man-machine interface to assist user interactions with computer systems and communication with one other using human idiosyncrasies; a complete special effects and animation toolbox providing automatic lip synchronization without the normal constraints of head-sets, joysticks, and skilled animators; compression of video data to well below standard telecommunication channel bandwidth for video communications and multi-media systems; assisting speech training and aids for the handicapped; and facilitating player interaction for 'video gaming' and 'virtual worlds.' MARTI has introduced a new level of realism to man-machine interfacing and special effect animation which has been previously unseen.

  12. Comparison of voice-automated transcription and human transcription in generating pathology reports.

    PubMed

    Al-Aynati, Maamoun M; Chorneyko, Katherine A

    2003-06-01

    Software that can convert spoken words into written text has been available since the early 1980s. Early continuous speech systems were developed in 1994, with the latest commercially available editions having a claimed accuracy of up to 98% of speech recognition at natural speech rates. To evaluate the efficacy of one commercially available voice-recognition software system with pathology vocabulary in generating pathology reports and to compare this with human transcription. To draw cost analysis conclusions regarding human versus computer-based transcription. Two hundred six routine pathology reports from the surgical pathology material handled at St Joseph's Healthcare, Hamilton, Ontario, were generated simultaneously using computer-based transcription and human transcription. The following hardware and software were used: a desktop 450-MHz Intel Pentium III processor with 192 MB of RAM, a speech-quality sound card (Sound Blaster), noise-canceling headset microphone, and IBM ViaVoice Pro version 8 with pathology vocabulary support (Voice Automated, Huntington Beach, Calif). The cost of the hardware and software used was approximately Can 2250 dollars. A total of 23 458 words were transcribed using both methods with a mean of 114 words per report. The mean accuracy rate was 93.6% (range, 87.4%-96%) using the computer software, compared to a mean accuracy of 99.6% (range, 99.4%-99.8%) for human transcription (P <.001). Time needed to edit documents by the primary evaluator (M.A.) using the computer was on average twice that needed for editing the documents produced by human transcriptionists (range, 1.4-3.5 times). The extra time needed to edit documents was 67 minutes per week (13 minutes per day). Computer-based continuous speech-recognition systems in pathology can be successfully used in pathology practice even during the handling of gross pathology specimens. The relatively low accuracy rate of this voice-recognition software with resultant increased editing burden on pathologists may not encourage its application on a wide scale in pathology departments with sufficient human transcription services, despite significant potential financial savings. However, computer-based transcription represents an attractive and relatively inexpensive alternative to human transcription in departments where there is a shortage of transcription services, and will no doubt become more commonly used in pathology departments in the future.

  13. Gene/protein name recognition based on support vector machine using dictionary as features.

    PubMed

    Mitsumori, Tomohiro; Fation, Sevrani; Murata, Masaki; Doi, Kouichi; Doi, Hirohumi

    2005-01-01

    Automated information extraction from biomedical literature is important because a vast amount of biomedical literature has been published. Recognition of the biomedical named entities is the first step in information extraction. We developed an automated recognition system based on the SVM algorithm and evaluated it in Task 1.A of BioCreAtIvE, a competition for automated gene/protein name recognition. In the work presented here, our recognition system uses the feature set of the word, the part-of-speech (POS), the orthography, the prefix, the suffix, and the preceding class. We call these features "internal resource features", i.e., features that can be found in the training data. Additionally, we consider the features of matching against dictionaries to be external resource features. We investigated and evaluated the effect of these features as well as the effect of tuning the parameters of the SVM algorithm. We found that the dictionary matching features contributed slightly to the improvement in the performance of the f-score. We attribute this to the possibility that the dictionary matching features might overlap with other features in the current multiple feature setting. During SVM learning, each feature alone had a marginally positive effect on system performance. This supports the fact that the SVM algorithm is robust on the high dimensionality of the feature vector space and means that feature selection is not required.

  14. Automated recognition of helium speech. Phase I: Investigation of microprocessor based analysis/synthesis system

    NASA Astrophysics Data System (ADS)

    Jelinek, H. J.

    1986-01-01

    This is the Final Report of Electronic Design Associates on its Phase I SBIR project. The purpose of this project is to develop a method for correcting helium speech, as experienced in diver-surface communication. The goal of the Phase I study was to design, prototype, and evaluate a real time helium speech corrector system based upon digital signal processing techniques. The general approach was to develop hardware (an IBM PC board) to digitize helium speech and software (a LAMBDA computer based simulation) to translate the speech. As planned in the study proposal, this initial prototype may now be used to assess expected performance from a self contained real time system which uses an identical algorithm. The Final Report details the work carried out to produce the prototype system. Four major project tasks were: a signal processing scheme for converting helium speech to normal sounding speech was generated. The signal processing scheme was simulated on a general purpose (LAMDA) computer. Actual helium speech was supplied to the simulation and the converted speech was generated. An IBM-PC based 14 bit data Input/Output board was designed and built. A bibliography of references on speech processing was generated.

  15. University of Colorado Dialog Systems for Travel and Navigation

    DTIC Science & Technology

    2001-01-01

    understanding technologies using the DARPA Hub Architecture. Users are able to converse with an automated travel agent over the phone to retrieve up-to-date...travel information such as flight schedules, pricing, along with hotel and rental car availability. The CU Communicator has been under development...implementation of the DARPA Communicator task [3]. The system combines continuous speech recognition, natural language understanding and flexible dialogue

  16. A Speech Recognition-based Solution for the Automatic Detection of Mild Cognitive Impairment from Spontaneous Speech

    PubMed Central

    Tóth, László; Hoffmann, Ildikó; Gosztolya, Gábor; Vincze, Veronika; Szatlóczki, Gréta; Bánréti, Zoltán; Pákáski, Magdolna; Kálmán, János

    2018-01-01

    Background: Even today the reliable diagnosis of the prodromal stages of Alzheimer’s disease (AD) remains a great challenge. Our research focuses on the earliest detectable indicators of cognitive de-cline in mild cognitive impairment (MCI). Since the presence of language impairment has been reported even in the mild stage of AD, the aim of this study is to develop a sensitive neuropsychological screening method which is based on the analysis of spontaneous speech production during performing a memory task. In the future, this can form the basis of an Internet-based interactive screening software for the recognition of MCI. Methods: Participants were 38 healthy controls and 48 clinically diagnosed MCI patients. The provoked spontaneous speech by asking the patients to recall the content of 2 short black and white films (one direct, one delayed), and by answering one question. Acoustic parameters (hesitation ratio, speech tempo, length and number of silent and filled pauses, length of utterance) were extracted from the recorded speech sig-nals, first manually (using the Praat software), and then automatically, with an automatic speech recogni-tion (ASR) based tool. First, the extracted parameters were statistically analyzed. Then we applied machine learning algorithms to see whether the MCI and the control group can be discriminated automatically based on the acoustic features. Results: The statistical analysis showed significant differences for most of the acoustic parameters (speech tempo, articulation rate, silent pause, hesitation ratio, length of utterance, pause-per-utterance ratio). The most significant differences between the two groups were found in the speech tempo in the delayed recall task, and in the number of pauses for the question-answering task. The fully automated version of the analysis process – that is, using the ASR-based features in combination with machine learning - was able to separate the two classes with an F1-score of 78.8%. Conclusion: The temporal analysis of spontaneous speech can be exploited in implementing a new, auto-matic detection-based tool for screening MCI for the community. PMID:29165085

  17. A Speech Recognition-based Solution for the Automatic Detection of Mild Cognitive Impairment from Spontaneous Speech.

    PubMed

    Toth, Laszlo; Hoffmann, Ildiko; Gosztolya, Gabor; Vincze, Veronika; Szatloczki, Greta; Banreti, Zoltan; Pakaski, Magdolna; Kalman, Janos

    2018-01-01

    Even today the reliable diagnosis of the prodromal stages of Alzheimer's disease (AD) remains a great challenge. Our research focuses on the earliest detectable indicators of cognitive decline in mild cognitive impairment (MCI). Since the presence of language impairment has been reported even in the mild stage of AD, the aim of this study is to develop a sensitive neuropsychological screening method which is based on the analysis of spontaneous speech production during performing a memory task. In the future, this can form the basis of an Internet-based interactive screening software for the recognition of MCI. Participants were 38 healthy controls and 48 clinically diagnosed MCI patients. The provoked spontaneous speech by asking the patients to recall the content of 2 short black and white films (one direct, one delayed), and by answering one question. Acoustic parameters (hesitation ratio, speech tempo, length and number of silent and filled pauses, length of utterance) were extracted from the recorded speech signals, first manually (using the Praat software), and then automatically, with an automatic speech recognition (ASR) based tool. First, the extracted parameters were statistically analyzed. Then we applied machine learning algorithms to see whether the MCI and the control group can be discriminated automatically based on the acoustic features. The statistical analysis showed significant differences for most of the acoustic parameters (speech tempo, articulation rate, silent pause, hesitation ratio, length of utterance, pause-per-utterance ratio). The most significant differences between the two groups were found in the speech tempo in the delayed recall task, and in the number of pauses for the question-answering task. The fully automated version of the analysis process - that is, using the ASR-based features in combination with machine learning - was able to separate the two classes with an F1-score of 78.8%. The temporal analysis of spontaneous speech can be exploited in implementing a new, automatic detection-based tool for screening MCI for the community. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  18. Automatic speech recognition technology development at ITT Defense Communications Division

    NASA Technical Reports Server (NTRS)

    White, George M.

    1977-01-01

    An assessment of the applications of automatic speech recognition to defense communication systems is presented. Future research efforts include investigations into the following areas: (1) dynamic programming; (2) recognition of speech degraded by noise; (3) speaker independent recognition; (4) large vocabulary recognition; (5) word spotting and continuous speech recognition; and (6) isolated word recognition.

  19. Perception and performance in flight simulators: The contribution of vestibular, visual, and auditory information

    NASA Technical Reports Server (NTRS)

    1979-01-01

    The pilot's perception and performance in flight simulators is examined. The areas investigated include: vestibular stimulation, flight management and man cockpit information interfacing, and visual perception in flight simulation. The effects of higher levels of rotary acceleration on response time to constant acceleration, tracking performance, and thresholds for angular acceleration are examined. Areas of flight management examined are cockpit display of traffic information, work load, synthetic speech call outs during the landing phase of flight, perceptual factors in the use of a microwave landing system, automatic speech recognition, automation of aircraft operation, and total simulation of flight training.

  20. Fifty years of progress in speech and speaker recognition

    NASA Astrophysics Data System (ADS)

    Furui, Sadaoki

    2004-10-01

    Speech and speaker recognition technology has made very significant progress in the past 50 years. The progress can be summarized by the following changes: (1) from template matching to corpus-base statistical modeling, e.g., HMM and n-grams, (2) from filter bank/spectral resonance to Cepstral features (Cepstrum + DCepstrum + DDCepstrum), (3) from heuristic time-normalization to DTW/DP matching, (4) from gdistanceh-based to likelihood-based methods, (5) from maximum likelihood to discriminative approach, e.g., MCE/GPD and MMI, (6) from isolated word to continuous speech recognition, (7) from small vocabulary to large vocabulary recognition, (8) from context-independent units to context-dependent units for recognition, (9) from clean speech to noisy/telephone speech recognition, (10) from single speaker to speaker-independent/adaptive recognition, (11) from monologue to dialogue/conversation recognition, (12) from read speech to spontaneous speech recognition, (13) from recognition to understanding, (14) from single-modality (audio signal only) to multi-modal (audio/visual) speech recognition, (15) from hardware recognizer to software recognizer, and (16) from no commercial application to many practical commercial applications. Most of these advances have taken place in both the fields of speech recognition and speaker recognition. The majority of technological changes have been directed toward the purpose of increasing robustness of recognition, including many other additional important techniques not noted above.

  1. Hands-free human-machine interaction with voice

    NASA Astrophysics Data System (ADS)

    Juang, B. H.

    2004-05-01

    Voice is natural communication interface between a human and a machine. The machine, when placed in today's communication networks, may be configured to provide automation to save substantial operating cost, as demonstrated in AT&T's VRCP (Voice Recognition Call Processing), or to facilitate intelligent services, such as virtual personal assistants, to enhance individual productivity. These intelligent services often need to be accessible anytime, anywhere (e.g., in cars when the user is in a hands-busy-eyes-busy situation or during meetings where constantly talking to a microphone is either undersirable or impossible), and thus call for advanced signal processing and automatic speech recognition techniques which support what we call ``hands-free'' human-machine communication. These techniques entail a broad spectrum of technical ideas, ranging from use of directional microphones and acoustic echo cancellatiion to robust speech recognition. In this talk, we highlight a number of key techniques that were developed for hands-free human-machine communication in the mid-1990s after Bell Labs became a unit of Lucent Technologies. A video clip will be played to demonstrate the accomplishement.

  2. Age-Related Differences in Lexical Access Relate to Speech Recognition in Noise

    PubMed Central

    Carroll, Rebecca; Warzybok, Anna; Kollmeier, Birger; Ruigendijk, Esther

    2016-01-01

    Vocabulary size has been suggested as a useful measure of “verbal abilities” that correlates with speech recognition scores. Knowing more words is linked to better speech recognition. How vocabulary knowledge translates to general speech recognition mechanisms, how these mechanisms relate to offline speech recognition scores, and how they may be modulated by acoustical distortion or age, is less clear. Age-related differences in linguistic measures may predict age-related differences in speech recognition in noise performance. We hypothesized that speech recognition performance can be predicted by the efficiency of lexical access, which refers to the speed with which a given word can be searched and accessed relative to the size of the mental lexicon. We tested speech recognition in a clinical German sentence-in-noise test at two signal-to-noise ratios (SNRs), in 22 younger (18–35 years) and 22 older (60–78 years) listeners with normal hearing. We also assessed receptive vocabulary, lexical access time, verbal working memory, and hearing thresholds as measures of individual differences. Age group, SNR level, vocabulary size, and lexical access time were significant predictors of individual speech recognition scores, but working memory and hearing threshold were not. Interestingly, longer accessing times were correlated with better speech recognition scores. Hierarchical regression models for each subset of age group and SNR showed very similar patterns: the combination of vocabulary size and lexical access time contributed most to speech recognition performance; only for the younger group at the better SNR (yielding about 85% correct speech recognition) did vocabulary size alone predict performance. Our data suggest that successful speech recognition in noise is mainly modulated by the efficiency of lexical access. This suggests that older adults’ poorer performance in the speech recognition task may have arisen from reduced efficiency in lexical access; with an average vocabulary size similar to that of younger adults, they were still slower in lexical access. PMID:27458400

  3. Age-Related Differences in Lexical Access Relate to Speech Recognition in Noise.

    PubMed

    Carroll, Rebecca; Warzybok, Anna; Kollmeier, Birger; Ruigendijk, Esther

    2016-01-01

    Vocabulary size has been suggested as a useful measure of "verbal abilities" that correlates with speech recognition scores. Knowing more words is linked to better speech recognition. How vocabulary knowledge translates to general speech recognition mechanisms, how these mechanisms relate to offline speech recognition scores, and how they may be modulated by acoustical distortion or age, is less clear. Age-related differences in linguistic measures may predict age-related differences in speech recognition in noise performance. We hypothesized that speech recognition performance can be predicted by the efficiency of lexical access, which refers to the speed with which a given word can be searched and accessed relative to the size of the mental lexicon. We tested speech recognition in a clinical German sentence-in-noise test at two signal-to-noise ratios (SNRs), in 22 younger (18-35 years) and 22 older (60-78 years) listeners with normal hearing. We also assessed receptive vocabulary, lexical access time, verbal working memory, and hearing thresholds as measures of individual differences. Age group, SNR level, vocabulary size, and lexical access time were significant predictors of individual speech recognition scores, but working memory and hearing threshold were not. Interestingly, longer accessing times were correlated with better speech recognition scores. Hierarchical regression models for each subset of age group and SNR showed very similar patterns: the combination of vocabulary size and lexical access time contributed most to speech recognition performance; only for the younger group at the better SNR (yielding about 85% correct speech recognition) did vocabulary size alone predict performance. Our data suggest that successful speech recognition in noise is mainly modulated by the efficiency of lexical access. This suggests that older adults' poorer performance in the speech recognition task may have arisen from reduced efficiency in lexical access; with an average vocabulary size similar to that of younger adults, they were still slower in lexical access.

  4. Should visual speech cues (speechreading) be considered when fitting hearing aids?

    NASA Astrophysics Data System (ADS)

    Grant, Ken

    2002-05-01

    When talker and listener are face-to-face, visual speech cues become an important part of the communication environment, and yet, these cues are seldom considered when designing hearing aids. Models of auditory-visual speech recognition highlight the importance of complementary versus redundant speech information for predicting auditory-visual recognition performance. Thus, for hearing aids to work optimally when visual speech cues are present, it is important to know whether the cues provided by amplification and the cues provided by speechreading complement each other. In this talk, data will be reviewed that show nonmonotonicity between auditory-alone speech recognition and auditory-visual speech recognition, suggesting that efforts designed solely to improve auditory-alone recognition may not always result in improved auditory-visual recognition. Data will also be presented showing that one of the most important speech cues for enhancing auditory-visual speech recognition performance, voicing, is often the cue that benefits least from amplification.

  5. SAM: speech-aware applications in medicine to support structured data entry.

    PubMed Central

    Wormek, A. K.; Ingenerf, J.; Orthner, H. F.

    1997-01-01

    In the last two years, improvement in speech recognition technology has directed the medical community's interest to porting and using such innovations in clinical systems. The acceptance of speech recognition systems in clinical domains increases with recognition speed, large medical vocabulary, high accuracy, continuous speech recognition, and speaker independence. Although some commercial speech engines approach these requirements, the greatest benefit can be achieved in adapting a speech recognizer to a specific medical application. The goals of our work are first, to develop a speech-aware core component which is able to establish connections to speech recognition engines of different vendors. This is realized in SAM. Second, with applications based on SAM we want to support the physician in his/her routine clinical care activities. Within the STAMP project (STAndardized Multimedia report generator in Pathology), we extend SAM by combining a structured data entry approach with speech recognition technology. Another speech-aware application in the field of Diabetes care is connected to a terminology server. The server delivers a controlled vocabulary which can be used for speech recognition. PMID:9357730

  6. Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

    NASA Astrophysics Data System (ADS)

    Heracleous, Panikos; Kaino, Tomomi; Saruwatari, Hiroshi; Shikano, Kiyohiro

    2006-12-01

    We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech transform, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved for a 20 k dictation task a[InlineEquation not available: see fulltext.] word accuracy for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone with very promising results.

  7. Speech recognition technology: an outlook for human-to-machine interaction.

    PubMed

    Erdel, T; Crooks, S

    2000-01-01

    Speech recognition, as an enabling technology in healthcare-systems computing, is a topic that has been discussed for quite some time, but is just now coming to fruition. Traditionally, speech-recognition software has been constrained by hardware, but improved processors and increased memory capacities are starting to remove some of these limitations. With these barriers removed, companies that create software for the healthcare setting have the opportunity to write more successful applications. Among the criticisms of speech-recognition applications are the high rates of error and steep training curves. However, even in the face of such negative perceptions, there remains significant opportunities for speech recognition to allow healthcare providers and, more specifically, physicians, to work more efficiently and ultimately spend more time with their patients and less time completing necessary documentation. This article will identify opportunities for inclusion of speech-recognition technology in the healthcare setting and examine major categories of speech-recognition software--continuous speech recognition, command and control, and text-to-speech. We will discuss the advantages and disadvantages of each area, the limitations of the software today, and how future trends might affect them.

  8. Voice-processing technologies--their application in telecommunications.

    PubMed Central

    Wilpon, J G

    1995-01-01

    As the telecommunications industry evolves over the next decade to provide the products and services that people will desire, several key technologies will become commonplace. Two of these, automatic speech recognition and text-to-speech synthesis, will provide users with more freedom on when, where, and how they access information. While these technologies are currently in their infancy, their capabilities are rapidly increasing and their deployment in today's telephone network is expanding. The economic impact of just one application, the automation of operator services, is well over $100 million per year. Yet there still are many technical challenges that must be resolved before these technologies can be deployed ubiquitously in products and services throughout the worldwide telephone network. These challenges include: (i) High level of accuracy. The technology must be perceived by the user as highly accurate, robust, and reliable. (ii) Easy to use. Speech is only one of several possible input/output modalities for conveying information between a human and a machine, much like a computer terminal or Touch-Tone pad on a telephone. It is not the final product. Therefore, speech technologies must be hidden from the user. That is, the burden of using the technology must be on the technology itself. (iii) Quick prototyping and development of new products and services. The technology must support the creation of new products and services based on speech in an efficient and timely fashion. In this paper I present a vision of the voice-processing industry with a focus on the areas with the broadest base of user penetration: speech recognition, text-to-speech synthesis, natural language processing, and speaker recognition technologies. The current and future applications of these technologies in the telecommunications industry will be examined in terms of their strengths, limitations, and the degree to which user needs have been or have yet to be met. Although noteworthy gains have been made in areas with potentially small user bases and in the more mature speech-coding technologies, these subjects are outside the scope of this paper. Images Fig. 1 PMID:7479815

  9. Relationship between listeners' nonnative speech recognition and categorization abilities

    PubMed Central

    Atagi, Eriko; Bent, Tessa

    2015-01-01

    Enhancement of the perceptual encoding of talker characteristics (indexical information) in speech can facilitate listeners' recognition of linguistic content. The present study explored this indexical-linguistic relationship in nonnative speech processing by examining listeners' performance on two tasks: nonnative accent categorization and nonnative speech-in-noise recognition. Results indicated substantial variability across listeners in their performance on both the accent categorization and nonnative speech recognition tasks. Moreover, listeners' accent categorization performance correlated with their nonnative speech-in-noise recognition performance. These results suggest that having more robust indexical representations for nonnative accents may allow listeners to more accurately recognize the linguistic content of nonnative speech. PMID:25618098

  10. Method and apparatus for obtaining complete speech signals for speech recognition applications

    NASA Technical Reports Server (NTRS)

    Abrash, Victor (Inventor); Cesari, Federico (Inventor); Franco, Horacio (Inventor); George, Christopher (Inventor); Zheng, Jing (Inventor)

    2009-01-01

    The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.

  11. Temporal Sensitivity Measured Shortly After Cochlear Implantation Predicts 6-Month Speech Recognition Outcome.

    PubMed

    Erb, Julia; Ludwig, Alexandra Annemarie; Kunke, Dunja; Fuchs, Michael; Obleser, Jonas

    2018-04-24

    Psychoacoustic tests assessed shortly after cochlear implantation are useful predictors of the rehabilitative speech outcome. While largely independent, both spectral and temporal resolution tests are important to provide an accurate prediction of speech recognition. However, rapid tests of temporal sensitivity are currently lacking. Here, we propose a simple amplitude modulation rate discrimination (AMRD) paradigm that is validated by predicting future speech recognition in adult cochlear implant (CI) patients. In 34 newly implanted patients, we used an adaptive AMRD paradigm, where broadband noise was modulated at the speech-relevant rate of ~4 Hz. In a longitudinal study, speech recognition in quiet was assessed using the closed-set Freiburger number test shortly after cochlear implantation (t0) as well as the open-set Freiburger monosyllabic word test 6 months later (t6). Both AMRD thresholds at t0 (r = -0.51) and speech recognition scores at t0 (r = 0.56) predicted speech recognition scores at t6. However, AMRD and speech recognition at t0 were uncorrelated, suggesting that those measures capture partially distinct perceptual abilities. A multiple regression model predicting 6-month speech recognition outcome with deafness duration and speech recognition at t0 improved from adjusted R = 0.30 to adjusted R = 0.44 when AMRD threshold was added as a predictor. These findings identify AMRD thresholds as a reliable, nonredundant predictor above and beyond established speech tests for CI outcome. This AMRD test could potentially be developed into a rapid clinical temporal-resolution test to be integrated into the postoperative test battery to improve the reliability of speech outcome prognosis.

  12. Audibility-based predictions of speech recognition for children and adults with normal hearing.

    PubMed

    McCreery, Ryan W; Stelmachowicz, Patricia G

    2011-12-01

    This study investigated the relationship between audibility and predictions of speech recognition for children and adults with normal hearing. The Speech Intelligibility Index (SII) is used to quantify the audibility of speech signals and can be applied to transfer functions to predict speech recognition scores. Although the SII is used clinically with children, relatively few studies have evaluated SII predictions of children's speech recognition directly. Children have required more audibility than adults to reach maximum levels of speech understanding in previous studies. Furthermore, children may require greater bandwidth than adults for optimal speech understanding, which could influence frequency-importance functions used to calculate the SII. Speech recognition was measured for 116 children and 19 adults with normal hearing. Stimulus bandwidth and background noise level were varied systematically in order to evaluate speech recognition as predicted by the SII and derive frequency-importance functions for children and adults. Results suggested that children required greater audibility to reach the same level of speech understanding as adults. However, differences in performance between adults and children did not vary across frequency bands. © 2011 Acoustical Society of America

  13. The Suitability of Cloud-Based Speech Recognition Engines for Language Learning

    ERIC Educational Resources Information Center

    Daniels, Paul; Iwago, Koji

    2017-01-01

    As online automatic speech recognition (ASR) engines become more accurate and more widely implemented with call software, it becomes important to evaluate the effectiveness and the accuracy of these recognition engines using authentic speech samples. This study investigates two of the most prominent cloud-based speech recognition engines--Apple's…

  14. Individual differences in language and working memory affect children's speech recognition in noise.

    PubMed

    McCreery, Ryan W; Spratford, Meredith; Kirby, Benjamin; Brennan, Marc

    2017-05-01

    We examined how cognitive and linguistic skills affect speech recognition in noise for children with normal hearing. Children with better working memory and language abilities were expected to have better speech recognition in noise than peers with poorer skills in these domains. As part of a prospective, cross-sectional study, children with normal hearing completed speech recognition in noise for three types of stimuli: (1) monosyllabic words, (2) syntactically correct but semantically anomalous sentences and (3) semantically and syntactically anomalous word sequences. Measures of vocabulary, syntax and working memory were used to predict individual differences in speech recognition in noise. Ninety-six children with normal hearing, who were between 5 and 12 years of age. Higher working memory was associated with better speech recognition in noise for all three stimulus types. Higher vocabulary abilities were associated with better recognition in noise for sentences and word sequences, but not for words. Working memory and language both influence children's speech recognition in noise, but the relationships vary across types of stimuli. These findings suggest that clinical assessment of speech recognition is likely to reflect underlying cognitive and linguistic abilities, in addition to a child's auditory skills, consistent with the Ease of Language Understanding model.

  15. Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar

    PubMed Central

    Shin, Young Hoon; Seo, Jiwon

    2016-01-01

    People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker’s vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing. PMID:27801867

  16. Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar.

    PubMed

    Shin, Young Hoon; Seo, Jiwon

    2016-10-29

    People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker's vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing.

  17. Effects and modeling of phonetic and acoustic confusions in accented speech.

    PubMed

    Fung, Pascale; Liu, Yi

    2005-11-01

    Accented speech recognition is more challenging than standard speech recognition due to the effects of phonetic and acoustic confusions. Phonetic confusion in accented speech occurs when an expected phone is pronounced as a different one, which leads to erroneous recognition. Acoustic confusion occurs when the pronounced phone is found to lie acoustically between two baseform models and can be equally recognized as either one. We propose that it is necessary to analyze and model these confusions separately in order to improve accented speech recognition without degrading standard speech recognition. Since low phonetic confusion units in accented speech do not give rise to automatic speech recognition errors, we focus on analyzing and reducing phonetic and acoustic confusability under high phonetic confusion conditions. We propose using likelihood ratio test to measure phonetic confusion, and asymmetric acoustic distance to measure acoustic confusion. Only accent-specific phonetic units with low acoustic confusion are used in an augmented pronunciation dictionary, while phonetic units with high acoustic confusion are reconstructed using decision tree merging. Experimental results show that our approach is effective and superior to methods modeling phonetic confusion or acoustic confusion alone in accented speech, with a significant 5.7% absolute WER reduction, without degrading standard speech recognition.

  18. pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis.

    PubMed

    Giannakopoulos, Theodoros

    2015-01-01

    Audio information plays a rather important role in the increasing digital content that is available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automations and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g. audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures including: feature extraction, classification of audio signals, supervised and unsupervised segmentation and content visualization. pyAudioAnalysis is licensed under the Apache License and is available at GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the wide range of the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has been already used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation and health applications (e.g. monitoring eating habits). The feedback provided from all these particular audio applications has led to practical enhancement of the library.

  19. pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis

    PubMed Central

    Giannakopoulos, Theodoros

    2015-01-01

    Audio information plays a rather important role in the increasing digital content that is available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automations and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g. audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures including: feature extraction, classification of audio signals, supervised and unsupervised segmentation and content visualization. pyAudioAnalysis is licensed under the Apache License and is available at GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the wide range of the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has been already used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation and health applications (e.g. monitoring eating habits). The feedback provided from all these particular audio applications has led to practical enhancement of the library. PMID:26656189

  20. Finding Acoustic Regularities in Speech: Applications to Phonetic Recognition

    DTIC Science & Technology

    1988-12-01

    University Press, Indiana, I 1977. [12] N. Chomsky and M. Halle, The Sound Patterns of English, Harper and Row, New York, 1968. l 129 I BIBLIOGRAPHY [13] Y.L...segments are related to the phonemes by a grammar which is determined using. automated procedures operating on a set of training data. Thus important...segments which are described completely in acoustic terms. Next, these acous- tic segments are related to the phonemes by a grammar which is determined

  1. An Innovative Speech-Based User Interface for Smarthomes and IoT Solutions to Help People with Speech and Motor Disabilities.

    PubMed

    Malavasi, Massimiliano; Turri, Enrico; Atria, Jose Joaquin; Christensen, Heidi; Marxer, Ricard; Desideri, Lorenzo; Coy, Andre; Tamburini, Fabio; Green, Phil

    2017-01-01

    A better use of the increasing functional capabilities of home automation systems and Internet of Things (IoT) devices to support the needs of users with disability, is the subject of a research project currently conducted by Area Ausili (Assistive Technology Area), a department of Polo Tecnologico Regionale Corte Roncati of the Local Health Trust of Bologna (Italy), in collaboration with AIAS Ausilioteca Assistive Technology (AT) Team. The main aim of the project is to develop experimental low cost systems for environmental control through simplified and accessible user interfaces. Many of the activities are focused on automatic speech recognition and are developed in the framework of the CloudCAST project. In this paper we report on the first technical achievements of the project and discuss future possible developments and applications within and outside CloudCAST.

  2. Difficulties in Automatic Speech Recognition of Dysarthric Speakers and Implications for Speech-Based Applications Used by the Elderly: A Literature Review

    ERIC Educational Resources Information Center

    Young, Victoria; Mihailidis, Alex

    2010-01-01

    Despite their growing presence in home computer applications and various telephony services, commercial automatic speech recognition technologies are still not easily employed by everyone; especially individuals with speech disorders. In addition, relatively little research has been conducted on automatic speech recognition performance with older…

  3. The Effect of Dynamic Pitch on Speech Recognition in Temporally Modulated Noise.

    PubMed

    Shen, Jing; Souza, Pamela E

    2017-09-18

    This study investigated the effect of dynamic pitch in target speech on older and younger listeners' speech recognition in temporally modulated noise. First, we examined whether the benefit from dynamic-pitch cues depends on the temporal modulation of noise. Second, we tested whether older listeners can benefit from dynamic-pitch cues for speech recognition in noise. Last, we explored the individual factors that predict the amount of dynamic-pitch benefit for speech recognition in noise. Younger listeners with normal hearing and older listeners with varying levels of hearing sensitivity participated in the study, in which speech reception thresholds were measured with sentences in nonspeech noise. The younger listeners benefited more from dynamic pitch for speech recognition in temporally modulated noise than unmodulated noise. Older listeners were able to benefit from the dynamic-pitch cues but received less benefit from noise modulation than the younger listeners. For those older listeners with hearing loss, the amount of hearing loss strongly predicted the dynamic-pitch benefit for speech recognition in noise. Dynamic-pitch cues aid speech recognition in noise, particularly when noise has temporal modulation. Hearing loss negatively affects the dynamic-pitch benefit to older listeners with significant hearing loss.

  4. The Effect of Dynamic Pitch on Speech Recognition in Temporally Modulated Noise

    PubMed Central

    Souza, Pamela E.

    2017-01-01

    Purpose This study investigated the effect of dynamic pitch in target speech on older and younger listeners' speech recognition in temporally modulated noise. First, we examined whether the benefit from dynamic-pitch cues depends on the temporal modulation of noise. Second, we tested whether older listeners can benefit from dynamic-pitch cues for speech recognition in noise. Last, we explored the individual factors that predict the amount of dynamic-pitch benefit for speech recognition in noise. Method Younger listeners with normal hearing and older listeners with varying levels of hearing sensitivity participated in the study, in which speech reception thresholds were measured with sentences in nonspeech noise. Results The younger listeners benefited more from dynamic pitch for speech recognition in temporally modulated noise than unmodulated noise. Older listeners were able to benefit from the dynamic-pitch cues but received less benefit from noise modulation than the younger listeners. For those older listeners with hearing loss, the amount of hearing loss strongly predicted the dynamic-pitch benefit for speech recognition in noise. Conclusions Dynamic-pitch cues aid speech recognition in noise, particularly when noise has temporal modulation. Hearing loss negatively affects the dynamic-pitch benefit to older listeners with significant hearing loss. PMID:28800370

  5. [Creating language model of the forensic medicine domain for developing a autopsy recording system by automatic speech recognition].

    PubMed

    Niijima, H; Ito, N; Ogino, S; Takatori, T; Iwase, H; Kobayashi, M

    2000-11-01

    For the purpose of practical use of speech recognition technology for recording of forensic autopsy, a language model of the speech recording system, specialized for the forensic autopsy, was developed. The language model for the forensic autopsy by applying 3-gram model was created, and an acoustic model for Japanese speech recognition by Hidden Markov Model in addition to the above were utilized to customize the speech recognition engine for forensic autopsy. A forensic vocabulary set of over 10,000 words was compiled and some 300,000 sentence patterns were made to create the forensic language model, then properly mixing with a general language model to attain high exactitude. When tried by dictating autopsy findings, this speech recognition system was proved to be about 95% of recognition rate that seems to have reached to the practical usability in view of speech recognition software, though there remains rooms for improving its hardware and application-layer software.

  6. Task-dependent modulation of the visual sensory thalamus assists visual-speech recognition.

    PubMed

    Díaz, Begoña; Blank, Helen; von Kriegstein, Katharina

    2018-05-14

    The cerebral cortex modulates early sensory processing via feed-back connections to sensory pathway nuclei. The functions of this top-down modulation for human behavior are poorly understood. Here, we show that top-down modulation of the visual sensory thalamus (the lateral geniculate body, LGN) is involved in visual-speech recognition. In two independent functional magnetic resonance imaging (fMRI) studies, LGN response increased when participants processed fast-varying features of articulatory movements required for visual-speech recognition, as compared to temporally more stable features required for face identification with the same stimulus material. The LGN response during the visual-speech task correlated positively with the visual-speech recognition scores across participants. In addition, the task-dependent modulation was present for speech movements and did not occur for control conditions involving non-speech biological movements. In face-to-face communication, visual speech recognition is used to enhance or even enable understanding what is said. Speech recognition is commonly explained in frameworks focusing on cerebral cortex areas. Our findings suggest that task-dependent modulation at subcortical sensory stages has an important role for communication: Together with similar findings in the auditory modality the findings imply that task-dependent modulation of the sensory thalami is a general mechanism to optimize speech recognition. Copyright © 2018. Published by Elsevier Inc.

  7. Syntactic error modeling and scoring normalization in speech recognition: Error modeling and scoring normalization in the speech recognition task for adult literacy training

    NASA Technical Reports Server (NTRS)

    Olorenshaw, Lex; Trawick, David

    1991-01-01

    The purpose was to develop a speech recognition system to be able to detect speech which is pronounced incorrectly, given that the text of the spoken speech is known to the recognizer. Better mechanisms are provided for using speech recognition in a literacy tutor application. Using a combination of scoring normalization techniques and cheater-mode decoding, a reasonable acceptance/rejection threshold was provided. In continuous speech, the system was tested to be able to provide above 80 pct. correct acceptance of words, while correctly rejecting over 80 pct. of incorrectly pronounced words.

  8. Bridging automatic speech recognition and psycholinguistics: Extending Shortlist to an end-to-end model of human speech recognition (L)

    NASA Astrophysics Data System (ADS)

    Scharenborg, Odette; ten Bosch, Louis; Boves, Lou; Norris, Dennis

    2003-12-01

    This letter evaluates potential benefits of combining human speech recognition (HSR) and automatic speech recognition by building a joint model of an automatic phone recognizer (APR) and a computational model of HSR, viz., Shortlist [Norris, Cognition 52, 189-234 (1994)]. Experiments based on ``real-life'' speech highlight critical limitations posed by some of the simplifying assumptions made in models of human speech recognition. These limitations could be overcome by avoiding hard phone decisions at the output side of the APR, and by using a match between the input and the internal lexicon that flexibly copes with deviations from canonical phonemic representations.

  9. Increase in Speech Recognition due to Linguistic Mismatch Between Target and Masker Speech: Monolingual and Simultaneous Bilingual Performance

    PubMed Central

    Calandruccio, Lauren; Zhou, Haibo

    2014-01-01

    Purpose To examine whether improved speech recognition during linguistically mismatched target–masker experiments is due to linguistic unfamiliarity of the masker speech or linguistic dissimilarity between the target and masker speech. Method Monolingual English speakers (n = 20) and English–Greek simultaneous bilinguals (n = 20) listened to English sentences in the presence of competing English and Greek speech. Data were analyzed using mixed-effects regression models to determine differences in English recogition performance between the 2 groups and 2 masker conditions. Results Results indicated that English sentence recognition for monolinguals and simultaneous English–Greek bilinguals improved when the masker speech changed from competing English to competing Greek speech. Conclusion The improvement in speech recognition that has been observed for linguistically mismatched target–masker experiments cannot be simply explained by the masker language being linguistically unknown or unfamiliar to the listeners. Listeners can improve their speech recognition in linguistically mismatched target–masker experiments even when the listener is able to obtain meaningful linguistic information from the masker speech. PMID:24167230

  10. The Effect of Lexical Content on Dichotic Speech Recognition in Older Adults.

    PubMed

    Findlen, Ursula M; Roup, Christina M

    2016-01-01

    Age-related auditory processing deficits have been shown to negatively affect speech recognition for older adult listeners. In contrast, older adults gain benefit from their ability to make use of semantic and lexical content of the speech signal (i.e., top-down processing), particularly in complex listening situations. Assessment of auditory processing abilities among aging adults should take into consideration semantic and lexical content of the speech signal. The purpose of this study was to examine the effects of lexical and attentional factors on dichotic speech recognition performance characteristics for older adult listeners. A repeated measures design was used to examine differences in dichotic word recognition as a function of lexical and attentional factors. Thirty-five older adults (61-85 yr) with sensorineural hearing loss participated in this study. Dichotic speech recognition was evaluated using consonant-vowel-consonant (CVC) word and nonsense CVC syllable stimuli administered in the free recall, directed recall right, and directed recall left response conditions. Dichotic speech recognition performance for nonsense CVC syllables was significantly poorer than performance for CVC words. Dichotic recognition performance varied across response condition for both stimulus types, which is consistent with previous studies on dichotic speech recognition. Inspection of individual results revealed that five listeners demonstrated an auditory-based left ear deficit for one or both stimulus types. Lexical content of stimulus materials affects performance characteristics for dichotic speech recognition tasks in the older adult population. The use of nonsense CVC syllable material may provide a way to assess dichotic speech recognition performance while potentially lessening the effects of lexical content on performance (i.e., measuring bottom-up auditory function both with and without top-down processing). American Academy of Audiology.

  11. Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hogden, J.

    The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation maymore » decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.« less

  12. An articulatorily constrained, maximum entropy approach to speech recognition and speech coding

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hogden, J.

    Hidden Markov models (HMM`s) are among the most popular tools for performing computer speech recognition. One of the primary reasons that HMM`s typically outperform other speech recognition techniques is that the parameters used for recognition are determined by the data, not by preconceived notions of what the parameters should be. This makes HMM`s better able to deal with intra- and inter-speaker variability despite the limited knowledge of how speech signals vary and despite the often limited ability to correctly formulate rules describing variability and invariance in speech. In fact, it is often the case that when HMM parameter values aremore » constrained using the limited knowledge of speech, recognition performance decreases. However, the structure of an HMM has little in common with the mechanisms underlying speech production. Here, the author argues that by using probabilistic models that more accurately embody the process of speech production, he can create models that have all the advantages of HMM`s, but that should more accurately capture the statistical properties of real speech samples--presumably leading to more accurate speech recognition. The model he will discuss uses the fact that speech articulators move smoothly and continuously. Before discussing how to use articulatory constraints, he will give a brief description of HMM`s. This will allow him to highlight the similarities and differences between HMM`s and the proposed technique.« less

  13. Neural network wavelet technology: A frontier of automation

    NASA Technical Reports Server (NTRS)

    Szu, Harold

    1994-01-01

    Neural networks are an outgrowth of interdisciplinary studies concerning the brain. These studies are guiding the field of Artificial Intelligence towards the, so-called, 6th Generation Computer. Enormous amounts of resources have been poured into R/D. Wavelet Transforms (WT) have replaced Fourier Transforms (FT) in Wideband Transient (WT) cases since the discovery of WT in 1985. The list of successful applications includes the following: earthquake prediction; radar identification; speech recognition; stock market forecasting; FBI finger print image compression; and telecommunication ISDN-data compression.

  14. Automated Speech Rate Measurement in Dysarthria

    ERIC Educational Resources Information Center

    Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

    2015-01-01

    Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…

  15. Is Listening in Noise Worth It? The Neurobiology of Speech Recognition in Challenging Listening Conditions.

    PubMed

    Eckert, Mark A; Teubner-Rhodes, Susan; Vaden, Kenneth I

    2016-01-01

    This review examines findings from functional neuroimaging studies of speech recognition in noise to provide a neural systems level explanation for the effort and fatigue that can be experienced during speech recognition in challenging listening conditions. Neuroimaging studies of speech recognition consistently demonstrate that challenging listening conditions engage neural systems that are used to monitor and optimize performance across a wide range of tasks. These systems appear to improve speech recognition in younger and older adults, but sustained engagement of these systems also appears to produce an experience of effort and fatigue that may affect the value of communication. When considered in the broader context of the neuroimaging and decision making literature, the speech recognition findings from functional imaging studies indicate that the expected value, or expected level of speech recognition given the difficulty of listening conditions, should be considered when measuring effort and fatigue. The authors propose that the behavioral economics or neuroeconomics of listening can provide a conceptual and experimental framework for understanding effort and fatigue that may have clinical significance.

  16. Is Listening in Noise Worth It? The Neurobiology of Speech Recognition in Challenging Listening Conditions

    PubMed Central

    Eckert, Mark A.; Teubner-Rhodes, Susan; Vaden, Kenneth I.

    2016-01-01

    This review examines findings from functional neuroimaging studies of speech recognition in noise to provide a neural systems level explanation for the effort and fatigue that can be experienced during speech recognition in challenging listening conditions. Neuroimaging studies of speech recognition consistently demonstrate that challenging listening conditions engage neural systems that are used to monitor and optimize performance across a wide range of tasks. These systems appear to improve speech recognition in younger and older adults, but sustained engagement of these systems also appears to produce an experience of effort and fatigue that may affect the value of communication. When considered in the broader context of the neuroimaging and decision making literature, the speech recognition findings from functional imaging studies indicate that the expected value, or expected level of speech recognition given the difficulty of listening conditions, should be considered when measuring effort and fatigue. We propose that the behavioral economics and/or neuroeconomics of listening can provide a conceptual and experimental framework for understanding effort and fatigue that may have clinical significance. PMID:27355759

  17. Speech Recognition and Parent Ratings From Auditory Development Questionnaires in Children Who Are Hard of Hearing.

    PubMed

    McCreery, Ryan W; Walker, Elizabeth A; Spratford, Meredith; Oleson, Jacob; Bentler, Ruth; Holte, Lenore; Roush, Patricia

    2015-01-01

    Progress has been made in recent years in the provision of amplification and early intervention for children who are hard of hearing. However, children who use hearing aids (HAs) may have inconsistent access to their auditory environment due to limitations in speech audibility through their HAs or limited HA use. The effects of variability in children's auditory experience on parent-reported auditory skills questionnaires and on speech recognition in quiet and in noise were examined for a large group of children who were followed as part of the Outcomes of Children with Hearing Loss study. Parent ratings on auditory development questionnaires and children's speech recognition were assessed for 306 children who are hard of hearing. Children ranged in age from 12 months to 9 years. Three questionnaires involving parent ratings of auditory skill development and behavior were used, including the LittlEARS Auditory Questionnaire, Parents Evaluation of Oral/Aural Performance in Children rating scale, and an adaptation of the Speech, Spatial, and Qualities of Hearing scale. Speech recognition in quiet was assessed using the Open- and Closed-Set Test, Early Speech Perception test, Lexical Neighborhood Test, and Phonetically Balanced Kindergarten word lists. Speech recognition in noise was assessed using the Computer-Assisted Speech Perception Assessment. Children who are hard of hearing were compared with peers with normal hearing matched for age, maternal educational level, and nonverbal intelligence. The effects of aided audibility, HA use, and language ability on parent responses to auditory development questionnaires and on children's speech recognition were also examined. Children who are hard of hearing had poorer performance than peers with normal hearing on parent ratings of auditory skills and had poorer speech recognition. Significant individual variability among children who are hard of hearing was observed. Children with greater aided audibility through their HAs, more hours of HA use, and better language abilities generally had higher parent ratings of auditory skills and better speech-recognition abilities in quiet and in noise than peers with less audibility, more limited HA use, or poorer language abilities. In addition to the auditory and language factors that were predictive for speech recognition in quiet, phonological working memory was also a positive predictor for word recognition abilities in noise. Children who are hard of hearing continue to experience delays in auditory skill development and speech-recognition abilities compared with peers with normal hearing. However, significant improvements in these domains have occurred in comparison to similar data reported before the adoption of universal newborn hearing screening and early intervention programs for children who are hard of hearing. Increasing the audibility of speech has a direct positive effect on auditory skill development and speech-recognition abilities and also may enhance these skills by improving language abilities in children who are hard of hearing. Greater number of hours of HA use also had a significant positive impact on parent ratings of auditory skills and children's speech recognition.

  18. Real-time speech gisting for ATC applications

    NASA Astrophysics Data System (ADS)

    Dunkelberger, Kirk A.

    1995-06-01

    Command and control within the ATC environment remains primarily voice-based. Hence, automatic real time, speaker independent, continuous speech recognition (CSR) has many obvious applications and implied benefits to the ATC community: automated target tagging, aircraft compliance monitoring, controller training, automatic alarm disabling, display management, and many others. However, while current state-of-the-art CSR systems provide upwards of 98% word accuracy in laboratory environments, recent low-intrusion experiments in the ATCT environments demonstrated less than 70% word accuracy in spite of significant investments in recognizer tuning. Acoustic channel irregularities and controller/pilot grammar verities impact current CSR algorithms at their weakest points. It will be shown herein, however, that real time context- and environment-sensitive gisting can provide key command phrase recognition rates of greater than 95% using the same low-intrusion approach. The combination of real time inexact syntactic pattern recognition techniques and a tight integration of CSR, gisting, and ATC database accessor system components is the key to these high phase recognition rates. A system concept for real time gisting in the ATC context is presented herein. After establishing an application context, discussion presents a minimal CSR technology context then focuses on the gisting mechanism, desirable interfaces into the ATCT database environment, and data and control flow within the prototype system. Results of recent tests for a subset of the functionality are presented together with suggestions for further research.

  19. Automated Depression Analysis Using Convolutional Neural Networks from Speech.

    PubMed

    He, Lang; Cao, Cui

    2018-05-28

    To help clinicians to efficiently diagnose the severity of a person's depression, the affective computing community and the artificial intelligence field have shown a growing interest in designing automated systems. The speech features have useful information for the diagnosis of depression. However, manually designing and domain knowledge are still important for the selection of the feature, which makes the process labor consuming and subjective. In recent years, deep-learned features based on neural networks have shown superior performance to hand-crafted features in various areas. In this paper, to overcome the difficulties mentioned above, we propose a combination of hand-crafted and deep-learned features which can effectively measure the severity of depression from speech. In the proposed method, Deep Convolutional Neural Networks (DCNN) are firstly built to learn deep-learned features from spectrograms and raw speech waveforms. Then we manually extract the state-of-the-art texture descriptors named median robust extended local binary patterns (MRELBP) from spectrograms. To capture the complementary information within the hand-crafted features and deep-learned features, we propose joint fine-tuning layers to combine the raw and spectrogram DCNN to boost the depression recognition performance. Moreover, to address the problems with small samples, a data augmentation method was proposed. Experiments conducted on AVEC2013 and AVEC2014 depression databases show that our approach is robust and effective for the diagnosis of depression when compared to state-of-the-art audio-based methods. Copyright © 2018. Published by Elsevier Inc.

  20. Automated Speech Rate Measurement in Dysarthria.

    PubMed

    Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

    2015-06-01

    In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. The new algorithm was trained and tested using Dutch speech samples of 36 speakers with no history of speech impairment and 40 speakers with mild to moderate dysarthria. We tested the algorithm under various conditions: according to speech task type (sentence reading, passage reading, and storytelling) and algorithm optimization method (speaker group optimization and individual speaker optimization). Correlations between automated and human SR determination were calculated for each condition. High correlations between automated and human SR determination were found in the various testing conditions. The new algorithm measures SR in a sufficiently reliable manner. It is currently being integrated in a clinical software tool for assessing and managing prosody in dysarthric speech. Further research is needed to fine-tune the algorithm to severely dysarthric speech, to make the algorithm less sensitive to background noise, and to evaluate how the algorithm deals with syllabic consonants.

  1. Speech Recognition as a Transcription Aid: A Randomized Comparison With Standard Transcription

    PubMed Central

    Mohr, David N.; Turner, David W.; Pond, Gregory R.; Kamath, Joseph S.; De Vos, Cathy B.; Carpenter, Paul C.

    2003-01-01

    Objective. Speech recognition promises to reduce information entry costs for clinical information systems. It is most likely to be accepted across an organization if physicians can dictate without concerning themselves with real-time recognition and editing; assistants can then edit and process the computer-generated document. Our objective was to evaluate the use of speech-recognition technology in a randomized controlled trial using our institutional infrastructure. Design. Clinical note dictations from physicians in two specialty divisions were randomized to either a standard transcription process or a speech-recognition process. Secretaries and transcriptionists also were assigned randomly to each of these processes. Measurements. The duration of each dictation was measured. The amount of time spent processing a dictation to yield a finished document also was measured. Secretarial and transcriptionist productivity, defined as hours of secretary work per minute of dictation processed, was determined for speech recognition and standard transcription. Results. Secretaries in the endocrinology division were 87.3% (confidence interval, 83.3%, 92.3%) as productive with the speech-recognition technology as implemented in this study as they were using standard transcription. Psychiatry transcriptionists and secretaries were similarly less productive. Author, secretary, and type of clinical note were significant (p < 0.05) predictors of productivity. Conclusion. When implemented in an organization with an existing document-processing infrastructure (which included training and interfaces of the speech-recognition editor with the existing document entry application), speech recognition did not improve the productivity of secretaries or transcriptionists. PMID:12509359

  2. An Automated and Programed Laboratory for Instruction in the Areas of Speech and Communication. Final Report.

    ERIC Educational Resources Information Center

    Coleman, Thomas; Langberg, George

    An experimental public school speech therapy program is described, which offers automated, programed instruction in sound production and auditory training. The experiment includes self-teaching methods, as well as utilization of paraprofessional personnel under the supervision of a qualified speech therapist. Although the automated program was…

  3. Speech emotion recognition methods: A literature review

    NASA Astrophysics Data System (ADS)

    Basharirad, Babak; Moradhaseli, Mohammadreza

    2017-10-01

    Recently, attention of the emotional speech signals research has been boosted in human machine interfaces due to availability of high computation capability. There are many systems proposed in the literature to identify the emotional state through speech. Selection of suitable feature sets, design of a proper classifications methods and prepare an appropriate dataset are the main key issues of speech emotion recognition systems. This paper critically analyzed the current available approaches of speech emotion recognition methods based on the three evaluating parameters (feature set, classification of features, accurately usage). In addition, this paper also evaluates the performance and limitations of available methods. Furthermore, it highlights the current promising direction for improvement of speech emotion recognition systems.

  4. Emotionally conditioning the target-speech voice enhances recognition of the target speech under "cocktail-party" listening conditions.

    PubMed

    Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang

    2018-05-01

    Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners for enhancing target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, (skin conductance) electrodermal responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase of listening efforts when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.

  5. Ongoing slow oscillatory phase modulates speech intelligibility in cooperation with motor cortical activity.

    PubMed

    Onojima, Takayuki; Kitajo, Keiichi; Mizuhara, Hiroaki

    2017-01-01

    Neural oscillation is attracting attention as an underlying mechanism for speech recognition. Speech intelligibility is enhanced by the synchronization of speech rhythms and slow neural oscillation, which is typically observed as human scalp electroencephalography (EEG). In addition to the effect of neural oscillation, it has been proposed that speech recognition is enhanced by the identification of a speaker's motor signals, which are used for speech production. To verify the relationship between the effect of neural oscillation and motor cortical activity, we measured scalp EEG, and simultaneous EEG and functional magnetic resonance imaging (fMRI) during a speech recognition task in which participants were required to recognize spoken words embedded in noise sound. We proposed an index to quantitatively evaluate the EEG phase effect on behavioral performance. The results showed that the delta and theta EEG phase before speech inputs modulated the participant's response time when conducting speech recognition tasks. The simultaneous EEG-fMRI experiment showed that slow EEG activity was correlated with motor cortical activity. These results suggested that the effect of the slow oscillatory phase was associated with the activity of the motor cortex during speech recognition.

  6. Address entry while driving: speech recognition versus a touch-screen keyboard.

    PubMed

    Tsimhoni, Omer; Smith, Daniel; Green, Paul

    2004-01-01

    A driving simulator experiment was conducted to determine the effects of entering addresses into a navigation system during driving. Participants drove on roads of varying visual demand while entering addresses. Three address entry methods were explored: word-based speech recognition, character-based speech recognition, and typing on a touch-screen keyboard. For each method, vehicle control and task measures, glance timing, and subjective ratings were examined. During driving, word-based speech recognition yielded the shortest total task time (15.3 s), followed by character-based speech recognition (41.0 s) and touch-screen keyboard (86.0 s). The standard deviation of lateral position when performing keyboard entry (0.21 m) was 60% higher than that for all other address entry methods (0.13 m). Degradation of vehicle control associated with address entry using a touch screen suggests that the use of speech recognition is favorable. Speech recognition systems with visual feedback, however, even with excellent accuracy, are not without performance consequences. Applications of this research include the design of in-vehicle navigation systems as well as other systems requiring significant driver input, such as E-mail, the Internet, and text messaging.

  7. Speech Processing and Recognition (SPaRe)

    DTIC Science & Technology

    2011-01-01

    results in the areas of automatic speech recognition (ASR), speech processing, machine translation (MT), natural language processing ( NLP ), and...Processing ( NLP ), Information Retrieval (IR) 16. SECURITY CLASSIFICATION OF: UNCLASSIFED 17. LIMITATION OF ABSTRACT 18. NUMBER OF PAGES 19a. NAME...Figure 9, the IOC was only expected to provide document submission and search; automatic speech recognition (ASR) for English, Spanish, Arabic , and

  8. Using Automatic Speech Recognition to Dictate Mathematical Expressions: The Development of the "TalkMaths" Application at Kingston University

    ERIC Educational Resources Information Center

    Wigmore, Angela; Hunter, Gordon; Pflugel, Eckhard; Denholm-Price, James; Binelli, Vincent

    2009-01-01

    Speech technology--especially automatic speech recognition--has now advanced to a level where it can be of great benefit both to able-bodied people and those with various disabilities. In this paper we describe an application "TalkMaths" which, using the output from a commonly-used conventional automatic speech recognition system,…

  9. Performing speech recognition research with hypercard

    NASA Technical Reports Server (NTRS)

    Shepherd, Chip

    1993-01-01

    The purpose of this paper is to describe a HyperCard-based system for performing speech recognition research and to instruct Human Factors professionals on how to use the system to obtain detailed data about the user interface of a prototype speech recognition application.

  10. Speech recognition and parent-ratings from auditory development questionnaires in children who are hard of hearing

    PubMed Central

    McCreery, Ryan W.; Walker, Elizabeth A.; Spratford, Meredith; Oleson, Jacob; Bentler, Ruth; Holte, Lenore; Roush, Patricia

    2015-01-01

    Objectives Progress has been made in recent years in the provision of amplification and early intervention for children who are hard of hearing. However, children who use hearing aids (HA) may have inconsistent access to their auditory environment due to limitations in speech audibility through their HAs or limited HA use. The effects of variability in children’s auditory experience on parent-report auditory skills questionnaires and on speech recognition in quiet and in noise were examined for a large group of children who were followed as part of the Outcomes of Children with Hearing Loss study. Design Parent ratings on auditory development questionnaires and children’s speech recognition were assessed for 306 children who are hard of hearing. Children ranged in age from 12 months to 9 years of age. Three questionnaires involving parent ratings of auditory skill development and behavior were used, including the LittlEARS Auditory Questionnaire, Parents Evaluation of Oral/Aural Performance in Children Rating Scale, and an adaptation of the Speech, Spatial and Qualities of Hearing scale. Speech recognition in quiet was assessed using the Open and Closed set task, Early Speech Perception Test, Lexical Neighborhood Test, and Phonetically-balanced Kindergarten word lists. Speech recognition in noise was assessed using the Computer-Assisted Speech Perception Assessment. Children who are hard of hearing were compared to peers with normal hearing matched for age, maternal educational level and nonverbal intelligence. The effects of aided audibility, HA use and language ability on parent responses to auditory development questionnaires and on children’s speech recognition were also examined. Results Children who are hard of hearing had poorer performance than peers with normal hearing on parent ratings of auditory skills and had poorer speech recognition. Significant individual variability among children who are hard of hearing was observed. Children with greater aided audibility through their HAs, more hours of HA use and better language abilities generally had higher parent ratings of auditory skills and better speech recognition abilities in quiet and in noise than peers with less audibility, more limited HA use or poorer language abilities. In addition to the auditory and language factors that were predictive for speech recognition in quiet, phonological working memory was also a positive predictor for word recognition abilities in noise. Conclusions Children who are hard of hearing continue to experience delays in auditory skill development and speech recognition abilities compared to peers with normal hearing. However, significant improvements in these domains have occurred in comparison to similar data reported prior to the adoption of universal newborn hearing screening and early intervention programs for children who are hard of hearing. Increasing the audibility of speech has a direct positive effect on auditory skill development and speech recognition abilities, and may also enhance these skills by improving language abilities in children who are hard of hearing. Greater number of hours of HA use also had a significant positive impact on parent ratings of auditory skills and children’s speech recognition. PMID:26731160

  11. Perceptual learning for speech in noise after application of binary time-frequency masks

    PubMed Central

    Ahmadi, Mahnaz; Gross, Vauna L.; Sinex, Donal G.

    2013-01-01

    Ideal time-frequency (TF) masks can reject noise and improve the recognition of speech-noise mixtures. An ideal TF mask is constructed with prior knowledge of the target speech signal. The intelligibility of a processed speech-noise mixture depends upon the threshold criterion used to define the TF mask. The study reported here assessed the effect of training on the recognition of speech in noise after processing by ideal TF masks that did not restore perfect speech intelligibility. Two groups of listeners with normal hearing listened to speech-noise mixtures processed by TF masks calculated with different threshold criteria. For each group, a threshold criterion that initially produced word recognition scores between 0.56–0.69 was chosen for training. Listeners practiced with one set of TF-masked sentences until their word recognition performance approached asymptote. Perceptual learning was quantified by comparing word-recognition scores in the first and last training sessions. Word recognition scores improved with practice for all listeners with the greatest improvement observed for the same materials used in training. PMID:23464038

  12. i-SAIRAS '90; Proceedings of the International Symposium on Artificial Intelligence, Robotics and Automation in Space, Kobe, Japan, Nov. 18-20, 1990

    NASA Technical Reports Server (NTRS)

    1990-01-01

    The present conference on artificial intelligence (AI), robotics, and automation in space encompasses robot systems, lunar and planetary robots, advanced processing, expert systems, knowledge bases, issues of operation and management, manipulator control, and on-orbit service. Specific issues addressed include fundamental research in AI at NASA, the FTS dexterous telerobot, a target-capture experiment by a free-flying robot, the NASA Planetary Rover Program, the Katydid system for compiling KEE applications to Ada, and speech recognition for robots. Also addressed are a knowledge base for real-time diagnosis, a pilot-in-the-loop simulation of an orbital docking maneuver, intelligent perturbation algorithms for space scheduling optimization, a fuzzy control method for a space manipulator system, hyperredundant manipulator applications, robotic servicing of EOS instruments, and a summary of astronaut inputs on automation and robotics for the Space Station Freedom.

  13. The effects of reverberant self- and overlap-masking on speech recognition in cochlear implant listeners.

    PubMed

    Desmond, Jill M; Collins, Leslie M; Throckmorton, Chandra S

    2014-06-01

    Many cochlear implant (CI) listeners experience decreased speech recognition in reverberant environments [Kokkinakis et al., J. Acoust. Soc. Am. 129(5), 3221-3232 (2011)], which may be caused by a combination of self- and overlap-masking [Bolt and MacDonald, J. Acoust. Soc. Am. 21(6), 577-580 (1949)]. Determining the extent to which these effects decrease speech recognition for CI listeners may influence reverberation mitigation algorithms. This study compared speech recognition with ideal self-masking mitigation, with ideal overlap-masking mitigation, and with no mitigation. Under these conditions, mitigating either self- or overlap-masking resulted in significant improvements in speech recognition for both normal hearing subjects utilizing an acoustic model and for CI listeners using their own devices.

  14. Children with a cochlear implant: characteristics and determinants of speech recognition, speech-recognition growth rate, and speech production.

    PubMed

    Wie, Ona Bø; Falkenberg, Eva-Signe; Tvete, Ole; Tomblin, Bruce

    2007-05-01

    The objectives of the study were to describe the characteristics of the first 79 prelingually deaf cochlear implant users in Norway and to investigate to what degree the variation in speech recognition, speech- recognition growth rate, and speech production could be explained by the characteristics of the child, the cochlear implant, the family, and the educational setting. Data gathered longitudinally were analysed using descriptive statistics, multiple regression, and growth-curve analysis. The results show that more than 50% of the variation could be explained by these characteristics. Daily user-time, non-verbal intelligence, mode of communication, length of CI experience, and educational placement had the highest effect on the outcome. The results also indicate that children educated in a bilingual approach to education have better speech perception and faster speech perception growth rate with increased focus on spoken language.

  15. How does susceptibility to proactive interference relate to speech recognition in aided and unaided conditions?

    PubMed

    Ellis, Rachel J; Rönnberg, Jerker

    2015-01-01

    Proactive interference (PI) is the capacity to resist interference to the acquisition of new memories from information stored in the long-term memory. Previous research has shown that PI correlates significantly with the speech-in-noise recognition scores of younger adults with normal hearing. In this study, we report the results of an experiment designed to investigate the extent to which tests of visual PI relate to the speech-in-noise recognition scores of older adults with hearing loss, in aided and unaided conditions. The results suggest that measures of PI correlate significantly with speech-in-noise recognition only in the unaided condition. Furthermore the relation between PI and speech-in-noise recognition differs to that observed in younger listeners without hearing loss. The findings suggest that the relation between PI tests and the speech-in-noise recognition scores of older adults with hearing loss relates to capability of the test to index cognitive flexibility.

  16. How does susceptibility to proactive interference relate to speech recognition in aided and unaided conditions?

    PubMed Central

    Ellis, Rachel J.; Rönnberg, Jerker

    2015-01-01

    Proactive interference (PI) is the capacity to resist interference to the acquisition of new memories from information stored in the long-term memory. Previous research has shown that PI correlates significantly with the speech-in-noise recognition scores of younger adults with normal hearing. In this study, we report the results of an experiment designed to investigate the extent to which tests of visual PI relate to the speech-in-noise recognition scores of older adults with hearing loss, in aided and unaided conditions. The results suggest that measures of PI correlate significantly with speech-in-noise recognition only in the unaided condition. Furthermore the relation between PI and speech-in-noise recognition differs to that observed in younger listeners without hearing loss. The findings suggest that the relation between PI tests and the speech-in-noise recognition scores of older adults with hearing loss relates to capability of the test to index cognitive flexibility. PMID:26283981

  17. Visual face-movement sensitive cortex is relevant for auditory-only speech recognition.

    PubMed

    Riedel, Philipp; Ragert, Patrick; Schelinski, Stefanie; Kiebel, Stefan J; von Kriegstein, Katharina

    2015-07-01

    It is commonly assumed that the recruitment of visual areas during audition is not relevant for performing auditory tasks ('auditory-only view'). According to an alternative view, however, the recruitment of visual cortices is thought to optimize auditory-only task performance ('auditory-visual view'). This alternative view is based on functional magnetic resonance imaging (fMRI) studies. These studies have shown, for example, that even if there is only auditory input available, face-movement sensitive areas within the posterior superior temporal sulcus (pSTS) are involved in understanding what is said (auditory-only speech recognition). This is particularly the case when speakers are known audio-visually, that is, after brief voice-face learning. Here we tested whether the left pSTS involvement is causally related to performance in auditory-only speech recognition when speakers are known by face. To test this hypothesis, we applied cathodal transcranial direct current stimulation (tDCS) to the pSTS during (i) visual-only speech recognition of a speaker known only visually to participants and (ii) auditory-only speech recognition of speakers they learned by voice and face. We defined the cathode as active electrode to down-regulate cortical excitability by hyperpolarization of neurons. tDCS to the pSTS interfered with visual-only speech recognition performance compared to a control group without pSTS stimulation (tDCS to BA6/44 or sham). Critically, compared to controls, pSTS stimulation additionally decreased auditory-only speech recognition performance selectively for voice-face learned speakers. These results are important in two ways. First, they provide direct evidence that the pSTS is causally involved in visual-only speech recognition; this confirms a long-standing prediction of current face-processing models. Secondly, they show that visual face-sensitive pSTS is causally involved in optimizing auditory-only speech recognition. These results are in line with the 'auditory-visual view' of auditory speech perception, which assumes that auditory speech recognition is optimized by using predictions from previously encoded speaker-specific audio-visual internal models. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. Automatic Speech Recognition from Neural Signals: A Focused Review.

    PubMed

    Herff, Christian; Schultz, Tanja

    2016-01-01

    Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to either loud environments, bothering bystanders or incapabilities to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons it would be highly desirable to not speak but to simply envision oneself to say words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefor better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques used from neural signals, we discuss the Brain-to-text system.

  19. Distributed Fusion in Sensor Networks with Information Genealogy

    DTIC Science & Technology

    2011-06-28

    image processing [2], acoustic and speech recognition [3], multitarget tracking [4], distributed fusion [5], and Bayesian inference [6-7]. For...Adaptation for Distant-Talking Speech Recognition." in Proc Acoustics. Speech , and Signal Processing, 2004 |4| Y Bar-Shalom and T 1-. Fortmann...used in speech recognition and other classification applications [8]. But their use in underwater mine classification is limited. In this paper, we

  20. Prediction of consonant recognition in quiet for listeners with normal and impaired hearing using an auditory model.

    PubMed

    Jürgens, Tim; Ewert, Stephan D; Kollmeier, Birger; Brand, Thomas

    2014-03-01

    Consonant recognition was assessed in normal-hearing (NH) and hearing-impaired (HI) listeners in quiet as a function of speech level using a nonsense logatome test. Average recognition scores were analyzed and compared to recognition scores of a speech recognition model. In contrast to commonly used spectral speech recognition models operating on long-term spectra, a "microscopic" model operating in the time domain was used. Variations of the model (accounting for hearing impairment) and different model parameters (reflecting cochlear compression) were tested. Using these model variations this study examined whether speech recognition performance in quiet is affected by changes in cochlear compression, namely, a linearization, which is often observed in HI listeners. Consonant recognition scores for HI listeners were poorer than for NH listeners. The model accurately predicted the speech reception thresholds of the NH and most HI listeners. A partial linearization of the cochlear compression in the auditory model, while keeping audibility constant, produced higher recognition scores and improved the prediction accuracy. However, including listener-specific information about the exact form of the cochlear compression did not improve the prediction further.

  1. Mispronunciation Detection for Language Learning and Speech Recognition Adaptation

    ERIC Educational Resources Information Center

    Ge, Zhenhao

    2013-01-01

    The areas of "mispronunciation detection" (or "accent detection" more specifically) within the speech recognition community are receiving increased attention now. Two application areas, namely language learning and speech recognition adaptation, are largely driving this research interest and are the focal points of this work.…

  2. Longitudinal changes in speech recognition in older persons.

    PubMed

    Dubno, Judy R; Lee, Fu-Shing; Matthews, Lois J; Ahlstrom, Jayne B; Horwitz, Amy R; Mills, John H

    2008-01-01

    Recognition of isolated monosyllabic words in quiet and recognition of key words in low- and high-context sentences in babble were measured in a large sample of older persons enrolled in a longitudinal study of age-related hearing loss. Repeated measures were obtained yearly or every 2 to 3 years. To control for concurrent changes in pure-tone thresholds and speech levels, speech-recognition scores were adjusted using an importance-weighted speech-audibility metric (AI). Linear-regression slope estimated the rate of change in adjusted speech-recognition scores. Recognition of words in quiet declined significantly faster with age than predicted by declines in speech audibility. As subjects aged, observed scores deviated increasingly from AI-predicted scores, but this effect did not accelerate with age. Rate of decline in word recognition was significantly faster for females than males and for females with high serum progesterone levels, whereas noise history had no effect. Rate of decline did not accelerate with age but increased with degree of hearing loss, suggesting that with more severe injury to the auditory system, impairments to auditory function other than reduced audibility resulted in faster declines in word recognition as subjects aged. Recognition of key words in low- and high-context sentences in babble did not decline significantly with age.

  3. From Birdsong to Human Speech Recognition: Bayesian Inference on a Hierarchy of Nonlinear Dynamical Systems

    PubMed Central

    Yildiz, Izzet B.; von Kriegstein, Katharina; Kiebel, Stefan J.

    2013-01-01

    Our knowledge about the computational mechanisms underlying human learning and recognition of sound sequences, especially speech, is still very limited. One difficulty in deciphering the exact means by which humans recognize speech is that there are scarce experimental findings at a neuronal, microscopic level. Here, we show that our neuronal-computational understanding of speech learning and recognition may be vastly improved by looking at an animal model, i.e., the songbird, which faces the same challenge as humans: to learn and decode complex auditory input, in an online fashion. Motivated by striking similarities between the human and songbird neural recognition systems at the macroscopic level, we assumed that the human brain uses the same computational principles at a microscopic level and translated a birdsong model into a novel human sound learning and recognition model with an emphasis on speech. We show that the resulting Bayesian model with a hierarchy of nonlinear dynamical systems can learn speech samples such as words rapidly and recognize them robustly, even in adverse conditions. In addition, we show that recognition can be performed even when words are spoken by different speakers and with different accents—an everyday situation in which current state-of-the-art speech recognition models often fail. The model can also be used to qualitatively explain behavioral data on human speech learning and derive predictions for future experiments. PMID:24068902

  4. From birdsong to human speech recognition: bayesian inference on a hierarchy of nonlinear dynamical systems.

    PubMed

    Yildiz, Izzet B; von Kriegstein, Katharina; Kiebel, Stefan J

    2013-01-01

    Our knowledge about the computational mechanisms underlying human learning and recognition of sound sequences, especially speech, is still very limited. One difficulty in deciphering the exact means by which humans recognize speech is that there are scarce experimental findings at a neuronal, microscopic level. Here, we show that our neuronal-computational understanding of speech learning and recognition may be vastly improved by looking at an animal model, i.e., the songbird, which faces the same challenge as humans: to learn and decode complex auditory input, in an online fashion. Motivated by striking similarities between the human and songbird neural recognition systems at the macroscopic level, we assumed that the human brain uses the same computational principles at a microscopic level and translated a birdsong model into a novel human sound learning and recognition model with an emphasis on speech. We show that the resulting Bayesian model with a hierarchy of nonlinear dynamical systems can learn speech samples such as words rapidly and recognize them robustly, even in adverse conditions. In addition, we show that recognition can be performed even when words are spoken by different speakers and with different accents-an everyday situation in which current state-of-the-art speech recognition models often fail. The model can also be used to qualitatively explain behavioral data on human speech learning and derive predictions for future experiments.

  5. Statistical assessment of speech system performance

    NASA Technical Reports Server (NTRS)

    Moshier, Stephen L.

    1977-01-01

    Methods for the normalization of performance tests results of speech recognition systems are presented. Technological accomplishments in speech recognition systems, as well as planned research activities are described.

  6. Building Searchable Collections of Enterprise Speech Data.

    ERIC Educational Resources Information Center

    Cooper, James W.; Viswanathan, Mahesh; Byron, Donna; Chan, Margaret

    The study has applied speech recognition and text-mining technologies to a set of recorded outbound marketing calls and analyzed the results. Since speaker-independent speech recognition technology results in a significantly lower recognition rate than that found when the recognizer is trained for a particular speaker, a number of post-processing…

  7. Masked Speech Recognition and Reading Ability in School-Age Children: Is There a Relationship?

    ERIC Educational Resources Information Center

    Miller, Gabrielle; Lewis, Barbara; Benchek, Penelope; Buss, Emily; Calandruccio, Lauren

    2018-01-01

    Purpose: The relationship between reading (decoding) skills, phonological processing abilities, and masked speech recognition in typically developing children was explored. This experiment was designed to evaluate the relationship between phonological processing and decoding abilities and 2 aspects of masked speech recognition in typically…

  8. Six characteristics of effective structured reporting and the inevitable integration with speech recognition.

    PubMed

    Liu, David; Zucherman, Mark; Tulloss, William B

    2006-03-01

    The reporting of radiological images is undergoing dramatic changes due to the introduction of two new technologies: structured reporting and speech recognition. Each technology has its own unique advantages. The highly organized content of structured reporting facilitates data mining and billing, whereas speech recognition offers a natural succession from the traditional dictation-transcription process. This article clarifies the distinction between the process and outcome of structured reporting, describes fundamental requirements for any effective structured reporting system, and describes the potential development of a novel, easy-to-use, customizable structured reporting system that incorporates speech recognition. This system should have all the advantages derived from structured reporting, accommodate a wide variety of user needs, and incorporate speech recognition as a natural component and extension of the overall reporting process.

  9. Military applications of automatic speech recognition and future requirements

    NASA Technical Reports Server (NTRS)

    Beek, Bruno; Cupples, Edward J.

    1977-01-01

    An updated summary of the state-of-the-art of automatic speech recognition and its relevance to military applications is provided. A number of potential systems for military applications are under development. These include: (1) digital narrowband communication systems; (2) automatic speech verification; (3) on-line cartographic processing unit; (4) word recognition for militarized tactical data system; and (5) voice recognition and synthesis for aircraft cockpit.

  10. Speech Recognition as a Support Service for Deaf and Hard of Hearing Students: Adaptation and Evaluation. Final Report to Spencer Foundation.

    ERIC Educational Resources Information Center

    Stinson, Michael; Elliot, Lisa; McKee, Barbara; Coyne, Gina

    This report discusses a project that adapted new automatic speech recognition (ASR) technology to provide real-time speech-to-text transcription as a support service for students who are deaf and hard of hearing (D/HH). In this system, as the teacher speaks, a hearing intermediary, or captionist, dictates into the speech recognition system in a…

  11. Comparing auditory filter bandwidths, spectral ripple modulation detection, spectral ripple discrimination, and speech recognition: Normal and impaired hearinga)

    PubMed Central

    Davies-Venn, Evelyn; Nelson, Peggy; Souza, Pamela

    2015-01-01

    Some listeners with hearing loss show poor speech recognition scores in spite of using amplification that optimizes audibility. Beyond audibility, studies have suggested that suprathreshold abilities such as spectral and temporal processing may explain differences in amplified speech recognition scores. A variety of different methods has been used to measure spectral processing. However, the relationship between spectral processing and speech recognition is still inconclusive. This study evaluated the relationship between spectral processing and speech recognition in listeners with normal hearing and with hearing loss. Narrowband spectral resolution was assessed using auditory filter bandwidths estimated from simultaneous notched-noise masking. Broadband spectral processing was measured using the spectral ripple discrimination (SRD) task and the spectral ripple depth detection (SMD) task. Three different measures were used to assess unamplified and amplified speech recognition in quiet and noise. Stepwise multiple linear regression revealed that SMD at 2.0 cycles per octave (cpo) significantly predicted speech scores for amplified and unamplified speech in quiet and noise. Commonality analyses revealed that SMD at 2.0 cpo combined with SRD and equivalent rectangular bandwidth measures to explain most of the variance captured by the regression model. Results suggest that SMD and SRD may be promising clinical tools for diagnostic evaluation and predicting amplification outcomes. PMID:26233047

  12. Comparing auditory filter bandwidths, spectral ripple modulation detection, spectral ripple discrimination, and speech recognition: Normal and impaired hearing.

    PubMed

    Davies-Venn, Evelyn; Nelson, Peggy; Souza, Pamela

    2015-07-01

    Some listeners with hearing loss show poor speech recognition scores in spite of using amplification that optimizes audibility. Beyond audibility, studies have suggested that suprathreshold abilities such as spectral and temporal processing may explain differences in amplified speech recognition scores. A variety of different methods has been used to measure spectral processing. However, the relationship between spectral processing and speech recognition is still inconclusive. This study evaluated the relationship between spectral processing and speech recognition in listeners with normal hearing and with hearing loss. Narrowband spectral resolution was assessed using auditory filter bandwidths estimated from simultaneous notched-noise masking. Broadband spectral processing was measured using the spectral ripple discrimination (SRD) task and the spectral ripple depth detection (SMD) task. Three different measures were used to assess unamplified and amplified speech recognition in quiet and noise. Stepwise multiple linear regression revealed that SMD at 2.0 cycles per octave (cpo) significantly predicted speech scores for amplified and unamplified speech in quiet and noise. Commonality analyses revealed that SMD at 2.0 cpo combined with SRD and equivalent rectangular bandwidth measures to explain most of the variance captured by the regression model. Results suggest that SMD and SRD may be promising clinical tools for diagnostic evaluation and predicting amplification outcomes.

  13. Automated Discovery of Speech Act Categories in Educational Games

    ERIC Educational Resources Information Center

    Rus, Vasile; Moldovan, Cristian; Niraula, Nobal; Graesser, Arthur C.

    2012-01-01

    In this paper we address the important task of automated discovery of speech act categories in dialogue-based, multi-party educational games. Speech acts are important in dialogue-based educational systems because they help infer the student speaker's intentions (the task of speech act classification) which in turn is crucial to providing adequate…

  14. Microscopic prediction of speech recognition for listeners with normal hearing in noise using an auditory model.

    PubMed

    Jürgens, Tim; Brand, Thomas

    2009-11-01

    This study compares the phoneme recognition performance in speech-shaped noise of a microscopic model for speech recognition with the performance of normal-hearing listeners. "Microscopic" is defined in terms of this model twofold. First, the speech recognition rate is predicted on a phoneme-by-phoneme basis. Second, microscopic modeling means that the signal waveforms to be recognized are processed by mimicking elementary parts of human's auditory processing. The model is based on an approach by Holube and Kollmeier [J. Acoust. Soc. Am. 100, 1703-1716 (1996)] and consists of a psychoacoustically and physiologically motivated preprocessing and a simple dynamic-time-warp speech recognizer. The model is evaluated while presenting nonsense speech in a closed-set paradigm. Averaged phoneme recognition rates, specific phoneme recognition rates, and phoneme confusions are analyzed. The influence of different perceptual distance measures and of the model's a-priori knowledge is investigated. The results show that human performance can be predicted by this model using an optimal detector, i.e., identical speech waveforms for both training of the recognizer and testing. The best model performance is yielded by distance measures which focus mainly on small perceptual distances and neglect outliers.

  15. The influence of speech rate and accent on access and use of semantic information.

    PubMed

    Sajin, Stanislav M; Connine, Cynthia M

    2017-04-01

    Circumstances in which the speech input is presented in sub-optimal conditions generally lead to processing costs affecting spoken word recognition. The current study indicates that some processing demands imposed by listening to difficult speech can be mitigated by feedback from semantic knowledge. A set of lexical decision experiments examined how foreign accented speech and word duration impact access to semantic knowledge in spoken word recognition. Results indicate that when listeners process accented speech, the reliance on semantic information increases. Speech rate was not observed to influence semantic access, except in the setting in which unusually slow accented speech was presented. These findings support interactive activation models of spoken word recognition in which attention is modulated based on speech demands.

  16. The role of artificial intelligence and expert systems in increasing STS operations productivity

    NASA Technical Reports Server (NTRS)

    Culbert, C.

    1985-01-01

    Artificial Intelligence (AI) is discussed. A number of the computer technologies pioneered in the AI world can make significant contributions to increasing STS operations productivity. Application of expert systems, natural language, speech recognition, and other key technologies can reduce manpower while raising productivity. Many aspects of STS support lend themselves to this type of automation. The artificial intelligence section of the mission planning and analysis division has developed a number of functioning prototype systems which demonstrate the potential gains of applying AI technology.

  17. Use of intonation contours for speech recognition in noise by cochlear implant recipients.

    PubMed

    Meister, Hartmut; Landwehr, Markus; Pyschny, Verena; Grugel, Linda; Walger, Martin

    2011-05-01

    The corruption of intonation contours has detrimental effects on sentence-based speech recognition in normal-hearing listeners Binns and Culling [(2007). J. Acoust. Soc. Am. 122, 1765-1776]. This paper examines whether this finding also applies to cochlear implant (CI) recipients. The subjects' F0-discrimination and speech perception in the presence of noise were measured, using sentences with regular and inverted F0-contours. The results revealed that speech recognition for regular contours was significantly better than for inverted contours. This difference was related to the subjects' F0-discrimination providing further evidence that the perception of intonation patterns is important for the CI-mediated speech recognition in noise.

  18. Does quality of life depend on speech recognition performance for adult cochlear implant users?

    PubMed

    Capretta, Natalie R; Moberly, Aaron C

    2016-03-01

    Current postoperative clinical outcome measures for adults receiving cochlear implants (CIs) consist of testing speech recognition, primarily under quiet conditions. However, it is strongly suspected that results on these measures may not adequately reflect patients' quality of life (QOL) using their implants. This study aimed to evaluate whether QOL for CI users depends on speech recognition performance. Twenty-three postlingually deafened adults with CIs were assessed. Participants were tested for speech recognition (Central Institute for the Deaf word and AzBio sentence recognition in quiet) and completed three QOL measures-the Nijmegen Cochlear Implant Questionnaire; either the Hearing Handicap Inventory for Adults or the Hearing Handicap Inventory for the Elderly; and the Speech, Spatial and Qualities of Hearing Scale questionnaires-to assess a variety of QOL factors. Correlations were sought between speech recognition and QOL scores. Demographics, audiologic history, language, and cognitive skills were also examined as potential predictors of QOL. Only a few QOL scores significantly correlated with postoperative sentence or word recognition in quiet, and correlations were primarily isolated to speech-related subscales on QOL measures. Poorer pre- and postoperative unaided hearing predicted better QOL. Socioeconomic status, duration of deafness, age at implantation, duration of CI use, reading ability, vocabulary size, and cognitive status did not consistently predict QOL scores. For adult, postlingually deafened CI users, clinical speech recognition measures in quiet do not correlate broadly with QOL. Results suggest the need for additional outcome measures of the benefits and limitations of cochlear implantation. 4. Laryngoscope, 126:699-706, 2016. © 2015 The American Laryngological, Rhinological and Otological Society, Inc.

  19. Visual abilities are important for auditory-only speech recognition: evidence from autism spectrum disorder.

    PubMed

    Schelinski, Stefanie; Riedel, Philipp; von Kriegstein, Katharina

    2014-12-01

    In auditory-only conditions, for example when we listen to someone on the phone, it is essential to fast and accurately recognize what is said (speech recognition). Previous studies have shown that speech recognition performance in auditory-only conditions is better if the speaker is known not only by voice, but also by face. Here, we tested the hypothesis that such an improvement in auditory-only speech recognition depends on the ability to lip-read. To test this we recruited a group of adults with autism spectrum disorder (ASD), a condition associated with difficulties in lip-reading, and typically developed controls. All participants were trained to identify six speakers by name and voice. Three speakers were learned by a video showing their face and three others were learned in a matched control condition without face. After training, participants performed an auditory-only speech recognition test that consisted of sentences spoken by the trained speakers. As a control condition, the test also included speaker identity recognition on the same auditory material. The results showed that, in the control group, performance in speech recognition was improved for speakers known by face in comparison to speakers learned in the matched control condition without face. The ASD group lacked such a performance benefit. For the ASD group auditory-only speech recognition was even worse for speakers known by face compared to speakers not known by face. In speaker identity recognition, the ASD group performed worse than the control group independent of whether the speakers were learned with or without face. Two additional visual experiments showed that the ASD group performed worse in lip-reading whereas face identity recognition was within the normal range. The findings support the view that auditory-only communication involves specific visual mechanisms. Further, they indicate that in ASD, speaker-specific dynamic visual information is not available to optimize auditory-only speech recognition. Copyright © 2014 Elsevier Ltd. All rights reserved.

  20. Speech processing using maximum likelihood continuity mapping

    DOEpatents

    Hogden, John E.

    2000-01-01

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  1. Speech processing using maximum likelihood continuity mapping

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hogden, J.E.

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  2. Use of Automated Scoring in Spoken Language Assessments for Test Takers with Speech Impairments. Research Report. ETS RR-17-42

    ERIC Educational Resources Information Center

    Loukina, Anastassia; Buzick, Heather

    2017-01-01

    This study is an evaluation of the performance of automated speech scoring for speakers with documented or suspected speech impairments. Given that the use of automated scoring of open-ended spoken responses is relatively nascent and there is little research to date that includes test takers with disabilities, this small exploratory study focuses…

  3. Supporting Dictation Speech Recognition Error Correction: The Impact of External Information

    ERIC Educational Resources Information Center

    Shi, Yongmei; Zhou, Lina

    2011-01-01

    Although speech recognition technology has made remarkable progress, its wide adoption is still restricted by notable effort made and frustration experienced by users while correcting speech recognition errors. One of the promising ways to improve error correction is by providing user support. Although support mechanisms have been proposed for…

  4. Application of an auditory model to speech recognition.

    PubMed

    Cohen, J R

    1989-06-01

    Some aspects of auditory processing are incorporated in a front end for the IBM speech-recognition system [F. Jelinek, "Continuous speech recognition by statistical methods," Proc. IEEE 64 (4), 532-556 (1976)]. This new process includes adaptation, loudness scaling, and mel warping. Tests show that the design is an improvement over previous algorithms.

  5. Specific acoustic models for spontaneous and dictated style in indonesian speech recognition

    NASA Astrophysics Data System (ADS)

    Vista, C. B.; Satriawan, C. H.; Lestari, D. P.; Widyantoro, D. H.

    2018-03-01

    The performance of an automatic speech recognition system is affected by differences in speech style between the data the model is originally trained upon and incoming speech to be recognized. In this paper, the usage of GMM-HMM acoustic models for specific speech styles is investigated. We develop two systems for the experiments; the first employs a speech style classifier to predict the speech style of incoming speech, either spontaneous or dictated, then decodes this speech using an acoustic model specifically trained for that speech style. The second system uses both acoustic models to recognise incoming speech and decides upon a final result by calculating a confidence score of decoding. Results show that training specific acoustic models for spontaneous and dictated speech styles confers a slight recognition advantage as compared to a baseline model trained on a mixture of spontaneous and dictated training data. In addition, the speech style classifier approach of the first system produced slightly more accurate results than the confidence scoring employed in the second system.

  6. Increase in Speech Recognition Due to Linguistic Mismatch between Target and Masker Speech: Monolingual and Simultaneous Bilingual Performance

    ERIC Educational Resources Information Center

    Calandruccio, Lauren; Zhou, Haibo

    2014-01-01

    Purpose: To examine whether improved speech recognition during linguistically mismatched target-masker experiments is due to linguistic unfamiliarity of the masker speech or linguistic dissimilarity between the target and masker speech. Method: Monolingual English speakers (n = 20) and English-Greek simultaneous bilinguals (n = 20) listened to…

  7. The influence of audibility on speech recognition with nonlinear frequency compression for children and adults with hearing loss

    PubMed Central

    McCreery, Ryan W.; Alexander, Joshua; Brennan, Marc A.; Hoover, Brenda; Kopun, Judy; Stelmachowicz, Patricia G.

    2014-01-01

    Objective The primary goal of nonlinear frequency compression (NFC) and other frequency lowering strategies is to increase the audibility of high-frequency sounds that are not otherwise audible with conventional hearing-aid processing due to the degree of hearing loss, limited hearing aid bandwidth or a combination of both factors. The aim of the current study was to compare estimates of speech audibility processed by NFC to improvements in speech recognition for a group of children and adults with high-frequency hearing loss. Design Monosyllabic word recognition was measured in noise for twenty-four adults and twelve children with mild to severe sensorineural hearing loss. Stimuli were amplified based on each listener’s audiogram with conventional processing (CP) with amplitude compression or with NFC and presented under headphones using a software-based hearing aid simulator. A modification of the speech intelligibility index (SII) was used to estimate audibility of information in frequency-lowered bands. The mean improvement in SII was compared to the mean improvement in speech recognition. Results All but two listeners experienced improvements in speech recognition with NFC compared to CP, consistent with the small increase in audibility that was estimated using the modification of the SII. Children and adults had similar improvements in speech recognition with NFC. Conclusion Word recognition with NFC was higher than CP for children and adults with mild to severe hearing loss. The average improvement in speech recognition with NFC (7%) was consistent with the modified SII, which indicated that listeners experienced an increase in audibility with NFC compared to CP. Further studies are necessary to determine if changes in audibility with NFC are related to speech recognition with NFC for listeners with greater degrees of hearing loss, with a greater variety of compression settings, and using auditory training. PMID:24535558

  8. Automation of Shuttle Tile Inspection - Engineering methodology for Space Station

    NASA Technical Reports Server (NTRS)

    Wiskerchen, M. J.; Mollakarimi, C.

    1987-01-01

    The Space Systems Integration and Operations Research Applications (SIORA) Program was initiated in late 1986 as a cooperative applications research effort between Stanford University, NASA Kennedy Space Center, and Lockheed Space Operations Company. One of the major initial SIORA tasks was the application of automation and robotics technology to all aspects of the Shuttle tile processing and inspection system. This effort has adopted a systems engineering approach consisting of an integrated set of rapid prototyping testbeds in which a government/university/industry team of users, technologists, and engineers test and evaluate new concepts and technologies within the operational world of Shuttle. These integrated testbeds include speech recognition and synthesis, laser imaging inspection systems, distributed Ada programming environments, distributed relational database architectures, distributed computer network architectures, multimedia workbenches, and human factors considerations.

  9. How much does language proficiency by non-native listeners influence speech audiometric tests in noise?

    PubMed

    Warzybok, Anna; Brand, Thomas; Wagener, Kirsten C; Kollmeier, Birger

    2015-01-01

    The current study investigates the extent to which the linguistic complexity of three commonly employed speech recognition tests and second language proficiency influence speech recognition thresholds (SRTs) in noise in non-native listeners. SRTs were measured for non-natives and natives using three German speech recognition tests: the digit triplet test (DTT), the Oldenburg sentence test (OLSA), and the Göttingen sentence test (GÖSA). Sixty-four non-native and eight native listeners participated. Non-natives can show native-like SRTs in noise only for the linguistically easy speech material (DTT). Furthermore, the limitation of phonemic-acoustical cues in digit triplets affects speech recognition to the same extent in non-natives and natives. For more complex and less familiar speech materials, non-natives, ranging from basic to advanced proficiency in German, require on average 3-dB better signal-to-noise ratio for the OLSA and 6-dB for the GÖSA to obtain 50% speech recognition compared to native listeners. In clinical audiology, SRT measurements with a closed-set speech test (i.e. DTT for screening or OLSA test for clinical purposes) should be used with non-native listeners rather than open-set speech tests (such as the GÖSA or HINT), especially if a closed-set version in the patient's own native language is available.

  10. Mandarin-Speaking Children's Speech Recognition: Developmental Changes in the Influences of Semantic Context and F0 Contours.

    PubMed

    Zhou, Hong; Li, Yu; Liang, Meng; Guan, Connie Qun; Zhang, Linjun; Shu, Hua; Zhang, Yang

    2017-01-01

    The goal of this developmental speech perception study was to assess whether and how age group modulated the influences of high-level semantic context and low-level fundamental frequency ( F 0 ) contours on the recognition of Mandarin speech by elementary and middle-school-aged children in quiet and interference backgrounds. The results revealed different patterns for semantic and F 0 information. One the one hand, age group modulated significantly the use of F 0 contours, indicating that elementary school children relied more on natural F 0 contours than middle school children during Mandarin speech recognition. On the other hand, there was no significant modulation effect of age group on semantic context, indicating that children of both age groups used semantic context to assist speech recognition to a similar extent. Furthermore, the significant modulation effect of age group on the interaction between F 0 contours and semantic context revealed that younger children could not make better use of semantic context in recognizing speech with flat F 0 contours compared with natural F 0 contours, while older children could benefit from semantic context even when natural F 0 contours were altered, thus confirming the important role of F 0 contours in Mandarin speech recognition by elementary school children. The developmental changes in the effects of high-level semantic and low-level F 0 information on speech recognition might reflect the differences in auditory and cognitive resources associated with processing of the two types of information in speech perception.

  11. Talker variability in audio-visual speech perception

    PubMed Central

    Heald, Shannon L. M.; Nusbaum, Howard C.

    2014-01-01

    A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts have shown, however, that when listeners are able to see a talker’s face, speech recognition is improved under adverse listening (e.g., noise or distortion) conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker’s face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to audio-only condition. These results suggest that seeing a talker’s face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener a change in talker has occurred. PMID:25076919

  12. Talker variability in audio-visual speech perception.

    PubMed

    Heald, Shannon L M; Nusbaum, Howard C

    2014-01-01

    A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts have shown, however, that when listeners are able to see a talker's face, speech recognition is improved under adverse listening (e.g., noise or distortion) conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to audio-only condition. These results suggest that seeing a talker's face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener a change in talker has occurred.

  13. Methods and apparatus for non-acoustic speech characterization and recognition

    DOEpatents

    Holzrichter, John F.

    1999-01-01

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  14. Methods and apparatus for non-acoustic speech characterization and recognition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzrichter, J.F.

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  15. Voice technology and BBN

    NASA Technical Reports Server (NTRS)

    Wolf, Jared J.

    1977-01-01

    The following research was discussed: (1) speech signal processing; (2) automatic speech recognition; (3) continuous speech understanding; (4) speaker recognition; (5) speech compression; (6) subjective and objective evaluation of speech communication system; (7) measurement of the intelligibility and quality of speech when degraded by noise or other masking stimuli; (8) speech synthesis; (9) instructional aids for second-language learning and for training of the deaf; and (10) investigation of speech correlates of psychological stress. Experimental psychology, control systems, and human factors engineering, which are often relevant to the proper design and operation of speech systems are described.

  16. Effects of Age and Working Memory Capacity on Speech Recognition Performance in Noise Among Listeners With Normal Hearing.

    PubMed

    Gordon-Salant, Sandra; Cole, Stacey Samuels

    2016-01-01

    This study aimed to determine if younger and older listeners with normal hearing who differ on working memory span perform differently on speech recognition tests in noise. Older adults typically exhibit poorer speech recognition scores in noise than younger adults, which is attributed primarily to poorer hearing sensitivity and more limited working memory capacity in older than younger adults. Previous studies typically tested older listeners with poorer hearing sensitivity and shorter working memory spans than younger listeners, making it difficult to discern the importance of working memory capacity on speech recognition. This investigation controlled for hearing sensitivity and compared speech recognition performance in noise by younger and older listeners who were subdivided into high and low working memory groups. Performance patterns were compared for different speech materials to assess whether or not the effect of working memory capacity varies with the demands of the specific speech test. The authors hypothesized that (1) normal-hearing listeners with low working memory span would exhibit poorer speech recognition performance in noise than those with high working memory span; (2) older listeners with normal hearing would show poorer speech recognition scores than younger listeners with normal hearing, when the two age groups were matched for working memory span; and (3) an interaction between age and working memory would be observed for speech materials that provide contextual cues. Twenty-eight older (61 to 75 years) and 25 younger (18 to 25 years) normal-hearing listeners were assigned to groups based on age and working memory status. Northwestern University Auditory Test No. 6 words and Institute of Electrical and Electronics Engineers sentences were presented in noise using an adaptive procedure to measure the signal-to-noise ratio corresponding to 50% correct performance. Cognitive ability was evaluated with two tests of working memory (Listening Span Test and Reading Span Test) and two tests of processing speed (Paced Auditory Serial Addition Test and The Letter Digit Substitution Test). Significant effects of age and working memory capacity were observed on the speech recognition measures in noise, but these effects were mediated somewhat by the speech signal. Specifically, main effects of age and working memory were revealed for both words and sentences, but the interaction between the two was significant for sentences only. For these materials, effects of age were observed for listeners in the low working memory groups only. Although all cognitive measures were significantly correlated with speech recognition in noise, working memory span was the most important variable accounting for speech recognition performance. The results indicate that older adults with high working memory capacity are able to capitalize on contextual cues and perform as well as young listeners with high working memory capacity for sentence recognition. The data also suggest that listeners with normal hearing and low working memory capacity are less able to adapt to distortion of speech signals caused by background noise, which requires the allocation of more processing resources to earlier processing stages. These results indicate that both younger and older adults with low working memory capacity and normal hearing are at a disadvantage for recognizing speech in noise.

  17. Speech Recognition and Cognitive Skills in Bimodal Cochlear Implant Users

    ERIC Educational Resources Information Center

    Hua, Håkan; Johansson, Björn; Magnusson, Lennart; Lyxell, Björn; Ellis, Rachel J.

    2017-01-01

    Purpose: To examine the relation between speech recognition and cognitive skills in bimodal cochlear implant (CI) and hearing aid users. Method: Seventeen bimodal CI users (28-74 years) were recruited to the study. Speech recognition tests were carried out in quiet and in noise. The cognitive tests employed included the Reading Span Test and the…

  18. Leveraging Automatic Speech Recognition Errors to Detect Challenging Speech Segments in TED Talks

    ERIC Educational Resources Information Center

    Mirzaei, Maryam Sadat; Meshgi, Kourosh; Kawahara, Tatsuya

    2016-01-01

    This study investigates the use of Automatic Speech Recognition (ASR) systems to epitomize second language (L2) listeners' problems in perception of TED talks. ASR-generated transcripts of videos often involve recognition errors, which may indicate difficult segments for L2 listeners. This paper aims to discover the root-causes of the ASR errors…

  19. Visual Speech Primes Open-Set Recognition of Spoken Words

    ERIC Educational Resources Information Center

    Buchwald, Adam B.; Winters, Stephen J.; Pisoni, David B.

    2009-01-01

    Visual speech perception has become a topic of considerable interest to speech researchers. Previous research has demonstrated that perceivers neurally encode and use speech information from the visual modality, and this information has been found to facilitate spoken word recognition in tasks such as lexical decision (Kim, Davis, & Krins,…

  20. Significance of parametric spectral ratio methods in detection and recognition of whispered speech

    NASA Astrophysics Data System (ADS)

    Mathur, Arpit; Reddy, Shankar M.; Hegde, Rajesh M.

    2012-12-01

    In this article the significance of a new parametric spectral ratio method that can be used to detect whispered speech segments within normally phonated speech is described. Adaptation methods based on the maximum likelihood linear regression (MLLR) are then used to realize a mismatched train-test style speech recognition system. This proposed parametric spectral ratio method computes a ratio spectrum of the linear prediction (LP) and the minimum variance distortion-less response (MVDR) methods. The smoothed ratio spectrum is then used to detect whispered segments of speech within neutral speech segments effectively. The proposed LP-MVDR ratio method exhibits robustness at different SNRs as indicated by the whisper diarization experiments conducted on the CHAINS and the cell phone whispered speech corpus. The proposed method also performs reasonably better than the conventional methods for whisper detection. In order to integrate the proposed whisper detection method into a conventional speech recognition engine with minimal changes, adaptation methods based on the MLLR are used herein. The hidden Markov models corresponding to neutral mode speech are adapted to the whispered mode speech data in the whispered regions as detected by the proposed ratio method. The performance of this method is first evaluated on whispered speech data from the CHAINS corpus. The second set of experiments are conducted on the cell phone corpus of whispered speech. This corpus is collected using a set up that is used commercially for handling public transactions. The proposed whisper speech recognition system exhibits reasonably better performance when compared to several conventional methods. The results shown indicate the possibility of a whispered speech recognition system for cell phone based transactions.

  1. Current trends in small vocabulary speech recognition for equipment control

    NASA Astrophysics Data System (ADS)

    Doukas, Nikolaos; Bardis, Nikolaos G.

    2017-09-01

    Speech recognition systems allow human - machine communication to acquire an intuitive nature that approaches the simplicity of inter - human communication. Small vocabulary speech recognition is a subset of the overall speech recognition problem, where only a small number of words need to be recognized. Speaker independent small vocabulary recognition can find significant applications in field equipment used by military personnel. Such equipment may typically be controlled by a small number of commands that need to be given quickly and accurately, under conditions where delicate manual operations are difficult to achieve. This type of application could hence significantly benefit by the use of robust voice operated control components, as they would facilitate the interaction with their users and render it much more reliable in times of crisis. This paper presents current challenges involved in attaining efficient and robust small vocabulary speech recognition. These challenges concern feature selection, classification techniques, speaker diversity and noise effects. A state machine approach is presented that facilitates the voice guidance of different equipment in a variety of situations.

  2. Preschoolers Benefit From Visually Salient Speech Cues

    PubMed Central

    Holt, Rachael Frush

    2015-01-01

    Purpose This study explored visual speech influence in preschoolers using 3 developmentally appropriate tasks that vary in perceptual difficulty and task demands. They also examined developmental differences in the ability to use visually salient speech cues and visual phonological knowledge. Method Twelve adults and 27 typically developing 3- and 4-year-old children completed 3 audiovisual (AV) speech integration tasks: matching, discrimination, and recognition. The authors compared AV benefit for visually salient and less visually salient speech discrimination contrasts and assessed the visual saliency of consonant confusions in auditory-only and AV word recognition. Results Four-year-olds and adults demonstrated visual influence on all measures. Three-year-olds demonstrated visual influence on speech discrimination and recognition measures. All groups demonstrated greater AV benefit for the visually salient discrimination contrasts. AV recognition benefit in 4-year-olds and adults depended on the visual saliency of speech sounds. Conclusions Preschoolers can demonstrate AV speech integration. Their AV benefit results from efficient use of visually salient speech cues. Four-year-olds, but not 3-year-olds, used visual phonological knowledge to take advantage of visually salient speech cues, suggesting possible developmental differences in the mechanisms of AV benefit. PMID:25322336

  3. Speech Perception in Noise by Children With Cochlear Implants

    PubMed Central

    Caldwell, Amanda; Nittrouer, Susan

    2013-01-01

    Purpose Common wisdom suggests that listening in noise poses disproportionately greater difficulty for listeners with cochlear implants (CIs) than for peers with normal hearing (NH). The purpose of this study was to examine phonological, language, and cognitive skills that might help explain speech-in-noise abilities for children with CIs. Method Three groups of kindergartners (NH, hearing aid wearers, and CI users) were tested on speech recognition in quiet and noise and on tasks thought to underlie the abilities that fit into the domains of phonological awareness, general language, and cognitive skills. These last measures were used as predictor variables in regression analyses with speech-in-noise scores as dependent variables. Results Compared to children with NH, children with CIs did not perform as well on speech recognition in noise or on most other measures, including recognition in quiet. Two surprising results were that (a) noise effects were consistent across groups and (b) scores on other measures did not explain any group differences in speech recognition. Conclusions Limitations of implant processing take their primary toll on recognition in quiet and account for poor speech recognition and language/phonological deficits in children with CIs. Implications are that teachers/clinicians need to teach language/phonology directly and maximize signal-to-noise levels in the classroom. PMID:22744138

  4. Simulation of talking faces in the human brain improves auditory speech recognition

    PubMed Central

    von Kriegstein, Katharina; Dogan, Özgür; Grüter, Martina; Giraud, Anne-Lise; Kell, Christian A.; Grüter, Thomas; Kleinschmidt, Andreas; Kiebel, Stefan J.

    2008-01-01

    Human face-to-face communication is essentially audiovisual. Typically, people talk to us face-to-face, providing concurrent auditory and visual input. Understanding someone is easier when there is visual input, because visual cues like mouth and tongue movements provide complementary information about speech content. Here, we hypothesized that, even in the absence of visual input, the brain optimizes both auditory-only speech and speaker recognition by harvesting speaker-specific predictions and constraints from distinct visual face-processing areas. To test this hypothesis, we performed behavioral and neuroimaging experiments in two groups: subjects with a face recognition deficit (prosopagnosia) and matched controls. The results show that observing a specific person talking for 2 min improves subsequent auditory-only speech and speaker recognition for this person. In both prosopagnosics and controls, behavioral improvement in auditory-only speech recognition was based on an area typically involved in face-movement processing. Improvement in speaker recognition was only present in controls and was based on an area involved in face-identity processing. These findings challenge current unisensory models of speech processing, because they show that, in auditory-only speech, the brain exploits previously encoded audiovisual correlations to optimize communication. We suggest that this optimization is based on speaker-specific audiovisual internal models, which are used to simulate a talking face. PMID:18436648

  5. Head movements encode emotions during speech and song.

    PubMed

    Livingstone, Steven R; Palmer, Caroline

    2016-04-01

    When speaking or singing, vocalists often move their heads in an expressive fashion, yet the influence of emotion on vocalists' head motion is unknown. Using a comparative speech/song task, we examined whether vocalists' intended emotions influence head movements and whether those movements influence the perceived emotion. In Experiment 1, vocalists were recorded with motion capture while speaking and singing each statement with different emotional intentions (very happy, happy, neutral, sad, very sad). Functional data analyses showed that head movements differed in translational and rotational displacement across emotional intentions, yet were similar across speech and song, transcending differences in F0 (varied freely in speech, fixed in song) and lexical variability. Head motion specific to emotional state occurred before and after vocalizations, as well as during sound production, confirming that some aspects of movement were not simply a by-product of sound production. In Experiment 2, observers accurately identified vocalists' intended emotion on the basis of silent, face-occluded videos of head movements during speech and song. These results provide the first evidence that head movements encode a vocalist's emotional intent and that observers decode emotional information from these movements. We discuss implications for models of head motion during vocalizations and applied outcomes in social robotics and automated emotion recognition. (c) 2016 APA, all rights reserved).

  6. Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes.

    PubMed

    Meyer, Bernd T; Brand, Thomas; Kollmeier, Birger

    2011-01-01

    The aim of this study is to quantify the gap between the recognition performance of human listeners and an automatic speech recognition (ASR) system with special focus on intrinsic variations of speech, such as speaking rate and effort, altered pitch, and the presence of dialect and accent. Second, it is investigated if the most common ASR features contain all information required to recognize speech in noisy environments by using resynthesized ASR features in listening experiments. For the phoneme recognition task, the ASR system achieved the human performance level only when the signal-to-noise ratio (SNR) was increased by 15 dB, which is an estimate for the human-machine gap in terms of the SNR. The major part of this gap is attributed to the feature extraction stage, since human listeners achieve comparable recognition scores when the SNR difference between unaltered and resynthesized utterances is 10 dB. Intrinsic variabilities result in strong increases of error rates, both in human speech recognition (HSR) and ASR (with a relative increase of up to 120%). An analysis of phoneme duration and recognition rates indicates that human listeners are better able to identify temporal cues than the machine at low SNRs, which suggests incorporating information about the temporal dynamics of speech into ASR systems.

  7. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  8. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.

  9. Characteristics of speaking style and implications for speech recognition.

    PubMed

    Shinozaki, Takahiro; Ostendorf, Mari; Atlas, Les

    2009-09-01

    Differences in speaking style are associated with more or less spectral variability, as well as different modulation characteristics. The greater variation in some styles (e.g., spontaneous speech and infant-directed speech) poses challenges for recognition but possibly also opportunities for learning more robust models, as evidenced by prior work and motivated by child language acquisition studies. In order to investigate this possibility, this work proposes a new method for characterizing speaking style (the modulation spectrum), examines spontaneous, read, adult-directed, and infant-directed styles in this space, and conducts pilot experiments in style detection and sampling for improved speech recognizer training. Speaking style classification is improved by using the modulation spectrum in combination with standard pitch and energy variation. Speech recognition experiments on a small vocabulary conversational speech recognition task show that sampling methods for training with a small amount of data benefit from the new features.

  10. How should a speech recognizer work?

    PubMed

    Scharenborg, Odette; Norris, Dennis; Bosch, Louis; McQueen, James M

    2005-11-12

    Although researchers studying human speech recognition (HSR) and automatic speech recognition (ASR) share a common interest in how information processing systems (human or machine) recognize spoken language, there is little communication between the two disciplines. We suggest that this lack of communication follows largely from the fact that research in these related fields has focused on the mechanics of how speech can be recognized. In Marr's (1982) terms, emphasis has been on the algorithmic and implementational levels rather than on the computational level. In this article, we provide a computational-level analysis of the task of speech recognition, which reveals the close parallels between research concerned with HSR and ASR. We illustrate this relation by presenting a new computational model of human spoken-word recognition, built using techniques from the field of ASR that, in contrast to current existing models of HSR, recognizes words from real speech input. 2005 Lawrence Erlbaum Associates, Inc.

  11. Evaluation of Speech Recognition of Cochlear Implant Recipients Using Adaptive, Digital Remote Microphone Technology and a Speech Enhancement Sound Processing Algorithm.

    PubMed

    Wolfe, Jace; Morais, Mila; Schafer, Erin; Agrawal, Smita; Koch, Dawn

    2015-05-01

    Cochlear implant recipients often experience difficulty with understanding speech in the presence of noise. Cochlear implant manufacturers have developed sound processing algorithms designed to improve speech recognition in noise, and research has shown these technologies to be effective. Remote microphone technology utilizing adaptive, digital wireless radio transmission has also been shown to provide significant improvement in speech recognition in noise. There are no studies examining the potential improvement in speech recognition in noise when these two technologies are used simultaneously. The goal of this study was to evaluate the potential benefits and limitations associated with the simultaneous use of a sound processing algorithm designed to improve performance in noise (Advanced Bionics ClearVoice) and a remote microphone system that incorporates adaptive, digital wireless radio transmission (Phonak Roger). A two-by-two way repeated measures design was used to examine performance differences obtained without these technologies compared to the use of each technology separately as well as the simultaneous use of both technologies. Eleven Advanced Bionics (AB) cochlear implant recipients, ages 11 to 68 yr. AzBio sentence recognition was measured in quiet and in the presence of classroom noise ranging in level from 50 to 80 dBA in 5-dB steps. Performance was evaluated in four conditions: (1) No ClearVoice and no Roger, (2) ClearVoice enabled without the use of Roger, (3) ClearVoice disabled with Roger enabled, and (4) simultaneous use of ClearVoice and Roger. Speech recognition in quiet was better than speech recognition in noise for all conditions. Use of ClearVoice and Roger each provided significant improvement in speech recognition in noise. The best performance in noise was obtained with the simultaneous use of ClearVoice and Roger. ClearVoice and Roger technology each improves speech recognition in noise, particularly when used at the same time. Because ClearVoice does not degrade performance in quiet settings, clinicians should consider recommending ClearVoice for routine, full-time use for AB implant recipients. Roger should be used in all instances in which remote microphone technology may assist the user in understanding speech in the presence of noise. American Academy of Audiology.

  12. Speech recognition: how good is good enough?

    PubMed

    Krohn, Richard

    2002-03-01

    Since its infancy in the early 1990s, the technology of speech recognition has undergone a rapid evolution. Not only has the reliability of the programming improved dramatically, the return on investment has become increasingly compelling. The author describes some of the latest health care applications of speech-recognition technology, and how the next advances will be made in this area.

  13. Speech recognition systems on the Cell Broadband Engine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Y; Jones, H; Vaidya, S

    In this paper we describe our design, implementation, and first results of a prototype connected-phoneme-based speech recognition system on the Cell Broadband Engine{trademark} (Cell/B.E.). Automatic speech recognition decodes speech samples into plain text (other representations are possible) and must process samples at real-time rates. Fortunately, the computational tasks involved in this pipeline are highly data-parallel and can receive significant hardware acceleration from vector-streaming architectures such as the Cell/B.E. Identifying and exploiting these parallelism opportunities is challenging, but also critical to improving system performance. We observed, from our initial performance timings, that a single Cell/B.E. processor can recognize speech from thousandsmore » of simultaneous voice channels in real time--a channel density that is orders-of-magnitude greater than the capacity of existing software speech recognizers based on CPUs (central processing units). This result emphasizes the potential for Cell/B.E.-based speech recognition and will likely lead to the future development of production speech systems using Cell/B.E. clusters.« less

  14. Recognizing Whispered Speech Produced by an Individual with Surgically Reconstructed Larynx Using Articulatory Movement Data

    PubMed Central

    Cao, Beiming; Kim, Myungjong; Mau, Ted; Wang, Jun

    2017-01-01

    Individuals with larynx (vocal folds) impaired have problems in controlling their glottal vibration, producing whispered speech with extreme hoarseness. Standard automatic speech recognition using only acoustic cues is typically ineffective for whispered speech because the corresponding spectral characteristics are distorted. Articulatory cues such as the tongue and lip motion may help in recognizing whispered speech since articulatory motion patterns are generally not affected. In this paper, we investigated whispered speech recognition for patients with reconstructed larynx using articulatory movement data. A data set with both acoustic and articulatory motion data was collected from a patient with surgically reconstructed larynx using an electromagnetic articulograph. Two speech recognition systems, Gaussian mixture model-hidden Markov model (GMM-HMM) and deep neural network-HMM (DNN-HMM), were used in the experiments. Experimental results showed adding either tongue or lip motion data to acoustic features such as mel-frequency cepstral coefficient (MFCC) significantly reduced the phone error rates on both speech recognition systems. Adding both tongue and lip data achieved the best performance. PMID:29423453

  15. The Swedish Hayling task, and its relation to working memory, verbal ability, and speech-recognition-in-noise.

    PubMed

    Stenbäck, Victoria; Hällgren, Mathias; Lyxell, Björn; Larsby, Birgitta

    2015-06-01

    Cognitive functions and speech-recognition-in-noise were evaluated with a cognitive test battery, assessing response inhibition using the Hayling task, working memory capacity (WMC) and verbal information processing, and an auditory test of speech recognition. The cognitive tests were performed in silence whereas the speech recognition task was presented in noise. Thirty young normally-hearing individuals participated in the study. The aim of the study was to investigate one executive function, response inhibition, and whether it is related to individual working memory capacity (WMC), and how speech-recognition-in-noise relates to WMC and inhibitory control. The results showed a significant difference between initiation and response inhibition, suggesting that the Hayling task taps cognitive activity responsible for executive control. Our findings also suggest that high verbal ability was associated with better performance in the Hayling task. We also present findings suggesting that individuals who perform well on tasks involving response inhibition, and WMC, also perform well on a speech-in-noise task. Our findings indicate that capacity to resist semantic interference can be used to predict performance on speech-in-noise tasks. © 2015 Scandinavian Psychological Associations and John Wiley & Sons Ltd.

  16. Contribution of auditory working memory to speech understanding in mandarin-speaking cochlear implant users.

    PubMed

    Tao, Duoduo; Deng, Rui; Jiang, Ye; Galvin, John J; Fu, Qian-Jie; Chen, Bing

    2014-01-01

    To investigate how auditory working memory relates to speech perception performance by Mandarin-speaking cochlear implant (CI) users. Auditory working memory and speech perception was measured in Mandarin-speaking CI and normal-hearing (NH) participants. Working memory capacity was measured using forward digit span and backward digit span; working memory efficiency was measured using articulation rate. Speech perception was assessed with: (a) word-in-sentence recognition in quiet, (b) word-in-sentence recognition in speech-shaped steady noise at +5 dB signal-to-noise ratio, (c) Chinese disyllable recognition in quiet, (d) Chinese lexical tone recognition in quiet. Self-reported school rank was also collected regarding performance in schoolwork. There was large inter-subject variability in auditory working memory and speech performance for CI participants. Working memory and speech performance were significantly poorer for CI than for NH participants. All three working memory measures were strongly correlated with each other for both CI and NH participants. Partial correlation analyses were performed on the CI data while controlling for demographic variables. Working memory efficiency was significantly correlated only with sentence recognition in quiet when working memory capacity was partialled out. Working memory capacity was correlated with disyllable recognition and school rank when efficiency was partialled out. There was no correlation between working memory and lexical tone recognition in the present CI participants. Mandarin-speaking CI users experience significant deficits in auditory working memory and speech performance compared with NH listeners. The present data suggest that auditory working memory may contribute to CI users' difficulties in speech understanding. The present pattern of results with Mandarin-speaking CI users is consistent with previous auditory working memory studies with English-speaking CI users, suggesting that the lexical importance of voice pitch cues (albeit poorly coded by the CI) did not influence the relationship between working memory and speech perception.

  17. Implementation of the Intelligent Voice System for Kazakh

    NASA Astrophysics Data System (ADS)

    Yessenbayev, Zh; Saparkhojayev, N.; Tibeyev, T.

    2014-04-01

    Modern speech technologies are highly advanced and widely used in day-to-day applications. However, this is mostly concerned with the languages of well-developed countries such as English, German, Japan, Russian, etc. As for Kazakh, the situation is less prominent and research in this field is only starting to evolve. In this research and application-oriented project, we introduce an intelligent voice system for the fast deployment of call-centers and information desks supporting Kazakh speech. The demand on such a system is obvious if the country's large size and small population is considered. The landline and cell phones become the only means of communication for the distant villages and suburbs. The system features Kazakh speech recognition and synthesis modules as well as a web-GUI for efficient dialog management. For speech recognition we use CMU Sphinx engine and for speech synthesis- MaryTTS. The web-GUI is implemented in Java enabling operators to quickly create and manage the dialogs in user-friendly graphical environment. The call routines are handled by Asterisk PBX and JBoss Application Server. The system supports such technologies and protocols as VoIP, VoiceXML, FastAGI, Java SpeechAPI and J2EE. For the speech recognition experiments we compiled and used the first Kazakh speech corpus with the utterances from 169 native speakers. The performance of the speech recognizer is 4.1% WER on isolated word recognition and 6.9% WER on clean continuous speech recognition tasks. The speech synthesis experiments include the training of male and female voices.

  18. Automated acoustic analysis in detection of spontaneous swallows in Parkinson's disease.

    PubMed

    Golabbakhsh, Marzieh; Rajaei, Ali; Derakhshan, Mahmoud; Sadri, Saeed; Taheri, Masoud; Adibi, Peyman

    2014-10-01

    Acoustic monitoring of swallow frequency has become important as the frequency of spontaneous swallowing can be an index for dysphagia and related complications. In addition, it can be employed as an objective quantification of ingestive behavior. Commonly, swallowing complications are manually detected using videofluoroscopy recordings, which require expensive equipment and exposure to radiation. In this study, a noninvasive automated technique is proposed that uses breath and swallowing recordings obtained via a microphone located over the laryngopharynx. Nonlinear diffusion filters were used in which a scale-space decomposition of recorded sound at different levels extract swallows from breath sounds and artifacts. This technique was compared to manual detection of swallows using acoustic signals on a sample of 34 subjects with Parkinson's disease. A speech language pathologist identified five subjects who showed aspiration during the videofluoroscopic swallowing study. The proposed automated method identified swallows with a sensitivity of 86.67 %, a specificity of 77.50 %, and an accuracy of 82.35 %. These results indicate the validity of automated acoustic recognition of swallowing as a fast and efficient approach to objectively estimate spontaneous swallow frequency.

  19. The NTID speech recognition test: NSRT(®).

    PubMed

    Bochner, Joseph H; Garrison, Wayne M; Doherty, Karen A

    2015-07-01

    The purpose of this study was to collect and analyse data necessary for expansion of the NSRT item pool and to evaluate the NSRT adaptive testing software. Participants were administered pure-tone and speech recognition tests including W-22 and QuickSIN, as well as a set of 323 new NSRT items and NSRT adaptive tests in quiet and background noise. Performance on the adaptive tests was compared to pure-tone thresholds and performance on other speech recognition measures. The 323 new items were subjected to Rasch scaling analysis. Seventy adults with mild to moderately severe hearing loss participated in this study. Their mean age was 62.4 years (sd = 20.8). The 323 new NSRT items fit very well with the original item bank, enabling the item pool to be more than doubled in size. Data indicate high reliability coefficients for the NSRT and moderate correlations with pure-tone thresholds (PTA and HFPTA) and other speech recognition measures (W-22, QuickSIN, and SRT). The adaptive NSRT is an efficient and effective measure of speech recognition, providing valid and reliable information concerning respondents' speech perception abilities.

  20. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzrichter, J.F.; Ng, L.C.

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used formore » purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.« less

  1. Female voice communications in high level aircraft cockpit noises--part II: vocoder and automatic speech recognition systems.

    PubMed

    Nixon, C; Anderson, T; Morris, L; McCavitt, A; McKinley, R; Yeager, D; McDaniel, M

    1998-11-01

    The intelligibility of female and male speech is equivalent under most ordinary living conditions. However, due to small differences between their acoustic speech signals, called speech spectra, one can be more or less intelligible than the other in certain situations such as high levels of noise. Anecdotal information, supported by some empirical observations, suggests that some of the high intensity noise spectra of military aircraft cockpits may degrade the intelligibility of female speech more than that of male speech. In an applied research study, the intelligibility of female and male speech was measured in several high level aircraft cockpit noise conditions experienced in military aviation. In Part I, (Nixon CW, et al. Aviat Space Environ Med 1998; 69:675-83) female speech intelligibility measured in the spectra and levels of aircraft cockpit noises and with noise-canceling microphones was lower than that of the male speech in all conditions. However, the differences were small and only those at some of the highest noise levels were significant. Although speech intelligibility of both genders was acceptable during normal cruise noises, improvements are required in most of the highest levels of noise created during maximum aircraft operating conditions. These results are discussed in a Part I technical report. This Part II report examines the intelligibility in the same aircraft cockpit noises of vocoded female and male speech and the accuracy with which female and male speech in some of the cockpit noises were understood by automatic speech recognition systems. The intelligibility of vocoded female speech was generally the same as that of vocoded male speech. No significant differences were measured between the recognition accuracy of male and female speech by the automatic speech recognition systems. The intelligibility of female and male speech was equivalent for these conditions.

  2. Functional connectivity between face-movement and speech-intelligibility areas during auditory-only speech perception.

    PubMed

    Schall, Sonja; von Kriegstein, Katharina

    2014-01-01

    It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers' voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker's face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas.

  3. Recognition of Time-Compressed and Natural Speech with Selective Temporal Enhancements by Young and Elderly Listeners

    ERIC Educational Resources Information Center

    Gordon-Salant, Sandra; Fitzgibbons, Peter J.; Friedman, Sarah A.

    2007-01-01

    Purpose: The goal of this experiment was to determine whether selective slowing of speech segments improves recognition performance by young and elderly listeners. The hypotheses were (a) the benefits of time expansion occur for rapid speech but not for natural-rate speech, (b) selective time expansion of consonants produces greater score…

  4. Hierarchical singleton-type recurrent neural fuzzy networks for noisy speech recognition.

    PubMed

    Juang, Chia-Feng; Chiou, Chyi-Tian; Lai, Chun-Lung

    2007-05-01

    This paper proposes noisy speech recognition using hierarchical singleton-type recurrent neural fuzzy networks (HSRNFNs). The proposed HSRNFN is a hierarchical connection of two singleton-type recurrent neural fuzzy networks (SRNFNs), where one is used for noise filtering and the other for recognition. The SRNFN is constructed by recurrent fuzzy if-then rules with fuzzy singletons in the consequences, and their recurrent properties make them suitable for processing speech patterns with temporal characteristics. In n words recognition, n SRNFNs are created for modeling n words, where each SRNFN receives the current frame feature and predicts the next one of its modeling word. The prediction error of each SRNFN is used as recognition criterion. In filtering, one SRNFN is created, and each SRNFN recognizer is connected to the same SRNFN filter, which filters noisy speech patterns in the feature domain before feeding them to the SRNFN recognizer. Experiments with Mandarin word recognition under different types of noise are performed. Other recognizers, including multilayer perceptron (MLP), time-delay neural networks (TDNNs), and hidden Markov models (HMMs), are also tested and compared. These experiments and comparisons demonstrate good results with HSRNFN for noisy speech recognition tasks.

  5. Automatic speech recognition (ASR) based approach for speech therapy of aphasic patients: A review

    NASA Astrophysics Data System (ADS)

    Jamal, Norezmi; Shanta, Shahnoor; Mahmud, Farhanahani; Sha'abani, MNAH

    2017-09-01

    This paper reviews the state-of-the-art an automatic speech recognition (ASR) based approach for speech therapy of aphasic patients. Aphasia is a condition in which the affected person suffers from speech and language disorder resulting from a stroke or brain injury. Since there is a growing body of evidence indicating the possibility of improving the symptoms at an early stage, ASR based solutions are increasingly being researched for speech and language therapy. ASR is a technology that transfers human speech into transcript text by matching with the system's library. This is particularly useful in speech rehabilitation therapies as they provide accurate, real-time evaluation for speech input from an individual with speech disorder. ASR based approaches for speech therapy recognize the speech input from the aphasic patient and provide real-time feedback response to their mistakes. However, the accuracy of ASR is dependent on many factors such as, phoneme recognition, speech continuity, speaker and environmental differences as well as our depth of knowledge on human language understanding. Hence, the review examines recent development of ASR technologies and its performance for individuals with speech and language disorders.

  6. Robot Command Interface Using an Audio-Visual Speech Recognition System

    NASA Astrophysics Data System (ADS)

    Ceballos, Alexánder; Gómez, Juan; Prieto, Flavio; Redarce, Tanneguy

    In recent years audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This document presents a command's automatic recognition system using audio-visual information. The system is expected to control the laparoscopic robot da Vinci. The audio signal is treated using the Mel Frequency Cepstral Coefficients parametrization method. Besides, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used in order to extract the visual speech information.

  7. Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

    PubMed Central

    Caballero-Morales, Santiago-Omar

    2013-01-01

    An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, accuracy of 87–100% was achieved for the recognition of emotional state of Mexican Spanish speech. PMID:23935410

  8. Deterioration of Speech Recognition Ability Over a Period of 5 Years in Adults Ages 18 to 70 Years: Results of the Dutch Online Speech-in-Noise Test.

    PubMed

    Stam, Mariska; Smits, Cas; Twisk, Jos W R; Lemke, Ulrike; Festen, Joost M; Kramer, Sophia E

    2015-01-01

    The first aim of the present study was to determine the change in speech recognition in noise over a period of 5 years in participants ages 18 to 70 years at baseline. The second aim was to investigate whether age, gender, educational level, the level of initial speech recognition in noise, and reported chronic conditions were associated with a change in speech recognition in noise. The baseline and 5-year follow-up data of 427 participants with and without hearing impairment participating in the National Longitudinal Study on Hearing (NL-SH) were analyzed. The ability to recognize speech in noise was measured twice with the online National Hearing Test, a digit-triplet speech-in-noise test. Speech-reception-threshold in noise (SRTn) scores were calculated, corresponding to 50% speech intelligibility. Unaided SRTn scores obtained with the same transducer (headphones or loudspeakers) at both test moments were included. Changes in SRTn were calculated as a raw shift (T1 - T0) and an adjusted shift for regression towards the mean. Paired t tests and multivariable linear regression analyses were applied. The mean increase (i.e., deterioration) in SRTn was 0.38-dB signal-to-noise ratio (SNR) over 5 years (p < 0.001). Results of the multivariable regression analyses showed that the age group of 50 to 59 years had a significantly larger deterioration in SRTn compared with the age group of 18 to 39 years (raw shift: beta: 0.64-dB SNR; 95% confidence interval: 0.07-1.22; p = 0.028, adjusted for initial speech recognition level - adjusted shift: beta: 0.82-dB SNR; 95% confidence interval: 0.27-1.34; p = 0.004). Gender, educational level, and the number of chronic conditions were not associated with a change in SRTn over time. No significant differences in increase of SRTn were found between the initial levels of speech recognition (i.e., good, insufficient, or poor) when taking into account the phenomenon regression towards the mean. The study results indicate that hearing deterioration of speech recognition in noise over 5 years can also be detected in adults ages 18 to 70 years. This rather small numeric change might represent a relevant impact on an individual's ability to understand speech in everyday life.

  9. Applications of Text Analysis Tools for Spoken Response Grading

    ERIC Educational Resources Information Center

    Crossley, Scott; McNamara, Danielle

    2013-01-01

    This study explores the potential for automated indices related to speech delivery, language use, and topic development to model human judgments of TOEFL speaking proficiency in second language (L2) speech samples. For this study, 244 transcribed TOEFL speech samples taken from 244 L2 learners were analyzed using automated indices taken from…

  10. Enhancing speech recognition using improved particle swarm optimization based hidden Markov model.

    PubMed

    Selvaraj, Lokesh; Ganesan, Balakrishnan

    2014-01-01

    Enhancing speech recognition is the primary intention of this work. In this paper a novel speech recognition method based on vector quantization and improved particle swarm optimization (IPSO) is suggested. The suggested methodology contains four stages, namely, (i) denoising, (ii) feature mining (iii), vector quantization, and (iv) IPSO based hidden Markov model (HMM) technique (IP-HMM). At first, the speech signals are denoised using median filter. Next, characteristics such as peak, pitch spectrum, Mel frequency Cepstral coefficients (MFCC), mean, standard deviation, and minimum and maximum of the signal are extorted from the denoised signal. Following that, to accomplish the training process, the extracted characteristics are given to genetic algorithm based codebook generation in vector quantization. The initial populations are created by selecting random code vectors from the training set for the codebooks for the genetic algorithm process and IP-HMM helps in doing the recognition. At this point the creativeness will be done in terms of one of the genetic operation crossovers. The proposed speech recognition technique offers 97.14% accuracy.

  11. Speaker recognition with temporal cues in acoustic and electric hearing

    NASA Astrophysics Data System (ADS)

    Vongphoe, Michael; Zeng, Fan-Gang

    2005-08-01

    Natural spoken language processing includes not only speech recognition but also identification of the speaker's gender, age, emotional, and social status. Our purpose in this study is to evaluate whether temporal cues are sufficient to support both speech and speaker recognition. Ten cochlear-implant and six normal-hearing subjects were presented with vowel tokens spoken by three men, three women, two boys, and two girls. In one condition, the subject was asked to recognize the vowel. In the other condition, the subject was asked to identify the speaker. Extensive training was provided for the speaker recognition task. Normal-hearing subjects achieved nearly perfect performance in both tasks. Cochlear-implant subjects achieved good performance in vowel recognition but poor performance in speaker recognition. The level of the cochlear implant performance was functionally equivalent to normal performance with eight spectral bands for vowel recognition but only to one band for speaker recognition. These results show a disassociation between speech and speaker recognition with primarily temporal cues, highlighting the limitation of current speech processing strategies in cochlear implants. Several methods, including explicit encoding of fundamental frequency and frequency modulation, are proposed to improve speaker recognition for current cochlear implant users.

  12. A speech-controlled environmental control system for people with severe dysarthria.

    PubMed

    Hawley, Mark S; Enderby, Pam; Green, Phil; Cunningham, Stuart; Brownsell, Simon; Carmichael, James; Parker, Mark; Hatzis, Athanassios; O'Neill, Peter; Palmer, Rebecca

    2007-06-01

    Automatic speech recognition (ASR) can provide a rapid means of controlling electronic assistive technology. Off-the-shelf ASR systems function poorly for users with severe dysarthria because of the increased variability of their articulations. We have developed a limited vocabulary speaker dependent speech recognition application which has greater tolerance to variability of speech, coupled with a computerised training package which assists dysarthric speakers to improve the consistency of their vocalisations and provides more data for recogniser training. These applications, and their implementation as the interface for a speech-controlled environmental control system (ECS), are described. The results of field trials to evaluate the training program and the speech-controlled ECS are presented. The user-training phase increased the recognition rate from 88.5% to 95.4% (p<0.001). Recognition rates were good for people with even the most severe dysarthria in everyday usage in the home (mean word recognition rate 86.9%). Speech-controlled ECS were less accurate (mean task completion accuracy 78.6% versus 94.8%) but were faster to use than switch-scanning systems, even taking into account the need to repeat unsuccessful operations (mean task completion time 7.7s versus 16.9s, p<0.001). It is concluded that a speech-controlled ECS is a viable alternative to switch-scanning systems for some people with severe dysarthria and would lead, in many cases, to more efficient control of the home.

  13. Automated analysis of free speech predicts psychosis onset in high-risk youths

    PubMed Central

    Bedi, Gillinder; Carrillo, Facundo; Cecchi, Guillermo A; Slezak, Diego Fernández; Sigman, Mariano; Mota, Natália B; Ribeiro, Sidarta; Javitt, Daniel C; Copelli, Mauro; Corcoran, Cheryl M

    2015-01-01

    Background/Objectives: Psychiatry lacks the objective clinical tests routinely used in other specializations. Novel computerized methods to characterize complex behaviors such as speech could be used to identify and predict psychiatric illness in individuals. AIMS: In this proof-of-principle study, our aim was to test automated speech analyses combined with Machine Learning to predict later psychosis onset in youths at clinical high-risk (CHR) for psychosis. Methods: Thirty-four CHR youths (11 females) had baseline interviews and were assessed quarterly for up to 2.5 years; five transitioned to psychosis. Using automated analysis, transcripts of interviews were evaluated for semantic and syntactic features predicting later psychosis onset. Speech features were fed into a convex hull classification algorithm with leave-one-subject-out cross-validation to assess their predictive value for psychosis outcome. The canonical correlation between the speech features and prodromal symptom ratings was computed. Results: Derived speech features included a Latent Semantic Analysis measure of semantic coherence and two syntactic markers of speech complexity: maximum phrase length and use of determiners (e.g., which). These speech features predicted later psychosis development with 100% accuracy, outperforming classification from clinical interviews. Speech features were significantly correlated with prodromal symptoms. Conclusions: Findings support the utility of automated speech analysis to measure subtle, clinically relevant mental state changes in emergent psychosis. Recent developments in computer science, including natural language processing, could provide the foundation for future development of objective clinical tests for psychiatry. PMID:27336038

  14. Improving speech-in-noise recognition for children with hearing loss: Potential effects of language abilities, binaural summation, and head shadow

    PubMed Central

    Nittrouer, Susan; Caldwell-Tarr, Amanda; Tarr, Eric; Lowenstein, Joanna H.; Rice, Caitlin; Moberly, Aaron C.

    2014-01-01

    Objective: This study examined speech recognition in noise for children with hearing loss, compared it to recognition for children with normal hearing, and examined mechanisms that might explain variance in children’s abilities to recognize speech in noise. Design: Word recognition was measured in two levels of noise, both when the speech and noise were co-located in front and when the noise came separately from one side. Four mechanisms were examined as factors possibly explaining variance: vocabulary knowledge, sensitivity to phonological structure, binaural summation, and head shadow. Study sample: Participants were 113 eight-year-old children. Forty-eight had normal hearing (NH) and 65 had hearing loss: 18 with hearing aids (HAs), 19 with one cochlear implant (CI), and 28 with two CIs. Results: Phonological sensitivity explained a significant amount of between-groups variance in speech-in-noise recognition. Little evidence of binaural summation was found. Head shadow was similar in magnitude for children with NH and with CIs, regardless of whether they wore one or two CIs. Children with HAs showed reduced head shadow effects. Conclusion: These outcomes suggest that in order to improve speech-in-noise recognition for children with hearing loss, intervention needs to be comprehensive, focusing on both language abilities and auditory mechanisms. PMID:23834373

  15. Auditory training of speech recognition with interrupted and continuous noise maskers by children with hearing impairment

    PubMed Central

    Sullivan, Jessica R.; Thibodeau, Linda M.; Assmann, Peter F.

    2013-01-01

    Previous studies have indicated that individuals with normal hearing (NH) experience a perceptual advantage for speech recognition in interrupted noise compared to continuous noise. In contrast, adults with hearing impairment (HI) and younger children with NH receive a minimal benefit. The objective of this investigation was to assess whether auditory training in interrupted noise would improve speech recognition in noise for children with HI and perhaps enhance their utilization of glimpsing skills. A partially-repeated measures design was used to evaluate the effectiveness of seven 1-h sessions of auditory training in interrupted and continuous noise. Speech recognition scores in interrupted and continuous noise were obtained from pre-, post-, and 3 months post-training from 24 children with moderate-to-severe hearing loss. Children who participated in auditory training in interrupted noise demonstrated a significantly greater improvement in speech recognition compared to those who trained in continuous noise. Those who trained in interrupted noise demonstrated similar improvements in both noise conditions while those who trained in continuous noise only showed modest improvements in the interrupted noise condition. This study presents direct evidence that auditory training in interrupted noise can be beneficial in improving speech recognition in noise for children with HI. PMID:23297921

  16. A multimodal interface for real-time soldier-robot teaming

    NASA Astrophysics Data System (ADS)

    Barber, Daniel J.; Howard, Thomas M.; Walter, Matthew R.

    2016-05-01

    Recent research and advances in robotics have led to the development of novel platforms leveraging new sensing capabilities for semantic navigation. As these systems becoming increasingly more robust, they support highly complex commands beyond direct teleoperation and waypoint finding facilitating a transition away from robots as tools to robots as teammates. Supporting future Soldier-Robot teaming requires communication capabilities on par with human-human teams for successful integration of robots. Therefore, as robots increase in functionality, it is equally important that the interface between the Soldier and robot advances as well. Multimodal communication (MMC) enables human-robot teaming through redundancy and levels of communications more robust than single mode interaction. Commercial-off-the-shelf (COTS) technologies released in recent years for smart-phones and gaming provide tools for the creation of portable interfaces incorporating MMC through the use of speech, gestures, and visual displays. However, for multimodal interfaces to be successfully used in the military domain, they must be able to classify speech, gestures, and process natural language in real-time with high accuracy. For the present study, a prototype multimodal interface supporting real-time interactions with an autonomous robot was developed. This device integrated COTS Automated Speech Recognition (ASR), a custom gesture recognition glove, and natural language understanding on a tablet. This paper presents performance results (e.g. response times, accuracy) of the integrated device when commanding an autonomous robot to perform reconnaissance and surveillance activities in an unknown outdoor environment.

  17. Automatic Speech Recognition Predicts Speech Intelligibility and Comprehension for Listeners with Simulated Age-Related Hearing Loss

    ERIC Educational Resources Information Center

    Fontan, Lionel; Ferrané, Isabelle; Farinas, Jérôme; Pinquier, Julien; Tardieu, Julien; Magnen, Cynthia; Gaillard, Pascal; Aumont, Xavier; Füllgrabe, Christian

    2017-01-01

    Purpose: The purpose of this article is to assess speech processing for listeners with simulated age-related hearing loss (ARHL) and to investigate whether the observed performance can be replicated using an automatic speech recognition (ASR) system. The long-term goal of this research is to develop a system that will assist…

  18. The Relationship between Binaural Benefit and Difference in Unilateral Speech Recognition Performance for Bilateral Cochlear Implant Users

    PubMed Central

    Yoon, Yang-soo; Li, Yongxin; Kang, Hou-Yong; Fu, Qian-Jie

    2011-01-01

    Objective The full benefit of bilateral cochlear implants may depend on the unilateral performance with each device, the speech materials, processing ability of the user, and/or the listening environment. In this study, bilateral and unilateral speech performances were evaluated in terms of recognition of phonemes and sentences presented in quiet or in noise. Design Speech recognition was measured for unilateral left, unilateral right, and bilateral listening conditions; speech and noise were presented at 0° azimuth. The “binaural benefit” was defined as the difference between bilateral performance and unilateral performance with the better ear. Study Sample 9 adults with bilateral cochlear implants participated. Results On average, results showed a greater binaural benefit in noise than in quiet for all speech tests. More importantly, the binaural benefit was greater when unilateral performance was similar across ears. As the difference in unilateral performance between ears increased, the binaural advantage decreased; this functional relationship was observed across the different speech materials and noise levels even though there was substantial intra- and inter-subject variability. Conclusions The results indicate that subjects who show symmetry in speech recognition performance between implanted ears in general show a large binaural benefit. PMID:21696329

  19. Investigation of an HMM/ANN hybrid structure in pattern recognition application using cepstral analysis of dysarthric (distorted) speech signals.

    PubMed

    Polur, Prasad D; Miller, Gerald E

    2006-10-01

    Computer speech recognition of individuals with dysarthria, such as cerebral palsy patients requires a robust technique that can handle conditions of very high variability and limited training data. In this study, application of a 10 state ergodic hidden Markov model (HMM)/artificial neural network (ANN) hybrid structure for a dysarthric speech (isolated word) recognition system, intended to act as an assistive tool, was investigated. A small size vocabulary spoken by three cerebral palsy subjects was chosen. The effect of such a structure on the recognition rate of the system was investigated by comparing it with an ergodic hidden Markov model as a control tool. This was done in order to determine if this modified technique contributed to enhanced recognition of dysarthric speech. The speech was sampled at 11 kHz. Mel frequency cepstral coefficients were extracted from them using 15 ms frames and served as training input to the hybrid model setup. The subsequent results demonstrated that the hybrid model structure was quite robust in its ability to handle the large variability and non-conformity of dysarthric speech. The level of variability in input dysarthric speech patterns sometimes limits the reliability of the system. However, its application as a rehabilitation/control tool to assist dysarthric motor impaired individuals holds sufficient promise.

  20. Effects of intelligibility on working memory demand for speech perception.

    PubMed

    Francis, Alexander L; Nusbaum, Howard C

    2009-08-01

    Understanding low-intelligibility speech is effortful. In three experiments, we examined the effects of intelligibility on working memory (WM) demands imposed by perception of synthetic speech. In all three experiments, a primary speeded word recognition task was paired with a secondary WM-load task designed to vary the availability of WM capacity during speech perception. Speech intelligibility was varied either by training listeners to use available acoustic cues in a more diagnostic manner (as in Experiment 1) or by providing listeners with more informative acoustic cues (i.e., better speech quality, as in Experiments 2 and 3). In the first experiment, training significantly improved intelligibility and recognition speed; increasing WM load significantly slowed recognition. A significant interaction between training and load indicated that the benefit of training on recognition speed was observed only under low memory load. In subsequent experiments, listeners received no training; intelligibility was manipulated by changing synthesizers. Improving intelligibility without training improved recognition accuracy, and increasing memory load still decreased it, but more intelligible speech did not produce more efficient use of available WM capacity. This suggests that perceptual learning modifies the way available capacity is used, perhaps by increasing the use of more phonetically informative features and/or by decreasing use of less informative ones.

  1. Self-Assessed Hearing Handicap in Older Adults With Poorer-Than-Predicted Speech Recognition in Noise.

    PubMed

    Eckert, Mark A; Matthews, Lois J; Dubno, Judy R

    2017-01-01

    Even older adults with relatively mild hearing loss report hearing handicap, suggesting that hearing handicap is not completely explained by reduced speech audibility. We examined the extent to which self-assessed ratings of hearing handicap using the Hearing Handicap Inventory for the Elderly (HHIE; Ventry & Weinstein, 1982) were significantly associated with measures of speech recognition in noise that controlled for differences in speech audibility. One hundred sixty-two middle-aged and older adults had HHIE total scores that were significantly associated with audibility-adjusted measures of speech recognition for low-context but not high-context sentences. These findings were driven by HHIE items involving negative feelings related to communication difficulties that also captured variance in subjective ratings of effort and frustration that predicted speech recognition. The average pure-tone threshold accounted for some of the variance in the association between the HHIE and audibility-adjusted speech recognition, suggesting an effect of central and peripheral auditory system decline related to elevated thresholds. The accumulation of difficult listening experiences appears to produce a self-assessment of hearing handicap resulting from (a) reduced audibility of stimuli, (b) declines in the central and peripheral auditory system function, and (c) additional individual variation in central nervous system function.

  2. Self-Assessed Hearing Handicap in Older Adults With Poorer-Than-Predicted Speech Recognition in Noise

    PubMed Central

    Matthews, Lois J.; Dubno, Judy R.

    2017-01-01

    Purpose Even older adults with relatively mild hearing loss report hearing handicap, suggesting that hearing handicap is not completely explained by reduced speech audibility. Method We examined the extent to which self-assessed ratings of hearing handicap using the Hearing Handicap Inventory for the Elderly (HHIE; Ventry & Weinstein, 1982) were significantly associated with measures of speech recognition in noise that controlled for differences in speech audibility. Results One hundred sixty-two middle-aged and older adults had HHIE total scores that were significantly associated with audibility-adjusted measures of speech recognition for low-context but not high-context sentences. These findings were driven by HHIE items involving negative feelings related to communication difficulties that also captured variance in subjective ratings of effort and frustration that predicted speech recognition. The average pure-tone threshold accounted for some of the variance in the association between the HHIE and audibility-adjusted speech recognition, suggesting an effect of central and peripheral auditory system decline related to elevated thresholds. Conclusion The accumulation of difficult listening experiences appears to produce a self-assessment of hearing handicap resulting from (a) reduced audibility of stimuli, (b) declines in the central and peripheral auditory system function, and (c) additional individual variation in central nervous system function. PMID:28060993

  3. Continuous multiword recognition performance of young and elderly listeners in ambient noise

    NASA Astrophysics Data System (ADS)

    Sato, Hiroshi

    2005-09-01

    Hearing threshold shift due to aging is known as a dominant factor to degrade speech recognition performance in noisy conditions. On the other hand, cognitive factors of aging-relating speech recognition performance in various speech-to-noise conditions are not well established. In this study, two kinds of speech test were performed to examine how working memory load relates to speech recognition performance. One is word recognition test with high-familiarity, four-syllable Japanese words (single-word test). In this test, each word was presented to listeners; the listeners were asked to write the word down on paper with enough time to answer. In the other test, five continuous word were presented to listeners and listeners were asked to write the word down after just five words were presented (multiword test). Both tests were done in various speech-to-noise ratios under 50-dBA Hoth spectrum noise with more than 50 young and elderly subjects. The results of two experiments suggest that (1) Hearing level is related to scores of both tests. (2) Scores of single-word test are well correlated with those of multiword test. (3) Scores of multiword test are not improved as speech-to-noise ratio improves in the condition where scores of single-word test reach their ceiling.

  4. Speech recognition: Acoustic phonetic and lexical knowledge representation

    NASA Astrophysics Data System (ADS)

    Zue, V. W.

    1983-02-01

    The purpose of this program is to develop a speech data base facility under which the acoustic characteristics of speech sounds in various contexts can be studied conveniently; investigate the phonological properties of a large lexicon of, say 10,000 words, and determine to what extent the phontactic constraints can be utilized in speech recognition; study the acoustic cues that are used to mark work boundaries; develop a test bed in the form of a large-vocabulary, IWR system to study the interactions of acoustic, phonetic and lexical knowledge; and develop a limited continuous speech recognition system with the goal of recognizing any English word from its spelling in order to assess the interactions of higher-level knowledge sources.

  5. Neural speech recognition: continuous phoneme decoding using spatiotemporal representations of human cortical activity

    NASA Astrophysics Data System (ADS)

    Moses, David A.; Mesgarani, Nima; Leonard, Matthew K.; Chang, Edward F.

    2016-10-01

    Objective. The superior temporal gyrus (STG) and neighboring brain regions play a key role in human language processing. Previous studies have attempted to reconstruct speech information from brain activity in the STG, but few of them incorporate the probabilistic framework and engineering methodology used in modern speech recognition systems. In this work, we describe the initial efforts toward the design of a neural speech recognition (NSR) system that performs continuous phoneme recognition on English stimuli with arbitrary vocabulary sizes using the high gamma band power of local field potentials in the STG and neighboring cortical areas obtained via electrocorticography. Approach. The system implements a Viterbi decoder that incorporates phoneme likelihood estimates from a linear discriminant analysis model and transition probabilities from an n-gram phonemic language model. Grid searches were used in an attempt to determine optimal parameterizations of the feature vectors and Viterbi decoder. Main results. The performance of the system was significantly improved by using spatiotemporal representations of the neural activity (as opposed to purely spatial representations) and by including language modeling and Viterbi decoding in the NSR system. Significance. These results emphasize the importance of modeling the temporal dynamics of neural responses when analyzing their variations with respect to varying stimuli and demonstrate that speech recognition techniques can be successfully leveraged when decoding speech from neural signals. Guided by the results detailed in this work, further development of the NSR system could have applications in the fields of automatic speech recognition and neural prosthetics.

  6. Functional Connectivity between Face-Movement and Speech-Intelligibility Areas during Auditory-Only Speech Perception

    PubMed Central

    Schall, Sonja; von Kriegstein, Katharina

    2014-01-01

    It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers’ voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker’s face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas. PMID:24466026

  7. A preliminary comparison of speech recognition functionality in dental practice management systems.

    PubMed

    Irwin, Jeannie Y; Schleyer, Titus

    2008-11-06

    In this study, we examined speech recognition functionality in four leading dental practice management systems. Twenty dental students used voice to chart a simulated patient with 18 findings in each system. Results show it can take over a minute to chart one finding and that users frequently have to repeat commands. Limited functionality, poor usability and a high error rate appear to retard adoption of speech recognition in dentistry.

  8. Two Stage Data Augmentation for Low Resourced Speech Recognition (Author’s Manuscript)

    DTIC Science & Technology

    2016-09-12

    speech recognition, deep neural networks, data augmentation 1. Introduction When training data is limited—whether it be audio or text—the obvious...Schwartz, and S. Tsakalidis, “Enhancing low resource keyword spotting with au- tomatically retrieved web documents,” in Interspeech, 2015, pp. 839–843. [2...and F. Seide, “Feature learning in deep neural networks - a study on speech recognition tasks,” in International Conference on Learning Representations

  9. Getting What You Want: Accurate Document Filtering in a Terabyte World

    DTIC Science & Technology

    2002-11-01

    models are used widely in speech recognition and have shown promise for ad-hoc information retrieval (Ponte and Croft, 1998; Lafferty and Zhai, 2001...tasks is focused on developing techniques similar to those used in speech recognition. However the differing requirements of speech recognition and...Conference on Research and Development in Information Retrieval. ACM. 6. T.Ault, and Y. Yang. (2001.) kNN at TREC-9: A failure analysis. In

  10. V2S: Voice to Sign Language Translation System for Malaysian Deaf People

    NASA Astrophysics Data System (ADS)

    Mean Foong, Oi; Low, Tang Jung; La, Wai Wan

    The process of learning and understand the sign language may be cumbersome to some, and therefore, this paper proposes a solution to this problem by providing a voice (English Language) to sign language translation system using Speech and Image processing technique. Speech processing which includes Speech Recognition is the study of recognizing the words being spoken, regardless of whom the speaker is. This project uses template-based recognition as the main approach in which the V2S system first needs to be trained with speech pattern based on some generic spectral parameter set. These spectral parameter set will then be stored as template in a database. The system will perform the recognition process through matching the parameter set of the input speech with the stored templates to finally display the sign language in video format. Empirical results show that the system has 80.3% recognition rate.

  11. Hemispheric lateralization of linguistic prosody recognition in comparison to speech and speaker recognition.

    PubMed

    Kreitewolf, Jens; Friederici, Angela D; von Kriegstein, Katharina

    2014-11-15

    Hemispheric specialization for linguistic prosody is a controversial issue. While it is commonly assumed that linguistic prosody and emotional prosody are preferentially processed in the right hemisphere, neuropsychological work directly comparing processes of linguistic prosody and emotional prosody suggests a predominant role of the left hemisphere for linguistic prosody processing. Here, we used two functional magnetic resonance imaging (fMRI) experiments to clarify the role of left and right hemispheres in the neural processing of linguistic prosody. In the first experiment, we sought to confirm previous findings showing that linguistic prosody processing compared to other speech-related processes predominantly involves the right hemisphere. Unlike previous studies, we controlled for stimulus influences by employing a prosody and speech task using the same speech material. The second experiment was designed to investigate whether a left-hemispheric involvement in linguistic prosody processing is specific to contrasts between linguistic prosody and emotional prosody or whether it also occurs when linguistic prosody is contrasted against other non-linguistic processes (i.e., speaker recognition). Prosody and speaker tasks were performed on the same stimulus material. In both experiments, linguistic prosody processing was associated with activity in temporal, frontal, parietal and cerebellar regions. Activation in temporo-frontal regions showed differential lateralization depending on whether the control task required recognition of speech or speaker: recognition of linguistic prosody predominantly involved right temporo-frontal areas when it was contrasted against speech recognition; when contrasted against speaker recognition, recognition of linguistic prosody predominantly involved left temporo-frontal areas. The results show that linguistic prosody processing involves functions of both hemispheres and suggest that recognition of linguistic prosody is based on an inter-hemispheric mechanism which exploits both a right-hemispheric sensitivity to pitch information and a left-hemispheric dominance in speech processing. Copyright © 2014 Elsevier Inc. All rights reserved.

  12. Linguistic contributions to speech-on-speech masking for native and non-native listeners: language familiarity and semantic content.

    PubMed

    Brouwer, Susanne; Van Engen, Kristin J; Calandruccio, Lauren; Bradlow, Ann R

    2012-02-01

    This study examined whether speech-on-speech masking is sensitive to variation in the degree of similarity between the target and the masker speech. Three experiments investigated whether speech-in-speech recognition varies across different background speech languages (English vs Dutch) for both English and Dutch targets, as well as across variation in the semantic content of the background speech (meaningful vs semantically anomalous sentences), and across variation in listener status vis-à-vis the target and masker languages (native, non-native, or unfamiliar). The results showed that the more similar the target speech is to the masker speech (e.g., same vs different language, same vs different levels of semantic content), the greater the interference on speech recognition accuracy. Moreover, the listener's knowledge of the target and the background language modulate the size of the release from masking. These factors had an especially strong effect on masking effectiveness in highly unfavorable listening conditions. Overall this research provided evidence that that the degree of target-masker similarity plays a significant role in speech-in-speech recognition. The results also give insight into how listeners assign their resources differently depending on whether they are listening to their first or second language. © 2012 Acoustical Society of America

  13. Linguistic contributions to speech-on-speech masking for native and non-native listeners: Language familiarity and semantic content

    PubMed Central

    Brouwer, Susanne; Van Engen, Kristin J.; Calandruccio, Lauren; Bradlow, Ann R.

    2012-01-01

    This study examined whether speech-on-speech masking is sensitive to variation in the degree of similarity between the target and the masker speech. Three experiments investigated whether speech-in-speech recognition varies across different background speech languages (English vs Dutch) for both English and Dutch targets, as well as across variation in the semantic content of the background speech (meaningful vs semantically anomalous sentences), and across variation in listener status vis-à-vis the target and masker languages (native, non-native, or unfamiliar). The results showed that the more similar the target speech is to the masker speech (e.g., same vs different language, same vs different levels of semantic content), the greater the interference on speech recognition accuracy. Moreover, the listener’s knowledge of the target and the background language modulate the size of the release from masking. These factors had an especially strong effect on masking effectiveness in highly unfavorable listening conditions. Overall this research provided evidence that that the degree of target-masker similarity plays a significant role in speech-in-speech recognition. The results also give insight into how listeners assign their resources differently depending on whether they are listening to their first or second language. PMID:22352516

  14. Speech Recognition in Adults With Cochlear Implants: The Effects of Working Memory, Phonological Sensitivity, and Aging.

    PubMed

    Moberly, Aaron C; Harris, Michael S; Boyce, Lauren; Nittrouer, Susan

    2017-04-14

    Models of speech recognition suggest that "top-down" linguistic and cognitive functions, such as use of phonotactic constraints and working memory, facilitate recognition under conditions of degradation, such as in noise. The question addressed in this study was what happens to these functions when a listener who has experienced years of hearing loss obtains a cochlear implant. Thirty adults with cochlear implants and 30 age-matched controls with age-normal hearing underwent testing of verbal working memory using digit span and serial recall of words. Phonological capacities were assessed using a lexical decision task and nonword repetition. Recognition of words in sentences in speech-shaped noise was measured. Implant users had only slightly poorer working memory accuracy than did controls and only on serial recall of words; however, phonological sensitivity was highly impaired. Working memory did not facilitate speech recognition in noise for either group. Phonological sensitivity predicted sentence recognition for implant users but not for listeners with normal hearing. Clinical speech recognition outcomes for adult implant users relate to the ability of these users to process phonological information. Results suggest that phonological capacities may serve as potential clinical targets through rehabilitative training. Such novel interventions may be particularly helpful for older adult implant users.

  15. Speech Recognition in Adults With Cochlear Implants: The Effects of Working Memory, Phonological Sensitivity, and Aging

    PubMed Central

    Harris, Michael S.; Boyce, Lauren; Nittrouer, Susan

    2017-01-01

    Purpose Models of speech recognition suggest that “top-down” linguistic and cognitive functions, such as use of phonotactic constraints and working memory, facilitate recognition under conditions of degradation, such as in noise. The question addressed in this study was what happens to these functions when a listener who has experienced years of hearing loss obtains a cochlear implant. Method Thirty adults with cochlear implants and 30 age-matched controls with age-normal hearing underwent testing of verbal working memory using digit span and serial recall of words. Phonological capacities were assessed using a lexical decision task and nonword repetition. Recognition of words in sentences in speech-shaped noise was measured. Results Implant users had only slightly poorer working memory accuracy than did controls and only on serial recall of words; however, phonological sensitivity was highly impaired. Working memory did not facilitate speech recognition in noise for either group. Phonological sensitivity predicted sentence recognition for implant users but not for listeners with normal hearing. Conclusion Clinical speech recognition outcomes for adult implant users relate to the ability of these users to process phonological information. Results suggest that phonological capacities may serve as potential clinical targets through rehabilitative training. Such novel interventions may be particularly helpful for older adult implant users. PMID:28384805

  16. Effects of Semantic Context and Fundamental Frequency Contours on Mandarin Speech Recognition by Second Language Learners.

    PubMed

    Zhang, Linjun; Li, Yu; Wu, Han; Li, Xin; Shu, Hua; Zhang, Yang; Li, Ping

    2016-01-01

    Speech recognition by second language (L2) learners in optimal and suboptimal conditions has been examined extensively with English as the target language in most previous studies. This study extended existing experimental protocols (Wang et al., 2013) to investigate Mandarin speech recognition by Japanese learners of Mandarin at two different levels (elementary vs. intermediate) of proficiency. The overall results showed that in addition to L2 proficiency, semantic context, F0 contours, and listening condition all affected the recognition performance on the Mandarin sentences. However, the effects of semantic context and F0 contours on L2 speech recognition diverged to some extent. Specifically, there was significant modulation effect of listening condition on semantic context, indicating that L2 learners made use of semantic context less efficiently in the interfering background than in quiet. In contrast, no significant modulation effect of listening condition on F0 contours was found. Furthermore, there was significant interaction between semantic context and F0 contours, indicating that semantic context becomes more important for L2 speech recognition when F0 information is degraded. None of these effects were found to be modulated by L2 proficiency. The discrepancy in the effects of semantic context and F0 contours on L2 speech recognition in the interfering background might be related to differences in processing capacities required by the two types of information in adverse listening conditions.

  17. Speech Recognition Thresholds for Multilingual Populations.

    ERIC Educational Resources Information Center

    Ramkissoon, Ishara

    2001-01-01

    This article traces the development of speech audiometry in the United States and reports on the current status, focusing on the needs of a multilingual population in terms of measuring speech recognition threshold (SRT). It also discusses sociolinguistic considerations, alternative SRT stimuli for second language learners, and research on using…

  18. Random Deep Belief Networks for Recognizing Emotions from Speech Signals.

    PubMed

    Wen, Guihua; Li, Huihui; Huang, Jubing; Li, Danyang; Xun, Eryang

    2017-01-01

    Now the human emotions can be recognized from speech signals using machine learning methods; however, they are challenged by the lower recognition accuracies in real applications due to lack of the rich representation ability. Deep belief networks (DBN) can automatically discover the multiple levels of representations in speech signals. To make full of its advantages, this paper presents an ensemble of random deep belief networks (RDBN) method for speech emotion recognition. It firstly extracts the low level features of the input speech signal and then applies them to construct lots of random subspaces. Each random subspace is then provided for DBN to yield the higher level features as the input of the classifier to output an emotion label. All outputted emotion labels are then fused through the majority voting to decide the final emotion label for the input speech signal. The conducted experimental results on benchmark speech emotion databases show that RDBN has better accuracy than the compared methods for speech emotion recognition.

  19. Random Deep Belief Networks for Recognizing Emotions from Speech Signals

    PubMed Central

    Li, Huihui; Huang, Jubing; Li, Danyang; Xun, Eryang

    2017-01-01

    Now the human emotions can be recognized from speech signals using machine learning methods; however, they are challenged by the lower recognition accuracies in real applications due to lack of the rich representation ability. Deep belief networks (DBN) can automatically discover the multiple levels of representations in speech signals. To make full of its advantages, this paper presents an ensemble of random deep belief networks (RDBN) method for speech emotion recognition. It firstly extracts the low level features of the input speech signal and then applies them to construct lots of random subspaces. Each random subspace is then provided for DBN to yield the higher level features as the input of the classifier to output an emotion label. All outputted emotion labels are then fused through the majority voting to decide the final emotion label for the input speech signal. The conducted experimental results on benchmark speech emotion databases show that RDBN has better accuracy than the compared methods for speech emotion recognition. PMID:28356908

  20. Speech recognition for embedded automatic positioner for laparoscope

    NASA Astrophysics Data System (ADS)

    Chen, Xiaodong; Yin, Qingyun; Wang, Yi; Yu, Daoyin

    2014-07-01

    In this paper a novel speech recognition methodology based on Hidden Markov Model (HMM) is proposed for embedded Automatic Positioner for Laparoscope (APL), which includes a fixed point ARM processor as the core. The APL system is designed to assist the doctor in laparoscopic surgery, by implementing the specific doctor's vocal control to the laparoscope. Real-time respond to the voice commands asks for more efficient speech recognition algorithm for the APL. In order to reduce computation cost without significant loss in recognition accuracy, both arithmetic and algorithmic optimizations are applied in the method presented. First, depending on arithmetic optimizations most, a fixed point frontend for speech feature analysis is built according to the ARM processor's character. Then the fast likelihood computation algorithm is used to reduce computational complexity of the HMM-based recognition algorithm. The experimental results show that, the method shortens the recognition time within 0.5s, while the accuracy higher than 99%, demonstrating its ability to achieve real-time vocal control to the APL.

  1. Investigation of potential cognitive tests for use with older adults in audiology clinics.

    PubMed

    Vaughan, Nancy; Storzbach, Daniel; Furukawa, Izumi

    2008-01-01

    Cognitive declines in working memory and processing speed are hallmarks of aging. Deficits in speech understanding also are seen in aging individuals. A clinical test to determine whether the cognitive aging changes contribute to aging speech understanding difficulties would be helpful for determining rehabilitation strategies in audiology clinics. To identify a clinical neurocognitive test or battery of tests that could be used in audiology clinics to help explain deficits in speech recognition in some older listeners. A correlational study examining the association between certain cognitive test scores and speech recognition performance. Speeded (time-compressed) speech was used to increase the cognitive processing load. Two hundred twenty-five adults aged 50 through 75 years were participants in this study. Both batteries of tests were administered to all participants in two separate sessions. A selected battery of neurocognitive tests and a time-compressed speech recognition test battery using various rates of speech were administered. Principal component analysis was used to extract the important component factors from each set of tests, and regression models were constructed to examine the association between tests and to identify the neurocognitive test most strongly associated with speech recognition performance. A sequencing working memory test (Letter-Number Sequencing [LNS]) was most strongly associated with rapid speech understanding. The association between the LNS test results and the compressed sentence recognition scores (CSRS) was strong even when age and hearing loss were controlled. The LNS is a sequencing test that provides information about temporal processing at the cognitive level and may prove useful in diagnosis of speech understanding problems, and in the development of aural rehabilitation and training strategies.

  2. Recognition of voice commands using adaptation of foreign language speech recognizer via selection of phonetic transcriptions

    NASA Astrophysics Data System (ADS)

    Maskeliunas, Rytis; Rudzionis, Vytautas

    2011-06-01

    In recent years various commercial speech recognizers have become available. These recognizers provide the possibility to develop applications incorporating various speech recognition techniques easily and quickly. All of these commercial recognizers are typically targeted to widely spoken languages having large market potential; however, it may be possible to adapt available commercial recognizers for use in environments where less widely spoken languages are used. Since most commercial recognition engines are closed systems the single avenue for the adaptation is to try set ways for the selection of proper phonetic transcription methods between the two languages. This paper deals with the methods to find the phonetic transcriptions for Lithuanian voice commands to be recognized using English speech engines. The experimental evaluation showed that it is possible to find phonetic transcriptions that will enable the recognition of Lithuanian voice commands with recognition accuracy of over 90%.

  3. Relating dynamic brain states to dynamic machine states: Human and machine solutions to the speech recognition problem

    PubMed Central

    Liu, Xunying; Zhang, Chao; Woodland, Phil; Fonteneau, Elisabeth

    2017-01-01

    There is widespread interest in the relationship between the neurobiological systems supporting human cognition and emerging computational systems capable of emulating these capacities. Human speech comprehension, poorly understood as a neurobiological process, is an important case in point. Automatic Speech Recognition (ASR) systems with near-human levels of performance are now available, which provide a computationally explicit solution for the recognition of words in continuous speech. This research aims to bridge the gap between speech recognition processes in humans and machines, using novel multivariate techniques to compare incremental ‘machine states’, generated as the ASR analysis progresses over time, to the incremental ‘brain states’, measured using combined electro- and magneto-encephalography (EMEG), generated as the same inputs are heard by human listeners. This direct comparison of dynamic human and machine internal states, as they respond to the same incrementally delivered sensory input, revealed a significant correspondence between neural response patterns in human superior temporal cortex and the structural properties of ASR-derived phonetic models. Spatially coherent patches in human temporal cortex responded selectively to individual phonetic features defined on the basis of machine-extracted regularities in the speech to lexicon mapping process. These results demonstrate the feasibility of relating human and ASR solutions to the problem of speech recognition, and suggest the potential for further studies relating complex neural computations in human speech comprehension to the rapidly evolving ASR systems that address the same problem domain. PMID:28945744

  4. Speech Recognition for Medical Dictation: Overview in Quebec and Systematic Review.

    PubMed

    Poder, Thomas G; Fisette, Jean-François; Déry, Véronique

    2018-04-03

    Speech recognition is increasingly used in medical reporting. The aim of this article is to identify in the literature the strengths and weaknesses of this technology, as well as barriers to and facilitators of its implementation. A systematic review of systematic reviews was performed using PubMed, Scopus, the Cochrane Library and the Center for Reviews and Dissemination through August 2017. The gray literature has also been consulted. The quality of systematic reviews has been assessed with the AMSTAR checklist. The main inclusion criterion was use of speech recognition for medical reporting (front-end or back-end). A survey has also been conducted in Quebec, Canada, to identify the dissemination of this technology in this province, as well as the factors leading to the success or failure of its implementation. Five systematic reviews were identified. These reviews indicated a high level of heterogeneity across studies. The quality of the studies reported was generally poor. Speech recognition is not as accurate as human transcription, but it can dramatically reduce turnaround times for reporting. In front-end use, medical doctors need to spend more time on dictation and correction than required with human transcription. With speech recognition, major errors occur up to three times more frequently. In back-end use, a potential increase in productivity of transcriptionists was noted. In conclusion, speech recognition offers several advantages for medical reporting. However, these advantages are countered by an increased burden on medical doctors and by risks of additional errors in medical reports. It is also hard to identify for which medical specialties and which clinical activities the use of speech recognition will be the most beneficial.

  5. Differences in Speech Recognition Between Children with Attention Deficits and Typically Developed Children Disappear When Exposed to 65 dB of Auditory Noise

    PubMed Central

    Söderlund, Göran B. W.; Jobs, Elisabeth Nilsson

    2016-01-01

    The most common neuropsychiatric condition in the in children is attention deficit hyperactivity disorder (ADHD), affecting ∼6–9% of the population. ADHD is distinguished by inattention and hyperactive, impulsive behaviors as well as poor performance in various cognitive tasks often leading to failures at school. Sensory and perceptual dysfunctions have also been noticed. Prior research has mainly focused on limitations in executive functioning where differences are often explained by deficits in pre-frontal cortex activation. Less notice has been given to sensory perception and subcortical functioning in ADHD. Recent research has shown that children with ADHD diagnosis have a deviant auditory brain stem response compared to healthy controls. The aim of the present study was to investigate if the speech recognition threshold differs between attentive and children with ADHD symptoms in two environmental sound conditions, with and without external noise. Previous research has namely shown that children with attention deficits can benefit from white noise exposure during cognitive tasks and here we investigate if noise benefit is present during an auditory perceptual task. For this purpose we used a modified Hagerman’s speech recognition test where children with and without attention deficits performed a binaural speech recognition task to assess the speech recognition threshold in no noise and noise conditions (65 dB). Results showed that the inattentive group displayed a higher speech recognition threshold than typically developed children and that the difference in speech recognition threshold disappeared when exposed to noise at supra threshold level. From this we conclude that inattention can partly be explained by sensory perceptual limitations that can possibly be ameliorated through noise exposure. PMID:26858679

  6. Strategies for distant speech recognitionin reverberant environments

    NASA Astrophysics Data System (ADS)

    Delcroix, Marc; Yoshioka, Takuya; Ogawa, Atsunori; Kubo, Yotaro; Fujimoto, Masakiyo; Ito, Nobutaka; Kinoshita, Keisuke; Espi, Miquel; Araki, Shoko; Hori, Takaaki; Nakatani, Tomohiro

    2015-12-01

    Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. In this paper, we review a recognition system that we developed at our laboratory to deal with reverberant speech. The system consists of a speech enhancement (SE) front-end that employs long-term linear prediction-based dereverberation followed by noise reduction. We combine our SE front-end with an ASR back-end that uses neural networks for acoustic and language modeling. The proposed system achieved top scores on the ASR task of the REVERB challenge. This paper describes the different technologies used in our system and presents detailed experimental results that justify our implementation choices and may provide hints for designing distant ASR systems.

  7. Optimal pattern synthesis for speech recognition based on principal component analysis

    NASA Astrophysics Data System (ADS)

    Korsun, O. N.; Poliyev, A. V.

    2018-02-01

    The algorithm for building an optimal pattern for the purpose of automatic speech recognition, which increases the probability of correct recognition, is developed and presented in this work. The optimal pattern forming is based on the decomposition of an initial pattern to principal components, which enables to reduce the dimension of multi-parameter optimization problem. At the next step the training samples are introduced and the optimal estimates for principal components decomposition coefficients are obtained by a numeric parameter optimization algorithm. Finally, we consider the experiment results that show the improvement in speech recognition introduced by the proposed optimization algorithm.

  8. Age and measurement time-of-day effects on speech recognition in noise.

    PubMed

    Veneman, Carrie E; Gordon-Salant, Sandra; Matthews, Lois J; Dubno, Judy R

    2013-01-01

    The purpose of this study was to determine the effect of measurement time of day on speech recognition in noise and the extent to which time-of-day effects differ with age. Older adults tend to have more difficulty understanding speech in noise than younger adults, even when hearing is normal. Two possible contributors to this age difference in speech recognition may be measurement time of day and inhibition. Most younger adults are "evening-type," showing peak circadian arousal in the evening, whereas most older adults are "morning-type," with circadian arousal peaking in the morning. Tasks that require inhibition of irrelevant information have been shown to be affected by measurement time of day, with maximum performance attained at one's peak time of day. The authors hypothesized that a change in inhibition will be associated with measurement time of day and therefore affect speech recognition in noise, with better performance in the morning for older adults and in the evening for younger adults. Fifteen younger evening-type adults (20-28 years) and 15 older morning-type adults with normal hearing (66-78 years) listened to the Hearing in Noise Test (HINT) and the Quick Speech in Noise (QuickSIN) test in the morning and evening (peak and off-peak times). Time of day preference was assessed using the Morningness-Eveningness Questionnaire. Sentences and noise were presented binaurally through insert earphones. During morning and evening sessions, participants solved word-association problems within the visual-distraction task (VDT), which was used as an estimate of inhibition. After each session, participants rated perceived mental demand of the tasks using a revised version of the NASA Task Load Index. Younger adults performed significantly better on the speech-in-noise tasks and rated themselves as requiring significantly less mental demand when tested at their peak (evening) than off-peak (morning) time of day. In contrast, time-of-day effects were not observed for the older adults on the speech recognition or rating tasks. Although older adults required significantly more advantageous signal-to-noise ratios than younger adults for equivalent speech-recognition performance, a significantly larger younger versus older age difference in speech recognition was observed in the evening than in the morning. Older adults performed significantly poorer than younger adults on the VDT, but performance was not affected by measurement time of day. VDT performance for misleading distracter items was significantly correlated with HINT and QuickSIN test performance at the peak measurement time of day. Although all participants had normal hearing, speech recognition in noise was significantly poorer for older than younger adults, with larger age-related differences in the evening (an off-peak time for older adults) than in the morning. The significant effect of measurement time of day suggests that this factor may impact the clinical assessment of speech recognition in noise for all individuals. It appears that inhibition, as estimated by a visual distraction task for misleading visual items, is a cognitive mechanism that is related to speech-recognition performance in noise, at least at a listener's peak time of day.

  9. Feasibility of automated speech sample collection with stuttering children using interactive voice response (IVR) technology.

    PubMed

    Vogel, Adam P; Block, Susan; Kefalianos, Elaina; Onslow, Mark; Eadie, Patricia; Barth, Ben; Conway, Laura; Mundt, James C; Reilly, Sheena

    2015-04-01

    To investigate the feasibility of adopting automated interactive voice response (IVR) technology for remotely capturing standardized speech samples from stuttering children. Participants were 10 6-year-old stuttering children. Their parents called a toll-free number from their homes and were prompted to elicit speech from their children using a standard protocol involving conversation, picture description and games. The automated IVR system was implemented using an off-the-shelf telephony software program and delivered by a standard desktop computer. The software infrastructure utilizes voice over internet protocol. Speech samples were automatically recorded during the calls. Video recordings were simultaneously acquired in the home at the time of the call to evaluate the fidelity of the telephone collected samples. Key outcome measures included syllables spoken, percentage of syllables stuttered and an overall rating of stuttering severity using a 10-point scale. Data revealed a high level of relative reliability in terms of intra-class correlation between the video and telephone acquired samples on all outcome measures during the conversation task. Findings were less consistent for speech samples during picture description and games. Results suggest that IVR technology can be used successfully to automate remote capture of child speech samples.

  10. Multitasking During Degraded Speech Recognition in School-Age Children

    PubMed Central

    Ward, Kristina M.; Brehm, Laurel

    2017-01-01

    Multitasking requires individuals to allocate their cognitive resources across different tasks. The purpose of the current study was to assess school-age children’s multitasking abilities during degraded speech recognition. Children (8 to 12 years old) completed a dual-task paradigm including a sentence recognition (primary) task containing speech that was either unprocessed or noise-band vocoded with 8, 6, or 4 spectral channels and a visual monitoring (secondary) task. Children’s accuracy and reaction time on the visual monitoring task was quantified during the dual-task paradigm in each condition of the primary task and compared with single-task performance. Children experienced dual-task costs in the 6- and 4-channel conditions of the primary speech recognition task with decreased accuracy on the visual monitoring task relative to baseline performance. In all conditions, children’s dual-task performance on the visual monitoring task was strongly predicted by their single-task (baseline) performance on the task. Results suggest that children’s proficiency with the secondary task contributes to the magnitude of dual-task costs while multitasking during degraded speech recognition. PMID:28105890

  11. Multitasking During Degraded Speech Recognition in School-Age Children.

    PubMed

    Grieco-Calub, Tina M; Ward, Kristina M; Brehm, Laurel

    2017-01-01

    Multitasking requires individuals to allocate their cognitive resources across different tasks. The purpose of the current study was to assess school-age children's multitasking abilities during degraded speech recognition. Children (8 to 12 years old) completed a dual-task paradigm including a sentence recognition (primary) task containing speech that was either unprocessed or noise-band vocoded with 8, 6, or 4 spectral channels and a visual monitoring (secondary) task. Children's accuracy and reaction time on the visual monitoring task was quantified during the dual-task paradigm in each condition of the primary task and compared with single-task performance. Children experienced dual-task costs in the 6- and 4-channel conditions of the primary speech recognition task with decreased accuracy on the visual monitoring task relative to baseline performance. In all conditions, children's dual-task performance on the visual monitoring task was strongly predicted by their single-task (baseline) performance on the task. Results suggest that children's proficiency with the secondary task contributes to the magnitude of dual-task costs while multitasking during degraded speech recognition.

  12. Processing Electromyographic Signals to Recognize Words

    NASA Technical Reports Server (NTRS)

    Jorgensen, C. C.; Lee, D. D.

    2009-01-01

    A recently invented speech-recognition method applies to words that are articulated by means of the tongue and throat muscles but are otherwise not voiced or, at most, are spoken sotto voce. This method could satisfy a need for speech recognition under circumstances in which normal audible speech is difficult, poses a hazard, is disturbing to listeners, or compromises privacy. The method could also be used to augment traditional speech recognition by providing an additional source of information about articulator activity. The method can be characterized as intermediate between (1) conventional speech recognition through processing of voice sounds and (2) a method, not yet developed, of processing electroencephalographic signals to extract unspoken words directly from thoughts. This method involves computational processing of digitized electromyographic (EMG) signals from muscle innervation acquired by surface electrodes under a subject's chin near the tongue and on the side of the subject s throat near the larynx. After preprocessing, digitization, and feature extraction, EMG signals are processed by a neural-network pattern classifier, implemented in software, that performs the bulk of the recognition task as described.

  13. Relations Between Self-reported Executive Functioning and Speech Perception Skills in Adult Cochlear Implant Users.

    PubMed

    Moberly, Aaron C; Patel, Tirth R; Castellanos, Irina

    2018-02-01

    As a result of their hearing loss, adults with cochlear implants (CIs) would self-report poorer executive functioning (EF) skills than normal-hearing (NH) peers, and these EF skills would be associated with performance on speech recognition tasks. EF refers to a group of high order neurocognitive skills responsible for behavioral and emotional regulation during goal-directed activity, and EF has been found to be poorer in children with CIs than their NH age-matched peers. Moreover, there is increasing evidence that neurocognitive skills, including some EF skills, contribute to the ability to recognize speech through a CI. Thirty postlingually deafened adults with CIs and 42 age-matched NH adults were enrolled. Participants and their spouses or significant others (informants) completed well-validated self-reports or informant-reports of EF, the Behavior Rating Inventory of Executive Function - Adult (BRIEF-A). CI users' speech recognition skills were assessed in quiet using several measures of sentence recognition. NH peers were tested for recognition of noise-vocoded versions of the same speech stimuli. CI users self-reported difficulty on EF tasks of shifting and task monitoring. In CI users, measures of speech recognition correlated with several self-reported EF skills. The present findings provide further evidence that neurocognitive factors, including specific EF skills, may decline in association with hearing loss, and that some of these EF skills contribute to speech processing under degraded listening conditions.

  14. Spontaneous Speech Collection for the CSR Corpus

    DTIC Science & Technology

    1992-01-01

    Menlo Park, California 94025 1. ABSTRACT As part of a pilot data collection for DARPA’s Continuous Speech Recognition ( CSR ) speech corpus, SRI...International experi- mented with the collection of spontaneous speeoh material. The bulk of the CSR pilot data was read versions of news articles from...variable. 2. INTRODUCTION The CSR (Continuous Speech Recognition) Corpus collec- tion can be considered the successor to the Resource Man- agemen t

  15. Eyes and ears: Using eye tracking and pupillometry to understand challenges to speech recognition.

    PubMed

    Van Engen, Kristin J; McLaughlin, Drew J

    2018-05-04

    Although human speech recognition is often experienced as relatively effortless, a number of common challenges can render the task more difficult. Such challenges may originate in talkers (e.g., unfamiliar accents, varying speech styles), the environment (e.g. noise), or in listeners themselves (e.g., hearing loss, aging, different native language backgrounds). Each of these challenges can reduce the intelligibility of spoken language, but even when intelligibility remains high, they can place greater processing demands on listeners. Noisy conditions, for example, can lead to poorer recall for speech, even when it has been correctly understood. Speech intelligibility measures, memory tasks, and subjective reports of listener difficulty all provide critical information about the effects of such challenges on speech recognition. Eye tracking and pupillometry complement these methods by providing objective physiological measures of online cognitive processing during listening. Eye tracking records the moment-to-moment direction of listeners' visual attention, which is closely time-locked to unfolding speech signals, and pupillometry measures the moment-to-moment size of listeners' pupils, which dilate in response to increased cognitive load. In this paper, we review the uses of these two methods for studying challenges to speech recognition. Copyright © 2018. Published by Elsevier B.V.

  16. Robust recognition of loud and Lombard speech in the fighter cockpit environment

    NASA Astrophysics Data System (ADS)

    Stanton, Bill J., Jr.

    1988-08-01

    There are a number of challenges associated with incorporating speech recognition technology into the fighter cockpit. One of the major problems is the wide range of variability in the pilot's voice. That can result from changing levels of stress and workload. Increasing the training set to include abnormal speech is not an attractive option because of the innumerable conditions that would have to be represented and the inordinate amount of time to collect such a training set. A more promising approach is to study subsets of abnormal speech that have been produced under controlled cockpit conditions with the purpose of characterizing reliable shifts that occur relative to normal speech. Such was the initiative of this research. Analyses were conducted for 18 features on 17671 phoneme tokens across eight speakers for normal, loud, and Lombard speech. It was discovered that there was a consistent migration of energy in the sonorants. This discovery of reliable energy shifts led to the development of a method to reduce or eliminate these shifts in the Euclidean distances between LPC log magnitude spectra. This combination significantly improved recognition performance of loud and Lombard speech. Discrepancies in recognition error rates between normal and abnormal speech were reduced by approximately 50 percent for all eight speakers combined.

  17. Robust relationship between reading span and speech recognition in noise

    PubMed Central

    Souza, Pamela; Arehart, Kathryn

    2015-01-01

    Objective Working memory refers to a cognitive system that manages information processing and temporary storage. Recent work has demonstrated that individual differences in working memory capacity measured using a reading span task are related to ability to recognize speech in noise. In this project, we investigated whether the specific implementation of the reading span task influenced the strength of the relationship between working memory capacity and speech recognition. Design The relationship between speech recognition and working memory capacity was examined for two different working memory tests that varied in approach, using a within-subject design. Data consisted of audiometric results along with the two different working memory tests; one speech-in-noise test; and a reading comprehension test. Study sample The test group included 94 older adults with varying hearing loss and 30 younger adults with normal hearing. Results Listeners with poorer working memory capacity had more difficulty understanding speech in noise after accounting for age and degree of hearing loss. That relationship did not differ significantly between the two different implementations of reading span. Conclusions Our findings suggest that different implementations of a verbal reading span task do not affect the strength of the relationship between working memory capacity and speech recognition. PMID:25975360

  18. Robust relationship between reading span and speech recognition in noise.

    PubMed

    Souza, Pamela; Arehart, Kathryn

    2015-01-01

    Working memory refers to a cognitive system that manages information processing and temporary storage. Recent work has demonstrated that individual differences in working memory capacity measured using a reading span task are related to ability to recognize speech in noise. In this project, we investigated whether the specific implementation of the reading span task influenced the strength of the relationship between working memory capacity and speech recognition. The relationship between speech recognition and working memory capacity was examined for two different working memory tests that varied in approach, using a within-subject design. Data consisted of audiometric results along with the two different working memory tests; one speech-in-noise test; and a reading comprehension test. The test group included 94 older adults with varying hearing loss and 30 younger adults with normal hearing. Listeners with poorer working memory capacity had more difficulty understanding speech in noise after accounting for age and degree of hearing loss. That relationship did not differ significantly between the two different implementations of reading span. Our findings suggest that different implementations of a verbal reading span task do not affect the strength of the relationship between working memory capacity and speech recognition.

  19. Automatic lip reading by using multimodal visual features

    NASA Astrophysics Data System (ADS)

    Takahashi, Shohei; Ohya, Jun

    2013-12-01

    Since long time ago, speech recognition has been researched, though it does not work well in noisy places such as in the car or in the train. In addition, people with hearing-impaired or difficulties in hearing cannot receive benefits from speech recognition. To recognize the speech automatically, visual information is also important. People understand speeches from not only audio information, but also visual information such as temporal changes in the lip shape. A vision based speech recognition method could work well in noisy places, and could be useful also for people with hearing disabilities. In this paper, we propose an automatic lip-reading method for recognizing the speech by using multimodal visual information without using any audio information such as speech recognition. First, the ASM (Active Shape Model) is used to track and detect the face and lip in a video sequence. Second, the shape, optical flow and spatial frequencies of the lip features are extracted from the lip detected by ASM. Next, the extracted multimodal features are ordered chronologically so that Support Vector Machine is performed in order to learn and classify the spoken words. Experiments for classifying several words show promising results of this proposed method.

  20. [Perception of emotional intonation of noisy speech signal with different acoustic parameters by adults of different age and gender].

    PubMed

    Dmitrieva, E S; Gel'man, V Ia

    2011-01-01

    The listener-distinctive features of recognition of different emotional intonations (positive, negative and neutral) of male and female speakers in the presence or absence of background noise were studied in 49 adults aged 20-79 years. In all the listeners noise produced the most pronounced decrease in recognition accuracy for positive emotional intonation ("joy") as compared to other intonations, whereas it did not influence the recognition accuracy of "anger" in 65-79-year-old listeners. The higher emotion recognition rates of a noisy signal were observed for speech emotional intonations expressed by female speakers. Acoustic characteristics of noisy and clear speech signals underlying perception of speech emotional prosody were found for adult listeners of different age and gender.

  1. Evaluating deep learning architectures for Speech Emotion Recognition.

    PubMed

    Fayek, Haytham M; Lech, Margaret; Cavedon, Lawrence

    2017-08-01

    Speech Emotion Recognition (SER) can be regarded as a static or dynamic classification problem, which makes SER an excellent test bed for investigating and comparing various deep learning architectures. We describe a frame-based formulation to SER that relies on minimal speech processing and end-to-end deep learning to model intra-utterance dynamics. We use the proposed SER system to empirically explore feed-forward and recurrent neural network architectures and their variants. Experiments conducted illuminate the advantages and limitations of these architectures in paralinguistic speech recognition and emotion recognition in particular. As a result of our exploration, we report state-of-the-art results on the IEMOCAP database for speaker-independent SER and present quantitative and qualitative assessments of the models' performances. Copyright © 2017 Elsevier Ltd. All rights reserved.

  2. Data analysis and integration of environmental sensors to meet human needs

    NASA Astrophysics Data System (ADS)

    Santamaria, Amilcare Francesco; De Rango, Floriano; Barletta, Domenico; Falbo, Domenico; Imbrogno, Alessandro

    2014-05-01

    Nowadays one of the main task of technology is to make people's life simpler and easier. Ambient intelligence is an emerging discipline that brings intelligence to environments making them sensitive to us. This discipline has developed following the spread of sensors devices, sensor networks, pervasive computing and artificial intelligence. In this work, we attempt to enhance the Internet Of Things (loT) with intelligence and environments exploring various interactions between humans' beings and the environment they live in. In particular, the core of the system is composed of an automation system, which is made up with a domotic control unit and several sensors installed in the environment. The task of the sensors is to collect information from the environment and to send them to the control unit. Once the information is collected, the core combines them in order to infer the most accurate human needs. The knowledge of human needs and the current environment status compose the inputs of the intelligence block whose main goal is to find the right automations to satisfy human needs in a real time way. The system also provides a Speech Recognition service which allow users to interact with the system by their voice so human speech can be considered as additional input for smart automatisms.

  3. Developing and Evaluating an Oral Skills Training Website Supported by Automatic Speech Recognition Technology

    ERIC Educational Resources Information Center

    Chen, Howard Hao-Jan

    2011-01-01

    Oral communication ability has become increasingly important to many EFL students. Several commercial software programs based on automatic speech recognition (ASR) technologies are available but their prices are not affordable for many students. This paper will demonstrate how the Microsoft Speech Application Software Development Kit (SASDK), a…

  4. Vocal Tract Representation in the Recognition of Cerebral Palsied Speech

    ERIC Educational Resources Information Center

    Rudzicz, Frank; Hirst, Graeme; van Lieshout, Pascal

    2012-01-01

    Purpose: In this study, the authors explored articulatory information as a means of improving the recognition of dysarthric speech by machine. Method: Data were derived chiefly from the TORGO database of dysarthric articulation (Rudzicz, Namasivayam, & Wolff, 2011) in which motions of various points in the vocal tract are measured during speech.…

  5. Micro-Based Speech Recognition: Instructional Innovation for Handicapped Learners.

    ERIC Educational Resources Information Center

    Horn, Carin E.; Scott, Brian L.

    A new voice based learning system (VBLS), which allows the handicapped user to interact with a microcomputer by voice commands, is described. Speech or voice recognition is the computerized process of identifying a spoken word or phrase, including those resulting from speech impediments. This new technology is helpful to the severely physically…

  6. Speech Recognition Software for Language Learning: Toward an Evaluation of Validity and Student Perceptions

    ERIC Educational Resources Information Center

    Cordier, Deborah

    2009-01-01

    A renewed focus on foreign language (FL) learning and speech for communication has resulted in computer-assisted language learning (CALL) software developed with Automatic Speech Recognition (ASR). ASR features for FL pronunciation (Lafford, 2004) are functional components of CALL designs used for FL teaching and learning. The ASR features…

  7. Noise Robust Speech Recognition Applied to Voice-Driven Wheelchair

    NASA Astrophysics Data System (ADS)

    Sasou, Akira; Kojima, Hiroaki

    2009-12-01

    Conventional voice-driven wheelchairs usually employ headset microphones that are capable of achieving sufficient recognition accuracy, even in the presence of surrounding noise. However, such interfaces require users to wear sensors such as a headset microphone, which can be an impediment, especially for the hand disabled. Conversely, it is also well known that the speech recognition accuracy drastically degrades when the microphone is placed far from the user. In this paper, we develop a noise robust speech recognition system for a voice-driven wheelchair. This system can achieve almost the same recognition accuracy as the headset microphone without wearing sensors. We verified the effectiveness of our system in experiments in different environments, and confirmed that our system can achieve almost the same recognition accuracy as the headset microphone without wearing sensors.

  8. Automated Cough Assessment on a Mobile Platform

    PubMed Central

    2014-01-01

    The development of an Automated System for Asthma Monitoring (ADAM) is described. This consists of a consumer electronics mobile platform running a custom application. The application acquires an audio signal from an external user-worn microphone connected to the device analog-to-digital converter (microphone input). This signal is processed to determine the presence or absence of cough sounds. Symptom tallies and raw audio waveforms are recorded and made easily accessible for later review by a healthcare provider. The symptom detection algorithm is based upon standard speech recognition and machine learning paradigms and consists of an audio feature extraction step followed by a Hidden Markov Model based Viterbi decoder that has been trained on a large database of audio examples from a variety of subjects. Multiple Hidden Markov Model topologies and orders are studied. Performance of the recognizer is presented in terms of the sensitivity and the rate of false alarm as determined in a cross-validation test. PMID:25506590

  9. The Effect of Remote Masking on the Reception of Speech by Young School-Age Children.

    PubMed

    Youngdahl, Carla L; Healy, Eric W; Yoho, Sarah E; Apoux, Frédéric; Holt, Rachael Frush

    2018-02-15

    Psychoacoustic data indicate that infants and children are less likely than adults to focus on a spectral region containing an anticipated signal and are more susceptible to remote masking of a signal. These detection tasks suggest that infants and children, unlike adults, do not listen selectively. However, less is known about children's ability to listen selectively during speech recognition. Accordingly, the current study examines remote masking during speech recognition in children and adults. Adults and 7- and 5-year-old children performed sentence recognition in the presence of various spectrally remote maskers. Intelligibility was determined for each remote-masker condition, and performance was compared across age groups. It was found that speech recognition for 5-year-olds was reduced in the presence of spectrally remote noise, whereas the maskers had no effect on the 7-year-olds or adults. Maskers of different bandwidth and remoteness had similar effects. In accord with psychoacoustic data, young children do not appear to focus on a spectral region of interest and ignore other regions during speech recognition. This tendency may help account for their typically poorer speech perception in noise. This study also appears to capture an important developmental stage, during which a substantial refinement in spectral listening occurs.

  10. Studies in automatic speech recognition and its application in aerospace

    NASA Astrophysics Data System (ADS)

    Taylor, Michael Robinson

    Human communication is characterized in terms of the spectral and temporal dimensions of speech waveforms. Electronic speech recognition strategies based on Dynamic Time Warping and Markov Model algorithms are described and typical digit recognition error rates are tabulated. The application of Direct Voice Input (DVI) as an interface between man and machine is explored within the context of civil and military aerospace programmes. Sources of physical and emotional stress affecting speech production within military high performance aircraft are identified. Experimental results are reported which quantify fundamental frequency and coarse temporal dimensions of male speech as a function of the vibration, linear acceleration and noise levels typical of aerospace environments; preliminary indications of acoustic phonetic variability reported by other researchers are summarized. Connected whole-word pattern recognition error rates are presented for digits spoken under controlled Gz sinusoidal whole-body vibration. Correlations are made between significant increases in recognition error rate and resonance of the abdomen-thorax and head subsystems of the body. The phenomenon of vibrato style speech produced under low frequency whole-body Gz vibration is also examined. Interactive DVI system architectures and avionic data bus integration concepts are outlined together with design procedures for the efficient development of pilot-vehicle command and control protocols.

  11. Speech Recognition in Noise by Children with and without Dyslexia: How is it Related to Reading?

    PubMed

    Nittrouer, Susan; Krieg, Letitia M; Lowenstein, Joanna H

    2018-06-01

    Developmental dyslexia is commonly viewed as a phonological deficit that makes it difficult to decode written language. But children with dyslexia typically exhibit other problems, as well, including poor speech recognition in noise. The purpose of this study was to examine whether the speech-in-noise problems of children with dyslexia are related to their reading problems, and if so, if a common underlying factor might explain both. The specific hypothesis examined was that a spectral processing disorder results in these children receiving smeared signals, which could explain both the diminished sensitivity to phonological structure - leading to reading problems - and the speech recognition in noise difficulties. The alternative hypothesis tested in this study was that children with dyslexia simply have broadly based language deficits. Ninety-seven children between the ages of 7 years; 10 months and 12 years; 9 months participated: 46 with dyslexia and 51 without dyslexia. Children were tested on two dependent measures: word reading and recognition in noise with two types of sentence materials: as unprocessed (UP) signals, and as spectrally smeared (SM) signals. Data were collected for four predictor variables: phonological awareness, vocabulary, grammatical knowledge, and digit span. Children with dyslexia showed deficits on both dependent and all predictor variables. Their scores for speech recognition in noise were poorer than those of children without dyslexia for both the UP and SM signals, but by equivalent amounts across signal conditions indicating that they were not disproportionately hindered by spectral distortion. Correlation analyses on scores from children with dyslexia showed that reading ability and speech-in-noise recognition were only mildly correlated, and each skill was related to different underlying abilities. No substantial evidence was found to support the suggestion that the reading and speech recognition in noise problems of children with dyslexia arise from a single factor that could be defined as a spectral processing disorder. The reading and speech recognition in noise deficits of these children appeared to be largely independent. Copyright © 2018 Elsevier Ltd. All rights reserved.

  12. Schizophrenia alters intra-network functional connectivity in the caudate for detecting speech under informational speech masking conditions.

    PubMed

    Zheng, Yingjun; Wu, Chao; Li, Juanhua; Li, Ruikeng; Peng, Hongjun; She, Shenglin; Ning, Yuping; Li, Liang

    2018-04-04

    Speech recognition under noisy "cocktail-party" environments involves multiple perceptual/cognitive processes, including target detection, selective attention, irrelevant signal inhibition, sensory/working memory, and speech production. Compared to health listeners, people with schizophrenia are more vulnerable to masking stimuli and perform worse in speech recognition under speech-on-speech masking conditions. Although the schizophrenia-related speech-recognition impairment under "cocktail-party" conditions is associated with deficits of various perceptual/cognitive processes, it is crucial to know whether the brain substrates critically underlying speech detection against informational speech masking are impaired in people with schizophrenia. Using functional magnetic resonance imaging (fMRI), this study investigated differences between people with schizophrenia (n = 19, mean age = 33 ± 10 years) and their matched healthy controls (n = 15, mean age = 30 ± 9 years) in intra-network functional connectivity (FC) specifically associated with target-speech detection under speech-on-speech-masking conditions. The target-speech detection performance under the speech-on-speech-masking condition in participants with schizophrenia was significantly worse than that in matched healthy participants (healthy controls). Moreover, in healthy controls, but not participants with schizophrenia, the strength of intra-network FC within the bilateral caudate was positively correlated with the speech-detection performance under the speech-masking conditions. Compared to controls, patients showed altered spatial activity pattern and decreased intra-network FC in the caudate. In people with schizophrenia, the declined speech-detection performance under speech-on-speech masking conditions is associated with reduced intra-caudate functional connectivity, which normally contributes to detecting target speech against speech masking via its functions of suppressing masking-speech signals.

  13. Modelling Errors in Automatic Speech Recognition for Dysarthric Speakers

    NASA Astrophysics Data System (ADS)

    Caballero Morales, Santiago Omar; Cox, Stephen J.

    2009-12-01

    Dysarthria is a motor speech disorder characterized by weakness, paralysis, or poor coordination of the muscles responsible for speech. Although automatic speech recognition (ASR) systems have been developed for disordered speech, factors such as low intelligibility and limited phonemic repertoire decrease speech recognition accuracy, making conventional speaker adaptation algorithms perform poorly on dysarthric speakers. In this work, rather than adapting the acoustic models, we model the errors made by the speaker and attempt to correct them. For this task, two techniques have been developed: (1) a set of "metamodels" that incorporate a model of the speaker's phonetic confusion matrix into the ASR process; (2) a cascade of weighted finite-state transducers at the confusion matrix, word, and language levels. Both techniques attempt to correct the errors made at the phonetic level and make use of a language model to find the best estimate of the correct word sequence. Our experiments show that both techniques outperform standard adaptation techniques.

  14. Application of advanced speech technology in manned penetration bombers

    NASA Astrophysics Data System (ADS)

    North, R.; Lea, W.

    1982-03-01

    This report documents research on the potential use of speech technology in a manned penetration bomber aircraft (B-52/G and H). The objectives of the project were to analyze the pilot/copilot crewstation tasks over a three-hour-and forty-minute mission and determine the tasks that would benefit the most from conversion to speech recognition/generation, determine the technological feasibility of each of the identified tasks, and prioritize these tasks based on these criteria. Secondary objectives of the program were to enunciate research strategies in the application of speech technologies in airborne environments, and develop guidelines for briefing user commands on the potential of using speech technologies in the cockpit. The results of this study indicated that for the B-52 crewmember, speech recognition would be most beneficial for retrieving chart and procedural data that is contained in the flight manuals. Technological feasibility of these tasks indicated that the checklist and procedural retrieval tasks would be highly feasible for a speech recognition system.

  15. Automated analysis of connected speech reveals early biomarkers of Parkinson's disease in patients with rapid eye movement sleep behaviour disorder.

    PubMed

    Hlavnička, Jan; Čmejla, Roman; Tykalová, Tereza; Šonka, Karel; Růžička, Evžen; Rusz, Jan

    2017-02-02

    For generations, the evaluation of speech abnormalities in neurodegenerative disorders such as Parkinson's disease (PD) has been limited to perceptual tests or user-controlled laboratory analysis based upon rather small samples of human vocalizations. Our study introduces a fully automated method that yields significant features related to respiratory deficits, dysphonia, imprecise articulation and dysrhythmia from acoustic microphone data of natural connected speech for predicting early and distinctive patterns of neurodegeneration. We compared speech recordings of 50 subjects with rapid eye movement sleep behaviour disorder (RBD), 30 newly diagnosed, untreated PD patients and 50 healthy controls, and showed that subliminal parkinsonian speech deficits can be reliably captured even in RBD patients, which are at high risk of developing PD or other synucleinopathies. Thus, automated vocal analysis should soon be able to contribute to screening and diagnostic procedures for prodromal parkinsonian neurodegeneration in natural environments.

  16. On the Use of Evolutionary Algorithms to Improve the Robustness of Continuous Speech Recognition Systems in Adverse Conditions

    NASA Astrophysics Data System (ADS)

    Selouani, Sid-Ahmed; O'Shaughnessy, Douglas

    2003-12-01

    Limiting the decrease in performance due to acoustic environment changes remains a major challenge for continuous speech recognition (CSR) systems. We propose a novel approach which combines the Karhunen-Loève transform (KLT) in the mel-frequency domain with a genetic algorithm (GA) to enhance the data representing corrupted speech. The idea consists of projecting noisy speech parameters onto the space generated by the genetically optimized principal axis issued from the KLT. The enhanced parameters increase the recognition rate for highly interfering noise environments. The proposed hybrid technique, when included in the front-end of an HTK-based CSR system, outperforms that of the conventional recognition process in severe interfering car noise environments for a wide range of signal-to-noise ratios (SNRs) varying from 16 dB to[InlineEquation not available: see fulltext.] dB. We also showed the effectiveness of the KLT-GA method in recognizing speech subject to telephone channel degradations.

  17. How linguistic closure and verbal working memory relate to speech recognition in noise--a review.

    PubMed

    Besser, Jana; Koelewijn, Thomas; Zekveld, Adriana A; Kramer, Sophia E; Festen, Joost M

    2013-06-01

    The ability to recognize masked speech, commonly measured with a speech reception threshold (SRT) test, is associated with cognitive processing abilities. Two cognitive factors frequently assessed in speech recognition research are the capacity of working memory (WM), measured by means of a reading span (Rspan) or listening span (Lspan) test, and the ability to read masked text (linguistic closure), measured by the text reception threshold (TRT). The current article provides a review of recent hearing research that examined the relationship of TRT and WM span to SRTs in various maskers. Furthermore, modality differences in WM capacity assessed with the Rspan compared to the Lspan test were examined and related to speech recognition abilities in an experimental study with young adults with normal hearing (NH). Span scores were strongly associated with each other, but were higher in the auditory modality. The results of the reviewed studies suggest that TRT and WM span are related to each other, but differ in their relationships with SRT performance. In NH adults of middle age or older, both TRT and Rspan were associated with SRTs in speech maskers, whereas TRT better predicted speech recognition in fluctuating nonspeech maskers. The associations with SRTs in steady-state noise were inconclusive for both measures. WM span was positively related to benefit from contextual information in speech recognition, but better TRTs related to less interference from unrelated cues. Data for individuals with impaired hearing are limited, but larger WM span seems to give a general advantage in various listening situations.

  18. How Linguistic Closure and Verbal Working Memory Relate to Speech Recognition in Noise—A Review

    PubMed Central

    Koelewijn, Thomas; Zekveld, Adriana A.; Kramer, Sophia E.; Festen, Joost M.

    2013-01-01

    The ability to recognize masked speech, commonly measured with a speech reception threshold (SRT) test, is associated with cognitive processing abilities. Two cognitive factors frequently assessed in speech recognition research are the capacity of working memory (WM), measured by means of a reading span (Rspan) or listening span (Lspan) test, and the ability to read masked text (linguistic closure), measured by the text reception threshold (TRT). The current article provides a review of recent hearing research that examined the relationship of TRT and WM span to SRTs in various maskers. Furthermore, modality differences in WM capacity assessed with the Rspan compared to the Lspan test were examined and related to speech recognition abilities in an experimental study with young adults with normal hearing (NH). Span scores were strongly associated with each other, but were higher in the auditory modality. The results of the reviewed studies suggest that TRT and WM span are related to each other, but differ in their relationships with SRT performance. In NH adults of middle age or older, both TRT and Rspan were associated with SRTs in speech maskers, whereas TRT better predicted speech recognition in fluctuating nonspeech maskers. The associations with SRTs in steady-state noise were inconclusive for both measures. WM span was positively related to benefit from contextual information in speech recognition, but better TRTs related to less interference from unrelated cues. Data for individuals with impaired hearing are limited, but larger WM span seems to give a general advantage in various listening situations. PMID:23945955

  19. The Effect of Dynamic Pitch on Speech Recognition in Temporally Modulated Noise

    ERIC Educational Resources Information Center

    Shen, Jung; Souza, Pamela E.

    2017-01-01

    Purpose: This study investigated the effect of dynamic pitch in target speech on older and younger listeners' speech recognition in temporally modulated noise. First, we examined whether the benefit from dynamic-pitch cues depends on the temporal modulation of noise. Second, we tested whether older listeners can benefit from dynamic-pitch cues for…

  20. Introduction and Overview of the Vicens-Reddy Speech Recognition System.

    ERIC Educational Resources Information Center

    Kameny, Iris; Ritea, H.

    The Vicens-Reddy System is unique in the sense that it approaches the problem of speech recognition as a whole, rather than treating particular aspects of the problems as in previous attempts. For example, where earlier systems treated only segmentation of speech into phoneme groups, or detected phonemes in a given context, the Vicens-Reddy System…

  1. Accommodation and Compliance Series: Employees with Arthritis

    MedlinePlus

    ... handed keyboard, an articulating keyboard tray, speech recognition software, a trackball, and office equipment for a workstation ... space heater, additional window insulation, and speech recognition software. An insurance clerk with arthritis from systemic lupus ...

  2. [Research on Barrier-free Home Environment System Based on Speech Recognition].

    PubMed

    Zhu, Husheng; Yu, Hongliu; Shi, Ping; Fang, Youfang; Jian, Zhuo

    2015-10-01

    The number of people with physical disabilities is increasing year by year, and the trend of population aging is more and more serious. In order to improve the quality of the life, a control system of accessible home environment for the patients with serious disabilities was developed to control the home electrical devices with the voice of the patients. The control system includes a central control platform, a speech recognition module, a terminal operation module, etc. The system combines the speech recognition control technology and wireless information transmission technology with the embedded mobile computing technology, and interconnects the lamp, electronic locks, alarms, TV and other electrical devices in the home environment as a whole system through a wireless network node. The experimental results showed that speech recognition success rate was more than 84% in the home environment.

  3. Noise-robust speech recognition through auditory feature detection and spike sequence decoding.

    PubMed

    Schafer, Phillip B; Jin, Dezhe Z

    2014-03-01

    Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences--one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.

  4. Speech-recognition interfaces for music information retrieval

    NASA Astrophysics Data System (ADS)

    Goto, Masataka

    2005-09-01

    This paper describes two hands-free music information retrieval (MIR) systems that enable a user to retrieve and play back a musical piece by saying its title or the artist's name. Although various interfaces for MIR have been proposed, speech-recognition interfaces suitable for retrieving musical pieces have not been studied. Our MIR-based jukebox systems employ two different speech-recognition interfaces for MIR, speech completion and speech spotter, which exploit intentionally controlled nonverbal speech information in original ways. The first is a music retrieval system with the speech-completion interface that is suitable for music stores and car-driving situations. When a user only remembers part of the name of a musical piece or an artist and utters only a remembered fragment, the system helps the user recall and enter the name by completing the fragment. The second is a background-music playback system with the speech-spotter interface that can enrich human-human conversation. When a user is talking to another person, the system allows the user to enter voice commands for music playback control by spotting a special voice-command utterance in face-to-face or telephone conversations. Experimental results from use of these systems have demonstrated the effectiveness of the speech-completion and speech-spotter interfaces. (Video clips: http://staff.aist.go.jp/m.goto/MIR/speech-if.html)

  5. Speaker-Machine Interaction in Automatic Speech Recognition. Technical Report.

    ERIC Educational Resources Information Center

    Makhoul, John I.

    The feasibility and limitations of speaker adaptation in improving the performance of a "fixed" (speaker-independent) automatic speech recognition system were examined. A fixed vocabulary of 55 syllables is used in the recognition system which contains 11 stops and fricatives and five tense vowels. The results of an experiment on speaker…

  6. Dynamic relation between working memory capacity and speech recognition in noise during the first 6 months of hearing aid use.

    PubMed

    Ng, Elaine H N; Classon, Elisabet; Larsby, Birgitta; Arlinger, Stig; Lunner, Thomas; Rudner, Mary; Rönnberg, Jerker

    2014-11-23

    The present study aimed to investigate the changing relationship between aided speech recognition and cognitive function during the first 6 months of hearing aid use. Twenty-seven first-time hearing aid users with symmetrical mild to moderate sensorineural hearing loss were recruited. Aided speech recognition thresholds in noise were obtained in the hearing aid fitting session as well as at 3 and 6 months postfitting. Cognitive abilities were assessed using a reading span test, which is a measure of working memory capacity, and a cognitive test battery. Results showed a significant correlation between reading span and speech reception threshold during the hearing aid fitting session. This relation was significantly weakened over the first 6 months of hearing aid use. Multiple regression analysis showed that reading span was the main predictor of speech recognition thresholds in noise when hearing aids were first fitted, but that the pure-tone average hearing threshold was the main predictor 6 months later. One way of explaining the results is that working memory capacity plays a more important role in speech recognition in noise initially rather than after 6 months of use. We propose that new hearing aid users engage working memory capacity to recognize unfamiliar processed speech signals because the phonological form of these signals cannot be automatically matched to phonological representations in long-term memory. As familiarization proceeds, the mismatch effect is alleviated, and the engagement of working memory capacity is reduced. © The Author(s) 2014.

  7. Recognition of Speech from the Television with Use of a Wireless Technology Designed for Cochlear Implants.

    PubMed

    Duke, Mila Morais; Wolfe, Jace; Schafer, Erin

    2016-05-01

    Cochlear implant (CI) recipients often experience difficulty understanding speech in noise and speech that originates from a distance. Many CI recipients also experience difficulty understanding speech originating from a television. Use of hearing assistance technology (HAT) may improve speech recognition in noise and for signals that originate from more than a few feet from the listener; however, there are no published studies evaluating the potential benefits of a wireless HAT designed to deliver audio signals from a television directly to a CI sound processor. The objective of this study was to compare speech recognition in quiet and in noise of CI recipients with the use of their CI alone and with the use of their CI and a wireless HAT (Cochlear Wireless TV Streamer). A two-way repeated measures design was used to evaluate performance differences obtained in quiet and in competing noise (65 dBA) with the CI sound processor alone and with the sound processor coupled to the Cochlear Wireless TV Streamer. Sixteen users of Cochlear Nucleus 24 Freedom, CI512, and CI422 implants were included in the study. Participants were evaluated in four conditions including use of the sound processor alone and use of the sound processor with the wireless streamer in quiet and in the presence of competing noise at 65 dBA. Speech recognition was evaluated in each condition with two full lists of Computer-Assisted Speech Perception Testing and Training Sentence-Level Test sentences presented from a light-emitting diode television. Speech recognition in noise was significantly better with use of the wireless streamer compared to participants' performance with their CI sound processor alone. There was also a nonsignificant trend toward better performance in quiet with use of the TV Streamer. Performance was significantly poorer when evaluated in noise compared to performance in quiet when the TV Streamer was not used. Use of the Cochlear Wireless TV Streamer designed to stream audio from a television directly to a CI sound processor provides better speech recognition in quiet and in noise when compared to performance obtained with use of the CI sound processor alone. American Academy of Audiology.

  8. Institute for the Study of Human Capabilities Summary Descriptions of Research for the Period September 1988 through June 1989

    DTIC Science & Technology

    1989-06-01

    12 1.7 Application of the Modified Speech Transmission Index to Monaural and Binaural Speech Recognition in Normal and Impaired...describe all of the data from both groups. 1.7 Application of the Modified Speech Transmission Index to Monaural and Binaural Speech Recognition in...were obtained for materials presented to each ear separately (monaurally) and to both ears ( binaurally ). Results from the normal listeners are accurately

  9. Emotion recognition from speech: tools and challenges

    NASA Astrophysics Data System (ADS)

    Al-Talabani, Abdulbasit; Sellahewa, Harin; Jassim, Sabah A.

    2015-05-01

    Human emotion recognition from speech is studied frequently for its importance in many applications, e.g. human-computer interaction. There is a wide diversity and non-agreement about the basic emotion or emotion-related states on one hand and about where the emotion related information lies in the speech signal on the other side. These diversities motivate our investigations into extracting Meta-features using the PCA approach, or using a non-adaptive random projection RP, which significantly reduce the large dimensional speech feature vectors that may contain a wide range of emotion related information. Subsets of Meta-features are fused to increase the performance of the recognition model that adopts the score-based LDC classifier. We shall demonstrate that our scheme outperform the state of the art results when tested on non-prompted databases or acted databases (i.e. when subjects act specific emotions while uttering a sentence). However, the huge gap between accuracy rates achieved on the different types of datasets of speech raises questions about the way emotions modulate the speech. In particular we shall argue that emotion recognition from speech should not be dealt with as a classification problem. We shall demonstrate the presence of a spectrum of different emotions in the same speech portion especially in the non-prompted data sets, which tends to be more "natural" than the acted datasets where the subjects attempt to suppress all but one emotion.

  10. Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN

    PubMed Central

    Zhu, Lianzhang; Chen, Leiming; Zhao, Dehai

    2017-01-01

    Accurate emotion recognition from speech is important for applications like smart health care, smart entertainment, and other smart services. High accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, including speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel frequency cepstrum coefficient (MFCC), pitch, formant, short-term zero-crossing rate and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the best features to identify the emotion status for speech. We propose a novel classification method that combines DBN and SVM (support vector machine) instead of using only one of them. In addition, a conjugate gradient method is applied to train DBN in order to speed up the training process. Gender-dependent experiments are conducted using an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features can reflect emotion status better than artificial features, and our new classification approach achieves an accuracy of 95.8%, which is higher than using either DBN or SVM separately. Results also show that DBN can work very well for small training databases if it is properly designed. PMID:28737705

  11. Identifying Speech Acts in E-Mails: Toward Automated Scoring of the "TOEIC"® E-Mail Task. Research Report. ETS RR-12-16

    ERIC Educational Resources Information Center

    De Felice, Rachele; Deane, Paul

    2012-01-01

    This study proposes an approach to automatically score the "TOEIC"® Writing e-mail task. We focus on one component of the scoring rubric, which notes whether the test-takers have used particular speech acts such as requests, orders, or commitments. We developed a computational model for automated speech act identification and tested it…

  12. Hearing Handicap and Speech Recognition Correlate With Self-Reported Listening Effort and Fatigue.

    PubMed

    Alhanbali, Sara; Dawes, Piers; Lloyd, Simon; Munro, Kevin J

    To investigate the correlations between hearing handicap, speech recognition, listening effort, and fatigue. Eighty-four adults with hearing loss (65 to 85 years) completed three self-report questionnaires: the Fatigue Assessment Scale, the Effort Assessment Scale, and the Hearing Handicap Inventory for Elderly. Audiometric assessment included pure-tone audiometry and speech recognition in noise. There was a significant positive correlation between handicap and fatigue (r = 0.39, p < 0.05) and handicap and effort (r = 0.73, p < 0.05). There were significant (but lower) correlations between speech recognition and fatigue (r = 0.22, p < 0.05) or effort (r = 0.32, p< 0.05). There was no significant correlation between hearing level and fatigue or effort. Hearing handicap and speech recognition both correlate with self-reported listening effort and fatigue, which is consistent with a model of listening effort and fatigue where perceived difficulty is related to sustained effort and fatigue for unrewarding tasks over which the listener has low control. A clinical implication is that encouraging clients to recognize and focus on the pleasure and positive experiences of listening may result in greater satisfaction and benefit from hearing aid use.

  13. Development of coffee maker service robot using speech and face recognition systems using POMDP

    NASA Astrophysics Data System (ADS)

    Budiharto, Widodo; Meiliana; Santoso Gunawan, Alexander Agung

    2016-07-01

    There are many development of intelligent service robot in order to interact with user naturally. This purpose can be done by embedding speech and face recognition ability on specific tasks to the robot. In this research, we would like to propose Intelligent Coffee Maker Robot which the speech recognition is based on Indonesian language and powered by statistical dialogue systems. This kind of robot can be used in the office, supermarket or restaurant. In our scenario, robot will recognize user's face and then accept commands from the user to do an action, specifically in making a coffee. Based on our previous work, the accuracy for speech recognition is about 86% and face recognition is about 93% in laboratory experiments. The main problem in here is to know the intention of user about how sweetness of the coffee. The intelligent coffee maker robot should conclude the user intention through conversation under unreliable automatic speech in noisy environment. In this paper, this spoken dialog problem is treated as a partially observable Markov decision process (POMDP). We describe how this formulation establish a promising framework by empirical results. The dialog simulations are presented which demonstrate significant quantitative outcome.

  14. Financial and workflow analysis of radiology reporting processes in the planning phase of implementation of a speech recognition system

    NASA Astrophysics Data System (ADS)

    Whang, Tom; Ratib, Osman M.; Umamoto, Kathleen; Grant, Edward G.; McCoy, Michael J.

    2002-05-01

    The goal of this study is to determine the financial value and workflow improvements achievable by replacing traditional transcription services with a speech recognition system in a large, university hospital setting. Workflow metrics were measured at two hospitals, one of which exclusively uses a transcription service (UCLA Medical Center), and the other which exclusively uses speech recognition (West Los Angeles VA Hospital). Workflow metrics include time spent per report (the sum of time spent interpreting, dictating, reviewing, and editing), transcription turnaround, and total report turnaround. Compared to traditional transcription, speech recognition resulted in radiologists spending 13-32% more time per report, but it also resulted in reduction of report turnaround time by 22-62% and reduction of marginal cost per report by 94%. The model developed here helps justify the introduction of a speech recognition system by showing that the benefits of reduced operating costs and decreased turnaround time outweigh the cost of increased time spent per report. Whether the ultimate goal is to achieve a financial objective or to improve operational efficiency, it is important to conduct a thorough analysis of workflow before implementation.

  15. Effects of cooperating and conflicting cues on speech intonation recognition by cochlear implant users and normal hearing listeners.

    PubMed

    Peng, Shu-Chen; Lu, Nelson; Chatterjee, Monita

    2009-01-01

    Cochlear implant (CI) recipients have only limited access to fundamental frequency (F0) information, and thus exhibit deficits in speech intonation recognition. For speech intonation, F0 serves as the primary cue, and other potential acoustic cues (e.g. intensity properties) may also contribute. This study examined the effects of cooperating or conflicting acoustic cues on speech intonation recognition by adult CI and normal hearing (NH) listeners with full-spectrum and spectrally degraded speech stimuli. Identification of speech intonation that signifies question and statement contrasts was measured in 13 CI recipients and 4 NH listeners, using resynthesized bi-syllabic words, where F0 and intensity properties were systematically manipulated. The stimulus set was comprised of tokens whose acoustic cues (i.e. F0 contour and intensity patterns) were either cooperating or conflicting. Subjects identified if each stimulus is a 'statement' or a 'question' in a single-interval, 2-alternative forced-choice (2AFC) paradigm. Logistic models were fitted to the data, and estimated coefficients were compared under cooperating and conflicting conditions, between the subject groups (CI vs. NH), and under full-spectrum and spectrally degraded conditions for NH listeners. The results indicated that CI listeners' intonation recognition was enhanced by cooperating F0 contour and intensity cues, but was adversely affected by these cues being conflicting. On the other hand, with full-spectrum stimuli, NH listeners' intonation recognition was not affected by cues being cooperating or conflicting. The effects of cues being cooperating or conflicting were comparable between the CI group and NH listeners with spectrally degraded stimuli. These findings suggest the importance of taking multiple acoustic sources for speech recognition into consideration in aural rehabilitation for CI recipients. Copyright (C) 2009 S. Karger AG, Basel.

  16. Effects of cooperating and conflicting cues on speech intonation recognition by cochlear implant users and normal hearing listeners

    PubMed Central

    Peng, Shu-Chen; Lu, Nelson; Chatterjee, Monita

    2009-01-01

    Cochlear implant (CI) recipients have only limited access to fundamental frequency (F0) information, and thus exhibit deficits in speech intonation recognition. For speech intonation, F0 serves as the primary cue, and other potential acoustic cues (e.g., intensity properties) may also contribute. This study examined the effects of acoustic cues being cooperating or conflicting on speech intonation recognition by adult cochlear implant (CI), and normal-hearing (NH) listeners with full-spectrum and spectrally degraded speech stimuli. Identification of speech intonation that signifies question and statement contrasts was measured in 13 CI recipients and 4 NH listeners, using resynthesized bi-syllabic words, where F0 and intensity properties were systematically manipulated. The stimulus set was comprised of tokens whose acoustic cues, i.e., F0 contour and intensity patterns, were either cooperating or conflicting. Subjects identified if each stimulus is a “statement” or a “question” in a single-interval, two-alternative forced-choice (2AFC) paradigm. Logistic models were fitted to the data, and estimated coefficients were compared under cooperating and conflicting conditions, between the subject groups (CI vs. NH), and under full-spectrum and spectrally degraded conditions for NH listeners. The results indicated that CI listeners’ intonation recognition was enhanced by F0 contour and intensity cues being cooperating, but was adversely affected by these cues being conflicting. On the other hand, with full-spectrum stimuli, NH listeners’ intonation recognition was not affected by cues being cooperating or conflicting. The effects of cues being cooperating or conflicting were comparable between the CI group and NH listeners with spectrally-degraded stimuli. These findings suggest the importance of taking multiple acoustic sources for speech recognition into consideration in aural rehabilitation for CI recipients. PMID:19372651

  17. Improving Mobile Phone Speech Recognition by Personalized Amplification: Application in People with Normal Hearing and Mild-to-Moderate Hearing Loss.

    PubMed

    Kam, Anna Chi Shan; Sung, John Ka Keung; Lee, Tan; Wong, Terence Ka Cheong; van Hasselt, Andrew

    In this study, the authors evaluated the effect of personalized amplification on mobile phone speech recognition in people with and without hearing loss. This prospective study used double-blind, within-subjects, repeated measures, controlled trials to evaluate the effectiveness of applying personalized amplification based on the hearing level captured on the mobile device. The personalized amplification settings were created using modified one-third gain targets. The participants in this study included 100 adults of age between 20 and 78 years (60 with age-adjusted normal hearing and 40 with hearing loss). The performance of the participants with personalized amplification and standard settings was compared using both subjective and speech-perception measures. Speech recognition was measured in quiet and in noise using Cantonese disyllabic words. Subjective ratings on the quality, clarity, and comfortableness of the mobile signals were measured with an 11-point visual analog scale. Subjective preferences of the settings were also obtained by a paired-comparison procedure. The personalized amplification application provided better speech recognition via the mobile phone both in quiet and in noise for people with hearing impairment (improved 8 to 10%) and people with normal hearing (improved 1 to 4%). The improvement in speech recognition was significantly better for people with hearing impairment. When the average device output level was matched, more participants preferred to have the individualized gain than not to have it. The personalized amplification application has the potential to improve speech recognition for people with mild-to-moderate hearing loss, as well as people with normal hearing, in particular when listening in noisy environments.

  18. Study of environmental sound source identification based on hidden Markov model for robust speech recognition

    NASA Astrophysics Data System (ADS)

    Nishiura, Takanobu; Nakamura, Satoshi

    2003-10-01

    Humans communicate with each other through speech by focusing on the target speech among environmental sounds in real acoustic environments. We can easily identify the target sound from other environmental sounds. For hands-free speech recognition, the identification of the target speech from environmental sounds is imperative. This mechanism may also be important for a self-moving robot to sense the acoustic environments and communicate with humans. Therefore, this paper first proposes hidden Markov model (HMM)-based environmental sound source identification. Environmental sounds are modeled by three states of HMMs and evaluated using 92 kinds of environmental sounds. The identification accuracy was 95.4%. This paper also proposes a new HMM composition method that composes speech HMMs and an HMM of categorized environmental sounds for robust environmental sound-added speech recognition. As a result of the evaluation experiments, we confirmed that the proposed HMM composition outperforms the conventional HMM composition with speech HMMs and a noise (environmental sound) HMM trained using noise periods prior to the target speech in a captured signal. [Work supported by Ministry of Public Management, Home Affairs, Posts and Telecommunications of Japan.

  19. Speech therapy and voice recognition instrument

    NASA Technical Reports Server (NTRS)

    Cohen, J.; Babcock, M. L.

    1972-01-01

    Characteristics of electronic circuit for examining variations in vocal excitation for diagnostic purposes and in speech recognition for determiniog voice patterns and pitch changes are described. Operation of the circuit is discussed and circuit diagram is provided.

  20. Multilevel Analysis in Analyzing Speech Data

    ERIC Educational Resources Information Center

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  1. A Development of a System Enables Character Input and PC Operation via Voice for a Physically Disabled Person with a Speech Impediment

    NASA Astrophysics Data System (ADS)

    Tanioka, Toshimasa; Egashira, Hiroyuki; Takata, Mayumi; Okazaki, Yasuhisa; Watanabe, Kenzi; Kondo, Hiroki

    We have designed and implemented a PC operation support system for a physically disabled person with a speech impediment via voice. Voice operation is an effective method for a physically disabled person with involuntary movement of the limbs and the head. We have applied a commercial speech recognition engine to develop our system for practical purposes. Adoption of a commercial engine reduces development cost and will contribute to make our system useful to another speech impediment people. We have customized commercial speech recognition engine so that it can recognize the utterance of a person with a speech impediment. We have restricted the words that the recognition engine recognizes and separated a target words from similar words in pronunciation to avoid misrecognition. Huge number of words registered in commercial speech recognition engines cause frequent misrecognition for speech impediments' utterance, because their utterance is not clear and unstable. We have solved this problem by narrowing the choice of input down in a small number and also by registering their ambiguous pronunciations in addition to the original ones. To realize all character inputs and all PC operation with a small number of words, we have designed multiple input modes with categorized dictionaries and have introduced two-step input in each mode except numeral input to enable correct operation with small number of words. The system we have developed is in practical level. The first author of this paper is physically disabled with a speech impediment. He has been able not only character input into PC but also to operate Windows system smoothly by using this system. He uses this system in his daily life. This paper is written by him with this system. At present, the speech recognition is customized to him. It is, however, possible to customize for other users by changing words and registering new pronunciation according to each user's utterance.

  2. Use of Adaptive Digital Signal Processing to Improve Speech Communication for Normally Hearing aand Hearing-Impaired Subjects.

    ERIC Educational Resources Information Center

    Harris, Richard W.; And Others

    1988-01-01

    A two-microphone adaptive digital noise cancellation technique improved word-recognition ability for 20 normal and 12 hearing-impaired adults by reducing multitalker speech babble and speech spectrum noise 18-22 dB. Word recognition improvements averaged 37-50 percent for normal and 27-40 percent for hearing-impaired subjects. Improvement was best…

  3. Multilingual Phoneme Models for Rapid Speech Processing System Development

    DTIC Science & Technology

    2006-09-01

    processes are used to develop an Arabic speech recognition system starting from monolingual English models, In- ternational Phonetic Association (IPA...clusters. It was found that multilingual bootstrapping methods out- perform monolingual English bootstrapping methods on the Arabic evaluation data initially...International Phonetic Alphabet . . . . . . . . . 7 2.3.2 Multilingual vs. Monolingual Speech Recognition 7 2.3.3 Data-Driven Approaches

  4. An Exploration of the Potential of Automatic Speech Recognition to Assist and Enable Receptive Communication in Higher Education

    ERIC Educational Resources Information Center

    Wald, Mike

    2006-01-01

    The potential use of Automatic Speech Recognition to assist receptive communication is explored. The opportunities and challenges that this technology presents students and staff to provide captioning of speech online or in classrooms for deaf or hard of hearing students and assist blind, visually impaired or dyslexic learners to read and search…

  5. A novel probabilistic framework for event-based speech recognition

    NASA Astrophysics Data System (ADS)

    Juneja, Amit; Espy-Wilson, Carol

    2003-10-01

    One of the reasons for unsatisfactory performance of the state-of-the-art automatic speech recognition (ASR) systems is the inferior acoustic modeling of low-level acoustic-phonetic information in the speech signal. An acoustic-phonetic approach to ASR, on the other hand, explicitly targets linguistic information in the speech signal, but such a system for continuous speech recognition (CSR) is not known to exist. A probabilistic and statistical framework for CSR based on the idea of the representation of speech sounds by bundles of binary valued articulatory phonetic features is proposed. Multiple probabilistic sequences of linguistically motivated landmarks are obtained using binary classifiers of manner phonetic features-syllabic, sonorant and continuant-and the knowledge-based acoustic parameters (APs) that are acoustic correlates of those features. The landmarks are then used for the extraction of knowledge-based APs for source and place phonetic features and their binary classification. Probabilistic landmark sequences are constrained using manner class language models for isolated or connected word recognition. The proposed method could overcome the disadvantages encountered by the early acoustic-phonetic knowledge-based systems that led the ASR community to switch to systems highly dependent on statistical pattern analysis methods and probabilistic language or grammar models.

  6. Effect of a Bluetooth-implemented hearing aid on speech recognition performance: subjective and objective measurement.

    PubMed

    Kim, Min-Beom; Chung, Won-Ho; Choi, Jeesun; Hong, Sung Hwa; Cho, Yang-Sun; Park, Gyuseok; Lee, Sangmin

    2014-06-01

    The object was to evaluate speech perception improvement through Bluetooth-implemented hearing aids in hearing-impaired adults. Thirty subjects with bilateral symmetric moderate sensorineural hearing loss participated in this study. A Bluetooth-implemented hearing aid was fitted unilaterally in all study subjects. Objective speech recognition score and subjective satisfaction were measured with a Bluetooth-implemented hearing aid to replace the acoustic connection from either a cellular phone or a loudspeaker system. In each system, participants were assigned to 4 conditions: wireless speech signal transmission into hearing aid (wireless mode) in quiet or noisy environment and conventional speech signal transmission using external microphone of hearing aid (conventional mode) in quiet or noisy environment. Also, participants completed questionnaires to investigate subjective satisfaction. Both cellular phone and loudspeaker system situation, participants showed improvements in sentence and word recognition scores with wireless mode compared to conventional mode in both quiet and noise conditions (P < .001). Participants also reported subjective improvements, including better sound quality, less noise interference, and better accuracy naturalness, when using the wireless mode (P < .001). Bluetooth-implemented hearing aids helped to improve subjective and objective speech recognition performances in quiet and noisy environments during the use of electronic audio devices.

  7. Speech recognition in individuals with sensorineural hearing loss.

    PubMed

    de Andrade, Adriana Neves; Iorio, Maria Cecilia Martinelli; Gil, Daniela

    2016-01-01

    Hearing loss can negatively influence the communication performance of individuals, who should be evaluated with suitable material and in situations of listening close to those found in everyday life. To analyze and compare the performance of patients with mild-to-moderate sensorineural hearing loss in speech recognition tests carried out in silence and with noise, according to the variables ear (right and left) and type of stimulus presentation. The study included 19 right-handed individuals with mild-to-moderate symmetrical bilateral sensorineural hearing loss, submitted to the speech recognition test with words in different modalities and speech test with white noise and pictures. There was no significant difference between right and left ears in any of the tests. The mean number of correct responses in the speech recognition test with pictures, live voice, and recorded monosyllables was 97.1%, 85.9%, and 76.1%, respectively, whereas after the introduction of noise, the performance decreased to 72.6% accuracy. The best performances in the Speech Recognition Percentage Index were obtained using monosyllabic stimuli, represented by pictures presented in silence, with no significant differences between the right and left ears. After the introduction of competitive noise, there was a decrease in individuals' performance. Copyright © 2015 Associação Brasileira de Otorrinolaringologia e Cirurgia Cérvico-Facial. Published by Elsevier Editora Ltda. All rights reserved.

  8. Relationships between Structural and Acoustic Properties of Maternal Talk and Children's Early Word Recognition

    ERIC Educational Resources Information Center

    Suttora, Chiara; Salerni, Nicoletta; Zanchi, Paola; Zampini, Laura; Spinelli, Maria; Fasolo, Mirco

    2017-01-01

    This study aimed to investigate specific associations between structural and acoustic characteristics of infant-directed (ID) speech and word recognition. Thirty Italian-acquiring children and their mothers were tested when the children were 1;3. Children's word recognition was measured with the looking-while-listening task. Maternal ID speech was…

  9. Speech Recognition in Adults with Cochlear Implants: The Effects of Working Memory, Phonological Sensitivity, and Aging

    ERIC Educational Resources Information Center

    Moberly, Aaron C.; Harris, Michael S.; Boyce, Lauren; Nittrouer, Susan

    2017-01-01

    Purpose: Models of speech recognition suggest that "top-down" linguistic and cognitive functions, such as use of phonotactic constraints and working memory, facilitate recognition under conditions of degradation, such as in noise. The question addressed in this study was what happens to these functions when a listener who has experienced…

  10. The Effect of Asymmetrical Signal Degradation on Binaural Speech Recognition in Children and Adults.

    ERIC Educational Resources Information Center

    Rothpletz, Ann M.; Tharpe, Anne Marie; Grantham, D. Wesley

    2004-01-01

    To determine the effect of asymmetrical signal degradation on binaural speech recognition, 28 children and 14 adults were administered a sentence recognition task amidst multitalker babble. There were 3 listening conditions: (a) monaural, with mild degradation in 1 ear; (b) binaural, with mild degradation in both ears (symmetric degradation); and…

  11. Learning Models and Real-Time Speech Recognition.

    ERIC Educational Resources Information Center

    Danforth, Douglas G.; And Others

    This report describes the construction and testing of two "psychological" learning models for the purpose of computer recognition of human speech over the telephone. One of the two models was found to be superior in all tests. A regression analysis yielded a 92.3% recognition rate for 14 subjects ranging in age from 6 to 13 years. Tests…

  12. Use of Authentic-Speech Technique for Teaching Sound Recognition to EFL Students

    ERIC Educational Resources Information Center

    Sersen, William J.

    2011-01-01

    The main objective of this research was to test an authentic-speech technique for improving the sound-recognition skills of EFL (English as a foreign language) students at Roi-Et Rajabhat University. The secondary objective was to determine the correlation, if any, between students' self-evaluation of sound-recognition progress and the actual…

  13. What happens to the motor theory of perception when the motor system is damaged?

    PubMed

    Stasenko, Alena; Garcea, Frank E; Mahon, Bradford Z

    2013-09-01

    Motor theories of perception posit that motor information is necessary for successful recognition of actions. Perhaps the most well known of this class of proposals is the motor theory of speech perception, which argues that speech recognition is fundamentally a process of identifying the articulatory gestures (i.e. motor representations) that were used to produce the speech signal. Here we review neuropsychological evidence from patients with damage to the motor system, in the context of motor theories of perception applied to both manual actions and speech. Motor theories of perception predict that patients with motor impairments will have impairments for action recognition. Contrary to that prediction, the available neuropsychological evidence indicates that recognition can be spared despite profound impairments to production. These data falsify strong forms of the motor theory of perception, and frame new questions about the dynamical interactions that govern how information is exchanged between input and output systems.

  14. What happens to the motor theory of perception when the motor system is damaged?

    PubMed Central

    Stasenko, Alena; Garcea, Frank E.; Mahon, Bradford Z.

    2016-01-01

    Motor theories of perception posit that motor information is necessary for successful recognition of actions. Perhaps the most well known of this class of proposals is the motor theory of speech perception, which argues that speech recognition is fundamentally a process of identifying the articulatory gestures (i.e. motor representations) that were used to produce the speech signal. Here we review neuropsychological evidence from patients with damage to the motor system, in the context of motor theories of perception applied to both manual actions and speech. Motor theories of perception predict that patients with motor impairments will have impairments for action recognition. Contrary to that prediction, the available neuropsychological evidence indicates that recognition can be spared despite profound impairments to production. These data falsify strong forms of the motor theory of perception, and frame new questions about the dynamical interactions that govern how information is exchanged between input and output systems. PMID:26823687

  15. Sentence Recognition Prediction for Hearing-impaired Listeners in Stationary and Fluctuation Noise With FADE

    PubMed Central

    Schädler, Marc René; Warzybok, Anna; Meyer, Bernd T.; Brand, Thomas

    2016-01-01

    To characterize the individual patient’s hearing impairment as obtained with the matrix sentence recognition test, a simulation Framework for Auditory Discrimination Experiments (FADE) is extended here using the Attenuation and Distortion (A+D) approach by Plomp as a blueprint for setting the individual processing parameters. FADE has been shown to predict the outcome of both speech recognition tests and psychoacoustic experiments based on simulations using an automatic speech recognition system requiring only few assumptions. It builds on the closed-set matrix sentence recognition test which is advantageous for testing individual speech recognition in a way comparable across languages. Individual predictions of speech recognition thresholds in stationary and in fluctuating noise were derived using the audiogram and an estimate of the internal level uncertainty for modeling the individual Plomp curves fitted to the data with the Attenuation (A-) and Distortion (D-) parameters of the Plomp approach. The “typical” audiogram shapes from Bisgaard et al with or without a “typical” level uncertainty and the individual data were used for individual predictions. As a result, the individualization of the level uncertainty was found to be more important than the exact shape of the individual audiogram to accurately model the outcome of the German Matrix test in stationary or fluctuating noise for listeners with hearing impairment. The prediction accuracy of the individualized approach also outperforms the (modified) Speech Intelligibility Index approach which is based on the individual threshold data only. PMID:27604782

  16. Development of an automated speech recognition interface for personal emergency response systems

    PubMed Central

    Hamill, Melinda; Young, Vicky; Boger, Jennifer; Mihailidis, Alex

    2009-01-01

    Background Demands on long-term-care facilities are predicted to increase at an unprecedented rate as the baby boomer generation reaches retirement age. Aging-in-place (i.e. aging at home) is the desire of most seniors and is also a good option to reduce the burden on an over-stretched long-term-care system. Personal Emergency Response Systems (PERSs) help enable older adults to age-in-place by providing them with immediate access to emergency assistance. Traditionally they operate with push-button activators that connect the occupant via speaker-phone to a live emergency call-centre operator. If occupants do not wear the push button or cannot access the button, then the system is useless in the event of a fall or emergency. Additionally, a false alarm or failure to check-in at a regular interval will trigger a connection to a live operator, which can be unwanted and intrusive to the occupant. This paper describes the development and testing of an automated, hands-free, dialogue-based PERS prototype. Methods The prototype system was built using a ceiling mounted microphone array, an open-source automatic speech recognition engine, and a 'yes' and 'no' response dialog modelled after an existing call-centre protocol. Testing compared a single microphone versus a microphone array with nine adults in both noisy and quiet conditions. Dialogue testing was completed with four adults. Results and discussion The microphone array demonstrated improvement over the single microphone. In all cases, dialog testing resulted in the system reaching the correct decision about the kind of assistance the user was requesting. Further testing is required with elderly voices and under different noise conditions to ensure the appropriateness of the technology. Future developments include integration of the system with an emergency detection method as well as communication enhancement using features such as barge-in capability. Conclusion The use of an automated dialog-based PERS has the potential to provide users with more autonomy in decisions regarding their own health and more privacy in their own home. PMID:19583876

  17. Assessment of Spectral and Temporal Resolution in Cochlear Implant Users Using Psychoacoustic Discrimination and Speech Cue Categorization.

    PubMed

    Winn, Matthew B; Won, Jong Ho; Moon, Il Joon

    This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). The authors hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. The authors further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Nineteen cochlear implant listeners and 10 listeners with normal hearing participated in a suite of tasks that included spectral ripple discrimination, temporal modulation detection, and syllable categorization, which was split into a spectral cue-based task (targeting the /ba/-/da/ contrast) and a timing cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for cochlear implant listeners. Cochlear implant users were generally less successful at utilizing both spectral and temporal cues for categorization compared with listeners with normal hearing. For the cochlear implant listener group, spectral ripple discrimination was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. Temporal modulation detection using 100- and 10-Hz-modulated noise was not correlated either with the cochlear implant subjects' categorization of voice onset time or with word recognition. Word recognition was correlated more closely with categorization of the controlled speech cues than with performance on the psychophysical discrimination tasks. When evaluating people with cochlear implants, controlled speech-based stimuli are feasible to use in tests of auditory cue categorization, to complement traditional measures of auditory discrimination. Stimuli based on specific speech cues correspond to counterpart nonlinguistic measures of discrimination, but potentially show better correspondence with speech perception more generally. The ubiquity of the spectral (formant transition) and temporal (voice onset time) stimulus dimensions across languages highlights the potential to use this testing approach even in cases where English is not the native language.

  18. Assessment of spectral and temporal resolution in cochlear implant users using psychoacoustic discrimination and speech cue categorization

    PubMed Central

    Winn, Matthew B.; Won, Jong Ho; Moon, Il Joon

    2016-01-01

    Objectives This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). We hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. We further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Design Nineteen CI listeners and 10 listeners with normal hearing (NH) participated in a suite of tasks that included spectral ripple discrimination (SRD), temporal modulation detection (TMD), and syllable categorization, which was split into a spectral-cue-based task (targeting the /ba/-/da/ contrast) and a timing-cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated in order to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression in order to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for CI listeners. Results CI users were generally less successful at utilizing both spectral and temporal cues for categorization compared to listeners with normal hearing. For the CI listener group, SRD was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. TMD using 100 Hz and 10 Hz modulated noise was not correlated with the CI subjects’ categorization of VOT, nor with word recognition. Word recognition was correlated more closely with categorization of the controlled speech cues than with performance on the psychophysical discrimination tasks. Conclusions When evaluating people with cochlear implants, controlled speech-based stimuli are feasible to use in tests of auditory cue categorization, to complement traditional measures of auditory discrimination. Stimuli based on specific speech cues correspond to counterpart non-linguistic measures of discrimination, but potentially show better correspondence with speech perception more generally. The ubiquity of the spectral (formant transition) and temporal (VOT) stimulus dimensions across languages highlights the potential to use this testing approach even in cases where English is not the native language. PMID:27438871

  19. Speech variability effects on recognition accuracy associated with concurrent task performance by pilots

    NASA Technical Reports Server (NTRS)

    Simpson, C. A.

    1985-01-01

    In the present study of the responses of pairs of pilots to aircraft warning classification tasks using an isolated word, speaker-dependent speech recognition system, the induced stress was manipulated by means of different scoring procedures for the classification task and by the inclusion of a competitive manual control task. Both speech patterns and recognition accuracy were analyzed, and recognition errors were recorded by type for an isolated word speaker-dependent system and by an offline technique for a connected word speaker-dependent system. While errors increased with task loading for the isolated word system, there was no such effect for task loading in the case of the connected word system.

  20. Speech recognition in advanced rotorcraft - Using speech controls to reduce manual control overload

    NASA Technical Reports Server (NTRS)

    Vidulich, Michael A.; Bortolussi, Michael R.

    1988-01-01

    An experiment has been conducted to ascertain the usefulness of helicopter pilot speech controls and their effect on time-sharing performance, under the impetus of multiple-resource theories of attention which predict that time-sharing should be more efficient with mixed manual and speech controls than with all-manual ones. The test simulation involved an advanced, single-pilot scout/attack helicopter. Performance and subjective workload levels obtained supported the claimed utility of speech recognition-based controls; specifically, time-sharing performance was improved while preparing a data-burst transmission of information during helicopter hover.

  1. "Rate My Therapist": Automated Detection of Empathy in Drug and Alcohol Counseling via Speech and Language Processing.

    PubMed

    Xiao, Bo; Imel, Zac E; Georgiou, Panayiotis G; Atkins, David C; Narayanan, Shrikanth S

    2015-01-01

    The technology for evaluating patient-provider interactions in psychotherapy-observational coding-has not changed in 70 years. It is labor-intensive, error prone, and expensive, limiting its use in evaluating psychotherapy in the real world. Engineering solutions from speech and language processing provide new methods for the automatic evaluation of provider ratings from session recordings. The primary data are 200 Motivational Interviewing (MI) sessions from a study on MI training methods with observer ratings of counselor empathy. Automatic Speech Recognition (ASR) was used to transcribe sessions, and the resulting words were used in a text-based predictive model of empathy. Two supporting datasets trained the speech processing tasks including ASR (1200 transcripts from heterogeneous psychotherapy sessions and 153 transcripts and session recordings from 5 MI clinical trials). The accuracy of computationally-derived empathy ratings were evaluated against human ratings for each provider. Computationally-derived empathy scores and classifications (high vs. low) were highly accurate against human-based codes and classifications, with a correlation of 0.65 and F-score (a weighted average of sensitivity and specificity) of 0.86, respectively. Empathy prediction using human transcription as input (as opposed to ASR) resulted in a slight increase in prediction accuracies, suggesting that the fully automatic system with ASR is relatively robust. Using speech and language processing methods, it is possible to generate accurate predictions of provider performance in psychotherapy from audio recordings alone. This technology can support large-scale evaluation of psychotherapy for dissemination and process studies.

  2. Incorporating Speech Recognition into a Natural User Interface

    NASA Technical Reports Server (NTRS)

    Chapa, Nicholas

    2017-01-01

    The Augmented/ Virtual Reality (AVR) Lab has been working to study the applicability of recent virtual and augmented reality hardware and software to KSC operations. This includes the Oculus Rift, HTC Vive, Microsoft HoloLens, and Unity game engine. My project in this lab is to integrate voice recognition and voice commands into an easy to modify system that can be added to an existing portion of a Natural User Interface (NUI). A NUI is an intuitive and simple to use interface incorporating visual, touch, and speech recognition. The inclusion of speech recognition capability will allow users to perform actions or make inquiries using only their voice. The simplicity of needing only to speak to control an on-screen object or enact some digital action means that any user can quickly become accustomed to using this system. Multiple programs were tested for use in a speech command and recognition system. Sphinx4 translates speech to text using a Hidden Markov Model (HMM) based Language Model, an Acoustic Model, and a word Dictionary running on Java. PocketSphinx had similar functionality to Sphinx4 but instead ran on C. However, neither of these programs were ideal as building a Java or C wrapper slowed performance. The most ideal speech recognition system tested was the Unity Engine Grammar Recognizer. A Context Free Grammar (CFG) structure is written in an XML file to specify the structure of phrases and words that will be recognized by Unity Grammar Recognizer. Using Speech Recognition Grammar Specification (SRGS) 1.0 makes modifying the recognized combinations of words and phrases very simple and quick to do. With SRGS 1.0, semantic information can also be added to the XML file, which allows for even more control over how spoken words and phrases are interpreted by Unity. Additionally, using a CFG with SRGS 1.0 produces a Finite State Machine (FSM) functionality limiting the potential for incorrectly heard words or phrases. The purpose of my project was to investigate options for a Speech Recognition System. To that end I attempted to integrate Sphinx4 into a user interface. Sphinx4 had great accuracy and is the only free program able to perform offline speech dictation. However it had a limited dictionary of words that could be recognized, single syllable words were almost impossible for it to hear, and since it ran on Java it could not be integrated into the Unity based NUI. PocketSphinx ran much faster than Sphinx4 which would've made it ideal as a plugin to the Unity NUI, unfortunately creating a C# wrapper for the C code made the program unusable with Unity due to the wrapper slowing code execution and class files becoming unreachable. Unity Grammar Recognizer is the ideal speech recognition interface, it is flexible in recognizing multiple variations of the same command. It is also the most accurate program in recognizing speech due to using an XML grammar to specify speech structure instead of relying solely on a Dictionary and Language model. The Unity Grammar Recognizer will be used with the NUI for these reasons as well as being written in C# which further simplifies the incorporation.

  3. A novel speech processing algorithm based on harmonicity cues in cochlear implant

    NASA Astrophysics Data System (ADS)

    Wang, Jian; Chen, Yousheng; Zhang, Zongping; Chen, Yan; Zhang, Weifeng

    2017-08-01

    This paper proposed a novel speech processing algorithm in cochlear implant, which used harmonicity cues to enhance tonal information in Mandarin Chinese speech recognition. The input speech was filtered by a 4-channel band-pass filter bank. The frequency ranges for the four bands were: 300-621, 621-1285, 1285-2657, and 2657-5499 Hz. In each pass band, temporal envelope and periodicity cues (TEPCs) below 400 Hz were extracted by full wave rectification and low-pass filtering. The TEPCs were modulated by a sinusoidal carrier, the frequency of which was fundamental frequency (F0) and its harmonics most close to the center frequency of each band. Signals from each band were combined together to obtain an output speech. Mandarin tone, word, and sentence recognition in quiet listening conditions were tested for the extensively used continuous interleaved sampling (CIS) strategy and the novel F0-harmonic algorithm. Results found that the F0-harmonic algorithm performed consistently better than CIS strategy in Mandarin tone, word, and sentence recognition. In addition, sentence recognition rate was higher than word recognition rate, as a result of contextual information in the sentence. Moreover, tone 3 and 4 performed better than tone 1 and tone 2, due to the easily identified features of the former. In conclusion, the F0-harmonic algorithm could enhance tonal information in cochlear implant speech processing due to the use of harmonicity cues, thereby improving Mandarin tone, word, and sentence recognition. Further study will focus on the test of the F0-harmonic algorithm in noisy listening conditions.

  4. Speech recognition in one- and two-talker maskers in school-age children and adults: Development of perceptual masking and glimpsing

    PubMed Central

    Buss, Emily; Leibold, Lori J.; Porter, Heather L.; Grose, John H.

    2017-01-01

    Children perform more poorly than adults on a wide range of masked speech perception paradigms, but this effect is particularly pronounced when the masker itself is also composed of speech. The present study evaluated two factors that might contribute to this effect: the ability to perceptually isolate the target from masker speech, and the ability to recognize target speech based on sparse cues (glimpsing). Speech reception thresholds (SRTs) were estimated for closed-set, disyllabic word recognition in children (5–16 years) and adults in a one- or two-talker masker. Speech maskers were 60 dB sound pressure level (SPL), and they were either presented alone or in combination with a 50-dB-SPL speech-shaped noise masker. There was an age effect overall, but performance was adult-like at a younger age for the one-talker than the two-talker masker. Noise tended to elevate SRTs, particularly for older children and adults, and when summed with the one-talker masker. Removing time-frequency epochs associated with a poor target-to-masker ratio markedly improved SRTs, with larger effects for younger listeners; the age effect was not eliminated, however. Results were interpreted as indicating that development of speech-in-speech recognition is likely impacted by development of both perceptual masking and the ability recognize speech based on sparse cues. PMID:28464682

  5. Working memory capacity may influence perceived effort during aided speech recognition in noise.

    PubMed

    Rudner, Mary; Lunner, Thomas; Behrens, Thomas; Thorén, Elisabet Sundewall; Rönnberg, Jerker

    2012-09-01

    Recently there has been interest in using subjective ratings as a measure of perceived effort during speech recognition in noise. Perceived effort may be an indicator of cognitive load. Thus, subjective effort ratings during speech recognition in noise may covary both with signal-to-noise ratio (SNR) and individual cognitive capacity. The present study investigated the relation between subjective ratings of the effort involved in listening to speech in noise, speech recognition performance, and individual working memory (WM) capacity in hearing impaired hearing aid users. In two experiments, participants with hearing loss rated perceived effort during aided speech perception in noise. Noise type and SNR were manipulated in both experiments, and in the second experiment hearing aid compression release settings were also manipulated. Speech recognition performance was measured along with WM capacity. There were 46 participants in all with bilateral mild to moderate sloping hearing loss. In Experiment 1 there were 16 native Danish speakers (eight women and eight men) with a mean age of 63.5 yr (SD = 12.1) and average pure tone (PT) threshold of 47. 6 dB (SD = 9.8). In Experiment 2 there were 30 native Swedish speakers (19 women and 11 men) with a mean age of 70 yr (SD = 7.8) and average PT threshold of 45.8 dB (SD = 6.6). A visual analog scale (VAS) was used for effort rating in both experiments. In Experiment 1, effort was rated at individually adapted SNRs while in Experiment 2 it was rated at fixed SNRs. Speech recognition in noise performance was measured using adaptive procedures in both experiments with Dantale II sentences in Experiment 1 and Hagerman sentences in Experiment 2. WM capacity was measured using a letter-monitoring task in Experiment 1 and the reading span task in Experiment 2. In both experiments, there was a strong and significant relation between rated effort and SNR that was independent of individual WM capacity, whereas the relation between rated effort and noise type seemed to be influenced by individual WM capacity. Experiment 2 showed that hearing aid compression setting influenced rated effort. Subjective ratings of the effort involved in speech recognition in noise reflect SNRs, and individual cognitive capacity seems to influence relative rating of noise type. American Academy of Audiology.

  6. Blind speech separation system for humanoid robot with FastICA for audio filtering and separation

    NASA Astrophysics Data System (ADS)

    Budiharto, Widodo; Santoso Gunawan, Alexander Agung

    2016-07-01

    Nowadays, there are many developments in building intelligent humanoid robot, mainly in order to handle voice and image. In this research, we propose blind speech separation system using FastICA for audio filtering and separation that can be used in education or entertainment. Our main problem is to separate the multi speech sources and also to filter irrelevant noises. After speech separation step, the results will be integrated with our previous speech and face recognition system which is based on Bioloid GP robot and Raspberry Pi 2 as controller. The experimental results show the accuracy of our blind speech separation system is about 88% in command and query recognition cases.

  7. New Ideas for Speech Recognition and Related Technologies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzrichter, J F

    The ideas relating to the use of organ motion sensors for the purposes of speech recognition were first described by.the author in spring 1994. During the past year, a series of productive collaborations between the author, Tom McEwan and Larry Ng ensued and have lead to demonstrations, new sensor ideas, and algorithmic descriptions of a large number of speech recognition concepts. This document summarizes the basic concepts of recognizing speech once organ motions have been obtained. Micro power radars and their uses for the measurement of body organ motions, such as those of the heart and lungs, have been demonstratedmore » by Tom McEwan over the past two years. McEwan and I conducted a series of experiments, using these instruments, on vocal organ motions beginning in late spring, during which we observed motions of vocal folds (i.e., cords), tongue, jaw, and related organs that are very useful for speech recognition and other purposes. These will be reviewed in a separate paper. Since late summer 1994, Lawrence Ng and I have worked to make many of the initial recognition ideas more rigorous and to investigate the applications of these new ideas to new speech recognition algorithms, to speech coding, and to speech synthesis. I introduce some of those ideas in section IV of this document, and we describe them more completely in the document following this one, UCRL-UR-120311. For the design and operation of micro-power radars and their application to body organ motions, the reader may contact Tom McEwan directly. The capability for using EM sensors (i.e., radar units) to measure body organ motions and positions has been available for decades. Impediments to their use appear to have been size, excessive power, lack of resolution, and lack of understanding of the value of organ motion measurements, especially as applied to speech related technologies. However, with the invention of very low power, portable systems as demonstrated by McEwan at LLNL researchers have begun to think differently about practical applications of such radars. In particular, his demonstrations of heart and lung motions have opened up many new areas of application for human and animal measurements.« less

  8. Human phoneme recognition depending on speech-intrinsic variability.

    PubMed

    Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger

    2010-11-01

    The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).

  9. A voice-input voice-output communication aid for people with severe speech impairment.

    PubMed

    Hawley, Mark S; Cunningham, Stuart P; Green, Phil D; Enderby, Pam; Palmer, Rebecca; Sehgal, Siddharth; O'Neill, Peter

    2013-01-01

    A new form of augmentative and alternative communication (AAC) device for people with severe speech impairment-the voice-input voice-output communication aid (VIVOCA)-is described. The VIVOCA recognizes the disordered speech of the user and builds messages, which are converted into synthetic speech. System development was carried out employing user-centered design and development methods, which identified and refined key requirements for the device. A novel methodology for building small vocabulary, speaker-dependent automatic speech recognizers with reduced amounts of training data, was applied. Experiments showed that this method is successful in generating good recognition performance (mean accuracy 96%) on highly disordered speech, even when recognition perplexity is increased. The selected message-building technique traded off various factors including speed of message construction and range of available message outputs. The VIVOCA was evaluated in a field trial by individuals with moderate to severe dysarthria and confirmed that they can make use of the device to produce intelligible speech output from disordered speech input. The trial highlighted some issues which limit the performance and usability of the device when applied in real usage situations, with mean recognition accuracy of 67% in these circumstances. These limitations will be addressed in future work.

  10. The effect of sensorineural hearing loss and tinnitus on speech recognition over air and bone conduction military communications headsets.

    PubMed

    Manning, Candice; Mermagen, Timothy; Scharine, Angelique

    2017-06-01

    Military personnel are at risk for hearing loss due to noise exposure during deployment (USACHPPM, 2008). Despite mandated use of hearing protection, hearing loss and tinnitus are prevalent due to reluctance to use hearing protection. Bone conduction headsets can offer good speech intelligibility for normal hearing (NH) listeners while allowing the ears to remain open in quiet environments and the use of hearing protection when needed. Those who suffer from tinnitus, the experience of perceiving a sound not produced by an external source, often show degraded speech recognition; however, it is unclear whether this is a result of decreased hearing sensitivity or increased distractibility (Moon et al., 2015). It has been suggested that the vibratory stimulation of a bone conduction headset might ameliorate the effects of tinnitus on speech perception; however, there is currently no research to support or refute this claim (Hoare et al., 2014). Speech recognition of words presented over air conduction and bone conduction headsets was measured for three groups of listeners: NH, sensorineural hearing impaired, and/or tinnitus sufferers. Three levels of speech-to-noise (SNR = 0, -6, -12 dB) were created by embedding speech items in pink noise. Better speech recognition performance was observed with the bone conduction headset regardless of hearing profile, and speech intelligibility was a function of SNR. Discussion will include study limitations and the implications of these findings for those serving in the military. Published by Elsevier B.V.

  11. Speaker normalization for chinese vowel recognition in cochlear implants.

    PubMed

    Luo, Xin; Fu, Qian-Jie

    2005-07-01

    Because of the limited spectra-temporal resolution associated with cochlear implants, implant patients often have greater difficulty with multitalker speech recognition. The present study investigated whether multitalker speech recognition can be improved by applying speaker normalization techniques to cochlear implant speech processing. Multitalker Chinese vowel recognition was tested with normal-hearing Chinese-speaking subjects listening to a 4-channel cochlear implant simulation, with and without speaker normalization. For each subject, speaker normalization was referenced to the speaker that produced the best recognition performance under conditions without speaker normalization. To match the remaining speakers to this "optimal" output pattern, the overall frequency range of the analysis filter bank was adjusted for each speaker according to the ratio of the mean third formant frequency values between the specific speaker and the reference speaker. Results showed that speaker normalization provided a small but significant improvement in subjects' overall recognition performance. After speaker normalization, subjects' patterns of recognition performance across speakers changed, demonstrating the potential for speaker-dependent effects with the proposed normalization technique.

  12. Speech Perception, Word Recognition and the Structure of the Lexicon. Research on Speech Perception Progress Report No. 10.

    ERIC Educational Resources Information Center

    Pisoni, David B.; And Others

    The results of three projects concerned with auditory word recognition and the structure of the lexicon are reported in this paper. The first project described was designed to test experimentally several specific predictions derived from MACS, a simulation model of the Cohort Theory of word recognition. The second project description provides the…

  13. The Development of the Orthographic Consistency Effect in Speech Recognition: From Sublexical to Lexical Involvement

    ERIC Educational Resources Information Center

    Ventura, Paulo; Morais, Jose; Kolinsky, Regine

    2007-01-01

    The influence of orthography on children's on-line auditory word recognition was studied from the end of Grade 2 to the end of Grade 4, by examining the orthographic consistency effect [Ziegler, J. C., & Ferrand, L. (1998). Orthography shapes the perception of speech: The consistency effect in auditory recognition. "Psychonomic Bulletin & Review",…

  14. Speech and Language Therapy Under an Automated Stimulus Control System.

    ERIC Educational Resources Information Center

    Garrett, Edgar Ray

    Programed instruction for speech and language therapy, based upon stimulus control programing and presented by a completely automated teaching machine, was evaluated with 32 mentally retarded children, 20 children with language disorders (childhood aphasia), six adult aphasics, and 60 normal elementary school children. Posttesting with the…

  15. Do What I Say! Voice Recognition Makes Major Advances.

    ERIC Educational Resources Information Center

    Ruley, C. Dorsey

    1994-01-01

    Explains voice recognition technology applications in the workplace, schools, and libraries. Highlights include a voice-controlled work station using the DragonDictate system that can be used with dyslexic students, converting text to speech, and converting speech to text. (LRW)

  16. Emotional recognition from the speech signal for a virtual education agent

    NASA Astrophysics Data System (ADS)

    Tickle, A.; Raghu, S.; Elshaw, M.

    2013-06-01

    This paper explores the extraction of features from the speech wave to perform intelligent emotion recognition. A feature extract tool (openSmile) was used to obtain a baseline set of 998 acoustic features from a set of emotional speech recordings from a microphone. The initial features were reduced to the most important ones so recognition of emotions using a supervised neural network could be performed. Given that the future use of virtual education agents lies with making the agents more interactive, developing agents with the capability to recognise and adapt to the emotional state of humans is an important step.

  17. Robotics control using isolated word recognition of voice input

    NASA Technical Reports Server (NTRS)

    Weiner, J. M.

    1977-01-01

    A speech input/output system is presented that can be used to communicate with a task oriented system. Human speech commands and synthesized voice output extend conventional information exchange capabilities between man and machine by utilizing audio input and output channels. The speech input facility is comprised of a hardware feature extractor and a microprocessor implemented isolated word or phrase recognition system. The recognizer offers a medium sized (100 commands), syntactically constrained vocabulary, and exhibits close to real time performance. The major portion of the recognition processing required is accomplished through software, minimizing the complexity of the hardware feature extractor.

  18. Measuring the effects of spectral smearing and enhancement on speech recognition in noise for adults and children

    PubMed Central

    Nittrouer, Susan; Tarr, Eric; Wucinich, Taylor; Moberly, Aaron C.; Lowenstein, Joanna H.

    2015-01-01

    Broadened auditory filters associated with sensorineural hearing loss have clearly been shown to diminish speech recognition in noise for adults, but far less is known about potential effects for children. This study examined speech recognition in noise for adults and children using simulated auditory filters of different widths. Specifically, 5 groups (20 listeners each) of adults or children (5 and 7 yrs), were asked to recognize sentences in speech-shaped noise. Seven-year-olds listened at 0 dB signal-to-noise ratio (SNR) only; 5-yr-olds listened at +3 or 0 dB SNR; and adults listened at 0 or −3 dB SNR. Sentence materials were processed both to smear the speech spectrum (i.e., simulate broadened filters), and to enhance the spectrum (i.e., simulate narrowed filters). Results showed: (1) Spectral smearing diminished recognition for listeners of all ages; (2) spectral enhancement did not improve recognition, and in fact diminished it somewhat; and (3) interactions were observed between smearing and SNR, but only for adults. That interaction made age effects difficult to gauge. Nonetheless, it was concluded that efforts to diagnose the extent of broadening of auditory filters and to develop techniques to correct this condition could benefit patients with hearing loss, especially children. PMID:25920851

  19. Histogram equalization with Bayesian estimation for noise robust speech recognition.

    PubMed

    Suh, Youngjoo; Kim, Hoirin

    2018-02-01

    The histogram equalization approach is an efficient feature normalization technique for noise robust automatic speech recognition. However, it suffers from performance degradation when some fundamental conditions are not satisfied in the test environment. To remedy these limitations of the original histogram equalization methods, class-based histogram equalization approach has been proposed. Although this approach showed substantial performance improvement under noise environments, it still suffers from performance degradation due to the overfitting problem when test data are insufficient. To address this issue, the proposed histogram equalization technique employs the Bayesian estimation method in the test cumulative distribution function estimation. It was reported in a previous study conducted on the Aurora-4 task that the proposed approach provided substantial performance gains in speech recognition systems based on the acoustic modeling of the Gaussian mixture model-hidden Markov model. In this work, the proposed approach was examined in speech recognition systems with deep neural network-hidden Markov model (DNN-HMM), the current mainstream speech recognition approach where it also showed meaningful performance improvement over the conventional maximum likelihood estimation-based method. The fusion of the proposed features with the mel-frequency cepstral coefficients provided additional performance gains in DNN-HMM systems, which otherwise suffer from performance degradation in the clean test condition.

  20. A longitudinal study of the bilateral benefit in children with bilateral cochlear implants.

    PubMed

    Asp, Filip; Mäki-Torkko, Elina; Karltorp, Eva; Harder, Henrik; Hergils, Leif; Eskilsson, Gunnar; Stenfelt, Stefan

    2015-02-01

    To study the development of the bilateral benefit in children using bilateral cochlear implants by measurements of speech recognition and sound localization. Bilateral and unilateral speech recognition in quiet, in multi-source noise, and horizontal sound localization was measured at three occasions during a two-year period, without controlling for age or implant experience. Longitudinal and cross-sectional analyses were performed. Results were compared to cross-sectional data from children with normal hearing. Seventy-eight children aged 5.1-11.9 years, with a mean bilateral cochlear implant experience of 3.3 years and a mean age of 7.8 years, at inclusion in the study. Thirty children with normal hearing aged 4.8-9.0 years provided normative data. For children with cochlear implants, bilateral and unilateral speech recognition in quiet was comparable whereas a bilateral benefit for speech recognition in noise and sound localization was found at all three test occasions. Absolute performance was lower than in children with normal hearing. Early bilateral implantation facilitated sound localization. A bilateral benefit for speech recognition in noise and sound localization continues to exist over time for children with bilateral cochlear implants, but no relative improvement is found after three years of bilateral cochlear implant experience.

  1. Phonological mismatch makes aided speech recognition in noise cognitively taxing.

    PubMed

    Rudner, Mary; Foo, Catharina; Rönnberg, Jerker; Lunner, Thomas

    2007-12-01

    The working memory framework for Ease of Language Understanding predicts that speech processing becomes more effortful, thus requiring more explicit cognitive resources, when there is mismatch between speech input and phonological representations in long-term memory. To test this prediction, we changed the compression release settings in the hearing instruments of experienced users and allowed them to train for 9 weeks with the new settings. After training, aided speech recognition in noise was tested with both the trained settings and orthogonal settings. We postulated that training would lead to acclimatization to the trained setting, which in turn would involve establishment of new phonological representations in long-term memory. Further, we postulated that after training, testing with orthogonal settings would give rise to phonological mismatch, associated with more explicit cognitive processing. Thirty-two participants (mean=70.3 years, SD=7.7) with bilateral sensorineural hearing loss (pure-tone average=46.0 dB HL, SD=6.5), bilaterally fitted for more than 1 year with digital, two-channel, nonlinear signal processing hearing instruments and chosen from the patient population at the Linköping University Hospital were randomly assigned to 9 weeks training with new, fast (40 ms) or slow (640 ms), compression release settings in both channels. Aided speech recognition in noise performance was tested according to a design with three within-group factors: test occasion (T1, T2), test setting (fast, slow), and type of noise (unmodulated, modulated) and one between-group factor: experience setting (fast, slow) for two types of speech materials-the highly constrained Hagerman sentences and the less-predictable Hearing in Noise Test (HINT). Complex cognitive capacity was measured using the reading span and letter monitoring tests. PREDICTION: We predicted that speech recognition in noise at T2 with mismatched experience and test settings would be associated with more explicit cognitive processing and thus stronger correlations with complex cognitive measures, as well as poorer performance if complex cognitive capacity was exceeded. Under mismatch conditions, stronger correlations were found between performance on speech recognition with the Hagerman sentences and reading span, along with poorer speech recognition for participants with low reading span scores. No consistent mismatch effect was found with HINT. The mismatch prediction generated by the working memory framework for Ease of Language Understanding is supported for speech recognition in noise with the highly constrained Hagerman sentences but not the less-predictable HINT.

  2. Improving language models for radiology speech recognition.

    PubMed

    Paulett, John M; Langlotz, Curtis P

    2009-02-01

    Speech recognition systems have become increasingly popular as a means to produce radiology reports, for reasons both of efficiency and of cost. However, the suboptimal recognition accuracy of these systems can affect the productivity of the radiologists creating the text reports. We analyzed a database of over two million de-identified radiology reports to determine the strongest determinants of word frequency. Our results showed that body site and imaging modality had a similar influence on the frequency of words and of three-word phrases as did the identity of the speaker. These findings suggest that the accuracy of speech recognition systems could be significantly enhanced by further tailoring their language models to body site and imaging modality, which are readily available at the time of report creation.

  3. Connected word recognition using a cascaded neuro-computational model

    NASA Astrophysics Data System (ADS)

    Hoya, Tetsuya; van Leeuwen, Cees

    2016-10-01

    We propose a novel framework for processing a continuous speech stream that contains a varying number of words, as well as non-speech periods. Speech samples are segmented into word-tokens and non-speech periods. An augmented version of an earlier-proposed, cascaded neuro-computational model is used for recognising individual words within the stream. Simulation studies using both a multi-speaker-dependent and speaker-independent digit string database show that the proposed method yields a recognition performance comparable to that obtained by a benchmark approach using hidden Markov models with embedded training.

  4. Measuring listening effort: driving simulator vs. simple dual-task paradigm

    PubMed Central

    Wu, Yu-Hsiang; Aksan, Nazan; Rizzo, Matthew; Stangl, Elizabeth; Zhang, Xuyang; Bentler, Ruth

    2014-01-01

    Objectives The dual-task paradigm has been widely used to measure listening effort. The primary objectives of the study were to (1) investigate the effect of hearing aid amplification and a hearing aid directional technology on listening effort measured by a complicated, more real world dual-task paradigm, and (2) compare the results obtained with this paradigm to a simpler laboratory-style dual-task paradigm. Design The listening effort of adults with hearing impairment was measured using two dual-task paradigms, wherein participants performed a speech recognition task simultaneously with either a driving task in a simulator or a visual reaction-time task in a sound-treated booth. The speech materials and road noises for the speech recognition task were recorded in a van traveling on the highway in three hearing aid conditions: unaided, aided with omni directional processing (OMNI), and aided with directional processing (DIR). The change in the driving task or the visual reaction-time task performance across the conditions quantified the change in listening effort. Results Compared to the driving-only condition, driving performance declined significantly with the addition of the speech recognition task. Although the speech recognition score was higher in the OMNI and DIR conditions than in the unaided condition, driving performance was similar across these three conditions, suggesting that listening effort was not affected by amplification and directional processing. Results from the simple dual-task paradigm showed a similar trend: hearing aid technologies improved speech recognition performance, but did not affect performance in the visual reaction-time task (i.e., reduce listening effort). The correlation between listening effort measured using the driving paradigm and the visual reaction-time task paradigm was significant. The finding showing that our older (56 to 85 years old) participants’ better speech recognition performance did not result in reduced listening effort was not consistent with literature that evaluated younger (approximately 20 years old), normal hearing adults. Because of this, a follow-up study was conducted. In the follow-up study, the visual reaction-time dual-task experiment using the same speech materials and road noises was repeated on younger adults with normal hearing. Contrary to findings with older participants, the results indicated that the directional technology significantly improved performance in both speech recognition and visual reaction-time tasks. Conclusions Adding a speech listening task to driving undermined driving performance. Hearing aid technologies significantly improved speech recognition while driving, but did not significantly reduce listening effort. Listening effort measured by dual-task experiments using a simulated real-world driving task and a conventional laboratory-style task was generally consistent. For a given listening environment, the benefit of hearing aid technologies on listening effort measured from younger adults with normal hearing may not be fully translated to older listeners with hearing impairment. PMID:25083599

  5. Modeling Driving Performance Using In-Vehicle Speech Data From a Naturalistic Driving Study.

    PubMed

    Kuo, Jonny; Charlton, Judith L; Koppel, Sjaan; Rudin-Brown, Christina M; Cross, Suzanne

    2016-09-01

    We aimed to (a) describe the development and application of an automated approach for processing in-vehicle speech data from a naturalistic driving study (NDS), (b) examine the influence of child passenger presence on driving performance, and (c) model this relationship using in-vehicle speech data. Parent drivers frequently engage in child-related secondary behaviors, but the impact on driving performance is unknown. Applying automated speech-processing techniques to NDS audio data would facilitate the analysis of in-vehicle driver-child interactions and their influence on driving performance. Speech activity detection and speaker diarization algorithms were applied to audio data from a Melbourne-based NDS involving 42 families. Multilevel models were developed to evaluate the effect of speech activity and the presence of child passengers on driving performance. Speech activity was significantly associated with velocity and steering angle variability. Child passenger presence alone was not associated with changes in driving performance. However, speech activity in the presence of two child passengers was associated with the most variability in driving performance. The effects of in-vehicle speech on driving performance in the presence of child passengers appear to be heterogeneous, and multiple factors may need to be considered in evaluating their impact. This goal can potentially be achieved within large-scale NDS through the automated processing of observational data, including speech. Speech-processing algorithms enable new perspectives on driving performance to be gained from existing NDS data, and variables that were once labor-intensive to process can be readily utilized in future research. © 2016, Human Factors and Ergonomics Society.

  6. Speech recognition: Acoustic-phonetic knowledge acquisition and representation

    NASA Astrophysics Data System (ADS)

    Zue, Victor W.

    1988-09-01

    The long-term research goal is to develop and implement speaker-independent continuous speech recognition systems. It is believed that the proper utilization of speech-specific knowledge is essential for such advanced systems. This research is thus directed toward the acquisition, quantification, and representation, of acoustic-phonetic and lexical knowledge, and the application of this knowledge to speech recognition algorithms. In addition, we are exploring new speech recognition alternatives based on artificial intelligence and connectionist techniques. We developed a statistical model for predicting the acoustic realization of stop consonants in various positions in the syllable template. A unification-based grammatical formalism was developed for incorporating this model into the lexical access algorithm. We provided an information-theoretic justification for the hierarchical structure of the syllable template. We analyzed segmented duration for vowels and fricatives in continuous speech. Based on contextual information, we developed durational models for vowels and fricatives that account for over 70 percent of the variance, using data from multiple, unknown speakers. We rigorously evaluated the ability of human spectrogram readers to identify stop consonants spoken by many talkers and in a variety of phonetic contexts. Incorporating the declarative knowledge used by the readers, we developed a knowledge-based system for stop identification. We achieved comparable system performance to that to the readers.

  7. Across-site patterns of modulation detection: Relation to speech recognitiona)

    PubMed Central

    Garadat, Soha N.; Zwolan, Teresa A.; Pfingst, Bryan E.

    2012-01-01

    The aim of this study was to identify across-site patterns of modulation detection thresholds (MDTs) in subjects with cochlear implants and to determine if removal of sites with the poorest MDTs from speech processor programs would result in improved speech recognition. Five hundred millisecond trains of symmetric-biphasic pulses were modulated sinusoidally at 10 Hz and presented at a rate of 900 pps using monopolar stimulation. Subjects were asked to discriminate a modulated pulse train from an unmodulated pulse train for all electrodes in quiet and in the presence of an interleaved unmodulated masker presented on the adjacent site. Across-site patterns of masked MDTs were then used to construct two 10-channel MAPs such that one MAP consisted of sites with the best masked MDTs and the other MAP consisted of sites with the worst masked MDTs. Subjects’ speech recognition skills were compared when they used these two different MAPs. Results showed that MDTs were variable across sites and were elevated in the presence of a masker by various amounts across sites. Better speech recognition was observed when the processor MAP consisted of sites with best masked MDTs, suggesting that temporal modulation sensitivity has important contributions to speech recognition with a cochlear implant. PMID:22559376

  8. Sentence Recognition Prediction for Hearing-impaired Listeners in Stationary and Fluctuation Noise With FADE: Empowering the Attenuation and Distortion Concept by Plomp With a Quantitative Processing Model.

    PubMed

    Kollmeier, Birger; Schädler, Marc René; Warzybok, Anna; Meyer, Bernd T; Brand, Thomas

    2016-09-07

    To characterize the individual patient's hearing impairment as obtained with the matrix sentence recognition test, a simulation Framework for Auditory Discrimination Experiments (FADE) is extended here using the Attenuation and Distortion (A+D) approach by Plomp as a blueprint for setting the individual processing parameters. FADE has been shown to predict the outcome of both speech recognition tests and psychoacoustic experiments based on simulations using an automatic speech recognition system requiring only few assumptions. It builds on the closed-set matrix sentence recognition test which is advantageous for testing individual speech recognition in a way comparable across languages. Individual predictions of speech recognition thresholds in stationary and in fluctuating noise were derived using the audiogram and an estimate of the internal level uncertainty for modeling the individual Plomp curves fitted to the data with the Attenuation (A-) and Distortion (D-) parameters of the Plomp approach. The "typical" audiogram shapes from Bisgaard et al with or without a "typical" level uncertainty and the individual data were used for individual predictions. As a result, the individualization of the level uncertainty was found to be more important than the exact shape of the individual audiogram to accurately model the outcome of the German Matrix test in stationary or fluctuating noise for listeners with hearing impairment. The prediction accuracy of the individualized approach also outperforms the (modified) Speech Intelligibility Index approach which is based on the individual threshold data only. © The Author(s) 2016.

  9. Science 101: How Does Speech-Recognition Software Work?

    ERIC Educational Resources Information Center

    Robertson, Bill

    2016-01-01

    This column provides background science information for elementary teachers. Many innovations with computer software begin with analysis of how humans do a task. This article takes a look at how humans recognize spoken words and explains the origins of speech-recognition software.

  10. Robust matching for voice recognition

    NASA Astrophysics Data System (ADS)

    Higgins, Alan; Bahler, L.; Porter, J.; Blais, P.

    1994-10-01

    This paper describes an automated method of comparing a voice sample of an unknown individual with samples from known speakers in order to establish or verify the individual's identity. The method is based on a statistical pattern matching approach that employs a simple training procedure, requires no human intervention (transcription, work or phonetic marketing, etc.), and makes no assumptions regarding the expected form of the statistical distributions of the observations. The content of the speech material (vocabulary, grammar, etc.) is not assumed to be constrained in any way. An algorithm is described which incorporates frame pruning and channel equalization processes designed to achieve robust performance with reasonable computational resources. An experimental implementation demonstrating the feasibility of the concept is described.

  11. Some Neurocognitive Correlates of Noise-Vocoded Speech Perception in Children With Normal Hearing: A Replication and Extension of ).

    PubMed

    Roman, Adrienne S; Pisoni, David B; Kronenberger, William G; Faulkner, Kathleen F

    Noise-vocoded speech is a valuable research tool for testing experimental hypotheses about the effects of spectral degradation on speech recognition in adults with normal hearing (NH). However, very little research has utilized noise-vocoded speech with children with NH. Earlier studies with children with NH focused primarily on the amount of spectral information needed for speech recognition without assessing the contribution of neurocognitive processes to speech perception and spoken word recognition. In this study, we first replicated the seminal findings reported by ) who investigated effects of lexical density and word frequency on noise-vocoded speech perception in a small group of children with NH. We then extended the research to investigate relations between noise-vocoded speech recognition abilities and five neurocognitive measures: auditory attention (AA) and response set, talker discrimination, and verbal and nonverbal short-term working memory. Thirty-one children with NH between 5 and 13 years of age were assessed on their ability to perceive lexically controlled words in isolation and in sentences that were noise-vocoded to four spectral channels. Children were also administered vocabulary assessments (Peabody Picture Vocabulary test-4th Edition and Expressive Vocabulary test-2nd Edition) and measures of AA (NEPSY AA and response set and a talker discrimination task) and short-term memory (visual digit and symbol spans). Consistent with the findings reported in the original ) study, we found that children perceived noise-vocoded lexically easy words better than lexically hard words. Words in sentences were also recognized better than the same words presented in isolation. No significant correlations were observed between noise-vocoded speech recognition scores and the Peabody Picture Vocabulary test-4th Edition using language quotients to control for age effects. However, children who scored higher on the Expressive Vocabulary test-2nd Edition recognized lexically easy words better than lexically hard words in sentences. Older children perceived noise-vocoded speech better than younger children. Finally, we found that measures of AA and short-term memory capacity were significantly correlated with a child's ability to perceive noise-vocoded isolated words and sentences. First, we successfully replicated the major findings from the ) study. Because familiarity, phonological distinctiveness and lexical competition affect word recognition, these findings provide additional support for the proposal that several foundational elementary neurocognitive processes underlie the perception of spectrally degraded speech. Second, we found strong and significant correlations between performance on neurocognitive measures and children's ability to recognize words and sentences noise-vocoded to four spectral channels. These findings extend earlier research suggesting that perception of spectrally degraded speech reflects early peripheral auditory processes, as well as additional contributions of executive function, specifically, selective attention and short-term memory processes in spoken word recognition. The present findings suggest that AA and short-term memory support robust spoken word recognition in children with NH even under compromised and challenging listening conditions. These results are relevant to research carried out with listeners who have hearing loss, because they are routinely required to encode, process, and understand spectrally degraded acoustic signals.

  12. Some Neurocognitive Correlates of Noise-Vocoded Speech Perception in Children with Normal Hearing: A Replication and Extension of Eisenberg et al., 2002

    PubMed Central

    Roman, Adrienne S.; Pisoni, David B.; Kronenberger, William G.; Faulkner, Kathleen F.

    2016-01-01

    Objectives Noise-vocoded speech is a valuable research tool for testing experimental hypotheses about the effects of spectral-degradation on speech recognition in adults with normal hearing (NH). However, very little research has utilized noise-vocoded speech with children with NH. Earlier studies with children with NH focused primarily on the amount of spectral information needed for speech recognition without assessing the contribution of neurocognitive processes to speech perception and spoken word recognition. In this study, we first replicated the seminal findings reported by Eisenberg et al. (2002) who investigated effects of lexical density and word frequency on noise-vocoded speech perception in a small group of children with NH. We then extended the research to investigate relations between noise-vocoded speech recognition abilities and five neurocognitive measures: auditory attention and response set, talker discrimination and verbal and nonverbal short-term working memory. Design Thirty-one children with NH between 5 and 13 years of age were assessed on their ability to perceive lexically controlled words in isolation and in sentences that were noise-vocoded to four spectral channels. Children were also administered vocabulary assessments (PPVT-4 and EVT-2) and measures of auditory attention (NEPSY Auditory Attention (AA) and Response Set (RS) and a talker discrimination task (TD)) and short-term memory (visual digit and symbol spans). Results Consistent with the findings reported in the original Eisenberg et al. (2002) study, we found that children perceived noise-vocoded lexically easy words better than lexically hard words. Words in sentences were also recognized better than the same words presented in isolation. No significant correlations were observed between noise-vocoded speech recognition scores and the PPVT-4 using language quotients to control for age effects. However, children who scored higher on the EVT-2 recognized lexically easy words better than lexically hard words in sentences. Older children perceived noise-vocoded speech better than younger children. Finally, we found that measures of auditory attention and short-term memory capacity were significantly correlated with a child’s ability to perceive noise-vocoded isolated words and sentences. Conclusions First, we successfully replicated the major findings from the Eisenberg et al. (2002) study. Because familiarity, phonological distinctiveness and lexical competition affect word recognition, these findings provide additional support for the proposal that several foundational elementary neurocognitive processes underlie the perception of spectrally-degraded speech. Second, we found strong and significant correlations between performance on neurocognitive measures and children’s ability to recognize words and sentences noise-vocoded to four spectral channels. These findings extend earlier research suggesting that perception of spectrally-degraded speech reflects early peripheral auditory processes as well as additional contributions of executive function, specifically, selective attention and short-term memory processes in spoken word recognition. The present findings suggest that auditory attention and short-term memory support robust spoken word recognition in children with NH even under compromised and challenging listening conditions. These results are relevant to research carried out with listeners who have hearing loss, since they are routinely required to encode, process and understand spectrally-degraded acoustic signals. PMID:28045787

  13. A Comparison of Two Scoring Methods for an Automated Speech Scoring System

    ERIC Educational Resources Information Center

    Xi, Xiaoming; Higgins, Derrick; Zechner, Klaus; Williamson, David

    2012-01-01

    This paper compares two alternative scoring methods--multiple regression and classification trees--for an automated speech scoring system used in a practice environment. The two methods were evaluated on two criteria: construct representation and empirical performance in predicting human scores. The empirical performance of the two scoring models…

  14. Perception of Sung Speech in Bimodal Cochlear Implant Users.

    PubMed

    Crew, Joseph D; Galvin, John J; Fu, Qian-Jie

    2016-11-11

    Combined use of a hearing aid (HA) and cochlear implant (CI) has been shown to improve CI users' speech and music performance. However, different hearing devices, test stimuli, and listening tasks may interact and obscure bimodal benefits. In this study, speech and music perception were measured in bimodal listeners for CI-only, HA-only, and CI + HA conditions, using the Sung Speech Corpus, a database of monosyllabic words produced at different fundamental frequencies. Sentence recognition was measured using sung speech in which pitch was held constant or varied across words, as well as for spoken speech. Melodic contour identification (MCI) was measured using sung speech in which the words were held constant or varied across notes. Results showed that sentence recognition was poorer with sung speech relative to spoken, with little difference between sung speech with a constant or variable pitch; mean performance was better with CI-only relative to HA-only, and best with CI + HA. MCI performance was better with constant words versus variable words; mean performance was better with HA-only than with CI-only and was best with CI + HA. Relative to CI-only, a strong bimodal benefit was observed for speech and music perception. Relative to the better ear, bimodal benefits remained strong for sentence recognition but were marginal for MCI. While variations in pitch and timbre may negatively affect CI users' speech and music perception, bimodal listening may partially compensate for these deficits. © The Author(s) 2016.

  15. Sperry Univac speech communications technology

    NASA Technical Reports Server (NTRS)

    Medress, Mark F.

    1977-01-01

    Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word spotting system to locate key words in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.

  16. Assistive Technology and Adults with Learning Disabilities: A Blueprint for Exploration and Advancement.

    ERIC Educational Resources Information Center

    Raskind, Marshall

    1993-01-01

    This article describes assistive technologies for persons with learning disabilities, including word processing, spell checking, proofreading programs, outlining/"brainstorming" programs, abbreviation expanders, speech recognition, speech synthesis/screen review, optical character recognition systems, personal data managers, free-form databases,…

  17. Speech as a pilot input medium

    NASA Technical Reports Server (NTRS)

    Plummer, R. P.; Coler, C. R.

    1977-01-01

    The speech recognition system under development is a trainable pattern classifier based on a maximum-likelihood technique. An adjustable uncertainty threshold allows the rejection of borderline cases for which the probability of misclassification is high. The syntax of the command language spoken may be used as an aid to recognition, and the system adapts to changes in pronunciation if feedback from the user is available. Words must be separated by .25 second gaps. The system runs in real time on a mini-computer (PDP 11/10) and was tested on 120,000 speech samples from 10- and 100-word vocabularies. The results of these tests were 99.9% correct recognition for a vocabulary consisting of the ten digits, and 99.6% recognition for a 100-word vocabulary of flight commands, with a 5% rejection rate in each case. With no rejection, the recognition accuracies for the same vocabularies were 99.5% and 98.6% respectively.

  18. Listeners Experience Linguistic Masking Release in Noise-Vocoded Speech-in-Speech Recognition

    ERIC Educational Resources Information Center

    Viswanathan, Navin; Kokkinakis, Kostas; Williams, Brittany T.

    2018-01-01

    Purpose: The purpose of this study was to evaluate whether listeners with normal hearing perceiving noise-vocoded speech-in-speech demonstrate better intelligibility of target speech when the background speech was mismatched in language (linguistic release from masking [LRM]) and/or location (spatial release from masking [SRM]) relative to the…

  19. Detection of target phonemes in spontaneous and read speech.

    PubMed

    Mehta, G; Cutler, A

    1988-01-01

    Although spontaneous speech occurs more frequently in most listeners' experience than read speech, laboratory studies of human speech recognition typically use carefully controlled materials read from a script. The phonological and prosodic characteristics of spontaneous and read speech differ considerably, however, which suggests that laboratory results may not generalise to the recognition of spontaneous speech. In the present study listeners were presented with both spontaneous and read speech materials, and their response time to detect word-initial target phonemes was measured. Responses were, overall, equally fast in each speech mode. However, analysis of effects previously reported in phoneme detection studies revealed significant differences between speech modes. In read speech but not in spontaneous speech, later targets were detected more rapidly than targets preceded by short words. In contrast, in spontaneous speech but not in read speech, targets were detected more rapidly in accented than in unaccented words and in strong than in weak syllables. An explanation for this pattern is offered in terms of characteristic prosodic differences between spontaneous and read speech. The results support claims from previous work that listeners pay great attention to prosodic information in the process of recognising speech.

  20. LANDMARK-BASED SPEECH RECOGNITION: REPORT OF THE 2004 JOHNS HOPKINS SUMMER WORKSHOP.

    PubMed

    Hasegawa-Johnson, Mark; Baker, James; Borys, Sarah; Chen, Ken; Coogan, Emily; Greenberg, Steven; Juneja, Amit; Kirchhoff, Katrin; Livescu, Karen; Mohan, Srividya; Muller, Jennifer; Sonmez, Kemal; Wang, Tianyu

    2005-01-01

    Three research prototype speech recognition systems are described, all of which use recently developed methods from artificial intelligence (specifically support vector machines, dynamic Bayesian networks, and maximum entropy classification) in order to implement, in the form of an automatic speech recognizer, current theories of human speech perception and phonology (specifically landmark-based speech perception, nonlinear phonology, and articulatory phonology). All three systems begin with a high-dimensional multiframe acoustic-to-distinctive feature transformation, implemented using support vector machines trained to detect and classify acoustic phonetic landmarks. Distinctive feature probabilities estimated by the support vector machines are then integrated using one of three pronunciation models: a dynamic programming algorithm that assumes canonical pronunciation of each word, a dynamic Bayesian network implementation of articulatory phonology, or a discriminative pronunciation model trained using the methods of maximum entropy classification. Log probability scores computed by these models are then combined, using log-linear combination, with other word scores available in the lattice output of a first-pass recognizer, and the resulting combination score is used to compute a second-pass speech recognition output.

  1. An Analysis of Individual Differences in Recognizing Monosyllabic Words Under the Speech Intelligibility Index Framework

    PubMed Central

    Shen, Yi; Kern, Allison B.

    2018-01-01

    Individual differences in the recognition of monosyllabic words, either in isolation (NU6 test) or in sentence context (SPIN test), were investigated under the theoretical framework of the speech intelligibility index (SII). An adaptive psychophysical procedure, namely the quick-band-importance-function procedure, was developed to enable the fitting of the SII model to individual listeners. Using this procedure, the band importance function (i.e., the relative weights of speech information across the spectrum) and the link function relating the SII to recognition scores can be simultaneously estimated while requiring only 200 to 300 trials of testing. Octave-frequency band importance functions and link functions were estimated separately for NU6 and SPIN materials from 30 normal-hearing listeners who were naïve to speech recognition experiments. For each type of speech material, considerable individual differences in the spectral weights were observed in some but not all frequency regions. At frequencies where the greatest intersubject variability was found, the spectral weights were correlated between the two speech materials, suggesting that the variability in spectral weights reflected listener-originated factors. PMID:29532711

  2. Particle Swarm Optimization Based Feature Enhancement and Feature Selection for Improved Emotion Recognition in Speech and Glottal Signals

    PubMed Central

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In the recent years, many research works have been published using speech related features for speech emotion recognition, however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstralcoefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and its glottal waveforms(GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features respectively. Three different emotional speech databases were utilized to gauge the proposed method. Extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature. PMID:25799141

  3. Effect of high-frequency spectral components in computer recognition of dysarthric speech based on a Mel-cepstral stochastic model.

    PubMed

    Polur, Prasad D; Miller, Gerald E

    2005-01-01

    Computer speech recognition of individuals with dysarthria, such as cerebral palsy patients, requires a robust technique that can handle conditions of very high variability and limited training data. In this study, a hidden Markov model (HMM) was constructed and conditions investigated that would provide improved performance for a dysarthric speech (isolated word) recognition system intended to act as an assistive/control tool. In particular, we investigated the effect of high-frequency spectral components on the recognition rate of the system to determine if they contributed useful additional information to the system. A small-size vocabulary spoken by three cerebral palsy subjects was chosen. Mel-frequency cepstral coefficients extracted with the use of 15 ms frames served as training input to an ergodic HMM setup. Subsequent results demonstrated that no significant useful information was available to the system for enhancing its ability to discriminate dysarthric speech above 5.5 kHz in the current set of dysarthric data. The level of variability in input dysarthric speech patterns limits the reliability of the system. However, its application as a rehabilitation/control tool to assist dysarthric motor-impaired individuals such as cerebral palsy subjects holds sufficient promise.

  4. Application of Business Process Management to drive the deployment of a speech recognition system in a healthcare organization.

    PubMed

    González Sánchez, María José; Framiñán Torres, José Manuel; Parra Calderón, Carlos Luis; Del Río Ortega, Juan Antonio; Vigil Martín, Eduardo; Nieto Cervera, Jaime

    2008-01-01

    We present a methodology based on Business Process Management to guide the development of a speech recognition system in a hospital in Spain. The methodology eases the deployment of the system by 1) involving the clinical staff in the process, 2) providing the IT professionals with a description of the process and its requirements, 3) assessing advantages and disadvantages of the speech recognition system, as well as its impact in the organisation, and 4) help reorganising the healthcare process before implementing the new technology in order to identify how it can better contribute to the overall objective of the organisation.

  5. Psychometric Functions for Shortened Administrations of a Speech Recognition Approach Using Tri-Word Presentations and Phonemic Scoring

    ERIC Educational Resources Information Center

    Gelfand, Stanley A.; Gelfand, Jessica T.

    2012-01-01

    Method: Complete psychometric functions for phoneme and word recognition scores at 8 signal-to-noise ratios from -15 dB to 20 dB were generated for the first 10, 20, and 25, as well as all 50, three-word presentations of the Tri-Word or Computer Assisted Speech Recognition Assessment (CASRA) Test (Gelfand, 1998) based on the results of 12…

  6. Adaptive method of recognition of signals for one and two-frequency signal system in the telephony on the background of speech

    NASA Astrophysics Data System (ADS)

    Kuznetsov, Michael V.

    2006-05-01

    For reliable teamwork of various systems of automatic telecommunication including transferring systems of optical communication networks it is necessary authentic recognition of signals for one- or two-frequency service signal system. The analysis of time parameters of an accepted signal allows increasing reliability of detection and recognition of the service signal system on a background of speech.

  7. Age-related Effects on Word Recognition: Reliance on Cognitive Control Systems with Structural Declines in Speech-responsive Cortex

    PubMed Central

    Walczak, Adam; Ahlstrom, Jayne; Denslow, Stewart; Horwitz, Amy; Dubno, Judy R.

    2008-01-01

    Speech recognition can be difficult and effortful for older adults, even for those with normal hearing. Declining frontal lobe cognitive control has been hypothesized to cause age-related speech recognition problems. This study examined age-related changes in frontal lobe function for 15 clinically normal hearing adults (21–75 years) when they performed a word recognition task that was made challenging by decreasing word intelligibility. Although there were no age-related changes in word recognition, there were age-related changes in the degree of activity within left middle frontal gyrus (MFG) and anterior cingulate (ACC) regions during word recognition. Older adults engaged left MFG and ACC regions when words were most intelligible compared to younger adults who engaged these regions when words were least intelligible. Declining gray matter volume within temporal lobe regions responsive to word intelligibility significantly predicted left MFG activity, even after controlling for total gray matter volume, suggesting that declining structural integrity of brain regions responsive to speech leads to the recruitment of frontal regions when words are easily understood. Electronic supplementary material The online version of this article (doi:10.1007/s10162-008-0113-3) contains supplementary material, which is available to authorized users. PMID:18274825

  8. An audiovisual emotion recognition system

    NASA Astrophysics Data System (ADS)

    Han, Yi; Wang, Guoyin; Yang, Yong; He, Kun

    2007-12-01

    Human emotions could be expressed by many bio-symbols. Speech and facial expression are two of them. They are both regarded as emotional information which is playing an important role in human-computer interaction. Based on our previous studies on emotion recognition, an audiovisual emotion recognition system is developed and represented in this paper. The system is designed for real-time practice, and is guaranteed by some integrated modules. These modules include speech enhancement for eliminating noises, rapid face detection for locating face from background image, example based shape learning for facial feature alignment, and optical flow based tracking algorithm for facial feature tracking. It is known that irrelevant features and high dimensionality of the data can hurt the performance of classifier. Rough set-based feature selection is a good method for dimension reduction. So 13 speech features out of 37 ones and 10 facial features out of 33 ones are selected to represent emotional information, and 52 audiovisual features are selected due to the synchronization when speech and video fused together. The experiment results have demonstrated that this system performs well in real-time practice and has high recognition rate. Our results also show that the work in multimodules fused recognition will become the trend of emotion recognition in the future.

  9. The Relationship Between Spectral Modulation Detection and Speech Recognition: Adult Versus Pediatric Cochlear Implant Recipients

    PubMed Central

    Noble, Jack H.; Camarata, Stephen M.; Sunderhaus, Linsey W.; Dwyer, Robert T.; Dawant, Benoit M.; Dietrich, Mary S.; Labadie, Robert F.

    2018-01-01

    Adult cochlear implant (CI) recipients demonstrate a reliable relationship between spectral modulation detection and speech understanding. Prior studies documenting this relationship have focused on postlingually deafened adult CI recipients—leaving an open question regarding the relationship between spectral resolution and speech understanding for adults and children with prelingual onset of deafness. Here, we report CI performance on the measures of speech recognition and spectral modulation detection for 578 CI recipients including 477 postlingual adults, 65 prelingual adults, and 36 prelingual pediatric CI users. The results demonstrated a significant correlation between spectral modulation detection and various measures of speech understanding for 542 adult CI recipients. For 36 pediatric CI recipients, however, there was no significant correlation between spectral modulation detection and speech understanding in quiet or in noise nor was spectral modulation detection significantly correlated with listener age or age at implantation. These findings suggest that pediatric CI recipients might not depend upon spectral resolution for speech understanding in the same manner as adult CI recipients. It is possible that pediatric CI users are making use of different cues, such as those contained within the temporal envelope, to achieve high levels of speech understanding. Further investigation is warranted to investigate the relationship between spectral and temporal resolution and speech recognition to describe the underlying mechanisms driving peripheral auditory processing in pediatric CI users. PMID:29716437

  10. Speech Recognition Using Multiple Features and Multiple Recognizers

    DTIC Science & Technology

    1991-12-03

    6 2.1 Introduction ............................................... 6 2.2 Human Speech Communication Process...119 How to Setup ASRT.......................................... 119 How to Use Interactive Menus .................................. 120...recognize a word from an acoustic signal. The human ear and brain perform this type of recognition with incredible speed and precision. Even though

  11. Automated Assessment of Child Vocalization Development Using LENA.

    PubMed

    Richards, Jeffrey A; Xu, Dongxin; Gilkerson, Jill; Yapanel, Umit; Gray, Sharmistha; Paul, Terrance

    2017-07-12

    To produce a novel, efficient measure of children's expressive vocal development on the basis of automatic vocalization assessment (AVA), child vocalizations were automatically identified and extracted from audio recordings using Language Environment Analysis (LENA) System technology. Assessment was based on full-day audio recordings collected in a child's unrestricted, natural language environment. AVA estimates were derived using automatic speech recognition modeling techniques to categorize and quantify the sounds in child vocalizations (e.g., protophones and phonemes). These were expressed as phone and biphone frequencies, reduced to principal components, and inputted to age-based multiple linear regression models to predict independently collected criterion-expressive language scores. From these models, we generated vocal development AVA estimates as age-standardized scores and development age estimates. AVA estimates demonstrated strong statistical reliability and validity when compared with standard criterion expressive language assessments. Automated analysis of child vocalizations extracted from full-day recordings in natural settings offers a novel and efficient means to assess children's expressive vocal development. More research remains to identify specific mechanisms of operation.

  12. Improving diabetes care among patients overdue for recommended testing: a randomized controlled trial of automated telephone outreach.

    PubMed

    Simon, Steven R; Trinacty, Connie Mah; Soumerai, Stephen B; Piette, John D; Meigs, James B; Shi, Ping; Ensroth, Arthur; Ross-Degnan, Dennis

    2010-07-01

    The study's objective was to assess the effects of automated telephone outreach with speech recognition (ATO-SR) on diabetes-related testing. We identified 1,200 health plan members who were overdue for diabetes-related testing and randomly allocated 600 to ATO-SR and 600 to usual care (no intervention). The intervention included three interactive calls encouraging recommended testing. The primary outcome was retinopathy testing, since this was the health plan's principal goal. Tests for glycemia, hyperlipidemia, and nephropathy were secondary outcomes. In total, 232 participants (39%) verbally responded to the calls. There was no difference between the intervention and the usual care groups in the primary outcome (adjusted hazard ratio 0.93 [95% CI 0.71-1.22]) and no effect of the intervention on any of the secondary outcomes. Fewer than 40% of the patients randomized to ATO-SR interacted verbally with the system. The intervention had no effect on the study's outcomes.

  13. [Improvement in Phoneme Discrimination in Noise in Normal Hearing Adults].

    PubMed

    Schumann, A; Garea Garcia, L; Hoppe, U

    2017-02-01

    Objective: The study's aim was to examine the possibility to train phoneme-discrimination in noise with normal hearing adults, and its effectivity on speech recognition in noise. A specific computerised training program was used, consisting of special nonsense-syllables with background noise, to train participants' discrimination ability. Material and Methods: 46 normal hearing subjects took part in this study, 28 as training group participants, 18 as control group participants. Only the training group subjects were asked to train over a period of 3 weeks, twice a week for an hour with a computer-based training program. Speech recognition in noise were measured pre- to posttraining for the training group subjects with the Freiburger Einsilber Test. The control group subjects obtained test and restest measures within a 2-3 week break. For the training group follow-up speech recognition was measured 2-3 months after the end of the training. Results: The majority of training group subjects improved their phoneme discrimination significantly. Besides, their speech recognition in noise improved significantly during the training compared to the control group, and remained stable for a period of time. Conclusions: Phonem-Discrimination in noise can be trained by normal hearing adults. The improvements have got a positiv effect on speech recognition in noise, also for a longer period of time. © Georg Thieme Verlag KG Stuttgart · New York.

  14. Recognition and localization of speech by adult cochlear implant recipients wearing a digital hearing aid in the nonimplanted ear (bimodal hearing).

    PubMed

    Potts, Lisa G; Skinner, Margaret W; Litovsky, Ruth A; Strube, Michael J; Kuk, Francis

    2009-06-01

    The use of bilateral amplification is now common clinical practice for hearing aid users but not for cochlear implant recipients. In the past, most cochlear implant recipients were implanted in one ear and wore only a monaural cochlear implant processor. There has been recent interest in benefits arising from bilateral stimulation that may be present for cochlear implant recipients. One option for bilateral stimulation is the use of a cochlear implant in one ear and a hearing aid in the opposite nonimplanted ear (bimodal hearing). This study evaluated the effect of wearing a cochlear implant in one ear and a digital hearing aid in the opposite ear on speech recognition and localization. A repeated-measures correlational study was completed. Nineteen adult Cochlear Nucleus 24 implant recipients participated in the study. The participants were fit with a Widex Senso Vita 38 hearing aid to achieve maximum audibility and comfort within their dynamic range. Soundfield thresholds, loudness growth, speech recognition, localization, and subjective questionnaires were obtained six-eight weeks after the hearing aid fitting. Testing was completed in three conditions: hearing aid only, cochlear implant only, and cochlear implant and hearing aid (bimodal). All tests were repeated four weeks after the first test session. Repeated-measures analysis of variance was used to analyze the data. Significant effects were further examined using pairwise comparison of means or in the case of continuous moderators, regression analyses. The speech-recognition and localization tasks were unique, in that a speech stimulus presented from a variety of roaming azimuths (140 degree loudspeaker array) was used. Performance in the bimodal condition was significantly better for speech recognition and localization compared to the cochlear implant-only and hearing aid-only conditions. Performance was also different between these conditions when the location (i.e., side of the loudspeaker array that presented the word) was analyzed. In the bimodal condition, the speech-recognition and localization tasks were equal regardless of which side of the loudspeaker array presented the word, while performance was significantly poorer for the monaural conditions (hearing aid only and cochlear implant only) when the words were presented on the side with no stimulation. Binaural loudness summation of 1-3 dB was seen in soundfield thresholds and loudness growth in the bimodal condition. Measures of the audibility of sound with the hearing aid, including unaided thresholds, soundfield thresholds, and the Speech Intelligibility Index, were significant moderators of speech recognition and localization. Based on the questionnaire responses, participants showed a strong preference for bimodal stimulation. These findings suggest that a well-fit digital hearing aid worn in conjunction with a cochlear implant is beneficial to speech recognition and localization. The dynamic test procedures used in this study illustrate the importance of bilateral hearing for locating, identifying, and switching attention between multiple speakers. It is recommended that unilateral cochlear implant recipients, with measurable unaided hearing thresholds, be fit with a hearing aid.

  15. The Effectiveness of Clear Speech as a Masker

    ERIC Educational Resources Information Center

    Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.

    2010-01-01

    Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…

  16. "Rate My Therapist": Automated Detection of Empathy in Drug and Alcohol Counseling via Speech and Language Processing

    PubMed Central

    Xiao, Bo; Imel, Zac E.; Georgiou, Panayiotis G.; Atkins, David C.; Narayanan, Shrikanth S.

    2015-01-01

    The technology for evaluating patient-provider interactions in psychotherapy–observational coding–has not changed in 70 years. It is labor-intensive, error prone, and expensive, limiting its use in evaluating psychotherapy in the real world. Engineering solutions from speech and language processing provide new methods for the automatic evaluation of provider ratings from session recordings. The primary data are 200 Motivational Interviewing (MI) sessions from a study on MI training methods with observer ratings of counselor empathy. Automatic Speech Recognition (ASR) was used to transcribe sessions, and the resulting words were used in a text-based predictive model of empathy. Two supporting datasets trained the speech processing tasks including ASR (1200 transcripts from heterogeneous psychotherapy sessions and 153 transcripts and session recordings from 5 MI clinical trials). The accuracy of computationally-derived empathy ratings were evaluated against human ratings for each provider. Computationally-derived empathy scores and classifications (high vs. low) were highly accurate against human-based codes and classifications, with a correlation of 0.65 and F-score (a weighted average of sensitivity and specificity) of 0.86, respectively. Empathy prediction using human transcription as input (as opposed to ASR) resulted in a slight increase in prediction accuracies, suggesting that the fully automatic system with ASR is relatively robust. Using speech and language processing methods, it is possible to generate accurate predictions of provider performance in psychotherapy from audio recordings alone. This technology can support large-scale evaluation of psychotherapy for dissemination and process studies. PMID:26630392

  17. Automatic speech recognition and training for severely dysarthric users of assistive technology: the STARDUST project.

    PubMed

    Parker, Mark; Cunningham, Stuart; Enderby, Pam; Hawley, Mark; Green, Phil

    2006-01-01

    The STARDUST project developed robust computer speech recognizers for use by eight people with severe dysarthria and concomitant physical disability to access assistive technologies. Independent computer speech recognizers trained with normal speech are of limited functional use by those with severe dysarthria due to limited and inconsistent proximity to "normal" articulatory patterns. Severe dysarthric output may also be characterized by a small mass of distinguishable phonetic tokens making the acoustic differentiation of target words difficult. Speaker dependent computer speech recognition using Hidden Markov Models was achieved by the identification of robust phonetic elements within the individual speaker output patterns. A new system of speech training using computer generated visual and auditory feedback reduced the inconsistent production of key phonetic tokens over time.

  18. Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.

    PubMed

    Borrie, Stephanie A

    2015-03-01

    This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal-the AV advantage-has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.

  19. Processing F0 with cochlear implants: Modulation frequency discrimination and speech intonation recognition.

    PubMed

    Chatterjee, Monita; Peng, Shu-Chen

    2008-01-01

    Fundamental frequency (F0) processing by cochlear implant (CI) listeners was measured using a psychophysical task and a speech intonation recognition task. Listeners' Weber fractions for modulation frequency discrimination were measured using an adaptive, 3-interval, forced-choice paradigm: stimuli were presented through a custom research interface. In the speech intonation recognition task, listeners were asked to indicate whether resynthesized bisyllabic words, when presented in the free field through the listeners' everyday speech processor, were question-like or statement-like. The resynthesized tokens were systematically manipulated to have different initial-F0s to represent male vs. female voices, and different F0 contours (i.e. falling, flat, and rising) Although the CI listeners showed considerable variation in performance on both tasks, significant correlations were observed between the CI listeners' sensitivity to modulation frequency in the psychophysical task and their performance in intonation recognition. Consistent with their greater reliance on temporal cues, the CI listeners' performance in the intonation recognition task was significantly poorer with the higher initial-F0 stimuli than with the lower initial-F0 stimuli. Similar results were obtained with normal hearing listeners attending to noiseband-vocoded CI simulations with reduced spectral resolution.

  20. Processing F0 with Cochlear Implants: Modulation Frequency Discrimination and Speech Intonation Recognition

    PubMed Central

    Chatterjee, Monita; Peng, Shu-Chen

    2008-01-01

    Fundamental frequency (F0) processing by cochlear implant (CI) listeners was measured using a psychophysical task and a speech intonation recognition task. Listeners’ Weber fractions for modulation frequency discrimination were measured using an adaptive, 3-interval, forced-choice paradigm: stimuli were presented through a custom research interface. In the speech intonation recognition task, listeners were asked to indicate whether resynthesized bisyllabic words, when presented in the free field through the listeners’ everyday speech processor, were question-like or statement-like. The resynthesized tokens were systematically manipulated to have different initial F0s to represent male vs. female voices, and different F0 contours (i.e., falling, flat, and rising) Although the CI listeners showed considerable variation in performance on both tasks, significant correlations were observed between the CI listeners’ sensitivity to modulation frequency in the psychophysical task and their performance in intonation recognition. Consistent with their greater reliance on temporal cues, the CI listeners’ performance in the intonation recognition task was significantly poorer with the higher initial-F0 stimuli than with the lower initial-F0 stimuli. Similar results were obtained with normal hearing listeners attending to noiseband-vocoded CI simulations with reduced spectral resolution. PMID:18093766

  1. Benefits of adaptive FM systems on speech recognition in noise for listeners who use hearing aids.

    PubMed

    Thibodeau, Linda

    2010-06-01

    To compare the benefits of adaptive FM and fixed FM systems through measurement of speech recognition in noise with adults and students in clinical and real-world settings. Five adults and 5 students with moderate-to-severe hearing loss completed objective and subjective speech recognition in noise measures with the 2 types of FM processing. Sentence recognition was evaluated in a classroom for 5 competing noise levels ranging from 54 to 80 dBA while the FM microphone was positioned 6 in. from the signal loudspeaker to receive input at 84 dB SPL. The subjective measures included 2 classroom activities and 6 auditory lessons in a noisy, public aquarium. On the objective measures, adaptive FM processing resulted in significantly better speech recognition in noise than fixed FM processing for 68- and 73-dBA noise levels. On the subjective measures, all individuals preferred adaptive over fixed processing for half of the activities. Adaptive processing was also preferred by most (8-9) individuals for the remaining 4 activities. The adaptive FM processing resulted in significant improvements at the higher noise levels and was preferred by the majority of participants in most of the conditions.

  2. Working Memory and Speech Recognition in Noise Under Ecologically Relevant Listening Conditions: Effects of Visual Cues and Noise Type Among Adults With Hearing Loss.

    PubMed

    Miller, Christi W; Stewart, Erin K; Wu, Yu-Hsiang; Bishop, Christopher; Bentler, Ruth A; Tremblay, Kelly

    2017-08-16

    This study evaluated the relationship between working memory (WM) and speech recognition in noise with different noise types as well as in the presence of visual cues. Seventy-six adults with bilateral, mild to moderately severe sensorineural hearing loss (mean age: 69 years) participated. Using a cross-sectional design, 2 measures of WM were taken: a reading span measure, and Word Auditory Recognition and Recall Measure (Smith, Pichora-Fuller, & Alexander, 2016). Speech recognition was measured with the Multi-Modal Lexical Sentence Test for Adults (Kirk et al., 2012) in steady-state noise and 4-talker babble, with and without visual cues. Testing was under unaided conditions. A linear mixed model revealed visual cues and pure-tone average as the only significant predictors of Multi-Modal Lexical Sentence Test outcomes. Neither WM measure nor noise type showed a significant effect. The contribution of WM in explaining unaided speech recognition in noise was negligible and not influenced by noise type or visual cues. We anticipate that with audibility partially restored by hearing aids, the effects of WM will increase. For clinical practice to be affected, more significant effect sizes are needed.

  3. Neuronal Spoken Word Recognition: The Time Course of Processing Variation in the Speech Signal

    ERIC Educational Resources Information Center

    Schild, Ulrike; Roder, Brigitte; Friedrich, Claudia K.

    2012-01-01

    Recent neurobiological studies revealed evidence for lexical representations that are not specified for the coronal place of articulation (PLACE; Friedrich, Eulitz, & Lahiri, 2006; Friedrich, Lahiri, & Eulitz, 2008). Here we tested when these types of underspecified representations influence neuronal speech recognition. In a unimodal…

  4. Recognizing Speech under a Processing Load: Dissociating Energetic from Informational Factors

    ERIC Educational Resources Information Center

    Mattys, Sven L.; Brooks, Joanna; Cooke, Martin

    2009-01-01

    Effects of perceptual and cognitive loads on spoken-word recognition have so far largely escaped investigation. This study lays the foundations of a psycholinguistic approach to speech recognition in adverse conditions that draws upon the distinction between energetic masking, i.e., listening environments leading to signal degradation, and…

  5. Bilingual Computerized Speech Recognition Screening for Depression Symptoms

    ERIC Educational Resources Information Center

    Gonzalez, Gerardo; Carter, Colby; Blanes, Erika

    2007-01-01

    The Voice-Interactive Depression Assessment System (VIDAS) is a computerized speech recognition application for screening depression based on the Center for Epidemiological Studies--Depression scale in English and Spanish. Study 1 included 50 English and 47 Spanish speakers. Study 2 involved 108 English and 109 Spanish speakers. Participants…

  6. Multichannel Compression, Temporal Cues, and Audibility.

    ERIC Educational Resources Information Center

    Souza, Pamela E.; Turner, Christopher W.

    1998-01-01

    The effect of the reduction of the temporal envelope produced by multichannel compression on recognition was examined in 16 listeners with hearing loss, with particular focus on audibility of the speech signal. Multichannel compression improved speech recognition when superior audibility was provided by a two-channel compression system over linear…

  7. The effect of hearing aid technologies on listening in an automobile.

    PubMed

    Wu, Yu-Hsiang; Stangl, Elizabeth; Bentler, Ruth A; Stanziola, Rachel W

    2013-06-01

    Communication while traveling in an automobile often is very difficult for hearing aid users. This is because the automobile/road noise level is usually high, and listeners/drivers often do not have access to visual cues. Since the talker of interest usually is not located in front of the listener/driver, conventional directional processing that places the directivity beam toward the listener's front may not be helpful and, in fact, could have a negative impact on speech recognition (when compared to omnidirectional processing). Recently, technologies have become available in commercial hearing aids that are designed to improve speech recognition and/or listening effort in noisy conditions where talkers are located behind or beside the listener. These technologies include (1) a directional microphone system that uses a backward-facing directivity pattern (Back-DIR processing), (2) a technology that transmits audio signals from the ear with the better signal-to-noise ratio (SNR) to the ear with the poorer SNR (Side-Transmission processing), and (3) a signal processing scheme that suppresses the noise at the ear with the poorer SNR (Side-Suppression processing). The purpose of the current study was to determine the effect of (1) conventional directional microphones and (2) newer signal processing schemes (Back-DIR, Side-Transmission, and Side-Suppression) on listener's speech recognition performance and preference for communication in a traveling automobile. A single-blinded, repeated-measures design was used. Twenty-five adults with bilateral symmetrical sensorineural hearing loss aged 44 through 84 yr participated in the study. The automobile/road noise and sentences of the Connected Speech Test (CST) were recorded through hearing aids in a standard van moving at a speed of 70 mph on a paved highway. The hearing aids were programmed to omnidirectional microphone, conventional adaptive directional microphone, and the three newer schemes. CST sentences were presented from the side and back of the hearing aids, which were placed on the ears of a manikin. The recorded stimuli were presented to listeners via earphones in a sound-treated booth to assess speech recognition performance and preference with each programmed condition. Compared to omnidirectional microphones, conventional adaptive directional processing had a detrimental effect on speech recognition when speech was presented from the back or side of the listener. Back-DIR and Side-Transmission processing improved speech recognition performance (relative to both omnidirectional and adaptive directional processing) when speech was from the back and side, respectively. The performance with Side-Suppression processing was better than with adaptive directional processing when speech was from the side. The participants' preferences for a given processing scheme were generally consistent with speech recognition results. The finding that performance with adaptive directional processing was poorer than with omnidirectional microphones demonstrates the importance of selecting the correct microphone technology for different listening situations. The results also suggest the feasibility of using hearing aid technologies to provide a better listening experience for hearing aid users in automobiles. American Academy of Audiology.

  8. Not all sounds sound the same: Parkinson's disease affects differently emotion processing in music and in speech prosody.

    PubMed

    Lima, César F; Garrett, Carolina; Castro, São Luís

    2013-01-01

    Does emotion processing in music and speech prosody recruit common neurocognitive mechanisms? To examine this question, we implemented a cross-domain comparative design in Parkinson's disease (PD). Twenty-four patients and 25 controls performed emotion recognition tasks for music and spoken sentences. In music, patients had impaired recognition of happiness and peacefulness, and intact recognition of sadness and fear; this pattern was independent of general cognitive and perceptual abilities. In speech, patients had a small global impairment, which was significantly mediated by executive dysfunction. Hence, PD affected differently musical and prosodic emotions. This dissociation indicates that the mechanisms underlying the two domains are partly independent.

  9. Speech-on-speech masking with variable access to the linguistic content of the masker speech for native and nonnative english speakers.

    PubMed

    Calandruccio, Lauren; Bradlow, Ann R; Dhar, Sumitrajit

    2014-04-01

    Masking release for an English sentence-recognition task in the presence of foreign-accented English speech compared with native-accented English speech was reported in Calandruccio et al (2010a). The masking release appeared to increase as the masker intelligibility decreased. However, it could not be ruled out that spectral differences between the speech maskers were influencing the significant differences observed. The purpose of the current experiment was to minimize spectral differences between speech maskers to determine how various amounts of linguistic information within competing speech Affiliationect masking release. A mixed-model design with within-subject (four two-talker speech maskers) and between-subject (listener group) factors was conducted. Speech maskers included native-accented English speech and high-intelligibility, moderate-intelligibility, and low-intelligibility Mandarin-accented English. Normalizing the long-term average speech spectra of the maskers to each other minimized spectral differences between the masker conditions. Three listener groups were tested, including monolingual English speakers with normal hearing, nonnative English speakers with normal hearing, and monolingual English speakers with hearing loss. The nonnative English speakers were from various native language backgrounds, not including Mandarin (or any other Chinese dialect). Listeners with hearing loss had symmetric mild sloping to moderate sensorineural hearing loss. Listeners were asked to repeat back sentences that were presented in the presence of four different two-talker speech maskers. Responses were scored based on the key words within the sentences (100 key words per masker condition). A mixed-model regression analysis was used to analyze the difference in performance scores between the masker conditions and listener groups. Monolingual English speakers with normal hearing benefited when the competing speech signal was foreign accented compared with native accented, allowing for improved speech recognition. Various levels of intelligibility across the foreign-accented speech maskers did not influence results. Neither the nonnative English-speaking listeners with normal hearing nor the monolingual English speakers with hearing loss benefited from masking release when the masker was changed from native-accented to foreign-accented English. Slight modifications between the target and the masker speech allowed monolingual English speakers with normal hearing to improve their recognition of native-accented English, even when the competing speech was highly intelligible. Further research is needed to determine which modifications within the competing speech signal caused the Mandarin-accented English to be less effective with respect to masking. Determining the influences within the competing speech that make it less effective as a masker or determining why monolingual normal-hearing listeners can take advantage of these differences could help improve speech recognition for those with hearing loss in the future. American Academy of Audiology.

  10. Application of speech recognition and synthesis in the general aviation cockpit

    NASA Technical Reports Server (NTRS)

    North, R. A.; Mountford, S. J.; Bergeron, H.

    1984-01-01

    Interactive speech recognition/synthesis technology is assessed as a method for the aleviation of single-pilot IFR flight workloads. Attention was given during this series of evaluations to the conditions typical of general aviation twin-engine aircrft cockpits, covering several commonly encountered IFR flight condition scenarios. The most beneficial speech command tasks are noted to be in the data retrieval domain, which would allow the pilot access to uplinked data, checklists, and performance charts. Data entry tasks also appear to benefit from this technology.

  11. Noise-immune multisensor transduction of speech

    NASA Astrophysics Data System (ADS)

    Viswanathan, Vishu R.; Henry, Claudia M.; Derr, Alan G.; Roucos, Salim; Schwartz, Richard M.

    1986-08-01

    Two types of configurations of multiple sensors were developed, tested and evaluated in speech recognition application for robust performance in high levels of acoustic background noise: One type combines the individual sensor signals to provide a single speech signal input, and the other provides several parallel inputs. For single-input systems, several configurations of multiple sensors were developed and tested. Results from formal speech intelligibility and quality tests in simulated fighter aircraft cockpit noise show that each of the two-sensor configurations tested outperforms the constituent individual sensors in high noise. Also presented are results comparing the performance of two-sensor configurations and individual sensors in speaker-dependent, isolated-word speech recognition tests performed using a commercial recognizer (Verbex 4000) in simulated fighter aircraft cockpit noise.

  12. Evaluation of speech recognizers for use in advanced combat helicopter crew station research and development

    NASA Technical Reports Server (NTRS)

    Simpson, Carol A.

    1990-01-01

    The U.S. Army Crew Station Research and Development Facility uses vintage 1984 speech recognizers. An evaluation was performed of newer off-the-shelf speech recognition devices to determine whether newer technology performance and capabilities are substantially better than that of the Army's current speech recognizers. The Phonetic Discrimination (PD-100) Test was used to compare recognizer performance in two ambient noise conditions: quiet office and helicopter noise. Test tokens were spoken by males and females and in isolated-word and connected-work mode. Better overall recognition accuracy was obtained from the newer recognizers. Recognizer capabilities needed to support the development of human factors design requirements for speech command systems in advanced combat helicopters are listed.

  13. "Who" is saying "what"? Brain-based decoding of human voice and speech.

    PubMed

    Formisano, Elia; De Martino, Federico; Bonte, Milene; Goebel, Rainer

    2008-11-07

    Can we decipher speech content ("what" is being said) and speaker identity ("who" is saying it) from observations of brain activity of a listener? Here, we combine functional magnetic resonance imaging with a data-mining algorithm and retrieve what and whom a person is listening to from the neural fingerprints that speech and voice signals elicit in the listener's auditory cortex. These cortical fingerprints are spatially distributed and insensitive to acoustic variations of the input so as to permit the brain-based recognition of learned speech from unknown speakers and of learned voices from previously unheard utterances. Our findings unravel the detailed cortical layout and computational properties of the neural populations at the basis of human speech recognition and speaker identification.

  14. Effects of noise on speech recognition: Challenges for communication by service members.

    PubMed

    Le Prell, Colleen G; Clavier, Odile H

    2017-06-01

    Speech communication often takes place in noisy environments; this is an urgent issue for military personnel who must communicate in high-noise environments. The effects of noise on speech recognition vary significantly according to the sources of noise, the number and types of talkers, and the listener's hearing ability. In this review, speech communication is first described as it relates to current standards of hearing assessment for military and civilian populations. The next section categorizes types of noise (also called maskers) according to their temporal characteristics (steady or fluctuating) and perceptive effects (energetic or informational masking). Next, speech recognition difficulties experienced by listeners with hearing loss and by older listeners are summarized, and questions on the possible causes of speech-in-noise difficulty are discussed, including recent suggestions of "hidden hearing loss". The final section describes tests used by military and civilian researchers, audiologists, and hearing technicians to assess performance of an individual in recognizing speech in background noise, as well as metrics that predict performance based on a listener and background noise profile. This article provides readers with an overview of the challenges associated with speech communication in noisy backgrounds, as well as its assessment and potential impact on functional performance, and provides guidance for important new research directions relevant not only to military personnel, but also to employees who work in high noise environments. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation

    PubMed Central

    Banks, Briony; Gowen, Emma; Munro, Kevin J.; Adank, Patti

    2015-01-01

    Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker’s facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants’ eye gaze was recorded to verify that they looked at the speaker’s face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation. PMID:26283946

  16. Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation.

    PubMed

    Banks, Briony; Gowen, Emma; Munro, Kevin J; Adank, Patti

    2015-01-01

    Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker's facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants' eye gaze was recorded to verify that they looked at the speaker's face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation.

  17. Speech Recognition of Bimodal Cochlear Implant Recipients Using a Wireless Audio Streaming Accessory for the Telephone.

    PubMed

    Wolfe, Jace; Morais, Mila; Schafer, Erin

    2016-02-01

    The goals of the present investigation were (1) to evaluate recognition of recorded speech presented over a mobile telephone for a group of adult bimodal cochlear implant users, and (2) to measure the potential benefits of wireless hearing assistance technology (HAT) for mobile telephone speech recognition using bimodal stimulation (i.e., a cochlear implant in one ear and a hearing aid on the other ear). A three-by-two-way repeated measures design was used to evaluate mobile telephone sentence-recognition performance differences obtained in quiet and in noise with and without the wireless HAT accessory coupled to the hearing aid alone, CI sound processor alone, and in the bimodal condition. Outpatient cochlear implant clinic. Sixteen bimodal users with Nucleus 24, Freedom, CI512, or CI422 cochlear implants participated in this study. Performance was measured with and without the use of a wireless HAT for the telephone used with the hearing aid alone, CI alone, and bimodal condition. CNC word recognition in quiet and in noise with and without the use of a wireless HAT telephone accessory in the hearing aid alone, CI alone, and bimodal conditions. Results suggested that the bimodal condition gave significantly better speech recognition on the mobile telephone with the wireless HAT. A wireless HAT for the mobile telephone provides bimodal users with significant improvement in word recognition in quiet and in noise over the mobile telephone.

  18. Assessment of Severe Apnoea through Voice Analysis, Automatic Speech, and Speaker Recognition Techniques

    NASA Astrophysics Data System (ADS)

    Fernández Pozo, Rubén; Blanco Murillo, Jose Luis; Hernández Gómez, Luis; López Gonzalo, Eduardo; Alcázar Ramírez, José; Toledano, Doroteo T.

    2009-12-01

    This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry.

  19. Automatic speech recognition in air traffic control

    NASA Technical Reports Server (NTRS)

    Karlsson, Joakim

    1990-01-01

    Automatic Speech Recognition (ASR) technology and its application to the Air Traffic Control system are described. The advantages of applying ASR to Air Traffic Control, as well as criteria for choosing a suitable ASR system are presented. Results from previous research and directions for future work at the Flight Transportation Laboratory are outlined.

  20. Transcribe Your Class: Using Speech Recognition to Improve Access for At-Risk Students

    ERIC Educational Resources Information Center

    Bain, Keith; Lund-Lucas, Eunice; Stevens, Janice

    2012-01-01

    Through a project supported by Canada's Social Development Partnerships Program, a team of leading National Disability Organizations, universities, and industry partners are piloting a prototype Hosted Transcription Service that uses speech recognition to automatically create multimedia transcripts that can be used by students for study purposes.…

  1. Automatic Speech Recognition: Reliability and Pedagogical Implications for Teaching Pronunciation

    ERIC Educational Resources Information Center

    Kim, In-Seok

    2006-01-01

    This study examines the reliability of automatic speech recognition (ASR) software used to teach English pronunciation, focusing on one particular piece of software, "FluSpeak, as a typical example." Thirty-six Korean English as a Foreign Language (EFL) college students participated in an experiment in which they listened to 15 sentences…

  2. Speech Recognition Technology for Disabilities Education

    ERIC Educational Resources Information Center

    Tang, K. Wendy; Kamoua, Ridha; Sutan, Victor; Farooq, Omer; Eng, Gilbert; Chu, Wei Chern; Hou, Guofeng

    2005-01-01

    Speech recognition is an alternative to traditional methods of interacting with a computer, such as textual input through a keyboard. An effective system can replace or reduce the reliability on standard keyboard and mouse input. This can especially assist dyslexic students who have problems with character or word use and manipulation in a textual…

  3. Automatic Speech Recognition Technology as an Effective Means for Teaching Pronunciation

    ERIC Educational Resources Information Center

    Elimat, Amal Khalil; AbuSeileek, Ali Farhan

    2014-01-01

    This study aimed to explore the effect of using automatic speech recognition technology (ASR) on the third grade EFL students' performance in pronunciation, whether teaching pronunciation through ASR is better than regular instruction, and the most effective teaching technique (individual work, pair work, or group work) in teaching pronunciation…

  4. An Investigation of the Compensatory Effectiveness of Assistive Technology on Postsecondary Students with Learning Disabilities. Final Report.

    ERIC Educational Resources Information Center

    Murphy, Harry; Higgins, Eleanor

    This final report describes the activities and accomplishments of a 3-year study on the compensatory effectiveness of three assistive technologies, optical character recognition, speech synthesis, and speech recognition, on postsecondary students (N=140) with learning disabilities. These technologies were investigated relative to: (1) immediate…

  5. Effects of Cognitive Load on Speech Recognition

    ERIC Educational Resources Information Center

    Mattys, Sven L.; Wiget, Lukas

    2011-01-01

    The effect of cognitive load (CL) on speech recognition has received little attention despite the prevalence of CL in everyday life, e.g., dual-tasking. To assess the effect of CL on the interaction between lexically-mediated and acoustically-mediated processes, we measured the magnitude of the "Ganong effect" (i.e., lexical bias on phoneme…

  6. Auditory Training with Multiple Talkers and Passage-Based Semantic Cohesion

    ERIC Educational Resources Information Center

    Casserly, Elizabeth D.; Barney, Erin C.

    2017-01-01

    Purpose: Current auditory training methods typically result in improvements to speech recognition abilities in quiet, but learner gains may not extend to other domains in speech (e.g., recognition in noise) or self-assessed benefit. This study examined the potential of training involving multiple talkers and training emphasizing discourse-level…

  7. Implicit Processing of Phonotactic Cues: Evidence from Electrophysiological and Vascular Responses

    ERIC Educational Resources Information Center

    Rossi, Sonja; Jurgenson, Ina B.; Hanulikova, Adriana; Telkemeyer, Silke; Wartenburger, Isabell; Obrig, Hellmuth

    2011-01-01

    Spoken word recognition is achieved via competition between activated lexical candidates that match the incoming speech input. The competition is modulated by prelexical cues that are important for segmenting the auditory speech stream into linguistic units. One such prelexical cue that listeners rely on in spoken word recognition is phonotactics.…

  8. Spoken Word Recognition of Chinese Words in Continuous Speech

    ERIC Educational Resources Information Center

    Yip, Michael C. W.

    2015-01-01

    The present study examined the role of positional probability of syllables played in recognition of spoken word in continuous Cantonese speech. Because some sounds occur more frequently at the beginning position or ending position of Cantonese syllables than the others, so these kinds of probabilistic information of syllables may cue the locations…

  9. Phonotactics Constraints and the Spoken Word Recognition of Chinese Words in Speech

    ERIC Educational Resources Information Center

    Yip, Michael C.

    2016-01-01

    Two word-spotting experiments were conducted to examine the question of whether native Cantonese listeners are constrained by phonotactics information in spoken word recognition of Chinese words in speech. Because no legal consonant clusters occurred within an individual Chinese word, this kind of categorical phonotactics information of Chinese…

  10. Evaluating Automatic Speech Recognition-Based Language Learning Systems: A Case Study

    ERIC Educational Resources Information Center

    van Doremalen, Joost; Boves, Lou; Colpaert, Jozef; Cucchiarini, Catia; Strik, Helmer

    2016-01-01

    The purpose of this research was to evaluate a prototype of an automatic speech recognition (ASR)-based language learning system that provides feedback on different aspects of speaking performance (pronunciation, morphology and syntax) to students of Dutch as a second language. We carried out usability reviews, expert reviews and user tests to…

  11. Motor Speech Disorders Associated with Primary Progressive Aphasia

    PubMed Central

    Duffy, Joseph R.; Strand, Edythe A.; Josephs, Keith A.

    2014-01-01

    Background Primary progressive aphasia (PPA) and conditions that overlap with it can be accompanied by motor speech disorders. Recognition and understanding of motor speech disorders can contribute to a fuller clinical understanding of PPA and its management as well as its localization and underlying pathology. Aims To review the types of motor speech disorders that may occur with PPA, its primary variants, and its overlap syndromes (progressive supranuclear palsy syndrome, corticobasal syndrome, motor neuron disease), as well as with primary progressive apraxia of speech. Main Contribution The review should assist clinicians' and researchers' understanding of the relationship between motor speech disorders and PPA and its major variants. It also highlights the importance of recognizing neurodegenerative apraxia of speech as a condition that can occur with little or no evidence of aphasia. Conclusion Motor speech disorders can occur with PPA. Their recognition can contribute to clinical diagnosis and management of PPA and to understanding and predicting the localization and pathology associated with PPA variants and conditions that can overlap with them. PMID:25309017

  12. Quadcopter Control Using Speech Recognition

    NASA Astrophysics Data System (ADS)

    Malik, H.; Darma, S.; Soekirno, S.

    2018-04-01

    This research reported a comparison from a success rate of speech recognition systems that used two types of databases they were existing databases and new databases, that were implemented into quadcopter as motion control. Speech recognition system was using Mel frequency cepstral coefficient method (MFCC) as feature extraction that was trained using recursive neural network method (RNN). MFCC method was one of the feature extraction methods that most used for speech recognition. This method has a success rate of 80% - 95%. Existing database was used to measure the success rate of RNN method. The new database was created using Indonesian language and then the success rate was compared with results from an existing database. Sound input from the microphone was processed on a DSP module with MFCC method to get the characteristic values. Then, the characteristic values were trained using the RNN which result was a command. The command became a control input to the single board computer (SBC) which result was the movement of the quadcopter. On SBC, we used robot operating system (ROS) as the kernel (Operating System).

  13. Participation of the Classical Speech Areas in Auditory Long-Term Memory

    PubMed Central

    Karabanov, Anke Ninija; Paine, Rainer; Chao, Chi Chao; Schulze, Katrin; Scott, Brian; Hallett, Mark; Mishkin, Mortimer

    2015-01-01

    Accumulating evidence suggests that storing speech sounds requires transposing rapidly fluctuating sound waves into more easily encoded oromotor sequences. If so, then the classical speech areas in the caudalmost portion of the temporal gyrus (pSTG) and in the inferior frontal gyrus (IFG) may be critical for performing this acoustic-oromotor transposition. We tested this proposal by applying repetitive transcranial magnetic stimulation (rTMS) to each of these left-hemisphere loci, as well as to a nonspeech locus, while participants listened to pseudowords. After 5 minutes these stimuli were re-presented together with new ones in a recognition test. Compared to control-site stimulation, pSTG stimulation produced a highly significant increase in recognition error rate, without affecting reaction time. By contrast, IFG stimulation led only to a weak, non-significant, trend toward recognition memory impairment. Importantly, the impairment after pSTG stimulation was not due to interference with perception, since the same stimulation failed to affect pseudoword discrimination examined with short interstimulus intervals. Our findings suggest that pSTG is essential for transforming speech sounds into stored motor plans for reproducing the sound. Whether or not the IFG also plays a role in speech-sound recognition could not be determined from the present results. PMID:25815813

  14. Participation of the classical speech areas in auditory long-term memory.

    PubMed

    Karabanov, Anke Ninija; Paine, Rainer; Chao, Chi Chao; Schulze, Katrin; Scott, Brian; Hallett, Mark; Mishkin, Mortimer

    2015-01-01

    Accumulating evidence suggests that storing speech sounds requires transposing rapidly fluctuating sound waves into more easily encoded oromotor sequences. If so, then the classical speech areas in the caudalmost portion of the temporal gyrus (pSTG) and in the inferior frontal gyrus (IFG) may be critical for performing this acoustic-oromotor transposition. We tested this proposal by applying repetitive transcranial magnetic stimulation (rTMS) to each of these left-hemisphere loci, as well as to a nonspeech locus, while participants listened to pseudowords. After 5 minutes these stimuli were re-presented together with new ones in a recognition test. Compared to control-site stimulation, pSTG stimulation produced a highly significant increase in recognition error rate, without affecting reaction time. By contrast, IFG stimulation led only to a weak, non-significant, trend toward recognition memory impairment. Importantly, the impairment after pSTG stimulation was not due to interference with perception, since the same stimulation failed to affect pseudoword discrimination examined with short interstimulus intervals. Our findings suggest that pSTG is essential for transforming speech sounds into stored motor plans for reproducing the sound. Whether or not the IFG also plays a role in speech-sound recognition could not be determined from the present results.

  15. Syntactic error modeling and scoring normalization in speech recognition

    NASA Technical Reports Server (NTRS)

    Olorenshaw, Lex

    1991-01-01

    The objective was to develop the speech recognition system to be able to detect speech which is pronounced incorrectly, given that the text of the spoken speech is known to the recognizer. Research was performed in the following areas: (1) syntactic error modeling; (2) score normalization; and (3) phoneme error modeling. The study into the types of errors that a reader makes will provide the basis for creating tests which will approximate the use of the system in the real world. NASA-Johnson will develop this technology into a 'Literacy Tutor' in order to bring innovative concepts to the task of teaching adults to read.

  16. Tone classification of syllable-segmented Thai speech based on multilayer perception

    NASA Astrophysics Data System (ADS)

    Satravaha, Nuttavudh; Klinkhachorn, Powsiri; Lass, Norman

    2002-05-01

    Thai is a monosyllabic tonal language that uses tone to convey lexical information about the meaning of a syllable. Thus to completely recognize a spoken Thai syllable, a speech recognition system not only has to recognize a base syllable but also must correctly identify a tone. Hence, tone classification of Thai speech is an essential part of a Thai speech recognition system. Thai has five distinctive tones (``mid,'' ``low,'' ``falling,'' ``high,'' and ``rising'') and each tone is represented by a single fundamental frequency (F0) pattern. However, several factors, including tonal coarticulation, stress, intonation, and speaker variability, affect the F0 pattern of a syllable in continuous Thai speech. In this study, an efficient method for tone classification of syllable-segmented Thai speech, which incorporates the effects of tonal coarticulation, stress, and intonation, as well as a method to perform automatic syllable segmentation, were developed. Acoustic parameters were used as the main discriminating parameters. The F0 contour of a segmented syllable was normalized by using a z-score transformation before being presented to a tone classifier. The proposed system was evaluated on 920 test utterances spoken by 8 speakers. A recognition rate of 91.36% was achieved by the proposed system.

  17. Multisensory speech perception in autism spectrum disorder: From phoneme to whole-word perception.

    PubMed

    Stevenson, Ryan A; Baum, Sarah H; Segers, Magali; Ferber, Susanne; Barense, Morgan D; Wallace, Mark T

    2017-07-01

    Speech perception in noisy environments is boosted when a listener can see the speaker's mouth and integrate the auditory and visual speech information. Autistic children have a diminished capacity to integrate sensory information across modalities, which contributes to core symptoms of autism, such as impairments in social communication. We investigated the abilities of autistic and typically-developing (TD) children to integrate auditory and visual speech stimuli in various signal-to-noise ratios (SNR). Measurements of both whole-word and phoneme recognition were recorded. At the level of whole-word recognition, autistic children exhibited reduced performance in both the auditory and audiovisual modalities. Importantly, autistic children showed reduced behavioral benefit from multisensory integration with whole-word recognition, specifically at low SNRs. At the level of phoneme recognition, autistic children exhibited reduced performance relative to their TD peers in auditory, visual, and audiovisual modalities. However, and in contrast to their performance at the level of whole-word recognition, both autistic and TD children showed benefits from multisensory integration for phoneme recognition. In accordance with the principle of inverse effectiveness, both groups exhibited greater benefit at low SNRs relative to high SNRs. Thus, while autistic children showed typical multisensory benefits during phoneme recognition, these benefits did not translate to typical multisensory benefit of whole-word recognition in noisy environments. We hypothesize that sensory impairments in autistic children raise the SNR threshold needed to extract meaningful information from a given sensory input, resulting in subsequent failure to exhibit behavioral benefits from additional sensory information at the level of whole-word recognition. Autism Res 2017. © 2017 International Society for Autism Research, Wiley Periodicals, Inc. Autism Res 2017, 10: 1280-1290. © 2017 International Society for Autism Research, Wiley Periodicals, Inc. © 2017 International Society for Autism Research, Wiley Periodicals, Inc.

  18. Lexical-Access Ability and Cognitive Predictors of Speech Recognition in Noise in Adult Cochlear Implant Users

    PubMed Central

    Smits, Cas; Merkus, Paul; Festen, Joost M.; Goverts, S. Theo

    2017-01-01

    Not all of the variance in speech-recognition performance of cochlear implant (CI) users can be explained by biographic and auditory factors. In normal-hearing listeners, linguistic and cognitive factors determine most of speech-in-noise performance. The current study explored specifically the influence of visually measured lexical-access ability compared with other cognitive factors on speech recognition of 24 postlingually deafened CI users. Speech-recognition performance was measured with monosyllables in quiet (consonant-vowel-consonant [CVC]), sentences-in-noise (SIN), and digit-triplets in noise (DIN). In addition to a composite variable of lexical-access ability (LA), measured with a lexical-decision test (LDT) and word-naming task, vocabulary size, working-memory capacity (Reading Span test [RSpan]), and a visual analogue of the SIN test (text reception threshold test) were measured. The DIN test was used to correct for auditory factors in SIN thresholds by taking the difference between SIN and DIN: SRTdiff. Correlation analyses revealed that duration of hearing loss (dHL) was related to SIN thresholds. Better working-memory capacity was related to SIN and SRTdiff scores. LDT reaction time was positively correlated with SRTdiff scores. No significant relationships were found for CVC or DIN scores with the predictor variables. Regression analyses showed that together with dHL, RSpan explained 55% of the variance in SIN thresholds. When controlling for auditory performance, LA, LDT, and RSpan separately explained, together with dHL, respectively 37%, 36%, and 46% of the variance in SRTdiff outcome. The results suggest that poor verbal working-memory capacity and to a lesser extent poor lexical-access ability limit speech-recognition ability in listeners with a CI. PMID:29205095

  19. A new time-adaptive discrete bionic wavelet transform for enhancing speech from adverse noise environment

    NASA Astrophysics Data System (ADS)

    Palaniswamy, Sumithra; Duraisamy, Prakash; Alam, Mohammad Showkat; Yuan, Xiaohui

    2012-04-01

    Automatic speech processing systems are widely used in everyday life such as mobile communication, speech and speaker recognition, and for assisting the hearing impaired. In speech communication systems, the quality and intelligibility of speech is of utmost importance for ease and accuracy of information exchange. To obtain an intelligible speech signal and one that is more pleasant to listen, noise reduction is essential. In this paper a new Time Adaptive Discrete Bionic Wavelet Thresholding (TADBWT) scheme is proposed. The proposed technique uses Daubechies mother wavelet to achieve better enhancement of speech from additive non- stationary noises which occur in real life such as street noise and factory noise. Due to the integration of human auditory system model into the wavelet transform, bionic wavelet transform (BWT) has great potential for speech enhancement which may lead to a new path in speech processing. In the proposed technique, at first, discrete BWT is applied to noisy speech to derive TADBWT coefficients. Then the adaptive nature of the BWT is captured by introducing a time varying linear factor which updates the coefficients at each scale over time. This approach has shown better performance than the existing algorithms at lower input SNR due to modified soft level dependent thresholding on time adaptive coefficients. The objective and subjective test results confirmed the competency of the TADBWT technique. The effectiveness of the proposed technique is also evaluated for speaker recognition task under noisy environment. The recognition results show that the TADWT technique yields better performance when compared to alternate methods specifically at lower input SNR.

  20. Tell it like it is.

    PubMed

    Lee, S L

    2000-05-01

    Nurses, therapists and case managers were spending too much time each week on the phone waiting to read patient reports to live transcriptionists who would then type the reports for storage in VNSNY's clinical management mainframe database. A speech recognition system helped solve the problem by providing the staff 24-hour access to an automated transcription service any day of the week. Nurses and case managers no longer wait in long queues to transmit patient reports or to retrieve information from the database. Everything is done automatically within minutes. VNSNY saved both time and money by updating its transcription strategy. Now nurses can spend more time with patients and less time on the phone transcribing notes. It also means fewer staff members are needed on weekends to do manual transcribing.

  1. Relating hearing loss and executive functions to hearing aid users' preference for, and speech recognition with, different combinations of binaural noise reduction and microphone directionality

    PubMed Central

    Neher, Tobias

    2014-01-01

    Knowledge of how executive functions relate to preferred hearing aid (HA) processing is sparse and seemingly inconsistent with related knowledge for speech recognition outcomes. This study thus aimed to find out if (1) performance on a measure of reading span (RS) is related to preferred binaural noise reduction (NR) strength, (2) similar relations exist for two different, non-verbal measures of executive function, (3) pure-tone average hearing loss (PTA), signal-to-noise ratio (SNR), and microphone directionality (DIR) also influence preferred NR strength, and (4) preference and speech recognition outcomes are similar. Sixty elderly HA users took part. Six HA conditions consisting of omnidirectional or cardioid microphones followed by inactive, moderate, or strong binaural NR as well as linear amplification were tested. Outcome was assessed at fixed SNRs using headphone simulations of a frontal target talker in a busy cafeteria. Analyses showed positive effects of active NR and DIR on preference, and negative and positive effects of, respectively, strong NR and DIR on speech recognition. Also, while moderate NR was the most preferred NR setting overall, preference for strong NR increased with SNR. No relation between RS and preference was found. However, larger PTA was related to weaker preference for inactive NR and stronger preference for strong NR for both microphone modes. Equivalent (but weaker) relations between worse performance on one non-verbal measure of executive function and the HA conditions without DIR were found. For speech recognition, there were relations between HA condition, PTA, and RS, but their pattern differed from that for preference. Altogether, these results indicate that, while moderate NR works well in general, a notable proportion of HA users prefer stronger NR. Furthermore, PTA and executive functions can account for some of the variability in preference for, and speech recognition with, different binaural NR and DIR settings. PMID:25538547

  2. Lessons Learned in Part-of-Speech Tagging of Conversational Speech

    DTIC Science & Technology

    2010-10-01

    for conversational speech recognition. In Plenary Meeting and Symposium on Prosody and Speech Processing. Slav Petrov and Dan Klein. 2007. Improved...inference for unlexicalized parsing. In HLT-NAACL. Slav Petrov. 2010. Products of random latent variable grammars. In HLT-NAACL. Brian Roark, Yang Liu

  3. Recognition and Localization of Speech by Adult Cochlear Implant Recipients Wearing a Digital Hearing Aid in the Nonimplanted Ear (Bimodal Hearing)

    PubMed Central

    Potts, Lisa G.; Skinner, Margaret W.; Litovsky, Ruth A.; Strube, Michael J; Kuk, Francis

    2010-01-01

    Background The use of bilateral amplification is now common clinical practice for hearing aid users but not for cochlear implant recipients. In the past, most cochlear implant recipients were implanted in one ear and wore only a monaural cochlear implant processor. There has been recent interest in benefits arising from bilateral stimulation that may be present for cochlear implant recipients. One option for bilateral stimulation is the use of a cochlear implant in one ear and a hearing aid in the opposite nonimplanted ear (bimodal hearing). Purpose This study evaluated the effect of wearing a cochlear implant in one ear and a digital hearing aid in the opposite ear on speech recognition and localization. Research Design A repeated-measures correlational study was completed. Study Sample Nineteen adult Cochlear Nucleus 24 implant recipients participated in the study. Intervention The participants were fit with a Widex Senso Vita 38 hearing aid to achieve maximum audibility and comfort within their dynamic range. Data Collection and Analysis Soundfield thresholds, loudness growth, speech recognition, localization, and subjective questionnaires were obtained six–eight weeks after the hearing aid fitting. Testing was completed in three conditions: hearing aid only, cochlear implant only, and cochlear implant and hearing aid (bimodal). All tests were repeated four weeks after the first test session. Repeated-measures analysis of variance was used to analyze the data. Significant effects were further examined using pairwise comparison of means or in the case of continuous moderators, regression analyses. The speech-recognition and localization tasks were unique, in that a speech stimulus presented from a variety of roaming azimuths (140 degree loudspeaker array) was used. Results Performance in the bimodal condition was significantly better for speech recognition and localization compared to the cochlear implant–only and hearing aid–only conditions. Performance was also different between these conditions when the location (i.e., side of the loudspeaker array that presented the word) was analyzed. In the bimodal condition, the speech-recognition and localization tasks were equal regardless of which side of the loudspeaker array presented the word, while performance was significantly poorer for the monaural conditions (hearing aid only and cochlear implant only) when the words were presented on the side with no stimulation. Binaural loudness summation of 1–3 dB was seen in soundfield thresholds and loudness growth in the bimodal condition. Measures of the audibility of sound with the hearing aid, including unaided thresholds, soundfield thresholds, and the Speech Intelligibility Index, were significant moderators of speech recognition and localization. Based on the questionnaire responses, participants showed a strong preference for bimodal stimulation. Conclusions These findings suggest that a well-fit digital hearing aid worn in conjunction with a cochlear implant is beneficial to speech recognition and localization. The dynamic test procedures used in this study illustrate the importance of bilateral hearing for locating, identifying, and switching attention between multiple speakers. It is recommended that unilateral cochlear implant recipients, with measurable unaided hearing thresholds, be fit with a hearing aid. PMID:19594084

  4. Cleft Audit Protocol for Speech (CAPS-A): A Comprehensive Training Package for Speech Analysis

    ERIC Educational Resources Information Center

    Sell, D.; John, A.; Harding-Bell, A.; Sweeney, T.; Hegarty, F.; Freeman, J.

    2009-01-01

    Background: The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been…

  5. Staff Report to the Senior Department Official on Recognition Compliance Issues. Recommendation Page: American Speech-Language-Hearing Association

    ERIC Educational Resources Information Center

    US Department of Education, 2010

    2010-01-01

    The American Speech-Language-Hearing Association, Council on Academic Accreditation in Audiology and Speech-Language Pathology (CAA) is a national accrediting agency of graduate education programs in audiology or speech-language pathology. The CAA currently accredits or or preaccredits 319 programs (247 in speech-language pathology and 72 in…

  6. Working Memory and Speech Recognition in Noise Under Ecologically Relevant Listening Conditions: Effects of Visual Cues and Noise Type Among Adults With Hearing Loss

    PubMed Central

    Stewart, Erin K.; Wu, Yu-Hsiang; Bishop, Christopher; Bentler, Ruth A.; Tremblay, Kelly

    2017-01-01

    Purpose This study evaluated the relationship between working memory (WM) and speech recognition in noise with different noise types as well as in the presence of visual cues. Method Seventy-six adults with bilateral, mild to moderately severe sensorineural hearing loss (mean age: 69 years) participated. Using a cross-sectional design, 2 measures of WM were taken: a reading span measure, and Word Auditory Recognition and Recall Measure (Smith, Pichora-Fuller, & Alexander, 2016). Speech recognition was measured with the Multi-Modal Lexical Sentence Test for Adults (Kirk et al., 2012) in steady-state noise and 4-talker babble, with and without visual cues. Testing was under unaided conditions. Results A linear mixed model revealed visual cues and pure-tone average as the only significant predictors of Multi-Modal Lexical Sentence Test outcomes. Neither WM measure nor noise type showed a significant effect. Conclusion The contribution of WM in explaining unaided speech recognition in noise was negligible and not influenced by noise type or visual cues. We anticipate that with audibility partially restored by hearing aids, the effects of WM will increase. For clinical practice to be affected, more significant effect sizes are needed. PMID:28744550

  7. Influence of auditory attention on sentence recognition captured by the neural phase.

    PubMed

    Müller, Jana Annina; Kollmeier, Birger; Debener, Stefan; Brand, Thomas

    2018-03-07

    The aim of this study was to investigate whether attentional influences on speech recognition are reflected in the neural phase entrained by an external modulator. Sentences were presented in 7 Hz sinusoidally modulated noise while the neural response to that modulation frequency was monitored by electroencephalogram (EEG) recordings in 21 participants. We implemented a selective attention paradigm including three different attention conditions while keeping physical stimulus parameters constant. The participants' task was either to repeat the sentence as accurately as possible (speech recognition task), to count the number of decrements implemented in modulated noise (decrement detection task), or to do both (dual task), while the EEG was recorded. Behavioural analysis revealed reduced performance in the dual task condition for decrement detection, possibly reflecting limited cognitive resources. EEG analysis revealed no significant differences in power for the 7 Hz modulation frequency, but an attention-dependent phase difference between tasks. Further phase analysis revealed a significant difference 500 ms after sentence onset between trials with correct and incorrect responses for speech recognition, indicating that speech recognition performance and the neural phase are linked via selective attention mechanisms, at least shortly after sentence onset. However, the neural phase effects identified were small and await further investigation. © 2018 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.

  8. Hybrid simulated annealing and its application to optimization of hidden Markov models for visual speech recognition.

    PubMed

    Lee, Jong-Seok; Park, Cheol Hoon

    2010-08-01

    We propose a novel stochastic optimization algorithm, hybrid simulated annealing (SA), to train hidden Markov models (HMMs) for visual speech recognition. In our algorithm, SA is combined with a local optimization operator that substitutes a better solution for the current one to improve the convergence speed and the quality of solutions. We mathematically prove that the sequence of the objective values converges in probability to the global optimum in the algorithm. The algorithm is applied to train HMMs that are used as visual speech recognizers. While the popular training method of HMMs, the expectation-maximization algorithm, achieves only local optima in the parameter space, the proposed method can perform global optimization of the parameters of HMMs and thereby obtain solutions yielding improved recognition performance. The superiority of the proposed algorithm to the conventional ones is demonstrated via isolated word recognition experiments.

  9. Prediction of psychosis across protocols and risk cohorts using automated language analysis

    PubMed Central

    Corcoran, Cheryl M.; Carrillo, Facundo; Fernández‐Slezak, Diego; Bedi, Gillinder; Klim, Casimir; Javitt, Daniel C.; Bearden, Carrie E.; Cecchi, Guillermo A.

    2018-01-01

    Language and speech are the primary source of data for psychiatrists to diagnose and treat mental disorders. In psychosis, the very structure of language can be disturbed, including semantic coherence (e.g., derailment and tangentiality) and syntactic complexity (e.g., concreteness). Subtle disturbances in language are evident in schizophrenia even prior to first psychosis onset, during prodromal stages. Using computer‐based natural language processing analyses, we previously showed that, among English‐speaking clinical (e.g., ultra) high‐risk youths, baseline reduction in semantic coherence (the flow of meaning in speech) and in syntactic complexity could predict subsequent psychosis onset with high accuracy. Herein, we aimed to cross‐validate these automated linguistic analytic methods in a second larger risk cohort, also English‐speaking, and to discriminate speech in psychosis from normal speech. We identified an automated machine‐learning speech classifier – comprising decreased semantic coherence, greater variance in that coherence, and reduced usage of possessive pronouns – that had an 83% accuracy in predicting psychosis onset (intra‐protocol), a cross‐validated accuracy of 79% of psychosis onset prediction in the original risk cohort (cross‐protocol), and a 72% accuracy in discriminating the speech of recent‐onset psychosis patients from that of healthy individuals. The classifier was highly correlated with previously identified manual linguistic predictors. Our findings support the utility and validity of automated natural language processing methods to characterize disturbances in semantics and syntax across stages of psychotic disorder. The next steps will be to apply these methods in larger risk cohorts to further test reproducibility, also in languages other than English, and identify sources of variability. This technology has the potential to improve prediction of psychosis outcome among at‐risk youths and identify linguistic targets for remediation and preventive intervention. More broadly, automated linguistic analysis can be a powerful tool for diagnosis and treatment across neuropsychiatry. PMID:29352548

  10. Prediction of psychosis across protocols and risk cohorts using automated language analysis.

    PubMed

    Corcoran, Cheryl M; Carrillo, Facundo; Fernández-Slezak, Diego; Bedi, Gillinder; Klim, Casimir; Javitt, Daniel C; Bearden, Carrie E; Cecchi, Guillermo A

    2018-02-01

    Language and speech are the primary source of data for psychiatrists to diagnose and treat mental disorders. In psychosis, the very structure of language can be disturbed, including semantic coherence (e.g., derailment and tangentiality) and syntactic complexity (e.g., concreteness). Subtle disturbances in language are evident in schizophrenia even prior to first psychosis onset, during prodromal stages. Using computer-based natural language processing analyses, we previously showed that, among English-speaking clinical (e.g., ultra) high-risk youths, baseline reduction in semantic coherence (the flow of meaning in speech) and in syntactic complexity could predict subsequent psychosis onset with high accuracy. Herein, we aimed to cross-validate these automated linguistic analytic methods in a second larger risk cohort, also English-speaking, and to discriminate speech in psychosis from normal speech. We identified an automated machine-learning speech classifier - comprising decreased semantic coherence, greater variance in that coherence, and reduced usage of possessive pronouns - that had an 83% accuracy in predicting psychosis onset (intra-protocol), a cross-validated accuracy of 79% of psychosis onset prediction in the original risk cohort (cross-protocol), and a 72% accuracy in discriminating the speech of recent-onset psychosis patients from that of healthy individuals. The classifier was highly correlated with previously identified manual linguistic predictors. Our findings support the utility and validity of automated natural language processing methods to characterize disturbances in semantics and syntax across stages of psychotic disorder. The next steps will be to apply these methods in larger risk cohorts to further test reproducibility, also in languages other than English, and identify sources of variability. This technology has the potential to improve prediction of psychosis outcome among at-risk youths and identify linguistic targets for remediation and preventive intervention. More broadly, automated linguistic analysis can be a powerful tool for diagnosis and treatment across neuropsychiatry. © 2018 World Psychiatric Association.

  11. Speech-Enabled Interfaces for Travel Information Systems with Large Grammars

    NASA Astrophysics Data System (ADS)

    Zhao, Baoli; Allen, Tony; Bargiela, Andrzej

    This paper introduces three grammar-segmentation methods capable of handling the large grammar issues associated with producing a real-time speech-enabled VXML bus travel application for London. Large grammars tend to produce relatively slow recognition interfaces and this work shows how this limitation can be successfully addressed. Comparative experimental results show that the novel last-word recognition based grammar segmentation method described here achieves an optimal balance between recognition rate, speed of processing and naturalness of interaction.

  12. Memristive Computational Architecture of an Echo State Network for Real-Time Speech Emotion Recognition

    DTIC Science & Technology

    2015-05-28

    recognition is simpler and requires less computational resources compared to other inputs such as facial expressions . The Berlin database of Emotional ...Processing Magazine, IEEE, vol. 18, no. 1, pp. 32– 80, 2001. [15] K. R. Scherer, T. Johnstone, and G. Klasmeyer, “Vocal expression of emotion ...Network for Real-Time Speech- Emotion Recognition 5a. CONTRACT NUMBER IN-HOUSE 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 62788F 6. AUTHOR(S) Q

  13. Inferring Speaker Affect in Spoken Natural Language Communication

    ERIC Educational Resources Information Center

    Pon-Barry, Heather Roberta

    2013-01-01

    The field of spoken language processing is concerned with creating computer programs that can understand human speech and produce human-like speech. Regarding the problem of understanding human speech, there is currently growing interest in moving beyond speech recognition (the task of transcribing the words in an audio stream) and towards…

  14. Use of Computer Speech Technologies To Enhance Learning.

    ERIC Educational Resources Information Center

    Ferrell, Joe

    1999-01-01

    Discusses the design of an innovative learning system that uses new technologies for the man-machine interface, incorporating a combination of Automatic Speech Recognition (ASR) and Text To Speech (TTS) synthesis. Highlights include using speech technologies to mimic the attributes of the ideal tutor and design features. (AEF)

  15. Enhancing Speech Intelligibility: Interactions among Context, Modality, Speech Style, and Masker

    ERIC Educational Resources Information Center

    Van Engen, Kristin J.; Phelps, Jasmine E. B.; Smiljanic, Rajka; Chandrasekaran, Bharath

    2014-01-01

    Purpose: The authors sought to investigate interactions among intelligibility-enhancing speech cues (i.e., semantic context, clearly produced speech, and visual information) across a range of masking conditions. Method: Sentence recognition in noise was assessed for 29 normal-hearing listeners. Testing included semantically normal and anomalous…

  16. The Affordance of Speech Recognition Technology for EFL Learning in an Elementary School Setting

    ERIC Educational Resources Information Center

    Liaw, Meei-Ling

    2014-01-01

    This study examined the use of speech recognition (SR) technology to support a group of elementary school children's learning of English as a foreign language (EFL). SR technology has been used in various language learning contexts. Its application to EFL teaching and learning is still relatively recent, but a solid understanding of its…

  17. The Effect of Automatic Speech Recognition Eyespeak Software on Iraqi Students' English Pronunciation: A Pilot Study

    ERIC Educational Resources Information Center

    Sidgi, Lina Fathi Sidig; Shaari, Ahmad Jelani

    2017-01-01

    The use of technology, such as computer-assisted language learning (CALL), is used in teaching and learning in the foreign language classrooms where it is most needed. One promising emerging technology that supports language learning is automatic speech recognition (ASR). Integrating such technology, especially in the instruction of pronunciation…

  18. Machine Learning Through Signature Trees. Applications to Human Speech.

    ERIC Educational Resources Information Center

    White, George M.

    A signature tree is a binary decision tree used to classify unknown patterns. An attempt was made to develop a computer program for manipulating signature trees as a general research tool for exploring machine learning and pattern recognition. The program was applied to the problem of speech recognition to test its effectiveness for a specific…

  19. Apps, iPads, and Literacy: Examining the Feasibility of Speech Recognition in a First-Grade Classroom

    ERIC Educational Resources Information Center

    Baker, Elizabeth A.

    2017-01-01

    Informed by sociocultural and systems theory tenets, this study used ethnographic research methods to examine the feasibility of using speech recognition (SR) technology to support struggling readers in an early elementary classroom setting. Observations of eight first graders were conducted as they participated in a structured SR-supported…

  20. User Experience of a Mobile Speaking Application with Automatic Speech Recognition for EFL Learning

    ERIC Educational Resources Information Center

    Ahn, Tae youn; Lee, Sangmin-Michelle

    2016-01-01

    With the spread of mobile devices, mobile phones have enormous potential regarding their pedagogical use in language education. The goal of this study is to analyse user experience of a mobile-based learning system that is enhanced by speech recognition technology for the improvement of EFL (English as a foreign language) learners' speaking…

  1. [The role of temporal fine structure in tone recognition and music perception].

    PubMed

    Zhou, Q; Gu, X; Liu, B

    2017-11-07

    The sound signal can be decomposed into temporal envelope and temporal fine structure information. The temporal envelope information is crucial for speech perception in quiet environment, and the temporal fine structure information plays an important role in speech perception in noise, Mandarin tone recognition and music perception, especially the pitch and melody perception.

  2. EduSpeak[R]: A Speech Recognition and Pronunciation Scoring Toolkit for Computer-Aided Language Learning Applications

    ERIC Educational Resources Information Center

    Franco, Horacio; Bratt, Harry; Rossier, Romain; Rao Gadde, Venkata; Shriberg, Elizabeth; Abrash, Victor; Precoda, Kristin

    2010-01-01

    SRI International's EduSpeak[R] system is a software development toolkit that enables developers of interactive language education software to use state-of-the-art speech recognition and pronunciation scoring technology. Automatic pronunciation scoring allows the computer to provide feedback on the overall quality of pronunciation and to point to…

  3. Investigating an Innovative Computer Application to Improve L2 Word Recognition from Speech

    ERIC Educational Resources Information Center

    Matthews, Joshua; O'Toole, John Mitchell

    2015-01-01

    The ability to recognise words from the aural modality is a critical aspect of successful second language (L2) listening comprehension. However, little research has been reported on computer-mediated development of L2 word recognition from speech in L2 learning contexts. This report describes the development of an innovative computer application…

  4. Using Automatic Speech Recognition Technology with Elicited Oral Response Testing

    ERIC Educational Resources Information Center

    Cox, Troy L.; Davies, Randall S.

    2012-01-01

    This study examined the use of automatic speech recognition (ASR) scored elicited oral response (EOR) tests to assess the speaking ability of English language learners. It also examined the relationship between ASR-scored EOR and other language proficiency measures and the ability of the ASR to rate speakers without bias to gender or native…

  5. Performance Evaluation of Speech Recognition Systems as a Next-Generation Pilot-Vehicle Interface Technology

    NASA Technical Reports Server (NTRS)

    Arthur, Jarvis J., III; Shelton, Kevin J.; Prinzel, Lawrence J., III; Bailey, Randall E.

    2016-01-01

    During the flight trials known as Gulfstream-V Synthetic Vision Systems Integrated Technology Evaluation (GV-SITE), a Speech Recognition System (SRS) was used by the evaluation pilots. The SRS system was intended to be an intuitive interface for display control (rather than knobs, buttons, etc.). This paper describes the performance of the current "state of the art" Speech Recognition System (SRS). The commercially available technology was evaluated as an application for possible inclusion in commercial aircraft flight decks as a crew-to-vehicle interface. Specifically, the technology is to be used as an interface from aircrew to the onboard displays, controls, and flight management tasks. A flight test of a SRS as well as a laboratory test was conducted.

  6. The effect of hearing aid technologies on listening in an automobile

    PubMed Central

    Wu, Yu-Hsiang; Stangl, Elizabeth; Bentler, Ruth A.; Stanziola, Rachel W.

    2014-01-01

    Background Communication while traveling in an automobile often is very difficult for hearing aid users. This is because the automobile /road noise level is usually high, and listeners/drivers often do not have access to visual cues. Since the talker of interest usually is not located in front of the driver/listener, conventional directional processing that places the directivity beam toward the listener’s front may not be helpful, and in fact, could have a negative impact on speech recognition (when compared to omnidirectional processing). Recently, technologies have become available in commercial hearing aids that are designed to improve speech recognition and/or listening effort in noisy conditions where talkers are located behind or beside the listener. These technologies include (1) a directional microphone system that uses a backward-facing directivity pattern (Back-DIR processing), (2) a technology that transmits audio signals from the ear with the better signal-to-noise ratio (SNR) to the ear with the poorer SNR (Side-Transmission processing), and (3) a signal processing scheme that suppresses the noise at the ear with the poorer SNR (Side-Suppression processing). Purpose The purpose of the current study was to determine the effect of (1) conventional directional microphones and (2) newer signal processing schemes (Back-DIR, Side-Transmission, and Side-Suppression) on listener’s speech recognition performance and preference for communication in a traveling automobile. Research design A single-blinded, repeated-measures design was used. Study Sample Twenty-five adults with bilateral symmetrical sensorineural hearing loss aged 44 through 84 years participated in the study. Data Collection and Analysis The automobile/road noise and sentences of the Connected Speech Test (CST) were recorded through hearing aids in a standard van moving at a speed of 70 miles/hour on a paved highway. The hearing aids were programmed to omnidirectional microphone, conventional adaptive directional microphone, and the three newer schemes. CST sentences were presented from the side and back of the hearing aids, which were placed on the ears of a manikin. The recorded stimuli were presented to listeners via earphones in a sound treated booth to assess speech recognition performance and preference with each programmed condition. Results Compared to omnidirectional microphones, conventional adaptive directional processing had a detrimental effect on speech recognition when speech was presented from the back or side of the listener. Back-DIR and Side-Transmission processing improved speech recognition performance (relative to both omnidirectional and adaptive directional processing) when speech was from the back and side, respectively. The performance with Side-Suppression processing was better than with adaptive directional processing when speech was from the side. The participants’ preferences for a given processing scheme were generally consistent with speech recognition results. Conclusions The finding that performance with adaptive directional processing was poorer than with omnidirectional microphones demonstrates the importance of selecting the correct microphone technology for different listening situations. The results also suggest the feasibility of using hearing aid technologies to provide a better listening experience for hearing aid users in automobiles. PMID:23886425

  7. Automatic speech recognition using a predictive echo state network classifier.

    PubMed

    Skowronski, Mark D; Harris, John G

    2007-04-01

    We have combined an echo state network (ESN) with a competitive state machine framework to create a classification engine called the predictive ESN classifier. We derive the expressions for training the predictive ESN classifier and show that the model was significantly more noise robust compared to a hidden Markov model in noisy speech classification experiments by 8+/-1 dB signal-to-noise ratio. The simple training algorithm and noise robustness of the predictive ESN classifier make it an attractive classification engine for automatic speech recognition.

  8. A Mis-recognized Medical Vocabulary Correction System for Speech-based Electronic Medical Record

    PubMed Central

    Seo, Hwa Jeong; Kim, Ju Han; Sakabe, Nagamasa

    2002-01-01

    Speech recognition as an input tool for electronic medical record (EMR) enables efficient data entry at the point of care. However, the recognition accuracy for medical vocabulary is much poorer than that for doctor-patient dialogue. We developed a mis-recognized medical vocabulary correction system based on syllable-by-syllable comparison of speech text against medical vocabulary database. Using specialty medical vocabulary, the algorithm detects and corrects mis-recognized medical vocabularies in narrative text. Our preliminary evaluation showed 94% of accuracy in mis-recognized medical vocabulary correction.

  9. Robust Speaker Authentication Based on Combined Speech and Voiceprint Recognition

    NASA Astrophysics Data System (ADS)

    Malcangi, Mario

    2009-08-01

    Personal authentication is becoming increasingly important in many applications that have to protect proprietary data. Passwords and personal identification numbers (PINs) prove not to be robust enough to ensure that unauthorized people do not use them. Biometric authentication technology may offer a secure, convenient, accurate solution but sometimes fails due to its intrinsically fuzzy nature. This research aims to demonstrate that combining two basic speech processing methods, voiceprint identification and speech recognition, can provide a very high degree of robustness, especially if fuzzy decision logic is used.

  10. Technological evaluation of gesture and speech interfaces for enabling dismounted soldier-robot dialogue

    NASA Astrophysics Data System (ADS)

    Kattoju, Ravi Kiran; Barber, Daniel J.; Abich, Julian; Harris, Jonathan

    2016-05-01

    With increasing necessity for intuitive Soldier-robot communication in military operations and advancements in interactive technologies, autonomous robots have transitioned from assistance tools to functional and operational teammates able to service an array of military operations. Despite improvements in gesture and speech recognition technologies, their effectiveness in supporting Soldier-robot communication is still uncertain. The purpose of the present study was to evaluate the performance of gesture and speech interface technologies to facilitate Soldier-robot communication during a spatial-navigation task with an autonomous robot. Gesture and speech semantically based spatial-navigation commands leveraged existing lexicons for visual and verbal communication from the U.S Army field manual for visual signaling and a previously established Squad Level Vocabulary (SLV). Speech commands were recorded by a Lapel microphone and Microsoft Kinect, and classified by commercial off-the-shelf automatic speech recognition (ASR) software. Visual signals were captured and classified using a custom wireless gesture glove and software. Participants in the experiment commanded a robot to complete a simulated ISR mission in a scaled down urban scenario by delivering a sequence of gesture and speech commands, both individually and simultaneously, to the robot. Performance and reliability of gesture and speech hardware interfaces and recognition tools were analyzed and reported. Analysis of experimental results demonstrated the employed gesture technology has significant potential for enabling bidirectional Soldier-robot team dialogue based on the high classification accuracy and minimal training required to perform gesture commands.

  11. Accuracy of cochlear implant recipients in speech reception in the presence of background music.

    PubMed

    Gfeller, Kate; Turner, Christopher; Oleson, Jacob; Kliethermes, Stephanie; Driscoll, Virginia

    2012-12-01

    This study examined speech recognition abilities of cochlear implant (CI) recipients in the spectrally complex listening condition of 3 contrasting types of background music, and compared performance based upon listener groups: CI recipients using conventional long-electrode devices, Hybrid CI recipients (acoustic plus electric stimulation), and normal-hearing adults. We tested 154 long-electrode CI recipients using varied devices and strategies, 21 Hybrid CI recipients, and 49 normal-hearing adults on closed-set recognition of spondees presented in 3 contrasting forms of background music (piano solo, large symphony orchestra, vocal solo with small combo accompaniment) in an adaptive test. Signal-to-noise ratio thresholds for speech in music were examined in relation to measures of speech recognition in background noise and multitalker babble, pitch perception, and music experience. The signal-to-noise ratio thresholds for speech in music varied as a function of category of background music, group membership (long-electrode, Hybrid, normal-hearing), and age. The thresholds for speech in background music were significantly correlated with measures of pitch perception and thresholds for speech in background noise; auditory status was an important predictor. Evidence suggests that speech reception thresholds in background music change as a function of listener age (with more advanced age being detrimental), structural characteristics of different types of music, and hearing status (residual hearing). These findings have implications for everyday listening conditions such as communicating in social or commercial situations in which there is background music.

  12. Hidden Markov models in automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Wrzoskowicz, Adam

    1993-11-01

    This article describes a method for constructing an automatic speech recognition system based on hidden Markov models (HMMs). The author discusses the basic concepts of HMM theory and the application of these models to the analysis and recognition of speech signals. The author provides algorithms which make it possible to train the ASR system and recognize signals on the basis of distinct stochastic models of selected speech sound classes. The author describes the specific components of the system and the procedures used to model and recognize speech. The author discusses problems associated with the choice of optimal signal detection and parameterization characteristics and their effect on the performance of the system. The author presents different options for the choice of speech signal segments and their consequences for the ASR process. The author gives special attention to the use of lexical, syntactic, and semantic information for the purpose of improving the quality and efficiency of the system. The author also describes an ASR system developed by the Speech Acoustics Laboratory of the IBPT PAS. The author discusses the results of experiments on the effect of noise on the performance of the ASR system and describes methods of constructing HMM's designed to operate in a noisy environment. The author also describes a language for human-robot communications which was defined as a complex multilevel network from an HMM model of speech sounds geared towards Polish inflections. The author also added mandatory lexical and syntactic rules to the system for its communications vocabulary.

  13. Speech-on-speech masking with variable access to the linguistic content of the masker speech for native and non-native speakers of English

    PubMed Central

    Calandruccio, Lauren; Bradlow, Ann R.; Dhar, Sumitrajit

    2013-01-01

    Background Masking release for an English sentence-recognition task in the presence of foreign-accented English speech compared to native-accented English speech was reported in Calandruccio, Dhar and Bradlow (2010). The masking release appeared to increase as the masker intelligibility decreased. However, it could not be ruled out that spectral differences between the speech maskers were influencing the significant differences observed. Purpose The purpose of the current experiment was to minimize spectral differences between speech maskers to determine how various amounts of linguistic information within competing speech affect masking release. Research Design A mixed model design with within- (four two-talker speech maskers) and between-subject (listener group) factors was conducted. Speech maskers included native-accented English speech, and high-intelligibility, moderate-intelligibility and low-intelligibility Mandarin-accented English. Normalizing the long-term average speech spectra of the maskers to each other minimized spectral differences between the masker conditions. Study Sample Three listener groups were tested including monolingual English speakers with normal hearing, non-native speakers of English with normal hearing, and monolingual speakers of English with hearing loss. The non-native speakers of English were from various native-language backgrounds, not including Mandarin (or any other Chinese dialect). Listeners with hearing loss had symmetrical, mild sloping to moderate sensorineural hearing loss. Data Collection and Analysis Listeners were asked to repeat back sentences that were presented in the presence of four different two-talker speech maskers. Responses were scored based on the keywords within the sentences (100 keywords/masker condition). A mixed-model regression analysis was used to analyze the difference in performance scores between the masker conditions and the listener groups. Results Monolingual speakers of English with normal hearing benefited when the competing speech signal was foreign-accented compared to native-accented allowing for improved speech recognition. Various levels of intelligibility across the foreign-accented speech maskers did not influence results. Neither the non-native English listeners with normal hearing, nor the monolingual English speakers with hearing loss benefited from masking release when the masker was changed from native-accented to foreign-accented English. Conclusions Slight modifications between the target and the masker speech allowed monolingual speakers of English with normal hearing to improve their recognition of native-accented English even when the competing speech was highly intelligible. Further research is needed to determine which modifications within the competing speech signal caused the Mandarin-accented English to be less effective with respect to masking. Determining the influences within the competing speech that make it less effective as a masker, or determining why monolingual normal-hearing listeners can take advantage of these differences could help improve speech recognition for those with hearing loss in the future. PMID:25126683

  14. Fifty years of progress in acoustic phonetics

    NASA Astrophysics Data System (ADS)

    Stevens, Kenneth N.

    2004-10-01

    Three events that occurred 50 or 60 years ago shaped the study of acoustic phonetics, and in the following few decades these events influenced research and applications in speech disorders, speech development, speech synthesis, speech recognition, and other subareas in speech communication. These events were: (1) the source-filter theory of speech production (Chiba and Kajiyama; Fant); (2) the development of the sound spectrograph and its interpretation (Potter, Kopp, and Green; Joos); and (3) the birth of research that related distinctive features to acoustic patterns (Jakobson, Fant, and Halle). Following these events there has been systematic exploration of the articulatory, acoustic, and perceptual bases of phonological categories, and some quantification of the sources of variability in the transformation of this phonological representation of speech into its acoustic manifestations. This effort has been enhanced by studies of how children acquire language in spite of this variability and by research on speech disorders. Gaps in our knowledge of this inherent variability in speech have limited the directions of applications such as synthesis and recognition of speech, and have led to the implementation of data-driven techniques rather than theoretical principles. Some examples of advances in our knowledge, and limitations of this knowledge, are reviewed.

  15. Restoring the missing features of the corrupted speech using linear interpolation methods

    NASA Astrophysics Data System (ADS)

    Rassem, Taha H.; Makbol, Nasrin M.; Hasan, Ali Muttaleb; Zaki, Siti Syazni Mohd; Girija, P. N.

    2017-10-01

    One of the main challenges in the Automatic Speech Recognition (ASR) is the noise. The performance of the ASR system reduces significantly if the speech is corrupted by noise. In spectrogram representation of a speech signal, after deleting low Signal to Noise Ratio (SNR) elements, the incomplete spectrogram is obtained. In this case, the speech recognizer should make modifications to the spectrogram in order to restore the missing elements, which is one direction. In another direction, speech recognizer should be able to restore the missing elements due to deleting low SNR elements before performing the recognition. This is can be done using different spectrogram reconstruction methods. In this paper, the geometrical spectrogram reconstruction methods suggested by some researchers are implemented as a toolbox. In these geometrical reconstruction methods, the linear interpolation along time or frequency methods are used to predict the missing elements between adjacent observed elements in the spectrogram. Moreover, a new linear interpolation method using time and frequency together is presented. The CMU Sphinx III software is used in the experiments to test the performance of the linear interpolation reconstruction method. The experiments are done under different conditions such as different lengths of the window and different lengths of utterances. Speech corpus consists of 20 males and 20 females; each one has two different utterances are used in the experiments. As a result, 80% recognition accuracy is achieved with 25% SNR ratio.

  16. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It can enjoy a fast rate of data/text entry, small overall size, and can be lightweight. In addition, this design will free the hands and eyes of a suited crewmember. The system components and steps include beam forming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaption, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMMbased ASR components were developed. They can help real-time ASR system designers select proper tasks when in the face of constraints in computational resources.

  17. Driver compliance to take-over requests with different auditory outputs in conditional automation.

    PubMed

    Forster, Yannick; Naujoks, Frederik; Neukum, Alexandra; Huestegge, Lynn

    2017-12-01

    Conditionally automated driving (CAD) systems are expected to improve traffic safety. Whenever the CAD system exceeds its limit of operation, designers of the system need to ensure a safe and timely enough transition from automated to manual mode. An existing visual Human-Machine Interface (HMI) was supplemented by different auditory outputs. The present work compares the effects of different auditory outputs in form of (1) a generic warning tone and (2) additional semantic speech output on driver behavior for the announcement of an upcoming take-over request (TOR). We expect the information carried by means of speech output to lead to faster reactions and better subjective evaluations by the drivers compared to generic auditory output. To test this assumption, N=17 drivers completed two simulator drives, once with a generic warning tone ('Generic') and once with additional speech output ('Speech+generic'), while they were working on a non-driving related task (NDRT; i.e., reading a magazine). Each drive incorporated one transition from automated to manual mode when yellow secondary lanes emerged. Different reaction time measures, relevant for the take-over process, were assessed. Furthermore, drivers evaluated the complete HMI regarding usefulness, ease of use and perceived visual workload just after experiencing the take-over. They gave comparative ratings on usability and acceptance at the end of the experiment. Results revealed that reaction times, reflecting information processing time (i.e., hands on the steering wheel, termination of NDRT), were shorter for 'Speech+generic' compared to 'Generic' while reaction time, reflecting allocation of attention (i.e., first glance ahead), did not show this difference. Subjective ratings were in favor of the system with additional speech output. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. Analyzing crowdsourced ratings of speech-based take-over requests for automated driving.

    PubMed

    Bazilinskyy, P; de Winter, J C F

    2017-10-01

    Take-over requests in automated driving should fit the urgency of the traffic situation. The robustness of various published research findings on the valuations of speech-based warning messages is unclear. This research aimed to establish how people value speech-based take-over requests as a function of speech rate, background noise, spoken phrase, and speaker's gender and emotional tone. By means of crowdsourcing, 2669 participants from 95 countries listened to a random 10 out of 140 take-over requests, and rated each take-over request on urgency, commandingness, pleasantness, and ease of understanding. Our results replicate several published findings, in particular that an increase in speech rate results in a monotonic increase of perceived urgency. The female voice was easier to understand than a male voice when there was a high level of background noise, a finding that contradicts the literature. Moreover, a take-over request spoken with Indian accent was found to be easier to understand by participants from India than by participants from other countries. Our results replicate effects in the literature regarding speech-based warnings, and shed new light on effects of background noise, gender, and nationality. The results may have implications for the selection of appropriate take-over requests in automated driving. Additionally, our study demonstrates the promise of crowdsourcing for testing human factors and ergonomics theories with large sample sizes. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Evaluation of Mandarin Chinese Speech Recognition in Adults with Cochlear Implants Using the Spectral Ripple Discrimination Test

    PubMed Central

    Dai, Chuanfu; Zhao, Zeqi; Zhang, Duo; Lei, Guanxiong

    2018-01-01

    Background The aim of this study was to explore the value of the spectral ripple discrimination test in speech recognition evaluation among a deaf (post-lingual) Mandarin-speaking population in China following cochlear implantation. Material/Methods The study included 23 Mandarin-speaking adult subjects with normal hearing (normal-hearing group) and 17 deaf adults who were former Mandarin-speakers, with cochlear implants (cochlear implantation group). The normal-hearing subjects were divided into men (n=10) and women (n=13). The spectral ripple discrimination thresholds between the groups were compared. The correlation between spectral ripple discrimination thresholds and Mandarin speech recognition rates in the cochlear implantation group were studied. Results Spectral ripple discrimination thresholds did not correlate with age (r=−0.19; p=0.22), and there was no significant difference in spectral ripple discrimination thresholds between the male and female groups (p=0.654). Spectral ripple discrimination thresholds of deaf adults with cochlear implants were significantly correlated with monosyllabic recognition rates (r=0.84; p=0.000). Conclusions In a Mandarin Chinese speaking population, spectral ripple discrimination thresholds of normal-hearing individuals were unaffected by both gender and age. Spectral ripple discrimination thresholds were correlated with Mandarin monosyllabic recognition rates of Mandarin-speaking in post-lingual deaf adults with cochlear implants. The spectral ripple discrimination test is a promising method for speech recognition evaluation in adults following cochlear implantation in China. PMID:29806954

  20. Evaluation of Mandarin Chinese Speech Recognition in Adults with Cochlear Implants Using the Spectral Ripple Discrimination Test.

    PubMed

    Dai, Chuanfu; Zhao, Zeqi; Shen, Weidong; Zhang, Duo; Lei, Guanxiong; Qiao, Yuehua; Yang, Shiming

    2018-05-28

    BACKGROUND The aim of this study was to explore the value of the spectral ripple discrimination test in speech recognition evaluation among a deaf (post-lingual) Mandarin-speaking population in China following cochlear implantation. MATERIAL AND METHODS The study included 23 Mandarin-speaking adult subjects with normal hearing (normal-hearing group) and 17 deaf adults who were former Mandarin-speakers, with cochlear implants (cochlear implantation group). The normal-hearing subjects were divided into men (n=10) and women (n=13). The spectral ripple discrimination thresholds between the groups were compared. The correlation between spectral ripple discrimination thresholds and Mandarin speech recognition rates in the cochlear implantation group were studied. RESULTS Spectral ripple discrimination thresholds did not correlate with age (r=-0.19; p=0.22), and there was no significant difference in spectral ripple discrimination thresholds between the male and female groups (p=0.654). Spectral ripple discrimination thresholds of deaf adults with cochlear implants were significantly correlated with monosyllabic recognition rates (r=0.84; p=0.000). CONCLUSIONS In a Mandarin Chinese speaking population, spectral ripple discrimination thresholds of normal-hearing individuals were unaffected by both gender and age. Spectral ripple discrimination thresholds were correlated with Mandarin monosyllabic recognition rates of Mandarin-speaking in post-lingual deaf adults with cochlear implants. The spectral ripple discrimination test is a promising method for speech recognition evaluation in adults following cochlear implantation in China.

  1. Music Training Can Improve Music and Speech Perception in Pediatric Mandarin-Speaking Cochlear Implant Users.

    PubMed

    Cheng, Xiaoting; Liu, Yangwenyi; Shu, Yilai; Tao, Duo-Duo; Wang, Bing; Yuan, Yasheng; Galvin, John J; Fu, Qian-Jie; Chen, Bing

    2018-01-01

    Due to limited spectral resolution, cochlear implants (CIs) do not convey pitch information very well. Pitch cues are important for perception of music and tonal language; it is possible that music training may improve performance in both listening tasks. In this study, we investigated music training outcomes in terms of perception of music, lexical tones, and sentences in 22 young (4.8 to 9.3 years old), prelingually deaf Mandarin-speaking CI users. Music perception was measured using a melodic contour identification (MCI) task. Speech perception was measured for lexical tones and sentences presented in quiet. Subjects received 8 weeks of MCI training using pitch ranges not used for testing. Music and speech perception were measured at 2, 4, and 8 weeks after training was begun; follow-up measures were made 4 weeks after training was stopped. Mean baseline performance was 33.2%, 76.9%, and 45.8% correct for MCI, lexical tone recognition, and sentence recognition, respectively. After 8 weeks of MCI training, mean performance significantly improved by 22.9, 14.4, and 14.5 percentage points for MCI, lexical tone recognition, and sentence recognition, respectively ( p < .05 in all cases). Four weeks after training was stopped, there was no significant change in posttraining music and speech performance. The results suggest that music training can significantly improve pediatric Mandarin-speaking CI users' music and speech perception.

  2. A multi-views multi-learners approach towards dysarthric speech recognition using multi-nets artificial neural networks.

    PubMed

    Shahamiri, Seyed Reza; Salim, Siti Salwah Binti

    2014-09-01

    Automatic speech recognition (ASR) can be very helpful for speakers who suffer from dysarthria, a neurological disability that damages the control of motor speech articulators. Although a few attempts have been made to apply ASR technologies to sufferers of dysarthria, previous studies show that such ASR systems have not attained an adequate level of performance. In this study, a dysarthric multi-networks speech recognizer (DM-NSR) model is provided using a realization of multi-views multi-learners approach called multi-nets artificial neural networks, which tolerates variability of dysarthric speech. In particular, the DM-NSR model employs several ANNs (as learners) to approximate the likelihood of ASR vocabulary words and to deal with the complexity of dysarthric speech. The proposed DM-NSR approach was presented as both speaker-dependent and speaker-independent paradigms. In order to highlight the performance of the proposed model over legacy models, multi-views single-learner models of the DM-NSRs were also provided and their efficiencies were compared in detail. Moreover, a comparison among the prominent dysarthric ASR methods and the proposed one is provided. The results show that the DM-NSR recorded improved recognition rate by up to 24.67% and the error rate was reduced by up to 8.63% over the reference model.

  3. Semantic and phonetic enhancements for speech-in-noise recognition by native and non-native listeners.

    PubMed

    Bradlow, Ann R; Alexander, Jennifer A

    2007-04-01

    Previous research has shown that speech recognition differences between native and proficient non-native listeners emerge under suboptimal conditions. Current evidence has suggested that the key deficit that underlies this disproportionate effect of unfavorable listening conditions for non-native listeners is their less effective use of compensatory information at higher levels of processing to recover from information loss at the phoneme identification level. The present study investigated whether this non-native disadvantage could be overcome if enhancements at various levels of processing were presented in combination. Native and non-native listeners were presented with English sentences in which the final word varied in predictability and which were produced in either plain or clear speech. Results showed that, relative to the low-predictability-plain-speech baseline condition, non-native listener final word recognition improved only when both semantic and acoustic enhancements were available (high-predictability-clear-speech). In contrast, the native listeners benefited from each source of enhancement separately and in combination. These results suggests that native and non-native listeners apply similar strategies for speech-in-noise perception: The crucial difference is in the signal clarity required for contextual information to be effective, rather than in an inability of non-native listeners to take advantage of this contextual information per se.

  4. Investigating an Application of Speech-to-Text Recognition: A Study on Visual Attention and Learning Behaviour

    ERIC Educational Resources Information Center

    Huang, Y-M.; Liu, C-J.; Shadiev, Rustam; Shen, M-H.; Hwang, W-Y.

    2015-01-01

    One major drawback of previous research on speech-to-text recognition (STR) is that most findings showing the effectiveness of STR for learning were based upon subjective evidence. Very few studies have used eye-tracking techniques to investigate visual attention of students on STR-generated text. Furthermore, not much attention was paid to…

  5. The Compensatory Effectiveness of Optical Character Recognition/Speech Synthesis on Reading Comprehension of Postsecondary Students with Learning Disabilities.

    ERIC Educational Resources Information Center

    Higgins, Eleanor L.; Raskind, Marshall H.

    1997-01-01

    Thirty-seven college students with learning disabilities were given a reading comprehension task under the following conditions: (1) using an optical character recognition/speech synthesis system; (2) having the text read aloud by a human reader; or (3) reading silently without assistance. Findings indicated that the greater the disability, the…

  6. The Use of an Autonomous Pedagogical Agent and Automatic Speech Recognition for Teaching Sight Words to Students with Autism Spectrum Disorder

    ERIC Educational Resources Information Center

    Saadatzi, Mohammad Nasser; Pennington, Robert C.; Welch, Karla C.; Graham, James H.; Scott, Renee E.

    2017-01-01

    In the current study, we examined the effects of an instructional package comprised of an autonomous pedagogical agent, automatic speech recognition, and constant time delay during the instruction of reading sight words aloud to young adults with autism spectrum disorder. We used a concurrent multiple baseline across participants design to…

  7. Review of Speech-to-Text Recognition Technology for Enhancing Learning

    ERIC Educational Resources Information Center

    Shadiev, Rustam; Hwang, Wu-Yuin; Chen, Nian-Shing; Huang, Yueh-Min

    2014-01-01

    This paper reviewed literature from 1999 to 2014 inclusively on how Speech-to-Text Recognition (STR) technology has been applied to enhance learning. The first aim of this review is to understand how STR technology has been used to support learning over the past fifteen years, and the second is to analyze all research evidence to understand how…

  8. A Freely-Available Authoring System for Browser-Based CALL with Speech Recognition

    ERIC Educational Resources Information Center

    O'Brien, Myles

    2017-01-01

    A system for authoring browser-based CALL material incorporating Google speech recognition has been developed and made freely available for download. The system provides a teacher with a simple way to set up CALL material, including an optional image, sound or video, which will elicit spoken (and/or typed) answers from the user and check them…

  9. Computer-Mediated Input, Output and Feedback in the Development of L2 Word Recognition from Speech

    ERIC Educational Resources Information Center

    Matthews, Joshua; Cheng, Junyu; O'Toole, John Mitchell

    2015-01-01

    This paper reports on the impact of computer-mediated input, output and feedback on the development of second language (L2) word recognition from speech (WRS). A quasi-experimental pre-test/treatment/post-test research design was used involving three intact tertiary level English as a Second Language (ESL) classes. Classes were either assigned to…

  10. Applications of Speech-to-Text Recognition and Computer-Aided Translation for Facilitating Cross-Cultural Learning through a Learning Activity: Issues and Their Solutions

    ERIC Educational Resources Information Center

    Shadiev, Rustam; Wu, Ting-Ting; Sun, Ai; Huang, Yueh-Min

    2018-01-01

    In this study, 21 university students, who represented thirteen nationalities, participated in an online cross-cultural learning activity. The participants were engaged in interactions and exchanges carried out on Facebook® and Skype® platforms, and their multilingual communications were supported by speech-to-text recognition (STR) and…

  11. Investigating the Effectiveness of Speech-To-Text Recognition Applications on Learning Performance, Attention, and Meditation

    ERIC Educational Resources Information Center

    Shadiev, Rustam; Huang, Yueh-Min; Hwang, Jan-Pan

    2017-01-01

    In this study, the effectiveness of the application of speech-to-text recognition (STR) technology on enhancing learning and concentration in a calm state of mind, hereafter referred to as meditation (An intentional and self-regulated focusing of attention in order to relax and calm the mind), was investigated. This effectiveness was further…

  12. Non-native Listeners’ Recognition of High-Variability Speech Using PRESTO

    PubMed Central

    Tamati, Terrin N.; Pisoni, David B.

    2015-01-01

    Background Natural variability in speech is a significant challenge to robust successful spoken word recognition. In everyday listening environments, listeners must quickly adapt and adjust to multiple sources of variability in both the signal and listening environments. High-variability speech may be particularly difficult to understand for non-native listeners, who have less experience with the second language (L2) phonological system and less detailed knowledge of sociolinguistic variation of the L2. Purpose The purpose of this study was to investigate the effects of high-variability sentences on non-native speech recognition and to explore the underlying sources of individual differences in speech recognition abilities of non-native listeners. Research Design Participants completed two sentence recognition tasks involving high-variability and low-variability sentences. They also completed a battery of behavioral tasks and self-report questionnaires designed to assess their indexical processing skills, vocabulary knowledge, and several core neurocognitive abilities. Study Sample Native speakers of Mandarin (n = 25) living in the United States recruited from the Indiana University community participated in the current study. A native comparison group consisted of scores obtained from native speakers of English (n = 21) in the Indiana University community taken from an earlier study. Data Collection and Analysis Speech recognition in high-variability listening conditions was assessed with a sentence recognition task using sentences from PRESTO (Perceptually Robust English Sentence Test Open-Set) mixed in 6-talker multitalker babble. Speech recognition in low-variability listening conditions was assessed using sentences from HINT (Hearing In Noise Test) mixed in 6-talker multitalker babble. Indexical processing skills were measured using a talker discrimination task, a gender discrimination task, and a forced-choice regional dialect categorization task. Vocabulary knowledge was assessed with the WordFam word familiarity test, and executive functioning was assessed with the BRIEF-A (Behavioral Rating Inventory of Executive Function – Adult Version) self-report questionnaire. Scores from the non-native listeners on behavioral tasks and self-report questionnaires were compared with scores obtained from native listeners tested in a previous study and were examined for individual differences. Results Non-native keyword recognition scores were significantly lower on PRESTO sentences than on HINT sentences. Non-native listeners’ keyword recognition scores were also lower than native listeners’ scores on both sentence recognition tasks. Differences in performance on the sentence recognition tasks between non-native and native listeners were larger on PRESTO than on HINT, although group differences varied by signal-to-noise ratio. The non-native and native groups also differed in the ability to categorize talkers by region of origin and in vocabulary knowledge. Individual non-native word recognition accuracy on PRESTO sentences in multitalker babble at more favorable signal-to-noise ratios was found to be related to several BRIEF-A subscales and composite scores. However, non-native performance on PRESTO was not related to regional dialect categorization, talker and gender discrimination, or vocabulary knowledge. Conclusions High-variability sentences in multitalker babble were particularly challenging for non-native listeners. Difficulty under high-variability testing conditions was related to lack of experience with the L2, especially L2 sociolinguistic information, compared with native listeners. Individual differences among the non-native listeners were related to weaknesses in core neurocognitive abilities affecting behavioral control in everyday life. PMID:25405842

  13. Perception of Filtered Speech by Children with Developmental Dyslexia and Children with Specific Language Impairments

    PubMed Central

    Goswami, Usha; Cumming, Ruth; Chait, Maria; Huss, Martina; Mead, Natasha; Wilson, Angela M.; Barnes, Lisa; Fosker, Tim

    2016-01-01

    Here we use two filtered speech tasks to investigate children’s processing of slow (<4 Hz) versus faster (∼33 Hz) temporal modulations in speech. We compare groups of children with either developmental dyslexia (Experiment 1) or speech and language impairments (SLIs, Experiment 2) to groups of typically-developing (TD) children age-matched to each disorder group. Ten nursery rhymes were filtered so that their modulation frequencies were either low-pass filtered (<4 Hz) or band-pass filtered (22 – 40 Hz). Recognition of the filtered nursery rhymes was tested in a picture recognition multiple choice paradigm. Children with dyslexia aged 10 years showed equivalent recognition overall to TD controls for both the low-pass and band-pass filtered stimuli, but showed significantly impaired acoustic learning during the experiment from low-pass filtered targets. Children with oral SLIs aged 9 years showed significantly poorer recognition of band pass filtered targets compared to their TD controls, and showed comparable acoustic learning effects to TD children during the experiment. The SLI samples were also divided into children with and without phonological difficulties. The children with both SLI and phonological difficulties were impaired in recognizing both kinds of filtered speech. These data are suggestive of impaired temporal sampling of the speech signal at different modulation rates by children with different kinds of developmental language disorder. Both SLI and dyslexic samples showed impaired discrimination of amplitude rise times. Implications of these findings for a temporal sampling framework for understanding developmental language disorders are discussed. PMID:27303348

  14. Recognizing speech under a processing load: dissociating energetic from informational factors.

    PubMed

    Mattys, Sven L; Brooks, Joanna; Cooke, Martin

    2009-11-01

    Effects of perceptual and cognitive loads on spoken-word recognition have so far largely escaped investigation. This study lays the foundations of a psycholinguistic approach to speech recognition in adverse conditions that draws upon the distinction between energetic masking, i.e., listening environments leading to signal degradation, and informational masking, i.e., listening environments leading to depletion of higher-order, domain-general processing resources, independent of signal degradation. We show that severe energetic masking, such as that produced by background speech or noise, curtails reliance on lexical-semantic knowledge and increases relative reliance on salient acoustic detail. In contrast, informational masking, induced by a resource-depleting competing task (divided attention or a memory load), results in the opposite pattern. Based on this clear dissociation, we propose a model of speech recognition that addresses not only the mapping between sensory input and lexical representations, as traditionally advocated, but also the way in which this mapping interfaces with general cognition and non-linguistic processes.

  15. Audiovisual speech perception development at varying levels of perceptual processing

    PubMed Central

    Lalonde, Kaylah; Holt, Rachael Frush

    2016-01-01

    This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the level of perceptual processing required to complete them. Adults and children demonstrated visual speech influence at all levels of perceptual processing. Whereas children demonstrated the same visual speech influence at each level of perceptual processing, adults demonstrated greater visual speech influence on tasks requiring higher levels of perceptual processing. These results support previous research demonstrating multiple mechanisms of AV speech processing (general perceptual and speech-specific mechanisms) with independent maturational time courses. The results suggest that adults rely on both general perceptual mechanisms that apply to all levels of perceptual processing and speech-specific mechanisms that apply when making phonetic decisions and/or accessing the lexicon. Six- to eight-year-old children seem to rely only on general perceptual mechanisms across levels. As expected, developmental differences in AV benefit on this and other recognition tasks likely reflect immature speech-specific mechanisms and phonetic processing in children. PMID:27106318

  16. Audiovisual speech perception development at varying levels of perceptual processing.

    PubMed

    Lalonde, Kaylah; Holt, Rachael Frush

    2016-04-01

    This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the level of perceptual processing required to complete them. Adults and children demonstrated visual speech influence at all levels of perceptual processing. Whereas children demonstrated the same visual speech influence at each level of perceptual processing, adults demonstrated greater visual speech influence on tasks requiring higher levels of perceptual processing. These results support previous research demonstrating multiple mechanisms of AV speech processing (general perceptual and speech-specific mechanisms) with independent maturational time courses. The results suggest that adults rely on both general perceptual mechanisms that apply to all levels of perceptual processing and speech-specific mechanisms that apply when making phonetic decisions and/or accessing the lexicon. Six- to eight-year-old children seem to rely only on general perceptual mechanisms across levels. As expected, developmental differences in AV benefit on this and other recognition tasks likely reflect immature speech-specific mechanisms and phonetic processing in children.

  17. Do Older Listeners With Hearing Loss Benefit From Dynamic Pitch for Speech Recognition in Noise?

    PubMed

    Shen, Jing; Souza, Pamela E

    2017-10-12

    Dynamic pitch, the variation in the fundamental frequency of speech, aids older listeners' speech perception in noise. It is unclear, however, whether some older listeners with hearing loss benefit from strengthened dynamic pitch cues for recognizing speech in certain noise scenarios and how this relative benefit may be associated with individual factors. We first examined older individuals' relative benefit between natural and strong dynamic pitches for better speech recognition in noise. Further, we reported the individual factors of the 2 groups of listeners who benefit differently from natural and strong dynamic pitches. Speech reception thresholds of 13 older listeners with mild-moderate hearing loss were measured using target speech with 3 levels of dynamic pitch strength. Individuals' ability to benefit from dynamic pitch was defined as the speech reception threshold difference between speeches with and without dynamic pitch cues. The relative benefit of natural versus strong dynamic pitch varied across individuals. However, this relative benefit remained consistent for the same individuals across those background noises with temporal modulation. Those listeners who benefited more from strong dynamic pitch reported better subjective speech perception abilities. Strong dynamic pitch may be more beneficial than natural dynamic pitch for some older listeners to recognize speech better in noise, particularly when the noise has temporal modulation.

  18. Approximated mutual information training for speech recognition using myoelectric signals.

    PubMed

    Guo, Hua J; Chan, A D C

    2006-01-01

    A new training algorithm called the approximated maximum mutual information (AMMI) is proposed to improve the accuracy of myoelectric speech recognition using hidden Markov models (HMMs). Previous studies have demonstrated that automatic speech recognition can be performed using myoelectric signals from articulatory muscles of the face. Classification of facial myoelectric signals can be performed using HMMs that are trained using the maximum likelihood (ML) algorithm; however, this algorithm maximizes the likelihood of the observations in the training sequence, which is not directly associated with optimal classification accuracy. The AMMI training algorithm attempts to maximize the mutual information, thereby training the HMMs to optimize their parameters for discrimination. Our results show that AMMI training consistently reduces the error rates compared to these by the ML training, increasing the accuracy by approximately 3% on average.

  19. The Cortical Organization of Speech Processing: Feedback Control and Predictive Coding the Context of a Dual-Stream Model

    ERIC Educational Resources Information Center

    Hickok, Gregory

    2012-01-01

    Speech recognition is an active process that involves some form of predictive coding. This statement is relatively uncontroversial. What is less clear is the source of the prediction. The dual-stream model of speech processing suggests that there are two possible sources of predictive coding in speech perception: the motor speech system and the…

  20. STANFORD ARTIFICIAL INTELLIGENCE PROJECT.

    DTIC Science & Technology

    ARTIFICIAL INTELLIGENCE , GAME THEORY, DECISION MAKING, BIONICS, AUTOMATA, SPEECH RECOGNITION, GEOMETRIC FORMS, LEARNING MACHINES, MATHEMATICAL MODELS, PATTERN RECOGNITION, SERVOMECHANISMS, SIMULATION, BIBLIOGRAPHIES.

  1. Effects of WDRC release time and number of channels on output SNR and speech recognition

    PubMed Central

    Alexander, Joshua M.; Masterson, Katie

    2014-01-01

    Objectives The purpose of this study was to investigate the joint effects that wide dynamic range compression (WDRC) release time (RT) and number of channels have on recognition of sentences in the presence of steady and modulated maskers at different signal-to-noise ratios (SNRs). How the different combinations of WDRC parameters affect output SNR and the role this plays in the observed findings was also investigated. Design Twenty-four listeners with mild to moderate sensorineural hearing loss identified sentences mixed with steady or modulated maskers at 3 SNRs (−5, 0, +5 dB) that had been processed using a hearing aid simulator with 6 combinations of RT (40 and 640 ms) and number of channels (4, 8, and 16). Compression parameters were set using the Desired Sensation Level v5.0a prescriptive fitting method. For each condition, amplified speech and masker levels and the resultant long-term output SNR were measured. Results Speech recognition with WDRC depended on the combination of RT and number of channels, with the greatest effects observed at 0 dB input SNR, in which mean speech recognition scores varied by 10–12% across WDRC manipulations. Overall, effect sizes were generally small. Across both masker types and the three SNRs tested, the best speech recognition was obtained with 8 channels, regardless of RT. Increased speech levels, which favor audibility, were associated with the short RT and with an increase in the number of channels. These same conditions also increased masker levels by an even greater amount, for a net decrease in the long-term output SNR. Changes in long-term SNR across WDRC conditions were found to be strongly associated with changes in the temporal envelope shape as quantified by the Envelope Difference Index, however, neither of these factors fully explained the observed differences in speech recognition. Conclusions A primary finding of this study was that the number of channels had a modest effect when analyzed at each level of RT, with results suggesting that selecting 8 channels for a given RT might be the safest choice. Effects were smaller for RT, with results suggesting that short RT was slightly better when only 4 channels were used and that long RT was better when 16 channels were used. Individual differences in how listeners were influenced by audibility, output SNR, temporal distortion, and spectral distortion may have contributed to the size of the effects found in this study. Because only general suppositions could made for how each of these factors may have influenced the overall results of this study, future research would benefit from exploring the predictive value of these and other factors in selecting the processing parameters that maximize speech recognition for individuals. PMID:25470368

  2. Predicting Intelligibility Gains in Dysarthria through Automated Speech Feature Analysis

    ERIC Educational Resources Information Center

    Fletcher, Annalise R.; Wisler, Alan A.; McAuliffe, Megan J.; Lansford, Kaitlin L.; Liss, Julie M.

    2017-01-01

    Purpose: Behavioral speech modifications have variable effects on the intelligibility of speakers with dysarthria. In the companion article, a significant relationship was found between measures of speakers' baseline speech and their intelligibility gains following cues to speak louder and reduce rate (Fletcher, McAuliffe, Lansford, Sinex, &…

  3. Improving medical imaging report turnaround times: the role of technolgy.

    PubMed

    Marquez, Luis O; Stewart, Howard

    2005-01-01

    At Southern Ohio Medical Center (SOMC), the medical imaging department and the radiologists expressed a strong desire to improve workflow. The improved workflow was a major motivating factor toward implementing a new RIS and speech recognition technology. The need to monitor workflow in a real-time fashion and to evaluate productivity and resources necessitated that a new solution be found. A decision was made to roll out both the new RIS product and speech recognition to maximize the resources to interface and implement the new solution. Prior to implementation of the new RIS, the medical imaging department operated in a conventional electronic-order-entry to paper request manner. The paper request followed the study through exam completion to the radiologist. SOMC entered into a contract with its PACS vendor to participate in beta testing and clinical trials for a new RIS product for the US market. Backup plans were created in the event the product failed to function as planned--either during the beta testing period or during clinical trails. The last piece of the technology puzzle to improve report turnaround time was voice recognition technology. Speech recognition enhanced the RIS technology as soon as it was implemented. The results show that the project has been a success. The new RIS, combined with speech recognition and the PACS, makes for a very effective solution to patient, exam, and results management in the medical imaging department.

  4. The Downside of Greater Lexical Influences: Selectively Poorer Speech Perception in Noise

    PubMed Central

    Xie, Zilong; Tessmer, Rachel; Chandrasekaran, Bharath

    2017-01-01

    Purpose Although lexical information influences phoneme perception, the extent to which reliance on lexical information enhances speech processing in challenging listening environments is unclear. We examined the extent to which individual differences in lexical influences on phonemic processing impact speech processing in maskers containing varying degrees of linguistic information (2-talker babble or pink noise). Method Twenty-nine monolingual English speakers were instructed to ignore the lexical status of spoken syllables (e.g., gift vs. kift) and to only categorize the initial phonemes (/g/ vs. /k/). The same participants then performed speech recognition tasks in the presence of 2-talker babble or pink noise in audio-only and audiovisual conditions. Results Individuals who demonstrated greater lexical influences on phonemic processing experienced greater speech processing difficulties in 2-talker babble than in pink noise. These selective difficulties were present across audio-only and audiovisual conditions. Conclusion Individuals with greater reliance on lexical processes during speech perception exhibit impaired speech recognition in listening conditions in which competing talkers introduce audible linguistic interferences. Future studies should examine the locus of lexical influences/interferences on phonemic processing and speech-in-speech processing. PMID:28586824

  5. Does it really matter whether students' contributions are spoken versus typed in an intelligent tutoring system with natural language?

    PubMed

    D'Mello, Sidney K; Dowell, Nia; Graesser, Arthur

    2011-03-01

    There is the question of whether learning differs when students speak versus type their responses when interacting with intelligent tutoring systems with natural language dialogues. Theoretical bases exist for three contrasting hypotheses. The speech facilitation hypothesis predicts that spoken input will increase learning, whereas the text facilitation hypothesis predicts typed input will be superior. The modality equivalence hypothesis claims that learning gains will be equivalent. Previous experiments that tested these hypotheses were confounded by automated speech recognition systems with substantial error rates that were detected by learners. We addressed this concern in two experiments via a Wizard of Oz procedure, where a human intercepted the learner's speech and transcribed the utterances before submitting them to the tutor. The overall pattern of the results supported the following conclusions: (1) learning gains associated with spoken and typed input were on par and quantitatively higher than a no-intervention control, (2) participants' evaluations of the session were not influenced by modality, and (3) there were no modality effects associated with differences in prior knowledge and typing proficiency. Although the results generally support the modality equivalence hypothesis, highly motivated learners reported lower cognitive load and demonstrated increased learning when typing compared with speaking. We discuss the implications of our findings for intelligent tutoring systems that can support typed and spoken input.

  6. Computational validation of the motor contribution to speech perception.

    PubMed

    Badino, Leonardo; D'Ausilio, Alessandro; Fadiga, Luciano; Metta, Giorgio

    2014-07-01

    Action perception and recognition are core abilities fundamental for human social interaction. A parieto-frontal network (the mirror neuron system) matches visually presented biological motion information onto observers' motor representations. This process of matching the actions of others onto our own sensorimotor repertoire is thought to be important for action recognition, providing a non-mediated "motor perception" based on a bidirectional flow of information along the mirror parieto-frontal circuits. State-of-the-art machine learning strategies for hand action identification have shown better performances when sensorimotor data, as opposed to visual information only, are available during learning. As speech is a particular type of action (with acoustic targets), it is expected to activate a mirror neuron mechanism. Indeed, in speech perception, motor centers have been shown to be causally involved in the discrimination of speech sounds. In this paper, we review recent neurophysiological and machine learning-based studies showing (a) the specific contribution of the motor system to speech perception and (b) that automatic phone recognition is significantly improved when motor data are used during training of classifiers (as opposed to learning from purely auditory data). Copyright © 2014 Cognitive Science Society, Inc.

  7. An automated approach for annual layer counting in ice cores

    NASA Astrophysics Data System (ADS)

    Winstrup, M.; Svensson, A.; Rasmussen, S. O.; Winther, O.; Steig, E.; Axelrod, A.

    2012-04-01

    The temporal resolution of some ice cores is sufficient to preserve seasonal information in the ice core record. In such cases, annual layer counting represents one of the most accurate methods to produce a chronology for the core. Yet, manual layer counting is a tedious and sometimes ambiguous job. As reliable layer recognition becomes more difficult, a manual approach increasingly relies on human interpretation of the available data. Thus, much may be gained by an automated and therefore objective approach for annual layer identification in ice cores. We have developed a novel method for automated annual layer counting in ice cores, which relies on Bayesian statistics. It uses algorithms from the statistical framework of Hidden Markov Models (HMM), originally developed for use in machine speech recognition. The strength of this layer detection algorithm lies in the way it is able to imitate the manual procedures for annual layer counting, while being based on purely objective criteria for annual layer identification. With this methodology, it is possible to determine the most likely position of multiple layer boundaries in an entire section of ice core data at once. It provides a probabilistic uncertainty estimate of the resulting layer count, hence ensuring a proper treatment of ambiguous layer boundaries in the data. Furthermore multiple data series can be incorporated to be used at once, hence allowing for a full multi-parameter annual layer counting method similar to a manual approach. In this study, the automated layer counting algorithm has been applied to data from the NGRIP ice core, Greenland. The NGRIP ice core has very high temporal resolution with depth, and hence the potential to be dated by annual layer counting far back in time. In previous studies [Andersen et al., 2006; Svensson et al., 2008], manual layer counting has been carried out back to 60 kyr BP. A comparison between the counted annual layers based on the two approaches will be presented and their differences discussed. Within the estimated uncertainties, the two methodologies agree. This shows the potential for a fully automated annual layer counting method to be operational for data sections where the annual layering is unknown.

  8. Automated Desensitization for the Clinical Treatment of Speech Anxiety

    ERIC Educational Resources Information Center

    McManus, Marianne; Lohr, James

    1976-01-01

    A self-guided, audio-tape, desensitization treatment procedure, using standard cassette recorders in a counseling service office, is an effective means for modifying self-report of speech anxiety. (MB)

  9. Effects of Familiarity and Feeding on Newborn Speech-Voice Recognition

    ERIC Educational Resources Information Center

    Valiante, A. Grace; Barr, Ronald G.; Zelazo, Philip R.; Brant, Rollin; Young, Simon N.

    2013-01-01

    Newborn infants preferentially orient to familiar over unfamiliar speech sounds. They are also better at remembering unfamiliar speech sounds for short periods of time if learning and retention occur after a feed than before. It is unknown whether short-term memory for speech is enhanced when the sound is familiar (versus unfamiliar) and, if so,…

  10. Speech Recognition Scores of White and Black Student-Teacher Listeners for Black and White First Grade Speakers. Final Technical Report.

    ERIC Educational Resources Information Center

    Nober, E. Harris; Seymour, Harry N.

    In order to investigate the possible consequences of dialectical differences in the classroom setting relative to the low income black and white first grade child and the prospective white middle-class teacher, 25 black and 25 white university listeners yielded speech recognition scores for 48 black and 48 white five-year-old urban school-children…

  11. Auditory models for speech analysis

    NASA Astrophysics Data System (ADS)

    Maybury, Mark T.

    This paper reviews the psychophysical basis for auditory models and discusses their application to automatic speech recognition. First an overview of the human auditory system is presented, followed by a review of current knowledge gleaned from neurological and psychoacoustic experimentation. Next, a general framework describes established peripheral auditory models which are based on well-understood properties of the peripheral auditory system. This is followed by a discussion of current enhancements to that models to include nonlinearities and synchrony information as well as other higher auditory functions. Finally, the initial performance of auditory models in the task of speech recognition is examined and additional applications are mentioned.

  12. The Benefit of Remote Microphones Using Four Wireless Protocols.

    PubMed

    Rodemerk, Krishna S; Galster, Jason A

    2015-09-01

    Many studies have reported the speech recognition benefits of a personal remote microphone system when used by adult listeners with hearing loss. The advance of wireless technology has allowed for many wireless audio transmission protocols. Some of these protocols interface with commercially available hearing aids. As a result, commercial remote microphone systems use a variety of different protocols for wireless audio transmission. It is not known how these systems compare, with regard to adult speech recognition in noise. The primary goal of this investigation was to determine the speech recognition benefits of four different commercially available remote microphone systems, each with a different wireless audio transmission protocol. A repeated-measures design was used in this study. Sixteen adults, ages 52 to 81 yr, with mild to severe sensorineural hearing loss participated in this study. Participants were fit with three different sets of bilateral hearing aids and four commercially available remote microphone systems (FM, 900 MHz, 2.4 GHz, and Bluetooth(®) paired with near-field magnetic induction). Speech recognition scores were measured by an adaptive version of the Hearing in Noise Test (HINT). The participants were seated both 6 and 12' away from the talker loudspeaker. Participants repeated HINT sentences with and without hearing aids and with four commercially available remote microphone systems in both seated positions with and without contributions from the hearing aid or environmental microphone (24 total conditions). The HINT SNR-50, or the signal-to-noise ratio required for correct repetition of 50% of the sentences, was recorded for all conditions. A one-way repeated measures analysis of variance was used to determine statistical significance of microphone condition. The results of this study revealed that use of the remote microphone systems statistically improved speech recognition in noise relative to unaided and hearing aid-only conditions across all four wireless transmission protocols at 6 and 12' away from the talker. Participants showed a significant improvement in speech recognition in noise when comparing four remote microphone systems with different wireless transmission methods to hearing aids alone. American Academy of Audiology.

  13. Listening Effort and Speech Recognition with Frequency Compression Amplification for Children and Adults with Hearing Loss.

    PubMed

    Brennan, Marc A; Lewis, Dawna; McCreery, Ryan; Kopun, Judy; Alexander, Joshua M

    2017-10-01

    Nonlinear frequency compression (NFC) can improve the audibility of high-frequency sounds by lowering them to a frequency where audibility is better; however, this lowering results in spectral distortion. Consequently, performance is a combination of the effects of increased access to high-frequency sounds and the detrimental effects of spectral distortion. Previous work has demonstrated positive benefits of NFC on speech recognition when NFC is set to improve audibility while minimizing distortion. However, the extent to which NFC impacts listening effort is not well understood, especially for children with sensorineural hearing loss (SNHL). To examine the impact of NFC on recognition and listening effort for speech in adults and children with SNHL. Within-subject, quasi-experimental study. Participants listened to amplified nonsense words that were (1) frequency-lowered using NFC, (2) low-pass filtered at 5 kHz to simulate the restricted bandwidth (RBW) of conventional hearing aid processing, or (3) low-pass filtered at 10 kHz to simulate extended bandwidth (EBW) amplification. Fourteen children (8-16 yr) and 14 adults (19-65 yr) with mild-to-severe SNHL. Participants listened to speech processed by a hearing aid simulator that amplified input signals to fit a prescriptive target fitting procedure. Participants were blinded to the type of processing. Participants' responses to each nonsense word were analyzed for accuracy and verbal-response time (VRT; listening effort). A multivariate analysis of variance and linear mixed model were used to determine the effect of hearing-aid signal processing on nonsense word recognition and VRT. Both children and adults identified the nonsense words and initial consonants better with EBW and NFC than with RBW. The type of processing did not affect the identification of the vowels or final consonants. There was no effect of age on recognition of the nonsense words, initial consonants, medial vowels, or final consonants. VRT did not change significantly with the type of processing or age. Both adults and children demonstrated improved speech recognition with access to the high-frequency sounds in speech. Listening effort as measured by VRT was not affected by access to high-frequency sounds. American Academy of Audiology

  14. Speech recognition for bilaterally asymmetric and symmetric hearing aid microphone modes in simulated classroom environments.

    PubMed

    Ricketts, Todd A; Picou, Erin M

    2013-09-01

    This study aimed to evaluate the potential utility of asymmetrical and symmetrical directional hearing aid fittings for school-age children in simulated classroom environments. This study also aimed to evaluate speech recognition performance of children with normal hearing in the same listening environments. Two groups of school-age children 11 to 17 years of age participated in this study. Twenty participants had normal hearing, and 29 participants had sensorineural hearing loss. Participants with hearing loss were fitted with behind-the-ear hearing aids with clinically appropriate venting and were tested in 3 hearing aid configurations: bilateral omnidirectional, bilateral directional, and asymmetrical directional microphones. Speech recognition testing was completed in each microphone configuration in 3 environments: Talker-Front, Talker-Back, and Question-Answer situations. During testing, the location of the speech signal changed, but participants were always seated in a noisy, moderately reverberant classroom-like room. For all conditions, results revealed expected effects of directional microphones on speech recognition performance. When the signal of interest was in front of the listener, bilateral directional microphone was best, and when the signal of interest was behind the listener, bilateral omnidirectional microphone was best. Performance with asymmetric directional microphones was between the 2 symmetrical conditions. The magnitudes of directional benefits and decrements were not significantly correlated. In comparison with their peers with normal hearing, children with hearing loss performed similarly to their peers with normal hearing when fitted with directional microphones and the speech was from the front. In contrast, children with normal hearing still outperformed children with hearing loss if the speech originated from behind, even when the children were fitted with the optimal hearing aid microphone mode for the situation. Bilateral directional microphones can be effective in improving speech recognition performance for children in the classroom, as long as child is facing the talker of interest. Bilateral directional microphones, however, can impair performance if the signal originates from behind a listener. However, these data suggest that the magnitude of decrement is not predictable from an individual's benefit. The results re-emphasize the importance of appropriate switching between microphone modes so children can take full advantage of directional benefits without being hurt by directional decrements. An asymmetric fitting limits decrements, but does not lead to maximum speech recognition scores when compared with the optimal symmetrical fitting. Therefore, the asymmetric mode may not be the best option as a default fitting for children in a classroom environment. While directional microphones improve performance for children with hearing loss, their performance in most conditions continues to be impaired relative to their normal-hearing peers, particularly when the signals of interest originate from behind or from an unpredictable location.

  15. Optimizing the Combination of Acoustic and Electric Hearing in the Implanted Ear

    PubMed Central

    Karsten, Sue A.; Turner, Christopher W.; Brown, Carolyn J.; Jeon, Eun Kyung; Abbas, Paul J.; Gantz, Bruce J.

    2016-01-01

    Objectives The aim of this study was to determine an optimal approach to program combined acoustic plus electric (A+E) hearing devices in the same ear to maximize speech-recognition performance. Design Ten participants with at least 1 year of experience using Nucleus Hybrid (short electrode) A+E devices were evaluated across three different fitting conditions that varied in the frequency ranges assigned to the acoustically and electrically presented portions of the spectrum. Real-ear measurements were used to optimize the acoustic component for each participant, and the acoustic stimulation was then held constant across conditions. The lower boundary of the electric frequency range was systematically varied to create three conditions with respect to the upper boundary of the acoustic spectrum: Meet, Overlap, and Gap programming. Consonant recognition in quiet and speech recognition in competing-talker babble were evaluated after participants were given the opportunity to adapt by using the experimental programs in their typical everyday listening situations. Participants provided subjective ratings and evaluations for each fitting condition. Results There were no significant differences in performance between conditions (Meet, Overlap, Gap) for consonant recognition in quiet. A significant decrement in performance was measured for the Overlap fitting condition for speech recognition in babble. Subjective ratings indicated a significant preference for the Meet fitting regimen. Conclusions Participants using the Hybrid ipsilateral A+E device generally performed better when the acoustic and electric spectra were programmed to meet at a single frequency region, as opposed to a gap or overlap. Although there is no particular advantage for the Meet fitting strategy for recognition of consonants in quiet, the advantage becomes evident for speech recognition in competing-talker babble and in patient preferences. PMID:23059851

  16. Neuromorphic crossbar circuit with nanoscale filamentary-switching binary memristors for speech recognition.

    PubMed

    Truong, Son Ngoc; Ham, Seok-Jin; Min, Kyeong-Sik

    2014-01-01

    In this paper, a neuromorphic crossbar circuit with binary memristors is proposed for speech recognition. The binary memristors which are based on filamentary-switching mechanism can be found more popularly and are easy to be fabricated than analog memristors that are rare in materials and need a more complicated fabrication process. Thus, we develop a neuromorphic crossbar circuit using filamentary-switching binary memristors not using interface-switching analog memristors. The proposed binary memristor crossbar can recognize five vowels with 4-bit 64 input channels. The proposed crossbar is tested by 2,500 speech samples and verified to be able to recognize 89.2% of the tested samples. From the statistical simulation, the recognition rate of the binary memristor crossbar is estimated to be degraded very little from 89.2% to 80%, though the percentage variation in memristance is increased very much from 0% to 15%. In contrast, the analog memristor crossbar loses its recognition rate significantly from 96% to 9% for the same percentage variation in memristance.

  17. Versatile simulation testbed for rotorcraft speech I/O system design

    NASA Technical Reports Server (NTRS)

    Simpson, Carol A.

    1986-01-01

    A versatile simulation testbed for the design of a rotorcraft speech I/O system is described in detail. The testbed will be used to evaluate alternative implementations of synthesized speech displays and speech recognition controls for the next generation of Army helicopters including the LHX. The message delivery logic is discussed as well as the message structure, the speech recognizer command structure and features, feedback from the recognizer, and random access to controls via speech command.

  18. Continuous Speech Recognition for Clinicians

    PubMed Central

    Zafar, Atif; Overhage, J. Marc; McDonald, Clement J.

    1999-01-01

    The current generation of continuous speech recognition systems claims to offer high accuracy (greater than 95 percent) speech recognition at natural speech rates (150 words per minute) on low-cost (under $2000) platforms. This paper presents a state-of-the-technology summary, along with insights the authors have gained through testing one such product extensively and other products superficially. The authors have identified a number of issues that are important in managing accuracy and usability. First, for efficient recognition users must start with a dictionary containing the phonetic spellings of all words they anticipate using. The authors dictated 50 discharge summaries using one inexpensive internal medicine dictionary ($30) and found that they needed to add an additional 400 terms to get recognition rates of 98 percent. However, if they used either of two more expensive and extensive commercial medical vocabularies ($349 and $695), they did not need to add terms to get a 98 percent recognition rate. Second, users must speak clearly and continuously, distinctly pronouncing all syllables. Users must also correct errors as they occur, because accuracy improves with error correction by at least 5 percent over two weeks. Users may find it difficult to train the system to recognize certain terms, regardless of the amount of training, and appropriate substitutions must be created. For example, the authors had to substitute “twice a day” for “bid” when using the less expensive dictionary, but not when using the other two dictionaries. From trials they conducted in settings ranging from an emergency room to hospital wards and clinicians' offices, they learned that ambient noise has minimal effect. Finally, they found that a minimal “usable” hardware configuration (which keeps up with dictation) comprises a 300-MHz Pentium processor with 128 MB of RAM and a “speech quality” sound card (e.g., SoundBlaster, $99). Anything less powerful will result in the system lagging behind the speaking rate. The authors obtained 97 percent accuracy with just 30 minutes of training when using the latest edition of one of the speech recognition systems supplemented by a commercial medical dictionary. This technology has advanced considerably in recent years and is now a serious contender to replace some or all of the increasingly expensive alternative methods of dictation with human transcription. PMID:10332653

  19. Hands-free device control using sound picked up in the ear canal

    NASA Astrophysics Data System (ADS)

    Chhatpar, Siddharth R.; Ngia, Lester; Vlach, Chris; Lin, Dong; Birkhimer, Craig; Juneja, Amit; Pruthi, Tarun; Hoffman, Orin; Lewis, Tristan

    2008-04-01

    Hands-free control of unmanned ground vehicles is essential for soldiers, bomb disposal squads, and first responders. Having their hands free for other equipment and tasks allows them to be safer and more mobile. Currently, the most successful hands-free control devices are speech-command based. However, these devices use external microphones, and in field environments, e.g., war zones and fire sites, their performance suffers because of loud ambient noise: typically above 90dBA. This paper describes the development of technology using the ear as an output source that can provide excellent command recognition accuracy even in noisy environments. Instead of picking up speech radiating from the mouth, this technology detects speech transmitted internally through the ear canal. Discreet tongue movements also create air pressure changes within the ear canal, and can be used for stealth control. A patented earpiece was developed with a microphone pointed into the ear canal that captures these signals generated by tongue movements and speech. The signals are transmitted from the earpiece to an Ultra-Mobile Personal Computer (UMPC) through a wired connection. The UMPC processes the signals and utilizes them for device control. The processing can include command recognition, ambient noise cancellation, acoustic echo cancellation, and speech equalization. Successful control of an iRobot PackBot has been demonstrated with both speech (13 discrete commands) and tongue (5 discrete commands) signals. In preliminary tests, command recognition accuracy was 95% with speech control and 85% with tongue control.

  20. Accuracy of Cochlear Implant Recipients on Speech Reception in Background Music

    PubMed Central

    Gfeller, Kate; Turner, Christopher; Oleson, Jacob; Kliethermes, Stephanie; Driscoll, Virginia

    2012-01-01

    Objectives This study (a) examined speech recognition abilities of cochlear implant (CI) recipients in the spectrally complex listening condition of three contrasting types of background music, and (b) compared performance based upon listener groups: CI recipients using conventional long-electrode (LE) devices, Hybrid CI recipients (acoustic plus electric stimulation), and normal-hearing (NH) adults. Methods We tested 154 LE CI recipients using varied devices and strategies, 21 Hybrid CI recipients, and 49 NH adults on closed-set recognition of spondees presented in three contrasting forms of background music (piano solo, large symphony orchestra, vocal solo with small combo accompaniment) in an adaptive test. Outcomes Signal-to-noise thresholds for speech in music (SRTM) were examined in relation to measures of speech recognition in background noise and multi-talker babble, pitch perception, and music experience. Results SRTM thresholds varied as a function of category of background music, group membership (LE, Hybrid, NH), and age. Thresholds for speech in background music were significantly correlated with measures of pitch perception and speech in background noise thresholds; auditory status was an important predictor. Conclusions Evidence suggests that speech reception thresholds in background music change as a function of listener age (with more advanced age being detrimental), structural characteristics of different types of music, and hearing status (residual hearing). These findings have implications for everyday listening conditions such as communicating in social or commercial situations in which there is background music. PMID:23342550

  1. Cochlear implant microphone location affects speech recognition in diffuse noise.

    PubMed

    Kolberg, Elizabeth R; Sheffield, Sterling W; Davis, Timothy J; Sunderhaus, Linsey W; Gifford, René H

    2015-01-01

    Despite improvements in cochlear implants (CIs), CI recipients continue to experience significant communicative difficulty in background noise. Many potential solutions have been proposed to help increase signal-to-noise ratio in noisy environments, including signal processing and external accessories. To date, however, the effect of microphone location on speech recognition in noise has focused primarily on hearing aid users. The purpose of this study was to (1) measure physical output for the T-Mic as compared with the integrated behind-the-ear (BTE) processor mic for various source azimuths, and (2) to investigate the effect of CI processor mic location for speech recognition in semi-diffuse noise with speech originating from various source azimuths as encountered in everyday communicative environments. A repeated-measures, within-participant design was used to compare performance across listening conditions. A total of 11 adults with Advanced Bionics CIs were recruited for this study. Physical acoustic output was measured on a Knowles Experimental Mannequin for Acoustic Research (KEMAR) for the T-Mic and BTE mic, with broadband noise presented at 0 and 90° (directed toward the implant processor). In addition to physical acoustic measurements, we also assessed recognition of sentences constructed by researchers at Texas Instruments, the Massachusetts Institute of Technology, and the Stanford Research Institute (TIMIT sentences) at 60 dBA for speech source azimuths of 0, 90, and 270°. Sentences were presented in a semi-diffuse restaurant noise originating from the R-SPACE 8-loudspeaker array. Signal-to-noise ratio was determined individually to achieve approximately 50% correct in the unilateral implanted listening condition with speech at 0°. Performance was compared across the T-Mic, 50/50, and the integrated BTE processor mic. The integrated BTE mic provided approximately 5 dB attenuation from 1500-4500 Hz for signals presented at 0° as compared with 90° (directed toward the processor). The T-Mic output was essentially equivalent for sources originating from 0 and 90°. Mic location also significantly affected sentence recognition as a function of source azimuth, with the T-Mic yielding the highest performance for speech originating from 0°. These results have clinical implications for (1) future implant processor design with respect to mic location, (2) mic settings for implant recipients, and (3) execution of advanced speech testing in the clinic. American Academy of Audiology.

  2. Effects of Within-Talker Variability on Speech Intelligibility in Mandarin-Speaking Adult and Pediatric Cochlear Implant Patients

    PubMed Central

    Su, Qiaotong; Galvin, John J.; Zhang, Guoping; Li, Yongxin

    2016-01-01

    Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users. PMID:27363714

  3. A study of speech emotion recognition based on hybrid algorithm

    NASA Astrophysics Data System (ADS)

    Zhu, Ju-xia; Zhang, Chao; Lv, Zhao; Rao, Yao-quan; Wu, Xiao-pei

    2011-10-01

    To effectively improve the recognition accuracy of the speech emotion recognition system, a hybrid algorithm which combines Continuous Hidden Markov Model (CHMM), All-Class-in-One Neural Network (ACON) and Support Vector Machine (SVM) is proposed. In SVM and ACON methods, some global statistics are used as emotional features, while in CHMM method, instantaneous features are employed. The recognition rate by the proposed method is 92.25%, with the rejection rate to be 0.78%. Furthermore, it obtains the relative increasing of 8.53%, 4.69% and 0.78% compared with ACON, CHMM and SVM methods respectively. The experiment result confirms the efficiency of distinguishing anger, happiness, neutral and sadness emotional states.

  4. Post interaural neural net-based vowel recognition

    NASA Astrophysics Data System (ADS)

    Jouny, Ismail I.

    2001-10-01

    Interaural head related transfer functions are used to process speech signatures prior to neural net based recognition. Data representing the head related transfer function of a dummy has been collected at MIT and made available on the Internet. This data is used to pre-process vowel signatures to mimic the effects of human ear on speech perception. Signatures representing various vowels of the English language are then presented to a multi-layer perceptron trained using the back propagation algorithm for recognition purposes. The focus in this paper is to assess the effects of human interaural system on vowel recognition performance particularly when using a classification system that mimics the human brain such as a neural net.

  5. Sequential Bilateral Cochlear Implantation in a Patient with Bilateral Meniere’s Disease

    PubMed Central

    Holden, Laura K.; Neely, J. Gail; Gotter, Brenda D.; Mispagel, Karen M.; Firszt, Jill B.

    2012-01-01

    This case study describes a 45 year old female with bilateral, profound sensorineural hearing loss due to Meniere’s disease. She received her first cochlear implant in the right ear in 2008 and the second cochlear implant in the left ear in 2010. The case study examines the enhancement to speech recognition, particularly in noise, provided by bilateral cochlear implants. Speech recognition tests were administered prior to obtaining the second implant and at a number of test intervals following activation of the second device. Speech recognition in quiet and noise as well as localization abilities were assessed in several conditions to determine bilateral benefit and performance differences between ears. The results of the speech recognition testing indicated a substantial improvement in the patient’s ability to understand speech in noise and her ability to localize sound when using bilateral cochlear implants compared to using a unilateral implant or an implant and a hearing aid. In addition, the patient reported considerable improvement in her ability to communicate in daily life when using bilateral implants versus a unilateral implant. This case suggests that cochlear implantation is a viable option for patients who have lost their hearing to Meniere’s disease even when a number of medical treatments and surgical interventions have been performed to control vertigo. In the case presented, bilateral cochlear implantation was necessary for this patient to communicate successfully at home and at work. PMID:22463939

  6. Free Field Word recognition test in the presence of noise in normal hearing adults.

    PubMed

    Almeida, Gleide Viviani Maciel; Ribas, Angela; Calleros, Jorge

    In ideal listening situations, subjects with normal hearing can easily understand speech, as can many subjects who have a hearing loss. To present the validation of the Word Recognition Test in a Free Field in the Presence of Noise in normal-hearing adults. Sample consisted of 100 healthy adults over 18 years of age with normal hearing. After pure tone audiometry, a speech recognition test was applied in free field condition with monosyllables and disyllables, with standardized material in three listening situations: optimal listening condition (no noise), with a signal to noise ratio of 0dB and a signal to noise ratio of -10dB. For these tests, an environment in calibrated free field was arranged where speech was presented to the subject being tested from two speakers located at 45°, and noise from a third speaker, located at 180°. All participants had speech audiometry results in the free field between 88% and 100% in the three listening situations. Word Recognition Test in Free Field in the Presence of Noise proved to be easy to be organized and applied. The results of the test validation suggest that individuals with normal hearing should get between 88% and 100% of the stimuli correct. The test can be an important tool in measuring noise interference on the speech perception abilities. Copyright © 2016 Associação Brasileira de Otorrinolaringologia e Cirurgia Cérvico-Facial. Published by Elsevier Editora Ltda. All rights reserved.

  7. The Long Road to Automation: Neurocognitive Development of Letter-Speech Sound Processing

    ERIC Educational Resources Information Center

    Froyen, Dries J. W.; Bonte, Milene L.; van Atteveldt, Nienke; Blomert, Leo

    2009-01-01

    In transparent alphabetic languages, the expected standard for complete acquisition of letter-speech sound associations is within one year of reading instruction. The neural mechanisms underlying the acquisition of letter-speech sound associations have, however, hardly been investigated. The present article describes an ERP study with beginner and…

  8. Factors that influence the performance of experienced speech recognition users.

    PubMed

    Koester, Heidi Horstmann

    2006-01-01

    Performance on automatic speech recognition (ASR) systems for users with physical disabilities varies widely between individuals. The goal of this study was to discover some key factors that account for that variation. Using data from 23 experienced ASR users with physical disabilities, the effect of 20 different independent variables on recognition accuracy and text entry rate with ASR was measured using bivariate and multivariate analyses. The results show that use of appropriate correction strategies had the strongest influence on user performance with ASR. The amount of time the user spent on his or her computer, the user's manual typing speed, and the speed with which the ASR system recognized speech were all positively associated with better performance. The amount or perceived adequacy of ASR training did not have a significant impact on performance for this user group.

  9. It doesn't matter what you say: FMRI correlates of voice learning and recognition independent of speech content.

    PubMed

    Zäske, Romi; Awwad Shiekh Hasan, Bashar; Belin, Pascal

    2017-09-01

    Listeners can recognize newly learned voices from previously unheard utterances, suggesting the acquisition of high-level speech-invariant voice representations during learning. Using functional magnetic resonance imaging (fMRI) we investigated the anatomical basis underlying the acquisition of voice representations for unfamiliar speakers independent of speech, and their subsequent recognition among novel voices. Specifically, listeners studied voices of unfamiliar speakers uttering short sentences and subsequently classified studied and novel voices as "old" or "new" in a recognition test. To investigate "pure" voice learning, i.e., independent of sentence meaning, we presented German sentence stimuli to non-German speaking listeners. To disentangle stimulus-invariant and stimulus-dependent learning, during the test phase we contrasted a "same sentence" condition in which listeners heard speakers repeating the sentences from the preceding study phase, with a "different sentence" condition. Voice recognition performance was above chance in both conditions although, as expected, performance was higher for same than for different sentences. During study phases activity in the left inferior frontal gyrus (IFG) was related to subsequent voice recognition performance and same versus different sentence condition, suggesting an involvement of the left IFG in the interactive processing of speaker and speech information during learning. Importantly, at test reduced activation for voices correctly classified as "old" compared to "new" emerged in a network of brain areas including temporal voice areas (TVAs) of the right posterior superior temporal gyrus (pSTG), as well as the right inferior/middle frontal gyrus (IFG/MFG), the right medial frontal gyrus, and the left caudate. This effect of voice novelty did not interact with sentence condition, suggesting a role of temporal voice-selective areas and extra-temporal areas in the explicit recognition of learned voice identity, independent of speech content. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Clinical implications of word recognition differences in earphone and aided conditions

    PubMed Central

    McRackan, Theodore R.; Ahlstrom, Jayne B.; Clinkscales, William B.; Meyer, Ted A.; Dubno, Judy R

    2017-01-01

    Objective To compare word recognition scores for adults with hearing loss measured using earphones and in the sound field without and with hearing aids (HA) Study design Independent review of pre-surgical audiological data from an active middle ear implant (MEI) FDA clinical trial Setting Multicenter prospective FDA clinical trial Patients Ninety-four adult HA users Interventions/Main outcomes measured Pre-operative earphone, unaided and aided pure tone thresholds, word recognition scores, and speech intelligibility index. Results We performed an independent review of pre-surgical audiological data from a MEI FDA trial and compared unaided and aided word recognition scores with participants’ HAs fit according to the NAL-R algorithm. For 52 participants (55.3%), differences in scores between earphone and aided conditions were >10%; for 33 participants (35.1%), earphone scores were higher by 10% or more than aided scores. These participants had significantly higher pure tone thresholds at 250 Hz, 500 Hz, and 1000 Hz), higher pure tone averages, higher speech recognition thresholds, (and higher earphone speech levels (p=0.002). No significant correlation was observed between word recognition scores measured with earphones and with hearing aids (r=.14; p=0.16), whereas a moderately high positive correlation was observed between unaided and aided word recognition (r=0.68; p<0.001). Conclusion Results of the these analyses do not support the common clinical practice of using word recognition scores measured with earphones to predict aided word recognition or hearing aid benefit. Rather, these results provide evidence supporting the measurement of aided word recognition in patients who are considering hearing aids. PMID:27631832

  11. Hybrid Speaker Recognition Using Universal Acoustic Model

    NASA Astrophysics Data System (ADS)

    Nishimura, Jun; Kuroda, Tadahiro

    We propose a novel speaker recognition approach using a speaker-independent universal acoustic model (UAM) for sensornet applications. In sensornet applications such as “Business Microscope”, interactions among knowledge workers in an organization can be visualized by sensing face-to-face communication using wearable sensor nodes. In conventional studies, speakers are detected by comparing energy of input speech signals among the nodes. However, there are often synchronization errors among the nodes which degrade the speaker recognition performance. By focusing on property of the speaker's acoustic channel, UAM can provide robustness against the synchronization error. The overall speaker recognition accuracy is improved by combining UAM with the energy-based approach. For 0.1s speech inputs and 4 subjects, speaker recognition accuracy of 94% is achieved at the synchronization error less than 100ms.

  12. HTM Spatial Pooler With Memristor Crossbar Circuits for Sparse Biometric Recognition.

    PubMed

    James, Alex Pappachen; Fedorova, Irina; Ibrayev, Timur; Kudithipudi, Dhireesha

    2017-06-01

    Hierarchical Temporal Memory (HTM) is an online machine learning algorithm that emulates the neo-cortex. The development of a scalable on-chip HTM architecture is an open research area. The two core substructures of HTM are spatial pooler and temporal memory. In this work, we propose a new Spatial Pooler circuit design with parallel memristive crossbar arrays for the 2D columns. The proposed design was validated on two different benchmark datasets, face recognition, and speech recognition. The circuits are simulated and analyzed using a practical memristor device model and 0.18 μm IBM CMOS technology model. The databases AR, YALE, ORL, and UFI, are used to test the performance of the design in face recognition. TIMIT dataset is used for the speech recognition.

  13. [Vocal recognition in dental and oral radiology].

    PubMed

    La Fianza, A; Giorgetti, S; Marelli, P; Campani, R

    1993-10-01

    Speech reporting benefits by units which can recognize sentences in any natural language in real time. The use of this method in the everyday practice of radiology departments shows its possible application fields. We used the speech recognition method to report orthopantomographic exams in order to evaluate the advantages the method offers to the management and quality of reporting the exams which are difficult to fit in other closed computed reporting systems. Both speech recognition and the conventional reporting method (tape recording and typewriting) were used to report 760 orthopantomographs. The average time needed to make the report, the legibility (or Flesch) index, as adapted for the Italian language, and finally a clinical index (the subjective opinion of 4 odontostomatologists) were evaluated for each exam, with both techniques. Moreover, errors in speech reporting (crude, human and overall errors) were also evaluated. The advantages of speech reporting consisted in the shorter time needed for the report to become available (2.24 vs 2.99 minutes) (p < 0.0005), in the improved Flesch index (30.62 vs 28.9) and in the clinical index. The data obtained from speech reporting in odontostomatologic radiology were useful not only to reduce the mean reporting time of orthopantomographic exams but also to improve report quality by reducing both grammar and transmission mistakes. However, the basic condition for such results to be obtained is the speaker's skills to make a good report.

  14. Primitive Auditory Memory Is Correlated with Spatial Unmasking That Is Based on Direct-Reflection Integration

    PubMed Central

    Li, Huahui; Kong, Lingzhi; Wu, Xihong; Li, Liang

    2013-01-01

    In reverberant rooms with multiple-people talking, spatial separation between speech sources improves recognition of attended speech, even though both the head-shadowing and interaural-interaction unmasking cues are limited by numerous reflections. It is the perceptual integration between the direct wave and its reflections that bridges the direct-reflection temporal gaps and results in the spatial unmasking under reverberant conditions. This study further investigated (1) the temporal dynamic of the direct-reflection-integration-based spatial unmasking as a function of the reflection delay, and (2) whether this temporal dynamic is correlated with the listeners’ auditory ability to temporally retain raw acoustic signals (i.e., the fast decaying primitive auditory memory, PAM). The results showed that recognition of the target speech against the speech-masker background is a descending exponential function of the delay of the simulated target reflection. In addition, the temporal extent of PAM is frequency dependent and markedly longer than that for perceptual fusion. More importantly, the temporal dynamic of the speech-recognition function is significantly correlated with the temporal extent of the PAM of low-frequency raw signals. Thus, we propose that a chain process, which links the earlier-stage PAM with the later-stage correlation computation, perceptual integration, and attention facilitation, plays a role in spatially unmasking target speech under reverberant conditions. PMID:23658664

  15. Exploring the Link Between Cognitive Abilities and Speech Recognition in the Elderly Under Different Listening Conditions

    PubMed Central

    Nuesse, Theresa; Steenken, Rike; Neher, Tobias; Holube, Inga

    2018-01-01

    Elderly listeners are known to differ considerably in their ability to understand speech in noise. Several studies have addressed the underlying factors that contribute to these differences. These factors include audibility, and age-related changes in supra-threshold auditory processing abilities, and it has been suggested that differences in cognitive abilities may also be important. The objective of this study was to investigate associations between performance in cognitive tasks and speech recognition under different listening conditions in older adults with either age appropriate hearing or hearing-impairment. To that end, speech recognition threshold (SRT) measurements were performed under several masking conditions that varied along the perceptual dimensions of dip listening, spatial separation, and informational masking. In addition, a neuropsychological test battery was administered, which included measures of verbal working and short-term memory, executive functioning, selective and divided attention, and lexical and semantic abilities. Age-matched groups of older adults with either age-appropriate hearing (ENH, n = 20) or aided hearing impairment (EHI, n = 21) participated. In repeated linear regression analyses, composite scores of cognitive test outcomes (evaluated using PCA) were included to predict SRTs. These associations were different for the two groups. When hearing thresholds were controlled for, composed cognitive factors were significantly associated with the SRTs for the ENH listeners. Whereas better lexical and semantic abilities were associated with lower (better) SRTs in this group, there was a negative association between attentional abilities and speech recognition in the presence of spatially separated speech-like maskers. For the EHI group, the pure-tone thresholds (averaged across 0.5, 1, 2, and 4 kHz) were significantly associated with the SRTs, despite the fact that all signals were amplified and therefore in principle audible. PMID:29867654

  16. Research on Speech Perception. Progress Report No. 13.

    ERIC Educational Resources Information Center

    Pisoni, David B.; And Others

    Summarizing research activities in 1987, this is the thirteenth annual report of research on speech perception, analysis, synthesis, and recognition conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report includes extended manuscripts, short reports, progress reports, and information on…

  17. Development of a Low-Cost, Noninvasive, Portable Visual Speech Recognition Program.

    PubMed

    Kohlberg, Gavriel D; Gal, Ya'akov Kobi; Lalwani, Anil K

    2016-09-01

    Loss of speech following tracheostomy and laryngectomy severely limits communication to simple gestures and facial expressions that are largely ineffective. To facilitate communication in these patients, we seek to develop a low-cost, noninvasive, portable, and simple visual speech recognition program (VSRP) to convert articulatory facial movements into speech. A Microsoft Kinect-based VSRP was developed to capture spatial coordinates of lip movements and translate them into speech. The articulatory speech movements associated with 12 sentences were used to train an artificial neural network classifier. The accuracy of the classifier was then evaluated on a separate, previously unseen set of articulatory speech movements. The VSRP was successfully implemented and tested in 5 subjects. It achieved an accuracy rate of 77.2% (65.0%-87.6% for the 5 speakers) on a 12-sentence data set. The mean time to classify an individual sentence was 2.03 milliseconds (1.91-2.16). We have demonstrated the feasibility of a low-cost, noninvasive, portable VSRP based on Kinect to accurately predict speech from articulation movements in clinically trivial time. This VSRP could be used as a novel communication device for aphonic patients. © The Author(s) 2016.

  18. Working Memory and Speech Recognition in Noise under Ecologically Relevant Listening Conditions: Effects of Visual Cues and Noise Type among Adults with Hearing Loss

    ERIC Educational Resources Information Center

    Miller, Christi W.; Stewart, Erin K.; Wu, Yu-Hsiang; Bishop, Christopher; Bentler, Ruth A.; Tremblay, Kelly

    2017-01-01

    Purpose: This study evaluated the relationship between working memory (WM) and speech recognition in noise with different noise types as well as in the presence of visual cues. Method: Seventy-six adults with bilateral, mild to moderately severe sensorineural hearing loss (mean age: 69 years) participated. Using a cross-sectional design, 2…

  19. A Pilot Investigation regarding Speech-Recognition Performance in Noise for Adults with Hearing Loss in the FM+HA Listening Condition

    ERIC Educational Resources Information Center

    Lewis, M. Samantha; Gallun, Frederick J.; Gordon, Jane; Lilly, David J.; Crandell, Carl

    2010-01-01

    While the concurrent use of the hearing aid (HA) microphone with frequency modulation (FM) technology can decrease speech-recognition performance, the FM+HA condition is still an important setting for users of both HA and FM technology. The primary goal of this investigation was to evaluate the effect of attenuating HA gain in the FM+HA listening…

  20. Integrated multimodal human-computer interface and augmented reality for interactive display applications

    NASA Astrophysics Data System (ADS)

    Vassiliou, Marius S.; Sundareswaran, Venkataraman; Chen, S.; Behringer, Reinhold; Tam, Clement K.; Chan, M.; Bangayan, Phil T.; McGee, Joshua H.

    2000-08-01

    We describe new systems for improved integrated multimodal human-computer interaction and augmented reality for a diverse array of applications, including future advanced cockpits, tactical operations centers, and others. We have developed an integrated display system featuring: speech recognition of multiple concurrent users equipped with both standard air- coupled microphones and novel throat-coupled sensors (developed at Army Research Labs for increased noise immunity); lip reading for improving speech recognition accuracy in noisy environments, three-dimensional spatialized audio for improved display of warnings, alerts, and other information; wireless, coordinated handheld-PC control of a large display; real-time display of data and inferences from wireless integrated networked sensors with on-board signal processing and discrimination; gesture control with disambiguated point-and-speak capability; head- and eye- tracking coupled with speech recognition for 'look-and-speak' interaction; and integrated tetherless augmented reality on a wearable computer. The various interaction modalities (speech recognition, 3D audio, eyetracking, etc.) are implemented a 'modality servers' in an Internet-based client-server architecture. Each modality server encapsulates and exposes commercial and research software packages, presenting a socket network interface that is abstracted to a high-level interface, minimizing both vendor dependencies and required changes on the client side as the server's technology improves.

  1. Speech recognition training for enhancing written language generation by a traumatic brain injury survivor.

    PubMed

    Manasse, N J; Hux, K; Rankin-Erickson, J L

    2000-11-01

    Impairments in motor functioning, language processing, and cognitive status may impact the written language performance of traumatic brain injury (TBI) survivors. One strategy to minimize the impact of these impairments is to use a speech recognition system. The purpose of this study was to explore the effect of mild dysarthria and mild cognitive-communication deficits secondary to TBI on a 19-year-old survivor's mastery and use of such a system-specifically, Dragon Naturally Speaking. Data included the % of the participant's words accurately perceived by the system over time, the participant's accuracy over time in using commands for navigation and error correction, and quantitative and qualitative changes in the participant's written texts generated with and without the use of the speech recognition system. Results showed that Dragon NaturallySpeaking was approximately 80% accurate in perceiving words spoken by the participant, and the participant quickly and easily mastered all navigation and error correction commands presented. Quantitatively, the participant produced a greater amount of text using traditional word processing and a standard keyboard than using the speech recognition system. Minimal qualitative differences appeared between writing samples. Discussion of factors that may have contributed to the obtained results and that may affect the generalization of the findings to other TBI survivors is provided.

  2. Subauditory Speech Recognition based on EMG/EPG Signals

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles; Lee, Diana Dee; Agabon, Shane; Lau, Sonie (Technical Monitor)

    2003-01-01

    Sub-vocal electromyogram/electro palatogram (EMG/EPG) signal classification is demonstrated as a method for silent speech recognition. Recorded electrode signals from the larynx and sublingual areas below the jaw are noise filtered and transformed into features using complex dual quad tree wavelet transforms. Feature sets for six sub-vocally pronounced words are trained using a trust region scaled conjugate gradient neural network. Real time signals for previously unseen patterns are classified into categories suitable for primitive control of graphic objects. Feature construction, recognition accuracy and an approach for extension of the technique to a variety of real world application areas are presented.

  3. Cost-sensitive learning for emotion robust speaker recognition.

    PubMed

    Li, Dongdong; Yang, Yingchun; Dai, Weihui

    2014-01-01

    In the field of information security, voice is one of the most important parts in biometrics. Especially, with the development of voice communication through the Internet or telephone system, huge voice data resources are accessed. In speaker recognition, voiceprint can be applied as the unique password for the user to prove his/her identity. However, speech with various emotions can cause an unacceptably high error rate and aggravate the performance of speaker recognition system. This paper deals with this problem by introducing a cost-sensitive learning technology to reweight the probability of test affective utterances in the pitch envelop level, which can enhance the robustness in emotion-dependent speaker recognition effectively. Based on that technology, a new architecture of recognition system as well as its components is proposed in this paper. The experiment conducted on the Mandarin Affective Speech Corpus shows that an improvement of 8% identification rate over the traditional speaker recognition is achieved.

  4. A preliminary analysis of human factors affecting the recognition accuracy of a discrete word recognizer for C3 systems

    NASA Astrophysics Data System (ADS)

    Yellen, H. W.

    1983-03-01

    Literature pertaining to Voice Recognition abounds with information relevant to the assessment of transitory speech recognition devices. In the past, engineering requirements have dictated the path this technology followed. But, other factors do exist that influence recognition accuracy. This thesis explores the impact of Human Factors on the successful recognition of speech, principally addressing the differences or variability among users. A Threshold Technology T-600 was used for a 100 utterance vocubalary to test 44 subjects. A statistical analysis was conducted on 5 generic categories of Human Factors: Occupational, Operational, Psychological, Physiological and Personal. How the equipment is trained and the experience level of the speaker were found to be key characteristics influencing recognition accuracy. To a lesser extent computer experience, time or week, accent, vital capacity and rate of air flow, speaker cooperativeness and anxiety were found to affect overall error rates.

  5. Cost-Sensitive Learning for Emotion Robust Speaker Recognition

    PubMed Central

    Li, Dongdong; Yang, Yingchun

    2014-01-01

    In the field of information security, voice is one of the most important parts in biometrics. Especially, with the development of voice communication through the Internet or telephone system, huge voice data resources are accessed. In speaker recognition, voiceprint can be applied as the unique password for the user to prove his/her identity. However, speech with various emotions can cause an unacceptably high error rate and aggravate the performance of speaker recognition system. This paper deals with this problem by introducing a cost-sensitive learning technology to reweight the probability of test affective utterances in the pitch envelop level, which can enhance the robustness in emotion-dependent speaker recognition effectively. Based on that technology, a new architecture of recognition system as well as its components is proposed in this paper. The experiment conducted on the Mandarin Affective Speech Corpus shows that an improvement of 8% identification rate over the traditional speaker recognition is achieved. PMID:24999492

  6. Differences in Talker Recognition by Preschoolers and Adults

    ERIC Educational Resources Information Center

    Creel, Sarah C.; Jimenez, Sofia R.

    2012-01-01

    Talker variability in speech influences language processing from infancy through adulthood and is inextricably embedded in the very cues that identify speech sounds. Yet little is known about developmental changes in the processing of talker information. On one account, children have not yet learned to separate speech sound variability from…

  7. Developmental Differences in Speech Act Recognition: A Pragmatic Awareness Study

    ERIC Educational Resources Information Center

    Garcia, Paula

    2004-01-01

    With the growing acknowledgement of the importance of pragmatic competence in second language (L2) learning, language researchers have identified the comprehension of speech acts as they occur in natural conversation as essential to communicative competence (e.g. Bardovi-Harlig, 2001; Thomas, 1983). Nonconventional indirect speech acts are formed…

  8. Audio-Visual Speech in Noise Perception in Dyslexia

    ERIC Educational Resources Information Center

    van Laarhoven, Thijs; Keetels, Mirjam; Schakel, Lemmy; Vroomen, Jean

    2018-01-01

    Individuals with developmental dyslexia (DD) may experience, besides reading problems, other speech-related processing deficits. Here, we examined the influence of visual articulatory information (lip-read speech) at various levels of background noise on auditory word recognition in children and adults with DD. We found that children with a…

  9. Cross-Channel Amplitude Sweeps Are Crucial to Speech Intelligibility

    ERIC Educational Resources Information Center

    Prendergast, Garreth; Green, Gary G. R.

    2012-01-01

    Classical views of speech perception argue that the static and dynamic characteristics of spectral energy peaks (formants) are the acoustic features that underpin phoneme recognition. Here we use representations where the amplitude modulations of sub-band filtered speech are described, precisely, in terms of co-sinusoidal pulses. These pulses are…

  10. Rate and onset cues can improve cochlear implant synthetic vowel recognition in noise

    PubMed Central

    Mc Laughlin, Myles; Reilly, Richard B.; Zeng, Fan-Gang

    2013-01-01

    Understanding speech-in-noise is difficult for most cochlear implant (CI) users. Speech-in-noise segregation cues are well understood for acoustic hearing but not for electric hearing. This study investigated the effects of stimulation rate and onset delay on synthetic vowel-in-noise recognition in CI subjects. In experiment I, synthetic vowels were presented at 50, 145, or 795 pulse/s and noise at the same three rates, yielding nine combinations. Recognition improved significantly if the noise had a lower rate than the vowel, suggesting that listeners can use temporal gaps in the noise to detect a synthetic vowel. This hypothesis is supported by accurate prediction of synthetic vowel recognition using a temporal integration window model. Using lower rates a similar trend was observed in normal hearing subjects. Experiment II found that for CI subjects, a vowel onset delay improved performance if the noise had a lower or higher rate than the synthetic vowel. These results show that differing rates or onset times can improve synthetic vowel-in-noise recognition, indicating a need to develop speech processing strategies that encode or emphasize these cues. PMID:23464025

  11. Strength Is in Numbers: Can Concordant Artificial Listeners Improve Prediction of Emotion from Speech?

    PubMed

    Martinelli, Eugenio; Mencattini, Arianna; Daprati, Elena; Di Natale, Corrado

    2016-01-01

    Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Albeit numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet, fast and reliable applications for emotion recognition are the obvious advancement of present 'intelligent personal assistants', and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performances relative to a public database of spontaneous speeches are reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc.) and for applications in the research domain (such as real-time pairing of stimuli to participants' emotional state, selective/differential data collection based on emotional content, etc.).

  12. The process of spoken word recognition in the face of signal degradation.

    PubMed

    Farris-Trimble, Ashley; McMurray, Bob; Cigrand, Nicole; Tomblin, J Bruce

    2014-02-01

    Though much is known about how words are recognized, little research has focused on how a degraded signal affects the fine-grained temporal aspects of real-time word recognition. The perception of degraded speech was examined in two populations with the goal of describing the time course of word recognition and lexical competition. Thirty-three postlingually deafened cochlear implant (CI) users and 57 normal hearing (NH) adults (16 in a CI-simulation condition) participated in a visual world paradigm eye-tracking task in which their fixations to a set of phonologically related items were monitored as they heard one item being named. Each degraded-speech group was compared with a set of age-matched NH participants listening to unfiltered speech. CI users and the simulation group showed a delay in activation relative to the NH listeners, and there is weak evidence that the CI users showed differences in the degree of peak and late competitor activation. In general, though, the degraded-speech groups behaved statistically similarly with respect to activation levels. PsycINFO Database Record (c) 2014 APA, all rights reserved.

  13. Electromagnetic versus electrical coupling of personal frequency modulation (FM) receivers to cochlear implant sound processors.

    PubMed

    Schafer, Erin C; Romine, Denise; Musgrave, Elizabeth; Momin, Sadaf; Huynh, Christy

    2013-01-01

    Previous research has suggested that electrically coupled frequency modulation (FM) systems substantially improved speech-recognition performance in noise in individuals with cochlear implants (CIs). However, there is limited evidence to support the use of electromagnetically coupled (neck loop) FM receivers with contemporary CI sound processors containing telecoils. The primary goal of this study was to compare speech-recognition performance in noise and subjective ratings of adolescents and adults using one of three contemporary CI sound processors coupled to electromagnetically and electrically coupled FM receivers from Oticon. A repeated-measures design was used to compare speech-recognition performance in noise and subjective ratings without and with the FM systems across three test sessions (Experiment 1) and to compare performance at different FM-gain settings (Experiment 2). Descriptive statistics were used in Experiment 3 to describe output differences measured through a CI sound processor. Experiment 1 included nine adolescents or adults with unilateral or bilateral Advanced Bionics Harmony (n = 3), Cochlear Nucleus 5 (n = 3), and MED-EL OPUS 2 (n = 3) CI sound processors. In Experiment 2, seven of the original nine participants were tested. In Experiment 3, electroacoustic output was measured from a Nucleus 5 sound processor when coupled to the electromagnetically coupled Oticon Arc neck loop and electrically coupled Oticon R2. In Experiment 1, participants completed a field trial with each FM receiver and three test sessions that included speech-recognition performance in noise and a subjective rating scale. In Experiment 2, participants were tested in three receiver-gain conditions. Results in both experiments were analyzed using repeated-measures analysis of variance. Experiment 3 involved electroacoustic-test measures to determine the monitor-earphone output of the CI alone and CI coupled to the two FM receivers. The results in Experiment 1 suggested that both FM receivers provided significantly better speech-recognition performance in noise than the CI alone; however, the electromagnetically coupled receiver provided significantly better speech-recognition performance in noise and better ratings in some situations than the electrically coupled receiver when set to the same gain. In Experiment 2, the primary analysis suggested significantly better speech-recognition performance in noise for the neck-loop versus electrically coupled receiver, but a second analysis, using the best performance across gain settings for each device, revealed no significant differences between the two FM receivers. Experiment 3 revealed monitor-earphone output differences in the Nucleus 5 sound processor for the two FM receivers when set to the +8 setting used in Experiment 1 but equal output when the electrically coupled device was set to a +16 gain setting and the electromagnetically coupled device was set to the +8 gain setting. Individuals with contemporary sound processors may show more favorable speech-recognition performance in noise electromagnetically coupled FM systems (i.e., Oticon Arc), which is most likely related to the input processing and signal processing pathway within the CI sound processor for direct input versus telecoil input. Further research is warranted to replicate these findings with a larger sample size and to develop and validate a more objective approach to fitting FM systems to CI sound processors. American Academy of Audiology.

  14. Speaker emotion recognition: from classical classifiers to deep neural networks

    NASA Astrophysics Data System (ADS)

    Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri

    2018-04-01

    Speaker emotion recognition is considered among the most challenging tasks in recent years. In fact, automatic systems for security, medicine or education can be improved when considering the speech affective state. In this paper, a twofold approach for speech emotion classification is proposed. At the first side, a relevant set of features is adopted, and then at the second one, numerous supervised training techniques, involving classic methods as well as deep learning, are experimented. Experimental results indicate that deep architecture can improve classification performance on two affective databases, the Berlin Dataset of Emotional Speech and the SAVEE Dataset Surrey Audio-Visual Expressed Emotion.

  15. Advocate: A Distributed Architecture for Speech-to-Speech Translation

    DTIC Science & Technology

    2009-01-01

    tecture, are either wrapped natural-language processing ( NLP ) components or objects developed from scratch using the architecture’s API. GATE is...framework, we put together a demonstration Arabic -to- English speech translation system using both internally developed ( Arabic speech recognition and MT...conditions of our Arabic S2S demonstration system described earlier. Once again, the data size was varied and eighty identical requests were

  16. Examining explanations for fundamental frequency's contribution to speech intelligibility in noise

    NASA Astrophysics Data System (ADS)

    Schlauch, Robert S.; Miller, Sharon E.; Watson, Peter J.

    2005-09-01

    Laures and Weismer [JSLHR, 42, 1148 (1999)] reported that speech with natural variation in fundamental frequency (F0) is more intelligible in noise than speech with a flattened F0 contour. Cognitive-linguistic based explanations have been offered to account for this drop in intelligibility for the flattened condition, but a lower-level mechanism related to auditory streaming may be responsible. Numerous psychoacoustic studies have demonstrated that modulating a tone enables a listener to segregate it from background sounds. To test these rival hypotheses, speech recognition in noise was measured for sentences with six different F0 contours: unmodified, flattened at the mean, natural but exaggerated, reversed, and frequency modulated (rates of 2.5 and 5.0 Hz). The 180 stimulus sentences were produced by five talkers (30 sentences per condition). Speech recognition for fifteen listeners replicate earlier findings showing that flattening the F0 contour results in a roughly 10% reduction in recognition of key words compared with the natural condition. Although the exaggerated condition produced results comparable to those of the flattened condition, the other conditions with unnatural F0 contours all yielded significantly poorer performance than the flattened condition. These results support the cognitive, linguistic-based explanations for the reduction in performance.

  17. Phase effects in masking by harmonic complexes: speech recognition.

    PubMed

    Deroche, Mickael L D; Culling, John F; Chatterjee, Monita

    2013-12-01

    Harmonic complexes that generate highly modulated temporal envelopes on the basilar membrane (BM) mask a tone less effectively than complexes that generate relatively flat temporal envelopes, because the non-linear active gain of the BM selectively amplifies a low-level tone in the dips of a modulated masker envelope. The present study examines a similar effect in speech recognition. Speech reception thresholds (SRTs) were measured for a voice masked by harmonic complexes with partials in sine phase (SP) or in random phase (RP). The masker's fundamental frequency (F0) was 50, 100 or 200 Hz. SRTs were considerably lower for SP than for RP maskers at 50-Hz F0, but the two converged at 100-Hz F0, while at 200-Hz F0, SRTs were a little higher for SP than RP maskers. The results were similar whether the target voice was male or female and whether the masker's spectral profile was flat or speech-shaped. Although listening in the masker dips has been shown to play a large role for artificial stimuli such as Schroeder-phase complexes at high levels, it contributes weakly to speech recognition in the presence of harmonic maskers with different crest factors at more moderate sound levels (65 dB SPL). Copyright © 2013 Elsevier B.V. All rights reserved.

  18. [The contribution of different cochlear insertion region to Mandarin speech perception in users of cochlear implant].

    PubMed

    Qi, Beier; Liu, Bo; Liu, Sha; Liu, Haihong; Dong, Ruijuan; Zhang, Ning; Gong, Shusheng

    2011-05-01

    To study the effect of cochlear electrode coverage and different insertion region on speech recognition, especially tone perception of cochlear implant users whose native language is Mandarin Chinese. Setting seven test conditions by fitting software. All conditions were created by switching on/off respective channels in order to simulate different insertion position. Then Mandarin CI users received 4 Speech tests, including Vowel Identification test, Consonant Identification test, Tone Identification test-male speaker, Mandarin HINT test (SRS) in quiet and noise. To all test conditions: the average score of vowel identification was significantly different, from 56% to 91% (Rank sum test, P < 0.05). The average score of consonant identification was significantly different, from 72% to 85% (ANOVNA, P < 0.05). The average score of Tone identification was not significantly different (ANOVNA, P > 0.05). However the more channels activated, the higher scores obtained, from 68% to 81%. This study shows that there is a correlation between insertion depth and speech recognition. Because all parts of the basement membrane can help CI users to improve their speech recognition ability, it is very important to enhance verbal communication ability and social interaction ability of CI users by increasing insertion depth and actively stimulating the top region of cochlear.

  19. The MIT Summit Speech Recognition System: A Progress Report

    DTIC Science & Technology

    1989-01-01

    understanding of the human communication process. Despite recent development of some speech recognition systems with high accuracy, the performance of such...over the past four decades on human communication , in the hope that such systems will one day have a performance approaching that of humans. We are...optimize its use. Third, the system must have a stochastic component to deal with the present state of ignorance in our understanding of the human

  20. Effects of Noise on Speech Recognition and Listening Effort in Children with Normal Hearing and Children with Mild Bilateral or Unilateral Hearing Loss

    ERIC Educational Resources Information Center

    Lewis, Dawna; Schmid, Kendra; O'Leary, Samantha; Spalding, Jody; Heinrichs-Graham, Elizabeth; High, Robin

    2016-01-01

    Purpose: This study examined the effects of stimulus type and hearing status on speech recognition and listening effort in children with normal hearing (NH) and children with mild bilateral hearing loss (MBHL) or unilateral hearing loss (UHL). Method Children (5-12 years of age) with NH (Experiment 1) and children (8-12 years of age) with MBHL,…

Top