Sample records for speech interface technology

  1. Nursing acceptance of a speech-input interface: a preliminary investigation.

    PubMed

    Dillon, T W; McDowell, D; Norcio, A F; DeHaemer, M J

    1994-01-01

    Many new technologies are being developed to improve the efficiency and productivity of nursing staffs. User acceptance is a key to the success of these technologies. In this article, the authors present a discussion of nursing acceptance of computer systems, review the basic design issues for creating a speech-input interface, and report preliminary findings of a study of nursing acceptance of a prototype speech-input interface. Results of the study showed that the 19 nursing subjects expressed acceptance of the prototype speech-input interface.

  2. Performance Evaluation of Speech Recognition Systems as a Next-Generation Pilot-Vehicle Interface Technology

    NASA Technical Reports Server (NTRS)

    Arthur, Jarvis J., III; Shelton, Kevin J.; Prinzel, Lawrence J., III; Bailey, Randall E.

    2016-01-01

    During the flight trials known as the Gulfstream-V Synthetic Vision Systems Integrated Technology Evaluation (GV-SITE), a Speech Recognition System (SRS) was used by the evaluation pilots. The SRS was intended to be an intuitive interface for display control (rather than knobs, buttons, etc.). This paper describes the performance of this current "state of the art" SRS. The commercially available technology was evaluated as an application for possible inclusion in commercial aircraft flight decks as a crew-to-vehicle interface. Specifically, the technology is to be used as an interface from aircrew to the onboard displays, controls, and flight management tasks. Both a flight test and a laboratory test of the SRS were conducted.

  3. Use of Computer Speech Technologies To Enhance Learning.

    ERIC Educational Resources Information Center

    Ferrell, Joe

    1999-01-01

    Discusses the design of an innovative learning system that uses new technologies for the man-machine interface, incorporating a combination of Automatic Speech Recognition (ASR) and Text To Speech (TTS) synthesis. Highlights include using speech technologies to mimic the attributes of the ideal tutor and design features. (AEF)

  4. Technological evaluation of gesture and speech interfaces for enabling dismounted soldier-robot dialogue

    NASA Astrophysics Data System (ADS)

    Kattoju, Ravi Kiran; Barber, Daniel J.; Abich, Julian; Harris, Jonathan

    2016-05-01

    With increasing necessity for intuitive Soldier-robot communication in military operations and advancements in interactive technologies, autonomous robots have transitioned from assistance tools to functional and operational teammates able to service an array of military operations. Despite improvements in gesture and speech recognition technologies, their effectiveness in supporting Soldier-robot communication is still uncertain. The purpose of the present study was to evaluate the performance of gesture and speech interface technologies to facilitate Soldier-robot communication during a spatial-navigation task with an autonomous robot. Gesture and speech semantically based spatial-navigation commands leveraged existing lexicons for visual and verbal communication from the U.S. Army field manual for visual signaling and a previously established Squad Level Vocabulary (SLV). Speech commands were recorded by a lapel microphone and a Microsoft Kinect, and classified by commercial off-the-shelf automatic speech recognition (ASR) software. Visual signals were captured and classified using a custom wireless gesture glove and software. Participants in the experiment commanded a robot to complete a simulated intelligence, surveillance, and reconnaissance (ISR) mission in a scaled-down urban scenario by delivering a sequence of gesture and speech commands, both individually and simultaneously, to the robot. Performance and reliability of the gesture and speech hardware interfaces and recognition tools were analyzed and reported. Analysis of the experimental results demonstrated that the employed gesture technology has significant potential for enabling bidirectional Soldier-robot team dialogue, based on its high classification accuracy and the minimal training required to perform gesture commands.

  5. Automatic Speech Acquisition and Recognition for Spacesuit Audio Systems

    NASA Technical Reports Server (NTRS)

    Ye, Sherry

    2015-01-01

    NASA has a widely recognized but unmet need for novel human-machine interface technologies that can facilitate communication during astronaut extravehicular activities (EVAs), when loud noises and strong reverberations inside spacesuits make communication challenging. WeVoice, Inc., has developed a multichannel signal-processing method for speech acquisition in noisy and reverberant environments that enables automatic speech recognition (ASR) technology inside spacesuits. The technology reduces noise by exploiting differences between the statistical nature of signals (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, ASR accuracy can be improved to the level at which crewmembers will find the speech interface useful. System components and features include beam forming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, and ASR decoding. Arithmetic complexity models were developed and will help designers of real-time ASR systems select proper tasks when confronted with constraints in computational resources. In Phase I of the project, WeVoice validated the technology. The company further refined the technology in Phase II and developed a prototype for testing and use by suited astronauts.

  6. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exist in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It offers a fast rate of data/text entry and a small, lightweight overall design, and it frees the hands and eyes of a suited crewmember. The system components and steps include beamforming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks when faced with constraints in computational resources.
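
    The processing chain enumerated above maps naturally onto a software skeleton. The following Python sketch is a toy illustration of that chain, not the authors' implementation: each stage (delay-and-sum beamforming, spectral subtraction, log-energy features, mean/variance normalization) is a deliberately simplified stand-in, and the HMM training, adaptation, and decoding stages are omitted.

    ```python
    # Toy sketch of the abstract's processing chain; every stage is a stand-in.
    import numpy as np

    def delay_and_sum_beamform(channels: np.ndarray) -> np.ndarray:
        """Toy beamformer: average across microphone channels (mics x samples)."""
        return channels.mean(axis=0)

    def spectral_subtract(signal: np.ndarray, noise_floor: float = 1e-3) -> np.ndarray:
        """Toy single-channel noise reduction via magnitude spectral subtraction."""
        spec = np.fft.rfft(signal)
        mag = np.maximum(np.abs(spec) - noise_floor, 0.0)
        return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=len(signal))

    def extract_features(signal: np.ndarray, frame: int = 400, hop: int = 160) -> np.ndarray:
        """Toy feature extraction: per-frame log-energy (real systems use MFCCs)."""
        n_frames = 1 + (len(signal) - frame) // hop
        frames = np.stack([signal[i * hop : i * hop + frame] for i in range(n_frames)])
        return np.log(np.sum(frames ** 2, axis=1) + 1e-10)[:, None]

    def normalize(features: np.ndarray) -> np.ndarray:
        """Mean/variance normalization per feature dimension."""
        return (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-10)

    # Chain the stages on simulated 4-channel suit audio (1 s at 16 kHz).
    mics = np.random.randn(4, 16000)
    feats = normalize(extract_features(spectral_subtract(delay_and_sum_beamform(mics))))
    print(feats.shape)  # (98, 1): frames ready for an HMM decoder (not shown)
    ```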

  7. Network speech systems technology program

    NASA Astrophysics Data System (ADS)

    Weinstein, C. J.

    1981-09-01

    This report documents work performed during FY 1981 on the DCA-sponsored Network Speech Systems Technology Program. The two areas of work reported are: (1) communication system studies in support of the evolving Defense Switched Network (DSN) and (2) design and implementation of satellite/terrestrial interfaces for the Experimental Integrated Switched Network (EISN). The system studies focus on the development and evaluation of economical and endurable network routing procedures. Satellite/terrestrial interface development includes circuit-switched and packet-switched connections to the experimental wideband satellite network. Efforts in planning and coordination of EISN experiments are reported in detail in a separate EISN Experiment Plan.

  8. Voice Response Systems Technology.

    ERIC Educational Resources Information Center

    Gerald, Jeanette

    1984-01-01

    Examines two methods of generating synthetic speech in voice response systems, which allow computers to communicate in human terms (speech), using human interface devices (ears): phoneme and reconstructed voice systems. Considerations prior to implementation, current and potential applications, glossary, directory, and introduction to Input Output…

  9. Automatic Speech Recognition from Neural Signals: A Focused Review.

    PubMed

    Herff, Christian; Schultz, Tanja

    2016-01-01

    Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to loud environments, the need not to disturb bystanders, or an inability to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons it would be highly desirable not to speak at all but simply to imagine saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to their low temporal resolution, but are very useful for investigating the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data, with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques applied to neural signals, we discuss the Brain-to-text system.

  10. An Innovative Speech-Based User Interface for Smarthomes and IoT Solutions to Help People with Speech and Motor Disabilities.

    PubMed

    Malavasi, Massimiliano; Turri, Enrico; Atria, Jose Joaquin; Christensen, Heidi; Marxer, Ricard; Desideri, Lorenzo; Coy, Andre; Tamburini, Fabio; Green, Phil

    2017-01-01

    A better use of the increasing functional capabilities of home automation systems and Internet of Things (IoT) devices to support the needs of users with disabilities is the subject of a research project currently conducted by Area Ausili (Assistive Technology Area), a department of Polo Tecnologico Regionale Corte Roncati of the Local Health Trust of Bologna (Italy), in collaboration with the AIAS Ausilioteca Assistive Technology (AT) Team. The main aim of the project is to develop experimental low-cost systems for environmental control through simplified and accessible user interfaces. Many of the activities are focused on automatic speech recognition and are developed in the framework of the CloudCAST project. In this paper we report on the first technical achievements of the project and discuss possible future developments and applications within and outside CloudCAST.

  11. Applying Spatial Audio to Human Interfaces: 25 Years of NASA Experience

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.; Godfrey, Martine; Miller, Joel D.; Anderson, Mark R.

    2010-01-01

    From the perspective of human factors engineering, the inclusion of spatial audio within a human-machine interface is advantageous in several respects. Demonstrated benefits include the ability to monitor multiple streams of speech and non-speech warning tones using a "cocktail party" advantage, and improved aurally-guided visual search. Other potential benefits include the spatial coordination and interaction of multimodal events, and the evaluation of new communication technologies and alerting systems using virtual simulation. Many of these technologies were developed at NASA Ames Research Center, beginning in 1985. This paper reviews examples and describes the advantages of spatial sound in NASA-related technologies, including space operations, aeronautics, and search and rescue. The work has involved hardware and software development as well as basic and applied research.

  12. Interfering and Resolving: How Tabletop Interaction Facilitates Co-Construction of Argumentative Knowledge

    ERIC Educational Resources Information Center

    Falcao, Taciana Pontual; Price, Sara

    2011-01-01

    Tangible technologies and shared interfaces create new paradigms for mediating collaboration through dynamic, synchronous environments, where action is as important as speech for participating and contributing to the activity. However, interaction with shared interfaces has been shown to be inherently susceptible to peer interference, potentially…

  13. Speech and gesture interfaces for squad-level human-robot teaming

    NASA Astrophysics Data System (ADS)

    Harris, Jonathan; Barber, Daniel

    2014-06-01

    As the military increasingly adopts semi-autonomous unmanned systems for military operations, utilizing redundant and intuitive interfaces for communication between Soldiers and robots is vital to mission success. Currently, Soldiers use a common lexicon to verbally and visually communicate maneuvers between teammates. In order for robots to be seamlessly integrated within mixed-initiative teams, they must be able to understand this lexicon. Recent innovations in gaming platforms have led to advancements in speech and gesture recognition technologies, but the reliability of these technologies for enabling communication in human-robot teaming is unclear. The purpose of the present study is to investigate the performance of Commercial-Off-The-Shelf (COTS) speech and gesture recognition tools in classifying a Squad Level Vocabulary (SLV) for a spatial navigation reconnaissance and surveillance task. The SLV for this study was based on findings from a survey conducted with Soldiers at Fort Benning, GA. The items of the survey focused on the communication between the Soldier and the robot, specifically with regard to verbally instructing the robot to execute reconnaissance and surveillance tasks. The resulting commands, identified from the survey, were then converted to equivalent arm and hand gestures, leveraging existing visual signals (e.g., the U.S. Army Field Manual for Visual Signaling). A study was then run to test the ability of commercially available automated speech recognition technologies and a gesture recognition glove to classify these commands in a simulated intelligence, surveillance, and reconnaissance task. This paper presents the classification accuracy of these devices for both speech and gesture modalities independently.

  14. 17 Ways to Say Yes: Toward Nuanced Tone of Voice in AAC and Speech Technology

    PubMed Central

    Pullin, Graham; Hennig, Shannon

    2015-01-01

    People with complex communication needs who use speech-generating devices have very little expressive control over their tone of voice. Despite its importance in human interaction, however, the issue of tone of voice remains all but absent from AAC research and development. In this paper, we describe three interdisciplinary projects, past, present and future: the critical design collection Six Speaking Chairs has provoked deeper discussion and inspired a social model of tone of voice; the speculative concept Speech Hedge illustrates challenges and opportunities in designing more expressive user interfaces; the pilot project Tonetable could enable participatory research and seed a research network around tone of voice. We speculate that more radical interactions might expand frontiers of AAC and disrupt speech technology as a whole. PMID:25965913

  15. Exploring expressivity and emotion with artificial voice and speech technologies.

    PubMed

    Pauletto, Sandra; Balentine, Bruce; Pidcock, Chris; Jones, Kevin; Bottaci, Leonardo; Aretoulaki, Maria; Wells, Jez; Mundy, Darren P; Balentine, James

    2013-10-01

    Emotion in audio-voice signals, as synthesized by text-to-speech (TTS) technologies, was investigated to formulate a theory of expression for user interface design. Emotional parameters were specified with markup tags, and the resulting audio was further modulated with post-processing techniques. Software was then developed to link a selected TTS synthesizer with an automatic speech recognition (ASR) engine, producing a chatbot that could speak and listen. Using these two artificial voice subsystems, investigators explored both artistic and psychological implications of artificial speech emotion. Goals of the investigation were interdisciplinary, with interest in musical composition, augmentative and alternative communication (AAC), commercial voice announcement applications, human-computer interaction (HCI), and artificial intelligence (AI). The work-in-progress points towards an emerging interdisciplinary ontology for artificial voices. As one study output, HCI tools are proposed for future collaboration.

  16. Speech-recognition interfaces for music information retrieval

    NASA Astrophysics Data System (ADS)

    Goto, Masataka

    2005-09-01

    This paper describes two hands-free music information retrieval (MIR) systems that enable a user to retrieve and play back a musical piece by saying its title or the artist's name. Although various interfaces for MIR have been proposed, speech-recognition interfaces suitable for retrieving musical pieces have not been studied. Our MIR-based jukebox systems employ two different speech-recognition interfaces for MIR, speech completion and speech spotter, which exploit intentionally controlled nonverbal speech information in original ways. The first is a music retrieval system with the speech-completion interface that is suitable for music stores and car-driving situations. When a user only remembers part of the name of a musical piece or an artist and utters only a remembered fragment, the system helps the user recall and enter the name by completing the fragment. The second is a background-music playback system with the speech-spotter interface that can enrich human-human conversation. When a user is talking to another person, the system allows the user to enter voice commands for music playback control by spotting a special voice-command utterance in face-to-face or telephone conversations. Experimental results from use of these systems have demonstrated the effectiveness of the speech-completion and speech-spotter interfaces. (Video clips: http://staff.aist.go.jp/m.goto/MIR/speech-if.html)
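
    The speech-completion behavior can be pictured with a minimal text-side lookup: a remembered fragment is matched against a title lexicon and candidate completions are offered back to the user. This is a heavily simplified, hypothetical sketch (the titles, function name, and matching rule are all invented here); the actual system performs completion within the speech recognizer and, as the abstract notes, exploits intentionally controlled nonverbal speech cues.

    ```python
    # Hypothetical fragment-completion lookup; not Goto's implementation.
    titles = [
        "Bohemian Rhapsody", "Smells Like Teen Spirit",
        "What a Wonderful World", "Wonderwall",
    ]

    def complete_fragment(fragment: str, lexicon: list[str], limit: int = 3) -> list[str]:
        """Return up to `limit` titles containing the fragment (case-insensitive)."""
        frag = fragment.lower()
        return [t for t in lexicon if frag in t.lower()][:limit]

    print(complete_fragment("wonder", titles))
    # ['What a Wonderful World', 'Wonderwall']
    ```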

  17. Implementing Artificial Intelligence Behaviors in a Virtual World

    NASA Technical Reports Server (NTRS)

    Krisler, Brian; Thome, Michael

    2012-01-01

    In this paper, we present a look at the current state of the art in human-computer interface technologies, including intelligent interactive agents, natural speech interaction, and gesture-based interfaces. We describe our use of these technologies to implement a cost-effective, immersive experience on a public region in Second Life. We provision our artificial agent as a German Shepherd Dog avatar, with an external rules engine controlling its behavior and movement. To interact with the avatar, we implemented a natural language and gesture system allowing human avatars to use speech and physical gestures rather than interacting via a keyboard and mouse. The result is a system that allows multiple humans to interact naturally with AI avatars by playing games such as fetch with a flying disk and even practicing obedience exercises using voice and gesture: a natural-seeming day in the park.

  18. Integrated multimodal human-computer interface and augmented reality for interactive display applications

    NASA Astrophysics Data System (ADS)

    Vassiliou, Marius S.; Sundareswaran, Venkataraman; Chen, S.; Behringer, Reinhold; Tam, Clement K.; Chan, M.; Bangayan, Phil T.; McGee, Joshua H.

    2000-08-01

    We describe new systems for improved integrated multimodal human-computer interaction and augmented reality for a diverse array of applications, including future advanced cockpits, tactical operations centers, and others. We have developed an integrated display system featuring: speech recognition of multiple concurrent users equipped with both standard air-coupled microphones and novel throat-coupled sensors (developed at Army Research Labs for increased noise immunity); lip reading for improving speech recognition accuracy in noisy environments; three-dimensional spatialized audio for improved display of warnings, alerts, and other information; wireless, coordinated handheld-PC control of a large display; real-time display of data and inferences from wireless integrated networked sensors with on-board signal processing and discrimination; gesture control with disambiguated point-and-speak capability; head- and eye-tracking coupled with speech recognition for 'look-and-speak' interaction; and integrated tetherless augmented reality on a wearable computer. The various interaction modalities (speech recognition, 3D audio, eyetracking, etc.) are implemented as 'modality servers' in an Internet-based client-server architecture. Each modality server encapsulates and exposes commercial and research software packages, presenting a socket network interface that is abstracted to a high-level interface, minimizing both vendor dependencies and required changes on the client side as the server's technology improves.

  19. Is talking to an automated teller machine natural and fun?

    PubMed

    Chan, F Y; Khalid, H M

    Usability and affective issues of using automatic speech recognition technology to interact with an automated teller machine (ATM) are investigated in two experiments. The first uncovered the dialogue patterns of ATM users for the purpose of designing the user interface for a simulated speech ATM system. Applying the Wizard-of-Oz methodology and multiple mapping and word spotting techniques, the speech-driven ATM accommodates bilingual users of Bahasa Melayu and English. The second experiment evaluates the usability of a hybrid speech ATM, comparing it with a simulated manual ATM. The aim is to investigate how natural and fun talking to a speech ATM can be for first-time users. Subjects performed withdrawal and balance enquiry tasks. An ANOVA was performed on the usability and affective data. The results showed significant differences between systems in the ability to complete the tasks as well as in transaction errors. Performance was measured by the time taken by subjects to complete the task and the number of speech recognition errors that occurred. On the basis of user emotions, it can be said that the hybrid speech system enabled pleasurable interaction. Despite the limitations of speech recognition technology, users are set to talk to the ATM when it becomes available for public use.
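
    The "multiple mapping and word spotting" techniques can be pictured as a keyword-to-transaction map shared by both languages: the recognizer's token stream is scanned for command words, and Bahasa Melayu and English terms map onto the same ATM transaction. The sketch below is purely illustrative; its vocabulary and function are assumptions, not taken from the paper.

    ```python
    # Illustrative word spotting for a bilingual speech ATM (vocabulary invented).
    INTENTS = {
        "withdraw": "WITHDRAWAL", "keluarkan": "WITHDRAWAL",
        "balance": "BALANCE_ENQUIRY", "baki": "BALANCE_ENQUIRY",
    }

    def spot_intent(recognized_words: list[str]) -> str | None:
        """Return the first transaction whose keyword appears in the utterance."""
        for word in recognized_words:
            if word.lower() in INTENTS:
                return INTENTS[word.lower()]
        return None  # no command word spotted; re-prompt the user

    print(spot_intent("saya nak keluarkan duit".split()))  # WITHDRAWAL
    ```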

  20. Automatic Speech Recognition in Air Traffic Control: a Human Factors Perspective

    NASA Technical Reports Server (NTRS)

    Karlsson, Joakim

    1990-01-01

    The introduction of Automatic Speech Recognition (ASR) technology into the Air Traffic Control (ATC) system has the potential to improve overall safety and efficiency. However, because ASR technology is inherently a part of the man-machine interface between the user and the system, the human factors issues involved must be addressed. Here, some of the human factors problems are identified and related methods of investigation are presented. Research at M.I.T.'s Flight Transportation Laboratory is being conducted from a human factors perspective, focusing on intelligent parser design, presentation of feedback, error correction strategy design, and optimal choice of input modalities.

  1. Emerging technologies with potential for objectively evaluating speech recognition skills.

    PubMed

    Rawool, Vishakha Waman

    2016-01-01

    Work-related exposure to noise and other ototoxins can cause damage to the cochlea, synapses between the inner hair cells, the auditory nerve fibers, and higher auditory pathways, leading to difficulties in recognizing speech. Procedures designed to determine speech recognition scores (SRS) in an objective manner can be helpful in disability compensation cases where the worker claims to have poor speech perception due to exposure to noise or ototoxins. Such measures can also be helpful in determining SRS in individuals who cannot provide reliable responses to speech stimuli, including patients with Alzheimer's disease, traumatic brain injuries, and infants with and without hearing loss. Cost-effective neural monitoring hardware and software is being rapidly refined due to the high demand for neurogaming (games involving the use of brain-computer interfaces), health, and other applications. More specifically, two related advances in neuro-technology include relative ease in recording neural activity and availability of sophisticated analysing techniques. These techniques are reviewed in the current article and their applications for developing objective SRS procedures are proposed. Issues related to neuroaudioethics (ethics related to collection of neural data evoked by auditory stimuli including speech) and neurosecurity (preservation of a person's neural mechanisms and free will) are also discussed.

  2. Automated speech understanding: the next generation

    NASA Astrophysics Data System (ADS)

    Picone, J.; Ebel, W. J.; Deshmukh, N.

    1995-04-01

    Modern speech understanding systems merge interdisciplinary technologies from Signal Processing, Pattern Recognition, Natural Language, and Linguistics into a unified statistical framework. These systems, which have applications in a wide range of signal processing problems, represent a revolution in Digital Signal Processing (DSP). Once a field dominated by vector-oriented processors and linear algebra-based mathematics, DSP now encompasses systems that rely on sophisticated statistical models implemented using a complex software paradigm. Such systems are now capable of understanding continuous speech input for vocabularies of several thousand words in operational environments. The current generation of deployed systems, based on small vocabularies of isolated words, will soon be replaced by a new technology offering natural language access to vast information resources such as the Internet, and providing completely automated voice interfaces for mundane tasks such as travel planning and directory assistance.
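
    The "unified statistical framework" referred to above is conventionally summarized by the noisy-channel decoding rule; this is the textbook formulation, not an equation quoted from the paper:

    ```latex
    % Choose the word sequence W most probable given acoustic observations X;
    % P(X) does not depend on W and drops out of the maximization.
    \hat{W} = \operatorname*{arg\,max}_{W} P(W \mid X)
            = \operatorname*{arg\,max}_{W} \frac{P(X \mid W)\,P(W)}{P(X)}
            = \operatorname*{arg\,max}_{W}
              \underbrace{P(X \mid W)}_{\text{acoustic model}}\,
              \underbrace{P(W)}_{\text{language model}}
    ```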

  3. Network Speech Systems Technology Program

    NASA Astrophysics Data System (ADS)

    Weinstein, C. J.

    1980-09-01

    This report documents work performed during FY 1980 on the DCA-sponsored Network Speech Systems Technology Program. The areas of work reported are: (1) communication systems studies in Demand-Assignment Multiple Access (DAMA), voice/data integration, and adaptive routing, in support of the evolving Defense Communications System (DCS) and Defense Switched Network (DSN); (2) a satellite/terrestrial integration design study including the functional design of voice and data interfaces to interconnect terrestrial and satellite network subsystems; and (3) voice-conferencing efforts dealing with support of the Secure Voice and Graphics Conferencing (SVGC) Test and Evaluation Program. Progress in definition and planning of experiments for the Experimental Integrated Switched Network (EISN) is detailed separately in an FY 80 Experiment Plan Supplement.

  4. Designing a Humane Multimedia Interface for the Visually Impaired.

    ERIC Educational Resources Information Center

    Ghaoui, Claude; Mann, M.; Ng, Eng Huat

    2001-01-01

    Promotes the provision of interfaces that allow users to access most of the functionality of existing graphical user interfaces (GUI) using speech. Uses the design of a speech control tool that incorporates speech recognition and synthesis into existing packaged software such as Teletext, the Internet, or a word processor. (Contains 22…

  5. A multimodal interface for real-time soldier-robot teaming

    NASA Astrophysics Data System (ADS)

    Barber, Daniel J.; Howard, Thomas M.; Walter, Matthew R.

    2016-05-01

    Recent research and advances in robotics have led to the development of novel platforms leveraging new sensing capabilities for semantic navigation. As these systems become increasingly robust, they support highly complex commands beyond direct teleoperation and waypoint finding, facilitating a transition away from robots as tools to robots as teammates. Supporting future Soldier-robot teaming requires communication capabilities on par with those of human-human teams for successful integration of robots. Therefore, as robots increase in functionality, it is equally important that the interface between the Soldier and the robot advances as well. Multimodal communication (MMC) enables human-robot teaming through redundancy and levels of communication more robust than single-mode interaction. Commercial-off-the-shelf (COTS) technologies released in recent years for smartphones and gaming provide tools for the creation of portable interfaces incorporating MMC through the use of speech, gestures, and visual displays. However, for multimodal interfaces to be successfully used in the military domain, they must be able to classify speech and gestures and process natural language in real time with high accuracy. For the present study, a prototype multimodal interface supporting real-time interactions with an autonomous robot was developed. This device integrated COTS Automated Speech Recognition (ASR), a custom gesture recognition glove, and natural language understanding on a tablet. This paper presents performance results (e.g., response times, accuracy) of the integrated device when commanding an autonomous robot to perform reconnaissance and surveillance activities in an unknown outdoor environment.

  6. Research Operations for Advanced Warfighter Interface Technologies

    DTIC Science & Technology

    2009-06-01

  7. Speech Recognition as a Transcription Aid: A Randomized Comparison With Standard Transcription

    PubMed Central

    Mohr, David N.; Turner, David W.; Pond, Gregory R.; Kamath, Joseph S.; De Vos, Cathy B.; Carpenter, Paul C.

    2003-01-01

    Objective. Speech recognition promises to reduce information entry costs for clinical information systems. It is most likely to be accepted across an organization if physicians can dictate without concerning themselves with real-time recognition and editing; assistants can then edit and process the computer-generated document. Our objective was to evaluate the use of speech-recognition technology in a randomized controlled trial using our institutional infrastructure. Design. Clinical note dictations from physicians in two specialty divisions were randomized to either a standard transcription process or a speech-recognition process. Secretaries and transcriptionists also were assigned randomly to each of these processes. Measurements. The duration of each dictation was measured. The amount of time spent processing a dictation to yield a finished document also was measured. Secretarial and transcriptionist productivity, defined as hours of secretary work per minute of dictation processed, was determined for speech recognition and standard transcription. Results. Secretaries in the endocrinology division were 87.3% (confidence interval, 83.3%, 92.3%) as productive with the speech-recognition technology as implemented in this study as they were using standard transcription. Psychiatry transcriptionists and secretaries were similarly less productive. Author, secretary, and type of clinical note were significant (p < 0.05) predictors of productivity. Conclusion. When implemented in an organization with an existing document-processing infrastructure (which included training and interfaces of the speech-recognition editor with the existing document entry application), speech recognition did not improve the productivity of secretaries or transcriptionists. PMID:12509359
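
    The productivity metric above (hours of secretarial work per minute of dictation processed, so a larger value means less productive) can be made concrete with a toy computation; the input figures below are invented, chosen only so that the relative productivity lands near the reported 87.3%:

    ```python
    # Toy computation of the study's productivity metric (figures invented).
    def productivity(work_hours: float, dictation_minutes: float) -> float:
        """Hours of secretarial work per minute of dictation processed."""
        return work_hours / dictation_minutes

    standard = productivity(work_hours=10.0, dictation_minutes=600.0)
    speech_rec = productivity(work_hours=11.45, dictation_minutes=600.0)

    # The speech-recognition arm needs more hours for the same minutes of
    # dictation, so relative productivity comes out below 100%.
    print(f"relative productivity: {standard / speech_rec:.1%}")  # 87.3%
    ```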

  8. A real-time phoneme counting algorithm and application for speech rate monitoring.

    PubMed

    Aharonson, Vered; Aharonson, Eran; Raichlin-Levi, Katia; Sotzianu, Aviv; Amir, Ofer; Ovadia-Blechman, Zehava

    2017-03-01

    Adults who stutter can learn to control and improve their speech fluency by modifying their speaking rate. Existing speech therapy technologies can assist this practice by monitoring speaking rate and providing feedback to the patient, but cannot provide an accurate, quantitative measurement of speaking rate. Moreover, most technologies are too complex and costly to be used for home practice. We developed an algorithm and a smartphone application that monitor a patient's speaking rate in real time and provide user-friendly feedback to both patient and therapist. Our speaking rate computation is performed by a phoneme counting algorithm which implements spectral transition measure extraction to estimate phoneme boundaries. The algorithm is implemented in real time in a mobile application that presents its results in a user-friendly interface. The application incorporates two modes: one provides the patient with visual feedback of his/her speech rate for self-practice and another provides the speech therapist with recordings, speech rate analysis and tools to manage the patient's practice. The algorithm's phoneme counting accuracy was validated on ten healthy subjects who read a paragraph at slow, normal and fast paces, and was compared to manual counting of speech experts. Test-retest and intra-counter reliability were assessed. Preliminary results indicate differences of -4% to 11% between automatic and human phoneme counting. Differences were largest for slow speech. The application can thus provide reliable, user-friendly, real-time feedback for speaking rate control practice. Copyright © 2017 Elsevier Inc. All rights reserved.
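
    A minimal sketch of the phoneme-counting idea follows: a spectral transition measure (STM) is computed as the energy of the local regression slope of per-frame spectral features, and its peaks are taken as phoneme-boundary candidates. The window length, threshold, and feature choice here are assumptions, not the paper's parameters.

    ```python
    # Hedged sketch of STM-based phoneme counting (parameters are assumptions).
    import numpy as np

    def spectral_transition_measure(feats: np.ndarray, half_win: int = 2) -> np.ndarray:
        """STM per frame: squared norm of the regression slope of the feature
        trajectory over a window of +/- half_win frames."""
        taps = np.arange(-half_win, half_win + 1, dtype=float)
        denom = np.sum(taps ** 2)
        stm = np.zeros(len(feats))
        for t in range(half_win, len(feats) - half_win):
            slope = taps @ feats[t - half_win : t + half_win + 1] / denom
            stm[t] = float(slope @ slope)
        return stm

    def count_phonemes(stm: np.ndarray, thresh: float) -> int:
        """Count local maxima of the STM above a threshold (boundary candidates)."""
        peaks = (stm[1:-1] > stm[:-2]) & (stm[1:-1] > stm[2:]) & (stm[1:-1] > thresh)
        return int(peaks.sum())

    feats = np.random.randn(200, 13)        # stand-in for per-frame MFCC vectors
    stm = spectral_transition_measure(feats)
    # 200 frames at a 10 ms hop span 2 s, so divide the count by 2 for a rate.
    print(count_phonemes(stm, thresh=stm.mean()) / 2.0, "phonemes/s")
    ```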

  9. A study of speech interfaces for the vehicle environment.

    DOT National Transportation Integrated Search

    2013-05-01

    Over the past few years, there has been a shift in automotive human machine interfaces from : visual-manual interactions (pushing buttons and rotating knobs) to speech interaction. In terms of : distraction, the industry views speech interaction as a...

  10. Improving medical imaging report turnaround times: the role of technology.

    PubMed

    Marquez, Luis O; Stewart, Howard

    2005-01-01

    At Southern Ohio Medical Center (SOMC), the medical imaging department and the radiologists expressed a strong desire to improve workflow. This desire was a major motivating factor toward implementing a new radiology information system (RIS) and speech recognition technology. The need to monitor workflow in real time and to evaluate productivity and resources necessitated that a new solution be found. A decision was made to roll out the new RIS product and speech recognition together, to make the best use of the resources needed to interface and implement the new solution. Prior to implementation of the new RIS, the medical imaging department operated in a conventional electronic-order-entry-to-paper-request manner. The paper request followed the study through exam completion to the radiologist. SOMC entered into a contract with its PACS vendor to participate in beta testing and clinical trials for a new RIS product for the US market. Backup plans were created in the event the product failed to function as planned, either during the beta testing period or during clinical trials. The last piece of the technology puzzle to improve report turnaround time was voice recognition technology. Speech recognition enhanced the RIS technology as soon as it was implemented. The results show that the project has been a success. The new RIS, combined with speech recognition and the PACS, makes for a very effective solution to patient, exam, and results management in the medical imaging department.

  11. Speech-based E-mail and driver behavior: effects of an in-vehicle message system interface.

    PubMed

    Jamson, A Hamish; Westerman, Stephen J; Hockey, G Robert J; Carsten, Oliver M J

    2004-01-01

    As mobile office technology becomes more advanced, drivers have increased opportunity to process information "on the move." Although speech-based interfaces can minimize direct interference with driving, the cognitive demands associated with such systems may still cause distraction. We studied the effects on driving performance of an in-vehicle simulated "E-mail" message system; E-mails were either system controlled or driver controlled. A high-fidelity, fixed-base driving simulator was used to test 19 participants on a car-following task in virtual traffic scenarios varying in driving demand. Drivers compensated for the secondary task by adopting longer headways but showed reduced anticipation of braking requirements and shorter time to collision. Drivers were also less reactive when processing E-mails, demonstrated by a reduction in steering wheel inputs. In most circumstances, there were advantages in providing drivers with control over when E-mails were opened. However, during periods without E-mail interaction in demanding traffic scenarios, drivers showed reduced braking anticipation. This may be a result of increased cognitive costs associated with the decision-making process when using a driver-controlled interface, when the task of scheduling E-mail acceptance is added to those of driving and E-mail response. Actual or potential applications of this research include the design of speech-based in-vehicle messaging systems.
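
    Two of the dependent measures named above, headway and time to collision, have standard car-following definitions that a short sketch makes explicit (these are the conventional formulas, not necessarily the paper's exact operationalization):

    ```python
    # Standard car-following measures (conventional definitions).
    def time_headway(gap_m: float, follower_speed_mps: float) -> float:
        """Time headway: gap to the lead vehicle divided by follower speed."""
        return gap_m / follower_speed_mps

    def time_to_collision(gap_m: float, closing_speed_mps: float) -> float | None:
        """TTC: gap divided by closing speed; undefined when the gap is opening."""
        return gap_m / closing_speed_mps if closing_speed_mps > 0 else None

    print(time_headway(30.0, 25.0))       # 1.2 s
    print(time_to_collision(30.0, 5.0))   # 6.0 s
    ```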

  12. An intelligent multi-media human-computer dialogue system

    NASA Technical Reports Server (NTRS)

    Neal, J. G.; Bettinger, K. E.; Byoun, J. S.; Dobes, Z.; Thielman, C. Y.

    1988-01-01

    Sophisticated computer systems are being developed to assist in the human decision-making process for very complex tasks performed under stressful conditions. The human-computer interface is a critical factor in these systems. The human-computer interface should be simple and natural to use, require a minimal learning period, assist the user in accomplishing his task(s) with a minimum of distraction, present output in a form that best conveys information to the user, and reduce cognitive load for the user. In pursuit of this ideal, the Intelligent Multi-Media Interfaces project is devoted to the development of interface technology that integrates speech, natural language text, graphics, and pointing gestures for human-computer dialogues. The objective of the project is to develop interface technology that uses the media/modalities intelligently in a flexible, context-sensitive, and highly integrated manner modelled after the manner in which humans converse in simultaneous coordinated multiple modalities. As part of the project, a knowledge-based interface system, called CUBRICON (CUBRC Intelligent CONversationalist) is being developed as a research prototype. The application domain being used to drive the research is that of military tactical air control.

  13. Human Factors Society, Annual Meeting, 35th, San Francisco, CA, Sept. 2-6, 1991, Proceedings. Vols. 1 2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    These proceedings discuss human factor issues related to aerospace systems, aging, communications, computer systems, consumer products, education and forensic topics, environmental design, industrial ergonomics, international technology transfer, organizational design and management, personality and individual differences in human performance, safety, system development, test and evaluation, training, and visual performance. Particular attention is given to HUDs, attitude indicators, and sensor displays; human factors of space exploration; behavior and aging; the design and evaluation of phone-based interfaces; knowledge acquisition and expert systems; handwriting, speech, and other input techniques; interface design for text, numerics, and speech; and human factor issues in medicine. Also discussed are cumulative trauma disorders, industrial safety, evaluative techniques for automation impacts on the human operators, visual issues in training, and interpreting and organizing human factor concepts and information.

  14. The Next Wave: Humans, Computers, and Redefining Reality

    NASA Technical Reports Server (NTRS)

    Little, William

    2018-01-01

    The Augmented/Virtual Reality (AVR) Lab at KSC is dedicated to "exploration into the growing computer fields of Extended Reality and the Natural User Interface (it is) a proving ground for new technologies that can be integrated into future NASA projects and programs." The topics of Human Computer Interface, Human Computer Interaction, Augmented Reality, Virtual Reality, and Mixed Reality are defined; examples of work being done in these fields in the AVR Lab are given. Current and future work in Computer Vision, Speech Recognition, and Artificial Intelligence is also outlined.

  15. Designing speech-based interfaces for telepresence robots for people with disabilities.

    PubMed

    Tsui, Katherine M; Flynn, Kelsey; McHugh, Amelia; Yanco, Holly A; Kontak, David

    2013-06-01

    People with cognitive and/or motor impairments may benefit from using telepresence robots to engage in social activities. To date, these robots, their user interfaces, and their navigation behaviors have not been designed for operation by people with disabilities. We conducted an experiment in which participants (n=12) used a telepresence robot in a scavenger hunt task to determine how they would use speech to command the robot. Based upon the results, we present design guidelines for speech-based interfaces for telepresence robots.

  16. Design Foundations for Content-Rich Acoustic Interfaces: Investigating Audemes as Referential Non-Speech Audio Cues

    ERIC Educational Resources Information Center

    Ferati, Mexhid Adem

    2012-01-01

    To access interactive systems, blind and visually impaired users can leverage their auditory senses by using non-speech sounds. The current structure of non-speech sounds, however, is geared toward conveying user interface operations (e.g., opening a file) rather than large theme-based information (e.g., a history passage) and, thus, is ill-suited…

  17. Projection Mapping User Interface for Disabled People

    PubMed Central

    Gelšvartas, Julius; Simutis, Rimvydas; Maskeliūnas, Rytis

    2018-01-01

    Difficulty in communicating is one of the key challenges for people suffering from severe motor and speech disabilities. Often such a person can communicate and interact with the environment only by using assistive technologies. This paper presents a multifunctional user interface designed to improve communication efficiency and personal independence. The main component of this interface is a projection mapping technique used to highlight objects in the environment. Projection mapping makes it possible to create a natural augmented reality information presentation method. The user interface combines a depth sensor and a projector to create a camera-projector system. We provide a detailed description of the camera-projector system calibration procedure. The described system performs tabletop object detection and automatic projection mapping. Multiple user input modalities have been integrated into the multifunctional user interface. Such a system can be adapted to the needs of people with various disabilities. PMID:29686827
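
    One common step in calibrating a camera-projector pair is estimating the planar homography that maps projector pixels to camera pixels from matched points, e.g., corners of a projected pattern. The sketch below illustrates only that step; the matched coordinates are invented, and the paper's full procedure also involves the depth sensor.

    ```python
    # Hedged sketch of one camera-projector calibration step (points invented).
    import numpy as np
    import cv2  # OpenCV

    # Where four known projector pixels were detected in the camera image.
    projector_pts = np.array([[0, 0], [1024, 0], [1024, 768], [0, 768]],
                             dtype=np.float32)
    camera_pts = np.array([[112, 95], [905, 120], [890, 702], [98, 680]],
                          dtype=np.float32)

    H, _ = cv2.findHomography(projector_pts, camera_pts)

    # Map the projector's center pixel into camera coordinates to check the fit.
    center = np.array([[[512.0, 384.0]]], dtype=np.float32)
    print(cv2.perspectiveTransform(center, H))
    ```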

  18. Speech-based interaction with in-vehicle computers: the effect of speech-based e-mail on drivers' attention to the roadway.

    PubMed

    Lee, J D; Caven, B; Haake, S; Brown, T L

    2001-01-01

    As computer applications for cars emerge, a speech-based interface offers an appealing alternative to the visually demanding direct manipulation interface. However, speech-based systems may pose cognitive demands that could undermine driving safety. This study used a car-following task to evaluate how a speech-based e-mail system affects drivers' response to the periodic braking of a lead vehicle. The study included 24 drivers between the ages of 18 and 24 years. A baseline condition with no e-mail system was compared with a simple and a complex e-mail system in both simple and complex driving environments. The results show a 30% (310 ms) increase in reaction time when the speech-based system is used. Subjective workload ratings and probe questions also indicate that speech-based interaction introduces a significant cognitive load, which was highest for the complex e-mail system. These data show that a speech-based interface is not a panacea that eliminates the potential distraction of in-vehicle computers. Actual or potential applications of this research include design of in-vehicle information systems and evaluation of their contributions to driver distraction.

  19. Natural Language Based Multimodal Interface for UAV Mission Planning

    NASA Technical Reports Server (NTRS)

    Chandarana, Meghan; Meszaros, Erica L.; Trujillo, Anna; Allen, B. Danette

    2017-01-01

    As the number of viable applications for unmanned aerial vehicle (UAV) systems increases at an exponential rate, interfaces that reduce the reliance on highly skilled engineers and pilots must be developed. Recent work aims to make use of common human communication modalities such as speech and gesture. This paper explores a multimodal natural language interface that uses a combination of speech and gesture input modalities to build complex UAV flight paths by defining trajectory segment primitives. Gesture inputs are used to define the general shape of a segment while speech inputs provide additional geometric information needed to fully characterize a trajectory segment. A user study is conducted in order to evaluate the efficacy of the multimodal interface.
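
    The division of labor described above, where a gesture supplies a segment's shape and speech supplies its geometry, suggests a simple fusion data structure. The sketch below is hypothetical; the primitive names, slots, and units are invented, not taken from the paper.

    ```python
    # Hypothetical fusion of gesture (shape) and speech (geometry) channels.
    from dataclasses import dataclass

    @dataclass
    class TrajectorySegment:
        shape: str           # from the gesture channel, e.g. "line", "arc"
        length_m: float      # from speech, e.g. "fifty meters"
        heading_deg: float   # from speech, e.g. "heading ninety"

    def fuse(gesture_label: str, speech_slots: dict) -> TrajectorySegment:
        """Combine a recognized gesture with parsed speech slots into a segment."""
        return TrajectorySegment(shape=gesture_label,
                                 length_m=speech_slots["length_m"],
                                 heading_deg=speech_slots["heading_deg"])

    path = [fuse("line", {"length_m": 50.0, "heading_deg": 90.0}),
            fuse("arc", {"length_m": 20.0, "heading_deg": 180.0})]
    print(path)  # an ordered flight path built from fused segment primitives
    ```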

  1. [Neurophysiological Foundations and Practical Realizations of the Brain-Machine Interfaces the Technology in Neurological Rehabilitation].

    PubMed

    Kaplan, A Ya

    2016-01-01

    Brain-computer interface (BCI) technology based on the registration and interpretation of the EEG has recently become one of the most popular developments in neuroscience and psychophysiology. This is due not only to the intended future use of these technologies in many areas of practical human activity, but also to the fact that BCI is a completely new paradigm in psychophysiology, allowing researchers to test hypotheses about the capacity of the human brain to develop skills for interacting with the outside world without the mediation of the motor system, i.e., solely through voluntary modulation of EEG generators. This paper examines the theoretical and experimental basis, the current state, and the development prospects of training, communication, and assistive complexes based on BCI, which can be controlled without muscular effort via mental commands detected in the EEG of patients with severely impaired speech and motor function.

  2. A speech-controlled environmental control system for people with severe dysarthria.

    PubMed

    Hawley, Mark S; Enderby, Pam; Green, Phil; Cunningham, Stuart; Brownsell, Simon; Carmichael, James; Parker, Mark; Hatzis, Athanassios; O'Neill, Peter; Palmer, Rebecca

    2007-06-01

    Automatic speech recognition (ASR) can provide a rapid means of controlling electronic assistive technology. Off-the-shelf ASR systems function poorly for users with severe dysarthria because of the increased variability of their articulations. We have developed a limited vocabulary speaker dependent speech recognition application which has greater tolerance to variability of speech, coupled with a computerised training package which assists dysarthric speakers to improve the consistency of their vocalisations and provides more data for recogniser training. These applications, and their implementation as the interface for a speech-controlled environmental control system (ECS), are described. The results of field trials to evaluate the training program and the speech-controlled ECS are presented. The user-training phase increased the recognition rate from 88.5% to 95.4% (p<0.001). Recognition rates were good for people with even the most severe dysarthria in everyday usage in the home (mean word recognition rate 86.9%). Speech-controlled ECS were less accurate (mean task completion accuracy 78.6% versus 94.8%) but were faster to use than switch-scanning systems, even taking into account the need to repeat unsuccessful operations (mean task completion time 7.7s versus 16.9s, p<0.001). It is concluded that a speech-controlled ECS is a viable alternative to switch-scanning systems for some people with severe dysarthria and would lead, in many cases, to more efficient control of the home.

  3. TRECVID: the utility of a content-based video retrieval evaluation

    NASA Astrophysics Data System (ADS)

    Hauptmann, Alexander G.

    2006-01-01

    TRECVID, an annual retrieval evaluation benchmark organized by NIST, encourages research in information retrieval from digital video. TRECVID benchmarking covers both interactive and manual searching by end users, as well as the benchmarking of some supporting technologies including shot boundary detection, extraction of semantic features, and the automatic segmentation of TV news broadcasts. Evaluations done in the context of the TRECVID benchmarks show that generally, speech transcripts and annotations provide the single most important clue for successful retrieval. However, automatically finding the individual images is still a tremendous and unsolved challenge. The evaluations repeatedly found that none of the multimedia analysis and retrieval techniques provide a significant benefit over retrieval using only textual information such as from automatic speech recognition transcripts or closed captions. In interactive systems, we do find significant differences among the top systems, indicating that interfaces can make a huge difference for effective video/image search. For interactive tasks, efficient interfaces require few key clicks but display large numbers of images for visual inspection by the user. The text search finds the right context region in the video in general, but to select specific relevant images we need good interfaces to easily browse the storyboard pictures. In general, TRECVID has motivated the video retrieval community to be honest about what we don't know how to do well (sometimes through painful failures), and has focused the community on the actual task of video retrieval, as opposed to flashy demos based on technological capabilities.

  4. Initial constructs for patient-centered outcome measures to evaluate brain-computer interfaces.

    PubMed

    Andresen, Elena M; Fried-Oken, Melanie; Peters, Betts; Patrick, Donald L

    2016-10-01

    The authors describe preliminary work toward the creation of patient-centered outcome (PCO) measures to evaluate brain-computer interface (BCI) as an assistive technology (AT) for individuals with severe speech and physical impairments (SSPI). In Phase 1, 591 items from 15 existing measures were mapped to the International Classification of Functioning, Disability and Health (ICF). In Phase 2, qualitative interviews were conducted with eight people with SSPI and seven caregivers. Resulting text data were coded in an iterative analysis. Most items (79%) were mapped to the ICF environmental domain; over half (53%) were mapped to more than one domain. The ICF framework was well suited for mapping items related to body functions and structures, but less so for items in other areas, including personal factors. Two constructs emerged from qualitative data: quality of life (QOL) and AT. Component domains and themes were identified for each. Preliminary constructs, domains and themes were generated for future PCO measures relevant to BCI. Existing instruments are sufficient for initial items but do not adequately match the values of people with SSPI and their caregivers. Field methods for interviewing people with SSPI were successful, and support the inclusion of these individuals in PCO research. Implications for Rehabilitation Adapted interview methods allow people with severe speech and physical impairments to participate in patient-centered outcomes research. Patient-centered outcome measures are needed to evaluate the clinical implementation of brain-computer interface as an assistive technology.

  5. Intelligent interfaces for expert systems

    NASA Technical Reports Server (NTRS)

    Villarreal, James A.; Wang, Lui

    1988-01-01

    Vital to the success of an expert system is an interface to the user which performs intelligently. A generic intelligent interface is being developed for expert systems. This intelligent interface was developed around the in-house developed Expert System for the Flight Analysis System (ESFAS). The Flight Analysis System (FAS) is comprised of 84 configuration-controlled FORTRAN subroutines that are used in the preflight analysis of the space shuttle. In order to use the FAS proficiently, a person must be knowledgeable in flight mechanics and the procedures involved in deploying a certain payload, and must have an overall understanding of the FAS. ESFAS, still in its developmental stage, is taking into account much of this knowledge. The generic intelligent interface involves the integration of a speech recognizer and synthesizer, a preparser, and a natural language parser with ESFAS. The speech recognizer being used is capable of recognizing 1000 words of connected speech. The natural language parser is a commercial software package which uses caseframe instantiation in processing the streams of words from the speech recognizer or the keyboard. The system's configuration is described along with its capabilities and drawbacks.
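
    Caseframe instantiation, the parsing technique the abstract attributes to the commercial package, can be pictured as a toy frame filler: a known verb selects a frame whose case slots are filled from the surrounding words. Everything in this sketch, including the frame inventory, slot names, and filling rule, is invented for illustration.

    ```python
    # Toy caseframe instantiation (frames and filling rule invented).
    CASEFRAMES = {
        "deploy": {"agent": None, "object": None, "time": None},
    }

    def instantiate(words: list[str]) -> dict | None:
        """Fill the frame triggered by the first known verb in the word stream."""
        for i, w in enumerate(words):
            if w in CASEFRAMES:
                frame = dict(CASEFRAMES[w])
                frame["object"] = " ".join(words[i + 1:]) or None  # naive slot fill
                return {"verb": w, **frame}
        return None

    print(instantiate("deploy the payload".split()))
    # {'verb': 'deploy', 'agent': None, 'object': 'the payload', 'time': None}
    ```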

  6. Speech Recognition for A Digital Video Library.

    ERIC Educational Resources Information Center

    Witbrock, Michael J.; Hauptmann, Alexander G.

    1998-01-01

    Production of the meta-data supporting the Informedia Digital Video Library interface is automated using techniques derived from artificial intelligence research. Speech recognition and natural-language processing, information retrieval, and image analysis are applied to produce an interface that helps users locate information and navigate more…

  7. Computer interfaces for the visually impaired

    NASA Technical Reports Server (NTRS)

    Higgins, Gerry

    1991-01-01

    Information access via computer terminals extends to blind and low-vision persons employed in many technical and nontechnical disciplines. Two aspects of providing computer technology for persons with a vision-related handicap are detailed. First, research was conducted into the most effective means of integrating existing adaptive technologies into information systems, in order to combine off-the-shelf products with adaptive equipment into cohesive, integrated information processing systems. Details are included that describe the type of functionality required in software to facilitate its incorporation into a speech and/or braille system. The second aspect is research into providing audible and tactile access to graphics-based interfaces. Parameters are included for the design and development of the Mercator Project, which will develop a prototype system for audible access to graphics-based interfaces. The system is being built within the public-domain architecture of X Windows to show that it is possible to provide access to text-based applications within a graphical environment. This information will be valuable to suppliers of ADP equipment, since new legislation requires manufacturers to provide electronic access for the visually impaired.

  8. Key considerations in designing a speech brain-computer interface.

    PubMed

    Bocquelet, Florent; Hueber, Thomas; Girin, Laurent; Chabardès, Stéphan; Yvert, Blaise

    2016-11-01

    Restoring communication in case of aphasia is a key challenge for neurotechnologies. To this end, brain-computer strategies can be envisioned to allow artificial speech synthesis from the continuous decoding of neural signals underlying speech imagination. Such speech brain-computer interfaces do not exist yet and their design should consider three key choices that need to be made: the choice of appropriate brain regions to record neural activity from, the choice of an appropriate recording technique, and the choice of a neural decoding scheme in association with an appropriate speech synthesis method. These key considerations are discussed here in light of (1) the current understanding of the functional neuroanatomy of cortical areas underlying overt and covert speech production, (2) the available literature making use of a variety of brain recording techniques to better characterize and address the challenge of decoding cortical speech signals, and (3) the different speech synthesis approaches that can be considered depending on the level of speech representation (phonetic, acoustic or articulatory) envisioned to be decoded at the core of a speech BCI paradigm. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.

  9. Intentional Voice Command Detection for Trigger-Free Speech Interface

    NASA Astrophysics Data System (ADS)

    Obuchi, Yasunari; Sumiyoshi, Takashi

    In this paper we introduce a new framework of audio processing, which is essential to achieve a trigger-free speech interface for home appliances. If the speech interface works continually in real environments, it must extract occasional voice commands and reject everything else. It is extremely important to reduce the number of false alarms because the number of irrelevant inputs is much larger than the number of voice commands even for heavy users of appliances. The framework, called Intentional Voice Command Detection, is based on voice activity detection, but enhanced by various speech/audio processing techniques such as emotion recognition. The effectiveness of the proposed framework is evaluated using a newly-collected large-scale corpus. The advantages of combining various features were tested and confirmed, and the simple LDA-based classifier demonstrated acceptable performance. The effectiveness of various methods of user adaptation is also discussed.
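
    The "simple LDA-based classifier" can be sketched with scikit-learn. Everything concrete below is an invented stand-in: the three per-utterance features (voiced-frame ratio, pitch variance, mean energy) and the synthetic class distributions are not the paper's feature set or corpus. The threshold step illustrates biasing the detector against false alarms, which the framework treats as the dominant error cost.

        # Minimal sketch of an LDA-based intentional-voice-command detector.
        # Features and data are illustrative stand-ins, not the paper's corpus.
        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

        rng = np.random.default_rng(0)

        # Hypothetical features: [voiced-frame ratio, pitch variance, mean energy]
        commands = rng.normal(loc=[0.8, 0.2, 0.7], scale=0.1, size=(200, 3))
        other = rng.normal(loc=[0.4, 0.5, 0.3], scale=0.2, size=(2000, 3))  # non-commands dominate
        X = np.vstack([commands, other])
        y = np.array([1] * len(commands) + [0] * len(other))

        clf = LinearDiscriminantAnalysis().fit(X, y)

        # Bias the decision threshold toward rejection: false alarms are costlier
        # than misses when the interface listens continually.
        scores = clf.decision_function(X)
        threshold = np.quantile(scores[y == 0], 0.99)  # pass only the top 1% of non-command scores
        predicted_command = scores > threshold
        print(predicted_command.mean())  # fraction of inputs accepted as commands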

  10. MARTI: man-machine animation real-time interface

    NASA Astrophysics Data System (ADS)

    Jones, Christian M.; Dlay, Satnam S.

    1997-05-01

    The research introduces MARTI (man-machine animation real-time interface) for the realization of natural human-machine interfacing. The system uses simple vocal sound-tracks of human speakers to provide lip synchronization of computer graphical facial models. We present novel research in a number of engineering disciplines, which include speech recognition, facial modeling, and computer animation. This interdisciplinary research utilizes the latest hybrid connectionist/hidden Markov model speech recognition system to provide very accurate phone recognition and timing for speaker-independent continuous speech, and expands on knowledge from the animation industry in the development of accurate facial models and automated animation. The research has many real-world applications, which include the provision of a highly accurate and 'natural' man-machine interface to assist user interactions with computer systems and communication with one another using human idiosyncrasies; a complete special effects and animation toolbox providing automatic lip synchronization without the normal constraints of head-sets, joysticks, and skilled animators; compression of video data to well below standard telecommunication channel bandwidth for video communications and multi-media systems; assisting speech training and aids for the handicapped; and facilitating player interaction for 'video gaming' and 'virtual worlds.' MARTI has introduced a level of realism to man-machine interfacing and special effects animation that was previously unseen.

  11. The Sound-to-Speech Translations Utilizing Graphics Mediation Interface for Students with Severe Handicaps. Final Report.

    ERIC Educational Resources Information Center

    Brown, Carrie; And Others

    This final report describes activities and outcomes of a research project on a sound-to-speech translation system utilizing a graphic mediation interface for students with severe disabilities. The STS/Graphics system is a voice recognition, computer-based system designed to allow individuals with mental retardation and/or severe physical…

  12. The Use of Spatialized Speech in Auditory Interfaces for Computer Users Who Are Visually Impaired

    ERIC Educational Resources Information Center

    Sodnik, Jaka; Jakus, Grega; Tomazic, Saso

    2012-01-01

    Introduction: This article reports on a study that explored the benefits and drawbacks of using spatially positioned synthesized speech in auditory interfaces for computer users who are visually impaired (that is, are blind or have low vision). The study was a practical application of such systems--an enhanced word processing application compared…

  13. The role of voice input for human-machine communication.

    PubMed Central

    Cohen, P R; Oviatt, S L

    1995-01-01

    Optimism is growing that the near future will witness rapid growth in human-computer interaction using voice. System prototypes have recently been built that demonstrate speaker-independent real-time speech recognition, and understanding of naturally spoken utterances with vocabularies of 1000 to 2000 words, and larger. Already, computer manufacturers are building speech recognition subsystems into their new product lines. However, before this technology can be broadly useful, a substantial knowledge base is needed about human spoken language and performance during computer-based spoken interaction. This paper reviews application areas in which spoken interaction can play a significant role, assesses potential benefits of spoken interaction with machines, and compares voice with other modalities of human-computer interaction. It also discusses information that will be needed to build a firm empirical foundation for the design of future spoken and multimodal interfaces. Finally, it argues for a more systematic and scientific approach to investigating spoken input and performance with future language technology. PMID:7479803

  14. Development and Utility of Automatic Language Processing Technologies. Volume 2

    DTIC Science & Technology

    2014-04-01

    speech for each word using the existing Treetagger program. 3. Stem the word using the revised RevP stemmer, "RussianStemmer2013.java" (see Section...KBaselineParaphrases2013.java," with the paraphrase table and a LM built from the TED training data. Information from the LM was called using the new utility query_interp...GATE/Java Annotation Patterns Engine (JAPE) interface and on transliteration of Chinese named entities. Available Linguistic Data Consortium (LDC

  15. An overview of artificial intelligence and robotics. Volume 1: Artificial intelligence. Part B: Applications

    NASA Technical Reports Server (NTRS)

    Gevarter, W. B.

    1983-01-01

    Artificial Intelligence (AI) is an emerging technology that has recently attracted considerable attention. Many applications are now under development. This report, Part B of a three part report on AI, presents overviews of the key application areas: Expert Systems, Computer Vision, Natural Language Processing, Speech Interfaces, and Problem Solving and Planning. The basic approaches to such systems, the state-of-the-art, existing systems and future trends and expectations are covered.

  16. Adaptive multimodal interaction in mobile augmented reality: A conceptual framework

    NASA Astrophysics Data System (ADS)

    Abidin, Rimaniza Zainal; Arshad, Haslina; Shukri, Saidatul A'isyah Ahmad

    2017-10-01

    Recently, Augmented Reality (AR) has become an emerging technology in many mobile applications. Mobile AR is defined as a medium for displaying information merged with the real-world environment in a single view. There are four main types of mobile augmented reality interfaces, one of which is the multimodal interface. A multimodal interface processes two or more combined user input modes (such as speech, pen, touch, manual gesture, gaze, and head and body movements) in a coordinated manner with multimedia system output. Many frameworks have been proposed to guide designers in developing multimodal applications, including in augmented reality environments, but there has been little work reviewing frameworks for adaptive multimodal interfaces in mobile augmented reality. The main goal of this study is to propose a conceptual framework illustrating the adaptive multimodal interface in mobile augmented reality. We reviewed several frameworks that have been proposed in the fields of multimodal interfaces, adaptive interfaces and augmented reality, analyzed the components of those frameworks, and assessed which of them can be applied on mobile devices. Our framework can be used as a guide for designers and developers building a mobile AR application with an adaptive multimodal interface.

  17. Performing speech recognition research with hypercard

    NASA Technical Reports Server (NTRS)

    Shepherd, Chip

    1993-01-01

    The purpose of this paper is to describe a HyperCard-based system for performing speech recognition research and to instruct Human Factors professionals on how to use the system to obtain detailed data about the user interface of a prototype speech recognition application.

  18. Compressed Speech Technology: Implications for Learning and Instruction.

    ERIC Educational Resources Information Center

    Sullivan, LeRoy L.

    This paper first traces the historical development of speech compression technology, which has made it possible to alter the spoken rate of a pre-recorded message without excessive distortion. Terms used to describe techniques employed as the technology evolved are discussed, including rapid speech, rate altered speech, cut-and-spliced speech, and…

  19. An acoustic feature-based similarity scoring system for speech rehabilitation assistance.

    PubMed

    Syauqy, Dahnial; Wu, Chao-Min; Setyawati, Onny

    2016-08-01

    The purpose of this study was to develop a tool to assist speech therapy and rehabilitation, focused on automatic scoring based on the comparison of a patient's speech with normal speech on several aspects, including pitch, vowels, voiced-unvoiced segments, strident fricatives and sound intensity. The pitch estimation employed a cepstrum-based algorithm for its robustness; the vowel classification used a multilayer perceptron (MLP) to classify vowels from pitch and formants; and the strident fricative detection was based on the intensity and location of the major spectral peak and the presence of pitch in the segment. To evaluate the performance of the system, this study analyzed eight patients' speech recordings (four males, four females; 4-58 years old), which had been recorded in a previous study in cooperation with Taipei Veterans General Hospital and Taoyuan General Hospital. The experimental results for the pitch algorithm showed that the cepstrum method had a gross pitch error of 5.3% over a total of 2086 frames. For the vowel classification algorithm, the MLP provided 93% accuracy for men, 87% for women and 84% for children. Overall, 156 of the tool's grading results (81%) were consistent with 192 audio and visual observations made by four experienced respondents. Implications for Rehabilitation: Difficulties in communication may limit a person's ability to transfer and exchange information. The fact that speech is one of the primary means of communication has motivated the need for speech diagnosis and rehabilitation. Advances in computer-assisted speech therapy (CAST) improve the quality and time efficiency of the diagnosis and treatment of speech disorders. The present study developed a tool to assist speech therapy and rehabilitation that provides a simple interface, letting the assessment be done even by patients themselves without particular knowledge of speech processing, while also providing deeper analysis of the speech that can be useful to the speech therapist.
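
    The cepstrum-based pitch estimator at the core of the scoring can be sketched in a few lines of numpy: take the log-magnitude spectrum of a frame, transform back to the quefrency domain, and locate the peak within the plausible pitch range. The frame length, window, and search range below are assumptions for illustration, not the study's exact parameters.

        # Minimal sketch of cepstrum-based pitch estimation; parameters are assumed.
        import numpy as np

        def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
            """Estimate the pitch (Hz) of one speech frame via the real cepstrum."""
            spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
            log_mag = np.log(np.abs(spectrum) + 1e-10)
            cepstrum = np.fft.irfft(log_mag)
            # Voicing shows up as a cepstral peak at the quefrency of the pitch period.
            qmin, qmax = int(fs / fmax), int(fs / fmin)
            peak = qmin + np.argmax(cepstrum[qmin:qmax])
            return fs / peak

        # Example: a synthetic glottal-pulse-like train with a ~107-sample period.
        fs = 16000
        frame = np.zeros(1024)
        frame[::107] = 1.0                          # ~150 Hz impulse train
        frame += 0.01 * np.random.default_rng(0).normal(size=1024)
        print(round(cepstral_pitch(frame, fs)))     # ~150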

  20. APEX/SPIN: a free test platform to measure speech intelligibility.

    PubMed

    Francart, Tom; Hofmann, Michael; Vanthornhout, Jonas; Van Deun, Lieselot; van Wieringen, Astrid; Wouters, Jan

    2017-02-01

    Measuring speech intelligibility in quiet and noise is important in clinical practice and research. An easy-to-use free software platform for conducting speech tests is presented, called APEX/SPIN. The APEX/SPIN platform allows the use of any speech material in combination with any noise. A graphical user interface provides control over a large range of parameters, such as the number of loudspeakers, the signal-to-noise ratio and the parameters of the procedure. An easy-to-use graphical interface is provided for calibration and storage of calibration values. To validate the platform, perception of words in quiet and of sentences in noise was measured both with APEX/SPIN and with an audiometer and CD player, the conventional setup in current clinical practice. Five normal-hearing listeners participated in the experimental evaluation. Speech perception results were similar for the APEX/SPIN platform and conventional procedures. APEX/SPIN is a freely available and open-source platform that allows the administration of all kinds of custom speech perception tests and procedures.

  1. Speech-generating devices: effectiveness of interface design-a comparative study of autism spectrum disorders.

    PubMed

    Chen, Chien-Hsu; Wang, Chuan-Po; Lee, I-Jui; Su, Chris Chun-Chin

    2016-01-01

    We analyzed the efficacy of the interface design of speech generating devices on three non-verbal adolescents with autism spectrum disorder (ASD), in hopes of improving their on-campus communication and addressing their cognitive disabilities. The intervention program was created based on their social and communication needs in school. Two operating interfaces were designed and compared: the Hierarchical Relating Menu and the Pie Abbreviation-Expansion Menu. The experiment used an ABCACB multiple-treatment reversal design. The test items included: (1) accuracy of operating identification; (2) interface operation in response to questions; (3) degree of independent completion. Each of these three items improved with both intervention interfaces. The children were able to operate the interfaces skillfully and respond to questions accurately, which evidenced the effectiveness of the interfaces. We conclude that both interfaces are efficacious enough to help nonverbal children with ASD at different levels.

  2. Security for Telecommuting and Broadband Communications: Recommendations of the National Institute of Standards and Technology

    DTIC Science & Technology

    2002-08-01

    Aware http://www.lavasoftusa.com/ Adware, Alexa 1.0-5.0, Aureate 1.0-3.0, Comet Cursor 1.0-2.0, Cydoor, Doubleclick, DSSAgent, EverAd, EzUla...Internet. Known as "voice over IP" (VOIP), the services convert speech to Internet messages and transmit them to a facility that interfaces with the...reporting, handling, prevention, and recognition. National Information Assurance Partnership (NIAP) - http://www.niap.nist.gov/ NIAP is a U.S

  3. How Speech Communication Training Interfaces with Public Relations Training.

    ERIC Educational Resources Information Center

    Bosley, Phyllis B.

    Speech communication training is a valuable asset for those entering the public relations (PR) field. This notion is reinforced by the 1987 "Design for Undergraduate Public Relations Education," a guide for implementing speech communication courses within a public relations curriculum, and also in the incorporation of oral communication training…

  4. Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces

    PubMed Central

    Bocquelet, Florent; Hueber, Thomas; Girin, Laurent; Savariaux, Christophe; Yvert, Blaise

    2016-01-01

    Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained as assessed by perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open to future speech BCI applications using such articulatory-based speech synthesizer. PMID:27880768

  5. A model of serial order problems in fluent, stuttered and agrammatic speech.

    PubMed

    Howell, Peter

    2007-10-01

    Many models of speech production have attempted to explain dysfluent speech. Most models assume that the disruptions that occur when speech is dysfluent arise because the speakers make errors while planning an utterance. In this contribution, a model of the serial order of speech is described that does not make this assumption. It involves the coordination or 'interlocking' of linguistic planning and execution stages at the language-speech interface. The model is examined to determine whether it can distinguish two forms of dysfluent speech (stuttered and agrammatic speech) that are characterized by iteration and omission of whole words and parts of words.

  6. Virtual personal assistance

    NASA Astrophysics Data System (ADS)

    Aditya, K.; Biswadeep, G.; Kedar, S.; Sundar, S.

    2017-11-01

    Human-computer communication has seen growing demand in recent years. The new generation of autonomous technology aspires to give computer interfaces emotional states that take both the user and the system environment into consideration. The existing computational model is based on artificial intelligence and, externally, on multi-modal expression augmented with semi-human characteristics. The main problem with this multi-modal expression is that the hardware control given to the Artificial Intelligence (AI) is very limited. So, in our project we are trying to give the AI more control over the hardware. Two main parts, Speech to Text (STT) and Text to Speech (TTS) engines, are used to accomplish this requirement. In this work, we use a Raspberry Pi 3, a speaker and a mic as hardware, and for the programming part we use Python scripting.
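
    The described pipeline reduces to a listen-recognize-respond loop. The following is a minimal sketch under stated assumptions: it uses the third-party SpeechRecognition and pyttsx3 packages, common choices for Python on a Raspberry Pi but not necessarily the authors' stack (SpeechRecognition also requires PyAudio for microphone input), and the command handling is an invented example.

        # Minimal STT -> command handling -> TTS loop; the library choices
        # (SpeechRecognition, pyttsx3) are assumptions, not the authors' stack.
        import datetime
        import speech_recognition as sr
        import pyttsx3

        recognizer = sr.Recognizer()
        tts = pyttsx3.init()

        def say(text):
            tts.say(text)
            tts.runAndWait()

        def listen():
            with sr.Microphone() as mic:  # e.g., a USB mic on the Raspberry Pi
                recognizer.adjust_for_ambient_noise(mic, duration=0.5)
                audio = recognizer.listen(mic)
            return recognizer.recognize_google(audio).lower()  # online STT service

        while True:
            try:
                command = listen()
            except sr.UnknownValueError:
                continue  # ignore unintelligible input
            if "time" in command:
                say(datetime.datetime.now().strftime("It is %H %M"))
            elif "stop" in command:
                say("Goodbye")
                break
            else:
                say("I heard " + command)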

  7. Speech systems research at Texas Instruments

    NASA Technical Reports Server (NTRS)

    Doddington, George R.

    1977-01-01

    An assessment of automatic speech processing technology is presented. Fundamental problems in the development and deployment of automatic speech processing systems are defined, and a technology forecast for speech systems is given.

  8. Sperry Univac speech communications technology

    NASA Technical Reports Server (NTRS)

    Medress, Mark F.

    1977-01-01

    Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word spotting system to locate key words in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.

  9. Glove-talk II - a neural-network interface which maps gestures to parallel formant speech synthesizer controls.

    PubMed

    Fels, S S; Hinton, G E

    1997-01-01

    Glove-Talk II is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to ten control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-Talk II uses several input devices, a parallel formant speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. With Glove-Talk II, the subject can speak slowly but with far more natural sounding pitch variations than a text-to-speech synthesizer.

  10. Application of advanced speech technology in manned penetration bombers

    NASA Astrophysics Data System (ADS)

    North, R.; Lea, W.

    1982-03-01

    This report documents research on the potential use of speech technology in a manned penetration bomber aircraft (B-52/G and H). The objectives of the project were to analyze the pilot/copilot crewstation tasks over a three-hour-and forty-minute mission and determine the tasks that would benefit the most from conversion to speech recognition/generation, determine the technological feasibility of each of the identified tasks, and prioritize these tasks based on these criteria. Secondary objectives of the program were to enunciate research strategies in the application of speech technologies in airborne environments, and develop guidelines for briefing user commands on the potential of using speech technologies in the cockpit. The results of this study indicated that for the B-52 crewmember, speech recognition would be most beneficial for retrieving chart and procedural data that is contained in the flight manuals. Technological feasibility of these tasks indicated that the checklist and procedural retrieval tasks would be highly feasible for a speech recognition system.

  11. Incorporating Speech Recognition into a Natural User Interface

    NASA Technical Reports Server (NTRS)

    Chapa, Nicholas

    2017-01-01

    The Augmented/Virtual Reality (AVR) Lab has been studying the applicability of recent virtual and augmented reality hardware and software to KSC operations, including the Oculus Rift, HTC Vive, Microsoft HoloLens, and Unity game engine. My project in this lab is to integrate voice recognition and voice commands into an easy-to-modify system that can be added to an existing portion of a Natural User Interface (NUI). A NUI is an intuitive and simple-to-use interface incorporating visual, touch, and speech recognition. The inclusion of speech recognition capability will allow users to perform actions or make inquiries using only their voice. Because users need only speak to control an on-screen object or enact some digital action, any user can quickly become accustomed to the system. Multiple programs were tested for use in a speech command and recognition system. Sphinx4 translates speech to text using a Hidden Markov Model (HMM) based language model, an acoustic model, and a word dictionary, running on Java. PocketSphinx has similar functionality to Sphinx4 but runs on C. However, neither of these programs was ideal, as building a Java or C wrapper slowed performance. The most suitable speech recognition system tested was the Unity engine's grammar recognizer. A Context Free Grammar (CFG) structure is written in an XML file to specify the structure of phrases and words that will be recognized by the Unity grammar recognizer. Using Speech Recognition Grammar Specification (SRGS) 1.0 makes modifying the recognized combinations of words and phrases simple and quick. With SRGS 1.0, semantic information can also be added to the XML file, which allows even more control over how spoken words and phrases are interpreted by Unity. Additionally, using a CFG with SRGS 1.0 produces Finite State Machine (FSM) functionality, limiting the potential for incorrectly heard words or phrases. The purpose of my project was to investigate options for a speech recognition system. To that end I attempted to integrate Sphinx4 into a user interface. Sphinx4 had great accuracy and is the only free program able to perform offline speech dictation. However, it had a limited dictionary of words that could be recognized, single-syllable words were almost impossible for it to hear, and since it ran on Java it could not be integrated into the Unity-based NUI. PocketSphinx ran much faster than Sphinx4, which would have made it ideal as a plugin to the Unity NUI; unfortunately, creating a C# wrapper for the C code made the program unusable with Unity, because the wrapper slowed code execution and class files became unreachable. The Unity grammar recognizer is the ideal speech recognition interface: it is flexible in recognizing multiple variations of the same command, and it was the most accurate program tested at recognizing speech because it uses an XML grammar to specify speech structure instead of relying solely on a dictionary and language model. The Unity grammar recognizer will be used with the NUI for these reasons, as well as being written in C#, which further simplifies its incorporation.
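
    The way a grammar constrains recognition can be illustrated independently of Unity. The toy Python matcher below stands in for the SRGS 1.0 XML grammar: only utterances that fit the declared phrase structure are accepted, which is the finite-state behavior described above. The command vocabulary is invented for illustration.

        # Toy finite-state command matcher illustrating how a grammar (as in
        # SRGS 1.0) restricts recognition to valid phrases; commands are invented.
        GRAMMAR = {
            "verbs":   {"open", "close", "move", "rotate"},
            "objects": {"panel", "hatch", "camera"},
            "mods":    {"left", "right", "up", "down"},  # optional direction modifier
        }

        def parse_command(utterance):
            """Return (verb, object, modifier-or-None) if the phrase fits the grammar."""
            words = utterance.lower().split()
            if len(words) not in (2, 3):
                return None
            verb, obj = words[0], words[1]
            mod = words[2] if len(words) == 3 else None
            if verb not in GRAMMAR["verbs"] or obj not in GRAMMAR["objects"]:
                return None
            if mod is not None and mod not in GRAMMAR["mods"]:
                return None
            return verb, obj, mod

        print(parse_command("rotate camera left"))   # ('rotate', 'camera', 'left')
        print(parse_command("rotate the camera"))    # None -- not in the grammar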

  12. Effect of technological advances on cochlear implant performance in adults.

    PubMed

    Lenarz, Minoo; Joseph, Gert; Sönmez, Hasibe; Büchner, Andreas; Lenarz, Thomas

    2011-12-01

    To evaluate the effect of technological advances in the past 20 years on the hearing performance of a large cohort of adult cochlear implant (CI) patients. Individual, retrospective, cohort study. According to technological developments in electrode design and speech-processing strategies, we defined five virtual intervals on the time scale between 1984 and 2008. A cohort of 1,005 postlingually deafened adults was selected for this study, and their hearing performance with a CI was evaluated retrospectively according to these five technological intervals. The test battery was composed of four standard German speech tests: Freiburger monosyllabic test, speech tracking test, Hochmair-Schulz-Moser (HSM) sentence test in quiet, and HSM sentence test in 10 dB noise. The direct comparison of the speech perception in postlingually deafened adults, who were implanted during different technological periods, reveals an obvious improvement in the speech perception in patients who benefited from the recent electrode designs and speech-processing strategies. The major influence of technological advances on CI performance seems to be on speech perception in noise. Better speech perception in noisy surroundings is strong proof for demonstrating the success rate of new electrode designs and speech-processing strategies. Standard (internationally comparable) speech tests in noise should become an obligatory part of the postoperative test battery for adult CI patients. Copyright © 2011 The American Laryngological, Rhinological, and Otological Society, Inc.

  13. Little Houses and Casas Pequenas: Message Formulation and Syntactic Form in Unscripted Speech with Speakers of English and Spanish

    ERIC Educational Resources Information Center

    Brown-Schmidt, Sarah; Konopka, Agnieszka E.

    2008-01-01

    During unscripted speech, speakers coordinate the formulation of pre-linguistic messages with the linguistic processes that implement those messages into speech. We examine the process of constructing a contextually appropriate message and interfacing that message with utterance planning in English ("the small butterfly") and Spanish ("la mariposa…

  14. Texting while driving: is speech-based text entry less risky than handheld text entry?

    PubMed

    He, J; Chaparro, A; Nguyen, B; Burge, R J; Crandall, J; Chaparro, B; Ni, R; Cao, S

    2014-11-01

    Research indicates that using a cell phone to talk or text while maneuvering a vehicle impairs driving performance. However, few published studies directly compare the distracting effects of texting using a hands-free (i.e., speech-based interface) versus handheld cell phone, which is an important issue for legislation, automotive interface design and driving safety training. This study compared the effect of speech-based versus handheld text entries on simulated driving performance by asking participants to perform a car following task while controlling the duration of a secondary text-entry task. Results showed that both speech-based and handheld text entries impaired driving performance relative to the drive-only condition by causing more variation in speed and lane position. Handheld text entry also increased the brake response time and increased variation in headway distance. Text entry using a speech-based cell phone was less detrimental to driving performance than handheld text entry. Nevertheless, the speech-based text entry task still significantly impaired driving compared to the drive-only condition. These results suggest that speech-based text entry disrupts driving, but reduces the level of performance interference compared to text entry with a handheld device. In addition, the difference in the distraction effect caused by speech-based and handheld text entry is not simply due to the difference in task duration. Copyright © 2014 Elsevier Ltd. All rights reserved.

  15. Glove-TalkII--a neural-network interface which maps gestures to parallel formant speech synthesizer controls.

    PubMed

    Fels, S S; Hinton, G E

    1998-01-01

    Glove-TalkII is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to ten control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-TalkII uses several input devices (including a Cyberglove, a ContactGlove, a three-space tracker, and a foot pedal), a parallel formant speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. One subject has trained to speak intelligibly with Glove-TalkII. He speaks slowly but with far more natural sounding pitch variations than a text-to-speech synthesizer.
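
    The gating scheme is a small mixture of experts: the gating network decides vowel versus consonant, and its output weights the two expert networks' synthesizer controls. Below is a minimal numpy sketch of the combination step only; the toy linear "networks," dimensions, and random weights are invented placeholders for the trained models.

        # Minimal sketch of the Glove-TalkII gating idea: blend the outputs of a
        # vowel expert and a consonant expert by the gating network's weights.
        import numpy as np

        def softmax(z):
            e = np.exp(z - z.max())
            return e / e.sum()

        rng = np.random.default_rng(1)
        W_gate = rng.normal(size=(2, 10))      # gating net: hand features -> 2 logits
        W_vowel = rng.normal(size=(10, 10))    # vowel expert -> 10 formant controls
        W_cons = rng.normal(size=(10, 10))     # consonant expert -> 10 controls

        def synth_controls(hand_features):
            g = softmax(W_gate @ hand_features)        # [p(vowel), p(consonant)]
            vowel_out = np.tanh(W_vowel @ hand_features)
            cons_out = np.tanh(W_cons @ hand_features)
            return g[0] * vowel_out + g[1] * cons_out  # blended synthesizer controls

        controls = synth_controls(rng.normal(size=10))
        print(controls.shape)  # (10,) -- one value per formant-synthesizer parameter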

  16. Technologies for the Study of Speech: Review and an Application

    ERIC Educational Resources Information Center

    Babatsouli, Elena

    2015-01-01

    Technologies used for the study of speech are classified here into non-intrusive and intrusive. The paper informs on current non-intrusive technologies that are used for linguistic investigations of the speech signal, both phonological and phonetic. Providing a point of reference, the review covers existing technological advances in language…

  17. Speech-Enabled Interfaces for Travel Information Systems with Large Grammars

    NASA Astrophysics Data System (ADS)

    Zhao, Baoli; Allen, Tony; Bargiela, Andrzej

    This paper introduces three grammar-segmentation methods capable of handling the large grammar issues associated with producing a real-time speech-enabled VXML bus travel application for London. Large grammars tend to produce relatively slow recognition interfaces and this work shows how this limitation can be successfully addressed. Comparative experimental results show that the novel last-word recognition based grammar segmentation method described here achieves an optimal balance between recognition rate, speed of processing and naturalness of interaction.
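
    The last-word method can be illustrated as a two-stage lookup: index the phrase grammar by final word, recognize the last word first, and then match only within that sub-grammar. This is a sketch of the segmentation idea only, not the authors' VXML implementation; the phrases and the recognizer stub are invented.

        # Illustrative last-word grammar segmentation: partition a large phrase
        # grammar by final word so stage 2 searches only a small sub-grammar.
        from collections import defaultdict

        PHRASES = [
            "next bus to camden town",
            "next bus to victoria station",
            "first bus to victoria station",
            "last bus to heathrow",
        ]

        # Build sub-grammars keyed by each phrase's final word.
        sub_grammars = defaultdict(list)
        for phrase in PHRASES:
            sub_grammars[phrase.split()[-1]].append(phrase)

        def recognize(utterance):
            last_word = utterance.split()[-1]        # stage 1: last-word recognition
            candidates = sub_grammars.get(last_word, [])
            # Stage 2: match the full utterance against the small sub-grammar only.
            return utterance if utterance in candidates else None

        print(recognize("last bus to heathrow"))     # matched in a 1-phrase sub-grammar
        print(len(sub_grammars["station"]))          # 2 -- only these are searched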

  18. A video, text, and speech-driven realistic 3-d virtual head for human-machine interface.

    PubMed

    Yu, Jun; Wang, Zeng-Fu

    2015-05-01

    A multiple-inputs-driven realistic facial animation system based on a 3-D virtual head for human-machine interface is proposed. The system can be driven independently by video, text, and speech, and thus can interact with humans through diverse interfaces. The combination of a parameterized model and a muscular model is used to obtain a tradeoff between computational efficiency and high realism of 3-D facial animation. The online appearance model is used to track 3-D facial motion from video in the framework of particle filtering, and multiple measurements, i.e., pixel color values of the input image and Gabor wavelet coefficients of the illumination ratio image, are fused to reduce the influence of lighting and person dependence in the construction of the online appearance model. The tri-phone model is used to reduce the computational cost of visual co-articulation in speech-synchronized viseme synthesis without sacrificing performance. Objective and subjective experiments show that the system is suitable for human-machine interaction.
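
    The tracking component rests on the standard particle-filtering loop: propagate particles with a motion model, reweight them by how well each hypothesis matches the observation, and resample. The sketch below reduces this to one dimension with toy Gaussian models; it illustrates the framework only, not the paper's online appearance model.

        # Minimal 1-D particle filter illustrating the tracking loop; the motion
        # and observation models are toy Gaussians, not the appearance model.
        import numpy as np

        rng = np.random.default_rng(3)
        N = 500
        particles = rng.normal(0.0, 1.0, N)   # initial hypotheses of the hidden state
        weights = np.ones(N) / N

        def step(observation, particles, weights):
            particles = particles + rng.normal(0.0, 0.1, N)  # motion model (diffusion)
            likelihood = np.exp(-0.5 * ((observation - particles) / 0.2) ** 2)
            weights = weights * likelihood
            weights /= weights.sum()
            # Resample to concentrate particles where the observation model matches.
            idx = rng.choice(N, N, p=weights)
            return particles[idx], np.ones(N) / N

        state = 0.0
        for t in range(50):
            state += 0.05                                    # true motion drift
            obs = state + rng.normal(0.0, 0.2)               # noisy measurement
            particles, weights = step(obs, particles, weights)

        print(round(particles.mean(), 2), round(state, 2))   # estimate vs. truth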

  19. Multimodal user interfaces to improve social integration of elderly and mobility impaired.

    PubMed

    Dias, Miguel Sales; Pires, Carlos Galinho; Pinto, Fernando Miguel; Teixeira, Vítor Duarte; Freitas, João

    2012-01-01

    Technologies for Human-Computer Interaction (HCI) and Communication have evolved tremendously over the past decades. However, citizens such as the mobility impaired and the elderly still face many difficulties interacting with communication services, either due to HCI issues or intrinsic design problems with the services. In this paper we start by presenting the results of two user studies, the first conducted with a group of mobility impaired users comprising paraplegic and quadriplegic individuals, and the second with elderly users. The study participants carried out a set of tasks with a multimodal (speech, touch, gesture, keyboard and mouse) and multi-platform (mobile, desktop) system offering integrated access to communication and entertainment services, such as email, agenda, conferencing, instant messaging and social media, referred to as LHC - Living Home Center. The system was designed to take into account the requirements captured from these users, with the objective of evaluating whether the adoption of multimodal interfaces for audio-visual communication and social media services could improve the interaction with such services. Our study revealed that a multimodal prototype system offering natural interaction modalities, especially supporting speech and touch, can in fact improve access to the presented services, contributing to the reduction of social isolation of the mobility impaired as well as the elderly, and improving their digital inclusion.

  20. Speech emotion recognition methods: A literature review

    NASA Astrophysics Data System (ADS)

    Basharirad, Babak; Moradhaseli, Mohammadreza

    2017-10-01

    Recently, research attention to emotional speech signals in human-machine interfaces has been boosted by the availability of high computational capability. Many systems have been proposed in the literature to identify emotional states through speech. Selection of suitable feature sets, design of proper classification methods, and preparation of an appropriate dataset are the main key issues of speech emotion recognition systems. This paper critically analyzes the currently available approaches to speech emotion recognition based on three evaluation parameters (feature set, classification of features, and accuracy). In addition, this paper evaluates the performance and limitations of available methods, and highlights promising directions for improving speech emotion recognition systems.
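
    The three key issues map directly onto a conventional pipeline: extract a feature set, train a classifier, and evaluate its accuracy on a dataset. A minimal sketch follows, assuming librosa for MFCC features and scikit-learn's SVC as the classifier (common choices in this literature, not ones mandated by the review); the labeled "utterances" are synthetic noise, for illustration only.

        # Minimal speech-emotion-recognition pipeline sketch: features -> classifier.
        # librosa/SVC are assumed library choices; the labeled data is synthetic.
        import numpy as np
        import librosa
        from sklearn.svm import SVC

        def features(signal, sr=16000):
            mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
            # Summarize frame-level features as per-utterance statistics.
            return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

        rng = np.random.default_rng(2)
        # Stand-in "utterances": noise at two loudness levels as two fake emotion classes.
        X = np.array([features(rng.normal(scale=s, size=16000))
                      for s in [0.1] * 20 + [1.0] * 20])
        y = np.array([0] * 20 + [1] * 20)

        clf = SVC().fit(X, y)
        print(clf.score(X, y))  # training accuracy on the toy data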

  1. Technology and Speech Training: An Affair to Remember.

    ERIC Educational Resources Information Center

    Levitt, Harry

    1989-01-01

    A history of speech training technology is presented, from the simple hand-held mirror to complicated computer-based systems and tactile devices, and subsequent papers in this theme issue are introduced. Both the advantages and problems of technological aids are addressed. Simplicity in the application and use of speech training aids is stressed.…

  2. A Dual-Mode Human Computer Interface Combining Speech and Tongue Motion for People with Severe Disabilities

    PubMed Central

    Huo, Xueliang; Park, Hangue; Kim, Jeonghee; Ghovanloo, Maysam

    2015-01-01

    We are presenting a new wireless and wearable human computer interface called the dual-mode Tongue Drive System (dTDS), which is designed to allow people with severe disabilities to use computers more effectively with increased speed, flexibility, usability, and independence through their tongue motion and speech. The dTDS detects users’ tongue motion using a magnetic tracer and an array of magnetic sensors embedded in a compact and ergonomic wireless headset. It also captures the users’ voice wirelessly using a small microphone embedded in the same headset. Preliminary evaluation results based on 14 able-bodied subjects and three individuals with high level spinal cord injuries at level C3–C5 indicated that the dTDS headset, combined with a commercially available speech recognition (SR) software, can provide end users with significantly higher performance than either unimodal forms based on the tongue motion or speech alone, particularly in completing tasks that require both pointing and text entry. PMID:23475380

  3. MediLink: a wearable telemedicine system for emergency and mobile applications.

    PubMed

    Koval, T; Dudziak, M

    1999-01-01

    The practical needs of the medical professional faced with critical care or emergency situations differ from those working in many environments where telemedicine and mobile computing have been introduced and tested. One constructive criticism of the telemedicine initiative has been to question what positive benefits are gained from videoconferencing, paperless transactions, and online access to patient records. With the goal of producing a positive answer to such questions, an architecture for multipurpose mobile telemedicine applications has been developed. The core technology is based upon a wearable personal computer with a smart-card interface coupled with speech, pen, and video input and wireless intranet connectivity. The TransPAC system with the MedLink software system is designed to provide an integrated solution for a broad range of health care functions where mobile and hands-free or limited-access systems are preferred or necessary, and where the capabilities of other mobile devices are insufficient or inappropriate. Structured and noise-resistant speech-to-text interfacing, plus the use of a web browser-like display accessible through either a flat-panel, standard, or headset monitor, gives the beltpack TransPAC computer the functions of a complete desktop, including PCMCIA card interfaces for internet connectivity and a secure smartcard with a 16-bit microprocessor and upwards of 64K of memory. The card provides user access control for security, user-customized configuration of applications, display and vocabulary, and memory to diminish the need for PC-server communications during an active session. TransPAC is being implemented for EMT and ER staff usage.

  4. Evaluation of Speech Recognition of Cochlear Implant Recipients Using Adaptive, Digital Remote Microphone Technology and a Speech Enhancement Sound Processing Algorithm.

    PubMed

    Wolfe, Jace; Morais, Mila; Schafer, Erin; Agrawal, Smita; Koch, Dawn

    2015-05-01

    Cochlear implant recipients often experience difficulty with understanding speech in the presence of noise. Cochlear implant manufacturers have developed sound processing algorithms designed to improve speech recognition in noise, and research has shown these technologies to be effective. Remote microphone technology utilizing adaptive, digital wireless radio transmission has also been shown to provide significant improvement in speech recognition in noise. There are no studies examining the potential improvement in speech recognition in noise when these two technologies are used simultaneously. The goal of this study was to evaluate the potential benefits and limitations associated with the simultaneous use of a sound processing algorithm designed to improve performance in noise (Advanced Bionics ClearVoice) and a remote microphone system that incorporates adaptive, digital wireless radio transmission (Phonak Roger). A two-by-two repeated-measures design was used to examine performance differences obtained without these technologies compared to the use of each technology separately as well as the simultaneous use of both technologies. Eleven Advanced Bionics (AB) cochlear implant recipients, ages 11 to 68 yr. AzBio sentence recognition was measured in quiet and in the presence of classroom noise ranging in level from 50 to 80 dBA in 5-dB steps. Performance was evaluated in four conditions: (1) No ClearVoice and no Roger, (2) ClearVoice enabled without the use of Roger, (3) ClearVoice disabled with Roger enabled, and (4) simultaneous use of ClearVoice and Roger. Speech recognition in quiet was better than speech recognition in noise for all conditions. Use of ClearVoice and Roger each provided significant improvement in speech recognition in noise. The best performance in noise was obtained with the simultaneous use of ClearVoice and Roger. ClearVoice and Roger technology each improves speech recognition in noise, particularly when used at the same time. Because ClearVoice does not degrade performance in quiet settings, clinicians should consider recommending ClearVoice for routine, full-time use for AB implant recipients. Roger should be used in all instances in which remote microphone technology may assist the user in understanding speech in the presence of noise. American Academy of Audiology.

  5. Generating Contextual Descriptions of Virtual Reality (VR) Spaces

    NASA Astrophysics Data System (ADS)

    Olson, D. M.; Zaman, C. H.; Sutherland, A.

    2017-12-01

    Virtual reality holds great potential for science communication, education, and research. However, interfaces for manipulating data and environments in virtual worlds are limited and idiosyncratic. Furthermore, speech and vision are the primary modalities by which humans collect information about the world, but the linking of visual and natural language domains is a relatively new pursuit in computer vision. Machine learning techniques have been shown to be effective at image and speech classification, as well as at describing images with language (Karpathy 2016), but have not yet been used to describe potential actions. We propose a technique for creating a library of possible context-specific actions associated with 3D objects in immersive virtual worlds based on a novel dataset generated natively in virtual reality containing speech, image, gaze, and acceleration data. We will discuss the design and execution of a user study in virtual reality that enabled the collection and the development of this dataset. We will also discuss the development of a hybrid machine learning algorithm linking vision data with environmental affordances in natural language. Our findings demonstrate that it is possible to develop a model which can generate interpretable verbal descriptions of possible actions associated with recognized 3D objects within immersive VR environments. This suggests promising applications for more intuitive user interfaces through voice interaction within 3D environments. It also demonstrates the potential to apply vast bodies of embodied and semantic knowledge to enrich user interaction within VR environments. This technology would allow for applications such as expert knowledge annotation of 3D environments, complex verbal data querying and object manipulation in virtual spaces, and computer-generated, dynamic 3D object affordances and functionality during simulations.

  6. Speech Output Technologies in Interventions for Individuals with Autism Spectrum Disorders: A Scoping Review.

    PubMed

    Schlosser, Ralf W; Koul, Rajinder K

    2015-01-01

    The purpose of this scoping review was to (a) map the research evidence on the effectiveness of augmentative and alternative communication (AAC) interventions using speech output technologies (e.g., speech-generating devices, mobile technologies with AAC-specific applications, talking word processors) for individuals with autism spectrum disorders, (b) identify gaps in the existing literature, and (c) posit directions for future research. Outcomes related to speech, language, and communication were considered. A total of 48 studies (47 single case experimental designs and 1 randomized control trial) involving 187 individuals were included. Results were reviewed in terms of three study groupings: (a) studies that evaluated the effectiveness of treatment packages involving speech output, (b) studies comparing one treatment package with speech output to other AAC modalities, and (c) studies comparing the presence with the absence of speech output. The state of the evidence base is discussed and several directions for future research are posited.

  7. Speech perception at the interface of neurobiology and linguistics.

    PubMed

    Poeppel, David; Idsardi, William J; van Wassenhove, Virginie

    2008-03-12

    Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate discrete representations that make contact with the lexical representations stored in long-term memory as output. Because the perceptual objects that are recognized by speech perception enter into subsequent linguistic computation, the format that is used for lexical representation and processing fundamentally constrains the speech perceptual processes. Consequently, theories of speech perception must, at some level, be tightly linked to theories of lexical representation. Minimally, speech perception must yield representations that smoothly and rapidly interface with stored lexical items. Adopting the perspective of Marr, we argue and provide neurobiological and psychophysical evidence for the following research programme. First, at the implementational level, speech perception is a multi-time resolution process, with perceptual analyses occurring concurrently on at least two time scales (approx. 20-80 ms, approx. 150-300 ms), commensurate with (sub)segmental and syllabic analyses, respectively. Second, at the algorithmic level, we suggest that perception proceeds on the basis of internal forward models, or uses an 'analysis-by-synthesis' approach. Third, at the computational level (in the sense of Marr), the theory of lexical representation that we adopt is principally informed by phonological research and assumes that words are represented in the mental lexicon in terms of sequences of discrete segments composed of distinctive features. One important goal of the research programme is to develop linking hypotheses between putative neurobiological primitives (e.g. temporal primitives) and those primitives derived from linguistic inquiry, to arrive ultimately at a biologically sensible and theoretically satisfying model of representation and computation in speech.
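
    The claimed two concurrent timescales can be made concrete with a toy analysis: the same signal is framed once with a short (~25 ms) window, commensurate with segmental analysis, and once with a long (~200 ms) window, commensurate with syllabic analysis. This numpy sketch illustrates the multi-time resolution idea only; it is not a model of the proposed neural computation.

        # Toy multi-time-resolution analysis: frame one signal at two window sizes.
        import numpy as np

        fs = 16000
        t = np.arange(fs) / fs                       # one second of toy "speech"
        signal = np.sin(2 * np.pi * 4 * t) * np.sin(2 * np.pi * 1000 * t)  # 4 Hz envelope

        def frame_energies(x, win_seconds):
            win = int(win_seconds * fs)
            hop = win // 2
            frames = [x[i:i + win] for i in range(0, len(x) - win, hop)]
            return np.array([np.sum(f ** 2) for f in frames])

        fast = frame_energies(signal, 0.025)         # segment-scale analysis
        slow = frame_energies(signal, 0.200)         # syllable-scale analysis
        print(len(fast), len(slow))                  # many fine frames vs. few coarse ones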

  8. Speech recognition technology: an outlook for human-to-machine interaction.

    PubMed

    Erdel, T; Crooks, S

    2000-01-01

    Speech recognition, as an enabling technology in healthcare-systems computing, is a topic that has been discussed for quite some time, but is just now coming to fruition. Traditionally, speech-recognition software has been constrained by hardware, but improved processors and increased memory capacities are starting to remove some of these limitations. With these barriers removed, companies that create software for the healthcare setting have the opportunity to write more successful applications. Among the criticisms of speech-recognition applications are the high rates of error and steep training curves. However, even in the face of such negative perceptions, there remain significant opportunities for speech recognition to allow healthcare providers and, more specifically, physicians, to work more efficiently and ultimately spend more time with their patients and less time completing necessary documentation. This article will identify opportunities for inclusion of speech-recognition technology in the healthcare setting and examine major categories of speech-recognition software--continuous speech recognition, command and control, and text-to-speech. We will discuss the advantages and disadvantages of each area, the limitations of the software today, and how future trends might affect them.

  9. A Sign Language Screen Reader for Deaf

    NASA Astrophysics Data System (ADS)

    El Ghoul, Oussama; Jemni, Mohamed

    Screen reader technology first appeared to allow blind people and people with reading difficulties to use computers and to access digital information. Until now, this technology has been exploited mainly to help the blind community. During our work with deaf people, we noticed that a screen reader can also facilitate the manipulation of computers and the reading of textual information. In this paper, we propose a novel screen reader dedicated to the deaf. The output of the reader is a visual translation of the text into sign language. The screen reader is composed of two essential modules: the first is designed to capture the activities of users (mouse and keyboard events); for this purpose, we adopted the Microsoft MSAA application programming interfaces. The second module, which in classical screen readers is a text-to-speech (TTS) engine, is replaced by a novel text-to-sign (TTSign) engine. This module converts text into sign language animation based on avatar technology.

  10. Trends in communicative access solutions for children with cerebral palsy.

    PubMed

    Myrden, Andrew; Schudlo, Larissa; Weyand, Sabine; Zeyl, Timothy; Chau, Tom

    2014-08-01

    Access solutions may facilitate communication in children with limited functional speech and motor control. This study reviews current trends in access solution development for children with cerebral palsy, with particular emphasis on the access technology that harnesses a control signal from the user (eg, movement or physiological change) and the output device (eg, augmentative and alternative communication system) whose behavior is modulated by the user's control signal. Access technologies have advanced from simple mechanical switches to machine vision (eg, eye-gaze trackers), inertial sensing, and emerging physiological interfaces that require minimal physical effort. Similarly, output devices have evolved from bulky, dedicated hardware with limited configurability, to platform-agnostic, highly personalized mobile applications. Emerging case studies encourage the consideration of access technology for all nonverbal children with cerebral palsy with at least nascent contingency awareness. However, establishing robust evidence of the effectiveness of the aforementioned advances will require more expansive studies. © The Author(s) 2014.

  11. Practical applications of interactive voice technologies: Some accomplishments and prospects

    NASA Technical Reports Server (NTRS)

    Grady, Michael W.; Hicklin, M. B.; Porter, J. E.

    1977-01-01

    A technology assessment of the application of computers and electronics to complex systems is presented. Three existing systems which utilize voice technology (speech recognition and speech generation) are described. Future directions in voice technology are also described.

  12. Voice-processing technologies--their application in telecommunications.

    PubMed Central

    Wilpon, J G

    1995-01-01

    As the telecommunications industry evolves over the next decade to provide the products and services that people will desire, several key technologies will become commonplace. Two of these, automatic speech recognition and text-to-speech synthesis, will provide users with more freedom on when, where, and how they access information. While these technologies are currently in their infancy, their capabilities are rapidly increasing and their deployment in today's telephone network is expanding. The economic impact of just one application, the automation of operator services, is well over $100 million per year. Yet there still are many technical challenges that must be resolved before these technologies can be deployed ubiquitously in products and services throughout the worldwide telephone network. These challenges include: (i) High level of accuracy. The technology must be perceived by the user as highly accurate, robust, and reliable. (ii) Easy to use. Speech is only one of several possible input/output modalities for conveying information between a human and a machine, much like a computer terminal or Touch-Tone pad on a telephone. It is not the final product. Therefore, speech technologies must be hidden from the user. That is, the burden of using the technology must be on the technology itself. (iii) Quick prototyping and development of new products and services. The technology must support the creation of new products and services based on speech in an efficient and timely fashion. In this paper I present a vision of the voice-processing industry with a focus on the areas with the broadest base of user penetration: speech recognition, text-to-speech synthesis, natural language processing, and speaker recognition technologies. The current and future applications of these technologies in the telecommunications industry will be examined in terms of their strengths, limitations, and the degree to which user needs have been or have yet to be met. Although noteworthy gains have been made in areas with potentially small user bases and in the more mature speech-coding technologies, these subjects are outside the scope of this paper. PMID:7479815

  13. Neuronal populations in the occipital cortex of the blind synchronize to the temporal dynamics of speech

    PubMed Central

    Van Ackeren, Markus Johannes; Barbero, Francesca M; Mattioni, Stefania; Bottini, Roberto

    2018-01-01

    The occipital cortex of early blind individuals (EB) activates during speech processing, challenging the notion of a hard-wired neurobiology of language. But, at what stage of speech processing do occipital regions participate in EB? Here we demonstrate that parieto-occipital regions in EB enhance their synchronization to acoustic fluctuations in human speech in the theta-range (corresponding to syllabic rate), irrespective of speech intelligibility. Crucially, enhanced synchronization to the intelligibility of speech was selectively observed in primary visual cortex in EB, suggesting that this region is at the interface between speech perception and comprehension. Moreover, EB showed overall enhanced functional connectivity between temporal and occipital cortices that are sensitive to speech intelligibility and altered directionality when compared to the sighted group. These findings suggest that the occipital cortex of the blind adopts an architecture that allows the tracking of speech material, and therefore does not fully abstract from the reorganized sensory inputs it receives. PMID:29338838
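
    As an illustration of the kind of measure this study relies on, the sketch below quantifies synchronization between a neural trace and the speech envelope in the theta band (4-8 Hz, near the syllabic rate). It is a minimal sketch using scipy on synthetic stand-in signals, not the authors' source-level neuroimaging pipeline; the sampling rate, window length, and choice of magnitude-squared coherence as the synchronization index are assumptions.

      import numpy as np
      from scipy.signal import hilbert, coherence

      fs = 200.0                                # Hz; assumed common sampling rate
      t = np.arange(0, 60, 1 / fs)              # 60 s of data
      rng = np.random.default_rng(0)
      speech = rng.standard_normal(t.size)      # stand-in for a speech waveform
      neural = rng.standard_normal(t.size)      # stand-in for an occipital sensor trace

      # Speech envelope: magnitude of the analytic signal
      envelope = np.abs(hilbert(speech))

      # Magnitude-squared coherence between the neural signal and the envelope,
      # averaged over the theta band as a simple synchronization index
      f, coh = coherence(neural, envelope, fs=fs, nperseg=int(4 * fs))
      theta = (f >= 4) & (f <= 8)
      print(f"mean theta-band coherence: {coh[theta].mean():.3f}")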

  14. Building Searchable Collections of Enterprise Speech Data.

    ERIC Educational Resources Information Center

    Cooper, James W.; Viswanathan, Mahesh; Byron, Donna; Chan, Margaret

    This study applied speech recognition and text-mining technologies to a set of recorded outbound marketing calls and analyzed the results. Since speaker-independent speech recognition technology results in a significantly lower recognition rate than that found when the recognizer is trained for a particular speaker, a number of post-processing…

  15. Military and Government Applications of Human-Machine Communication by Voice

    NASA Astrophysics Data System (ADS)

    Weinstein, Clifford J.

    1995-10-01

    This paper describes a range of opportunities for military and government applications of human-machine communication by voice, based on visits and contacts with numerous user organizations in the United States. The applications include some that appear to be feasible by careful integration of current state-of-the-art technology and others that will require a varying mix of advances in speech technology and in integration of the technology into applications environments. Applications that are described include (1) speech recognition and synthesis for mobile command and control; (2) speech processing for a portable multifunction soldier's computer; (3) speech- and language-based technology for naval combat team tactical training; (4) speech technology for command and control on a carrier flight deck; (5) control of auxiliary systems, and alert and warning generation, in fighter aircraft and helicopters; and (6) voice check-in, report entry, and communication for law enforcement agents or special forces. A phased approach for transfer of the technology into applications is advocated, where integration of applications systems is pursued in parallel with advanced research to meet future needs.

  16. Prototype app for voice therapy: a peer review.

    PubMed

    Lavaissiéri, Paula; Melo, Paulo Eduardo Damasceno

    2017-03-09

    Voice speech therapy promotes changes in patients' voice-related habits and rehabilitation. Speech-language therapists use a host of materials, ranging from pictures to electronic resources and computer tools, as aids in this process. Mobile technology is attractive, interactive, and a nearly constant feature in the daily routine of a large part of the population, and it has a growing application in healthcare. The aims were to develop a prototype application for voice therapy, submit it to peer assessment, and improve the initial prototype based on these assessments. A prototype of the Q-Voz application was developed based on Apple's Human Interface Guidelines. The prototype was analyzed by seven speech therapists who work in the voice area, and improvements to the product were made based on these assessments. All features of the application were considered satisfactory by most evaluators. All evaluators found the application very useful; they reported that patients would find it easier to make changes in voice behavior with the application than without it, stated they would use the application with their patients with dysphonia and in the process of rehabilitation, and noted that the application offers useful tools for voice self-management. Based on the suggestions provided, six improvements were made to the prototype. The Q-Voz application prototype was thus developed, evaluated by seven judges, and subsequently improved. All evaluators stated they would use the application with their patients undergoing rehabilitation, indicating that the Q-Voz application for mobile devices can be considered an auxiliary tool for voice speech therapy.

  17. Cochlear implant – state of the art

    PubMed Central

    Lenarz, Thomas

    2018-01-01

    Cochlear implants are the treatment of choice for auditory rehabilitation of patients with sensory deafness. They restore the missing function of inner hair cells by transforming the acoustic signal into electrical stimuli for activation of auditory nerve fibers. Owing to very fast technological development, cochlear implants provide open-set speech understanding in the majority of patients, including use of the telephone. Children can achieve near-normal speech and language development provided their deafness is detected early after onset and implantation is performed quickly thereafter. The diagnostic procedure as well as the surgical technique have been standardized and can be adapted to the individual anatomical and physiological needs of both children and adults. Special cases such as cochlear obliteration might require special measures and re-implantation, which can be done in most cases in a straightforward way. Technology upgrades account for better performance. Future developments will focus on better electrode-nerve interfaces achieved by improving electrode technology. An increased number of electrical contacts, together with biological treatment that encourages regeneration of the dendrites growing onto the electrode, will increase the number of usable electrical channels. This will give room for improved speech coding strategies in order to create the bionic ear, i.e., to restore the process of natural hearing by means of technology. Robot-assisted surgery will allow for high-precision surgery and reliable hearing preservation. Biological therapies will support the bionic ear; methods include bio-hybrid electrodes coated with stem cells transplanted into the inner ear to enhance endogenous production of neurotrophins. Local drug delivery will focus on suppression of the trauma reaction and on local regeneration. Gene therapy by nanoparticles will hopefully lead to the preservation of residual hearing in patients affected by genetic hearing loss. Overall, the cochlear implant is a very powerful tool for rehabilitating patients with sensory deafness. More than 1 million candidates in Germany today could benefit from this high-technology auditory implant, yet only 50,000 have been implanted so far. In the future, the procedure will be minimally invasive and straightforward and can be done under local anesthesia, and hearing preservation will be routine. PMID:29503669
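
    The core signal path described here (acoustic signal in, per-channel stimulation levels out) can be illustrated with a toy filterbank-plus-envelope front end of the kind used in continuous-interleaved-sampling-style coding strategies. This is a minimal sketch with assumed channel counts and corner frequencies, on synthetic audio; it is not any manufacturer's actual coding strategy.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt, hilbert

      def ci_channel_envelopes(signal, fs, n_channels=8, fmin=200.0, fmax=7000.0):
          # Toy CIS-style front end: split speech into log-spaced bands and
          # extract per-band envelopes that would set electrode stimulation levels.
          edges = np.geomspace(fmin, fmax, n_channels + 1)
          envelopes = []
          for lo, hi in zip(edges[:-1], edges[1:]):
              sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
              band = sosfiltfilt(sos, signal)
              envelopes.append(np.abs(hilbert(band)))
          return np.stack(envelopes)          # shape: (n_channels, n_samples)

      fs = 16000
      rng = np.random.default_rng(1)
      audio = rng.standard_normal(fs)         # stand-in for 1 s of speech
      env = ci_channel_envelopes(audio, fs)
      print(env.shape)                        # (8, 16000)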

  18. Dialogue enabling speech-to-text user assistive agent system for hearing-impaired person.

    PubMed

    Lee, Seongjae; Kang, Sunmee; Han, David K; Ko, Hanseok

    2016-06-01

    A novel approach for assisting bidirectional communication between people with normal hearing and people with hearing impairment is presented. Whereas existing hearing-impaired assistive devices such as hearing aids and cochlear implants are vulnerable in extreme noise conditions or carry post-surgery side effects, the proposed concept is an alternative approach in which spoken dialogue is achieved by employing a robust speech recognition technique that takes noisy environmental factors into consideration, without any attachment to the human body. The proposed system is a portable device with an acoustic beamformer for directional noise reduction, capable of performing speech-to-text transcription using a keyword spotting method. It is also equipped with a user interface optimized for hearing-impaired people, rendering device usage intuitive and natural across diverse domain contexts. The experimental results confirm that the proposed interface design is feasible for realizing an effective and efficient intelligent agent for the hearing-impaired.
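
    The acoustic beamformer mentioned above can be illustrated with its simplest variant, a delay-and-sum beamformer that phase-aligns a microphone array toward the talker. This is a hedged sketch with an assumed linear array geometry and synthetic channels; the device's actual beamforming algorithm is not described in the record.

      import numpy as np

      def delay_and_sum(mic_signals, fs, mic_positions, look_direction, c=343.0):
          # Steer a linear array by delaying each channel so the target
          # direction adds coherently. mic_signals: (n_mics, n_samples);
          # mic_positions: meters along the array; look_direction: radians
          # from broadside; c: speed of sound in m/s.
          n_mics, n_samples = mic_signals.shape
          delays = mic_positions * np.sin(look_direction) / c   # seconds per mic
          freqs = np.fft.rfftfreq(n_samples, 1 / fs)
          out = np.zeros(n_samples)
          for ch in range(n_mics):
              spectrum = np.fft.rfft(mic_signals[ch])
              # Apply a fractional delay as a linear phase shift
              spectrum *= np.exp(-2j * np.pi * freqs * delays[ch])
              out += np.fft.irfft(spectrum, n=n_samples)
          return out / n_mics

      fs = 16000
      rng = np.random.default_rng(2)
      mics = rng.standard_normal((4, fs))       # stand-in for 4 microphone channels
      positions = np.arange(4) * 0.05           # assumed 5 cm element spacing
      enhanced = delay_and_sum(mics, fs, positions, look_direction=np.deg2rad(30))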

  19. Voice synthesis application

    NASA Astrophysics Data System (ADS)

    Lightstone, P. C.; Davidson, W. M.

    1982-04-01

    The military detection assessment laboratory houses an experimental field system which assesses different alarm indicators such as fence disturbance sensors, MILES cables, and microwave Racons. A speech synthesis board was purchased which could be interfaced, by means of a computer, to an alarm logger, making verbal acknowledgement of alarms possible. Different products and different types of voice synthesis were analyzed before a linear predictive coding (LPC) device produced by Telesensory Speech Systems of Palo Alto, California, was chosen. This device is called the Speech 1000 Board and has a dedicated 8085 processor. A multiplexer card was designed, and the Speech 1000 was interfaced through the card to a TMS 990/100M Texas Instruments microcomputer. It was also necessary to design the software with the capability of recognizing and flagging an alarm on any one of 32 possible lines. The experimental field system was then packaged with a dc power supply, LED indicators, speakers, and switches, and deployed in the field, where it performed reliably.
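
    The board chosen here is a linear predictive coding (LPC) synthesizer. As background, the sketch below shows the essence of LPC synthesis: fit an all-pole vocal-tract model to a speech frame with the Levinson-Durbin recursion, then re-excite it with a pitch-period impulse train. It is illustrative only; the frame size, model order, and pitch are assumptions, and the Speech 1000's firmware is certainly more elaborate.

      import numpy as np
      from scipy.signal import lfilter

      def levinson_durbin(r, order):
          # Solve the LPC normal equations from autocorrelation values r[0..order].
          a = np.zeros(order + 1)
          a[0] = 1.0
          err = r[0]
          for i in range(1, order + 1):
              acc = r[i]
              for j in range(1, i):
                  acc += a[j] * r[i - j]
              k = -acc / err                  # reflection coefficient
              a_prev = a.copy()
              for j in range(1, i):
                  a[j] = a_prev[j] + k * a_prev[i - j]
              a[i] = k
              err *= (1.0 - k * k)            # residual prediction error
          return a, err

      fs = 8000
      rng = np.random.default_rng(3)
      frame = rng.standard_normal(240)        # stand-in for a 30 ms voiced frame
      r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
      a, gain = levinson_durbin(r, order=10)

      # Resynthesis: excite the all-pole vocal-tract model with an impulse train
      excitation = np.zeros(len(frame))
      excitation[::80] = 1.0                  # ~100 Hz pitch at fs = 8 kHz
      synthetic = lfilter([np.sqrt(gain)], a, excitation)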

  20. Temporal order processing of syllables in the left parietal lobe.

    PubMed

    Moser, Dana; Baker, Julie M; Sanchez, Carmen E; Rorden, Chris; Fridriksson, Julius

    2009-10-07

    Speech processing requires the temporal parsing of syllable order. Individuals suffering from posterior left hemisphere brain injury often exhibit temporal processing deficits as well as language deficits. Although the right posterior inferior parietal lobe has been implicated in temporal order judgments (TOJs) of visual information, there is limited evidence to support the role of the left inferior parietal lobe (IPL) in processing syllable order. The purpose of this study was to examine whether the left inferior parietal lobe is recruited during temporal order judgments of speech stimuli. Functional magnetic resonance imaging data were collected on 14 normal participants while they completed the following forced-choice tasks: (1) syllable order of multisyllabic pseudowords, (2) syllable identification of single syllables, and (3) gender identification of both multisyllabic and monosyllabic speech stimuli. Results revealed increased neural recruitment in the left inferior parietal lobe when participants made judgments about syllable order compared with both syllable identification and gender identification. These findings suggest that the left inferior parietal lobe plays an important role in processing syllable order and support the hypothesized role of this region as an interface between auditory speech and the articulatory code. Furthermore, a breakdown in this interface may explain some components of the speech deficits observed after posterior damage to the left hemisphere.

  1. Temporal Order Processing of Syllables in the Left Parietal Lobe

    PubMed Central

    Baker, Julie M.; Sanchez, Carmen E.; Rorden, Chris; Fridriksson, Julius

    2009-01-01

    Speech processing requires the temporal parsing of syllable order. Individuals suffering from posterior left hemisphere brain injury often exhibit temporal processing deficits as well as language deficits. Although the right posterior inferior parietal lobe has been implicated in temporal order judgments (TOJs) of visual information, there is limited evidence to support the role of the left inferior parietal lobe (IPL) in processing syllable order. The purpose of this study was to examine whether the left inferior parietal lobe is recruited during temporal order judgments of speech stimuli. Functional magnetic resonance imaging data were collected on 14 normal participants while they completed the following forced-choice tasks: (1) syllable order of multisyllabic pseudowords, (2) syllable identification of single syllables, and (3) gender identification of both multisyllabic and monosyllabic speech stimuli. Results revealed increased neural recruitment in the left inferior parietal lobe when participants made judgments about syllable order compared with both syllable identification and gender identification. These findings suggest that the left inferior parietal lobe plays an important role in processing syllable order and support the hypothesized role of this region as an interface between auditory speech and the articulatory code. Furthermore, a breakdown in this interface may explain some components of the speech deficits observed after posterior damage to the left hemisphere. PMID:19812331

  2. Interfaces. Working Papers in Linguistics No. 32.

    ERIC Educational Resources Information Center

    Zwicky, Arnold M.

    The papers collected here concern the interfaces between various components of grammar (semantics, syntax, morphology, and phonology) and between grammar itself and various extragrammatical domains. They include: "The OSU Random, Unorganized Collection of Speech Act Examples"; "In and Out in Phonology"; "Forestress and…

  3. A perspective on early commercial applications of voice-processing technology for telecommunications and aids for the handicapped.

    PubMed Central

    Seelbach, C

    1995-01-01

    The Colloquium on Human-Machine Communication by Voice highlighted the global technical community's focus on the problems and promise of voice-processing technology, particularly, speech recognition and speech synthesis. Clearly, there are many areas in both the research and development of these technologies that can be advanced significantly. However, it is also true that there are many applications of these technologies that are capable of commercialization now. Early successful commercialization of new technology is vital to ensure continuing interest in its development. This paper addresses efforts to commercialize speech technologies in two markets: telecommunications and aids for the handicapped. PMID:7479814

  4. Developing and Evaluating an Oral Skills Training Website Supported by Automatic Speech Recognition Technology

    ERIC Educational Resources Information Center

    Chen, Howard Hao-Jan

    2011-01-01

    Oral communication ability has become increasingly important to many EFL students. Several commercial software programs based on automatic speech recognition (ASR) technologies are available but their prices are not affordable for many students. This paper will demonstrate how the Microsoft Speech Application Software Development Kit (SASDK), a…

  5. Automatic speech recognition (ASR) based approach for speech therapy of aphasic patients: A review

    NASA Astrophysics Data System (ADS)

    Jamal, Norezmi; Shanta, Shahnoor; Mahmud, Farhanahani; Sha'abani, MNAH

    2017-09-01

    This paper reviews state-of-the-art automatic speech recognition (ASR) based approaches for speech therapy of aphasic patients. Aphasia is a condition in which the affected person suffers from a speech and language disorder resulting from a stroke or brain injury. Since there is a growing body of evidence indicating the possibility of improving symptoms at an early stage, ASR-based solutions are increasingly being researched for speech and language therapy. ASR is a technology that converts human speech into transcript text by matching the input against the system's stored library. This is particularly useful in speech rehabilitation therapies, as it provides accurate, real-time evaluation of speech input from an individual with a speech disorder. ASR-based approaches for speech therapy recognize the speech input from the aphasic patient and provide real-time feedback on their mistakes. However, the accuracy of ASR depends on many factors, such as phoneme recognition, speech continuity, speaker and environmental differences, as well as the depth of our knowledge of human language understanding. Hence, the review examines recent developments in ASR technologies and their performance for individuals with speech and language disorders.
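
    The description of ASR as matching input against a stored library can be made concrete with the classic template-matching approach: dynamic time warping (DTW) between the feature sequence of an utterance and stored word templates. This is a minimal sketch on synthetic MFCC-like features; real therapy systems use far richer acoustic models.

      import numpy as np

      def dtw_distance(x, y):
          # Dynamic time warping distance between two feature sequences
          # (frames x coefficients), the classic way small-vocabulary
          # recognizers matched input speech against stored templates.
          nx, ny = len(x), len(y)
          cost = np.full((nx + 1, ny + 1), np.inf)
          cost[0, 0] = 0.0
          for i in range(1, nx + 1):
              for j in range(1, ny + 1):
                  d = np.linalg.norm(x[i - 1] - y[j - 1])
                  cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
          return cost[nx, ny]

      def recognize(utterance, templates):
          # Return the label of the stored template nearest to the utterance.
          return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))

      rng = np.random.default_rng(4)
      templates = {"yes": rng.standard_normal((40, 13)),   # stand-ins for MFCC sequences
                   "no": rng.standard_normal((35, 13))}
      print(recognize(rng.standard_normal((38, 13)), templates))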

  6. Probing the Electrode–Neuron Interface With Focused Cochlear Implant Stimulation

    PubMed Central

    Bierer, Julie Arenberg

    2010-01-01

    Cochlear implants are highly successful neural prostheses for persons with severe or profound hearing loss who gain little benefit from hearing aid amplification. Although implants are capable of providing important spectral and temporal cues for speech perception, performance on speech tests is variable across listeners. Psychophysical measures obtained from individual implant subjects can also be highly variable across implant channels. This review discusses evidence that such variability reflects deviations in the electrode–neuron interface, which refers to an implant channel's ability to effectively stimulate the auditory nerve. It is proposed that focused electrical stimulation is ideally suited to assess channel-to-channel irregularities in the electrode–neuron interface. In implant listeners, it is demonstrated that channels with relatively high thresholds, as measured with the tripolar configuration, exhibit broader psychophysical tuning curves and smaller dynamic ranges than channels with relatively low thresholds. Broader tuning implies that frequency-specific information intended for one population of neurons in the cochlea may activate more distant neurons, and a compressed dynamic range could make it more difficult to resolve intensity-based information, particularly in the presence of competing noise. Degradation of both types of cues would negatively affect speech perception. PMID:20724356

  7. Terrestrial interface architecture (DSI/DNI)

    NASA Astrophysics Data System (ADS)

    Rieser, J. H.; Onufry, M.

    The 64-kbit/s digital speech interpolation (DSI)/digital noninterpolation (DNI) equipment interfaces the TDMA satellite system with the terrestrial network. This paper provides a functional description of the 64-kbit/s DSI/DNI equipment built at Comsat Laboratories in conformance with the Intelsat TDMA/DSI system specification, and discusses the theoretical and experimental performance of the DSI system. Several DSI-related network and interface issues are discussed, including the interaction between echo-control devices and DSI speech detectors, single and multidestinational DSI operation, location of the DSI equipment relative to the international switching center, and the location and need for Doppler and plesiochronous alignment buffers. The transition from 64-kbit/s DSI to 32-kbit/s low-rate encoding/DSI is expected to begin in 1988. The impact of this transition is discussed as it relates to existing 64-kbit/s DSI/DNI equipment.
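
    DSI gains its capacity by assigning satellite channels only while a talker is active, which hinges on the speech detectors discussed above. The sketch below shows a minimal energy-based speech detector with a hangover period so weak word endings are not clipped; the frame size, threshold, and hangover count are assumptions, not the Intelsat specification values.

      import numpy as np

      def dsi_speech_detector(signal, fs, frame_ms=20, threshold_db=-40.0, hangover=8):
          # Frame-level speech/silence decisions of the kind a DSI system uses
          # to assign channels only while a talker is active.
          frame_len = int(fs * frame_ms / 1000)
          n_frames = len(signal) // frame_len
          frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
          energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
          active = np.zeros(n_frames, dtype=bool)
          counter = 0
          for i, e in enumerate(energy_db):
              if e > threshold_db:
                  counter = hangover          # restart the hangover period
              active[i] = counter > 0
              counter = max(counter - 1, 0)
          return active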

  8. Multimodal Interaction with Speech, Gestures and Haptic Feedback in a Media Center Application

    NASA Astrophysics Data System (ADS)

    Turunen, Markku; Hakulinen, Jaakko; Hella, Juho; Rajaniemi, Juha-Pekka; Melto, Aleksi; Mäkinen, Erno; Rantala, Jussi; Heimonen, Tomi; Laivo, Tuuli; Soronen, Hannu; Hansen, Mervi; Valkama, Pellervo; Miettinen, Toni; Raisamo, Roope

    We demonstrate interaction with a multimodal media center application. The mobile phone-based interface includes speech and gesture input and haptic feedback. The setup resembles our long-term public pilot study, in which a living room environment containing the application was constructed inside a local media museum, allowing visitors to freely test the system.

  9. Comprehension of synthetic speech and digitized natural speech by adults with aphasia.

    PubMed

    Hux, Karen; Knollman-Porter, Kelly; Brown, Jessica; Wallace, Sarah E

    2017-09-01

    Using text-to-speech technology to provide simultaneous written and auditory content presentation may help compensate for chronic reading challenges if people with aphasia can understand synthetic speech output; however, inherent auditory comprehension challenges experienced by people with aphasia may make understanding synthetic speech difficult. This study's purpose was to compare the preferences and auditory comprehension accuracy of people with aphasia when listening to sentences generated with digitized natural speech, Alex synthetic speech (i.e., Macintosh platform), or David synthetic speech (i.e., Windows platform). The methodology required each of 20 participants with aphasia to select one of four images corresponding in meaning to each of 60 sentences comprising three stimulus sets. Results revealed significantly better accuracy given digitized natural speech than either synthetic speech option; however, individual participant performance analyses revealed three patterns: (a) comparable accuracy regardless of speech condition for 30% of participants, (b) comparable accuracy between digitized natural speech and one, but not both, synthetic speech option for 45% of participants, and (c) greater accuracy with digitized natural speech than with either synthetic speech option for remaining participants. Ranking and Likert-scale rating data revealed a preference for digitized natural speech and David synthetic speech over Alex synthetic speech. Results suggest many individuals with aphasia can comprehend synthetic speech options available on popular operating systems. Further examination of synthetic speech use to support reading comprehension through text-to-speech technology is thus warranted. Copyright © 2017 Elsevier Inc. All rights reserved.

  10. Military and government applications of human-machine communication by voice.

    PubMed Central

    Weinstein, C J

    1995-01-01

    This paper describes a range of opportunities for military and government applications of human-machine communication by voice, based on visits and contacts with numerous user organizations in the United States. The applications include some that appear to be feasible by careful integration of current state-of-the-art technology and others that will require a varying mix of advances in speech technology and in integration of the technology into applications environments. Applications that are described include (1) speech recognition and synthesis for mobile command and control; (2) speech processing for a portable multifunction soldier's computer; (3) speech- and language-based technology for naval combat team tactical training; (4) speech technology for command and control on a carrier flight deck; (5) control of auxiliary systems, and alert and warning generation, in fighter aircraft and helicopters; and (6) voice check-in, report entry, and communication for law enforcement agents or special forces. A phased approach for transfer of the technology into applications is advocated, where integration of applications systems is pursued in parallel with advanced research to meet future needs. PMID:7479718

  11. SAM: speech-aware applications in medicine to support structured data entry.

    PubMed Central

    Wormek, A. K.; Ingenerf, J.; Orthner, H. F.

    1997-01-01

    In the last two years, improvements in speech recognition technology have directed the medical community's interest to porting and using such innovations in clinical systems. The acceptance of speech recognition systems in clinical domains increases with recognition speed, a large medical vocabulary, high accuracy, continuous speech recognition, and speaker independence. Although some commercial speech engines approach these requirements, the greatest benefit can be achieved by adapting a speech recognizer to a specific medical application. The goals of our work are, first, to develop a speech-aware core component able to establish connections to speech recognition engines from different vendors; this is realized in SAM. Second, with applications based on SAM, we want to support the physician in his or her routine clinical care activities. Within the STAMP project (STAndardized Multimedia report generator in Pathology), we extend SAM by combining a structured data entry approach with speech recognition technology. Another speech-aware application, in the field of diabetes care, is connected to a terminology server. The server delivers a controlled vocabulary which can be used for speech recognition. PMID:9357730

  12. Being human in a global age of technology.

    PubMed

    Whelton, Beverly J B

    2016-01-01

    This philosophical enquiry considers the impact of a global world view and technology on the meaning of being human. The global vision increases our awareness of the common bond between all humans, while technology tends to separate us from an understanding of ourselves as human persons. We review some advances in connecting as a community within our world, and many examples of technological change. This review is not exhaustive; the focus is to understand enough changes to think through the possibility of healthcare professionals becoming cyborgs, human-machine units that are subsequently neither human nor machine. It is seen that human-technology interfaces are a different way of interacting but do not change what it is to be human in our rational capacities for meaningful speech and freely chosen action. In the highly technical environment of the ICU, expert nurses work in harmony with both the technical equipment and the patient. We used Heidegger to consider the nature of equipment, and Descartes to explore unique human capacities. Aristotle, Wallace, Sokolowski, and Clarke provide a summary of humanity as substantial and relational. © 2015 John Wiley & Sons Ltd.

  13. Initial constructs for patient-centered outcome measures to evaluate brain-computer interfaces

    PubMed Central

    Andresen, Elena M.; Fried-Oken, Melanie; Peters, Betts; Patrick, Donald L.

    2016-01-01

    Purpose: The authors describe preliminary work toward the creation of patient-centered outcome (PCO) measures to evaluate brain-computer interface (BCI) as an assistive technology for individuals with severe speech and physical impairments (SSPI). Method: In Phase 1, 591 items from 15 existing measures were mapped to the International Classification of Functioning, Disability and Health (ICF). In Phase 2, qualitative interviews were conducted with eight people with SSPI and seven caregivers. Resulting text data were coded in an iterative analysis. Results: Most items (79%) mapped to the ICF environmental domain; over half (53%) mapped to more than one domain. The ICF framework was well suited for mapping items related to body functions and structures, but less so for items in other areas, including personal factors. Two constructs emerged from the qualitative data: Quality of Life (QOL) and Assistive Technology. Component domains and themes were identified for each. Conclusions: Preliminary constructs, domains, and themes were generated for future PCO measures relevant to BCI. Existing instruments are sufficient for initial items but do not adequately match the values of people with SSPI and their caregivers. Field methods for interviewing people with SSPI were successful, and support the inclusion of these individuals in PCO research. PMID:25806719

  14. Contributions of speech science to the technology of man-machine voice interactions

    NASA Technical Reports Server (NTRS)

    Lea, Wayne A.

    1977-01-01

    Research in speech understanding was reviewed. Plans which include prosodics research, phonological rules for speech understanding systems, and continued interdisciplinary phonetics research are discussed. Improved acoustic phonetic analysis capabilities in speech recognizers are suggested.

  15. Speech recognition: how good is good enough?

    PubMed

    Krohn, Richard

    2002-03-01

    Since its infancy in the early 1990s, the technology of speech recognition has undergone a rapid evolution. Not only has the reliability of the programming improved dramatically, the return on investment has become increasingly compelling. The author describes some of the latest health care applications of speech-recognition technology, and how the next advances will be made in this area.

  16. Result on speech perception after conversion from Spectra® to Freedom®.

    PubMed

    Magalhães, Ana Tereza de Matos; Goffi-Gomez, Maria Valéria Schmidt; Hoshino, Ana Cristina; Tsuji, Robinson Koji; Bento, Ricardo Ferreira; Brito, Rubens

    2012-04-01

    New technology in the Freedom® speech processor for cochlear implants was developed to improve how incoming acoustic sound is processed; this applies not only to new users but also to previous generations of cochlear implants. The aim was to identify the contribution of this technology, for users of the Nucleus 22®, to speech perception tests in silence and in noise and to audiometric thresholds. A cross-sectional cohort study was undertaken, and seventeen patients were selected. The last map based on the Spectra® was revised and optimized before starting the tests, and troubleshooting was used to identify malfunction. To identify the contribution of the Freedom® technology for the Nucleus 22®, auditory thresholds and speech perception tests were performed in free field in sound-proof booths. Recorded monosyllables and sentences in silence and in noise (SNR = 0 dB) were presented at 60 dB SPL. The nonparametric Wilcoxon test for paired data was used to compare groups. The Freedom® processor applied to the Nucleus 22® showed a statistically significant difference in all speech perception tests and audiometric thresholds. The Freedom® technology improved speech perception performance and audiometric thresholds in patients with the Nucleus 22®.

  17. [A modified speech enhancement algorithm for electronic cochlear implant and its digital signal processing realization].

    PubMed

    Wang, Yulin; Tian, Xuelong

    2014-08-01

    In order to improve speech quality and auditory perception with electronic cochlear implants under strong background noise, a speech enhancement system for the electronic cochlear implant front end was constructed. Taking a digital signal processor (DSP) as its core, the system combines the DSP's multi-channel buffered serial port (McBSP) data transmission channel with the extended audio interface chip TLV320AIC10, realizing high-speed speech signal acquisition and output. Meanwhile, because traditional speech enhancement methods suffer from poor adaptability, slow convergence speed, and large steady-state error, a versiera function and the de-correlation principle were used to improve the existing adaptive filtering algorithm, effectively enhancing the quality of voice communication. Test results verified the stability of the system and the de-noising performance of the algorithm, and showed that they could provide clearer speech signals for deaf or tinnitus patients.
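
    The record's versiera-function and de-correlation modifications are not specified in enough detail to reproduce, but the baseline they improve on, least-mean-squares (LMS) adaptive noise cancellation, can be sketched as follows. The tap count, step size, and synthetic signals are assumptions.

      import numpy as np

      def lms_noise_canceller(primary, reference, n_taps=32, mu=0.01):
          # Classic adaptive noise cancellation: the filter learns to predict
          # the noise component of the primary (speech + noise) input from a
          # reference noise input; the prediction error is the enhanced speech.
          w = np.zeros(n_taps)
          out = np.zeros(len(primary))
          for n in range(n_taps, len(primary)):
              x = reference[n - n_taps:n][::-1]   # most recent samples first
              y = w @ x                           # noise estimate
              e = primary[n] - y                  # enhanced-speech sample
              w += mu * e * x                     # LMS weight update
              out[n] = e
          return out

      fs = 16000
      rng = np.random.default_rng(7)
      noise = rng.standard_normal(fs)
      speech = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)   # stand-in "speech"
      noisy = speech + np.convolve(noise, [0.6, 0.3, 0.1], mode="same")
      clean = lms_noise_canceller(noisy, noise)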

  18. A multilingual audiometer simulator software for training purposes.

    PubMed

    Kompis, Martin; Steffen, Pascal; Caversaccio, Marco; Brugger, Urs; Oesch, Ivo

    2012-04-01

    A set of algorithms which allows a computer to determine the answers of simulated patients during pure tone and speech audiometry is presented. Based on these algorithms, a computer program for training in audiometry was written and found to be useful for teaching purposes. The objective was to develop flexible audiometer simulator software as a teaching and training tool for pure tone and speech audiometry, both with and without masking. First, a set of algorithms was developed which allows a computer to determine the answers of a simulated, hearing-impaired patient. Then, the software was implemented. Extensive use was made of simple, editable text files to define all texts in the user interface and all patient definitions. The 'audiometer simulator' software is available for free download. It can be used to train pure tone audiometry (both with and without masking), speech audiometry, measurement of the uncomfortable level, and simple simulation tests. Owing to the use of text files, the user can alter or add patient definitions and all texts and labels shown on the screen. So far, English, French, German, and Portuguese user interfaces are available, and the user can choose between German and French speech audiometry.
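
    The heart of such a simulator is an algorithm that answers "heard" or "not heard" from a stored patient definition. Below is a minimal sketch under assumed conventions: per-frequency thresholds standing in for a text-file patient definition, a logistic response curve near threshold, and a simple down-10/up-5 bracketing procedure. The published software's actual algorithms and file format may differ.

      import numpy as np

      # Hypothetical patient definition of the kind the simulator loads from a
      # text file: pure-tone thresholds (dB HL) per frequency (Hz) for one ear.
      PATIENT = {250: 20, 500: 25, 1000: 40, 2000: 55, 4000: 70, 8000: 75}

      def patient_responds(freq_hz, level_db_hl, slope_db=5.0,
                           rng=np.random.default_rng()):
          # Near threshold the response is probabilistic, mimicking a real
          # listener's uncertainty.
          threshold = PATIENT[freq_hz]
          p_heard = 1.0 / (1.0 + np.exp(-(level_db_hl - threshold) / slope_db))
          return rng.random() < p_heard

      # Bracketing: descend 10 dB after a response, ascend 5 dB after none
      level = 40
      for _ in range(10):
          level += -10 if patient_responds(1000, level) else 5
      print(f"estimated threshold near {level} dB HL")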

  19. Statistical assessment of speech system performance

    NASA Technical Reports Server (NTRS)

    Moshier, Stephen L.

    1977-01-01

    Methods for the normalization of performance test results of speech recognition systems are presented. Technological accomplishments in speech recognition systems, as well as planned research activities, are described.

  20. Robot Command Interface Using an Audio-Visual Speech Recognition System

    NASA Astrophysics Data System (ADS)

    Ceballos, Alexánder; Gómez, Juan; Prieto, Flavio; Redarce, Tanneguy

    In recent years, audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing, and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This document presents an automatic command recognition system using audio-visual information. The system is intended to control the da Vinci laparoscopic robot. The audio signal is processed using the Mel-frequency cepstral coefficient (MFCC) parametrization method. In addition, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used to extract the visual speech information.
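
    For readers unfamiliar with the acoustic front end named here, the sketch below computes an MFCC parametrization with the librosa library. The file name, sampling rate, and frame settings are assumptions, and the paper's own implementation details are not specified in the record.

      import numpy as np
      import librosa

      # Load an utterance (hypothetical file name) and compute the MFCC
      # parametrization used for the acoustic stream of the recognizer.
      y, sr = librosa.load("command.wav", sr=16000)
      mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                  hop_length=160, n_fft=400)   # 10 ms hop, 25 ms window

      # Append first-order deltas, a common augmentation for recognition features
      delta = librosa.feature.delta(mfcc)
      features = np.vstack([mfcc, delta])       # shape: (26, n_frames)
      print(features.shape)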

  1. The cortical organization of lexical knowledge: A dual lexicon model of spoken language processing

    PubMed Central

    Gow, David W.

    2012-01-01

    Current accounts of spoken language assume the existence of a lexicon where wordforms are stored and interact during spoken language perception, understanding and production. Despite the theoretical importance of the wordform lexicon, the exact localization and function of the lexicon in the broader context of language use is not well understood. This review draws on evidence from aphasia, functional imaging, neuroanatomy, laboratory phonology and behavioral results to argue for the existence of parallel lexica that facilitate different processes in the dorsal and ventral speech pathways. The dorsal lexicon, localized in the inferior parietal region including the supramarginal gyrus, serves as an interface between phonetic and articulatory representations. The ventral lexicon, localized in the posterior superior temporal sulcus and middle temporal gyrus, serves as an interface between phonetic and semantic representations. In addition to their interface roles, the two lexica contribute to the robustness of speech processing. PMID:22498237

  2. TongueToSpeech (TTS): Wearable wireless assistive device for augmented speech.

    PubMed

    Marjanovic, Nicholas; Piccinini, Giacomo; Kerr, Kevin; Esmailbeigi, Hananeh

    2017-07-01

    Speech is an important aspect of human communication; individuals with speech impairment are unable to communicate vocally in real time. Our team has developed the TongueToSpeech (TTS) device with the goal of augmenting speech communication for the vocally impaired. The proposed wearable wireless assistive device incorporates a capacitive touch keyboard interface embedded inside a discreet retainer and connects to a computer, tablet, or smartphone via Bluetooth. The developed TTS application converts text typed by the tongue into audible speech. Our studies have concluded that an 8-contact-point configuration between the tongue and the TTS device yields the best user precision and speed performance. On average, typing a phrase with the TTS device inside the oral cavity takes 2.5 times longer than typing the same phrase with the pointer finger on a T9 (Text on 9 keys) keyboard. In conclusion, we have developed a discreet, noninvasive wearable device that allows vocally impaired individuals to communicate in real time.

  3. Joint Spatial-Spectral Feature Space Clustering for Speech Activity Detection from ECoG Signals

    PubMed Central

    Kanas, Vasileios G.; Mporas, Iosif; Benz, Heather L.; Sgarbas, Kyriakos N.; Bezerianos, Anastasios; Crone, Nathan E.

    2014-01-01

    Brain-machine interfaces for speech restoration have been extensively studied for more than two decades. The success of such a system will depend in part on selecting the best brain recording sites and signal features corresponding to speech production. The purpose of this study was to detect speech activity automatically from electrocorticographic (ECoG) signals based on joint spatial-frequency clustering of the ECoG feature space. For this study, the ECoG signals were recorded while a subject performed two different syllable repetition tasks. We found that the optimal frequency resolution to detect speech activity from ECoG signals was 8 Hz, achieving 98.8% accuracy by employing support vector machines (SVM) as a classifier. We also defined the cortical areas that held the most information about the discrimination of speech and non-speech time intervals. Additionally, the results shed light on the distinct cortical areas associated with the two syllable repetition tasks and may contribute to the development of portable ECoG-based communication. PMID:24658248
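
    A minimal sketch of the detection idea, spectral features at a fixed frequency resolution fed to an SVM classifier, is shown below on synthetic stand-in data. The channel count, sampling rate, and feature details are assumptions, and the study's joint spatial-spectral clustering step is not reproduced here.

      import numpy as np
      from scipy.signal import welch
      from sklearn.svm import SVC

      def band_power_features(ecog, fs, bin_hz=8.0, fmax=200.0):
          # Spectral features at a fixed frequency resolution (the study found
          # ~8 Hz bins worked best) for each channel of a windowed segment.
          # ecog: (n_channels, n_samples) -> flattened feature vector.
          f, psd = welch(ecog, fs=fs, nperseg=int(fs / bin_hz), axis=-1)
          return np.log(psd[:, f <= fmax] + 1e-12).ravel()

      fs = 1000
      rng = np.random.default_rng(5)
      # Stand-ins for labeled speech / non-speech windows (8 channels, 1 s each)
      X = np.array([band_power_features(rng.standard_normal((8, fs)), fs)
                    for _ in range(40)])
      y = np.array([0, 1] * 20)               # 0 = non-speech, 1 = speech

      clf = SVC(kernel="rbf").fit(X[:30], y[:30])
      print("held-out accuracy:", clf.score(X[30:], y[30:]))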

  4. Research in speech communication.

    PubMed

    Flanagan, J

    1995-10-24

    Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker.

  5. Communication Supports for People with Motor Speech Disorders

    ERIC Educational Resources Information Center

    Hanson, Elizabeth K.; Fager, Susan K.

    2017-01-01

    Communication supports for people with motor speech disorders can include strategies and technologies to supplement natural speech efforts, resolve communication breakdowns, and replace natural speech when necessary to enhance participation in all communicative contexts. This article emphasizes communication supports that can enhance…

  6. Automatic feedback to promote safe walking and speech loudness control in persons with multiple disabilities: two single-case studies.

    PubMed

    Lancioni, Giulio E; Singh, Nirbhay N; O'Reilly, Mark F; Green, Vanessa A; Alberti, Gloria; Boccasini, Adele; Smaldone, Angela; Oliva, Doretta; Bosco, Andrea

    2014-08-01

    These two single-case studies assessed automatic feedback technologies to promote safe travel and speech loudness control, respectively, in two men with multiple disabilities. In Study I, the technology involved a microprocessor, two photocells, and a verbal feedback device; the man received verbal alerting/feedback when the photocells detected an obstacle in front of him. In Study II, the technology involved a sound-detecting unit connected to a throat microphone, an airborne microphone, and a vibration device; vibration occurred when the man's speech loudness exceeded a preset level. The man in Study I succeeded in using the automatic feedback, in place of caregivers' alerting/feedback, for safe travel. The man in Study II used the automatic feedback to successfully reduce his speech loudness. Automatic feedback can be highly effective in helping persons with multiple disabilities improve their travel and speech performance.
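
    The loudness-control device in Study II reduces to a thresholding loop over short-time speech levels. The sketch below shows that logic with an assumed level limit and a consecutive-frame rule to avoid spurious alerts; the actual unit's parameters are not given in the record.

      import numpy as np

      def loudness_feedback(frames, limit_db=-20.0, consecutive=3):
          # Trigger vibration-style feedback when the short-time speech level
          # stays above a preset limit for several consecutive frames, so
          # brief peaks do not cause spurious alerts.
          # frames: (n_frames, frame_len) array of audio samples.
          level_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
          run, events = 0, []
          for i, lvl in enumerate(level_db):
              run = run + 1 if lvl > limit_db else 0
              if run == consecutive:
                  events.append(i)    # a real device would pulse the vibrator here
          return events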

  7. The feasibility of miniaturizing the versatile portable speech prosthesis: A market survey of commercial products

    NASA Technical Reports Server (NTRS)

    Walklet, T.

    1981-01-01

    The feasibility of a miniature versatile portable speech prosthesis (VPSP) was analyzed, and information on its potential users and on other similar devices was collected. The VPSP is a device that incorporates speech synthesis technology. The objective was to provide sufficient information to decide whether there is valuable technology to contribute to the miniaturization of the VPSP. The needs of potential users are identified, and the development status of technologies similar or related to those used in the VPSP is evaluated. The VPSP, a computer-based speech synthesis system, fits on a wheelchair. The purpose was to produce a device that provides communication assistance in educational, vocational, and social situations to speech-impaired individuals. It is expected that the VPSP can be a valuable aid for persons who are also motor impaired, which explains the placement of the system on a wheelchair.

  8. Processing of speech signals for physical and sensory disabilities.

    PubMed Central

    Levitt, H

    1995-01-01

    Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities. PMID:7479816

  9. Processing of Speech Signals for Physical and Sensory Disabilities

    NASA Astrophysics Data System (ADS)

    Levitt, Harry

    1995-10-01

    Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities.

  10. Voice technology and BBN

    NASA Technical Reports Server (NTRS)

    Wolf, Jared J.

    1977-01-01

    The following research is discussed: (1) speech signal processing; (2) automatic speech recognition; (3) continuous speech understanding; (4) speaker recognition; (5) speech compression; (6) subjective and objective evaluation of speech communication systems; (7) measurement of the intelligibility and quality of speech degraded by noise or other masking stimuli; (8) speech synthesis; (9) instructional aids for second-language learning and for training of the deaf; and (10) investigation of speech correlates of psychological stress. Experimental psychology, control systems, and human factors engineering, which are often relevant to the proper design and operation of speech systems, are also described.

  11. Hands-free human-machine interaction with voice

    NASA Astrophysics Data System (ADS)

    Juang, B. H.

    2004-05-01

    Voice is a natural communication interface between a human and a machine. The machine, when placed in today's communication networks, may be configured to provide automation to save substantial operating cost, as demonstrated in AT&T's VRCP (Voice Recognition Call Processing), or to facilitate intelligent services, such as virtual personal assistants, to enhance individual productivity. These intelligent services often need to be accessible anytime, anywhere (e.g., in cars when the user is in a hands-busy-eyes-busy situation, or during meetings where constantly talking to a microphone is either undesirable or impossible), and thus call for advanced signal processing and automatic speech recognition techniques which support what we call "hands-free" human-machine communication. These techniques entail a broad spectrum of technical ideas, ranging from the use of directional microphones and acoustic echo cancellation to robust speech recognition. In this talk, we highlight a number of key techniques that were developed for hands-free human-machine communication in the mid-1990s after Bell Labs became a unit of Lucent Technologies. A video clip will be played to demonstrate the accomplishment.

  12. USSR Report, Cybernetics Computers and Automation Technology

    DTIC Science & Technology

    1985-09-05

    understand each other excellently, although in their speech they frequently omit, it would seem, needed words. However, the life experience of the ... participants in a conversation and their perception of voice intonations and gestures make it possible to fill in the missing elements of speech ... the Soviet Union. Comrade M. S. Gorbachev's speech pointed out that microelectronics, computer technology, instrument building and the whole

  13. Activities report of PTT Research

    NASA Astrophysics Data System (ADS)

    In the field of postal infrastructure research, activities were performed on postcode readers, radiolabels, and techniques of operations research and artificial intelligence. In the field of telecommunication, transportation, and information, research was done on multipurpose coding schemes, speech recognition, hypertext, a multimedia information server, security of electronic data interchange, document retrieval, improvement of the quality of user interfaces, domotics living support techniques, and standardization of telecommunication protocols. In the field of telecommunication infrastructure and provisions research, activities were performed on universal personal telecommunications, advanced broadband network technologies, coherent techniques, measurement of audio quality, near-field facilities, local beam communication, local area networks, network security, coupling of broadband and narrowband integrated services digital networks, digital mapping, and standardization of protocols.

  14. Cloud-Based Speech Technology for Assistive Technology Applications (CloudCAST).

    PubMed

    Cunningham, Stuart; Green, Phil; Christensen, Heidi; Atria, José Joaquín; Coy, André; Malavasi, Massimiliano; Desideri, Lorenzo; Rudzicz, Frank

    2017-01-01

    The CloudCAST platform provides a series of speech recognition services that can be integrated into assistive technology applications. The platform and the services provided by the public API are described. Several exemplar applications have been developed to demonstrate the platform to potential developers and users.

  15. Automatic speech recognition and training for severely dysarthric users of assistive technology: the STARDUST project.

    PubMed

    Parker, Mark; Cunningham, Stuart; Enderby, Pam; Hawley, Mark; Green, Phil

    2006-01-01

    The STARDUST project developed robust computer speech recognizers for use by eight people with severe dysarthria and concomitant physical disability to access assistive technologies. Speaker-independent computer speech recognizers trained with normal speech are of limited functional use to those with severe dysarthria, owing to limited and inconsistent proximity to "normal" articulatory patterns. Severely dysarthric output may also be characterized by a small set of distinguishable phonetic tokens, making the acoustic differentiation of target words difficult. Speaker-dependent computer speech recognition using hidden Markov models was achieved by identifying robust phonetic elements within the individual speaker's output patterns. A new system of speech training using computer-generated visual and auditory feedback reduced the inconsistent production of key phonetic tokens over time.
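
    A speaker-dependent, small-vocabulary recognizer of the kind described can be sketched with one Gaussian HMM per target word, trained on that user's own recordings and compared by likelihood at recognition time. The sketch below uses the hmmlearn library on synthetic MFCC-like sequences; the state count and two-word vocabulary are assumptions, not STARDUST's configuration.

      import numpy as np
      from hmmlearn import hmm

      rng = np.random.default_rng(6)

      def train_word_model(examples, n_states=5):
          # Train one HMM per vocabulary word from several recordings,
          # each an (n_frames, n_features) feature array.
          X = np.vstack(examples)
          lengths = [len(e) for e in examples]
          model = hmm.GaussianHMM(n_components=n_states,
                                  covariance_type="diag", n_iter=25)
          model.fit(X, lengths)
          return model

      # Stand-ins for a user's training recordings of two target words
      vocab = {w: [rng.standard_normal((30 + rng.integers(10), 13)) for _ in range(5)]
               for w in ("lamp", "door")}
      models = {w: train_word_model(ex) for w, ex in vocab.items()}

      def recognize(utterance):
          # Pick the word whose model assigns the utterance the highest likelihood.
          return max(models, key=lambda w: models[w].score(utterance))

      print(recognize(rng.standard_normal((33, 13))))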

  16. Difficulties in Automatic Speech Recognition of Dysarthric Speakers and Implications for Speech-Based Applications Used by the Elderly: A Literature Review

    ERIC Educational Resources Information Center

    Young, Victoria; Mihailidis, Alex

    2010-01-01

    Despite their growing presence in home computer applications and various telephony services, commercial automatic speech recognition technologies are still not easily employed by everyone; especially individuals with speech disorders. In addition, relatively little research has been conducted on automatic speech recognition performance with older…

  17. Designing augmentative and alternative communication applications: the results of focus groups with speech-language pathologists and parents of children with autism spectrum disorder.

    PubMed

    Boster, Jamie B; McCarthy, John W

    2018-05-01

    The purpose of this study was to gain insight from speech-language pathologists (SLPs) and parents of children with autism spectrum disorder (ASD) regarding appealing features of augmentative and alternative communication (AAC) applications. Two separate 1-hour focus groups were conducted with 8 SLPs and 5 parents of children with ASD to identify appealing design features of AAC apps, their benefits, and potential concerns. Participants were shown novel interface designs for communication mode, play mode, and incentive systems. Participants responded to poll questions and discussed benefits and drawbacks of the features as part of a structured discussion. SLPs and parents identified a range of appealing features in communication mode (customization, animation, and colour-coding) as well as in play mode (games and videos). SLPs preferred interfaces that supported motor planning and instruction, while parents preferred features, such as character assistants, that would appeal to their child. Overall, SLPs and parents agreed on features for future AAC apps. SLPs and parents have valuable input regarding future AAC app design, informed by their experiences with children with ASD. Both groups are key stakeholders in the design process and should be included in future design and research endeavors. Implications for Rehabilitation: AAC applications for the iPad are often designed based on previous devices without consideration of new features. Ensuring that the designs of new interfaces are appealing and beneficial for children with ASD can potentially further support their communication. This study demonstrates how key stakeholders in AAC, including speech-language pathologists and parents, can provide information to support the development of future AAC interface designs. Key stakeholders may be an untapped resource in the development of future AAC interfaces for children with ASD.

  18. GLOBECOM '89 - IEEE Global Telecommunications Conference and Exhibition, Dallas, TX, Nov. 27-30, 1989, Conference Record. Volumes 1, 2, & 3

    NASA Astrophysics Data System (ADS)

    The present conference discusses topics in multiwavelength network technology and its applications, advanced digital radio systems in their propagation environment, mobile radio communications, switching programmability, advancements in computer communications, integrated-network management and security, HDTV and image processing in communications, basic exchange communications radio, advancements in digital switching, intelligent network evolution, speech coding for telecommunications, and multiple access communications. Also discussed are network designs for quality assurance, recent progress in coherent optical systems, digital radio applications, advanced communications technologies for mobile users, communication software for switching systems, AI and expert systems in network management, intelligent multiplexing nodes, video and image coding, network protocols and performance, system methods in quality and reliability, the design and simulation of lightwave systems, local radio networks, mobile satellite communications systems, fiber network restoration, packet video networks, human interfaces for future networks, and lightwave networking.

  19. What is the Value of Embedding Artificial Emotional Prosody in Human–Computer Interactions? Implications for Theory and Design in Psychological Science

    PubMed Central

    Mitchell, Rachel L. C.; Xu, Yi

    2015-01-01

    In computerized technology, artificial speech is becoming increasingly important, and is already used in ATMs, online gaming and healthcare contexts. However, today’s artificial speech typically sounds monotonous, a main reason for this being the lack of meaningful prosody. One particularly important function of prosody is to convey different emotions. This is because successful encoding and decoding of emotions is vital for effective social cognition, which is increasingly recognized in human–computer interaction contexts. Current attempts to artificially synthesize emotional prosody are much improved relative to early attempts, but there remains much work to be done due to methodological problems, lack of agreed acoustic correlates, and lack of theoretical grounding. If the addition of synthetic emotional prosody is not of sufficient quality, it may risk alienating users instead of enhancing their experience. So the value of embedding emotion cues in artificial speech may ultimately depend on the quality of the synthetic emotional prosody. However, early evidence on reactions to synthesized non-verbal cues in the facial modality bodes well. Attempts to implement the recognition of emotional prosody into artificial applications and interfaces have perhaps been met with greater success, but the ultimate test of synthetic emotional prosody will be to critically compare how people react to synthetic emotional prosody vs. natural emotional prosody, at the behavioral, socio-cognitive and neural levels. PMID:26617563

  20. Effects of Shared Active Surface Technology on the Communication and Speech of Two Preschool Children with Disabilities

    ERIC Educational Resources Information Center

    Travers, Jason C.; Fefer, Sarah A.

    2017-01-01

    Shared active surface (SAS) technology can be described as a supersized tablet computer for multiple simultaneous users. SAS technology has the potential to resolve issues historically associated with learning via single-user computer technology. This study reports findings on the effects of a SAS on the social communication and nonsocial speech of two preschool…

  1. Skills based evaluation of alternative input methods to command a semi-autonomous electric wheelchair.

    PubMed

    Rojas, Mario; Ponce, Pedro; Molina, Arturo

    2016-08-01

    This paper presents the evaluation, under standardized metrics, of alternative input methods to steer and maneuver a semi-autonomous electric wheelchair. The Human-Machine Interface (HMI), which includes a virtual joystick, head-movement and speech-recognition controls, was designed to facilitate mobility skills for severely disabled people. Thirteen tasks, common to all wheelchair users, were attempted five times each, controlling the wheelchair with the virtual joystick and the hands-free interfaces in different areas for disabled and non-disabled people. Even though the prototype has an intelligent navigation control, based on fuzzy logic and ultrasonic sensors, the evaluation was done without assistance. The scores showed that two of the controls, head movements and the virtual joystick, have similar capabilities: 92.3% and 100%, respectively. However, the 54.6% capacity score obtained for the speech-control interface indicates the need for navigation assistance to accomplish some of the goals. Furthermore, the evaluation times indicate which skills require more user training with each interface and suggest specifications to improve the overall performance of the wheelchair.
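
    A hedged illustration of how such capacity scores can be computed: treating each of the 13 tasks as pass/fail, 12 completed tasks yield the 92.3% reported for the head-movement control. The aggregation rule below is an assumption for illustration, not the study's standardized metric.

    # Sketch: capacity score as the percentage of wheelchair-skills tasks
    # completed. The pass/fail aggregation is an illustrative assumption,
    # not the exact standardized metric from the study.
    def capacity_score(results):
        """results: list of booleans, one per task (True = task accomplished)."""
        return 100.0 * sum(results) / len(results)

    head_control = [True] * 12 + [False]           # hypothetical: 12 of 13 tasks passed
    print(round(capacity_score(head_control), 1))  # 92.3, matching the head-movement score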

  2. Multidisciplinary unmanned technology teammate (MUTT)

    NASA Astrophysics Data System (ADS)

    Uzunovic, Nenad; Schneider, Anne; Lacaze, Alberto; Murphy, Karl; Del Giorno, Mark

    2013-01-01

    The U.S. Army Tank Automotive Research, Development and Engineering Center (TARDEC) held an autonomous robot competition called CANINE in June 2012. The goal of the competition was to develop innovative and natural control methods for robots. This paper describes the winning technology, including the vision system, the operator interaction, and the autonomous mobility. The rules stated that only gestures or voice commands could be used for control. The robots would learn a new object at the start of each phase, find the object after it was thrown into a field, and return the object to the operator. Each of the six phases became more difficult, including clutter of the same color or shape as the object, moving and stationary obstacles, and finding the operator, who moved from the starting location to a new location. The Robotic Research Team integrated techniques in computer vision, speech recognition, object manipulation, and autonomous navigation. A multi-filter computer vision solution reliably detected the objects while rejecting objects of similar color or shape, even while the robot was in motion. A speech-based interface with short commands provided close to natural communication of complicated commands from the operator to the robot. An innovative gripper design allowed for efficient object pickup. A robust autonomous mobility and navigation solution for ground robotic platforms provided fast and reliable obstacle avoidance and course navigation. The research approach focused on winning the competition while remaining cognizant of and relevant to real-world applications.

  3. VASP-4096: a very high performance programmable device for digital media processing applications

    NASA Astrophysics Data System (ADS)

    Krikelis, Argy

    2001-03-01

    Over the past few years, technology drivers for microprocessors have changed significantly. Media data delivery and processing--such as telecommunications, networking, video processing, speech recognition and 3D graphics--is increasing in importance and will soon dominate the processing cycles consumed in computer-based systems. This paper presents the architecture of the VASP-4096 processor. VASP-4096 provides high media performance with low energy consumption by integrating associative SIMD parallel processing with embedded microprocessor technology. The major innovation in the VASP-4096 is the integration of thousands of processing units on a single chip, capable of supporting software-programmable high-performance mathematical functions as well as abstract data processing. In addition to 4096 processing units, VASP-4096 integrates on a single chip a RISC controller that is an implementation of the SPARC architecture, 128 Kbytes of data memory, and I/O interfaces. The SIMD array in VASP-4096 implements the ASProCore architecture, a proprietary implementation of SIMD processing, and operates at 266 MHz with program instructions issued by the RISC controller. The device also integrates a 64-bit synchronous main memory interface operating at 133 MHz (double-data rate) and a 64-bit 66 MHz PCI interface. Compared with other processor architectures that support media processing, VASP-4096 offers true performance scalability, support for deterministic and non-deterministic data processing on a single device, and software programmability that can be reused in future chip generations.

  4. Emotion-prints: interaction-driven emotion visualization on multi-touch interfaces

    NASA Astrophysics Data System (ADS)

    Cernea, Daniel; Weber, Christopher; Ebert, Achim; Kerren, Andreas

    2015-01-01

    Emotions are one of the unique aspects of human nature, and sadly at the same time one of the elements that our technological world is failing to capture and consider due to their subtlety and inherent complexity. But with the current dawn of new technologies that enable the interpretation of emotional states based on techniques involving facial expressions, speech and intonation, electrodermal response (EDS) and brain-computer interfaces (BCIs), we are finally able to access real-time user emotions in various system interfaces. In this paper we introduce emotion-prints, an approach for visualizing user emotional valence and arousal in the context of multi-touch systems. Our goal is to offer a standardized technique for representing user affective states in the moment when and at the location where the interaction occurs, in order to increase affective self-awareness, support awareness in collaborative and competitive scenarios, and offer a framework for aiding the evaluation of touch applications through emotion visualization. We show that emotion-prints are not only independent of the shape of the graphical objects on the touch display, but also that they can be applied regardless of the acquisition technique used for detecting and interpreting user emotions. Moreover, our representation can encode any affective information that can be decomposed or reduced to Russell's two-dimensional space of valence and arousal. Our approach is supported by a BCI-based user study and a follow-up discussion of advantages and limitations.
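
    As a minimal sketch of the mapping this record implies, the code below reduces an affective state to Russell's valence-arousal plane and renders it as a color that could tint a touch point. The hue and brightness encoding is an assumption for illustration; the paper's actual visual design may differ.

    # Sketch: map a (valence, arousal) pair from Russell's circumplex to an
    # RGB color that could tint a touch point. Encoding choices are assumptions.
    import colorsys

    def emotion_color(valence, arousal):
        """valence in [-1, 1] (negative..positive), arousal in [0, 1] (calm..excited)."""
        hue = (valence + 1.0) / 2.0 * 0.33      # 0.0 = red (negative) .. 0.33 = green (positive)
        saturation = 0.4 + 0.6 * arousal        # higher arousal -> more vivid
        value = 0.5 + 0.5 * arousal             # higher arousal -> brighter
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        return int(r * 255), int(g * 255), int(b * 255)

    print(emotion_color(valence=0.8, arousal=0.9))   # excited-positive: vivid green
    print(emotion_color(valence=-0.7, arousal=0.2))  # calm-negative: muted red-orange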

  5. An Investigation of the Compensatory Effectiveness of Assistive Technology on Postsecondary Students with Learning Disabilities. Final Report.

    ERIC Educational Resources Information Center

    Murphy, Harry; Higgins, Eleanor

    This final report describes the activities and accomplishments of a 3-year study on the compensatory effectiveness of three assistive technologies (optical character recognition, speech synthesis, and speech recognition) for postsecondary students (N=140) with learning disabilities. These technologies were investigated relative to: (1) immediate…

  6. Teaching Speech Communication in a Black College: Does Technology Make a Difference?

    ERIC Educational Resources Information Center

    Nwadike, Fellina O.; Ekeanyanwu, Nnamdi T.

    2011-01-01

    Teaching a speech communication course in typical HBCUs (historically black colleges and universities) comes with many issues, because the application of technology in some minority institutions differs. The levels of acceptability as well as affordability are also core issues that affect application. Using technology in the classroom means many…

  7. Language Model Applications to Spelling with Brain-Computer Interfaces

    PubMed Central

    Mora-Cortes, Anderson; Manyakov, Nikolay V.; Chumerin, Nikolay; Van Hulle, Marc M.

    2014-01-01

    Within the Ambient Assisted Living (AAL) community, Brain-Computer Interfaces (BCIs) have raised great hopes as they provide alternative communication means for persons with disabilities, bypassing the need for speech and other motor activities. Although significant advancements have been realized in the last decade, applications of language models (e.g., word prediction, completion) have only recently started to appear in BCI systems. The main goal of this article is to review the language model applications that supplement non-invasive BCI-based communication systems by discussing their potential and limitations, and to discern future trends. First, a brief overview of the most prominent BCI spelling systems is given, followed by an in-depth discussion of the language models applied to them. These language models are classified according to their functionality in the context of BCI-based spelling: the static/dynamic nature of the user interface, the use of error correction and predictive spelling, and the potential to improve classification performance by using language models. To conclude, the review offers an overview of the advantages and challenges of implementing language models in BCI-based communication systems in conjunction with other AAL technologies. PMID:24675760
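
    To make the role of such language models concrete, the sketch below shows prefix-based word completion of the kind a BCI speller might consult after each selected letter. The toy corpus and frequency ranking are assumptions for illustration; deployed systems use far larger n-gram or neural models.

    # Sketch: prefix-based word prediction of the kind a BCI speller could use
    # to cut down the number of letter selections. The toy corpus is an assumption.
    from collections import Counter

    corpus = "the patient wants water the patient wants to watch television".split()
    unigrams = Counter(corpus)

    def complete(prefix, k=3):
        """Return the k most frequent corpus words starting with `prefix`."""
        candidates = {w: c for w, c in unigrams.items() if w.startswith(prefix)}
        return [w for w, _ in Counter(candidates).most_common(k)]

    print(complete("wa"))  # e.g. ['wants', 'water', 'watch']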

  8. Research in speech communication.

    PubMed Central

    Flanagan, J

    1995-01-01

    Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming around these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker. PMID:7479806

  9. Lip Movement Exaggerations during Infant-Directed Speech

    ERIC Educational Resources Information Center

    Green, Jordan R.; Nip, Ignatius S. B.; Wilson, Erin M.; Mefferd, Antje S.; Yunusova, Yana

    2010-01-01

    Purpose: Although a growing body of literature has identified the positive effects of visual speech on speech and language learning, oral movements of infant-directed speech (IDS) have rarely been studied. This investigation used 3-dimensional motion capture technology to describe how mothers modify their lip movements when talking to their…

  10. Automated Assessment of Speech Fluency for L2 English Learners

    ERIC Educational Resources Information Center

    Yoon, Su-Youn

    2009-01-01

    This dissertation provides an automated scoring method of speech fluency for second language learners of English (L2 learners) that uses speech recognition technology. Non-standard pronunciation, frequent disfluencies, faulty grammar, and inappropriate lexical choices are crucial characteristics of L2 learners' speech. Due to the ease of…

  11. Speech coding at 4800 bps for mobile satellite communications

    NASA Technical Reports Server (NTRS)

    Gersho, Allen; Chan, Wai-Yip; Davidson, Grant; Chen, Juin-Hwey; Yong, Mei

    1988-01-01

    A speech compression project has recently been completed to develop a speech coding algorithm suitable for operation in a mobile satellite environment, aimed at providing telephone-quality natural speech at 4.8 kbps. The work has resulted in two alternative techniques which achieve reasonably good communications quality at 4.8 kbps while tolerating vehicle noise and rather severe channel impairments. The algorithms are embodied in a compact self-contained prototype consisting of two AT&T 32-bit floating-point DSP32 digital signal processors (DSPs). A Motorola 68HC11 microcomputer chip serves as the board controller and interface handler. On a wirewrapped card, the prototype's circuit footprint amounts to only 200 sq cm, and it consumes about 9 watts of power.

  12. Task-Oriented, Naturally Elicited Speech (TONE) Database for the Force Requirements Expert System, Hawaii (FRESH)

    DTIC Science & Technology

    1988-09-01

    Keywords: command and control; computational linguistics; expert systems; voice recognition; man-machine interface; U.S. Government. Abstract: ... simulates the characteristics of FRESH on a smaller scale. This study assisted NOSC in developing a voice-recognition, man-machine interface that could be used with TONE and upgraded at a later date.

  13. The Affordance of Speech Recognition Technology for EFL Learning in an Elementary School Setting

    ERIC Educational Resources Information Center

    Liaw, Meei-Ling

    2014-01-01

    This study examined the use of speech recognition (SR) technology to support a group of elementary school children's learning of English as a foreign language (EFL). SR technology has been used in various language learning contexts. Its application to EFL teaching and learning is still relatively recent, but a solid understanding of its…

  14. The Effect of Automatic Speech Recognition Eyespeak Software on Iraqi Students' English Pronunciation: A Pilot Study

    ERIC Educational Resources Information Center

    Sidgi, Lina Fathi Sidig; Shaari, Ahmad Jelani

    2017-01-01

    Technology such as computer-assisted language learning (CALL) is used in teaching and learning in the foreign language classrooms where it is most needed. One promising emerging technology that supports language learning is automatic speech recognition (ASR). Integrating such technology, especially in the instruction of pronunciation…

  15. Yaounde French Speech Corpus

    DTIC Science & Technology

    2017-03-01

    ... the Center for Technology Enhanced Language Learning (CTELL), a research cell in the Department of Foreign Languages, United States Military Academy ... models for automatic speech recognition (ASR), and to, thereby, investigate the utility of ASR in pedagogical technology. The corpus is a sample of ... Keywords: lexical resources, language technology.

  16. Converted and upgraded maps programmed in the newer speech processor for the first generation of multichannel cochlear implant.

    PubMed

    Magalhães, Ana Tereza de Matos; Goffi-Gomez, M Valéria Schmidt; Hoshino, Ana Cristina; Tsuji, Robinson Koji; Bento, Ricardo Ferreira; Brito, Rubens

    2013-09-01

    To identify the technological contributions of the newer version of the speech processor to the first generation of multichannel cochlear implant, and the satisfaction of users of the new technology. Among the new features available, we focused on the effect of the frequency allocation table, the T-SPL and C-SPL, and the preprocessing gain adjustments (adaptive dynamic range optimization). Prospective exploratory study. Cochlear implant center at a hospital. Cochlear implant users of the Spectra processor with speech recognition in closed set. Seventeen patients between the ages of 15 and 82, implanted for more than 8 years, were selected. The intervention was the technology update of the speech processor for the Nucleus 22. To determine the Freedom processor's contribution, threshold and speech perception tests were performed with the last map used with the Spectra and with the maps created for the Freedom. To identify the effect of the frequency allocation table, both upgraded and converted maps were programmed. One map was programmed with 25 dB T-SPL and 65 dB C-SPL, and another map with adaptive dynamic range optimization. To assess satisfaction, the SADL and APHAB questionnaires were used. All speech perception tests and all sound field thresholds were statistically better with the new speech processor; 64.7% of patients preferred maintaining the same frequency table that had been suggested for the older processor. The sound field threshold was statistically significant at 500, 1,000, 1,500, and 2,000 Hz with 25 dB T-SPL/65 dB C-SPL. Regarding patient satisfaction, there was a statistically significant improvement only in the subscales of speech-in-noise abilities and phone use. The new technology improved the performance of patients with the first generation of multichannel cochlear implant.

  17. Development of speech prostheses: current status and recent advances

    PubMed Central

    Brumberg, Jonathan S; Guenther, Frank H

    2010-01-01

    Brain–computer interfaces (BCIs) have been developed over the past decade to restore communication to persons with severe paralysis. In the most severe cases of paralysis, known as locked-in syndrome, patients retain cognition and sensation, but are capable of only slight voluntary eye movements. For these patients, no standard communication method is available, although some can use BCIs to communicate by selecting letters or words on a computer. Recent research has sought to improve on existing techniques by using BCIs to create a direct prediction of speech utterances rather than to simply control a spelling device. Such methods are the first steps towards speech prostheses as they are intended to entirely replace the vocal apparatus of paralyzed users. This article outlines many well known methods for restoration of communication by BCI and illustrates the difference between spelling devices and direct speech prediction or speech prosthesis. PMID:20822389

  18. Building Interfaces between the Humanities and Cognitive Sciences: The Case of Human Speech

    ERIC Educational Resources Information Center

    Benus, Stefan

    2010-01-01

    I argue that creating "interfaces" between the humanities and cognitive sciences would be intellectually stimulating for both groups. More specifically for the humanities: they might gain challenging and rewarding avenues of inquiry, attract more funding, and advance their position in the 21st-century universities and among the general public, if…

  19. Advances in real-time magnetic resonance imaging of the vocal tract for speech science and technology research.

    PubMed

    Toutios, Asterios; Narayanan, Shrikanth S

    2016-01-01

    Real-time magnetic resonance imaging (rtMRI) of the moving vocal tract during running speech production is an important emerging tool for speech production research providing dynamic information of a speaker's upper airway from the entire mid-sagittal plane or any other scan plane of interest. There have been several advances in the development of speech rtMRI and corresponding analysis tools, and their application to domains such as phonetics and phonological theory, articulatory modeling, and speaker characterization. An important recent development has been the open release of a database that includes speech rtMRI data from five male and five female speakers of American English each producing 460 phonetically balanced sentences. The purpose of the present paper is to give an overview and outlook of the advances in rtMRI as a tool for speech research and technology development.

  20. Advances in real-time magnetic resonance imaging of the vocal tract for speech science and technology research

    PubMed Central

    TOUTIOS, ASTERIOS; NARAYANAN, SHRIKANTH S.

    2016-01-01

    Real-time magnetic resonance imaging (rtMRI) of the moving vocal tract during running speech production is an important emerging tool for speech production research providing dynamic information of a speaker's upper airway from the entire mid-sagittal plane or any other scan plane of interest. There have been several advances in the development of speech rtMRI and corresponding analysis tools, and their application to domains such as phonetics and phonological theory, articulatory modeling, and speaker characterization. An important recent development has been the open release of a database that includes speech rtMRI data from five male and five female speakers of American English each producing 460 phonetically balanced sentences. The purpose of the present paper is to give an overview and outlook of the advances in rtMRI as a tool for speech research and technology development. PMID:27833745

  1. Three Trailblazing Technologies for Schools.

    ERIC Educational Resources Information Center

    McGinty, Tony

    1987-01-01

    Provides an overview of the capabilities and potential educational applications of CD-ROM (compact disk read-only memory), artificial intelligence, and speech technology. Highlights include reference materials on CD-ROM; current developments in CD-I (compact disk interactive); synthesized and digital speech for microcomputers, including specific…

  2. The benefits of remote microphone technology for adults with cochlear implants.

    PubMed

    Fitzpatrick, Elizabeth M; Séguin, Christiane; Schramm, David R; Armstrong, Shelly; Chénier, Josée

    2009-10-01

    Cochlear implantation has become a standard practice for adults with severe to profound hearing loss who demonstrate limited benefit from hearing aids. Despite the substantial auditory benefits provided by cochlear implants, many adults experience difficulty understanding speech in noisy environments and in other challenging listening conditions such as television. Remote microphone technology may provide some benefit in these situations; however, little is known about whether these systems are effective in improving speech understanding in difficult acoustic environments for this population. This study was undertaken with adult cochlear implant recipients to assess the potential benefits of remote microphone technology. The objectives were to examine the measurable and perceived benefit of remote microphone devices during television viewing and to assess the benefits of a frequency-modulated system for speech understanding in noise. Fifteen adult unilateral cochlear implant users were fit with remote microphone devices in a clinical environment. The study used a combination of direct measurements and patient perceptions to assess speech understanding with and without remote microphone technology. The direct measures involved a within-subject repeated-measures design. Direct measures of patients' speech understanding during television viewing were collected using their cochlear implant alone and with their implant device coupled to an assistive listening device. Questionnaires were administered to document patients' perceptions of benefits during the television-listening tasks. Speech recognition tests of open-set sentences in noise with and without remote microphone technology were also administered. Participants showed improved speech understanding for television listening when using remote microphone devices coupled to their cochlear implant compared with a cochlear implant alone. This benefit was documented when listening to both news and talk show recordings. Questionnaire results also showed statistically significant differences between listening with a cochlear implant alone and listening with a remote microphone device. Participants judged that remote microphone technology provided them with better comprehension, more confidence, and greater ease of listening. Use of a frequency-modulated system coupled to a cochlear implant also showed significant improvement over a cochlear implant alone for open-set sentence recognition in +10 and +5 dB signal to noise ratios. Benefits were measured during remote microphone use in focused-listening situations in a clinical setting, for both television viewing and speech understanding in noise in the audiometric sound suite. The results suggest that adult cochlear implant users should be counseled regarding the potential for enhanced speech understanding in difficult listening environments through the use of remote microphone technology.

  3. Performance of a low data rate speech codec for land-mobile satellite communications

    NASA Technical Reports Server (NTRS)

    Gersho, Allen; Jedrey, Thomas C.

    1990-01-01

    In an effort to foster the development of new technologies for the emerging land mobile satellite communications services, JPL funded two development contracts in 1984: one to the Univ. of Calif., Santa Barbara and the other to the Georgia Inst. of Technology, to develop algorithms and real-time hardware for near-toll-quality speech compression at 4800 bits per second. Both universities have developed and delivered speech codecs to JPL, and the UCSB codec was extensively tested by JPL in a variety of experimental setups. The basic UCSB speech codec algorithms and the test results of the various experiments performed with this codec are presented.

  4. Feasibility of automated speech sample collection with stuttering children using interactive voice response (IVR) technology.

    PubMed

    Vogel, Adam P; Block, Susan; Kefalianos, Elaina; Onslow, Mark; Eadie, Patricia; Barth, Ben; Conway, Laura; Mundt, James C; Reilly, Sheena

    2015-04-01

    To investigate the feasibility of adopting automated interactive voice response (IVR) technology for remotely capturing standardized speech samples from stuttering children. Participants were ten 6-year-old stuttering children. Their parents called a toll-free number from their homes and were prompted to elicit speech from their children using a standard protocol involving conversation, picture description and games. The automated IVR system was implemented using an off-the-shelf telephony software program and delivered by a standard desktop computer. The software infrastructure utilizes voice over internet protocol. Speech samples were automatically recorded during the calls. Video recordings were simultaneously acquired in the home at the time of the call to evaluate the fidelity of the telephone-collected samples. Key outcome measures included syllables spoken, percentage of syllables stuttered and an overall rating of stuttering severity using a 10-point scale. Data revealed a high level of relative reliability, in terms of intra-class correlation between the video and telephone acquired samples, on all outcome measures during the conversation task. Findings were less consistent for speech samples during picture description and games. Results suggest that IVR technology can be used successfully to automate remote capture of child speech samples.

  5. [Research on Barrier-free Home Environment System Based on Speech Recognition].

    PubMed

    Zhu, Husheng; Yu, Hongliu; Shi, Ping; Fang, Youfang; Jian, Zhuo

    2015-10-01

    The number of people with physical disabilities is increasing year by year, and the trend of population aging is becoming more and more serious. In order to improve quality of life, a control system for an accessible home environment was developed to let patients with severe disabilities control home electrical devices by voice. The control system includes a central control platform, a speech recognition module, a terminal operation module, etc. The system combines speech-recognition control technology and wireless information transmission with embedded mobile computing, and interconnects the lamps, electronic locks, alarms, TV and other electrical devices in the home environment as a whole system through wireless network nodes. The experimental results showed that the speech recognition success rate was more than 84% in the home environment.
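
    A minimal sketch of the dispatch layer such a system needs between the speech recognizer and the wireless device nodes: recognized phrases map to device actions. The phrase table and transport function below are hypothetical placeholders; the record does not specify the system's actual modules or protocol.

    # Sketch: dispatch recognized voice commands to home-device actions.
    # The phrase table and the transport function are hypothetical placeholders.
    COMMANDS = {
        "turn on the lamp":  ("lamp", "on"),
        "turn off the lamp": ("lamp", "off"),
        "lock the door":     ("door_lock", "engage"),
        "raise the alarm":   ("alarm", "trigger"),
    }

    def send_to_node(device, action):
        # Placeholder for the wireless-network transmission used in the paper.
        print(f"-> {device}: {action}")

    def handle_utterance(text):
        action = COMMANDS.get(text.strip().lower())
        if action is None:
            print("Command not recognized; please repeat.")  # recognition misses (~16% in the study)
        else:
            send_to_node(*action)

    handle_utterance("Turn on the lamp")
    handle_utterance("open the window")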

  6. Nurses using futuristic technology in today's healthcare setting.

    PubMed

    Wolf, Debra M; Kapadia, Amar; Kintzel, Jessie; Anton, Bonnie B

    2009-01-01

    Human-computer interaction (HCI) here means nurses using voice-assisted technology within a clinical setting to document patient care in real time, retrieve patient information from care plans, and complete routine tasks. Clinicians already use this technology in acute and long-term care settings. Voice-assisted documentation provides accurate, hands- and eyes-free documentation while enabling effective communication and task management. The speech technology increases the accuracy of documentation while interfacing directly with the electronic health record (EHR). Using a lightweight headset and a fist-sized wireless computer, verbal responses to easy-to-follow cues are converted into database entries, allowing staff to obtain individualized care status reports on demand. To further assist staff in their daily work, this innovative technology also allows them to send and receive pages as needed. This paper will discuss how this leading-edge, award-winning technology is being integrated within the United States. Collaborative efforts between clinicians and analysts will be discussed, reflecting the interactive design-and-build process. Features such as the system's voice responses and directed cues will be shared, along with how easily data can be documented, viewed and retrieved. Outcome data will be presented on how the technology affected the organization's quality outcomes, financial reimbursement, and employee satisfaction.

  7. Six characteristics of effective structured reporting and the inevitable integration with speech recognition.

    PubMed

    Liu, David; Zucherman, Mark; Tulloss, William B

    2006-03-01

    The reporting of radiological images is undergoing dramatic changes due to the introduction of two new technologies: structured reporting and speech recognition. Each technology has its own unique advantages. The highly organized content of structured reporting facilitates data mining and billing, whereas speech recognition offers a natural succession from the traditional dictation-transcription process. This article clarifies the distinction between the process and outcome of structured reporting, describes fundamental requirements for any effective structured reporting system, and describes the potential development of a novel, easy-to-use, customizable structured reporting system that incorporates speech recognition. This system should have all the advantages derived from structured reporting, accommodate a wide variety of user needs, and incorporate speech recognition as a natural component and extension of the overall reporting process.

  8. Assistive Technology and Adults with Learning Disabilities: A Blueprint for Exploration and Advancement.

    ERIC Educational Resources Information Center

    Raskind, Marshall

    1993-01-01

    This article describes assistive technologies for persons with learning disabilities, including word processing, spell checking, proofreading programs, outlining/"brainstorming" programs, abbreviation expanders, speech recognition, speech synthesis/screen review, optical character recognition systems, personal data managers, free-form databases,…

  9. Review of Speech-to-Text Recognition Technology for Enhancing Learning

    ERIC Educational Resources Information Center

    Shadiev, Rustam; Hwang, Wu-Yuin; Chen, Nian-Shing; Huang, Yueh-Min

    2014-01-01

    This paper reviewed literature from 1999 to 2014 inclusive on how Speech-to-Text Recognition (STR) technology has been applied to enhance learning. The first aim of this review is to understand how STR technology has been used to support learning over the past fifteen years, and the second is to analyze all research evidence to understand how…

  10. Automatic speech recognition technology development at ITT Defense Communications Division

    NASA Technical Reports Server (NTRS)

    White, George M.

    1977-01-01

    An assessment of the applications of automatic speech recognition to defense communication systems is presented. Future research efforts include investigations into the following areas: (1) dynamic programming; (2) recognition of speech degraded by noise; (3) speaker independent recognition; (4) large vocabulary recognition; (5) word spotting and continuous speech recognition; and (6) isolated word recognition.

  11. Speech Perception Benefits of FM and Infrared Devices to Children with Hearing Aids in a Typical Classroom

    ERIC Educational Resources Information Center

    Anderson, Karen L.; Goldstein, Howard

    2004-01-01

    Children typically learn in classroom environments that have background noise and reverberation that interfere with accurate speech perception. Amplification technology can enhance the speech perception of students who are hard of hearing. Purpose: This study used a single-subject alternating treatments design to compare the speech recognition…

  12. Using the Electrocorticographic Speech Network to Control a Brain-Computer Interface in Humans

    PubMed Central

    Leuthardt, Eric C.; Gaona, Charles; Sharma, Mohit; Szrama, Nicholas; Roland, Jarod; Freudenberg, Zac; Solis, Jamie; Breshears, Jonathan; Schalk, Gerwin

    2013-01-01

    Electrocorticography (ECoG) has emerged as a new signal platform for brain-computer interface (BCI) systems. Classically, the cortical physiology that has been commonly investigated and utilized for device control in humans has been brain signals from sensorimotor cortex. Hence, it was unknown whether other neurophysiological substrates, such as the speech network, could be used to further improve on or complement existing motor-based control paradigms. We demonstrate here for the first time that ECoG signals associated with different overt and imagined phoneme articulation can enable invasively monitored human patients to control a one-dimensional computer cursor rapidly and accurately. This phonetic content was distinguishable within higher gamma frequency oscillations and enabled users to achieve final target accuracies between 68 and 91% within 15 minutes. Additionally, one of the patients achieved robust control using recordings from a microarray consisting of 1 mm spaced microwires. These findings suggest that the cortical network associated with speech could provide an additional cognitive and physiologic substrate for BCI operation and that these signals can be acquired from a cortical array that is small and minimally invasive. PMID:21471638
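
    The control signal described in this record rests on power in high-gamma bands. Below is a hedged sketch of one common way to extract such a feature: band-pass filtering an ECoG channel and taking log power over a short window. The band edges, window length, and sampling rate are assumptions, not the authors' exact pipeline.

    # Sketch: high-gamma log-power feature from one ECoG channel, of the kind
    # used for phoneme-based cursor control. Band edges, window length, and
    # sampling rate are illustrative assumptions.
    import numpy as np
    from scipy.signal import butter, filtfilt

    fs = 1000.0                                   # sampling rate (Hz), assumed
    rng = np.random.default_rng(0)
    ecog = rng.standard_normal(int(fs * 2))       # stand-in for 2 s of one channel

    b, a = butter(4, [70.0, 110.0], btype="bandpass", fs=fs)
    high_gamma = filtfilt(b, a, ecog)

    window = int(0.1 * fs)                        # 100 ms analysis window
    power = high_gamma[-window:] ** 2
    feature = np.log(power.mean() + 1e-12)        # log band power -> classifier/cursor input
    print(feature)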

  13. The Benefit of Remote Microphones Using Four Wireless Protocols.

    PubMed

    Rodemerk, Krishna S; Galster, Jason A

    2015-09-01

    Many studies have reported the speech recognition benefits of a personal remote microphone system when used by adult listeners with hearing loss. The advance of wireless technology has allowed for many wireless audio transmission protocols. Some of these protocols interface with commercially available hearing aids. As a result, commercial remote microphone systems use a variety of different protocols for wireless audio transmission. It is not known how these systems compare with regard to adult speech recognition in noise. The primary goal of this investigation was to determine the speech recognition benefits of four different commercially available remote microphone systems, each with a different wireless audio transmission protocol. A repeated-measures design was used in this study. Sixteen adults, ages 52 to 81 yr, with mild to severe sensorineural hearing loss participated in this study. Participants were fit with three different sets of bilateral hearing aids and four commercially available remote microphone systems (FM, 900 MHz, 2.4 GHz, and Bluetooth® paired with near-field magnetic induction). Speech recognition scores were measured by an adaptive version of the Hearing in Noise Test (HINT). The participants were seated both 6 and 12 ft away from the talker loudspeaker. Participants repeated HINT sentences with and without hearing aids and with the four commercially available remote microphone systems in both seated positions, with and without contributions from the hearing aid or environmental microphone (24 total conditions). The HINT SNR-50, or the signal-to-noise ratio required for correct repetition of 50% of the sentences, was recorded for all conditions. A one-way repeated measures analysis of variance was used to determine the statistical significance of microphone condition. The results of this study revealed that use of the remote microphone systems significantly improved speech recognition in noise relative to unaided and hearing aid-only conditions across all four wireless transmission protocols at 6 and 12 ft away from the talker. Participants showed a significant improvement in speech recognition in noise when comparing the four remote microphone systems, with different wireless transmission methods, to hearing aids alone.
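
    For readers unfamiliar with the adaptive HINT: the SNR is raised after an incorrect sentence repetition and lowered after a correct one, converging on the SNR-50. The simulation below is a generic 1-down/1-up staircase under an assumed logistic listener model, not the official HINT scoring procedure.

    # Sketch: generic 1-down/1-up adaptive track converging on SNR-50, with a
    # simulated listener. Step size and listener model are assumptions, not
    # the official HINT procedure.
    import math
    import random

    def listener_correct(snr, snr50=-4.0, slope=0.8):
        """Simulated listener: logistic chance of repeating a sentence correctly."""
        p = 1.0 / (1.0 + math.exp(-slope * (snr - snr50)))
        return random.random() < p

    random.seed(1)
    snr, step = 0.0, 2.0
    track = []
    for _ in range(30):                # 30 sentences
        track.append(snr)
        snr += -step if listener_correct(snr) else step

    tail = track[10:]                  # discard the initial run-in
    print(f"estimated SNR-50: {sum(tail) / len(tail):.1f} dB")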

  14. Application of speech recognition and synthesis in the general aviation cockpit

    NASA Technical Reports Server (NTRS)

    North, R. A.; Mountford, S. J.; Bergeron, H.

    1984-01-01

    Interactive speech recognition/synthesis technology is assessed as a method for the alleviation of single-pilot IFR flight workloads. Attention was given during this series of evaluations to the conditions typical of general aviation twin-engine aircraft cockpits, covering several commonly encountered IFR flight-condition scenarios. The most beneficial speech command tasks are noted to be in the data retrieval domain, which would allow the pilot access to uplinked data, checklists, and performance charts. Data entry tasks also appear to benefit from this technology.

  15. Techniques and applications for binaural sound manipulation in human-machine interfaces

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.

    1990-01-01

    The implementation of binaural sound to speech and auditory sound cues (auditory icons) is addressed from both an applications and technical standpoint. Techniques overviewed include processing by means of filtering with head-related transfer functions. Application to advanced cockpit human interface systems is discussed, although the techniques are extendable to any human-machine interface. Research issues pertaining to three-dimensional sound displays under investigation at the Aerospace Human Factors Division at NASA Ames Research Center are described.

  16. Techniques and applications for binaural sound manipulation in human-machine interfaces

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.

    1992-01-01

    The implementation of binaural sound to speech and auditory sound cues (auditory icons) is addressed from both an applications and technical standpoint. Techniques overviewed include processing by means of filtering with head-related transfer functions. Application to advanced cockpit human interface systems is discussed, although the techniques are extendable to any human-machine interface. Research issues pertaining to three-dimensional sound displays under investigation at the Aerospace Human Factors Division at NASA Ames Research Center are described.
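
    The core operation behind the technique described in the two records above is convolving a mono source with a left/right pair of head-related impulse responses (HRIRs). The sketch below uses random placeholder HRIRs purely to show the signal flow; real systems load measured HRTF datasets.

    # Sketch: spatialize a mono signal by convolving it with a left/right pair
    # of head-related impulse responses (HRIRs). The HRIRs here are random
    # placeholders; real systems use measured HRTF data.
    import numpy as np
    from scipy.signal import fftconvolve

    fs = 44100
    t = np.linspace(0, 0.5, int(fs * 0.5), endpoint=False)
    mono = 0.2 * np.sin(2 * np.pi * 440 * t)          # stand-in for a speech cue

    rng = np.random.default_rng(42)
    hrir_left = rng.standard_normal(256) * np.exp(-np.arange(256) / 40.0)
    hrir_right = rng.standard_normal(256) * np.exp(-np.arange(256) / 40.0)

    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    binaural = np.stack([left, right], axis=1)        # 2-channel output for headphones
    print(binaural.shape)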

  17. SPEECH--MAN'S NATURAL COMMUNICATION.

    ERIC Educational Resources Information Center

    DUDLEY, HOMER; AND OTHERS

    SESSION 63 OF THE 1967 INSTITUTE OF ELECTRICAL AND ELECTRONIC ENGINEERS INTERNATIONAL CONVENTION BROUGHT TOGETHER SEVEN DISTINGUISHED MEN WORKING IN FIELDS RELEVANT TO LANGUAGE. THEIR TOPICS INCLUDED ORIGIN AND EVOLUTION OF SPEECH AND LANGUAGE, LANGUAGE AND CULTURE, MAN'S PHYSIOLOGICAL MECHANISMS FOR SPEECH, LINGUISTICS, AND TECHNOLOGY AND…

  18. An intelligent listening framework for capturing encounter notes from a doctor-patient dialog

    PubMed Central

    Klann, Jeffrey G; Szolovits, Peter

    2009-01-01

    Background Capturing accurate and machine-interpretable primary data from clinical encounters is a challenging task, yet critical to the integrity of the practice of medicine. We explore the intriguing possibility that technology can help accurately capture structured data from the clinical encounter using a combination of automated speech recognition (ASR) systems and tools for extraction of clinical meaning from narrative medical text. Our goal is to produce a displayed evolving encounter note, visible and editable (using speech) during the encounter. Results This is very ambitious, and so far we have taken only the most preliminary steps. We report a simple proof-of-concept system and the design of the more comprehensive one we are building, discussing both the engineering design and challenges encountered. Without a formal evaluation, we were encouraged by our initial results. The proof-of-concept, despite a few false positives, correctly recognized the proper category of single- and multi-word phrases in uncorrected ASR output. The more comprehensive system captures and transcribes speech and stores alternative phrase interpretations in an XML-based format used by a text-engineering framework. It does not yet use the framework to perform the language processing present in the proof-of-concept. Conclusion The work here encouraged us that the goal is reachable, so we conclude with proposed next steps. Some challenging steps include acquiring a corpus of doctor-patient conversations, exploring a workable microphone setup, performing user interface research, and developing a multi-speaker version of our tools. PMID:19891797
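
    A toy sketch of the categorization step the proof-of-concept performs on uncorrected ASR output: assigning clinical categories to single- and multi-word phrases by dictionary lookup. The lexicon below is hypothetical, and the authors' system uses a text-engineering framework rather than this simple matching.

    # Toy sketch: tag phrases in (possibly noisy) ASR output with clinical
    # categories via dictionary lookup. The lexicon is a hypothetical stand-in.
    LEXICON = {
        "chest pain": "symptom",
        "shortness of breath": "symptom",
        "lisinopril": "medication",
        "blood pressure": "vital_sign",
    }

    def tag_phrases(asr_text):
        text = asr_text.lower()
        return [(phrase, cat) for phrase, cat in LEXICON.items() if phrase in text]

    print(tag_phrases("patient reports chest pain and is taking lisinopril"))
    # [('chest pain', 'symptom'), ('lisinopril', 'medication')]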

  19. Augmentative and Alternative Communication in Autism: A Comparison of the Picture Exchange Communication System and Speech-Output Technology

    ERIC Educational Resources Information Center

    Boesch, Miriam Chacon

    2011-01-01

    The purpose of this comparative efficacy study was to investigate the Picture Exchange Communication System (PECS) and a speech-generating device (SGD) in developing requesting skills, social-communicative behavior, and speech for three elementary-age children with severe autism and little to no functional speech. Requesting was selected as the…

  20. Using Automatic Speech Recognition to Dictate Mathematical Expressions: The Development of the "TalkMaths" Application at Kingston University

    ERIC Educational Resources Information Center

    Wigmore, Angela; Hunter, Gordon; Pflugel, Eckhard; Denholm-Price, James; Binelli, Vincent

    2009-01-01

    Speech technology--especially automatic speech recognition--has now advanced to a level where it can be of great benefit both to able-bodied people and those with various disabilities. In this paper we describe an application "TalkMaths" which, using the output from a commonly-used conventional automatic speech recognition system,…
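
    To illustrate the kind of translation such an application performs, here is a toy mapping from spoken-math tokens to LaTeX. The vocabulary and substitution rules are assumptions for illustration; TalkMaths itself handles a far richer grammar over ASR output.

    # Toy sketch: turn a spoken mathematical phrase into LaTeX by token
    # substitution. The vocabulary is a small illustrative assumption;
    # TalkMaths uses a much richer grammar over ASR output.
    RULES = [
        ("squared", "^{2}"),
        ("cubed", "^{3}"),
        ("plus", "+"),
        ("minus", "-"),
        ("over", "/"),
        ("equals", "="),
    ]

    def spoken_to_latex(utterance):
        out = utterance.lower()
        for spoken, latex in RULES:
            out = out.replace(spoken, latex)
        return " ".join(out.split())

    print(spoken_to_latex("x squared plus two x minus one equals zero"))
    # x ^{2} + two x - one = zero   (number words are left unhandled here)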

  1. A Prospectus for the Future Development of a Speech Lab: Hypertext Applications.

    ERIC Educational Resources Information Center

    Berube, David M.

    This paper presents a plan for the next generation of speech laboratories which integrates technologies of modern communication in order to improve and modernize the instructional process. The paper first examines the application of intermediate technologies including audio-video recording and playback, computer assisted instruction and testing…

  2. Voice Technologies in Libraries: A Look into the Future.

    ERIC Educational Resources Information Center

    Lange, Holley R., Ed.; And Others

    1991-01-01

    Discussion of synthesized speech and voice recognition focuses on a forum that addressed the potential for speech technologies in libraries. Topics discussed by three contributors include possible library applications in technical processing, book receipt, circulation control, and database access; use by disabled and illiterate users; and problems…

  3. Using Computer Technology To Monitor Student Progress and Remediate Reading Problems.

    ERIC Educational Resources Information Center

    McCullough, C. Sue

    1995-01-01

    Focuses on research about application of text-to-speech systems in diagnosing and remediating word recognition, vocabulary knowledge, and comprehension disabilities. As school psychologists move toward a consultative model of service delivery, they need to know about technology such as speech synthesizers, digitizers, optical-character-recognition…

  4. Fifty years of progress in speech and speaker recognition

    NASA Astrophysics Data System (ADS)

    Furui, Sadaoki

    2004-10-01

    Speech and speaker recognition technology has made very significant progress in the past 50 years. The progress can be summarized by the following changes: (1) from template matching to corpus-based statistical modeling, e.g., HMM and n-grams, (2) from filter bank/spectral resonance to cepstral features (cepstrum + Δcepstrum + ΔΔcepstrum), (3) from heuristic time-normalization to DTW/DP matching, (4) from "distance"-based to likelihood-based methods, (5) from maximum likelihood to discriminative approaches, e.g., MCE/GPD and MMI, (6) from isolated word to continuous speech recognition, (7) from small vocabulary to large vocabulary recognition, (8) from context-independent units to context-dependent units for recognition, (9) from clean speech to noisy/telephone speech recognition, (10) from single speaker to speaker-independent/adaptive recognition, (11) from monologue to dialogue/conversation recognition, (12) from read speech to spontaneous speech recognition, (13) from recognition to understanding, (14) from single-modality (audio signal only) to multi-modal (audio/visual) speech recognition, (15) from hardware recognizer to software recognizer, and (16) from no commercial application to many practical commercial applications. Most of these advances have taken place in both the fields of speech recognition and speaker recognition. The majority of technological changes have been directed toward the purpose of increasing the robustness of recognition, including many other additional important techniques not noted above.
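
    Change (2) above, the move to cepstral features with delta and delta-delta terms, is straightforward to illustrate: the sketch below computes MFCCs and their first- and second-order differences. The use of librosa and all parameter choices are assumptions for illustration.

    # Sketch: the cepstrum + delta-cepstrum + delta-delta-cepstrum feature
    # stack described in change (2). librosa and parameter values are
    # illustrative assumptions.
    import numpy as np
    import librosa

    sr = 16000
    y = 0.1 * np.sin(2 * np.pi * 180 * np.linspace(0, 1, sr, endpoint=False))  # stand-in speech

    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # static cepstral features
    d1 = librosa.feature.delta(mfcc)                    # delta-cepstrum
    d2 = librosa.feature.delta(mfcc, order=2)           # delta-delta-cepstrum
    features = np.vstack([mfcc, d1, d2])                # 39-dim frames, the classic ASR front end
    print(features.shape)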

  5. Two distinct auditory-motor circuits for monitoring speech production as revealed by content-specific suppression of auditory cortex.

    PubMed

    Ylinen, Sari; Nora, Anni; Leminen, Alina; Hakala, Tero; Huotilainen, Minna; Shtyrov, Yury; Mäkelä, Jyrki P; Service, Elisabet

    2015-06-01

    Speech production, both overt and covert, down-regulates the activation of auditory cortex. This is thought to be due to forward prediction of the sensory consequences of speech, contributing to a feedback control mechanism for speech production. Critically, however, these regulatory effects should be specific to speech content to enable accurate speech monitoring. To determine the extent to which such forward prediction is content-specific, we recorded the brain's neuromagnetic responses to heard multisyllabic pseudowords during covert rehearsal in working memory, contrasted with a control task. The cortical auditory processing of target syllables was significantly suppressed during rehearsal compared with control, but only when they matched the rehearsed items. This critical specificity to speech content enables accurate speech monitoring by forward prediction, as proposed by current models of speech production. The one-to-one phonological motor-to-auditory mappings also appear to serve the maintenance of information in phonological working memory. Further findings of right-hemispheric suppression in the case of whole-item matches and left-hemispheric enhancement for last-syllable mismatches suggest that speech production is monitored by 2 auditory-motor circuits operating on different timescales: Finer grain in the left versus coarser grain in the right hemisphere. Taken together, our findings provide hemisphere-specific evidence of the interface between inner and heard speech.

  6. Representation and Embodiment of Meaning in L2 Communication: Motion Events in the Speech and Gesture of Advanced L2 Korean and L2 English Speakers

    ERIC Educational Resources Information Center

    Choi, Soojung; Lantolf, James P.

    2008-01-01

    This study investigates the interface between speech and gesture in second language (L2) narration within Slobin's (2003) thinking-for-speaking (TFS) framework as well as with respect to McNeill's (1992, 2005) growth point (GP) hypothesis. Specifically, our interest is in whether speakers shift from a first language (L1) to a L2 TFS pattern as…

  7. Speech, stone tool-making and the evolution of language.

    PubMed

    Cataldo, Dana Michelle; Migliano, Andrea Bamberg; Vinicius, Lucio

    2018-01-01

    The 'technological hypothesis' proposes that gestural language evolved in early hominins to enable the cultural transmission of stone tool-making skills, with speech appearing later in response to the complex lithic industries of more recent hominins. However, no flintknapping study has assessed the efficiency of speech alone (unassisted by gesture) as a tool-making transmission aid. Here we show that subjects instructed by speech alone underperform in stone tool-making experiments in comparison to subjects instructed through either gesture alone or 'full language' (gesture plus speech), and also report lower satisfaction with their received instruction. The results provide evidence that gesture was likely to be selected over speech as a teaching aid in the earliest hominin tool-makers; that speech could not have replaced gesturing as a tool-making teaching aid in later hominins, possibly explaining the functional retention of gesturing in the full language of modern humans; and that speech may have evolved for reasons unrelated to tool-making. We conclude that speech is unlikely to have evolved as tool-making teaching aid superior to gesture, as claimed by the technological hypothesis, and therefore alternative views should be considered. For example, gestural language may have evolved to enable tool-making in earlier hominins, while speech may have later emerged as a response to increased trade and more complex inter- and intra-group interactions in Middle Pleistocene ancestors of Neanderthals and Homo sapiens; or gesture and speech may have evolved in parallel rather than in sequence.

  8. Inferring imagined speech using EEG signals: a new approach using Riemannian manifold features

    NASA Astrophysics Data System (ADS)

    Nguyen, Chuong H.; Karavas, George K.; Artemiadis, Panagiotis

    2018-02-01

    Objective. In this paper, we investigate the suitability of imagined speech for brain-computer interface (BCI) applications. Approach. A novel method based on covariance matrix descriptors, which lie on a Riemannian manifold, and the relevance vector machines classifier is proposed. The method is applied on electroencephalographic (EEG) signals and tested in multiple subjects. Main results. The method is shown to outperform other approaches in the field with respect to accuracy and robustness. The algorithm is validated on various categories of speech, such as imagined pronunciation of vowels, short words and long words. The classification accuracy of our methodology is in all cases significantly above chance level, reaching a maximum of 70% for cases where we classify three words and 95% for cases of two words. Significance. The results reveal certain aspects that may affect the success of speech imagery classification from EEG signals, such as sound, meaning and word complexity. This can potentially extend the capability of utilizing speech imagery in future BCI applications. The dataset of speech imagery collected from a total of 15 subjects is also published.
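
    A hedged sketch of the geometry at the heart of this method: per-trial spatial covariance matrices classified on the Riemannian manifold. For brevity it uses the third-party pyriemann package and a minimum-distance-to-mean classifier in place of the paper's relevance vector machine, and the data are random stand-ins.

    # Sketch: covariance-matrix descriptors on the Riemannian manifold for
    # imagined-speech EEG. Uses pyriemann's minimum-distance-to-mean (MDM)
    # classifier instead of the paper's relevance vector machine; the data
    # are random stand-ins, purely to show the pipeline shape.
    import numpy as np
    from pyriemann.estimation import Covariances
    from pyriemann.classification import MDM

    rng = np.random.default_rng(0)
    X = rng.standard_normal((40, 16, 256))   # 40 trials, 16 channels, 256 samples
    y = np.repeat([0, 1], 20)                # two imagined words

    covs = Covariances(estimator="lwf").fit_transform(X)   # one SPD matrix per trial
    clf = MDM(metric="riemann").fit(covs[:30], y[:30])
    print(clf.score(covs[30:], y[30:]))      # held-out accuracy (near chance on random data)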

  9. The effect of hearing aid technologies on listening in an automobile.

    PubMed

    Wu, Yu-Hsiang; Stangl, Elizabeth; Bentler, Ruth A; Stanziola, Rachel W

    2013-06-01

    Communication while traveling in an automobile often is very difficult for hearing aid users. This is because the automobile/road noise level is usually high, and listeners/drivers often do not have access to visual cues. Since the talker of interest usually is not located in front of the listener/driver, conventional directional processing that places the directivity beam toward the listener's front may not be helpful and, in fact, could have a negative impact on speech recognition (when compared to omnidirectional processing). Recently, technologies have become available in commercial hearing aids that are designed to improve speech recognition and/or listening effort in noisy conditions where talkers are located behind or beside the listener. These technologies include (1) a directional microphone system that uses a backward-facing directivity pattern (Back-DIR processing), (2) a technology that transmits audio signals from the ear with the better signal-to-noise ratio (SNR) to the ear with the poorer SNR (Side-Transmission processing), and (3) a signal processing scheme that suppresses the noise at the ear with the poorer SNR (Side-Suppression processing). The purpose of the current study was to determine the effect of (1) conventional directional microphones and (2) newer signal processing schemes (Back-DIR, Side-Transmission, and Side-Suppression) on listeners' speech recognition performance and preference for communication in a traveling automobile. A single-blinded, repeated-measures design was used. Twenty-five adults aged 44 through 84 yr with bilateral symmetrical sensorineural hearing loss participated in the study. The automobile/road noise and sentences of the Connected Speech Test (CST) were recorded through hearing aids in a standard van moving at a speed of 70 mph on a paved highway. The hearing aids were programmed to omnidirectional microphone, conventional adaptive directional microphone, and the three newer schemes. CST sentences were presented from the side and back of the hearing aids, which were placed on the ears of a manikin. The recorded stimuli were presented to listeners via earphones in a sound-treated booth to assess speech recognition performance and preference with each programmed condition. Compared to omnidirectional microphones, conventional adaptive directional processing had a detrimental effect on speech recognition when speech was presented from the back or side of the listener. Back-DIR and Side-Transmission processing improved speech recognition performance (relative to both omnidirectional and adaptive directional processing) when speech was from the back and side, respectively. The performance with Side-Suppression processing was better than with adaptive directional processing when speech was from the side. The participants' preferences for a given processing scheme were generally consistent with the speech recognition results. The finding that performance with adaptive directional processing was poorer than with omnidirectional microphones demonstrates the importance of selecting the correct microphone technology for different listening situations. The results also suggest the feasibility of using hearing aid technologies to provide a better listening experience for hearing aid users in automobiles.

  10. Creative Speech Technology: editorial introduction to this special issue.

    PubMed

    Edwards, Alistair D N; Newell, Christopher

    2013-10-01

    CreST is the Creative Speech Technology Network, a research network which brought together people from a wide variety of backgrounds spanning the arts, technology and beyond. The papers in this volume represent some of the outcomes of that collaboration. This editorial introduces the background of the network and each of the papers. In conclusion, we show how this work helped to realize many of the objectives of the network.

  11. Visualizing Syllables: Real-Time Computerized Feedback within a Speech-Language Intervention

    ERIC Educational Resources Information Center

    DeThorne, Laura; Aparicio Betancourt, Mariana; Karahalios, Karrie; Halle, Jim; Bogue, Ellen

    2015-01-01

    Computerized technologies now offer unprecedented opportunities to provide real-time visual feedback to facilitate children's speech-language development. We employed a mixed-method design to examine the effectiveness of two speech-language interventions aimed at facilitating children's multisyllabic productions: one incorporated a novel…

  12. A software tool for analyzing multichannel cochlear implant signals.

    PubMed

    Lai, Wai Kong; Bögli, Hans; Dillier, Norbert

    2003-10-01

    A useful and convenient way to analyze the radio frequency (RF) signals sent by a speech processor to a cochlear implant is to capture and display them with appropriate software, which is particularly useful for development and diagnostic purposes. sCILab (Swiss Cochlear Implant Laboratory) is such a PC-based software tool, intended for the Nucleus family of Multichannel Cochlear Implants. Its graphical user interface provides a convenient and intuitive means of visualizing and analyzing the signals encoding speech information. Both numerical and graphical displays are available for detailed examination of the captured CI signals, as well as an acoustic simulation of these signals. sCILab has been used in the design and verification of new speech coding strategies, and has also been applied as an analytical tool in studies of how different parameter settings of existing speech coding strategies affect speech perception. As a diagnostic tool, it is also useful for troubleshooting problems with the external equipment of cochlear implant systems.

  13. Automatic Speech Recognition Technology as an Effective Means for Teaching Pronunciation

    ERIC Educational Resources Information Center

    Elimat, Amal Khalil; AbuSeileek, Ali Farhan

    2014-01-01

    This study aimed to explore the effect of using automatic speech recognition technology (ASR) on the third grade EFL students' performance in pronunciation, whether teaching pronunciation through ASR is better than regular instruction, and the most effective teaching technique (individual work, pair work, or group work) in teaching pronunciation…

  14. The Promise of NLP and Speech Processing Technologies in Language Assessment

    ERIC Educational Resources Information Center

    Chapelle, Carol A.; Chung, Yoo-Ree

    2010-01-01

    Advances in natural language processing (NLP) and automatic speech recognition and processing technologies offer new opportunities for language testing. Despite their potential uses on a range of language test item types, relatively little work has been done in this area, and it is therefore not well understood by test developers, researchers or…

  15. Multiple Stakeholder Perspectives on Teletherapy Delivery of Speech Pathology Services in Rural Schools: A Preliminary, Qualitative Investigation

    PubMed Central

    LINCOLN, MICHELLE; HINES, MONIQUE; FAIRWEATHER, CRAIG; RAMSDEN, ROBYN; MARTINOVICH, JULIA

    2015-01-01

    The objective of this study was to investigate stakeholders’ views on the feasibility and acceptability of a pilot speech pathology teletherapy program for children attending schools in rural New South Wales, Australia. Nine children received speech pathology sessions delivered via Adobe Connect® web-conferencing software. During semi-structured interviews, school principals (n = 3), therapy facilitators (n = 7), and parents (n = 6) described factors that promoted or threatened the program’s feasibility and acceptability. Themes were categorized according to whether they related to (a) the use of technology; (b) the school-based nature of the program; or (c) the combination of using technology with a school-based program. Despite frequent reports of difficulties with technology, teletherapy delivery of speech pathology services in schools was highly acceptable to stakeholders. However, the use of technology within a school environment increased the complexities of service delivery. Service providers should pay careful attention to planning processes and lines of communication in order to promote efficiency and acceptability of teletherapy programs. PMID:25945230

  16. Evaluation of Adaptive Noise Management Technologies for School-Age Children with Hearing Loss.

    PubMed

    Wolfe, Jace; Duke, Mila; Schafer, Erin; Jones, Christine; Rakita, Lori

    2017-05-01

    Children with hearing loss experience significant difficulty understanding speech in noisy and reverberant situations. Adaptive noise management technologies, such as fully adaptive directional microphones and digital noise reduction, have the potential to improve communication in noise for children with hearing aids. However, there are no published studies evaluating the potential benefits children receive from the use of adaptive noise management technologies in simulated real-world environments as well as in daily situations. The objective of this study was to compare speech recognition, speech intelligibility ratings (SIRs), and sound preferences of children using hearing aids equipped with and without adaptive noise management technologies. A single-group, repeated measures design was used to evaluate performance differences obtained in four simulated environments. In each simulated environment, participants were tested in a basic listening program with minimal noise management features, a manual program designed for that scene, and the hearing instruments' adaptive operating system that steered hearing instrument parameterization based on the characteristics of the environment. Twelve children with mild to moderately severe sensorineural hearing loss. Speech recognition and SIRs were evaluated in three hearing aid programs with and without noise management technologies across two different test sessions and various listening environments. Also, the participants' perceptual hearing performance in daily real-world listening situations with two of the hearing aid programs was evaluated during a four- to six-week field trial that took place between the two laboratory sessions. On average, the use of adaptive noise management technology improved sentence recognition in noise for speech presented in front of the participant but resulted in a decrement in performance for signals arriving from behind when the participant was facing forward. However, the improvement with adaptive noise management exceeded the decrement obtained when the signal arrived from behind. Most participants reported better subjective SIRs when using adaptive noise management technologies, particularly when the signal of interest arrived from in front of the listener. In addition, most participants reported a preference for the technology with an automatically switching, adaptive directional microphone and adaptive noise reduction in real-world listening situations when compared to conventional, omnidirectional microphone use with minimal noise reduction processing. Use of the adaptive noise management technologies evaluated in this study improves school-age children's speech recognition in noise for signals arriving from the front. Although a small decrement in speech recognition in noise was observed for signals arriving from behind the listener, most participants reported a preference for use of noise management technology both when the signal arrived from in front and from behind the child. The results of this study suggest that adaptive noise management technologies should be considered for use with school-age children when listening in academic and social situations. American Academy of Audiology

  17. Design and development of an AAC app based on a speech-to-symbol technology.

    PubMed

    Radici, Elena; Bonacina, Stefano; De Leo, Gianluca

    2016-08-01

    The purpose of this paper is to present the design and development of an Augmentative and Alternative Communication (AAC) app that uses speech-to-symbol technology to model language, i.e., to recognize speech and display the text or graphic content related to it. The app is intended to be adopted by communication partners who want to engage in interventions focused on improving communication skills, and its goal is to translate simple spoken sentences into a set of symbols that are understandable by children with complex communication needs. We moderated a focus group among six AAC communication partners and then developed a prototype. We are currently beginning to test the app in an AAC centre in Milan, Italy.

  18. Choosing and Using Text-to-Speech Software

    ERIC Educational Resources Information Center

    Peters, Tom; Bell, Lori

    2007-01-01

    This article describes a computer-based technology for generating speech called text-to-speech (TTS). This software is ready for widespread use by libraries, other organizations, and individual users. It offers the affordable ability to turn just about any electronic text that is not image-based into an artificially spoken communication. The…

  19. Digital Data Collection and Analysis: Application for Clinical Practice

    ERIC Educational Resources Information Center

    Ingram, Kelly; Bunta, Ferenc; Ingram, David

    2004-01-01

    Technology for digital speech recording and speech analysis is now readily available for all clinicians who use a computer. This article discusses some advantages of moving from analog to digital recordings and outlines basic recording procedures. The purpose of this article is to familiarize speech-language pathologists with computerized audio…

  20. Pattern learning with deep neural networks in EMG-based speech recognition.

    PubMed

    Wand, Michael; Schultz, Tanja

    2014-01-01

    We report on classification of phones and phonetic features from facial electromyographic (EMG) data, within the context of our EMG-based Silent Speech interface. In this paper we show that a Deep Neural Network can be used to perform this classification task, yielding a significant improvement over conventional Gaussian Mixture models. Our central contribution is the visualization of patterns which are learned by the neural network. With increasing network depth, these patterns represent more and more intricate electromyographic activity.
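
    As a rough illustration of the classification setup (not the authors' network), the sketch below trains a small fully connected network on stacked EMG feature frames with scikit-learn; the feature dimensions, number of phone classes, and layer sizes are assumptions.

    ```python
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(1)
    X = rng.standard_normal((2000, 150))   # e.g. 15 stacked frames x 10 EMG features
    y = rng.integers(0, 40, size=2000)     # ~40 phone classes, dummy labels

    # a small fully connected "deep" stack; layer sizes are illustrative only
    dnn = MLPClassifier(hidden_layer_sizes=(400, 400, 400), max_iter=50)
    dnn.fit(X, y)
    print(dnn.score(X, y))  # training accuracy on the toy data
    ```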

  1. Speech Recognition as a Support Service for Deaf and Hard of Hearing Students: Adaptation and Evaluation. Final Report to Spencer Foundation.

    ERIC Educational Resources Information Center

    Stinson, Michael; Elliot, Lisa; McKee, Barbara; Coyne, Gina

    This report discusses a project that adapted new automatic speech recognition (ASR) technology to provide real-time speech-to-text transcription as a support service for students who are deaf and hard of hearing (D/HH). In this system, as the teacher speaks, a hearing intermediary, or captionist, dictates into the speech recognition system in a…

  2. An exploratory study on the driving method of speech synthesis based on the human eye reading imaging data

    NASA Astrophysics Data System (ADS)

    Gao, Pei-pei; Liu, Feng

    2016-10-01

    With the development of information technology and artificial intelligence, speech synthesis plays a significant role in human-computer interaction. However, current speech synthesis techniques lack naturalness and expressiveness, falling short of natural spoken language. A further problem is that human-computer interaction based on speech synthesis is too monotonous to support a mechanism driven by the user's own behavior. This paper reviews the historical development of speech synthesis and summarizes the general processing pipeline, noting that the prosody generation module is a key component. Building on this, a new human-computer interaction method is introduced that uses eye activity during reading to control and drive prosody generation, enriching the synthetic output. On the premise that gaze data can be extracted and used to drive synthesis in real time, a speech synthesis method is proposed that reproduces the speaker's natural speech rhythm: while the reader silently reads a corpus, reading information such as the gaze duration per prosodic unit is captured, and a hierarchical duration model of prosodic patterns is established to determine the duration parameters of the synthesized speech. The feasibility of the method is verified by analysis.
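
    A minimal sketch of the driving idea follows, under the assumption that per-unit gaze dwell times simply rescale default synthesis durations; the paper's hierarchical model is not reproduced here, and all values are illustrative.

    ```python
    def duration_parameters(gaze_ms, base_ms, floor=0.6, ceil=1.8):
        """Scale each unit's default synthesis duration by its relative gaze time."""
        mean_gaze = sum(gaze_ms) / len(gaze_ms)
        scales = [min(max(g / mean_gaze, floor), ceil) for g in gaze_ms]
        return [round(b * s, 1) for b, s in zip(base_ms, scales)]

    # three prosodic units: gaze dwell times and default synthesis durations (ms)
    print(duration_parameters([240, 420, 300], [200, 260, 220]))
    ```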

  3. Implementation of the Intelligent Voice System for Kazakh

    NASA Astrophysics Data System (ADS)

    Yessenbayev, Zh; Saparkhojayev, N.; Tibeyev, T.

    2014-04-01

    Modern speech technologies are highly advanced and widely used in day-to-day applications. However, this mostly concerns the languages of well-developed countries, such as English, German, Japanese and Russian. For Kazakh the situation is less advanced, and research in this field is only starting to evolve. In this research and application-oriented project, we introduce an intelligent voice system for the fast deployment of call-centers and information desks supporting Kazakh speech. The demand for such a system is obvious given the country's large size and small population: landline and cell phones are often the only means of communication for distant villages and suburbs. The system features Kazakh speech recognition and synthesis modules as well as a web GUI for efficient dialog management. For speech recognition we use the CMU Sphinx engine, and for speech synthesis, MaryTTS. The web GUI is implemented in Java, enabling operators to quickly create and manage dialogs in a user-friendly graphical environment. Call routines are handled by Asterisk PBX and JBoss Application Server. The system supports technologies and protocols such as VoIP, VoiceXML, FastAGI, the Java Speech API and J2EE. For the speech recognition experiments we compiled and used the first Kazakh speech corpus, with utterances from 169 native speakers. The performance of the speech recognizer is 4.1% WER on isolated word recognition and 6.9% WER on clean continuous speech recognition. The speech synthesis experiments include the training of male and female voices.
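
    Since the record reports recognizer performance as word error rate (WER), the standard edit-distance computation of WER is sketched below for reference; it is not code from the described system, and the example words are arbitrary.

    ```python
    def wer(ref, hyp):
        """Word error rate: word-level edit distance divided by reference length."""
        r, h = ref.split(), hyp.split()
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i
        for j in range(len(h) + 1):
            d[0][j] = j
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
                d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)  # sub/del/ins
        return d[len(r)][len(h)] / len(r)

    print(wer("barlyq adamdar erkin bolady",
              "barlyq adam erkin boladı"))  # 2 substitutions / 4 words = 0.5
    ```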

  4. Library Automation Design for Visually Impaired People

    ERIC Educational Resources Information Center

    Yurtay, Nilufer; Bicil, Yucel; Celebi, Sait; Cit, Guluzar; Dural, Deniz

    2011-01-01

    Speech synthesis is a technology used in many different areas in computer science. This technology can bring a solution to reading activity of visually impaired people due to its text to speech conversion. Based on this problem, in this study, a system is designed needed for a visually impaired person to make use of all the library facilities in…

  5. Impact of Technology Based Instruction on Speech Competency and Presentation Confidence Levels of Hispanic College Students

    ERIC Educational Resources Information Center

    Mundy, Marie-Anne; Padilla Oviedo, Andres; Ramirez, Juan; Taylor, Nick; Flores, Itza

    2014-01-01

    One of the main goals of universities is to graduate students who are capable and competent in competing in the workforce. As presentational communication skills are critical in today's job market, Hispanic university students need to be trained to effectively develop and deliver presentational speeches. Web/technology enhanced training techniques…

  6. Transcription and Annotation of a Japanese Accented Spoken Corpus of L2 Spanish for the Development of CAPT Applications

    ERIC Educational Resources Information Center

    Carranza, Mario

    2016-01-01

    This paper addresses the process of transcribing and annotating spontaneous non-native speech with the aim of compiling a training corpus for the development of Computer Assisted Pronunciation Training (CAPT) applications, enhanced with Automatic Speech Recognition (ASR) technology. To better adapt ASR technology to CAPT tools, the recognition…

  7. Technology and the evolution of clinical methods for stuttering.

    PubMed

    Packman, Ann; Meredith, Grant

    2011-06-01

    The World Wide Web (WWW) was 20 years old last year. Enormous amounts of information about stuttering are now available to anyone who can access the Internet. Compared to 20 years ago, people who stutter and their families can now make more informed choices about speech-language interventions, from a distance. Blogs and chat rooms provide opportunities for people who stutter to share their experiences from a distance and to support one another. New technologies are also being adopted into speech-language pathology practice and service delivery. Telehealth is an exciting development as it means that treatment can now be made available to many rural and remotely located people who previously did not have access to it. Possible future technological developments for speech-language pathology practice include Internet based treatments and the use of Virtual Reality. Having speech and CBT treatments for stuttering available on the Internet would greatly increase their accessibility. Second Life also has exciting possibilities for people who stutter. The reader will (1) explain how people who stutter and their families can get information about stuttering from the World Wide Web, (2) discuss how new technologies have been applied in speech-language pathology practice, and (3) summarize the principles and practice of telehealth delivery of services for people who stutter and their families. Copyright © 2011. Published by Elsevier Inc.

  8. Reprint of: technology and the evolution of clinical methods for stuttering.

    PubMed

    Packman, Ann; Meredith, Grant

    2011-09-01

    The World Wide Web (WWW) was 20 years old last year. Enormous amounts of information about stuttering are now available to anyone who can access the Internet. Compared to 20 years ago, people who stutter and their families can now make more informed choices about speech-language interventions, from a distance. Blogs and chat rooms provide opportunities for people who stutter to share their experiences from a distance and to support one another. New technologies are also being adopted into speech-language pathology practice and service delivery. Telehealth is an exciting development as it means that treatment can now be made available to many rural and remotely located people who previously did not have access to it. Possible future technological developments for speech-language pathology practice include Internet based treatments and the use of Virtual Reality. Having speech and CBT treatments for stuttering available on the Internet would greatly increase their accessibility. Second Life also has exciting possibilities for people who stutter. The reader will (1) explain how people who stutter and their families can get information about stuttering from the World Wide Web, (2) discuss how new technologies have been applied in speech-language pathology practice, and (3) summarize the principles and practice of telehealth delivery of services for people who stutter and their families. Copyright © 2011. Published by Elsevier Inc.

  9. A Comparison of LBG and ADPCM Speech Compression Techniques

    NASA Astrophysics Data System (ADS)

    Bachu, Rajesh G.; Patel, Jignasa; Barkana, Buket D.

    Speech compression is the technology of converting human speech into an efficiently encoded representation that can later be decoded to produce a close approximation of the original signal. All speech has a degree of predictability, and speech coding techniques exploit this to reduce bit rates while maintaining a suitable level of quality. This paper is a study and implementation of the Linde-Buzo-Gray (LBG) and Adaptive Differential Pulse Code Modulation (ADPCM) algorithms for compressing speech signals, implemented in MATLAB 7.0. Both methods gave good results and performance in compressing speech, and listening tests showed that efficient, high-quality coding was achieved.
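
    A compact sketch of the LBG codebook-splitting loop follows, in Python rather than the authors' MATLAB; the frame dimensions, split factor and iteration counts are illustrative, not the paper's settings.

    ```python
    import numpy as np

    def lbg(vectors, codebook_size, eps=0.01, iters=20):
        """Grow a VQ codebook by repeated splitting plus Lloyd refinement."""
        codebook = vectors.mean(axis=0, keepdims=True)
        while len(codebook) < codebook_size:
            # split every codeword into a perturbed pair
            codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
            for _ in range(iters):  # Lloyd refinement of the enlarged codebook
                dist = ((vectors[:, None, :] - codebook[None]) ** 2).sum(-1)
                nearest = dist.argmin(axis=1)
                for k in range(len(codebook)):
                    members = vectors[nearest == k]
                    if len(members):
                        codebook[k] = members.mean(axis=0)
        return codebook

    frames = np.random.default_rng(2).standard_normal((500, 12))  # e.g. 12-dim frames
    cb = lbg(frames, 16)  # 16 codewords -> 4 bits per frame before entropy coding
    ```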

  10. Micro-Based Speech Recognition: Instructional Innovation for Handicapped Learners.

    ERIC Educational Resources Information Center

    Horn, Carin E.; Scott, Brian L.

    A new voice-based learning system (VBLS), which allows the handicapped user to interact with a microcomputer by voice commands, is described. Speech or voice recognition is the computerized process of identifying a spoken word or phrase, including those resulting from speech impediments. This new technology is helpful to the severely physically…

  11. Supporting Dictation Speech Recognition Error Correction: The Impact of External Information

    ERIC Educational Resources Information Center

    Shi, Yongmei; Zhou, Lina

    2011-01-01

    Although speech recognition technology has made remarkable progress, its wide adoption is still restricted by notable effort made and frustration experienced by users while correcting speech recognition errors. One of the promising ways to improve error correction is by providing user support. Although support mechanisms have been proposed for…

  12. Using Web Speech Technology with Language Learning Applications

    ERIC Educational Resources Information Center

    Daniels, Paul

    2015-01-01

    In this article, the author presents the history of human-to-computer interaction based upon the design of sophisticated computerized speech recognition algorithms. Advancements such as the arrival of cloud-based computing and software like Google's Web Speech API allows anyone with an Internet connection and Chrome browser to take advantage of…

  13. The effect of hearing aid technologies on listening in an automobile

    PubMed Central

    Wu, Yu-Hsiang; Stangl, Elizabeth; Bentler, Ruth A.; Stanziola, Rachel W.

    2014-01-01

    Background Communication while traveling in an automobile often is very difficult for hearing aid users. This is because the automobile/road noise level is usually high, and listeners/drivers often do not have access to visual cues. Since the talker of interest usually is not located in front of the driver/listener, conventional directional processing that places the directivity beam toward the listener's front may not be helpful, and in fact, could have a negative impact on speech recognition (when compared to omnidirectional processing). Recently, technologies have become available in commercial hearing aids that are designed to improve speech recognition and/or listening effort in noisy conditions where talkers are located behind or beside the listener. These technologies include (1) a directional microphone system that uses a backward-facing directivity pattern (Back-DIR processing), (2) a technology that transmits audio signals from the ear with the better signal-to-noise ratio (SNR) to the ear with the poorer SNR (Side-Transmission processing), and (3) a signal processing scheme that suppresses the noise at the ear with the poorer SNR (Side-Suppression processing). Purpose The purpose of the current study was to determine the effect of (1) conventional directional microphones and (2) newer signal processing schemes (Back-DIR, Side-Transmission, and Side-Suppression) on listener's speech recognition performance and preference for communication in a traveling automobile. Research design A single-blinded, repeated-measures design was used. Study Sample Twenty-five adults with bilateral symmetrical sensorineural hearing loss aged 44 through 84 years participated in the study. Data Collection and Analysis The automobile/road noise and sentences of the Connected Speech Test (CST) were recorded through hearing aids in a standard van moving at a speed of 70 miles/hour on a paved highway. The hearing aids were programmed to omnidirectional microphone, conventional adaptive directional microphone, and the three newer schemes. CST sentences were presented from the side and back of the hearing aids, which were placed on the ears of a manikin. The recorded stimuli were presented to listeners via earphones in a sound-treated booth to assess speech recognition performance and preference with each programmed condition. Results Compared to omnidirectional microphones, conventional adaptive directional processing had a detrimental effect on speech recognition when speech was presented from the back or side of the listener. Back-DIR and Side-Transmission processing improved speech recognition performance (relative to both omnidirectional and adaptive directional processing) when speech was from the back and side, respectively. The performance with Side-Suppression processing was better than with adaptive directional processing when speech was from the side. The participants' preferences for a given processing scheme were generally consistent with speech recognition results. Conclusions The finding that performance with adaptive directional processing was poorer than with omnidirectional microphones demonstrates the importance of selecting the correct microphone technology for different listening situations. The results also suggest the feasibility of using hearing aid technologies to provide a better listening experience for hearing aid users in automobiles. PMID:23886425

  14. The Speech multi features fusion perceptual hash algorithm based on tensor decomposition

    NASA Astrophysics Data System (ADS)

    Huang, Y. B.; Fan, M. H.; Zhang, Q. Y.

    2018-03-01

    With constant progress in modern speech communication technologies, speech data are prone to noise corruption and malicious tampering. To give the speech perceptual hash algorithm strong robustness and high efficiency, this paper proposes a perceptual hash algorithm based on tensor decomposition and multiple features. The algorithm analyzes the perceptual features of speech, applying wavelet packet decomposition to obtain the speech components and extracting the LPCC, LSP and ISP features of each component to form a speech feature tensor. Speech authentication is performed by generating hash values through quantization of the feature matrix against its mid-value. Experimental results show that, compared with similar algorithms, the proposed algorithm is robust to content-preserving operations and resists attacks from common background noise. The algorithm is also computationally efficient, meeting the real-time requirements of speech communication and completing speech authentication quickly.
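
    The hashing and matching steps can be illustrated as follows. This is a minimal sketch assuming only what the abstract states: a feature matrix is binarized against its mid-value (median) to form the hash, and tampering is judged by bit error rate. The feature extraction chain (wavelet packets, LPCC/LSP/ISP, tensor decomposition) is omitted, with a random matrix as a stand-in.

    ```python
    import numpy as np

    def perceptual_hash(feature_matrix):
        # binarize against the matrix median ("mid-value" quantization)
        return (feature_matrix > np.median(feature_matrix)).astype(np.uint8).ravel()

    def bit_error_rate(h1, h2):
        return float(np.mean(h1 != h2))  # low BER: same content; high BER: tampered

    rng = np.random.default_rng(3)
    feats = rng.standard_normal((20, 16))              # stand-in feature matrix
    h_clean = perceptual_hash(feats)
    h_noisy = perceptual_hash(feats + 0.05 * rng.standard_normal(feats.shape))
    print(bit_error_rate(h_clean, h_noisy))            # small for a benign perturbation
    ```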

  15. An integrated analysis of speech and gestural characteristics in conversational child-computer interactions

    NASA Astrophysics Data System (ADS)

    Yildirim, Serdar; Montanari, Simona; Andersen, Elaine; Narayanan, Shrikanth S.

    2003-10-01

    Understanding the fine details of children's speech and gestural characteristics helps, among other things, in creating natural computer interfaces. We analyze the acoustic, lexical/non-lexical and spoken/gestural discourse characteristics of young children's speech using audio-video data gathered with a Wizard of Oz technique from 4 to 6 year old children engaged in resolving a series of age-appropriate cognitive challenges. Fundamental and formant frequencies exhibited greater variation between subjects, consistent with previous results on read speech [Lee et al., J. Acoust. Soc. Am. 105, 1455-1468 (1999)]. Our analysis also showed that, in a given bandwidth, the phonemic information contained in the speech of younger children is significantly less than that of older children and adults. To enable an integrated analysis, a multi-track annotation board was constructed using the ANVIL tool kit [M. Kipp, Eurospeech 1367-1370 (2001)]. Along with speech transcriptions and acoustic analysis, non-lexical and discourse characteristics and children's gestures (facial expressions, body movements, hand/head movements) were annotated in a synchronized multilayer system. Initial results showed that younger children rely more on gestures to emphasize their verbal assertions. Younger children use non-lexical speech (e.g., um, huh) associated with frustration and pondering/reflecting more frequently than older ones, and they also repair more with humans than with the computer.

  16. Multimodal Neuroelectric Interface Development

    NASA Technical Reports Server (NTRS)

    Trejo, Leonard J.; Wheeler, Kevin R.; Jorgensen, Charles C.; Totah, Joseph (Technical Monitor)

    2001-01-01

    This project aims to improve the performance of NASA missions by developing multimodal neuroelectric technologies for augmented human-system interaction. Neuroelectric technologies will add completely new modes of interaction that operate in parallel with keyboards, speech, or other manual controls, thereby increasing the bandwidth of human-system interaction. We recently demonstrated the feasibility of real-time electromyographic (EMG) pattern recognition for a direct neuroelectric human-computer interface. We recorded EMG signals from an elastic sleeve with dry electrodes while a human subject performed a range of discrete gestures. A machine-learning algorithm was trained to recognize the EMG patterns associated with the gestures and map them to control signals. Successful applications now include piloting two Class 4 aircraft simulations (F-15 and 757) and entering data with a "virtual" numeric keyboard. Current research focuses on on-line adaptation of EMG sensing and processing and recognition of continuous gestures. We are also extending this on-line pattern recognition methodology to electroencephalographic (EEG) signals. This will allow us to bypass muscle activity and draw control signals directly from the human brain. Our system can reliably detect the µ-rhythm (a periodic EEG signal from motor cortex in the 10 Hz range) with a lightweight headset containing saline-soaked sponge electrodes. The data show that the EEG µ-rhythm can be modulated by real and imagined motions. Current research focuses on using biofeedback to train human subjects to modulate EEG rhythms on demand, and on examining interactions of EEG-based control with EMG-based and manual control. Viewgraphs on these neuroelectric technologies are also included.
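
    As an illustration of the ~10 Hz motor-cortex (µ) rhythm detection mentioned above (not NASA's implementation), the sketch below band-passes a toy EEG trace around 8-12 Hz and tracks smoothed band power; the sampling rate, filter order and smoothing window are assumptions.

    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt

    fs = 256.0                                    # assumed EEG sampling rate
    b, a = butter(4, [8 / (fs / 2), 12 / (fs / 2)], btype="band")

    t = np.arange(0, 4, 1 / fs)
    rng = np.random.default_rng(5)
    eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)  # toy trace

    mu = filtfilt(b, a, eeg)                                     # isolate 8-12 Hz
    power = np.convolve(mu ** 2, np.ones(64) / 64, mode="same")  # smoothed band power
    print(power.mean())
    ```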

  17. Nonlinear Frequency Compression in Hearing Aids: Impact on Speech and Language Development

    PubMed Central

    Bentler, Ruth; Walker, Elizabeth; McCreery, Ryan; Arenas, Richard M.; Roush, Patricia

    2015-01-01

    Objectives The research questions of this study were: (1) Are children using nonlinear frequency compression (NLFC) in their hearing aids getting better access to the speech signal than children using conventional processing schemes? The authors hypothesized that children whose hearing aids provided wider input bandwidth would have more access to the speech signal, as measured by an adaptation of the Speech Intelligibility Index, and (2) are speech and language skills different for children who have been fit with the two different technologies; if so, in what areas? The authors hypothesized that if the children were getting increased access to the speech signal as a result of their NLFC hearing aids (question 1), it would be possible to see improved performance in areas of speech production, morphosyntax, and speech perception compared with the group with conventional processing. Design Participants included 66 children with hearing loss recruited as part of a larger multisite National Institutes of Health–funded study, Outcomes for Children with Hearing Loss, designed to explore the developmental outcomes of children with mild to severe hearing loss. For the larger study, data on communication, academic and psychosocial skills were gathered in an accelerated longitudinal design, with entry into the study between 6 months and 7 years of age. Subjects in this report consisted of 3-, 4-, and 5-year-old children recruited at the North Carolina test site. All had at least 6 months of current hearing aid usage with their NLFC or conventional amplification. Demographic characteristics were compared at the three age levels as well as audibility and speech/language outcomes; speech-perception scores were compared for the 5-year-old groups. Results Results indicate that the audibility provided did not differ between the technology options. As a result, there was no difference between groups on speech or language outcome measures at 4 or 5 years of age, and no impact on speech perception (measured at 5 years of age). The difference in Comprehensive Assessment of Spoken Language and mean length of utterance scores for the 3-year-old group favoring the group with conventional amplification may be a consequence of confounding factors such as increased incidence of prematurity in the group using NLFC. Conclusions Children fit with NLFC had similar audibility, as measured by a modified Speech Intelligibility Index, compared with a matched group of children using conventional technology. In turn, there were no differences in their speech and language abilities. PMID:24892229

  18. Nonlinear frequency compression in hearing aids: impact on speech and language development.

    PubMed

    Bentler, Ruth; Walker, Elizabeth; McCreery, Ryan; Arenas, Richard M; Roush, Patricia

    2014-01-01

    The research questions of this study were: (1) Are children using nonlinear frequency compression (NLFC) in their hearing aids getting better access to the speech signal than children using conventional processing schemes? The authors hypothesized that children whose hearing aids provided wider input bandwidth would have more access to the speech signal, as measured by an adaptation of the Speech Intelligibility Index, and (2) are speech and language skills different for children who have been fit with the two different technologies; if so, in what areas? The authors hypothesized that if the children were getting increased access to the speech signal as a result of their NLFC hearing aids (question 1), it would be possible to see improved performance in areas of speech production, morphosyntax, and speech perception compared with the group with conventional processing. Participants included 66 children with hearing loss recruited as part of a larger multisite National Institutes of Health-funded study, Outcomes for Children with Hearing Loss, designed to explore the developmental outcomes of children with mild to severe hearing loss. For the larger study, data on communication, academic and psychosocial skills were gathered in an accelerated longitudinal design, with entry into the study between 6 months and 7 years of age. Subjects in this report consisted of 3-, 4-, and 5-year-old children recruited at the North Carolina test site. All had at least 6 months of current hearing aid usage with their NLFC or conventional amplification. Demographic characteristics were compared at the three age levels as well as audibility and speech/language outcomes; speech-perception scores were compared for the 5-year-old groups. Results indicate that the audibility provided did not differ between the technology options. As a result, there was no difference between groups on speech or language outcome measures at 4 or 5 years of age, and no impact on speech perception (measured at 5 years of age). The difference in Comprehensive Assessment of Spoken Language and mean length of utterance scores for the 3-year-old group favoring the group with conventional amplification may be a consequence of confounding factors such as increased incidence of prematurity in the group using NLFC. Children fit with NLFC had similar audibility, as measured by a modified Speech Intelligibility Index, compared with a matched group of children using conventional technology. In turn, there were no differences in their speech and language abilities.

  19. Multimodal interfaces with voice and gesture input

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Milota, A.D.; Blattner, M.M.

    1995-07-20

    The modalities of speech and gesture have different strengths and weaknesses, but combined they create synergy in which each modality corrects the weaknesses of the other. We believe that a multimodal system intertwining speech and gesture must start from a different foundation than systems based solely on pen input. To provide a basis for the design of a speech and gesture system, we examined research in other disciplines such as anthropology and linguistics. The result of this investigation was a taxonomy that gave us material for the incorporation of gestures whose meanings are largely transparent to users. This study describes the taxonomy and gives examples of applications to pen input systems.

  20. Field-testing the new DECtalk PC system for medical applications

    NASA Technical Reports Server (NTRS)

    Grams, R. R.; Smillov, A.; Li, B.

    1992-01-01

    Synthesized human speech has now reached a new level of performance. With the introduction of DEC's new DECtalk PC, the small system developer will have a very powerful tool for creative design. It has been our privilege to be involved in the beta-testing of this new device and to add a medical dictionary which covers a wide range of medical terminology. With the inherent board level understanding of speech synthesis and the medical dictionary, it is now possible to provide full digital speech output for all medical files and terms. The application of these tools will cover a wide range of options for the future and allow a new dimension in dealing with the complex user interface experienced in medical practice.

  1. Real-time classification of auditory sentences using evoked cortical activity in humans

    NASA Astrophysics Data System (ADS)

    Moses, David A.; Leonard, Matthew K.; Chang, Edward F.

    2018-06-01

    Objective. Recent research has characterized the anatomical and functional basis of speech perception in the human auditory cortex. These advances have made it possible to decode speech information from activity in brain regions like the superior temporal gyrus, but no published work has demonstrated this ability in real-time, which is necessary for neuroprosthetic brain-computer interfaces. Approach. Here, we introduce a real-time neural speech recognition (rtNSR) software package, which was used to classify spoken input from high-resolution electrocorticography signals in real-time. We tested the system with two human subjects implanted with electrode arrays over the lateral brain surface. Subjects listened to multiple repetitions of ten sentences, and rtNSR classified what was heard in real-time from neural activity patterns using direct sentence-level and HMM-based phoneme-level classification schemes. Main results. We observed single-trial sentence classification accuracies of 90% or higher for each subject with less than 7 minutes of training data, demonstrating the ability of rtNSR to use cortical recordings to perform accurate real-time speech decoding in a limited vocabulary setting. Significance. Further development and testing of the package with different speech paradigms could influence the design of future speech neuroprosthetic applications.
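
    The direct sentence-level scheme can be caricatured as template matching. The sketch below is only a stand-in for rtNSR's actual classifiers: it averages training trials per sentence and labels a new trial by correlation with each template. Array shapes and the toy data are illustrative.

    ```python
    import numpy as np

    def fit_templates(trials, labels):
        # trials: (n_trials, n_electrodes, n_timepoints) neural features
        return {s: trials[labels == s].mean(axis=0) for s in np.unique(labels)}

    def classify(trial, templates):
        # pick the sentence whose template correlates best with the trial
        scores = {s: np.corrcoef(trial.ravel(), t.ravel())[0, 1]
                  for s, t in templates.items()}
        return max(scores, key=scores.get)

    rng = np.random.default_rng(7)
    trials = rng.standard_normal((50, 64, 100))   # toy "cortical activity"
    labels = rng.integers(0, 10, size=50)         # ten candidate sentences
    templates = fit_templates(trials, labels)
    print(classify(trials[0], templates), labels[0])
    ```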

  2. SDI Software Technology Program Plan Version 1.5

    DTIC Science & Technology

    1987-06-01

    computer generation of auditory communication of meaningful speech. Most speech synthesizers are based on mathematical models of the human vocal tract, but ... oral/auditory and multimodal communications. Although such state-of-the-art interaction technology has not fully matured, user experience has ... superior pattern matching capabilities and the subliminal intuitive deduction capability. The error performance of humans can be helped by careful

  3. Measuring listening effort: driving simulator vs. simple dual-task paradigm

    PubMed Central

    Wu, Yu-Hsiang; Aksan, Nazan; Rizzo, Matthew; Stangl, Elizabeth; Zhang, Xuyang; Bentler, Ruth

    2014-01-01

    Objectives The dual-task paradigm has been widely used to measure listening effort. The primary objectives of the study were to (1) investigate the effect of hearing aid amplification and a hearing aid directional technology on listening effort measured by a complicated, more real-world dual-task paradigm, and (2) compare the results obtained with this paradigm to a simpler laboratory-style dual-task paradigm. Design The listening effort of adults with hearing impairment was measured using two dual-task paradigms, wherein participants performed a speech recognition task simultaneously with either a driving task in a simulator or a visual reaction-time task in a sound-treated booth. The speech materials and road noises for the speech recognition task were recorded in a van traveling on the highway in three hearing aid conditions: unaided, aided with omnidirectional processing (OMNI), and aided with directional processing (DIR). The change in the driving task or the visual reaction-time task performance across the conditions quantified the change in listening effort. Results Compared to the driving-only condition, driving performance declined significantly with the addition of the speech recognition task. Although the speech recognition score was higher in the OMNI and DIR conditions than in the unaided condition, driving performance was similar across these three conditions, suggesting that listening effort was not affected by amplification and directional processing. Results from the simple dual-task paradigm showed a similar trend: hearing aid technologies improved speech recognition performance, but did not affect performance in the visual reaction-time task (i.e., reduce listening effort). The correlation between listening effort measured using the driving paradigm and the visual reaction-time task paradigm was significant. The finding that our older participants' (56 to 85 years old) better speech recognition performance did not result in reduced listening effort was not consistent with literature evaluating younger (approximately 20 years old), normal-hearing adults. Because of this, a follow-up study was conducted, in which the visual reaction-time dual-task experiment using the same speech materials and road noises was repeated with younger adults with normal hearing. Contrary to findings with older participants, the results indicated that the directional technology significantly improved performance in both speech recognition and visual reaction-time tasks. Conclusions Adding a speech listening task to driving undermined driving performance. Hearing aid technologies significantly improved speech recognition while driving, but did not significantly reduce listening effort. Listening effort measured by dual-task experiments using a simulated real-world driving task and a conventional laboratory-style task was generally consistent. For a given listening environment, the benefit of hearing aid technologies on listening effort measured from younger adults with normal hearing may not be fully translated to older listeners with hearing impairment. PMID:25083599

  4. Implicit theory manipulations affecting efficacy of a smartphone application aiding speech therapy for Parkinson's patients.

    PubMed

    Nolan, Peter; Hoskins, Sherria; Johnson, Julia; Powell, Vaughan; Chaudhuri, K Ray; Eglin, Roger

    2012-01-01

    A Smartphone speech-therapy application (STA) is being developed, intended for people with Parkinson's disease (PD) with reduced implicit volume cues. The STA offers visual volume feedback, addressing diminished auditory cues. Users are typically older adults, less familiar with new technology. Domain-specific implicit theories (ITs) have been shown to result in mastery or helpless behaviors. Studies manipulating participants' implicit theories of 'technology' (Study One), and 'ability to affect one's voice' (Study Two), were coordinated with iterative STA test-stages, using patients with PD with prior speech-therapist referrals. Across studies, findings suggest it is possible to manipulate patients' ITs related to engaging with a Smartphone STA. This potentially impacts initial application approach and overall effort using a technology-based therapy.

  5. Long-Term Outcomes of Speech Therapy for Seven Adolescents with Visual Feedback Technologies: Ultrasound and Electropalatography

    ERIC Educational Resources Information Center

    Bacsfalvi, Penelope; Bernhardt, Barbara May

    2011-01-01

    This follow-up study investigated the speech production of seven adolescents and young adults with hearing impairment 2-4 years after speech intervention with ultrasound and electropalatography. Perceptual judgments by seven expert listeners revealed that five out of seven speakers either continued to generalize post-treatment or maintained their…

  6. Ultrasound as Visual Feedback in Speech Habilitation: Exploring Consultative Use in Rural British Columbia, Canada

    ERIC Educational Resources Information Center

    Bernhardt, B. May; Bacsfalvi, Penelope; Adler-Bock, Marcy; Shimizu, Reiko; Cheney, Audrey; Giesbrecht, Nathan; O'Connell, Maureen; Sirianni, Jason; Radanov, Bosko

    2008-01-01

    Ultrasound has shown promise as a visual feedback tool in speech therapy. Rural clients, however, often have minimal access to new technologies. The purpose of the current study was to evaluate consultative treatment using ultrasound in rural communities. Two speech-language pathologists (SLPs) trained in ultrasound use provided consultation with…

  7. Implicit prosody mining based on the human eye image capture technology

    NASA Astrophysics Data System (ADS)

    Gao, Pei-pei; Liu, Feng

    2013-08-01

    Eye-tracking technology has become one of the main methods for analyzing recognition issues in human-computer interaction, and capturing images of the human eye is the key problem in eye tracking. Building on further research, we introduce a new human-computer interaction method to enrich speech synthesis: Implicit Prosody mining based on eye-image capture, which extracts parameters from images of the eyes during reading, uses them to control and drive prosody generation in speech synthesis, and establishes a prosodic model with high simulation accuracy. The duration model is a key issue for prosody generation; for it, this paper puts forward a new way to obtain gaze durations during reading from captured eye images and to synchronize these durations with pronunciation durations in speech synthesis. Eye movement during reading is a comprehensive, multi-factor interactive process involving fixations, saccades and regressions, so the appropriate information must be extracted from the eye images and the regularities of gaze obtained as references for modeling. Based on an analysis of three current models of eye-movement control and the characteristics of Implicit Prosody reading, the relative independence of the text speech-processing system and the eye-movement control system is discussed, and it is shown that, at the same level of text familiarity, gaze duration during reading and the duration of internal (inner-voice) pronunciation are synchronous. An eye gaze duration model based on the hierarchical prosodic structure of Chinese is presented, replacing earlier machine-learning and probabilistic forecasting methods, capturing the reader's real internal reading rhythm and synthesizing speech with personalized rhythm. This research enriches human-computer interaction and has practical significance and application prospects for assistive speech interaction for people with disabilities. Experiments show that Implicit Prosody mining based on eye-image capture gives the synthesized speech more flexible expression.

  8. Challenges and Recent Developments in Hearing Aids: Part I. Speech Understanding in Noise, Microphone Technologies and Noise Reduction Algorithms

    PubMed Central

    Chung, King

    2004-01-01

    This review discusses the challenges in hearing aid design and fitting and the recent developments in advanced signal processing technologies to meet these challenges. The first part of the review discusses the basic concepts and the building blocks of digital signal processing algorithms, namely, the signal detection and analysis unit, the decision rules, and the time constants involved in the execution of the decision. In addition, mechanisms and the differences in the implementation of various strategies used to reduce the negative effects of noise are discussed. These technologies include the microphone technologies that take advantage of the spatial differences between speech and noise and the noise reduction algorithms that take advantage of the spectral difference and temporal separation between speech and noise. The specific technologies discussed in this paper include first-order directional microphones, adaptive directional microphones, second-order directional microphones, microphone matching algorithms, array microphones, multichannel adaptive noise reduction algorithms, and synchrony detection noise reduction algorithms. Verification data for these technologies, if available, are also summarized. PMID:15678225
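
    The first-order directional principle reviewed here can be sketched in a few lines: a rear-port signal is internally delayed and subtracted from the front port, so sound arriving from behind cancels. The port spacing, sampling rate, and the idealized single rear-arriving source below are assumptions, not values from the review.

    ```python
    import numpy as np

    fs = 16000
    spacing = 0.012                      # assumed 12 mm port spacing
    c = 343.0                            # speed of sound, m/s
    delay = int(round(spacing / c * fs)) # internal delay matched to travel time

    rng = np.random.default_rng(6)
    src = rng.standard_normal(fs)        # one second of noise arriving from behind
    rear = src                           # rear port hears it first
    front = np.roll(src, delay)          # front port hears it `delay` samples later

    out = front - np.roll(rear, delay)   # delay-and-subtract: rear sound cancels
    print(np.var(out) / np.var(front))   # ~0 in this idealized, single-source case
    ```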

  9. Speech perception and production in severe environments

    NASA Astrophysics Data System (ADS)

    Pisoni, David B.

    1990-09-01

    The goal was to acquire new knowledge about speech perception and production in severe environments such as high masking noise, increased cognitive load, or sustained attentional demands. Changes in speech production under these adverse conditions were examined through acoustic analysis techniques. One set of studies focused on the effects of noise on speech production; the experiments in this group were designed to generate a database of speech obtained in noise and in quiet. A second set of experiments examined the effects of cognitive load on the acoustic-phonetic properties of speech: talkers were required to carry out a demanding perceptual-motor task while they read lists of test words. A final set of experiments explored the effects of vocal fatigue on the acoustic-phonetic properties of speech. Both cognitive load and vocal fatigue are present in many applications where speech recognition technology is used, yet their influence on speech production is poorly understood.

  10. An integrated approach to improving noisy speech perception

    NASA Astrophysics Data System (ADS)

    Koval, Serguei; Stolbov, Mikhail; Smirnova, Natalia; Khitrov, Mikhail

    2002-05-01

    For a number of practical purposes and tasks, experts have to decode speech recordings of very poor quality. A combination of techniques is proposed to improve the intelligibility and quality of distorted speech messages and thus facilitate their comprehension. Along with noise cancellation and speech enhancement techniques that remove or reduce various kinds of distortion and interference (primarily unmasking and normalization in the time and frequency domains), the approach incorporates expert listening tactics: selective listening, nonstandard binaural listening, accounting for short-term and long-term adaptation of the human ear to noisy speech, and speech signal enhancement applied during listening to support speech decoding. The integrated approach ensures high-quality results and has been applied successfully by Speech Technology Center experts and numerous other users, mainly forensic institutions, to decode noisy speech recordings for courts, law enforcement and emergency services, accident investigation bodies, etc.
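
    One of the enhancement steps named above, noise reduction, can be illustrated with a textbook spectral-subtraction sketch; this is not the Speech Technology Center's implementation, and all parameters and signals below are illustrative.

    ```python
    import numpy as np

    def spectral_subtract(x, noise, n_fft=512, hop=256, floor=0.05):
        """Subtract an estimated noise magnitude spectrum frame by frame."""
        win = np.hanning(n_fft)
        noise_mag = np.abs(np.fft.rfft(noise[:n_fft] * win))  # crude noise estimate
        out = np.zeros(len(x))
        for i in range(0, len(x) - n_fft, hop):
            spec = np.fft.rfft(x[i:i + n_fft] * win)
            # keep a spectral floor to limit musical-noise artifacts
            mag = np.maximum(np.abs(spec) - noise_mag, floor * np.abs(spec))
            out[i:i + n_fft] += np.fft.irfft(mag * np.exp(1j * np.angle(spec)))
        return out

    rng = np.random.default_rng(8)
    clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
    enhanced = spectral_subtract(clean + 0.3 * rng.standard_normal(16000),
                                 noise=0.3 * rng.standard_normal(16000))
    ```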

  11. [The endpoint detection of cough signal in continuous speech].

    PubMed

    Yang, Guoqing; Mo, Hongqiang; Li, Wen; Lian, Lianfang; Zheng, Zeguang

    2010-06-01

    The endpoint detection of cough signals in continuous speech was studied in order to improve the efficiency and accuracy of manual or computer-based automatic cough recognition. First, the short-time zero-crossing rate (ZCR) is used to identify suspicious coughs, and a short-time energy threshold is derived from the acoustic characteristics of coughs. The short-time energy is then combined with the short-time ZCR to implement endpoint detection of coughs in continuous speech. To evaluate the method, the true number of coughs in each recording was first identified by two experienced doctors using a graphical user interface (GUI); the recordings were then analyzed by the automatic endpoint detection program under Matlab 7.0. Comparison of the two results showed an undetected-cough error rate of 2.18%, with 98.13% of noise, silence and speech removed. The way the short-time energy threshold is set is robust, and the endpoint detection program removes most speech and noise while maintaining a low error rate.
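
    A minimal sketch of the two-feature scheme follows; the frame sizes, the thresholds, and the direction of the ZCR test are assumptions rather than the paper's values.

    ```python
    import numpy as np

    def frame_signal(x, size=400, hop=200):
        return np.array([x[i:i + size] for i in range(0, len(x) - size, hop)])

    def short_time_features(x):
        f = frame_signal(x)
        energy = (f ** 2).sum(axis=1)                                 # short-time energy
        zcr = (np.abs(np.diff(np.sign(f), axis=1)) > 0).mean(axis=1)  # zero-crossing rate
        return energy, zcr

    def cough_candidate_frames(x, zcr_min=0.2, energy_factor=3.0):
        # flag burst-like frames: high energy and noisy (high ZCR); both
        # thresholds and the ZCR polarity are illustrative assumptions
        energy, zcr = short_time_features(x)
        return (energy > energy_factor * np.median(energy)) & (zcr > zcr_min)

    rng = np.random.default_rng(9)
    sig = np.concatenate([0.01 * rng.standard_normal(8000),
                          rng.standard_normal(2000),          # loud cough-like burst
                          0.01 * rng.standard_normal(8000)])
    print(cough_candidate_frames(sig).sum(), "candidate frames")
    ```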

  12. Musical melody and speech intonation: singing a different tune.

    PubMed

    Zatorre, Robert J; Baum, Shari R

    2012-01-01

    Music and speech are often cited as characteristically human forms of communication. Both share the features of hierarchical structure, complex sound systems, and sensorimotor sequencing demands, and both are used to convey and influence emotions, among other functions [1]. Both music and speech also prominently use acoustical frequency modulations, perceived as variations in pitch, as part of their communicative repertoire. Given these similarities, and the fact that pitch perception and production involve the same peripheral transduction system (cochlea) and the same production mechanism (vocal tract), it might be natural to assume that pitch processing in speech and music would also depend on the same underlying cognitive and neural mechanisms. In this essay we argue that the processing of pitch information differs significantly for speech and music; specifically, we suggest that there are two pitch-related processing systems, one for more coarse-grained, approximate analysis and one for more fine-grained accurate representation, and that the latter is unique to music. More broadly, this dissociation offers clues about the interface between sensory and motor systems, and highlights the idea that multiple processing streams are a ubiquitous feature of neuro-cognitive architectures.

  13. Assessing the Electrode-Neuron Interface with the Electrically Evoked Compound Action Potential, Electrode Position, and Behavioral Thresholds.

    PubMed

    DeVries, Lindsay; Scheperle, Rachel; Bierer, Julie Arenberg

    2016-06-01

    Variability in speech perception scores among cochlear implant listeners may largely reflect the variable efficacy of implant electrodes to convey stimulus information to the auditory nerve. In the present study, three metrics were applied to assess the quality of the electrode-neuron interface of individual cochlear implant channels: the electrically evoked compound action potential (ECAP), the estimation of electrode position using computerized tomography (CT), and behavioral thresholds using focused stimulation. The primary motivation of this approach is to evaluate the ECAP as a site-specific measure of the electrode-neuron interface in the context of two peripheral factors that likely contribute to degraded perception: large electrode-to-modiolus distance and reduced neural density. Ten unilaterally implanted adults with Advanced Bionics HiRes90k devices participated. ECAPs were elicited with monopolar stimulation within a forward-masking paradigm to construct channel interaction functions (CIF), behavioral thresholds were obtained with quadrupolar (sQP) stimulation, and data from imaging provided estimates of electrode-to-modiolus distance and scalar location (scala tympani (ST), intermediate, or scala vestibuli (SV)) for each electrode. The width of the ECAP CIF was positively correlated with electrode-to-modiolus distance; both of these measures were also influenced by scalar position. The ECAP peak amplitude was negatively correlated with behavioral thresholds. Moreover, subjects with low behavioral thresholds and large ECAP amplitudes, averaged across electrodes, tended to have higher speech perception scores. These results suggest a potential clinical role for the ECAP in the objective assessment of individual cochlear implant channels, with the potential to improve speech perception outcomes.

  14. Detecting Nasal Vowels in Speech Interfaces Based on Surface Electromyography

    PubMed Central

    Freitas, João; Teixeira, António; Silva, Samuel; Oliveira, Catarina; Dias, Miguel Sales

    2015-01-01

    Nasality is a very important characteristic of several languages, European Portuguese being one of them. This paper addresses the challenge of nasality detection in surface electromyography (EMG) based speech interfaces. We explore the existence of useful information about velum movement, assess whether muscles deeper in the face and neck region can be measured using surface electrodes, and determine the best electrode locations to do so. The procedure we adopted uses Real-Time Magnetic Resonance Imaging (RT-MRI), collected from a set of speakers, providing a method to interpret the EMG data. By ensuring compatible data recording conditions and proper time alignment between the EMG and the RT-MRI data, we are able to accurately estimate the time when the velum moves and the type of movement when a nasal vowel occurs. The combination of these two sources revealed interesting and distinct characteristics in the EMG signal when a nasal vowel is uttered, which motivated a classification experiment. Overall results of this experiment provide evidence that it is possible to detect velum movement using sensors positioned below the ear, between the mastoid process and the mandible, in the upper neck region. In a frame-based classification scenario, error rates as low as 32.5% across all speakers and 23.4% for the best speaker were achieved for nasal vowel detection. This outcome is an encouraging result, laying the grounds for deeper exploration of the proposed approach as a promising route to the development of an EMG-based speech interface for languages with strong nasal characteristics. PMID:26069968

  15. What does voice-processing technology support today?

    PubMed Central

    Nakatsu, R; Suzuki, Y

    1995-01-01

    This paper describes the state of the art in applications of voice-processing technologies. In the first part, technologies concerning the implementation of speech recognition and synthesis algorithms are described. Hardware technologies such as microprocessors and DSPs (digital signal processors) are discussed. The software development environment, a key technology in developing application software ranging from DSP software to support software, is also described. In the second part, the state of the art of algorithms from the standpoint of applications is discussed. Several issues concerning the evaluation of speech recognition/synthesis algorithms are covered, as well as issues concerning the robustness of algorithms in adverse conditions. PMID:7479720

  16. Telerehabilitation, virtual therapists, and acquired neurologic speech and language disorders.

    PubMed

    Cherney, Leora R; van Vuuren, Sarel

    2012-08-01

    Telerehabilitation (telerehab) offers cost-effective services that potentially can improve access to care for those with acquired neurologic communication disorders. However, regulatory issues including licensure, reimbursement, and threats to privacy and confidentiality hinder the routine implementation of telerehab services into the clinical setting. Despite these barriers, rapid technological advances and a growing body of research regarding the use of telerehab applications support its use. This article reviews the evidence related to acquired neurologic speech and language disorders in adults, focusing on studies that have been published since 2000. Research studies have used telerehab systems to assess and treat disorders including dysarthria, apraxia of speech, aphasia, and mild Alzheimer disease. They show that telerehab is a valid and reliable vehicle for delivering speech and language services. The studies represent a progression of technological advances in computing, Internet, and mobile technologies. They range on a continuum from working synchronously (in real-time) with a speech-language pathologist to working asynchronously (offline) with a stand-in virtual therapist. One such system that uses a virtual therapist for the treatment of aphasia, the Web-ORLA™ (Rehabilitation Institute of Chicago, Chicago, IL) system, is described in detail. Future directions for the advancement of telerehab for clinical practice are discussed.

  17. Speech-Language Pathologists' Perceptions of the Importance and Ability to Use Assistive Technology in the Kingdom of Saudi Arabia

    ERIC Educational Resources Information Center

    Al-Dawaideh, Ahmad Mousa

    2013-01-01

    Speech-language pathologists (SLPs) frequently work with people with severe communication disorders who require assistive technology (AT) for communication. The purpose of this study was to investigate SLPs' perceptions of the importance of AT and the ability level required to use it, and the relationship of AT with gender, level of education,…

  18. Attentional Selection in a Cocktail Party Environment Can Be Decoded from Single-Trial EEG

    PubMed Central

    O'Sullivan, James A.; Power, Alan J.; Mesgarani, Nima; Rajaram, Siddharth; Foxe, John J.; Shinn-Cunningham, Barbara G.; Slaney, Malcolm; Shamma, Shihab A.; Lalor, Edmund C.

    2015-01-01

    How humans solve the cocktail party problem remains unknown. However, progress has been made recently thanks to the realization that cortical activity tracks the amplitude envelope of speech. This has led to the development of regression methods for studying the neurophysiology of continuous speech. One such method, known as stimulus-reconstruction, has been successfully utilized with cortical surface recordings and magnetoencephalography (MEG). However, the former is invasive and gives a relatively restricted view of processing along the auditory hierarchy, whereas the latter is expensive and rare. Thus it would be extremely useful for research in many populations if stimulus-reconstruction was effective using electroencephalography (EEG), a widely available and inexpensive technology. Here we show that single-trial (≈60 s) unaveraged EEG data can be decoded to determine attentional selection in a naturalistic multispeaker environment. Furthermore, we show a significant correlation between our EEG-based measure of attention and performance on a high-level attention task. In addition, by attempting to decode attention at individual latencies, we identify neural processing at ∼200 ms as being critical for solving the cocktail party problem. These findings open up new avenues for studying the ongoing dynamics of cognition using EEG and for developing effective and natural brain–computer interfaces. PMID:24429136
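
    In this literature, the stimulus-reconstruction method is commonly a regularized linear regression from time-lagged multichannel EEG to the attended speech envelope; attention is then decoded by asking which speaker's envelope the reconstruction matches best. A minimal sketch under those assumptions follows; the shapes, lag range, and ridge parameter are all placeholders.

```python
import numpy as np

def lagged_design(eeg, max_lag=32):
    """Stack time-lagged copies of each EEG channel (lags 0..max_lag samples)."""
    T, C = eeg.shape
    X = np.zeros((T, C * (max_lag + 1)))
    for lag in range(max_lag + 1):
        X[lag:, lag * C:(lag + 1) * C] = eeg[:T - lag]
    return X

def fit_decoder(eeg, envelope, lam=1e2, max_lag=32):
    """Ridge regression from lagged EEG to the attended speech envelope."""
    X = lagged_design(eeg, max_lag)
    XtX = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ envelope)

def decode_attention(eeg, env_a, env_b, w, max_lag=32):
    """Reconstruct an envelope from EEG; pick the speaker it correlates with best."""
    rec = lagged_design(eeg, max_lag) @ w
    corr = lambda a, b: np.corrcoef(a, b)[0, 1]
    return "A" if corr(rec, env_a) > corr(rec, env_b) else "B"

# toy usage: EEG that linearly mixes the attended envelope plus noise
rng = np.random.default_rng(4)
env_a, env_b = rng.standard_normal((2, 2000))
eeg = np.outer(env_a, rng.standard_normal(16)) + 0.5 * rng.standard_normal((2000, 16))
w = fit_decoder(eeg, env_a)
print(decode_attention(eeg, env_a, env_b, w))   # expect "A"
```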

  19. Interference effects of vocalization on dual task performance

    NASA Astrophysics Data System (ADS)

    Owens, J. M.; Goodman, L. S.; Pianka, M. J.

    1984-09-01

    Voice command and control systems have been proposed as a potential means of off-loading the typically overburdened visual information processing system. However, prior to introducing novel human-machine interfacing technologies in high workload environments, consideration must be given to the integration of the new technologies within existing task structures, to ensure that no new sources of workload or interference are systematically introduced. This study examined the use of voice interactive systems technology in the joint performance of two cognitive information processing tasks requiring continuous memory and choice reaction, in which a basis for intertask interference might be expected. Stimuli for the continuous memory task were presented aurally, and either voice or keyboard responding was required in the choice reaction task. Performance was significantly degraded in each task when voice responding was required in the choice reaction time task. Performance degradation was evident in higher error scores for both the choice reaction and continuous memory tasks. Performance decrements observed under conditions of high intertask stimulus similarity were not statistically significant. The results signal the need to consider further the task requirements for verbal short-term memory when applying speech technology in multitask environments.

  20. Effects of Compression on Speech Acoustics, Intelligibility, and Sound Quality

    PubMed Central

    Souza, Pamela E.

    2002-01-01

    The topic of compression has been discussed quite extensively in the last 20 years (eg, Braida et al., 1982; Dillon, 1996, 2000; Dreschler, 1992; Hickson, 1994; Kuk, 2000 and 2002; Kuk and Ludvigsen, 1999; Moore, 1990; Van Tasell, 1993; Venema, 2000; Verschuure et al., 1996; Walker and Dillon, 1982). However, the latest comprehensive update by this journal was published in 1996 (Kuk, 1996). Since that time, use of compression hearing aids has increased dramatically, from half of hearing aids dispensed only 5 years ago to four out of five hearing aids dispensed today (Strom, 2002b). Most of today's digital and digitally programmable hearing aids are compression devices (Strom, 2002a). It is probable that within a few years, very few patients will be fit with linear hearing aids. Furthermore, compression has increased in complexity, with greater numbers of parameters under the clinician's control. Ideally, these changes will translate to greater flexibility and precision in fitting and selection. However, they also increase the need for information about the effects of compression amplification on speech perception and speech quality. As evidenced by the large number of sessions at professional conferences on fitting compression hearing aids, clinicians continue to have questions about compression technology and when and how it should be used. How does compression work? Who are the best candidates for this technology? How should adjustable parameters be set to provide optimal speech recognition? What effect will compression have on speech quality? These and other questions continue to drive our interest in this technology. This article reviews the effects of compression on the speech signal and the implications for speech intelligibility, quality, and design of clinical procedures. PMID:25425919
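
    As background to the "How does compression work?" question the review poses, the core of compression amplification can be stated in a few lines: below a compression threshold the hearing aid applies linear gain, and above it the input/output slope drops to 1/(compression ratio). A minimal static sketch with hypothetical parameter values:

```python
def wdrc_gain_db(input_db, threshold_db=45.0, ratio=3.0, gain_db=20.0):
    """Static input/output rule: linear gain below the compression threshold,
    slope 1/ratio above it. All parameter values are hypothetical."""
    if input_db <= threshold_db:
        return input_db + gain_db
    return threshold_db + gain_db + (input_db - threshold_db) / ratio

for level in (30, 45, 60, 75, 90):
    print(f"{level} dB SPL in -> {wdrc_gain_db(level):.1f} dB SPL out")
```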

  1. Comparison of speech recognition with adaptive digital and FM remote microphone hearing assistance technology by listeners who use hearing aids.

    PubMed

    Thibodeau, Linda

    2014-06-01

    The purpose of this study was to compare the benefits of 3 types of remote microphone hearing assistance technology (HAT), adaptive digital broadband, adaptive frequency modulation (FM), and fixed FM, through objective and subjective measures of speech recognition in clinical and real-world settings. Participants included 11 adults, ages 16 to 78 years, with primarily moderate-to-severe bilateral hearing impairment (HI), who wore binaural behind-the-ear hearing aids; and 15 adults, ages 18 to 30 years, with normal hearing. Sentence recognition in quiet and in noise and subjective ratings were obtained in 3 conditions of wireless signal processing. Performance by the listeners with HI when using the adaptive digital technology was significantly better than that obtained with the FM technology, with the greatest benefits at the highest noise levels. The majority of listeners also preferred the digital technology when listening in a real-world noisy environment. The wireless technology allowed persons with HI to surpass persons with normal hearing in speech recognition in noise, with the greatest benefit occurring with adaptive digital technology. The use of adaptive digital technology combined with speechreading cues would allow persons with HI to engage in communication in environments that would have otherwise not been possible with traditional wireless technology.

  2. Public Speaking Anxiety: Comparing Face-to-Face and Web-Based Speeches

    ERIC Educational Resources Information Center

    Campbell, Scott; Larson, James

    2013-01-01

    This study determines whether students experience a different level of anxiety when giving a speech to a group of people in a traditional face-to-face classroom setting than when giving a speech into a camera, using distance or web-based technology, to an audience visible on a projected screen. The study included approximately 70 students.…

  3. Assessing Children's Home Language Environments Using Automatic Speech Recognition Technology

    ERIC Educational Resources Information Center

    Greenwood, Charles R.; Thiemann-Bourque, Kathy; Walker, Dale; Buzhardt, Jay; Gilkerson, Jill

    2011-01-01

    The purpose of this research was to replicate and extend some of the findings of Hart and Risley using automatic speech processing instead of human transcription of language samples. The long-term goal of this work is to make the current approach to speech processing accessible to researchers and clinicians working on a daily basis with families and…

  4. An Exploration of the Potential of Automatic Speech Recognition to Assist and Enable Receptive Communication in Higher Education

    ERIC Educational Resources Information Center

    Wald, Mike

    2006-01-01

    The potential use of Automatic Speech Recognition to assist receptive communication is explored. The opportunities and challenges that this technology presents students and staff to provide captioning of speech online or in classrooms for deaf or hard of hearing students and assist blind, visually impaired or dyslexic learners to read and search…

  5. The functional neuroanatomy of language

    NASA Astrophysics Data System (ADS)

    Hickok, Gregory

    2009-09-01

    There has been substantial progress over the last several years in understanding aspects of the functional neuroanatomy of language. Some of these advances are summarized in this review. It will be argued that recognizing speech sounds is carried out in the superior temporal lobe bilaterally, that the superior temporal sulcus bilaterally is involved in phonological-level aspects of this process, that the frontal/motor system is not central to speech recognition although it may modulate auditory perception of speech, that conceptual access mechanisms are likely located in the lateral posterior temporal lobe (middle and inferior temporal gyri), that speech production involves sensory-related systems in the posterior superior temporal lobe in the left hemisphere, that the interface between perceptual and motor systems is supported by a sensory-motor circuit for vocal tract actions (not dedicated to speech) that is very similar to sensory-motor circuits found in primate parietal lobe, and that verbal short-term memory can be understood as an emergent property of this sensory-motor circuit. These observations are considered within the context of a dual stream model of speech processing in which one pathway supports speech comprehension and the other supports sensory-motor integration. Additional topics of discussion include the functional organization of the planum temporale for spatial hearing and speech-related sensory-motor processes, the anatomical and functional basis of a form of acquired language disorder, conduction aphasia, the neural basis of vocabulary development, and sentence-level/grammatical processing.

  6. Educational Applications for Blind and Partially Sighted Pupils Based on Speech Technologies for Serbian.

    PubMed

    Lučić, Branko; Ostrogonac, Stevan; Vujnović Sedlar, Nataša; Sečujski, Milan

    2015-01-01

    The inclusion of persons with disabilities has always represented an important issue. Advancements within the field of computer science have enabled the development of different types of aids, which have significantly improved the quality of life of the disabled. However, for some disabilities, such as visual impairment, the purpose of these aids is to establish an alternative communication channel and thus overcome the user's disability. Speech technologies play the crucial role in this process. This paper presents the ongoing efforts to create a set of educational applications based on speech technologies for Serbian for the early stages of education of blind and partially sighted children. Two educational applications dealing with memory exercises and comprehension of geometrical shapes are presented, along with the initial tests results obtained from research including visually impaired pupils.

  7. Educational Applications for Blind and Partially Sighted Pupils Based on Speech Technologies for Serbian

    PubMed Central

    Lučić, Branko; Ostrogonac, Stevan; Vujnović Sedlar, Nataša; Sečujski, Milan

    2015-01-01

    The inclusion of persons with disabilities has always represented an important issue. Advancements within the field of computer science have enabled the development of different types of aids, which have significantly improved the quality of life of the disabled. However, for some disabilities, such as visual impairment, the purpose of these aids is to establish an alternative communication channel and thus overcome the user's disability. Speech technologies play the crucial role in this process. This paper presents the ongoing efforts to create a set of educational applications based on speech technologies for Serbian for the early stages of education of blind and partially sighted children. Two educational applications dealing with memory exercises and comprehension of geometrical shapes are presented, along with the initial tests results obtained from research including visually impaired pupils. PMID:26171422

  8. Noise Robust Speech Recognition Applied to Voice-Driven Wheelchair

    NASA Astrophysics Data System (ADS)

    Sasou, Akira; Kojima, Hiroaki

    2009-12-01

    Conventional voice-driven wheelchairs usually employ headset microphones, which can achieve sufficient recognition accuracy even in the presence of surrounding noise. However, such interfaces require users to wear sensors such as a headset microphone, which can be an impediment, especially for the hand disabled. Conversely, it is well known that speech recognition accuracy degrades drastically when the microphone is placed far from the user. In this paper, we develop a noise-robust speech recognition system for a voice-driven wheelchair that achieves almost the same recognition accuracy as a headset microphone without requiring the user to wear sensors. We verified the effectiveness of our system in experiments in different environments.

  9. Natural user interface as a supplement of the holographic Raman tweezers

    NASA Astrophysics Data System (ADS)

    Tomori, Zoltan; Kanka, Jan; Kesa, Peter; Jakl, Petr; Sery, Mojmir; Bernatova, Silvie; Antalik, Marian; Zemánek, Pavel

    2014-09-01

    Holographic Raman tweezers (HRT) manipulate microobjects by controlling the positions of multiple optical traps via the mouse or joystick. Several attempts have appeared recently to exploit touch tablets, 2D cameras, or the Kinect game console instead. We propose a multimodal "Natural User Interface" (NUI) approach integrating hand tracking, gesture recognition, eye tracking, and speech recognition. For this purpose we exploited the low-cost "Leap Motion" and "MyGaze" sensors and a simple speech recognition program, "Tazti". We developed our own NUI software, which processes signals from the sensors and sends control commands to the HRT, which subsequently controls the positions of the trapping beams, the micropositioning stage, and the acquisition system for Raman spectra. The system allows various modes of operation appropriate for specific tasks. Virtual tools (called "pin" and "tweezers") serving for the manipulation of particles are displayed on a transparent "overlay" window above the live camera image. The eye tracker identifies the position of the observed particle and uses it for autofocus. Laser trap manipulation navigated by the dominant hand can be combined with gesture recognition of the secondary hand. Speech command recognition is useful when both hands are busy. The proposed methods make manual control of HRT more efficient, and they are also a good platform for its future semi-automated and fully automated operation.

  10. "The Communication Needs and Rights of Mankind", Group 1 Report of the Futuristic Priorities Division of the Speech Communication Association. "Future Communication Technologies; Hardware and Software"; Group 2 Report.

    ERIC Educational Resources Information Center

    Dance, Frank E. X.; And Others

    This paper reports on the Futuristic Priorities Division members' recommendations and priorities concerning the impact of the future on communication and on the speech communication discipline. The recommendations and priorities are listed for two subgroups: The Communication Needs and Rights of Mankind; and Future Communication Technologies:…

  11. Phonetics and Technology in the Classroom: A Practical Approach to Using Speech Analysis Software in Second-Language Pronunciation Instruction

    ERIC Educational Resources Information Center

    Olsen, Daniel J.

    2014-01-01

    While speech analysis technology has become an integral part of phonetic research, and to some degree is used in language instruction at the most advanced levels, it appears to be mostly absent from the beginning levels of language instruction. In part, the lack of incorporation into the language classroom can be attributed to both the lack of…

  12. A Pilot Investigation regarding Speech-Recognition Performance in Noise for Adults with Hearing Loss in the FM+HA Listening Condition

    ERIC Educational Resources Information Center

    Lewis, M. Samantha; Gallun, Frederick J.; Gordon, Jane; Lilly, David J.; Crandell, Carl

    2010-01-01

    While the concurrent use of the hearing aid (HA) microphone with frequency modulation (FM) technology can decrease speech-recognition performance, the FM+HA condition is still an important setting for users of both HA and FM technology. The primary goal of this investigation was to evaluate the effect of attenuating HA gain in the FM+HA listening…

  13. Online EEG Classification of Covert Speech for Brain-Computer Interfacing.

    PubMed

    Sereshkeh, Alborz Rezazadeh; Trott, Robert; Bricout, Aurélien; Chau, Tom

    2017-12-01

    Brain-computer interfaces (BCIs) for communication can be nonintuitive, often requiring the performance of hand motor imagery or some other conversation-irrelevant task. In this paper, electroencephalography (EEG) was used to develop two intuitive online BCIs based solely on covert speech. The goal of the first BCI was to differentiate between 10 s of mental repetitions of the word "no" and an equivalent duration of unconstrained rest. The second BCI was designed to discern between 10 s each of covert repetition of the words "yes" and "no". Twelve participants used these two BCIs to answer yes or no questions. Each participant completed four sessions, comprising two offline training sessions and two online sessions, one for testing each of the BCIs. With a support vector machine and a combination of spectral and time-frequency features, an average accuracy of [Formula: see text] was reached across participants in the online classification of no versus rest, with 10 out of 12 participants surpassing the chance level (60.0% for [Formula: see text]). The online classification of yes versus no yielded an average accuracy of [Formula: see text], with eight participants exceeding the chance level. Task-specific changes in EEG beta and gamma power in language-related brain areas tended to provide discriminatory information. To our knowledge, this is the first report of online EEG classification of covert speech. Our findings support further study of covert speech as a BCI activation task, potentially leading to the development of more intuitive BCIs for communication.
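
    The general recipe named here (spectral features feeding a support vector machine) can be sketched compactly. The sampling rate, frequency bands, channel count, and toy labels below are placeholders, not the paper's actual configuration; scipy and scikit-learn stand in for whatever tooling was used.

```python
import numpy as np
from scipy.signal import welch
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

FS = 256  # hypothetical EEG sampling rate

def band_power_features(trial, bands=((4, 8), (8, 13), (13, 30), (30, 45))):
    """Mean Welch power per channel in each band for one (channels x samples) trial."""
    f, pxx = welch(trial, fs=FS, nperseg=FS, axis=1)
    return np.concatenate([pxx[:, (f >= lo) & (f < hi)].mean(axis=1)
                           for lo, hi in bands])

# toy data: 40 trials of 8 channels x 10 s, binary labels ("no" vs. rest)
rng = np.random.default_rng(1)
trials = rng.standard_normal((40, 8, 10 * FS))
labels = rng.integers(0, 2, 40)

X = np.array([band_power_features(t) for t in trials])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```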

  14. Dual routes for verbal repetition: articulation-based and acoustic-phonetic codes for pseudoword and word repetition, respectively.

    PubMed

    Yoo, Sejin; Chung, Jun-Young; Jeon, Hyeon-Ae; Lee, Kyoung-Min; Kim, Young-Bo; Cho, Zang-Hee

    2012-07-01

    Speech production is inextricably linked to speech perception, yet they are usually investigated in isolation. In this study, we employed a verbal-repetition task to identify the neural substrates of speech processing with two ends active simultaneously using functional MRI. Subjects verbally repeated auditory stimuli containing an ambiguous vowel sound that could be perceived as either a word or a pseudoword depending on the interpretation of the vowel. We found verbal repetition commonly activated the audition-articulation interface bilaterally at Sylvian fissures and superior temporal sulci. Contrasting word-versus-pseudoword trials revealed neural activities unique to word repetition in the left posterior middle temporal areas and activities unique to pseudoword repetition in the left inferior frontal gyrus. These findings imply that the tasks are carried out using different speech codes: an articulation-based code of pseudowords and an acoustic-phonetic code of words. It also supports the dual-stream model and imitative learning of vocabulary.

  15. Recognizing speech under a processing load: dissociating energetic from informational factors.

    PubMed

    Mattys, Sven L; Brooks, Joanna; Cooke, Martin

    2009-11-01

    Effects of perceptual and cognitive loads on spoken-word recognition have so far largely escaped investigation. This study lays the foundations of a psycholinguistic approach to speech recognition in adverse conditions that draws upon the distinction between energetic masking, i.e., listening environments leading to signal degradation, and informational masking, i.e., listening environments leading to depletion of higher-order, domain-general processing resources, independent of signal degradation. We show that severe energetic masking, such as that produced by background speech or noise, curtails reliance on lexical-semantic knowledge and increases relative reliance on salient acoustic detail. In contrast, informational masking, induced by a resource-depleting competing task (divided attention or a memory load), results in the opposite pattern. Based on this clear dissociation, we propose a model of speech recognition that addresses not only the mapping between sensory input and lexical representations, as traditionally advocated, but also the way in which this mapping interfaces with general cognition and non-linguistic processes.

  16. Epidermal mechano-acoustic sensing electronics for cardiovascular diagnostics and human-machine interfaces.

    PubMed

    Liu, Yuhao; Norton, James J S; Qazi, Raza; Zou, Zhanan; Ammann, Kaitlyn R; Liu, Hank; Yan, Lingqing; Tran, Phat L; Jang, Kyung-In; Lee, Jung Woo; Zhang, Douglas; Kilian, Kristopher A; Jung, Sung Hee; Bretl, Timothy; Xiao, Jianliang; Slepian, Marvin J; Huang, Yonggang; Jeong, Jae-Woong; Rogers, John A

    2016-11-01

    Physiological mechano-acoustic signals, often with frequencies and intensities that are beyond those associated with the audible range, provide information of great clinical utility. Stethoscopes and digital accelerometers in conventional packages can capture some relevant data, but neither is suitable for use in a continuous, wearable mode, and both have shortcomings associated with mechanical transduction of signals through the skin. We report a soft, conformal class of device configured specifically for mechano-acoustic recording from the skin, capable of being used on nearly any part of the body, in forms that maximize detectable signals and allow for multimodal operation, such as electrophysiological recording. Experimental and computational studies highlight the key roles of low effective modulus and low areal mass density for effective operation in this type of measurement mode on the skin. Demonstrations involving seismocardiography and heart murmur detection in a series of cardiac patients illustrate utility in advanced clinical diagnostics. Monitoring of pump thrombosis in ventricular assist devices provides an example in characterization of mechanical implants. Speech recognition and human-machine interfaces represent additional demonstrated applications. These and other possibilities suggest broad-ranging uses for soft, skin-integrated digital technologies that can capture human body acoustics.

  17. Epidermal mechano-acoustic sensing electronics for cardiovascular diagnostics and human-machine interfaces

    PubMed Central

    Liu, Yuhao; Norton, James J. S.; Qazi, Raza; Zou, Zhanan; Ammann, Kaitlyn R.; Liu, Hank; Yan, Lingqing; Tran, Phat L.; Jang, Kyung-In; Lee, Jung Woo; Zhang, Douglas; Kilian, Kristopher A.; Jung, Sung Hee; Bretl, Timothy; Xiao, Jianliang; Slepian, Marvin J.; Huang, Yonggang; Jeong, Jae-Woong; Rogers, John A.

    2016-01-01

    Physiological mechano-acoustic signals, often with frequencies and intensities that are beyond those associated with the audible range, provide information of great clinical utility. Stethoscopes and digital accelerometers in conventional packages can capture some relevant data, but neither is suitable for use in a continuous, wearable mode, and both have shortcomings associated with mechanical transduction of signals through the skin. We report a soft, conformal class of device configured specifically for mechano-acoustic recording from the skin, capable of being used on nearly any part of the body, in forms that maximize detectable signals and allow for multimodal operation, such as electrophysiological recording. Experimental and computational studies highlight the key roles of low effective modulus and low areal mass density for effective operation in this type of measurement mode on the skin. Demonstrations involving seismocardiography and heart murmur detection in a series of cardiac patients illustrate utility in advanced clinical diagnostics. Monitoring of pump thrombosis in ventricular assist devices provides an example in characterization of mechanical implants. Speech recognition and human-machine interfaces represent additional demonstrated applications. These and other possibilities suggest broad-ranging uses for soft, skin-integrated digital technologies that can capture human body acoustics. PMID:28138529

  18. Studies in automatic speech recognition and its application in aerospace

    NASA Astrophysics Data System (ADS)

    Taylor, Michael Robinson

    Human communication is characterized in terms of the spectral and temporal dimensions of speech waveforms. Electronic speech recognition strategies based on Dynamic Time Warping and Markov Model algorithms are described and typical digit recognition error rates are tabulated. The application of Direct Voice Input (DVI) as an interface between man and machine is explored within the context of civil and military aerospace programmes. Sources of physical and emotional stress affecting speech production within military high performance aircraft are identified. Experimental results are reported which quantify fundamental frequency and coarse temporal dimensions of male speech as a function of the vibration, linear acceleration and noise levels typical of aerospace environments; preliminary indications of acoustic phonetic variability reported by other researchers are summarized. Connected whole-word pattern recognition error rates are presented for digits spoken under controlled Gz sinusoidal whole-body vibration. Correlations are made between significant increases in recognition error rate and resonance of the abdomen-thorax and head subsystems of the body. The phenomenon of vibrato style speech produced under low frequency whole-body Gz vibration is also examined. Interactive DVI system architectures and avionic data bus integration concepts are outlined together with design procedures for the efficient development of pilot-vehicle command and control protocols.

  19. Auditory Support in Linguistically Diverse Classrooms: Factors Related to Bilingual Text-to-Speech Use

    ERIC Educational Resources Information Center

    Van Laere, E.; Braak, J.

    2017-01-01

    Text-to-speech technology can act as an important support tool in computer-based learning environments (CBLEs) as it provides auditory input, next to on-screen text. Particularly for students who use a language at home other than the language of instruction (LOI) applied at school, text-to-speech can be useful. The CBLE E-Validiv offers content in…

  20. Integrated Speech and Language Technology for Intelligence, Surveillance, and Reconnaissance (ISR)

    DTIC Science & Technology

    2017-07-01

    applying submodularity techniques to address computing challenges posed by large datasets in speech and language processing. MT and speech tools were...aforementioned research-oriented activities, the IT system administration team provided necessary support to laboratory computing and network operations...operations of SCREAM Lab computer systems and networks. Other miscellaneous activities in relation to Task Order 29 are presented in an additional fourth

  1. Intelligibility and Acceptability Testing for Speech Technology

    DTIC Science & Technology

    1992-05-22

    information in memory (Luce, Feustel, and Pisoni, 1983). In high workload or multiple task situations, the added effort of listening to degraded speech can lead...the DRT provides diagnostic feature scores on six phonemic features: voicing, nasality, sustention , sibilation, graveness, and compactness, and on a...of other speech materials (e.g., polysyllabic words, paragraphs) and methods ( memory , comprehension, reaction time) have been used to evaluate the

  2. Performance Evaluation of Intelligent Systems at the National Institute of Standards and Technology (NIST)

    DTIC Science & Technology

    2011-03-01

    past few years, including performance evaluation of emergency response robots , sensor systems on unmanned ground vehicles, speech-to-speech translation...emergency response robots ; intelligent systems; mixed palletizing, testing, simulation; robotic vehicle perception systems; search and rescue robots ...ranging from autonomous vehicles to urban search and rescue robots to speech translation and manufacturing systems. The evaluations have occurred in

  3. Nazareth College: Specialty Preparation for Speech-Language Pathologists to Work with Children Who Are Deaf and Hard of Hearing

    ERIC Educational Resources Information Center

    Brown, Paula M.; Quenin, Cathy

    2010-01-01

    The specialty preparation program within the speech-language pathology master's degree program at Nazareth College in Rochester, New York, was designed to train speech-language pathologists to work with children who are deaf and hard of hearing, ages 0 to 21. The program is offered in collaboration with the Rochester Institute of Technology,…

  4. School Leavers with Learning Disabilities Moving from Child to Adult Speech and Language Therapy (SLT) Teams: SLTs' Views of Successful and Less Successful Transition Co-Working Practices

    ERIC Educational Resources Information Center

    McCartney, Elspeth; Muir, Margaret

    2017-01-01

    School-leaving for pupils with long-term speech, language, swallowing or communication difficulties requires careful management. Speech and language therapists (SLTs) support communication, secure assistive technology and manage swallowing difficulties post-school. UK SLTs are employed by health services, with child SLT teams based in schools.…

  5. Distributed cooperating processes in a mobile robot control system

    NASA Technical Reports Server (NTRS)

    Skillman, Thomas L., Jr.

    1988-01-01

    A mobile inspection robot has been proposed for the NASA Space Station. It will be a free-flying autonomous vehicle that will leave a berthing unit to accomplish a variety of inspection tasks around the Space Station, and then return to its berth to recharge, refuel, and transfer information. The Flying Eye robot will receive voice communication to change its attitude, move at a constant velocity, and move to a predefined location along a self-generated path. This mobile robot control system requires integration of traditional command and control techniques with a number of AI technologies. Speech recognition, natural language understanding, task and path planning, sensory abstraction, and pattern recognition are all required for successful implementation. The interface between the traditional numeric control techniques and the symbolic processing of the AI technologies must be developed, and a distributed computing approach will be needed to meet the real-time computing requirements. To study the integration of the elements of this project, a novel mobile robot control architecture and simulation based on the blackboard architecture were developed. The control system operation and structure are discussed.
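
    In a blackboard architecture, independent knowledge sources read and write a shared data store, and a control loop fires whichever source can contribute next. The toy sketch below illustrates only that pattern; the knowledge sources, commands, and plan are invented, not a reconstruction of the NASA system.

```python
class Blackboard:
    """Shared memory through which independent knowledge sources communicate."""
    def __init__(self):
        self.data = {}

def speech_ks(bb):
    # knowledge source: turn a raw utterance into a symbolic command
    if "utterance" in bb.data and "command" not in bb.data:
        bb.data["command"] = bb.data["utterance"].strip().lower()

def planner_ks(bb):
    # knowledge source: turn a recognized command into a motion plan
    if bb.data.get("command") == "inspect truss" and "plan" not in bb.data:
        bb.data["plan"] = ["leave berth", "fly to truss", "scan", "return"]

def control_loop(bb, sources):
    """Fire every knowledge source that can contribute until nothing changes."""
    changed = True
    while changed:
        before = dict(bb.data)
        for ks in sources:
            ks(bb)
        changed = bb.data != before

bb = Blackboard()
bb.data["utterance"] = "Inspect Truss"
control_loop(bb, [speech_ks, planner_ks])
print(bb.data["plan"])
```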

  6. Applying the System Component and Operationally Relevant Evaluation (SCORE) Framework to Evaluate Advanced Military Technologies

    DTIC Science & Technology

    2010-03-01

    and charac- terize the actions taken by the soldier (e.g., running, walking, climbing stairs ). Real-time image capture and exchange N The ability of...multimedia information sharing among soldiers in the field, two-way speech translation systems, and autonomous robotic platforms. Key words: Emerging...soldiers in the field, two-way speech translation systems, and autonomous robotic platforms. It has been the foundation for 10 technology evaluations

  7. Using speech recognition to enhance the Tongue Drive System functionality in computer access.

    PubMed

    Huo, Xueliang; Ghovanloo, Maysam

    2011-01-01

    Tongue Drive System (TDS) is a wireless tongue-operated assistive technology (AT), which can enable people with severe physical disabilities to access computers and drive powered wheelchairs using their volitional tongue movements. TDS offers six discrete commands, simultaneously available to the users, for pointing and typing as a substitute for mouse and keyboard in computer access, respectively. To enhance the TDS performance in typing, we have added a microphone, an audio codec, and a wireless audio link to its readily available 3-axial magnetic sensor array, and combined it with commercially available speech recognition software, Dragon NaturallySpeaking, which is regarded as one of the most efficient means of text entry. Our preliminary evaluations indicate that the combined TDS and speech recognition technologies can provide end users with significantly higher performance than using each technology alone, particularly in completing tasks that require both pointing and text entry, such as web surfing.

  8. Spatial resolution dependence on spectral frequency in human speech cortex electrocorticography.

    PubMed

    Muller, Leah; Hamilton, Liberty S; Edwards, Erik; Bouchard, Kristofer E; Chang, Edward F

    2016-10-01

    Electrocorticography (ECoG) has become an important tool in human neuroscience and has tremendous potential for emerging applications in neural interface technology. Electrode array design parameters are outstanding issues for both research and clinical applications, and these parameters depend critically on the nature of the neural signals to be recorded. Here, we investigate the functional spatial resolution of neural signals recorded at the human cortical surface. We empirically derive spatial spread functions to quantify the shared neural activity for each frequency band of the electrocorticogram. Five subjects with high-density (4 mm center-to-center spacing) ECoG grid implants participated in speech perception and production tasks while neural activity was recorded from the speech cortex, including superior temporal gyrus, precentral gyrus, and postcentral gyrus. The cortical surface field potential was decomposed into traditional EEG frequency bands. Signal similarity between electrode pairs for each frequency band was quantified using a Pearson correlation coefficient. The correlation of neural activity between electrode pairs was inversely related to the distance between the electrodes; this relationship was used to quantify spatial falloff functions for cortical subdomains. As expected, lower frequencies remained correlated over larger distances than higher frequencies. However, both the envelope and phase of gamma and high gamma frequencies (30-150 Hz) are largely uncorrelated (<90%) at 4 mm, the smallest spacing of the high-density arrays. Thus, ECoG arrays smaller than 4 mm have significant promise for increasing signal resolution at high frequencies, whereas less additional gain is achieved for lower frequencies. Our findings quantitatively demonstrate the dependence of ECoG spatial resolution on the neural frequency of interest. We demonstrate that this relationship is consistent across patients and across cortical areas during activity.
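
    The falloff analysis described here reduces to two quantities per electrode pair: a Pearson correlation between the signals and the geometric distance between the contacts. A minimal sketch of that computation follows; the grid geometry and synthetic signals are invented purely to exercise the code.

```python
import numpy as np

def spatial_falloff(signals, positions):
    """Pearson correlation for every electrode pair versus the distance between
    them; signals is (electrodes x time), positions is (electrodes x 2) in mm."""
    r = np.corrcoef(signals)
    dists, corrs = [], []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dists.append(np.linalg.norm(positions[i] - positions[j]))
            corrs.append(r[i, j])
    return np.array(dists), np.array(corrs)

# toy 4 x 4 grid at 4 mm pitch; a shared component that weakens across the
# grid makes nearby electrodes more similar than distant ones
rng = np.random.default_rng(2)
pos = np.array([(4.0 * x, 4.0 * y) for x in range(4) for y in range(4)])
common = rng.standard_normal(5000)
sig = np.array([common * np.exp(-np.linalg.norm(p) / 10) +
                rng.standard_normal(5000) for p in pos])
d, c = spatial_falloff(sig, pos)
print("mean correlation at 4 mm spacing:", c[d <= 4.0].mean().round(3))
```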

  9. Spatial resolution dependence on spectral frequency in human speech cortex electrocorticography

    NASA Astrophysics Data System (ADS)

    Muller, Leah; Hamilton, Liberty S.; Edwards, Erik; Bouchard, Kristofer E.; Chang, Edward F.

    2016-10-01

    Objective. Electrocorticography (ECoG) has become an important tool in human neuroscience and has tremendous potential for emerging applications in neural interface technology. Electrode array design parameters are outstanding issues for both research and clinical applications, and these parameters depend critically on the nature of the neural signals to be recorded. Here, we investigate the functional spatial resolution of neural signals recorded at the human cortical surface. We empirically derive spatial spread functions to quantify the shared neural activity for each frequency band of the electrocorticogram. Approach. Five subjects with high-density (4 mm center-to-center spacing) ECoG grid implants participated in speech perception and production tasks while neural activity was recorded from the speech cortex, including superior temporal gyrus, precentral gyrus, and postcentral gyrus. The cortical surface field potential was decomposed into traditional EEG frequency bands. Signal similarity between electrode pairs for each frequency band was quantified using a Pearson correlation coefficient. Main results. The correlation of neural activity between electrode pairs was inversely related to the distance between the electrodes; this relationship was used to quantify spatial falloff functions for cortical subdomains. As expected, lower frequencies remained correlated over larger distances than higher frequencies. However, both the envelope and phase of gamma and high gamma frequencies (30-150 Hz) are largely uncorrelated (<90%) at 4 mm, the smallest spacing of the high-density arrays. Thus, ECoG arrays smaller than 4 mm have significant promise for increasing signal resolution at high frequencies, whereas less additional gain is achieved for lower frequencies. Significance. Our findings quantitatively demonstrate the dependence of ECoG spatial resolution on the neural frequency of interest. We demonstrate that this relationship is consistent across patients and across cortical areas during activity.

  10. Critical Interfaces for Engineers and Scientists, 4 Appraisals. Proceedings of the Annual Joint Meeting of the Engineering Manpower Commission of Engineers Joint Council and the Scientific Manpower Commission, New York, May 18, 1967.

    ERIC Educational Resources Information Center

    Alden, John D.

    Contained in this booklet are the speeches given at the annual joint meeting of the Engineering Manpower Commission and the Scientific Manpower Commission. Each dealt with some problem aspect of the engineer-scientist interface. The presentation by Rear Admiral W. C. Hushing of the U. S. Navy was entitled "The Impact of High Performance Science…

  11. Do What I Say! Voice Recognition Makes Major Advances.

    ERIC Educational Resources Information Center

    Ruley, C. Dorsey

    1994-01-01

    Explains voice recognition technology applications in the workplace, schools, and libraries. Highlights include a voice-controlled work station using the DragonDictate system that can be used with dyslexic students, converting text to speech, and converting speech to text. (LRW)

  12. Speech networks at rest and in action: interactions between functional brain networks controlling speech production.

    PubMed

    Simonyan, Kristina; Fuertinger, Stefan

    2015-04-01

    Speech production is one of the most complex human behaviors. Although brain activation during speaking has been well investigated, our understanding of the interactions between brain regions and neural networks remains limited. We combined seed-based interregional correlation analysis with graph-theoretical analysis of functional MRI data during the resting state and sentence production in healthy subjects to investigate the interface and topology of functional networks originating from the key brain regions controlling speech, i.e., the laryngeal/orofacial motor cortex, inferior frontal and superior temporal gyri, supplementary motor area, cingulate cortex, putamen, and thalamus. During both resting and speaking, the interactions between these networks were bilaterally distributed and centered on the sensorimotor brain regions. However, speech production preferentially recruited the inferior parietal lobule (IPL) and cerebellum into the large-scale network, suggesting the importance of these regions in facilitating the transition from the resting state to speaking. Furthermore, the cerebellum (lobule VI) was the most prominent region showing functional influences on speech-network integration and segregation. Although networks were bilaterally distributed, interregional connectivity during speaking was stronger in the left vs. right hemisphere, which may underlie a more homogeneous overlap between the examined networks in the left hemisphere. Among these, the laryngeal motor cortex (LMC) established a core network that fully overlapped with all other speech-related networks, determining the extent of network interactions. Our data demonstrate complex interactions of large-scale brain networks controlling speech production and point to the critical role of the LMC, IPL, and cerebellum in the formation of the speech production network.
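
    The pipeline named here (interregional correlation feeding a graph-theoretical analysis) can be sketched compactly. The ROI labels, shared-signal toy data, and edge threshold below are all hypothetical, and networkx stands in for whatever graph toolbox was actually used.

```python
import numpy as np
import networkx as nx

rois = ["LMC", "IFG", "STG", "SMA", "cingulate", "putamen", "thalamus", "IPL"]
rng = np.random.default_rng(3)
common = rng.standard_normal(300)                       # shared drive so ROIs correlate
ts = rng.standard_normal((len(rois), 300)) + 0.5 * common

corr = np.corrcoef(ts)                                  # interregional correlation
G = nx.Graph()
for i in range(len(rois)):
    for j in range(i + 1, len(rois)):
        if corr[i, j] > 0.1:                            # hypothetical edge threshold
            G.add_edge(rois[i], rois[j], weight=corr[i, j])

# topology measures of the kind graph-theoretical analyses report
print("degree:", dict(G.degree()))
print("betweenness centrality:", nx.betweenness_centrality(G))
```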

  13. Individual differences in peripheral physiology and implications for the real-time assessment of driver state (phase I & II).

    DOT National Transportation Integrated Search

    2013-05-01

    Cognitively oriented in-vehicle activities (cell-phone calls, speech interfaces, audio translations of text messages, etc.) increasingly place non-visual demands on a driver's attention. While a driver's eyes may remain oriented towards the r...

  14. Advancements in text-to-speech technology and implications for AAC applications

    NASA Astrophysics Data System (ADS)

    Syrdal, Ann K.

    2003-10-01

    Intelligibility was the initial focus in text-to-speech (TTS) research, since it is clearly a necessary condition for the application of the technology. Sufficiently high intelligibility (approximating human speech) has been achieved in the last decade by the better formant-based and concatenative TTS systems. This led to commercially available TTS systems for highly motivated users, particularly the blind and vocally impaired. Some unnatural qualities of TTS were exploited by these users, such as very fast speaking rates and altered pitch ranges for flagging relevant information. Recently, the focus in TTS research has turned to improving naturalness, so that synthetic speech sounds more human and less robotic. Unit selection approaches to concatenative synthesis have dramatically improved TTS quality, although at the cost of larger and more complex systems. This advancement in naturalness has made TTS technology more acceptable to the general public. The vocally impaired appreciate a more natural voice with which to represent themselves when communicating with others. Unit selection TTS does not achieve such high speaking rates as the earlier TTS systems, however, which is a disadvantage to some AAC device users. An important new research emphasis is to improve and increase the range of emotional expressiveness of TTS.

  15. Using the rear projection of the Socibot Desktop robot for creation of applications with facial expressions

    NASA Astrophysics Data System (ADS)

    Gîlcă, G.; Bîzdoacă, N. G.; Diaconu, I.

    2016-08-01

    This article implements practical applications using the Socibot Desktop social robot. We realize three applications: creating a speech sequence using the Kiosk menu of the browser interface, creating a program in the Virtual Robot browser interface, and making a new guise to be loaded into the robot's memory in order to be projected onto its face. The first application is created in the Compose submenu, which contains five file categories (audio, eyes, face, head, and mood) that help in building the projected sequence. The second application is more complex, the completed program containing audio files, speeches (which can be created in over 20 languages), head movements, the robot's facial parameters as a function of each action unit (AU) of the facial muscles, its expressions, and its line of sight. The last application changes the robot's appearance with a guise created by us. The guise was created in Adobe Photoshop and then loaded into the robot's memory.

  16. DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1

    NASA Astrophysics Data System (ADS)

    Garofolo, J. S.; Lamel, L. F.; Fisher, W. M.; Fiscus, J. G.; Pallett, D. S.

    1993-02-01

    The Texas Instruments/Massachusetts Institute of Technology (TIMIT) corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT contains speech from 630 speakers representing 8 major dialect divisions of American English, each speaking 10 phonetically-rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic, and word transcriptions, as well as speech waveform data for each spoken sentence. The release of TIMIT contains several improvements over the Prototype CD-ROM released in December, 1988: (1) full 630-speaker corpus, (2) checked and corrected transcriptions, (3) word-alignment transcriptions, (4) NIST SPHERE-headered waveform files and header manipulation software, (5) phonemic dictionary, (6) new test and training subsets balanced for dialectal and phonetic coverage, and (7) more extensive documentation.
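
    The time-aligned transcriptions mentioned above are plain-text files; the phonetic variant (.PHN) lists a start sample, end sample, and phone label per line at the corpus's 16 kHz rate. A small reader is sketched below; the example path and output values are shown only for illustration.

```python
def read_timit_phn(path, sample_rate=16000):
    """Parse a TIMIT time-aligned phonetic transcription (.PHN): each line is
    'start_sample end_sample phone'. Returns (start_s, end_s, phone) tuples,
    assuming the corpus's documented 16 kHz sampling rate."""
    segments = []
    with open(path) as f:
        for line in f:
            start, end, phone = line.split()
            segments.append((int(start) / sample_rate, int(end) / sample_rate, phone))
    return segments

# e.g., read_timit_phn("TIMIT/TRAIN/DR1/FCJF0/SA1.PHN")[:2]
# -> [(0.0, 0.19, 'h#'), (0.19, 0.24, 'sh')]   (illustrative values)
```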

  17. Multimodal fusion of polynomial classifiers for automatic person recognition

    NASA Astrophysics Data System (ADS)

    Broun, Charles C.; Zhang, Xiaozheng

    2001-03-01

    With the prevalence of the information age, privacy and personalization are forefront in today's society. As such, biometrics are viewed as essential components of current evolving technological systems. Consumers demand unobtrusive and non-invasive approaches. In our previous work, we have demonstrated a speaker verification system that meets these criteria. However, there are additional constraints for fielded systems. The required recognition transactions are often performed in adverse environments and across diverse populations, necessitating robust solutions. There are two significant problem areas in current-generation speaker verification systems. The first is the difficulty in acquiring clean audio signals in all environments without encumbering the user with a head-mounted close-talking microphone. Second, unimodal biometric systems do not work with a significant percentage of the population. To combat these issues, multimodal techniques are being investigated to improve system robustness to environmental conditions, as well as to improve overall accuracy across the population. We propose a multimodal approach that builds on our current state-of-the-art speaker verification technology. In order to maintain the transparent nature of the speech interface, we focus on optical sensing technology to provide the additional modality, giving us an audio-visual person recognition system. For the audio domain, we use our existing speaker verification system. For the visual domain, we focus on lip motion. This is chosen, rather than static face or iris recognition, because it provides dynamic information about the individual. In addition, the lip dynamics can aid speech recognition to provide liveness testing. The visual processing method makes use of both color and edge information, combined within a Markov random field (MRF) framework, to localize the lips. Geometric features are extracted and input to a polynomial classifier for the person recognition process. A late integration approach, based on a probabilistic model, is employed to combine the two modalities. The system is tested on the XM2VTS database combined with AWGN in the audio domain over a range of signal-to-noise ratios.
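
    The late-integration step lends itself to a compact sketch: each modality emits per-class posteriors, and the fusion stage combines them probabilistically. The weighted log-likelihood rule and the weights below are one common choice, assumed here rather than taken from the paper.

```python
import numpy as np

def late_fusion(p_audio, p_visual, w_audio=0.6):
    """Combine per-class posteriors from independent audio and visual
    classifiers with a weighted log-likelihood sum (hypothetical weights),
    then renormalize."""
    log_p = w_audio * np.log(p_audio) + (1 - w_audio) * np.log(p_visual)
    p = np.exp(log_p - log_p.max())
    return p / p.sum()

# toy posteriors over three enrolled speakers from each modality
p_a = np.array([0.70, 0.20, 0.10])   # audio says speaker 0
p_v = np.array([0.40, 0.50, 0.10])   # lips lean toward speaker 1
print("fused:", late_fusion(p_a, p_v).round(3))
```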

  18. Implications of Multilingual Interoperability of Speech Technology for Military Use (Les implications de l’interoperabilite multilingue des technologies vocales pour applications militaires)

    DTIC Science & Technology

    2004-09-01

    Databases 2-2 2.3.1 Translanguage English Database 2-2 2.3.2 Australian National Database of Spoken Language 2-3 2.3.3 Strange Corpus 2-3 2.3.4...some relevance to speech technology research. 2.3.1 Translanguage English Database In a daring plan Joseph Mariani, then at LIMSI-CNRS, proposed to...native speakers. The database is known as the ‘ Translanguage English Database’ but is often referred to as the ‘terrible English database.’ About 28

  19. Real-time interactive speech technology at Threshold Technology, Incorporated

    NASA Technical Reports Server (NTRS)

    Herscher, Marvin B.

    1977-01-01

    Basic real-time isolated-word recognition techniques are reviewed. Industrial applications of voice technology are described in chronological order of their development. Future research efforts are also discussed.

  20. Automatic translation among spoken languages

    NASA Technical Reports Server (NTRS)

    Walter, Sharon M.; Costigan, Kelly

    1994-01-01

    The Machine Aided Voice Translation (MAVT) system was developed in response to the shortage of experienced military field interrogators with both foreign language proficiency and interrogation skills. Combining speech recognition, machine translation, and speech generation technologies, the MAVT accepts an interrogator's spoken English question and translates it into spoken Spanish. The spoken Spanish response of the potential informant can then be translated into spoken English. Potential military and civilian applications for automatic spoken language translation technology are discussed in this paper.
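
    The MAVT chain (speech recognition, then machine translation, then speech generation) can be expressed as three pluggable stages. The skeleton below is illustrative only; the stub engines and sample phrases are invented placeholders, not the MAVT's actual components.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SpeechTranslator:
    """Chain of the three technologies the MAVT combined: speech recognition,
    machine translation, and speech generation, each a pluggable callable."""
    recognize: Callable[[bytes], str]
    translate: Callable[[str], str]
    synthesize: Callable[[str], bytes]

    def __call__(self, audio_in: bytes) -> bytes:
        text = self.recognize(audio_in)      # spoken English question -> text
        translated = self.translate(text)    # English text -> Spanish text
        return self.synthesize(translated)   # Spanish text -> audio

# stub engines for illustration only
mavt_like = SpeechTranslator(
    recognize=lambda audio: "where is the supply depot",
    translate=lambda text: "¿dónde está el depósito de suministros?",
    synthesize=lambda text: text.encode("utf-8"),  # placeholder "waveform"
)
print(mavt_like(b"..."))
```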

  1. Speaking with a mirror: engagement of mirror neurons via choral speech and its derivatives induces stuttering inhibition.

    PubMed

    Kalinowski, Joseph; Saltuklaroglu, Tim

    2003-04-01

    'Choral speech', 'unison speech', or 'imitation speech' has long been known to immediately induce reflexive, spontaneous, and natural-sounding fluency, even in the most severe cases of stuttering. Unlike typical post-therapeutic speech, a hallmark characteristic of choral speech is the sense of 'invulnerability' to stuttering, regardless of phonetic context, situational environment, or audience size. We suggest that choral speech immediately inhibits stuttering by engaging mirror systems of neurons, innate primitive neuronal substrates that dominate the initial phases of language development due to their predisposition to reflexively imitate gestural action sequences in a fluent manner. Since mirror systems are primordial in nature, they take precedence over the much later developing stuttering pathology. We suggest that stuttering may best be ameliorated by reengaging mirror neurons via choral speech or one of its derivatives (using digital signal processing technology) to provide gestural mirrors, which are nature's way of immediately overriding the central stuttering block.

  2. Texting while driving using Google Glass™: Promising but not distraction-free.

    PubMed

    He, Jibo; Choi, William; McCarley, Jason S; Chaparro, Barbara S; Wang, Chun

    2015-08-01

    Texting while driving is risky but common. This study evaluated how texting using a Head-Mounted Display, Google Glass, impacts driving performance. Experienced drivers performed a classic car-following task while using three different interfaces to text: fully manual interaction with a head-down smartphone, vocal interaction with a smartphone, and vocal interaction with Google Glass. Fully manual interaction produced worse driving performance than either of the other interaction methods, leading to more lane excursions, more variable vehicle control, and higher workload. Compared to texting vocally with a smartphone, texting using Google Glass produced fewer lane excursions, more braking responses, and lower workload. All forms of texting impaired driving performance compared to undistracted driving. These results imply that the use of Google Glass for texting impairs driving, but its Head-Mounted Display configuration and speech recognition technology may be safer than texting using a smartphone. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. On the recognition of emotional vocal expressions: motivations for a holistic approach.

    PubMed

    Esposito, Anna; Esposito, Antonietta M

    2012-10-01

    Human beings seem to be able to recognize emotions from speech very well and information communication technology aims to implement machines and agents that can do the same. However, to be able to automatically recognize affective states from speech signals, it is necessary to solve two main technological problems. The former concerns the identification of effective and efficient processing algorithms capable of capturing emotional acoustic features from speech sentences. The latter focuses on finding computational models able to classify, with an approximation as good as human listeners, a given set of emotional states. This paper will survey these topics and provide some insights for a holistic approach to the automatic analysis, recognition and synthesis of affective states.
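
    A minimal sketch of the two technological problems named above — capturing emotional acoustic features and classifying them — assuming the librosa and scikit-learn libraries are available. The feature set and the synthetic tones standing in for labelled emotional speech are illustrative choices, not the paper's method.

    import numpy as np
    import librosa
    from sklearn.svm import SVC

    def emotion_features(y, sr):
        """Coarse acoustic correlates of affect: MFCC statistics plus simple
        prosodic proxies (energy and zero-crossing rate)."""
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        return np.concatenate([
            mfcc.mean(axis=1), mfcc.std(axis=1),
            [librosa.feature.rms(y=y).mean()],
            [librosa.feature.zero_crossing_rate(y).mean()],
        ])

    # Toy training data: synthetic tones standing in for labelled emotional speech.
    sr = 16000
    rng = np.random.default_rng(0)
    X, labels = [], []
    for label, f0 in [("neutral", 120.0), ("excited", 220.0)]:
        for _ in range(10):
            t = np.arange(sr) / sr
            y = np.sin(2 * np.pi * f0 * t) + 0.05 * rng.standard_normal(sr)
            X.append(emotion_features(y.astype(np.float32), sr))
            labels.append(label)

    clf = SVC().fit(np.array(X), labels)  # second stage: a trained classifier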

  4. Selecting cockpit functions for speech I/O technology

    NASA Technical Reports Server (NTRS)

    Simpson, C. A.

    1985-01-01

    A general methodology for the initial selection of functions for speech generation and speech recognition technology is discussed. The SCR (Stimulus/Central-Processing/Response) compatibility model of Wickens et al. (1983) is examined, and its application is demonstrated for a particular cockpit display problem. Some limits of the applicability of that model are illustrated in the context of predicting overall pilot-aircraft system performance. A program of system performance measurement is recommended for the evaluation of candidate systems. It is suggested that no one measure of system performance can necessarily be depended upon to the exclusion of others. System response time, system accuracy, and pilot ratings are all important measures. Finally, these measures must be collected in the context of the total flight task environment.

  5. Hands-free device control using sound picked up in the ear canal

    NASA Astrophysics Data System (ADS)

    Chhatpar, Siddharth R.; Ngia, Lester; Vlach, Chris; Lin, Dong; Birkhimer, Craig; Juneja, Amit; Pruthi, Tarun; Hoffman, Orin; Lewis, Tristan

    2008-04-01

    Hands-free control of unmanned ground vehicles is essential for soldiers, bomb disposal squads, and first responders. Having their hands free for other equipment and tasks allows them to be safer and more mobile. Currently, the most successful hands-free control devices are speech-command based. However, these devices use external microphones, and in field environments, e.g., war zones and fire sites, their performance suffers because of loud ambient noise: typically above 90 dBA. This paper describes the development of technology using the ear as an output source that can provide excellent command recognition accuracy even in noisy environments. Instead of picking up speech radiating from the mouth, this technology detects speech transmitted internally through the ear canal. Discreet tongue movements also create air pressure changes within the ear canal and can be used for stealth control. A patented earpiece was developed with a microphone pointed into the ear canal that captures these signals generated by tongue movements and speech. The signals are transmitted from the earpiece to an Ultra-Mobile Personal Computer (UMPC) through a wired connection. The UMPC processes the signals and utilizes them for device control. The processing can include command recognition, ambient noise cancellation, acoustic echo cancellation, and speech equalization. Successful control of an iRobot PackBot has been demonstrated with both speech (13 discrete commands) and tongue (5 discrete commands) signals. In preliminary tests, command recognition accuracy was 95% with speech control and 85% with tongue control.

  6. 'Fly Like This': Natural Language Interface for UAV Mission Planning

    NASA Technical Reports Server (NTRS)

    Chandarana, Meghan; Meszaros, Erica L.; Trujillo, Anna; Allen, B. Danette

    2017-01-01

    With the increasing presence of unmanned aerial vehicles (UAVs) in everyday environments, the user base of these powerful and potentially intelligent machines is expanding beyond exclusively highly trained vehicle operators to include non-expert system users. Scientists seeking to augment costly and often inflexible methods of data collection historically used are turning towards lower cost and reconfigurable UAVs. These new users require more intuitive and natural methods for UAV mission planning. This paper explores two natural language interfaces - gesture and speech - for UAV flight path generation through individual user studies. Subjects who participated in the user studies also used a mouse-based interface for a baseline comparison. Each interface allowed the user to build flight paths from a library of twelve individual trajectory segments. Individual user studies evaluated performance, efficacy, and ease-of-use of each interface using background surveys, subjective questionnaires, and observations on time and correctness. Analysis indicates that natural language interfaces are promising alternatives to traditional interfaces. The user study data collected on the efficacy and potential of each interface will be used to inform future intuitive UAV interface design for non-expert users.

  7. Hearing aid and hearing assistance technology use in Aotearoa/New Zealand.

    PubMed

    Kelly-Campbell, Rebecca J; Lessoway, Kamea

    2015-05-01

    The purpose of this study was to describe factors related to hearing aid and hearing assistance technology ownership and use in Aotearoa/New Zealand. Adults with hearing impairment living in New Zealand were surveyed regarding health-related quality of life and device usage. Audiometric data (hearing sensitivity and speech in noise) were collected. Data were obtained from 123 adults with hearing impairment: 73 reported current hearing-aid use and 81 reported current hearing assistance technology use. In both analyses, device users had more difficulty understanding speech in background noise, had poorer hearing in both their better and worse ears, and perceived more consequences of hearing impairment in their everyday lives (both emotionally and socially) than non-users. Discriminant analyses showed that the social consequences of hearing impairment and better-ear hearing best distinguished hearing aid users from non-users, whereas social consequences and worse-ear hearing best distinguished hearing assistance technology users from non-users. Quality-of-life measurements and speech-in-noise assessments provide useful clinical information. Hearing-impaired adults in New Zealand who use hearing aids also tend to use hearing assistance technology, which has important clinical implications.

  8. Strategies for distant speech recognition in reverberant environments

    NASA Astrophysics Data System (ADS)

    Delcroix, Marc; Yoshioka, Takuya; Ogawa, Atsunori; Kubo, Yotaro; Fujimoto, Masakiyo; Ito, Nobutaka; Kinoshita, Keisuke; Espi, Miquel; Araki, Shoko; Hori, Takaaki; Nakatani, Tomohiro

    2015-12-01

    Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. In this paper, we review a recognition system that we developed at our laboratory to deal with reverberant speech. The system consists of a speech enhancement (SE) front-end that employs long-term linear prediction-based dereverberation followed by noise reduction. We combine our SE front-end with an ASR back-end that uses neural networks for acoustic and language modeling. The proposed system achieved top scores on the ASR task of the REVERB challenge. This paper describes the different technologies used in our system and presents detailed experimental results that justify our implementation choices and may provide hints for designing distant ASR systems.
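
    The front-end's key idea, long-term linear-prediction dereverberation, can be sketched in its simplest single-channel, time-domain form: late reverberation is predicted from samples at least a fixed delay in the past and subtracted, preserving the direct sound and early reflections. The actual system operates on multi-channel STFT coefficients; the parameter values below are illustrative only.

    import numpy as np

    def dlp_dereverb(x, order=20, delay=8):
        """Single-channel delayed linear prediction dereverberation (sketch).

        Late reverberation at time t is modelled as a linear combination of
        samples at least `delay` steps in the past; the prediction is
        subtracted, leaving the direct sound largely intact.
        """
        t0 = delay + order - 1
        y = x[t0:]                                  # prediction targets
        # Regression matrix of delayed past samples.
        X = np.stack([x[t0 - delay - k : len(x) - delay - k]
                      for k in range(order)], axis=1)
        g, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares prediction filter
        out = x.copy()
        out[t0:] = y - X @ g                        # subtract predicted late reverb
        return out

    # Example on a synthetic reverberant signal.
    rng = np.random.default_rng(1)
    clean = rng.standard_normal(4000)
    rir = np.r_[1.0, np.zeros(9), 0.5 * 0.9 ** np.arange(60)]  # toy room response
    reverberant = np.convolve(clean, rir)[:4000]
    enhanced = dlp_dereverb(reverberant)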

  9. Real-time speech gisting for ATC applications

    NASA Astrophysics Data System (ADS)

    Dunkelberger, Kirk A.

    1995-06-01

    Command and control within the ATC environment remains primarily voice-based. Hence, automatic real-time, speaker-independent, continuous speech recognition (CSR) has many obvious applications and implied benefits to the ATC community: automated target tagging, aircraft compliance monitoring, controller training, automatic alarm disabling, display management, and many others. However, while current state-of-the-art CSR systems provide upwards of 98% word accuracy in laboratory environments, recent low-intrusion experiments in ATCT environments demonstrated less than 70% word accuracy in spite of significant investments in recognizer tuning. Acoustic channel irregularities and the vagaries of controller/pilot grammar impact current CSR algorithms at their weakest points. It will be shown herein, however, that real-time context- and environment-sensitive gisting can provide key command phrase recognition rates of greater than 95% using the same low-intrusion approach. The combination of real-time inexact syntactic pattern recognition techniques and a tight integration of CSR, gisting, and ATC database accessor system components is the key to these high phrase recognition rates. A system concept for real-time gisting in the ATC context is presented herein. After establishing an application context, the discussion presents a minimal CSR technology context, then focuses on the gisting mechanism, desirable interfaces into the ATCT database environment, and data and control flow within the prototype system. Results of recent tests for a subset of the functionality are presented together with suggestions for further research.
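
    Gisting of the kind described can be approximated, at its crudest, by matching command-phrase patterns against an errorful transcript. The toy grammar below is an invented illustration, not the paper's inexact syntactic pattern recogniser.

    import re

    # Toy gisting grammar for a few ATC clearance phrases; the patterns and
    # phrase inventory are illustrative only.
    CALLSIGN = r"(?P<callsign>[a-z]+ \d+)"
    PATTERNS = {
        "takeoff_clearance": re.compile(
            CALLSIGN + r".*cleared for takeoff.*runway (?P<runway>\d+[lrc]?)"),
        "hold_short": re.compile(
            CALLSIGN + r".*hold short.*runway (?P<runway>\d+[lrc]?)"),
    }

    def gist(transcript):
        """Extract key command phrases from a (possibly errorful) ASR transcript."""
        hits = []
        for name, pat in PATTERNS.items():
            m = pat.search(transcript.lower())
            if m:
                hits.append((name, m.groupdict()))
        return hits

    print(gist("United 451 wind calm cleared for takeoff runway 27L"))
    # [('takeoff_clearance', {'callsign': 'united 451', 'runway': '27l'})]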

  10. Improved Open-Microphone Speech Recognition

    NASA Astrophysics Data System (ADS)

    Abrash, Victor

    2002-12-01

    Many current and future NASA missions make extreme demands on mission personnel both in terms of work load and in performing under difficult environmental conditions. In situations where hands are impeded or needed for other tasks, eyes are busy attending to the environment, or tasks are sufficiently complex that ease of use of the interface becomes critical, spoken natural language dialog systems offer unique input and output modalities that can improve efficiency and safety. They also offer new capabilities that would not otherwise be available. For example, many NASA applications require astronauts to use computers in micro-gravity or while wearing space suits. Under these circumstances, command and control systems that allow users to issue commands or enter data in hands-and eyes-busy situations become critical. Speech recognition technology designed for current commercial applications limits the performance of the open-ended state-of-the-art dialog systems being developed at NASA. For example, today's recognition systems typically listen to user input only during short segments of the dialog, and user input outside of these short time windows is lost. Mistakes detecting the start and end times of user utterances can lead to mistakes in the recognition output, and the dialog system as a whole has no way to recover from this, or any other, recognition error. Systems also often require the user to signal when that user is going to speak, which is impractical in a hands-free environment, or only allow a system-initiated dialog requiring the user to speak immediately following a system prompt. In this project, SRI has developed software to enable speech recognition in a hands-free, open-microphone environment, eliminating the need for a push-to-talk button or other signaling mechanism. The software continuously captures a user's speech and makes it available to one or more recognizers. By constantly monitoring and storing the audio stream, it provides the spoken dialog manager extra flexibility to recognize the signal with no audio gaps between recognition requests, as well as to rerecognize portions of the signal, or to rerecognize speech with different grammars, acoustic models, recognizers, start times, and so on. SRI expects that this new open-mic functionality will enable NASA to develop better error-correction mechanisms for spoken dialog systems, and may also enable new interaction strategies.
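
    The core mechanism — continuously retaining the audio stream so that any stretch of it can be recognized or re-recognized with different grammars or start times — can be sketched with a bounded ring buffer of frames. This is an illustrative sketch, not SRI's implementation; the class name and capacities are assumptions.

    import collections
    import numpy as np

    class OpenMicBuffer:
        """Continuously retains recent audio so a dialog manager can
        (re)recognize any stretch of it without gaps.

        A bounded deque of fixed-size frames approximates the rolling store.
        """
        def __init__(self, sample_rate=16000, frame_len=160, seconds=60):
            self.frame_len = frame_len
            self.frames = collections.deque(
                maxlen=seconds * sample_rate // frame_len)
            self.next_index = 0                  # absolute index of next sample

        def push(self, frame):
            assert len(frame) == self.frame_len
            self.frames.append(np.asarray(frame))
            self.next_index += self.frame_len

        def slice(self, start_sample, end_sample):
            """Return audio for [start_sample, end_sample), if still buffered."""
            first_buffered = self.next_index - len(self.frames) * self.frame_len
            if start_sample < first_buffered:
                raise KeyError("requested audio has aged out of the buffer")
            audio = np.concatenate(list(self.frames))
            off = start_sample - first_buffered
            return audio[off : off + (end_sample - start_sample)]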

  12. Evaluation of speech recognizers for use in advanced combat helicopter crew station research and development

    NASA Technical Reports Server (NTRS)

    Simpson, Carol A.

    1990-01-01

    The U.S. Army Crew Station Research and Development Facility uses vintage 1984 speech recognizers. An evaluation was performed of newer off-the-shelf speech recognition devices to determine whether newer technology performance and capabilities are substantially better than those of the Army's current speech recognizers. The Phonetic Discrimination (PD-100) Test was used to compare recognizer performance in two ambient noise conditions: quiet office and helicopter noise. Test tokens were spoken by males and females in both isolated-word and connected-word mode. Better overall recognition accuracy was obtained from the newer recognizers. Recognizer capabilities needed to support the development of human factors design requirements for speech command systems in advanced combat helicopters are listed.

  13. Acoustic Event Detection and Classification

    NASA Astrophysics Data System (ADS)

    Temko, Andrey; Nadeu, Climent; Macho, Dušan; Malkin, Robert; Zieger, Christian; Omologo, Maurizio

    The human activity that takes place in meeting rooms or classrooms is reflected in a rich variety of acoustic events (AEs), produced either by the human body or by objects handled by humans, so determining both the identity of sounds and their position in time may help to detect and describe that human activity. Indeed, speech is usually the most informative sound, but other kinds of AEs may also carry useful information: for example, clapping or laughing during a speech, a strong yawn in the middle of a lecture, or a chair moving or a door slamming when the meeting has just started. Additionally, detection and classification of sounds other than speech may be useful to enhance the robustness of speech technologies like automatic speech recognition.

  14. Scaling and universality in the human voice.

    PubMed

    Luque, Jordi; Luque, Bartolo; Lacasa, Lucas

    2015-04-06

    Speech is a distinctive complex feature of human capabilities. In order to understand the physics underlying speech production, in this work we empirically analyse the statistics of large human speech datasets spanning several languages. We first show that during speech the energy is unevenly released and power-law distributed, reporting a universal, robust Gutenberg-Richter-like law in speech. We further show that such 'earthquakes in speech' show temporal correlations, as the interevent statistics are again power-law distributed. As this feature takes place in the intraphoneme range, we conjecture that the process responsible for this complex phenomenon is not cognitive, but resides in the physiological (mechanical) mechanisms of speech production. Moreover, we show that these waiting-time distributions are scale invariant under a renormalization group transformation, suggesting that the process of speech generation is indeed operating close to a critical point. These results are contrasted with current paradigms in speech processing, which point towards low-dimensional deterministic chaos as the origin of nonlinear traits in speech fluctuations. As these latter fluctuations are indeed the aspects that humanize synthetic speech, these findings may have an impact on future speech synthesis technologies. Results are robust and independent of the communication language or the number of speakers, pointing towards a universal pattern and yet another hint of complexity in human speech. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
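
    Operationally, the Gutenberg-Richter-like claim reduces to fitting a power-law exponent to energy releases (or interevent times). A standard maximum-likelihood estimator, sketched here on synthetic data rather than the paper's speech corpora:

    import numpy as np

    def powerlaw_alpha(x, xmin):
        """Maximum-likelihood exponent for a continuous power law
        p(x) ~ x**(-alpha), x >= xmin (the standard Hill/Clauset estimator)."""
        x = np.asarray(x, dtype=float)
        tail = x[x >= xmin]
        return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

    # Sanity check on synthetic power-law data with alpha = 2.5.
    rng = np.random.default_rng(0)
    u = rng.random(100000)
    samples = (1.0 - u) ** (-1.0 / (2.5 - 1.0))   # inverse-CDF sampling, xmin = 1
    print(powerlaw_alpha(samples, xmin=1.0))       # ~2.5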

  15. Secure telemonitoring system for delivering telerehabilitation therapy to enhance children's communication function to home.

    PubMed

    Parmanto, Bambang; Saptono, Andi; Murthi, Raymond; Safos, Charlotte; Lathan, Corinna E

    2008-11-01

    A secure telemonitoring system was developed to transform the CosmoBot system, a stand-alone speech-language therapy application, into a telerehabilitation system. The CosmoBot system is a motivating, computer-based play character designed to enhance children's communication skills and stimulate verbal interaction during the remediation of speech and language disorders. The CosmoBot system consists of the Mission Control human interface device and Cosmo's Play and Learn software featuring a robot character named Cosmo that targets educational goals for children aged 3-5 years. The secure telemonitoring infrastructure links a distant speech-language therapist and child/parents in home or school settings. The result is a telerehabilitation system that allows a speech-language therapist to monitor children's activities at home while providing feedback and therapy materials remotely. We have developed the means for telerehabilitation of communication skills that can be implemented in children's home settings. The architecture allows the therapist to remotely monitor the children after completion of the therapy session and to provide feedback for the following session.

  16. On the context-dependent nature of the contribution of the ventral premotor cortex to speech perception

    PubMed Central

    Tremblay, Pascale; Small, Steven L.

    2011-01-01

    What is the nature of the interface between speech perception and production, where auditory and motor representations converge? One set of explanations suggests that during perception, the motor circuits involved in producing a perceived action are in some way enacting the action without actually causing movement (covert simulation) or sending along the motor information to be used to predict its sensory consequences (i.e., efference copy). Other accounts either reject entirely the involvement of motor representations in perception, or explain their role as being more supportive than integral, and not employing the identical circuits used in production. Using fMRI, we investigated whether there are brain regions that are conjointly active for both speech perception and production, and whether these regions are sensitive to articulatory (syllabic) complexity during both processes, which is predicted by a covert simulation account. A group of healthy young adults (1) observed a female speaker produce a set of familiar words (perception), and (2) observed and then repeated the words (production). There were two types of words, varying in articulatory complexity, as measured by the presence or absence of consonant clusters. The simple words contained no consonant cluster (e.g. “palace”), while the complex words contained one to three consonant clusters (e.g. “planet”). Results indicate that the left ventral premotor cortex (PMv) was significantly active during speech perception and speech production but that activation in this region was scaled to articulatory complexity only during speech production, revealing an incompletely specified efferent motor signal during speech perception. The right planum temporale (PT) was also active during speech perception and speech production, and activation in this region was scaled to articulatory complexity during both production and perception. These findings are discussed in the context of current theories of speech perception, with particular attention to accounts that include an explanatory role for mirror neurons. PMID:21664275

  17. Speech transport for packet telephony and voice over IP

    NASA Astrophysics Data System (ADS)

    Baker, Maurice R.

    1999-11-01

    Recent advances in packet switching, internetworking, and digital signal processing technologies have converged to allow realizable practical implementations of packet telephony systems. This paper provides a tutorial on transmission engineering for packet telephony covering the topics of speech coding/decoding, speech packetization, packet data network transport, and impairments which may negatively impact end-to-end system quality. Particular emphasis is placed upon Voice over Internet Protocol given the current popularity and ubiquity of IP transport.
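
    Speech packetization, one of the tutorial's topics, is concrete enough to show: each coded speech frame is wrapped in a 12-byte RTP header (RFC 3550) before IP transport. A minimal sketch, assuming G.711 u-law at 8 kHz with 20 ms frames:

    import struct

    def rtp_packet(payload, seq, timestamp, ssrc, payload_type=0):
        """Build a minimal RTP header (RFC 3550) around one speech frame.

        Version 2, no padding/extension/CSRC; payload_type 0 is PCMU
        (G.711 u-law).
        """
        v_p_x_cc = 2 << 6                      # version=2, P=0, X=0, CC=0
        m_pt = payload_type & 0x7F             # marker=0
        header = struct.pack("!BBHII", v_p_x_cc, m_pt, seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc)
        return header + payload

    # 20 ms of G.711 at 8 kHz = 160 payload bytes per packet.
    frame = bytes(160)
    pkt = rtp_packet(frame, seq=1, timestamp=160, ssrc=0x1234ABCD)
    print(len(pkt))  # 172 bytes: 12-byte header + 160-byte payload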

  18. Study of Man-Machine Communications Systems for the Handicapped. Interim Report.

    ERIC Educational Resources Information Center

    Kafafian, Haig

    Newly developed communications systems for exceptional children include Cybercom; CYBERTYPE; Cyberplace, a keyless keyboard; Cyberphone, a telephonic communication system for deaf and speech impaired persons; Cyberlamp, a visual display; Cyberview, a fiber optic bundle remote visual display; Cybersem, an interface for the blind, fingerless, and…

  19. A keyword spotting model using perceptually significant energy features

    NASA Astrophysics Data System (ADS)

    Umakanthan, Padmalochini

    The task of a keyword recognition system is to detect the presence of certain words in a conversation based on the linguistic information present in human speech. Such keyword spotting systems have applications in homeland security, telephone surveillance and human-computer interfacing. The general procedure of a keyword spotting system involves feature generation and matching. In this work, a new set of features based on the psycho-acoustic masking nature of human speech is proposed. After developing these features, a time-aligned pattern-matching process was implemented to locate the keywords in a set of unknown words. A word boundary detection technique based on frame classification using the nonlinear characteristics of speech is also addressed in this work. Validation of this keyword spotting model was done using widely used cepstral features. The experimental results indicate the viability of using these perceptually significant features as an augmented feature set in keyword spotting.
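
    The time-aligned pattern-matching step can be sketched with classic dynamic time warping over feature sequences. The features here are generic vectors standing in for the thesis's perceptually significant energy features, and the window/threshold values are illustrative.

    import numpy as np

    def dtw_distance(template, query):
        """Dynamic time warping distance between two feature sequences
        (frames x dims), the classic time-aligned matching step."""
        n, m = len(template), len(query)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(template[i - 1] - query[j - 1])
                cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1],
                                     cost[i - 1, j - 1])
        return cost[n, m] / (n + m)              # length-normalized

    def spot_keyword(template, utterance, window, step=5, threshold=1.0):
        """Slide a window over the utterance; report likely keyword locations."""
        hits = []
        for start in range(0, len(utterance) - window + 1, step):
            if dtw_distance(template, utterance[start:start + window]) < threshold:
                hits.append(start)
        return hits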

  20. Advances in EPG for treatment and research: an illustrative case study.

    PubMed

    Scobbie, James M; Wood, Sara E; Wrench, Alan A

    2004-01-01

    Electropalatography (EPG), a technique which reveals tongue-palate contact patterns over time, is a highly effective tool for speech research. We report here on recent developments by Articulate Instruments Ltd. These include hardware for Windows-based computers, backwardly compatible (with Reading EPG3) software systems for clinical intervention and laboratory-based analysis of EPG and acoustic data, and an enhanced clinical interface with client and file management tools. We focus here on a single case study of a child aged 10½ years who had been diagnosed with an intractable speech disorder, possibly resulting ultimately from a complete cleft of the hard and soft palate. We illustrate how assessment, diagnosis and treatment of the intractable speech disorder are undertaken using this new generation of instrumental phonetic support. We also look forward to future developments in articulatory phonetics that will link EPG with ultrasound for the research and clinical communities.

  1. What makes an automated teller machine usable by blind users?

    PubMed

    Manzke, J M; Egan, D H; Felix, D; Krueger, H

    1998-07-01

    Fifteen blind subjects and sighted subjects, who served as a control group for acceptance, were asked for their requirements for automated teller machines (ATMs). Both groups also tested the usability of a partially operational ATM mock-up. This machine was based on an existing cash dispenser, providing natural speech output, different function menus and different key arrangements. Performance and subjective evaluation data of blind and sighted subjects were collected. All blind subjects were able to operate the ATM successfully. The implemented speech output was the main usability factor for them. The different interface designs did not significantly affect performance and subjective evaluation. Nevertheless, design recommendations can be derived from the requirement assessment. The sighted subjects were rather open to design modifications, especially the implementation of speech output. However, there was also a mismatch between the requirements of the two subject groups, mainly concerning the key arrangement.

  2. Opening Statements and Speeches. Plenary Session. Papers.

    ERIC Educational Resources Information Center

    International Federation of Library Associations, The Hague (Netherlands).

    Official opening statements, organizational reports, and papers on libraries in a technological world, which were presented at the 1983 International Federation of Library Associations (IFLA) conference include: (1) welcoming addresses by Franz Georg Kaltwasser and Mathilde Berghofer-Weichner; (2) opening speeches by Else Granheim (IFLA president)…

  3. Using Telerehabilitation to Assess Apraxia of Speech in Adults

    ERIC Educational Resources Information Center

    Hill, Anne Jane; Theodoros, Deborah; Russell, Trevor; Ward, Elizabeth

    2009-01-01

    Background: Telerehabilitation is the remote delivery of rehabilitation services via information technology and telecommunication systems. There have been a number of studies that have used videoconferencing to assess speech and language skills in people with acquired neurogenic communication disorders. However, few studies have focused on cases…

  4. Perception and performance in flight simulators: The contribution of vestibular, visual, and auditory information

    NASA Technical Reports Server (NTRS)

    1979-01-01

    The pilot's perception and performance in flight simulators is examined. The areas investigated include: vestibular stimulation, flight management and man-cockpit information interfacing, and visual perception in flight simulation. The effects of higher levels of rotary acceleration on response time to constant acceleration, tracking performance, and thresholds for angular acceleration are examined. Areas of flight management examined are cockpit display of traffic information, workload, synthetic speech callouts during the landing phase of flight, perceptual factors in the use of a microwave landing system, automatic speech recognition, automation of aircraft operation, and total simulation of flight training.

  5. Using Speech Recognition to Enhance the Tongue Drive System Functionality in Computer Access

    PubMed Central

    Huo, Xueliang; Ghovanloo, Maysam

    2013-01-01

    Tongue Drive System (TDS) is a wireless tongue operated assistive technology (AT), which can enable people with severe physical disabilities to access computers and drive powered wheelchairs using their volitional tongue movements. TDS offers six discrete commands, simultaneously available to the users, for pointing and typing as a substitute for mouse and keyboard in computer access, respectively. To enhance the TDS performance in typing, we have added a microphone, an audio codec, and a wireless audio link to its readily available 3-axial magnetic sensor array, and combined it with a commercially available speech recognition software, the Dragon Naturally Speaking, which is regarded as one of the most efficient ways for text entry. Our preliminary evaluations indicate that the combined TDS and speech recognition technologies can provide end users with significantly higher performance than using each technology alone, particularly in completing tasks that require both pointing and text entry, such as web surfing. PMID:22255801

  6. A phone-assistive device based on Bluetooth technology for cochlear implant users.

    PubMed

    Qian, Haifeng; Loizou, Philipos C; Dorman, Michael F

    2003-09-01

    Hearing-impaired people, and particularly hearing-aid and cochlear-implant users, often have difficulty communicating over the telephone. The intelligibility of telephone speech is considerably lower than the intelligibility of face-to-face speech. This is partly because of lack of visual cues, limited telephone bandwidth, and background noise. In addition, cellphones may cause interference with the hearing aid or cochlear implant. To address these problems that hearing-impaired people experience with telephones, this paper proposes a wireless phone adapter that can be used to route the audio signal directly to the hearing aid or cochlear implant processor. This adapter is based on Bluetooth technology. The favorable features of this new wireless technology make the adapter superior to traditional assistive listening devices. A hardware prototype was built and software programs were written to implement the headset profile in the Bluetooth specification. Three cochlear implant users were tested with the proposed phone-adapter and reported good speech quality.

  7. Clinicians' perspectives of therapeutic alliance in face-to-face and telepractice speech-language pathology sessions.

    PubMed

    Freckmann, Anneka; Hines, Monique; Lincoln, Michelle

    2017-06-01

    To investigate the face validity of a measure of therapeutic alliance for paediatric speech-language pathology and to determine whether a difference exists in therapeutic alliance reported by speech-language pathologists (SLPs) conducting face-to-face sessions compared with telepractice SLPs, or in their ratings of confidence with technology. SLPs conducting telepractice (n = 14) or face-to-face therapy (n = 18) completed an online survey which included the Therapeutic Alliance Scales for Children - Revised (TASC-r) (Therapist Form) to rate clinicians' perceptions of rapport with up to three clients. Participants also reported their overall perception of rapport with each client and their comfort with technology. There was a strong correlation between TASC-r total scores and overall ratings of rapport, providing preliminary evidence of TASC-r face validity. There was no significant difference between TASC-r scores for telepractice and face-to-face therapy (p = 0.961), nor between face-to-face and telepractice SLPs' confidence with familiar (p = 0.414) or unfamiliar (p = 0.780) technology. The TASC-r may be a promising tool for measuring therapeutic alliance in speech-language pathology. Telepractice does not appear to have a negative effect on rapport between SLPs and paediatric clients. Future research is required to identify how SLPs develop rapport in telepractice.

  8. A Human Machine Interface for EVA

    NASA Astrophysics Data System (ADS)

    Hartmann, L.

    EVA astronauts work in a challenging environment that includes a high rate of muscle fatigue, haptic and proprioception impairment, lack of dexterity and interaction with robotic equipment. Currently, they are heavily dependent on support from on-board crew and ground station staff for information and robotics operation. They are limited to the operation of simple controls on the suit exterior and external robot controls that are difficult to operate because of the heavy gloves that are part of the EVA suit. A wearable human machine interface (HMI) inside the suit provides a powerful alternative for robot teleoperation, procedure checklist access, generic equipment operation via virtual control panels and general information retrieval and presentation. The HMI proposed here includes speech input and output, a simple 6 degree of freedom (dof) pointing device and a head-up display (HUD). The essential characteristic of this interface is that it offers an alternative to the standard keyboard and mouse interface of a desktop computer. The astronaut's speech is used as input to command mode changes, execute arbitrary computer commands and generate text. The HMI can also respond with speech to confirm selections, provide status and feedback and present text output. A candidate 6 dof pointing device is Measurand's Shapetape, a flexible "tape" substrate to which is attached an optic fiber with embedded sensors. Measurement of the modulation of the light passing through the fiber can be used to compute the shape of the tape and, in particular, the position and orientation of the end of the Shapetape. It can be used to provide any kind of 3D geometric information including robot teleoperation control. The HUD can overlay graphical information onto the astronaut's visual field including robot joint torques, end effector configuration, procedure checklists and virtual control panels. With suitable tracking information about the position and orientation of the EVA suit, the overlaid graphical information can be registered with the external world. For example, information about an object can be positioned on or beside the object. This wearable HMI supports many applications during EVA including robot teleoperation, procedure checklist usage, operation of virtual control panels and general information or documentation retrieval and presentation. Whether the robot end effector is a mobile platform for the EVA astronaut or is an assistant to the astronaut in an assembly or repair task, the astronaut can control the robot via a direct manipulation interface. Embedded in the suit or the astronaut's clothing, Shapetape can measure the user's arm/hand position and orientation which can be directly mapped into the workspace coordinate system of the robot. Motion of the user's hand can generate corresponding motion of the robot end effector in order to reposition the EVA platform or to manipulate objects in the robot's grasp. Speech input can be used to execute commands and mode changes without the astronaut having to withdraw from the teleoperation task. Speech output from the system can provide feedback without affecting the user's visual attention. The procedure checklist guiding the astronaut's detailed activities can be presented on the HUD and manipulated (e.g., move, scale, annotate, mark tasks as done, consult prerequisite tasks) by spoken command.
Virtual control panels for suit equipment, equipment being repaired or arbitrary equipment on the space station can be displayed on the HUD and can be operated by speech commands or by hand gestures. For example, an antenna being repaired could be pointed under the control of the EVA astronaut. Additionally arbitrary computer activities such as information retrieval and presentation can be carried out using similar interface techniques. Considering the risks, expense and physical challenges of EVA work, it is appropriate that EVA astronauts have considerable support from station crew and ground station staff. Reducing their dependence on such personnel may under many circumstances, however, improve performance and reduce risk. For example, the EVA astronaut is likely to have the best viewpoint at a robotic worksite. Direct access to the procedure checklist can help provide temporal context and continuity throughout an EVA. Access to station facilities through an HMI such as the one described here could be invaluable during an emergency or in a situation in which a fault occurs. The full paper will describe the HMI operation and applications in the EVA context in more detail and will describe current laboratory prototyping activities.

  9. Difficulty understanding speech in noise by the hearing impaired: underlying causes and technological solutions.

    PubMed

    Healy, Eric W; Yoho, Sarah E

    2016-08-01

    A primary complaint of hearing-impaired individuals involves poor speech understanding when background noise is present. Hearing aids and cochlear implants often allow good speech understanding in quiet backgrounds. But hearing-impaired individuals are highly noise intolerant, and existing devices are not very effective at combating background noise. As a result, speech understanding in noise is often quite poor. In accord with the significance of the problem, considerable effort has been expended toward understanding and remedying this issue. Fortunately, our understanding of the underlying issues is reasonably good. In sharp contrast, effective solutions have remained elusive. One solution that seems promising involves a single-microphone machine-learning algorithm to extract speech from background noise. Data from our group indicate that the algorithm is capable of producing vast increases in speech understanding by hearing-impaired individuals. This paper will first provide an overview of the speech-in-noise problem and outline why hearing-impaired individuals are so noise intolerant. An overview of our approach to solving this problem will follow.
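
    The single-microphone machine-learning approach mentioned is, at its core, time-frequency masking: estimate a mask over the noisy spectrogram and resynthesize. In this sketch an oracle ideal ratio mask, computed from a known clean/noise split, stands in for the group's trained estimator; signals and parameters are illustrative only.

    import numpy as np
    from scipy.signal import stft, istft

    def apply_mask_enhancement(noisy, mask_fn, fs=16000, nperseg=512):
        """Skeleton of single-microphone time-frequency masking: estimate a
        mask over the noisy STFT, apply it, and resynthesize.

        `mask_fn` stands in for a trained mask estimator (e.g., a DNN
        predicting an ideal ratio mask); here we pass an oracle.
        """
        f, t, Y = stft(noisy, fs=fs, nperseg=nperseg)
        mask = mask_fn(Y)                      # values in [0, 1] per T-F unit
        _, enhanced = istft(Y * mask, fs=fs, nperseg=nperseg)
        return enhanced

    def oracle_irm(clean, noise, fs=16000, nperseg=512):
        """Ideal ratio mask from the (normally unknown) clean/noise split."""
        _, _, S = stft(clean, fs=fs, nperseg=nperseg)
        _, _, N = stft(noise, fs=fs, nperseg=nperseg)
        return np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-12)

    rng = np.random.default_rng(0)
    clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
    noise = 0.8 * rng.standard_normal(16000)
    enhanced = apply_mask_enhancement(clean + noise,
                                      lambda Y: oracle_irm(clean, noise))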

  10. A dissociation of objective and subjective workload measures in assessing the impact of speech controls in advanced helicopters

    NASA Technical Reports Server (NTRS)

    Vidulich, Michael A.; Bortolussi, Michael R.

    1988-01-01

    Among the new technologies that are expected to aid helicopter designers are speech controls. Proponents suggest that speech controls could reduce the potential for manual control overloads and improve time-sharing performance in environments that have heavy demands for manual control. This was tested in a simulation of an advanced single-pilot, scout/attack helicopter. Objective performance indicated that the speech controls were effective in decreasing the interference of discrete responses during moments of heavy flight control activity. However, subjective ratings indicated that the use of speech controls required extra effort to speak precisely and to attend to feedback. Although the operational reliability of speech controls must be improved, the present results indicate that reliable speech controls could enhance the time-sharing efficiency of helicopter pilots. Furthermore, the results demonstrated the importance of using multiple assessment techniques to completely assess a task. Neither the objective nor the subjective measures alone provided complete information. It was the contrast between the measures that was most informative.

  11. [Creating a language model of the forensic medicine domain for developing an autopsy recording system using automatic speech recognition].

    PubMed

    Niijima, H; Ito, N; Ogino, S; Takatori, T; Iwase, H; Kobayashi, M

    2000-11-01

    To put speech recognition technology to practical use for recording forensic autopsies, a language model specialized for forensic autopsy was developed. A 3-gram (trigram) language model was created and combined with a Hidden Markov Model-based acoustic model for Japanese speech recognition to customize the recognition engine for forensic autopsy. A forensic vocabulary of over 10,000 words was compiled and some 300,000 sentence patterns were generated to build the forensic language model, which was then mixed with a general language model to attain high accuracy. When tested by dictating autopsy findings, the system achieved a recognition rate of about 95%, which appears to reach practical usability as far as the recognition software is concerned, although there remains room for improvement in the hardware and application-layer software.
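
    The mixing of a domain 3-gram model with a general model is usually plain linear interpolation. A sketch with invented toy probabilities and an illustrative interpolation weight:

    # Linear interpolation of a domain 3-gram model with a general model, the
    # usual way a specialized corpus (here, forensic autopsy text) is mixed
    # with broad-coverage data. Probabilities and lambda are illustrative.

    def mixed_trigram_prob(w, history, p_domain, p_general, lam=0.7):
        """P(w | history) = lam * P_domain + (1 - lam) * P_general."""
        return lam * p_domain(w, history) + (1.0 - lam) * p_general(w, history)

    # Toy component models (stand-ins for trained 3-gram estimates).
    def p_domain(w, history):
        table = {(("subcutaneous", "hemorrhage"), "observed"): 0.12}
        return table.get((history, w), 1e-6)

    def p_general(w, history):
        return 1e-4  # flat placeholder

    print(mixed_trigram_prob("observed", ("subcutaneous", "hemorrhage"),
                             p_domain, p_general))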

  12. Advances in natural language processing.

    PubMed

    Hirschberg, Julia; Manning, Christopher D

    2015-07-17

    Natural language processing employs computational techniques for the purpose of learning, understanding, and producing human language content. Early computational approaches to language research focused on automating the analysis of the linguistic structure of language and developing basic technologies such as machine translation, speech recognition, and speech synthesis. Today's researchers refine and make use of such tools in real-world applications, creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance, and identifying sentiment and emotion toward products and services. We describe successes and challenges in this rapidly advancing area. Copyright © 2015, American Association for the Advancement of Science.

  13. Syntactic error modeling and scoring normalization in speech recognition

    NASA Technical Reports Server (NTRS)

    Olorenshaw, Lex

    1991-01-01

    The objective was to develop the speech recognition system to be able to detect speech which is pronounced incorrectly, given that the text of the spoken speech is known to the recognizer. Research was performed in the following areas: (1) syntactic error modeling; (2) score normalization; and (3) phoneme error modeling. The study into the types of errors that a reader makes will provide the basis for creating tests which will approximate the use of the system in the real world. NASA-Johnson will develop this technology into a 'Literacy Tutor' in order to bring innovative concepts to the task of teaching adults to read.

  14. Legal Issues and Computer Use by School-Based Audiologists and Speech-Language Pathologists.

    ERIC Educational Resources Information Center

    Wynne, Michael K.; Hurst, David S.

    1995-01-01

    This article reviews ethical and legal issues regarding school-based integration and application of technologies, particularly when used by speech-language pathologists and audiologists. Four issues are addressed: (1) software copyright and licensed use; (2) information access and the right to privacy; (3) computer-assisted or…

  15. Expanding Use of Telepractice in Speech-Language Pathology and Audiology

    ERIC Educational Resources Information Center

    Edwards, Marge; Stredler-Brown, Arlene; Houston, K. Todd

    2012-01-01

    Recent advances in videoconferencing technology have resulted in a substantial increase in the use of live videoconferencing--referred to here as telepractice--to diagnose and treat speech, language, and hearing disorders. There is growing support from professional organizations for use of this service delivery model, as videoconferencing…

  16. The Forces Restructuring Our Future and Outdoor Recreation: Transcription of Keynote Speech.

    ERIC Educational Resources Information Center

    Feather, Frank

    This futurist keynote speech of the National Conference for Outdoor Leaders addresses the social, technological, economic, and political forces that are restructuring the world. The concept of geostrategic thinking has the components of global thinking, futuristic thinking, and seeking opportunities. Important developments include: (1) wealth will…

  17. Assessing Disordered Speech and Voice in Parkinson's Disease: A Telerehabilitation Application

    ERIC Educational Resources Information Center

    Constantinescu, Gabriella; Theodoros, Deborah; Russell, Trevor; Ward, Elizabeth; Wilson, Stephen; Wootton, Richard

    2010-01-01

    Background: Patients with Parkinson's disease face numerous access barriers to speech pathology services for appropriate assessment and treatment. Telerehabilitation is a possible solution to this problem, whereby rehabilitation services may be delivered to the patient at a distance, via telecommunication and information technologies. A number of…

  18. Connecting Intonation Labels to Mathematical Descriptions of Fundamental Frequency

    ERIC Educational Resources Information Center

    Grabe, Esther; Kochanski, Greg; Coleman, John

    2007-01-01

    The mathematical models of intonation used in speech technology are often inaccessible to linguists. By the same token, phonological descriptions of intonation are rarely used by speech technologists, as they cannot be implemented directly in applications. Consequently, these research communities do not benefit much from each other's insights. In…

  19. Enhancing Computer-Based Lessons for Effective Speech Education.

    ERIC Educational Resources Information Center

    Hemphill, Michael R.; Standerfer, Christina C.

    1987-01-01

    Assesses the advantages of computer-based instruction on speech education. Concludes that, while it offers tremendous flexibility to the instructor--especially in dynamic lesson design, feedback, graphics, and artificial intelligence--there is no inherent advantage to the use of computer technology in the classroom, unless the student interacts…

  20. Evidence-Based Practice in Communication Disorders: Progress Not Perfection

    ERIC Educational Resources Information Center

    Kent, Ray D.

    2006-01-01

    Purpose: This commentary is written in response to a companion paper by Nan Bernstein Ratner ("Evidence-Based Practice: An Examination of its Ramifications for the Practice of Speech-Language Pathology"). Method: The comments reflect my experience as Vice President for Research and Technology of the American Speech-Language-Hearing Association…

  1. Speech Understanding Research. Annual Technical Report.

    ERIC Educational Resources Information Center

    Walker, Donald E.; And Others

    This report is the third in a series of annual reports describing the research performed by Stanford Research Institute to provide the technology that will allow speech understanding systems to be designed and implemented for a variety of different task domains and environmental constraints. The current work is being carried out cooperatively with…

  2. Ultrasound as visual feedback in speech habilitation: exploring consultative use in rural British Columbia, Canada.

    PubMed

    Bernhardt, May B; Bacsfalvi, Penelope; Adler-Bock, Marcy; Shimizu, Reiko; Cheney, Audrey; Giesbrecht, Nathan; O'Connell, Maureen; Sirianni, Jason; Radanov, Bosko

    2008-02-01

    Ultrasound has shown promise as a visual feedback tool in speech therapy. Rural clients, however, often have minimal access to new technologies. The purpose of the current study was to evaluate consultative treatment using ultrasound in rural communities. Two speech-language pathologists (SLPs) trained in ultrasound use provided consultation with ultrasound in rural British Columbia to 13 school-aged children with residual speech impairments. Local SLPs provided treatment without ultrasound before and after the consultation. Speech samples were transcribed phonetically by independent trained listeners. Eleven children showed greater gains in production of the principal target /[image omitted]/ after the ultrasound consultation. Four of the seven participants who received more consultation time with ultrasound showed greatest improvement. Individual client factors also affected outcomes. The current study was a quasi-experimental clinic-based study. Larger, controlled experimental studies are needed to provide ultimate evaluation of the consultative use of ultrasound in speech therapy.

  3. Preliminary Analysis of Automatic Speech Recognition and Synthesis Technology.

    DTIC Science & Technology

    1983-05-01

    Excerpts cover industrial/military speech synthesis products: the SC-01 speech synthesizer contains 64 different phonemes, accessed by a 6-bit code, whose proper sequential combinations produce connected speech. The report also notes the difficulty of handling connected speech input with widely differing emotional states, diverse accents, and substantial nonperiodic background noise.

  4. Interface Technology for Geometrically Nonlinear Analysis of Multiple Connected Subdomains

    NASA Technical Reports Server (NTRS)

    Ransom, Jonathan B.

    1997-01-01

    Interface technology for geometrically nonlinear analysis is presented and demonstrated. This technology is based on an interface element which makes use of a hybrid variational formulation to provide for compatibility between independently modeled connected subdomains. The interface element developed herein extends previous work to include geometric nonlinearity and to use standard linear and nonlinear solution procedures. Several benchmark nonlinear applications of the interface technology are presented and aspects of the implementation are discussed.

  5. Severity-Based Adaptation with Limited Data for ASR to Aid Dysarthric Speakers

    PubMed Central

    Mustafa, Mumtaz Begum; Salim, Siti Salwah; Mohamed, Noraini; Al-Qatab, Bassam; Siong, Chng Eng

    2014-01-01

    Automatic speech recognition (ASR) is currently used in many assistive technologies, such as helping individuals with speech impairment in their communication ability. One challenge in ASR for speech-impaired individuals is the difficulty in obtaining a good speech database of impaired speakers for building an effective speech acoustic model. Because there are very few existing databases of impaired speech, which are also limited in size, the obvious solution to build a speech acoustic model of impaired speech is by employing adaptation techniques. However, issues that have not been addressed in existing studies in the area of adaptation for speech impairment are as follows: (1) identifying the most effective adaptation technique for impaired speech; and (2) the use of suitable source models to build an effective impaired-speech acoustic model. This research investigates the above-mentioned two issues on dysarthria, a type of speech impairment affecting millions of people. We applied both unimpaired and impaired speech as the source model with well-known adaptation techniques like maximum likelihood linear regression (MLLR) and constrained MLLR (C-MLLR). The recognition accuracy of each impaired speech acoustic model is measured in terms of word error rate (WER), with further assessments, including phoneme insertion, substitution and deletion rates. Unimpaired speech, when combined with limited high-quality speech-impaired data, improves the performance of ASR systems in recognising severely impaired dysarthric speech. The C-MLLR adaptation technique was also found to be better than MLLR in recognising mildly and moderately impaired speech based on the statistical analysis of the WER. It was found that phoneme substitution was the biggest contributing factor in WER in dysarthric speech for all levels of severity. The results show that speech acoustic models derived from suitable adaptation techniques improve the performance of ASR systems in recognising impaired speech with limited adaptation data. PMID:24466004
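
    The MLLR technique referenced above adapts a source acoustic model by applying an affine transform to the Gaussian means, mu' = A mu + b. The sketch below estimates the transform by least squares from per-Gaussian target means, a simplification of the real EM-based estimation from state-occupation statistics; all data are synthetic.

    import numpy as np

    def estimate_mllr_transform(means, adapted_targets):
        """Least-squares estimate of an MLLR-style affine transform W = [b; A]
        mapping source Gaussian means toward the adaptation data.

        Real MLLR estimates W from state-occupation statistics via EM; using
        per-Gaussian target means is a simplification for illustration.
        """
        mu = np.asarray(means)                        # (n_gaussians, dim)
        xi = np.hstack([np.ones((len(mu), 1)), mu])   # extended means [1, mu]
        W, *_ = np.linalg.lstsq(xi, np.asarray(adapted_targets), rcond=None)
        return W                                      # (dim + 1, dim)

    def apply_mllr(W, mean):
        return np.r_[1.0, mean] @ W

    # Toy check: recover a known affine shift.
    rng = np.random.default_rng(0)
    means = rng.standard_normal((50, 4))
    A, b = 0.9 * np.eye(4), np.array([0.5, -0.2, 0.0, 0.1])
    targets = means @ A.T + b
    W = estimate_mllr_transform(means, targets)
    print(np.allclose(apply_mllr(W, means[0]), targets[0]))  # True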

  6. Neural representations and mechanisms for the performance of simple speech sequences

    PubMed Central

    Bohland, Jason W.; Bullock, Daniel; Guenther, Frank H.

    2010-01-01

    Speakers plan the phonological content of their utterances prior to their release as speech motor acts. Using a finite alphabet of learned phonemes and a relatively small number of syllable structures, speakers are able to rapidly plan and produce arbitrary syllable sequences that fall within the rules of their language. The class of computational models of sequence planning and performance termed competitive queuing (CQ) models have followed Lashley (1951) in assuming that inherently parallel neural representations underlie serial action, and this idea is increasingly supported by experimental evidence. In this paper we develop a neural model that extends the existing DIVA model of speech production in two complementary ways. The new model includes paired structure and content subsystems (cf. MacNeilage, 1998) that provide parallel representations of a forthcoming speech plan, as well as mechanisms for interfacing these phonological planning representations with learned sensorimotor programs to enable stepping through multi-syllabic speech plans. On the basis of previous reports, the model’s components are hypothesized to be localized to specific cortical and subcortical structures, including the left inferior frontal sulcus, the medial premotor cortex, the basal ganglia and thalamus. The new model, called GODIVA (Gradient Order DIVA), thus fills a void in current speech research by providing formal mechanistic hypotheses about both phonological and phonetic processes that are grounded by neuroanatomy and physiology. This framework also generates predictions that can be tested in future neuroimaging and clinical case studies. PMID:19583476
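
    The competitive queuing (CQ) mechanism at the heart of such models is simple to state: a parallel activation gradient encodes the plan, and serial order emerges by repeatedly selecting the most active item and suppressing it. A minimal sketch of that iteration (not the GODIVA implementation):

    import numpy as np

    def competitive_queuing(activations):
        """Serial order from a parallel plan: repeatedly select the most
        active item, output it, and suppress it (the core CQ iteration).

        `activations` encodes the planned sequence as a primacy gradient --
        earlier items start more active.
        """
        act = np.array(activations, dtype=float)
        order = []
        while np.any(act > 0):
            winner = int(np.argmax(act))       # choice layer: winner-take-all
            order.append(winner)
            act[winner] = 0.0                  # self-inhibition after performance
        return order

    # Primacy gradient over four planned syllables -> produced in serial order.
    print(competitive_queuing([0.9, 0.7, 0.5, 0.3]))  # [0, 1, 2, 3]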

  7. Aphasia rehabilitation during adolescence: a case report.

    PubMed

    Laures-Gore, Jacqueline; McCusker, Tiffany; Hartley, Leila L

    2017-06-01

    Descriptions of speech-language interventions addressing the unique aspects of aphasia in adolescence appear to be nonexistent. The current paper presents the case of a male adolescent who experienced a stroke with resultant aphasia and the speech and language therapy he received. Furthermore, we discuss the issues that are unique to an adolescent with aphasia and how they were addressed with this particular patient. Traditional language and apraxia therapy was provided to this patient with inclusion of technology and academic topics. The patient demonstrated improvements in his speech and language abilities, most notably his reading comprehension and speech production. Age-related issues, including academic needs, group treatment, socialization, adherence/compliance, independence and family involvement, emerged during intervention. Although aphasia therapy for adolescents may be similar in many aspects to selected interventions for adults, it is necessary for the clinician to be mindful of age-related issues throughout the course of therapy. Goals and interventions should be selected based on factors salient to an adolescent as well as the potential long-term impact of therapy. Implications for research: Aphasia and its treatment in adolescence need to be further explored. Academics and technology are important aspects of aphasia treatment in adolescence. Issues specific to adolescence such as socialization, adherence/compliance, and independence are important to address in speech-language therapy.

  8. Spatial Learning Using Locomotion Interface to Virtual Environment

    ERIC Educational Resources Information Center

    Patel, K. K.; Vij, S.

    2012-01-01

    The inability to navigate independently and interact with the wider world is one of the most significant handicaps that can be caused by blindness, second only to the inability to communicate through reading and writing. Many difficulties are encountered when visually impaired people (VIP) need to visit new and unknown places. Current speech or…

  9. Multilingual Practices in Contemporary and Historical Contexts: Interfaces between Code-Switching and Translation

    ERIC Educational Resources Information Center

    Kolehmainen, Leena; Skaffari, Janne

    2016-01-01

    This article serves as an introduction to a collection of four articles on multilingual practices in speech and writing, exploring both contemporary and historical sources. It not only introduces the articles but also discusses the scope and definitions of code-switching, attitudes towards multilingual interaction and, most pertinently, the…

  10. Prediction, Performance, and Promise: Perspective on Time-Shortened Degree Programs.

    ERIC Educational Resources Information Center

    Smart, John M., Ed.; Howard, Toni A., Ed.

    Among the papers and presentations are: the keynote speech (E. Alden Dunham); the quality baccalaureate myth (Richard Giardina); the high school/college interface and time-shortening (panel presentation); restructuring the baccalaureate: a follow-up study (Robert Bersi); a point of view (Richard Meisler); more options: less time? (DeVere E.…

  11. The Function of Gesture in Lexically Focused L2 Instructional Conversations

    ERIC Educational Resources Information Center

    Smotrova, Tetyana; Lantolf, James P.

    2013-01-01

    The purpose of the present study is to investigate the mediational function of the gesture-speech interface in the instructional conversation that emerged as teachers attempted to explain the meaning of English words to their students in two EFL classrooms in the Ukraine. Its analytical framework is provided by Vygotsky's sociocultural psychology…

  12. Unsupervised Decoding of Long-Term, Naturalistic Human Neural Recordings with Automated Video and Audio Annotations

    PubMed Central

    Wang, Nancy X. R.; Olson, Jared D.; Ojemann, Jeffrey G.; Rao, Rajesh P. N.; Brunton, Bingni W.

    2016-01-01

    Fully automated decoding of human activities and intentions from direct neural recordings is a tantalizing challenge in brain-computer interfacing. Implementing Brain Computer Interfaces (BCIs) outside carefully controlled experiments in laboratory settings requires adaptive and scalable strategies with minimal supervision. Here we describe an unsupervised approach to decoding neural states from naturalistic human brain recordings. We analyzed continuous, long-term electrocorticography (ECoG) data recorded over many days from the brain of subjects in a hospital room, with simultaneous audio and video recordings. We discovered coherent clusters in high-dimensional ECoG recordings using hierarchical clustering and automatically annotated them using speech and movement labels extracted from audio and video. To our knowledge, this represents the first time techniques from computer vision and speech processing have been used for natural ECoG decoding. Interpretable behaviors were decoded from ECoG data, including moving, speaking and resting; the results were assessed by comparison with manual annotation. Discovered clusters were projected back onto the brain revealing features consistent with known functional areas, opening the door to automated functional brain mapping in natural settings. PMID:27148018
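
    The unsupervised clustering stage of such a pipeline can be sketched with standard tools. The sketch below uses random stand-in features rather than real ECoG; the window and feature dimensions are assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Sketch of the unsupervised step: hierarchical clustering of neural feature
# vectors (random stand-in data here, not real ECoG).
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 64))   # 500 time windows x 64 spectral features

Z = linkage(features, method="ward")                 # agglomerative hierarchy
labels = fcluster(Z, t=5, criterion="maxclust")      # cut the tree into 5 clusters
print(np.bincount(labels)[1:])                       # cluster sizes
# In the study, clusters like these were then annotated post hoc with speech
# and movement labels extracted from the synchronized audio and video.
```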

  13. Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar

    PubMed Central

    Shin, Young Hoon; Seo, Jiwon

    2016-01-01

    People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker’s vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing. PMID:27801867
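
    Although the paper's feature-extraction and speech-activity-detection algorithms are more elaborate, the general idea — tracking frame-to-frame variation in the radar range profile and gating recognition on detected activity — can be sketched roughly as follows (frame counts, range-bin selection, and thresholds are all illustrative assumptions):

```python
import numpy as np

# Illustrative sketch only: detect articulator-motion activity from a stream
# of radar range profiles by thresholding frame-to-frame energy change.
rng = np.random.default_rng(1)
frames = rng.normal(size=(1000, 128))   # stand-in: 1000 slow-time frames x 128 range bins

# Motion feature: energy of the difference between successive frames,
# restricted to a band of range bins assumed to cover the lips and jaw.
diff_energy = np.sum(np.diff(frames[:, 40:70], axis=0) ** 2, axis=1)

# Simple activity detector: smooth, then threshold at a multiple of the noise floor.
kernel = np.ones(15) / 15
smoothed = np.convolve(diff_energy, kernel, mode="same")
threshold = 3.0 * np.median(smoothed)   # illustrative constant
active = smoothed > threshold           # frames flagged as speech segments
print(f"{active.mean():.1%} of frames flagged active")
```

    In the system described above, only segments flagged as speech activity would be passed on to the word-recognition stage.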

  15. Recognition of Speech from the Television with Use of a Wireless Technology Designed for Cochlear Implants.

    PubMed

    Duke, Mila Morais; Wolfe, Jace; Schafer, Erin

    2016-05-01

    Cochlear implant (CI) recipients often experience difficulty understanding speech in noise and speech that originates from a distance. Many CI recipients also experience difficulty understanding speech originating from a television. Use of hearing assistance technology (HAT) may improve speech recognition in noise and for signals that originate from more than a few feet from the listener; however, there are no published studies evaluating the potential benefits of a wireless HAT designed to deliver audio signals from a television directly to a CI sound processor. The objective of this study was to compare speech recognition in quiet and in noise of CI recipients with the use of their CI alone and with the use of their CI and a wireless HAT (Cochlear Wireless TV Streamer). A two-way repeated measures design was used to evaluate performance differences obtained in quiet and in competing noise (65 dBA) with the CI sound processor alone and with the sound processor coupled to the Cochlear Wireless TV Streamer. Sixteen users of Cochlear Nucleus 24 Freedom, CI512, and CI422 implants were included in the study. Participants were evaluated in four conditions including use of the sound processor alone and use of the sound processor with the wireless streamer in quiet and in the presence of competing noise at 65 dBA. Speech recognition was evaluated in each condition with two full lists of Computer-Assisted Speech Perception Testing and Training Sentence-Level Test sentences presented from a light-emitting diode television. Speech recognition in noise was significantly better with use of the wireless streamer compared to participants' performance with their CI sound processor alone. There was also a nonsignificant trend toward better performance in quiet with use of the TV Streamer. Performance was significantly poorer when evaluated in noise compared to performance in quiet when the TV Streamer was not used. Use of the Cochlear Wireless TV Streamer designed to stream audio from a television directly to a CI sound processor provides better speech recognition in quiet and in noise when compared to performance obtained with use of the CI sound processor alone. American Academy of Audiology.

  16. Biomedical technology transfer. Applications of NASA science and technology

    NASA Technical Reports Server (NTRS)

    Harrison, D. C.

    1980-01-01

    Ongoing projects described address: (1) intracranial pressure monitoring; (2) versatile portable speech prosthesis; (3) cardiovascular magnetic measurements; (4) improved EMG biotelemetry for pediatrics; (5) ultrasonic kidney stone disintegration; (6) pediatric roentgen densitometry; (7) X-ray spatial frequency multiplexing; (8) mechanical impedance determination of bone strength; (9) visual-to-tactile mobility aid for the blind; (10) Purkinje image eyetracker and stabilized photocoagulator; (11) neurological applications of NASA-SRI eyetracker; (12) ICU synthesized speech alarm; (13) NANOPHOR: microelectrophoresis instrument; (14) WRISTCOM: tactile communication system for the deaf-blind; (15) medical applications of NASA liquid-circulating garments; and (16) hip prosthesis with biotelemetry. Potential transfer projects include a person-portable versatile speech prosthesis, a critical care transport system, a clinical information system for cardiology, a programmable biofeedback orthosis for scoliosis, a pediatric long-bone reconstruction, and a spinal immobilization apparatus.

  17. Cochlear implant characteristics and speech perception skills of adolescents with long-term device use.

    PubMed

    Davidson, Lisa S; Geers, Ann E; Brenner, Christine

    2010-10-01

    Updated cochlear implant technology and optimized fitting can have a substantial impact on speech perception. The effects of upgrades in processor technology and aided thresholds on word recognition at soft input levels and sentence recognition in noise were examined. We hypothesized that updated speech processors and lower aided thresholds would allow improved recognition of soft speech without compromising performance in noise. 109 teenagers who had used a Nucleus 22 cochlear implant since preschool were tested with their current speech processor(s) (101 unilateral and 8 bilateral): 13 used the Spectra, 22 the ESPrit 22, 61 the ESPrit 3G, and 13 the Freedom. The Lexical Neighborhood Test (LNT) was administered at 70 and 50 dB SPL, and the Bamford-Kowal-Bench (BKB) sentences were administered in quiet and in noise. Aided thresholds were obtained for frequency-modulated tones from 250 to 4,000 Hz. Results were analyzed using repeated measures analysis of variance. Aided thresholds for the Freedom/3G group were significantly lower (better) than for the Spectra/Sprint group. LNT scores at 50 dB SPL were significantly higher for the Freedom/3G group. No significant differences between the two groups were found for the LNT at 70 dB SPL or for sentences in quiet or noise. Adolescents using updated processors that allowed aided detection thresholds of 30 dB HL or better performed best at soft levels. The BKB-in-noise results suggest that greater access to soft speech does not compromise listening in noise.

  18. Asynchronous brain-computer interface for cognitive assessment in people with cerebral palsy

    NASA Astrophysics Data System (ADS)

    Alcaide-Aguirre, R. E.; Warschausky, S. A.; Brown, D.; Aref, A.; Huggins, J. E.

    2017-12-01

    Objective. Typically, clinical measures of cognition require motor or speech responses. Thus, a significant percentage of people with disabilities are not able to complete standardized assessments. This situation could be resolved by employing a more accessible test administration method, such as a brain-computer interface (BCI). A BCI can circumvent motor and speech requirements by translating brain activity to identify a subject's response. By eliminating the need for motor or speech input, one could use a BCI to assess an individual who previously did not have access to clinical tests. Approach. We developed an asynchronous, event-related potential BCI-facilitated administration procedure for the Peabody Picture Vocabulary Test (PPVT-IV). We then tested our system in typically developing individuals (N = 11), as well as people with cerebral palsy (N = 19), to compare results to the standardized PPVT-IV format and administration. Main results. Standard scores on the BCI-facilitated PPVT-IV and the standard PPVT-IV were highly correlated (r = 0.95, p < 0.001), with a mean difference of 2.0 ± 6.4 points, which is within the standard error of the PPVT-IV. Significance. Thus, our BCI-facilitated PPVT-IV provided results comparable to the standard PPVT-IV, suggesting that populations for whom standardized cognitive tests are not accessible could benefit from our BCI-facilitated approach.

  19. Automatic speech recognition in air traffic control

    NASA Technical Reports Server (NTRS)

    Karlsson, Joakim

    1990-01-01

    Automatic Speech Recognition (ASR) technology and its application to the Air Traffic Control system are described. The advantages of applying ASR to Air Traffic Control, as well as criteria for choosing a suitable ASR system are presented. Results from previous research and directions for future work at the Flight Transportation Laboratory are outlined.

  20. Speech Recognition Technology for Disabilities Education

    ERIC Educational Resources Information Center

    Tang, K. Wendy; Kamoua, Ridha; Sutan, Victor; Farooq, Omer; Eng, Gilbert; Chu, Wei Chern; Hou, Guofeng

    2005-01-01

    Speech recognition is an alternative to traditional methods of interacting with a computer, such as textual input through a keyboard. An effective system can replace or reduce reliance on standard keyboard and mouse input. This can especially assist dyslexic students who have problems with character or word use and manipulation in a textual…

  1. Exploring Speech Recognition Technology: Children with Learning and Emotional/Behavioral Disorders.

    ERIC Educational Resources Information Center

    Faris-Cole, Debra; Lewis, Rena

    2001-01-01

    Intermediate grade students with disabilities in written expression and emotional/behavioral disorders were trained to use discrete or continuous speech input devices for written work. The study found extreme variability in the fidelity of the devices, PowerSecretary and Dragon NaturallySpeaking, with fidelity ranging from 49 percent to 87 percent. Both devices…

  2. Validation of Automated Scoring of Oral Reading

    ERIC Educational Resources Information Center

    Balogh, Jennifer; Bernstein, Jared; Cheng, Jian; Van Moere, Alistair; Townshend, Brent; Suzuki, Masanori

    2012-01-01

    A two-part experiment is presented that validates a new measurement tool for scoring oral reading ability. Data collected by the U.S. government in a large-scale literacy assessment of adults were analyzed by a system called VersaReader that uses automatic speech recognition and speech processing technologies to score oral reading fluency. In the…

  3. Voice Interactive Analysis System Study. Final Report, August 28, 1978 through March 23, 1979.

    ERIC Educational Resources Information Center

    Harry, D. P.; And Others

    The Voice Interactive Analysis System study continued research and development of the LISTEN real-time, minicomputer-based connected speech recognition system, within NAVTRAEQUIPCEN's program of developing automatic speech technology in support of training. An attempt was made to identify the most effective features detected by the TTI-500 model…

  4. Miles To Go, Promises To Keep: Higher Education in the 21st Century.

    ERIC Educational Resources Information Center

    Chase, Hank

    2000-01-01

    Reports on the July 2000 annual meeting of the National Association of College and University Business Officers. Highlights included a keynote luncheon speech by Colin Powell and general session speeches on such topics as the social effects of the technological revolution on campus, college-business partnerships, college-community partnerships,…

  5. "The Most Poisonous Force in Technology"

    ERIC Educational Resources Information Center

    Carnevale, Dan

    2007-01-01

    Walt Mossberg, personal-technology columnist for "The Wall Street Journal," highlighted technology trends in his speech to a group of college presidents and other administrators. Mr. Mossberg touched a nerve when he called information-technology departments of large organizations, including colleges, "the most regressive and poisonous force in…

  6. Department of Cybernetic Acoustics

    NASA Astrophysics Data System (ADS)

    The development of the theory, instrumentation and applications of methods and systems for the measurement, analysis, processing and synthesis of acoustic signals within the audio frequency range is discussed, particularly for the speech signal and the vibro-acoustic signals emitted by technical and industrial equipment treated as noise and vibration sources. The research work, both theoretical and experimental, aims at applications in various branches of science and medicine, such as: acoustical diagnostics and phoniatric rehabilitation of pathological and postoperative states of the speech organ; bilateral "man-machine" speech communication based on the analysis, recognition and synthesis of the speech signal; and vibro-acoustical diagnostics and continuous monitoring of the state of machines, technical equipment and technological processes.

  7. Speech Recognition for Medical Dictation: Overview in Quebec and Systematic Review.

    PubMed

    Poder, Thomas G; Fisette, Jean-François; Déry, Véronique

    2018-04-03

    Speech recognition is increasingly used in medical reporting. The aim of this article is to identify in the literature the strengths and weaknesses of this technology, as well as barriers to and facilitators of its implementation. A systematic review of systematic reviews was performed using PubMed, Scopus, the Cochrane Library and the Center for Reviews and Dissemination through August 2017. The gray literature has also been consulted. The quality of systematic reviews has been assessed with the AMSTAR checklist. The main inclusion criterion was use of speech recognition for medical reporting (front-end or back-end). A survey has also been conducted in Quebec, Canada, to identify the dissemination of this technology in this province, as well as the factors leading to the success or failure of its implementation. Five systematic reviews were identified. These reviews indicated a high level of heterogeneity across studies. The quality of the studies reported was generally poor. Speech recognition is not as accurate as human transcription, but it can dramatically reduce turnaround times for reporting. In front-end use, medical doctors need to spend more time on dictation and correction than required with human transcription. With speech recognition, major errors occur up to three times more frequently. In back-end use, a potential increase in productivity of transcriptionists was noted. In conclusion, speech recognition offers several advantages for medical reporting. However, these advantages are countered by an increased burden on medical doctors and by risks of additional errors in medical reports. It is also hard to identify for which medical specialties and which clinical activities the use of speech recognition will be the most beneficial.

  8. Welcome to Ames Research Center (1987 forum on Federal technology transfer)

    NASA Technical Reports Server (NTRS)

    Ballhaus, William F., Jr.

    1988-01-01

    NASA Ames Research Center has a long and distinguished history of technology development and transfer. In a recent welcoming speech to the Forum on Federal Technology Transfer, Director Ballhaus of Ames described significant technologies that have been transferred from Ames to the private sector and identified future opportunities.

  9. An evaluation of talker localization based on direction of arrival estimation and statistical sound source identification

    NASA Astrophysics Data System (ADS)

    Nishiura, Takanobu; Nakamura, Satoshi

    2002-11-01

    Capturing distant-talking speech with high quality is very important for a hands-free speech interface, and a microphone array is an ideal candidate for this purpose. However, this approach requires localizing the target talker. Conventional talker localization algorithms in multiple-sound-source environments not only have difficulty localizing the multiple sound sources accurately, but also have difficulty localizing the target talker among known multiple sound source positions. To cope with these problems, we propose a new talker localization algorithm consisting of two parts. One is a DOA (direction of arrival) estimation algorithm for multiple sound source localization based on the CSP (cross-power spectrum phase) coefficient addition method. The other is a statistical sound source identification algorithm based on a GMM (Gaussian mixture model) for localizing the target talker position among the localized multiple sound sources. In this paper, we particularly focus on the talker localization performance of the combination of these two algorithms with a microphone array. We conducted evaluation experiments in real noisy reverberant environments. As a result, we confirmed that multiple sound signals can be identified accurately as "speech" or "non-speech" by the proposed algorithm. [Work supported by ATR and MEXT of Japan.]
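
    The CSP coefficient used in the DOA stage is a phase-only (whitened) cross-correlation, elsewhere known as GCC-PHAT. A minimal two-microphone delay estimator on synthetic signals (a generic sketch, not the authors' code):

```python
import numpy as np

# CSP / GCC-PHAT delay estimation between two microphones (synthetic signals).
def csp_delay(x1, x2):
    """Return the delay (in samples) of x1 relative to x2, estimated from
    the cross-power spectrum phase (phase-only cross-correlation)."""
    n = len(x1) + len(x2)
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    G = X1 * np.conj(X2)
    csp = np.fft.irfft(G / (np.abs(G) + 1e-12), n)    # whitened correlation
    max_shift = n // 2
    csp = np.concatenate((csp[-max_shift:], csp[:max_shift + 1]))
    return int(np.argmax(np.abs(csp))) - max_shift

rng = np.random.default_rng(2)
s = rng.normal(size=4096)
delay = 7
x1 = s
x2 = np.concatenate((np.zeros(delay), s[:-delay]))    # x2 lags x1 by 7 samples
print(csp_delay(x2, x1))                              # -> 7
```

    With a known microphone spacing, the estimated delay converts directly to an arrival angle; summing CSP coefficients across microphone pairs is what allows multiple simultaneous sources to be localized.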

  10. Processing F0 with cochlear implants: Modulation frequency discrimination and speech intonation recognition.

    PubMed

    Chatterjee, Monita; Peng, Shu-Chen

    2008-01-01

    Fundamental frequency (F0) processing by cochlear implant (CI) listeners was measured using a psychophysical task and a speech intonation recognition task. Listeners' Weber fractions for modulation frequency discrimination were measured using an adaptive, 3-interval, forced-choice paradigm: stimuli were presented through a custom research interface. In the speech intonation recognition task, listeners were asked to indicate whether resynthesized bisyllabic words, when presented in the free field through the listeners' everyday speech processor, were question-like or statement-like. The resynthesized tokens were systematically manipulated to have different initial F0s to represent male vs. female voices, and different F0 contours (i.e., falling, flat, and rising). Although the CI listeners showed considerable variation in performance on both tasks, significant correlations were observed between the CI listeners' sensitivity to modulation frequency in the psychophysical task and their performance in intonation recognition. Consistent with their greater reliance on temporal cues, the CI listeners' performance in the intonation recognition task was significantly poorer with the higher initial-F0 stimuli than with the lower initial-F0 stimuli. Similar results were obtained with normal hearing listeners attending to noiseband-vocoded CI simulations with reduced spectral resolution.
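
    Adaptive forced-choice paradigms of this kind typically follow a transformed up-down rule that converges on a discrimination threshold. A minimal 2-down/1-up sketch with a simulated listener (the psychometric function and all parameters are invented for illustration):

```python
import numpy as np

# Minimal 2-down/1-up staircase for a 3-interval forced-choice task
# (simulated listener; psychometric function and parameters invented).
rng = np.random.default_rng(5)

def listener_correct(delta):
    # Chance level in 3AFC is 1/3; performance rises with the frequency difference.
    p = 1 / 3 + (2 / 3) / (1 + np.exp(-(delta - 5.0)))
    return rng.random() < p

delta, step = 20.0, 2.0          # starting modulation-frequency difference (Hz)
run, last_dir, reversals = 0, 0, []
while len(reversals) < 8:
    if listener_correct(delta):
        run += 1
        move = -1 if run == 2 else 0      # 2-down: harder after two correct
        if move:
            run = 0
    else:
        run, move = 0, +1                 # 1-up: easier after any error
    if move:
        if last_dir and move != last_dir:
            reversals.append(delta)       # a direction change is a reversal
        delta = max(delta + move * step, 0.5)
        last_dir = move
print(f"threshold estimate: {np.mean(reversals[2:]):.1f} Hz")  # discard early reversals
```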

  12. Robust Speaker Authentication Based on Combined Speech and Voiceprint Recognition

    NASA Astrophysics Data System (ADS)

    Malcangi, Mario

    2009-08-01

    Personal authentication is becoming increasingly important in many applications that have to protect proprietary data. Passwords and personal identification numbers (PINs) prove not to be robust enough to ensure that unauthorized people do not use them. Biometric authentication technology may offer a secure, convenient, accurate solution but sometimes fails due to its intrinsically fuzzy nature. This research aims to demonstrate that combining two basic speech processing methods, voiceprint identification and speech recognition, can provide a very high degree of robustness, especially if fuzzy decision logic is used.
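
    One way to read the combination of voiceprint identification, speech recognition, and fuzzy decision logic is as a soft fusion of the two recognizers' confidence scores rather than a hard AND. A toy sketch (the membership weights and thresholds are invented, not the paper's rules):

```python
# Toy sketch of fuzzy fusion of two authentication scores (values invented).
def fuzzy_accept(voiceprint_score, phrase_score):
    """Combine voiceprint-match and spoken-password confidences in [0, 1]."""
    # Fuzzy AND as the minimum of the two memberships (Zadeh t-norm),
    # softened by a weighted mean so one strong cue can offset a weak one.
    strict = min(voiceprint_score, phrase_score)
    soft = 0.5 * voiceprint_score + 0.5 * phrase_score
    confidence = 0.6 * strict + 0.4 * soft
    if confidence >= 0.75:
        return "accept", confidence
    if confidence >= 0.55:
        return "ask to repeat", confidence   # defer instead of a hard reject
    return "reject", confidence

print(fuzzy_accept(0.92, 0.88))   # -> ('accept', ...)
print(fuzzy_accept(0.90, 0.55))   # borderline case defers rather than rejecting
```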

  13. Speech Intelligibility in Persian Hearing Impaired Children with Cochlear Implants and Hearing Aids.

    PubMed

    Rezaei, Mohammad; Emadi, Maryam; Zamani, Peyman; Farahani, Farhad; Lotfi, Gohar

    2017-04-01

    The aim of the present study is to evaluate and compare speech intelligibility in hearing impaired children with cochlear implants (CI) and hearing aid (HA) users and children with normal hearing (NH). The sample consisted of 45 Persian-speaking children aged 3 to 5 years old in Hamadan. They were divided into three groups of 15: children with NH, children with CI, and children using HAs. Participants were evaluated with a test of speech intelligibility level. Results of an ANOVA on the speech intelligibility test showed that NH children performed significantly better than hearing impaired children with CI and HA. Post-hoc analysis using the Scheffe test indicated that the mean speech intelligibility score of normal children was higher than that of the HA and CI groups, but the difference in mean speech intelligibility between children using cochlear implants and those using HAs was not significant. It is clear that even with remarkable advances in HA technology, many hearing impaired children continue to find speech production a challenging problem. Given that speech intelligibility is a key element in proper communication and social interaction, educational and rehabilitation programs are essential to improve the speech intelligibility of children with hearing loss.

  14. Spectral analysis method and sample generation for real time visualization of speech

    NASA Astrophysics Data System (ADS)

    Hobohm, Klaus

    A method for translating speech signals into optical patterns, characterized by high sound discriminability and learnability and designed to give deaf persons feedback for controlling their way of speaking, is presented. Important properties of the speech production and perception processes, and of the organs involved in these mechanisms, are recalled in order to define requirements for speech visualization. It is established that the spectral representation must reflect the time, frequency and amplitude resolution of hearing, and that continuous variations of the acoustic parameters of the speech signal must be depicted by continuous variation of the images. A color table was developed for dynamic display, and sonograms were generated with five spectral analysis methods, including Fourier transformation and linear predictive coding. To evaluate sonogram quality, test persons had to recognize consonant/vowel/consonant words; an optimized analysis method was achieved with a fast Fourier transformation and a postprocessor. A hardware concept for a real-time speech visualization system, based on multiprocessor technology in a personal computer, is presented.
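
    The sonogram-generation step — short-time spectral analysis mapped through a color table — is the heart of such a visualizer and is a few lines with standard tools. A sketch using a synthetic chirp in place of speech:

```python
import numpy as np
from scipy.signal import spectrogram

# Sketch of sonogram generation: short-time Fourier analysis of a signal
# (a synthetic chirp here, standing in for speech).
fs = 16000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * (200 + 1800 * t) * t)   # tone rising from 200 Hz

f, times, Sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=384)
Sxx_db = 10 * np.log10(Sxx + 1e-12)            # log amplitude, as in a sonogram
print(Sxx_db.shape)                            # (frequency bins, time frames)
# A real-time visualizer would map Sxx_db through a color table and scroll
# the frames continuously, as the system described above does in hardware.
```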

  15. Comparison of speech intelligibility in cockpit noise using SPH-4 flight helmet with and without active noise reduction

    NASA Technical Reports Server (NTRS)

    Chan, Jeffrey W.; Simpson, Carol A.

    1990-01-01

    Active Noise Reduction (ANR) is a new technology which can reduce the level of aircraft cockpit noise that reaches the pilot's ear while simultaneously improving the signal to noise ratio for voice communications and other information bearing sound signals in the cockpit. A miniature, ear-cup mounted ANR system was tested to determine whether speech intelligibility is better for helicopter pilots using ANR compared to a control condition of ANR turned off. Two signal to noise ratios (S/N), representative of actual cockpit conditions, were used for the ratio of the speech to cockpit noise sound pressure levels. Speech intelligibility was significantly better with ANR compared to no ANR for both S/N conditions. Variability of speech intelligibility among pilots was also significantly less with ANR. When the stock helmet was used with ANR turned off, the average PB Word speech intelligibility score was below the Normally Acceptable level. In comparison, it was above that level with ANR on in both S/N levels.
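
    ANR systems of this kind work by synthesizing an anti-phase signal adapted against an error microphone inside the ear cup. The adaptation at the core of such systems can be illustrated with a plain LMS filter; this is a generic sketch under invented parameters, not the tested product's algorithm:

```python
import numpy as np

# Generic LMS adaptive noise-cancellation sketch (illustrative only; not the
# algorithm of the ANR system tested above).
rng = np.random.default_rng(6)
n, taps, mu = 20000, 8, 0.01
noise_ref = rng.normal(size=n)                      # reference mic: cockpit noise
path = np.array([0.6, 0.3, 0.1])                    # unknown acoustic path to the ear
noise_at_ear = np.convolve(noise_ref, path)[:n]     # noise the pilot would hear

w = np.zeros(taps)                                  # adaptive filter weights
residual = np.zeros(n)
for t in range(taps, n):
    x = noise_ref[t - taps + 1:t + 1][::-1]         # recent reference samples, newest first
    anti = w @ x                                    # anti-noise estimate
    e = noise_at_ear[t] - anti                      # what the ear still hears
    w += mu * e * x                                 # LMS weight update
    residual[t] = e

half = n // 2                                       # measure after convergence
gain = 10 * np.log10(np.mean(noise_at_ear[half:] ** 2) / np.mean(residual[half:] ** 2))
print(f"noise attenuated by {gain:.1f} dB")
```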

  16. Voice input/output capabilities at Perception Technology Corporation

    NASA Technical Reports Server (NTRS)

    Ferber, Leon A.

    1977-01-01

    Condensed resumes of key company personnel at Perception Technology Corporation are presented. The staff possesses expertise in speech recognition, speech synthesis, speaker authentication, and language identification. Capabilities of hardware and software engineers are included.

  17. [Advanced information technologies for financial services industry]. Final report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NONE

    The project scope is to develop an advanced user interface utilizing speech and/or handwriting recognition technology that will improve the accuracy and speed of recording transactions in the dynamic environment of a foreign exchange (FX) trading floor. The project's desired result is to improve the base technology for traders' workstations on FX trading floors. Improved workstation effectiveness will allow vast amounts of complex information and events to be presented and analyzed, thus increasing the volume of money and other assets to be exchanged at an accelerated rate. The project scope is also to develop and demonstrate technologies that advance interbank check imaging and paper check truncation. The following describes the tasks to be completed: (1) Identify the economic value case, the legal and regulatory issues, the business practices that are affected, and the effects upon settlement. (2) Familiarization with existing imaging technology. Develop requirements for image quality, security, and interoperability. Adapt existing technologies to meet requirements. (3) Define requirements for the imaging laboratory and design its architecture. Integrate and test technology from task 2 with equipment in the laboratory. (4) Develop and/or integrate and test remaining components, including security, storage, and communications. (5) Build a prototype system and test in a laboratory. Install and run in two or more banks. Develop documentation. Conduct training. The project's desired result is to enable a proof-of-concept trial in which multiple banks will exchange check images, exhibiting the operating conditions which a check experiences as it travels through the payments/clearing system. The trial should demonstrate the adequacy of digital check images instead of paper checks.

  18. Preparation and Perceptions of Speech-Language Pathologists Working with Children with Cochlear Implants

    ERIC Educational Resources Information Center

    Compton, Mary V.; Tucker, Denise A.; Flynn, Perry F.

    2009-01-01

    This study examined the level of preparedness of North Carolina speech-language pathologists (SLPs) who serve school-aged children with cochlear implants (CIs). A survey distributed to 190 school-based SLPs in North Carolina revealed that 79% of the participants felt they had little to no confidence in managing CI technology or in providing…

  19. Apps, iPads, and Literacy: Examining the Feasibility of Speech Recognition in a First-Grade Classroom

    ERIC Educational Resources Information Center

    Baker, Elizabeth A.

    2017-01-01

    Informed by sociocultural and systems theory tenets, this study used ethnographic research methods to examine the feasibility of using speech recognition (SR) technology to support struggling readers in an early elementary classroom setting. Observations of eight first graders were conducted as they participated in a structured SR-supported…

  20. Telepractice in Speech-Language Therapy: The Use of Online Technologies for Parent Training and Coaching

    ERIC Educational Resources Information Center

    Snodgrass, Melinda R.; Chung, Moon Y.; Biller, Maysoon F.; Appel, Katie E.; Meadan, Hedda; Halle, James W.

    2017-01-01

    Researchers and practitioners have found that telepractice is an effective means of increasing access to high-quality services that meet children's unique needs and is a viable mechanism to deliver speech-language services for multiple purposes. We offer a framework to facilitate the implementation of practices that are used in direct…

  1. 75 FR 54040 - Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-09-03

    ... Services (TRS) mandatory minimum standards for Video Relay Service (VRS) and Internet Protocol Relay (IP... waivers for one year because the record demonstrates that it is technologically infeasible for VRS and IP... standards for VRS and IP Relay will expire on July 1, 2011, or until the Commission addresses pending...

  2. User Experience of a Mobile Speaking Application with Automatic Speech Recognition for EFL Learning

    ERIC Educational Resources Information Center

    Ahn, Tae youn; Lee, Sangmin-Michelle

    2016-01-01

    With the spread of mobile devices, mobile phones have enormous potential regarding their pedagogical use in language education. The goal of this study is to analyse user experience of a mobile-based learning system that is enhanced by speech recognition technology for the improvement of EFL (English as a foreign language) learners' speaking…

  3. EduSpeak[R]: A Speech Recognition and Pronunciation Scoring Toolkit for Computer-Aided Language Learning Applications

    ERIC Educational Resources Information Center

    Franco, Horacio; Bratt, Harry; Rossier, Romain; Rao Gadde, Venkata; Shriberg, Elizabeth; Abrash, Victor; Precoda, Kristin

    2010-01-01

    SRI International's EduSpeak[R] system is a software development toolkit that enables developers of interactive language education software to use state-of-the-art speech recognition and pronunciation scoring technology. Automatic pronunciation scoring allows the computer to provide feedback on the overall quality of pronunciation and to point to…

  4. Using Automatic Speech Recognition Technology with Elicited Oral Response Testing

    ERIC Educational Resources Information Center

    Cox, Troy L.; Davies, Randall S.

    2012-01-01

    This study examined the use of automatic speech recognition (ASR) scored elicited oral response (EOR) tests to assess the speaking ability of English language learners. It also examined the relationship between ASR-scored EOR and other language proficiency measures and the ability of the ASR to rate speakers without bias to gender or native…

  5. International Federation of Library Associations General Conference, Montreal 1982. Official Opening Statements and Speeches. Plenary Session I and II. Papers.

    ERIC Educational Resources Information Center

    International Federation of Library Associations, The Hague (Netherlands).

    Official opening statements and papers on networking and the development of information technology which were presented at the 1982 International Federation of Library Associations (IFLA) conference include: (1) opening speeches by Else Granheim (IFLA president) and Kenneth H. Rogers (UNESCO Representative); (2) "The Importance of Networks…

  6. Sensorimotor influences on speech perception in infancy.

    PubMed

    Bruderer, Alison G; Danielson, D Kyle; Kandhadai, Padmapriya; Werker, Janet F

    2015-11-03

    The influence of speech production on speech perception is well established in adults. However, because adults have a long history of both perceiving and producing speech, the extent to which the perception-production linkage is due to experience is unknown. We addressed this issue by asking whether articulatory configurations can influence infants' speech perception performance. To eliminate influences from specific linguistic experience, we studied preverbal, 6-mo-old infants and tested the discrimination of a nonnative, and hence never-before-experienced, speech sound distinction. In three experimental studies, we used teething toys to control the position and movement of the tongue tip while the infants listened to the speech sounds. Using ultrasound imaging technology, we verified that the teething toys consistently and effectively constrained the movement and positioning of infants' tongues. With a looking-time procedure, we found that temporarily restraining infants' articulators impeded their discrimination of a nonnative consonant contrast but only when the relevant articulator was selectively restrained to prevent the movements associated with producing those sounds. Our results provide striking evidence that even before infants speak their first words and without specific listening experience, sensorimotor information from the articulators influences speech perception. These results transform theories of speech perception by suggesting that even at the initial stages of development, oral-motor movements influence speech sound discrimination. Moreover, an experimentally induced "impairment" in articulator movement can compromise speech perception performance, raising the question of whether long-term oral-motor impairments may impact perceptual development.

  7. Interactive Electronic Technical Manuals (IETMs) Annotated Bibliography

    DTIC Science & Technology

    2002-10-22

    translated from their graphical counterparts. This paper examines a set of challenging issues facing speech interface designers and describes approaches...spreading network, combined with visual design techniques, such as typography, color, and transparency, enables the system to fluidly respond to...However, most research and design guidelines address typography and color separately without considering their spatial context or their function as

  8. Technology-enabled management of communication and swallowing disorders in Parkinson's disease: a systematic scoping review.

    PubMed

    Theodoros, Deborah; Aldridge, Danielle; Hill, Anne J; Russell, Trevor

    2018-06-19

    Communication and swallowing disorders are highly prevalent in people with Parkinson's disease (PD). Maintenance of functional communication and swallowing over time is challenging for the person with PD and their families and may lead to social isolation and reduced quality of life if not addressed. Speech and language therapists (SLTs) face the conundrum of providing sustainable and flexible services to meet the changing needs of people with PD. Motor, cognitive and psychological issues associated with PD, medication regimens and dependency on others often impede attendance at a centre-based service. The access difficulties experienced by people with PD require a disruptive service approach to meet their needs. Technology-enabled management using information and telecommunications technologies to provide services at a distance has the potential to improve access and enhance the quality of SLT services to people with PD. To report the status and scope of the evidence for the use of technology in the management of the communication and swallowing disorders associated with PD. Studies were retrieved from four major databases (PubMed, CINAHL, EMBASE and Medline via Web of Science). Data relating to the types of studies, level of evidence, context, nature of the management undertaken, participant perspectives and the types of technologies involved were extracted for the review. A total of 17 studies were included in the review, 15 of which related to the management of communication and swallowing disorders in PD with two studies devoted to participant perspectives. The majority of the studies reported on the treatment of the speech disorder in PD using Lee Silverman Voice Treatment (LSVT LOUD®). Synchronous and asynchronous technologies were used in the studies with a predominance of the former. There was a paucity of research in the management of cognitive-communication and swallowing disorders. Research evidence supporting technology-enabled management of the communication and swallowing disorders in PD is limited and predominantly low in quality. The treatment of the speech disorder online is the most developed aspect of the technology-enabled management pathway. Future research needs to address technology-enabled management of cognitive-communication and swallowing disorders and the use of a more diverse range of technologies and management approaches to optimize SLT service delivery to people with PD. © 2018 Royal College of Speech and Language Therapists.

  9. Neuromorphic crossbar circuit with nanoscale filamentary-switching binary memristors for speech recognition.

    PubMed

    Truong, Son Ngoc; Ham, Seok-Jin; Min, Kyeong-Sik

    2014-01-01

    In this paper, a neuromorphic crossbar circuit with binary memristors is proposed for speech recognition. Binary memristors, which are based on a filamentary-switching mechanism, are more readily available and easier to fabricate than analog memristors, which require rarer materials and a more complicated fabrication process. Thus, we develop a neuromorphic crossbar circuit using filamentary-switching binary memristors rather than interface-switching analog memristors. The proposed binary memristor crossbar can recognize five vowels with 4-bit 64 input channels. The proposed crossbar is tested with 2,500 speech samples and verified to be able to recognize 89.2% of the tested samples. From the statistical simulation, the recognition rate of the binary memristor crossbar is estimated to degrade only from 89.2% to 80% even as the percentage variation in memristance increases from 0% to 15%. In contrast, the analog memristor crossbar loses its recognition rate sharply, from 96% to 9%, for the same percentage variation in memristance.
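
    The robustness claim can be illustrated by simulating the crossbar as a weight matrix whose conductances are randomly perturbed: with binary weights, moderate memristance variation rarely changes the winning output. The dimensions and patterns below are invented toys, not the paper's network:

```python
import numpy as np

# Toy simulation of a binary-memristor crossbar classifier under
# memristance variation (dimensions and patterns invented).
rng = np.random.default_rng(3)
n_inputs, n_classes = 64, 5
W = rng.integers(0, 2, size=(n_classes, n_inputs)).astype(float)  # binary weights

def classify(x, variation=0.0):
    # Multiply-accumulate along crossbar rows; each conductance is
    # perturbed by a relative error drawn uniformly from +/- variation.
    noise = 1 + rng.uniform(-variation, variation, size=W.shape)
    return int(np.argmax((W * noise) @ x))

x = rng.random(n_inputs)                 # one input pattern
clean = classify(x, variation=0.0)
trials = [classify(x, variation=0.15) for _ in range(1000)]
agree = np.mean([c == clean for c in trials])
print(f"winner unchanged in {agree:.1%} of trials at 15% variation")
```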

  10. Voice interactive electronic warning systems (VIEWS) - An applied approach to voice technology in the helicopter cockpit

    NASA Technical Reports Server (NTRS)

    Voorhees, J. W.; Bucher, N. M.

    1983-01-01

    The cockpit has been one of the most rapidly changing areas of new aircraft design over the past thirty years. In connection with these developments, a pilot can now be considered a decision maker/system manager as well as a vehicle controller. There is, however, a trend towards an information overload in the cockpit, and information processing problems begin to occur for the rotorcraft pilot. One approach to overcome the arising difficulties is based on the utilization of voice technology to improve the information transfer rate in the cockpit with respect to both input and output. Attention is given to the background of speech technology, the application of speech technology within the cockpit, voice interactive electronic warning system (VIEWS) simulation, and methodology. Information subsystems are considered along with a dynamic simulation study, and data collection.

  11. New Ideas for Speech Recognition and Related Technologies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzrichter, J F

    The ideas relating to the use of organ motion sensors for the purposes of speech recognition were first described by the author in spring 1994. During the past year, a series of productive collaborations between the author, Tom McEwan and Larry Ng ensued and has led to demonstrations, new sensor ideas, and algorithmic descriptions of a large number of speech recognition concepts. This document summarizes the basic concepts of recognizing speech once organ motions have been obtained. Micro-power radars and their uses for the measurement of body organ motions, such as those of the heart and lungs, have been demonstrated by Tom McEwan over the past two years. McEwan and I conducted a series of experiments using these instruments on vocal organ motions beginning in late spring, during which we observed motions of the vocal folds (i.e., cords), tongue, jaw, and related organs that are very useful for speech recognition and other purposes. These will be reviewed in a separate paper. Since late summer 1994, Lawrence Ng and I have worked to make many of the initial recognition ideas more rigorous and to investigate the applications of these new ideas to new speech recognition algorithms, to speech coding, and to speech synthesis. I introduce some of those ideas in section IV of this document, and we describe them more completely in the document following this one, UCRL-UR-120311. For the design and operation of micro-power radars and their application to body organ motions, the reader may contact Tom McEwan directly. The capability for using EM sensors (i.e., radar units) to measure body organ motions and positions has been available for decades. Impediments to their use appear to have been size, excessive power, lack of resolution, and lack of understanding of the value of organ motion measurements, especially as applied to speech-related technologies. However, with the invention of very low power, portable systems as demonstrated by McEwan at LLNL, researchers have begun to think differently about practical applications of such radars. In particular, his demonstrations of heart and lung motions have opened up many new areas of application for human and animal measurements.

  12. Blood glucose meters and accessibility to blind and visually impaired people.

    PubMed

    Burton, Darren M; Enigk, Matthew G; Lilly, John W

    2012-03-01

    In 2007, five blood glucose meters (BGMs) were introduced with the integrated speech output necessary for use by persons with vision loss. One of those five meters had fully integrated speech output, allowing a person with vision loss independence in accessing all features and functions of the meter. In comparison, 13 BGMs with integrated speech output were available in 2011. Accessibility attributes of these 13 meters were tabulated and product design features examined. All 13 meters were found to be usable by persons with vision loss to obtain a blood glucose measurement. However, only 4 of them featured the fully integrated speech output necessary for a person with vision loss to access all features and functions independently. © 2012 Diabetes Technology Society.

  13. A comparison of sensory-motor activity during speech in first and second languages.

    PubMed

    Simmonds, Anna J; Wise, Richard J S; Dhanjal, Novraj S; Leech, Robert

    2011-07-01

    A foreign language (L2) learned after childhood results in an accent. This functional neuroimaging study investigated speech in L2 as a sensory-motor skill. The hypothesis was that there would be an altered response in auditory and somatosensory association cortex, specifically the planum temporale and parietal operculum, respectively, when speaking in L2 relative to L1, independent of rate of speaking. These regions were selected for three reasons. First, an influential computational model proposes that these cortices integrate predictive feedforward and postarticulatory sensory feedback signals during articulation. Second, these adjacent regions (known as Spt) have been identified as a "sensory-motor interface" for speech production. Third, probabilistic anatomical atlases exist for these regions, to ensure the analyses are confined to sensory-motor differences between L2 and L1. The study used functional magnetic resonance imaging (fMRI), and participants produced connected overt speech. The first hypothesis was that there would be greater activity in the planum temporale and the parietal operculum when subjects spoke in L2 compared with L1, one interpretation being that there is less efficient postarticulatory sensory monitoring when speaking in the less familiar L2. The second hypothesis was that this effect would be observed in both cerebral hemispheres. Although Spt is considered to be left-lateralized, this is based on studies of covert speech, whereas overt speech is accompanied by sensory feedback to bilateral auditory and somatosensory cortices. Both hypotheses were confirmed by the results. These findings provide the basis for future investigations of sensory-motor aspects of language learning using serial fMRI studies.

  14. Decoding Speech With Integrated Hybrid Signals Recorded From the Human Ventral Motor Cortex.

    PubMed

    Ibayashi, Kenji; Kunii, Naoto; Matsuo, Takeshi; Ishishita, Yohei; Shimada, Seijiro; Kawai, Kensuke; Saito, Nobuhito

    2018-01-01

    Restoration of speech communication for locked-in patients by means of brain computer interfaces (BCIs) is currently an important area of active research. Among the neural signals obtained from intracranial recordings, single/multi-unit activity (SUA/MUA), local field potential (LFP), and electrocorticography (ECoG) are good candidates for an input signal for BCIs. However, the question of which signal or which combination of the three signal modalities is best suited for decoding speech production remains unverified. In order to record SUA, LFP, and ECoG simultaneously from a highly localized area of human ventral sensorimotor cortex (vSMC), we fabricated an electrode the size of which was 7 by 13 mm containing sparsely arranged microneedle and conventional macro contacts. We determined which signal modality is the most capable of decoding speech production, and tested if the combination of these signals could improve the decoding accuracy of spoken phonemes. Feature vectors were constructed from spike frequency obtained from SUAs and event-related spectral perturbation derived from ECoG and LFP signals, then input to the decoder. The results showed that the decoding accuracy for five spoken vowels was highest when features from multiple signals were combined and optimized for each subject, and reached 59% when averaged across all six subjects. This result suggests that multi-scale signals convey complementary information for speech articulation. The current study demonstrated that simultaneous recording of multi-scale neuronal activities could raise decoding accuracy even though the recording area is limited to a small portion of cortex, which is advantageous for future implementation of speech-assisting BCIs.
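
    The fusion step described above — concatenating per-trial spike-rate features with spectral-perturbation features before classification — can be sketched with a generic linear decoder. The sketch uses random stand-in data, with five vowel classes as in the study; the feature dimensions are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Sketch of hybrid-feature decoding: concatenate per-trial SUA spike-rate
# features with ECoG/LFP spectral features, then fit a linear decoder.
# (Random stand-in data; five vowel classes as in the study.)
rng = np.random.default_rng(4)
n_trials = 200
spike_feats = rng.normal(size=(n_trials, 20))     # e.g., unit firing rates
spectral_feats = rng.normal(size=(n_trials, 40))  # e.g., spectral perturbation bins
X = np.hstack([spike_feats, spectral_feats])      # integrated hybrid features
y = rng.integers(0, 5, size=n_trials)             # spoken vowel labels

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")   # chance is 0.20
```

    With real recordings, per-subject selection of which feature subsets to combine — as the authors did — would replace the simple concatenation shown here.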

  16. Audiomotor Perceptual Training Enhances Speech Intelligibility in Background Noise.

    PubMed

    Whitton, Jonathon P; Hancock, Kenneth E; Shannon, Jeffrey M; Polley, Daniel B

    2017-11-06

    Sensory and motor skills can be improved with training, but learning is often restricted to practice stimuli. As an exception, training on closed-loop (CL) sensorimotor interfaces, such as action video games and musical instruments, can impart a broad spectrum of perceptual benefits. Here we ask whether computerized CL auditory training can enhance speech understanding in levels of background noise that approximate a crowded restaurant. Elderly hearing-impaired subjects trained for 8 weeks on a CL game that, like a musical instrument, challenged them to monitor subtle deviations between predicted and actual auditory feedback as they moved their fingertip through a virtual soundscape. We performed our study as a randomized, double-blind, placebo-controlled trial by training other subjects in an auditory working-memory (WM) task. Subjects in both groups improved at their respective auditory tasks and reported comparable expectations for improved speech processing, thereby controlling for placebo effects. Whereas speech intelligibility was unchanged after WM training, subjects in the CL training group could correctly identify 25% more words in spoken sentences or digit sequences presented in high levels of background noise. Numerically, CL audiomotor training provided more than three times the benefit of our subjects' hearing aids for speech processing in noisy listening conditions. Gains in speech intelligibility could be predicted from gameplay accuracy and baseline inhibitory control. However, benefits did not persist in the absence of continuing practice. These studies employ stringent clinical standards to demonstrate that perceptual learning on a computerized audio game can transfer to "real-world" communication challenges. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. What Is Educational Technology?

    ERIC Educational Resources Information Center

    Ingle, Henry T.

    1975-01-01

    Featured in this issue are the English translations of two speeches delivered to graduate students in educational technology at Pontificia Universidade, Porto Alegre, Brazil. Henry Ingle defines educational technology in the traditional as well as modern sense, describes its essential elements, and discusses situations in which the use of…

  18. Use of speech-to-text technology for documentation by healthcare providers.

    PubMed

    Ajami, Sima

    2016-01-01

    Medical records are a critical component of a patient's treatment. However, documentation of patient-related information is considered a secondary activity in the provision of healthcare services, often leading to incomplete medical records and patient data of low quality. Advances in information technology (IT) in the health system and registration of information in electronic health records (EHR) using speech-to-text conversion software have facilitated service delivery. This narrative review is based on a literature search covering libraries, books, conference proceedings, the Science Direct, PubMed, ProQuest, Springer, and SID (Scientific Information Database) databases, and search engines such as Yahoo and Google. I used the following keywords and their combinations: speech recognition, automatic report documentation, voice to text software, healthcare, information, and voice recognition. Due to lack of knowledge of other languages, I searched all texts in English or Persian with no time limits. Of a total of 70 articles, only 42 were selected. Speech-to-text conversion technology offers opportunities to improve the documentation process of medical records, reduce the cost and time of recording information, enhance the quality of documentation, improve the quality of services provided to patients, and support healthcare providers in legal matters. Healthcare providers should recognize the impact of this technology on service delivery.
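
    As a concrete illustration of the underlying technology (not any specific system reviewed in the article), a minimal dictation-to-text step can be sketched with the open-source SpeechRecognition package for Python; the file name is hypothetical:

      import speech_recognition as sr

      recognizer = sr.Recognizer()
      with sr.AudioFile("dictated_note.wav") as source:   # hypothetical recording
          audio = recognizer.record(source)               # read the whole file

      try:
          text = recognizer.recognize_google(audio)       # cloud recognizer
          print("Draft note:", text)
      except sr.UnknownValueError:
          print("Speech unintelligible; flag for manual transcription.")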

  19. Speech intelligibility with helicopter noise: tests of three helmet-mounted communication systems.

    PubMed

    Ribera, John E; Mozo, Ben T; Murphy, Barbara A

    2004-02-01

    Military aviator helmet communications systems are designed to enhance speech intelligibility (SI) in background noise and reduce exposure to harmful levels of noise. Some aviators, over the course of their aviation career, develop noise-induced hearing loss that may affect their ability to perform required tasks. New technology can improve SI in noise for aviators with normal hearing as well as those with hearing loss. SI in noise scores were obtained from 40 rotary-wing aviators (20 with normal hearing and 20 with hearing-loss waivers). There were three communications systems evaluated: a standard SPH-4B, an SPH-4B aviator helmet modified with communications earplug (CEP), and an SPH-4B modified with active noise reduction (ANR). Subjects' SI was better in noise with newer technologies than with the standard issue aviator helmet. A significant number of aviators on waivers for hearing loss performed within the range of their normal hearing counterparts when wearing the newer technology. The rank order of perceived speech clarity was 1) CEP, 2) ANR, and 3) unmodified SPH-4B. To insure optimum SI in noise for rotary-wing aviators, consideration should be given to retrofitting existing aviator helmets with new technology, and incorporating such advances in communication systems of the future. Review of standards for determining fitness to fly is needed.

  20. A truly human interface: interacting face-to-face with someone whose words are determined by a computer program

    PubMed Central

    Corti, Kevin; Gillespie, Alex

    2015-01-01

    We use speech shadowing to create situations wherein people converse in person with a human whose words are determined by a conversational agent computer program. Speech shadowing involves a person (the shadower) repeating vocal stimuli originating from a separate communication source in real-time. Humans shadowing for conversational agent sources (e.g., chat bots) become hybrid agents (“echoborgs”) capable of face-to-face interlocution. We report three studies that investigated people’s experiences interacting with echoborgs and the extent to which echoborgs pass as autonomous humans. First, participants in a Turing Test spoke with a chat bot via either a text interface or an echoborg. Human shadowing did not improve the chat bot’s chance of passing but did increase interrogators’ ratings of how human-like the chat bot seemed. In our second study, participants had to decide whether their interlocutor produced words generated by a chat bot or simply pretended to be one. Compared to those who engaged a text interface, participants who engaged an echoborg were more likely to perceive their interlocutor as pretending to be a chat bot. In our third study, participants were naïve to the fact that their interlocutor produced words generated by a chat bot. Unlike those who engaged a text interface, the vast majority of participants who engaged an echoborg did not sense a robotic interaction. These findings have implications for android science, the Turing Test paradigm, and human–computer interaction. The human body, as the delivery mechanism of communication, fundamentally alters the social psychological dynamics of interactions with machine intelligence. PMID:26042066

  1. Using Text-to-Speech (TTS) for Audio Computer-Assisted Self-Interviewing (ACASI)

    ERIC Educational Resources Information Center

    Couper, Mick P.; Berglund, Patricia; Kirgis, Nicole; Buageila, Sarrah

    2016-01-01

    We evaluate the use of text-to-speech (TTS) technology for audio computer-assisted self-interviewing (ACASI). We use a quasi-experimental design, comparing the use of recorded human voice in the 2006-2010 National Survey of Family Growth with the use of TTS in the first year of the 2011-2013 survey, where the essential survey conditions are…
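
    For illustration only (the survey used a specific commercial TTS voice not named here), rendering a survey item with an off-the-shelf TTS engine such as pyttsx3 might look like this; the question text is a made-up example:

      import pyttsx3

      engine = pyttsx3.init()
      engine.setProperty("rate", 150)   # words per minute; slower can aid comprehension
      engine.say("In the past twelve months, have you seen a doctor or other health care provider?")
      engine.runAndWait()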

  2. Computer-Based Rehabilitation for Developing Speech and Language in Hearing-Impaired Children: A Systematic Review

    ERIC Educational Resources Information Center

    Simpson, Andrea; El-Refaie, Amr; Stephenson, Caitlin; Chen, Yi-Ping Phoebe; Deng, Dennis; Erickson, Shane; Tay, David; Morris, Meg E.; Doube, Wendy; Caelli, Terry

    2015-01-01

    The purpose of this systematic review was to examine whether online or computer-based technologies were effective in assisting the development of speech and language skills in children with hearing loss. Relevant studies of children with hearing loss were analysed with reference to (1) therapy outcomes, (2) factors affecting outcomes, and (3)…

  3. Speech-Enabled Tools for Augmented Interaction in E-Learning Applications

    ERIC Educational Resources Information Center

    Selouani, Sid-Ahmed A.; Lê, Tang-Hô; Benahmed, Yacine; O'Shaughnessy, Douglas

    2008-01-01

    This article presents systems that use speech technology, to emulate the one-on-one interaction a student can get from a virtual instructor. A web-based learning tool, the Learn IN Context (LINC+) system, designed and used in a real mixed-mode learning context for a computer (C++ language) programming course taught at the Université de Moncton…

  4. Investigating the Effectiveness of Speech-To-Text Recognition Applications on Learning Performance, Attention, and Meditation

    ERIC Educational Resources Information Center

    Shadiev, Rustam; Huang, Yueh-Min; Hwang, Jan-Pan

    2017-01-01

    In this study, the effectiveness of the application of speech-to-text recognition (STR) technology on enhancing learning and concentration in a calm state of mind, hereafter referred to as meditation (An intentional and self-regulated focusing of attention in order to relax and calm the mind), was investigated. This effectiveness was further…

  5. Evaluation of Different Speech and Touch Interfaces to In-Vehicle Music Retrieval Systems

    PubMed Central

    Garay-Vega, L.; Pradhan, A. K.; Weinberg, G.; Schmidt-Nielsen, B.; Harsham, B.; Shen, Y.; Divekar, G.; Romoser, M.; Knodler, M.; Fisher, D. L.

    2010-01-01

    In-vehicle music retrieval systems are becoming increasingly popular. Previous studies have shown that they pose a real hazard to drivers when the interface is a tactile one that requires multiple entries and a combination of manual control and visual feedback. Voice interfaces exist as an alternative. Such interfaces can require either multiple or single conversational turns. In this study, each of 17 participants between 18 and 30 years of age was asked to use three different music-retrieval systems (one with a multiple entry touch interface, the iPod™, one with a multiple turn voice interface, interface B, and one with a single turn voice interface, interface C) while driving through a virtual world. Measures of secondary task performance, eye behavior, vehicle control, and workload were recorded. When compared with the touch interface, the voice interfaces reduced the total time drivers spent with their eyes off the forward roadway, especially in prolonged glances, as well as both the total number of glances away from the forward roadway and the perceived workload. Furthermore, when compared with driving without a secondary task, neither voice interface significantly impacted hazard anticipation, the frequency of long glances away from the forward roadway, or vehicle control. The multiple turn voice interface (B) significantly increased both the time it took drivers to complete the task and the workload. The implications for interface design and safety are discussed. PMID:20380920

  6. Using assistive robots to promote inclusive education.

    PubMed

    Encarnação, P; Leite, T; Nunes, C; Nunes da Ponte, M; Adams, K; Cook, A; Caiado, A; Pereira, J; Piedade, G; Ribeiro, M

    2017-05-01

    This paper describes the development and test of physical and virtual integrated augmentative manipulation and communication assistive technologies (IAMCATs) that enable children with motor and speech impairments to manipulate educational items by controlling a robot with a gripper, while communicating through a speech generating device. Nine children with disabilities, nine regular and nine special education teachers participated in the study. Teachers adapted academic activities so they could also be performed by the children with disabilities using the IAMCAT. An inductive content analysis of the teachers' interviews before and after the intervention was performed. Teachers considered the IAMCAT to be a useful resource that can be integrated into the regular class dynamics while respecting their curricular planning. It had a positive impact on children with disabilities and on the educational community. However, teachers pointed out the difficulties in managing the class, even with another adult present, due to the extra time required by children with disabilities to complete the activities. The developed assistive technologies enable children with disabilities to participate in academic activities, but full inclusion would require another adult in class and strategies to deal with the additional time required by children to complete the activities. Implications for Rehabilitation: Integrated augmentative manipulation and communication assistive technologies are useful resources to promote the participation of children with motor and speech impairments in classroom activities. Virtual tools, running on a computer screen, may be easier to use, but further research is needed to evaluate their effectiveness compared to physical tools. Full participation of children with motor and speech impairments in academic activities using these technologies requires another adult in class and adequate strategies to manage the extra time the child with disabilities may require to complete the activities.

  7. Robust recognition of loud and Lombard speech in the fighter cockpit environment

    NASA Astrophysics Data System (ADS)

    Stanton, Bill J., Jr.

    1988-08-01

    There are a number of challenges associated with incorporating speech recognition technology into the fighter cockpit. One of the major problems is the wide range of variability in the pilot's voice that can result from changing levels of stress and workload. Increasing the training set to include abnormal speech is not an attractive option because of the innumerable conditions that would have to be represented and the inordinate amount of time needed to collect such a training set. A more promising approach is to study subsets of abnormal speech produced under controlled cockpit conditions with the purpose of characterizing reliable shifts that occur relative to normal speech. That was the aim of this research. Analyses were conducted for 18 features on 17,671 phoneme tokens across eight speakers for normal, loud, and Lombard speech. A consistent migration of energy in the sonorants was discovered. This finding of reliable energy shifts led to the development of a method to reduce or eliminate these shifts in the Euclidean distances between LPC log magnitude spectra. This method significantly improved recognition performance for loud and Lombard speech. Discrepancies in recognition error rates between normal and abnormal speech were reduced by approximately 50 percent for all eight speakers combined.
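
    The distance measure named above, the Euclidean distance between LPC log-magnitude spectra, can be sketched as follows. This is a generic reconstruction, not the study's code; frame extraction, the 18 analysis features, and the shift-compensation step are omitted, and the two synthetic frames merely stand in for normal and loud tokens:

      import numpy as np
      import librosa
      from scipy.signal import freqz

      def lpc_log_spectrum(frame, sr, order=12, n_freq=256):
          """LPC log-magnitude spectrum (dB) of one speech frame."""
          a = librosa.lpc(frame, order=order)        # all-pole model coefficients
          _, h = freqz(1.0, a, worN=n_freq, fs=sr)   # H(f) = 1 / A(f)
          return 20.0 * np.log10(np.abs(h) + 1e-12)

      sr = 16000
      t = np.arange(400) / sr
      normal = np.sin(2 * np.pi * 500 * t) + 0.3 * np.sin(2 * np.pi * 1500 * t)
      loud = np.sin(2 * np.pi * 550 * t) + 0.6 * np.sin(2 * np.pi * 1500 * t)

      d = np.linalg.norm(lpc_log_spectrum(normal, sr) - lpc_log_spectrum(loud, sr))
      print(f"Euclidean distance between LPC log spectra: {d:.1f}")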

  8. Speech Understanding with a New Implant Technology: A Comparative Study with a New Nonskin Penetrating Baha System

    PubMed Central

    Caversaccio, Marco

    2014-01-01

    Objective. To compare hearing and speech understanding between a new, nonskin-penetrating Baha system (Baha Attract) and the current Baha system using a skin-penetrating abutment. Methods. Hearing and speech understanding were measured in 16 experienced Baha users. The transmission path via the abutment was compared to a simulated Baha Attract transmission path by attaching the implantable magnet to the abutment and then by adding a sample of artificial skin and the external parts of the Baha Attract system. Four different measurements were performed: bone conduction thresholds directly through the sound processor (BC Direct), aided sound field thresholds, aided speech understanding in quiet, and aided speech understanding in noise. Results. The simulated Baha Attract transmission path introduced an attenuation starting from approximately 5 dB at 1000 Hz, increasing to 20–25 dB above 6000 Hz. However, aided sound field thresholds show smaller differences, and aided speech understanding in quiet and in noise does not differ significantly between the two transmission paths. Conclusion. The Baha Attract system transmission path introduces predominantly high-frequency attenuation. This attenuation can be partially compensated for by adequate fitting of the speech processor. No significant decrease in speech understanding in either quiet or noise was found. PMID:25140314

  9. On the Development of Speech Resources for the Mixtec Language

    PubMed Central

    2013-01-01

    The Mixtec language is one of the main native languages in Mexico. In general, due to urbanization, discrimination, and limited attempts to promote the culture, the native languages are disappearing. Most of the information available about the Mixtec language is in written form, as in dictionaries, which, although they include examples of how to pronounce Mixtec words, are not as reliable as listening to the correct pronunciation from a native speaker. Formal acoustic resources, such as speech corpora, are almost non-existent for the Mixtec, and no speech technologies are known to have been developed for it. This paper presents the development of the following resources for the Mixtec language: (1) a speech database of traditional narratives of the Mixtec culture spoken by a native speaker (labelled at the phonetic and orthographic levels by means of spectral analysis) and (2) a native speaker-adaptive automatic speech recognition (ASR) system (trained with the speech database) integrated with a Mixtec-to-Spanish/Spanish-to-Mixtec text translator. The speech database, although small and limited to a single variant, was reliable enough to build the multiuser speech application, which achieved a mean recognition/translation performance of up to 94.36% in experiments with non-native speakers (the target users). PMID:23710134

  10. Real time speech formant analyzer and display

    DOEpatents

    Holland, George E.; Struve, Walter S.; Homer, John F.

    1987-01-01

    A speech analyzer for interpretation of sound includes a sound input which converts the sound into a signal representing the sound. The signal is passed through a plurality of frequency pass filters to derive a plurality of frequency formants. These formants are converted to voltage signals by frequency-to-voltage converters and then are prepared for visual display in continuous real time. Parameters from the inputted sound are also derived and displayed. The display may then be interpreted by the user. The preferred embodiment includes a microprocessor which is interfaced with a television set for display of the sound formants. The microprocessor software enables the sound analyzer to present a variety of display modes for interpretive and therapeutic use by the user.
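
    A rough software analogue of the patented pipeline (a hedged sketch, not the patent's analog circuit) runs the signal through a bank of band-pass filters and treats the smoothed magnitude of each band as the "voltage" that would drive the display; the band edges below are illustrative guesses at formant regions:

      import numpy as np
      from scipy.signal import butter, sosfilt

      sr = 16000
      t = np.arange(sr) / sr
      speech = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

      bands = [(200, 900), (900, 2500), (2500, 3500)]   # rough F1/F2/F3 regions
      for lo, hi in bands:
          sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
          level = np.abs(sosfilt(sos, speech)).mean()   # crude "voltage" per band
          print(f"{lo:4d}-{hi:4d} Hz band level: {level:.3f}")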

  11. Real time speech formant analyzer and display

    DOEpatents

    Holland, G.E.; Struve, W.S.; Homer, J.F.

    1987-02-03

    A speech analyzer for interpretation of sound includes a sound input which converts the sound into a signal representing the sound. The signal is passed through a plurality of frequency pass filters to derive a plurality of frequency formants. These formants are converted to voltage signals by frequency-to-voltage converters and then are prepared for visual display in continuous real time. Parameters from the inputted sound are also derived and displayed. The display may then be interpreted by the user. The preferred embodiment includes a microprocessor which is interfaced with a television set for display of the sound formants. The microprocessor software enables the sound analyzer to present a variety of display modes for interpretive and therapeutic use by the user. 19 figs.

  12. Lip Movement Exaggerations During Infant-Directed Speech

    PubMed Central

    Green, Jordan R.; Nip, Ignatius S. B.; Wilson, Erin M.; Mefferd, Antje S.; Yunusova, Yana

    2011-01-01

    Purpose Although a growing body of literature has identified the positive effects of visual speech on speech and language learning, oral movements of infant-directed speech (IDS) have rarely been studied. This investigation used 3-dimensional motion capture technology to describe how mothers modify their lip movements when talking to their infants. Method Lip movements were recorded from 25 mothers as they spoke to their infants and other adults. Lip shapes were analyzed for differences across speaking conditions. The maximum fundamental frequency, duration, acoustic intensity, and first and second formant frequency of each vowel also were measured. Results Lip movements were significantly larger during IDS than during adult-directed speech, although the exaggerations were vowel specific. All of the vowels produced during IDS were characterized by an elevated vocal pitch and a slowed speaking rate when compared with vowels produced during adult-directed speech. Conclusion The pattern of lip-shape exaggerations did not provide support for the hypothesis that mothers produce exemplar visual models of vowels during IDS. Future work is required to determine whether the observed increases in vertical lip aperture engender visual and acoustic enhancements that facilitate the early learning of speech. PMID:20699342

  13. Speech and swallowing disorders in Parkinson disease.

    PubMed

    Sapir, Shimon; Ramig, Lorraine; Fox, Cynthia

    2008-06-01

    To review recent research and clinical studies pertaining to the nature, diagnosis, and treatment of speech and swallowing disorders in Parkinson disease. Although some studies indicate improvement in voice and speech with dopamine therapy and deep brain stimulation of the subthalamic nucleus, others show minimal or adverse effects. Repetitive transcranial magnetic stimulation of the mouth motor cortex and injection of collagen in the vocal folds have preliminary data supporting improvement in phonation in people with Parkinson disease. Treatments focusing on vocal loudness, specifically LSVT LOUD (Lee Silverman Voice Treatment), have been effective for the treatment of speech disorders in Parkinson disease. Changes in brain activity due to LSVT LOUD provide preliminary evidence for neural plasticity. Computer-based technology makes the Lee Silverman Voice Treatment available to a large number of users. A rat model for studying neuropharmacologic effects on vocalization in Parkinson disease has been developed. New diagnostic methods of speech and swallowing are also available as the result of recent studies. Speech rehabilitation with the LSVT LOUD is highly efficacious and scientifically tested. There is a need for more studies to improve understanding, diagnosis, prevention, and treatment of speech and swallowing disorders in Parkinson disease.

  14. Changes in Speech Chunking in Reading Aloud is a marker of Mild Cognitive Impairment and Mild-to-Moderate Alzheimer's Disease.

    PubMed

    De Looze, Celine; Kelly, Finnian; Crosby, Lisa; Vourdanou, Aisling; Coen, Robert F; Walsh, Cathal; Lawlor, Brian A; Reilly, Richard B

    2018-04-04

    Speech and language impairments, generally attributed to lexico-semantic deficits, have been documented in Mild Cognitive Impairment (MCI) and Alzheimer's disease (AD). This study investigates the temporal organisation of speech (reflective of speech production planning) in reading aloud in relation to cognitive impairment, particularly working memory and attention deficits in MCI and AD. The discriminative ability of temporal features extracted from a newly designed read-speech task is also evaluated for the detection of MCI and AD. Sixteen patients with MCI, eighteen patients with mild-to-moderate AD and thirty-six healthy controls (HC) underwent a battery of neuropsychological tests and read a set of sentences varying in cognitive load, probed by manipulating sentence length and syntactic complexity. Our results show that mild-to-moderate AD is associated with a general slowness of speech, attributed to a higher number of speech chunks, silent pauses and dysfluencies, and slower speech and articulation rates. Speech chunking in the context of high cognitive-linguistic demand appears to be an informative marker of MCI, specifically related to early deficits in working memory and attention. In addition, Linear Discriminant Analysis shows that the ROC AUCs (Areas Under the Receiver Operating Characteristic Curves) for identifying MCI vs. HC, MCI vs. AD and AD vs. HC using these speech characteristics are 0.75, 0.90 and 0.94, respectively. The implementation of connected speech-based technologies in clinical and community settings may provide additional information for the early detection of MCI and AD. Copyright © Bentham Science Publishers; for any queries, please email epub@benthamscience.org.
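
    The classification analysis reported above can be sketched generically: linear discriminant analysis over temporal speech features, scored by ROC AUC. The features and group separation below are synthetic placeholders, not the study's data:

      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import cross_val_predict

      rng = np.random.default_rng(1)
      y = np.r_[np.ones(16), np.zeros(36)]             # 1 = MCI, 0 = healthy control
      # Placeholder features: speech rate, articulation rate, pauses, chunks
      X = rng.normal(size=(52, 4)) + y[:, None] * 0.8  # shift the MCI group slightly

      lda = LinearDiscriminantAnalysis()
      scores = cross_val_predict(lda, X, y, cv=5, method="predict_proba")[:, 1]
      print(f"ROC AUC, MCI vs. HC: {roc_auc_score(y, scores):.2f}")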

  15. Assistive Devices for People with Hearing, Voice, Speech, or Language Disorders

    MedlinePlus

    What research is being conducted on assistive technology? The National Institute on Deafness and Other Communication Disorders (NIDCD) funds research into several areas of assistive technology, such as those described below. Improved devices for …

  16. Interface: The UN Speaks to American Educators. The Major Speeches by UN Officials at the "Global Crossroads" National Assembly, Washington, DC, May 1984.

    ERIC Educational Resources Information Center

    Bhagat, Susheila R., Ed.

    Development education, the process of sensitizing citizens of industrialized countries to the problems of the third world, and related issues of global development, has gained acceptance among educators in recent years. To respond to this global approach to development, a National Assembly ("Global Crossroads: Educating Americans for…

  17. The Intonation-Syntax Interface in the Speech of Individuals with Parkinson's Disease

    ERIC Educational Resources Information Center

    MacPherson, Megan K.; Huber, Jessica E.; Snow, David P.

    2011-01-01

    Purpose: This study examined the effect of Parkinson's disease (PD) on the intonational marking of final and nonfinal syntactic boundaries and investigated whether the effect of PD on intonation was sex specific. Method: Eight women and 8 men with PD and 16 age- and sex-matched control participants read a passage at comfortable pitch, rate, and…

  18. Integrating Speech and Iconic Gestures in a Stroop-Like Task: Evidence for Automatic Processing

    ERIC Educational Resources Information Center

    Kelly, Spencer D.; Creigh, Peter; Bartolotti, James

    2010-01-01

    Previous research has demonstrated a link between language and action in the brain. The present study investigates the strength of this neural relationship by focusing on a potential interface between the two systems: co-speech iconic gesture. Participants performed a Stroop-like task in which they watched videos of a man and a woman speaking and…

  19. Stated Preferences for Components of a Personal Guidance System for Nonvisual Navigation

    ERIC Educational Resources Information Center

    Golledge, Reginald G.; Marston, James R.; Loomis, Jack M.; Klatzky, Roberta L.

    2004-01-01

    This article reports on a survey of the preferences of visually impaired persons for a possible personal navigation device. The results showed that the majority of participants preferred speech input and output interfaces, were willing to use such a product, thought that they would make more trips with such a device, and had some concerns about…

  20. Advanced Electronic Technology

    DTIC Science & Technology

    1977-11-15

    Electronics … Materials Research … Microelectronics … Surface-Wave Technology … Data Systems Division … Digital Voice Processing, Packet Speech, Wideband Integrated Voice/Data Technology, Radar Signal Processing Technology, Nuclear Safety Designs … facilities make it possible to track the status of these jobs, retrieve their job control language listings, and direct a copy of printed or punched…

  1. Multiple benefits of personal FM system use by children with auditory processing disorder (APD).

    PubMed

    Johnston, Kristin N; John, Andrew B; Kreisman, Nicole V; Hall, James W; Crandell, Carl C

    2009-01-01

    Children with auditory processing disorders (APD) were fitted with Phonak EduLink FM devices for home and classroom use. Baseline measures of the children with APD, prior to FM use, documented significantly lower speech-perception scores, evidence of decreased academic performance, and psychosocial problems in comparison to an age- and gender-matched control group. Repeated measures during the school year demonstrated speech-perception improvement in noisy classroom environments as well as significant academic and psychosocial benefits. Compared with the control group, the children with APD showed greater speech-perception advantage with FM technology. Notably, after prolonged FM use, even unaided (no FM device) speech-perception performance was improved in the children with APD, suggesting the possibility of fundamentally enhanced auditory system function.

  2. Audio-Visual Situational Awareness for General Aviation Pilots

    NASA Technical Reports Server (NTRS)

    Spirkovska, Lilly; Lodha, Suresh K.; Clancy, Daniel (Technical Monitor)

    2001-01-01

    Weather is one of the major causes of general aviation accidents. Researchers are addressing this problem from various perspectives including improving meteorological forecasting techniques, collecting additional weather data automatically via on-board sensors and "flight" modems, and improving weather data dissemination and presentation. We approach the problem from the improved presentation perspective and propose weather visualization and interaction methods tailored for general aviation pilots. Our system, Aviation Weather Data Visualization Environment (AWE), utilizes information visualization techniques, a direct manipulation graphical interface, and a speech-based interface to improve a pilot's situational awareness of relevant weather data. The system design is based on a user study and feedback from pilots.

  3. MILCOM '85 - Military Communications Conference, Boston, MA, October 20-23, 1985, Conference Record. Volumes 1, 2, & 3

    NASA Astrophysics Data System (ADS)

    The present conference on the development status of communications systems in the context of electronic warfare gives attention to topics in spread spectrum code acquisition, digital speech technology, fiber-optics communications, free space optical communications, the networking of HF systems, and applications and evaluation methods for digital speech. Also treated are issues in local area network system design, coding techniques and applications, technology applications for HF systems, receiver technologies, software development status, channel simulation/prediction methods, C3 networking, spread spectrum networks, the improvement of communication efficiency and reliability through technical control methods, mobile radio systems, and adaptive antenna arrays. Finally, communications system cost analyses, spread spectrum performance, voice and image coding, switched networks, and microwave GaAs ICs are considered.

  4. The effects of speech output technology in the learning of graphic symbols.

    PubMed Central

    Schlosser, R W; Belfiore, P J; Nigam, R; Blischak, D; Hetzroni, O

    1995-01-01

    The effects of auditory stimuli in the form of synthetic speech output on the learning of graphic symbols were evaluated. Three adults with severe to profound mental retardation and communication impairments were taught to point to lexigrams when presented with words under two conditions. In the first condition, participants used a voice output communication aid to receive synthetic speech as antecedent and consequent stimuli. In the second condition, with a nonelectronic communication board, participants did not receive synthetic speech. A parallel treatments design was used to evaluate the effects of the synthetic speech output as an added component of the augmentative and alternative communication system. The 3 participants reached criterion when provided with the auditory stimuli. Although 2 participants also reached criterion when not provided with the auditory stimuli, the addition of auditory stimuli resulted in more efficient learning and a decreased error rate. Maintenance results, however, indicated no differences between conditions. Findings suggest that auditory stimuli in the form of synthetic speech contribute to the efficient acquisition of graphic communication symbols. PMID:14743828

  5. Scientific bases of human-machine communication by voice.

    PubMed Central

    Schafer, R W

    1995-01-01

    The scientific bases for human-machine communication by voice are in the fields of psychology, linguistics, acoustics, signal processing, computer science, and integrated circuit technology. The purpose of this paper is to highlight the basic scientific and technological issues in human-machine communication by voice and to point out areas of future research opportunity. The discussion is organized around the following major issues in implementing human-machine voice communication systems: (i) hardware/software implementation of the system, (ii) speech synthesis for voice output, (iii) speech recognition and understanding for voice input, and (iv) usability factors related to how humans interact with machines. PMID:7479802

  6. Auditory-motor interaction revealed by fMRI: speech, music, and working memory in area Spt.

    PubMed

    Hickok, Gregory; Buchsbaum, Bradley; Humphries, Colin; Muftuler, Tugan

    2003-07-01

    The concept of auditory-motor interaction pervades speech science research, yet the cortical systems supporting this interface have not been elucidated. Drawing on experimental designs used in recent work in sensory-motor integration in the cortical visual system, we used fMRI in an effort to identify human auditory regions with both sensory and motor response properties, analogous to single-unit responses in known visuomotor integration areas. The sensory phase of the task involved listening to speech (nonsense sentences) or music (novel piano melodies); the "motor" phase of the task involved covert rehearsal/humming of the auditory stimuli. A small set of areas in the superior temporal and temporal-parietal cortex responded both during the listening phase and the rehearsal/humming phase. A left lateralized region in the posterior Sylvian fissure at the parietal-temporal boundary, area Spt, showed particularly robust responses to both phases of the task. Frontal areas also showed combined auditory + rehearsal responsivity consistent with the claim that the posterior activations are part of a larger auditory-motor integration circuit. We hypothesize that this circuit plays an important role in speech development as part of the network that enables acoustic-phonetic input to guide the acquisition of language-specific articulatory-phonetic gestures; this circuit may play a role in analogous musical abilities. In the adult, this system continues to support aspects of speech production, and, we suggest, supports verbal working memory.

  7. The steady-state response of the cerebral cortex to the beat of music reflects both the comprehension of music and attention

    PubMed Central

    Meltzer, Benjamin; Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias

    2015-01-01

    The brain’s analyses of speech and music share a range of neural resources and mechanisms. Music displays a temporal structure of complexity similar to that of speech, unfolds over comparable timescales, and elicits cognitive demands in tasks involving comprehension and attention. During speech processing, synchronized neural activity of the cerebral cortex in the delta and theta frequency bands tracks the envelope of a speech signal, and this neural activity is modulated by high-level cortical functions such as speech comprehension and attention. It remains unclear, however, whether the cortex also responds to the natural rhythmic structure of music and how the response, if present, is influenced by higher cognitive processes. Here we employ electroencephalography to show that the cortex responds to the beat of music and that this steady-state response reflects musical comprehension and attention. We show that the cortical response to the beat is weaker when subjects listen to a familiar tune than when they listen to an unfamiliar, non-sensical musical piece. Furthermore, we show that in a task of intermodal attention there is a larger neural response at the beat frequency when subjects attend to a musical stimulus than when they ignore the auditory signal and instead focus on a visual one. Our findings may be applied in clinical assessments of auditory processing and music cognition as well as in the construction of auditory brain-machine interfaces. PMID:26300760
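
    The steady-state measure described above is commonly quantified as the EEG amplitude at the beat frequency relative to neighboring frequency bins. A minimal sketch with synthetic data and an assumed sampling rate follows (the generic analysis idea, not the authors' pipeline):

      import numpy as np

      fs = 256                        # assumed EEG sampling rate (Hz)
      beat_hz = 2.0                   # beat frequency of a 120 bpm stimulus
      t = np.arange(60 * fs) / fs     # one minute of data
      rng = np.random.default_rng(2)
      eeg = 0.5 * np.sin(2 * np.pi * beat_hz * t) + rng.normal(size=t.size)

      spectrum = np.abs(np.fft.rfft(eeg)) / t.size
      freqs = np.fft.rfftfreq(t.size, 1 / fs)
      k = np.argmin(np.abs(freqs - beat_hz))           # bin at the beat frequency
      noise = np.r_[spectrum[k - 5:k - 1], spectrum[k + 2:k + 6]].mean()
      print(f"Beat-frequency SNR: {spectrum[k] / noise:.1f}")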

  8. Human factors issues associated with the use of speech technology in the cockpit

    NASA Technical Reports Server (NTRS)

    Kersteen, Z. A.; Damos, D.

    1983-01-01

    The human factors issues associated with the use of voice technology in the cockpit are summarized. The formulation of the LHX avionics suite is described and the allocation of tasks to voice in the cockpit is discussed. State-of-the-art speech recognition technology is reviewed. Finally, a questionnaire designed to tap pilot opinions concerning the allocation of tasks to voice input and output in the cockpit is presented. This questionnaire was designed to be administered to operational AH-1G Cobra gunship pilots. Half of the questionnaire deals specifically with the AH-1G cockpit and the types of tasks pilots would like to have performed by voice in this existing rotorcraft. The remaining portion of the questionnaire deals with an undefined rotorcraft of the future and is aimed at determining what types of tasks these pilots would like to have performed by voice technology if anything was possible, i.e. if there were no technological constraints.

  9. Network Speech Systems Technology Program.

    DTIC Science & Technology

    1980-09-30

    …recognized that the lumped-speaker approximation could be extended even more generally to include cases of combined circuit-switched speech and packet…based on these tables. The first function is an important element of the more general task of system control for a switched network, which in…programs are in preparation, as described below, for both steady-state evaluation and dynamic performance simulation of the algorithm in general

  10. Evaluating the iPad Mini® as a Speech-Generating Device in the Acquisition of a Discriminative Mand Repertoire for Young Children with Autism

    ERIC Educational Resources Information Center

    Lorah, Elizabeth R.

    2018-01-01

    There has been an increased interest in research evaluating the use of handheld computing technology as speech-generating devices (SGD) for children with autism. However, given the reliance on single-subject research methodology, replications of these investigations are necessary. This study presents a replication with variation, of a method for…

  11. The Effect of Speech-to-Text Technology on Learning a Writing Strategy

    ERIC Educational Resources Information Center

    Haug, Katrina N.; Klein, Perry D.

    2018-01-01

    Previous research has shown that speech-to-text (STT) software can support students in producing a given piece of writing. This is the 1st study to investigate the use of STT to teach a writing strategy. We pretested 45 Grade 5 students on argument writing and trained them to use STT. Students participated in 4 lessons on an argument writing…

  12. Problems in Preparing for the English Impromptu Speech Contest: The Case of Yuanpei Institute of Science and Technology in Taiwan

    ERIC Educational Resources Information Center

    Hsieh, Shu-min

    2006-01-01

    Entering an "English Impromptu Speech Contest" intimidates many students who do not have a good command of the English language. Some choose to give up before the contest date while others stand speechless on the stage. This paper identifies a range of problems confronted by contestants from my college, the Yuanpei Institute of Science…

  13. Keynote: FarNet Ten Years On--The Past, Present, and Future for Distance Learners

    ERIC Educational Resources Information Center

    Alexander-Bennett, Carolyn

    2016-01-01

    This think piece by Carolyn Alexander-Bennett is a reflection of her keynote speech at DEANZ2016 conference, which was held from 17-20th April at the University of Waikato, New Zealand. In her speech Carolyn revisits the issues, developments, and technology trends that led to the birth of FarNet (an online cluster of schools catering for the…

  14. Impact of Hearing Aid Technology on Outcomes in Daily Life II: Speech Understanding and Listening Effort.

    PubMed

    Johnson, Jani A; Xu, Jingjing; Cox, Robyn M

    2016-01-01

    Modern hearing aid (HA) devices include a collection of acoustic signal-processing features designed to improve listening outcomes in a variety of daily auditory environments. Manufacturers market these features at successive levels of technological sophistication. The features included in costlier premium hearing devices are designed to result in further improvements to daily listening outcomes compared with the features included in basic hearing devices. However, independent research has not substantiated such improvements. This research was designed to explore differences in speech-understanding and listening-effort outcomes for older adults using premium-feature and basic-feature HAs in their daily lives. For this participant-blinded, repeated, crossover trial 45 older adults (mean age 70.3 years) with mild-to-moderate sensorineural hearing loss wore each of four pairs of bilaterally fitted HAs for 1 month. HAs were premium- and basic-feature devices from two major brands. After each 1-month trial, participants' speech-understanding and listening-effort outcomes were evaluated in the laboratory and in daily life. Three types of speech-understanding and listening-effort data were collected: measures of laboratory performance, responses to standardized self-report questionnaires, and participant diary entries about daily communication. The only statistically significant superiority for the premium-feature HAs occurred for listening effort in the loud laboratory condition and was demonstrated for only one of the tested brands. The predominant complaint of older adults with mild-to-moderate hearing impairment is difficulty understanding speech in various settings. The combined results of all the outcome measures used in this research suggest that, when fitted using scientifically based practices, both premium- and basic-feature HAs are capable of providing considerable, but essentially equivalent, improvements to speech understanding and listening effort in daily life for this population. For HA providers to make evidence-based recommendations to their clientele with hearing impairment it is essential that further independent research investigates the relative benefit/deficit of different levels of hearing technology across brands and manufacturers in these and other real-world listening domains.

  15. Teaching mindfulness meditation to adults with severe speech and physical impairments: An exploratory study.

    PubMed

    Goodrich, Elena; Wahbeh, Helané; Mooney, Aimee; Miller, Meghan; Oken, Barry S

    2015-01-01

    People with severe speech and physical impairments may benefit from mindfulness meditation training because it has the potential to enhance their ability to cope with anxiety, depression and pain and improve their attentional capacity to use brain-computer interface systems. Seven adults with severe speech and physical impairments (SSPI) - defined as speech that is understood less than 25% of the time and/or severely reduced hand function for writing/typing - participated in this exploratory, uncontrolled intervention study. The objectives were to describe the development and implementation of a six-week mindfulness meditation intervention and to identify feasible outcome measures in this population. The weekly intervention was delivered by an instructor in the participant's home, and participants were encouraged to practise daily using audio recordings. Objectively measured adherence to home practice averaged 10.2 minutes per day. Exploratory outcome measures were an n-back working memory task, the Attention Process Training-II Attention Questionnaire, the Pittsburgh Sleep Quality Index, the Perceived Stress Scale, the Positive and Negative Affect Schedule, and a qualitative feedback survey. There were no statistically significant pre-post results in this small sample, yet administration of the measures proved feasible, and qualitative reports were overall positive. Obstacles to teaching mindfulness meditation to persons with SSPI are reported, and solutions are proposed.

  16. Evaluating Microcomputer Access Technology for Use by Visually Impaired Students.

    ERIC Educational Resources Information Center

    Ruconich, Sandra

    1984-01-01

    The article outlines advantages and limitations of five types of access to microcomputer technology for visually impaired students: electronic braille, paper braille, Optacon, synthetic speech, and enlarged print. Additional considerations in access decisions are noted. (CL)

  17. "Rate My Therapist": Automated Detection of Empathy in Drug and Alcohol Counseling via Speech and Language Processing.

    PubMed

    Xiao, Bo; Imel, Zac E; Georgiou, Panayiotis G; Atkins, David C; Narayanan, Shrikanth S

    2015-01-01

    The technology for evaluating patient-provider interactions in psychotherapy (observational coding) has not changed in 70 years. It is labor-intensive, error-prone, and expensive, limiting its use in evaluating psychotherapy in the real world. Engineering solutions from speech and language processing provide new methods for the automatic evaluation of provider ratings from session recordings. The primary data are 200 Motivational Interviewing (MI) sessions from a study on MI training methods, with observer ratings of counselor empathy. Automatic Speech Recognition (ASR) was used to transcribe sessions, and the resulting words were used in a text-based predictive model of empathy. Two supporting datasets trained the speech processing tasks, including ASR (1200 transcripts from heterogeneous psychotherapy sessions and 153 transcripts and session recordings from 5 MI clinical trials). The accuracy of computationally derived empathy ratings was evaluated against human ratings for each provider. Computationally derived empathy scores and classifications (high vs. low) were highly accurate against human-based codes and classifications, with a correlation of 0.65 and an F-score (the harmonic mean of precision and recall) of 0.86, respectively. Empathy prediction using human transcription as input (as opposed to ASR) resulted in a slight increase in prediction accuracy, suggesting that the fully automatic system with ASR is relatively robust. Using speech and language processing methods, it is possible to generate accurate predictions of provider performance in psychotherapy from audio recordings alone. This technology can support large-scale evaluation of psychotherapy for dissemination and process studies.
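
    The text-based modeling step can be sketched in miniature: bag-of-words features from transcripts feed a linear model whose cross-validated predictions are correlated with human empathy codes. The corpus, ratings, and TF-IDF-plus-ridge model below are toy stand-ins, not the study's data or final model:

      import numpy as np
      from scipy.stats import pearsonr
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import Ridge
      from sklearn.model_selection import cross_val_predict

      transcripts = [
          "tell me more about what that was like for you",
          "you need to stop drinking it is that simple",
          "it sounds like this has been really hard for you",
          "why did you do that again after what happened",
      ] * 10                                           # toy corpus
      empathy = np.array([6.0, 2.0, 6.5, 2.5] * 10)    # toy observer ratings

      X = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(transcripts)
      pred = cross_val_predict(Ridge(alpha=1.0), X, empathy, cv=5)
      r, _ = pearsonr(pred, empathy)
      print(f"Correlation with human empathy codes: r = {r:.2f}")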

  18. Nonlinear frequency compression: effects on sound quality ratings of speech and music.

    PubMed

    Parsa, Vijay; Scollie, Susan; Glista, Danielle; Seelisch, Andreas

    2013-03-01

    Frequency lowering technologies offer an alternative amplification solution for severe to profound high frequency hearing losses. While frequency lowering technologies may improve audibility of high frequency sounds, the very nature of this processing can affect the perceived sound quality. This article reports the results from two studies that investigated the impact of a nonlinear frequency compression (NFC) algorithm on perceived sound quality. In the first study, the cutoff frequency and compression ratio parameters of the NFC algorithm were varied, and their effect on the speech quality was measured subjectively with 12 normal hearing adults, 12 normal hearing children, 13 hearing impaired adults, and 9 hearing impaired children. In the second study, 12 normal hearing and 8 hearing impaired adult listeners rated the quality of speech in quiet, speech in noise, and music after processing with a different set of NFC parameters. Results showed that the cutoff frequency parameter had more impact on sound quality ratings than the compression ratio, and that the hearing impaired adults were more tolerant to increased frequency compression than normal hearing adults. No statistically significant differences were found in the sound quality ratings of speech-in-noise and music stimuli processed through various NFC settings by hearing impaired listeners. These findings suggest that there may be an acceptable range of NFC settings for hearing impaired individuals where sound quality is not adversely affected. These results may assist an Audiologist in clinical NFC hearing aid fittings for achieving a balance between high frequency audibility and sound quality.
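
    The two NFC parameters studied above define a frequency-warping function. A common textbook formulation (a sketch, not any manufacturer's algorithm) leaves frequencies below the cutoff unchanged and compresses those above it on a logarithmic axis by the compression ratio:

      import numpy as np

      def nfc_map(f_in, cutoff=2000.0, ratio=2.0):
          """Map input frequency (Hz) to output frequency under NFC."""
          f_in = np.asarray(f_in, dtype=float)
          compressed = cutoff * (f_in / cutoff) ** (1.0 / ratio)
          return np.where(f_in > cutoff, compressed, f_in)

      for f in (1000, 3000, 6000, 9000):
          print(f"{f:5d} Hz -> {float(nfc_map(f)):6.0f} Hz")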

  19. Application of Interface Technology in Progressive Failure Analysis of Composite Panels

    NASA Technical Reports Server (NTRS)

    Sleight, D. W.; Lotts, C. G.

    2002-01-01

    A progressive failure analysis capability using interface technology is presented. The capability has been implemented in the COMET-AR finite element analysis code developed at the NASA Langley Research Center and is demonstrated on composite panels. The composite panels are analyzed for damage initiation and propagation from initial loading to final failure using a progressive failure analysis capability that includes both geometric and material nonlinearities. Progressive failure analyses are performed on conventional models and interface technology models of the composite panels. Analytical results and the computational effort of the analyses are compared for the conventional models and interface technology models. The analytical results predicted with the interface technology models are in good correlation with the analytical results using the conventional models, while significantly reducing the computational effort.

  20. Driver compliance to take-over requests with different auditory outputs in conditional automation.

    PubMed

    Forster, Yannick; Naujoks, Frederik; Neukum, Alexandra; Huestegge, Lynn

    2017-12-01

    Conditionally automated driving (CAD) systems are expected to improve traffic safety. Whenever the CAD system exceeds its limit of operation, designers of the system need to ensure a safe and timely enough transition from automated to manual mode. An existing visual Human-Machine Interface (HMI) was supplemented by different auditory outputs. The present work compares the effects of different auditory outputs in form of (1) a generic warning tone and (2) additional semantic speech output on driver behavior for the announcement of an upcoming take-over request (TOR). We expect the information carried by means of speech output to lead to faster reactions and better subjective evaluations by the drivers compared to generic auditory output. To test this assumption, N=17 drivers completed two simulator drives, once with a generic warning tone ('Generic') and once with additional speech output ('Speech+generic'), while they were working on a non-driving related task (NDRT; i.e., reading a magazine). Each drive incorporated one transition from automated to manual mode when yellow secondary lanes emerged. Different reaction time measures, relevant for the take-over process, were assessed. Furthermore, drivers evaluated the complete HMI regarding usefulness, ease of use and perceived visual workload just after experiencing the take-over. They gave comparative ratings on usability and acceptance at the end of the experiment. Results revealed that reaction times, reflecting information processing time (i.e., hands on the steering wheel, termination of NDRT), were shorter for 'Speech+generic' compared to 'Generic' while reaction time, reflecting allocation of attention (i.e., first glance ahead), did not show this difference. Subjective ratings were in favor of the system with additional speech output. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Application of Interface Technology in Nonlinear Analysis of a Stitched/RFI Composite Wing Stub Box

    NASA Technical Reports Server (NTRS)

    Wang, John T.; Ransom, Jonathan B.

    1997-01-01

    A recently developed interface technology was successfully employed in the geometrically nonlinear analysis of a full-scale stitched/RFI composite wing box loaded in bending. The technology allows mismatched finite element models to be joined in a variationally consistent manner and reduces the modeling complexity by eliminating transition meshing. In the analysis, local finite element models of nonlinearly deformed wide bays of the wing box are refined without the need for transition meshing to the surrounding coarse mesh. The COMET-AR finite element code, which has the interface technology capability, was used to perform the analyses. The COMET-AR analysis is compared to both a NASTRAN analysis and to experimental data. The interface technology solution is shown to be in good agreement with both. The viability of interface technology for coupled global/local analysis of large scale aircraft structures is demonstrated.

  2. Case factors affecting hearing aid recommendations by hearing care professionals.

    PubMed

    Gioia, Carmine; Ben-Akiva, Moshe; Kirkegaard, Matilde; Jørgensen, Ole; Jensen, Kasper; Schum, Don

    2015-03-01

    Professional recommendations to patients concerning hearing instrument (HI) technology levels are not currently evidence-based. Pre-fitting parameters have not been proven to be the primary indicators for optimal patient outcome with different HI technology levels. This results in subjective decision-making regarding the technology-level recommendations made by professionals. The objective of this study is to gain insight into the decision-making criteria utilized by professionals when recommending HI technology levels to hearing-impaired patients. A set of patient variables (and their respective levels) was identified by professionals as determinants of their recommendation of HIs. An experimental design was developed and 21 representative patient cases were generated. The design was based on a contrastive vignette technique according to which different types of vignettes (patient cases) were randomly presented to respondents in an online survey. Based on these patient cases, professionals were asked in the survey to make a treatment recommendation. The online survey was sent to approximately 3,500 professionals from the US, Germany, France, and Italy. The professionals were randomly selected from the databases of Oticon sales companies. The manufacturer sponsoring the survey remained anonymous and was only revealed after completing the survey, if requested by the respondent. The response rate was 20.5%. Data comprised respondent descriptions and patient case recommendations collected from the online survey. A binary logit modeling approach was used to identify the variables that discriminate between the respondents' recommendations of HI technology levels. The results show that HI technology levels are recommended by professionals based on their perception of the patient's activity level in life, the level of HI usage for experienced users, their age, and their speech discrimination score. Surprisingly, the patient's lifestyle as perceived by the hearing care professional, followed by speech discrimination, were the strongest factors in explaining treatment recommendations. An active patient with poor speech discrimination had a 17% chance of being recommended the highest technology level HI. For a very active patient with good speech discrimination, the probability increases to 68%. The discrepancies in HI technology level recommendations are not justified by academic research or evidence of optimal patient outcome with a different HI technology level. The paradigm of lifestyle as the significant variable identified in this study is apparently deeply anchored in the mindset of the professional despite the lack of supporting evidence. These results call for a shift in the professional's technology-level recommendation practice, from non-evidence-based to a proven practice that can maximize patient outcome. American Academy of Audiology.
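
    The binary logit model referenced above maps case factors to a recommendation probability through a logistic function. The coefficients below are invented for illustration and do not reproduce the study's fitted values:

      import math

      def p_highest_level(active, speech_score, coefs=(-3.0, 1.0, 0.03)):
          """Probability of recommending the highest HI technology level."""
          b0, b_active, b_speech = coefs   # invented coefficients, for shape only
          logit = b0 + b_active * active + b_speech * speech_score
          return 1.0 / (1.0 + math.exp(-logit))

      print(f"active, poor discrimination (40%): {p_highest_level(1, 40):.2f}")
      print(f"active, good discrimination (90%): {p_highest_level(1, 90):.2f}")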

  3. Evaluation of Speech Intelligibility and Sound Localization Abilities with Hearing Aids Using Binaural Wireless Technology.

    PubMed

    Ibrahim, Iman; Parsa, Vijay; Macpherson, Ewan; Cheesman, Margaret

    2013-01-02

    Wireless synchronization of the digital signal processing (DSP) features between two hearing aids in a bilateral hearing aid fitting is a fairly new technology. This technology is expected to preserve the differences in time and intensity between the two ears by co-ordinating the bilateral DSP features such as multichannel compression, noise reduction, and adaptive directionality. The purpose of this study was to evaluate the benefits of wireless communication as implemented in two commercially available hearing aids. More specifically, this study measured speech intelligibility and sound localization abilities of normal hearing and hearing impaired listeners using bilateral hearing aids with wireless synchronization of multichannel Wide Dynamic Range Compression (WDRC). Twenty subjects participated; 8 had normal hearing and 12 had bilaterally symmetrical sensorineural hearing loss. Each individual completed the Hearing in Noise Test (HINT) and a sound localization test with two types of stimuli. No specific benefit from wireless WDRC synchronization was observed for the HINT; however, hearing impaired listeners had better localization with the wireless synchronization. Binaural wireless technology in hearing aids may improve localization abilities although the possible effect appears to be small at the initial fitting. With adaptation, the hearing aids with synchronized signal processing may lead to an improvement in localization and speech intelligibility. Further research is required to demonstrate the effect of adaptation to the hearing aids with synchronized signal processing on different aspects of auditory performance.
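
    The DSP feature being synchronized here, multichannel WDRC, follows a simple input/output rule in each channel: linear gain below a compression threshold, reduced gain growth above it. A single-channel sketch with illustrative parameter values:

      def wdrc_output_level(input_db, threshold_db=45.0, ratio=2.0, gain_db=20.0):
          """Output level (dB SPL) of a single WDRC channel."""
          if input_db <= threshold_db:                 # linear region
              return input_db + gain_db
          return threshold_db + gain_db + (input_db - threshold_db) / ratio

      for level in (30, 45, 60, 75, 90):
          print(f"{level} dB in -> {wdrc_output_level(level):.0f} dB out")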

  4. A multimedia PDA/PC speech and language therapy tool for patients with aphasia.

    PubMed

    Reeves, Nina; Jefferies, Laura; Cunningham, Sally-Jo; Harris, Catherine

    2007-01-01

    Aphasia is a speech disorder usually caused by stroke or head injury and may involve a variety of communication difficulties. As 30% of stroke sufferers have a persisting speech and language disorder and therapy resources are low, there is clear scope for the development of technology to support patients between therapy sessions. This paper reports on an empirical study that evaluated SoundHelper, a multimedia application that demonstrates how to pronounce target speech sounds. Two prototypes, involving either video or animation, were developed and evaluated with 20 Speech and Language Therapists. Participants responded positively to both, with the video being preferred because of the perceived extra information it provided. The potential for use on portable devices, since internet access is limited in hospitals, is explored in the light of opinions of Augmentative and Alternative Communication (AAC) device users in the UK and Europe who have expressed a strong desire for more use of internet services.

  5. Impact of speech presentation level on cognitive task performance: implications for auditory display design.

    PubMed

    Baldwin, Carryl L; Struckman-Johnson, David

    2002-01-15

    Speech displays and verbal response technologies are increasingly being used in complex, high workload environments that require the simultaneous performance of visual and manual tasks. Examples of such environments include the flight decks of modern aircraft, advanced transport telematics systems providing in-vehicle route guidance and navigational information, and mobile communication equipment in emergency and public safety vehicles. Previous research has established an optimum range for speech intelligibility. However, the potential for variations in presentation levels within this range to affect attentional resources and cognitive processing of speech material has not been examined previously. Results of the current experimental investigation demonstrate that as presentation level increases within this 'optimum' range, participants in high workload situations make fewer sentence-processing errors and generally respond faster. Processing errors were more sensitive to changes in presentation level than were measures of reaction time. Implications of these findings are discussed in terms of their application for the design of speech communications displays in complex multi-task environments.

  6. Analysis of high-frequency energy in long-term average spectra of singing, speech, and voiceless fricatives.

    PubMed

    Monson, Brian B; Lotto, Andrew J; Story, Brad H

    2012-09-01

    The human singing and speech spectrum includes energy above 5 kHz. To begin an in-depth exploration of this high-frequency energy (HFE), a database of anechoic high-fidelity recordings of singers and talkers was created and analyzed. Third-octave band analysis from the long-term average spectra showed that production level (soft vs normal vs loud), production mode (singing vs speech), and phoneme (for voiceless fricatives) all significantly affected HFE characteristics. Specifically, increased production level caused an increase in absolute HFE level, but a decrease in relative HFE level. Singing exhibited higher levels of HFE than speech in the soft and normal conditions, but not in the loud condition. Third-octave band levels distinguished phoneme class of voiceless fricatives. Female HFE levels were significantly greater than male levels only above 11 kHz. This information is pertinent to various areas of acoustics, including vocal tract modeling, voice synthesis, augmentative hearing technology (hearing aids and cochlear implants), and training/therapy for singing and speech.
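
    A minimal version of the reported analysis, assuming only what the abstract states: estimate the long-term average spectrum with Welch's method, then integrate power into standard third-octave bands above 5 kHz (NumPy/SciPy; the FFT length and band limits are arbitrary choices here).

    ```python
    import numpy as np
    from scipy.signal import welch

    def third_octave_hfe_levels(x, fs, f_lo=5000.0, f_hi=16000.0):
        """Relative third-octave band levels (dB) of the long-term average
        spectrum (LTAS), restricted to the high-frequency energy region."""
        f, pxx = welch(x, fs=fs, nperseg=4096)                 # LTAS estimate
        df = f[1] - f[0]
        centers = 1000.0 * 2.0 ** (np.arange(-10, 15) / 3.0)   # 2**(1/3) spacing
        centers = centers[(centers >= f_lo) & (centers <= f_hi)]
        levels = []
        for fc in centers:
            band = (f >= fc * 2 ** (-1 / 6)) & (f < fc * 2 ** (1 / 6))
            levels.append(10 * np.log10(pxx[band].sum() * df + 1e-20))
        return centers, np.array(levels)

    # Comparing a "soft" vs "loud" recording by differencing the returned levels
    # yields production-level effects of exactly the kind the abstract reports.
    fs = 44100
    x = np.random.default_rng(0).standard_normal(5 * fs)       # noise stand-in
    print(third_octave_hfe_levels(x, fs)[0].round())           # band centers, Hz
    ```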

  7. Dimensions of Personalisation in Technology-Enhanced Learning: A Framework and Implications for Design

    ERIC Educational Resources Information Center

    FitzGerald, Elizabeth; Kucirkova, Natalia; Jones, Ann; Cross, Simon; Ferguson, Rebecca; Herodotou, Christothea; Hillaire, Garron; Scanlon, Eileen

    2018-01-01

    Personalisation of learning is a recurring trend in our society, referred to in government speeches, popular media, conference and research papers and technological innovations. This latter aspect--of using personalisation in technology-enhanced learning (TEL)--has promised much but has not always lived up to the claims made. Personalisation is…

  8. Evaluating a Speech-Language Pathology Technology

    PubMed Central

    Spinardi-Panes, Ana Carulina; Lopes-Herrera, Simone Aparecida; Maximino, Luciana Paula

    2014-01-01

    Abstract Background: The creation of new educational strategies based on technology is the essence of telehealth. This innovative learning is an alternative way to promote integration and improve professional practices in speech-language pathology (SLP). The objective of this study was to evaluate an SLP technology designed for distance learning. Materials and Methods: The survey selected fourth-year SLP students (n=60) from three public universities in the state of São Paulo, Brazil. The experimental group (EG) contained 10 students from each university (n=30), and the remaining students formed the control group (CG). Initially, both groups answered a preprotocol questionnaire, and the EG students received the technology, the recommendations, and the deadline to explore the material. In the second stage all students answered the postprotocol questionnaire in order to evaluate the validity and the learning of the technology contents. Results: The comparison between groups showed that the performance of most CG students worsened, whereas the EG students showed improved performance. Conclusions: This study therefore concluded that the technology instrument met the needs of the population studied and is recommended as a complement to traditional teaching. PMID:24404815

  9. Application of Business Process Management to drive the deployment of a speech recognition system in a healthcare organization.

    PubMed

    González Sánchez, María José; Framiñán Torres, José Manuel; Parra Calderón, Carlos Luis; Del Río Ortega, Juan Antonio; Vigil Martín, Eduardo; Nieto Cervera, Jaime

    2008-01-01

    We present a methodology based on Business Process Management to guide the development of a speech recognition system in a hospital in Spain. The methodology eases the deployment of the system by 1) involving the clinical staff in the process, 2) providing the IT professionals with a description of the process and its requirements, 3) assessing the advantages and disadvantages of the speech recognition system, as well as its impact on the organisation, and 4) helping to reorganise the healthcare process before implementing the new technology, in order to identify how it can better contribute to the overall objective of the organisation.

  10. Mapping the cortical representation of speech sounds in a syllable repetition task.

    PubMed

    Markiewicz, Christopher J; Bohland, Jason W

    2016-11-01

    Speech repetition relies on a series of distributed cortical representations and functional pathways. A speaker must map auditory representations of incoming sounds onto learned speech items, maintain an accurate representation of those items in short-term memory, interface that representation with the motor output system, and fluently articulate the target sequence. A "dorsal stream" consisting of posterior temporal, inferior parietal and premotor regions is thought to mediate auditory-motor representations and transformations, but the nature and activation of these representations for different portions of speech repetition tasks remains unclear. Here we mapped the correlates of phonetic and/or phonological information related to the specific phonemes and syllables that were heard, remembered, and produced using a series of cortical searchlight multi-voxel pattern analyses trained on estimates of BOLD responses from individual trials. Based on responses linked to input events (auditory syllable presentation), predictive vowel-level information was found in the left inferior frontal sulcus, while syllable prediction revealed significant clusters in the left ventral premotor cortex and central sulcus and the left mid superior temporal sulcus. Responses linked to output events (the GO signal cueing overt production) revealed strong clusters of vowel-related information bilaterally in the mid to posterior superior temporal sulcus. For the prediction of onset and coda consonants, input-linked responses yielded distributed clusters in the superior temporal cortices, which were further informative for classifiers trained on output-linked responses. Output-linked responses in the Rolandic cortex made strong predictions for the syllables and consonants produced, but their predictive power was reduced for vowels. The results of this study provide a systematic survey of how cortical response patterns covary with the identity of speech sounds, which will help to constrain and guide theoretical models of speech perception, speech production, and phonological working memory. Copyright © 2016 Elsevier Inc. All rights reserved.
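
    To make the searchlight step concrete, here is a toy version of the classifier at the center of such an analysis, run on simulated data (scikit-learn): predict each trial's vowel from its single-trial response pattern and score by cross-validation. It is a sketch of the general MVPA recipe, not the authors' pipeline.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Within one searchlight: one beta pattern per trial, one vowel label per
    # trial. Sizes and signal strength below are invented for illustration.
    rng = np.random.default_rng(1)
    n_trials, n_voxels = 120, 50
    vowel = rng.integers(0, 3, n_trials)          # e.g. /a/, /i/, /u/ coded 0..2
    betas = rng.standard_normal((n_trials, n_voxels)) + 0.5 * vowel[:, None]

    # Above-chance cross-validated accuracy in a searchlight is taken as
    # evidence that the local response pattern carries vowel information.
    clf = LogisticRegression(max_iter=1000)
    acc = cross_val_score(clf, betas, vowel, cv=5).mean()
    print(f"searchlight decoding accuracy: {acc:.2f} (chance = 0.33)")
    ```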

  11. Network Modeling for Functional Magnetic Resonance Imaging (fMRI) Signals during Ultra-Fast Speech Comprehension in Late-Blind Listeners

    PubMed Central

    Dietrich, Susanne; Hertrich, Ingo; Ackermann, Hermann

    2015-01-01

    In many functional magnetic resonance imaging (fMRI) studies blind humans were found to show cross-modal reorganization engaging the visual system in non-visual tasks. For example, blind people can manage to understand (synthetic) spoken language at very high speaking rates up to ca. 20 syllables/s (syl/s). FMRI data showed that hemodynamic activation within right-hemispheric primary visual cortex (V1), bilateral pulvinar (Pv), and left-hemispheric supplementary motor area (pre-SMA) covaried with their capability of ultra-fast speech (16 syllables/s) comprehension. It has been suggested that right V1 plays an important role with respect to the perception of ultra-fast speech features, particularly the detection of syllable onsets. Furthermore, left pre-SMA seems to be an interface between these syllabic representations and the frontal speech processing and working memory network. So far, little is known about the networks linking V1 to Pv, auditory cortex (A1), and (mesio-) frontal areas. Dynamic causal modeling (DCM) was applied to investigate (i) the input structure from A1 and Pv toward right V1 and (ii) output from right V1 and A1 to left pre-SMA. As concerns the input, Pv was significantly connected to V1, in addition to A1, in blind participants, but not in sighted controls. Regarding the output, V1 was significantly connected to pre-SMA in blind individuals, and the strength of V1-SMA connectivity correlated with the performance of ultra-fast speech comprehension. By contrast, in sighted controls, who did not understand ultra-fast speech, pre-SMA received input from neither A1 nor V1. Taken together, right V1 might facilitate the “parsing” of the ultra-fast speech stream in blind subjects by receiving subcortical auditory input via the Pv (= secondary visual pathway) and transmitting this information toward contralateral pre-SMA. PMID:26148062

  12. Network Modeling for Functional Magnetic Resonance Imaging (fMRI) Signals during Ultra-Fast Speech Comprehension in Late-Blind Listeners.

    PubMed

    Dietrich, Susanne; Hertrich, Ingo; Ackermann, Hermann

    2015-01-01

    In many functional magnetic resonance imaging (fMRI) studies blind humans were found to show cross-modal reorganization engaging the visual system in non-visual tasks. For example, blind people can manage to understand (synthetic) spoken language at very high speaking rates up to ca. 20 syllables/s (syl/s). FMRI data showed that hemodynamic activation within right-hemispheric primary visual cortex (V1), bilateral pulvinar (Pv), and left-hemispheric supplementary motor area (pre-SMA) covaried with their capability of ultra-fast speech (16 syllables/s) comprehension. It has been suggested that right V1 plays an important role with respect to the perception of ultra-fast speech features, particularly the detection of syllable onsets. Furthermore, left pre-SMA seems to be an interface between these syllabic representations and the frontal speech processing and working memory network. So far, little is known about the networks linking V1 to Pv, auditory cortex (A1), and (mesio-) frontal areas. Dynamic causal modeling (DCM) was applied to investigate (i) the input structure from A1 and Pv toward right V1 and (ii) output from right V1 and A1 to left pre-SMA. As concerns the input, Pv was significantly connected to V1, in addition to A1, in blind participants, but not in sighted controls. Regarding the output, V1 was significantly connected to pre-SMA in blind individuals, and the strength of V1-SMA connectivity correlated with the performance of ultra-fast speech comprehension. By contrast, in sighted controls, who did not understand ultra-fast speech, pre-SMA received input from neither A1 nor V1. Taken together, right V1 might facilitate the "parsing" of the ultra-fast speech stream in blind subjects by receiving subcortical auditory input via the Pv (= secondary visual pathway) and transmitting this information toward contralateral pre-SMA.

  13. Voice/Natural Language Interfacing for Robotic Control.

    DTIC Science & Technology

    1987-11-01

    ...until major computing power can be profitably allocated to the speech recognition process, off-the-shelf units will never have sufficient intelligence to...coordinate transformation for a location, and opening or closing the gripper's toggles. External to world operations, each joint may be rotated...

  14. Establishing a Deradicalization/Disengagement Model for America’s Correctional Facilities: Recommendations for Countering Prison Radicalization

    DTIC Science & Technology

    2013-03-01

    Singaporeans on the ISA,” Malaysian Business, October 1, 2011, 56. 121 “Parliamentary Speech on the Internal Security Act—Speech by Mr Teo Chee Hean...Nanyang Technological University , Singapore and the Religious Rehabilitation Group (RRG), Singapore: S. Rajarathnam School of International Studies... University , Singapore and the Religious Rehabilitation Group (RRG). Singapore: S. Rajarathnam School of International Studies, 2009. http

  15. The perceptual significance of high-frequency energy in the human voice.

    PubMed

    Monson, Brian B; Hunter, Eric J; Lotto, Andrew J; Story, Brad H

    2014-01-01

    While human vocalizations generate acoustical energy at frequencies up to (and beyond) 20 kHz, the energy at frequencies above about 5 kHz has traditionally been neglected in speech perception research. The intent of this paper is to review (1) the historical reasons for this research trend and (2) the work that continues to elucidate the perceptual significance of high-frequency energy (HFE) in speech and singing. The historical and physical factors reveal that, while HFE was believed to be unnecessary and/or impractical for applications of interest, it was never shown to be perceptually insignificant. Rather, the main causes for focus on low-frequency energy appear to be because the low-frequency portion of the speech spectrum was seen to be sufficient (from a perceptual standpoint), or the difficulty of HFE research was too great to be justifiable (from a technological standpoint). The advancement of technology continues to overcome concerns stemming from the latter reason. Likewise, advances in our understanding of the perceptual effects of HFE now cast doubt on the first cause. Emerging evidence indicates that HFE plays a more significant role than previously believed, and should thus be considered in speech and voice perception research, especially in research involving children and the hearing impaired.

  16. The perceptual significance of high-frequency energy in the human voice

    PubMed Central

    Monson, Brian B.; Hunter, Eric J.; Lotto, Andrew J.; Story, Brad H.

    2014-01-01

    While human vocalizations generate acoustical energy at frequencies up to (and beyond) 20 kHz, the energy at frequencies above about 5 kHz has traditionally been neglected in speech perception research. The intent of this paper is to review (1) the historical reasons for this research trend and (2) the work that continues to elucidate the perceptual significance of high-frequency energy (HFE) in speech and singing. The historical and physical factors reveal that, while HFE was believed to be unnecessary and/or impractical for applications of interest, it was never shown to be perceptually insignificant. Rather, the main causes for focus on low-frequency energy appear to be because the low-frequency portion of the speech spectrum was seen to be sufficient (from a perceptual standpoint), or the difficulty of HFE research was too great to be justifiable (from a technological standpoint). The advancement of technology continues to overcome concerns stemming from the latter reason. Likewise, advances in our understanding of the perceptual effects of HFE now cast doubt on the first cause. Emerging evidence indicates that HFE plays a more significant role than previously believed, and should thus be considered in speech and voice perception research, especially in research involving children and the hearing impaired. PMID:24982643

  17. A Survey of Speech-Language-Hearing Therapists' Career Situation and Challenges in Mainland China.

    PubMed

    Lin, Qiang; Lu, Jianliang; Chen, Zhuoming; Yan, Jiajian; Wang, Hong; Ouyang, Hui; Mou, Zhiwei; Huang, Dongfeng; O'Young, Bryan

    2016-01-01

    The aim of this survey was to investigate the background of speech-language pathologists and their training needs to provide a profile of the current state of the profession in Mainland China. A survey was conducted of 293 speech-language therapists. The questionnaire used asked questions related to their career background and had a 24-item ranking scale covering almost all of the common speech-language-hearing disorders. A summary of the raw data was constructed by calculating the average ranking score for each answer choice in order to determine the academic training needs with the highest preference among the respondents. The majority of respondents were female, <35 years old and with a total service time of <5 years. More than three quarters of the training needs with the highest preference among the 24 items involved basic-level knowledge of common speech-language-hearing disorders, such as diagnosis, assessment and conventional treatment, but seldom specific advanced technology or current progress. The results revealed that speech-language therapists in Mainland China tend to be young, with little total working experience and at the first stage of their career. This may be due to the lack of systematic educational programs and national certification systems for speech-language therapists. © 2016 S. Karger AG, Basel.

  18. New Perspectives on Assessing Amplification Effects

    PubMed Central

    Souza, Pamela E.; Tremblay, Kelly L.

    2006-01-01

    Clinicians have long been aware of the range of performance variability with hearing aids. Despite improvements in technology, there remain many instances of well-selected and appropriately fitted hearing aids whereby the user reports minimal improvement in speech understanding. This review presents a multistage framework for understanding how a hearing aid affects performance. Six stages are considered: (1) acoustic content of the signal, (2) modification of the signal by the hearing aid, (3) interaction between sound at the output of the hearing aid and the listener's ear, (4) integrity of the auditory system, (5) coding of available acoustic cues by the listener's auditory system, and (6) correct identification of the speech sound. Within this framework, this review describes methodology and research on 2 new assessment techniques: acoustic analysis of speech measured at the output of the hearing aid and auditory evoked potentials recorded while the listener wears hearing aids. Acoustic analysis topics include the relationship between conventional probe microphone tests and probe microphone measurements using speech, appropriate procedures for such tests, and assessment of signal-processing effects on speech acoustics and recognition. Auditory evoked potential topics include an overview of physiologic measures of speech processing and the effect of hearing loss and hearing aids on cortical auditory evoked potential measurements in response to speech. Finally, the clinical utility of these procedures is discussed. PMID:16959734

  19. Present Vision--Future Vision.

    ERIC Educational Resources Information Center

    Fitterman, L. Jeffrey

    This paper addresses issues of current and future technology use for and by individuals with visual impairments and blindness in Florida. Present technology applications used in vision programs in Florida are individually described, including video enlarging, speech output, large inkprint, braille print, paperless braille, and tactual output…

  20. Computers and Communications. Improving the Employability of Persons with Handicaps.

    ERIC Educational Resources Information Center

    Deitel, Harvey M.

    1984-01-01

    Reviews applications of computer and communications technologies for persons with visual, hearing, physical, speech, and language impairments, as well as the effects of technologies on transportation, work at home, education, and other aspects affecting the employment of the disabled. (SK)

  1. Development of an algorithm for improving quality and information processing capacity of MathSpeak synthetic speech renderings.

    PubMed

    Isaacson, M D; Srinivasan, S; Lloyd, L L

    2010-01-01

    MathSpeak is a set of rules for the non-ambiguous speaking of mathematical expressions. These rules have been incorporated into a computerised module that translates printed mathematics into the non-ambiguous MathSpeak form for synthetic speech rendering. Differences between individual utterances produced with the translator module are difficult to discern because of insufficient pausing between utterances; hence, the purpose of this study was to develop an algorithm for improving the synthetic speech rendering of MathSpeak. To improve synthetic speech renderings, an algorithm for inserting pauses was developed based upon recordings of middle and high school math teachers speaking mathematical expressions. Efficacy testing of this algorithm was conducted with college students without disabilities and high school/college students with visual impairments. Parameters measured included reception accuracy, short-term memory retention, MathSpeak processing capacity and various rankings concerning the quality of synthetic speech renderings. All parameters measured showed statistically significant improvements when the algorithm was used. The algorithm improves the quality and information processing capacity of synthetic speech renderings of MathSpeak. This increases the capacity of individuals with print disabilities to perform mathematical activities and to successfully fulfill science, technology, engineering and mathematics academic and career objectives.
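
    The pause-insertion idea can be sketched as a post-processing pass over MathSpeak tokens that emits SSML breaks, longer at structural boundaries (the token names and durations below are invented; the study's rules were derived from teachers' recordings):

    ```python
    # Illustrative pause table only; the study's durations came from recordings
    # of middle and high school math teachers, not from these made-up values.
    PAUSE_MS = {
        "fraction-boundary": 350,   # around StartFraction / Over / EndFraction
        "operator": 150,            # around plus, minus, equals, ...
        "default": 60,
    }
    BOUNDARY_TOKENS = {"StartFraction", "Over", "EndFraction"}
    OPERATORS = {"plus", "minus", "equals"}

    def insert_pauses(mathspeak_tokens):
        """Render MathSpeak tokens as SSML, pausing between utterances."""
        parts = []
        for tok in mathspeak_tokens:
            if tok in BOUNDARY_TOKENS:
                ms = PAUSE_MS["fraction-boundary"]
            elif tok in OPERATORS:
                ms = PAUSE_MS["operator"]
            else:
                ms = PAUSE_MS["default"]
            parts.append(f'{tok} <break time="{ms}ms"/>')
        return "<speak>" + " ".join(parts) + "</speak>"

    # "(x + 1) / 2" in (approximate) MathSpeak
    print(insert_pauses(["StartFraction", "x", "plus", "1", "Over", "2", "EndFraction"]))
    ```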

  2. Female voice communications in high levels of aircraft cockpit noises--Part I: spectra, levels, and microphones.

    PubMed

    Nixon, C W; Morris, L J; McCavitt, A R; McKinley, R L; Anderson, T R; McDaniel, M P; Yeager, D G

    1998-07-01

    Female-produced speech, although more intelligible than male speech in some noise spectra, may be more vulnerable to degradation by high levels of some military aircraft cockpit noises. The acoustic features of female speech are higher in frequency, lower in power, and appear more susceptible than male speech to masking by some of these military noises. Current military aircraft voice communication systems were optimized for the male voice and may not adequately accommodate the female voice in these high level noises. This applied study investigated the intelligibility of female and male speech produced in the noise spectra of four military aircraft cockpits at levels ranging from 95 dB to 115 dB. The experimental subjects used standard flight helmets and headsets, noise-canceling microphones, and military aircraft voice communications systems during the measurements. The intelligibility of female speech was lower than that of male speech for all experimental conditions; however, differences were small and insignificant except at the highest levels of the cockpit noises. Intelligibility for both genders varied with aircraft noise spectrum and level. Speech intelligibility of both genders was acceptable during normal cruise noises of all four aircraft, but improvements are required in the higher levels of noise created during aircraft maximum operating conditions. The intelligibility of female speech was unacceptable at the highest measured noise level of 115 dB and may constitute a problem for other military aviators. The intelligibility degradation due to the noise can be neutralized by use of an available, improved noise-canceling microphone, by the application of current active noise reduction technology to the personal communication equipment, and by the development of a voice communications system to accommodate the speech produced by both female and male aviators.

  3. "Rate My Therapist": Automated Detection of Empathy in Drug and Alcohol Counseling via Speech and Language Processing

    PubMed Central

    Xiao, Bo; Imel, Zac E.; Georgiou, Panayiotis G.; Atkins, David C.; Narayanan, Shrikanth S.

    2015-01-01

    The technology for evaluating patient-provider interactions in psychotherapy (observational coding) has not changed in 70 years. It is labor-intensive, error prone, and expensive, limiting its use in evaluating psychotherapy in the real world. Engineering solutions from speech and language processing provide new methods for the automatic evaluation of provider ratings from session recordings. The primary data are 200 Motivational Interviewing (MI) sessions from a study on MI training methods with observer ratings of counselor empathy. Automatic Speech Recognition (ASR) was used to transcribe sessions, and the resulting words were used in a text-based predictive model of empathy. Two supporting datasets trained the speech processing tasks including ASR (1200 transcripts from heterogeneous psychotherapy sessions and 153 transcripts and session recordings from 5 MI clinical trials). The accuracy of computationally-derived empathy ratings was evaluated against human ratings for each provider. Computationally-derived empathy scores and classifications (high vs. low) were highly accurate against human-based codes and classifications, with a correlation of 0.65 and F-score (a weighted average of sensitivity and specificity) of 0.86, respectively. Empathy prediction using human transcription as input (as opposed to ASR) resulted in a slight increase in prediction accuracies, suggesting that the fully automatic system with ASR is relatively robust. Using speech and language processing methods, it is possible to generate accurate predictions of provider performance in psychotherapy from audio recordings alone. This technology can support large-scale evaluation of psychotherapy for dissemination and process studies. PMID:26630392
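
    A compact stand-in for the text-based prediction step, on fabricated mini-transcripts (scikit-learn/SciPy): TF-IDF features from each session's transcript feed a regularized regression, and cross-validated predictions are correlated with the human empathy codes, mirroring the evaluation that yielded r = 0.65 in the study.

    ```python
    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_predict

    # Fabricated placeholder transcripts and observer codes, for shape only.
    transcripts = [
        "tell me more about how that felt for you",
        "you should just stop drinking it is simple",
        "it sounds like this has been really hard",
        "why did you do that again after last time",
    ] * 10
    empathy = np.array([4.5, 1.5, 4.8, 2.0] * 10)   # human observer ratings

    X = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(transcripts)
    pred = cross_val_predict(Ridge(alpha=1.0), X, empathy, cv=5)
    r, _ = pearsonr(pred, empathy)   # the toy r is not the paper's 0.65
    print(f"cross-validated correlation with human codes: r = {r:.2f}")
    ```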

  4. Real-Time Speech-to-Text Services. [A Report of the] National Task Force on Quality of Services in the Postsecondary Education of Deaf and Hard of Hearing Students.

    ERIC Educational Resources Information Center

    Stinson, Michael; Eisenberg, Sandy; Horn, Christy; Larson, Judy; Levitt, Harry; Stuckless, Ross

    This report describes and discusses several applications of new computer-based technologies which enable postsecondary students with deafness or hearing impairments to read the text of the language being spoken by the instructor and fellow students virtually in real time. Two current speech-to-text options are described: (1) steno-based systems in…

  5. Coming Together

    ERIC Educational Resources Information Center

    Villano, Matt

    2007-01-01

    In many ways, unified communications (UC) is the Holy Grail in the world of campus telecommunications; everybody wants it, yet the phrase means something different to everyone. "Campus Technology" (CT) tackled this subject in the recent webinar sponsored by Applied Voice & Speech Technologies (AVST), "Ten Steps for Building an Affordable, Reliable…

  6. Department of Defense Strategy for Operating in Cyberspace

    DTIC Science & Technology

    2011-07-01

    ...incubator for new forms of entrepreneurship, advances in technology, the spread of free speech, and new social networks that drive our economy and...research, and technology. DoD will continue to embrace this spirit of entrepreneurship and work in partnership with these communities and institutions...

  7. Legal Issues in Educational Technology: Implications for School Leaders.

    ERIC Educational Resources Information Center

    Quinn, David M.

    2003-01-01

    Discusses several legal issues involving the use of educational technology: Freedom of speech, regulation of Internet material harmful to minors, student-developed Web pages, harassment and hostile work environment, staff and student privacy, special education, plagiarism, and copyright issues. Includes recommendations for addressing technology…

  8. Adoption of Speech Recognition Technology in Community Healthcare Nursing.

    PubMed

    Al-Masslawi, Dawood; Block, Lori; Ronquillo, Charlene

    2016-01-01

    Adoption of new health information technology is shown to be challenging. However, the degree to which new technology will be adopted can be predicted by measures of usefulness and ease of use. In this work these key determining factors are the focus of the design of a wound documentation tool. In the context of wound care at home, consistent with evidence in the literature from similar settings, use of Speech Recognition Technology (SRT) for patient documentation has shown promise. To achieve a user-centred design, the results of ethnographic fieldwork are used to inform SRT features; furthermore, exploratory prototyping is used to collect feedback about the wound documentation tool from home care nurses. During this study, measures developed for healthcare applications of the Technology Acceptance Model will be used to identify SRT features that improve usefulness (e.g. increased accuracy, saving time) or ease of use (e.g. lowering mental/physical effort, easy-to-remember tasks). The identified features will be used to create a low-fidelity prototype that will be evaluated in future experiments.
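
    As a sketch of how TAM-style constructs might be scored in such a study (the item wording, 7-point scale, and equal weighting below are placeholders; the healthcare-adapted instruments themselves are not given in the abstract):

    ```python
    from statistics import mean

    # One nurse's hypothetical 7-point Likert ratings (1 = disagree, 7 = agree)
    responses = {
        "usefulness": {"improves accuracy": 6, "saves time": 7, "useful overall": 6},
        "ease_of_use": {"low mental effort": 5, "easy to remember": 4, "easy overall": 5},
    }

    # Construct scores as plain item means; higher scores predict adoption.
    scores = {construct: mean(items.values()) for construct, items in responses.items()}
    print(scores)
    ```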

  9. Development of Speech Input/Output Interfaces for Tactical Aircraft

    DTIC Science & Technology

    1983-07-01

    Canyon Research Group, Inc., 741 Lakefield Road, Suite B, Westlake Village, CA 91361 (...I. Strieb). July 1983. Report for period: December 1981 - July 1983. Approved for public release; distribution unlimited.

  10. ECLSS evolution: Advanced instrumentation interface requirements. Volume 3: Appendix C

    NASA Technical Reports Server (NTRS)

    1991-01-01

    An Advanced ECLSS (Environmental Control and Life Support System) Technology Interfaces Database was developed primarily to provide ECLSS analysts with a centralized and portable source of ECLSS technology interface requirements data. The database contains 20 technologies which were previously identified in the MDSSC ECLSS Technologies database. The primary interfaces of interest in this database are fluid, electrical, and data/control interfaces, and resupply requirements. Each record contains fields describing the function and operation of the technology. Fields include an interface diagram, a description of applicable design points and operating ranges, and an explanation of data, as required. A complete set of data was entered for six of the twenty components, including Solid Amine Water Desorbed (SAWD), Thermoelectric Integrated Membrane Evaporation System (TIMES), Electrochemical Carbon Dioxide Concentrator (EDC), Solid Polymer Electrolysis (SPE), Static Feed Electrolysis (SFE), and BOSCH. Additional data were collected for Reverse Osmosis Water Reclamation - Potable (ROWRP), Reverse Osmosis Water Reclamation - Hygiene (ROWRH), Static Feed Solid Polymer Electrolyte (SFSPE), Trace Contaminant Control System (TCCS), and Multifiltration Water Reclamation - Hygiene (MFWRH). A summary of the database contents is presented in this report.
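
    One plausible shape for a record in such a database, sketched as a Python dataclass (field names paraphrase the abstract; the values are illustrative, not from the actual database):

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class EclssTechnologyRecord:
        """Hypothetical structure for one technology-interfaces record."""
        name: str                      # e.g. "Static Feed Electrolysis (SFE)"
        function: str                  # role of the technology in the ECLSS loop
        fluid_interfaces: list = field(default_factory=list)
        electrical_interfaces: list = field(default_factory=list)
        data_control_interfaces: list = field(default_factory=list)
        resupply_requirements: list = field(default_factory=list)
        design_points: str = ""        # applicable design points / operating ranges

    record = EclssTechnologyRecord(
        name="Static Feed Electrolysis (SFE)",
        function="Generates O2 and H2 from water by electrolysis",
        fluid_interfaces=["feed water in", "O2 out", "H2 out"],
    )
    print(record.name, record.fluid_interfaces)
    ```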

  11. Evaluation of Speech Intelligibility and Sound Localization Abilities with Hearing Aids Using Binaural Wireless Technology

    PubMed Central

    Ibrahim, Iman; Parsa, Vijay; Macpherson, Ewan; Cheesman, Margaret

    2012-01-01

    Wireless synchronization of the digital signal processing (DSP) features between two hearing aids in a bilateral hearing aid fitting is a fairly new technology. This technology is expected to preserve the differences in time and intensity between the two ears by co-ordinating the bilateral DSP features such as multichannel compression, noise reduction, and adaptive directionality. The purpose of this study was to evaluate the benefits of wireless communication as implemented in two commercially available hearing aids. More specifically, this study measured speech intelligibility and sound localization abilities of normal hearing and hearing impaired listeners using bilateral hearing aids with wireless synchronization of multichannel Wide Dynamic Range Compression (WDRC). Twenty subjects participated; 8 had normal hearing and 12 had bilaterally symmetrical sensorineural hearing loss. Each individual completed the Hearing in Noise Test (HINT) and a sound localization test with two types of stimuli. No specific benefit from wireless WDRC synchronization was observed for the HINT; however, hearing impaired listeners had better localization with the wireless synchronization. Binaural wireless technology in hearing aids may improve localization abilities although the possible effect appears to be small at the initial fitting. With adaptation, the hearing aids with synchronized signal processing may lead to an improvement in localization and speech intelligibility. Further research is required to demonstrate the effect of adaptation to the hearing aids with synchronized signal processing on different aspects of auditory performance. PMID:26557339

  12. Cutaneous sensory nerve as a substitute for auditory nerve in solving deaf-mutes’ hearing problem: an innovation in multi-channel-array skin-hearing technology

    PubMed Central

    Li, Jianwen; Li, Yan; Zhang, Ming; Ma, Weifang; Ma, Xuezong

    2014-01-01

    The current use of hearing aids and artificial cochleas for deaf-mute individuals depends on their auditory nerve. Skin-hearing technology, a patented system developed by our group, uses a cutaneous sensory nerve to substitute for the auditory nerve to help deaf-mutes to hear sound. This paper introduces a new solution, multi-channel-array skin-hearing technology, to solve the problem of speech discrimination. Based on the filtering principle of hair cells, external voice signals at different frequencies are converted to current signals at corresponding frequencies using electronic multi-channel bandpass filtering technology. Different positions on the skin can be stimulated by the electrode array, allowing the perception and discrimination of external speech signals to be determined by the skin response to the current signals. Through voice frequency analysis, the frequency range of the band-pass filter can also be determined. These findings demonstrate that the sensory nerves in the skin can help to transfer the voice signal and to distinguish the speech signal, suggesting that the skin sensory nerves are good candidates for the replacement of the auditory nerve in addressing deaf-mutes’ hearing problems. Scientific hearing experiments can be more safely performed on the skin. Compared with the artificial cochlea, multi-channel-array skin-hearing aids have lower operation risk in use, are cheaper and are more easily popularized. PMID:25317171
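
    The filter-bank principle is easy to sketch (SciPy; the channel count and band edges below are illustrative, not the patented design): each band-passed channel's envelope would set the stimulation current delivered at one skin site.

    ```python
    import numpy as np
    from scipy.signal import butter, sosfilt, hilbert

    def skin_hearing_channels(x, fs, edges=(100, 300, 700, 1500, 3000, 6000)):
        """Split a voice signal into band-passed channels (cf. hair-cell
        filtering) and return per-channel envelopes; each envelope would drive
        the current at one electrode site on the skin."""
        currents = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
            band = sosfilt(sos, x)
            env = np.abs(hilbert(band))      # envelope ~ stimulation intensity
            currents.append(env)
        return np.array(currents)            # shape: (n_electrodes, n_samples)

    fs = 16000
    t = np.arange(fs) / fs
    voice = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)
    print(skin_hearing_channels(voice, fs).shape)   # (5, 16000)
    ```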

  13. Technology for Persons with Disabilities. An Introduction.

    ERIC Educational Resources Information Center

    IBM, Atlanta, GA. National Support Center for Persons with Disabilities.

    This paper contains an overview of technology, national support organizations, and IBM support available to persons with disabilities related to impairments affecting hearing, learning, mobility, speech or language, and vision. The information was obtained from the IBM National Support Center for Persons with Disabilities, which was created to…

  14. Tools for the Task? Perspectives on Assistive Technology in Educational Settings.

    ERIC Educational Resources Information Center

    Todis, Bonnie

    1996-01-01

    A two-year qualitative study evaluated use of assistive technology by 13 students. Excerpts from case studies illustrate the perspectives of parents, specialists (physical therapists and speech language pathologists), special and regular education teachers, instructional assistants, student users, and peers. Results demonstrate the complex…

  15. Department of Defense Strategy for Operating in Cyberspace

    DTIC Science & Technology

    2011-07-01

    incubator for new forms of entrepreneurship , advances in technology, the spread of free speech, and new social networks that drive our economy and reflect...and technology. DoD will continue to embrace this spirit of entrepreneurship and work in partnership with these communities and institutions to

  16. TEND 2000: Proceedings of the Technological Education and National Development Conference, "Crossroads of the New Millennium" (2nd, April 8-10, 2000, Abu Dhabi, United Arab Emirates).

    ERIC Educational Resources Information Center

    Higher Colleges of Technology, Abu Dhabi (United Arab Emirates).

    This document contains a total of 57 welcoming speeches, theme addresses, seminar and workshop papers, and poster sessions that were presented at a conference on technological education and national development. The papers explore the ways technology and technological advances have both necessitated and enabled changes in the way education is…

  17. The Human Interface Technology Laboratory.

    ERIC Educational Resources Information Center

    Washington Univ., Seattle. Washington Technology Center.

    This booklet contains information about the Human Interface Technology Laboratory (HITL), which was established by the Washington Technology Center at the University of Washington to transform virtual world concepts and research into practical, economically viable technology products. The booklet is divided into seven sections: (1) a brief…

  18. Everyday listening questionnaire: correlation between subjective hearing and objective performance.

    PubMed

    Brendel, Martina; Frohne-Buechner, Carolin; Lesinski-Schiedat, Anke; Lenarz, Thomas; Buechner, Andreas

    2014-01-01

    Clinical experience has demonstrated that speech understanding by cochlear implant (CI) recipients has improved over recent years with the development of new technology. The Everyday Listening Questionnaire 2 (ELQ 2) was designed to collect information regarding the challenges faced by CI recipients in everyday listening. The aim of this study was to compare self-assessment of CI users using ELQ 2 with objective speech recognition measures and to compare results between users of older and newer coding strategies. During their regular clinical review appointments a group of representative adult CI recipients implanted with the Advanced Bionics implant system were asked to complete the questionnaire. The first 100 patients who agreed to participate in this survey were recruited independent of processor generation and speech coding strategy. Correlations between subjectively scored hearing performance in everyday listening situations and objectively measured speech perception abilities were examined relative to the speech coding strategies used. When subjects were grouped by strategy there were significant differences between users of older 'standard' strategies and users of the newer, currently available strategies (HiRes and HiRes 120), especially in the categories of telephone use and music perception. Significant correlations were found between certain subjective ratings and the objective speech perception data in noise. There is a good correlation between subjective and objective data. Users of more recent speech coding strategies tend to have fewer problems in difficult hearing situations.

  19. Analysis of high-frequency energy in long-term average spectra of singing, speech, and voiceless fricatives

    PubMed Central

    Monson, Brian B.; Lotto, Andrew J.; Story, Brad H.

    2012-01-01

    The human singing and speech spectrum includes energy above 5 kHz. To begin an in-depth exploration of this high-frequency energy (HFE), a database of anechoic high-fidelity recordings of singers and talkers was created and analyzed. Third-octave band analysis from the long-term average spectra showed that production level (soft vs normal vs loud), production mode (singing vs speech), and phoneme (for voiceless fricatives) all significantly affected HFE characteristics. Specifically, increased production level caused an increase in absolute HFE level, but a decrease in relative HFE level. Singing exhibited higher levels of HFE than speech in the soft and normal conditions, but not in the loud condition. Third-octave band levels distinguished phoneme class of voiceless fricatives. Female HFE levels were significantly greater than male levels only above 11 kHz. This information is pertinent to various areas of acoustics, including vocal tract modeling, voice synthesis, augmentative hearing technology (hearing aids and cochlear implants), and training/therapy for singing and speech. PMID:22978902

  20. EDITORIAL: Special issue containing contributions from the 39th Neural Interfaces Conference Special issue containing contributions from the 39th Neural Interfaces Conference

    NASA Astrophysics Data System (ADS)

    Weiland, James D.

    2011-07-01

    Implantable neural interfaces provide substantial benefits to individuals with neurological disorders. That was the unequivocal message delivered by speaker after speaker from the podium of the 39th Neural Interfaces Conference (NIC2010) held in Long Beach, California, in June 2010. Giving benefit to patients is the most important measure for any biomedical technology, and myriad presentations at NIC2010 made clear that implantable neurostimulation technology has achieved this goal. Cochlear implants allow deaf people to communicate through speech. Deep brain stimulators give back mobility and dexterity necessary for so many daily tasks that are often taken for granted. Chronic pain can be alleviated through spinal cord stimulation. Motor prosthesis systems have been demonstrated in humans, through both reanimation of paralyzed limbs and neural control of robotic arms. Earlier this year, a retinal prosthesis was approved for sale in Europe, providing some hope for the blind. In sum, current clinical implants have been tremendously beneficial for today's patients and experimental systems that will be translated to the clinic promise to expand the number of people helped through bioelectronic therapies. Yet there are significant opportunities for improvement. For sensory prostheses, patients report an artificial sensation, clearly different from the natural sensation they remember. Neuromodulation systems, such as deep brain stimulation and pain stimulators, often have side effects that are tolerated as long as the side effects are less impactful than the disease. The papers published in the special issue from NIC2010 reflect the maturing and expanding field of neural interfaces. Our field has moved past proof-of-principle demonstrations and is now focusing on proving the longevity required for clinical implementation of new devices, extending existing approaches to new diseases and improving current devices for better outcomes. Closed-loop neuromodulation is a strategy that can potentially optimize dosing, reduce side effects and extend implant battery life. The article by Liang et al investigates methods for closed-loop control of epilepsy, using neural recording to detect imminent seizures and stimulation to halt the aberrant neural activity leading to seizure. Liu et al report on a model of basal ganglia function that could lead to optimized, closed-loop stimulation to reduce symptoms of Parkinson's disease while avoiding side effects. Our laboratory, as described in Ray et al, is investigating the interface between stimulating microelectrodes and the retina, to inform the design of a high-resolution retinal prosthesis. Three contributions address the issue of long-term stability of cortical recording, which remains a major hurdle to implementation of neural recording systems. The Utah group reports on the in vitro testing of a completely implantable, wireless neural recording system, demonstrating almost one year of reliable performance under simulated implant conditions. Shenoy's laboratory at Stanford demonstrates that useful signals can be recorded from research animals for over 2.5 years. Lempka et al describe a modeling approach to analyzing intracortical microelectrode recordings. These findings represent real and significant progress towards overcoming the final barriers to implementation of a reliable cortical interface. Planning is well underway for the 40th Neural Interfaces Conference, which will be held in Salt Lake City, Utah, in June 2012. The conference promises to continue the NIC tradition of showcasing the latest results from clinical trials of neural interface therapies while providing ample time for dynamic exchange amongst the interdisciplinary audience of engineers, scientists and clinicians.
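
    To give a flavor of the closed-loop strategy discussed above (cf. the Liang et al contribution), the toy sketch below (NumPy; an illustrative feature and threshold, nothing patient-specific) watches a simulated recording and reports the times at which a simple seizure detector would trigger stimulation.

    ```python
    import numpy as np

    def line_length(window):
        """Line length: summed absolute sample-to-sample change, a cheap
        feature that rises sharply during seizure-like activity."""
        return np.abs(np.diff(window)).sum()

    def closed_loop_stim_times(signal, fs, threshold, win_s=0.5):
        """Yield times (s) at which the detector would trigger stimulation."""
        win = int(fs * win_s)
        for start in range(0, len(signal) - win + 1, win):
            if line_length(signal[start:start + win]) > threshold:
                yield start / fs

    fs = 500
    t = np.arange(10 * fs) / fs
    eeg = np.sin(2 * np.pi * 10 * t)                          # background rhythm
    eeg[6 * fs:] += 3 * np.sin(2 * np.pi * 60 * t[6 * fs:])   # "seizure" burst
    print(list(closed_loop_stim_times(eeg, fs, threshold=200)))
    ```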

  1. Accessibility of insulin pumps for blind and visually impaired people.

    PubMed

    Uslan, Mark M; Burton, Darren M; Chertow, Bruce S; Collins, Ronda

    2004-10-01

    Continuous subcutaneous insulin infusion using an insulin pump (IP) more closely mimics the normal pancreas than multiple insulin injections. It is an effective, and often a preferred, means of maintaining normal blood glucose levels, but IPs were not designed to be fully accessible to blind or visually impaired people. This study will identify accessibility issues related to the design of IPs and focus on the key improvements required in the user interface to provide access for people who are blind or visually impaired. IPs that are commercially available were evaluated, and features and functions such as operating procedures, user interface design, and user manuals were tabulated and analyzed. Potential failures and design priorities were identified through a Failure Modes and Effects Analysis (FMEA). Although the IPs do provide some limited audio output, in general, it was found to be of minimal use to people who are blind or visually impaired. None of the IPs uses high-contrast displays with consistently large fonts preferred by people who are visually impaired. User manuals were also found to be of minimal use. Results of the FMEA emphasize the need to focus design improvements on communicating and verifying information so that errors and failures can be detected and corrected. The most important recommendation for future IP development is speech output capability, which, more than any other improvement, would break down accessibility barriers and allow blind and visually impaired people to take advantage of the benefits of IP technology.
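
    To illustrate how an FMEA turns such findings into design priorities, here is a toy scoring pass (invented failure modes and ratings, not the study's worksheet): each mode's severity, occurrence, and detection ratings multiply into a Risk Priority Number.

    ```python
    # Illustrative FMEA scoring for insulin pump accessibility. Each mode is
    # (description, severity, occurrence, detection), rated 1-10; detection is
    # rated high when a blind user would be UNLIKELY to notice the failure.
    failure_modes = [
        ("bolus dose misread on low-contrast display", 9, 6, 8),
        ("audio beeps too ambiguous to verify menu state", 7, 7, 7),
        ("reservoir-low warning presented visually only", 8, 4, 9),
    ]

    for desc, sev, occ, det in sorted(
        failure_modes, key=lambda m: m[1] * m[2] * m[3], reverse=True
    ):
        rpn = sev * occ * det            # Risk Priority Number
        print(f"RPN {rpn:4d}  {desc}")   # highest RPN = top design priority
    ```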

  2. A Systematic Review of Tablet Computers and Portable Media Players as Speech Generating Devices for Individuals with Autism Spectrum Disorder.

    PubMed

    Lorah, Elizabeth R; Parnell, Ashley; Whitby, Peggy Schaefer; Hantula, Donald

    2015-12-01

    Powerful, portable, off-the-shelf handheld devices, such as tablet based computers (i.e., iPad(®); Galaxy(®)) or portable multimedia players (i.e., iPod(®)), can be adapted to function as speech generating devices for individuals with autism spectrum disorders or related developmental disabilities. This paper reviews the research in this new and rapidly growing area and delineates an agenda for future investigations. In general, participants using these devices acquired verbal repertoires quickly. Studies comparing these devices to picture exchange or manual sign language found that acquisition was often quicker when using a tablet computer and that the vast majority of participants preferred using the device to picture exchange or manual sign language. Future research in interface design, user experience, and extended verbal repertoires is recommended.

  3. Biomedical technology transfer: Bioinstrumentation for cardiology, neurology, and the circulatory system

    NASA Technical Reports Server (NTRS)

    1976-01-01

    Developments in applying aerospace medical technology to the design and production of medical equipment and instrumentation are reported. Projects described include intercranial pressure transducers, leg negative pressure devices, a synthetic speech prosthesis for victims of cerebral palsy, and a Doppler blood flow instrument. Commercialization activities for disseminating and utilizing NASA technology, and new biomedical problem areas are discussed.

  4. A Qualitative Study of the Child, Family and Professional Factors That Influence the Use of Assistive Technology in Early Intervention.

    ERIC Educational Resources Information Center

    Hider, Erin D.

    Factors involved in assistive technology use by young children with disabilities were explored through case studies of five families who had received intensive training at Camp Gizmo, an assistive technology camp for young children. Families, service providers, and preservice students in special education and speech language pathology engaged in a…

  5. Development of a Dedicated Speech Work Station.

    DTIC Science & Technology

    1984-12-01

    AD-Ai55 465 DEVELOPMENT OF R DEDICATED SPEECH WORK STTION(U) AIR / FORCE INST OF TECH WRIGHT-PATTERSON AFB OH SCHOOL OF ENGINEERING W H LIEBER DEC 84...Presented to the Faculty of the School of Engineering of the Air Force Institute of Technology Air University in Partial Fulfillment of the Requirement for...the Degree of Master of Science in Electrical Engineering by William H. Lieber, B.S.E.E. Capt USAF Graduate Electrical Engineering December 1984

  6. Expanding the functionality of speech recognition in radiology: creating a real-time methodology for measurement and analysis of occupational stress and fatigue.

    PubMed

    Reiner, Bruce I

    2013-02-01

    While occupational stress and fatigue have been well described throughout medicine, the radiology community is particularly susceptible due to declining reimbursements, heightened demands for service deliverables, and increasing exam volume and complexity. The resulting occupational stress can be variable in nature and dependent upon a number of intrinsic and extrinsic stressors. Intrinsic stressors largely account for inter-radiologist stress variability and relate to unique attributes of the radiologist such as personality, emotional state, education/training, and experience. Extrinsic stressors may account for intra-radiologist stress variability and include cumulative workload and task complexity. The creation of personalized stress profiles creates a mechanism for accounting for both inter- and intra-radiologist stress variability, which is essential in creating customizable stress intervention strategies. One viable option for real-time occupational stress measurement is voice stress analysis, which can be directly implemented through existing speech recognition technology and has been proven to be effective in stress measurement and analysis outside of medicine. This technology operates by detecting stress in the acoustic properties of speech through a number of different variables including duration, glottal source factors, pitch distribution, spectral structure, and intensity. The correlation of these speech-derived stress measures with outcomes data can be used to determine the user-specific inflection point at which stress becomes detrimental to clinical performance.
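
    As a sketch of how such features could be pulled from dictation audio with an off-the-shelf toolkit (librosa here; the feature set is a guess at the variables named above, and no stress model is implied):

    ```python
    import numpy as np
    import librosa

    def stress_features(wav_path):
        """Extract acoustic variables of the kind used in voice stress
        analysis: pitch distribution, intensity, and duration. Mapping these
        to a stress score would need calibration against per-radiologist
        baselines, which is out of scope here."""
        y, sr = librosa.load(wav_path, sr=None)
        f0, voiced, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)  # pitch track
        rms = librosa.feature.rms(y=y)[0]                          # intensity
        return {
            "duration_s": len(y) / sr,
            "f0_mean_hz": float(np.nanmean(f0)),
            "f0_spread_hz": float(np.nanstd(f0)),
            "intensity_db": float(20 * np.log10(rms.mean() + 1e-10)),
        }

    # Features logged per dictation could then be trended against workload and
    # correlated with interpretive errors to locate the "inflection point".
    ```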

  7. Perceptions of parents and speech and language therapists on the effects of paediatric cochlear implantation and habilitation and education following it.

    PubMed

    Huttunen, Kerttu; Välimaa, Taina

    2012-01-01

    During the process of implantation, parents may have rather heterogeneous expectations and concerns about their child's development and the functioning of habilitation and education services. Their views on habilitation and education are important for building family-centred practices. We explored the perceptions of parents and speech and language therapists (SLTs) on the effects of implantation on the child and the family and on the quality of services provided. Their views were also compared. Parents and SLTs of 18 children filled out questionnaires containing open- and closed-ended questions at 6 months and annually 1-5 years after activation of the implant. Their responses were analysed mainly using data-based inductive content analysis. Positive experiences outnumbered negative ones in the responses of both the parents and the SLTs surveyed. The parents were particularly satisfied with the improvement in communication and expanded social life in the family. These were the most prevalent themes also raised by the SLTs. The parents were also satisfied with the organization and content of habilitation. Most of the negative experiences were related to arrangement of hospital visits and the usability and maintenance of speech processor technology. Some children did not receive enough speech and language therapy, and some of the parents were dissatisfied with educational services. The habilitation process had generally required parental efforts at an expected level. However, parents with a child with at least one concomitant problem experienced habilitation as more stressful than did other parents. Parents and SLTs had more positive than negative experiences with implantation. As the usability and maintenance of speech processor technology were often compromised, we urge implant centres to ensure sufficient personnel for technical maintenance. It is also important to promote services by providing enough information and parental support. © 2011 Royal College of Speech & Language Therapists.

  8. Embedded Web Technology: Applying World Wide Web Standards to Embedded Systems

    NASA Technical Reports Server (NTRS)

    Ponyik, Joseph G.; York, David W.

    2002-01-01

    Embedded Systems have traditionally been developed in a highly customized manner. The user interface hardware and software, along with the interface to the embedded system, are typically unique to the system for which they are built, resulting in extra cost to the system in terms of development time and maintenance effort. World Wide Web standards have been developed in the past ten years with the goal of allowing servers and clients to interoperate seamlessly. The client and server systems can consist of differing hardware and software platforms, but the World Wide Web standards allow them to interface without knowing the details of the system at the other end of the interface. Embedded Web Technology is the merging of Embedded Systems with the World Wide Web. Embedded Web Technology decreases the cost of developing and maintaining the user interface by allowing the user to interface to the embedded system through a web browser running on a standard personal computer. Embedded Web Technology can also be used to simplify an Embedded System's internal network.
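
    A minimal illustration of the pattern (Python's standard-library HTTP server; the endpoint name and telemetry fields are invented): the embedded system serves its state over HTTP, so any browser-equipped PC can act as the user interface with no custom client software.

    ```python
    from http.server import BaseHTTPRequestHandler, HTTPServer
    import json

    telemetry = {"temperature_c": 21.4, "valve_open": True}   # device state

    class DeviceUI(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/status":
                body = json.dumps(telemetry).encode()
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.end_headers()
                self.wfile.write(body)     # any browser or script reads JSON
            else:
                self.send_response(404)
                self.end_headers()

    if __name__ == "__main__":
        # Point a browser at http://<device>:8080/status to view the state.
        HTTPServer(("0.0.0.0", 8080), DeviceUI).serve_forever()
    ```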

  9. Motivating Readers with Illustrative eText

    ERIC Educational Resources Information Center

    Edwards, Peter

    2008-01-01

    Assistive technology (AT)--the use of technology to assist individuals with disabilities--encompasses a wide range of applications including problems with reading, writing, and language arts; speech-language disorders; students with mild disabilities; and older students. An exciting and motivational use of AT to assist readers that has not been…

  10. Technology Education Leadership: Observations and Reflections

    ERIC Educational Resources Information Center

    Sanders, Mark

    2006-01-01

    This article presents the text of a speech given by the author at the Maley Spirit of Excellence Breakfast in March, 2006. In this address, Sanders talks about the characteristics necessary for technology education leadership, and recommends seven leadership initiatives that the profession should pursue: (1) State-level political action; (2)…

  11. Technology, Privacy, and Electronic Freedom of Speech.

    ERIC Educational Resources Information Center

    McDonald, Frances M.

    1986-01-01

    Explores five issues related to technology's impact on privacy and access to information--regulation and licensing of the press, electronic surveillance, invasion of privacy, copyright, and policy-making and regulation. The importance of First Amendment rights and civil liberties in forming a coherent national information policy is stressed.…

  12. Performance of a New Speech Translation Device in Translating Verbal Recommendations of Medication Action Plans for Patients with Diabetes

    PubMed Central

    Soller, R. William; Chan, Philip; Higa, Amy

    2012-01-01

    Background Language barriers are significant hurdles for chronic disease patients in achieving self-management goals of therapy, particularly in settings where practitioners have limited nonprimary language skills, and in-person translators may not always be available. S-MINDS© (Speaking Multilingual Interactive Natural Dialog System), a concept-based speech translation approach developed by Fluential Inc., can be applied to bridge the technologic gaps that limit the complexity and length of utterances that can be recognized and translated by such devices, and it has the potential to broaden access to translation services in clinical settings. Methods The prototype translation system was evaluated prospectively for accuracy and patient satisfaction in underserved Spanish-speaking patients with diabetes and limited English proficiency and was compared with other commercial systems for robustness against degradation of translation due to ambient noise and speech patterns. Results Accuracy related to translating the English–Spanish–English communication string from practitioner to device to patient to device to practitioner was high (97–100%). Patient satisfaction was high (means of 4.7–4.9 over four domains on a 5-point Likert scale). The device outperformed three other commercial speech translation systems in terms of accuracy during fast speech utterances, under quiet and noisy fluent speech conditions, and when challenged with various speech disfluencies (i.e., fillers, false starts, stutters, repairs, and long pauses). Conclusions A concept-based English–Spanish speech translation system has been successfully developed in prototype form that can accept long utterances (up to 20 words) with limited to no degradation in accuracy. The functionality of the system is superior to leading commercial speech translation systems. PMID:22920821
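
    The toy sketch below illustrates the general shape of concept-based translation, as opposed to word-for-word translation: a recognized utterance is matched to a clinical concept and then rendered from a vetted target-language template. The concepts, patterns, and phrasings are invented for illustration and are not S-MINDS content.

```python
# Toy concept-based translation: match the utterance to a concept, then
# emit a pre-vetted target-language rendering. All entries are invented.
CONCEPTS = {
    "check_glucose": {
        "en_patterns": ["check your blood sugar", "test your glucose"],
        "es": "Mídase el azúcar en la sangre cada mañana antes de desayunar.",
    },
    "take_medication": {
        "en_patterns": ["take your medication", "take your metformin"],
        "es": "Tome su medicamento dos veces al día con las comidas.",
    },
}

def translate(utterance: str) -> str:
    text = utterance.lower()
    for concept in CONCEPTS.values():
        if any(pattern in text for pattern in concept["en_patterns"]):
            return concept["es"]
    return "No entendí. ¿Puede repetirlo, por favor?"  # fall back and re-prompt

print(translate("Please check your blood sugar every morning"))
```

    Because the device confirms a whole concept rather than decoding arbitrary text, long utterances can degrade gracefully, which is consistent with the robustness results reported above.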

  13. Visual activity predicts auditory recovery from deafness after adult cochlear implantation.

    PubMed

    Strelnikov, Kuzma; Rouger, Julien; Demonet, Jean-François; Lagleyre, Sebastien; Fraysse, Bernard; Deguine, Olivier; Barone, Pascal

    2013-12-01

    Modern cochlear implantation technologies allow deaf patients to understand auditory speech; however, the implants deliver only a coarse auditory input, and patients must use long-term adaptive processes to achieve coherent percepts. In adults with post-lingual deafness, the greatest progress in speech recovery is observed during the first year after cochlear implantation, but there is a large range of variability in the level of cochlear implant outcomes and in the temporal evolution of recovery. It has been proposed that when profoundly deaf subjects receive a cochlear implant, the visual cross-modal reorganization of the brain is deleterious for auditory speech recovery. We tested this hypothesis in post-lingually deaf adults by analysing whether brain activity shortly after implantation correlated with the level of auditory recovery 6 months later. Based on brain activity induced by a speech-processing task, we found strong positive correlations in areas outside the auditory cortex. The highest positive correlations were found in the occipital cortex involved in visual processing, as well as in the posterior-temporal cortex known for audio-visual integration. The other area that positively correlated with auditory speech recovery was localized in the left inferior frontal area known for speech processing. Our results demonstrate that the functional level of the visual modality is related to the proficiency of auditory recovery. Based on the positive correlation of visual activity with auditory speech recovery, we suggest that the visual modality may facilitate the perception of the word's auditory counterpart in communicative situations. The link demonstrated between visual activity and auditory speech perception indicates that visuoauditory synergy is crucial for cross-modal plasticity and fostering speech-comprehension recovery in adult cochlear-implanted deaf patients.

  14. Building Languages

    MedlinePlus

    Topics covered include American Sign Language (ASL), Conceptually Accurate Signed English (CASE), Cued Speech, Finger Spelling, and Listening/Auditory Training.

  15. A multi-views multi-learners approach towards dysarthric speech recognition using multi-nets artificial neural networks.

    PubMed

    Shahamiri, Seyed Reza; Salim, Siti Salwah Binti

    2014-09-01

    Automatic speech recognition (ASR) can be very helpful for speakers who suffer from dysarthria, a neurological disability that damages the control of motor speech articulators. Although a few attempts have been made to apply ASR technologies to sufferers of dysarthria, previous studies show that such ASR systems have not attained an adequate level of performance. In this study, a dysarthric multi-networks speech recognizer (DM-NSR) model is provided using a realization of the multi-views multi-learners approach called multi-nets artificial neural networks, which tolerates the variability of dysarthric speech. In particular, the DM-NSR model employs several ANNs (as learners) to approximate the likelihood of ASR vocabulary words and to deal with the complexity of dysarthric speech. The proposed DM-NSR approach was evaluated in both speaker-dependent and speaker-independent paradigms. In order to highlight the performance of the proposed model over legacy models, multi-views single-learner versions of the DM-NSRs were also provided and their efficiencies were compared in detail. Moreover, a comparison among the prominent dysarthric ASR methods and the proposed one is provided. The results show that the DM-NSR improved the recognition rate by up to 24.67% and reduced the error rate by up to 8.63% relative to the reference model.
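
    A minimal sketch of the multi-nets idea under stated assumptions: each vocabulary word gets its own small network whose output approximates that word's likelihood, and the highest-scoring network wins. scikit-learn's MLPClassifier stands in for the paper's ANNs, and random vectors stand in for real acoustic features.

```python
# Multi-nets sketch: one small one-vs-rest network per vocabulary word;
# recognition picks the word whose network reports the highest likelihood.
# scikit-learn and the random features are stand-ins, not the paper's setup.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
VOCAB = ["yes", "no", "help"]

X = rng.normal(size=(300, 13))             # e.g., 13 MFCCs per utterance
y = rng.integers(0, len(VOCAB), size=300)  # word labels

nets = {}
for i, word in enumerate(VOCAB):
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    net.fit(X, (y == i).astype(int))       # binary: "this word" vs "rest"
    nets[word] = net

def recognize(features):
    scores = {w: nets[w].predict_proba([features])[0, 1] for w in VOCAB}
    return max(scores, key=scores.get), scores

word, scores = recognize(X[0])
print(word, scores)
```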

  16. Automated Intelligibility Assessment of Pathological Speech Using Phonological Features

    NASA Astrophysics Data System (ADS)

    Middag, Catherine; Martens, Jean-Pierre; Van Nuffelen, Gwen; De Bodt, Marc

    2009-12-01

    It is commonly acknowledged that word or phoneme intelligibility is an important criterion in the assessment of the communication efficiency of a pathological speaker. People have therefore put a lot of effort into the design of perceptual intelligibility rating tests. These tests usually have the drawback that they employ unnatural speech material (e.g., nonsense words) and that they cannot fully exclude errors due to listener bias. Therefore, there is a growing interest in the application of objective automatic speech recognition technology to automate the intelligibility assessment. Current research is headed towards the design of automated methods that can be shown to produce ratings corresponding well with those emerging from a well-designed and well-performed perceptual test. In this paper, a novel methodology that builds on previous work (Middag et al., 2008) is presented. It utilizes phonological features, automatic speech alignment based on acoustic models trained on normal speech, context-dependent speaker feature extraction, and intelligibility prediction based on a small model that can be trained on pathological speech samples. The experimental evaluation of the new system reveals that the root-mean-squared error of the discrepancies between perceived and computed intelligibilities can be as low as 8 on a scale of 0 to 100.
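
    A compact sketch of the pipeline's final stage under stated assumptions: a small model maps speaker-level feature vectors to a 0-100 intelligibility score and is judged by root-mean-squared error against perceptual ratings. Synthetic features and ridge regression via scikit-learn are illustrative stand-ins, not the paper's implementation.

```python
# Final stage of an intelligibility pipeline, in miniature: regress
# speaker-level phonological features onto 0-100 perceptual scores and
# report RMSE. Data are synthetic; real features would come from alignment.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 20))                    # phonological feature vectors
w = rng.normal(size=20)
scores = np.clip(60 + 3 * (X @ w) + rng.normal(scale=4, size=120), 0, 100)

X_tr, X_te, y_tr, y_te = train_test_split(X, scores, random_state=1)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)          # small model, few samples
rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print(f"RMSE on held-out speakers: {rmse:.1f} on a 0-100 scale")
```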

  17. RecoverNow: Feasibility of a Mobile Tablet-Based Rehabilitation Intervention to Treat Post-Stroke Communication Deficits in the Acute Care Setting

    PubMed Central

    Corbett, Dale; Finestone, Hillel M.; Hatcher, Simon; Lumsden, Jim; Momoli, Franco; Shamy, Michel C. F.; Stotts, Grant; Swartz, Richard H.; Yang, Christine

    2016-01-01

    Background Approximately 40% of patients diagnosed with stroke experience some degree of aphasia. With limited health care resources, patients’ access to speech and language therapies is often delayed. We propose using mobile-platform technology to initiate early speech-language therapy in the acute care setting. For this pilot, our objective was to assess the feasibility of a tablet-based speech-language therapy for patients with communication deficits following acute stroke. Methods We enrolled consecutive patients admitted with a stroke and communication deficits with NIHSS score ≥1 on the best language and/or dysarthria parameters. We excluded patients with severe comprehension deficits where communication was not possible. Following baseline assessment by a speech-language pathologist (SLP), patients were provided with a mobile tablet programmed with individualized therapy applications based on the assessment, and instructed to use it for at least one hour per day. Our objective was to establish feasibility by measuring recruitment rate, adherence rate, retention rate, protocol deviations and acceptability. Results Over 6 months, 143 patients were admitted with a new diagnosis of stroke: 73 had communication deficits, 44 met inclusion criteria, and 30 were enrolled into RecoverNow (median age 62, 26.6% female) for a recruitment rate of 68% of eligible participants. Participants received mobile tablets at a mean 6.8 days from admission [SEM 1.6], and used them for a mean 149.8 minutes/day [SEM 19.1]. In-hospital retention rate was 97%, and 96% of patients rated the mobile tablet-based communication therapy as at least moderately convenient (3/5 or better, with 5/5 being “most convenient”). Conclusions Individualized speech-language therapy delivered by mobile tablet technology is feasible in acute care. PMID:28002479

  18. RecoverNow: Feasibility of a Mobile Tablet-Based Rehabilitation Intervention to Treat Post-Stroke Communication Deficits in the Acute Care Setting.

    PubMed

    Mallet, Karen H; Shamloul, Rany M; Corbett, Dale; Finestone, Hillel M; Hatcher, Simon; Lumsden, Jim; Momoli, Franco; Shamy, Michel C F; Stotts, Grant; Swartz, Richard H; Yang, Christine; Dowlatshahi, Dar

    2016-01-01

    Approximately 40% of patients diagnosed with stroke experience some degree of aphasia. With limited health care resources, patients' access to speech and language therapies is often delayed. We propose using mobile-platform technology to initiate early speech-language therapy in the acute care setting. For this pilot, our objective was to assess the feasibility of a tablet-based speech-language therapy for patients with communication deficits following acute stroke. We enrolled consecutive patients admitted with a stroke and communication deficits with NIHSS score ≥1 on the best language and/or dysarthria parameters. We excluded patients with severe comprehension deficits where communication was not possible. Following baseline assessment by a speech-language pathologist (SLP), patients were provided with a mobile tablet programmed with individualized therapy applications based on the assessment, and instructed to use it for at least one hour per day. Our objective was to establish feasibility by measuring recruitment rate, adherence rate, retention rate, protocol deviations and acceptability. Over 6 months, 143 patients were admitted with a new diagnosis of stroke: 73 had communication deficits, 44 met inclusion criteria, and 30 were enrolled into RecoverNow (median age 62, 26.6% female) for a recruitment rate of 68% of eligible participants. Participants received mobile tablets at a mean 6.8 days from admission [SEM 1.6], and used them for a mean 149.8 minutes/day [SEM 19.1]. In-hospital retention rate was 97%, and 96% of patients rated the mobile tablet-based communication therapy as at least moderately convenient (3/5 or better, with 5/5 being "most convenient"). Individualized speech-language therapy delivered by mobile tablet technology is feasible in acute care.

  19. Smartphones as multimodal communication devices to facilitate clinical knowledge processes: randomized controlled trial.

    PubMed

    Pimmer, Christoph; Mateescu, Magdalena; Zahn, Carmen; Genewein, Urs

    2013-11-27

    Despite the widespread use and advancements of mobile technology that facilitate rich communication modes, there is little evidence demonstrating the value of smartphones for effective interclinician communication and knowledge processes. The objective of this study was to determine the effects of different synchronous smartphone-based modes of communication, such as (1) speech only, (2) speech and images, and (3) speech, images, and image annotation (guided noticing) on the recall and transfer of visually and verbally represented medical knowledge. The experiment was conducted from November 2011 to May 2012 at the University Hospital Basel (Switzerland) with 42 medical students in a master's program. All participants analyzed a standardized case (a patient with a subcapital fracture of the fifth metacarpal bone) based on a radiological image, photographs of the hand, and textual descriptions, and were asked to consult a remote surgical specialist via a smartphone. Participants were randomly assigned to 3 experimental conditions/groups. In group 1, the specialist provided verbal explanations (speech only). In group 2, the specialist provided verbal explanations and displayed the radiological image and the photographs to the participants (speech and images). In group 3, the specialist provided verbal explanations, displayed the radiological image and the photographs, and annotated the radiological image by drawing structures/angle elements (speech, images, and image annotation). To assess knowledge recall, participants were asked to write brief summaries of the case (verbally represented knowledge) after the consultation and to re-analyze the diagnostic images (visually represented knowledge). To assess knowledge transfer, participants analyzed a similar case without specialist support. Data analysis by ANOVA found that participants in groups 2 and 3 (images used) evaluated the support provided by the specialist as significantly more positive than group 1, the speech-only group (group 1: mean 4.08, SD 0.90; group 2: mean 4.73, SD 0.59; group 3: mean 4.93, SD 0.25; F(2,39)=6.76, P=.003; partial η²=0.26, 1-β=.90). However, significant positive effects on the recall and transfer of visually represented medical knowledge were only observed when the smartphone-based communication involved the combination of speech, images, and image annotation (group 3). There were no significant positive effects on the recall and transfer of visually represented knowledge between group 1 (speech only) and group 2 (speech and images). No significant differences were observed between the groups regarding verbally represented medical knowledge. The results show (1) the value of annotation functions for digital and mobile technology for interclinician communication and medical informatics, and (2) the use of guided noticing (the integration of speech, images, and image annotation) leads to significantly improved knowledge gains for visually represented knowledge. This is particularly valuable in situations involving complex visual subject matters, typical in clinical practice.

  20. Smartphones as Multimodal Communication Devices to Facilitate Clinical Knowledge Processes: Randomized Controlled Trial

    PubMed Central

    Mateescu, Magdalena; Zahn, Carmen; Genewein, Urs

    2013-01-01

    Background Despite the widespread use and advancements of mobile technology that facilitate rich communication modes, there is little evidence demonstrating the value of smartphones for effective interclinician communication and knowledge processes. Objective The objective of this study was to determine the effects of different synchronous smartphone-based modes of communication, such as (1) speech only, (2) speech and images, and (3) speech, images, and image annotation (guided noticing) on the recall and transfer of visually and verbally represented medical knowledge. Methods The experiment was conducted from November 2011 to May 2012 at the University Hospital Basel (Switzerland) with 42 medical students in a master’s program. All participants analyzed a standardized case (a patient with a subcapital fracture of the fifth metacarpal bone) based on a radiological image, photographs of the hand, and textual descriptions, and were asked to consult a remote surgical specialist via a smartphone. Participants were randomly assigned to 3 experimental conditions/groups. In group 1, the specialist provided verbal explanations (speech only). In group 2, the specialist provided verbal explanations and displayed the radiological image and the photographs to the participants (speech and images). In group 3, the specialist provided verbal explanations, displayed the radiological image and the photographs, and annotated the radiological image by drawing structures/angle elements (speech, images, and image annotation). To assess knowledge recall, participants were asked to write brief summaries of the case (verbally represented knowledge) after the consultation and to re-analyze the diagnostic images (visually represented knowledge). To assess knowledge transfer, participants analyzed a similar case without specialist support. Results Data analysis by ANOVA found that participants in groups 2 and 3 (images used) evaluated the support provided by the specialist as significantly more positive than group 1, the speech-only group (group 1: mean 4.08, SD 0.90; group 2: mean 4.73, SD 0.59; group 3: mean 4.93, SD 0.25; F(2,39)=6.76, P=.003; partial η²=0.26, 1-β=.90). However, significant positive effects on the recall and transfer of visually represented medical knowledge were only observed when the smartphone-based communication involved the combination of speech, images, and image annotation (group 3). There were no significant positive effects on the recall and transfer of visually represented knowledge between group 1 (speech only) and group 2 (speech and images). No significant differences were observed between the groups regarding verbally represented medical knowledge. Conclusions The results show (1) the value of annotation functions for digital and mobile technology for interclinician communication and medical informatics, and (2) the use of guided noticing (the integration of speech, images, and image annotation) leads to significantly improved knowledge gains for visually represented knowledge. This is particularly valuable in situations involving complex visual subject matters, typical in clinical practice. PMID:24284080

  1. Conversing with Computers

    NASA Technical Reports Server (NTRS)

    2004-01-01

    I/NET, Inc., is making the dream of natural human-computer conversation a practical reality. Through a combination of advanced artificial intelligence research and practical software design, I/NET has taken the complexity out of developing advanced, natural language interfaces. Conversational capabilities like pronoun resolution, anaphora and ellipsis processing, and dialog management that were once available only in the laboratory can now be brought to any application with any speech recognition system using I/NET's conversational engine middleware.

  2. Impact of dynamic rate coding aspects of mobile phone networks on forensic voice comparison.

    PubMed

    Alzqhoul, Esam A S; Nair, Balamurali B T; Guillemin, Bernard J

    2015-09-01

    Previous studies have shown that landline and mobile phone networks differ in their ways of handling the speech signal, and therefore in their impact on it. But the same is also true of the different networks within the mobile phone arena. There are two major mobile phone technologies currently in use today, namely the global system for mobile communications (GSM) and code division multiple access (CDMA), and these are fundamentally different in their design. For example, the quality of the coded speech in the GSM network is a function of channel quality, whereas in the CDMA network it is determined by channel capacity (i.e., the number of users sharing a cell site). This paper examines the impact on the speech signal of a key feature of these networks, namely dynamic rate coding, and its subsequent impact on the task of likelihood-ratio-based forensic voice comparison (FVC). Surprisingly, both FVC accuracy and precision are found to be better for both GSM- and CDMA-coded speech than for uncoded speech. Intuitively one expects FVC accuracy to increase with increasing coded-speech quality. This trend is shown to occur for the CDMA network, but, surprisingly, not for the GSM network. Further, with respect to comparisons between these two networks, FVC accuracy for CDMA-coded speech is shown to be slightly better than for GSM-coded speech, particularly when the coded-speech quality is high, but in terms of FVC precision the two networks are shown to be very similar. Copyright © 2015 The Chartered Society of Forensic Sciences. Published by Elsevier Ireland Ltd. All rights reserved.
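
    For readers unfamiliar with likelihood-ratio-based FVC, the toy sketch below shows the core computation: an evidence score is evaluated under a same-speaker model and a different-speaker model, and the ratio of the two likelihoods quantifies the support for each hypothesis. Real systems use multivariate acoustic features and much richer models; the univariate Gaussians and numbers here are invented.

```python
# Toy likelihood-ratio computation for forensic voice comparison.
# Both score distributions and the evidence value are invented.
from statistics import NormalDist

same_speaker = NormalDist(mu=0.5, sigma=0.8)   # scores when recordings match
diff_speaker = NormalDist(mu=3.0, sigma=1.2)   # scores when they do not

def likelihood_ratio(score: float) -> float:
    return same_speaker.pdf(score) / diff_speaker.pdf(score)

evidence = 1.1  # e.g., a distance between suspect and trace recordings
lr = likelihood_ratio(evidence)
print(f"LR = {lr:.2f} ({'favours' if lr > 1 else 'disfavours'} the same-speaker hypothesis)")
```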

  3. A development of intelligent entertainment robot for home life

    NASA Astrophysics Data System (ADS)

    Kim, Cheoltaek; Lee, Ju-Jang

    2005-12-01

    The purpose of this paper was to present the study and design idea for an entertainment robot with an educational purpose (IRFEE). The robot has been designed for home life with dependability and interaction in mind. The development had three objectives: (1) build an autonomous robot; (2) design the robot for mobility and robustness; and (3) develop the robot's interface and software around entertainment and education functionalities. Autonomous navigation was implemented with active-vision-based SLAM and a modified EPF algorithm. The two differential wheels and the pan-tilt unit were designed for mobility and robustness, and the exterior was designed with aesthetic elements in mind while minimizing interference. The speech and tracking algorithms provide a good interface with humans. Image transfer and an Internet site connection are needed for remote-connection services and educational purposes.

  4. Comparison of Classification Methods for P300 Brain-Computer Interface on Disabled Subjects

    PubMed Central

    Manyakov, Nikolay V.; Chumerin, Nikolay; Combaz, Adrien; Van Hulle, Marc M.

    2011-01-01

    We report on tests with a mind typing paradigm based on a P300 brain-computer interface (BCI) on a group of amyotrophic lateral sclerosis (ALS), middle cerebral artery (MCA) stroke, and subarachnoid hemorrhage (SAH) patients, suffering from motor and speech disabilities. We investigate the achieved typing accuracy given the individual patient's disorder, and how it correlates with the type of classifier used. We considered 7 types of classifiers, linear as well as nonlinear ones, and found that, overall, one type of linear classifier yielded a higher classification accuracy. In addition to the selection of the classifier, we also suggest and discuss a number of recommendations to be considered when building a P300-based typing system for disabled subjects. PMID:21941530
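
    In the spirit of the comparison reported above, the sketch below cross-validates one linear and one nonlinear classifier on synthetic target/non-target P300 epoch features. scikit-learn is an assumption of the sketch, and the injected "P300" signature is artificial.

```python
# Linear vs nonlinear classification of synthetic P300 epochs, mirroring
# the kind of comparison described above. Data and effect size are invented.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 64))        # epochs x (channels * time) features
y = rng.integers(0, 2, size=400)      # 1 = attended flash (P300 expected)
X[y == 1, :8] += 0.6                  # inject a weak artificial "P300"

classifiers = {
    "LDA (linear)": LinearDiscriminantAnalysis(),
    "SVM-RBF (nonlinear)": SVC(kernel="rbf"),
}
for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy {acc:.2f}")
```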

  5. An Architectural Experience for Interface Design

    ERIC Educational Resources Information Center

    Gong, Susan P.

    2016-01-01

    The problem of human-computer interface design was brought to the foreground with the emergence of the personal computer, the increasing complexity of electronic systems, and the need to accommodate the human operator in these systems. With each new technological generation discovering the interface design problems of its own technologies, initial…

  6. Conduction Aphasia, Sensory-Motor Integration, and Phonological Short-term Memory – an Aggregate analysis of Lesion and fMRI data

    PubMed Central

    Buchsbaum, Bradley R.; Baldo, Juliana; Okada, Kayoko; Berman, Karen F.; Dronkers, Nina; D’Esposito, Mark; Hickok, Gregory

    2011-01-01

    Conduction aphasia is a language disorder characterized by frequent speech errors, impaired verbatim repetition, a deficit in phonological short-term memory, and naming difficulties in the presence of otherwise fluent and grammatical speech output. While traditional models of conduction aphasia have typically implicated white matter pathways, recent advances in lesion reconstruction methodology applied to groups of patients have implicated left temporoparietal zones. Parallel work using functional magnetic resonance imaging (fMRI) has pinpointed a region in the most posterior portion of the left planum temporale, area Spt, which is critical for phonological working memory. Here we show that the region of maximal lesion overlap in a sample of 14 patients with conduction aphasia perfectly circumscribes area Spt, as defined in an aggregate fMRI analysis of 105 subjects performing a phonological working memory task. We provide a review of the evidence supporting the idea that Spt is an interface site for the integration of sensory and vocal tract-related motor representations of complex sound sequences, such as speech and music, and show how the symptoms of conduction aphasia can be explained by damage to this system. PMID:21256582

  7. Conduction aphasia, sensory-motor integration, and phonological short-term memory - an aggregate analysis of lesion and fMRI data.

    PubMed

    Buchsbaum, Bradley R; Baldo, Juliana; Okada, Kayoko; Berman, Karen F; Dronkers, Nina; D'Esposito, Mark; Hickok, Gregory

    2011-12-01

    Conduction aphasia is a language disorder characterized by frequent speech errors, impaired verbatim repetition, a deficit in phonological short-term memory, and naming difficulties in the presence of otherwise fluent and grammatical speech output. While traditional models of conduction aphasia have typically implicated white matter pathways, recent advances in lesion reconstruction methodology applied to groups of patients have implicated left temporoparietal zones. Parallel work using functional magnetic resonance imaging (fMRI) has pinpointed a region in the most posterior portion of the left planum temporale, area Spt, which is critical for phonological working memory. Here we show that the region of maximal lesion overlap in a sample of 14 patients with conduction aphasia perfectly circumscribes area Spt, as defined in an aggregate fMRI analysis of 105 subjects performing a phonological working memory task. We provide a review of the evidence supporting the idea that Spt is an interface site for the integration of sensory and vocal tract-related motor representations of complex sound sequences, such as speech and music, and show how the symptoms of conduction aphasia can be explained by damage to this system. 2011 Elsevier Inc. All rights reserved.

  8. Robust Recognition of Loud and Lombard speech in the Fighter Cockpit Environment

    DTIC Science & Technology

    1988-08-01

    ...the latter as inter-speaker variability. According to Zue [Z85], inter-speaker variabilities can be attributed to sociolinguistic background, dialect... Journal of the Acoustical Society of America, Vol. 50, 1971. [At74] B. S. Atal, "Linear prediction for speaker identification," Journal of the Acoustical Society of America, Vol. 55, 1974. [B77] B. Beek, E. P. Neuberg, and D. C. Hodge, "An Assessment of the Technology of Automatic Speech Recognition for...

  9. Text to Speech (TTS) Capabilities for the Common Driver Trainer (CDT)

    DTIC Science & Technology

    2010-10-01

    ...including Julie, Kate, and Paul. Based upon the names of the voices, it may be that the VoiceText capability is the technology being used currently on... DFTTSExportToFileEx(0, "Paul", 1, 1033, "Testing the Digital Future Text-to-Speech SDK.", -1, -1, -1, -1, -1, DFTTS_TEXT_TYPE_XML, "test.wav", 0, "", -1...

  10. Processing Techniques for Intelligibility Improvement to Speech with Co-Channel Interference.

    DTIC Science & Technology

    1983-09-01

    ...processing was found to be always less than in the original unprocessed co-channel signal; also, as the length of the comb filter increased, the... (Processing Techniques for Intelligibility Improvement to Speech with Co-Channel Interference, Signal Technology Inc., Goleta, CA, B. A. Hanson et al., Report R.83-225, September 1983.)

  11. MEMS capacitive accelerometer-based middle ear microphone.

    PubMed

    Young, Darrin J; Zurcher, Mark A; Semaan, Maroun; Megerian, Cliff A; Ko, Wen H

    2012-12-01

    The design, implementation, and characterization of a microelectromechanical systems (MEMS) capacitive accelerometer-based middle ear microphone are presented in this paper. The microphone is intended for middle ear hearing aids as well as future fully implantable cochlear prostheses. Acoustic response characterization results from human temporal bones are used to derive the accelerometer design requirements. The prototype accelerometer is fabricated in a commercial silicon-on-insulator (SOI) MEMS process. The sensor occupies a sensing area of 1 mm × 1 mm with a chip area of 2 mm × 2.4 mm and is interfaced with a custom-designed low-noise electronic IC chip over a flexible substrate. The packaged sensor unit occupies an area of 2.5 mm × 6.2 mm with a weight of 25 mg. The sensor unit attached to the umbo can detect a sound pressure level (SPL) of 60 dB at 500 Hz, 35 dB at 2 kHz, and 57 dB at 8 kHz. An improved sound detection limit of 34-dB SPL at 150 Hz and 24-dB SPL at 500 Hz can be expected by employing state-of-the-art MEMS fabrication technology, which results in an articulation index of approximately 0.76. Further micro/nanofabrication technology advancement is needed to enhance the microphone sensitivity for improved understanding of normal conversational speech.
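
    To make those detection limits concrete, the short snippet below converts them from dB SPL to RMS sound pressure using the standard 20 µPa reference; the conversion is textbook acoustics rather than anything taken from the paper.

```python
# Convert the reported detection limits from dB SPL to RMS pressure.
P_REF = 20e-6  # Pa, standard reference pressure

def spl_to_pascal(db_spl: float) -> float:
    return P_REF * 10 ** (db_spl / 20)

for freq_hz, db in [(500, 60), (2000, 35), (8000, 57)]:
    print(f"{freq_hz} Hz: {db} dB SPL = {spl_to_pascal(db) * 1e3:.2f} mPa")
```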

  12. Expert Perspectives on Using Mainstream Mobile Technology for School-Age Children Who Require Augmentative and Alternative Communication (AAC): A Policy Delphi Study

    ERIC Educational Resources Information Center

    Nguyen, Vinh-An

    2017-01-01

    Despite legislation in the U.S.A. requiring the use of assistive technology in special education, there remains an underutilization of technology-based speech intervention for young students who require augmentative and alternative communication (AAC). The purpose of this Policy Delphi study was to address three guiding research questions that…

  13. A Literature Review on Operator Interface Technologies for Network Enabled Operational Environments Using Complex System Analysis

    DTIC Science & Technology

    2009-05-30

    ...interface design approaches based on behavior and psychology, as well as methods for designing and implementing multi-agent interfaces. We have...network-centric environments. These technologies include interface design approaches based on behavior and psychology, as well as...

  14. Therapists' Perspectives: Supporting Children to Use Switches and Technology for Accessing Their Environment, Leisure, and Communication

    ERIC Educational Resources Information Center

    Beauchamp, Fiona; Bourke-Taylor, Helen; Brown, Ted

    2018-01-01

    Background: Many children with cerebral palsy learn to use technology to access their environments and communicate; however, minimal research informs practice. Methods: A descriptive qualitative study with purposive sampling recruited 10 therapists (occupational, speech, and physiotherapists) from one early intervention service. Data were…

  15. Nurturing Our Spiritual Imagination in an Age of Science and Technology.

    ERIC Educational Resources Information Center

    Lear, Norman

    1989-01-01

    Addresses the issue of spiritual needs in the face of a materialistic, technological, self-aggrandizing culture in a speech to the American Academy of Religion. Urges religious educators to point society in the direction of environmental awareness, reintegrating spirituality with rationality. Sees the vital role religion plays in helping students…

  16. Evaluating the Feasibility of Using Remote Technology for Cochlear Implants

    ERIC Educational Resources Information Center

    Goehring, Jenny L.; Hughes, Michelle L.; Baudhuin, Jacquelyn L.

    2012-01-01

    The use of remote technology to provide cochlear implant services has gained popularity in recent years. This article contains a review of research evaluating the feasibility of remote service delivery for recipients of cochlear implants. To date, published studies have determined that speech-processor programming levels and other objective tests…

  17. Selling health data: de-identification, privacy, and speech.

    PubMed

    Kaplan, Bonnie

    2015-07-01

    Two court cases that involve selling prescription data for pharmaceutical marketing affect biomedical informatics, patient and clinician privacy, and regulation. Sorrell v. IMS Health Inc. et al. in the United States and R v. Department of Health, Ex Parte Source Informatics Ltd. in the United Kingdom concern privacy and health data protection, data de-identification and reidentification, drug detailing (marketing), commercial benefit from the required disclosure of personal information, clinician privacy and the duty of confidentiality, beneficial and unsavory uses of health data, regulating health technologies, and considering data as speech. Individuals should, at the very least, be aware of how data about them are collected and used. Taking account of how those data are used is needed so societal norms and law evolve ethically as new technologies affect health data privacy and protection.

  18. Interface between Education and Technology: Australia. Education and Polity 1.

    ERIC Educational Resources Information Center

    Birch, Ian; And Others

    The first of three main sections in this review of research covers current and recent developments in the interfacing of education and technology in Australia, with particular attention paid to policy initiatives adopted by governments, industry, academic institutions, and the community with respect to the interface. The second part reviews…

  19. Using leap motion to investigate the emergence of structure in speech and language.

    PubMed

    Eryilmaz, Kerem; Little, Hannah

    2017-10-01

    In evolutionary linguistics, experiments using artificial signal spaces are being used to investigate the emergence of speech structure. These signal spaces need to be continuous, non-discretized spaces from which discrete units and patterns can emerge. They need to be dissimilar from, but comparable with, the vocal tract, in order to minimize interference from pre-existing linguistic knowledge, while informing us about language. This is a hard balance to strike. This article outlines a new approach that uses the Leap Motion, an infrared controller that can convert manual movement in 3D space into sound. The signal space using this approach is more flexible than signal spaces in previous attempts. Further, output data using this approach is simpler to arrange and analyze. The experimental interface was built using free, and mostly open-source, libraries in Python. We provide our source code for other researchers as open source.
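
    A minimal sketch of the core mapping such an interface needs: one continuous 3D hand sample becomes a short frame of audio, with each axis steering one acoustic dimension. The mapping choices below are invented for illustration; in a working setup the (x, y, z) values would arrive from the Leap Motion SDK every frame.

```python
# Map a 3-D hand position to one short audio frame: x -> pitch,
# y -> loudness, z -> brightness. Mapping and ranges are invented.
import numpy as np

SR = 44100  # audio sample rate

def frame_to_audio(x, y, z, dur=0.05):
    freq = 200 + 600 * np.clip(x, 0, 1)   # 200-800 Hz fundamental
    amp = np.clip(y, 0, 1)
    mix = np.clip(z, 0, 1)                # blend in a second harmonic
    t = np.linspace(0, dur, int(SR * dur), endpoint=False)
    tone = (1 - mix) * np.sin(2 * np.pi * freq * t) \
         + mix * np.sin(2 * np.pi * 2 * freq * t)
    return amp * tone

audio = frame_to_audio(0.3, 0.8, 0.1)
print(audio.shape)  # one 50 ms frame; stream successive frames to a sound device
```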

  20. A survey of the state-of-the-art and focused research in range systems, task 1

    NASA Technical Reports Server (NTRS)

    Omura, J. K.

    1986-01-01

    This final report presents the latest research activity in voice compression. We have designed a non-real-time simulation system implemented around the IBM-PC, which is used as a speech workstation for data acquisition and analysis of voice samples. A real-time implementation is also proposed. This real-time Voice Compression Board (VCB) is built around the Texas Instruments TMS-3220. The voice compression algorithm investigated here was described in an earlier report titled Low Cost Voice Compression for Mobile Digital Radios, by the author. We will assume the reader is familiar with the voice compression algorithm discussed in that report. The VCB compresses speech waveforms at data rates ranging from 4.8 kbps to 16 kbps. This board interfaces to the IBM-PC 8-bit bus and plugs into a single expansion slot on the motherboard.
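
    For context, a quick calculation of what that range implies relative to 64 kbps toll-quality PCM (a standard reference figure, not one stated in the report):

```python
# Compression ratios of the VCB's data rates relative to 64 kbps PCM.
PCM_KBPS = 64.0
for rate_kbps in (4.8, 9.6, 16.0):
    print(f"{rate_kbps:>5.1f} kbps -> {PCM_KBPS / rate_kbps:4.1f}:1 compression")
```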
