A centralized audio presentation manager
DOE Office of Scientific and Technical Information (OSTI.GOV)
Papp, A.L. III; Blattner, M.M.
1994-05-16
The centralized audio presentation manager addresses the problems which occur when multiple programs running simultaneously attempt to use the audio output of a computer system. Time dependence of sound means that certain auditory messages must be scheduled simultaneously, which can lead to perceptual problems due to psychoacoustic phenomena. Furthermore, the combination of speech and nonspeech audio is examined; each presents its own problems of perceptibility in an acoustic environment composed of multiple auditory streams. The centralized audio presentation manager receives abstract parameterized message requests from the currently running programs, and attempts to create and present a sonic representation in themore » most perceptible manner through the use of a theoretically and empirically designed rule set.« less
Bertollo, David N; Alexander, Mary Jane; Shinn, Marybeth; Aybar, Jalila B
2007-06-01
This column describes the nonproprietary software Talker, used to adapt screening instruments to audio computer-assisted self-interviewing (ACASI) systems for low-literacy populations and other populations. Talker supports ease of programming, multiple languages, on-site scoring, and the ability to update a central research database. Key features include highly readable text display, audio presentation of questions and audio prompting of answers, and optional touch screen input. The scripting language for adapting instruments is briefly described as well as two studies in which respondents provided positive feedback on its use.
TECHNICAL NOTE: Portable audio electronics for impedance-based measurements in microfluidics
NASA Astrophysics Data System (ADS)
Wood, Paul; Sinton, David
2010-08-01
We demonstrate the use of audio electronics-based signals to perform on-chip electrochemical measurements. Cell phones and portable music players are examples of consumer electronics that are easily operated and are ubiquitous worldwide. Audio output (play) and input (record) signals are voltage based and contain frequency and amplitude information. A cell phone, laptop soundcard and two compact audio players are compared with respect to frequency response; the laptop soundcard provides the most uniform frequency response, while the cell phone performance is found to be insufficient. The audio signals in the common portable music players and laptop soundcard operate in the range of 20 Hz to 20 kHz and are found to be applicable, as voltage input and output signals, to impedance-based electrochemical measurements in microfluidic systems. Validated impedance-based measurements of concentration (0.1-50 mM), flow rate (2-120 µL min-1) and particle detection (32 µm diameter) are demonstrated. The prevailing, lossless, wave audio file format is found to be suitable for data transmission to and from external sources, such as a centralized lab, and the cost of all hardware (in addition to audio devices) is ~10 USD. The utility demonstrated here, in combination with the ubiquitous nature of portable audio electronics, presents new opportunities for impedance-based measurements in portable microfluidic systems.
Collecting, Transcribing, Analyzing and Presenting Plurilingual Interactional Data
ERIC Educational Resources Information Center
Moore, Emilee; Llompart, Júlia
2017-01-01
Interactional data is often central to research in plurilingual learning environments. However, getting a grip on the processes of collecting, organizing, transcribing, analyzing and presenting audio and/or visual data is possibly the most exciting, but also one of the most challenging things about learning to do qualitative research. Although the…
Schierholz, Irina; Finke, Mareike; Kral, Andrej; Büchner, Andreas; Rach, Stefan; Lenarz, Thomas; Dengler, Reinhard; Sandmann, Pascale
2017-04-01
There is substantial variability in speech recognition ability across patients with cochlear implants (CIs), auditory brainstem implants (ABIs), and auditory midbrain implants (AMIs). To better understand how this variability is related to central processing differences, the current electroencephalography (EEG) study compared hearing abilities and auditory-cortex activation in patients with electrical stimulation at different sites of the auditory pathway. Three different groups of patients with auditory implants (Hannover Medical School; ABI: n = 6, CI: n = 6; AMI: n = 2) performed a speeded response task and a speech recognition test with auditory, visual, and audio-visual stimuli. Behavioral performance and cortical processing of auditory and audio-visual stimuli were compared between groups. ABI and AMI patients showed prolonged response times on auditory and audio-visual stimuli compared with NH listeners and CI patients. This was confirmed by prolonged N1 latencies and reduced N1 amplitudes in ABI and AMI patients. However, patients with central auditory implants showed a remarkable gain in performance when visual and auditory input was combined, in both speech and non-speech conditions, which was reflected by a strong visual modulation of auditory-cortex activation in these individuals. In sum, the results suggest that the behavioral improvement for audio-visual conditions in central auditory implant patients is based on enhanced audio-visual interactions in the auditory cortex. Their findings may provide important implications for the optimization of electrical stimulation and rehabilitation strategies in patients with central auditory prostheses. Hum Brain Mapp 38:2206-2225, 2017. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Audio-Visual, Visuo-Tactile and Audio-Tactile Correspondences in Preschoolers.
Nava, Elena; Grassi, Massimo; Turati, Chiara
2016-01-01
Interest in crossmodal correspondences has recently seen a renaissance thanks to numerous studies in human adults. Yet, still very little is known about crossmodal correspondences in children, particularly in sensory pairings other than audition and vision. In the current study, we investigated whether 4-5-year-old children match auditory pitch to the spatial motion of visual objects (audio-visual condition). In addition, we investigated whether this correspondence extends to touch, i.e., whether children also match auditory pitch to the spatial motion of touch (audio-tactile condition) and the spatial motion of visual objects to touch (visuo-tactile condition). In two experiments, two different groups of children were asked to indicate which of two stimuli fitted best with a centrally located third stimulus (Experiment 1), or to report whether two presented stimuli fitted together well (Experiment 2). We found sensitivity to the congruency of all of the sensory pairings only in Experiment 2, suggesting that only under specific circumstances can these correspondences be observed. Our results suggest that pitch-height correspondences for audio-visual and audio-tactile combinations may still be weak in preschool children, and speculate that this could be due to immature linguistic and auditory cues that are still developing at age five.
BOLDSync: a MATLAB-based toolbox for synchronized stimulus presentation in functional MRI.
Joshi, Jitesh; Saharan, Sumiti; Mandal, Pravat K
2014-02-15
Precise and synchronized presentation of paradigm stimuli in functional magnetic resonance imaging (fMRI) is central to obtaining accurate information about brain regions involved in a specific task. In this manuscript, we present a new MATLAB-based toolbox, BOLDSync, for synchronized stimulus presentation in fMRI. BOLDSync provides a user friendly platform for design and presentation of visual, audio, as well as multimodal audio-visual (AV) stimuli in functional imaging experiments. We present simulation experiments that demonstrate the millisecond synchronization accuracy of BOLDSync, and also illustrate the functionalities of BOLDSync through application to an AV fMRI study. BOLDSync gains an advantage over other available proprietary and open-source toolboxes by offering a user friendly and accessible interface that affords both precision in stimulus presentation and versatility across various types of stimulus designs and system setups. BOLDSync is a reliable, efficient, and versatile solution for synchronized stimulus presentation in fMRI study. Copyright © 2013 Elsevier B.V. All rights reserved.
Audio-visual presentation of information for informed consent for participation in clinical trials.
Ryan, R E; Prictor, M J; McLaughlin, K J; Hill, S J
2008-01-23
Informed consent is a critical component of clinical research. Different methods of presenting information to potential participants of clinical trials may improve the informed consent process. Audio-visual interventions (presented for example on the Internet, DVD, or video cassette) are one such method. To assess the effects of providing audio-visual information alone, or in conjunction with standard forms of information provision, to potential clinical trial participants in the informed consent process, in terms of their satisfaction, understanding and recall of information about the study, level of anxiety and their decision whether or not to participate. We searched: the Cochrane Consumers and Communication Review Group Specialised Register (searched 20 June 2006); the Cochrane Central Register of Controlled Trials (CENTRAL), The Cochrane Library, issue 2, 2006; MEDLINE (Ovid) (1966 to June week 1 2006); EMBASE (Ovid) (1988 to 2006 week 24); and other databases. We also searched reference lists of included studies and relevant review articles, and contacted study authors and experts. There were no language restrictions. Randomised and quasi-randomised controlled trials comparing audio-visual information alone, or in conjunction with standard forms of information provision (such as written or oral information as usually employed in the particular service setting), with standard forms of information provision alone, in the informed consent process for clinical trials. Trials involved individuals or their guardians asked to participate in a real (not hypothetical) clinical study. Two authors independently assessed studies for inclusion and extracted data. Due to heterogeneity no meta-analysis was possible; we present the findings in a narrative review. We included 4 trials involving data from 511 people. Studies were set in the USA and Canada. Three were randomised controlled trials (RCTs) and the fourth a quasi-randomised trial. Their quality was mixed and results should be interpreted with caution. Considerable uncertainty remains about the effects of audio-visual interventions, compared with standard forms of information provision (such as written or oral information normally used in the particular setting), for use in the process of obtaining informed consent for clinical trials. Audio-visual interventions did not consistently increase participants' levels of knowledge/understanding (assessed in four studies), although one study showed better retention of knowledge amongst intervention recipients. An audio-visual intervention may transiently increase people's willingness to participate in trials (one study), but this was not sustained at two to four weeks post-intervention. Perceived worth of the trial did not appear to be influenced by an audio-visual intervention (one study), but another study suggested that the quality of information disclosed may be enhanced by an audio-visual intervention. Many relevant outcomes including harms were not measured. The heterogeneity in results may reflect the differences in intervention design, content and delivery, the populations studied and the diverse methods of outcome assessment in included studies. The value of audio-visual interventions for people considering participating in clinical trials remains unclear. Evidence is mixed as to whether audio-visual interventions enhance people's knowledge of the trial they are considering entering, and/or the health condition the trial is designed to address; one study showed improved retention of knowledge amongst intervention recipients. The intervention may also have small positive effects on the quality of information disclosed, and may increase willingness to participate in the short-term; however the evidence is weak. There were no data for several primary outcomes, including harms. In the absence of clear results, triallists should continue to explore innovative methods of providing information to potential trial participants. Further research should take the form of high-quality randomised controlled trials, with clear reporting of methods. Studies should conduct content assessment of audio-visual and other innovative interventions for people of differing levels of understanding and education; also for different age and cultural groups. Researchers should assess systematically the effects of different intervention components and delivery characteristics, and should involve consumers in intervention development. Studies should assess additional outcomes relevant to individuals' decisional capacity, using validated tools, including satisfaction; anxiety; and adherence to the subsequent trial protocol.
Rouhani, R; Cronenberger, H; Stein, L; Hannum, W; Reed, A M; Wilhelm, C; Hsiao, H
1995-01-01
This paper describes the design, authoring, and development of interactive, computerized, multimedia clinical simulations in pediatric rheumatology/immunology and related musculoskeletal diseases, the development and implementation of a high speed information management system for their centralized storage and distribution, and analytical methods for evaluating the total system's educational impact on medical students and pediatric residents. An FDDI fiber optic network with client/server/host architecture is the core. The server houses digitized audio, still-image video clips and text files. A host station houses the DB2/2 database containing case-associated labels and information. Cases can be accessed from any workstation via a customized interface in AVA/2 written specifically for this application. OS/2 Presentation Manager controls, written in C, are incorporated into the interface. This interface allows SQL searches and retrievals of cases and case materials. In addition to providing user-directed clinical experiences, this centralized information management system provides designated faculty with the ability to add audio notes and visual pointers to image files. Users may browse through case materials, mark selected ones and download them for utilization in lectures or for editing and converting into 35mm slides.
Minds-On Audio-Guided Activities (MAGA): More than Hearing and Better than Seeing
ERIC Educational Resources Information Center
Hancock, James Brian, II; Fornari, Marco
2012-01-01
Minds-On Audio-Guided Activities (MAGA) are podcast-delivered instruction designed to facilitate learning through all-body experiments. Instruction by MAGA has undergone preliminary testing in an introductory physics course at Central Michigan University. Topics are currently focused on mechanics and range from discovering the differences between…
Space Shuttle Orbiter audio subsystem. [to communication and tracking system
NASA Technical Reports Server (NTRS)
Stewart, C. H.
1978-01-01
The selection of the audio multiplex control configuration for the Space Shuttle Orbiter audio subsystem is discussed and special attention is given to the evaluation criteria of cost, weight and complexity. The specifications and design of the subsystem are described and detail is given to configurations of the audio terminal and audio central control unit (ATU, ACCU). The audio input from the ACCU, at a signal level of -12.2 to 14.8 dBV, nominal range, at 1 kHz, was found to have balanced source impedance and a balanced local impedance of 6000 + or - 600 ohms at 1 kHz, dc isolated. The Lyndon B. Johnson Space Center (JSC) electroacoustic test laboratory, an audio engineering facility consisting of a collection of acoustic test chambers, analyzed problems of speaker and headset performance, multiplexed control data coupled with audio channels, and the Orbiter cabin acoustic effects on the operational performance of voice communications. This system allows technical management and project engineering to address key constraining issues, such as identifying design deficiencies of the headset interface unit and the assessment of the Orbiter cabin performance of voice communications, which affect the subsystem development.
The Personal Hearing System—A Software Hearing Aid for a Personal Communication System
NASA Astrophysics Data System (ADS)
Grimm, Giso; Guilmin, Gwénaël; Poppen, Frank; Vlaming, Marcel S. M. G.; Hohmann, Volker
2009-12-01
A concept and architecture of a personal communication system (PCS) is introduced that integrates audio communication and hearing support for the elderly and hearing-impaired through a personal hearing system (PHS). The concept envisions a central processor connected to audio headsets via a wireless body area network (WBAN). To demonstrate the concept, a prototype PCS is presented that is implemented on a netbook computer with a dedicated audio interface in combination with a mobile phone. The prototype can be used for field-testing possible applications and to reveal possibilities and limitations of the concept of integrating hearing support in consumer audio communication devices. It is shown that the prototype PCS can integrate hearing aid functionality, telephony, public announcement systems, and home entertainment. An exemplary binaural speech enhancement scheme that represents a large class of possible PHS processing schemes is shown to be compatible with the general concept. However, an analysis of hardware and software architectures shows that the implementation of a PCS on future advanced cell phone-like devices is challenging. Because of limitations in processing power, recoding of prototype implementations into fixed point arithmetic will be required and WBAN performance is still a limiting factor in terms of data rate and delay.
ERIC Educational Resources Information Center
Machado, Calixto; Estévez, Mario; Leisman, Gerry; Melillo, Robert; Rodríguez, Rafael; DeFina, Phillip; Hernández, Adrián; Pérez-Nellar, Jesús; Naranjo, Rolando; Chinchilla, Mauricio; Garófalo, Nicolás; Vargas, José; Beltrán, Carlos
2015-01-01
We studied autistics by quantitative EEG spectral and coherence analysis during three experimental conditions: basal, watching a cartoon with audio (V-A), and with muted audio band (VwA). Significant reductions were found for the absolute power spectral density (PSD) in the central region for delta and theta, and in the posterior region for sigma…
Babjack, Destiny L; Cernicky, Brandon; Sobotka, Andrew J; Basler, Lee; Struthers, Devon; Kisic, Richard; Barone, Kimberly; Zuccolotto, Anthony P
2015-09-01
Using differing computer platforms and audio output devices to deliver audio stimuli often introduces (1) substantial variability across labs and (2) variable time between the intended and actual sound delivery (the sound onset latency). Fast, accurate audio onset latencies are particularly important when audio stimuli need to be delivered precisely as part of studies that depend on accurate timing (e.g., electroencephalographic, event-related potential, or multimodal studies), or in multisite studies in which standardization and strict control over the computer platforms used is not feasible. This research describes the variability introduced by using differing configurations and introduces a novel approach to minimizing audio sound latency and variability. A stimulus presentation and latency assessment approach is presented using E-Prime and Chronos (a new multifunction, USB-based data presentation and collection device). The present approach reliably delivers audio stimuli with low latencies that vary by ≤1 ms, independent of hardware and Windows operating system (OS)/driver combinations. The Chronos audio subsystem adopts a buffering, aborting, querying, and remixing approach to the delivery of audio, to achieve a consistent 1-ms sound onset latency for single-sound delivery, and precise delivery of multiple sounds that achieves standard deviations of 1/10th of a millisecond without the use of advanced scripting. Chronos's sound onset latencies are small, reliable, and consistent across systems. Testing of standard audio delivery devices and configurations highlights the need for careful attention to consistency between labs, experiments, and multiple study sites in their hardware choices, OS selections, and adoption of audio delivery systems designed to sidestep the audio latency variability issue.
Audio-visual presentation of information for informed consent for participation in clinical trials.
Synnot, Anneliese; Ryan, Rebecca; Prictor, Megan; Fetherstonhaugh, Deirdre; Parker, Barbara
2014-05-09
Informed consent is a critical component of clinical research. Different methods of presenting information to potential participants of clinical trials may improve the informed consent process. Audio-visual interventions (presented, for example, on the Internet or on DVD) are one such method. We updated a 2008 review of the effects of these interventions for informed consent for trial participation. To assess the effects of audio-visual information interventions regarding informed consent compared with standard information or placebo audio-visual interventions regarding informed consent for potential clinical trial participants, in terms of their understanding, satisfaction, willingness to participate, and anxiety or other psychological distress. We searched: the Cochrane Central Register of Controlled Trials (CENTRAL), The Cochrane Library, issue 6, 2012; MEDLINE (OvidSP) (1946 to 13 June 2012); EMBASE (OvidSP) (1947 to 12 June 2012); PsycINFO (OvidSP) (1806 to June week 1 2012); CINAHL (EbscoHOST) (1981 to 27 June 2012); Current Contents (OvidSP) (1993 Week 27 to 2012 Week 26); and ERIC (Proquest) (searched 27 June 2012). We also searched reference lists of included studies and relevant review articles, and contacted study authors and experts. There were no language restrictions. We included randomised and quasi-randomised controlled trials comparing audio-visual information alone, or in conjunction with standard forms of information provision (such as written or verbal information), with standard forms of information provision or placebo audio-visual information, in the informed consent process for clinical trials. Trials involved individuals or their guardians asked to consider participating in a real or hypothetical clinical study. (In the earlier version of this review we only included studies evaluating informed consent interventions for real studies). Two authors independently assessed studies for inclusion and extracted data. We synthesised the findings using meta-analysis, where possible, and narrative synthesis of results. We assessed the risk of bias of individual studies and considered the impact of the quality of the overall evidence on the strength of the results. We included 16 studies involving data from 1884 participants. Nine studies included participants considering real clinical trials, and eight included participants considering hypothetical clinical trials, with one including both. All studies were conducted in high-income countries.There is still much uncertainty about the effect of audio-visual informed consent interventions on a range of patient outcomes. However, when considered across comparisons, we found low to very low quality evidence that such interventions may slightly improve knowledge or understanding of the parent trial, but may make little or no difference to rate of participation or willingness to participate. Audio-visual presentation of informed consent may improve participant satisfaction with the consent information provided. However its effect on satisfaction with other aspects of the process is not clear. There is insufficient evidence to draw conclusions about anxiety arising from audio-visual informed consent. We found conflicting, very low quality evidence about whether audio-visual interventions took more or less time to administer. No study measured researcher satisfaction with the informed consent process, nor ease of use.The evidence from real clinical trials was rated as low quality for most outcomes, and for hypothetical studies, very low. We note, however, that this was in large part due to poor study reporting, the hypothetical nature of some studies and low participant numbers, rather than inconsistent results between studies or confirmed poor trial quality. We do not believe that any studies were funded by organisations with a vested interest in the results. The value of audio-visual interventions as a tool for helping to enhance the informed consent process for people considering participating in clinical trials remains largely unclear, although trends are emerging with regard to improvements in knowledge and satisfaction. Many relevant outcomes have not been evaluated in randomised trials. Triallists should continue to explore innovative methods of providing information to potential trial participants during the informed consent process, mindful of the range of outcomes that the intervention should be designed to achieve, and balancing the resource implications of intervention development and delivery against the purported benefits of any intervention.More trials, adhering to CONSORT standards, and conducted in settings and populations underserved in this review, i.e. low- and middle-income countries and people with low literacy, would strengthen the results of this review and broaden its applicability. Assessing process measures, such as time taken to administer the intervention and researcher satisfaction, would inform the implementation of audio-visual consent materials.
pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis.
Giannakopoulos, Theodoros
2015-01-01
Audio information plays a rather important role in the increasing digital content that is available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automations and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g. audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures including: feature extraction, classification of audio signals, supervised and unsupervised segmentation and content visualization. pyAudioAnalysis is licensed under the Apache License and is available at GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the wide range of the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has been already used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation and health applications (e.g. monitoring eating habits). The feedback provided from all these particular audio applications has led to practical enhancement of the library.
pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis
Giannakopoulos, Theodoros
2015-01-01
Audio information plays a rather important role in the increasing digital content that is available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automations and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g. audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures including: feature extraction, classification of audio signals, supervised and unsupervised segmentation and content visualization. pyAudioAnalysis is licensed under the Apache License and is available at GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the wide range of the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has been already used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation and health applications (e.g. monitoring eating habits). The feedback provided from all these particular audio applications has led to practical enhancement of the library. PMID:26656189
ERIC Educational Resources Information Center
Diambra, Henry M.; And Others
VIDAC (Video Audio Compressed), a new technology based upon non-real-time transmission of audiovisual information via conventional television systems, has been invented by the Westinghouse Electric Corporation. This system permits time compression, during storage and transmission of the audio component of a still visual-narrative audio…
Full Mesh Audio Conferencing Using the Point-to-Multipoint On-Board Switching Capability of ACTS
NASA Technical Reports Server (NTRS)
Rivett, Mary L.; Sethna, Zubin H.
1996-01-01
The purpose of this paper is to describe an implementation of audio conferencing using the ACTS T1-VSAT network. In particular, this implementation evaluates the use of the on-board switching capability of the satellite as a viable alternative for providing the multipoint connectivity normally provided by terrestrial audio bridge equipment The system that was implemented provides full mesh, full-duplex audio conferencing, with end-to-end voice paths between all participants requiring only a single hop (i.e. 250 msec. delay). Moreover, it addresses the lack of spontaneity in current systems by allowing a user to easily start a conference from any standard telephone handset connected to an ACTS earth station, and quickly add new members to the conference at any time using the 'hook flash' capability. No prior scheduling of resources is required and there is no central point of control, thereby providing the user with the spontaneity desired in audio conference control.
ERIC Educational Resources Information Center
Jongbloed, Harry J. L.
As the fourth part of a comparative study on the administration of audiovisual services in advanced and developing countries, this UNESCO-funded study reports on the African countries of Cameroun, Republic of Central Africa, Dahomey, Gabon, Ghana, Kenya, Libya, Mali, Nigeria, Rwanda, Senegal, Swaziland, Tunisia, Upper Volta and Zambia. Information…
NASA Astrophysics Data System (ADS)
Shimizu, Dominique
Though blended course audio feedback has been associated with several measures of course satisfaction at the postsecondary and graduate levels compared to text feedback, it may take longer to prepare and positive results are largely unverified in K-12 literature. The purpose of this quantitative study was to investigate the time investment and learning impact of audio communications with 228 secondary students in a blended online learning biology unit at a central Florida public high school. A short, individualized audio message regarding the student's progress was given to each student in the audio group; similar text-based messages were given to each student in the text-based group on the same schedule; a control got no feedback. A pretest and posttest were employed to measure learning gains in the three groups. To compare the learning gains in two types of feedback with each other and to no feedback, a controlled, randomized, experimental design was implemented. In addition, the creation and posting of audio and text feedback communications were timed in order to assess whether audio feedback took longer to produce than text only feedback. While audio feedback communications did take longer to create and post, there was no difference between learning gains as measured by posttest scores when student received audio, text-based, or no feedback. Future studies using a similar randomized, controlled experimental design are recommended to verify these results and test whether the trend holds in a broader range of subjects, over different time frames, and using a variety of assessment types to measure student learning.
ERIC Educational Resources Information Center
Udo, J. P.; Acevedo, B.; Fels, D. I.
2010-01-01
Audio description (AD) has been introduced as one solution for providing people who are blind or have low vision with access to live theatre, film and television content. However, there is little research to inform the process, user preferences and presentation style. We present a study of a single live audio-described performance of Hart House…
Video content parsing based on combined audio and visual information
NASA Astrophysics Data System (ADS)
Zhang, Tong; Kuo, C.-C. Jay
1999-08-01
While previous research on audiovisual data segmentation and indexing primarily focuses on the pictorial part, significant clues contained in the accompanying audio flow are often ignored. A fully functional system for video content parsing can be achieved more successfully through a proper combination of audio and visual information. By investigating the data structure of different video types, we present tools for both audio and visual content analysis and a scheme for video segmentation and annotation in this research. In the proposed system, video data are segmented into audio scenes and visual shots by detecting abrupt changes in audio and visual features, respectively. Then, the audio scene is categorized and indexed as one of the basic audio types while a visual shot is presented by keyframes and associate image features. An index table is then generated automatically for each video clip based on the integration of outputs from audio and visual analysis. It is shown that the proposed system provides satisfying video indexing results.
One Message, Many Voices: Mobile Audio Counselling in Health Education.
Pimmer, Christoph; Mbvundula, Francis
2018-01-01
Health workers' use of counselling information on their mobile phones for health education is a central but little understood phenomenon in numerous mobile health (mHealth) projects in Sub-Saharan Africa. Drawing on empirical data from an interpretive case study in the setting of the Millennium Villages Project in rural Malawi, this research investigates the ways in which community health workers (CHWs) perceive that audio-counselling messages support their health education practice. Three main themes emerged from the analysis: phone-aided audio counselling (1) legitimises the CHWs' use of mobile phones during household visits; (2) helps CHWs to deliver a comprehensive counselling message; (3) supports CHWs in persuading communities to change their health practices. The findings show the complexity and interplay of the multi-faceted, sociocultural, political, and socioemotional meanings associated with audio-counselling use. Practical implications and the demand for further research are discussed.
The Audio Description as a Physics Teaching Tool
ERIC Educational Resources Information Center
Cozendey, Sabrina; Costa, Maria da Piedade
2016-01-01
This study analyses the use of audio description in teaching physics concepts, aiming to determine the variables that influence the understanding of the concept. One education resource was audio described. For make the audio description the screen was freezing. The video with and without audio description should be presented to students, so that…
Electrophysiological evidence for Audio-visuo-lingual speech integration.
Treille, Avril; Vilain, Coriandre; Schwartz, Jean-Luc; Hueber, Thomas; Sato, Marc
2018-01-31
Recent neurophysiological studies demonstrate that audio-visual speech integration partly operates through temporal expectations and speech-specific predictions. From these results, one common view is that the binding of auditory and visual, lipread, speech cues relies on their joint probability and prior associative audio-visual experience. The present EEG study examined whether visual tongue movements integrate with relevant speech sounds, despite little associative audio-visual experience between the two modalities. A second objective was to determine possible similarities and differences of audio-visual speech integration between unusual audio-visuo-lingual and classical audio-visuo-labial modalities. To this aim, participants were presented with auditory, visual, and audio-visual isolated syllables, with the visual presentation related to either a sagittal view of the tongue movements or a facial view of the lip movements of a speaker, with lingual and facial movements previously recorded by an ultrasound imaging system and a video camera. In line with previous EEG studies, our results revealed an amplitude decrease and a latency facilitation of P2 auditory evoked potentials in both audio-visual-lingual and audio-visuo-labial conditions compared to the sum of unimodal conditions. These results argue against the view that auditory and visual speech cues solely integrate based on prior associative audio-visual perceptual experience. Rather, they suggest that dynamic and phonetic informational cues are sharable across sensory modalities, possibly through a cross-modal transfer of implicit articulatory motor knowledge. Copyright © 2017 Elsevier Ltd. All rights reserved.
Eye movements while viewing narrated, captioned, and silent videos
Ross, Nicholas M.; Kowler, Eileen
2013-01-01
Videos are often accompanied by narration delivered either by an audio stream or by captions, yet little is known about saccadic patterns while viewing narrated video displays. Eye movements were recorded while viewing video clips with (a) audio narration, (b) captions, (c) no narration, or (d) concurrent captions and audio. A surprisingly large proportion of time (>40%) was spent reading captions even in the presence of a redundant audio stream. Redundant audio did not affect the saccadic reading patterns but did lead to skipping of some portions of the captions and to delays of saccades made into the caption region. In the absence of captions, fixations were drawn to regions with a high density of information, such as the central region of the display, and to regions with high levels of temporal change (actions and events), regardless of the presence of narration. The strong attraction to captions, with or without redundant audio, raises the question of what determines how time is apportioned between captions and video regions so as to minimize information loss. The strategies of apportioning time may be based on several factors, including the inherent attraction of the line of sight to any available text, the moment by moment impressions of the relative importance of the information in the caption and the video, and the drive to integrate visual text accompanied by audio into a single narrative stream. PMID:23457357
Navit, Saumya; Johri, Nikita; Khan, Suleman Abbas; Singh, Rahul Kumar; Chadha, Dheera; Navit, Pragati; Sharma, Anshul; Bahuguna, Rachana
2015-12-01
Dental anxiety is a widespread phenomenon and a concern for paediatric dentistry. The inability of children to deal with threatening dental stimuli often manifests as behaviour management problems. Nowadays, the use of non-aversive behaviour management techniques is more advocated, which are more acceptable to parents, patients and practitioners. Therefore, this present study was conducted to find out which audio aid was the most effective in the managing anxious children. The aim of the present study was to compare the efficacy of audio-distraction aids in reducing the anxiety of paediatric patients while undergoing various stressful and invasive dental procedures. The objectives were to ascertain whether audio distraction is an effective means of anxiety management and which type of audio aid is the most effective. A total number of 150 children, aged between 6 to 12 years, randomly selected amongst the patients who came for their first dental check-up, were placed in five groups of 30 each. These groups were the control group, the instrumental music group, the musical nursery rhymes group, the movie songs group and the audio stories group. The control group was treated under normal set-up & audio group listened to various audio presentations during treatment. Each child had four visits. In each visit, after the procedures was completed, the anxiety levels of the children were measured by the Venham's Picture Test (VPT), Venham's Clinical Rating Scale (VCRS) and pulse rate measurement with the help of pulse oximeter. A significant difference was seen between all the groups for the mean pulse rate, with an increase in subsequent visit. However, no significant difference was seen in the VPT & VCRS scores between all the groups. Audio aids in general reduced anxiety in comparison to the control group, and the most significant reduction in anxiety level was observed in the audio stories group. The conclusion derived from the present study was that audio distraction was effective in reducing anxiety and audio-stories were the most effective.
ERIC Educational Resources Information Center
Sewell, Edward H., Jr.
This study investigates the effects of cartoon illustrations on female and male college student comprehension and evaluation of information presented in several combinations of print, audio, and visual formats. Subjects were assigned to one of five treatment groups: printed text, printed text with cartoons, audiovisual presentations, audio only…
Roseboom, Warrick; Kawabe, Takahiro; Nishida, Shin'ya
2013-01-01
It has now been well established that the point of subjective synchrony for audio and visual events can be shifted following exposure to asynchronous audio-visual presentations, an effect often referred to as temporal recalibration. Recently it was further demonstrated that it is possible to concurrently maintain two such recalibrated estimates of audio-visual temporal synchrony. However, it remains unclear precisely what defines a given audio-visual pair such that it is possible to maintain a temporal relationship distinct from other pairs. It has been suggested that spatial separation of the different audio-visual pairs is necessary to achieve multiple distinct audio-visual synchrony estimates. Here we investigated if this is necessarily true. Specifically, we examined whether it is possible to obtain two distinct temporal recalibrations for stimuli that differed only in featural content. Using both complex (audio visual speech; see Experiment 1) and simple stimuli (high and low pitch audio matched with either vertically or horizontally oriented Gabors; see Experiment 2) we found concurrent, and opposite, recalibrations despite there being no spatial difference in presentation location at any point throughout the experiment. This result supports the notion that the content of an audio-visual pair alone can be used to constrain distinct audio-visual synchrony estimates regardless of spatial overlap.
Audio-Visual Temporal Recalibration Can be Constrained by Content Cues Regardless of Spatial Overlap
Roseboom, Warrick; Kawabe, Takahiro; Nishida, Shin’Ya
2013-01-01
It has now been well established that the point of subjective synchrony for audio and visual events can be shifted following exposure to asynchronous audio-visual presentations, an effect often referred to as temporal recalibration. Recently it was further demonstrated that it is possible to concurrently maintain two such recalibrated estimates of audio-visual temporal synchrony. However, it remains unclear precisely what defines a given audio-visual pair such that it is possible to maintain a temporal relationship distinct from other pairs. It has been suggested that spatial separation of the different audio-visual pairs is necessary to achieve multiple distinct audio-visual synchrony estimates. Here we investigated if this is necessarily true. Specifically, we examined whether it is possible to obtain two distinct temporal recalibrations for stimuli that differed only in featural content. Using both complex (audio visual speech; see Experiment 1) and simple stimuli (high and low pitch audio matched with either vertically or horizontally oriented Gabors; see Experiment 2) we found concurrent, and opposite, recalibrations despite there being no spatial difference in presentation location at any point throughout the experiment. This result supports the notion that the content of an audio-visual pair alone can be used to constrain distinct audio-visual synchrony estimates regardless of spatial overlap. PMID:23658549
Effective Use of Audio Media in Multimedia Presentations.
ERIC Educational Resources Information Center
Kerr, Brenda
This paper emphasizes research-based reasons for adding audio to multimedia presentations. The first section summarizes suggestions from a review of research on the effectiveness of audio media when accompanied by other forms of media; types of research studies (e.g., evaluation, intra-medium, and aptitude treatment interaction studies) are also…
The Use of Audio and Animation in Computer Based Instruction.
ERIC Educational Resources Information Center
Koroghlanian, Carol; Klein, James D.
This study investigated the effects of audio, animation, and spatial ability in a computer-based instructional program for biology. The program presented instructional material via test or audio with lean text and included eight instructional sequences presented either via static illustrations or animations. High school students enrolled in a…
Open Touch/Sound Maps: A system to convey street data through haptic and auditory feedback
NASA Astrophysics Data System (ADS)
Kaklanis, Nikolaos; Votis, Konstantinos; Tzovaras, Dimitrios
2013-08-01
The use of spatial (geographic) information is becoming ever more central and pervasive in today's internet society but the most of it is currently inaccessible to visually impaired users. However, access in visual maps is severely restricted to visually impaired and people with blindness, due to their inability to interpret graphical information. Thus, alternative ways of a map's presentation have to be explored, in order to enforce the accessibility of maps. Multiple types of sensory perception like touch and hearing may work as a substitute of vision for the exploration of maps. The use of multimodal virtual environments seems to be a promising alternative for people with visual impairments. The present paper introduces a tool for automatic multimodal map generation having haptic and audio feedback using OpenStreetMap data. For a desired map area, an elevation map is being automatically generated and can be explored by touch, using a haptic device. A sonification and a text-to-speech (TTS) mechanism provide also audio navigation information during the haptic exploration of the map.
The Effect of Audio and Animation in Multimedia Instruction
ERIC Educational Resources Information Center
Koroghlanian, Carol; Klein, James D.
2004-01-01
This study investigated the effects of audio, animation, and spatial ability in a multimedia computer program for high school biology. Participants completed a multimedia program that presented content by way of text or audio with lean text. In addition, several instructional sequences were presented either with static illustrations or animations.…
Code of Federal Regulations, 2012 CFR
2012-01-01
... concerning the accessibility of videos, DVDs, and other audio-visual presentations shown on-aircraft to... meet concerning the accessibility of videos, DVDs, and other audio-visual presentations shown on... videos, DVDs, and other audio-visual displays played on aircraft for safety purposes, and all such new...
Code of Federal Regulations, 2013 CFR
2013-01-01
... concerning the accessibility of videos, DVDs, and other audio-visual presentations shown on-aircraft to... meet concerning the accessibility of videos, DVDs, and other audio-visual presentations shown on... videos, DVDs, and other audio-visual displays played on aircraft for safety purposes, and all such new...
Instrumental Landing Using Audio Indication
NASA Astrophysics Data System (ADS)
Burlak, E. A.; Nabatchikov, A. M.; Korsun, O. N.
2018-02-01
The paper proposes an audio indication method for presenting to a pilot the information regarding the relative positions of an aircraft in the tasks of precision piloting. The implementation of the method is presented, the use of such parameters of audio signal as loudness, frequency and modulation are discussed. To confirm the operability of the audio indication channel the experiments using modern aircraft simulation facility were carried out. The simulated performed the instrument landing using the proposed audio method to indicate the aircraft deviations in relation to the slide path. The results proved compatible with the simulated instrumental landings using the traditional glidescope pointers. It inspires to develop the method in order to solve other precision piloting tasks.
ERIC Educational Resources Information Center
Anderson, Gerald D.; Olson, David B.
The document presents the transcript of the audio narrative portion of approximately 100 interviews with first and second generation Scandinavian immigrants to the United States. The document is intended for use by secondary school classroom teachers as they develop and implement educational programs related to the Scandinavian heritage in…
Review of Audio Interfacing Literature for Computer-Assisted Music Instruction.
ERIC Educational Resources Information Center
Watanabe, Nan
1980-01-01
Presents a review of the literature dealing with audio devices used in computer assisted music instruction and discusses the need for research and development of reliable, cost-effective, random access audio hardware. (Author)
ERIC Educational Resources Information Center
Schlenker, Richard M.; And Others
Presented is a manuscript for an introductory boiler water chemistry course for marine engineer education. The course is modular, self-paced, audio-tutorial, contract graded and combined lecture-laboratory instructed. Lectures are presented to students individually via audio-tapes and 35 mm slides. The course consists of a total of 17 modules -…
Mining knowledge in noisy audio data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Czyzewski, A.
1996-12-31
This paper demonstrates a KDD method applied to audio data analysis, particularly, it presents possibilities which result from replacing traditional methods of analysis and acoustic signal processing by KDD algorithms when restoring audio recordings affected by strong noise.
Papadopoulos, Konstantinos; Koustriava, Eleni; Koukourikos, Panagiotis; Kartasidou, Lefkothea; Barouti, Marialena; Varveris, Asimis; Misiou, Marina; Zacharogeorga, Timoclia; Anastasiadis, Theocharis
2017-01-01
Disorientation and inability of wayfinding are phenomena with a great frequency for individuals with visual impairments during the process of travelling novel environments. Orientation and mobility aids could suggest important tools for the preparation of a more secure and cognitively mapped travelling. The aim of the present study was to examine if spatial knowledge structured after an individual with blindness had studied the map of an urban area that was delivered through a verbal description, an audio-tactile map or an audio-haptic map, could be used for detecting in the area specific points of interest. The effectiveness of the three aids with reference to each other was also examined. The results of the present study highlight the effectiveness of the audio-tactile and the audio-haptic maps as orientation and mobility aids, especially when these are compared to verbal descriptions.
Kumar, Deepesh; Verma, Sunny; Bhattacharya, Sutapa; Lahiri, Uttama
2016-06-13
Neurological disorders often manifest themselves in the form of movement deficit on the part of the patient. Conventional rehabilitation often used to address these deficits, though powerful are often monotonous in nature. Adequate audio-visual stimulation can prove to be motivational. In the research presented here we indicate the applicability of audio-visual stimulation to rehabilitation exercises to address at least some of the movement deficits for upper and lower limbs. Added to the audio-visual stimulation, we also use Functional Electrical Stimulation (FES). In our presented research we also show the applicability of FES in conjunction with audio-visual stimulation delivered through VR-based platform for grasping skills of patients with movement disorder.
Spatialized audio improves call sign recognition during multi-aircraft control.
Kim, Sungbin; Miller, Michael E; Rusnock, Christina F; Elshaw, John J
2018-07-01
We investigated the impact of a spatialized audio display on response time, workload, and accuracy while monitoring auditory information for relevance. The human ability to differentiate sound direction implies that spatial audio may be used to encode information. Therefore, it is hypothesized that spatial audio cues can be applied to aid differentiation of critical versus noncritical verbal auditory information. We used a human performance model and a laboratory study involving 24 participants to examine the effect of applying a notional, automated parser to present audio in a particular ear depending on information relevance. Operator workload and performance were assessed while subjects listened for and responded to relevant audio cues associated with critical information among additional noncritical information. Encoding relevance through spatial location in a spatial audio display system--as opposed to monophonic, binaural presentation--significantly reduced response time and workload, particularly for noncritical information. Future auditory displays employing spatial cues to indicate relevance have the potential to reduce workload and improve operator performance in similar task domains. Furthermore, these displays have the potential to reduce the dependence of workload and performance on the number of audio cues. Published by Elsevier Ltd.
Implementing Audio-CASI on Windows’ Platforms
Cooley, Philip C.; Turner, Charles F.
2011-01-01
Audio computer-assisted self interviewing (Audio-CASI) technologies have recently been shown to provide important and sometimes dramatic improvements in the quality of survey measurements. This is particularly true for measurements requiring respondents to divulge highly sensitive information such as their sexual, drug use, or other sensitive behaviors. However, DOS-based Audio-CASI systems that were designed and adopted in the early 1990s have important limitations. Most salient is the poor control they provide for manipulating the video presentation of survey questions. This article reports our experiences adapting Audio-CASI to Microsoft Windows 3.1 and Windows 95 platforms. Overall, our Windows-based system provided the desired control over video presentation and afforded other advantages including compatibility with a much wider array of audio devices than our DOS-based Audio-CASI technologies. These advantages came at the cost of increased system requirements --including the need for both more RAM and larger hard disks. While these costs will be an issue for organizations converting large inventories of PCS to Windows Audio-CASI today, this will not be a serious constraint for organizations and individuals with small inventories of machines to upgrade or those purchasing new machines today. PMID:22081743
Method for Reading Sensors and Controlling Actuators Using Audio Interfaces of Mobile Devices
Aroca, Rafael V.; Burlamaqui, Aquiles F.; Gonçalves, Luiz M. G.
2012-01-01
This article presents a novel closed loop control architecture based on audio channels of several types of computing devices, such as mobile phones and tablet computers, but not restricted to them. The communication is based on an audio interface that relies on the exchange of audio tones, allowing sensors to be read and actuators to be controlled. As an application example, the presented technique is used to build a low cost mobile robot, but the system can also be used in a variety of mechatronics applications and sensor networks, where smartphones are the basic building blocks. PMID:22438726
Method for reading sensors and controlling actuators using audio interfaces of mobile devices.
Aroca, Rafael V; Burlamaqui, Aquiles F; Gonçalves, Luiz M G
2012-01-01
This article presents a novel closed loop control architecture based on audio channels of several types of computing devices, such as mobile phones and tablet computers, but not restricted to them. The communication is based on an audio interface that relies on the exchange of audio tones, allowing sensors to be read and actuators to be controlled. As an application example, the presented technique is used to build a low cost mobile robot, but the system can also be used in a variety of mechatronics applications and sensor networks, where smartphones are the basic building blocks.
Zhang, Zhengyi; Zhang, Gaoyan; Zhang, Yuanyuan; Liu, Hong; Xu, Junhai; Liu, Baolin
2017-12-01
This study aimed to investigate the functional connectivity in the brain during the cross-modal integration of polyphonic characters in Chinese audio-visual sentences. The visual sentences were all semantically reasonable and the audible pronunciations of the polyphonic characters in corresponding sentences contexts varied in four conditions. To measure the functional connectivity, correlation, coherence and phase synchronization index (PSI) were used, and then multivariate pattern analysis was performed to detect the consensus functional connectivity patterns. These analyses were confined in the time windows of three event-related potential components of P200, N400 and late positive shift (LPS) to investigate the dynamic changes of the connectivity patterns at different cognitive stages. We found that when differentiating the polyphonic characters with abnormal pronunciations from that with the appreciate ones in audio-visual sentences, significant classification results were obtained based on the coherence in the time window of the P200 component, the correlation in the time window of the N400 component and the coherence and PSI in the time window the LPS component. Moreover, the spatial distributions in these time windows were also different, with the recruitment of frontal sites in the time window of the P200 component, the frontal-central-parietal regions in the time window of the N400 component and the central-parietal sites in the time window of the LPS component. These findings demonstrate that the functional interaction mechanisms are different at different stages of audio-visual integration of polyphonic characters.
Defraene, Bruno; van Waterschoot, Toon; Diehl, Moritz; Moonen, Marc
2016-07-01
Subjective audio quality evaluation experiments have been conducted to assess the performance of embedded-optimization-based precompensation algorithms for mitigating perceptible linear and nonlinear distortion in audio signals. It is concluded with statistical significance that the perceived audio quality is improved by applying an embedded-optimization-based precompensation algorithm, both in case (i) nonlinear distortion and (ii) a combination of linear and nonlinear distortion is present. Moreover, a significant positive correlation is reported between the collected subjective and objective PEAQ audio quality scores, supporting the validity of using PEAQ to predict the impact of linear and nonlinear distortion on the perceived audio quality.
Audio-Tutorial Instruction in Medicine.
ERIC Educational Resources Information Center
Boyle, Gloria J.; Herrick, Merlyn C.
This progress report concerns an audio-tutorial approach used at the University of Missouri-Columbia School of Medicine. Instructional techniques such as slide-tape presentations, compressed speech audio tapes, computer-assisted instruction (CAI), motion pictures, television, microfiche, and graphic and printed materials have been implemented,…
Supervisory Control of Unmanned Vehicles
2010-04-01
than-ideal video quality (Chen et al., 2007; Chen and Thropp, 2007). Simpson et al. (2004) proposed using a spatial audio display to augment UAV...operator’s SA and discussed its utility for each of the three SA levels. They recommended that both visual and spatial audio information should be...presented concurrently. They also suggested that presenting the audio information spatially may enhance UAV operator’s sense of presence (i.e
François, Clément; Cunillera, Toni; Garcia, Enara; Laine, Matti; Rodriguez-Fornells, Antoni
2017-04-01
Learning a new language requires the identification of word units from continuous speech (the speech segmentation problem) and mapping them onto conceptual representation (the word to world mapping problem). Recent behavioral studies have revealed that the statistical properties found within and across modalities can serve as cues for both processes. However, segmentation and mapping have been largely studied separately, and thus it remains unclear whether both processes can be accomplished at the same time and if they share common neurophysiological features. To address this question, we recorded EEG of 20 adult participants during both an audio alone speech segmentation task and an audiovisual word-to-picture association task. The participants were tested for both the implicit detection of online mismatches (structural auditory and visual semantic violations) as well as for the explicit recognition of words and word-to-picture associations. The ERP results from the learning phase revealed a delayed learning-related fronto-central negativity (FN400) in the audiovisual condition compared to the audio alone condition. Interestingly, while online structural auditory violations elicited clear MMN/N200 components in the audio alone condition, visual-semantic violations induced meaning-related N400 modulations in the audiovisual condition. The present results support the idea that speech segmentation and meaning mapping can take place in parallel and act in synergy to enhance novel word learning. Copyright © 2016 Elsevier Ltd. All rights reserved.
Singh, Divya; Samadi, Firoza; Jaiswal, Jn; Tripathi, Abhay Mani
2014-01-01
The purpose of the present study was to evaluate the eff-cacy of 'audio distraction' in anxious pediatric dental patients. Sixty children were randomly selected and equally divided into two groups of thirty each. The first group was control group (group A) and the second group was music group (group B). The dental procedure employed was extraction for both the groups. The children included in music group were allowed to hear audio presentation throughout the treatment procedure. Anxiety was measured by using Venham's picture test, pulse rate, blood pressure and oxygen saturation. 'Audio distraction' was found efficacious in alleviating anxiety of pediatric dental patients. 'Audio distraction' did decrease the anxiety in pediatric patients to a significant extent. How to cite this article: Singh D, Samadi F, Jaiswal JN, Tripathi AM. Stress Reduction through Audio Distraction in Anxious Pediatric Dental Patients: An Adjunctive Clinical Study. Int J Clin Pediatr Dent 2014;7(3):149-152.
One size does not fit all: older adults benefit from redundant text in multimedia instruction
Fenesi, Barbara; Vandermorris, Susan; Kim, Joseph A.; Shore, David I.; Heisz, Jennifer J.
2015-01-01
The multimedia design of presentations typically ignores that younger and older adults have varying cognitive strengths and weaknesses. We examined whether differential instructional design may enhance learning in these populations. Younger and older participants viewed one of three computer-based presentations: Audio only (narration), Redundant (audio narration with redundant text), or Complementary (audio narration with non-redundant text and images). Younger participants learned better when audio narration was paired with relevant images compared to when audio narration was paired with redundant text. However, older participants learned best when audio narration was paired with redundant text. Younger adults, who presumably have a higher working memory capacity (WMC), appear to benefit more from complementary information that may drive deeper conceptual processing. In contrast, older adults learn better from presentations that support redundant coding across modalities, which may help mitigate the effects of age-related decline in WMC. Additionally, several misconceptions of design quality appeared across age groups: both younger and older participants positively rated less effective designs. Findings suggest that one-size does not fit all, with older adults requiring unique multimedia design tailored to their cognitive abilities for effective learning. PMID:26284000
One size does not fit all: older adults benefit from redundant text in multimedia instruction.
Fenesi, Barbara; Vandermorris, Susan; Kim, Joseph A; Shore, David I; Heisz, Jennifer J
2015-01-01
The multimedia design of presentations typically ignores that younger and older adults have varying cognitive strengths and weaknesses. We examined whether differential instructional design may enhance learning in these populations. Younger and older participants viewed one of three computer-based presentations: Audio only (narration), Redundant (audio narration with redundant text), or Complementary (audio narration with non-redundant text and images). Younger participants learned better when audio narration was paired with relevant images compared to when audio narration was paired with redundant text. However, older participants learned best when audio narration was paired with redundant text. Younger adults, who presumably have a higher working memory capacity (WMC), appear to benefit more from complementary information that may drive deeper conceptual processing. In contrast, older adults learn better from presentations that support redundant coding across modalities, which may help mitigate the effects of age-related decline in WMC. Additionally, several misconceptions of design quality appeared across age groups: both younger and older participants positively rated less effective designs. Findings suggest that one-size does not fit all, with older adults requiring unique multimedia design tailored to their cognitive abilities for effective learning.
Audio aided electro-tactile perception training for finger posture biofeedback.
Vargas, Jose Gonzalez; Yu, Wenwei
2008-01-01
Visual information is one of the prerequisites for most biofeedback studies. The aim of this study is to explore how the usage of an audio aided training helps in the learning process of dynamical electro-tactile perception without any visual feedback. In this research, the electrical simulation patterns associated with the experimenter's finger postures and motions were presented to the subjects. Along with the electrical stimulation patterns 2 different types of information, verbal and audio information on finger postures and motions, were presented to the verbal training subject group (group 1) and audio training subject group (group 2), respectively. The results showed an improvement in the ability to distinguish and memorize electrical stimulation patterns correspondent to finger postures and motions without visual feedback, and with audio tones aid, the learning was faster and the perception became more precise after training. Thus, this study clarified that, as a substitution to visual presentation, auditory information could help effectively in the formation of electro-tactile perception. Further research effort needed to make clear the difference between the visual guided and audio aided training in terms of information compilation, post-training effect and robustness of the perception.
[Intermodal timing cues for audio-visual speech recognition].
Hashimoto, Masahiro; Kumashiro, Masaharu
2004-06-01
The purpose of this study was to investigate the limitations of lip-reading advantages for Japanese young adults by desynchronizing visual and auditory information in speech. In the experiment, audio-visual speech stimuli were presented under the six test conditions: audio-alone, and audio-visually with either 0, 60, 120, 240 or 480 ms of audio delay. The stimuli were the video recordings of a face of a female Japanese speaking long and short Japanese sentences. The intelligibility of the audio-visual stimuli was measured as a function of audio delays in sixteen untrained young subjects. Speech intelligibility under the audio-delay condition of less than 120 ms was significantly better than that under the audio-alone condition. On the other hand, the delay of 120 ms corresponded to the mean mora duration measured for the audio stimuli. The results implied that audio delays of up to 120 ms would not disrupt lip-reading advantage, because visual and auditory information in speech seemed to be integrated on a syllabic time scale. Potential applications of this research include noisy workplace in which a worker must extract relevant speech from all the other competing noises.
Speech information retrieval: a review
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hafen, Ryan P.; Henry, Michael J.
Audio is an information-rich component of multimedia. Information can be extracted from audio in a number of different ways, and thus there are several established audio signal analysis research fields. These fields include speech recognition, speaker recognition, audio segmentation and classification, and audio finger-printing. The information that can be extracted from tools and methods developed in these fields can greatly enhance multimedia systems. In this paper, we present the current state of research in each of the major audio analysis fields. The goal is to introduce enough back-ground for someone new in the field to quickly gain high-level understanding andmore » to provide direction for further study.« less
Meyerhoff, Hauke S; Huff, Markus
2016-04-01
Human long-term memory for visual objects and scenes is tremendous. Here, we test how auditory information contributes to long-term memory performance for realistic scenes. In a total of six experiments, we manipulated the presentation modality (auditory, visual, audio-visual) as well as semantic congruency and temporal synchrony between auditory and visual information of brief filmic clips. Our results show that audio-visual clips generally elicit more accurate memory performance than unimodal clips. This advantage even increases with congruent visual and auditory information. However, violations of audio-visual synchrony hardly have any influence on memory performance. Memory performance remained intact even with a sequential presentation of auditory and visual information, but finally declined when the matching tracks of one scene were presented separately with intervening tracks during learning. With respect to memory performance, our results therefore show that audio-visual integration is sensitive to semantic congruency but remarkably robust against asymmetries between different modalities.
Effects of audio-visual presentation of target words in word translation training
NASA Astrophysics Data System (ADS)
Akahane-Yamada, Reiko; Komaki, Ryo; Kubo, Rieko
2004-05-01
Komaki and Akahane-Yamada (Proc. ICA2004) used 2AFC translation task in vocabulary training, in which the target word is presented visually in orthographic form of one language, and the appropriate meaning in another language has to be chosen between two choices. Present paper examined the effect of audio-visual presentation of target word when native speakers of Japanese learn to translate English words into Japanese. Pairs of English words contrasted in several phonemic distinctions (e.g., /r/-/l/, /b/-/v/, etc.) were used as word materials, and presented in three conditions; visual-only (V), audio-only (A), and audio-visual (AV) presentations. Identification accuracy of those words produced by two talkers was also assessed. During pretest, the accuracy for A stimuli was lowest, implying that insufficient translation ability and listening ability interact with each other when aurally presented word has to be translated. However, there was no difference in accuracy between V and AV stimuli, suggesting that participants translate the words depending on visual information only. The effect of translation training using AV stimuli did not transfer to identification ability, showing that additional audio information during translation does not help improve speech perception. Further examination is necessary to determine the effective L2 training method. [Work supported by TAO, Japan.
ERIC Educational Resources Information Center
Virginia State Dept. of Agriculture and Consumer Services, Richmond, VA.
This document is an annotated bibliography of audio-visual aids in the field of consumer education, intended especially for use among low-income, elderly, and handicapped consumers. It was developed to aid consumer education program planners in finding audio-visual resources to enhance their presentations. Materials listed include 293 resources…
NASA Technical Reports Server (NTRS)
1974-01-01
A descriptive handbook for the audio/CTE splitter/interleaver (RCA part No. 8673734-502) was presented. This unit is designed to perform two major functions: extract audio and time data from an interleaved video/audio signal (splitter section), and provide a test interleaved video/audio/CTE signal for the system (interleaver section). It is a rack mounting unit 7 inches high, 19 inches wide, 20 inches deep, mounted on slides for retracting from the rack, and weighs approximately 40 pounds. The following information is provided: installation, operation, principles of operation, maintenance, schematics and parts lists.
LiveDescribe: Can Amateur Describers Create High-Quality Audio Description?
ERIC Educational Resources Information Center
Branje, Carmen J.; Fels, Deborah I.
2012-01-01
Introduction: The study presented here evaluated the usability of the audio description software LiveDescribe and explored the acceptance rates of audio description created by amateur describers who used LiveDescribe to facilitate the creation of their descriptions. Methods: Twelve amateur describers with little or no previous experience with…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kasemir, Kay; Hartman, Steven M
2009-01-01
A new alarm system toolkit has been implemented at SNS. The toolkit handles the Central Control Room (CCR) 'annunciator', or audio alarms. For the new alarm system to be effective, the alarms must be meaningful and properly configured. Along with the implementation of the new alarm toolkit, a thorough documentation and rationalization of the alarm configuration is taking place. Requirements and maintenance of a robust alarm configuration have been gathered from system and operations experts. In this paper we present our practical experience with the vacuum system alarm handling configuration of the alarm toolkit.
ERIC Educational Resources Information Center
Ludlow, Barbara L.; Foshay, John B.; Duff, Michael C.
Video presentations of teaching episodes in home, school, and community settings and audio recordings of parents' and professionals' views can be important adjuncts to personnel preparation in special education. This paper describes instructional applications of digital media and outlines steps in producing audio and video segments. Digital audio…
Effects of Audio-Visual Information on the Intelligibility of Alaryngeal Speech
ERIC Educational Resources Information Center
Evitts, Paul M.; Portugal, Lindsay; Van Dine, Ami; Holler, Aline
2010-01-01
Background: There is minimal research on the contribution of visual information on speech intelligibility for individuals with a laryngectomy (IWL). Aims: The purpose of this project was to determine the effects of mode of presentation (audio-only, audio-visual) on alaryngeal speech intelligibility. Method: Twenty-three naive listeners were…
Guidelines for the Production of Audio Materials for Print Handicapped Readers.
ERIC Educational Resources Information Center
National Library of Australia, Canberra.
Procedural guidelines developed by the Audio Standards Committee of the National Library of Australia to help improve the overall quality of production of audio materials for visually handicapped readers are presented. This report covers the following areas: selection of narrators and the narration itself; copyright; recording of books, magazines,…
Advances in audio source seperation and multisource audio content retrieval
NASA Astrophysics Data System (ADS)
Vincent, Emmanuel
2012-06-01
Audio source separation aims to extract the signals of individual sound sources from a given recording. In this paper, we review three recent advances which improve the robustness of source separation in real-world challenging scenarios and enable its use for multisource content retrieval tasks, such as automatic speech recognition (ASR) or acoustic event detection (AED) in noisy environments. We present a Flexible Audio Source Separation Toolkit (FASST) and discuss its advantages compared to earlier approaches such as independent component analysis (ICA) and sparse component analysis (SCA). We explain how cues as diverse as harmonicity, spectral envelope, temporal fine structure or spatial location can be jointly exploited by this toolkit. We subsequently present the uncertainty decoding (UD) framework for the integration of audio source separation and audio content retrieval. We show how the uncertainty about the separated source signals can be accurately estimated and propagated to the features. Finally, we explain how this uncertainty can be efficiently exploited by a classifier, both at the training and the decoding stage. We illustrate the resulting performance improvements in terms of speech separation quality and speaker recognition accuracy.
Summarizing Audiovisual Contents of a Video Program
NASA Astrophysics Data System (ADS)
Gong, Yihong
2003-12-01
In this paper, we focus on video programs that are intended to disseminate information and knowledge such as news, documentaries, seminars, etc, and present an audiovisual summarization system that summarizes the audio and visual contents of the given video separately, and then integrating the two summaries with a partial alignment. The audio summary is created by selecting spoken sentences that best present the main content of the audio speech while the visual summary is created by eliminating duplicates/redundancies and preserving visually rich contents in the image stream. The alignment operation aims to synchronize each spoken sentence in the audio summary with its corresponding speaker's face and to preserve the rich content in the visual summary. A Bipartite Graph-based audiovisual alignment algorithm is developed to efficiently find the best alignment solution that satisfies these alignment requirements. With the proposed system, we strive to produce a video summary that: (1) provides a natural visual and audio content overview, and (2) maximizes the coverage for both audio and visual contents of the original video without having to sacrifice either of them.
Cross-Modal Matching of Audio-Visual German and French Fluent Speech in Infancy
Kubicek, Claudia; Hillairet de Boisferon, Anne; Dupierrix, Eve; Pascalis, Olivier; Lœvenbruck, Hélène; Gervain, Judit; Schwarzer, Gudrun
2014-01-01
The present study examined when and how the ability to cross-modally match audio-visual fluent speech develops in 4.5-, 6- and 12-month-old German-learning infants. In Experiment 1, 4.5- and 6-month-old infants’ audio-visual matching ability of native (German) and non-native (French) fluent speech was assessed by presenting auditory and visual speech information sequentially, that is, in the absence of temporal synchrony cues. The results showed that 4.5-month-old infants were capable of matching native as well as non-native audio and visual speech stimuli, whereas 6-month-olds perceived the audio-visual correspondence of native language stimuli only. This suggests that intersensory matching narrows for fluent speech between 4.5 and 6 months of age. In Experiment 2, auditory and visual speech information was presented simultaneously, therefore, providing temporal synchrony cues. Here, 6-month-olds were found to match native as well as non-native speech indicating facilitation of temporal synchrony cues on the intersensory perception of non-native fluent speech. Intriguingly, despite the fact that audio and visual stimuli cohered temporally, 12-month-olds matched the non-native language only. Results were discussed with regard to multisensory perceptual narrowing during the first year of life. PMID:24586651
Power in telephone-advice nursing.
Leppänen, Vesa
2010-03-01
Power is a central aspect of nursing, especially in telephone-advice nursing, where nurses assess callers' medical problems and decide what measures that need to be taken. This article presents a framework for understanding how power operates in social interaction between nurses and callers in telephone-advice nursing in primary care in Sweden. Power is analysed as the result of nurses and callers being oriented to five social structures that are relevant to their actions in this context, namely the organization of telephone-advice nursing, the social stock of medical knowledge, the professional division of labour between nurses and doctors, structures of social interaction and structures of emotions. While structural constraints govern some actions to a high degree, calls take place in an organizational free room that give nurses more leeway for acting more creatively. The discussion focuses on the introduction of new technologies of control, for instance computerized decision support systems and audio recording of calls, and on how they reduce the free room. Empirical data consist of 276 audio-recorded telephone calls to 13 nurses at six primary-care centres and of qualitative interviews with 18 nurses.
NASA Astrophysics Data System (ADS)
Barbieri, Ivano; Lambruschini, Paolo; Raggio, Marco; Stagnaro, Riccardo
2007-12-01
The increase in the availability of bandwidth for wireless links, network integration, and the computational power on fixed and mobile platforms at affordable costs allows nowadays for the handling of audio and video data, their quality making them suitable for medical application. These information streams can support both continuous monitoring and emergency situations. According to this scenario, the authors have developed and implemented the mobile communication system which is described in this paper. The system is based on ITU-T H.323 multimedia terminal recommendation, suitable for real-time data/video/audio and telemedical applications. The audio and video codecs, respectively, H.264 and G723.1, were implemented and optimized in order to obtain high performance on the system target processors. Offline media streaming storage and retrieval functionalities were supported by integrating a relational database in the hospital central system. The system is based on low-cost consumer technologies such as general packet radio service (GPRS) and wireless local area network (WLAN or WiFi) for lowband data/video transmission. Implementation and testing were carried out for medical emergency and telemedicine application. In this paper, the emergency case study is described.
van Hoesel, Richard J M
2015-04-01
One of the key benefits of using cochlear implants (CIs) in both ears rather than just one is improved localization. It is likely that in complex listening scenes, improved localization allows bilateral CI users to orient toward talkers to improve signal-to-noise ratios and gain access to visual cues, but to date, that conjecture has not been tested. To obtain an objective measure of that benefit, seven bilateral CI users were assessed for both auditory-only and audio-visual speech intelligibility in noise using a novel dynamic spatial audio-visual test paradigm. For each trial conducted in spatially distributed noise, first, an auditory-only cueing phrase that was spoken by one of four talkers was selected and presented from one of four locations. Shortly afterward, a target sentence was presented that was either audio-visual or, in another test configuration, audio-only and was spoken by the same talker and from the same location as the cueing phrase. During the target presentation, visual distractors were added at other spatial locations. Results showed that in terms of speech reception thresholds (SRTs), the average improvement for bilateral listening over the better performing ear alone was 9 dB for the audio-visual mode, and 3 dB for audition-alone. Comparison of bilateral performance for audio-visual and audition-alone showed that inclusion of visual cues led to an average SRT improvement of 5 dB. For unilateral device use, no such benefit arose, presumably due to the greatly reduced ability to localize the target talker to acquire visual information. The bilateral CI speech intelligibility advantage over the better ear in the present study is much larger than that previously reported for static talker locations and indicates greater everyday speech benefits and improved cost-benefit than estimated to date.
Active Learning in the Online Environment: The Integration of Student-Generated Audio Files
ERIC Educational Resources Information Center
Bolliger, Doris U.; Armier, David Des, Jr.
2013-01-01
Educators have integrated instructor-produced audio files in a variety of settings and environments for purposes such as content presentation, lecture reviews, student feedback, and so forth. Few instructors, however, require students to produce audio files and share them with peers. The purpose of this study was to obtain empirical data on…
Podcasting by Synchronising PowerPoint and Voice: What Are the Pedagogical Benefits?
ERIC Educational Resources Information Center
Griffin, Darren K.; Mitchell, David; Thompson, Simon J.
2009-01-01
The purpose of this study was to investigate the efficacy of audio-visual synchrony in podcasting and its possible pedagogical benefits. "Synchrony" in this study refers to the simultaneous playback of audio and video data streams, so that the transitions between presentation slides occur at "lecturer chosen" points in the audio commentary.…
ERIC Educational Resources Information Center
Fawcett, Hannah; Oldfield, Jeremy
2016-01-01
Previous research suggests that audio feedback may be an important mechanism for facilitating effective and timely assignment feedback. The present study examined expectations and experiences of audio and written feedback provided through "turnitin for iPad®" from students within the same cohort and assignment. The results showed that…
Audio-Visual Communications, A Tool for the Professional
ERIC Educational Resources Information Center
Journal of Environmental Health, 1976
1976-01-01
The manner in which the Cuyahoga County, Ohio Department of Environmental Health utilizes audio-visual presentations for communication with business and industry, professional public health agencies and the general public is presented. Subjects including food sanitation, radiation protection and safety are described. (BT)
Interactive MPEG-4 low-bit-rate speech/audio transmission over the Internet
NASA Astrophysics Data System (ADS)
Liu, Fang; Kim, JongWon; Kuo, C.-C. Jay
1999-11-01
The recently developed MPEG-4 technology enables the coding and transmission of natural and synthetic audio-visual data in the form of objects. In an effort to extend the object-based functionality of MPEG-4 to real-time Internet applications, architectural prototypes of multiplex layer and transport layer tailored for transmission of MPEG-4 data over IP are under debate among Internet Engineering Task Force (IETF), and MPEG-4 systems Ad Hoc group. In this paper, we present an architecture for interactive MPEG-4 speech/audio transmission system over the Internet. It utilities a framework of Real Time Streaming Protocol (RTSP) over Real-time Transport Protocol (RTP) to provide controlled, on-demand delivery of real time speech/audio data. Based on a client-server model, a couple of low bit-rate bit streams (real-time speech/audio, pre- encoded speech/audio) are multiplexed and transmitted via a single RTP channel to the receiver. The MPEG-4 Scene Description (SD) and Object Descriptor (OD) bit streams are securely sent through the RTSP control channel. Upon receiving, an initial MPEG-4 audio- visual scene is constructed after de-multiplexing, decoding of bit streams, and scene composition. A receiver is allowed to manipulate the initial audio-visual scene presentation locally, or interactively arrange scene changes by sending requests to the server. A server may also choose to update the client with new streams and list of contents for user selection.
Robot Command Interface Using an Audio-Visual Speech Recognition System
NASA Astrophysics Data System (ADS)
Ceballos, Alexánder; Gómez, Juan; Prieto, Flavio; Redarce, Tanneguy
In recent years audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This document presents a command's automatic recognition system using audio-visual information. The system is expected to control the laparoscopic robot da Vinci. The audio signal is treated using the Mel Frequency Cepstral Coefficients parametrization method. Besides, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used in order to extract the visual speech information.
Nirme, Jens; Haake, Magnus; Lyberg Åhlander, Viveka; Brännström, Jonas; Sahlén, Birgitta
2018-04-05
Seeing a speaker's face facilitates speech recognition, particularly under noisy conditions. Evidence for how it might affect comprehension of the content of the speech is more sparse. We investigated how children's listening comprehension is affected by multi-talker babble noise, with or without presentation of a digitally animated virtual speaker, and whether successful comprehension is related to performance on a test of executive functioning. We performed a mixed-design experiment with 55 (34 female) participants (8- to 9-year-olds), recruited from Swedish elementary schools. The children were presented with four different narratives, each in one of four conditions: audio-only presentation in a quiet setting, audio-only presentation in noisy setting, audio-visual presentation in a quiet setting, and audio-visual presentation in a noisy setting. After each narrative, the children answered questions on the content and rated their perceived listening effort. Finally, they performed a test of executive functioning. We found significantly fewer correct answers to explicit content questions after listening in noise. This negative effect was only mitigated to a marginally significant degree by audio-visual presentation. Strong executive function only predicted more correct answers in quiet settings. Altogether, our results are inconclusive regarding how seeing a virtual speaker affects listening comprehension. We discuss how methodological adjustments, including modifications to our virtual speaker, can be used to discriminate between possible explanations to our results and contribute to understanding the listening conditions children face in a typical classroom.
Description of Audio-Visual Recording Equipment and Method of Installation for Pilot Training.
ERIC Educational Resources Information Center
Neese, James A.
The Audio-Video Recorder System was developed to evaluate the effectiveness of in-flight audio/video recording as a pilot training technique for the U.S. Air Force Pilot Training Program. It will be used to gather background and performance data for an experimental program. A detailed description of the system is presented and construction and…
Deutsch Durch Audio-Visuelle Methode: An Audio-Lingual-Oral Approach to the Teaching of German.
ERIC Educational Resources Information Center
Dickinson Public Schools, ND. Instructional Media Center.
This teaching guide, designed to accompany Chilton's "Deutsch Durch Audio-Visuelle Methode" for German 1 and 2 in a three-year secondary school program, focuses major attention on the operational plan of the program and a student orientation unit. A section on teaching a unit discusses four phases: (1) presentation, (2) explanation, (3)…
Huffman coding in advanced audio coding standard
NASA Astrophysics Data System (ADS)
Brzuchalski, Grzegorz
2012-05-01
This article presents several hardware architectures of Advanced Audio Coding (AAC) Huffman noiseless encoder, its optimisations and working implementation. Much attention has been paid to optimise the demand of hardware resources especially memory size. The aim of design was to get as short binary stream as possible in this standard. The Huffman encoder with whole audio-video system has been implemented in FPGA devices.
ERIC Educational Resources Information Center
Bosma and Associates International, Seattle, WA.
This final report presents an independent formative and summative evaluation of the National Library Services for the Blind and Physically Handicapped (NLS/BPH) braille and audio magazine program. In this program, 77 magazines are distributed directly to subscribers, with 43 magazines available on audio flexible discs and 34 magazines available in…
ERIC Educational Resources Information Center
Carter, Curtis W.
2012-01-01
This article contends that instructional designers and developers should attend to four particular design principles when creating instructional audio. Support for this view is presented by referencing the limited research that has been done in this area, and by indicating how and why each of the four principles is important to the design process.…
Reasons to Rethink the Use of Audio and Video Lectures in Online Courses
ERIC Educational Resources Information Center
Stetz, Thomas A.; Bauman, Antonina A.
2013-01-01
Recent technological developments allow any instructor to create audio and video lectures for the use in online classes. However, it is questionable if it is worth the time and effort that faculty put into preparing those lectures. This paper presents thirteen factors that should be considered before preparing and using audio and video lectures in…
ERIC Educational Resources Information Center
Inceçay, Volkan; Koçoglu, Zeynep
2017-01-01
The present study examined whether or not different input delivery modes have an effect on listening comprehension of Turkish students learning English at the university level. It investigated the effect of one single mode, which is audio-only, and three dual input delivery modes, which were audio-video, audio-video with target language subtitles…
High capacity reversible watermarking for audio by histogram shifting and predicted error expansion.
Wang, Fei; Xie, Zhaoxin; Chen, Zuo
2014-01-01
Being reversible, the watermarking information embedded in audio signals can be extracted while the original audio data can achieve lossless recovery. Currently, the few reversible audio watermarking algorithms are confronted with following problems: relatively low SNR (signal-to-noise) of embedded audio; a large amount of auxiliary embedded location information; and the absence of accurate capacity control capability. In this paper, we present a novel reversible audio watermarking scheme based on improved prediction error expansion and histogram shifting. First, we use differential evolution algorithm to optimize prediction coefficients and then apply prediction error expansion to output stego data. Second, in order to reduce location map bits length, we introduced histogram shifting scheme. Meanwhile, the prediction error modification threshold according to a given embedding capacity can be computed by our proposed scheme. Experiments show that this algorithm improves the SNR of embedded audio signals and embedding capacity, drastically reduces location map bits length, and enhances capacity control capability.
Audio-visual speech experience with age influences perceived audio-visual asynchrony in speech.
Alm, Magnus; Behne, Dawn
2013-10-01
Previous research indicates that perception of audio-visual (AV) synchrony changes in adulthood. Possible explanations for these age differences include a decline in hearing acuity, a decline in cognitive processing speed, and increased experience with AV binding. The current study aims to isolate the effect of AV experience by comparing synchrony judgments from 20 young adults (20 to 30 yrs) and 20 normal-hearing middle-aged adults (50 to 60 yrs), an age range for which a decline of cognitive processing speed is expected to be minimal. When presented with AV stop consonant syllables with asynchronies ranging from 440 ms audio-lead to 440 ms visual-lead, middle-aged adults showed significantly less tolerance for audio-lead than young adults. Middle-aged adults also showed a greater shift in their point of subjective simultaneity than young adults. Natural audio-lead asynchronies are arguably more predictable than natural visual-lead asynchronies, and this predictability may render audio-lead thresholds more prone to experience-related fine-tuning.
WebGL and web audio software lightweight components for multimedia education
NASA Astrophysics Data System (ADS)
Chang, Xin; Yuksel, Kivanc; Skarbek, Władysław
2017-08-01
The paper presents the results of our recent work on development of contemporary computing platform DC2 for multimedia education usingWebGL andWeb Audio { the W3C standards. Using literate programming paradigm the WEBSA educational tools were developed. It offers for a user (student), the access to expandable collection of WEBGL Shaders and web Audio scripts. The unique feature of DC2 is the option of literate programming, offered for both, the author and the reader in order to improve interactivity to lightweightWebGL andWeb Audio components. For instance users can define: source audio nodes including synthetic sources, destination audio nodes, and nodes for audio processing such as: sound wave shaping, spectral band filtering, convolution based modification, etc. In case of WebGL beside of classic graphics effects based on mesh and fractal definitions, the novel image processing analysis by shaders is offered like nonlinear filtering, histogram of gradients, and Bayesian classifiers.
The modality and redundancy effects in multimedia learning in children with dyslexia.
Knoop-van Campen, Carolien A N; Segers, Eliane; Verhoeven, Ludo
2018-05-01
The present study aimed to examine the modality and redundancy effects in multimedia learning in children with dyslexia in order to find out whether their learning benefits from written and/or spoken text with pictures. We compared study time and knowledge gain in 26 11-year-old children with dyslexia and 38 typically reading peers in a within-subjects design. All children were presented with a series of user-paced multimedia lessons in 3 conditions: pictorial information presented with (a) written text, (b) audio, or (c) combined text and audio. We also examined whether children's learning outcomes were related to their working memory. With respect to study time, we found modality and reversed redundancy effects. Children with dyslexia spent more time learning in the text condition, compared with the audio condition and the combined text-and-audio condition. Regarding knowledge gain, no modality or redundancy effects were evidenced. Although the groups differed on working memory, it did not influence the modality or redundancy effect on study time or knowledge gain. In multimedia learning, it thus is more efficient to provide children with dyslexia with audio or with auditory support. Copyright © 2018 John Wiley & Sons, Ltd.
Detecting double compression of audio signal
NASA Astrophysics Data System (ADS)
Yang, Rui; Shi, Yun Q.; Huang, Jiwu
2010-01-01
MP3 is the most popular audio format nowadays in our daily life, for example music downloaded from the Internet and file saved in the digital recorder are often in MP3 format. However, low bitrate MP3s are often transcoded to high bitrate since high bitrate ones are of high commercial value. Also audio recording in digital recorder can be doctored easily by pervasive audio editing software. This paper presents two methods for the detection of double MP3 compression. The methods are essential for finding out fake-quality MP3 and audio forensics. The proposed methods use support vector machine classifiers with feature vectors formed by the distributions of the first digits of the quantized MDCT (modified discrete cosine transform) coefficients. Extensive experiments demonstrate the effectiveness of the proposed methods. To the best of our knowledge, this piece of work is the first one to detect double compression of audio signal.
Design of an audio advertisement dataset
NASA Astrophysics Data System (ADS)
Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting
2015-12-01
Since more and more advertisements swarm into radios, it is necessary to establish an audio advertising dataset which could be used to analyze and classify the advertisement. A method of how to establish a complete audio advertising dataset is presented in this paper. The dataset is divided into four different kinds of advertisements. Each advertisement's sample is given in *.wav file format, and annotated with a txt file which contains its file name, sampling frequency, channel number, broadcasting time and its class. The classifying rationality of the advertisements in this dataset is proved by clustering the different advertisements based on Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for correlative audio advertisement experimental studies.
News video story segmentation method using fusion of audio-visual features
NASA Astrophysics Data System (ADS)
Wen, Jun; Wu, Ling-da; Zeng, Pu; Luan, Xi-dao; Xie, Yu-xiang
2007-11-01
News story segmentation is an important aspect for news video analysis. This paper presents a method for news video story segmentation. Different form prior works, which base on visual features transform, the proposed technique uses audio features as baseline and fuses visual features with it to refine the results. At first, it selects silence clips as audio features candidate points, and selects shot boundaries and anchor shots as two kinds of visual features candidate points. Then this paper selects audio feature candidates as cues and develops different fusion method, which effectively using diverse type visual candidates to refine audio candidates, to get story boundaries. Experiment results show that this method has high efficiency and adaptability to different kinds of news video.
ERIC Educational Resources Information Center
Kirkwood, Adrian
The first of two papers in this report, "The Present and the Future of Audio-Visual Production Centres in Distance Universities," describes changes in the Open University in Great Britain. The Open University's use of television and audio materials are increasingly being distributed to students on cassette. Although transmission is still…
Non-invasive distress evaluation in preterm newborn infants.
Manfredi, C; Bocchi, L; Orlandi, S; Calisti, M; Spaccaterra, L; Donzelli, G P
2008-01-01
With the increased survival of very preterm infants, there is a growing concern for their developmental outcomes. Infant cry characteristics reflect the development and possibly the integrity of the central nervous system. In this paper, relationships between fundamental frequency (F(0)) and vocal tract resonance frequencies (F(1)-F(3)) are investigated for a set of preterm newborns, by means of a multi-purpose voice analysis tool (BioVoice), characterised by high-resolution and tracking capabilities. Also, first results about possible distress occurring during cry in preterm newborn infants, as related to the decrease of central blood oxygenation, are presented. To this aim, a recording system (Newborn Recorder) has been developed, that allows synchronised, non-invasive monitoring of blood oxygenation and audio recordings of newborn infant's cry. The method has been applied to preterm newborns at the Intensive Care Unit, A.Meyer Children Hospital, Firenze, Italy.
ERIC Educational Resources Information Center
Medical Care Development, Inc., Augusta, ME.
The Aroostook County Telecommunications System is a slow-scan television network connecting five health institutions in the county with the Central Maine Interactive Telecommunications System (CMITS). The Aroostook system allows both audio conferencing and still-image videoconferencing among the Aroostook participants, and provides for an…
Influence of audio triggered emotional attention on video perception
NASA Astrophysics Data System (ADS)
Torres, Freddy; Kalva, Hari
2014-02-01
Perceptual video coding methods attempt to improve compression efficiency by discarding visual information not perceived by end users. Most of the current approaches for perceptual video coding only use visual features ignoring the auditory component. Many psychophysical studies have demonstrated that auditory stimuli affects our visual perception. In this paper we present our study of audio triggered emotional attention and it's applicability to perceptual video coding. Experiments with movie clips show that the reaction time to detect video compression artifacts was longer when video was presented with the audio information. The results reported are statistically significant with p=0.024.
Effect of Audio Coaching on Correlation of Abdominal Displacement With Lung Tumor Motion
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakamura, Mitsuhiro; Narita, Yuichiro; Matsuo, Yukinori
2009-10-01
Purpose: To assess the effect of audio coaching on the time-dependent behavior of the correlation between abdominal motion and lung tumor motion and the corresponding lung tumor position mismatches. Methods and Materials: Six patients who had a lung tumor with a motion range >8 mm were enrolled in the present study. Breathing-synchronized fluoroscopy was performed initially without audio coaching, followed by fluoroscopy with recorded audio coaching for multiple days. Two different measurements, anteroposterior abdominal displacement using the real-time positioning management system and superoinferior (SI) lung tumor motion by X-ray fluoroscopy, were performed simultaneously. Their sequential images were recorded using onemore » display system. The lung tumor position was automatically detected with a template matching technique. The relationship between the abdominal and lung tumor motion was analyzed with and without audio coaching. Results: The mean SI tumor displacement was 10.4 mm without audio coaching and increased to 23.0 mm with audio coaching (p < .01). The correlation coefficients ranged from 0.89 to 0.97 with free breathing. Applying audio coaching, the correlation coefficients improved significantly (range, 0.93-0.99; p < .01), and the SI lung tumor position mismatches became larger in 75% of all sessions. Conclusion: Audio coaching served to increase the degree of correlation and make it more reproducible. In addition, the phase shifts between tumor motion and abdominal displacement were improved; however, all patients breathed more deeply, and the SI lung tumor position mismatches became slightly larger with audio coaching than without audio coaching.« less
Using Cinematic Techniques in a Multimedia Museum Guide.
ERIC Educational Resources Information Center
Zancanaro, M.; Stock, O.; Alfaro, I.
This paper introduces the idea of enhancing the audio presentation of a multimedia museum guide by using the PDA screen to travel throughout a fresco and identify the various details in it. During the presentation, a sequence of pictures is synchronized with the audio commentary, and the transitions among the pictures are planned according to…
Jia, Lina; Shi, Zhuanghua; Zang, Xuelian; Müller, Hermann J
2013-11-06
Although attention can be captured toward high-arousal stimuli, little is known about how perceiving emotion in one modality influences the temporal processing of non-emotional stimuli in other modalities. We addressed this issue by presenting observers spatially uninformative emotional pictures while they performed an audio-tactile temporal-order judgment (TOJ) task. In Experiment 1, audio-tactile stimuli were presented at the same location straight ahead of the participants, who had to judge "which modality came first?". In Experiments 2 and 3, the audio-tactile stimuli were delivered one to the left and the other to the right side, and participants had to judge "which side came first?". We found both negative and positive high-arousal pictures to significantly bias TOJs towards the tactile and away from the auditory event when the audio-tactile stimuli were spatially separated; by contrast, there was no such bias when the audio-tactile stimuli originated from the same location. To further examine whether this bias is attributable to the emotional meanings conveyed by the pictures or to their high arousal effect, we compared and contrasted the influences of near-body threat vs. remote threat (emotional) pictures on audio-tactile TOJs in Experiment 3. The bias manifested only in the near-body threat condition. Taken together, the findings indicate that visual stimuli conveying meanings of near-body interaction activate a sensorimotor functional link prioritizing the processing of tactile over auditory signals when these signals are spatially separated. In contrast, audio-tactile signals from the same location engender strong crossmodal integration, thus counteracting modality-based attentional shifts induced by the emotional pictures. © 2013 Published by Elsevier B.V.
ERIC Educational Resources Information Center
Dain, Bernice, Comp.; Nevin, David, Comp.
The present revised and expanded edition of this document is an inclusive cumulation. A few items have been included which are on order as new to the collection or as replacements. This discography is intended to serve primarily as a local user's guide. The call number preceding each entry is based on the Audio-Visual Department's own, unique…
When the third party observer of a neuropsychological evaluation is an audio-recorder.
Constantinou, Marios; Ashendorf, Lee; McCaffrey, Robert J
2002-08-01
The presence of third parties during neuropsychological evaluations is an issue of concern for contemporary neuropsychologists. Previous studies have reported that the presence of an observer during neuropsychological testing alters the performance of individuals under evaluation. The present study sought to investigate whether audio-recording affects the neuropsychological test performance of individuals in the same way that third party observation does. In the presence of an audio-recorder the performance of the participants on memory tests declined. Performance on motor tests, on the other hand, was not affected by the presence of an audio-recorder. The implications of these findings in forensic neuropsychological evaluations are discussed.
The Introduction and Refinement of the Assessment of Digitally Recorded Audio Presentations
ERIC Educational Resources Information Center
Sinclair, Stefanie
2016-01-01
This case study critically evaluates benefits and challenges of a form of assessment included in a final year undergraduate Religious Studies Open University module, which combines a written essay task with a digital audio recording of a short oral presentation. Based on the analysis of student and tutor feedback and sample assignments, this study…
ERIC Educational Resources Information Center
Kim, Young-Suk Grace
2016-01-01
Purpose: The primary aim of the present study was to examine whether different ways of presenting narrative stimuli (i.e., live narrative stimuli versus audio-recorded narrative stimuli) influence children's performances on narrative comprehension and oral-retell quality. Method: Children in kindergarten (n = 54), second grade (n = 74), and fourth…
ERIC Educational Resources Information Center
Kim, Young-Suk Grace
2016-01-01
Purpose: The primary aim of the present study was to examine whether different ways of presenting narrative stimuli (i.e., live narrative stimuli versus audio-recorded narrative stimuli) influence children's performances on narrative comprehension and oral-retell quality. Method: Children in kindergarten (n = 54), second grade (n = 74), and fourth…
Effects of aging on audio-visual speech integration.
Huyse, Aurélie; Leybaert, Jacqueline; Berthommier, Frédéric
2014-10-01
This study investigated the impact of aging on audio-visual speech integration. A syllable identification task was presented in auditory-only, visual-only, and audio-visual congruent and incongruent conditions. Visual cues were either degraded or unmodified. Stimuli were embedded in stationary noise alternating with modulated noise. Fifteen young adults and 15 older adults participated in this study. Results showed that older adults had preserved lipreading abilities when the visual input was clear but not when it was degraded. The impact of aging on audio-visual integration also depended on the quality of the visual cues. In the visual clear condition, the audio-visual gain was similar in both groups and analyses in the framework of the fuzzy-logical model of perception confirmed that older adults did not differ from younger adults in their audio-visual integration abilities. In the visual reduction condition, the audio-visual gain was reduced in the older group, but only when the noise was stationary, suggesting that older participants could compensate for the loss of lipreading abilities by using the auditory information available in the valleys of the noise. The fuzzy-logical model of perception confirmed the significant impact of aging on audio-visual integration by showing an increased weight of audition in the older group.
Laboratory and in-flight experiments to evaluate 3-D audio display technology
NASA Technical Reports Server (NTRS)
Ericson, Mark; Mckinley, Richard; Kibbe, Marion; Francis, Daniel
1994-01-01
Laboratory and in-flight experiments were conducted to evaluate 3-D audio display technology for cockpit applications. A 3-D audio display generator was developed which digitally encodes naturally occurring direction information onto any audio signal and presents the binaural sound over headphones. The acoustic image is stabilized for head movement by use of an electromagnetic head-tracking device. In the laboratory, a 3-D audio display generator was used to spatially separate competing speech messages to improve the intelligibility of each message. Up to a 25 percent improvement in intelligibility was measured for spatially separated speech at high ambient noise levels (115 dB SPL). During the in-flight experiments, pilots reported that spatial separation of speech communications provided a noticeable improvement in intelligibility. The use of 3-D audio for target acquisition was also investigated. In the laboratory, 3-D audio enabled the acquisition of visual targets in about two seconds average response time at 17 degrees accuracy. During the in-flight experiments, pilots correctly identified ground targets 50, 75, and 100 percent of the time at separation angles of 12, 20, and 35 degrees, respectively. In general, pilot performance in the field with the 3-D audio display generator was as expected, based on data from laboratory experiments.
2001-05-01
audio-visual aids. Rapid correction methods of the pilot’s performance capacity: * psychosomatic self-management; * rational psychotherapy; * music ... therapy ; * central nervous system (CNS) electro-tranquilization; * sauna; * hydrotherapy; * manual therapy; 10-3 * recreational therapy (active rest
Lexicality Drives Audio-Motor Transformations in Broca's Area
ERIC Educational Resources Information Center
Kotz, S. A.; D'Ausilio, A.; Raettig, T.; Begliomini, C.; Craighero, L.; Fabbri-Destro, M.; Zingales, C.; Haggard, P.; Fadiga, L.
2010-01-01
Broca's area is classically associated with speech production. Recently, Broca's area has also been implicated in speech perception and non-linguistic information processing. With respect to the latter function, Broca's area is considered to be a central area in a network constituting the human mirror system, which maps observed or heard actions…
[Ventriloquism and audio-visual integration of voice and face].
Yokosawa, Kazuhiko; Kanaya, Shoko
2012-07-01
Presenting synchronous auditory and visual stimuli in separate locations creates the illusion that the sound originates from the direction of the visual stimulus. Participants' auditory localization bias, called the ventriloquism effect, has revealed factors affecting the perceptual integration of audio-visual stimuli. However, many studies on audio-visual processes have focused on performance in simplified experimental situations, with a single stimulus in each sensory modality. These results cannot necessarily explain our perceptual behavior in natural scenes, where various signals exist within a single sensory modality. In the present study we report the contributions of a cognitive factor, that is, the audio-visual congruency of speech, although this factor has often been underestimated in previous ventriloquism research. Thus, we investigated the contribution of speech congruency on the ventriloquism effect using a spoken utterance and two videos of a talking face. The salience of facial movements was also manipulated. As a result, when bilateral visual stimuli are presented in synchrony with a single voice, cross-modal speech congruency was found to have a significant impact on the ventriloquism effect. This result also indicated that more salient visual utterances attracted participants' auditory localization. The congruent pairing of audio-visual utterances elicited greater localization bias than did incongruent pairing, whereas previous studies have reported little dependency on the reality of stimuli in ventriloquism. Moreover, audio-visual illusory congruency, owing to the McGurk effect, caused substantial visual interference to auditory localization. This suggests that a greater flexibility in responding to multi-sensory environments exists than has been previously considered.
Audio-visual affective expression recognition
NASA Astrophysics Data System (ADS)
Huang, Thomas S.; Zeng, Zhihong
2007-11-01
Automatic affective expression recognition has attracted more and more attention of researchers from different disciplines, which will significantly contribute to a new paradigm for human computer interaction (affect-sensitive interfaces, socially intelligent environments) and advance the research in the affect-related fields including psychology, psychiatry, and education. Multimodal information integration is a process that enables human to assess affective states robustly and flexibly. In order to understand the richness and subtleness of human emotion behavior, the computer should be able to integrate information from multiple sensors. We introduce in this paper our efforts toward machine understanding of audio-visual affective behavior, based on both deliberate and spontaneous displays. Some promising methods are presented to integrate information from both audio and visual modalities. Our experiments show the advantage of audio-visual fusion in affective expression recognition over audio-only or visual-only approaches.
Integrated approach to multimodal media content analysis
NASA Astrophysics Data System (ADS)
Zhang, Tong; Kuo, C.-C. Jay
1999-12-01
In this work, we present a system for the automatic segmentation, indexing and retrieval of audiovisual data based on the combination of audio, visual and textural content analysis. The video stream is demultiplexed into audio, image and caption components. Then, a semantic segmentation of the audio signal based on audio content analysis is conducted, and each segment is indexed as one of the basic audio types. The image sequence is segmented into shots based on visual information analysis, and keyframes are extracted from each shot. Meanwhile, keywords are detected from the closed caption. Index tables are designed for both linear and non-linear access to the video. It is shown by experiments that the proposed methods for multimodal media content analysis are effective. And that the integrated framework achieves satisfactory results for video information filtering and retrieval.
Ultra-low-cost clinical pulse oximetry.
Petersen, Christian L; Gan, Heng; MacInnis, Martin J; Dumont, Guy A; Ansermino, J Mark
2013-01-01
An ultra-low-cost pulse oximeter is presented that interfaces a conventional clinical finger sensor with a mobile phone through the headset jack audio interface. All signal processing is performed using the audio subsystem of the phone. In a preliminary volunteer study in a hypoxia chamber, we compared the oxygen saturation obtained with the audio pulse oximeter against a commercially available (and FDA approved) reference pulse oximeter (Nonin Xpod). Good agreement was found between the outputs of the two devices.
DAVE: A plug and play model for distributed multimedia application development
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mines, R.F.; Friesen, J.A.; Yang, C.L.
1994-07-01
This paper presents a model being used for the development of distributed multimedia applications. The Distributed Audio Video Environment (DAVE) was designed to support the development of a wide range of distributed applications. The implementation of this model is described. DAVE is unique in that it combines a simple ``plug and play`` programming interface, supports both centralized and fully distributed applications, provides device and media extensibility, promotes object reuseability, and supports interoperability and network independence. This model enables application developers to easily develop distributed multimedia applications and create reusable multimedia toolkits. DAVE was designed for developing applications such as videomore » conferencing, media archival, remote process control, and distance learning.« less
Engel, Annerose; Bangert, Marc; Horbank, David; Hijmans, Brenda S; Wilkens, Katharina; Keller, Peter E; Keysers, Christian
2012-11-01
To investigate the cross-modal transfer of movement patterns necessary to perform melodies on the piano, 22 non-musicians learned to play short sequences on a piano keyboard by (1) merely listening and replaying (vision of own fingers occluded) or (2) merely observing silent finger movements and replaying (on a silent keyboard). After training, participants recognized with above chance accuracy (1) audio-motor learned sequences upon visual presentation (89±17%), and (2) visuo-motor learned sequences upon auditory presentation (77±22%). The recognition rates for visual presentation significantly exceeded those for auditory presentation (p<.05). fMRI revealed that observing finger movements corresponding to audio-motor trained melodies is associated with stronger activation in the left rolandic operculum than observing untrained sequences. This region was also involved in silent execution of sequences, suggesting that a link to motor representations may play a role in cross-modal transfer from audio-motor training condition to visual recognition. No significant differences in brain activity were found during listening to visuo-motor trained compared to untrained melodies. Cross-modal transfer was stronger from the audio-motor training condition to visual recognition and this is discussed in relation to the fact that non-musicians are familiar with how their finger movements look (motor-to-vision transformation), but not with how they sound on a piano (motor-to-sound transformation). Copyright © 2012 Elsevier Inc. All rights reserved.
Bimodal emotion congruency is critical to preverbal infants' abstract rule learning.
Tsui, Angeline Sin Mei; Ma, Yuen Ki; Ho, Anna; Chow, Hiu Mei; Tseng, Chia-huei
2016-05-01
Extracting general rules from specific examples is important, as we must face the same challenge displayed in various formats. Previous studies have found that bimodal presentation of grammar-like rules (e.g. ABA) enhanced 5-month-olds' capacity to acquire a rule that infants failed to learn when the rule was presented with visual presentation of the shapes alone (circle-triangle-circle) or auditory presentation of the syllables (la-ba-la) alone. However, the mechanisms and constraints for this bimodal learning facilitation are still unknown. In this study, we used audio-visual relation congruency between bimodal stimulation to disentangle possible facilitation sources. We exposed 8- to 10-month-old infants to an AAB sequence consisting of visual faces with affective expressions and/or auditory voices conveying emotions. Our results showed that infants were able to distinguish the learned AAB rule from other novel rules under bimodal stimulation when the affects in audio and visual stimuli were congruently paired (Experiments 1A and 2A). Infants failed to acquire the same rule when audio-visual stimuli were incongruently matched (Experiment 2B) and when only the visual (Experiment 1B) or the audio (Experiment 1C) stimuli were presented. Our results highlight that bimodal facilitation in infant rule learning is not only dependent on better statistical probability and redundant sensory information, but also the relational congruency of audio-visual information. A video abstract of this article can be viewed at https://m.youtube.com/watch?v=KYTyjH1k9RQ. © 2015 John Wiley & Sons Ltd.
CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset
Cao, Houwei; Cooper, David G.; Keutmann, Michael K.; Gur, Ruben C.; Nenkova, Ani; Verma, Ragini
2014-01-01
People convey their emotional state in their face and voice. We present an audio-visual data set uniquely suited for the study of multi-modal emotion expression and perception. The data set consists of facial and vocal emotional expressions in sentences spoken in a range of basic emotional states (happy, sad, anger, fear, disgust, and neutral). 7,442 clips of 91 actors with diverse ethnic backgrounds were rated by multiple raters in three modalities: audio, visual, and audio-visual. Categorical emotion labels and real-value intensity values for the perceived emotion were collected using crowd-sourcing from 2,443 raters. The human recognition of intended emotion for the audio-only, visual-only, and audio-visual data are 40.9%, 58.2% and 63.6% respectively. Recognition rates are highest for neutral, followed by happy, anger, disgust, fear, and sad. Average intensity levels of emotion are rated highest for visual-only perception. The accurate recognition of disgust and fear requires simultaneous audio-visual cues, while anger and happiness can be well recognized based on evidence from a single modality. The large dataset we introduce can be used to probe other questions concerning the audio-visual perception of emotion. PMID:25653738
Perception of Emotion: Differences in Mode of Presentation, Sex of Perceiver, and Race of Expressor.
ERIC Educational Resources Information Center
Kozel, Nicholas J.; Gitter, A. George
A 2 x 2 x 4 factorial design was utilized to investigate the effects of sex of perceiver, race of expressor (Negro and White), and mode of presentation of stimuli (audio and visual, visual only, audio only, and still pictures) on perception of emotion (POE). Perception of seven emotions (anger, happiness, surprise, fear, disgust, pain, and…
Wilbiks, Jonathan M P; Dyson, Benjamin J
2016-01-01
Over 5 experiments, we challenge the idea that the capacity of audio-visual integration need be fixed at 1 item. We observe that the conditions under which audio-visual integration is most likely to exceed 1 occur when stimulus change operates at a slow rather than fast rate of presentation and when the task is of intermediate difficulty such as when low levels of proactive interference (3 rather than 8 interfering visual presentations) are combined with the temporal unpredictability of the critical frame (Experiment 2), or, high levels of proactive interference are combined with the temporal predictability of the critical frame (Experiment 4). Neural data suggest that capacity might also be determined by the quality of perceptual information entering working memory. Experiment 5 supported the proposition that audio-visual integration was at play during the previous experiments. The data are consistent with the dynamic nature usually associated with cross-modal binding, and while audio-visual integration capacity likely cannot exceed uni-modal capacity estimates, performance may be better than being able to associate only one visual stimulus with one auditory stimulus.
Wilbiks, Jonathan M. P.; Dyson, Benjamin J.
2016-01-01
Over 5 experiments, we challenge the idea that the capacity of audio-visual integration need be fixed at 1 item. We observe that the conditions under which audio-visual integration is most likely to exceed 1 occur when stimulus change operates at a slow rather than fast rate of presentation and when the task is of intermediate difficulty such as when low levels of proactive interference (3 rather than 8 interfering visual presentations) are combined with the temporal unpredictability of the critical frame (Experiment 2), or, high levels of proactive interference are combined with the temporal predictability of the critical frame (Experiment 4). Neural data suggest that capacity might also be determined by the quality of perceptual information entering working memory. Experiment 5 supported the proposition that audio-visual integration was at play during the previous experiments. The data are consistent with the dynamic nature usually associated with cross-modal binding, and while audio-visual integration capacity likely cannot exceed uni-modal capacity estimates, performance may be better than being able to associate only one visual stimulus with one auditory stimulus. PMID:27977790
PROTAX-Sound: A probabilistic framework for automated animal sound identification
Somervuo, Panu; Ovaskainen, Otso
2017-01-01
Autonomous audio recording is stimulating new field in bioacoustics, with a great promise for conducting cost-effective species surveys. One major current challenge is the lack of reliable classifiers capable of multi-species identification. We present PROTAX-Sound, a statistical framework to perform probabilistic classification of animal sounds. PROTAX-Sound is based on a multinomial regression model, and it can utilize as predictors any kind of sound features or classifications produced by other existing algorithms. PROTAX-Sound combines audio and image processing techniques to scan environmental audio files. It identifies regions of interest (a segment of the audio file that contains a vocalization to be classified), extracts acoustic features from them and compares with samples in a reference database. The output of PROTAX-Sound is the probabilistic classification of each vocalization, including the possibility that it represents species not present in the reference database. We demonstrate the performance of PROTAX-Sound by classifying audio from a species-rich case study of tropical birds. The best performing classifier achieved 68% classification accuracy for 200 bird species. PROTAX-Sound improves the classification power of current techniques by combining information from multiple classifiers in a manner that yields calibrated classification probabilities. PMID:28863178
PROTAX-Sound: A probabilistic framework for automated animal sound identification.
de Camargo, Ulisses Moliterno; Somervuo, Panu; Ovaskainen, Otso
2017-01-01
Autonomous audio recording is stimulating new field in bioacoustics, with a great promise for conducting cost-effective species surveys. One major current challenge is the lack of reliable classifiers capable of multi-species identification. We present PROTAX-Sound, a statistical framework to perform probabilistic classification of animal sounds. PROTAX-Sound is based on a multinomial regression model, and it can utilize as predictors any kind of sound features or classifications produced by other existing algorithms. PROTAX-Sound combines audio and image processing techniques to scan environmental audio files. It identifies regions of interest (a segment of the audio file that contains a vocalization to be classified), extracts acoustic features from them and compares with samples in a reference database. The output of PROTAX-Sound is the probabilistic classification of each vocalization, including the possibility that it represents species not present in the reference database. We demonstrate the performance of PROTAX-Sound by classifying audio from a species-rich case study of tropical birds. The best performing classifier achieved 68% classification accuracy for 200 bird species. PROTAX-Sound improves the classification power of current techniques by combining information from multiple classifiers in a manner that yields calibrated classification probabilities.
Impact of Audio-Visual Asynchrony on Lip-Reading Effects -Neuromagnetic and Psychophysical Study-
Yahata, Izumi; Kanno, Akitake; Sakamoto, Shuichi; Takanashi, Yoshitaka; Takata, Shiho; Nakasato, Nobukazu; Kawashima, Ryuta; Katori, Yukio
2016-01-01
The effects of asynchrony between audio and visual (A/V) stimuli on the N100m responses of magnetoencephalography in the left hemisphere were compared with those on the psychophysical responses in 11 participants. The latency and amplitude of N100m were significantly shortened and reduced in the left hemisphere by the presentation of visual speech as long as the temporal asynchrony between A/V stimuli was within 100 ms, but were not significantly affected with audio lags of -500 and +500 ms. However, some small effects were still preserved on average with audio lags of 500 ms, suggesting similar asymmetry of the temporal window to that observed in psychophysical measurements, which tended to be more robust (wider) for audio lags; i.e., the pattern of visual-speech effects as a function of A/V lag observed in the N100m in the left hemisphere grossly resembled that in psychophysical measurements on average, although the individual responses were somewhat varied. The present results suggest that the basic configuration of the temporal window of visual effects on auditory-speech perception could be observed from the early auditory processing stage. PMID:28030631
Pratt, Hillel; Bleich, Naomi; Mittelman, Nomi
2015-11-01
Spatio-temporal distributions of cortical activity to audio-visual presentations of meaningless vowel-consonant-vowels and the effects of audio-visual congruence/incongruence, with emphasis on the McGurk effect, were studied. The McGurk effect occurs when a clearly audible syllable with one consonant, is presented simultaneously with a visual presentation of a face articulating a syllable with a different consonant and the resulting percept is a syllable with a consonant other than the auditorily presented one. Twenty subjects listened to pairs of audio-visually congruent or incongruent utterances and indicated whether pair members were the same or not. Source current densities of event-related potentials to the first utterance in the pair were estimated and effects of stimulus-response combinations, brain area, hemisphere, and clarity of visual articulation were assessed. Auditory cortex, superior parietal cortex, and middle temporal cortex were the most consistently involved areas across experimental conditions. Early (<200 msec) processing of the consonant was overall prominent in the left hemisphere, except right hemisphere prominence in superior parietal cortex and secondary visual cortex. Clarity of visual articulation impacted activity in secondary visual cortex and Wernicke's area. McGurk perception was associated with decreased activity in primary and secondary auditory cortices and Wernicke's area before 100 msec, increased activity around 100 msec which decreased again around 180 msec. Activity in Broca's area was unaffected by McGurk perception and was only increased to congruent audio-visual stimuli 30-70 msec following consonant onset. The results suggest left hemisphere prominence in the effects of stimulus and response conditions on eight brain areas involved in dynamically distributed parallel processing of audio-visual integration. Initially (30-70 msec) subcortical contributions to auditory cortex, superior parietal cortex, and middle temporal cortex occur. During 100-140 msec, peristriate visual influences and Wernicke's area join in the processing. Resolution of incongruent audio-visual inputs is then attempted, and if successful, McGurk perception occurs and cortical activity in left hemisphere further increases between 170 and 260 msec.
Audio-visual synchrony and feature-selective attention co-amplify early visual processing.
Keitel, Christian; Müller, Matthias M
2016-05-01
Our brain relies on neural mechanisms of selective attention and converging sensory processing to efficiently cope with rich and unceasing multisensory inputs. One prominent assumption holds that audio-visual synchrony can act as a strong attractor for spatial attention. Here, we tested for a similar effect of audio-visual synchrony on feature-selective attention. We presented two superimposed Gabor patches that differed in colour and orientation. On each trial, participants were cued to selectively attend to one of the two patches. Over time, spatial frequencies of both patches varied sinusoidally at distinct rates (3.14 and 3.63 Hz), giving rise to pulse-like percepts. A simultaneously presented pure tone carried a frequency modulation at the pulse rate of one of the two visual stimuli to introduce audio-visual synchrony. Pulsed stimulation elicited distinct time-locked oscillatory electrophysiological brain responses. These steady-state responses were quantified in the spectral domain to examine individual stimulus processing under conditions of synchronous versus asynchronous tone presentation and when respective stimuli were attended versus unattended. We found that both, attending to the colour of a stimulus and its synchrony with the tone, enhanced its processing. Moreover, both gain effects combined linearly for attended in-sync stimuli. Our results suggest that audio-visual synchrony can attract attention to specific stimulus features when stimuli overlap in space.
Effects of Exposure to Advertisements on Audience Impressions
NASA Astrophysics Data System (ADS)
Hasegawa, Hiroshi; Sato, Mie; Kasuga, Masao; Nagao, Yoshihide; Shono, Toru; Norose, Yuka; Oku, Ritsuya; Nogami, Akira; Miyazawa, Yoshitaka
This study investigated effects of listening and/or watching commercial-messages (CMs) on audience impressions. We carried out experiments of TV advertisements presentation in conditions of audio only, video only, and audio-video. As results, we confirmed the following two effects: image-multiple effect, that is, the audience brings to mind various images that are not directly expressed in the content, and marking-up effect, that is, the audience concentrates on some images that are directly expressed in the content. The image-multiple effect, in particular, strongly appeared under the audio only condition. Next, we investigated changes in the following seven subjective responses; usage image, experience, familiarity, exclusiveness, feeling at home, affection, and willingness to buy, after exposure to advertisements under conditions of audio only and audio-video. As a result, noting that the image-multiple effect became stronger as the evaluation scores of the responses increased.
Audio fingerprint extraction for content identification
NASA Astrophysics Data System (ADS)
Shiu, Yu; Yeh, Chia-Hung; Kuo, C. C. J.
2003-11-01
In this work, we present an audio content identification system that identifies some unknown audio material by comparing its fingerprint with those extracted off-line and saved in the music database. We will describe in detail the procedure to extract audio fingerprints and demonstrate that they are robust to noise and content-preserving manipulations. The main feature in the proposed system is the zero-crossing rate extracted with the octave-band filter bank. The zero-crossing rate can be used to describe the dominant frequency in each subband with a very low computational cost. The size of audio fingerprint is small and can be efficiently stored along with the compressed files in the database. It is also robust to many modifications such as tempo change and time-alignment distortion. Besides, the octave-band filter bank is used to enhance the robustness to distortion, especially those localized on some frequency regions.
The sweet-home project: audio technology in smart homes to improve well-being and reliance.
Vacher, Michel; Istrate, Dan; Portet, François; Joubert, Thierry; Chevalier, Thierry; Smidtas, Serge; Meillon, Brigitte; Lecouteux, Benjamin; Sehili, Mohamed; Chahuara, Pedro; Méniard, Sylvain
2011-01-01
The Sweet-Home project aims at providing audio-based interaction technology that lets the user have full control over their home environment, at detecting distress situations and at easing the social inclusion of the elderly and frail population. This paper presents an overview of the project focusing on the multimodal sound corpus acquisition and labelling and on the investigated techniques for speech and sound recognition. The user study and the recognition performances show the interest of this audio technology.
The many facets of auditory display
NASA Technical Reports Server (NTRS)
Blattner, Meera M.
1995-01-01
In this presentation we will examine some of the ways sound can be used in a virtual world. We make the case that many different types of audio experience are available to us. A full range of audio experiences include: music, speech, real-world sounds, auditory displays, and auditory cues or messages. The technology of recreating real-world sounds through physical modeling has advanced in the past few years allowing better simulation of virtual worlds. Three-dimensional audio has further enriched our sensory experiences.
Design of batch audio/video conversion platform based on JavaEE
NASA Astrophysics Data System (ADS)
Cui, Yansong; Jiang, Lianpin
2018-03-01
With the rapid development of digital publishing industry, the direction of audio / video publishing shows the diversity of coding standards for audio and video files, massive data and other significant features. Faced with massive and diverse data, how to quickly and efficiently convert to a unified code format has brought great difficulties to the digital publishing organization. In view of this demand and present situation in this paper, basing on the development architecture of Sptring+SpringMVC+Mybatis, and combined with the open source FFMPEG format conversion tool, a distributed online audio and video format conversion platform with a B/S structure is proposed. Based on the Java language, the key technologies and strategies designed in the design of platform architecture are analyzed emphatically in this paper, designing and developing a efficient audio and video format conversion system, which is composed of “Front display system”, "core scheduling server " and " conversion server ". The test results show that, compared with the ordinary audio and video conversion scheme, the use of batch audio and video format conversion platform can effectively improve the conversion efficiency of audio and video files, and reduce the complexity of the work. Practice has proved that the key technology discussed in this paper can be applied in the field of large batch file processing, and has certain practical application value.
Sense-Making in a Temporary Organization: Implementing a New Curriculum in a Swedish Municipality
ERIC Educational Resources Information Center
Nordholm, Daniel Erik
2015-01-01
This article explores sense-making in a municipality-led temporary organization established in response to the introduction of a new curriculum and marking system in Sweden. Qualitative data were extracted from audio-recorded interviews (n = 18) and observations of central subject group meetings (n = 6). By applying core elements of sociological…
Pluralist Discourses of Bilingualism and Translanguaging Talk in Classrooms
ERIC Educational Resources Information Center
Durán, Leah; Palmer, Deborah
2014-01-01
This paper examines student and teacher talk in a first grade classroom in a two-way immersion school in Central Texas. Drawing on audio and video data from a year-long study in a first grade two-way classroom and using a methodology that fuses ethnography and discourse analysis, the authors explore how pluralist discourses are constructed and…
Animation, audio, and spatial ability: Optimizing multimedia for scientific explanations
NASA Astrophysics Data System (ADS)
Koroghlanian, Carol May
This study investigated the effects of audio, animation and spatial ability in a computer based instructional program for biology. The program presented instructional material via text or audio with lean text and included eight instructional sequences presented either via static illustrations or animations. High school students enrolled in a biology course were blocked by spatial ability and randomly assigned to one of four treatments (Text-Static Illustration Audio-Static Illustration, Text-Animation, Audio-Animation). The study examined the effects of instructional mode (Text vs. Audio), illustration mode (Static Illustration vs. Animation) and spatial ability (Low vs. High) on practice and posttest achievement, attitude and time. Results for practice achievement indicated that high spatial ability participants achieved more than low spatial ability participants. Similar results for posttest achievement and spatial ability were not found. Participants in the Static Illustration treatments achieved the same as participants in the Animation treatments on both the practice and posttest. Likewise, participants in the Text treatments achieved the same as participants in the Audio treatments on both the practice and posttest. In terms of attitude, participants responded favorably to the computer based instructional program. They found the program interesting, felt the static illustrations or animations made the explanations easier to understand and concentrated on learning the material. Furthermore, participants in the Animation treatments felt the information was easier to understand than participants in the Static Illustration treatments. However, no difference for any attitude item was found for participants in the Text as compared to those in the Audio treatments. Significant differences were found by Spatial Ability for three attitude items concerning concentration and interest. In all three items, the low spatial ability participants responded more positively than high spatial ability participants. In addition, low spatial ability participants reported greater mental effort than high spatial ability participants. Findings for time-in-program and time-in-instruction indicated that participants in the Animation treatments took significantly more time than participants in the Static Illustration treatments. No time differences of any type were found for participants in the Text versus Audio treatments. Implications for the design of multimedia instruction and topics for future research are included in the discussion.
Audio-guided audiovisual data segmentation, indexing, and retrieval
NASA Astrophysics Data System (ADS)
Zhang, Tong; Kuo, C.-C. Jay
1998-12-01
While current approaches for video segmentation and indexing are mostly focused on visual information, audio signals may actually play a primary role in video content parsing. In this paper, we present an approach for automatic segmentation, indexing, and retrieval of audiovisual data, based on audio content analysis. The accompanying audio signal of audiovisual data is first segmented and classified into basic types, i.e., speech, music, environmental sound, and silence. This coarse-level segmentation and indexing step is based upon morphological and statistical analysis of several short-term features of the audio signals. Then, environmental sounds are classified into finer classes, such as applause, explosions, bird sounds, etc. This fine-level classification and indexing step is based upon time- frequency analysis of audio signals and the use of the hidden Markov model as the classifier. On top of this archiving scheme, an audiovisual data retrieval system is proposed. Experimental results show that the proposed approach has an accuracy rate higher than 90 percent for the coarse-level classification, and higher than 85 percent for the fine-level classification. Examples of audiovisual data segmentation and retrieval are also provided.
Audio-video feature correlation: faces and speech
NASA Astrophysics Data System (ADS)
Durand, Gwenael; Montacie, Claude; Caraty, Marie-Jose; Faudemay, Pascal
1999-08-01
This paper presents a study of the correlation of features automatically extracted from the audio stream and the video stream of audiovisual documents. In particular, we were interested in finding out whether speech analysis tools could be combined with face detection methods, and to what extend they should be combined. A generic audio signal partitioning algorithm as first used to detect Silence/Noise/Music/Speech segments in a full length movie. A generic object detection method was applied to the keyframes extracted from the movie in order to detect the presence or absence of faces. The correlation between the presence of a face in the keyframes and of the corresponding voice in the audio stream was studied. A third stream, which is the script of the movie, is warped on the speech channel in order to automatically label faces appearing in the keyframes with the name of the corresponding character. We naturally found that extracted audio and video features were related in many cases, and that significant benefits can be obtained from the joint use of audio and video analysis methods.
Voice over: Audio-visual congruency and content recall in the gallery setting
Fairhurst, Merle T.; Scott, Minnie; Deroy, Ophelia
2017-01-01
Experimental research has shown that pairs of stimuli which are congruent and assumed to ‘go together’ are recalled more effectively than an item presented in isolation. Will this multisensory memory benefit occur when stimuli are richer and longer, in an ecological setting? In the present study, we focused on an everyday situation of audio-visual learning and manipulated the relationship between audio guide tracks and viewed portraits in the galleries of the Tate Britain. By varying the gender and narrative style of the voice-over, we examined how the perceived congruency and assumed unity of the audio guide track with painted portraits affected subsequent recall. We show that tracks perceived as best matching the viewed portraits led to greater recall of both sensory and linguistic content. We provide the first evidence that manipulating crossmodal congruence and unity assumptions can effectively impact memory in a multisensory ecological setting, even in the absence of precise temporal alignment between sensory cues. PMID:28636667
Singing voice detection for karaoke application
NASA Astrophysics Data System (ADS)
Shenoy, Arun; Wu, Yuansheng; Wang, Ye
2005-07-01
We present a framework to detect the regions of singing voice in musical audio signals. This work is oriented towards the development of a robust transcriber of lyrics for karaoke applications. The technique leverages on a combination of low-level audio features and higher level musical knowledge of rhythm and tonality. Musical knowledge of the key is used to create a song-specific filterbank to attenuate the presence of the pitched musical instruments. This is followed by subband processing of the audio to detect the musical octaves in which the vocals are present. Text processing is employed to approximate the duration of the sung passages using freely available lyrics. This is used to obtain a dynamic threshold for vocal/ non-vocal segmentation. This pairing of audio and text processing helps create a more accurate system. Experimental evaluation on a small database of popular songs shows the validity of the proposed approach. Holistic and per-component evaluation of the system is conducted and various improvements are discussed.
Voice over: Audio-visual congruency and content recall in the gallery setting.
Fairhurst, Merle T; Scott, Minnie; Deroy, Ophelia
2017-01-01
Experimental research has shown that pairs of stimuli which are congruent and assumed to 'go together' are recalled more effectively than an item presented in isolation. Will this multisensory memory benefit occur when stimuli are richer and longer, in an ecological setting? In the present study, we focused on an everyday situation of audio-visual learning and manipulated the relationship between audio guide tracks and viewed portraits in the galleries of the Tate Britain. By varying the gender and narrative style of the voice-over, we examined how the perceived congruency and assumed unity of the audio guide track with painted portraits affected subsequent recall. We show that tracks perceived as best matching the viewed portraits led to greater recall of both sensory and linguistic content. We provide the first evidence that manipulating crossmodal congruence and unity assumptions can effectively impact memory in a multisensory ecological setting, even in the absence of precise temporal alignment between sensory cues.
NASA Astrophysics Data System (ADS)
Clerc, G.; Décriaud, J.-P.; Doyen, G.; Halbwachs, M.; Henrotte, M.; Rémy, J.; Zhang, X.-C.
1984-07-01
A campaign of audio-magnetotelluric soundings in the crater of Momotombo has shown that the structure of this volcano is suitable to a surveillance by this method; it has an only central activity, and the resistivity versus the depth decreases suddenly at least 100 times at about 270 m. So, this technical paper describes the equipment designed and built in order to measure every 2 hr the apparent resistivity in a given direction, at seven frequencies regularly spaced from 5 to 312 Hz. The data are preprocessed to fill only 32 bytes: they are - on the one hand, printed locally for the close surveillance; - on the other hand, transmitted by an ARGOS platform and received in Garchy via satellite for searching studies
Zelic, Gregory; Mottet, Denis; Lagarde, Julien
2012-01-01
Recent behavioral neuroscience research revealed that elementary reactive behavior can be improved in the case of cross-modal sensory interactions thanks to underlying multisensory integration mechanisms. Can this benefit be generalized to an ongoing coordination of movements under severe physical constraints? We choose a juggling task to examine this question. A central issue well-known in juggling lies in establishing and maintaining a specific temporal coordination among balls, hands, eyes and posture. Here, we tested whether providing additional timing information about the balls and hands motions by using external sound and tactile periodic stimulations, the later presented at the wrists, improved the behavior of jugglers. One specific combination of auditory and tactile metronome led to a decrease of the spatiotemporal variability of the juggler's performance: a simple sound associated to left and right tactile cues presented antiphase to each other, which corresponded to the temporal pattern of hands movement in the juggling task. A contrario, no improvements were obtained in the case of other auditory and tactile combinations. We even found a degraded performance when tactile events were presented alone. The nervous system thus appears able to integrate in efficient way environmental information brought by different sensory modalities, but only if the information specified matches specific features of the coordination pattern. We discuss the possible implications of these results for the understanding of the neuronal integration process implied in audio-tactile interaction in the context of complex voluntary movement, and considering the well-known gating effect of movement on vibrotactile perception. PMID:22384211
Development of a Bayesian Estimator for Audio-Visual Integration: A Neurocomputational Study
Ursino, Mauro; Crisafulli, Andrea; di Pellegrino, Giuseppe; Magosso, Elisa; Cuppini, Cristiano
2017-01-01
The brain integrates information from different sensory modalities to generate a coherent and accurate percept of external events. Several experimental studies suggest that this integration follows the principle of Bayesian estimate. However, the neural mechanisms responsible for this behavior, and its development in a multisensory environment, are still insufficiently understood. We recently presented a neural network model of audio-visual integration (Neural Computation, 2017) to investigate how a Bayesian estimator can spontaneously develop from the statistics of external stimuli. Model assumes the presence of two unimodal areas (auditory and visual) topologically organized. Neurons in each area receive an input from the external environment, computed as the inner product of the sensory-specific stimulus and the receptive field synapses, and a cross-modal input from neurons of the other modality. Based on sensory experience, synapses were trained via Hebbian potentiation and a decay term. Aim of this work is to improve the previous model, including a more realistic distribution of visual stimuli: visual stimuli have a higher spatial accuracy at the central azimuthal coordinate and a lower accuracy at the periphery. Moreover, their prior probability is higher at the center, and decreases toward the periphery. Simulations show that, after training, the receptive fields of visual and auditory neurons shrink to reproduce the accuracy of the input (both at the center and at the periphery in the visual case), thus realizing the likelihood estimate of unimodal spatial position. Moreover, the preferred positions of visual neurons contract toward the center, thus encoding the prior probability of the visual input. Finally, a prior probability of the co-occurrence of audio-visual stimuli is encoded in the cross-modal synapses. The model is able to simulate the main properties of a Bayesian estimator and to reproduce behavioral data in all conditions examined. In particular, in unisensory conditions the visual estimates exhibit a bias toward the fovea, which increases with the level of noise. In cross modal conditions, the SD of the estimates decreases when using congruent audio-visual stimuli, and a ventriloquism effect becomes evident in case of spatially disparate stimuli. Moreover, the ventriloquism decreases with the eccentricity. PMID:29046631
Enhancing Navigation Skills through Audio Gaming.
Sánchez, Jaime; Sáenz, Mauricio; Pascual-Leone, Alvaro; Merabet, Lotfi
2010-01-01
We present the design, development and initial cognitive evaluation of an Audio-based Environment Simulator (AbES). This software allows a blind user to navigate through a virtual representation of a real space for the purposes of training orientation and mobility skills. Our findings indicate that users feel satisfied and self-confident when interacting with the audio-based interface, and the embedded sounds allow them to correctly orient themselves and navigate within the virtual world. Furthermore, users are able to transfer spatial information acquired through virtual interactions into real world navigation and problem solving tasks.
NASA Technical Reports Server (NTRS)
1992-01-01
Ames Research Center research into virtual reality led to the development of the Convolvotron, a high speed digital audio processing system that delivers three-dimensional sound over headphones. It consists of a two-card set designed for use with a personal computer. The Convolvotron's primary application is presentation of 3D audio signals over headphones. Four independent sound sources are filtered with large time-varying filters that compensate for motion. The perceived location of the sound remains constant. Possible applications are in air traffic control towers or airplane cockpits, hearing and perception research and virtual reality development.
Enhancing Navigation Skills through Audio Gaming
Sánchez, Jaime; Sáenz, Mauricio; Pascual-Leone, Alvaro; Merabet, Lotfi
2014-01-01
We present the design, development and initial cognitive evaluation of an Audio-based Environment Simulator (AbES). This software allows a blind user to navigate through a virtual representation of a real space for the purposes of training orientation and mobility skills. Our findings indicate that users feel satisfied and self-confident when interacting with the audio-based interface, and the embedded sounds allow them to correctly orient themselves and navigate within the virtual world. Furthermore, users are able to transfer spatial information acquired through virtual interactions into real world navigation and problem solving tasks. PMID:25505796
Vacher, Michel; Chahuara, Pedro; Lecouteux, Benjamin; Istrate, Dan; Portet, Francois; Joubert, Thierry; Sehili, Mohamed; Meillon, Brigitte; Bonnefond, Nicolas; Fabre, Sébastien; Roux, Camille; Caffiau, Sybille
2013-01-01
The Sweet-Home project aims at providing audio-based interaction technology that lets the user have full control over their home environment, at detecting distress situations and at easing the social inclusion of the elderly and frail population. This paper presents an overview of the project focusing on the implemented techniques for speech and sound recognition as context-aware decision making with uncertainty. A user experiment in a smart home demonstrates the interest of this audio-based technology.
Guidi, Andrea; Salvi, Sergio; Ottaviano, Manuel; Gentili, Claudio; Bertschy, Gilles; de Rossi, Danilo; Scilingo, Enzo Pasquale; Vanello, Nicola
2015-11-06
Bipolar disorder is one of the most common mood disorders characterized by large and invalidating mood swings. Several projects focus on the development of decision support systems that monitor and advise patients, as well as clinicians. Voice monitoring and speech signal analysis can be exploited to reach this goal. In this study, an Android application was designed for analyzing running speech using a smartphone device. The application can record audio samples and estimate speech fundamental frequency, F0, and its changes. F0-related features are estimated locally on the smartphone, with some advantages with respect to remote processing approaches in terms of privacy protection and reduced upload costs. The raw features can be sent to a central server and further processed. The quality of the audio recordings, algorithm reliability and performance of the overall system were evaluated in terms of voiced segment detection and features estimation. The results demonstrate that mean F0 from each voiced segment can be reliably estimated, thus describing prosodic features across the speech sample. Instead, features related to F0 variability within each voiced segment performed poorly. A case study performed on a bipolar patient is presented.
NASA Astrophysics Data System (ADS)
Monteiro Santos, Fernando A.; Afonso, António R. Andrade; Dupis, André
2007-03-01
Audio-magnetotelluric (AMT) and resistivity (dc) surveys are often used in environmental, hydrological and geothermal evaluation. The separate interpretation of those geophysical data sets assuming two-dimensional models frequently produces ambiguous results. The joint inversion of AMT and dc data is advocated by several authors as an efficient method for reducing the ambiguity inherent to each of those methods. This paper presents results obtained from the two-dimensional joint inversion of dipole-dipole and scalar AMT data acquired in a low enthalpy geothermal field situated in a graben. The joint inverted models show a better definition of shallow and deep structures. The results show that the extension of the benefits using joint inversion depends on the number and spacing of the AMT sites. The models obtained from experimental data display a low resistivity zone (<20 Ω m) in the central part of the graben that was correlated with the geothermal reservoir. The resistivity distribution models were used to estimate the distribution of the porosity in the geothermal reservoir applying two different approaches and considering the clay minerals effect. The results suggest that the maximum porosity of the reservoir is not uniform and might be in the range of 12% to 24%.
Wang, Ya-Ching; Griffiths, Jane; Grande, Gunn
2017-09-01
This article presents the qualitative findings of a mixed-methods study that explored factors influencing lesbians' breast health-care behavior and intentions. A total of 37 semi-structured face-to-face interviews were conducted among women who self-identified as lesbians or women who partnered with the same gender who were aged 20 years or above in four areas of Taiwan (North, Central, South, and East Taiwan) between August 2012 and October 2012. Interviews were audio recorded with participants' consent. The interviews were analyzed using constant comparative analysis with Nvivo audio-coding support. Four themes were identified to be strongly associated with the lesbians' breast health-care behavior and their intentions, namely, gender identity, gender role expression, partners' support, and concerns about health-care providers' reactions. Important barriers to the women's breast health-care behavior and intentions were masculine identity ("T-identity" in Taiwan), masculine appearance, concerns about health-care providers' lack of knowledge of multiple gender diversity, and their attitudes toward lesbians. Conversely, their partners' support was a factor facilitating the women's breast health-care behavior and intentions, particularly for the T-identity lesbians. These findings suggest the significance of and need for culturally competent care and are important for improving Taiwanese lesbians' breast health.
Guidi, Andrea; Salvi, Sergio; Ottaviano, Manuel; Gentili, Claudio; Bertschy, Gilles; de Rossi, Danilo; Scilingo, Enzo Pasquale; Vanello, Nicola
2015-01-01
Bipolar disorder is one of the most common mood disorders characterized by large and invalidating mood swings. Several projects focus on the development of decision support systems that monitor and advise patients, as well as clinicians. Voice monitoring and speech signal analysis can be exploited to reach this goal. In this study, an Android application was designed for analyzing running speech using a smartphone device. The application can record audio samples and estimate speech fundamental frequency, F0, and its changes. F0-related features are estimated locally on the smartphone, with some advantages with respect to remote processing approaches in terms of privacy protection and reduced upload costs. The raw features can be sent to a central server and further processed. The quality of the audio recordings, algorithm reliability and performance of the overall system were evaluated in terms of voiced segment detection and features estimation. The results demonstrate that mean F0 from each voiced segment can be reliably estimated, thus describing prosodic features across the speech sample. Instead, features related to F0 variability within each voiced segment performed poorly. A case study performed on a bipolar patient is presented. PMID:26561811
A Comparison of Television and Audio Presentations of the MLA French Listening Examination
ERIC Educational Resources Information Center
Stallings, William M.
1972-01-01
Although nonverbal cues are often available in real-life communication, listening is usually tested by aural stimuli broadcast from an audio-tape. It would seem that testing listening comprehension might be improved by using television to offer nonverbal cues in addition to aural stimuli. (Author)
The Audio-Visual Marketing Handbook for Independent Schools.
ERIC Educational Resources Information Center
Griffith, Tom
This how-to booklet offers specific advice on producing video or slide/tape programs for marketing independent schools. Five chapters present guidelines for various stages in the process: (1) Audio-Visual Marketing in Context (aesthetics and economics of audiovisual marketing); (2) A Question of Identity (identifying the audience and deciding on…
Notes from the Field: Lolak--Another Moribund Language of Indonesia, with Supporting Audio
ERIC Educational Resources Information Center
Lobel, Jason William; Paputungan, Ade Tatak
2017-01-01
This paper consists of a short multimedia introduction to Lolak, a near-extinct Greater Central Philippine language traditionally spoken in three small communities on the island of Sulawesi in Indonesia. In addition to being one of the most underdocumented languages in the area, it is also spoken by one of the smallest native speaker populations…
Korycki, Rafal
2014-05-01
Since the appearance of digital audio recordings, audio authentication has been becoming increasingly difficult. The currently available technologies and free editing software allow a forger to cut or paste any single word without audible artifacts. Nowadays, the only method referring to digital audio files commonly approved by forensic experts is the ENF criterion. It consists in fluctuation analysis of the mains frequency induced in electronic circuits of recording devices. Therefore, its effectiveness is strictly dependent on the presence of mains signal in the recording, which is a rare occurrence. Recently, much attention has been paid to authenticity analysis of compressed multimedia files and several solutions were proposed for detection of double compression in both digital video and digital audio. This paper addresses the problem of tampering detection in compressed audio files and discusses new methods that can be used for authenticity analysis of digital recordings. Presented approaches consist in evaluation of statistical features extracted from the MDCT coefficients as well as other parameters that may be obtained from compressed audio files. Calculated feature vectors are used for training selected machine learning algorithms. The detection of multiple compression covers up tampering activities as well as identification of traces of montage in digital audio recordings. To enhance the methods' robustness an encoder identification algorithm was developed and applied based on analysis of inherent parameters of compression. The effectiveness of tampering detection algorithms is tested on a predefined large music database consisting of nearly one million of compressed audio files. The influence of compression algorithms' parameters on the classification performance is discussed, based on the results of the current study. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Pallett, Edward; Rentowl, Patricia; Hanning, Christopher
2009-09-01
An Electronic Portable Information Collection audio device (EPIC-Vox) has been developed to deliver questionnaires in spoken word format via headphones. Patients respond by pressing buttons on the device. The aims of this study were to determine limits of agreement between, and test-retest reliability of audio (A) and paper (P) versions of the Brief Fatigue Inventory (BFI). Two hundred sixty outpatients (204 male, mean age 55.7 years) attending a sleep disorders clinic were allocated to four groups using block randomization. All completed the BFI twice, separated by a one-minute distracter task. Half the patients completed paper and audio versions, then an evaluation questionnaire. The remainder completed either paper or audio versions to compare test-retest reliability. BFI global scores were analyzed using Bland-Altman methodology. Agreement between categorical fatigue severity scores was determined using Cohen's kappa. The mean (SD) difference between paper and audio scores was -0.04 (0.48). The limits of agreement (mean difference+/-2SD) were -0.93 to +1.00. Test-retest reliability of the paper BFI showed a mean (SD) difference of 0.17 (0.32) between first and second presentations (limits -0.46 to +0.81). For audio, the mean (SD) difference was 0.17 (0.48) (limits -0.79 to +1.14). For agreement between categorical scores, Cohen's kappa=0.73 for P and A, 0.67 (P at test and retest) and 0.87 (A at test and retest). Evaluation preferences (n=128): 36.7% audio; 18.0% paper; and 45.3% no preference. A total of 99.2% found EPIC-Vox "easy to use." These data demonstrate that the English audio version of the BFI provides an acceptable alternative to the paper questionnaire.
Kuribayashi, Ryuma; Nittono, Hiroshi
2017-01-01
High-resolution audio has a higher sampling frequency and a greater bit depth than conventional low-resolution audio such as compact disks. The higher sampling frequency enables inaudible sound components (above 20 kHz) that are cut off in low-resolution audio to be reproduced. Previous studies of high-resolution audio have mainly focused on the effect of such high-frequency components. It is known that alpha-band power in a human electroencephalogram (EEG) is larger when the inaudible high-frequency components are present than when they are absent. Traditionally, alpha-band EEG activity has been associated with arousal level. However, no previous studies have explored whether sound sources with high-frequency components affect the arousal level of listeners. The present study examined this possibility by having 22 participants listen to two types of a 400-s musical excerpt of French Suite No. 5 by J. S. Bach (on cembalo, 24-bit quantization, 192 kHz A/D sampling), with or without inaudible high-frequency components, while performing a visual vigilance task. High-alpha (10.5-13 Hz) and low-beta (13-20 Hz) EEG powers were larger for the excerpt with high-frequency components than for the excerpt without them. Reaction times and error rates did not change during the task and were not different between the excerpts. The amplitude of the P3 component elicited by target stimuli in the vigilance task increased in the second half of the listening period for the excerpt with high-frequency components, whereas no such P3 amplitude change was observed for the other excerpt without them. The participants did not distinguish between these excerpts in terms of sound quality. Only a subjective rating of inactive pleasantness after listening was higher for the excerpt with high-frequency components than for the other excerpt. The present study shows that high-resolution audio that retains high-frequency components has an advantage over similar and indistinguishable digital sound sources in which such components are artificially cut off, suggesting that high-resolution audio with inaudible high-frequency components induces a relaxed attentional state without conscious awareness.
Stochastic modeling of soundtrack for efficient segmentation and indexing of video
NASA Astrophysics Data System (ADS)
Naphade, Milind R.; Huang, Thomas S.
1999-12-01
Tools for efficient and intelligent management of digital content are essential for digital video data management. An extremely challenging research area in this context is that of multimedia analysis and understanding. The capabilities of audio analysis in particular for video data management are yet to be fully exploited. We present a novel scheme for indexing and segmentation of video by analyzing the audio track. This analysis is then applied to the segmentation and indexing of movies. We build models for some interesting events in the motion picture soundtrack. The models built include music, human speech and silence. We propose the use of hidden Markov models to model the dynamics of the soundtrack and detect audio-events. Using these models we segment and index the soundtrack. A practical problem in motion picture soundtracks is that the audio in the track is of a composite nature. This corresponds to the mixing of sounds from different sources. Speech in foreground and music in background are common examples. The coexistence of multiple individual audio sources forces us to model such events explicitly. Experiments reveal that explicit modeling gives better result than modeling individual audio events separately.
Acoustic Calibration of the Exterior Effects Room at the NASA Langley Research Center
NASA Technical Reports Server (NTRS)
Faller, Kenneth J., II; Rizzi, Stephen A.; Klos, Jacob; Chapin, William L.; Surucu, Fahri; Aumann, Aric R.
2010-01-01
The Exterior Effects Room (EER) at the NASA Langley Research Center is a 39-seat auditorium built for psychoacoustic studies of aircraft community noise. The original reproduction system employed monaural playback and hence lacked sound localization capability. In an effort to more closely recreate field test conditions, a significant upgrade was undertaken to allow simulation of a three-dimensional audio and visual environment. The 3D audio system consists of 27 mid and high frequency satellite speakers and 4 subwoofers, driven by a real-time audio server running an implementation of Vector Base Amplitude Panning. The audio server is part of a larger simulation system, which controls the audio and visual presentation of recorded and synthesized aircraft flyovers. The focus of this work is on the calibration of the 3D audio system, including gains used in the amplitude panning algorithm, speaker equalization, and absolute gain control. Because the speakers are installed in an irregularly shaped room, the speaker equalization includes time delay and gain compensation due to different mounting distances from the focal point, filtering for color compensation due to different installations (half space, corner, baffled/unbaffled), and cross-over filtering.
Assessment of rural soundscapes with high-speed train noise.
Lee, Pyoung Jik; Hong, Joo Young; Jeon, Jin Yong
2014-06-01
In the present study, rural soundscapes with high-speed train noise were assessed through laboratory experiments. A total of ten sites with varying landscape metrics were chosen for audio-visual recording. The acoustical characteristics of the high-speed train noise were analyzed using various noise level indices. Landscape metrics such as the percentage of natural features (NF) and Shannon's diversity index (SHDI) were adopted to evaluate the landscape features of the ten sites. Laboratory experiments were then performed with 20 well-trained listeners to investigate the perception of high-speed train noise in rural areas. The experiments consisted of three parts: 1) visual-only condition, 2) audio-only condition, and 3) combined audio-visual condition. The results showed that subjects' preference for visual images was significantly related to NF, the number of land types, and the A-weighted equivalent sound pressure level (LAeq). In addition, the visual images significantly influenced the noise annoyance, and LAeq and NF were the dominant factors affecting the annoyance from high-speed train noise in the combined audio-visual condition. In addition, Zwicker's loudness (N) was highly correlated with the annoyance from high-speed train noise in both the audio-only and audio-visual conditions. © 2013.
Highlight summarization in golf videos using audio signals
NASA Astrophysics Data System (ADS)
Kim, Hyoung-Gook; Kim, Jin Young
2008-01-01
In this paper, we present an automatic summarization of highlights in golf videos based on audio information alone without video information. The proposed highlight summarization system is carried out based on semantic audio segmentation and detection on action units from audio signals. Studio speech, field speech, music, and applause are segmented by means of sound classification. Swing is detected by the methods of impulse onset detection. Sounds like swing and applause form a complete action unit, while studio speech and music parts are used to anchor the program structure. With the advantage of highly precise detection of applause, highlights are extracted effectively. Our experimental results obtain high classification precision on 18 golf games. It proves that the proposed system is very effective and computationally efficient to apply the technology to embedded consumer electronic devices.
Method and apparatus for obtaining complete speech signals for speech recognition applications
NASA Technical Reports Server (NTRS)
Abrash, Victor (Inventor); Cesari, Federico (Inventor); Franco, Horacio (Inventor); George, Christopher (Inventor); Zheng, Jing (Inventor)
2009-01-01
The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
Computationally Efficient Clustering of Audio-Visual Meeting Data
NASA Astrophysics Data System (ADS)
Hung, Hayley; Friedland, Gerald; Yeo, Chuohao
This chapter presents novel computationally efficient algorithms to extract semantically meaningful acoustic and visual events related to each of the participants in a group discussion using the example of business meeting recordings. The recording setup involves relatively few audio-visual sensors, comprising a limited number of cameras and microphones. We first demonstrate computationally efficient algorithms that can identify who spoke and when, a problem in speech processing known as speaker diarization. We also extract visual activity features efficiently from MPEG4 video by taking advantage of the processing that was already done for video compression. Then, we present a method of associating the audio-visual data together so that the content of each participant can be managed individually. The methods presented in this article can be used as a principal component that enables many higher-level semantic analysis tasks needed in search, retrieval, and navigation.
Auditory and audio-vocal responses of single neurons in the monkey ventral premotor cortex.
Hage, Steffen R
2018-03-20
Monkey vocalization is a complex behavioral pattern, which is flexibly used in audio-vocal communication. A recently proposed dual neural network model suggests that cognitive control might be involved in this behavior, originating from a frontal cortical network in the prefrontal cortex and mediated via projections from the rostral portion of the ventral premotor cortex (PMvr) and motor cortex to the primary vocal motor network in the brainstem. For the rapid adjustment of vocal output to external acoustic events, strong interconnections between vocal motor and auditory sites are needed, which are present at cortical and subcortical levels. However, the role of the PMvr in audio-vocal integration processes remains unclear. In the present study, single neurons in the PMvr were recorded in rhesus monkeys (Macaca mulatta) while volitionally producing vocalizations in a visual detection task or passively listening to monkey vocalizations. Ten percent of randomly selected neurons in the PMvr modulated their discharge rate in response to acoustic stimulation with species-specific calls. More than four-fifths of these auditory neurons showed an additional modulation of their discharge rates either before and/or during the monkeys' motor production of the vocalization. Based on these audio-vocal interactions, the PMvr might be well positioned to mediate higher order auditory processing with cognitive control of the vocal motor output to the primary vocal motor network. Such audio-vocal integration processes in the premotor cortex might constitute a precursor for the evolution of complex learned audio-vocal integration systems, ultimately giving rise to human speech. Copyright © 2018 Elsevier B.V. All rights reserved.
Modeling sports highlights using a time-series clustering framework and model interpretation
NASA Astrophysics Data System (ADS)
Radhakrishnan, Regunathan; Otsuka, Isao; Xiong, Ziyou; Divakaran, Ajay
2005-01-01
In our past work on sports highlights extraction, we have shown the utility of detecting audience reaction using an audio classification framework. The audio classes in the framework were chosen based on intuition. In this paper, we present a systematic way of identifying the key audio classes for sports highlights extraction using a time series clustering framework. We treat the low-level audio features as a time series and model the highlight segments as "unusual" events in a background of an "usual" process. The set of audio classes to characterize the sports domain is then identified by analyzing the consistent patterns in each of the clusters output from the time series clustering framework. The distribution of features from the training data so obtained for each of the key audio classes, is parameterized by a Minimum Description Length Gaussian Mixture Model (MDL-GMM). We also interpret the meaning of each of the mixture components of the MDL-GMM for the key audio class (the "highlight" class) that is correlated with highlight moments. Our results show that the "highlight" class is a mixture of audience cheering and commentator's excited speech. Furthermore, we show that the precision-recall performance for highlights extraction based on this "highlight" class is better than that of our previous approach which uses only audience cheering as the key highlight class.
Initial utilization of the CVIRB video production facility
NASA Technical Reports Server (NTRS)
Parrish, Russell V.; Busquets, Anthony M.; Hogge, Thomas W.
1987-01-01
Video disk technology is one of the central themes of a technology demonstrator workstation being assembled as a man/machine interface for the Space Station Data Management Test Bed at Johnson Space Center. Langley Research Center personnel involved in the conception and implementation of this workstation have assembled a video production facility to allow production of video disk material for this propose. This paper documents the initial familiarization efforts in the field of video production for those personnel and that facility. Although the entire video disk production cycle was not operational for this initial effort, the production of a simulated disk on video tape did acquaint the personnel with the processes involved and with the operation of the hardware. Invaluable experience in storyboarding, script writing, audio and video recording, and audio and video editing was gained in the production process.
Streaming Audio and Video: New Challenges and Opportunities for Museums.
ERIC Educational Resources Information Center
Spadaccini, Jim
Streaming audio and video present new challenges and opportunities for museums. Streaming media is easier to author and deliver to Internet audiences than ever before; digital video editing is commonplace now that the tools--computers, digital video cameras, and hard drives--are so affordable; the cost of serving video files across the Internet…
Spanish for Agricultural Purposes: The Audio Program.
ERIC Educational Resources Information Center
Mainous, Bruce H.; And Others
The manual is meant to accompany and supplement the basic manual and to serve as support to the audio component of "Spanish for Agricultural Purposes," a one-semester course for North American agriculture specialists preparing to work in Latin America, consists of exercises to supplement readings presented in the course's basic manual and to…
Transitioning from Analog to Digital Audio Recording in Childhood Speech Sound Disorders
ERIC Educational Resources Information Center
Shriberg, Lawrence D.; Mcsweeny, Jane L.; Anderson, Bruce E.; Campbell, Thomas F.; Chial, Michael R.; Green, Jordan R.; Hauner, Katherina K.; Moore, Christopher A.; Rusiewicz, Heather L.; Wilson, David L.
2005-01-01
Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing…
The Changing Role of the Educational Video in Higher Distance Education
ERIC Educational Resources Information Center
Laaser, Wolfram; Toloza, Eduardo A.
2017-01-01
The article argues that the ongoing usage of audio visual media is falling behind in terms of educational quality compared to prior achievements in the history of distance education. After reviewing some important steps and experiences of audio visual digital media development, we analyse predominant presentation formats on the Web. Special focus…
Low-cost synchronization of high-speed audio and video recordings in bio-acoustic experiments.
Laurijssen, Dennis; Verreycken, Erik; Geipel, Inga; Daems, Walter; Peremans, Herbert; Steckel, Jan
2018-02-27
In this paper, we present a method for synchronizing high-speed audio and video recordings of bio-acoustic experiments. By embedding a random signal into the recorded video and audio data, robust synchronization of a diverse set of sensor streams can be performed without the need to keep detailed records. The synchronization can be performed using recording devices without dedicated synchronization inputs. We demonstrate the efficacy of the approach in two sets of experiments: behavioral experiments on different species of echolocating bats and the recordings of field crickets. We present the general operating principle of the synchronization method, discuss its synchronization strength and provide insights into how to construct such a device using off-the-shelf components. © 2018. Published by The Company of Biologists Ltd.
NASA Astrophysics Data System (ADS)
Novey, Levi T.; Hall, Troy E.
2007-03-01
Auditory forms of nonpersonal communication have rarely been evaluated in informal settings like parks and museums. This study evaluated the effect of an interpretive audio tour on visitor knowledge and social behavior at Carlsbad Caverns National Park. A cross-sectional pretest/posttest quasi-experimental design compared the responses of audio tour users (n = 123) and nonusers (n = 131) on several knowledge questions. Observations (n = 700) conducted at seven sites within the caverns documented sign reading, time spent listening to the audio, within group conversation, and other social behaviors for a different sample of visitors. Pretested tour users and nonusers did not differ in visitor characteristics, knowledge, or attitude variables, suggesting the two populations were similar. On a 12-item knowledge quiz, tour users' scores increased from 5.7 to 10.3, and nonusers' scores increased from 6.2 to 8.4. Most visitors were able to identify some of the park's major messages when presented with a multiple-choice question, but more audio users than nonusers identified resource preservation as a primary message in an open-ended question. Based on observations, audio tour users and nonusers did not differ substantially in their interactions with other members of their group or in their reading of interpretive signs in the cave. Audio tour users had positive reactions to the tour, and these reactions, coupled with the positive learning outcomes and negligible effects on social interaction, suggest that audio tours can be an effective communication medium in informal educational settings.
Comparison of audio and audiovisual measures of adult stuttering: Implications for clinical trials.
O'Brian, Sue; Jones, Mark; Onslow, Mark; Packman, Ann; Menzies, Ross; Lowe, Robyn
2015-04-15
This study investigated whether measures of percentage syllables stuttered (%SS) and stuttering severity ratings with a 9-point scale differ when made from audiovisual compared with audio-only recordings. Four experienced speech-language pathologists measured %SS and assigned stuttering severity ratings to 10-minute audiovisual and audio-only recordings of 36 adults. There was a mean 18% increase in %SS scores when samples were presented in audiovisual compared with audio-only mode. This result was consistent across both higher and lower %SS scores and was found to be directly attributable to counts of stuttered syllables rather than the total number of syllables. There was no significant difference between stuttering severity ratings made from the two modes. In clinical trials research, when using %SS as the primary outcome measure, audiovisual samples would be preferred as long as clear, good quality, front-on images can be easily captured. Alternatively, stuttering severity ratings may be a more valid measure to use as they correlate well with %SS and values are not influenced by the presentation mode.
The role of laryngoscopy in the diagnosis of spasmodic dysphonia.
Daraei, Pedram; Villari, Craig R; Rubin, Adam D; Hillel, Alexander T; Hapner, Edie R; Klein, Adam M; Johns, Michael M
2014-03-01
Spasmodic dysphonia (SD) can be difficult to diagnose, and patients often see multiple physicians for many years before diagnosis. Improving the speed of diagnosis for individuals with SD may decrease the time to treatment and improve patient quality of life more quickly. To assess whether the diagnosis of SD can be accurately predicted through auditory cues alone without the assistance of visual cues offered by laryngoscopic examination. Single-masked, case-control study at a specialized referral center that included patients who underwent laryngoscopic examination as part of a multidisciplinary workup for dysphonia. Twenty-two patients were selected in total: 10 with SD, 5 with vocal tremor, and 7 controls without SD or vocal tremor. The laryngoscopic examination was recorded, deidentified, and edited to make 3 media clips for each patient: video alone, audio alone, and combined video and audio. These clips were randomized and presented to 3 fellowship-trained laryngologist raters (A.D.R., A.T.H., and A.M.K.), who established the most probable diagnosis for each clip. Intrarater and interrater reliability were evaluated using repeat clips incorporated in the presentations. We measured diagnostic accuracy for video-only, audio-only, and combined multimedia clips. These measures were established before data collection. Data analysis was accomplished with analysis of variance and Tukey honestly significant differences. Of patients with SD, diagnostic accuracy was 10%, 73%, and 73% for video-only, audio-only, and combined, respectively (P < .001, df = 2). Of patients with vocal tremor, diagnostic accuracy was 93%, 73%, and 100% for video-only, audio-only, and combined, respectively (P = .05, df = 2). Of the controls, diagnostic accuracy was 81%, 19%, and 62% for video-only, audio-only, and combined, respectively (P < .001, df = 2). The diagnosis of SD during examination is based primarily on auditory cues. Viewing combined audio and video clips afforded no change in diagnostic accuracy compared with audio alone. Laryngoscopy serves an important role in the diagnosis of SD by excluding other pathologic causes and identifying vocal tremor.
Perceptual Audio Hashing Functions
NASA Astrophysics Data System (ADS)
Özer, Hamza; Sankur, Bülent; Memon, Nasir; Anarım, Emin
2005-12-01
Perceptual hash functions provide a tool for fast and reliable identification of content. We present new audio hash functions based on summarization of the time-frequency spectral characteristics of an audio document. The proposed hash functions are based on the periodicity series of the fundamental frequency and on singular-value description of the cepstral frequencies. They are found, on one hand, to perform very satisfactorily in identification and verification tests, and on the other hand, to be very resilient to a large variety of attacks. Moreover, we address the issue of security of hashes and propose a keying technique, and thereby a key-dependent hash function.
Harris, J L; Salus, D; Rerecich, R; Larsen, D
1996-01-01
Assertions made by Merikle (1988) regarding audio subliminal messages were tested. Seventeen participants were presented subliminal messages embedded in a white-noise cover, and three signal-to-noise (S/N) detection ratios were examined. Participants were asked to guess message presence and message content, to determine subjective/objective thresholds. Results showed that participants were unable to identify target words presented in this audio subliminal stimulus format beyond chance levels.
Audio-Visual and Meaningful Semantic Context Enhancements in Older and Younger Adults.
Smayda, Kirsten E; Van Engen, Kristin J; Maddox, W Todd; Chandrasekaran, Bharath
2016-01-01
Speech perception is critical to everyday life. Oftentimes noise can degrade a speech signal; however, because of the cues available to the listener, such as visual and semantic cues, noise rarely prevents conversations from continuing. The interaction of visual and semantic cues in aiding speech perception has been studied in young adults, but the extent to which these two cues interact for older adults has not been studied. To investigate the effect of visual and semantic cues on speech perception in older and younger adults, we recruited forty-five young adults (ages 18-35) and thirty-three older adults (ages 60-90) to participate in a speech perception task. Participants were presented with semantically meaningful and anomalous sentences in audio-only and audio-visual conditions. We hypothesized that young adults would outperform older adults across SNRs, modalities, and semantic contexts. In addition, we hypothesized that both young and older adults would receive a greater benefit from a semantically meaningful context in the audio-visual relative to audio-only modality. We predicted that young adults would receive greater visual benefit in semantically meaningful contexts relative to anomalous contexts. However, we predicted that older adults could receive a greater visual benefit in either semantically meaningful or anomalous contexts. Results suggested that in the most supportive context, that is, semantically meaningful sentences presented in the audiovisual modality, older adults performed similarly to young adults. In addition, both groups received the same amount of visual and meaningful benefit. Lastly, across groups, a semantically meaningful context provided more benefit in the audio-visual modality relative to the audio-only modality, and the presence of visual cues provided more benefit in semantically meaningful contexts relative to anomalous contexts. These results suggest that older adults can perceive speech as well as younger adults when both semantic and visual cues are available to the listener.
Audio-Visual and Meaningful Semantic Context Enhancements in Older and Younger Adults
Smayda, Kirsten E.; Van Engen, Kristin J.; Maddox, W. Todd; Chandrasekaran, Bharath
2016-01-01
Speech perception is critical to everyday life. Oftentimes noise can degrade a speech signal; however, because of the cues available to the listener, such as visual and semantic cues, noise rarely prevents conversations from continuing. The interaction of visual and semantic cues in aiding speech perception has been studied in young adults, but the extent to which these two cues interact for older adults has not been studied. To investigate the effect of visual and semantic cues on speech perception in older and younger adults, we recruited forty-five young adults (ages 18–35) and thirty-three older adults (ages 60–90) to participate in a speech perception task. Participants were presented with semantically meaningful and anomalous sentences in audio-only and audio-visual conditions. We hypothesized that young adults would outperform older adults across SNRs, modalities, and semantic contexts. In addition, we hypothesized that both young and older adults would receive a greater benefit from a semantically meaningful context in the audio-visual relative to audio-only modality. We predicted that young adults would receive greater visual benefit in semantically meaningful contexts relative to anomalous contexts. However, we predicted that older adults could receive a greater visual benefit in either semantically meaningful or anomalous contexts. Results suggested that in the most supportive context, that is, semantically meaningful sentences presented in the audiovisual modality, older adults performed similarly to young adults. In addition, both groups received the same amount of visual and meaningful benefit. Lastly, across groups, a semantically meaningful context provided more benefit in the audio-visual modality relative to the audio-only modality, and the presence of visual cues provided more benefit in semantically meaningful contexts relative to anomalous contexts. These results suggest that older adults can perceive speech as well as younger adults when both semantic and visual cues are available to the listener. PMID:27031343
Guerra Jiménez, Gloria; Mazón Gutiérrez, Ángel; Marco de Lucas, Enrique; Valle San Román, Natalia; Martín Laez, Rubén; Morales Angulo, Carmelo
2015-01-01
Chiari malformation is an alteration of the base of the skull with herniation through the foramen magnum of the brain stem and cerebellum. Although the most common presentation is occipital headache, the association of audio-vestibular symptoms is not rare. The aim of our study was to describe audio-vestibular signs and symptoms in Chiari malformation type i (CM-I). We performed a retrospective observational study of patients referred to our unit during the last 5 years. We also carried out a literature review of audio-vestibular signs and symptoms in this disease. There were 9 patients (2 males and 7 females), with an average age of 42.8 years. Five patients presented a Ménière-like syndrome; 2 cases, a recurrent vertigo with peripheral features; one patient showed a sudden hearing loss; and one case suffered a sensorineural hearing loss with early childhood onset. The most common audio-vestibular symptom indicated in the literature in patients with CM-I is unsteadiness (49%), followed by dizziness (18%), nystagmus (15%) and hearing loss (15%). Nystagmus is frequently horizontal (74%) or down-beating (18%). Other audio-vestibular signs and symptoms are tinnitus (11%), aural fullness (10%) and hyperacusis (1%). Occipital headache that increases with Valsalva manoeuvres and hand paresthesias are very suggestive symptoms. The appearance of audio-vestibular manifestations in CM-I makes it common to refer these patients to neurotologists. Unsteadiness, vertiginous syndromes and sensorineural hearing loss are frequent. Nystagmus, especially horizontal and down-beating, is not rare. It is important for neurotologists to familiarise themselves with CM-I symptoms to be able to consider it in differential diagnosis. Copyright © 2014 Elsevier España, S.L.U. y Sociedad Española de Otorrinolaringología y Patología Cérvico-Facial. All rights reserved.
ERIC Educational Resources Information Center
Franklin-Landi, Rebecca
2017-01-01
Since the development of Task-Based Language Teaching (TBLT) in the 1980's, learner needs have been central to English for Specific Purposes (ESP) teaching and learning, including in the field of English for Medical Purposes (EMP). This paper reports on two studies, conducted at Nice University Medical Faculty between October 2015 and March 2016,…
Say What? The Role of Audio in Multimedia Video
NASA Astrophysics Data System (ADS)
Linder, C. A.; Holmes, R. M.
2011-12-01
Audio, including interviews, ambient sounds, and music, is a critical-yet often overlooked-part of an effective multimedia video. In February 2010, Linder joined scientists working on the Global Rivers Observatory Project for two weeks of intensive fieldwork in the Congo River watershed. The team's goal was to learn more about how climate change and deforestation are impacting the river system and coastal ocean. Using stills and video shot with a lightweight digital SLR outfit and audio recorded with a pocket-sized sound recorder, Linder documented the trials and triumphs of working in the heart of Africa. Using excerpts from the six-minute Congo multimedia video, this presentation will illustrate how to record and edit an engaging audio track. Topics include interview technique, collecting ambient sounds, choosing and using music, and editing it all together to educate and entertain the viewer.
Sound for Film: Audio Education for Filmmakers.
ERIC Educational Resources Information Center
Lazar, Wanda
1998-01-01
Identifies the specific, unique, and important elements of audio education required by film professionals. Presents a model unit to be included in a film studies program, either as a separate course or as part of a film production or introduction to film course. Offers a model syllabus for such a course or unit on sound in film. (SR)
ERIC Educational Resources Information Center
Baldwin, Thomas F.
Man seems unable to retain different information from different senses or channels simultaneously; one channel gains full attention. However, it is hypothesized that if the message elements arriving simultaneously from audio and visual channels are redundant, man will retain the information. An attempt was made to measure redundancy in the audio…
ERIC Educational Resources Information Center
Negari, Giti Mousapour; Azizi, Aliye; Arani, Davood Khedmatkar
2018-01-01
The present study attempted to investigate the effects of audio input enhancement on EFL learners' retention of intensifiers. To this end, two research questions were formulated. In order to address these research questions, this study attempted to reject two null hypotheses. Pretest-posttest control group quasi-experimental design was employed to…
The Redundancy Effect on Retention and Transfer for Individuals with High Symptoms of ADHD
ERIC Educational Resources Information Center
Brown, Victoria; Lewis, David; Toussaint, Mario
2016-01-01
The multimedia elements of text and audio need to be carefully integrated together to maximize the impact of those elements for learning in a multimedia environment. Redundancy information presented through audio and visual channels can inhibit learning for individuals diagnosed with ADHD, who may experience challenges in the processing of…
Introduction to Human Services, Chapter III. Video Script Package, Text, and Audio Script Package.
ERIC Educational Resources Information Center
Miami-Dade Community Coll., FL.
Video, textual, and audio components of the third module of a multi-media, introductory course on Human Services are presented. The module packages, developed at Miami-Dade Community College, deal with technology, social change, and problem dependencies. A video cassette script is first provided that explores the "traditional,""inner," and "other…
Audio-Visual Aid in Teaching "Fatty Liver"
ERIC Educational Resources Information Center
Dash, Sambit; Kamath, Ullas; Rao, Guruprasad; Prakash, Jay; Mishra, Snigdha
2016-01-01
Use of audio visual tools to aid in medical education is ever on a rise. Our study intends to find the efficacy of a video prepared on "fatty liver," a topic that is often a challenge for pre-clinical teachers, in enhancing cognitive processing and ultimately learning. We prepared a video presentation of 11:36 min, incorporating various…
Agency Video, Audio and Imagery Library
NASA Technical Reports Server (NTRS)
Grubbs, Rodney
2015-01-01
The purpose of this presentation was to inform the ISS International Partners of the new NASA Agency Video, Audio and Imagery Library (AVAIL) website. AVAIL is a new resource for the public to search for and download NASA-related imagery, and is not intended to replace the current process by which the International Partners receive their Space Station imagery products.
Focus on Hinduism: Audio-Visual Resources for Teaching Religion. Occasional Publication No. 23.
ERIC Educational Resources Information Center
Dell, David; And Others
The guide presents annotated lists of audio and visual materials about the Hindu religion. The authors point out that Hinduism cannot be comprehended totally by reading books; thus the resources identified in this guide will enhance understanding based on reading. The guide is intended for use by high school and college students, teachers,…
The Audio-Tutorial Approach to Learning Through Independent Study and Integrated Experiences.
ERIC Educational Resources Information Center
Postlethwait, S. N.; And Others
The rationale of the integrated experience approach to teaching botany at Purdue University is given and the history of the audio-tutorial course at Purdue and its present organization are described. A sample week's unit of study is given, including transcription of the tape, reproduction of printed materials and photographs of other materials…
Audio-visual integration through the parallel visual pathways.
Kaposvári, Péter; Csete, Gergő; Bognár, Anna; Csibri, Péter; Tóth, Eszter; Szabó, Nikoletta; Vécsei, László; Sáry, Gyula; Tamás Kincses, Zsigmond
2015-10-22
Audio-visual integration has been shown to be present in a wide range of different conditions, some of which are processed through the dorsal, and others through the ventral visual pathway. Whereas neuroimaging studies have revealed integration-related activity in the brain, there has been no imaging study of the possible role of segregated visual streams in audio-visual integration. We set out to determine how the different visual pathways participate in this communication. We investigated how audio-visual integration can be supported through the dorsal and ventral visual pathways during the double flash illusion. Low-contrast and chromatic isoluminant stimuli were used to drive preferably the dorsal and ventral pathways, respectively. In order to identify the anatomical substrates of the audio-visual interaction in the two conditions, the psychophysical results were correlated with the white matter integrity as measured by diffusion tensor imaging.The psychophysiological data revealed a robust double flash illusion in both conditions. A correlation between the psychophysical results and local fractional anisotropy was found in the occipito-parietal white matter in the low-contrast condition, while a similar correlation was found in the infero-temporal white matter in the chromatic isoluminant condition. Our results indicate that both of the parallel visual pathways may play a role in the audio-visual interaction. Copyright © 2015. Published by Elsevier B.V.
Phillips, Reid H; Jain, Rahil; Browning, Yoni; Shah, Rachana; Kauffman, Peter; Dinh, Doan; Lutz, Barry R
2016-08-16
Fluid control remains a challenge in development of portable lab-on-a-chip devices. Here, we show that microfluidic networks driven by single-frequency audio tones create resonant oscillating flow that is predicted by equivalent electrical circuit models. We fabricated microfluidic devices with fluidic resistors (R), inductors (L), and capacitors (C) to create RLC networks with band-pass resonance in the audible frequency range available on portable audio devices. Microfluidic devices were fabricated from laser-cut adhesive plastic, and a "buzzer" was glued to a diaphragm (capacitor) to integrate the actuator on the device. The AC flowrate magnitude was measured by imaging oscillation of bead tracers to allow direct comparison to the RLC circuit model across the frequency range. We present a systematic build-up from single-channel systems to multi-channel (3-channel) networks, and show that RLC circuit models predict complex frequency-dependent interactions within multi-channel networks. Finally, we show that adding flow rectifying valves to the network creates pumps that can be driven by amplified and non-amplified audio tones from common audio devices (iPod and iPhone). This work shows that RLC circuit models predict resonant flow responses in multi-channel fluidic networks as a step towards microfluidic devices controlled by audio tones.
Audio stream classification for multimedia database search
NASA Astrophysics Data System (ADS)
Artese, M.; Bianco, S.; Gagliardi, I.; Gasparini, F.
2013-03-01
Search and retrieval of huge archives of Multimedia data is a challenging task. A classification step is often used to reduce the number of entries on which to perform the subsequent search. In particular, when new entries of the database are continuously added, a fast classification based on simple threshold evaluation is desirable. In this work we present a CART-based (Classification And Regression Tree [1]) classification framework for audio streams belonging to multimedia databases. The database considered is the Archive of Ethnography and Social History (AESS) [2], which is mainly composed of popular songs and other audio records describing the popular traditions handed down generation by generation, such as traditional fairs, and customs. The peculiarities of this database are that it is continuously updated; the audio recordings are acquired in unconstrained environment; and for the non-expert human user is difficult to create the ground truth labels. In our experiments, half of all the available audio files have been randomly extracted and used as training set. The remaining ones have been used as test set. The classifier has been trained to distinguish among three different classes: speech, music, and song. All the audio files in the dataset have been previously manually labeled into the three classes above defined by domain experts.
Selective Attention Modulates the Direction of Audio-Visual Temporal Recalibration
Ikumi, Nara; Soto-Faraco, Salvador
2014-01-01
Temporal recalibration of cross-modal synchrony has been proposed as a mechanism to compensate for timing differences between sensory modalities. However, far from the rich complexity of everyday life sensory environments, most studies to date have examined recalibration on isolated cross-modal pairings. Here, we hypothesize that selective attention might provide an effective filter to help resolve which stimuli are selected when multiple events compete for recalibration. We addressed this question by testing audio-visual recalibration following an adaptation phase where two opposing audio-visual asynchronies were present. The direction of voluntary visual attention, and therefore to one of the two possible asynchronies (flash leading or flash lagging), was manipulated using colour as a selection criterion. We found a shift in the point of subjective audio-visual simultaneity as a function of whether the observer had focused attention to audio-then-flash or to flash-then-audio groupings during the adaptation phase. A baseline adaptation condition revealed that this effect of endogenous attention was only effective toward the lagging flash. This hints at the role of exogenous capture and/or additional endogenous effects producing an asymmetry toward the leading flash. We conclude that selective attention helps promote selected audio-visual pairings to be combined and subsequently adjusted in time but, stimulus organization exerts a strong impact on recalibration. We tentatively hypothesize that the resolution of recalibration in complex scenarios involves the orchestration of top-down selection mechanisms and stimulus-driven processes. PMID:25004132
Selective attention modulates the direction of audio-visual temporal recalibration.
Ikumi, Nara; Soto-Faraco, Salvador
2014-01-01
Temporal recalibration of cross-modal synchrony has been proposed as a mechanism to compensate for timing differences between sensory modalities. However, far from the rich complexity of everyday life sensory environments, most studies to date have examined recalibration on isolated cross-modal pairings. Here, we hypothesize that selective attention might provide an effective filter to help resolve which stimuli are selected when multiple events compete for recalibration. We addressed this question by testing audio-visual recalibration following an adaptation phase where two opposing audio-visual asynchronies were present. The direction of voluntary visual attention, and therefore to one of the two possible asynchronies (flash leading or flash lagging), was manipulated using colour as a selection criterion. We found a shift in the point of subjective audio-visual simultaneity as a function of whether the observer had focused attention to audio-then-flash or to flash-then-audio groupings during the adaptation phase. A baseline adaptation condition revealed that this effect of endogenous attention was only effective toward the lagging flash. This hints at the role of exogenous capture and/or additional endogenous effects producing an asymmetry toward the leading flash. We conclude that selective attention helps promote selected audio-visual pairings to be combined and subsequently adjusted in time but, stimulus organization exerts a strong impact on recalibration. We tentatively hypothesize that the resolution of recalibration in complex scenarios involves the orchestration of top-down selection mechanisms and stimulus-driven processes.
Applications of ENF criterion in forensic audio, video, computer and telecommunication analysis.
Grigoras, Catalin
2007-04-11
This article reports on the electric network frequency criterion as a means of assessing the integrity of digital audio/video evidence and forensic IT and telecommunication analysis. A brief description is given to different ENF types and phenomena that determine ENF variations. In most situations, to reach a non-authenticity opinion, the visual inspection of spectrograms and comparison with an ENF database are enough. A more detailed investigation, in the time domain, requires short time windows measurements and analyses. The stability of the ENF over geographical distances has been established by comparison of synchronized recordings made at different locations on the same network. Real cases are presented, in which the ENF criterion was used to investigate audio and video files created with secret surveillance systems, a digitized audio/video recording and a TV broadcasted reportage. By applying the ENF Criterion in forensic audio/video analysis, one can determine whether and where a digital recording has been edited, establish whether it was made at the time claimed, and identify the time and date of the registering operation.
A first demonstration of audio-frequency optical coherence elastography of tissue
NASA Astrophysics Data System (ADS)
Adie, Steven G.; Alexandrov, Sergey A.; Armstrong, Julian J.; Kennedy, Brendan F.; Sampson, David D.
2008-12-01
Optical elastography is aimed at using the visco-elastic properties of soft tissue as a contrast mechanism, and could be particularly suitable for high-resolution differentiation of tumour from surrounding normal tissue. We present a new approach to measure the effect of an applied stimulus in the kilohertz frequency range that is based on optical coherence tomography. We describe the approach and present the first in vivo optical coherence elastography measurements in human skin at audio excitation frequencies.
Taylor, Terence E; Lacalle Muls, Helena; Costello, Richard W; Reilly, Richard B
2018-01-01
Asthma and chronic obstructive pulmonary disease (COPD) patients are required to inhale forcefully and deeply to receive medication when using a dry powder inhaler (DPI). There is a clinical need to objectively monitor the inhalation flow profile of DPIs in order to remotely monitor patient inhalation technique. Audio-based methods have been previously employed to accurately estimate flow parameters such as the peak inspiratory flow rate of inhalations, however, these methods required multiple calibration inhalation audio recordings. In this study, an audio-based method is presented that accurately estimates inhalation flow profile using only one calibration inhalation audio recording. Twenty healthy participants were asked to perform 15 inhalations through a placebo Ellipta™ DPI at a range of inspiratory flow rates. Inhalation flow signals were recorded using a pneumotachograph spirometer while inhalation audio signals were recorded simultaneously using the Inhaler Compliance Assessment device attached to the inhaler. The acoustic (amplitude) envelope was estimated from each inhalation audio signal. Using only one recording, linear and power law regression models were employed to determine which model best described the relationship between the inhalation acoustic envelope and flow signal. Each model was then employed to estimate the flow signals of the remaining 14 inhalation audio recordings. This process repeated until each of the 15 recordings were employed to calibrate single models while testing on the remaining 14 recordings. It was observed that power law models generated the highest average flow estimation accuracy across all participants (90.89±0.9% for power law models and 76.63±2.38% for linear models). The method also generated sufficient accuracy in estimating inhalation parameters such as peak inspiratory flow rate and inspiratory capacity within the presence of noise. Estimating inhaler inhalation flow profiles using audio based methods may be clinically beneficial for inhaler technique training and the remote monitoring of patient adherence.
Lacalle Muls, Helena; Costello, Richard W.; Reilly, Richard B.
2018-01-01
Asthma and chronic obstructive pulmonary disease (COPD) patients are required to inhale forcefully and deeply to receive medication when using a dry powder inhaler (DPI). There is a clinical need to objectively monitor the inhalation flow profile of DPIs in order to remotely monitor patient inhalation technique. Audio-based methods have been previously employed to accurately estimate flow parameters such as the peak inspiratory flow rate of inhalations, however, these methods required multiple calibration inhalation audio recordings. In this study, an audio-based method is presented that accurately estimates inhalation flow profile using only one calibration inhalation audio recording. Twenty healthy participants were asked to perform 15 inhalations through a placebo Ellipta™ DPI at a range of inspiratory flow rates. Inhalation flow signals were recorded using a pneumotachograph spirometer while inhalation audio signals were recorded simultaneously using the Inhaler Compliance Assessment device attached to the inhaler. The acoustic (amplitude) envelope was estimated from each inhalation audio signal. Using only one recording, linear and power law regression models were employed to determine which model best described the relationship between the inhalation acoustic envelope and flow signal. Each model was then employed to estimate the flow signals of the remaining 14 inhalation audio recordings. This process repeated until each of the 15 recordings were employed to calibrate single models while testing on the remaining 14 recordings. It was observed that power law models generated the highest average flow estimation accuracy across all participants (90.89±0.9% for power law models and 76.63±2.38% for linear models). The method also generated sufficient accuracy in estimating inhalation parameters such as peak inspiratory flow rate and inspiratory capacity within the presence of noise. Estimating inhaler inhalation flow profiles using audio based methods may be clinically beneficial for inhaler technique training and the remote monitoring of patient adherence. PMID:29346430
Empowering file-based radio production through media asset management systems
NASA Astrophysics Data System (ADS)
Muylaert, Bjorn; Beckers, Tom
2006-10-01
In recent years, IT-based production and archiving of media has matured to a level which enables broadcasters to switch over from tape- or CD-based to file-based workflows for the production of their radio and television programs. This technology is essential for the future of broadcasters as it provides the flexibility and speed of execution the customer demands by enabling, among others, concurrent access and production, faster than real-time ingest, edit during ingest, centrally managed annotation and quality preservation of media. In terms of automation of program production, the radio department is the most advanced within the VRT, the Flemish broadcaster. Since a couple of years ago, the radio department has been working with digital equipment and producing its programs mainly on standard IT equipment. Historically, the shift from analogue to digital based production has been a step by step process initiated and coordinated by each radio station separately, resulting in a multitude of tools and metadata collections, some of them developed in-house, lacking integration. To make matters worse, each of those stations adopted a slightly different production methodology. The planned introduction of a company-wide Media Asset Management System allows a coordinated overhaul to a unified production architecture. Benefits include the centralized ingest and annotation of audio material and the uniform, integrated (in terms of IT infrastructure) workflow model. Needless to say, the ingest strategy, metadata management and integration with radio production systems play a major role in the level of success of any improvement effort. This paper presents a data model for audio-specific concepts relevant to radio production. It includes an investigation of ingest techniques and strategies. Cooperation with external, professional production tools is demonstrated through a use-case scenario: the integration of an existing, multi-track editing tool with a commercially available Media Asset Management System. This will enable an uncomplicated production chain, with a recognizable look and feel for all system users, regardless of their affiliated radio station, as well as central retrieval and storage of information and metadata.
Social Network Extraction and Analysis Based on Multimodal Dyadic Interaction
Escalera, Sergio; Baró, Xavier; Vitrià, Jordi; Radeva, Petia; Raducanu, Bogdan
2012-01-01
Social interactions are a very important component in people’s lives. Social network analysis has become a common technique used to model and quantify the properties of social interactions. In this paper, we propose an integrated framework to explore the characteristics of a social network extracted from multimodal dyadic interactions. For our study, we used a set of videos belonging to New York Times’ Blogging Heads opinion blog. The Social Network is represented as an oriented graph, whose directed links are determined by the Influence Model. The links’ weights are a measure of the “influence” a person has over the other. The states of the Influence Model encode automatically extracted audio/visual features from our videos using state-of-the art algorithms. Our results are reported in terms of accuracy of audio/visual data fusion for speaker segmentation and centrality measures used to characterize the extracted social network. PMID:22438733
Rhythmic synchronization tapping to an audio-visual metronome in budgerigars.
Hasegawa, Ai; Okanoya, Kazuo; Hasegawa, Toshikazu; Seki, Yoshimasa
2011-01-01
In all ages and countries, music and dance have constituted a central part in human culture and communication. Recently, vocal-learning animals such as parrots and elephants have been found to share rhythmic ability with humans. Thus, we investigated the rhythmic synchronization of budgerigars, a vocal-mimicking parrot species, under controlled conditions and a systematically designed experimental paradigm as a first step in understanding the evolution of musical entrainment. We trained eight budgerigars to perform isochronous tapping tasks in which they pecked a key to the rhythm of audio-visual metronome-like stimuli. The budgerigars showed evidence of entrainment to external stimuli over a wide range of tempos. They seemed to be inherently inclined to tap at fast tempos, which have a similar time scale to the rhythm of budgerigars' natural vocalizations. We suggest that vocal learning might have contributed to their performance, which resembled that of humans.
Audio Spectrogram Representations for Processing with Convolutional Neural Networks
NASA Astrophysics Data System (ADS)
Wyse, L.
2017-05-01
One of the decisions that arise when designing a neural network for any application is how the data should be represented in order to be presented to, and possibly generated by, a neural network. For audio, the choice is less obvious than it seems to be for visual images, and a variety of representations have been used for different applications including the raw digitized sample stream, hand-crafted features, machine discovered features, MFCCs and variants that include deltas, and a variety of spectral representations. This paper reviews some of these representations and issues that arise, focusing particularly on spectrograms for generating audio using neural networks for style transfer.
High performance MPEG-audio decoder IC
NASA Technical Reports Server (NTRS)
Thorn, M.; Benbassat, G.; Cyr, K.; Li, S.; Gill, M.; Kam, D.; Walker, K.; Look, P.; Eldridge, C.; Ng, P.
1993-01-01
The emerging digital audio and video compression technology brings both an opportunity and a new challenge to IC design. The pervasive application of compression technology to consumer electronics will require high volume, low cost IC's and fast time to market of the prototypes and production units. At the same time, the algorithms used in the compression technology result in complex VLSI IC's. The conflicting challenges of algorithm complexity, low cost, and fast time to market have an impact on device architecture and design methodology. The work presented in this paper is about the design of a dedicated, high precision, Motion Picture Expert Group (MPEG) audio decoder.
Three dimensional audio versus head down TCAS displays
NASA Technical Reports Server (NTRS)
Begault, Durand R.; Pittman, Marc T.
1994-01-01
The advantage of a head up auditory display was evaluated in an experiment designed to measure and compare the acquisition time for capturing visual targets under two conditions: Standard head down traffic collision avoidance system (TCAS) display, and three-dimensional (3-D) audio TCAS presentation. Ten commercial airline crews were tested under full mission simulation conditions at the NASA Ames Crew-Vehicle Systems Research Facility Advanced Concepts Flight Simulator. Scenario software generated targets corresponding to aircraft which activated a 3-D aural advisory or a TCAS advisory. Results showed a significant difference in target acquisition time between the two conditions, favoring the 3-D audio TCAS condition by 500 ms.
Aeronautical audio broadcasting via satellite
NASA Technical Reports Server (NTRS)
Tzeng, Forrest F.
1993-01-01
A system design for aeronautical audio broadcasting, with C-band uplink and L-band downlink, via Inmarsat space segments is presented. Near-transparent-quality compression of 5-kHz bandwidth audio at 20.5 kbit/s is achieved based on a hybrid technique employing linear predictive modeling and transform-domain residual quantization. Concatenated Reed-Solomon/convolutional codes with quadrature phase shift keying are selected for bandwidth and power efficiency. RF bandwidth at 25 kHz per channel, and a decoded bit error rate at 10(exp -6) with E(sub b)/N(sub o) at 3.75 dB are obtained. An interleaver, scrambler, modem synchronization, and frame format were designed, and frequency-division multiple access was selected over code-division multiple access. A link budget computation based on a worst-case scenario indicates sufficient system power margins. Transponder occupancy analysis for 72 audio channels demonstrates ample remaining capacity to accommodate emerging aeronautical services.
Automated Cough Assessment on a Mobile Platform
2014-01-01
The development of an Automated System for Asthma Monitoring (ADAM) is described. This consists of a consumer electronics mobile platform running a custom application. The application acquires an audio signal from an external user-worn microphone connected to the device analog-to-digital converter (microphone input). This signal is processed to determine the presence or absence of cough sounds. Symptom tallies and raw audio waveforms are recorded and made easily accessible for later review by a healthcare provider. The symptom detection algorithm is based upon standard speech recognition and machine learning paradigms and consists of an audio feature extraction step followed by a Hidden Markov Model based Viterbi decoder that has been trained on a large database of audio examples from a variety of subjects. Multiple Hidden Markov Model topologies and orders are studied. Performance of the recognizer is presented in terms of the sensitivity and the rate of false alarm as determined in a cross-validation test. PMID:25506590
Audio visual summary: Implementing PURPA in Mid-America
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
The audio-visual presentation, Implementing PURPA in Mid-America, is a slide presentation designed to complement deliverable W-101-2, a booklet entitled Implementing PURPA in Mid-America: A Guide to the Public Utility Regulatory Policies Act. The presentation lasts 10 to 12 min and explains the major sections of PURPA, the rules promulgated by the Federal Energy Regulatory Commission to implement PURPA, and the implications of PURPA and its rules. It delineates the rights and responsibilities of citizens who want to sell electricity to utilities, explains the certification process, and discusses the rights and responsibilities of the utilities.
Auteur Description: From the Director's Creative Vision to Audio Description
ERIC Educational Resources Information Center
Szarkowska, Agnieszka
2013-01-01
In this report, the author follows the suggestion that a film director's creative vision should be incorporated into Audio description (AD), a major technique for making films, theater performances, operas, and other events accessible to people who are blind or have low vision. The author presents a new type of AD for auteur and artistic films:…
ERIC Educational Resources Information Center
Bogale, Gebeyehu W.; Boer, Henk; Seydel, Erwin R.
2011-01-01
In Ethiopia the level of illiteracy in rural areas is very high. In this study, we investigated the effects of an audio HIV/AIDS prevention intervention targeted at rural illiterate females. In the intervention we used social-oriented presentation formats, such as discussion between similar females and role-play. In a pretest and posttest…
Learning Vocabulary through E-Book Reading of Young Children with Various Reading Abilities
ERIC Educational Resources Information Center
Lee, Sung Hee
2017-01-01
Previous studies revealed that young children learn novel word meanings by simply reading and listening to a printed book. In today's classroom, many children's e-books provide audio narration support so young readers can simply listen to the e-books. The focus of the present study is to examine the effect of e-book reading with audio narration…
ERIC Educational Resources Information Center
Cooper, William
The material presented here is the result of a review of the Technical Development Plan of the National Library of Medicine, made with the object of describing the role of audiovisual materials in medical education, research and service, and particularly in the continuing education of physicians and allied health personnel. A historical background…
Automatic summarization of soccer highlights using audio-visual descriptors.
Raventós, A; Quijada, R; Torres, Luis; Tarrés, Francesc
2015-01-01
Automatic summarization generation of sports video content has been object of great interest for many years. Although semantic descriptions techniques have been proposed, many of the approaches still rely on low-level video descriptors that render quite limited results due to the complexity of the problem and to the low capability of the descriptors to represent semantic content. In this paper, a new approach for automatic highlights summarization generation of soccer videos using audio-visual descriptors is presented. The approach is based on the segmentation of the video sequence into shots that will be further analyzed to determine its relevance and interest. Of special interest in the approach is the use of the audio information that provides additional robustness to the overall performance of the summarization system. For every video shot a set of low and mid level audio-visual descriptors are computed and lately adequately combined in order to obtain different relevance measures based on empirical knowledge rules. The final summary is generated by selecting those shots with highest interest according to the specifications of the user and the results of relevance measures. A variety of results are presented with real soccer video sequences that prove the validity of the approach.
A real-time detector system for precise timing of audiovisual stimuli.
Henelius, Andreas; Jagadeesan, Sharman; Huotilainen, Minna
2012-01-01
The successful recording of neurophysiologic signals, such as event-related potentials (ERPs) or event-related magnetic fields (ERFs), relies on precise information of stimulus presentation times. We have developed an accurate and flexible audiovisual sensor solution operating in real-time for on-line use in both auditory and visual ERP and ERF paradigms. The sensor functions independently of the used audio or video stimulus presentation tools or signal acquisition system. The sensor solution consists of two independent sensors; one for sound and one for light. The microcontroller-based audio sensor incorporates a novel approach to the detection of natural sounds such as multipart audio stimuli, using an adjustable dead time. This aids in producing exact markers for complex auditory stimuli and reduces the number of false detections. The analog photosensor circuit detects changes in light intensity on the screen and produces a marker for changes exceeding a threshold. The microcontroller software for the audio sensor is free and open source, allowing other researchers to customise the sensor for use in specific auditory ERP/ERF paradigms. The hardware schematics and software for the audiovisual sensor are freely available from the webpage of the authors' lab.
Schwartz, Jean-Luc; Savariaux, Christophe
2014-01-01
An increasing number of neuroscience papers capitalize on the assumption published in this journal that visual speech would be typically 150 ms ahead of auditory speech. It happens that the estimation of audiovisual asynchrony in the reference paper is valid only in very specific cases, for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call “preparatory gestures”. However, when syllables are chained in sequences, as they are typically in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call “comodulatory gestures” providing auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction. PMID:25079216
Digital Multicasting of Multiple Audio Streams
NASA Technical Reports Server (NTRS)
Macha, Mitchell; Bullock, John
2007-01-01
The Mission Control Center Voice Over Internet Protocol (MCC VOIP) system (see figure) comprises hardware and software that effect simultaneous, nearly real-time transmission of as many as 14 different audio streams to authorized listeners via the MCC intranet and/or the Internet. The original version of the MCC VOIP system was conceived to enable flight-support personnel located in offices outside a spacecraft mission control center to monitor audio loops within the mission control center. Different versions of the MCC VOIP system could be used for a variety of public and commercial purposes - for example, to enable members of the general public to monitor one or more NASA audio streams through their home computers, to enable air-traffic supervisors to monitor communication between airline pilots and air-traffic controllers in training, and to monitor conferences among brokers in a stock exchange. At the transmitting end, the audio-distribution process begins with feeding the audio signals to analog-to-digital converters. The resulting digital streams are sent through the MCC intranet, using a user datagram protocol (UDP), to a server that converts them to encrypted data packets. The encrypted data packets are then routed to the personal computers of authorized users by use of multicasting techniques. The total data-processing load on the portion of the system upstream of and including the encryption server is the total load imposed by all of the audio streams being encoded, regardless of the number of the listeners or the number of streams being monitored concurrently by the listeners. The personal computer of a user authorized to listen is equipped with special- purpose MCC audio-player software. When the user launches the program, the user is prompted to provide identification and a password. In one of two access- control provisions, the program is hard-coded to validate the user s identity and password against a list maintained on a domain-controller computer at the MCC. In the other access-control provision, the program verifies that the user is authorized to have access to the audio streams. Once both access-control checks are completed, the audio software presents a graphical display that includes audiostream-selection buttons and volume-control sliders. The user can select all or any subset of the available audio streams and can adjust the volume of each stream independently of that of the other streams. The audio-player program spawns a "read" process for the selected stream(s). The spawned process sends, to the router(s), a "multicast-join" request for the selected streams. The router(s) responds to the request by sending the encrypted multicast packets to the spawned process. The spawned process receives the encrypted multicast packets and sends a decryption packet to audio-driver software. As the volume or muting features are changed by the user, interrupts are sent to the spawned process to change the corresponding attributes sent to the audio-driver software. The total latency of this system - that is, the total time from the origination of the audio signals to generation of sound at a listener s computer - lies between four and six seconds.
Audio frequency in vivo optical coherence elastography
NASA Astrophysics Data System (ADS)
Adie, Steven G.; Kennedy, Brendan F.; Armstrong, Julian J.; Alexandrov, Sergey A.; Sampson, David D.
2009-05-01
We present a new approach to optical coherence elastography (OCE), which probes the local elastic properties of tissue by using optical coherence tomography to measure the effect of an applied stimulus in the audio frequency range. We describe the approach, based on analysis of the Bessel frequency spectrum of the interferometric signal detected from scatterers undergoing periodic motion in response to an applied stimulus. We present quantitative results of sub-micron excitation at 820 Hz in a layered phantom and the first such measurements in human skin in vivo.
Active noise control for infant incubators.
Yu, Xun; Gujjula, Shruthi; Kuo, Sen M
2009-01-01
This paper presents an active noise control system for infant incubators. Experimental results show that global noise reduction can be achieved for infant incubator ANC systems. An audio-integration algorithm is presented to introduce a healthy audio (intrauterine) sound with the ANC system to mask the residual noise and soothe the infant. Carbon nanotube based transparent thin film speaker is also introduced in this paper as the actuator for the ANC system to generate the destructive secondary sound, which can significantly save the congested incubator space and without blocking the view of doctors and nurses.
ERIC Educational Resources Information Center
Cooke, Nancy L.; Mackiewicz, Sara Moore; Wood, Charles L.; Helf, Shawnna
2009-01-01
Parents with Limited English Proficiency (LEP) may find it difficult to become involved in their children's education due to their lack of English proficiency. The present study examined the effects of using audio prompting to assist mothers with LEP in teaching their preschool children English vocabulary. Mothers were trained to tutor their…
ERIC Educational Resources Information Center
ManTech Technical Services Corp., Fairfax, VA.
This report presents the results of a management study of audio playback equipment operations conducted by the National Library Service, Library of Congress, its associated network of state and local machine lending agencies (MLA), and other parties that play a role in current operations. The objectives were to document current operations,…
Language Teaching with the Help of Multiple Methods. Collection d'"Etudes linguistiques," No. 21.
ERIC Educational Resources Information Center
Nivette, Jos, Ed.
This book presents articles on language teaching media. Among the titles are: (1) "Il Foreign Language Teaching e l'impiego degli audio-visivi" (Foreign Language Teaching and the Use of Audio Visual Methods) by D'Agostino, (2) "Le role et la nature de l'image dans l'enseignement programme de l'anglais, langue seconde" (The Role and Nature of the…
ERIC Educational Resources Information Center
Avance, Lyonel D.; Carr, Dorothy B.
Presented is the final report of a project to develop and field test audio and visual media to accompany developmentally sequenced activities appropriate for a physical education program for handicapped children from preschool through high school. Brief sections cover the following: the purposes and accomplishments of the project; the population…
Design of a video teleconference facility for a synchronous satellite communications link
NASA Technical Reports Server (NTRS)
Richardson, M. D.
1979-01-01
The system requirements, design tradeoffs, and final design of a video teleconference facility are discussed, including proper lighting, graphics transmission, and picture aesthetics. Methods currently accepted in the television broadcast industry are used in the design. The unique problems associated with using an audio channel with a synchronous satellite communications link are discussed, and a final audio system design is presented.
Objective Assessment of Patient Inhaler User Technique Using an Audio-Based Classification Approach.
Taylor, Terence E; Zigel, Yaniv; Egan, Clarice; Hughes, Fintan; Costello, Richard W; Reilly, Richard B
2018-02-01
Many patients make critical user technique errors when using pressurised metered dose inhalers (pMDIs) which reduce the clinical efficacy of respiratory medication. Such critical errors include poor actuation coordination (poor timing of medication release during inhalation) and inhaling too fast (peak inspiratory flow rate over 90 L/min). Here, we present a novel audio-based method that objectively assesses patient pMDI user technique. The Inhaler Compliance Assessment device was employed to record inhaler audio signals from 62 respiratory patients as they used a pMDI with an In-Check Flo-Tone device attached to the inhaler mouthpiece. Using a quadratic discriminant analysis approach, the audio-based method generated a total frame-by-frame accuracy of 88.2% in classifying sound events (actuation, inhalation and exhalation). The audio-based method estimated the peak inspiratory flow rate and volume of inhalations with an accuracy of 88.2% and 83.94% respectively. It was detected that 89% of patients made at least one critical user technique error even after tuition from an expert clinical reviewer. This method provides a more clinically accurate assessment of patient inhaler user technique than standard checklist methods.
Luque, Joaquín; Larios, Diego F; Personal, Enrique; Barbancho, Julio; León, Carlos
2016-05-18
Environmental audio monitoring is a huge area of interest for biologists all over the world. This is why some audio monitoring system have been proposed in the literature, which can be classified into two different approaches: acquirement and compression of all audio patterns in order to send them as raw data to a main server; or specific recognition systems based on audio patterns. The first approach presents the drawback of a high amount of information to be stored in a main server. Moreover, this information requires a considerable amount of effort to be analyzed. The second approach has the drawback of its lack of scalability when new patterns need to be detected. To overcome these limitations, this paper proposes an environmental Wireless Acoustic Sensor Network architecture focused on use of generic descriptors based on an MPEG-7 standard. These descriptors demonstrate it to be suitable to be used in the recognition of different patterns, allowing a high scalability. The proposed parameters have been tested to recognize different behaviors of two anuran species that live in Spanish natural parks; the Epidalea calamita and the Alytes obstetricans toads, demonstrating to have a high classification performance.
Apparatus for providing sensory substitution of force feedback
NASA Technical Reports Server (NTRS)
Massimino, Michael J. (Inventor); Sheridan, Thomas B. (Inventor)
1995-01-01
A feedback apparatus for an operator to control an effector that is remote from the operator to interact with a remote environment has a local input device to be manipulated by the operator. Sensors in the effector's environment are capable of sensing the amplitude of forces arising between the effector and its environment, the direction of application of such forces, or both amplitude and direction. A feedback signal corresponding to such a component of the force, is generated and transmitted to the environment of the operator. The signal is transduced into an auditory sensory substitution signal to which the operator is sensitive. Sound production apparatus present the auditory signal to the operator. The full range of the force amplitude may be represented by a single, audio speaker. Auditory display elements may be stereo headphones or free standing audio speakers, numbering from one to many more than two. The location of the application of the force may also be specified by the location of audio speakers that generate signals corresponding to specific forces. Alternatively, the location may be specified by the frequency of an audio signal, or by the apparent location of an audio signal, as simulated by a combination of signals originating at different locations.
Luque, Joaquín; Larios, Diego F.; Personal, Enrique; Barbancho, Julio; León, Carlos
2016-01-01
Environmental audio monitoring is a huge area of interest for biologists all over the world. This is why some audio monitoring system have been proposed in the literature, which can be classified into two different approaches: acquirement and compression of all audio patterns in order to send them as raw data to a main server; or specific recognition systems based on audio patterns. The first approach presents the drawback of a high amount of information to be stored in a main server. Moreover, this information requires a considerable amount of effort to be analyzed. The second approach has the drawback of its lack of scalability when new patterns need to be detected. To overcome these limitations, this paper proposes an environmental Wireless Acoustic Sensor Network architecture focused on use of generic descriptors based on an MPEG-7 standard. These descriptors demonstrate it to be suitable to be used in the recognition of different patterns, allowing a high scalability. The proposed parameters have been tested to recognize different behaviors of two anuran species that live in Spanish natural parks; the Epidalea calamita and the Alytes obstetricans toads, demonstrating to have a high classification performance. PMID:27213375
Characterization of HF Propagation for Digital Audio Broadcasting
NASA Technical Reports Server (NTRS)
Vaisnys, Arvydas
1997-01-01
The purpose of this presentation is to give a brief overview of some propagation measurements in the Short Wave (3-30 MHz) bands, made in support of a digital audio transmission system design for the Voice of America. This task is a follow on to the Digital Broadcast Satellite Radio task, during which several mitigation techniques would be applicable to digital audio in the Short Wave bands as well, in spite of the differences in propagation impairments in these two bands. Two series of propagation measurements were made to quantify the range of impairments that could be expected. An assessment of the performance of a prototype version of the receiver was also made.
Magnetic Resonance and Spectroscopy of the Human Brain in Gulf War Illness
2005-08-01
relationship between GWI and stress . Acoustic startle is a hallmark feature of PTSD . Past studies have shown that PTSD subjects have an increased startle...brain, neuro- psychological testing, audio vestibular testing, PTSD 16. SECURITY CLASSIFICATION OF: 17. LIMITATION 18. NUMBER 19a. NAME OF...such as PTSD , depression, or alcohol abuse. 2) Reduced NAA in the basal ganglia and pons correlates with central nervous system signs and symptoms of
Gautam, Anjali; Bhambal, Ajay; Moghe, Swapnil
2018-01-01
Children with special needs face unique challenges in day-to-day practice. They are dependent on their close ones for everything. To improve oral hygiene in such visually impaired children, undue training and education are required. Braille is an important language for reading and writing for the visually impaired. It helps them understand and visualize the world via touch. Audio aids are being used to impart health education to the visually impaired. Tactile models help them perceive things which they cannot visualize and hence are an important learning tool. This study aimed to assess the improvement in oral hygiene by audio aids and Braille and tactile models in visually impaired children aged 6-16 years of Bhopal city. This was a prospective study. Sixty visually impaired children aged 6-16 years were selected and randomly divided into three groups (20 children each). Group A: audio aids + Braille, Group B: audio aids + tactile models, and Group C: audio aids + Braille + tactile models. Instructions were given for maintaining good oral hygiene and brushing techniques were explained to all children. After 3 months' time, the oral hygiene status was recorded and compared using plaque and gingival index. ANNOVA test was used. The present study showed a decrease in the mean plaque and gingival scores at all time intervals in individual group as compared to that of the baseline that was statistically significant. The study depicts that the combination of audio aids, Braille and tactile models is an effective way to provide oral health education and improve oral health status of visually impaired children.
ERIC Educational Resources Information Center
Dow, James
Ways in which computers and audio tape recorder techniques were used to record, index, and present data collected during two summers of field work in a rural area of Mexico are described. The research goal was to study the Otomi Indian shamans. Two computers were used: the Honeywell 6800 DPS-2 and the Osborne-1 microcomputer. The database system…
Report on Distance Learning Technologies.
1995-09-01
26 cities. The CSX system includes full-motion video, animations , audio, and interactive examples and testing to teach the use of a new computer...video. The change to all-digital media now permits the use of full-motion video, animation , and audio on networks. It is possible to have independent...is possible to download entire multimedia presentations from the network. To date there is not a great deal known about teaching courses using the
Aural Communication in Aviation.
1981-06-01
of standards. f. Audio Warnings and Controls Voice versus tone warnings. Design of highly descriminative audio warnings. Optimum number of warnings to...EIGHT TABLE 1 Experimental Procedure The present studies were designed so that each subject served as his/her own control , i.e., each subject... controller is experienced and the message is unexpected, and especially if one or both of them are non -native speakers of English. This should be taken
Quantitative Information Differences Between Object-Person Presentation Methods
ERIC Educational Resources Information Center
Boyd, J. Edwin; Perry, Raymond P.
1972-01-01
Subjects used significantly more adjectives, on an adjective checklist (ACL), in giving their impressions of an object-person; based on written and audiovisual presentations, more than audio presentations. (SD)
Wavelet-based audio embedding and audio/video compression
NASA Astrophysics Data System (ADS)
Mendenhall, Michael J.; Claypoole, Roger L., Jr.
2001-12-01
Watermarking, traditionally used for copyright protection, is used in a new and exciting way. An efficient wavelet-based watermarking technique embeds audio information into a video signal. Several effective compression techniques are applied to compress the resulting audio/video signal in an embedded fashion. This wavelet-based compression algorithm incorporates bit-plane coding, index coding, and Huffman coding. To demonstrate the potential of this audio embedding and audio/video compression algorithm, we embed an audio signal into a video signal and then compress. Results show that overall compression rates of 15:1 can be achieved. The video signal is reconstructed with a median PSNR of nearly 33 dB. Finally, the audio signal is extracted from the compressed audio/video signal without error.
Three-Dimensional Audio Client Library
NASA Technical Reports Server (NTRS)
Rizzi, Stephen A.
2005-01-01
The Three-Dimensional Audio Client Library (3DAudio library) is a group of software routines written to facilitate development of both stand-alone (audio only) and immersive virtual-reality application programs that utilize three-dimensional audio displays. The library is intended to enable the development of three-dimensional audio client application programs by use of a code base common to multiple audio server computers. The 3DAudio library calls vendor-specific audio client libraries and currently supports the AuSIM Gold-Server and Lake Huron audio servers. 3DAudio library routines contain common functions for (1) initiation and termination of a client/audio server session, (2) configuration-file input, (3) positioning functions, (4) coordinate transformations, (5) audio transport functions, (6) rendering functions, (7) debugging functions, and (8) event-list-sequencing functions. The 3DAudio software is written in the C++ programming language and currently operates under the Linux, IRIX, and Windows operating systems.
The VTLA System of Course Delivery and Faculty Development in Materials Education
NASA Technical Reports Server (NTRS)
Berrettini, Robert; Roy, Rustum
1996-01-01
There is a national need for high-quality, upper division courses that address critical topics in materials synthesis, particularly those beyond the present expertise of the typical university department's faculty. A new project has been started to test a novel distance education and faculty development system, called Video Tape Live Audio (VTLA). This, if successful, would at once enlarge the national Materials Science and Engineering (MSE) student cohort studying material synthesis and develop faculty expertise at the receiving sites. The mechanics for the VTLA scheme are as follows: A course is designed in the field selected for emphasis and for which there is likely to be considerable demand, in this example 'Ceramic Materials Synthesis: Theory and Case Studies'. One of the very best researcher/teachers records lectures of TV studio quality with appropriate visuals. Universities and colleges which wish to offer the course agree to offer it at the same hour at least once a week. The videotaped lectures and accompanying text, readings and visuals are shipped to the professor in charge, who has an appropriate background. The professor arranges the classroom TV presentation equipment and supervises the course. Video lectures are played during regular course hours twice a week with time for discussion by the supervising professor. Typically the third weekly classroom period is scheduled by all sites at a common designated hour, during which the course author/presenter answers questions, provides greater depth, etc. on a live audio link to all course sites. Questions are submitted by fax and e-mail prior to the audio tutorial. coordinating professors at various sites have separate audio teleconferences at the beginning and end of the course, dealing with the philosophical and pedagogical approach to the course, content and mechanics. Following service once or twice as an 'apprentice' to the course, the coordinating professors may then offer it without the necessity of the live audio tutorial.
Audio-Visual Perception System for a Humanoid Robotic Head
Viciana-Abad, Raquel; Marfil, Rebeca; Perez-Lorenzo, Jose M.; Bandera, Juan P.; Romero-Garces, Adrian; Reche-Lopez, Pedro
2014-01-01
One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can be equipped. Besides, within the scope of interactive autonomous robots, there is a lack in terms of evaluating the benefits of audio-visual attention mechanisms, compared to only audio or visual approaches, in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with a Bayes inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared via considering the technical limitations of unimodal systems. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework. PMID:24878593
Grigsby, Timothy J; Unger, Jennifer B; Molina, Gregory B; Baron, Mel
2017-01-01
Dementia is a clinical syndrome characterized by progressive degeneration in cognitive ability that limits the capacity for independent living. Interventions are needed to target the medical, social, psychological, and knowledge needs of caregivers and patients. This study used a mixed methods approach to evaluate the effectiveness of a dementia novela presented in an audio-visual format in improving dementia attitudes, beliefs and knowledge. Adults from Los Angeles (N = 42, 83% female, 90% Hispanic/Latino, mean age = 42.2 years, 41.5% with less than a high school education) viewed an audio-visual novela on dementia. Participants completed surveys immediately before and after viewing the material. The novela produced significant improvements in overall knowledge (t(41) = -9.79, p < .0001) and led to positive increases in specific attitudes toward people with dementia but not in beliefs that screening would be beneficial. Qualitative results provided concordant and discordant evidence for the quantitative findings. Results indicate that an audio-visual novela can be useful for improving attitudes and knowledge about dementia, but further work is needed to investigate the relation with health disparities in screening and treatment behaviors. Audio visual novelas are an innovative format for health education and change attitudes and knowledge about dementia.
Digital signal processing techniques for pitch shifting and time scaling of audio signals
NASA Astrophysics Data System (ADS)
Buś, Szymon; Jedrzejewski, Konrad
2016-09-01
In this paper, we present the techniques used for modifying the spectral content (pitch shifting) and for changing the time duration (time scaling) of an audio signal. A short introduction gives a necessary background for understanding the discussed issues and contains explanations of the terms used in the paper. In subsequent sections we present three different techniques appropriate both for pitch shifting and for time scaling. These techniques use three different time-frequency representations of a signal, namely short-time Fourier transform (STFT), continuous wavelet transform (CWT) and constant-Q transform (CQT). The results of simulation studies devoted to comparison of the properties of these methods are presented and discussed in the paper.
Bayesian networks and information theory for audio-visual perception modeling.
Besson, Patricia; Richiardi, Jonas; Bourdin, Christophe; Bringoux, Lionel; Mestre, Daniel R; Vercher, Jean-Louis
2010-09-01
Thanks to their different senses, human observers acquire multiple information coming from their environment. Complex cross-modal interactions occur during this perceptual process. This article proposes a framework to analyze and model these interactions through a rigorous and systematic data-driven process. This requires considering the general relationships between the physical events or factors involved in the process, not only in quantitative terms, but also in term of the influence of one factor on another. We use tools from information theory and probabilistic reasoning to derive relationships between the random variables of interest, where the central notion is that of conditional independence. Using mutual information analysis to guide the model elicitation process, a probabilistic causal model encoded as a Bayesian network is obtained. We exemplify the method by using data collected in an audio-visual localization task for human subjects, and we show that it yields a well-motivated model with good predictive ability. The model elicitation process offers new prospects for the investigation of the cognitive mechanisms of multisensory perception.
NASA Astrophysics Data System (ADS)
Tekesin-Cankurtaranlar, Ozge; Tuysuz, Okan; Riza Kilic, Ali
2017-04-01
In this study, we present the results of Magnetotelluric (MT) and Audio-magnetotelluric (AMT) soundings over a potential geothermal field. Study area is located in the northeasternmost part of the Alasehir (or Gediz) Graben, Western Anatolia, which is delimited by NW-SE trending fault systems and is filled by Miocene to Recent sediments. Study area is also very close to the Kula Quaternary volcanic region, a possible geothermal heat source for the region, last eruption of which was 12.000 years ago. Relatively thin crust, high heat flow values and intense tectonic activity of the Western Anatolia possibly refers to the high geothermal potential. In fact, along the southern and central part of the graben there are many productive areas reaching up to 300 degrees Celsius. By this motivation, to determine the geothermal potential of the study area MT and AMT measurements had been carried out on a total of 45 stations covering about 8 km2 area. All profiles shows higher resistivity values (>140 ohm.m) at greater depths, possibly indicating a metamorphic basement covered by Miocene to Recent sediments. This metamorphic basement gets shallower towards the North where the geothermally weathered schists and marbles crop out. Furthermore, a normal fault interface between metamorphic basement and Neogene sediments shows high resistivity contrast. Results indicate that the metamorphic basement is a less conductive block located at a depth of 1500 - 2000 m at the south and gets shallower towards the north as normal fault blocks.
Long-term memory biases auditory spatial attention.
Zimmermann, Jacqueline F; Moscovitch, Morris; Alain, Claude
2017-10-01
Long-term memory (LTM) has been shown to bias attention to a previously learned visual target location. Here, we examined whether memory-predicted spatial location can facilitate the detection of a faint pure tone target embedded in real world audio clips (e.g., soundtrack of a restaurant). During an initial familiarization task, participants heard audio clips, some of which included a lateralized target (p = 50%). On each trial participants indicated whether the target was presented from the left, right, or was absent. Following a 1 hr retention interval, participants were presented with the same audio clips, which now all included a target. In Experiment 1, participants showed memory-based gains in response time and d'. Experiment 2 showed that temporal expectations modulate attention, with greater memory-guided attention effects on performance when temporal context was reinstated from learning (i.e., when timing of the target within audio clips was not changed from initially learned timing). Experiment 3 showed that while conscious recall of target locations was modulated by exposure to target-context associations during learning (i.e., better recall with higher number of learning blocks), the influence of LTM associations on spatial attention was not reduced (i.e., number of learning blocks did not affect memory-guided attention). Both Experiments 2 and 3 showed gains in performance related to target-context associations, even for associations that were not explicitly remembered. Together, these findings indicate that memory for audio clips is acquired quickly and is surprisingly robust; both implicit and explicit LTM for the location of a faint target tone modulated auditory spatial attention. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Ad Hoc Selection of Voice over Internet Streams
NASA Technical Reports Server (NTRS)
Macha, Mitchell G. (Inventor); Bullock, John T. (Inventor)
2014-01-01
A method and apparatus for a communication system technique involving ad hoc selection of at least two audio streams is provided. Each of the at least two audio streams is a packetized version of an audio source. A data connection exists between a server and a client where a transport protocol actively propagates the at least two audio streams from the server to the client. Furthermore, software instructions executable on the client indicate a presence of the at least two audio streams, allow selection of at least one of the at least two audio streams, and direct the selected at least one of the at least two audio streams for audio playback.
Ad Hoc Selection of Voice over Internet Streams
NASA Technical Reports Server (NTRS)
Macha, Mitchell G. (Inventor); Bullock, John T. (Inventor)
2008-01-01
A method and apparatus for a communication system technique involving ad hoc selection of at least two audio streams is provided. Each of the at least two audio streams is a packetized version of an audio source. A data connection exists between a server and a client where a transport protocol actively propagates the at least two audio streams from the server to the client. Furthermore, software instructions executable on the client indicate a presence of the at least two audio streams, allow selection of at least one of the at least two audio streams, and direct the selected at least one of the at least two audio streams for audio playback.
Audio in Courseware: Design Knowledge Issues.
ERIC Educational Resources Information Center
Aarntzen, Diana
1993-01-01
Considers issues that need to be addressed when incorporating audio in courseware design. Topics discussed include functions of audio in courseware; the relationship between auditive and visual information; learner characteristics in relation to audio; events of instruction; and audio characteristics, including interactivity and speech technology.…
A Virtual Audio Guidance and Alert System for Commercial Aircraft Operations
NASA Technical Reports Server (NTRS)
Begault, Durand R.; Wenzel, Elizabeth M.; Shrum, Richard; Miller, Joel; Null, Cynthia H. (Technical Monitor)
1996-01-01
Our work in virtual reality systems at NASA Ames Research Center includes the area of aurally-guided visual search, using specially-designed audio cues and spatial audio processing (also known as virtual or "3-D audio") techniques (Begault, 1994). Previous studies at Ames had revealed that use of 3-D audio for Traffic Collision Avoidance System (TCAS) advisories significantly reduced head-down time, compared to a head-down map display (0.5 sec advantage) or no display at all (2.2 sec advantage) (Begault, 1993, 1995; Begault & Pittman, 1994; see Wenzel, 1994, for an audio demo). Since the crew must keep their head up and looking out the window as much as possible when taxiing under low-visibility conditions, and the potential for "blunder" is increased under such conditions, it was sensible to evaluate the audio spatial cueing for a prototype audio ground collision avoidance warning (GCAW) system, and a 3-D audio guidance system. Results were favorable for GCAW, but not for the audio guidance system.
The priming function of in-car audio instruction.
Keyes, Helen; Whitmore, Antony; Naneva, Stanislava; McDermott, Daragh
2018-05-01
Studies to date have focused on the priming power of visual road signs, but not the priming potential of audio road scene instruction. Here, the relative priming power of visual, audio, and multisensory road scene instructions was assessed. In a lab-based study, participants responded to target road scene turns following visual, audio, or multisensory road turn primes which were congruent or incongruent to the primes in direction, or control primes. All types of instruction (visual, audio, and multisensory) were successful in priming responses to a road scene. Responses to multisensory-primed targets (both audio and visual) were faster than responses to either audio or visual primes alone. Incongruent audio primes did not affect performance negatively in the manner of incongruent visual or multisensory primes. Results suggest that audio instructions have the potential to prime drivers to respond quickly and safely to their road environment. Peak performance will be observed if audio and visual road instruction primes can be timed to co-occur.
Robust audio-visual speech recognition under noisy audio-video conditions.
Stewart, Darryl; Seymour, Rowan; Pass, Adrian; Ming, Ji
2014-02-01
This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments, where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements and can be used alongside many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances with corruption added in either/both the video and audio streams using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach and also compared to any fixed-weighted integration approach in both clean conditions or when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
Audio-visual interactions in environment assessment.
Preis, Anna; Kociński, Jędrzej; Hafke-Dys, Honorata; Wrzosek, Małgorzata
2015-08-01
The aim of the study was to examine how visual and audio information influences audio-visual environment assessment. Original audio-visual recordings were made at seven different places in the city of Poznań. Participants of the psychophysical experiments were asked to rate, on a numerical standardized scale, the degree of comfort they would feel if they were in such an environment. The assessments of audio-visual comfort were carried out in a laboratory in four different conditions: (a) audio samples only, (b) original audio-visual samples, (c) video samples only, and (d) mixed audio-visual samples. The general results of this experiment showed a significant difference between the investigated conditions, but not for all the investigated samples. There was a significant improvement in comfort assessment when visual information was added (in only three out of 7 cases), when conditions (a) and (b) were compared. On the other hand, the results show that the comfort assessment of audio-visual samples could be changed by manipulating the audio rather than the video part of the audio-visual sample. Finally, it seems, that people could differentiate audio-visual representations of a given place in the environment based rather of on the sound sources' compositions than on the sound level. Object identification is responsible for both landscape and soundscape grouping. Copyright © 2015. Published by Elsevier B.V.
Content-based audio authentication using a hierarchical patchwork watermark embedding
NASA Astrophysics Data System (ADS)
Gulbis, Michael; Müller, Erika
2010-05-01
Content-based audio authentication watermarking techniques extract perceptual relevant audio features, which are robustly embedded into the audio file to protect. Manipulations of the audio file are detected on the basis of changes between the original embedded feature information and the anew extracted features during verification. The main challenges of content-based watermarking are on the one hand the identification of a suitable audio feature to distinguish between content preserving and malicious manipulations. On the other hand the development of a watermark, which is robust against content preserving modifications and able to carry the whole authentication information. The payload requirements are significantly higher compared to transaction watermarking or copyright protection. Finally, the watermark embedding should not influence the feature extraction to avoid false alarms. Current systems still lack a sufficient alignment of watermarking algorithm and feature extraction. In previous work we developed a content-based audio authentication watermarking approach. The feature is based on changes in DCT domain over time. A patchwork algorithm based watermark was used to embed multiple one bit watermarks. The embedding process uses the feature domain without inflicting distortions to the feature. The watermark payload is limited by the feature extraction, more precisely the critical bands. The payload is inverse proportional to segment duration of the audio file segmentation. Transparency behavior was analyzed in dependence of segment size and thus the watermark payload. At a segment duration of about 20 ms the transparency shows an optimum (measured in units of Objective Difference Grade). Transparency and/or robustness are fast decreased for working points beyond this area. Therefore, these working points are unsuitable to gain further payload, needed for the embedding of the whole authentication information. In this paper we present a hierarchical extension of the watermark method to overcome the limitations given by the feature extraction. The approach is a recursive application of the patchwork algorithm onto its own patches, with a modified patch selection to ensure a better signal to noise ratio for the watermark embedding. The robustness evaluation was done by compression (mp3, ogg, aac), normalization, and several attacks of the stirmark benchmark for audio suite. Compared on the base of same payload and transparency the hierarchical approach shows improved robustness.
Evaluating the Use of Auditory Systems to Improve Performance in Combat Search and Rescue
2012-03-01
take advantage of human binaural hearing to present spatial information through auditory stimuli as it would occur in the real world. This allows the...multiple operators unambiguously and in a short amount of time. Spatial audio basics Spatial audio works with human binaural hearing to generate... binaural recordings “sound better” when heard in the same location where the recordings were made. While this appears to be related to the acoustic
47 CFR 73.403 - Digital audio broadcasting service requirements.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 47 Telecommunication 4 2012-10-01 2012-10-01 false Digital audio broadcasting service requirements... SERVICES RADIO BROADCAST SERVICES Digital Audio Broadcasting § 73.403 Digital audio broadcasting service requirements. (a) Broadcast radio stations using IBOC must transmit at least one over-the-air digital audio...
47 CFR 73.403 - Digital audio broadcasting service requirements.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 47 Telecommunication 4 2011-10-01 2011-10-01 false Digital audio broadcasting service requirements... SERVICES RADIO BROADCAST SERVICES Digital Audio Broadcasting § 73.403 Digital audio broadcasting service requirements. (a) Broadcast radio stations using IBOC must transmit at least one over-the-air digital audio...
47 CFR 73.403 - Digital audio broadcasting service requirements.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 47 Telecommunication 4 2014-10-01 2014-10-01 false Digital audio broadcasting service requirements... SERVICES RADIO BROADCAST SERVICES Digital Audio Broadcasting § 73.403 Digital audio broadcasting service requirements. (a) Broadcast radio stations using IBOC must transmit at least one over-the-air digital audio...
47 CFR 73.403 - Digital audio broadcasting service requirements.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 47 Telecommunication 4 2013-10-01 2013-10-01 false Digital audio broadcasting service requirements... SERVICES RADIO BROADCAST SERVICES Digital Audio Broadcasting § 73.403 Digital audio broadcasting service requirements. (a) Broadcast radio stations using IBOC must transmit at least one over-the-air digital audio...
Topper, Nicholas C.; Burke, S.N.; Maurer, A.P.
2014-01-01
BACKGROUND Current methods for aligning neurophysiology and video data are either prepackaged, requiring the additional purchase of a software suite, or use a blinking LED with a stationary pulse-width and frequency. These methods lack significant user interface for adaptation, are expensive, or risk a misalignment of the two data streams. NEW METHOD A cost-effective means to obtain high-precision alignment of behavioral and neurophysiological data is obtained by generating an audio-pulse embedded with two domains of information, a low-frequency binary-counting signal and a high, randomly changing frequency. This enabled the derivation of temporal information while maintaining enough entropy in the system for algorithmic alignment. RESULTS The sample to frame index constructed using the audio input correlation method described in this paper enables video and data acquisition to be aligned at a sub-frame level of precision. COMPARISONS WITH EXISTING METHOD Traditionally, a synchrony pulse is recorded on-screen via a flashing diode. The higher sampling rate of the audio input of the camcorder enables the timing of an event to be detected with greater precision. CONCLUSIONS While On-line analysis and synchronization using specialized equipment may be the ideal situation in some cases, the method presented in the current paper presents a viable, low cost alternative, and gives the flexibility to interface with custom off-line analysis tools. Moreover, the ease of constructing and implements this set-up presented in the current paper makes it applicable to a wide variety of applications that require video recording. PMID:25256648
Topper, Nicholas C; Burke, Sara N; Maurer, Andrew Porter
2014-12-30
Current methods for aligning neurophysiology and video data are either prepackaged, requiring the additional purchase of a software suite, or use a blinking LED with a stationary pulse-width and frequency. These methods lack significant user interface for adaptation, are expensive, or risk a misalignment of the two data streams. A cost-effective means to obtain high-precision alignment of behavioral and neurophysiological data is obtained by generating an audio-pulse embedded with two domains of information, a low-frequency binary-counting signal and a high, randomly changing frequency. This enabled the derivation of temporal information while maintaining enough entropy in the system for algorithmic alignment. The sample to frame index constructed using the audio input correlation method described in this paper enables video and data acquisition to be aligned at a sub-frame level of precision. Traditionally, a synchrony pulse is recorded on-screen via a flashing diode. The higher sampling rate of the audio input of the camcorder enables the timing of an event to be detected with greater precision. While on-line analysis and synchronization using specialized equipment may be the ideal situation in some cases, the method presented in the current paper presents a viable, low cost alternative, and gives the flexibility to interface with custom off-line analysis tools. Moreover, the ease of constructing and implements this set-up presented in the current paper makes it applicable to a wide variety of applications that require video recording. Copyright © 2014 Elsevier B.V. All rights reserved.
Desantis, Andrea; Haggard, Patrick
2016-01-01
To maintain a temporally-unified representation of audio and visual features of objects in our environment, the brain recalibrates audio-visual simultaneity. This process allows adjustment for both differences in time of transmission and time for processing of audio and visual signals. In four experiments, we show that the cognitive processes for controlling instrumental actions also have strong influence on audio-visual recalibration. Participants learned that right and left hand button-presses each produced a specific audio-visual stimulus. Following one action the audio preceded the visual stimulus, while for the other action audio lagged vision. In a subsequent test phase, left and right button-press generated either the same audio-visual stimulus as learned initially, or the pair associated with the other action. We observed recalibration of simultaneity only for previously-learned audio-visual outcomes. Thus, learning an action-outcome relation promotes temporal grouping of the audio and visual events within the outcome pair, contributing to the creation of a temporally unified multisensory object. This suggests that learning action-outcome relations and the prediction of perceptual outcomes can provide an integrative temporal structure for our experiences of external events. PMID:27982063
Desantis, Andrea; Haggard, Patrick
2016-12-16
To maintain a temporally-unified representation of audio and visual features of objects in our environment, the brain recalibrates audio-visual simultaneity. This process allows adjustment for both differences in time of transmission and time for processing of audio and visual signals. In four experiments, we show that the cognitive processes for controlling instrumental actions also have strong influence on audio-visual recalibration. Participants learned that right and left hand button-presses each produced a specific audio-visual stimulus. Following one action the audio preceded the visual stimulus, while for the other action audio lagged vision. In a subsequent test phase, left and right button-press generated either the same audio-visual stimulus as learned initially, or the pair associated with the other action. We observed recalibration of simultaneity only for previously-learned audio-visual outcomes. Thus, learning an action-outcome relation promotes temporal grouping of the audio and visual events within the outcome pair, contributing to the creation of a temporally unified multisensory object. This suggests that learning action-outcome relations and the prediction of perceptual outcomes can provide an integrative temporal structure for our experiences of external events.
Audio visual speech source separation via improved context dependent association model
NASA Astrophysics Data System (ADS)
Kazemi, Alireza; Boostani, Reza; Sobhanmanesh, Fariborz
2014-12-01
In this paper, we exploit the non-linear relation between a speech source and its associated lip video as a source of extra information to propose an improved audio-visual speech source separation (AVSS) algorithm. The audio-visual association is modeled using a neural associator which estimates the visual lip parameters from a temporal context of acoustic observation frames. We define an objective function based on mean square error (MSE) measure between estimated and target visual parameters. This function is minimized for estimation of the de-mixing vector/filters to separate the relevant source from linear instantaneous or time-domain convolutive mixtures. We have also proposed a hybrid criterion which uses AV coherency together with kurtosis as a non-Gaussianity measure. Experimental results are presented and compared in terms of visually relevant speech detection accuracy and output signal-to-interference ratio (SIR) of source separation. The suggested audio-visual model significantly improves relevant speech classification accuracy compared to existing GMM-based model and the proposed AVSS algorithm improves the speech separation quality compared to reference ICA- and AVSS-based methods.
Stropahl, Maren; Schellhardt, Sebastian; Debener, Stefan
2017-06-01
The concurrent presentation of different auditory and visual syllables may result in the perception of a third syllable, reflecting an illusory fusion of visual and auditory information. This well-known McGurk effect is frequently used for the study of audio-visual integration. Recently, it was shown that the McGurk effect is strongly stimulus-dependent, which complicates comparisons across perceivers and inferences across studies. To overcome this limitation, we developed the freely available Oldenburg audio-visual speech stimuli (OLAVS), consisting of 8 different talkers and 12 different syllable combinations. The quality of the OLAVS set was evaluated with 24 normal-hearing subjects. All 96 stimuli were characterized based on their stimulus disparity, which was obtained from a probabilistic model (cf. Magnotti & Beauchamp, 2015). Moreover, the McGurk effect was studied in eight adult cochlear implant (CI) users. By applying the individual, stimulus-independent parameters of the probabilistic model, the predicted effect of stronger audio-visual integration in CI users could be confirmed, demonstrating the validity of the new stimulus material.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shull, D.
This report documents the initial feasibility tests performed using a commercial acoustic emission instrument for the purpose of detecting beetles in Department of Energy 9975 shipping packages. The device selected for this testing was a commercial handheld instrument and probe developed for the detection of termites, weevils, beetles and other insect infestations in wooden structures, trees, plants and soil. The results of two rounds of testing are presented. The first tests were performed by the vendor using only the hand-held instrument’s indications and real-time operator analysis of the audio signal content. The second tests included hands-free positioning of the instrumentmore » probe and post-collection analysis of the recorded audio signal content including audio background comparisons. The test results indicate that the system is promising for detecting the presence of drugstore beetles, however, additional work would be needed to improve the ease of detection and to automate the signal processing to eliminate the need for human interpretation. Mechanisms for hands-free positioning of the probe and audio background discrimination are also necessary for reliable detection and to reduce potential operator dose in radiation environments.« less
The effect of context and audio-visual modality on emotions elicited by a musical performance
Coutinho, Eduardo; Scherer, Klaus R.
2016-01-01
In this work, we compared emotions induced by the same performance of Schubert Lieder during a live concert and in a laboratory viewing/listening setting to determine the extent to which laboratory research on affective reactions to music approximates real listening conditions in dedicated performances. We measured emotions experienced by volunteer members of an audience that attended a Lieder recital in a church (Context 1) and emotional reactions to an audio-video-recording of the same performance in a university lecture hall (Context 2). Three groups of participants were exposed to three presentation versions in Context 2: (1) an audio-visual recording, (2) an audio-only recording, and (3) a video-only recording. Participants achieved statistically higher levels of emotional convergence in the live performance than in the laboratory context, and the experience of particular emotions was determined by complex interactions between auditory and visual cues in the performance. This study demonstrates the contribution of the performance setting and the performers’ appearance and nonverbal expression to emotion induction by music, encouraging further systematic research into the factors involved. PMID:28781419
Reconfigurable Auditory-Visual Display
NASA Technical Reports Server (NTRS)
Begault, Durand R. (Inventor); Anderson, Mark R. (Inventor); McClain, Bryan (Inventor); Miller, Joel D. (Inventor)
2008-01-01
System and method for visual and audible communication between a central operator and N mobile communicators (N greater than or equal to 2), including an operator transceiver and interface, configured to receive and display, for the operator, visually perceptible and audibly perceptible signals from each of the mobile communicators. The interface (1) presents an audible signal from each communicator as if the audible signal is received from a different location relative to the operator and (2) allows the operator to select, to assign priority to, and to display, the visual signals and the audible signals received from a specified communicator. Each communicator has an associated signal transmitter that is configured to transmit at least one of the visual signals and the audio signal associated with the communicator, where at least one of the signal transmitters includes at least one sensor that senses and transmits a sensor value representing a selected environmental or physiological parameter associated with the communicator.
A sLORETA study for gaze-independent BCI speller.
Xingwei An; Jinwen Wei; Shuang Liu; Dong Ming
2017-07-01
EEG-based BCI (brain-computer-interface) speller, especially gaze-independent BCI speller, has become a hot topic in recent years. It provides direct spelling device by non-muscular method for people with severe motor impairments and with limited gaze movement. Brain needs to conduct both stimuli-driven and stimuli-related attention in fast presented BCI paradigms for such BCI speller applications. Few researchers studied the mechanism of brain response to such fast presented BCI applications. In this study, we compared the distribution of brain activation in visual, auditory, and audio-visual combined stimuli paradigms using sLORETA (standardized low-resolution brain electromagnetic tomography). Between groups comparisons showed the importance of visual and auditory stimuli in audio-visual combined paradigm. They both contribute to the activation of brain regions, with visual stimuli being the predominate stimuli. Visual stimuli related brain region was mainly located at parietal and occipital lobe, whereas response in frontal-temporal lobes might be caused by auditory stimuli. These regions played an important role in audio-visual bimodal paradigms. These new findings are important for future study of ERP speller as well as the mechanism of fast presented stimuli.
The power of digital audio in interactive instruction: An unexploited medium
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pratt, J.; Trainor, M.
1989-01-01
Widespread use of audio in computer-based training (CBT) occurred with the advent of the interactive videodisc technology. This paper discusses the alternative of digital audio, which, unlike videodisc audio, enables one to rapidly revise the audio used in the CBT and which may be used in nonvideo CBT applications as well. We also discuss techniques used in audio script writing, editing, and production. Results from evaluations indicate a high degree of user satisfaction. 4 refs.
47 CFR 11.51 - EAS code and Attention Signal Transmission requirements.
Code of Federal Regulations, 2012 CFR
2012-10-01
... Message (EOM) codes using the EAS Protocol. The Attention Signal must precede any emergency audio message... audio messages. No Attention Signal is required for EAS messages that do not contain audio programming... EAS messages in the main audio channel. All DAB stations shall also transmit EAS messages on all audio...
47 CFR 11.51 - EAS code and Attention Signal Transmission requirements.
Code of Federal Regulations, 2014 CFR
2014-10-01
... Message (EOM) codes using the EAS Protocol. The Attention Signal must precede any emergency audio message... audio messages. No Attention Signal is required for EAS messages that do not contain audio programming... EAS messages in the main audio channel. All DAB stations shall also transmit EAS messages on all audio...
47 CFR 11.51 - EAS code and Attention Signal Transmission requirements.
Code of Federal Regulations, 2013 CFR
2013-10-01
... Message (EOM) codes using the EAS Protocol. The Attention Signal must precede any emergency audio message... audio messages. No Attention Signal is required for EAS messages that do not contain audio programming... EAS messages in the main audio channel. All DAB stations shall also transmit EAS messages on all audio...
Communicative Competence in Audio Classrooms: A Position Paper for the CADE 1991 Conference.
ERIC Educational Resources Information Center
Burge, Liz
Classroom practitioners need to move their attention away from the technological and logistical competencies required for audio conferencing (AC) to the required communicative competencies in order to advance their skills in handling the psychodynamics of audio virtual classrooms which include audio alone and audio with graphics. While the…
Wardman, M J; Yorke, V C; Hallam, J L
2018-05-01
Feedback is an essential part of the learning process, and students expect their feedback to be personalised, meaningful and timely. Objective Structured Clinical Examination (OSCE) assessments allow examiners to observe students carefully over the course of a number of varied station types, across a number of clinical knowledge and skill domains. They therefore present an ideal opportunity to record detailed feedback which allows students to reflect on and improve their performance. This article outlines two methods by which OSCE feedback was collected and then disseminated to undergraduate dental students across 2-year groups in a UK dental school: (i) Individual written feedback comments made by examiners during the examination, (ii) General audio feedback recorded by groups of examiners immediately following the examination. Evaluation of the feedback was sought from students and staff examiners. A multi-methods approach utilising Likert questionnaire items (quantitative) and open-ended feedback questions (qualitative) was used. Data analysis explored student and staff perceptions of the audio and written feedback. A total of 131 students (response rate 68%) and 52 staff examiners (response rate 83%) completed questionnaires. Quantitative data analysis showed that the written and audio formats were reported as a meaningful source of feedback for learning by both students (93% written, 89% audio) and staff (96% written, 92% audio). Qualitative data revealed the complementary nature of both types of feedback. Written feedback gives specific, individual information whilst audio shares general observations and allows students to learn from others. The advantages, limitations and challenges of the feedback methods are discussed, leading to the development of an informed set of implementation guidelines. Written and audio feedback methods are valued by students and staff. It is proposed that these may be very easily applied to OSCEs running in other dental schools. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
SNR-adaptive stream weighting for audio-MES ASR.
Lee, Ki-Seung
2008-08-01
Myoelectric signals (MESs) from the speaker's mouth region have been successfully shown to improve the noise robustness of automatic speech recognizers (ASRs), thus promising to extend their usability in implementing noise-robust ASR. In the recognition system presented herein, extracted audio and facial MES features were integrated by a decision fusion method, where the likelihood score of the audio-MES observation vector was given by a linear combination of class-conditional observation log-likelihoods of two classifiers, using appropriate weights. We developed a weighting process adaptive to SNRs. The main objective of the paper involves determining the optimal SNR classification boundaries and constructing a set of optimum stream weights for each SNR class. These two parameters were determined by a method based on a maximum mutual information criterion. Acoustic and facial MES data were collected from five subjects, using a 60-word vocabulary. Four types of acoustic noise including babble, car, aircraft, and white noise were acoustically added to clean speech signals with SNR ranging from -14 to 31 dB. The classification accuracy of the audio ASR was as low as 25.5%. Whereas, the classification accuracy of the MES ASR was 85.2%. The classification accuracy could be further improved by employing the proposed audio-MES weighting method, which was as high as 89.4% in the case of babble noise. A similar result was also found for the other types of noise.
Grouping and Segregation of Sensory Events by Actions in Temporal Audio-Visual Recalibration.
Ikumi, Nara; Soto-Faraco, Salvador
2016-01-01
Perception in multi-sensory environments involves both grouping and segregation of events across sensory modalities. Temporal coincidence between events is considered a strong cue to resolve multisensory perception. However, differences in physical transmission and neural processing times amongst modalities complicate this picture. This is illustrated by cross-modal recalibration, whereby adaptation to audio-visual asynchrony produces shifts in perceived simultaneity. Here, we examined whether voluntary actions might serve as a temporal anchor to cross-modal recalibration in time. Participants were tested on an audio-visual simultaneity judgment task after an adaptation phase where they had to synchronize voluntary actions with audio-visual pairs presented at a fixed asynchrony (vision leading or vision lagging). Our analysis focused on the magnitude of cross-modal recalibration to the adapted audio-visual asynchrony as a function of the nature of the actions during adaptation, putatively fostering cross-modal grouping or, segregation. We found larger temporal adjustments when actions promoted grouping than segregation of sensory events. However, a control experiment suggested that additional factors, such as attention to planning/execution of actions, could have an impact on recalibration effects. Contrary to the view that cross-modal temporal organization is mainly driven by external factors related to the stimulus or environment, our findings add supporting evidence for the idea that perceptual adjustments strongly depend on the observer's inner states induced by motor and cognitive demands.
Grouping and Segregation of Sensory Events by Actions in Temporal Audio-Visual Recalibration
Ikumi, Nara; Soto-Faraco, Salvador
2017-01-01
Perception in multi-sensory environments involves both grouping and segregation of events across sensory modalities. Temporal coincidence between events is considered a strong cue to resolve multisensory perception. However, differences in physical transmission and neural processing times amongst modalities complicate this picture. This is illustrated by cross-modal recalibration, whereby adaptation to audio-visual asynchrony produces shifts in perceived simultaneity. Here, we examined whether voluntary actions might serve as a temporal anchor to cross-modal recalibration in time. Participants were tested on an audio-visual simultaneity judgment task after an adaptation phase where they had to synchronize voluntary actions with audio-visual pairs presented at a fixed asynchrony (vision leading or vision lagging). Our analysis focused on the magnitude of cross-modal recalibration to the adapted audio-visual asynchrony as a function of the nature of the actions during adaptation, putatively fostering cross-modal grouping or, segregation. We found larger temporal adjustments when actions promoted grouping than segregation of sensory events. However, a control experiment suggested that additional factors, such as attention to planning/execution of actions, could have an impact on recalibration effects. Contrary to the view that cross-modal temporal organization is mainly driven by external factors related to the stimulus or environment, our findings add supporting evidence for the idea that perceptual adjustments strongly depend on the observer's inner states induced by motor and cognitive demands. PMID:28154529
2004-11-01
this paper we describe the systems developed by MITLL and used in DARPA EARS Rich Transcription Fall 2004 (RT-04F) speaker diarization evaluation...many types of audio sources, the focus if the DARPA EARS project and the NIST Rich Transcription evaluations is primarily speaker diarization ...present or samples of any of the speakers . An overview of the general diarization problem and approaches can be found in [1]. In this paper, we
The effects of 5.1 sound presentations on the perception of stereoscopic imagery in video games
NASA Astrophysics Data System (ADS)
Cullen, Brian; Galperin, Daniel; Collins, Karen; Hogue, Andrew; Kapralos, Bill
2013-03-01
Stereoscopic 3D (S3D) content in games, film and other audio-visual media has been steadily increasing over the past number of years. However, there are still open, fundamental questions regarding its implementation, particularly as it relates to a multi-modal experience that involves sound and haptics. Research has shown that sound has considerable impact on our perception of 2D phenomena, but very little research has considered how sound may influence stereoscopic 3D. Here we present the results of an experiment that examined the effects of 5.1 surround sound (5.1) and stereo loudspeaker setups on depth perception in relation to S3D imagery within a video game environment. Our aim was to answer the question: "can 5.1 surround sound enhance the participant's perception of depth in the stereoscopic field when compared to traditional stereo sound presentations?" In addition, our study examined how the presence or absence of Doppler frequency shift and frequency fall-off audio effects can also influence depth judgment under these conditions. Results suggest that 5.1 surround sound presentations enhance the apparent depth of stereoscopic imagery when compared to stereo presentations. Results also suggest that the addition of audio effects such as Doppler shift and frequency fall-off filters can influence the apparent depth of S3D objects.
47 CFR 73.322 - FM stereophonic sound transmission standards.
Code of Federal Regulations, 2014 CFR
2014-10-01
... transmission, modulation of the carrier by audio components within the baseband range of 50 Hz to 15 kHz shall... the carrier by audio components within the audio baseband range of 23 kHz to 99 kHz shall not exceed... method described in (a), must limit the modulation of the carrier by audio components within the audio...
47 CFR 73.322 - FM stereophonic sound transmission standards.
Code of Federal Regulations, 2013 CFR
2013-10-01
... transmission, modulation of the carrier by audio components within the baseband range of 50 Hz to 15 kHz shall... the carrier by audio components within the audio baseband range of 23 kHz to 99 kHz shall not exceed... method described in (a), must limit the modulation of the carrier by audio components within the audio...
47 CFR 73.322 - FM stereophonic sound transmission standards.
Code of Federal Regulations, 2011 CFR
2011-10-01
... transmission, modulation of the carrier by audio components within the baseband range of 50 Hz to 15 kHz shall... the carrier by audio components within the audio baseband range of 23 kHz to 99 kHz shall not exceed... method described in (a), must limit the modulation of the carrier by audio components within the audio...
47 CFR 73.322 - FM stereophonic sound transmission standards.
Code of Federal Regulations, 2012 CFR
2012-10-01
... transmission, modulation of the carrier by audio components within the baseband range of 50 Hz to 15 kHz shall... the carrier by audio components within the audio baseband range of 23 kHz to 99 kHz shall not exceed... method described in (a), must limit the modulation of the carrier by audio components within the audio...
Comparing Audio and Video Data for Rating Communication
Williams, Kristine; Herman, Ruth; Bontempo, Daniel
2013-01-01
Video recording has become increasingly popular in nursing research, adding rich nonverbal, contextual, and behavioral information. However, benefits of video over audio data have not been well established. We compared communication ratings of audio versus video data using the Emotional Tone Rating Scale. Twenty raters watched video clips of nursing care and rated staff communication on 12 descriptors that reflect dimensions of person-centered and controlling communication. Another group rated audio-only versions of the same clips. Interrater consistency was high within each group with ICC (2,1) for audio = .91, and video = .94. Interrater consistency for both groups combined was also high with ICC (2,1) for audio and video = .95. Communication ratings using audio and video data were highly correlated. The value of video being superior to audio recorded data should be evaluated in designing studies evaluating nursing care. PMID:23579475
Enhanced audio-visual interactions in the auditory cortex of elderly cochlear-implant users.
Schierholz, Irina; Finke, Mareike; Schulte, Svenja; Hauthal, Nadine; Kantzke, Christoph; Rach, Stefan; Büchner, Andreas; Dengler, Reinhard; Sandmann, Pascale
2015-10-01
Auditory deprivation and the restoration of hearing via a cochlear implant (CI) can induce functional plasticity in auditory cortical areas. How these plastic changes affect the ability to integrate combined auditory (A) and visual (V) information is not yet well understood. In the present study, we used electroencephalography (EEG) to examine whether age, temporary deafness and altered sensory experience with a CI can affect audio-visual (AV) interactions in post-lingually deafened CI users. Young and elderly CI users and age-matched NH listeners performed a speeded response task on basic auditory, visual and audio-visual stimuli. Regarding the behavioral results, a redundant signals effect, that is, faster response times to cross-modal (AV) than to both of the two modality-specific stimuli (A, V), was revealed for all groups of participants. Moreover, in all four groups, we found evidence for audio-visual integration. Regarding event-related responses (ERPs), we observed a more pronounced visual modulation of the cortical auditory response at N1 latency (approximately 100 ms after stimulus onset) in the elderly CI users when compared with young CI users and elderly NH listeners. Thus, elderly CI users showed enhanced audio-visual binding which may be a consequence of compensatory strategies developed due to temporary deafness and/or degraded sensory input after implantation. These results indicate that the combination of aging, sensory deprivation and CI facilitates the coupling between the auditory and the visual modality. We suggest that this enhancement in multisensory interactions could be used to optimize auditory rehabilitation, especially in elderly CI users, by the application of strong audio-visually based rehabilitation strategies after implant switch-on. Copyright © 2015 Elsevier B.V. All rights reserved.
Semantic Indexing of Multimedia Content Using Visual, Audio, and Text Cues
NASA Astrophysics Data System (ADS)
Adams, W. H.; Iyengar, Giridharan; Lin, Ching-Yung; Naphade, Milind Ramesh; Neti, Chalapathy; Nock, Harriet J.; Smith, John R.
2003-12-01
We present a learning-based approach to the semantic indexing of multimedia content using cues derived from audio, visual, and text features. We approach the problem by developing a set of statistical models for a predefined lexicon. Novel concepts are then mapped in terms of the concepts in the lexicon. To achieve robust detection of concepts, we exploit features from multiple modalities, namely, audio, video, and text. Concept representations are modeled using Gaussian mixture models (GMM), hidden Markov models (HMM), and support vector machines (SVM). Models such as Bayesian networks and SVMs are used in a late-fusion approach to model concepts that are not explicitly modeled in terms of features. Our experiments indicate promise in the proposed classification and fusion methodologies: our proposed fusion scheme achieves more than 10% relative improvement over the best unimodal concept detector.
Video conference quality assessment based on cooperative sensing of video and audio
NASA Astrophysics Data System (ADS)
Wang, Junxi; Chen, Jialin; Tian, Xin; Zhou, Cheng; Zhou, Zheng; Ye, Lu
2015-12-01
This paper presents a method to video conference quality assessment, which is based on cooperative sensing of video and audio. In this method, a proposed video quality evaluation method is used to assess the video frame quality. The video frame is divided into noise image and filtered image by the bilateral filters. It is similar to the characteristic of human visual, which could also be seen as a low-pass filtering. The audio frames are evaluated by the PEAQ algorithm. The two results are integrated to evaluate the video conference quality. A video conference database is built to test the performance of the proposed method. It could be found that the objective results correlate well with MOS. Then we can conclude that the proposed method is efficiency in assessing video conference quality.
NASA Astrophysics Data System (ADS)
Pallone, Arthur
Necessity often leads to inspiration. Such was the case when a traditional amplifier quit working during the collection of an alpha particle spectrum. I had a 15 battery-powered audio amplifier in my box of project electronics so I connected it between the preamplifier and the multichannel analyzer. The alpha particle spectrum that appeared on the computer screen matched expectations even without correcting for impedance mismatches. Encouraged by this outcome, I have begun to systematically replace each of the parts in a traditional charged particle spectrometer with audio and video components available through consumer electronics stores with the goal of producing an inexpensive charged particle spectrometer for use in education and research. Hopefully my successes, setbacks, and results to date described in this presentation will inform and inspire others.
Neuromorphic audio-visual sensor fusion on a sound-localizing robot.
Chan, Vincent Yue-Sek; Jin, Craig T; van Schaik, André
2012-01-01
This paper presents the first robotic system featuring audio-visual (AV) sensor fusion with neuromorphic sensors. We combine a pair of silicon cochleae and a silicon retina on a robotic platform to allow the robot to learn sound localization through self motion and visual feedback, using an adaptive ITD-based sound localization algorithm. After training, the robot can localize sound sources (white or pink noise) in a reverberant environment with an RMS error of 4-5° in azimuth. We also investigate the AV source binding problem and an experiment is conducted to test the effectiveness of matching an audio event with a corresponding visual event based on their onset time. Despite the simplicity of this method and a large number of false visual events in the background, a correct match can be made 75% of the time during the experiment.
Lefebvre, M
1979-01-01
The present information production techniques are so inefficient that it is out of the question to generalize them. On the other hand audio-visual communication raises a major political problem, especially for developing countries. Audio-visual equipment has gone through adjustment phases; the example of the tape and cassette recorder is given: 2 technological improvements have completely modified its use; the transistors have allowed considerable reduction in volume and weight as well as the energy necessary; the invention of the cassette has simplified its use. Technological research is following 3 major directions: the production of equipment which consumes little energy; the improvement of electronic component production techniques (towards cheaper electronic components); finally, the designing of systems allowing to stock large quantities of information. The communication systems will probably make so much progress in the areas of technology and programming, that they will soon have very different uses than the present ones. The question is whether our civilizations will let themselves be dominated by these new systems, or whether they will succeed to turn them into progress tools.
NASA Astrophysics Data System (ADS)
Linder, C. A.; Wilbert, M.; Holmes, R. M.
2010-12-01
Multimedia video presentations, which integrate still photographs with video clips, audio interviews, ambient sounds, and music, are an effective and engaging way to tell science stories. In July 2009, Linder joined professors and undergraduates on an expedition to the Kolyma River in northeastern Siberia. This IPY science project, called The Polaris Project (http://www.thepolarisproject.org), is an undergraduate research experience where students and faculty work together to increase our understanding of climate change impacts, including thawing permafrost, in this remote corner of the world. During the summer field season, Linder conducted dozens of interviews, captured over 20,000 still photographs and hours of ambient audio and video clips. Following the 2009 expedition, Linder blended this massive archive of visual and audio information into a 10-minute overview video and five student vignettes. In 2010, Linder again traveled to Siberia as part of the Polaris Project, this time mentoring an environmental journalism student who will lead the production of a video about the 2010 field season. Using examples from the Polaris productions, we will present tips, tools, and techniques for creating compelling multimedia science stories.
Information-Driven Active Audio-Visual Source Localization
Schult, Niclas; Reineking, Thomas; Kluss, Thorsten; Zetzsche, Christoph
2015-01-01
We present a system for sensorimotor audio-visual source localization on a mobile robot. We utilize a particle filter for the combination of audio-visual information and for the temporal integration of consecutive measurements. Although the system only measures the current direction of the source, the position of the source can be estimated because the robot is able to move and can therefore obtain measurements from different directions. These actions by the robot successively reduce uncertainty about the source’s position. An information gain mechanism is used for selecting the most informative actions in order to minimize the number of actions required to achieve accurate and precise position estimates in azimuth and distance. We show that this mechanism is an efficient solution to the action selection problem for source localization, and that it is able to produce precise position estimates despite simplified unisensory preprocessing. Because of the robot’s mobility, this approach is suitable for use in complex and cluttered environments. We present qualitative and quantitative results of the system’s performance and discuss possible areas of application. PMID:26327619
Extraterrestrial sound for planetaria: A pedagogical study.
Leighton, T G; Banda, N; Berges, B; Joseph, P F; White, P R
2016-08-01
The purpose of this project was to supply an acoustical simulation device to a local planetarium for use in live shows aimed at engaging and inspiring children in science and engineering. The device plays audio simulations of estimates of the sounds produced by natural phenomena to accompany audio-visual presentations and live shows about Venus, Mars, and Titan. Amongst the simulated noise are the sounds of thunder, wind, and cryo-volcanoes. The device can also modify the speech of the presenter (or audience member) in accordance with the underlying physics to reproduce those vocalizations as if they had been produced on the world under discussion. Given that no time series recordings exist of sounds from other worlds, these sounds had to be simulated. The goal was to ensure that the audio simulations were delivered in time for a planetarium's launch show to enable the requested outreach to children. The exercise has also allowed an explanation of the science and engineering behind the creation of the sounds. This has been achieved for young children, and also for older students and undergraduates, who could then debate the limitations of that method.
Energy and Power Spectra of Thunder in the Magdalena Mountains, Central New Mexico
NASA Astrophysics Data System (ADS)
Johnson, R. L.; Johnson, J. B.; Arechiga, R. O.; Michnovicz, J. C.; Edens, H. E.; Rison, W.
2011-12-01
Thunder is generated primarily by heating and expansion of the atmosphere around a lightning channel and by charge relaxation within a cloud. Broadband acoustic studies are important for inferring dynamic charge behavior during and after lightning events. During the Summer monsoon seasons of 2009-2011, we deployed networks of 3-5 stations consisting of broadband (0.01 to 500 Hz) acoustic arrays and audio microphones in the Magdalena Mountains in central New Mexico. We utilize Lightning Mapping Array (LMA) data for accurate timing of lightning events within a 10 km radius of our network. Unlike the LMA, which detects VHF signals from breakdown processes, thunder signals may be used to observe charge dynamics and thermal shocking of the atmosphere. Previous investigations show that thunder spectral content may distinguish between electrostatic and thermal heating processes. We collected extensive datasets in terms of number of independent broadband sensors (up to 20), number of observed flashes (hundreds from multiple storms), and available coincident LMA data. We use infrasound and audio data to quantify total acoustic energy produced at lightning sources in various frequency bands. We attribute the spectral content and intensity of thunder signals to source characteristics, sensor locations, propagation effects, and noise. We observe variations in acoustic energy for both entire storm systems and individual lightning flashes. We propose that some variations may be related to the type of lightning flash and that spectral content is important for distinguishing between thunder generation mechanisms.
Predicting the Overall Spatial Quality of Automotive Audio Systems
NASA Astrophysics Data System (ADS)
Koya, Daisuke
The spatial quality of automotive audio systems is often compromised due to their unideal listening environments. Automotive audio systems need to be developed quickly due to industry demands. A suitable perceptual model could evaluate the spatial quality of automotive audio systems with similar reliability to formal listening tests but take less time. Such a model is developed in this research project by adapting an existing model of spatial quality for automotive audio use. The requirements for the adaptation were investigated in a literature review. A perceptual model called QESTRAL was reviewed, which predicts the overall spatial quality of domestic multichannel audio systems. It was determined that automotive audio systems are likely to be impaired in terms of the spatial attributes that were not considered in developing the QESTRAL model, but metrics are available that might predict these attributes. To establish whether the QESTRAL model in its current form can accurately predict the overall spatial quality of automotive audio systems, MUSHRA listening tests using headphone auralisation with head tracking were conducted to collect results to be compared against predictions by the model. Based on guideline criteria, the model in its current form could not accurately predict the overall spatial quality of automotive audio systems. To improve prediction performance, the QESTRAL model was recalibrated and modified using existing metrics of the model, those that were proposed from the literature review, and newly developed metrics. The most important metrics for predicting the overall spatial quality of automotive audio systems included those that were interaural cross-correlation (IACC) based, relate to localisation of the frontal audio scene, and account for the perceived scene width in front of the listener. Modifying the model for automotive audio systems did not invalidate its use for domestic audio systems. The resulting model predicts the overall spatial quality of 2- and 5-channel automotive audio systems with a cross-validation performance of R. 2 = 0.85 and root-mean-squareerror (RMSE) = 11.03%.
Exploring the Implementation of Steganography Protocols on Quantum Audio Signals
NASA Astrophysics Data System (ADS)
Chen, Kehan; Yan, Fei; Iliyasu, Abdullah M.; Zhao, Jianping
2018-02-01
Two quantum audio steganography (QAS) protocols are proposed, each of which manipulates or modifies the least significant qubit (LSQb) of the host quantum audio signal that is encoded as an FRQA (flexible representation of quantum audio) audio content. The first protocol (i.e. the conventional LSQb QAS protocol or simply the cLSQ stego protocol) is built on the exchanges between qubits encoding the quantum audio message and the LSQb of the amplitude information in the host quantum audio samples. In the second protocol, the embedding procedure to realize it implants information from a quantum audio message deep into the constraint-imposed most significant qubit (MSQb) of the host quantum audio samples, we refer to it as the pseudo MSQb QAS protocol or simply the pMSQ stego protocol. The cLSQ stego protocol is designed to guarantee high imperceptibility between the host quantum audio and its stego version, whereas the pMSQ stego protocol ensures that the resulting stego quantum audio signal is better immune to illicit tampering and copyright violations (a.k.a. robustness). Built on the circuit model of quantum computation, the circuit networks to execute the embedding and extraction algorithms of both QAS protocols are determined and simulation-based experiments are conducted to demonstrate their implementation. Outcomes attest that both protocols offer promising trade-offs in terms of imperceptibility and robustness.
Comparing audio and video data for rating communication.
Williams, Kristine; Herman, Ruth; Bontempo, Daniel
2013-09-01
Video recording has become increasingly popular in nursing research, adding rich nonverbal, contextual, and behavioral information. However, benefits of video over audio data have not been well established. We compared communication ratings of audio versus video data using the Emotional Tone Rating Scale. Twenty raters watched video clips of nursing care and rated staff communication on 12 descriptors that reflect dimensions of person-centered and controlling communication. Another group rated audio-only versions of the same clips. Interrater consistency was high within each group with Interclass Correlation Coefficient (ICC) (2,1) for audio .91, and video = .94. Interrater consistency for both groups combined was also high with ICC (2,1) for audio and video = .95. Communication ratings using audio and video data were highly correlated. The value of video being superior to audio-recorded data should be evaluated in designing studies evaluating nursing care.
ten Oever, Sanne; Sack, Alexander T.; Wheat, Katherine L.; Bien, Nina; van Atteveldt, Nienke
2013-01-01
Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal window of integration (TWI). This suggests that temporal cues denote whether unimodal stimuli belong together, that is, whether they should be integrated. It is not known whether temporal cues also provide information about the identity of a syllable. Since spoken syllables have naturally varying AV onset asynchronies, we hypothesize that for suboptimal AV cues presented within the TWI, information about the natural AV onset differences can aid in speech identification. To test this, we presented low-intensity auditory syllables concurrently with visual speech signals, and varied the stimulus onset asynchronies (SOA) of the AV pair, while participants were instructed to identify the auditory syllables. We revealed that specific speech features (e.g., voicing) were identified by relying primarily on one modality (e.g., auditory). Additionally, we showed a wide window in which visual information influenced auditory perception, that seemed even wider for congruent stimulus pairs. Finally, we found a specific response pattern across the SOA range for syllables that were not reliably identified by the unimodal cues, which we explained as the result of the use of natural onset differences between AV speech signals. This indicates that temporal cues not only provide information about the temporal integration of AV stimuli, but additionally convey information about the identity of AV pairs. These results provide a detailed behavioral basis for further neuro-imaging and stimulation studies to unravel the neurofunctional mechanisms of the audio-visual-temporal interplay within speech perception. PMID:23805110
Ten Oever, Sanne; Sack, Alexander T; Wheat, Katherine L; Bien, Nina; van Atteveldt, Nienke
2013-01-01
Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal window of integration (TWI). This suggests that temporal cues denote whether unimodal stimuli belong together, that is, whether they should be integrated. It is not known whether temporal cues also provide information about the identity of a syllable. Since spoken syllables have naturally varying AV onset asynchronies, we hypothesize that for suboptimal AV cues presented within the TWI, information about the natural AV onset differences can aid in speech identification. To test this, we presented low-intensity auditory syllables concurrently with visual speech signals, and varied the stimulus onset asynchronies (SOA) of the AV pair, while participants were instructed to identify the auditory syllables. We revealed that specific speech features (e.g., voicing) were identified by relying primarily on one modality (e.g., auditory). Additionally, we showed a wide window in which visual information influenced auditory perception, that seemed even wider for congruent stimulus pairs. Finally, we found a specific response pattern across the SOA range for syllables that were not reliably identified by the unimodal cues, which we explained as the result of the use of natural onset differences between AV speech signals. This indicates that temporal cues not only provide information about the temporal integration of AV stimuli, but additionally convey information about the identity of AV pairs. These results provide a detailed behavioral basis for further neuro-imaging and stimulation studies to unravel the neurofunctional mechanisms of the audio-visual-temporal interplay within speech perception.
Wofford, J L; Currin, D; Michielutte, R; Wofford, M M
2001-04-20
Inadequate reading literacy is a major barrier to better educating patients. Despite its high prevalence, practical solutions for detecting and overcoming low literacy in a busy clinical setting remain elusive. In exploring the potential role for the multimedia computer in improving office-based patient education, we compared the accuracy of information captured from audio-computer interviewing of patients with that obtained from subsequent verbal questioning. Adult medicine clinic, urban community health center Convenience sample of patients awaiting clinic appointments (n = 59). Exclusion criteria included obvious psychoneurologic impairment or primary language other than English. A multimedia computer presentation that used audio-computer interviewing with localized imagery and voices to elicit responses to 4 questions on prior computer use and cancer risk perceptions. Three patients refused or were unable to interact with the computer at all, and 3 patients required restarting the presentation from the beginning but ultimately completed the computerized survey. Of the 51 evaluable patients (72.5% African-American, 66.7% female, mean age 47.5 [+/- 18.1]), the mean time in the computer presentation was significantly longer with older age and with no prior computer use but did not differ by gender or race. Despite a high proportion of no prior computer use (60.8%), there was a high rate of agreement (88.7% overall) between audio-computer interviewing and subsequent verbal questioning. Audio-computer interviewing is feasible in this urban community health center. The computer offers a partial solution for overcoming literacy barriers inherent in written patient education materials and provides an efficient means of data collection that can be used to better target patients' educational needs.
Recognition and characterization of unstructured environmental sounds
NASA Astrophysics Data System (ADS)
Chu, Selina
2011-12-01
Environmental sounds are what we hear everyday, or more generally sounds that surround us ambient or background audio. Humans utilize both vision and hearing to respond to their surroundings, a capability still quite limited in machine processing. The first step toward achieving multimodal input applications is the ability to process unstructured audio and recognize audio scenes (or environments). Such ability would have applications in content analysis and mining of multimedia data or improving robustness in context aware applications through multi-modality, such as in assistive robotics, surveillances, or mobile device-based services. The goal of this thesis is on the characterization of unstructured environmental sounds for understanding and predicting the context surrounding of an agent or device. Most research on audio recognition has focused primarily on speech and music. Less attention has been paid to the challenges and opportunities for using audio to characterize unstructured audio. My research focuses on investigating challenging issues in characterizing unstructured environmental audio and to develop novel algorithms for modeling the variations of the environment. The first step in building a recognition system for unstructured auditory environment was to investigate on techniques and audio features for working with such audio data. We begin by performing a study that explore suitable features and the feasibility of designing an automatic environment recognition system using audio information. In my initial investigation to explore the feasibility of designing an automatic environment recognition system using audio information, I have found that traditional recognition and feature extraction for audio were not suitable for environmental sound, as they lack any type of structures, unlike those of speech and music which contain formantic and harmonic structures, thus dispelling the notion that traditional speech and music recognition techniques can simply be used for realistic environmental sound. Natural unstructured environment sounds contain a large variety of sounds, which are in fact noise-like and are not effectively modeled by Mel-frequency cepstral coefficients (MFCCs) or other commonly-used audio features, e.g. energy, zero-crossing, etc. Due to the lack of appropriate features that is suitable for environmental audio and to achieve a more effective representation, I proposed a specialized feature extraction algorithm for environmental sounds that utilizes the matching pursuit (MP) algorithm to learn the inherent structure of each type of sounds, which we called MP-features. MP-features have shown to capture and represent sounds from different sources and different ranges, where frequency domain features (e.g., MFCCs) fail and can be advantageous when combining with MFCCs to improve the overall performance. The third component leads to our investigation on modeling and detecting the background audio. One of the goals of this research is to characterize an environment. Since many events would blend into the background, I wanted to look for a way to achieve a general model for any particular environment. Once we have an idea of the background, it will enable us to identify foreground events even if we havent seen these events before. Therefore, the next step is to investigate into learning the audio background model for each environment type, despite the occurrences of different foreground events. In this work, I presented a framework for robust audio background modeling, which includes learning the models for prediction, data knowledge and persistent characteristics of the environment. This approach has the ability to model the background and detect foreground events as well as the ability to verify whether the predicted background is indeed the background or a foreground event that protracts for a longer period of time. In this work, I also investigated the use of a semi-supervised learning technique to exploit and label new unlabeled audio data. The final components of my thesis will involve investigating on learning sound structures for generalization and applying the proposed ideas to context aware applications. The inherent nature of environmental sound is noisy and contains relatively large amounts of overlapping events between different environments. Environmental sounds contain large variances even within a single environment type, and frequently, there are no divisible or clear boundaries between some types. Traditional methods of classification are generally not robust enough to handle classes with overlaps. This audio, hence, requires representation by complex models. Using deep learning architecture provides a way to obtain a generative model-based method for classification. Specifically, I considered the use of Deep Belief Networks (DBNs) to model environmental audio and investigate its applicability with noisy data to improve robustness and generalization. A framework was proposed using composite-DBNs to discover high-level representations and to learn a hierarchical structure for different acoustic environments in a data-driven fashion. Experimental results on real data sets demonstrate its effectiveness over traditional methods with over 90% accuracy on recognition for a high number of environmental sound types.
Phillips, Yvonne F; Towsey, Michael; Roe, Paul
2018-01-01
Audio recordings of the environment are an increasingly important technique to monitor biodiversity and ecosystem function. While the acquisition of long-duration recordings is becoming easier and cheaper, the analysis and interpretation of that audio remains a significant research area. The issue addressed in this paper is the automated reduction of environmental audio data to facilitate ecological investigations. We describe a method that first reduces environmental audio to vectors of acoustic indices, which are then clustered. This can reduce the audio data by six to eight orders of magnitude yet retain useful ecological information. We describe techniques to visualise sequences of cluster occurrence (using for example, diel plots, rose plots) that assist interpretation of environmental audio. Colour coding acoustic clusters allows months and years of audio data to be visualised in a single image. These techniques are useful in identifying and indexing the contents of long-duration audio recordings. They could also play an important role in monitoring long-term changes in species abundance brought about by habitat degradation and/or restoration.
Towsey, Michael; Roe, Paul
2018-01-01
Audio recordings of the environment are an increasingly important technique to monitor biodiversity and ecosystem function. While the acquisition of long-duration recordings is becoming easier and cheaper, the analysis and interpretation of that audio remains a significant research area. The issue addressed in this paper is the automated reduction of environmental audio data to facilitate ecological investigations. We describe a method that first reduces environmental audio to vectors of acoustic indices, which are then clustered. This can reduce the audio data by six to eight orders of magnitude yet retain useful ecological information. We describe techniques to visualise sequences of cluster occurrence (using for example, diel plots, rose plots) that assist interpretation of environmental audio. Colour coding acoustic clusters allows months and years of audio data to be visualised in a single image. These techniques are useful in identifying and indexing the contents of long-duration audio recordings. They could also play an important role in monitoring long-term changes in species abundance brought about by habitat degradation and/or restoration. PMID:29494629
Holographic disk with high data transfer rate: its application to an audio response memory.
Kubota, K; Ono, Y; Kondo, M; Sugama, S; Nishida, N; Sakaguchi, M
1980-03-15
This paper describes a memory realized with a high data transfer rate using the holographic parallel-processing function and its application to an audio response system that supplies many audio messages to many terminals simultaneously. Digitalized audio messages are recorded as tiny 1-D Fourier transform holograms on a holographic disk. A hologram recorder and a hologram reader were constructed to test and demonstrate the holographic audio response memory feasibility. Experimental results indicate the potentiality of an audio response system with a 2000-word vocabulary and 250-Mbit/sec bit transfer rate.
Music mixing preferences of cochlear implant recipients: a pilot study.
Buyens, Wim; van Dijk, Bas; Moonen, Marc; Wouters, Jan
2014-05-01
Music perception and appraisal are generally poor in cochlear implant recipients. Simple musical structures, lyrics that are easy to follow, and clear rhythm/beat have been reported among the top factors to enhance music enjoyment. The present study investigated the preference for modified relative instrument levels in music with normal-hearing and cochlear implant subjects. In experiment 1, test subjects were given a mixing console and multi-track recordings to determine their most enjoyable audio mix. In experiment 2, a preference rating experiment based on the preferred relative level settings in experiment 1 was performed. Experiment 1 was performed with four postlingually deafened cochlear implant subjects, experiment 2 with ten normal-hearing and ten cochlear implant subjects. A significant difference in preference rating was found between normal-hearing and cochlear implant subjects. The latter preferred an audio mix with larger vocals-to-instruments ratio. In addition, given an audio mix with clear vocals and attenuated instruments, cochlear implant subjects preferred the bass/drum track to be louder than the other instrument tracks. The original audio mix in real-world music might not be suitable for cochlear implant recipients. Modifying the relative instrument level settings potentially improves music enjoyment.
Formal Verification of a Power Controller Using the Real-Time Model Checker UPPAAL
NASA Technical Reports Server (NTRS)
Havelund, Klaus; Larsen, Kim Guldstrand; Skou, Arne
1999-01-01
A real-time system for power-down control in audio/video components is modeled and verified using the real-time model checker UPPAAL. The system is supposed to reside in an audio/video component and control (read from and write to) links to neighbor audio/video components such as TV, VCR and remote-control. In particular, the system is responsible for the powering up and down of the component in between the arrival of data, and in order to do so in a safe way without loss of data, it is essential that no link interrupts are lost. Hence, a component system is a multitasking system with hard real-time requirements, and we present techniques for modeling time consumption in such a multitasked, prioritized system. The work has been carried out in a collaboration between Aalborg University and the audio/video company B&O. By modeling the system, 3 design errors were identified and corrected, and the following verification confirmed the validity of the design but also revealed the necessity for an upper limit of the interrupt frequency. The resulting design has been implemented and it is going to be incorporated as part of a new product line.
Harrison, Neil R; Witheridge, Sian; Makin, Alexis; Wuerger, Sophie M; Pegna, Alan J; Meyer, Georg F
2015-11-01
Motion is represented by low-level signals, such as size-expansion in vision or loudness changes in the auditory modality. The visual and auditory signals from the same object or event may be integrated and facilitate detection. We explored behavioural and electrophysiological correlates of congruent and incongruent audio-visual depth motion in conditions where auditory level changes, visual expansion, and visual disparity cues were manipulated. In Experiment 1 participants discriminated auditory motion direction whilst viewing looming or receding, 2D or 3D, visual stimuli. Responses were faster and more accurate for congruent than for incongruent audio-visual cues, and the congruency effect (i.e., difference between incongruent and congruent conditions) was larger for visual 3D cues compared to 2D cues. In Experiment 2, event-related potentials (ERPs) were collected during presentation of the 2D and 3D, looming and receding, audio-visual stimuli, while participants detected an infrequent deviant sound. Our main finding was that audio-visual congruity was affected by retinal disparity at an early processing stage (135-160ms) over occipito-parietal scalp. Topographic analyses suggested that similar brain networks were activated for the 2D and 3D congruity effects, but that cortical responses were stronger in the 3D condition. Differences between congruent and incongruent conditions were observed between 140-200ms, 220-280ms, and 350-500ms after stimulus onset. Copyright © 2015 Elsevier Ltd. All rights reserved.
78 FR 38093 - Seventh Meeting: RTCA Special Committee 226, Audio Systems and Equipment
Federal Register 2010, 2011, 2012, 2013, 2014
2013-06-25
... Committee 226, Audio Systems and Equipment AGENCY: Federal Aviation Administration (FAA), U.S. Department of Transportation (DOT). ACTION: Meeting Notice of RTCA Special Committee 226, Audio Systems and Equipment. SUMMARY... 226, Audio Systems and Equipment [[Page 38094
Bunegin, L; Wahl, D; Albin, M S
1994-03-01
Cerebral embolism has been implicated in the development of cognitive and neurological deficits following bypass surgery. This study proposes methodology for estimating cerebral air embolus volume using transcranial Doppler sonography. Transcranial Doppler audio signals of air bubbles in the middle cerebral artery obtained from in vivo experiments were subjected to a fast-Fourier transform analysis. Audio segments when no air was present as well as artifact resulting from electrocautery and sensor movement were also subjected to fast-Fourier transform analysis. Spectra were compared, and frequency and power differences were noted and used for development of audio band-pass filters for isolation of frequencies associated with air emboli. In a bench model of the middle cerebral artery circulation, repetitive injections of various air volumes between 0.5 and 500 microL were made. Transcranial Doppler audio output was band-pass filtered, acquired digitally, then subjected to a fast-Fourier transform power spectrum analysis and power spectrum integration. A linear least-squares correlation was performed on the data. Fast-Fourier transform analysis of audio segments indicated that frequencies between 250 and 500 Hz are consistently dominant in the spectrum when air emboli are present. Background frequencies appear to be below 240 Hz, and artifact resulting from sensor movement and electrocautery appears to be below 300 Hz. Data from the middle cerebral artery model filtered through a 307- to 450-Hz band-pass filter yielded a linear relation between emboli volume and the integrated value of the power spectrum near 40 microL. Detection of emboli less than 0.5 microL was inconsistent, and embolus volumes greater than 40 microL were indistinguishable from one another. The preliminary technique described in this study may represent a starting point from which automated detection and volume estimation of cerebral emboli might be approached.
Diagnostic accuracy of sleep bruxism scoring in absence of audio-video recording: a pilot study.
Carra, Maria Clotilde; Huynh, Nelly; Lavigne, Gilles J
2015-03-01
Based on the most recent polysomnographic (PSG) research diagnostic criteria, sleep bruxism is diagnosed when >2 rhythmic masticatory muscle activity (RMMA)/h of sleep are scored on the masseter and/or temporalis muscles. These criteria have not yet been validated for portable PSG systems. This pilot study aimed to assess the diagnostic accuracy of scoring sleep bruxism in absence of audio-video recordings. Ten subjects (mean age 24.7 ± 2.2) with a clinical diagnosis of sleep bruxism spent one night in the sleep laboratory. PSG were performed with a portable system (type 2) while audio-video was recorded. Sleep studies were scored by the same examiner three times: (1) without, (2) with, and (3) without audio-video in order to test the intra-scoring and intra-examiner reliability for RMMA scoring. The RMMA event-by-event concordance rate between scoring without audio-video and with audio-video was 68.3 %. Overall, the RMMA index was overestimated by 23.8 % without audio-video. However, the intra-class correlation coefficient (ICC) between scorings with and without audio-video was good (ICC = 0.91; p < 0.001); the intra-examiner reliability was high (ICC = 0.97; p < 0.001). The clinical diagnosis of sleep bruxism was confirmed in 8/10 subjects based on scoring without audio-video and in 6/10 subjects with audio-video. Although the absence of audio-video recording, the diagnostic accuracy of assessing RMMA with portable PSG systems appeared to remain good, supporting their use for both research and clinical purposes. However, the risk of moderate overestimation in absence of audio-video must be taken into account.
A Robust Zero-Watermarking Algorithm for Audio
NASA Astrophysics Data System (ADS)
Chen, Ning; Zhu, Jie
2007-12-01
In traditional watermarking algorithms, the insertion of watermark into the host signal inevitably introduces some perceptible quality degradation. Another problem is the inherent conflict between imperceptibility and robustness. Zero-watermarking technique can solve these problems successfully. Instead of embedding watermark, the zero-watermarking technique extracts some essential characteristics from the host signal and uses them for watermark detection. However, most of the available zero-watermarking schemes are designed for still image and their robustness is not satisfactory. In this paper, an efficient and robust zero-watermarking technique for audio signal is presented. The multiresolution characteristic of discrete wavelet transform (DWT), the energy compression characteristic of discrete cosine transform (DCT), and the Gaussian noise suppression property of higher-order cumulant are combined to extract essential features from the host audio signal and they are then used for watermark recovery. Simulation results demonstrate the effectiveness of our scheme in terms of inaudibility, detection reliability, and robustness.
Multimodal Speaker Diarization.
Noulas, A; Englebienne, G; Krose, B J A
2012-01-01
We present a novel probabilistic framework that fuses information coming from the audio and video modality to perform speaker diarization. The proposed framework is a Dynamic Bayesian Network (DBN) that is an extension of a factorial Hidden Markov Model (fHMM) and models the people appearing in an audiovisual recording as multimodal entities that generate observations in the audio stream, the video stream, and the joint audiovisual space. The framework is very robust to different contexts, makes no assumptions about the location of the recording equipment, and does not require labeled training data as it acquires the model parameters using the Expectation Maximization (EM) algorithm. We apply the proposed model to two meeting videos and a news broadcast video, all of which come from publicly available data sets. The results acquired in speaker diarization are in favor of the proposed multimodal framework, which outperforms the single modality analysis results and improves over the state-of-the-art audio-based speaker diarization.
Walker, H Jack; Feild, Hubert S; Giles, William F; Armenakis, Achilles A; Bernerth, Jeremy B
2009-09-01
This study investigated participants' reactions to employee testimonials presented on recruitment Web sites. The authors manipulated the presence of employee testimonials, richness of media communicating testimonials (video with audio vs. picture with text), and representation of racial minorities in employee testimonials. Participants were more attracted to organizations and perceived information as more credible when testimonials were included on recruitment Web sites. Testimonials delivered via video with audio had higher attractiveness and information credibility ratings than those given via picture with text. Results also showed that Blacks responded more favorably, whereas Whites responded more negatively, to the recruiting organization as the proportion of minorities shown giving testimonials on the recruitment Web site increased. However, post hoc analyses revealed that use of a richer medium (video with audio vs. picture with text) to communicate employee testimonials tended to attenuate these racial effects.
McGurk Effect in Gender Identification: Vision Trumps Audition in Voice Judgments.
Peynircioǧlu, Zehra F; Brent, William; Tatz, Joshua R; Wyatt, Jordan
2017-01-01
Demonstrations of non-speech McGurk effects are rare, mostly limited to emotion identification, and sometimes not considered true analogues. We presented videos of males and females singing a single syllable on the same pitch and asked participants to indicate the true range of the voice-soprano, alto, tenor, or bass. For one group of participants, the gender shown on the video matched the gender of the voice heard, and for the other group they were mismatched. Soprano or alto responses were interpreted as "female voice" decisions and tenor or bass responses as "male voice" decisions. Identification of the voice gender was 100% correct in the preceding audio-only condition. However, whereas performance was also 100% correct in the matched video/audio condition, it was only 31% correct in the mismatched video/audio condition. Thus, the visual gender information overrode the voice gender identification, showing a robust non-speech McGurk effect.
Savran, Arman; Cao, Houwei; Shah, Miraj; Nenkova, Ani; Verma, Ragini
2013-01-01
We present experiments on fusing facial video, audio and lexical indicators for affect estimation during dyadic conversations. We use temporal statistics of texture descriptors extracted from facial video, a combination of various acoustic features, and lexical features to create regression based affect estimators for each modality. The single modality regressors are then combined using particle filtering, by treating these independent regression outputs as measurements of the affect states in a Bayesian filtering framework, where previous observations provide prediction about the current state by means of learned affect dynamics. Tested on the Audio-visual Emotion Recognition Challenge dataset, our single modality estimators achieve substantially higher scores than the official baseline method for every dimension of affect. Our filtering-based multi-modality fusion achieves correlation performance of 0.344 (baseline: 0.136) and 0.280 (baseline: 0.096) for the fully continuous and word level sub challenges, respectively. PMID:25300451
Savran, Arman; Cao, Houwei; Shah, Miraj; Nenkova, Ani; Verma, Ragini
2012-01-01
We present experiments on fusing facial video, audio and lexical indicators for affect estimation during dyadic conversations. We use temporal statistics of texture descriptors extracted from facial video, a combination of various acoustic features, and lexical features to create regression based affect estimators for each modality. The single modality regressors are then combined using particle filtering, by treating these independent regression outputs as measurements of the affect states in a Bayesian filtering framework, where previous observations provide prediction about the current state by means of learned affect dynamics. Tested on the Audio-visual Emotion Recognition Challenge dataset, our single modality estimators achieve substantially higher scores than the official baseline method for every dimension of affect. Our filtering-based multi-modality fusion achieves correlation performance of 0.344 (baseline: 0.136) and 0.280 (baseline: 0.096) for the fully continuous and word level sub challenges, respectively.
Imaging of conductivity distributions using audio-frequency electromagnetic data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Ki Ha; Morrison, H.F.
1990-10-01
The objective of this study has been to develop mathematical methods for mapping conductivity distributions between boreholes using low frequency electromagnetic (em) data. In relation to this objective this paper presents two recent developments in high-resolution crosshole em imaging techniques. These are (1) audio-frequency diffusion tomography, and (2) a transform method in which low frequency data is first transformed into a wave-like field. The idea in the second approach is that we can then treat the transformed field using conventional techniques designed for wave field analysis.
A device for recording automatic audio tape recording1
Bernal, Martha E.; Gibson, Dennis M.; Williams, Donald E.; Pesses, Danny I.
1971-01-01
Adaptation of a commercially available timer for use as a means of operating an audio tape recorder several times during the day is described. Data on a mother's rates of commanding her children were collected via both physically present observer and recorder methods in order to compare the usefulness of the recordings with direct observation. There was a high positive relationship between observer-recorder command rates, with the observer rates being consistently higher, when data were collected via both methods simultaneously as well as at different points in time. ImagesFig. 1 PMID:16795287
A device for recording automatic audio tape recording.
Bernal, M E; Gibson, D M; Williams, D E; Pesses, D I
1971-01-01
Adaptation of a commercially available timer for use as a means of operating an audio tape recorder several times during the day is described. Data on a mother's rates of commanding her children were collected via both physically present observer and recorder methods in order to compare the usefulness of the recordings with direct observation. There was a high positive relationship between observer-recorder command rates, with the observer rates being consistently higher, when data were collected via both methods simultaneously as well as at different points in time.
Optimal Window and Lattice in Gabor Transform. Application to Audio Analysis.
Lachambre, Helene; Ricaud, Benjamin; Stempfel, Guillaume; Torrésani, Bruno; Wiesmeyr, Christoph; Onchis-Moaca, Darian
2015-01-01
This article deals with the use of optimal lattice and optimal window in Discrete Gabor Transform computation. In the case of a generalized Gaussian window, extending earlier contributions, we introduce an additional local window adaptation technique for non-stationary signals. We illustrate our approach and the earlier one by addressing three time-frequency analysis problems to show the improvements achieved by the use of optimal lattice and window: close frequencies distinction, frequency estimation and SNR estimation. The results are presented, when possible, with real world audio signals.
47 CFR 73.403 - Digital audio broadcasting service requirements.
Code of Federal Regulations, 2010 CFR
2010-10-01
... programming stream at no direct charge to listeners. In addition, a broadcast radio station must simulcast its analog audio programming on one of its digital audio programming streams. The DAB audio programming... analog programming service currently provided to listeners. (b) Emergency information. The emergency...
High-Fidelity Piezoelectric Audio Device
NASA Technical Reports Server (NTRS)
Woodward, Stanley E.; Fox, Robert L.; Bryant, Robert G.
2003-01-01
ModalMax is a very innovative means of harnessing the vibration of a piezoelectric actuator to produce an energy efficient low-profile device with high-bandwidth high-fidelity audio response. The piezoelectric audio device outperforms many commercially available speakers made using speaker cones. The piezoelectric device weighs substantially less (4 g) than the speaker cones which use magnets (10 g). ModalMax devices have extreme fabrication simplicity. The entire audio device is fabricated by lamination. The simplicity of the design lends itself to lower cost. The piezoelectric audio device can be used without its acoustic chambers and thereby resulting in a very low thickness of 0.023 in. (0.58 mm). The piezoelectric audio device can be completely encapsulated, which makes it very attractive for use in wet environments. Encapsulation does not significantly alter the audio response. Its small size (see Figure 1) is applicable to many consumer electronic products, such as pagers, portable radios, headphones, laptop computers, computer monitors, toys, and electronic games. The audio device can also be used in automobile or aircraft sound systems.
NASA Astrophysics Data System (ADS)
McMullen, Kyla A.
Although the concept of virtual spatial audio has existed for almost twenty-five years, only in the past fifteen years has modern computing technology enabled the real-time processing needed to deliver high-precision spatial audio. Furthermore, the concept of virtually walking through an auditory environment did not exist. The applications of such an interface have numerous potential uses. Spatial audio has the potential to be used in various manners ranging from enhancing sounds delivered in virtual gaming worlds to conveying spatial locations in real-time emergency response systems. To incorporate this technology in real-world systems, various concerns should be addressed. First, to widely incorporate spatial audio into real-world systems, head-related transfer functions (HRTFs) must be inexpensively created for each user. The present study further investigated an HRTF subjective selection procedure previously developed within our research group. Users discriminated auditory cues to subjectively select their preferred HRTF from a publicly available database. Next, the issue of training to find virtual sources was addressed. Listeners participated in a localization training experiment using their selected HRTFs. The training procedure was created from the characterization of successful search strategies in prior auditory search experiments. Search accuracy significantly improved after listeners performed the training procedure. Next, in the investigation of auditory spatial memory, listeners completed three search and recall tasks with differing recall methods. Recall accuracy significantly decreased in tasks that required the storage of sound source configurations in memory. To assess the impacts of practical scenarios, the present work assessed the performance effects of: signal uncertainty, visual augmentation, and different attenuation modeling. Fortunately, source uncertainty did not affect listeners' ability to recall or identify sound sources. The present study also found that the presence of visual reference frames significantly increased recall accuracy. Additionally, the incorporation of drastic attenuation significantly improved environment recall accuracy. Through investigating the aforementioned concerns, the present study made initial footsteps guiding the design of virtual auditory environments that support spatial configuration recall.
Validation of a digital audio recording method for the objective assessment of cough in the horse.
Duz, M; Whittaker, A G; Love, S; Parkin, T D H; Hughes, K J
2010-10-01
To validate the use of digital audio recording and analysis for quantification of coughing in horses. Part A: Nine simultaneous digital audio and video recordings were collected individually from seven stabled horses over a 1 h period using a digital audio recorder attached to the halter. Audio files were analysed using audio analysis software. Video and audio recordings were analysed for cough count and timing by two blinded operators on two occasions using a randomised study design for determination of intra-operator and inter-operator agreement. Part B: Seventy-eight hours of audio recordings obtained from nine horses were analysed once by two blinded operators to assess inter-operator repeatability on a larger sample. Part A: There was complete agreement between audio and video analyses and inter- and intra-operator analyses. Part B: There was >97% agreement between operators on number and timing of 727 coughs recorded over 78 h. The results of this study suggest that the cough monitor methodology used has excellent sensitivity and specificity for the objective assessment of cough in horses and intra- and inter-operator variability of recorded coughs is minimal. Crown Copyright 2010. Published by Elsevier India Pvt Ltd. All rights reserved.
47 CFR 73.9005 - Compliance requirements for covered demodulator products: Audio.
Code of Federal Regulations, 2010 CFR
2010-10-01
... products: Audio. 73.9005 Section 73.9005 Telecommunication FEDERAL COMMUNICATIONS COMMISSION (CONTINUED....9005 Compliance requirements for covered demodulator products: Audio. Except as otherwise provided in §§ 73.9003(a) or 73.9004(a), covered demodulator products shall not output the audio portions of...
36 CFR 1002.12 - Audio disturbances.
Code of Federal Regulations, 2014 CFR
2014-07-01
... 36 Parks, Forests, and Public Property 3 2014-07-01 2014-07-01 false Audio disturbances. 1002.12... RECREATION § 1002.12 Audio disturbances. (a) The following are prohibited: (1) Operating motorized equipment or machinery such as an electric generating plant, motor vehicle, motorized toy, or an audio device...
36 CFR 1002.12 - Audio disturbances.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 36 Parks, Forests, and Public Property 3 2012-07-01 2012-07-01 false Audio disturbances. 1002.12... RECREATION § 1002.12 Audio disturbances. (a) The following are prohibited: (1) Operating motorized equipment or machinery such as an electric generating plant, motor vehicle, motorized toy, or an audio device...
50 CFR 27.72 - Audio equipment.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 50 Wildlife and Fisheries 6 2010-10-01 2010-10-01 false Audio equipment. 27.72 Section 27.72 Wildlife and Fisheries UNITED STATES FISH AND WILDLIFE SERVICE, DEPARTMENT OF THE INTERIOR (CONTINUED) THE... Audio equipment. The operation or use of audio devices including radios, recording and playback devices...
36 CFR 1002.12 - Audio disturbances.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 36 Parks, Forests, and Public Property 3 2011-07-01 2011-07-01 false Audio disturbances. 1002.12... RECREATION § 1002.12 Audio disturbances. (a) The following are prohibited: (1) Operating motorized equipment or machinery such as an electric generating plant, motor vehicle, motorized toy, or an audio device...
36 CFR 1002.12 - Audio disturbances.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 36 Parks, Forests, and Public Property 3 2010-07-01 2010-07-01 false Audio disturbances. 1002.12... RECREATION § 1002.12 Audio disturbances. (a) The following are prohibited: (1) Operating motorized equipment or machinery such as an electric generating plant, motor vehicle, motorized toy, or an audio device...
50 CFR 27.72 - Audio equipment.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 50 Wildlife and Fisheries 8 2011-10-01 2011-10-01 false Audio equipment. 27.72 Section 27.72 Wildlife and Fisheries UNITED STATES FISH AND WILDLIFE SERVICE, DEPARTMENT OF THE INTERIOR (CONTINUED) THE... Audio equipment. The operation or use of audio devices including radios, recording and playback devices...
50 CFR 27.72 - Audio equipment.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 50 Wildlife and Fisheries 9 2012-10-01 2012-10-01 false Audio equipment. 27.72 Section 27.72 Wildlife and Fisheries UNITED STATES FISH AND WILDLIFE SERVICE, DEPARTMENT OF THE INTERIOR (CONTINUED) THE... Audio equipment. The operation or use of audio devices including radios, recording and playback devices...
47 CFR 87.483 - Audio visual warning systems.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 47 Telecommunication 5 2014-10-01 2014-10-01 false Audio visual warning systems. 87.483 Section 87... AVIATION SERVICES Stations in the Radiodetermination Service § 87.483 Audio visual warning systems. An audio visual warning system (AVWS) is a radar-based obstacle avoidance system. AVWS activates...
Semantic Context Detection Using Audio Event Fusion
NASA Astrophysics Data System (ADS)
Chu, Wei-Ta; Cheng, Wen-Huang; Wu, Ja-Ling
2006-12-01
Semantic-level content analysis is a crucial issue in achieving efficient content retrieval and management. We propose a hierarchical approach that models audio events over a time series in order to accomplish semantic context detection. Two levels of modeling, audio event and semantic context modeling, are devised to bridge the gap between physical audio features and semantic concepts. In this work, hidden Markov models (HMMs) are used to model four representative audio events, that is, gunshot, explosion, engine, and car braking, in action movies. At the semantic context level, generative (ergodic hidden Markov model) and discriminative (support vector machine (SVM)) approaches are investigated to fuse the characteristics and correlations among audio events, which provide cues for detecting gunplay and car-chasing scenes. The experimental results demonstrate the effectiveness of the proposed approaches and provide a preliminary framework for information mining by using audio characteristics.
Software tools for developing an acoustics multimedia CD-ROM
NASA Astrophysics Data System (ADS)
Bigelow, Todd W.; Wheeler, Paul A.
2003-10-01
A multimedia CD-ROM was developed to accompany the textbook, Science of Sound, by Tom Rossing. This paper discusses the multimedia elements included in the CD-ROM and the various software packages used to create them. PowerPoint presentations with an audio-track background were converted to web pages using Impatica. Animations of acoustic examples and quizzes were developed using Flash by Macromedia. Vegas Video and Sound Forge by Sonic Foundry were used for editing video and audio clips while Cleaner by Discreet was used to compress the clips for use over the internet. Math tutorials were presented as whiteboard presentations using Hitachis Starboard to create the graphics and TechSmiths Camtasia Studio to record the presentations. The CD-ROM is in a web-page format created with Macromedias Dreamweaver. All of these elements are integrated into a single course supplement that can be viewed by any computer with a web browser.
Listeners' expectation of room acoustical parameters based on visual cues
NASA Astrophysics Data System (ADS)
Valente, Daniel L.
Despite many studies investigating auditory spatial impressions in rooms, few have addressed the impact of simultaneous visual cues on localization and the perception of spaciousness. The current research presents an immersive audio-visual study, in which participants are instructed to make spatial congruency and quantity judgments in dynamic cross-modal environments. The results of these psychophysical tests suggest the importance of consilient audio-visual presentation to the legibility of an auditory scene. Several studies have looked into audio-visual interaction in room perception in recent years, but these studies rely on static images, speech signals, or photographs alone to represent the visual scene. Building on these studies, the aim is to propose a testing method that uses monochromatic compositing (blue-screen technique) to position a studio recording of a musical performance in a number of virtual acoustical environments and ask subjects to assess these environments. In the first experiment of the study, video footage was taken from five rooms varying in physical size from a small studio to a small performance hall. Participants were asked to perceptually align two distinct acoustical parameters---early-to-late reverberant energy ratio and reverberation time---of two solo musical performances in five contrasting visual environments according to their expectations of how the room should sound given its visual appearance. In the second experiment in the study, video footage shot from four different listening positions within a general-purpose space was coupled with sounds derived from measured binaural impulse responses (IRs). The relationship between the presented image, sound, and virtual receiver position was examined. It was found that many visual cues caused different perceived events of the acoustic environment. This included the visual attributes of the space in which the performance was located as well as the visual attributes of the performer. The addressed visual makeup of the performer included: (1) an actual video of the performance, (2) a surrogate image of the performance, for example a loudspeaker's image reproducing the performance, (3) no visual image of the performance (empty room), or (4) a multi-source visual stimulus (actual video of the performance coupled with two images of loudspeakers positioned to the left and right of the performer). For this experiment, perceived auditory events of sound were measured in terms of two subjective spatial metrics: Listener Envelopment (LEV) and Apparent Source Width (ASW) These metrics were hypothesized to be dependent on the visual imagery of the presented performance. Data was also collected by participants matching direct and reverberant sound levels for the presented audio-visual scenes. In the final experiment, participants judged spatial expectations of an ensemble of musicians presented in the five physical spaces from Experiment 1. Supporting data was accumulated in two stages. First, participants were given an audio-visual matching test, in which they were instructed to align the auditory width of a performing ensemble to a varying set of audio and visual cues. In the second stage, a conjoint analysis design paradigm was explored to extrapolate the relative magnitude of explored audio-visual factors in affecting three assessed response criteria: Congruency (the perceived match-up of the auditory and visual cues in the assessed performance), ASW and LEV. Results show that both auditory and visual factors affect the collected responses, and that the two sensory modalities coincide in distinct interactions. This study reveals participant resiliency in the presence of forced auditory-visual mismatch: Participants are able to adjust the acoustic component of the cross-modal environment in a statistically similar way despite randomized starting values for the monitored parameters. Subjective results of the experiments are presented along with objective measurements for verification.
JIP: Java image processing on the Internet
NASA Astrophysics Data System (ADS)
Wang, Dongyan; Lin, Bo; Zhang, Jun
1998-12-01
In this paper, we present JIP - Java Image Processing on the Internet, a new Internet based application for remote education and software presentation. JIP offers an integrate learning environment on the Internet where remote users not only can share static HTML documents and lectures notes, but also can run and reuse dynamic distributed software components, without having the source code or any extra work of software compilation, installation and configuration. By implementing a platform-independent distributed computational model, local computational resources are consumed instead of the resources on a central server. As an extended Java applet, JIP allows users to selected local image files on their computers or specify any image on the Internet using an URL as input. Multimedia lectures such as streaming video/audio and digital images are integrated into JIP and intelligently associated with specific image processing functions. Watching demonstrations an practicing the functions with user-selected input data dramatically encourages leaning interest, while promoting the understanding of image processing theory. The JIP framework can be easily applied to other subjects in education or software presentation, such as digital signal processing, business, mathematics, physics, or other areas such as employee training and charged software consumption.
Huber, Rainer; Meis, Markus; Klink, Karin; Bartsch, Christian; Bitzer, Joerg
2014-01-01
Within the Lower Saxony Research Network Design of Environments for Ageing (GAL), a personal activity and household assistant (PAHA), an ambient reminder system, has been developed. One of its central output modality to interact with the user is sound. The study presented here evaluated three different system technologies for sound reproduction using up to five loudspeakers, including the "phantom source" concept. Moreover, a technology for hearing loss compensation for the mostly older users of the PAHA was implemented and evaluated. Evaluation experiments with 21 normal hearing and hearing impaired test subjects were carried out. The results show that after direct comparison of the sound presentation concepts, the presentation by the single TV speaker was most preferred, whereas the phantom source concept got the highest acceptance ratings as far as the general concept is concerned. The localization accuracy of the phantom source concept was good as long as the exact listening position was known to the algorithm and speech stimuli were used. Most subjects preferred the original signals over the pre-processed, dynamic-compressed signals, although processed speech was often described as being clearer.
Subjective evaluation and electroacoustic theoretical validation of a new approach to audio upmixing
NASA Astrophysics Data System (ADS)
Usher, John S.
Audio signal processing systems for converting two-channel (stereo) recordings to four or five channels are increasingly relevant. These audio upmixers can be used with conventional stereo sound recordings and reproduced with multichannel home theatre or automotive loudspeaker audio systems to create a more engaging and natural-sounding listening experience. This dissertation discusses existing approaches to audio upmixing for recordings of musical performances and presents specific design criteria for a system to enhance spatial sound quality. A new upmixing system is proposed and evaluated according to these criteria and a theoretical model for its behavior is validated using empirical measurements. The new system removes short-term correlated components from two electronic audio signals using a pair of adaptive filters, updated according to a frequency domain implementation of the normalized-least-means-square algorithm. The major difference of the new system with all extant audio upmixers is that unsupervised time-alignment of the input signals (typically, by up to +/-10 ms) as a function of frequency (typically, using a 1024-band equalizer) is accomplished due to the non-minimum phase adaptive filter. Two new signals are created from the weighted difference of the inputs, and are then radiated with two loudspeakers behind the listener. According to the consensus in the literature on the effect of interaural correlation on auditory image formation, the self-orthogonalizing properties of the algorithm ensure minimal distortion of the frontal source imagery and natural-sounding, enveloping reverberance (ambiance) imagery. Performance evaluation of the new upmix system was accomplished in two ways: Firstly, using empirical electroacoustic measurements which validate a theoretical model of the system; and secondly, with formal listening tests which investigated auditory spatial imagery with a graphical mapping tool and a preference experiment. Both electroacoustic and subjective methods investigated system performance with a variety of test stimuli for solo musical performances reproduced using a loudspeaker in an orchestral concert-hall and recorded using different microphone techniques. The objective and subjective evaluations combined with a comparative study with two commercial systems demonstrate that the proposed system provides a new, computationally practical, high sound quality solution to upmixing.
The use of ambient audio to increase safety and immersion in location-based games
NASA Astrophysics Data System (ADS)
Kurczak, John Jason
The purpose of this thesis is to propose an alternative type of interface for mobile software being used while walking or running. Our work addresses the problem of visual user interfaces for mobile software be- ing potentially unsafe for pedestrians, and not being very immersive when used for location-based games. In addition, location-based games and applications can be dif- ficult to develop when directly interfacing with the sensors used to track the user's location. These problems need to be addressed because portable computing devices are be- coming a popular tool for navigation, playing games, and accessing the internet while walking. This poses a safety problem for mobile users, who may be paying too much attention to their device to notice and react to hazards in their environment. The difficulty of developing location-based games and other location-aware applications may significantly hinder the prevalence of applications that explore new interaction techniques for ubiquitous computing. We created the TREC toolkit to address the issues with tracking sensors while developing location-based games and applications. We have developed functional location-based applications with TREC to demonstrate the amount of work that can be saved by using this toolkit. In order to have a safer and more immersive alternative to visual interfaces, we have developed ambient audio interfaces for use with mobile applications. Ambient audio uses continuous streams of sound over headphones to present information to mobile users without distracting them from walking safely. In order to test the effectiveness of ambient audio, we ran a study to compare ambient audio with handheld visual interfaces in a location-based game. We compared players' ability to safely navigate the environment, their sense of immersion in the game, and their performance at the in-game tasks. We found that ambient audio was able to significantly increase players' safety and sense of immersion compared to a visual interface, while players performed signifi- cantly better at the game tasks when using the visual interface. This makes ambient audio a legitimate alternative to visual interfaces for mobile users when safety and immersion are a priority.
47 CFR 10.520 - Common audio attention signal.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 47 Telecommunication 1 2011-10-01 2011-10-01 false Common audio attention signal. 10.520 Section... Equipment Requirements § 10.520 Common audio attention signal. A Participating CMS Provider and equipment manufacturers may only market devices for public use under part 10 that include an audio attention signal that...
36 CFR 2.12 - Audio disturbances.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 36 Parks, Forests, and Public Property 1 2012-07-01 2012-07-01 false Audio disturbances. 2.12... RESOURCE PROTECTION, PUBLIC USE AND RECREATION § 2.12 Audio disturbances. (a) The following are prohibited..., motorized toy, or an audio device, such as a radio, television set, tape deck or musical instrument, in a...
36 CFR 2.12 - Audio disturbances.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 36 Parks, Forests, and Public Property 1 2010-07-01 2010-07-01 false Audio disturbances. 2.12... RESOURCE PROTECTION, PUBLIC USE AND RECREATION § 2.12 Audio disturbances. (a) The following are prohibited..., motorized toy, or an audio device, such as a radio, television set, tape deck or musical instrument, in a...
37 CFR 202.22 - Acquisition and deposit of unpublished audio and audiovisual transmission programs.
Code of Federal Regulations, 2011 CFR
2011-07-01
... unpublished audio and audiovisual transmission programs. 202.22 Section 202.22 Patents, Trademarks, and... REGISTRATION OF CLAIMS TO COPYRIGHT § 202.22 Acquisition and deposit of unpublished audio and audiovisual... and copies of unpublished audio and audiovisual transmission programs by the Library of Congress under...
36 CFR § 1002.12 - Audio disturbances.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 36 Parks, Forests, and Public Property 3 2013-07-01 2012-07-01 true Audio disturbances. § 1002.12... RECREATION § 1002.12 Audio disturbances. (a) The following are prohibited: (1) Operating motorized equipment or machinery such as an electric generating plant, motor vehicle, motorized toy, or an audio device...
47 CFR 10.520 - Common audio attention signal.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 47 Telecommunication 1 2013-10-01 2013-10-01 false Common audio attention signal. 10.520 Section... Equipment Requirements § 10.520 Common audio attention signal. A Participating CMS Provider and equipment manufacturers may only market devices for public use under part 10 that include an audio attention signal that...
37 CFR 202.22 - Acquisition and deposit of unpublished audio and audiovisual transmission programs.
Code of Federal Regulations, 2012 CFR
2012-07-01
... unpublished audio and audiovisual transmission programs. 202.22 Section 202.22 Patents, Trademarks, and... REGISTRATION OF CLAIMS TO COPYRIGHT § 202.22 Acquisition and deposit of unpublished audio and audiovisual... and copies of unpublished audio and audiovisual transmission programs by the Library of Congress under...
36 CFR 2.12 - Audio disturbances.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 36 Parks, Forests, and Public Property 1 2013-07-01 2013-07-01 false Audio disturbances. 2.12... RESOURCE PROTECTION, PUBLIC USE AND RECREATION § 2.12 Audio disturbances. (a) The following are prohibited..., motorized toy, or an audio device, such as a radio, television set, tape deck or musical instrument, in a...
37 CFR 202.22 - Acquisition and deposit of unpublished audio and audiovisual transmission programs.
Code of Federal Regulations, 2013 CFR
2013-07-01
... unpublished audio and audiovisual transmission programs. 202.22 Section 202.22 Patents, Trademarks, and... REGISTRATION OF CLAIMS TO COPYRIGHT § 202.22 Acquisition and deposit of unpublished audio and audiovisual... and copies of unpublished audio and audiovisual transmission programs by the Library of Congress under...
ENERGY STAR Certified Audio Video
Certified models meet all ENERGY STAR requirements as listed in the Version 3.0 ENERGY STAR Program Requirements for Audio Video Equipment that are effective as of May 1, 2013. A detailed listing of key efficiency criteria are available at http://www.energystar.gov/index.cfm?c=audio_dvd.pr_crit_audio_dvd
36 CFR 2.12 - Audio disturbances.
Code of Federal Regulations, 2014 CFR
2014-07-01
... 36 Parks, Forests, and Public Property 1 2014-07-01 2014-07-01 false Audio disturbances. 2.12... RESOURCE PROTECTION, PUBLIC USE AND RECREATION § 2.12 Audio disturbances. (a) The following are prohibited..., motorized toy, or an audio device, such as a radio, television set, tape deck or musical instrument, in a...
Code of Federal Regulations, 2014 CFR
2014-10-01
...: (1) Inputs. Decoders must have the capability to receive at least two audio inputs from EAS... externally, at least two minutes of audio or text messages. A decoder manufactured without an internal means to record and store audio or text must be equipped with a means (such as an audio or digital jack...
Code of Federal Regulations, 2013 CFR
2013-10-01
...: (1) Inputs. Decoders must have the capability to receive at least two audio inputs from EAS... externally, at least two minutes of audio or text messages. A decoder manufactured without an internal means to record and store audio or text must be equipped with a means (such as an audio or digital jack...
47 CFR 10.520 - Common audio attention signal.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 47 Telecommunication 1 2014-10-01 2014-10-01 false Common audio attention signal. 10.520 Section... Equipment Requirements § 10.520 Common audio attention signal. A Participating CMS Provider and equipment manufacturers may only market devices for public use under part 10 that include an audio attention signal that...
37 CFR 202.22 - Acquisition and deposit of unpublished audio and audiovisual transmission programs.
Code of Federal Regulations, 2014 CFR
2014-07-01
... unpublished audio and audiovisual transmission programs. 202.22 Section 202.22 Patents, Trademarks, and... REGISTRATION OF CLAIMS TO COPYRIGHT § 202.22 Acquisition and deposit of unpublished audio and audiovisual... and copies of unpublished audio and audiovisual transmission programs by the Library of Congress under...
47 CFR 10.520 - Common audio attention signal.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 47 Telecommunication 1 2012-10-01 2012-10-01 false Common audio attention signal. 10.520 Section... Equipment Requirements § 10.520 Common audio attention signal. A Participating CMS Provider and equipment manufacturers may only market devices for public use under part 10 that include an audio attention signal that...
Code of Federal Regulations, 2012 CFR
2012-10-01
...: (1) Inputs. Decoders must have the capability to receive at least two audio inputs from EAS... externally, at least two minutes of audio or text messages. A decoder manufactured without an internal means to record and store audio or text must be equipped with a means (such as an audio or digital jack...
36 CFR 2.12 - Audio disturbances.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 36 Parks, Forests, and Public Property 1 2011-07-01 2011-07-01 false Audio disturbances. 2.12... RESOURCE PROTECTION, PUBLIC USE AND RECREATION § 2.12 Audio disturbances. (a) The following are prohibited..., motorized toy, or an audio device, such as a radio, television set, tape deck or musical instrument, in a...
Sounding ruins: reflections on the production of an 'audio drift'.
Gallagher, Michael
2015-07-01
This article is about the use of audio media in researching places, which I term 'audio geography'. The article narrates some episodes from the production of an 'audio drift', an experimental environmental sound work designed to be listened to on a portable MP3 player whilst walking in a ruinous landscape. Reflecting on how this work functions, I argue that, as well as representing places, audio geography can shape listeners' attention and bodily movements, thereby reworking places, albeit temporarily. I suggest that audio geography is particularly apt for amplifying the haunted and uncanny qualities of places. I discuss some of the issues raised for research ethics, epistemology and spectral geographies.
Sounding ruins: reflections on the production of an ‘audio drift’
Gallagher, Michael
2014-01-01
This article is about the use of audio media in researching places, which I term ‘audio geography’. The article narrates some episodes from the production of an ‘audio drift’, an experimental environmental sound work designed to be listened to on a portable MP3 player whilst walking in a ruinous landscape. Reflecting on how this work functions, I argue that, as well as representing places, audio geography can shape listeners’ attention and bodily movements, thereby reworking places, albeit temporarily. I suggest that audio geography is particularly apt for amplifying the haunted and uncanny qualities of places. I discuss some of the issues raised for research ethics, epistemology and spectral geographies. PMID:29708107
DETECTOR FOR MODULATED AND UNMODULATED SIGNALS
Patterson, H.H.; Webber, G.H.
1959-08-25
An r-f signal-detecting device is described, which is embodied in a compact coaxial circuit principally comprising a detecting crystal diode and a modulating crystal diode connected in parallel. Incoming modulated r-f signals are demodulated by the detecting crystal diode to furnish an audio input to an audio amplifier. The detecting diode will not, however, produce an audio signal from an unmodulated r-f signal. In order that unmodulated signals may be detected, such incoming signals have a locally produced audio signal superimposed on them at the modulating crystal diode and then the"induced or artificially modulated" signal is reflected toward the detecting diode which in the process of demodulation produces an audio signal for the audio amplifier.
Ganesh, Attigodu Chandrashekara; Berthommier, Frédéric; Schwartz, Jean-Luc
2016-01-01
We introduce "Audio-Visual Speech Scene Analysis" (AVSSA) as an extension of the two-stage Auditory Scene Analysis model towards audiovisual scenes made of mixtures of speakers. AVSSA assumes that a coherence index between the auditory and the visual input is computed prior to audiovisual fusion, enabling to determine whether the sensory inputs should be bound together. Previous experiments on the modulation of the McGurk effect by audiovisual coherent vs. incoherent contexts presented before the McGurk target have provided experimental evidence supporting AVSSA. Indeed, incoherent contexts appear to decrease the McGurk effect, suggesting that they produce lower audiovisual coherence hence less audiovisual fusion. The present experiments extend the AVSSA paradigm by creating contexts made of competing audiovisual sources and measuring their effect on McGurk targets. The competing audiovisual sources have respectively a high and a low audiovisual coherence (that is, large vs. small audiovisual comodulations in time). The first experiment involves contexts made of two auditory sources and one video source associated to either the first or the second audio source. It appears that the McGurk effect is smaller after the context made of the visual source associated to the auditory source with less audiovisual coherence. In the second experiment with the same stimuli, the participants are asked to attend to either one or the other source. The data show that the modulation of fusion depends on the attentional focus. Altogether, these two experiments shed light on audiovisual binding, the AVSSA process and the role of attention.
Covic, Amra; Keitel, Christian; Porcu, Emanuele; Schröger, Erich; Müller, Matthias M
2017-11-01
The neural processing of a visual stimulus can be facilitated by attending to its position or by a co-occurring auditory tone. Using frequency-tagging, we investigated whether facilitation by spatial attention and audio-visual synchrony rely on similar neural processes. Participants attended to one of two flickering Gabor patches (14.17 and 17 Hz) located in opposite lower visual fields. Gabor patches further "pulsed" (i.e. showed smooth spatial frequency variations) at distinct rates (3.14 and 3.63 Hz). Frequency-modulating an auditory stimulus at the pulse-rate of one of the visual stimuli established audio-visual synchrony. Flicker and pulsed stimulation elicited stimulus-locked rhythmic electrophysiological brain responses that allowed tracking the neural processing of simultaneously presented Gabor patches. These steady-state responses (SSRs) were quantified in the spectral domain to examine visual stimulus processing under conditions of synchronous vs. asynchronous tone presentation and when respective stimulus positions were attended vs. unattended. Strikingly, unique patterns of effects on pulse- and flicker driven SSRs indicated that spatial attention and audiovisual synchrony facilitated early visual processing in parallel and via different cortical processes. We found attention effects to resemble the classical top-down gain effect facilitating both, flicker and pulse-driven SSRs. Audio-visual synchrony, in turn, only amplified synchrony-producing stimulus aspects (i.e. pulse-driven SSRs) possibly highlighting the role of temporally co-occurring sights and sounds in bottom-up multisensory integration. Copyright © 2017 Elsevier Inc. All rights reserved.
Liu, Baolin; Meng, Xianyao; Wang, Zhongning; Wu, Guangning
2011-11-14
In the present study, we used event-related potentials (ERPs) to examine whether semantic integration occurs for ecologically unrelated audio-visual information. Videos with synchronous audio-visual information were used as stimuli, where the auditory stimuli were sine wave sounds with different sound levels, and the visual stimuli were simple geometric figures with different areas. In the experiment, participants were shown an initial display containing a single shape (drawn from a set of 6 shapes) with a fixed size (14cm(2)) simultaneously with a 3500Hz tone of a fixed intensity (80dB). Following a short delay, another shape/tone pair was presented and the relationship between the size of the shape and the intensity of the tone varied across trials: in the V+A- condition, a large shape was paired with a soft tone; in the V+A+ condition, a large shape was paired with a loud tone, and so forth. The ERPs results revealed that N400 effect was elicited under the VA- condition (V+A- and V-A+) as compared to the VA+ condition (V+A+ and V-A-). It was shown that semantic integration would occur when simultaneous, ecologically unrelated auditory and visual stimuli enter the human brain. We considered that this semantic integration was based on semantic constraint of audio-visual information, which might come from the long-term learned association stored in the human brain and short-term experience of incoming information. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Liu, Hong; Zhang, Gaoyan; Liu, Baolin
2017-04-01
In the Chinese language, a polyphone is a kind of special character that has more than one pronunciation, with each pronunciation corresponding to a different meaning. Here, we aimed to reveal the cognitive processing of audio-visual information integration of polyphones in a sentence context using the event-related potential (ERP) method. Sentences ending with polyphones were presented to subjects simultaneously in both an auditory and a visual modality. Four experimental conditions were set in which the visual presentations were the same, but the pronunciations of the polyphones were: the correct pronunciation; another pronunciation of the polyphone; a semantically appropriate pronunciation but not the pronunciation of the polyphone; or a semantically inappropriate pronunciation but also not the pronunciation of the polyphone. The behavioral results demonstrated significant differences in response accuracies when judging the semantic meanings of the audio-visual sentences, which reflected the different demands on cognitive resources. The ERP results showed that in the early stage, abnormal pronunciations were represented by the amplitude of the P200 component. Interestingly, because the phonological information mediated access to the lexical semantics, the amplitude and latency of the N400 component changed linearly across conditions, which may reflect the gradually increased semantic mismatch in the four conditions when integrating the auditory pronunciation with the visual information. Moreover, the amplitude of the late positive shift (LPS) showed a significant correlation with the behavioral response accuracies, demonstrating that the LPS component reveals the demand of cognitive resources for monitoring and resolving semantic conflicts when integrating the audio-visual information.
A digital audio/video interleaving system. [for Shuttle Orbiter
NASA Technical Reports Server (NTRS)
Richards, R. W.
1978-01-01
A method of interleaving an audio signal with its associated video signal for simultaneous transmission or recording, and the subsequent separation of the two signals, is described. Comparisons are made between the new audio signal interleaving system and the Skylab Pam audio/video interleaving system, pointing out improvements gained by using the digital audio/video interleaving system. It was found that the digital technique is the simplest, most effective and most reliable method for interleaving audio and/or other types of data into the video signal for the Shuttle Orbiter application. Details of the design of a multiplexer capable of accommodating two basic data channels, each consisting of a single 31.5-kb/s digital bit stream are given. An adaptive slope delta modulation system is introduced to digitize audio signals, producing a high immunity of work intelligibility to channel errors, primarily due to the robust nature of the delta-modulation algorithm.
Valente, Daniel L.; Braasch, Jonas; Myrbeck, Shane A.
2012-01-01
Despite many studies investigating auditory spatial impressions in rooms, few have addressed the impact of simultaneous visual cues on localization and the perception of spaciousness. The current research presents an immersive audiovisual environment in which participants were instructed to make auditory width judgments in dynamic bi-modal settings. The results of these psychophysical tests suggest the importance of congruent audio visual presentation to the ecological interpretation of an auditory scene. Supporting data were accumulated in five rooms of ascending volumes and varying reverberation times. Participants were given an audiovisual matching test in which they were instructed to pan the auditory width of a performing ensemble to a varying set of audio and visual cues in rooms. Results show that both auditory and visual factors affect the collected responses and that the two sensory modalities coincide in distinct interactions. The greatest differences between the panned audio stimuli given a fixed visual width were found in the physical space with the largest volume and the greatest source distance. These results suggest, in this specific instance, a predominance of auditory cues in the spatial analysis of the bi-modal scene. PMID:22280585
PDF Lecture Materials for Online and ``Flipped'' Format Astronomy Courses
NASA Astrophysics Data System (ADS)
Kary, D. M.; Eisberg, J.
2013-04-01
Online astronomy courses typically rely on students reading the textbook and/or a set of text-based lecture notes to replace the “lecture” material. However, many of our students report that this is much less engaging than in-person lectures, especially given the amount of interactive work such as “think-pair-share” problems done in many astronomy classes. Students have similarly criticized direct lecture-capture. To address this, we have developed a set of PowerPoint-style presentations with embedded lecture audio combined with prompts for student interaction including think-pair-share questions. These are formatted PDF packages that can be used on a range of different computers using free software. The presentations are first developed using Microsoft PowerPoint software. Audio recordings of scripted lectures are then synchronized with the presentations and the entire package is converted to PDF using Adobe Presenter. This approach combines the ease of editing that PowerPoint provides along with the platform-independence of PDF. It's easy to add, remove, or edit individual slides as needed, and PowerPoint supports internal links so that think-pair-share questions can be inserted with links to feedback based on the answers selected. Modern PDF files support animated visuals with synchronized audio and they can be read using widely available free software. Using these files students in an online course can get many of the benefits of seeing and hearing the course material presented in an in-person lecture format. Students needing extra help in traditional lecture classes can use these presentations to help review the materials covered in lecture. Finally, the presentations can be used in a “flipped” format in which students work through the presentations outside of class time while spending the “lecture” time on in-class interaction.
Eyben, Florian; Weninger, Felix; Lehment, Nicolas; Schuller, Björn; Rigoll, Gerhard
2013-01-01
Without doubt general video and sound, as found in large multimedia archives, carry emotional information. Thus, audio and video retrieval by certain emotional categories or dimensions could play a central role for tomorrow's intelligent systems, enabling search for movies with a particular mood, computer aided scene and sound design in order to elicit certain emotions in the audience, etc. Yet, the lion's share of research in affective computing is exclusively focusing on signals conveyed by humans, such as affective speech. Uniting the fields of multimedia retrieval and affective computing is believed to lend to a multiplicity of interesting retrieval applications, and at the same time to benefit affective computing research, by moving its methodology "out of the lab" to real-world, diverse data. In this contribution, we address the problem of finding "disturbing" scenes in movies, a scenario that is highly relevant for computer-aided parental guidance. We apply large-scale segmental feature extraction combined with audio-visual classification to the particular task of detecting violence. Our system performs fully data-driven analysis including automatic segmentation. We evaluate the system in terms of mean average precision (MAP) on the official data set of the MediaEval 2012 evaluation campaign's Affect Task, which consists of 18 original Hollywood movies, achieving up to .398 MAP on unseen test data in full realism. An in-depth analysis of the worth of individual features with respect to the target class and the system errors is carried out and reveals the importance of peak-related audio feature extraction and low-level histogram-based video analysis.
Eyben, Florian; Weninger, Felix; Lehment, Nicolas; Schuller, Björn; Rigoll, Gerhard
2013-01-01
Without doubt general video and sound, as found in large multimedia archives, carry emotional information. Thus, audio and video retrieval by certain emotional categories or dimensions could play a central role for tomorrow's intelligent systems, enabling search for movies with a particular mood, computer aided scene and sound design in order to elicit certain emotions in the audience, etc. Yet, the lion's share of research in affective computing is exclusively focusing on signals conveyed by humans, such as affective speech. Uniting the fields of multimedia retrieval and affective computing is believed to lend to a multiplicity of interesting retrieval applications, and at the same time to benefit affective computing research, by moving its methodology “out of the lab” to real-world, diverse data. In this contribution, we address the problem of finding “disturbing” scenes in movies, a scenario that is highly relevant for computer-aided parental guidance. We apply large-scale segmental feature extraction combined with audio-visual classification to the particular task of detecting violence. Our system performs fully data-driven analysis including automatic segmentation. We evaluate the system in terms of mean average precision (MAP) on the official data set of the MediaEval 2012 evaluation campaign's Affect Task, which consists of 18 original Hollywood movies, achieving up to .398 MAP on unseen test data in full realism. An in-depth analysis of the worth of individual features with respect to the target class and the system errors is carried out and reveals the importance of peak-related audio feature extraction and low-level histogram-based video analysis. PMID:24391704
Characteristics of audio and sub-audio telluric signals
DOE Office of Scientific and Technical Information (OSTI.GOV)
Telford, W.M.
1977-06-01
Telluric current measurements in the audio and sub-audio frequency range, made in various parts of Canada and South America over the past four years, indicate that the signal amplitude is relatively uniform over 6 to 8 midday hours (LMT) except in Chile and that the signal anisotropy is reasonably constant in azimuth.
43 CFR 8365.2-2 - Audio devices.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 43 Public Lands: Interior 2 2013-10-01 2013-10-01 false Audio devices. 8365.2-2 Section 8365.2-2..., DEPARTMENT OF THE INTERIOR RECREATION PROGRAMS VISITOR SERVICES Rules of Conduct § 8365.2-2 Audio devices. On... audio device such as a radio, television, musical instrument, or other noise producing device or...
43 CFR 8365.2-2 - Audio devices.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 43 Public Lands: Interior 2 2012-10-01 2012-10-01 false Audio devices. 8365.2-2 Section 8365.2-2..., DEPARTMENT OF THE INTERIOR RECREATION PROGRAMS VISITOR SERVICES Rules of Conduct § 8365.2-2 Audio devices. On... audio device such as a radio, television, musical instrument, or other noise producing device or...
43 CFR 8365.2-2 - Audio devices.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 43 Public Lands: Interior 2 2011-10-01 2011-10-01 false Audio devices. 8365.2-2 Section 8365.2-2..., DEPARTMENT OF THE INTERIOR RECREATION PROGRAMS VISITOR SERVICES Rules of Conduct § 8365.2-2 Audio devices. On... audio device such as a radio, television, musical instrument, or other noise producing device or...
43 CFR 8365.2-2 - Audio devices.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 43 Public Lands: Interior 2 2014-10-01 2014-10-01 false Audio devices. 8365.2-2 Section 8365.2-2..., DEPARTMENT OF THE INTERIOR RECREATION PROGRAMS VISITOR SERVICES Rules of Conduct § 8365.2-2 Audio devices. On... audio device such as a radio, television, musical instrument, or other noise producing device or...
78 FR 18416 - Sixth Meeting: RTCA Special Committee 226, Audio Systems and Equipment
Federal Register 2010, 2011, 2012, 2013, 2014
2013-03-26
... 226, Audio Systems and Equipment AGENCY: Federal Aviation Administration (FAA), U.S. Department of Transportation (DOT). ACTION: Meeting Notice of RTCA Special Committee 226, Audio Systems and Equipment. SUMMARY... 226, Audio Systems and Equipment. DATES: The meeting will be held April 15-17, 2013 from 9:00 a.m.-5...
Could Audio-Described Films Benefit from Audio Introductions? An Audience Response Study
ERIC Educational Resources Information Center
Romero-Fresco, Pablo; Fryer, Louise
2013-01-01
Introduction: Time constraints limit the quantity and type of information conveyed in audio description (AD) for films, in particular the cinematic aspects. Inspired by introductory notes for theatre AD, this study developed audio introductions (AIs) for "Slumdog Millionaire" and "Man on Wire." Each AI comprised 10 minutes of…
Audio-Vision: Audio-Visual Interaction in Desktop Multimedia.
ERIC Educational Resources Information Center
Daniels, Lee
Although sophisticated multimedia authoring applications are now available to amateur programmers, the use of audio in of these programs has been inadequate. Due to the lack of research in the use of audio in instruction, there are few resources to assist the multimedia producer in using sound effectively and efficiently. This paper addresses the…
Audio Frequency Analysis in Mobile Phones
ERIC Educational Resources Information Center
Aguilar, Horacio Munguía
2016-01-01
A new experiment using mobile phones is proposed in which its audio frequency response is analyzed using the audio port for inputting external signal and getting a measurable output. This experiment shows how the limited audio bandwidth used in mobile telephony is the main cause of the poor speech quality in this service. A brief discussion is…
A Longitudinal, Quantitative Study of Student Attitudes towards Audio Feedback for Assessment
ERIC Educational Resources Information Center
Parkes, Mitchell; Fletcher, Peter
2017-01-01
This paper reports on the findings of a three-year longitudinal study investigating the experiences of postgraduate level students who were provided with audio feedback for their assessment. Results indicated that students positively received audio feedback. Overall, students indicated a preference for audio feedback over written feedback. No…
Audio-Tutorial Instruction: A Strategy For Teaching Introductory College Geology.
ERIC Educational Resources Information Center
Fenner, Peter; Andrews, Ted F.
The rationale of audio-tutorial instruction is discussed, and the history and development of the audio-tutorial botany program at Purdue University is described. Audio-tutorial programs in geology at eleven colleges and one school are described, illustrating several ways in which programs have been developed and integrated into courses. Programs…
Audio-video decision support for patients: the documentary genré as a basis for decision aids.
Volandes, Angelo E; Barry, Michael J; Wood, Fiona; Elwyn, Glyn
2013-09-01
Decision support tools are increasingly using audio-visual materials. However, disagreement exists about the use of audio-visual materials as they may be subjective and biased. This is a literature review of the major texts for documentary film studies to extrapolate issues of objectivity and bias from film to decision support tools. The key features of documentary films are that they attempt to portray real events and that the attempted reality is always filtered through the lens of the filmmaker. The same key features can be said of decision support tools that use audio-visual materials. Three concerns arising from documentary film studies as they apply to the use of audio-visual materials in decision support tools include whose perspective matters (stakeholder bias), how to choose among audio-visual materials (selection bias) and how to ensure objectivity (editorial bias). Decision science needs to start a debate about how audio-visual materials are to be used in decision support tools. Simply because audio-visual materials may be subjective and open to bias does not mean that we should not use them. Methods need to be found to ensure consensus around balance and editorial control, such that audio-visual materials can be used. © 2011 John Wiley & Sons Ltd.
Audio Motor Training at the Foot Level Improves Space Representation.
Aggius-Vella, Elena; Campus, Claudio; Finocchietti, Sara; Gori, Monica
2017-01-01
Spatial representation is developed thanks to the integration of visual signals with the other senses. It has been shown that the lack of vision compromises the development of some spatial representations. In this study we tested the effect of a new rehabilitation device called ABBI (Audio Bracelet for Blind Interaction) to improve space representation. ABBI produces an audio feedback linked to body movement. Previous studies from our group showed that this device improves the spatial representation of space in early blind adults around the upper part of the body. Here we evaluate whether the audio motor feedback produced by ABBI can also improve audio spatial representation of sighted individuals in the space around the legs. Forty five blindfolded sighted subjects participated in the study, subdivided into three experimental groups. An audio space localization (front-back discrimination) task was performed twice by all groups of subjects before and after different kind of training conditions. A group (experimental) performed an audio-motor training with the ABBI device placed on their foot. Another group (control) performed a free motor activity without audio feedback associated with body movement. The other group (control) passively listened to the ABBI sound moved at foot level by the experimenter without producing any body movement. Results showed that only the experimental group, which performed the training with the audio-motor feedback, showed an improvement in accuracy for sound discrimination. No improvement was observed for the two control groups. These findings suggest that the audio-motor training with ABBI improves audio space perception also in the space around the legs in sighted individuals. This result provides important inputs for the rehabilitation of the space representations in the lower part of the body.
Audio Motor Training at the Foot Level Improves Space Representation
Aggius-Vella, Elena; Campus, Claudio; Finocchietti, Sara; Gori, Monica
2017-01-01
Spatial representation is developed thanks to the integration of visual signals with the other senses. It has been shown that the lack of vision compromises the development of some spatial representations. In this study we tested the effect of a new rehabilitation device called ABBI (Audio Bracelet for Blind Interaction) to improve space representation. ABBI produces an audio feedback linked to body movement. Previous studies from our group showed that this device improves the spatial representation of space in early blind adults around the upper part of the body. Here we evaluate whether the audio motor feedback produced by ABBI can also improve audio spatial representation of sighted individuals in the space around the legs. Forty five blindfolded sighted subjects participated in the study, subdivided into three experimental groups. An audio space localization (front-back discrimination) task was performed twice by all groups of subjects before and after different kind of training conditions. A group (experimental) performed an audio-motor training with the ABBI device placed on their foot. Another group (control) performed a free motor activity without audio feedback associated with body movement. The other group (control) passively listened to the ABBI sound moved at foot level by the experimenter without producing any body movement. Results showed that only the experimental group, which performed the training with the audio-motor feedback, showed an improvement in accuracy for sound discrimination. No improvement was observed for the two control groups. These findings suggest that the audio-motor training with ABBI improves audio space perception also in the space around the legs in sighted individuals. This result provides important inputs for the rehabilitation of the space representations in the lower part of the body. PMID:29326564
Attention to sound improves auditory reliability in audio-tactile spatial optimal integration.
Vercillo, Tiziana; Gori, Monica
2015-01-01
The role of attention on multisensory processing is still poorly understood. In particular, it is unclear whether directing attention toward a sensory cue dynamically reweights cue reliability during integration of multiple sensory signals. In this study, we investigated the impact of attention in combining audio-tactile signals in an optimal fashion. We used the Maximum Likelihood Estimation (MLE) model to predict audio-tactile spatial localization on the body surface. We developed a new audio-tactile device composed by several small units, each one consisting of a speaker and a tactile vibrator independently controllable by external software. We tested participants in an attentional and a non-attentional condition. In the attentional experiment, participants performed a dual task paradigm: they were required to evaluate the duration of a sound while performing an audio-tactile spatial task. Three unisensory or multisensory stimuli, conflictual or not conflictual sounds and vibrations arranged along the horizontal axis, were presented sequentially. In the primary task participants had to evaluate in a space bisection task the position of the second stimulus (the probe) with respect to the others (the standards). In the secondary task they had to report occasionally changes in duration of the second auditory stimulus. In the non-attentional task participants had only to perform the primary task (space bisection). Our results showed an enhanced auditory precision (and auditory weights) in the auditory attentional condition with respect to the control non-attentional condition. The results of this study support the idea that modality-specific attention modulates multisensory integration.
Virtual environment display for a 3D audio room simulation
NASA Astrophysics Data System (ADS)
Chapin, William L.; Foster, Scott
1992-06-01
Recent developments in virtual 3D audio and synthetic aural environments have produced a complex acoustical room simulation. The acoustical simulation models a room with walls, ceiling, and floor of selected sound reflecting/absorbing characteristics and unlimited independent localizable sound sources. This non-visual acoustic simulation, implemented with 4 audio ConvolvotronsTM by Crystal River Engineering and coupled to the listener with a Poihemus IsotrakTM, tracking the listener's head position and orientation, and stereo headphones returning binaural sound, is quite compelling to most listeners with eyes closed. This immersive effect should be reinforced when properly integrated into a full, multi-sensory virtual environment presentation. This paper discusses the design of an interactive, visual virtual environment, complementing the acoustic model and specified to: 1) allow the listener to freely move about the space, a room of manipulable size, shape, and audio character, while interactively relocating the sound sources; 2) reinforce the listener's feeling of telepresence into the acoustical environment with visual and proprioceptive sensations; 3) enhance the audio with the graphic and interactive components, rather than overwhelm or reduce it; and 4) serve as a research testbed and technology transfer demonstration. The hardware/software design of two demonstration systems, one installed and one portable, are discussed through the development of four iterative configurations. The installed system implements a head-coupled, wide-angle, stereo-optic tracker/viewer and multi-computer simulation control. The portable demonstration system implements a head-mounted wide-angle, stereo-optic display, separate head and pointer electro-magnetic position trackers, a heterogeneous parallel graphics processing system, and object oriented C++ program code.
An Audio-Visual Presentation of Black Francophone Poetry.
ERIC Educational Resources Information Center
Bruner, Charlotte H.
1982-01-01
A college class project to develop a videocassette presentation of African, Caribbean, and Afro-American French poetry is described from its inception through the processes of obtaining copyright and translation permissions, arranging scripts, presenting at various functions, and reception by Francophone and non-Francophone audiences. (MSE)
Embedded security system for multi-modal surveillance in a railway carriage
NASA Astrophysics Data System (ADS)
Zouaoui, Rhalem; Audigier, Romaric; Ambellouis, Sébastien; Capman, François; Benhadda, Hamid; Joudrier, Stéphanie; Sodoyer, David; Lamarque, Thierry
2015-10-01
Public transport security is one of the main priorities of the public authorities when fighting against crime and terrorism. In this context, there is a great demand for autonomous systems able to detect abnormal events such as violent acts aboard passenger cars and intrusions when the train is parked at the depot. To this end, we present an innovative approach which aims at providing efficient automatic event detection by fusing video and audio analytics and reducing the false alarm rate compared to classical stand-alone video detection. The multi-modal system is composed of two microphones and one camera and integrates onboard video and audio analytics and fusion capabilities. On the one hand, for detecting intrusion, the system relies on the fusion of "unusual" audio events detection with intrusion detections from video processing. The audio analysis consists in modeling the normal ambience and detecting deviation from the trained models during testing. This unsupervised approach is based on clustering of automatically extracted segments of acoustic features and statistical Gaussian Mixture Model (GMM) modeling of each cluster. The intrusion detection is based on the three-dimensional (3D) detection and tracking of individuals in the videos. On the other hand, for violent events detection, the system fuses unsupervised and supervised audio algorithms with video event detection. The supervised audio technique detects specific events such as shouts. A GMM is used to catch the formant structure of a shout signal. Video analytics use an original approach for detecting aggressive motion by focusing on erratic motion patterns specific to violent events. As data with violent events is not easily available, a normality model with structured motions from non-violent videos is learned for one-class classification. A fusion algorithm based on Dempster-Shafer's theory analyses the asynchronous detection outputs and computes the degree of belief of each probable event.
ERIC Educational Resources Information Center
Bilbro, J.; Iluzada, C.; Clark, D. E.
2013-01-01
The authors compared student perceptions of audio and written feedback in order to assess what types of students may benefit from receiving audio feedback on their essays rather than written feedback. Many instructors previously have reported the advantages they see in audio feedback, but little quantitative research has been done on how the…
Design and Usability Testing of an Audio Platform Game for Players with Visual Impairments
ERIC Educational Resources Information Center
Oren, Michael; Harding, Chris; Bonebright, Terri L.
2008-01-01
This article reports on the evaluation of a novel audio platform game that creates a spatial, interactive experience via audio cues. A pilot study with players with visual impairments, and usability testing comparing the visual and audio game versions using both sighted players and players with visual impairments, revealed that all the…
78 FR 57673 - Eighth Meeting: RTCA Special Committee 226, Audio Systems and Equipment
Federal Register 2010, 2011, 2012, 2013, 2014
2013-09-19
... Committee 226, Audio Systems and Equipment AGENCY: Federal Aviation Administration (FAA), U.S. Department of Transportation (DOT). ACTION: Meeting Notice of RTCA Special Committee 226, Audio Systems and Equipment. SUMMARY... Committee 226, Audio Systems and Equipment. DATES: The meeting will be held October 8-10, 2012 from 9:00 a.m...
77 FR 37732 - Fourteenth Meeting: RTCA Special Committee 224, Audio Systems and Equipment
Federal Register 2010, 2011, 2012, 2013, 2014
2012-06-22
... Committee 224, Audio Systems and Equipment AGENCY: Federal Aviation Administration (FAA), U.S. Department of Transportation (DOT). ACTION: Meeting Notice of RTCA Special Committee 224, Audio Systems and Equipment. SUMMARY... Committee 224, Audio Systems and Equipment. DATES: The meeting will be held July 11, 2012, from 10 a.m.-4 p...
Federal Register 2010, 2011, 2012, 2013, 2014
2011-09-19
... Rules and Policies for the Satellite Digital Audio Radio Service in the 2310-2360 MHz Frequency Band... Digital Audio Radio Service (SDARS) Second Report and Order. The information collection requirements were... of these rule sections. See Satellite Digital Audio Radio Service (SDARS) Second Report and Order...
The Use of Asynchronous Audio Feedback with Online RN-BSN Students
ERIC Educational Resources Information Center
London, Julie E.
2013-01-01
The use of audio technology by online nursing educators is a recent phenomenon. Research has been conducted in the area of audio technology in different domains and populations, but very few researchers have focused on nursing. Preliminary results have indicated that using audio in place of text can increase student cognition and socialization.…
ERIC Educational Resources Information Center
Aleman-Centeno, Josefina R.
1983-01-01
Discusses the development and evaluation of CAVIS, which consists of an Apple microcomputer used with audiovisual dialogs. Includes research on the effects of three conditions: (1) computer with audio and visual, (2) computer with audio alone and (3) audio alone in short-term and long-term recall. (EKN)
Low-delay predictive audio coding for the HIVITS HDTV codec
NASA Astrophysics Data System (ADS)
McParland, A. K.; Gilchrist, N. H. C.
1995-01-01
The status of work relating to predictive audio coding, as part of the European project on High Quality Video Telephone and HD(TV) Systems (HIVITS), is reported. The predictive coding algorithm is developed, along with six-channel audio coding and decoding hardware. Demonstrations of the audio codec operating in conjunction with the video codec, are given.
2008-09-24
Director's Colloquium: Ruslan Belikov, Ames Astrophysicist presents 'Imaging other Earths and High Contrast Coronagraphy at Ames abstract: Exoplanet detection over the past decade - Audio available through Ames Library
Detection of goal events in soccer videos
NASA Astrophysics Data System (ADS)
Kim, Hyoung-Gook; Roeber, Steffen; Samour, Amjad; Sikora, Thomas
2005-01-01
In this paper, we present an automatic extraction of goal events in soccer videos by using audio track features alone without relying on expensive-to-compute video track features. The extracted goal events can be used for high-level indexing and selective browsing of soccer videos. The detection of soccer video highlights using audio contents comprises three steps: 1) extraction of audio features from a video sequence, 2) event candidate detection of highlight events based on the information provided by the feature extraction Methods and the Hidden Markov Model (HMM), 3) goal event selection to finally determine the video intervals to be included in the summary. For this purpose we compared the performance of the well known Mel-scale Frequency Cepstral Coefficients (MFCC) feature extraction method vs. MPEG-7 Audio Spectrum Projection feature (ASP) extraction method based on three different decomposition methods namely Principal Component Analysis( PCA), Independent Component Analysis (ICA) and Non-Negative Matrix Factorization (NMF). To evaluate our system we collected five soccer game videos from various sources. In total we have seven hours of soccer games consisting of eight gigabytes of data. One of five soccer games is used as the training data (e.g., announcers' excited speech, audience ambient speech noise, audience clapping, environmental sounds). Our goal event detection results are encouraging.
The Impact of Audio Book on the Elderly Mental Health.
Ameri, Fereshteh; Vazifeshenas, Naser; Haghparast, Abbas
2017-01-01
The growing elderly population calls mental health professionals to take measures concerning the treatment of the elderly mental disorders. Today in developed countries, bibliotherapy is used for the treatment of the most prevalent psychiatric disorders. Therefore, this study aimed to investigate the effects of audio book on the elderly mental health of Retirement Center of Shahid Beheshti University of Medical Sciences. This experimental study was conducted on 60 elderly people participated in 8 audio book presentation sessions, and their mental health aspects were evaluated through mental health questionnaire (SCL-90-R). Data were analyzed using SPSS 24. Data analysis revealed that the mean difference of pretest and posttest of control group is less than 5.0, so no significant difference was observed in their mental health, but this difference was significant in the experimental group (more than 5.0). Therefore, a significant improvement in mental health and its dimensions have observed in elderly people participated in audio book sessions. This therapeutic intervention was effective on mental health dimensions of paranoid ideation, psychosis, phobia, aggression, depression, interpersonal sensitivity, anxiety, obsessive-compulsive and somatic complaints. Considering the fact that our population is moving toward aging, the obtained results could be useful for policy makers and health and social planners to improve the health status of the elderly.
Audio Spatial Representation Around the Body
Aggius-Vella, Elena; Campus, Claudio; Finocchietti, Sara; Gori, Monica
2017-01-01
Studies have found that portions of space around our body are differently coded by our brain. Numerous works have investigated visual and auditory spatial representation, focusing mostly on the spatial representation of stimuli presented at head level, especially in the frontal space. Only few studies have investigated spatial representation around the entire body and its relationship with motor activity. Moreover, it is still not clear whether the space surrounding us is represented as a unitary dimension or whether it is split up into different portions, differently shaped by our senses and motor activity. To clarify these points, we investigated audio localization of dynamic and static sounds at different body levels. In order to understand the role of a motor action in auditory space representation, we asked subjects to localize sounds by pointing with the hand or the foot, or by giving a verbal answer. We found that the audio sound localization was different depending on the body part considered. Moreover, a different pattern of response was observed when subjects were asked to make actions with respect to the verbal responses. These results suggest that the audio space around our body is split in various spatial portions, which are perceived differently: front, back, around chest, and around foot, suggesting that these four areas could be differently modulated by our senses and our actions. PMID:29249999
Young children's recall and reconstruction of audio and audiovisual narratives.
Gibbons, J; Anderson, D R; Smith, R; Field, D E; Fischer, C
1986-08-01
It has been claimed that the visual component of audiovisual media dominates young children's cognitive processing. This experiment examines the effects of input modality while controlling the complexity of the visual and auditory content and while varying the comprehension task (recall vs. reconstruction). 4- and 7-year-olds were presented brief stories through either audio or audiovisual media. The audio version consisted of narrated character actions and character utterances. The narrated actions were matched to the utterances on the basis of length and propositional complexity. The audiovisual version depicted the actions visually by means of stop animation instead of by auditory narrative statements. The character utterances were the same in both versions. Audiovisual input produced superior performance on explicit information in the 4-year-olds and produced more inferences at both ages. Because performance on utterances was superior in the audiovisual condition as compared to the audio condition, there was no evidence that visual input inhibits processing of auditory information. Actions were more likely to be produced by the younger children than utterances, regardless of input medium, indicating that prior findings of visual dominance may have been due to the salience of narrative action. Reconstruction, as compared to recall, produced superior depiction of actions at both ages as well as more constrained relevant inferences and narrative conventions.
The Impact of Audio Book on the Elderly Mental Health
Ameri, Fereshteh; Vazifeshenas, Naser; Haghparast, Abbas
2017-01-01
Introduction: The growing elderly population calls mental health professionals to take measures concerning the treatment of the elderly mental disorders. Today in developed countries, bibliotherapy is used for the treatment of the most prevalent psychiatric disorders. Therefore, this study aimed to investigate the effects of audio book on the elderly mental health of Retirement Center of Shahid Beheshti University of Medical Sciences. Methods: This experimental study was conducted on 60 elderly people participated in 8 audio book presentation sessions, and their mental health aspects were evaluated through mental health questionnaire (SCL-90-R). Data were analyzed using SPSS 24. Results: Data analysis revealed that the mean difference of pretest and posttest of control group is less than 5.0, so no significant difference was observed in their mental health, but this difference was significant in the experimental group (more than 5.0). Therefore, a significant improvement in mental health and its dimensions have observed in elderly people participated in audio book sessions. This therapeutic intervention was effective on mental health dimensions of paranoid ideation, psychosis, phobia, aggression, depression, interpersonal sensitivity, anxiety, obsessive-compulsive and somatic complaints. Conclusion: Considering the fact that our population is moving toward aging, the obtained results could be useful for policy makers and health and social planners to improve the health status of the elderly. PMID:29167723
Tensions in the field: teaching standards of practice in optometry case presentations.
Spafford, Marlee M; Lingard, Lorelei; Schryer, Catherine F; Hrynchak, Patricia K
2004-10-01
Professional identity formation and its relationship to case presentations were studied in an optometry school's onsite clinic. Eight optometry students and six faculty optometrists were audio-recorded during 31 oral case presentations and the teaching exchanges related to them. Using convenience sampling, interviews were audio-recorded of four of the students and four of the optometrists from the field observations. After transcribing these audio-recordings, the research team members applied a grounded theory method to identify, test, and revise emergent themes. The theme reported herein pertains to communicating standards of practice. Faculty optometrists demonstrated three ways of communicating standards of practice to optometry students during case presentations: Official Way, Our Way, and My Way. Although there were differences between these standards, the rationale for the disparities was rarely explicitly articulated by the instructors to the students. Without this information, the incongruity among the standards was left to the students to interpret on their own. The risk created by faculty not articulating the rationale underlying standards of practice was that students misinterpreted the optometrists' ways as idiosyncratic. Thus, opportunities were missed in the educational setting to assist students in making responsible decisions, locating their position in practice, and shaping their professional identity. Competing responsibilities of patient care and student education left instructors with little time to articulate rationale for standards of practice. Therefore, educators must reflect on innovative ways to bring into relief the logic behind their actions when working with novices.
Sinusoidal Analysis-Synthesis of Audio Using Perceptual Criteria
NASA Astrophysics Data System (ADS)
Painter, Ted; Spanias, Andreas
2003-12-01
This paper presents a new method for the selection of sinusoidal components for use in compact representations of narrowband audio. The method consists of ranking and selecting the most perceptually relevant sinusoids. The idea behind the method is to maximize the matching between the auditory excitation pattern associated with the original signal and the corresponding auditory excitation pattern associated with the modeled signal that is being represented by a small set of sinusoidal parameters. The proposed component-selection methodology is shown to outperform the maximum signal-to-mask ratio selection strategy in terms of subjective quality.
Inexpensive Audio Activities: Earbud-based Sound Experiments
NASA Astrophysics Data System (ADS)
Allen, Joshua; Boucher, Alex; Meggison, Dean; Hruby, Kate; Vesenka, James
2016-11-01
Inexpensive alternatives to a number of classic introductory physics sound laboratories are presented including interference phenomena, resonance conditions, and frequency shifts. These can be created using earbuds, economical supplies such as Giant Pixie Stix® wrappers, and free software available for PCs and mobile devices. We describe two interference laboratories (beat frequency and two-speaker interference) and two resonance laboratories (quarter- and half-wavelength). Lastly, a Doppler laboratory using rotating earbuds is explained. The audio signal captured by all experiments is analyzed on free spectral analysis software and many of the experiments incorporate the unifying theme of measuring the speed of sound in air.
Code of Federal Regulations, 2011 CFR
2011-10-01
... Digital Audio Broadcasting § 73.402 Definitions. (a) DAB. Digital audio broadcast stations are those radio... into multiple channels for additional audio programming uses. (g) Datacasting. Subdividing the digital...
Code of Federal Regulations, 2012 CFR
2012-10-01
... Digital Audio Broadcasting § 73.402 Definitions. (a) DAB. Digital audio broadcast stations are those radio... into multiple channels for additional audio programming uses. (g) Datacasting. Subdividing the digital...
Code of Federal Regulations, 2014 CFR
2014-10-01
... Digital Audio Broadcasting § 73.402 Definitions. (a) DAB. Digital audio broadcast stations are those radio... into multiple channels for additional audio programming uses. (g) Datacasting. Subdividing the digital...
Code of Federal Regulations, 2013 CFR
2013-10-01
... Digital Audio Broadcasting § 73.402 Definitions. (a) DAB. Digital audio broadcast stations are those radio... into multiple channels for additional audio programming uses. (g) Datacasting. Subdividing the digital...
DOE Office of Scientific and Technical Information (OSTI.GOV)
George, Rohini; Department of Biomedical Engineering, Virginia Commonwealth University, Richmond, VA; Chung, Theodore D.
2006-07-01
Purpose: Respiratory gating is a commercially available technology for reducing the deleterious effects of motion during imaging and treatment. The efficacy of gating is dependent on the reproducibility within and between respiratory cycles during imaging and treatment. The aim of this study was to determine whether audio-visual biofeedback can improve respiratory reproducibility by decreasing residual motion and therefore increasing the accuracy of gated radiotherapy. Methods and Materials: A total of 331 respiratory traces were collected from 24 lung cancer patients. The protocol consisted of five breathing training sessions spaced about a week apart. Within each session the patients initially breathedmore » without any instruction (free breathing), with audio instructions and with audio-visual biofeedback. Residual motion was quantified by the standard deviation of the respiratory signal within the gating window. Results: Audio-visual biofeedback significantly reduced residual motion compared with free breathing and audio instruction. Displacement-based gating has lower residual motion than phase-based gating. Little reduction in residual motion was found for duty cycles less than 30%; for duty cycles above 50% there was a sharp increase in residual motion. Conclusions: The efficiency and reproducibility of gating can be improved by: incorporating audio-visual biofeedback, using a 30-50% duty cycle, gating during exhalation, and using displacement-based gating.« less
ERIC Educational Resources Information Center
Bergman, Daniel
2015-01-01
This study examined the effects of audio and video self-recording on preservice teachers' written reflections. Participants (n = 201) came from a secondary teaching methods course and its school-based (clinical) fieldwork. The audio group (n[subscript A] = 106) used audio recorders to monitor their teaching in fieldwork placements; the video group…
ERIC Educational Resources Information Center
Rush, S. Craig
2014-01-01
This article draws on the author's experience using qualitative video and audio analysis, most notably through use of the Transana qualitative video and audio analysis software program, as an alternative method for teaching IQ administration skills to students in a graduate psychology program. Qualitative video and audio analysis may be useful for…
Development and Assessment of Web Courses That Use Streaming Audio and Video Technologies.
ERIC Educational Resources Information Center
Ingebritsen, Thomas S.; Flickinger, Kathleen
Iowa State University, through a program called Project BIO (Biology Instructional Outreach), has been using RealAudio technology for about 2 years in college biology courses that are offered entirely via the World Wide Web. RealAudio is a type of streaming media technology that can be used to deliver audio content and a variety of other media…
Audio distribution and Monitoring Circuit
NASA Technical Reports Server (NTRS)
Kirkland, J. M.
1983-01-01
Versatile circuit accepts and distributes TV audio signals. Three-meter audio distribution and monitoring circuit provides flexibility in monitoring, mixing, and distributing audio inputs and outputs at various signal and impedance levels. Program material is simultaneously monitored on three channels, or single-channel version built to monitor transmitted or received signal levels, drive speakers, interface to building communications, and drive long-line circuits.
Hearing You Loud and Clear: Student Perspectives of Audio Feedback in Higher Education
ERIC Educational Resources Information Center
Gould, Jill; Day, Pat
2013-01-01
The use of audio feedback for students in a full-time community nursing degree course is appraised. The aim of this mixed methods study was to examine student views on audio feedback for written assignments. Questionnaires and a focus group were used to capture student opinion of this pilot project. The majority of students valued audio feedback…
How we give personalised audio feedback after summative OSCEs.
Harrison, Christopher J; Molyneux, Adrian J; Blackwell, Sara; Wass, Valerie J
2015-04-01
Students often receive little feedback after summative objective structured clinical examinations (OSCEs) to enable them to improve their performance. Electronic audio feedback has shown promise in other educational areas. We investigated the feasibility of electronic audio feedback in OSCEs. An electronic OSCE system was designed, comprising (1) an application for iPads allowing examiners to mark in the key consultation skill domains, provide "tick-box" feedback identifying strengths and difficulties, and record voice feedback; (2) a feedback website giving students the opportunity to view/listen in multiple ways to the feedback. Acceptability of the audio feedback was investigated, using focus groups with students and questionnaires with both examiners and students. 87 (95%) students accessed the examiners' audio comments; 83 (90%) found the comments useful and 63 (68%) reported changing the way they perform a skill as a result of the audio feedback. They valued its highly personalised, relevant nature and found it much more useful than written feedback. Eighty-nine per cent of examiners gave audio feedback to all students on their stations. Although many found the method easy, lack of time was a factor. Electronic audio feedback provides timely, personalised feedback to students after a summative OSCE provided enough time is allocated to the process.
Guidelines for Effective Teleconference Presentations in Continuing Medical Education.
ERIC Educational Resources Information Center
Raszkowski, Robert R.; Chute, Alan G.
Designing teleconference programs for the physician learner puts unique demands on the teleconferencing medium. Typically, physicians expect a 1-hour lecture presentation with high information density. To effectively present the medical content material in an audio medium, strategies which structure and organize the content material are necessary.…
A Portable Presentation Package for Audio-Visual Instruction. Technical Documentary Report.
ERIC Educational Resources Information Center
Smith, Edgar A.; And Others
The Portable Presentation Package is a prototype of an audiovisual equipment package designed to facilitate technical training in remote areas, situations in which written communications are difficult, or in situations requiring rapid presentation of instructional material. The major criteria employed in developing the package were (1) that the…
Audio Steganography with Embedded Text
NASA Astrophysics Data System (ADS)
Teck Jian, Chua; Chai Wen, Chuah; Rahman, Nurul Hidayah Binti Ab.; Hamid, Isredza Rahmi Binti A.
2017-08-01
Audio steganography is about hiding the secret message into the audio. It is a technique uses to secure the transmission of secret information or hide their existence. It also may provide confidentiality to secret message if the message is encrypted. To date most of the steganography software such as Mp3Stego and DeepSound use block cipher such as Advanced Encryption Standard or Data Encryption Standard to encrypt the secret message. It is a good practice for security. However, the encrypted message may become too long to embed in audio and cause distortion of cover audio if the secret message is too long. Hence, there is a need to encrypt the message with stream cipher before embedding the message into the audio. This is because stream cipher provides bit by bit encryption meanwhile block cipher provide a fixed length of bits encryption which result a longer output compare to stream cipher. Hence, an audio steganography with embedding text with Rivest Cipher 4 encryption cipher is design, develop and test in this project.
Design and implementation of an audio indicator
NASA Astrophysics Data System (ADS)
Zheng, Shiyong; Li, Zhao; Li, Biqing
2017-04-01
This page proposed an audio indicator which designed by using C9014, LED by operational amplifier level indicator, the decimal count/distributor of CD4017. The experimental can control audibly neon and holiday lights through the signal. Input audio signal after C9014 composed of operational amplifier for power amplifier, the adjust potentiometer extraction amplification signal input voltage CD4017 distributors make its drive to count, then connect the LED display running situation of the circuit. This simple audio indicator just use only U1 and can produce two colors LED with the audio signal tandem come pursuit of the running effect, from LED display the running of the situation takes can understand the general audio signal. The variation in the audio and the frequency of the signal and the corresponding level size. In this light can achieve jump to change, slowly, atlas, lighting four forms, used in home, hotel, discos, theater, advertising and other fields, and a wide range of USES, rU1h life in a modern society.
Ultrasonic speech translator and communications system
Akerman, M.A.; Ayers, C.W.; Haynes, H.D.
1996-07-23
A wireless communication system undetectable by radio frequency methods for converting audio signals, including human voice, to electronic signals in the ultrasonic frequency range, transmitting the ultrasonic signal by way of acoustical pressure waves across a carrier medium, including gases, liquids, or solids, and reconverting the ultrasonic acoustical pressure waves back to the original audio signal. The ultrasonic speech translator and communication system includes an ultrasonic transmitting device and an ultrasonic receiving device. The ultrasonic transmitting device accepts as input an audio signal such as human voice input from a microphone or tape deck. The ultrasonic transmitting device frequency modulates an ultrasonic carrier signal with the audio signal producing a frequency modulated ultrasonic carrier signal, which is transmitted via acoustical pressure waves across a carrier medium such as gases, liquids or solids. The ultrasonic receiving device converts the frequency modulated ultrasonic acoustical pressure waves to a frequency modulated electronic signal, demodulates the audio signal from the ultrasonic carrier signal, and conditions the demodulated audio signal to reproduce the original audio signal at its output. 7 figs.
Leach, Matthew J; Canaway, Rachel; Hunter, Jennifer
2018-05-01
To develop a policy, practice, education and research agenda for evidence-based practice (EBP) in traditional and complementary medicine (T&CM). The study was a secondary analysis of qualitative data, using the method of roundtable discussion. The sample comprised seventeen experts in EBP and T&CM. The discussion was audio-recorded, and the transcript analysed using thematic analysis. Four central themes emerged from the data; understanding evidence and EBP, drivers of change, interpersonal interaction, and moving forward. Captured within these themes were fifteen sub-themes. These themes/sub-themes translated into three broad calls to action: (1) defining terminology, (2) defining the EBP approach, and (3) fostering social movement. These calls to action formed the framework of the agenda. This analysis presents a potential framework for an agenda to improve EBP implementation in T&CM. The fundamental elements of this action plan seek clarification, leadership and unification on the issue of EBP in T&CM. Copyright © 2018 Elsevier Ltd. All rights reserved.
Lundeborg Hammarström, Inger
2018-01-01
The present study investigated word-initial (WI) /r/-clusters in Central Swedish-speaking children with and without protracted phonological development (PPD). Data for WI singleton /r/ and singleton and cluster /l/ served as comparisons. Participants were twelve 4-year-olds with PPD and twelve age- and gender-matched typically developing (TD) controls. Native speakers audio-recorded and transcribed 109 target single words using a Swedish phonology test with 12 WI C+/r/-clusters and three WI CC+/r/-clusters. The results showed significantly higher match scores for the TD children, a lower match proportion for the /r/ targets and for singletons compared with clusters, and differences in mismatch patterns between the groups. There were no matches for /r/-cluster targets in the PPD group, with all children except two in that group showing deletions for both /r/-cluster types. The differences in mismatch proportions and types between the PPD group and controls suggests new directions for future clinical practice.
Medical Subspecialty Textbooks in the 21st Century. Essential or Headed for Extinction?
Broaddus, V Courtney; Grippi, Michael A
2015-08-01
In recent years, the role of medical subspecialty textbooks as sources of information for students, trainees, and practicing clinicians has been challenged. Although the structure of textbooks continues to evolve from standard, printed versions to digital formats, including e-books and online texts, we maintain that the authoritative compilation of clinical and scientific material by experts in the field (i.e., a modern-day textbook) remains central to the education, training, and practice of subspecialists. Regardless of format, an effective medical subspecialty textbook is authoritative, comprehensive, and integrated in its coverage of the subject. Textbook content represents a unique synthesis of clinical and scientific material of real educational and clinical value. Incorporation of illustrations, including figures, tables, videos, and audios, bolsters the presentation and further solidifies the reader's understanding of the subject. The textbook, both printed and digital, reinforces the many widely available online resources and serves as a platform from which to evaluate other sources of information and to launch additional scientific and clinical inquiry.
Dance on cortex: enhanced theta synchrony in experts when watching a dance piece.
Poikonen, Hanna; Toiviainen, Petri; Tervaniemi, Mari
2018-03-01
When watching performing arts, a wide and complex network of brain processes emerge. These processes can be shaped by professional expertise. When compared to laymen, dancers have enhanced processes in observation of short dance movement and listening to music. But how do the cortical processes differ in musicians and dancers when watching an audio-visual dance performance? In our study, we presented the participants long excerpts from the contemporary dance choreography of Carmen. During multimodal movement of a dancer, theta phase synchrony over the fronto-central electrodes was stronger in dancers when compared to musicians and laymen. In addition, alpha synchrony was decreased in all groups during large rapid movement when compared to nearly motionless parts of the choreography. Our results suggest an enhanced cortical communication in dancers when watching dance and, further, that this enhancement is rather related to multimodal, cognitive and emotional processes than to simple observation of dance movement. © 2018 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Edwards, Roger L; Edwards, Sandra L; Bryner, James; Cunningham, Kelly; Rogers, Amy; Slattery, Martha L
2008-04-01
We describe a computer-assisted data collection system developed for a multicenter cohort study of American Indian and Alaska Native people. The study computer-assisted participant evaluation system or SCAPES is built around a central database server that controls a small private network with touch screen workstations. SCAPES encompasses the self-administered questionnaires, the keyboard-based stations for interviewer-administered questionnaires, a system for inputting medical measurements, and administrative tasks such as data exporting, backup and management. Elements of SCAPES hardware/network design, data storage, programming language, software choices, questionnaire programming including the programming of questionnaires administered using audio computer-assisted self-interviewing (ACASI), and participant identification/data security system are presented. Unique features of SCAPES are that data are promptly made available to participants in the form of health feedback; data can be quickly summarized for tribes for health monitoring and planning at the community level; and data are available to study investigators for analyses and scientific evaluation.
Alderete, John; Davies, Monica
2018-04-01
This work describes a methodology of collecting speech errors from audio recordings and investigates how some of its assumptions affect data quality and composition. Speech errors of all types (sound, lexical, syntactic, etc.) were collected by eight data collectors from audio recordings of unscripted English speech. Analysis of these errors showed that: (i) different listeners find different errors in the same audio recordings, but (ii) the frequencies of error patterns are similar across listeners; (iii) errors collected "online" using on the spot observational techniques are more likely to be affected by perceptual biases than "offline" errors collected from audio recordings; and (iv) datasets built from audio recordings can be explored and extended in a number of ways that traditional corpus studies cannot be.
MAC, A System for Automatically IPR Identification, Collection and Distribution
NASA Astrophysics Data System (ADS)
Serrão, Carlos
Controlling Intellectual Property Rights (IPR) in the Digital World is a very hard challenge. The facility to create multiple bit-by-bit identical copies from original IPR works creates the opportunities for digital piracy. One of the most affected industries by this fact is the Music Industry. The Music Industry has supported huge losses during the last few years due to this fact. Moreover, this fact is also affecting the way that music rights collecting and distributing societies are operating to assure a correct music IPR identification, collection and distribution. In this article a system for automating this IPR identification, collection and distribution is presented and described. This system makes usage of advanced automatic audio identification system based on audio fingerprinting technology. This paper will present the details of the system and present a use-case scenario where this system is being used.
NASA Technical Reports Server (NTRS)
Begault, Durand R.
1993-01-01
The advantage of a head-up auditory display was evaluated in a preliminary experiment designed to measure and compare the acquisition time for capturing visual targets under two auditory conditions: standard one-earpiece presentation and two-earpiece three-dimensional (3D) audio presentation. Twelve commercial airline crews were tested under full mission simulation conditions at the NASA-Ames Man-Vehicle Systems Research Facility advanced concepts flight simulator. Scenario software generated visual targets corresponding to aircraft that would activate a traffic collision avoidance system (TCAS) aural advisory; the spatial auditory position was linked to the visual position with 3D audio presentation. Results showed that crew members using a 3D auditory display acquired targets approximately 2.2 s faster than did crew members who used one-earpiece head- sets, but there was no significant difference in the number of targets acquired.
Sonification of optical coherence tomography data and images
Ahmad, Adeel; Adie, Steven G.; Wang, Morgan; Boppart, Stephen A.
2010-01-01
Sonification is the process of representing data as non-speech audio signals. In this manuscript, we describe the auditory presentation of OCT data and images. OCT acquisition rates frequently exceed our ability to visually analyze image-based data, and multi-sensory input may therefore facilitate rapid interpretation. This conversion will be especially valuable in time-sensitive surgical or diagnostic procedures. In these scenarios, auditory feedback can complement visual data without requiring the surgeon to constantly monitor the screen, or provide additional feedback in non-imaging procedures such as guided needle biopsies which use only axial-scan data. In this paper we present techniques to translate OCT data and images into sound based on the spatial and spatial frequency properties of the OCT data. Results obtained from parameter-mapped sonification of human adipose and tumor tissues are presented, indicating that audio feedback of OCT data may be useful for the interpretation of OCT images. PMID:20588846
Fuzzy Logic-Based Audio Pattern Recognition
NASA Astrophysics Data System (ADS)
Malcangi, M.
2008-11-01
Audio and audio-pattern recognition is becoming one of the most important technologies to automatically control embedded systems. Fuzzy logic may be the most important enabling methodology due to its ability to rapidly and economically model such application. An audio and audio-pattern recognition engine based on fuzzy logic has been developed for use in very low-cost and deeply embedded systems to automate human-to-machine and machine-to-machine interaction. This engine consists of simple digital signal-processing algorithms for feature extraction and normalization, and a set of pattern-recognition rules manually tuned or automatically tuned by a self-learning process.
Paper-Based Textbooks with Audio Support for Print-Disabled Students.
Fujiyoshi, Akio; Ohsawa, Akiko; Takaira, Takuya; Tani, Yoshiaki; Fujiyoshi, Mamoru; Ota, Yuko
2015-01-01
Utilizing invisible 2-dimensional codes and digital audio players with a 2-dimensional code scanner, we developed paper-based textbooks with audio support for students with print disabilities, called "multimodal textbooks." Multimodal textbooks can be read with the combination of the two modes: "reading printed text" and "listening to the speech of the text from a digital audio player with a 2-dimensional code scanner." Since multimodal textbooks look the same as regular textbooks and the price of a digital audio player is reasonable (about 30 euro), we think multimodal textbooks are suitable for students with print disabilities in ordinary classrooms.
Recording Technologies: Sights & Sounds. Resources in Technology.
ERIC Educational Resources Information Center
Deal, Walter F., III
1994-01-01
Provides information on recording technologies such as laser disks, audio and videotape, and video cameras. Presents a design brief that includes objectives, student outcomes, and a student quiz. (JOW)
Cai, Yefeng; Wu, Ming; Yang, Jun
2014-02-01
This paper describes a method for focusing the reproduced sound in the bright zone without disturbing other people in the dark zone in personal audio systems. The proposed method combines the least-squares and acoustic contrast criteria. A constrained parameter is introduced to tune the balance between two performance indices, namely, the acoustic contrast and the spatial average error. An efficient implementation of this method using convex optimization is presented. Offline simulations and real-time experiments using a linear loudspeaker array are conducted to evaluate the performance of the presented method. Results show that compared with the traditional acoustic contrast control method, the proposed method can improve the flatness of response in the bright zone by sacrificing the level of acoustic contrast.
Musical examination to bridge audio data and sheet music
NASA Astrophysics Data System (ADS)
Pan, Xunyu; Cross, Timothy J.; Xiao, Liangliang; Hei, Xiali
2015-03-01
The digitalization of audio is commonly implemented for the purpose of convenient storage and transmission of music and songs in today's digital age. Analyzing digital audio for an insightful look at a specific musical characteristic, however, can be quite challenging for various types of applications. Many existing musical analysis techniques can examine a particular piece of audio data. For example, the frequency of digital sound can be easily read and identified at a specific section in an audio file. Based on this information, we could determine the musical note being played at that instant, but what if you want to see a list of all the notes played in a song? While most existing methods help to provide information about a single piece of the audio data at a time, few of them can analyze the available audio file on a larger scale. The research conducted in this work considers how to further utilize the examination of audio data by storing more information from the original audio file. In practice, we develop a novel musical analysis system Musicians Aid to process musical representation and examination of audio data. Musicians Aid solves the previous problem by storing and analyzing the audio information as it reads it rather than tossing it aside. The system can provide professional musicians with an insightful look at the music they created and advance their understanding of their work. Amateur musicians could also benefit from using it solely for the purpose of obtaining feedback about a song they were attempting to play. By comparing our system's interpretation of traditional sheet music with their own playing, a musician could ensure what they played was correct. More specifically, the system could show them exactly where they went wrong and how to adjust their mistakes. In addition, the application could be extended over the Internet to allow users to play music with one another and then review the audio data they produced. This would be particularly useful for teaching music lessons on the web. The developed system is evaluated with songs played with guitar, keyboard, violin, and other popular musical instruments (primarily electronic or stringed instruments). The Musicians Aid system is successful at both representing and analyzing audio data and it is also powerful in assisting individuals interested in learning and understanding music.
Transitioning from analog to digital audio recording in childhood speech sound disorders.
Shriberg, Lawrence D; McSweeny, Jane L; Anderson, Bruce E; Campbell, Thomas F; Chial, Michael R; Green, Jordan R; Hauner, Katherina K; Moore, Christopher A; Rusiewicz, Heather L; Wilson, David L
2005-06-01
Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing a reference database for research in childhood speech sound disorders. Two research transcribers with different levels of experience glossed, transcribed, and prosody-voice coded conversational speech samples from eight children with mild to severe speech disorders of unknown origin. The samples were recorded, stored, and played back using representative analog and digital audio systems. Effect sizes calculated for an array of analog versus digital comparisons ranged from negligible to medium, with a trend for participants' speech competency scores to be slightly lower for samples obtained and transcribed using the digital system. We discuss the implications of these and other findings for research and clinical practise.
Evaluation of architectures for an ASP MPEG-4 decoder using a system-level design methodology
NASA Astrophysics Data System (ADS)
Garcia, Luz; Reyes, Victor; Barreto, Dacil; Marrero, Gustavo; Bautista, Tomas; Nunez, Antonio
2005-06-01
Trends in multimedia consumer electronics, digital video and audio, aim to reach users through low-cost mobile devices connected to data broadcasting networks with limited bandwidth. An emergent broadcasting network is the digital audio broadcasting network (DAB) which provides CD quality audio transmission together with robustness and efficiency techniques to allow good quality reception in motion conditions. This paper focuses on the system-level evaluation of different architectural options to allow low bandwidth digital video reception over DAB, based on video compression techniques. Profiling and design space exploration techniques are applied over the ASP MPEG-4 decoder in order to find out the best HW/SW partition given the application and platform constraints. An innovative SystemC-based system-level design tool, called CASSE, is being used for modelling, exploration and evaluation of different ASP MPEG-4 decoder HW/SW partitions. System-level trade offs and quantitative data derived from this analysis are also presented in this work.
Transitioning from analog to digital audio recording in childhood speech sound disorders
Shriberg, Lawrence D.; McSweeny, Jane L.; Anderson, Bruce E.; Campbell, Thomas F.; Chial, Michael R.; Green, Jordan R.; Hauner, Katherina K.; Moore, Christopher A.; Rusiewicz, Heather L.; Wilson, David L.
2014-01-01
Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing a reference database for research in childhood speech sound disorders. Two research transcribers with different levels of experience glossed, transcribed, and prosody-voice coded conversational speech samples from eight children with mild to severe speech disorders of unknown origin. The samples were recorded, stored, and played back using representative analog and digital audio systems. Effect sizes calculated for an array of analog versus digital comparisons ranged from negligible to medium, with a trend for participants’ speech competency scores to be slightly lower for samples obtained and transcribed using the digital system. We discuss the implications of these and other findings for research and clinical practise. PMID:16019779
Bogale, Gebeyehu W; Boer, Henk; Seydel, Erwin R
2011-02-01
In Ethiopia the level of illiteracy in rural areas is very high. In this study, we investigated the effects of an audio HIV/AIDS prevention intervention targeted at rural illiterate females. In the intervention we used social-oriented presentation formats, such as discussion between similar females and role-play. In a pretest and posttest experimental study with an intervention group (n = 210) and control group (n = 210), we investigated the effects on HIV/AIDS knowledge and social cognitions. The intervention led to significant and relevant increases in HIV/AIDS knowledge, self-efficacy, perceived vulnerability to HIV/AIDS infection, response efficacy of condoms and condom use intention. In the intervention group, self-efficacy at posttest was the main determinant of condom use intention, with also a significant contribution of vulnerability. We conclude that audio HIV/AIDS prevention interventions can play an important role in empowering rural illiterate females in the prevention of HIV/AIDS.
NASA Astrophysics Data System (ADS)
Campo, D.; Quintero, O. L.; Bastidas, M.
2016-04-01
We propose a study of the mathematical properties of voice as an audio signal. This work includes signals in which the channel conditions are not ideal for emotion recognition. Multiresolution analysis- discrete wavelet transform - was performed through the use of Daubechies Wavelet Family (Db1-Haar, Db6, Db8, Db10) allowing the decomposition of the initial audio signal into sets of coefficients on which a set of features was extracted and analyzed statistically in order to differentiate emotional states. ANNs proved to be a system that allows an appropriate classification of such states. This study shows that the extracted features using wavelet decomposition are enough to analyze and extract emotional content in audio signals presenting a high accuracy rate in classification of emotional states without the need to use other kinds of classical frequency-time features. Accordingly, this paper seeks to characterize mathematically the six basic emotions in humans: boredom, disgust, happiness, anxiety, anger and sadness, also included the neutrality, for a total of seven states to identify.
"Travelers In The Night" in the Old and New Media
NASA Astrophysics Data System (ADS)
Grauer, Albert D.
2015-11-01
"Travelers in the Night" is a series of 2 minute audio programs based on current research in astronomy and the space sciences.After more than a year of submitting “Travelers In The Night” 2 minute audio pieces to NPR and Community Radio stations with limited success, a parallel effort was initiated by posting the pieces as audio podcasts on Spreaker.com and iTunes.The classic media dispenses programming whose content and schedule is determined by editors and station managers. Riding the wave of new technology, people from every demographic group across the globe are selecting what, when, and how they receive information and entertainment. This change is significant with the Pew Research Center reporting that currently more than 60% of Facebook and Twitter users now get their news and/or links to stories from these sources. What remains constant is the public’s interest in astronomy and space.This poster presents relevant statistics and a discussion of the initial results of these two parallel efforts.
Direct broadcast satellite-radio market, legal, regulatory, and business considerations
NASA Technical Reports Server (NTRS)
Sood, Des R.
1991-01-01
A Direct Broadcast Satellite-Radio (DBS-R) System offers the prospect of delivering high quality audio broadcasts to large audiences at costs lower than or comparable to those incurred using the current means of broadcasting. The maturation of mobile communications technologies, and advances in microelectronics and digital signal processing now make it possible to bring this technology to the marketplace. Heightened consumer interest in improved audio quality coupled with the technological and economic feasibility of meeting this demand via DBS-R make it opportune to start planning for implementation of DBS-R Systems. NASA-Lewis and the Voice of America as part of their on-going efforts to improve the quality of international audio broadcasts, have undertaken a number of tasks to more clearly define the technical, marketing, organizational, legal, and regulatory issues underlying implementation of DBS-R Systems. The results and an assessment is presented of the business considerations underlying the construction, launch, and operation of DBS-R Systems.
Direct broadcast satellite-radio market, legal, regulatory, and business considerations
NASA Astrophysics Data System (ADS)
Sood, Des R.
1991-03-01
A Direct Broadcast Satellite-Radio (DBS-R) System offers the prospect of delivering high quality audio broadcasts to large audiences at costs lower than or comparable to those incurred using the current means of broadcasting. The maturation of mobile communications technologies, and advances in microelectronics and digital signal processing now make it possible to bring this technology to the marketplace. Heightened consumer interest in improved audio quality coupled with the technological and economic feasibility of meeting this demand via DBS-R make it opportune to start planning for implementation of DBS-R Systems. NASA-Lewis and the Voice of America as part of their on-going efforts to improve the quality of international audio broadcasts, have undertaken a number of tasks to more clearly define the technical, marketing, organizational, legal, and regulatory issues underlying implementation of DBS-R Systems. The results and an assessment is presented of the business considerations underlying the construction, launch, and operation of DBS-R Systems.
Rosemann, Stephanie; Thiel, Christiane M
2018-07-15
Hearing loss is associated with difficulties in understanding speech, especially under adverse listening conditions. In these situations, seeing the speaker improves speech intelligibility in hearing-impaired participants. On the neuronal level, previous research has shown cross-modal plastic reorganization in the auditory cortex following hearing loss leading to altered processing of auditory, visual and audio-visual information. However, how reduced auditory input effects audio-visual speech perception in hearing-impaired subjects is largely unknown. We here investigated the impact of mild to moderate age-related hearing loss on processing audio-visual speech using functional magnetic resonance imaging. Normal-hearing and hearing-impaired participants performed two audio-visual speech integration tasks: a sentence detection task inside the scanner and the McGurk illusion outside the scanner. Both tasks consisted of congruent and incongruent audio-visual conditions, as well as auditory-only and visual-only conditions. We found a significantly stronger McGurk illusion in the hearing-impaired participants, which indicates stronger audio-visual integration. Neurally, hearing loss was associated with an increased recruitment of frontal brain areas when processing incongruent audio-visual, auditory and also visual speech stimuli, which may reflect the increased effort to perform the task. Hearing loss modulated both the audio-visual integration strength measured with the McGurk illusion and brain activation in frontal areas in the sentence task, showing stronger integration and higher brain activation with increasing hearing loss. Incongruent compared to congruent audio-visual speech revealed an opposite brain activation pattern in left ventral postcentral gyrus in both groups, with higher activation in hearing-impaired participants in the incongruent condition. Our results indicate that already mild to moderate hearing loss impacts audio-visual speech processing accompanied by changes in brain activation particularly involving frontal areas. These changes are modulated by the extent of hearing loss. Copyright © 2018 Elsevier Inc. All rights reserved.
Digital Audio Application to Short Wave Broadcasting
NASA Technical Reports Server (NTRS)
Chen, Edward Y.
1997-01-01
Digital audio is becoming prevalent not only in consumer electornics, but also in different broadcasting media. Terrestrial analog audio broadcasting in the AM and FM bands will be eventually be replaced by digital systems.
Steganalysis of recorded speech
NASA Astrophysics Data System (ADS)
Johnson, Micah K.; Lyu, Siwei; Farid, Hany
2005-03-01
Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.
Speed on the dance floor: Auditory and visual cues for musical tempo.
London, Justin; Burger, Birgitta; Thompson, Marc; Toiviainen, Petri
2016-02-01
Musical tempo is most strongly associated with the rate of the beat or "tactus," which may be defined as the most prominent rhythmic periodicity present in the music, typically in a range of 1.67-2 Hz. However, other factors such as rhythmic density, mean rhythmic inter-onset interval, metrical (accentual) structure, and rhythmic complexity can affect perceived tempo (Drake, Gros, & Penel, 1999; London, 2011 Drake, Gros, & Penel, 1999; London, 2011). Visual information can also give rise to a perceived beat/tempo (Iversen, et al., 2015), and auditory and visual temporal cues can interact and mutually influence each other (Soto-Faraco & Kingstone, 2004; Spence, 2015). A five-part experiment was performed to assess the integration of auditory and visual information in judgments of musical tempo. Participants rated the speed of six classic R&B songs on a seven point scale while observing an animated figure dancing to them. Participants were presented with original and time-stretched (±5%) versions of each song in audio-only, audio+video (A+V), and video-only conditions. In some videos the animations were of spontaneous movements to the different time-stretched versions of each song, and in other videos the animations were of "vigorous" versus "relaxed" interpretations of the same auditory stimulus. Two main results were observed. First, in all conditions with audio, even though participants were able to correctly rank the original vs. time-stretched versions of each song, a song-specific tempo-anchoring effect was observed, such that sped-up versions of slower songs were judged to be faster than slowed-down versions of faster songs, even when their objective beat rates were the same. Second, when viewing a vigorous dancing figure in the A+V condition, participants gave faster tempo ratings than from the audio alone or when viewing the same audio with a relaxed dancing figure. The implications of this illusory tempo percept for cross-modal sensory integration and working memory are discussed, and an "energistic" account of tempo perception is proposed. Copyright © 2015 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Park, Nam In; Kim, Seon Man; Kim, Hong Kook; Kim, Ji Woon; Kim, Myeong Bo; Yun, Su Won
In this paper, we propose a video-zoom driven audio-zoom algorithm in order to provide audio zooming effects in accordance with the degree of video-zoom. The proposed algorithm is designed based on a super-directive beamformer operating with a 4-channel microphone system, in conjunction with a soft masking process that considers the phase differences between microphones. Thus, the audio-zoom processed signal is obtained by multiplying an audio gain derived from a video-zoom level by the masked signal. After all, a real-time audio-zoom system is implemented on an ARM-CORETEX-A8 having a clock speed of 600 MHz after different levels of optimization are performed such as algorithmic level, C-code, and memory optimizations. To evaluate the complexity of the proposed real-time audio-zoom system, test data whose length is 21.3 seconds long is sampled at 48 kHz. As a result, it is shown from the experiments that the processing time for the proposed audio-zoom system occupies 14.6% or less of the ARM clock cycles. It is also shown from the experimental results performed in a semi-anechoic chamber that the signal with the front direction can be amplified by approximately 10 dB compared to the other directions.
NASA Astrophysics Data System (ADS)
Nasrudin, Ajeng Ratih; Setiawan, Wawan; Sanjaya, Yayan
2017-05-01
This study is titled the impact of audio narrated animation on students' understanding in learning humanrespiratory system based on gender. This study was conducted in eight grade of junior high school. This study aims to investigate the difference of students' understanding and learning environment at boys and girls classes in learning human respiratory system using audio narrated animation. Research method that is used is quasy experiment with matching pre-test post-test comparison group design. The procedures of study are: (1) preliminary study and learning habituation using audio narrated animation; (2) implementation of learning using audio narrated animation and taking data; (3) analysis and discussion. The result of analysis shows that there is significant difference on students' understanding and learning environment at boys and girls classes in learning human respiratory system using audio narrated animation, both in general and specifically in achieving learning indicators. The discussion related to the impact of audio narrated animation, gender characteristics, and constructivist learning environment. It can be concluded that there is significant difference of students' understanding at boys and girls classes in learning human respiratory system using audio narrated animation. Additionally, based on interpretation of students' respond, there is the difference increment of agreement level in learning environment.
Zarkada, Angeliki; Regan, Julie
2018-06-01
The Dysphagia Outcome and Severity Scale (DOSS) is widely used to measure dysphagia severity based on videofluoroscopy (VFSS). This study investigated inter-rater reliability (IRR) of the DOSS. It also determined the effect of clinical experience, VFSS audio-recording and training on DOSS IRR. A quantitative prospective research design was used. Seventeen speech and language pathologists (SLPs) were recruited from an acute teaching hospital, Dublin (> 3 years' VFSS experience, n = 10) and from a postgraduate dysphagia programme in a university setting (< 3 years' VFSS experience; n = 7). During testing, participants viewed eight VFSS clips (5 with audio-recording). Each VFSS clip was independently rated using the DOSS scale. Four weeks later, the less experienced group attended a 1-h training session on DOSS rating after which DOSS IRR was re-tested. Cohen's kappa co-efficient was used to establish IRR. IRR of the DOSS presented only fair agreement (κ = 0.36, p < 0.05). DOSS IRR was significantly higher (κ = 0.342) within the more experienced SLP group, compared to the less experienced SLP group (κ = 0.298) (p < 0.05). DOSS IRR was significantly higher in VFSS clips with audio-recording (κ = 0.287) compared to VFSS clips without audio-recording (κ = - 0.0395) (p < 0.05). IRR of the DOSS pre-training (κ = 0.328) was significantly better comparing to post-training (κ = 0.218) (p < 0.05). Findings raise concerns as the DOSS is frequently used in clinical practice to capture dysphagia severity and to monitor changes.
Targum, Steven D; Murphy, Christopher; Khan, Jibran; Zumpano, Laura; Whitlock, Mark; Simen, Arthur A; Binneman, Brendon
2018-04-01
Objective : The assessment of patients with generalized anxiety disorder (GAD) to deteremine whether a medication intervention is necessary is not always clear and might benefit from a second opinion. However, second opinions are time consuming, expensive, and not practical in most settings. We obtained independent, second opinion reviews of the primary clinician's assessment via audio-digital recording. Design : An audio-digital recording of key site-based assessments was used to generate site-independent "dual" reviews of the clinical presentation, symptom severity, and medication requirements of patients with GAD as part of the screening procedures for a clinical trial (ClinicalTrials.gov: NCT02310568). Results : Site-independent reviewers affirmed the diagnosis, symptom severity metrics, and treatment requirements of 90 moderately ill patients with GAD. The patients endorsed excessive worry that was hard to control and essentially all six of the associated DSM-IV-TR anxiety symptoms. The Hamilton Rating Scale for Anxiety scores revealed moderately severe anxiety with a high Pearson's correlation ( r =0.852) between site-based and independent raters and minimal scoring discordance on each scale item. Based upon their independent reviews, these "second" opinions confirmed that these GAD patients warranted a new medication intervention. Thirty patients (33.3%) reported a previous history of a major depressive episode (MDE) and had significantly more depressive symptoms than patients without a history of MDE. Conclusion : The audio-digital recording method provides a useful second opinion that can affirm the need for a different treatment intervention in these anxious patients. A second live assessment would have required additional clinic time and added patient burden. The audio-digital recording method is less burdensome than live second opinion assessments and might have utility in both research and clinical practice settings.
Real Time Implementation of an LPC Algorithm. Speech Signal Processing Research at CHI
1975-05-01
SIGNAL PROCESSING HARDWARE 2-1 2.1 INTRODUCTION 2-1 2.2 TWO- CHANNEL AUDIO SIGNAL SYSTEM 2-2 2.3 MULTI- CHANNEL AUDIO SIGNAL SYSTEM 2-5 2.3.1... Channel Audio Signal System 2-30 I ii kv^i^ünt«.jfc*. ji .„* ,:-v*. ’.ii. *.. ...... — ■ -,,.,-c-» —ipponp ■^ TOHaBWgBpwiBWgPlpaiPWgW v.«.wN...Messages .... 1-55 1-13. Lost or Out of Order Message 1-56 2-1. Block Diagram of Two- Channel Audio Signal System . . 2-3 2-2. Block Diagram of Audio
An audio-magnetotelluric investigation in Terceira Island (Azores)
NASA Astrophysics Data System (ADS)
Monteiro Santos, Fernando A.; Trota, António; Soares, António; Luzio, Rafael; Lourenço, Nuno; Matos, Liliana; Almeida, Eugénio; Gaspar, João L.; Miranda, Jorge M.
2006-08-01
Ten audio-magnetotelluric soundings have been carried out along a profile crossing the Serra do Cume caldera in the eastern part of the Terceira Island (Azores). The main objectives of this investigation were to detect geoelectrical features related with tectonic structures and to characterize regional hydrological and hydrothermal aspects mainly those related to geothermal fluid dynamics. Three-dimensional numerical investigation showed that the data acquired at periods shorter than 1 s are not significantly affected by ocean effect. The data was analysed using the Smith's decomposition method in order to investigate possible distortions caused by superficial structures and to estimate a global regional strike. The results suggest that in general the soundings were not distorted. A regional N55°W strike was chosen for the two-dimensional data inversion. The low-resistivity zones (10-30 ohm-m) displayed in the central part of the 2-D geoelectrical model have been interpreted as caused by hydrothermal circulation. The low-resistivity anomalies at the ends of the profile might be attributed to alteration zones with interaction of seawater intrusion. High-resistivity (> 300 ohm-m) values have been related with less permeable zones in the SW of Cinco Picos and Guilherme Moniz caldera walls.
77 FR 74543 - Federal Advisory Committee Meeting
Federal Register 2010, 2011, 2012, 2013, 2014
2012-12-14
.../411593 . Dial-in: After you've connected your computer, audio connection instructions will be presented... the status of current research projects. FOR FURTHER INFORMATION CONTACT: The meeting is open to the...
Yu, Jesang; Choi, Ji Hoon; Ma, Sun Young; Jeung, Tae Sig; Lim, Sangwook
2015-09-01
To compare audio-only biofeedback to conventional audiovisual biofeedback for regulating patients' respiration during four-dimensional radiotherapy, limiting damage to healthy surrounding tissues caused by organ movement. Six healthy volunteers were assisted by audiovisual or audio-only biofeedback systems to regulate their respirations. Volunteers breathed through a mask developed for this study by following computer-generated guiding curves displayed on a screen, combined with instructional sounds. They then performed breathing following instructional sounds only. The guiding signals and the volunteers' respiratory signals were logged at 20 samples per second. The standard deviations between the guiding and respiratory curves for the audiovisual and audio-only biofeedback systems were 21.55% and 23.19%, respectively; the average correlation coefficients were 0.9778 and 0.9756, respectively. The regularities between audiovisual and audio-only biofeedback for six volunteers' respirations were same statistically from the paired t-test. The difference between the audiovisual and audio-only biofeedback methods was not significant. Audio-only biofeedback has many advantages, as patients do not require a mask and can quickly adapt to this method in the clinic.
Ultrasonic speech translator and communications system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Akerman, M.A.; Ayers, C.W.; Haynes, H.D.
1996-07-23
A wireless communication system undetectable by radio frequency methods for converting audio signals, including human voice, to electronic signals in the ultrasonic frequency range, transmitting the ultrasonic signal by way of acoustical pressure waves across a carrier medium, including gases, liquids, or solids, and reconverting the ultrasonic acoustical pressure waves back to the original audio signal. The ultrasonic speech translator and communication system includes an ultrasonic transmitting device and an ultrasonic receiving device. The ultrasonic transmitting device accepts as input an audio signal such as human voice input from a microphone or tape deck. The ultrasonic transmitting device frequency modulatesmore » an ultrasonic carrier signal with the audio signal producing a frequency modulated ultrasonic carrier signal, which is transmitted via acoustical pressure waves across a carrier medium such as gases, liquids or solids. The ultrasonic receiving device converts the frequency modulated ultrasonic acoustical pressure waves to a frequency modulated electronic signal, demodulates the audio signal from the ultrasonic carrier signal, and conditions the demodulated audio signal to reproduce the original audio signal at its output. 7 figs.« less
Ultrasonic speech translator and communications system
Akerman, M. Alfred; Ayers, Curtis W.; Haynes, Howard D.
1996-01-01
A wireless communication system undetectable by radio frequency methods for converting audio signals, including human voice, to electronic signals in the ultrasonic frequency range, transmitting the ultrasonic signal by way of acoustical pressure waves across a carrier medium, including gases, liquids, or solids, and reconverting the ultrasonic acoustical pressure waves back to the original audio signal. The ultrasonic speech translator and communication system (20) includes an ultrasonic transmitting device (100) and an ultrasonic receiving device (200). The ultrasonic transmitting device (100) accepts as input (115) an audio signal such as human voice input from a microphone (114) or tape deck. The ultrasonic transmitting device (100) frequency modulates an ultrasonic carrier signal with the audio signal producing a frequency modulated ultrasonic carrier signal, which is transmitted via acoustical pressure waves across a carrier medium such as gases, liquids or solids. The ultrasonic receiving device (200) converts the frequency modulated ultrasonic acoustical pressure waves to a frequency modulated electronic signal, demodulates the audio signal from the ultrasonic carrier signal, and conditions the demodulated audio signal to reproduce the original audio signal at its output (250).
Aurally aided visual search performance in a dynamic environment
NASA Astrophysics Data System (ADS)
McIntire, John P.; Havig, Paul R.; Watamaniuk, Scott N. J.; Gilkey, Robert H.
2008-04-01
Previous research has repeatedly shown that people can find a visual target significantly faster if spatial (3D) auditory displays direct attention to the corresponding spatial location. However, previous research has only examined searches for static (non-moving) targets in static visual environments. Since motion has been shown to affect visual acuity, auditory acuity, and visual search performance, it is important to characterize aurally-aided search performance in environments that contain dynamic (moving) stimuli. In the present study, visual search performance in both static and dynamic environments is investigated with and without 3D auditory cues. Eight participants searched for a single visual target hidden among 15 distracting stimuli. In the baseline audio condition, no auditory cues were provided. In the 3D audio condition, a virtual 3D sound cue originated from the same spatial location as the target. In the static search condition, the target and distractors did not move. In the dynamic search condition, all stimuli moved on various trajectories at 10 deg/s. The results showed a clear benefit of 3D audio that was present in both static and dynamic environments, suggesting that spatial auditory displays continue to be an attractive option for a variety of aircraft, motor vehicle, and command & control applications.
NASA Astrophysics Data System (ADS)
Cerwin, Steve; Barnes, Julie; Kell, Scott; Walters, Mark
2003-09-01
This paper describes development and application of a novel method to accomplish real-time solid angle acoustic direction finding using two 8-element orthogonal microphone arrays. The developed prototype system was intended for localization and signature recognition of ground-based sounds from a small UAV. Recent advances in computer speeds have enabled the implementation of microphone arrays in many audio applications. Still, the real-time presentation of a two-dimensional sound field for the purpose of audio target localization is computationally challenging. In order to overcome this challenge, a crosspower spectrum phase1 (CSP) technique was applied to each 8-element arm of a 16-element cross array to provide audio target localization. In this paper, we describe the technique and compare it with two other commonly used techniques; Cross-Spectral Matrix2 and MUSIC3. The results show that the CSP technique applied to two 8-element orthogonal arrays provides a computationally efficient solution with reasonable accuracy and tolerable artifacts, sufficient for real-time applications. Additional topics include development of a synchronized 16-channel transmitter and receiver to relay the airborne data to the ground-based processor and presentation of test data demonstrating both ground-mounted operation and airborne localization of ground-based gunshots and loud engine sounds.
Linguistic experience and audio-visual perception of non-native fricatives.
Wang, Yue; Behne, Dawn M; Jiang, Haisheng
2008-09-01
This study examined the effects of linguistic experience on audio-visual (AV) perception of non-native (L2) speech. Canadian English natives and Mandarin Chinese natives differing in degree of English exposure [long and short length of residence (LOR) in Canada] were presented with English fricatives of three visually distinct places of articulation: interdentals nonexistent in Mandarin and labiodentals and alveolars common in both languages. Stimuli were presented in quiet and in a cafe-noise background in four ways: audio only (A), visual only (V), congruent AV (AVc), and incongruent AV (AVi). Identification results showed that overall performance was better in the AVc than in the A or V condition and better in quiet than in cafe noise. While the Mandarin long LOR group approximated the native English patterns, the short LOR group showed poorer interdental identification, more reliance on visual information, and greater AV-fusion with the AVi materials, indicating the failure of L2 visual speech category formation with the short LOR non-natives and the positive effects of linguistic experience with the long LOR non-natives. These results point to an integrated network in AV speech processing as a function of linguistic background and provide evidence to extend auditory-based L2 speech learning theories to the visual domain.
The presentation of expert testimony via live audio-visual communication.
Miller, R D
1991-01-01
As part of a national effort to improve efficiency in court procedures, the American Bar Association has recommended, on the basis of a number of pilot studies, increased use of current audio-visual technology, such as telephone and live video communication, to eliminate delays caused by unavailability of participants in both civil and criminal procedures. Although these recommendations were made to facilitate court proceedings, and for the convenience of attorneys and judges, they also have the potential to save significant time for clinical expert witnesses as well. The author reviews the studies of telephone testimony that were done by the American Bar Association and other legal research groups, as well as the experience in one state forensic evaluation and treatment center. He also reviewed the case law on the issue of remote testimony. He then presents data from a national survey of state attorneys general concerning the admissibility of testimony via audio-visual means, including video depositions. Finally, he concludes that the option to testify by telephone provides a significant savings in precious clinical time for forensic clinicians in public facilities, and urges that such clinicians work actively to convince courts and/or legislatures in states that do not permit such testimony (currently the majority), to consider accepting it, to improve the effective use of scarce clinical resources in public facilities.
Observations on online educational materials for powder diffraction crystallography software.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Toby, B. H.
2010-10-01
This article presents a series of approaches used to educate potential users of crystallographic software for powder diffraction. The approach that has been most successful in the author's opinion is the web lecture, where an audio presentation is coupled to a video-like record of the contents of the presenter's computer screen.
Using Presentation Software to Flip an Undergraduate Analytical Chemistry Course
ERIC Educational Resources Information Center
Fitzgerald, Neil; Li, Luisa
2015-01-01
An undergraduate analytical chemistry course has been adapted to a flipped course format. Course content was provided by video clips, text, graphics, audio, and simple animations organized as concept maps using the cloud-based presentation platform, Prezi. The advantages of using Prezi to present course content in a flipped course format are…
ERIC Educational Resources Information Center
Carlin, Michael; Toglia, Michael P.; Belmonte, Colleen; DiMeglio, Chiara
2012-01-01
In the present study the effects of visual, auditory, and audio-visual presentation formats on memory for thematically constructed lists were assessed in individuals with intellectual disability and mental age-matched children. The auditory recognition test included target items, unrelated foils, and two types of semantic lures: critical related…
Research into Teleconferencing
1981-02-01
Wichman (1970) found more cooperation under conditions of audio- visual communication than conditions of audio communication alone. Laplante (1971) found...was found for audio teleconferences. These results, taken with the results concerning group perfor- mance, seem to indicate that visual communication gives
Entertainment and Pacification System For Car Seat
NASA Technical Reports Server (NTRS)
Elrod, Susan Vinz (Inventor); Dabney, Richard W. (Inventor)
2006-01-01
An entertainment and pacification system for use with a child car seat has speakers mounted in the child car seat with a plurality of audio sources and an anti-noise audio system coupled to the child car seat. A controllable switching system provides for, at any given time, the selective activation of i) one of the audio sources such that the audio signal generated thereby is coupled to one or more of the speakers, and ii) the anti-noise audio system such that an ambient-noise-canceling audio signal generated thereby is coupled to one or more of the speakers. The controllable switching system can receive commands generated at one of first controls located at the child car seat and second controls located remotely with respect to the child car seat with commands generated by the second controls overriding commands generated by the first controls.
A high efficiency PWM CMOS class-D audio power amplifier
NASA Astrophysics Data System (ADS)
Zhangming, Zhu; Lianxi, Liu; Yintang, Yang; Han, Lei
2009-02-01
Based on the difference close-loop feedback technique and the difference pre-amp, a high efficiency PWM CMOS class-D audio power amplifier is proposed. A rail-to-rail PWM comparator with window function has been embedded in the class-D audio power amplifier. Design results based on the CSMC 0.5 μm CMOS process show that the max efficiency is 90%, the PSRR is -75 dB, the power supply voltage range is 2.5-5.5 V, the THD+N in 1 kHz input frequency is less than 0.20%, the quiescent current in no load is 2.8 mA, and the shutdown current is 0.5 μA. The active area of the class-D audio power amplifier is about 1.47 × 1.52 mm2. With the good performance, the class-D audio power amplifier can be applied to several audio power systems.
The Uses of Media in Early Childhood Education
ERIC Educational Resources Information Center
Grossman, Bruce
1976-01-01
This article discusses the educational benefits of involving young children in the media arts and presents suggestions for using still cameras, movie cameras, audio tape recorders, and video tape recorders. (SB)
Implementing Audio Digital Feedback Loop Using the National Instruments RIO System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang, G.; Byrd, J. M.
2006-11-20
Development of system for high precision RF distribution and laser synchronization at Berkeley Lab has been ongoing for several years. Successful operation of these systems requires multiple audio bandwidth feedback loops running at relatively high gains. Stable operation of the feedback loops requires careful design of the feedback transfer function. To allow for flexible and compact implementation, we have developed digital feedback loops on the National Instruments Reconfigurable Input/Output (RIO) platform. This platform uses an FPGA and multiple I/Os that can provide eight parallel channels running different filters. We present the design and preliminary experimental results of this system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Falferi, P.; Mezzena, R.; Vitale, S.
1997-08-01
The coupling effects of a commercial dc superconducting quantum interference device (SQUID) to an electrical LC resonator which operates at audio frequencies ({approx}1kHz) with quality factors Q{approx}10{sup 6} are presented. The variations of the resonance frequency of the resonator as functions of the flux applied to the SQUID are due to the SQUID dynamic inductance in good agreement with the predictions of a model. The variations of the quality factor point to a feedback mechanism between the output of the SQUID and the input circuit. {copyright} {ital 1997 American Institute of Physics.}
Charbonneau, Geneviève; Véronneau, Marie; Boudrias-Fournier, Colin; Lepore, Franco; Collignon, Olivier
2013-10-28
The relative reliability of separate sensory estimates influences the way they are merged into a unified percept. We investigated how eccentricity-related changes in reliability of auditory and visual stimuli influence their integration across the entire frontal space. First, we surprisingly found that despite a strong decrease in auditory and visual unisensory localization abilities in periphery, the redundancy gain resulting from the congruent presentation of audio-visual targets was not affected by stimuli eccentricity. This result therefore contrasts with the common prediction that a reduction in sensory reliability necessarily induces an enhanced integrative gain. Second, we demonstrate that the visual capture of sounds observed with spatially incongruent audio-visual targets (ventriloquist effect) steadily decreases with eccentricity, paralleling a lowering of the relative reliability of unimodal visual over unimodal auditory stimuli in periphery. Moreover, at all eccentricities, the ventriloquist effect positively correlated with a weighted combination of the spatial resolution obtained in unisensory conditions. These findings support and extend the view that the localization of audio-visual stimuli relies on an optimal combination of auditory and visual information according to their respective spatial reliability. All together, these results evidence that the external spatial coordinates of multisensory events relative to an observer's body (e.g., eyes' or head's position) influence how this information is merged, and therefore determine the perceptual outcome.
A Cough-Based Algorithm for Automatic Diagnosis of Pertussis.
Pramono, Renard Xaviero Adhi; Imtiaz, Syed Anas; Rodriguez-Villegas, Esther
2016-01-01
Pertussis is a contagious respiratory disease which mainly affects young children and can be fatal if left untreated. The World Health Organization estimates 16 million pertussis cases annually worldwide resulting in over 200,000 deaths. It is prevalent mainly in developing countries where it is difficult to diagnose due to the lack of healthcare facilities and medical professionals. Hence, a low-cost, quick and easily accessible solution is needed to provide pertussis diagnosis in such areas to contain an outbreak. In this paper we present an algorithm for automated diagnosis of pertussis using audio signals by analyzing cough and whoop sounds. The algorithm consists of three main blocks to perform automatic cough detection, cough classification and whooping sound detection. Each of these extract relevant features from the audio signal and subsequently classify them using a logistic regression model. The output from these blocks is collated to provide a pertussis likelihood diagnosis. The performance of the proposed algorithm is evaluated using audio recordings from 38 patients. The algorithm is able to diagnose all pertussis successfully from all audio recordings without any false diagnosis. It can also automatically detect individual cough sounds with 92% accuracy and PPV of 97%. The low complexity of the proposed algorithm coupled with its high accuracy demonstrates that it can be readily deployed using smartphones and can be extremely useful for quick identification or early screening of pertussis and for infection outbreaks control.
A Cough-Based Algorithm for Automatic Diagnosis of Pertussis
Pramono, Renard Xaviero Adhi; Imtiaz, Syed Anas; Rodriguez-Villegas, Esther
2016-01-01
Pertussis is a contagious respiratory disease which mainly affects young children and can be fatal if left untreated. The World Health Organization estimates 16 million pertussis cases annually worldwide resulting in over 200,000 deaths. It is prevalent mainly in developing countries where it is difficult to diagnose due to the lack of healthcare facilities and medical professionals. Hence, a low-cost, quick and easily accessible solution is needed to provide pertussis diagnosis in such areas to contain an outbreak. In this paper we present an algorithm for automated diagnosis of pertussis using audio signals by analyzing cough and whoop sounds. The algorithm consists of three main blocks to perform automatic cough detection, cough classification and whooping sound detection. Each of these extract relevant features from the audio signal and subsequently classify them using a logistic regression model. The output from these blocks is collated to provide a pertussis likelihood diagnosis. The performance of the proposed algorithm is evaluated using audio recordings from 38 patients. The algorithm is able to diagnose all pertussis successfully from all audio recordings without any false diagnosis. It can also automatically detect individual cough sounds with 92% accuracy and PPV of 97%. The low complexity of the proposed algorithm coupled with its high accuracy demonstrates that it can be readily deployed using smartphones and can be extremely useful for quick identification or early screening of pertussis and for infection outbreaks control. PMID:27583523
Effect of Spinal Manipulative Therapy on the Singing Voice.
Fachinatto, Ana Paula A; Duprat, André de Campos; Silva, Marta Andrada E; Bracher, Eduardo Sawaya Botelho; Benedicto, Camila de Carvalho; Luz, Victor Botta Colangelo; Nogueira, Maruan Nogueira; Fonseca, Beatriz Suster Gomes
2015-09-01
This study investigated the effect of spinal manipulative therapy (SMT) on the singing voice of male individuals. Randomized, controlled, case-crossover trial. Twenty-nine subjects were selected among male members of the Heralds of the Gospel. This association was chosen because it is a group of persons with similar singing activities. Participants were randomly assigned to two groups: (A) chiropractic SMT procedure and (B) nontherapeutic transcutaneous electrical nerve stimulation (TENS) procedure. Recordings of the singing voice of each participant were taken immediately before and after the procedures. After a 14-day period, procedures were switched between groups: participants who underwent SMT on the first day were subjected to TENS and vice versa. Recordings were subjected to perceptual audio and acoustic evaluations. The same recording segment of each participant was selected. Perceptual audio evaluation was performed by a specialist panel (SP). Recordings of each participant were randomly presented thus making the SP blind to intervention type and recording session (before/after intervention). Recordings compiled in a randomized order were also subjected to acoustic evaluation. No differences in the quality of the singing on perceptual audio evaluation were observed between TENS and SMT. No differences in the quality of the singing voice of asymptomatic male singers were observed on perceptual audio evaluation or acoustic evaluation after a single spinal manipulative intervention of the thoracic and cervical spine. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
The implementation of Project-Based Learning in courses Audio Video to Improve Employability Skills
NASA Astrophysics Data System (ADS)
Sulistiyo, Edy; Kustono, Djoko; Purnomo; Sutaji, Eddy
2018-04-01
This paper presents a project-based learning (PjBL) in subjects with Audio Video the Study Programme Electro Engineering Universitas Negeri Surabaya which consists of two ways namely the design of the prototype audio-video and assessment activities project-based learning tailored to the skills of the 21st century in the form of employability skills. The purpose of learning innovation is applying the lab work obtained in the theory classes. The PjBL aims to motivate students, centering on the problems of teaching in accordance with the world of work. Measures of learning include; determine the fundamental questions, designs, develop a schedule, monitor the learners and progress, test the results, evaluate the experience, project assessment, and product assessment. The results of research conducted showed the level of mastery of the ability to design tasks (of 78.6%), technical planning (39,3%), creativity (42,9%), innovative (46,4%), problem solving skills (the 57.1%), skill to communicate (75%), oral expression (75%), searching and understanding information (to 64.3%), collaborative work skills (71,4%), and classroom conduct (of 78.6%). In conclusion, instructors have to do the reflection and make improvements in some of the aspects that have a level of mastery of the skills less than 60% both on the application of project-based learning courses, audio video.
Musical stairs: the impact of audio feedback during stair-climbing physical therapies for children.
Khan, Ajmal; Biddiss, Elaine
2015-05-01
Enhanced biofeedback during rehabilitation therapies has the potential to provide a therapeutic environment optimally designed for neuroplasticity. This study investigates the impact of audio feedback on the achievement of a targeted therapeutic goal, namely, use of reciprocal steps. Stair-climbing therapy sessions conducted with and without audio feedback were compared in a randomized AB/BA cross-over study design. Seventeen children, aged 4-7 years, with various diagnoses participated. Reports from the participants, therapists, and a blinded observer were collected to evaluate achievement of the therapeutic goal, motivation and enjoyment during the therapy sessions. Audio feedback resulted in a 5.7% increase (p = 0.007) in reciprocal steps. Levels of participant enjoyment increased significantly (p = 0.031) and motivation was reported by child participants and therapists to be greater when audio feedback was provided. These positive results indicate that audio feedback may influence the achievement of therapeutic goals and promote enjoyment and motivation in young patients engaged in rehabilitation therapies. This study lays the groundwork for future research to determine the long term effects of audio feedback on functional outcomes of therapy. Stair-climbing is an important mobility skill for promoting independence and activities of daily life and is a key component of rehabilitation therapies for physically disabled children. Provision of audio feedback during stair-climbing therapies for young children may increase their achievement of a targeted therapeutic goal (i.e., use of reciprocal steps). Children's motivation and enjoyment of the stair-climbing therapy was enhanced when audio feedback was provided.
Making Your Blackboard Courses Talk!
ERIC Educational Resources Information Center
Burcham, Tim M.
This presentation shows how to deliver audio/video (AV) lectures to online students using relatively inexpensive AV software (i.e., Camtasia Studio) and the standard Blackboard interface. The first section describes two types of production programs: presentation media converters and screen capture utilities. The second section covers making an AV…
A Case Study in Acoustical Design.
ERIC Educational Resources Information Center
Ledford, Bruce R.; Brown, John A.
1992-01-01
Addresses concerns of both facilities planners and instructional designers in planning for the audio component of group presentations. Factors in the architectural design of enclosures for the reproduction of sound are described, including frequency, amplitude, and reverberation; and a case study for creating an acceptable enclosure is presented.…
Code of Federal Regulations, 2013 CFR
2013-01-01
... that conducting the conference by audio-visual telecommunication: (i) Is necessary to prevent prejudice.... If the Judge determines that a conference conducted by audio-visual telecommunication would... correspondence, the conference shall be conducted by audio-visual telecommunication unless the Judge determines...
Code of Federal Regulations, 2011 CFR
2011-01-01
... that conducting the conference by audio-visual telecommunication: (i) Is necessary to prevent prejudice.... If the Judge determines that a conference conducted by audio-visual telecommunication would... correspondence, the conference shall be conducted by audio-visual telecommunication unless the Judge determines...
47 CFR 11.54 - EAS operation during a National Level emergency.
Code of Federal Regulations, 2013 CFR
2013-10-01
... emergency, EAS Participants may transmit in lieu of the EAS audio feed an audio feed of the President's voice message from an alternative source, such as a broadcast network audio feed. [77 FR 16705, Mar. 22...
Code of Federal Regulations, 2012 CFR
2012-01-01
... that conducting the conference by audio-visual telecommunication: (i) Is necessary to prevent prejudice.... If the Judge determines that a conference conducted by audio-visual telecommunication would... correspondence, the conference shall be conducted by audio-visual telecommunication unless the Judge determines...
7 CFR 47.14 - Prehearing conferences.
Code of Federal Regulations, 2012 CFR
2012-01-01
... determines that conducting the conference by audio-visual telecommunication: (i) Is necessary to prevent.... If the examiner determines that a conference conducted by audio-visual telecommunication would... correspondence, the conference shall be conducted by audio-visual telecommunication unless the examiner...
47 CFR 11.54 - EAS operation during a National Level emergency.
Code of Federal Regulations, 2014 CFR
2014-10-01
... emergency, EAS Participants may transmit in lieu of the EAS audio feed an audio feed of the President's voice message from an alternative source, such as a broadcast network audio feed. [77 FR 16705, Mar. 22...
Code of Federal Regulations, 2014 CFR
2014-01-01
... that conducting the conference by audio-visual telecommunication: (i) Is necessary to prevent prejudice.... If the Judge determines that a conference conducted by audio-visual telecommunication would... correspondence, the conference shall be conducted by audio-visual telecommunication unless the Judge determines...
Code of Federal Regulations, 2012 CFR
2012-01-01
... which the deposition is to be conducted (telephone, audio-visual telecommunication, or by personal...) The place of the deposition; (iii) The manner of the deposition (telephone, audio-visual... shall be conducted in the manner (telephone, audio-visual telecommunication, or personal attendance of...
Code of Federal Regulations, 2010 CFR
2010-01-01
... that conducting the conference by audio-visual telecommunication: (i) Is necessary to prevent prejudice.... If the Judge determines that a conference conducted by audio-visual telecommunication would... correspondence, the conference shall be conducted by audio-visual telecommunication unless the Judge determines...
47 CFR 11.54 - EAS operation during a National Level emergency.
Code of Federal Regulations, 2012 CFR
2012-10-01
... emergency, EAS Participants may transmit in lieu of the EAS audio feed an audio feed of the President's voice message from an alternative source, such as a broadcast network audio feed. [77 FR 16705, Mar. 22...
Realization of guitar audio effects using methods of digital signal processing
NASA Astrophysics Data System (ADS)
Buś, Szymon; Jedrzejewski, Konrad
2015-09-01
The paper is devoted to studies on possibilities of realization of guitar audio effects by means of methods of digital signal processing. As a result of research, some selected audio effects corresponding to the specifics of guitar sound were realized as the real-time system called Digital Guitar Multi-effect. Before implementation in the system, the selected effects were investigated using the dedicated application with a graphical user interface created in Matlab environment. In the second stage, the real-time system based on a microcontroller and an audio codec was designed and realized. The system is designed to perform audio effects on the output signal of an electric guitar.
Power saver circuit for audio/visual signal unit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Right, R. W.
1985-02-12
A combined audio and visual signal unit with the audio and visual components actuated alternately and powered over a single cable pair in such a manner that only one of the audio and visual components is drawing power from the power supply at any given instant. Thus, the power supply is never called upon to provide more energy than that drawn by the one of the components having the greater power requirement. This is particularly advantageous when several combined audio and visual signal units are coupled in parallel on one cable pair. Typically, the signal unit may comprise a hornmore » and a strobe light for a fire alarm signalling system.« less
The Verbotonal Method of Aural Rehabilitation: A Case Study
ERIC Educational Resources Information Center
Eisenberg, Diane; Santore, Frances
1976-01-01
A case study is presented of a 12-year-old child with a congenital profound bilateral sensori-neural hearing loss, who received rehabilitative audio-therapy according to the verbotonal method. (Author/LS)
ERIC Educational Resources Information Center
Plants, Helen L.; And Others
1973-01-01
Discusses the characteristics, especially the commonalities, of four innovative educational methods: audio-tutorial systems, guided design, programed instruction, and the Keller method. Indicates that content materials still play the major role in student learning at present. (CC)
Code of Federal Regulations, 2012 CFR
2012-01-01
... (telephone, audio-visual telecommunication, or personal attendance of those who are to participate in the... that conducting the deposition by audio-visual telecommunication: (i) Is necessary to prevent prejudice... determines that a deposition conducted by audio-visual telecommunication would measurably increase the United...
47 CFR Figure 2 to Subpart N of... - Typical Audio Wave
Code of Federal Regulations, 2011 CFR
2011-10-01
... 47 Telecommunication 1 2011-10-01 2011-10-01 false Typical Audio Wave 2 Figure 2 to Subpart N of Part 2 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL FREQUENCY ALLOCATIONS AND RADIO... Audio Wave EC03JN91.006 ...
9 CFR 202.112 - Rule 12: Oral hearing.
Code of Federal Regulations, 2010 CFR
2010-01-01
... hearing shall be conducted by audio-visual telecommunication unless the presiding officer determines that... hearing by audio-visual telecommunication. If the presiding officer determines that a hearing conducted by audio-visual telecommunication would measurably increase the United States Department of Agriculture's...
9 CFR 202.112 - Rule 12: Oral hearing.
Code of Federal Regulations, 2011 CFR
2011-01-01
... hearing shall be conducted by audio-visual telecommunication unless the presiding officer determines that... hearing by audio-visual telecommunication. If the presiding officer determines that a hearing conducted by audio-visual telecommunication would measurably increase the United States Department of Agriculture's...
MedlinePlus FAQ: Is audio description available for videos on MedlinePlus?
... audiodescription.html Question: Is audio description available for videos on MedlinePlus? To use the sharing features on ... page, please enable JavaScript. Answer: Audio description of videos helps make the content of videos accessible to ...
The impact of modality and working memory capacity on achievement in a multimedia environment
NASA Astrophysics Data System (ADS)
Stromfors, Charlotte M.
This study explored the impact of working memory capacity and student learning in a dual modality, multimedia environment titled Visualizing Topography. This computer-based instructional program focused on the basic skills in reading and interpreting topographic maps. Two versions of the program presented the same instructional content but varied the modality of verbal information: the audio-visual condition coordinated topographic maps and narration; the visual-visual condition provided the same topographic maps with readable text. An analysis of covariance procedure was conducted to evaluate the effects due to the two conditions in relation to working memory capacity, controlling for individual differences in spatial visualization and prior knowledge. The scores on the Figural Intersection Test were used to separate subjects into three levels in terms of their measured working memory capacity: low, medium, and high. Subjects accessed Visualizing Topography by way of the Internet and proceeded independently through the program. The program architecture was linear in format. Subjects had a minimum amount of flexibility within each of five segments, but not between segments. One hundred and fifty-one subjects were randomly assigned to either the audio-visual or the visual-visual condition. The average time spent in the program was thirty-one minutes. The results of the ANCOVA revealed a small to moderate modality effect favoring an audio-visual condition. The results also showed that subjects with low and medium working capacity benefited more from the audio-visual condition than the visual-visual condition, while subjects with a high working memory capacity did not benefit from either condition. Although splitting the data reduced group sizes, ANCOVA results by gender suggested that the audio-visual condition favored females with low working memory capacities. The results have implications for designers of educational software, the teachers who select software, and the students themselves. Splitting information into two, non-redundant sources, one audio and one visual, may effectively extend working memory capacity. This is especially significant for the student population encountering difficult science concepts that require the formation and manipulation of mental representations. It is recommended that multimedia environments be designed or selected with attention to modality conditions that facilitate student learning.
StirMark Benchmark: audio watermarking attacks based on lossy compression
NASA Astrophysics Data System (ADS)
Steinebach, Martin; Lang, Andreas; Dittmann, Jana
2002-04-01
StirMark Benchmark is a well-known evaluation tool for watermarking robustness. Additional attacks are added to it continuously. To enable application based evaluation, in our paper we address attacks against audio watermarks based on lossy audio compression algorithms to be included in the test environment. We discuss the effect of different lossy compression algorithms like MPEG-2 audio Layer 3, Ogg or VQF on a selection of audio test data. Our focus is on changes regarding the basic characteristics of the audio data like spectrum or average power and on removal of embedded watermarks. Furthermore we compare results of different watermarking algorithms and show that lossy compression is still a challenge for most of them. There are two strategies for adding evaluation of robustness against lossy compression to StirMark Benchmark: (a) use of existing free compression algorithms (b) implementation of a generic lossy compression simulation. We discuss how such a model can be implemented based on the results of our tests. This method is less complex, as no real psycho acoustic model has to be applied. Our model can be used for audio watermarking evaluation of numerous application fields. As an example, we describe its importance for e-commerce applications with watermarking security.
Padmanabhan, R; Hildreth, A J; Laws, D
2005-09-01
Pre-operative anxiety is common and often significant. Ambulatory surgery challenges our pre-operative goal of an anxiety-free patient by requiring people to be 'street ready' within a brief period of time after surgery. Recently, it has been demonstrated that music can be used successfully to relieve patient anxiety before operations, and that audio embedded with tones that create binaural beats within the brain of the listener decreases subjective levels of anxiety in patients with chronic anxiety states. We measured anxiety with the State-Trait Anxiety Inventory questionnaire and compared binaural beat audio (Binaural Group) with an identical soundtrack but without these added tones (Audio Group) and with a third group who received no specific intervention (No Intervention Group). Mean [95% confidence intervals] decreases in anxiety scores were 26.3%[19-33%] in the Binaural Group (p = 0.001 vs. Audio Group, p < 0.0001 vs. No Intervention Group), 11.1%[6-16%] in the Audio Group (p = 0.15 vs. No Intervention Group) and 3.8%[0-7%] in the No Intervention Group. Binaural beat audio has the potential to decrease acute pre-operative anxiety significantly.
Yu, Jesang; Choi, Ji Hoon; Ma, Sun Young; Jeung, Tae Sig
2015-01-01
Purpose To compare audio-only biofeedback to conventional audiovisual biofeedback for regulating patients' respiration during four-dimensional radiotherapy, limiting damage to healthy surrounding tissues caused by organ movement. Materials and Methods Six healthy volunteers were assisted by audiovisual or audio-only biofeedback systems to regulate their respirations. Volunteers breathed through a mask developed for this study by following computer-generated guiding curves displayed on a screen, combined with instructional sounds. They then performed breathing following instructional sounds only. The guiding signals and the volunteers' respiratory signals were logged at 20 samples per second. Results The standard deviations between the guiding and respiratory curves for the audiovisual and audio-only biofeedback systems were 21.55% and 23.19%, respectively; the average correlation coefficients were 0.9778 and 0.9756, respectively. The regularities between audiovisual and audio-only biofeedback for six volunteers' respirations were same statistically from the paired t-test. Conclusion The difference between the audiovisual and audio-only biofeedback methods was not significant. Audio-only biofeedback has many advantages, as patients do not require a mask and can quickly adapt to this method in the clinic. PMID:26484309
ERIC Educational Resources Information Center
Fredricks, Susan M.; Tierney, John; Bodek, Matthew; Fredericks, Margaret
2016-01-01
The objective of this article is to explain and provide rubrics for science and communication faculty as a means to help nonscience students, in basic science classes, understand that proper communication and presentation skills are a necessity in all courses and future walks of life.