Sample records for audio signal processing

  1. Real Time Implementation of an LPC Algorithm. Speech Signal Processing Research at CHI

    DTIC Science & Technology

    1975-05-01

    Excerpt from the report's table of contents: SIGNAL PROCESSING HARDWARE 2-1; 2.1 Introduction 2-1; 2.2 Two-Channel Audio Signal System 2-2; 2.3 Multi-Channel Audio Signal System 2-5; 2.3.1 ... Channel Audio Signal System 2-30; ... Messages 1-55; 1-13. Lost or Out of Order Message 1-56; 2-1. Block Diagram of Two-Channel Audio Signal System 2-3; 2-2. Block Diagram of Audio ...

  2. DETECTOR FOR MODULATED AND UNMODULATED SIGNALS

    DOEpatents

    Patterson, H.H.; Webber, G.H.

    1959-08-25

    An r-f signal-detecting device is described, which is embodied in a compact coaxial circuit principally comprising a detecting crystal diode and a modulating crystal diode connected in parallel. Incoming modulated r-f signals are demodulated by the detecting crystal diode to furnish an audio input to an audio amplifier. The detecting diode will not, however, produce an audio signal from an unmodulated r-f signal. In order that unmodulated signals may be detected, such incoming signals have a locally produced audio signal superimposed on them at the modulating crystal diode and then the "induced or artificially modulated" signal is reflected toward the detecting diode which in the process of demodulation produces an audio signal for the audio amplifier.

  3. Realization of guitar audio effects using methods of digital signal processing

    NASA Astrophysics Data System (ADS)

    Buś, Szymon; Jedrzejewski, Konrad

    2015-09-01

    The paper is devoted to studies on the realization of guitar audio effects by means of digital signal processing. As a result of this research, selected audio effects corresponding to the specifics of guitar sound were realized as a real-time system called the Digital Guitar Multi-effect. Before implementation in the system, the selected effects were investigated using a dedicated application with a graphical user interface created in the Matlab environment. In the second stage, the real-time system based on a microcontroller and an audio codec was designed and realized. The system performs audio effects on the output signal of an electric guitar.
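
    The abstract does not detail the individual effects, but two staples of guitar processing, overdrive distortion and a feedback delay, can be sketched in a few lines. This is a minimal illustration, not the paper's Digital Guitar Multi-effect; the drive, delay, and mix parameters are assumptions.

```python
import numpy as np

def overdrive(x, drive=5.0):
    """Soft-clipping distortion via tanh waveshaping; `drive` sets the amount."""
    return np.tanh(drive * x) / np.tanh(drive)

def feedback_delay(x, fs, delay_s=0.35, feedback=0.4, mix=0.5):
    """Echo effect: each output sample feeds back an attenuated copy of itself."""
    d = int(delay_s * fs)
    y = x.astype(float).copy()
    for n in range(d, len(y)):
        y[n] += feedback * y[n - d]
    return (1.0 - mix) * x + mix * y
```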

  4. 47 CFR 73.756 - System specifications for double-sideband (DBS) modulated emissions in the HF broadcasting service.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    .... Nominal carrier frequencies shall be integral multiples of 5 kHz. (2) Audio-frequency band. The upper limit of the audio-frequency band (at −3 dB) of the transmitter shall not exceed 4.5 kHz and the lower... processing. If audio-frequency signal processing is used, the dynamic range of the modulating signal shall be...

  5. 47 CFR 73.756 - System specifications for double-sideband (DBS) modulated emissions in the HF broadcasting service.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    .... Nominal carrier frequencies shall be integral multiples of 5 kHz. (2) Audio-frequency band. The upper limit of the audio-frequency band (at −3 dB) of the transmitter shall not exceed 4.5 kHz and the lower... processing. If audio-frequency signal processing is used, the dynamic range of the modulating signal shall be...

  6. 47 CFR 73.756 - System specifications for double-sideband (DBS) modulated emissions in the HF broadcasting service.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    .... Nominal carrier frequencies shall be integral multiples of 5 kHz. (2) Audio-frequency band. The upper limit of the audio-frequency band (at −3 dB) of the transmitter shall not exceed 4.5 kHz and the lower... processing. If audio-frequency signal processing is used, the dynamic range of the modulating signal shall be...

  7. 47 CFR 73.756 - System specifications for double-sideband (DBS) modulated emissions in the HF broadcasting service.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    .... Nominal carrier frequencies shall be integral multiples of 5 kHz. (2) Audio-frequency band. The upper limit of the audio-frequency band (at −3 dB) of the transmitter shall not exceed 4.5 kHz and the lower... processing. If audio-frequency signal processing is used, the dynamic range of the modulating signal shall be...

  8. Method and apparatus for obtaining complete speech signals for speech recognition applications

    NASA Technical Reports Server (NTRS)

    Abrash, Victor (Inventor); Cesari, Federico (Inventor); Franco, Horacio (Inventor); George, Christopher (Inventor); Zheng, Jing (Inventor)

    2009-01-01

    The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
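
    The core of the method is a circular buffer that always retains the most recent audio, so frames preceding the start command can still be recovered. A minimal sketch of that idea; frame capacity and method names are illustrative, not from the patent.

```python
from collections import deque

class AudioRingBuffer:
    """Continuously records frames, keeping only the most recent `capacity`,
    so audio preceding a user command can still be retrieved."""

    def __init__(self, capacity=200):
        self.frames = deque(maxlen=capacity)  # oldest frames fall off automatically

    def push(self, frame):
        self.frames.append(frame)

    def frames_before_command(self, n):
        # The n most recent frames, i.e. audio captured just before the command.
        return list(self.frames)[-n:]
```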

  9. Design and Implementation of a Video-Zoom Driven Digital Audio-Zoom System for Portable Digital Imaging Devices

    NASA Astrophysics Data System (ADS)

    Park, Nam In; Kim, Seon Man; Kim, Hong Kook; Kim, Ji Woon; Kim, Myeong Bo; Yun, Su Won

    In this paper, we propose a video-zoom driven audio-zoom algorithm that provides audio zooming effects in accordance with the degree of video zoom. The proposed algorithm is based on a super-directive beamformer operating with a 4-channel microphone system, in conjunction with a soft masking process that considers the phase differences between microphones. The audio-zoom processed signal is obtained by multiplying the masked signal by an audio gain derived from the video-zoom level. A real-time audio-zoom system is then implemented on an ARM Cortex-A8 running at 600 MHz after several levels of optimization, including algorithmic, C-code, and memory optimizations. To evaluate the complexity of the proposed real-time audio-zoom system, test data 21.3 seconds long, sampled at 48 kHz, were used. The experiments show that the processing time for the proposed audio-zoom system occupies 14.6% or less of the ARM clock cycles. Experiments performed in a semi-anechoic chamber also show that the signal from the front direction can be amplified by approximately 10 dB relative to the other directions.
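
    The final gain stage multiplies the soft-masked beamformer output by a gain derived from the video-zoom level. A sketch of that stage, assuming a linear-in-dB mapping up to the stated 10 dB; the exact mapping is not given in the abstract.

```python
import numpy as np

def apply_audio_zoom(masked_signal, zoom_level, max_zoom=10.0, max_gain_db=10.0):
    # Map the current video-zoom level to an audio gain in dB, then scale
    # the soft-masked beamformer output accordingly.
    gain_db = max_gain_db * min(zoom_level / max_zoom, 1.0)
    return masked_signal * 10.0 ** (gain_db / 20.0)
```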

  10. Robot Command Interface Using an Audio-Visual Speech Recognition System

    NASA Astrophysics Data System (ADS)

    Ceballos, Alexánder; Gómez, Juan; Prieto, Flavio; Redarce, Tanneguy

    In recent years audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing, and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This document presents an automatic command-recognition system using audio-visual information, intended to control the da Vinci laparoscopic robot. The audio signal is processed using the Mel-frequency cepstral coefficients parametrization method. In addition, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used to extract the visual speech information.

  11. Efficient audio signal processing for embedded systems

    NASA Astrophysics Data System (ADS)

    Chiu, Leung Kin

    As mobile platforms continue to pack on more computational power, electronics manufacturers start to differentiate their products by enhancing the audio features. However, consumers also demand smaller devices that can operate for a longer time, hence imposing design constraints. In this research, we investigate two design strategies that allow us to efficiently process audio signals on embedded systems such as mobile phones and portable electronics. In the first strategy, we exploit properties of the human auditory system to process audio signals. We designed a sound enhancement algorithm to make piezoelectric loudspeakers sound "richer" and "fuller." Piezoelectric speakers have a small form factor but exhibit poor response in the low-frequency region. In the algorithm, we combine psychoacoustic bass extension and dynamic range compression to improve the perceived bass coming out from the tiny speakers. We also developed an audio energy reduction algorithm for loudspeaker power management. The perceptually transparent algorithm extends the battery life of mobile devices and prevents thermal damage in speakers. This method is similar to audio compression algorithms, which encode audio signals in such a way that the compression artifacts are not easily perceivable. Instead of reducing the storage space, however, we suppress the audio contents that are below the hearing threshold, thereby reducing the signal energy. In the second strategy, we use low-power analog circuits to process the signal before digitizing it. We designed an analog front-end for sound detection and implemented it on a field programmable analog array (FPAA). The system is an example of an analog-to-information converter. The sound classifier front-end can be used in a wide range of applications because programmable floating-gate transistors are employed to store classifier weights. Moreover, we incorporated a feature selection algorithm to simplify the analog front-end. A machine learning algorithm, AdaBoost, is used to select the most relevant features for a particular sound detection application. In this classifier architecture, we combine simple "base" analog classifiers to form a strong one. We also designed the circuits to implement the AdaBoost-based analog classifier.
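
    As an illustration of the dynamic range compression component mentioned above, here is a minimal static compressor curve. The threshold and ratio are arbitrary assumptions; the thesis's actual algorithm also includes psychoacoustic bass extension and gain smoothing.

```python
import numpy as np

def compress(x, threshold_db=-20.0, ratio=4.0, eps=1e-12):
    """Static dynamic range compression: attenuate the part of the signal
    level that exceeds the threshold by the given ratio."""
    level_db = 20.0 * np.log10(np.abs(x) + eps)
    over_db = np.maximum(level_db - threshold_db, 0.0)
    gain_db = -over_db * (1.0 - 1.0 / ratio)   # gain reduction above threshold
    return x * 10.0 ** (gain_db / 20.0)
```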

  12. 47 CFR 73.757 - System specifications for single-sideband (SSB) modulated emissions in the HF broadcasting service.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... dB per octave. (4) Modulation processing. If audio-frequency signal processing is used, the dynamic... broadcasting service. (a) System parameters—(1) Channel spacing. In a mixed DSB, SSB and digital environment... emission is one giving the same audio-frequency signal-to-noise ratio at the receiver output as the...

  13. AOD furnace splash soft-sensor in the smelting process based on improved BP neural network

    NASA Astrophysics Data System (ADS)

    Ma, Haitao; Wang, Shanshan; Wu, Libin; Yu, Ying

    2017-11-01

    This paper considers the argon oxygen decarburization (AOD) process for low-carbon ferrochrome production, taking splash during smelting as the research object. Based on an analysis of the splash mechanism in the smelting process, a soft-sensing approach combining multi-sensor information fusion with BP neural network modeling is proposed. The vibration signal, the audio signal, and the flame image signal from the furnace are used as characteristic signals of splash; they are fused and modeled to reconstruct the splash signal, realizing soft measurement of splash in the smelting process. Simulation results show that the method can accurately forecast the splash type during smelting, providing a new measurement method for splash forecasting and more accurate information for splash control.

  14. Mining knowledge in noisy audio data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Czyzewski, A.

    1996-12-31

    This paper demonstrates a KDD method applied to audio data analysis. In particular, it presents the possibilities that result from replacing traditional methods of analysis and acoustic signal processing with KDD algorithms when restoring audio recordings affected by strong noise.

  15. Acoustic signal recovery by thermal demodulation

    NASA Astrophysics Data System (ADS)

    Boullosa, R. R.; Santillán, Arturo O.

    2006-10-01

    One operating mode of recently developed thermoacoustic transducers is as an audio speaker that uses an input superimposed on a direct current; as a result, the audio signal occurs at the same frequency as the input signal. To extend the potential applications of these kinds of sources, the authors propose an alternative driving mode in which a simple thermoacoustic device, consisting of a metal film over a substrate and a heat sink, is excited with a high frequency sinusoid that is amplitude modulated by a lower frequency signal. They show that the modulating signal is recovered in the radiated waves due to a mechanism that is inherent to this type of thermoacoustic process. If the frequency of the carrier is higher than 30 kHz and any modulating signal (the one of interest) is in the audio frequency range, only this signal will be heard. Thus, the thermoacoustic device operates as an audio-band, self-demodulating speaker.
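
    The proposed driving mode amplitude-modulates a carrier above 30 kHz with the audio of interest; the thermoacoustic film then self-demodulates it. A sketch of generating such a drive signal; the carrier frequency and modulation depth are illustrative assumptions.

```python
import numpy as np

def am_drive(audio, fs, f_carrier=40e3, depth=0.8):
    """Amplitude-modulate a high-frequency sinusoid with an audio-band signal;
    `audio` is assumed normalized to [-1, 1] and fs must exceed 2*f_carrier."""
    t = np.arange(len(audio)) / fs
    return (1.0 + depth * audio) * np.sin(2.0 * np.pi * f_carrier * t)
```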

  16. Fuzzy Logic-Based Audio Pattern Recognition

    NASA Astrophysics Data System (ADS)

    Malcangi, M.

    2008-11-01

    Audio and audio-pattern recognition is becoming one of the most important technologies for automatically controlling embedded systems. Fuzzy logic may be the most important enabling methodology due to its ability to model such applications rapidly and economically. An audio and audio-pattern recognition engine based on fuzzy logic has been developed for use in very low-cost and deeply embedded systems to automate human-to-machine and machine-to-machine interaction. This engine consists of simple digital signal-processing algorithms for feature extraction and normalization, and a set of pattern-recognition rules tuned manually or automatically by a self-learning process.

  17. Modified DCTNet for audio signals classification

    NASA Astrophysics Data System (ADS)

    Xian, Yin; Pu, Yunchen; Gan, Zhe; Lu, Liang; Thompson, Andrew

    2016-10-01

    In this paper, we investigate DCTNet for audio signal classification. Its output feature is related to Cohen's class of time-frequency distributions. We introduce the use of adaptive DCTNet (A-DCTNet) for audio signal feature extraction. The A-DCTNet applies the idea of the constant-Q transform, with the center frequencies of its filterbanks geometrically spaced. The A-DCTNet is adaptive to different acoustic scales, and it can better capture low-frequency acoustic information, to which human auditory perception is sensitive, than features such as Mel-frequency spectral coefficients (MFSC). We use features extracted by the A-DCTNet as input for classifiers. Experimental results show that the A-DCTNet and recurrent neural networks (RNN) achieve state-of-the-art bird song classification rates and improve artist identification accuracy on music data. They demonstrate the A-DCTNet's applicability to signal processing problems.
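
    The geometric spacing of the A-DCTNet's filterbank centers follows the constant-Q convention. A sketch of computing such center frequencies; f_min and the bin counts are illustrative choices, not the paper's.

```python
import numpy as np

def constant_q_centers(f_min=32.7, n_bins=48, bins_per_octave=12):
    # Geometrically spaced center frequencies: each bin is 2^(1/B) above the
    # last, giving finer resolution at low frequencies, as in a constant-Q transform.
    return f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)
```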

  18. Digital Audio Signal Processing and Nde: AN Unlikely but Valuable Partnership

    NASA Astrophysics Data System (ADS)

    Gaydecki, Patrick

    2008-02-01

    In the Digital Signal Processing (DSP) group, within the School of Electrical and Electronic Engineering at The University of Manchester, research is conducted into two seemingly distinct and disparate subjects: instrumentation for nondestructive evaluation, and DSP systems & algorithms for digital audio. We have often found that many of the hardware systems and algorithms employed to recover, extract or enhance audio signals may also be applied to signals provided by ultrasonic or magnetic NDE instruments. Furthermore, modern DSP hardware is so fast (typically performing hundreds of millions of operations per second) that much of the processing and signal reconstruction may be performed in real time. Here, we describe some of the hardware systems we have developed, together with algorithms that can be implemented both in real time and offline. A next generation system has now been designed, which incorporates a processor operating at 0.55 GMACS, six input and eight output analogue channels, digital input/output in the form of S/PDIF, a JTAG and a USB interface. The software allows the user, with no knowledge of filter theory or programming, to design and run standard or arbitrary FIR, IIR and adaptive filters. Using audio as a vehicle, we can demonstrate the remarkable properties of modern reconstruction algorithms when used in conjunction with such hardware; applications in NDE include signal enhancement and recovery in acoustic, ultrasonic, magnetic and eddy current modalities.

  19. Wavelet-based audio embedding and audio/video compression

    NASA Astrophysics Data System (ADS)

    Mendenhall, Michael J.; Claypoole, Roger L., Jr.

    2001-12-01

    Watermarking, traditionally used for copyright protection, is used in a new and exciting way. An efficient wavelet-based watermarking technique embeds audio information into a video signal. Several effective compression techniques are applied to compress the resulting audio/video signal in an embedded fashion. This wavelet-based compression algorithm incorporates bit-plane coding, index coding, and Huffman coding. To demonstrate the potential of this audio embedding and audio/video compression algorithm, we embed an audio signal into a video signal and then compress. Results show that overall compression rates of 15:1 can be achieved. The video signal is reconstructed with a median PSNR of nearly 33 dB. Finally, the audio signal is extracted from the compressed audio/video signal without error.

  20. Digital audio watermarking using moment-preserving thresholding

    NASA Astrophysics Data System (ADS)

    Choi, DooSeop; Jung, Hae Kyung; Choi, Hyuk; Kim, Taejeong

    2007-09-01

    The moment-preserving thresholding (MPT) technique for digital images has been used in digital image processing for decades, especially in image binarization and image compression. Its main strength lies in the fact that the binary values the MPT produces, called representative values, are usually unaffected when the signal being thresholded goes through a signal processing operation. The two representative values in MPT, together with the threshold value, are obtained by solving the system of preservation equations for the first, second, and third moments. Relying on this robustness of the representative values to the various signal processing attacks considered in the watermarking context, this paper proposes a new watermarking scheme for audio signals. The watermark is embedded in the root-sum-square (RSS) of the two representative values of each signal block using the quantization technique. As a result, the RSS values are modified by scaling the signal according to the watermark bit sequence under the constraint of inaudibility relative to the human psycho-acoustic model. We also address and suggest solutions to the problem of synchronization and power scaling attacks. Experimental results show that the proposed scheme maintains high audio quality and robustness to various attacks including MP3 compression, re-sampling, jittering, and DA/AD conversion.
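
    The two representative values follow from Tsai's classic moment-preserving equations: with sample moments m1, m2, m3, the representatives z0 and z1 are the roots of z^2 + c1*z + c0 = 0, where c0 and c1 solve a 2x2 linear system. A sketch of the standard MPT computation (not the paper's full watermark embedder):

```python
import numpy as np

def mpt_representatives(block):
    """Tsai's moment-preserving thresholding: find z0 < z1 and the fraction p0
    of samples mapped to z0 such that the first three moments are preserved."""
    m1, m2, m3 = (np.mean(block ** k) for k in (1, 2, 3))
    det = m2 - m1 ** 2                      # nonzero for any non-constant block
    c0 = (m1 * m3 - m2 ** 2) / det
    c1 = (m1 * m2 - m3) / det
    disc = np.sqrt(c1 ** 2 - 4.0 * c0)
    z0, z1 = (-c1 - disc) / 2.0, (-c1 + disc) / 2.0
    p0 = (z1 - m1) / (z1 - z0)              # preserves the first moment
    rss = np.hypot(z0, z1)                  # root-sum-square carrying the watermark
    return z0, z1, p0, rss
```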

  21. Ultrasonic Leak Detection System

    NASA Technical Reports Server (NTRS)

    Youngquist, Robert C. (Inventor); Moerk, J. Steven (Inventor)

    1998-01-01

    A system for detecting ultrasonic vibrations, such as those generated by a small leak in a pressurized container, vessel, pipe, or the like, comprises an ultrasonic transducer assembly and a processing circuit for converting transducer signals into an audio frequency range signal. The audio frequency range signal can be used to drive a pair of headphones worn by an operator. A diode rectifier based mixing circuit provides a simple, inexpensive way to mix the transducer signal with a square wave signal generated by an oscillator, and thereby generate the audio frequency signal. The sensitivity of the system is greatly increased through proper selection and matching of the system components, and the use of noise rejection filters and elements. In addition, a parabolic collecting horn is preferably employed, mounted on the transducer assembly housing. The collecting horn increases sensitivity of the system by amplifying the received signals, and provides directionality which facilitates easier location of an ultrasonic vibration source.
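
    The diode mixer's job is heterodyning: multiplying the ultrasonic transducer signal by a square-wave local oscillator shifts the leak signature down into the audio band. A numerical sketch of the same operation; the frequencies and filter order are illustrative assumptions.

```python
import numpy as np
from scipy.signal import square, butter, lfilter

def heterodyne_to_audio(x, fs, f_lo=39e3, audio_cutoff=5e3):
    """Mix an ultrasonic signal with a square-wave oscillator and low-pass
    filter it, so a 40 kHz leak tone appears as a ~1 kHz audible tone."""
    t = np.arange(len(x)) / fs
    mixed = x * square(2.0 * np.pi * f_lo * t)    # difference term lands in audio band
    b, a = butter(4, audio_cutoff / (fs / 2.0))   # discard the sum term
    return lfilter(b, a, mixed)
```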

  22. How actions shape perception: learning action-outcome relations and predicting sensory outcomes promote audio-visual temporal binding

    PubMed Central

    Desantis, Andrea; Haggard, Patrick

    2016-01-01

    To maintain a temporally-unified representation of audio and visual features of objects in our environment, the brain recalibrates audio-visual simultaneity. This process allows adjustment for both differences in time of transmission and time for processing of audio and visual signals. In four experiments, we show that the cognitive processes for controlling instrumental actions also have strong influence on audio-visual recalibration. Participants learned that right and left hand button-presses each produced a specific audio-visual stimulus. Following one action the audio preceded the visual stimulus, while for the other action audio lagged vision. In a subsequent test phase, left and right button-press generated either the same audio-visual stimulus as learned initially, or the pair associated with the other action. We observed recalibration of simultaneity only for previously-learned audio-visual outcomes. Thus, learning an action-outcome relation promotes temporal grouping of the audio and visual events within the outcome pair, contributing to the creation of a temporally unified multisensory object. This suggests that learning action-outcome relations and the prediction of perceptual outcomes can provide an integrative temporal structure for our experiences of external events. PMID:27982063

  23. How actions shape perception: learning action-outcome relations and predicting sensory outcomes promote audio-visual temporal binding.

    PubMed

    Desantis, Andrea; Haggard, Patrick

    2016-12-16

    To maintain a temporally-unified representation of audio and visual features of objects in our environment, the brain recalibrates audio-visual simultaneity. This process allows adjustment for both differences in time of transmission and time for processing of audio and visual signals. In four experiments, we show that the cognitive processes for controlling instrumental actions also have strong influence on audio-visual recalibration. Participants learned that right and left hand button-presses each produced a specific audio-visual stimulus. Following one action the audio preceded the visual stimulus, while for the other action audio lagged vision. In a subsequent test phase, left and right button-press generated either the same audio-visual stimulus as learned initially, or the pair associated with the other action. We observed recalibration of simultaneity only for previously-learned audio-visual outcomes. Thus, learning an action-outcome relation promotes temporal grouping of the audio and visual events within the outcome pair, contributing to the creation of a temporally unified multisensory object. This suggests that learning action-outcome relations and the prediction of perceptual outcomes can provide an integrative temporal structure for our experiences of external events.

  24. Reconstruction of audio waveforms from spike trains of artificial cochlea models

    PubMed Central

    Zai, Anja T.; Bhargava, Saurabh; Mesgarani, Nima; Liu, Shih-Chii

    2015-01-01

    Spiking cochlea models describe the analog processing and spike generation process within the biological cochlea. Reconstructing the audio input from the artificial cochlea spikes is therefore useful for understanding the fidelity of the information preserved in the spikes. The reconstruction process is challenging particularly for spikes from the mixed signal (analog/digital) integrated circuit (IC) cochleas because of multiple non-linearities in the model and the additional variance caused by random transistor mismatch. This work proposes an offline method for reconstructing the audio input from spike responses of both a particular spike-based hardware model called the AEREAR2 cochlea and an equivalent software cochlea model. This method was previously used to reconstruct the auditory stimulus based on the peri-stimulus histogram of spike responses recorded in the ferret auditory cortex. The reconstructed audio from the hardware cochlea is evaluated against an analogous software model using objective measures of speech quality and intelligibility; and further tested in a word recognition task. The reconstructed audio under low signal-to-noise (SNR) conditions (SNR < –5 dB) gives a better classification performance than the original SNR input in this word recognition task. PMID:26528113

  25. Concurrent emotional pictures modulate temporal order judgments of spatially separated audio-tactile stimuli.

    PubMed

    Jia, Lina; Shi, Zhuanghua; Zang, Xuelian; Müller, Hermann J

    2013-11-06

    Although attention can be captured toward high-arousal stimuli, little is known about how perceiving emotion in one modality influences the temporal processing of non-emotional stimuli in other modalities. We addressed this issue by presenting observers with spatially uninformative emotional pictures while they performed an audio-tactile temporal-order judgment (TOJ) task. In Experiment 1, audio-tactile stimuli were presented at the same location straight ahead of the participants, who had to judge "which modality came first?". In Experiments 2 and 3, the audio-tactile stimuli were delivered one to the left and the other to the right side, and participants had to judge "which side came first?". We found both negative and positive high-arousal pictures to significantly bias TOJs towards the tactile and away from the auditory event when the audio-tactile stimuli were spatially separated; by contrast, there was no such bias when the audio-tactile stimuli originated from the same location. To further examine whether this bias is attributable to the emotional meanings conveyed by the pictures or to their high arousal effect, we compared and contrasted the influences of near-body threat vs. remote threat (emotional) pictures on audio-tactile TOJs in Experiment 3. The bias manifested only in the near-body threat condition. Taken together, the findings indicate that visual stimuli conveying meanings of near-body interaction activate a sensorimotor functional link prioritizing the processing of tactile over auditory signals when these signals are spatially separated. In contrast, audio-tactile signals from the same location engender strong crossmodal integration, thus counteracting modality-based attentional shifts induced by the emotional pictures.

  26. Ultrasonic speech translator and communications system

    DOEpatents

    Akerman, M.A.; Ayers, C.W.; Haynes, H.D.

    1996-07-23

    A wireless communication system undetectable by radio frequency methods for converting audio signals, including human voice, to electronic signals in the ultrasonic frequency range, transmitting the ultrasonic signal by way of acoustical pressure waves across a carrier medium, including gases, liquids, or solids, and reconverting the ultrasonic acoustical pressure waves back to the original audio signal. The ultrasonic speech translator and communication system includes an ultrasonic transmitting device and an ultrasonic receiving device. The ultrasonic transmitting device accepts as input an audio signal such as human voice input from a microphone or tape deck. The ultrasonic transmitting device frequency modulates an ultrasonic carrier signal with the audio signal producing a frequency modulated ultrasonic carrier signal, which is transmitted via acoustical pressure waves across a carrier medium such as gases, liquids or solids. The ultrasonic receiving device converts the frequency modulated ultrasonic acoustical pressure waves to a frequency modulated electronic signal, demodulates the audio signal from the ultrasonic carrier signal, and conditions the demodulated audio signal to reproduce the original audio signal at its output. 7 figs.
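
    The transmitter's essential operation, frequency-modulating an ultrasonic carrier with the audio signal, can be sketched numerically. The carrier frequency and deviation below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def fm_modulate(audio, fs, f_carrier=40e3, f_dev=3e3):
    """FM of an ultrasonic carrier: the instantaneous frequency is the carrier
    plus a deviation proportional to the audio sample (audio in [-1, 1])."""
    inst_freq = f_carrier + f_dev * audio
    phase = 2.0 * np.pi * np.cumsum(inst_freq) / fs
    return np.cos(phase)
```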

  27. Audio signal analysis for tool wear monitoring in sheet metal stamping

    NASA Astrophysics Data System (ADS)

    Ubhayaratne, Indivarie; Pereira, Michael P.; Xiang, Yong; Rolfe, Bernard F.

    2017-02-01

    Stamping tool wear can significantly degrade product quality, and hence, online tool condition monitoring is a timely need in many manufacturing industries. Even though a large amount of research has been conducted employing different sensor signals, there is still an unmet demand for a low-cost, easy-to-set-up condition monitoring system. Audio signal analysis is a simple method that has the potential to meet this demand, but has not been previously used for stamping process monitoring. Hence, this paper studies the existence and the significance of the correlation between emitted sound signals and the wear state of sheet metal stamping tools. The corrupting sources generated by the tooling of the stamping press and surrounding machinery have higher amplitudes than the sound emitted by the stamping operation itself. Therefore, a newly developed semi-blind signal extraction technique was employed as a pre-processing step to mitigate the contribution of these corrupting sources. The spectral analysis results of the raw and extracted signals demonstrate a significant qualitative relationship between wear progression and the emitted sound signature. This study lays the basis for employing low-cost audio signal analysis in the development of a real-time industrial tool condition monitoring system.

  28. Estimation of inhalation flow profile using audio-based methods to assess inhaler medication adherence.

    PubMed

    Taylor, Terence E; Lacalle Muls, Helena; Costello, Richard W; Reilly, Richard B

    2018-01-01

    Asthma and chronic obstructive pulmonary disease (COPD) patients are required to inhale forcefully and deeply to receive medication when using a dry powder inhaler (DPI). There is a clinical need to objectively monitor the inhalation flow profile of DPIs in order to remotely monitor patient inhalation technique. Audio-based methods have been previously employed to accurately estimate flow parameters such as the peak inspiratory flow rate of inhalations; however, these methods required multiple calibration inhalation audio recordings. In this study, an audio-based method is presented that accurately estimates inhalation flow profile using only one calibration inhalation audio recording. Twenty healthy participants were asked to perform 15 inhalations through a placebo Ellipta™ DPI at a range of inspiratory flow rates. Inhalation flow signals were recorded using a pneumotachograph spirometer while inhalation audio signals were recorded simultaneously using the Inhaler Compliance Assessment device attached to the inhaler. The acoustic (amplitude) envelope was estimated from each inhalation audio signal. Using only one recording, linear and power law regression models were employed to determine which model best described the relationship between the inhalation acoustic envelope and flow signal. Each model was then employed to estimate the flow signals of the remaining 14 inhalation audio recordings. This process was repeated until each of the 15 recordings had been employed to calibrate single models while testing on the remaining 14 recordings. It was observed that power law models generated the highest average flow estimation accuracy across all participants (90.89±0.9% for power law models and 76.63±2.38% for linear models). The method also generated sufficient accuracy in estimating inhalation parameters such as peak inspiratory flow rate and inspiratory capacity within the presence of noise. Estimating inhaler inhalation flow profiles using audio-based methods may be clinically beneficial for inhaler technique training and the remote monitoring of patient adherence.
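
    The reported power-law calibration can be reproduced by a linear fit in log-log space between the acoustic envelope and the reference flow. A minimal sketch under that assumption; variable names are illustrative.

```python
import numpy as np

def fit_power_law(envelope, flow):
    """Fit flow ~ a * envelope**b by least squares on log-transformed data."""
    mask = (envelope > 0) & (flow > 0)
    b, log_a = np.polyfit(np.log(envelope[mask]), np.log(flow[mask]), 1)
    return np.exp(log_a), b

def estimate_flow(envelope, a, b):
    # Apply the calibrated model to a new inhalation's acoustic envelope.
    return a * np.maximum(envelope, 1e-12) ** b
```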

  29. Estimation of inhalation flow profile using audio-based methods to assess inhaler medication adherence

    PubMed Central

    Lacalle Muls, Helena; Costello, Richard W.; Reilly, Richard B.

    2018-01-01

    Asthma and chronic obstructive pulmonary disease (COPD) patients are required to inhale forcefully and deeply to receive medication when using a dry powder inhaler (DPI). There is a clinical need to objectively monitor the inhalation flow profile of DPIs in order to remotely monitor patient inhalation technique. Audio-based methods have been previously employed to accurately estimate flow parameters such as the peak inspiratory flow rate of inhalations; however, these methods required multiple calibration inhalation audio recordings. In this study, an audio-based method is presented that accurately estimates inhalation flow profile using only one calibration inhalation audio recording. Twenty healthy participants were asked to perform 15 inhalations through a placebo Ellipta™ DPI at a range of inspiratory flow rates. Inhalation flow signals were recorded using a pneumotachograph spirometer while inhalation audio signals were recorded simultaneously using the Inhaler Compliance Assessment device attached to the inhaler. The acoustic (amplitude) envelope was estimated from each inhalation audio signal. Using only one recording, linear and power law regression models were employed to determine which model best described the relationship between the inhalation acoustic envelope and flow signal. Each model was then employed to estimate the flow signals of the remaining 14 inhalation audio recordings. This process was repeated until each of the 15 recordings had been employed to calibrate single models while testing on the remaining 14 recordings. It was observed that power law models generated the highest average flow estimation accuracy across all participants (90.89±0.9% for power law models and 76.63±2.38% for linear models). The method also generated sufficient accuracy in estimating inhalation parameters such as peak inspiratory flow rate and inspiratory capacity within the presence of noise. Estimating inhaler inhalation flow profiles using audio-based methods may be clinically beneficial for inhaler technique training and the remote monitoring of patient adherence. PMID:29346430

  30. Ultrasonic speech translator and communications system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Akerman, M.A.; Ayers, C.W.; Haynes, H.D.

    1996-07-23

    A wireless communication system undetectable by radio frequency methods for converting audio signals, including human voice, to electronic signals in the ultrasonic frequency range, transmitting the ultrasonic signal by way of acoustical pressure waves across a carrier medium, including gases, liquids, or solids, and reconverting the ultrasonic acoustical pressure waves back to the original audio signal. The ultrasonic speech translator and communication system includes an ultrasonic transmitting device and an ultrasonic receiving device. The ultrasonic transmitting device accepts as input an audio signal such as human voice input from a microphone or tape deck. The ultrasonic transmitting device frequency modulates an ultrasonic carrier signal with the audio signal producing a frequency modulated ultrasonic carrier signal, which is transmitted via acoustical pressure waves across a carrier medium such as gases, liquids or solids. The ultrasonic receiving device converts the frequency modulated ultrasonic acoustical pressure waves to a frequency modulated electronic signal, demodulates the audio signal from the ultrasonic carrier signal, and conditions the demodulated audio signal to reproduce the original audio signal at its output. 7 figs.

  31. Ultrasonic speech translator and communications system

    DOEpatents

    Akerman, M. Alfred; Ayers, Curtis W.; Haynes, Howard D.

    1996-01-01

    A wireless communication system undetectable by radio frequency methods for converting audio signals, including human voice, to electronic signals in the ultrasonic frequency range, transmitting the ultrasonic signal by way of acoustical pressure waves across a carrier medium, including gases, liquids, or solids, and reconverting the ultrasonic acoustical pressure waves back to the original audio signal. The ultrasonic speech translator and communication system (20) includes an ultrasonic transmitting device (100) and an ultrasonic receiving device (200). The ultrasonic transmitting device (100) accepts as input (115) an audio signal such as human voice input from a microphone (114) or tape deck. The ultrasonic transmitting device (100) frequency modulates an ultrasonic carrier signal with the audio signal producing a frequency modulated ultrasonic carrier signal, which is transmitted via acoustical pressure waves across a carrier medium such as gases, liquids or solids. The ultrasonic receiving device (200) converts the frequency modulated ultrasonic acoustical pressure waves to a frequency modulated electronic signal, demodulates the audio signal from the ultrasonic carrier signal, and conditions the demodulated audio signal to reproduce the original audio signal at its output (250).

  32. 47 CFR 10.520 - Common audio attention signal.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 47 Telecommunication 1 2011-10-01 2011-10-01 false Common audio attention signal. 10.520 Section... Equipment Requirements § 10.520 Common audio attention signal. A Participating CMS Provider and equipment manufacturers may only market devices for public use under part 10 that include an audio attention signal that...

  33. 47 CFR 10.520 - Common audio attention signal.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 47 Telecommunication 1 2013-10-01 2013-10-01 false Common audio attention signal. 10.520 Section... Equipment Requirements § 10.520 Common audio attention signal. A Participating CMS Provider and equipment manufacturers may only market devices for public use under part 10 that include an audio attention signal that...

  34. 47 CFR 10.520 - Common audio attention signal.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 47 Telecommunication 1 2014-10-01 2014-10-01 false Common audio attention signal. 10.520 Section... Equipment Requirements § 10.520 Common audio attention signal. A Participating CMS Provider and equipment manufacturers may only market devices for public use under part 10 that include an audio attention signal that...

  35. 47 CFR 10.520 - Common audio attention signal.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 47 Telecommunication 1 2012-10-01 2012-10-01 false Common audio attention signal. 10.520 Section... Equipment Requirements § 10.520 Common audio attention signal. A Participating CMS Provider and equipment manufacturers may only market devices for public use under part 10 that include an audio attention signal that...

  36. 47 CFR 11.51 - EAS code and Attention Signal Transmission requirements.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... Message (EOM) codes using the EAS Protocol. The Attention Signal must precede any emergency audio message... audio messages. No Attention Signal is required for EAS messages that do not contain audio programming... EAS messages in the main audio channel. All DAB stations shall also transmit EAS messages on all audio...

  37. 47 CFR 11.51 - EAS code and Attention Signal Transmission requirements.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... Message (EOM) codes using the EAS Protocol. The Attention Signal must precede any emergency audio message... audio messages. No Attention Signal is required for EAS messages that do not contain audio programming... EAS messages in the main audio channel. All DAB stations shall also transmit EAS messages on all audio...

  38. 47 CFR 11.51 - EAS code and Attention Signal Transmission requirements.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... Message (EOM) codes using the EAS Protocol. The Attention Signal must precede any emergency audio message... audio messages. No Attention Signal is required for EAS messages that do not contain audio programming... EAS messages in the main audio channel. All DAB stations shall also transmit EAS messages on all audio...

  39. On-line Tool Wear Detection on DCMT070204 Carbide Tool Tip Based on Noise Cutting Audio Signal using Artificial Neural Network

    NASA Astrophysics Data System (ADS)

    Prasetyo, T.; Amar, S.; Arendra, A.; Zam Zami, M. K.

    2018-01-01

    This study develops an on-line detection system to predict the wear of a DCMT070204 tool tip during cutting of the workpiece. The machine used in this research is a CNC ProTurn 9000 cutting an ST42 steel cylinder. The audio signal was captured using a microphone placed on the tool post and recorded in Matlab at a sampling rate of 44.1 kHz with a frame size of 1024 samples. The recorded data set consists of 110 records derived from the audio signal while cutting with a normal tool and a worn tool. Signal features were then extracted in the frequency domain using the Fast Fourier Transform, and feature selection was performed based on correlation analysis. Tool wear classification was carried out using an artificial neural network with the 33 selected input features, trained with the back-propagation method. Classification performance testing yields an accuracy of 74%.
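
    A sketch of the feature pipeline described above: FFT magnitude spectra of 1024-sample frames followed by correlation-based selection of the 33 bins most correlated with the wear label. The windowing and selection details are assumptions, since the abstract does not specify them.

```python
import numpy as np

def spectral_features(frames, n_fft=1024):
    """Magnitude spectrum of each windowed audio frame (rows are frames)."""
    window = np.hanning(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, n=n_fft, axis=1))

def select_by_correlation(features, labels, k=33):
    """Indices of the k frequency bins most correlated with the wear label."""
    corr = np.array([abs(np.corrcoef(features[:, i], labels)[0, 1])
                     for i in range(features.shape[1])])
    return np.argsort(corr)[-k:]
```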

  40. A digital audio/video interleaving system. [for Shuttle Orbiter

    NASA Technical Reports Server (NTRS)

    Richards, R. W.

    1978-01-01

    A method of interleaving an audio signal with its associated video signal for simultaneous transmission or recording, and the subsequent separation of the two signals, is described. Comparisons are made between the new audio signal interleaving system and the Skylab PAM audio/video interleaving system, pointing out improvements gained by using the digital audio/video interleaving system. It was found that the digital technique is the simplest, most effective and most reliable method for interleaving audio and/or other types of data into the video signal for the Shuttle Orbiter application. Details of the design of a multiplexer capable of accommodating two basic data channels, each consisting of a single 31.5-kb/s digital bit stream, are given. An adaptive slope delta modulation system is introduced to digitize audio signals, producing a high immunity of word intelligibility to channel errors, primarily due to the robust nature of the delta-modulation algorithm.

  41. Low-cost mm-wave Doppler/FMCW transceivers for ground surveillance applications

    NASA Astrophysics Data System (ADS)

    Hansen, H. J.; Lindop, R. W.; Majstorovic, D.

    2005-12-01

    A 35 GHz Doppler CW/FMCW transceiver (equivalent radiated power ERP = 30 dBm) has been assembled and its operation described. Both instantaneous beat signals (relating to range in FMCW mode) and Doppler signals (relating to targets moving at ~1.5 m/s) exhibit audio frequencies. Consequently, the radar processing is provided by a laptop PC using its inbuilt video-audio media system with appropriate MathWorks software. The implications of radar-on-chip developments are addressed.

  42. Characteristics of audio and sub-audio telluric signals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Telford, W.M.

    1977-06-01

    Telluric current measurements in the audio and sub-audio frequency range, made in various parts of Canada and South America over the past four years, indicate that the signal amplitude is relatively uniform over 6 to 8 midday hours (LMT) except in Chile and that the signal anisotropy is reasonably constant in azimuth.

  43. Ultra-low-cost clinical pulse oximetry.

    PubMed

    Petersen, Christian L; Gan, Heng; MacInnis, Martin J; Dumont, Guy A; Ansermino, J Mark

    2013-01-01

    An ultra-low-cost pulse oximeter is presented that interfaces a conventional clinical finger sensor with a mobile phone through the headset jack audio interface. All signal processing is performed using the audio subsystem of the phone. In a preliminary volunteer study in a hypoxia chamber, we compared the oxygen saturation obtained with the audio pulse oximeter against a commercially available (and FDA approved) reference pulse oximeter (Nonin Xpod). Good agreement was found between the outputs of the two devices.

  44. Design and implementation of an audio indicator

    NASA Astrophysics Data System (ADS)

    Zheng, Shiyong; Li, Zhao; Li, Biqing

    2017-04-01

    This paper proposes an audio indicator built around a C9014 transistor amplifier stage, an LED level display driven by an operational amplifier, and a CD4017 decade counter/distributor. The circuit can drive neon and holiday lighting in response to an audio signal. The input audio signal is first amplified by the C9014-based power amplifier; a potentiometer taps off the amplified signal and feeds it to the CD4017 distributor, which counts and drives the LEDs so that the display reflects the running state of the circuit. Using only a single counter IC, this simple audio indicator produces a two-color LED chase effect that follows the audio signal, so the general variation in the level and frequency of the audio signal can be read from the display. The lights can operate in four modes, including jumping and gradual change, and the circuit can be used in homes, hotels, discos, theaters, advertising, and many other settings.

  45. Digital Multicasting of Multiple Audio Streams

    NASA Technical Reports Server (NTRS)

    Macha, Mitchell; Bullock, John

    2007-01-01

    The Mission Control Center Voice Over Internet Protocol (MCC VOIP) system (see figure) comprises hardware and software that effect simultaneous, nearly real-time transmission of as many as 14 different audio streams to authorized listeners via the MCC intranet and/or the Internet. The original version of the MCC VOIP system was conceived to enable flight-support personnel located in offices outside a spacecraft mission control center to monitor audio loops within the mission control center. Different versions of the MCC VOIP system could be used for a variety of public and commercial purposes - for example, to enable members of the general public to monitor one or more NASA audio streams through their home computers, to enable air-traffic supervisors to monitor communication between airline pilots and air-traffic controllers in training, and to monitor conferences among brokers in a stock exchange. At the transmitting end, the audio-distribution process begins with feeding the audio signals to analog-to-digital converters. The resulting digital streams are sent through the MCC intranet, using a user datagram protocol (UDP), to a server that converts them to encrypted data packets. The encrypted data packets are then routed to the personal computers of authorized users by use of multicasting techniques. The total data-processing load on the portion of the system upstream of and including the encryption server is the total load imposed by all of the audio streams being encoded, regardless of the number of the listeners or the number of streams being monitored concurrently by the listeners. The personal computer of a user authorized to listen is equipped with special-purpose MCC audio-player software. When the user launches the program, the user is prompted to provide identification and a password. In one of two access-control provisions, the program is hard-coded to validate the user's identity and password against a list maintained on a domain-controller computer at the MCC. In the other access-control provision, the program verifies that the user is authorized to have access to the audio streams. Once both access-control checks are completed, the audio software presents a graphical display that includes audio-stream-selection buttons and volume-control sliders. The user can select all or any subset of the available audio streams and can adjust the volume of each stream independently of that of the other streams. The audio-player program spawns a "read" process for the selected stream(s). The spawned process sends, to the router(s), a "multicast-join" request for the selected streams. The router(s) responds to the request by sending the encrypted multicast packets to the spawned process. The spawned process receives the encrypted multicast packets and sends a decryption packet to audio-driver software. As the volume or muting features are changed by the user, interrupts are sent to the spawned process to change the corresponding attributes sent to the audio-driver software. The total latency of this system - that is, the total time from the origination of the audio signals to generation of sound at a listener's computer - lies between four and six seconds.
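
    The "multicast-join" request mentioned above is an ordinary IP multicast group membership. A listener process would join the group roughly as follows; the group address and port are illustrative, and decryption and playback are elided.

```python
import socket
import struct

MCAST_GRP, MCAST_PORT = "239.1.1.7", 5004   # illustrative group and port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", MCAST_PORT))
# The "multicast-join": ask the router to forward this group's packets here.
mreq = struct.pack("4sl", socket.inet_aton(MCAST_GRP), socket.INADDR_ANY)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

while True:
    packet, _ = sock.recvfrom(2048)  # one encrypted audio packet
    # decrypt the packet and hand the samples to the audio driver here
```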

  46. Automated Cough Assessment on a Mobile Platform

    PubMed Central

    2014-01-01

    The development of an Automated System for Asthma Monitoring (ADAM) is described. This consists of a consumer electronics mobile platform running a custom application. The application acquires an audio signal from an external user-worn microphone connected to the device analog-to-digital converter (microphone input). This signal is processed to determine the presence or absence of cough sounds. Symptom tallies and raw audio waveforms are recorded and made easily accessible for later review by a healthcare provider. The symptom detection algorithm is based upon standard speech recognition and machine learning paradigms and consists of an audio feature extraction step followed by a Hidden Markov Model based Viterbi decoder that has been trained on a large database of audio examples from a variety of subjects. Multiple Hidden Markov Model topologies and orders are studied. Performance of the recognizer is presented in terms of the sensitivity and the rate of false alarm as determined in a cross-validation test. PMID:25506590

  47. Audio Distribution and Monitoring Circuit

    NASA Technical Reports Server (NTRS)

    Kirkland, J. M.

    1983-01-01

    Versatile circuit accepts and distributes TV audio signals. The three-meter audio distribution and monitoring circuit provides flexibility in monitoring, mixing, and distributing audio inputs and outputs at various signal and impedance levels. Program material can be monitored simultaneously on three channels, or a single-channel version can be built to monitor transmitted or received signal levels, drive speakers, interface to building communications, and drive long-line circuits.

  48. Power saver circuit for audio/visual signal unit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Right, R. W.

    1985-02-12

    A combined audio and visual signal unit with the audio and visual components actuated alternately and powered over a single cable pair in such a manner that only one of the audio and visual components is drawing power from the power supply at any given instant. Thus, the power supply is never called upon to provide more energy than that drawn by the one of the components having the greater power requirement. This is particularly advantageous when several combined audio and visual signal units are coupled in parallel on one cable pair. Typically, the signal unit may comprise a horn and a strobe light for a fire alarm signalling system.

  49. Perceptually controlled doping for audio source separation

    NASA Astrophysics Data System (ADS)

    Mahé, Gaël; Nadalin, Everton Z.; Suyama, Ricardo; Romano, João MT

    2014-12-01

    The separation of an underdetermined audio mixture can be performed through sparse component analysis (SCA), which relies, however, on the strong hypothesis that source signals are sparse in some domain. To overcome this difficulty in the case where the original sources are available before the mixing process, informed source separation (ISS) embeds in the mixture a watermark whose information can help a further separation. Though powerful, this technique is generally specific to a particular mixing setup and may be compromised by an additional bitrate compression stage. Thus, instead of watermarking, we propose a "doping" method that makes the time-frequency representation of each source more sparse, while preserving its audio quality. This method is based on an iterative decrease of the distance between the distribution of the signal and a target sparse distribution, under a perceptual constraint. We aim to show that the proposed approach is robust to audio coding and that the use of the sparsified signals improves the source separation, in comparison with the original sources. In this work, the analysis is restricted to instantaneous mixtures and focused on voice sources.

  50. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shull, D.

    This report documents the initial feasibility tests performed using a commercial acoustic emission instrument for the purpose of detecting beetles in Department of Energy 9975 shipping packages. The device selected for this testing was a commercial handheld instrument and probe developed for the detection of termites, weevils, beetles and other insect infestations in wooden structures, trees, plants and soil. The results of two rounds of testing are presented. The first tests were performed by the vendor using only the hand-held instrument's indications and real-time operator analysis of the audio signal content. The second tests included hands-free positioning of the instrument probe and post-collection analysis of the recorded audio signal content including audio background comparisons. The test results indicate that the system is promising for detecting the presence of drugstore beetles; however, additional work would be needed to improve the ease of detection and to automate the signal processing to eliminate the need for human interpretation. Mechanisms for hands-free positioning of the probe and audio background discrimination are also necessary for reliable detection and to reduce potential operator dose in radiation environments.

  51. Digital signal processing techniques for pitch shifting and time scaling of audio signals

    NASA Astrophysics Data System (ADS)

    Buś, Szymon; Jedrzejewski, Konrad

    2016-09-01

    In this paper, we present the techniques used for modifying the spectral content (pitch shifting) and for changing the time duration (time scaling) of an audio signal. A short introduction gives a necessary background for understanding the discussed issues and contains explanations of the terms used in the paper. In subsequent sections we present three different techniques appropriate both for pitch shifting and for time scaling. These techniques use three different time-frequency representations of a signal, namely short-time Fourier transform (STFT), continuous wavelet transform (CWT) and constant-Q transform (CQT). The results of simulation studies devoted to comparison of the properties of these methods are presented and discussed in the paper.
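
    Of the three representations, the STFT underlies the classic phase vocoder, one standard way to realize time scaling (an assumption here, since the abstract does not give the paper's exact algorithms); pitch shifting by a factor r then follows by time-stretching by r and resampling by 1/r. A compact sketch with conventional frame and hop sizes:

```python
import numpy as np

def phase_vocoder(x, rate, n_fft=2048, hop=512):
    """STFT phase vocoder time scaling: rate > 1 shortens, rate < 1 lengthens."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft, hop)]
    stft = np.array([np.fft.rfft(f) for f in frames])

    omega = 2.0 * np.pi * hop * np.arange(n_fft // 2 + 1) / n_fft  # nominal bin phase advance
    steps = np.arange(0, len(stft) - 1, rate)
    out = np.zeros(len(steps) * hop + n_fft)
    phase = np.angle(stft[0])
    for k, t in enumerate(steps):
        i = int(t)
        # Deviation of the measured phase increment, wrapped to [-pi, pi]
        dphi = np.angle(stft[i + 1]) - np.angle(stft[i]) - omega
        dphi -= 2.0 * np.pi * np.round(dphi / (2.0 * np.pi))
        frame = np.fft.irfft(np.abs(stft[i]) * np.exp(1j * phase))
        out[k * hop:k * hop + n_fft] += frame * win  # overlap-add synthesis
        phase += omega + dphi                        # accumulate true phase
    return out
```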

  52. Robust High-Capacity Audio Watermarking Based on FFT Amplitude Modification

    NASA Astrophysics Data System (ADS)

    Fallahpour, Mehdi; Megías, David

    This paper proposes a novel robust audio watermarking algorithm to embed data and extract it in a bit-exact manner based on changing the magnitudes of the FFT spectrum. The key point is selecting a frequency band for embedding based on the comparison between the original and the MP3 compressed/decompressed signal and on a suitable scaling factor. The experimental results show that the method has a very high capacity (about 5 kbps), without significant perceptual distortion (ODG about -0.25) and provides robustness against common audio signal processing such as added noise, filtering and MPEG compression (MP3). Furthermore, the proposed method has a larger capacity (number of embedded bits to number of host bits rate) than recent image data hiding methods.
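
    As a simplified stand-in for the basic operation, the sketch below embeds one bit by scaling FFT magnitudes in a chosen band. The band, the scaling factor, the actual band selection (which compares the original against its MP3-compressed version), and the bit-exact extraction are all beyond this illustration.

```python
import numpy as np

def embed_bit(x, bit, fs, band=(8000.0, 9000.0), delta=0.02):
    """Scale the FFT magnitudes inside `band` up for a 1 bit, down for a 0 bit.
    Band and scaling factor are illustrative, not the paper's values."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    sel = (freqs >= band[0]) & (freqs < band[1])
    X[sel] *= (1.0 + delta) if bit else (1.0 - delta)
    return np.fft.irfft(X, n=len(x))
```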

  13. Detection and characterization of lightning-based sources using continuous wavelet transform: application to audio-magnetotellurics

    NASA Astrophysics Data System (ADS)

    Larnier, H.; Sailhac, P.; Chambodut, A.

    2018-01-01

    Atmospheric electromagnetic waves created by global lightning activity contain information about electrical processes of the inner and the outer Earth. Large signal-to-noise ratio events are particularly interesting because they convey information about electromagnetic properties along their path. We introduce a new methodology to automatically detect and characterize lightning-based waves using a time-frequency decomposition obtained through application of the continuous wavelet transform. We focus specifically on three types of sources, namely atmospherics, slow tails and whistlers, which cover the frequency range 10 Hz to 10 kHz. Each wave has distinguishable characteristics in the time-frequency domain due to its source shape and dispersion processes. Our methodology allows automatic detection of each type of event in the time-frequency decomposition thanks to its specific signature. Horizontal polarization attributes are also recovered in the time-frequency domain. This procedure is first applied to synthetic extremely low frequency time series with different signal-to-noise ratios to test for robustness. We then apply it to real data: three stations of audio-magnetotelluric data acquired in Guadeloupe, an overseas French territory. Most of the analysed atmospherics and slow tails display linear polarization, whereas the analysed whistlers are elliptically polarized. The diversity of lightning activity is finally analysed in an audio-magnetotelluric data processing framework, as used in subsurface prospecting, through estimation of the impedance response functions. We show that audio-magnetotelluric processing results depend mainly on the frequency content of the electromagnetic waves observed in the processed time series, with an emphasis on the difference between morning and afternoon acquisition. Our new methodology based on the time-frequency signature of lightning-induced electromagnetic waves allows automatic detection and characterization of events in audio-magnetotelluric time series, providing the means to assess the quality of response functions obtained through processing.
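
    A minimal sketch of the detection step, assuming PyWavelets for the continuous wavelet transform: compute a Morlet scalogram over the band of interest and flag times where the broadband wavelet energy stands well above the background. The scales, threshold, and grouping of neighbouring samples into discrete events are placeholders, not the calibrated procedure of the paper.

        # Sketch: flag transient broadband events (e.g. atmospherics) in a
        # time series by thresholding a Morlet scalogram.
        import numpy as np
        import pywt

        def detect_events(x, fs, fmin=10.0, fmax=10e3, k_sigma=5.0):
            freqs = np.geomspace(fmin, min(fmax, fs/2.5), 48)
            scales = pywt.central_frequency('morl') * fs / freqs
            coef, _ = pywt.cwt(x, scales, 'morl', sampling_period=1.0/fs)
            power = np.abs(coef).sum(axis=0)       # broadband energy vs. time
            thresh = power.mean() + k_sigma*power.std()
            hits = np.where(power > thresh)[0]     # grouping into events omitted
            return hits / fs                       # candidate event times (s)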

  14. 47 CFR 10.520 - Common audio attention signal.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 47 Telecommunication 1 2010-10-01 2010-10-01 false Common audio attention signal. 10.520 Section 10.520 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL COMMERCIAL MOBILE ALERT SYSTEM Equipment Requirements § 10.520 Common audio attention signal. A Participating CMS Provider and equipment...

  15. Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.

    PubMed

    Gebru, Israel D; Ba, Sileye; Li, Xiaofei; Horaud, Radu

    2018-05-01

    Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes in a principled way speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset, that contains audio-visual training data as well as a number of scenarios involving several participants engaged in formal and informal dialogue, is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the-art diarization algorithms.

  16. Synchronized and noise-robust audio recordings during realtime magnetic resonance imaging scans.

    PubMed

    Bresch, Erik; Nielsen, Jon; Nayak, Krishna; Narayanan, Shrikanth

    2006-10-01

    This letter describes a data acquisition setup for recording, and processing, running speech from a person in a magnetic resonance imaging (MRI) scanner. The main focus is on ensuring synchronicity between image and audio acquisition, and on obtaining a good signal-to-noise ratio to facilitate further speech analysis and modeling. A field-programmable gate array based hardware design for synchronizing the scanner image acquisition to other external data such as audio is described. The audio setup itself features two fiber optical microphones and a noise-canceling filter. Two noise cancellation methods are described, including a novel approach using a pulse sequence specific model of the gradient noise of the MRI scanner. The setup is useful for scientific speech production studies. Sample results of speech and singing data acquired and processed using the proposed method are given.

  17. Synchronized and noise-robust audio recordings during realtime magnetic resonance imaging scans (L)

    PubMed Central

    Bresch, Erik; Nielsen, Jon; Nayak, Krishna; Narayanan, Shrikanth

    2007-01-01

    This letter describes a data acquisition setup for recording, and processing, running speech from a person in a magnetic resonance imaging (MRI) scanner. The main focus is on ensuring synchronicity between image and audio acquisition, and on obtaining a good signal-to-noise ratio to facilitate further speech analysis and modeling. A field-programmable gate array based hardware design for synchronizing the scanner image acquisition to other external data such as audio is described. The audio setup itself features two fiber optical microphones and a noise-canceling filter. Two noise cancellation methods are described, including a novel approach using a pulse sequence specific model of the gradient noise of the MRI scanner. The setup is useful for scientific speech production studies. Sample results of speech and singing data acquired and processed using the proposed method are given. PMID:17069275

  18. Speaker Localisation Using Time Difference of Arrival

    DTIC Science & Technology

    2008-04-01

    School of Electrical and Electronic Engineering of the University of Adelaide. His area of expertise and interest is in Signal Processing including audio ... support of Theatre intelligence capabilities. His recent research interests include: information visualisation, text and data mining, and speech and ... by: steering microphone arrays to improve the quality of audio pickup for recording, communication and transcription; enhancing the separation – and

  19. 47 CFR 73.128 - AM stereophonic broadcasting.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... channel reversed. (iii) Left and Right Channel only, under all conditions of modulation for the... (NRSC-1). (2) The left and right channel audio signals shall conform to frequency response limitations...)=audio signal left channel, R(t)=audio signal right channel, m=modulation factor, and mpeak(L(t)+R(t))=1...

  20. 3D Audio System

    NASA Technical Reports Server (NTRS)

    1992-01-01

    Ames Research Center research into virtual reality led to the development of the Convolvotron, a high speed digital audio processing system that delivers three-dimensional sound over headphones. It consists of a two-card set designed for use with a personal computer. The Convolvotron's primary application is presentation of 3D audio signals over headphones. Four independent sound sources are filtered with large time-varying filters that compensate for motion. The perceived location of the sound remains constant. Possible applications are in air traffic control towers or airplane cockpits, hearing and perception research and virtual reality development.
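
    The core operation behind such headphone 3D audio is convolution of a mono source with a pair of head-related impulse responses (HRIRs); the Convolvotron does this with large time-varying filters in hardware. A minimal static sketch, assuming measured HRIRs for the desired direction are available as arrays:

        # Core of headphone 3D audio: convolve a mono source with left/right
        # HRIRs; hrir_l and hrir_r are assumed to come from a measured set.
        import numpy as np
        from scipy.signal import fftconvolve

        def binauralize(mono, hrir_l, hrir_r):
            left = fftconvolve(mono, hrir_l, mode='full')
            right = fftconvolve(mono, hrir_r, mode='full')
            out = np.stack([left, right], axis=1)   # (samples, 2) for playback
            return out / np.max(np.abs(out))        # normalize to avoid clipping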

  1. Multi-channel spatialization systems for audio signals

    NASA Technical Reports Server (NTRS)

    Begault, Durand R. (Inventor)

    1993-01-01

    Synthetic head related transfer functions (HRTF's) for imposing reprogrammable spatial cues to a plurality of audio input signals included, for example, in multiple narrow-band audio communications signals received simultaneously are generated and stored in interchangeable programmable read only memories (PROM's) which store both head related transfer function impulse response data and source positional information for a plurality of desired virtual source locations. The analog inputs of the audio signals are filtered and converted to digital signals from which synthetic head related transfer functions are generated in the form of linear phase finite impulse response filters. The outputs of the impulse response filters are subsequently reconverted to analog signals, filtered, mixed, and fed to a pair of headphones.

  2. Multi-channel spatialization system for audio signals

    NASA Technical Reports Server (NTRS)

    Begault, Durand R. (Inventor)

    1995-01-01

    Synthetic head related transfer functions (HRTF's) for imposing reprogrammable spatial cues to a plurality of audio input signals included, for example, in multiple narrow-band audio communications signals received simultaneously are generated and stored in interchangeable programmable read only memories (PROM's) which store both head related transfer function impulse response data and source positional information for a plurality of desired virtual source locations. The analog inputs of the audio signals are filtered and converted to digital signals from which synthetic head related transfer functions are generated in the form of linear phase finite impulse response filters. The outputs of the impulse response filters are subsequently reconverted to analog signals, filtered, mixed, and fed to a pair of headphones.

  3. Orbital component extraction by time-variant sinusoidal modeling.

    NASA Astrophysics Data System (ADS)

    Sinnesael, Matthias; Zivanovic, Miroslav; De Vleeschouwer, David; Claeys, Philippe; Schoukens, Johan

    2016-04-01

    Accurately deciphering periodic variations in paleoclimate proxy signals is essential for cyclostratigraphy. Classical spectral analysis often relies on methods based on the (Fast) Fourier Transform. This technique has no unique solution for separating variations in amplitude and frequency, which makes it difficult to interpret a proxy's power spectrum correctly or to evaluate simultaneous changes in amplitude and frequency in evolutionary analyses. Here, we circumvent this drawback by using a polynomial approach to estimate instantaneous amplitude and frequency in orbital components. This approach has proven useful for characterizing audio signals (music and speech), which are non-stationary in nature (Zivanovic and Schoukens, 2010, 2012). Paleoclimate proxy signals and audio signals have similar dynamics; the only difference is the frequency relationship between the different components, which is harmonic in audio signals and non-harmonic in paleoclimate signals. This difference, however, is irrelevant for the problem at hand. Using a sliding-window approach, the model captures time variations of an orbital component by modulating a stationary sinusoid centered at its mean frequency with a single polynomial. Hence, the parameters that determine the model are the mean frequency of the orbital component and the polynomial coefficients. The first parameter depends on geologic interpretation, whereas the latter are estimated by linear least squares. As an output, the model provides the orbital component waveform, either in the depth or time domain, and allows for a unique decomposition of the signal into its instantaneous amplitude and frequency. Frequency modulation patterns can be used to reconstruct changes in accumulation rate, whereas amplitude modulation can be used to reconstruct, e.g., eccentricity-modulated precession. The time-variant sinusoidal model is applied to well-established Pleistocene benthic isotope records to evaluate its performance. Zivanovic, M., and Schoukens, J. (2010). On the polynomial approximation for time-variant harmonic signal modeling. IEEE Transactions on Audio, Speech, and Language Processing, 19(3), 458-467. doi:10.1109/TASL.2010.2049673. Zivanovic, M., and Schoukens, J. (2012). Single and piecewise polynomials for modeling of pitched sounds. IEEE Transactions on Audio, Speech, and Language Processing, 20(4), 1270-1281. doi:10.1109/TASL.2011.2174228.

  4. A haptic-inspired audio approach for structural health monitoring decision-making

    NASA Astrophysics Data System (ADS)

    Mao, Zhu; Todd, Michael; Mascareñas, David

    2015-03-01

    Haptics is the field at the interface of human touch (tactile sensation) and classification, whereby tactile feedback is used to train and inform a decision-making process. In structural health monitoring (SHM) applications, haptic devices have been introduced and applied in a simplified laboratory-scale scenario, in which nonlinearity, representing the presence of damage, was encoded into a vibratory manual interface. In this paper, the "spirit" of haptics is adopted, but here ultrasonic guided wave scattering information is transformed into audio (rather than tactile) range signals. After sufficient training, the structural damage condition, including occurrence and location, can be identified through the encoded audio waveforms. Different algorithms are employed in this paper to generate the transformed audio signals; the performance of each encoding algorithm is compared against the others and against standard machine learning classifiers. In the long run, haptic decision-making aims to detect and classify structural damage in more rigorous environments, in a baseline-free fashion with embedded temperature compensation.

  5. Signals Intelligence - Processing - Analysis - Classification

    DTIC Science & Technology

    2009-10-01

    Example: Language identification from audio signals. In a certain mission, a set of languages seems important beforehand. These languages will – with a...Uebler, Ulla (2003) The Visualisation of Diverse Intelligence. In Proceedings NATO (Research and Technology Agency) conference on “Military Data

  6. 47 CFR 78.1 - Purpose.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... television and related audio signals, signals of standard and FM broadcast stations, signals of BRS/EBS fixed... audio signals to TV translator and low-power TV stations. [69 FR 72046, Dec. 10, 2004] ...

  7. 47 CFR 78.1 - Purpose.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... television and related audio signals, signals of standard and FM broadcast stations, signals of BRS/EBS fixed... audio signals to TV translator and low-power TV stations. [69 FR 72046, Dec. 10, 2004] ...

  8. 47 CFR 78.1 - Purpose.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... television and related audio signals, signals of standard and FM broadcast stations, signals of BRS/EBS fixed... audio signals to TV translator and low-power TV stations. [69 FR 72046, Dec. 10, 2004] ...

  9. 47 CFR 78.1 - Purpose.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... television and related audio signals, signals of standard and FM broadcast stations, signals of BRS/EBS fixed... audio signals to TV translator and low-power TV stations. [69 FR 72046, Dec. 10, 2004] ...

  10. 47 CFR 78.1 - Purpose.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... television and related audio signals, signals of standard and FM broadcast stations, signals of BRS/EBS fixed... audio signals to TV translator and low-power TV stations. [69 FR 72046, Dec. 10, 2004] ...

  11. Multisensory and modality specific processing of visual speech in different regions of the premotor cortex

    PubMed Central

    Callan, Daniel E.; Jones, Jeffery A.; Callan, Akiko

    2014-01-01

    Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex (PMC) has been shown to be active during both observation and execution of action ("Mirror System" properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker's articulating face and heard her voice), visual only (only saw the speaker's articulating face), and audio only (only heard the speaker's voice) conditions with varying audio signal-to-noise ratios in order to determine the regions of the PMC involved with multisensory and modality specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli to control for task difficulty and differences in intelligibility. The results of the fMRI analysis for visual only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and PMC. The left ventral inferior premotor cortex (PMvi) showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex (PMvs/PMd) did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas more superior and dorsal regions of the PMC are involved with mapping unimodal (in this case visual) sensory features of the speech signal with articulatory speech gestures. PMID:24860526

  12. Singing voice detection for karaoke application

    NASA Astrophysics Data System (ADS)

    Shenoy, Arun; Wu, Yuansheng; Wang, Ye

    2005-07-01

    We present a framework to detect the regions of singing voice in musical audio signals. This work is oriented towards the development of a robust transcriber of lyrics for karaoke applications. The technique leverages a combination of low-level audio features and higher-level musical knowledge of rhythm and tonality. Musical knowledge of the key is used to create a song-specific filterbank to attenuate the presence of the pitched musical instruments. This is followed by subband processing of the audio to detect the musical octaves in which the vocals are present. Text processing is employed to approximate the duration of the sung passages using freely available lyrics. This is used to obtain a dynamic threshold for vocal/non-vocal segmentation. This pairing of audio and text processing helps create a more accurate system. Experimental evaluation on a small database of popular songs shows the validity of the proposed approach. Holistic and per-component evaluation of the system is conducted and various improvements are discussed.

  13. 47 CFR 73.756 - System specifications for double-sideband (DBS) modulated emissions in the HF broadcasting service.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 47 Telecommunication 4 2010-10-01 2010-10-01 false System specifications for double-sideband (DBS... Stations § 73.756 System specifications for double-sideband (DBS) modulated emissions in the HF... processing. If audio-frequency signal processing is used, the dynamic range of the modulating signal shall be...

  14. Low-order auditory Zernike moment: a novel approach for robust music identification in the compressed domain

    NASA Astrophysics Data System (ADS)

    Li, Wei; Xiao, Chuan; Liu, Yaduo

    2013-12-01

    Audio identification via fingerprint has been an active research field for years. However, most previously reported methods work on the raw audio format, in spite of the fact that nowadays compressed-format audio, especially MP3 music, has grown into the dominant way to store music on personal computers and transmit it over the Internet. It would be desirable if an unknown compressed audio fragment could be recognized directly from the database without first decompressing it into the wave format. So far, very few algorithms run directly on the compressed domain for music information retrieval, and most of them take advantage of the modified discrete cosine transform coefficients or derived cepstrum and energy types of features. As a first attempt, we propose in this paper utilizing the compressed-domain auditory Zernike moment, adapted from image processing techniques, as the key feature to devise a novel robust audio identification algorithm. Such a fingerprint exhibits strong robustness, due to its statistically stable nature, against various audio signal distortions such as recompression, noise contamination, echo adding, equalization, band-pass filtering, pitch shifting, and slight time scale modification. Experimental results show that in a music database composed of 21,185 MP3 songs, a 10-s long music segment is able to identify its original near-duplicate recording, with an average top-5 hit rate of 90% or above even under severe audio signal distortions.

  15. Apparatus for providing sensory substitution of force feedback

    NASA Technical Reports Server (NTRS)

    Massimino, Michael J. (Inventor); Sheridan, Thomas B. (Inventor)

    1995-01-01

    A feedback apparatus for an operator to control an effector that is remote from the operator to interact with a remote environment has a local input device to be manipulated by the operator. Sensors in the effector's environment are capable of sensing the amplitude of forces arising between the effector and its environment, the direction of application of such forces, or both amplitude and direction. A feedback signal corresponding to such a component of the force is generated and transmitted to the environment of the operator. The signal is transduced into an auditory sensory substitution signal to which the operator is sensitive. Sound production apparatus present the auditory signal to the operator. The full range of the force amplitude may be represented by a single audio speaker. Auditory display elements may be stereo headphones or free-standing audio speakers, numbering from one to many more than two. The location of the application of the force may also be specified by the location of audio speakers that generate signals corresponding to specific forces. Alternatively, the location may be specified by the frequency of an audio signal, or by the apparent location of an audio signal, as simulated by a combination of signals originating at different locations.

  16. Miles Hand Grenade

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harrington, John J.; Buttz, James H.; Maish, Alex B.

    2005-11-15

    A simulated grenade for MILES-type simulations generates a unique RF signal and a unique audio signal. A detector utilizes the time between receipt of the RF signal and the slower-traveling audio signal to determine the distance between the detector and the simulated grenade.
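
    The range computation implied here is simple: the RF signal arrives essentially instantly, so the distance is the speed of sound multiplied by the RF-to-audio delay. A tiny sketch with illustrative numbers:

        # Distance from the RF/audio arrival gap: RF travel time is negligible
        # at these ranges, so range ~ speed_of_sound * delay.
        SPEED_OF_SOUND = 343.0          # m/s in air at about 20 degrees C

        def grenade_range_m(t_rf, t_audio):
            return SPEED_OF_SOUND * (t_audio - t_rf)

        print(grenade_range_m(0.000, 0.125))  # 0.125 s gap -> ~42.9 m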

  17. TECHNICAL NOTE: Portable audio electronics for impedance-based measurements in microfluidics

    NASA Astrophysics Data System (ADS)

    Wood, Paul; Sinton, David

    2010-08-01

    We demonstrate the use of audio electronics-based signals to perform on-chip electrochemical measurements. Cell phones and portable music players are examples of consumer electronics that are easily operated and are ubiquitous worldwide. Audio output (play) and input (record) signals are voltage based and contain frequency and amplitude information. A cell phone, laptop soundcard and two compact audio players are compared with respect to frequency response; the laptop soundcard provides the most uniform frequency response, while the cell phone performance is found to be insufficient. The audio signals in the common portable music players and laptop soundcard operate in the range of 20 Hz to 20 kHz and are found to be applicable, as voltage input and output signals, to impedance-based electrochemical measurements in microfluidic systems. Validated impedance-based measurements of concentration (0.1-50 mM), flow rate (2-120 µL/min) and particle detection (32 µm diameter) are demonstrated. The prevailing, lossless, wave audio file format is found to be suitable for data transmission to and from external sources, such as a centralized lab, and the cost of all hardware (in addition to audio devices) is ~10 USD. The utility demonstrated here, in combination with the ubiquitous nature of portable audio electronics, presents new opportunities for impedance-based measurements in portable microfluidic systems.

  18. Non-invasive measurement of pulse wave velocity using transputer-based analysis of Doppler flow audio signals.

    PubMed

    Stewart, W R; Ramsey, M W; Jones, C J

    1994-08-01

    A system for the measurement of arterial pulse wave velocity is described. A personal computer (PC) plug-in transputer board is used to process the audio signals from two pocket Doppler ultrasound units. The transputer is used to provide a set of bandpass digital filters on two channels. The times of excursion of power through thresholds in each filter are recorded and used to estimate the onset of systolic flow. The system does not require an additional spectrum analyser and can work in real time. The transputer architecture provides for easy integration into any wider physiological measurement system.

  19. Audio-visual temporal perception in children with restored hearing.

    PubMed

    Gori, Monica; Chilosi, Anna; Forli, Francesca; Burr, David

    2017-05-01

    It is not clear how audio-visual temporal perception develops in children with restored hearing. In this study we measured temporal discrimination thresholds with an audio-visual temporal bisection task in 9 deaf children with restored audition and 22 typically hearing children. In typically hearing children, audition was more precise than vision, with no gain in multisensory conditions (as previously reported in Gori et al. (2012b)). However, deaf children with restored audition showed similar audio and visual thresholds and some evidence of gain in audio-visual temporal multisensory conditions. Interestingly, we found a strong correlation between auditory weighting of multisensory signals and quality of language: patients who gave more weight to audition had better language skills. Similarly, auditory thresholds for the temporal bisection task were also a good predictor of language skills. This result supports the idea that temporal auditory processing is associated with language development.

  20. High capacity reversible watermarking for audio by histogram shifting and predicted error expansion.

    PubMed

    Wang, Fei; Xie, Zhaoxin; Chen, Zuo

    2014-01-01

    Being reversible, the watermarking information embedded in audio signals can be extracted while the original audio data achieve lossless recovery. Currently, the few reversible audio watermarking algorithms are confronted with the following problems: relatively low SNR (signal-to-noise ratio) of the embedded audio; a large amount of auxiliary embedded location information; and the absence of accurate capacity control capability. In this paper, we present a novel reversible audio watermarking scheme based on improved prediction error expansion and histogram shifting. First, we use a differential evolution algorithm to optimize the prediction coefficients and then apply prediction error expansion to output the stego data. Second, in order to reduce the location map bit length, we introduce a histogram shifting scheme. Meanwhile, our scheme can compute the prediction error modification threshold for a given embedding capacity. Experiments show that this algorithm improves the SNR of the embedded audio signals and the embedding capacity, drastically reduces the location map bit length, and enhances capacity control capability.
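
    A minimal sketch of the prediction-error-expansion core in Python: each expanded error carries one bit, and an exact inverse recovers both the payload and the original samples. A fixed two-tap predictor stands in for the paper's differential-evolution-optimized coefficients, and histogram shifting and overflow handling are omitted.

        # Prediction-error expansion, one bit per sample; fixed predictor and
        # no overflow handling (both are simplifications of the paper).
        import numpy as np

        def pee_embed(x, bits):
            y = x.astype(np.int64).copy()
            for i, b in zip(range(2, 2 + len(bits)), bits):
                pred = (y[i-1] + y[i-2]) // 2       # simple fixed predictor
                e = y[i] - pred
                y[i] = pred + 2*e + b               # expand error, append bit
            return y

        def pee_extract(y, n_bits):
            x = y.astype(np.int64).copy()
            bits = []
            for i in range(2 + n_bits - 1, 1, -1):  # undo in reverse order
                pred = (x[i-1] + x[i-2]) // 2
                e2 = x[i] - pred
                # numpy's floor-mod and arithmetic shift keep this exact
                # for negative errors as well
                bits.append(int(e2 % 2))
                x[i] = pred + (e2 >> 1)             # recover original sample
            return x, bits[::-1]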

  1. Subjective evaluation and electroacoustic theoretical validation of a new approach to audio upmixing

    NASA Astrophysics Data System (ADS)

    Usher, John S.

    Audio signal processing systems for converting two-channel (stereo) recordings to four or five channels are increasingly relevant. These audio upmixers can be used with conventional stereo sound recordings and reproduced with multichannel home theatre or automotive loudspeaker audio systems to create a more engaging and natural-sounding listening experience. This dissertation discusses existing approaches to audio upmixing for recordings of musical performances and presents specific design criteria for a system to enhance spatial sound quality. A new upmixing system is proposed and evaluated according to these criteria, and a theoretical model for its behavior is validated using empirical measurements. The new system removes short-term correlated components from two electronic audio signals using a pair of adaptive filters, updated according to a frequency-domain implementation of the normalized least-mean-square (NLMS) algorithm. The major difference between the new system and all extant audio upmixers is that unsupervised time-alignment of the input signals (typically by up to +/-10 ms) as a function of frequency (typically using a 1024-band equalizer) is accomplished by the non-minimum-phase adaptive filter. Two new signals are created from the weighted difference of the inputs and are then radiated with two loudspeakers behind the listener. According to the consensus in the literature on the effect of interaural correlation on auditory image formation, the self-orthogonalizing properties of the algorithm ensure minimal distortion of the frontal source imagery and natural-sounding, enveloping reverberance (ambiance) imagery. Performance evaluation of the new upmix system was accomplished in two ways: first, using empirical electroacoustic measurements, which validate a theoretical model of the system; and second, with formal listening tests, which investigated auditory spatial imagery with a graphical mapping tool and a preference experiment. Both electroacoustic and subjective methods investigated system performance with a variety of test stimuli for solo musical performances reproduced using a loudspeaker in an orchestral concert hall and recorded using different microphone techniques. The objective and subjective evaluations, combined with a comparative study of two commercial systems, demonstrate that the proposed system provides a new, computationally practical, high-sound-quality solution to upmixing.
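
    A simplified time-domain NLMS sketch of the central operation (the dissertation uses a frequency-domain implementation): the filter adaptively estimates the component of one channel that is correlated with the other, and the residual forms the decorrelated 'ambiance' signal for the rear loudspeakers.

        # Time-domain NLMS: remove from d the component correlated with x;
        # the residual e is the decorrelated (ambiance) signal.
        import numpy as np

        def nlms(x, d, order=64, mu=0.5, eps=1e-8):
            w = np.zeros(order)
            e = np.zeros(len(d))
            for n in range(order, len(d)):
                u = x[n-order:n][::-1]              # most recent samples first
                y = w @ u                           # estimate of correlated part
                e[n] = d[n] - y
                w += mu * e[n] * u / (u @ u + eps)  # normalized gradient step
            return e, w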

  2. pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis.

    PubMed

    Giannakopoulos, Theodoros

    2015-01-01

    Audio information plays a rather important role in the increasing digital content that is available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automations and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g. audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures including: feature extraction, classification of audio signals, supervised and unsupervised segmentation and content visualization. pyAudioAnalysis is licensed under the Apache License and is available at GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the wide range of the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has already been used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation and health applications (e.g. monitoring eating habits). The feedback provided from all these particular audio applications has led to practical enhancement of the library.
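
    A short usage sketch; the module and function names follow recent GitHub versions of the library and may differ in older releases, and the file path is illustrative.

        # Short-term feature extraction with pyAudioAnalysis (recent API).
        from pyAudioAnalysis import audioBasicIO
        from pyAudioAnalysis import ShortTermFeatures

        fs, x = audioBasicIO.read_audio_file("example.wav")   # path is illustrative
        x = audioBasicIO.stereo_to_mono(x)
        # 50 ms windows with a 25 ms hop; returns (n_features, n_frames) + names
        feats, names = ShortTermFeatures.feature_extraction(
            x, fs, int(0.050*fs), int(0.025*fs))
        print(len(names), "features,", feats.shape[1], "frames")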

  3. pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis

    PubMed Central

    Giannakopoulos, Theodoros

    2015-01-01

    Audio information plays a rather important role in the increasing digital content that is available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automations and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g. audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures including: feature extraction, classification of audio signals, supervised and unsupervised segmentation and content visualization. pyAudioAnalysis is licensed under the Apache License and is available at GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the wide range of the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has already been used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation and health applications (e.g. monitoring eating habits). The feedback provided from all these particular audio applications has led to practical enhancement of the library. PMID:26656189

  4. Sparse gammatone signal model optimized for English speech does not match the human auditory filters.

    PubMed

    Strahl, Stefan; Mertins, Alfred

    2008-07-18

    Evidence that neurosensory systems use sparse signal representations, together with the improved performance of signal processing algorithms using sparse signal models, has raised interest in sparse signal coding in recent years. For natural audio signals like speech and environmental sounds, gammatone atoms have been derived as expansion functions that generate a nearly optimal sparse signal model (Smith, E., Lewicki, M., 2006. Efficient auditory coding. Nature 439, 978-982). Furthermore, gammatone functions are established models for the human auditory filters. Thus far, a practical application of a sparse gammatone signal model has been prevented by the fact that deriving the sparsest representation is, in general, computationally intractable. In this paper, we applied an accelerated version of the matching pursuit algorithm for gammatone dictionaries, allowing real-time and large-data-set applications. We show that a sparse signal model in general has advantages in audio coding and that a sparse gammatone signal model encodes speech more efficiently, in terms of sparseness, than a sparse modified discrete cosine transform (MDCT) signal model. We also show that the optimal gammatone parameters derived for English speech do not match the human auditory filters, suggesting that signal processing applications should derive the parameters individually for each applied signal class instead of using psychometrically derived parameters. For brain research, this means that care should be taken when directly transferring findings of optimality from technical to biological systems.

  5. Exploring the Implementation of Steganography Protocols on Quantum Audio Signals

    NASA Astrophysics Data System (ADS)

    Chen, Kehan; Yan, Fei; Iliyasu, Abdullah M.; Zhao, Jianping

    2018-02-01

    Two quantum audio steganography (QAS) protocols are proposed, each of which manipulates or modifies the least significant qubit (LSQb) of the host quantum audio signal that is encoded as an FRQA (flexible representation of quantum audio) audio content. The first protocol (i.e. the conventional LSQb QAS protocol, or simply the cLSQ stego protocol) is built on exchanges between the qubits encoding the quantum audio message and the LSQb of the amplitude information in the host quantum audio samples. In the second protocol, the embedding procedure implants information from a quantum audio message deep into the constraint-imposed most significant qubit (MSQb) of the host quantum audio samples; we refer to it as the pseudo-MSQb QAS protocol, or simply the pMSQ stego protocol. The cLSQ stego protocol is designed to guarantee high imperceptibility between the host quantum audio and its stego version, whereas the pMSQ stego protocol ensures that the resulting stego quantum audio signal is better immune to illicit tampering and copyright violations (a.k.a. robustness). Built on the circuit model of quantum computation, the circuit networks to execute the embedding and extraction algorithms of both QAS protocols are determined, and simulation-based experiments are conducted to demonstrate their implementation. Outcomes attest that both protocols offer promising trade-offs in terms of imperceptibility and robustness.

  6. DWT-Based High Capacity Audio Watermarking

    NASA Astrophysics Data System (ADS)

    Fallahpour, Mehdi; Megías, David

    This letter proposes a novel high-capacity robust audio watermarking algorithm that uses the high-frequency band of the wavelet decomposition, for which the human auditory system (HAS) is not very sensitive to alteration. The main idea is to divide the high-frequency band into frames and then, for embedding, change the wavelet samples based on the average of the relevant frame. The experimental results show that the method has a very high capacity (about 5.5 kbps), without significant perceptual distortion (ODG in [-1, 0] and SNR about 33 dB), and provides robustness against common audio signal processing such as added noise, filtering, echo and MPEG compression (MP3).
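
    A rough sketch of the idea with PyWavelets: embed bits by shifting detail (high-band) coefficients relative to their frame average. The letter's exact embedding rule and parameter values are not reproduced; the frame length and strength below are arbitrary placeholders.

        # Sketch: nudge high-band DWT coefficients around their frame average
        # to carry bits (assumes len(bits)*frame <= len(cD)).
        import numpy as np
        import pywt

        def dwt_embed(x, bits, frame=32, alpha=0.1):
            cA, cD = pywt.dwt(x, 'db4')             # cD = high-frequency band
            for k, b in enumerate(bits):
                seg = slice(k*frame, (k+1)*frame)
                avg = np.mean(np.abs(cD[seg]))
                cD[seg] += (alpha if b else -alpha) * avg   # shift per frame
            return pywt.idwt(cA, cD, 'db4')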

  7. Informed spectral analysis: audio signal parameter estimation using side information

    NASA Astrophysics Data System (ADS)

    Fourer, Dominique; Marchand, Sylvain

    2013-12-01

    Parametric models are of great interest for representing and manipulating sounds. However, the quality of the resulting signals depends on the precision of the parameters. When the signals are available, these parameters can be estimated, but the presence of noise decreases the resulting precision of the estimation. Furthermore, the Cramér-Rao bound shows the minimal error reachable with the best estimator, which can be insufficient for demanding applications. These limitations can be overcome by using the coding approach which consists in directly transmitting the parameters with the best precision using the minimal bitrate. However, this approach does not take advantage of the information provided by the estimation from the signal and may require a larger bitrate and a loss of compatibility with existing file formats. The purpose of this article is to propose a compromised approach, called the 'informed approach,' which combines analysis with (coded) side information in order to increase the precision of parameter estimation using a lower bitrate than pure coding approaches, the audio signal being known. Thus, the analysis problem is presented in a coder/decoder configuration where the side information is computed and inaudibly embedded into the mixture signal at the coder. At the decoder, the extra information is extracted and is used to assist the analysis process. This study proposes applying this approach to audio spectral analysis using sinusoidal modeling which is a well-known model with practical applications and where theoretical bounds have been calculated. This work aims at uncovering new approaches for audio quality-based applications. It provides a solution for challenging problems like active listening of music, source separation, and realistic sound transformations.

  8. Detection of emetic activity in the cat by monitoring venous pressure and audio signals

    NASA Technical Reports Server (NTRS)

    Nagahara, A.; Fox, Robert A.; Daunton, Nancy G.; Elfar, S.

    1991-01-01

    To investigate the use of audio signals as a simple, noninvasive measure of emetic activity, the relationship between the somatic events and sounds associated with retching and vomiting was studied. Thoracic venous pressure obtained from an implanted external jugular catheter was shown to provide a precise measure of the somatic events associated with retching and vomiting. Changes in thoracic venous pressure monitored through an indwelling external jugular catheter with audio signals, obtained from a microphone located above the animal in a test chamber, were compared. In addition, two independent observers visually monitored emetic episodes. Retching and vomiting were induced by injection of xylazine (0.66mg/kg s.c.), or by motion. A unique audio signal at a frequency of approximately 250 Hz is produced at the time of the negative thoracic venous pressure change associated with retching. Sounds with higher frequencies (around 2500 Hz) occur in conjunction with the positive pressure changes associated with vomiting. These specific signals could be discriminated reliably by individuals reviewing the audio recordings of the sessions. Retching and those emetic episodes associated with positive venous pressure changes were detected accurately by audio monitoring, with 90 percent of retches and 100 percent of emetic episodes correctly identified. Retching was detected more accurately (p is less than .05) by audio monitoring than by direct visual observation. However, with visual observation a few incidents in which stomach contents were expelled in the absence of positive pressure changes or detectable sounds were identified. These data suggest that in emetic situations, the expulsion of stomach contents may be accomplished by more than one neuromuscular system and that audio signals can be used to detect emetic episodes associated with thoracic venous pressure changes.

  9. Audio-guided audiovisual data segmentation, indexing, and retrieval

    NASA Astrophysics Data System (ADS)

    Zhang, Tong; Kuo, C.-C. Jay

    1998-12-01

    While current approaches for video segmentation and indexing are mostly focused on visual information, audio signals may actually play a primary role in video content parsing. In this paper, we present an approach for automatic segmentation, indexing, and retrieval of audiovisual data, based on audio content analysis. The accompanying audio signal of audiovisual data is first segmented and classified into basic types, i.e., speech, music, environmental sound, and silence. This coarse-level segmentation and indexing step is based upon morphological and statistical analysis of several short-term features of the audio signals. Then, environmental sounds are classified into finer classes, such as applause, explosions, bird sounds, etc. This fine-level classification and indexing step is based upon time-frequency analysis of audio signals and the use of the hidden Markov model as the classifier. On top of this archiving scheme, an audiovisual data retrieval system is proposed. Experimental results show that the proposed approach has an accuracy rate higher than 90 percent for the coarse-level classification, and higher than 85 percent for the fine-level classification. Examples of audiovisual data segmentation and retrieval are also provided.

  10. ASTP video tape recorder ground support equipment (audio/CTE splitter/interleaver). Operations manual

    NASA Technical Reports Server (NTRS)

    1974-01-01

    A descriptive handbook for the audio/CTE splitter/interleaver (RCA part No. 8673734-502) was presented. This unit is designed to perform two major functions: extract audio and time data from an interleaved video/audio signal (splitter section), and provide a test interleaved video/audio/CTE signal for the system (interleaver section). It is a rack mounting unit 7 inches high, 19 inches wide, 20 inches deep, mounted on slides for retracting from the rack, and weighs approximately 40 pounds. The following information is provided: installation, operation, principles of operation, maintenance, schematics and parts lists.

  11. Audio signal processor

    NASA Technical Reports Server (NTRS)

    Hymer, R. L.

    1970-01-01

    System provides automatic volume control for an audio amplifier or a voice communication system without introducing noise surges during pauses in the input, and without losing the initial signal when the input resumes.
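
    The described behavior can be sketched as a gated automatic gain control: track the input envelope, but freeze gain adaptation when the input falls below a speech gate so that pauses do not pull the gain up and amplify noise. All constants below are illustrative placeholders, not the actual NASA design.

        # Gated AGC sketch: fast attack keeps the start of speech, and the
        # gain is frozen during pauses so noise does not surge.
        import numpy as np

        def agc(x, target=0.1, gate=0.01, attack=0.01, release=0.0005):
            env, gain, y = 0.0, 1.0, np.empty_like(x)
            for n, s in enumerate(x):
                a = attack if abs(s) > env else release
                env += a * (abs(s) - env)            # simple envelope follower
                if env > gate:                       # adapt only while active
                    gain += 0.001 * (target/max(env, 1e-9) - gain)
                y[n] = gain * s
            return y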

  12. SNR-adaptive stream weighting for audio-MES ASR.

    PubMed

    Lee, Ki-Seung

    2008-08-01

    Myoelectric signals (MESs) from the speaker's mouth region have been successfully shown to improve the noise robustness of automatic speech recognizers (ASRs), thus promising to extend their usability in implementing noise-robust ASR. In the recognition system presented herein, extracted audio and facial MES features were integrated by a decision fusion method, where the likelihood score of the audio-MES observation vector was given by a linear combination of the class-conditional observation log-likelihoods of two classifiers, using appropriate weights. We developed a weighting process adaptive to SNRs. The main objective of the paper is to determine the optimal SNR classification boundaries and to construct a set of optimum stream weights for each SNR class. These two parameters were determined by a method based on a maximum mutual information criterion. Acoustic and facial MES data were collected from five subjects, using a 60-word vocabulary. Four types of acoustic noise, including babble, car, aircraft, and white noise, were acoustically added to clean speech signals with SNR ranging from -14 to 31 dB. The classification accuracy of the audio ASR was as low as 25.5%, whereas that of the MES ASR was 85.2%. The classification accuracy could be further improved by employing the proposed audio-MES weighting method, reaching as high as 89.4% in the case of babble noise. A similar result was also found for the other types of noise.
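
    The fusion rule itself is compact: a weighted sum of the two classifiers' class-conditional log-likelihoods, with the weight selected by SNR class. The boundaries and weights below are hypothetical placeholders, not the values learned by the maximum mutual information procedure.

        # Decision fusion: SNR-dependent weighted sum of log-likelihoods.
        # (upper-bound dB, audio weight) pairs are illustrative only.
        SNR_WEIGHTS = [(-5.0, 0.2), (10.0, 0.5), (float('inf'), 0.8)]

        def fused_score(logp_audio, logp_mes, snr_db):
            w = next(wt for ub, wt in SNR_WEIGHTS if snr_db <= ub)
            # argmax of this score over classes gives the recognized word
            return w * logp_audio + (1.0 - w) * logp_mes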

  13. Influence of F0 and Sequence Length of Audio and Electroglottographic Signals on Perturbation Measures for Voice Assessment.

    PubMed

    Hohm, Julian; Döllinger, Michael; Bohr, Christopher; Kniesburges, Stefan; Ziethe, Anke

    2015-07-01

    Within the functional assessment of voice disorders, an objective analysis of parameters measured from audio, electroglottographic (EGG), or visual signals is desired. In a typical clinical situation, reliable objective analysis is not always possible due to missing standardization and the unknown stability of the clinical parameters. The aim of this study was to investigate the robustness/stability of measured clinical parameters of the audio and EGG signals in a typical clinical setting to ensure a reliable objective analysis. In particular, the influence of F0 and of the sequence length on several definitions of jitter and shimmer is analyzed. Seventy-four young healthy women produced a sustained vowel /a/ and an upward triad with abrupt changeovers. Different sequence lengths (100, 150, 500, and 1000 ms) of sustained phonation and triads (100 and 150 ms) were extracted from the audio and EGG signals. In total, six variations of jitter and four variations of shimmer parameters were analyzed. Jitter%, Jitter11p, and JitterPPQ of the audio signal, as well as Jittermean, Shimmer, and Shimmer11p of the EGG signal, are unaffected by both sequence length and F0. An influence of F0 and sequence length on several perturbation measures of the audio and EGG signals was identified. For an objective clinical voice assessment, unaffected definitions of jitter and shimmer should be preferred and applied to enable comparability between different recordings, examinations, and studies.
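
    For reference, the local jitter percentage, one of the perturbation definitions such studies compare, is the mean absolute difference of consecutive glottal periods divided by the mean period:

        # Local jitter (%): mean |difference of consecutive periods| over
        # the mean period, a standard perturbation definition.
        import numpy as np

        def jitter_percent(periods):
            p = np.asarray(periods, dtype=float)    # glottal period lengths (s)
            return 100.0 * np.mean(np.abs(np.diff(p))) / np.mean(p)

        # ~0.13 %; typical healthy values are below about 1 %
        print(jitter_percent([0.01000, 0.01001, 0.00999, 0.01000]))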

  14. Reliability and validity of an audio signal modified shuttle walk test.

    PubMed

    Singla, Rupak; Rai, Richa; Faye, Abhishek Anil; Jain, Anil Kumar; Chowdhury, Ranadip; Bandyopadhyay, Debdutta

    2017-01-01

    The audio signal in the conventionally accepted protocol of the shuttle walk test (SWT) is not well understood by patients, and modification of the audio signal may improve the performance of the test. The aim of this study is to assess the validity and reliability of an audio-signal-modified SWT, called the Singla-Richa modified SWT (SWT-SR), in healthy normal adults. In the SWT-SR, the audio signal was modified by adding reverse counting to it. A total of 54 healthy normal adults underwent the conventional SWT (CSWT) once and the SWT-SR twice on the same day. Validity was assessed by comparing outcomes of the SWT-SR to outcomes of the CSWT using the Pearson correlation coefficient and a Bland-Altman plot. Test-retest reliability of the SWT-SR was assessed using the intraclass correlation coefficient (ICC). The acceptability of the modified test in comparison to the conventional test was assessed using a Likert scale. The distance walked (mean ± standard deviation) in the CSWT and SWT-SR was 853.33 ± 217.33 m and 857.22 ± 219.56 m, respectively (Pearson correlation coefficient 0.98; P < 0.001), indicating the SWT-SR to be a valid test. The SWT-SR was found to be reliable, with an ICC of 0.98 (95% confidence interval: 0.97-0.99). The acceptability of the SWT-SR was significantly higher than that of the CSWT. The SWT-SR, with its audio signal modified by reverse counting, is a reliable as well as a valid test when compared with the CSWT in healthy normal adults, and it is better understood by subjects.

  15. Programed Instruction for Selected CIC Watch Officer Tasks. 2. Evaluation of the Audio Notebook in the Teaching of the Allied Naval Signal Book.

    ERIC Educational Resources Information Center

    Brock, John F.

    The research evaluated oral program instruction used with a multitape recorder, the audio notebook, as a means of promoting adaptation to student differences and flexibility in instructional scheduling. Use of the Allied Naval Signal Book required by the CIC (Combat Information Center) watch officer position was programed for the audio notebook in…

  16. Audio feature extraction using probability distribution function

    NASA Astrophysics Data System (ADS)

    Suhaib, A.; Wan, Khairunizam; Aziz, Azri A.; Hazry, D.; Razlan, Zuradzman M.; Shahriman A., B.

    2015-05-01

    Voice recognition has been one of the popular applications in the robotics field. It is also known to be used recently in biometric and multimedia information retrieval systems. This technology stems from successive research on audio feature extraction analysis. The probability distribution function (PDF) is a statistical method that is usually used as one of the processing steps in complex feature extraction methods such as GMM and PCA. In this paper, a new method for audio feature extraction is proposed that uses only the PDF as the feature extraction method itself, for speech analysis purposes. Certain pre-processing techniques are performed prior to the proposed feature extraction method. Subsequently, the PDF values for each frame of sampled voice signals, obtained from a number of individuals, are plotted. From the experimental results obtained, it can be seen visually from the plotted data that each individual's voice has comparable PDF values and shapes.

  17. Perceptual congruency of audio-visual speech affects ventriloquism with bilateral visual stimuli.

    PubMed

    Kanaya, Shoko; Yokosawa, Kazuhiko

    2011-02-01

    Many studies on multisensory processes have focused on performance in simplified experimental situations, with a single stimulus in each sensory modality. However, these results cannot necessarily be applied to explain our perceptual behavior in natural scenes where various signals exist within one sensory modality. We investigated the role of audio-visual syllable congruency on participants' auditory localization bias or the ventriloquism effect using spoken utterances and two videos of a talking face. Salience of facial movements was also manipulated. Results indicated that more salient visual utterances attracted participants' auditory localization. Congruent pairing of audio-visual utterances elicited greater localization bias than incongruent pairing, while previous studies have reported little dependency on the reality of stimuli in ventriloquism. Moreover, audio-visual illusory congruency, owing to the McGurk effect, caused substantial visual interference on auditory localization. Multisensory performance appears more flexible and adaptive in this complex environment than in previous studies.

  18. Juno Listens to Jupiter's Auroras

    NASA Image and Video Library

    2016-09-01

    During Juno's close flyby of Jupiter on August 27, 2016, the Waves instrument received radio signals associated with the giant planet's intense auroras. Animation and audio display the signals after they have been shifted into the audio frequency range.

  19. Steganalysis of recorded speech

    NASA Astrophysics Data System (ADS)

    Johnson, Micah K.; Lyu, Siwei; Farid, Hany

    2005-03-01

    Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.

  20. Audio signal encryption using chaotic Hénon map and lifting wavelet transforms

    NASA Astrophysics Data System (ADS)

    Roy, Animesh; Misra, A. P.

    2017-12-01

    We propose an audio signal encryption scheme based on the chaotic Hénon map. The scheme mainly comprises two phases: a preprocessing stage, in which the audio signal is transformed into data by the lifting wavelet scheme, and an encryption stage, in which the transformed data are encrypted using a chaotic data set and hyperbolic functions. Furthermore, we use dynamic keys and consider the key space to be large enough to resist any kind of cryptographic attack. A statistical investigation is also made to test the security and efficiency of the proposed scheme.
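
    A minimal sketch of the chaotic-keystream idea: iterate the Hénon map from a secret initial state and XOR the resulting keystream with 16-bit samples (XOR being its own inverse, decryption reuses the same key). The lifting wavelet stage and hyperbolic-function mixing of the full scheme are omitted, and the digitization of the chaotic orbit is a simple placeholder.

        # Hénon-map keystream XOR-masking int16 samples; a sketch of the
        # chaotic-encryption idea, not the paper's full scheme.
        import numpy as np

        def henon_keystream(n, x0=0.1, y0=0.3, a=1.4, b=0.3):
            x, y = x0, y0
            ks = np.empty(n, dtype=np.uint16)
            for i in range(n):
                x, y = 1.0 - a*x*x + y, b*x          # classic Hénon iteration
                ks[i] = int(abs(x) * 1e5) % 65536    # crude digitization
            return ks

        def encrypt(samples_i16, key=(0.1, 0.3)):
            ks = henon_keystream(len(samples_i16), *key)
            # XOR is self-inverse: calling encrypt() again decrypts
            return (samples_i16.view(np.uint16) ^ ks).view(np.int16)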

  1. The effects of stereo disparity on the behavioural and electrophysiological correlates of perception of audio-visual motion in depth.

    PubMed

    Harrison, Neil R; Witheridge, Sian; Makin, Alexis; Wuerger, Sophie M; Pegna, Alan J; Meyer, Georg F

    2015-11-01

    Motion is represented by low-level signals, such as size expansion in vision or loudness changes in the auditory modality. The visual and auditory signals from the same object or event may be integrated and facilitate detection. We explored behavioural and electrophysiological correlates of congruent and incongruent audio-visual depth motion in conditions where auditory level changes, visual expansion, and visual disparity cues were manipulated. In Experiment 1, participants discriminated auditory motion direction whilst viewing looming or receding, 2D or 3D, visual stimuli. Responses were faster and more accurate for congruent than for incongruent audio-visual cues, and the congruency effect (i.e., the difference between incongruent and congruent conditions) was larger for visual 3D cues compared to 2D cues. In Experiment 2, event-related potentials (ERPs) were collected during presentation of the 2D and 3D, looming and receding, audio-visual stimuli, while participants detected an infrequent deviant sound. Our main finding was that audio-visual congruity was affected by retinal disparity at an early processing stage (135-160 ms) over occipito-parietal scalp. Topographic analyses suggested that similar brain networks were activated for the 2D and 3D congruity effects, but that cortical responses were stronger in the 3D condition. Differences between congruent and incongruent conditions were observed between 140-200 ms, 220-280 ms, and 350-500 ms after stimulus onset.

  2. Audio Frequency Analysis in Mobile Phones

    ERIC Educational Resources Information Center

    Aguilar, Horacio Munguía

    2016-01-01

    A new experiment using mobile phones is proposed in which the phone's audio frequency response is analyzed using the audio port to input an external signal and obtain a measurable output. This experiment shows how the limited audio bandwidth used in mobile telephony is the main cause of the poor speech quality in this service. A brief discussion is…

  3. Entertainment and Pacification System For Car Seat

    NASA Technical Reports Server (NTRS)

    Elrod, Susan Vinz (Inventor); Dabney, Richard W. (Inventor)

    2006-01-01

    An entertainment and pacification system for use with a child car seat has speakers mounted in the child car seat with a plurality of audio sources and an anti-noise audio system coupled to the child car seat. A controllable switching system provides for, at any given time, the selective activation of i) one of the audio sources such that the audio signal generated thereby is coupled to one or more of the speakers, and ii) the anti-noise audio system such that an ambient-noise-canceling audio signal generated thereby is coupled to one or more of the speakers. The controllable switching system can receive commands generated at one of first controls located at the child car seat and second controls located remotely with respect to the child car seat with commands generated by the second controls overriding commands generated by the first controls.

  4. Detecting double compression of audio signal

    NASA Astrophysics Data System (ADS)

    Yang, Rui; Shi, Yun Q.; Huang, Jiwu

    2010-01-01

    MP3 is the most popular audio format in daily use; for example, music downloaded from the Internet and files saved in digital recorders are often in MP3 format. However, low-bitrate MP3s are often transcoded to high bitrate, since high-bitrate files are of higher commercial value. Audio recordings made on digital recorders can also be doctored easily with pervasive audio editing software. This paper presents two methods for the detection of double MP3 compression. The methods are essential for identifying fake-quality MP3s and for audio forensics. The proposed methods use support vector machine classifiers with feature vectors formed by the distributions of the first digits of the quantized MDCT (modified discrete cosine transform) coefficients. Extensive experiments demonstrate the effectiveness of the proposed methods. To the best of our knowledge, this is the first work to detect double compression of audio signals.
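
    The first-digit feature is easy to state precisely. A sketch of the histogram computation follows, assuming the quantized MDCT coefficients have already been obtained from an MP3 decoder (not shown); for singly compressed files the leading-digit distribution tends to follow a Benford-like law, and recompression disturbs it.

```python
import numpy as np

def first_digit_histogram(mdct_coeffs: np.ndarray) -> np.ndarray:
    """Normalized frequencies of the leading digits 1..9 of the coefficients."""
    mags = np.abs(mdct_coeffs).astype(float)
    mags = mags[mags > 0]                          # zeros have no leading digit
    first = (mags / 10.0 ** np.floor(np.log10(mags))).astype(int)  # in 1..9
    hist = np.bincount(first, minlength=10)[1:10]
    return hist / hist.sum()   # 9-dimensional feature vector for the SVM
```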

  5. Dynamic single sideband modulation for realizing parametric loudspeaker

    NASA Astrophysics Data System (ADS)

    Sakai, Shinichi; Kamakura, Tomoo

    2008-06-01

    A parametric loudspeaker, which exhibits remarkably narrow directivity compared with a conventional loudspeaker, is produced and examined. To drive the loudspeaker optimally, we digitally prototyped a single-sideband modulator based on the Weaver method together with appropriate signal processing. The processing dynamically changes the carrier amplitude depending on the envelope of the audio signal, and then applies a square-root or fourth-root law to the carrier amplitude to improve the input-output acoustic linearity. The usefulness of the present modulation scheme has been verified experimentally.
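
    The square-root amplitude processing can be illustrated compactly. The sketch below is a plain square-root AM modulator with an envelope-tracking carrier, not the full Weaver SSB modulator; envelope extraction via the analytic signal is an assumed choice, and fs must be well above the ultrasonic carrier frequency.

```python
import numpy as np
from scipy.signal import hilbert

def sqrt_am(audio: np.ndarray, fs: float, fc: float, m: float = 0.9) -> np.ndarray:
    """Square-root AM with an envelope-tracking ("dynamic") carrier level."""
    x = audio / (np.abs(audio).max() + 1e-12)   # normalize to [-1, 1]
    env = np.abs(hilbert(x))                    # slowly varying audio envelope
    t = np.arange(x.size) / fs
    # sqrt() pre-distorts the modulation to cancel the quadratic
    # self-demodulation of the ultrasonic beam in air; scaling by the
    # envelope saves transducer power during quiet passages.
    return env * np.sqrt(1.0 + m * x) * np.cos(2.0 * np.pi * fc * t)
```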

  6. Real-time implementation of second generation of audio multilevel information coding

    NASA Astrophysics Data System (ADS)

    Ali, Murtaza; Tewfik, Ahmed H.; Viswanathan, V.

    1994-03-01

    This paper describes a real-time implementation of a novel wavelet-based audio compression method. The method is based on the discrete wavelet transform (DWT) representation of signals. A bit allocation procedure is used to allocate bits to the transform coefficients in an adaptive fashion. The bit allocation procedure has been designed to take advantage of the masking effect in human hearing. The procedure minimizes the number of bits required to represent each frame of audio signals at a fixed distortion level. The real-time implementation provides almost transparent compression of monophonic CD-quality audio signals (sampled at 44.1 kHz and quantized using 16 bits/sample) at bit rates of 64-78 kbits/sec. Our implementation uses two ASPI Elf boards, each of which is built around a TI TMS320C31 DSP chip. The time required for encoding a mono CD signal is about 92 percent of real time, and that for decoding about 61 percent.
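
    A greedy bit allocation loop of the kind described can be sketched in a few lines; the masking thresholds would come from a psychoacoustic model, which is assumed here and not shown.

```python
import numpy as np

def allocate_bits(band_energy: np.ndarray, mask: np.ndarray, total_bits: int) -> np.ndarray:
    """Greedy allocation: give each bit to the band worst above its mask."""
    bits = np.zeros(band_energy.size, dtype=int)
    for _ in range(total_bits):
        noise = band_energy / (4.0 ** bits)   # each bit cuts noise power by 6 dB
        bits[np.argmax(noise / mask)] += 1    # most audible band gets the bit
    return bits
```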

  7. A Biological Signal-Based Stress Monitoring Framework for Children Using Wearable Devices.

    PubMed

    Choi, Yerim; Jeon, Yu-Mi; Wang, Lin; Kim, Kwanho

    2017-08-23

    The safety of children has always been an important issue, and several studies have been conducted on detecting a child's stress state to ensure that safety. Audio signals and biological signals, including heart rate, are known to be effective for stress state detection. However, collecting those data has required specialized equipment that is not appropriate for the constant monitoring of children, and advanced data analysis is required for accurate detection. In this regard, we propose a stress state detection framework that utilizes both audio signals and heart rate collected from wearable devices and adopts machine learning methods for the detection. Experiments using real-world data were conducted to compare detection performance across various machine learning methods and audio-signal noise levels. Adopting the proposed framework in the real world will contribute to the enhancement of child safety.

  8. Comparison between audio-only and audiovisual biofeedback for regulating patients' respiration during four-dimensional radiotherapy.

    PubMed

    Yu, Jesang; Choi, Ji Hoon; Ma, Sun Young; Jeung, Tae Sig; Lim, Sangwook

    2015-09-01

    To compare audio-only biofeedback to conventional audiovisual biofeedback for regulating patients' respiration during four-dimensional radiotherapy, limiting damage to healthy surrounding tissues caused by organ movement. Six healthy volunteers were assisted by audiovisual or audio-only biofeedback systems to regulate their respiration. Volunteers breathed through a mask developed for this study by following computer-generated guiding curves displayed on a screen, combined with instructional sounds. They then performed breathing following instructional sounds only. The guiding signals and the volunteers' respiratory signals were logged at 20 samples per second. The standard deviations between the guiding and respiratory curves for the audiovisual and audio-only biofeedback systems were 21.55% and 23.19%, respectively; the average correlation coefficients were 0.9778 and 0.9756, respectively. A paired t-test showed no statistically significant difference in respiratory regularity between the two systems across the six volunteers. The difference between the audiovisual and audio-only biofeedback methods was not significant. Audio-only biofeedback has many advantages, as patients do not require a mask and can quickly adapt to this method in the clinic.

  9. 47 CFR 73.402 - Definitions.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... Digital Audio Broadcasting § 73.402 Definitions. (a) DAB. Digital audio broadcast stations are those radio... purposes. (b) In Band On Channel DAB System. A technical system in which a station's digital signal is... which transmits both the digital and analog signals within the spectral emission mask of a single AM or...

  10. Department of Cybernetic Acoustics

    NASA Astrophysics Data System (ADS)

    The development of the theory, instrumentation and applications of methods and systems for the measurement, analysis, processing and synthesis of acoustic signals within the audio frequency range, particularly of the speech signal and the vibro-acoustic signals emitted by technical and industrial equipment treated as noise and vibration sources, was discussed. The research work, both theoretical and experimental, aims at applications in various branches of science and medicine, such as: acoustical diagnostics and phoniatric rehabilitation of pathological and postoperative states of the speech organ; bilateral "man-machine" speech communication based on the analysis, recognition and synthesis of the speech signal; vibro-acoustical diagnostics and continuous monitoring of the state of machines, technical equipment and technological processes.

  11. Attention to sound improves auditory reliability in audio-tactile spatial optimal integration.

    PubMed

    Vercillo, Tiziana; Gori, Monica

    2015-01-01

    The role of attention in multisensory processing is still poorly understood. In particular, it is unclear whether directing attention toward a sensory cue dynamically reweights cue reliability during the integration of multiple sensory signals. In this study, we investigated the impact of attention on combining audio-tactile signals in an optimal fashion. We used the Maximum Likelihood Estimation (MLE) model to predict audio-tactile spatial localization on the body surface. We developed a new audio-tactile device composed of several small units, each consisting of a speaker and a tactile vibrator independently controllable by external software. We tested participants in an attentional and a non-attentional condition. In the attentional experiment, participants performed a dual-task paradigm: they were required to evaluate the duration of a sound while performing an audio-tactile spatial task. Three unisensory or multisensory stimuli (conflicting or non-conflicting sounds and vibrations arranged along the horizontal axis) were presented sequentially. In the primary task, participants evaluated, in a space bisection task, the position of the second stimulus (the probe) with respect to the others (the standards). In the secondary task, they had to report occasional changes in the duration of the second auditory stimulus. In the non-attentional task, participants performed only the primary task (space bisection). Our results showed enhanced auditory precision (and auditory weights) in the attentional condition with respect to the non-attentional control condition. The results of this study support the idea that modality-specific attention modulates multisensory integration.
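
    The MLE cue-combination rule referenced above has a closed form worth stating: each cue is weighted by its inverse variance, so raising auditory precision (e.g., through attention) raises the auditory weight. A minimal sketch:

```python
def mle_combine(x_audio: float, var_audio: float,
                x_tactile: float, var_tactile: float):
    """Optimal (inverse-variance weighted) fusion of two spatial estimates."""
    w_a = (1.0 / var_audio) / (1.0 / var_audio + 1.0 / var_tactile)
    x_hat = w_a * x_audio + (1.0 - w_a) * x_tactile
    var_hat = 1.0 / (1.0 / var_audio + 1.0 / var_tactile)  # never worse than either cue
    return x_hat, var_hat
```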

  12. Sensitivity to audio-visual synchrony and its relation to language abilities in children with and without ASD.

    PubMed

    Righi, Giulia; Tenenbaum, Elena J; McCormick, Carolyn; Blossom, Megan; Amso, Dima; Sheinkopf, Stephen J

    2018-04-01

    Autism Spectrum Disorder (ASD) is often accompanied by deficits in speech and language processing. Speech processing relies heavily on the integration of auditory and visual information, and it has been suggested that the ability to detect correspondence between auditory and visual signals helps to lay the foundation for successful language development. The goal of the present study was to examine whether young children with ASD show reduced sensitivity to temporal asynchronies in a speech processing task when compared to typically developing controls, and to examine how this sensitivity might relate to language proficiency. Using automated eye tracking methods, we found that children with ASD failed to demonstrate sensitivity to asynchronies of 0.3s, 0.6s, or 1.0s between a video of a woman speaking and the corresponding audio track. In contrast, typically developing children who were language-matched to the ASD group, were sensitive to both 0.6s and 1.0s asynchronies. We also demonstrated that individual differences in sensitivity to audiovisual asynchronies and individual differences in orientation to relevant facial features were both correlated with scores on a standardized measure of language abilities. Results are discussed in the context of attention to visual language and audio-visual processing as potential precursors to language impairment in ASD.

  13. Measuring the electric activity of chick embryo hearts through a 16-bit audio card monitored by the GoldwaveTM software

    NASA Astrophysics Data System (ADS)

    Silva, Dilson; Cortez, Celia Martins

    2015-12-01

    In the present work we used a high-resolution, low-cost apparatus capable of detecting waves that fit inside the sound bandwidth, together with the GoldwaveTM software package for graphical display, processing, and monitoring of the signals, to study aspects of the electric heart activity of early avian embryos, specifically at the 18th Hamburger & Hamilton stage of embryo development. The species used was the domestic chick (Gallus gallus), and we carried out 23 experiments in which cardiographic spectra of QRS complex waves, representing the propagation of depolarization waves through the ventricles, were recorded using microprobes and reference electrodes placed directly on the embryos. The results show that the technique, using a 16-bit audio card monitored by the GoldwaveTM software, was efficient for studying aspects of the heart electric activity of early avian embryos.

  14. Audio-based, unsupervised machine learning reveals cyclic changes in earthquake mechanisms in the Geysers geothermal field, California

    NASA Astrophysics Data System (ADS)

    Holtzman, B. K.; Paté, A.; Paisley, J.; Waldhauser, F.; Repetto, D.; Boschi, L.

    2017-12-01

    The earthquake process reflects complex interactions of stress, fracture and frictional properties. New machine learning methods reveal patterns in time-dependent spectral properties of seismic signals and enable identification of changes in faulting processes. Our methods are based closely on those developed for music information retrieval and voice recognition, using the spectrogram instead of the waveform directly. Unsupervised learning involves identification of patterns based on differences among signals without any additional information provided to the algorithm. Clustering of 46,000 earthquakes of magnitude 0.3…

  15. Time-frequency analysis of acoustic signals in the audio-frequency range generated during Hadfield's steel friction

    NASA Astrophysics Data System (ADS)

    Dobrynin, S. A.; Kolubaev, E. A.; Smolin, A. Yu.; Dmitriev, A. I.; Psakhie, S. G.

    2010-07-01

    Time-frequency analysis of sound waves detected by a microphone during the friction of Hadfield’s steel has been performed using wavelet transform and window Fourier transform methods. This approach reveals a relationship between the appearance of quasi-periodic intensity outbursts in the acoustic response signals and the processes responsible for the formation of wear products. It is shown that the time-frequency analysis of acoustic emission in a tribosystem can be applied, along with traditional approaches, to studying features in the wear and friction process.

  16. Multichannel audio monitor for detecting electrical signals.

    PubMed

    Friesen, W O; Stent, G S

    1978-12-01

    The multichannel audio monitor (MUCAM) permits the simultaneous auditory monitoring of concurrent trains of electrical signals generated by as many as eight different sources. The basic working principle of this device is the modulation of the amplitude of a given pure tone by the incoming signals of each input channel. The MUCAM thus converts a complex, multichannel, temporal signal sequence into a musical melody suitable for instant, subliminal pattern analysis by the human ear. Neurophysiological experiments requiring multi-electrode recordings have provided one useful application of the MUCAM.
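
    The MUCAM principle reduces to per-channel amplitude modulation of fixed tones followed by mixing. A minimal sketch, with illustrative tone frequencies (the device's actual tuning is not specified here):

```python
import numpy as np

def mucam_mix(channels: np.ndarray, fs: float,
              tones=(220, 330, 440, 550, 660, 770, 880, 990)) -> np.ndarray:
    """channels: (n_channels, n_samples) electrical input signals."""
    n_ch, n = channels.shape
    t = np.arange(n) / fs
    out = np.zeros(n)
    for ch in range(n_ch):
        env = np.abs(channels[ch])   # rectified input drives the tone amplitude
        out += env * np.sin(2.0 * np.pi * tones[ch % len(tones)] * t)
    return out / n_ch                # the mixed, ear-decodable "melody"
```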

  17. Improved Techniques for Automatic Chord Recognition from Music Audio Signals

    ERIC Educational Resources Information Center

    Cho, Taemin

    2014-01-01

    This thesis is concerned with the development of techniques that facilitate the effective implementation of capable automatic chord transcription from music audio signals. Since chord transcriptions can capture many important aspects of music, they are useful for a wide variety of music applications and also useful for people who learn and perform…

  18. Comparison between audio-only and audiovisual biofeedback for regulating patients' respiration during four-dimensional radiotherapy

    PubMed Central

    Yu, Jesang; Choi, Ji Hoon; Ma, Sun Young; Jeung, Tae Sig

    2015-01-01

    Purpose To compare audio-only biofeedback to conventional audiovisual biofeedback for regulating patients' respiration during four-dimensional radiotherapy, limiting damage to healthy surrounding tissues caused by organ movement. Materials and Methods Six healthy volunteers were assisted by audiovisual or audio-only biofeedback systems to regulate their respiration. Volunteers breathed through a mask developed for this study by following computer-generated guiding curves displayed on a screen, combined with instructional sounds. They then performed breathing following instructional sounds only. The guiding signals and the volunteers' respiratory signals were logged at 20 samples per second. Results The standard deviations between the guiding and respiratory curves for the audiovisual and audio-only biofeedback systems were 21.55% and 23.19%, respectively; the average correlation coefficients were 0.9778 and 0.9756, respectively. A paired t-test showed no statistically significant difference in respiratory regularity between the two systems across the six volunteers. Conclusion The difference between the audiovisual and audio-only biofeedback methods was not significant. Audio-only biofeedback has many advantages, as patients do not require a mask and can quickly adapt to this method in the clinic. PMID:26484309

  19. Blind source separation and localization using microphone arrays

    NASA Astrophysics Data System (ADS)

    Sun, Longji

    The blind source separation and localization problem for audio signals is studied using microphone arrays. Pure delay mixtures of source signals, typically encountered in outdoor environments, are considered. Our proposed approach utilizes subspace methods, including the multiple signal classification (MUSIC) and estimation of signal parameters via rotational invariance techniques (ESPRIT) algorithms, to estimate the directions of arrival (DOAs) of the sources from the collected mixtures. Since audio signals are generally broadband, the DOA estimates at frequencies with large sums of squared amplitudes are combined to obtain the final DOA estimates. Using the estimated DOAs, the corresponding mixing and demixing matrices are computed, and the source signals are recovered using the inverse short time Fourier transform. Subspace methods take advantage of the spatial covariance matrix of the collected mixtures to achieve robustness to noise. While subspace methods have been studied for localizing radio-frequency signals, audio signals have special properties: they are nonstationary, naturally broadband, and analog, all of which make their separation and localization more challenging. Moreover, our algorithm is essentially equivalent to the beamforming technique, which suppresses the signals in unwanted directions and only recovers the signals in the estimated DOAs. Several crucial issues related to our algorithm and their solutions have been discussed, including source number estimation, spatial aliasing, artifact filtering, different ways of mixture generation, and source coordinate estimation using multiple arrays. Additionally, comprehensive simulations and experiments have been conducted to examine various aspects of the algorithm. Unlike the existing blind source separation and localization methods, which are generally time consuming, our algorithm needs signal mixtures of only a short duration and therefore supports real-time implementation.
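
    As a concrete reference point, a compact sketch of narrowband MUSIC for a uniform linear array follows; it operates on STFT snapshots at one frequency bin, and the broadband combination step described above is left out. Array geometry and the angle grid are illustrative assumptions.

```python
import numpy as np

def music_doa(X: np.ndarray, n_sources: int, d_over_lambda: float = 0.5) -> np.ndarray:
    """X: (n_mics, n_snapshots) complex snapshots at one frequency bin.
    Returns the MUSIC pseudospectrum over a -90..90 degree grid."""
    n_mics = X.shape[0]
    R = X @ X.conj().T / X.shape[1]              # spatial covariance estimate
    _, eigvecs = np.linalg.eigh(R)               # eigenvalues in ascending order
    En = eigvecs[:, :n_mics - n_sources]         # noise subspace
    grid = np.deg2rad(np.linspace(-90.0, 90.0, 361))
    spectrum = np.empty(grid.size)
    for i, theta in enumerate(grid):
        # Steering vector of a uniform linear array, half-wavelength spacing.
        a = np.exp(-2j * np.pi * d_over_lambda * np.arange(n_mics) * np.sin(theta))
        spectrum[i] = 1.0 / np.real(a.conj() @ En @ En.conj().T @ a)
    return spectrum                              # peaks mark the source DOAs
```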

  20. Design and evaluation of a parametric model for cardiac sounds.

    PubMed

    Ibarra-Hernández, Roilhi F; Alonso-Arévalo, Miguel A; Cruz-Gutiérrez, Alejandro; Licona-Chávez, Ana L; Villarreal-Reyes, Salvador

    2017-10-01

    Heart sound analysis plays an important role in the auscultative diagnosis process to detect the presence of cardiovascular diseases. In this paper we propose a novel parametric heart sound model that accurately represents normal and pathological cardiac audio signals, also known as phonocardiograms (PCG). The proposed model considers that the PCG signal is formed by the sum of two parts: one of them is deterministic and the other one is stochastic. The first part contains most of the acoustic energy. This part is modeled by the Matching Pursuit (MP) algorithm, which performs an analysis-synthesis procedure to represent the PCG signal as a linear combination of elementary waveforms. The second part, also called the residual, is obtained after subtracting the deterministic signal from the original heart sound recording and can be accurately represented as an autoregressive process using the Linear Predictive Coding (LPC) technique. We evaluate the proposed heart sound model by performing subjective and objective tests using signals corresponding to different pathological cardiac sounds. The results of the objective evaluation show an average Percentage of Root-Mean-Square Difference of approximately 5% between the original heart sound and the reconstructed signal. For the subjective test we conducted a formal methodology for perceptual evaluation of audio quality with the assistance of medical experts. Statistical results of the subjective evaluation show that our model provides a highly accurate approximation of real heart sound signals. We are not aware of any previous heart sound model evaluated as rigorously as our proposal.
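
    The stochastic part of such a model can be fit with the standard autocorrelation method of LPC. A minimal sketch, with an illustrative model order (not the authors' exact configuration):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(residual: np.ndarray, order: int = 12) -> np.ndarray:
    """Fit an AR model to one frame by solving the LPC normal equations."""
    r = np.correlate(residual, residual, mode="full")[residual.size - 1:]
    # Symmetric Toeplitz system R a = r, solved by a Levinson-type recursion.
    a = solve_toeplitz(r[:order], r[1:order + 1])
    return a   # prediction: x[n] ~ sum_k a[k] * x[n-1-k]
```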

  1. 47 CFR 73.128 - AM stereophonic broadcasting.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... negative peaks of 100%. (ii) Stereophonic (L−R) modulated with audio tones of the same amplitude at the... characteristics: (1) The audio response of the main (L+R) channel shall conform to the requirements of the ANSI... (NRSC-1). (2) The left and right channel audio signals shall conform to frequency response limitations...

  2. Advances in audio source separation and multisource audio content retrieval

    NASA Astrophysics Data System (ADS)

    Vincent, Emmanuel

    2012-06-01

    Audio source separation aims to extract the signals of individual sound sources from a given recording. In this paper, we review three recent advances which improve the robustness of source separation in real-world challenging scenarios and enable its use for multisource content retrieval tasks, such as automatic speech recognition (ASR) or acoustic event detection (AED) in noisy environments. We present a Flexible Audio Source Separation Toolkit (FASST) and discuss its advantages compared to earlier approaches such as independent component analysis (ICA) and sparse component analysis (SCA). We explain how cues as diverse as harmonicity, spectral envelope, temporal fine structure or spatial location can be jointly exploited by this toolkit. We subsequently present the uncertainty decoding (UD) framework for the integration of audio source separation and audio content retrieval. We show how the uncertainty about the separated source signals can be accurately estimated and propagated to the features. Finally, we explain how this uncertainty can be efficiently exploited by a classifier, both at the training and the decoding stage. We illustrate the resulting performance improvements in terms of speech separation quality and speaker recognition accuracy.

  3. Recording vocalizations with Bluetooth technology.

    PubMed

    Gaona-González, Andrés; Santillán-Doherty, Ana María; Arenas-Rosas, Rita Virginia; Muñoz-Delgado, Jairo; Aguillón-Pantaleón, Miguel Angel; Ordoñez-Gómez, José Domingo; Márquez-Arias, Alejandra

    2011-06-01

    We propose a method for capturing vocalizations that is designed to avoid some of the limiting factors found in traditional bioacoustical methods, such as the impossibility of obtaining continuous long-term recordings or of analyzing amplitude due to the continuous change in distance between the subject and the recording system. Using Bluetooth technology, vocalizations are captured and transmitted wirelessly to a receiving system without affecting the quality of the signal. The recordings of the proposed system were compared to reference recordings based on coding the signal with the pulse-code modulation technique in WAV audio format, without any compression. The evaluation showed p < .05 for the measured quantitative and qualitative parameters. We also describe how the transmitting system is encapsulated and fixed on the animal, and a way to video record a spider monkey's behavior simultaneously with the audio recordings.

  4. The bag-of-frames approach to audio pattern recognition: a sufficient model for urban soundscapes but not for polyphonic music.

    PubMed

    Aucouturier, Jean-Julien; Defreville, Boris; Pachet, François

    2007-08-01

    The "bag-of-frames" approach (BOF) to audio pattern recognition represents signals as the long-term statistical distribution of their local spectral features. This approach has proved nearly optimal for simulating the auditory perception of natural and human environments (or soundscapes), and is also the most predominent paradigm to extract high-level descriptions from music signals. However, recent studies show that, contrary to its application to soundscape signals, BOF only provides limited performance when applied to polyphonic music signals. This paper proposes to explicitly examine the difference between urban soundscapes and polyphonic music with respect to their modeling with the BOF approach. First, the application of the same measure of acoustic similarity on both soundscape and music data sets confirms that the BOF approach can model soundscapes to near-perfect precision, and exhibits none of the limitations observed in the music data set. Second, the modification of this measure by two custom homogeneity transforms reveals critical differences in the temporal and statistical structure of the typical frame distribution of each type of signal. Such differences may explain the uneven performance of BOF algorithms on soundscapes and music signals, and suggest that their human perception rely on cognitive processes of a different nature.

  5. Subjective audio quality evaluation of embedded-optimization-based distortion precompensation algorithms.

    PubMed

    Defraene, Bruno; van Waterschoot, Toon; Diehl, Moritz; Moonen, Marc

    2016-07-01

    Subjective audio quality evaluation experiments have been conducted to assess the performance of embedded-optimization-based precompensation algorithms for mitigating perceptible linear and nonlinear distortion in audio signals. It is concluded with statistical significance that the perceived audio quality is improved by applying an embedded-optimization-based precompensation algorithm, both when (i) nonlinear distortion alone and (ii) a combination of linear and nonlinear distortion is present. Moreover, a significant positive correlation is reported between the collected subjective and objective PEAQ audio quality scores, supporting the validity of using PEAQ to predict the impact of linear and nonlinear distortion on perceived audio quality.

  6. Highlight summarization in golf videos using audio signals

    NASA Astrophysics Data System (ADS)

    Kim, Hyoung-Gook; Kim, Jin Young

    2008-01-01

    In this paper, we present an automatic summarization of highlights in golf videos based on audio information alone, without video information. The proposed highlight summarization system operates by semantic audio segmentation and detection of action units from audio signals. Studio speech, field speech, music, and applause are segmented by means of sound classification. Swing is detected by impulse-onset detection. Sounds like swing and applause form a complete action unit, while studio speech and music parts are used to anchor the program structure. With the advantage of highly precise detection of applause, highlights are extracted effectively. Our experiments show high classification precision on 18 golf games, demonstrating that the proposed system is effective and computationally efficient enough to apply the technology in embedded consumer electronic devices.

  7. Assessment of vocal cord nodules: a case study in speech processing by using Hilbert-Huang Transform

    NASA Astrophysics Data System (ADS)

    Civera, M.; Filosi, C. M.; Pugno, N. M.; Silvestrini, M.; Surace, C.; Worden, K.

    2017-05-01

    Vocal cord nodules represent a pathological condition in which the growth of unnatural masses on the vocal folds affects the patients. Among other effects, changes in the vocal cords’ overall mass and stiffness alter their vibratory behaviour, thus changing the vocal emission they generate. This causes dysphonia, i.e. abnormalities in the patient’s voice, which can be analysed and inspected via audio signals. However, the evaluation of voice condition through speech processing is not a trivial task, as standard methods based on the Fourier Transform fail to fit the non-stationary nature of vocal signals. In this study, four audio tracks, provided by a volunteer patient whose vocal fold nodules had been surgically removed, were analysed using a relatively new technique: the Hilbert-Huang Transform (HHT) via Empirical Mode Decomposition (EMD); specifically, using the CEEMDAN (Complete Ensemble EMD with Adaptive Noise) algorithm. This method has been applied here to speech signals, which were recorded before removal surgery and during convalescence, to investigate specific trends. Possibilities offered by the HHT are presented, but some limitations of decomposing the signals into so-called intrinsic mode functions (IMFs) are also highlighted. The results of these preliminary studies are intended to be a basis for the development of new, viable alternatives to the software currently used for the analysis and evaluation of pathological voice.
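
    The HHT pipeline can be sketched as EMD into intrinsic mode functions followed by instantaneous amplitude and frequency via the analytic signal. The PyEMD package below is an assumed stand-in for the authors' CEEMDAN implementation (PyEMD also ships a CEEMDAN class):

```python
import numpy as np
from scipy.signal import hilbert
from PyEMD import EMD   # assumed third-party dependency

def hilbert_huang(signal: np.ndarray, fs: float):
    """Decompose into IMFs, then take instantaneous amplitude and frequency."""
    imfs = EMD().emd(signal)                         # (n_imfs, n_samples)
    analytic = hilbert(imfs, axis=-1)                # analytic signal per IMF
    amplitude = np.abs(analytic)
    phase = np.unwrap(np.angle(analytic), axis=-1)
    inst_freq = np.diff(phase, axis=-1) * fs / (2.0 * np.pi)
    return imfs, amplitude, inst_freq
```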

  8. A Robust Zero-Watermarking Algorithm for Audio

    NASA Astrophysics Data System (ADS)

    Chen, Ning; Zhu, Jie

    2007-12-01

    In traditional watermarking algorithms, the insertion of a watermark into the host signal inevitably introduces some perceptible quality degradation. Another problem is the inherent conflict between imperceptibility and robustness. The zero-watermarking technique can solve these problems successfully. Instead of embedding a watermark, the zero-watermarking technique extracts some essential characteristics from the host signal and uses them for watermark detection. However, most of the available zero-watermarking schemes are designed for still images, and their robustness is not satisfactory. In this paper, an efficient and robust zero-watermarking technique for audio signals is presented. The multiresolution characteristic of the discrete wavelet transform (DWT), the energy compaction characteristic of the discrete cosine transform (DCT), and the Gaussian noise suppression property of higher-order cumulants are combined to extract essential features from the host audio signal, which are then used for watermark recovery. Simulation results demonstrate the effectiveness of our scheme in terms of inaudibility, detection reliability, and robustness.
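
    A drastically simplified sketch of such a feature chain follows: a DWT approximation for robustness, a DCT for energy compaction, and a sign test as the binarization; the higher-order-cumulant stage is omitted and all parameters are illustrative.

```python
import numpy as np
import pywt                      # assumed third-party dependency
from scipy.fft import dct

def zero_watermark_bits(audio: np.ndarray, level: int = 4, n_bits: int = 64) -> np.ndarray:
    """Derive a robust bit pattern from the host audio without modifying it."""
    approx = pywt.wavedec(audio, "db4", level=level)[0]  # low-band approximation
    coeffs = dct(approx, norm="ortho")[1:n_bits + 1]     # skip DC, keep n_bits
    return (coeffs > 0).astype(np.uint8)                 # sign pattern as features
```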

  9. A third-order class-D amplifier with and without ripple compensation

    NASA Astrophysics Data System (ADS)

    Cox, Stephen M.; du Toit Mouton, H.

    2018-06-01

    We analyse the nonlinear behaviour of a third-order class-D amplifier, and demonstrate the remarkable effectiveness of the recently introduced ripple compensation (RC) technique in reducing the audio distortion of the device. The amplifier converts an input audio signal to a high-frequency train of rectangular pulses, whose widths are modulated according to the input signal (pulse-width modulation) and employs negative feedback. After determining the steady-state operating point for constant input and calculating its stability, we derive a small-signal model (SSM), which yields in closed form the transfer function relating (infinitesimal) input and output disturbances. This SSM shows how the RC technique is able to linearise the small-signal response of the device. We extend this SSM through a fully nonlinear perturbation calculation of the dynamics of the amplifier, based on the disparity in time scales between the pulse train and the audio signal. We obtain the nonlinear response of the amplifier to a general audio signal, avoiding the linearisation inherent in the SSM; we thereby more precisely quantify the reduction in distortion achieved through RC. Finally, simulations corroborate our theoretical predictions and illustrate the dramatic deterioration in performance that occurs when the amplifier is operated in an unstable regime. The perturbation calculation is rather general, and may be adapted to quantify the way in which other nonlinear negative-feedback pulse-modulated devices track a time-varying input signal that slowly modulates the system parameters.
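
    The pulse-width modulation step can be illustrated with natural sampling: compare the audio against a high-frequency triangle carrier. The sketch below models only open-loop PWM, not the paper's feedback loop or ripple compensation, and assumes the simulation rate fs is far above the carrier frequency.

```python
import numpy as np
from scipy.signal import sawtooth

def pwm(audio: np.ndarray, fs: float, f_carrier: float = 400e3) -> np.ndarray:
    """Two-level natural-sampling PWM of a normalized audio signal."""
    x = audio / (np.abs(audio).max() + 1e-12)   # normalize to [-1, 1]
    t = np.arange(x.size) / fs                  # fs must be >> f_carrier
    tri = sawtooth(2 * np.pi * f_carrier * t, width=0.5)  # triangle carrier
    return np.where(x > tri, 1.0, -1.0)         # rectangular pulse train
```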

  10. 47 CFR 15.115 - TV interface devices, including cable system terminal devices.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... times the square root of (R) for the video signal and 155 times the square root of (R) for the audio... and 77.5 times the square root of (R) for the audio signal. (2) At any RF output terminal, the maximum... video cassette recorders continue to be subject to the provisions for general TV interface devices. (c...

  11. 47 CFR 15.115 - TV interface devices, including cable system terminal devices.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... times the square root of (R) for the video signal and 155 times the square root of (R) for the audio... and 77.5 times the square root of (R) for the audio signal. (2) At any RF output terminal, the maximum... video cassette recorders continue to be subject to the provisions for general TV interface devices. (c...

  12. 47 CFR 15.115 - TV interface devices, including cable system terminal devices.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... times the square root of (R) for the video signal and 155 times the square root of (R) for the audio... and 77.5 times the square root of (R) for the audio signal. (2) At any RF output terminal, the maximum... video cassette recorders continue to be subject to the provisions for general TV interface devices. (c...

  13. Investigation of signal models and methods for evaluating structures of processing telecommunication information exchange systems under acoustic noise conditions

    NASA Astrophysics Data System (ADS)

    Kropotov, Y. A.; Belov, A. A.; Proskuryakov, A. Y.; Kolpakov, A. A.

    2018-05-01

    The paper considers models and methods for estimating signals during the transmission of information messages in audio-exchange telecommunication systems. One-dimensional probability distribution functions that can be used to separate useful signals from acoustic noise interference are presented. An approach to estimating the correlation and spectral functions of acoustic signal parameters is proposed, based on a parametric representation of the acoustic signals and of the noise components. The paper suggests an approach to improving the efficiency of interference cancellation and to extracting the necessary information when processing signals from telecommunication systems. In this case, the suppression of acoustic noise is based on methods of adaptive filtering and adaptive compensation. The work also describes models of echo signals and the structure of subscriber devices in operational command telecommunication systems.

  14. TV audio and video on the same channel

    NASA Technical Reports Server (NTRS)

    Hopkins, J. B.

    1979-01-01

    Transmitting technique adds audio to video signal during vertical blanking interval. SIVI (signal in the vertical interval) is used by TV networks and stations to transmit cuing and automatic-switching tone signals to augment automatic and manual operations. It can also be used to transmit one-way instructional information, such as bulletin alerts, program changes, and commercial-cutaway aural cues from the networks to affiliates. Additionally, it can be used as an extra sound channel for second-language transmission to bilingual stations.

  15. Involvement of Right STS in Audio-Visual Integration for Affective Speech Demonstrated Using MEG

    PubMed Central

    Hagan, Cindy C.; Woods, Will; Johnson, Sam; Green, Gary G. R.; Young, Andrew W.

    2013-01-01

    Speech and emotion perception are dynamic processes in which it may be optimal to integrate synchronous signals emitted from different sources. Studies of audio-visual (AV) perception of neutrally expressed speech demonstrate supra-additive (i.e., where AV>[unimodal auditory+unimodal visual]) responses in left STS to crossmodal speech stimuli. However, emotions are often conveyed simultaneously with speech; through the voice in the form of speech prosody and through the face in the form of facial expression. Previous studies of AV nonverbal emotion integration showed a role for right (rather than left) STS. The current study therefore examined whether the integration of facial and prosodic signals of emotional speech is associated with supra-additive responses in left (cf. results for speech integration) or right (due to emotional content) STS. As emotional displays are sometimes difficult to interpret, we also examined whether supra-additive responses were affected by emotional incongruence (i.e., ambiguity). Using magnetoencephalography, we continuously recorded eighteen participants as they viewed and heard AV congruent emotional and AV incongruent emotional speech stimuli. Significant supra-additive responses were observed in right STS within the first 250 ms for emotionally incongruent and emotionally congruent AV speech stimuli, which further underscores the role of right STS in processing crossmodal emotive signals. PMID:23950977

  16. Involvement of right STS in audio-visual integration for affective speech demonstrated using MEG.

    PubMed

    Hagan, Cindy C; Woods, Will; Johnson, Sam; Green, Gary G R; Young, Andrew W

    2013-01-01

    Speech and emotion perception are dynamic processes in which it may be optimal to integrate synchronous signals emitted from different sources. Studies of audio-visual (AV) perception of neutrally expressed speech demonstrate supra-additive (i.e., where AV>[unimodal auditory+unimodal visual]) responses in left STS to crossmodal speech stimuli. However, emotions are often conveyed simultaneously with speech; through the voice in the form of speech prosody and through the face in the form of facial expression. Previous studies of AV nonverbal emotion integration showed a role for right (rather than left) STS. The current study therefore examined whether the integration of facial and prosodic signals of emotional speech is associated with supra-additive responses in left (cf. results for speech integration) or right (due to emotional content) STS. As emotional displays are sometimes difficult to interpret, we also examined whether supra-additive responses were affected by emotional incongruence (i.e., ambiguity). Using magnetoencephalography, we continuously recorded eighteen participants as they viewed and heard AV congruent emotional and AV incongruent emotional speech stimuli. Significant supra-additive responses were observed in right STS within the first 250 ms for emotionally incongruent and emotionally congruent AV speech stimuli, which further underscores the role of right STS in processing crossmodal emotive signals.

  17. Speech information retrieval: a review

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hafen, Ryan P.; Henry, Michael J.

    Audio is an information-rich component of multimedia. Information can be extracted from audio in a number of different ways, and thus there are several established audio signal analysis research fields. These fields include speech recognition, speaker recognition, audio segmentation and classification, and audio fingerprinting. The information that can be extracted from tools and methods developed in these fields can greatly enhance multimedia systems. In this paper, we present the current state of research in each of the major audio analysis fields. The goal is to introduce enough background for someone new in the field to quickly gain a high-level understanding and to provide direction for further study.

  18. Digital Audio Radio Broadcast Systems Laboratory Testing Nearly Complete

    NASA Technical Reports Server (NTRS)

    2005-01-01

    Radio history continues to be made at the NASA Lewis Research Center with the completion of phase one of the digital audio radio (DAR) testing conducted by the Consumer Electronics Group of the Electronic Industries Association. This satellite, satellite/terrestrial, and terrestrial digital technology will open up new audio broadcasting opportunities both domestically and worldwide. It will significantly improve the current quality of amplitude-modulated/frequency-modulated (AM/FM) radio with a new digitally modulated radio signal and will introduce true compact-disc-quality (CD-quality) sound for the first time. Lewis is hosting the laboratory testing of seven proposed digital audio radio systems and modes. Two of the proposed systems operate in two modes each, making a total of nine systems being tested. The nine systems are divided into the following types of transmission: in-band on-channel (IBOC), in-band adjacent-channel (IBAC), and new bands. Subjective assessments of the audio recordings for each of the nine systems were conducted by the Communications Research Center in Ottawa, Canada, under contract to the Electronic Industries Association. The Communications Research Center has the only CCIR-qualified (Consultative Committee for International Radio) audio testing facility in North America. The main goals of the U.S. testing process are to (1) provide technical data to the Federal Communication Commission (FCC) so that it can establish a standard for digital audio receivers and transmitters and (2) provide the receiver and transmitter industries with the proper standards upon which to build their equipment. In addition, the data will be forwarded to the International Telecommunications Union to help in the establishment of international standards for digital audio receivers and transmitters, thus allowing U.S. manufacturers to compete in the world market.

  19. When they listen and when they watch: Pianists’ use of nonverbal audio and visual cues during duet performance

    PubMed Central

    Goebl, Werner

    2015-01-01

    Nonverbal auditory and visual communication helps ensemble musicians predict each other’s intentions and coordinate their actions. When structural characteristics of the music make predicting co-performers’ intentions difficult (e.g., following long pauses or during ritardandi), reliance on incoming auditory and visual signals may change. This study tested whether attention to visual cues during piano–piano and piano–violin duet performance increases in such situations. Pianists performed the secondo part to three duets, synchronizing with recordings of violinists or pianists playing the primo parts. Secondos’ access to incoming audio and visual signals and to their own auditory feedback was manipulated. Synchronization was most successful when primo audio was available, deteriorating when primo audio was removed and only cues from primo visual signals were available. Visual cues were used effectively following long pauses in the music, however, even in the absence of primo audio. Synchronization was unaffected by the removal of secondos’ own auditory feedback. Differences were observed in how successfully piano–piano and piano–violin duos synchronized, but these effects of instrument pairing were not consistent across pieces. Pianists’ success at synchronizing with violinists and other pianists is likely moderated by piece characteristics and individual differences in the clarity of cueing gestures used. PMID:26279610

  20. Media/Device Configurations for Platoon Leader Tactical Training

    DTIC Science & Technology

    1985-02-01

    Inputs to the Platoon Leader: the device should simulate the real-time receipt of all tactical voice communication, audio and visual battlefield cues, and visual communication signals. [Table 4 (Continued), Functional Capability Categories: receipt of limited tactical voice communication, plus audio and visual battlefield cues, and visual communication signals.]

  1. Incorporating Auditory Models in Speech/Audio Applications

    NASA Astrophysics Data System (ADS)

    Krishnamoorthi, Harish

    2011-12-01

    Following the success in incorporating perceptual models in audio coding algorithms, their application in other speech/audio processing systems is expanding. In general, all perceptual speech/audio processing algorithms involve minimization of an objective function that directly/indirectly incorporates properties of human perception. This dissertation primarily investigates the problems associated with directly embedding an auditory model in the objective function formulation and proposes possible solutions to overcome high complexity issues for use in real-time speech/audio algorithms. Specific problems addressed in this dissertation include: 1) the development of approximate but computationally efficient auditory model implementations that are consistent with the principles of psychoacoustics, 2) the development of a mapping scheme that allows synthesizing a time/frequency domain representation from its equivalent auditory model output. The first problem is aimed at addressing the high computational complexity involved in solving perceptual objective functions that require repeated application of auditory model for evaluation of different candidate solutions. In this dissertation, a frequency pruning and a detector pruning algorithm is developed that efficiently implements the various auditory model stages. The performance of the pruned model is compared to that of the original auditory model for different types of test signals in the SQAM database. Experimental results indicate only a 4-7% relative error in loudness while attaining up to 80-90 % reduction in computational complexity. Similarly, a hybrid algorithm is developed specifically for use with sinusoidal signals and employs the proposed auditory pattern combining technique together with a look-up table to store representative auditory patterns. The second problem obtains an estimate of the auditory representation that minimizes a perceptual objective function and transforms the auditory pattern back to its equivalent time/frequency representation. This avoids the repeated application of auditory model stages to test different candidate time/frequency vectors in minimizing perceptual objective functions. In this dissertation, a constrained mapping scheme is developed by linearizing certain auditory model stages that ensures obtaining a time/frequency mapping corresponding to the estimated auditory representation. This paradigm was successfully incorporated in a perceptual speech enhancement algorithm and a sinusoidal component selection task.

  2. Creative reflections-the strategic use of reflections in multitrack music production

    NASA Astrophysics Data System (ADS)

    Case, Alexander

    2005-09-01

    There is a long tradition of deliberately capturing and even synthesizing early reflections to enhance the music intended for loudspeaker playback. The desire to improve or at least alter the quality, audibility, intelligibility, stereo width, and/or uniqueness of the audio signal guides the recording engineer's use of the recording space, influences their microphone selection and placement, and inspires countless signal-processing approaches. This paper reviews contemporary multitrack production techniques that specifically take advantage of reflected sound energy for musical benefit.

  3. Human Language Technology: Opportunities and Challenges

    DTIC Science & Technology

    2005-01-01

    because of the connections to and reliance on signal processing. Audio diarization critically includes indexing of speakers [12], since speaker …to reduce inter-speaker variability in training. Standard techniques include vocal-tract length normalization, adaptation of acoustic models using…maximum likelihood linear regression (MLLR), and speaker-adaptive training based on MLLR. The acoustic models are mixtures of Gaussians, typically with

  4. MASTERS: A Virtual Lab on Multimedia Systems for Telecommunications, Medical, and Remote Sensing Applications

    ERIC Educational Resources Information Center

    Alexiadis, D. S.; Mitianoudis, N.

    2013-01-01

    Digital signal processing (DSP) has been an integral part of most electrical, electronic, and computer engineering curricula. The applications of DSP in multimedia (audio, image, video) storage, transmission, and analysis are also widely taught at both the undergraduate and post-graduate levels, as digital multimedia can be encountered in most…

  5. Fiber optic multiplex optical transmission system

    NASA Technical Reports Server (NTRS)

    Bell, C. H. (Inventor)

    1977-01-01

    A multiplex optical transmission system which minimizes external interference while simultaneously receiving and transmitting video, digital data, and audio signals is described. Signals are received into subgroup mixers for blocking into respective frequency ranges. The outputs of these mixers are in turn fed to a master mixer which produces a composite electrical signal. An optical transmitter connected to the master mixer converts the composite signal into an optical signal and transmits it over a fiber optic cable to an optical receiver which receives the signal and converts it back to a composite electrical signal. A de-multiplexer is coupled to the output of the receiver for separating the composite signal back into composite video, digital data, and audio signals. A programmable optic patch board is interposed in the fiber optic cables for selectively connecting the optical signals to various receivers and transmitters.

  6. Combinatorial Markov Random Fields and Their Applications to Information Organization

    DTIC Science & Technology

    2008-02-01

    titles, part-of-speech tags; • Image processing: images, colors, texture, blobs, interest points, caption words; • Video processing: video signal, audio…McGurk and MacDonald published their pioneering work [80] that revealed the multi-modal nature of speech perception: sound and moving lips compose one… Part-of-Speech (POS) n-grams (that correspond to the syntactic structure of text). POS n-grams are extracted from sentences in an incremental manner: the first n

  7. Aerospace Applications Conference, Steamboat Springs, CO, Feb. 1-8, 1986, Digest

    NASA Astrophysics Data System (ADS)

    The present conference considers topics concerning the projected NASA Space Station's systems, digital signal and data processing applications, and space science and microwave applications. Attention is given to Space Station video and audio subsystems design, clock error, jitter, phase error and differential time-of-arrival in satellite communications, automation and robotics in space applications, target insertion into synthetic background scenes, and a novel scheme for the computation of the discrete Fourier transform on a systolic processor. Also discussed are a novel signal parameter measurement system employing digital signal processing, EEPROMS for spacecraft applications, a unique concurrent processor architecture for high speed simulation of dynamic systems, a dual polarization flat plate antenna, Fresnel diffraction, and ultralinear TWTs for high efficiency satellite communications.

  8. 47 CFR 80.74 - Public coast station facilities for a telephony busy signal.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ...), must consist of the transmission of a single audio frequency regularly interrupted, as follows: (a) Audio frequency. Not less than 100 nor more than 1100 Hertz, provided the frequency used for this...

  9. 47 CFR 80.74 - Public coast station facilities for a telephony busy signal.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ...), must consist of the transmission of a single audio frequency regularly interrupted, as follows: (a) Audio frequency. Not less than 100 nor more than 1100 Hertz, provided the frequency used for this...

  10. 47 CFR 80.74 - Public coast station facilities for a telephony busy signal.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ...), must consist of the transmission of a single audio frequency regularly interrupted, as follows: (a) Audio frequency. Not less than 100 nor more than 1100 Hertz, provided the frequency used for this...

  11. 47 CFR 80.74 - Public coast station facilities for a telephony busy signal.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ...), must consist of the transmission of a single audio frequency regularly interrupted, as follows: (a) Audio frequency. Not less than 100 nor more than 1100 Hertz, provided the frequency used for this...

  12. 47 CFR 80.74 - Public coast station facilities for a telephony busy signal.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ...), must consist of the transmission of a single audio frequency regularly interrupted, as follows: (a) Audio frequency. Not less than 100 nor more than 1100 Hertz, provided the frequency used for this...

  13. Collusion-Resistant Audio Fingerprinting System in the Modulated Complex Lapped Transform Domain

    PubMed Central

    Garcia-Hernandez, Jose Juan; Feregrino-Uribe, Claudia; Cumplido, Rene

    2013-01-01

    The collusion-resistant fingerprinting paradigm seems to be a practical solution to the piracy problem, as it allows media owners to detect any unauthorized copy and trace it back to the dishonest users. Despite billions of dollars in losses in the music industry, most collusion-resistant fingerprinting systems are devoted to digital images and very few to audio signals. In this paper, state-of-the-art collusion-resistant fingerprinting ideas are extended to audio signals, and the corresponding parameters and operating conditions are proposed. Moreover, in order to carry out fingerprint detection using just a fraction of the pirated audio clip, block-based embedding and its corresponding detector are proposed. Extensive simulations show the robustness of the proposed system against the average collusion attack. Moreover, by using an efficient Fast Fourier Transform core and standard computer machines, it is shown that the proposed system is suitable for real-world scenarios. PMID:23762455
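
    The average collusion attack tested above is easy to state exactly: K colluders mix their individually fingerprinted copies sample-by-sample, attenuating each fingerprint by a factor of 1/K. A minimal sketch:

```python
import numpy as np

def average_collusion(copies: np.ndarray) -> np.ndarray:
    """copies: (K, n_samples) array of fingerprinted versions of one clip."""
    return copies.mean(axis=0)   # each fingerprint survives at 1/K strength
```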

  14. 47 CFR 73.757 - System specifications for single-sideband (SSB) modulated emissions in the HF broadcasting service.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... emission is one giving the same audio-frequency signal-to-noise ratio at the receiver output as the... is equally valid for both DSB and SSB emissions. (3) Audio-frequency band. The upper limit of the audio-frequency band (at—3 dB) of the transmitter shall not exceed 4.5 kHz with a further slope of...

  15. 47 CFR 73.757 - System specifications for single-sideband (SSB) modulated emissions in the HF broadcasting service.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... emission is one giving the same audio-frequency signal-to-noise ratio at the receiver output as the... is equally valid for both DSB and SSB emissions. (3) Audio-frequency band. The upper limit of the audio-frequency band (at—3 dB) of the transmitter shall not exceed 4.5 kHz with a further slope of...

  16. 47 CFR 73.757 - System specifications for single-sideband (SSB) modulated emissions in the HF broadcasting service.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... emission is one giving the same audio-frequency signal-to-noise ratio at the receiver output as the... is equally valid for both DSB and SSB emissions. (3) Audio-frequency band. The upper limit of the audio-frequency band (at—3 dB) of the transmitter shall not exceed 4.5 kHz with a further slope of...

  17. Multiresolution analysis (discrete wavelet transform) through Daubechies family for emotion recognition in speech.

    NASA Astrophysics Data System (ADS)

    Campo, D.; Quintero, O. L.; Bastidas, M.

    2016-04-01

    We propose a study of the mathematical properties of voice as an audio signal, including signals in which the channel conditions are not ideal for emotion recognition. Multiresolution analysis (discrete wavelet transform) was performed using the Daubechies wavelet family (Db1-Haar, Db6, Db8, Db10), decomposing the initial audio signal into sets of coefficients from which a set of features was extracted and analyzed statistically in order to differentiate emotional states. ANNs proved to be a system that allows an appropriate classification of such states. This study shows that the features extracted using wavelet decomposition are sufficient to analyze and extract emotional content in audio signals, yielding a high accuracy rate in the classification of emotional states without the need for other classical frequency-time features. Accordingly, this paper seeks to characterize mathematically the six basic human emotions: boredom, disgust, happiness, anxiety, anger, and sadness, plus neutrality, for a total of seven states to identify.
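
    As a hedged illustration of the decomposition step, the sketch below uses PyWavelets to split a signal with a Daubechies wavelet and compute simple per-band statistics; the 'db6' choice, the five-level depth, and the particular statistics are assumptions, not the authors' exact feature set.

        # Minimal wavelet feature sketch (assumptions: db6, 5 levels,
        # mean/std/energy per coefficient band).
        import numpy as np
        import pywt

        def wavelet_features(signal, wavelet='db6', level=5):
            coeffs = pywt.wavedec(signal, wavelet, level=level)
            feats = []
            for c in coeffs:
                feats.extend([np.mean(c), np.std(c), np.sum(c ** 2)])
            return np.array(feats)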

  18. Judging the similarity of soundscapes does not require categorization: evidence from spliced stimuli.

    PubMed

    Aucouturier, Jean-Julien; Defreville, Boris

    2009-04-01

    This study uses an audio signal transformation, splicing, to create an experimental situation where human listeners judge the similarity of audio signals, which they cannot easily categorize. Splicing works by segmenting audio signals into 50-ms frames, then shuffling and concatenating these frames back in random order. Splicing a signal masks the identification of the categories that it normally elicits: For instance, human participants cannot easily identify the sound of cars in a spliced recording of a city street. This study compares human performance on both normal and spliced recordings of soundscapes and music. Splicing is found to degrade human similarity performance significantly less for soundscapes than for music: When two spliced soundscapes are judged similar to one another, the original recordings also tend to sound similar. This establishes that humans are capable of reconstructing consistent similarity relations between soundscapes without relying much on the identification of the natural categories associated with such signals, such as their constituent sound sources. This finding contradicts previous literature and points to new ways to conceptualize the different ways in which humans perceive soundscapes and music.
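
    The splicing transform itself is simple to reproduce; a minimal sketch, assuming a mono NumPy signal and the 50-ms frame size used in the study:

        # Segment into fixed-length frames, shuffle, and concatenate.
        import numpy as np

        def splice(signal, sample_rate, frame_ms=50, seed=0):
            frame_len = int(sample_rate * frame_ms / 1000)
            n_frames = len(signal) // frame_len
            frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
            order = np.random.default_rng(seed).permutation(n_frames)
            return frames[order].reshape(-1)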

  19. Noise-Canceling Helmet Audio System

    NASA Technical Reports Server (NTRS)

    Seibert, Marc A.; Culotta, Anthony J.

    2007-01-01

    A prototype helmet audio system has been developed to improve voice communication for the wearer in a noisy environment. The system was originally intended to be used in a space suit, wherein noise generated by airflow of the spacesuit life-support system can make it difficult for remote listeners to understand the astronaut's speech and can interfere with the astronaut's attempt to issue vocal commands to a voice-controlled robot. The system could be adapted to terrestrial use in helmets of protective suits that are typically worn in noisy settings: examples include biohazard, fire, rescue, and diving suits. The system (see figure) includes an array of microphones and small loudspeakers mounted at fixed positions in a helmet, amplifiers and signal-routing circuitry, and a commercial digital signal processor (DSP). Notwithstanding the fixed positions of the microphones and loudspeakers, the system can accommodate itself to any normal motion of the wearer's head within the helmet. The system operates in conjunction with a radio transceiver. An audio signal arriving via the transceiver intended to be heard by the wearer is adjusted in volume and otherwise conditioned and sent to the loudspeakers. The wearer's speech is collected by the microphones, the outputs of which are logically combined (phased) so as to form a microphone-array directional sensitivity pattern that discriminates in favor of sounds coming from the vicinity of the wearer's mouth and against sounds coming from elsewhere. In the DSP, digitized samples of the microphone outputs are processed to filter out airflow noise and to eliminate feedback from the loudspeakers to the microphones. The resulting conditioned version of the wearer's speech signal is sent to the transceiver.
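
    A minimal delay-and-sum sketch of the microphone-array phasing described above, assuming the per-microphone delays toward the wearer's mouth are known in samples (the actual system also performs DSP noise filtering and feedback cancellation):

        import numpy as np

        def delay_and_sum(mics, delays):
            """Align each channel on the mouth direction and average,
            reinforcing speech and attenuating off-axis sounds."""
            n = min(len(m) - d for m, d in zip(mics, delays))
            aligned = [m[d:d + n] for m, d in zip(mics, delays)]
            return np.mean(aligned, axis=0)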

  20. Aggressive Bimodal Communication in Domestic Dogs, Canis familiaris.

    PubMed

    Déaux, Éloïse C; Clarke, Jennifer A; Charrier, Isabelle

    2015-01-01

    Evidence of animal multimodal signalling is widespread and compelling. Dogs' aggressive vocalisations (growls and barks) have been extensively studied, but without any consideration of the simultaneously produced visual displays. In this study we aimed to categorize dogs' bimodal aggressive signals according to the redundant/non-redundant classification framework. We presented dogs with unimodal (audio or visual) or bimodal (audio-visual) stimuli and measured their gazing and motor behaviours. Responses did not qualitatively differ between the bimodal and two unimodal contexts, indicating that acoustic and visual signals provide redundant information. We could not further classify the signal as 'equivalent' or 'enhancing' as we found evidence for both subcategories. We discuss our findings in relation to the complex signal framework, and propose several hypotheses for this signal's function.

  1. Multimodal fusion of polynomial classifiers for automatic person recognition

    NASA Astrophysics Data System (ADS)

    Broun, Charles C.; Zhang, Xiaozheng

    2001-03-01

    With the prevalence of the information age, privacy and personalization are forefront in today's society. As such, biometrics are viewed as essential components of current evolving technological systems. Consumers demand unobtrusive and non-invasive approaches. In our previous work, we have demonstrated a speaker verification system that meets these criteria. However, there are additional constraints for fielded systems. The required recognition transactions are often performed in adverse environments and across diverse populations, necessitating robust solutions. There are two significant problem areas in current generation speaker verification systems. The first is the difficulty in acquiring clean audio signals in all environments without encumbering the user with a head-mounted close-talking microphone. Second, unimodal biometric systems do not work with a significant percentage of the population. To combat these issues, multimodal techniques are being investigated to improve system robustness to environmental conditions, as well as improve overall accuracy across the population. We propose a multimodal approach that builds on our current state-of-the-art speaker verification technology. In order to maintain the transparent nature of the speech interface, we focus on optical sensing technology to provide the additional modality, giving us an audio-visual person recognition system. For the audio domain, we use our existing speaker verification system. For the visual domain, we focus on lip motion. This is chosen, rather than static face or iris recognition, because it provides dynamic information about the individual. In addition, the lip dynamics can aid speech recognition to provide liveness testing. The visual processing method makes use of both color and edge information, combined within a Markov random field (MRF) framework, to localize the lips. Geometric features are extracted and input to a polynomial classifier for the person recognition process. A late integration approach, based on a probabilistic model, is employed to combine the two modalities. The system is tested on the XM2VTS database combined with AWGN in the audio domain over a range of signal-to-noise ratios.
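
    A hedged sketch of the late-integration step: per-modality scores are combined as weighted log-likelihoods under an independence assumption. The weights and threshold below are illustrative, not the paper's trained probabilistic model.

        def fuse_scores(audio_score, visual_score, w_audio=0.6, w_visual=0.4):
            """Combine modality log-likelihood scores into one verification score."""
            return w_audio * audio_score + w_visual * visual_score

        def accept(fused_score, threshold=0.0):
            """Accept the identity claim when the fused score clears the threshold."""
            return fused_score > threshold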

  2. EEG Recording and Online Signal Processing on Android: A Multiapp Framework for Brain-Computer Interfaces on Smartphone

    PubMed Central

    Debener, Stefan; Emkes, Reiner; Volkening, Nils; Fudickar, Sebastian; Bleichner, Martin G.

    2017-01-01

    Objective: Our aim was the development and validation of a modular signal processing and classification application enabling online electroencephalography (EEG) signal processing on off-the-shelf mobile Android devices. The software application SCALA (Signal ProCessing and CLassification on Android) supports a standardized communication interface to exchange information with external software and hardware. Approach: In order to implement a closed-loop brain-computer interface (BCI) on the smartphone, we used a multiapp framework, which integrates applications for stimulus presentation, data acquisition, data processing, classification, and delivery of feedback to the user. Main Results: We have implemented the open source signal processing application SCALA. We present timing test results supporting sufficient temporal precision of audio events. We also validate SCALA with a well-established auditory selective attention paradigm and report above chance level classification results for all participants. Regarding the 24-channel EEG signal quality, evaluation results confirm typical sound onset auditory evoked potentials as well as cognitive event-related potentials that differentiate between correct and incorrect task performance feedback. Significance: We present a fully smartphone-operated, modular closed-loop BCI system that can be combined with different EEG amplifiers and can easily implement other paradigms. PMID:29349070

  3. EEG Recording and Online Signal Processing on Android: A Multiapp Framework for Brain-Computer Interfaces on Smartphone.

    PubMed

    Blum, Sarah; Debener, Stefan; Emkes, Reiner; Volkening, Nils; Fudickar, Sebastian; Bleichner, Martin G

    2017-01-01

    Our aim was the development and validation of a modular signal processing and classification application enabling online electroencephalography (EEG) signal processing on off-the-shelf mobile Android devices. The software application SCALA (Signal ProCessing and CLassification on Android) supports a standardized communication interface to exchange information with external software and hardware. In order to implement a closed-loop brain-computer interface (BCI) on the smartphone, we used a multiapp framework, which integrates applications for stimulus presentation, data acquisition, data processing, classification, and delivery of feedback to the user. We have implemented the open source signal processing application SCALA. We present timing test results supporting sufficient temporal precision of audio events. We also validate SCALA with a well-established auditory selective attention paradigm and report above chance level classification results for all participants. Regarding the 24-channel EEG signal quality, evaluation results confirm typical sound onset auditory evoked potentials as well as cognitive event-related potentials that differentiate between correct and incorrect task performance feedback. We present a fully smartphone-operated, modular closed-loop BCI system that can be combined with different EEG amplifiers and can easily implement other paradigms.

  4. Development and testing of an audio forensic software for enhancing speech signals masked by loud music

    NASA Astrophysics Data System (ADS)

    Dobre, Robert A.; Negrescu, Cristian; Stanomir, Dumitru

    2016-12-01

    In many situations audio recordings can decide the fate of a trial when accepted as evidence. Before they can be taken into account, however, they must first be authenticated, and the quality of the targeted content (speech in most cases) must be good enough to remove any doubt. In this scope, two main directions of multimedia forensics come into play: content authentication and noise reduction. This paper presents an application in the latter category. If someone wanted to conceal a conversation, the easiest way to do it would be to turn up the nearest audio system. In this situation, if a microphone were placed close by, the recorded signal would be apparently useless because the speech signal would be masked by the loud music signal. The paper proposes an adaptive-filter-based solution to remove the musical content from the previously described signal mixture in order to recover the masked vocal signal. Two adaptive filtering algorithms were tested in the proposed solution: Normalised Least Mean Squares (NLMS) and Recursive Least Squares (RLS). Their performances in the described situation were evaluated using Simulink, compared, and included in the paper.
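
    A minimal NLMS sketch under the paper's setup, assuming the clean music playback signal is available as the reference input and the microphone mixture as the desired signal; the filter order and step size are illustrative.

        import numpy as np

        def nlms_cancel(x, d, order=64, mu=0.5, eps=1e-8):
            """Adaptively model the acoustic path of the music x inside the
            mixture d; the residual e approximates the masked speech."""
            w = np.zeros(order)
            e = np.zeros(len(d))
            for n in range(order, len(d)):
                u = x[n - order:n][::-1]              # most recent reference samples
                e[n] = d[n] - w @ u                   # residual = recovered speech
                w += (mu / (eps + u @ u)) * e[n] * u  # normalized LMS update
            return e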

  5. Snore related signals processing in a private cloud computing system.

    PubMed

    Qian, Kun; Guo, Jian; Xu, Huijie; Zhu, Zhaomeng; Zhang, Gongxuan

    2014-09-01

    Snore related signals (SRS) have been demonstrated in recent years to carry important information about the obstruction site and degree in the upper airway of Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) patients. To make this acoustic signal analysis method more accurate and robust, big SRS data processing is inevitable. As an emerging concept and technology, cloud computing has motivated numerous researchers and engineers to exploit applications both in academia and industry, and it holds great promise for biomedical engineering. Considering the security and transfer requirements of biomedical data, we designed a system based on private cloud computing to process SRS. We then ran comparative experiments, processing a 5-hour audio recording of an OSAHS patient on a personal computer, a server, and the private cloud computing system, to demonstrate the efficiency of the proposed infrastructure.

  6. Audio-Tactile Integration in Congenitally and Late Deaf Cochlear Implant Users

    PubMed Central

    Nava, Elena; Bottari, Davide; Villwock, Agnes; Fengler, Ineke; Büchner, Andreas; Lenarz, Thomas; Röder, Brigitte

    2014-01-01

    Several studies conducted in mammals and humans have shown that multisensory processing may be impaired following congenital sensory loss and in particular if no experience is achieved within specific early developmental time windows known as sensitive periods. In this study we investigated whether basic multisensory abilities are impaired in hearing-restored individuals with deafness acquired at different stages of development. To this aim, we tested congenitally and late deaf cochlear implant (CI) recipients, age-matched with two groups of hearing controls, on an audio-tactile redundancy paradigm, in which reaction times to unimodal and crossmodal redundant signals were measured. Our results showed that both congenitally and late deaf CI recipients were able to integrate audio-tactile stimuli, suggesting that congenital and acquired deafness does not prevent the development and recovery of basic multisensory processing. However, we found that congenitally deaf CI recipients had a lower multisensory gain compared to their matched controls, which may be explained by their faster responses to tactile stimuli. We discuss this finding in the context of reorganisation of the sensory systems following sensory loss and the possibility that these changes cannot be “rewired” through auditory reafferentation. PMID:24918766

  7. Audio-tactile integration in congenitally and late deaf cochlear implant users.

    PubMed

    Nava, Elena; Bottari, Davide; Villwock, Agnes; Fengler, Ineke; Büchner, Andreas; Lenarz, Thomas; Röder, Brigitte

    2014-01-01

    Several studies conducted in mammals and humans have shown that multisensory processing may be impaired following congenital sensory loss and in particular if no experience is achieved within specific early developmental time windows known as sensitive periods. In this study we investigated whether basic multisensory abilities are impaired in hearing-restored individuals with deafness acquired at different stages of development. To this aim, we tested congenitally and late deaf cochlear implant (CI) recipients, age-matched with two groups of hearing controls, on an audio-tactile redundancy paradigm, in which reaction times to unimodal and crossmodal redundant signals were measured. Our results showed that both congenitally and late deaf CI recipients were able to integrate audio-tactile stimuli, suggesting that congenital and acquired deafness does not prevent the development and recovery of basic multisensory processing. However, we found that congenitally deaf CI recipients had a lower multisensory gain compared to their matched controls, which may be explained by their faster responses to tactile stimuli. We discuss this finding in the context of reorganisation of the sensory systems following sensory loss and the possibility that these changes cannot be "rewired" through auditory reafferentation.

  8. Computational Analysis and Simulation of Empathic Behaviors: a Survey of Empathy Modeling with Behavioral Signal Processing Framework.

    PubMed

    Xiao, Bo; Imel, Zac E; Georgiou, Panayiotis; Atkins, David C; Narayanan, Shrikanth S

    2016-05-01

    Empathy is an important psychological process that facilitates human communication and interaction. Enhancement of empathy has profound significance in a range of applications. In this paper, we review emerging directions of research on computational analysis of empathy expression and perception as well as empathic interactions, including their simulation. We summarize the work on empathic expression analysis by the targeted signal modalities (e.g., text, audio, and facial expressions). We categorize empathy simulation studies into theory-based emotion space modeling or application-driven user and context modeling. We summarize challenges in computational study of empathy including conceptual framing and understanding of empathy, data availability, appropriate use and validation of machine learning techniques, and behavior signal processing. Finally, we propose a unified view of empathy computation and offer a series of open problems for future research.

  9. Speech watermarking: an approach for the forensic analysis of digital telephonic recordings.

    PubMed

    Faundez-Zanuy, Marcos; Lucena-Molina, Jose J; Hagmüller, Martin

    2010-07-01

    In this article, the authors discuss the problem of forensic authentication of digital audio recordings. Although forensic audio has been addressed in several articles, the existing approaches are focused on analog magnetic recordings, which are less prevalent because of the large number of digital recorders available on the market (optical, solid state, hard disks, etc.). An approach based on digital signal processing that consists of spread spectrum techniques for speech watermarking is presented. This approach has the advantage that the authentication is based on the signal itself rather than the recording format. Thus, it is valid for the usual recording devices in police-controlled telephone intercepts. In addition, our proposal allows for the introduction of relevant information such as the recording date and time and all the relevant data (this is not always possible with classical systems). Our experimental results reveal that the speech watermarking procedure does not interfere in a significant way with subsequent forensic speaker identification.

  10. Direct broadcast satellite-radio market, legal, regulatory, and business considerations

    NASA Technical Reports Server (NTRS)

    Sood, Des R.

    1991-01-01

    A Direct Broadcast Satellite-Radio (DBS-R) System offers the prospect of delivering high quality audio broadcasts to large audiences at costs lower than or comparable to those incurred using the current means of broadcasting. The maturation of mobile communications technologies and advances in microelectronics and digital signal processing now make it possible to bring this technology to the marketplace. Heightened consumer interest in improved audio quality, coupled with the technological and economic feasibility of meeting this demand via DBS-R, makes it opportune to start planning for implementation of DBS-R Systems. NASA-Lewis and the Voice of America, as part of their on-going efforts to improve the quality of international audio broadcasts, have undertaken a number of tasks to more clearly define the technical, marketing, organizational, legal, and regulatory issues underlying implementation of DBS-R Systems. The results are presented, along with an assessment of the business considerations underlying the construction, launch, and operation of DBS-R Systems.

  11. An Efficient Audio Watermarking Algorithm in Frequency Domain for Copyright Protection

    NASA Astrophysics Data System (ADS)

    Dhar, Pranab Kumar; Khan, Mohammad Ibrahim; Kim, Cheol-Hong; Kim, Jong-Myon

    Digital watermarking plays an important role in the copyright protection of multimedia data. This paper proposes a new watermarking system in the frequency domain for copyright protection of digital audio. In our proposed watermarking system, the original audio is segmented into non-overlapping frames. Watermarks are then embedded into the selected prominent peaks in the magnitude spectrum of each frame. Watermarks are extracted by performing the inverse operation of the watermark embedding process. Simulation results indicate that the proposed watermarking system is highly robust against various kinds of attacks such as noise addition, cropping, re-sampling, re-quantization, MP3 compression, and low-pass filtering. Our proposed watermarking system outperforms Cox's method in terms of imperceptibility, while keeping robustness comparable with Cox's method. Our proposed system achieves SNR (signal-to-noise ratio) values ranging from 20 dB to 28 dB, in contrast to Cox's method, which achieves SNR values ranging from only 14 dB to 23 dB.
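
    A hedged sketch of the per-frame embedding idea: watermark bits perturb the most prominent magnitude-spectrum peaks. The multiplicative rule and strength alpha are assumptions, not the authors' exact formula.

        import numpy as np

        def embed_frame(frame, wm_bits, alpha=0.1):
            """Perturb the most prominent spectral peaks of one frame
            according to the watermark bits (illustrative rule)."""
            spectrum = np.fft.rfft(frame)
            mag, phase = np.abs(spectrum), np.angle(spectrum)
            peaks = np.argsort(mag)[-len(wm_bits):]   # most prominent peaks
            for k, bit in zip(peaks, wm_bits):
                mag[k] *= 1.0 + alpha * (1.0 if bit else -1.0)
            return np.fft.irfft(mag * np.exp(1j * phase), n=len(frame))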

  12. Direct broadcast satellite-radio market, legal, regulatory, and business considerations

    NASA Astrophysics Data System (ADS)

    Sood, Des R.

    1991-03-01

    A Direct Broadcast Satellite-Radio (DBS-R) System offers the prospect of delivering high quality audio broadcasts to large audiences at costs lower than or comparable to those incurred using the current means of broadcasting. The maturation of mobile communications technologies, and advances in microelectronics and digital signal processing now make it possible to bring this technology to the marketplace. Heightened consumer interest in improved audio quality coupled with the technological and economic feasibility of meeting this demand via DBS-R make it opportune to start planning for implementation of DBS-R Systems. NASA-Lewis and the Voice of America as part of their on-going efforts to improve the quality of international audio broadcasts, have undertaken a number of tasks to more clearly define the technical, marketing, organizational, legal, and regulatory issues underlying implementation of DBS-R Systems. The results and an assessment is presented of the business considerations underlying the construction, launch, and operation of DBS-R Systems.

  13. Effect of tape recording on perturbation measures.

    PubMed

    Jiang, J; Lin, E; Hanson, D G

    1998-10-01

    Tape recorders have been shown to affect measures of voice perturbation. Few studies, however, have been conducted to quantitatively justify the use or exclusion of certain types of recorders in voice perturbation studies. This study used sinusoidal and triangular waves and synthesized vowels to compare perturbation measures extracted from directly digitized signals with those recorded and played back through various tape recorders, including 3 models of digital audio tape recorders, 2 models of analog audio cassette tape recorders, and 2 models of video tape recorders. Signal contamination for frequency perturbation values was found to be consistently minimal with digital recorders (percent jitter = 0.01%-0.02%), mildly increased with video recorders (0.05%-0.10%), moderately increased with a high-quality analog audio cassette tape recorder (0.15%), and most prominent with a low-quality analog audio cassette tape recorder (0.24%). Recorder effect on amplitude perturbation measures was lowest in digital recorders (percent shimmer = 0.09%-0.20%), mildly to moderately increased in video recorders and a high-quality analog audio cassette tape recorder (0.25%-0.45%), and most prominent in a low-quality analog audio cassette tape recorder (0.98%). The effects of cassette tape material, length of spooled tape, and duration of analysis were also tested and are discussed.
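
    For reference, the perturbation measures compared in this study reduce to simple cycle-to-cycle statistics; a minimal sketch, assuming period and peak-amplitude arrays have already been extracted from the voice signal:

        import numpy as np

        def percent_jitter(periods):
            """Mean absolute difference of consecutive periods,
            relative to the mean period, as a percentage."""
            return 100.0 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

        def percent_shimmer(amplitudes):
            """The analogous cycle-to-cycle measure for peak amplitudes."""
            return 100.0 * np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)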

  14. A real-time device for converting Doppler ultrasound audio signals into fluid flow velocity

    PubMed Central

    Hogeman, Cynthia S.; Koch, Dennis W.; Krishnan, Anandi; Momen, Afsana; Leuenberger, Urs A.

    2010-01-01

    A Doppler signal converter has been developed to facilitate cardiovascular and exercise physiology research. This device directly converts audio signals from a clinical Doppler ultrasound imaging system into a real-time analog signal that accurately represents blood flow velocity and is easily recorded by any standard data acquisition system. This real-time flow velocity signal, when simultaneously recorded with other physiological signals of interest, permits the observation of transient flow response to experimental interventions in a manner not possible when using standard Doppler imaging devices. This converted flow velocity signal also permits a more robust and less subjective analysis of data in a fraction of the time required by previous analytic methods. This signal converter provides this capability inexpensively and requires no modification of either the imaging or data acquisition system. PMID:20173048

  15. Instrumental Landing Using Audio Indication

    NASA Astrophysics Data System (ADS)

    Burlak, E. A.; Nabatchikov, A. M.; Korsun, O. N.

    2018-02-01

    The paper proposes an audio indication method for presenting to a pilot the information regarding the relative position of an aircraft in precision piloting tasks. The implementation of the method is presented, and the use of audio signal parameters such as loudness, frequency, and modulation is discussed. To confirm the operability of the audio indication channel, experiments using a modern aircraft simulation facility were carried out. Pilots performed simulated instrument landings using the proposed audio method to indicate the aircraft's deviations from the glide path. The results proved comparable with simulated instrument landings using the traditional glideslope pointers. This encourages further development of the method for other precision piloting tasks.
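
    An illustrative mapping, not the authors' exact scheme: the signed deviation from the glide path drives the pitch of an indication tone, with loudness growing with the magnitude of the deviation; all constants below are assumptions.

        import numpy as np

        def indication_tone(deviation, duration=0.2, sample_rate=8000,
                            base_freq=600.0, freq_per_unit=200.0):
            """Synthesize a short tone whose frequency and amplitude
            encode a normalized, signed glide-path deviation."""
            t = np.arange(int(duration * sample_rate)) / sample_rate
            freq = base_freq + freq_per_unit * deviation
            amp = min(1.0, 0.3 + 0.7 * abs(deviation))
            return amp * np.sin(2 * np.pi * freq * t)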

  16. Speech endpoint detection with non-language speech sounds for generic speech processing applications

    NASA Astrophysics Data System (ADS)

    McClain, Matthew; Romanowski, Brian

    2009-05-01

    Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS, such as filled pauses, will require future research.

  17. Sinusoidal Analysis-Synthesis of Audio Using Perceptual Criteria

    NASA Astrophysics Data System (ADS)

    Painter, Ted; Spanias, Andreas

    2003-12-01

    This paper presents a new method for the selection of sinusoidal components for use in compact representations of narrowband audio. The method consists of ranking and selecting the most perceptually relevant sinusoids. The idea behind the method is to maximize the matching between the auditory excitation pattern associated with the original signal and the corresponding auditory excitation pattern associated with the modeled signal that is being represented by a small set of sinusoidal parameters. The proposed component-selection methodology is shown to outperform the maximum signal-to-mask ratio selection strategy in terms of subjective quality.

  18. Photo-acoustic and video-acoustic methods for sensing distant sound sources

    NASA Astrophysics Data System (ADS)

    Slater, Dan; Kozacik, Stephen; Kelmelis, Eric

    2017-05-01

    Long range telescopic video imagery of distant terrestrial scenes, aircraft, rockets and other aerospace vehicles can be a powerful observational tool. But what about the associated acoustic activity? A new technology, Remote Acoustic Sensing (RAS), may provide a method to remotely listen to the acoustic activity near these distant objects. Local acoustic activity sometimes weakly modulates the ambient illumination in a way that can be remotely sensed. RAS is a new type of microphone that separates an acoustic transducer into two spatially separated components: 1) a naturally formed in situ acousto-optic modulator (AOM) located within the distant scene and 2) a remote sensing readout device that recovers the distant audio. These two elements are passively coupled over long distances at the speed of light by naturally occurring ambient light energy or other electromagnetic fields. Stereophonic, multichannel, and acoustic beam-forming configurations are all possible using RAS techniques, and when combined with high-definition video imagery, RAS can help provide a more cinema-like immersive viewing experience. A practical implementation of a remote acousto-optic readout device can be a challenging engineering problem. The acoustic influence on the optical signal is generally weak and often accompanied by a strong bias term. The optical signal is further degraded by atmospheric seeing turbulence. In this paper, we consider two fundamentally different optical readout approaches: 1) a low-pixel-count, photodiode-based RAS photoreceiver and 2) audio extraction directly from a video stream. Most of our RAS experiments to date have used the first method for reasons of performance and simplicity. But there are potential advantages to extracting audio directly from a video stream. These advantages include the straightforward ability to work with multiple AOMs (useful for acoustic beam forming), simpler optical configurations, and a potential ability to use certain preexisting video recordings. However, doing so requires overcoming significant limitations, typically including much lower sample rates, reduced sensitivity and dynamic range, more expensive video hardware, and the need for sophisticated video processing. The ATCOM real-time image processing software environment provides many of the needed capabilities for researching video-acoustic signal extraction. ATCOM currently is a powerful tool for the visual enhancement of atmospheric-turbulence-distorted telescopic views. In order to explore the potential of acoustic signal recovery from video imagery, we modified ATCOM to extract audio waveforms from the same telescopic video sources. In this paper, we demonstrate and compare both readout techniques for several aerospace test scenarios to better show where each has advantages.

  19. Digital Audio Broadcasting in the Short Wave Bands

    NASA Technical Reports Server (NTRS)

    Vaisnys, Arvydas

    1998-01-01

    For many decades the Short Wave broadcasting service has used high power, double-sideband AM signals to reach audiences far and wide. While audio quality was usually not very high, inexpensive receivers could be used to tune into broadcasts from distant countries.

  20. Harmonic Characteristics of Rectifier Substations and Their Impact on Audio Frequency Track Circuits

    DOT National Transportation Integrated Search

    1982-05-01

    This report describes the basic operation of substation rectifier equipment and the modes of possible interference with audio frequency track circuits used for train detection, cab signalling, and vehicle speed control. It also includes methods of es...

  1. AVS on satellite

    NASA Astrophysics Data System (ADS)

    Zhao, Haiwu; Wang, Guozhong; Hou, Gang

    2005-07-01

    AVS is a new digital audio-video coding standard established by China. AVS will be used in digital TV broadcasting and next-generation optical disks. AVS adopts many digital audio-video coding techniques developed by Chinese companies and universities in recent years; it has very low complexity compared to H.264, and AVS will charge a very low royalty fee through a one-step license including all AVS tools. AVS is therefore a strong and competitive candidate for Chinese DTV and next-generation optical disks. In addition, the Chinese government has published a plan for direct-to-home (DTH) satellite TV, and a telecommunication satellite named SINO 2 will be launched in 2006. AVS will also be one of the most promising candidates for an audio-video coding standard for satellite signal transmission.

  2. [Ventriloquism and audio-visual integration of voice and face].

    PubMed

    Yokosawa, Kazuhiko; Kanaya, Shoko

    2012-07-01

    Presenting synchronous auditory and visual stimuli in separate locations creates the illusion that the sound originates from the direction of the visual stimulus. Participants' auditory localization bias, called the ventriloquism effect, has revealed factors affecting the perceptual integration of audio-visual stimuli. However, many studies on audio-visual processes have focused on performance in simplified experimental situations, with a single stimulus in each sensory modality. These results cannot necessarily explain our perceptual behavior in natural scenes, where various signals exist within a single sensory modality. In the present study we report the contributions of a cognitive factor, that is, the audio-visual congruency of speech, although this factor has often been underestimated in previous ventriloquism research. Thus, we investigated the contribution of speech congruency on the ventriloquism effect using a spoken utterance and two videos of a talking face. The salience of facial movements was also manipulated. As a result, when bilateral visual stimuli are presented in synchrony with a single voice, cross-modal speech congruency was found to have a significant impact on the ventriloquism effect. This result also indicated that more salient visual utterances attracted participants' auditory localization. The congruent pairing of audio-visual utterances elicited greater localization bias than did incongruent pairing, whereas previous studies have reported little dependency on the reality of stimuli in ventriloquism. Moreover, audio-visual illusory congruency, owing to the McGurk effect, caused substantial visual interference to auditory localization. This suggests that a greater flexibility in responding to multi-sensory environments exists than has been previously considered.

  3. Flight State Information Inference with Application to Helicopter Cockpit Video Data Analysis Using Data Mining Techniques

    NASA Astrophysics Data System (ADS)

    Shin, Sanghyun

    The National Transportation Safety Board (NTSB) has recently emphasized the importance of analyzing flight data as one of the most effective methods to improve the efficiency and safety of helicopter operations. By analyzing flight data with Flight Data Monitoring (FDM) programs, the safety and performance of helicopter operations can be evaluated and improved. In spite of the NTSB's effort, the safety of helicopter operations has not improved at the same rate as the safety of worldwide airlines, and the accident rate of helicopters continues to be much higher than that of fixed-wing aircraft. One of the main reasons is that the participation rates of the rotorcraft industry in the FDM programs are low due to the high costs of the Flight Data Recorder (FDR), the need for a special readout device to decode the FDR, anxiety about punitive action, etc. Since a video camera is easily installed, accessible, and inexpensively maintained, cockpit video data could complement the FDR when one is present or possibly take over its role when one is absent. Cockpit video data is composed of image and audio data: image data contains outside views through cockpit windows and activities on the flight instrument panels, whereas audio data contains sounds of the alarms within the cockpit. The goal of this research is to develop, test, and demonstrate a cockpit video data analysis algorithm based on data mining and signal processing techniques that can help better understand situations in the cockpit and the state of a helicopter by efficiently and accurately inferring the useful flight information from cockpit video data. Image processing algorithms based on data mining techniques are proposed to estimate a helicopter's attitude such as the bank and pitch angles, identify indicators from a flight instrument panel, and read the gauges and the numbers in the analogue gauge indicators and digital displays from cockpit image data. In addition, an audio processing algorithm based on signal processing and abrupt change detection techniques is proposed to identify types of warning alarms and to detect the occurrence times of individual alarms from cockpit audio data. Those proposed algorithms are then successfully applied to simulated and real helicopter cockpit video data to demonstrate and validate their performance.
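
    A hedged sketch of the alarm onset detection idea: short-time energy is tracked frame by frame, and an abrupt jump relative to a running baseline flags an alarm onset. The frame length, history window, and threshold ratio are illustrative assumptions, not the thesis's actual parameters.

        import numpy as np

        def detect_onsets(signal, sample_rate, frame_ms=20, ratio=4.0):
            """Flag times where frame energy jumps abruptly above the
            running mean of the preceding frames."""
            frame = int(sample_rate * frame_ms / 1000)
            n = len(signal) // frame
            energy = np.array([np.sum(signal[i * frame:(i + 1) * frame] ** 2)
                               for i in range(n)])
            onsets = []
            for i in range(1, n):
                baseline = np.mean(energy[max(0, i - 25):i]) + 1e-12
                if energy[i] > ratio * baseline:
                    onsets.append(i * frame / sample_rate)  # onset time in seconds
            return onsets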

  4. Computational Analysis and Simulation of Empathic Behaviors: A Survey of Empathy Modeling with Behavioral Signal Processing Framework

    PubMed Central

    Xiao, Bo; Imel, Zac E.; Georgiou, Panayiotis; Atkins, David C.; Narayanan, Shrikanth S.

    2017-01-01

    Empathy is an important psychological process that facilitates human communication and interaction. Enhancement of empathy has profound significance in a range of applications. In this paper, we review emerging directions of research on computational analysis of empathy expression and perception as well as empathic interactions, including their simulation. We summarize the work on empathic expression analysis by the targeted signal modalities (e.g., text, audio, facial expressions). We categorize empathy simulation studies into theory-based emotion space modeling or application-driven user and context modeling. We summarize challenges in computational study of empathy including conceptual framing and understanding of empathy, data availability, appropriate use and validation of machine learning techniques, and behavior signal processing. Finally, we propose a unified view of empathy computation, and offer a series of open problems for future research. PMID:27017830

  5. Sonic Simulation of Near Projectile Hits

    NASA Technical Reports Server (NTRS)

    Statman, J. I.; Rodemich, E. R.

    1988-01-01

    Measured frequencies identify projectiles and indicate miss distances. Developmental battlefield-simulation system for training soldiers uses sounds emitted by incoming projectiles to identify projectiles and indicate miss distances. Depending on projectile type and closeness of each hit, system generates "kill" or "near-kill" indication. Artillery shell simulated by lightweight plastic projectile launched by compressed air. Flow of air through groove in nose of projectile generates acoustic tone. Each participant carries audio receiver to measure and process tone signal. System performs fast Fourier transforms of received tone to obtain dominant frequency during each succeeding interval of approximately 40 ms (an interval determined from practical signal-processing requirements). With modifications, system concept applicable to collision-warning or collision-avoidance systems.
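
    A hedged sketch of the tone-tracking step: an FFT over successive ~40-ms blocks yields the dominant frequency per block, from which projectile type and miss distance would then be inferred.

        import numpy as np

        def dominant_frequencies(signal, sample_rate, block_ms=40):
            """Return the dominant frequency of each successive block."""
            block = int(sample_rate * block_ms / 1000)
            freqs = []
            for start in range(0, len(signal) - block + 1, block):
                spectrum = np.abs(np.fft.rfft(signal[start:start + block]))
                bins = np.fft.rfftfreq(block, d=1.0 / sample_rate)
                freqs.append(bins[np.argmax(spectrum)])
            return np.array(freqs)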

  6. [A modified speech enhancement algorithm for electronic cochlear implant and its digital signal processing realization].

    PubMed

    Wang, Yulin; Tian, Xuelong

    2014-08-01

    In order to improve the speech quality and auditory perceptiveness of electronic cochlear implants under strong background noise, a speech enhancement system for the electronic cochlear implant front-end was constructed. Taking digital signal processing (DSP) as the core, the system combines its multi-channel buffered serial port (McBSP) data transmission channel with the extended audio interface chip TLV320AIC10, so high-speed speech signal acquisition and output are realized. Meanwhile, because the traditional speech enhancement method suffers from poor adaptability, slow convergence speed, and large steady-state error, the versiera function and the de-correlation principle were used to improve the existing adaptive filtering algorithm, which effectively enhanced the quality of voice communications. Test results verified the stability of the system and the de-noising performance of the algorithm, and also proved that they could provide clearer speech signals for deaf or tinnitus patients.

  7. Aggressive Bimodal Communication in Domestic Dogs, Canis familiaris

    PubMed Central

    Déaux, Éloïse C.; Clarke, Jennifer A.; Charrier, Isabelle

    2015-01-01

    Evidence of animal multimodal signalling is widespread and compelling. Dogs’ aggressive vocalisations (growls and barks) have been extensively studied, but without any consideration of the simultaneously produced visual displays. In this study we aimed to categorize dogs’ bimodal aggressive signals according to the redundant/non-redundant classification framework. We presented dogs with unimodal (audio or visual) or bimodal (audio-visual) stimuli and measured their gazing and motor behaviours. Responses did not qualitatively differ between the bimodal and two unimodal contexts, indicating that acoustic and visual signals provide redundant information. We could not further classify the signal as ‘equivalent’ or ‘enhancing’ as we found evidence for both subcategories. We discuss our findings in relation to the complex signal framework, and propose several hypotheses for this signal’s function. PMID:26571266

  8. Dual Key Speech Encryption Algorithm Based Underdetermined BSS

    PubMed Central

    Zhao, Huan; Chen, Zuo; Zhang, Xixiang

    2014-01-01

    When the number of mixed signals is less than the number of source signals, underdetermined blind source separation (BSS) is a significantly difficult problem. Given the large amount of data in speech communications and the requirement for real-time operation, we utilize the intractability of the underdetermined BSS problem to present a dual key speech encryption method. The original speech is mixed with dual key signals, which consist of random key signals (one-time pad) generated from a secret seed and chaotic signals generated from a chaotic system. In the decryption process, approximate calculation is used to recover the original speech signals. The proposed algorithm for speech signal encryption can resist traditional attacks against the encryption system, and owing to the approximate calculation, decryption becomes faster and more accurate. It is demonstrated that the proposed method has a high level of security and can recover the original signals quickly and efficiently while maintaining excellent audio quality. PMID:24955430
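
    A conceptual sketch of the dual-key mixing stage, assuming a logistic map as the chaotic generator; the mixing weights and map parameters are illustrative, not those of the paper.

        import numpy as np

        def logistic_signal(n, x0=0.37, r=3.99):
            """Chaotic key signal from a logistic map, zero-centred."""
            x = np.empty(n)
            x[0] = x0
            for i in range(1, n):
                x[i] = r * x[i - 1] * (1.0 - x[i - 1])
            return x - 0.5

        def encrypt(speech, seed=1234):
            pad = np.random.default_rng(seed).standard_normal(len(speech))
            chaos = logistic_signal(len(speech))
            # Two mixtures of three sources: underdetermined BSS for an
            # attacker who holds neither key signal.
            m1 = speech + 0.8 * pad + 0.6 * chaos
            m2 = speech - 0.5 * pad + 0.9 * chaos
            return m1, m2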

  9. Energy and Power Spectra of Thunder in the Magdalena Mountains, Central New Mexico

    NASA Astrophysics Data System (ADS)

    Johnson, R. L.; Johnson, J. B.; Arechiga, R. O.; Michnovicz, J. C.; Edens, H. E.; Rison, W.

    2011-12-01

    Thunder is generated primarily by heating and expansion of the atmosphere around a lightning channel and by charge relaxation within a cloud. Broadband acoustic studies are important for inferring dynamic charge behavior during and after lightning events. During the Summer monsoon seasons of 2009-2011, we deployed networks of 3-5 stations consisting of broadband (0.01 to 500 Hz) acoustic arrays and audio microphones in the Magdalena Mountains in central New Mexico. We utilize Lightning Mapping Array (LMA) data for accurate timing of lightning events within a 10 km radius of our network. Unlike the LMA, which detects VHF signals from breakdown processes, thunder signals may be used to observe charge dynamics and thermal shocking of the atmosphere. Previous investigations show that thunder spectral content may distinguish between electrostatic and thermal heating processes. We collected extensive datasets in terms of number of independent broadband sensors (up to 20), number of observed flashes (hundreds from multiple storms), and available coincident LMA data. We use infrasound and audio data to quantify total acoustic energy produced at lightning sources in various frequency bands. We attribute the spectral content and intensity of thunder signals to source characteristics, sensor locations, propagation effects, and noise. We observe variations in acoustic energy for both entire storm systems and individual lightning flashes. We propose that some variations may be related to the type of lightning flash and that spectral content is important for distinguishing between thunder generation mechanisms.

  10. A scale-up field experiment for the monitoring of a burning process using chemical, audio, and video sensors.

    PubMed

    Stavrakakis, P; Agapiou, A; Mikedi, K; Karma, S; Statheropoulos, M; Pallis, G C; Pappa, A

    2014-01-01

    Fires are becoming more violent and frequent, resulting in major economic losses and long-lasting effects on communities and ecosystems; thus, efficient fire monitoring is becoming a necessity. A novel triple multi-sensor approach was developed for monitoring and studying the burning of dry forest fuel in a scheduled open-field experiment; chemical, optical, and acoustical sensors were combined to record the fire spread. The results of this integrated field campaign for real-time monitoring of the fire event are presented and discussed. Chemical analysis, despite its limitations, corresponded to the burning process with a minor time delay. Nevertheless, the evolution profiles of CO2, CO, NO, and O2 were detected and monitored. The chemical monitoring of smoke components enabled observation of the different fire phases (flaming, smoldering) based on the emissions identified in each phase. The analysis of fire acoustical signals gave an accurate and timely response to the fire event. In the same context, the use of a thermographic camera for monitoring the biomass burning was also valuable (both the average gray and red component intensity profiles exceeded 230) and showed promise similar to the audio results. Further work is needed towards integrating sensor signals for automation purposes, leading to potential applications in real situations.

  11. An Efficient Method for Image and Audio Steganography using Least Significant Bit (LSB) Substitution

    NASA Astrophysics Data System (ADS)

    Chadha, Ankit; Satam, Neha; Sood, Rakshak; Bade, Dattatray

    2013-09-01

    In order to improve data hiding in all types of multimedia data formats, such as image and audio, and to make the hidden message imperceptible, a novel method for steganography is introduced in this paper. It is based on Least Significant Bit (LSB) manipulation and the inclusion of redundant noise as a secret key in the message. This method is applied to data hiding in images. For data hiding in audio, the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT) are both used. The results prove the method to be time-efficient and effective. The algorithm is also tested for various numbers of substituted bits, for which the Mean Square Error (MSE) and Peak-Signal-to-Noise-Ratio (PSNR) are calculated and plotted. Experimental results show that the stego-image is visually indistinguishable from the original cover-image when n<=4, because of the better PSNR achieved by this technique. The final results obtained after the steganography process do not reveal the presence of any hidden message, thus satisfying the criterion of message imperceptibility.
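
    A minimal LSB-substitution sketch in the spirit of the method, hiding message bits in the n lowest bits of 8-bit samples; the redundant-noise secret key and the transform-domain (DCT/DWT) audio variants of the paper are omitted here.

        import numpy as np

        def embed_lsb(cover, bits, n=1):
            """Replace the n lowest bits of successive samples with message bits."""
            stego = cover.astype(np.uint8).copy()
            for idx in range(len(bits) // n):
                value = int(''.join(map(str, bits[idx * n:(idx + 1) * n])), 2)
                stego[idx] = (int(stego[idx]) >> n << n) | value
            return stego

        def psnr(cover, stego, peak=255.0):
            """Peak signal-to-noise ratio between cover and stego samples."""
            mse = np.mean((cover.astype(float) - stego.astype(float)) ** 2)
            return 10.0 * np.log10(peak ** 2 / mse)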

  12. Optimal Window and Lattice in Gabor Transform. Application to Audio Analysis.

    PubMed

    Lachambre, Helene; Ricaud, Benjamin; Stempfel, Guillaume; Torrésani, Bruno; Wiesmeyr, Christoph; Onchis-Moaca, Darian

    2015-01-01

    This article deals with the use of an optimal lattice and an optimal window in Discrete Gabor Transform computation. In the case of a generalized Gaussian window, extending earlier contributions, we introduce an additional local window adaptation technique for non-stationary signals. We illustrate our approach and the earlier one by addressing three time-frequency analysis problems to show the improvements achieved by the use of the optimal lattice and window: distinction of close frequencies, frequency estimation, and SNR estimation. The results are presented, when possible, with real-world audio signals.

  13. Home telecare system using cable television plants--an experimental field trial.

    PubMed

    Lee, R G; Chen, H S; Lin, C C; Chang, K C; Chen, J H

    2000-03-01

    To reduce the inconvenience of routine transportation for chronically ill and handicapped patients, this paper proposes a platform based on a hybrid fiber coaxial (HFC) network in Taiwan designed to make a home telecare system feasible. The aim of this home telecare system is to combine biomedical data, including a three-channel electrocardiogram (ECG) and blood pressure (BP), video, and audio into a National Television Standard Committee (NTSC) channel for communication between the patient and the healthcare provider. Digitized biomedical data and output from medical devices can be further modulated onto a second audio program (SAP) subchannel, which can be used for second-language audio in NTSC television signals. For long-distance transmission, we translate the digital biomedical data into the frequency domain using frequency-shift keying (FSK) technology and insert this signal into the SAP band. The whole system has been implemented and tested. The results obtained using this system clearly demonstrate that real-time video, audio, and biomedical data transmission are very clear with a carrier-to-noise ratio up to 43 dB.
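
    A conceptual FSK sketch of the biomedical-data uplink: each bit selects one of two audio-band tones for insertion into the SAP band. The tone frequencies and bit rate are illustrative assumptions, not the system's actual parameters.

        import numpy as np

        def fsk_modulate(bits, f0=1200.0, f1=2200.0, bit_rate=100,
                         sample_rate=8000):
            """Map each bit to one of two tones and concatenate."""
            samples_per_bit = sample_rate // bit_rate
            t = np.arange(samples_per_bit) / sample_rate
            tones = {0: np.sin(2 * np.pi * f0 * t),
                     1: np.sin(2 * np.pi * f1 * t)}
            return np.concatenate([tones[b] for b in bits])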

  14. Visual Image Sensor Organ Replacement: Implementation

    NASA Technical Reports Server (NTRS)

    Maluf, A. David (Inventor)

    2011-01-01

    Method and system for enhancing or extending visual representation of a selected region of a visual image, where visual representation is interfered with or distorted, by supplementing a visual signal with at least one audio signal having one or more audio signal parameters that represent one or more visual image parameters, such as vertical and/or horizontal location of the region; region brightness; dominant wavelength range of the region; change in a parameter value that characterizes the visual image, with respect to a reference parameter value; and time rate of change in a parameter value that characterizes the visual image. Region dimensions can be changed to emphasize change with time of a visual image parameter.

  15. Zipf's Law in Short-Time Timbral Codings of Speech, Music, and Environmental Sound Signals

    PubMed Central

    Haro, Martín; Serrà, Joan; Herrera, Perfecto; Corral, Álvaro

    2012-01-01

    Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis of the intrinsic characteristics of the most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that speech and music databases have specific, distinctive code-words while, in the case of the environmental sounds, these database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources. PMID:22479497
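
    A minimal sketch of the rank-frequency analysis: count code-word occurrences, sort by frequency, and estimate the Zipf exponent with a log-log least-squares fit (the authors' exact fitting procedure may differ).

        import numpy as np
        from collections import Counter

        def zipf_exponent(codewords):
            """Estimate the Zipf exponent of a code-word sequence."""
            counts = np.array(sorted(Counter(codewords).values(), reverse=True),
                              dtype=float)
            ranks = np.arange(1, len(counts) + 1, dtype=float)
            slope, _ = np.polyfit(np.log(ranks), np.log(counts), 1)
            return -slope    # an exponent near one indicates Zipf's law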

  16. Integrated approach to multimodal media content analysis

    NASA Astrophysics Data System (ADS)

    Zhang, Tong; Kuo, C.-C. Jay

    1999-12-01

    In this work, we present a system for the automatic segmentation, indexing, and retrieval of audiovisual data based on the combination of audio, visual, and textual content analysis. The video stream is demultiplexed into audio, image, and caption components. Then, a semantic segmentation of the audio signal based on audio content analysis is conducted, and each segment is indexed as one of the basic audio types. The image sequence is segmented into shots based on visual information analysis, and keyframes are extracted from each shot. Meanwhile, keywords are detected from the closed caption. Index tables are designed for both linear and non-linear access to the video. Experiments show that the proposed methods for multimodal media content analysis are effective and that the integrated framework achieves satisfactory results for video information filtering and retrieval.

  17. Audio-visual speech processing in age-related hearing loss: Stronger integration and increased frontal lobe recruitment.

    PubMed

    Rosemann, Stephanie; Thiel, Christiane M

    2018-07-15

    Hearing loss is associated with difficulties in understanding speech, especially under adverse listening conditions. In these situations, seeing the speaker improves speech intelligibility in hearing-impaired participants. On the neuronal level, previous research has shown cross-modal plastic reorganization in the auditory cortex following hearing loss, leading to altered processing of auditory, visual, and audio-visual information. However, how reduced auditory input affects audio-visual speech perception in hearing-impaired subjects is largely unknown. We here investigated the impact of mild to moderate age-related hearing loss on processing audio-visual speech using functional magnetic resonance imaging. Normal-hearing and hearing-impaired participants performed two audio-visual speech integration tasks: a sentence detection task inside the scanner and the McGurk illusion outside the scanner. Both tasks consisted of congruent and incongruent audio-visual conditions, as well as auditory-only and visual-only conditions. We found a significantly stronger McGurk illusion in the hearing-impaired participants, which indicates stronger audio-visual integration. Neurally, hearing loss was associated with an increased recruitment of frontal brain areas when processing incongruent audio-visual, auditory, and also visual speech stimuli, which may reflect the increased effort to perform the task. Hearing loss modulated both the audio-visual integration strength measured with the McGurk illusion and brain activation in frontal areas in the sentence task, showing stronger integration and higher brain activation with increasing hearing loss. Incongruent compared to congruent audio-visual speech revealed an opposite brain activation pattern in left ventral postcentral gyrus in both groups, with higher activation in hearing-impaired participants in the incongruent condition. Our results indicate that even mild to moderate hearing loss impacts audio-visual speech processing, accompanied by changes in brain activation particularly involving frontal areas. These changes are modulated by the extent of hearing loss.

  18. Space Shuttle Orbiter audio subsystem. [to communication and tracking system

    NASA Technical Reports Server (NTRS)

    Stewart, C. H.

    1978-01-01

    The selection of the audio multiplex control configuration for the Space Shuttle Orbiter audio subsystem is discussed, with special attention given to the evaluation criteria of cost, weight and complexity. The specifications and design of the subsystem are described, with detail given to the configurations of the audio terminal unit and audio central control unit (ATU, ACCU). The audio input from the ACCU, at a signal level of -12.2 to 14.8 dBV, nominal range, at 1 kHz, was found to have a balanced source impedance and a balanced load impedance of 6000 ± 600 ohms at 1 kHz, dc isolated. The Lyndon B. Johnson Space Center (JSC) electroacoustic test laboratory, an audio engineering facility consisting of a collection of acoustic test chambers, analyzed problems of speaker and headset performance, multiplexed control data coupled with audio channels, and the effects of Orbiter cabin acoustics on the operational performance of voice communications. This work allows technical management and project engineering to address key constraining issues affecting subsystem development, such as identifying design deficiencies of the headset interface unit and assessing voice communications performance in the Orbiter cabin.

  19. Active Learning for Automatic Audio Processing of Unwritten Languages (ALAPUL)

    DTIC Science & Technology

    2016-07-01

    AFRL-RH-WP-TR-2016-0074, Active Learning for Automatic Audio Processing of Unwritten Languages (ALAPUL); Dimitra Vergyri, Andreas Kathol, Wen Wang; June 2015-July 2016. [Only fragments of the report's standard-form front matter survive extraction.] From the summary: the goal of the project was to investigate development of an automatic spoken language processing (ASLP) system

  20. Auto-Associative Recurrent Neural Networks and Long Term Dependencies in Novelty Detection for Audio Surveillance Applications

    NASA Astrophysics Data System (ADS)

    Rossi, A.; Montefoschi, F.; Rizzo, A.; Diligenti, M.; Festucci, C.

    2017-10-01

    Machine Learning applied to Automatic Audio Surveillance has been attracting increasing attention in recent years. In spite of several investigations based on a large number of different approaches, little attention has been paid to the temporal evolution of the environmental input signal. In this work, we propose an exploration in this direction, comparing the temporal correlations extracted at the feature level with those learned by a representational structure. To this aim we analysed the prediction performance of a Recurrent Neural Network architecture, varying the length of the processed input sequence and the size of the time window used in the feature extraction. Results corroborated the hypothesis that sequential models work better when dealing with data characterized by temporal order. However, the optimization of the temporal dimension so far remains an open issue.
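
    The abstract does not specify the architecture, so the following PyTorch sketch only illustrates the general idea of a predictive (auto-associative) recurrent model for audio novelty detection: the network learns to predict the next feature frame, and a large prediction error at test time flags a novel event. All layer sizes, the sequence length, and the synthetic features are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AudioPredictor(nn.Module):
    """Predicts the next feature frame; a large prediction error at test
    time serves as the novelty score."""
    def __init__(self, n_features=40, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):            # x: (batch, time, n_features)
        h, _ = self.rnn(x)
        return self.head(h)

model = AudioPredictor()
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

seq_len = 50                         # the "input sequence length" under study
for _ in range(100):                 # toy training on synthetic features
    x = torch.randn(8, seq_len + 1, 40)
    loss = loss_fn(model(x[:, :-1]), x[:, 1:])   # predict frame t+1 from 1..t
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():                # per-frame novelty score on test data
    test = torch.randn(1, seq_len + 1, 40)
    novelty = ((model(test[:, :-1]) - test[:, 1:]) ** 2).mean(-1)
```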

  1. Exploring expressivity and emotion with artificial voice and speech technologies.

    PubMed

    Pauletto, Sandra; Balentine, Bruce; Pidcock, Chris; Jones, Kevin; Bottaci, Leonardo; Aretoulaki, Maria; Wells, Jez; Mundy, Darren P; Balentine, James

    2013-10-01

    Emotion in audio-voice signals, as synthesized by text-to-speech (TTS) technologies, was investigated to formulate a theory of expression for user interface design. Emotional parameters were specified with markup tags, and the resulting audio was further modulated with post-processing techniques. Software was then developed to link a selected TTS synthesizer with an automatic speech recognition (ASR) engine, producing a chatbot that could speak and listen. Using these two artificial voice subsystems, investigators explored both artistic and psychological implications of artificial speech emotion. Goals of the investigation were interdisciplinary, with interest in musical composition, augmentative and alternative communication (AAC), commercial voice announcement applications, human-computer interaction (HCI), and artificial intelligence (AI). The work-in-progress points towards an emerging interdisciplinary ontology for artificial voices. As one study output, HCI tools are proposed for future collaboration.

  2. Remote listening and passive acoustic detection in a 3-D environment

    NASA Astrophysics Data System (ADS)

    Barnhill, Colin

    Teleconferencing environments are a necessity in business, education and personal communication. They allow for the communication of information to remote locations without the need for travel and the necessary time and expense required for that travel. Visual information can be communicated using cameras and monitors. The advantage of visual communication is that an image can capture multiple objects and convey them, using a monitor, to a large group of people regardless of the receiver's location. This is not the case for audio. Currently, most experimental teleconferencing systems' audio is based on stereo recording and reproduction techniques. The problem with this solution is that it is only effective for one or two receivers. To accurately capture a sound environment consisting of multiple sources, and to recreate it for a group of people, is an unsolved problem. This work focuses on new methods of multiple-source 3-D environment sound capture and applications using these captured environments. Using spherical microphone arrays, it is now possible to capture a true 3-D environment. A spherical harmonic transform on the array's surface allows us to determine the basis functions (spherical harmonics) for all spherical wave solutions (up to a fixed order). This spherical harmonic decomposition (SHD) allows us to examine not only the time and frequency characteristics of an audio signal but also its spatial characteristics. In this sense, a spherical harmonic transform is analogous to a Fourier transform: a Fourier transform takes a signal into the frequency domain, and a spherical harmonic transform takes a signal into the spatial domain. The SHD also decouples the input signals from the microphone locations. Using the SHD of a soundfield, new algorithms are available for remote listening, acoustic detection, and signal enhancement. The new algorithms presented in this paper show distinct advantages over previous detection and listening algorithms, especially for multiple speech sources and room environments. The algorithms use high-order (spherical harmonic) beamforming and power signal characteristics for source localization and signal enhancement. These methods are applied to remote listening, surveillance, and teleconferencing.
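
    A rough illustration of the spherical harmonic decomposition described above: the sketch projects one frequency bin of the microphone signals onto low-order spherical harmonics. It assumes scipy.special.sph_harm and equal-weight quadrature, a simplification of the weighting a real spherical array requires.

```python
import numpy as np
from scipy.special import sph_harm

def shd(pressure, azimuth, polar, order):
    """Discrete spherical harmonic decomposition of pressure samples on a
    sphere. Equal quadrature weights are assumed for simplicity; practical
    arrays use weights tailored to the microphone layout."""
    q = len(pressure)
    coeffs = {}
    for n in range(order + 1):
        for m in range(-n, n + 1):
            y = sph_harm(m, n, azimuth, polar)   # Y_n^m at each microphone
            coeffs[(n, m)] = (4 * np.pi / q) * np.sum(pressure * np.conj(y))
    return coeffs

# Toy usage: a synthetic field sampled at 32 quasi-random points
rng = np.random.default_rng(1)
az = rng.uniform(0, 2 * np.pi, 32)
pol = np.arccos(rng.uniform(-1, 1, 32))      # uniform on the sphere
field = np.exp(1j * np.cos(pol))             # stand-in pressure, one frequency
print(abs(shd(field, az, pol, order=2)[(0, 0)]))
```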

  3. Anthropomorphic Coding of Speech and Audio: A Model Inversion Approach

    NASA Astrophysics Data System (ADS)

    Feldbauer, Christian; Kubin, Gernot; Kleijn, W. Bastiaan

    2005-12-01

    Auditory modeling is a well-established methodology that provides insight into human perception and that facilitates the extraction of signal features that are most relevant to the listener. The aim of this paper is to provide a tutorial on perceptual speech and audio coding using an invertible auditory model. In this approach, the audio signal is converted into an auditory representation using an invertible auditory model. The auditory representation is quantized and coded. Upon decoding, it is then transformed back into the acoustic domain. This transformation converts a complex distortion criterion into a simple one, thus facilitating quantization with low complexity. We briefly review past work on auditory models and describe in more detail the components of our invertible model and its inversion procedure, that is, the method to reconstruct the signal from the output of the auditory model. We summarize attempts to use the auditory representation for low-bit-rate coding. Our approach also allows the exploitation of the inherent redundancy of the human auditory system for the purpose of multiple description (joint source-channel) coding.

  4. Implementation and performance evaluation of acoustic denoising algorithms for UAV

    NASA Astrophysics Data System (ADS)

    Chowdhury, Ahmed Sony Kamal

    Unmanned Aerial Vehicles (UAVs) have become a popular alternative for wildlife monitoring and border surveillance applications. Eliminating the UAV's background noise and effectively classifying the target audio signal are still major challenges. The main goal of this thesis is to remove the UAV's background noise by means of acoustic denoising techniques. Existing denoising algorithms, such as Adaptive Least Mean Square (LMS), Wavelet Denoising, Time-Frequency Block Thresholding, and Wiener Filter, were implemented and their performance evaluated. The denoising algorithms were evaluated using average Signal to Noise Ratio (SNR), Segmental SNR (SSNR), Log Likelihood Ratio (LLR), and Log Spectral Distance (LSD) metrics. To evaluate the effectiveness of the denoising algorithms on classification of target audio, we implemented Support Vector Machine (SVM) and Naive Bayes classification algorithms. Simulation results demonstrate that the LMS and Discrete Wavelet Transform (DWT) denoising algorithms offered superior performance compared to the other algorithms. Finally, we implemented the LMS and DWT algorithms on a DSP board for hardware evaluation. Experimental results showed that the LMS algorithm's performance in classifying target audio signals is more robust than DWT's across various noise types.
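
    Of the algorithms compared, adaptive LMS is the simplest to sketch. The following noise canceller, with invented tap count and step size, subtracts an adaptively filtered noise reference from the primary recording, which is the usual arrangement when a noise-only reference (e.g., a microphone near the rotors) is available.

```python
import numpy as np

def lms_denoise(primary, reference, n_taps=32, mu=0.005):
    """Adaptive LMS noise canceller: estimates the noise in `primary`
    from a noise-only `reference` and subtracts it, so the error signal
    approximates the clean target audio."""
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for i in range(n_taps, len(primary)):
        x = reference[i - n_taps:i][::-1]   # most recent reference samples
        e = primary[i] - w @ x              # cleaned sample
        w += 2 * mu * e * x                 # LMS weight update
        out[i] = e
    return out

# Toy usage: a 440 Hz "target" buried in filtered noise
fs = 8000
t = np.arange(2 * fs) / fs
noise = np.random.default_rng(2).standard_normal(len(t))
noisy = np.sin(2 * np.pi * 440 * t) + np.convolve(noise, np.ones(8) / 8, "same")
clean_estimate = lms_denoise(noisy, noise)
```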

  5. Classroom Audio Distribution in the Postsecondary Setting: A Story of Universal Design for Learning

    ERIC Educational Resources Information Center

    Flagg-Williams, Joan B.; Bokhorst-Heng, Wendy D.

    2016-01-01

    Classroom Audio Distribution Systems (CADS) consist of amplification technology that enhances the teacher's, or sometimes the student's, vocal signal above the background noise in a classroom. Much research has supported the benefits of CADS for student learning, but most of it has focused on elementary school classrooms. This study investigated…

  6. Lamp modulator provides signal magnitude indication

    NASA Technical Reports Server (NTRS)

    Zeman, J. R.

    1970-01-01

    Lamp modulator provides visible indication of presence and magnitude of an audio signal carrying voice or data. It can be made to reflect signal variations of up to 32 decibels. Lamp life is increased by use of a bypass resistor to prevent filament failure.

  7. Hi Fi Audio Tape to Sun Workstation Transfer System for Digital Audio Data

    DTIC Science & Technology

    1994-03-01

    [Figure-list and schematic fragments recovered from the report:] Figure 13, The Interface Memory Map (for 64K x 32 SRAM) [Ref. 10]; Figure 14, Main board data bus connection to the DM bus. ... an ENABLE signal is sent to the device along with a read or a write signal. The memory map of the board with 64k SRAM is shown in Figure 13.

  8. Content-based audio authentication using a hierarchical patchwork watermark embedding

    NASA Astrophysics Data System (ADS)

    Gulbis, Michael; Müller, Erika

    2010-05-01

    Content-based audio authentication watermarking techniques extract perceptually relevant audio features, which are robustly embedded into the audio file to be protected. Manipulations of the audio file are detected on the basis of changes between the originally embedded feature information and the features extracted anew during verification. The main challenges of content-based watermarking are, on the one hand, the identification of a suitable audio feature to distinguish between content-preserving and malicious manipulations, and, on the other hand, the development of a watermark which is robust against content-preserving modifications and able to carry the whole authentication information. The payload requirements are significantly higher compared to transaction watermarking or copyright protection. Finally, the watermark embedding should not influence the feature extraction, to avoid false alarms. Current systems still lack a sufficient alignment of the watermarking algorithm and the feature extraction. In previous work we developed a content-based audio authentication watermarking approach. The feature is based on changes in the DCT domain over time. A watermark based on the patchwork algorithm was used to embed multiple one-bit watermarks. The embedding process uses the feature domain without inflicting distortions on the feature. The watermark payload is limited by the feature extraction, more precisely by the critical bands: the payload is inversely proportional to the segment duration of the audio file segmentation. Transparency behavior was analyzed as a function of segment size, and thus of watermark payload. At a segment duration of about 20 ms the transparency shows an optimum (measured in units of Objective Difference Grade). Transparency and/or robustness decrease rapidly for working points beyond this area, making those working points unsuitable for gaining the further payload needed to embed the whole authentication information. In this paper we present a hierarchical extension of the watermark method to overcome the limitations imposed by the feature extraction. The approach is a recursive application of the patchwork algorithm onto its own patches, with a modified patch selection to ensure a better signal-to-noise ratio for the watermark embedding. The robustness evaluation was done by compression (mp3, ogg, aac), normalization, and several attacks from the Stirmark Benchmark for Audio suite. Compared on the basis of equal payload and transparency, the hierarchical approach shows improved robustness.
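
    The basic patchwork idea underlying the method is easy to sketch: raise one key-selected set of coefficients and lower another, then detect the bit from the difference of the set means. The sketch below, with invented strength and sizes, shows a single bit only; it is not the authors' hierarchical variant.

```python
import numpy as np

def patch_sets(n, key):
    """Key-dependent split of coefficient indices into two disjoint sets."""
    idx = np.random.default_rng(key).permutation(n)
    return idx[: n // 2], idx[n // 2:]

def patchwork_embed(coeffs, key, strength=0.05):
    """Embed a '1' bit: raise set A and lower set B by a small offset."""
    a, b = patch_sets(len(coeffs), key)
    marked = coeffs.copy()
    marked[a] += strength
    marked[b] -= strength
    return marked

def patchwork_detect(coeffs, key):
    """mean(A) - mean(B): near zero if unmarked, about 2*strength if marked."""
    a, b = patch_sets(len(coeffs), key)
    return coeffs[a].mean() - coeffs[b].mean()

# Toy usage on stand-in transform coefficients; a real detector compares
# against a threshold calibrated on unmarked material.
c = np.random.default_rng(3).standard_normal(4096)
print(patchwork_detect(patchwork_embed(c, key=42), key=42))
```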

  9. An extraordinary tabletop speed of light apparatus

    NASA Astrophysics Data System (ADS)

    Pegna, Guido

    2017-09-01

    A compact, low-cost, pre-aligned apparatus of the modulation type is described. The apparatus allows accurate determination of the speed of light in free propagation with an accuracy on the order of one part in 10⁴. Due to the 433.92 MHz radio frequency (rf) modulation of its laser diode, determination of the speed of light is possible within a sub-meter measuring base and in small volumes (some cm³) of transparent solids or liquids. No oscilloscope is necessary, while the required function generators, power supplies, and optical components are incorporated into the design of the apparatus and its receiver can slide along the optical bench while maintaining alignment with the laser beam. Measurement of the velocity factor of coaxial cables is also easily performed. The apparatus detects the phase difference between the rf modulation of the laser diode by further modulating the rf signal with an audio frequency signal; the phase difference between these signals is then observed as the loudness of the audio signal. In this way, the positions at which the minima of the audio signal are found determine where the rf signals are completely out of phase. This phase detection method yields a much increased sensitivity with respect to the display of coincidence of two signals of questionable arrival time and somewhat distorted shape on an oscilloscope. The displaying technique is also particularly suitable for large audiences as well as in unattended exhibits in museums and science centers. In addition, the apparatus can be set up in less than one minute.

  10. Audio-visual biofeedback for respiratory-gated radiotherapy: Impact of audio instruction and audio-visual biofeedback on respiratory-gated radiotherapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    George, Rohini; Department of Biomedical Engineering, Virginia Commonwealth University, Richmond, VA; Chung, Theodore D.

    2006-07-01

    Purpose: Respiratory gating is a commercially available technology for reducing the deleterious effects of motion during imaging and treatment. The efficacy of gating is dependent on the reproducibility within and between respiratory cycles during imaging and treatment. The aim of this study was to determine whether audio-visual biofeedback can improve respiratory reproducibility by decreasing residual motion and therefore increasing the accuracy of gated radiotherapy. Methods and Materials: A total of 331 respiratory traces were collected from 24 lung cancer patients. The protocol consisted of five breathing training sessions spaced about a week apart. Within each session the patients initially breathed without any instruction (free breathing), with audio instructions and with audio-visual biofeedback. Residual motion was quantified by the standard deviation of the respiratory signal within the gating window. Results: Audio-visual biofeedback significantly reduced residual motion compared with free breathing and audio instruction. Displacement-based gating has lower residual motion than phase-based gating. Little reduction in residual motion was found for duty cycles less than 30%; for duty cycles above 50% there was a sharp increase in residual motion. Conclusions: The efficiency and reproducibility of gating can be improved by: incorporating audio-visual biofeedback, using a 30-50% duty cycle, gating during exhalation, and using displacement-based gating.

  11. Laboratory and in-flight experiments to evaluate 3-D audio display technology

    NASA Technical Reports Server (NTRS)

    Ericson, Mark; Mckinley, Richard; Kibbe, Marion; Francis, Daniel

    1994-01-01

    Laboratory and in-flight experiments were conducted to evaluate 3-D audio display technology for cockpit applications. A 3-D audio display generator was developed which digitally encodes naturally occurring direction information onto any audio signal and presents the binaural sound over headphones. The acoustic image is stabilized for head movement by use of an electromagnetic head-tracking device. In the laboratory, a 3-D audio display generator was used to spatially separate competing speech messages to improve the intelligibility of each message. Up to a 25 percent improvement in intelligibility was measured for spatially separated speech at high ambient noise levels (115 dB SPL). During the in-flight experiments, pilots reported that spatial separation of speech communications provided a noticeable improvement in intelligibility. The use of 3-D audio for target acquisition was also investigated. In the laboratory, 3-D audio enabled the acquisition of visual targets in about two seconds average response time at 17 degrees accuracy. During the in-flight experiments, pilots correctly identified ground targets 50, 75, and 100 percent of the time at separation angles of 12, 20, and 35 degrees, respectively. In general, pilot performance in the field with the 3-D audio display generator was as expected, based on data from laboratory experiments.

  12. Integrated Spacesuit Audio System Enhances Speech Quality and Reduces Noise

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng Arden; Chen, Jingdong; Chen, Shaoyan Sharyl

    2009-01-01

    A new approach has been proposed for increasing astronaut comfort and speech capture. Currently, the special design of a spacesuit forms an extreme acoustic environment making it difficult to capture clear speech without compromising comfort. The proposed Integrated Spacesuit Audio (ISA) system is to incorporate the microphones into the helmet and use software to extract voice signals from background noise.

  13. A real-time detector system for precise timing of audiovisual stimuli.

    PubMed

    Henelius, Andreas; Jagadeesan, Sharman; Huotilainen, Minna

    2012-01-01

    The successful recording of neurophysiologic signals, such as event-related potentials (ERPs) or event-related magnetic fields (ERFs), relies on precise information of stimulus presentation times. We have developed an accurate and flexible audiovisual sensor solution operating in real-time for on-line use in both auditory and visual ERP and ERF paradigms. The sensor functions independently of the used audio or video stimulus presentation tools or signal acquisition system. The sensor solution consists of two independent sensors; one for sound and one for light. The microcontroller-based audio sensor incorporates a novel approach to the detection of natural sounds such as multipart audio stimuli, using an adjustable dead time. This aids in producing exact markers for complex auditory stimuli and reduces the number of false detections. The analog photosensor circuit detects changes in light intensity on the screen and produces a marker for changes exceeding a threshold. The microcontroller software for the audio sensor is free and open source, allowing other researchers to customise the sensor for use in specific auditory ERP/ERF paradigms. The hardware schematics and software for the audiovisual sensor are freely available from the webpage of the authors' lab.

  14. Listening to an Audio Drama Activates Two Processing Networks, One for All Sounds, Another Exclusively for Speech

    PubMed Central

    Boldt, Robert; Malinen, Sanna; Seppä, Mika; Tikka, Pia; Savolainen, Petri; Hari, Riitta; Carlson, Synnöve

    2013-01-01

    Earlier studies have shown considerable intersubject synchronization of brain activity when subjects watch the same movie or listen to the same story. Here we investigated the across-subjects similarity of brain responses to speech and non-speech sounds in a continuous audio drama designed for blind people. Thirteen healthy adults listened for ∼19 min to the audio drama while their brain activity was measured with 3 T functional magnetic resonance imaging (fMRI). An intersubject-correlation (ISC) map, computed across the whole experiment to assess the stimulus-driven extrinsic brain network, indicated statistically significant ISC in temporal, frontal and parietal cortices, cingulate cortex, and amygdala. Group-level independent component (IC) analysis was used to parcel out the brain signals into functionally coupled networks, and the dependence of the ICs on external stimuli was tested by comparing them with the ISC map. This procedure revealed four extrinsic ICs of which two–covering non-overlapping areas of the auditory cortex–were modulated by both speech and non-speech sounds. The two other extrinsic ICs, one left-hemisphere-lateralized and the other right-hemisphere-lateralized, were speech-related and comprised the superior and middle temporal gyri, temporal poles, and the left angular and inferior orbital gyri. In areas of low ISC, four ICs that were defined as intrinsic fluctuated similarly to the time-courses of either the speech-sound-related or all-sounds-related extrinsic ICs. These ICs included the superior temporal gyrus, the anterior insula, and the frontal, parietal and midline occipital cortices. Taken together, substantial intersubject synchronization of cortical activity was observed in subjects listening to an audio drama, with results suggesting that speech is processed in two separate networks, one dedicated to the processing of speech sounds and the other to both speech and non-speech sounds. PMID:23734202

  15. Listening to an audio drama activates two processing networks, one for all sounds, another exclusively for speech.

    PubMed

    Boldt, Robert; Malinen, Sanna; Seppä, Mika; Tikka, Pia; Savolainen, Petri; Hari, Riitta; Carlson, Synnöve

    2013-01-01

    Earlier studies have shown considerable intersubject synchronization of brain activity when subjects watch the same movie or listen to the same story. Here we investigated the across-subjects similarity of brain responses to speech and non-speech sounds in a continuous audio drama designed for blind people. Thirteen healthy adults listened for ∼19 min to the audio drama while their brain activity was measured with 3 T functional magnetic resonance imaging (fMRI). An intersubject-correlation (ISC) map, computed across the whole experiment to assess the stimulus-driven extrinsic brain network, indicated statistically significant ISC in temporal, frontal and parietal cortices, cingulate cortex, and amygdala. Group-level independent component (IC) analysis was used to parcel out the brain signals into functionally coupled networks, and the dependence of the ICs on external stimuli was tested by comparing them with the ISC map. This procedure revealed four extrinsic ICs of which two (covering non-overlapping areas of the auditory cortex) were modulated by both speech and non-speech sounds. The two other extrinsic ICs, one left-hemisphere-lateralized and the other right-hemisphere-lateralized, were speech-related and comprised the superior and middle temporal gyri, temporal poles, and the left angular and inferior orbital gyri. In areas of low ISC, four ICs that were defined as intrinsic fluctuated similarly to the time-courses of either the speech-sound-related or all-sounds-related extrinsic ICs. These ICs included the superior temporal gyrus, the anterior insula, and the frontal, parietal and midline occipital cortices. Taken together, substantial intersubject synchronization of cortical activity was observed in subjects listening to an audio drama, with results suggesting that speech is processed in two separate networks, one dedicated to the processing of speech sounds and the other to both speech and non-speech sounds.

  16. A compact electroencephalogram recording device with integrated audio stimulation system.

    PubMed

    Paukkunen, Antti K O; Kurttio, Anttu A; Leminen, Miika M; Sepponen, Raimo E

    2010-06-01

    A compact (96 × 128 × 32 mm³, 374 g), battery-powered, eight-channel electroencephalogram recording device with an integrated audio stimulation system and a wireless interface is presented. The recording device is capable of producing high-quality data, while the operating time is also reasonable for evoked potential studies. The effective measurement resolution is about 4 nV at a 200 Hz sample rate, the typical noise level is below 0.7 µVrms at 0.16-70 Hz, and the estimated operating time is 1.5 h. An embedded audio decoder circuit reads and plays wave sound files stored on a memory card. The activities are controlled by an 8-bit main control unit which allows accurate timing of the stimuli. The interstimulus interval jitter measured is less than 1 ms. Wireless communication is made through Bluetooth and the data recorded are transmitted to an external personal computer (PC) interface in real time. The PC interface is implemented with LabVIEW and in addition to data acquisition it also allows online signal processing, data storage, and control of measurement activities such as contact impedance measurement, for example. The practical application of the device is demonstrated in a mismatch negativity experiment with three test subjects.

  17. A compact electroencephalogram recording device with integrated audio stimulation system

    NASA Astrophysics Data System (ADS)

    Paukkunen, Antti K. O.; Kurttio, Anttu A.; Leminen, Miika M.; Sepponen, Raimo E.

    2010-06-01

    A compact (96 × 128 × 32 mm³, 374 g), battery-powered, eight-channel electroencephalogram recording device with an integrated audio stimulation system and a wireless interface is presented. The recording device is capable of producing high-quality data, while the operating time is also reasonable for evoked potential studies. The effective measurement resolution is about 4 nV at a 200 Hz sample rate, the typical noise level is below 0.7 μVrms at 0.16-70 Hz, and the estimated operating time is 1.5 h. An embedded audio decoder circuit reads and plays wave sound files stored on a memory card. The activities are controlled by an 8-bit main control unit which allows accurate timing of the stimuli. The interstimulus interval jitter measured is less than 1 ms. Wireless communication is made through Bluetooth and the data recorded are transmitted to an external personal computer (PC) interface in real time. The PC interface is implemented with LABVIEW® and in addition to data acquisition it also allows online signal processing, data storage, and control of measurement activities such as contact impedance measurement, for example. The practical application of the device is demonstrated in a mismatch negativity experiment with three test subjects.

  18. Independent component analysis algorithm FPGA design to perform real-time blind source separation

    NASA Astrophysics Data System (ADS)

    Meyer-Baese, Uwe; Odom, Crispin; Botella, Guillermo; Meyer-Baese, Anke

    2015-05-01

    The conditions that arise in the Cocktail Party Problem prevail across many fields, creating a need for Blind Source Separation (BSS). BSS is now needed in fields including array processing, communications, wireless communication, speech processing, audio and acoustics, and medical and biomedical signal processing. The concept of the cocktail party problem and BSS led to the development of Independent Component Analysis (ICA) algorithms. ICA proves useful for applications needing real-time signal processing. The goal of this research was to perform an extensive study of the ability and efficiency of Independent Component Analysis algorithms to perform blind source separation on mixed signals, in software and in a hardware implementation on a Field Programmable Gate Array (FPGA). The Algebraic ICA (A-ICA), Fast ICA, and Equivariant Adaptive Separation via Independence (EASI) ICA algorithms were examined and compared. The best algorithm would require the least complexity and the fewest resources while effectively separating mixed sources; this was the EASI algorithm. The EASI ICA was implemented on hardware with Field Programmable Gate Arrays (FPGA) to analyze its performance in real time.
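
    For reference, the serial EASI update has a compact form. The sketch below applies it to a toy two-source mixture, using the common cubic nonlinearity; the step size and sources are invented, and in practice the step size needs tuning per problem.

```python
import numpy as np

def easi_separate(X, mu=0.0005):
    """Serial EASI sketch: W <- W - mu*(y y^T - I + g(y) y^T - y g(y)^T) W,
    with y = W x and the cubic nonlinearity g(y) = y**3."""
    n = X.shape[0]
    W, I = np.eye(n), np.eye(n)
    Y = np.zeros_like(X)
    for k in range(X.shape[1]):
        y = W @ X[:, k]
        g = y ** 3
        W -= mu * (np.outer(y, y) - I + np.outer(g, y) - np.outer(y, g)) @ W
        Y[:, k] = y
    return Y, W

# Toy usage: unmix a sine and a square wave mixed by an unknown matrix
rng = np.random.default_rng(4)
t = np.linspace(0, 10, 20000)
S = np.vstack([np.sin(2 * np.pi * 3 * t), np.sign(np.sin(2 * np.pi * 5 * t))])
Y, W = easi_separate(rng.standard_normal((2, 2)) @ S)
```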

  19. Multimedia Classifier

    NASA Astrophysics Data System (ADS)

    Costache, G. N.; Gavat, I.

    2004-09-01

    Along with the aggressive growth in the amount of digital data available (text, audio samples, digital photos and digital movies, all joined in the multimedia domain), the need for classification, recognition and retrieval of this kind of data has become very important. In this paper a system structure to handle multimedia data from a recognition perspective is presented. The main processing steps applied to the multimedia objects of interest are: first, parameterization by analysis, in order to obtain a feature-based description forming the parameter vector; second, classification, generally with a hierarchical structure, to make the necessary decisions. For audio signals, both speech and music, the derived perceptual features are the mel-cepstral (MFCC) and the perceptual linear predictive (PLP) coefficients. For images, the derived features are the geometric parameters of the speaker's mouth. The hierarchical classifier generally consists of a clustering stage, based on Kohonen Self-Organizing Maps (SOM), and a final stage based on a powerful classification algorithm called Support Vector Machines (SVM). The system, in specific variants, is applied with good results in two tasks: the first is bimodal speech recognition, which fuses features obtained from the speech signal with features obtained from the speaker's image, and the second is music retrieval from a large music database.

  20. Transfer Learning for Improved Audio-Based Human Activity Recognition.

    PubMed

    Ntalampiras, Stavros; Potamitis, Ilyas

    2018-06-25

    Human activities are accompanied by characteristic sound events, the processing of which might provide valuable information for automated human activity recognition. This paper presents a novel approach addressing the case where one or more human activities are associated with limited audio data, resulting in a potentially highly imbalanced dataset. Data augmentation is based on transfer learning; more specifically, the proposed method: (a) identifies the classes which are statistically close to the ones associated with limited data; (b) learns a multiple input, multiple output transformation; and (c) transforms the data of the closest classes so that it can be used for modeling the ones associated with limited data. Furthermore, the proposed framework includes a feature set extracted out of signal representations of diverse domains, i.e., temporal, spectral, and wavelet. Extensive experiments demonstrate the relevance of the proposed data augmentation approach under a variety of generative recognition schemes.
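
    The abstract leaves the transformation unspecified, so the following sketch only illustrates the flavor of steps (a)-(c): "statistically close" is taken as a symmetric KL divergence between per-class diagonal-Gaussian fits, and the "transformation" is a per-dimension standardize-and-recolour map. All names and data are invented.

```python
import numpy as np

def sym_kl_gauss(m1, v1, m2, v2):
    """Symmetric KL divergence between diagonal Gaussians N(m1,v1), N(m2,v2)."""
    kl = lambda ma, va, mb, vb: 0.5 * np.sum(va / vb + (mb - ma) ** 2 / vb
                                             - 1.0 + np.log(vb / va))
    return kl(m1, v1, m2, v2) + kl(m2, v2, m1, v1)

def augment_minority(data_by_class, target):
    """(a) find the class statistically closest to the data-poor target;
    (b)/(c) map its samples onto the target's statistics by standardizing
    and re-colouring, i.e. a simple linear input-output transformation."""
    stats = {c: (x.mean(0), x.var(0) + 1e-9) for c, x in data_by_class.items()}
    mt, vt = stats[target]
    donor = min((c for c in stats if c != target),
                key=lambda c: sym_kl_gauss(*stats[c], mt, vt))
    md, vd = stats[donor]
    z = (data_by_class[donor] - md) / np.sqrt(vd)   # whiten donor class
    return z * np.sqrt(vt) + mt                     # recolour to target

# Toy usage with one data-poor activity class
rng = np.random.default_rng(5)
data = {"walk": rng.normal(0.0, 1.0, (500, 8)),
        "run": rng.normal(0.3, 1.1, (500, 8)),
        "jump": rng.normal(0.4, 1.2, (12, 8))}      # limited audio data
extra_jump = augment_minority(data, "jump")
```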

  1. Incorporation of perceptually adaptive QIM with singular value decomposition for blind audio watermarking

    NASA Astrophysics Data System (ADS)

    Hu, Hwai-Tsu; Chou, Hsien-Hsin; Yu, Chu; Hsu, Ling-Yuan

    2014-12-01

    This paper presents a novel approach for blind audio watermarking. The proposed scheme utilizes the flexibility of the discrete wavelet packet transformation (DWPT) to approximate the critical bands and adaptively determines suitable embedding strengths for carrying out quantization index modulation (QIM). Singular value decomposition (SVD) is employed to analyze the matrix formed by the DWPT coefficients and embed watermark bits by manipulating singular values subject to perceptual criteria. To achieve even better performance, two auxiliary enhancement measures are attached to the developed scheme. Performance evaluation and comparison are demonstrated in the presence of common digital signal processing attacks. Experimental results confirm that the combination of DWPT, SVD, and adaptive QIM achieves imperceptible data hiding with satisfactory robustness and payload capacity. Moreover, the inclusion of a self-synchronization capability allows the developed watermarking system to withstand time-shifting and cropping attacks.
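
    The QIM component is the easiest piece to illustrate: a coefficient is quantized onto one of two interleaved lattices, selected by the watermark bit, and decoding picks the nearer lattice. The step size below is an arbitrary illustration, not a perceptually adapted one as in the paper.

```python
import numpy as np

def qim_embed(x, bit, delta=0.05):
    """Quantize a coefficient onto one of two interleaved lattices,
    chosen by the watermark bit (lattices offset by delta/2)."""
    offset = delta / 2 if bit else 0.0
    return np.round((x - offset) / delta) * delta + offset

def qim_decode(x, delta=0.05):
    """Decode by finding which lattice the received value is closer to."""
    d0 = abs(x - qim_embed(x, 0, delta))
    d1 = abs(x - qim_embed(x, 1, delta))
    return int(d1 < d0)

# Toy usage: embed a bit in one coefficient, then decode after small noise;
# decoding survives perturbations smaller than delta/4.
marked = qim_embed(0.3137, bit=1)
print(qim_decode(marked + 0.01))
```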

  2. Sonification of optical coherence tomography data and images

    PubMed Central

    Ahmad, Adeel; Adie, Steven G.; Wang, Morgan; Boppart, Stephen A.

    2010-01-01

    Sonification is the process of representing data as non-speech audio signals. In this manuscript, we describe the auditory presentation of OCT data and images. OCT acquisition rates frequently exceed our ability to visually analyze image-based data, and multi-sensory input may therefore facilitate rapid interpretation. This conversion will be especially valuable in time-sensitive surgical or diagnostic procedures. In these scenarios, auditory feedback can complement visual data without requiring the surgeon to constantly monitor the screen, or provide additional feedback in non-imaging procedures such as guided needle biopsies which use only axial-scan data. In this paper we present techniques to translate OCT data and images into sound based on the spatial and spatial frequency properties of the OCT data. Results obtained from parameter-mapped sonification of human adipose and tumor tissues are presented, indicating that audio feedback of OCT data may be useful for the interpretation of OCT images. PMID:20588846
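
    A minimal form of the parameter mapping described above is pitch mapping: each data value controls the frequency of a short tone. The sketch below uses an invented logarithmic frequency range and tone duration, not the parameters used in the study.

```python
import numpy as np

def sonify(values, fs=44100, dur=0.08, fmin=220.0, fmax=1760.0):
    """Parameter-mapped sonification sketch: map each data value to the
    pitch of a short tone and concatenate the tones into one signal."""
    lo, hi = np.min(values), np.max(values)
    norm = (np.asarray(values, float) - lo) / (hi - lo + 1e-12)
    freqs = fmin * (fmax / fmin) ** norm        # logarithmic pitch mapping
    t = np.arange(int(fs * dur)) / fs
    env = np.hanning(len(t))                    # click-free onsets/offsets
    return np.concatenate([env * np.sin(2 * np.pi * f * t) for f in freqs])

# Toy usage: sonify one synthetic depth profile (stand-in for an A-scan);
# the result can be written out with e.g. scipy.io.wavfile.write
scan = np.abs(np.random.default_rng(6).standard_normal(64))
audio = sonify(scan)
```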

  3. Audio-video feature correlation: faces and speech

    NASA Astrophysics Data System (ADS)

    Durand, Gwenael; Montacie, Claude; Caraty, Marie-Jose; Faudemay, Pascal

    1999-08-01

    This paper presents a study of the correlation between features automatically extracted from the audio stream and the video stream of audiovisual documents. In particular, we were interested in finding out whether speech analysis tools could be combined with face detection methods, and to what extent they should be combined. A generic audio signal partitioning algorithm was first used to detect Silence/Noise/Music/Speech segments in a full-length movie. A generic object detection method was applied to the keyframes extracted from the movie in order to detect the presence or absence of faces. The correlation between the presence of a face in the keyframes and of the corresponding voice in the audio stream was studied. A third stream, the script of the movie, is warped onto the speech channel in order to automatically label faces appearing in the keyframes with the name of the corresponding character. We naturally found that extracted audio and video features were related in many cases, and that significant benefits can be obtained from the joint use of audio and video analysis methods.

  4. Embedded security system for multi-modal surveillance in a railway carriage

    NASA Astrophysics Data System (ADS)

    Zouaoui, Rhalem; Audigier, Romaric; Ambellouis, Sébastien; Capman, François; Benhadda, Hamid; Joudrier, Stéphanie; Sodoyer, David; Lamarque, Thierry

    2015-10-01

    Public transport security is one of the main priorities of the public authorities when fighting against crime and terrorism. In this context, there is a great demand for autonomous systems able to detect abnormal events such as violent acts aboard passenger cars and intrusions when the train is parked at the depot. To this end, we present an innovative approach which aims at providing efficient automatic event detection by fusing video and audio analytics and reducing the false alarm rate compared to classical stand-alone video detection. The multi-modal system is composed of two microphones and one camera and integrates onboard video and audio analytics and fusion capabilities. On the one hand, for detecting intrusion, the system relies on the fusion of "unusual" audio events detection with intrusion detections from video processing. The audio analysis consists in modeling the normal ambience and detecting deviation from the trained models during testing. This unsupervised approach is based on clustering of automatically extracted segments of acoustic features and statistical Gaussian Mixture Model (GMM) modeling of each cluster. The intrusion detection is based on the three-dimensional (3D) detection and tracking of individuals in the videos. On the other hand, for violent events detection, the system fuses unsupervised and supervised audio algorithms with video event detection. The supervised audio technique detects specific events such as shouts. A GMM is used to catch the formant structure of a shout signal. Video analytics use an original approach for detecting aggressive motion by focusing on erratic motion patterns specific to violent events. As data with violent events is not easily available, a normality model with structured motions from non-violent videos is learned for one-class classification. A fusion algorithm based on Dempster-Shafer's theory analyses the asynchronous detection outputs and computes the degree of belief of each probable event.
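
    The unsupervised audio branch can be illustrated with a plain GMM ambience model: fit the mixture on "normal" frames and flag test frames whose log-likelihood falls below a threshold calibrated on training data. The features, component count, and threshold below are stand-ins, not the system's actual configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit the "normal ambience" model on routine audio frames, then flag test
# frames whose log-likelihood is far below what was seen in training.
rng = np.random.default_rng(7)
normal_frames = rng.normal(0.0, 1.0, (5000, 13))   # stand-in MFCC-like features

gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
gmm.fit(normal_frames)
threshold = np.percentile(gmm.score_samples(normal_frames), 1)

test_frames = np.vstack([rng.normal(0.0, 1.0, (100, 13)),
                         rng.normal(4.0, 2.0, (5, 13))])   # 5 unusual frames
alarms = gmm.score_samples(test_frames) < threshold
print(int(alarms.sum()), "frames flagged as unusual")
```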

  5. 47 CFR 11.51 - EAS code and Attention Signal Transmission requirements.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 47 Telecommunication 1 2010-10-01 2010-10-01 false EAS code and Attention Signal Transmission... SYSTEM (EAS) Emergency Operations § 11.51 EAS code and Attention Signal Transmission requirements. (a... programming before EAS message transmission should not cause television receivers to mute EAS audio messages...

  6. Detecting Parkinson's disease from sustained phonation and speech signals.

    PubMed

    Vaiciukynas, Evaldas; Verikas, Antanas; Gelzinis, Adas; Bacauskiene, Marija

    2017-01-01

    This study investigates signals from sustained phonation and text-dependent speech modalities for Parkinson's disease screening. Phonation corresponds to the vowel /a/ voicing task and speech to the pronunciation of a short sentence in the Lithuanian language. Signals were recorded through two channels simultaneously, namely, acoustic cardioid (AC) and smart phone (SP) microphones. Additional modalities were obtained by splitting the speech recording into voiced and unvoiced parts. Information in each modality is summarized by 18 well-known audio feature sets. Random forest (RF) is used as a machine learning algorithm, both for individual feature sets and for decision-level fusion. Detection performance is measured by the out-of-bag equal error rate (EER) and the cost of the log-likelihood ratio. The Essentia audio feature set was the best using the AC speech modality and the YAAFE audio feature set was the best using the SP unvoiced modality, achieving EERs of 20.30% and 25.57%, respectively. Fusion of all feature sets and modalities resulted in an EER of 19.27% for the AC and 23.00% for the SP channel. Non-linear projection of an RF-based proximity matrix into the 2D space enriched medical decision support by visualization.

  7. Audio-visual speech experience with age influences perceived audio-visual asynchrony in speech.

    PubMed

    Alm, Magnus; Behne, Dawn

    2013-10-01

    Previous research indicates that perception of audio-visual (AV) synchrony changes in adulthood. Possible explanations for these age differences include a decline in hearing acuity, a decline in cognitive processing speed, and increased experience with AV binding. The current study aims to isolate the effect of AV experience by comparing synchrony judgments from 20 young adults (20 to 30 yrs) and 20 normal-hearing middle-aged adults (50 to 60 yrs), an age range for which a decline of cognitive processing speed is expected to be minimal. When presented with AV stop consonant syllables with asynchronies ranging from 440 ms audio-lead to 440 ms visual-lead, middle-aged adults showed significantly less tolerance for audio-lead than young adults. Middle-aged adults also showed a greater shift in their point of subjective simultaneity than young adults. Natural audio-lead asynchronies are arguably more predictable than natural visual-lead asynchronies, and this predictability may render audio-lead thresholds more prone to experience-related fine-tuning.

  8. WebGL and web audio software lightweight components for multimedia education

    NASA Astrophysics Data System (ADS)

    Chang, Xin; Yuksel, Kivanc; Skarbek, Władysław

    2017-08-01

    The paper presents the results of our recent work on the development of the contemporary computing platform DC2 for multimedia education using WebGL and Web Audio, the W3C standards. Using the literate programming paradigm, the WEBSA educational tools were developed. They offer the user (student) access to an expandable collection of WebGL shaders and Web Audio scripts. The unique feature of DC2 is the option of literate programming, offered to both the author and the reader in order to improve the interactivity of lightweight WebGL and Web Audio components. For instance, users can define: source audio nodes, including synthetic sources; destination audio nodes; and nodes for audio processing such as sound wave shaping, spectral band filtering, convolution-based modification, etc. In the case of WebGL, besides classic graphics effects based on mesh and fractal definitions, novel image processing analysis by shaders is offered, such as nonlinear filtering, histograms of gradients, and Bayesian classifiers.

  9. Data reduction for cough studies using distribution of audio frequency content

    PubMed Central

    2012-01-01

    Background Recent studies suggest that objectively quantifying coughing in audio recordings offers a novel means to understand coughing and assess treatments. Currently, manual cough counting is the most accurate method for quantifying coughing. However, the demand of manually counting cough records is substantial, demonstrating a need to reduce record lengths prior to counting whilst preserving the coughs within them. This study tested the performance of an algorithm developed for this purpose. Methods 20 subjects were recruited (5 healthy smokers and non-smokers, 5 chronic cough, 5 chronic obstructive pulmonary disease and 5 asthma), fitted with an ambulatory recording system and recorded for 24 hours. The recordings produced were divided into 15 min segments and counted. Periods of inactive audio in each segment were removed using the median frequency and power of the audio signal and the resulting files re-counted. Results The median resultant segment length was 13.9 s (IQR 56.4 s) and median 24 hr recording length 62.4 min (IQR 100.4). A median of 0.0 coughs/h (IQR 0.0-0.2) were erroneously removed and the variability in the resultant cough counts was comparable to that between manual cough counts. The largest error was seen in asthmatic patients, but still only 1.0% coughs/h were missed. Conclusions These data show that a system which measures signal activity using the median audio frequency can substantially reduce record lengths without significantly compromising the coughs contained within them. PMID:23231789
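
    A sketch of the gating idea, dropping frames whose power and median frequency indicate inactive audio, is shown below; the frame length and both thresholds are invented, not the study's calibrated values.

```python
import numpy as np

def keep_active(audio, fs, frame_s=0.5, power_db=-50.0, medfreq_hz=300.0):
    """Data-reduction sketch: drop frames whose power and median frequency
    suggest inactive audio, keeping candidate cough segments for counting."""
    n = int(fs * frame_s)
    kept = []
    for i in range(0, len(audio) - n, n):
        frame = audio[i:i + n]
        power = 10 * np.log10(np.mean(frame ** 2) + 1e-12)
        spec = np.abs(np.fft.rfft(frame)) ** 2
        freqs = np.fft.rfftfreq(n, 1 / fs)
        cum = np.cumsum(spec)
        medfreq = freqs[np.searchsorted(cum, cum[-1] / 2)]
        if power > power_db and medfreq > medfreq_hz:   # "active" heuristic
            kept.append(frame)
    return np.concatenate(kept) if kept else np.array([])

# Toy usage: 60 s of near-silence containing one second of noisy activity
fs = 8000
sig = 1e-4 * np.random.default_rng(8).standard_normal(60 * fs)
sig[10 * fs:11 * fs] += np.random.default_rng(9).standard_normal(fs)
print(len(keep_active(sig, fs)) / fs, "seconds kept")
```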

  10. Recording and reading of information on optical disks

    NASA Astrophysics Data System (ADS)

    Bouwhuis, G.; Braat, J. J. M.

    In the storage of information, related to video programs, in a spiral track on a disk, difficulties arise because the bandwidth for video is much greater than for audio signals. An attractive solution was found in optical storage. The optical noncontact method is free of wear, and allows for fast random access. Initial problems regarding a suitable light source could be overcome with the aid of appropriate laser devices. The basic concepts of optical storage on disks are treated insofar as they are relevant for the optical arrangement. A general description is provided of a video, a digital audio, and a data storage system. Scanning spot microscopy for recording and reading of optical disks is discussed, giving attention to recording of the signal, the readout of optical disks, the readout of digitally encoded signals, and cross talk. Tracking systems are also considered, taking into account the generation of error signals for radial tracking and the generation of focus error signals.

  11. Audio Motor Training at the Foot Level Improves Space Representation.

    PubMed

    Aggius-Vella, Elena; Campus, Claudio; Finocchietti, Sara; Gori, Monica

    2017-01-01

    Spatial representation is developed thanks to the integration of visual signals with the other senses. It has been shown that the lack of vision compromises the development of some spatial representations. In this study we tested the effect of a new rehabilitation device called ABBI (Audio Bracelet for Blind Interaction) to improve space representation. ABBI produces audio feedback linked to body movement. Previous studies from our group showed that this device improves the spatial representation of space in early blind adults around the upper part of the body. Here we evaluate whether the audio-motor feedback produced by ABBI can also improve the audio spatial representation of sighted individuals in the space around the legs. Forty-five blindfolded sighted subjects participated in the study, subdivided into three experimental groups. An audio space localization (front-back discrimination) task was performed twice by all groups of subjects, before and after different kinds of training conditions. One group (experimental) performed audio-motor training with the ABBI device placed on the foot. Another group (control) performed a free motor activity without audio feedback associated with body movement. The third group (control) passively listened to the ABBI sound moved at foot level by the experimenter, without producing any body movement. Results showed that only the experimental group, which performed the training with the audio-motor feedback, showed an improvement in accuracy for sound discrimination. No improvement was observed for the two control groups. These findings suggest that the audio-motor training with ABBI improves audio space perception also in the space around the legs in sighted individuals. This result provides important inputs for the rehabilitation of the space representations in the lower part of the body.

  12. Audio Motor Training at the Foot Level Improves Space Representation

    PubMed Central

    Aggius-Vella, Elena; Campus, Claudio; Finocchietti, Sara; Gori, Monica

    2017-01-01

    Spatial representation is developed thanks to the integration of visual signals with the other senses. It has been shown that the lack of vision compromises the development of some spatial representations. In this study we tested the effect of a new rehabilitation device called ABBI (Audio Bracelet for Blind Interaction) to improve space representation. ABBI produces audio feedback linked to body movement. Previous studies from our group showed that this device improves the spatial representation of space in early blind adults around the upper part of the body. Here we evaluate whether the audio-motor feedback produced by ABBI can also improve the audio spatial representation of sighted individuals in the space around the legs. Forty-five blindfolded sighted subjects participated in the study, subdivided into three experimental groups. An audio space localization (front-back discrimination) task was performed twice by all groups of subjects, before and after different kinds of training conditions. One group (experimental) performed audio-motor training with the ABBI device placed on the foot. Another group (control) performed a free motor activity without audio feedback associated with body movement. The third group (control) passively listened to the ABBI sound moved at foot level by the experimenter, without producing any body movement. Results showed that only the experimental group, which performed the training with the audio-motor feedback, showed an improvement in accuracy for sound discrimination. No improvement was observed for the two control groups. These findings suggest that the audio-motor training with ABBI improves audio space perception also in the space around the legs in sighted individuals. This result provides important inputs for the rehabilitation of the space representations in the lower part of the body. PMID:29326564

  13. A Low Cost Remote Sensing System Using PC and Stereo Equipment

    NASA Technical Reports Server (NTRS)

    Campbell, Joel F.; Flood, Michael A.; Prasad, Narasimha S.; Hodson, Wade D.

    2011-01-01

    A system using a personal computer, speaker, and a microphone is used to detect objects, and make crude measurements using a carrier modulated by a pseudorandom noise (PN) code. This system can be constructed using a personal computer and audio equipment commonly found in the laboratory or at home, or more sophisticated equipment that can be purchased at reasonable cost. We demonstrate its value as an instructional tool for teaching concepts of remote sensing and digital signal processing.
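
    The measurement principle can be sketched in a few lines: emit a PN-code-modulated tone through the speaker, then cross-correlate the microphone signal with the transmitted waveform and read the range off the correlation peak. All rates and the code below are arbitrary stand-ins, not the parameters of the system described.

```python
import numpy as np

# Emit a carrier phase-modulated by a pseudorandom (PN) code, then locate
# the echo by cross-correlating the recording with the transmitted waveform.
fs, chip_rate, fc = 48000, 1000, 8000           # sample, chip, carrier rates
rng = np.random.default_rng(10)
code = rng.choice([-1.0, 1.0], size=255)        # stand-in PN sequence
chips = np.repeat(code, fs // chip_rate)
t = np.arange(len(chips)) / fs
tx = chips * np.sin(2 * np.pi * fc * t)         # BPSK-modulated audio burst

# Simulated round trip: delay, attenuation, and noise
delay = 700                                     # samples (~5 m round trip)
rx = np.zeros(len(tx) + 4000)
rx[delay:delay + len(tx)] += 0.2 * tx
rx += 0.05 * rng.standard_normal(len(rx))

lag = np.argmax(np.correlate(rx, tx, mode="valid"))
print("distance ≈", lag / fs * 343 / 2, "m")    # two-way path at 343 m/s
```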

  14. Inductive Interference in Rapid Transit Signaling Systems. Volume 1. Theory and Background.

    DOT National Transportation Integrated Search

    1986-05-01

    This report describes the mechanism of inductive interference to audio frequency (AF) signaling systems used in rail transit operations, caused by rail transit vehicles with chopper propulsion control. Choppers are switching circuits composed of high...

  15. Evolution Nonlinear Diffusion-Convection PDE Models for Spectrogram Enhancement

    NASA Astrophysics Data System (ADS)

    Dugnol, B.; Fernández, C.; Galiano, G.; Velasco, J.

    2008-09-01

    In previous works we studied the application of PDE-based image processing techniques to the spectrogram of audio signals in order to improve the readability of the signal. In particular, we considered the implementation of the nonlinear diffusive model proposed by Álvarez, Lions and Morel [1] (ALM), combined with a convective term inspired by the differential reassignment proposed by Chassande-Mottin, Daubechies, Auger and Flandrin [2]-[3]. In this work we consider the possibility of replacing the diffusive model of ALM by diffusive terms in divergence form. In particular, we implement finite element approximations of the nonlinear diffusive terms studied by Chen, Levine, Rao [4] and Antontsev, Shmarev [5]-[8], with a convective term.
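
    For orientation, the simplest divergence-form nonlinear diffusion one could apply to a spectrogram is a Perona-Malik scheme, sketched below. This is only a generic relative of the models cited above; it has neither the ALM degenerate structure nor the reassignment-based convective term, and all parameters are illustrative.

```python
import numpy as np

def nonlinear_diffusion(S, steps=20, dt=0.15, k=0.1):
    """Perona-Malik-style nonlinear diffusion applied to a spectrogram S:
    smooths within spectral ridges while preserving their edges."""
    S = S.copy()
    c = lambda g: np.exp(-(g / k) ** 2)      # edge-stopping conductivity
    for _ in range(steps):
        gn = np.roll(S, -1, 0) - S           # finite-difference gradients
        gs = np.roll(S, 1, 0) - S            # toward the four neighbours
        ge = np.roll(S, -1, 1) - S
        gw = np.roll(S, 1, 1) - S
        S += dt * (c(gn) * gn + c(gs) * gs + c(ge) * ge + c(gw) * gw)
    return S

# Toy usage on a noisy synthetic "spectrogram" with one horizontal ridge
rng = np.random.default_rng(11)
spec = 0.1 * rng.standard_normal((128, 256))
spec[40, :] += 1.0
enhanced = nonlinear_diffusion(spec)
```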

  16. 47 CFR 1.4000 - Restrictions impairing reception of television broadcast signals, direct broadcast satellite...

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... antenna that is: (A) Used to receive video programming services via multipoint distribution services... radio, amateur (“HAM”) radio, Citizen's Band (CB) radio, and Digital Audio Radio Service (DARS) signals... reception of video programming services or devices used to receive or transmit fixed wireless signals shall...

  17. Enhancement of Signal-to-noise Ratio in Natural-source Transient Magnetotelluric Data with Wavelet Transform

    NASA Astrophysics Data System (ADS)

    Zhang, Y.; Paulson, K. V.

    For audio-frequency magnetotelluric surveys where the signals are lightning-stroke transients, the conventional Fourier transform method often fails to produce a high-quality impedance tensor. An alternative approach is to use the wavelet transform method, which is capable of localizing target information simultaneously in both the temporal and frequency domains. Unlike Fourier analysis, which yields an average amplitude and phase, the wavelet transform produces an instantaneous estimate of the amplitude and phase of a signal. In this paper a complex, well-localized wavelet, the Morlet wavelet, has been used to transform and analyze audio-frequency magnetotelluric data. With the Morlet wavelet, the magnetotelluric impedance tensor can be computed directly in the wavelet transform domain. The lightning-stroke transients are easily identified on the dilation-translation plane. By choosing the wavelet transform values where the signals are located, a higher signal-to-noise ratio estimate of the impedance tensor can be obtained. In a test using real data, the wavelet transform showed a significant improvement in the signal-to-noise ratio over the conventional Fourier transform.
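
    A direct Morlet-wavelet transform, which yields the instantaneous amplitude and phase estimates mentioned above, can be sketched as follows; the normalization and support length are simplified assumptions rather than the authors' exact implementation.

```python
import numpy as np

def morlet_cwt(x, fs, freqs, w0=6.0):
    """Continuous wavelet transform with a complex Morlet wavelet, giving
    an instantaneous amplitude/phase estimate at each target frequency."""
    out = np.empty((len(freqs), len(x)), dtype=complex)
    for i, f in enumerate(freqs):
        s = w0 / (2 * np.pi * f)                  # scale (seconds) for f
        m = int(np.ceil(8 * s * fs)) | 1          # odd wavelet support
        t = (np.arange(m) - m // 2) / fs
        psi = (np.exp(1j * w0 * t / s) * np.exp(-(t / s) ** 2 / 2)
               / np.sqrt(s * fs))                 # rough normalization
        out[i] = np.correlate(x, psi, mode="same")  # conjugates psi
    return out

# Toy usage: locate a lightning-like 300 Hz transient buried in noise
fs = 4096
sig = 0.2 * np.random.default_rng(12).standard_normal(fs)
burst = np.sin(2 * np.pi * 300 * np.arange(100) / fs) * np.hanning(100)
sig[2000:2100] += burst
W = morlet_cwt(sig, fs, freqs=np.array([100.0, 300.0, 900.0]))
print(int(np.argmax(np.abs(W[1]))))   # peaks near sample 2050
```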

  18. Holographic disk with high data transfer rate: its application to an audio response memory.

    PubMed

    Kubota, K; Ono, Y; Kondo, M; Sugama, S; Nishida, N; Sakaguchi, M

    1980-03-15

    This paper describes a memory realized with a high data transfer rate using the holographic parallel-processing function and its application to an audio response system that supplies many audio messages to many terminals simultaneously. Digitalized audio messages are recorded as tiny 1-D Fourier transform holograms on a holographic disk. A hologram recorder and a hologram reader were constructed to test and demonstrate the holographic audio response memory feasibility. Experimental results indicate the potentiality of an audio response system with a 2000-word vocabulary and 250-Mbit/sec bit transfer rate.

  19. Fast contactless vibrating structure characterization using real time field programmable gate array-based digital signal processing: demonstrations with a passive wireless acoustic delay line probe and vision.

    PubMed

    Goavec-Mérou, G; Chrétien, N; Friedt, J-M; Sandoz, P; Martin, G; Lenczner, M; Ballandras, S

    2014-01-01

    Vibrating mechanical structure characterization is demonstrated using contactless techniques best suited for mobile and rotating equipment. Fast measurement rates are achieved using Field Programmable Gate Array (FPGA) devices as real-time digital signal processors. Two kinds of algorithms are implemented on the FPGA and experimentally validated in the case of a vibrating tuning fork. The first application concerns in-plane displacement detection by vision, with sampling rates above 10 kHz, thus reaching frequency ranges above the audio range. The second demonstration concerns pulsed-RADAR cooperative target phase detection and is applied to radiofrequency acoustic transducers used as passive wireless strain gauges. In this case, the 250 ksamples/s refresh rate achieved is limited only by the acoustic sensor design and not by the detection bandwidth. These realizations illustrate the efficiency, interest, and potential of FPGA-based real-time digital signal processing for the contactless interrogation of passive embedded probes at high refresh rates.

  20. Power line detection system

    DOEpatents

    Latorre, Victor R.; Watwood, Donald B.

    1994-01-01

    A short-range, radio frequency (RF) transmitting-receiving system that provides both visual and audio warnings to the pilot of a helicopter or light aircraft of an up-coming power transmission line complex. Small, milliwatt-level narrowband transmitters, powered by the transmission line itself, are installed on top of selected transmission line support towers or within existing warning balls, and provide a continuous RF signal to approaching aircraft. The on-board receiver can be either a separate unit or a portion of the existing avionics, and can also share an existing antenna with another airborne system. Upon receipt of a warning signal, the receiver will trigger a visual and an audio alarm to alert the pilot to the potential power line hazard.

  1. Bridging music and speech rhythm: rhythmic priming and audio-motor training affect speech perception.

    PubMed

    Cason, Nia; Astésano, Corine; Schön, Daniele

    2015-02-01

    Following findings that musical rhythmic priming enhances subsequent speech perception, we investigated whether rhythmic priming for spoken sentences can enhance phonological processing - the building blocks of speech - and whether audio-motor training enhances this effect. Participants heard a metrical prime followed by a sentence (with a matching/mismatching prosodic structure), for which they performed a phoneme detection task. Behavioural (RT) data were collected from two groups: one that received audio-motor training, and one that did not. We hypothesised that (1) phonological processing would be enhanced in matching conditions, and (2) audio-motor training with the musical rhythms would enhance this effect. Indeed, providing a matching rhythmic prime context resulted in faster phoneme detection, revealing a cross-domain effect of musical rhythm on phonological processing. In addition, our results indicate that rhythmic audio-motor training enhances this priming effect. These results have important implications for rhythm-based speech therapies, and suggest that metrical rhythm in music and speech may rely on shared temporal processing brain resources.

  2. Audio-Visual Perception of 3D Cinematography: An fMRI Study Using Condition-Based and Computation-Based Analyses

    PubMed Central

    Ogawa, Akitoshi; Bordier, Cecile; Macaluso, Emiliano

    2013-01-01

    The use of naturalistic stimuli to probe sensory functions in the human brain is gaining increasing interest. Previous imaging studies examined brain activity associated with the processing of cinematographic material using both standard “condition-based” designs, as well as “computational” methods based on the extraction of time-varying features of the stimuli (e.g. motion). Here, we exploited both approaches to investigate the neural correlates of complex visual and auditory spatial signals in cinematography. In the first experiment, the participants watched a piece of a commercial movie presented in four blocked conditions: 3D vision with surround sounds (3D-Surround), 3D with monaural sound (3D-Mono), 2D-Surround, and 2D-Mono. In the second experiment, they watched two different segments of the movie both presented continuously in 3D-Surround. The blocked presentation served for standard condition-based analyses, while all datasets were submitted to computation-based analyses. The latter assessed where activity co-varied with visual disparity signals and the complexity of auditory multi-source signals. The blocked analyses associated 3D viewing with the activation of the dorsal and lateral occipital cortex and superior parietal lobule, while the surround sounds activated the superior and middle temporal gyri (S/MTG). The computation-based analyses revealed the effects of absolute disparity in dorsal occipital and posterior parietal cortices and of disparity gradients in the posterior middle temporal gyrus plus the inferior frontal gyrus. The complexity of the surround sounds was associated with activity in specific sub-regions of S/MTG, even after accounting for changes of sound intensity. These results demonstrate that the processing of naturalistic audio-visual signals entails an extensive set of visual and auditory areas, and that computation-based analyses can track the contribution of complex spatial aspects characterizing such life-like stimuli. PMID:24194828

  3. Audio-visual perception of 3D cinematography: an fMRI study using condition-based and computation-based analyses.

    PubMed

    Ogawa, Akitoshi; Bordier, Cecile; Macaluso, Emiliano

    2013-01-01

    The use of naturalistic stimuli to probe sensory functions in the human brain is gaining increasing interest. Previous imaging studies examined brain activity associated with the processing of cinematographic material using both standard "condition-based" designs, as well as "computational" methods based on the extraction of time-varying features of the stimuli (e.g. motion). Here, we exploited both approaches to investigate the neural correlates of complex visual and auditory spatial signals in cinematography. In the first experiment, the participants watched a piece of a commercial movie presented in four blocked conditions: 3D vision with surround sounds (3D-Surround), 3D with monaural sound (3D-Mono), 2D-Surround, and 2D-Mono. In the second experiment, they watched two different segments of the movie both presented continuously in 3D-Surround. The blocked presentation served for standard condition-based analyses, while all datasets were submitted to computation-based analyses. The latter assessed where activity co-varied with visual disparity signals and the complexity of auditory multi-source signals. The blocked analyses associated 3D viewing with the activation of the dorsal and lateral occipital cortex and superior parietal lobule, while the surround sounds activated the superior and middle temporal gyri (S/MTG). The computation-based analyses revealed the effects of absolute disparity in dorsal occipital and posterior parietal cortices and of disparity gradients in the posterior middle temporal gyrus plus the inferior frontal gyrus. The complexity of the surround sounds was associated with activity in specific sub-regions of S/MTG, even after accounting for changes of sound intensity. These results demonstrate that the processing of naturalistic audio-visual signals entails an extensive set of visual and auditory areas, and that computation-based analyses can track the contribution of complex spatial aspects characterizing such life-like stimuli.

  4. Horatio Audio-Describes Shakespeare's "Hamlet": Blind and Low-Vision Theatre-Goers Evaluate an Unconventional Audio Description Strategy

    ERIC Educational Resources Information Center

    Udo, J. P.; Acevedo, B.; Fels, D. I.

    2010-01-01

    Audio description (AD) has been introduced as one solution for providing people who are blind or have low vision with access to live theatre, film and television content. However, there is little research to inform the process, user preferences and presentation style. We present a study of a single live audio-described performance of Hart House…

  5. Neural decoding of attentional selection in multi-speaker environments without access to clean sources

    NASA Astrophysics Data System (ADS)

    O'Sullivan, James; Chen, Zhuo; Herrero, Jose; McKhann, Guy M.; Sheth, Sameer A.; Mehta, Ashesh D.; Mesgarani, Nima

    2017-10-01

    Objective. People who suffer from hearing impairments can find it difficult to follow a conversation in a multi-speaker environment. Current hearing aids can suppress background noise; however, there is little that can be done to help a user attend to a single conversation amongst many without knowing which speaker the user is attending to. Cognitively controlled hearing aids that use auditory attention decoding (AAD) methods are the next step in offering help. Translating the successes in AAD research to real-world applications poses a number of challenges, including the lack of access to the clean sound sources in the environment against which to compare the neural signals. We propose a novel framework that combines single-channel speech separation algorithms with AAD. Approach. We present an end-to-end system that (1) receives a single audio channel containing a mixture of speakers that is heard by a listener along with the listener’s neural signals, (2) automatically separates the individual speakers in the mixture, (3) determines the attended speaker, and (4) amplifies the attended speaker’s voice to assist the listener. Main results. Using invasive electrophysiology recordings, we identified the regions of the auditory cortex that contribute to AAD. Given appropriate electrode locations, our system is able to decode the attention of subjects and amplify the attended speaker using only the mixed audio. Our quality assessment of the modified audio demonstrates a significant improvement in both subjective and objective speech quality measures. Significance. Our novel framework for AAD bridges the gap between the most recent advancements in speech processing technologies and speech prosthesis research and moves us closer to the development of cognitively controlled hearable devices for the hearing impaired.
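
    The decoding step can be reduced to a correlation test between a neurally reconstructed speech envelope and the envelopes of the separated streams. The sketch below is a hedged illustration of that idea only; the decoder weights, data shapes, and the least-squares "training" are invented placeholders, not the authors' pipeline.

```python
import numpy as np

def decode_attention(neural, envelopes, decoder):
    """Stimulus-reconstruction AAD: reconstruct an envelope from neural
    data with linear weights, then pick the best-correlated speaker."""
    recon = neural @ decoder                                 # (time,)
    corrs = [np.corrcoef(recon, env)[0, 1] for env in envelopes]
    return int(np.argmax(corrs)), corrs

# Toy demonstration with synthetic data: the "attended" envelope env_a
# drives the fake neural channels, so it should win the correlation test.
rng = np.random.default_rng(0)
env_a, env_b = rng.random(500), rng.random(500)
neural = np.outer(env_a, rng.random(32)) + 0.1 * rng.normal(size=(500, 32))
decoder = np.linalg.lstsq(neural, env_a, rcond=None)[0]      # stand-in training
attended, scores = decode_attention(neural, [env_a, env_b], decoder)
```

    The attended stream would then be amplified and remixed into the audio presented to the listener.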

  6. Software for Acoustic Rendering

    NASA Technical Reports Server (NTRS)

    Miller, Joel D.

    2003-01-01

    SLAB is a software system that can be run on a personal computer to simulate an acoustic environment in real time. SLAB was developed to enable computational experimentation in which one can exert low-level control over a variety of signal-processing parameters, related to spatialization, for conducting psychoacoustic studies. Among the parameters that can be manipulated are the number and position of reflections, the fidelity (that is, the number of taps in finite-impulse-response filters), the system latency, and the update rate of the filters. Another goal in the development of SLAB was to provide an inexpensive means of dynamic synthesis of virtual audio over headphones, without need for special-purpose signal-processing hardware. SLAB has a modular, object-oriented design that affords the flexibility and extensibility needed to accommodate a variety of computational experiments and signal-flow structures. SLAB's spatial renderer has a fixed signal-flow architecture corresponding to a set of parallel signal paths from each source to a listener. This fixed architecture can be regarded as a compromise that optimizes efficiency at the expense of complete flexibility. Such a compromise is necessary, given the design goal of enabling computational psychoacoustic experimentation on inexpensive personal computers.

  7. A Virtual Audio Guidance and Alert System for Commercial Aircraft Operations

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.; Shrum, Richard; Miller, Joel; Null, Cynthia H. (Technical Monitor)

    1996-01-01

    Our work in virtual reality systems at NASA Ames Research Center includes the area of aurally-guided visual search, using specially-designed audio cues and spatial audio processing (also known as virtual or "3-D audio") techniques (Begault, 1994). Previous studies at Ames had revealed that use of 3-D audio for Traffic Collision Avoidance System (TCAS) advisories significantly reduced head-down time, compared to a head-down map display (0.5 sec advantage) or no display at all (2.2 sec advantage) (Begault, 1993, 1995; Begault & Pittman, 1994; see Wenzel, 1994, for an audio demo). Since the crew must keep their head up and looking out the window as much as possible when taxiing under low-visibility conditions, and the potential for "blunder" is increased under such conditions, it was sensible to evaluate the audio spatial cueing for a prototype audio ground collision avoidance warning (GCAW) system, and a 3-D audio guidance system. Results were favorable for GCAW, but not for the audio guidance system.

  8. Hierarchical structure for audio-video based semantic classification of sports video sequences

    NASA Astrophysics Data System (ADS)

    Kolekar, M. H.; Sengupta, S.

    2005-07-01

    A hierarchical structure for sports event classification based on audio and video content analysis is proposed in this paper. Compared with event classification in other games, that of cricket is very challenging and as yet unexplored. We have successfully solved the cricket video classification problem using a six-level hierarchical structure. The first level performs event detection based on audio energy and the Zero Crossing Rate (ZCR) of the short-time audio signal. In the subsequent levels, we classify the events based on video features using a Hidden Markov Model implemented through Dynamic Programming (HMM-DP), with color or motion as the likelihood function. For some of the game-specific decisions, a rule-based classification is also performed. Our proposed hierarchical structure can easily be applied to other sports. Our results are very promising, and we have moved a step forward towards addressing semantic classification problems in general.
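
    The first-level features are simple to compute. A minimal numpy sketch, with invented frame sizes and thresholds (the paper does not publish these):

```python
import numpy as np

def frame_energy_zcr(x, frame_len=1024, hop=512):
    """Short-time energy and zero-crossing rate per frame."""
    energies, zcrs = [], []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        energies.append(np.sum(frame ** 2))
        zcrs.append(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
    return np.array(energies), np.array(zcrs)

# Crude first-level event detector: flag frames that are both loud and
# impulsive; the percentile thresholds are illustrative only.
x = np.random.randn(48000)
E, Z = frame_energy_zcr(x)
events = (E > np.percentile(E, 95)) & (Z > np.percentile(Z, 80))
```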

  9. Vibration Signaling in Mobile Devices for Emergency Alerting: A Study with Deaf Evaluators

    ERIC Educational Resources Information Center

    Harkins, Judith; Tucker, Paula E.; Williams, Norman; Sauro, Jeff

    2010-01-01

    In the United States, a nationwide Commercial Mobile Alert Service (CMAS) is being planned to alert cellular mobile device subscribers to emergencies occurring near the location of the mobile device. The plan specifies a unique audio attention signal as well as a unique vibration attention signal (for mobile devices set to vibrate) to identify…

  10. Novel Sessile Drop Software for Quantitative Estimation of Slag Foaming in Carbon/Slag Interactions

    NASA Astrophysics Data System (ADS)

    Khanna, Rita; Rahman, Mahfuzur; Leow, Richard; Sahajwalla, Veena

    2007-08-01

    Novel video-processing software has been developed for the sessile drop technique for rapid and quantitative estimation of slag foaming. The data processing was carried out in two stages: the first stage involved the initial transformation of digital video/audio signals into a format compatible with computing software, and the second stage involved the computation of the slag droplet volume and area of contact in a chosen video frame. Experimental results are presented on slag foaming in a synthetic graphite/slag system at 1550 °C. This technique can be used for determining the extent and stability of foam as a function of time.

  11. [A magnetic therapy apparatus with an adaptable electromagnetic spectrum for the treatment of prostatitis and gynecopathies].

    PubMed

    Kuz'min, A A; Meshkovskiĭ, D V; Filist, S A

    2008-01-01

    Problems of engineering and algorithm development of magnetic therapy apparatuses with pseudo-random radiation spectrum within the audio range for treatment of prostatitis and gynecopathies are considered. A typical design based on a PIC 16F microcontroller is suggested. It includes a keyboard, LCD indicator, audio amplifier, inducer, and software units. The problem of pseudo-random signal generation within the audio range is considered. A series of rectangular pulses is generated on a random-length interval on the basis of a three-component random vector. This series provides the required spectral characteristics of the therapeutic magnetic field and their adaptation to the therapeutic conditions and individual features of the patient.

  12. 47 CFR 76.66 - Satellite broadcast signal carriage.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... MULTICHANNEL VIDEO AND CABLE TELEVISION SERVICE Carriage of Television Broadcast Signals § 76.66 Satellite... satellite carrier that offers multichannel video programming distribution service in the United States to... entirety the primary video, accompanying audio, and closed captioning data contained in line 21 of the...

  13. 47 CFR 76.66 - Satellite broadcast signal carriage.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... MULTICHANNEL VIDEO AND CABLE TELEVISION SERVICE Carriage of Television Broadcast Signals § 76.66 Satellite... satellite carrier that offers multichannel video programming distribution service in the United States to... entirety the primary video, accompanying audio, and closed captioning data contained in line 21 of the...

  14. 47 CFR 76.66 - Satellite broadcast signal carriage.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... MULTICHANNEL VIDEO AND CABLE TELEVISION SERVICE Carriage of Television Broadcast Signals § 76.66 Satellite... satellite carrier that offers multichannel video programming distribution service in the United States to... entirety the primary video, accompanying audio, and closed captioning data contained in line 21 of the...

  15. 47 CFR 76.66 - Satellite broadcast signal carriage.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... MULTICHANNEL VIDEO AND CABLE TELEVISION SERVICE Carriage of Television Broadcast Signals § 76.66 Satellite... satellite carrier that offers multichannel video programming distribution service in the United States to... entirety the primary video, accompanying audio, and closed captioning data contained in line 21 of the...

  16. A Dynamical Model of Pitch Memory Provides an Improved Basis for Implied Harmony Estimation.

    PubMed

    Kim, Ji Chul

    2017-01-01

    Tonal melody can imply vertical harmony through a sequence of tones. Current methods for automatic chord estimation commonly use chroma-based features extracted from audio signals. However, the implied harmony of unaccompanied melodies can be difficult to estimate on the basis of chroma content in the presence of frequent nonchord tones. Here we present a novel approach to automatic chord estimation based on the human perception of pitch sequences. We use cohesion and inhibition between pitches in auditory short-term memory to differentiate chord tones and nonchord tones in tonal melodies. We model short-term pitch memory as a gradient frequency neural network, which is a biologically realistic model of auditory neural processing. The model is a dynamical system consisting of a network of tonotopically tuned nonlinear oscillators driven by audio signals. The oscillators interact with each other through nonlinear resonance and lateral inhibition, and the pattern of oscillatory traces emerging from the interactions is taken as a measure of pitch salience. We test the model with a collection of unaccompanied tonal melodies to evaluate it as a feature extractor for chord estimation. We show that chord tones are selectively enhanced in the response of the model, thereby increasing the accuracy of implied harmony estimation. We also find that, like other existing features for chord estimation, the performance of the model can be improved by using segmented input signals. We discuss possible ways to expand the present model into a full chord estimation system within the dynamical systems framework.
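
    As a rough feel for the model class (not the published network), the sketch below integrates a bank of tonotopically tuned Hopf oscillators driven by an audio signal, with lateral inhibition collapsed into one global term; all parameters are invented.

```python
import numpy as np

def oscillator_bank(stimulus, fs, freqs, alpha=-1.0, beta=-1.0, inhib=0.02):
    """Exponential-Euler integration of driven Hopf oscillators:
    dz/dt = z(alpha + i*2*pi*f + beta*|z|^2 - inhibition) + x(t)."""
    freqs = np.asarray(freqs, dtype=float)
    z = np.full(len(freqs), 1e-3 + 0j)
    dt = 1.0 / fs
    amp = np.empty((len(stimulus), len(freqs)))
    for n, x in enumerate(stimulus):
        lateral = inhib * (np.abs(z).sum() - np.abs(z))   # inhibition from other units
        z = z * np.exp(dt * (alpha + 2j * np.pi * freqs
                             + beta * np.abs(z) ** 2 - lateral)) + dt * x
        amp[n] = np.abs(z)
    return amp                     # oscillatory amplitude as a pitch-salience proxy

t = np.arange(0, 0.5, 1.0 / 8000.0)
amp = oscillator_bank(np.sin(2 * np.pi * 220.0 * t), 8000.0,
                      freqs=np.linspace(110.0, 880.0, 32))
```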

  17. Principles of signal conditioning.

    PubMed

    Finkel, A; Bookman, R

    2001-05-01

    It is rare for biological, physiological, chemical, electrical, or physical signals to be measured in the appropriate format for recording and interpretation. Usually, a signal must be conditioned to optimize it for both of these functions. This overview describes the fundamentals of signal filtering, how to prepare signals for A/D conversion, signal averaging to increase the signal-to-noise ratio, line frequency pickup (hum), peak-to-peak and rms noise measurements, blanking, audio monitoring, testing of electrodes and the common-mode rejection ratio.
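
    One of these fundamentals, signal averaging, is easy to demonstrate: averaging N aligned sweeps reduces uncorrelated noise by roughly sqrt(N). A small self-contained example:

```python
import numpy as np

rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1000))
sweeps = signal + rng.normal(scale=2.0, size=(256, 1000))   # 256 noisy repeats

avg = sweeps.mean(axis=0)
snr_single = signal.std() / 2.0                             # one sweep
snr_avg = signal.std() / (avg - signal).std()               # ~sqrt(256) = 16x better
```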

  18. A scheme for racquet sports video analysis with the combination of audio-visual information

    NASA Astrophysics Data System (ADS)

    Xing, Liyuan; Ye, Qixiang; Zhang, Weigang; Huang, Qingming; Yu, Hua

    2005-07-01

    As a very important category of sports video, racquet sports video, e.g. table tennis, tennis and badminton, has received little attention in past years. Considering the characteristics of this kind of sports video, we propose a new scheme for structure indexing and highlight generation based on the combination of audio and visual information. First, a supervised classification method is employed to detect important audio symbols including impact (ball hit), audience cheers, commentator speech, etc. Meanwhile, an unsupervised algorithm is proposed to group video shots into various clusters. Second, by taking advantage of the temporal relationship between audio and visual signals, we can label the scene clusters with semantic tags including rally scenes and break scenes. Third, a refinement procedure is developed to reduce false rally scenes by further audio analysis. Finally, an excitement model is proposed to rank the detected rally scenes, from which many exciting video clips such as game (match) points can be correctly retrieved. Experiments on two types of representative racquet sports video, table tennis video and tennis video, demonstrate encouraging results.

  19. A two-stage approach to removing noise from recorded music

    NASA Astrophysics Data System (ADS)

    Berger, Jonathan; Goldberg, Maxim J.; Coifman, Ronald C.

    2004-05-01

    A two-stage algorithm for removing noise from recorded music signals (first proposed in Berger et al., ICMC, 1995) is described and updated. The first stage selects the ``best'' local trigonometric basis for the signal and models noise as the part having high entropy [see Berger et al., J. Audio Eng. Soc. 42(10), 808-818 (1994)]. In the second stage, the original source and the model of the noise obtained from the first stage are expanded into dyadic trees of smooth local sine bases. The best basis for the source signal is extracted using a relative entropy function (the Kullback-Leibler distance) to compare the sum of the costs of the children nodes to the cost of their parent node; energies of the noise in corresponding nodes of the model noise tree are used as weights. The talk will include audio examples of various stages of the method and proposals for further research.
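
    The parent-versus-children cost comparison is the heart of the best-basis search. The sketch below is a simplified, hedged illustration: a DCT stands in for the smooth local sine bases, and the entropy is normalized within each node rather than globally as in the strict Coifman-Wickerhauser formulation.

```python
import numpy as np
from scipy.fft import dct

def entropy_cost(coeffs):
    p = coeffs ** 2 / np.sum(coeffs ** 2)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def best_basis(x, depth=4):
    """Keep a parent node if its entropy cost does not exceed the summed
    cost of its two children; otherwise recurse into the dyadic split."""
    if depth == 0 or len(x) < 8:
        return [x]
    parent_cost = entropy_cost(dct(x, norm='ortho'))
    half = len(x) // 2
    left = best_basis(x[:half], depth - 1)
    right = best_basis(x[half:], depth - 1)
    child_cost = sum(entropy_cost(dct(seg, norm='ortho')) for seg in left + right)
    return [x] if parent_cost <= child_cost else left + right

rng = np.random.default_rng(0)
segments = best_basis(rng.normal(size=1024))   # segments kept whole by the search
```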

  20. Radioactive Decay: Audio Data Collection

    ERIC Educational Resources Information Center

    Struthers, Allan

    2009-01-01

    Many phenomena generate interesting audible time series. This data can be collected and processed using audio software. The free software package "Audacity" is used to demonstrate the process by recording, processing, and extracting click times from an inexpensive radiation detector. The high quality of the data is demonstrated with a simple…
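
    Extracting click times from such a recording amounts to finding threshold crossings separated by a dead time. A hedged numpy sketch (the threshold and dead time are invented, and real recordings would usually need band-limiting first):

```python
import numpy as np

def click_times(audio, fs, thresh=0.5, dead_time=0.002):
    """Times (s) of suprathreshold clicks, merging samples within one click."""
    idx = np.flatnonzero(np.abs(audio) > thresh)
    times, last = [], -np.inf
    for i in idx:
        t = i / fs
        if t - last > dead_time:
            times.append(t)
            last = t
    return np.array(times)

# For Poisson decay events the inter-click waiting times are exponential,
# so their sample mean should approximate their sample standard deviation.
```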

  1. Distinguishing detection from identification in subliminal auditory perception: a review and critique of Merikle's study.

    PubMed

    Harris, J L; Salus, D; Rerecich, R; Larsen, D

    1996-01-01

    Assertions made by Merikle (1988) regarding audio subliminal messages were tested. Seventeen participants were presented subliminal messages embedded in a white-noise cover, and three signal-to-noise (S/N) detection ratios were examined. Participants were asked to guess message presence and message content, to determine subjective/objective thresholds. Results showed that participants were unable to identify target words presented in this audio subliminal stimulus format beyond chance levels.

  2. Adaptive Modulation Approach for Robust MPEG-4 AAC Encoded Audio Transmission

    DTIC Science & Technology

    2011-11-01

    as shown in Table 1, which specifies the perceptual interpretation of the ODG. The Subjective Difference Grade (SDG) is defined as SDG = Grade(signal under test) - Grade(reference), estimated using a human hearing and cognitive model [8], [9]. The freely available PEAQ basic model, “PQevalAudio,” is used in this paper, which is available as... Table 1 (fragment), ITU-R Five-Grade Impairment Scale vs. SDG/PEAQ-ODG Score [6]: Imperceptible = 5.00 / 0.00; Perceptible, but not Annoying = 4.00 / ...

  3. A Cough-Based Algorithm for Automatic Diagnosis of Pertussis.

    PubMed

    Pramono, Renard Xaviero Adhi; Imtiaz, Syed Anas; Rodriguez-Villegas, Esther

    2016-01-01

    Pertussis is a contagious respiratory disease which mainly affects young children and can be fatal if left untreated. The World Health Organization estimates 16 million pertussis cases annually worldwide, resulting in over 200,000 deaths. It is prevalent mainly in developing countries, where it is difficult to diagnose due to the lack of healthcare facilities and medical professionals. Hence, a low-cost, quick and easily accessible solution is needed to provide pertussis diagnosis in such areas to contain an outbreak. In this paper we present an algorithm for automated diagnosis of pertussis using audio signals, by analyzing cough and whoop sounds. The algorithm consists of three main blocks that perform automatic cough detection, cough classification and whooping sound detection. Each of these blocks extracts relevant features from the audio signal and subsequently classifies them using a logistic regression model. The output from these blocks is collated to provide a pertussis likelihood diagnosis. The performance of the proposed algorithm is evaluated using audio recordings from 38 patients. The algorithm is able to diagnose all pertussis cases successfully from all audio recordings without any false diagnosis. It can also automatically detect individual cough sounds with 92% accuracy and a PPV of 97%. The low complexity of the proposed algorithm, coupled with its high accuracy, demonstrates that it can be readily deployed using smartphones and can be extremely useful for quick identification or early screening of pertussis and for infection outbreak control.
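
    The per-block classifiers are plain logistic regressions over extracted audio features. A minimal sklearn sketch with synthetic placeholder features and labels (the paper's actual features are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))                       # per-event feature vectors
y = ((X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200)) > 0).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X, y)   # 1 = cough, 0 = other sound
probs = clf.predict_proba(X)[:, 1]                  # per-event cough likelihood
# Collating such outputs across the cough-detection, cough-classification
# and whoop-detection blocks yields the overall pertussis likelihood.
```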

  4. A Cough-Based Algorithm for Automatic Diagnosis of Pertussis

    PubMed Central

    Pramono, Renard Xaviero Adhi; Imtiaz, Syed Anas; Rodriguez-Villegas, Esther

    2016-01-01

    Pertussis is a contagious respiratory disease which mainly affects young children and can be fatal if left untreated. The World Health Organization estimates 16 million pertussis cases annually worldwide, resulting in over 200,000 deaths. It is prevalent mainly in developing countries, where it is difficult to diagnose due to the lack of healthcare facilities and medical professionals. Hence, a low-cost, quick and easily accessible solution is needed to provide pertussis diagnosis in such areas to contain an outbreak. In this paper we present an algorithm for automated diagnosis of pertussis using audio signals, by analyzing cough and whoop sounds. The algorithm consists of three main blocks that perform automatic cough detection, cough classification and whooping sound detection. Each of these blocks extracts relevant features from the audio signal and subsequently classifies them using a logistic regression model. The output from these blocks is collated to provide a pertussis likelihood diagnosis. The performance of the proposed algorithm is evaluated using audio recordings from 38 patients. The algorithm is able to diagnose all pertussis cases successfully from all audio recordings without any false diagnosis. It can also automatically detect individual cough sounds with 92% accuracy and a PPV of 97%. The low complexity of the proposed algorithm, coupled with its high accuracy, demonstrates that it can be readily deployed using smartphones and can be extremely useful for quick identification or early screening of pertussis and for infection outbreak control. PMID:27583523

  5. Adaptive and non-invasive equalization of the time-frequency response of a small room

    NASA Astrophysics Data System (ADS)

    Martin, Tristan

    This research concerns sound, the environment in which it propagates, the interaction between a sound wave and a transmission channel, and the changes introduced by the components of an audio chain. The specific context studied is listening to music over loudspeakers. For the environment in which a sound wave propagates, as for any transmission channel, mathematical functions characterize the changes a channel induces on the signal passing through it. An electrical signal serves as the input to a system, in this case consisting of an amplifier, a loudspeaker, and the room where listening takes place; according to its characteristics, the system returns an altered sound wave as its output at the listening position. Frequency response, impulse response, transfer function: the mathematics is no different from that commonly used to characterize a transmission channel or to express the outputs of a linear system in terms of its inputs. Naturally, this modeling exercise has a purpose: obtaining the frequency response of the amplifier/loudspeaker/room chain makes its equalization possible. In many listening contexts, a filter is inserted into the audio chain between the source (e.g., a CD player) and the amplifier/loudspeaker that converts the electrical signal into an acoustic signal propagated in the room. This filter, called an "equalizer," is intended to compensate for the frequency effects of the audio chain components and the room on the transmitted sound signal; its design properties are derived from those of the audio chain. Although analytically rigorous, the physical approach, centered on physical modeling of the loudspeaker and on the acoustic wave propagation equation, is ill-suited to rooms with complex geometry that change over time. The second approach, experimental modeling, which is the one adopted in this work, ignores physical properties: the audio chain is instead treated as a "black box" with inputs and outputs. The problem studied is the characterization of an electro-acoustic system with a single input signal, transmitted through a loudspeaker in a room, and a single output signal, picked up by a microphone at the listening position. The originality of this work lies not only in the technique developed to arrive at this characterization, but especially in the constraints imposed in order to get there. The majority of documented techniques rely on excitation signals dedicated to the measurement, signals whose characteristics simplify computing the impulse response of the audio chain: known signals are played through a loudspeaker and the room's response is captured with a microphone at the listening position. The measurement exercise itself poses problems, especially when there is an audience in the room. The response of the room may also change between the time of measurement and the time of listening, for example if the room is reconfigured, a curtain is drawn, or the stage is moved; in a theatre, the loudspeaker used may vary with the context. A survey of work proposing solutions to this problem was made. The main objective is to develop an innovative method to capture the impulse response of an audio chain without the audience's knowledge; to do this, no signal dedicated to the measurement may be used.
    The developed method captures the electro-acoustic impulse response using only the music signal in a concert hall, or the film soundtrack in a movie theatre. The result is an algorithm that models the response of a room dynamically and continuously. A finite impulse response filter acting as a digital equalizer is designed, able to adapt dynamically to the behavior of the room even when it varies over time. A multiple-spectral-resolution method builds, for different frequency bands, the filter response arising from inversion of the room/speaker frequency response. The resulting dynamically adapting filter has properties similar to those of the human ear: high spectral resolution at low frequencies and high time resolution at high frequencies. The response corrected by the filter tends toward a pure pulse. The techniques explored in this research led to the publication of a scientific article in a peer-reviewed journal and a conference paper in which similar methods were applied to mining engineering. (Abstract shortened by UMI.)
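
    At its core, equalization inverts the measured room/speaker response. The sketch below shows single-band Tikhonov-regularized inversion only; the thesis's multi-resolution, continuously adapting design is not reproduced, and the toy impulse response is invented.

```python
import numpy as np

def inverse_eq_fir(room_ir, n_taps=2048, beta=1e-3):
    """FIR equalizer from regularized frequency-domain inversion;
    beta limits the gain where the response has deep notches."""
    H = np.fft.rfft(room_ir, n_taps)
    G = np.conj(H) / (np.abs(H) ** 2 + beta)        # Tikhonov-regularized inverse
    g = np.fft.irfft(G, n_taps)
    return np.roll(g, n_taps // 2)                  # shift to make the filter causal

ir = np.zeros(512)                                  # toy room: direct path
ir[[0, 60, 200]] = [1.0, 0.5, -0.25]                # plus two reflections
eq = inverse_eq_fir(ir)
# Convolving ir with eq should approach a pure (delayed) pulse, which is
# exactly the goal stated in the abstract.
```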

  6. Finding the Correspondence of Audio-Visual Events by Object Manipulation

    NASA Astrophysics Data System (ADS)

    Nishibori, Kento; Takeuchi, Yoshinori; Matsumoto, Tetsuya; Kudo, Hiroaki; Ohnishi, Noboru

    A human being understands the objects in the environment by integrating information obtained through the senses of sight, hearing and touch. In this integration, active manipulation of objects plays an important role. We propose a method for finding the correspondence of audio-visual events by manipulating an object. The method uses general grouping rules from Gestalt psychology, i.e. “simultaneity” and “similarity” among the motion command, sound onsets and the motion of the object in images. In the experiments, we used a microphone, a camera, and a robot with a hand manipulator. The robot grasps an object such as a bell and shakes it, or grasps an object such as a stick and beats a drum, in periodic or non-periodic motion; the object then emits periodic or non-periodic events. To create a more realistic scenario, we placed another event source (a metronome) in the environment. As a result, we achieved a success rate of 73.8 percent in finding the correspondence between audio-visual events (afferent signals) related to robot motion (efferent signals).

  7. Development of a smartphone-based pulse oximeter with adaptive SNR/power balancing.

    PubMed

    Phelps, Tom; Haowei Jiang; Hall, Drew A

    2017-07-01

    Millions worldwide suffer from diseases that exhibit early warning signs detectable by standard clinical-grade diagnostic tools. Unfortunately, such tools are often prohibitively expensive for the developing world, leading to inadequate healthcare and high mortality rates. To address this problem, a smartphone-based pulse oximeter is presented that interfaces with the phone through the audio jack, enabling point-of-care measurements of heart rate (HR) and oxygen saturation (SpO2). The device is designed to utilize existing phone resources (e.g., the processor, battery, and memory), resulting in a more portable and inexpensive diagnostic tool than standalone equivalents. By adaptively tuning the LED driving signal, the device is less dependent on phone-specific audio jack properties than prior audio-jack-based work, making it universally compatible with all smartphones. We demonstrate that the pulse oximeter can adaptively optimize the signal-to-noise ratio (SNR) within the power constraints of a mobile phone (< 10 mW) while maintaining high accuracy (HR error < 3.4% and SpO2 error < 3.7%) against a clinical-grade instrument.

  8. External unit for a semi-implantable middle ear hearing device.

    PubMed

    Garverick, S L; Kane, M; Ko, W H; Maniglia, A J

    1997-06-01

    A miniaturized, low-power external unit has been developed for the clinical trials of a semi-implantable middle ear electromagnetic hearing device (SIMEHD), which uses radio-frequency telemetry to couple sound signals to the internal unit. The external unit is based on a commercial hearing aid, which provides proven audio amplification and compression. Its receiver is replaced by an application-specific integrated circuit (ASIC) which: 1) adjusts the direct-current bias of the audio input according to its peak value; 2) converts the audio signal to a one-bit digital form using sigma-delta modulation; 3) modulates the sigma-delta output with a radio-frequency (RF) oscillator; and 4) drives the external RF coil and tuning capacitor using a field-effect transistor operated in class D. The external unit functions as expected and has been used in bench-top tests of the SIMEHD. Measured current consumption is 1.65-2.15 mA, which projects to a battery lifetime of about 15 days. Bandwidth is 6 kHz and harmonic distortion is about 2%.
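
    Step 2 of the ASIC, one-bit sigma-delta conversion, is easy to illustrate in software. A first-order modulator sketch (the ASIC's actual order and rates are not given in the abstract):

```python
import numpy as np

def sigma_delta(audio):
    """First-order sigma-delta modulator: audio in [-1, 1] -> one-bit stream."""
    integrator = 0.0
    bits = np.empty(len(audio), dtype=np.int8)
    for n, x in enumerate(audio):
        bits[n] = 1 if integrator >= 0 else -1
        integrator += x - bits[n]                   # feed back the quantized value
    return bits

t = np.linspace(0, 0.01, 480)
stream = sigma_delta(0.5 * np.sin(2 * np.pi * 1000 * t))
# The one-bit stream then keys the RF oscillator that drives the external
# coil, telemetering audio across the skin to the implanted unit.
```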

  9. Localization with Sparse Acoustic Sensor Network Using UAVs as Information-Seeking Data Mules

    DTIC Science & Technology

    2013-05-01

    technique to differentiate among several sources. 2.2. AoA Estimation. AoA Models: the kth of N_AOA AoA sensors produces an angular measurement modeled... squares sense: $\hat{\theta} = \arg\min_{\phi} \sum_{i=1}^{3} \left( \hat{\tau}_{i0} - e_{\phi}^{T} r_{i} \right)^{2}$ (9). The minimization was done by gridding the one-dimensional angular space and finding the optimum... Latitude E5500 laptop running FreeBSD and custom Java applications to process and store the raw audio signals. Power Source: The laptop was powered for an
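
    Eq. (9) above invites a one-dimensional grid search. The sketch below implements that minimization under stated assumptions: the TDOAs are taken as already converted to range differences, the sensor geometry is planar, and the variable names are invented.

```python
import numpy as np

def aoa_grid_search(tdoa_ranges, sensor_offsets, grid_deg=np.arange(0, 360, 0.5)):
    """theta_hat = argmin_phi sum_i (tau_i0 - e_phi^T r_i)^2, by gridding phi.

    tdoa_ranges    : (N,) measured range differences c * tau_i0
    sensor_offsets : (N, 2) sensor positions r_i relative to the reference
    """
    phi = np.deg2rad(grid_deg)
    e = np.stack([np.cos(phi), np.sin(phi)], axis=1)     # unit bearing vectors
    resid = tdoa_ranges[None, :] - e @ sensor_offsets.T  # (grid, N) residuals
    return grid_deg[np.argmin(np.sum(resid ** 2, axis=1))]
```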

  10. Speech to Text Translation for Malay Language

    NASA Astrophysics Data System (ADS)

    Al-khulaidi, Rami Ali; Akmeliawati, Rini

    2017-11-01

    The speech recognition system is a front-end and back-end process that receives an audio signal uttered by a speaker and converts it into a text transcription. Speech systems can be used in several fields, including therapeutic technology, education, social robotics and computer entertainment. Our system is proposed for control tasks, where speed of performance and response matter because the system should integrate with other controlling platforms, such as voice-controlled robots. This creates a need for flexible platforms that can easily be edited to match the functionality of their surroundings, unlike software such as MATLAB and Phoenix, which requires recorded audio and multiple training passes for every entry. In this paper, a speech recognition system for the Malay language is implemented using Microsoft Visual Studio C#. Ninety Malay phrases were tested by ten speakers of both genders in different contexts. The results show that the overall accuracy (calculated from the confusion matrix) is satisfactory at 92.69%.

  11. Authenticity examination of compressed audio recordings using detection of multiple compression and encoders' identification.

    PubMed

    Korycki, Rafal

    2014-05-01

    Since the appearance of digital audio recordings, audio authentication has become increasingly difficult. Currently available technologies and free editing software allow a forger to cut or paste any single word without audible artifacts. Nowadays, the only method for digital audio files commonly approved by forensic experts is the ENF criterion, which consists in fluctuation analysis of the mains frequency induced in the electronic circuits of recording devices. Its effectiveness is therefore strictly dependent on the presence of the mains signal in the recording, which is a rare occurrence. Recently, much attention has been paid to authenticity analysis of compressed multimedia files, and several solutions have been proposed for the detection of double compression in both digital video and digital audio. This paper addresses the problem of tampering detection in compressed audio files and discusses new methods that can be used for authenticity analysis of digital recordings. The presented approaches consist in evaluating statistical features extracted from the MDCT coefficients, as well as other parameters that may be obtained from compressed audio files. The calculated feature vectors are used for training selected machine learning algorithms. Detecting multiple compression uncovers tampering activity, as does identifying traces of montage in digital audio recordings. To enhance the methods' robustness, an encoder identification algorithm based on analysis of the inherent parameters of compression was developed and applied. The effectiveness of the tampering detection algorithms is tested on a predefined large music database consisting of nearly one million compressed audio files. The influence of the compression algorithms' parameters on classification performance is discussed, based on the results of the current study.
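
    The pipeline shape (coefficient statistics in, trained classifier out) can be hinted at with a toy sketch. Everything below is a stand-in: a windowed DCT approximates the MDCT, the features are crude moments, the labels are synthetic, and a random forest is just one plausible choice of learner, not the paper's.

```python
import numpy as np
from scipy.fft import dct
from sklearn.ensemble import RandomForestClassifier

def mdct_like_features(x, frame=1152):
    """Moments of windowed, 50%-overlapped DCT magnitudes (MDCT stand-in)."""
    hop = frame // 2
    win = np.sin(np.pi * (np.arange(frame) + 0.5) / frame)   # MDCT-style window
    frames = [x[i:i + frame] * win for i in range(0, len(x) - frame, hop)]
    C = np.abs(dct(np.array(frames), axis=1, norm='ortho'))
    return np.hstack([C.mean(axis=0), C.std(axis=0)])[:64]

rng = np.random.default_rng(0)
X = np.array([mdct_like_features(rng.normal(size=48000)) for _ in range(20)])
y = np.repeat([0, 1], 10)           # 0 = single, 1 = double compression (synthetic)
clf = RandomForestClassifier(n_estimators=50).fit(X, y)
```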

  12. Defining Audio/Video Redundancy from a Limited Capacity Information Processing Perspective.

    ERIC Educational Resources Information Center

    Lang, Annie

    1995-01-01

    Investigates whether audio/video redundancy improves memory for television messages. Suggests a theoretical framework for classifying previous work and reinterpreting the results. Suggests general support for the notion that redundancy levels affect the capacity requirements of the message, which impact differentially on audio or visual…

  13. Optical Interferometric Measurement of Skin Vibration for the Diagnosis of Cardiovascular Diseases.

    NASA Astrophysics Data System (ADS)

    Hong, Hyundae

    A system has been developed based on the measurement of skin surface vibration, which is related to the underlying vascular wall motion for the superficial arteries and to coronary movement for the chest wall. The data obtained suggest that the information detected by such measurements can be related to the derivative of the intravascular pressure, an important physiological parameter. These results are in contrast to conventional optical Doppler techniques, which have been utilized to measure blood perfusion in the skin layers and blood flow within the superficial arteries and which relied on the interaction between incident photons and moving red blood cells. The present system uses an optical interferometer with a 633 nm HeNe laser to detect micrometer-scale displacements of the skin surface. A photodiode detects an optical Doppler shift signal of frequency 2v/λ, where v and λ are the skin vibration velocity and the wavelength of the laser, respectively. The electronic processing system we developed enhances, cleans and processes the raw Doppler signal to produce two main outputs: Doppler audio, and a time-domain profile of the skin velocity. The audio signal changes its tone according to the velocity of skin movement, which is related to the first derivative of the intravascular pressure and to the internal structure of the intervening tissue layers between the vessel and the surface. The results obtained demonstrated that the skin velocity waveforms near each artery and the chest signals at the auscultation points for the four heart valve sounds were unique in their profiles. It also proved possible to measure the magnitude, harmonics, and cardiovascular propagation delay of pulse waves. The theoretical and experimental results demonstrated that the system detects the skin velocity, which is related to the time derivative of the pressure. It also reduces the loading effect on the pulsation signals and heart sounds produced by conventional piezoelectric vibration sensors. The system sensitivity, which could potentially be optimized further, was 366.2 μm/s for the present research. Overall, optical cardiovascular vibrometry has the potential to become a simple, noninvasive approach to cardiovascular screening.
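
    The conversion from the photodiode signal to skin velocity follows directly from f_D = 2v/λ. A hedged sketch using Hilbert-transform demodulation (one of several possible demodulators; the paper's processing chain is analog):

```python
import numpy as np
from scipy.signal import hilbert

def skin_velocity(doppler, fs, lam=633e-9):
    """Velocity from the instantaneous Doppler frequency: v = f_D * lambda / 2."""
    phase = np.unwrap(np.angle(hilbert(doppler)))
    f_inst = np.gradient(phase) * fs / (2 * np.pi)   # Hz
    return f_inst * lam / 2.0                        # m/s
```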

  14. Power line detection system

    DOEpatents

    Latorre, V.R.; Watwood, D.B.

    1994-09-27

    A short-range, radio frequency (RF) transmitting-receiving system that provides both visual and audio warnings to the pilot of a helicopter or light aircraft of an upcoming power transmission line complex. Small, milliwatt-level narrowband transmitters, powered by the transmission line itself, are installed on top of selected transmission line support towers or within existing warning balls, and provide a continuous RF signal to approaching aircraft. The on-board receiver can be either a separate unit or a portion of the existing avionics, and can also share an existing antenna with another airborne system. Upon receipt of a warning signal, the receiver will trigger a visual and an audio alarm to alert the pilot to the potential power line hazard. 4 figs.

  15. Electrostatic Graphene Loudspeaker

    DTIC Science & Technology

    2013-06-01

    millennia, with classic examples being drumheads and whistles for long-range communications and entertainment.4 In modern society, efficient small...harmonic oscillator. Unlike most insect or musical instrument resonators, which exhibit a lightly damped, sharp frequency response, a wide-band audio...sound signal is introduced from a signal generator or from a commercial laptop or digital music player. The maximum amplitude of the input signal Vin

  16. Aided Electrophysiology Using Direct Audio Input: Effects of Amplification and Absolute Signal Level

    PubMed Central

    Billings, Curtis J.; Miller, Christi W.; Tremblay, Kelly L.

    2016-01-01

    Purpose This study investigated (a) the effect of amplification on cortical auditory evoked potentials (CAEPs) at different signal levels when signal-to-noise ratios (SNRs) were equated between unaided and aided conditions, and (b) the effect of absolute signal level on aided CAEPs when SNR was held constant. Method CAEPs were recorded from 13 young adults with normal hearing. A 1000-Hz pure tone was presented in unaided and aided conditions with a linear analog hearing aid. Direct audio input was used, allowing recorded hearing aid noise floor to be added to unaided conditions to equate SNRs between conditions. An additional stimulus was created through scaling the noise floor to study the effect of signal level. Results Amplification resulted in delayed N1 and P2 peak latencies relative to the unaided condition. An effect of absolute signal level (when SNR was constant) was present for aided CAEP area measures, such that larger area measures were found at higher levels. Conclusion Results of this study further demonstrate that factors in addition to SNR must also be considered before CAEPs can be used clinically to measure aided thresholds. PMID:26953543

  17. Learning multimodal dictionaries.

    PubMed

    Monaci, Gianluca; Jost, Philippe; Vandergheynst, Pierre; Mailhé, Boris; Lesage, Sylvain; Gribonval, Rémi

    2007-09-01

    Real-world phenomena involve complex interactions between multiple signal modalities. As a consequence, humans integrate at each instant the perceptions of all their senses in order to enrich their understanding of the surrounding world. This paradigm can also be extremely useful in many signal processing and computer vision problems involving mutually related signals. The simultaneous processing of multimodal data can, in fact, reveal information that is otherwise hidden when considering the signals independently. However, in natural multimodal signals, the statistical dependencies between modalities are in general not obvious. Learning fundamental multimodal patterns could offer deep insight into the structure of such signals. In this paper, we present a novel model of multimodal signals based on their sparse decomposition over a dictionary of multimodal structures. We also propose an algorithm for iteratively learning multimodal generating functions that can be shifted to all positions in the signal. The learning is defined in such a way that it can be accomplished by iteratively solving a generalized eigenvector problem, which makes the algorithm fast, flexible, and free of user-defined parameters. The proposed algorithm is applied to audiovisual sequences and is able to discover underlying structures in the data. The detection of such audio-video patterns in audiovisual clips makes it possible to localize the sound source in the video in the presence of substantial acoustic and visual distractors, outperforming state-of-the-art audiovisual localization algorithms.

  18. Reduction in time-to-sleep through EEG based brain state detection and audio stimulation.

    PubMed

    Zhuo Zhang; Cuntai Guan; Ti Eu Chan; Juanhong Yu; Aung Aung Phyo Wai; Chuanchu Wang; Haihong Zhang

    2015-08-01

    We developed an EEG- and audio-based sleep sensing and enhancing system, called iSleep (interactive Sleep enhancement apparatus). The system adopts a closed-loop approach that optimizes the audio recording selection based on the user's sleep status, detected through our online EEG computing algorithm. The iSleep prototype comprises two major parts: 1) a sleeping mask integrated with a single-channel EEG electrode and amplifier, a pair of stereo earphones, and a microcontroller with wireless circuitry for control and data streaming; 2) a mobile app that receives EEG signals for online sleep monitoring and controls audio playback. In this study we attempt to validate our hypothesis that appropriate audio stimulation in relation to brain state can induce faster onset of sleep and improve the quality of a nap. We conducted experiments on 28 healthy subjects, each undergoing two nap sessions - one with a quiet background and one with our audio stimulation. We compared the time-to-sleep in both sessions between two groups of subjects, i.e., fast and slow sleep-onset groups. The p-value obtained from the Wilcoxon signed-rank test is 1.22e-04 for the slow-onset group, which demonstrates that iSleep can significantly reduce the time-to-sleep for people with difficulty falling asleep.
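
    The reported statistic comes from a paired non-parametric comparison of the two nap sessions. An illustrative scipy call with made-up numbers (not the study's data):

```python
import numpy as np
from scipy.stats import wilcoxon

quiet = np.array([22, 18, 30, 25, 27, 19, 24, 28, 21, 26])   # minutes to sleep
audio = np.array([15, 14, 21, 20, 19, 16, 18, 22, 17, 20])
stat, p = wilcoxon(quiet, audio)    # paired test across the same subjects
```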

  19. Land Mobile Satellite Service (LMSS) channel simulator: An end-to-end hardware simulation and study of the LMSS communications links

    NASA Technical Reports Server (NTRS)

    Salmasi, A. B. (Editor); Springett, J. C.; Sumida, J. T.; Richter, P. H.

    1984-01-01

    The design and implementation of the Land Mobile Satellite Service (LMSS) channel simulator as a facility for end-to-end hardware simulation of the LMSS communications links, primarily with the mobile terminal, is described. A number of studies are reported which show the application of the channel simulator as a facility for validation and assessment of the LMSS design requirements and capabilities, by performing quantitative measurements and qualitative audio evaluations for various link design parameters and channel impairments under simulated LMSS operating conditions. As a first application, the LMSS channel simulator was used in the evaluation of a system based on the voice processing and modulation (e.g., NBFM with 30 kHz of channel spacing and a 2 kHz rms frequency deviation for average talkers) selected for the Bell System's Advanced Mobile Phone Service (AMPS). The various details of the hardware design, qualitative audio evaluation techniques, signal-to-channel-impairment measurement techniques, the justification of the criteria used in selecting the different parameters of the voice processing and modulation methods, and the results of a number of parametric studies are further described.

  20. Proposed patient motion monitoring system using feature point tracking with a web camera.

    PubMed

    Miura, Hideharu; Ozawa, Shuichi; Matsuura, Takaaki; Yamada, Kiyoshi; Nagata, Yasushi

    2017-12-01

    Patient motion monitoring systems play an important role in providing accurate treatment dose delivery. We propose a system that utilizes a web camera (frame rate up to 30 fps, maximum resolution of 640 × 480 pixels) and in-house image processing software (developed using Microsoft Visual C++ and OpenCV). This system is simple to use and convenient to set up. The pyramidal Lucas-Kanade method was applied to calculate the motion of each feature point by analysing two consecutive frames. The image processing software employs a color scheme in which the defined feature points are blue under stable (no movement) conditions and turn red, along with a warning message and an audio signal (beeping alarm), for large patient movements. The initial position of the marker was used by the program to determine the marker positions in all the frames. The software generates a text file that contains the calculated motion for each frame and saves the video as a compressed audio video interleave (AVI) file. The proposed patient motion monitoring system, based on a web camera that is simple and convenient to set up, increases the safety of treatment delivery.
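
    The tracking loop maps directly onto OpenCV. A minimal sketch of pyramidal Lucas-Kanade feature tracking with an alarm threshold; the threshold and corner-detection parameters are illustrative, not the authors' settings:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)                       # web camera
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=20,
                              qualityLevel=0.3, minDistance=10)

while ok:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    motion = np.linalg.norm(new_pts - pts, axis=2).max()
    if motion > 5.0:                            # pixels; illustrative threshold
        print('\aLarge patient movement!')      # stand-in for red points + alarm
    prev_gray, pts = gray, new_pts
```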

  1. Modeling of the ground-to-SSFMB link networking features using SPW

    NASA Technical Reports Server (NTRS)

    Watson, John C.

    1993-01-01

    This report describes the modeling and simulation of the networking features of the ground-to-Space Station Freedom manned base (SSFMB) link using the Comdisco Signal Processing WorkSystem (SPW). The networking features modeled include the implementation of Consultative Committee for Space Data Systems (CCSDS) protocols in the multiplexing of digitized audio and core data into virtual channel data units (VCDUs) in the control center complex, and the demultiplexing of VCDUs in the onboard baseband signal processor. The emphasis of this work has been placed on techniques for modeling the CCSDS networking features using SPW. The objectives for developing the SPW models are to test the suitability of SPW for modeling networking features and to develop SPW simulation models of the control center complex and space station baseband signal processor for use in end-to-end testing of the ground-to-SSFMB S-band single access forward (SSAF) link.

  2. Using online handwriting and audio streams for mathematical expressions recognition: a bimodal approach

    NASA Astrophysics Data System (ADS)

    Medjkoune, Sofiane; Mouchère, Harold; Petitrenaud, Simon; Viard-Gaudin, Christian

    2013-01-01

    The work reported in this paper concerns the problem of mathematical expression recognition, a task known to be very hard. We propose to alleviate the difficulties by taking into account two complementary modalities: handwriting and audio. To combine the signals coming from both modalities, various fusion methods are explored. Performance evaluated on the HAMEX dataset shows a significant improvement compared to a single-modality (handwriting) based system.

  3. Variance fluctuations in nonstationary time series: a comparative study of music genres

    NASA Astrophysics Data System (ADS)

    Jennings, Heather D.; Ivanov, Plamen Ch.; De Martins, Allan M.; da Silva, P. C.; Viswanathan, G. M.

    2004-05-01

    An important problem in physics concerns the analysis of audio time series generated by transduced acoustic phenomena. Here, we develop a new method to quantify the scaling properties of the local variance of nonstationary time series. We apply this technique to analyze audio signals obtained from selected genres of music. We find quantitative differences in the correlation properties of high art music, popular music, and dance music. We discuss the relevance of these objective findings in relation to the subjective experience of music.
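
    A simplified version of such a local-variance scaling measure (a box-averaging variant, not the paper's exact estimator) fits a power law to the fluctuations of per-window standard deviations; it needs a long recording to be meaningful:

```python
import numpy as np

def variance_scaling_exponent(audio, win=1024, scales=(4, 8, 16, 32, 64, 128)):
    """Slope of log F(s) vs log s, where F(s) is the std of s-box means of
    the local standard-deviation series."""
    local_sd = np.array([audio[i:i + win].std()
                         for i in range(0, len(audio) - win, win)])
    F = [np.std([local_sd[i:i + s].mean()
                 for i in range(0, len(local_sd) - s, s)]) for s in scales]
    return np.polyfit(np.log(scales), np.log(F), 1)[0]

alpha = variance_scaling_exponent(np.random.randn(2 ** 20))  # ~uncorrelated noise
```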

  4. Channel Compensation for Speaker Recognition using MAP Adapted PLDA and Denoising DNNs

    DTIC Science & Technology

    2016-06-21

    improvement has been the availability of large quantities of speaker-labeled data from telephone recordings. For new data applications, such as audio from...microphone channels to the telephone channel. Audio files were rejected if the alignment process failed. At the end of the process a total of 873...Microphones: 01 AT3035 (Audio Technica Studio Mic); 02 MX418S (Shure Gooseneck Mic); 03 Crown PZM Soundgrabber II; 04 AT Pro45 (Audio Technica Hanging Mic

  5. Audio-vocal interaction in single neurons of the monkey ventrolateral prefrontal cortex.

    PubMed

    Hage, Steffen R; Nieder, Andreas

    2015-05-06

    Complex audio-vocal integration systems depend on a strong interconnection between the auditory system and the vocal motor system. To gain cognitive control over audio-vocal interaction during vocal motor control, the PFC needs to be involved. Neurons in the ventrolateral PFC (VLPFC) have been shown to separately encode the sensory perception and motor production of vocalizations. It is unknown, however, whether single neurons in the PFC reflect audio-vocal interactions. We therefore recorded single-unit activity in the VLPFC of rhesus monkeys (Macaca mulatta) while they produced vocalizations on command or passively listened to monkey calls. We found that 12% of randomly selected neurons in the VLPFC modulated their discharge rate in response to acoustic stimulation with species-specific calls. Almost three-fourths of these auditory neurons showed an additional modulation of their discharge rates before and/or during the monkeys' motor production of vocalization. Based on these audio-vocal interactions, the VLPFC might be well positioned to combine higher-order auditory processing with cognitive control of the vocal motor output. Such audio-vocal integration processes in the VLPFC might constitute a precursor for the evolution of complex learned audio-vocal integration systems, ultimately giving rise to human speech.

  6. Objective Assessment of Patient Inhaler User Technique Using an Audio-Based Classification Approach.

    PubMed

    Taylor, Terence E; Zigel, Yaniv; Egan, Clarice; Hughes, Fintan; Costello, Richard W; Reilly, Richard B

    2018-02-01

    Many patients make critical user technique errors when using pressurised metered dose inhalers (pMDIs), which reduce the clinical efficacy of respiratory medication. Such critical errors include poor actuation coordination (poor timing of medication release during inhalation) and inhaling too fast (peak inspiratory flow rate over 90 L/min). Here, we present a novel audio-based method that objectively assesses patient pMDI user technique. The Inhaler Compliance Assessment device was employed to record inhaler audio signals from 62 respiratory patients as they used a pMDI with an In-Check Flo-Tone device attached to the inhaler mouthpiece. Using a quadratic discriminant analysis approach, the audio-based method achieved a total frame-by-frame accuracy of 88.2% in classifying sound events (actuation, inhalation and exhalation). The audio-based method estimated the peak inspiratory flow rate and volume of inhalations with accuracies of 88.2% and 83.94%, respectively. We found that 89% of patients made at least one critical user technique error, even after tuition from an expert clinical reviewer. This method provides a more clinically accurate assessment of patient inhaler user technique than standard checklist methods.
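
    The frame classifier itself is a standard quadratic discriminant. A minimal sklearn sketch with synthetic placeholder features (the paper's acoustic features are not reproduced):

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=m, size=(100, 6)) for m in (-1.0, 0.0, 1.0)])
y = np.repeat([0, 1, 2], 100)       # 0 = actuation, 1 = inhalation, 2 = exhalation

qda = QuadraticDiscriminantAnalysis().fit(X, y)
frame_labels = qda.predict(X)       # frame-by-frame sound-event decisions
```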

  7. "Listen to This!" Utilizing Audio Recordings to Improve Instructor Feedback on Writing in Mathematics

    ERIC Educational Resources Information Center

    Weld, Christopher

    2014-01-01

    Providing audio files in lieu of written remarks on graded assignments is arguably a more effective means of feedback, allowing students to better process and understand the critique and improve their future work. With emerging technologies and software, this audio feedback alternative to the traditional paradigm of providing written comments…

  8. Audiovisual focus of attention and its application to Ultra High Definition video compression

    NASA Astrophysics Data System (ADS)

    Rerabek, Martin; Nemoto, Hiromi; Lee, Jong-Seok; Ebrahimi, Touradj

    2014-02-01

    Using Focus of Attention (FoA) as a perceptual process in image and video compression is a well-known approach to increasing coding efficiency. It has been shown that foveated coding, in which compression quality varies across the image according to the region of interest, is more efficient than conventional coding, in which all regions are compressed in the same way. However, widespread use of such foveated compression has been prevented by two main conflicting factors, namely the complexity and the efficiency of algorithms for FoA detection. One way around these is to use as much information as possible from the scene. Since most video sequences have associated audio, and moreover, in many cases there is a correlation between the audio and the visual content, audiovisual FoA can improve the efficiency of the detection algorithm while remaining of low complexity. This paper discusses a simple yet efficient audiovisual FoA algorithm based on the correlation of dynamics between audio and video signal components. The results of the audiovisual FoA detection algorithm are subsequently taken into account for foveated coding and compression. This approach is implemented in an H.265/HEVC encoder producing a bitstream which is fully compliant with any H.265/HEVC decoder. The influence of audiovisual FoA on the perceived quality of high and ultra-high definition audiovisual sequences is explored and the gain in compression efficiency is analyzed.
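
    A minimal sketch of the "correlation of dynamics" idea, assuming the audio envelope and per-region visual motion signals are available at the video frame rate; the region layout, helper names and scoring rule are invented for illustration, not taken from the paper.

    ```python
    import numpy as np

    def energy_envelope(audio, sr, fps):
        """Per-video-frame audio energy."""
        spf = sr // fps  # audio samples per video frame
        n = len(audio) // spf
        return np.array([np.sum(audio[i*spf:(i+1)*spf] ** 2) for i in range(n)])

    def motion_activity(frames):
        """Mean absolute frame difference; frames: array of shape (T, H, W)."""
        return np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))

    def foa_region(envelope, region_motions):
        """Pick the candidate region whose motion best tracks the audio envelope."""
        scores = [np.corrcoef(envelope[1:], m)[0, 1] for m in region_motions]
        return int(np.argmax(scores))
    ```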

  9. Performance enhancement for audio-visual speaker identification using dynamic facial muscle model.

    PubMed

    Asadpour, Vahid; Towhidkhah, Farzad; Homayounpour, Mohammad Mehdi

    2006-10-01

    The science of human identification using physiological characteristics, or biometry, has been of great concern in security systems. However, robust multimodal identification systems based on audio-visual information have not yet been thoroughly investigated. The aim of this work is therefore to propose a model-based feature extraction method which employs the physiological characteristics of the facial muscles producing lip movements. This approach adopts intrinsic muscle properties such as viscosity, elasticity, and mass, which are extracted from the dynamic lip model. These parameters are exclusively dependent on the neuro-muscular properties of the speaker; consequently, imitation of valid speakers could be reduced to a large extent. The parameters are applied to a hidden Markov model (HMM) audio-visual identification system. In this work, a combination of audio and video features has been employed by adopting a multistream pseudo-synchronized HMM training method. Noise-robust audio features such as Mel-frequency cepstral coefficients (MFCC), spectral subtraction (SS), and relative spectra perceptual linear prediction (J-RASTA-PLP) were used to evaluate the performance of the multimodal system once efficient audio feature extraction methods had been utilized. The superior performance of the proposed system is demonstrated on a large multispeaker database of continuously spoken digits, along with a phonetically rich sentence. To evaluate the robustness of the algorithms, some experiments were performed on genetically identical twins. Furthermore, changes in speaker voice were simulated with drug inhalation tests. At 3 dB signal-to-noise ratio (SNR), the dynamic muscle model improved the identification rate of the audio-visual system from 91 to 98%. Results on identical twins revealed an apparent improvement in performance for the dynamic muscle model-based system, whose audio-visual identification rate was enhanced from 87 to 96%.

  10. Talker variability in audio-visual speech perception

    PubMed Central

    Heald, Shannon L. M.; Nusbaum, Howard C.

    2014-01-01

    A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories, and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker’s face, speech recognition is improved under adverse listening (e.g., noise or distortion) conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker’s face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to the audio-only condition. These results suggest that seeing a talker’s face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred. PMID:25076919

  11. Talker variability in audio-visual speech perception.

    PubMed

    Heald, Shannon L M; Nusbaum, Howard C

    2014-01-01

    A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories, and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker's face, speech recognition is improved under adverse listening (e.g., noise or distortion) conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to the audio-only condition. These results suggest that seeing a talker's face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred.

  12. Low-cost synchronization of high-speed audio and video recordings in bio-acoustic experiments.

    PubMed

    Laurijssen, Dennis; Verreycken, Erik; Geipel, Inga; Daems, Walter; Peremans, Herbert; Steckel, Jan

    2018-02-27

    In this paper, we present a method for synchronizing high-speed audio and video recordings of bio-acoustic experiments. By embedding a random signal into the recorded video and audio data, robust synchronization of a diverse set of sensor streams can be performed without the need to keep detailed records. The synchronization can be performed using recording devices without dedicated synchronization inputs. We demonstrate the efficacy of the approach in two sets of experiments: behavioral experiments on different species of echolocating bats and the recordings of field crickets. We present the general operating principle of the synchronization method, discuss its synchronization strength and provide insights into how to construct such a device using off-the-shelf components. © 2018. Published by The Company of Biologists Ltd.
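
    As a hedged sketch (not the authors' implementation) of how a shared embedded random signal permits post-hoc alignment, the snippet below recovers a device's unknown offset by cross-correlating its recording against the known pseudo-noise reference; lengths and noise levels are invented.

    ```python
    import numpy as np

    rng = np.random.default_rng(42)
    pn = rng.standard_normal(8000)            # shared pseudo-noise reference
    offset = 1234                             # unknown lag of this device (samples)
    rec = np.concatenate([np.zeros(offset), pn])
    rec += 0.2 * rng.standard_normal(len(rec))  # recording noise

    # Cross-correlate the recording against the known reference; the lag of
    # the peak recovers where the reference starts in this device's stream.
    xcorr = np.correlate(rec, pn, mode="valid")
    print(int(np.argmax(xcorr)))              # ~1234; repeat per sensor stream
    ```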

  13. Seismic intrusion detector system

    DOEpatents

    Hawk, Hervey L.; Hawley, James G.; Portlock, John M.; Scheibner, James E.

    1976-01-01

    A system for monitoring man-associated seismic movements within a control area including a geophone for generating an electrical signal in response to seismic movement, a bandpass amplifier and threshold detector for eliminating unwanted signals, a pulse-counting system for counting and storing the number of seismic movements within the area, and a monitoring system, operable on command, having a variable-frequency oscillator generating an audio frequency signal proportional to the number of said seismic movements.

  14. Audio Watermark Embedding Technique Applying Auditory Stream Segregation: "G-encoder Mark" Able to Be Extracted by Mobile Phone

    NASA Astrophysics Data System (ADS)

    Modegi, Toshio

    We are developing audio watermarking techniques which enable extraction of embedded data by mobile phones. For this, data must be embedded in frequency ranges where the auditory response is prominent, so data embedding introduces considerable audible noise. We previously proposed exploiting a two-channel stereo playback feature, in which the noise generated by the data-embedded left-channel signal is reduced by the right-channel signal. However, this proposal has the practical problem of restricting the location of the extracting terminal. In this paper, we propose synthesizing the noise-reducing right-channel signal with the left-channel signal, cancelling the noise by inducing an auditory stream segregation phenomenon in listeners. This new proposal makes a separate noise-reducing right channel unnecessary and supports monaural playback. Moreover, we propose a wide-band embedding method causing dual auditory stream segregation phenomena, which enables data embedding across the whole public telephone frequency range and stable extraction with 3G mobile phones. With these proposals, extraction precision is higher than with the previously proposed method, while the quality degradation of the embedded signals is smaller. In this paper we present an overview of our newly proposed method and experimental results compared with those of the previously proposed method.

  15. Calibration of Clinical Audio Recording and Analysis Systems for Sound Intensity Measurement.

    PubMed

    Maryn, Youri; Zarowski, Andrzej

    2015-11-01

    Sound intensity is an important acoustic feature of voice/speech signals. Yet recordings are performed with different microphone, amplifier, and computer configurations, and it is therefore crucial to calibrate sound intensity measures of clinical audio recording and analysis systems on the basis of output of a sound-level meter. This study was designed to evaluate feasibility, validity, and accuracy of calibration methods, including audiometric speech noise signals and human voice signals under typical speech conditions. Calibration consisted of 3 comparisons between data from 29 measurement microphone-and-computer systems and data from the sound-level meter: signal-specific comparison with audiometric speech noise at 5 levels, signal-specific comparison with natural voice at 3 levels, and cross-signal comparison with natural voice at 3 levels. Intensity measures from recording systems were then linearly converted into calibrated data on the basis of these comparisons, and validity and accuracy of calibrated sound intensity were investigated. Very strong correlations and quasisimilarity were found between calibrated data and sound-level meter data across calibration methods and recording systems. Calibration of clinical sound intensity measures according to this method is feasible, valid, accurate, and representative for a heterogeneous set of microphones and data acquisition systems in real-life circumstances with distinct noise contexts.

  16. Astronomical component estimation (ACE v.1) by time-variant sinusoidal modeling

    NASA Astrophysics Data System (ADS)

    Sinnesael, Matthias; Zivanovic, Miroslav; De Vleeschouwer, David; Claeys, Philippe; Schoukens, Johan

    2016-09-01

    Accurately deciphering periodic variations in paleoclimate proxy signals is essential for cyclostratigraphy. Classical spectral analysis often relies on methods based on (fast) Fourier transformation. This technique has no unique solution separating variations in amplitude and frequency. This characteristic can make it difficult to correctly interpret a proxy's power spectrum or to accurately evaluate simultaneous changes in amplitude and frequency in evolutionary analyses. This drawback is circumvented by using a polynomial approach to estimate instantaneous amplitude and frequency in orbital components. This approach was proven useful to characterize audio signals (music and speech), which are non-stationary in nature. Paleoclimate proxy signals and audio signals share similar dynamics; the only difference is the frequency relationship between the different components. A harmonic-frequency relationship exists in audio signals, whereas this relation is non-harmonic in paleoclimate signals. However, this difference is irrelevant for the problem of separating simultaneous changes in amplitude and frequency. Using an approach with overlapping analysis frames, the model (Astronomical Component Estimation, version 1: ACE v.1) captures time variations of an orbital component by modulating a stationary sinusoid centered at its mean frequency, with a single polynomial. Hence, the parameters that determine the model are the mean frequency of the orbital component and the polynomial coefficients. The first parameter depends on geologic interpretations, whereas the latter are estimated by means of linear least-squares. As output, the model provides the orbital component waveform, either in the depth or time domain. Uncertainty analyses of the model estimates are performed using Monte Carlo simulations. Furthermore, it allows for a unique decomposition of the signal into its instantaneous amplitude and frequency. Frequency modulation patterns reconstruct changes in accumulation rate, whereas amplitude modulation identifies eccentricity-modulated precession. The functioning of the time-variant sinusoidal model is illustrated and validated using a synthetic insolation signal. The new modeling approach is tested on two case studies: (1) a Pliocene-Pleistocene benthic δ18O record from Ocean Drilling Program (ODP) Site 846 and (2) a Danian magnetic susceptibility record from the Contessa Highway section, Gubbio, Italy.
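
    A minimal single-frame sketch of the modeling idea (ACE v.1 applies this within overlapping analysis frames): a stationary sinusoid at an assumed mean frequency f0 is modulated by a polynomial in quadrature form, and the coefficients follow from linear least squares. The test signal, f0 and polynomial order are illustrative.

    ```python
    import numpy as np

    t = np.linspace(0.0, 10.0, 2000)
    f0 = 2.0                                  # assumed mean frequency of the component
    # Synthetic component with slow amplitude growth and slight frequency drift.
    signal = (1.0 + 0.3 * t / 10.0) * np.cos(2 * np.pi * (f0 * t + 0.005 * t**2))

    # Basis: t^k cos(2*pi*f0*t) and t^k sin(2*pi*f0*t), k = 0..order. The
    # quadrature pair lets one polynomial capture both amplitude and slow
    # phase/frequency modulation around f0.
    order = 3
    cols = []
    for k in range(order + 1):
        cols.append(t**k * np.cos(2 * np.pi * f0 * t))
        cols.append(t**k * np.sin(2 * np.pi * f0 * t))
    A = np.column_stack(cols)

    coef, *_ = np.linalg.lstsq(A, signal, rcond=None)
    component = A @ coef                      # estimated component waveform

    # Instantaneous amplitude from the two polynomial envelopes:
    Tpow = np.column_stack([t**k for k in range(order + 1)])
    inst_amp = np.hypot(Tpow @ coef[0::2], Tpow @ coef[1::2])
    ```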

  17. When I Stopped Writing on Their Papers: Accommodating the Needs of Student Writers with Audio Comments

    ERIC Educational Resources Information Center

    Bauer, Sara

    2011-01-01

    The author finds using software to make audio comments on students' writing improves students' understanding of her responses and increases their willingness to take her suggestions for revision more seriously. In the process of recording audio comments, she came to a new understanding of her students' writing needs and her responsibilities as…

  18. Instructional Audio Guidelines: Four Design Principles to Consider for Every Instructional Audio Design Effort

    ERIC Educational Resources Information Center

    Carter, Curtis W.

    2012-01-01

    This article contends that instructional designers and developers should attend to four particular design principles when creating instructional audio. Support for this view is presented by referencing the limited research that has been done in this area, and by indicating how and why each of the four principles is important to the design process.…

  19. Digital methods of recording color television images on film tape

    NASA Astrophysics Data System (ADS)

    Krivitskaya, R. Y.; Semenov, V. M.

    1985-04-01

    Three methods are now available for recording color television images on film tape, directly or after appropriate signal processing. Conventional recording of images from the screens of three kinescopes with synthetic-crystal face plates is still the most effective for high fidelity. This method was improved by digital preprocessing of the brightness and color-difference signals. Frame-by-frame storage of these signals in memory in digital form is followed by gamma and aperture correction and electronic correction of crossover distortions in the color layers of the film, with fixing in accordance with specific emulsion procedures. The newer method of recording color television images with line arrays of light-emitting diodes involves dichroic superposing mirrors and a movable scanning mirror. This method allows the use of standard movie cameras, simplifies interlacing-to-linewise conversion and the mechanical equipment, and lengthens exposure time while shortening recording time. The latest, image-transform method requires an audio-video recorder, a memory disk, a digital computer, and a decoder. The 9-step procedure includes preprocessing the total color television signal with reduction of noise level and time errors, followed by frame-frequency conversion and setting the number of lines. The total signal is then resolved into its brightness and color-difference components, and phase errors and image blurring are also reduced. After extraction of the R, G, B signals and colorimetric matching of the TV camera and film tape, the simultaneous R, G, B signals are converted from interlacing to sequential triads of color-quotient frames with linewise scanning at triple frequency. Color-quotient signals are recorded with an electron beam on a smoothly moving black-and-white film tape under vacuum. While digital techniques improve signal quality and simplify process control, not requiring stabilization of circuits, image processing is still analog.

  20. Audio frequency in vivo optical coherence elastography

    NASA Astrophysics Data System (ADS)

    Adie, Steven G.; Kennedy, Brendan F.; Armstrong, Julian J.; Alexandrov, Sergey A.; Sampson, David D.

    2009-05-01

    We present a new approach to optical coherence elastography (OCE), which probes the local elastic properties of tissue by using optical coherence tomography to measure the effect of an applied stimulus in the audio frequency range. We describe the approach, based on analysis of the Bessel frequency spectrum of the interferometric signal detected from scatterers undergoing periodic motion in response to an applied stimulus. We present quantitative results of sub-micron excitation at 820 Hz in a layered phantom and the first such measurements in human skin in vivo.
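
    As a rough illustration of the Bessel-spectrum analysis mentioned above, the sketch below synthesizes an interferometric signal whose phase is modulated sinusoidally at an audio frequency and compares the measured harmonic magnitudes with the Bessel-function weights predicted by the Jacobi-Anger expansion. All parameter values are illustrative, not the paper's.

    ```python
    import numpy as np
    from scipy.special import jv

    fs, f_v, beta, phi0 = 100_000, 820.0, 1.2, 0.3
    t = np.arange(0, 0.5, 1 / fs)
    # Interferometric signal from a scatterer vibrating at f_v:
    sig = np.cos(phi0 + beta * np.sin(2 * np.pi * f_v * t))

    spec = np.abs(np.fft.rfft(sig)) / len(t) * 2   # single-sided amplitude
    freqs = np.fft.rfftfreq(len(t), 1 / fs)

    # Harmonic n of f_v has magnitude |2*cos(phi0)*Jn(beta)| for even n
    # and |2*sin(phi0)*Jn(beta)| for odd n.
    for n in range(1, 4):
        idx = np.argmin(np.abs(freqs - n * f_v))
        weight = np.cos(phi0) if n % 2 == 0 else np.sin(phi0)
        print(n, spec[idx], abs(2 * weight * jv(n, beta)))
    ```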

  1. DRACULA: Dynamic range control for broadcasting and other applications

    NASA Astrophysics Data System (ADS)

    Gilchrist, N. H. C.

    The BBC has developed a digital processor which is capable of reducing the dynamic range of audio in an unobtrusive manner. It is ideally suited to the task of controlling the level of musical programs. Operating as a self-contained dynamic range controller, the processor is suitable for controlling levels in conventional AM or FM broadcasting, or for applications such as the compression of program material for in-flight entertainment. It can, alternatively, be used to provide a supplementary signal in DAB (digital audio broadcasting) for optional dynamic compression in the receiver.
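
    The abstract does not describe the BBC processor's algorithm; as a minimal, hedged sketch of the general class of processing involved, the following feed-forward compressor smooths a level estimate with slow attack/release constants so that gain changes remain unobtrusive. All parameter values are invented.

    ```python
    import numpy as np

    def compress(x, fs, threshold_db=-20.0, ratio=3.0,
                 attack_ms=50.0, release_ms=500.0):
        a_att = np.exp(-1.0 / (fs * attack_ms / 1000))
        a_rel = np.exp(-1.0 / (fs * release_ms / 1000))
        level_db = -120.0
        out = np.empty_like(x)
        for i, sample in enumerate(x):
            inst_db = 20 * np.log10(abs(sample) + 1e-9)
            a = a_att if inst_db > level_db else a_rel
            level_db = a * level_db + (1 - a) * inst_db   # smoothed level
            over = max(level_db - threshold_db, 0.0)
            gain_db = -over * (1 - 1 / ratio)   # gain reduction above threshold
            out[i] = sample * 10 ** (gain_db / 20)
        return out
    ```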

  2. Primary School Pupils' Response to Audio-Visual Learning Process in Port-Harcourt

    ERIC Educational Resources Information Center

    Olube, Friday K.

    2015-01-01

    The purpose of this study is to examine primary school children's response to the use of audio-visual learning processes--a case study of Chokhmah International Academy, Port-Harcourt (owned by Salvation Ministries). It looked at the elements that enhance pupils' response to educational television programmes and their hindrances to these…

  3. Audio-visual synchrony and feature-selective attention co-amplify early visual processing.

    PubMed

    Keitel, Christian; Müller, Matthias M

    2016-05-01

    Our brain relies on neural mechanisms of selective attention and converging sensory processing to efficiently cope with rich and unceasing multisensory inputs. One prominent assumption holds that audio-visual synchrony can act as a strong attractor for spatial attention. Here, we tested for a similar effect of audio-visual synchrony on feature-selective attention. We presented two superimposed Gabor patches that differed in colour and orientation. On each trial, participants were cued to selectively attend to one of the two patches. Over time, spatial frequencies of both patches varied sinusoidally at distinct rates (3.14 and 3.63 Hz), giving rise to pulse-like percepts. A simultaneously presented pure tone carried a frequency modulation at the pulse rate of one of the two visual stimuli to introduce audio-visual synchrony. Pulsed stimulation elicited distinct time-locked oscillatory electrophysiological brain responses. These steady-state responses were quantified in the spectral domain to examine individual stimulus processing under conditions of synchronous versus asynchronous tone presentation and when the respective stimuli were attended versus unattended. We found that both attending to the colour of a stimulus and its synchrony with the tone enhanced its processing. Moreover, both gain effects combined linearly for attended in-sync stimuli. Our results suggest that audio-visual synchrony can attract attention to specific stimulus features when stimuli overlap in space.

  4. A direct broadcast satellite-audio experiment

    NASA Technical Reports Server (NTRS)

    Vaisnys, Arvydas; Abbe, Brian; Motamedi, Masoud

    1992-01-01

    System studies have been carried out over the past three years at the Jet Propulsion Laboratory (JPL) on digital audio broadcasting (DAB) via satellite. The thrust of the work to date has been on designing power and bandwidth efficient systems capable of providing reliable service to fixed, mobile, and portable radios. It is very difficult to predict performance in an environment which produces random periods of signal blockage, such as encountered in mobile reception where a vehicle can quickly move from one type of terrain to another. For this reason, some signal blockage mitigation techniques were built into an experimental DAB system and a satellite experiment was conducted to obtain both qualitative and quantitative measures of performance in a range of reception environments. This paper presents results from the experiment and some conclusions on the effectiveness of these blockage mitigation techniques.

  5. Multi-modal highlight generation for sports videos using an information-theoretic excitability measure

    NASA Astrophysics Data System (ADS)

    Hasan, Taufiq; Bořil, Hynek; Sangwan, Abhijeet; Hansen, John H. L.

    2013-12-01

    The ability to detect and organize `hot spots' representing areas of excitement within video streams is a challenging research problem when techniques rely exclusively on video content. A generic method for sports video highlight selection is presented in this study which leverages both video/image structure as well as audio/speech properties. Processing begins where the video is partitioned into small segments and several multi-modal features are extracted from each segment. Excitability is computed based on the likelihood of the segmental features residing in certain regions of their joint probability density function space which are considered both exciting and rare. The proposed measure is used to rank order the partitioned segments to compress the overall video sequence and produce a contiguous set of highlights. Experiments are performed on baseball videos based on signal processing advancements for excitement assessment in the commentators' speech, audio energy, slow motion replay, scene cut density, and motion activity as features. Detailed analysis on correlation between user excitability and various speech production parameters is conducted and an effective scheme is designed to estimate the excitement level of commentator's speech from the sports videos. Subjective evaluation of excitability and ranking of video segments demonstrate a higher correlation with the proposed measure compared to well-established techniques indicating the effectiveness of the overall approach.

  6. Smartphone Application for the Analysis of Prosodic Features in Running Speech with a Focus on Bipolar Disorders: System Performance Evaluation and Case Study.

    PubMed

    Guidi, Andrea; Salvi, Sergio; Ottaviano, Manuel; Gentili, Claudio; Bertschy, Gilles; de Rossi, Danilo; Scilingo, Enzo Pasquale; Vanello, Nicola

    2015-11-06

    Bipolar disorder is one of the most common mood disorders characterized by large and invalidating mood swings. Several projects focus on the development of decision support systems that monitor and advise patients, as well as clinicians. Voice monitoring and speech signal analysis can be exploited to reach this goal. In this study, an Android application was designed for analyzing running speech using a smartphone device. The application can record audio samples and estimate speech fundamental frequency, F0, and its changes. F0-related features are estimated locally on the smartphone, with some advantages with respect to remote processing approaches in terms of privacy protection and reduced upload costs. The raw features can be sent to a central server and further processed. The quality of the audio recordings, algorithm reliability and performance of the overall system were evaluated in terms of voiced segment detection and features estimation. The results demonstrate that mean F0 from each voiced segment can be reliably estimated, thus describing prosodic features across the speech sample. Instead, features related to F0 variability within each voiced segment performed poorly. A case study performed on a bipolar patient is presented.
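
    The abstract does not specify the app's F0 algorithm; as one common lightweight approach that could run locally on a handset, here is a hedged per-frame autocorrelation F0 estimator. The search range and voicing threshold are illustrative.

    ```python
    import numpy as np

    def estimate_f0(frame, sr, f0_min=70.0, f0_max=350.0, voiced_thresh=0.3):
        """Return F0 in Hz for one ~30-40 ms frame, or None if unvoiced."""
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        if ac[0] <= 0:
            return None
        ac /= ac[0]                          # normalized autocorrelation
        lo = int(sr / f0_max)                # shortest candidate pitch period
        hi = int(sr / f0_min)                # longest candidate pitch period
        lag = lo + int(np.argmax(ac[lo:hi]))
        if ac[lag] < voiced_thresh:
            return None                      # treat frame as unvoiced
        return sr / lag
    ```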

  7. Smartphone Application for the Analysis of Prosodic Features in Running Speech with a Focus on Bipolar Disorders: System Performance Evaluation and Case Study

    PubMed Central

    Guidi, Andrea; Salvi, Sergio; Ottaviano, Manuel; Gentili, Claudio; Bertschy, Gilles; de Rossi, Danilo; Scilingo, Enzo Pasquale; Vanello, Nicola

    2015-01-01

    Bipolar disorder is one of the most common mood disorders characterized by large and invalidating mood swings. Several projects focus on the development of decision support systems that monitor and advise patients, as well as clinicians. Voice monitoring and speech signal analysis can be exploited to reach this goal. In this study, an Android application was designed for analyzing running speech using a smartphone device. The application can record audio samples and estimate speech fundamental frequency, F0, and its changes. F0-related features are estimated locally on the smartphone, with some advantages with respect to remote processing approaches in terms of privacy protection and reduced upload costs. The raw features can be sent to a central server and further processed. The quality of the audio recordings, algorithm reliability and performance of the overall system were evaluated in terms of voiced segment detection and features estimation. The results demonstrate that mean F0 from each voiced segment can be reliably estimated, thus describing prosodic features across the speech sample. Instead, features related to F0 variability within each voiced segment performed poorly. A case study performed on a bipolar patient is presented. PMID:26561811

  8. Multifunction audio digitizer. [producing direct delta and pulse code modulation

    NASA Technical Reports Server (NTRS)

    Monford, L. G., Jr. (Inventor)

    1974-01-01

    An illustrative embodiment of the invention includes apparatus which simultaneously produces both direct delta modulation and pulse code modulation. An input signal, after amplification, is supplied to a window comparator which supplies a polarity control signal to gate the output of a clock to the appropriate input of a binary up-down counter. The control signals provide direct delta modulation while the up-down counter output provides pulse code modulation.
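
    A hedged software model of the patent's scheme: a comparator gates clock pulses to a binary up/down counter, so the per-clock direction bits form the delta-modulation stream while the counter contents are simultaneously the PCM output. Step size, word width and the test signal are illustrative.

    ```python
    import numpy as np

    def delta_and_pcm(x, n_bits=8, step=1):
        counter = 0
        lo, hi = 0, 2 ** n_bits - 1
        delta_bits, pcm_words = [], []
        for sample in x:
            up = sample >= counter               # comparator picks count direction
            counter = min(counter + step, hi) if up else max(counter - step, lo)
            delta_bits.append(int(up))           # direct delta modulation stream
            pcm_words.append(counter)            # simultaneous PCM output
        return delta_bits, pcm_words

    t = np.arange(200)
    signal = (127 + 100 * np.sin(2 * np.pi * t / 50)).astype(int)
    bits, words = delta_and_pcm(signal, step=4)
    ```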

  9. Pulse mode readout techniques for use with non-gridded industrial ionization chambers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Popov, Vladimir E.; Degtiarenko, Pavel V.

    2011-10-01

    A highly sensitive readout technique for precision long-term radiation measurements has been developed and tested in the Radiation Control Department at Jefferson Lab. The new electronics design is used to retrieve ionization data in a pulse mode. The dedicated data acquisition system works with M-Audio Audiophile 192 High-Definition 24-bit/192 kHz audio cards, taking data in continuous waveform recording mode. The on-line data processing algorithms extract signals of the ionization events from the data flow and measure the ionization value for each event. Two different ion chambers are evaluated. The first is a Reuter-Stokes Argon-filled (at 25 atm) High Pressure Ionization Chamber (HPIC), commonly used as a detector part in many GE Reuter-Stokes instruments of the RSS series. The second is a VacuTec Model 70181, 5 atm Xenon-filled ionization chamber. Results for both chambers indicate that the techniques allow using industrial ICs for high-sensitivity, precision long-term radiation measurements, while at the same time providing information about the spectral characteristics of the radiation fields.

  10. Analysis of musical expression in audio signals

    NASA Astrophysics Data System (ADS)

    Dixon, Simon

    2003-01-01

    In western art music, composers communicate their work to performers via a standard notation which specifies the musical pitches and relative timings of notes. This notation may also include some higher-level information such as variations in the dynamics, tempo and timing. Famous performers are characterised by their expressive interpretation, the ability to convey structural and emotive information within the given framework. The majority of work on audio content analysis focuses on retrieving score-level information; this paper reports on the extraction of parameters describing the performance, a task which requires a much higher degree of accuracy. Two systems are presented: BeatRoot, an off-line beat tracking system which finds the times of musical beats and tracks changes in tempo throughout a performance, and the Performance Worm, a system which provides a real-time visualisation of the two most important expressive dimensions, tempo and dynamics. Both of these systems are being used to process data for a large-scale study of musical expression in classical and romantic piano performance, which uses artificial intelligence (machine learning) techniques to discover fundamental patterns or principles governing expressive performance.
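
    BeatRoot's actual multi-agent beat tracking is considerably more elaborate; as a hedged sketch of the first stage of such systems, the code below derives an onset-strength envelope and reads a global tempo estimate off its autocorrelation. Frame sizes and the tempo range are illustrative.

    ```python
    import numpy as np

    def tempo_bpm(audio, sr, hop=512):
        # Onset strength: positive change in frame energy.
        frames = range(0, len(audio) - hop, hop)
        energy = np.array([np.sum(audio[i:i + hop] ** 2) for i in frames])
        onset = np.maximum(np.diff(energy), 0.0)
        # Autocorrelation of the onset envelope; peak lag ~ beat period.
        ac = np.correlate(onset, onset, mode="full")[len(onset) - 1:]
        fps = sr / hop                                      # envelope frame rate
        lo, hi = int(fps * 60 / 200), int(fps * 60 / 40)    # 40-200 BPM window
        lag = lo + int(np.argmax(ac[lo:hi]))
        return 60.0 * fps / lag
    ```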

  11. Comparison of three orientation and mobility aids for individuals with blindness: Verbal description, audio-tactile map and audio-haptic map.

    PubMed

    Papadopoulos, Konstantinos; Koustriava, Eleni; Koukourikos, Panagiotis; Kartasidou, Lefkothea; Barouti, Marialena; Varveris, Asimis; Misiou, Marina; Zacharogeorga, Timoclia; Anastasiadis, Theocharis

    2017-01-01

    Disorientation and wayfinding difficulties occur with great frequency for individuals with visual impairments when travelling through novel environments. Orientation and mobility aids can be important tools in preparing a more secure and cognitively mapped journey. The aim of the present study was to examine whether the spatial knowledge that an individual with blindness acquires from studying a map of an urban area, delivered through a verbal description, an audio-tactile map or an audio-haptic map, can be used to detect specific points of interest in that area. The effectiveness of the three aids relative to each other was also examined. The results of the present study highlight the effectiveness of the audio-tactile and audio-haptic maps as orientation and mobility aids, especially when compared to verbal descriptions.

  12. A high efficiency PWM CMOS class-D audio power amplifier

    NASA Astrophysics Data System (ADS)

    Zhangming, Zhu; Lianxi, Liu; Yintang, Yang; Han, Lei

    2009-02-01

    Based on a differential closed-loop feedback technique and a differential pre-amplifier, a high-efficiency PWM CMOS class-D audio power amplifier is proposed. A rail-to-rail PWM comparator with a window function has been embedded in the class-D audio power amplifier. Design results based on the CSMC 0.5 μm CMOS process show that the maximum efficiency is 90%, the PSRR is -75 dB, the power supply voltage range is 2.5-5.5 V, the THD+N at 1 kHz input frequency is less than 0.20%, the quiescent current with no load is 2.8 mA, and the shutdown current is 0.5 μA. The active area of the class-D audio power amplifier is about 1.47 × 1.52 mm2. With this performance, the class-D audio power amplifier can be applied to several audio power systems.
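
    As a hedged illustration of the PWM stage such an amplifier is built around (not the paper's specific comparator design), the sketch below generates a two-level PWM waveform by comparing the audio against a high-frequency triangle carrier; the carrier frequency and rates are illustrative.

    ```python
    import numpy as np
    from scipy.signal import sawtooth

    fs = 10_000_000                       # simulation rate
    t = np.arange(0, 0.001, 1 / fs)
    audio = 0.8 * np.sin(2 * np.pi * 1000 * t)               # 1 kHz test tone
    carrier = sawtooth(2 * np.pi * 250_000 * t, width=0.5)   # 250 kHz triangle
    pwm = np.where(audio > carrier, 1.0, -1.0)               # comparator output
    # After the output LC low-pass filter, the average of `pwm` tracks `audio`.
    ```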

  13. Cognitive spare capacity: evaluation data and its association with comprehension of dynamic conversations

    PubMed Central

    Keidser, Gitte; Best, Virginia; Freeston, Katrina; Boyce, Alexandra

    2015-01-01

    It is well-established that communication involves the working memory system, which becomes increasingly engaged in understanding speech as the input signal degrades. The more resources allocated to recovering a degraded input signal, the fewer resources, referred to as cognitive spare capacity (CSC), remain for higher-level processing of speech. Using simulated natural listening environments, the aims of this paper were to (1) evaluate an English version of a recently introduced auditory test to measure CSC that targets the updating process of the executive function, (2) investigate if the test predicts speech comprehension better than the reading span test (RST) commonly used to measure working memory capacity, and (3) determine if the test is sensitive to increasing the number of attended locations during listening. In Experiment I, the CSC test was presented using a male and a female talker, in quiet and in spatially separated babble- and cafeteria-noises, in an audio-only and in an audio-visual mode. Data collected on 21 listeners with normal and impaired hearing confirmed that the English version of the CSC test is sensitive to population group, noise condition, and clarity of speech, but not presentation modality. In Experiment II, performance by 27 normal-hearing listeners on a novel speech comprehension test presented in noise was significantly associated with working memory capacity, but not with CSC. Moreover, this group showed no significant difference in CSC as the number of talker locations in the test increased. There was no consistent association between the CSC test and the RST. It is recommended that future studies investigate the psychometric properties of the CSC test, and examine its sensitivity to the complexity of the listening environment in participants with both normal and impaired hearing. PMID:25999904

  14. Multiple Frequency Audio Signal Communication as a Mechanism for Neurophysiology and Video Data Synchronization

    PubMed Central

    Topper, Nicholas C.; Burke, S.N.; Maurer, A.P.

    2014-01-01

    BACKGROUND: Current methods for aligning neurophysiology and video data are either prepackaged, requiring the additional purchase of a software suite, or use a blinking LED with a stationary pulse-width and frequency. These methods lack a significant user interface for adaptation, are expensive, or risk a misalignment of the two data streams. NEW METHOD: A cost-effective means to obtain high-precision alignment of behavioral and neurophysiological data is obtained by generating an audio pulse embedded with two domains of information, a low-frequency binary-counting signal and a high, randomly changing frequency. This enables the derivation of temporal information while maintaining enough entropy in the system for algorithmic alignment. RESULTS: The sample-to-frame index constructed using the audio input correlation method described in this paper enables video and data acquisition to be aligned at a sub-frame level of precision. COMPARISON WITH EXISTING METHODS: Traditionally, a synchrony pulse is recorded on-screen via a flashing diode. The higher sampling rate of the audio input of the camcorder enables the timing of an event to be detected with greater precision. CONCLUSIONS: While on-line analysis and synchronization using specialized equipment may be ideal in some cases, the method presented in the current paper is a viable, low-cost alternative and gives the flexibility to interface with custom off-line analysis tools. Moreover, the ease of constructing and implementing the set-up presented in the current paper makes it applicable to a wide variety of applications that require video recording. PMID:25256648
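
    A hedged sketch of how such a two-domain pulse might be synthesized: tones at fixed low frequencies encode the bits of an incrementing counter, while a freshly randomized high-frequency tone supplies the entropy used for correlation alignment. The frequencies, bit mapping and durations are invented for illustration, not taken from the paper.

    ```python
    import numpy as np

    def sync_pulse(counter, rng, sr=44100, dur=0.1, n_bits=8):
        t = np.arange(int(sr * dur)) / sr
        sig = np.zeros_like(t)
        # Low-frequency domain: one tone per set bit of the counter value.
        for b in range(n_bits):
            if (counter >> b) & 1:
                sig += np.sin(2 * np.pi * (200 + 50 * b) * t)
        # High-frequency domain: a random tone, new on every pulse.
        f_rand = rng.uniform(4000, 8000)
        sig += np.sin(2 * np.pi * f_rand * t)
        return sig / np.max(np.abs(sig))

    rng = np.random.default_rng(7)
    stream = np.concatenate([sync_pulse(k, rng) for k in range(16)])
    ```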

  15. Multiple frequency audio signal communication as a mechanism for neurophysiology and video data synchronization.

    PubMed

    Topper, Nicholas C; Burke, Sara N; Maurer, Andrew Porter

    2014-12-30

    Current methods for aligning neurophysiology and video data are either prepackaged, requiring the additional purchase of a software suite, or use a blinking LED with a stationary pulse-width and frequency. These methods lack a significant user interface for adaptation, are expensive, or risk a misalignment of the two data streams. A cost-effective means to obtain high-precision alignment of behavioral and neurophysiological data is obtained by generating an audio pulse embedded with two domains of information, a low-frequency binary-counting signal and a high, randomly changing frequency. This enables the derivation of temporal information while maintaining enough entropy in the system for algorithmic alignment. The sample-to-frame index constructed using the audio input correlation method described in this paper enables video and data acquisition to be aligned at a sub-frame level of precision. Traditionally, a synchrony pulse is recorded on-screen via a flashing diode. The higher sampling rate of the audio input of the camcorder enables the timing of an event to be detected with greater precision. While on-line analysis and synchronization using specialized equipment may be ideal in some cases, the method presented in the current paper is a viable, low-cost alternative and gives the flexibility to interface with custom off-line analysis tools. Moreover, the ease of constructing and implementing the set-up presented in the current paper makes it applicable to a wide variety of applications that require video recording. Copyright © 2014 Elsevier B.V. All rights reserved.

  16. Audio visual speech source separation via improved context dependent association model

    NASA Astrophysics Data System (ADS)

    Kazemi, Alireza; Boostani, Reza; Sobhanmanesh, Fariborz

    2014-12-01

    In this paper, we exploit the non-linear relation between a speech source and its associated lip video as a source of extra information to propose an improved audio-visual speech source separation (AVSS) algorithm. The audio-visual association is modeled using a neural associator which estimates the visual lip parameters from a temporal context of acoustic observation frames. We define an objective function based on mean square error (MSE) measure between estimated and target visual parameters. This function is minimized for estimation of the de-mixing vector/filters to separate the relevant source from linear instantaneous or time-domain convolutive mixtures. We have also proposed a hybrid criterion which uses AV coherency together with kurtosis as a non-Gaussianity measure. Experimental results are presented and compared in terms of visually relevant speech detection accuracy and output signal-to-interference ratio (SIR) of source separation. The suggested audio-visual model significantly improves relevant speech classification accuracy compared to existing GMM-based model and the proposed AVSS algorithm improves the speech separation quality compared to reference ICA- and AVSS-based methods.

  17. Effect of audio in-vehicle red light-running warning message on driving behavior based on a driving simulator experiment.

    PubMed

    Yan, Xuedong; Liu, Yang; Xu, Yongcun

    2015-01-01

    Drivers' incorrect decisions of crossing signalized intersections at the onset of the yellow change may lead to red light running (RLR), and RLR crashes result in substantial numbers of severe injuries and property damage. In recent years, some Intelligent Transport System (ITS) concepts have focused on reducing RLR by alerting drivers that they are about to violate the signal. The objective of this study is to conduct an experimental investigation on the effectiveness of the red light violation warning system using a voice message. In this study, the prototype concept of the RLR audio warning system was modeled and tested in a high-fidelity driving simulator. According to the concept, when a vehicle is approaching an intersection at the onset of yellow and the time to the intersection is longer than the yellow interval, the in-vehicle warning system can activate the following audio message "The red light is impending. Please decelerate!" The intent of the warning design is to encourage drivers who cannot clear an intersection during the yellow change interval to stop at the intersection. The experimental results showed that the warning message could decrease red light running violations by 84.3 percent. Based on the logistic regression analyses, drivers without a warning were about 86 times more likely to make go decisions at the onset of yellow and about 15 times more likely to run red lights than those with a warning. Additionally, it was found that the audio warning message could significantly reduce RLR severity because the RLR drivers' red-entry times without a warning were longer than those with a warning. This driving simulator study showed a promising effect of the audio in-vehicle warning message on reducing RLR violations and crashes. It is worthwhile to further develop the proposed technology in field applications.

  18. Speech enhancement on smartphone voice recording

    NASA Astrophysics Data System (ADS)

    Tris Atmaja, Bagus; Nur Farid, Mifta; Arifianto, Dhany

    2016-11-01

    Speech enhancement is a challenging task in audio signal processing: enhancing the quality of a target speech signal while suppressing other noise. Speech enhancement algorithms have developed rapidly, from spectral subtraction and Wiener filtering through the spectral amplitude MMSE estimator to non-negative matrix factorization (NMF). The smartphone, as a revolutionary device, is now used in all aspects of life, including journalism, both personally and professionally. Although many smartphones have two microphones (main and rear), only the main microphone is widely used for voice recording, which is why single-channel algorithms such as NMF are widely used for this kind of speech enhancement. This paper evaluates speech enhancement on smartphone voice recordings using the algorithms mentioned previously. We also extend the NMF algorithm to Kullback-Leibler NMF with supervised separation. This last algorithm shows improved results compared to the others in spectrogram and PESQ score evaluations.
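
    A minimal sketch of supervised Kullback-Leibler NMF enhancement, assuming speech and noise dictionaries have already been trained on magnitude spectrograms; the update rule is the standard multiplicative KL update with the dictionary held fixed, not necessarily the authors' exact formulation.

    ```python
    import numpy as np

    def kl_nmf_activations(V, W, n_iter=100, eps=1e-9):
        """Multiplicative KL-divergence updates for H with W fixed; V ~ W @ H."""
        rng = np.random.default_rng(0)
        H = rng.random((W.shape[1], V.shape[1])) + eps
        for _ in range(n_iter):
            WH = W @ H + eps
            H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        return H

    def enhance(V_mix, W_speech, W_noise, eps=1e-9):
        """Wiener-style mask from the speech part of the NMF reconstruction."""
        W = np.hstack([W_speech, W_noise])
        H = kl_nmf_activations(V_mix, W)
        k = W_speech.shape[1]
        V_s = W_speech @ H[:k] + eps
        V_n = W_noise @ H[k:] + eps
        return V_mix * V_s / (V_s + V_n)   # enhanced magnitude spectrogram
    ```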

  19. Noise in any frequency range can enhance information transmission in a sensory neuron

    NASA Astrophysics Data System (ADS)

    Levin, Jacob E.

    1997-05-01

    The effect of noise on the neural encoding of broadband signals was investigated in the cricket cercal system, a mechanosensory system sensitive to small near-field air particle disturbances. Known air current stimuli were presented to the cricket through audio speakers in a controlled environment in a variety of background noise conditions. Spike trains from the second layer of neuronal processing, the primary sensory interneurons, were recorded with intracellular electrodes, and the performance of these neurons was characterized with the tools of information theory. SNR, mutual information rates, and other measures of encoding accuracy were calculated for single-frequency, narrowband, and broadband signals over the entire amplitude sensitivity range of the cells, in the presence of uncorrelated background noise also spanning the cells' frequency and amplitude sensitivity range. Significant enhancements of transmitted information through the addition of external noise were observed regardless of the frequency range of either the signal or noise waveforms, provided both were within the operating range of the cell. Considerable improvements in signal encoding were observed for almost an entire order of magnitude of near-threshold signal amplitudes. This included sinusoidal signals embedded in broadband white noise, broadband signals in broadband noise, and even broadband signals presented with narrowband noise in a completely non-overlapping frequency range. The noise-related increases in mutual information rate for broadband signals were as high as 150%, and up to 600% increases in SNR were observed for sinusoidal signals. Additionally, it was shown that the amount of information about the signal carried, on average, by each spike was increased for small signals when presented with noise, implying that added input noise can, in certain situations, actually improve the accuracy of the encoding process itself.
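
    A toy demonstration of the effect described (stochastic resonance in a threshold unit), not a model of the cricket interneurons: a subthreshold sinusoid alone produces no output events, and the signal component of the thresholded output peaks at an intermediate noise level. All parameters are invented.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    fs, f_sig, dur = 1000, 5.0, 20.0
    t = np.arange(0, dur, 1 / fs)
    signal = 0.8 * np.sin(2 * np.pi * f_sig * t)    # below threshold of 1.0

    def signal_to_background(noise_std, threshold=1.0):
        x = signal + noise_std * rng.standard_normal(len(t))
        events = (x > threshold).astype(float)      # crude spike train
        spec = np.abs(np.fft.rfft(events - events.mean()))
        bin_sig = int(f_sig * dur)                  # FFT bin of the signal frequency
        background = np.median(spec[1:])
        return spec[bin_sig] / (background + 1e-12)

    # Expect a non-monotonic curve: best encoding at intermediate noise.
    for std in [0.05, 0.2, 0.5, 1.0, 2.0]:
        print(std, round(signal_to_background(std), 1))
    ```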

  20. Project Echo: System Calculations

    NASA Technical Reports Server (NTRS)

    Ruthroff, Clyde L.; Jakes, William C., Jr.

    1961-01-01

    The primary experimental objective of Project Echo was the transmission of radio communications between points on the earth by reflection from the balloon satellite. This paper describes system calculations made in preparation for the experiment and their adaptation to the problem of interpreting the results. The calculations include path loss computations, expected audio signal-to-noise ratios, and received signal strength based on orbital parameters.
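
    The paper's own calculations are not reproduced here; as a hedged sketch of the kind of path-loss computation involved, the snippet below applies the standard bistatic radar equation with the optical cross-section of a sphere. Every numerical value is an illustrative placeholder, not a Project Echo figure.

    ```python
    import math

    def received_power_dbw(p_tx_w, g_tx_db, g_rx_db, freq_hz,
                           r1_m, r2_m, balloon_radius_m):
        lam = 3e8 / freq_hz
        sigma = math.pi * balloon_radius_m ** 2   # sphere radar cross-section
        p_tx_db = 10 * math.log10(p_tx_w)
        # Bistatic radar equation:
        # Pr = Pt*Gt*Gr*lam^2*sigma / ((4*pi)^3 * r1^2 * r2^2)
        loss_db = 10 * math.log10(((4 * math.pi) ** 3 * r1_m**2 * r2_m**2)
                                  / (lam**2 * sigma))
        return p_tx_db + g_tx_db + g_rx_db - loss_db

    # e.g. 10 kW transmitter, 40 dB antennas, 2.39 GHz, ~1600 km slant
    # ranges, 15 m radius balloon (illustrative values only):
    print(received_power_dbw(1e4, 40.0, 40.0, 2.39e9, 1.6e6, 1.6e6, 15.0))
    ```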

  1. Basic Radio Circuits and Vacuum Tube AM Troubleshooting; Radio and Television Service, Intermediate: 9785.03.

    ERIC Educational Resources Information Center

    Dade County Public Schools, Miami, FL.

    The 135-hour quinmester course covers study of basic radio circuits as applied to vacuum tube radios in six blocks of instruction: orientation; AM receivers with tubes; no signal, audio failure; distortion; weak, noisy signals; and a post-test. Each block is subdivided into several units, and block objectives are outlined. Completion of AC…

  2. Influences of selective adaptation on perception of audiovisual speech

    PubMed Central

    Dias, James W.; Cook, Theresa C.; Rosenblum, Lawrence D.

    2016-01-01

    Research suggests that selective adaptation in speech is a low-level process dependent on sensory-specific information shared between the adaptor and test-stimuli. However, previous research has only examined how adaptors shift perception of unimodal test stimuli, either auditory or visual. In the current series of experiments, we investigated whether adaptation to cross-sensory phonetic information can influence perception of integrated audio-visual phonetic information. We examined how selective adaptation to audio and visual adaptors shift perception of speech along an audiovisual test continuum. This test-continuum consisted of nine audio-/ba/-visual-/va/ stimuli, ranging in visual clarity of the mouth. When the mouth was clearly visible, perceivers “heard” the audio-visual stimulus as an integrated “va” percept 93.7% of the time (e.g., McGurk & MacDonald, 1976). As visibility of the mouth became less clear across the nine-item continuum, the audio-visual “va” percept weakened, resulting in a continuum ranging in audio-visual percepts from /va/ to /ba/. Perception of the test-stimuli was tested before and after adaptation. Changes in audiovisual speech perception were observed following adaptation to visual-/va/ and audiovisual-/va/, but not following adaptation to auditory-/va/, auditory-/ba/, or visual-/ba/. Adaptation modulates perception of integrated audio-visual speech by modulating the processing of sensory-specific information. The results suggest that auditory and visual speech information are not completely integrated at the level of selective adaptation. PMID:27041781

  3. Audio-visual speech intelligibility benefits with bilateral cochlear implants when talker location varies.

    PubMed

    van Hoesel, Richard J M

    2015-04-01

    One of the key benefits of using cochlear implants (CIs) in both ears rather than just one is improved localization. It is likely that in complex listening scenes, improved localization allows bilateral CI users to orient toward talkers to improve signal-to-noise ratios and gain access to visual cues, but to date, that conjecture has not been tested. To obtain an objective measure of that benefit, seven bilateral CI users were assessed for both auditory-only and audio-visual speech intelligibility in noise using a novel dynamic spatial audio-visual test paradigm. For each trial conducted in spatially distributed noise, first, an auditory-only cueing phrase that was spoken by one of four talkers was selected and presented from one of four locations. Shortly afterward, a target sentence was presented that was either audio-visual or, in another test configuration, audio-only and was spoken by the same talker and from the same location as the cueing phrase. During the target presentation, visual distractors were added at other spatial locations. Results showed that in terms of speech reception thresholds (SRTs), the average improvement for bilateral listening over the better performing ear alone was 9 dB for the audio-visual mode, and 3 dB for audition-alone. Comparison of bilateral performance for audio-visual and audition-alone showed that inclusion of visual cues led to an average SRT improvement of 5 dB. For unilateral device use, no such benefit arose, presumably due to the greatly reduced ability to localize the target talker to acquire visual information. The bilateral CI speech intelligibility advantage over the better ear in the present study is much larger than that previously reported for static talker locations and indicates greater everyday speech benefits and improved cost-benefit than estimated to date.

  4. Audio-vocal system regulation in children with autism spectrum disorders.

    PubMed

    Russo, Nicole; Larson, Charles; Kraus, Nina

    2008-06-01

    Do children with autism spectrum disorders (ASD) respond similarly to perturbations in auditory feedback as typically developing (TD) children? Presentation of pitch-shifted voice auditory feedback to vocalizing participants reveals a close coupling between the processing of auditory feedback and vocal motor control. This paradigm was used to test the hypothesis that abnormalities in the audio-vocal system would negatively impact ASD compensatory responses to perturbed auditory feedback. Voice fundamental frequency (F(0)) was measured while children produced an /a/ sound into a microphone. The voice signal was fed back to the subjects in real time through headphones. During production, the feedback was pitch shifted (-100 cents, 200 ms) at random intervals for 80 trials. Averaged voice F(0) responses to pitch-shifted stimuli were calculated and correlated with both mental and language abilities as tested via standardized tests. A subset of children with ASD produced larger responses to perturbed auditory feedback than TD children, while the other children with ASD produced significantly lower response magnitudes. Furthermore, robust relationships between language ability, response magnitude and time of peak magnitude were identified. Because auditory feedback helps to stabilize voice F(0) (a major acoustic cue of prosody) and individuals with ASD have problems with prosody, this study identified potential mechanisms of dysfunction in the audio-vocal system for voice pitch regulation in some children with ASD. Objectively quantifying this deficit may inform both the assessment of a subgroup of ASD children with prosody deficits, as well as remediation strategies that incorporate pitch training.

  5. How we give personalised audio feedback after summative OSCEs.

    PubMed

    Harrison, Christopher J; Molyneux, Adrian J; Blackwell, Sara; Wass, Valerie J

    2015-04-01

    Students often receive little feedback after summative objective structured clinical examinations (OSCEs) to enable them to improve their performance. Electronic audio feedback has shown promise in other educational areas. We investigated the feasibility of electronic audio feedback in OSCEs. An electronic OSCE system was designed, comprising (1) an application for iPads allowing examiners to mark in the key consultation skill domains, provide "tick-box" feedback identifying strengths and difficulties, and record voice feedback; (2) a feedback website giving students the opportunity to view/listen in multiple ways to the feedback. Acceptability of the audio feedback was investigated, using focus groups with students and questionnaires with both examiners and students. 87 (95%) students accessed the examiners' audio comments; 83 (90%) found the comments useful and 63 (68%) reported changing the way they perform a skill as a result of the audio feedback. They valued its highly personalised, relevant nature and found it much more useful than written feedback. Eighty-nine per cent of examiners gave audio feedback to all students on their stations. Although many found the method easy, lack of time was a factor. Electronic audio feedback provides timely, personalised feedback to students after a summative OSCE provided enough time is allocated to the process.

  6. Joint Acoustic and Modulation Frequency

    NASA Astrophysics Data System (ADS)

    Atlas, Les; Shamma, Shihab A.

    2003-12-01

    There is considerable evidence that our perception of sound uses important features related to underlying signal modulations. This topic has been studied extensively via perceptual experiments, yet there are few, if any, well-developed signal processing methods which capitalize on or model these effects. We begin by summarizing evidence of the importance of modulation representations from psychophysical, physiological, and other sources. The concept of a two-dimensional joint acoustic and modulation frequency representation is proposed. A simple single sinusoidal amplitude modulator of a sinusoidal carrier is then used to illustrate properties of an unconstrained and ideal joint representation. Added constraints are required to remove or reduce undesired interference terms and to provide invertibility. It is then noted that the constraints would also apply to more general and complex cases of broader modulation and carriers. Applications in single-channel speaker separation and in audio coding are used to illustrate the applicability of this joint representation. Other applications in signal analysis and filtering are suggested.
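
    One simple, unconstrained construction of such a joint representation (a sketch, not the authors' formulation): take subband envelopes from an STFT, then transform along time to obtain modulation frequency per acoustic band. The amplitude-modulated test tone mirrors the single-modulator example above.

    ```python
    import numpy as np
    from scipy.signal import stft

    def modulation_spectrogram(x, fs, nperseg=256):
        f_ac, _, Z = stft(x, fs=fs, nperseg=nperseg)
        env = np.abs(Z)                          # subband envelopes over time
        # FFT along the time-frame axis gives modulation frequency per band.
        M = np.abs(np.fft.rfft(env - env.mean(axis=1, keepdims=True), axis=1))
        frame_rate = fs / (nperseg // 2)         # default hop is nperseg // 2
        f_mod = np.fft.rfftfreq(env.shape[1], d=1.0 / frame_rate)
        return f_ac, f_mod, M                    # acoustic f, modulation f, magnitude

    # A 1 kHz carrier amplitude-modulated at 40 Hz should show energy near
    # (acoustic 1000 Hz, modulation 40 Hz).
    fs = 16000
    t = np.arange(0, 2.0, 1 / fs)
    x = (1 + 0.8 * np.sin(2 * np.pi * 40 * t)) * np.sin(2 * np.pi * 1000 * t)
    f_ac, f_mod, M = modulation_spectrogram(x, fs)
    ```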

  7. Detecting regional lung properties using audio transfer functions of the respiratory system.

    PubMed

    Mulligan, K; Adler, A; Goubran, R

    2009-01-01

    In this study, a novel instrument has been developed for measuring changes in the distribution of lung fluid in the respiratory system. The instrument consists of a speaker that inputs a 0-4 kHz white Gaussian noise (WGN) signal into a patient's mouth and an array of 4 electronic stethoscopes, linked via a fully adjustable harness, used to recover signals on the chest surface. The software system for processing the data utilizes the principles of adaptive filtering to obtain a transfer function that represents the input-output relationship for the signal as the volume of fluid in the lungs is varied. A chest phantom model was constructed to simulate the behavior of fluid-related diseases within the lungs through the injection of varying volumes of water. Tests on the phantom model were compared to healthy subjects. Results show the instrument can obtain similar transfer functions and sound propagation delays between both human and phantom chests.
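
    As a hedged sketch of the adaptive-filtering step, the code below uses a basic LMS adaptive FIR filter to identify a simulated mouth-to-chest path from a white-noise excitation; the simulated path, filter length and step size are invented, not the instrument's actual parameters.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n = 40_000
    excitation = rng.standard_normal(n)               # 0-4 kHz WGN input
    true_path = np.array([0.0, 0.5, 0.3, -0.2, 0.1])  # toy chest path
    response = np.convolve(excitation, true_path)[:n] + 0.02 * rng.standard_normal(n)

    taps, mu = 8, 0.01
    w = np.zeros(taps)                                # adaptive filter weights
    x_buf = np.zeros(taps)
    for i in range(n):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = excitation[i]
        err = response[i] - w @ x_buf                 # prediction error
        w += 2 * mu * err * x_buf                     # LMS weight update

    # After convergence, w approximates true_path (zero-padded), and its
    # FFT gives the transfer function of the (simulated) chest.
    H = np.fft.rfft(w, 256)
    ```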

  8. Automatic Modulation Classification Based on Deep Learning for Unmanned Aerial Vehicles.

    PubMed

    Zhang, Duona; Ding, Wenrui; Zhang, Baochang; Xie, Chunyu; Li, Hongguang; Liu, Chunhui; Han, Jungong

    2018-03-20

    Deep learning has recently attracted much attention due to its excellent performance in processing audio, image, and video data. However, few studies are devoted to the field of automatic modulation classification (AMC). It is one of the most well-known research topics in communication signal recognition and remains challenging for traditional methods due to complex disturbance from other sources. This paper proposes a heterogeneous deep model fusion (HDMF) method to solve the problem in a unified framework. The contributions include the following: (1) a convolutional neural network (CNN) and long short-term memory (LSTM) are combined in two different ways without prior knowledge involved; (2) a large database, including eleven types of single-carrier modulation signals with various noises as well as a fading channel, is collected with various signal-to-noise ratios (SNRs) based on a real geographical environment; and (3) experimental results demonstrate that HDMF is very capable of coping with the AMC problem and achieves much better performance than the independent networks.
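
    As a generic illustration of coupling a CNN front end with an LSTM for modulation classification (not the paper's HDMF architecture; all layer sizes and shapes are assumptions), a minimal PyTorch sketch might look like this:

        import torch
        import torch.nn as nn

        class CnnLstmAmc(nn.Module):
            """Toy CNN -> LSTM classifier for I/Q snapshots shaped (batch, 2, T)."""
            def __init__(self, n_classes=11):          # eleven modulation types
                super().__init__()
                self.cnn = nn.Sequential(
                    nn.Conv1d(2, 32, kernel_size=7, padding=3), nn.ReLU(),
                    nn.MaxPool1d(2),
                    nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
                    nn.MaxPool1d(2),
                )
                self.lstm = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
                self.head = nn.Linear(128, n_classes)

            def forward(self, x):            # x: (batch, 2, T)
                z = self.cnn(x)              # (batch, 64, T/4)
                z = z.transpose(1, 2)        # (batch, T/4, 64) for the LSTM
                _, (h, _) = self.lstm(z)     # h: (1, batch, 128)
                return self.head(h[-1])      # class logits

        model = CnnLstmAmc()
        logits = model(torch.randn(8, 2, 1024))   # 8 snapshots of 1024 I/Q samples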

  9. Automatic Modulation Classification Based on Deep Learning for Unmanned Aerial Vehicles

    PubMed Central

    Ding, Wenrui; Zhang, Baochang; Xie, Chunyu; Li, Hongguang; Liu, Chunhui; Han, Jungong

    2018-01-01

    Deep learning has recently attracted much attention due to its excellent performance in processing audio, image, and video data. However, few studies are devoted to the field of automatic modulation classification (AMC). It is one of the most well-known research topics in communication signal recognition and remains challenging for traditional methods due to complex disturbance from other sources. This paper proposes a heterogeneous deep model fusion (HDMF) method to solve the problem in a unified framework. The contributions include the following: (1) a convolutional neural network (CNN) and long short-term memory (LSTM) are combined in two different ways without prior knowledge involved; (2) a large database, including eleven types of single-carrier modulation signals with various noises as well as a fading channel, is collected with various signal-to-noise ratios (SNRs) based on a real geographical environment; and (3) experimental results demonstrate that HDMF is very capable of coping with the AMC problem and achieves much better performance than the independent networks. PMID:29558434

  10. Development of a real time bistatic radar receiver using signals of opportunity

    NASA Astrophysics Data System (ADS)

    Rainville, Nicholas

    Passive bistatic radar remote sensing offers a novel method of monitoring the Earth's surface by observing reflected signals of opportunity. The Global Positioning System (GPS) has been used as a source of signals for these observations, and the scattering properties of GPS signals from rough surfaces are well understood. Recent work has extended GPS signal reflection observations and scattering models to include communications signals such as XM radio signals. However, the communication signal reflectometry experiments to date have relied on collecting raw, high data-rate signals which are then post-processed after the end of the experiment. This thesis describes the development of a communication signal bistatic radar receiver that computes a real-time correlation waveform, from which measurements of the Earth's surface can be retrieved. The real-time bistatic receiver greatly reduces the quantity of data that must be stored to perform the remote sensing measurements, as well as offering immediate feedback. This expands the applications for the receiver to include space- and bandwidth-limited platforms such as aircraft and satellites. It also makes possible the adjustment of flight plans to the observed conditions. This real-time receiver required the development of an FPGA-based signal processor, along with the integration of commercial Satellite Digital Audio Radio System (SDARS) components. The resulting device was tested both in a lab environment as well as on NOAA WP-3D and NASA WB-57 aircraft.
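
    The core computation of such a receiver, cross-correlating a reflected channel against the direct (reference) signal to form a correlation waveform, can be sketched in a few lines; this is an offline numpy illustration of the operation the thesis implements on the FPGA:

        import numpy as np

        def correlation_waveform(reference, reflected):
            """FFT-based circular cross-correlation of one sample block."""
            n = len(reference)
            R = np.fft.fft(reference, n)
            S = np.fft.fft(reflected, n)
            return np.abs(np.fft.ifft(np.conj(R) * S))

        # Toy test: reflection delayed by 37 samples in noise
        rng = np.random.default_rng(1)
        ref = np.sign(rng.standard_normal(4096))    # binary-like broadcast signal
        refl = np.roll(ref, 37) + 0.5 * rng.standard_normal(4096)
        lag = int(np.argmax(correlation_waveform(ref, refl)))   # -> 37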

  11. Kernel-Based Sensor Fusion With Application to Audio-Visual Voice Activity Detection

    NASA Astrophysics Data System (ADS)

    Dov, David; Talmon, Ronen; Cohen, Israel

    2016-12-01

    In this paper, we address the problem of multiple-view data fusion in the presence of noise and interferences. Recent studies have approached this problem using kernel methods, relying particularly on a product of kernels constructed separately for each view. From a graph theory point of view, we analyze this fusion approach in a discrete setting. More specifically, based on a statistical model for the connectivity between data points, we propose an algorithm for the selection of the kernel bandwidth, a parameter that, as we show, has important implications for the robustness of this fusion approach to interferences. Then, we consider the fusion of audio-visual speech signals measured by a single microphone and by a video camera pointed at the face of the speaker. Specifically, we address the task of voice activity detection, i.e., the detection of speech and non-speech segments, in the presence of structured interferences such as keyboard taps and office noise. We propose an algorithm for voice activity detection based on the audio-visual signal. Simulation results show that the proposed algorithm outperforms competing fusion and voice activity detection approaches. In addition, we demonstrate that a proper selection of the kernel bandwidth indeed leads to improved performance.
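
    The product-of-kernels fusion step can be sketched compactly; the bandwidth-selection algorithm itself is the paper's contribution and is not reproduced here, so the bandwidths below are simply given:

        import numpy as np
        from scipy.spatial.distance import cdist

        def view_kernel(X, bandwidth):
            """Gaussian affinity matrix for one view; samples in rows."""
            return np.exp(-cdist(X, X, "sqeuclidean") / (2 * bandwidth ** 2))

        def fused_kernel(X_audio, X_video, bw_audio, bw_video):
            # Product of per-view kernels: large only where BOTH views agree,
            # which suppresses interferences confined to a single modality.
            K = view_kernel(X_audio, bw_audio) * view_kernel(X_video, bw_video)
            return K / K.sum(axis=1, keepdims=True)   # row-normalized (diffusion)

        # Toy usage: 200 frames, 20-dim audio and 50-dim video features
        rng = np.random.default_rng(0)
        P = fused_kernel(rng.standard_normal((200, 20)),
                         rng.standard_normal((200, 50)), 1.0, 1.0)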

  12. Auditory and audio-vocal responses of single neurons in the monkey ventral premotor cortex.

    PubMed

    Hage, Steffen R

    2018-03-20

    Monkey vocalization is a complex behavioral pattern, which is flexibly used in audio-vocal communication. A recently proposed dual neural network model suggests that cognitive control might be involved in this behavior, originating from a frontal cortical network in the prefrontal cortex and mediated via projections from the rostral portion of the ventral premotor cortex (PMvr) and motor cortex to the primary vocal motor network in the brainstem. For the rapid adjustment of vocal output to external acoustic events, strong interconnections between vocal motor and auditory sites are needed, which are present at cortical and subcortical levels. However, the role of the PMvr in audio-vocal integration processes remains unclear. In the present study, single neurons in the PMvr were recorded in rhesus monkeys (Macaca mulatta) while volitionally producing vocalizations in a visual detection task or passively listening to monkey vocalizations. Ten percent of randomly selected neurons in the PMvr modulated their discharge rate in response to acoustic stimulation with species-specific calls. More than four-fifths of these auditory neurons showed an additional modulation of their discharge rates either before and/or during the monkeys' motor production of the vocalization. Based on these audio-vocal interactions, the PMvr might be well positioned to mediate higher order auditory processing with cognitive control of the vocal motor output to the primary vocal motor network. Such audio-vocal integration processes in the premotor cortex might constitute a precursor for the evolution of complex learned audio-vocal integration systems, ultimately giving rise to human speech. Copyright © 2018 Elsevier B.V. All rights reserved.

  13. Auditory and audio-visual processing in patients with cochlear, auditory brainstem, and auditory midbrain implants: An EEG study.

    PubMed

    Schierholz, Irina; Finke, Mareike; Kral, Andrej; Büchner, Andreas; Rach, Stefan; Lenarz, Thomas; Dengler, Reinhard; Sandmann, Pascale

    2017-04-01

    There is substantial variability in speech recognition ability across patients with cochlear implants (CIs), auditory brainstem implants (ABIs), and auditory midbrain implants (AMIs). To better understand how this variability is related to central processing differences, the current electroencephalography (EEG) study compared hearing abilities and auditory-cortex activation in patients with electrical stimulation at different sites of the auditory pathway. Three different groups of patients with auditory implants (Hannover Medical School; ABI: n = 6, CI: n = 6; AMI: n = 2) performed a speeded response task and a speech recognition test with auditory, visual, and audio-visual stimuli. Behavioral performance and cortical processing of auditory and audio-visual stimuli were compared between groups. ABI and AMI patients showed prolonged response times on auditory and audio-visual stimuli compared with normal-hearing (NH) listeners and CI patients. This was confirmed by prolonged N1 latencies and reduced N1 amplitudes in ABI and AMI patients. However, patients with central auditory implants showed a remarkable gain in performance when visual and auditory input was combined, in both speech and non-speech conditions, which was reflected by a strong visual modulation of auditory-cortex activation in these individuals. In sum, the results suggest that the behavioral improvement for audio-visual conditions in central auditory implant patients is based on enhanced audio-visual interactions in the auditory cortex. These findings may provide important implications for the optimization of electrical stimulation and rehabilitation strategies in patients with central auditory prostheses. Hum Brain Mapp 38:2206-2225, 2017. © 2017 Wiley Periodicals, Inc.

  14. Multifunction audio digitizer for communications systems

    NASA Technical Reports Server (NTRS)

    Monford, L. G., Jr.

    1971-01-01

    Digitizer accomplishes both N-bit pulse code modulation (PCM) and delta modulation, and provides modulation indicating variable signal gain and variable sidetone. Other features include low package count, variable clock rate to optimize bandwidth, and easily expanded PCM output.
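
    The delta-modulation half of such a digitizer reduces each sample to one bit; a minimal sketch (fixed step size and illustrative values) of the encoder and its integrating decoder:

        import numpy as np

        def delta_modulate(x, step=0.05):
            """1-bit delta modulation: transmit only the sign of the error."""
            bits = np.zeros(len(x), dtype=np.uint8)
            est = 0.0                     # integrator state shared with decoder
            for i, sample in enumerate(x):
                bits[i] = 1 if sample >= est else 0
                est += step if bits[i] else -step
            return bits

        def delta_demodulate(bits, step=0.05):
            """Decoder: integrate the bit stream to rebuild the waveform."""
            return np.cumsum(np.where(bits == 1, step, -step))

        t = np.linspace(0, 1, 8000)
        x = 0.5 * np.sin(2 * np.pi * 5 * t)
        y = delta_demodulate(delta_modulate(x))   # staircase tracking x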

  15. Inexpensive Audio Activities: Earbud-based Sound Experiments

    NASA Astrophysics Data System (ADS)

    Allen, Joshua; Boucher, Alex; Meggison, Dean; Hruby, Kate; Vesenka, James

    2016-11-01

    Inexpensive alternatives to a number of classic introductory physics sound laboratories are presented including interference phenomena, resonance conditions, and frequency shifts. These can be created using earbuds, economical supplies such as Giant Pixie Stix® wrappers, and free software available for PCs and mobile devices. We describe two interference laboratories (beat frequency and two-speaker interference) and two resonance laboratories (quarter- and half-wavelength). Lastly, a Doppler laboratory using rotating earbuds is explained. The audio signal captured by all experiments is analyzed on free spectral analysis software and many of the experiments incorporate the unifying theme of measuring the speed of sound in air.
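
    The quantitative relations behind these laboratories are compact enough to state directly; a few lines of Python with standard textbook formulas and illustrative numbers:

        v = 343.0                      # speed of sound in air, m/s (~20 degrees C)

        # Beat frequency between two earbud tones
        f1, f2 = 440.0, 444.0
        f_beat = abs(f1 - f2)          # 4 Hz

        # Quarter-wavelength resonance (tube closed at one end): f_n = (2n-1)v/(4L)
        L = 0.25                       # tube length, m
        f_res = [(2 * n - 1) * v / (4 * L) for n in (1, 2, 3)]  # 343, 1029, 1715 Hz

        # Doppler shift for a rotating earbud moving toward a stationary listener
        f_src, v_src = 1000.0, 5.0
        f_obs = f_src * v / (v - v_src)   # ~1014.8 Hz at closest approach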

  16. Secure videoconferencing equipment switching system and method

    DOEpatents

    Hansen, Michael E [Livermore, CA

    2009-01-13

    A switching system and method are provided to facilitate use of videoconference facilities over a plurality of security levels. The system includes a switch coupled to a plurality of codecs and communication networks. Audio/Visual peripheral components are connected to the switch. The switch couples control and data signals between the Audio/Visual peripheral components and one, but not both, of the plurality of codecs. The switch additionally couples communication networks of the appropriate security level to each of the codecs. In this manner, a videoconferencing facility is provided for use on both secure and non-secure networks.

  17. Robust Audio Watermarking by Using Low-Frequency Histogram

    NASA Astrophysics Data System (ADS)

    Xiang, Shijun

    In continuation of earlier work, where the problem of time-scale modification (TSM) was studied [1] by modifying the shape of the audio time-domain histogram, here we consider the additional ingredient of resisting additive noise-like operations, such as Gaussian noise, lossy compression and low-pass filtering. In other words, we study the robustness of the watermark against both TSM and additive noise. To this end, in this paper we extract the histogram from a Gaussian-filtered low-frequency component for audio watermarking. The watermark is inserted by shaping the histogram in such a way that two consecutive bins are used as a group to hide one bit by reassigning their populations. The watermarked signals are perceptually similar to the original ones. Compared with the previous time-domain watermarking scheme [1], the proposed watermarking method is more robust against additive noise, MP3 compression, low-pass filtering, etc.
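
    The bin-pair embedding idea can be sketched as follows; this simplified numpy illustration works on a raw sample histogram, embeds a single bit, and omits the Gaussian low-pass stage and the synchronization machinery of the actual scheme:

        import numpy as np

        def embed_bit(x, bit, lo, width, ratio=1.2):
            """Embed one bit in the relative population of two adjacent
            amplitude bins A = [lo, lo+width) and B = [lo+width, lo+2*width)."""
            y = x.copy()
            a = (y >= lo) & (y < lo + width)
            b = (y >= lo + width) & (y < lo + 2 * width)
            # bit 1 -> grow bin A until count(A) >= ratio*count(B); bit 0 -> grow B
            src, dst, shift = (b, a, -width) if bit else (a, b, width)
            while src.any():
                na, nb = a.sum(), b.sum()
                if (na >= ratio * nb) if bit else (nb >= ratio * na):
                    break
                i = np.flatnonzero(src)[0]
                y[i] += shift            # shifting by one bin width moves the
                src[i] = False           # sample into the destination bin
                dst[i] = True
            return y

        def detect_bit(y, lo, width):
            na = ((y >= lo) & (y < lo + width)).sum()
            nb = ((y >= lo + width) & (y < lo + 2 * width)).sum()
            return 1 if na >= nb else 0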

  18. Loudspeaker Design and Performance Evaluation

    NASA Astrophysics Data System (ADS)

    Mäkivirta, Aki Vihtori

    A loudspeaker comprises transducers that convert an electrical driving signal into sound pressure; an enclosure that holds the transducers and whose front baffle and box contain and eliminate the rear-radiated audio signal; and electronic components. Modeling of transducers as well as enclosures is treated in Chap. 32 of this handbook. The purpose of the present chapter is to shed light on the design choices and options for the electronic circuits conditioning the electrical signal fed into loudspeaker transducers in order to optimize the acoustic performance of the loudspeaker.

  19. 47 CFR 1.4000 - Restrictions impairing reception of television broadcast signals, direct broadcast satellite...

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... things, AM radio, FM radio, amateur (“HAM”) radio, Citizen's Band (CB) radio, and Digital Audio Radio... Commission's contract copy center, and the Commission decisions will be available on the Internet. [66 FR...

  20. 47 CFR 1.4000 - Restrictions impairing reception of television broadcast signals, direct broadcast satellite...

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... things, AM radio, FM radio, amateur (“HAM”) radio, Citizen's Band (CB) radio, and Digital Audio Radio... Commission's contract copy center, and the Commission decisions will be available on the Internet. [66 FR...

  1. 47 CFR 1.4000 - Restrictions impairing reception of television broadcast signals, direct broadcast satellite...

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... things, AM radio, FM radio, amateur (“HAM”) radio, Citizen's Band (CB) radio, and Digital Audio Radio... Commission's contract copy center, and the Commission decisions will be available on the Internet. [66 FR...

  2. Robust audio-visual speech recognition under noisy audio-video conditions.

    PubMed

    Stewart, Darryl; Seymour, Rowan; Pass, Adrian; Ming, Ji

    2014-02-01

    This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements and can be used alongside many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances with corruption added to either or both of the video and audio streams using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach, and also compared to any fixed-weighted integration approach, in both clean conditions and when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.

  3. Procedural Due Process in Indian Education. Scripts for a Two-Part Filmstrip/Audio Tape Program. Curriculum Bulletin 18.03.

    ERIC Educational Resources Information Center

    Streiff, Paul R.

    In 1974 new sections added to the Code of Federal Regulations and the Indian Affairs Field Manual required the establishment of programs of student rights and responsibilities and due process procedures in schools operated or funded by the Bureau of Indian Affairs. This two-part filmstrip and audio tape program is designed to assist educators and…

  4. Audio-visual affective expression recognition

    NASA Astrophysics Data System (ADS)

    Huang, Thomas S.; Zeng, Zhihong

    2007-11-01

    Automatic affective expression recognition has attracted increasing attention from researchers in different disciplines, and it will significantly contribute to a new paradigm for human-computer interaction (affect-sensitive interfaces, socially intelligent environments) and advance research in affect-related fields including psychology, psychiatry, and education. Multimodal information integration is a process that enables humans to assess affective states robustly and flexibly. In order to understand the richness and subtleness of human emotion behavior, the computer should be able to integrate information from multiple sensors. We introduce in this paper our efforts toward machine understanding of audio-visual affective behavior, based on both deliberate and spontaneous displays. Some promising methods are presented to integrate information from both audio and visual modalities. Our experiments show the advantage of audio-visual fusion in affective expression recognition over audio-only or visual-only approaches.

  5. Automatic Detection and Classification of Audio Events for Road Surveillance Applications.

    PubMed

    Almaadeed, Noor; Asim, Muhammad; Al-Maadeed, Somaya; Bouridane, Ahmed; Beghdadi, Azeddine

    2018-06-06

    This work investigates the problem of detecting hazardous events on roads by designing an audio surveillance system that automatically detects perilous situations such as car crashes and tire skidding. In recent years, several visual surveillance systems have been proposed for road monitoring to detect accidents, with an aim to improve safety procedures in emergency cases. However, visual information alone cannot detect certain events such as car crashes and tire skidding, especially under adverse and visually cluttered weather conditions such as snowfall, rain, and fog. Consequently, the incorporation of microphones and audio event detectors based on audio processing can significantly enhance the detection accuracy of such surveillance systems. This paper proposes to combine time-domain, frequency-domain, and joint time-frequency features extracted from a class of quadratic time-frequency distributions (QTFDs) to detect events on roads through audio analysis and processing. Experiments were carried out using a publicly available dataset. The experimental results confirm the effectiveness of the proposed approach for detecting hazardous events on roads, as demonstrated by a 7% improvement in accuracy rate when compared against methods that use individual temporal and spectral features.
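
    As a flavor of the feature extraction involved, the following minimal sketch computes a few classic time-domain and frequency-domain descriptors for one audio frame; the paper's joint time-frequency features come from QTFDs, which are not reproduced here:

        import numpy as np

        def frame_features(frame, fs):
            """A few classic descriptors for one analysis frame."""
            # Time domain: energy and zero-crossing rate
            energy = float(np.mean(frame ** 2))
            zcr = float(np.mean(np.abs(np.diff(np.signbit(frame).astype(int)))))
            # Frequency domain: spectral centroid and bandwidth
            spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
            freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
            p = spec / (spec.sum() + 1e-12)
            centroid = float((freqs * p).sum())
            bandwidth = float(np.sqrt(((freqs - centroid) ** 2 * p).sum()))
            return np.array([energy, zcr, centroid, bandwidth])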

  6. Audio-visual onset differences are used to determine syllable identity for ambiguous audio-visual stimulus pairs

    PubMed Central

    ten Oever, Sanne; Sack, Alexander T.; Wheat, Katherine L.; Bien, Nina; van Atteveldt, Nienke

    2013-01-01

    Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal window of integration (TWI). This suggests that temporal cues denote whether unimodal stimuli belong together, that is, whether they should be integrated. It is not known whether temporal cues also provide information about the identity of a syllable. Since spoken syllables have naturally varying AV onset asynchronies, we hypothesize that for suboptimal AV cues presented within the TWI, information about the natural AV onset differences can aid in speech identification. To test this, we presented low-intensity auditory syllables concurrently with visual speech signals, and varied the stimulus onset asynchronies (SOA) of the AV pair, while participants were instructed to identify the auditory syllables. We found that specific speech features (e.g., voicing) were identified by relying primarily on one modality (e.g., auditory). Additionally, we showed a wide window in which visual information influenced auditory perception, which seemed even wider for congruent stimulus pairs. Finally, we found a specific response pattern across the SOA range for syllables that were not reliably identified by the unimodal cues, which we explained as the result of the use of natural onset differences between AV speech signals. This indicates that temporal cues not only provide information about the temporal integration of AV stimuli, but additionally convey information about the identity of AV pairs. These results provide a detailed behavioral basis for further neuro-imaging and stimulation studies to unravel the neurofunctional mechanisms of the audio-visual-temporal interplay within speech perception. PMID:23805110

  7. Audio-visual onset differences are used to determine syllable identity for ambiguous audio-visual stimulus pairs.

    PubMed

    Ten Oever, Sanne; Sack, Alexander T; Wheat, Katherine L; Bien, Nina; van Atteveldt, Nienke

    2013-01-01

    Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal window of integration (TWI). This suggests that temporal cues denote whether unimodal stimuli belong together, that is, whether they should be integrated. It is not known whether temporal cues also provide information about the identity of a syllable. Since spoken syllables have naturally varying AV onset asynchronies, we hypothesize that for suboptimal AV cues presented within the TWI, information about the natural AV onset differences can aid in speech identification. To test this, we presented low-intensity auditory syllables concurrently with visual speech signals, and varied the stimulus onset asynchronies (SOA) of the AV pair, while participants were instructed to identify the auditory syllables. We found that specific speech features (e.g., voicing) were identified by relying primarily on one modality (e.g., auditory). Additionally, we showed a wide window in which visual information influenced auditory perception, which seemed even wider for congruent stimulus pairs. Finally, we found a specific response pattern across the SOA range for syllables that were not reliably identified by the unimodal cues, which we explained as the result of the use of natural onset differences between AV speech signals. This indicates that temporal cues not only provide information about the temporal integration of AV stimuli, but additionally convey information about the identity of AV pairs. These results provide a detailed behavioral basis for further neuro-imaging and stimulation studies to unravel the neurofunctional mechanisms of the audio-visual-temporal interplay within speech perception.

  8. Sonification for geoscience: Exploring faults from the inside

    NASA Astrophysics Data System (ADS)

    Mair, K.; Barrett, N.

    2013-12-01

    Sonification is the use of non-speech audio to convey information or display data. It can involve techniques such as the direct transposition of a signal into the audible range, or the mapping of specific sounds to a signal or event. Since our ears are very sensitive to unusual sonic events or sound patterns, sonification provides a potentially powerful way to perceptualize (or hear) our data. The addition of 3D audio to the sonification tool palette allows us to connect more accurately to spatial data in a novel, engaging and highly accessible approach. Here we investigate the use of sonification for geoscience by sonifying the data generated in computer models of earthquake processes. We present results from our recent 3D DEM (discrete element method) models where granular debris is sheared between rough walls to simulate an evolving fault. To best appreciate the inherently 3D nature of the crushing and sliding events (continuously tracked in our models) that occur as faults slip, we use Ambisonics (a sound field recreation technology). This allows the position of individual events to be preserved, generating a virtual 3D soundscape so we can explore faults - from the inside! During sonification, events such as grain-scale fracturing, grain motions and interactions are mapped to specific sounds whose pitch, timbre, and volume reflect properties such as the depth, character, and size of the individual events. Active listeners can explore the 3D soundscape and listen to evolving processes in the data by physically moving through the space or navigating via a 3D mouse controller. The soundscape can be heard either through an array of 16 speakers or using a pair of headphones. Emergent phenomena in the models generate sound patterns that are easily spotted by experts and youngsters alike. Also, because our ears are excellent signal-to-noise filters, events are easily recognizable above the background noise. From a learning perspective, the exploration process required to interact with the data is attractive: it is curiosity driven and requires active listening to decide where to go next; and each listener has a unique experience that they choose and control. In addition, the fact that this approach uses a different sense (and part of the brain) means that the listener's appreciation will depend more on their audio awareness than on prior education or aptitude. It is also an accessible tool for the visually impaired. From a scientist's perspective, depending on the datasets in question, sonification may become a viable alternative or a complementary tool used alongside standard visualisation techniques. We anticipate significant potential for the use of sonification in the presentation, interpretation and communication of geoscience datasets in the classroom, conference hall, boardroom and public art gallery. Visitors to this poster will be encouraged to interact with and explore our 3D fault soundscape using a laptop and headphone interface.
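
    The parameter-mapping step of such a sonification is easy to illustrate; a minimal sketch mapping normalized event depth to pitch and event size to loudness (the mapping choices are illustrative, and the Ambisonics spatialization is omitted):

        import numpy as np

        def sonify_event(depth, size, fs=44100, dur=0.15,
                         f_lo=200.0, f_hi=2000.0):
            """Render one fracture event as a short sine grain."""
            # Normalized depth (0..1) maps to pitch on a log scale: deep -> low
            freq = f_hi * (f_lo / f_hi) ** np.clip(depth, 0.0, 1.0)
            amp = 0.1 + 0.9 * np.clip(size, 0.0, 1.0)   # size -> loudness
            t = np.arange(int(fs * dur)) / fs
            envelope = np.hanning(len(t))                # avoid clicks
            return amp * envelope * np.sin(2 * np.pi * freq * t)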

  9. Using computers to overcome math-phobia in an introductory course in musical acoustics

    NASA Astrophysics Data System (ADS)

    Piacsek, Andrew A.

    2002-11-01

    In recent years, the desktop computer has acquired the signal processing and visualization capabilities once obtained only with expensive specialized equipment. With the appropriate A/D card and software, a PC can behave like an oscilloscope, a real-time signal analyzer, a function generator, and a synthesizer, with both audio and visual outputs. In addition, the computer can be used to visualize specific wave behavior, such as superposition and standing waves, refraction, dispersion, etc. These capabilities make the computer an invaluable tool for teaching basic acoustic principles to students with very poor math skills. In this paper I describe my approach to teaching the introductory-level Physics of Musical Sound at Central Washington University, in which very few science students enroll. Emphasis is placed on how visualization with computers can help students appreciate and apply quantitative methods for analyzing sound.

  10. Theory, implementation and applications of nonstationary Gabor frames

    PubMed Central

    Balazs, P.; Dörfler, M.; Jaillet, F.; Holighaus, N.; Velasco, G.

    2011-01-01

    Signal analysis with classical Gabor frames leads to a fixed time–frequency resolution over the whole time–frequency plane. To overcome the limitations imposed by this rigidity, we propose an extension of Gabor theory that leads to the construction of frames with time–frequency resolution changing over time or frequency. We describe the construction of the resulting nonstationary Gabor frames and give the explicit formula for the canonical dual frame for a particular case, the painless case. We show that wavelet transforms, constant-Q transforms and more general filter banks may be modeled in the framework of nonstationary Gabor frames. Further, we present the results in the finite-dimensional case, which provides a method for implementing the above-mentioned transforms with perfect reconstruction. Finally, we elaborate on two applications of nonstationary Gabor frames in audio signal processing, namely a method for automatic adaptation to transients and an algorithm for an invertible constant-Q transform. PMID:22267893
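
    For orientation, in the painless case the frame operator reduces to a multiplication operator, so the canonical dual windows have a closed form; with g_n denoting the analysis windows and b_n the frequency step associated with window n (notation assumed here), it reads

        \tilde{g}_n(t) = \frac{g_n(t)}{\sum_{m} b_m^{-1}\, |g_m(t)|^{2}}

    and perfect reconstruction follows because the denominator is exactly the (diagonal) frame operator.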

  11. Interface Design Implications for Recalling the Spatial Configuration of Virtual Auditory Environments

    NASA Astrophysics Data System (ADS)

    McMullen, Kyla A.

    Although the concept of virtual spatial audio has existed for almost twenty-five years, only in the past fifteen years has modern computing technology enabled the real-time processing needed to deliver high-precision spatial audio. Before then, the concept of virtually walking through an auditory environment did not exist. Such an interface has numerous potential uses. Spatial audio has the potential to be used in various manners, ranging from enhancing sounds delivered in virtual gaming worlds to conveying spatial locations in real-time emergency response systems. To incorporate this technology in real-world systems, various concerns should be addressed. First, to widely incorporate spatial audio into real-world systems, head-related transfer functions (HRTFs) must be inexpensively created for each user. The present study further investigated an HRTF subjective selection procedure previously developed within our research group. Users discriminated auditory cues to subjectively select their preferred HRTF from a publicly available database. Next, the issue of training to find virtual sources was addressed. Listeners participated in a localization training experiment using their selected HRTFs. The training procedure was created from the characterization of successful search strategies in prior auditory search experiments. Search accuracy significantly improved after listeners performed the training procedure. Next, in the investigation of auditory spatial memory, listeners completed three search and recall tasks with differing recall methods. Recall accuracy significantly decreased in tasks that required the storage of sound source configurations in memory. To assess the impacts of practical scenarios, the present work assessed the performance effects of signal uncertainty, visual augmentation, and different attenuation modeling. Fortunately, source uncertainty did not affect listeners' ability to recall or identify sound sources. The present study also found that the presence of visual reference frames significantly increased recall accuracy. Additionally, the incorporation of drastic attenuation significantly improved environment recall accuracy. Through investigating the aforementioned concerns, the present study took initial steps toward guiding the design of virtual auditory environments that support spatial configuration recall.
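
    The rendering step that all of this rests on is a binaural convolution; a minimal sketch with placeholder head-related impulse responses (HRIRs) standing in for a database entry selected per listener and per source direction:

        import numpy as np
        from scipy.signal import fftconvolve

        def spatialize(mono, hrir_left, hrir_right):
            """Render a mono signal at the direction the HRIR pair encodes."""
            left = fftconvolve(mono, hrir_left)
            right = fftconvolve(mono, hrir_right)
            return np.stack([left, right], axis=1)    # (samples, 2) stereo

        # Hypothetical HRIRs; a real system would draw these from a public
        # HRTF database, ideally one the listener subjectively selected.
        rng = np.random.default_rng(0)
        hl = rng.standard_normal(256) * np.exp(-np.arange(256) / 32.0)
        hr = 0.8 * np.roll(hl, 20)      # crude interaural level/time difference
        binaural = spatialize(rng.standard_normal(44100), hl, hr)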

  12. Identification of walked-upon materials in auditory, kinesthetic, haptic, and audio-haptic conditions.

    PubMed

    Giordano, Bruno L; Visell, Yon; Yao, Hsin-Yun; Hayward, Vincent; Cooperstock, Jeremy R; McAdams, Stephen

    2012-05-01

    Locomotion generates multisensory information about walked-upon objects. How perceptual systems use such information to get to know the environment remains unexplored. The ability to identify solid (e.g., marble) and aggregate (e.g., gravel) walked-upon materials was investigated in auditory, haptic or audio-haptic conditions, and in a kinesthetic condition where tactile information was perturbed with a vibromechanical noise. Overall, identification performance was better than chance in all experimental conditions, for both solids and the better-identified aggregates. Despite large mechanical differences between the response of solids and aggregates to locomotion, for both material categories discrimination was at its worst in the auditory and kinesthetic conditions and at its best in the haptic and audio-haptic conditions. An analysis of the dominance of sensory information in the audio-haptic context supported a focus on the most accurate modality, haptics, but only for the identification of solid materials. When identifying aggregates, response biases appeared to produce a focus on the least accurate modality, kinesthesia. When walking on loose materials such as gravel, individuals do not perceive surfaces by focusing on the most accurate modality, but by focusing on the modality that would most promptly signal postural instabilities.

  13. Real World Audio

    NASA Technical Reports Server (NTRS)

    1998-01-01

    Crystal River Engineering was originally featured in Spinoff 1992 with the Convolvotron, a high-speed digital audio processing system that delivers three-dimensional sound over headphones. The Convolvotron was developed for Ames' research on virtual acoustic displays. Crystal River is now a subsidiary of Aureal Semiconductor, Inc., and together they develop and market the technology, a 3-D (three-dimensional) audio technology known commercially today as Aureal 3D (A-3D). The technology has been incorporated into video games, surround sound systems, and sound cards.

  14. 47 CFR 73.3549 - Requests for extension of time to operate without required monitors, indicating instruments, and...

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... decoders for monitoring and generating the EAS codes and Attention Signal should be made to the FCC in Washington, DC, Attention: Audio Division (radio) or Video Division (television), Media Bureau. Such requests...

  15. Comparing closed quotient in children singers' voices as measured by high-speed-imaging, electroglottography, and inverse filtering.

    PubMed

    Mecke, Ann-Christine; Sundberg, Johan; Granqvist, Svante; Echternach, Matthias

    2012-01-01

    The closed quotient, i.e., the ratio between the closed phase and the period, is commonly studied in voice research. However, the term may refer to measures derived from different methods, such as inverse filtering, electroglottography or high-speed digital imaging (HSDI). This investigation compares closed quotient data measured by these three methods in two boy singers. Each singer produced sustained tones on two different pitches and a glissando. Audio, the electroglottographic (EGG) signal, and HSDI were recorded simultaneously. The audio signal was inverse filtered by means of the Decap program; the closed phase was defined as the flat minimum portion of the flow glottogram. Glottal area was automatically measured in the high-speed images by the built-in camera software, and the closed phase was defined as the flat minimum portion of the area signal. The EGG signal was analyzed in four different ways using the MATLAB open quotient interface. The closed quotient data taken from the EGG were found to be considerably higher than those obtained from inverse filtering. Also, substantial differences were found between the closed quotients derived from HSDI and those derived from inverse filtering. The findings illustrate the importance of distinguishing between these quotients. © 2012 Acoustical Society of America.
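
    For the EGG route, one common criterion-level method defines the closed phase as the part of each cycle where the contact signal exceeds a fixed fraction of its peak-to-peak amplitude; a minimal sketch (the 35% level and the fixed-period framing are simplifying assumptions, not the study's interface):

        import numpy as np

        def closed_quotient_egg(egg, fs, f0, level=0.35):
            """Per-cycle closed quotient from an EGG signal (higher EGG =
            more vocal-fold contact) via a criterion level."""
            period = int(round(fs / f0))
            cqs = []
            for start in range(0, len(egg) - period, period):
                cycle = egg[start:start + period]
                lo, hi = cycle.min(), cycle.max()
                threshold = lo + level * (hi - lo)
                cqs.append(np.mean(cycle > threshold))  # fraction of period closed
            return np.array(cqs)

        # Toy EGG-like waveform at 300 Hz with an asymmetric pulse shape
        fs, f0 = 44100, 300.0
        t = np.arange(fs) / fs
        egg = np.maximum(np.sin(2 * np.pi * f0 * t), 0.0) ** 2
        cq = closed_quotient_egg(egg, fs, f0).mean()    # ~0.3 for this shape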

  16. Web Audio/Video Streaming Tool

    NASA Technical Reports Server (NTRS)

    Guruvadoo, Eranna K.

    2003-01-01

    In order to promote a NASA-wide educational outreach program to educate and inform the public about space exploration, NASA, at Kennedy Space Center, is seeking efficient ways to add more content to the web by streaming audio/video files. This project proposes a high-level overview of a framework for the creation, management, and scheduling of audio/video assets over the web. To support short-term goals, the prototype of a web-based tool was designed and demonstrated to automate the process of streaming audio/video files. The tool provides web-enabled user interfaces to manage video assets, create publishable schedules of video assets for streaming, and schedule the streaming events. These operations are performed on user-defined and system-derived metadata of audio/video assets stored in a relational database, while the assets reside in a separate repository. The prototype tool is designed using ColdFusion 5.0.

  17. Tensorial dynamic time warping with articulation index representation for efficient audio-template learning.

    PubMed

    Le, Long N; Jones, Douglas L

    2018-03-01

    Audio classification techniques often depend on the availability of a large labeled training dataset for successful performance. However, in many application domains of audio classification (e.g., wildlife monitoring), obtaining labeled data is still a costly and laborious process. Motivated by this observation, a technique is proposed to efficiently learn a clean template from a few labeled, but likely corrupted (by noise and interferences), data samples. This learning can be done efficiently via tensorial dynamic time warping on the articulation index-based time-frequency representations of audio data. The learned template can then be used in audio classification following the standard template-based approach. Experimental results show that the proposed approach outperforms both (1) the recurrent neural network approach and (2) the state-of-the-art in the template-based approach on a wildlife detection application with few training samples.
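
    The classical dynamic time warping underlying the method is worth seeing in code; a minimal sketch of the standard scalar/vector form (the paper's tensorial variant on articulation-index representations is beyond this sketch):

        import numpy as np

        def dtw_distance(A, B):
            """Classic DTW; A and B are (time, feature) arrays."""
            n, m = len(A), len(B)
            cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j],      # insertion
                                                       D[i, j - 1],      # deletion
                                                       D[i - 1, j - 1])  # match
            return D[n, m]

        # A template matches a time-stretched copy better than noise
        t50 = np.linspace(0, 1, 50)
        template = np.sin(2 * np.pi * 3 * t50)[:, None]
        stretched = np.sin(2 * np.pi * 3 * np.linspace(0, 1, 70))[:, None]
        noise = np.random.default_rng(0).standard_normal((70, 1))
        assert dtw_distance(template, stretched) < dtw_distance(template, noise)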

  18. Real-time implementation of an interactive jazz accompaniment system

    NASA Astrophysics Data System (ADS)

    Deshpande, Nikhil

    Modern computational algorithms and digital signal processing (DSP) are able to combine with human performers without forced or predetermined structure in order to create dynamic and real-time accompaniment systems. With modern computing power and intelligent algorithm layout and design, it is possible to achieve more detailed auditory analysis of live music. Using this information, computer code can follow and predict how a human's musical performance evolves, and use this to react in a musical manner. This project builds a real-time accompaniment system to perform together with live musicians, with a focus on live jazz performance and improvisation. The system utilizes a new polyphonic pitch detector and embeds it in an Ableton Live system - combined with Max for Live - to perform elements of audio analysis, generation, and triggering. The system also relies on tension curves and information rate calculations from the Creative Artificially Intuitive and Reasoning Agent (CAIRA) system to help understand and predict human improvisation. These metrics are vital to the core system and allow for extrapolated audio analysis. The system is able to react dynamically to a human performer, and can successfully accompany the human as an entire rhythm section.

  19. Blind estimation of reverberation time

    NASA Astrophysics Data System (ADS)

    Ratnam, Rama; Jones, Douglas L.; Wheeler, Bruce C.; O'Brien, William D.; Lansing, Charissa R.; Feng, Albert S.

    2003-11-01

    The reverberation time (RT) is an important parameter for characterizing the quality of an auditory space. Sounds in reverberant environments are subject to coloration. This affects speech intelligibility and sound localization. Many state-of-the-art audio signal processing algorithms, for example in hearing aids and telephony, are expected to have the ability to characterize the listening environment and turn on an appropriate processing strategy accordingly. Thus, a method for characterization of room RT based on passively received microphone signals represents an important enabling technology. Current RT estimators, such as Schroeder's method, depend on a controlled sound source, and thus cannot produce an online, blind RT estimate. Here, a method for estimating RT without prior knowledge of sound sources or room geometry is presented. The diffusive tail of reverberation was modeled as an exponentially damped Gaussian white noise process. The time constant of the decay, which provided a measure of the RT, was estimated using a maximum-likelihood procedure. The estimates were obtained continuously, and an order-statistics filter was used to extract the most likely RT from the accumulated estimates. The procedure was illustrated for connected speech. Results obtained for simulated and real room data are in good agreement with the real RT values.
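
    The maximum-likelihood step can be sketched for a single free-decay segment under the stated model x(n) = sigma * a**n * w(n) with w ~ N(0, 1); here a grid search stands in for the paper's optimization, and the RT follows from the fitted time constant (a 60 dB decay equals 3 ln 10 time constants):

        import numpy as np

        def blind_rt60(segment, fs, taus=np.linspace(0.05, 1.0, 200)):
            """ML decay estimate for one short free-decay segment."""
            x2 = segment.astype(float) ** 2
            n = np.arange(len(x2))
            best_tau, best_ll = taus[0], -np.inf
            for tau in taus:                  # tau: envelope time constant, s
                a = np.exp(-1.0 / (tau * fs))
                sigma2 = np.mean(x2 * a ** (-2 * n))   # ML variance given a
                ll = (-0.5 * len(x2) * np.log(sigma2)
                      - 0.5 * len(x2) * (len(x2) - 1) * np.log(a))
                if ll > best_ll:
                    best_tau, best_ll = tau, ll
            return 3.0 * np.log(10.0) * best_tau       # seconds to drop 60 dB

        # Toy test: damped Gaussian noise with tau = 0.1 s -> RT60 ~ 0.69 s
        fs = 8000
        rng = np.random.default_rng(0)
        k = np.arange(fs // 2)
        decay = np.exp(-k / (0.1 * fs)) * rng.standard_normal(len(k))
        print(blind_rt60(decay, fs))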

  20. Spatial domain entertainment audio decompression/compression

    NASA Astrophysics Data System (ADS)

    Chan, Y. K.; Tam, Ka Him K.

    2014-02-01

    The ARM7 NEON processor with a 128-bit SIMD hardware accelerator requires a peak performance of 13.99 mega-cycles per second for MP3 stereo entertainment-quality decoding. For a similar compression bit rate, OGG and AAC are preferred over MP3. The Patent Cooperation Treaty application dated 28/August/2012 describes an audio decompression scheme producing a sequence of interleaving "min to Max" and "Max to min" rising and falling segments. The number of interior audio samples bound by "min to Max" or "Max to min" can be {0|1|…|N} audio samples. The magnitudes of samples, including the bounding min and Max, are distributed as normalized constants within the 0 and 1 of the bounding magnitudes. The decompressed audio is then a "sequence of static segments" on a frame-by-frame basis. Some of these frames need to be post-processed to elevate high frequencies. The post-processing is compression-efficiency neutral, and the additional decoding complexity is only a small fraction of the overall decoding complexity, without the need for extra hardware. Compression efficiency can be expected to be very high, as the source audio has been decimated and converted to a set of data with only "segment length and corresponding segment magnitude" attributes. The PCT application describes how these two attributes are efficiently coded by its innovative coding scheme. The PCT decoding efficiency is obviously very high and decoding latency is essentially zero. Both the hardware requirement and run time are at least an order of magnitude better than MP3 variants. A side benefit is ultra-low power consumption on mobile devices. The acid test of whether such a simplistic waveform representation can indeed reproduce authentic decompressed quality is benchmarked against OGG (aoTuv Beta 6.03) using three pairs of stereo audio frames and one broadcast-like voice audio frame, with each frame consisting of 2,028 samples at a 44.1 kHz sampling frequency.

  1. Four-dimensional computed tomography based respiratory-gated radiotherapy with respiratory guidance system: analysis of respiratory signals and dosimetric comparison.

    PubMed

    Lee, Jung Ae; Kim, Chul Yong; Yang, Dae Sik; Yoon, Won Sup; Park, Young Je; Lee, Suk; Kim, Young Bum

    2014-01-01

    To investigate the effectiveness of a respiratory guidance system in four-dimensional computed tomography (4DCT) based respiratory-gated radiation therapy (RGRT) by comparing respiratory signals and the dosimetric analysis of treatment plans. The respiratory amplitude and period of free, audio device-guided, and complex system-guided breathing were evaluated in eleven patients with lung or liver cancers. The dosimetric parameters were assessed by comparing the free-breathing CT plan and the 4DCT-based 30-70% maximal intensity projection (MIP) plan. The complex system-guided breathing showed significantly less variation in respiratory amplitude and period compared to free or audio-guided breathing, regarding the root mean square errors (RMSE) of full inspiration (P = 0.031), full expiration (P = 0.007), and period (P = 0.007). The dosimetric parameters including V(5 Gy), V(10 Gy), V(20 Gy), V(30 Gy), V(40 Gy), and V(50 Gy) of normal liver or lung in the 4DCT MIP plan were superior to those of the free-breathing CT plan. The reproducibility and regularity of respiratory amplitude and period were significantly improved with the complex system-guided breathing compared to free or audio-guided breathing. In addition, the treatment plan based on the 4DCT MIP images acquired with the complex system-guided breathing showed better normal tissue sparing than that based on the free-breathing CT.

  2. Weaving together peer assessment, audios and medical vignettes in teaching medical terms.

    PubMed

    Allibaih, Mohammad; Khan, Lateef M

    2015-12-06

    The current study aims at exploring the possibility of aligning peer assessment, audiovisuals, and medical case-report extracts (vignettes) in medical terminology teaching. In addition, the study highlights the effectiveness of audio materials and medical history vignettes in preventing medical students' comprehension, listening, writing, and pronunciation errors. The study also aims at reflecting the medical students' attitudes towards the teaching and learning process. The study involved 161 medical students who received an intensive medical terminology course through audio and medical history extracts. Peer assessment and formative assessment platforms were applied through fake quizzes in a pre- and post-test manner. An 18-item survey was distributed amongst students to investigate their attitudes and feedback towards the teaching and learning process. Quantitative and qualitative data were analysed using the SPSS software. The students did better on the post-tests than on the pre-tests for both the audio and medical-vignette quizzes, with t-values of -12.09 and -13.60, respectively. Moreover, out of the 133 students, 120 students (90.22%) responded to the survey questions. The students expressed positive attitudes towards the application of audios and vignettes in the teaching and learning of medical terminology and towards the learning process. The current study revealed that the teaching and learning of medical terminology have more room for the application of advanced technologies, effective assessment platforms, and active learning strategies in higher education. It also highlights that students are capable of carrying more responsibility for assessment, feedback, and e-learning.

  3. Audio-Visual and Meaningful Semantic Context Enhancements in Older and Younger Adults.

    PubMed

    Smayda, Kirsten E; Van Engen, Kristin J; Maddox, W Todd; Chandrasekaran, Bharath

    2016-01-01

    Speech perception is critical to everyday life. Oftentimes noise can degrade a speech signal; however, because of the cues available to the listener, such as visual and semantic cues, noise rarely prevents conversations from continuing. The interaction of visual and semantic cues in aiding speech perception has been studied in young adults, but the extent to which these two cues interact for older adults has not been studied. To investigate the effect of visual and semantic cues on speech perception in older and younger adults, we recruited forty-five young adults (ages 18-35) and thirty-three older adults (ages 60-90) to participate in a speech perception task. Participants were presented with semantically meaningful and anomalous sentences in audio-only and audio-visual conditions. We hypothesized that young adults would outperform older adults across SNRs, modalities, and semantic contexts. In addition, we hypothesized that both young and older adults would receive a greater benefit from a semantically meaningful context in the audio-visual relative to audio-only modality. We predicted that young adults would receive greater visual benefit in semantically meaningful contexts relative to anomalous contexts. However, we predicted that older adults could receive a greater visual benefit in either semantically meaningful or anomalous contexts. Results suggested that in the most supportive context, that is, semantically meaningful sentences presented in the audiovisual modality, older adults performed similarly to young adults. In addition, both groups received the same amount of visual and meaningful benefit. Lastly, across groups, a semantically meaningful context provided more benefit in the audio-visual modality relative to the audio-only modality, and the presence of visual cues provided more benefit in semantically meaningful contexts relative to anomalous contexts. These results suggest that older adults can perceive speech as well as younger adults when both semantic and visual cues are available to the listener.

  4. Audio-Visual and Meaningful Semantic Context Enhancements in Older and Younger Adults

    PubMed Central

    Smayda, Kirsten E.; Van Engen, Kristin J.; Maddox, W. Todd; Chandrasekaran, Bharath

    2016-01-01

    Speech perception is critical to everyday life. Oftentimes noise can degrade a speech signal; however, because of the cues available to the listener, such as visual and semantic cues, noise rarely prevents conversations from continuing. The interaction of visual and semantic cues in aiding speech perception has been studied in young adults, but the extent to which these two cues interact for older adults has not been studied. To investigate the effect of visual and semantic cues on speech perception in older and younger adults, we recruited forty-five young adults (ages 18–35) and thirty-three older adults (ages 60–90) to participate in a speech perception task. Participants were presented with semantically meaningful and anomalous sentences in audio-only and audio-visual conditions. We hypothesized that young adults would outperform older adults across SNRs, modalities, and semantic contexts. In addition, we hypothesized that both young and older adults would receive a greater benefit from a semantically meaningful context in the audio-visual relative to audio-only modality. We predicted that young adults would receive greater visual benefit in semantically meaningful contexts relative to anomalous contexts. However, we predicted that older adults could receive a greater visual benefit in either semantically meaningful or anomalous contexts. Results suggested that in the most supportive context, that is, semantically meaningful sentences presented in the audiovisual modality, older adults performed similarly to young adults. In addition, both groups received the same amount of visual and meaningful benefit. Lastly, across groups, a semantically meaningful context provided more benefit in the audio-visual modality relative to the audio-only modality, and the presence of visual cues provided more benefit in semantically meaningful contexts relative to anomalous contexts. These results suggest that older adults can perceive speech as well as younger adults when both semantic and visual cues are available to the listener. PMID:27031343

  5. Enhanced audio-visual interactions in the auditory cortex of elderly cochlear-implant users.

    PubMed

    Schierholz, Irina; Finke, Mareike; Schulte, Svenja; Hauthal, Nadine; Kantzke, Christoph; Rach, Stefan; Büchner, Andreas; Dengler, Reinhard; Sandmann, Pascale

    2015-10-01

    Auditory deprivation and the restoration of hearing via a cochlear implant (CI) can induce functional plasticity in auditory cortical areas. How these plastic changes affect the ability to integrate combined auditory (A) and visual (V) information is not yet well understood. In the present study, we used electroencephalography (EEG) to examine whether age, temporary deafness and altered sensory experience with a CI can affect audio-visual (AV) interactions in post-lingually deafened CI users. Young and elderly CI users and age-matched normal-hearing (NH) listeners performed a speeded response task on basic auditory, visual and audio-visual stimuli. Regarding the behavioral results, a redundant signals effect, that is, faster response times to cross-modal (AV) stimuli than to either of the two modality-specific stimuli (A, V), was revealed for all groups of participants. Moreover, in all four groups, we found evidence for audio-visual integration. Regarding event-related responses (ERPs), we observed a more pronounced visual modulation of the cortical auditory response at N1 latency (approximately 100 ms after stimulus onset) in the elderly CI users when compared with young CI users and elderly NH listeners. Thus, elderly CI users showed enhanced audio-visual binding, which may be a consequence of compensatory strategies developed due to temporary deafness and/or degraded sensory input after implantation. These results indicate that the combination of aging, sensory deprivation and CI facilitates the coupling between the auditory and the visual modality. We suggest that this enhancement in multisensory interactions could be used to optimize auditory rehabilitation, especially in elderly CI users, by the application of strong audio-visually based rehabilitation strategies after implant switch-on. Copyright © 2015 Elsevier B.V. All rights reserved.

  6. Design of batch audio/video conversion platform based on JavaEE

    NASA Astrophysics Data System (ADS)

    Cui, Yansong; Jiang, Lianpin

    2018-03-01

    With the rapid development of the digital publishing industry, audio/video publishing exhibits diverse coding standards for audio and video files, massive data volumes, and other significant features. Faced with massive and diverse data, converting quickly and efficiently to a unified coding format has brought great difficulties to digital publishing organizations. In view of this demand and the present situation, this paper proposes a distributed online audio and video format conversion platform with a B/S structure, based on the Spring+SpringMVC+Mybatis development architecture and combined with the open-source FFMPEG format conversion tool. Based on the Java language, the key technologies and strategies in the platform architecture design are analyzed, and an efficient audio and video format conversion system is designed and developed, composed of a front display system, a core scheduling server, and a conversion server. The test results show that, compared with an ordinary audio and video conversion scheme, the batch audio and video format conversion platform can effectively improve the conversion efficiency of audio and video files and reduce the complexity of the work. Practice has proved that the key technology discussed in this paper can be applied in the field of large-batch file processing and has certain practical application value.
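
    The platform itself is Java-based (Spring+SpringMVC+Mybatis); as a language-neutral illustration of what a conversion-server worker does, here is a minimal Python sketch that shells out to FFMPEG (the paths, codec choices and extension list are assumptions):

        import subprocess
        from pathlib import Path

        def convert(src: Path, out_dir: Path, target_ext=".mp4") -> Path:
            """Transcode one file to the unified format by invoking ffmpeg."""
            dst = out_dir / (src.stem + target_ext)
            subprocess.run(
                ["ffmpeg", "-y", "-i", str(src),       # -y: overwrite output
                 "-c:v", "libx264", "-c:a", "aac", str(dst)],
                check=True, capture_output=True)
            return dst

        def convert_batch(in_dir: Path, out_dir: Path):
            out_dir.mkdir(parents=True, exist_ok=True)
            return [convert(p, out_dir) for p in sorted(in_dir.iterdir())
                    if p.suffix.lower() in {".avi", ".mov", ".flv", ".wmv"}]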

  7. HomeBank: An Online Repository of Daylong Child-Centered Audio Recordings

    PubMed Central

    VanDam, Mark; Warlaumont, Anne S.; Bergelson, Elika; Cristia, Alejandrina; Soderstrom, Melanie; De Palma, Paul; MacWhinney, Brian

    2017-01-01

    HomeBank is introduced here. It is a public, permanent, extensible, online database of daylong audio recorded in naturalistic environments. HomeBank serves two primary purposes. First, it is a repository for raw audio and associated files: one database requires special permissions, and another redacted database allows unrestricted public access. Associated files include metadata such as participant demographics and clinical diagnostics, automated annotations, and human-generated transcriptions and annotations. Many recordings use the child-perspective LENA recorders (LENA Research Foundation, Boulder, Colorado, United States), but various recordings and metadata can be accommodated. The HomeBank database can have both vetted and unvetted recordings, with different levels of accessibility. Additionally, HomeBank is an open repository for processing and analysis tools for HomeBank or similar data sets. HomeBank is flexible for users and contributors, making primary data available to researchers, especially those in child development, linguistics, and audio engineering. HomeBank facilitates researchers’ access to large-scale data and tools, linking the acoustic, auditory, and linguistic characteristics of children’s environments with a variety of variables including socioeconomic status, family characteristics, language trajectories, and disorders. Automated processing applied to daylong home audio recordings is now becoming widely used in early intervention initiatives, helping parents to provide richer speech input to at-risk children. PMID:27111272

  8. An ESL Audio-Script Writing Workshop

    ERIC Educational Resources Information Center

    Miller, Carla

    2012-01-01

    The roles of dialogue, collaborative writing, and authentic communication have been explored as effective strategies in second language writing classrooms. In this article, the stages of an innovative, multi-skill writing method, which embeds students' personal voices into the writing process, are explored. A 10-step ESL Audio Script Writing Model…

  9. The Audio-Visual Marketing Handbook for Independent Schools.

    ERIC Educational Resources Information Center

    Griffith, Tom

    This how-to booklet offers specific advice on producing video or slide/tape programs for marketing independent schools. Five chapters present guidelines for various stages in the process: (1) Audio-Visual Marketing in Context (aesthetics and economics of audiovisual marketing); (2) A Question of Identity (identifying the audience and deciding on…

  10. Demonstration of Detection and Ranging Using Solvable Chaos

    NASA Technical Reports Server (NTRS)

    Corron, Ned J.; Stahl, Mark T.; Blakely, Jonathan N.

    2013-01-01

    Acoustic experiments demonstrate a novel approach to ranging and detection that exploits the properties of a solvable chaotic oscillator. This nonlinear oscillator includes an ordinary differential equation and a discrete switching condition. The chaotic waveform generated by this hybrid system is used as the transmitted waveform. The oscillator admits an exact analytic solution that can be written as the linear convolution of binary symbols and a single basis function. This linear representation enables coherent reception using a simple analog matched filter, without the need for digital sampling or signal processing. An audio frequency implementation of the transmitter and receiver is described. Successful acoustic ranging measurements are presented to demonstrate the viability of the approach.
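
    As a rough digital illustration of the receiver principle, the sketch below cross-correlates a received signal against the transmitted waveform and converts the peak lag into a one-way distance. A random array stands in for the solvable-chaos waveform, and the sampling rate and echo delay are invented; the paper's receiver performs this correlation with an analog matched filter rather than in software.

        import numpy as np

        def estimate_range(tx: np.ndarray, rx: np.ndarray, fs: float, c: float = 343.0) -> float:
            corr = np.correlate(rx, tx, mode="full")     # matched-filter output
            lag = np.argmax(np.abs(corr)) - (len(tx) - 1)
            delay = lag / fs                             # round-trip delay, seconds
            return c * delay / 2.0                       # one-way distance, metres

        fs = 44_100.0
        tx = np.random.randn(4096)                       # placeholder chaotic waveform
        echo = 1200                                      # simulated round-trip delay, samples
        rx = np.concatenate([np.zeros(echo), tx]) + 0.1 * np.random.randn(4096 + echo)
        print(f"estimated range: {estimate_range(tx, rx, fs):.2f} m")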

  11. Audio-visual synchrony and spatial attention enhance processing of dynamic visual stimulation independently and in parallel: A frequency-tagging study.

    PubMed

    Covic, Amra; Keitel, Christian; Porcu, Emanuele; Schröger, Erich; Müller, Matthias M

    2017-11-01

    The neural processing of a visual stimulus can be facilitated by attending to its position or by a co-occurring auditory tone. Using frequency-tagging, we investigated whether facilitation by spatial attention and audio-visual synchrony rely on similar neural processes. Participants attended to one of two flickering Gabor patches (14.17 and 17 Hz) located in opposite lower visual fields. Gabor patches further "pulsed" (i.e. showed smooth spatial frequency variations) at distinct rates (3.14 and 3.63 Hz). Frequency-modulating an auditory stimulus at the pulse rate of one of the visual stimuli established audio-visual synchrony. Flicker and pulsed stimulation elicited stimulus-locked rhythmic electrophysiological brain responses that allowed tracking of the neural processing of the simultaneously presented Gabor patches. These steady-state responses (SSRs) were quantified in the spectral domain to examine visual stimulus processing under conditions of synchronous vs. asynchronous tone presentation and when the respective stimulus positions were attended vs. unattended. Strikingly, unique patterns of effects on pulse- and flicker-driven SSRs indicated that spatial attention and audio-visual synchrony facilitated early visual processing in parallel and via different cortical processes. We found attention effects to resemble the classical top-down gain effect, facilitating both flicker- and pulse-driven SSRs. Audio-visual synchrony, in turn, only amplified synchrony-producing stimulus aspects (i.e. pulse-driven SSRs), possibly highlighting the role of temporally co-occurring sights and sounds in bottom-up multisensory integration. Copyright © 2017 Elsevier Inc. All rights reserved.
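
    A minimal sketch of the frequency-tagging analysis idea: the SSR amplitude at each tagging frequency is read off the amplitude spectrum of an EEG epoch. The tag rates follow the stimulus rates quoted above, while the sampling rate, epoch length and synthetic signal are placeholders.

        import numpy as np

        def ssr_amplitude(epoch: np.ndarray, fs: float, f_tag: float) -> float:
            spec = np.abs(np.fft.rfft(epoch)) / len(epoch)          # amplitude spectrum
            freqs = np.fft.rfftfreq(len(epoch), d=1.0 / fs)
            return float(spec[np.argmin(np.abs(freqs - f_tag))])    # nearest bin

        fs = 512.0                                     # assumed EEG sampling rate
        t = np.arange(int(4 * fs)) / fs                # one 4-s epoch
        epoch = (np.sin(2 * np.pi * 14.17 * t)         # flicker-driven component
                 + 0.5 * np.sin(2 * np.pi * 3.14 * t)  # pulse-driven component
                 + np.random.randn(t.size))            # background EEG stand-in
        for f in (14.17, 17.0, 3.14, 3.63):
            print(f"{f:5.2f} Hz -> amplitude {ssr_amplitude(epoch, fs, f):.3f}")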

  12. Real-time implementations of acoustic signal enhancement techniques for aerial based surveillance and rescue applications

    NASA Astrophysics Data System (ADS)

    Ramos, Antonio L. L.; Shao, Zhili; Holthe, Aleksander; Sandli, Mathias F.

    2017-05-01

    The introduction of System-on-Chip (SoC) technology has brought exciting new opportunities for the development of smart, low-cost embedded systems spanning a wide range of applications. Currently available SoC devices are capable of performing high-speed digital signal processing tasks in software while featuring relatively low development costs and reduced time-to-market. Unmanned aerial vehicles (UAVs) are an application example that has shown tremendous potential in an increasing number of scenarios, ranging from leisure to surveillance as well as search and rescue missions. Video capture from UAV platforms is a relatively straightforward task that requires almost no preprocessing. However, the same does not apply to audio signals, especially in cases where the data is to be used to support real-time decision making. In fact, the enormous amount of acoustic interference from the surroundings, including the noise from the UAV's propellers, becomes a huge problem. This paper discusses a real-time implementation of the NLMS adaptive filtering algorithm applied to enhancing acoustic signals captured from UAV platforms. The model relies on a combination of acoustic sensors and a computationally inexpensive algorithm running on a digital signal processor. Given its simplicity, this solution can be incorporated into the main processing system of a UAV using SoC technology and run concurrently with other required tasks, such as flight control and communications. Simulations and real-time DSP-based implementations have shown significant signal enhancement results by efficiently mitigating the interference from the noise generated by the UAV's propellers as well as from other external noise sources.
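
    The paper does not publish code; below is a minimal Python sketch of the two-sensor NLMS canceller it describes, in which a reference microphone near the propellers supplies the noise estimate that is adaptively subtracted from the primary (signal-plus-noise) channel. The filter order, step size and synthetic signals are illustrative.

        import numpy as np

        def nlms_cancel(primary, reference, order=32, mu=0.5, eps=1e-8):
            w = np.zeros(order)                        # adaptive filter weights
            out = np.zeros(len(primary))
            for n in range(order, len(primary)):
                x = reference[n - order:n][::-1]       # reference tap vector
                y = w @ x                              # estimated propeller noise
                e = primary[n] - y                     # error = enhanced signal
                w += (mu / (eps + x @ x)) * e * x      # normalized LMS update
                out[n] = e
            return out

        fs = 8_000
        t = np.arange(2 * fs) / fs
        noise = np.random.randn(t.size)                # reference (propeller) channel
        speech = np.sin(2 * np.pi * 440 * t)           # stand-in for the target signal
        enhanced = nlms_cancel(speech + 0.8 * noise, noise)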

  13. The Sweet-Home project: audio processing and decision making in smart home to improve well-being and reliance.

    PubMed

    Vacher, Michel; Chahuara, Pedro; Lecouteux, Benjamin; Istrate, Dan; Portet, Francois; Joubert, Thierry; Sehili, Mohamed; Meillon, Brigitte; Bonnefond, Nicolas; Fabre, Sébastien; Roux, Camille; Caffiau, Sybille

    2013-01-01

    The Sweet-Home project aims at providing audio-based interaction technology that lets the user have full control over their home environment, at detecting distress situations and at easing the social inclusion of the elderly and frail population. This paper presents an overview of the project, focusing on the implemented techniques for speech and sound recognition as well as context-aware decision making under uncertainty. A user experiment in a smart home demonstrates the value of this audio-based technology.

  14. A hybrid technique for speech segregation and classification using a sophisticated deep neural network

    PubMed Central

    Nawaz, Tabassam; Mehmood, Zahid; Rashid, Muhammad; Habib, Hafiz Adnan

    2018-01-01

    Recent research on speech segregation and music fingerprinting has led to improvements in speech segregation and music identification algorithms. Speech and music segregation generally involves the identification of music followed by speech segregation. However, music segregation becomes a challenging task in the presence of noise. This paper proposes a novel method of speech segregation for unlabelled stationary noisy audio signals using the deep belief network (DBN) model. The proposed method successfully segregates a music signal from noisy audio streams. A recurrent neural network (RNN)-based hidden-layer segregation model is applied to remove stationary noise. Dictionary-based Fisher algorithms are employed for speech classification. The proposed method is tested on three datasets (TIMIT, MIR-1K, and MusicBrainz), and the results indicate the robustness of the proposed method for speech segregation. The qualitative and quantitative analyses carried out on the three datasets demonstrate the efficiency of the proposed method compared to state-of-the-art speech segregation and classification-based methods. PMID:29558485

  15. Automatic detection of obstructive sleep apnea using speech signals.

    PubMed

    Goldshtein, Evgenia; Tarasiuk, Ariel; Zigel, Yaniv

    2011-05-01

    Obstructive sleep apnea (OSA) is a common disorder, associated with anatomical abnormalities of the upper airways, that affects 5% of the population. Acoustic parameters may be influenced by the vocal tract structure and soft tissue properties. We hypothesized that speech signal properties of OSA patients would differ from those of control subjects without OSA. Using speech signal processing techniques, we explored acoustic speech features of 93 subjects who were recorded using a text-dependent speech protocol and a digital audio recorder immediately prior to a polysomnography study. Following analysis of the study, subjects were divided into OSA (n=67) and non-OSA (n=26) groups. A Gaussian mixture model-based system was developed to model and classify between the groups; discriminative features such as vocal tract length and linear prediction coefficients were selected using a feature selection technique. Specificity and sensitivity of 83% and 79% were achieved for the male OSA patients and 86% and 84% for the female OSA patients, respectively. We conclude that acoustic features from speech signals during wakefulness can detect OSA patients with good specificity and sensitivity. Such a system can be used as a basis for future development of a tool for OSA screening. © 2011 IEEE
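
    A minimal sketch of the Gaussian-mixture classification stage, assuming per-subject acoustic feature vectors have already been extracted; the feature arrays and mixture sizes are placeholders, with scikit-learn's GaussianMixture standing in for the authors' implementation.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)
        feat_osa = rng.normal(0.5, 1.0, size=(67, 8))     # placeholder feature vectors
        feat_ctrl = rng.normal(-0.5, 1.0, size=(26, 8))

        gmm_osa = GaussianMixture(n_components=4, covariance_type="diag").fit(feat_osa)
        gmm_ctrl = GaussianMixture(n_components=4, covariance_type="diag").fit(feat_ctrl)

        def classify(x: np.ndarray) -> str:
            # score() returns the mean log-likelihood over the feature frames
            return "OSA" if gmm_osa.score(x) > gmm_ctrl.score(x) else "non-OSA"

        print(classify(rng.normal(0.5, 1.0, size=(20, 8))))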

  16. Development of SIR-C Ground Calibration Equipment

    NASA Technical Reports Server (NTRS)

    Freeman, A.; Azeem, M.; Haub, D.; Sarabandi, K.

    1993-01-01

    SIR-C/X-SAR is currently scheduled for launch in April 1994. SIR-C is an L-Band and C-Band, multi-polarization spaceborne SAR system developed by NASA/JPL. X-SAR is an X-Band SAR system developed by DARA/ASI. One of the problems involved in calibrating the SIR-C instrument is to make sure that the horizontal (H) and vertical (V) polarized beams are aligned in the azimuth direction, i.e., that they are pointing in the same direction. This is important if the polarimetric performance specifications for the system are to be met. To solve this problem, we have designed and built a prototype of a low-cost ground receiver capable of recording received power from two antennas, one H-polarized, the other V-polarized. The two signals are mixed down to audio and then recorded on the left and right stereo channels of a standard audio cassette recorder. The audio cassette recording can then be played back directly into a Macintosh computer, where it is digitized. Analysis of.

  17. Comparing Binaural Pre-processing Strategies I: Instrumental Evaluation.

    PubMed

    Baumgärtel, Regina M; Krawczyk-Becker, Martin; Marquardt, Daniel; Völker, Christoph; Hu, Hongmei; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Ernst, Stephan M A; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias

    2015-12-30

    In a collaborative research project, several monaural and binaural noise reduction algorithms have been comprehensively evaluated. In this article, eight selected noise reduction algorithms were assessed using instrumental measures, with a focus on the instrumental evaluation of speech intelligibility. Four distinct, reverberant scenarios were created to reflect everyday listening situations: a stationary speech-shaped noise, a multitalker babble noise, a single interfering talker, and a realistic cafeteria noise. Three instrumental measures were employed to assess predicted speech intelligibility and predicted sound quality: the intelligibility-weighted signal-to-noise ratio, the short-time objective intelligibility measure, and the perceptual evaluation of speech quality. The results show substantial improvements in predicted speech intelligibility as well as sound quality for the proposed algorithms. The evaluated coherence-based noise reduction algorithm was able to provide improvements in predicted audio signal quality. For the tested single-channel noise reduction algorithm, improvements in intelligibility-weighted signal-to-noise ratio were observed in all but the nonstationary cafeteria ambient noise scenario. Binaural minimum variance distortionless response beamforming algorithms performed particularly well in all noise scenarios. © The Author(s) 2015.
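
    For orientation, the intelligibility-weighted SNR combines per-band SNRs using band-importance weights. The sketch below illustrates the computation with octave bands and placeholder weights (not the standardized band-importance function), and assumes separated speech and noise components are available at the algorithm output.

        import numpy as np
        from scipy.signal import butter, sosfilt

        BANDS = [(125, 250), (250, 500), (500, 1000),
                 (1000, 2000), (2000, 4000), (4000, 8000)]
        WEIGHTS = np.array([0.05, 0.15, 0.25, 0.25, 0.20, 0.10])  # illustrative, sums to 1

        def intelligibility_weighted_snr(speech, noise, fs):
            snrs = []
            for lo, hi in BANDS:
                sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
                s, n = sosfilt(sos, speech), sosfilt(sos, noise)
                snrs.append(10 * np.log10(np.sum(s ** 2) / np.sum(n ** 2)))
            return float(WEIGHTS @ np.array(snrs))

        fs = 20_000.0                                  # must exceed twice the top band edge
        rng = np.random.default_rng(0)
        speech = rng.standard_normal(int(2 * fs))
        noise = 0.5 * rng.standard_normal(int(2 * fs))
        print(intelligibility_weighted_snr(speech, noise, fs))  # roughly 6 dB here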

  18. Comparing Binaural Pre-processing Strategies I

    PubMed Central

    Krawczyk-Becker, Martin; Marquardt, Daniel; Völker, Christoph; Hu, Hongmei; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Ernst, Stephan M. A.; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias

    2015-01-01

    In a collaborative research project, several monaural and binaural noise reduction algorithms have been comprehensively evaluated. In this article, eight selected noise reduction algorithms were assessed using instrumental measures, with a focus on the instrumental evaluation of speech intelligibility. Four distinct, reverberant scenarios were created to reflect everyday listening situations: a stationary speech-shaped noise, a multitalker babble noise, a single interfering talker, and a realistic cafeteria noise. Three instrumental measures were employed to assess predicted speech intelligibility and predicted sound quality: the intelligibility-weighted signal-to-noise ratio, the short-time objective intelligibility measure, and the perceptual evaluation of speech quality. The results show substantial improvements in predicted speech intelligibility as well as sound quality for the proposed algorithms. The evaluated coherence-based noise reduction algorithm was able to provide improvements in predicted audio signal quality. For the tested single-channel noise reduction algorithm, improvements in intelligibility-weighted signal-to-noise ratio were observed in all but the nonstationary cafeteria ambient noise scenario. Binaural minimum variance distortionless response beamforming algorithms performed particularly well in all noise scenarios. PMID:26721920

  19. Spatio-temporal distribution of brain activity associated with audio-visually congruent and incongruent speech and the McGurk Effect.

    PubMed

    Pratt, Hillel; Bleich, Naomi; Mittelman, Nomi

    2015-11-01

    Spatio-temporal distributions of cortical activity to audio-visual presentations of meaningless vowel-consonant-vowels and the effects of audio-visual congruence/incongruence, with emphasis on the McGurk effect, were studied. The McGurk effect occurs when a clearly audible syllable with one consonant is presented simultaneously with a visual presentation of a face articulating a syllable with a different consonant, and the resulting percept is a syllable with a consonant other than the auditorily presented one. Twenty subjects listened to pairs of audio-visually congruent or incongruent utterances and indicated whether pair members were the same or not. Source current densities of event-related potentials to the first utterance in the pair were estimated, and effects of stimulus-response combinations, brain area, hemisphere, and clarity of visual articulation were assessed. Auditory cortex, superior parietal cortex, and middle temporal cortex were the most consistently involved areas across experimental conditions. Early (<200 msec) processing of the consonant was overall prominent in the left hemisphere, except for right hemisphere prominence in superior parietal cortex and secondary visual cortex. Clarity of visual articulation impacted activity in secondary visual cortex and Wernicke's area. McGurk perception was associated with decreased activity in primary and secondary auditory cortices and Wernicke's area before 100 msec, and increased activity around 100 msec which decreased again around 180 msec. Activity in Broca's area was unaffected by McGurk perception and was increased only to congruent audio-visual stimuli 30-70 msec following consonant onset. The results suggest left hemisphere prominence in the effects of stimulus and response conditions on eight brain areas involved in dynamically distributed parallel processing of audio-visual integration. Initially (30-70 msec), subcortical contributions to auditory cortex, superior parietal cortex, and middle temporal cortex occur. During 100-140 msec, peristriate visual influences and Wernicke's area join in the processing. Resolution of incongruent audio-visual inputs is then attempted, and if successful, McGurk perception occurs and cortical activity in the left hemisphere further increases between 170 and 260 msec.

  20. Audio-based bolt-loosening detection technique of bolt joint

    NASA Astrophysics Data System (ADS)

    Zhang, Yang; Zhao, Xuefeng; Su, Wensheng; Xue, Zhigang

    2018-03-01

    The bolt joint, one of the most common coupling structures, is widely used in electro-mechanical systems. However, it is often the weakest part of the whole system. Increasing the preload tension force can raise the reliability and strength of the bolt joint, so the pretension force is one of the most important factors in ensuring the stability of the bolt joint. Depending on how the pretension force is generated, it can be monitored through bolt torque, turn angle and elongation. Existing bolt-loosening monitoring methods all require expensive equipment, which greatly restricts their practicality. In this paper, a new audio-based bolt-loosening detection technique is proposed. The sound of the bolt being struck by a hammer is recorded on a smartphone, and the collected audio signal is classified and identified by a support vector machine (SVM) algorithm. First, a verification test was designed, and the results show that the new method can accurately identify bolt looseness. Second, various degrees of bolt loosening were identified; the results indicate that the method achieves high accuracy in multiclass classification of bolt looseness. This audio-based bolt-loosening detection technique not only reduces the required technical and professional experience but also makes bolt-loosening monitoring simpler and easier.
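
    A minimal sketch of such an audio-based pipeline: simple spectral features from hammer-tap recordings feed a multiclass SVM. The synthetic "taps", the feature set and the premise that looser bolts ring at lower frequencies are illustrative stand-ins, not the paper's data or features.

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        def features(tap: np.ndarray, fs: float) -> np.ndarray:
            spec = np.abs(np.fft.rfft(tap))
            freqs = np.fft.rfftfreq(len(tap), 1.0 / fs)
            centroid = np.sum(freqs * spec) / np.sum(spec)          # spectral centroid
            energy = np.log(np.sum(tap ** 2) + 1e-12)               # log energy
            rolloff = freqs[np.searchsorted(np.cumsum(spec), 0.85 * np.sum(spec))]
            return np.array([centroid, energy, rolloff])

        fs = 16_000.0
        X, y = [], []
        for label, f0 in enumerate([800.0, 600.0, 400.0]):          # tight -> loose
            for _ in range(30):
                t = np.arange(2048) / fs
                tap = np.exp(-40 * t) * np.sin(2 * np.pi * f0 * t) \
                      + 0.05 * np.random.randn(t.size)
                X.append(features(tap, fs)); y.append(label)

        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
        print(cross_val_score(clf, np.array(X), np.array(y), cv=5).mean())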

  1. Designing between Pedagogies and Cultures: Audio-Visual Chinese Language Resources for Australian Schools

    ERIC Educational Resources Information Center

    Yuan, Yifeng; Shen, Huizhong

    2016-01-01

    This design-based study examines the creation and development of audio-visual Chinese language teaching and learning materials for Australian schools by incorporating users' feedback and content writers' input that emerged in the designing process. Data were collected from workshop feedback of two groups of Chinese-language teachers from primary…

  2. Mathematical Audio-Podcasts for Teacher Education and School

    ERIC Educational Resources Information Center

    Schreiber, Christof; Klose, Rebecca

    2017-01-01

    Audio-podcasts offer notable opportunities for oral representation of mathematical content through digital media--not only for teacher education but also in primary schools. This article deals with the process of creating such podcasts, as well as their uses in schools, university teaching and research. We allow for various learning groups--which…

  3. An Annotated Guide to Audio-Visual Materials for Teaching Shakespeare.

    ERIC Educational Resources Information Center

    Albert, Richard N.

    Audio-visual materials, found in a variety of periodicals, catalogs, and reference works, are listed in this guide to expedite the process of finding appropriate classroom materials for a study of William Shakespeare in the classroom. Separate listings of films, filmstrips, and recordings are provided, with subdivisions for "The Plays"…

  4. Audio and Video Reflections to Promote Social Justice

    ERIC Educational Resources Information Center

    Boske, Christa

    2011-01-01

    Purpose: The purpose of this paper is to examine how 15 graduate students enrolled in a US school leadership preparation program understand issues of social justice and equity through a reflective process utilizing audio and/or video software. Design/methodology/approach: The study is based on the tradition of grounded theory. The researcher…

  5. A Lossless Multichannel Bio-Signal Compression Based on Low-Complexity Joint Coding Scheme for Portable Medical Devices

    PubMed Central

    Kim, Dong-Sun; Kwon, Jin-San

    2014-01-01

    Research on real-time health systems has received great attention during recent years, and the need for high-quality multichannel medical signal compression for personal medical product applications is increasing. The international MPEG-4 audio lossless coding (ALS) standard supports a joint channel-coding scheme for improving the compression performance of multichannel signals, and it is a very efficient compression method for multichannel biosignals. However, the computational complexity of such a multichannel coding scheme is significantly greater than that of other lossless audio encoders. In this paper, we present a multichannel hardware encoder based on a low-complexity joint-coding technique and a shared multiplier scheme for portable devices. A joint-coding decision method and a reference channel selection scheme are modified for the low-complexity joint coder. The proposed joint-coding decision method determines the optimized joint-coding operation based on the relationship between the cross correlation of residual signals and the compression ratio. The reference channel selection is designed to select a channel for the entropy coding of the joint coding. The hardware encoder operates at a 40 MHz clock frequency and supports two-channel parallel encoding for the multichannel monitoring system. Experimental results show that the compression ratio increases by 0.06%, whereas the computational complexity decreases by 20.72% compared to the MPEG-4 ALS reference software encoder. In addition, the compression ratio increases by about 11.92% compared to a single-channel-based biosignal lossless data compressor. PMID:25237900
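
    A minimal sketch of the joint-coding decision idea, assuming per-channel prediction residuals are already available: a channel is difference-coded against a reference only when their residuals are sufficiently correlated. The threshold and reference-selection rule are illustrative, not the paper's tuned values.

        import numpy as np

        def normalized_xcorr(a: np.ndarray, b: np.ndarray) -> float:
            a, b = a - a.mean(), b - b.mean()
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

        def choose_coding(residuals, threshold=0.6):
            n = len(residuals)
            corr = np.array([[normalized_xcorr(residuals[i], residuals[j])
                              for j in range(n)] for i in range(n)])
            ref = int(np.argmax(corr.sum(axis=1)))     # most-correlated channel as reference
            plan = {ch: "independent" if ch == ref or corr[ch, ref] < threshold
                    else "joint" for ch in range(n)}
            return ref, plan

        rng = np.random.default_rng(0)
        base = rng.standard_normal(1024)               # shared physiological component
        residuals = [base + 0.2 * rng.standard_normal(1024),
                     base + 0.3 * rng.standard_normal(1024),
                     rng.standard_normal(1024)]        # uncorrelated channel
        print(choose_coding(residuals))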

  6. The broadband social acoustic signaling behavior of spinner and spotted dolphins

    NASA Astrophysics Data System (ADS)

    Lammers, Marc O.; Au, Whitlow W. L.; Herzing, Denise L.

    2003-09-01

    Efforts to study the social acoustic signaling behavior of delphinids have traditionally been restricted to audio-range (<20 kHz) analyses. To explore the occurrence of communication signals at ultrasonic frequencies, broadband recordings of whistles and burst pulses were obtained from two commonly studied species of delphinids, the Hawaiian spinner dolphin (Stenella longirostris) and the Atlantic spotted dolphin (Stenella frontalis). Signals were quantitatively analyzed to establish their full bandwidth, to identify distinguishing characteristics between each species, and to determine how often they occur beyond the range of human hearing. Fundamental whistle contours were found to extend beyond 20 kHz only rarely among spotted dolphins, but with some regularity in spinner dolphins. Harmonics were present in the majority of whistles and varied considerably in their number, occurrence, and amplitude. Many whistles had harmonics that extended past 50 kHz and some reached as high as 100 kHz. The relative amplitude of harmonics and the high hearing sensitivity of dolphins to equivalent frequencies suggest that harmonics are biologically relevant spectral features. The burst pulses of both species were found to be predominantly ultrasonic, often with little or no energy below 20 kHz. The findings presented reveal that the social signals produced by spinner and spotted dolphins span the full range of their hearing sensitivity, are spectrally quite varied, and in the case of burst pulses are probably produced more frequently than reported by audio-range analyses.

  7. The broadband social acoustic signaling behavior of spinner and spotted dolphins.

    PubMed

    Lammers, Marc O; Au, Whitlow W L; Herzing, Denise L

    2003-09-01

    Efforts to study the social acoustic signaling behavior of delphinids have traditionally been restricted to audio-range (<20 kHz) analyses. To explore the occurrence of communication signals at ultrasonic frequencies, broadband recordings of whistles and burst pulses were obtained from two commonly studied species of delphinids, the Hawaiian spinner dolphin (Stenella longirostris) and the Atlantic spotted dolphin (Stenella frontalis). Signals were quantitatively analyzed to establish their full bandwidth, to identify distinguishing characteristics between each species, and to determine how often they occur beyond the range of human hearing. Fundamental whistle contours were found to extend beyond 20 kHz only rarely among spotted dolphins, but with some regularity in spinner dolphins. Harmonics were present in the majority of whistles and varied considerably in their number, occurrence, and amplitude. Many whistles had harmonics that extended past 50 kHz and some reached as high as 100 kHz. The relative amplitude of harmonics and the high hearing sensitivity of dolphins to equivalent frequencies suggest that harmonics are biologically relevant spectral features. The burst pulses of both species were found to be predominantly ultrasonic, often with little or no energy below 20 kHz. The findings presented reveal that the social signals produced by spinner and spotted dolphins span the full range of their hearing sensitivity, are spectrally quite varied, and in the case of burst pulses are probably produced more frequently than reported by audio-range analyses.

  8. Effects of Bone Vibrator Position on Auditory Spatial Perception Tasks.

    PubMed

    McBride, Maranda; Tran, Phuong; Pollard, Kimberly A; Letowski, Tomasz; McMillan, Garnett P

    2015-12-01

    This study assessed listeners' ability to localize spatially differentiated virtual audio signals delivered by bone conduction (BC) vibrators and circumaural air conduction (AC) headphones. Although the skull offers little intracranial sound wave attenuation, previous studies have demonstrated listeners' ability to localize auditory signals delivered by a pair of BC vibrators coupled to the mandibular condyle bones. The current study extended this research to other BC vibrator locations on the skull. Each participant listened to virtual audio signals originating from 16 different horizontal locations using circumaural headphones or BC vibrators placed in front of, above, or behind the listener's ears. The listener's task was to indicate the signal's perceived direction of origin. Localization accuracy with the BC front and BC top positions was comparable to that with the headphones, but responses for the BC back position were less accurate than both the headphones and BC front position. This study supports the conclusion of previous studies that listeners can localize virtual 3D signals equally well using AC and BC transducers. Based on these results, it is apparent that BC devices could be substituted for AC headphones with little to no localization performance degradation. BC headphones can be used when spatial auditory information needs to be delivered without occluding the ears. Although vibrator placement in front of the ears appears optimal from the localization standpoint, the top or back position may be acceptable from an operational standpoint or if the BC system is integrated into headgear. © 2015, Human Factors and Ergonomics Society.

  9. 16 CFR 1210.2 - Definitions.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... which this regulation applies. (d) Novelty lighter means a lighter that has entertaining audio or visual effects, or that depicts (logos, decals, art work, etc.) or resembles in physical form or function... produce a flame; and produces an audible or visual signal that will be clearly discernible when the...

  10. Automated Slide Projector

    ERIC Educational Resources Information Center

    Gould, Mauri

    1975-01-01

    Describes assembly of a moderately priced synchronized projector and cassette tape recorder using a single channel recorder with a tuned amplifier to separate voice and control tones. Construction requires familiarity with transistors and use of an oscilloscope with an audio signal generator. A picture as well as schematics is provided. (GH)

  11. Learning diagnostic models using speech and language measures.

    PubMed

    Peintner, Bart; Jarrold, William; Vergyriy, Dimitra; Richey, Colleen; Tempini, Maria Luisa Gorno; Ogar, Jennifer

    2008-01-01

    We describe results that show the effectiveness of machine learning in the automatic diagnosis of certain neurodegenerative diseases, several of which alter speech and language production. We analyzed audio from 9 control subjects and 30 patients diagnosed with one of three subtypes of Frontotemporal Lobar Degeneration. From this data, we extracted features of the audio signal and the words the patient used, which were obtained using our automated transcription technologies. We then automatically learned models that predict the diagnosis of the patient using these features. Our results show that learned models over these features predict diagnosis with accuracy significantly better than random. Future studies using higher quality recordings will likely improve these results.

  12. Single-sensor multispeaker listening with acoustic metamaterials

    PubMed Central

    Xie, Yangbo; Tsai, Tsung-Han; Konneker, Adam; Popa, Bogdan-Ioan; Brady, David J.; Cummer, Steven A.

    2015-01-01

    Designing a “cocktail party listener” that functionally mimics the selective perception of a human auditory system has been pursued over the past decades. By exploiting acoustic metamaterials and compressive sensing, we present here a single-sensor listening device that separates simultaneous overlapping sounds from different sources. The device with a compact array of resonant metamaterials is demonstrated to distinguish three overlapping and independent sources with 96.67% correct audio recognition. Segregation of the audio signals is achieved using physical layer encoding without relying on source characteristics. This hardware approach to multichannel source separation can be applied to robust speech recognition and hearing aids and may be extended to other acoustic imaging and sensing applications. PMID:26261314
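
    A minimal sketch of the underlying compressive-sensing idea: several sources reach one sensor through distinct (metamaterial-shaped) transfer functions, and a sparse representation lets the mixture be inverted from a single channel. Random matrices stand in for the measured metamaterial responses, and scikit-learn's orthogonal matching pursuit stands in for the authors' recovery method.

        import numpy as np
        from sklearn.linear_model import OrthogonalMatchingPursuit

        rng = np.random.default_rng(1)
        n_src, k, m, d = 3, 5, 120, 60            # sources, nonzeros each, measurements, atoms
        A = rng.standard_normal((m, n_src * d))   # stacked per-source sensing matrices

        coef = np.zeros(n_src * d)                # sparse ground-truth coefficients
        for i in range(n_src):
            idx = rng.choice(d, size=k, replace=False) + i * d
            coef[idx] = rng.standard_normal(k)

        y = A @ coef                              # the single-sensor measurement
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_src * k).fit(A, y)
        err = np.linalg.norm(omp.coef_ - coef) / np.linalg.norm(coef)
        print(f"relative recovery error: {err:.3f}")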

  13. ATS-6 - Television Relay Using Small Terminals Experiment

    NASA Technical Reports Server (NTRS)

    Miller, J. E.

    1975-01-01

    The Television Relay Using Small Terminals (TRUST) Experiment was designed to advance and promote the technology of broadcasting satellites. A constant envelope television FM signal was transmitted at C band to the ATS-6 earth coverage horn and retransmitted at 860 MHz through the 9-m antenna to a low-cost direct-readout ground station. The experiment demonstrated that high-quality television and audio can be received by low-cost direct-receive ground stations. Predetection bandwidths significantly less than predicted by Carson's rule can be utilized with minimal degradation of either monochrome or color pictures. Two separate techniques of dual audio channel transmission have been demonstrated to be suitable for low-cost applications.

  14. Empirical Wavelet Transform Based Features for Classification of Parkinson's Disease Severity.

    PubMed

    Oung, Qi Wei; Muthusamy, Hariharan; Basah, Shafriza Nisha; Lee, Hoileong; Vijean, Vikneswaran

    2017-12-29

    Parkinson's disease (PD) is a progressive neurodegenerative disorder that affects a large part of the population. Symptoms of PD include tremor, rigidity, slowness of movements and vocal impairments. In order to develop an effective diagnostic system, a number of algorithms have been proposed, mainly to distinguish healthy individuals from those with PD. However, most previous works were based on binary classification, with the early and advanced PD stages being treated equally. Therefore, in this work, we propose a multiclass classification with three classes of PD severity level (mild, moderate, severe) and healthy controls. The focus is to detect and classify PD using signals from wearable motion and audio sensors based on the empirical wavelet transform (EWT) and empirical wavelet packet transform (EWPT), respectively. The EWT/EWPT was applied to decompose both speech and motion data signals up to five levels. Next, several features are extracted after obtaining the instantaneous amplitudes and frequencies from the coefficients of the decomposed signals by applying the Hilbert transform. The performance of the algorithm was analysed using three classifiers: K-nearest neighbour (KNN), probabilistic neural network (PNN) and extreme learning machine (ELM). Experimental results demonstrated that our proposed approach is able to differentiate PD from non-PD subjects, including their severity level, with classification accuracies of more than 90% using EWT/EWPT-ELM on signals from motion and audio sensors, respectively. Additionally, a classification accuracy of more than 95% was achieved when EWT/EWPT-ELM was applied to the integrated information from both signals.
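
    A minimal sketch of the feature step described above, assuming one band of the EWT/EWPT decomposition is already available: the Hilbert transform yields the analytic signal, whose instantaneous amplitude and frequency are summarized into feature statistics. The sampling rate and test signal are placeholders.

        import numpy as np
        from scipy.signal import hilbert

        def hilbert_features(band: np.ndarray, fs: float) -> np.ndarray:
            analytic = hilbert(band)
            inst_amp = np.abs(analytic)                      # instantaneous amplitude
            phase = np.unwrap(np.angle(analytic))
            inst_freq = np.diff(phase) * fs / (2 * np.pi)    # instantaneous frequency, Hz
            return np.array([inst_amp.mean(), inst_amp.std(),
                             inst_freq.mean(), inst_freq.std()])

        fs = 100.0                                   # assumed motion-sensor rate
        t = np.arange(1000) / fs
        band = np.sin(2 * np.pi * 5 * t) * (1 + 0.3 * np.sin(2 * np.pi * 0.5 * t))
        print(hilbert_features(band, fs))            # mean frequency should sit near 5 Hz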

  15. Weaving together peer assessment, audios and medical vignettes in teaching medical terms

    PubMed Central

    Khan, Lateef M.

    2015-01-01

    Objectives The current study aims at exploring the possibility of aligning peer assessment, audiovisuals, and medical case-report extracts (vignettes) in medical terminology teaching. In addition, the study aims to highlight the effectiveness of audio materials and medical history vignettes in preventing medical students' comprehension, listening, writing, and pronunciation errors. The study also aims at reflecting the medical students' attitudes towards the teaching and learning process. Methods The study involved 161 medical students who received an intensive medical terminology course through audio and medical history extracts. Peer assessment and formative assessment platforms were applied through fake quizzes in a pre- and post-test manner. An 18-item survey was distributed amongst students to investigate their attitudes and feedback towards the teaching and learning process. Quantitative and qualitative data were analysed using the SPSS software. Results The students did better on the posttests than on the pretests for both the audio and medical vignette quizzes, with t-values of -12.09 and -13.60, respectively. Moreover, out of the 133 students, 120 students (90.22%) responded to the survey questions. The students expressed positive attitudes towards the application of audios and vignettes in the teaching and learning of medical terminology and towards the learning process. Conclusions The current study revealed that the teaching and learning of medical terminology have more room for the application of advanced technologies, effective assessment platforms, and active learning strategies in higher education. It also highlights that students are capable of carrying more responsibilities of assessment, feedback, and e-learning. PMID:26637986

  16. System on a chip with MPEG-4 capability

    NASA Astrophysics Data System (ADS)

    Yassa, Fathy; Schonfeld, Dan

    2002-12-01

    Current products supporting video communication applications rely on existing computer architectures. RISC processors have been used successfully in numerous applications over several decades. DSP processors have become ubiquitous in signal processing and communication applications. Real-time applications such as speech processing in cellular telephony rely extensively on the computational power of these processors. Video processors designed to implement the computationally intensive codec operations have also been used to address the high demands of video communication applications (e.g., cable set-top boxes and DVDs). This paper presents an overview of a system-on-chip (SOC) architecture used for real-time video in wireless communication applications. The SOC specifications address the system requirements imposed by the application environment. A CAM-based video processor is used to accelerate data-intensive video compression tasks such as motion estimation and filtering. Other components are dedicated to system-level data processing and audio processing. A rich set of I/Os allows the SOC to communicate with other system components such as baseband and memory subsystems.

  17. Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.

    PubMed

    Borrie, Stephanie A

    2015-03-01

    This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal (the AV advantage) has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.

  18. Structuring Broadcast Audio for Information Access

    NASA Astrophysics Data System (ADS)

    Gauvain, Jean-Luc; Lamel, Lori

    2003-12-01

    One rapidly expanding application area for state-of-the-art speech recognition technology is the automatic processing of broadcast audiovisual data for information access. Since much of the linguistic information is found in the audio channel, speech recognition is a key enabling technology which, when combined with information retrieval techniques, can be used for searching large audiovisual document collections. Audio indexing must take into account the specificities of audio data such as needing to deal with the continuous data stream and an imperfect word transcription. Other important considerations are dealing with language specificities and facilitating language portability. At Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), broadcast news transcription systems have been developed for seven languages: English, French, German, Mandarin, Portuguese, Spanish, and Arabic. The transcription systems have been integrated into prototype demonstrators for several application areas such as audio data mining, structuring audiovisual archives, selective dissemination of information, and topic tracking for media monitoring. As examples, this paper addresses the spoken document retrieval and topic tracking tasks.

  19. Sonification for geoscience: Listening to faults from the inside

    NASA Astrophysics Data System (ADS)

    Barrett, Natasha; Mair, Karen

    2014-05-01

    Here we investigate the use of sonification for geoscience by sonifying the data generated in computer models of earthquake processes. Using mainly parameter mapping sonification, we explore data from our recent 3D DEM (discrete element method) models where granular debris is sheared between rough walls to simulate an evolving fault (e.g. Mair and Abe, 2011). To best appreciate the inherently 3D nature of the crushing and sliding events (continuously tracked in our models) that occur as faults slip, we use Ambisonics (a sound field recreation technology). This allows the position of individual events to be preserved, generating a virtual 3D soundscape so we can explore faults from the inside. The addition of 3D audio to the sonification tool palette further allows us to connect to spatial data more accurately in a novel and engaging manner. During sonification, events such as grain-scale fracturing, grain motions and interactions are mapped to specific sounds whose pitch, timbre, and volume reflect properties such as the depth, character, and size of the individual events. Our interactive and real-time approaches allow the listener to actively explore the data in time and space, listening to evolving processes by navigating through the spatial data via a 3D mouse controller. The soundscape can be heard either through an array of speakers or using a pair of headphones. Emergent phenomena in the models generate clear sound patterns that are easily spotted. Also, because our ears are excellent signal-to-noise filters, events are recognizable above the background noise. Although these features may be detectable visually, using a different sense (and part of the brain) gives a fresh perspective and facilitates a rapid appreciation of 'signals' through audio awareness, rather than specific scientific training. For this reason we anticipate significant potential for the future use of sonification in the presentation, interpretation and communication of geoscience datasets to both experts and the general public.
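
    A minimal sketch of parameter-mapping sonification in this spirit: each simulated fault event becomes a short tone whose pitch and loudness are driven by event depth and size. The mapping ranges are invented for illustration, and the Ambisonics spatialization is omitted.

        import numpy as np

        FS = 44_100

        def event_tone(depth_km: float, size: float, dur: float = 0.15) -> np.ndarray:
            t = np.arange(int(FS * dur)) / FS
            freq = np.interp(depth_km, [0.0, 10.0], [1200.0, 200.0])  # deeper -> lower pitch
            amp = np.interp(size, [0.0, 1.0], [0.05, 0.9])            # bigger -> louder
            return amp * np.sin(2 * np.pi * freq * t) * np.hanning(t.size)

        def sonify(events, total_s: float) -> np.ndarray:
            out = np.zeros(int(FS * total_s))
            for onset_s, depth_km, size in events:    # (onset s, depth km, size 0-1)
                tone = event_tone(depth_km, size)
                i = int(FS * onset_s)
                out[i:i + tone.size] += tone[:max(0, out.size - i)]
            return out

        audio = sonify([(0.2, 2.0, 0.8), (0.9, 7.5, 0.3), (1.4, 5.0, 0.6)], total_s=2.0)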

  20. One size does not fit all: older adults benefit from redundant text in multimedia instruction

    PubMed Central

    Fenesi, Barbara; Vandermorris, Susan; Kim, Joseph A.; Shore, David I.; Heisz, Jennifer J.

    2015-01-01

    The multimedia design of presentations typically ignores that younger and older adults have varying cognitive strengths and weaknesses. We examined whether differential instructional design may enhance learning in these populations. Younger and older participants viewed one of three computer-based presentations: Audio only (narration), Redundant (audio narration with redundant text), or Complementary (audio narration with non-redundant text and images). Younger participants learned better when audio narration was paired with relevant images compared to when audio narration was paired with redundant text. However, older participants learned best when audio narration was paired with redundant text. Younger adults, who presumably have a higher working memory capacity (WMC), appear to benefit more from complementary information that may drive deeper conceptual processing. In contrast, older adults learn better from presentations that support redundant coding across modalities, which may help mitigate the effects of age-related decline in WMC. Additionally, several misconceptions of design quality appeared across age groups: both younger and older participants positively rated less effective designs. Findings suggest that one-size does not fit all, with older adults requiring unique multimedia design tailored to their cognitive abilities for effective learning. PMID:26284000

  1. One size does not fit all: older adults benefit from redundant text in multimedia instruction.

    PubMed

    Fenesi, Barbara; Vandermorris, Susan; Kim, Joseph A; Shore, David I; Heisz, Jennifer J

    2015-01-01

    The multimedia design of presentations typically ignores that younger and older adults have varying cognitive strengths and weaknesses. We examined whether differential instructional design may enhance learning in these populations. Younger and older participants viewed one of three computer-based presentations: Audio only (narration), Redundant (audio narration with redundant text), or Complementary (audio narration with non-redundant text and images). Younger participants learned better when audio narration was paired with relevant images compared to when audio narration was paired with redundant text. However, older participants learned best when audio narration was paired with redundant text. Younger adults, who presumably have a higher working memory capacity (WMC), appear to benefit more from complementary information that may drive deeper conceptual processing. In contrast, older adults learn better from presentations that support redundant coding across modalities, which may help mitigate the effects of age-related decline in WMC. Additionally, several misconceptions of design quality appeared across age groups: both younger and older participants positively rated less effective designs. Findings suggest that one-size does not fit all, with older adults requiring unique multimedia design tailored to their cognitive abilities for effective learning.

  2. Audio aided electro-tactile perception training for finger posture biofeedback.

    PubMed

    Vargas, Jose Gonzalez; Yu, Wenwei

    2008-01-01

    Visual information is a prerequisite for most biofeedback studies. The aim of this study is to explore how audio-aided training helps in the learning of dynamic electro-tactile perception without any visual feedback. In this research, the electrical stimulation patterns associated with the experimenter's finger postures and motions were presented to the subjects. Along with the electrical stimulation patterns, two different types of information on finger postures and motions, verbal and audio, were presented to the verbal training subject group (group 1) and the audio training subject group (group 2), respectively. The results showed an improvement in the ability to distinguish and memorize electrical stimulation patterns corresponding to finger postures and motions without visual feedback; with the aid of audio tones, learning was faster and perception became more precise after training. Thus, this study clarified that, as a substitute for visual presentation, auditory information can help effectively in the formation of electro-tactile perception. Further research is needed to clarify the difference between visually guided and audio-aided training in terms of information compilation, post-training effect, and robustness of the perception.

  3. Effects of lips and hands on auditory learning of second-language speech sounds.

    PubMed

    Hirata, Yukari; Kelly, Spencer D

    2010-04-01

    Previous research has found that auditory training helps native English speakers to perceive phonemic vowel length contrasts in Japanese, but their performance did not reach native levels after training. Given that multimodal information, such as lip movement and hand gesture, influences many aspects of native language processing, the authors examined whether multimodal input helps to improve native English speakers' ability to perceive Japanese vowel length contrasts. Sixty native English speakers participated in 1 of 4 types of training: (a) audio-only; (b) audio-mouth; (c) audio-hands; and (d) audio-mouth-hands. Before and after training, participants were given phoneme perception tests that measured their ability to identify short and long vowels in Japanese (e.g., /kato/ vs. /kato/). Although all 4 groups improved from pre- to posttest (replicating previous research), the participants in the audio-mouth condition improved more than those in the audio-only condition, whereas the 2 conditions involving hand gestures did not. Seeing lip movements during training significantly helps learners to perceive difficult second-language phonemic contrasts, but seeing hand gestures does not. The authors discuss possible benefits and limitations of using multimodal information in second-language phoneme learning.

  4. Audio-visual presentation of information for informed consent for participation in clinical trials.

    PubMed

    Synnot, Anneliese; Ryan, Rebecca; Prictor, Megan; Fetherstonhaugh, Deirdre; Parker, Barbara

    2014-05-09

    Informed consent is a critical component of clinical research. Different methods of presenting information to potential participants of clinical trials may improve the informed consent process. Audio-visual interventions (presented, for example, on the Internet or on DVD) are one such method. We updated a 2008 review of the effects of these interventions for informed consent for trial participation. To assess the effects of audio-visual information interventions regarding informed consent compared with standard information or placebo audio-visual interventions regarding informed consent for potential clinical trial participants, in terms of their understanding, satisfaction, willingness to participate, and anxiety or other psychological distress. We searched: the Cochrane Central Register of Controlled Trials (CENTRAL), The Cochrane Library, issue 6, 2012; MEDLINE (OvidSP) (1946 to 13 June 2012); EMBASE (OvidSP) (1947 to 12 June 2012); PsycINFO (OvidSP) (1806 to June week 1 2012); CINAHL (EbscoHOST) (1981 to 27 June 2012); Current Contents (OvidSP) (1993 Week 27 to 2012 Week 26); and ERIC (Proquest) (searched 27 June 2012). We also searched reference lists of included studies and relevant review articles, and contacted study authors and experts. There were no language restrictions. We included randomised and quasi-randomised controlled trials comparing audio-visual information alone, or in conjunction with standard forms of information provision (such as written or verbal information), with standard forms of information provision or placebo audio-visual information, in the informed consent process for clinical trials. Trials involved individuals or their guardians asked to consider participating in a real or hypothetical clinical study. (In the earlier version of this review we only included studies evaluating informed consent interventions for real studies). Two authors independently assessed studies for inclusion and extracted data. We synthesised the findings using meta-analysis, where possible, and narrative synthesis of results. We assessed the risk of bias of individual studies and considered the impact of the quality of the overall evidence on the strength of the results. We included 16 studies involving data from 1884 participants. Nine studies included participants considering real clinical trials, and eight included participants considering hypothetical clinical trials, with one including both. All studies were conducted in high-income countries.There is still much uncertainty about the effect of audio-visual informed consent interventions on a range of patient outcomes. However, when considered across comparisons, we found low to very low quality evidence that such interventions may slightly improve knowledge or understanding of the parent trial, but may make little or no difference to rate of participation or willingness to participate. Audio-visual presentation of informed consent may improve participant satisfaction with the consent information provided. However its effect on satisfaction with other aspects of the process is not clear. There is insufficient evidence to draw conclusions about anxiety arising from audio-visual informed consent. We found conflicting, very low quality evidence about whether audio-visual interventions took more or less time to administer. 
No study measured researcher satisfaction with the informed consent process, nor ease of use. The evidence from real clinical trials was rated as low quality for most outcomes, and for hypothetical studies, very low. We note, however, that this was in large part due to poor study reporting, the hypothetical nature of some studies and low participant numbers, rather than inconsistent results between studies or confirmed poor trial quality. We do not believe that any studies were funded by organisations with a vested interest in the results. The value of audio-visual interventions as a tool for helping to enhance the informed consent process for people considering participating in clinical trials remains largely unclear, although trends are emerging with regard to improvements in knowledge and satisfaction. Many relevant outcomes have not been evaluated in randomised trials. Triallists should continue to explore innovative methods of providing information to potential trial participants during the informed consent process, mindful of the range of outcomes that the intervention should be designed to achieve, and balancing the resource implications of intervention development and delivery against the purported benefits of any intervention. More trials, adhering to CONSORT standards, and conducted in settings and populations underserved in this review, i.e. low- and middle-income countries and people with low literacy, would strengthen the results of this review and broaden its applicability. Assessing process measures, such as time taken to administer the intervention and researcher satisfaction, would inform the implementation of audio-visual consent materials.

  5. Applying emerging digital video interface standards to airborne avionics sensor and digital map integrations: benefits outweigh the initial costs

    NASA Astrophysics Data System (ADS)

    Kuehl, C. Stephen

    1996-06-01

    Video signal system performance can be compromised in a military aircraft cockpit management system (CMS) by the tailoring of vintage Electronics Industries Association (EIA) RS170 and RS343A video interface standards. Video analog interfaces degrade when induced system noise is present. Further signal degradation has traditionally been associated with signal data conversions between avionics sensor outputs and the cockpit display system. If the CMS engineering process is not carefully applied during the avionics video and computing architecture development, extensive and costly redesign will occur when visual sensor technology upgrades are incorporated. Close monitoring of and technical involvement in video standards groups provides the knowledge base necessary for avionics systems engineering organizations to architect adaptable and extendible cockpit management systems. With the Federal Communications Commission (FCC) in the process of adopting the Digital HDTV Grand Alliance System standard proposed by the Advanced Television Systems Committee (ATSC), the entertainment and telecommunications industries are adopting and supporting the emergence of new serial/parallel digital video interfaces and data compression standards that will drastically alter present NTSC-M video processing architectures. The re-engineering of the U.S. broadcasting system must initially preserve the electronic equipment wiring networks within broadcast facilities to make the transition to HDTV affordable. International committee activities in technical forums like the ITU-R (former CCIR), ANSI/SMPTE, IEEE, and ISO/IEC are establishing global consensus on video signal parameterizations that support a smooth transition from existing analog-based broadcasting facilities to fully digital computerized systems. An opportunity exists for implementing these new video interface standards over existing video coax/triax cabling in military aircraft cockpit management systems. Reductions in signal conversion processing steps, major improvements in video noise reduction, and an added capability to pass audio/embedded digital data within the digital video signal stream are the significant performance increases associated with the incorporation of digital video interface standards. By analyzing the historical progression of military CMS developments, establishing a systems engineering process for CMS design, tracing the commercial evolution of video signal standardization, adopting commercial video signal terminology/definitions, and comparing/contrasting CMS architecture modifications using digital video interfaces, this paper provides a technical explanation of how a systems engineering approach to video interface standardization can result in extendible and affordable cockpit management systems.

  6. Compression of surface myoelectric signals using MP3 encoding.

    PubMed

    Chan, Adrian D C

    2011-01-01

    The potential of MP3 compression of surface myoelectric signals is explored in this paper. MP3 compression is a perceptually based encoding scheme, traditionally used to compress audio signals. The ubiquity of MP3 compression (e.g., in portable consumer electronics and internet applications) makes it an attractive option for remote monitoring and telemedicine applications. The effects of muscle site and contraction type are examined at different MP3 encoding bitrates. Results demonstrate that MP3 compression is sensitive to the myoelectric signal bandwidth, with larger signal distortion associated with myoelectric signals that have higher bandwidths. Compared to other myoelectric signal compression techniques reported previously (embedded zero-tree wavelet compression and adaptive differential pulse code modulation), MP3 compression demonstrates superior performance (i.e., lower percent residual differences for the same compression ratios).
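
    A minimal sketch of the evaluation loop, assuming the ffmpeg CLI is available for MP3 encoding/decoding and that the source WAV uses an MP3-supported sample rate: a record is round-tripped at a chosen bitrate and the percent residual difference (PRD) is computed. The file name is a placeholder, and a cross-correlation step roughly compensates the codec delay.

        import subprocess
        import numpy as np
        from scipy.io import wavfile
        from scipy.signal import correlate

        def mp3_roundtrip_prd(wav_in: str, bitrate: str = "64k") -> float:
            subprocess.run(["ffmpeg", "-y", "-i", wav_in, "-b:a", bitrate, "tmp.mp3"],
                           check=True)
            subprocess.run(["ffmpeg", "-y", "-i", "tmp.mp3", "tmp.wav"], check=True)
            _, x = wavfile.read(wav_in)
            _, y = wavfile.read("tmp.wav")
            x, y = x.astype(float), y.astype(float)
            lag = int(np.argmax(correlate(y, x, mode="full"))) - (len(x) - 1)
            y = y[lag:] if lag > 0 else y              # crude codec-delay alignment
            x = x[-lag:] if lag < 0 else x
            n = min(len(x), len(y))
            x, y = x[:n], y[:n]
            return 100.0 * np.sqrt(np.sum((x - y) ** 2) / np.sum(x ** 2))

        print(mp3_roundtrip_prd("emg_record.wav", "64k"))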

  7. A Robust Approach For Acoustic Noise Suppression In Speech Using ANFIS

    NASA Astrophysics Data System (ADS)

    Martinek, Radek; Kelnar, Michal; Vanus, Jan; Bilik, Petr; Zidek, Jan

    2015-11-01

    The authors of this article deal with the implementation of a combination of fuzzy-system and artificial intelligence techniques in the application area of non-linear noise and interference suppression. The structure used is called an Adaptive Neuro Fuzzy Inference System (ANFIS). This system finds practical use mainly in audio telephone (mobile) communication in noisy environments (transport, production halls, sports matches, etc.). Experimental methods based on the two-input adaptive noise cancellation concept were clearly outlined. Within the experiments carried out, the authors created, based on the ANFIS structure, a comprehensive system for adaptive suppression of the unwanted background interference that occurs in audio communication and degrades the audio signal. The system designed has been tested on real voice signals. This article presents the investigation and comparison of three distinct approaches to noise cancellation in speech: LMS (least mean squares) and RLS (recursive least squares) adaptive filtering, and ANFIS. A careful review of the literature indicated the importance of non-linear adaptive algorithms over linear ones in noise cancellation. It was concluded that the ANFIS approach had the best overall performance, as it efficiently cancelled noise even in highly noise-degraded speech. Results were drawn from the successful experimentation; subjective tests were used to analyse comparative performance, while objective tests were used to validate it. The algorithms were implemented in Matlab to justify the claims and determine their relative performances.
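
    Of the three approaches compared, LMS is the simplest to state and makes the two-input noise cancellation structure concrete. A minimal sketch (the tap count, step size, and toy signals are illustrative assumptions, not values from the article):

```python
import numpy as np

def lms_noise_canceller(primary, reference, n_taps=32, mu=0.01):
    """Two-input adaptive noise cancellation with the LMS algorithm.

    primary   -- speech + noise picked up at the primary sensor
    reference -- correlated noise picked up at the reference sensor
    Returns the error signal e, which approximates the clean speech.
    """
    w = np.zeros(n_taps)               # adaptive filter weights
    e = np.zeros(len(primary))         # enhanced output
    for n in range(n_taps, len(primary)):
        x = reference[n - n_taps:n][::-1]   # most recent reference samples
        y = w @ x                           # estimate of the noise component
        e[n] = primary[n] - y               # subtract estimated noise
        w += 2 * mu * e[n] * x              # LMS weight update
    return e

# Toy example: a sine "speech" signal corrupted by filtered white noise.
rng = np.random.default_rng(1)
fs = 8000
t = np.arange(2 * fs) / fs
speech = np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(t.size)
primary = speech + np.convolve(noise, [0.6, 0.3, 0.1], mode="same")
enhanced = lms_noise_canceller(primary, noise)   # approximates `speech`
```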

  8. 14 CFR 25.1457 - Cockpit voice recorders.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 14 Aeronautics and Space 1 2014-01-01 2014-01-01 false Cockpit voice recorders. 25.1457 Section 25... recorders. (a) Each cockpit voice recorder required by the operating rules of this chapter must be approved... interphone system. (4) Voice or audio signals identifying navigation or approach aids introduced into a...

  9. 14 CFR 25.1457 - Cockpit voice recorders.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 14 Aeronautics and Space 1 2013-01-01 2013-01-01 false Cockpit voice recorders. 25.1457 Section 25... recorders. (a) Each cockpit voice recorder required by the operating rules of this chapter must be approved... interphone system. (4) Voice or audio signals identifying navigation or approach aids introduced into a...

  10. 14 CFR 29.1457 - Cockpit voice recorders.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 14 Aeronautics and Space 1 2012-01-01 2012-01-01 false Cockpit voice recorders. 29.1457 Section 29... recorders. (a) Each cockpit voice recorder required by the operating rules of this chapter must be approved... interphone system. (4) Voice or audio signals identifying navigation or approach aids introduced into a...

  11. 14 CFR 29.1457 - Cockpit voice recorders.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 14 Aeronautics and Space 1 2013-01-01 2013-01-01 false Cockpit voice recorders. 29.1457 Section 29... recorders. (a) Each cockpit voice recorder required by the operating rules of this chapter must be approved... interphone system. (4) Voice or audio signals identifying navigation or approach aids introduced into a...

  12. 14 CFR 25.1457 - Cockpit voice recorders.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 14 Aeronautics and Space 1 2012-01-01 2012-01-01 false Cockpit voice recorders. 25.1457 Section 25... recorders. (a) Each cockpit voice recorder required by the operating rules of this chapter must be approved... interphone system. (4) Voice or audio signals identifying navigation or approach aids introduced into a...

  13. An improved nuclear magnetic resonance spectrometer

    NASA Technical Reports Server (NTRS)

    Elleman, D. D.; Manatt, S. L.

    1967-01-01

    A cylindrical sample container provides a high degree of nuclear stabilization to a nuclear magnetic resonance /nmr/ spectrometer. It is placed coaxially about the nmr insert and contains a reference sample that gives a signal suitable for locking the field and frequency of an nmr spectrometer with a simple audio modulation system.

  14. 47 CFR 101.91 - Involuntary relocation procedures.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... engineering, equipment, site and FCC fees, as well as any legitimate and prudent transaction expenses incurred..., reliability is measured by the percent of time the bit error rate (BER) exceeds a desired value, and for analog or digital voice transmissions, it is measured by the percent of time that audio signal quality...

  15. Speech Music Discrimination Using Class-Specific Features

    DTIC Science & Technology

    2004-08-01

    Speech Music Discrimination Using Class-Specific Features Thomas Beierholm...between speech and music. Feature extraction is class-specific and can therefore be tailored to each class, meaning that segment size, model orders...interest. Some of the applications of audio signal classification are speech/music classification [1], acoustical environmental classification [2][3

  16. A Comparative Study of Compression Video Technology.

    ERIC Educational Resources Information Center

    Keller, Chris A.; And Others

    The purpose of this study was to provide an overview of compression devices used to increase the cost effectiveness of teleconferences by reducing satellite bandwidth requirements for the transmission of television pictures and accompanying audio signals. The main body of the report describes the comparison study of compression rates and their…

  17. Audio-visual speech perception: a developmental ERP investigation

    PubMed Central

    Knowland, Victoria CP; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael SC

    2014-01-01

    Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language learning. We therefore explored this at the neural level. The event-related potential (ERP) technique has been used to assess the mechanisms of audio-visual speech perception in adults, with visual cues reliably modulating auditory ERP responses to speech. Previous work has shown congruence-dependent shortening of auditory N1/P2 latency and congruence-independent attenuation of amplitude in the presence of auditory and visual speech signals, compared to auditory alone. The aim of this study was to chart the development of these well-established modulatory effects over mid-to-late childhood. Experiment 1 employed an adult sample to validate a child-friendly stimulus set and paradigm by replicating previously observed effects of N1/P2 amplitude and latency modulation by visual speech cues; it also revealed greater attenuation of component amplitude given incongruent audio-visual stimuli, pointing to a new interpretation of the amplitude modulation effect. Experiment 2 used the same paradigm to map cross-sectional developmental change in these ERP responses between 6 and 11 years of age. The effect of amplitude modulation by visual cues emerged over development, while the effect of latency modulation was stable over the child sample. These data suggest that auditory ERP modulation by visual speech represents separable underlying cognitive processes, some of which show earlier maturation than others over the course of development. PMID:24176002

  18. The plastic ear and perceptual relearning in auditory spatial perception

    PubMed Central

    Carlile, Simon

    2014-01-01

    The auditory system of adult listeners has been shown to accommodate to altered spectral cues to sound location, which presumably provides the basis for recalibration to changes in the shape of the ear over a lifetime. Here we review the role of auditory and non-auditory inputs to the perception of sound location and consider a range of recent experiments looking at the role of non-auditory inputs in the process of accommodation to these altered spectral cues. A number of studies have used small ear molds to modify the spectral cues, which results in significant degradation of localization performance. Following chronic exposure (10–60 days), performance recovers to some extent, and recent work has demonstrated that this occurs for both audio-visual and audio-only regions of space. This begs the question of what the teacher signal is for this remarkable functional plasticity in the adult nervous system. Following a brief review of the influence of the motor state on auditory localization, we consider the potential role of auditory-motor learning in the perceptual recalibration of the spectral cues. Several recent studies have considered how multi-modal and sensory-motor feedback might influence accommodation to altered spectral cues produced by ear molds or through virtual auditory space stimulation using non-individualized spectral cues. The work with ear molds demonstrates that a relatively short period of training involving audio-motor feedback (5–10 days) significantly improved both the rate and extent of accommodation to altered spectral cues. This has significant implications not only for the mechanisms by which this complex sensory information is encoded to provide spatial cues but also for adaptive training to altered auditory inputs. The review concludes by considering the implications for rehabilitative training with hearing aids and cochlear prostheses. PMID:25147497

  19. Multisensory Rehabilitation Training Improves Spatial Perception in Totally but Not Partially Visually Deprived Children

    PubMed Central

    Cappagli, Giulia; Finocchietti, Sara; Baud-Bovy, Gabriel; Cocchi, Elena; Gori, Monica

    2017-01-01

    Since it has been shown that spatial development can be delayed in blind children, focused sensorimotor training that associates auditory and motor information might be used to prevent the risk of spatial-related developmental delays or impairments from an early age. With this aim, we proposed a new technological device based on the implicit link between action and perception: ABBI (Audio Bracelet for Blind Interaction) is an audio bracelet that produces a sound when a movement occurs, allowing the substitution of the visuo-motor association with a new audio-motor association. In this study, we assessed the effects of an extensive but entertaining sensorimotor training with ABBI on the development of spatial hearing in a group of seven 3- to 5-year-old children with congenital blindness (n = 2; light perception or no perception of light) or low vision (n = 5; visual acuity range 1.1–1.7 LogMAR). The training required the participants to play several spatial games individually and/or together with the psychomotor therapist 1 h per week for 3 months: the spatial games consisted of exercises meant to train their ability to associate auditory and motor-related signals from their body, in order to foster the development of multisensory processes. We measured spatial performance by asking participants to indicate the position of a single fixed (static condition) or moving (dynamic condition) sound source on a vertical sensorized surface. We found that the spatial performance of congenitally blind but not low-vision children improved after the training, indicating that early interventions with science-driven devices based on multisensory capabilities can provide consistent advancements in therapeutic interventions, improving the quality of life of children with visual disability. PMID:29097987

  20. Audio spectrum and sound pressure levels vary between pulse oximeters.

    PubMed

    Chandra, Deven; Tessler, Michael J; Usher, John

    2006-01-01

    The variable-pitch pulse oximeter is an important intraoperative patient monitor. Our ability to hear its auditory signal depends on its acoustical properties and our hearing. This study quantitatively describes the audio spectrum and sound pressure levels of the monitoring tones produced by five variable-pitch pulse oximeters. We compared the Datex-Ohmeda Capnomac Ultima, Hewlett-Packard M1166A, Datex-Engstrom AS/3, Ohmeda Biox 3700, and Datex-Ohmeda 3800 oximeters. Three machines of each of the five models were assessed for sound pressure levels (using a precision sound level meter) and audio spectrum (using a Hanning-windowed fast Fourier transform of three beats at saturations of 99%, 90%, and 85%). The widest range of sound pressure levels was produced by the Hewlett-Packard M1166A (46.5 +/- 1.74 dB to 76.9 +/- 2.77 dB). The loudest model was the Datex-Engstrom AS/3 (89.2 +/- 5.36 dB). Three oximeters, when set to the lower ranges of their volume settings, were indistinguishable from background operating room noise. Each model produced sounds with different audio spectra. Although each model produced a fundamental tone with multiple harmonic overtones, the number of harmonics varied with each model, from three harmonic tones on the Hewlett-Packard M1166A to 12 on the Ohmeda Biox 3700. There were variations between models, and between individual machines of the same model, with respect to the fundamental tone associated with a given saturation. There is considerable variance in the sound pressure and audio spectrum of commercially available pulse oximeters. Further studies are warranted in order to establish standards.
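
    The spectral analysis described above, a Hanning-windowed FFT of the recorded tones, is straightforward to reproduce. A minimal sketch, with an invented three-harmonic test tone standing in for a recorded oximeter beep:

```python
import numpy as np

def tone_spectrum(x, fs):
    """Magnitude spectrum of a short tone burst using a Hanning window,
    mirroring the windowed-FFT analysis described above."""
    w = np.hanning(len(x))
    spectrum = np.abs(np.fft.rfft(x * w))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, spectrum

# Toy oximeter-like beep: a fundamental plus two harmonic overtones.
fs = 44100
t = np.arange(int(0.05 * fs)) / fs
beep = (1.0 * np.sin(2 * np.pi * 880 * t)
        + 0.5 * np.sin(2 * np.pi * 1760 * t)
        + 0.25 * np.sin(2 * np.pi * 2640 * t))
freqs, mag = tone_spectrum(beep, fs)
print(f"peak at ~{freqs[np.argmax(mag)]:.0f} Hz")   # expect ~880 Hz
```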

  1. Modeling sports highlights using a time-series clustering framework and model interpretation

    NASA Astrophysics Data System (ADS)

    Radhakrishnan, Regunathan; Otsuka, Isao; Xiong, Ziyou; Divakaran, Ajay

    2005-01-01

    In our past work on sports highlights extraction, we have shown the utility of detecting audience reaction using an audio classification framework. The audio classes in the framework were chosen based on intuition. In this paper, we present a systematic way of identifying the key audio classes for sports highlights extraction using a time series clustering framework. We treat the low-level audio features as a time series and model the highlight segments as "unusual" events in a background of a "usual" process. The set of audio classes to characterize the sports domain is then identified by analyzing the consistent patterns in each of the clusters output from the time series clustering framework. The distribution of features from the training data so obtained for each of the key audio classes is parameterized by a Minimum Description Length Gaussian Mixture Model (MDL-GMM). We also interpret the meaning of each of the mixture components of the MDL-GMM for the key audio class (the "highlight" class) that is correlated with highlight moments. Our results show that the "highlight" class is a mixture of audience cheering and the commentator's excited speech. Furthermore, we show that the precision-recall performance for highlights extraction based on this "highlight" class is better than that of our previous approach, which used only audience cheering as the key highlight class.
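
    As a rough illustration of the modeling step, the sketch below fits Gaussian mixtures of increasing order to invented two-dimensional "highlight" features and selects the order by BIC, a criterion closely related to the minimum description length used by the MDL-GMM; the features and cluster locations are assumptions for the example, not the paper's actual low-level features:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in features for the "highlight" class, e.g., frame-level energy and
# spectral centroid (2-D). Real systems would use richer low-level features.
rng = np.random.default_rng(2)
cheer = rng.normal([0.8, 0.6], 0.05, size=(300, 2))    # audience cheering
excited = rng.normal([0.5, 0.9], 0.05, size=(300, 2))  # excited commentary
X = np.vstack([cheer, excited])

# Pick the number of mixture components by BIC, a criterion closely
# related to minimum description length (MDL).
best = min(
    (GaussianMixture(n_components=k, random_state=0).fit(X) for k in range(1, 6)),
    key=lambda g: g.bic(X),
)
print("components selected:", best.n_components)  # expect 2 on this toy data
```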

  2. Metal Sounds Stiffer than Drums for Ears, but Not Always for Hands: Low-Level Auditory Features Affect Multisensory Stiffness Perception More than High-Level Categorical Information

    PubMed Central

    Liu, Juan; Ando, Hiroshi

    2016-01-01

    Most real-world events stimulate multiple sensory modalities simultaneously. Usually, the stiffness of an object is perceived haptically. However, auditory signals also contain stiffness-related information, and people can form impressions of stiffness from the different impact sounds of metal, wood, or glass. To understand whether there is any interaction between auditory and haptic stiffness perception, and if so, whether the inferred material category is the most relevant auditory information, we conducted experiments using a force-feedback device and the modal synthesis method to present haptic stimuli and impact sound in accordance with participants’ actions, and to modulate low-level acoustic parameters, i.e., frequency and damping, without changing the inferred material categories of sound sources. We found that metal sounds consistently induced an impression of stiffer surfaces than did drum sounds in the audio-only condition, but participants haptically perceived surfaces with modulated metal sounds as significantly softer than the same surfaces with modulated drum sounds, which directly opposes the impression induced by these sounds alone. This result indicates that, although the inferred material category is strongly associated with audio-only stiffness perception, low-level acoustic parameters, especially damping, are more tightly integrated with haptic signals than the material category is. Frequency played an important role in both audio-only and audio-haptic conditions. Our study provides evidence that auditory information influences stiffness perception differently in unisensory and multisensory tasks. Furthermore, the data demonstrated that sounds with higher frequency and/or shorter decay time tended to be judged as stiffer, and contact sounds of stiff objects had no effect on the haptic perception of soft surfaces. We argue that the intrinsic physical relationship between object stiffness and acoustic parameters may be applied as prior knowledge to achieve robust estimation of stiffness in multisensory perception. PMID:27902718
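
    The stimuli here rely on modal synthesis: impact sounds built as sums of exponentially damped sinusoids, so that frequency and damping can be manipulated independently of the inferred material category. A minimal sketch (the mode frequencies, dampings, and amplitudes are illustrative, not the study's fitted parameters):

```python
import numpy as np

def modal_impact(freqs, damps, amps, fs=44100, dur=0.5):
    """Impact sound as a sum of exponentially damped sinusoids (modal
    synthesis). Higher frequencies and faster decay change the perceived
    material and stiffness."""
    t = np.arange(int(fs * dur)) / fs
    return sum(a * np.exp(-d * t) * np.sin(2 * np.pi * f * t)
               for f, d, a in zip(freqs, damps, amps))

# Hypothetical parameter sets: long-ringing "metal" vs. quickly damped "drum".
metal = modal_impact([800, 1600, 2400], damps=[3, 5, 8], amps=[1.0, 0.6, 0.3])
drum = modal_impact([150, 300, 450], damps=[30, 40, 60], amps=[1.0, 0.5, 0.25])
```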

  3. The Redundancy Effect on Retention and Transfer for Individuals with High Symptoms of ADHD

    ERIC Educational Resources Information Center

    Brown, Victoria; Lewis, David; Toussaint, Mario

    2016-01-01

    The multimedia elements of text and audio need to be carefully integrated together to maximize the impact of those elements for learning in a multimedia environment. Redundancy information presented through audio and visual channels can inhibit learning for individuals diagnosed with ADHD, who may experience challenges in the processing of…

  4. Audio-Visual Aid in Teaching "Fatty Liver"

    ERIC Educational Resources Information Center

    Dash, Sambit; Kamath, Ullas; Rao, Guruprasad; Prakash, Jay; Mishra, Snigdha

    2016-01-01

    Use of audio visual tools to aid in medical education is ever on a rise. Our study intends to find the efficacy of a video prepared on "fatty liver," a topic that is often a challenge for pre-clinical teachers, in enhancing cognitive processing and ultimately learning. We prepared a video presentation of 11:36 min, incorporating various…

  5. Using Songs as Audio Materials in Teaching Turkish as a Foreign Language

    ERIC Educational Resources Information Center

    Keskin, Funda

    2011-01-01

    The use of songs as audio materials in teaching Turkish as a foreign language is an important part of language teaching, and songs have an important place in culture. Thus, the transfer of cultural aspects accelerates the language learning process. In the light of this view, it becomes necessary to transfer cultural aspects into the classroom environment in teaching…

  6. Agency Video, Audio and Imagery Library

    NASA Technical Reports Server (NTRS)

    Grubbs, Rodney

    2015-01-01

    The purpose of this presentation was to inform the ISS International Partners of the new NASA Agency Video, Audio and Imagery Library (AVAIL) website. AVAIL is a new resource for the public to search for and download NASA-related imagery, and is not intended to replace the current process by which the International Partners receive their Space Station imagery products.

  7. Musical examination to bridge audio data and sheet music

    NASA Astrophysics Data System (ADS)

    Pan, Xunyu; Cross, Timothy J.; Xiao, Liangliang; Hei, Xiali

    2015-03-01

    The digitization of audio is commonly employed for convenient storage and transmission of music and songs in today's digital age. Analyzing digital audio for an insightful look at a specific musical characteristic, however, can be quite challenging for various types of applications. Many existing musical analysis techniques can examine a particular piece of audio data. For example, the frequency of digital sound can be easily read and identified at a specific section in an audio file, and from this information we can determine the musical note being played at that instant; but what if we want to see a list of all the notes played in a song? While most existing methods provide information about a single piece of the audio data at a time, few of them can analyze the audio file on a larger scale. The research conducted in this work considers how to make fuller use of audio examination by retaining more information from the original audio file. In practice, we develop a novel musical analysis system, Musicians Aid, for the representation and examination of audio data. Musicians Aid solves the above problem by storing and analyzing the audio information as it reads it rather than discarding it. The system can provide professional musicians with an insightful look at the music they created and advance their understanding of their work. Amateur musicians could also benefit from using it solely to obtain feedback about a song they were attempting to play. By comparing the system's interpretation of traditional sheet music with their own playing, musicians could verify that what they played was correct; more specifically, the system could show them exactly where they went wrong and how to correct their mistakes. In addition, the application could be extended over the Internet to allow users to play music with one another and then review the audio data they produced, which would be particularly useful for teaching music lessons on the web. The developed system is evaluated with songs played on guitar, keyboard, violin, and other popular musical instruments (primarily electronic or stringed instruments). The Musicians Aid system is successful at both representing and analyzing audio data, and it is also powerful in assisting individuals interested in learning and understanding music.
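
    The note-identification step the system performs can be illustrated in a few lines: estimate the dominant frequency of an analysis frame and map it to the nearest equal-tempered note. This is a minimal sketch of the idea only, not the Musicians Aid implementation:

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def dominant_frequency(frame, fs):
    """Strongest spectral peak of one analysis frame (a naive pitch estimate)."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    return np.fft.rfftfreq(len(frame), 1.0 / fs)[np.argmax(mag)]

def frequency_to_note(f, a4=440.0):
    """Map a frequency to the nearest equal-tempered note name."""
    midi = int(round(69 + 12 * np.log2(f / a4)))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

fs = 44100
t = np.arange(4096) / fs
print(frequency_to_note(dominant_frequency(np.sin(2 * np.pi * 261.63 * t), fs)))
# -> "C4" (middle C)
```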

  8. Light Weight MP3 Watermarking Method for Mobile Terminals

    NASA Astrophysics Data System (ADS)

    Takagi, Koichi; Sakazawa, Shigeyuki; Takishima, Yasuhiro

    This paper proposes a novel MP3 watermarking method which is applicable to a mobile terminal with limited computational resources. Considering that in most cases the embedded information is copyright information or metadata, which should be extracted before playing back audio contents, the watermark detection process should be executed at high speed. However, when conventional methods are used with a mobile terminal, it takes a considerable amount of time to detect a digital watermark. This paper focuses on scalefactor manipulation to enable high-speed watermark embedding/detection for MP3 audio, and also proposes a manipulation method which adaptively minimizes audio quality degradation. Evaluation tests showed that the proposed method is capable of embedding 3 bits/frame of information without degrading audio quality and of detecting it at very high speed. Finally, this paper describes application examples for authentication with a digital signature.
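
    The actual method edits scalefactor fields inside the MP3 bitstream; as a loose, purely illustrative analogue of the idea, the toy sketch below hides bits in the parity of an integer array standing in for decoded scalefactors. Everything here (the function names, the parity rule, the stand-in data) is an assumption for illustration, not the authors' algorithm:

```python
import numpy as np

def embed_bits(scalefactors, bits):
    """Toy illustration: force the parity of each scalefactor to carry one
    watermark bit (even = 0, odd = 1). A real MP3 embedder would edit the
    scalefactor fields inside the compressed bitstream instead."""
    sf = np.array(scalefactors, dtype=int)
    for i, bit in enumerate(bits):
        if sf[i] % 2 != bit:
            sf[i] += 1            # smallest possible change to the value
    return sf

def extract_bits(scalefactors, n):
    """Recover the embedded bits from the scalefactor parities."""
    return [int(s % 2) for s in scalefactors[:n]]

sf = np.random.default_rng(3).integers(0, 64, size=16)  # stand-in values
marked = embed_bits(sf, [1, 0, 1])
assert extract_bits(marked, 3) == [1, 0, 1]
```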

  9. The enhancement of beneficial effects following audio feedback by cognitive preparation in the treatment of social anxiety: a single-session experiment.

    PubMed

    Nilsson, Jan-Erik; Lundh, Lars-Gunnar; Faghihi, Shahriar; Roth-Andersson, Gun

    2011-12-01

    According to cognitive models, negatively biased processing of the publicly observable self is an important aspect of social phobia; if this is true, effective methods for producing corrective feedback concerning the public self should be strived for. Video feedback is proven effective, but since one's voice represents another aspect of the self, audio feedback should produce equivalent results. This is the first study to assess the enhancement of audio feedback by cognitive preparation in a single-session randomized controlled experiment. Forty socially anxious participants were asked to give a speech, then to listen to and evaluate a taped recording of their performance. Half of the sample was given cognitive preparation prior to the audio feedback and the remainder received audio feedback only. Cognitive preparation involved asking participants to (1) predict in detail what they would hear on the audiotape, (2) form an image of themselves giving the speech and (3) listen to the audio recording as though they were listening to a stranger. To assess generalization effects all participants were asked to give a second speech. Audio feedback with cognitive preparation was shown to produce less negative ratings after the first speech, and effects generalized to the evaluation of the second speech. More positive speech evaluations were associated with corresponding reductions of state anxiety. Social anxiety as indexed by the Implicit Association Test was reduced in participants given cognitive preparation. Small sample size; analogue study. Audio feedback with cognitive preparation may be utilized as a treatment intervention for social phobia. Copyright © 2011 Elsevier Ltd. All rights reserved.

  10. Using ERPs for assessing the (sub) conscious perception of noise.

    PubMed

    Porbadnigk, Anne K; Antons, Jan-N; Blankertz, Benjamin; Treder, Matthias S; Schleicher, Robert; Moller, Sebastian; Curio, Gabriel

    2010-01-01

    In this paper, we investigate the use of event-related potentials (ERPs) as a quantitative measure for quality assessment of disturbed audio signals. For this purpose, we ran an EEG study (N=11) using an oddball paradigm, during which subjects were presented with the phoneme /a/, superimposed with varying degrees of signal-correlated noise. Based on this data set, we address the question of the degree to which the degradation of the auditory stimuli is reflected at the neural level, even if the disturbance is below the threshold of conscious perception. For those stimuli that are consciously recognized as being disturbed, we suggest the use of the amplitude and latency of the P300 component for assessing the level of disturbance. For disturbed stimuli for which the noise is not perceived consciously, we show for two subjects that a classifier based on shrinkage LDA can be applied successfully to single out stimuli for which the noise was presumably processed subconsciously.
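
    Shrinkage LDA is the usual remedy when ERP feature dimensionality is high relative to the number of trials. A minimal sketch on synthetic stand-in epochs (the feature layout and effect size are invented for illustration):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Stand-in ERP epochs: 200 trials x 64 features (e.g., channels x time bins
# flattened), with a small class-dependent shift as the "neural response".
rng = np.random.default_rng(4)
X = rng.standard_normal((200, 64))
y = rng.integers(0, 2, 200)        # 0 = clean stimulus, 1 = disturbed stimulus
X[y == 1, :8] += 0.5               # weak, noisy class difference

# Shrinkage LDA: 'lsqr' solver with automatic (Ledoit-Wolf) shrinkage of the
# covariance estimate, which stabilizes LDA for few trials / many features.
clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")
```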

  11. DSP Synthesis Algorithm for Generating Florida Scrub Jay Calls

    NASA Technical Reports Server (NTRS)

    Lane, John; Pittman, Tyler

    2017-01-01

    A prototype digital signal processing (DSP) algorithm has been developed to approximate Florida scrub jay calls. The Florida scrub jay (Aphelocoma coerulescens), believed to have been in existence for 2 million years, living only in Florida, has a complicated social system that is evident by examining the spectrograms of its calls. Audio data was acquired at the Helen and Allan Cruickshank Sanctuary, Rockledge, Florida during the 2016 mating season using three digital recorders sampling at 44.1 kHz. The synthesis algorithm is a first step at developing a robust identification and call analysis algorithm. Since the Florida scrub jay is severely threatened by loss of habitat, it is important to develop effective methods to monitor their threatened population using autonomous means.
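
    A common starting point for approximating such calls is to synthesize frequency-swept, envelope-shaped elements and compare their spectrograms against recordings. The sketch below illustrates that generic idea; the sweep range, duration, and envelope are assumptions, not the prototype's fitted parameters:

```python
import numpy as np
from scipy.signal import chirp

def synth_call(f0, f1, dur, fs=44100):
    """Very rough stand-in for a synthesized bird-call element: a frequency
    sweep shaped by a smooth amplitude envelope. The actual scrub jay
    algorithm would fit its parameters to measured spectrograms."""
    t = np.arange(int(fs * dur)) / fs
    envelope = np.sin(np.pi * t / dur) ** 2        # smooth fade in and out
    return envelope * chirp(t, f0=f0, t1=dur, f1=f1)

fs = 44100
element = synth_call(f0=1500, f1=4000, dur=0.15, fs=fs)   # one call element
call = np.concatenate([element, np.zeros(int(0.05 * fs)), element])
```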

  12. Integrated multimodal human-computer interface and augmented reality for interactive display applications

    NASA Astrophysics Data System (ADS)

    Vassiliou, Marius S.; Sundareswaran, Venkataraman; Chen, S.; Behringer, Reinhold; Tam, Clement K.; Chan, M.; Bangayan, Phil T.; McGee, Joshua H.

    2000-08-01

    We describe new systems for improved integrated multimodal human-computer interaction and augmented reality for a diverse array of applications, including future advanced cockpits, tactical operations centers, and others. We have developed an integrated display system featuring: speech recognition of multiple concurrent users equipped with both standard air-coupled microphones and novel throat-coupled sensors (developed at Army Research Labs for increased noise immunity); lip reading for improving speech recognition accuracy in noisy environments; three-dimensional spatialized audio for improved display of warnings, alerts, and other information; wireless, coordinated handheld-PC control of a large display; real-time display of data and inferences from wireless integrated networked sensors with on-board signal processing and discrimination; gesture control with disambiguated point-and-speak capability; head- and eye-tracking coupled with speech recognition for 'look-and-speak' interaction; and integrated tetherless augmented reality on a wearable computer. The various interaction modalities (speech recognition, 3D audio, eyetracking, etc.) are implemented as 'modality servers' in an Internet-based client-server architecture. Each modality server encapsulates and exposes commercial and research software packages, presenting a socket network interface that is abstracted to a high-level interface, minimizing both vendor dependencies and required changes on the client side as the server's technology improves.

  13. Music to knowledge: A visual programming environment for the development and evaluation of music information retrieval techniques

    NASA Astrophysics Data System (ADS)

    Ehmann, Andreas F.; Downie, J. Stephen

    2005-09-01

    The objective of the International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL) project is the creation of a large, secure corpus of audio and symbolic music data accessible to the music information retrieval (MIR) community for the testing and evaluation of various MIR techniques. As part of the IMIRSEL project, a cross-platform JAVA-based visual programming environment called Music to Knowledge (M2K) is being developed for a variety of music information retrieval related tasks. The primary objective of M2K is to supply the MIR community with a toolset that provides the ability to rapidly prototype algorithms, as well as foster the sharing of techniques within the MIR community through the use of a standardized set of tools. Due to the relatively large size of audio data and the computational costs associated with some digital signal processing and machine learning techniques, M2K is also designed to support distributed computing across computing clusters. In addition, facilities to allow the integration of non-JAVA based (e.g., C/C++, MATLAB, etc.) algorithms and programs are provided within M2K. [Work supported by the Andrew W. Mellon Foundation and NSF Grants No. IIS-0340597 and No. IIS-0327371.]

  14. Real-Time Reconfigurable Adaptive Speech Recognition Command and Control Apparatus and Method

    NASA Technical Reports Server (NTRS)

    Salazar, George A. (Inventor); Haynes, Dena S. (Inventor); Sommers, Marc J. (Inventor)

    1998-01-01

    An adaptive speech recognition and control system and method for controlling various mechanisms and systems in response to spoken instructions, in which spoken commands direct the system into appropriate memory nodes and to the appropriate memory templates corresponding to the voiced command, is discussed. Spoken commands from any of a group of operators for which the system is trained may be identified, and voice templates are updated as required in response to changes in pronunciation and voice characteristics of any of those operators over time. Provisions are made both for near-real-time retraining of the system with respect to individual terms that are determined not to be positively identified, and for an overall system training and updating process in which recognition of each command and vocabulary term is checked, and in which the memory templates are retrained if necessary for the respective commands or vocabulary terms with respect to the operator currently using the system. In one embodiment, the system includes input circuitry connected to a microphone and including signal processing and control sections for sensing the level of vocabulary recognition over a given period and, if recognition performance falls below a given level, processing audio-derived signals to enhance the recognition performance of the system.

  15. Incipient failure detection (IFD) of SSME ball bearings

    NASA Technical Reports Server (NTRS)

    1982-01-01

    Because of the immense noise background during the operation of a large engine such as the SSME, the relatively low-level, unique ball bearing signatures were often buried by the overall machine signal. As a result, the most commonly used bearing failure detection technique, pattern recognition using power spectral densities (PSD) constructed from the extracted bearing signals, is rendered useless. Data enhancement techniques were applied using an HP5451C Fourier Analyzer. The signal was preprocessed by a Digital Audio Corp. DAC-1024I noise-cancelling filter in order to estimate the desired signal corrupted by the background noise. Reference levels for good bearings were established; any deviation of bearing signals from these reference levels indicates an incipient bearing failure.
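
    The reference-level approach described lends itself to a simple spectral comparison: estimate the PSD of a bearing signal and flag deviations from a known-good baseline. A minimal sketch (the sampling rate, segment length, and injected fault tone are illustrative):

```python
import numpy as np
from scipy.signal import welch

def psd_deviation(signal, baseline_psd, fs):
    """Compare a bearing signal's PSD against a good-bearing reference and
    report the largest excess in dB, a simple stand-in for the pattern
    recognition step described above."""
    f, pxx = welch(signal, fs=fs, nperseg=1024)
    excess_db = 10 * np.log10(pxx / baseline_psd)
    return f[np.argmax(excess_db)], excess_db.max()

rng = np.random.default_rng(5)
fs = 20000
t = np.arange(fs) / fs
good = rng.standard_normal(fs)                       # healthy baseline record
_, baseline = welch(good, fs=fs, nperseg=1024)
faulty = good + 0.5 * np.sin(2 * np.pi * 3200 * t)   # injected fault tone
freq, peak_db = psd_deviation(faulty, baseline, fs)
print(f"largest excess: {peak_db:.1f} dB near {freq:.0f} Hz")
```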

  16. Multifunction waveform generator for EM receiver testing

    NASA Astrophysics Data System (ADS)

    Chen, Kai; Jin, Sheng; Deng, Ming

    2018-01-01

    In many electromagnetic (EM) methods - such as magnetotelluric, spectral-induced polarization (SIP), time-domain-induced polarization (TDIP), and controlled-source audio magnetotelluric (CSAMT) methods - it is important to evaluate and test the EM receivers during their development stage. To assess the performance of the developed EM receivers, controlled synthetic data that simulate the observed signals in different modes are required. In CSAMT and SIP mode testing, the waveform generator should use the GPS time as the reference for repeating schedule. Based on our testing, the frequency range, frequency precision, and time synchronization of the currently available function waveform generators on the market are deficient. This paper presents a multifunction waveform generator with three waveforms: (1) a wideband, low-noise electromagnetic field signal to be used for magnetotelluric, audio-magnetotelluric, and long-period magnetotelluric studies; (2) a repeating frequency sweep square waveform for CSAMT and SIP studies; and (3) a positive-zero-negative-zero signal that contains primary and secondary fields for TDIP studies. In this paper, we provide the principles of the above three waveforms along with a hardware design for the generator. Furthermore, testing of the EM receiver was conducted with the waveform generator, and the results of the experiment were compared with those calculated from the simulation and theory in the frequency band of interest.
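
    Of the three waveforms, the TDIP excitation is the easiest to write down: a positive-zero-negative-zero cycle whose off-times expose the decaying secondary field. A minimal sketch, assuming equal quarter-period segments (a real generator would also schedule the cycle against GPS time):

```python
import numpy as np

def pznz_waveform(amplitude, period, fs, n_periods=2):
    """Positive-zero-negative-zero (PZNZ) excitation used in TDIP work:
    each period is +A, 0, -A, 0, with equal quarter-period segments."""
    quarter = int(fs * period / 4)
    one_period = np.concatenate([
        +amplitude * np.ones(quarter),   # positive on-time
        np.zeros(quarter),               # off-time (secondary field decays)
        -amplitude * np.ones(quarter),   # negative on-time
        np.zeros(quarter),               # off-time
    ])
    return np.tile(one_period, n_periods)

sig = pznz_waveform(amplitude=1.0, period=8.0, fs=1000)   # 8 s period, 1 kHz
```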

  17. DEMON-type algorithms for determination of hydro-acoustic signatures of surface ships and of divers

    NASA Astrophysics Data System (ADS)

    Slamnoiu, G.; Radu, O.; Rosca, V.; Pascu, C.; Damian, R.; Surdu, G.; Curca, E.; Radulescu, A.

    2016-08-01

    With the project "System for detection, localization, tracking and identification of risk factors for strategic importance in littoral areas", developed in the National Programme II, the members of the research consortium intend to develop a functional model of a passive hydroacoustic subsystem for determining the acoustic signatures of targets such as fast boats and autonomous divers. This paper presents some of the results obtained in the area of hydroacoustic signal processing using DEMON-type (Detection of Envelope Modulation On Noise) algorithms. To evaluate the performance of the various algorithm variants, we used both audio recordings of the underwater noise generated by ships and divers in real situations and simulated noises. We analysed the results of processing these signals using four DEMON algorithm structures presented in the reference literature and a fifth DEMON algorithm structure proposed by the authors of this paper. The proposed algorithm generates results similar to those of the traditional algorithms but requires fewer computing resources, and it has proven more resilient to random noise.
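
    All DEMON variants share the same core chain: band-pass the broadband cavitation noise, detect its envelope, and inspect the envelope's low-frequency spectrum for shaft- and blade-rate lines. A minimal sketch of that chain (the band edges, FFT length, and 6 Hz toy modulation are illustrative choices, not the consortium's parameters):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, welch

def demon_spectrum(x, fs, band=(2000.0, 8000.0)):
    """Classical DEMON chain: band-pass the broadband cavitation noise,
    take its envelope, remove the DC component, and estimate the
    low-frequency spectrum where shaft/blade-rate lines appear."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    envelope = np.abs(hilbert(filtfilt(b, a, x)))
    envelope -= envelope.mean()
    return welch(envelope, fs=fs, nperseg=65536)   # fine frequency resolution

# Toy ship noise: broadband noise amplitude-modulated at a 6 Hz blade rate.
rng = np.random.default_rng(6)
fs = 32000
t = np.arange(10 * fs) / fs
x = (1 + 0.5 * np.sin(2 * np.pi * 6 * t)) * rng.standard_normal(t.size)
f, pxx = demon_spectrum(x, fs)
print(f"strongest modulation line near {f[np.argmax(pxx)]:.1f} Hz")  # ~6 Hz
```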

  18. Audio reproduction for personal ambient home assistance: concepts and evaluations for normal-hearing and hearing-impaired persons.

    PubMed

    Huber, Rainer; Meis, Markus; Klink, Karin; Bartsch, Christian; Bitzer, Joerg

    2014-01-01

    Within the Lower Saxony Research Network Design of Environments for Ageing (GAL), a personal activity and household assistant (PAHA), an ambient reminder system, has been developed. One of its central output modalities for interacting with the user is sound. The study presented here evaluated three different system technologies for sound reproduction using up to five loudspeakers, including the "phantom source" concept. Moreover, a technology for hearing loss compensation for the mostly older users of the PAHA was implemented and evaluated. Evaluation experiments with 21 normal-hearing and hearing-impaired test subjects were carried out. The results show that in direct comparison of the sound presentation concepts, presentation by the single TV speaker was most preferred, whereas the phantom source concept received the highest acceptance ratings as far as the general concept is concerned. The localization accuracy of the phantom source concept was good as long as the exact listening position was known to the algorithm and speech stimuli were used. Most subjects preferred the original signals over the pre-processed, dynamic-compressed signals, although the processed speech was often described as being clearer.

  19. "Are You Listening Please?" The Advantages of Electronic Audio Feedback Compared to Written Feedback

    ERIC Educational Resources Information Center

    Lunt, Tom; Curran, John

    2010-01-01

    Feedback on students' work is probably one of the most important aspects of learning, yet students report, according to the National Union of Students (NUS) Survey of 2008, unhappiness with the feedback process. Students were unhappy with the quality, detail and timing of feedback. This paper examines the benefits of using audio, as opposed to…

  20. Low Latency Audio Video: Potentials for Collaborative Music Making through Distance Learning

    ERIC Educational Resources Information Center

    Riley, Holly; MacLeod, Rebecca B.; Libera, Matthew

    2016-01-01

    The primary purpose of this study was to examine the potential of LOw LAtency (LOLA), a low latency audio visual technology designed to allow simultaneous music performance, as a distance learning tool for musical styles in which synchronous playing is an integral aspect of the learning process (e.g., jazz, folk styles). The secondary purpose was…

  1. Role of Audio and Audio-Visual Materials in Enhancing the Learning Process of Health Science Personnel.

    ERIC Educational Resources Information Center

    Cooper, William

    The material presented here is the result of a review of the Technical Development Plan of the National Library of Medicine, made with the object of describing the role of audiovisual materials in medical education, research and service, and particularly in the continuing education of physicians and allied health personnel. A historical background…

  2. Securing Digital Audio using Complex Quadratic Map

    NASA Astrophysics Data System (ADS)

    Suryadi, MT; Satria Gunawan, Tjandra; Satria, Yudi

    2018-03-01

    In this digital era, exchanging data is common and easy to do, and data are therefore vulnerable to attack and manipulation by unauthorized parties. One data type that is vulnerable to attack is digital audio. We therefore need a data-securing method that is both robust and fast. One method that matches all of these criteria is securing the data using a chaos function. The chaos function used in this research is the complex quadratic map (CQM). For certain parameter values, the key stream generated by the CQM passes all 15 NIST tests, which means that the key stream generated using this CQM is demonstrably random. In addition, samples of encrypted digital sound, when tested using a goodness-of-fit test, are shown to be uniform, so securing digital audio using this method is not vulnerable to frequency-analysis attack. The key space is very large, about 8.1 × 10^31 possible keys, and the key sensitivity is very small, about 10^-10, so this method is also not vulnerable to brute-force attack. Finally, the processing speed for both encryption and decryption is on average about 450 times faster than the digital audio duration.
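
    The scheme is a stream cipher: iterate the complex quadratic map z -> z^2 + c, quantize the orbit into a keystream, and XOR it with the audio bytes. The sketch below illustrates the construction only; its parameters are not the paper's (c = -2 is used because that real slice of the map is chaotic yet bounded, with a guard against floating-point escape), and no randomness claim is made for this toy keystream:

```python
import numpy as np

def cqm_keystream(n_bytes, c=-2.0 + 0.0j, z0=0.3 + 0.0j):
    """Toy keystream from the complex quadratic map z -> z^2 + c. The paper
    selects CQM parameters whose streams pass the NIST randomness tests;
    this sketch makes no such claim."""
    z, out = z0, bytearray()
    for _ in range(n_bytes):
        z = z * z + c
        if abs(z) > 2.0:           # guard: rounding pushed the orbit out
            z = z0
        # Quantize the real component of the orbit into one byte.
        out.append(int(abs(z.real) * 1e6) % 256)
    return bytes(out)

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Stream-cipher style encryption: XOR the audio bytes with the
    keystream. Applying it again with the same key decrypts."""
    return bytes(d ^ k for d, k in zip(data, key))

audio = bytes(np.random.default_rng(7).integers(0, 256, 1024, dtype=np.uint8))
key = cqm_keystream(len(audio))
encrypted = xor_cipher(audio, key)
assert xor_cipher(encrypted, key) == audio   # round trip restores the audio
```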

  3. Reconfigurable Auditory-Visual Display

    NASA Technical Reports Server (NTRS)

    Begault, Durand R. (Inventor); Anderson, Mark R. (Inventor); McClain, Bryan (Inventor); Miller, Joel D. (Inventor)

    2008-01-01

    System and method for visual and audible communication between a central operator and N mobile communicators (N greater than or equal to 2), including an operator transceiver and interface, configured to receive and display, for the operator, visually perceptible and audibly perceptible signals from each of the mobile communicators. The interface (1) presents an audible signal from each communicator as if the audible signal is received from a different location relative to the operator and (2) allows the operator to select, to assign priority to, and to display, the visual signals and the audible signals received from a specified communicator. Each communicator has an associated signal transmitter that is configured to transmit at least one of the visual signals and the audio signal associated with the communicator, where at least one of the signal transmitters includes at least one sensor that senses and transmits a sensor value representing a selected environmental or physiological parameter associated with the communicator.

  4. Electromagnetic receiver with capacitive electrodes and triaxial induction coil for tunnel exploration

    NASA Astrophysics Data System (ADS)

    Kai, Chen; Sheng, Jin; Wang, Shun

    2017-09-01

    A new type of electromagnetic (EM) receiver has been developed by integrating four capacitive electrodes and a triaxial induction coil with an advanced data logger for tunnel exploration. The new EM receiver can conduct EM observations in tunnels, which is one of the principal goals of surface-tunnel-borehole EM detection for deep ore deposit mapping. The use of capacitive electrodes enables us to record electrical field (E-field) signals from hard rock surfaces, which are high-resistance terrains. A compact triaxial induction coil integrates three independent induction coils for narrow-tunnel exploration applications. A low-time-drift-error clock source is developed for tunnel applications where GPS signals are unavailable. The three main components of our tunnel EM receiver are: (1) four capacitive electrodes for measuring the E-field signal without digging in hard rock regions; (2) a triaxial induction coil sensor for audio-frequency magnetotelluric and controlled-source audio-frequency magnetotelluric signal measurements; and (3) a data logger that allows us to record five-component MT signals with low noise levels, low time-drift error of the clock source, and high dynamic range. The proposed tunnel EM receiver was successfully deployed in a mine exhibiting typical noise characteristics.

  5. Improvement of information fusion-based audio steganalysis

    NASA Astrophysics Data System (ADS)

    Kraetzer, Christian; Dittmann, Jana

    2010-01-01

    In this paper we extend an existing information-fusion-based audio steganalysis approach with three different kinds of evaluations. The first evaluation addresses the previously neglected evaluation of sensor-level fusion. Our results show that this fusion removes content dependency while achieving classification rates similar to single classifiers (especially for the considered global features) on the three exemplarily tested audio data hiding algorithms. The second evaluation extends the observations on fusion from segmental features alone to combinations of segmental and global features, with the result of a reduction of the required computational complexity for testing by about two orders of magnitude while maintaining the same degree of accuracy. The third evaluation tries to build a basis for estimating the plausibility of the introduced steganalysis approach by measuring the sensitivity of the models used in supervised classification of steganographic material to typical signal modification operations such as de-noising or 128 kbit/s MP3 encoding. Our results show that for some of the tested classifiers the probability of false alarms rises dramatically after such modifications.

  6. Audio Spectrogram Representations for Processing with Convolutional Neural Networks

    NASA Astrophysics Data System (ADS)

    Wyse, L.

    2017-05-01

    One of the decisions that arise when designing a neural network for any application is how the data should be represented in order to be presented to, and possibly generated by, a neural network. For audio, the choice is less obvious than it seems to be for visual images, and a variety of representations have been used for different applications including the raw digitized sample stream, hand-crafted features, machine discovered features, MFCCs and variants that include deltas, and a variety of spectral representations. This paper reviews some of these representations and issues that arise, focusing particularly on spectrograms for generating audio using neural networks for style transfer.
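
    As a concrete example of the spectrogram representation discussed, the sketch below turns one second of audio into the kind of log-magnitude time-frequency image typically fed to a CNN; the FFT size, hop, and test tone are illustrative choices:

```python
import numpy as np
from scipy.signal import stft

def log_mag_spectrogram(x, fs, n_fft=512, hop=128):
    """Log-magnitude spectrogram, a common 2-D input representation for
    CNNs operating on audio (one of the choices discussed above)."""
    _, _, Z = stft(x, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
    return np.log1p(np.abs(Z))          # compress the dynamic range

fs = 16000
t = np.arange(fs) / fs                        # one second of audio
x = np.sin(2 * np.pi * (200 + 300 * t) * t)   # rising tone
S = log_mag_spectrogram(x, fs)
print(S.shape)   # (freq_bins, time_frames): an image-like input for a CNN
```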

  7. Computationally Efficient Clustering of Audio-Visual Meeting Data

    NASA Astrophysics Data System (ADS)

    Hung, Hayley; Friedland, Gerald; Yeo, Chuohao

    This chapter presents novel computationally efficient algorithms to extract semantically meaningful acoustic and visual events related to each of the participants in a group discussion using the example of business meeting recordings. The recording setup involves relatively few audio-visual sensors, comprising a limited number of cameras and microphones. We first demonstrate computationally efficient algorithms that can identify who spoke and when, a problem in speech processing known as speaker diarization. We also extract visual activity features efficiently from MPEG4 video by taking advantage of the processing that was already done for video compression. Then, we present a method of associating the audio-visual data together so that the content of each participant can be managed individually. The methods presented in this article can be used as a principal component that enables many higher-level semantic analysis tasks needed in search, retrieval, and navigation.

  8. Online Distance Teaching of Undergraduate Finance: A Case for Musashi University and Konan University, Japan

    ERIC Educational Resources Information Center

    Kubota, Keiichi; Fujikawa, Kiyoshi

    2007-01-01

    We implemented a synchronous distance course entitled: Introductory Finance designed for undergraduate students. This course was held between two Japanese universities. Stable Internet connections allowing minimum delay and minimum interruptions of the audio-video streaming signals were used. Students were equipped with their own PCs with…

  9. Machine Learning Methods for Articulatory Data

    ERIC Educational Resources Information Center

    Berry, Jeffrey James

    2012-01-01

    Humans make use of more than just the audio signal to perceive speech. Behavioral and neurological research has shown that a person's knowledge of how speech is produced influences what is perceived. With methods for collecting articulatory data becoming more ubiquitous, methods for extracting useful information are needed to make this data…

  10. Audio-visual integration through the parallel visual pathways.

    PubMed

    Kaposvári, Péter; Csete, Gergő; Bognár, Anna; Csibri, Péter; Tóth, Eszter; Szabó, Nikoletta; Vécsei, László; Sáry, Gyula; Tamás Kincses, Zsigmond

    2015-10-22

    Audio-visual integration has been shown to be present in a wide range of different conditions, some of which are processed through the dorsal, and others through the ventral, visual pathway. Whereas neuroimaging studies have revealed integration-related activity in the brain, there has been no imaging study of the possible role of the segregated visual streams in audio-visual integration. We set out to determine how the different visual pathways participate in this communication. We investigated how audio-visual integration can be supported through the dorsal and ventral visual pathways during the double flash illusion. Low-contrast and chromatic isoluminant stimuli were used to preferentially drive the dorsal and ventral pathways, respectively. In order to identify the anatomical substrates of the audio-visual interaction in the two conditions, the psychophysical results were correlated with white matter integrity as measured by diffusion tensor imaging. The psychophysiological data revealed a robust double flash illusion in both conditions. A correlation between the psychophysical results and local fractional anisotropy was found in the occipito-parietal white matter in the low-contrast condition, while a similar correlation was found in the infero-temporal white matter in the chromatic isoluminant condition. Our results indicate that both of the parallel visual pathways may play a role in the audio-visual interaction. Copyright © 2015. Published by Elsevier B.V.

  11. Selective Attention Modulates the Direction of Audio-Visual Temporal Recalibration

    PubMed Central

    Ikumi, Nara; Soto-Faraco, Salvador

    2014-01-01

    Temporal recalibration of cross-modal synchrony has been proposed as a mechanism to compensate for timing differences between sensory modalities. However, far from the rich complexity of everyday life sensory environments, most studies to date have examined recalibration on isolated cross-modal pairings. Here, we hypothesize that selective attention might provide an effective filter to help resolve which stimuli are selected when multiple events compete for recalibration. We addressed this question by testing audio-visual recalibration following an adaptation phase where two opposing audio-visual asynchronies were present. The direction of voluntary visual attention, and therefore attention to one of the two possible asynchronies (flash leading or flash lagging), was manipulated using colour as a selection criterion. We found a shift in the point of subjective audio-visual simultaneity as a function of whether the observer had focused attention to audio-then-flash or to flash-then-audio groupings during the adaptation phase. A baseline adaptation condition revealed that this effect of endogenous attention was only effective toward the lagging flash. This hints at the role of exogenous capture and/or additional endogenous effects producing an asymmetry toward the leading flash. We conclude that selective attention helps promote selected audio-visual pairings to be combined and subsequently adjusted in time, but stimulus organization exerts a strong impact on recalibration. We tentatively hypothesize that the resolution of recalibration in complex scenarios involves the orchestration of top-down selection mechanisms and stimulus-driven processes. PMID:25004132

  12. Selective attention modulates the direction of audio-visual temporal recalibration.

    PubMed

    Ikumi, Nara; Soto-Faraco, Salvador

    2014-01-01

    Temporal recalibration of cross-modal synchrony has been proposed as a mechanism to compensate for timing differences between sensory modalities. However, far from the rich complexity of everyday life sensory environments, most studies to date have examined recalibration on isolated cross-modal pairings. Here, we hypothesize that selective attention might provide an effective filter to help resolve which stimuli are selected when multiple events compete for recalibration. We addressed this question by testing audio-visual recalibration following an adaptation phase where two opposing audio-visual asynchronies were present. The direction of voluntary visual attention, and therefore attention to one of the two possible asynchronies (flash leading or flash lagging), was manipulated using colour as a selection criterion. We found a shift in the point of subjective audio-visual simultaneity as a function of whether the observer had focused attention to audio-then-flash or to flash-then-audio groupings during the adaptation phase. A baseline adaptation condition revealed that this effect of endogenous attention was only effective toward the lagging flash. This hints at the role of exogenous capture and/or additional endogenous effects producing an asymmetry toward the leading flash. We conclude that selective attention helps promote selected audio-visual pairings to be combined and subsequently adjusted in time, but stimulus organization exerts a strong impact on recalibration. We tentatively hypothesize that the resolution of recalibration in complex scenarios involves the orchestration of top-down selection mechanisms and stimulus-driven processes.

  13. Cortical Tracking of Global and Local Variations of Speech Rhythm during Connected Natural Speech Perception.

    PubMed

    Alexandrou, Anna Maria; Saarinen, Timo; Kujala, Jan; Salmelin, Riitta

    2018-06-19

    During natural speech perception, listeners must track the global speaking rate, that is, the overall rate of incoming linguistic information, as well as transient, local speaking rate variations occurring within the global speaking rate. Here, we address the hypothesis that this tracking mechanism is achieved through coupling of cortical signals to the amplitude envelope of the perceived acoustic speech signals. Cortical signals were recorded with magnetoencephalography (MEG) while participants perceived spontaneously produced speech stimuli at three global speaking rates (slow, normal/habitual, and fast). Inherently to spontaneously produced speech, these stimuli also featured local variations in speaking rate. The coupling between cortical and acoustic speech signals was evaluated using audio-MEG coherence. Modulations in audio-MEG coherence spatially differentiated between tracking of global speaking rate, highlighting the temporal cortex bilaterally and the right parietal cortex, and sensitivity to local speaking rate variations, emphasizing the left parietal cortex. Cortical tuning to the temporal structure of natural connected speech thus seems to require the joint contribution of both auditory and parietal regions. These findings suggest that cortical tuning to speech rhythm operates on two functionally distinct levels: one encoding the global rhythmic structure of speech and the other associated with online, rapidly evolving temporal predictions. Thus, it may be proposed that speech perception is shaped by evolutionary tuning, a preference for certain speaking rates, and predictive tuning, associated with cortical tracking of the constantly changing rate of linguistic information in a speech stream.

  14. Vibration signaling in mobile devices for emergency alerting: a study with deaf evaluators.

    PubMed

    Harkins, Judith; Tucker, Paula E; Williams, Norman; Sauro, Jeff

    2010-01-01

    In the United States, a nationwide Commercial Mobile Alert Service (CMAS) is being planned to alert cellular mobile device subscribers to emergencies occurring near the location of the mobile device. The plan specifies a unique audio attention signal as well as a unique vibration attention signal (for mobile devices set to vibrate) to identify that the incoming message pertains to an emergency. Ratings of vibration signals of varying lengths and patterns were obtained from 44 deaf users of mobile devices for the perceived effectiveness of the signal in getting their attention in an emergency situation. Longer signals received higher ratings than shorter ones, and three signals with temporal on-off patterns were rated significantly better than a constant vibration. The U.S. government's recommended vibration signal for the CMAS, an important feature for access to emergency alerts by deaf persons, is supported by the results of the study.

  15. Bat detective-Deep learning tools for bat acoustic signal detection.

    PubMed

    Mac Aodha, Oisin; Gibb, Rory; Barlow, Kate E; Browning, Ella; Firman, Michael; Freeman, Robin; Harder, Briana; Kinsey, Libby; Mead, Gary R; Newson, Stuart E; Pandourski, Ivan; Parsons, Stuart; Russ, Jon; Szodoray-Paradi, Abigel; Szodoray-Paradi, Farkas; Tilova, Elena; Girolami, Mark; Brostow, Gabriel; Jones, Kate E

    2018-03-01

    Passive acoustic sensing has emerged as a powerful tool for quantifying anthropogenic impacts on biodiversity, especially for echolocating bat species. To better assess bat population trends there is a critical need for accurate, reliable, and open source tools that allow the detection and classification of bat calls in large collections of audio recordings. The majority of existing tools are commercial or have focused on the species classification task, neglecting the important problem of first localizing echolocation calls in audio, which is particularly problematic in noisy recordings. We developed a convolutional neural network-based open-source pipeline for detecting ultrasonic, full-spectrum, search-phase calls produced by echolocating bats. Our deep learning algorithms were trained on full-spectrum ultrasonic audio collected along road-transects across Europe and labelled by citizen scientists from www.batdetective.org. When compared to other existing algorithms and commercial systems, we show significantly higher detection performance of search-phase echolocation calls with our test sets. As an example application, we ran our detection pipeline on bat monitoring data collected over five years from Jersey (UK), and compared results to a widely-used commercial system. Our detection pipeline can be used for the automatic detection and monitoring of bat populations, and further facilitates their use as indicator species on a large scale. Our proposed pipeline makes only a small number of bat-specific design decisions, and with appropriate training data it could be applied to detecting other species in audio. A crucial novelty of our work is showing that with careful, non-trivial, design and implementation considerations, state-of-the-art deep learning methods can be used for accurate and efficient monitoring in audio.
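
    To illustrate the detection approach, the sketch below defines a small convolutional network that maps a spectrogram patch to a call/no-call probability and can be slid along time. The architecture, patch size, and PyTorch framing are editorial assumptions; the published pipeline differs in its details.

```python
# Minimal sketch of a CNN call detector over spectrogram patches,
# assuming PyTorch; architecture and input size are illustrative only.
import torch
import torch.nn as nn

class CallDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # For a 64x64 input patch, pooling twice leaves a 16x16 map
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, 1),   # logit for "search-phase call present"
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Sliding the detector along time: each 64x64 spectrogram patch
# yields one detection probability.
model = CallDetector()
patches = torch.randn(8, 1, 64, 64)          # batch of spectrogram patches
probs = torch.sigmoid(model(patches)).squeeze(1)
print(probs.shape)                            # torch.Size([8])
```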

  16. Bat detective—Deep learning tools for bat acoustic signal detection

    PubMed Central

    Barlow, Kate E.; Firman, Michael; Freeman, Robin; Harder, Briana; Kinsey, Libby; Mead, Gary R.; Newson, Stuart E.; Pandourski, Ivan; Russ, Jon; Szodoray-Paradi, Abigel; Tilova, Elena; Girolami, Mark; Jones, Kate E.

    2018-01-01

    Passive acoustic sensing has emerged as a powerful tool for quantifying anthropogenic impacts on biodiversity, especially for echolocating bat species. To better assess bat population trends there is a critical need for accurate, reliable, and open source tools that allow the detection and classification of bat calls in large collections of audio recordings. The majority of existing tools are commercial or have focused on the species classification task, neglecting the important problem of first localizing echolocation calls in audio, which is particularly problematic in noisy recordings. We developed a convolutional neural network-based open-source pipeline for detecting ultrasonic, full-spectrum, search-phase calls produced by echolocating bats. Our deep learning algorithms were trained on full-spectrum ultrasonic audio collected along road-transects across Europe and labelled by citizen scientists from www.batdetective.org. When compared to other existing algorithms and commercial systems, we show significantly higher detection performance of search-phase echolocation calls with our test sets. As an example application, we ran our detection pipeline on bat monitoring data collected over five years from Jersey (UK), and compared results to a widely-used commercial system. Our detection pipeline can be used for the automatic detection and monitoring of bat populations, and further facilitates their use as indicator species on a large scale. Our proposed pipeline makes only a small number of bat-specific design decisions, and with appropriate training data it could be applied to detecting other species in audio. A crucial novelty of our work is showing that with careful, non-trivial, design and implementation considerations, state-of-the-art deep learning methods can be used for accurate and efficient monitoring in audio. PMID:29518076

  17. Overdrive and Edge as Refiners of "Belting"?: An Empirical Study Qualifying and Categorizing "Belting" Based on Audio Perception, Laryngostroboscopic Imaging, Acoustics, LTAS, and EGG.

    PubMed

    McGlashan, Julian; Thuesen, Mathias Aaen; Sadolin, Cathrine

    2017-05-01

    We aimed to study the categorizations "Overdrive" and "Edge" from the pedagogical method Complete Vocal Technique as refiners of the often ill-defined concept of "belting", by means of audio perception, laryngostroboscopic imaging, acoustics, long-term average spectrum (LTAS), and electroglottography (EGG). This is a case-control study. Twenty singers were recorded singing sustained vowels in a "belting" quality refined by audio perception as "Overdrive" and "Edge." Two studies were performed: (1) a laryngostroboscopic examination using a videonasoendoscopic camera system (Olympus) and the Laryngostrobe program (Laryngograph); (2) a simultaneous recording of the EGG and acoustic signals using Speech Studio (Laryngograph). The images were analyzed based on consensus agreement. Statistical analysis of the acoustic, LTAS, and EGG parameters was undertaken using Student's paired t test. The two modes of singing determined by audio perception have visibly different laryngeal gestures: Edge has a more constricted setting than Overdrive: the ventricular folds seem to cover more of the vocal folds, the aryepiglottic folds show a sharper edge, and the cuneiform cartilages are rolled in anteromedially. LTAS analysis shows a statistical difference, particularly after the ninth harmonic, with a coinciding first formant. The combined group showed statistical differences in shimmer, harmonics-to-noise ratio, normalized noise energy, and mean sound pressure level (P ≤ 0.05). "Belting" sounds can be categorized by audio perception into two modes of singing: "Overdrive" and "Edge." This study demonstrates consistent, visibly different laryngeal gestures between these modes, with some correspondingly significant differences in LTAS, EGG, and acoustic measures. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  18. The Emotion Recognition System Based on Autoregressive Model and Sequential Forward Feature Selection of Electroencephalogram Signals

    PubMed Central

    Hatamikia, Sepideh; Maghooli, Keivan; Nasrabadi, Ali Motie

    2014-01-01

    Electroencephalogram (EEG) is one of the useful biological signals to distinguish different brain diseases and mental states. In recent years, detecting different emotional states from biological signals has attracted increasing attention from researchers, and several feature extraction methods and classifiers have been suggested to recognize emotions from EEG signals. In this research, we introduce an emotion recognition system using an autoregressive (AR) model, sequential forward feature selection (SFS) and a K-nearest neighbor (KNN) classifier on EEG signals recorded during emotional audio-visual inductions. The main purpose of this paper is to investigate the performance of AR features in the classification of emotional states. To achieve this goal, a well-established AR method (Burg's method), based on the Levinson-Durbin recursive algorithm, is used and AR coefficients are extracted as feature vectors. In the next step, two different feature selection methods, based on the SFS algorithm and the Davies–Bouldin index, are used in order to decrease the computational complexity and redundancy of features; then, three different classifiers (KNN, quadratic discriminant analysis, and linear discriminant analysis) are used to discriminate two and three different classes of valence and arousal levels. The proposed method is evaluated with EEG signals from the publicly available Database for Emotion Analysis using Physiological Signals (DEAP), recorded from 32 participants during 40 one-minute audio-visual inductions. According to the results, AR features are efficient for recognizing emotional states from EEG signals, and KNN performs better than the two other classifiers in discriminating both the two and three valence/arousal classes. The results also show that the SFS method improves accuracies by almost 10-15% as compared to Davies–Bouldin based feature selection. The best accuracies are 72.33% and 74.20% for two classes of valence and arousal and 61.10% and 65.16% for three classes, respectively. PMID:25298928
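
    A minimal sketch of the AR-feature idea follows, with a hand-rolled Burg recursion and a KNN classifier on synthetic "EEG" trials; the SFS step and the actual DEAP data are omitted, and all names and shapes are illustrative.

```python
# Burg AR coefficients as features + KNN, on synthetic data only.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def burg_ar(x, order):
    """AR polynomial coefficients via Burg's method; returns a, a[0] = 1."""
    x = np.asarray(x, dtype=float)
    n = x.size
    a = np.array([1.0])
    f = x.copy()          # forward prediction errors
    b = x.copy()          # backward prediction errors
    for m in range(order):
        fm, bm = f[m + 1:], b[m:n - 1]
        k = -2.0 * np.dot(fm, bm) / (np.dot(fm, fm) + np.dot(bm, bm))
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]               # Levinson-style coefficient update
        f[m + 1:] = fm + k * bm
        b[m + 1:] = bm + k * fm
    return a

# Hypothetical data: trials x channels x samples, with binary valence labels
rng = np.random.default_rng(0)
trials = rng.standard_normal((40, 4, 512))
labels = rng.integers(0, 2, size=40)

order = 6
feats = np.array([[burg_ar(ch, order)[1:] for ch in trial] for trial in trials])
feats = feats.reshape(len(trials), -1)    # concatenate AR coefficients per trial

knn = KNeighborsClassifier(n_neighbors=5).fit(feats[:30], labels[:30])
print("held-out accuracy:", knn.score(feats[30:], labels[30:]))
```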

  19. 4DCAPTURE: a general purpose software package for capturing and analyzing two- and three-dimensional motion data acquired from video sequences

    NASA Astrophysics Data System (ADS)

    Walton, James S.; Hodgson, Peter; Hallamasek, Karen; Palmer, Jake

    2003-07-01

    4DVideo is creating a general purpose capability for capturing and analyzing kinematic data from video sequences in near real-time. The core element of this capability is a software package designed for the PC platform. The software ("4DCapture") is designed to capture and manipulate customized AVI files that can contain a variety of synchronized data streams -- including audio, video, centroid locations -- and signals acquired from more traditional sources (such as accelerometers and strain gauges). The code includes simultaneous capture or playback of multiple video streams, and linear editing of the images (together with the ancillary data embedded in the files). Corresponding landmarks seen from two or more views are matched automatically, and photogrammetric algorithms permit multiple landmarks to be tracked in two and three dimensions -- with or without lens calibrations. Trajectory data can be processed within the main application, or they can be exported to a spreadsheet where they can be processed or passed along to a more sophisticated, stand-alone data analysis application. Previous attempts to develop such applications for high-speed imaging have been limited in their scope, or by the complexity of the application itself. 4DVideo has devised a friendly ("FlowStack") user interface that assists the end-user in capturing and treating image sequences in a natural progression. 4DCapture employs the AVI 2.0 standard and DirectX technology, which effectively eliminates the file size limitations found in older applications. In early tests, 4DVideo has streamed three RS-170 video sources to disk for more than an hour without loss of data. At this time, the software can acquire video sequences in three ways: (1) directly, from up to three hard-wired cameras supplying RS-170 (monochrome) signals; (2) directly, from a single camera or video recorder supplying an NTSC (color) signal; and (3) by importing existing video streams in the AVI 1.0 or AVI 2.0 formats. The latter is particularly useful for high-speed applications where the raw images are often captured and stored by the camera before being downloaded. Provision has been made to synchronize data acquired from any combination of these video sources using audio and visual "tags." Additional "front-ends," designed for digital cameras, are anticipated.

  20. Context-specific effects of musical expertise on audiovisual integration

    PubMed Central

    Bishop, Laura; Goebl, Werner

    2014-01-01

    Ensemble musicians exchange auditory and visual signals that can facilitate interpersonal synchronization. Musical expertise improves how precisely auditory and visual signals are perceptually integrated and increases sensitivity to asynchrony between them. Whether expertise improves sensitivity to audiovisual asynchrony in all instrumental contexts or only in those using sound-producing gestures that are within an observer's own motor repertoire is unclear. This study tested the hypothesis that musicians are more sensitive to audiovisual asynchrony in performances featuring their own instrument than in performances featuring other instruments. Short clips were extracted from audio-video recordings of clarinet, piano, and violin performances and presented to highly-skilled clarinetists, pianists, and violinists. Clips either maintained the audiovisual synchrony present in the original recording or were modified so that the video led or lagged behind the audio. Participants indicated whether the audio and video channels in each clip were synchronized. The range of asynchronies most often endorsed as synchronized was assessed as a measure of participants' sensitivities to audiovisual asynchrony. A positive relationship was observed between musical training and sensitivity, with data pooled across stimuli. While participants across expertise groups detected asynchronies most readily in piano stimuli and least readily in violin stimuli, pianists showed significantly better performance for piano stimuli than for either clarinet or violin. These findings suggest that, to an extent, the effects of expertise on audiovisual integration can be instrument-specific; however, the nature of the sound-producing gestures that are observed has a substantial effect on how readily asynchrony is detected as well. PMID:25324819

  1. Audio-visual presentation of information for informed consent for participation in clinical trials.

    PubMed

    Ryan, R E; Prictor, M J; McLaughlin, K J; Hill, S J

    2008-01-23

    Informed consent is a critical component of clinical research. Different methods of presenting information to potential participants of clinical trials may improve the informed consent process. Audio-visual interventions (presented for example on the Internet, DVD, or video cassette) are one such method. To assess the effects of providing audio-visual information alone, or in conjunction with standard forms of information provision, to potential clinical trial participants in the informed consent process, in terms of their satisfaction, understanding and recall of information about the study, level of anxiety and their decision whether or not to participate. We searched: the Cochrane Consumers and Communication Review Group Specialised Register (searched 20 June 2006); the Cochrane Central Register of Controlled Trials (CENTRAL), The Cochrane Library, issue 2, 2006; MEDLINE (Ovid) (1966 to June week 1 2006); EMBASE (Ovid) (1988 to 2006 week 24); and other databases. We also searched reference lists of included studies and relevant review articles, and contacted study authors and experts. There were no language restrictions. Randomised and quasi-randomised controlled trials comparing audio-visual information alone, or in conjunction with standard forms of information provision (such as written or oral information as usually employed in the particular service setting), with standard forms of information provision alone, in the informed consent process for clinical trials. Trials involved individuals or their guardians asked to participate in a real (not hypothetical) clinical study. Two authors independently assessed studies for inclusion and extracted data. Due to heterogeneity no meta-analysis was possible; we present the findings in a narrative review. We included 4 trials involving data from 511 people. Studies were set in the USA and Canada. Three were randomised controlled trials (RCTs) and the fourth a quasi-randomised trial. Their quality was mixed and results should be interpreted with caution. Considerable uncertainty remains about the effects of audio-visual interventions, compared with standard forms of information provision (such as written or oral information normally used in the particular setting), for use in the process of obtaining informed consent for clinical trials. Audio-visual interventions did not consistently increase participants' levels of knowledge/understanding (assessed in four studies), although one study showed better retention of knowledge amongst intervention recipients. An audio-visual intervention may transiently increase people's willingness to participate in trials (one study), but this was not sustained at two to four weeks post-intervention. Perceived worth of the trial did not appear to be influenced by an audio-visual intervention (one study), but another study suggested that the quality of information disclosed may be enhanced by an audio-visual intervention. Many relevant outcomes including harms were not measured. The heterogeneity in results may reflect the differences in intervention design, content and delivery, the populations studied and the diverse methods of outcome assessment in included studies. The value of audio-visual interventions for people considering participating in clinical trials remains unclear. 
Evidence is mixed as to whether audio-visual interventions enhance people's knowledge of the trial they are considering entering, and/or the health condition the trial is designed to address; one study showed improved retention of knowledge amongst intervention recipients. The intervention may also have small positive effects on the quality of information disclosed, and may increase willingness to participate in the short-term; however the evidence is weak. There were no data for several primary outcomes, including harms. In the absence of clear results, triallists should continue to explore innovative methods of providing information to potential trial participants. Further research should take the form of high-quality randomised controlled trials, with clear reporting of methods. Studies should conduct content assessment of audio-visual and other innovative interventions for people of differing levels of understanding and education; also for different age and cultural groups. Researchers should assess systematically the effects of different intervention components and delivery characteristics, and should involve consumers in intervention development. Studies should assess additional outcomes relevant to individuals' decisional capacity, using validated tools, including satisfaction; anxiety; and adherence to the subsequent trial protocol.

  2. Efficient Geometric Sound Propagation Using Visibility Culling

    NASA Astrophysics Data System (ADS)

    Chandak, Anish

    2011-07-01

    Simulating propagation of sound can improve the sense of realism in interactive applications such as video games and can lead to better designs in engineering applications such as architectural acoustics. In this thesis, we present geometric sound propagation techniques that are faster than prior methods and map well to upcoming parallel multi-core CPUs. We model specular reflections by using the image-source method and model finite-edge diffraction by using the well-known Biot-Tolstoy-Medwin (BTM) model. We accelerate the computation of specular reflections by applying novel visibility algorithms, FastV and AD-Frustum, which compute visibility from a point. We accelerate finite-edge diffraction modeling by applying a novel visibility algorithm which computes visibility from a region. Our visibility algorithms are based on frustum tracing and exploit recent advances in fast ray-hierarchy intersections, data-parallel computations, and scalable, multi-core algorithms. The AD-Frustum algorithm adapts its computation to the scene complexity and allows small errors in computing specular reflection paths for higher computational efficiency. FastV and our visibility algorithm from a region are general, object-space, conservative visibility algorithms that together significantly reduce the number of image sources compared to other techniques while preserving the same accuracy. Our geometric propagation algorithms are an order of magnitude faster than prior approaches for modeling specular reflections and two to ten times faster for modeling finite-edge diffraction. Our algorithms are interactive, scale almost linearly on multi-core CPUs, and can handle large, complex, and dynamic scenes. We also compare the accuracy of our sound propagation algorithms with other methods. Once sound propagation is performed, it is desirable to listen to the propagated sound in interactive and engineering applications. We can generate smooth, artifact-free output audio signals by applying efficient audio-processing algorithms. We also present the first efficient audio-processing algorithm for scenarios with a simultaneously moving source and moving receiver (MS-MR), which incurs less than 25% overhead compared to the static source and moving receiver (SS-MR) or moving source and static receiver (MS-SR) scenarios.
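
    The image-source method mentioned above can be illustrated with a first-order sketch for an axis-aligned ("shoebox") room: each wall contributes one mirror-image source whose distance to the receiver gives a reflection delay. Real engines, including the thesis system, add higher reflection orders, visibility tests, and diffraction; this shows only the core geometric idea.

```python
# Minimal sketch of first-order image sources in a shoebox room.
import numpy as np

def first_order_image_sources(src, room):
    """Mirror the source across each wall of an axis-aligned room.

    src  -- (x, y, z) source position
    room -- (Lx, Ly, Lz) room dimensions, walls at 0 and L on each axis
    """
    src = np.asarray(src, dtype=float)
    images = []
    for axis, L in enumerate(room):
        for wall in (0.0, L):
            img = src.copy()
            img[axis] = 2.0 * wall - src[axis]   # reflect across the wall plane
            images.append(img)
    return np.array(images)

c = 343.0                                        # speed of sound, m/s
src, rcv = (1.0, 2.0, 1.5), (3.0, 4.0, 1.2)
images = first_order_image_sources(src, (5.0, 6.0, 3.0))

# Each image source contributes an echo whose delay is distance / c
dists = np.linalg.norm(images - np.asarray(rcv), axis=1)
for d in dists:
    print(f"reflection path {d:5.2f} m -> delay {1000 * d / c:5.2f} ms")
```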

  3. A Binaural CI Research Platform for Oticon Medical SP/XP Implants Enabling ITD/ILD and Variable Rate Processing

    PubMed Central

    Adiloğlu, K.; Herzke, T.

    2015-01-01

    We present the first portable, binaural, real-time research platform compatible with Oticon Medical SP and XP generation cochlear implants. The platform consists of (a) a pair of behind-the-ear devices, each containing front and rear calibrated microphones, (b) a four-channel USB analog-to-digital converter, (c) real-time PC-based sound processing software called the Master Hearing Aid, and (d) USB-connected hardware and output coils capable of driving two implants simultaneously. The platform is capable of processing signals from the four microphones simultaneously and producing synchronized binaural cochlear implant outputs that drive two (bilaterally implanted) SP or XP implants. Both audio signal preprocessing algorithms (such as binaural beamforming) and novel binaural stimulation strategies (within the implant limitations) can be programmed by researchers. When the whole research platform is combined with Oticon Medical SP implants, interaural electrode timing can be controlled on individual electrodes to within ±1 µs and interaural electrode energy differences can be controlled to within ±2%. Hence, this new platform is particularly well suited to performing experiments related to interaural time differences in combination with interaural level differences in real-time. The platform also supports instantaneously variable stimulation rates and thereby enables investigations such as the effect of changing the stimulation rate on pitch perception. Because the processing can be changed on the fly, researchers can use this platform to study perceptual changes resulting from different processing strategies acutely. PMID:26721923

  4. A Binaural CI Research Platform for Oticon Medical SP/XP Implants Enabling ITD/ILD and Variable Rate Processing.

    PubMed

    Backus, B; Adiloğlu, K; Herzke, T

    2015-12-30

    We present the first portable, binaural, real-time research platform compatible with Oticon Medical SP and XP generation cochlear implants. The platform consists of (a) a pair of behind-the-ear devices, each containing front and rear calibrated microphones, (b) a four-channel USB analog-to-digital converter, (c) real-time PC-based sound processing software called the Master Hearing Aid, and (d) USB-connected hardware and output coils capable of driving two implants simultaneously. The platform is capable of processing signals from the four microphones simultaneously and producing synchronized binaural cochlear implant outputs that drive two (bilaterally implanted) SP or XP implants. Both audio signal preprocessing algorithms (such as binaural beamforming) and novel binaural stimulation strategies (within the implant limitations) can be programmed by researchers. When the whole research platform is combined with Oticon Medical SP implants, interaural electrode timing can be controlled on individual electrodes to within ±1 µs and interaural electrode energy differences can be controlled to within ±2%. Hence, this new platform is particularly well suited to performing experiments related to interaural time differences in combination with interaural level differences in real-time. The platform also supports instantaneously variable stimulation rates and thereby enables investigations such as the effect of changing the stimulation rate on pitch perception. Because the processing can be changed on the fly, researchers can use this platform to study perceptual changes resulting from different processing strategies acutely. © The Author(s) 2015.

  5. Preferred Tempo and Low-Audio-Frequency Bias Emerge From Simulated Sub-cortical Processing of Sounds With a Musical Beat

    PubMed Central

    Zuk, Nathaniel J.; Carney, Laurel H.; Lalor, Edmund C.

    2018-01-01

    Prior research has shown that musical beats are salient at the level of the cortex in humans. Yet below the cortex there is considerable sub-cortical processing that could influence beat perception. Some biases, such as a tempo preference and an audio-frequency bias for beat timing, could result from sub-cortical processing. Here, we used models of auditory-nerve processing and midbrain-level amplitude-modulation filtering to simulate sub-cortical neural activity in response to various beat-inducing stimuli, and we used the simulated activity to determine the tempo or beat frequency of the music. First, irrespective of the stimulus being presented, the preferred tempo was around 100 beats per minute, which is within the range of tempi where tempo discrimination and tapping accuracy are optimal. Second, sub-cortical processing predicted a stronger influence of lower audio frequencies on beat perception. However, the tempo identification algorithm that was optimized for simple stimuli often failed for recordings of music. For music, the most highly synchronized model activity occurred at a multiple of the beat frequency. Bottom-up processes alone are thus insufficient to produce beat-locked activity. Instead, a learned and possibly top-down mechanism that scales the synchronization frequency to derive the beat frequency greatly improves the performance of tempo identification. PMID:29896080
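
    As a simplified stand-in for the tempo identification step, the sketch below estimates tempo from the autocorrelation of a beat-like amplitude envelope; the paper drives this step with simulated auditory-nerve and midbrain responses rather than a raw envelope, so everything here is an editorial illustration.

```python
# Minimal sketch of tempo estimation via envelope autocorrelation.
import numpy as np

fs = 1000                                   # envelope sampling rate, Hz
t = np.arange(0, 30, 1 / fs)
bpm_true = 100
env = np.clip(np.sin(2 * np.pi * (bpm_true / 60.0) * t), 0, None) ** 4
env += 0.1 * np.random.randn(t.size)        # noisy, beat-like envelope
env -= env.mean()

# Autocorrelation via FFT (zero-padded), lags >= 0
spec = np.fft.rfft(env, n=2 * env.size)
ac = np.fft.irfft(np.abs(spec) ** 2)[: env.size]

# Search lags corresponding to 40-240 BPM; convert the peak to a tempo
lo, hi = int(fs * 60 / 240), int(fs * 60 / 40)
peak_lag = lo + np.argmax(ac[lo:hi])
print("estimated tempo: %.1f BPM" % (60.0 * fs / peak_lag))
```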

  6. The Scanning Theremin Microscope: A Model Scanning Probe Instrument for Hands-On Activities

    ERIC Educational Resources Information Center

    Quardokus, Rebecca C.; Wasio, Natalie A.; Kandel, S. Alex

    2014-01-01

    A model scanning probe microscope, designed using similar principles of operation to research instruments, is described. Proximity sensing is done using a capacitance probe, and a mechanical linkage is used to scan this probe across surfaces. The signal is transduced as an audio tone using a heterodyne detection circuit analogous to that used in…

  7. Method for determining depth and shape of a sub-surface conductive object

    NASA Astrophysics Data System (ADS)

    Lee, D. O.; Montoya, P. C.; Wayland, J. R., Jr.

    1984-06-01

    The depth to and size of an underground object may be determined by sweeping a controlled source audio magnetotelluric (CSAMT) signal and locating a peak response when the receiver spans the edge of the object. The depth of the object is one quarter of the wavelength, in the subsurface medium, of the signal at the peak-response frequency.
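
    For reference, the quarter-wavelength rule follows from the standard plane-wave skin-depth relations for a uniform conductive half-space of resistivity ρ at frequency f (an editorial illustration, not text from the patent):

```latex
\delta = \sqrt{\frac{2\rho}{\omega\mu_0}} \approx 503\sqrt{\frac{\rho}{f}}\;\text{m},
\qquad \lambda = 2\pi\delta,
\qquad d = \frac{\lambda}{4} = \frac{\pi\delta}{2}
```

    For example, with ρ = 100 Ω·m and a peak response at f = 1000 Hz, the skin depth is δ ≈ 159 m and the indicated depth is d ≈ 250 m.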

  8. Electronic (fenceless) control of livestock.

    Treesearch

    A.R. Tiedemann; T.M. Quigley; L.D. White; et al.

    1999-01-01

    During June and August 1992, a new technology designed to exclude cattle from specific areas such as riparian zones was tested. The technology consisted of an eartag worn by an animal that provides an audio warning and electrical impulse to the ear as the animal approaches the zone of influence of a transmitter. The transmitter emits a signal that narrowly defines the...

  9. Audiovisual Vowel Monitoring and the Word Superiority Effect in Children

    ERIC Educational Resources Information Center

    Fort, Mathilde; Spinelli, Elsa; Savariaux, Christophe; Kandel, Sonia

    2012-01-01

    The goal of this study was to explore whether viewing the speaker's articulatory gestures contributes to lexical access in children (ages 5-10) and in adults. We conducted a vowel monitoring task with words and pseudo-words in audio-only (AO) and audiovisual (AV) contexts with white noise masking the acoustic signal. The results indicated that…

  10. System Interconnections. A Survey of Technical Requirements for Broadband Cable Teleservices; Volume Five.

    ERIC Educational Resources Information Center

    McManamon, Peter M.

    Several aspects of system interconnections are treated in this report. The interconnection of existing and future cable television (CATV) systems for two-way transfer of audio/video and digital data signals is surveyed. The concept of interconnection is explored relative to existing and proposed CATV systems and broadband teleservice networks,…

  11. Towards parameter-free classification of sound effects in movies

    NASA Astrophysics Data System (ADS)

    Chu, Selina; Narayanan, Shrikanth; Kuo, C.-C. J.

    2005-08-01

    The problem of identifying intense events via multimedia data mining in films is investigated in this work. Movies are mainly characterized by dialog, music, and sound effects. We begin our investigation with detecting interesting events through sound effects. Sound effects are neither speech nor music, but are closely associated with interesting events such as car chases and gun shots. In this work, we utilize low-level audio features, including MFCCs and energy, to identify sound effects. It was shown in previous work that the hidden Markov model (HMM) works well for speech/audio signals. However, this technique requires a careful choice in designing the model and choosing correct parameters. In this work, we introduce a framework that avoids this necessity and works well with semi- and non-parametric learning algorithms.
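
    The low-level front end described here is straightforward to reproduce; the sketch below extracts MFCCs and short-time energy with the librosa library. The file name and parameter choices are placeholders, not those of the paper.

```python
# Minimal sketch of the low-level features (MFCCs plus short-time
# energy), assuming the librosa library and a mono clip.
import librosa
import numpy as np

y, sr = librosa.load("clip.wav", sr=16000)        # hypothetical input file

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # (13, frames)
energy = librosa.feature.rms(y=y)                     # (1, frames)

# Stack into one feature matrix: rows are features, columns are frames
features = np.vstack([mfcc, energy])
print(features.shape)
```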

  12. New radio meteor detecting and logging software

    NASA Astrophysics Data System (ADS)

    Kaufmann, Wolfgang

    2017-08-01

    A new piece of software, "Meteor Logger", for the radio observation of meteors is described. It analyses an incoming audio stream in the frequency domain to detect a radio meteor signal on the basis of its signature, instead of applying an amplitude threshold. To this end, the distribution of the three frequencies with the highest spectral power is tracked over time (the 3f method). An auto-notch algorithm was developed to prevent meteor signal detection from being jammed by an interference line that may be present. The results of an exemplary logging session are discussed.
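
    The 3f idea can be sketched directly: per STFT frame, record the three frequency bins with the highest power; a persistent meteor echo then appears as a stable cluster of these frequencies over successive frames, unlike broadband noise. The sampling rate, window length, and simulated echo below are illustrative assumptions, not the program's actual parameters.

```python
# Minimal sketch of the "3f" method on a simulated audio stream.
import numpy as np
from scipy.signal import stft

fs = 8000
t = np.arange(0, 5, 1 / fs)
audio = 0.05 * np.random.randn(t.size)
audio[fs:2 * fs] += np.sin(2 * np.pi * 1000 * t[fs:2 * fs])  # 1 s "echo" at 1 kHz

f, times, Z = stft(audio, fs=fs, nperseg=1024)
power = np.abs(Z) ** 2

# Three strongest frequencies per frame; stable values indicate a signal
top3 = f[np.argsort(power, axis=0)[-3:, :]]
for frame, freqs in list(zip(times, top3.T))[::10]:
    print("t=%.2fs  top frequencies: %s Hz" % (frame, np.sort(freqs).astype(int)))
```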

  13. The Personal Hearing System—A Software Hearing Aid for a Personal Communication System

    NASA Astrophysics Data System (ADS)

    Grimm, Giso; Guilmin, Gwénaël; Poppen, Frank; Vlaming, Marcel S. M. G.; Hohmann, Volker

    2009-12-01

    A concept and architecture of a personal communication system (PCS) is introduced that integrates audio communication and hearing support for the elderly and hearing-impaired through a personal hearing system (PHS). The concept envisions a central processor connected to audio headsets via a wireless body area network (WBAN). To demonstrate the concept, a prototype PCS is presented that is implemented on a netbook computer with a dedicated audio interface in combination with a mobile phone. The prototype can be used for field-testing possible applications and to reveal possibilities and limitations of the concept of integrating hearing support in consumer audio communication devices. It is shown that the prototype PCS can integrate hearing aid functionality, telephony, public announcement systems, and home entertainment. An exemplary binaural speech enhancement scheme that represents a large class of possible PHS processing schemes is shown to be compatible with the general concept. However, an analysis of hardware and software architectures shows that the implementation of a PCS on future advanced cell phone-like devices is challenging. Because of limitations in processing power, recoding of prototype implementations into fixed point arithmetic will be required and WBAN performance is still a limiting factor in terms of data rate and delay.

  14. Young children's recall and reconstruction of audio and audiovisual narratives.

    PubMed

    Gibbons, J; Anderson, D R; Smith, R; Field, D E; Fischer, C

    1986-08-01

    It has been claimed that the visual component of audiovisual media dominates young children's cognitive processing. This experiment examines the effects of input modality while controlling the complexity of the visual and auditory content and while varying the comprehension task (recall vs. reconstruction). 4- and 7-year-olds were presented brief stories through either audio or audiovisual media. The audio version consisted of narrated character actions and character utterances. The narrated actions were matched to the utterances on the basis of length and propositional complexity. The audiovisual version depicted the actions visually by means of stop animation instead of by auditory narrative statements. The character utterances were the same in both versions. Audiovisual input produced superior performance on explicit information in the 4-year-olds and produced more inferences at both ages. Because performance on utterances was superior in the audiovisual condition as compared to the audio condition, there was no evidence that visual input inhibits processing of auditory information. Actions were more likely to be produced by the younger children than utterances, regardless of input medium, indicating that prior findings of visual dominance may have been due to the salience of narrative action. Reconstruction, as compared to recall, produced superior depiction of actions at both ages as well as more constrained relevant inferences and narrative conventions.

  15. Audiovisual Interval Size Estimation Is Associated with Early Musical Training.

    PubMed

    Abel, Mary Kathryn; Li, H Charles; Russo, Frank A; Schlaug, Gottfried; Loui, Psyche

    2016-01-01

    Although pitch is a fundamental attribute of auditory perception, substantial individual differences exist in our ability to perceive differences in pitch. Little is known about how these individual differences in the auditory modality might affect crossmodal processes such as audiovisual perception. In this study, we asked whether individual differences in pitch perception might affect audiovisual perception, as it relates to age of onset and number of years of musical training. Fifty-seven subjects made subjective ratings of interval size when given point-light displays of audio, visual, and audiovisual stimuli of sung intervals. Audiovisual stimuli were divided into congruent and incongruent (audiovisual-mismatched) stimuli. Participants' ratings correlated strongly with interval size in audio-only, visual-only, and audiovisual-congruent conditions. In the audiovisual-incongruent condition, ratings correlated more with audio than with visual stimuli, particularly for subjects who had better pitch perception abilities and higher nonverbal IQ scores. To further investigate the effects of age of onset and length of musical training, subjects were divided into musically trained and untrained groups. Results showed that among subjects with musical training, the degree to which participants' ratings correlated with auditory interval size during incongruent audiovisual perception was correlated with both nonverbal IQ and age of onset of musical training. After partialing out nonverbal IQ, pitch discrimination thresholds were no longer associated with incongruent audio scores, whereas age of onset of musical training remained associated with incongruent audio scores. These findings invite future research on the developmental effects of musical training, particularly those relating to the process of audiovisual perception.

  16. Audiovisual Interval Size Estimation Is Associated with Early Musical Training

    PubMed Central

    Abel, Mary Kathryn; Li, H. Charles; Russo, Frank A.; Schlaug, Gottfried; Loui, Psyche

    2016-01-01

    Although pitch is a fundamental attribute of auditory perception, substantial individual differences exist in our ability to perceive differences in pitch. Little is known about how these individual differences in the auditory modality might affect crossmodal processes such as audiovisual perception. In this study, we asked whether individual differences in pitch perception might affect audiovisual perception, as it relates to age of onset and number of years of musical training. Fifty-seven subjects made subjective ratings of interval size when given point-light displays of audio, visual, and audiovisual stimuli of sung intervals. Audiovisual stimuli were divided into congruent and incongruent (audiovisual-mismatched) stimuli. Participants’ ratings correlated strongly with interval size in audio-only, visual-only, and audiovisual-congruent conditions. In the audiovisual-incongruent condition, ratings correlated more with audio than with visual stimuli, particularly for subjects who had better pitch perception abilities and higher nonverbal IQ scores. To further investigate the effects of age of onset and length of musical training, subjects were divided into musically trained and untrained groups. Results showed that among subjects with musical training, the degree to which participants’ ratings correlated with auditory interval size during incongruent audiovisual perception was correlated with both nonverbal IQ and age of onset of musical training. After partialing out nonverbal IQ, pitch discrimination thresholds were no longer associated with incongruent audio scores, whereas age of onset of musical training remained associated with incongruent audio scores. These findings invite future research on the developmental effects of musical training, particularly those relating to the process of audiovisual perception. PMID:27760134

  17. PROTAX-Sound: A probabilistic framework for automated animal sound identification

    PubMed Central

    Somervuo, Panu; Ovaskainen, Otso

    2017-01-01

    Autonomous audio recording is a stimulating new field in bioacoustics, with great promise for conducting cost-effective species surveys. One major current challenge is the lack of reliable classifiers capable of multi-species identification. We present PROTAX-Sound, a statistical framework to perform probabilistic classification of animal sounds. PROTAX-Sound is based on a multinomial regression model, and it can utilize as predictors any kind of sound features or classifications produced by other existing algorithms. PROTAX-Sound combines audio and image processing techniques to scan environmental audio files. It identifies regions of interest (a segment of the audio file that contains a vocalization to be classified), extracts acoustic features from them and compares them with samples in a reference database. The output of PROTAX-Sound is the probabilistic classification of each vocalization, including the possibility that it represents a species not present in the reference database. We demonstrate the performance of PROTAX-Sound by classifying audio from a species-rich case study of tropical birds. The best performing classifier achieved 68% classification accuracy for 200 bird species. PROTAX-Sound improves the classification power of current techniques by combining information from multiple classifiers in a manner that yields calibrated classification probabilities. PMID:28863178

  18. PROTAX-Sound: A probabilistic framework for automated animal sound identification.

    PubMed

    de Camargo, Ulisses Moliterno; Somervuo, Panu; Ovaskainen, Otso

    2017-01-01

    Autonomous audio recording is a stimulating new field in bioacoustics, with great promise for conducting cost-effective species surveys. One major current challenge is the lack of reliable classifiers capable of multi-species identification. We present PROTAX-Sound, a statistical framework to perform probabilistic classification of animal sounds. PROTAX-Sound is based on a multinomial regression model, and it can utilize as predictors any kind of sound features or classifications produced by other existing algorithms. PROTAX-Sound combines audio and image processing techniques to scan environmental audio files. It identifies regions of interest (a segment of the audio file that contains a vocalization to be classified), extracts acoustic features from them and compares them with samples in a reference database. The output of PROTAX-Sound is the probabilistic classification of each vocalization, including the possibility that it represents a species not present in the reference database. We demonstrate the performance of PROTAX-Sound by classifying audio from a species-rich case study of tropical birds. The best performing classifier achieved 68% classification accuracy for 200 bird species. PROTAX-Sound improves the classification power of current techniques by combining information from multiple classifiers in a manner that yields calibrated classification probabilities.
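
    The combining step can be sketched with an off-the-shelf multinomial (softmax) regression over base-classifier scores, with an extra class standing in for "species not in the reference database". Everything below is synthetic and illustrative; PROTAX-Sound's actual model and features differ.

```python
# Minimal sketch of a multinomial combiner over base-classifier scores.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, n_species = 300, 4                 # 4 known species; class 4 = "unknown"
base_scores = rng.random((n, 3))      # scores from 3 hypothetical base classifiers
labels = rng.integers(0, n_species + 1, size=n)

clf = LogisticRegression(max_iter=1000)   # multinomial softmax by default
clf.fit(base_scores, labels)

probs = clf.predict_proba(base_scores[:2])
print(np.round(probs, 3))             # each row sums to 1: calibrated probabilities
```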

  19. Impact of Audio-Visual Asynchrony on Lip-Reading Effects -Neuromagnetic and Psychophysical Study-

    PubMed Central

    Yahata, Izumi; Kanno, Akitake; Sakamoto, Shuichi; Takanashi, Yoshitaka; Takata, Shiho; Nakasato, Nobukazu; Kawashima, Ryuta; Katori, Yukio

    2016-01-01

    The effects of asynchrony between audio and visual (A/V) stimuli on the N100m responses of magnetoencephalography in the left hemisphere were compared with those on psychophysical responses in 11 participants. The latency and amplitude of the N100m were significantly shortened and reduced, respectively, in the left hemisphere by the presentation of visual speech as long as the temporal asynchrony between A/V stimuli was within 100 ms, but were not significantly affected at audio lags of -500 and +500 ms. However, some small effects were still preserved on average at audio lags of 500 ms, suggesting an asymmetry of the temporal window similar to that observed in psychophysical measurements, which tended to be more robust (wider) for audio lags; i.e., the pattern of visual-speech effects as a function of A/V lag observed in the N100m in the left hemisphere broadly resembled that in psychophysical measurements on average, although individual responses were somewhat varied. The present results suggest that the basic configuration of the temporal window of visual effects on auditory-speech perception is observable from the early auditory processing stage. PMID:28030631

  20. Recognition and characterization of unstructured environmental sounds

    NASA Astrophysics Data System (ADS)

    Chu, Selina

    2011-12-01

    Environmental sounds are what we hear every day: more generally, the ambient or background audio that surrounds us. Humans utilize both vision and hearing to respond to their surroundings, a capability still quite limited in machine processing. The first step toward achieving multimodal input applications is the ability to process unstructured audio and recognize audio scenes (or environments). Such an ability would have applications in content analysis and mining of multimedia data, and in improving robustness in context-aware applications through multi-modality, such as in assistive robotics, surveillance, or mobile device-based services. The goal of this thesis is the characterization of unstructured environmental sounds for understanding and predicting the context surrounding an agent or device. Most research on audio recognition has focused primarily on speech and music; less attention has been paid to the challenges and opportunities of using audio to characterize unstructured environments. My research focuses on investigating challenging issues in characterizing unstructured environmental audio and on developing novel algorithms for modeling the variations of the environment. The first step in building a recognition system for unstructured auditory environments was to investigate techniques and audio features suited to such data. In this initial investigation into the feasibility of designing an automatic environment recognition system using audio information, I found that traditional recognition and feature extraction methods were not suitable for environmental sound, which lacks the formantic and harmonic structures found in speech and music; this dispels the notion that traditional speech and music recognition techniques can simply be reused for realistic environmental sound. Natural unstructured environments contain a large variety of sounds, which are in fact noise-like and are not effectively modeled by Mel-frequency cepstral coefficients (MFCCs) or other commonly used audio features (e.g., energy or zero-crossing rate). Due to this lack of features suitable for environmental audio, and to achieve a more effective representation, I proposed a specialized feature extraction algorithm for environmental sounds that utilizes the matching pursuit (MP) algorithm to learn the inherent structure of each type of sound; we call these MP-features. MP-features have been shown to capture and represent sounds from different sources and different ranges where frequency-domain features (e.g., MFCCs) fail, and they are advantageous when combined with MFCCs to improve overall performance. The third component concerns modeling and detecting the background audio. One of the goals of this research is to characterize an environment. Since many events blend into the background, I sought a general background model for any particular environment: once the background is modeled, foreground events can be identified even if they have never been observed before. The next step, therefore, was to investigate learning an audio background model for each environment type, despite the occurrence of different foreground events.
In this work, I presented a framework for robust audio background modeling that includes learning models for prediction, data knowledge, and persistent characteristics of the environment. This approach can model the background and detect foreground events, and can also verify whether the predicted background is indeed the background or a foreground event that persists for a longer period of time. I also investigated the use of a semi-supervised learning technique to exploit and label new unlabeled audio data. The final components of my thesis involve learning sound structures for generalization and applying the proposed ideas to context-aware applications. Environmental sound is inherently noisy and contains relatively large amounts of overlap between different environments; sounds vary widely even within a single environment type, and frequently there are no clear boundaries between types. Traditional classification methods are generally not robust enough to handle classes with such overlap, so this audio requires representation by more expressive models. Deep learning architectures provide a generative, model-based approach to classification. Specifically, I considered the use of Deep Belief Networks (DBNs) to model environmental audio and investigated their applicability to noisy data to improve robustness and generalization. A framework was proposed using composite DBNs to discover high-level representations and to learn a hierarchical structure for different acoustic environments in a data-driven fashion. Experimental results on real data sets demonstrate its effectiveness over traditional methods, with over 90% recognition accuracy for a large number of environmental sound types.
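
    The matching-pursuit feature idea from the thesis can be sketched with a small Gabor dictionary: greedily pick the atom most correlated with the residual, subtract it, and keep the atom parameters (frequency, scale) and coefficient as features. Dictionary sizes and parameters below are illustrative only, not those of the thesis.

```python
# Minimal sketch of matching pursuit over a Gabor dictionary; the
# selected atoms' parameters act as "MP-features".
import numpy as np

def gabor_atom(n, freq, scale, fs):
    t = (np.arange(n) - n / 2) / fs
    atom = np.exp(-0.5 * (t / scale) ** 2) * np.cos(2 * np.pi * freq * t)
    return atom / np.linalg.norm(atom)

fs, n = 8000, 512
freqs = np.linspace(100, 4000, 40)
scales = (0.002, 0.008, 0.032)
dictionary = np.array([gabor_atom(n, f, s, fs) for f in freqs for s in scales])
params = [(f, s) for f in freqs for s in scales]

signal = gabor_atom(n, 500, 0.008, fs) + 0.5 * gabor_atom(n, 2000, 0.002, fs)
residual = signal.copy()

features = []
for _ in range(3):                        # greedy MP iterations
    corr = dictionary @ residual          # correlation with every atom
    k = np.argmax(np.abs(corr))
    residual = residual - corr[k] * dictionary[k]   # subtract best atom
    features.append((params[k][0], params[k][1], corr[k]))

for f, s, c in features:
    print("atom: %6.0f Hz, scale %.3f s, coeff %+.2f" % (f, s, c))
```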

  1. The integration of audio-tactile information is modulated by multimodal social interaction with physical contact in infancy.

    PubMed

    Tanaka, Yukari; Kanakogi, Yasuhiro; Kawasaki, Masahiro; Myowa, Masako

    2018-04-01

    Interaction between caregivers and infants is multimodal in nature. To react interactively and smoothly to such multimodal signals, infants must integrate all of them. However, few empirical infant studies have investigated how multimodal social interaction with physical contact facilitates multimodal integration, especially regarding audio-tactile (A-T) information. By using electroencephalography (EEG) and event-related potentials (ERPs), the present study investigated how neural processing involved in A-T integration is modulated by tactile interaction. Seven- to eight-month-old infants heard one pseudoword both whilst being tickled (multimodal 'A-T' condition) and whilst not being tickled (unimodal 'A' condition). Thereafter, their EEG was measured during the perception of the same word. Compared to the A condition, the A-T condition resulted in enhanced ERPs and higher beta-band activity within the left temporal regions, indicating neural processing of A-T integration. Additionally, theta-band activity within the middle frontal region was enhanced, which may reflect enhanced attention to social information. Furthermore, differential ERPs correlated with the degree of engagement in the tickling interaction. We provide neural evidence that the integration of A-T information in infants' brains is facilitated through tactile interaction with others. Such plastic changes in neural processing may promote harmonious social interaction and effective learning in infancy. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.

  2. Selective Listening Point Audio Based on Blind Signal Separation and Stereophonic Technology

    NASA Astrophysics Data System (ADS)

    Niwa, Kenta; Nishino, Takanori; Takeda, Kazuya

    A sound field reproduction method is proposed that uses blind source separation and a head-related transfer function. In the proposed system, multichannel acoustic signals captured at distant microphones are decomposed into a set of location/signal pairs of virtual sound sources based on frequency-domain independent component analysis. After estimating the locations and signals of the virtual sources, the spatial sound at the selected listening point is constructed by convolving the controlled acoustic transfer functions with each signal. In experiments, a sound field made by six sound sources was captured using 48 distant microphones and decomposed into sets of virtual sound sources. Since subjective evaluation shows no significant difference between natural and reconstructed sound when six virtual sources are used, the effectiveness of the decomposition algorithm as well as the virtual source representation is confirmed.
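
    The separation stage can be illustrated with time-domain FastICA as a simplified stand-in (the paper performs independent component analysis per frequency bin in the STFT domain). The mixing matrix and sources below are synthetic.

```python
# Minimal sketch of blind source separation with time-domain FastICA.
import numpy as np
from sklearn.decomposition import FastICA

fs = 8000
t = np.arange(0, 2, 1 / fs)
s1 = np.sin(2 * np.pi * 440 * t)                   # source 1: tone
s2 = np.sign(np.sin(2 * np.pi * 3 * t))            # source 2: square wave
S = np.c_[s1, s2]

A = np.array([[1.0, 0.5], [0.4, 1.0]])             # mixing at two microphones
X = S @ A.T                                        # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)      # estimated sources (up to scale and order)
print(S_hat.shape)                # (len(t), 2)
```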

  3. Method for determining depth and shape of a sub-surface conductive object

    DOEpatents

    Lee, D.O.; Montoya, P.C.; Wayland, J.R., Jr.

    1984-06-27

    The depth to and size of an underground object may be determined by sweeping a controlled source audio magnetotelluric (CSAMT) signal and locating a peak response when the receiver spans the edge of the object. The depth of the object is one quarter of the wavelength, in the subsurface medium, of the signal at the peak-response frequency. 3 figures.

  4. 33 CFR 149.135 - What should be marked on the cargo transfer system alarm switch?

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 33 Navigation and Navigable Waters 2 2011-07-01 2011-07-01 false What should be marked on the cargo transfer system alarm switch? 149.135 Section 149.135 Navigation and Navigable Waters COAST GUARD... switch? Each switch for activating an alarm, and each audio or visual device for signaling an alarm, must...

  5. 33 CFR 149.135 - What should be marked on the cargo transfer system alarm switch?

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 33 Navigation and Navigable Waters 2 2012-07-01 2012-07-01 false What should be marked on the cargo transfer system alarm switch? 149.135 Section 149.135 Navigation and Navigable Waters COAST GUARD... switch? Each switch for activating an alarm, and each audio or visual device for signaling an alarm, must...

  6. 33 CFR 149.135 - What should be marked on the cargo transfer system alarm switch?

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 33 Navigation and Navigable Waters 2 2010-07-01 2010-07-01 false What should be marked on the cargo transfer system alarm switch? 149.135 Section 149.135 Navigation and Navigable Waters COAST GUARD... switch? Each switch for activating an alarm, and each audio or visual device for signaling an alarm, must...

  7. 33 CFR 149.135 - What should be marked on the cargo transfer system alarm switch?

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 33 Navigation and Navigable Waters 2 2014-07-01 2014-07-01 false What should be marked on the cargo transfer system alarm switch? 149.135 Section 149.135 Navigation and Navigable Waters COAST GUARD... switch? Each switch for activating an alarm, and each audio or visual device for signaling an alarm, must...

  8. 33 CFR 149.135 - What should be marked on the cargo transfer system alarm switch?

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 33 Navigation and Navigable Waters 2 2013-07-01 2013-07-01 false What should be marked on the cargo transfer system alarm switch? 149.135 Section 149.135 Navigation and Navigable Waters COAST GUARD... switch? Each switch for activating an alarm, and each audio or visual device for signaling an alarm, must...

  9. Free Oscilloscope Web App Using a Computer Mic, Built-In Sound Library, or Your Own Files

    ERIC Educational Resources Information Center

    Ball, Edward; Ruiz, Frances; Ruiz, Michael J.

    2017-01-01

    We have developed an online oscilloscope program which allows users to see waveforms by utilizing their computer microphones, selecting from our library of over 30 audio files, and opening any *.mp3 or *.wav file on their computers. The oscilloscope displays real-time signals against time. The oscilloscope has been calibrated so one can make…
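
    A desktop analogue of such an oscilloscope takes only a few lines; the sketch below repeatedly records a short sweep from the default microphone and redraws the trace. It assumes the sounddevice and matplotlib packages and is an editorial illustration, not the web app's code.

```python
# Minimal sketch of a microphone oscilloscope (desktop, not browser).
import numpy as np
import sounddevice as sd
import matplotlib.pyplot as plt

fs = 44100
duration = 0.05                         # show 50 ms per sweep, like a scope

fig, ax = plt.subplots()
line, = ax.plot(np.zeros(int(fs * duration)))
ax.set_ylim(-1, 1)
ax.set_xlabel("samples")
ax.set_ylabel("amplitude")

while plt.fignum_exists(fig.number):
    # Record one sweep from the default microphone and redraw the trace
    block = sd.rec(int(fs * duration), samplerate=fs, channels=1, blocking=True)
    line.set_ydata(block[:, 0])
    plt.pause(0.01)
```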

  10. A Software Tool for the Annotation of Embolic Events in Echo Doppler Audio Signals

    PubMed Central

    Pierleoni, Paola; Maurizi, Lorenzo; Palma, Lorenzo; Belli, Alberto; Valenti, Simone; Marroni, Alessandro

    2017-01-01

    The use of precordial Doppler monitoring to prevent decompression sickness (DS) is well known by the scientific community as an important instrument for early diagnosis of DS. However, the timely and correct diagnosis of DS without assistance from diving medical specialists is unreliable. Thus, a common protocol for the manual annotation of echo Doppler signals and a tool for their automated recording and annotation are necessary. We have implemented original software for efficient bubble appearance annotation and proposed a unified annotation protocol. The tool auto-sets the response time of human “bubble examiners,” performs playback of the Doppler file by rendering it independent of the specific audio player, and enables the annotation of individual bubbles or multiple bubbles known as “showers.” The tool provides a report with an optimized data structure and estimates the embolic risk level according to the Extended Spencer Scale. The tool is built in accordance with ISO/IEC 9126 on software quality and has been projected and tested with assistance from the Divers Alert Network (DAN) Europe Foundation, which employs this tool for its diving data acquisition campaigns. PMID:29242701

  11. A practical, low-noise coil system for magnetotellurics

    USGS Publications Warehouse

    Stanley, William D.; Tinkler, Richard D.

    1983-01-01

    Magnetotellurics is a geophysical technique which was developed by Cagniard (1953) and Tikhonov (1950) and later refined by other scientists worldwide. The technique is a method of electromagnetic sounding of the Earth and is based upon the skin-depth effect in conductive media. The electric and magnetic fields arising from natural sources are measured at the surface of the earth over broad frequency bands. An excellent review of the technique is provided in the paper by Vozoff (1972). The sources of the natural fields are found in two basic mechanisms. At frequencies above a few hertz, most of the energy arises from lightning in thunderstorm belts around the equatorial regions. This energy is propagated in a waveguide formed by the earth-ionospheric cavity. Energy levels are higher at the fundamental modes of this cavity, but sufficient energy exists over most of the audio range to be useful for sounding at these frequencies, in which case the technique is generally referred to as audio-magnetotellurics or AMT. At frequencies lower than audio, and in general below 1 Hz, the source of naturally occurring electromagnetic energy is found in ionospheric currents. Current systems flowing in the ionosphere generate EM waves which can be used in sounding of the earth. These fields provide a relatively complete spectrum of electromagnetic energy that extends from around 1 Hz to periods of one day. Figure 1 shows an amplitude spectrum characteristic of both the ionospheric and lightning sources, covering a frequency range from 0.0001 Hz to 1000 Hz. It can be seen that there is a minimum in signal levels at about 1 Hz, in the gap between the two sources, and that signal level increases with a decrease in frequency.

  12. An Evaluation of Output Signal to Noise Ratio as a Predictor of Cochlear Implant Speech Intelligibility.

    PubMed

    Watkins, Greg D; Swanson, Brett A; Suaning, Gregg J

    2018-02-22

    Cochlear implant (CI) sound processing strategies are usually evaluated in clinical studies involving experienced implant recipients. Metrics that estimate the capacity to perceive speech for a given set of audio and processing conditions provide an alternative means to assess the effectiveness of processing strategies. The aim of this research was to assess the ability of the output signal to noise ratio (OSNR) to accurately predict speech perception. It was hypothesized that, compared with the other metrics evaluated in this study, (1) OSNR would have equivalent or better accuracy and (2) OSNR would be the most accurate in the presence of variable speech presentation levels. For the first time, the accuracy of OSNR as a metric that predicts speech intelligibility was compared, in a retrospective study, with that of the input signal to noise ratio (ISNR) and the short-term objective intelligibility (STOI) metric. Because STOI measures audio quality at the input to a CI sound processor, a vocoder was applied to the sound processor output and STOI was also calculated for the reconstructed audio signal (vocoder short-term objective intelligibility [VSTOI] metric). The figures of merit calculated for each metric were the Pearson correlation between the metric and a psychometric function fitted to sentence scores at each predictor value (Pearson sigmoidal correlation [PSIG]), the epsilon-insensitive root mean square error (RMSE*) between the psychometric function and the sentence scores, and the statistical deviance of the fitted curve with respect to the sentence scores (D). Sentence scores were taken from three existing data sets of Australian Sentence Test in Noise (AuSTIN) results. The AuSTIN tests were conducted with experienced users of the Nucleus CI system. The score for each sentence was the proportion of morphemes the participant correctly repeated. In data set 1, all sentences were presented at 65 dB sound pressure level (SPL) in the presence of four-talker babble noise. Each block of sentences used an adaptive procedure, with the speech presented at a fixed level and the ISNR varied. In data set 2, sentences were presented at 65 dB SPL in the presence of stationary speech-weighted noise, street-side city noise, and cocktail party noise. An adaptive ISNR procedure was used. In data set 3, sentences were presented at levels ranging from 55 to 89 dB SPL with two automatic gain control configurations and two fixed ISNRs. For data set 1, ISNR and OSNR were equally most accurate. STOI was significantly different for deviance (p = 0.045) and RMSE* (p < 0.001). VSTOI was significantly different for RMSE* (p < 0.001). For data set 2, ISNR and OSNR had an equivalent accuracy which was significantly better than that of STOI for PSIG (p = 0.029) and VSTOI for deviance (p = 0.001), RMSE*, and PSIG (both p < 0.001). For data set 3, OSNR was the most accurate metric and was significantly more accurate than VSTOI for deviance, RMSE*, and PSIG (all p < 0.001). ISNR and STOI were unable to predict the sentence scores for this data set. The study results supported the hypotheses. OSNR was found to have an accuracy equivalent to or better than ISNR, STOI, and VSTOI for tests conducted at a fixed presentation level and variable ISNR. OSNR was a more accurate metric than VSTOI for tests with fixed ISNRs and variable presentation levels. Overall, OSNR was the most accurate metric across the three data sets. OSNR holds promise as a prediction metric which could potentially improve the effectiveness of sound processor research and CI fitting.
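
    The evaluation machinery described here (fitting a sigmoidal psychometric function to per-sentence scores against a candidate metric, then scoring the fit) can be sketched compactly. The snippet below uses a logistic curve, plain RMSE rather than the study's epsilon-insensitive RMSE*, and synthetic data; it illustrates the approach, not the study's exact procedure.

```python
# Sketch: fit a logistic psychometric function to sentence scores vs. a
# candidate metric; plain RMSE stands in for the study's RMSE*.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr

def psychometric(x, x50, slope):
    """Proportion of morphemes correct as a function of the metric value."""
    return 1.0 / (1.0 + np.exp(-slope * (x - x50)))

def fit_and_score(metric, scores):
    (x50, slope), _ = curve_fit(psychometric, metric, scores,
                                p0=[np.median(metric), 1.0], maxfev=10000)
    pred = psychometric(metric, x50, slope)
    r, _ = pearsonr(pred, scores)                  # PSIG-style correlation
    rmse = float(np.sqrt(np.mean((pred - scores) ** 2)))
    return r, rmse

# Synthetic example: scores rise with output SNR (dB); data are invented.
rng = np.random.default_rng(0)
osnr = rng.uniform(-5.0, 15.0, 200)
scores = np.clip(psychometric(osnr, 5.0, 0.6)
                 + rng.normal(0.0, 0.08, osnr.size), 0.0, 1.0)
print(fit_and_score(osnr, scores))
```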

  13. Impairments in multisensory processing are not universal to the autism spectrum: no evidence for crossmodal priming deficits in Asperger syndrome.

    PubMed

    David, Nicole; R Schneider, Till; Vogeley, Kai; Engel, Andreas K

    2011-10-01

    Individuals with autism spectrum disorders (ASD) often show a tendency toward detail- or feature-based perception (also referred to as a "local processing bias") instead of the more holistic stimulus processing typical of unaffected people. This local processing bias has been demonstrated for the visual and auditory domains, and there is evidence that multisensory processing may also be affected in ASD. Most multisensory processing paradigms have used social-communicative stimuli, such as human speech or faces, probing the processing of simultaneously occurring sensory signals. Multisensory processing, however, is not limited to simultaneous stimulation. In this study, we investigated whether multisensory processing deficits in ASD persist when semantically complex but nonsocial stimuli are presented in succession. Fifteen adult individuals with Asperger syndrome and 15 control participants took part in a visual-audio priming task, which required the classification of sounds that were primed by semantically congruent or incongruent preceding pictures of objects. As expected, performance on congruent trials was faster and more accurate than on incongruent trials (the crossmodal priming effect). The Asperger group, however, did not differ significantly from the control group. Our results do not support a general multisensory processing deficit universal to the entire autism spectrum. Copyright © 2011, International Society for Autism Research, Wiley-Liss, Inc.

  14. Battlefield decision aid for acoustical ground sensors with interface to meteorological data sources

    NASA Astrophysics Data System (ADS)

    Wilson, D. Keith; Noble, John M.; VanAartsen, Bruce H.; Szeto, Gregory L.

    2001-08-01

    The performance of acoustical ground sensors depends heavily on the local atmospheric and terrain conditions. This paper describes a prototype physics-based decision aid, called the Acoustic Battlefield Aid (ABFA), for predicting these environmental effects. ABFA integrates advanced models for acoustic propagation, atmospheric structure, and array signal processing into a convenient graphical user interface. The propagation calculations are performed in the frequency domain on user-definable target spectra. The solution method involves a parabolic approximation to the wave equation combined with a terrain diffraction model. Sensor performance is characterized with Cramer-Rao lower bounds (CRLBs). The CRLB calculations include randomization of signal energy and wavefront orientation resulting from atmospheric turbulence. Available performance characterizations include signal-to-noise ratio, probability of detection, direction-finding accuracy for isolated receiving arrays, and location-finding accuracy for networked receiving arrays. A suite of integrated tools allows users to create new target descriptions from standard digitized audio files and to design new sensor array layouts. These tools optionally interface with the ARL Database/Automatic Target Recognition (ATR) Laboratory, providing access to an extensive library of target signatures. ABFA also includes a Java-based capability for network access of near real-time data from surface weather stations or forecasts from the Army's Integrated Meteorological System. As an example, the detection footprint of an acoustical sensor, as it evolves over a 13-hour period, is calculated.
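
    ABFA's propagation model (a parabolic wave equation with terrain diffraction and turbulence statistics) is far beyond a few lines of code. The toy budget below only illustrates the shape of a detection-footprint calculation, using spherical spreading and a flat absorption coefficient as placeholder physics; all dB values are invented.

```python
# Toy detection-footprint sketch: NOT ABFA's method, just the shape of it.
import numpy as np

def detection_snr_db(source_level_db, noise_level_db, r_m,
                     absorption_db_per_km=5.0):
    """Passive-detection budget with spherical spreading (20 log10 r)
    plus a flat atmospheric absorption term."""
    tl = 20.0 * np.log10(np.maximum(r_m, 1.0)) \
         + absorption_db_per_km * r_m / 1e3
    return source_level_db - tl - noise_level_db

# Footprint radius = farthest range where SNR exceeds a detector threshold.
ranges = np.linspace(1.0, 5000.0, 5000)
snr = detection_snr_db(110.0, 40.0, ranges)   # placeholder dB levels
footprint = ranges[snr >= 6.0].max()          # 6 dB detection threshold
print(f"detection footprint ~ {footprint:.0f} m")
```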

  15. Automatic Speech Acquisition and Recognition for Spacesuit Audio Systems

    NASA Technical Reports Server (NTRS)

    Ye, Sherry

    2015-01-01

    NASA has a widely recognized but unmet need for novel human-machine interface technologies that can facilitate communication during astronaut extravehicular activities (EVAs), when loud noises and strong reverberations inside spacesuits make communication challenging. WeVoice, Inc., has developed a multichannel signal-processing method for speech acquisition in noisy and reverberant environments that enables automatic speech recognition (ASR) technology inside spacesuits. The technology reduces noise by exploiting differences between the statistical nature of signals (i.e., speech) and noise in the spatial and temporal domains. As a result, ASR accuracy can be improved to the level at which crewmembers will find the speech interface useful. System components and features include beamforming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, and ASR decoding. Arithmetic complexity models were developed and will help designers of real-time ASR systems select appropriate tasks when confronted with constraints on computational resources. In Phase I of the project, WeVoice validated the technology. The company further refined the technology in Phase II and developed a prototype for testing and use by suited astronauts.
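
    The abstract names beamforming/multichannel noise reduction as the first stage but gives no algorithmic detail. A minimal delay-and-sum beamformer, the textbook baseline rather than WeVoice's proprietary method, conveys the idea: channels aligned on the speech direction add coherently, while diffuse noise averages down by roughly 10·log10(M) dB for M microphones.

```python
# Delay-and-sum beamformer sketch: align channels on the target and average.
import numpy as np

def delay_and_sum(x: np.ndarray, delays_samples, fs: float = 16000.0):
    """x: (channels, samples) array; delays_samples: per-channel steering
    delays (may be fractional), applied as a linear phase shift in the
    frequency domain so every channel shares a common time origin."""
    n = x.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for ch, d in enumerate(delays_samples):
        X = np.fft.rfft(x[ch])
        X *= np.exp(2j * np.pi * freqs * d / fs)   # advance by d samples
        out += np.fft.irfft(X, n)
    return out / x.shape[0]                        # coherent average
```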

  16. Sample-based engine noise synthesis using an enhanced pitch-synchronous overlap-and-add method.

    PubMed

    Jagla, Jan; Maillard, Julien; Martin, Nadine

    2012-11-01

    An algorithm for the real-time synthesis of internal combustion engine noise is presented. From the analysis of a recorded engine noise signal with continuously varying engine speed, a data set of sound samples is extracted that allows real-time synthesis of the noise induced by arbitrary evolutions of engine speed. The sound samples are extracted from a recording spanning the entire engine speed range. Each sample is delimited so as to contain the sound emitted during one cycle of the engine plus the overlap necessary to ensure smooth transitions during synthesis. The proposed approach, an extension of the PSOLA method introduced for speech processing, takes advantage of the specific periodicity of engine noise signals to locate the extraction instants of the sound samples. During the synthesis stage, the sound samples corresponding to the target engine speed evolution are concatenated with an overlap-and-add algorithm. It is shown that this method produces high-quality audio reproduction with a low computational load. It is therefore well suited for real-time applications.
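
    The synthesis stage reduces to windowed overlap-and-add of the extracted cycles. The sketch below shows only that final concatenation step, assuming equal-length cycle samples and 50% Hann-window overlap (which sums to unity); the paper's pitch-synchronous extraction and engine-speed tracking are omitted.

```python
# Overlap-and-add concatenation of pre-extracted engine-cycle samples.
import numpy as np

def overlap_add(cycles, hop):
    """cycles: list of equal-length 1-D arrays, each one engine cycle plus
    overlap; hop: output stride in samples. A Hann window at 50% overlap
    (hop = len/2) gives constant-gain crossfades between samples."""
    n = len(cycles[0])
    win = np.hanning(n)
    out = np.zeros(hop * (len(cycles) - 1) + n)
    for i, c in enumerate(cycles):
        out[i * hop : i * hop + n] += win * c
    return out

# Toy usage: repeating one cycle synthesizes steady-speed engine noise.
rng = np.random.default_rng(1)
cycle = rng.normal(size=1024)
steady = overlap_add([cycle] * 50, hop=512)
```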

  17. Digital Signal Processing For Low Bit Rate TV Image Codecs

    NASA Astrophysics Data System (ADS)

    Rao, K. R.

    1987-06-01

    In view of the 56 KBPS digital switched network services and the ISDN, low bit rate codecs for providing real-time full-motion color video are under various stages of development. Some companies have already brought codecs to market, and they are being used by industry and some Federal agencies for video teleconferencing. In general, these codecs offer features such as multiplexed audio and data, high-resolution graphics, encryption, error detection and correction, self-diagnostics, freeze-frame, split video, and text overlay. Transmitting the original color video over a 56 KBPS network requires a bit rate reduction on the order of 1400:1. Such large-scale bandwidth compression can be realized only by implementing a number of sophisticated digital signal processing techniques. This paper provides an overview of such techniques and outlines the newer concepts under investigation. Before the data compression techniques are applied, various preprocessing operations such as noise filtering, composite-to-component transformation, and horizontal and vertical blanking interval removal must be implemented. Spatio-temporal subsampling is invariably achieved by appropriate filtering. Transform and/or predictive coding, coupled with motion estimation and strengthened by adaptive features, are some of the tools in the arsenal of data reduction methods. Other essential blocks in the system are the quantizer, bit allocation, buffer, multiplexer, and channel coding.
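
    Transform coding is one of the "arsenal" tools the overview names. A minimal 8x8 block DCT round trip with a flat placeholder quantization step shows where the bit-rate reduction comes from: most quantized coefficients of a smooth block are zero, which the downstream entropy coder exploits.

```python
# 8x8 transform-coding round trip; q_step is an illustrative placeholder.
import numpy as np
from scipy.fft import dctn, idctn

def transform_code_block(block, q_step=16.0):
    """Forward 2-D DCT, uniform scalar quantization, inverse DCT."""
    coeffs = dctn(block, norm="ortho")
    quantized = np.round(coeffs / q_step)   # most entries become zero
    return idctn(quantized * q_step, norm="ortho")

block = np.outer(np.linspace(0, 255, 8), np.ones(8))   # smooth test block
rec = transform_code_block(block)
print(f"max reconstruction error: {np.abs(rec - block).max():.1f}")
```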

  18. Design and Implementation of Sound Searching Robots in Wireless Sensor Networks

    PubMed Central

    Han, Lianfu; Shen, Zhengguang; Fu, Changfeng; Liu, Chao

    2016-01-01

    A sound target-searching robot system is described that includes a 4-channel microphone array for sound collection, a magneto-resistive sensor for declination measurement, and a wireless sensor network (WSN) for exchanging information. It has embedded sound signal enhancement, recognition, and location methods and a sound-searching strategy based on a digital signal processor (DSP). Three robots, acting as the wireless network nodes, form the WSN together with a personal computer (PC) in order to search for three different sound targets in task-oriented collaboration. An improved spectral subtraction method is used for noise reduction. Mel-frequency cepstral coefficients (MFCCs) are extracted as the audio signal features. Based on the K-nearest neighbor classification method, the trained feature template is matched to recognize the sound signal type. The paper utilizes an improved generalized cross correlation method to estimate the time delay of arrival (TDOA) and then employs spherical interpolation to locate the sound source according to the TDOA and the geometrical position of the microphone array. A new mapping is proposed to direct the motors to search for sound targets flexibly. As the sink node, the PC receives and displays the results processed in the WSN, and it also makes the final decision on the received results in order to improve their accuracy. The experimental results show that the designed three-robot system implements the sound target-searching function without collisions and performs well. PMID:27657088
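
    The abstract cites an "improved generalized cross correlation" TDOA estimator without specifying the improvement. The sketch below implements the standard GCC-PHAT baseline on which such variants build: whiten the cross-spectrum so only phase (i.e., delay) information remains, then pick the cross-correlation peak.

```python
# GCC-PHAT time-delay estimation between two microphone signals.
import numpy as np

def gcc_phat_tdoa(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` in seconds."""
    n = sig.size + ref.size
    cross = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n)  # PHAT weighting
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

# Toy check: a copy delayed by 32 samples should yield tau = 2 ms at 16 kHz.
fs = 16000
rng = np.random.default_rng(2)
ref = rng.normal(size=4096)
sig = np.concatenate((np.zeros(32), ref))[:4096]
print(gcc_phat_tdoa(sig, ref, fs))   # ~0.002 s
```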

  19. Fourier Phase Domain Steganography: Phase Bin Encoding Via Interpolation

    NASA Astrophysics Data System (ADS)

    Rivas, Edward

    2007-04-01

    In recent years there has been increased interest in audio steganography and watermarking, primarily for two reasons. First, an acute need to improve national security capabilities in light of terrorist and criminal activity has driven new ideas and experimentation. Second, the explosive proliferation of digital media has forced the music industry to rethink how it will protect its intellectual property. Various techniques have been implemented, but the phase domain remains fertile ground for improvement because of its relative robustness to many types of distortion and its low perceptibility to the human auditory system. A new method for embedding data in the phase domain of the Discrete Fourier Transform of an audio signal is proposed. Focus is given to robustness and low perceptibility, while maintaining a relatively high capacity of up to 172 bits/s.
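
    The record does not disclose the interpolation scheme. The sketch below shows only the shared core of phase-coding steganography: quantizing the phase of selected FFT bins while preserving magnitude. The bin indices and single-frame framing are invented for illustration and are not the paper's method.

```python
# Phase-coding steganography core: one bit per chosen FFT bin, one frame.
import numpy as np

def embed_bits_in_phase(frame, bits, bins):
    """Set each selected bin's phase to -pi/2 (bit 0) or +pi/2 (bit 1),
    keeping its magnitude unchanged."""
    X = np.fft.rfft(frame)
    for bit, k in zip(bits, bins):
        X[k] = np.abs(X[k]) * np.exp(1j * (np.pi / 2 if bit else -np.pi / 2))
    return np.fft.irfft(X, frame.size)

def extract_bits_from_phase(frame, bins):
    X = np.fft.rfft(frame)
    return [1 if np.angle(X[k]) > 0 else 0 for k in bins]

rng = np.random.default_rng(3)
audio = rng.normal(size=4096)
payload, bins = [1, 0, 1, 1], [100, 200, 300, 400]
stego = embed_bits_in_phase(audio, payload, bins)
assert extract_bits_from_phase(stego, bins) == payload
```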
