Sample records for audio feature space

  1. Modified DCTNet for audio signals classification

    NASA Astrophysics Data System (ADS)

    Xian, Yin; Pu, Yunchen; Gan, Zhe; Lu, Liang; Thompson, Andrew

    2016-10-01

    In this paper, we investigate DCTNet for audio signal classification. Its output feature is related to Cohen's class of time-frequency distributions. We introduce the use of an adaptive DCTNet (A-DCTNet) for audio signal feature extraction. The A-DCTNet applies the idea of the constant-Q transform, with the center frequencies of its filterbanks geometrically spaced. The A-DCTNet adapts to different acoustic scales, and it captures low-frequency acoustic information, to which human audio perception is sensitive, better than features such as Mel-frequency spectral coefficients (MFSC). We use features extracted by the A-DCTNet as input for classifiers. Experimental results show that the A-DCTNet with Recurrent Neural Networks (RNN) achieves state-of-the-art bird song classification rates and improves artist identification accuracy on music data, demonstrating the A-DCTNet's applicability to signal processing problems.
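
    A minimal sketch of the geometric center-frequency spacing the abstract describes, borrowed from the constant-Q transform; the minimum frequency, bands-per-octave count, and upper limit below are illustrative assumptions, not values from the paper:

      import numpy as np

      # Illustrative parameters (not from the paper): 12 bands per octave,
      # spanning 55 Hz up to a 22.05 kHz upper limit.
      f_min, bands_per_octave, f_max = 55.0, 12, 22050.0
      n_bands = int(np.floor(bands_per_octave * np.log2(f_max / f_min)))

      # Geometric spacing keeps the ratio between adjacent center frequencies
      # constant, so bandwidth/center-frequency (the Q factor) is constant too.
      centers = f_min * 2.0 ** (np.arange(n_bands + 1) / bands_per_octave)
      print(centers[:4])   # [55.  58.27  61.74  65.41], i.e. semitone steps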

  2. Audio-visual synchrony and feature-selective attention co-amplify early visual processing.

    PubMed

    Keitel, Christian; Müller, Matthias M

    2016-05-01

    Our brain relies on neural mechanisms of selective attention and converging sensory processing to efficiently cope with rich and unceasing multisensory inputs. One prominent assumption holds that audio-visual synchrony can act as a strong attractor for spatial attention. Here, we tested for a similar effect of audio-visual synchrony on feature-selective attention. We presented two superimposed Gabor patches that differed in colour and orientation. On each trial, participants were cued to selectively attend to one of the two patches. Over time, spatial frequencies of both patches varied sinusoidally at distinct rates (3.14 and 3.63 Hz), giving rise to pulse-like percepts. A simultaneously presented pure tone carried a frequency modulation at the pulse rate of one of the two visual stimuli to introduce audio-visual synchrony. Pulsed stimulation elicited distinct time-locked oscillatory electrophysiological brain responses. These steady-state responses were quantified in the spectral domain to examine individual stimulus processing under conditions of synchronous versus asynchronous tone presentation and when respective stimuli were attended versus unattended. We found that both attending to the colour of a stimulus and its synchrony with the tone enhanced its processing. Moreover, both gain effects combined linearly for attended in-sync stimuli. Our results suggest that audio-visual synchrony can attract attention to specific stimulus features when stimuli overlap in space.
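
    The steady-state responses are quantified by reading the amplitude spectrum at the tagging frequencies. A toy sketch of that quantification (sampling rate, epoch length, and amplitudes are assumptions for illustration, not values from the study):

      import numpy as np

      fs = 500.0                                  # assumed EEG sampling rate (Hz)
      t = np.arange(0, 50, 1 / fs)                # 50 s epoch -> 0.02 Hz resolution
      # Toy trace: a 3.14 Hz steady-state response buried in noise.
      eeg = 0.5 * np.sin(2 * np.pi * 3.14 * t) + np.random.randn(t.size)

      amp = 2 * np.abs(np.fft.rfft(eeg)) / t.size    # single-sided amplitude
      freqs = np.fft.rfftfreq(t.size, 1 / fs)
      k = np.argmin(np.abs(freqs - 3.14))            # bin at the tagging frequency
      print(freqs[k], amp[k])                        # ~3.14 Hz, amplitude near 0.5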

  3. Detecting Parkinson's disease from sustained phonation and speech signals.

    PubMed

    Vaiciukynas, Evaldas; Verikas, Antanas; Gelzinis, Adas; Bacauskiene, Marija

    2017-01-01

    This study investigates signals from sustained phonation and text-dependent speech modalities for Parkinson's disease screening. Phonation corresponds to the vowel /a/ voicing task and speech to the pronunciation of a short sentence in the Lithuanian language. Signals were recorded through two channels simultaneously, namely, acoustic cardioid (AC) and smartphone (SP) microphones. Additional modalities were obtained by splitting the speech recordings into voiced and unvoiced parts. The information in each modality is summarized by 18 well-known audio feature sets. Random forest (RF) is used as the machine learning algorithm, both for individual feature sets and for decision-level fusion. Detection performance is measured by the out-of-bag equal error rate (EER) and the cost of the log-likelihood ratio. The Essentia audio feature set was the best for the AC speech modality and the YAAFE audio feature set was the best for the SP unvoiced modality, achieving EERs of 20.30% and 25.57%, respectively. Fusion of all feature sets and modalities resulted in an EER of 19.27% for the AC and 23.00% for the SP channel. Non-linear projection of an RF-based proximity matrix into 2D space enriched medical decision support with visualization.
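
    The out-of-bag EER pipeline can be reproduced in outline with scikit-learn; the feature matrix below is random stand-in data, and the EER is computed as the standard crossing point of the false-positive and false-negative rates:

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.metrics import roc_curve

      X = np.random.randn(200, 40)               # stand-in: one summarized feature set
      y = np.random.randint(0, 2, 200)           # 1 = Parkinson's, 0 = control

      rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
      rf.fit(X, y)
      scores = rf.oob_decision_function_[:, 1]   # out-of-bag class probabilities

      fpr, tpr, _ = roc_curve(y, scores)
      fnr = 1 - tpr
      eer = fpr[np.argmin(np.abs(fpr - fnr))]    # where FPR and FNR meet
      print(f"out-of-bag EER = {eer:.2%}")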

  4. News video story segmentation method using fusion of audio-visual features

    NASA Astrophysics Data System (ADS)

    Wen, Jun; Wu, Ling-da; Zeng, Pu; Luan, Xi-dao; Xie, Yu-xiang

    2007-11-01

    News story segmentation is an important aspect of news video analysis. This paper presents a method for news video story segmentation. Unlike prior work, which is based on visual feature transforms, the proposed technique uses audio features as the baseline and fuses visual features with them to refine the results. First, it selects silence clips as audio candidate points, and selects shot boundaries and anchor shots as two kinds of visual candidate points. It then uses the audio candidates as cues and develops fusion methods that effectively use the diverse types of visual candidates to refine the audio candidates and obtain story boundaries. Experimental results show that the method has high efficiency and adapts well to different kinds of news video.
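
    A sketch of the silence-candidate step, assuming a simple short-term energy threshold relative to the clip's peak level; the paper does not specify its detector, so the window length and threshold here are illustrative:

      import numpy as np

      def silence_candidates(x, fs, win=0.02, thresh_db=-40.0):
          """Start times (s) of windows whose RMS level falls below a
          threshold relative to the clip's peak amplitude."""
          n = int(win * fs)
          frames = x[: len(x) // n * n].reshape(-1, n)
          rms = np.sqrt((frames ** 2).mean(axis=1))
          level_db = 20 * np.log10(rms / (np.abs(x).max() + 1e-12) + 1e-12)
          return np.nonzero(level_db < thresh_db)[0] * win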

  5. Efficient audio signal processing for embedded systems

    NASA Astrophysics Data System (ADS)

    Chiu, Leung Kin

    As mobile platforms continue to pack on more computational power, electronics manufacturers have started to differentiate their products by enhancing their audio features. However, consumers also demand smaller devices that can operate for longer, imposing design constraints. In this research, we investigate two design strategies that allow us to efficiently process audio signals on embedded systems such as mobile phones and portable electronics. In the first strategy, we exploit properties of the human auditory system to process audio signals. We designed a sound enhancement algorithm to make piezoelectric loudspeakers sound "richer" and "fuller." Piezoelectric speakers have a small form factor but exhibit poor response in the low-frequency region. In the algorithm, we combine psychoacoustic bass extension and dynamic range compression to improve the perceived bass coming out of the tiny speakers. We also developed an audio energy reduction algorithm for loudspeaker power management. The perceptually transparent algorithm extends the battery life of mobile devices and prevents thermal damage in speakers. This method is similar to audio compression algorithms, which encode audio signals in such a way that the compression artifacts are not easily perceivable. Instead of reducing the storage space, however, we suppress the audio content that is below the hearing threshold, thereby reducing the signal energy. In the second strategy, we use low-power analog circuits to process the signal before digitizing it. We designed an analog front-end for sound detection and implemented it on a field-programmable analog array (FPAA). The system is an example of an analog-to-information converter. The sound classifier front-end can be used in a wide range of applications because programmable floating-gate transistors are employed to store the classifier weights. Moreover, we incorporated a feature selection algorithm to simplify the analog front-end. The machine learning algorithm AdaBoost is used to select the most relevant features for a particular sound detection application. In this classifier architecture, we combine simple "base" analog classifiers to form a strong one. We also designed the circuits to implement the AdaBoost-based analog classifier.
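
    As one concrete piece of the first strategy, a static-curve dynamic range compressor can be sketched as below. This is a simplification (no attack/release smoothing and no psychoacoustic model) with an assumed threshold and ratio, not the dissertation's actual algorithm:

      import numpy as np

      def compress(x, threshold_db=-20.0, ratio=4.0):
          """Instantaneous compressor: level above the threshold is scaled
          down by `ratio`, reducing dynamic range so bass can be boosted
          without clipping."""
          level_db = 20 * np.log10(np.abs(x) + 1e-12)
          over = np.maximum(level_db - threshold_db, 0.0)
          gain_db = -over * (1.0 - 1.0 / ratio)   # attenuate only the overshoot
          return x * 10.0 ** (gain_db / 20.0)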

  6. Modeling of the ground-to-SSFMB link networking features using SPW

    NASA Technical Reports Server (NTRS)

    Watson, John C.

    1993-01-01

    This report describes the modeling and simulation of the networking features of the ground-to-Space Station Freedom manned base (SSFMB) link using the COMDISCO Signal Processing WorkSystem (SPW). The networking features modeled include the implementation of Consultative Committee for Space Data Systems (CCSDS) protocols in the multiplexing of digitized audio and core data into virtual channel data units (VCDUs) in the control center complex and the demultiplexing of VCDUs in the onboard baseband signal processor. The emphasis of this work has been placed on techniques for modeling the CCSDS networking features using SPW. The objectives for developing the SPW models are to test the suitability of SPW for modeling networking features and to develop SPW simulation models of the control center complex and the space station baseband signal processor for use in end-to-end testing of the ground-to-SSFMB S-band single access forward (SSAF) link.

  7. Audio Motor Training at the Foot Level Improves Space Representation.

    PubMed

    Aggius-Vella, Elena; Campus, Claudio; Finocchietti, Sara; Gori, Monica

    2017-01-01

    Spatial representation develops thanks to the integration of visual signals with the other senses, and it has been shown that the lack of vision compromises the development of some spatial representations. In this study we tested the effect of a new rehabilitation device called ABBI (Audio Bracelet for Blind Interaction) on space representation. ABBI produces audio feedback linked to body movement. Previous studies from our group showed that this device improves the spatial representation of the space around the upper part of the body in early blind adults. Here we evaluate whether the audio-motor feedback produced by ABBI can also improve the audio spatial representation of sighted individuals in the space around the legs. Forty-five blindfolded sighted subjects participated in the study, subdivided into three groups. An audio space localization (front-back discrimination) task was performed twice by all groups, before and after different training conditions. One group (experimental) performed audio-motor training with the ABBI device placed on the foot. Another group (control) performed free motor activity without audio feedback associated with body movement. The third group (control) passively listened to the ABBI sound moved at foot level by the experimenter, without producing any body movement. Results showed that only the experimental group, which trained with the audio-motor feedback, improved in sound discrimination accuracy; no improvement was observed for the two control groups. These findings suggest that audio-motor training with ABBI improves audio space perception in the space around the legs in sighted individuals as well. This result provides important input for the rehabilitation of space representations in the lower part of the body.

  8. Audio Motor Training at the Foot Level Improves Space Representation

    PubMed Central

    Aggius-Vella, Elena; Campus, Claudio; Finocchietti, Sara; Gori, Monica

    2017-01-01

    Spatial representation develops thanks to the integration of visual signals with the other senses, and it has been shown that the lack of vision compromises the development of some spatial representations. In this study we tested the effect of a new rehabilitation device called ABBI (Audio Bracelet for Blind Interaction) on space representation. ABBI produces audio feedback linked to body movement. Previous studies from our group showed that this device improves the spatial representation of the space around the upper part of the body in early blind adults. Here we evaluate whether the audio-motor feedback produced by ABBI can also improve the audio spatial representation of sighted individuals in the space around the legs. Forty-five blindfolded sighted subjects participated in the study, subdivided into three groups. An audio space localization (front-back discrimination) task was performed twice by all groups, before and after different training conditions. One group (experimental) performed audio-motor training with the ABBI device placed on the foot. Another group (control) performed free motor activity without audio feedback associated with body movement. The third group (control) passively listened to the ABBI sound moved at foot level by the experimenter, without producing any body movement. Results showed that only the experimental group, which trained with the audio-motor feedback, improved in sound discrimination accuracy; no improvement was observed for the two control groups. These findings suggest that audio-motor training with ABBI improves audio space perception in the space around the legs in sighted individuals as well. This result provides important input for the rehabilitation of space representations in the lower part of the body. PMID:29326564

  9. Tonal Interface to MacroMolecules (TIMMol): A Textual and Tonal Tool for Molecular Visualization

    ERIC Educational Resources Information Center

    Cordes, Timothy J.; Carlson, C. Britt; Forest, Katrina T.

    2008-01-01

    We developed the three-dimensional visualization software, Tonal Interface to MacroMolecules or TIMMol, for studying atomic coordinates of protein structures. Key features include audio tones indicating x, y, z location, identification of the cursor location in one-dimensional and three-dimensional space, textual output that can be easily linked…

  10. Content-based audio authentication using a hierarchical patchwork watermark embedding

    NASA Astrophysics Data System (ADS)

    Gulbis, Michael; Müller, Erika

    2010-05-01

    Content-based audio authentication watermarking techniques extract perceptually relevant audio features, which are robustly embedded into the audio file to be protected. Manipulations of the audio file are detected on the basis of changes between the originally embedded feature information and the features extracted anew during verification. The main challenges of content-based watermarking are, on the one hand, the identification of a suitable audio feature to distinguish between content-preserving and malicious manipulations and, on the other hand, the development of a watermark that is robust against content-preserving modifications and able to carry the whole authentication information. The payload requirements are significantly higher than for transaction watermarking or copyright protection. Finally, the watermark embedding should not influence the feature extraction, to avoid false alarms. Current systems still lack sufficient alignment of the watermarking algorithm and the feature extraction. In previous work we developed a content-based audio authentication watermarking approach. The feature is based on changes in the DCT domain over time. A patchwork-based watermark was used to embed multiple one-bit watermarks. The embedding process uses the feature domain without inflicting distortions on the feature. The watermark payload is limited by the feature extraction, more precisely by the critical bands; the payload is inversely proportional to the segment duration of the audio file segmentation. Transparency behavior was analyzed as a function of segment size and thus of watermark payload. At a segment duration of about 20 ms the transparency shows an optimum (measured in units of Objective Difference Grade); transparency and/or robustness decrease rapidly for working points beyond this area, making them unsuitable for gaining the further payload needed to embed the whole authentication information. In this paper we present a hierarchical extension of the watermark method to overcome the limitations imposed by the feature extraction. The approach applies the patchwork algorithm recursively onto its own patches, with a modified patch selection to ensure a better signal-to-noise ratio for the watermark embedding. The robustness evaluation was done by compression (MP3, Ogg, AAC), normalization, and several attacks from the Stirmark Benchmark for Audio suite. Compared at the same payload and transparency, the hierarchical approach shows improved robustness.

  11. pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis.

    PubMed

    Giannakopoulos, Theodoros

    2015-01-01

    Audio information plays a rather important role in the increasing digital content that is available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automation and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g., audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures, including feature extraction, classification of audio signals, supervised and unsupervised segmentation, and content visualization. pyAudioAnalysis is licensed under the Apache License and is available at GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the wide range of implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has already been used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation, and health applications (e.g., monitoring eating habits). The feedback provided by all these audio applications has led to practical enhancement of the library.
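
    Basic feature extraction with the library looks like the following. The module and function names are those of recent pyAudioAnalysis releases (the API at the time of the paper used different module names), and sample.wav is a placeholder:

      from pyAudioAnalysis import audioBasicIO, ShortTermFeatures

      # Read a WAV file and extract short-term features on 50 ms windows with
      # a 25 ms step; F is (n_features x n_frames) and f_names labels its rows.
      Fs, x = audioBasicIO.read_audio_file("sample.wav")
      F, f_names = ShortTermFeatures.feature_extraction(
          x, Fs, int(0.050 * Fs), int(0.025 * Fs))
      print(F.shape, f_names[:3])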

  12. pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis

    PubMed Central

    Giannakopoulos, Theodoros

    2015-01-01

    Audio information plays a rather important role in the increasing digital content that is available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automation and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g., audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures, including feature extraction, classification of audio signals, supervised and unsupervised segmentation, and content visualization. pyAudioAnalysis is licensed under the Apache License and is available at GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the wide range of implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has already been used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation, and health applications (e.g., monitoring eating habits). The feedback provided by all these audio applications has led to practical enhancement of the library. PMID:26656189

  13. 47 CFR 25.214 - Technical requirements for space stations in the Satellite Digital Audio Radio Service and...

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 47 Telecommunication 2 2014-10-01 2014-10-01 false Technical requirements for space stations in the Satellite Digital Audio Radio Service and associated terrestrial repeaters. 25.214 Section 25.214... Technical Standards § 25.214 Technical requirements for space stations in the Satellite Digital Audio Radio...

  14. 47 CFR 25.214 - Technical requirements for space stations in the satellite digital audio radio service and...

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 47 Telecommunication 2 2012-10-01 2012-10-01 false Technical requirements for space stations in the satellite digital audio radio service and associated terrestrial repeaters. 25.214 Section 25.214... Technical Standards § 25.214 Technical requirements for space stations in the satellite digital audio radio...

  15. 47 CFR 25.214 - Technical requirements for space stations in the satellite digital audio radio service and...

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 47 Telecommunication 2 2011-10-01 2011-10-01 false Technical requirements for space stations in the satellite digital audio radio service and associated terrestrial repeaters. 25.214 Section 25.214... Technical Standards § 25.214 Technical requirements for space stations in the satellite digital audio radio...

  16. 47 CFR 25.214 - Technical requirements for space stations in the Satellite Digital Audio Radio Service and...

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 47 Telecommunication 2 2013-10-01 2013-10-01 false Technical requirements for space stations in the Satellite Digital Audio Radio Service and associated terrestrial repeaters. 25.214 Section 25.214... Technical Standards § 25.214 Technical requirements for space stations in the Satellite Digital Audio Radio...

  17. 47 CFR 25.214 - Technical requirements for space stations in the satellite digital audio radio service and...

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 47 Telecommunication 2 2010-10-01 2010-10-01 false Technical requirements for space stations in the satellite digital audio radio service and associated terrestrial repeaters. 25.214 Section 25.214... Technical Standards § 25.214 Technical requirements for space stations in the satellite digital audio radio...

  18. Fall Detection Using Smartphone Audio Features.

    PubMed

    Cheffena, Michael

    2016-07-01

    An automated fall detection system based on smartphone audio features is developed. The spectrogram, mel frequency cepstral coefficient (MFCC), linear predictive coding (LPC), and matching pursuit (MP) features of different fall and no-fall sound events are extracted from experimental data. Based on the extracted audio features, four different machine learning classifiers, the k-nearest neighbor classifier (k-NN), support vector machine (SVM), least squares method (LSM), and artificial neural network (ANN), are investigated for distinguishing between fall and no-fall events. For each audio feature, the performance of each classifier in terms of sensitivity, specificity, accuracy, and computational complexity is evaluated. The best performance is achieved using spectrogram features with the ANN classifier, with sensitivity, specificity, and accuracy all above 98%. The classifier also has acceptable computational requirements for training and testing. The system is applicable in home environments where the phone is placed in the vicinity of the user.
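
    A schematic of the classifier comparison using scikit-learn; the features are random stand-in data, and the LSM classifier is omitted because it has no direct scikit-learn counterpart:

      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.svm import SVC
      from sklearn.neural_network import MLPClassifier

      X = np.random.randn(400, 60)       # stand-in: one row of audio features per event
      y = np.random.randint(0, 2, 400)   # 1 = fall, 0 = no-fall

      for name, clf in [("k-NN", KNeighborsClassifier(5)),
                        ("SVM", SVC()),
                        ("ANN", MLPClassifier(max_iter=500))]:
          acc = cross_val_score(clf, X, y, cv=5).mean()   # 5-fold accuracy
          print(f"{name}: {acc:.3f}")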

  19. Video content parsing based on combined audio and visual information

    NASA Astrophysics Data System (ADS)

    Zhang, Tong; Kuo, C.-C. Jay

    1999-08-01

    While previous research on audiovisual data segmentation and indexing has focused primarily on the pictorial part, significant clues contained in the accompanying audio stream are often ignored. A fully functional system for video content parsing can be achieved more successfully through a proper combination of audio and visual information. By investigating the data structure of different video types, we present tools for both audio and visual content analysis and a scheme for video segmentation and annotation in this research. In the proposed system, video data are segmented into audio scenes and visual shots by detecting abrupt changes in audio and visual features, respectively. Then, each audio scene is categorized and indexed as one of the basic audio types, while a visual shot is represented by keyframes and associated image features. An index table is then generated automatically for each video clip based on the integration of the outputs of the audio and visual analysis. It is shown that the proposed system provides satisfactory video indexing results.

  20. Space Shuttle Orbiter audio subsystem. [to communication and tracking system

    NASA Technical Reports Server (NTRS)

    Stewart, C. H.

    1978-01-01

    The selection of the audio multiplex control configuration for the Space Shuttle Orbiter audio subsystem is discussed, with special attention given to the evaluation criteria of cost, weight, and complexity. The specifications and design of the subsystem are described, with detail given to the configurations of the audio terminal unit (ATU) and audio central control unit (ACCU). The audio input from the ACCU, at a signal level of -12.2 to 14.8 dBV nominal range at 1 kHz, was found to have a balanced source impedance and a balanced load impedance of 6000 ± 600 ohms at 1 kHz, dc isolated. The Lyndon B. Johnson Space Center (JSC) electroacoustic test laboratory, an audio engineering facility consisting of a collection of acoustic test chambers, analyzed problems of speaker and headset performance, multiplexed control data coupled with audio channels, and the effects of Orbiter cabin acoustics on the operational performance of voice communications. This system allows technical management and project engineering to address key constraining issues, such as identifying design deficiencies of the headset interface unit and assessing the Orbiter cabin's voice communication performance, which affect the subsystem development.

  21. Audio-video feature correlation: faces and speech

    NASA Astrophysics Data System (ADS)

    Durand, Gwenael; Montacie, Claude; Caraty, Marie-Jose; Faudemay, Pascal

    1999-08-01

    This paper presents a study of the correlation of features automatically extracted from the audio stream and the video stream of audiovisual documents. In particular, we were interested in finding out whether speech analysis tools could be combined with face detection methods, and to what extent they should be combined. A generic audio signal partitioning algorithm was first used to detect Silence/Noise/Music/Speech segments in a full-length movie. A generic object detection method was applied to the keyframes extracted from the movie in order to detect the presence or absence of faces. The correlation between the presence of a face in the keyframes and of the corresponding voice in the audio stream was studied. A third stream, the script of the movie, is warped onto the speech channel in order to automatically label faces appearing in the keyframes with the name of the corresponding character. We naturally found that extracted audio and video features were related in many cases, and that significant benefits can be obtained from the joint use of audio and video analysis methods.

  22. Audio-visual imposture

    NASA Astrophysics Data System (ADS)

    Karam, Walid; Mokbel, Chafic; Greige, Hanna; Chollet, Gerard

    2006-05-01

    A GMM-based audio-visual speaker verification system is described, and an Active Appearance Model with a linear speaker transformation system is used to evaluate the robustness of the verification. An Active Appearance Model (AAM) is used to automatically locate and track a speaker's face in a video recording. A Gaussian Mixture Model (GMM) based classifier (BECARS) is used for face verification. GMM training and testing are accomplished on DCT-based features extracted from the detected faces. On the audio side, speech features are extracted and used for speaker verification with the GMM-based classifier. Fusion of the audio and video modalities for audio-visual speaker verification is compared with face verification and speaker verification systems. To improve the robustness of the multimodal biometric identity verification system, an audio-visual imposture system is envisioned. It consists of an automatic voice transformation technique that an impostor may use to assume the identity of an authorized client. Features of the transformed voice are then combined with the corresponding appearance features and fed into the GMM-based system BECARS for training. An attempt is made to increase the acceptance rate of the impostor and to analyze the robustness of the verification system. Experiments are being conducted on the BANCA database, with the prospect of experimenting on the PDAtabase newly developed within the scope of the SecurePhone project.

  23. Performance enhancement for audio-visual speaker identification using dynamic facial muscle model.

    PubMed

    Asadpour, Vahid; Towhidkhah, Farzad; Homayounpour, Mohammad Mehdi

    2006-10-01

    The science of human identification using physiological characteristics, or biometry, has been of great concern in security systems. However, robust multimodal identification systems based on audio-visual information have not been thoroughly investigated yet. The aim of this work is therefore to propose a model-based feature extraction method that employs the physiological characteristics of the facial muscles producing lip movements. This approach adopts intrinsic muscle properties such as viscosity, elasticity, and mass, which are extracted from the dynamic lip model. These parameters are exclusively dependent on the neuro-muscular properties of the speaker; consequently, imitation of valid speakers could be reduced to a large extent. The parameters are applied to a hidden Markov model (HMM) audio-visual identification system. In this work, a combination of audio and video features has been employed by adopting a multistream pseudo-synchronized HMM training method. Noise-robust audio features such as Mel-frequency cepstral coefficients (MFCC), spectral subtraction (SS), and relative spectra perceptual linear prediction (J-RASTA-PLP) were used to evaluate the performance of the multimodal system once efficient audio feature extraction methods had been utilized. The superior performance of the proposed system is demonstrated on a large multispeaker database of continuously spoken digits, along with a sentence that is phonetically rich. To evaluate the robustness of the algorithms, some experiments were performed on genetically identical twins. Furthermore, changes in speaker voice were simulated with drug inhalation tests. At a 3 dB signal-to-noise ratio (SNR), the dynamic muscle model improved the identification rate of the audio-visual system from 91 to 98%. Results on identical twins revealed an apparent improvement in performance for the dynamic muscle model-based system, whose audio-visual identification rate was enhanced from 87 to 96%.

  24. MedlinePlus FAQ: Is audio description available for videos on MedlinePlus?

    MedlinePlus

    ... audiodescription.html Question: Is audio description available for videos on MedlinePlus? Answer: Audio description of videos helps make the content of videos accessible to ...

  25. Recognition and characterization of unstructured environmental sounds

    NASA Astrophysics Data System (ADS)

    Chu, Selina

    2011-12-01

    Environmental sounds are what we hear every day or, more generally, the ambient or background audio that surrounds us. Humans use both vision and hearing to respond to their surroundings, a capability still quite limited in machine processing. The first step toward achieving multimodal input applications is the ability to process unstructured audio and recognize audio scenes (or environments). Such an ability would have applications in content analysis and mining of multimedia data and in improving robustness in context-aware applications through multi-modality, such as assistive robotics, surveillance, or mobile device-based services. The goal of this thesis is the characterization of unstructured environmental sounds for understanding and predicting the context surrounding an agent or device. Most research on audio recognition has focused primarily on speech and music; less attention has been paid to the challenges and opportunities of characterizing unstructured environmental audio, so my research investigates these issues and develops novel algorithms for modeling the variations of the environment. The first step in building a recognition system for unstructured auditory environments was to investigate techniques and audio features suited to such audio data. We began with a study exploring suitable features and the feasibility of designing an automatic environment recognition system using audio information. This initial investigation showed that traditional recognition and feature extraction techniques are not suitable for environmental sound, which lacks the formantic and harmonic structures of speech and music, dispelling the notion that traditional speech and music recognition techniques can simply be applied to realistic environmental sound. Natural unstructured environment sounds comprise a large variety of sounds that are in fact noise-like and are not effectively modeled by Mel-frequency cepstral coefficients (MFCCs) or other commonly used audio features, e.g., energy or zero-crossing rate. Given this lack of appropriate features, and to achieve a more effective representation, I proposed a specialized feature extraction algorithm for environmental sounds that uses the matching pursuit (MP) algorithm to learn the inherent structure of each type of sound; we call these MP-features. MP-features capture and represent sounds from different sources and ranges where frequency-domain features (e.g., MFCCs) fail, and they are advantageous when combined with MFCCs to improve overall performance. The third component is modeling and detecting the background audio. One goal of this research is to characterize an environment; since many events blend into the background, I sought a general model for any particular environment, because once the background is known, foreground events can be identified even if they have never been seen before. The next step was therefore to learn an audio background model for each environment type, despite the occurrence of different foreground events. In this work, I presented a framework for robust audio background modeling that includes learning models for prediction, data knowledge, and the persistent characteristics of the environment. This approach can model the background and detect foreground events, and it can verify whether the predicted background is indeed the background or a foreground event that persists for a longer period of time. I also investigated the use of a semi-supervised learning technique to exploit and label new unlabeled audio data. The final components of the thesis involve learning sound structures for generalization and applying the proposed ideas to context-aware applications. Environmental sound is inherently noisy and contains relatively large overlaps of events between different environments; sounds vary widely even within a single environment type, and frequently there are no clear boundaries between some types. Traditional classification methods are generally not robust enough to handle classes with such overlaps, so this audio requires representation by complex models. Deep learning architectures provide a generative, model-based method for classification. Specifically, I considered Deep Belief Networks (DBNs) to model environmental audio and investigated their applicability to noisy data to improve robustness and generalization. A framework was proposed using composite DBNs to discover high-level representations and to learn a hierarchical structure for different acoustic environments in a data-driven fashion. Experimental results on real data sets demonstrate its effectiveness over traditional methods, with over 90% recognition accuracy for a high number of environmental sound types.
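
    The MP-features above rest on the matching pursuit decomposition. A generic sketch over an arbitrary dictionary of unit-norm rows follows (the thesis uses time-frequency atoms; the dictionary and atom count here are illustrative assumptions):

      import numpy as np

      def matching_pursuit(x, dictionary, n_atoms=10):
          """Greedy matching pursuit: repeatedly pick the atom (unit-norm
          dictionary row) most correlated with the residual and subtract
          its projection. The picked indices/weights act as MP-features."""
          residual, picks = x.astype(float).copy(), []
          for _ in range(n_atoms):
              corr = dictionary @ residual          # correlation with every atom
              k = int(np.argmax(np.abs(corr)))
              picks.append((k, corr[k]))
              residual = residual - corr[k] * dictionary[k]
          return picks, residual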

  26. Multi-Level and Multi-Scale Feature Aggregation Using Pretrained Convolutional Neural Networks for Music Auto-Tagging

    NASA Astrophysics Data System (ADS)

    Lee, Jongpil; Nam, Juhan

    2017-08-01

    Music auto-tagging is often handled in a similar manner to image classification by regarding the 2D audio spectrogram as image data. However, music auto-tagging is distinguished from image classification in that the tags are highly diverse and have different levels of abstraction. Considering this issue, we propose a convolutional neural network (CNN)-based architecture that embraces multi-level and multi-scale features. The architecture is trained in three steps. First, we conduct supervised feature learning to capture local audio features using a set of CNNs with different input sizes. Second, we extract audio features from each layer of the pre-trained convolutional networks separately and aggregate them over a long audio clip. Finally, we feed them into fully connected networks and make final predictions of the tags. Our experiments show that using the combination of multi-level and multi-scale features is highly effective in music auto-tagging, and the proposed method outperforms the previous state of the art on the MagnaTagATune dataset and the Million Song Dataset. We further show that the proposed architecture is useful in transfer learning.
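
    The aggregation step can be sketched as pooling each pretrained layer's activations over time and concatenating across layers; the layer shapes below are assumptions, and the paper's exact pooling scheme may differ:

      import numpy as np

      def aggregate(layer_maps):
          """Mean- and max-pool each layer's features over time, then
          concatenate across layers into one clip-level vector."""
          pooled = []
          for fmap in layer_maps:               # fmap: (time_steps, channels)
              pooled += [fmap.mean(axis=0), fmap.max(axis=0)]
          return np.concatenate(pooled)

      # Hypothetical activations from three CNN layers over one long clip:
      clip_vec = aggregate([np.random.randn(120, 64),
                            np.random.randn(60, 128),
                            np.random.randn(30, 256)])
      print(clip_vec.shape)                     # (896,) = 2 * (64 + 128 + 256)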

  27. Characteristics of Abductive Inquiry in Earth and Space Science: An Undergraduate Teacher Prospective Case Study

    NASA Astrophysics Data System (ADS)

    Ramalis, T. R.; Liliasari; Herdiwidjaya, D.

    2016-08-01

    The purpose of this case study was to describe the characteristic features of learning activities in the domain of earth and space science. The context of this study is earth and space learning activities in three groups of prospective student teachers, on the subjects of the shape and size of the Earth, land and sea breezes, and the Moon's orbit, respectively. The analysis was conducted qualitatively on activity data, analyzing students doing project work, student worksheets, group project report documents, notes, and audio recordings of discussions. The research findings identified three types of abduction during the learning process: theoretical-model abduction, factual abduction, and law abduction. Implications for science inquiry learning as well as relevant research are suggested.

  28. Dynamic and scalable audio classification by collective network of binary classifiers framework: an evolutionary approach.

    PubMed

    Kiranyaz, Serkan; Mäkinen, Toni; Gabbouj, Moncef

    2012-10-01

    In this paper, we propose a novel framework based on a collective network of evolutionary binary classifiers (CNBC) to address the problems of feature and class scalability. The main goal of the proposed framework is to achieve high classification performance over dynamic audio and video repositories. The proposed framework adopts a "Divide and Conquer" approach in which an individual network of binary classifiers (NBC) is allocated to discriminate each audio class. An evolutionary search is applied to find the best binary classifier in each NBC with respect to a given criterion. Through incremental evolution sessions, the CNBC framework can dynamically adapt to each new incoming class or feature set without resorting to full-scale re-training or re-configuration. The CNBC framework is therefore particularly suited to dynamically varying databases, where no conventional static classifier can adapt to such changes. In short, it is an entirely novel topology and an unprecedented approach for dynamic, content/data-adaptive, and scalable audio classification. A large set of audio features can be used effectively in the framework, where the CNBCs make appropriate selections and combinations so as to achieve the highest discrimination among individual audio classes. Experiments demonstrate a high classification accuracy (above 90%) and the efficiency of the proposed framework over large and dynamic audio databases.

  29. Audio feature extraction using probability distribution function

    NASA Astrophysics Data System (ADS)

    Suhaib, A.; Wan, Khairunizam; Aziz, Azri A.; Hazry, D.; Razlan, Zuradzman M.; Shahriman A., B.

    2015-05-01

    Voice recognition has been one of the popular applications in the robotics field, and it has also recently been used in biometric and multimedia information retrieval systems. This technology stems from successive research on audio feature extraction and analysis. The probability distribution function (PDF) is a statistical tool usually used as one step within complex feature extraction methods such as GMM and PCA. In this paper, a new method for audio feature extraction is proposed that uses the PDF alone as the feature extraction method for speech analysis. Certain pre-processing techniques are performed prior to the proposed feature extraction. Subsequently, the PDF values for each frame of the sampled voice signals, obtained from a number of individuals, are plotted. The experimental results show visually from the plotted data that each individual's voice has comparable PDF values and shapes.
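
    A minimal sketch of the idea, treating each frame's normalized amplitude histogram as its empirical PDF; the frame length, bin count, and amplitude range are assumptions, not the paper's settings:

      import numpy as np

      def pdf_features(x, fs, frame_s=0.025, n_bins=32):
          """Per-frame amplitude histograms, normalized to integrate to 1,
          returned as an (n_frames, n_bins) feature matrix."""
          n = int(frame_s * fs)
          frames = x[: len(x) // n * n].reshape(-1, n)
          return np.array([np.histogram(f, bins=n_bins, range=(-1, 1),
                                        density=True)[0]
                           for f in frames])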

  30. Audio-video decision support for patients: the documentary genre as a basis for decision aids.

    PubMed

    Volandes, Angelo E; Barry, Michael J; Wood, Fiona; Elwyn, Glyn

    2013-09-01

    Decision support tools increasingly use audio-visual materials. However, disagreement exists about their use, as such materials may be subjective and biased. This is a literature review of the major texts of documentary film studies that extrapolates issues of objectivity and bias from film to decision support tools. The key features of documentary films are that they attempt to portray real events and that the attempted reality is always filtered through the lens of the filmmaker. The same key features can be attributed to decision support tools that use audio-visual materials. Three concerns arising from documentary film studies, as they apply to the use of audio-visual materials in decision support tools, are whose perspective matters (stakeholder bias), how to choose among audio-visual materials (selection bias), and how to ensure objectivity (editorial bias). Decision science needs to start a debate about how audio-visual materials are to be used in decision support tools. Simply because audio-visual materials may be subjective and open to bias does not mean that we should not use them; methods need to be found to ensure consensus around balance and editorial control so that audio-visual materials can be used.

  31. Using listener-based perceptual features as intermediate representations in music information retrieval.

    PubMed

    Friberg, Anders; Schoonderwaldt, Erwin; Hedblad, Anton; Fabiani, Marco; Elowsson, Anders

    2014-10-01

    The notion of perceptual features is introduced for describing general music properties based on human perception. This is an attempt at rethinking the concept of features, aiming to approach the underlying human perception mechanisms. Instead of using concepts from music theory such as tones, pitches, and chords, a set of nine features describing overall properties of the music was selected. They were chosen from qualitative measures used in psychology studies and motivated from an ecological approach. The perceptual features were rated in two listening experiments using two different data sets. They were modeled both from symbolic and audio data using different sets of computational features. Ratings of emotional expression were predicted using the perceptual features. The results indicate that (1) at least some of the perceptual features are reliable estimates; (2) emotion ratings could be predicted by a small combination of perceptual features with an explained variance from 75% to 93% for the emotional dimensions activity and valence; (3) the perceptual features could only to a limited extent be modeled using existing audio features. Results clearly indicated that a small number of dedicated features were superior to a "brute force" model using a large number of general audio features.

  32. Detection of goal events in soccer videos

    NASA Astrophysics Data System (ADS)

    Kim, Hyoung-Gook; Roeber, Steffen; Samour, Amjad; Sikora, Thomas

    2005-01-01

    In this paper, we present automatic extraction of goal events in soccer videos using audio track features alone, without relying on expensive-to-compute video track features. The extracted goal events can be used for high-level indexing and selective browsing of soccer videos. The detection of soccer video highlights using audio content comprises three steps: 1) extraction of audio features from a video sequence, 2) detection of highlight event candidates based on the information provided by the feature extraction methods and a Hidden Markov Model (HMM), and 3) goal event selection to finally determine the video intervals to be included in the summary. For this purpose we compared the performance of the well-known Mel-scale Frequency Cepstral Coefficients (MFCC) feature extraction method against the MPEG-7 Audio Spectrum Projection (ASP) feature extraction method based on three different decomposition methods, namely Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Non-Negative Matrix Factorization (NMF). To evaluate our system we collected five soccer game videos from various sources, in total seven hours of soccer games comprising eight gigabytes of data. One of the five soccer games is used as the training data (e.g., announcers' excited speech, ambient audience speech noise, audience clapping, environmental sounds). Our goal event detection results are encouraging.
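
    The HMM stage can be outlined with the hmmlearn package (assumed available; the MFCC matrices below are random placeholders). A goal model and a background model would each be fit on their own training audio, and a segment becomes a candidate when the goal model scores higher:

      import numpy as np
      from hmmlearn import hmm     # assumed dependency

      train_mfcc = np.random.randn(5000, 13)    # placeholder goal-event MFCCs
      segment = np.random.randn(200, 13)        # placeholder test segment

      goal_model = hmm.GaussianHMM(n_components=4, covariance_type="diag")
      goal_model.fit(train_mfcc)
      print(goal_model.score(segment))          # segment log-likelihood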

  33. Multi-modal highlight generation for sports videos using an information-theoretic excitability measure

    NASA Astrophysics Data System (ADS)

    Hasan, Taufiq; Bořil, Hynek; Sangwan, Abhijeet; Hansen, John H. L.

    2013-12-01

    The ability to detect and organize "hot spots" representing areas of excitement within video streams is a challenging research problem when techniques rely exclusively on video content. A generic method for sports video highlight selection is presented in this study which leverages both video/image structure and audio/speech properties. Processing begins by partitioning the video into small segments and extracting several multi-modal features from each segment. Excitability is computed based on the likelihood of the segmental features residing in certain regions of their joint probability density function space which are considered both exciting and rare. The proposed measure is used to rank-order the partitioned segments to compress the overall video sequence and produce a contiguous set of highlights. Experiments are performed on baseball videos based on signal processing advancements for excitement assessment in the commentators' speech, audio energy, slow-motion replay, scene cut density, and motion activity as features. A detailed analysis of the correlation between user excitability and various speech production parameters is conducted, and an effective scheme is designed to estimate the excitement level of the commentators' speech from the sports videos. Subjective evaluation of excitability and ranking of video segments demonstrate a higher correlation with the proposed measure compared to well-established techniques, indicating the effectiveness of the overall approach.
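
    The excitability measure scores a segment by how rarely its features fall in the joint density. A rough sketch with a kernel density estimate follows; the feature choice and dimensionality are placeholders, and this captures only the rarity half of the paper's rare-and-exciting criterion:

      import numpy as np
      from scipy.stats import gaussian_kde

      feats = np.random.randn(300, 3)    # placeholder per-segment features
      kde = gaussian_kde(feats.T)        # joint pdf over all segments
      rarity = -np.log(kde(feats.T) + 1e-12)      # low density -> high score
      highlights = np.argsort(rarity)[::-1][:10]  # top-10 candidate segments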

  34. SPACE FOR AUDIO-VISUAL LARGE GROUP INSTRUCTION.

    ERIC Educational Resources Information Center

    GAUSEWITZ, CARL H.

    WITH AN INCREASING INTEREST IN AND UTILIZATION OF AUDIO-VISUAL MEDIA IN EDUCATION FACILITIES, IT IS IMPORTANT THAT STANDARDS ARE ESTABLISHED FOR ESTIMATING THE SPACE REQUIRED FOR VIEWING THESE VARIOUS MEDIA. THIS MONOGRAPH SUGGESTS SUCH STANDARDS FOR VIEWING AREAS, VIEWING ANGLES, SEATING PATTERNS, SCREEN CHARACTERISTICS AND EQUIPMENT PERFORMANCES…

  35. Self-synchronization for spread spectrum audio watermarks after time scale modification

    NASA Astrophysics Data System (ADS)

    Nadeau, Andrew; Sharma, Gaurav

    2014-02-01

    De-synchronizing operations such as insertion, deletion, and warping pose significant challenges for watermarking. Because these operations are not typical of classical communications, watermarking techniques such as spread spectrum can perform poorly; conversely, specialized synchronization solutions can be challenging to analyze and optimize. This paper addresses desynchronization for blind spread spectrum watermarks, detected without reference to any unmodified signal, using the robustness properties of short blocks. Synchronization relies on dynamic time warping to search over block alignments and find the sequence with maximum correlation to the watermark. This differs from synchronization schemes that must first locate invariant features of the original signal, or estimate and reverse the desynchronization before detection. Without these extra synchronization steps, analysis of the proposed scheme builds on classical spread spectrum concepts and allows characterizing the relationship between the size of the search space (the number of detection alignment tests) and intrinsic robustness (the continuous search-space region covered by each individual detection test). The critical metrics that determine the search space, robustness, and performance are the time-frequency resolution of the watermarking transform and the block-length resolution of the alignment. Simultaneous robustness to (a) MP3 compression, (b) insertion/deletion, and (c) time-scale modification is also demonstrated for a practical audio watermarking scheme developed in the proposed framework.
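
    The core detection step is a normalized correlation between a candidate block and the spreading sequence. A toy sketch of the blind alignment search follows; the offsets, block length, and embedding strength are invented for illustration, and the paper's DTW block search is far more flexible than this fixed-step scan:

      import numpy as np

      def detect(block, chips):
          """Normalized correlation of one candidate block with the chips."""
          b = block - block.mean()
          return (b @ chips) / (np.linalg.norm(b) * np.linalg.norm(chips) + 1e-12)

      rng = np.random.default_rng(0)
      chips = rng.choice([-1.0, 1.0], size=1024)   # pseudo-random spreading sequence
      host = rng.standard_normal(8192)
      host[3008:4032] += 0.3 * chips               # embed at an "unknown" offset

      scores = [detect(host[i:i + 1024], chips) for i in range(0, 7168, 64)]
      print(64 * int(np.argmax(scores)))           # recovers the offset, 3008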

  36. Audio-visual sensory deprivation degrades visuo-tactile peri-personal space.

    PubMed

    Noel, Jean-Paul; Park, Hyeong-Dong; Pasqualini, Isabella; Lissek, Herve; Wallace, Mark; Blanke, Olaf; Serino, Andrea

    2018-05-01

    Self-perception is scaffolded upon the integration of multisensory cues on the body, the space surrounding the body (i.e., the peri-personal space; PPS), and from within the body. We asked whether reducing the information available from external space would change PPS, interoceptive accuracy, and self-experience. Twenty participants were exposed to 15 min of audio-visual deprivation and performed: (i) a visuo-tactile interaction task measuring their PPS; (ii) a heartbeat perception task measuring interoceptive accuracy; and (iii) a series of questionnaires related to self-perception and mental illness. These tasks were carried out in two conditions: while exposed to a standard sensory environment and under a condition of audio-visual deprivation. Results suggest that while PPS becomes ill-defined after audio-visual deprivation, interoceptive accuracy is unaltered at the group level, with some participants improving and some worsening. Interestingly, correlational individual-differences analyses revealed that changes in PPS after audio-visual deprivation were related to interoceptive accuracy and to self-reports of "unusual experiences" on an individual-subject basis. Taken together, the findings argue for a relationship between the malleability of PPS, interoceptive accuracy, and an inclination toward the aberrant ideation often associated with mental illness.

  37. Web Audio/Video Streaming Tool

    NASA Technical Reports Server (NTRS)

    Guruvadoo, Eranna K.

    2003-01-01

    In order to promote the NASA-wide educational outreach program to educate and inform the public about space exploration, NASA, at Kennedy Space Center, is seeking efficient ways to add more content to the web by streaming audio/video files. This project proposes a high-level overview of a framework for the creation, management, and scheduling of audio/video assets over the web. To support short-term goals, the prototype of a web-based tool is designed and demonstrated to automate the process of streaming audio/video files. The tool provides web-enabled user interfaces to manage video assets, create publishable schedules of video assets for streaming, and schedule the streaming events. These operations are performed on user-defined and system-derived metadata of audio/video assets stored in a relational database, while the assets reside in a separate repository. The prototype tool is designed using ColdFusion 5.0.

  38. Audio Design: Creating Multi-sensory Images for the Mind.

    ERIC Educational Resources Information Center

    Ferrington, Gary

    1994-01-01

    Explores the concept of "theater of the mind" and discusses design factors in creating audio works that effectively stimulate mental pictures, including: narrative format in audio scripting; qualities of voice; use of concrete language; music; noise versus silence; and the creation of the illusion of space using monaural, stereophonic,…

  39. The SWRL Audio Laboratory System (ALS): An Integrated Configuration for Psychomusicology Research. Technical Report 51.

    ERIC Educational Resources Information Center

    Williams, David Brian; Hoskin, Richard K.

    This report describes features of the Audio Laboratory System (ALS), a device which supports research activities of the Southwest Regional Laboratory's Music Program. The ALS is used primarily to generate recorded audio tapes for psychomusicology research related to children's perception and learning of music concepts such as pitch, loudness,…

  40. Fuzzy Logic-Based Audio Pattern Recognition

    NASA Astrophysics Data System (ADS)

    Malcangi, M.

    2008-11-01

    Audio and audio-pattern recognition is becoming one of the most important technologies for automatically controlling embedded systems. Fuzzy logic may be the most important enabling methodology due to its ability to model such applications rapidly and economically. An audio and audio-pattern recognition engine based on fuzzy logic has been developed for use in very low-cost, deeply embedded systems to automate human-to-machine and machine-to-machine interaction. This engine consists of simple digital signal-processing algorithms for feature extraction and normalization, and a set of pattern-recognition rules tuned manually or automatically by a self-learning process.

  41. Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.

    PubMed

    Gebru, Israel D; Ba, Sileye; Li, Xiaofei; Horaud, Radu

    2018-05-01

    Speaker diarization consists of assigning speech signals to the people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes, in a principled way, speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns based on the output of an audio-visual association process executed at each time slice and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset that contains audio-visual training data, as well as a number of scenarios involving several participants engaged in formal and informal dialogue, is introduced. The proposed method is thoroughly tested and benchmarked against several state-of-the-art diarization algorithms.

  2. Audio Spectrogram Representations for Processing with Convolutional Neural Networks

    NASA Astrophysics Data System (ADS)

    Wyse, L.

    2017-05-01

    One of the decisions that arise when designing a neural network for any application is how the data should be represented in order to be presented to, and possibly generated by, a neural network. For audio, the choice is less obvious than it seems to be for visual images, and a variety of representations have been used for different applications including the raw digitized sample stream, hand-crafted features, machine discovered features, MFCCs and variants that include deltas, and a variety of spectral representations. This paper reviews some of these representations and issues that arise, focusing particularly on spectrograms for generating audio using neural networks for style transfer.
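    For reference, a log-magnitude spectrogram of the kind the paper surveys can be computed in a few lines with SciPy; the window parameters below are illustrative defaults, not the paper's choices.

```python
# Log-magnitude spectrogram as a CNN input representation (illustrative
# parameters; a synthetic tone stands in for a real recording).
import numpy as np
from scipy.signal import spectrogram

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)              # stand-in for a real recording

f, frames, Sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=256)
log_spec = np.log(Sxx + 1e-10)               # compress dynamic range
print(log_spec.shape)                        # (freq_bins, time_frames)
```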

  3. Assessment of rural soundscapes with high-speed train noise.

    PubMed

    Lee, Pyoung Jik; Hong, Joo Young; Jeon, Jin Yong

    2014-06-01

    In the present study, rural soundscapes with high-speed train noise were assessed through laboratory experiments. A total of ten sites with varying landscape metrics were chosen for audio-visual recording. The acoustical characteristics of the high-speed train noise were analyzed using various noise level indices. Landscape metrics such as the percentage of natural features (NF) and Shannon's diversity index (SHDI) were adopted to evaluate the landscape features of the ten sites. Laboratory experiments were then performed with 20 well-trained listeners to investigate the perception of high-speed train noise in rural areas. The experiments consisted of three parts: 1) visual-only condition, 2) audio-only condition, and 3) combined audio-visual condition. The results showed that subjects' preference for visual images was significantly related to NF, the number of land types, and the A-weighted equivalent sound pressure level (LAeq). In addition, the visual images significantly influenced the noise annoyance, and LAeq and NF were the dominant factors affecting the annoyance from high-speed train noise in the combined audio-visual condition. Moreover, Zwicker's loudness (N) was highly correlated with the annoyance from high-speed train noise in both the audio-only and audio-visual conditions.

  4. Reconsidering the Role of Recorded Audio as a Rich, Flexible and Engaging Learning Space

    ERIC Educational Resources Information Center

    Middleton, Andrew

    2016-01-01

    Audio needs to be recognised as an integral medium capable of extending education's formal and informal, virtual and physical learning spaces. This paper reconsiders the value of educational podcasting through a review of literature and a module case study. It argues that a pedagogical understanding is needed and challenges technology-centred or…

  5. Mixing console design for telematic applications in live performance and remote recording

    NASA Astrophysics Data System (ADS)

    Samson, David J.

    The development of a telematic mixing console addresses audio engineers' need for a fully integrated system architecture that improves efficiency and control for applications such as distributed performance and remote recording. Current systems used in state-of-the-art telematic performance rely on software-based interconnections with complex routing schemes that offer minimal flexibility or control over key parameters needed to achieve a professional workflow. The lack of hardware-based control in the current model limits the full potential of both the engineer and the system. The new architecture provides a full-featured platform that, alongside customary features, integrates (1) surround panning capability for motorized, binaural manikin heads, as well as all sources in the included auralization module, (2) self-labelling channel strips, responsive to change at all remote sites, (3) onboard roundtrip latency monitoring, (4) synchronized remote audio recording and monitoring, and (5) flexible routing. These features, combined with robust parameter automation and precise analog control, will raise the standard for telematic systems and advance the development of networked audio systems for both research and professional audio markets.

  6. Semantic Indexing of Multimedia Content Using Visual, Audio, and Text Cues

    NASA Astrophysics Data System (ADS)

    Adams, W. H.; Iyengar, Giridharan; Lin, Ching-Yung; Naphade, Milind Ramesh; Neti, Chalapathy; Nock, Harriet J.; Smith, John R.

    2003-12-01

    We present a learning-based approach to the semantic indexing of multimedia content using cues derived from audio, visual, and text features. We approach the problem by developing a set of statistical models for a predefined lexicon. Novel concepts are then mapped in terms of the concepts in the lexicon. To achieve robust detection of concepts, we exploit features from multiple modalities, namely, audio, video, and text. Concept representations are modeled using Gaussian mixture models (GMM), hidden Markov models (HMM), and support vector machines (SVM). Models such as Bayesian networks and SVMs are used in a late-fusion approach to model concepts that are not explicitly modeled in terms of features. Our experiments indicate promise in the proposed classification and fusion methodologies: our proposed fusion scheme achieves more than 10% relative improvement over the best unimodal concept detector.
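    A minimal sketch of the late-fusion idea, assuming synthetic stand-ins for the audio, visual, and text features: per-modality probabilistic classifiers produce concept scores that a second-stage SVM combines.

```python
# Late fusion over per-modality concept scores (synthetic data; the real
# system uses GMM/HMM/SVM detectors over audio, video, and text features).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)
# Synthetic per-modality features with different noise levels.
modalities = [y[:, None] + rng.normal(scale=s, size=(n, 4)) for s in (1.0, 1.5, 2.0)]

# Stage 1: one probabilistic classifier per modality.
stage1 = [SVC(probability=True).fit(X, y) for X in modalities]
scores = np.hstack([clf.predict_proba(X)[:, [1]] for clf, X in zip(stage1, modalities)])

# Stage 2: late fusion -- an SVM over the stacked per-modality scores.
fusion = SVC().fit(scores, y)
print("fused training accuracy:", fusion.score(scores, y))
```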

  7. Audio Spatial Representation Around the Body

    PubMed Central

    Aggius-Vella, Elena; Campus, Claudio; Finocchietti, Sara; Gori, Monica

    2017-01-01

    Studies have found that portions of space around our body are differently coded by our brain. Numerous works have investigated visual and auditory spatial representation, focusing mostly on the spatial representation of stimuli presented at head level, especially in the frontal space. Only few studies have investigated spatial representation around the entire body and its relationship with motor activity. Moreover, it is still not clear whether the space surrounding us is represented as a unitary dimension or whether it is split up into different portions, differently shaped by our senses and motor activity. To clarify these points, we investigated audio localization of dynamic and static sounds at different body levels. In order to understand the role of a motor action in auditory space representation, we asked subjects to localize sounds by pointing with the hand or the foot, or by giving a verbal answer. We found that the audio sound localization was different depending on the body part considered. Moreover, a different pattern of response was observed when subjects were asked to make actions with respect to the verbal responses. These results suggest that the audio space around our body is split in various spatial portions, which are perceived differently: front, back, around chest, and around foot, suggesting that these four areas could be differently modulated by our senses and our actions. PMID:29249999

  8. Semantic Context Detection Using Audio Event Fusion

    NASA Astrophysics Data System (ADS)

    Chu, Wei-Ta; Cheng, Wen-Huang; Wu, Ja-Ling

    2006-12-01

    Semantic-level content analysis is a crucial issue in achieving efficient content retrieval and management. We propose a hierarchical approach that models audio events over a time series in order to accomplish semantic context detection. Two levels of modeling, audio event and semantic context modeling, are devised to bridge the gap between physical audio features and semantic concepts. In this work, hidden Markov models (HMMs) are used to model four representative audio events, that is, gunshot, explosion, engine, and car braking, in action movies. At the semantic context level, generative (ergodic hidden Markov model) and discriminative (support vector machine (SVM)) approaches are investigated to fuse the characteristics and correlations among audio events, which provide cues for detecting gunplay and car-chasing scenes. The experimental results demonstrate the effectiveness of the proposed approaches and provide a preliminary framework for information mining by using audio characteristics.
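    The event-level modeling step might look like the following sketch: one Gaussian HMM per audio event class, with classification by maximum log-likelihood. It uses hmmlearn and synthetic feature sequences as stand-ins for real audio features.

```python
# One HMM per audio event class; classify a test sequence by the model
# with the highest log-likelihood (synthetic features as stand-ins).
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(1)

def make_sequences(offset, n_seq=20, length=50, dim=6):
    return [offset + rng.normal(size=(length, dim)) for _ in range(n_seq)]

train = {"gunshot": make_sequences(2.0), "engine": make_sequences(-2.0)}
models = {}
for event, seqs in train.items():
    X = np.vstack(seqs)                 # hmmlearn takes stacked sequences
    lengths = [len(s) for s in seqs]    # plus their individual lengths
    models[event] = GaussianHMM(n_components=3, n_iter=20).fit(X, lengths)

test = 2.0 + rng.normal(size=(50, 6))
print(max(models, key=lambda e: models[e].score(test)))   # -> "gunshot"
```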

  9. Wireless Headset Communication System

    NASA Technical Reports Server (NTRS)

    Lau, Wilfred K.; Swanson, Richard; Christensen, Kurt K.

    1995-01-01

    System combines features of pagers, walkie-talkies, and cordless telephones. Wireless headset communication system uses digital modulation on spread spectrum to avoid interference among units. Consists of base station, 4 radio/antenna modules, and as many as 16 remote units with headsets. Base station serves as network controller, audio-mixing network, and interface to such outside services as computers, telephone networks, and other base stations. Developed for use at Kennedy Space Center, system also useful in industrial maintenance, emergency operations, construction, and airport operations. Digital capabilities can also be exploited by adding bar-code readers for use in taking inventories.

  10. The Effects of Audio-Visual Recorded and Audio Recorded Listening Tasks on the Accuracy of Iranian EFL Learners' Oral Production

    ERIC Educational Resources Information Center

    Drood, Pooya; Asl, Hanieh Davatgari

    2016-01-01

    The ways in which tasks in classrooms have developed and proceeded have received great attention in the field of language teaching and learning, in the sense that they draw learners' attention to competing features such as accuracy, fluency, and complexity. English audiovisual and audio recorded materials have been widely used by teachers and…

  11. Trend Alert: A History Teacher's Guide to Using Podcasts in the Classroom

    ERIC Educational Resources Information Center

    Swan, Kathleen Owings; Hofer, Mark

    2009-01-01

    A "podcast" (an amalgam of the word broadcast and the iPod digital audio player) is essentially a broadcast of digital audio files on the web that users can listen to on their computer or digital audio player (e.g., iPod). Podcasts can be automatically delivered to an iPod or computer whenever new content is available. This unique feature of…

  12. Modeling sports highlights using a time-series clustering framework and model interpretation

    NASA Astrophysics Data System (ADS)

    Radhakrishnan, Regunathan; Otsuka, Isao; Xiong, Ziyou; Divakaran, Ajay

    2005-01-01

    In our past work on sports highlights extraction, we have shown the utility of detecting audience reaction using an audio classification framework. The audio classes in the framework were chosen based on intuition. In this paper, we present a systematic way of identifying the key audio classes for sports highlights extraction using a time series clustering framework. We treat the low-level audio features as a time series and model the highlight segments as "unusual" events in a background of a "usual" process. The set of audio classes to characterize the sports domain is then identified by analyzing the consistent patterns in each of the clusters output from the time series clustering framework. The distribution of features from the training data so obtained for each of the key audio classes is parameterized by a Minimum Description Length Gaussian Mixture Model (MDL-GMM). We also interpret the meaning of each of the mixture components of the MDL-GMM for the key audio class (the "highlight" class) that is correlated with highlight moments. Our results show that the "highlight" class is a mixture of audience cheering and commentator's excited speech. Furthermore, we show that the precision-recall performance for highlights extraction based on this "highlight" class is better than that of our previous approach, which uses only audience cheering as the key highlight class.
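    MDL-based selection of the number of mixture components is closely related to BIC-based selection, which is readily available in scikit-learn; the sketch below (on synthetic data) illustrates that model-selection step, not the paper's exact criterion.

```python
# Selecting the number of GMM components by BIC, a widely available
# stand-in for the MDL criterion used in the paper (synthetic data).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 1, (300, 4)), rng.normal(2, 1, (300, 4))])

best_k, best_bic = None, np.inf
for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic = gmm.bic(X)                 # lower BIC = better complexity/fit trade-off
    if bic < best_bic:
        best_k, best_bic = k, bic

print("selected mixture components:", best_k)   # expected: 2
```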

  13. NFL Films audio, video, and film production facilities

    NASA Astrophysics Data System (ADS)

    Berger, Russ; Schrag, Richard C.; Ridings, Jason J.

    2003-04-01

    The new NFL Films 200,000 sq. ft. headquarters is home for the critically acclaimed film production that preserves the NFL's visual legacy week-to-week during the football season, and is also the technical plant that processes and archives football footage from the earliest recorded media to the current network broadcasts. No other company in the country shoots more film than NFL Films, and the inclusion of cutting-edge video and audio formats demands that their technical spaces continually integrate the latest in the ever-changing world of technology. This facility houses a staggering array of acoustically sensitive spaces where music and sound are equal partners with the visual medium. Over 90,000 sq. ft. of sound critical technical space is comprised of an array of sound stages, music scoring stages, audio control rooms, music writing rooms, recording studios, mixing theaters, video production control rooms, editing suites, and a screening theater. Every production control space in the building is designed to monitor and produce multi channel surround sound audio. An overview of the architectural and acoustical design challenges encountered for each sophisticated listening, recording, viewing, editing, and sound critical environment will be discussed.

  14. Authenticity examination of compressed audio recordings using detection of multiple compression and encoders' identification.

    PubMed

    Korycki, Rafal

    2014-05-01

    Since the appearance of digital audio recordings, audio authentication has been becoming increasingly difficult. The currently available technologies and free editing software allow a forger to cut or paste any single word without audible artifacts. Nowadays, the only method for digital audio files commonly approved by forensic experts is the ENF criterion. It consists in fluctuation analysis of the mains frequency induced in the electronic circuits of recording devices. Its effectiveness is therefore strictly dependent on the presence of the mains signal in the recording, which is a rare occurrence. Recently, much attention has been paid to authenticity analysis of compressed multimedia files, and several solutions were proposed for the detection of double compression in both digital video and digital audio. This paper addresses the problem of tampering detection in compressed audio files and discusses new methods that can be used for authenticity analysis of digital recordings. The presented approaches consist in evaluating statistical features extracted from the MDCT coefficients as well as other parameters that may be obtained from compressed audio files. The calculated feature vectors are used for training selected machine learning algorithms. The detection of multiple compression helps uncover tampering activities as well as traces of montage in digital audio recordings. To enhance the methods' robustness, an encoder identification algorithm was developed and applied, based on analysis of inherent compression parameters. The effectiveness of the tampering detection algorithms is tested on a predefined large music database consisting of nearly one million compressed audio files. The influence of the compression algorithms' parameters on classification performance is discussed based on the results of the current study.

  15. High-performance combination method of electric network frequency and phase for audio forgery detection in battery-powered devices.

    PubMed

    Savari, Maryam; Abdul Wahab, Ainuddin Wahid; Anuar, Nor Badrul

    2016-09-01

    Audio forgery is any act of tampering, illegal copying, or faking quality in audio in a criminal way. In the last decade, there has been increasing attention to audio forgery detection due to a significant increase in the number of forgeries in different types of audio. There are a number of methods for forgery detection, of which electric network frequency (ENF) is one of the most powerful in terms of accuracy. In spite of the suitable accuracy of ENF in the majority of plug-in powered devices, its weak accuracy in audio forgery detection for battery-powered devices, especially laptops and mobile phones, can be considered one of the main obstacles to the ENF approach. To solve this accuracy problem in battery-powered devices, a combination method of ENF and a phase feature is proposed. In the experiments conducted, ENF alone gives 50% and 60% accuracy for forgery detection in mobile phones and laptops respectively, while the proposed method shows 88% and 92% accuracy respectively for forgery detection in battery-powered devices. The results lead to higher accuracy for forgery detection with the combination of ENF and the phase feature.
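    A simplified sketch of the ENF-extraction step that such methods build on: band-pass the recording around the nominal mains frequency and track the spectral peak per frame. A 50 Hz nominal frequency is assumed here, and the published method's details are omitted.

```python
# Frame-by-frame ENF estimate: band-pass around the nominal mains frequency,
# then track the spectral peak per STFT frame (illustrative parameters).
import numpy as np
from scipy.signal import butter, sosfiltfilt, stft

def enf_track(x, fs, nominal=50.0, half_band=1.0):
    sos = butter(4, [nominal - half_band, nominal + half_band],
                 btype="bandpass", fs=fs, output="sos")
    xb = sosfiltfilt(sos, x)
    f, t, Z = stft(xb, fs=fs, nperseg=fs * 4)       # 4-second frames
    band = (f >= nominal - half_band) & (f <= nominal + half_band)
    return t, f[band][np.argmax(np.abs(Z[band]), axis=0)]

fs = 1000
t = np.arange(20 * fs) / fs
x = np.sin(2 * np.pi * 50.02 * t) + 0.1 * np.random.randn(len(t))
times, track = enf_track(x, fs)
print(track[:5])   # per-frame mains-frequency estimates near 50 Hz
```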

  16. How actions shape perception: learning action-outcome relations and predicting sensory outcomes promote audio-visual temporal binding

    PubMed Central

    Desantis, Andrea; Haggard, Patrick

    2016-01-01

    To maintain a temporally-unified representation of audio and visual features of objects in our environment, the brain recalibrates audio-visual simultaneity. This process allows adjustment for both differences in time of transmission and time for processing of audio and visual signals. In four experiments, we show that the cognitive processes for controlling instrumental actions also have strong influence on audio-visual recalibration. Participants learned that right and left hand button-presses each produced a specific audio-visual stimulus. Following one action the audio preceded the visual stimulus, while for the other action audio lagged vision. In a subsequent test phase, left and right button-press generated either the same audio-visual stimulus as learned initially, or the pair associated with the other action. We observed recalibration of simultaneity only for previously-learned audio-visual outcomes. Thus, learning an action-outcome relation promotes temporal grouping of the audio and visual events within the outcome pair, contributing to the creation of a temporally unified multisensory object. This suggests that learning action-outcome relations and the prediction of perceptual outcomes can provide an integrative temporal structure for our experiences of external events. PMID:27982063

  17. How actions shape perception: learning action-outcome relations and predicting sensory outcomes promote audio-visual temporal binding.

    PubMed

    Desantis, Andrea; Haggard, Patrick

    2016-12-16

    To maintain a temporally-unified representation of audio and visual features of objects in our environment, the brain recalibrates audio-visual simultaneity. This process allows adjustment for both differences in time of transmission and time for processing of audio and visual signals. In four experiments, we show that the cognitive processes for controlling instrumental actions also have strong influence on audio-visual recalibration. Participants learned that right and left hand button-presses each produced a specific audio-visual stimulus. Following one action the audio preceded the visual stimulus, while for the other action audio lagged vision. In a subsequent test phase, left and right button-press generated either the same audio-visual stimulus as learned initially, or the pair associated with the other action. We observed recalibration of simultaneity only for previously-learned audio-visual outcomes. Thus, learning an action-outcome relation promotes temporal grouping of the audio and visual events within the outcome pair, contributing to the creation of a temporally unified multisensory object. This suggests that learning action-outcome relations and the prediction of perceptual outcomes can provide an integrative temporal structure for our experiences of external events.

  18. Robot Command Interface Using an Audio-Visual Speech Recognition System

    NASA Astrophysics Data System (ADS)

    Ceballos, Alexánder; Gómez, Juan; Prieto, Flavio; Redarce, Tanneguy

    In recent years audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This document presents an automatic command-recognition system using audio-visual information. The system is expected to control the laparoscopic robot da Vinci. The audio signal is treated using the Mel Frequency Cepstral Coefficients parametrization method. In addition, features based on the points that define the mouth's outer contour, according to the MPEG-4 standard, are used to extract the visual speech information.
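    The MFCC front end mentioned above can be reproduced in outline with librosa; the synthetic tone below merely stands in for a spoken command.

```python
# MFCC parametrization of an audio signal (a synthetic tone stands in
# for a spoken command; 13 coefficients is a common, assumed choice).
import numpy as np
import librosa

sr = 16000
y = np.sin(2 * np.pi * 220 * np.arange(sr) / sr).astype(np.float32)

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
print(mfcc.shape)   # (13, n_frames): one coefficient vector per frame
```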

  19. Effect of Divided Attention on Children's Rhythmic Response

    ERIC Educational Resources Information Center

    Thomas, Jerry R.; Stratton, Richard K.

    1977-01-01

    Audio and visual interference did not significantly impair rhythmic response levels of second- and fourth-grade boys as measured by space error scores, though audio input resulted in significantly less consistent temporal performance. (MB)

  20. “I Can Never Be Too Comfortable”: Race, Gender, and Emotion at the Hospital Bedside

    PubMed Central

    Cottingham, Marci D.; Johnson, Austin H.; Erickson, Rebecca J.

    2017-01-01

    In this article, we examine how race and gender shape nurses’ emotion practice. Based on audio diaries collected from 48 nurses within two Midwestern hospital systems in the United States, we illustrate the disproportionate emotional labor that emerges among women nurses of color in the white institutional space of American health care. In this environment, women of color experience an emotional double shift as a result of negotiating patient, coworker, and supervisor interactions. In confronting racist encounters, nurses of color in our sample experience additional job-related stress, must perform disproportionate amounts of emotional labor, and experience depleted emotional resources that negatively influence patient care. Methodologically, the study extends prior research by using audio diaries collected from a racially diverse sample to capture emotion as a situationally emergent and complex feature of nursing practice. We also extend research on nursing by tracing both the sources and consequences of unequal emotion practices for nurse well-being and patient care. PMID:29094641

  1. Low-order auditory Zernike moment: a novel approach for robust music identification in the compressed domain

    NASA Astrophysics Data System (ADS)

    Li, Wei; Xiao, Chuan; Liu, Yaduo

    2013-12-01

    Audio identification via fingerprint has been an active research field for years. However, most previously reported methods work on the raw audio format, despite the fact that compressed-format audio, especially MP3 music, has grown into the dominant way to store music on personal computers and to transmit it over the Internet. It would be useful if an unknown compressed audio fragment could be recognized directly from the database without first decompressing it to the wave format. So far, very few algorithms run directly in the compressed domain for music information retrieval, and most of them take advantage of the modified discrete cosine transform coefficients or derived cepstrum- and energy-type features. As a first attempt, we propose in this paper utilizing the compressed-domain auditory Zernike moment, adapted from image processing techniques, as the key feature of a novel robust audio identification algorithm. Such a fingerprint exhibits strong robustness, due to its statistically stable nature, against various audio signal distortions such as recompression, noise contamination, echo adding, equalization, band-pass filtering, pitch shifting, and slight time scale modification. Experimental results show that in a music database composed of 21,185 MP3 songs, a 10-s music segment is able to identify its original near-duplicate recording, with an average top-5 hit rate of 90% or above even under severe audio signal distortions.

  2. Innovations: clinical computing: an audio computer-assisted self-interviewing system for research and screening in public mental health settings.

    PubMed

    Bertollo, David N; Alexander, Mary Jane; Shinn, Marybeth; Aybar, Jalila B

    2007-06-01

    This column describes the nonproprietary software Talker, used to adapt screening instruments to audio computer-assisted self-interviewing (ACASI) systems for low-literacy populations and other populations. Talker supports ease of programming, multiple languages, on-site scoring, and the ability to update a central research database. Key features include highly readable text display, audio presentation of questions and audio prompting of answers, and optional touch screen input. The scripting language for adapting instruments is briefly described as well as two studies in which respondents provided positive feedback on its use.

  3. Steganalysis of recorded speech

    NASA Astrophysics Data System (ADS)

    Johnson, Micah K.; Lyu, Siwei; Farid, Hany

    2005-03-01

    Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.

  4. Improvements of ModalMax High-Fidelity Piezoelectric Audio Device

    NASA Technical Reports Server (NTRS)

    Woodard, Stanley E.

    2005-01-01

    ModalMax audio speakers have been enhanced by innovative means of tailoring the vibration response of thin piezoelectric plates to produce a high-fidelity audio response. The ModalMax audio speakers are 1 mm in thickness. The device eliminates the need for a separate driver and speaker cone. ModalMax speakers can perform the same applications as cone speakers, but unlike cone speakers, ModalMax speakers can function in harsh environments such as high humidity or extreme wetness. New design features allow the speakers to be completely submersed in salt water, making them well suited for maritime applications. The sound produced by the ModalMax audio speakers has spatial resolution that is readily discernible for headset users.

  5. Automatic Detection and Classification of Audio Events for Road Surveillance Applications.

    PubMed

    Almaadeed, Noor; Asim, Muhammad; Al-Maadeed, Somaya; Bouridane, Ahmed; Beghdadi, Azeddine

    2018-06-06

    This work investigates the problem of detecting hazardous events on roads by designing an audio surveillance system that automatically detects perilous situations such as car crashes and tire skidding. In recent years, several visual surveillance systems have been proposed for road monitoring to detect accidents, with the aim of improving safety procedures in emergency cases. However, visual information alone cannot detect certain events such as car crashes and tire skidding, especially under adverse and visually cluttered weather conditions such as snowfall, rain, and fog. Consequently, the incorporation of microphones and audio event detectors based on audio processing can significantly enhance the detection accuracy of such surveillance systems. This paper proposes to combine time-domain, frequency-domain, and joint time-frequency features extracted from a class of quadratic time-frequency distributions (QTFDs) to detect events on roads through audio analysis and processing. Experiments were carried out using a publicly available dataset. The experimental results confirm the effectiveness of the proposed approach for detecting hazardous events on roads, as demonstrated by a 7% improvement in accuracy when compared against methods that use individual temporal and spectral features.
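    A simplified flavor of the feature combination (true QTFD features are considerably more involved): concatenating time-domain and frequency-domain descriptors for each audio frame before classification.

```python
# Concatenating time-domain and frequency-domain descriptors per frame
# (a simplified stand-in for the paper's richer QTFD-based features).
import numpy as np

def frame_features(frame, fs):
    rms = np.sqrt(np.mean(frame ** 2))                      # time domain
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2      # time domain
    spec = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1 / fs)
    centroid = np.sum(freqs * spec) / (np.sum(spec) + 1e-12)  # frequency domain
    return np.array([rms, zcr, centroid])

fs = 8000
frame = np.random.randn(1024)                 # stand-in for one audio frame
print(frame_features(frame, fs))
```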

  6. PROTAX-Sound: A probabilistic framework for automated animal sound identification

    PubMed Central

    Somervuo, Panu; Ovaskainen, Otso

    2017-01-01

    Autonomous audio recording is stimulating a new field in bioacoustics, with great promise for conducting cost-effective species surveys. One major current challenge is the lack of reliable classifiers capable of multi-species identification. We present PROTAX-Sound, a statistical framework to perform probabilistic classification of animal sounds. PROTAX-Sound is based on a multinomial regression model, and it can utilize as predictors any kind of sound features or classifications produced by other existing algorithms. PROTAX-Sound combines audio and image processing techniques to scan environmental audio files. It identifies regions of interest (a segment of the audio file that contains a vocalization to be classified), extracts acoustic features from them and compares with samples in a reference database. The output of PROTAX-Sound is the probabilistic classification of each vocalization, including the possibility that it represents species not present in the reference database. We demonstrate the performance of PROTAX-Sound by classifying audio from a species-rich case study of tropical birds. The best performing classifier achieved 68% classification accuracy for 200 bird species. PROTAX-Sound improves the classification power of current techniques by combining information from multiple classifiers in a manner that yields calibrated classification probabilities. PMID:28863178

  7. PROTAX-Sound: A probabilistic framework for automated animal sound identification.

    PubMed

    de Camargo, Ulisses Moliterno; Somervuo, Panu; Ovaskainen, Otso

    2017-01-01

    Autonomous audio recording is stimulating a new field in bioacoustics, with great promise for conducting cost-effective species surveys. One major current challenge is the lack of reliable classifiers capable of multi-species identification. We present PROTAX-Sound, a statistical framework to perform probabilistic classification of animal sounds. PROTAX-Sound is based on a multinomial regression model, and it can utilize as predictors any kind of sound features or classifications produced by other existing algorithms. PROTAX-Sound combines audio and image processing techniques to scan environmental audio files. It identifies regions of interest (a segment of the audio file that contains a vocalization to be classified), extracts acoustic features from them and compares with samples in a reference database. The output of PROTAX-Sound is the probabilistic classification of each vocalization, including the possibility that it represents species not present in the reference database. We demonstrate the performance of PROTAX-Sound by classifying audio from a species-rich case study of tropical birds. The best performing classifier achieved 68% classification accuracy for 200 bird species. PROTAX-Sound improves the classification power of current techniques by combining information from multiple classifiers in a manner that yields calibrated classification probabilities.

  8. Audio-Visual Temporal Recalibration Can be Constrained by Content Cues Regardless of Spatial Overlap.

    PubMed

    Roseboom, Warrick; Kawabe, Takahiro; Nishida, Shin'ya

    2013-01-01

    It has now been well established that the point of subjective synchrony for audio and visual events can be shifted following exposure to asynchronous audio-visual presentations, an effect often referred to as temporal recalibration. Recently it was further demonstrated that it is possible to concurrently maintain two such recalibrated estimates of audio-visual temporal synchrony. However, it remains unclear precisely what defines a given audio-visual pair such that it is possible to maintain a temporal relationship distinct from other pairs. It has been suggested that spatial separation of the different audio-visual pairs is necessary to achieve multiple distinct audio-visual synchrony estimates. Here we investigated if this is necessarily true. Specifically, we examined whether it is possible to obtain two distinct temporal recalibrations for stimuli that differed only in featural content. Using both complex (audio visual speech; see Experiment 1) and simple stimuli (high and low pitch audio matched with either vertically or horizontally oriented Gabors; see Experiment 2) we found concurrent, and opposite, recalibrations despite there being no spatial difference in presentation location at any point throughout the experiment. This result supports the notion that the content of an audio-visual pair alone can be used to constrain distinct audio-visual synchrony estimates regardless of spatial overlap.

  9. Audio-Visual Temporal Recalibration Can be Constrained by Content Cues Regardless of Spatial Overlap

    PubMed Central

    Roseboom, Warrick; Kawabe, Takahiro; Nishida, Shin’Ya

    2013-01-01

    It has now been well established that the point of subjective synchrony for audio and visual events can be shifted following exposure to asynchronous audio-visual presentations, an effect often referred to as temporal recalibration. Recently it was further demonstrated that it is possible to concurrently maintain two such recalibrated estimates of audio-visual temporal synchrony. However, it remains unclear precisely what defines a given audio-visual pair such that it is possible to maintain a temporal relationship distinct from other pairs. It has been suggested that spatial separation of the different audio-visual pairs is necessary to achieve multiple distinct audio-visual synchrony estimates. Here we investigated if this is necessarily true. Specifically, we examined whether it is possible to obtain two distinct temporal recalibrations for stimuli that differed only in featural content. Using both complex (audio visual speech; see Experiment 1) and simple stimuli (high and low pitch audio matched with either vertically or horizontally oriented Gabors; see Experiment 2) we found concurrent, and opposite, recalibrations despite there being no spatial difference in presentation location at any point throughout the experiment. This result supports the notion that the content of an audio-visual pair alone can be used to constrain distinct audio-visual synchrony estimates regardless of spatial overlap. PMID:23658549

  10. WebGL and web audio software lightweight components for multimedia education

    NASA Astrophysics Data System (ADS)

    Chang, Xin; Yuksel, Kivanc; Skarbek, Władysław

    2017-08-01

    The paper presents the results of our recent work on the development of the contemporary computing platform DC2 for multimedia education using WebGL and Web Audio, the W3C standards. Using the literate programming paradigm, the WEBSA educational tools were developed. They offer the user (student) access to an expandable collection of WebGL shaders and Web Audio scripts. The unique feature of DC2 is the option of literate programming, offered to both the author and the reader in order to improve the interactivity of lightweight WebGL and Web Audio components. For instance, users can define source audio nodes (including synthetic sources), destination audio nodes, and nodes for audio processing such as sound wave shaping, spectral band filtering, convolution-based modification, etc. In the case of WebGL, besides classic graphics effects based on mesh and fractal definitions, novel image processing and analysis by shaders is offered, such as nonlinear filtering, histograms of gradients, and Bayesian classifiers.

  11. Detecting double compression of audio signal

    NASA Astrophysics Data System (ADS)

    Yang, Rui; Shi, Yun Q.; Huang, Jiwu

    2010-01-01

    MP3 is the most popular audio format nowadays in our daily life; for example, music downloaded from the Internet and files saved in digital recorders are often in MP3 format. However, low-bitrate MP3s are often transcoded to high bitrate, since high-bitrate files are of higher commercial value. Audio recordings made on digital recorders can also be doctored easily with pervasive audio editing software. This paper presents two methods for the detection of double MP3 compression. The methods are essential for detecting fake-quality MP3s and for audio forensics. The proposed methods use support vector machine classifiers with feature vectors formed by the distributions of the first digits of the quantized MDCT (modified discrete cosine transform) coefficients. Extensive experiments demonstrate the effectiveness of the proposed methods. To the best of our knowledge, this piece of work is the first to detect double compression of audio signals.
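    The core feature here is the distribution of first digits of the quantized MDCT coefficients, a Benford-style statistic. The sketch below computes that histogram from an array of coefficients; the random integers merely stand in for coefficients decoded from an MP3 stream.

```python
# First-digit (Benford-style) histogram of quantized MDCT coefficients,
# usable as a 9-D feature vector for an SVM (random integers stand in
# for coefficients decoded from an MP3 file).
import numpy as np

def first_digit_histogram(coeffs):
    c = np.abs(coeffs[coeffs != 0]).astype(float)
    first = (c / 10 ** np.floor(np.log10(c))).astype(int)   # leading digit 1..9
    return np.bincount(first, minlength=10)[1:10] / len(first)

coeffs = np.random.randint(-500, 500, size=10000)
feature_vector = first_digit_histogram(coeffs)
print(feature_vector)    # relative frequency of leading digits 1..9
```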

  12. STS-57 Endeavour, Orbiter Vehicle (OV) 105, lifts off from KSC LC Pad 39B

    NASA Image and Video Library

    1993-06-21

    STS057-S-053 (21 June 1993) --- The Space Shuttle Endeavour lifts off Launch Pad 39B as captured on film by an audio-activated camera positioned at the 270-foot level on the Rotating Service Structure (RSS) at Launch Pad 39B. STS-57 launch occurred at 9:07:22 a.m. (EDT), June 21, 1993. The mission represents the first flight of the commercially developed SpaceHab laboratory module and also will feature a retrieval of the European Retrievable Carrier (EURECA). Onboard for Endeavour's fourth flight are a crew of six NASA astronauts; Ronald J. Grabe, mission commander; Brian Duffy, pilot; G. David Low, payload commander; and Nancy J. Sherlock, Peter J. K. (Jeff) Wisoff and Janice E. Voss, all mission specialists. An earlier launch attempt was scrubbed due to unacceptable weather conditions both at the Kennedy Space Center (KSC) and the overseas contingency landing sites.

  13. Multiresolution analysis (discrete wavelet transform) through Daubechies family for emotion recognition in speech.

    NASA Astrophysics Data System (ADS)

    Campo, D.; Quintero, O. L.; Bastidas, M.

    2016-04-01

    We propose a study of the mathematical properties of voice as an audio signal. This work includes signals in which the channel conditions are not ideal for emotion recognition. Multiresolution analysis (discrete wavelet transform) was performed through the use of the Daubechies wavelet family (Db1-Haar, Db6, Db8, Db10), allowing the decomposition of the initial audio signal into sets of coefficients from which a set of features was extracted and analyzed statistically in order to differentiate emotional states. ANNs proved to be a system that allows an appropriate classification of such states. This study shows that the features extracted using wavelet decomposition are enough to analyze and extract emotional content in audio signals, presenting a high accuracy rate in classification of emotional states without the need for other kinds of classical frequency-time features. Accordingly, this paper seeks to characterize mathematically the six basic emotions in humans: boredom, disgust, happiness, anxiety, anger, and sadness, plus neutrality, for a total of seven states to identify.
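    The decomposition step can be outlined with PyWavelets: a Daubechies wavelet splits the signal into coefficient sets from which per-level statistics are extracted. The per-level log-energy feature below is illustrative, not the paper's exact feature set.

```python
# Daubechies multiresolution decomposition with PyWavelets; per-level
# log-energy is one simple statistic that can be drawn from each
# coefficient set (an assumed feature, for illustration).
import numpy as np
import pywt

fs = 16000
x = np.random.randn(fs)                      # stand-in for a speech signal

coeffs = pywt.wavedec(x, "db8", level=5)     # [cA5, cD5, cD4, ..., cD1]
features = [np.log(np.sum(c ** 2) + 1e-12) for c in coeffs]
print(len(coeffs), features)
```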

  14. On-line Tool Wear Detection on DCMT070204 Carbide Tool Tip Based on Noise Cutting Audio Signal using Artificial Neural Network

    NASA Astrophysics Data System (ADS)

    Prasetyo, T.; Amar, S.; Arendra, A.; Zam Zami, M. K.

    2018-01-01

    This study develops an on-line detection system to predict the wear of the DCMT070204 tool tip during cutting of a workpiece. The machine used in this research is a CNC ProTurn 9000 cutting ST42 steel cylinders. The audio signal was captured using a microphone placed on the tool post and recorded in Matlab at a sampling rate of 44.1 kHz with a sampling size of 1024. The recording yielded 110 data samples derived from the audio signal while cutting with a normal tool and a worn tool. Signal features were then extracted in the frequency domain using the Fast Fourier Transform, and features were selected based on correlation analysis. Tool wear classification was performed using an artificial neural network with the 33 selected input features, trained with the backpropagation method. Classification performance testing yields an accuracy of 74%.
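    A hedged sketch of the described pipeline on synthetic data: FFT magnitude features per recording, ranked by correlation with the wear label, feeding a small neural network. The 110-sample size and 33 selected features follow the abstract; everything else is an assumption.

```python
# FFT features + correlation-based feature selection + a small neural
# network (synthetic data; only the 110 samples and 33 features follow
# the abstract, the rest are illustrative assumptions).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
n, n_fft = 110, 1024
signals = rng.normal(size=(n, n_fft))
labels = rng.integers(0, 2, n)               # 0 = normal tool, 1 = worn tool

spectra = np.abs(np.fft.rfft(signals, axis=1))
corr = np.array([abs(np.corrcoef(spectra[:, i], labels)[0, 1])
                 for i in range(spectra.shape[1])])
top = np.argsort(corr)[-33:]                 # keep the 33 best-correlated bins

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500).fit(spectra[:, top], labels)
print("training accuracy:", clf.score(spectra[:, top], labels))
```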

  15. Real World Audio

    NASA Technical Reports Server (NTRS)

    1998-01-01

    Crystal River Engineering was originally featured in Spinoff 1992 with the Convolvotron, a high speed digital audio processing system that delivers three-dimensional sound over headphones. The Convolvotron was developed for Ames' research on virtual acoustic displays. Crystal River is now a subsidiary of Aureal Semiconductor, Inc., and together they develop and market the technology, which is a 3-D (three dimensional) audio technology known commercially today as Aureal 3D (A-3D). The technology has been incorporated into video games, surround sound systems, and sound cards.

  16. Strategies for Characterizing the Sensory Environment: Objective and Subjective Evaluation Methods using the VisiSonic Real Space 64/5 Audio-Visual Panoramic Camera

    DTIC Science & Technology

    2017-11-01

    ARL-TR-8205 ● NOV 2017 ● US Army Research Laboratory. By Joseph McArdle, Ashley Foots, Chris Stachowiak, and…

  17. NFL Films music scoring stage and control room space

    NASA Astrophysics Data System (ADS)

    Berger, Russ; Schrag, Richard C.; Ridings, Jason J.

    2003-04-01

    NFL Films' new 200,000 sq. ft. corporate headquarters is home to an orchestral scoring stage used to record custom music scores to support and enhance their video productions. Part of the 90,000 sq. ft. of sound critical technical space, the music scoring stage and its associated control room are at the heart of the audio facilities. Driving the design were the owner's mandate for natural light, wood textures, and an acoustical environment that would support small rhythm sections, soloists, and a full orchestra. Being an industry leader in cutting-edge video and audio formats, the NFLF required that the technical spaces allow the latest in technology to be continually integrated into the infrastructure. Never was it more important for a project to hold true to the adage of "designing from the inside out." Each audio and video space within the facility had to stand on its own with regard to user functionality, acoustical accuracy, sound isolation, noise control, and monitor presentation. A detailed look at the architectural and acoustical design challenges encountered and the solutions developed for the performance studio and the associated control room space will be discussed.

  18. Chemical News Via Audio Tapes: Chemical Industry News

    ERIC Educational Resources Information Center

    Hanford, W. E.; And Others

    1972-01-01

    Tape coverage of internal R&D news now has a broader scope with improved features. A new tape series covering external news of broad interest has been initiated. The use of tape in a Continuing Education Program is discussed as the future plans for expanding the audio tape program. (1 reference) (Author)

  19. Transitioning from Analog to Digital Audio Recording in Childhood Speech Sound Disorders

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Mcsweeny, Jane L.; Anderson, Bruce E.; Campbell, Thomas F.; Chial, Michael R.; Green, Jordan R.; Hauner, Katherina K.; Moore, Christopher A.; Rusiewicz, Heather L.; Wilson, David L.

    2005-01-01

    Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing…

  20. All Source Sensor Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Trease, Harold (PNNL)

    2012-10-10

    ASSA is a software application that processes binary data into summarized index tables that can be used to organize features contained within the data. ASSA's index tables can also be used to search for user specified features. ASSA is designed to organize and search for patterns in unstructured binary data streams or archives, such as video, images, audio, and network traffic. ASSA is basically a very general search engine used to search for any pattern in any binary data stream. It has uses in video analytics, image analysis, audio analysis, searching hard-drives, monitoring network traffic, etc.

  1. Vision-mediated interaction with the Nottingham caves

    NASA Astrophysics Data System (ADS)

    Ghali, Ahmed; Bayomi, Sahar; Green, Jonathan; Pridmore, Tony; Benford, Steve

    2003-05-01

    The English city of Nottingham is widely known for its rich history and compelling folklore. A key attraction is the extensive system of caves to be found beneath Nottingham Castle. Regular guided tours are made of the Nottingham caves, during which castle staff tell stories and explain historical events to small groups of visitors while pointing out relevant cave locations and features. The work reported here is part of a project aimed at enhancing the experience of cave visitors, and providing flexible storytelling tools to their guides, by developing machine vision systems capable of identifying specific actions of guides and/or visitors and triggering audio and/or video presentations as a result. Attention is currently focused on triggering audio material by directing the beam of a standard domestic flashlight towards features of interest on the cave wall. Cameras attached to the walls or roof provide image sequences within which torch light and cave features are detected and their relative positions estimated. When a target feature is illuminated the corresponding audio response is generated. We describe the architecture of the system, its implementation within the caves and the results of initial evaluations carried out with castle guides and members of the public.
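    The trigger logic might resemble the following OpenCV sketch, which finds a bright flashlight spot by thresholding and returns its centroid for comparison against known feature regions; the thresholds and sizes are illustrative, not the project's calibrated values.

```python
# Detect a bright flashlight spot in a camera frame and return its centroid
# (illustrative thresholds; a synthetic frame stands in for cave imagery).
import cv2
import numpy as np

def find_torch_spot(frame_bgr, thresh=230, min_area=50):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (9, 9), 0)
    _, mask = cv2.threshold(blur, thresh, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    spots = [c for c in contours if cv2.contourArea(c) > min_area]
    if not spots:
        return None
    m = cv2.moments(max(spots, key=cv2.contourArea))
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])   # spot centroid (x, y)

frame = np.zeros((240, 320, 3), dtype=np.uint8)
cv2.circle(frame, (160, 120), 12, (255, 255, 255), -1)  # synthetic bright spot
print(find_torch_spot(frame))   # near (160, 120): compare to feature regions
```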

  2. Learning diagnostic models using speech and language measures.

    PubMed

    Peintner, Bart; Jarrold, William; Vergyriy, Dimitra; Richey, Colleen; Tempini, Maria Luisa Gorno; Ogar, Jennifer

    2008-01-01

    We describe results that show the effectiveness of machine learning in the automatic diagnosis of certain neurodegenerative diseases, several of which alter speech and language production. We analyzed audio from 9 control subjects and 30 patients diagnosed with one of three subtypes of Frontotemporal Lobar Degeneration. From this data, we extracted features of the audio signal and the words the patient used, which were obtained using our automated transcription technologies. We then automatically learned models that predict the diagnosis of the patient using these features. Our results show that learned models over these features predict diagnosis with accuracy significantly better than random. Future studies using higher quality recordings will likely improve these results.

  3. Rapid Development of Orion Structural Test Systems

    NASA Astrophysics Data System (ADS)

    Baker, Dave

    2012-07-01

    NASA is currently validating the Orion spacecraft design for human space flight. Three systems developed by G Systems using hardware and software from National Instruments play an important role in the testing of the new Multi- purpose crew vehicle (MPCV). A custom pressurization and venting system enables engineers to apply pressure inside the test article for measuring strain. A custom data acquisition system synchronizes over 1,800 channels of analog data. This data, along with multiple video and audio streams and calculated data, can be viewed, saved, and replayed in real-time on multiple client stations. This paper presents design features and how the system works together in a distributed fashion.

  4. A Content-Adaptive Analysis and Representation Framework for Audio Event Discovery from "Unscripted" Multimedia

    NASA Astrophysics Data System (ADS)

    Radhakrishnan, Regunathan; Divakaran, Ajay; Xiong, Ziyou; Otsuka, Isao

    2006-12-01

    We propose a content-adaptive analysis and representation framework to discover events using audio features from "unscripted" multimedia such as sports and surveillance for summarization. The proposed analysis framework performs an inlier/outlier-based temporal segmentation of the content. It is motivated by the observation that "interesting" events in unscripted multimedia occur sparsely in a background of usual or "uninteresting" events. We treat the sequence of low/mid-level features extracted from the audio as a time series and identify subsequences that are outliers. The outlier detection is based on eigenvector analysis of the affinity matrix constructed from statistical models estimated from the subsequences of the time series. We define the confidence measure on each of the detected outliers as the probability that it is an outlier. Then, we establish a relationship between the parameters of the proposed framework and the confidence measure. Furthermore, we use the confidence measure to rank the detected outliers in terms of their departures from the background process. Our experimental results with sequences of low- and mid-level audio features extracted from sports video show that "highlight" events can be extracted effectively as outliers from a background process using the proposed framework. We proceed to show the effectiveness of the proposed framework in bringing out suspicious events from surveillance videos without any a priori knowledge. We show that such temporal segmentation into background and outliers, along with the ranking based on the departure from the background, can be used to generate content summaries of any desired length. Finally, we also show that the proposed framework can be used to systematically select "key audio classes" that are indicative of events of interest in the chosen domain.
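    A greatly simplified sketch of the inlier/outlier segmentation: summarize each feature subsequence by its mean, build an affinity matrix from pairwise distances, and read candidate outliers off the dominant eigenvector. The full framework's statistical models and confidence measures are omitted.

```python
# Outlier subsequences from eigenvector analysis of an affinity matrix
# (heavily simplified: subsequence models reduced to window means).
import numpy as np

rng = np.random.default_rng(4)
# 30 "usual" windows plus 3 bursty "unusual" windows of audio features.
windows = [rng.normal(0, 1, (40, 5)) for _ in range(30)]
windows += [rng.normal(4, 1, (40, 5)) for _ in range(3)]

means = np.array([w.mean(axis=0) for w in windows])
d = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=-1)
A = np.exp(-d ** 2 / (2 * d.std() ** 2))        # affinity matrix

vals, vecs = np.linalg.eigh(A)
v = np.abs(vecs[:, -1])                         # dominant eigenvector
outliers = np.where(v < 0.5 * v.mean())[0]      # weakly connected windows
print(outliers)                                 # expected: indices 30, 31, 32
```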

  5. Improvement of information fusion-based audio steganalysis

    NASA Astrophysics Data System (ADS)

    Kraetzer, Christian; Dittmann, Jana

    2010-01-01

    In this paper we extend an existing information-fusion-based audio steganalysis approach with three different kinds of evaluations. The first addresses the so-far neglected evaluation of sensor-level fusion. Our results show that this fusion removes content dependency while achieving classification rates similar to single classifiers (especially for the considered global features) on the three exemplarily tested audio data hiding algorithms. The second extends the observations on fusion from segmental features alone to combinations of segmental and global features, reducing the required computational complexity for testing by about two orders of magnitude while maintaining the same degree of accuracy. The third tries to build a basis for estimating the plausibility of the introduced steganalysis approach by measuring the sensitivity of the models used in supervised classification of steganographic material to typical signal modification operations like de-noising or 128 kbit/s MP3 encoding. Our results show that for some of the tested classifiers the probability of false alarms rises dramatically after such modifications.

  6. Audio fingerprint extraction for content identification

    NASA Astrophysics Data System (ADS)

    Shiu, Yu; Yeh, Chia-Hung; Kuo, C. C. J.

    2003-11-01

    In this work, we present an audio content identification system that identifies some unknown audio material by comparing its fingerprint with those extracted off-line and saved in the music database. We will describe in detail the procedure to extract audio fingerprints and demonstrate that they are robust to noise and content-preserving manipulations. The main feature in the proposed system is the zero-crossing rate extracted with the octave-band filter bank. The zero-crossing rate can be used to describe the dominant frequency in each subband with a very low computational cost. The size of audio fingerprint is small and can be efficiently stored along with the compressed files in the database. It is also robust to many modifications such as tempo change and time-alignment distortion. Besides, the octave-band filter bank is used to enhance the robustness to distortion, especially those localized on some frequency regions.
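    The main fingerprint feature, zero-crossing rate per octave band, can be sketched as follows; the band edges and filter order are illustrative choices.

```python
# Zero-crossing rate per octave band, the core fingerprint feature
# described above (band edges and filter order are assumptions).
import numpy as np
from scipy.signal import butter, sosfiltfilt

def octave_zcr(x, fs, f_low=62.5, n_bands=7):
    feats = []
    for k in range(n_bands):
        lo, hi = f_low * 2 ** k, f_low * 2 ** (k + 1)
        sos = butter(4, [lo, min(hi, 0.45 * fs)], btype="bandpass",
                     fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        zcr = np.mean(np.abs(np.diff(np.sign(band)))) / 2
        feats.append(zcr)
    return np.array(feats)

fs = 16000
x = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs) + 0.05 * np.random.randn(fs)
print(octave_zcr(x, fs))   # the band containing 1 kHz stands out
```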

  7. Combining Video, Audio and Lexical Indicators of Affect in Spontaneous Conversation via Particle Filtering

    PubMed Central

    Savran, Arman; Cao, Houwei; Shah, Miraj; Nenkova, Ani; Verma, Ragini

    2013-01-01

    We present experiments on fusing facial video, audio and lexical indicators for affect estimation during dyadic conversations. We use temporal statistics of texture descriptors extracted from facial video, a combination of various acoustic features, and lexical features to create regression based affect estimators for each modality. The single modality regressors are then combined using particle filtering, by treating these independent regression outputs as measurements of the affect states in a Bayesian filtering framework, where previous observations provide prediction about the current state by means of learned affect dynamics. Tested on the Audio-visual Emotion Recognition Challenge dataset, our single modality estimators achieve substantially higher scores than the official baseline method for every dimension of affect. Our filtering-based multi-modality fusion achieves correlation performance of 0.344 (baseline: 0.136) and 0.280 (baseline: 0.096) for the fully continuous and word level sub challenges, respectively. PMID:25300451

  8. Combining Video, Audio and Lexical Indicators of Affect in Spontaneous Conversation via Particle Filtering.

    PubMed

    Savran, Arman; Cao, Houwei; Shah, Miraj; Nenkova, Ani; Verma, Ragini

    2012-01-01

    We present experiments on fusing facial video, audio and lexical indicators for affect estimation during dyadic conversations. We use temporal statistics of texture descriptors extracted from facial video, a combination of various acoustic features, and lexical features to create regression based affect estimators for each modality. The single modality regressors are then combined using particle filtering, by treating these independent regression outputs as measurements of the affect states in a Bayesian filtering framework, where previous observations provide prediction about the current state by means of learned affect dynamics. Tested on the Audio-visual Emotion Recognition Challenge dataset, our single modality estimators achieve substantially higher scores than the official baseline method for every dimension of affect. Our filtering-based multi-modality fusion achieves correlation performance of 0.344 (baseline: 0.136) and 0.280 (baseline: 0.096) for the fully continuous and word level sub challenges, respectively.
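    A toy version of the fusion stage: a bootstrap particle filter tracks a one-dimensional affect state, treating per-modality regressor outputs as independent noisy measurements. The dynamics and noise levels below are assumptions, not the learned models of the paper.

```python
# Bootstrap particle filter fusing three per-modality measurement streams
# of a 1-D latent affect state (synthetic dynamics and noise levels).
import numpy as np

rng = np.random.default_rng(5)
T, n_particles = 100, 500
true = np.cumsum(0.1 * rng.normal(size=T))            # latent affect trajectory
meas = [true + s * rng.normal(size=T) for s in (0.3, 0.5, 0.8)]  # 3 modalities

particles = rng.normal(size=n_particles)
estimates = []
for t in range(T):
    particles += 0.1 * rng.normal(size=n_particles)   # stand-in for learned dynamics
    logw = np.zeros(n_particles)
    for z, s in zip(meas, (0.3, 0.5, 0.8)):           # fuse all modalities
        logw += -0.5 * ((z[t] - particles) / s) ** 2
    w = np.exp(logw - logw.max()); w /= w.sum()
    estimates.append(np.sum(w * particles))           # posterior mean estimate
    particles = particles[rng.choice(n_particles, n_particles, p=w)]  # resample

print("correlation with truth:", np.corrcoef(estimates, true)[0, 1])
```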

  9. Direct broadcast satellite-audio, portable and mobile reception tradeoffs

    NASA Technical Reports Server (NTRS)

    Golshan, Nasser

    1992-01-01

    This paper reports on the findings of a systems tradeoffs study on direct broadcast satellite-radio (DBS-R). Based on emerging advanced subband and transform audio coding systems, four ranges of bit rates (16-32 kbps, 48-64 kbps, 96-128 kbps, and 196-256 kbps) are identified for DBS-R. The corresponding grades of audio quality will be subjectively comparable to AM broadcasting, monophonic FM, stereophonic FM, and CD quality audio, respectively. The satellite EIRPs needed for mobile DBS-R reception in suburban areas are sufficient for portable reception in most single family houses when allowance is made for the higher G/T of portable table-top receivers. As an example, the variation of the space segment cost as a function of frequency, audio quality, coverage capacity, and beam size is explored for a typical DBS-R system.

  10. Feature Representations for Neuromorphic Audio Spike Streams.

    PubMed

    Anumula, Jithendar; Neil, Daniel; Delbruck, Tobi; Liu, Shih-Chii

    2018-01-01

    Event-driven neuromorphic spiking sensors such as the silicon retina and the silicon cochlea encode the external sensory stimuli as asynchronous streams of spikes across different channels or pixels. Combining state-of-the-art deep neural networks with the asynchronous outputs of these sensors has produced encouraging results on some datasets but remains challenging. While the lack of effective spiking networks to process the spike streams is one reason, the other is that the pre-processing methods required to convert the spike streams into the frame-based features needed by deep networks still require further investigation. This work investigates the effectiveness of synchronous and asynchronous frame-based features generated using spike count and constant event binning, in combination with a recurrent neural network, for solving a classification task on the N-TIDIGITS18 dataset. This spike-based dataset consists of recordings from the Dynamic Audio Sensor, a spiking silicon cochlea sensor, in response to the TIDIGITS audio dataset. We also propose a new pre-processing method which applies an exponential kernel to the output cochlea spikes so that the interspike timing information is better preserved. The results on the N-TIDIGITS18 dataset show that the exponential features perform better than the spike count features, with over 91% accuracy on the digit classification task. This accuracy corresponds to an improvement of at least 2.5% over the use of spike count features, establishing a new state of the art for this dataset.
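
    The exponential-kernel idea can be sketched compactly: each spike adds a unit impulse to its channel, the per-channel state decays as exp(-Δt/τ), and the state is sampled at fixed frame times. The time constant, frame period, and channel count below are illustrative assumptions:

```python
# Minimal sketch of exponential-kernel spike features: a causal kernel
# exp(-(t - t_spike)/tau) accumulated per channel and sampled at fixed
# frame times. tau and the frame rate are illustrative.
import numpy as np

def exponential_features(spike_times, spike_channels, n_channels,
                         frame_period=5e-3, tau=20e-3, duration=1.0):
    frames = np.arange(0.0, duration, frame_period)
    feats = np.zeros((len(frames), n_channels))
    state = np.zeros(n_channels)   # per-channel kernel sum, valid at time last_t
    last_t = 0.0
    events = iter(sorted(zip(spike_times, spike_channels)))
    ev = next(events, None)
    for i, ft in enumerate(frames):
        # Fold in all spikes up to this frame, decaying the state between them.
        while ev is not None and ev[0] <= ft:
            t, ch = ev
            state *= np.exp(-(t - last_t) / tau)   # decay to the spike time
            state[ch] += 1.0                       # kernel peak at the spike
            last_t = t
            ev = next(events, None)
        feats[i] = state * np.exp(-(ft - last_t) / tau)   # sample at frame time
    return feats

# Example: 64-channel cochlea spikes drawn at random over one second.
rng = np.random.default_rng(1)
times = np.sort(rng.uniform(0, 1, 2000))
chans = rng.integers(0, 64, 2000)
print(exponential_features(times, chans, 64).shape)   # (200, 64)
```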

  11. Feature Representations for Neuromorphic Audio Spike Streams

    PubMed Central

    Anumula, Jithendar; Neil, Daniel; Delbruck, Tobi; Liu, Shih-Chii

    2018-01-01

    Event-driven neuromorphic spiking sensors such as the silicon retina and the silicon cochlea encode the external sensory stimuli as asynchronous streams of spikes across different channels or pixels. Combining state-of-the-art deep neural networks with the asynchronous outputs of these sensors has produced encouraging results on some datasets but remains challenging. While the lack of effective spiking networks to process the spike streams is one reason, the other is that the pre-processing methods required to convert the spike streams into the frame-based features needed by deep networks still require further investigation. This work investigates the effectiveness of synchronous and asynchronous frame-based features generated using spike count and constant event binning, in combination with a recurrent neural network, for solving a classification task on the N-TIDIGITS18 dataset. This spike-based dataset consists of recordings from the Dynamic Audio Sensor, a spiking silicon cochlea sensor, in response to the TIDIGITS audio dataset. We also propose a new pre-processing method which applies an exponential kernel to the output cochlea spikes so that the interspike timing information is better preserved. The results on the N-TIDIGITS18 dataset show that the exponential features perform better than the spike count features, with over 91% accuracy on the digit classification task. This accuracy corresponds to an improvement of at least 2.5% over the use of spike count features, establishing a new state of the art for this dataset. PMID:29479300

  12. Enhancing Navigation Skills through Audio Gaming.

    PubMed

    Sánchez, Jaime; Sáenz, Mauricio; Pascual-Leone, Alvaro; Merabet, Lotfi

    2010-01-01

    We present the design, development and initial cognitive evaluation of an Audio-based Environment Simulator (AbES). This software allows a blind user to navigate through a virtual representation of a real space for the purposes of training orientation and mobility skills. Our findings indicate that users feel satisfied and self-confident when interacting with the audio-based interface, and the embedded sounds allow them to correctly orient themselves and navigate within the virtual world. Furthermore, users are able to transfer spatial information acquired through virtual interactions into real world navigation and problem solving tasks.

  13. Enhancing Navigation Skills through Audio Gaming

    PubMed Central

    Sánchez, Jaime; Sáenz, Mauricio; Pascual-Leone, Alvaro; Merabet, Lotfi

    2014-01-01

    We present the design, development and initial cognitive evaluation of an Audio-based Environment Simulator (AbES). This software allows a blind user to navigate through a virtual representation of a real space for the purposes of training orientation and mobility skills. Our findings indicate that users feel satisfied and self-confident when interacting with the audio-based interface, and the embedded sounds allow them to correctly orient themselves and navigate within the virtual world. Furthermore, users are able to transfer spatial information acquired through virtual interactions into real world navigation and problem solving tasks. PMID:25505796

  14. Influence of audio triggered emotional attention on video perception

    NASA Astrophysics Data System (ADS)

    Torres, Freddy; Kalva, Hari

    2014-02-01

    Perceptual video coding methods attempt to improve compression efficiency by discarding visual information not perceived by end users. Most current approaches to perceptual video coding use only visual features and ignore the auditory component. Many psychophysical studies have demonstrated that auditory stimuli affect our visual perception. In this paper we present our study of audio-triggered emotional attention and its applicability to perceptual video coding. Experiments with movie clips show that the reaction time to detect video compression artifacts was longer when video was presented with the audio information. The results reported are statistically significant with p=0.024.

  15. Advances in audio source separation and multisource audio content retrieval

    NASA Astrophysics Data System (ADS)

    Vincent, Emmanuel

    2012-06-01

    Audio source separation aims to extract the signals of individual sound sources from a given recording. In this paper, we review three recent advances which improve the robustness of source separation in real-world challenging scenarios and enable its use for multisource content retrieval tasks, such as automatic speech recognition (ASR) or acoustic event detection (AED) in noisy environments. We present a Flexible Audio Source Separation Toolkit (FASST) and discuss its advantages compared to earlier approaches such as independent component analysis (ICA) and sparse component analysis (SCA). We explain how cues as diverse as harmonicity, spectral envelope, temporal fine structure or spatial location can be jointly exploited by this toolkit. We subsequently present the uncertainty decoding (UD) framework for the integration of audio source separation and audio content retrieval. We show how the uncertainty about the separated source signals can be accurately estimated and propagated to the features. Finally, we explain how this uncertainty can be efficiently exploited by a classifier, both at the training and the decoding stage. We illustrate the resulting performance improvements in terms of speech separation quality and speaker recognition accuracy.

  16. Audio-guided audiovisual data segmentation, indexing, and retrieval

    NASA Astrophysics Data System (ADS)

    Zhang, Tong; Kuo, C.-C. Jay

    1998-12-01

    While current approaches for video segmentation and indexing are mostly focused on visual information, audio signals may actually play a primary role in video content parsing. In this paper, we present an approach for automatic segmentation, indexing, and retrieval of audiovisual data, based on audio content analysis. The accompanying audio signal of audiovisual data is first segmented and classified into basic types, i.e., speech, music, environmental sound, and silence. This coarse-level segmentation and indexing step is based upon morphological and statistical analysis of several short-term features of the audio signals. Then, environmental sounds are classified into finer classes, such as applause, explosions, bird sounds, etc. This fine-level classification and indexing step is based upon time-frequency analysis of audio signals and the use of the hidden Markov model as the classifier. On top of this archiving scheme, an audiovisual data retrieval system is proposed. Experimental results show that the proposed approach has an accuracy rate higher than 90 percent for the coarse-level classification, and higher than 85 percent for the fine-level classification. Examples of audiovisual data segmentation and retrieval are also provided.
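
    As a sketch of the coarse-level step, the snippet below computes the kind of short-term features (energy and zero-crossing rate) that such audio-guided segmentation typically thresholds; the thresholds and decision rule are illustrative assumptions, not the paper's trained classifier:

```python
# Short-term energy and ZCR for coarse speech/music/silence labeling.
# Frame sizes and thresholds are illustrative assumptions.
import numpy as np

def short_term_features(x, sr, frame_ms=25, hop_ms=10):
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    energies, zcrs = [], []
    for s in range(0, len(x) - frame + 1, hop):
        seg = x[s:s + frame]
        energies.append(float(np.mean(seg ** 2)))                      # short-term energy
        zcrs.append(np.count_nonzero(np.diff(np.signbit(seg))) / frame)  # ZCR per frame
    return np.array(energies), np.array(zcrs)

def coarse_label(energy, zcr, e_sil=1e-4, z_speech=0.1):
    # Silence: low energy. Speech: high ZCR. Otherwise: music/environmental sound.
    if energy < e_sil:
        return "silence"
    return "speech" if zcr > z_speech else "music/environmental"

sr = 16000
noise = np.random.default_rng(2).normal(0, 0.1, sr)
e, z = short_term_features(noise, sr)
print(coarse_label(e.mean(), z.mean()))
```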

  17. Astronaut Garneau working with Audio Control System panel

    NASA Image and Video Library

    1996-06-05

    STS077-392-007 (19-29 May 1996) --- Inside the Spacehab Module onboard the Earth-orbiting Space Shuttle Endeavour, Canadian astronaut Marc Garneau, mission specialist, joins astronaut Curtis L. Brown, Jr., pilot, in checking out the audio control system for Spacehab. The two joined four other NASA astronauts for nine days of research and experimentation in Earth-orbit.

  18. 47 CFR 73.757 - System specifications for single-sideband (SSB) modulated emissions in the HF broadcasting service.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... dB per octave. (4) Modulation processing. If audio-frequency signal processing is used, the dynamic... broadcasting service. (a) System parameters—(1) Channel spacing. In a mixed DSB, SSB and digital environment... emission is one giving the same audio-frequency signal-to-noise ratio at the receiver output as the...

  19. Agency Video, Audio and Imagery Library

    NASA Technical Reports Server (NTRS)

    Grubbs, Rodney

    2015-01-01

    The purpose of this presentation was to inform the ISS International Partners of the new NASA Agency Video, Audio and Imagery Library (AVAIL) website. AVAIL is a new resource for the public to search for and download NASA-related imagery, and is not intended to replace the current process by which the International Partners receive their Space Station imagery products.

  20. Virtual environment display for a 3D audio room simulation

    NASA Technical Reports Server (NTRS)

    Chapin, William L.; Foster, Scott H.

    1992-01-01

    The development of a virtual environment simulation system integrating a 3D acoustic audio model with an immersive 3D visual scene is discussed. The system complements the acoustic model and is specified to: allow the listener to freely move about the space, a room of manipulable size, shape, and audio character, while interactively relocating the sound sources; reinforce the listener's feeling of telepresence in the acoustical environment with visual and proprioceptive sensations; enhance the audio with the graphic and interactive components, rather than overwhelm or reduce it; and serve as a research testbed and technology transfer demonstration. The hardware/software design of two demonstration systems, one installed and one portable, are discussed through the development of four iterative configurations.

  1. Integration of advanced teleoperation technologies for control of space robots

    NASA Technical Reports Server (NTRS)

    Stagnaro, Michael J.

    1993-01-01

    Teleoperated robots require one or more humans to control actuators, mechanisms, and other robot equipment given feedback from onboard sensors. To accomplish this task, the human or humans require some form of control station. Desirable features of such a control station include operation by a single human, comfort, and natural human interfaces (visual, audio, motion, tactile, etc.). These interfaces should work to maximize performance of the human/robot system by streamlining the link between human brain and robot equipment. This paper describes development of a control station testbed with the characteristics described above. Initially, this testbed will be used to control two teleoperated robots. Features of the robots include anthropomorphic mechanisms, slaving to the testbed, and delivery of sensory feedback to the testbed. The testbed will make use of technologies such as helmet mounted displays, voice recognition, and exoskeleton masters. It will allow for integration and testing of emerging telepresence technologies along with techniques for coping with control link time delays. Systems developed from this testbed could be applied to ground control of space based robots. During man-tended operations, the Space Station Freedom may benefit from ground control of IVA or EVA robots with science or maintenance tasks. Planetary exploration may also find advanced teleoperation systems to be very useful.

  2. Audio-visual biofeedback for respiratory-gated radiotherapy: Impact of audio instruction and audio-visual biofeedback on respiratory-gated radiotherapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    George, Rohini; Department of Biomedical Engineering, Virginia Commonwealth University, Richmond, VA; Chung, Theodore D.

    2006-07-01

    Purpose: Respiratory gating is a commercially available technology for reducing the deleterious effects of motion during imaging and treatment. The efficacy of gating is dependent on the reproducibility within and between respiratory cycles during imaging and treatment. The aim of this study was to determine whether audio-visual biofeedback can improve respiratory reproducibility by decreasing residual motion and therefore increasing the accuracy of gated radiotherapy. Methods and Materials: A total of 331 respiratory traces were collected from 24 lung cancer patients. The protocol consisted of five breathing training sessions spaced about a week apart. Within each session the patients initially breathed without any instruction (free breathing), with audio instructions, and with audio-visual biofeedback. Residual motion was quantified by the standard deviation of the respiratory signal within the gating window. Results: Audio-visual biofeedback significantly reduced residual motion compared with free breathing and audio instruction. Displacement-based gating has lower residual motion than phase-based gating. Little reduction in residual motion was found for duty cycles less than 30%; for duty cycles above 50% there was a sharp increase in residual motion. Conclusions: The efficiency and reproducibility of gating can be improved by incorporating audio-visual biofeedback, using a 30-50% duty cycle, gating during exhalation, and using displacement-based gating.
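
    The residual-motion metric itself is easy to reproduce. The sketch below computes the standard deviation of a respiratory displacement trace within a displacement-based gating window around end-exhale; the gating rule is a simplified assumption for illustration:

```python
# Residual motion as described in the study: the standard deviation of the
# respiratory displacement signal within the gating window. The gating rule
# (fraction of samples nearest end-exhale) is a simplifying assumption.
import numpy as np

def residual_motion(disp, duty_cycle=0.4):
    """disp: 1-D respiratory displacement trace. Gate the fraction of samples
    closest to the end-exhale (minimum) position and return their std."""
    n_gated = max(1, int(duty_cycle * len(disp)))
    gated = np.sort(disp)[:n_gated]   # samples nearest end-exhale
    return gated.std()

t = np.linspace(0, 60, 6000)                     # one minute at 100 Hz
trace = 0.5 * (1 - np.cos(2 * np.pi * t / 4))    # 4 s breathing cycle, exhale at 0
for dc in (0.2, 0.3, 0.5, 0.7):
    # Residual motion grows with duty cycle, as the abstract reports.
    print(dc, round(residual_motion(trace, dc), 3))
```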

  3. "Why would I want to go out?": Age-related Vision Loss and Social Participation.

    PubMed

    Laliberte Rudman, Debbie; Gold, Deborah; McGrath, Colleen; Zuvela, Biljana; Spafford, Marlee M; Renwick, Rebecca

    2016-12-01

    Social participation, a key determinant of healthy aging, is often negatively impacted by age-related vision loss (ARVL). This grounded theory study aimed to understand social participation as a process negotiated in everyday life by older adults with ARVL. Interviews, audio diaries, and life space maps were used to collect data with 21 older adults in two Ontario cities. Inductive data analysis resulted in a transactional model of the process of negotiating social participation in context. This model depicts how environmental features and resources, skills and abilities, and risks and vulnerabilities transacted with values and priorities to affect if and how social participation occurred within the context of daily life. The findings point to several ways that research and services addressing the social participation of older adults with ARVL need to expand, particularly in relation to environmental features and resources, risk, and the prioritization of independence.

  4. Resonance-Based Time-Frequency Manifold for Feature Extraction of Ship-Radiated Noise.

    PubMed

    Yan, Jiaquan; Sun, Haixin; Chen, Hailan; Junejo, Naveed Ur Rehman; Cheng, En

    2018-03-22

    In this paper, a novel time-frequency signature using resonance-based sparse signal decomposition (RSSD), phase space reconstruction (PSR), time-frequency distribution (TFD) and manifold learning is proposed for feature extraction of ship-radiated noise, which is called resonance-based time-frequency manifold (RTFM). This is suitable for analyzing signals with oscillatory, non-stationary and non-linear characteristics in a situation of serious noise pollution. Unlike the traditional methods which are sensitive to noise and just consider one side of oscillatory, non-stationary and non-linear characteristics, the proposed RTFM can provide the intact feature signature of all these characteristics in the form of a time-frequency signature by the following steps: first, RSSD is employed on the raw signal to extract the high-oscillatory component and abandon the low-oscillatory component. Second, PSR is performed on the high-oscillatory component to map the one-dimensional signal to the high-dimensional phase space. Third, TFD is employed to reveal non-stationary information in the phase space. Finally, manifold learning is applied to the TFDs to fetch the intrinsic non-linear manifold. A proportional addition of the top two RTFMs is adopted to produce the improved RTFM signature. All of the case studies are validated on real audio recordings of ship-radiated noise. Case studies of ship-radiated noise on different datasets and various degrees of noise pollution manifest the effectiveness and robustness of the proposed method.
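
    The phase space reconstruction (PSR) step is the easiest to make concrete: a one-dimensional signal is mapped to points in a higher-dimensional phase space by time-delay embedding. The delay and embedding dimension below are illustrative assumptions, not the paper's values:

```python
# Minimal phase space reconstruction (time-delay embedding) sketch.
import numpy as np

def phase_space_reconstruction(x, dim=3, delay=8):
    """Map a 1-D signal to points in a dim-dimensional phase space:
    X[i] = (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    n = len(x) - (dim - 1) * delay
    return np.stack([x[i * delay:i * delay + n] for i in range(dim)], axis=1)

x = np.sin(np.linspace(0, 20 * np.pi, 2000))
print(phase_space_reconstruction(x).shape)   # (1984, 3)
```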

  5. Resonance-Based Time-Frequency Manifold for Feature Extraction of Ship-Radiated Noise

    PubMed Central

    Yan, Jiaquan; Sun, Haixin; Chen, Hailan; Junejo, Naveed Ur Rehman; Cheng, En

    2018-01-01

    In this paper, a novel time-frequency signature using resonance-based sparse signal decomposition (RSSD), phase space reconstruction (PSR), time-frequency distribution (TFD) and manifold learning is proposed for feature extraction of ship-radiated noise, which is called resonance-based time-frequency manifold (RTFM). This is suitable for analyzing signals with oscillatory, non-stationary and non-linear characteristics in a situation of serious noise pollution. Unlike the traditional methods which are sensitive to noise and just consider one side of oscillatory, non-stationary and non-linear characteristics, the proposed RTFM can provide the intact feature signature of all these characteristics in the form of a time-frequency signature by the following steps: first, RSSD is employed on the raw signal to extract the high-oscillatory component and abandon the low-oscillatory component. Second, PSR is performed on the high-oscillatory component to map the one-dimensional signal to the high-dimensional phase space. Third, TFD is employed to reveal non-stationary information in the phase space. Finally, manifold learning is applied to the TFDs to fetch the intrinsic non-linear manifold. A proportional addition of the top two RTFMs is adopted to produce the improved RTFM signature. All of the case studies are validated on real audio recordings of ship-radiated noise. Case studies of ship-radiated noise on different datasets and various degrees of noise pollution manifest the effectiveness and robustness of the proposed method. PMID:29565288

  6. Design of batch audio/video conversion platform based on JavaEE

    NASA Astrophysics Data System (ADS)

    Cui, Yansong; Jiang, Lianpin

    2018-03-01

    With the rapid development of the digital publishing industry, audio/video publishing is marked by a diversity of coding standards for audio and video files, massive data volumes, and other significant features. Faced with such massive and diverse data, converting it quickly and efficiently into a unified coding format poses great difficulties for digital publishing organizations. In view of this demand, this paper proposes a distributed online audio and video format conversion platform with a B/S structure, built on the Spring+SpringMVC+MyBatis development architecture and combined with the open-source FFMPEG format conversion tool. Based on the Java language, the key technologies and strategies in the platform architecture are analyzed, and an efficient audio and video format conversion system is designed and developed, composed of a front-end display system, a core scheduling server, and conversion servers. The test results show that, compared with an ordinary audio and video conversion scheme, the batch conversion platform can effectively improve the conversion efficiency of audio and video files and reduce the complexity of the work. Practice has proved that the key technologies discussed in this paper can be applied in the field of large-batch file processing and have practical application value.
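
    Although the platform itself is implemented in Java, the heart of a conversion server reduces to invoking FFMPEG. A minimal Python sketch of such a batch transcode loop follows; the target codecs and directory layout are assumptions for illustration only:

```python
# Sketch of a conversion-server core: shelling out to FFMPEG to transcode a
# batch of files into one unified format. Codec choices are illustrative.
import subprocess
from pathlib import Path

def convert_batch(src_dir, dst_dir, target_ext=".mp4"):
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for f in Path(src_dir).iterdir():
        if not f.is_file():
            continue
        out = dst / (f.stem + target_ext)
        # -y: overwrite; H.264 video + AAC audio as the unified coding format.
        subprocess.run(["ffmpeg", "-y", "-i", str(f),
                        "-c:v", "libx264", "-c:a", "aac", str(out)],
                       check=True)

# convert_batch("incoming/", "converted/")
```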

  7. Speech Music Discrimination Using Class-Specific Features

    DTIC Science & Technology

    2004-08-01

    Speech Music Discrimination Using Class-Specific Features. Thomas Beierholm ... between speech and music. Feature extraction is class-specific and can therefore be tailored to each class, meaning that segment size, model orders ... of interest. Some of the applications of audio signal classification are speech/music classification [1], acoustical environmental classification [2][3] ...

  8. Large-Scale Pattern Discovery in Music

    NASA Astrophysics Data System (ADS)

    Bertin-Mahieux, Thierry

    This work focuses on extracting patterns in musical data from very large collections. The problem is split in two parts. First, we build such a large collection, the Million Song Dataset, to provide researchers access to commercial-size datasets. Second, we use this collection to study cover song recognition which involves finding harmonic patterns from audio features. Regarding the Million Song Dataset, we detail how we built the original collection from an online API, and how we encouraged other organizations to participate in the project. The result is the largest research dataset with heterogeneous sources of data available to music technology researchers. We demonstrate some of its potential and discuss the impact it already has on the field. On cover song recognition, we must revisit the existing literature since there are no publicly available results on a dataset of more than a few thousand entries. We present two solutions to tackle the problem, one using a hashing method, and one using a higher-level feature computed from the chromagram (dubbed the 2DFTM). We further investigate the 2DFTM since it has potential to be a relevant representation for any task involving audio harmonic content. Finally, we discuss the future of the dataset and the hope of seeing more work making use of the different sources of data that are linked in the Million Song Dataset. Regarding cover songs, we explain how this might be a first step towards defining a harmonic manifold of music, a space where harmonic similarities between songs would be more apparent.
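
    The 2DFTM can be sketched in a few lines: take the magnitude of the 2-D Fourier transform over fixed-length chromagram patches, which discards phase and thereby gains invariance to shifts in time and key. The patch length and the use of librosa below are assumptions for illustration:

```python
# Sketch of the 2DFTM idea: |FFT2| of chromagram patches.
import numpy as np
import librosa

def twod_ftm(y, sr, patch_len=75, hop=75):
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)   # (12, n_frames)
    feats = []
    for s in range(0, chroma.shape[1] - patch_len + 1, hop):
        patch = chroma[:, s:s + patch_len]
        # Discarding phase gives invariance to rotations in time and pitch.
        feats.append(np.abs(np.fft.fft2(patch)).ravel())
    return np.array(feats)

# Example on one of librosa's bundled recordings.
y, sr = librosa.load(librosa.ex('trumpet'))
print(twod_ftm(y, sr).shape)
```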

  9. INSPIRE

    NASA Technical Reports Server (NTRS)

    Taylor, Bill; Pine, Bill

    2003-01-01

    INSPIRE (Interactive NASA Space Physics Ionosphere Radio Experiment - http://image.gsfc.nasa.gov/poetry/inspire) is a non-profit scientific, educational organization whose objective is to bring the excitement of observing natural and manmade radio waves in the audio region to high school students and others. The project consists of building an audio frequency radio receiver kit, making observations of natural and manmade radio waves and analyzing the data. Students also learn about NASA and our natural environment through the study of lightning, the source of many of the audio frequency waves, the atmosphere, the ionosphere, and the magnetosphere where the waves travel.

  10. Audio signal encryption using chaotic Hénon map and lifting wavelet transforms

    NASA Astrophysics Data System (ADS)

    Roy, Animesh; Misra, A. P.

    2017-12-01

    We propose an audio signal encryption scheme based on the chaotic Hénon map. The scheme comprises two phases: a preprocessing stage, in which the audio signal is transformed into data by the lifting wavelet scheme, and an encryption stage, in which the transformed data are encrypted using a chaotic data set and hyperbolic functions. Furthermore, we use dynamic keys and consider the key space to be large enough to resist any kind of cryptographic attack. A statistical investigation is also made to test the security and efficiency of the proposed scheme.
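
    A minimal sketch of the keystream side of such a scheme follows, assuming the classic Hénon parameters a=1.4, b=0.3 and a crude byte-extraction rule; the paper's actual quantization and hyperbolic-function step are not reproduced here:

```python
# Hénon-map keystream sketch with XOR encryption of wavelet-transformed data.
import numpy as np

def henon_keystream(n, x0=0.1, y0=0.3, a=1.4, b=0.3, burn=1000):
    x, y = x0, y0
    out = np.empty(n, dtype=np.uint8)
    for i in range(-burn, n):                 # discard transient iterations
        x, y = 1 - a * x * x + y, b * x       # Hénon map update
        if i >= 0:
            out[i] = int(abs(x) * 1e6) % 256  # crude byte extraction from the orbit
    return out

def encrypt(data: bytes, key=(0.1, 0.3)) -> bytes:
    ks = henon_keystream(len(data), *key)     # the initial point acts as the key
    return bytes(np.bitwise_xor(np.frombuffer(data, np.uint8), ks))

msg = b"wavelet-transformed audio data"
ct = encrypt(msg)
print(encrypt(ct) == msg)   # an XOR stream cipher is its own inverse -> True
```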

  11. Synchronized and noise-robust audio recordings during realtime magnetic resonance imaging scans.

    PubMed

    Bresch, Erik; Nielsen, Jon; Nayak, Krishna; Narayanan, Shrikanth

    2006-10-01

    This letter describes a data acquisition setup for recording, and processing, running speech from a person in a magnetic resonance imaging (MRI) scanner. The main focus is on ensuring synchronicity between image and audio acquisition, and in obtaining good signal to noise ratio to facilitate further speech analysis and modeling. A field-programmable gate array based hardware design for synchronizing the scanner image acquisition to other external data such as audio is described. The audio setup itself features two fiber optical microphones and a noise-canceling filter. Two noise cancellation methods are described including a novel approach using a pulse sequence specific model of the gradient noise of the MRI scanner. The setup is useful for scientific speech production studies. Sample results of speech and singing data acquired and processed using the proposed method are given.
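
    One standard way to realize a noise-canceling filter with a reference channel (here a stand-in for the gradient-noise model) is an LMS adaptive filter. The sketch below is a generic illustration, not the authors' pulse-sequence-specific method; the tap count and step size are assumptions:

```python
# Generic LMS adaptive noise canceller: a reference noise channel is
# filtered to match and subtract the noise leaking into the speech channel.
import numpy as np

def lms_cancel(primary, reference, taps=64, mu=0.01):
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(taps, len(primary)):
        x = reference[n - taps:n][::-1]   # most recent reference samples
        y = w @ x                         # current noise estimate
        e = primary[n] - y                # error = cleaned speech sample
        w += mu * e * x                   # LMS weight update
        out[n] = e
    return out

rng = np.random.default_rng(3)
noise = rng.normal(0, 1, 16000)
speech = np.sin(2 * np.pi * 200 * np.arange(16000) / 16000)
primary = speech + np.convolve(noise, [0.6, 0.3, 0.1], mode='same')
cleaned = lms_cancel(primary, noise)
# After convergence this should be far below the injected noise power (~0.46).
print(np.mean((cleaned[1000:] - speech[1000:]) ** 2))
```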

  12. Synchronized and noise-robust audio recordings during realtime magnetic resonance imaging scans (L)

    PubMed Central

    Bresch, Erik; Nielsen, Jon; Nayak, Krishna; Narayanan, Shrikanth

    2007-01-01

    This letter describes a data acquisition setup for recording, and processing, running speech from a person in a magnetic resonance imaging (MRI) scanner. The main focus is on ensuring synchronicity between image and audio acquisition, and in obtaining good signal to noise ratio to facilitate further speech analysis and modeling. A field-programmable gate array based hardware design for synchronizing the scanner image acquisition to other external data such as audio is described. The audio setup itself features two fiber optical microphones and a noise-canceling filter. Two noise cancellation methods are described including a novel approach using a pulse sequence specific model of the gradient noise of the MRI scanner. The setup is useful for scientific speech production studies. Sample results of speech and singing data acquired and processed using the proposed method are given. PMID:17069275

  13. Development of the ISS EMU Dashboard Software

    NASA Technical Reports Server (NTRS)

    Bernard, Craig; Hill, Terry R.

    2011-01-01

    The EMU (Extra-Vehicular Mobility Unit) Dashboard was developed at NASA's Johnson Space Center to aid in real-time mission support for the ISS (International Space Station) and Shuttle EMU space suit by time-synchronizing down-linked video, space suit data, and audio from the mission control audio loops. Once the input streams are synchronized and recorded, the data can be replayed almost instantly and has proven invaluable in understanding in-flight hardware anomalies and in playing back information conveyed by the crew to mission control and the back-room support. This paper will walk through the development, from an engineer's idea brought to life by an intern to real-time mission support, and describe how this tool is evolving today and its challenges in supporting EVAs (Extra-Vehicular Activities) and human exploration in the 21st century.

  14. Sounds of silence: How to animate virtual worlds with sound

    NASA Technical Reports Server (NTRS)

    Astheimer, Peter

    1993-01-01

    Sounds are an integral and sometimes annoying part of our daily life. Virtual worlds which imitate natural environments gain a lot of authenticity from fast, high quality visualization combined with sound effects. Sounds help to increase the degree of immersion for human dwellers in imaginary worlds significantly. The virtual reality toolkit of IGD (Institute for Computer Graphics) features a broad range of standard visual and advanced real-time audio components which interpret an object-oriented definition of the scene. The virtual reality system 'Virtual Design' realized with the toolkit enables the designer of virtual worlds to create a true audiovisual environment. Several examples on video demonstrate the usage of the audio features in Virtual Design.

  15. Automated Cough Assessment on a Mobile Platform

    PubMed Central

    2014-01-01

    The development of an Automated System for Asthma Monitoring (ADAM) is described. This consists of a consumer electronics mobile platform running a custom application. The application acquires an audio signal from an external user-worn microphone connected to the device analog-to-digital converter (microphone input). This signal is processed to determine the presence or absence of cough sounds. Symptom tallies and raw audio waveforms are recorded and made easily accessible for later review by a healthcare provider. The symptom detection algorithm is based upon standard speech recognition and machine learning paradigms and consists of an audio feature extraction step followed by a Hidden Markov Model based Viterbi decoder that has been trained on a large database of audio examples from a variety of subjects. Multiple Hidden Markov Model topologies and orders are studied. Performance of the recognizer is presented in terms of the sensitivity and the rate of false alarm as determined in a cross-validation test. PMID:25506590
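
    The recognizer's overall shape, audio features decoded by a Gaussian HMM, can be sketched with librosa and hmmlearn (both assumed here). The real system is trained on a large labeled cough database; the snippet fits an unsupervised two-state model purely to show the mechanics of the feature-extraction and Viterbi-decoding pipeline:

```python
# MFCC features + Gaussian HMM Viterbi decoding, the broad shape of the
# ADAM recognizer. Unsupervised fitting here is for illustration only.
import numpy as np
import librosa
from hmmlearn import hmm

y, sr = librosa.load(librosa.ex('trumpet'))
feats = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # (frames, 13)

model = hmm.GaussianHMM(n_components=2, covariance_type='diag', n_iter=20)
model.fit(feats)                 # in practice: supervised training per sound class
states = model.predict(feats)    # Viterbi state sequence over the frames
print(np.bincount(states))       # frames assigned to each "sound class"
```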

  16. Hierarchical structure for audio-video based semantic classification of sports video sequences

    NASA Astrophysics Data System (ADS)

    Kolekar, M. H.; Sengupta, S.

    2005-07-01

    A hierarchical structure for sports event classification based on audio and video content analysis is proposed in this paper. Compared to the event classifications in other games, those of cricket are very challenging and yet unexplored. We have successfully solved cricket video classification problem using a six level hierarchical structure. The first level performs event detection based on audio energy and Zero Crossing Rate (ZCR) of short-time audio signal. In the subsequent levels, we classify the events based on video features using a Hidden Markov Model implemented through Dynamic Programming (HMM-DP) using color or motion as a likelihood function. For some of the game-specific decisions, a rule-based classification is also performed. Our proposed hierarchical structure can easily be applied to any other sports. Our results are very promising and we have moved a step forward towards addressing semantic classification problems in general.

  17. Space.

    ERIC Educational Resources Information Center

    Web Feet K-8, 2001

    2001-01-01

    This annotated subject guide to Web sites and additional resources focuses on space and astronomy. Specifies age levels for resources that include Web sites, CD-ROMS and software, videos, books, audios, and magazines; offers professional resources; and presents a relevant class activity. (LRW)

  18. [A magnetic therapy apparatus with an adaptable electromagnetic spectrum for the treatment of prostatitis and gynecopathies].

    PubMed

    Kuz'min, A A; Meshkovskiĭ, D V; Filist, S A

    2008-01-01

    Problems of engineering and algorithm development of magnetic therapy apparatuses with pseudo-random radiation spectrum within the audio range for treatment of prostatitis and gynecopathies are considered. A typical design based on a PIC 16F microcontroller is suggested. It includes a keyboard, LCD indicator, audio amplifier, inducer, and software units. The problem of pseudo-random signal generation within the audio range is considered. A series of rectangular pulses is generated on a random-length interval on the basis of a three-component random vector. This series provides the required spectral characteristics of the therapeutic magnetic field and their adaptation to the therapeutic conditions and individual features of the patient.

  19. "Travelers In The Night" in the Old and New Media

    NASA Astrophysics Data System (ADS)

    Grauer, Albert D.

    2015-11-01

    "Travelers in the Night" is a series of 2 minute audio programs based on current research in astronomy and the space sciences.After more than a year of submitting “Travelers In The Night” 2 minute audio pieces to NPR and Community Radio stations with limited success, a parallel effort was initiated by posting the pieces as audio podcasts on Spreaker.com and iTunes.The classic media dispenses programming whose content and schedule is determined by editors and station managers. Riding the wave of new technology, people from every demographic group across the globe are selecting what, when, and how they receive information and entertainment. This change is significant with the Pew Research Center reporting that currently more than 60% of Facebook and Twitter users now get their news and/or links to stories from these sources. What remains constant is the public’s interest in astronomy and space.This poster presents relevant statistics and a discussion of the initial results of these two parallel efforts.

  20. Distance Learning as a Training and Education Tool.

    ERIC Educational Resources Information Center

    Hosley, David L.; Randolph, Sherry L.

    Lockheed Space Operations Company's Technical Training Department provides certification classes to personnel at other National Aeronautics and Space Administration (NASA) Centers. Courses are delivered over the Kennedy Space Center's Video Teleconferencing System (ViTS). The ViTS system uses two-way compressed video and two-way audio between…

  1. Acoustic Calibration of the Exterior Effects Room at the NASA Langley Research Center

    NASA Technical Reports Server (NTRS)

    Faller, Kenneth J., II; Rizzi, Stephen A.; Klos, Jacob; Chapin, William L.; Surucu, Fahri; Aumann, Aric R.

    2010-01-01

    The Exterior Effects Room (EER) at the NASA Langley Research Center is a 39-seat auditorium built for psychoacoustic studies of aircraft community noise. The original reproduction system employed monaural playback and hence lacked sound localization capability. In an effort to more closely recreate field test conditions, a significant upgrade was undertaken to allow simulation of a three-dimensional audio and visual environment. The 3D audio system consists of 27 mid and high frequency satellite speakers and 4 subwoofers, driven by a real-time audio server running an implementation of Vector Base Amplitude Panning. The audio server is part of a larger simulation system, which controls the audio and visual presentation of recorded and synthesized aircraft flyovers. The focus of this work is on the calibration of the 3D audio system, including gains used in the amplitude panning algorithm, speaker equalization, and absolute gain control. Because the speakers are installed in an irregularly shaped room, the speaker equalization includes time delay and gain compensation due to different mounting distances from the focal point, filtering for color compensation due to different installations (half space, corner, baffled/unbaffled), and cross-over filtering.
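
    The amplitude-panning gains at the heart of such a system are easy to illustrate. In two dimensions, Vector Base Amplitude Panning expresses the source direction in the basis of two loudspeaker unit vectors and power-normalizes the result; the speaker angles below are illustrative, not the EER's installed layout:

```python
# Minimal 2-D VBAP sketch: solve L g = p for the gains, then normalize.
import numpy as np

def vbap2(source_deg, spk1_deg, spk2_deg):
    unit = lambda a: np.array([np.cos(np.radians(a)), np.sin(np.radians(a))])
    L = np.column_stack([unit(spk1_deg), unit(spk2_deg)])  # speaker basis
    g = np.linalg.solve(L, unit(source_deg))               # p = L g
    return g / np.linalg.norm(g)                           # constant-power normalization

# Source at 15 deg between speakers at -30/+30 deg -> larger second gain.
print(vbap2(15, -30, 30))
```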

  2. An Overview of Audacity

    ERIC Educational Resources Information Center

    Thompson, Douglas Earl

    2014-01-01

    This article is an overview of the open source audio-editing and -recording program, Audacity. Key features are noted, along with significant features not included in the program. A number of music and music technology concepts are identified that could be taught and/or reinforced through using Audacity.

  3. Emotion detection model of Filipino music

    NASA Astrophysics Data System (ADS)

    Noblejas, Kathleen Alexis; Isidro, Daryl Arvin; Samonte, Mary Jane C.

    2017-02-01

    This research explored the creation of a model to detect emotion in Filipino songs. The emotion model used was based on Paul Ekman's six basic emotions. The songs were classified into the following genres: kundiman, novelty, pop, and rock. The songs were annotated by a group of music experts based on the emotion the song induces in the listener. Musical features of the songs were extracted using jAudio, while the lyric features were extracted by a Bag-of-Words feature representation. The audio and lyric features of the Filipino songs were extracted for classification by three chosen classifiers: Naïve Bayes, Support Vector Machines, and k-Nearest Neighbors. The goal of the research was to determine which classifier works best for Filipino music. Evaluation was done by 10-fold cross-validation, and accuracy, precision, recall, and F-measure results were compared. The models were also tested with unknown test data to further determine their accuracy through the prediction results.
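
    As a sketch of the lyric branch, the snippet below pushes Bag-of-Words features through the three classifier families compared in the study, using scikit-learn (an assumption; the study's jAudio audio branch is not reproduced, and the toy lyrics and labels are invented purely for illustration):

```python
# Bag-of-Words lyric features into the three classifiers the study compares.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

lyrics = ["sayaw tayo buong gabi", "tawanan at saya kasama ka",
          "luha at lungkot ngayong gabi", "iniwan mo ako sa dilim"]
labels = ["joy", "joy", "sadness", "sadness"]   # toy expert annotations

X = CountVectorizer().fit_transform(lyrics)
for clf in (MultinomialNB(), SVC(), KNeighborsClassifier(n_neighbors=1)):
    # The study used 10-fold CV; 2-fold here because the toy set is tiny.
    print(type(clf).__name__, cross_val_score(clf, X, labels, cv=2).mean())
```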

  4. Multisensory and modality specific processing of visual speech in different regions of the premotor cortex

    PubMed Central

    Callan, Daniel E.; Jones, Jeffery A.; Callan, Akiko

    2014-01-01

    Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex (PMC) has been shown to be active during both observation and execution of action (“Mirror System” properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker's articulating face and heard her voice), visual only (only saw the speaker's articulating face), and audio only (only heard the speaker's voice) conditions with varying audio signal-to-noise ratios in order to determine the regions of the PMC involved with multisensory and modality specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli to control for task difficulty and differences in intelligibility. The results of the functional magnetic resonance imaging (fMRI) analysis for visual only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and PMC. The left ventral inferior premotor cortex (PMvi) showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex (PMvs/PMd) did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas, more superior and dorsal regions of the PMC are involved with mapping unimodal (in this case visual) sensory features of the speech signal with articulatory speech gestures. PMID:24860526

  5. Aeronautical audio broadcasting via satellite

    NASA Technical Reports Server (NTRS)

    Tzeng, Forrest F.

    1993-01-01

    A system design for aeronautical audio broadcasting, with C-band uplink and L-band downlink, via Inmarsat space segments is presented. Near-transparent-quality compression of 5-kHz bandwidth audio at 20.5 kbit/s is achieved based on a hybrid technique employing linear predictive modeling and transform-domain residual quantization. Concatenated Reed-Solomon/convolutional codes with quadrature phase shift keying are selected for bandwidth and power efficiency. RF bandwidth at 25 kHz per channel, and a decoded bit error rate at 10(exp -6) with E(sub b)/N(sub o) at 3.75 dB are obtained. An interleaver, scrambler, modem synchronization, and frame format were designed, and frequency-division multiple access was selected over code-division multiple access. A link budget computation based on a worst-case scenario indicates sufficient system power margins. Transponder occupancy analysis for 72 audio channels demonstrates ample remaining capacity to accommodate emerging aeronautical services.

  6. Attention to sound improves auditory reliability in audio-tactile spatial optimal integration.

    PubMed

    Vercillo, Tiziana; Gori, Monica

    2015-01-01

    The role of attention in multisensory processing is still poorly understood. In particular, it is unclear whether directing attention toward a sensory cue dynamically reweights cue reliability during integration of multiple sensory signals. In this study, we investigated the impact of attention on combining audio-tactile signals in an optimal fashion. We used the Maximum Likelihood Estimation (MLE) model to predict audio-tactile spatial localization on the body surface. We developed a new audio-tactile device composed of several small units, each consisting of a speaker and a tactile vibrator independently controllable by external software. We tested participants in an attentional and a non-attentional condition. In the attentional experiment, participants performed a dual-task paradigm: they were required to evaluate the duration of a sound while performing an audio-tactile spatial task. Three unisensory or multisensory stimuli, conflicting or non-conflicting sounds and vibrations arranged along the horizontal axis, were presented sequentially. In the primary task (space bisection), participants had to evaluate the position of the second stimulus (the probe) with respect to the others (the standards). In the secondary task they had to report occasional changes in the duration of the second auditory stimulus. In the non-attentional condition participants performed only the primary task (space bisection). Our results showed enhanced auditory precision (and auditory weights) in the attentional condition with respect to the non-attentional control condition. These results support the idea that modality-specific attention modulates multisensory integration.
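
    The MLE prediction being tested has a compact form: the fused estimate weights each modality by its inverse variance, and the fused variance is never worse than the better unisensory one. A worked sketch with illustrative numbers follows; lowering the auditory variance (as attention is reported to do) raises the auditory weight in the fused percept:

```python
# Inverse-variance (MLE) cue combination for audio-tactile localization.
def mle_fusion(x_a, var_a, x_t, var_t):
    w_a = (1 / var_a) / (1 / var_a + 1 / var_t)   # auditory weight
    w_t = 1 - w_a                                 # tactile weight
    x_hat = w_a * x_a + w_t * x_t                 # fused position estimate
    var_hat = (var_a * var_t) / (var_a + var_t)   # <= min(var_a, var_t)
    return x_hat, var_hat, w_a

# Sharpening auditory precision (smaller var_a) pulls the fused estimate
# toward the auditory cue. Numbers are illustrative.
for var_a in (4.0, 1.0):
    print(mle_fusion(x_a=10.0, var_a=var_a, x_t=12.0, var_t=2.0))
```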

  7. Audio Haptic Videogaming for Developing Wayfinding Skills in Learners Who are Blind

    PubMed Central

    Sánchez, Jaime; de Borba Campos, Marcia; Espinoza, Matías; Merabet, Lotfi B.

    2014-01-01

    Interactive digital technologies are currently being developed as a novel tool for education and skill development. Audiopolis is an audio and haptic based videogame designed for developing orientation and mobility (O&M) skills in people who are blind. We have evaluated the cognitive impact of videogame play on O&M skills by assessing performance on a series of behavioral tasks carried out in both indoor and outdoor virtual spaces. Our results demonstrate that the use of Audiopolis had a positive impact on the development and use of O&M skills in school-aged learners who are blind. The impact of audio and haptic information on learning is also discussed. PMID:25485312

  8. Hierarchical vs non-hierarchical audio indexation and classification for video genres

    NASA Astrophysics Data System (ADS)

    Dammak, Nouha; BenAyed, Yassine

    2018-04-01

    In this paper, Support Vector Machines (SVMs) are used for segmenting and indexing video genres based only on audio features extracted at the block level, which has the prominent asset of capturing local temporal information. The main contribution of our study is to show the wide effect on classification accuracy of using a hierarchical categorization structure based on the Mel Frequency Cepstral Coefficients (MFCC) audio descriptor. The classification covers three common video genres: sports videos, music clips, and news scenes. The sub-classification may divide each genre into several multi-speaker and multi-dialect sub-genres. The validation of this approach was carried out on over 360 minutes of video, yielding a classification accuracy of over 99%.

  9. Full-Featured Web Conferencing Systems

    ERIC Educational Resources Information Center

    Foreman, Joel; Jenkins, Roy

    2005-01-01

    In order to match the customary strengths of the still dominant face-to-face instructional mode, a high-performance online learning system must employ synchronous as well as asynchronous communications; buttress graphics, animation, and text with live audio and video; and provide many of the features and processes associated with course management…

  10. Temporal Structure and Complexity Affect Audio-Visual Correspondence Detection

    PubMed Central

    Denison, Rachel N.; Driver, Jon; Ruff, Christian C.

    2013-01-01

    Synchrony between events in different senses has long been considered the critical temporal cue for multisensory integration. Here, using rapid streams of auditory and visual events, we demonstrate how humans can use temporal structure (rather than mere temporal coincidence) to detect multisensory relatedness. We find psychophysically that participants can detect matching auditory and visual streams via shared temporal structure for crossmodal lags of up to 200 ms. Performance on this task reproduced features of past findings based on explicit timing judgments but did not show any special advantage for perfectly synchronous streams. Importantly, the complexity of temporal patterns influences sensitivity to correspondence. Stochastic, irregular streams – with richer temporal pattern information – led to higher audio-visual matching sensitivity than predictable, rhythmic streams. Our results reveal that temporal structure and its complexity are key determinants for human detection of audio-visual correspondence. The distinctive emphasis of our new paradigms on temporal patterning could be useful for studying special populations with suspected abnormalities in audio-visual temporal perception and multisensory integration. PMID:23346067

  11. Singing voice detection for karaoke application

    NASA Astrophysics Data System (ADS)

    Shenoy, Arun; Wu, Yuansheng; Wang, Ye

    2005-07-01

    We present a framework to detect the regions of singing voice in musical audio signals. This work is oriented towards the development of a robust transcriber of lyrics for karaoke applications. The technique leverages a combination of low-level audio features and higher-level musical knowledge of rhythm and tonality. Musical knowledge of the key is used to create a song-specific filterbank to attenuate the presence of the pitched musical instruments. This is followed by subband processing of the audio to detect the musical octaves in which the vocals are present. Text processing is employed to approximate the duration of the sung passages using freely available lyrics. This is used to obtain a dynamic threshold for vocal/non-vocal segmentation. This pairing of audio and text processing helps create a more accurate system. Experimental evaluation on a small database of popular songs shows the validity of the proposed approach. Holistic and per-component evaluation of the system is conducted and various improvements are discussed.

  12. Sellers holds up a bundle of tangled audio cables during STS-121 / Expedition 13 joint operations

    NASA Image and Video Library

    2006-07-15

    S121-E-07791 (15 July 2006) --- Astronaut Piers J. Sellers, STS-121 mission specialist, works with cables on the middeck of Space Shuttle Discovery as the shuttle crew prepares to undock from the International Space Station.

  13. Kuipers installs and routes RCS Video Cables in the U.S. Laboratory

    NASA Image and Video Library

    2012-02-01

    ISS030-E-060117 (1 Feb. 2012) --- In the International Space Station's Destiny laboratory, European Space Agency astronaut Andre Kuipers, Expedition 30 flight engineer, routes video cable for the High Rate Communication System (HRCS). HRCS will allow for two additional space-to-ground audio channels and two additional downlink video channels.

  14. Simple Solutions for Space Station Audio Problems

    NASA Technical Reports Server (NTRS)

    Wood, Eric

    2016-01-01

    Throughout this summer, a number of different projects were supported relating to various NASA programs, including the International Space Station (ISS) and Orion. The primary project that was worked on was designing and testing an acoustic diverter which could be used on the ISS to increase sound pressure levels in Node 1, a module that does not have any Audio Terminal Units (ATUs) inside it. This acoustic diverter is not intended to be a permanent solution to providing audio to Node 1; it is simply intended to improve conditions while more permanent solutions are under development. One of the most exciting aspects of this project is that the acoustic diverter is designed to be 3D printed on the ISS, using the 3D printer that was set up earlier this year. Because of this, no new hardware needs to be sent up to the station, and no extensive hardware testing needs to be performed on the ground before sending it to the station. Instead, the 3D part file can simply be uploaded to the station's 3D printer, where the diverter will be made.

  15. Affective video retrieval: violence detection in Hollywood movies by large-scale segmental feature extraction.

    PubMed

    Eyben, Florian; Weninger, Felix; Lehment, Nicolas; Schuller, Björn; Rigoll, Gerhard

    2013-01-01

    Without doubt general video and sound, as found in large multimedia archives, carry emotional information. Thus, audio and video retrieval by certain emotional categories or dimensions could play a central role for tomorrow's intelligent systems, enabling search for movies with a particular mood, computer aided scene and sound design in order to elicit certain emotions in the audience, etc. Yet, the lion's share of research in affective computing is exclusively focusing on signals conveyed by humans, such as affective speech. Uniting the fields of multimedia retrieval and affective computing is believed to lend to a multiplicity of interesting retrieval applications, and at the same time to benefit affective computing research, by moving its methodology "out of the lab" to real-world, diverse data. In this contribution, we address the problem of finding "disturbing" scenes in movies, a scenario that is highly relevant for computer-aided parental guidance. We apply large-scale segmental feature extraction combined with audio-visual classification to the particular task of detecting violence. Our system performs fully data-driven analysis including automatic segmentation. We evaluate the system in terms of mean average precision (MAP) on the official data set of the MediaEval 2012 evaluation campaign's Affect Task, which consists of 18 original Hollywood movies, achieving up to .398 MAP on unseen test data in full realism. An in-depth analysis of the worth of individual features with respect to the target class and the system errors is carried out and reveals the importance of peak-related audio feature extraction and low-level histogram-based video analysis.
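
    The MAP evaluation can be sketched directly: average precision is computed per held-out movie and then averaged across movies. The snippet below uses scikit-learn with invented scores purely to show the mechanics:

```python
# Mean average precision over per-movie rankings, as in the Affect Task.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(4)
aps = []
for _ in range(18):                      # one held-out Hollywood movie each
    y_true = rng.integers(0, 2, 100)     # violent (1) / non-violent (0) segments
    y_score = y_true * 0.4 + rng.random(100) * 0.6   # imperfect classifier scores
    aps.append(average_precision_score(y_true, y_score))
print("MAP:", np.mean(aps))
```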

  16. Affective Video Retrieval: Violence Detection in Hollywood Movies by Large-Scale Segmental Feature Extraction

    PubMed Central

    Eyben, Florian; Weninger, Felix; Lehment, Nicolas; Schuller, Björn; Rigoll, Gerhard

    2013-01-01

    Without doubt general video and sound, as found in large multimedia archives, carry emotional information. Thus, audio and video retrieval by certain emotional categories or dimensions could play a central role for tomorrow's intelligent systems, enabling search for movies with a particular mood, computer aided scene and sound design in order to elicit certain emotions in the audience, etc. Yet, the lion's share of research in affective computing is exclusively focusing on signals conveyed by humans, such as affective speech. Uniting the fields of multimedia retrieval and affective computing is believed to lend to a multiplicity of interesting retrieval applications, and at the same time to benefit affective computing research, by moving its methodology “out of the lab” to real-world, diverse data. In this contribution, we address the problem of finding “disturbing” scenes in movies, a scenario that is highly relevant for computer-aided parental guidance. We apply large-scale segmental feature extraction combined with audio-visual classification to the particular task of detecting violence. Our system performs fully data-driven analysis including automatic segmentation. We evaluate the system in terms of mean average precision (MAP) on the official data set of the MediaEval 2012 evaluation campaign's Affect Task, which consists of 18 original Hollywood movies, achieving up to .398 MAP on unseen test data in full realism. An in-depth analysis of the worth of individual features with respect to the target class and the system errors is carried out and reveals the importance of peak-related audio feature extraction and low-level histogram-based video analysis. PMID:24391704

  17. ESA personal communications and digital audio broadcasting systems based on non-geostationary satellites

    NASA Technical Reports Server (NTRS)

    Logalbo, P.; Benedicto, J.; Viola, R.

    1993-01-01

    Personal Communications and Digital Audio Broadcasting are two new services that the European Space Agency (ESA) is investigating for future European and Global Mobile Satellite systems. ESA is active in promoting these services in their various mission options including non-geostationary and geostationary satellite systems. A Medium Altitude Global Satellite System (MAGSS) for global personal communications at L and S-band, and a Multiregional Highly inclined Elliptical Orbit (M-HEO) system for multiregional digital audio broadcasting at L-band are described. Both systems are being investigated by ESA in the context of future programs, such as Archimedes, which are intended to demonstrate the new services and to develop the technology for future non-geostationary mobile communication and broadcasting satellites.

  18. Headphone and Head-Mounted Visual Displays for Virtual Environments

    NASA Technical Reports Server (NTRS)

    Begault, Duran R.; Ellis, Stephen R.; Wenzel, Elizabeth M.; Trejo, Leonard J. (Technical Monitor)

    1998-01-01

    A realistic auditory environment can contribute to both the overall subjective sense of presence in a virtual display, and to a quantitative metric predicting human performance. Here, the role of audio in a virtual display and the importance of auditory-visual interaction are examined. Conjectures are proposed regarding the effectiveness of audio compared to visual information for creating a sensation of immersion, the frame of reference within a virtual display, and the compensation of visual fidelity by supplying auditory information. Future areas of research are outlined for improving simulations of virtual visual and acoustic spaces. This paper will describe some of the intersensory phenomena that arise during operator interaction within combined visual and auditory virtual environments. Conjectures regarding audio-visual interaction will be proposed.

  19. Influence of Immersive Human Scale Architectural Representation on Design Judgment

    NASA Astrophysics Data System (ADS)

    Elder, Rebecca L.

    Unrealistic visual representations of architecture within our existing environments have lost all reference to the human senses. As a design tool, visual and auditory stimuli can be used to determine humans' perception of design. This experiment renders varying building inputs within different sites, simulated with corresponding immersive visual and audio sensory cues. Introducing audio has been shown to influence the way a person perceives a space, yet most inhabitants rely strictly on their sense of vision to make design judgments. Though it is not as apparent, users prefer spaces that have a better quality of sound and comfort. Through a series of questions, we can begin to analyze whether a design is fit for both an acoustic and a visual environment.

  20. Multimedia Classifier

    NASA Astrophysics Data System (ADS)

    Costache, G. N.; Gavat, I.

    2004-09-01

    Along with the aggressive growth in the amount of digital data available (text, audio samples, digital photos and digital movies, joined together in the multimedia domain), the need for classification, recognition and retrieval of this kind of data has become very important. This paper presents a system structure for handling multimedia data from a recognition perspective. The main processing steps applied to the multimedia objects of interest are: first, parameterization by analysis, in order to obtain a description based on features, forming the parameter vector; second, classification, generally with a hierarchical structure, to make the necessary decisions. For audio signals, both speech and music, the derived perceptual features are the mel-cepstral coefficients (MFCC) and the perceptual linear predictive (PLP) coefficients. For images, the derived features are the geometric parameters of the speaker's mouth. The hierarchical classifier generally consists of a clustering stage, based on Kohonen Self-Organizing Maps (SOM), and a final stage based on a powerful classification algorithm called Support Vector Machines (SVM). The system, in specific variants, is applied with good results to two tasks: the first is bimodal speech recognition, which fuses features obtained from the speech signal with features obtained from the speaker's image; the second is music retrieval from a large music database.
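
    The two-stage idea can be sketched as follows; this is a hedged illustration in Python, with random clips and labels standing in for real audio annotations, and KMeans standing in for the Kohonen SOM clustering stage.

    ```python
    # Perceptual features feed a clustering stage whose assignments route
    # samples to per-cluster SVMs. Clips and labels are synthetic placeholders.
    import numpy as np
    import librosa
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    sr = 16000
    clips = [rng.normal(size=sr) for _ in range(20)]      # 20 one-second "clips"
    labels = rng.integers(0, 2, size=20)                  # hypothetical classes

    # One parameter vector per clip: time-averaged MFCCs.
    X = np.vstack([librosa.feature.mfcc(y=c, sr=sr, n_mfcc=13).mean(axis=1)
                   for c in clips])

    clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
    svms = {c: SVC(kernel="rbf").fit(X[clusters == c], labels[clusters == c])
            for c in np.unique(clusters)
            if len(np.unique(labels[clusters == c])) > 1}  # need two classes to fit
    ```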

  1. Designing sound and visual components for enhancement of urban soundscapes.

    PubMed

    Hong, Joo Young; Jeon, Jin Yong

    2013-09-01

    The aim of this study is to investigate the effect of audio-visual components on environmental quality to improve the soundscape. Natural sounds combined with road traffic noise and visual components in urban streets were evaluated through laboratory experiments. Waterfall and stream water sounds, as well as bird sounds, were selected to enhance the soundscape. Sixteen photomontages of a streetscape were constructed in combination with two types of water features and three types of vegetation, which were chosen as positive visual components. The experiments consisted of audio-only, visual-only, and audio-visual conditions. The preferences and environmental qualities of the stimuli were evaluated on a numerical scale and with 12 pairs of adjectives, respectively. The results showed that bird sounds were the most preferred among the natural sounds, while the sound of falling water was found to degrade the soundscape quality when the road traffic noise level was high. The visual effects of vegetation on aesthetic preference were significant, but those of water features were relatively small. It was also revealed that the perceptual dimensions of the environment differed with the noise level. In particular, the acoustic comfort factor related to soundscape quality considerably influenced preference for the overall environment at higher levels of road traffic noise.

  2. STS-29 MS Bagian juggles audio cassettes on Discovery's, OV-103's, middeck

    NASA Image and Video Library

    1989-03-18

    STS29-02-033 (3-18 March 1989) --- In what appears to be a juggling act in the microgravity of space, James P. Bagian, a physician, is actually attempting to organize audio cassettes. Other frames taken during the flight document Bagian's medical testing of his fellow crewmembers. This photographic frame was among NASA's third STS-29 photo release, Monday, March 20, 1989. Crewmembers were Astronauts Michael L. Coats, John E. Blaha, James F. Buchli, Robert C. Springer and James P. Bagian.

  3. Simulation and testing of a multichannel system for 3D sound localization

    NASA Astrophysics Data System (ADS)

    Matthews, Edward Albert

    Three-dimensional (3D) audio involves the ability to localize sound anywhere in a three-dimensional space. 3D audio can be used to provide the listener with the perception of moving sounds and can provide a realistic listening experience for applications such as gaming, video conferencing, movies, and concerts. The purpose of this research is to simulate and test 3D audio by incorporating auditory localization techniques in a multi-channel speaker system. The objective is to develop an algorithm that can place an audio event in a desired location by calculating and controlling the gain factors of each speaker. A MATLAB simulation displays the location of the speakers and perceived sound, which is verified through experimentation. The scenario in which the listener is not equidistant from each of the speakers is also investigated and simulated. This research is envisioned to lead to a better understanding of human localization of sound, and will contribute to a more realistic listening experience.
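
    As a sketch of the gain-factor idea, the following implements simple tangent-law panning between a pair of speakers; the speaker half-angle and the panning convention are assumptions, and a multi-channel system would generalize this to speaker pairs or triplets.

    ```python
    # Place a sound between two speakers by controlling per-speaker gains.
    import numpy as np

    def pan_gains(theta, theta0=30.0):
        """theta: desired source azimuth in degrees, within +/- theta0 (the
        speaker half-angle). Returns constant-power (left, right) gains."""
        t, t0 = np.radians(theta), np.radians(theta0)
        ratio = np.tan(t) / np.tan(t0)          # tangent panning law
        gl, gr = 1.0 - ratio, 1.0 + ratio       # positive theta pulls right
        norm = np.sqrt(gl**2 + gr**2)           # constant-power normalization
        return gl / norm, gr / norm

    print(pan_gains(0.0))    # centred source -> equal gains
    print(pan_gains(15.0))   # source pulled toward the right speaker
    ```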

  4. Smartphone Application for the Analysis of Prosodic Features in Running Speech with a Focus on Bipolar Disorders: System Performance Evaluation and Case Study.

    PubMed

    Guidi, Andrea; Salvi, Sergio; Ottaviano, Manuel; Gentili, Claudio; Bertschy, Gilles; de Rossi, Danilo; Scilingo, Enzo Pasquale; Vanello, Nicola

    2015-11-06

    Bipolar disorder is one of the most common mood disorders characterized by large and invalidating mood swings. Several projects focus on the development of decision support systems that monitor and advise patients, as well as clinicians. Voice monitoring and speech signal analysis can be exploited to reach this goal. In this study, an Android application was designed for analyzing running speech using a smartphone device. The application can record audio samples and estimate speech fundamental frequency, F0, and its changes. F0-related features are estimated locally on the smartphone, with some advantages with respect to remote processing approaches in terms of privacy protection and reduced upload costs. The raw features can be sent to a central server and further processed. The quality of the audio recordings, algorithm reliability and performance of the overall system were evaluated in terms of voiced segment detection and feature estimation. The results demonstrate that mean F0 from each voiced segment can be reliably estimated, thus describing prosodic features across the speech sample. In contrast, features related to F0 variability within each voiced segment performed poorly. A case study performed on a bipolar patient is presented.

  5. Smartphone Application for the Analysis of Prosodic Features in Running Speech with a Focus on Bipolar Disorders: System Performance Evaluation and Case Study

    PubMed Central

    Guidi, Andrea; Salvi, Sergio; Ottaviano, Manuel; Gentili, Claudio; Bertschy, Gilles; de Rossi, Danilo; Scilingo, Enzo Pasquale; Vanello, Nicola

    2015-01-01

    Bipolar disorder is one of the most common mood disorders characterized by large and invalidating mood swings. Several projects focus on the development of decision support systems that monitor and advise patients, as well as clinicians. Voice monitoring and speech signal analysis can be exploited to reach this goal. In this study, an Android application was designed for analyzing running speech using a smartphone device. The application can record audio samples and estimate speech fundamental frequency, F0, and its changes. F0-related features are estimated locally on the smartphone, with some advantages with respect to remote processing approaches in terms of privacy protection and reduced upload costs. The raw features can be sent to a central server and further processed. The quality of the audio recordings, algorithm reliability and performance of the overall system were evaluated in terms of voiced segment detection and feature estimation. The results demonstrate that mean F0 from each voiced segment can be reliably estimated, thus describing prosodic features across the speech sample. In contrast, features related to F0 variability within each voiced segment performed poorly. A case study performed on a bipolar patient is presented. PMID:26561811
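
    The core measurement, per-frame F0 tracking and mean F0 per voiced region, can be sketched as follows; this uses librosa's pYIN tracker rather than the application's own algorithm, and a synthetic 150 Hz tone stands in for a recorded speech sample.

    ```python
    # Estimate F0 frame by frame, then average over voiced frames only.
    import numpy as np
    import librosa

    sr = 16000
    t = np.arange(2 * sr) / sr
    y = 0.5 * np.sin(2 * np.pi * 150 * t)            # placeholder "voiced" signal

    f0, voiced_flag, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    mean_f0 = np.nanmean(f0[voiced_flag])            # mean F0 over voiced frames
    print(f"mean F0: {mean_f0:.1f} Hz")              # ~150 Hz here
    ```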

  6. Colour Association with Music Is Mediated by Emotion: Evidence from an Experiment Using a CIE Lab Interface and Interviews

    PubMed Central

    Lindborg, PerMagnus; Friberg, Anders K.

    2015-01-01

    Crossmodal associations may arise at neurological, perceptual, cognitive, or emotional levels of brain processing. Higher-level modal correspondences between musical timbre and visual colour have been previously investigated, though with limited sets of colour. We developed a novel response method that employs a tablet interface to navigate the CIE Lab colour space. The method was used in an experiment where 27 film music excerpts were presented to participants (n = 22) who continuously manipulated the colour and size of an on-screen patch to match the music. Analysis of the data replicated and extended earlier research, for example, that happy music was associated with yellow, music expressing anger with large red colour patches, and sad music with smaller patches towards dark blue. Correlation analysis suggested patterns of relationships between audio features and colour patch parameters. Using partial least squares regression, we tested models for predicting colour patch responses from audio features and ratings of perceived emotion in the music. Parsimonious models that included emotion robustly explained between 60% and 75% of the variation in each of the colour patch parameters, as measured by cross-validated R². To illuminate the quantitative findings, we performed a content analysis of structured spoken interviews with the participants. This provided further evidence of a significant emotion mediation mechanism, whereby people tended to match colour association with the perceived emotion in the music. The mixed method approach of our study gives strong evidence that emotion can mediate crossmodal association between music and visual colour. The CIE Lab interface promises to be a useful tool in perceptual ratings of music and other sounds. PMID:26642050

  7. Colour Association with Music Is Mediated by Emotion: Evidence from an Experiment Using a CIE Lab Interface and Interviews.

    PubMed

    Lindborg, PerMagnus; Friberg, Anders K

    2015-01-01

    Crossmodal associations may arise at neurological, perceptual, cognitive, or emotional levels of brain processing. Higher-level modal correspondences between musical timbre and visual colour have been previously investigated, though with limited sets of colour. We developed a novel response method that employs a tablet interface to navigate the CIE Lab colour space. The method was used in an experiment where 27 film music excerpts were presented to participants (n = 22) who continuously manipulated the colour and size of an on-screen patch to match the music. Analysis of the data replicated and extended earlier research, for example, that happy music was associated with yellow, music expressing anger with large red colour patches, and sad music with smaller patches towards dark blue. Correlation analysis suggested patterns of relationships between audio features and colour patch parameters. Using partial least squares regression, we tested models for predicting colour patch responses from audio features and ratings of perceived emotion in the music. Parsimonious models that included emotion robustly explained between 60% and 75% of the variation in each of the colour patch parameters, as measured by cross-validated R². To illuminate the quantitative findings, we performed a content analysis of structured spoken interviews with the participants. This provided further evidence of a significant emotion mediation mechanism, whereby people tended to match colour association with the perceived emotion in the music. The mixed method approach of our study gives strong evidence that emotion can mediate crossmodal association between music and visual colour. The CIE Lab interface promises to be a useful tool in perceptual ratings of music and other sounds.
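
    The modelling step can be sketched as follows; this is a hedged illustration with random placeholder arrays in place of the study's audio features, emotion ratings, and colour patch parameters.

    ```python
    # Partial least squares regression from (audio features + emotion ratings)
    # to colour patch parameters, scored by cross-validated R^2.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(27, 10))   # 27 excerpts x (audio features + ratings)
    Y = rng.normal(size=(27, 4))    # colour patch parameters per excerpt

    pls = PLSRegression(n_components=3)
    print(cross_val_score(pls, X, Y, scoring="r2", cv=5))  # cross-validated R^2
    ```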

  8. Vroom: designing an augmented environment for remote collaboration in digital cinema production

    NASA Astrophysics Data System (ADS)

    Margolis, Todd; Cornish, Tracy

    2013-03-01

    As media technologies become increasingly affordable, compact and inherently networked, new generations of telecollaborative platforms continue to arise which integrate these new affordances. Virtual reality has been primarily concerned with creating simulations of environments that can transport participants to real or imagined spaces that replace the "real world". Meanwhile Augmented Reality systems have evolved to interleave objects from Virtual Reality environments into the physical landscape. Perhaps now there is a new class of systems that reverse this precept to enhance dynamic media landscapes and immersive physical display environments to enable intuitive data exploration through collaboration. Vroom (Virtual Room) is a next-generation reconfigurable tiled display environment in development at the California Institute for Telecommunications and Information Technology (Calit2) at the University of California, San Diego. Vroom enables freely scalable digital collaboratories, connecting distributed, high-resolution visualization resources for collaborative work in the sciences, engineering and the arts. Vroom transforms a physical space into an immersive media environment with large format interactive display surfaces, video teleconferencing and spatialized audio built on a high-speed optical network backbone. Vroom enables group collaboration for local and remote participants to share knowledge and experiences. Possible applications include: remote learning, command and control, storyboarding, post-production editorial review, high resolution video playback, 3D visualization, screencasting and image, video and multimedia file sharing. To support these various scenarios, Vroom features support for multiple user interfaces (optical tracking, touch UI, gesture interface, etc.), support for directional and spatialized audio, giga-pixel image interactivity, 4K video streaming, 3D visualization and telematic production. This paper explains the design process that has been utilized to make Vroom an accessible and intuitive immersive environment for remote collaboration, specifically for digital cinema production.

  9. Microcomputer Software Development: New Strategies for a New Technology.

    ERIC Educational Resources Information Center

    Kehrberg, Kent T.

    1979-01-01

    Provides a guide for the development of educational computer programs for use on microcomputers. Making use of the features of microcomputers, including visual, audio, and tactile techniques, is encouraged. (Author/IRT)

  10. Music information retrieval in compressed audio files: a survey

    NASA Astrophysics Data System (ADS)

    Zampoglou, Markos; Malamos, Athanasios G.

    2014-07-01

    In this paper, we present an organized survey of the existing literature on music information retrieval systems in which descriptor features are extracted directly from the compressed audio files, without prior decompression to pulse-code modulation format. Avoiding the decompression step and utilizing the readily available compressed-domain information can significantly lighten the computational cost of a music information retrieval system, allowing application to large-scale music databases. We identify a number of systems relying on compressed-domain information and form a systematic classification of the features they extract, the retrieval tasks they tackle, and the degree to which they achieve an actual increase in overall speed, as well as any resulting loss in accuracy. Finally, we discuss recent developments in the field, and the potential research directions they open toward ultra-fast, scalable systems.

  11. Auto-Associative Recurrent Neural Networks and Long Term Dependencies in Novelty Detection for Audio Surveillance Applications

    NASA Astrophysics Data System (ADS)

    Rossi, A.; Montefoschi, F.; Rizzo, A.; Diligenti, M.; Festucci, C.

    2017-10-01

    Machine learning applied to automatic audio surveillance has been attracting increasing attention in recent years. In spite of several investigations based on a large number of different approaches, little attention has been paid to the temporal evolution of the environmental input signal. In this work, we propose an exploration in this direction, comparing the temporal correlations extracted at the feature level with those learned by a representational structure. To this aim, we analysed the prediction performance of a Recurrent Neural Network architecture while varying the length of the processed input sequence and the size of the time window used in feature extraction. The results corroborated the hypothesis that sequential models work better when dealing with data characterized by temporal order. However, the optimization of the temporal dimension remains an open issue.
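
    One way to realize an auto-associative recurrent model for novelty detection is sketched below; the shapes, layer sizes, and the use of reconstruction error as the novelty score are illustrative assumptions, not the paper's exact architecture.

    ```python
    # An LSTM autoencoder reconstructs normal audio feature sequences; a high
    # reconstruction error flags a novel (potentially alarming) event.
    import torch
    import torch.nn as nn

    class RecurrentAutoencoder(nn.Module):
        def __init__(self, n_features=20, hidden=32):
            super().__init__()
            self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
            self.decoder = nn.LSTM(hidden, n_features, batch_first=True)

        def forward(self, x):                     # x: (batch, time, n_features)
            z, _ = self.encoder(x)                # latent sequence
            out, _ = self.decoder(z)              # reconstruction
            return out

    model = RecurrentAutoencoder()
    seq = torch.randn(1, 50, 20)                  # one 50-frame feature sequence
    recon = model(seq)
    novelty = torch.mean((recon - seq) ** 2)      # high error = novel audio
    print(float(novelty))
    ```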

  12. Neuromorphic audio-visual sensor fusion on a sound-localizing robot.

    PubMed

    Chan, Vincent Yue-Sek; Jin, Craig T; van Schaik, André

    2012-01-01

    This paper presents the first robotic system featuring audio-visual (AV) sensor fusion with neuromorphic sensors. We combine a pair of silicon cochleae and a silicon retina on a robotic platform to allow the robot to learn sound localization through self-motion and visual feedback, using an adaptive ITD-based sound localization algorithm. After training, the robot can localize sound sources (white or pink noise) in a reverberant environment with an RMS error of 4-5° in azimuth. We also investigate the AV source binding problem, and an experiment is conducted to test the effectiveness of matching an audio event with a corresponding visual event based on their onset time. Despite the simplicity of this method and a large number of false visual events in the background, a correct match can be made 75% of the time during the experiment.
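
    The ITD-based part of such a system can be sketched as follows; the microphone spacing, test signal, and sign convention are illustrative assumptions.

    ```python
    # Cross-correlate the two microphone signals, convert the best lag to an
    # interaural time difference (ITD), then to an azimuth angle.
    import numpy as np

    def itd_azimuth(left, right, fs, mic_distance=0.15, c=343.0):
        corr = np.correlate(left, right, mode="full")
        lag = np.argmax(corr) - (len(right) - 1)   # samples; sign encodes side
        itd = lag / fs                             # seconds
        s = np.clip(itd * c / mic_distance, -1.0, 1.0)
        return np.degrees(np.arcsin(s))

    fs = 16000
    sig = np.random.randn(fs)
    # Left channel leads by 5 samples, as for a source on the left side here.
    print(itd_azimuth(sig[5:], sig[:-5], fs))
    ```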

  13. Twenty-Five Years of Dynamic Growth.

    ERIC Educational Resources Information Center

    Pipes, Lana

    1980-01-01

    Discusses developments in instructional technology in the past 25 years in the areas of audio, video, micro-electronics, social evolution, the space race, and living with rapidly changing technology. (CMV)

  14. The Construction (Using Multi-Media Techniques) of Certain Modules of a Programmed Course in Astronomy-Space Sciences for NASA Personnel of The Goddard Space Flight Center, Greenbelt, Maryland.

    ERIC Educational Resources Information Center

    Collagan, Robert B.

    This paper describes the development of a self-instructional multi-media course in astronomy-space sciences for non-technical NASA personnel. The course consists of a variety of programed materials including slides, films, film-loops, filmstrips, video-tapes and audio-tapes, on concepts of time, space, and matter in our solar system and galaxy.…

  15. Williams in the U.S. Laboratory during Expedition 13

    NASA Image and Video Library

    2006-08-17

    ISS013-E-67445 (17 Aug. 2006) --- Astronaut Jeffrey N. Williams, Expedition 13 NASA space station science officer and flight engineer, conducts an educational teleconference with the Boys and Girls Clubs of Middle Tennessee in Nashville, via Ku- and S-band in the Destiny laboratory of the International Space Station, with audio and video relayed to the Mission Control Center at Johnson Space Center.

  16. Expedition 13 Crew during a teleconference in the U.S. Laboratory during Expedition 13

    NASA Image and Video Library

    2006-08-31

    ISS013-E-75727 (31 Aug. 2006) --- Astronaut Jeffrey N. Williams (foreground), Expedition 13 NASA space station science officer and flight engineer; cosmonaut Pavel V. Vinogradov (center), commander representing Russia's Federal Space Agency; and European Space Agency (ESA) astronaut Thomas Reiter, flight engineer, conduct a teleconference in the Destiny laboratory of the International Space Station, via Ku- and S-band, with audio and video relayed to the Mission Control Center (MCC) at Johnson Space Center.

  17. Active noise control for infant incubators.

    PubMed

    Yu, Xun; Gujjula, Shruthi; Kuo, Sen M

    2009-01-01

    This paper presents an active noise control (ANC) system for infant incubators. Experimental results show that global noise reduction can be achieved for infant incubator ANC systems. An audio-integration algorithm is presented to introduce a healthy (intrauterine) audio signal into the ANC system to mask the residual noise and soothe the infant. A carbon nanotube based transparent thin-film speaker is also introduced in this paper as the actuator of the ANC system, generating the destructive secondary sound while significantly saving space in the congested incubator and without blocking the view of doctors and nurses.
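
    The core ANC loop is typically a filtered-x LMS (FxLMS) update; the following hedged sketch uses a scalar secondary path and synthetic noise, both simplifications of a real incubator setup.

    ```python
    # Adapt a control filter so its output, played through the secondary path,
    # cancels the disturbance at the error microphone.
    import numpy as np

    def fxlms(reference, disturbance, sec_path, n_taps=32, mu=0.01):
        w = np.zeros(n_taps)                      # adaptive control filter
        x_hist = np.zeros(n_taps)                 # recent reference samples
        fx_hist = np.zeros(n_taps)                # filtered-reference samples
        errors = np.zeros(len(reference))
        for n in range(len(reference)):
            x_hist = np.roll(x_hist, 1)
            x_hist[0] = reference[n]
            y = w @ x_hist                        # anti-noise output sample
            e = disturbance[n] + sec_path * y     # residual at the error mic
            fx_hist = np.roll(fx_hist, 1)
            fx_hist[0] = sec_path * reference[n]
            w -= mu * e * fx_hist                 # FxLMS gradient update
            errors[n] = e
        return errors

    rng = np.random.default_rng(1)
    x = rng.normal(size=4000)                     # reference noise pickup
    d = np.convolve(x, [0.5, 0.3], mode="same")   # noise reaching the infant
    e = fxlms(x, d, sec_path=1.0)
    print(np.mean(d[-500:] ** 2), np.mean(e[-500:] ** 2))  # residual shrinks
    ```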

  18. Automatic Indexing for Content Analysis of Whale Recordings and XML Representation

    NASA Astrophysics Data System (ADS)

    Bénard, Frédéric; Glotin, Hervé

    2010-12-01

    This paper focuses on the robust indexing of sperm whale hydrophone recordings based on a set of features extracted from a real-time passive underwater acoustic tracking algorithm for multiple whales using four hydrophones. Acoustic localization permits the study of whale behavior in deep water without interfering with the environment. Given the position coordinates, we are able to generate different features such as the speed, the energy of the clicks, the Inter-Click-Interval (ICI), and so on. These features allow us to construct markers with which to index and structure the audio files. The behavior study is thus facilitated by choosing and accessing the corresponding index in the audio file. The complete indexing algorithm is run on real data from the NUWC (Naval Undersea Warfare Center of the US Navy) and the AUTEC (Atlantic Undersea Test & Evaluation Center-Bahamas). Our model is validated by similar results from the US Navy (NUWC) and SOEST (School of Ocean and Earth Science and Technology) Hawaii university labs in a single-whale case. Finally, as an illustration, we index a single whale sound file using the extracted whale features provided by the tracking, and we present an example of an XML script structuring it.
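
    The marker-generation step can be sketched as follows; the click times, track positions, and XML element names are illustrative placeholders, not the paper's actual schema.

    ```python
    # Derive ICI and speed features from tracker output and write XML markers.
    import numpy as np
    import xml.etree.ElementTree as ET

    click_times = np.array([0.0, 1.1, 2.1, 3.3])        # seconds, from tracker
    positions = np.array([[0, 0, -400], [3, 1, -402],
                          [6, 2, -405], [10, 3, -406]], dtype=float)  # metres

    ici = np.diff(click_times)                          # Inter-Click-Interval
    speed = np.linalg.norm(np.diff(positions, axis=0), axis=1) / ici  # m/s

    root = ET.Element("whale_index")
    for t, i, s in zip(click_times[1:], ici, speed):
        m = ET.SubElement(root, "marker", time=f"{t:.2f}")
        ET.SubElement(m, "ici").text = f"{i:.2f}"
        ET.SubElement(m, "speed").text = f"{s:.2f}"
    print(ET.tostring(root, encoding="unicode"))
    ```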

  19. A Robust Zero-Watermarking Algorithm for Audio

    NASA Astrophysics Data System (ADS)

    Chen, Ning; Zhu, Jie

    2007-12-01

    In traditional watermarking algorithms, the insertion of the watermark into the host signal inevitably introduces some perceptible quality degradation. Another problem is the inherent conflict between imperceptibility and robustness. Zero-watermarking techniques can solve these problems successfully. Instead of embedding a watermark, a zero-watermarking technique extracts some essential characteristics from the host signal and uses them for watermark detection. However, most of the available zero-watermarking schemes are designed for still images, and their robustness is not satisfactory. In this paper, an efficient and robust zero-watermarking technique for audio signals is presented. The multiresolution characteristic of the discrete wavelet transform (DWT), the energy compaction characteristic of the discrete cosine transform (DCT), and the Gaussian noise suppression property of higher-order cumulants are combined to extract essential features from the host audio signal, which are then used for watermark recovery. Simulation results demonstrate the effectiveness of our scheme in terms of inaudibility, detection reliability, and robustness.
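
    A hedged sketch of the feature-extraction idea follows; it combines the DWT and DCT stages and binarizes the result, while the higher-order cumulant stage is omitted for brevity and random samples stand in for a real audio file.

    ```python
    # DWT approximation coefficients are compacted with a DCT and binarized
    # into a zero-watermark pattern registered for later detection.
    import numpy as np
    import pywt
    from scipy.fft import dct

    def zero_watermark(audio, wavelet="db4", level=3, n_bits=64):
        approx = pywt.wavedec(audio, wavelet, level=level)[0]  # low-freq band
        coeffs = dct(approx, norm="ortho")[:n_bits]            # energy compaction
        return (coeffs >= np.median(coeffs)).astype(int)       # binary pattern

    rng = np.random.default_rng(0)
    audio = rng.normal(size=16000)
    print(zero_watermark(audio))
    ```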

  20. Computationally Efficient Clustering of Audio-Visual Meeting Data

    NASA Astrophysics Data System (ADS)

    Hung, Hayley; Friedland, Gerald; Yeo, Chuohao

    This chapter presents novel computationally efficient algorithms to extract semantically meaningful acoustic and visual events related to each of the participants in a group discussion using the example of business meeting recordings. The recording setup involves relatively few audio-visual sensors, comprising a limited number of cameras and microphones. We first demonstrate computationally efficient algorithms that can identify who spoke and when, a problem in speech processing known as speaker diarization. We also extract visual activity features efficiently from MPEG4 video by taking advantage of the processing that was already done for video compression. Then, we present a method of associating the audio-visual data together so that the content of each participant can be managed individually. The methods presented in this article can be used as a principal component that enables many higher-level semantic analysis tasks needed in search, retrieval, and navigation.

  1. Multifunction audio digitizer for communications systems

    NASA Technical Reports Server (NTRS)

    Monford, L. G., Jr.

    1971-01-01

    Digitizer accomplishes both N-bit pulse code modulation (PCM) and delta modulation, and provides modulation indicating variable signal gain and variable sidetone. Other features include low package count, variable clock rate to optimize bandwidth, and easily expanded PCM output.
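
    Of the two schemes, delta modulation is the simpler to illustrate; the following one-bit encoder with a fixed step size is a minimal sketch, not the digitizer's actual circuit logic.

    ```python
    # Delta modulation transmits one bit per sample: whether the input is
    # above or below a staircase approximation that tracks it.
    import numpy as np

    def delta_modulate(signal, step=0.1):
        bits, approx = [], 0.0
        for s in signal:
            bit = 1 if s >= approx else 0       # one bit per sample
            approx += step if bit else -step    # staircase tracks the input
            bits.append(bit)
        return bits

    t = np.linspace(0, 1, 200)
    x = np.sin(2 * np.pi * 3 * t)
    print(delta_modulate(x)[:20])
    ```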

  2. "Tuberculosis Case Management" Training.

    ERIC Educational Resources Information Center

    Knebel, Elisa; Kolodner, Jennifer

    2001-01-01

    The need to provide isolated health providers with critical knowledge of tuberculosis (TB) case management prompted the development of the "Tuberculosis Case Management" CD-ROM. Features include a "Learning Center," an "Examination Room," and a "Library." The combination of audio, video, and graphics allows participants to…

  3. OSA severity assessment based on sleep breathing analysis using ambient microphone.

    PubMed

    Dafna, E; Tarasiuk, A; Zigel, Y

    2013-01-01

    In this paper, an audio-based system for severity estimation of obstructive sleep apnea (OSA) is proposed. The system estimates the apnea-hypopnea index (AHI), which is the average number of apneic events per hour of sleep. The system is based on a Gaussian mixture regression algorithm that was trained and validated on full-night audio recordings. A feature selection process using a genetic algorithm was applied to select the best features extracted from the time and spectral domains. A total of 155 subjects, referred for in-laboratory polysomnography (PSG) studies, were recruited. Using the PSG AHI score as a gold standard, the performance of the proposed system was evaluated using Pearson correlation, AHI error, and diagnostic agreement. A correlation of R=0.89, an AHI error of 7.35 events/hr, and a diagnostic agreement of 77.3% were achieved, showing encouraging performance and a reliable non-contact alternative method for OSA severity estimation.

  4. Initial utilization of the CVIRB video production facility

    NASA Technical Reports Server (NTRS)

    Parrish, Russell V.; Busquets, Anthony M.; Hogge, Thomas W.

    1987-01-01

    Video disk technology is one of the central themes of a technology demonstrator workstation being assembled as a man/machine interface for the Space Station Data Management Test Bed at Johnson Space Center. Langley Research Center personnel involved in the conception and implementation of this workstation have assembled a video production facility to allow production of video disk material for this purpose. This paper documents the initial familiarization efforts in the field of video production for those personnel and that facility. Although the entire video disk production cycle was not operational for this initial effort, the production of a simulated disk on video tape did acquaint the personnel with the processes involved and with the operation of the hardware. Invaluable experience in storyboarding, script writing, audio and video recording, and audio and video editing was gained in the production process.

  5. Preliminary Plans. A Senior High School in the Bailey Hill Area, Eugene, Oregon.

    ERIC Educational Resources Information Center

    Lutes and Amundson, Architects and Community Planners, Springfield, OR.

    The design of this high school is explained by outlining the decision making process used by the architects. The following design criteria form the basis of this process--(1) design for expansion, (2) design for team teaching, (3) organized by function, (4) space for teachers, (5) space for instructional materials, (6) audio-visual communication…

  6. On application of kernel PCA for generating stimulus features for fMRI during continuous music listening.

    PubMed

    Tsatsishvili, Valeri; Burunat, Iballa; Cong, Fengyu; Toiviainen, Petri; Alluri, Vinoo; Ristaniemi, Tapani

    2018-06-01

    There has been growing interest towards naturalistic neuroimaging experiments, which deepen our understanding of how the human brain processes and integrates incoming streams of multifaceted sensory information, as commonly occurs in the real world. Music is a good example of such a complex continuous phenomenon. In a few recent fMRI studies examining neural correlates of music in continuous listening settings, multiple perceptual attributes of the music stimulus were represented by a set of high-level features, produced as linear combinations of acoustic descriptors computationally extracted from the stimulus audio. Here, fMRI data from a naturalistic music listening experiment were employed. Kernel principal component analysis (KPCA) was applied to the acoustic descriptors extracted from the stimulus audio to generate a set of nonlinear stimulus features. Subsequently, perceptual and neural correlates of the generated high-level features were examined. The generated features captured musical percepts that were hidden from the linear PCA features, namely Rhythmic Complexity and Event Synchronicity. Neural correlates of the new features revealed activations associated with the processing of complex rhythms, including auditory, motor, and frontal areas. Results were compared with the findings of a previously published study, which analyzed the same fMRI data but applied linear PCA for generating stimulus features. To enable comparison of the results, the methodology for finding stimulus-driven functional maps was adopted from the previous study. Exploiting nonlinear relationships among acoustic descriptors can lead to novel high-level stimulus features, which can in turn reveal new brain structures involved in music processing. Copyright © 2018 Elsevier B.V. All rights reserved.
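
    The central step can be sketched as follows; a random matrix stands in for the real acoustic descriptors, and the kernel and its parameters are illustrative choices.

    ```python
    # Kernel PCA on an acoustic descriptor matrix (frames x descriptors)
    # yields nonlinear high-level stimulus feature time courses.
    import numpy as np
    from sklearn.decomposition import KernelPCA

    rng = np.random.default_rng(0)
    descriptors = rng.normal(size=(500, 25))    # 500 frames x 25 descriptors

    kpca = KernelPCA(n_components=5, kernel="rbf", gamma=0.1)
    features = kpca.fit_transform(descriptors)  # nonlinear stimulus features
    print(features.shape)                       # (500, 5)
    ```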

  7. SNR-adaptive stream weighting for audio-MES ASR.

    PubMed

    Lee, Ki-Seung

    2008-08-01

    Myoelectric signals (MESs) from the speaker's mouth region have been successfully shown to improve the noise robustness of automatic speech recognizers (ASRs), thus promising to extend their usability in implementing noise-robust ASR. In the recognition system presented herein, extracted audio and facial MES features were integrated by a decision fusion method, where the likelihood score of the audio-MES observation vector was given by a linear combination of the class-conditional observation log-likelihoods of two classifiers, using appropriate weights. We developed a weighting process adaptive to SNRs. The main objective of the paper is to determine the optimal SNR classification boundaries and to construct a set of optimum stream weights for each SNR class. These two parameters were determined by a method based on a maximum mutual information criterion. Acoustic and facial MES data were collected from five subjects, using a 60-word vocabulary. Four types of acoustic noise, including babble, car, aircraft, and white noise, were acoustically added to clean speech signals with SNR ranging from -14 to 31 dB. The classification accuracy of the audio ASR was as low as 25.5%, whereas the classification accuracy of the MES ASR was 85.2%. The classification accuracy could be further improved by employing the proposed audio-MES weighting method, reaching as high as 89.4% in the case of babble noise. A similar result was also found for the other types of noise.
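
    The fusion rule can be sketched as follows; the SNR class boundaries and stream weights here are illustrative, not the values trained with the maximum mutual information criterion.

    ```python
    # Combine per-class log-likelihoods from the audio and MES classifiers
    # with an SNR-dependent weight, then pick the best-scoring class.
    import numpy as np

    def fuse(loglik_audio, loglik_mes, snr_db, boundaries=(-5.0, 10.0),
             weights=(0.2, 0.5, 0.8)):
        """Select an audio weight by SNR class, then combine scores linearly."""
        w = weights[np.searchsorted(boundaries, snr_db)]
        return w * loglik_audio + (1.0 - w) * loglik_mes

    audio_scores = np.array([-4.1, -2.0, -6.3])   # log-likelihood per word class
    mes_scores = np.array([-3.0, -5.5, -1.2])
    fused = fuse(audio_scores, mes_scores, snr_db=0.0)
    print(int(np.argmax(fused)))                  # recognized class index
    ```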

  8. Integrated science building

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Conklin, Shane

    2013-09-30

    Shell-space fit-out included faculty office and advising space, student study space, a staff restroom and a lobby cafe. Electrical, HVAC and fire alarm installations and upgrades to existing systems were required to support the newly configured spaces. These installations and upgrades included audio/visual equipment, additional electrical outlets and connections to emergency generators. The project provided increased chilled water capacity with the addition of an electric centrifugal chiller. Upgrades associated with the chiller included upgrading the exhaust ventilation fan, electrical conductor and breaker upgrades, piping and upgrades to air handling equipment.

  9. Audio-visual onset differences are used to determine syllable identity for ambiguous audio-visual stimulus pairs

    PubMed Central

    ten Oever, Sanne; Sack, Alexander T.; Wheat, Katherine L.; Bien, Nina; van Atteveldt, Nienke

    2013-01-01

    Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal window of integration (TWI). This suggests that temporal cues denote whether unimodal stimuli belong together, that is, whether they should be integrated. It is not known whether temporal cues also provide information about the identity of a syllable. Since spoken syllables have naturally varying AV onset asynchronies, we hypothesize that for suboptimal AV cues presented within the TWI, information about the natural AV onset differences can aid in speech identification. To test this, we presented low-intensity auditory syllables concurrently with visual speech signals, and varied the stimulus onset asynchronies (SOA) of the AV pair, while participants were instructed to identify the auditory syllables. We revealed that specific speech features (e.g., voicing) were identified by relying primarily on one modality (e.g., auditory). Additionally, we showed a wide window in which visual information influenced auditory perception, that seemed even wider for congruent stimulus pairs. Finally, we found a specific response pattern across the SOA range for syllables that were not reliably identified by the unimodal cues, which we explained as the result of the use of natural onset differences between AV speech signals. This indicates that temporal cues not only provide information about the temporal integration of AV stimuli, but additionally convey information about the identity of AV pairs. These results provide a detailed behavioral basis for further neuro-imaging and stimulation studies to unravel the neurofunctional mechanisms of the audio-visual-temporal interplay within speech perception. PMID:23805110

  10. Audio-visual onset differences are used to determine syllable identity for ambiguous audio-visual stimulus pairs.

    PubMed

    Ten Oever, Sanne; Sack, Alexander T; Wheat, Katherine L; Bien, Nina; van Atteveldt, Nienke

    2013-01-01

    Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal window of integration (TWI). This suggests that temporal cues denote whether unimodal stimuli belong together, that is, whether they should be integrated. It is not known whether temporal cues also provide information about the identity of a syllable. Since spoken syllables have naturally varying AV onset asynchronies, we hypothesize that for suboptimal AV cues presented within the TWI, information about the natural AV onset differences can aid in speech identification. To test this, we presented low-intensity auditory syllables concurrently with visual speech signals, and varied the stimulus onset asynchronies (SOA) of the AV pair, while participants were instructed to identify the auditory syllables. We revealed that specific speech features (e.g., voicing) were identified by relying primarily on one modality (e.g., auditory). Additionally, we showed a wide window in which visual information influenced auditory perception, that seemed even wider for congruent stimulus pairs. Finally, we found a specific response pattern across the SOA range for syllables that were not reliably identified by the unimodal cues, which we explained as the result of the use of natural onset differences between AV speech signals. This indicates that temporal cues not only provide information about the temporal integration of AV stimuli, but additionally convey information about the identity of AV pairs. These results provide a detailed behavioral basis for further neuro-imaging and stimulation studies to unravel the neurofunctional mechanisms of the audio-visual-temporal interplay within speech perception.

  11. Expedition 14 crew in the Zvezda Service module

    NASA Image and Video Library

    2006-12-25

    ISS014-E-10242 (25 Dec. 2006) --- Cosmonaut Mikhail Tyurin (left), Expedition 14 flight engineer representing Russia's Federal Space Agency; astronaut Michael E. Lopez-Alegria, commander and NASA space station science officer; and astronaut Sunita L. Williams, flight engineer, conduct a teleconference with the Moscow Support Group for the Russian New Year celebration, via Ku- and S-band, with audio and video relayed to the Mission Control Center at Johnson Space Center.

  12. Towards Structural Analysis of Audio Recordings in the Presence of Musical Variations

    NASA Astrophysics Data System (ADS)

    Müller, Meinard; Kurth, Frank

    2006-12-01

    One major goal of structural analysis of an audio recording is to automatically extract the repetitive structure or, more generally, the musical form of the underlying piece of music. Recent approaches to this problem work well for music where the repetitions largely agree with respect to instrumentation and tempo, as is typically the case for popular music. For other classes of music such as Western classical music, however, musically similar audio segments may exhibit significant variations in parameters such as dynamics, timbre, execution of note groups, modulation, articulation, and tempo progression. In this paper, we propose a robust and efficient algorithm for audio structure analysis, which allows us to identify musically similar segments even in the presence of large variations in these parameters. To account for such variations, our main idea is to incorporate invariance at various levels simultaneously: we design a new type of statistical feature to absorb microvariations, introduce an enhanced local distance measure to account for local variations, and describe a new strategy for structure extraction that can cope with the global variations. Our experimental results with classical and popular music show that our algorithm performs successfully even in the presence of significant musical variations.
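
    A common first step toward such structure extraction is a self-similarity matrix over robust features; the following hedged sketch uses chroma features and a synthetic A-B-A tone sequence in place of a real recording.

    ```python
    # Repeated sections show up as diagonal stripes in a self-similarity
    # matrix computed over chroma features.
    import numpy as np
    import librosa

    sr = 22050
    tone = lambda f, d: np.sin(2 * np.pi * f * np.arange(int(d * sr)) / sr)
    y = np.concatenate([tone(440, 2), tone(523, 2), tone(440, 2)])   # A-B-A form

    chroma = librosa.feature.chroma_stft(y=y, sr=sr)                 # 12 x frames
    chroma = chroma / (np.linalg.norm(chroma, axis=0, keepdims=True) + 1e-9)

    ssm = chroma.T @ chroma    # cosine similarity between all frame pairs
    print(ssm.shape)           # (frames, frames); stripes mark repetitions
    ```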

  13. Long-Term Animal Observation by Wireless Sensor Networks with Sound Recognition

    NASA Astrophysics Data System (ADS)

    Liu, Ning-Han; Wu, Chen-An; Hsieh, Shu-Ju

    Because wireless sensor networks can transmit data wirelessly and can be deployed easily, they are used in the wild to monitor environmental change. However, a sensor's lifetime is limited by its battery; when the monitored data type is audio, in particular, the lifetime is very short because of the huge amount of data transmitted. Intuitively, a sensor mote that analyzes the sensed data and decides not to deliver them to the server can reduce its energy expenditure. Nevertheless, a sensor mote is not powerful enough to run complicated methods. Designing a method that maintains analysis speed and accuracy under restricted memory and processing power is therefore a pressing issue. This research proposes an embedded audio processing module in the sensor mote to extract and analyze audio features in advance. Then, by estimating the likelihood of an observed animal sound from its frequency distribution, only the interesting audio data are sent back to the server. A prototype WSN system was built and tested in the wild to observe frogs. According to the experimental results, the energy consumed by the sensors can be reduced effectively by our method, prolonging the observation time of the animal-detecting sensors.

  14. Digital Multicasting of Multiple Audio Streams

    NASA Technical Reports Server (NTRS)

    Macha, Mitchell; Bullock, John

    2007-01-01

    The Mission Control Center Voice Over Internet Protocol (MCC VOIP) system (see figure) comprises hardware and software that effect simultaneous, nearly real-time transmission of as many as 14 different audio streams to authorized listeners via the MCC intranet and/or the Internet. The original version of the MCC VOIP system was conceived to enable flight-support personnel located in offices outside a spacecraft mission control center to monitor audio loops within the mission control center. Different versions of the MCC VOIP system could be used for a variety of public and commercial purposes - for example, to enable members of the general public to monitor one or more NASA audio streams through their home computers, to enable air-traffic supervisors to monitor communication between airline pilots and air-traffic controllers in training, and to monitor conferences among brokers in a stock exchange. At the transmitting end, the audio-distribution process begins with feeding the audio signals to analog-to-digital converters. The resulting digital streams are sent through the MCC intranet, using a user datagram protocol (UDP), to a server that converts them to encrypted data packets. The encrypted data packets are then routed to the personal computers of authorized users by use of multicasting techniques. The total data-processing load on the portion of the system upstream of and including the encryption server is the total load imposed by all of the audio streams being encoded, regardless of the number of the listeners or the number of streams being monitored concurrently by the listeners. The personal computer of a user authorized to listen is equipped with special-purpose MCC audio-player software. When the user launches the program, the user is prompted to provide identification and a password. In one of two access-control provisions, the program is hard-coded to validate the user's identity and password against a list maintained on a domain-controller computer at the MCC. In the other access-control provision, the program verifies that the user is authorized to have access to the audio streams. Once both access-control checks are completed, the audio software presents a graphical display that includes audiostream-selection buttons and volume-control sliders. The user can select all or any subset of the available audio streams and can adjust the volume of each stream independently of that of the other streams. The audio-player program spawns a "read" process for the selected stream(s). The spawned process sends, to the router(s), a "multicast-join" request for the selected streams. The router(s) responds to the request by sending the encrypted multicast packets to the spawned process. The spawned process receives the encrypted multicast packets and sends a decryption packet to audio-driver software. As the volume or muting features are changed by the user, interrupts are sent to the spawned process to change the corresponding attributes sent to the audio-driver software. The total latency of this system - that is, the total time from the origination of the audio signals to generation of sound at a listener's computer - lies between four and six seconds.
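
    The client-side multicast join that the spawned "read" process performs can be sketched as follows; the group address, port, and packet handling are placeholders, and the decryption and playback stages are omitted.

    ```python
    # Join a multicast group and read audio datagrams over UDP.
    import socket
    import struct

    GROUP, PORT = "239.1.2.3", 5004              # hypothetical group/port

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))

    # The multicast-join request: routers then forward this group's packets.
    mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    for _ in range(10):                          # read a few encrypted datagrams
        packet, addr = sock.recvfrom(2048)
        # a real client would decrypt here and hand samples to the audio driver
        print(len(packet), "bytes from", addr)
    ```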

  15. Multimodal Speaker Diarization.

    PubMed

    Noulas, A; Englebienne, G; Krose, B J A

    2012-01-01

    We present a novel probabilistic framework that fuses information coming from the audio and video modalities to perform speaker diarization. The proposed framework is a Dynamic Bayesian Network (DBN) that is an extension of a factorial Hidden Markov Model (fHMM) and models the people appearing in an audiovisual recording as multimodal entities that generate observations in the audio stream, the video stream, and the joint audiovisual space. The framework is very robust to different contexts, makes no assumptions about the location of the recording equipment, and does not require labeled training data as it acquires the model parameters using the Expectation Maximization (EM) algorithm. We apply the proposed model to two meeting videos and a news broadcast video, all of which come from publicly available data sets. The results acquired in speaker diarization favor the proposed multimodal framework, which outperforms the single-modality analysis results and improves over the state-of-the-art audio-based speaker diarization.

  16. Virtual classroom

    NASA Astrophysics Data System (ADS)

    Carlowicz, Michael

    After four decades of perfecting techniques for communication with spacecraft on the way to other worlds, space scientists are now working on new ways to reach students in this one. In a partnership between NASA and the University of North Dakota (UND), scientists and engineers from both institutions will soon lead an experiment in Internet learning. Starting January 22, UND will offer a three-month computerized course in telerobotics. Using RealAudio and CU-SeeMe channels of the Internet to allow real-time transmission of video and audio, instructors will teach college- and graduate-level students the fundamentals of the remote operation and control of a robot.

  17. Burbank uses video camera during installation and routing of HRCS Video Cables

    NASA Image and Video Library

    2012-02-01

    ISS030-E-060104 (1 Feb. 2012) --- NASA astronaut Dan Burbank, Expedition 30 commander, uses a video camera in the Destiny laboratory of the International Space Station during installation and routing of video cable for the High Rate Communication System (HRCS). HRCS will allow for two additional space-to-ground audio channels and two additional downlink video channels.

  18. Non-RF wireless helmet-mounted display and two-way audio connectivity using covert free-space optical communications

    NASA Astrophysics Data System (ADS)

    Strauss, M.; Volfson, L.

    2011-06-01

    Providing the warfighter with Head- or Helmet-Mounted Displays (HMDs) in tracked vehicles provides a means to visually maintain access to systems information while in a high-vibration environment. The high vibration and unique environment of military tracked and turreted vehicles impact the ability to distinctly see certain information on an HMD, especially small fonts or graphics and information that requires long fixation (staring) rather than a brief or peripheral glance. Information on the military and commercial use of HMDs was compiled from market research, market trends, and user feedback. Lessons learned from previous military and commercial use of HMD products were derived to determine the feasibility of HMD use in the high-vibration and unique environments of tracked vehicles. The results are summarized into factors that determine the HMD features which must be specified for successful implementation.

  19. Audio-vestibular signs and symptoms in Chiari malformation type I. Case series and literature review.

    PubMed

    Guerra Jiménez, Gloria; Mazón Gutiérrez, Ángel; Marco de Lucas, Enrique; Valle San Román, Natalia; Martín Laez, Rubén; Morales Angulo, Carmelo

    2015-01-01

    Chiari malformation is an alteration of the base of the skull with herniation through the foramen magnum of the brain stem and cerebellum. Although the most common presentation is occipital headache, the association of audio-vestibular symptoms is not rare. The aim of our study was to describe audio-vestibular signs and symptoms in Chiari malformation type I (CM-I). We performed a retrospective observational study of patients referred to our unit during the last 5 years. We also carried out a literature review of audio-vestibular signs and symptoms in this disease. There were 9 patients (2 males and 7 females), with an average age of 42.8 years. Five patients presented a Ménière-like syndrome; 2 cases, a recurrent vertigo with peripheral features; one patient showed a sudden hearing loss; and one case suffered a sensorineural hearing loss with early childhood onset. The most common audio-vestibular symptom indicated in the literature in patients with CM-I is unsteadiness (49%), followed by dizziness (18%), nystagmus (15%) and hearing loss (15%). Nystagmus is frequently horizontal (74%) or down-beating (18%). Other audio-vestibular signs and symptoms are tinnitus (11%), aural fullness (10%) and hyperacusis (1%). Occipital headache that increases with Valsalva manoeuvres and hand paresthesias are very suggestive symptoms. The appearance of audio-vestibular manifestations in CM-I makes it common to refer these patients to neurotologists. Unsteadiness, vertiginous syndromes and sensorineural hearing loss are frequent. Nystagmus, especially horizontal and down-beating, is not rare. It is important for neurotologists to familiarise themselves with CM-I symptoms to be able to consider it in differential diagnosis. Copyright © 2014 Elsevier España, S.L.U. y Sociedad Española de Otorrinolaringología y Patología Cérvico-Facial. All rights reserved.

  20. Williams communicates with the boys and girls at Middle Tennessee Nashville School during Expedition 13

    NASA Image and Video Library

    2006-08-17

    ISS013-E-67441 (17 Aug. 2006) --- Astronaut Jeffrey N. Williams, Expedition 13 NASA space station science officer and flight engineer, holds a sleeping bag while conducting an educational teleconference with the Boys and Girls Clubs of Middle Tennessee in Nashville, via Ku- and S-band in the Destiny laboratory of the International Space Station, with audio and video relayed to the Mission Control Center at Johnson Space Center.

  1. The NT digital micro tape recorder

    NASA Technical Reports Server (NTRS)

    Sasaki, Toshikazu; Alstad, John; Younker, Mike

    1993-01-01

    The description of an audio recorder may at first glance seem out of place in a conference which has been dedicated to the discussion of the technology and requirements of mass data storage. However, there are several advanced features of the NT system which will be of interest to the mass storage technologist. Moreover, there are a sufficient number of data storage formats in current use which have evolved from their audio counterparts to recommend close attention to major innovative introductions of audio storage formats. While the existing analog micro-cassette recorder has been (and will continue to be) adequate for various uses, there are significant benefits to be gained through the application of digital technology. The elimination of background tape hiss and the availability of two relatively wide band channels (for stereo recording), for example, would greatly enhance listenability and speech intelligibility. And with the use of advanced high-density recording and LSI circuit technologies, a digital micro recorder can realize unprecedented compactness with excellent energy efficiency. This is what was accomplished with the NT-1 Digital Micro Recorder. Its remarkably compact size contributes to its portability. The high-density NT format enables up to two hours of low-noise digital stereo recording on a cassette the size of a postage stamp. Its highly energy-efficient mechanical and electrical design results in low power consumption; the unit can be operated up to 7 hours (for continuous recording) on a single AA alkaline battery. Advanced user conveniences include a multifunction LCD readout. The unit's compactness and energy-efficiency, in particular, are attributes that cannot be matched by existing analog and digital audio formats. The size, performance, and features of the NT format are of benefit primarily to those who desire improved portability and audio quality in a personal memo product. The NT Recorder is the result of over ten years of intensive, multi-disciplinary research and development. What follows is a discussion of the technologies that have made the NT possible: (1) NT format mechanics, (2) NT media, (3) NT circuitry and board.

  2. Social and Physical Environmental Factors Influencing Adolescents' Physical Activity in Urban Public Open Spaces: A Qualitative Study Using Walk-Along Interviews.

    PubMed

    Van Hecke, Linde; Deforche, Benedicte; Van Dyck, Delfien; De Bourdeaudhuij, Ilse; Veitch, Jenny; Van Cauwenberg, Jelle

    2016-01-01

    Most previous studies examining physical activity in Public Open Spaces (POS) focused solely on the physical environment. However, according to socio-ecological models the social environment is important as well. The aim of this study was to determine which social and physical environmental factors affect adolescents' visitation and physical activity in POS in low-income neighbourhoods. Since current knowledge on this topic is limited, especially in Europe, qualitative walk-along interviews were used to obtain detailed and context-specific information. Participants (n = 30, aged 12-16 years, 64% boys) were recruited in POS in low-income neighbourhoods in Brussels, Ghent and Antwerp (Belgium). Participants were interviewed while walking in the POS with the interviewer. Using this method, the interviewer could observe and ask questions while the participant was actually experiencing the environment. All audio-recorded interviews were transcribed and analysed using Nvivo 10 software and thematic analysis was used to derive categories and subcategories using a grounded theory approach. The most important subcategories that were supportive of visiting POS and performing physical activity in POS were: accessibility by foot/bicycle/public transport, located close to home/school, presence of (active) friends and family, cleanliness of the POS and features, availability of sport and play facilities, large open spaces and beautiful sceneries. The most important subcategories that were unsupportive of visiting POS and physical activity in POS were: presence of undesirable users (drug users, gangs and homeless people), the behaviour of other users and the cleanliness of the POS and features. Social factors often appeared more influential than physical factors; however, it was the combination of social and physical factors that affected adolescents' behaviour in POS. Easily accessible POS with high quality features in the proximity of adolescents' home or school may stimulate physical activity, if adolescents also experience a safe and familiar social environment.

  3. Social and Physical Environmental Factors Influencing Adolescents’ Physical Activity in Urban Public Open Spaces: A Qualitative Study Using Walk-Along Interviews

    PubMed Central

    Van Hecke, Linde; Deforche, Benedicte; Van Dyck, Delfien; De Bourdeaudhuij, Ilse; Veitch, Jenny; Van Cauwenberg, Jelle

    2016-01-01

    Most previous studies examining physical activity in Public Open Spaces (POS) focused solely on the physical environment. However, according to socio-ecological models the social environment is important as well. The aim of this study was to determine which social and physical environmental factors affect adolescents’ visitation and physical activity in POS in low-income neighbourhoods. Since current knowledge on this topic is limited, especially in Europe, qualitative walk-along interviews were used to obtain detailed and context-specific information. Participants (n = 30, aged 12–16 years, 64% boys) were recruited in POS in low-income neighbourhoods in Brussels, Ghent and Antwerp (Belgium). Participants were interviewed while walking in the POS with the interviewer. Using this method, the interviewer could observe and ask questions while the participant was actually experiencing the environment. All audio-recorded interviews were transcribed and analysed using Nvivo 10 software and thematic analysis was used to derive categories and subcategories using a grounded theory approach. The most important subcategories that were supportive of visiting POS and performing physical activity in POS were: accessibility by foot/bicycle/public transport, located close to home/school, presence of (active) friends and family, cleanliness of the POS and features, availability of sport and play facilities, large open spaces and beautiful sceneries. The most important subcategories that were unsupportive of visiting POS and physical activity in POS were: presence of undesirable users (drug users, gangs and homeless people), the behaviour of other users and the cleanliness of the POS and features. Social factors often appeared more influential than physical factors; however, it was the combination of social and physical factors that affected adolescents’ behaviour in POS. Easily accessible POS with high quality features in the proximity of adolescents’ home or school may stimulate physical activity, if adolescents also experience a safe and familiar social environment. PMID:27214385

  4. Optical Disk Technology.

    ERIC Educational Resources Information Center

    Abbott, George L.; And Others

    1987-01-01

    This special feature focuses on recent developments in optical disk technology. Nine articles discuss current trends, large scale image processing, data structures for optical disks, the use of computer simulators to create optical disks, videodisk use in training, interactive audio video systems, impacts on federal information policy, and…

  5. Cassini/Huygens your messages en route for Titan

    NASA Astrophysics Data System (ADS)

    1997-02-01

    This unprecedented operation will soon be coming to an end. The Internet site will not be accepting messages after 1 March 1997, although it will remain active to incorporate regularly updated information on the Cassini/Huygens mission at least until the launch in October 1997. ESA has decided to open the site to audio messages as well until 1 March 1997. Files, which must be no larger than 250 KB, should be sent to the electronic letterbox sound@huygens.com in WAV or AIFF format. We shall be concluding this exceptional operation by producing a CD-ROM containing written and audio messages within the limits of the space available. We therefore propose that all media announce the forthcoming end of the operation and stress the additional possibility of sending in audio as well as written messages that will leave for Titan in October 1997. Radio or television stations wishing to offer listeners or viewers the opportunity to transmit their messages over the air can forward them to us on audio or video (Betacam) cassettes. N.B. No more than three twenty-second messages per station. For further information, please contact: ESA Public Relations Division Tel: +33.1.53.69.71.55 Fax: +33.1.53.69.76.90

  6. Securing Digital Audio using Complex Quadratic Map

    NASA Astrophysics Data System (ADS)

    Suryadi, MT; Satria Gunawan, Tjandra; Satria, Yudi

    2018-03-01

    In this digital era, exchanging data is common and easy to do, which leaves data vulnerable to attack and manipulation by unauthorized parties. One data type that is vulnerable to attack is digital audio, so a data-securing method that is both robust and fast is needed. One method that matches these criteria is securing the data using a chaos function. The chaos function used in this research is the complex quadratic map (CQM). For certain parameter values, the key stream generated by the CQM function passes all 15 NIST tests, which means the key stream generated using this CQM is proven to be random. In addition, samples of encrypted digital sound, when tested using a goodness-of-fit test, are proven to be uniform, so securing digital audio using this method is not vulnerable to frequency-analysis attack. The key space is very large, about 8.1 × 10^31 possible keys, and the key sensitivity is very small, about 10^-10, so this method is also not vulnerable to brute-force attack. Finally, the processing speed for both encryption and decryption is on average about 450 times faster than the digital audio's duration.
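
    As an illustration only (not the authors' implementation), the sketch below shows the general shape of chaos-based stream encryption with a complex quadratic map in Python. The parameter c, the wrapping step that keeps the orbit bounded, the byte-extraction rule and the input file name are all assumptions made for the sake of a runnable example; the paper's actual parameterization is what the NIST suite certified.

      def cqm_keystream(n_bytes, z0=0.1 + 0.1j, c=-0.4 + 0.3j):
          """Pseudo-random bytes from an orbit of z -> z*z + c (illustrative)."""
          z, out = z0, bytearray()
          for _ in range(n_bytes):
              z = z * z + c
              z = complex(z.real % 1.0, z.imag % 1.0)  # wrap to keep the orbit bounded (assumption)
              out.append(int((z.real + z.imag) * 1e6) % 256)
          return bytes(out)

      def xor_cipher(data, key):
          """XOR stream cipher; encryption and decryption are the same operation."""
          return bytes(a ^ k for a, k in zip(data, key))

      raw = open("speech.wav", "rb").read()   # hypothetical input file
      key = cqm_keystream(len(raw))
      enc = xor_cipher(raw, key)              # encrypt
      assert xor_cipher(enc, key) == raw      # decrypting restores the original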

  7. NASA Bluetooth Wireless Communications

    NASA Technical Reports Server (NTRS)

    Miller, Robert D.

    2007-01-01

    NASA has been interested in wireless communications for many years, especially when the crew size of the International Space Station (ISS) was reduced to two members. NASA began a study to find ways to improve crew efficiency to make sure the ISS could be maintained with limited crew capacity and still be a valuable research testbed in Low-Earth Orbit (LEO). Currently the ISS audio system requires astronauts to be tethered to the audio system, specifically a device called the Audio Terminal Unit (ATU). Wireless communications would remove the tether and allow astronauts to freely float from experiment to experiment without having to worry about moving and reconnecting the associated cabling or finding the space equivalent of an extension cord. A wireless communication system would also improve safety and reduce system susceptibility to Electromagnetic Interference (EMI). Safety would be improved because a crewmember could quickly escape a fire while maintaining communications with the ground and other crewmembers at any location. In addition, it would allow the crew to overcome the volume limitations of the ISS ATU. This is especially important to the Portable Breathing Apparatus (PBA). The next generation of space vehicles and habitats also demand wireless attention. Orion will carry up to six crewmembers in a relatively small cabin. Yet, wireless could become a driving factor to reduce launch weight and increase habitable volume. Six crewmembers, each tethered to a panel, could result in a wiring mess even in nominal operations. In addition to Orion, research is being conducted to determine if Bluetooth is appropriate for Lunar Habitat applications.

  8. Characterization of Clastic Dikes Using Controlled Source Audio Magnetotellurics

    NASA Astrophysics Data System (ADS)

    Persichetti, J. A.; Alumbaugh, D.

    2001-12-01

    A site consisting of 3D geology on the Hanford Reservation in Hanford, Washington, has been surveyed using Controlled Source Audio Magnetotellurics (CSAMT) to determine the method's ability to detect clastic dikes. The dikes are fine-grained, soft-sediment intrusions, formed by the buoyant rise of buried, unconsolidated, water-rich mud into overlying unconsolidated sediment. The dikes are of major importance because they may act as natural barriers inhibiting the spread of contaminants, or as conduits, allowing the contaminants to be quickly wicked away from the contaminant storage tanks that may be located in the close vicinity of the dikes. The field setup consisted of a 33 meter by 63 meter receiver grid with 3 meter spacing in all directions, with the transmitter positioned 71.5 meters from the center of the receiver grid. A total of 12 frequencies were collected from 1.1 kHz to 66.2 kHz. The CSAMT data are being analyzed using a 2D CSAMT RRI code (Lu, Unsworth and Booker, 1999) and a 2D MT RRI code (Smith and Booker, 1991). Of interest is examining how well the 2D codes are able to map 3D geology, the level of resolution that is obtained, and how important it is to include the 3D source in the solution. The ultimate goal is to determine the applicability of using CSAMT for mapping these types of features at the Hanford Reservation site.

  9. Transitioning from analog to digital audio recording in childhood speech sound disorders.

    PubMed

    Shriberg, Lawrence D; McSweeny, Jane L; Anderson, Bruce E; Campbell, Thomas F; Chial, Michael R; Green, Jordan R; Hauner, Katherina K; Moore, Christopher A; Rusiewicz, Heather L; Wilson, David L

    2005-06-01

    Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing a reference database for research in childhood speech sound disorders. Two research transcribers with different levels of experience glossed, transcribed, and prosody-voice coded conversational speech samples from eight children with mild to severe speech disorders of unknown origin. The samples were recorded, stored, and played back using representative analog and digital audio systems. Effect sizes calculated for an array of analog versus digital comparisons ranged from negligible to medium, with a trend for participants' speech competency scores to be slightly lower for samples obtained and transcribed using the digital system. We discuss the implications of these and other findings for research and clinical practice.

  10. Transitioning from analog to digital audio recording in childhood speech sound disorders

    PubMed Central

    Shriberg, Lawrence D.; McSweeny, Jane L.; Anderson, Bruce E.; Campbell, Thomas F.; Chial, Michael R.; Green, Jordan R.; Hauner, Katherina K.; Moore, Christopher A.; Rusiewicz, Heather L.; Wilson, David L.

    2014-01-01

    Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing a reference database for research in childhood speech sound disorders. Two research transcribers with different levels of experience glossed, transcribed, and prosody-voice coded conversational speech samples from eight children with mild to severe speech disorders of unknown origin. The samples were recorded, stored, and played back using representative analog and digital audio systems. Effect sizes calculated for an array of analog versus digital comparisons ranged from negligible to medium, with a trend for participants’ speech competency scores to be slightly lower for samples obtained and transcribed using the digital system. We discuss the implications of these and other findings for research and clinical practice. PMID:16019779

  11. A Self-Paced Physical Geology Laboratory.

    ERIC Educational Resources Information Center

    Watson, Donald W.

    1983-01-01

    Describes a self-paced geology course utilizing a diversity of instructional techniques, including maps, models, samples, audio-visual materials, and a locally developed laboratory manual. Mechanical features are laboratory exercises, followed by unit quizzes; quizzes are repeated until the desired level of competence is attained. (Author/JN)

  12. Paper Trails

    ERIC Educational Resources Information Center

    Fernandez, Kim

    2010-01-01

    With more and more people attached to their computers, it's no wonder that publications are increasingly going online. Magazines are either supplementing their print content with online bonus information, such as extended features, photos, audio files, or videos, or looking to ditch the printing presses entirely to focus on all-electronic…

  13. Preplanning and Evaluating Video Documentaries and Features.

    ERIC Educational Resources Information Center

    Maynard, Riley

    1997-01-01

    This article presents a ten-part pre-production outline and post-production evaluation that helps communications students more effectively improve video skills. Examines camera movement and motion, camera angle and perspective, lighting, audio, graphics, backgrounds and color, special effects, editing, transitions, and music. Provides a glossary…

  14. Proposed patient motion monitoring system using feature point tracking with a web camera.

    PubMed

    Miura, Hideharu; Ozawa, Shuichi; Matsuura, Takaaki; Yamada, Kiyoshi; Nagata, Yasushi

    2017-12-01

    Patient motion monitoring systems play an important role in providing accurate treatment dose delivery. We propose a system that utilizes a web camera (frame rate up to 30 fps, maximum resolution of 640 × 480 pixels) and in-house image processing software (developed using Microsoft Visual C++ and OpenCV). This system is simple to use and convenient to set up. The pyramidal Lucas-Kanade method was applied to calculate the motion of each feature point by analysing two consecutive frames. The image processing software employs a color scheme in which the defined feature points are blue under stable (no movement) conditions and turn red, along with a warning message and an audio signal (beeping alarm), for large patient movements. The initial position of the marker was used by the program to determine the marker positions in all the frames. The software generates a text file that contains the calculated motion for each frame and saves the recording as a compressed audio video interleave (AVI) file. We propose a patient motion monitoring system using a web camera, which is simple and convenient to set up, to increase the safety of treatment delivery.
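
    For readers unfamiliar with the technique, a minimal OpenCV sketch of the monitoring loop described above: pyramidal Lucas-Kanade tracking of feature points across consecutive webcam frames, with a console warning standing in for the red-point/beeping alarm. The camera index and motion threshold are assumptions, not values from the paper.

      import cv2
      import numpy as np

      cap = cv2.VideoCapture(0)                          # webcam (index assumed)
      ok, frame = cap.read()
      prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
      pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50,
                                    qualityLevel=0.01, minDistance=10)

      THRESHOLD_PX = 5.0                                 # assumed motion tolerance

      while True:
          ok, frame = cap.read()
          if not ok:
              break
          gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
          # pyramidal Lucas-Kanade: track each feature point into the new frame
          new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
          good_new = new_pts[status.ravel() == 1]
          good_old = pts[status.ravel() == 1]
          motion = np.linalg.norm(good_new - good_old, axis=-1).mean()
          if motion > THRESHOLD_PX:
              print("WARNING: large patient movement detected")  # beep/red in the real system
          prev_gray, pts = gray, good_new.reshape(-1, 1, 2)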

  15. Automatic lip reading by using multimodal visual features

    NASA Astrophysics Data System (ADS)

    Takahashi, Shohei; Ohya, Jun

    2013-12-01

    Speech recognition has been researched for a long time, but it does not work well in noisy places such as in a car or on a train. In addition, people who are hearing-impaired or have difficulty hearing cannot benefit from speech recognition. To recognize speech automatically, visual information is also important: people understand speech not only from audio information, but also from visual information such as temporal changes in lip shape. A vision-based speech recognition method could work well in noisy places, and could also be useful for people with hearing disabilities. In this paper, we propose an automatic lip-reading method for recognizing speech using multimodal visual information, without using any audio information. First, the ASM (Active Shape Model) is used to track and detect the face and lips in a video sequence. Second, the shape, optical flow and spatial frequencies of the lip features are extracted from the lip region detected by the ASM. Next, the extracted multimodal features are ordered chronologically, and a Support Vector Machine is applied to learn and classify the spoken words. Experiments on classifying several words show promising results for the proposed method.
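
    Schematically, the classification stage amounts to concatenating the per-frame lip features in time order and training an SVM. The sketch below shows only that shape, with random stand-in features and labels, since the ASM tracking and feature extraction are beyond a short example; all dimensions are placeholders.

      import numpy as np
      from sklearn.svm import SVC

      n_utterances, n_frames, n_feats = 200, 30, 24
      # X[i] holds the frame-by-frame lip features of utterance i, time-ordered
      X = np.random.rand(n_utterances, n_frames, n_feats)   # stand-in features
      y = np.random.randint(0, 5, size=n_utterances)        # e.g. 5 spoken words

      X_flat = X.reshape(n_utterances, -1)   # chronological concatenation
      clf = SVC(kernel="rbf").fit(X_flat[:150], y[:150])
      print("held-out accuracy:", clf.score(X_flat[150:], y[150:]))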

  16. Can You See Me Now?

    ERIC Educational Resources Information Center

    Raths, David

    2013-01-01

    Ten years ago, integrating videoconferencing into a college course required considerable effort on the part of the instructor and IT support staff. Today, video- and web-conferencing tools are more sophisticated. Distance education has morphed from audio- and videocassettes featuring talking heads to a more interactive experience with greater…

  17. A Comparative Evaluation of Videodiscs for General Biology.

    ERIC Educational Resources Information Center

    Ralph, Charles L.

    1995-01-01

    Provides a brief profile of the currently available videodiscs for general biology, with comparable information for each. An introduction discusses benefits and problems associated with videodisc use in the classroom. Profiles contain information on description, good and bad features, still images, animations and movies, audio, software,…

  18. Enhancing L2 Reading Comprehension with Hypermedia Texts: Student Perceptions

    ERIC Educational Resources Information Center

    Garrett-Rucks, Paula; Howles, Les; Lake, William M.

    2015-01-01

    This study extends current research about L2 hypermedia texts by investigating the combined use of audiovisual features including: (a) Contextualized images, (b) rollover translations, (c) cultural information, (d) audio explanations and (e) comprehension check exercises. Specifically, student perceptions of hypermedia readings compared to…

  19. Supporting the Essential Elements with CD-ROM Storybooks

    ERIC Educational Resources Information Center

    Pearman, Cathy J.; Lefever-Davis, Shirley

    2006-01-01

    CD-ROM storybooks can support the development of the five essential elements of reading instruction identified by The National Reading Panel: phonemic awareness, phonics, fluency, vocabulary, and comprehension. Specific features inherent in these texts, audio pronunciation of text, embedded vocabulary definitions and animated graphics can be used…

  20. The Application of Acoustic Measurements and Audio Recordings for Diagnosis of In-Flight Hardware Anomalies

    NASA Technical Reports Server (NTRS)

    Welsh, David; Denham, Samuel; Allen, Christopher

    2011-01-01

    In many cases, an initial symptom of hardware malfunction is unusual or unexpected acoustic noise. Many industries, such as automotive, heating and air conditioning, and petro-chemical processing, use noise and vibration data along with rotating machinery analysis techniques to identify noise sources and correct hardware defects. The NASA/Johnson Space Center Acoustics Office monitors the acoustic environment of the International Space Station (ISS) through periodic sound level measurement surveys. Trending of the sound level measurement survey results can identify in-flight hardware anomalies. The crew of the ISS also serves as a "detection tool" in identifying unusual hardware noises; in these cases, the spectral analysis of audio recordings made on orbit can be used to identify hardware defects that are related to rotating components such as fans, pumps, and compressors. In this paper, three examples of the use of sound level measurements and audio recordings for the diagnosis of in-flight hardware anomalies are discussed: identification of blocked inter-module ventilation (IMV) ducts, diagnosis of abnormal ISS Crew Quarters rack exhaust fan noise, and the identification and replacement of a defective flywheel assembly in the Treadmill with Vibration Isolation (TVIS) hardware. In each of these examples, crew time was saved by identifying the off-nominal component or condition and directing in-flight maintenance activities to address and correct each of these problems.
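
    The kind of spectral screening described here can be illustrated in a few lines of Python: compute a power spectral density of a recording and flag prominent tones that may correspond to fan or pump shaft rates. The file name and the prominence threshold are assumptions for the example, not values from the paper.

      import numpy as np
      from scipy.io import wavfile
      from scipy.signal import welch, find_peaks

      fs, audio = wavfile.read("crew_quarters_fan.wav")   # hypothetical recording
      if audio.ndim > 1:
          audio = audio[:, 0]                             # keep one channel
      audio = audio.astype(float)
      freqs, psd = welch(audio, fs=fs, nperseg=8192)

      # tonal peaks standing well above the broadband floor suggest rotating parts
      peaks, _ = find_peaks(10 * np.log10(psd), prominence=15)
      for f in freqs[peaks]:
          print(f"tonal component near {f:.1f} Hz ({f * 60:.0f} rpm if 1x shaft rate)")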

  1. Intern Abstract for Spring 2016

    NASA Technical Reports Server (NTRS)

    Gibson, William

    2016-01-01

    The Human Interface Branch - EV3 - is evaluating organic light-emitting diodes (OLEDs) as an upgrade for current displays on future spacecraft. OLEDs have many advantages over current displays. Conventional displays require constant backlighting, which draws a lot of power, whereas OLEDs generate light themselves. OLEDs are lighter, and weight is always a concern with space launches. OLEDs also offer greater viewing angles. OLEDs have been on the commercial market for almost ten years now. What is not known is how they will perform in a space-like environment, specifically deep space far away from the Earth's magnetosphere. In this environment, the OLEDs can be expected to experience vacuum and galactic radiation. The intern's responsibility has been to prepare the OLED for a battery of tests. Unfortunately, it will not be ready for testing by the end of the internship. That being said, much progress has been made: a) Developed procedures to safely disassemble the tablet. b) Inventoried and identified critical electronic components. c) 3D printed a testing apparatus. d) Wrote software in Python that will test the OLED screen while it is being irradiated. e) Built circuits to restart the tablet and the test pattern, and ensure it does not fall asleep during radiation testing. f) Built an enclosure that will house all of the electronics. Also, the intern has been working on a way to take messages from a simulated Caution and Warning system, process those messages into packets, send the audio packets to a multicast address that audio boxes are listening to, and output spoken audio. Currently, Cautions and Warnings use a tone to alert crew members of a situation, and crew members then have to read through their checklists to determine what the tone means. In urgent situations, EV3 wants to deliver concise and specific alerts to the crew to facilitate any mitigation efforts on their part. Significant progress was made on this project: a) Opened a channel with the simulated Caution and Warning system to acquire messages. b) Configured the audio boxes. c) Grabbed pre-recorded audio files. d) Packetized the audio stream. A third project assigned to the intern was to implement LED indicator modules for an Omnibus project. The Omnibus project is investigating better ways of designing lighting for the interior of spacecraft, both spacecraft lighting and avionics box status indication. The current scheme contains too much of the blue light spectrum, which disrupts the sleep cycle. The LED indicator modules are to simulate the indicators running on a spacecraft. Lighting data will be gathered by human factors personnel and used in a model under development to model spacecraft lighting. Significant progress was made on this project: a) Designed the circuit layout. b) Tested LEDs at the LETF. c) Created a GUI for the indicators. d) Created code for the Arduino that will illuminate the indicator modules.
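
    The audio-packet delivery path described above (packetize an audio stream and send it to a multicast group that the audio boxes join) can be sketched with standard UDP multicast in Python. The group address, port, chunk size and file name below are placeholders, not NASA's actual configuration.

      import socket

      GROUP, PORT = "239.0.0.1", 5004          # hypothetical multicast group/port
      sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
      sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)

      with open("alert_audio.raw", "rb") as f:   # pre-recorded alert message
          while chunk := f.read(1024):           # packetize the audio stream
              # any audio box that has joined GROUP receives and plays the packet
              sock.sendto(chunk, (GROUP, PORT))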

  2. User-oriented summary extraction for soccer video based on multimodal analysis

    NASA Astrophysics Data System (ADS)

    Liu, Huayong; Jiang, Shanshan; He, Tingting

    2011-11-01

    An advanced user-oriented summary extraction method for soccer video is proposed in this work. First, an algorithm for user-oriented summary extraction from soccer video is introduced: a novel approach that integrates multimodal analysis, including extraction and analysis of stadium features, moving-object features, audio features and text features. From these features, the semantics of the soccer video and the highlight mode are obtained. The highlight positions can then be found and assembled by highlight degree to obtain the video summary. The experimental results for sports video of World Cup soccer games indicate that multimodal analysis is effective for soccer video browsing and retrieval.

  3. 36 CFR 902.82 - Fee schedule.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... operating duplicating machinery. Not included in direct costs are overhead expenses such as costs of space... form of paper copy, microform, audio-visual materials, or machine-readable documentation (e.g... programs of scholarly research. (5) Non-commercial scientific institution means an institution that is not...

  4. 36 CFR 902.82 - Fee schedule.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... operating duplicating machinery. Not included in direct costs are overhead expenses such as costs of space... form of paper copy, microform, audio-visual materials, or machine-readable documentation (e.g... programs of scholarly research. (5) Non-commercial scientific institution means an institution that is not...

  5. Evaluation of architectures for an ASP MPEG-4 decoder using a system-level design methodology

    NASA Astrophysics Data System (ADS)

    Garcia, Luz; Reyes, Victor; Barreto, Dacil; Marrero, Gustavo; Bautista, Tomas; Nunez, Antonio

    2005-06-01

    Trends in multimedia consumer electronics and digital video and audio aim to reach users through low-cost mobile devices connected to data broadcasting networks with limited bandwidth. An emergent broadcasting network is the digital audio broadcasting network (DAB), which provides CD-quality audio transmission together with robustness and efficiency techniques to allow good-quality reception in motion conditions. This paper focuses on the system-level evaluation of different architectural options to allow low-bandwidth digital video reception over DAB, based on video compression techniques. Profiling and design space exploration techniques are applied to the ASP MPEG-4 decoder in order to find the best HW/SW partition given the application and platform constraints. An innovative SystemC-based system-level design tool, called CASSE, is used for modelling, exploring and evaluating different ASP MPEG-4 decoder HW/SW partitions. System-level trade-offs and quantitative data derived from this analysis are also presented in this work.

  6. Live Ultra-High Definition from the International Space Station

    NASA Technical Reports Server (NTRS)

    Grubbs, Rodney; George, Sandy

    2017-01-01

    The first ever live downlink of Ultra-High Definition (UHD) video from the International Space Station (ISS) was the highlight of a 'Super Session' at the National Association of Broadcasters (NAB) in April 2017. The Ultra-High Definition video downlink from the ISS all the way to the Las Vegas Convention Center required considerable planning, pushed the limits of conventional video distribution from a spacecraft, and was the first use of High Efficiency Video Coding (HEVC) from a spacecraft. The live event at NAB will serve as a pathfinder for more routine downlinks of UHD as well as use of HEVC for conventional HD downlinks to save bandwidth. HEVC may also enable live Virtual Reality video downlinks from the ISS. This paper will describe the overall work flow and routing of the UHD video, how audio was synchronized even though the video and audio were received many seconds apart from each other, and how the demonstration paves the way for not only more efficient video distribution from the ISS, but also more complex video distribution from deep space. The paper will also describe how a 'live' event was staged when the UHD coming from the ISS had a latency of 10+ seconds. Finally, the paper will discuss how NASA is leveraging commercial technologies for use on-orbit, in contrast to creating technology as was required during the Apollo Moon Program and early space age.

  7. Entropy Based Classifier Combination for Sentence Segmentation

    DTIC Science & Technology

    2007-01-01

    speaker diarization system to divide the audio data into hypothetical speakers [17...the prosodic feature also includes turn-based features which describe the position of a word in relation to diarization segmentation. The speaker ...robust speaker segmentation: the ICSI-SRI fall 2004 diarization system," in Proc. RT-04F Workshop, 2004. [18] "The rich transcription fall 2003," http://nist.gov/speech/tests/rt/rt2003/fall/docs/rt03-fall-eval-plan-v9.pdf.

  8. You Asked, We Answered! A Podcasting Series by Scientists for K-12 Teachers Through the Pennsylvania Earth Science Teachers Association (PAESTA)

    NASA Astrophysics Data System (ADS)

    Guertin, L. A.; Tait, K.

    2015-12-01

    The Pennsylvania Earth Science Teachers Association (PAESTA) recently initiated a podcasting series "You Asked, We Answered!" for K-12 teachers to increase their science content knowledge through short audio podcasts, supplemented with relevant resources. The 2015-2016 PAESTA President Kathy Tait generated the idea of tapping into the content expertise of higher education faculty, post-doctoral researchers, and graduate students to assist K-12 teachers with increasing their own Earth and space content knowledge. As time and resources for professional development are decreasing for K-12 teachers, PAESTA is committed to not only providing curricular resources through our online database of inquiry-based exercises in the PAESTA Classroom, but also providing an opportunity to learn science content from professionals in an audio format. Our goal at PAESTA has been to release at least one new podcast per month that answers the questions asked by PAESTA members. Each podcast is recorded by an Earth/space science professional with content expertise and placed online with supporting images, links, and relevant exercises found in the PAESTA Classroom. Each podcast is available through the PAESTA website (http://www.paesta.psu.edu/podcasts) and PAESTA iTunes channel (https://itunes.apple.com/us/podcast/paesta-podcasts/id1017828453). For ADA compliance, the PAESTA website has a transcript for each audio file. In order to provide these podcasts, we need the participation of both K-12 teachers and science professionals. On the PAESTA Podcast website, K-12 teachers can submit discipline questions for us to pass along to our content experts, questions relating to the "what" and "how" of the Earth and space sciences, as well as questions about Earth and space science careers. We ask science professionals for help in answering the questions posed by teachers. We include online instructions and tips to help scientists generate their podcasts and supporting materials.

  9. Speech endpoint detection with non-language speech sounds for generic speech processing applications

    NASA Astrophysics Data System (ADS)

    McClain, Matthew; Romanowski, Brian

    2009-05-01

    Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS, such as filled pauses, will require future research.
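
    As a schematic of the decision rule only (not the authors' code): train one Gaussian HMM per class on acoustic feature sequences and label a test segment by the higher log-likelihood. The use of the hmmlearn library, the random stand-in features, and the model sizes below are all assumptions.

      import numpy as np
      from hmmlearn import hmm

      rng = np.random.default_rng(0)
      # stand-in feature sequences (frames x 13 coefficients) per training segment
      lss_segments = [rng.normal(0.0, 1.0, (50, 13)) for _ in range(10)]
      nlss_segments = [rng.normal(1.0, 1.0, (30, 13)) for _ in range(10)]

      def train(segments, n_states=4):
          m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
          m.fit(np.vstack(segments), [len(s) for s in segments])
          return m

      lss_model, nlss_model = train(lss_segments), train(nlss_segments)

      def classify(segment):
          # the class whose HMM assigns the higher log-likelihood wins
          return "LSS" if lss_model.score(segment) > nlss_model.score(segment) else "NLSS"

      print(classify(rng.normal(1.0, 1.0, (40, 13))))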

  10. Smithsonian Folkways: Resources for World and Folk Music Multimedia

    ERIC Educational Resources Information Center

    Beegle, Amy Christine

    2012-01-01

    This column describes multimedia resources available to teachers on the Smithsonian Folkways website. In addition to massive collections of audio and video recordings and advanced search tools already available through this website, the Smithsonian Global Sound educational initiative brought detailed lesson plans and interactive features to the…

  11. The Third Eye.

    ERIC Educational Resources Information Center

    Eken, Ali Nihat

    2002-01-01

    Argues that in the contemporary world, literacy is not limited to the printed page and students need to interpret a variety of audio and visual texts. Concentrates on feature films, suggesting ways of improving students' film literacy. Examines effects film literacy has on students and their learning. Concludes that critical literacy and higher…

  12. Use of Audiovisual Texts in University Education Process

    ERIC Educational Resources Information Center

    Aleksandrov, Evgeniy P.

    2014-01-01

    Audio-visual learning technologies offer great opportunities in the development of students' analytical and projective abilities. These technologies can be used in classroom activities and for homework. This article discusses the features of audiovisual media texts use in a series of social sciences and humanities in the University curriculum.

  13. Music as Narrative in American College Football

    ERIC Educational Resources Information Center

    McCluskey, John Michael

    2016-01-01

    American college football features an enormous amount of music woven into the fabric of the event, with selections accompanying approximately two-thirds of a game's plays. Musical selections are controlled by a number of forces, including audio and video technicians, university marketing departments, financial sponsors, and wind bands. These blend…

  14. An Insight into E-Collections

    ERIC Educational Resources Information Center

    Albert, Angeline Sheba; Navaraj, A. Johnson

    2006-01-01

    The present paper gives a brief introduction about E-collections. It discusses the e-books, e-journals, utility, features, advantages and issues for the development of e-collections. E-books will offer a rich learning experience, reinforced with audio, video, 3D animation and collaborative learning tools. E-journals on the other hand are…

  15. 78 FR 42072 - Consumer Advisory Committee

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-07-15

    ... people with disabilities. You can listen to the audio and use a screen reader to read displayed documents...://accessibleevent.com . The Web page prompts for an Event Code which is 005202376. To learn about the features of... accommodations for people with disabilities are available upon request. The request should include a detailed...

  16. Students participate in Congressional Night

    NASA Technical Reports Server (NTRS)

    1997-01-01

    Middle school students were offered a unique opportunity at Stennis Space Center to speak in real time, through audio and visual means, to NASA scientists in Washington, D.C., about numerous research projects, such as the Martian meteorite that NASA researchers claimed contained fossilized evidence that life existed on Mars.

  17. 25 CFR 517.3 - Definitions.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... sought to further scholarly research. (h) Record means all books, papers, maps, photographs, machine... as the cost of space, heating, or lighting of the facility in which the records are stored. (d... copies can take the form of, among other things, paper copy, microfilm, audio-visual materials, or...

  18. 36 CFR § 902.82 - Fee schedule.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... operating duplicating machinery. Not included in direct costs are overhead expenses such as costs of space... form of paper copy, microform, audio-visual materials, or machine-readable documentation (e.g... programs of scholarly research. (5) Non-commercial scientific institution means an institution that is not...

  19. 25 CFR 517.3 - Definitions.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... sought to further scholarly research. (h) Record means all books, papers, maps, photographs, machine... as the cost of space, heating, or lighting of the facility in which the records are stored. (d... copies can take the form of, among other things, paper copy, microfilm, audio-visual materials, or...

  20. 25 CFR 517.3 - Definitions.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... sought to further scholarly research. (h) Record means all books, papers, maps, photographs, machine... as the cost of space, heating, or lighting of the facility in which the records are stored. (d... copies can take the form of, among other things, paper copy, microfilm, audio-visual materials, or...

  1. 25 CFR 517.3 - Definitions.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... sought to further scholarly research. (h) Record means all books, papers, maps, photographs, machine... as the cost of space, heating, or lighting of the facility in which the records are stored. (d... copies can take the form of, among other things, paper copy, microfilm, audio-visual materials, or...

  2. 36 CFR 1120.2 - Definitions.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... operating duplicating machinery. Not included in direct costs are overhead expenses such as costs of space... FOIA request. Such copies can take the form of paper copy, microform, audio-visual materials, or... research. (n) Non-Commercial Scientific Institution refers to an institution that is not operated on a...

  3. 36 CFR 1120.2 - Definitions.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... operating duplicating machinery. Not included in direct costs are overhead expenses such as costs of space... FOIA request. Such copies can take the form of paper copy, microform, audio-visual materials, or... research. (n) Non-Commercial Scientific Institution refers to an institution that is not operated on a...

  4. 25 CFR 517.3 - Definitions.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... sought to further scholarly research. (h) Record means all books, papers, maps, photographs, machine... as the cost of space, heating, or lighting of the facility in which the records are stored. (d... copies can take the form of, among other things, paper copy, microfilm, audio-visual materials, or...

  5. Advanced Texas Studies: Curriculum Guide.

    ERIC Educational Resources Information Center

    Harlandale Independent School District, San Antonio, TX. Career Education Center.

    The guide is arranged in vertical columns relating curriculum concepts in Texas studies to curriculum performance objectives, career concepts and career performance objectives, suggested teaching methods, and audio-visual and resource materials. Career information is included on 24 related occupations. Space is provided for teachers' notes which…

  6. Passing the Baton: An Experimental Study of Shift Handover

    NASA Technical Reports Server (NTRS)

    Parke, Bonny; Hobbs, Alan; Kanki, Barbara

    2010-01-01

    Shift handovers occur in many safety-critical environments, including aviation maintenance, medicine, air traffic control, and mission control for space shuttle and space station operations. Shift handovers are associated with increased risk of communication failures and human error. In dynamic industries, errors and accidents occur disproportionately after shift handover. Typical shift handovers involve transferring information from an outgoing shift to an incoming shift via written logs or, in some cases, face-to-face briefings. The current study explores the possibility of improving written communication with the support modalities of audio and video recordings, as well as face-to-face briefings. Fifty participants completed an experimental task that mimicked some of the critical challenges involved in transferring information between shifts in industrial settings. All three support modalities (face-to-face briefings, video recordings, and audio recordings) reduced task errors significantly relative to written communication alone. The support modality most preferred by participants was face-to-face communication; the least preferred was written communication alone.

  7. Audio Restoration

    NASA Astrophysics Data System (ADS)

    Esquef, Paulo A. A.

    The first reproducible recording of human voice was made in 1877 on a tinfoil cylinder phonograph devised by Thomas A. Edison. Since then, much effort has been expended to find better ways to record and reproduce sounds. By the mid-1920s, the first electrical recordings appeared and gradually took over purely acoustic recordings. The development of electronic computers, in conjunction with the ability to record data onto magnetic or optical media, culminated in the standardization of compact disc format in 1980. Nowadays, digital technology is applied to several audio applications, not only to improve the quality of modern and old recording/reproduction techniques, but also to trade off sound quality for less storage space and less taxing transmission capacity requirements.

  8. Audible vision for the blind and visually impaired in indoor open spaces.

    PubMed

    Yu, Xunyi; Ganz, Aura

    2012-01-01

    In this paper we introduce Audible Vision, a system that can help blind and visually impaired users navigate large indoor open spaces. The system uses computer vision to estimate the location and orientation of the user, and enables the user to perceive his/her position relative to a landmark through 3D audio. Testing shows that Audible Vision can work reliably in real-life, ever-changing environments crowded with people.

  9. Embedded security system for multi-modal surveillance in a railway carriage

    NASA Astrophysics Data System (ADS)

    Zouaoui, Rhalem; Audigier, Romaric; Ambellouis, Sébastien; Capman, François; Benhadda, Hamid; Joudrier, Stéphanie; Sodoyer, David; Lamarque, Thierry

    2015-10-01

    Public transport security is one of the main priorities of the public authorities when fighting against crime and terrorism. In this context, there is a great demand for autonomous systems able to detect abnormal events such as violent acts aboard passenger cars and intrusions when the train is parked at the depot. To this end, we present an innovative approach which aims at providing efficient automatic event detection by fusing video and audio analytics and reducing the false alarm rate compared to classical stand-alone video detection. The multi-modal system is composed of two microphones and one camera and integrates onboard video and audio analytics and fusion capabilities. On the one hand, for detecting intrusion, the system relies on the fusion of "unusual" audio events detection with intrusion detections from video processing. The audio analysis consists in modeling the normal ambience and detecting deviation from the trained models during testing. This unsupervised approach is based on clustering of automatically extracted segments of acoustic features and statistical Gaussian Mixture Model (GMM) modeling of each cluster. The intrusion detection is based on the three-dimensional (3D) detection and tracking of individuals in the videos. On the other hand, for violent events detection, the system fuses unsupervised and supervised audio algorithms with video event detection. The supervised audio technique detects specific events such as shouts. A GMM is used to catch the formant structure of a shout signal. Video analytics use an original approach for detecting aggressive motion by focusing on erratic motion patterns specific to violent events. As data with violent events is not easily available, a normality model with structured motions from non-violent videos is learned for one-class classification. A fusion algorithm based on Dempster-Shafer's theory analyses the asynchronous detection outputs and computes the degree of belief of each probable event.
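
    A minimal sketch of the unsupervised audio branch, in the spirit of the abstract: fit a GMM to acoustic feature frames from normal ambience, then flag test frames whose log-likelihood falls far below what was seen in training. The random stand-in features and the percentile threshold rule are assumptions standing in for the paper's clustered-segment front end.

      import numpy as np
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(0)
      normal_frames = rng.normal(0.0, 1.0, size=(5000, 20))   # stand-in MFCC-like frames

      gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(normal_frames)

      # assumed rule: alarm below the 1st percentile of training log-likelihoods
      thr = np.percentile(gmm.score_samples(normal_frames), 1)

      test_frames = rng.normal(0.5, 1.5, size=(100, 20))      # possibly "unusual"
      alarms = gmm.score_samples(test_frames) < thr
      print(f"{alarms.sum()} frames flagged as unusual audio events")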

  10. Virtual environment display for a 3D audio room simulation

    NASA Astrophysics Data System (ADS)

    Chapin, William L.; Foster, Scott

    1992-06-01

    Recent developments in virtual 3D audio and synthetic aural environments have produced a complex acoustical room simulation. The acoustical simulation models a room with walls, ceiling, and floor of selected sound reflecting/absorbing characteristics and unlimited independent localizable sound sources. This non-visual acoustic simulation, implemented with 4 audio ConvolvotronsTM by Crystal River Engineering and coupled to the listener with a Polhemus IsotrakTM, tracking the listener's head position and orientation, and stereo headphones returning binaural sound, is quite compelling to most listeners with eyes closed. This immersive effect should be reinforced when properly integrated into a full, multi-sensory virtual environment presentation. This paper discusses the design of an interactive, visual virtual environment, complementing the acoustic model and specified to: 1) allow the listener to freely move about the space, a room of manipulable size, shape, and audio character, while interactively relocating the sound sources; 2) reinforce the listener's feeling of telepresence into the acoustical environment with visual and proprioceptive sensations; 3) enhance the audio with the graphic and interactive components, rather than overwhelm or reduce it; and 4) serve as a research testbed and technology transfer demonstration. The hardware/software design of two demonstration systems, one installed and one portable, are discussed through the development of four iterative configurations. The installed system implements a head-coupled, wide-angle, stereo-optic tracker/viewer and multi-computer simulation control. The portable demonstration system implements a head-mounted wide-angle, stereo-optic display, separate head and pointer electro-magnetic position trackers, a heterogeneous parallel graphics processing system, and object-oriented C++ program code.

  11. Multifunctional microcontrollable interface module

    NASA Astrophysics Data System (ADS)

    Spitzer, Mark B.; Zavracky, Paul M.; Rensing, Noa M.; Crawford, J.; Hockman, Angela H.; Aquilino, P. D.; Girolamo, Henry J.

    2001-08-01

    This paper reports the development of a complete eyeglass- mounted computer interface system including display, camera and audio subsystems. The display system provides an SVGA image with a 20 degree horizontal field of view. The camera system has been optimized for face recognition and provides a 19 degree horizontal field of view. A microphone and built-in pre-amp optimized for voice recognition and a speaker on an articulated arm are included for audio. An important feature of the system is a high degree of adjustability and reconfigurability. The system has been developed for testing by the Military Police, in a complete system comprising the eyeglass-mounted interface, a wearable computer, and an RF link. Details of the design, construction, and performance of the eyeglass-based system are discussed.

  12. Anthropomorphic Coding of Speech and Audio: A Model Inversion Approach

    NASA Astrophysics Data System (ADS)

    Feldbauer, Christian; Kubin, Gernot; Kleijn, W. Bastiaan

    2005-12-01

    Auditory modeling is a well-established methodology that provides insight into human perception and that facilitates the extraction of signal features that are most relevant to the listener. The aim of this paper is to provide a tutorial on perceptual speech and audio coding using an invertible auditory model. In this approach, the audio signal is converted into an auditory representation using an invertible auditory model. The auditory representation is quantized and coded. Upon decoding, it is then transformed back into the acoustic domain. This transformation converts a complex distortion criterion into a simple one, thus facilitating quantization with low complexity. We briefly review past work on auditory models and describe in more detail the components of our invertible model and its inversion procedure, that is, the method to reconstruct the signal from the output of the auditory model. We summarize attempts to use the auditory representation for low-bit-rate coding. Our approach also allows the exploitation of the inherent redundancy of the human auditory system for the purpose of multiple description (joint source-channel) coding.

  13. Quantifying auditory temporal stability in a large database of recorded music.

    PubMed

    Ellis, Robert J; Duan, Zhiyan; Wang, Ye

    2014-01-01

    "Moving to the beat" is both one of the most basic and one of the most profound means by which humans (and a few other species) interact with music. Computer algorithms that detect the precise temporal location of beats (i.e., pulses of musical "energy") in recorded music have important practical applications, such as the creation of playlists with a particular tempo for rehabilitation (e.g., rhythmic gait training), exercise (e.g., jogging), or entertainment (e.g., continuous dance mixes). Although several such algorithms return simple point estimates of an audio file's temporal structure (e.g., "average tempo", "time signature"), none has sought to quantify the temporal stability of a series of detected beats. Such a method--a "Balanced Evaluation of Auditory Temporal Stability" (BEATS)--is proposed here, and is illustrated using the Million Song Dataset (a collection of audio features and music metadata for nearly one million audio files). A publically accessible web interface is also presented, which combines the thresholdable statistics of BEATS with queryable metadata terms, fostering potential avenues of research and facilitating the creation of highly personalized music playlists for clinical or recreational applications.

  14. Unsupervised Decoding of Long-Term, Naturalistic Human Neural Recordings with Automated Video and Audio Annotations

    PubMed Central

    Wang, Nancy X. R.; Olson, Jared D.; Ojemann, Jeffrey G.; Rao, Rajesh P. N.; Brunton, Bingni W.

    2016-01-01

    Fully automated decoding of human activities and intentions from direct neural recordings is a tantalizing challenge in brain-computer interfacing. Implementing Brain Computer Interfaces (BCIs) outside carefully controlled experiments in laboratory settings requires adaptive and scalable strategies with minimal supervision. Here we describe an unsupervised approach to decoding neural states from naturalistic human brain recordings. We analyzed continuous, long-term electrocorticography (ECoG) data recorded over many days from the brain of subjects in a hospital room, with simultaneous audio and video recordings. We discovered coherent clusters in high-dimensional ECoG recordings using hierarchical clustering and automatically annotated them using speech and movement labels extracted from audio and video. To our knowledge, this represents the first time techniques from computer vision and speech processing have been used for natural ECoG decoding. Interpretable behaviors were decoded from ECoG data, including moving, speaking and resting; the results were assessed by comparison with manual annotation. Discovered clusters were projected back onto the brain revealing features consistent with known functional areas, opening the door to automated functional brain mapping in natural settings. PMID:27148018
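
    Schematically, the unsupervised step reduces to hierarchical clustering of high-dimensional feature vectors, with each cluster later annotated from the synchronized audio/video labels. A toy SciPy sketch with random stand-in ECoG features (dimensions and cluster count are assumptions):

      import numpy as np
      from scipy.cluster.hierarchy import linkage, fcluster

      rng = np.random.default_rng(0)
      ecog_features = rng.normal(size=(1000, 64))      # time windows x spectral features

      Z = linkage(ecog_features, method="ward")        # agglomerative cluster tree
      labels = fcluster(Z, t=5, criterion="maxclust")  # cut into 5 clusters

      # in the study, each cluster would then be matched against the dominant
      # annotated behaviour (moving / speaking / resting)
      print("cluster sizes:", np.bincount(labels)[1:])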

  15. 5 CFR 1303.30 - Definitions.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... included in direct costs are overhead expenses such as costs of space, and heating or lighting the facility... request. Such copies can take the form of paper, microform, audio-visual materials, or electronic records... institution of vocational education, that operates a program or programs of scholarly research. (i) The term...

  16. 5 CFR 2502.11 - Definitions.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... included in direct costs are overhead expenses such as costs of space, and heating or lighting the facility... FOIA request. Such copies can take the form of paper copy, microform, audio-visual materials, or... operates a program or programs of scholarly research. (i) The term non-commercial scientific institution...

  17. 5 CFR 2502.11 - Definitions.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... included in direct costs are overhead expenses such as costs of space, and heating or lighting the facility... FOIA request. Such copies can take the form of paper copy, microform, audio-visual materials, or... operates a program or programs of scholarly research. (i) The term non-commercial scientific institution...

  18. 5 CFR 2502.11 - Definitions.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... included in direct costs are overhead expenses such as costs of space, and heating or lighting the facility... FOIA request. Such copies can take the form of paper copy, microform, audio-visual materials, or... operates a program or programs of scholarly research. (i) The term non-commercial scientific institution...

  19. 5 CFR 1303.30 - Definitions.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... included in direct costs are overhead expenses such as costs of space, and heating or lighting the facility... request. Such copies can take the form of paper, microform, audio-visual materials, or electronic records... institution of vocational education, that operates a program or programs of scholarly research. (i) The term...

  20. 36 CFR § 1120.2 - Definitions.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... operating duplicating machinery. Not included in direct costs are overhead expenses such as costs of space... FOIA request. Such copies can take the form of paper copy, microform, audio-visual materials, or... research. (n) Non-Commercial Scientific Institution refers to an institution that is not operated on a...

  1. 5 CFR 2411.13 - Fees.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... duplicating machinery. Not included in direct costs are overhead expenses such as costs of space, and heating... a FOIA request. Such copies can take the form of paper copy, microfilm, audio-visual materials, or... vocational education, which operates a program or programs of scholarly research. (7) The term non-commercial...

  2. 5 CFR 1303.30 - Definitions.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... included in direct costs are overhead expenses such as costs of space, and heating or lighting the facility... request. Such copies can take the form of paper, microform, audio-visual materials, or electronic records... institution of vocational education, that operates a program or programs of scholarly research. (i) The term...

  3. 5 CFR 2411.13 - Fees.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... duplicating machinery. Not included in direct costs are overhead expenses such as costs of space, and heating... a FOIA request. Such copies can take the form of paper copy, microfilm, audio-visual materials, or... vocational education, which operates a program or programs of scholarly research. (7) The term non-commercial...

  4. 5 CFR 2411.13 - Fees.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... duplicating machinery. Not included in direct costs are overhead expenses such as costs of space, and heating... a FOIA request. Such copies can take the form of paper copy, microfilm, audio-visual materials, or... vocational education, which operates a program or programs of scholarly research. (7) The term non-commercial...

  5. Aviation & Space Education: A Teacher's Resource Guide.

    ERIC Educational Resources Information Center

    Texas State Dept. of Aviation, Austin.

    This resource guide contains information on curriculum guides, resources for teachers, computer software and computer related programs, audio/visual presentations, model aircraft and demonstration aids, training seminars and career education, and an aerospace bibliography for primary grades. Each entry includes all or some of the following items:…

  6. A Cough-Based Algorithm for Automatic Diagnosis of Pertussis.

    PubMed

    Pramono, Renard Xaviero Adhi; Imtiaz, Syed Anas; Rodriguez-Villegas, Esther

    2016-01-01

    Pertussis is a contagious respiratory disease which mainly affects young children and can be fatal if left untreated. The World Health Organization estimates 16 million pertussis cases annually worldwide, resulting in over 200,000 deaths. It is prevalent mainly in developing countries, where it is difficult to diagnose due to the lack of healthcare facilities and medical professionals. Hence, a low-cost, quick and easily accessible solution is needed to provide pertussis diagnosis in such areas to contain an outbreak. In this paper we present an algorithm for automated diagnosis of pertussis using audio signals by analyzing cough and whoop sounds. The algorithm consists of three main blocks to perform automatic cough detection, cough classification and whooping sound detection. Each of these blocks extracts relevant features from the audio signal and subsequently classifies them using a logistic regression model. The output from these blocks is collated to provide a pertussis likelihood diagnosis. The performance of the proposed algorithm is evaluated using audio recordings from 38 patients. The algorithm successfully diagnosed pertussis from all audio recordings without any false diagnoses. It can also automatically detect individual cough sounds with 92% accuracy and PPV of 97%. The low complexity of the proposed algorithm coupled with its high accuracy demonstrates that it can be readily deployed using smartphones and can be extremely useful for quick identification or early screening of pertussis and for controlling infection outbreaks.
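
    Each block reduces to "extract features, apply logistic regression". The sketch below shows that shape for the cough-classification block with random stand-in features and labels; the paper's actual feature set is not reproduced here.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)
      X = rng.normal(size=(300, 12))        # stand-in features per audio segment
      y = rng.integers(0, 2, size=300)      # 1 = cough, 0 = non-cough (placeholder labels)

      clf = LogisticRegression().fit(X[:250], y[:250])
      prob = clf.predict_proba(X[250:])[:, 1]          # per-segment cough likelihoods
      print("segments flagged as cough:", int((prob > 0.5).sum()))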

  7. A Cough-Based Algorithm for Automatic Diagnosis of Pertussis

    PubMed Central

    Pramono, Renard Xaviero Adhi; Imtiaz, Syed Anas; Rodriguez-Villegas, Esther

    2016-01-01

    Pertussis is a contagious respiratory disease which mainly affects young children and can be fatal if left untreated. The World Health Organization estimates 16 million pertussis cases annually worldwide, resulting in over 200,000 deaths. It is prevalent mainly in developing countries, where it is difficult to diagnose due to the lack of healthcare facilities and medical professionals. Hence, a low-cost, quick and easily accessible solution is needed to provide pertussis diagnosis in such areas to contain an outbreak. In this paper we present an algorithm for automated diagnosis of pertussis using audio signals by analyzing cough and whoop sounds. The algorithm consists of three main blocks to perform automatic cough detection, cough classification and whooping sound detection. Each of these blocks extracts relevant features from the audio signal and subsequently classifies them using a logistic regression model. The output from these blocks is collated to provide a pertussis likelihood diagnosis. The performance of the proposed algorithm is evaluated using audio recordings from 38 patients. The algorithm successfully diagnosed pertussis from all audio recordings without any false diagnoses. It can also automatically detect individual cough sounds with 92% accuracy and PPV of 97%. The low complexity of the proposed algorithm coupled with its high accuracy demonstrates that it can be readily deployed using smartphones and can be extremely useful for quick identification or early screening of pertussis and for controlling infection outbreaks. PMID:27583523

  8. 2D and 3D separate and joint inversion of airborne ZTEM and ground AMT data: Synthetic model studies

    NASA Astrophysics Data System (ADS)

    Sasaki, Yutaka; Yi, Myeong-Jong; Choi, Jihyang

    2014-05-01

    The ZTEM (Z-axis Tipper Electromagnetic) method measures naturally occurring audio-frequency magnetic fields and obtains the tipper function that defines the relationship among the three components of the magnetic field. Since anomalous tipper responses are caused by the presence of lateral resistivity variations, the ZTEM survey is best suited for detecting and delineating conductive bodies extending to considerable depths, such as the graphitic dykes encountered in the exploration of unconformity-type uranium deposits. Our simulations show that inversion of ZTEM data can detect multiple conductive dykes placed 1 km apart reasonably well. One important issue regarding ZTEM inversion is the effect of the initial model, because homogeneous half-space and (1D) layered structures produce no responses. For the 2D model with multiple conductive dykes, the inversion results were useful for locating the dykes even when the initial model was not close to the true background resistivity. For general 3D structures, however, the resolution of the conductive bodies can be reduced considerably depending on the initial model. This is because, due to boundary charges, the tipper magnitudes from 3D conductors are smaller than the 2D responses. To alleviate this disadvantage of ZTEM surveys, we combined ZTEM and audio-frequency magnetotelluric (AMT) data. Inversion of sparse AMT data was shown to be effective in providing a good initial model for ZTEM inversion. Moreover, simultaneously inverting both data sets led to better results than the sequential approach by enabling the identification of structural features that were difficult to resolve from the individual data sets.
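
    The tipper mentioned above relates the vertical magnetic field to the two horizontal components at each frequency, H_z = T_zx·H_x + T_zy·H_y. A minimal sketch of estimating it by least squares from repeated spectral measurements (synthetic data only, not the authors' inversion code):

```python
# Least-squares estimate of the tipper (T_zx, T_zy) from complex field
# spectra at one frequency, using Hz = T_zx*Hx + T_zy*Hy. This sketches
# the data relationship only; the record's 2D/3D inversion is far larger.
import numpy as np

def estimate_tipper(hx, hy, hz):
    """hx, hy, hz: complex spectra over repeated measurement windows."""
    A = np.column_stack([hx, hy])              # design matrix
    t, *_ = np.linalg.lstsq(A, hz, rcond=None)
    return t                                   # [T_zx, T_zy]

rng = np.random.default_rng(0)
hx = rng.normal(size=50) + 1j * rng.normal(size=50)
hy = rng.normal(size=50) + 1j * rng.normal(size=50)
hz = 0.1 * hx - 0.05j * hy                     # synthetic tipper response
print(estimate_tipper(hx, hy, hz))             # ~ [0.1, -0.05j]
```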

  9. The Emotion Recognition System Based on Autoregressive Model and Sequential Forward Feature Selection of Electroencephalogram Signals

    PubMed Central

    Hatamikia, Sepideh; Maghooli, Keivan; Nasrabadi, Ali Motie

    2014-01-01

    Electroencephalogram (EEG) is one of the useful biological signals for distinguishing different brain diseases and mental states. In recent years, detecting different emotional states from biological signals has received growing attention from researchers, and several feature extraction methods and classifiers have been suggested to recognize emotions from EEG signals. In this research, we introduce an emotion recognition system using an autoregressive (AR) model, sequential forward feature selection (SFS), and a K-nearest neighbor (KNN) classifier on EEG signals recorded during emotional audio-visual inductions. The main purpose of this paper is to investigate the performance of AR features in the classification of emotional states. To achieve this goal, a distinguished AR method (Burg's method) based on the Levinson-Durbin recursive algorithm is used, and AR coefficients are extracted as feature vectors. In the next step, two different feature selection methods, based on the SFS algorithm and the Davies-Bouldin index, are used in order to decrease the computational complexity and redundancy of features; then, three different classifiers, KNN, quadratic discriminant analysis, and linear discriminant analysis, are used to discriminate two and three different classes of valence and arousal levels. The proposed method is evaluated on EEG signals from an available database for emotion analysis using physiological signals, recorded from 32 participants during 40 one-minute audio-visual inductions. According to the results, AR features are efficient for recognizing emotional states from EEG signals, and KNN performs better than the two other classifiers in discriminating both the two and three valence/arousal classes. The results also show that the SFS method improves accuracies by almost 10-15% compared to Davies-Bouldin-based feature selection. The best accuracies are 72.33% and 74.20% for two classes of valence and arousal, and 61.10% and 65.16% for three classes, respectively. PMID:25298928
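
    A minimal sketch of the pipeline this record describes (per-channel Burg AR coefficients, sequential forward selection, KNN), assuming librosa's Burg-based LPC and scikit-learn; the array shapes and parameter values are illustrative assumptions:

```python
# AR-coefficient features via Burg's method, sequential forward feature
# selection, and a KNN classifier -- a sketch of the record's pipeline.
# X_raw: (n_trials, n_channels, n_samples) EEG array; y: emotion labels.
import numpy as np
import librosa                                   # librosa.lpc uses Burg's method
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

def ar_features(X_raw, order=10):
    """Concatenate per-channel Burg AR coefficients into one feature vector."""
    feats = []
    for trial in X_raw:
        coefs = [librosa.lpc(np.ascontiguousarray(ch, dtype=float), order=order)[1:]
                 for ch in trial]                # drop leading 1.0 of the LPC poly
        feats.append(np.concatenate(coefs))
    return np.array(feats)

# X = ar_features(X_raw); then SFS + KNN as in the record:
knn = KNeighborsClassifier(n_neighbors=5)
sfs = SequentialFeatureSelector(knn, n_features_to_select=20, direction="forward")
# sfs.fit(X, y); knn.fit(sfs.transform(X), y)
```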

  10. Building Your Own Web Course: The Case for Off-the-Shelf Component Software.

    ERIC Educational Resources Information Center

    Kaplan, Howard

    1998-01-01

    Compares the features, advantages, and disadvantages of two major software options available for designing web courses: (1) component, off-the shelf software that allows for creation of audio slide lectures, course materials, discussion forums, animations, synchronous chat groups, quiz creators, and electronic mail, and (2) integrated packages…

  11. CPFP Video | Cancer Prevention Fellowship Program

    Cancer.gov

    The Cancer Prevention Fellowship Program (CPFP) trains future leaders in the field of cancer prevention and control. This video will highlight unique features of the CPFP through testimonials from current fellows and alumni, remarks from the director, and reflections from the Director of the Division of Cancer Prevention, NCI. Audio described version of the CPFP video

  12. Interactive Videodisc as a Component in a Multi-Method Approach to Anatomy and Physiology.

    ERIC Educational Resources Information Center

    Wheeler, Donald A.; Wheeler, Mary Jane

    At Cuyahoga Community College (Ohio), computer-controlled interactive videodisc technology is being used as one of several instructional methods to teach anatomy and physiology. The system has the following features: audio-visual instruction, interaction with immediate feedback, self-pacing, fill-in-the-blank quizzes for testing total recall,…

  13. Teaching Biology on the Internet.

    ERIC Educational Resources Information Center

    Ingebritsen, Thomas S.; Brown, George G.; Pleasants, John M.

    Iowa State University, through a program called Project BIO, is using an innovative approach to offer biology courses via the World Wide Web. The approach features online lectures similar to those a student might experience in a traditional classroom. Students listen to the lectures using RealAudio while viewing lecture materials with a Web…

  14. Multimedia Projects in Education: Designing, Producing, and Assessing, Third Edition

    ERIC Educational Resources Information Center

    Ivers, Karen S.; Barron, Ann E.

    2005-01-01

    Building on the materials in the two previous successful editions, this book features approximately 40% all new material and updates the previous information. The authors use the DDD-E model (Decide, Design, Develop--Evaluate) to show how to select and plan multimedia projects, use presentation and development tools, manage graphics, audio, and…

  15. Cross-Modal Approach for Karaoke Artifacts Correction

    NASA Astrophysics Data System (ADS)

    Yan, Wei-Qi; Kankanhalli, Mohan S.

    In this chapter, we combine adaptive sampling with video analogies (VA) to correct the audio stream in the karaoke environment κ = {κ(t) : κ(t) = (U(t), K(t)), t ∈ (t_s, t_e)}, where t_s and t_e are the start and end times, respectively, and U(t) is the user multimedia data. We employ multiple streams from the karaoke data K(t) = (K_V(t), K_M(t), K_S(t)), where K_V(t), K_M(t), and K_S(t) are the video, musical accompaniment, and original singer's rendition, respectively, along with the user multimedia data U(t) = (U_A(t), U_V(t)), where U_V(t) is the user video captured with a camera and U_A(t) is the user's rendition of the song. We analyze the audio and video streaming features Ψ(κ) = {Ψ(U(t), K(t))} = {Ψ(U(t)), Ψ(K(t))} = {Ψ_U(t), Ψ_K(t)} to produce the corrected singing output U′(t), which is made as close as possible to the original singer's rendition. Note that Ψ represents any kind of feature processing.

  17. A Machine Learning Approach to Discover Rules for Expressive Performance Actions in Jazz Guitar Music.

    PubMed

    Giraldo, Sergio I; Ramirez, Rafael

    2016-01-01

    Expert musicians introduce expression in their performances by manipulating sound properties such as timing, energy, pitch, and timbre. Here, we present a data-driven computational approach to induce expressive performance rule models for note duration, onset, energy, and ornamentation transformations in jazz guitar music. We extract high-level features from a set of 16 commercial audio recordings (and corresponding music scores) of jazz guitarist Grant Green in order to characterize the expression in the pieces. We apply machine learning techniques to the resulting features to learn expressive performance rule models. We (1) quantitatively evaluate the accuracy of the induced models, (2) analyse the relative importance of the considered musical features, (3) discuss some of the learnt expressive performance rules in the context of previous work, and (4) assess their generality. The accuracies of the induced predictive models are significantly above baseline levels, indicating that the audio performances and the musical features extracted contain sufficient information to automatically learn informative expressive performance patterns. Feature analysis shows that the most important musical features for predicting expressive transformations are note duration, pitch, metrical strength, phrase position, Narmour structure, and the tempo and key of the piece. Similarities and differences between the induced expressive rules and the rules reported in the literature were found. Differences may be due to the fact that most previously studied performance data has consisted of classical music recordings. Finally, the rules' performer specificity/generality is assessed by applying the induced rules to performances of the same pieces by two other professional jazz guitar players. Results show a consistency in the ornamentation patterns between Grant Green and the other two musicians, which may be interpreted as a good indicator of the generality of the ornamentation rules.
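
    The record does not reproduce the induced rules themselves, but the rule-induction step can be sketched with an interpretable tree model predicting a note-level expressive deviation from score features; the feature names and data below are placeholders for those the abstract lists:

```python
# Sketch: learn note-level expressive-performance rules from score features.
# Columns are placeholders for the features the record names (duration,
# pitch, metrical strength, phrase position, tempo, ...).
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))          # stand-in score features per note
y = 1.0 + 0.3 * X[:, 0] - 0.2 * X[:, 2] + 0.05 * rng.normal(size=200)
                                       # stand-in performed/score duration ratio
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=[
    "duration", "pitch", "metrical_strength", "phrase_position", "tempo"]))
# The printed branches read as human-interpretable performance rules.
```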

  18. A Machine Learning Approach to Discover Rules for Expressive Performance Actions in Jazz Guitar Music

    PubMed Central

    Giraldo, Sergio I.; Ramirez, Rafael

    2016-01-01

    Expert musicians introduce expression in their performances by manipulating sound properties such as timing, energy, pitch, and timbre. Here, we present a data-driven computational approach to induce expressive performance rule models for note duration, onset, energy, and ornamentation transformations in jazz guitar music. We extract high-level features from a set of 16 commercial audio recordings (and corresponding music scores) of jazz guitarist Grant Green in order to characterize the expression in the pieces. We apply machine learning techniques to the resulting features to learn expressive performance rule models. We (1) quantitatively evaluate the accuracy of the induced models, (2) analyse the relative importance of the considered musical features, (3) discuss some of the learnt expressive performance rules in the context of previous work, and (4) assess their generality. The accuracies of the induced predictive models are significantly above baseline levels, indicating that the audio performances and the musical features extracted contain sufficient information to automatically learn informative expressive performance patterns. Feature analysis shows that the most important musical features for predicting expressive transformations are note duration, pitch, metrical strength, phrase position, Narmour structure, and the tempo and key of the piece. Similarities and differences between the induced expressive rules and the rules reported in the literature were found. Differences may be due to the fact that most previously studied performance data has consisted of classical music recordings. Finally, the rules' performer specificity/generality is assessed by applying the induced rules to performances of the same pieces by two other professional jazz guitar players. Results show a consistency in the ornamentation patterns between Grant Green and the other two musicians, which may be interpreted as a good indicator of the generality of the ornamentation rules. PMID:28066290

  19. 47 CFR 11.31 - EAS protocol.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... End Of Message (EOM) Codes. (1) The Preamble and EAS Codes must use Audio Frequency Shift Keying at a rate of 520.83 bits per second to transmit the codes. Mark frequency is 2083.3 Hz and space frequency... Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL EMERGENCY ALERT SYSTEM (EAS) Equipment Requirements § 11...
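
    A minimal synthesis sketch of the AFSK scheme this rule specifies: 520.83 bits per second with a 2083.3 Hz mark tone. The space-tone value is truncated in the excerpt above; the published rule gives 1562.5 Hz:

```python
# Sketch: generate EAS-style AFSK audio for a bit string, per the rates
# in 47 CFR 11.31 -- 520.83 bit/s, 2083.3 Hz mark ('1'), 1562.5 Hz space
# ('0'). Sample rate is our own choice; bit ordering is simplified here.
import numpy as np

SR = 44100                      # audio sample rate (Hz), our choice
BIT_RATE = 520.83               # bits per second, from the rule
MARK, SPACE = 2083.3, 1562.5    # tone frequencies (Hz)

def afsk(bits: str) -> np.ndarray:
    """Return a float waveform encoding `bits` as phase-continuous AFSK."""
    samples_per_bit = SR / BIT_RATE
    phase, out = 0.0, []
    for i, b in enumerate(bits):
        f = MARK if b == "1" else SPACE
        n = int(round((i + 1) * samples_per_bit)) - int(round(i * samples_per_bit))
        t = np.arange(n) / SR
        out.append(np.sin(phase + 2 * np.pi * f * t))
        phase += 2 * np.pi * f * n / SR      # keep phase continuous across bits
    return np.concatenate(out)

preamble = afsk("10101011" * 16)   # EAS preamble: 16 bytes of 0xAB (bit order simplified)
```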

  20. 47 CFR 11.31 - EAS protocol.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... End Of Message (EOM) Codes. (1) The Preamble and EAS Codes must use Audio Frequency Shift Keying at a rate of 520.83 bits per second to transmit the codes. Mark frequency is 2083.3 Hz and space frequency... Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL EMERGENCY ALERT SYSTEM (EAS) Equipment Requirements § 11...

  1. 47 CFR 25.144 - Licensing provisions for the 2.3 GHz satellite digital audio radio service.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... (CONTINUED) COMMON CARRIER SERVICES SATELLITE COMMUNICATIONS Applications and Licenses Space Stations § 25... frequencies and emission designators of such communications, and the frequencies and emission designators used... repeaters will communicate, the frequencies and emission designators of such communications, and the...

  2. 47 CFR 25.144 - Licensing provisions for the 2.3 GHz satellite digital audio radio service.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... (CONTINUED) COMMON CARRIER SERVICES SATELLITE COMMUNICATIONS Applications and Licenses Space Stations § 25... of such communications, and the frequencies and emission designators used by the repeaters to re..., the frequencies and emission designators of such communications, and the frequencies and emission...

  3. 47 CFR 25.144 - Licensing provisions for the 2.3 GHz satellite digital audio radio service.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... (CONTINUED) COMMON CARRIER SERVICES SATELLITE COMMUNICATIONS Applications and Licenses Space Stations § 25... frequencies and emission designators of such communications, and the frequencies and emission designators used... repeaters will communicate, the frequencies and emission designators of such communications, and the...

  4. 47 CFR 11.31 - EAS protocol.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... End Of Message (EOM) Codes. (1) The Preamble and EAS Codes must use Audio Frequency Shift Keying at a rate of 520.83 bits per second to transmit the codes. Mark frequency is 2083.3 Hz and space frequency... Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL EMERGENCY ALERT SYSTEM (EAS) Equipment Requirements § 11...

  5. 47 CFR 11.31 - EAS protocol.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... End Of Message (EOM) Codes. (1) The Preamble and EAS Codes must use Audio Frequency Shift Keying at a rate of 520.83 bits per second to transmit the codes. Mark frequency is 2083.3 Hz and space frequency... Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL EMERGENCY ALERT SYSTEM (EAS) Equipment Requirements § 11...

  6. 47 CFR 25.144 - Licensing provisions for the 2.3 GHz satellite digital audio radio service.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... (CONTINUED) COMMON CARRIER SERVICES SATELLITE COMMUNICATIONS Applications and Licenses Space Stations § 25... frequencies and emission designators of such communications, and the frequencies and emission designators used... repeaters will communicate, the frequencies and emission designators of such communications, and the...

  7. 47 CFR 11.31 - EAS protocol.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... End Of Message (EOM) Codes. (1) The Preamble and EAS Codes must use Audio Frequency Shift Keying at a rate of 520.83 bits per second to transmit the codes. Mark frequency is 2083.3 Hz and space frequency... Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL EMERGENCY ALERT SYSTEM (EAS) Equipment Requirements § 11...

  8. 47 CFR 25.144 - Licensing provisions for the 2.3 GHz satellite digital audio radio service.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... (CONTINUED) COMMON CARRIER SERVICES SATELLITE COMMUNICATIONS Applications and Licenses Space Stations § 25... frequencies and emission designators of such communications, and the frequencies and emission designators used... repeaters will communicate, the frequencies and emission designators of such communications, and the...

  9. 14 CFR 25.1457 - Cockpit voice recorders.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 14 Aeronautics and Space 1 2014-01-01 2014-01-01 false Cockpit voice recorders. 25.1457 Section 25... recorders. (a) Each cockpit voice recorder required by the operating rules of this chapter must be approved... interphone system. (4) Voice or audio signals identifying navigation or approach aids introduced into a...

  10. 14 CFR 25.1457 - Cockpit voice recorders.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 14 Aeronautics and Space 1 2013-01-01 2013-01-01 false Cockpit voice recorders. 25.1457 Section 25... recorders. (a) Each cockpit voice recorder required by the operating rules of this chapter must be approved... interphone system. (4) Voice or audio signals identifying navigation or approach aids introduced into a...

  11. 14 CFR 29.1457 - Cockpit voice recorders.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 14 Aeronautics and Space 1 2012-01-01 2012-01-01 false Cockpit voice recorders. 29.1457 Section 29... recorders. (a) Each cockpit voice recorder required by the operating rules of this chapter must be approved... interphone system. (4) Voice or audio signals identifying navigation or approach aids introduced into a...

  12. 14 CFR 29.1457 - Cockpit voice recorders.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 14 Aeronautics and Space 1 2013-01-01 2013-01-01 false Cockpit voice recorders. 29.1457 Section 29... recorders. (a) Each cockpit voice recorder required by the operating rules of this chapter must be approved... interphone system. (4) Voice or audio signals identifying navigation or approach aids introduced into a...

  13. 14 CFR 25.1457 - Cockpit voice recorders.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 14 Aeronautics and Space 1 2012-01-01 2012-01-01 false Cockpit voice recorders. 25.1457 Section 25... recorders. (a) Each cockpit voice recorder required by the operating rules of this chapter must be approved... interphone system. (4) Voice or audio signals identifying navigation or approach aids introduced into a...

  14. 19 CFR 201.20 - Fees.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... requesters, subject to the limitations of paragraph (c) of this section. For a paper photocopy of a record... overhead expenses such as costs of space and heating or lighting of the facility in which the records are... of paper copy, microform, audio-visual materials, or machine-readable documentation (e.g., magnetic...

  15. 28 CFR 701.18 - Fees.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... requesters, subject to the limitations of paragraph (c) of this section. For a paper photocopy of a record... machinery. Not included in direct costs are overhead expenses such as costs of space and heating or lighting... request. Such copies can take the form of paper copy, microfilm, audio-visual materials, or machine...

  16. 40 CFR 1601.3 - Definitions.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... overhead expenses, such as the cost of space and heating or lighting of the facility in which the records... FOIA request. Such copies can take the form of, among other things, paper copy, microform, audio-visual... operates a program of scholarly research. FOIA Officer means the person designated to process requests for...

  17. 28 CFR 701.18 - Fees.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... requesters, subject to the limitations of paragraph (c) of this section. For a paper photocopy of a record... machinery. Not included in direct costs are overhead expenses such as costs of space and heating or lighting... request. Such copies can take the form of paper copy, microfilm, audio-visual materials, or machine...

  18. 19 CFR 201.20 - Fees.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... requesters, subject to the limitations of paragraph (c) of this section. For a paper photocopy of a record... overhead expenses such as costs of space and heating or lighting of the facility in which the records are... of paper copy, microform, audio-visual materials, or machine-readable documentation (e.g., magnetic...

  19. 29 CFR 70.38 - Definitions.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... duplication machinery. Not included in direct costs are overhead expenses such as costs of space, heating or... a record necessary to respond to a request. Such copy can take the form of paper, microform, audio... research. To qualify under this definition, the program of scholarly research in connection with which the...

  20. 29 CFR 70.38 - Definitions.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... duplication machinery. Not included in direct costs are overhead expenses such as costs of space, heating or... a record necessary to respond to a request. Such copy can take the form of paper, microform, audio... research. To qualify under this definition, the program of scholarly research in connection with which the...

  1. 19 CFR 201.20 - Fees.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... requesters, subject to the limitations of paragraph (c) of this section. For a paper photocopy of a record... overhead expenses such as costs of space and heating or lighting of the facility in which the records are... of paper copy, microform, audio-visual materials, or machine-readable documentation (e.g., magnetic...

  2. 40 CFR 1601.3 - Definitions.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... overhead expenses, such as the cost of space and heating or lighting of the facility in which the records... FOIA request. Such copies can take the form of, among other things, paper copy, microform, audio-visual... operates a program of scholarly research. FOIA Officer means the person designated to process requests for...

  3. 40 CFR 1601.3 - Definitions.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... overhead expenses, such as the cost of space and heating or lighting of the facility in which the records... FOIA request. Such copies can take the form of, among other things, paper copy, microform, audio-visual... operates a program of scholarly research. FOIA Officer means the person designated to process requests for...

  4. 29 CFR 70.38 - Definitions.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... duplication machinery. Not included in direct costs are overhead expenses such as costs of space, heating or... a record necessary to respond to a request. Such copy can take the form of paper, microform, audio... research. To qualify under this definition, the program of scholarly research in connection with which the...

  5. 19 CFR 201.20 - Fees.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... requesters, subject to the limitations of paragraph (c) of this section. For a paper photocopy of a record... overhead expenses such as costs of space and heating or lighting of the facility in which the records are... of paper copy, microform, audio-visual materials, or machine-readable documentation (e.g., magnetic...

  6. 29 CFR 70.38 - Definitions.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... duplication machinery. Not included in direct costs are overhead expenses such as costs of space, heating or... a record necessary to respond to a request. Such copy can take the form of paper, microform, audio... research. To qualify under this definition, the program of scholarly research in connection with which the...

  7. 28 CFR 701.18 - Fees.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... requesters, subject to the limitations of paragraph (c) of this section. For a paper photocopy of a record... machinery. Not included in direct costs are overhead expenses such as costs of space and heating or lighting... request. Such copies can take the form of paper copy, microfilm, audio-visual materials, or machine...

  8. 32 CFR 1662.6 - Fee schedule; waiver of fees.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... as costs of space, and heating or lighting the facility in which the records are stored. (2) The term... copies may take the form of paper copy, microform, audio-visual materials, or machine readable... institution of vocational education, which operates a program or programs of scholarly research. (7) The term...

  9. 32 CFR 1662.6 - Fee schedule; waiver of fees.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... as costs of space, and heating or lighting the facility in which the records are stored. (2) The term... copies may take the form of paper copy, microform, audio-visual materials, or machine readable... institution of vocational education, which operates a program or programs of scholarly research. (7) The term...

  10. 40 CFR 1601.3 - Definitions.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... overhead expenses, such as the cost of space and heating or lighting of the facility in which the records... FOIA request. Such copies can take the form of, among other things, paper copy, microform, audio-visual... operates a program of scholarly research. FOIA Officer means the person designated to process requests for...

  11. 28 CFR 701.18 - Fees.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... requesters, subject to the limitations of paragraph (c) of this section. For a paper photocopy of a record... machinery. Not included in direct costs are overhead expenses such as costs of space and heating or lighting... request. Such copies can take the form of paper copy, microfilm, audio-visual materials, or machine...

  12. School Building Design and Audio-Visual Resources.

    ERIC Educational Resources Information Center

    National Committee for Audio-Visual Aids in Education, London (England).

    The design of new schools should facilitate the use of audiovisual resources by ensuring that the materials used in the construction of the buildings provide adequate sound insulation and acoustical and viewing conditions in all learning spaces. The facilities to be considered are: electrical services; electronic services; light control and…

  13. Quo Vadimus? The 21st Century and Multimedia.

    ERIC Educational Resources Information Center

    Kuhn, Allan D.

    This paper relates the concept of computer-driven multimedia to the National Aeronautics and Space Administration (NASA) Scientific and Technical Information Program (STIP). Multimedia is defined here as computer integration and output of text, animation, audio, video, and graphics. Multimedia is the stage of computer-based information that allows…

  14. Telecommunications in Higher Education: Creating New Information Sources.

    ERIC Educational Resources Information Center

    Brown, Fred D.

    1986-01-01

    Discusses the telecommunications systems in operation at Buena Vista College in Iowa. Describes the systems' uses in linking all offices and classrooms on the campus, downlinking satellite communications through a dish, transmitting audio and video information to any set of defined studio or classroom space, and teleconferencing. (TW)

  15. Context-specific effects of musical expertise on audiovisual integration

    PubMed Central

    Bishop, Laura; Goebl, Werner

    2014-01-01

    Ensemble musicians exchange auditory and visual signals that can facilitate interpersonal synchronization. Musical expertise improves how precisely auditory and visual signals are perceptually integrated and increases sensitivity to asynchrony between them. Whether expertise improves sensitivity to audiovisual asynchrony in all instrumental contexts or only in those using sound-producing gestures that are within an observer's own motor repertoire is unclear. This study tested the hypothesis that musicians are more sensitive to audiovisual asynchrony in performances featuring their own instrument than in performances featuring other instruments. Short clips were extracted from audio-video recordings of clarinet, piano, and violin performances and presented to highly-skilled clarinetists, pianists, and violinists. Clips either maintained the audiovisual synchrony present in the original recording or were modified so that the video led or lagged behind the audio. Participants indicated whether the audio and video channels in each clip were synchronized. The range of asynchronies most often endorsed as synchronized was assessed as a measure of participants' sensitivities to audiovisual asynchrony. A positive relationship was observed between musical training and sensitivity, with data pooled across stimuli. While participants across expertise groups detected asynchronies most readily in piano stimuli and least readily in violin stimuli, pianists showed significantly better performance for piano stimuli than for either clarinet or violin. These findings suggest that, to an extent, the effects of expertise on audiovisual integration can be instrument-specific; however, the nature of the sound-producing gestures that are observed has a substantial effect on how readily asynchrony is detected as well. PMID:25324819

  16. The ventriloquist in periphery: impact of eccentricity-related reliability on audio-visual localization.

    PubMed

    Charbonneau, Geneviève; Véronneau, Marie; Boudrias-Fournier, Colin; Lepore, Franco; Collignon, Olivier

    2013-10-28

    The relative reliability of separate sensory estimates influences the way they are merged into a unified percept. We investigated how eccentricity-related changes in the reliability of auditory and visual stimuli influence their integration across the entire frontal space. First, we surprisingly found that, despite a strong decrease in auditory and visual unisensory localization abilities in the periphery, the redundancy gain resulting from the congruent presentation of audio-visual targets was not affected by stimulus eccentricity. This result therefore contrasts with the common prediction that a reduction in sensory reliability necessarily induces an enhanced integrative gain. Second, we demonstrate that the visual capture of sounds observed with spatially incongruent audio-visual targets (the ventriloquist effect) steadily decreases with eccentricity, paralleling a lowering of the relative reliability of unimodal visual over unimodal auditory stimuli in the periphery. Moreover, at all eccentricities, the ventriloquist effect positively correlated with a weighted combination of the spatial resolution obtained in unisensory conditions. These findings support and extend the view that the localization of audio-visual stimuli relies on an optimal combination of auditory and visual information according to their respective spatial reliability. Altogether, these results show that the external spatial coordinates of multisensory events relative to an observer's body (e.g., eye or head position) influence how this information is merged, and therefore determine the perceptual outcome.
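
    The "optimal combination ... according to their respective spatial reliability" in this record refers to reliability-weighted (maximum-likelihood) cue integration, which is compact enough to show directly; a minimal sketch:

```python
# Reliability-weighted (maximum-likelihood) cue combination, the model the
# record's "optimal combination" refers to. Reliability = 1 / variance.
def fuse(x_v, var_v, x_a, var_a):
    """Fused location estimate from visual and auditory estimates."""
    w_v = (1 / var_v) / (1 / var_v + 1 / var_a)   # visual weight
    x_hat = w_v * x_v + (1 - w_v) * x_a
    var_hat = 1 / (1 / var_v + 1 / var_a)         # never exceeds either cue's variance
    return x_hat, var_hat

# In periphery visual variance grows, w_v falls, and the predicted visual
# capture of sound (ventriloquist effect) weakens -- as the record reports.
print(fuse(x_v=0.0, var_v=1.0, x_a=10.0, var_a=4.0))   # estimate pulled toward vision
```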

  17. FAST at MACH 20: clinical ultrasound aboard the International Space Station.

    PubMed

    Sargsyan, Ashot E; Hamilton, Douglas R; Jones, Jeffrey A; Melton, Shannon; Whitson, Peggy A; Kirkpatrick, Andrew W; Martin, David; Dulchavsky, Scott A

    2005-01-01

    Focused assessment with sonography for trauma (FAST) examination has been proved accurate for diagnosing trauma when performed by nonradiologist physicians. Recent reports have suggested that nonphysicians also may be able to perform the FAST examination reliably. A multipurpose ultrasound system is installed on the International Space Station as a component of the Human Research Facility. Nonphysician crew members aboard the International Space Station receive modest training in hardware operation, sonographic techniques, and remotely guided scanning. This report documents the first FAST examination conducted in space, as part of the sustained effort to maintain the highest possible level of available medical care during long-duration space flight. An International Space Station crew member with minimal sonography training was remotely guided through a FAST examination by an ultrasound imaging expert from Mission Control Center using private real-time two-way audio and a private space-to-ground video downlink (7.5 frames/second). There was a 2-second satellite delay for both video and audio. To facilitate the real-time telemedical ultrasound examination, identical reference cards showing topologic reference points and hardware controls were available to both the crew member and the ground-based expert. A FAST examination, including four standard abdominal windows, was completed in approximately 5.5 minutes. Following commands from the Mission Control Center-based expert, the crew member acquired all target images without difficulty. The anatomic content and fidelity of the ultrasound video were excellent and would allow clinical decision making. It is possible to conduct a remotely guided FAST examination with excellent clinical results and speed, even with a significantly reduced video frame rate and a 2-second communication latency. A wider application of trauma ultrasound applications for remote medicine on earth appears to be possible and warranted.

  18. Concept of Operations Evaluation for Mitigating Space Flight-Relevant Medical Issues in a Planetary Habitat

    NASA Technical Reports Server (NTRS)

    Barsten, Kristina; Hurst, Victor, IV; Scheuring, Richard; Baumann, David K.; Johnson-Throop, Kathy

    2010-01-01

    Introduction: Analogue environments assist the NASA Human Research Program (HRP) in developing capabilities to mitigate high-risk issues to crew health and performance for space exploration. The Habitat Demonstration Unit (HDU) is an analogue habitat used to assess space-related products for planetary missions. The Exploration Medical Capability (ExMC) element at the NASA Johnson Space Center (JSC) was tasked with developing planetary-relevant medical scenarios to evaluate the concept of operations for mitigating medical issues in such an environment. Methods: Two medical scenarios were conducted within the simulated planetary habitat, with the crew executing two space flight-relevant procedures: Eye Examination with a corneal injury, and Skin Laceration. Remote guidance for the crew was provided by a flight surgeon (FS) stationed at a console outside of the habitat. Audio and video data were collected to capture the communication between the crew and the FS, as well as the movements of the crew executing the procedures. Questionnaire data regarding procedure content and remote guidance performance were also collected from the crew immediately after the sessions. Results: Preliminary review of the audio, video, and questionnaire data from the two scenarios conducted within the HDU indicates that remote guidance techniques from an FS on console can help crew members within a planetary habitat mitigate planetary-relevant medical issues. The content and format of the procedures were considered concise and intuitive, respectively. Discussion: Overall, the preliminary data from the evaluation suggest that use of remote guidance techniques by an FS can help the HDU crew execute space exploration-relevant medical procedures within a habitat relevant to planetary missions; however, further evaluations will be needed to implement this strategy into the complete concept of operations for conducting general space medicine within similar environments.

  19. Reaching the Public in 2016

    NASA Astrophysics Data System (ADS)

    Grauer, Albert D.; Catalina Sky Survey

    2016-10-01

    Travelers in the Night is a series of 2-minute audio programs whose topics include Catalina Sky Survey discoveries as well as other current research in astronomy and the space sciences. Each episode is first published on the Public Radio Exchange [PRX], which makes it available to NPR and community radio stations free of charge. After about 3 weeks it is published as an audio podcast on the internet via spreaker.com, iHeart Radio, Stitcher, iTunes, and a few other outlets. The most interesting aspect of the Travelers In The Night experiment is the insight it provides into the rapidly changing means by which people obtain information in 2016. The demographics and devices used to obtain more than 175,000 plays and downloads are presented in this poster.

  20. Towards parameter-free classification of sound effects in movies

    NASA Astrophysics Data System (ADS)

    Chu, Selina; Narayanan, Shrikanth; Kuo, C.-C. J.

    2005-08-01

    The problem of identifying intense events via multimedia data mining in films is investigated in this work. Movies are mainly characterized by dialog, music, and sound effects. We begin our investigation by detecting interesting events through sound effects. Sound effects are neither speech nor music, but are closely associated with interesting events such as car chases and gun shots. In this work, we utilize low-level audio features, including MFCC and energy, to identify sound effects. It was shown in previous work that the Hidden Markov model (HMM) works well for speech/audio signals. However, this technique requires careful choices in designing the model and selecting the correct parameters. In this work, we introduce a framework that avoids this necessity and works well with semi- and non-parametric learning algorithms.
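
    A minimal sketch of the low-level feature extraction this record names (frame-wise MFCCs plus energy), assuming librosa; the file path is a placeholder:

```python
# Frame-level MFCC + energy features, as named in the record; a sketch
# assuming librosa (the file path is a placeholder).
import numpy as np
import librosa

y, sr = librosa.load("movie_audio.wav", sr=16000)   # placeholder path
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # (13, n_frames)
energy = librosa.feature.rms(y=y)                   # (1, n_frames) frame energy
features = np.vstack([mfcc, energy]).T              # one feature row per frame
```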

  1. Parallel perceptual enhancement and hierarchic relevance evaluation in an audio-visual conjunction task.

    PubMed

    Potts, Geoffrey F; Wood, Susan M; Kothmann, Delia; Martin, Laura E

    2008-10-21

    Attention directs limited-capacity information processing resources to a subset of available perceptual representations. The mechanisms by which attention selects task-relevant representations for preferential processing are not fully known. Treisman and Gelade's [Treisman, A., Gelade, G., 1980. A feature integration theory of attention. Cognit. Psychol. 12, 97-136] influential attention model posits that simple features are processed preattentively, in parallel, but that attention is required to serially conjoin multiple features into an object representation. Event-related potentials have provided evidence for this model, showing parallel processing of perceptual features in the posterior Selection Negativity (SN) and serial, hierarchic processing of feature conjunctions in the Frontal Selection Positivity (FSP). Most prior studies have been done on conjunctions within one sensory modality, while many real-world objects have multimodal features. It is not known if the same neural systems of posterior parallel processing of simple features and frontal serial processing of feature conjunctions seen within a sensory modality also operate on conjunctions between modalities. The current study used ERPs and simultaneously presented auditory and visual stimuli in three task conditions: Attend Auditory (auditory feature determines the target, visual features are irrelevant), Attend Visual (visual features relevant, auditory irrelevant), and Attend Conjunction (target defined by the co-occurrence of an auditory and a visual feature). In the Attend Conjunction condition, when the auditory but not the visual feature was a target there was an SN over auditory cortex, when the visual but not auditory stimulus was a target there was an SN over visual cortex, and when both auditory and visual stimuli were targets (i.e. conjunction target) there were SNs over both auditory and visual cortex, indicating parallel processing of the simple features within each modality. In contrast, an FSP was present when either the visual only or both auditory and visual features were targets, but not when only the auditory stimulus was a target, indicating that the conjunction target determination was evaluated serially and hierarchically, with visual information taking precedence. This indicates that the detection of a target defined by audio-visual conjunction is achieved via the same mechanism as within a single perceptual modality, through separate, parallel processing of the auditory and visual features and serial processing of the feature conjunction elements, rather than by evaluation of a fused multimodal percept.

  2. AccuNet/AP (Associated Press) Multimedia Archive

    ERIC Educational Resources Information Center

    Young, Terrence E., Jr.

    2004-01-01

    The AccuNet/AP Multimedia Archive is an electronic library containing the AP's current photos and a selection of pictures from their enormous print and negative library, as well as text and graphic material. It is composed of two photo databases as well as graphics, text, and audio databases. The features of this database are briefly described in…

  3. The Effective Audio-Visual Program in Foreign Language and Literature Studies.

    ERIC Educational Resources Information Center

    Lawton, Ben

    Foreign language teachers should exploit the American affinity for television and movies by using foreign language feature films and shorts in the classroom. Social and political history and literary trends illustrated in the films may be discussed and absorbed along with the language. The author teaches such a course in the Department of Italian…

  4. Phonetic Symbols through Audiolingual Method to Improve the Students' Listening Skill

    ERIC Educational Resources Information Center

    Samawiyah, Zuhrotun; Saifuddin, Muhammad

    2016-01-01

    Phonetic symbols represent linguistic features of how words are pronounced or spelled, and they offer a way to easily identify and recognize words. Phonetic symbols were applied in this research to give the students clear input and comprehension of English words. Moreover, these phonetic symbols were applied within the audio-lingual method…

  5. The Film in Language Teaching Association (FILTA): A Multilingual Community of Practice

    ERIC Educational Resources Information Center

    Herrero, Carmen

    2016-01-01

    This article presents the Film in Language Teaching Association (FILTA) project, a community of practice (CoP) whose main goals are first to engage language teachers in practical uses of film and audio-visual media in the second language classroom; second, to value the artistic features of cinema; and third, to encourage a dialogue between…

  6. A Description of a Prototype System at NTID which Merges Computer Assisted Instruction and Instructional Television.

    ERIC Educational Resources Information Center

    vonFeldt, James R.

    The development of a prototype system is described which merges the strengths of computer assisted instruction, data gathering, interactive learning, individualized instruction, and the motion in color, and audio features of television. Creation of the prototype system will allow testing of both TV and interactive CAI/TV strategies in auditory and…

  7. Detecting Psychopathy from Thin Slices of Behavior

    ERIC Educational Resources Information Center

    Fowler, Katherine A.; Lilienfeld, Scott O.; Patrick, Christopher J.

    2009-01-01

    This study is the first to demonstrate that features of psychopathy can be reliably and validly detected by lay raters from "thin slices" (i.e., small samples) of behavior. Brief excerpts (5 s, 10 s, and 20 s) from interviews with 96 maximum-security inmates were presented in video or audio form or in both modalities combined. Forty raters used…

  8. Beyond "Classroom" Technology: The Equipment Circulation Program at Rasmuson Library, University of Alaska Fairbanks

    ERIC Educational Resources Information Center

    Jensen, Karen

    2008-01-01

    The library at the University of Alaska Fairbanks offers a unique equipment lending program through its Circulation Desk. The program features a wide array of equipment types, generous circulation policies, and unrestricted borrowing, enabling students, staff, and faculty to experiment with the latest in audio, video, and computer technologies,…

  9. Manual of Tape Scripts: Spanish, Level 2. Curriculum Bulletin, 1968-69 Series, Number 13.

    ERIC Educational Resources Information Center

    Lipton, Gladys; And Others

    This second manual of tape scripts, together with a set of foreign language audio tapes for level 2 Spanish, was prepared to support the curriculum bulletin, New York City Foreign Language Program for Secondary Schools: Spanish, Levels 1-5. Vocabulary, repetition, transformation, and recombination drills on specific grammatical features allow…

  10. Manual of Tape Scripts: German, Level 1. Curriculum Bulletin, 1968-69 Series, Number 11.

    ERIC Educational Resources Information Center

    Lipton, Gladys; And Others

    This manual of tape scripts, together with a set of foreign language audio tapes for level 1 German, was prepared to support the curriculum bulletin, New York City Foreign Language Program for Secondary Schools: German, Levels 1-4. Vocabulary, repetition, transformation, and recombination drills on specific grammatical features allow further…

  11. Manual of Tape Scripts: Italian, Level 1. Curriculum Bulletin, 1968-69 Series, Number 12.

    ERIC Educational Resources Information Center

    Lipton, Gladys; And Others

    This manual of tape scripts, together with a set of foreign language audio tapes for level 1 Italian, was prepared to support the curriculum bulletin, New York City Foreign Language Program for Schools: Italian, Levels 1-4. Vocabulary, repetition, transformation, and recombination drills on specific grammatical features allow further development…

  12. Manual of Tape Scripts: Russian, Levels 1 and 2. Curriculum Bulletin, 1969-70 Series, Number 18.

    ERIC Educational Resources Information Center

    Lipton, Gladys; And Others

    This manual of tape scripts, together with a set of foreign language audio tapes for Levels 1 and 2 Russian, was prepared to support the curriculum bulletin, "New York City Foreign Language Program for Schools: Russian, Levels 1-4." Vocabulary, repetition, transformation, and recombination drills on specific grammatical features allow further…

  13. Manual of Tape Scripts: French, Level 2. Curriculum Bulletin, 1968-69 Series, Number 10.

    ERIC Educational Resources Information Center

    Lipton, Gladys; And Others

    This second manual of tape scripts, together with a set of foreign language audio tapes for level 2 French, was prepared to support the curriculum bulletin, New York City Foreign Language Program for Secondary Schools: French, Levels 1-5. Vocabulary, repetition, transformation, and recombination drills on specific grammatical features allow…

  14. Manual of Tape Scripts: Italian, Level 2. Curriculum Bulletin, 1969-70 Series, Number 20.

    ERIC Educational Resources Information Center

    Lipton, Gladys; And Others

    This manual of tape scripts, together with a set of foreign language audio tapes for Level 2 Italian, was prepared to support the curriculum bulletin, "New York City Foreign Language Program for Schools: Italian, Levels 1-4." Vocabulary, repetition, transformation, and recombination drills on specific grammatical features allow further development…

  15. Mobile Guide System Using Problem-Solving Strategy for Museum Learning: A Sequential Learning Behavioural Pattern Analysis

    ERIC Educational Resources Information Center

    Sung, Y.-T.; Hou, H.-T.; Liu, C.-K.; Chang, K.-E.

    2010-01-01

    Mobile devices have been increasingly utilized in informal learning because of their high degree of portability; mobile guide systems (or electronic guidebooks) have also been adopted in museum learning, including those that combine learning strategies and the general audio-visual guide systems. To gain a deeper understanding of the features and…

  16. Development of SPIES (Space Intelligent Eyeing System) for smart vehicle tracing and tracking

    NASA Astrophysics Data System (ADS)

    Abdullah, Suzanah; Ariffin Osoman, Muhammad; Guan Liyong, Chua; Zulfadhli Mohd Noor, Mohd; Mohamed, Ikhwan

    2016-06-01

    SPIES, or Space-based Intelligent Eyeing System, is an intelligent technology which can be utilized for various applications, such as gathering spatial information about features on Earth, tracking the movement of an object, tracing historical information, monitoring driving behavior, and serving as a real-time security and alarm observer, among many others. SPIES will be developed and supplied modularly, encouraging usage based on the needs and affordability of users. SPIES is a complete system with camera, GSM, GPS/GNSS, and G-Sensor modules with intelligent functions and capabilities. Mainly, the camera is used to capture pictures and video, sometimes with audio, of an event. Its usage is not limited to nostalgic purposes: it can serve as a reference for security and as material evidence when an undesirable event such as a crime occurs. When integrated with the space-based technology of the Global Navigational Satellite System (GNSS), photos and videos can be recorded together with positioning information. The integration of these technologies with Information and Communication Technology (ICT) and Geographic Information Systems (GIS) will produce innovation in the form of methods for gathering still pictures or video with positioning information that can be conveyed in real time via the web to display location on a map, hence creating an intelligent eyeing system based on space technology. Providing global positioning information is a challenge, but SPIES overcomes it even in areas without GNSS signal reception, enabling continuous tracking and tracing capability.

  17. 14 CFR 23.1457 - Cockpit voice recorders.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... intelligibility. (c) Each cockpit voice recorder must be installed so that the part of the communication or audio... 14 Aeronautics and Space 1 2013-01-01 2013-01-01 false Cockpit voice recorders. 23.1457 Section 23... Equipment § 23.1457 Cockpit voice recorders. (a) Each cockpit voice recorder required by the operating rules...

  18. 14 CFR 23.1457 - Cockpit voice recorders.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... intelligibility. (c) Each cockpit voice recorder must be installed so that the part of the communication or audio... 14 Aeronautics and Space 1 2014-01-01 2014-01-01 false Cockpit voice recorders. 23.1457 Section 23... Equipment § 23.1457 Cockpit voice recorders. (a) Each cockpit voice recorder required by the operating rules...

  19. 14 CFR 23.1457 - Cockpit voice recorders.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... intelligibility. (c) Each cockpit voice recorder must be installed so that the part of the communication or audio... 14 Aeronautics and Space 1 2012-01-01 2012-01-01 false Cockpit voice recorders. 23.1457 Section 23... Equipment § 23.1457 Cockpit voice recorders. (a) Each cockpit voice recorder required by the operating rules...

  20. 76 FR 32360 - Information Collection Being Reviewed by the Federal Communications Commission

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-06-06

    ... do so within the period of time allowed by this notice, you should advise the contact listed below as... other for profit. Number of Respondents and Responses: 158 respondents; 2,406 responses. Estimated Time... Satellite Digital Audio Radio Service (SDARS), Aeronautical Mobile Telemetry (AMT), and Deep Space Network...

  1. 76 FR 45618 - Solicitation for a Cooperative Agreement-Two Hearings of the National Institute of Corrections...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-07-29

    ..., television, and Web-based news organizations. The recipient will arrange for or supply all audio- visual... program narrative text to 20 double spaced, numbered pages. The application package must include: a cover..., 2012), a program narrative responding to the requirements in this announcement, a description of the...

  2. Emotion recognition abilities across stimulus modalities in schizophrenia and the role of visual attention.

    PubMed

    Simpson, Claire; Pinkham, Amy E; Kelsven, Skylar; Sasson, Noah J

    2013-12-01

    Emotion can be expressed by both the voice and the face, and previous work suggests that presentation modality may impact emotion recognition performance in individuals with schizophrenia. We investigated the effect of stimulus modality on emotion recognition accuracy and the potential role of visual attention to faces in emotion recognition abilities. Thirty-one patients who met DSM-IV criteria for schizophrenia (n=8) or schizoaffective disorder (n=23) and 30 non-clinical control individuals participated. Both groups identified emotional expressions in three different conditions: audio only, visual only, and combined audiovisual. In the visual only and combined conditions, time spent visually fixating on salient features of the face was recorded. Patients were significantly less accurate than controls in emotion recognition during both the audio and visual only conditions but did not differ from controls in the combined condition. Analysis of visual scanning behaviors demonstrated that patients attended less than healthy individuals to the mouth in the visual condition but did not differ in visual attention to salient facial features in the combined condition, which may in part explain the absence of a deficit for patients in this condition. Collectively, these findings demonstrate that patients benefit from multimodal stimulus presentations of emotion and support hypotheses that visual attention to salient facial features may serve as a mechanism for accurate emotion identification.

  3. A modified mole cricket lure and description of Scapteriscus borellii (Orthoptera: Gryllotalpidae) range expansion and calling song in California.

    PubMed

    Dillman, Adler R; Cronin, Christopher J; Tang, Joseph; Gray, David A; Sternberg, Paul W

    2014-02-01

    Invasive mole cricket species in the genus Scapteriscus have become significant agricultural pests and are continuing to expand their range in North America. Though largely subterranean, adults of some species, such as Scapteriscus borellii Giglio-Tos 1894, are capable of long dispersive flights and phonotaxis to male calling songs to find suitable habitats and mates. Mole crickets in the genus Scapteriscus are known to be attracted to and can be caught by audio lure traps that broadcast synthesized or recorded calling songs. We report improvements in the design and production of electronic controllers for the automation of semipermanent mole cricket trap lures as well as highly portable audio trap collection designs. Using these improved audio lure traps, we collected the first reported individuals of the pest mole cricket S. borellii in California. We describe several characteristic features of the calling song of the California population including that the pulse rate is a function of soil temperature, similar to Florida populations of S. borellii. Further, we show that other calling song characteristics (carrier frequency, intensity, and pulse rate) are significantly different between the populations.
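
    As a sketch of how an audio lure might synthesize a pulsed calling song, with the pulse rate exposed as a parameter (the numeric values below are illustrative placeholders, not the measured California parameters):

```python
# Sketch: synthesize a pulsed calling song for an audio lure. The carrier
# frequency, pulse rate, and duty cycle are placeholder values only.
import numpy as np

def calling_song(carrier_hz=2700.0, pulse_rate_hz=50.0, duty=0.5,
                 seconds=2.0, sr=44100):
    """Return a sinusoidal carrier gated on/off at the given pulse rate."""
    t = np.arange(int(seconds * sr)) / sr
    tone = np.sin(2 * np.pi * carrier_hz * t)
    gate = ((t * pulse_rate_hz) % 1.0) < duty    # on/off pulse envelope
    return tone * gate

song = calling_song()   # pulse_rate_hz could be varied with soil temperature
```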

  4. Quantifying Auditory Temporal Stability in a Large Database of Recorded Music

    PubMed Central

    Ellis, Robert J.; Duan, Zhiyan; Wang, Ye

    2014-01-01

    “Moving to the beat” is both one of the most basic and one of the most profound means by which humans (and a few other species) interact with music. Computer algorithms that detect the precise temporal location of beats (i.e., pulses of musical “energy”) in recorded music have important practical applications, such as the creation of playlists with a particular tempo for rehabilitation (e.g., rhythmic gait training), exercise (e.g., jogging), or entertainment (e.g., continuous dance mixes). Although several such algorithms return simple point estimates of an audio file’s temporal structure (e.g., “average tempo”, “time signature”), none has sought to quantify the temporal stability of a series of detected beats. Such a method, a “Balanced Evaluation of Auditory Temporal Stability” (BEATS), is proposed here, and is illustrated using the Million Song Dataset (a collection of audio features and music metadata for nearly one million audio files). A publicly accessible web interface is also presented, which combines the thresholdable statistics of BEATS with queryable metadata terms, fostering potential avenues of research and facilitating the creation of highly personalized music playlists for clinical or recreational applications. PMID:25469636
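
    The BEATS statistics themselves are not reproduced here, but the underlying idea of scoring the temporal stability of a series of detected beats can be sketched. This minimal example, assuming librosa as the beat tracker, uses the coefficient of variation of inter-beat intervals as a stand-in stability measure:

    ```python
    import numpy as np
    import librosa  # assumed dependency; any beat tracker would do

    def beat_stability(path):
        """Return (tempo, CV of inter-beat intervals); lower CV = steadier."""
        y, sr = librosa.load(path, sr=None)
        tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
        beat_times = librosa.frames_to_time(beat_frames, sr=sr)
        ibis = np.diff(beat_times)            # inter-beat intervals (s)
        return tempo, float(ibis.std() / ibis.mean())
    ```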

  5. Video mining using combinations of unsupervised and supervised learning techniques

    NASA Astrophysics Data System (ADS)

    Divakaran, Ajay; Miyahara, Koji; Peker, Kadir A.; Radhakrishnan, Regunathan; Xiong, Ziyou

    2003-12-01

    We discuss the meaning and significance of the video mining problem, and present our work on some aspects of video mining. A simple definition of video mining is unsupervised discovery of patterns in audio-visual content. Such purely unsupervised discovery is readily applicable to video surveillance as well as to consumer video browsing applications. We interpret video mining as content-adaptive or "blind" content processing, in which the first stage is content characterization and the second stage is event discovery based on the characterization obtained in stage 1. We discuss the target applications and find that purely unsupervised approaches are too computationally complex to be implemented on our product platform. We then describe various combinations of unsupervised and supervised learning techniques that help discover patterns that are useful to the end-user of the application. We target consumer video browsing applications such as commercial message detection and sports highlights extraction. We employ both audio and video features. We find that supervised audio classification combined with unsupervised unusual event discovery enables accurate supervised detection of desired events. Our techniques are computationally simple and robust to common variations in production styles.
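
    The two-stage combination described above can be sketched generically (all names and model choices below are hypothetical): a supervised classifier labels audio feature frames, and unusual events are then flagged as low-likelihood frames under an unsupervised background model fit within one class.

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.svm import SVC

    def train_frame_classifier(X_train, y_train):
        """Stage 1 (supervised): coarse audio classes, e.g. speech/applause."""
        return SVC(probability=True).fit(X_train, y_train)

    def unusual_events(X_class, contamination=0.05, n_components=4):
        """Stage 2 (unsupervised): flag rare frames within one audio class."""
        gmm = GaussianMixture(n_components=n_components).fit(X_class)
        ll = gmm.score_samples(X_class)             # per-frame log-likelihood
        return ll < np.quantile(ll, contamination)  # lowest 5% = "unusual"
    ```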

  6. Decoding power-spectral profiles from FMRI brain activities during naturalistic auditory experience.

    PubMed

    Hu, Xintao; Guo, Lei; Han, Junwei; Liu, Tianming

    2017-02-01

    Recent studies have demonstrated a close relationship between computational acoustic features and neural brain activities, and have largely advanced our understanding of auditory information processing in the human brain. Along this line, we proposed a multidisciplinary study to examine whether power spectral density (PSD) profiles can be decoded from brain activities during naturalistic auditory experience. The study was performed on a high resolution functional magnetic resonance imaging (fMRI) dataset acquired while participants freely listened to the audio-description of the movie "Forrest Gump". Representative PSD profiles existing in the audio-movie were identified by clustering the audio samples according to their PSD descriptors. Support vector machine (SVM) classifiers were trained to differentiate the representative PSD profiles using the corresponding fMRI brain activities. Based on PSD profile decoding, we explored how the neural decodability correlated with power intensity and frequency deviants. Our experimental results demonstrated that PSD profiles can be reliably decoded from brain activities. Our results also suggest a sigmoidal relationship between the neural decodability and the power intensity deviants of PSD profiles. In addition, our study substantiates the feasibility and advantages of the naturalistic paradigm for studying neural encoding of complex auditory information.
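
    The audio half of this pipeline can be sketched as follows, with Welch's method standing in for whatever PSD estimator the study used and a placeholder matrix standing in for the fMRI inputs:

    ```python
    import numpy as np
    from scipy.signal import welch
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    def psd_descriptors(segments, fs):
        """Welch PSD (in dB) for a list of equal-length audio segments."""
        feats = [10 * np.log10(welch(s, fs=fs, nperseg=1024)[1] + 1e-12)
                 for s in segments]
        return np.vstack(feats)

    def fit_profiles(psd_feats, X_brain, n_profiles=5):
        """Cluster PSDs into representative profiles, then train an SVM
        to predict the profile label from brain activity (X_brain)."""
        labels = KMeans(n_clusters=n_profiles, n_init=10).fit_predict(psd_feats)
        return labels, SVC(kernel="linear").fit(X_brain, labels)
    ```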

  7. One Way Multimedia Broadcasting as a Tool for Education and Development in Developing Nations

    NASA Astrophysics Data System (ADS)

    Chandrasekhar, M. G.; Venugopal, D.; Sebastian, M.; Chari, B.

    2000-07-01

    An improved quality of life through education and developmental communication is an important prerequisite for societal uplift in the new millennium, especially in the developing nations. The population explosion and the associated pressure on scarce resources to meet basic necessities have made it more or less impossible for most nations to invest reasonable resources in realizing adequate channels of formal education. Thanks to developments in satellite communication and associated technologies, new vistas are available today to provide education and developmental communication opportunities to millions of people spread across the globe. Satellite-based digital audio and multimedia broadcasting is one such development, viewed as an innovative space application for the coming decades. The potential of DAB technology to deliver education, information, and entertainment directly to the user through a specially designed receiver could be efficiently utilized by developing nations to overcome their difficulties in realizing formal channels of education and information dissemination. WorldSpace plans to launch three geostationary satellites that would cover most of the developing economies in Africa, the Mediterranean, the Middle East, Asia, Latin America and the Caribbean. Apart from a variety of digital, high-quality audio channels providing news, views, education and entertainment, end users can also receive a responsive multimedia service. This service is planned as a specially packaged offering that can meet the demands of students, professionals, and certain special groups with specific data and information requirements. Apart from WorldSpace, renowned agencies and firms from different parts of the world will provide the required content. Though the Internet option is available, high telephone charges and the difficulty of getting access have made this option less attractive and unpopular in most developing countries. The proposed digital audio and multimedia offering from WorldSpace to millions of consumers spread across more than 120 countries is considered a unique tool for education and development, particularly in the developing nations. In this paper, an attempt is made to briefly describe the issues associated with education and development in developing countries, the WorldSpace offering, and how a developing nation can benefit from this offering in the coming decades.

  8. KSC-2011-2958

    NASA Image and Video Library

    2011-04-21

    CAPE CANAVERAL, Fla. - In Orbiter Processing Facility-1 at NASA's Kennedy Space Center in Florida, the Ku-band antenna is being stowed in space shuttle Atlantis' payload bay. The antenna, which resembles a mini-satellite dish, transmits audio, video and data between Earth and the shuttle. Next, the clamshell doors of the payload bay will close completely in preparation for its move to the Vehicle Assembly Building. Atlantis is being prepared for the STS-135 mission, which will deliver the Raffaello multipurpose logistics module packed with supplies, logistics and spare parts to the International Space Station. STS-135 is targeted to launch June 28, and will be the last spaceflight for the Space Shuttle Program. Photo credit: NASA/Jack Pfaller

  9. KSC-2011-2959

    NASA Image and Video Library

    2011-04-21

    CAPE CANAVERAL, Fla. - In Orbiter Processing Facility-1 at NASA's Kennedy Space Center in Florida, the Ku-band antenna is being stowed in space shuttle Atlantis' payload bay. The antenna, which resembles a mini-satellite dish, transmits audio, video and data between Earth and the shuttle. Next, the clamshell doors of the payload bay will close completely in preparation for its move to the Vehicle Assembly Building. Atlantis is being prepared for the STS-135 mission, which will deliver the Raffaello multipurpose logistics module packed with supplies, logistics and spare parts to the International Space Station. STS-135 is targeted to launch June 28, and will be the last spaceflight for the Space Shuttle Program. Photo credit: NASA/Jack Pfaller

  10. KSC-2011-2957

    NASA Image and Video Library

    2011-04-21

    CAPE CANAVERAL, Fla. - In Orbiter Processing Facility-1 at NASA's Kennedy Space Center in Florida, the Ku-band antenna is being stowed in space shuttle Atlantis' payload bay. The antenna, which resembles a mini-satellite dish, transmits audio, video and data between Earth and the shuttle. Next, the clamshell doors of the payload bay will close completely in preparation for its move to the Vehicle Assembly Building. Atlantis is being prepared for the STS-135 mission, which will deliver the Raffaello multipurpose logistics module packed with supplies, logistics and spare parts to the International Space Station. STS-135 is targeted to launch June 28, and will be the last spaceflight for the Space Shuttle Program. Photo credit: NASA/Jack Pfaller

  11. KSC-2011-2961

    NASA Image and Video Library

    2011-04-21

    CAPE CANAVERAL, Fla. - In Orbiter Processing Facility-1 at NASA's Kennedy Space Center in Florida, the Ku-band antenna is stowed in space shuttle Atlantis' payload bay. The antenna, which resembles a mini-satellite dish, transmits audio, video and data between Earth and the shuttle. Next, the clamshell doors of the payload bay will close completely in preparation for its move to the Vehicle Assembly Building. Atlantis is being prepared for the STS-135 mission, which will deliver the Raffaello multipurpose logistics module packed with supplies, logistics and spare parts to the International Space Station. STS-135 is targeted to launch June 28, and will be the last spaceflight for the Space Shuttle Program. Photo credit: NASA/Jack Pfaller

  12. KSC-2011-2960

    NASA Image and Video Library

    2011-04-21

    CAPE CANAVERAL, Fla. - In Orbiter Processing Facility-1 at NASA's Kennedy Space Center in Florida, the Ku-band antenna is being stowed in space shuttle Atlantis' payload bay. The antenna, which resembles a mini-satellite dish, transmits audio, video and data between Earth and the shuttle. Next, the clamshell doors of the payload bay will close completely in preparation for its move to the Vehicle Assembly Building. Atlantis is being prepared for the STS-135 mission, which will deliver the Raffaello multipurpose logistics module packed with supplies, logistics and spare parts to the International Space Station. STS-135 is targeted to launch June 28, and will be the last spaceflight for the Space Shuttle Program. Photo credit: NASA/Jack Pfaller

  13. KSC-2011-2956

    NASA Image and Video Library

    2011-04-21

    CAPE CANAVERAL, Fla. - In Orbiter Processing Facility-1 at NASA's Kennedy Space Center in Florida, the Ku-band antenna is being stowed in space shuttle Atlantis' payload bay. The antenna, which resembles a mini-satellite dish, transmits audio, video and data between Earth and the shuttle. Next, the clamshell doors of the payload bay will close completely in preparation for its move to the Vehicle Assembly Building. Atlantis is being prepared for the STS-135 mission, which will deliver the Raffaello multipurpose logistics module packed with supplies, logistics and spare parts to the International Space Station. STS-135 is targeted to launch June 28, and will be the last spaceflight for the Space Shuttle Program. Photo credit: NASA/Jack Pfaller

  14. An automatic audio-magnetotelluric equipment, controlled by microprocessor, for the telesurvellance of the volcano Momotombo (Nicaragua)

    NASA Astrophysics Data System (ADS)

    Clerc, G.; Décriaud, J.-P.; Doyen, G.; Halbwachs, M.; Henrotte, M.; Rémy, J.; Zhang, X.-C.

    1984-07-01

    A campaign of audio-magnetotelluric soundings in the crater of Momotombo has shown that the structure of this volcano is suitable for surveillance by this method: its activity is confined to a single central vent, and resistivity decreases abruptly, by a factor of at least 100, at a depth of about 270 m. This technical paper therefore describes the equipment designed and built to measure the apparent resistivity in a given direction every 2 hr, at seven frequencies regularly spaced from 5 to 312 Hz. The data are preprocessed to fit into only 32 bytes: on the one hand, they are printed locally for close surveillance; on the other hand, they are transmitted by an ARGOS platform and received in Garchy via satellite for research studies.
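
    For reference, apparent resistivity in magnetotellurics is conventionally estimated from the ratio of orthogonal electric and magnetic field components; the sketch below uses the standard formula, which may differ in detail from this instrument's on-board preprocessing, and the seven frequency values are assumptions.

    ```python
    import numpy as np

    MU0 = 4e-7 * np.pi  # vacuum permeability, H/m

    def apparent_resistivity(e_field, h_field, freq_hz):
        """Standard MT estimate: rho_a = |E/H|^2 / (2*pi*f*mu0).
        E in V/m, H in A/m; result in ohm-metres."""
        return (abs(e_field / h_field) ** 2) / (2 * np.pi * freq_hz * MU0)

    # Seven regularly spaced frequencies between 5 and 312 Hz, as in the
    # Momotombo equipment (exact values assumed):
    freqs = np.linspace(5.0, 312.0, 7)
    ```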

  15. River multimodal scenario for rehabilitation robotics.

    PubMed

    Munih, Marko; Novak, Domen; Milavec, Maja; Ziherl, Jaka; Olenšek, Andrej; Mihelj, Matjaž

    2011-01-01

    This paper presents the novel "River" multimodal rehabilitation robotics scenario, which includes video, audio and haptic modalities. Elements contributing to intrinsic motivation are carefully combined across the three modalities to increase the motivation of the user. The user first needs to perform a motor action, then receives a cognitive challenge that is solved with adequate motor activity. Audio includes environmental sounds, music and spoken instructions or encouraging statements. Sounds and music were classified according to the arousal-valence space. The haptic modality can provide catching, grasping, tunnel or adaptive assistance, depending on the user's needs. The scenario was evaluated with 16 stroke users, who responded to it favourably according to the Intrinsic Motivation Inventory questionnaire. Additionally, the River multimodal environment appears to elicit higher motivation than a simpler apple pick-and-place multimodal task.

  16. Applying Spatial Audio to Human Interfaces: 25 Years of NASA Experience

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.; Godfrey, Martine; Miller, Joel D.; Anderson, Mark R.

    2010-01-01

    From the perspective of human factors engineering, the inclusion of spatial audio within a human-machine interface is advantageous from several perspectives. Demonstrated benefits include the ability to monitor multiple streams of speech and non-speech warning tones using a cocktail party advantage, and for aurally-guided visual search. Other potential benefits include the spatial coordination and interaction of multimodal events, and evaluation of new communication technologies and alerting systems using virtual simulation. Many of these technologies were developed at NASA Ames Research Center, beginning in 1985. This paper reviews examples and describes the advantages of spatial sound in NASA-related technologies, including space operations, aeronautics, and search and rescue. The work has involved hardware and software development as well as basic and applied research.

  17. There’s More to Groove than Bass in Electronic Dance Music: Why Some People Won’t Dance to Techno

    PubMed Central

    2016-01-01

    The purpose of this study was to explore the relationship between audio descriptors for groove-based electronic dance music (EDM) and raters’ perceived cognitive, affective, and psychomotor responses. From 198 musical excerpts (length: 15 sec.) representing 11 subgenres of EDM, 19 low-level audio feature descriptors were extracted. A principal component analysis of the feature vectors indicated that the musical excerpts could effectively be classified using five complex measures, describing the rhythmical properties of: (a) the high-frequency band, (b) the mid-frequency band, and (c) the low-frequency band, as well as overall fluctuations in (d) dynamics, and (e) timbres. Using these five complex audio measures, four meaningful clusters of the EDM excerpts emerged with distinct musical attributes comprising music with: (a) isochronous bass and static timbres, (b) isochronous bass with fluctuating dynamics and rhythmical variations in the mid-frequency range, (c) non-isochronous bass and fluctuating timbres, and (d) non-isochronous bass with rhythmical variations in the high frequencies. Raters (N = 99) were each asked to respond to four musical excerpts using a four-point Likert-type scale consisting of items representing cognitive (n = 9), affective (n = 9), and psychomotor (n = 3) domains. Musical excerpts falling under the cluster of “non-isochronous bass with rhythmical variations in the high frequencies” demonstrated the overall highest composite scores as evaluated by the raters. Musical samples falling under the cluster of “isochronous bass with static timbres” demonstrated the overall lowest composite scores as evaluated by the raters. Moreover, music preference was shown to significantly affect the systematic patterning of raters’ responses for those with a musical preference for “contemporary” music, “sophisticated” music, and “intense” music. PMID:27798645

  18. There's More to Groove than Bass in Electronic Dance Music: Why Some People Won't Dance to Techno.

    PubMed

    Wesolowski, Brian C; Hofmann, Alex

    2016-01-01

    The purpose of this study was to explore the relationship between audio descriptors for groove-based electronic dance music (EDM) and raters' perceived cognitive, affective, and psychomotor responses. From 198 musical excerpts (length: 15 sec.) representing 11 subgenres of EDM, 19 low-level audio feature descriptors were extracted. A principal component analysis of the feature vectors indicated that the musical excerpts could effectively be classified using five complex measures, describing the rhythmical properties of: (a) the high-frequency band, (b) the mid-frequency band, and (c) the low-frequency band, as well as overall fluctuations in (d) dynamics, and (e) timbres. Using these five complex audio measures, four meaningful clusters of the EDM excerpts emerged with distinct musical attributes comprising music with: (a) isochronous bass and static timbres, (b) isochronous bass with fluctuating dynamics and rhythmical variations in the mid-frequency range, (c) non-isochronous bass and fluctuating timbres, and (d) non-isochronous bass with rhythmical variations in the high frequencies. Raters (N = 99) were each asked to respond to four musical excerpts using a four-point Likert-type scale consisting of items representing cognitive (n = 9), affective (n = 9), and psychomotor (n = 3) domains. Musical excerpts falling under the cluster of "non-isochronous bass with rhythmical variations in the high frequencies" demonstrated the overall highest composite scores as evaluated by the raters. Musical samples falling under the cluster of "isochronous bass with static timbres" demonstrated the overall lowest composite scores as evaluated by the raters. Moreover, music preference was shown to significantly affect the systematic patterning of raters' responses for those with a musical preference for "contemporary" music, "sophisticated" music, and "intense" music.
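
    The shape of this analysis (standardize the 19 descriptors, project onto five principal components, partition into four clusters) can be sketched with scikit-learn; the descriptor extraction itself is omitted and the matrix X is assumed to be precomputed:

    ```python
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    def cluster_edm_excerpts(X):
        """X: (n_excerpts, 19) matrix of low-level audio descriptors."""
        Z = StandardScaler().fit_transform(X)        # zero mean, unit variance
        comp = PCA(n_components=5).fit_transform(Z)  # five complex measures
        clusters = KMeans(n_clusters=4, n_init=10).fit_predict(comp)
        return comp, clusters
    ```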

  19. On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common.

    PubMed

    Weninger, Felix; Eyben, Florian; Schuller, Björn W; Mortillaro, Marcello; Scherer, Klaus R

    2013-01-01

    Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning each of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow's pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of "the sound that something makes," in order to evaluate the system's auditory environment and its own audio output. This article aims at a first step toward a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selection of appropriate descriptors, cross-domain arousal and valence regression is feasible, achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects.
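
    The cross-domain evaluation scheme lends itself to a compact sketch: fit a regressor for arousal or valence on one domain's features and correlate its predictions with the observer annotations of another domain. Ridge regression below is an assumption, not the authors' model:

    ```python
    from scipy.stats import pearsonr
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    def cross_domain_correlation(X_train, y_train, X_test, y_test):
        """Train on one domain (e.g., sound), test on another (e.g., enacted
        speech); return Pearson's r against the observer annotations."""
        model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
        model.fit(X_train, y_train)
        return pearsonr(model.predict(X_test), y_test)[0]
    ```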

  20. Partially supervised speaker clustering.

    PubMed

    Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S

    2012-05-01

    Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment to the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the Euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm: linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used Euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
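
    The advocated switch from Euclidean to cosine distance when clustering GMM mean supervectors can be illustrated with off-the-shelf agglomerative clustering; this is not the LSDA algorithm, and the parameter name is a scikit-learn version-dependent assumption:

    ```python
    from sklearn.cluster import AgglomerativeClustering

    def cluster_supervectors(S, n_speakers, metric="cosine"):
        """S: (n_utterances, d) GMM mean supervectors.
        metric: 'cosine' (advocated above) or 'euclidean' (baseline)."""
        return AgglomerativeClustering(
            n_clusters=n_speakers,
            metric=metric,      # scikit-learn >= 1.2; older versions: affinity
            linkage="average",
        ).fit_predict(S)
    ```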

  1. CD-ROM-aided Databases

    NASA Astrophysics Data System (ADS)

    Shimbori, Susumu

    CD-ROM has recently attracted remarkable attention as a new information medium. In this feature the following points concerning CD-ROM are described: (1) development of CD-ROM from audio CD, (2) advantages and character of CD-ROM compared with printed or online media, (3) the CD-ROM specification by Philips-Sony, (4) hardware and system construction with CD-ROM, and (5) production processes of CD-ROM.

  2. Sound as Affective Design Feature in Multimedia Learning--Benefits and Drawbacks from a Cognitive Load Theory Perspective

    ERIC Educational Resources Information Center

    Königschulte, Anke

    2015-01-01

    The study presented in this paper investigates the potential effects of including non-speech audio such as sound effects into multimedia-based instruction taking into account Sweller's cognitive load theory (Sweller, 2005) and applied frameworks such as the cognitive theory of multimedia learning (Mayer, 2005) and the cognitive affective theory of…

  3. A Pilot Study of a Self-Voicing Computer Program for Prealgebra Math Problems

    ERIC Educational Resources Information Center

    Beal, Carole R.; Rosenblum, L. Penny; Smith, Derrick W.

    2011-01-01

    Fourteen students with visual impairments in Grades 5-12 participated in the field-testing of AnimalWatch-VI-Beta. This computer program delivered 12 prealgebra math problems and hints through a self-voicing audio feature. The students provided feedback about how the computer program can be improved and expanded to make it accessible to all users.…

  4. Multi-Modal Surrogates for Retrieving and Making Sense of Videos: Is Synchronization between the Multiple Modalities Optimal?

    ERIC Educational Resources Information Center

    Song, Yaxiao

    2010-01-01

    Video surrogates can help people quickly make sense of the content of a video before downloading or seeking more detailed information. Visual and audio features of a video are primary information carriers and might become important components of video retrieval and video sense-making. In the past decades, most research and development efforts on…

  5. Listeners' expectation of room acoustical parameters based on visual cues

    NASA Astrophysics Data System (ADS)

    Valente, Daniel L.

    Despite many studies investigating auditory spatial impressions in rooms, few have addressed the impact of simultaneous visual cues on localization and the perception of spaciousness. The current research presents an immersive audio-visual study, in which participants are instructed to make spatial congruency and quantity judgments in dynamic cross-modal environments. The results of these psychophysical tests suggest the importance of consilient audio-visual presentation to the legibility of an auditory scene. Several studies have looked into audio-visual interaction in room perception in recent years, but these studies rely on static images, speech signals, or photographs alone to represent the visual scene. Building on these studies, the aim is to propose a testing method that uses monochromatic compositing (blue-screen technique) to position a studio recording of a musical performance in a number of virtual acoustical environments and ask subjects to assess these environments. In the first experiment of the study, video footage was taken from five rooms varying in physical size from a small studio to a small performance hall. Participants were asked to perceptually align two distinct acoustical parameters (early-to-late reverberant energy ratio and reverberation time) of two solo musical performances in five contrasting visual environments according to their expectations of how the room should sound given its visual appearance. In the second experiment, video footage shot from four different listening positions within a general-purpose space was coupled with sounds derived from measured binaural impulse responses (IRs). The relationship between the presented image, sound, and virtual receiver position was examined. It was found that varying the visual cues altered the perceived events of the acoustic environment. These cues included the visual attributes of the space in which the performance was located as well as the visual attributes of the performer. The visual presentation of the performer included: (1) an actual video of the performance, (2) a surrogate image of the performance, for example a loudspeaker's image reproducing the performance, (3) no visual image of the performance (empty room), or (4) a multi-source visual stimulus (actual video of the performance coupled with two images of loudspeakers positioned to the left and right of the performer). For this experiment, perceived auditory events were measured in terms of two subjective spatial metrics: Listener Envelopment (LEV) and Apparent Source Width (ASW). These metrics were hypothesized to be dependent on the visual imagery of the presented performance. Data were also collected by having participants match direct and reverberant sound levels for the presented audio-visual scenes. In the final experiment, participants judged spatial expectations of an ensemble of musicians presented in the five physical spaces from Experiment 1. Supporting data were accumulated in two stages. First, participants were given an audio-visual matching test, in which they were instructed to align the auditory width of a performing ensemble to a varying set of audio and visual cues. In the second stage, a conjoint analysis design paradigm was explored to extrapolate the relative magnitude of the explored audio-visual factors in affecting three assessed response criteria: Congruency (the perceived match-up of the auditory and visual cues in the assessed performance), ASW and LEV.
Results show that both auditory and visual factors affect the collected responses, and that the two sensory modalities coincide in distinct interactions. This study reveals participant resiliency in the presence of forced auditory-visual mismatch: Participants are able to adjust the acoustic component of the cross-modal environment in a statistically similar way despite randomized starting values for the monitored parameters. Subjective results of the experiments are presented along with objective measurements for verification.

  6. A cyber-physical system for senior collapse detection

    NASA Astrophysics Data System (ADS)

    Grewe, Lynne; Magaña-Zook, Steven

    2014-06-01

    Senior Collapse Detection (SCD) is a system that uses cyber-physical techniques to create a "smart home" system to predict and detect the falling of senior/geriatric participants in home environments. This software application addresses the needs of millions of senior citizens who live at home by themselves and can find themselves in situations where they have fallen and need assistance. We discuss how SCD fuses imagery, depth and audio in a system that does not require the senior to wear any devices, allowing them to be more autonomous. The Microsoft Kinect Sensor is used to collect imagery, depth and audio. We will begin by discussing the physical attributes of the "collapse detection problem". Next, we will discuss the task of feature extraction, resulting in skeleton and joint tracking. Improvements in error detection of joint tracking will be highlighted. Next, we discuss the main module of "fall detection" using our mid-level skeleton features. Attributes including acceleration, position and room environment factor into the SCD fall detection decision. Finally, how a detected fall and the resultant emergency response are handled will be presented. Results in a home environment will be given.
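
    The fall-detection decision itself is not specified above; the toy heuristic below, with invented thresholds, only illustrates how mid-level skeleton features such as head height and vertical velocity might feed such a decision:

    ```python
    import numpy as np

    def detect_fall(head_y, timestamps, low_height=0.5, speed_thresh=1.5,
                    fps=30):
        """Flag a fall when a fast downward head movement (> speed_thresh
        m/s) is followed by ~1 s of the head staying near the floor.
        All thresholds are illustrative, not the SCD system's parameters."""
        head_y = np.asarray(head_y, dtype=float)
        t = np.asarray(timestamps, dtype=float)
        vy = np.gradient(head_y, t)           # vertical velocity (m/s)
        near_floor = head_y < low_height
        for i in np.flatnonzero(vy < -speed_thresh):
            window = near_floor[i:i + fps]    # ~1 s after the drop
            if window.size and window.mean() > 0.8:
                return True, t[i]
        return False, None
    ```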

  7. Action Unit Models of Facial Expression of Emotion in the Presence of Speech

    PubMed Central

    Shah, Miraj; Cooper, David G.; Cao, Houwei; Gur, Ruben C.; Nenkova, Ani; Verma, Ragini

    2014-01-01

    Automatic recognition of emotion using facial expressions in the presence of speech poses a unique challenge because talking reveals clues for the affective state of the speaker but distorts the canonical expression of emotion on the face. We introduce a corpus of acted emotion expression where speech is either present (talking) or absent (silent). The corpus is uniquely suited for analysis of the interplay between the two conditions. We use a multimodal decision level fusion classifier to combine models of emotion from talking and silent faces as well as from audio to recognize five basic emotions: anger, disgust, fear, happy and sad. Our results strongly indicate that emotion prediction in the presence of speech from action unit facial features is less accurate when the person is talking. Modeling talking and silent expressions separately and fusing the two models greatly improves accuracy of prediction in the talking setting. The advantages are most pronounced when silent and talking face models are fused with predictions from audio features. In this multi-modal prediction both the combination of modalities and the separate models of talking and silent facial expression of emotion contribute to the improvement. PMID:25525561
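
    Decision-level fusion of this kind can be sketched generically as a weighted average of per-model class posteriors; the three-way split and the weights below are illustrative, not those learned in the paper:

    ```python
    import numpy as np

    def fuse_decisions(prob_talking, prob_silent, prob_audio,
                       weights=(0.4, 0.3, 0.3)):
        """Each prob_* is an (n_samples, n_classes) posterior array from
        one model; returns the fused predicted class per sample."""
        stacked = np.stack([prob_talking, prob_silent, prob_audio])
        fused = np.tensordot(np.asarray(weights), stacked, axes=1)
        return fused.argmax(axis=1)
    ```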

  8. Musical structure analysis using similarity matrix and dynamic programming

    NASA Astrophysics Data System (ADS)

    Shiu, Yu; Jeong, Hong; Kuo, C.-C. Jay

    2005-10-01

    Automatic music segmentation and structure analysis from audio waveforms based on a three-level hierarchy is examined in this research, where the three-level hierarchy includes notes, measures and parts. The pitch class profile (PCP) feature is first extracted at the note level. Then, a similarity matrix is constructed at the measure level, where a dynamic time warping (DTW) technique is used to enhance the similarity computation by taking the temporal distortion of similar audio segments into account. By processing the similarity matrix, we can obtain a coarse-grain music segmentation result. Finally, dynamic programming is applied to the coarse-grain segments so that a song can be decomposed into several major parts such as intro, verse, chorus, bridge and outro. The performance of the proposed music structure analysis system is demonstrated for pop and rock music.
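
    A simplified version of this analysis, substituting chroma features for PCP and omitting the DTW enhancement, can be sketched as a self-similarity matrix plus a checkerboard-kernel novelty curve whose peaks suggest coarse section boundaries:

    ```python
    import numpy as np
    import librosa  # assumed dependency

    def structure_novelty(path, kernel_size=32):
        """Chroma self-similarity matrix and checkerboard novelty curve."""
        y, sr = librosa.load(path, sr=None)
        chroma = librosa.feature.chroma_stft(y=y, sr=sr)  # PCP-like feature
        C = librosa.util.normalize(chroma, norm=2, axis=0)
        S = C.T @ C                                       # cosine similarity
        half = kernel_size // 2
        edge = np.r_[-np.ones(half), np.ones(half)]
        sign = np.outer(edge, edge)                       # checkerboard kernel
        novelty = np.zeros(S.shape[0])
        for i in range(half, S.shape[0] - half):
            novelty[i] = np.sum(S[i - half:i + half, i - half:i + half] * sign)
        return S, novelty
    ```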

  9. Fault Detection and Diagnosis of Railway Point Machines by Sound Analysis

    PubMed Central

    Lee, Jonguk; Choi, Heesu; Park, Daihee; Chung, Yongwha; Kim, Hee-Young; Yoon, Sukhan

    2016-01-01

    Railway point devices act as actuators that provide different routes to trains by driving switchblades from the current position to the opposite one. Point failure can significantly affect railway operations, with potentially disastrous consequences. Therefore, early detection of anomalies is critical for monitoring and managing the condition of rail infrastructure. We present a data mining solution that utilizes audio data to efficiently detect and diagnose faults in railway condition monitoring systems. The system enables extracting mel-frequency cepstrum coefficients (MFCCs) from audio data with reduced feature dimensions using attribute subset selection, and employs support vector machines (SVMs) for early detection and classification of anomalies. Experimental results show that the system enables cost-effective detection and diagnosis of faults using a cheap microphone, with accuracy exceeding 94.1% whether used alone or in combination with other known methods. PMID:27092509
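
    The described pipeline (MFCC extraction, attribute subset selection, SVM classification) maps naturally onto common Python tooling; univariate F-score ranking below merely approximates the attribute subset selection step, which may differ from the paper's method:

    ```python
    import numpy as np
    import librosa
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    def mfcc_features(path, n_mfcc=20):
        """Clip-level features: mean and std of frame-wise MFCCs."""
        y, sr = librosa.load(path, sr=None)
        m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        return np.concatenate([m.mean(axis=1), m.std(axis=1)])

    def build_detector(k_best=12):
        """Reduced-dimension MFCC features feeding an SVM."""
        return make_pipeline(SelectKBest(f_classif, k=k_best), SVC())

    # Hypothetical usage:
    # X = np.vstack([mfcc_features(p) for p in wav_paths])
    # clf = build_detector().fit(X, labels)  # labels: normal vs fault types
    ```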

  10. Four-Channel Biosignal Analysis and Feature Extraction for Automatic Emotion Recognition

    NASA Astrophysics Data System (ADS)

    Kim, Jonghwa; André, Elisabeth

    This paper investigates the potential of physiological signals as a reliable channel for automatic recognition of a user's emotional state. For emotion recognition, little attention has been paid so far to physiological signals compared with audio-visual emotion channels such as facial expression or speech. All essential stages of an automatic recognition system using biosignals are discussed, from recording the physiological dataset up to feature-based multiclass classification. Four-channel biosensors are used to measure electromyogram, electrocardiogram, skin conductivity and respiration changes. A wide range of physiological features from various analysis domains, including time/frequency, entropy, geometric analysis, subband spectra, multiscale entropy, etc., is proposed in order to search for the best emotion-relevant features and to correlate them with emotional states. The best features extracted are specified in detail and their effectiveness is proven by emotion recognition results.
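
    As one concrete example from the entropy domain mentioned above, the spectral entropy of a biosignal can be computed from its Welch power spectrum; normalization conventions vary, and this is only one of many candidate features:

    ```python
    import numpy as np
    from scipy.signal import welch

    def spectral_entropy(x, fs, nperseg=256):
        """Normalized spectral entropy of a 1-D biosignal:
        near 0 for a pure tone, near 1 for a flat (white) spectrum."""
        _, pxx = welch(x, fs=fs, nperseg=nperseg)
        p = pxx / pxx.sum()
        p = p[p > 0]                          # avoid log(0)
        return float(-(p * np.log(p)).sum() / np.log(len(pxx)))
    ```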

  11. Automated social skills training with audiovisual information.

    PubMed

    Tanaka, Hiroki; Sakti, Sakriani; Neubig, Graham; Negoro, Hideki; Iwasaka, Hidemi; Nakamura, Satoshi

    2016-08-01

    People with social communication difficulties tend to have superior skills using computers, and as a result computer-based social skills training systems are flourishing. Social skills training, performed by human trainers, is a well-established method to obtain appropriate skills in social interaction. Previous works have attempted to automate one or several parts of social skills training through human-computer interaction. However, while previous work on simulating social skills training considered only acoustic and linguistic features, human social skills trainers take into account visual features (e.g. facial expression, posture). In this paper, we create and evaluate a social skills training system that closes this gap by considering audiovisual features regarding ratio of smiling, yaw, and pitch. An experimental evaluation measures the difference in effectiveness of social skill training when using audio features and audiovisual features. Results showed that the visual features were effective to improve users' social skills.

  12. Freedom, Flow and Fairness: Exploring How Children Develop Socially at School through Outdoor Play

    ERIC Educational Resources Information Center

    Waite, Sue; Rogers, Sue; Evans, Julie

    2013-01-01

    In this article, we report on a study that sought to discover micro-level social interactions in fluid outdoor learning spaces. Our methodology was centred around the children; our methods moved with them and captured their social interactions through mobile audio-recording. We argue that our methodological approach supported access to…

  13. Audio direct broadcast satellites

    NASA Technical Reports Server (NTRS)

    Miller, J. E.

    1983-01-01

    Satellite sound broadcasting is, as the name implies, the use of satellite techniques and technology to broadcast directly from space to low-cost, consumer-quality receivers the types of sound programs commonly received in the AM and FM broadcast bands. It would be a ubiquitous service available to the general public in the home, in the car, and out in the open.

  14. Lecture Hall and Learning Design: A Survey of Variables, Parameters, Criteria and Interrelationships for Audio-Visual Presentation Systems and Audience Reception.

    ERIC Educational Resources Information Center

    Justin, J. Karl

    Variables and parameters affecting architectural planning and audiovisual systems selection for lecture halls and other learning spaces are surveyed. Interrelationships of factors are discussed, including--(1) design requirements for modern educational techniques as differentiated from cinema, theater or auditorium design, (2) general hall…

  15. Development of an audio-based virtual gaming environment to assist with navigation skills in the blind.

    PubMed

    Connors, Erin C; Yazzolino, Lindsay A; Sánchez, Jaime; Merabet, Lotfi B

    2013-03-27

    Audio-based Environment Simulator (AbES) is virtual environment software designed to improve real world navigation skills in the blind. Using only audio based cues and set within the context of a video game metaphor, users gather relevant spatial information regarding a building's layout. This allows the user to develop an accurate spatial cognitive map of a large-scale three-dimensional space that can be manipulated for the purposes of a real indoor navigation task. After game play, participants are then assessed on their ability to navigate within the target physical building represented in the game. Preliminary results suggest that early blind users were able to acquire relevant information regarding the spatial layout of a previously unfamiliar building as indexed by their performance on a series of navigation tasks. These tasks included path finding through the virtual and physical building, as well as a series of drop off tasks. We find that the immersive and highly interactive nature of the AbES software appears to greatly engage the blind user to actively explore the virtual environment. Applications of this approach may extend to larger populations of visually impaired individuals.

  16. Comparing perceived auditory width to the visual image of a performing ensemble in contrasting bi-modal environments

    PubMed Central

    Valente, Daniel L.; Braasch, Jonas; Myrbeck, Shane A.

    2012-01-01

    Despite many studies investigating auditory spatial impressions in rooms, few have addressed the impact of simultaneous visual cues on localization and the perception of spaciousness. The current research presents an immersive audiovisual environment in which participants were instructed to make auditory width judgments in dynamic bi-modal settings. The results of these psychophysical tests suggest the importance of congruent audio visual presentation to the ecological interpretation of an auditory scene. Supporting data were accumulated in five rooms of ascending volumes and varying reverberation times. Participants were given an audiovisual matching test in which they were instructed to pan the auditory width of a performing ensemble to a varying set of audio and visual cues in rooms. Results show that both auditory and visual factors affect the collected responses and that the two sensory modalities coincide in distinct interactions. The greatest differences between the panned audio stimuli given a fixed visual width were found in the physical space with the largest volume and the greatest source distance. These results suggest, in this specific instance, a predominance of auditory cues in the spatial analysis of the bi-modal scene. PMID:22280585

  17. Achieving perceptually-accurate aural telepresence

    NASA Astrophysics Data System (ADS)

    Henderson, Paul D.

    Immersive multimedia requires not only realistic visual imagery but also a perceptually-accurate aural experience. A sound field may be presented simultaneously to a listener via a loudspeaker rendering system using the direct sound from acoustic sources as well as a simulation or "auralization" of room acoustics. Beginning with classical Wave-Field Synthesis (WFS), improvements are made to correct for asymmetries in loudspeaker array geometry. Presented is a new Spatially-Equalized WFS (SE-WFS) technique to maintain the energy-time balance of a simulated room by equalizing the reproduced spectrum at the listener for a distribution of possible source angles. Each reproduced source or reflection is filtered according to its incidence angle to the listener. An SE-WFS loudspeaker array of arbitrary geometry reproduces the sound field of a room with correct spectral and temporal balance, compared with classically-processed WFS systems. Localization accuracy of human listeners in SE-WFS sound fields is quantified by psychoacoustical testing. At a loudspeaker spacing of 0.17 m (equivalent to an aliasing cutoff frequency of 1 kHz), SE-WFS exhibits a localization blur of 3 degrees, nearly equal to real point sources. Increasing the loudspeaker spacing to 0.68 m (for a cutoff frequency of 170 Hz) results in a blur of less than 5 degrees. In contrast, stereophonic reproduction is less accurate with a blur of 7 degrees. The ventriloquist effect is psychometrically investigated to determine the effect of an intentional directional incongruence between audio and video stimuli. Subjects were presented with prerecorded full-spectrum speech and motion video of a talker's head as well as broadband noise bursts with a static image. The video image was displaced from the audio stimulus in azimuth by varying amounts, and the perceived auditory location measured. A strong bias was detectable for small angular discrepancies between audio and video stimuli for separations of less than 8 degrees for speech and less than 4 degrees with a pink noise burst. The results allow for the density of WFS systems to be selected from the required localization accuracy. Also, by exploiting the ventriloquist effect, the angular resolution of an audio rendering may be reduced when combined with spatially-accurate video.

  18. A virtual reality browser for Space Station models

    NASA Technical Reports Server (NTRS)

    Goldsby, Michael; Pandya, Abhilash; Aldridge, Ann; Maida, James

    1993-01-01

    The Graphics Analysis Facility at NASA/JSC has created a visualization and learning tool by merging its database of detailed geometric models with a virtual reality system. The system allows an interactive walk-through of models of the Space Station and other structures, providing detailed realistic stereo images. The user can activate audio messages describing the function and connectivity of selected components within his field of view. This paper presents the issues and trade-offs involved in the implementation of the VR system and discusses its suitability for its intended purposes.

  19. Multi-modal gesture recognition using integrated model of motion, audio and video

    NASA Astrophysics Data System (ADS)

    Goutsu, Yusuke; Kobayashi, Takaki; Obara, Junya; Kusajima, Ikuo; Takeichi, Kazunari; Takano, Wataru; Nakamura, Yoshihiko

    2015-07-01

    Gesture recognition is used in many practical applications such as human-robot interaction, medical rehabilitation and sign language. With increasing motion sensor development, multiple data sources have become available, which leads to the rise of multi-modal gesture recognition. Since our previous approach to gesture recognition depends on a unimodal system, it is difficult to classify similar motion patterns. In order to solve this problem, a novel approach which integrates motion, audio and video models is proposed, using a dataset captured by Kinect. The proposed system can recognize observed gestures by using the three models. Recognition results of the three models are integrated by using the proposed framework and the output becomes the final result. The motion and audio models are learned by using Hidden Markov Models. Random Forest, which is the video classifier, is used to learn the video model. In the experiments to test the performance of the proposed system, the motion and audio models most suitable for gesture recognition are chosen by varying feature vectors and learning methods. Additionally, the unimodal and multi-modal models are compared with respect to recognition accuracy. All the experiments are conducted on the dataset provided by the competition organizer of MMGRC, which is a workshop for the Multi-Modal Gesture Recognition Challenge. The comparison results show that the multi-modal model composed of three models scores the highest recognition rate. This improvement in recognition accuracy means that the complementary relationship among the three models improves the accuracy of gesture recognition. The proposed system provides the application technology to understand human actions of daily life more precisely.

  20. Robotics control using isolated word recognition of voice input

    NASA Technical Reports Server (NTRS)

    Weiner, J. M.

    1977-01-01

    A speech input/output system is presented that can be used to communicate with a task-oriented system. Human speech commands and synthesized voice output extend conventional information exchange capabilities between man and machine by utilizing audio input and output channels. The speech input facility is comprised of a hardware feature extractor and a microprocessor-implemented isolated word or phrase recognition system. The recognizer offers a medium-sized (100 commands), syntactically constrained vocabulary, and exhibits close to real-time performance. The major portion of the recognition processing required is accomplished through software, minimizing the complexity of the hardware feature extractor.

  1. Leisure time activities in space: A survey of astronauts and cosmonauts

    NASA Astrophysics Data System (ADS)

    Kelly, Alan D.; Kanas, Nick

    Questionnaires addressing preferences for media and media-generated subjects that could be used to occupy leisure time in space were returned by 54 astronauts and cosmonauts. Ninety-three percent of the respondents had access to records or audio cassettes, and cosmonauts had greater access than astronauts to multiple media. Cosmonauts and long-duration space travelers reported that they missed various media more than their astronaut and short-duration counterparts. Media subjects relating to international events, national events and historical topics were rated as most preferable by all respondents and by several of the respondent groups. The findings are discussed in terms of their relevance for occupying free time during future long-duration manned space missions.

  2. Aerospace Applications Conference, Steamboat Springs, CO, Feb. 1-8, 1986, Digest

    NASA Astrophysics Data System (ADS)

    The present conference considers topics concerning the projected NASA Space Station's systems, digital signal and data processing applications, and space science and microwave applications. Attention is given to Space Station video and audio subsystems design, clock error, jitter, phase error and differential time-of-arrival in satellite communications, automation and robotics in space applications, target insertion into synthetic background scenes, and a novel scheme for the computation of the discrete Fourier transform on a systolic processor. Also discussed are a novel signal parameter measurement system employing digital signal processing, EEPROMS for spacecraft applications, a unique concurrent processor architecture for high speed simulation of dynamic systems, a dual polarization flat plate antenna, Fresnel diffraction, and ultralinear TWTs for high efficiency satellite communications.

  3. Your place or mine: shared sensory experiences elicit a remapping of peripersonal space.

    PubMed

    Maister, Lara; Cardini, Flavia; Zamariola, Giorgia; Serino, Andrea; Tsakiris, Manos

    2015-04-01

    Our perceptual systems integrate multisensory information about objects that are close to our bodies, which allows us to respond quickly and appropriately to potential threats, as well as to act upon and manipulate useful tools. Intriguingly, the representation of this area close to our body, known as the multisensory 'peripersonal space' (PPS), can expand or contract during social interactions. However, it is not yet known how different social interactions can alter the representation of PPS. In particular, shared sensory experiences, such as those elicited by bodily illusions like the enfacement illusion, can induce feelings of ownership over the other's body, which has also been shown to increase the remapping of the other's sensory experiences onto our own bodies. The current study investigated whether such shared sensory experiences between two people, induced by the enfacement illusion, could alter the way PPS was represented, and whether this alteration could be best described as an expansion of one's own PPS towards the other or a remapping of the other's PPS onto one's own. An audio-tactile integration task allowed us to measure the extent of the PPS before and after a shared sensory experience with a confederate. Our results showed a clear increase in audio-tactile integration in the space close to the confederate's body after the shared experience. Importantly, this increase did not extend across the space between participant and confederate, as would be expected if the participant's PPS had expanded. Thus, the pattern of results is more consistent with a partial remapping of the confederate's PPS onto the participant's own PPS. These results have important consequences for our understanding of interpersonal space during different kinds of social interactions.

  4. Engineering a Live UHD Program from the International Space Station

    NASA Technical Reports Server (NTRS)

    Grubbs, Rodney; George, Sandy

    2017-01-01

    The first-ever live downlink of Ultra-High Definition (UHD) video from the International Space Station (ISS) was the highlight of a “Super Session” at the National Association of Broadcasters (NAB) Show in April 2017. Ultra-High Definition is four times the resolution of “full HD” or “1080P” video. Also referred to as “4K”, the Ultra-High Definition video downlink from the ISS all the way to the Las Vegas Convention Center required considerable planning, pushed the limits of conventional video distribution from a spacecraft, and was the first use of High Efficiency Video Coding (HEVC) from a spacecraft. The live event at NAB will serve as a pathfinder for more routine downlinks of UHD as well as use of HEVC for conventional HD downlinks to save bandwidth. A similar demonstration was conducted in 2006 with the Discovery Channel to demonstrate the ability to stream HDTV from the ISS. This paper will describe the overall work flow and routing of the UHD video, how audio was synchronized even though the video and audio were received many seconds apart from each other, and how the demonstration paves the way for not only more efficient video distribution from the ISS, but also serves as a pathfinder for more complex video distribution from deep space. The paper will also describe how a “live” event was staged when the UHD video coming from the ISS had a latency of 10+ seconds. In addition, the paper will touch on the unique collaboration between the inherently governmental aspects of the ISS, commercial partners Amazon and Elemental, and the National Association of Broadcasters.

  5. The plastic ear and perceptual relearning in auditory spatial perception

    PubMed Central

    Carlile, Simon

    2014-01-01

    The auditory system of adult listeners has been shown to accommodate to altered spectral cues to sound location, which presumably provides the basis for recalibration to changes in the shape of the ear over a lifetime. Here we review the role of auditory and non-auditory inputs to the perception of sound location and consider a range of recent experiments looking at the role of non-auditory inputs in the process of accommodation to these altered spectral cues. A number of studies have used small ear molds to modify the spectral cues, resulting in significant degradation in localization performance. Following chronic exposure (10–60 days) performance recovers to some extent, and recent work has demonstrated that this occurs for both audio-visual and audio-only regions of space. This raises the question of what the teacher signal is for this remarkable functional plasticity in the adult nervous system. Following a brief review of the influence of the motor state in auditory localization, we consider the potential role of auditory-motor learning in the perceptual recalibration of the spectral cues. Several recent studies have considered how multi-modal and sensory-motor feedback might influence accommodation to altered spectral cues produced by ear molds or through virtual auditory space stimulation using non-individualized spectral cues. The work with ear molds demonstrates that a relatively short period of training involving audio-motor feedback (5–10 days) significantly improved both the rate and extent of accommodation to altered spectral cues. This has significant implications not only for the mechanisms by which this complex sensory information is encoded to provide spatial cues but also for adaptive training to altered auditory inputs. The review concludes by considering the implications for rehabilitative training with hearing aids and cochlear prostheses. PMID:25147497

  6. Shape Perception and Navigation in Blind Adults

    PubMed Central

    Gori, Monica; Cappagli, Giulia; Baud-Bovy, Gabriel; Finocchietti, Sara

    2017-01-01

    Different sensory systems interact to generate a representation of space and to navigate. Vision plays a critical role in the development of spatial representation. During navigation, vision is integrated with auditory and mobility cues. In blind individuals, visual experience is not available and navigation therefore lacks this important sensory signal. Blind individuals can adopt compensatory mechanisms to improve their spatial and navigation skills. On the other hand, the limitations of these compensatory mechanisms are not completely clear. Both enhanced and impaired reliance on auditory cues in blind individuals have been reported. Here, we develop a new paradigm to test both auditory perception and navigation skills in blind and sighted individuals and to investigate the effect that visual experience has on the ability to reproduce simple and complex paths. During the navigation task, early blind, late blind and sighted individuals were required first to listen to an audio shape and then to recognize and reproduce it by walking. After each audio shape was presented, a static sound was played and the participants were asked to reach it. Movements were recorded with a motion tracking system. Our results show three main impairments specific to early blind individuals: first, a tendency to compress the shapes reproduced during navigation; second, difficulty in recognizing complex audio stimuli; and third, difficulty in reproducing the desired shape; early blind participants occasionally reported perceiving a square but actually reproduced a circle during the navigation task. We discuss these results in terms of compromised spatial reference frames due to lack of visual input during the early period of development. PMID:28144226

  7. The Memory Jog Service

    NASA Astrophysics Data System (ADS)

    Dimakis, Nikolaos; Soldatos, John; Polymenakos, Lazaros; Sturm, Janienke; Neumann, Joachim; Casas, Josep R.

    The CHIL Memory Jog service focuses on facilitating the collaboration of participants in meetings, lectures, presentations, and other human interactive events occurring in indoor CHIL spaces. It exploits the whole set of perceptual components developed by the CHIL Consortium partners (e.g., person tracking, face identification, audio source localization) along with a wide range of actuating devices such as projectors, displays, targeted audio devices, and speakers. The underlying set of perceptual components provides a constant flow of elementary contextual information, such as “person at location x0,y0” or “speech at location x0,y0”, information that alone is not of significant use. However, the CHIL Memory Jog service is accompanied by powerful situation-identification techniques that fuse all the incoming information and create complex states that drive the actuating logic.
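
    To make the fusion step concrete, here is a minimal sketch of how elementary context events might be combined into a complex state by a rule-based situation-identification layer. The event names and the co-occurrence rule are hypothetical illustrations, not the CHIL implementation.

    ```python
    from dataclasses import dataclass

    @dataclass
    class ContextEvent:
        """An elementary observation emitted by a perceptual component."""
        kind: str   # e.g. "person" or "speech"
        x: float    # location estimate
        y: float

    def infer_situation(events, radius=1.0):
        """Toy fusion rule: if a tracked person and detected speech co-occur
        within `radius` metres, infer the complex state 'person speaking'."""
        persons = [e for e in events if e.kind == "person"]
        speech = [e for e in events if e.kind == "speech"]
        for p in persons:
            for s in speech:
                if (p.x - s.x) ** 2 + (p.y - s.y) ** 2 <= radius ** 2:
                    return ("person_speaking", (p.x, p.y))
        return ("idle", None)

    # Two elementary events fuse into one actionable state.
    print(infer_situation([ContextEvent("person", 2.0, 3.0),
                           ContextEvent("speech", 2.3, 3.1)]))
    ```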

  8. Ranking Highlights in Personal Videos by Analyzing Edited Videos.

    PubMed

    Sun, Min; Farhadi, Ali; Chen, Tseng-Hung; Seitz, Steve

    2016-11-01

    We present a fully automatic system for ranking domain-specific highlights in unconstrained personal videos by analyzing online edited videos. A novel latent linear ranking model is proposed to handle noisy training data harvested online. Specifically, given a targeted domain such as "surfing," our system mines the YouTube database to find pairs of raw videos and their corresponding edited videos. Leveraging the assumption that an edited video is more likely to contain highlights than the trimmed parts of the raw video, we obtain pair-wise ranking constraints to train our model. The learning task is challenging due to the amount of noise and variation in the mined data. Hence, a latent loss function is incorporated to mitigate the issues caused by the noise. We efficiently learn the latent model on a large number of videos (about 870 min in total) using a novel EM-like procedure. Our latent ranking model outperforms its classification counterpart and is fairly competitive compared with a fully supervised ranking system that requires labels from Amazon Mechanical Turk. We further show that a state-of-the-art audio feature, mel-frequency cepstral coefficients (MFCC), is inferior to a state-of-the-art visual feature. By combining both audio and visual features, we obtain the best performance in the dog activity, surfing, skating, and viral video domains. Finally, we show that impressive highlights can be detected without additional human supervision for seven domains (i.e., skating, surfing, skiing, gymnastics, parkour, dog activity, and viral video) in unconstrained personal videos.
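
    The core of such a system is the pair-wise ranking constraint: segments kept in the edited video should score higher than segments that were trimmed away. Below is a minimal sketch of a linear ranker trained on such pairs with a hinge loss; the paper's latent loss and EM-like training are more elaborate, and the toy features here are synthetic.

    ```python
    import numpy as np

    def train_pairwise_ranker(pairs, dim, epochs=20, lr=0.01):
        """Learn weights w so that w @ x_pos > w @ x_neg for each training
        pair, using a hinge loss that tolerates some noisy pairs."""
        w = np.zeros(dim)
        for _ in range(epochs):
            for x_pos, x_neg in pairs:
                if w @ x_pos - w @ x_neg < 1.0:   # ranking constraint violated
                    w += lr * (x_pos - x_neg)
        return w

    rng = np.random.default_rng(0)
    # Toy pairs: features of kept (edited) segments are shifted up slightly
    # relative to trimmed segments of the same raw video.
    pairs = [(rng.normal(0.5, 1.0, 8), rng.normal(0.0, 1.0, 8))
             for _ in range(300)]
    w = train_pairwise_ranker(pairs, dim=8)
    # Unseen segments can then be ranked by their score w @ x.
    ```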

  9. A Dynamical Model of Pitch Memory Provides an Improved Basis for Implied Harmony Estimation.

    PubMed

    Kim, Ji Chul

    2017-01-01

    Tonal melody can imply vertical harmony through a sequence of tones. Current methods for automatic chord estimation commonly use chroma-based features extracted from audio signals. However, the implied harmony of unaccompanied melodies can be difficult to estimate on the basis of chroma content in the presence of frequent nonchord tones. Here we present a novel approach to automatic chord estimation based on the human perception of pitch sequences. We use cohesion and inhibition between pitches in auditory short-term memory to differentiate chord tones and nonchord tones in tonal melodies. We model short-term pitch memory as a gradient frequency neural network, which is a biologically realistic model of auditory neural processing. The model is a dynamical system consisting of a network of tonotopically tuned nonlinear oscillators driven by audio signals. The oscillators interact with each other through nonlinear resonance and lateral inhibition, and the pattern of oscillatory traces emerging from the interactions is taken as a measure of pitch salience. We test the model with a collection of unaccompanied tonal melodies to evaluate it as a feature extractor for chord estimation. We show that chord tones are selectively enhanced in the response of the model, thereby increasing the accuracy of implied harmony estimation. We also find that, like other existing features for chord estimation, the performance of the model can be improved by using segmented input signals. We discuss possible ways to expand the present model into a full chord estimation system within the dynamical systems framework.
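
    As an illustration of the oscillator-bank idea, here is a minimal gradient-frequency network: a bank of tonotopically tuned canonical (Hopf-type) oscillators driven by an audio signal and integrated with an exponential-Euler step. The lateral inhibition and cross-oscillator resonance of the full model are omitted, and all parameter values are illustrative.

    ```python
    import numpy as np

    def gfnn_salience(signal, fs, freqs, alpha=-1.0, beta=-1.0):
        """Drive a bank of canonical oscillators, dz/dt = z*(alpha + i*2*pi*f
        + beta*|z|^2) + x(t), and return each oscillator's mean amplitude as
        a crude pitch-salience estimate."""
        z = np.full(len(freqs), 0.01 + 0j)
        dt = 1.0 / fs
        amps = np.zeros(len(freqs))
        for x in signal:
            # Exponential-Euler step: stable because alpha + beta*|z|^2 < 0.
            rate = alpha + 1j * 2 * np.pi * freqs + beta * np.abs(z) ** 2
            z = z * np.exp(dt * rate) + dt * x
            amps += np.abs(z)
        return amps / len(signal)

    fs = 4000
    t = np.arange(0, 0.5, 1 / fs)
    tone = np.sin(2 * np.pi * 220.0 * t)              # a steady A3
    freqs = 110.0 * 2.0 ** np.linspace(0, 2, 49)      # 110-440 Hz, log-spaced
    salience = gfnn_salience(tone, fs, freqs)         # peaks near 220 Hz
    ```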

  10. Personnel in blue and white FCR bldg 30 during STS-106

    NASA Image and Video Library

    2000-09-19

    JSC2000-E-22831 (13 September 2000) --- Astronauts Barbara R. Morgan and Chris A. Hadfield listen to downlinked audio from the Space Shuttle Atlantis at the approximate midway point of the STS-106 mission. The two are working at the Spacecraft Communicator (CAPCOM) console in Houston's Mission Control Center (MCC). Nearby is Bill Reeves at the Flight Director console.

  11. Intentional Voice Command Detection for Trigger-Free Speech Interface

    NASA Astrophysics Data System (ADS)

    Obuchi, Yasunari; Sumiyoshi, Takashi

    In this paper we introduce a new framework of audio processing, which is essential to achieve a trigger-free speech interface for home appliances. If the speech interface works continually in real environments, it must extract occasional voice commands and reject everything else. It is extremely important to reduce the number of false alarms because the number of irrelevant inputs is much larger than the number of voice commands even for heavy users of appliances. The framework, called Intentional Voice Command Detection, is based on voice activity detection, but enhanced by various speech/audio processing techniques such as emotion recognition. The effectiveness of the proposed framework is evaluated using a newly-collected large-scale corpus. The advantages of combining various features were tested and confirmed, and the simple LDA-based classifier demonstrated acceptable performance. The effectiveness of various methods of user adaptation is also discussed.
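
    A minimal sketch of the rejection-oriented classification step follows, using scikit-learn's LDA on synthetic frame-level features. The feature set, class balance, and thresholding rule are illustrative assumptions, not the paper's corpus or tuning.

    ```python
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    # Hypothetical per-utterance feature vectors (e.g. energy, pitch,
    # spectral shape, emotion scores): class 1 = intentional voice command,
    # class 0 = everything else, which dominates in a real home.
    X_cmd = rng.normal(1.0, 1.0, size=(300, 12))
    X_other = rng.normal(0.0, 1.0, size=(3000, 12))
    X = np.vstack([X_cmd, X_other])
    y = np.array([1] * 300 + [0] * 3000)

    clf = LinearDiscriminantAnalysis().fit(X, y)
    scores = clf.decision_function(X)
    # Bias the operating point toward rejection to keep false alarms low:
    # accept only scores above the 99th percentile of non-command scores.
    threshold = np.quantile(scores[y == 0], 0.99)
    accepted = scores > threshold
    ```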

  12. Social Network Extraction and Analysis Based on Multimodal Dyadic Interaction

    PubMed Central

    Escalera, Sergio; Baró, Xavier; Vitrià, Jordi; Radeva, Petia; Raducanu, Bogdan

    2012-01-01

    Social interactions are a very important component in people’s lives. Social network analysis has become a common technique used to model and quantify the properties of social interactions. In this paper, we propose an integrated framework to explore the characteristics of a social network extracted from multimodal dyadic interactions. For our study, we used a set of videos belonging to New York Times’ Blogging Heads opinion blog. The social network is represented as an oriented graph, whose directed links are determined by the Influence Model. The links’ weights are a measure of the “influence” a person has over the other. The states of the Influence Model encode automatically extracted audio/visual features from our videos using state-of-the-art algorithms. Our results are reported in terms of accuracy of audio/visual data fusion for speaker segmentation and centrality measures used to characterize the extracted social network. PMID:22438733
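
    For illustration, here is a small sketch of the final representation: a directed graph whose edge weights stand for estimated influence, summarized with centrality measures. The influence values below are invented placeholders, not outputs of the Influence Model.

    ```python
    import networkx as nx

    # Hypothetical pairwise influence weights that an Influence Model might
    # estimate from audio/visual features of dyadic conversations.
    influence = {("alice", "bob"): 0.7, ("bob", "alice"): 0.3,
                 ("alice", "carol"): 0.6, ("carol", "bob"): 0.4}

    G = nx.DiGraph()
    for (src, dst), w in influence.items():
        G.add_edge(src, dst, weight=w)

    # Centrality measures characterize the extracted social network.
    print(nx.in_degree_centrality(G))
    print(nx.pagerank(G, weight="weight"))
    ```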

  13. Transfer Learning for Improved Audio-Based Human Activity Recognition.

    PubMed

    Ntalampiras, Stavros; Potamitis, Ilyas

    2018-06-25

    Human activities are accompanied by characteristic sound events, the processing of which might provide valuable information for automated human activity recognition. This paper presents a novel approach addressing the case where one or more human activities are associated with limited audio data, resulting in a potentially highly imbalanced dataset. Data augmentation is based on transfer learning; more specifically, the proposed method: (a) identifies the classes which are statistically close to the ones associated with limited data; (b) learns a multiple input, multiple output transformation; and (c) transforms the data of the closest classes so that it can be used for modeling the ones associated with limited data. Furthermore, the proposed framework includes a feature set extracted out of signal representations of diverse domains, i.e., temporal, spectral, and wavelet. Extensive experiments demonstrate the relevance of the proposed data augmentation approach under a variety of generative recognition schemes.
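
    The paper learns a multiple-input, multiple-output transformation between statistically close classes; as a simple stand-in, the sketch below augments a data-poor class by recoloring features from a nearby data-rich class with the target class's Gaussian statistics (a CORAL-style transform). All data are synthetic.

    ```python
    import numpy as np

    def recolor(X_src, src_stats, tgt_stats, eps=1e-6):
        """Whiten source-class features, then re-color them with the target
        class covariance and mean, yielding synthetic target-like samples."""
        d = X_src.shape[1]
        L_src = np.linalg.cholesky(src_stats["cov"] + eps * np.eye(d))
        L_tgt = np.linalg.cholesky(tgt_stats["cov"] + eps * np.eye(d))
        Z = (X_src - src_stats["mu"]) @ np.linalg.inv(L_src).T
        return Z @ L_tgt.T + tgt_stats["mu"]

    def stats(X):
        return {"mu": X.mean(axis=0), "cov": np.cov(X, rowvar=False)}

    rng = np.random.default_rng(0)
    X_rich = rng.normal(0.2, 1.0, size=(500, 6))   # well-represented activity
    X_rare = rng.normal(0.5, 1.3, size=(20, 6))    # activity with little audio
    augmented = recolor(X_rich, stats(X_rich), stats(X_rare))
    ```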

  14. Real-time speech-driven animation of expressive talking faces

    NASA Astrophysics Data System (ADS)

    Liu, Jia; You, Mingyu; Chen, Chun; Song, Mingli

    2011-05-01

    In this paper, we present a real-time facial animation system in which speech drives mouth movements and facial expressions synchronously. Considering five basic emotions, a hierarchical structure with an upper layer of emotion classification is established. Based on the recognized emotion label, the lower-layer classification at the sub-phonemic level models the relationship between frame-level acoustic features and audio labels within phonemes. Using certain constraints, the predicted emotion labels of speech are adjusted to obtain the facial expression labels, which are combined with the sub-phonemic labels. The combinations are mapped into facial action units (FAUs), and audio-visual synchronized animation with mouth movements and facial expressions is generated by morphing between FAUs. The experimental results demonstrate that the two-layer structure succeeds in both emotion and sub-phonemic classification, and the synthesized facial sequences reach a comparatively convincing quality.

  15. Frequency shifting approach towards textual transcription of heartbeat sounds.

    PubMed

    Arvin, Farshad; Doraisamy, Shyamala; Safar Khorasani, Ehsan

    2011-10-04

    Auscultation is an approach for diagnosing many cardiovascular problems. Automatic analysis of heartbeat sounds and extraction of their audio features can assist physicians in diagnosing diseases. Textual transcription allows a continuous heart sound stream to be recorded in a text format, which can be stored in very little memory in comparison with other audio formats. In addition, text-based data allow indexing and searching techniques to be applied to access critical events. Hence, transcribed heartbeat sounds provide useful information for monitoring the behavior of a patient over a long duration of time. This paper proposes a frequency shifting method in order to improve the performance of the transcription. The main objective of this study is to transfer the heartbeat sounds to the music domain. The proposed technique is tested with 100 samples recorded from different heart disease categories. The observed results show that the proposed shifting method significantly improves the performance of the transcription.
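
    One common way to realize such a shift is single-sideband modulation of the analytic signal; the sketch below moves a toy low-frequency heart sound upward by a fixed offset into the musical pitch range. This is an illustrative technique choice, and the paper's exact shifting method may differ.

    ```python
    import numpy as np
    from scipy.signal import hilbert

    def frequency_shift(x, fs, shift_hz):
        """Shift all spectral content of x up by shift_hz via single-sideband
        modulation of the analytic (Hilbert) signal."""
        analytic = hilbert(x)
        t = np.arange(len(x)) / fs
        return np.real(analytic * np.exp(2j * np.pi * shift_hz * t))

    fs = 4000
    t = np.arange(0, 1.0, 1 / fs)
    s1 = np.sin(2 * np.pi * 40 * t) * np.exp(-5 * t)   # toy heart sound burst
    shifted = frequency_shift(s1, fs, 200.0)           # now in musical range
    ```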

  16. MicrobeWorld Radio and Communications Initiative

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barbara Hyde

    2006-11-22

    MicrobeWorld is a 90-second feature broadcast daily on more than 90 public radio stations and available from several sources as a podcast, including www.microbeworld.org. The feature has a strong focus on the use and adaptability of microbes as alternative sources of energy and in bioremediation, on their role in climate, and especially on the many benefits and scientific advances that have resulted from decoding microbial genomes. These audio features are permanently archived on an educational outreach site, microbeworld.org, where they are linked to the National Science Education Standards. They are also being used by instructors at all levels to introduce students to the multiple roles and potential of microbes, including a pilot curriculum program for middle-school students in New York.

  17. Coordination and interpretation of vocal and visible resources: 'trail-off' conjunctions.

    PubMed

    Walker, Gareth

    2012-03-01

    The empirical focus of this paper is a conversational turn-taking phenomenon in which conjunctions produced immediately after a point of possible syntactic and pragmatic completion are treated by co-participants as points of possible completion and transition relevance. The data for this study are audio-video recordings of 5 unscripted face-to-face interactions involving native speakers of US English, yielding 28 'trail-off' conjunctions. Detailed sequential analysis of talk is combined with analysis of visible features (including gaze, posture, gesture and involvement with material objects) and technical phonetic analysis. A range of phonetic and visible features are shown to regularly co-occur in the production of 'trail-off' conjunctions. These features distinguish them from other conjunctions followed by the cessation of talk.

  18. Wavelet-based audio embedding and audio/video compression

    NASA Astrophysics Data System (ADS)

    Mendenhall, Michael J.; Claypoole, Roger L., Jr.

    2001-12-01

    Watermarking, traditionally used for copyright protection, is used in a new and exciting way. An efficient wavelet-based watermarking technique embeds audio information into a video signal. Several effective compression techniques are applied to compress the resulting audio/video signal in an embedded fashion. This wavelet-based compression algorithm incorporates bit-plane coding, index coding, and Huffman coding. To demonstrate the potential of this audio embedding and audio/video compression algorithm, we embed an audio signal into a video signal and then compress. Results show that overall compression rates of 15:1 can be achieved. The video signal is reconstructed with a median PSNR of nearly 33 dB. Finally, the audio signal is extracted from the compressed audio/video signal without error.
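
    As a toy illustration of wavelet-domain embedding, the sketch below hides a few bits in the diagonal detail coefficients of one video frame using quantization index modulation. The paper's scheme, with bit-plane, index, and Huffman coding, is considerably more sophisticated; this code uses the PyWavelets package and invented parameters.

    ```python
    import numpy as np
    import pywt

    def embed_bits(frame, bits, q=8.0):
        """Hide bits (e.g. from an audio stream) in a video frame by
        quantizing diagonal detail coefficients of a 1-level 2-D DWT:
        even multiples of q encode 0, odd multiples encode 1."""
        cA, (cH, cV, cD) = pywt.dwt2(frame.astype(float), "haar")
        flat = cD.ravel()
        for i, b in enumerate(bits):
            k = np.round(flat[i] / q)
            if int(k) % 2 != b:
                k += 1
            flat[i] = k * q
        return pywt.idwt2((cA, (cH, cV, flat.reshape(cD.shape))), "haar")

    frame = np.random.default_rng(0).integers(0, 256, (64, 64))
    bits = [1, 0, 1, 1, 0, 0, 1, 0]      # a few audio bits
    marked = embed_bits(frame, bits)
    # To extract: recompute the DWT and read round(coeff / q) % 2.
    ```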

  19. Three-Dimensional Audio Client Library

    NASA Technical Reports Server (NTRS)

    Rizzi, Stephen A.

    2005-01-01

    The Three-Dimensional Audio Client Library (3DAudio library) is a group of software routines written to facilitate development of both stand-alone (audio only) and immersive virtual-reality application programs that utilize three-dimensional audio displays. The library is intended to enable the development of three-dimensional audio client application programs by use of a code base common to multiple audio server computers. The 3DAudio library calls vendor-specific audio client libraries and currently supports the AuSIM Gold-Server and Lake Huron audio servers. 3DAudio library routines contain common functions for (1) initiation and termination of a client/audio server session, (2) configuration-file input, (3) positioning functions, (4) coordinate transformations, (5) audio transport functions, (6) rendering functions, (7) debugging functions, and (8) event-list-sequencing functions. The 3DAudio software is written in the C++ programming language and currently operates under the Linux, IRIX, and Windows operating systems.

  20. IVA the robot: Design guidelines and lessons learned from the first space station laboratory manipulation system

    NASA Technical Reports Server (NTRS)

    Konkel, Carl R.; Powers, Allen K.; Dewitt, J. Russell

    1991-01-01

    The first interactive Space Station Freedom (SSF) lab robot exhibit was installed at the Space and Rocket Center in Huntsville, AL, and has been running daily since then. IntraVehicular Activity (IVA) the robot is mounted in a full-scale U.S. Lab (USL) mockup to educate the public on possible automation and robotics applications aboard the SSF. Responding to audio and video instructions at the Command Console, exhibit patrons may prompt IVA to perform a housekeeping task or give a speaking tour of the module. Other exemplary space station tasks are simulated, and the public can even challenge IVA to a game of tic-tac-toe. In anticipation of such a system being built for the Space Station, a discussion is provided of the approach taken, along with suggestions for applicability to the Space Station environment.

  1. Audio-frequency magnetotelluric imaging of the Hijima fault, Yamasaki fault system, southwest Japan

    NASA Astrophysics Data System (ADS)

    Yamaguchi, S.; Ogawa, Y.; Fuji-Ta, K.; Ujihara, N.; Inokuchi, H.; Oshiman, N.

    2010-04-01

    An audio-frequency magnetotelluric (AMT) survey was undertaken at ten sites along a transect across the Hijima fault, a major segment of the Yamasaki fault system, Japan. The data were subjected to dimensionality analysis, following which two-dimensional inversions for the TE and TM modes were carried out to produce a resistivity model. The model is characterized by (1) a clear resistivity boundary that coincides with the downward projection of the surface trace of the Hijima fault, (2) a resistive zone (>500 Ω m) that corresponds to Mesozoic sediment, and (3) two highly conductive zones (30-40 Ω m), one shallow and one deep, along the fault. The shallow conductive zone is a common feature of the Yamasaki fault system, whereas the deep conductor is a newly discovered feature at depths of 800-1,800 m to the southwest of the fault. The conductor is truncated by the Hijima fault to the northeast, and its upper boundary is the resistive zone. Both conductors are interpreted to represent a combination of clay minerals and a fluid network within a fault-related fracture zone. In terms of the development of the fluid networks, the fault core of the Hijima fault and the highly resistive zone may play important roles as barriers to fluid flow on the northeast and upper sides of the conductive zones, respectively.

  2. (abstract) Synthesis of Speaker Facial Movements to Match Selected Speech Sequences

    NASA Technical Reports Server (NTRS)

    Scott, Kenneth C.

    1994-01-01

    We are developing a system for synthesizing image sequences that simulate the facial motion of a speaker. To perform this synthesis, we are pursuing two major areas of effort. First, we are developing the computer graphics technology necessary to synthesize a realistic image sequence of a person speaking selected speech sequences. Second, we are developing a model that expresses the relation between spoken phonemes and face/mouth shape. A subject is videotaped speaking an arbitrary text that contains the full list of desired database phonemes. The subject is videotaped from the front speaking normally, recording both audio and video detail simultaneously. Using the audio track, we identify the specific video frames on the tape relating to each spoken phoneme. From this range we digitize the video frame that represents the extreme of mouth motion/shape. Thus, we construct a database of images of face/mouth shape related to spoken phonemes. A selected audio speech sequence is recorded, which is the basis for synthesizing a matching video sequence; the speaker need not be the same as the one used for constructing the database. The audio sequence is analyzed to determine the spoken phoneme sequence and the relative timing of the enunciation of those phonemes. Synthesizing an image sequence corresponding to the spoken phoneme sequence is accomplished using a graphics technique known as morphing. Image-sequence keyframes necessary for this processing are based on the spoken phoneme sequence and timing. We have been successful in synthesizing the facial motion of a native English speaker for a small set of arbitrary speech segments. Our future work will focus on advancement of the face shape/phoneme model and independent control of facial features.

  3. Fiber-channel audio video standard for military and commercial aircraft product lines

    NASA Astrophysics Data System (ADS)

    Keller, Jack E.

    2002-08-01

    Fibre channel is an emerging high-speed digital network technology that continues to make inroads into the avionics arena. The suitability of fibre channel for such applications is largely due to its flexibility in these key areas: Network topologies can be configured in point-to-point, arbitrated-loop or switched-fabric connections. The physical layer supports either copper or fiber-optic implementations with a bit error rate of less than 10⁻¹². Multiple Classes of Service are available. Multiple Upper Level Protocols are supported. Multiple high-speed data rates offer open-ended growth paths, with speed negotiation within a single network. Current speeds supported by commercially available hardware are 1 and 2 Gbps, providing effective data rates of 100 and 200 MBps respectively. Such networks lend themselves well to the transport of digital video and audio data. This paper summarizes an ANSI standard currently in the final approval cycle of the InterNational Committee for Information Technology Standards (INCITS). This standard defines a flexible mechanism whereby digital video, audio and ancillary data are systematically packaged for transport over a fibre channel network. The basic mechanism, called a container, houses audio and video content functionally grouped as elements of the container called objects. Featured in this paper is a specific container mapping called Simple Parametric Digital Video (SPDV), developed particularly to address digital video in avionics systems. SPDV provides pixel-based video with associated ancillary data, typically sourced by various sensors, to be processed and/or distributed in the cockpit for presentation via high-resolution displays. Also highlighted in this paper is a streamlined Upper Level Protocol (ULP) called Frame Header Control Procedure (FHCP), targeted for avionics systems where the functionality of a more complex ULP is not required.

  4. KSC-2012-2706

    NASA Image and Video Library

    2012-05-10

    CAPE CANAVERAL, Fla. – In Orbiter Processing Facility-2 at NASA's Kennedy Space Center in Florida, stowage of a Ku-band antenna at the forward end of space shuttle Endeavour’s payload bay is in progress in preparation for final closure of the shuttle’s payload bay doors. The antenna, which resembles a mini-satellite dish, was used to transmit audio, video and data between the shuttle and ground stations on Earth. Endeavour is being prepared for public display at the California Science Center in Los Angeles. Its ferry flight to California is targeted for mid-September. Endeavour was the last space shuttle added to NASA’s orbiter fleet. Over the course of its 19-year career, Endeavour spent 299 days in space during 25 missions. For more information, visit http://www.nasa.gov/transition. Photo credit: NASA/Cory Huston

  5. KSC-2012-2707

    NASA Image and Video Library

    2012-05-10

    CAPE CANAVERAL, Fla. – In Orbiter Processing Facility-2 at NASA's Kennedy Space Center in Florida, operations are under way to stow a Ku-band antenna in space shuttle Endeavour’s payload bay in preparation for final closure of the shuttle’s payload bay doors. The antenna, which resembles a mini-satellite dish, was used to transmit audio, video and data between the shuttle and ground stations on Earth. Endeavour is being prepared for public display at the California Science Center in Los Angeles. Its ferry flight to California is targeted for mid-September. Endeavour was the last space shuttle added to NASA’s orbiter fleet. Over the course of its 19-year career, Endeavour spent 299 days in space during 25 missions. For more information, visit http://www.nasa.gov/transition. Photo credit: NASA/Cory Huston

  6. KSC-2012-2709

    NASA Image and Video Library

    2012-05-10

    CAPE CANAVERAL, Fla. – In Orbiter Processing Facility-2 at NASA's Kennedy Space Center in Florida, a Ku-band antenna is being stowed in space shuttle Endeavour’s payload bay in preparation for final closure of the shuttle’s payload bay doors. The antenna, which resembles a mini-satellite dish, was used to transmit audio, video and data between the shuttle and ground stations on Earth. Endeavour is being prepared for public display at the California Science Center in Los Angeles. Its ferry flight to California is targeted for mid-September. Endeavour was the last space shuttle added to NASA’s orbiter fleet. Over the course of its 19-year career, Endeavour spent 299 days in space during 25 missions. For more information, visit http://www.nasa.gov/transition. Photo credit: NASA/Cory Huston

  7. KSC-2012-2712

    NASA Image and Video Library

    2012-05-10

    CAPE CANAVERAL, Fla. – In Orbiter Processing Facility-2 at NASA's Kennedy Space Center in Florida, stowage of a Ku-band antenna in space shuttle Endeavour’s payload bay is under way in preparation for final closure of the shuttle’s payload bay doors. The antenna, which resembles a mini-satellite dish, was used to transmit audio, video and data between the shuttle and ground stations on Earth. Endeavour is being prepared for public display at the California Science Center in Los Angeles. Its ferry flight to California is targeted for mid-September. Endeavour was the last space shuttle added to NASA’s orbiter fleet. Over the course of its 19-year career, Endeavour spent 299 days in space during 25 missions. For more information, visit http://www.nasa.gov/transition. Photo credit: NASA/Cory Huston

  8. KSC-2012-2711

    NASA Image and Video Library

    2012-05-10

    CAPE CANAVERAL, Fla. – In Orbiter Processing Facility-2 at NASA's Kennedy Space Center in Florida, stowage of a Ku-band antenna in space shuttle Endeavour’s payload bay is under way in preparation for final closure of the shuttle’s payload bay doors. The antenna, which resembles a mini-satellite dish, was used to transmit audio, video and data between the shuttle and ground stations on Earth. Endeavour is being prepared for public display at the California Science Center in Los Angeles. Its ferry flight to California is targeted for mid-September. Endeavour was the last space shuttle added to NASA’s orbiter fleet. Over the course of its 19-year career, Endeavour spent 299 days in space during 25 missions. For more information, visit http://www.nasa.gov/transition. Photo credit: NASA/Cory Huston

  9. KSC-2012-2715

    NASA Image and Video Library

    2012-05-10

    CAPE CANAVERAL, Fla. – In Orbiter Processing Facility-2 at NASA's Kennedy Space Center in Florida, a Ku-band antenna is stowed at the forward end of space shuttle Endeavour’s payload bay in preparation for final closure of the shuttle’s payload bay doors. The antenna, which resembles a mini-satellite dish, was used to transmit audio, video and data between the shuttle and ground stations on Earth. Endeavour is being prepared for public display at the California Science Center in Los Angeles. Its ferry flight to California is targeted for mid-September. Endeavour was the last space shuttle added to NASA’s orbiter fleet. Over the course of its 19-year career, Endeavour spent 299 days in space during 25 missions. For more information, visit http://www.nasa.gov/transition. Photo credit: NASA/Cory Huston

  10. KSC-2012-2714

    NASA Image and Video Library

    2012-05-10

    CAPE CANAVERAL, Fla. – In Orbiter Processing Facility-2 at NASA's Kennedy Space Center in Florida, a Ku-band antenna is stowed at the forward end of space shuttle Endeavour’s payload bay in preparation for final closure of the shuttle’s payload bay doors. The antenna, which resembles a mini-satellite dish, was used to transmit audio, video and data between the shuttle and ground stations on Earth. Endeavour is being prepared for public display at the California Science Center in Los Angeles. Its ferry flight to California is targeted for mid-September. Endeavour was the last space shuttle added to NASA’s orbiter fleet. Over the course of its 19-year career, Endeavour spent 299 days in space during 25 missions. For more information, visit http://www.nasa.gov/transition. Photo credit: NASA/Cory Huston

  11. KSC-2012-2713

    NASA Image and Video Library

    2012-05-10

    CAPE CANAVERAL, Fla. – In Orbiter Processing Facility-2 at NASA's Kennedy Space Center in Florida, a Ku-band antenna is stowed in space shuttle Endeavour’s payload bay in preparation for final closure of the shuttle’s payload bay doors. The antenna, which resembles a mini-satellite dish, was used to transmit audio, video and data between the shuttle and ground stations on Earth. Endeavour is being prepared for public display at the California Science Center in Los Angeles. Its ferry flight to California is targeted for mid-September. Endeavour was the last space shuttle added to NASA’s orbiter fleet. Over the course of its 19-year career, Endeavour spent 299 days in space during 25 missions. For more information, visit http://www.nasa.gov/transition. Photo credit: NASA/Cory Huston

  12. KSC-2012-2710

    NASA Image and Video Library

    2012-05-10

    CAPE CANAVERAL, Fla. – In Orbiter Processing Facility-2 at NASA's Kennedy Space Center in Florida, stowage of a Ku-band antenna in space shuttle Endeavour’s payload bay is under way in preparation for final closure of the shuttle’s payload bay doors. The antenna, which resembles a mini-satellite dish, was used to transmit audio, video and data between the shuttle and ground stations on Earth. Endeavour is being prepared for public display at the California Science Center in Los Angeles. Its ferry flight to California is targeted for mid-September. Endeavour was the last space shuttle added to NASA’s orbiter fleet. Over the course of its 19-year career, Endeavour spent 299 days in space during 25 missions. For more information, visit http://www.nasa.gov/transition. Photo credit: NASA/Cory Huston

  13. KSC-2012-2708

    NASA Image and Video Library

    2012-05-10

    CAPE CANAVERAL, Fla. – In Orbiter Processing Facility-2 at NASA's Kennedy Space Center in Florida, a Ku-band antenna is being stowed in space shuttle Endeavour’s payload bay in preparation for final closure of the shuttle’s payload bay doors. The antenna, which resembles a mini-satellite dish, was used to transmit audio, video and data between the shuttle and ground stations on Earth. Endeavour is being prepared for public display at the California Science Center in Los Angeles. Its ferry flight to California is targeted for mid-September. Endeavour was the last space shuttle added to NASA’s orbiter fleet. Over the course of its 19-year career, Endeavour spent 299 days in space during 25 missions. For more information, visit http://www.nasa.gov/transition. Photo credit: NASA/Cory Huston

  14. Comparing Deaf and Hearing Dutch Infants: Changes in the Vowel Space in the First 2 Years

    ERIC Educational Resources Information Center

    van der Stelt, Jeannette M.; Wempe, Ton G.; Pols, Louis C. W.

    2008-01-01

    The influence of the mother tongue on vowel productions in infancy is different for deaf and hearing babies. Audio material of five hearing and five deaf infants acquiring Dutch was collected monthly from month 5-18, and at 24 months. Fifty unlabelled utterances were digitized for each recording. This study focused on developmental paths in vowel…

  15. First Facility Utilization Manual. A Teachers Guide to the Use of the FLNT Elementary School. Fort Lincoln New Town Education System.

    ERIC Educational Resources Information Center

    General Learning Corp., Washington, DC.

    This guide endeavors to teach the faculty how to manipulate the structure of the new facility in the most creative way. The first chapters discuss the interior design, graphic considerations within the facility, materials and equipment suited for open space schools, and recommended audio-systems. Later chapters cover the exterior facilities, such…

  16. Foale during telecon in the U.S. Lab during Expedition 8

    NASA Image and Video Library

    2003-12-28

    ISS008-E-10745 (28 December 2003) --- Astronaut C. Michael Foale, Expedition 8 mission commander and NASA ISS science officer, conducts a teleconference with the Moscow Support Group for the Russian New Year celebration, via Ku- and S-band, with audio and video relayed to the Mission Control Center (MCC) at Johnson Space Center (JSC). Holiday decorations are visible in the background.

  17. Spacecraft transmitter reliability

    NASA Technical Reports Server (NTRS)

    1980-01-01

    A workshop on spacecraft transmitter reliability was held at the NASA Lewis Research Center on September 25 and 26, 1979, to discuss present knowledge and to plan future research areas. Since formal papers were not submitted, this synopsis was derived from audio tapes of the workshop. The following subjects were covered: users' experience with space transmitters; cathodes; power supplies and interfaces; and specifications and quality assurance. A panel discussion ended the workshop.

  18. Ad Hoc Selection of Voice over Internet Streams

    NASA Technical Reports Server (NTRS)

    Macha, Mitchell G. (Inventor); Bullock, John T. (Inventor)

    2014-01-01

    A method and apparatus for a communication system technique involving ad hoc selection of at least two audio streams is provided. Each of the at least two audio streams is a packetized version of an audio source. A data connection exists between a server and a client where a transport protocol actively propagates the at least two audio streams from the server to the client. Furthermore, software instructions executable on the client indicate a presence of the at least two audio streams, allow selection of at least one of the at least two audio streams, and direct the selected at least one of the at least two audio streams for audio playback.

  19. Ad Hoc Selection of Voice over Internet Streams

    NASA Technical Reports Server (NTRS)

    Macha, Mitchell G. (Inventor); Bullock, John T. (Inventor)

    2008-01-01

    A method and apparatus for a communication system technique involving ad hoc selection of at least two audio streams is provided. Each of the at least two audio streams is a packetized version of an audio source. A data connection exists between a server and a client where a transport protocol actively propagates the at least two audio streams from the server to the client. Furthermore, software instructions executable on the client indicate a presence of the at least two audio streams, allow selection of at least one of the at least two audio streams, and direct the selected at least one of the at least two audio streams for audio playback.

  20. Clinical characterization and etiology of space motion sickness

    NASA Technical Reports Server (NTRS)

    Thornton, William E.; Moore, Thomas P.; Pool, Sam L.; Vanderploeg, James

    1987-01-01

    An inflight, clinically oriented investigation of space motion sickness (SMS) was begun on STS-4 and revealed the following: compared to motion sickness (MS) on Earth, autonomic signs are significantly different in SMS, in that sweating is not present, pallor or flushing may be present, and vomiting is episodic, sudden, and brief. Postflight there is a period of resistance to all forms of MS. There is some evidence for individual reduction in sensitivity on repeated flights. Electrooculogram, audio-evoked potentials, measurement of fluid shifts, and other studies are inconsistent with a transient vestibular hydrops or increased intracranial pressure as a cause.

  1. A Synthetic Quadrature Phase Detector/Demodulator for Fourier Transform Spectrometers

    NASA Technical Reports Server (NTRS)

    Campbell, Joel

    2008-01-01

    A method is developed to demodulate (velocity-correct) Fourier transform spectrometer (FTS) data taken with an analog-to-digital converter that samples at equal intervals in time. This method makes it possible to use simple, low-cost, high-resolution audio digitizers to record high-quality data without the need for an event timer or quadrature laser hardware, and makes it possible to use a metrology laser of any wavelength. The reduced parts count and simplicity of implementation make it an attractive alternative for space-based applications when compared to previous methods such as the Brault algorithm.
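
    A minimal sketch of the underlying idea, under the assumption that the synthetic quadrature is formed with a Hilbert transform of the metrology fringe channel: the fringe's instantaneous phase tracks optical path difference versus time, and the science channel is then re-interpolated onto uniform path increments. The signal parameters below are invented.

    ```python
    import numpy as np
    from scipy.signal import hilbert

    def velocity_correct(science, fringe, samples_per_fringe=2):
        """Resample a time-sampled interferogram onto equal optical-path
        increments: recover the metrology fringe's instantaneous phase with
        a synthetic (Hilbert) quadrature, then interpolate the science
        channel at uniform phase steps."""
        phase = np.unwrap(np.angle(hilbert(fringe - fringe.mean())))
        step = 2 * np.pi / samples_per_fringe
        grid = np.arange(phase[0], phase[-1], step)
        return np.interp(grid, phase, science)

    fs = 10_000
    t = np.arange(0, 1.0, 1 / fs)
    x = t + 0.05 * np.sin(2 * np.pi * 1.3 * t)    # non-uniform mirror travel
    fringe = np.cos(2 * np.pi * 80 * x)           # metrology laser fringes
    science = np.cos(2 * np.pi * 25 * x)          # one spectral line
    corrected = velocity_correct(science, fringe) # uniform in x, not in t
    ```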

  2. Audio in Courseware: Design Knowledge Issues.

    ERIC Educational Resources Information Center

    Aarntzen, Diana

    1993-01-01

    Considers issues that need to be addressed when incorporating audio in courseware design. Topics discussed include functions of audio in courseware; the relationship between auditive and visual information; learner characteristics in relation to audio; events of instruction; and audio characteristics, including interactivity and speech technology.…

  3. A Virtual Audio Guidance and Alert System for Commercial Aircraft Operations

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.; Shrum, Richard; Miller, Joel; Null, Cynthia H. (Technical Monitor)

    1996-01-01

    Our work in virtual reality systems at NASA Ames Research Center includes the area of aurally-guided visual search, using specially-designed audio cues and spatial audio processing (also known as virtual or "3-D audio") techniques (Begault, 1994). Previous studies at Ames had revealed that use of 3-D audio for Traffic Collision Avoidance System (TCAS) advisories significantly reduced head-down time, compared to a head-down map display (0.5 sec advantage) or no display at all (2.2 sec advantage) (Begault, 1993, 1995; Begault & Pittman, 1994; see Wenzel, 1994, for an audio demo). Since the crew must keep their head up and looking out the window as much as possible when taxiing under low-visibility conditions, and the potential for "blunder" is increased under such conditions, it was sensible to evaluate the audio spatial cueing for a prototype audio ground collision avoidance warning (GCAW) system, and a 3-D audio guidance system. Results were favorable for GCAW, but not for the audio guidance system.

  4. The priming function of in-car audio instruction.

    PubMed

    Keyes, Helen; Whitmore, Antony; Naneva, Stanislava; McDermott, Daragh

    2018-05-01

    Studies to date have focused on the priming power of visual road signs, but not the priming potential of audio road-scene instruction. Here, the relative priming power of visual, audio, and multisensory road-scene instructions was assessed. In a lab-based study, participants responded to target road-scene turns following visual, audio, or multisensory road-turn primes that were congruent or incongruent with the targets in direction, or control primes. All types of instruction (visual, audio, and multisensory) were successful in priming responses to a road scene. Responses to multisensory-primed targets (both audio and visual) were faster than responses to either audio or visual primes alone. Incongruent audio primes did not affect performance negatively in the manner of incongruent visual or multisensory primes. Results suggest that audio instructions have the potential to prime drivers to respond quickly and safely to their road environment. Peak performance will be observed if audio and visual road instruction primes can be timed to co-occur.

  5. Audio-visual interactions in environment assessment.

    PubMed

    Preis, Anna; Kociński, Jędrzej; Hafke-Dys, Honorata; Wrzosek, Małgorzata

    2015-08-01

    The aim of the study was to examine how visual and audio information influences audio-visual environment assessment. Original audio-visual recordings were made at seven different places in the city of Poznań. Participants in the psychophysical experiments were asked to rate, on a numerical standardized scale, the degree of comfort they would feel if they were in such an environment. The assessments of audio-visual comfort were carried out in a laboratory in four different conditions: (a) audio samples only, (b) original audio-visual samples, (c) video samples only, and (d) mixed audio-visual samples. The general results of this experiment showed a significant difference between the investigated conditions, but not for all the investigated samples. When conditions (a) and (b) were compared, there was a significant improvement in comfort assessment when visual information was added, but in only three out of seven cases. On the other hand, the results show that the comfort assessment of audio-visual samples could be changed by manipulating the audio rather than the video part of the audio-visual sample. Finally, it seems that people differentiate audio-visual representations of a given place in the environment based on the composition of sound sources rather than on the sound level. Object identification is responsible for both landscape and soundscape grouping. Copyright © 2015. Published by Elsevier B.V.

  6. Spatial filtering of audible sound with acoustic landscapes

    NASA Astrophysics Data System (ADS)

    Wang, Shuping; Tao, Jiancheng; Qiu, Xiaojun; Cheng, Jianchun

    2017-07-01

    Acoustic metasurfaces manipulate waves with specially designed structures and achieve properties that natural materials cannot offer. Similar surfaces work in the audio frequency range as well and lead to marvelous acoustic phenomena that can be perceived by human ears. Intrigued by the famous Maoshan Bugle phenomenon, we investigate large-scale metasurfaces consisting of periodic steps of sizes comparable to the wavelength of audio frequencies, in both the time and space domains. We propose a theoretical method to calculate the scattered sound field and find that periodic corrugated surfaces work as spatial filters, and that the frequency-selective character can only be observed on the same side as the incident wave. The Maoshan Bugle phenomenon can be well explained with this method. Finally, we demonstrate that the proposed method can be used to design acoustic landscapes that transform impulsive sound into famous trumpet solos or other melodious sounds.

  7. 47 CFR 73.403 - Digital audio broadcasting service requirements.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 47 Telecommunication 4 2012-10-01 2012-10-01 false Digital audio broadcasting service requirements... SERVICES RADIO BROADCAST SERVICES Digital Audio Broadcasting § 73.403 Digital audio broadcasting service requirements. (a) Broadcast radio stations using IBOC must transmit at least one over-the-air digital audio...

  8. 47 CFR 73.403 - Digital audio broadcasting service requirements.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 47 Telecommunication 4 2011-10-01 2011-10-01 false Digital audio broadcasting service requirements... SERVICES RADIO BROADCAST SERVICES Digital Audio Broadcasting § 73.403 Digital audio broadcasting service requirements. (a) Broadcast radio stations using IBOC must transmit at least one over-the-air digital audio...

  9. 47 CFR 73.403 - Digital audio broadcasting service requirements.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 47 Telecommunication 4 2014-10-01 2014-10-01 false Digital audio broadcasting service requirements... SERVICES RADIO BROADCAST SERVICES Digital Audio Broadcasting § 73.403 Digital audio broadcasting service requirements. (a) Broadcast radio stations using IBOC must transmit at least one over-the-air digital audio...

  10. 47 CFR 73.403 - Digital audio broadcasting service requirements.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 47 Telecommunication 4 2013-10-01 2013-10-01 false Digital audio broadcasting service requirements... SERVICES RADIO BROADCAST SERVICES Digital Audio Broadcasting § 73.403 Digital audio broadcasting service requirements. (a) Broadcast radio stations using IBOC must transmit at least one over-the-air digital audio...

  11. [Intermodal timing cues for audio-visual speech recognition].

    PubMed

    Hashimoto, Masahiro; Kumashiro, Masaharu

    2004-06-01

    The purpose of this study was to investigate the limitations of the lip-reading advantage for Japanese young adults by desynchronizing visual and auditory information in speech. In the experiment, audio-visual speech stimuli were presented under six test conditions: audio-alone, and audio-visual with either 0, 60, 120, 240 or 480 ms of audio delay. The stimuli were video recordings of the face of a female Japanese speaker producing long and short Japanese sentences. The intelligibility of the audio-visual stimuli was measured as a function of audio delay in sixteen untrained young subjects. Speech intelligibility under audio-delay conditions of less than 120 ms was significantly better than under the audio-alone condition. On the other hand, the delay of 120 ms corresponded to the mean mora duration measured for the audio stimuli. The results implied that audio delays of up to 120 ms would not disrupt the lip-reading advantage, because visual and auditory information in speech seem to be integrated on a syllabic time scale. Potential applications of this research include noisy workplaces in which a worker must extract relevant speech from all the other competing noises.

  12. The Development of a Web-Based Urban Soundscape Evaluation System

    NASA Astrophysics Data System (ADS)

    Sudarsono, A. S.; Sarwono, J.

    2018-05-01

    Acoustic quality is one of the important aspects of urban design. It is usually evaluated based on how loud the urban environment is. However, this approach does not consider people’s perception of the urban acoustic environment. Therefore, a different method has been developed based on the perception of the acoustic environment using the concept of soundscape. Soundscape is defined as the acoustic environment perceived by people who are part of the environment. This approach considers the relationship between the sound source, the environment, and the people. The analysis of soundscape considers many aspects such as cultural aspects, people’s expectations, people’s experience of space, and social aspects. Soundscape affects many aspects of human life such as culture, health, and quality of life. Urban soundscape management and planning must be integrated with the other aspects of urban design, both in the design and in the improvement stages. The soundscape concept seeks to make the acoustic environment as pleasant as possible in a space with or without uncomfortable sound sources. Soundscape planning includes the design of physical features to achieve a positive perceptual outcome. It is vital to gather data regarding the relationship between humans and the components of a soundscape, e.g., sound sources, features of the physical environment, the functions of a space, and expectations about the sound sources. These data can be measured and gathered using several soundscape evaluation methods. Soundscape evaluation is usually conducted using in-situ surveys and laboratory experiments with a multi-speaker system. Although these methods have been validated and are widely used in soundscape analysis, there are some limitations in their application. The in-situ survey has to be completed in a single session with many people at the same time, because it is hard to replicate the acoustic environment. Conversely, the laboratory experiment has no problem with repetition, but it requires a room with a multi-speaker reproduction system. This project used a different method of soundscape analysis, developed around headphone reproduction via the internet. The internet system for data gathering has been established: a website can reproduce high-quality audio and provides a system for designing online questionnaires. Furthermore, the development of virtual reality systems allows the reproduction of virtual audio-visual stimuli on a website. Although the website provides an established system to gather the required data, the remaining problem is the validation of the reproduction system for soundscape analysis, which needs to consider several factors: a suitable recording system, the effect of headphone variation, the calibration of the system, and the perceptual results obtained from internet-based reproduction of the acoustic environment. This study aims to develop and validate a web-based urban soundscape evaluation method. With this method, the experiment can be repeated easily and data can be gathered from many respondents. Furthermore, the simplicity of the system allows stakeholders in urban design to apply it. The data gathered from this system are important for the design of urban areas with consideration of acoustic aspects.

  13. The power of digital audio in interactive instruction: An unexploited medium

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pratt, J.; Trainor, M.

    1989-01-01

    Widespread use of audio in computer-based training (CBT) occurred with the advent of the interactive videodisc technology. This paper discusses the alternative of digital audio, which, unlike videodisc audio, enables one to rapidly revise the audio used in the CBT and which may be used in nonvideo CBT applications as well. We also discuss techniques used in audio script writing, editing, and production. Results from evaluations indicate a high degree of user satisfaction. 4 refs.

  14. A Sieving ANN for Emotion-Based Movie Clip Classification

    NASA Astrophysics Data System (ADS)

    Watanapa, Saowaluk C.; Thipakorn, Bundit; Charoenkitkarn, Nipon

    Effective classification and analysis of semantic content are very important for content-based indexing and retrieval in video databases. Our research attempts to classify movie clips into three groups of commonly elicited emotions, namely excitement, joy and sadness, based on a set of abstract-level semantic features extracted from the film sequence. In particular, these features consist of six visual and audio measures grounded in artistic film theory. A unique sieving-structured neural network is proposed as the classifying model due to its robustness. The performance of the proposed model is tested with 101 movie clips excerpted from 24 award-winning and well-known Hollywood feature films. The experimental result of 97.8% correct classification rate, measured against collected human judgments, indicates the great potential of using abstract-level semantic features as an engineered tool for video-content retrieval and indexing applications.

  15. Video2vec Embeddings Recognize Events When Examples Are Scarce.

    PubMed

    Habibian, Amirhossein; Mensink, Thomas; Snoek, Cees G M

    2017-10-01

    This paper aims for event recognition when video examples are scarce or even completely absent. The key in such a challenging setting is a semantic video representation. Rather than building the representation from individual attribute detectors and their annotations, we propose to learn the entire representation from freely available web videos and their descriptions using an embedding between video features and term vectors. In our proposed embedding, which we call Video2vec, the correlations between the words are utilized to learn a more effective representation by optimizing a joint objective balancing descriptiveness and predictability. We show how learning the Video2vec embedding using a multimodal predictability loss, including appearance, motion and audio features, results in a better predictable representation. We also propose an event specific variant of Video2vec to learn a more accurate representation for the words, which are indicative of the event, by introducing a term sensitive descriptiveness loss. Our experiments on three challenging collections of web videos from the NIST TRECVID Multimedia Event Detection and Columbia Consumer Videos datasets demonstrate: i) the advantages of Video2vec over representations using attributes or alternative embeddings, ii) the benefit of fusing video modalities by an embedding over common strategies, iii) the complementarity of term sensitive descriptiveness and multimodal predictability for event recognition. By its ability to improve predictability of present day audio-visual video features, while at the same time maximizing their semantic descriptiveness, Video2vec leads to state-of-the-art accuracy for both few- and zero-example recognition of events in video.
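
    To make the embedding idea concrete, here is a highly simplified sketch: a ridge-regression map from fused audio-visual video features into a term-vector space, used for zero-example event scoring by cosine similarity. Video2vec's joint descriptiveness/predictability objective and its term-sensitive variant are richer than this, and all data below are synthetic placeholders.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    V = rng.normal(size=(1000, 256))   # fused audio-visual video features
    T = rng.normal(size=(1000, 100))   # term vectors of video descriptions

    # Ridge-regression embedding from video features to term space (the
    # "predictability" side of the objective only).
    lam = 1.0
    W = np.linalg.solve(V.T @ V + lam * np.eye(256), V.T @ T)

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Zero-example recognition: score a new video against an event's term
    # vector (e.g. built from the words describing "surfing").
    event_term_vec = rng.normal(size=100)
    score = cosine(V[0] @ W, event_term_vec)
    ```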

  16. 47 CFR 11.51 - EAS code and Attention Signal Transmission requirements.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... Message (EOM) codes using the EAS Protocol. The Attention Signal must precede any emergency audio message... audio messages. No Attention Signal is required for EAS messages that do not contain audio programming... EAS messages in the main audio channel. All DAB stations shall also transmit EAS messages on all audio...

  17. 47 CFR 11.51 - EAS code and Attention Signal Transmission requirements.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... Message (EOM) codes using the EAS Protocol. The Attention Signal must precede any emergency audio message... audio messages. No Attention Signal is required for EAS messages that do not contain audio programming... EAS messages in the main audio channel. All DAB stations shall also transmit EAS messages on all audio...

  18. 47 CFR 11.51 - EAS code and Attention Signal Transmission requirements.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... Message (EOM) codes using the EAS Protocol. The Attention Signal must precede any emergency audio message... audio messages. No Attention Signal is required for EAS messages that do not contain audio programming... EAS messages in the main audio channel. All DAB stations shall also transmit EAS messages on all audio...

  19. Communicative Competence in Audio Classrooms: A Position Paper for the CADE 1991 Conference.

    ERIC Educational Resources Information Center

    Burge, Liz

    Classroom practitioners need to move their attention away from the technological and logistical competencies required for audio conferencing (AC) to the required communicative competencies in order to advance their skills in handling the psychodynamics of audio virtual classrooms which include audio alone and audio with graphics. While the…

  20. The Audio Description as a Physics Teaching Tool

    ERIC Educational Resources Information Center

    Cozendey, Sabrina; Costa, Maria da Piedade

    2016-01-01

    This study analyses the use of audio description in teaching physics concepts, aiming to determine the variables that influence the understanding of the concept. One educational resource was audio described; to make the audio description, the screen was frozen. The video, with and without audio description, was to be presented to students, so that…

  1. Magnetic Resonance and Spectroscopy of the Human Brain in Gulf War Illness

    DTIC Science & Technology

    2005-08-01

    relationship between GWI and stress. Acoustic startle is a hallmark feature of PTSD. Past studies have shown that PTSD subjects have an increased startle...brain, neuropsychological testing, audio vestibular testing, PTSD...such as PTSD, depression, or alcohol abuse. 2) Reduced NAA in the basal ganglia and pons correlates with central nervous system signs and symptoms of...

  2. Alternative Audio Solution to Enhance Immersion in Deployable Synthetic Environments

    DTIC Science & Technology

    2003-09-01

    sense of presence. For example, the musical score of a movie increases the viewers’ emotional involvement in a cinematic feature. The character...photo-realistic way can make mental immersion difficult, because any flaw in the realism will spoil the effect [SHER 03].” One way to overcome spoiling...the visual realism is to reinforce visual clues with those from other modalities. Aural displays can be...

  3. Two Stage Data Augmentation for Low Resourced Speech Recognition (Author’s Manuscript)

    DTIC Science & Technology

    2016-09-12

    speech recognition, deep neural networks, data augmentation...When training data is limited—whether it be audio or text—the obvious...Schwartz, and S. Tsakalidis, “Enhancing low resource keyword spotting with automatically retrieved web documents,” in Interspeech, 2015, pp. 839–843. ...and F. Seide, “Feature learning in deep neural networks - a study on speech recognition tasks,” in International Conference on Learning Representations...

  4. On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common

    PubMed Central

    Weninger, Felix; Eyben, Florian; Schuller, Björn W.; Mortillaro, Marcello; Scherer, Klaus R.

    2013-01-01

    Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning either of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow’s pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of “the sound that something makes,” in order to evaluate the system’s auditory environment and its own audio output. This article aims at a first step toward a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selection of appropriate descriptors, cross-domain arousal, and valence regression is feasible achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects. PMID:23750144
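
    The cross-domain evaluation described above reduces, in code, to training a regressor on one domain's acoustic features and correlating its predictions with observer ratings in another domain. The sketch below shows that skeleton with random placeholder data; the feature sets, regressor, and corpora of the actual study are not reproduced.

```python
# Illustrative cross-domain regression in the spirit of the study: train an
# arousal regressor on one domain's acoustic features and test on another,
# reporting the correlation with observer ratings. Data are placeholders.
import numpy as np
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
X_sound, y_sound = rng.normal(size=(200, 40)), rng.normal(size=200)    # training domain
X_speech, y_speech = rng.normal(size=(150, 40)), rng.normal(size=150)  # test domain

model = Ridge(alpha=1.0).fit(X_sound, y_sound)      # train on sound
r, _ = pearsonr(model.predict(X_speech), y_speech)  # test on enacted speech
print(f"cross-domain arousal correlation: r = {r:.2f}")
```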

  5. Discussion on Application of Space Materials and Technological Innovation in Dynamic Fashion Show

    NASA Astrophysics Data System (ADS)

    Huo, Meilin; Kim, Chul Soo; Zhao, Wenhan

    2018-03-01

    In modern dynamic fashion shows, designers often use the latest ideas and technology, devoting their energy to stage effects and the overall environment so that watching a fashion show becomes an audio-visual feast for the audience. With the rapid development of China's science and technology, strengthening the relationship between new ideas, new trends and technology has become a design trend in modern art. With the emergence of new technologies, new methods and new materials, stage designers for dynamic fashion shows can choose materials from an increasingly large scope. New technology has also led designers to constantly innovate the means of stage space design, building on the original basis of experience. The design of the dynamic clothing display space concerns the clothing display space, layout, platform decoration style, platform models, performing colors, light arrangement, platform background, etc.

  6. Neurophysiological evidence for the interplay of speech segmentation and word-referent mapping during novel word learning.

    PubMed

    François, Clément; Cunillera, Toni; Garcia, Enara; Laine, Matti; Rodriguez-Fornells, Antoni

    2017-04-01

    Learning a new language requires the identification of word units from continuous speech (the speech segmentation problem) and mapping them onto conceptual representation (the word to world mapping problem). Recent behavioral studies have revealed that the statistical properties found within and across modalities can serve as cues for both processes. However, segmentation and mapping have been largely studied separately, and thus it remains unclear whether both processes can be accomplished at the same time and if they share common neurophysiological features. To address this question, we recorded EEG of 20 adult participants during both an audio alone speech segmentation task and an audiovisual word-to-picture association task. The participants were tested for both the implicit detection of online mismatches (structural auditory and visual semantic violations) as well as for the explicit recognition of words and word-to-picture associations. The ERP results from the learning phase revealed a delayed learning-related fronto-central negativity (FN400) in the audiovisual condition compared to the audio alone condition. Interestingly, while online structural auditory violations elicited clear MMN/N200 components in the audio alone condition, visual-semantic violations induced meaning-related N400 modulations in the audiovisual condition. The present results support the idea that speech segmentation and meaning mapping can take place in parallel and act in synergy to enhance novel word learning. Copyright © 2016 Elsevier Ltd. All rights reserved.

  7. 47 CFR 73.322 - FM stereophonic sound transmission standards.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... transmission, modulation of the carrier by audio components within the baseband range of 50 Hz to 15 kHz shall... the carrier by audio components within the audio baseband range of 23 kHz to 99 kHz shall not exceed... method described in (a), must limit the modulation of the carrier by audio components within the audio...

  8. 47 CFR 73.322 - FM stereophonic sound transmission standards.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... transmission, modulation of the carrier by audio components within the baseband range of 50 Hz to 15 kHz shall... the carrier by audio components within the audio baseband range of 23 kHz to 99 kHz shall not exceed... method described in (a), must limit the modulation of the carrier by audio components within the audio...

  9. 47 CFR 73.322 - FM stereophonic sound transmission standards.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... transmission, modulation of the carrier by audio components within the baseband range of 50 Hz to 15 kHz shall... the carrier by audio components within the audio baseband range of 23 kHz to 99 kHz shall not exceed... method described in (a), must limit the modulation of the carrier by audio components within the audio...

  10. 47 CFR 73.322 - FM stereophonic sound transmission standards.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... transmission, modulation of the carrier by audio components within the baseband range of 50 Hz to 15 kHz shall... the carrier by audio components within the audio baseband range of 23 kHz to 99 kHz shall not exceed... method described in (a), must limit the modulation of the carrier by audio components within the audio...

  11. Astronaut James Newman works with computers and GPS

    NASA Image and Video Library

    1993-09-20

    STS051-16-028 (12-22 Sept 1993) --- On Discovery's middeck, astronaut James H. Newman, mission specialist, works with an array of computers, including one devoted to Global Positioning System (GPS) operations, a general portable onboard computer displaying a tracking map, a portable audio data modem and another payload and general support computer. Newman was joined by four other NASA astronauts for almost ten full days in space.

  12. Kaleri and Foale during telecon in the U.S. Lab during Expedition 8

    NASA Image and Video Library

    2003-12-28

    ISS008-E-10698 (28 December 2003) --- Cosmonaut Alexander Y. Kaleri (foreground), Expedition 8 flight engineer, and astronaut C. Michael Foale, mission commander and NASA ISS science officer, conduct a teleconference with the Moscow Support Group for the Russian New Year celebration, via Ku- and S-band, with audio and video relayed to the Mission Control Center (MCC) at Johnson Space Center (JSC). Kaleri represents Rosaviakosmos.

  13. Kaleri and Foale during telecon in the U.S. Lab during Expedition 8

    NASA Image and Video Library

    2003-12-28

    ISS008-E-10737 (28 Dec. 2003) --- Astronaut C. Michael Foale (right), Expedition 8 mission commander and NASA ISS science officer, and cosmonaut Alexander Y. Kaleri, flight engineer, conduct a teleconference with the Moscow Support Group for the Russian New Year celebration, via Ku- and S-band, with audio and video relayed to the Mission Control Center (MCC) at Johnson Space Center (JSC). Kaleri represents Rosaviakosmos.

  14. Kaleri and Foale during telecon in the U.S. Lab during Expedition 8

    NASA Image and Video Library

    2003-12-28

    ISS008-E-10711 (28 December 2003) --- Cosmonaut Alexander Y. Kaleri (foreground), Expedition 8 flight engineer, and astronaut C. Michael Foale, mission commander and NASA ISS science officer, conduct a teleconference with the Moscow Support Group for the Russian New Year celebration, via Ku- and S-band, with audio and video relayed to the Mission Control Center (MCC) at Johnson Space Center (JSC). Kaleri represents Rosaviakosmos.

  15. Comparing Audio and Video Data for Rating Communication

    PubMed Central

    Williams, Kristine; Herman, Ruth; Bontempo, Daniel

    2013-01-01

    Video recording has become increasingly popular in nursing research, adding rich nonverbal, contextual, and behavioral information. However, the benefits of video over audio data have not been well established. We compared communication ratings of audio versus video data using the Emotional Tone Rating Scale. Twenty raters watched video clips of nursing care and rated staff communication on 12 descriptors that reflect dimensions of person-centered and controlling communication. Another group rated audio-only versions of the same clips. Interrater consistency was high within each group, with ICC (2,1) for audio = .91 and video = .94. Interrater consistency for both groups combined was also high, with ICC (2,1) for audio and video = .95. Communication ratings using audio and video data were highly correlated. Whether video offers value beyond audio-recorded data should be weighed when designing studies evaluating nursing care. PMID:23579475

  16. Inclusive Planetary Science Outreach and Education: a Pioneering European Experience

    NASA Astrophysics Data System (ADS)

    Galvez, A.; Ballesteros, F.; García-Frank, A.; Gil, S.; Gil-Ortiz, A.; Gómez-Heras, M.; Martínez-Frías, J.; Parro, L. M.; Parro, V.; Pérez-Montero, E.; Raposo, V.; Vaquerizo, J. A.

    2017-09-01

    Universal access to space science and exploration for researchers, students and the public, regardless of physical abilities or condition, is the main objective of work by the Space Inclusive Network (SpaceIn). The purpose of SpaceIn is to conduct educational and communication activities on Space Science in an inclusive and accessible way, so that physical disability is not an impediment to participating. SpaceIn members also aim to enlarge the network by raising awareness among individuals such as undergraduate students, secondary school teachers, and members of the public with an interest and basic knowledge in science and astronomy. As part of a pilot experience, current activities are focused on education and outreach in the field of comparative Planetary Science and Astrobiology. Themes include the similarities and differences between terrestrial planets, the role of water and its interaction with minerals on their surfaces, the importance of internal thermal energy in shaping planets and moons and the implications for the appearance of life, as we know it, on our planet and, possibly, in other places in our Solar System and beyond. The topics also include how scientific research and space missions can shed light on these fundamental issues, such as how life appears on a planet, and thus, why planetary missions are important in our society, as a source of knowledge and inspiration. The tools that are used to communicate the concepts include talks with support of multimedia and multi-sensorial material (video, audio, tactile, taste, smell) and field trips to planetary analogue sites that are accessible to most members of the public, including people with some kind of disability. The field trips help illustrate scientific concepts in geology, e.g., lava formations, folds, impact features, gullies, salt plains; biology, e.g., extremophiles, halophytes; and exploration technology, e.g., navigation in an unknown environment, hazard and obstacle avoidance, mobility in all types of terrain, etc. This paper describes all the current activities and the future plans for traineeships and other actions at the European level.

  17. Predicting the Overall Spatial Quality of Automotive Audio Systems

    NASA Astrophysics Data System (ADS)

    Koya, Daisuke

    The spatial quality of automotive audio systems is often compromised due to their non-ideal listening environments. Automotive audio systems need to be developed quickly due to industry demands. A suitable perceptual model could evaluate the spatial quality of automotive audio systems with similar reliability to formal listening tests but take less time. Such a model is developed in this research project by adapting an existing model of spatial quality for automotive audio use. The requirements for the adaptation were investigated in a literature review. A perceptual model called QESTRAL was reviewed, which predicts the overall spatial quality of domestic multichannel audio systems. It was determined that automotive audio systems are likely to be impaired in terms of the spatial attributes that were not considered in developing the QESTRAL model, but metrics are available that might predict these attributes. To establish whether the QESTRAL model in its current form can accurately predict the overall spatial quality of automotive audio systems, MUSHRA listening tests using headphone auralisation with head tracking were conducted to collect results to be compared against predictions by the model. Based on guideline criteria, the model in its current form could not accurately predict the overall spatial quality of automotive audio systems. To improve prediction performance, the QESTRAL model was recalibrated and modified using existing metrics of the model, those that were proposed from the literature review, and newly developed metrics. The most important metrics for predicting the overall spatial quality of automotive audio systems included those that are interaural cross-correlation (IACC) based, relate to localisation of the frontal audio scene, and account for the perceived scene width in front of the listener. Modifying the model for automotive audio systems did not invalidate its use for domestic audio systems. The resulting model predicts the overall spatial quality of 2- and 5-channel automotive audio systems with a cross-validation performance of R^2 = 0.85 and root-mean-square error (RMSE) = 11.03%.

  18. Exploring the Implementation of Steganography Protocols on Quantum Audio Signals

    NASA Astrophysics Data System (ADS)

    Chen, Kehan; Yan, Fei; Iliyasu, Abdullah M.; Zhao, Jianping

    2018-02-01

    Two quantum audio steganography (QAS) protocols are proposed, each of which manipulates or modifies the least significant qubit (LSQb) of a host quantum audio signal encoded as FRQA (flexible representation of quantum audio) content. The first protocol (i.e. the conventional LSQb QAS protocol, or simply the cLSQ stego protocol) is built on exchanges between the qubits encoding the quantum audio message and the LSQb of the amplitude information in the host quantum audio samples. In the second protocol, the embedding procedure implants information from a quantum audio message deep into the constraint-imposed most significant qubit (MSQb) of the host quantum audio samples; we refer to it as the pseudo-MSQb QAS protocol, or simply the pMSQ stego protocol. The cLSQ stego protocol is designed to guarantee high imperceptibility between the host quantum audio and its stego version, whereas the pMSQ stego protocol ensures that the resulting stego quantum audio signal is more immune to illicit tampering and copyright violations (a.k.a. robustness). Built on the circuit model of quantum computation, the circuit networks to execute the embedding and extraction algorithms of both QAS protocols are determined, and simulation-based experiments are conducted to demonstrate their implementation. Outcomes attest that both protocols offer promising trade-offs in terms of imperceptibility and robustness.
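
    As a point of reference for the cLSQ idea, the sketch below shows the classical least-significant-bit embedding on ordinary 8-bit PCM samples; the quantum protocols themselves operate on FRQA-encoded states and circuit networks, which this classical analogue does not model.

```python
# Classical least-significant-bit embedding, shown only as an analogue of the
# cLSQ idea on 8-bit PCM samples (not the paper's quantum protocol).
import numpy as np

def embed_lsb(host: np.ndarray, bits: np.ndarray) -> np.ndarray:
    stego = host.copy()
    # clear the LSB of the leading samples, then write the message bits
    stego[: len(bits)] = (stego[: len(bits)] & 0xFE) | bits
    return stego

def extract_lsb(stego: np.ndarray, n: int) -> np.ndarray:
    return stego[:n] & 1

host = np.random.default_rng(2).integers(0, 256, size=1024, dtype=np.uint8)
message = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
stego = embed_lsb(host, message)
assert np.array_equal(extract_lsb(stego, len(message)), message)
```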

  19. Comparing audio and video data for rating communication.

    PubMed

    Williams, Kristine; Herman, Ruth; Bontempo, Daniel

    2013-09-01

    Video recording has become increasingly popular in nursing research, adding rich nonverbal, contextual, and behavioral information. However, the benefits of video over audio data have not been well established. We compared communication ratings of audio versus video data using the Emotional Tone Rating Scale. Twenty raters watched video clips of nursing care and rated staff communication on 12 descriptors that reflect dimensions of person-centered and controlling communication. Another group rated audio-only versions of the same clips. Interrater consistency was high within each group, with Intraclass Correlation Coefficient (ICC) (2,1) for audio = .91 and video = .94. Interrater consistency for both groups combined was also high, with ICC (2,1) for audio and video = .95. Communication ratings using audio and video data were highly correlated. Whether video offers value beyond audio-recorded data should be weighed when designing studies evaluating nursing care.
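
    The ICC (2,1) statistic reported in both versions of this study is the two-way random-effects, single-rater intraclass correlation of Shrout and Fleiss. A compact implementation is sketched below; the ratings matrix is a made-up placeholder (rows = clips, columns = raters).

```python
# Sketch of the two-way random-effects ICC(2,1) used for the rating data.
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    n, k = ratings.shape
    grand = ratings.mean()
    row_m, col_m = ratings.mean(axis=1), ratings.mean(axis=0)
    msr = k * ((row_m - grand) ** 2).sum() / (n - 1)   # between-subjects MS
    msc = n * ((col_m - grand) ** 2).sum() / (k - 1)   # between-raters MS
    sse = ((ratings - row_m[:, None] - col_m[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))                    # residual MS
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

ratings = np.array([[7, 6, 7], [4, 4, 5], [8, 7, 8], [3, 2, 3.]])  # placeholder
print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")
```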

  20. Audio Classification in Speech and Music: A Comparison between a Statistical and a Neural Approach

    NASA Astrophysics Data System (ADS)

    Bugatti, Alessandro; Flammini, Alessandra; Migliorati, Pierangelo

    2002-12-01

    We focus attention on the problem of audio classification in speech and music for multimedia applications. In particular, we present a comparison between two different techniques for speech/music discrimination. The first method is based on zero-crossing rate and Bayesian classification. It is very simple from a computational point of view, and gives good results for pure music or speech. The simulation results show that some performance degradation arises when the music segment also contains speech superimposed on music, or strong rhythmic components. To overcome these problems, we propose a second method that uses more features and is based on neural networks (specifically, a multi-layer perceptron). In this case we obtain better performance, at the expense of a limited growth in computational complexity. In practice, the proposed neural network is simple to implement if a suitable polynomial is used as the activation function, and a real-time implementation is possible even on low-cost embedded systems.
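
    The first method described above combines a single cheap feature with a Bayesian decision rule. The sketch below illustrates that combination: per-frame zero-crossing rate plus a two-class Gaussian log-likelihood comparison. The class means and variances are invented for illustration, not taken from the paper.

```python
# Frame-level zero-crossing rate (ZCR) plus a naive Gaussian (Bayesian)
# decision rule; class statistics below are illustrative assumptions.
import numpy as np

def zcr(frames: np.ndarray) -> np.ndarray:
    # fraction of sign changes per frame
    return (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)

def classify(z, mu_s=0.12, var_s=0.003, mu_m=0.05, var_m=0.001):
    # speech tends to have higher, more variable ZCR than music
    ll_speech = -0.5 * ((z - mu_s) ** 2 / var_s + np.log(var_s))
    ll_music = -0.5 * ((z - mu_m) ** 2 / var_m + np.log(var_m))
    return np.where(ll_speech > ll_music, "speech", "music")

rng = np.random.default_rng(3)
audio = rng.normal(size=16000)                          # 1 s placeholder signal
frames = audio[: 16000 // 400 * 400].reshape(-1, 400)   # 25 ms frames @ 16 kHz
print(classify(zcr(frames))[:5])
```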

  1. Ontology-based structured cosine similarity in document summarization: with applications to mobile audio-based knowledge management.

    PubMed

    Yuan, Soe-Tsyr; Sun, Jerry

    2005-10-01

    Development of algorithms for automated text categorization in massive text document sets is an important research area of data mining and knowledge discovery. Most text-clustering methods are grounded in term-based measurement of distance or similarity, ignoring the structure of the documents. In this paper, we present a novel method named structured cosine similarity (SCS) that furnishes document clustering with a new way of modeling document summarization, taking the structure of the documents into account so as to improve the performance of document clustering in terms of quality, stability, and efficiency. This study was motivated by the problem of clustering speech documents (which lack rich document features) obtained from oral experience sharing over wireless devices by the mobile workforce of enterprises, fulfilling audio-based knowledge management. In other words, the problem aims to facilitate knowledge acquisition and sharing by speech. The evaluations also show fairly promising results for our method of structured cosine similarity.
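
    One plausible reading of a structure-aware cosine similarity is sketched below: term vectors are computed per document section, and the section-level cosine similarities are combined with weights. The section names and weights are assumptions; the paper's exact SCS formulation may differ.

```python
# Hedged sketch of a structure-aware cosine similarity over per-section
# term vectors; weights and section names are illustrative assumptions.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def structured_cosine(doc1: dict, doc2: dict, weights: dict) -> float:
    # doc: {section_name: term-frequency vector}
    total = sum(weights.values())
    return sum(w * cosine(doc1[s], doc2[s]) for s, w in weights.items()) / total

d1 = {"intro": np.array([1, 2, 0.]), "body": np.array([0, 1, 3.])}
d2 = {"intro": np.array([1, 1, 0.]), "body": np.array([1, 1, 2.])}
print(structured_cosine(d1, d2, {"intro": 0.3, "body": 0.7}))
```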

  2. [Style of communication between mission control centers and space crews].

    PubMed

    Iusupova, A K; Gushchin, V I; Shved, D M; Cheveleva, L M

    2011-01-01

    The article deals with a pilot investigation into the audio communication of cosmonauts with ground controllers. The purpose was to verify in space flight the patterns and trends revealed in model tests of intergroup communication, and to pinpoint the signature of multinational crew communication with 2 national mission control centers (MCCs). The investigation employed the authors' content analysis, adapted to the scenario of a long-duration mission. The investigation revealed a phenomenon of double-loop ground-orbit communication, with divergences and differences in opinion predictable from the concept formulated by G. T. Beregovoi. There was also a notable difference in the expressions used by controllers of the 2 MCCs.

  3. Reducing audio stimulus presentation latencies across studies, laboratories, and hardware and operating system configurations.

    PubMed

    Babjack, Destiny L; Cernicky, Brandon; Sobotka, Andrew J; Basler, Lee; Struthers, Devon; Kisic, Richard; Barone, Kimberly; Zuccolotto, Anthony P

    2015-09-01

    Using differing computer platforms and audio output devices to deliver audio stimuli often introduces (1) substantial variability across labs and (2) variable time between the intended and actual sound delivery (the sound onset latency). Fast, accurate audio onset latencies are particularly important when audio stimuli need to be delivered precisely as part of studies that depend on accurate timing (e.g., electroencephalographic, event-related potential, or multimodal studies), or in multisite studies in which standardization and strict control over the computer platforms used is not feasible. This research describes the variability introduced by using differing configurations and introduces a novel approach to minimizing audio sound latency and variability. A stimulus presentation and latency assessment approach is presented using E-Prime and Chronos (a new multifunction, USB-based data presentation and collection device). The present approach reliably delivers audio stimuli with low latencies that vary by ≤1 ms, independent of hardware and Windows operating system (OS)/driver combinations. The Chronos audio subsystem adopts a buffering, aborting, querying, and remixing approach to the delivery of audio, to achieve a consistent 1-ms sound onset latency for single-sound delivery, and precise delivery of multiple sounds that achieves standard deviations of 1/10th of a millisecond without the use of advanced scripting. Chronos's sound onset latencies are small, reliable, and consistent across systems. Testing of standard audio delivery devices and configurations highlights the need for careful attention to consistency between labs, experiments, and multiple study sites in their hardware choices, OS selections, and adoption of audio delivery systems designed to sidestep the audio latency variability issue.

  4. Revealing the ecological content of long-duration audio-recordings of the environment through clustering and visualisation.

    PubMed

    Phillips, Yvonne F; Towsey, Michael; Roe, Paul

    2018-01-01

    Audio recordings of the environment are an increasingly important technique to monitor biodiversity and ecosystem function. While the acquisition of long-duration recordings is becoming easier and cheaper, the analysis and interpretation of that audio remains a significant research area. The issue addressed in this paper is the automated reduction of environmental audio data to facilitate ecological investigations. We describe a method that first reduces environmental audio to vectors of acoustic indices, which are then clustered. This can reduce the audio data by six to eight orders of magnitude yet retain useful ecological information. We describe techniques to visualise sequences of cluster occurrence (using for example, diel plots, rose plots) that assist interpretation of environmental audio. Colour coding acoustic clusters allows months and years of audio data to be visualised in a single image. These techniques are useful in identifying and indexing the contents of long-duration audio recordings. They could also play an important role in monitoring long-term changes in species abundance brought about by habitat degradation and/or restoration.

  5. Revealing the ecological content of long-duration audio-recordings of the environment through clustering and visualisation

    PubMed Central

    Towsey, Michael; Roe, Paul

    2018-01-01

    Audio recordings of the environment are an increasingly important technique to monitor biodiversity and ecosystem function. While the acquisition of long-duration recordings is becoming easier and cheaper, the analysis and interpretation of that audio remains a significant research area. The issue addressed in this paper is the automated reduction of environmental audio data to facilitate ecological investigations. We describe a method that first reduces environmental audio to vectors of acoustic indices, which are then clustered. This can reduce the audio data by six to eight orders of magnitude yet retain useful ecological information. We describe techniques to visualise sequences of cluster occurrence (using for example, diel plots, rose plots) that assist interpretation of environmental audio. Colour coding acoustic clusters allows months and years of audio data to be visualised in a single image. These techniques are useful in identifying and indexing the contents of long-duration audio recordings. They could also play an important role in monitoring long-term changes in species abundance brought about by habitat degradation and/or restoration. PMID:29494629
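
    The pipeline in these two records reduces each minute of audio to a short vector of acoustic indices and then clusters the minutes. The sketch below uses three simple stand-in indices (spectral entropy, energy, activity) and k-means; the actual indices and clustering method of the paper differ.

```python
# Summarize each minute of audio as a small vector of acoustic indices,
# then cluster the minutes; indices here are simple stand-ins.
import numpy as np
from sklearn.cluster import KMeans

def indices(minute: np.ndarray) -> list:
    spec = np.abs(np.fft.rfft(minute))
    p = spec / spec.sum()
    entropy = -(p * np.log2(p + 1e-12)).sum() / np.log2(len(p))  # spectral entropy
    energy = float((minute ** 2).mean())                          # signal energy
    activity = float((np.abs(minute) > 0.1).mean())               # loud fraction
    return [entropy, energy, activity]

sr = 22050
rng = np.random.default_rng(4)
minutes = [rng.normal(scale=s, size=sr * 60) for s in (0.05, 0.05, 0.3, 0.3)]
X = np.array([indices(m) for m in minutes])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # one cluster id per minute of audio
```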

  6. Using music[al] knowledge to represent expressions of emotions.

    PubMed

    Alexander, Stewart C; Garner, David Kirkland; Somoroff, Matthew; Gramling, David J; Norton, Sally A; Gramling, Robert

    2015-11-01

    Being able to identify expressions of emotion is crucial to effective clinical communication research. However, traditional linguistic coding systems often cannot represent emotions that are expressed nonlexically or phonologically (i.e., not through words themselves but through vocal pitch, speed/rhythm/tempo, and volume). Using an audio recording of a palliative care consultation in the natural hospital setting, two experienced music scholars employed Western musical notation, as well as the graphic realization of a digital audio program (piano roll visualization), to visually represent the sonic features of conversation where a patient has an emotional "choke" moment. Western musical notation showed the ways that changes in pitch and rate correspond to the patient's emotion: rising sharply in intensity before slowly fading away. Piano roll visualization is a helpful supplement. Using musical notation to illustrate palliative care conversations in the hospital setting can render visible for analysis several aspects of emotional expression that researchers otherwise experience as intuitive or subjective. Various forms and formats of musical notation techniques and sonic visualization technologies should be considered as fruitful and complementary alternatives to traditional coding tools in clinical communications research. Musical notation offers an opportunity for both researchers and learners to "see" how communication evolves in clinical encounters, particularly where the lexical and phonological features of interpersonal communication are concordant and discordant with one another. Copyright © 2015. Published by Elsevier Ireland Ltd.

  7. Smartphones for Geological Data Collection- an Android Phone Application

    NASA Astrophysics Data System (ADS)

    Sun, F.; Weng, Y.; Grigsby, J. D.

    2010-12-01

    Recently, smartphones have attracted great attention in the wireless device market because of their powerful processors, ample memory capacity, advanced connectivity, and numerous utility programs. Considering the prominent features a smartphone has, such as the large touch screen, speaker, microphone, camera, GPS receiver, accelerometer, and Internet connections, it can serve as a perfect digital aide for data recording on any geological field trip. We have designed and developed an application using the aforementioned features of an Android phone to provide functionalities useful in field studies. For example, employing the accelerometer in the Android phone, the application turns the handset into a Brunton-like device with which users can measure directions, the strike and dip of a bedding plane, or the trend and plunge of a fold. Our application also includes image taking, GPS coordinate tracking, videotaping, audio recording, and note writing. Data recorded by the application are tied together by the time log, which makes it easy to track all data regarding a specific geologic object. The application pulls the GPS reading from the phone's built-in GPS receiver and uses it as a spatial index to link up the other types of data, then maps them to Google Maps/Earth for visualization. In this way, notes, pictures, and audio or video recordings depicting the characteristics of outcrops and their spatial relations can all be well documented and organized in one handy gadget.
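
    The Brunton-like measurement rests on simple vector geometry: with the phone lying flat on a bedding plane, the angle between the accelerometer's gravity vector and the device's z-axis equals the dip of the plane. The sketch below shows that calculation; a true dip direction would additionally require the magnetometer, which is omitted here.

```python
# Back-of-the-envelope dip measurement from a gravity vector read while the
# phone rests flat on a bedding plane (magnetometer-based azimuth omitted).
import math

def dip_from_gravity(gx: float, gy: float, gz: float) -> float:
    g = math.sqrt(gx * gx + gy * gy + gz * gz)
    # angle between the device normal (z-axis) and vertical
    return math.degrees(math.acos(abs(gz) / g))

# example gravity reading in device coordinates (m/s^2), invented for the demo
print(f"dip = {dip_from_gravity(1.7, 0.0, 9.66):.1f} degrees")   # ~10 degrees
```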

  8. Sonification for geoscience: Listening to faults from the inside

    NASA Astrophysics Data System (ADS)

    Barrett, Natasha; Mair, Karen

    2014-05-01

    Here we investigate the use of sonification for geoscience by sonifying the data generated in computer models of earthquake processes. Using mainly parameter mapping sonification, we explore data from our recent 3D DEM (discrete element method) models where granular debris is sheared between rough walls to simulate an evolving fault (e.g. Mair and Abe, 2011). To best appreciate the inherently 3D nature of the crushing and sliding events (continuously tracked in our models) that occur as faults slip, we use Ambisonics (a sound field recreation technology). This allows the position of individual events to be preserved, generating a virtual 3D soundscape so we can explore faults from the inside. The addition of 3D audio to the sonification tool palette further allows us to more accurately connect to spatial data in a novel and engaging manner. During sonification, events such as grain-scale fracturing, grain motions and interactions are mapped to specific sounds whose pitch, timbre, and volume reflect properties such as the depth, character, and size of the individual events. Our interactive and real-time approaches allow the listener to actively explore the data in time and space, listening to evolving processes by navigating through the spatial data via a 3D mouse controller. The soundscape can be heard either through an array of speakers or using a pair of headphones. Emergent phenomena in the models generate clear sound patterns that are easily spotted. Also, because our ears are excellent signal-to-noise filters, events are recognizable above the background noise. Although these features may be detectable visually, using a different sense (and part of the brain) gives a fresh perspective and facilitates a rapid appreciation of 'signals' through audio awareness, rather than specific scientific training. For this reason we anticipate significant potential for the future use of sonification in the presentation, interpretation and communication of geoscience datasets to both experts and the general public.
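
    A minimal parameter-mapping sonification, in the spirit described above, is sketched below: each simulated fault event becomes a short sine tone whose pitch tracks depth and whose amplitude tracks event size, mixed into a WAV file. The event data and mappings are invented, and the paper's 3D Ambisonics rendering is not reproduced.

```python
# Parameter-mapping sonification sketch: invented fault events -> sine tones.
import numpy as np
import wave

sr = 22050
events = [(0.0, 0.2, 3.0), (0.5, 0.8, 1.0), (1.0, 0.5, 6.0)]  # (time s, depth 0-1, size)

out = np.zeros(sr * 2)
for t, depth, size in events:
    freq = 200 + 1000 * (1 - depth)   # shallower events -> higher pitch
    amp = min(size / 10.0, 1.0)       # bigger events -> louder
    n = int(0.15 * sr)
    tone = amp * np.sin(2 * np.pi * freq * np.arange(n) / sr) * np.hanning(n)
    i = int(t * sr)
    out[i : i + n] += tone

pcm = (np.clip(out, -1, 1) * 32767).astype(np.int16)
with wave.open("sonification.wav", "wb") as f:
    f.setnchannels(1); f.setsampwidth(2); f.setframerate(sr)
    f.writeframes(pcm.tobytes())
```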

  9. Extending peripersonal space representation without tool-use: evidence from a combined behavioral-computational approach

    PubMed Central

    Serino, Andrea; Canzoneri, Elisa; Marzolla, Marilena; di Pellegrino, Giuseppe; Magosso, Elisa

    2015-01-01

    Stimuli from different sensory modalities occurring on or close to the body are integrated in a multisensory representation of the space surrounding the body, i.e., peripersonal space (PPS). PPS changes dynamically with experience, e.g., it extends after using a tool to reach far objects. However, the neural mechanism underlying PPS plasticity after tool use is largely unknown. Here we use a combined computational-behavioral approach to propose and test a possible mechanism accounting for PPS extension. We first present a neural network model simulating audio-tactile representation in the PPS around one hand. Simulation experiments showed that our model reproduced the main property of PPS neurons, i.e., selective multisensory response for stimuli occurring close to the hand. We used the neural network model to simulate the effects of a tool-use training. In terms of sensory inputs, tool use was conceptualized as a concurrent tactile stimulation from the hand, due to holding the tool, and an auditory stimulation from the far space, due to tool-mediated action. Results showed that after exposure to those inputs, PPS neurons responded also to multisensory stimuli far from the hand. The model thus suggests that synchronous pairing of tactile hand stimulation and auditory stimulation from the far space is sufficient to extend PPS, such as after tool-use. This prediction was confirmed by a behavioral experiment, where we used an audio-tactile interaction paradigm to measure the boundaries of PPS representation. We found that PPS extended after synchronous tactile-hand stimulation and auditory-far stimulation in a group of healthy volunteers. Control experiments in both simulation and behavioral settings showed that the same amount of tactile and auditory inputs administered out of synchrony did not change PPS representation. We conclude by proposing a simple, biologically plausible model to explain plasticity in PPS representation after tool-use, which is supported by computational and behavioral data. PMID:25698947

  10. Extending peripersonal space representation without tool-use: evidence from a combined behavioral-computational approach.

    PubMed

    Serino, Andrea; Canzoneri, Elisa; Marzolla, Marilena; di Pellegrino, Giuseppe; Magosso, Elisa

    2015-01-01

    Stimuli from different sensory modalities occurring on or close to the body are integrated in a multisensory representation of the space surrounding the body, i.e., peripersonal space (PPS). PPS changes dynamically with experience, e.g., it extends after using a tool to reach far objects. However, the neural mechanism underlying PPS plasticity after tool use is largely unknown. Here we use a combined computational-behavioral approach to propose and test a possible mechanism accounting for PPS extension. We first present a neural network model simulating audio-tactile representation in the PPS around one hand. Simulation experiments showed that our model reproduced the main property of PPS neurons, i.e., selective multisensory response for stimuli occurring close to the hand. We used the neural network model to simulate the effects of a tool-use training. In terms of sensory inputs, tool use was conceptualized as a concurrent tactile stimulation from the hand, due to holding the tool, and an auditory stimulation from the far space, due to tool-mediated action. Results showed that after exposure to those inputs, PPS neurons responded also to multisensory stimuli far from the hand. The model thus suggests that synchronous pairing of tactile hand stimulation and auditory stimulation from the far space is sufficient to extend PPS, such as after tool-use. This prediction was confirmed by a behavioral experiment, where we used an audio-tactile interaction paradigm to measure the boundaries of PPS representation. We found that PPS extended after synchronous tactile-hand stimulation and auditory-far stimulation in a group of healthy volunteers. Control experiments in both simulation and behavioral settings showed that the same amount of tactile and auditory inputs administered out of synchrony did not change PPS representation. We conclude by proposing a simple, biologically plausible model to explain plasticity in PPS representation after tool-use, which is supported by computational and behavioral data.

  11. Holographic disk with high data transfer rate: its application to an audio response memory.

    PubMed

    Kubota, K; Ono, Y; Kondo, M; Sugama, S; Nishida, N; Sakaguchi, M

    1980-03-15

    This paper describes a memory realized with a high data transfer rate using the holographic parallel-processing function, and its application to an audio response system that supplies many audio messages to many terminals simultaneously. Digitized audio messages are recorded as tiny 1-D Fourier transform holograms on a holographic disk. A hologram recorder and a hologram reader were constructed to test and demonstrate the feasibility of the holographic audio response memory. Experimental results indicate the potential for an audio response system with a 2000-word vocabulary and a 250-Mbit/sec transfer rate.

  12. What People Talk About in Virtual Worlds

    NASA Astrophysics Data System (ADS)

    Maher, Mary Lou

    This chapter examines what people talk about in virtual worlds, employing protocol analysis. Each of two scenario studies was developed to assess the impact of virtual worlds as a collaborative environment for a specific purpose: one for learning and one for designing. The first study designed a place in Active Worlds for a course on Web Site Design, with group learning spaces surrounded by individual student galleries. Student text chat was analyzed through a coding scheme with four major categories: control, technology, learning, and place. The second study observed expert architects in a Second Life environment called DesignWorld that combined 3D modeling and sketching tools. Video and audio recordings were coded in terms of four categories of communication content (designing, representation of the model, awareness of each other, and software features), and in terms of synthesis comparing alternative designs versus analysis of how well the proposed solution satisfies the given design task. Both studies found that people talk about their avatars, identity, and location in the virtual world. However, the discussion is chiefly about the task and not about the virtual world, implying that virtual worlds provide a viable environment for learning and designing without distracting people from their task.

  13. Deep Recurrent Neural Network-Based Autoencoders for Acoustic Novelty Detection

    PubMed Central

    Vesperini, Fabio; Schuller, Björn

    2017-01-01

    In the emerging field of acoustic novelty detection, most research efforts are devoted to probabilistic approaches such as mixture models or state-space models. Only recent studies introduced (pseudo-)generative models for acoustic novelty detection with recurrent neural networks in the form of an autoencoder. In these approaches, auditory spectral features of the next short-term frame are predicted from the previous frames by means of Long Short-Term Memory recurrent denoising autoencoders. The reconstruction error between the input and the output of the autoencoder is used as an activation signal to detect novel events. No prior work has compared these efforts to automatically recognize novel events from audio signals or given a broad, in-depth evaluation of recurrent neural network-based autoencoders. The present contribution consistently evaluates our recent novel approaches to fill this gap in the literature, providing insight through extensive evaluations carried out on three databases: A3Novelty, PASCAL CHiME, and PROMETHEUS. Besides providing an extensive analysis of novel and state-of-the-art methods, the article shows how RNN-based autoencoders outperform statistical approaches by up to an absolute improvement of 16.4% average F-measure over the three databases. PMID:28182121
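
    The detection principle is compact enough to sketch: an autoencoder trained only on "normal" audio frames reconstructs them well, so a large reconstruction error flags a novel event. Below, a small LSTM autoencoder stands in for the denoising LSTM-RNN autoencoders evaluated in the article, with random placeholder data in place of auditory spectral features.

```python
# Reconstruction-error novelty detection with a small LSTM autoencoder
# (a stand-in for the article's denoising LSTM-RNN autoencoders).
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_mels=26, hidden=32):
        super().__init__()
        self.enc = nn.LSTM(n_mels, hidden, batch_first=True)
        self.dec = nn.LSTM(hidden, n_mels, batch_first=True)

    def forward(self, x):
        h, _ = self.enc(x)
        out, _ = self.dec(h)
        return out

torch.manual_seed(0)
model = LSTMAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
normal = torch.randn(64, 20, 26) * 0.1             # "normal" spectral sequences

for _ in range(100):                               # train on normal data only
    opt.zero_grad()
    loss = ((model(normal) - normal) ** 2).mean()
    loss.backward()
    opt.step()

def novelty_score(seq):                            # per-sequence reconstruction error
    with torch.no_grad():
        return ((model(seq) - seq) ** 2).mean(dim=(1, 2))

novel = torch.randn(8, 20, 26)                     # louder, unfamiliar events
threshold = novelty_score(normal).mean() + 3 * novelty_score(normal).std()
print((novelty_score(novel) > threshold).tolist())
```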

  14. Electrophysiological evidence for Audio-visuo-lingual speech integration.

    PubMed

    Treille, Avril; Vilain, Coriandre; Schwartz, Jean-Luc; Hueber, Thomas; Sato, Marc

    2018-01-31

    Recent neurophysiological studies demonstrate that audio-visual speech integration partly operates through temporal expectations and speech-specific predictions. From these results, one common view is that the binding of auditory and visual (lipread) speech cues relies on their joint probability and prior associative audio-visual experience. The present EEG study examined whether visual tongue movements integrate with relevant speech sounds, despite little associative audio-visual experience between the two modalities. A second objective was to determine possible similarities and differences of audio-visual speech integration between unusual audio-visuo-lingual and classical audio-visuo-labial modalities. To this end, participants were presented with auditory, visual, and audio-visual isolated syllables, with the visual presentation related to either a sagittal view of the tongue movements or a facial view of the lip movements of a speaker, with lingual and facial movements previously recorded by an ultrasound imaging system and a video camera. In line with previous EEG studies, our results revealed an amplitude decrease and a latency facilitation of P2 auditory evoked potentials in both audio-visuo-lingual and audio-visuo-labial conditions compared to the sum of unimodal conditions. These results argue against the view that auditory and visual speech cues solely integrate based on prior associative audio-visual perceptual experience. Rather, they suggest that dynamic and phonetic informational cues are sharable across sensory modalities, possibly through a cross-modal transfer of implicit articulatory motor knowledge. Copyright © 2017 Elsevier Ltd. All rights reserved.

  15. Support for the 38th International Conference on High Energy Physics, 3-10 August 2016

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Young-Kee

    The 38th International Conference on High Energy Physics (ICHEP), held in Chicago from August 3 to 10, 2016, gathered physicists from around the world to share the latest advancements in particle physics, astrophysics/cosmology, and accelerator science and to discuss plans for major future facilities. DOE funding provided partial support for space rental and audio-visual services for scientific presentations at the conference.

  16. 78 FR 38093 - Seventh Meeting: RTCA Special Committee 226, Audio Systems and Equipment

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-06-25

    ... Committee 226, Audio Systems and Equipment. AGENCY: Federal Aviation Administration (FAA), U.S. Department of Transportation (DOT). ACTION: Meeting Notice of RTCA Special Committee 226, Audio Systems and Equipment. SUMMARY... 226, Audio Systems and Equipment

  17. Science@NASA: Direct to People!

    NASA Technical Reports Server (NTRS)

    Koczor, Ronald J.; Adams, Mitzi; Gallagher, Dennis; Whitaker, Ann (Technical Monitor)

    2002-01-01

    Science@NASA is a science communication effort sponsored by NASA's Marshall Space Flight Center. It is the result of a four-year research project between Marshall, the University of Florida College of Journalism and Communications, and the internet communications company Bishop Web Works. The goals of Science@NASA are to inform, inspire, and involve people in the excitement of NASA science by bringing that science directly to them. We stress not only the reporting of the facts of a particular topic, but also the context and importance of the research. Science@NASA involves several levels of activity, from academic communications research to production of content for 6 websites, in an integrated process involving all phases of production. A Science Communications Roundtable Process is in place that includes scientists, managers, writers, editors, and Web technical experts. The close connection between the scientists and the writers/editors assures a high level of scientific accuracy in the finished products. The websites each have unique characters and are aimed at different audience segments: 1. http://science.nasa.gov (SNG) carries stories featuring various aspects of NASA science activity. The site carries 2 or 3 new stories each week in written and audio formats for science-attentive adults. 2. http://liftoff.msfc.nasa.gov features stories from SNG that are recast for a high school level audience. J-Track and J-Pass applets for tracking satellites are our most popular products. 3. http://kids.msfc.nasa.gov is the Nursemaids site and is aimed at a middle school audience. The NASAKids Club is a new feature at the site. 4. http://www.thursdaysclassroom.com features lesson plans and classroom activities for educators centered around one of the science stories carried on SNG. 5. http://www.spaceweather.com gives the status of solar activity and its interactions with the Earth's ionosphere and magnetosphere.

  18. Diagnostic accuracy of sleep bruxism scoring in absence of audio-video recording: a pilot study.

    PubMed

    Carra, Maria Clotilde; Huynh, Nelly; Lavigne, Gilles J

    2015-03-01

    Based on the most recent polysomnographic (PSG) research diagnostic criteria, sleep bruxism is diagnosed when >2 rhythmic masticatory muscle activity (RMMA) episodes/h of sleep are scored on the masseter and/or temporalis muscles. These criteria have not yet been validated for portable PSG systems. This pilot study aimed to assess the diagnostic accuracy of scoring sleep bruxism in the absence of audio-video recordings. Ten subjects (mean age 24.7 ± 2.2 years) with a clinical diagnosis of sleep bruxism spent one night in the sleep laboratory. PSG was performed with a portable system (type 2) while audio-video was recorded. Sleep studies were scored by the same examiner three times: (1) without, (2) with, and (3) without audio-video, in order to test the intra-scoring and intra-examiner reliability of RMMA scoring. The RMMA event-by-event concordance rate between scoring without audio-video and with audio-video was 68.3%. Overall, the RMMA index was overestimated by 23.8% without audio-video. However, the intra-class correlation coefficient (ICC) between scorings with and without audio-video was good (ICC = 0.91; p < 0.001), and the intra-examiner reliability was high (ICC = 0.97; p < 0.001). The clinical diagnosis of sleep bruxism was confirmed in 8/10 subjects based on scoring without audio-video and in 6/10 subjects with audio-video. Despite the absence of audio-video recordings, the diagnostic accuracy of assessing RMMA with portable PSG systems appeared to remain good, supporting their use for both research and clinical purposes. However, the risk of moderate overestimation in the absence of audio-video must be taken into account.

  19. A combined model of sensory and cognitive representations underlying tonal expectations in music: from audio signals to behavior.

    PubMed

    Collins, Tom; Tillmann, Barbara; Barrett, Frederick S; Delbé, Charles; Janata, Petr

    2014-01-01

    Listeners' expectations for melodies and harmonies in tonal music are perhaps the most studied aspect of music cognition. Long debated has been whether faster response times (RTs) to more strongly primed events (in a music theoretic sense) are driven by sensory or cognitive mechanisms, such as repetition of sensory information or activation of cognitive schemata that reflect learned tonal knowledge, respectively. We analyzed over 300 stimuli from 7 priming experiments comprising a broad range of musical material, using a model that transforms raw audio signals through a series of plausible physiological and psychological representations spanning a sensory-cognitive continuum. We show that RTs are modeled, in part, by information in periodicity pitch distributions, chroma vectors, and activations of tonal space--a representation on a toroidal surface of the major/minor key relationships in Western tonal music. We show that in tonal space, melodies are grouped by their tonal rather than timbral properties, whereas the reverse is true for the periodicity pitch representation. While tonal space variables explained more of the variation in RTs than did periodicity pitch variables, suggesting a greater contribution of cognitive influences to tonal expectation, a stepwise selection model contained variables from both representations and successfully explained the pattern of RTs across stimulus categories in 4 of the 7 experiments. The addition of closure--a cognitive representation of a specific syntactic relationship--succeeded in explaining results from all 7 experiments. We conclude that multiple representational stages along a sensory-cognitive continuum combine to shape tonal expectations in music. (PsycINFO Database Record (c) 2014 APA, all rights reserved).

  20. 47 CFR 73.403 - Digital audio broadcasting service requirements.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... programming stream at no direct charge to listeners. In addition, a broadcast radio station must simulcast its analog audio programming on one of its digital audio programming streams. The DAB audio programming... analog programming service currently provided to listeners. (b) Emergency information. The emergency...

  1. High-Fidelity Piezoelectric Audio Device

    NASA Technical Reports Server (NTRS)

    Woodward, Stanley E.; Fox, Robert L.; Bryant, Robert G.

    2003-01-01

    ModalMax is a very innovative means of harnessing the vibration of a piezoelectric actuator to produce an energy-efficient, low-profile device with high-bandwidth, high-fidelity audio response. The piezoelectric audio device outperforms many commercially available speakers made using speaker cones. The piezoelectric device weighs substantially less (4 g) than the speaker cones which use magnets (10 g). ModalMax devices are extremely simple to fabricate: the entire audio device is made by lamination. The simplicity of the design lends itself to lower cost. The piezoelectric audio device can be used without its acoustic chambers, resulting in a very low thickness of 0.023 in. (0.58 mm). The piezoelectric audio device can be completely encapsulated, which makes it very attractive for use in wet environments; encapsulation does not significantly alter the audio response. Its small size (see Figure 1) makes it applicable to many consumer electronic products, such as pagers, portable radios, headphones, laptop computers, computer monitors, toys, and electronic games. The audio device can also be used in automobile or aircraft sound systems.

  2. Subjective audio quality evaluation of embedded-optimization-based distortion precompensation algorithms.

    PubMed

    Defraene, Bruno; van Waterschoot, Toon; Diehl, Moritz; Moonen, Marc

    2016-07-01

    Subjective audio quality evaluation experiments have been conducted to assess the performance of embedded-optimization-based precompensation algorithms for mitigating perceptible linear and nonlinear distortion in audio signals. It is concluded with statistical significance that the perceived audio quality is improved by applying an embedded-optimization-based precompensation algorithm, both when (i) nonlinear distortion and (ii) a combination of linear and nonlinear distortion are present. Moreover, a significant positive correlation is reported between the collected subjective and objective PEAQ audio quality scores, supporting the validity of using PEAQ to predict the impact of linear and nonlinear distortion on the perceived audio quality.

  3. Validation of a digital audio recording method for the objective assessment of cough in the horse.

    PubMed

    Duz, M; Whittaker, A G; Love, S; Parkin, T D H; Hughes, K J

    2010-10-01

    To validate the use of digital audio recording and analysis for quantification of coughing in horses. Part A: Nine simultaneous digital audio and video recordings were collected individually from seven stabled horses over a 1 h period using a digital audio recorder attached to the halter. Audio files were analysed using audio analysis software. Video and audio recordings were analysed for cough count and timing by two blinded operators on two occasions using a randomised study design for determination of intra-operator and inter-operator agreement. Part B: Seventy-eight hours of audio recordings obtained from nine horses were analysed once by two blinded operators to assess inter-operator repeatability on a larger sample. Part A: There was complete agreement between audio and video analyses and inter- and intra-operator analyses. Part B: There was >97% agreement between operators on number and timing of 727 coughs recorded over 78 h. The results of this study suggest that the cough monitor methodology used has excellent sensitivity and specificity for the objective assessment of cough in horses and intra- and inter-operator variability of recorded coughs is minimal. Crown Copyright 2010. Published by Elsevier India Pvt Ltd. All rights reserved.

  4. Automatic Detection of Whole Night Snoring Events Using Non-Contact Microphone

    PubMed Central

    Dafna, Eliran; Tarasiuk, Ariel; Zigel, Yaniv

    2013-01-01

    Objective Although awareness of sleep disorders is increasing, limited information is available on whole night detection of snoring. Our study aimed to develop and validate a robust, high performance, and sensitive whole-night snore detector based on non-contact technology. Design Sounds during polysomnography (PSG) were recorded using a directional condenser microphone placed 1 m above the bed. An AdaBoost classifier was trained and validated on manually labeled snoring and non-snoring acoustic events. Patients Sixty-seven subjects (age 52.5±13.5 years, BMI 30.8±4.7 kg/m2, m/f 40/27) referred for PSG for obstructive sleep apnea diagnoses were prospectively and consecutively recruited. Twenty-five subjects were used for the design study; the validation study was blindly performed on the remaining forty-two subjects. Measurements and Results To train the proposed sound detector, >76,600 acoustic episodes collected in the design study were manually classified by three scorers into snore and non-snore episodes (e.g., bedding noise, coughing, environmental). A feature selection process was applied to select the most discriminative features extracted from time and spectral domains. The average snore/non-snore detection rate (accuracy) for the design group was 98.4% based on a ten-fold cross-validation technique. When tested on the validation group, the average detection rate was 98.2% with sensitivity of 98.0% (snore as a snore) and specificity of 98.3% (noise as noise). Conclusions Audio-based features extracted from time and spectral domains can accurately discriminate between snore and non-snore acoustic events. This audio analysis approach enables detection and analysis of snoring sounds from a full night in order to produce quantified measures for objective follow-up of patients. PMID:24391903

  5. Automatic detection of whole night snoring events using non-contact microphone.

    PubMed

    Dafna, Eliran; Tarasiuk, Ariel; Zigel, Yaniv

    2013-01-01

    Although awareness of sleep disorders is increasing, limited information is available on whole night detection of snoring. Our study aimed to develop and validate a robust, high performance, and sensitive whole-night snore detector based on non-contact technology. Sounds during polysomnography (PSG) were recorded using a directional condenser microphone placed 1 m above the bed. An AdaBoost classifier was trained and validated on manually labeled snoring and non-snoring acoustic events. Sixty-seven subjects (age 52.5 ± 13.5 years, BMI 30.8 ± 4.7 kg/m(2), m/f 40/27) referred for PSG for obstructive sleep apnea diagnoses were prospectively and consecutively recruited. Twenty-five subjects were used for the design study; the validation study was blindly performed on the remaining forty-two subjects. To train the proposed sound detector, >76,600 acoustic episodes collected in the design study were manually classified by three scorers into snore and non-snore episodes (e.g., bedding noise, coughing, environmental). A feature selection process was applied to select the most discriminative features extracted from time and spectral domains. The average snore/non-snore detection rate (accuracy) for the design group was 98.4% based on a ten-fold cross-validation technique. When tested on the validation group, the average detection rate was 98.2% with sensitivity of 98.0% (snore as a snore) and specificity of 98.3% (noise as noise). Audio-based features extracted from time and spectral domains can accurately discriminate between snore and non-snore acoustic events. This audio analysis approach enables detection and analysis of snoring sounds from a full night in order to produce quantified measures for objective follow-up of patients.
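
    The training and evaluation recipe described above (an AdaBoost classifier over time- and spectral-domain features, scored by ten-fold cross-validation) can be sketched with scikit-learn as follows; the feature matrix, labels, and estimator settings are placeholders rather than the authors' configuration.

      # AdaBoost snore/non-snore classifier with ten-fold cross-validation
      # (random placeholder features and labels).
      import numpy as np
      from sklearn.ensemble import AdaBoostClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(1000, 34))    # stand-in time/spectral features per acoustic episode
      y = rng.integers(0, 2, size=1000)  # stand-in labels: 1 = snore, 0 = non-snore

      clf = AdaBoostClassifier(n_estimators=200)
      scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
      print(f"10-fold accuracy: {scores.mean():.1%} +/- {scores.std():.1%}")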

  6. 47 CFR 73.9005 - Compliance requirements for covered demodulator products: Audio.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... products: Audio. 73.9005 Section 73.9005 Telecommunication FEDERAL COMMUNICATIONS COMMISSION (CONTINUED....9005 Compliance requirements for covered demodulator products: Audio. Except as otherwise provided in §§ 73.9003(a) or 73.9004(a), covered demodulator products shall not output the audio portions of...

  7. 36 CFR 1002.12 - Audio disturbances.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 36 Parks, Forests, and Public Property 3 2014-07-01 2014-07-01 false Audio disturbances. 1002.12... RECREATION § 1002.12 Audio disturbances. (a) The following are prohibited: (1) Operating motorized equipment or machinery such as an electric generating plant, motor vehicle, motorized toy, or an audio device...

  8. 36 CFR 1002.12 - Audio disturbances.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 36 Parks, Forests, and Public Property 3 2012-07-01 2012-07-01 false Audio disturbances. 1002.12... RECREATION § 1002.12 Audio disturbances. (a) The following are prohibited: (1) Operating motorized equipment or machinery such as an electric generating plant, motor vehicle, motorized toy, or an audio device...

  9. 50 CFR 27.72 - Audio equipment.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 50 Wildlife and Fisheries 6 2010-10-01 2010-10-01 false Audio equipment. 27.72 Section 27.72 Wildlife and Fisheries UNITED STATES FISH AND WILDLIFE SERVICE, DEPARTMENT OF THE INTERIOR (CONTINUED) THE... Audio equipment. The operation or use of audio devices including radios, recording and playback devices...

  10. 36 CFR 1002.12 - Audio disturbances.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 36 Parks, Forests, and Public Property 3 2011-07-01 2011-07-01 false Audio disturbances. 1002.12... RECREATION § 1002.12 Audio disturbances. (a) The following are prohibited: (1) Operating motorized equipment or machinery such as an electric generating plant, motor vehicle, motorized toy, or an audio device...

  11. 36 CFR 1002.12 - Audio disturbances.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 36 Parks, Forests, and Public Property 3 2010-07-01 2010-07-01 false Audio disturbances. 1002.12... RECREATION § 1002.12 Audio disturbances. (a) The following are prohibited: (1) Operating motorized equipment or machinery such as an electric generating plant, motor vehicle, motorized toy, or an audio device...

  12. 50 CFR 27.72 - Audio equipment.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 50 Wildlife and Fisheries 8 2011-10-01 2011-10-01 false Audio equipment. 27.72 Section 27.72 Wildlife and Fisheries UNITED STATES FISH AND WILDLIFE SERVICE, DEPARTMENT OF THE INTERIOR (CONTINUED) THE... Audio equipment. The operation or use of audio devices including radios, recording and playback devices...

  13. 50 CFR 27.72 - Audio equipment.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 50 Wildlife and Fisheries 9 2012-10-01 2012-10-01 false Audio equipment. 27.72 Section 27.72 Wildlife and Fisheries UNITED STATES FISH AND WILDLIFE SERVICE, DEPARTMENT OF THE INTERIOR (CONTINUED) THE... Audio equipment. The operation or use of audio devices including radios, recording and playback devices...

  14. 47 CFR 87.483 - Audio visual warning systems.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 47 Telecommunication 5 2014-10-01 2014-10-01 false Audio visual warning systems. 87.483 Section 87... AVIATION SERVICES Stations in the Radiodetermination Service § 87.483 Audio visual warning systems. An audio visual warning system (AVWS) is a radar-based obstacle avoidance system. AVWS activates...

  15. Effect of Audio Coaching on Correlation of Abdominal Displacement With Lung Tumor Motion

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nakamura, Mitsuhiro; Narita, Yuichiro; Matsuo, Yukinori

    2009-10-01

    Purpose: To assess the effect of audio coaching on the time-dependent behavior of the correlation between abdominal motion and lung tumor motion and the corresponding lung tumor position mismatches. Methods and Materials: Six patients who had a lung tumor with a motion range >8 mm were enrolled in the present study. Breathing-synchronized fluoroscopy was performed initially without audio coaching, followed by fluoroscopy with recorded audio coaching for multiple days. Two different measurements, anteroposterior abdominal displacement using the real-time positioning management system and superoinferior (SI) lung tumor motion by X-ray fluoroscopy, were performed simultaneously. Their sequential images were recorded using one display system. The lung tumor position was automatically detected with a template matching technique. The relationship between the abdominal and lung tumor motion was analyzed with and without audio coaching. Results: The mean SI tumor displacement was 10.4 mm without audio coaching and increased to 23.0 mm with audio coaching (p < .01). The correlation coefficients ranged from 0.89 to 0.97 with free breathing. Applying audio coaching, the correlation coefficients improved significantly (range, 0.93-0.99; p < .01), and the SI lung tumor position mismatches became larger in 75% of all sessions. Conclusion: Audio coaching served to increase the degree of correlation and make it more reproducible. In addition, the phase shifts between tumor motion and abdominal displacement were improved; however, all patients breathed more deeply, and the SI lung tumor position mismatches became slightly larger with audio coaching than without audio coaching.
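
    The two quantities at the core of this analysis, the correlation between abdominal displacement and SI tumor motion and the phase shift between the two traces, can be illustrated on synthetic signals as below; the frame rate, waveforms, and phase lag are assumptions for demonstration only.

      # Correlation and cross-correlation lag between synthetic abdominal
      # and tumor motion traces (not patient data).
      import numpy as np
      from scipy.signal import correlate, correlation_lags

      fs = 30.0                                    # assumed sampling rate (Hz)
      t = np.arange(0, 60, 1 / fs)
      abdomen = np.sin(2 * np.pi * 0.25 * t)       # synthetic AP abdominal trace
      tumor = np.sin(2 * np.pi * 0.25 * t - 0.4)   # synthetic SI tumor trace, phase-lagged

      r = np.corrcoef(abdomen, tumor)[0, 1]
      xcorr = correlate(tumor - tumor.mean(), abdomen - abdomen.mean(), mode="full")
      lag = correlation_lags(len(tumor), len(abdomen), mode="full")[np.argmax(xcorr)]
      print(f"correlation r = {r:.3f}, lag at cross-correlation peak = {lag / fs:.2f} s")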

  16. 47 CFR 10.520 - Common audio attention signal.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 47 Telecommunication 1 2011-10-01 2011-10-01 false Common audio attention signal. 10.520 Section... Equipment Requirements § 10.520 Common audio attention signal. A Participating CMS Provider and equipment manufacturers may only market devices for public use under part 10 that include an audio attention signal that...

  17. 36 CFR 2.12 - Audio disturbances.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 36 Parks, Forests, and Public Property 1 2012-07-01 2012-07-01 false Audio disturbances. 2.12... RESOURCE PROTECTION, PUBLIC USE AND RECREATION § 2.12 Audio disturbances. (a) The following are prohibited..., motorized toy, or an audio device, such as a radio, television set, tape deck or musical instrument, in a...

  18. 36 CFR 2.12 - Audio disturbances.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 36 Parks, Forests, and Public Property 1 2010-07-01 2010-07-01 false Audio disturbances. 2.12... RESOURCE PROTECTION, PUBLIC USE AND RECREATION § 2.12 Audio disturbances. (a) The following are prohibited..., motorized toy, or an audio device, such as a radio, television set, tape deck or musical instrument, in a...

  19. 37 CFR 202.22 - Acquisition and deposit of unpublished audio and audiovisual transmission programs.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... unpublished audio and audiovisual transmission programs. 202.22 Section 202.22 Patents, Trademarks, and... REGISTRATION OF CLAIMS TO COPYRIGHT § 202.22 Acquisition and deposit of unpublished audio and audiovisual... and copies of unpublished audio and audiovisual transmission programs by the Library of Congress under...

  20. 36 CFR § 1002.12 - Audio disturbances.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 36 Parks, Forests, and Public Property 3 2013-07-01 2012-07-01 true Audio disturbances. § 1002.12... RECREATION § 1002.12 Audio disturbances. (a) The following are prohibited: (1) Operating motorized equipment or machinery such as an electric generating plant, motor vehicle, motorized toy, or an audio device...

  1. 47 CFR 10.520 - Common audio attention signal.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 47 Telecommunication 1 2013-10-01 2013-10-01 false Common audio attention signal. 10.520 Section... Equipment Requirements § 10.520 Common audio attention signal. A Participating CMS Provider and equipment manufacturers may only market devices for public use under part 10 that include an audio attention signal that...

  2. 37 CFR 202.22 - Acquisition and deposit of unpublished audio and audiovisual transmission programs.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... unpublished audio and audiovisual transmission programs. 202.22 Section 202.22 Patents, Trademarks, and... REGISTRATION OF CLAIMS TO COPYRIGHT § 202.22 Acquisition and deposit of unpublished audio and audiovisual... and copies of unpublished audio and audiovisual transmission programs by the Library of Congress under...

  3. 36 CFR 2.12 - Audio disturbances.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 36 Parks, Forests, and Public Property 1 2013-07-01 2013-07-01 false Audio disturbances. 2.12... RESOURCE PROTECTION, PUBLIC USE AND RECREATION § 2.12 Audio disturbances. (a) The following are prohibited..., motorized toy, or an audio device, such as a radio, television set, tape deck or musical instrument, in a...

  4. 37 CFR 202.22 - Acquisition and deposit of unpublished audio and audiovisual transmission programs.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... unpublished audio and audiovisual transmission programs. 202.22 Section 202.22 Patents, Trademarks, and... REGISTRATION OF CLAIMS TO COPYRIGHT § 202.22 Acquisition and deposit of unpublished audio and audiovisual... and copies of unpublished audio and audiovisual transmission programs by the Library of Congress under...

  5. ENERGY STAR Certified Audio Video

    EPA Pesticide Factsheets

    Certified models meet all ENERGY STAR requirements as listed in the Version 3.0 ENERGY STAR Program Requirements for Audio Video Equipment that are effective as of May 1, 2013. A detailed listing of key efficiency criteria is available at http://www.energystar.gov/index.cfm?c=audio_dvd.pr_crit_audio_dvd

  6. 36 CFR 2.12 - Audio disturbances.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 36 Parks, Forests, and Public Property 1 2014-07-01 2014-07-01 false Audio disturbances. 2.12... RESOURCE PROTECTION, PUBLIC USE AND RECREATION § 2.12 Audio disturbances. (a) The following are prohibited..., motorized toy, or an audio device, such as a radio, television set, tape deck or musical instrument, in a...

  7. 47 CFR 11.33 - EAS Decoder.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ...: (1) Inputs. Decoders must have the capability to receive at least two audio inputs from EAS... externally, at least two minutes of audio or text messages. A decoder manufactured without an internal means to record and store audio or text must be equipped with a means (such as an audio or digital jack...

  8. 47 CFR 11.33 - EAS Decoder.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ...: (1) Inputs. Decoders must have the capability to receive at least two audio inputs from EAS... externally, at least two minutes of audio or text messages. A decoder manufactured without an internal means to record and store audio or text must be equipped with a means (such as an audio or digital jack...

  9. 47 CFR 10.520 - Common audio attention signal.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 47 Telecommunication 1 2014-10-01 2014-10-01 false Common audio attention signal. 10.520 Section... Equipment Requirements § 10.520 Common audio attention signal. A Participating CMS Provider and equipment manufacturers may only market devices for public use under part 10 that include an audio attention signal that...

  10. 37 CFR 202.22 - Acquisition and deposit of unpublished audio and audiovisual transmission programs.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... unpublished audio and audiovisual transmission programs. 202.22 Section 202.22 Patents, Trademarks, and... REGISTRATION OF CLAIMS TO COPYRIGHT § 202.22 Acquisition and deposit of unpublished audio and audiovisual... and copies of unpublished audio and audiovisual transmission programs by the Library of Congress under...

  11. 47 CFR 10.520 - Common audio attention signal.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 47 Telecommunication 1 2012-10-01 2012-10-01 false Common audio attention signal. 10.520 Section... Equipment Requirements § 10.520 Common audio attention signal. A Participating CMS Provider and equipment manufacturers may only market devices for public use under part 10 that include an audio attention signal that...

  12. 47 CFR 11.33 - EAS Decoder.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ...: (1) Inputs. Decoders must have the capability to receive at least two audio inputs from EAS... externally, at least two minutes of audio or text messages. A decoder manufactured without an internal means to record and store audio or text must be equipped with a means (such as an audio or digital jack...

  13. 36 CFR 2.12 - Audio disturbances.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 36 Parks, Forests, and Public Property 1 2011-07-01 2011-07-01 false Audio disturbances. 2.12... RESOURCE PROTECTION, PUBLIC USE AND RECREATION § 2.12 Audio disturbances. (a) The following are prohibited..., motorized toy, or an audio device, such as a radio, television set, tape deck or musical instrument, in a...

  14. Content-based analysis of news video

    NASA Astrophysics Data System (ADS)

    Yu, Junqing; Zhou, Dongru; Liu, Huayong; Cai, Bo

    2001-09-01

    In this paper, we present a schema for content-based analysis of broadcast news video. First, we separate commercials from news using audiovisual features. Then, we automatically organize news programs into a content hierarchy at various levels of abstraction via effective integration of the video, audio, and text data available from the news programs. Based on these news video structure and content analysis technologies, a TV news video library is generated, from which users can retrieve specific news stories on demand.

  15. Zipf's Law in Short-Time Timbral Codings of Speech, Music, and Environmental Sound Signals

    PubMed Central

    Haro, Martín; Serrà, Joan; Herrera, Perfecto; Corral, Álvaro

    2012-01-01

    Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis of the intrinsic characteristics of the most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that speech and music databases have specific, distinctive code-words while, in the case of the environmental sounds, these database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources. PMID:22479497
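
    The rank-frequency analysis described above can be sketched in a few lines: count code-word occurrences, sort by frequency, and estimate the Zipf exponent from a log-log least-squares fit. The toy code-word sequence below stands in for the 740-hour corpus, so its exponent will not match the reported value.

      # Rank-frequency distribution and Zipf exponent estimate for a toy
      # sequence of timbral code-words.
      import numpy as np
      from collections import Counter

      codewords = ["a", "b", "a", "c", "a", "b", "d", "a", "b", "c", "e", "a"]
      freqs = np.array(sorted(Counter(codewords).values(), reverse=True), dtype=float)
      ranks = np.arange(1, len(freqs) + 1)

      slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
      print(f"estimated Zipf exponent ~ {-slope:.2f}")  # close to 1 for Zipfian data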

  16. Algorithms for highway-speed acoustic impact-echo evaluation of concrete bridge decks

    NASA Astrophysics Data System (ADS)

    Mazzeo, Brian A.; Guthrie, W. Spencer

    2018-04-01

    A new acoustic impact-echo testing device has been developed for detecting and mapping delaminations in concrete bridge decks at highway speeds. The apparatus produces nearly continuous acoustic excitation of concrete bridge decks through rolling mats of chains that are placed around six wheels mounted to a hinged trailer. The wheels approximately span the width of a traffic lane, and the ability to remotely lower and raise the apparatus using a winch system allows continuous data collection without stationary traffic control or exposure of personnel to traffic. Microphones near the wheels are used to record the acoustic response of the bridge deck during testing. In conjunction with the development of this new apparatus, advances in the algorithms required for data analysis were needed. This paper describes the general framework of the algorithms developed for converting differential global positioning system data and multi-channel audio data into maps that can be used in support of engineering decisions about bridge deck maintenance, rehabilitation, and replacement (MR&R). Acquisition of position and audio data is coordinated on a laptop computer through a custom graphical user interface. All of the streams of data are synchronized with the universal computer time so that audio data can be associated with interpolated position information through data post-processing. The audio segments are individually processed according to particular detection algorithms that can adapt to variations in microphone sensitivity or particular chain excitations. Features that are greater than a predetermined threshold, which is held constant throughout the analysis, are then subjected to further analysis and included in a map that shows the results of the testing. Maps of data collected on a bridge deck using the new acoustic impact-echo testing device at different speeds ranging from approximately 10 km/h to 55 km/h indicate that the collected data are reasonably repeatable. Use of the new acoustic impact-echo testing device is expected to enable more informed decisions about MR&R of concrete bridge decks.
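
    The synchronization step described above, assigning each audio segment an interpolated position via shared computer timestamps before mapping above-threshold detections, can be sketched as follows; the GPS fixes, audio timing, detection scores, and threshold value are invented placeholders for the real data streams.

      # Interpolating GPS positions onto audio-segment timestamps and
      # flagging segments whose detection score exceeds a fixed threshold.
      import numpy as np

      gps_t = np.array([0.0, 1.0, 2.0, 3.0])   # GPS fix times (s), computer clock
      gps_x = np.array([0.0, 4.2, 8.5, 12.9])  # distance along deck (m)

      audio_t = np.arange(0.0, 3.0, 0.1)           # audio segment times (s), same clock
      audio_x = np.interp(audio_t, gps_t, gps_x)   # interpolated segment positions

      score = np.abs(np.random.default_rng(1).normal(size=audio_t.size))  # stand-in scores
      threshold = 1.5                              # constant threshold, as described above
      print(f"{(score > threshold).sum()} segments at positions "
            f"{np.round(audio_x[score > threshold], 1)} m exceed the threshold")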

  17. Multimodal fusion of polynomial classifiers for automatic person recognition

    NASA Astrophysics Data System (ADS)

    Broun, Charles C.; Zhang, Xiaozheng

    2001-03-01

    With the prevalence of the information age, privacy and personalization are forefront in today's society. As such, biometrics are viewed as essential components of current evolving technological systems. Consumers demand unobtrusive and non-invasive approaches. In our previous work, we have demonstrated a speaker verification system that meets these criteria. However, there are additional constraints for fielded systems. The required recognition transactions are often performed in adverse environments and across diverse populations, necessitating robust solutions. There are two significant problem areas in current generation speaker verification systems. The first is the difficulty in acquiring clean audio signals in all environments without encumbering the user with a head-mounted close-talking microphone. Second, unimodal biometric systems do not work with a significant percentage of the population. To combat these issues, multimodal techniques are being investigated to improve system robustness to environmental conditions, as well as improve overall accuracy across the population. We propose a multimodal approach that builds on our current state-of-the-art speaker verification technology. In order to maintain the transparent nature of the speech interface, we focus on optical sensing technology to provide the additional modality, giving us an audio-visual person recognition system. For the audio domain, we use our existing speaker verification system. For the visual domain, we focus on lip motion. This is chosen, rather than static face or iris recognition, because it provides dynamic information about the individual. In addition, the lip dynamics can aid speech recognition to provide liveness testing. The visual processing method makes use of both color and edge information, combined within a Markov random field (MRF) framework, to localize the lips. Geometric features are extracted and input to a polynomial classifier for the person recognition process. A late integration approach, based on a probabilistic model, is employed to combine the two modalities. The system is tested on the XM2VTS database combined with AWGN in the audio domain over a range of signal-to-noise ratios.
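
    The late-integration step can be sketched generically: per-modality classifier scores are treated as log-likelihood ratios and combined with reliability weights before thresholding. The weights, scores, and fuse helper below are illustrative assumptions, not the authors' probabilistic model.

      # Late fusion of audio (speaker) and visual (lip-motion) scores via a
      # weighted sum of per-modality log-likelihood ratios (made-up values).
      import numpy as np

      def fuse(llr_audio, llr_visual, w_audio=0.6, w_visual=0.4):
          return w_audio * llr_audio + w_visual * llr_visual

      llr_a = np.array([2.3, -1.1, 0.4])   # hypothetical audio scores per trial
      llr_v = np.array([1.8, -0.2, -0.9])  # hypothetical visual scores per trial

      print(fuse(llr_a, llr_v) > 0.0)      # accept where the fused score is positive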

  18. Metal Sounds Stiffer than Drums for Ears, but Not Always for Hands: Low-Level Auditory Features Affect Multisensory Stiffness Perception More than High-Level Categorical Information

    PubMed Central

    Liu, Juan; Ando, Hiroshi

    2016-01-01

    Most real-world events stimulate multiple sensory modalities simultaneously. Usually, the stiffness of an object is perceived haptically. However, auditory signals also contain stiffness-related information, and people can form impressions of stiffness from the different impact sounds of metal, wood, or glass. To understand whether there is any interaction between auditory and haptic stiffness perception, and if so, whether the inferred material category is the most relevant auditory information, we conducted experiments using a force-feedback device and the modal synthesis method to present haptic stimuli and impact sound in accordance with participants’ actions, and to modulate low-level acoustic parameters, i.e., frequency and damping, without changing the inferred material categories of sound sources. We found that metal sounds consistently induced an impression of stiffer surfaces than did drum sounds in the audio-only condition, but participants haptically perceived surfaces with modulated metal sounds as significantly softer than the same surfaces with modulated drum sounds, which directly opposes the impression induced by these sounds alone. This result indicates that, although the inferred material category is strongly associated with audio-only stiffness perception, low-level acoustic parameters, especially damping, are more tightly integrated with haptic signals than the material category is. Frequency played an important role in both audio-only and audio-haptic conditions. Our study provides evidence that auditory information influences stiffness perception differently in unisensory and multisensory tasks. Furthermore, the data demonstrated that sounds with higher frequency and/or shorter decay time tended to be judged as stiffer, and contact sounds of stiff objects had no effect on the haptic perception of soft surfaces. We argue that the intrinsic physical relationship between object stiffness and acoustic parameters may be applied as prior knowledge to achieve robust estimation of stiffness in multisensory perception. PMID:27902718
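
    The stimulus-manipulation idea, synthesizing impact sounds as sums of exponentially damped sinusoids so that frequency and damping can be varied independently of the inferred material category, can be sketched generically as below; the modal frequencies, damping rates, and amplitudes are invented values, not the authors' synthesis parameters.

      # Modal synthesis of impact sounds: a sum of damped sinusoids, with
      # slow decay suggesting metal and fast decay suggesting a drum head.
      import numpy as np

      def impact_sound(freqs, dampings, amps, fs=44100, dur=0.5):
          t = np.arange(int(fs * dur)) / fs
          return sum(a * np.exp(-d * t) * np.sin(2 * np.pi * f * t)
                     for f, d, a in zip(freqs, dampings, amps))

      metal_like = impact_sound([800, 2100, 3400], [6, 9, 14], [1.0, 0.6, 0.3])
      drum_like = impact_sound([180, 420, 700], [40, 55, 70], [1.0, 0.5, 0.2])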

  19. Sounding ruins: reflections on the production of an 'audio drift'.

    PubMed

    Gallagher, Michael

    2015-07-01

    This article is about the use of audio media in researching places, which I term 'audio geography'. The article narrates some episodes from the production of an 'audio drift', an experimental environmental sound work designed to be listened to on a portable MP3 player whilst walking in a ruinous landscape. Reflecting on how this work functions, I argue that, as well as representing places, audio geography can shape listeners' attention and bodily movements, thereby reworking places, albeit temporarily. I suggest that audio geography is particularly apt for amplifying the haunted and uncanny qualities of places. I discuss some of the issues raised for research ethics, epistemology and spectral geographies.

  20. Sounding ruins: reflections on the production of an ‘audio drift’

    PubMed Central

    Gallagher, Michael

    2014-01-01

    This article is about the use of audio media in researching places, which I term ‘audio geography’. The article narrates some episodes from the production of an ‘audio drift’, an experimental environmental sound work designed to be listened to on a portable MP3 player whilst walking in a ruinous landscape. Reflecting on how this work functions, I argue that, as well as representing places, audio geography can shape listeners’ attention and bodily movements, thereby reworking places, albeit temporarily. I suggest that audio geography is particularly apt for amplifying the haunted and uncanny qualities of places. I discuss some of the issues raised for research ethics, epistemology and spectral geographies. PMID:29708107
