Sample records for multi-rate speech codec

  1. Performance of a low data rate speech codec for land-mobile satellite communications

    NASA Technical Reports Server (NTRS)

    Gersho, Allen; Jedrey, Thomas C.

    1990-01-01

    In an effort to foster the development of new technologies for the emerging land mobile satellite communications services, JPL funded two development contracts in 1984: one to the Univ. of Calif., Santa Barbara and the other to the Georgia Inst. of Technology, to develop algorithms and real time hardware for near toll quality speech compression at 4800 bits per second. Both universities have developed and delivered speech codecs to JPL, and the UCSB codec was extensively tested by JPL in a variety of experimental setups. The basic UCSB speech codec algorithms and the test results of the various experiments performed with this codec are presented.

  2. More About Vector Adaptive/Predictive Coding Of Speech

    NASA Technical Reports Server (NTRS)

    Jedrey, Thomas C.; Gersho, Allen

    1992-01-01

    Report presents additional information about digital speech-encoding and -decoding system described in "Vector Adaptive/Predictive Encoding of Speech" (NPO-17230). Summarizes development of vector adaptive/predictive coding (VAPC) system and describes basic functions of algorithm. Describes refinements introduced to enable receiver to cope with errors. VAPC algorithm implemented in integrated-circuit coding/decoding processors (codecs). VAPC and other codecs tested under variety of operating conditions. Tests designed to reveal effects of quiet and noisy background environments and of poor telephone equipment. VAPC found competitive with, and in some respects superior to, other 4.8-kb/s codecs and other codecs of similar complexity.

  3. Development of an 8000 bps voice codec for AvSat

    NASA Technical Reports Server (NTRS)

    Clark, Joseph F.

    1988-01-01

    Air-mobile speech communication applications share robustness and noise-immunity requirements with other mobile applications. The quality requirements are stringent, especially in the cockpit, where air safety is involved. Based on these considerations, a decision was made to test intermediate data rates such as 8.0 and 9.6 kb/s using proven technologies. A number of vocoder and codec technologies were investigated at rates ranging from 2.4 kb/s up to and including 9.6 kb/s. The proven vocoders operating at 2.4 and 4.8 kb/s lacked the noise immunity or the robustness to operate reliably in a cabin noise environment. One very attractive alternative approach was Spectrally Encoded Residual Excited LPC (SE-RELP), which is used in a multi-rate voice processor (MRP) developed at the Naval Research Lab (NRL). The MRP uses SE-RELP at rates of 9.6 and 16 kb/s. The 9.6 kb/s rate can be lowered to 8.0 kb/s without loss of information by modifying the frame. An 8.0 kb/s vocoder was developed using SE-RELP as a demonstrator and testbed. This demonstrator is implemented in real time using two Compaq 2 portable computers, each equipped with an ARIEL DSP016 Data Acquisition Processor.

  4. Audiovisual signal compression: the 64/P codecs

    NASA Astrophysics Data System (ADS)

    Jayant, Nikil S.

    1996-02-01

    Video codecs operating at integral multiples of 64 kbps are well-known in visual communications technology as p * 64 systems (p equals 1 to 24). Originally developed as a class of ITU standards, these codecs have served as core technology for videoconferencing, and they have also influenced the MPEG standards for addressable video. Video compression in the above systems is provided by motion compensation followed by discrete cosine transform -- quantization of the residual signal. Notwithstanding the promise of higher bit rates in emerging generations of networks and storage devices, there is a continuing need for facile audiovisual communications over voiceband and wireless modems. Consequently, video compression at bit rates lower than 64 kbps is a widely-sought capability. In particular, video codecs operating at rates in the neighborhood of 64, 32, 16, and 8 kbps seem to have great practical value, being matched respectively to the transmission capacities of basic rate ISDN (64 kbps), and voiceband modems that represent high (32 kbps), medium (16 kbps) and low-end (8 kbps) grades in current modem technology. The purpose of this talk is to describe the state of video technology at these transmission rates, without getting too literal about the specific speeds mentioned above. In other words, we expect codecs designed for non-submultiples of 64 kbps, such as 56 kbps or 19.2 kbps, as well as for submultiples of 64 kbps, depending on varying constraints on modem rate and the transmission rate needed for the voice-coding part of the audiovisual communications link. The MPEG-4 video standards process is a natural platform on which to examine current capabilities in sub-ISDN rate video coding, and we shall draw appropriately from this process in describing video codec performance. Inherent in this summary is a reinforcement of motion compensation and DCT as viable building blocks of video compression systems, although there is a need for improving signal quality even in the very best of these systems. In a related part of our talk, we discuss the role of preprocessing and postprocessing subsystems which serve to enhance the performance of an otherwise standard codec. Examples of these (sometimes proprietary) subsystems are automatic face-tracking prior to the coding of a head-and-shoulders scene, and adaptive postfiltering after conventional decoding, to reduce generic classes of artifacts in low bit rate video. The talk concludes with a summary of technology targets and research directions. We discuss targets in terms of four fundamental parameters of coder performance: quality, bit rate, delay and complexity; and we emphasize the need for measuring and maximizing the composite quality of the audiovisual signal. In discussing research directions, we examine progress and opportunities in two fundamental approaches for bit rate reduction: removal of statistical redundancy and reduction of perceptual irrelevancy; we speculate on the value of techniques such as analysis-by-synthesis that have proved to be quite valuable in speech coding, and we examine the prospect of integrating speech and image processing for developing next-generation technology for audiovisual communications.

  5. Real-time speech encoding based on Code-Excited Linear Prediction (CELP)

    NASA Technical Reports Server (NTRS)

    Leblanc, Wilfrid P.; Mahmoud, S. A.

    1988-01-01

    This paper reports on ongoing work toward the development of a real-time voice codec for the terrestrial and satellite mobile radio environments. The codec is based on a complexity-reduced version of code-excited linear prediction (CELP). The codebook search complexity was reduced to only 0.5 million floating point operations per second (MFLOPS) while maintaining excellent speech quality. Novel methods to quantize the residual and the long- and short-term model filters are presented.

  6. Scalable Video Transmission Over Multi-Rate Multiple Access Channels

    DTIC Science & Technology

    2007-06-01

    "Rate-compatible punctured convolutional codes (RCPC codes) and their applications," IEEE...source encoded using the MPEG-4 video codec. The source-encoded bitstream is then channel encoded with Rate Compatible Punctured Convolutional (RCPC...Clark, and J. M. Geist, "Punctured convolutional codes of rate (n-1)/n and simplified maximum likelihood decoding," IEEE Transactions on

  7. Flexible high speed CODEC

    NASA Technical Reports Server (NTRS)

    Wernlund, James V.

    1993-01-01

    HARRIS, under contract with NASA Lewis, has developed a hard decision BCH (Bose-Chaudhuri-Hocquenghem) triple-error-correcting block CODEC ASIC that can be used in either a burst or continuous mode. The ASIC contains both encoder and decoder functions, programmable lock thresholds, and PSK-related functions. The CODEC provides up to 4 dB of coding gain for data rates up to 300 Mbps. The code rate is selectable from 7/8 to 15/16, resulting in minimal band spreading for a given BER. Many of the internal calculations are brought out, enabling the CODEC to be incorporated in more complex designs. The ASIC has been tested in BPSK, QPSK, and 16-ary PSK link simulators and found to perform to within 0.1 dB of theory for BERs of 10(exp -2) to 10(exp -9). The ASIC itself, being a hard decision CODEC, is not limited to PSK modulation formats. Unlike most hard decision CODECs, the HARRIS CODEC does not degrade BER performance significantly at high BERs but rather becomes transparent.

  8. Evaluation of voice codecs for the Australian mobile satellite system

    NASA Technical Reports Server (NTRS)

    Bundrock, Tony; Wilkinson, Mal

    1990-01-01

    The evaluation procedure used to choose a low bit rate voice coding algorithm for the Australian land mobile satellite system is described. The procedure is designed to assess both the inherent quality of the codec under 'normal' conditions and its robustness under 'severe' conditions. For the assessment, normal conditions were chosen to be a random bit error rate with added background acoustic noise, and the severe condition is designed to represent the burst error conditions that arise when the mobile satellite channel suffers signal fading due to roadside vegetation. The assessment is divided into two phases. First, a reduced set of conditions is used to determine a short list of candidate codecs for more extensive testing in the second phase. The first-phase conditions include quality and robustness, and codecs are ranked with a 60:40 weighting on the two. Second, the short-listed codecs are assessed over a range of input voice levels, BERs, background noise conditions, and burst error distributions. Assessment is by subjective rating on a five-level opinion scale, and all results are then used to derive a weighted Mean Opinion Score using appropriate weights for each of the test conditions.

  9. Efficient random access high resolution region-of-interest (ROI) image retrieval using backward coding of wavelet trees (BCWT)

    NASA Astrophysics Data System (ADS)

    Corona, Enrique; Nutter, Brian; Mitra, Sunanda; Guo, Jiangling; Karp, Tanja

    2008-03-01

    Efficient retrieval of high quality Regions-Of-Interest (ROI) from high resolution medical images is essential for reliable interpretation and accurate diagnosis. Random access to high quality ROI from codestreams is becoming an essential feature in many still image compression applications, particularly in viewing diseased areas from large medical images. This feature is easier to implement in block-based codecs because of the inherent spatial independence of the code blocks. This independence implies that the decoding order of the blocks is unimportant as long as the position of each is properly identified. In contrast, wavelet-tree-based codecs naturally use some interdependency that exploits the decaying spectrum model of the wavelet coefficients. Thus, one must keep track of the decoding order from level to level with such codecs. We have developed an innovative multi-rate image subband coding scheme using "Backward Coding of Wavelet Trees (BCWT)" which is fast, memory efficient, and resolution scalable. It offers far less complexity than many other existing codecs, including both wavelet-tree and block-based algorithms. The ROI feature in BCWT is implemented through a transcoder stage that generates a new BCWT codestream containing only the information associated with the user-defined ROI. This paper presents an efficient technique that locates a particular ROI within the BCWT coded domain and decodes it back to the spatial domain. This technique allows better access to and proper identification of pathologies in high resolution images, since only a small fraction of the codestream needs to be transmitted and analyzed.

  10. A new display stream compression standard under development in VESA

    NASA Astrophysics Data System (ADS)

    Jacobson, Natan; Thirumalai, Vijayaraghavan; Joshi, Rajan; Goel, James

    2017-09-01

    The Advanced Display Stream Compression (ADSC) codec project is in development in response to a call for technologies from the Video Electronics Standards Association (VESA). This codec targets visually lossless compression of display streams at a high compression rate (typically 6 bits/pixel) for mobile/VR/HDR applications. Functionality of the ADSC codec is described in this paper, and subjective trials results are provided using the ISO 29170-2 testing protocol.

  11. 47 CFR 73.758 - System specifications for digitally modulated emissions in the HF broadcasting service.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... digital audio broadcasting and datacasting are authorized. The RF requirements for the DRM system are... tolerance. The frequency tolerance shall be 10 Hz. See Section 73.757(b)(2), notes 1 and 2. (3) Audio... performance of a speech codec (of the order of 3 kHz). The choice of audio quality is connected to the needs...

  12. M/A-COM Linkabit Eastern Operations

    DTIC Science & Technology

    1983-03-31

    Lincoln Laboratories speech codec for use in multimedia system development. Communication equipment included 1200-bps dial-up modems and a set of...connected to the DCN for use in general word-processing and network-testing applications. Additional modems and video terminals have also been...line 0) can be connected to a second terminal, a printer, or a modem. The standard configuration assumes this line is connected to a terminal or

  13. Modem design for a MOBILESAT terminal

    NASA Technical Reports Server (NTRS)

    Rice, M.; Miller, M. J.; Cowley, W. G.; Rowe, D.

    1990-01-01

    The implementation of a programmable digital signal processor based system, designed for use as a test bed in the development of a digital modem, codec, and channel simulator, is described. Code was written to configure the system as a 5600 bps or 6600 bps QPSK modem. The test bed is currently being used in an experiment to evaluate the performance of digital speech over shadowed channels in the Australian mobile satellite (MOBILESAT) project.

  14. A hybrid video codec based on extended block sizes, recursive integer transforms, improved interpolation, and flexible motion representation

    NASA Astrophysics Data System (ADS)

    Karczewicz, Marta; Chen, Peisong; Joshi, Rajan; Wang, Xianglin; Chien, Wei-Jung; Panchal, Rahul; Coban, Muhammed; Chong, In Suk; Reznik, Yuriy A.

    2011-01-01

    This paper describes the video coding technology proposal submitted by Qualcomm Inc. in response to a joint call for proposals (CfP) issued by ITU-T SG16 Q.6 (VCEG) and ISO/IEC JTC1/SC29/WG11 (MPEG) in January 2010. The proposed video codec follows a hybrid coding approach based on temporal prediction, followed by transform, quantization, and entropy coding of the residual. Some of its key features are extended block sizes (up to 64x64), recursive integer transforms, single-pass switched interpolation filters with offsets (single-pass SIFO), mode-dependent directional transform (MDDT) for intra-coding, luma and chroma high-precision filtering, geometry motion partitioning, and adaptive motion vector resolution. It also incorporates internal bit-depth increase (IBDI) and modified quadtree-based adaptive loop filtering (QALF). Simulation results are presented for a variety of bit rates, resolutions, and coding configurations to demonstrate the high compression efficiency achieved by the proposed video codec at a moderate level of encoding and decoding complexity. For the random access hierarchical B configuration (HierB), the proposed video codec achieves an average BD-rate reduction of 30.88% compared to the H.264/AVC alpha anchor. For the low delay hierarchical P (HierP) configuration, the proposed video codec achieves average BD-rate reductions of 32.96% and 48.57% compared to the H.264/AVC beta and gamma anchors, respectively.

  15. On the definition of adapted audio/video profiles for high-quality video calling services over LTE/4G

    NASA Astrophysics Data System (ADS)

    Ndiaye, Maty; Quinquis, Catherine; Larabi, Mohamed Chaker; Le Lay, Gwenael; Saadane, Hakim; Perrine, Clency

    2014-01-01

    During the last decade, important advances in and widespread availability of mobile technology (operating systems, GPUs, terminal resolution and so on) have encouraged the fast development of voice and video services like video-calling. While multimedia services have largely grown on mobile devices, the resulting increase in data consumption is leading to the saturation of mobile networks. In order to provide data at high bit-rates and maintain performance as close as possible to traditional networks, the 3GPP (The 3rd Generation Partnership Project) worked on a high-performance standard for mobile called Long Term Evolution (LTE). In this paper, we aim at expressing recommendations related to audio and video media profiles (selection of audio and video codecs, bit-rates, frame-rates, audio and video formats) for typical video-calling services held over LTE/4G mobile networks. These profiles are defined according to targeted devices (smartphones, tablets), so as to ensure the best possible quality of experience (QoE). Obtained results indicate that for the CIF format (352 x 288 pixels), which is usually used for smartphones, the VP8 codec provides better image quality than the H.264 codec at low bitrates (from 128 to 384 kbps). However, for sequences with high motion, H.264 in slow mode is preferred. Regarding audio, better results are globally achieved using wideband codecs offering good quality, except for the Opus codec (at 12.2 kbps).

  16. Present state of HDTV coding in Japan and future prospect

    NASA Astrophysics Data System (ADS)

    Murakami, Hitomi

    The development status of HDTV digital codecs in Japan is evaluated; several bit-rate-reduction codecs have been developed for 1125-line/60-field HDTV, and performance trials have been conducted over satellite and optical fiber links. Prospective development efforts will attempt to achieve more efficient coding schemes able to reduce the bit rate to as little as 45 Mbps, as well as to apply coding schemes to asynchronous transfer mode (ATM) networks.

  17. Performance comparison of AV1, HEVC, and JVET video codecs on 360 (spherical) video

    NASA Astrophysics Data System (ADS)

    Topiwala, Pankaj; Dai, Wei; Krishnan, Madhu; Abbas, Adeel; Doshi, Sandeep; Newman, David

    2017-09-01

    This paper compares the coding-efficiency performance on 360 (spherical) videos of three software codecs: (a) the AV1 video codec from the Alliance for Open Media (AOM); (b) the HEVC Reference Software HM; and (c) the JVET JEM Reference SW. Note that 360 video is especially challenging content, in that one codes at full resolution globally but typically looks locally (in a viewport), which magnifies errors. These are tested in two different projection formats, ERP and RSP, to check consistency. Performance is tabulated for 1-pass encoding on two fronts: (1) objective performance based on end-to-end (E2E) metrics such as SPSNR-NN and WS-PSNR, currently developed in the JVET committee; and (2) informal subjective assessment of static viewports. Constant-quality encoding is performed with all three codecs for an unbiased comparison of the core coding tools. Our general conclusion is that under constant-quality coding, AV1 underperforms HEVC, which underperforms JVET. We also test with rate control, where AV1 currently underperforms the open source x265 HEVC codec. Objective and visual evidence is provided.

  18. Design considerations for computationally constrained two-way real-time video communication

    NASA Astrophysics Data System (ADS)

    Bivolarski, Lazar M.; Saunders, Steven E.; Ralston, John D.

    2009-08-01

    Today's video codecs have evolved primarily to meet the requirements of the motion picture and broadcast industries, where high-complexity studio encoding can be utilized to create highly-compressed master copies that are then broadcast one-way for playback using less-expensive, lower-complexity consumer devices for decoding and playback. Related standards activities have largely ignored the computational complexity and bandwidth constraints of wireless or Internet based real-time video communications using devices such as cell phones or webcams. Telecommunications industry efforts to develop and standardize video codecs for applications such as video telephony and video conferencing have not yielded image size, quality, and frame-rate performance that match today's consumer expectations and market requirements for Internet and mobile video services. This paper reviews the constraints and the corresponding video codec requirements imposed by real-time, 2-way mobile video applications. Several promising elements of a new mobile video codec architecture are identified, and more comprehensive computational complexity metrics and video quality metrics are proposed in order to support the design, testing, and standardization of these new mobile video codecs.

  19. Pre-processing SAR image stream to facilitate compression for transport on bandwidth-limited-link

    DOEpatents

    Rush, Bobby G.; Riley, Robert

    2015-09-29

    Pre-processing is applied to a raw VideoSAR (or similar near-video rate) product to transform the image frame sequence into a product that resembles more closely the type of product for which conventional video codecs are designed, while sufficiently maintaining utility and visual quality of the product delivered by the codec.

  20. Digital Signal Processing For Low Bit Rate TV Image Codecs

    NASA Astrophysics Data System (ADS)

    Rao, K. R.

    1987-06-01

    In view of the 56 KBPS digital switched network services and the ISDN, low bit rate codecs for providing real-time full-motion color video are under various stages of development. Some companies have already brought such codecs to market. They are being used by industry and some Federal agencies for video teleconferencing. In general, these codecs have various features such as multiplexing audio and data, high resolution graphics, encryption, error detection and correction, self diagnostics, freeze-frame, split video, text overlay, etc. To transmit the original color video on a 56 KBPS network requires a bit rate reduction of the order of 1400:1. Such large-scale bandwidth compression can be realized only by implementing a number of sophisticated digital signal processing techniques. This paper provides an overview of such techniques and outlines the newer concepts that are being investigated. Before resorting to the data compression techniques, various preprocessing operations such as noise filtering, composite-component transformation, and horizontal and vertical blanking interval removal are to be implemented. Invariably, spatio-temporal subsampling is achieved by appropriate filtering. Transform and/or prediction coupled with motion estimation and strengthened by adaptive features are some of the tools in the arsenal of the data reduction methods. Other essential blocks in the system are the quantizer, bit allocation, buffer, multiplexer, channel coding, etc.
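
    As a quick order-of-magnitude check of the 1400:1 figure quoted above (a sketch only; the raster dimensions, bit depth, and frame rate below are illustrative assumptions, not taken from the paper):

      # Assumed digitized source: 768 x 512 samples, 8 bits, 30 frames/s.
      source_bps = 768 * 512 * 8 * 30         # ~94.4 Mb/s uncompressed
      channel_bps = 56_000                    # 56 KBPS switched network service
      print(round(source_bps / channel_bps))  # ~1686:1, same order as 1400:1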

  1. Parallel efficient rate control methods for JPEG 2000

    NASA Astrophysics Data System (ADS)

    Martínez-del-Amor, Miguel Á.; Bruns, Volker; Sparenberg, Heiko

    2017-09-01

    Since the introduction of JPEG 2000, several rate control methods have been proposed. Among them, post-compression rate-distortion optimization (PCRD-Opt) is the most widely used, and the one recommended by the standard. The approach followed by this method is to first compress the entire image, split into code blocks, and subsequently truncate the set of generated bit streams optimally according to the maximum target bit rate constraint. The literature proposes various strategies for estimating ahead of time where a block will get truncated in order to stop the execution prematurely and save time. However, none of them has been defined with a parallel implementation in mind. Today, multi-core and many-core architectures are becoming popular for JPEG 2000 codec implementations. Therefore, in this paper, we analyze how some techniques for efficient rate control can be deployed on GPUs. In order to do that, the design of our GPU-based codec is extended to allow stopping the process at a given point. This extension also harnesses a higher level of parallelism on the GPU, leading to up to 40% speedup with 4K test material on a Titan X. In a second step, three selected rate control methods are adapted and implemented in our parallel encoder. A comparison is then carried out and used to select the best candidate to be deployed in a GPU encoder, which gave an extra 40% speedup in those situations where it was actually employed.
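
    For readers unfamiliar with PCRD-Opt, the sketch below shows the truncation step in Python under two assumptions: each code block supplies convex-hull truncation candidates as (bytes, distortion) pairs starting at (0, d_max), and a single Lagrangian slope shared by all blocks is found by bisection. The GPU scheduling that is the paper's actual subject is out of scope here.

      def choose_truncation(points, lam):
          # Keep every candidate whose incremental distortion saved per
          # extra byte is at least lam; hull slopes decrease monotonically.
          chosen = points[0]
          for (r0, d0), (r1, d1) in zip(points, points[1:]):
              if (d0 - d1) / (r1 - r0) >= lam:
                  chosen = (r1, d1)
              else:
                  break
          return chosen

      def pcrd_opt(blocks, byte_budget):
          # Bisect the common slope until the summed rate fits the budget.
          lo, hi = 0.0, 1e12
          for _ in range(60):
              lam = (lo + hi) / 2
              rate = sum(choose_truncation(b, lam)[0] for b in blocks)
              lo, hi = (lam, hi) if rate > byte_budget else (lo, lam)
          return [choose_truncation(b, hi) for b in blocks]

      blocks = [[(0, 100.0), (10, 40.0), (25, 20.0)],
                [(0, 80.0), (8, 50.0), (30, 10.0)]]
      print(pcrd_opt(blocks, byte_budget=40))  # -> [(10, 40.0), (30, 10.0)]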

  2. High bit depth infrared image compression via low bit depth codecs

    NASA Astrophysics Data System (ADS)

    Belyaev, Evgeny; Mantel, Claire; Forchhammer, Søren

    2017-08-01

    Future infrared remote sensing systems, such as monitoring of the Earth's environment by satellites, infrastructure inspection by unmanned airborne vehicles, etc., will require 16 bit depth infrared images to be compressed and stored or transmitted for further analysis. Such systems are equipped with low power embedded platforms where image or video data is compressed by a hardware block called the video processing unit (VPU). However, in many cases using two 8-bit VPUs can provide advantages compared with using higher bit depth image compression directly. We propose to compress 16 bit depth images via 8 bit depth codecs in the following way. First, an input 16 bit depth image is mapped into 8 bit depth images, e.g., the first image contains only the most significant bytes (MSB image) and the second one contains only the least significant bytes (LSB image). Then each image is compressed by an image or video codec with an 8 bits per pixel input format. We analyze how the compression parameters for both MSB and LSB images should be chosen to provide the maximum objective quality for a given compression ratio. Finally, we apply the proposed infrared image compression method utilizing JPEG and H.264/AVC codecs, which are usually available in efficient implementations, and compare their rate-distortion performance with the JPEG2000, JPEG-XT, and H.265/HEVC codecs, which support direct compression of infrared images in 16 bit depth format. A preliminary result shows that two 8 bit H.264/AVC codecs can achieve results similar to a 16 bit HEVC codec.
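
    The byte-plane split itself is simple; a minimal NumPy sketch follows (the 8 bit JPEG/H.264 encoding calls and the compression-parameter selection analyzed in the paper are omitted):

      import numpy as np

      def split_planes(img16):
          msb = (img16 >> 8).astype(np.uint8)    # most significant bytes
          lsb = (img16 & 0xFF).astype(np.uint8)  # least significant bytes
          return msb, lsb

      def merge_planes(msb, lsb):
          return (msb.astype(np.uint16) << 8) | lsb

      img = np.random.randint(0, 2**16, (4, 4), dtype=np.uint16)
      msb, lsb = split_planes(img)
      assert np.array_equal(merge_planes(msb, lsb), img)  # lossless round trip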

  3. Using speech recognition to enhance the Tongue Drive System functionality in computer access.

    PubMed

    Huo, Xueliang; Ghovanloo, Maysam

    2011-01-01

    Tongue Drive System (TDS) is a wireless, tongue-operated assistive technology (AT) that can enable people with severe physical disabilities to access computers and drive powered wheelchairs using their volitional tongue movements. TDS offers six discrete commands, simultaneously available to the user, for pointing and typing as substitutes for the mouse and keyboard, respectively, in computer access. To enhance TDS performance in typing, we have added a microphone, an audio codec, and a wireless audio link to its readily available 3-axial magnetic sensor array, and combined it with the commercially available speech recognition software Dragon NaturallySpeaking, which is regarded as one of the most efficient tools for text entry. Our preliminary evaluations indicate that the combined TDS and speech recognition technologies can provide end users with significantly higher performance than either technology alone, particularly in completing tasks that require both pointing and text entry, such as web surfing.

  4. A multi-views multi-learners approach towards dysarthric speech recognition using multi-nets artificial neural networks.

    PubMed

    Shahamiri, Seyed Reza; Salim, Siti Salwah Binti

    2014-09-01

    Automatic speech recognition (ASR) can be very helpful for speakers who suffer from dysarthria, a neurological disability that damages the control of the motor speech articulators. Although a few attempts have been made to apply ASR technologies to sufferers of dysarthria, previous studies show that such ASR systems have not attained an adequate level of performance. In this study, a dysarthric multi-networks speech recognizer (DM-NSR) model is provided using a realization of the multi-views multi-learners approach called multi-nets artificial neural networks, which tolerates the variability of dysarthric speech. In particular, the DM-NSR model employs several ANNs (as learners) to approximate the likelihood of ASR vocabulary words and to deal with the complexity of dysarthric speech. The proposed DM-NSR approach was presented in both speaker-dependent and speaker-independent paradigms. In order to highlight the performance of the proposed model over legacy models, multi-views single-learner versions of the DM-NSRs were also provided and their efficiencies were compared in detail. Moreover, a comparison between the prominent dysarthric ASR methods and the proposed one is provided. The results show that the DM-NSR improved the recognition rate by up to 24.67% and reduced the error rate by up to 8.63% over the reference model.

  5. Using Speech Recognition to Enhance the Tongue Drive System Functionality in Computer Access

    PubMed Central

    Huo, Xueliang; Ghovanloo, Maysam

    2013-01-01

    Tongue Drive System (TDS) is a wireless, tongue-operated assistive technology (AT) that can enable people with severe physical disabilities to access computers and drive powered wheelchairs using their volitional tongue movements. TDS offers six discrete commands, simultaneously available to the user, for pointing and typing as substitutes for the mouse and keyboard, respectively, in computer access. To enhance TDS performance in typing, we have added a microphone, an audio codec, and a wireless audio link to its readily available 3-axial magnetic sensor array, and combined it with the commercially available speech recognition software Dragon NaturallySpeaking, which is regarded as one of the most efficient tools for text entry. Our preliminary evaluations indicate that the combined TDS and speech recognition technologies can provide end users with significantly higher performance than either technology alone, particularly in completing tasks that require both pointing and text entry, such as web surfing. PMID:22255801

  6. Flexible High Speed Codec (FHSC)

    NASA Technical Reports Server (NTRS)

    Segallis, G. P.; Wernlund, J. V.

    1991-01-01

    The ongoing NASA/Harris Flexible High Speed Codec (FHSC) program is described. The program objectives are to design and build an encoder/decoder that allows operation in either burst or continuous modes at data rates of up to 300 megabits per second. The decoder handles both hard and soft decision decoding and can switch between modes on a burst-by-burst basis. Bandspreading is low since the code rate is greater than or equal to 7/8. The encoder and a hard decision decoder fit on a single application specific integrated circuit (ASIC) chip. A soft decision applique is implemented using 300K emitter-coupled logic (ECL), which can be easily translated to an ECL gate array.

  7. Advanced modulation technology development for earth station demodulator applications. Coded modulation system development

    NASA Technical Reports Server (NTRS)

    Miller, Susan P.; Kappes, J. Mark; Layer, David H.; Johnson, Peter N.

    1990-01-01

    A jointly optimized coded modulation system is described which was designed, built, and tested by COMSAT Laboratories for NASA LeRC; it provides a bandwidth efficiency of 2 bits/s/Hz at an information rate of 160 Mbit/s. A high-speed rate-8/9 encoder with a Viterbi decoder and an octal PSK modem are used to achieve this. The BER performance is approximately 1 dB from the theoretically calculated value for this system at a BER of 5E-7 under nominal conditions. The system operates in burst mode for downlink applications, and tests have demonstrated very little degradation in performance with frequency and level offset. Unique word miss rate measurements were conducted which demonstrate reliable acquisition at low values of Eb/No. Codec self-tests have verified the performance of this subsystem in a stand-alone mode. The codec is capable of operation at a 200 Mbit/s information rate, as demonstrated using a codec test set which introduces noise digitally. The measured performance is within 0.2 dB of the computer-simulated predictions. A gate array implementation of the most time-critical element of the high speed Viterbi decoder was completed. This gate array add-compare-select chip significantly reduces the power consumption and improves the manufacturability of the decoder. The chip has general application in the implementation of high speed Viterbi decoders.

  8. High-performance software-only H.261 video compression on PC

    NASA Astrophysics Data System (ADS)

    Kasperovich, Leonid

    1996-03-01

    This paper describes an implementation of a software H.261 codec for the PC that takes advantage of the fast computational algorithms for DCT-based video compression presented by the author at the February 1995 SPIE/IS&T meeting. The motivation for developing the H.261 prototype system is to demonstrate the feasibility of a real-time software-only videoconferencing solution operating across a wide range of network bandwidths, frame rates, and resolutions of the input video. As the bandwidth of network technology increases, higher frame rates and resolutions of transmitted video are allowed, which in turn requires a software codec able to compress pictures of CIF (352 x 288) resolution at up to 30 frames/sec. Running on a Pentium 133 MHz PC, the codec presented is capable of compressing video in CIF format at 21-23 frames/sec. This result is comparable to known hardware-based H.261 solutions, but it does not require any specific hardware. The methods used to achieve high performance and the program optimization technique for the Pentium microprocessor are presented, along with a performance profile showing the actual contribution of the different encoding/decoding stages to the overall computational process.

  9. Satellite Data Transmission (SDT) requirement

    NASA Technical Reports Server (NTRS)

    Chie, C. M.; White, M.; Lindsey, W. C.

    1984-01-01

    An 85 Mb/s modem/codec to operate in a 34 MHz C-band domestic satellite transponder at a system carrier-to-noise power ratio of 19.5 dB is discussed. The characteristics of a satellite channel and the approach adopted for the satellite data transmission modem/codec selection are discussed. Measured data and simulation results for the existing 50 Mbps link are compared and used to verify the simulation techniques. The various modulation schemes that were screened for the SDT are discussed, and the simulated performance of the two prime candidates, 8-PSK and SMSK/2, is given. The selection process that led to the candidate codec techniques is documented, and the technology of the modem/codec candidates is assessed. Costs of the modems and codecs are estimated.

  10. Energy minimization of mobile video devices with a hardware H.264/AVC encoder based on energy-rate-distortion optimization

    NASA Astrophysics Data System (ADS)

    Kang, Donghun; Lee, Jungeon; Jung, Jongpil; Lee, Chul-Hee; Kyung, Chong-Min

    2014-09-01

    In mobile video systems powered by battery, reducing the encoder's compression energy consumption is critical to prolonging system lifetime. Previous energy-rate-distortion (E-R-D) optimization methods based on a software codec are not suitable for practical mobile camera systems because the energy consumption is too large and the encoding rate is too low. In this paper, we propose an E-R-D model for a hardware codec based on a gate-level simulation framework that measures the switching activity and the energy consumption. From the proposed E-R-D model, an energy-minimizing algorithm for mobile video camera sensors has been developed with the GOP (Group of Pictures) size and QP (Quantization Parameter) as run-time control variables. Our experimental results show that the proposed algorithm provides up to 31.76% energy savings while satisfying the rate and distortion constraints.
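
    The abstract implies a constrained search over the two run-time control variables; a minimal sketch follows, where measure_erd() is a hypothetical callback standing in for the paper's gate-level E-R-D model:

      def pick_config(measure_erd, gop_sizes, qps, rate_max, dist_max):
          # Exhaustive search: minimum-energy (GOP, QP) meeting both constraints.
          best, best_energy = None, float("inf")
          for gop in gop_sizes:
              for qp in qps:
                  energy, rate, dist = measure_erd(gop, qp)
                  if rate <= rate_max and dist <= dist_max and energy < best_energy:
                      best, best_energy = (gop, qp), energy
          return best

      # Hypothetical stand-in model: energy falls with larger GOP and QP,
      # rate falls and distortion rises as QP grows.
      demo = lambda gop, qp: (1000.0 / (gop * qp), 5e6 / qp, 2.0 * qp)
      print(pick_config(demo, (1, 4, 8, 16), range(20, 41), 2e5, 70.0))  # (16, 35)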

  11. Ka-Band, Multi-Gigabit-Per-Second Transceiver

    NASA Technical Reports Server (NTRS)

    Simons, Rainee N.; Wintucky, Edwin G.; Smith, Francis J.; Harris, Johnny M.; Landon, David G.; Haddadin, Osama S.; McIntire, William K.; Sun, June Y.

    2011-01-01

    A document discusses a multi-gigabit-per-second, Ka-band transceiver with a software-defined modem (SDM) capable of digitally encoding/decoding data and compensating for linear and nonlinear distortions in the end-to-end system, including the traveling-wave tube amplifier (TWTA). This innovation can increase the data rates of space-to-ground communication links, and has potential application to NASA's future space-based Earth observation system. The SDM incorporates an extended version of the industry-standard DVB-S2 and LDPC rate 9/10 FEC codec. The SDM supports a suite of waveforms, including QPSK, 8-PSK, 16-APSK, 32-APSK, 64-APSK, and 128-QAM. The Ka-band TWTA delivers an output power on the order of 200 W with efficiency greater than 60%, and a passband of at least 3 GHz. The modem and the TWTA together enable a data rate of 20 Gbps with a low bit error rate (BER). The payload data rates for spacecraft in NASA's integrated space communications network can be increased by an order of magnitude (>10x) over the current state of practice. This innovation enhances the data rate by using bandwidth-efficient modulation techniques, which transmit a higher number of bits per Hertz of bandwidth than the currently used quadrature phase shift keying (QPSK) waveforms.

  12. Secure videoconferencing equipment switching system and method

    DOEpatents

    Dirks, David H; Gomes, Diane; Stewart, Corbin J; Fischer, Robert A

    2013-04-30

    Examples of systems described herein include videoconferencing systems having audio/visual components coupled to a codec. The codec may be configured by a control system. Communication networks having different security levels may be alternately coupled to the codec following appropriate configuration by the control system. The control system may also be coupled to the communication networks.

  13. Low-delay predictive audio coding for the HIVITS HDTV codec

    NASA Astrophysics Data System (ADS)

    McParland, A. K.; Gilchrist, N. H. C.

    1995-01-01

    The status of work relating to predictive audio coding, as part of the European project on High Quality Video Telephone and HD(TV) Systems (HIVITS), is reported. The predictive coding algorithm is developed, along with six-channel audio coding and decoding hardware. Demonstrations of the audio codec operating in conjunction with the video codec are given.

  14. Time-frequency feature representation using multi-resolution texture analysis and acoustic activity detector for real-life speech emotion recognition.

    PubMed

    Wang, Kun-Ching

    2015-01-14

    The classification of emotional speech is mostly considered in speech-related research on human-computer interaction (HCI). In this paper, the purpose is to present a novel feature extraction based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for characterization and classification of different emotions in a speech signal. The motivation is that emotions have different intensity values in different frequency bands. In terms of human visual perception, the texture property on multiple resolutions of an emotional speech spectrogram should be a good feature set for emotion classification in speech. Furthermore, multi-resolution analysis of texture can give a clearer discrimination between emotions than uniform-resolution analysis. In order to provide high accuracy of emotional discrimination, especially in real life, an acoustic activity detection (AAD) algorithm must be applied in the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally-occurring dialogs recorded in real-life call centers. Compared with the traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features can also improve the correct classification rates of the proposed systems across different language databases. Experimental results show that the proposed MRTII-based feature information, inspired by human visual perception of the spectrogram image, can provide significant classification for real-life emotional recognition in speech.

  15. A forward error correction technique using a high-speed, high-rate single chip codec

    NASA Astrophysics Data System (ADS)

    Boyd, R. W.; Hartman, W. F.; Jones, Robert E.

    The authors describe an error-correction coding approach that allows operation in either burst or continuous modes at data rates of multiple hundreds of megabits per second. Bandspreading is low since the code rate is 7/8 or greater, which is consistent with high-rate link operation. The encoder, along with a hard-decision decoder, fits on a single application-specific integrated circuit (ASIC) chip. Soft-decision decoding is possible utilizing applique hardware in conjunction with the hard-decision decoder. Expected coding gain is a function of the application and is approximately 2.5 dB for hard-decision decoding at a 10^-5 bit-error rate with phase-shift-keying modulation and additive white Gaussian noise interference. The principal use envisioned for this technique is to achieve a modest amount of coding gain on high-data-rate, bandwidth-constrained channels. Data rates of up to 300 Mb/s can be accommodated by the codec chip. The major objective is burst-mode communications, where code words are composed of 32n data bits followed by 32 overhead bits.
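
    The quoted figures are self-consistent: with 32n data bits plus 32 overhead bits per code word, the code rate is n/(n+1), which spans the 7/8 to 15/16 range mentioned here and in the related Flexible High Speed Codec records:

      for n in (7, 15):
          data_bits, overhead_bits = 32 * n, 32
          print(n, data_bits / (data_bits + overhead_bits))  # 0.875 and 0.9375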

  16. Codec-on-Demand Based on User-Level Virtualization

    NASA Astrophysics Data System (ADS)

    Zhang, Youhui; Zheng, Weimin

    At work, at home, and in some public places, a desktop PC is usually available nowadays. Therefore, it is important for users to be able to play various videos smoothly on different PCs, but the diversity of codec types complicates the situation. Although some mainstream media players can try to download a needed codec automatically, this may fail for average users because installing the codec usually requires administrator privileges, while the user may not be the owner of the PC. We believe an ideal solution should work without user intervention and need no special privileges. This paper proposes such a user-friendly, program-transparent solution for Windows-based media players. It runs the media player in a user-mode virtualization environment and then downloads the needed codec on-the-fly. Because of API (Application Programming Interface) interception, some resource-accessing API calls from the player are redirected to the downloaded codec resources. From the viewpoint of the player, the necessary codec then exists locally and it can handle the video smoothly, although neither the system registry nor the system folders were modified during this process. Besides convenience, the principle of least privilege is maintained and the host system is left clean. This paper analyzes the technical issues and presents a prototype that works with DirectShow-compatible players. Performance tests show that the overhead is negligible. Moreover, our solution conforms to the Software-as-a-Service (SaaS) model, which is very promising in the Internet era.

  17. Demonstration of Multi-Gbps Data Rates at Ka-Band Using Software-Defined Modem and Broadband High Power Amplifier for Space Communications

    NASA Technical Reports Server (NTRS)

    Simons, Rainee N.; Wintucky, Edwin G.; Landon, David G.; Sun, Jun Y.; Winn, James S.; Laraway, Stephen; McIntire, William K.; Metz, John L.; Smith, Francis J.

    2011-01-01

    The paper presents the first-ever research and experimental results on the combination of a software-defined multi-Gbps modem and a broadband high power space amplifier tested with an extended form of the industry-standard DVB-S2 and LDPC rate 9/10 FEC codec. The modem supports waveforms including QPSK, 8-PSK, 16-APSK, 32-APSK, 64-APSK, and 128-QAM. The broadband high power amplifier is a space-qualified traveling-wave tube (TWT) with a passband greater than 3 GHz at 33 GHz, output power of 200 W, and efficiency greater than 60 percent. The modem and the TWTA together enabled an unprecedented data rate of 20 Gbps with a low BER of 10(exp -9). The presented results include a plot of the received waveform constellation, BER vs. E(sub b)/N(sub 0), and the implementation loss for each of the modulation types tested. The above results, when included in an RF link budget analysis, show that NASA's payload data rate can be increased by at least an order of magnitude (greater than 10X) over the current state of practice, limited only by the spacecraft EIRP, ground receiver G/T, range, and available spectrum or bandwidth.
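
    A rough spectral-efficiency cross-check of the quoted figures (illustrative arithmetic only; the symbol rate and filter roll-off are not given in the abstract):

      print(20e9 / 3e9)   # ~6.7 b/s/Hz needed for 20 Gb/s in a ~3 GHz passband
      print(7 * 9 / 10)   # 6.3 information bits/symbol: 128-QAM with rate-9/10 FEC
      # Consistent once the usable passband somewhat exceeds 3 GHz, as the
      # abstract's "greater than 3 GHz" allows.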

  18. Influence of acquisition frame-rate and video compression techniques on pulse-rate variability estimation from vPPG signal.

    PubMed

    Cerina, Luca; Iozzia, Luca; Mainardi, Luca

    2017-11-14

    In this paper, common time- and frequency-domain variability indexes obtained from pulse rate variability (PRV) series extracted from the video-photoplethysmographic signal (vPPG) were compared with heart rate variability (HRV) parameters calculated from synchronized ECG signals. The dual focus of this study was to analyze the effect on PRV parameter estimation of different video acquisition frame-rates, from 60 frames-per-second (fps) down to 7.5 fps, and of different video compression techniques using both lossless and lossy codecs. Video recordings were acquired with an off-the-shelf GigE Sony XCG-C30C camera on 60 young, healthy subjects (age 23±4 years) in the supine position. A fully automated signal extraction method based on the Kanade-Lucas-Tomasi (KLT) algorithm for region-of-interest (ROI) detection and tracking, in combination with a zero-phase principal component analysis (ZCA) signal separation technique, was employed to convert the video frame sequence into a pulsatile signal. Frame-rate degradation was simulated on the video recordings by directly sub-sampling the ROI tracking and signal extraction modules, to correctly mimic videos recorded at a lower speed. The compression of the videos was configured to avoid any frame rejection caused by codec quality leveling; FFV1 was used as the lossless codec and H.264 with a variable quality parameter as the lossy codec. The results showed that a reduced frame-rate leads to inaccurate tracking of ROIs, increased time-jitter in the signal dynamics, and local peak displacements, which degrade performance in all the PRV parameters. The root mean square of successive differences (RMSSD) and the proportion of successive differences greater than 50 ms (PNN50) in the time domain, and the low frequency (LF) and high frequency (HF) power in the frequency domain, were the parameters that degraded most with frame-rate reduction. Such degradation can be partially mitigated by up-sampling the measured signal to a higher frequency (namely 60 Hz). Concerning video compression, the results showed that compression techniques are suitable for the storage of vPPG recordings, although lossless or intra-frame compression is to be preferred over inter-frame compression methods. FFV1 performance is very close to that of the uncompressed (UNC) version at less than 45% of the disk size. H.264 showed a degradation of the PRV estimation directly correlated with the increase of the compression ratio.
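
    For reference, the two time-domain indexes named above are computed from an inter-beat-interval series as follows (a sketch assuming NumPy; the vPPG extraction pipeline itself, KLT tracking and ZCA separation, is out of scope):

      import numpy as np

      def rmssd(ibi_ms):
          # Root mean square of successive inter-beat-interval differences.
          diffs = np.diff(ibi_ms)
          return float(np.sqrt(np.mean(diffs ** 2)))

      def pnn50(ibi_ms):
          # Percentage of successive differences greater than 50 ms.
          diffs = np.abs(np.diff(ibi_ms))
          return float(np.mean(diffs > 50.0) * 100.0)

      ibi = np.array([812.0, 830.0, 795.0, 868.0, 851.0, 790.0])
      print(rmssd(ibi), pnn50(ibi))   # ~46.7 ms, 40.0 %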

  19. Time-Frequency Feature Representation Using Multi-Resolution Texture Analysis and Acoustic Activity Detector for Real-Life Speech Emotion Recognition

    PubMed Central

    Wang, Kun-Ching

    2015-01-01

    The classification of emotional speech is mostly considered in speech-related research on human-computer interaction (HCI). In this paper, the purpose is to present a novel feature extraction based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for characterization and classification of different emotions in a speech signal. The motivation is that emotions have different intensity values in different frequency bands. In terms of human visual perception, the texture property on multiple resolutions of an emotional speech spectrogram should be a good feature set for emotion classification in speech. Furthermore, multi-resolution analysis of texture can give a clearer discrimination between emotions than uniform-resolution analysis. In order to provide high accuracy of emotional discrimination, especially in real life, an acoustic activity detection (AAD) algorithm must be applied in the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally-occurring dialogs recorded in real-life call centers. Compared with the traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features can also improve the correct classification rates of the proposed systems across different language databases. Experimental results show that the proposed MRTII-based feature information, inspired by human visual perception of the spectrogram image, can provide significant classification for real-life emotional recognition in speech. PMID:25594590

  20. Flexible high speed codec

    NASA Technical Reports Server (NTRS)

    Boyd, R. W.; Hartman, W. F.

    1992-01-01

    The project's objective is to develop an advanced high speed coding technology that provides substantial coding gains with limited bandwidth expansion for several common modulation types. The resulting technique is applicable to several continuous and burst communication environments. Decoding provides a significant gain with hard decisions alone and can utilize soft decision information, when available from the demodulator, to increase the coding gain. The hard decision codec will be implemented using a single application specific integrated circuit (ASIC) chip. It will be capable of coding and decoding as well as some formatting and synchronization functions at data rates up to 300 megabits per second (Mb/s). Code rate is a function of the block length and can vary from 7/8 to 15/16. The length of coded bursts can be any multiple of 32 that is greater than or equal to 256 bits. Coding may be switched in or out on a burst-by-burst basis with no change in the throughput delay. Reliability information in the form of 3-bit (8-level) soft decisions can be exploited using applique circuitry around the hard decision codec. This applique circuitry will be discrete logic in the present contract; however, ease of transition to LSI is one of the design guidelines. Discussed here is the selected coding technique. Its application to some communication systems is described. Performance with 4, 8, and 16-ary Phase Shift Keying (PSK) modulation is also presented.

  1. Digital codec for real-time processing of broadcast quality video signals at 1.8 bits/pixel

    NASA Technical Reports Server (NTRS)

    Shalkhauser, Mary Jo; Whyte, Wayne A., Jr.

    1989-01-01

    The authors present the hardware implementation of a digital television bandwidth compression algorithm which processes standard NTSC (National Television Systems Committee) composite color television signals and produces broadcast-quality video in real time at an average of 1.8 b/pixel. The sampling rate used with this algorithm results in 768 samples over the active portion of each video line by 512 active video lines per video frame. The algorithm is based on differential pulse code modulation (DPCM), but additionally utilizes a nonadaptive predictor, nonuniform quantizer, and multilevel Huffman coder to reduce the data rate substantially below that achievable with straight DPCM. The nonadaptive predictor and multilevel Huffman coder combine to set this technique apart from prior-art DPCM encoding algorithms. The authors describe the data compression algorithm and the hardware implementation of the codec and provide performance results.
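
    A toy 1-D DPCM loop in the spirit of the scheme above (a sketch: the previous-sample predictor and seven-level quantizer here are illustrative stand-ins for the codec's nonadaptive predictor, nonuniform quantizer, and multilevel Huffman coder):

      import numpy as np

      LEVELS = np.array([-24, -8, -2, 0, 2, 8, 24])    # toy nonuniform quantizer

      def dpcm_encode(samples):
          pred, codes = 128, []                         # predictor reset to mid-gray
          for s in samples:
              e = int(s) - pred                         # prediction error
              q = int(LEVELS[np.argmin(np.abs(LEVELS - e))])
              codes.append(q)                           # a real codec Huffman-codes q
              pred = int(np.clip(pred + q, 0, 255))     # track decoder reconstruction
          return codes

      def dpcm_decode(codes):
          pred, line = 128, []
          for q in codes:
              pred = int(np.clip(pred + q, 0, 255))
              line.append(pred)
          return line

      print(dpcm_decode(dpcm_encode([100, 104, 103, 120, 180, 181, 60])))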

  2. Optimal bit allocation for hybrid scalable/multiple-description video transmission over wireless channels

    NASA Astrophysics Data System (ADS)

    Jubran, Mohammad K.; Bansal, Manu; Kondi, Lisimachos P.

    2006-01-01

    In this paper, we consider the problem of optimal bit allocation for wireless video transmission over fading channels. We use a newly developed hybrid scalable/multiple-description codec that combines the functionality of both scalable and multiple-description codecs. It produces a base layer and multiple-description enhancement layers. Any of the enhancement layers can be decoded (in a non-hierarchical manner) with the base layer to improve the reconstructed video quality. Two different channel coding schemes (Rate-Compatible Punctured Convolutional (RCPC)/Cyclic Redundancy Check (CRC) coding and product code Reed-Solomon (RS)+RCPC/CRC coding) are used for unequal error protection of the layered bitstream. Optimal allocation of the bitrate between source and channel coding is performed over discrete sets of source coding rates and channel coding rates. Experimental results are presented for a wide range of channel conditions. Also, comparisons with classical scalable coding show the effectiveness of using hybrid scalable/multiple-description coding for wireless transmission.
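
    The allocation step reduces to a discrete search over the two rate sets; a minimal sketch follows, where expected_distortion() is a hypothetical channel-aware distortion model:

      from itertools import product

      def allocate(source_rates, channel_rates, budget, expected_distortion):
          # Channel coding at rate rc inflates rs source bits to rs / rc
          # transmitted bits; keep the feasible pair minimizing distortion.
          best, best_d = None, float("inf")
          for rs, rc in product(source_rates, channel_rates):
              if rs / rc <= budget:
                  d = expected_distortion(rs, rc)
                  if d < best_d:
                      best, best_d = (rs, rc), d
          return best

      # Hypothetical model: distortion falls with source rate and rises
      # sharply when a high code rate leaves residual channel errors.
      demo = lambda rs, rc: 1e6 / rs + 5e4 * rc ** 4
      print(allocate((64e3, 128e3, 256e3), (1/3, 1/2, 2/3), 384e3, demo))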

  3. Performance comparison of leading image codecs: H.264/AVC Intra, JPEG2000, and Microsoft HD Photo

    NASA Astrophysics Data System (ADS)

    Tran, Trac D.; Liu, Lijie; Topiwala, Pankaj

    2007-09-01

    This paper provides a detailed rate-distortion performance comparison between JPEG2000, Microsoft HD Photo, and H.264/AVC High Profile 4:4:4 I-frame coding for high-resolution still images and high-definition (HD) 1080p video sequences. This work is an extension of our previous comparative studies published at earlier SPIE conferences [1, 2]. Here we further optimize all three codecs for compression performance. Coding simulations are performed on a set of large-format color images captured from mainstream digital cameras and 1080p HD video sequences commonly used for H.264/AVC standardization work. Overall, our experimental results show that all three codecs offer very similar coding performance at the high-quality, high-resolution setting. Differences tend to be data-dependent: JPEG2000 with its wavelet technology tends to be the best performer on smooth spatial data; H.264/AVC High Profile with advanced spatial prediction modes tends to cope best with more complex visual content; Microsoft HD Photo tends to be the most consistent across the board. For the still-image data sets, JPEG2000 offers the best R-D performance gains (around 0.2 to 1 dB in peak signal-to-noise ratio) over H.264/AVC High Profile intra coding and Microsoft HD Photo. For the 1080p video data set, all three codecs offer very similar coding performance. As in [1, 2], we consider neither scalability nor complexity in this study (JPEG2000 is operated in its non-scalable, but optimal-performance, mode).

  4. Global Interoperability of High Definition Video Streams Via ACTS and Intelsat

    NASA Technical Reports Server (NTRS)

    Hsu, Eddie; Wang, Charles; Bergman, Larry; Pearman, James; Bhasin, Kul; Clark, Gilbert; Shopbell, Patrick; Gill, Mike; Tatsumi, Haruyuki; Kadowaki, Naoto

    2000-01-01

    In 1993, a proposal at the Japan-U.S. Cooperation in Space Program Workshop led to a subsequent series of satellite communications experiments and demonstrations under the title of Trans-Pacific High Data Rate Satellite Communications Experiments. The first of these was a joint collaboration between government and industry teams in the United States and Japan that successfully demonstrated distributed high definition video (HDV) post-production on a global scale using a combination of high data rate satellites and terrestrial fiber optic asynchronous transfer mode (ATM) networks. The HDV experiment was the first GIBN experiment to establish a dual-hop broadband satellite link for the transmission of digital HDV over ATM. This paper describes the team's effort in using the NASA Advanced Communications Technology Satellite (ACTS) at rates up to OC-3 (155 Mbps) between Los Angeles and Honolulu, and Intelsat at rates up to DS-3 (45 Mbps) between Kapolei and Tokyo, with which HDV source material was transmitted between the Sony Pictures High Definition Center (SPHDC) in Los Angeles and the Sony Visual Communication Center (VCC) in Shinagawa, Tokyo. The global-scale connection also used terrestrial networks in Japan and the States of Hawaii and California. The 1.2 Gbps digital HDV stream was compressed down to 22.5 Mbps using a proprietary Mitsubishi MPEG-2 codec that was ATM AAL-5 compatible. The codec employed four-way parallel processing. Improved versions of the codec are now commercially available. The successful post-production activity performed in Tokyo with an HDV clip transmitted from Los Angeles was predicated on the seamless interoperation of all the equipment between the sites, and was an exciting example of deploying a global-scale information infrastructure involving a combination of broadband satellites and terrestrial fiber optic networks. Correlations of atmospheric effects with cell loss, codec drop-out, and picture quality were made. Current efforts in the Trans-Pacific series plan to examine the use of Internet Protocol (IP)-related technologies over such an infrastructure. The use of IP allows the general public to be an integral part of these activities, helps to examine issues in constructing the solar-system internet, and affords an opportunity to tap research results from the (reliable) multicast and distributed systems communities. The current Trans-Pacific projects, including remote astronomy and a digital library (visible human), are briefly described.

  5. Design and analysis of multihypothesis motion-compensated prediction (MHMCP) codec for error-resilient visual communications

    NASA Astrophysics Data System (ADS)

    Kung, Wei-Ying; Kim, Chang-Su; Kuo, C.-C. Jay

    2004-10-01

    A multi-hypothesis motion compensated prediction (MHMCP) scheme, which predicts a block from a weighted superposition of more than one reference block in the frame buffer, is proposed and analyzed for error-resilient visual communication in this research. By combining these reference blocks effectively, MHMCP can enhance the error resilient capability of compressed video as well as achieve a coding gain. In particular, we investigate the error propagation effect in the MHMCP coder and analyze the rate-distortion performance in terms of the hypothesis number and hypothesis coefficients. It is shown that MHMCP suppresses the short-term effect of error propagation more effectively than the intra refreshing scheme. Simulation results are given to confirm the analysis. Finally, several design principles for the MHMCP coder are derived based on the analytical and experimental results.
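
    The core operation is easy to state. The sketch below (Python, not the authors' implementation) forms a prediction as a convex combination of reference blocks; the block sizes, weights, and hypothesis count are illustrative assumptions.

        import numpy as np

        def mhmcp_predict(ref_blocks, weights):
            """Predict a block as the weighted superposition of several
            reference blocks, the core idea of MHMCP (illustrative sketch)."""
            assert len(ref_blocks) == len(weights)
            assert abs(sum(weights) - 1.0) < 1e-9  # convex combination
            pred = np.zeros_like(ref_blocks[0], dtype=np.float64)
            for blk, w in zip(ref_blocks, weights):
                pred += w * blk
            return pred

        # Two hypotheses drawn from the frame buffer; equal weights halve
        # the energy of uncorrelated propagated errors, which is why MHMCP
        # damps short-term error propagation.
        b0 = np.random.randint(0, 256, (8, 8))
        b1 = np.random.randint(0, 256, (8, 8))
        print(mhmcp_predict([b0, b1], [0.5, 0.5]))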

  6. Impact of the codec and various QoS methods on the final quality of the transferred voice in an IP network

    NASA Astrophysics Data System (ADS)

    Slavata, Oldřich; Holub, Jan

    2015-02-01

    This paper analyses the relation between the codec used, the QoS method, and the final voice transmission quality. A Cisco 2811 router is used for adjusting QoS, and the VoIP client Linphone for selecting the codec. The criterion for transmission quality is the MOS score, assessed with the ITU-T P.862 PESQ and P.863 POLQA algorithms.

  7. Java Image I/O for VICAR, PDS, and ISIS

    NASA Technical Reports Server (NTRS)

    Deen, Robert G.; Levoe, Steven R.

    2011-01-01

    This library, written in Java, supports input and output of images and metadata (labels) in the VICAR, PDS image, and ISIS-2 and ISIS-3 file formats. Three levels of access exist. The first level comprises low-level, direct access to the file. This allows an application to read and write specific image tiles, lines, or pixels and to manipulate the label data directly. This layer is analogous to the C-language "VICAR Run-Time Library" (RTL), which is the image I/O library for the (C/C++/Fortran) VICAR image processing system from JPL MIPL (Multimission Image Processing Lab). This low-level library can also be used to read and write labeled, uncompressed images stored in formats similar to VICAR, such as ISIS-2 and -3, and a subset of PDS (image format). The second level of access involves two codecs based on Java Advanced Imaging (JAI) that provide access to VICAR and PDS images in a file-format-independent manner. JAI is supplied by Sun Microsystems as an extension to desktop Java, and has a number of codecs for formats such as GIF, TIFF, JPEG, etc. Although Sun has deprecated the codec mechanism (replaced by IIO), it is still used in many places. The VICAR and PDS codecs allow any program written using the JAI codec spec to use VICAR or PDS images automatically, with no specific knowledge of the VICAR or PDS formats. Support for metadata (labels) is included, but is format-dependent. The PDS codec, when processing PDS images with an embedded VICAR label ("dual-labeled images," such as those used for MER), presents the VICAR label in a new way that is compatible with the VICAR codec. The third level of access involves VICAR, PDS, and ISIS Image I/O plugins. The Java core includes an "Image I/O" (IIO) package that is similar in concept to the JAI codec, but is newer and more capable. Applications written to the IIO specification can use any image format for which a plug-in exists, with no specific knowledge of the format itself.

  8. Analysis of the possibility of using G.729 codec for steganographic transmission

    NASA Astrophysics Data System (ADS)

    Piotrowski, Zbigniew; Ciołek, Michał; Dołowski, Jerzy; Wojtuń, Jarosław

    2017-04-01

    Network steganography is dedicated in particular to those communication services for which there are no bridges or nodes carrying out unintentional attacks on the steganographic sequence. In order to set up a hidden communication channel, a method of data encoding and decoding was implemented using the code books of the G.729 codec. The G.729 codec is built around the CS-ACELP (Conjugate Structure Algebraic Code Excited Linear Prediction) linear-prediction vocoder, and by modifying the binary content of the codebook it is easy to change the binary output stream. The article describes the results of research on selecting those bits of the G.729 codebook whose negation has the least influence on the quality and fidelity of the output signal. The study was performed with the use of subjective and objective listening tests.

  9. A Dynamic Image Quality Evaluation of Videofluoroscopy Images: Considerations for Telepractice Applications.

    PubMed

    Burns, Clare L; Keir, Benjamin; Ward, Elizabeth C; Hill, Anne J; Farrell, Anna; Phillips, Nick; Porter, Linda

    2015-08-01

    High-quality fluoroscopy images are required for accurate interpretation of videofluoroscopic swallow studies (VFSS) by speech pathologists and radiologists. Consequently, integral to developing any system to conduct VFSS remotely via telepractice is ensuring that the quality of the VFSS images transferred via the telepractice system is optimized. This study evaluates the extent of change observed in image quality when videofluoroscopic images are transmitted from a digital fluoroscopy system to (a) current clinical equipment (KayPentax Digital Swallowing Workstation) and (b) four different telepractice system configurations. The telepractice system configurations consisted of either a local C20 or C60 Cisco TelePresence System (codec unit) connected to the digital fluoroscopy system and linked to a second remote C20 or C60 Cisco TelePresence System via a network running at speeds of 2, 4 or 6 megabits per second (Mbit/s). Image quality was tested using the NEMA XR 21 Phantom, and results demonstrated some loss in spatial resolution, low-contrast detectability and temporal resolution for all transferred images when compared to the fluoroscopy source. When using higher-capacity codec units and/or the highest bandwidths to support data transmission, image quality transmitted through the telepractice system was found to be comparable to, if not better than, the current clinical system. This study confirms that telepractice systems can be designed to support fluoroscopy image transfer and highlights important considerations when developing telepractice systems for VFSS analysis to ensure high-quality radiological image reproduction.

  10. Design of UAV high resolution image transmission system

    NASA Astrophysics Data System (ADS)

    Gao, Qiang; Ji, Ming; Pang, Lan; Jiang, Wen-tao; Fan, Pengcheng; Zhang, Xingcheng

    2017-02-01

    In order to solve the problem of the bandwidth limitation of the image transmission system on a UAV, a scheme with image compression technology for mini UAVs is proposed, based on the requirements of a high-definition image transmission system for UAVs. The H.264 video coding standard and its key technologies were analyzed and studied for UAV area video communication. Based on research into high-resolution image encoding/decoding techniques and wireless transmission methods, a high-resolution image transmission system was designed on an architecture of Android and a video codec chip. The constructed system was confirmed by laboratory experimentation: the bit rate could be controlled easily, QoS was stable, and the low latency meets most application requirements, not only for military use but also for industrial applications.

  11. An overview of new video coding tools under consideration for VP10: the successor to VP9

    NASA Astrophysics Data System (ADS)

    Mukherjee, Debargha; Su, Hui; Bankoski, James; Converse, Alex; Han, Jingning; Liu, Zoe; Xu, Yaowu

    2015-09-01

    Google started an open-source project, entitled the WebM Project, in 2010 to develop royalty-free video codecs for the web. The present-generation codec developed in the WebM project, called VP9, was finalized in mid-2013 and is currently being served extensively by YouTube, resulting in billions of views per day. Even though adoption of VP9 outside Google is still in its infancy, the WebM project has already embarked on an ambitious project to develop a next-edition codec, VP10, that achieves at least a generational bitrate reduction over the current-generation codec VP9. Although the project is still in early stages, a set of new experimental coding tools have already been added to baseline VP9 to achieve modest coding gains over a large enough test set. This paper provides a technical overview of these coding tools.

  12. 2-Step scalar deadzone quantization for bitplane image coding.

    PubMed

    Auli-Llinas, Francesc

    2013-12-01

    Modern lossy image coding systems generate a quality progressive codestream that, truncated at increasing rates, produces an image with decreasing distortion. Quality progressivity is commonly provided by an embedded quantizer that employs uniform scalar deadzone quantization (USDQ) together with a bitplane coding strategy. This paper introduces a 2-step scalar deadzone quantization (2SDQ) scheme that achieves the same coding performance as USDQ while reducing the coding passes and the emitted symbols of the bitplane coding engine. This serves to reduce the computational costs of the codec and/or to code high dynamic range images. The main insights behind 2SDQ are the use of two quantization step sizes that approximate wavelet coefficients with more or less precision depending on their density, and a rate-distortion optimization technique that adjusts the distortion decreases produced when coding 2SDQ indexes. The integration of 2SDQ in current codecs is straightforward. The applicability and efficiency of 2SDQ are demonstrated within the framework of JPEG2000.
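
    For orientation, the following sketch contrasts plain USDQ with a two-step variant in the spirit of 2SDQ; the threshold and step sizes are illustrative stand-ins, not the paper's density-derived or R-D-optimized values.

        import numpy as np

        def usdq(c, step):
            """Uniform scalar deadzone quantization: the deadzone around
            zero is twice the step size."""
            return np.sign(c) * np.floor(np.abs(c) / step)

        def two_step_sdq(c, fine, coarse, thresh):
            """Sketch of a 2-step scalar deadzone quantizer: densely
            populated small coefficients get the fine step; sparse large
            ones the coarse step (fewer bitplanes to code)."""
            small = np.abs(c) < thresh
            q = np.where(small, usdq(c, fine), usdq(c, coarse))
            return q, small

        coeffs = np.array([0.3, -1.2, 7.9, -15.4, 0.05])
        q, small = two_step_sdq(coeffs, fine=0.5, coarse=4.0, thresh=2.0)
        print(q, small)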

  13. JPEG 2000-based compression of fringe patterns for digital holographic microscopy

    NASA Astrophysics Data System (ADS)

    Blinder, David; Bruylants, Tim; Ottevaere, Heidi; Munteanu, Adrian; Schelkens, Peter

    2014-12-01

    With the advent of modern computing and imaging technologies, digital holography is becoming widespread in various scientific disciplines such as microscopy, interferometry, surface shape measurements, vibration analysis, data encoding, and certification. Therefore, designing an efficient data representation technology is of particular importance. Off-axis holograms have very different signal properties with respect to regular imagery, because they represent a recorded interference pattern with its energy biased toward the high-frequency bands. This causes traditional image coders, which assume an underlying 1/f² power spectral density distribution, to perform suboptimally for this type of imagery. We propose a JPEG 2000-based codec framework that provides a generic architecture suitable for the compression of many types of off-axis holograms. This framework has a JPEG 2000 codec at its core, extended with (1) fully arbitrary wavelet decomposition styles and (2) directional wavelet transforms. Using this codec, we report significant improvements in coding performance for off-axis holography relative to the conventional JPEG 2000 standard, with Bjøntegaard delta-peak signal-to-noise ratio improvements ranging from 1.3 to 11.6 dB for lossy compression in the 0.125 to 2.00 bpp range and bit-rate reductions of up to 1.6 bpp for lossless compression.
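
    The motivation can be reproduced with a toy transform. The sketch below uses a hand-rolled 2-D Haar stage (a stand-in for the JPEG 2000 wavelet kernels, not the authors' codec) to show that a fringe-like signal concentrates its energy in the high-frequency subbands, which is why a decomposition style that also splits those bands pays off.

        import numpy as np

        def haar2d(x):
            """One level of a 2-D Haar transform; returns LL, LH, HL, HH."""
            a = (x[0::2, :] + x[1::2, :]) / 2.0
            d = (x[0::2, :] - x[1::2, :]) / 2.0
            ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
            lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
            hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
            hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
            return ll, lh, hl, hh

        def energy(b):
            return float(np.sum(b * b))

        # Natural images put most energy in LL, so the dyadic (Mallat)
        # style keeps splitting LL only. An off-axis fringe pattern puts
        # it in the high bands, so an arbitrary decomposition style would
        # recurse into HH instead.
        x, y = np.meshgrid(np.arange(128), np.arange(128))
        fringes = np.cos(2 * np.pi * 0.4 * (x + y))  # toy carrier
        ll, lh, hl, hh = haar2d(fringes)
        print({k: round(energy(v), 1) for k, v in
               dict(LL=ll, LH=lh, HL=hl, HH=hh).items()})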

  14. Secure videoconferencing equipment switching system and method

    DOEpatents

    Hansen, Michael E [Livermore, CA

    2009-01-13

    A switching system and method are provided to facilitate use of videoconference facilities over a plurality of security levels. The system includes a switch coupled to a plurality of codecs and communication networks. Audio/Visual peripheral components are connected to the switch. The switch couples control and data signals between the Audio/Visual peripheral components and one but not both of the plurality of codecs. The switch additionally couples communication networks of the appropriate security level to each of the codecs. In this manner, a videoconferencing facility is provided for use on both secure and non-secure networks.

  15. On transform coding tools under development for VP10

    NASA Astrophysics Data System (ADS)

    Parker, Sarah; Chen, Yue; Han, Jingning; Liu, Zoe; Mukherjee, Debargha; Su, Hui; Wang, Yongzhe; Bankoski, Jim; Li, Shunyao

    2016-09-01

    Google started the WebM Project in 2010 to develop open-source, royalty-free video codecs designed specifically for media on the Web. The second-generation codec released by the WebM project, VP9, is currently served by YouTube, and enjoys billions of views per day. Realizing the need for even greater compression efficiency to cope with the growing demand for video on the web, the WebM team embarked on an ambitious project to develop a next-edition codec, VP10, that achieves at least a generational improvement in coding efficiency over VP9. Starting from VP9, a set of new experimental coding tools have already been added to VP10 to achieve decent coding gains. Subsequently, Google joined a consortium of major tech companies called the Alliance for Open Media to jointly develop a new codec, AV1. As a result, the VP10 effort is largely expected to merge with AV1. In this paper, we focus primarily on new tools in VP10 that improve coding of the prediction residue using transform coding techniques. Specifically, we describe tools that increase the flexibility of available transforms, allowing the codec to handle a more diverse range of residue structures. Results are presented on a standard test set.

  16. Towards a next generation open-source video codec

    NASA Astrophysics Data System (ADS)

    Bankoski, Jim; Bultje, Ronald S.; Grange, Adrian; Gu, Qunshan; Han, Jingning; Koleszar, John; Mukherjee, Debargha; Wilkins, Paul; Xu, Yaowu

    2013-02-01

    Google has recently been developing a next-generation open-source video codec called VP9, as part of the experimental branch of the libvpx repository included in the WebM project (http://www.webmproject.org/). Starting from the VP8 video codec released by Google in 2010 as the baseline, a number of enhancements and new tools have been added to improve the coding efficiency. This paper provides a technical overview of the current status of this project along with comparisons against other state-of-the-art video codecs, H.264/AVC and HEVC. The new tools that have been added so far include: larger prediction block sizes up to 64x64, various forms of compound INTER prediction, more modes for INTRA prediction, ⅛-pel motion vectors and 8-tap switchable subpel interpolation filters, improved motion reference generation and motion vector coding, improved entropy coding and frame-level entropy adaptation for various symbols, improved loop filtering, incorporation of Asymmetric Discrete Sine Transforms and larger 16x16 and 32x32 DCTs, frame-level segmentation to group similar areas together, etc. Other tools and various bitstream features are being actively worked on as well. The VP9 bitstream is expected to be finalized by early to mid-2013. Results show VP9 to be quite competitive in performance with mainstream state-of-the-art codecs.

  17. Novel inter and intra prediction tools under consideration for the emerging AV1 video codec

    NASA Astrophysics Data System (ADS)

    Joshi, Urvang; Mukherjee, Debargha; Han, Jingning; Chen, Yue; Parker, Sarah; Su, Hui; Chiang, Angie; Xu, Yaowu; Liu, Zoe; Wang, Yunqing; Bankoski, Jim; Wang, Chen; Keyder, Emil

    2017-09-01

    Google started the WebM Project in 2010 to develop open-source, royalty-free video codecs designed specifically for media on the Web. The second-generation codec released by the WebM project, VP9, is currently served by YouTube, and enjoys billions of views per day. Realizing the need for even greater compression efficiency to cope with the growing demand for video on the web, the WebM team embarked on an ambitious project to develop a next-edition codec, AV1, in a consortium of major tech companies called the Alliance for Open Media, that achieves at least a generational improvement in coding efficiency over VP9. In this paper, we focus primarily on new tools in AV1 that improve the prediction of pixel blocks before transforms, quantization and entropy coding are invoked. Specifically, we describe tools and coding modes that improve intra, inter and combined inter-intra prediction. Results are presented on standard test sets.

  18. Flexible video conference system based on ASICs and DSPs

    NASA Astrophysics Data System (ADS)

    Hu, Qiang; Yu, Songyu

    1995-02-01

    In this paper, a video conference system we developed recently is presented. In this system the video codec is compatible with CCITT H.261, the audio codec is compatible with G.711 and G.722, and the channel interface circuit is designed according to CCITT H.221. Emphasis is given to the video codec, which is both flexible and robust. The video codec is based on the LSI Logic L64700 series video compression chipset. The main function blocks of H.261, such as DCT, motion estimation, VLC and VLD, are performed by this chipset. However, it is a bare chipset: no peripheral functions, such as a memory interface, are integrated into it, which makes system implementation considerably more difficult. To implement the frame buffer controller, a TMS320C25 DSP and a group of GALs are used, with SRAM serving as the current- and previous-frame buffers. The DSP is not only the controller of the frame buffer but also the controller of the whole video codec. Because of the use of the DSP, the architecture of the video codec is very flexible, and many system parameters can be reconfigured for different applications. The whole video codec has a pipelined structure. In H.261, BCH(511,493) coding is recommended to protect against random transmission errors, but a burst error can still cause serious damage. To solve this problem, an interleaving method is used: the BCH code is interleaved before transmission and deinterleaved at the receiver, restoring the bit stream to its original order while distributing the error bits over several BCH words, which the BCH decoder is then able to correct. Considering that extreme conditions may occur, a watchdog-like function block is implemented that ensures the receiver can recover no matter how serious the transmission errors are. In developing the video conference system, a new synchronization problem had to be solved: the monitor on the receiver cannot easily be synchronized with the camera on the other side. A new method that solves this problem successfully is described in detail.
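
    The interleaving step described above is a standard block interleaver; a minimal sketch (with toy sizes in place of the BCH(511,493) codeword length) follows.

        def interleave(bits, depth, width):
            """Write codewords row-wise into a depth x width array and read
            column-wise, so a burst of up to 'depth' consecutive channel
            errors lands as at most one error per codeword."""
            assert len(bits) == depth * width
            rows = [bits[r * width:(r + 1) * width] for r in range(depth)]
            return [rows[r][c] for c in range(width) for r in range(depth)]

        def deinterleave(bits, depth, width):
            """Inverse permutation performed at the receiver before BCH
            decoding."""
            cols = [bits[c * depth:(c + 1) * depth] for c in range(width)]
            return [cols[c][r] for r in range(depth) for c in range(width)]

        data = list(range(12))          # 3 codewords of length 4, toy sizes
        tx = interleave(data, depth=3, width=4)
        assert deinterleave(tx, 3, 4) == data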

  19. Impaired extraction of speech rhythm from temporal modulation patterns in speech in developmental dyslexia

    PubMed Central

    Leong, Victoria; Goswami, Usha

    2014-01-01

    Dyslexia is associated with impaired neural representation of the sound structure of words (phonology). The “phonological deficit” in dyslexia may arise in part from impaired speech rhythm perception, thought to depend on neural oscillatory phase-locking to slow amplitude modulation (AM) patterns in the speech envelope. Speech contains AM patterns at multiple temporal rates, and these different AM rates are associated with phonological units of different grain sizes, e.g., related to stress, syllables or phonemes. Here, we assess the ability of adults with dyslexia to use speech AMs to identify rhythm patterns (RPs). We study 3 important temporal rates: “Stress” (~2 Hz), “Syllable” (~4 Hz) and “Sub-beat” (reduced syllables, ~14 Hz). 21 dyslexics and 21 controls listened to nursery rhyme sentences that had been tone-vocoded using either single AM rates from the speech envelope (Stress only, Syllable only, Sub-beat only) or pairs of AM rates (Stress + Syllable, Syllable + Sub-beat). They were asked to use the acoustic rhythm of the stimulus to identify the original nursery rhyme sentence. The data showed that dyslexics were significantly poorer at detecting rhythm compared to controls when they had to utilize multi-rate temporal information from pairs of AMs (Stress + Syllable or Syllable + Sub-beat). These data suggest that dyslexia is associated with a reduced ability to utilize AMs <20 Hz for rhythm recognition. This perceptual deficit in utilizing AM patterns in speech could be underpinned by less efficient neuronal phase alignment and cross-frequency neuronal oscillatory synchronization in dyslexia. Dyslexics' perceptual difficulties in capturing the full spectro-temporal complexity of speech over multiple timescales could contribute to the development of impaired phonological representations for words, the cognitive hallmark of dyslexia across languages. PMID:24605099

  20. Feasibility of video codec algorithms for software-only playback

    NASA Astrophysics Data System (ADS)

    Rodriguez, Arturo A.; Morse, Ken

    1994-05-01

    Software-only video codecs can provide good playback performance in desktop computers with a 486 or 68040 CPU running at 33 MHz without special hardware assistance. Typically, playback of compressed video can be categorized into three tasks: the actual decoding of the video stream, color conversion, and the transfer of decoded video data from system RAM to video RAM. By current standards, good playback performance is the decoding and display of video streams of 320 by 240 (or larger) compressed frames at 15 (or greater) frames per second. Software-only video codecs have evolved by modifying and tailoring existing compression methodologies to suit video playback in desktop computers. In this paper we examine the characteristics used to evaluate software-only video codec algorithms, namely: image fidelity (i.e., image quality), bandwidth (i.e., compression), ease of decoding (i.e., playback performance), memory consumption, compression-to-decompression asymmetry, scalability, and delay. We discuss the tradeoffs among these variables and the compromises that can be made to achieve low numerical complexity for software-only playback. Frame-differencing approaches are described, since software-only video codecs typically employ them to enhance playback performance. To complement other papers that appear in this session of the Proceedings, we review methods derived from binary pattern image coding, since these methods are amenable to software-only playback. In particular, we introduce a novel approach called pixel distribution image coding.
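
    As a rough illustration of the frame-differencing idea (assumed block size and threshold, not a codec from the paper): only blocks that changed against the previous frame are re-encoded, which is where much of the playback speed-up comes from.

        import numpy as np

        def frame_difference(prev, curr, block=16, thresh=8.0):
            """Return coordinates of blocks whose mean absolute difference
            against the previous frame exceeds a threshold; the encoder
            codes only these, and the decoder copies the rest."""
            h, w = curr.shape
            changed = []
            for by in range(0, h, block):
                for bx in range(0, w, block):
                    p = prev[by:by + block, bx:bx + block].astype(np.int16)
                    c = curr[by:by + block, bx:bx + block].astype(np.int16)
                    if np.mean(np.abs(c - p)) > thresh:
                        changed.append((by, bx))
            return changed

        prev = np.zeros((32, 32), np.uint8)
        curr = prev.copy()
        curr[0:16, 0:16] = 255          # only the top-left block changes
        print(frame_difference(prev, curr))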

  1. Stochastic Packet Loss Model to Evaluate QoE Impairments

    NASA Astrophysics Data System (ADS)

    Hohlfeld, Oliver

    With the provisioning of broadband access for the mass market, even in wireless and mobile networks, multimedia content, especially real-time streaming of high-quality audio and video, is extensively viewed and exchanged over the Internet. Quality of Experience (QoE) aspects, describing the service quality perceived by the user, are a vital factor in ensuring customer satisfaction in today's communication networks. Frameworks for assessing quality degradations in streamed video are currently investigated as a complex multi-layered research topic, involving network traffic load, codec functions and measures of user perception of video quality.
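
    A concrete instance of such a stochastic loss model is the classic two-state Gilbert-Elliott chain sketched below; the transition and loss probabilities are illustrative, not taken from this work.

        import random

        def gilbert_elliott(n, p_gb=0.01, p_bg=0.3,
                            loss_good=0.001, loss_bad=0.5):
            """Two-state Markov (Gilbert-Elliott) packet-loss trace: the
            chain alternates between a good state with rare losses and a
            bad state with bursty losses."""
            bad, trace = False, []
            for _ in range(n):
                if not bad:
                    bad = random.random() < p_gb      # good -> bad
                else:
                    bad = not (random.random() < p_bg)  # bad -> good
                trace.append(random.random() < (loss_bad if bad else loss_good))
            return trace  # True = packet lost

        trace = gilbert_elliott(10000)
        print("loss rate:", sum(trace) / len(trace))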

  2. Software-codec-based full motion video conferencing on the PC using visual pattern image sequence coding

    NASA Astrophysics Data System (ADS)

    Barnett, Barry S.; Bovik, Alan C.

    1995-04-01

    This paper presents a real-time full motion video conferencing system based on the Visual Pattern Image Sequence Coding (VPISC) software codec. The prototype system hardware is comprised of two personal computers, two camcorders, two frame grabbers, and an Ethernet connection. The prototype system software has a simple structure. It runs under the Disk Operating System, and includes a user interface, a video I/O interface, an event-driven network interface, and a free-running or frame-synchronous video codec that also acts as the controller for the video and network interfaces. Two video coders have been tested in this system. Simple implementations of Visual Pattern Image Coding and VPISC have both proven to support full motion video conferencing with good visual quality. Future work will concentrate on expanding this prototype to support the motion-compensated version of VPISC, as well as encompassing point-to-point modem I/O and multiple network protocols. The application will be ported to multiple hardware platforms and operating systems. The motivation for developing this prototype system is to demonstrate the practicality of software-based real-time video codecs. Furthermore, software video codecs are not only cheaper, but are more flexible system solutions, because they enable different computer platforms to exchange encoded video information without requiring on-board protocol-compatible video codec hardware. Software-based solutions enable true low-cost video conferencing that fits the 'open systems' model of interoperability that is so important for building portable hardware and software applications.

  3. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exist in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency: it offers a fast rate of data/text entry and a small, lightweight overall package, and it frees the hands and eyes of a suited crewmember. The system components and steps include beamforming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks when facing constraints in computational resources.
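
    A minimal sketch of the spatial-domain idea is a delay-and-sum beamformer, shown below with assumed integer-sample steering delays; the actual system's multichannel statistical noise reduction is considerably more elaborate.

        import numpy as np

        def delay_and_sum(channels, delays):
            """Align each microphone signal by its steering delay and
            average, so speech adds coherently while diffuse noise
            averages down."""
            n = min(len(ch) - d for ch, d in zip(channels, delays))
            aligned = [ch[d:d + n] for ch, d in zip(channels, delays)]
            return np.mean(aligned, axis=0)

        # Toy example: the same signal arrives one sample later on mic 2;
        # the steering delays undo the offset before averaging.
        sig = np.sin(np.linspace(0, 20, 200))
        mic1 = sig + 0.3 * np.random.randn(200)
        mic2 = np.roll(sig, 1) + 0.3 * np.random.randn(200)
        out = delay_and_sum([mic1, mic2], delays=[0, 1])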

  4. Effects of utterance length and vocal loudness on speech breathing in older adults.

    PubMed

    Huber, Jessica E

    2008-12-31

    Age-related reductions in pulmonary elastic recoil and respiratory muscle strength can affect how older adults generate subglottal pressure required for speech production. The present study examined age-related changes in speech breathing by manipulating utterance length and loudness during a connected speech task (monologue). Twenty-three older adults and twenty-eight young adults produced a monologue at comfortable loudness and pitch and with multi-talker babble noise playing in the room to elicit louder speech. Dependent variables included sound pressure level, speech rate, and lung volume initiation, termination, and excursion. Older adults produced shorter utterances than young adults overall. Age-related effects were larger for longer utterances. Older adults demonstrated very different lung volume adjustments for loud speech than young adults. These results suggest that older adults have a more difficult time when the speech system is being taxed by both utterance length and loudness. The data were consistent with the hypothesis that both young and older adults use utterance length in premotor speech planning processes.

  5. Behavioral and neurobiological correlates of childhood apraxia of speech in Italian children.

    PubMed

    Chilosi, Anna Maria; Lorenzini, Irene; Fiori, Simona; Graziosi, Valentina; Rossi, Giuseppe; Pasquariello, Rosa; Cipriani, Paola; Cioni, Giovanni

    2015-11-01

    Childhood apraxia of speech (CAS) is a neurogenic Speech Sound Disorder whose etiology and neurobiological correlates are still unclear. In the present study, 32 Italian children with idiopathic CAS underwent a comprehensive speech and language, genetic and neuroradiological investigation aimed to gather information on the possible behavioral and neurobiological markers of the disorder. The results revealed four main aggregations of behavioral symptoms that indicate a multi-deficit disorder involving both motor-speech and language competence. Six children presented with chromosomal alterations. The familial aggregation rate for speech and language difficulties and the male to female ratio were both very high in the whole sample, supporting the hypothesis that genetic factors make substantial contribution to the risk of CAS. As expected in accordance with the diagnosis of idiopathic CAS, conventional MRI did not reveal macrostructural pathogenic neuroanatomical abnormalities, suggesting that CAS may be due to brain microstructural alterations.

  6. Fast and predictable video compression in software design and implementation of an H.261 codec

    NASA Astrophysics Data System (ADS)

    Geske, Dagmar; Hess, Robert

    1998-09-01

    The use of software codecs for video compression is becoming commonplace in several videoconferencing applications. In order to reduce conflicts with other applications used at the same time, mechanisms for resource reservation on end systems need to determine an upper bound for the computing time used by the codec. This leads to the demand for predictable execution times of compression/decompression. Since compression schemes such as H.261 inherently depend on the motion contained in the video, adaptive admission control is required. This paper presents a data-driven approach based on dynamic reduction of the number of processed macroblocks in peak situations. Absolute speed is also a point of interest: the question of whether and how software compression of high-quality video is feasible on today's desktop computers is examined.

  7. Slowed articulation rate is a sensitive diagnostic marker for identifying non-fluent primary progressive aphasia

    PubMed Central

    Cordella, Claire; Dickerson, Bradford C.; Quimby, Megan; Yunusova, Yana; Green, Jordan R.

    2016-01-01

    Background Primary progressive aphasia (PPA) is a neurodegenerative aphasic syndrome with three distinct clinical variants: non-fluent (nfvPPA), logopenic (lvPPA), and semantic (svPPA). Speech (non-) fluency is a key diagnostic marker used to aid identification of the clinical variants, and researchers have been actively developing diagnostic tools to assess speech fluency. Current approaches reveal coarse differences in fluency between subgroups, but often fail to clearly differentiate nfvPPA from the variably fluent lvPPA. More robust subtype differentiation may be possible with finer-grained measures of fluency. Aims We sought to identify the quantitative measures of speech rate—including articulation rate and pausing measures—that best differentiated PPA subtypes, specifically the non-fluent group (nfvPPA) from the more fluent groups (lvPPA, svPPA). The diagnostic accuracy of the quantitative speech rate variables was compared to that of a speech fluency impairment rating made by clinicians. Methods and Procedures Automatic estimates of pause and speech segment durations and rate measures were derived from connected speech samples of participants with PPA (N=38; 11 nfvPPA, 14 lvPPA, 13 svPPA) and healthy age-matched controls (N=8). Clinician ratings of fluency impairment were made using a previously validated clinician rating scale developed specifically for use in PPA. Receiver operating characteristic (ROC) analyses enabled a quantification of diagnostic accuracy. Outcomes and Results Among the quantitative measures, articulation rate was the most effective for differentiating between nfvPPA and the more fluent lvPPA and svPPA groups. The diagnostic accuracy of both speech and articulation rate measures was markedly better than that of the clinician rating scale, and articulation rate was the best classifier overall. Area under the curve (AUC) values for articulation rate were good to excellent for identifying nfvPPA from both svPPA (AUC=.96) and lvPPA (AUC=.86). Cross-validation of accuracy results for articulation rate showed good generalizability outside the training dataset. Conclusions Results provide empirical support for (1) the efficacy of quantitative assessments of speech fluency and (2) a distinct non-fluent PPA subtype characterized, at least in part, by an underlying disturbance in speech motor control. The trend toward improved classifier performance for quantitative rate measures demonstrates the potential for a more accurate and reliable approach to subtyping in the fluency domain, and suggests that articulation rate may be a useful input variable as part of a multi-dimensional clinical subtyping approach. PMID:28757671

  8. Mixture block coding with progressive transmission in packet video. Appendix 1: Item 2. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Chen, Yun-Chung

    1989-01-01

    Video transmission will become an important part of future multimedia communication because of dramatically increasing user demand for video and the rapid evolution of coding algorithms and VLSI technology. Video transmission will be part of the broadband integrated services digital network (B-ISDN). Asynchronous transfer mode (ATM) is a viable candidate for implementation of B-ISDN due to its inherent flexibility, service independence, and high performance. According to the characteristics of ATM, the information has to be coded into discrete cells which travel independently in the packet switching network. A practical realization of an ATM video codec called Mixture Block Coding with Progressive Transmission (MBCPT) is presented. This variable bit rate coding algorithm shows how constant-quality performance can be obtained according to user demand. Interactions between codec and network are emphasized, including packetization, service synchronization, flow control, and error recovery. Finally, some simulation results based on MBCPT coding with error recovery are presented.

  9. Local statistics adaptive entropy coding method for the improvement of H.26L VLC coding

    NASA Astrophysics Data System (ADS)

    Yoo, Kook-yeol; Kim, Jong D.; Choi, Byung-Sun; Lee, Yung Lyul

    2000-05-01

    In this paper, we propose an adaptive entropy coding method to improve the VLC coding efficiency of the H.26L TML-1 codec. First, we show that the VLC coding presented in TML-1 does not satisfy the sibling property of entropy coding. Then, we modify the coding method into a local-statistics-adaptive one that satisfies the property. The proposed method, based on local symbol statistics, dynamically changes the mapping relationship between symbols and bit patterns in the VLC table according to the sibling property. Note that the codewords in the VLC table of the TML-1 codec are not changed. Since the changed mapping relationship is also derived at the decoder side from the decoded symbols, the proposed VLC coding method does not require any overhead information. The simulation results show that the proposed method gives about 30% and 37% reductions in average bit rate for MB type and CBP information, respectively.
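
    A minimal sketch of the mechanism follows: both encoder and decoder keep identical symbol counts and re-rank the symbol-to-codeword mapping after every symbol, so no side information is needed. The class name and ranking rule are illustrative, not the paper's exact procedure.

        class AdaptiveVLCMap:
            """Remap the most frequent symbols to the shortest codewords
            of a fixed VLC table, based on running local statistics."""

            def __init__(self, num_symbols):
                self.counts = [0] * num_symbols
                # rank i -> symbol; codeword i of the fixed table
                # (i = 0 is the shortest codeword)
                self.order = list(range(num_symbols))

            def rank_of(self, symbol):    # encoder: symbol -> codeword index
                return self.order.index(symbol)

            def symbol_of(self, rank):    # decoder: codeword index -> symbol
                return self.order[rank]

            def update(self, symbol):
                """Called identically on both sides after each symbol;
                Python's stable sort keeps ties deterministic."""
                self.counts[symbol] += 1
                self.order.sort(key=lambda s: -self.counts[s])

        m = AdaptiveVLCMap(4)
        for s in [2, 2, 2, 1]:
            rank = m.rank_of(s)           # what the encoder would emit
            assert m.symbol_of(rank) == s
            m.update(s)
        print(m.order)                    # symbol 2 now holds the shortest code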

  10. Noise Suppression Based on Multi-Model Compositions Using Multi-Pass Search with Multi-Label N-gram Models

    NASA Astrophysics Data System (ADS)

    Jitsuhiro, Takatoshi; Toriyama, Tomoji; Kogure, Kiyoshi

    We propose a noise suppression method based on multi-model compositions and multi-pass search. In real environments, input speech for speech recognition includes many kinds of noise signals. To obtain good recognition candidates, it is important to suppress many kinds of noise signals at once and find the target speech. Before noise suppression, to find speech and noise label sequences, we introduce multi-pass search with acoustic models that include many kinds of noise models and their compositions, their n-gram models, and their lexicon. Noise suppression is performed frame-synchronously using the multiple models selected by the recognized label sequences with time alignments. We evaluated this method using the E-Nightingale task, which contains voice memoranda spoken by nurses during actual work at hospitals. The proposed method obtained higher performance than the conventional method.

  11. Implementing MANETS in Android based environment using Wi-Fi direct

    NASA Astrophysics Data System (ADS)

    Waqas, Muhammad; Babar, Mohammad Inayatullah Khan; Zafar, Mohammad Haseeb

    2015-05-01

    Packet loss occurs in real-time voice transmission over wireless broadcast ad-hoc networks, creating disruptions in sound. The basic objective of this research is to design a wireless ad-hoc network based on two Android devices using the Wireless Fidelity (WiFi) Direct Application Programming Interface (API) and to apply a network codec, the Reed-Solomon code. The network codec is used to encode the data of a music WAV file and recover any lost packets; packets are dropped using a loss module at the transmitter device to analyze the performance, with the objective of retrieving the original file at the receiver device using the network codec. This resulted in faster transmission of the files despite dropped packets. In the end both devices held the originally formatted music file, with a complete performance analysis based on the transmission delay.
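
    For illustration only, the sketch below recovers one lost packet per group with a single XOR parity packet, a much weaker scheme than the Reed-Solomon code used in this work, but it shows the erasure-recovery principle: redundancy sent ahead of time replaces retransmission.

        def xor_parity(packets):
            """Single parity packet: XOR of all data packets. It can
            rebuild at most one lost packet per group, while RS(n, k)
            recovers up to n - k."""
            parity = bytes(len(packets[0]))
            for p in packets:
                parity = bytes(a ^ b for a, b in zip(parity, p))
            return parity

        def recover(received, parity):
            """received: list with exactly one entry replaced by None."""
            missing = received.index(None)
            rebuilt = parity
            for i, p in enumerate(received):
                if i != missing:
                    rebuilt = bytes(a ^ b for a, b in zip(rebuilt, p))
            return missing, rebuilt

        group = [b"abcd", b"efgh", b"ijkl"]
        par = xor_parity(group)
        idx, data = recover([b"abcd", None, b"ijkl"], par)
        assert idx == 1 and data == b"efgh"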

  12. Binaural unmasking of multi-channel stimuli in bilateral cochlear implant users.

    PubMed

    Van Deun, Lieselot; van Wieringen, Astrid; Francart, Tom; Büchner, Andreas; Lenarz, Thomas; Wouters, Jan

    2011-10-01

    Previous work suggests that bilateral cochlear implant users are sensitive to interaural cues if experimental speech processors are used to preserve accurate interaural information in the electrical stimulation pattern. Binaural unmasking occurs in adults and children when an interaural delay is applied to the envelope of a high-rate pulse train. Nevertheless, for speech perception, binaural unmasking benefits have not been demonstrated consistently, even with coordinated stimulation at both ears. The present study aimed at bridging the gap between basic psychophysical performance on binaural signal detection tasks on the one hand and binaural perception of speech in noise on the other hand. Therefore, binaural signal detection was expanded to multi-channel stimulation and biologically relevant interaural delays. A harmonic complex, consisting of three sinusoids (125, 250, and 375 Hz), was added to three 125-Hz-wide noise bands centered on the sinusoids. When an interaural delay of 700 μs was introduced, an average binaural masking level difference (BMLD) of 3 dB was established. Outcomes are promising in view of real-life benefits. Future research should investigate the generalization of the observed benefits for signal detection to speech perception in everyday listening situations and determine the importance of coordination of bilateral speech processors and accentuation of envelope cues.

  13. Java Library for Input and Output of Image Data and Metadata

    NASA Technical Reports Server (NTRS)

    Deen, Robert; Levoe, Steven

    2003-01-01

    A Java-language library supports input and output (I/O) of image data and metadata (label data) in the format of the Video Image Communication and Retrieval (VICAR) image-processing software and in several similar formats, including a subset of the Planetary Data System (PDS) image file format. The library does the following: It provides a low-level, direct access layer, enabling an application subprogram to read and write specific image files, lines, or pixels, and manipulate metadata directly. Two coding/decoding subprograms ("codecs" for short) based on the Java Advanced Imaging (JAI) software provide access to VICAR and PDS images in a file-format-independent manner. The VICAR and PDS codecs enable any program that conforms to the specification of the JAI codec to use VICAR or PDS images automatically, without specific knowledge of the VICAR or PDS format. The library also includes Image I/O plugin subprograms for VICAR and PDS formats. Application programs that conform to the Image I/O specification of Java version 1.4 can utilize any image format for which such a plug-in subprogram exists, without specific knowledge of the format itself. Like the aforementioned codecs, the VICAR and PDS Image I/O plug-in subprograms support reading and writing of metadata.

  14. Region of interest video coding for low bit-rate transmission of carotid ultrasound videos over 3G wireless networks.

    PubMed

    Tsapatsoulis, Nicolas; Loizou, Christos; Pattichis, Constantinos

    2007-01-01

    Efficient medical video transmission over 3G wireless is of great importance for fast diagnosis and on-site medical staff training purposes. In this paper we present a region-of-interest-based ultrasound video compression study which shows that a significant reduction of the bit rate required for transmission can be achieved without altering the design of existing video codecs. Simple preprocessing of the original videos to define visually and clinically important areas is the only requirement.

  15. Test and Evaluation of Teleconferencing Video Codecs Transmitting at 1.5 Mbps.

    DTIC Science & Technology

    1985-08-01

    video teleconferencing codecs on the market as of November 1984 to facilitate the choice of an appropriate frame format and data compression algorithm...Engineer, computer company, male 5. Chapter Officer, national civic organization, female Group Y: 6. Marketing Representative, communication systems...both monitors to give the evaluators an idea what kind of pictures they will have to judge. Special suggestions were given regarding the sequences with

  16. Standardization of End-to-End Performance of Digital Video Teleconferencing/Video Telephony Systems

    DTIC Science & Technology

    1991-12-01

    SYSTEM 3-1 end-to-end video transmission system including both firmly specified and peripheral flexible functions. The format converter changes either...which manifests itself in both subjective evaluations and objective tests. The relative importance of performance parameters is likely to change with...conventional analog performance parameters to be largely independent of bit rate, and only slightly changed between different codec models. The

  17. System on a chip with MPEG-4 capability

    NASA Astrophysics Data System (ADS)

    Yassa, Fathy; Schonfeld, Dan

    2002-12-01

    Current products supporting video communication applications rely on existing computer architectures. RISC processors have been used successfully in numerous applications over several decades. DSP processors have become ubiquitous in signal processing and communication applications. Real-time applications such as speech processing in cellular telephony rely extensively on the computational power of these processors. Video processors designed to implement the computationally intensive codec operations have also been used to address the high demands of video communication applications (e.g., cable set-top boxes and DVDs). This paper presents an overview of a system-on-chip (SOC) architecture used for real-time video in wireless communication applications. The SOC specifications answer the system requirements imposed by the application environment. A CAM-based video processor is used to accelerate data-intensive video compression tasks such as motion estimation and filtering. Other components are dedicated to system-level data processing and audio processing. A rich set of I/Os allows the SOC to communicate with other system components such as baseband and memory subsystems.

  18. The Speech multi features fusion perceptual hash algorithm based on tensor decomposition

    NASA Astrophysics Data System (ADS)

    Huang, Y. B.; Fan, M. H.; Zhang, Q. Y.

    2018-03-01

    With constant progress in modern speech communication technologies, speech data is prone to be attacked by noise or maliciously tampered with. In order to give the speech perceptual hash algorithm strong robustness and high efficiency, this paper puts forward a speech perceptual hash algorithm based on tensor decomposition and multiple features. The algorithm analyses the perceptual features of speech: each speech component is acquired by wavelet packet decomposition, and the LPCC, LSP and ISP features of each component are extracted to constitute a speech feature tensor. Speech authentication is done by generating hash values through feature matrix quantification using the median. Experimental results show that the proposed algorithm is robust to content-preserving operations compared with similar algorithms, and is able to resist the attack of common background noise. Also, the algorithm is highly efficient in terms of arithmetic, and is able to meet the real-time requirements of speech communication and complete the speech authentication quickly.
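
    A minimal sketch of the quantification and matching steps, under the assumption of a generic feature matrix in place of the paper's LPCC/LSP/ISP tensor:

        import numpy as np

        def feature_hash(features):
            """Binarize a feature matrix against its median, so
            content-preserving operations that only perturb values
            slightly flip few bits."""
            return (features > np.median(features)).astype(np.uint8).ravel()

        def bit_error_rate(h1, h2):
            """Authentication compares hashes by normalized Hamming
            distance; a small BER means 'same content', a large one
            'tampered'."""
            return float(np.mean(h1 != h2))

        f = np.random.randn(16, 8)
        noisy = f + 0.01 * np.random.randn(16, 8)  # mild, content-preserving
        print(bit_error_rate(feature_hash(f), feature_hash(noisy)))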

  19. Objective and subjective quality assessment of geometry compression of reconstructed 3D humans in a 3D virtual room

    NASA Astrophysics Data System (ADS)

    Mekuria, Rufael; Cesar, Pablo; Doumanis, Ioannis; Frisiello, Antonella

    2015-09-01

    Compression of 3D object-based video is relevant for 3D immersive applications. Nevertheless, the perceptual aspects of the degradation introduced by codecs for meshes and point clouds are not well understood. In this paper we evaluate the subjective and objective degradations introduced by such codecs in a state-of-the-art 3D immersive virtual room. In the 3D immersive virtual room, users are captured with multiple cameras, and their surfaces are reconstructed as photorealistic colored/textured 3D meshes or point clouds. To test the perceptual effect of compression and transmission, we render degraded versions with different frame rates in different contexts (near/far) in the scene. A quantitative subjective study with 16 users shows that negligible distortion of decoded surfaces compared to the original reconstructions can be achieved in the 3D virtual room. In addition, a qualitative task-based analysis in a full prototype field trial shows increased presence, emotion, and user and state recognition for the reconstructed 3D human representation compared to animated computer avatars.

  20. ISO-IEC MPEG-2 software video codec

    NASA Astrophysics Data System (ADS)

    Eckart, Stefan; Fogg, Chad E.

    1995-04-01

    Part 5 of the International Standard ISO/IEC 13818 'Generic Coding of Moving Pictures and Associated Audio' (MPEG-2) is a Technical Report, a sample software implementation of the procedures in parts 1, 2 and 3 of the standard (systems, video, and audio). This paper focuses on the video software, which gives an example of a fully compliant implementation of the standard and of a good video quality encoder, and serves as a tool for compliance testing. The implementation and some of the development aspects of the codec are described. The encoder is based on Test Model 5 (TM5), one of the best published non-proprietary coding models, which was used during the MPEG-2 collaborative stage to evaluate proposed algorithms and to verify the syntax. The most important part of the Test Model is controlling the quantization parameter based on the image content and bit rate constraints, under both signal-to-noise and psycho-optical aspects. The decoder has been successfully tested for compliance with the MPEG-2 standard, using the ISO/IEC MPEG verification and compliance bitstream test suites as stimuli.
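
    To give a flavor of that quantizer control (a simplified sketch, not the full Test Model 5 algorithm): a virtual buffer accumulates the mismatch between spent and target bits, and the quantizer scale grows with buffer fullness, tightening quantization when the encoder overshoots. The constants below are illustrative.

        def tm5_like_qscale(bits_spent, bits_target, fullness, reaction):
            """One rate-control step: update virtual-buffer fullness and
            map it to a quantizer scale clamped to the MPEG range 1..31."""
            fullness = fullness + bits_spent - bits_target
            q = max(1, min(31, round(31.0 * fullness / reaction)))
            return q, fullness

        fullness, reaction = 40000, 400000
        for frame_bits, target in [(52000, 50000), (61000, 50000),
                                   (43000, 50000)]:
            q, fullness = tm5_like_qscale(frame_bits, target,
                                          fullness, reaction)
            print(q, fullness)   # q rises after overshoot, relaxes after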

  1. High-precision two-way optic-fiber time transfer using an improved time code.

    PubMed

    Wu, Guiling; Hu, Liang; Zhang, Hao; Chen, Jianping

    2014-11-01

    We present a novel high-precision two-way optic-fiber time transfer scheme. The Inter-Range Instrumentation Group (IRIG-B) time code is modified by increasing the bit rate and defining new fields. The modified time code can be transmitted directly using commercial optical transceivers and is able to efficiently suppress the effect of Rayleigh backscattering in the optical fiber. A dedicated codec (encoder and decoder) with low delay fluctuation is developed. The synchronization issue is addressed by adopting a mask technique and combinational logic circuit. Its delay fluctuation is less than 27 ps in terms of the standard deviation. The two-way optic-fiber time transfer using the improved codec scheme is verified experimentally over fiber links from 2 m to 100 km. The results show that the stability over the 100 km fiber link is always better than 35 ps, with a minimum value of about 2 ps at an averaging time of around 1000 s. The uncertainty of the time difference induced by chromatic dispersion over 100 km is less than 22 ps.
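
    The underlying two-way arithmetic is standard and worth stating: with a symmetric path, the exchanged timestamps cancel the propagation delay. A toy computation (the general principle, not the paper's codec) follows.

        def two_way_offset(t1, t2, t3, t4):
            """Two-way time-transfer arithmetic: site A sends at t1 (A's
            clock), B receives at t2 and replies at t3 (B's clock), A
            receives at t4. With a symmetric path, B's clock offset
            relative to A is ((t2 - t1) - (t4 - t3)) / 2, and the one-way
            delay cancels."""
            return ((t2 - t1) - (t4 - t3)) / 2.0

        # Toy numbers: true offset 40 ns, one-way delay 500 us (seconds).
        off, d = 40e-9, 500e-6
        t1 = 0.0
        t2 = t1 + d + off       # B's clock runs 'off' ahead
        t3 = t2 + 1e-6          # turnaround time at B
        t4 = t3 - off + d       # arrival back on A's clock
        print(two_way_offset(t1, t2, t3, t4))   # recovers ~4e-08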

  2. Realization of guitar audio effects using methods of digital signal processing

    NASA Astrophysics Data System (ADS)

    Buś, Szymon; Jedrzejewski, Konrad

    2015-09-01

    The paper is devoted to studies on the possibilities of realizing guitar audio effects by means of digital signal processing methods. As a result of this research, selected audio effects corresponding to the specifics of guitar sound were realized as a real-time system called the Digital Guitar Multi-effect. Before implementation in the system, the selected effects were investigated using a dedicated application with a graphical user interface created in the Matlab environment. In the second stage, a real-time system based on a microcontroller and an audio codec was designed and realized. The system is designed to perform audio effects on the output signal of an electric guitar.

  3. Audio Classification in Speech and Music: A Comparison between a Statistical and a Neural Approach

    NASA Astrophysics Data System (ADS)

    Bugatti, Alessandro; Flammini, Alessandra; Migliorati, Pierangelo

    2002-12-01

    We focus attention on the problem of audio classification into speech and music for multimedia applications. In particular, we present a comparison between two different techniques for speech/music discrimination. The first method is based on zero-crossing rate and Bayesian classification. It is very simple from a computational point of view, and gives good results in the case of pure music or speech. The simulation results show that some performance degradation arises when the music segment also contains some speech superimposed on music, or strong rhythmic components. To overcome these problems, we propose a second method that uses more features and is based on neural networks (specifically a multi-layer perceptron). In this case we obtain better performance, at the expense of a limited growth in computational complexity. In practice, the proposed neural network is simple to implement if a suitable polynomial is used as the activation function, and a real-time implementation is possible even if low-cost embedded systems are used.
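
    The first method's key statistic is cheap to compute; a sketch of the zero-crossing rate follows (frame handling and the Bayesian threshold are omitted).

        import numpy as np

        def zero_crossing_rate(frame):
            """Fraction of adjacent sample pairs with opposite sign.
            Speech alternates voiced (low ZCR) and unvoiced (high ZCR)
            segments, so its ZCR varies over time more than music's,
            which is the kind of statistic a Bayesian classifier can
            threshold."""
            signs = np.sign(frame)
            return np.mean(signs[1:] != signs[:-1])

        fs = 16000
        t = np.arange(fs) / fs
        tone = np.sin(2 * np.pi * 440 * t)   # music-like: steady, low ZCR
        noise = np.random.randn(fs)          # unvoiced-like: high ZCR
        print(zero_crossing_rate(tone), zero_crossing_rate(noise))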

  4. Dynamic frame resizing with convolutional neural network for efficient video compression

    NASA Astrophysics Data System (ADS)

    Kim, Jaehwan; Park, Youngo; Choi, Kwang Pyo; Lee, JongSeok; Jeon, Sunyoung; Park, JeongHoon

    2017-09-01

    In the past, video codecs such as VC-1 and H.263 used a technique of encoding reduced-resolution video and restoring the original resolution at the decoder to improve coding efficiency. The techniques in VC-1 and H.263 Annex Q are called dynamic frame resizing and reduced-resolution update mode, respectively. However, these techniques have not been widely used due to limited performance improvements that materialize only under specific conditions. In this paper, a video frame resizing (reduction/restoration) technique based on machine learning is proposed to improve coding efficiency. The proposed method produces low-resolution video with a convolutional neural network (CNN) in the encoder and reconstructs the original resolution with a CNN in the decoder. The proposed method shows improved subjective performance over all the high-resolution videos that dominate current consumption. In order to assess the subjective quality of the proposed method, Video Multi-method Assessment Fusion (VMAF), which has shown high reliability among many subjective measurement tools, was used as the subjective metric. Moreover, to assess general performance, diverse bitrates were tested. Experimental results showed that the BD-rate based on VMAF was improved by about 51% compared to conventional HEVC. In particular, VMAF values were significantly improved at low bitrates. Also, in subjective testing the method delivered better visual quality at similar bit rates.

  5. Flexible high-speed CODEC

    NASA Technical Reports Server (NTRS)

    Segallis, Greg P.; Wernlund, Jim V.; Corry, Glen

    1993-01-01

    This report is prepared by Harris Government Communication Systems Division for NASA Lewis Research Center under contract NAS3-25087. It is written in accordance with SOW section 4.0 (d) as detailed in section 2.6. The purpose of this document is to provide a summary of the program, performance results and analysis, and a technical assessment. The purpose of this program was to develop a flexible, high-speed CODEC that provides substantial coding gain while maintaining bandwidth efficiency for use in both continuous and bursted data environments for a variety of applications.

  6. Architecture design of motion estimation for ITU-T H.263

    NASA Astrophysics Data System (ADS)

    Ku, Chung-Wei; Lin, Gong-Sheng; Chen, Liang-Gee; Lee, Yung-Ping

    1997-01-01

    Digitalized video and audio systems have become the trend in multimedia, because they provide great performance in quality and feasibility of processing. However, as a huge amount of information is needed while the bandwidth is limited, data compression plays an important role in such systems. For example, a 176 x 144 monochrome sequence at 10 frames/sec requires a bandwidth of about 2 Mbps; this wastes much channel resource and limits the applications. MPEG (Moving Picture Experts Group) standardized a video codec scheme that achieves a high compression ratio while providing good quality. MPEG-1 is used for frame sizes of about 352 x 240 at 30 frames per second, and MPEG-2 provides scalability and can be applied to scenes with higher definition, such as HDTV (high-definition television). On the other hand, some applications require very low bit rates, such as videophone and video-conferencing. Because the channel bandwidth of the telephone network is very limited, a very high compression ratio is required. ITU-T announced the H.263 video coding standard to meet these requirements [8]. According to the simulation results of TMN-5 [22], it outperforms H.261 with little overhead in complexity. Since wireless communication is the trend of the near future, low-power design of the video codec is an important issue for portable visual telephones. Motion estimation is the most computation-consuming part of the whole video codec: about 60% of the encoder's computation is spent on it. Several architectures have been proposed for efficient processing of block matching algorithms. In this paper, in order to meet the requirements of H.263 and the expectation of low power consumption, a modified sandwich architecture from [21] is proposed. Based on a parallel processing philosophy, low power is expected, and the generation of either one motion vector or four motion vectors with half-pixel accuracy is achieved concurrently. In addition, we present our solution for handling the other additional modes in H.263 with the proposed architecture.
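
    For reference, the operation the architecture accelerates is exhaustive block matching; a plain sequential sketch with an assumed SAD criterion and search range follows (the paper's contribution is the parallel hardware mapping, not this loop).

        import numpy as np

        def full_search(cur_blk, ref, by, bx, search=7):
            """Exhaustive block matching: return the motion vector
            minimizing the sum of absolute differences (SAD) within
            +/- 'search' pixels of block position (by, bx)."""
            n = cur_blk.shape[0]
            best, mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if (y < 0 or x < 0 or
                            y + n > ref.shape[0] or x + n > ref.shape[1]):
                        continue
                    sad = np.sum(np.abs(cur_blk.astype(np.int16)
                                        - ref[y:y + n, x:x + n].astype(np.int16)))
                    if best is None or sad < best:
                        best, mv = sad, (dy, dx)
            return mv, best

        ref = np.random.randint(0, 256, (64, 64)).astype(np.uint8)
        cur = np.roll(ref, (2, -1), axis=(0, 1))   # global shift by (2, -1)
        blk = cur[16:32, 16:32]
        print(full_search(blk, ref, 16, 16))       # expect mv (-2, 1), SAD 0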

  7. The development of co-speech gesture in the communication of children with autism spectrum disorders.

    PubMed

    Sowden, Hannah; Clegg, Judy; Perkins, Michael

    2013-12-01

    Co-speech gestures have a close semantic relationship to speech in adult conversation. In typically developing children co-speech gestures which give additional information to speech facilitate the emergence of multi-word speech. A difficulty with integrating audio-visual information is known to exist for individuals with Autism Spectrum Disorder (ASD), which may affect development of the speech-gesture system. A longitudinal observational study was conducted with four children with ASD, aged 2;4 to 3;5 years. Participants were video-recorded for 20 min every 2 weeks during their attendance on an intervention programme. Recording continued for up to 8 months, thus affording a rich analysis of gestural practices from pre-verbal to multi-word speech across the group. All participants combined gesture with either speech or vocalisations. Co-speech gestures providing additional information to speech were observed to be either absent or rare. Findings suggest that children with ASD do not make use of the facilitating communicative effects of gesture in the same way as typically developing children.

  8. Dereverberation and denoising based on generalized spectral subtraction by multi-channel LMS algorithm using a small-scale microphone array

    NASA Astrophysics Data System (ADS)

    Wang, Longbiao; Odani, Kyohei; Kai, Atsuhiko

    2012-12-01

    A blind dereverberation method based on power spectral subtraction (SS) using a multi-channel least mean squares algorithm was previously proposed to suppress reverberant speech without additive noise. The results of isolated word speech recognition experiments showed that this method achieved significant improvements over conventional cepstral mean normalization (CMN) in a reverberant environment. In this paper, we propose a blind dereverberation method based on generalized spectral subtraction (GSS), which has been shown to be effective for noise reduction, instead of power SS. Furthermore, we extend the missing feature theory (MFT), which was initially proposed to enhance robustness against additive noise, to dereverberation. A one-stage dereverberation and denoising method based on GSS is presented to simultaneously suppress both the additive noise and the nonstationary multiplicative noise (reverberation). The proposed GSS-based dereverberation method with MFT is evaluated on a large vocabulary continuous speech recognition task. When the additive noise was absent, the GSS-based dereverberation method with MFT using only 2 microphones achieved relative word error reduction rates of 11.4% and 32.6% compared to the power-SS-based dereverberation method and conventional CMN, respectively. For reverberant and noisy speech, the GSS-based dereverberation and denoising method achieved a relative word error reduction rate of 12.8% compared to conventional CMN combined with a GSS-based additive noise reduction method. We also analyze the factors affecting the compensation parameter estimation for the SS-based dereverberation method, such as the number of channels (microphones), the length of the reverberation to be suppressed, and the length of the utterance used for parameter estimation. The experimental results showed that the SS-based method is robust in a variety of reverberant environments, for both isolated and continuous speech recognition, and under various parameter estimation conditions.
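
    As a rough illustration of the spectral-subtraction family used here, the sketch below applies generalized spectral subtraction to STFT magnitudes. The exponent gamma, the over-subtraction factor alpha, the spectral floor, and the late-reverberation estimate are placeholder parameters, not the paper's blindly estimated values.

        import numpy as np
        from scipy.signal import stft, istft

        def gss_dereverb(x, fs, late_mag_est, gamma=1.0, alpha=1.0,
                         floor=0.05, nperseg=512):
            """Generalized spectral subtraction:
            |S|^g = max(|Y|^g - alpha * |R|^g, (floor * |Y|)^g).
            late_mag_est: per-bin magnitude estimate of late reverberation,
            shaped to broadcast against the (freq, time) STFT (in practice
            obtained blindly, e.g., via a multi-channel LMS algorithm)."""
            _, _, Y = stft(x, fs, nperseg=nperseg)
            mag, phase = np.abs(Y), np.angle(Y)
            sub = mag**gamma - alpha * late_mag_est**gamma
            sub = np.maximum(sub, (floor * mag)**gamma)   # apply spectral floor
            _, y = istft(sub**(1.0 / gamma) * np.exp(1j * phase), fs,
                         nperseg=nperseg)
            return y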

  9. On-chip frame memory reduction using a high-compression-ratio codec in the overdrives of liquid-crystal displays

    NASA Astrophysics Data System (ADS)

    Wang, Jun; Min, Kyeong-Yuk; Chong, Jong-Wha

    2010-11-01

    Overdrive is commonly used to reduce the liquid-crystal response time and motion blur in liquid-crystal displays (LCDs). However, overdrive requires a large frame memory in order to store the previous frame for reference. In this paper, a high-compression-ratio codec is presented to compress the image data stored in the on-chip frame memory so that only 1 Mbit of on-chip memory is required in the LCD overdrives of mobile devices. The proposed algorithm further compresses the color bitmaps and representative values (RVs) resulting from block truncation coding (BTC). The color bitmaps are represented by a luminance bitmap, which is further reduced and reconstructed using median filter interpolation in the decoder, while the RVs are compressed using adaptive quantization coding (AQC). Interpolation and AQC can provide three-level compression, which leads to 16 combinations. Using a rate-distortion analysis, we select the three optimal schemes to compress the image data for video graphics array (VGA), wide-VGA LCD, and standard-definition TV applications. Our simulation results demonstrate that the proposed schemes outperform interpolation BTC both in PSNR (by 1.479 to 2.205 dB) and in subjective visual quality.
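
    For reference, classic block truncation coding, the base layer that the paper's interpolation and AQC stages further compress, can be sketched as follows. This toy version works on one grayscale block and ignores the paper's exact bit layout.

        import numpy as np

        def btc_encode(block):
            """Classic BTC: keep a 1-bit-per-pixel bitmap plus two representative
            values chosen to preserve the block's mean and variance."""
            mean, std = block.mean(), block.std()
            bitmap = block >= mean
            q, m = int(bitmap.sum()), block.size   # pixels above / total
            if q in (0, m):
                return bitmap, mean, mean          # flat block
            lo = mean - std * np.sqrt(q / (m - q))
            hi = mean + std * np.sqrt((m - q) / q)
            return bitmap, lo, hi

        def btc_decode(bitmap, lo, hi):
            return np.where(bitmap, hi, lo)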

  10. SCTP as scalable video coding transport

    NASA Astrophysics Data System (ADS)

    Ortiz, Jordi; Graciá, Eduardo Martínez; Skarmeta, Antonio F.

    2013-12-01

    This study presents an evaluation of the Stream Control Transmission Protocol (SCTP) for the transport of scalable video coding (SVC), proposed by MPEG as an extension to H.264/AVC. The two technologies fit together well. On the one hand, SVC makes it easy to split the bitstream into substreams carrying different video layers, each with different importance for the reconstruction of the complete video sequence at the receiver end. On the other hand, SCTP includes features, such as multi-streaming and multi-homing, that allow the SVC layers to be transported robustly and efficiently. Several transmission strategies supported on baseline SCTP and its concurrent multipath transfer (CMT) extension are compared with classical solutions based on the Transmission Control Protocol (TCP) and the Real-time Transport Protocol (RTP). Using ns-2 simulations, it is shown that CMT-SCTP outperforms TCP and RTP in error-prone networking environments. The comparison is established according to several performance measurements, including delay, throughput, packet loss, and peak signal-to-noise ratio of the received video.

  11. Digital CODEC for real-time processing of broadcast quality video signals at 1.8 bits/pixel

    NASA Technical Reports Server (NTRS)

    Shalkhauser, Mary Jo; Whyte, Wayne A., Jr.

    1989-01-01

    Advances in very large-scale integration and recent work in the field of bandwidth efficient digital modulation techniques have combined to make digital video processing technically feasible and potentially cost competitive for broadcast quality television transmission. A hardware implementation was developed for a DPCM-based digital television bandwidth compression algorithm which processes standard NTSC composite color television signals and produces broadcast quality video in real time at an average of 1.8 bits/pixel. The data compression algorithm and the hardware implementation of the CODEC are described, and performance results are provided.
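
    As background, the DPCM principle underlying this codec can be sketched in a few lines. This toy version uses a previous-sample predictor and a uniform quantizer; the actual hardware's predictor and quantizer are more elaborate, so treat this purely as an illustration of the technique.

        import numpy as np

        def dpcm_encode(samples, step=4):
            """Toy DPCM: quantize the difference between each sample and the
            prediction (the previous reconstructed sample)."""
            pred, codes = 0.0, []
            for s in samples:
                q = int(round((s - pred) / step))   # quantized prediction error
                codes.append(q)
                pred = pred + q * step              # track decoder reconstruction
            return codes

        def dpcm_decode(codes, step=4):
            pred, out = 0.0, []
            for q in codes:
                pred = pred + q * step
                out.append(pred)
            return np.array(out)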

  12. Digital CODEC for real-time processing of broadcast quality video signals at 1.8 bits/pixel

    NASA Technical Reports Server (NTRS)

    Shalkhauser, Mary Jo; Whyte, Wayne A.

    1991-01-01

    Advances in very large scale integration and recent work in the field of bandwidth efficient digital modulation techniques have combined to make digital video processing technically feasible and potentially cost competitive for broadcast quality television transmission. A hardware implementation was developed for a DPCM (differential pulse code modulation)-based digital television bandwidth compression algorithm which processes standard NTSC composite color television signals and produces broadcast quality video in real time at an average of 1.8 bits/pixel. The data compression algorithm and the hardware implementation of the codec are described, and performance results are provided.

  13. Study of Potential Standardization of Digital Freeze Frame Video Codecs.

    DTIC Science & Technology

    1984-01-01

    ...and MAR track an input clock over a very wide range. These are dependent on the modem used in any specific application. Interface connectors are those...terminals, 56K bit digital transmission sets). We have a limited custom capability and are not in the custom unit business. ...will) are designed for narrowband operation. We build our own modems, which send pixels at a rate of 1969 pixels/second. Grey scale information is...

  14. Multi-stream LSTM-HMM decoding and histogram equalization for noise robust keyword spotting.

    PubMed

    Wöllmer, Martin; Marchi, Erik; Squartini, Stefano; Schuller, Björn

    2011-09-01

    Highly spontaneous, conversational, and potentially emotional and noisy speech is known to be a challenge for today's automatic speech recognition (ASR) systems, which highlights the need for advanced algorithms that improve speech features and models. Histogram equalization is an efficient method to reduce the mismatch between clean and noisy conditions by normalizing all moments of the probability distribution of the feature vector components. In this article, we propose to combine histogram equalization and multi-condition training for robust keyword detection in noisy speech. To better cope with conversational speaking styles, we show how contextual information can be effectively exploited in a multi-stream ASR framework that dynamically models context-sensitive phoneme estimates generated by a long short-term memory neural network. The proposed techniques are evaluated on the SEMAINE database, a corpus containing emotionally colored conversations with a cognitive system for "Sensitive Artificial Listening".
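
    Feature-space histogram equalization of this kind maps each feature component's empirical distribution onto a reference distribution. A minimal sketch follows, mapping onto a standard Gaussian via rank statistics; the choice of reference distribution is an assumption for illustration, not necessarily the paper's.

        import numpy as np
        from scipy.stats import norm

        def hist_eq_features(X):
            """Histogram equalization per feature dimension.
            X: (frames, dims) array; returns features whose marginal
            distributions approximate a standard normal."""
            out = np.empty_like(X, dtype=float)
            n = X.shape[0]
            for d in range(X.shape[1]):
                ranks = np.argsort(np.argsort(X[:, d]))   # ranks 0..n-1
                cdf = (ranks + 0.5) / n                   # empirical CDF values
                out[:, d] = norm.ppf(cdf)                 # inverse Gaussian CDF
            return out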

  15. Automated Discovery of Speech Act Categories in Educational Games

    ERIC Educational Resources Information Center

    Rus, Vasile; Moldovan, Cristian; Niraula, Nobal; Graesser, Arthur C.

    2012-01-01

    In this paper we address the important task of automated discovery of speech act categories in dialogue-based, multi-party educational games. Speech acts are important in dialogue-based educational systems because they help infer the student speaker's intentions (the task of speech act classification) which in turn is crucial to providing adequate…

  16. Robust video transmission with distributed source coded auxiliary channel.

    PubMed

    Wang, Jiajun; Majumdar, Abhik; Ramchandran, Kannan

    2009-12-01

    We propose a novel solution to the problem of robust, low-latency video transmission over lossy channels. Predictive video codecs, such as MPEG and H.26x, are very susceptible to prediction mismatch between encoder and decoder or "drift" when there are packet losses. These mismatches lead to a significant degradation in the decoded quality. To address this problem, we propose an auxiliary codec system that sends additional information alongside an MPEG or H.26x compressed video stream to correct for errors in decoded frames and mitigate drift. The proposed system is based on the principles of distributed source coding and uses the (possibly erroneous) MPEG/H.26x decoder reconstruction as side information at the auxiliary decoder. The distributed source coding framework depends upon knowing the statistical dependency (or correlation) between the source and the side information. We propose a recursive algorithm to analytically track the correlation between the original source frame and the erroneous MPEG/H.26x decoded frame. Finally, we propose a rate-distortion optimization scheme to allocate the rate used by the auxiliary encoder among the encoding blocks within a video frame. We implement the proposed system and present extensive simulation results that demonstrate significant gains in performance both visually and objectively (on the order of 2 dB in PSNR over forward error correction based solutions and 1.5 dB in PSNR over intrarefresh based solutions for typical scenarios) under tight latency constraints.

  17. Multiframe video coding for improved performance over wireless channels.

    PubMed

    Budagavi, M; Gibson, J D

    2001-01-01

    We propose and evaluate a multi-frame extension to block motion compensation (BMC) coding of videoconferencing-type video signals for wireless channels. The multi-frame BMC (MF-BMC) coder makes use of the redundancy that exists across multiple frames in typical videoconferencing sequences to achieve additional compression over that obtained by using the single-frame BMC (SF-BMC) approach, such as in the base-level H.263 codec. The MF-BMC approach also has an inherent ability to overcome some transmission errors and is thus more robust than the SF-BMC approach. We model the error propagation process in MF-BMC coding as a multiple Markov chain and use Markov chain analysis to infer that the use of multiple frames in motion compensation increases robustness. The Markov chain analysis is also used to devise a simple scheme which randomizes the selection of the frame (amongst the multiple previous frames) used in BMC to achieve additional robustness. The proposed MF-BMC coders are a multi-frame extension of the base-level H.263 coder and are found to be more robust than the base-level H.263 coder when subjected to simulated errors commonly encountered on wireless channels.

  18. Wavelet-based scalable L-infinity-oriented compression.

    PubMed

    Alecu, Alin; Munteanu, Adrian; Cornelis, Jan P H; Schelkens, Peter

    2006-09-01

    Among the different classes of coding techniques proposed in the literature, predictive schemes have proven their outstanding performance in near-lossless compression. However, these schemes are incapable of providing embedded L∞-oriented compression, or, at most, provide a very limited number of potential L∞ bit-stream truncation points. We propose a new multidimensional wavelet-based L∞-constrained scalable coding framework that generates a fully embedded L∞-oriented bit stream and that retains the coding performance and all the scalability options of state-of-the-art L2-oriented wavelet codecs. Moreover, our codec instantiation of the proposed framework clearly outperforms JPEG2000 in the L∞ coding sense.

  19. Toward objective image quality metrics: the AIC Eval Program of the JPEG

    NASA Astrophysics Data System (ADS)

    Richter, Thomas; Larabi, Chaker

    2008-08-01

    Objective quality assessment of lossy image compression codecs is an important part of the recent call of the JPEG for Advanced Image Coding. The target of the AIC ad-hoc group is twofold: first, to receive state-of-the-art still image codecs and to propose suitable technology for standardization; and second, to study objective image quality metrics to evaluate the performance of such codecs. Even though the performance of an objective metric is defined by how well it predicts the outcome of a subjective assessment, one can also study the usefulness of a metric indirectly, in a non-traditional way, namely by measuring the subjective quality improvement of a codec that has been optimized for a specific objective metric. This approach is demonstrated here on the recently proposed HDPhoto format introduced by Microsoft and an SSIM-tuned version of it by one of the authors. We compare these two implementations with JPEG in two variations and a visually and PSNR-optimal JPEG2000 implementation. To this end, we use subjective and objective tests based on the multiscale SSIM and a new DCT-based metric.
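
    To make the metric side concrete, SSIM- and PSNR-style comparisons can be run with off-the-shelf tools. The sketch below assumes scikit-image and uses single-scale SSIM rather than the multiscale variant used by the group; images and codec names are placeholders.

        from skimage.metrics import structural_similarity, peak_signal_noise_ratio

        def compare_codecs(reference, decoded_a, decoded_b):
            """Score two decoded images against the pristine reference.
            All inputs: uint8 grayscale arrays of identical shape."""
            for name, img in (("codec A", decoded_a), ("codec B", decoded_b)):
                s = structural_similarity(reference, img, data_range=255)
                p = peak_signal_noise_ratio(reference, img, data_range=255)
                print(f"{name}: SSIM={s:.4f}  PSNR={p:.2f} dB")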

  20. Speech-feature discrimination in children with Asperger syndrome as determined with the multi-feature mismatch negativity paradigm.

    PubMed

    Kujala, T; Kuuluvainen, S; Saalasti, S; Jansson-Verkasalo, E; von Wendt, L; Lepistö, T

    2010-09-01

    Asperger syndrome, belonging to the autistic spectrum of disorders, involves deficits in social interaction and prosodic use of language but normal development of formal language abilities. Auditory processing involves both hyper- and hyporeactivity to acoustic changes. Responses composed of mismatch negativity (MMN) and obligatory components were recorded for five types of deviations in syllables (vowel, vowel duration, consonant, syllable frequency, syllable intensity) with the multi-feature paradigm from 8- to 12-year-old children with Asperger syndrome. Children with Asperger syndrome had larger MMNs for intensity changes and smaller MMNs for frequency changes than typically developing children, whereas no MMN group differences were found for the other deviant stimuli. Furthermore, children with Asperger syndrome performed more poorly than controls on the Comprehension of Instructions subtest of a language test battery. Cortical speech-sound discrimination is aberrant in children with Asperger syndrome. This is evident both as hypersensitive and depressed neural reactions to speech-sound changes, and is associated with features (frequency, intensity) that are relevant for prosodic processing. The multi-feature MMN paradigm, which includes variation and thereby resembles natural speech hearing circumstances, suggests an abnormal pattern of speech discrimination in Asperger syndrome, including both hypo- and hypersensitive responses to speech features. 2010 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  1. Relative Salience of Speech Rhythm and Speech Rate on Perceived Foreign Accent in a Second Language.

    PubMed

    Polyanskaya, Leona; Ordin, Mikhail; Busa, Maria Grazia

    2017-09-01

    We investigated the independent contribution of speech rate and speech rhythm to perceived foreign accent. To address this issue we used a resynthesis technique that allows neutralizing segmental and tonal idiosyncrasies between identical sentences produced by French learners of English at different proficiency levels and maintaining the idiosyncrasies pertaining to prosodic timing patterns. We created stimuli that (1) preserved the idiosyncrasies in speech rhythm while controlling for the differences in speech rate between the utterances; (2) preserved the idiosyncrasies in speech rate while controlling for the differences in speech rhythm between the utterances; and (3) preserved the idiosyncrasies both in speech rate and speech rhythm. All the stimuli were created in intoned (with imposed intonational contour) and flat (with monotonized, constant F0) conditions. The original and the resynthesized sentences were rated by native speakers of English for degree of foreign accent. We found that both speech rate and speech rhythm influence the degree of perceived foreign accent, but the effect of speech rhythm is larger than that of speech rate. We also found that intonation enhances the perception of fine differences in rhythmic patterns but reduces the perceptual salience of fine differences in speech rate.

  2. An improvement analysis on video compression using file segmentation

    NASA Astrophysics Data System (ADS)

    Sharma, Shubhankar; Singh, K. John; Priya, M.

    2017-11-01

    Over the past two decades, the extreme growth of the Internet has led to a massive rise in video technology and in video consumption over the Internet, which now accounts for the bulk of data traffic. Because video occupies so much data volume on the World Wide Web, many video codecs, such as HEVC/H.265 and VP9, have been developed to reduce the burden on the Internet and the bandwidth consumed by video, so that users can access video data more easily. Codecs like these, however, raise the question of which offers the better technology in terms of rate-distortion performance and coding standard. This paper offers a solution to the difficulty of achieving low delay in video compression and in video applications, e.g., ad-hoc video conferencing/streaming or surveillance monitoring. It also benchmarks the HEVC and VP9 compression techniques using subjective evaluations of High Definition video content played back in web browsers. Moreover, it presents an experimental approach of dividing a video file into several segments for compression and reassembling them afterwards, to improve the efficiency of video compression on the web as well as in offline mode.
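
    The segment-and-recombine idea can be prototyped with standard tools. The sketch below assumes the ffmpeg CLI is on PATH; the segment length, encoder, and CRF setting are arbitrary illustrative choices, not the paper's.

        import glob
        import subprocess

        def segment_encode_concat(src, seg_time=10, out="recombined.mp4"):
            """Split a video into segments, re-encode each with HEVC, then
            concatenate the encoded pieces without another re-encode.
            Note: stream-copy segmentation cuts at keyframes only."""
            subprocess.run(["ffmpeg", "-i", src, "-c", "copy", "-f", "segment",
                            "-segment_time", str(seg_time), "seg%03d.mp4"],
                           check=True)
            encoded = []
            for seg in sorted(glob.glob("seg*.mp4")):
                enc = "enc_" + seg
                subprocess.run(["ffmpeg", "-i", seg, "-c:v", "libx265",
                                "-crf", "28", enc], check=True)
                encoded.append(enc)
            with open("list.txt", "w") as f:
                f.writelines(f"file '{e}'\n" for e in encoded)
            subprocess.run(["ffmpeg", "-f", "concat", "-safe", "0",
                            "-i", "list.txt", "-c", "copy", out], check=True)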

  3. Can you hear my age? Influences of speech rate and speech spontaneity on estimation of speaker age

    PubMed Central

    Skoog Waller, Sara; Eriksson, Mårten; Sörqvist, Patrik

    2015-01-01

    Cognitive hearing science is mainly about the study of how cognitive factors contribute to speech comprehension, but cognitive factors also partake in speech processing to infer non-linguistic information from speech signals, such as the intentions of the talker and the speaker’s age. Here, we report two experiments on age estimation by “naïve” listeners. The aim was to study how speech rate influences estimation of speaker age by comparing the speakers’ natural speech rate with increased or decreased speech rate. In Experiment 1, listeners were presented with audio samples of read speech from three different speaker age groups (young, middle aged, and old adults). They estimated the speakers as younger when speech rate was faster than normal and as older when speech rate was slower than normal. This speech rate effect was slightly greater in magnitude for older (60–65 years) speakers in comparison with younger (20–25 years) speakers, suggesting that speech rate may gain greater importance as a perceptual age cue with increased speaker age. This pattern was more pronounced in Experiment 2, in which listeners estimated age from spontaneous speech. Faster speech rate was associated with lower age estimates, but only for older and middle aged (40–45 years) speakers. Taken together, speakers of all age groups were estimated as older when speech rate decreased, except for the youngest speakers in Experiment 2. The absence of a linear speech rate effect in estimates of younger speakers, for spontaneous speech, implies that listeners use different age estimation strategies or cues (possibly vocabulary) depending on the age of the speaker and the spontaneity of the speech. Potential implications for forensic investigations and other applied domains are discussed. PMID:26236259

  4. The Nationwide Speech Project: A multi-talker multi-dialect speech corpus

    NASA Astrophysics Data System (ADS)

    Clopper, Cynthia G.; Pisoni, David B.

    2004-05-01

    Most research on regional phonological variation relies on field recordings of interview speech. Recent research on the perception of dialect variation by naive listeners, however, has relied on read sentence materials in order to control for phonological and lexical content and syntax. The Nationwide Speech Project corpus was designed to obtain a large amount of speech from a number of talkers representing different regional varieties of American English. Five male and five female talkers from each of six different dialect regions in the United States were recorded reading isolated words, sentences, and passages, and in conversations with the experimenter. The talkers ranged in age from 18 to 25 years old and were all monolingual native speakers of American English. They had lived their entire lives in one dialect region, and both of their parents were raised in the same region. Results of an acoustic analysis of the vowel spaces of the talkers included in the Nationwide Speech Project will be presented. [Work supported by NIH.]

  5. Investigating Holistic Measures of Speech Prosody

    ERIC Educational Resources Information Center

    Cunningham, Dana Aliel

    2012-01-01

    Speech prosody is a multi-faceted dimension of speech which can be measured and analyzed in a variety of ways. In this study, the speech prosody of Mandarin L1 speakers, English L2 speakers, and English L1 speakers was assessed by trained raters who listened to sound clips of the speakers responding to a graph prompt and reading a short passage.…

  6. Sound-direction identification, interaural time delay discrimination, and speech intelligibility advantages in noise for a bilateral cochlear implant user.

    PubMed

    Van Hoesel, Richard; Ramsden, Richard; Odriscoll, Martin

    2002-04-01

    To characterize some of the benefits available from using two cochlear implants compared with just one, sound-direction identification (ID) abilities, sensitivity to interaural time delays (ITDs) and speech intelligibility in noise were measured for a bilateral multi-channel cochlear implant user. Sound-direction ID in the horizontal plane was tested with a bilateral cochlear implant user. The subject was tested both unilaterally and bilaterally using two independent behind-the-ear ESPRIT (Cochlear Ltd.) processors, as well as bilaterally using custom research processors. Pink noise bursts were presented using an 11-loudspeaker array spanning the subject's frontal 180 degrees arc in an anechoic room. After each burst, the subject was asked to identify which loudspeaker had produced the sound. No explicit training and no feedback were given. Presentation levels were nominally at 70 dB SPL, except for a repeat experiment using the clinical devices where the presentation levels were reduced to 60 dB SPL to avoid activation of the devices' automatic gain control (AGC) circuits. Overall presentation levels were randomly varied by +/- 3 dB. For the research processor, a "low-update-rate" and a "high-update-rate" strategy were tested. Direct measurements of ITD just noticeable differences (JNDs) were made using a 3 AFC paradigm targeting 70% correct performance on the psychometric function. Stimuli included simple, low-rate electrical pulse trains as well as high-rate pulse trains modulated at 100 Hz. Speech data comparing monaural and binaural performance in noise were also collected with both low- and high-update-rate strategies on the research processors. Open-set sentences were presented from directly in front of the subject and competing multi-talker babble noise was presented from the same loudspeaker, or from a loudspeaker placed 90 degrees to the left or right of the subject. For the sound-direction ID task, monaural performance using the clinical devices showed large mean absolute errors of 81 degrees and 73 degrees, with standard deviations (averaged across all 11 loudspeakers) of 10 degrees and 17 degrees, for left and right ears, respectively. For bilateral device use at a presentation level of 70 dB SPL, the mean error improved to about 16 degrees with an average standard deviation of 18 degrees. When the presentation level was decreased to 60 dB SPL to avoid activation of the AGC circuits in the clinical processors, the mean response error improved further to 8 degrees with a standard deviation of 13 degrees. Further tests with the custom research processors, which had a higher stimulation rate and did not include AGCs, showed comparable response errors: around 8 or 9 degrees and a standard deviation of about 11 degrees for both update rates. The best ITD JNDs measured for this subject were between 350 and 400 microsec for simple low-rate pulse trains. Speech results showed a substantial headshadow advantage for bilateral device use when speech and noise were spatially separated, but little evidence of binaural unmasking. For spatially coincident speech and noise, listening with both ears showed similar results to listening with either side alone when loudness summation was compensated for. No significant differences were observed between binaural results for high and low update rates in any test configuration. Only for monaural listening in one test configuration did the high rate show a small significant improvement over the low rate.
Results show that even if interaural time delay cues are not well coded or perceived, bilateral implants can offer important advantages, both for speech in noise as well as for sound-direction identification.

  7. Intentional changes in sound pressure level and rate: their impact on measures of respiration, phonation, and articulation.

    PubMed

    Dromey, C; Ramig, L O

    1998-10-01

    The purpose of the study was to compare the effects of changing sound pressure level (SPL) and rate on respiratory, phonatory, and articulatory behavior during sentence production. Ten subjects, 5 men and 5 women, repeated the sentence, "I sell a sapapple again," under 5 SPL and 5 rate conditions. From a multi-channel recording, measures were made of lung volume (LV), SPL, fundamental frequency (F0), semitone standard deviation (STSD), and upper and lower lip displacements and peak velocities. Loud speech led to increases in LV initiation, LV termination, F0, STSD, and articulatory displacements and peak velocities for both lips. Token-to-token variability in these articulatory measures generally decreased as SPL increased, whereas rate increases were associated with increased lip movement variability. LV excursion decreased as rate increased. F0 for the men and STSD for both genders increased with rate. Lower lip displacements became smaller for faster speech. The interspeaker differences in velocity change as a function of rate contrasted with the more consistent velocity performance across speakers for changes in SPL. Because SPL and rate change are targeted in therapy for dysarthria, the present data suggest directions for future research with disordered speakers.

  8. The effect of speech rate on stuttering frequency, phonated intervals, speech effort, and speech naturalness during chorus reading.

    PubMed

    Davidow, Jason H; Ingham, Roger J

    2013-01-01

    This study examined the effect of speech rate on phonated intervals (PIs), in order to test whether a reduction in the frequency of short PIs is an important part of the fluency-inducing mechanism of chorus reading. The influence of speech rate on stuttering frequency, speaker-judged speech effort, and listener-judged naturalness was also examined. An added purpose was to determine if chorus reading could be further refined so as to provide a perceptual guide for gauging the level of physical effort exerted during speech production. A repeated-measures design was used to compare data obtained during control reading conditions and during several chorus reading conditions produced at different speech rates. Participants included 8 persons who stutter (PWS) between the ages of 16 and 32 years. There were significant reductions in the frequency of short PIs from the habitual reading condition during slower chorus conditions, no change when speech rates were matched between habitual reading and chorus conditions, and an increase in the frequency of short PIs during chorus reading produced at a faster rate than the habitual condition. Speech rate did not have an effect on stuttering frequency during chorus reading. In general, speech effort ratings improved and naturalness ratings worsened as speech rate decreased. These results provide evidence that (a) a reduction in the frequency of short PIs is not necessary for fluency improvement during chorus reading, and (b) speech rate may be altered to provide PWS with a more appropriate reference for how physically effortful normally fluent speech production should be. Future investigations should examine the necessity of changes in the activation of neural regions during chorus reading, the possibility of defining individualized units on a 9-point effort scale, and whether there are upper and lower speech rate boundaries for receiving ratings of "highly natural sounding" speech during chorus reading. The reader will be able to: (1) describe the effect of changes in speech rate on the frequency of short phonated intervals during chorus reading, (2) describe changes to speaker-judged speech effort as speech rate changes during chorus reading, and (3) describe the effect of changes in speech rate on listener-judged naturalness ratings during chorus reading. Copyright © 2012 Elsevier Inc. All rights reserved.

  9. Robust Transmission of H.264/AVC Streams Using Adaptive Group Slicing and Unequal Error Protection

    NASA Astrophysics Data System (ADS)

    Thomos, Nikolaos; Argyropoulos, Savvas; Boulgouris, Nikolaos V.; Strintzis, Michael G.

    2006-12-01

    We present a novel scheme for the transmission of H.264/AVC video streams over lossy packet networks. The proposed scheme exploits the error-resilient features of the H.264/AVC codec and employs Reed-Solomon codes to protect the streams effectively. A novel technique for adaptive classification of macroblocks into three slice groups is also proposed. The optimal classification of macroblocks and the optimal channel rate allocation are achieved by iterating two interdependent steps. Dynamic programming techniques are used for the channel rate allocation process in order to reduce complexity. Simulations clearly demonstrate the superiority of the proposed method over other recent algorithms for transmission of H.264/AVC streams.
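
    Unequal error protection with Reed-Solomon codes can be sketched with an off-the-shelf RS library. The snippet below assumes the reedsolo Python package (API as in recent versions); the parity budgets per slice group are illustrative placeholders, not the paper's optimized allocation.

        from reedsolo import RSCodec

        # Stronger parity for the slice group carrying the most important
        # macroblocks, weaker parity for the least important one.
        PARITY = {"high": 32, "medium": 16, "low": 8}

        def protect(groups):
            """groups: dict mapping importance level -> packet payload (bytes)."""
            return {lvl: RSCodec(PARITY[lvl]).encode(data)
                    for lvl, data in groups.items()}

        def recover(protected):
            out = {}
            for lvl, data in protected.items():
                try:
                    # decode() returns (message, message+ecc, errata positions)
                    out[lvl] = RSCodec(PARITY[lvl]).decode(data)[0]
                except Exception:
                    out[lvl] = None   # too many symbol errors; conceal downstream
            return out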

  10. Multi-time resolution analysis of speech: evidence from psychophysics

    PubMed Central

    Chait, Maria; Greenberg, Steven; Arai, Takayuki; Simon, Jonathan Z.; Poeppel, David

    2015-01-01

    How speech signals are analyzed and represented remains a foundational challenge both for cognitive science and neuroscience. A growing body of research, employing various behavioral and neurobiological experimental techniques, now points to the perceptual relevance of both phoneme-sized (10–40 Hz modulation frequency) and syllable-sized (2–10 Hz modulation frequency) units in speech processing. However, it is not clear how information associated with such different time scales interacts in a manner relevant for speech perception. We report behavioral experiments on speech intelligibility employing a stimulus that allows us to investigate how distinct temporal modulations in speech are treated separately and whether they are combined. We created sentences in which the slow (~4 Hz; S_low) and rapid (~33 Hz; S_high) modulations—corresponding to ~250 and ~30 ms, the average duration of syllables and certain phonetic properties, respectively—were selectively extracted. Although S_low and S_high have low intelligibility when presented separately, dichotic presentation of S_high with S_low results in supra-additive performance, suggesting a synergistic relationship between low- and high-modulation frequencies. A second experiment desynchronized presentation of the S_low and S_high signals. Desynchronizing signals relative to one another had no impact on intelligibility when delays were less than ~45 ms. Longer delays resulted in a steep intelligibility decline, providing further evidence of integration or binding of information within restricted temporal windows. Our data suggest that human speech perception uses multi-time resolution processing. Signals are concurrently analyzed on at least two separate time scales, the intermediate representations of these analyses are integrated, and the resulting bound percept has significant consequences for speech intelligibility—a view compatible with recent insights from neuroscience implicating multi-timescale auditory processing. PMID:26136650

  11. [Effects of fundamental frequency and speech rate on impression formation].

    PubMed

    Uchida, Teruhisa; Nakaune, Naoko

    2004-12-01

    This study investigated the systematic relationship between nonverbal features of speech and personality trait ratings of the speaker. In Study 1, the fundamental frequency (F0) of original speech was converted into five levels, from 64% to 156.25% of the original. Then 132 undergraduates rated each of the converted speeches in terms of personality traits. In Study 2, 134 undergraduates similarly rated speech stimuli, which had five speech rate levels as well as two F0 levels. Results showed that listener ratings along Big Five dimensions were mostly independent. Each dimension had a slightly different change profile over the five levels of F0 and speech rate. A quadratic regression equation provided a good approximation for each rating as a function of F0 or speech rate. The quadratic regression equations put together would provide a rough estimate of personality trait impressions as a function of prosodic features. The functional relationship among F0, speech rate, and trait ratings was shown as a curved surface in three-dimensional space.
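
    The quadratic approximation described here amounts to fitting a second-degree polynomial per trait dimension. A minimal sketch with numpy follows; the commented usage values are hypothetical placeholders, not data from the study.

        import numpy as np

        def fit_quadratic(levels, ratings):
            """Fit rating = a*x^2 + b*x + c over stimulus levels (e.g., F0
            scale factors or speech rates). Returns coefficients and a
            callable predictor."""
            coeffs = np.polyfit(levels, ratings, deg=2)
            return coeffs, np.poly1d(coeffs)

        # usage sketch (hypothetical numbers):
        # levels = np.array([0.64, 0.80, 1.00, 1.25, 1.5625])  # F0 scale factors
        # _, predict = fit_quadratic(levels, mean_trait_ratings)
        # predict(1.1)  # interpolated trait-impression estimate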

  12. [Post-stroke speech disorder treated with acupuncture and psychological intervention combined with rehabilitation training: a randomized controlled trial].

    PubMed

    Wang, Ling; Liu, Shao-ming; Liu, Min; Li, Bao-jun; Hui, Zhen-liang; Gao, Xiang

    2011-06-01

    To assess the clinical efficacy of acupuncture and psychological intervention combined with rehabilitation training for post-stroke speech disorder. A multi-center randomized controlled design was adopted. One hundred and twenty stroke cases were divided into a speech rehabilitation group (control group), a speech rehabilitation plus acupuncture group (observation group 1), and a speech rehabilitation plus acupuncture combined with psychotherapy group (observation group 2), with 40 cases in each. The rehabilitation training was conducted by a professional speech trainer. In the acupuncture treatment, the speech function area in scalp acupuncture, Jinjin (EX-HN 12) and Yuye (EX-HN 13) in tongue acupuncture, and Lianquan (CV 23) were the basic points; supplementary points were selected according to syndrome differentiation, and a bloodletting method was used in combination with acupuncture. Psychotherapy was applied by a physician in the hospital's psychiatric department. The corresponding program was used in each group. The Examination of Aphasia of Chinese of Beijing Hospital was adopted to assess oral expression, listening comprehension, and reading and writing ability. After 21 days of treatment, the total effective rate was 92.5% (37/40) in observation group 1, 97.5% (39/40) in observation group 2, and 87.5% (35/40) in the control group; these rates were similar among the 3 groups. The markedly effective rate was 15.0% (6/40) in observation group 1, 50.0% (20/40) in observation group 2, and 2.5% (1/40) in the control group; the result in observation group 2 was superior to the other two groups (P<0.01, P<0.001). In the comparison of improvements in oral expression, listening comprehension, and reading and writing ability, all 3 groups achieved improvements to different extents after treatment (P<0.01, P<0.001), and the results in observation group 2 were better than those in observation group 1 and the control group. Acupuncture and psychological intervention combined with rehabilitation training is clearly advantageous in the treatment of post-stroke speech disorder.

  13. Joint source-channel coding for motion-compensated DCT-based SNR scalable video.

    PubMed

    Kondi, Lisimachos P; Ishtiaq, Faisal; Katsaggelos, Aggelos K

    2002-01-01

    In this paper, we develop an approach toward joint source-channel coding for motion-compensated DCT-based scalable video coding and transmission. A framework for the optimal selection of the source and channel coding rates over all scalable layers is presented such that the overall distortion is minimized. The algorithm utilizes universal rate distortion characteristics which are obtained experimentally and show the sensitivity of the source encoder and decoder to channel errors. The proposed algorithm allocates the available bit rate between scalable layers and, within each layer, between source and channel coding. We present the results of this rate allocation algorithm for video transmission over a wireless channel using the H.263 Version 2 signal-to-noise ratio (SNR) scalable codec for source coding and rate-compatible punctured convolutional (RCPC) codes for channel coding. We discuss the performance of the algorithm with respect to the channel conditions, coding methodologies, layer rates, and number of layers.
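
    The allocation problem described here, splitting a bit budget across scalable layers and, within each layer, between source and channel coding so that expected distortion is minimized, can be illustrated as a small search over measured operating points. The brute-force sketch below only shows the objective; the paper's algorithm is more structured, and the operating-point tables are placeholders for measured rate-distortion characteristics.

        import itertools

        def allocate(layers, budget):
            """layers: one list of operating points per scalable layer; each
            point is (source_rate, channel_rate, expected_distortion), with
            distortion already reflecting channel conditions.
            Returns the per-layer choice minimizing total expected distortion
            subject to the total rate budget."""
            best, best_combo = float("inf"), None
            for combo in itertools.product(*layers):
                rate = sum(s + c for s, c, _ in combo)
                dist = sum(d for _, _, d in combo)
                if rate <= budget and dist < best:
                    best, best_combo = dist, combo
            return best_combo, best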

  14. High-Resolution, Non-Invasive Imaging of Upper Vocal Tract Articulators Compatible with Human Brain Recordings

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bouchard, Kristofer E.; Conant, David F.; Anumanchipalli, Gopala K.

    A complete neurobiological understanding of speech motor control requires determination of the relationship between simultaneously recorded neural activity and the kinematics of the lips, jaw, tongue, and larynx. Many speech articulators are internal to the vocal tract, and therefore simultaneously tracking the kinematics of all articulators is nontrivial, especially in the context of human electrophysiology recordings. Here, we describe a noninvasive, multi-modal imaging system to monitor vocal tract kinematics, demonstrate this system in six speakers during production of nine American English vowels, and provide new analysis of such data. Classification and regression analysis revealed considerable variability in the articulator-to-acoustic relationship across speakers. Non-negative matrix factorization extracted basis sets capturing vocal tract shapes allowing for higher vowel classification accuracy than traditional methods. Statistical speech synthesis generated speech from vocal tract measurements, and we demonstrate perceptual identification. We demonstrate the capacity to predict lip kinematics from ventral sensorimotor cortical activity. These results demonstrate a multi-modal system to non-invasively monitor articulator kinematics during speech production, describe novel analytic methods for relating kinematic data to speech acoustics, and provide the first decoding of speech kinematics from electrocorticography. These advances will be critical for understanding the cortical basis of speech production and the creation of vocal prosthetics.
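
    The NMF step described here, extracting nonnegative basis shapes of the vocal tract from tracked kinematics, can be sketched with scikit-learn. The feature layout (frames by articulator coordinates, shifted to be nonnegative) and the number of bases are assumptions for illustration.

        from sklearn.decomposition import NMF

        def vocal_tract_bases(X, n_bases=8):
            """X: (frames, articulator_features) matrix of tracked kinematics.
            Returns W (per-frame activation weights) and H (basis shapes)."""
            X = X - X.min()                      # NMF requires nonnegative input
            model = NMF(n_components=n_bases, init="nndsvd", max_iter=500)
            W = model.fit_transform(X)           # frame activations
            H = model.components_                # basis vocal-tract shapes
            return W, H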

  15. High-Resolution, Non-Invasive Imaging of Upper Vocal Tract Articulators Compatible with Human Brain Recordings

    PubMed Central

    Anumanchipalli, Gopala K.; Dichter, Benjamin; Chaisanguanthum, Kris S.; Johnson, Keith; Chang, Edward F.

    2016-01-01

    A complete neurobiological understanding of speech motor control requires determination of the relationship between simultaneously recorded neural activity and the kinematics of the lips, jaw, tongue, and larynx. Many speech articulators are internal to the vocal tract, and therefore simultaneously tracking the kinematics of all articulators is nontrivial—especially in the context of human electrophysiology recordings. Here, we describe a noninvasive, multi-modal imaging system to monitor vocal tract kinematics, demonstrate this system in six speakers during production of nine American English vowels, and provide new analysis of such data. Classification and regression analysis revealed considerable variability in the articulator-to-acoustic relationship across speakers. Non-negative matrix factorization extracted basis sets capturing vocal tract shapes allowing for higher vowel classification accuracy than traditional methods. Statistical speech synthesis generated speech from vocal tract measurements, and we demonstrate perceptual identification. We demonstrate the capacity to predict lip kinematics from ventral sensorimotor cortical activity. These results demonstrate a multi-modal system to non-invasively monitor articulator kinematics during speech production, describe novel analytic methods for relating kinematic data to speech acoustics, and provide the first decoding of speech kinematics from electrocorticography. These advances will be critical for understanding the cortical basis of speech production and the creation of vocal prosthetics. PMID:27019106

  16. High-Resolution, Non-Invasive Imaging of Upper Vocal Tract Articulators Compatible with Human Brain Recordings

    DOE PAGES

    Bouchard, Kristofer E.; Conant, David F.; Anumanchipalli, Gopala K.; ...

    2016-03-28

    A complete neurobiological understanding of speech motor control requires determination of the relationship between simultaneously recorded neural activity and the kinematics of the lips, jaw, tongue, and larynx. Many speech articulators are internal to the vocal tract, and therefore simultaneously tracking the kinematics of all articulators is nontrivial, especially in the context of human electrophysiology recordings. Here, we describe a noninvasive, multi-modal imaging system to monitor vocal tract kinematics, demonstrate this system in six speakers during production of nine American English vowels, and provide new analysis of such data. Classification and regression analysis revealed considerable variability in the articulator-to-acoustic relationship across speakers. Non-negative matrix factorization extracted basis sets capturing vocal tract shapes allowing for higher vowel classification accuracy than traditional methods. Statistical speech synthesis generated speech from vocal tract measurements, and we demonstrate perceptual identification. We demonstrate the capacity to predict lip kinematics from ventral sensorimotor cortical activity. These results demonstrate a multi-modal system to non-invasively monitor articulator kinematics during speech production, describe novel analytic methods for relating kinematic data to speech acoustics, and provide the first decoding of speech kinematics from electrocorticography. These advances will be critical for understanding the cortical basis of speech production and the creation of vocal prosthetics.

  17. [Cochlear implantation in patients with Waardenburg syndrome type II].

    PubMed

    Wan, Liangcai; Guo, Menghe; Chen, Shuaijun; Liu, Shuangriu; Chen, Hao; Gong, Jian

    2010-05-01

    To describe multi-channel cochlear implantation in patients with Waardenburg syndrome, including surgery, pre- and postoperative hearing assessments, and speech recognition outcomes. Multi-channel cochlear implantation was performed in 12 cases with Waardenburg syndrome type II in our department from 2000 to 2008. All the patients received multi-channel cochlear implantation through a transmastoid facial recess approach. The postoperative outcomes of the 12 cases were compared with those of a control group of 12 cases with no inner ear malformation. The electrodes were successfully inserted in full into the cochlea, and no facial paralysis or cerebrospinal fluid leakage occurred after the operation. The hearing thresholds in this series were similar to those of normal cochlear implantation. After more than half a year of speech rehabilitation, the speech discrimination and spoken language abilities of all the patients were improved compared with preoperative levels. Multi-channel cochlear implantation can be performed in cases with Waardenburg syndrome; preoperative hearing and imaging assessments should be done.

  18. Articulatory Mediation of Speech Perception: A Causal Analysis of Multi-Modal Imaging Data

    ERIC Educational Resources Information Center

    Gow, David W., Jr.; Segawa, Jennifer A.

    2009-01-01

    The inherent confound between the organization of articulation and the acoustic-phonetic structure of the speech signal makes it exceptionally difficult to evaluate the competing claims of motor and acoustic-phonetic accounts of how listeners recognize coarticulated speech. Here we use Granger causation analysis of high spatiotemporal resolution…

  19. NASA's mobile satellite communications program; ground and space segment technologies

    NASA Technical Reports Server (NTRS)

    Naderi, F.; Weber, W. J.; Knouse, G. H.

    1984-01-01

    This paper describes the Mobile Satellite Communications Program of the United States National Aeronautics and Space Administration (NASA). The program's objectives are to facilitate the deployment of the first-generation commercial mobile satellite by the private sector, and to technologically enable future generations by developing advanced and high-risk ground and space segment technologies. These technologies are aimed at mitigating severe shortages of spectrum, orbital slots, and spacecraft EIRP, which are expected to plague the high-capacity mobile satellite systems of the future. After a brief introduction of the concept of mobile satellite systems and their expected evolution, this paper outlines the critical ground and space segment technologies. Next, the Mobile Satellite Experiment (MSAT-X) is described. MSAT-X is the framework through which NASA will develop advanced ground segment technologies. An approach is outlined for the development of conformal vehicle antennas, spectrum- and power-efficient speech codecs, modulation techniques for use over nonlinear faded channels, and efficient multiple-access schemes. Finally, the paper concludes with a description of the current and planned NASA activities aimed at developing the complex large multibeam spacecraft antennas needed for future-generation mobile satellite systems.

  20. The Levels of Speech Usage Rating Scale: Comparison of Client Self-Ratings with Speech Pathologist Ratings

    ERIC Educational Resources Information Center

    Gray, Christina; Baylor, Carolyn; Eadie, Tanya; Kendall, Diane; Yorkston, Kathryn

    2012-01-01

    Background: The term "speech usage" refers to what people want or need to do with their speech to fulfil the communication demands in their life roles. Speech-language pathologists (SLPs) need to know about clients' speech usage to plan appropriate interventions to meet their life participation goals. The Levels of Speech Usage is a…

  1. Iconic Gestures for Robot Avatars, Recognition and Integration with Speech.

    PubMed

    Bremner, Paul; Leonards, Ute

    2016-01-01

    Co-verbal gestures are an important part of human communication, improving its efficiency and efficacy for information conveyance. One possible means by which such multi-modal communication might be realized remotely is through the use of a tele-operated humanoid robot avatar. Such avatars have been previously shown to enhance social presence and operator salience. We present a motion tracking based tele-operation system for the NAO robot platform that allows direct transmission of speech and gestures produced by the operator. To assess the capabilities of this system for transmitting multi-modal communication, we have conducted a user study that investigated if robot-produced iconic gestures are comprehensible, and are integrated with speech. Robot performed gesture outcomes were compared directly to those for gestures produced by a human actor, using a within participant experimental design. We show that iconic gestures produced by a tele-operated robot are understood by participants when presented alone, almost as well as when produced by a human. More importantly, we show that gestures are integrated with speech when presented as part of a multi-modal communication equally well for human and robot performances.

  2. Binaural release from masking with single- and multi-electrode stimulation in children with cochlear implants

    PubMed Central

    Todd, Ann E.; Goupell, Matthew J.; Litovsky, Ruth Y.

    2016-01-01

    Cochlear implants (CIs) provide children with access to speech information from a young age. Despite bilateral cochlear implantation becoming common, use of spatial cues in free field is smaller than in normal-hearing children. Clinically fit CIs are not synchronized across the ears; thus binaural experiments must utilize research processors that can control binaural cues with precision. Research to date has used single pairs of electrodes, which is insufficient for representing speech. Little is known about how children with bilateral CIs process binaural information with multi-electrode stimulation. Toward the goal of improving binaural unmasking of speech, this study evaluated binaural unmasking with multi- and single-electrode stimulation. Results showed that performance with multi-electrode stimulation was similar to the best performance with single-electrode stimulation. This was similar to the pattern of performance shown by normal-hearing adults when presented an acoustic CI simulation. Diotic and dichotic signal detection thresholds of the children with CIs were similar to those of normal-hearing children listening to a CI simulation. The magnitude of binaural unmasking was not related to whether the children with CIs had good interaural time difference sensitivity. Results support the potential for benefits from binaural hearing and speech unmasking in children with bilateral CIs. PMID:27475132

  3. Binaural release from masking with single- and multi-electrode stimulation in children with cochlear implants.

    PubMed

    Todd, Ann E; Goupell, Matthew J; Litovsky, Ruth Y

    2016-07-01

    Cochlear implants (CIs) provide children with access to speech information from a young age. Despite bilateral cochlear implantation becoming common, use of spatial cues in free field is smaller than in normal-hearing children. Clinically fit CIs are not synchronized across the ears; thus binaural experiments must utilize research processors that can control binaural cues with precision. Research to date has used single pairs of electrodes, which is insufficient for representing speech. Little is known about how children with bilateral CIs process binaural information with multi-electrode stimulation. Toward the goal of improving binaural unmasking of speech, this study evaluated binaural unmasking with multi- and single-electrode stimulation. Results showed that performance with multi-electrode stimulation was similar to the best performance with single-electrode stimulation. This was similar to the pattern of performance shown by normal-hearing adults when presented an acoustic CI simulation. Diotic and dichotic signal detection thresholds of the children with CIs were similar to those of normal-hearing children listening to a CI simulation. The magnitude of binaural unmasking was not related to whether the children with CIs had good interaural time difference sensitivity. Results support the potential for benefits from binaural hearing and speech unmasking in children with bilateral CIs.

  4. EFFECT OF DELAYED AUDITORY FEEDBACK, SPEECH RATE, AND SEX ON SPEECH PRODUCTION.

    PubMed

    Stuart, Andrew; Kalinowski, Joseph

    2015-06-01

    Perturbations in delayed auditory feedback (DAF) and speech rate were examined as sources of disruptions in the speech of men and women. Fluent adult men (n = 16) and women (n = 16) spoke at a normal and an imposed fast rate of speech with 0, 25, 50, 100, and 200 msec. DAF. The syllable rate significantly increased when participants were instructed to speak at a fast rate, and the syllable rate decreased with increasing DAF delays. Men's speech rate was significantly faster in the fast-rate condition with 200 msec. DAF. Disfluencies increased with increasing DAF delay. Significantly more disfluency occurred at delays of 25 and 50 msec. in the fast-rate condition, while more disfluency occurred at 100 and 200 msec. in normal-rate conditions. Men and women did not differ in the number of disfluencies. These findings demonstrate sex differences in susceptibility to perturbations in DAF and speech rate, suggesting that the feedforward/feedback subsystems that monitor vocalizations may differ between the sexes.
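
    A DAF condition of the kind used here can be simulated offline by shifting a recorded signal in time. The numpy sketch below shows only the delay operation; actual experiments delay the live microphone signal to the talker's headphones in real time.

        import numpy as np

        def apply_daf(signal, fs, delay_ms):
            """Return the feedback a talker would hear under delayed auditory
            feedback: the input shifted by delay_ms, zero-padded at the start."""
            d = int(fs * delay_ms / 1000.0)
            return np.concatenate([np.zeros(d, dtype=signal.dtype), signal])

        # e.g., apply_daf(x, 44100, 200) yields the 200-msec. DAF condition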

  5. Experimental research and comparison of LDPC and RS channel coding in ultraviolet communication systems.

    PubMed

    Wu, Menglong; Han, Dahai; Zhang, Xiang; Zhang, Feng; Zhang, Min; Yue, Guangxin

    2014-03-10

    We have implemented a modified Low-Density Parity-Check (LDPC) codec algorithm in an ultraviolet (UV) communication system. Simulations are conducted with measured parameters to evaluate the performance of the LDPC-based UV system. Moreover, LDPC (960, 480) and RS (18, 10) codes are implemented and tested on a non-line-of-sight (NLOS) UV test bed. The experimental results are in agreement with the simulations and suggest that, for a given power and a 10^-3 bit error rate (BER), the average communication distance increases by 32% with the RS code and by 78% with the LDPC code, compared with an uncoded system.

  6. Development of a mobile satellite communication unit

    NASA Technical Reports Server (NTRS)

    Suzuki, Ryutaro; Ikegami, Tetsushi; Hamamoto, Naokazu; Taguchi, Tetsu; Endo, Nobuhiro; Yamamoto, Osamu; Ichiyoshi, Osamu

    1988-01-01

    A compact 210(W) x 280(H) x 330(D) mm mobile terminal capable of transmitting voice and data through L-band mobile satellites is described. The voice codec converts analog voice to and from digital codes at rates of 9.6, 8, and 4.8 kb/s using an MPC algorithm. The terminal runs from a single 12 V vehicle battery. The equipment can operate in full duplex mode at any L-band frequency allocated for mobile use and will soon be put to a field test via Japan's ETS-V satellite.

  7. How our own speech rate influences our perception of others.

    PubMed

    Bosker, Hans Rutger

    2017-08-01

    In conversation, our own speech and that of others follow each other in rapid succession. Effects of the surrounding context on speech perception are well documented but, despite the ubiquity of the sound of our own voice, it is unknown whether our own speech also influences our perception of other talkers. This study investigated context effects induced by our own speech through 6 experiments, specifically targeting rate normalization (i.e., perceiving phonetic segments relative to surrounding speech rate). Experiment 1 revealed that hearing prerecorded fast or slow context sentences altered the perception of ambiguous vowels, replicating earlier work. Experiment 2 demonstrated that talking at a fast or slow rate prior to target presentation also altered target perception, though the effect of preceding speech rate was reduced. Experiment 3 showed that silent talking (i.e., inner speech) at fast or slow rates did not modulate the perception of others, suggesting that the effect of self-produced speech rate in Experiment 2 arose through monitoring of the external speech signal. Experiment 4 demonstrated that, when participants were played back their own (fast/slow) speech, no reduction of the effect of preceding speech rate was observed, suggesting that the additional task of speech production may be responsible for the reduced effect in Experiment 2. Finally, Experiments 5 and 6 replicate Experiments 2 and 3 with new participant samples. Taken together, these results suggest that variation in speech production may induce variation in speech perception, thus carrying implications for our understanding of spoken communication in dialogue settings. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  8. Systematic studies of modified vocalization: the effect of speech rate on speech production measures during metronome-paced speech in persons who stutter.

    PubMed

    Davidow, Jason H

    2014-01-01

    Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control speech rate between conditions limits our ability to determine if the changes were necessary for fluency. This study examined the effect of speech rate on several speech production variables during one-syllable-per-beat metronomic speech in order to determine changes that may be important for fluency during this fluency-inducing condition. Thirteen persons who stutter (PWS), aged 18-62 years, completed a series of speaking tasks. Several speech production variables were compared between conditions produced at different metronome beat rates, and between a control condition and a metronome-paced speech condition produced at a rate equal to the control condition. Vowel duration, voice onset time, pressure rise time and phonated intervals were significantly impacted by metronome beat rate. Voice onset time and the percentage of short (30-100 ms) phonated intervals significantly decreased from the control condition to the equivalent rate metronome-paced speech condition. A reduction in the percentage of short phonated intervals may be important for fluency during syllable-based metronome-paced speech for PWS. Future studies should continue examining the necessity of this reduction. In addition, speech rate must be controlled in future fluency-inducing condition studies, including neuroimaging investigations, in order for this research to make a substantial contribution to finding the fluency-inducing mechanism of fluency-inducing conditions. © 2013 Royal College of Speech and Language Therapists.
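
    The phonated-interval measure above is straightforward to compute once phonation onsets and offsets have been marked; a minimal sketch follows. The 30-100 ms band for "short" intervals comes from the abstract, while the sample durations are fabricated.

      # Percentage of short (30-100 ms) phonated intervals; durations are
      # fabricated phonation-interval lengths in milliseconds.

      def pct_short_phonated(durations_ms, lo=30.0, hi=100.0):
          short = [d for d in durations_ms if lo <= d <= hi]
          return 100.0 * len(short) / len(durations_ms)

      control = [45, 80, 120, 250, 60, 35, 310, 90]        # fabricated
      metronome = [150, 220, 180, 95, 260, 240, 205, 130]  # fabricated
      print(pct_short_phonated(control), pct_short_phonated(metronome))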

  9. Cortical Tracking of Global and Local Variations of Speech Rhythm during Connected Natural Speech Perception.

    PubMed

    Alexandrou, Anna Maria; Saarinen, Timo; Kujala, Jan; Salmelin, Riitta

    2018-06-19

    During natural speech perception, listeners must track the global speaking rate, that is, the overall rate of incoming linguistic information, as well as transient, local speaking rate variations occurring within the global speaking rate. Here, we address the hypothesis that this tracking mechanism is achieved through coupling of cortical signals to the amplitude envelope of the perceived acoustic speech signals. Cortical signals were recorded with magnetoencephalography (MEG) while participants perceived spontaneously produced speech stimuli at three global speaking rates (slow, normal/habitual, and fast). Inherently to spontaneously produced speech, these stimuli also featured local variations in speaking rate. The coupling between cortical and acoustic speech signals was evaluated using audio-MEG coherence. Modulations in audio-MEG coherence spatially differentiated between tracking of global speaking rate, highlighting the temporal cortex bilaterally and the right parietal cortex, and sensitivity to local speaking rate variations, emphasizing the left parietal cortex. Cortical tuning to the temporal structure of natural connected speech thus seems to require the joint contribution of both auditory and parietal regions. These findings suggest that cortical tuning to speech rhythm operates on two functionally distinct levels: one encoding the global rhythmic structure of speech and the other associated with online, rapidly evolving temporal predictions. Thus, it may be proposed that speech perception is shaped by evolutionary tuning, a preference for certain speaking rates, and predictive tuning, associated with cortical tracking of the constantly changing rate of linguistic information in a speech stream.
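
    Audio-MEG coherence of the kind described here is, at its core, magnitude-squared coherence between the speech amplitude envelope and a cortical channel. The sketch below illustrates the computation on synthetic signals; the sampling rate, the 4 Hz toy rhythm, and the 2-8 Hz band are assumptions, and none of the paper's MEG preprocessing or statistics are modeled.

      # Synthetic illustration of audio-MEG coherence: magnitude-squared
      # coherence between a speech amplitude envelope and one
      # envelope-coupled "MEG" channel.
      import numpy as np
      from scipy.signal import coherence

      fs = 200.0                               # assumed common sampling rate, Hz
      t = np.arange(0, 60, 1 / fs)
      rng = np.random.default_rng(0)
      envelope = 1 + 0.5 * np.sin(2 * np.pi * 4 * t)       # toy 4 Hz rhythm
      meg = 0.3 * envelope + rng.standard_normal(t.size)   # coupled channel

      f, cxy = coherence(envelope, meg, fs=fs, nperseg=512)
      band = (f >= 2) & (f <= 8)
      print("peak coherence in 2-8 Hz:", float(cxy[band].max()))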

  10. Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

    NASA Astrophysics Data System (ADS)

    Kayasith, Prakasith; Theeramunkong, Thanaruk

    Manually measuring the severity of dysarthria by evaluating a speaker's speech with standard perception-based assessment methods is a tedious and subjective task. This paper presents an automated approach to assessing the speech quality of a dysarthric speaker with cerebral palsy. Considering two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of a speaker's ability to produce consistent speech signals for a given word and distinct speech signals for different words. As an application, it can be used to assess speech quality and forecast the speech recognition rate for an individual dysarthric speaker before exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a predictor of recognition rate is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square difference, comparing its predicted recognition rates with those predicted by the standard articulatory and intelligibility tests on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were conducted on a speech corpus composed of data from eight normal speakers and eight dysarthric speakers.
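
    The abstract does not spell out how consistency and distinction are combined, so the sketch below only illustrates the distance-based idea: within-word feature distances (consistency) against between-word distances (distinction), with a higher ratio indicating clearer speech. The feature vectors, the Euclidean metric, and the ratio itself are assumptions, not the paper's definition of Ψ.

      # Distance-based sketch of a consistency/distinction index. Each word
      # has several fixed-length feature vectors (one per utterance); all
      # values are fabricated and the formula is illustrative, not the
      # paper's Ψ.
      import numpy as np

      def clarity_style_index(words):
          """words: dict mapping word -> (n_tokens, dim) array of features."""
          within, between = [], []
          labels = list(words)
          for w in labels:                      # consistency: same-word pairs
              X = words[w]
              for i in range(len(X)):
                  for j in range(i + 1, len(X)):
                      within.append(np.linalg.norm(X[i] - X[j]))
          for a in range(len(labels)):          # distinction: cross-word pairs
              for b in range(a + 1, len(labels)):
                  for x in words[labels[a]]:
                      for y in words[labels[b]]:
                          between.append(np.linalg.norm(x - y))
          return np.mean(between) / (np.mean(within) + 1e-9)

      rng = np.random.default_rng(0)
      vocab = {w: rng.normal(loc=3 * i, size=(5, 12))
               for i, w in enumerate(["ba", "da", "ga"])}
      print("index (higher = clearer):", round(clarity_style_index(vocab), 2))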

  11. Speech Rate Entrainment in Children and Adults With and Without Autism Spectrum Disorder.

    PubMed

    Wynn, Camille J; Borrie, Stephanie A; Sellers, Tyra P

    2018-05-03

    Conversational entrainment, a phenomenon whereby people modify their behaviors to match their communication partner's, has been shown to be critical to successful conversation. It is plausible that deficits in entrainment contribute to the conversational breakdowns and social difficulties exhibited by people with autism spectrum disorder (ASD). This study examined speech rate entrainment in child and adult populations with and without ASD. Sixty participants, including typically developing children, children with ASD, typically developed adults, and adults with ASD, took part in a quasi-conversational paradigm with a pseudoconfederate whose speech rate was digitally manipulated to create slow and fast speech rate conditions. Typically developed adults entrained their speech rate in the quasi-conversational paradigm, using a faster rate during the fast conditions and a slower rate during the slow conditions. This entrainment pattern was not evident in adults with ASD or in either group of children. Findings suggest that speech rate entrainment is a developmentally acquired skill and offer preliminary evidence of speech rate entrainment deficits in adults with ASD. Impairments in this area may contribute to the conversational breakdowns and social difficulties experienced by this population. Future work is needed to advance this area of inquiry.

  12. A multi-band environment-adaptive approach to noise suppression for cochlear implants.

    PubMed

    Saki, Fatemeh; Mirzahasanloo, Taher; Kehtarnavaz, Nasser

    2014-01-01

    This paper presents an improved environment-adaptive noise suppression solution for the cochlear implant speech processing pipeline. The improvement is achieved by using a multi-band data-driven approach in place of a previously developed single-band data-driven approach. Seven commonly encountered noisy environments (street, car, restaurant, mall, bus, pub, and train) are considered to quantify the improvement. The results obtained indicate about a 10% improvement in speech quality measures.
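
    A minimal sketch of the multi-band idea: rather than one data-driven gain for the whole spectrum, each frequency band gets its own suppression gain, indexed by the detected environment. The band edges, gain values, and environment labels below are fabricated for illustration; they are not the paper's trained parameters.

      # Per-band, per-environment suppression gains applied to one STFT
      # frame's magnitudes. All numbers are illustrative assumptions.
      import numpy as np

      BAND_EDGES_HZ = [500, 1000, 2000, 4000]    # interior band edges (assumed)
      GAINS = {"street": np.array([0.4, 0.5, 0.7, 0.8, 0.9]),   # fabricated
               "car":    np.array([0.3, 0.6, 0.8, 0.9, 0.9])}   # fabricated

      def apply_multiband_gain(mag, freqs, env):
          """Scale each magnitude bin by its band's environment gain."""
          band = np.digitize(freqs, BAND_EDGES_HZ)   # band index 0..4 per bin
          return mag * GAINS[env][band]

      freqs = np.linspace(0, 8000, 257)              # one frame's bin centers
      frame = np.abs(np.random.default_rng(3).standard_normal(257))
      enhanced = apply_multiband_gain(frame, freqs, "street")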

  13. Mimicking aphasic semantic errors in normal speech production: evidence from a novel experimental paradigm.

    PubMed

    Hodgson, Catherine; Lambon Ralph, Matthew A

    2008-01-01

    Semantic errors are commonly found in semantic dementia (SD) and some forms of stroke aphasia and provide insights into semantic processing and speech production. Low error rates are found in standard picture naming tasks in normal controls. In order to increase error rates and thus provide an experimental model of aphasic performance, this study utilised a novel method: tempo picture naming. Experiment 1 showed that, compared with standard deadline naming tasks, participants made more errors on the tempo picture naming tasks. Further, RTs were longer and more errors were produced for living items than for non-living items, a pattern seen in both semantic dementia and semantically impaired stroke aphasic patients. Experiment 2 showed that providing the initial phoneme as a cue enhanced performance, whereas providing an incorrect phonemic cue further reduced performance. These results support the contention that the tempo picture naming paradigm reduces the time allowed for controlled semantic processing, causing increased error rates. This experimental procedure would therefore appear to mimic the performance of aphasic patients with multi-modal semantic impairment that results from poor semantic control rather than from the degradation of semantic representations observed in semantic dementia [Jefferies, E. A., & Lambon Ralph, M. A. (2006). Semantic impairment in stroke aphasia vs. semantic dementia: A case-series comparison. Brain, 129, 2132-2147]. Further implications for theories of semantic cognition and models of speech processing are discussed.

  14. Systematic Studies of Modified Vocalization: The Effect of Speech Rate on Speech Production Measures During Metronome-Paced Speech in Persons who Stutter

    PubMed Central

    Davidow, Jason H.

    2013-01-01

    Background Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control speech rate between conditions limits our ability to determine if the changes were necessary for fluency. Aims This study examined the effect of speech rate on several speech production variables during one-syllable-per-beat metronomic speech, in order to determine changes that may be important for fluency during this fluency-inducing condition. Methods and Procedures Thirteen persons who stutter (PWS), aged 18–62 years, completed a series of speaking tasks. Several speech production variables were compared between conditions produced at different metronome beat rates, and between a control condition and a metronome-paced speech condition produced at a rate equal to the control condition. Outcomes & Results Vowel duration, voice onset time, pressure rise time, and phonated intervals were significantly impacted by metronome beat rate. Voice onset time and the percentage of short (30–100 ms) phonated intervals significantly decreased from the control condition to the equivalent rate metronome-paced speech condition. Conclusions & Implications A reduction in the percentage of short phonated intervals may be important for fluency during syllable-based metronome-paced speech for PWS. Future studies should continue examining the necessity of this reduction. In addition, speech rate must be controlled in future fluency-inducing condition studies, including neuroimaging investigations, in order for this research to make a substantial contribution to finding the fluency-inducing mechanism of fluency-inducing conditions. PMID:24372888

  15. Speech and pause characteristics associated with voluntary rate reduction in Parkinson's disease and Multiple Sclerosis.

    PubMed

    Tjaden, Kris; Wilding, Greg

    2011-01-01

    The primary purpose of this study was to investigate how speakers with Parkinson's disease (PD) and Multiple Sclerosis (MS) accomplish voluntary reductions in speech rate. A group of talkers with no history of neurological disease was included for comparison. The study was motivated by the idea that knowing how speakers with dysarthria voluntarily slow their speech would contribute toward a descriptive model of speaking rate change in dysarthria; such a model could help identify rate control strategies to emphasize in clinical treatment programs and would advance understanding of global speech timing in dysarthria. All speakers read a passage in Habitual and Slow conditions, and speech rate, articulation rate, pause duration, and pause frequency were measured. All speaker groups adjusted both articulation time and pause time to reduce overall speech rate; group differences in how the voluntary rate reduction was accomplished were primarily a matter of degree. Overall, a slower-than-normal rate was associated with a reduced articulation rate, shorter speech runs containing fewer syllables, and longer, more frequent pauses. Taken together, these results suggest that the existing skills and strategies used by patients should be emphasized in dysarthria training programs focusing on rate reduction, and that a model of voluntary speech rate reduction based on neurologically normal speech shows promise as applicable to mild-to-moderate dysarthria. The reader will be able to: (1) describe the importance of studying voluntary adjustments in speech rate in dysarthria, and (2) discuss how speakers with Parkinson's disease and Multiple Sclerosis adjust articulation time and pause time to slow speech rate. Copyright © 2011 Elsevier Inc. All rights reserved.

  16. The Effect of Speech Rate on Stuttering Frequency, Phonated Intervals, Speech Effort, and Speech Naturalness during Chorus Reading

    ERIC Educational Resources Information Center

    Davidow, Jason H.; Ingham, Roger J.

    2013-01-01

    Purpose: This study examined the effect of speech rate on phonated intervals (PIs), in order to test whether a reduction in the frequency of short PIs is an important part of the fluency-inducing mechanism of chorus reading. The influence of speech rate on stuttering frequency, speaker-judged speech effort, and listener-judged naturalness was also…

  17. Age-Related Differences in Speech Rate Perception Do Not Necessarily Entail Age-Related Differences in Speech Rate Use

    ERIC Educational Resources Information Center

    Heffner, Christopher C.; Newman, Rochelle S.; Dilley, Laura C.; Idsardi, William J.

    2015-01-01

    Purpose: A new literature has suggested that speech rate can influence the parsing of words quite strongly in speech. The purpose of this study was to investigate differences between younger adults and older adults in the use of context speech rate in word segmentation, given that older adults perceive timing information differently from younger…

  18. Multichannel Speech Enhancement Based on Generalized Gamma Prior Distribution with Its Online Adaptive Estimation

    NASA Astrophysics Data System (ADS)

    Dat, Tran Huy; Takeda, Kazuya; Itakura, Fumitada

    We present a multichannel speech enhancement method based on MAP speech spectral magnitude estimation using a generalized gamma model of the speech prior distribution, where the model parameters are adapted from the actual noisy speech in a frame-by-frame manner. Using a more general prior distribution with online adaptive estimation is shown to be effective for speech spectral estimation in noisy environments. Furthermore, multi-channel information in the form of cross-channel statistics is shown to be useful for better adapting the prior distribution parameters to the actual observation, improving the performance of the speech enhancement algorithm. We tested the proposed algorithm on an in-car speech database and obtained significant improvements in speech recognition performance, particularly under non-stationary noise conditions such as music, air conditioning, and open windows.
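
    As a simplified stand-in for the frame-by-frame prior adaptation described above, the sketch below moment-matches an ordinary (not generalized) gamma prior to a sliding buffer of spectral magnitudes, using the closed-form gamma relations k = m²/v and θ = v/m. The buffering scheme, window length, and data are assumptions; the paper's actual MAP estimator is not reproduced.

      # Moment-matched gamma prior (shape k, scale theta) updated online
      # from recent spectral-magnitude frames. Illustrative only.
      from collections import deque
      import numpy as np

      class OnlineGammaPrior:
          def __init__(self, history=30):
              self.buf = deque(maxlen=history)  # most recent frames

          def update(self, frame_mags):
              self.buf.append(np.asarray(frame_mags))
              x = np.concatenate(self.buf)
              m, v = x.mean(), x.var() + 1e-12
              return m * m / v, v / m           # k = m^2/v, theta = v/m

      prior = OnlineGammaPrior()
      rng = np.random.default_rng(1)
      for _ in range(5):                        # fabricated magnitude frames
          k, theta = prior.update(np.abs(rng.standard_normal(129)))
      print("adapted prior: k=%.2f theta=%.2f" % (k, theta))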

  19. Effects of linear and nonlinear speech rate changes on speech intelligibility in stationary and fluctuating maskers

    PubMed Central

    Cooke, Martin; Aubanel, Vincent

    2017-01-01

    Algorithmic modifications to the durational structure of speech designed to avoid intervals of intense masking lead to increases in intelligibility, but the basis for such gains is not clear. The current study addressed the possibility that the reduced information load produced by speech rate slowing might explain some or all of the benefits of durational modifications. The study also investigated the influence of masker stationarity on the effectiveness of durational changes. Listeners identified keywords in sentences that had undergone linear and nonlinear speech rate changes resulting in overall temporal lengthening in the presence of stationary and fluctuating maskers. Relative to unmodified speech, a slower speech rate produced no intelligibility gains for the stationary masker, suggesting that a reduction in information rate does not underlie intelligibility benefits of durationally modified speech. However, both linear and nonlinear modifications led to substantial intelligibility increases in fluctuating noise. One possibility is that overall increases in speech duration provide no new phonetic information in stationary masking conditions, but that temporal fluctuations in the background increase the likelihood of glimpsing additional salient speech cues. Alternatively, listeners may have benefitted from an increase in the difference in speech rates between the target and background. PMID:28618803

  20. Methods of analysis of speech rate: a pilot study.

    PubMed

    Costa, Luanna Maria Oliveira; Martins-Reis, Vanessa de Oliveira; Celeste, Letícia Côrrea

    2016-01-01

    The aim was to describe the performance of fluent adults on different measures of speech rate. The study included 24 fluent adults of both genders, speakers of Brazilian Portuguese, born and still living in the metropolitan region of Belo Horizonte, state of Minas Gerais, and aged between 18 and 59 years. Participants were grouped by age: G1 (18-29 years), G2 (30-39 years), G3 (40-49 years), and G4 (50-59 years). Speech samples were obtained following the methodology of the Speech Fluency Assessment Protocol. In addition to the measures of speech rate proposed by the protocol (speech rate in words and in syllables per minute), the speech rate in phonemes per second and the articulation rate with and without disfluencies were calculated. The nonparametric Friedman test and the Wilcoxon test were used for multiple comparisons, groups were compared using the nonparametric Kruskal-Wallis test, and the significance level was 5%. There were significant differences between the measures of speech rate involving syllables, and the multiple comparisons showed that all three measures differed. There was no effect of age on the studied measures. These findings corroborate previous studies. The inclusion of temporal acoustic measures such as speech rate in phonemes per second and articulation rate with and without disfluencies can be a complementary approach in the evaluation of speech rate.
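
    The measures compared in this study reduce to counts divided by total or articulation time; the sketch below computes them from a transcribed sample. The argument names and example numbers are illustrative, not taken from the protocol.

      # Rate measures from counts and timings of a transcribed sample.
      # All argument names and the example numbers are illustrative.

      def speech_rates(n_words, n_syllables, n_phonemes, total_s, pause_s,
                       n_fluent_syllables=None):
          artic_s = total_s - pause_s        # articulation time excludes pauses
          rates = {
              "words_per_min": 60.0 * n_words / total_s,
              "syllables_per_min": 60.0 * n_syllables / total_s,
              "phonemes_per_sec": n_phonemes / total_s,
              "artic_rate_syll_per_sec": n_syllables / artic_s,
          }
          if n_fluent_syllables is not None:  # rate without disfluencies
              rates["artic_rate_fluent"] = n_fluent_syllables / artic_s
          return rates

      print(speech_rates(n_words=180, n_syllables=320, n_phonemes=780,
                         total_s=90.0, pause_s=18.0, n_fluent_syllables=305))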

  1. Decoding Speech With Integrated Hybrid Signals Recorded From the Human Ventral Motor Cortex.

    PubMed

    Ibayashi, Kenji; Kunii, Naoto; Matsuo, Takeshi; Ishishita, Yohei; Shimada, Seijiro; Kawai, Kensuke; Saito, Nobuhito

    2018-01-01

    Restoration of speech communication for locked-in patients by means of brain-computer interfaces (BCIs) is currently an important area of active research. Among the neural signals obtained from intracranial recordings, single/multi-unit activity (SUA/MUA), local field potential (LFP), and electrocorticography (ECoG) are good candidates for an input signal for BCIs. However, the question of which signal, or which combination of the three signal modalities, is best suited for decoding speech production remains unverified. In order to record SUA, LFP, and ECoG simultaneously from a highly localized area of the human ventral sensorimotor cortex (vSMC), we fabricated a 7 x 13 mm electrode array containing sparsely arranged microneedle and conventional macro contacts. We determined which signal modality is most capable of decoding speech production and tested whether combining these signals could improve the decoding accuracy of spoken phonemes. Feature vectors were constructed from the spike frequencies obtained from SUAs and from event-related spectral perturbations derived from the ECoG and LFP signals, then input to the decoder. Decoding accuracy for five spoken vowels was highest when features from multiple signals were combined and optimized for each subject, reaching 59% when averaged across all six subjects. This result suggests that multi-scale signals convey complementary information for speech articulation. The study demonstrates that simultaneously recording multi-scale neuronal activities can raise decoding accuracy even when the recording area is limited to a small portion of cortex, which is advantageous for future implementation of speech-assisting BCIs.
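
    A sketch of the feature-fusion step on synthetic data: spike-rate features and spectral-perturbation features are classified separately and then concatenated, mirroring the finding that combined features decode best. The classifier choice (LDA), feature dimensions, and data are assumptions; the paper's preprocessing and per-subject optimization are not modeled.

      # Feature fusion on synthetic data: spike-rate (SUA) features plus
      # spectral-perturbation (ECoG/LFP) features, classified with LDA.
      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(1)
      n_trials = 200
      y = rng.integers(0, 5, n_trials)                   # five vowels
      sua = rng.poisson(4 + y[:, None], (n_trials, 10))  # toy spike counts
      ersp = rng.normal(0.3 * y[:, None], 1.0, (n_trials, 40))  # toy ERSP

      for name, X in [("SUA", sua), ("ERSP", ersp),
                      ("combined", np.hstack([sua, ersp]))]:
          acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
          print(name, round(float(acc), 2))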

  2. Decoding Speech With Integrated Hybrid Signals Recorded From the Human Ventral Motor Cortex

    PubMed Central

    Ibayashi, Kenji; Kunii, Naoto; Matsuo, Takeshi; Ishishita, Yohei; Shimada, Seijiro; Kawai, Kensuke; Saito, Nobuhito

    2018-01-01

    Restoration of speech communication for locked-in patients by means of brain-computer interfaces (BCIs) is currently an important area of active research. Among the neural signals obtained from intracranial recordings, single/multi-unit activity (SUA/MUA), local field potential (LFP), and electrocorticography (ECoG) are good candidates for an input signal for BCIs. However, the question of which signal, or which combination of the three signal modalities, is best suited for decoding speech production remains unverified. In order to record SUA, LFP, and ECoG simultaneously from a highly localized area of the human ventral sensorimotor cortex (vSMC), we fabricated a 7 x 13 mm electrode array containing sparsely arranged microneedle and conventional macro contacts. We determined which signal modality is most capable of decoding speech production and tested whether combining these signals could improve the decoding accuracy of spoken phonemes. Feature vectors were constructed from the spike frequencies obtained from SUAs and from event-related spectral perturbations derived from the ECoG and LFP signals, then input to the decoder. Decoding accuracy for five spoken vowels was highest when features from multiple signals were combined and optimized for each subject, reaching 59% when averaged across all six subjects. This result suggests that multi-scale signals convey complementary information for speech articulation. The study demonstrates that simultaneously recording multi-scale neuronal activities can raise decoding accuracy even when the recording area is limited to a small portion of cortex, which is advantageous for future implementation of speech-assisting BCIs. PMID:29674950

  3. Iconic Gestures for Robot Avatars, Recognition and Integration with Speech

    PubMed Central

    Bremner, Paul; Leonards, Ute

    2016-01-01

    Co-verbal gestures are an important part of human communication, improving its efficiency and efficacy for information conveyance. One possible means by which such multi-modal communication might be realized remotely is through the use of a tele-operated humanoid robot avatar. Such avatars have been previously shown to enhance social presence and operator salience. We present a motion-tracking-based tele-operation system for the NAO robot platform that allows direct transmission of speech and gestures produced by the operator. To assess the capabilities of this system for transmitting multi-modal communication, we conducted a user study that investigated whether robot-produced iconic gestures are comprehensible and are integrated with speech. Robot-performed gesture outcomes were compared directly to those for gestures produced by a human actor, using a within-participant experimental design. We show that iconic gestures produced by a tele-operated robot are understood by participants when presented alone, almost as well as when produced by a human. More importantly, we show that gestures are integrated with speech equally well for human and robot performances when presented as part of multi-modal communication. PMID:26925010

  4. Multi-function robots with speech interaction and emotion feedback

    NASA Astrophysics Data System (ADS)

    Wang, Hongyu; Lou, Guanting; Ma, Mengchao

    2018-03-01

    Service robots have been deployed in many public settings; however, most still lack speech interaction, and especially speech interaction with emotion feedback. To make the robot more humanoid, this study used an Arduino microcontroller for the speech recognition module and the servo motor control module, giving the robot speech interaction and emotion feedback. In addition, a W5100 chip was adopted for the network connection, enabling information transmission via the Internet and providing broad application prospects for the robot in the area of the Internet of Things (IoT).

  5. Speech effort measurement and stuttering: investigating the chorus reading effect.

    PubMed

    Ingham, Roger J; Warner, Allison; Byrd, Anne; Cotton, John

    2006-06-01

    The purpose of this study was to investigate the effect of chorus reading (CR) on speech effort during oral reading by adults who stutter and by control participants, along with the effect of a strategy highlighting speech effort measurement. Twelve persistently stuttering (PS) adults and 12 normally fluent control participants completed 1-min base rate readings (BR, non-chorus) and CRs within a BR/CR/BR/CR/BR experimental design. Participants self-rated speech effort on a 9-point scale after each reading trial; stuttering frequency, speech rate, and speech naturalness measures were also obtained. Instructions highlighting speech effort ratings during the BR and CR phases were introduced after the first CR. CR improved speech effort ratings for the PS group, but the control group showed the reverse trend. The two groups' effort ratings did not differ significantly during CR phases, but both were significantly poorer than the control group's effort ratings during BR phases. The highlighting strategy did not significantly change effort ratings. The findings show that CR produces not only stutter-free and natural-sounding speech but also reliable reductions in speech effort. However, these reductions do not reach the effort levels achieved by normally fluent speakers, thereby qualifying CR's use as a gold standard of achievable normal fluency for PS speakers.

  6. Blind speech separation system for humanoid robot with FastICA for audio filtering and separation

    NASA Astrophysics Data System (ADS)

    Budiharto, Widodo; Santoso Gunawan, Alexander Agung

    2016-07-01

    There have been many recent developments in intelligent humanoid robots, mainly for handling voice and images. In this research, we propose a blind speech separation system using FastICA for audio filtering and separation that can be used in education or entertainment. The main problem is to separate multiple speech sources and to filter out irrelevant noise. After the speech separation step, the results are integrated with our previous speech and face recognition system, which is based on the Bioloid GP robot with a Raspberry Pi 2 as controller. The experimental results show that the accuracy of our blind speech separation system is about 88% for command and query recognition.
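
    The separation stage maps naturally onto scikit-learn's FastICA; the sketch below unmixes two synthetic sources from two simulated microphone mixtures. The mixing matrix and source signals are fabricated, and the robot-integration and noise-filtering details of the paper are not modeled.

      # Unmixing two synthetic sources from two simulated microphones with
      # FastICA. Sources, noise level, and mixing matrix are fabricated.
      import numpy as np
      from sklearn.decomposition import FastICA

      rng = np.random.default_rng(0)
      t = np.linspace(0, 1, 8000)
      s1 = np.sin(2 * np.pi * 220 * t)             # toy tonal source
      s2 = np.sign(np.sin(2 * np.pi * 313 * t))    # toy square-wave source
      S = np.c_[s1, s2] + 0.05 * rng.standard_normal((t.size, 2))

      A = np.array([[1.0, 0.6], [0.4, 1.0]])       # assumed mixing matrix
      X = S @ A.T                                  # two-microphone mixtures

      ica = FastICA(n_components=2, random_state=0)
      S_hat = ica.fit_transform(X)   # recovered up to permutation and scale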

  7. Measuring Speech Comprehensibility in Students with Down Syndrome

    PubMed Central

    Woynaroski, Tiffany; Camarata, Stephen

    2016-01-01

    Purpose There is an ongoing need to develop assessments of spontaneous speech that focus on whether the child's utterances are comprehensible to listeners. This study sought to identify the attributes of a stable ratings-based measure of speech comprehensibility, which enabled examining the criterion-related validity of an orthography-based measure of the comprehensibility of conversational speech in students with Down syndrome. Method Participants were 10 elementary school students with Down syndrome and 4 unfamiliar adult raters. Averaged across-observer Likert ratings of speech comprehensibility were called a ratings-based measure of speech comprehensibility. The proportion of utterance attempts fully glossed constituted an orthography-based measure of speech comprehensibility. Results Averaging across 4 raters on four 5-min segments produced a reliable (G = .83) ratings-based measure of speech comprehensibility. The ratings-based measure was strongly (r > .80) correlated with the orthography-based measure for both the same and different conversational samples. Conclusion Reliable and valid measures of speech comprehensibility are achievable with the resources available to many researchers and some clinicians. PMID:27299989
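
    The two measures validated against each other here reduce to simple computations: Likert ratings averaged across raters (ratings-based) correlated with the proportion of utterances fully glossed (orthography-based). A sketch with fabricated numbers:

      # Ratings-based vs. orthography-based measures with fabricated numbers.
      import numpy as np

      likert = np.array([[3, 4, 2, 5],      # rater 1 across four students
                         [4, 4, 3, 5],      # rater 2
                         [3, 5, 2, 4],      # rater 3
                         [4, 4, 3, 5]], float)
      ratings_based = likert.mean(axis=0)   # average across raters
      orthography_based = np.array([0.55, 0.80, 0.40, 0.90])  # prop. glossed

      r = np.corrcoef(ratings_based, orthography_based)[0, 1]
      print("Pearson r:", round(float(r), 2))   # the study reports r > .80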

  8. Systematic studies of modified vocalization: effects of speech rate and instatement style during metronome stimulation.

    PubMed

    Davidow, Jason H; Bothe, Anne K; Richardson, Jessica D; Andreatta, Richard D

    2010-12-01

    This study introduces a series of systematic investigations intended to clarify the parameters of the fluency-inducing conditions (FICs) in stuttering. Participants included 11 adults, aged 20-63 years, with typical speech-production skills. A repeated measures design was used to examine the relationships between several speech production variables (vowel duration, voice onset time, fundamental frequency, intraoral pressure, pressure rise time, transglottal airflow, and phonated intervals) and speech rate and instatement style during metronome-entrained rhythmic speech. Measures of duration (vowel duration, voice onset time, and pressure rise time) differed across different metronome conditions. When speech rates were matched between the control condition and metronome condition, voice onset time was the only variable that changed. Results confirm that speech rate and instatement style can influence speech production variables during the production of fluency-inducing conditions. Future studies of normally fluent speech and of stuttered speech must control both features and should further explore the importance of voice onset time, which may be influenced by rate during metronome stimulation in a way that the other variables are not.

  9. Unequal effects of speech and nonspeech contexts on the perceptual normalization of Cantonese level tones.

    PubMed

    Zhang, Caicai; Peng, Gang; Wang, William S-Y

    2012-08-01

    Context is important for recovering language information from talker-induced variability in acoustic signals. In tone perception, previous studies reported similar effects of speech and nonspeech contexts in Mandarin, supporting a general perceptual mechanism underlying tone normalization. However, no supportive evidence was obtained in Cantonese, also a tone language. Moreover, no study has compared speech and nonspeech contexts in the multi-talker condition, which is essential for exploring the mechanism that normalizes inter-talker variability in speaking F0. A further question is whether a talker's full F0 range and mean F0 equally facilitate normalization. To answer these questions, this study examined the effects of four context conditions (speech/nonspeech × F0 contour/mean F0) in the multi-talker condition in Cantonese. Results show that raising and lowering the F0 of speech contexts changes the perception of identical stimuli from mid level tone to low and high level tone, respectively, whereas nonspeech contexts only mildly increase the identification preference. This supports a speech-specific mechanism of tone normalization. Moreover, speech context with a flattened F0 trajectory, which neutralizes cues to a talker's full F0 range, fails to facilitate normalization in some conditions, implying that a talker's mean F0 is less efficient for minimizing talker-induced lexical ambiguity in tone perception.

  10. The Relationship between Speech Rate and Memory Span in Children.

    ERIC Educational Resources Information Center

    Henry, Lucy A.

    1994-01-01

    Examined whether speech rate is related to the amount recalled and if developmental increases in speech rate allow faster rehearsal with age, and hence, greater recall. Found that the group relationship was clear and replicable but that speech rates of individual children were not good predictors of those children's memory spans; age was found to…

  11. Connected word recognition using a cascaded neuro-computational model

    NASA Astrophysics Data System (ADS)

    Hoya, Tetsuya; van Leeuwen, Cees

    2016-10-01

    We propose a novel framework for processing a continuous speech stream that contains a varying number of words, as well as non-speech periods. Speech samples are segmented into word-tokens and non-speech periods. An augmented version of an earlier-proposed, cascaded neuro-computational model is used for recognising individual words within the stream. Simulation studies using both a multi-speaker-dependent and speaker-independent digit string database show that the proposed method yields a recognition performance comparable to that obtained by a benchmark approach using hidden Markov models with embedded training.

  12. Collaborative Signaling of Informational Structures by Dynamic Speech Rate.

    ERIC Educational Resources Information Center

    Koiso, Hanae; Shimojima, Atsushi; Katagiri, Yasuhiro

    1998-01-01

    Investigated the functions of dynamic speech rates as contextualization cues in conversational Japanese, examining five spontaneous task-oriented dialogs and analyzing the potential of speech-rate changes in signaling the structure of the information being exchanged. Results found a correlation between speech decelerations and the openings of new…

  13. Merlot Design

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahern, S D

    2003-06-10

    We describe Merlot, a system for delivery of digital imagery over high speed networks. We describe various use cases, the client/server interaction, and the image and network codecs. We also describe some possible applications using Merlot and future work.

  14. Speech Rate Normalization and Phonemic Boundary Perception in Cochlear-Implant Users.

    PubMed

    Jaekel, Brittany N; Newman, Rochelle S; Goupell, Matthew J

    2017-05-24

    Normal-hearing (NH) listeners rate normalize, temporarily remapping phonemic category boundaries to account for a talker's speech rate. It is unknown if adults who use auditory prostheses called cochlear implants (CI) can rate normalize, as CIs transmit degraded speech signals to the auditory nerve. Ineffective adjustment to rate information could explain some of the variability in this population's speech perception outcomes. Phonemes with manipulated voice-onset-time (VOT) durations were embedded in sentences with different speech rates. Twenty-three CI and 29 NH participants performed a phoneme identification task. NH participants heard the same unprocessed stimuli as the CI participants or stimuli degraded by a sine vocoder, simulating aspects of CI processing. CI participants showed larger rate normalization effects (6.6 ms) than the NH participants (3.7 ms) and had shallower (less reliable) category boundary slopes. NH participants showed similarly shallow slopes when presented acoustically degraded vocoded signals, but an equal or smaller rate effect in response to reductions in available spectral and temporal information. CI participants can rate normalize, despite their degraded speech input, and show a larger rate effect compared to NH participants. CI participants may particularly rely on rate normalization to better maintain perceptual constancy of the speech signal.
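
    Boundary and slope measures of this kind are typically obtained by fitting a psychometric function to identification responses at each speech rate; the rate effect is the shift in the fitted boundary between rate conditions. The sketch below does this with a logistic fit on fabricated response proportions.

      # Logistic psychometric fits per rate condition; the rate effect is
      # the boundary shift. Response proportions below are fabricated.
      import numpy as np
      from scipy.optimize import curve_fit

      def logistic(vot, boundary, slope):
          return 1.0 / (1.0 + np.exp(-slope * (vot - boundary)))

      vot_ms = np.array([5, 15, 25, 35, 45, 55], float)
      p_slow = np.array([0.02, 0.05, 0.30, 0.75, 0.95, 0.99])
      p_fast = np.array([0.05, 0.20, 0.60, 0.90, 0.98, 1.00])

      (b_slow, s_slow), _ = curve_fit(logistic, vot_ms, p_slow, p0=[30, 0.2])
      (b_fast, s_fast), _ = curve_fit(logistic, vot_ms, p_fast, p0=[30, 0.2])
      print("boundary shift (rate effect): %.1f ms" % (b_slow - b_fast))
      print("slopes (shallower = less reliable): %.2f vs %.2f" % (s_slow, s_fast))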

  15. 76 FR 44326 - Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-07-25

    ... Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities; Structure and Practices of the Video Relay Service Program AGENCY: Federal Communications Commission. ACTION...-minute video relay service (``VRS'') compensation rates, and adopts per-minute compensation rates for the...

  16. Automated Speech Rate Measurement in Dysarthria

    ERIC Educational Resources Information Center

    Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

    2015-01-01

    Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…

  17. The Levels of Speech Usage rating scale: comparison of client self-ratings with speech pathologist ratings.

    PubMed

    Gray, Christina; Baylor, Carolyn; Eadie, Tanya; Kendall, Diane; Yorkston, Kathryn

    2012-01-01

    The term 'speech usage' refers to what people want or need to do with their speech to fulfil the communication demands in their life roles. Speech-language pathologists (SLPs) need to know about clients' speech usage to plan appropriate interventions to meet their life participation goals. The Levels of Speech Usage is a categorical scale intended for client self-report of speech usage, but SLPs may want the option to use it as a proxy-report tool. The relationship between self-report and clinician ratings should be examined before the instrument is used in a proxy format. The primary purpose of this study was to compare client self-ratings with SLP ratings on the Levels of Speech Usage scale. The secondary purpose was to determine if the SLP ratings differed depending on whether or not the SLPs knew about the clients' medical condition. Self-ratings of adults with communication disorders on the Levels of Speech Usage scale were available from prior research. Vignettes about these individuals were created from existing data. Two sets of vignettes were created. One set contained information about demographic information, living situation, occupational status and hobbies or social activities. The second set was identical to the first with the addition of information about the clients' medical conditions and communication disorders. Various communication disorders were represented including dysarthria, voice disorders, laryngectomy, and mild cognitive and language disorders. Sixty SLPs were randomly divided into two groups with each group rating one set of vignettes. The task was completed online. While this does not replicate typical in-person clinical interactions, it was a feasible method for this study. For data analysis, the client self-ratings were considered fixed points and the percentage of SLP ratings in agreement with the self-ratings was calculated. The percentage of SLP ratings in exact agreement with client self-ratings was 44.9%. Agreement was lowest for the less-demanding speech usage categories and highest for the most demanding usage category. There was no significant difference between the two groups of SLPs based on knowledge of medical condition. SLPs often need to document the speech usage levels of clients. This study suggests the potential for SLPs to misjudge how clients see their own speech demands. Further research is needed to determine if similar results would be found in actual clinical interactions. Until then, SLPs should seek the input of their clients when using this instrument. © 2012 Royal College of Speech and Language Therapists.

  18. Effect of perceptual load on semantic access by speech in children

    PubMed Central

    Jerger, Susan; Damian, Markus F.; Mills, Candice; Bartlett, James; Tye-Murray, Nancy; Abdi, Hervé

    2013-01-01

    Purpose: To examine whether semantic access by speech requires attention in children. Method: Children (N=200) named pictures and ignored distractors on a cross-modal (distractors: auditory, no face) or multi-modal (distractors: auditory static face and audiovisual dynamic face) picture-word task. The cross-modal task had a low perceptual load and the multi-modal task a high load (naming pictures displayed, respectively, on a blank screen vs. below the talker's face on his T-shirt). The semantic content of the distractors was manipulated to be related vs. unrelated to the picture (e.g., picture dog with distractors bear vs. cheese). Lavie's (2005) perceptual load model proposes that semantic access is independent of capacity-limited attentional resources if the irrelevant semantic-content manipulation influences naming times on both tasks despite their different loads, but dependent on attentional resources, which the higher-load task exhausts, if irrelevant content influences naming only on the cross-modal (low-load) task. Results: Irrelevant semantic content affected performance on both tasks in 6- to 9-year-olds, but only on the cross-modal task in 4- to 5-year-olds. The addition of visual speech did not influence results on the multi-modal task. Conclusion: Younger and older children differ in their dependence on attentional resources for semantic access by speech. PMID:22896045

  19. Long-term temporal tracking of speech rate affects spoken-word recognition.

    PubMed

    Baese-Berk, Melissa M; Heffner, Christopher C; Dilley, Laura C; Pitt, Mark A; Morrill, Tuuli H; McAuley, J Devin

    2014-08-01

    Humans unconsciously track a wide array of distributional characteristics in their sensory environment. Recent research in spoken-language processing has demonstrated that the speech rate surrounding a target region within an utterance influences which words, and how many words, listeners hear later in that utterance. On the basis of hypotheses that listeners track timing information in speech over long timescales, we investigated the possibility that the perception of words is sensitive to speech rate over such a timescale (e.g., an extended conversation). Results demonstrated that listeners tracked variation in the overall pace of speech over an extended duration (analogous to that of a conversation that listeners might have outside the lab) and that this global speech rate influenced which words listeners reported hearing. The effects of speech rate became stronger over time. Our findings are consistent with the hypothesis that neural entrainment by speech occurs on multiple timescales, some lasting more than an hour. © The Author(s) 2014.

  20. SISL (ScreeningsInstrument Schisis Leuven): assessment of cleft palate speech, resonance and myofunction.

    PubMed

    Breuls, M; Sell, D; Manders, E; Boulet, E; Vander Poorten, V

    2006-01-01

    This paper presents an assessment protocol for the evaluation and description of speech, resonance and myofunctional characteristics commonly associated with cleft palate and/or velopharyngeal dysfunction. The protocol is partly based on the GOS.SP.ASS'98 and adapted to Flemish. It focuses on the relevant aspects of cleft type speech necessary to facilitate assessment, adequate diagnosis and management planning in a multi-disciplinary setting of cleft team care.

  1. Combinatorial Markov Random Fields and Their Applications to Information Organization

    DTIC Science & Technology

    2008-02-01

    titles, part-of-speech tags; • Image processing: images, colors, texture, blobs, interest points, caption words; • Video processing: video signal, audio… McGurk and MacDonald published their pioneering work [80] that revealed the multi-modal nature of speech perception: sound and moving lips compose one… [Part-of-]Speech (POS) n-grams (that correspond to the syntactic structure of text). POS n-grams are extracted from sentences in an incremental manner: the first n…

  2. Rate and rhythm control strategies for apraxia of speech in nonfluent primary progressive aphasia.

    PubMed

    Beber, Bárbara Costa; Berbert, Monalise Costa Batista; Grawer, Ruth Siqueira; Cardoso, Maria Cristina de Almeida Freitas

    2018-01-01

    The nonfluent/agrammatic variant of primary progressive aphasia is characterized by apraxia of speech and agrammatism. Apraxia of speech limits patients' communication through a slow speaking rate, sound substitutions, articulatory groping, false starts and restarts, segmentation of syllables, and increasing difficulty with increasing utterance length. Speech and language therapy is known to benefit individuals with apraxia of speech due to stroke, but little is known about its effects in primary progressive aphasia. This is a case report of a 72-year-old illiterate housewife who was diagnosed with nonfluent primary progressive aphasia and received speech and language therapy for apraxia of speech. Rate and rhythm control strategies were trained to improve initiation of speech. We discuss the importance of these strategies for alleviating apraxia of speech in this condition and future perspectives in the area.

  3. Real-time demonstration hardware for enhanced DPCM video compression algorithm

    NASA Technical Reports Server (NTRS)

    Bizon, Thomas P.; Whyte, Wayne A., Jr.; Marcopoli, Vincent R.

    1992-01-01

    The lack of available wideband digital links as well as the complexity of implementing bandwidth-efficient digital video CODECs (encoders/decoders) has kept the cost of digital television transmission too high to compete with analog methods. Terrestrial and satellite video service providers, however, are now recognizing the potential gains that digital video compression offers and are proposing to incorporate compression systems to increase the number of available program channels. NASA is similarly recognizing the benefits of, and trend toward, digital video compression techniques for transmission of high quality video from space and has therefore developed a digital television bandwidth compression algorithm to process standard National Television Systems Committee (NTSC) composite color television signals. The algorithm is based on differential pulse code modulation (DPCM), but additionally utilizes a non-adaptive predictor, non-uniform quantizer and multilevel Huffman coder to reduce the data rate substantially below that achievable with straight DPCM. The non-adaptive predictor and multilevel Huffman coder combine to set this technique apart from other DPCM encoding algorithms. All processing is done on an intra-field basis to prevent motion degradation and minimize hardware complexity. Computer simulations have shown the algorithm will produce broadcast quality reconstructed video at an average transmission rate of 1.8 bits/pixel. Hardware implementation of the DPCM circuit, non-adaptive predictor and non-uniform quantizer has been completed, providing real-time demonstration of the image quality at full video rates. Video sampling/reconstruction circuits have also been constructed to accomplish the analog video processing necessary for the real-time demonstration. Performance results for the completed hardware compare favorably with simulation results. Hardware implementation of the multilevel Huffman encoder/decoder is currently under development, along with a buffer control algorithm to accommodate the variable data rate output of the multilevel Huffman encoder. A video CODEC of this type could be used to compress NTSC color television signals where high quality reconstruction is desirable (e.g., Space Station video transmission, transmission direct-to-the-home via direct broadcast satellite systems, or cable television distribution to system headends and direct-to-the-home).
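
    A one-dimensional toy version of the DPCM loop described above, with a fixed (non-adaptive) previous-sample predictor and a non-uniform quantizer, may make the structure concrete. The quantizer levels below are illustrative; the codec's two-dimensional predictor, actual quantizer tables, and multilevel Huffman stage are not reproduced.

      # 1-D DPCM loop: fixed previous-sample predictor plus a non-uniform
      # quantizer. Quantizer levels are illustrative.
      import numpy as np

      LEVELS = np.array([-24, -10, -4, -1, 1, 4, 10, 24], float)

      def quantize(err):
          return LEVELS[np.argmin(np.abs(LEVELS - err))]

      def dpcm_roundtrip(samples):
          pred, recon = 0.0, []
          for s in samples:
              q = quantize(s - pred)     # quantized prediction error (sent)
              pred = pred + q            # decoder-matched reconstruction
              recon.append(pred)
          return np.array(recon)

      x = np.cumsum(np.random.default_rng(2).normal(0, 3, 64))  # toy scanline
      x_hat = dpcm_roundtrip(x)
      print("RMS error:", float(np.sqrt(np.mean((x - x_hat) ** 2))))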

  4. Compressed Speech Technology: Implications for Learning and Instruction.

    ERIC Educational Resources Information Center

    Sullivan, LeRoy L.

    This paper first traces the historical development of speech compression technology, which has made it possible to alter the spoken rate of a pre-recorded message without excessive distortion. Terms used to describe techniques employed as the technology evolved are discussed, including rapid speech, rate altered speech, cut-and-spliced speech, and…

  5. An Improved SEL Test of the ADV212 Video Codec

    NASA Technical Reports Server (NTRS)

    Wilcox, Edward P.; Campola, Michael J.; Nadendla, Seshagiri; Kadari, Madhusudhan; Gigliuto, Robert A.

    2017-01-01

    Single-event effect (SEE) test data is presented on the Analog Devices ADV212. Focus is given to the test setup used to improve data quality and validate single-event latch-up (SEL) protection circuitry.

  6. An Improved SEL Test of the ADV212 Video Codec

    NASA Technical Reports Server (NTRS)

    Wilcox, Edward P; Campola, Michael J.; Nadendla, Seshagiri; Kadari, Madhusudhan; Gigliuto, Robert A.

    2017-01-01

    Single-event effect (SEE) test data is presented on the Analog Devices ADV212. Focus is given to the test setup used to improve data quality and validate single-event latchup (SEL) protection circuitry.

  7. High-speed low-complexity video coding with EDiCTius: a DCT coding proposal for JPEG XS

    NASA Astrophysics Data System (ADS)

    Richter, Thomas; Fößel, Siegfried; Keinert, Joachim; Scherl, Christian

    2017-09-01

    In its 71st meeting, the JPEG committee issued a call for low-complexity, high-speed image coding designed to address the needs of low-cost video-over-IP applications. In answer to this call, Fraunhofer IIS and the Computing Center of the University of Stuttgart jointly developed an embedded DCT image codec requiring only minimal resources while maximizing throughput in FPGA and GPU implementations. Objective and subjective tests performed for the 73rd meeting confirmed its excellent performance and suitability for its purpose, and it was selected as one of the two key contributions for the development of a joint test model. In this paper, the authors describe the design principles of the codec, provide a high-level overview of the encoder and decoder chain, and report evaluation results on the test corpus selected by the JPEG committee.

  8. The Interrelationships between Ratings of Speech and Facial Acceptability in Persons with Cleft Palate.

    ERIC Educational Resources Information Center

    Sinko, Garnet R.; Hedrick, Dona L.

    1982-01-01

    Thirty untrained young adult observers rated the speech and facial acceptability of 20 speakers with cleft palate. The observers were reliable in rating both speech and facial acceptability. Judgments of facial acceptability were generally more positive, suggesting that speech is generally judged more negatively in speakers with cleft palate.…

  9. Untrained listeners' ratings of speech disorders in a group with cleft palate: a comparison with speech and language pathologists' ratings.

    PubMed

    Brunnegård, Karin; Lohmander, Anette; van Doorn, Jan

    2009-01-01

    Hypernasal resonance, audible nasal air emission and/or nasal turbulence, and articulation errors are typical speech disorders associated with cleft lip and palate. Several studies indicate that hypernasal resonance tends to be perceived negatively by listeners. Most perceptual studies of cleft-palate-related speech disorders are carried out with speech and language pathologists as listeners; only a few have explored how judgements by untrained listeners compare with expert assessments. Such studies can determine whether children for whom speech and language pathologists recommend intervention have a speech deviance that untrained listeners also detect. The aim here was to compare ratings by untrained listeners with ratings by speech and language pathologists for cleft palate speech. An assessment form for untrained listeners was developed using statements and a five-point scale, tailored to facilitate comparison with expert judgements. Twenty-eight untrained listeners assessed the speech of 26 speakers with cleft palate and ten speakers without cleft in a comparison group, and this assessment was compared with the joint assessment of two expert speech and language pathologists. The listener groups generally agreed on which speakers were nasal: the untrained listeners detected hyper- and hyponasality when present in speech and considered moderate to severe hypernasality serious enough to call for intervention. The expert listeners, however, judged audible nasal air emission and/or nasal turbulence to be present in twice as many speakers as the untrained listeners, who were much less sensitive to these features. The untrained listeners' ratings in this study largely confirm the ratings of speech and language pathologists and show that cleft palate speech disorders may have an impact on the everyday life of the speaker.

  10. Speech Rate Normalization and Phonemic Boundary Perception in Cochlear-Implant Users

    PubMed Central

    Newman, Rochelle S.; Goupell, Matthew J.

    2017-01-01

    Purpose Normal-hearing (NH) listeners rate normalize, temporarily remapping phonemic category boundaries to account for a talker's speech rate. It is unknown if adults who use auditory prostheses called cochlear implants (CI) can rate normalize, as CIs transmit degraded speech signals to the auditory nerve. Ineffective adjustment to rate information could explain some of the variability in this population's speech perception outcomes. Method Phonemes with manipulated voice-onset-time (VOT) durations were embedded in sentences with different speech rates. Twenty-three CI and 29 NH participants performed a phoneme identification task. NH participants heard the same unprocessed stimuli as the CI participants or stimuli degraded by a sine vocoder, simulating aspects of CI processing. Results CI participants showed larger rate normalization effects (6.6 ms) than the NH participants (3.7 ms) and had shallower (less reliable) category boundary slopes. NH participants showed similarly shallow slopes when presented acoustically degraded vocoded signals, but an equal or smaller rate effect in response to reductions in available spectral and temporal information. Conclusion CI participants can rate normalize, despite their degraded speech input, and show a larger rate effect compared to NH participants. CI participants may particularly rely on rate normalization to better maintain perceptual constancy of the speech signal. PMID:28395319

  11. Don’t speak too fast! Processing of fast rate speech in children with specific language impairment

    PubMed Central

    Bedoin, Nathalie; Krifi-Papoz, Sonia; Herbillon, Vania; Caillot-Bascoul, Aurélia; Gonzalez-Monge, Sibylle; Boulenger, Véronique

    2018-01-01

    Background Perception of speech rhythm requires the auditory system to track temporal envelope fluctuations, which carry syllabic and stress information. Reduced sensitivity to rhythmic acoustic cues has been evidenced in children with Specific Language Impairment (SLI), impeding syllabic parsing and speech decoding. Our study investigated whether these children experience specific difficulties processing fast-rate speech as compared with typically developing (TD) children. Method Sixteen French children with SLI (8–13 years old), with mainly expressive phonological disorders and preserved comprehension, and 16 age-matched TD children performed a judgment task on sentences produced (1) at normal rate, (2) at fast rate, or (3) time-compressed. The sensitivity index (d′) to semantically incongruent sentence-final words was measured. Results Overall, children with SLI perform significantly worse than TD children. Importantly, as revealed by the significant Group × Speech Rate interaction, children with SLI find it more challenging than TD children to process both naturally and artificially accelerated speech. The two groups do not significantly differ in normal-rate speech processing. Conclusion In agreement with rhythm-processing deficits in atypical language development, our results suggest that children with SLI face difficulties adjusting to rapid speech rates. These findings are interpreted in light of temporal sampling and prosodic phrasing frameworks and of oscillatory mechanisms underlying speech perception. PMID:29373610

  12. Effects of Within-Talker Variability on Speech Intelligibility in Mandarin-Speaking Adult and Pediatric Cochlear Implant Patients

    PubMed Central

    Su, Qiaotong; Galvin, John J.; Zhang, Guoping; Li, Yongxin

    2016-01-01

    Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users. PMID:27363714

  13. Speech rate in Parkinson's disease: A controlled study.

    PubMed

    Martínez-Sánchez, F; Meilán, J J G; Carro, J; Gómez Íñiguez, C; Millian-Morell, L; Pujante Valverde, I M; López-Alburquerque, T; López, D E

    2016-09-01

    Speech disturbances will affect most patients with Parkinson's disease (PD) over the course of the disease. The origin and severity of these symptoms are of clinical and diagnostic interest. The aim was to evaluate the clinical pattern of speech impairment in PD patients and identify significant differences in speech rate and articulation compared to control subjects. Speech rate and articulation in a reading task were measured using an automatic analytical method. A total of 39 PD patients in the 'on' state and 45 age- and sex-matched asymptomatic controls participated in the study. None of the patients experienced dyskinesias or motor fluctuations during the test. The patients with PD displayed a significant reduction in speech and articulation rates; there were no significant correlations between the studied speech parameters and patient characteristics such as L-dopa dose, duration of the disorder, age, UPDRS III scores, and Hoehn & Yahr stage. Patients with PD show a characteristic pattern of declining speech rate. These results suggest that in PD, disfluencies are the result of the movement disorder affecting the physiology of speech production systems. Copyright © 2014 Sociedad Española de Neurología. Published by Elsevier España, S.L.U. All rights reserved.

  14. Stuttering Frequency, Speech Rate, Speech Naturalness, and Speech Effort During the Production of Voluntary Stuttering.

    PubMed

    Davidow, Jason H; Grossman, Heather L; Edge, Robin L

    2018-05-01

    Voluntary stuttering techniques involve persons who stutter purposefully interjecting disfluencies into their speech. Little research has been conducted on the impact of these techniques on the speech pattern of persons who stutter. The present study examined whether changes in the frequency of voluntary stuttering accompanied changes in stuttering frequency, articulation rate, speech naturalness, and speech effort. In total, 12 persons who stutter aged 16-34 years participated. Participants read four 300-syllable passages during a control condition, and three voluntary stuttering conditions that involved attempting to produce purposeful, tension-free repetitions of initial sounds or syllables of a word for two or more repetitions (i.e., bouncing). The three voluntary stuttering conditions included bouncing on 5%, 10%, and 15% of syllables read. Friedman tests and follow-up Wilcoxon signed ranks tests were conducted for the statistical analyses. Stuttering frequency, articulation rate, and speech naturalness were significantly different between the voluntary stuttering conditions. Speech effort did not differ between the voluntary stuttering conditions. Stuttering frequency was significantly lower during the three voluntary stuttering conditions compared to the control condition, and speech effort was significantly lower during two of the three voluntary stuttering conditions compared to the control condition. Due to changes in articulation rate across the voluntary stuttering conditions, it is difficult to conclude, as has been suggested previously, that voluntary stuttering is the reason for stuttering reductions found when using voluntary stuttering techniques. Additionally, future investigations should examine different types of voluntary stuttering over an extended period of time to determine their impact on stuttering frequency, speech rate, speech naturalness, and speech effort.

  15. The speech naturalness of people who stutter speaking under delayed auditory feedback as perceived by different groups of listeners.

    PubMed

    Van Borsel, John; Eeckhout, Hannelore

    2008-09-01

    This study investigated listeners' perception of the speech naturalness of people who stutter (PWS) speaking under delayed auditory feedback (DAF), with particular attention to possible listener differences. Three panels of judges consisting of 14 stuttering individuals, 14 speech language pathologists, and 14 naive listeners rated the naturalness of speech samples of stuttering and non-stuttering individuals using a 9-point interval scale. Results clearly indicate that these three groups evaluate naturalness differently. Naive listeners appear to be more severe in their judgements than speech language pathologists and stuttering listeners, and speech language pathologists are apparently more severe than PWS. The three listener groups showed similar trends with respect to the relationship between speech naturalness and speech rate: all three indicated that for PWS, the slower a speaker's rate, the less natural the speech was judged to sound. The three listener groups also showed similar trends with regard to the naturalness of the stuttering versus the non-stuttering individuals: all three panels considered the speech of the non-stuttering participants more natural. The reader will be able to: (1) discuss the speech naturalness of people who stutter speaking under delayed auditory feedback, (2) discuss listener differences in ratings of the naturalness of people who stutter speaking under delayed auditory feedback, and (3) discuss the importance of speech rate for the naturalness of speech.

  16. New procedures to evaluate visually lossless compression for display systems

    NASA Astrophysics Data System (ADS)

    Stolitzka, Dale F.; Schelkens, Peter; Bruylants, Tim

    2017-09-01

    Visually lossless image coding in isochronous display streaming or plesiochronous networks reduces link complexity and power consumption and increases available link bandwidth. A new set of codecs developed within the last four years promises a new level of coding quality, but requires evaluation techniques that are sufficiently sensitive to the small artifacts or color variations induced by this new breed of codecs; traditional image and video coding evaluation techniques, such as those used for television, have not proven sensitive enough. This paper begins with a summary of the new ISO/IEC 29170-2, a procedure for evaluating visually lossless coding, and reports new work by JPEG to extend the procedure in two important ways: for HDR content, and for evaluating the differences between still images, panning images, and image sequences. ISO/IEC 29170-2 relies on processing test images through a well-defined process chain for subjective, forced-choice psychophysical experiments, and sets the acceptable quality level equal to one just noticeable difference. In 2015, JPEG received new requirements to expand evaluation of visually lossless coding to high dynamic range images, slowly moving (panning) images, and image sequences. These requirements are the basis for the new amendments of the ISO/IEC 29170-2 procedures described in this paper, which promise to be highly useful for the new content in television and cinema mezzanine networks. The amendments passed the final ballot in April 2017 and are on track to be published in 2018.

  17. Language-independent talker-specificity in first-language and second-language speech production by bilingual talkers: L1 speaking rate predicts L2 speaking rate

    PubMed Central

    Bradlow, Ann R.; Kim, Midam; Blasingame, Michael

    2017-01-01

    Second-language (L2) speech is consistently slower than first-language (L1) speech, and L1 speaking rate varies within- and across-talkers depending on many individual, situational, linguistic, and sociolinguistic factors. It is asked whether speaking rate is also determined by a language-independent talker-specific trait such that, across a group of bilinguals, L1 speaking rate significantly predicts L2 speaking rate. Two measurements of speaking rate were automatically extracted from recordings of read and spontaneous speech by English monolinguals (n = 27) and bilinguals from ten L1 backgrounds (n = 86): speech rate (syllables/second), and articulation rate (syllables/second excluding silent pauses). Replicating prior work, L2 speaking rates were significantly slower than L1 speaking rates both across-groups (monolinguals' L1 English vs bilinguals' L2 English), and across L1 and L2 within bilinguals. Critically, within the bilingual group, L1 speaking rate significantly predicted L2 speaking rate, suggesting that a significant portion of inter-talker variation in L2 speech is derived from inter-talker variation in L1 speech, and that individual variability in L2 spoken language production may be best understood within the context of individual variability in L1 spoken language production. PMID:28253679
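
    Both rate measures are simple to compute once syllable counts and pause intervals are available. The following minimal sketch (hypothetical function and annotation format, not the study's extraction pipeline) makes the distinction concrete.

        # Sketch: speech rate includes silent pauses; articulation rate excludes
        # them. Inputs are a syllable count, a total duration in seconds, and
        # hypothetical pause annotations as (start, end) pairs in seconds.
        def speaking_rates(n_syllables, total_dur_s, pauses):
            pause_dur = sum(end - start for start, end in pauses)
            speech_rate = n_syllables / total_dur_s                      # syll/s
            articulation_rate = n_syllables / (total_dur_s - pause_dur)  # syll/s
            return speech_rate, articulation_rate

        # Example: 120 syllables in 40 s, 6 s of which are silent pauses.
        print(speaking_rates(120, 40.0, [(5.0, 8.0), (20.0, 23.0)]))
        # -> (3.0, 3.53...): articulation rate is never below speech rate.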

  18. A Hierarchical multi-input and output Bi-GRU Model for Sentiment Analysis on Customer Reviews

    NASA Astrophysics Data System (ADS)

    Zhang, Liujie; Zhou, Yanquan; Duan, Xiuyu; Chen, Ruiqi

    2018-03-01

    Multi-label sentiment classification of customer reviews is a challenging practical task in natural language processing. In this paper, we propose a hierarchical multi-input and multi-output model based on a bi-directional recurrent neural network, which considers both the semantic and the lexical information of emotional expression. Our model applies two independent Bi-GRU layers to generate part-of-speech and sentence representations. Lexical information is then incorporated via attention over the output of a softmax activation on the part-of-speech representation. In addition, we combine the probabilities of auxiliary labels with the hidden layer as features, capturing crucial correlations between output labels. Experimental results show that our model is computationally efficient and achieves breakthrough improvements on a customer reviews dataset.
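
    A minimal PyTorch sketch of a two-encoder architecture in the spirit described above follows; the layer sizes, the attention wiring over the part-of-speech branch, and the auxiliary-label fusion are assumptions made for illustration, not the paper's exact model.

        # Hedged sketch: two independent Bi-GRUs (words, part-of-speech tags);
        # attention weights come from softmax activations on the POS branch;
        # auxiliary label probabilities are concatenated before the main output.
        import torch
        import torch.nn as nn

        class HierBiGRU(nn.Module):
            def __init__(self, vocab, pos_vocab, emb=64, hid=64, n_labels=4, n_aux=3):
                super().__init__()
                self.word_emb = nn.Embedding(vocab, emb)
                self.pos_emb = nn.Embedding(pos_vocab, emb)
                self.word_gru = nn.GRU(emb, hid, bidirectional=True, batch_first=True)
                self.pos_gru = nn.GRU(emb, hid, bidirectional=True, batch_first=True)
                self.pos_out = nn.Linear(2 * hid, n_labels)   # softmax on POS branch
                self.aux_out = nn.Linear(2 * hid, n_aux)      # auxiliary labels
                self.main_out = nn.Linear(2 * hid + n_aux, n_labels)

            def forward(self, words, pos):
                w, _ = self.word_gru(self.word_emb(words))    # (B, T, 2*hid)
                p, _ = self.pos_gru(self.pos_emb(pos))
                # Attention derived from softmax activations on the POS branch.
                att = torch.softmax(self.pos_out(p).max(dim=-1).values, dim=1)
                sent = (w * att.unsqueeze(-1)).sum(dim=1)     # weighted sentence vector
                aux = torch.sigmoid(self.aux_out(p.mean(dim=1)))
                # Fuse auxiliary label probabilities with the hidden representation.
                return self.main_out(torch.cat([sent, aux], dim=-1)), aux

        m = HierBiGRU(vocab=1000, pos_vocab=20)
        logits, aux = m(torch.randint(0, 1000, (2, 12)), torch.randint(0, 20, (2, 12)))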

  19. Acoustic properties of naturally produced clear speech at normal speaking rates

    NASA Astrophysics Data System (ADS)

    Krause, Jean C.; Braida, Louis D.

    2004-01-01

    Sentences spoken ``clearly'' are significantly more intelligible than those spoken ``conversationally'' for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.
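
    The two global-level measures can be approximated with standard signal-processing tools, as in this rough numpy/scipy sketch; the window lengths, the 1-16 Hz modulation band, and other settings are assumptions rather than the study's analysis parameters.

        # Sketch: relative energy in the 1-3 kHz band of the long-term spectrum,
        # and power of low-frequency modulations of the intensity envelope.
        import numpy as np
        from scipy.signal import welch, hilbert

        def clear_speech_measures(x, fs):
            f, pxx = welch(x, fs=fs, nperseg=2048)        # long-term average spectrum
            band = (f >= 1000) & (f <= 3000)
            band_energy_db = 10 * np.log10(pxx[band].sum() / pxx.sum())

            env = np.abs(hilbert(x))                      # intensity envelope
            fe, pe = welch(env - env.mean(), fs=fs, nperseg=int(fs))
            low = (fe >= 1) & (fe <= 16)                  # low-frequency modulations
            mod_depth = pe[low].sum() / pe.sum()
            return band_energy_db, mod_depth

        fs = 16000
        t = np.arange(fs * 2) / fs
        x = np.sin(2 * np.pi * 2000 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
        print(clear_speech_measures(x, fs))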

  20. [Speech fluency developmental profile in Brazilian Portuguese speakers].

    PubMed

    Martins, Vanessa de Oliveira; Andrade, Claudia Regina Furquim de

    2008-01-01

    Speech fluency varies from one individual to the next, fluent or stuttering, depending on several factors. Studies investigating the influence of age on fluency patterns exist; however, those differences were examined in isolated age groups, and no studies of fluency variation across the life span were found. The aim was to verify the speech fluency developmental profile. Speech samples of 594 fluent participants of both genders, aged 2:0 to 99:11 years, speakers of Brazilian Portuguese, were analyzed. Participants were grouped as follows: preschoolers, schoolchildren, early adolescents, late adolescents, adults, and elderly adults. Speech samples were analyzed according to the Speech Fluency Profile variables and compared regarding typology of speech disruptions (typical and less typical), speech rate (words and syllables per minute), and frequency of speech disruptions (percentage of speech discontinuity). Although isolated variations were identified, overall there was no significant difference between the age groups for the speech disruption indexes (typical and less typical speech disruptions and percentage of speech discontinuity). Significant differences between the groups were observed for speech rate. The development of the neurolinguistic system for speech fluency, in terms of speech disruptions, seems to stabilize during the first years of life, presenting no alterations across the life span. Speech rate indexes vary across age groups, indicating patterns of acquisition, development, stabilization, and degeneration.

  1. The Role of Clinical Experience in Speech-Language Pathologists' Perception of Subphonemic Detail in Children's Speech

    PubMed Central

    Munson, Benjamin; Johnson, Julie M.; Edwards, Jan

    2013-01-01

    Purpose This study examined whether experienced speech-language pathologists differ from inexperienced people in their perception of phonetic detail in children's speech. Method Convenience samples comprising 21 experienced speech-language pathologists and 21 inexperienced listeners participated in a series of tasks in which they made visual-analog scale (VAS) ratings of children's natural productions of target /s/-/θ/, /t/-/k/, and /d/-/ɡ/ in word-initial position. Listeners rated the perceptual distance between individual productions and ideal productions. Results The experienced listeners' ratings differed from the inexperienced listeners' in four ways: they had higher intra-rater reliability, they showed less bias toward the more frequent sound, their ratings were more closely related to the acoustic characteristics of the children's speech, and their responses were related to a different set of predictor variables. Conclusions Results suggest that experience working as a speech-language pathologist leads to better perception of phonetic detail in children's speech. Limitations and future research are discussed. PMID:22230182

  2. Oral Motor Abilities Are Task Dependent: A Factor Analytic Approach to Performance Rate.

    PubMed

    Staiger, Anja; Schölderle, Theresa; Brendel, Bettina; Bötzel, Kai; Ziegler, Wolfram

    2017-01-01

    Measures of performance rates in speech-like or volitional nonspeech oral motor tasks are frequently used to draw inferences about articulation rate abnormalities in patients with neurologic movement disorders. The study objective was to investigate the structural relationship between rate measures of speech and of oral motor behaviors different from speech. A total of 130 patients with neurologic movement disorders and 130 healthy subjects participated in the study. Rate data was collected for oral reading (speech), rapid syllable repetition (speech-like), and rapid single articulator movements (nonspeech). The authors used factor analysis to determine whether the different rate variables reflect the same or distinct constructs. The behavioral data were most appropriately captured by a measurement model in which the different task types loaded onto separate latent variables. The data on oral motor performance rates show that speech tasks and oral motor tasks such as rapid syllable repetition or repetitive single articulator movements measure separate traits.

  3. Adaptive data rate SSMA system for personal and mobile satellite communications

    NASA Technical Reports Server (NTRS)

    Ikegami, Tetsushi; Takahashi, Takashi; Arakaki, Yoshiya; Wakana, Hiromitsu

    1995-01-01

    An adaptive data rate SSMA (spread spectrum multiple access) system is proposed for mobile and personal multimedia satellite communications that operates without the aid of system-control earth stations. The system has a constant occupied bandwidth and variable data rates and processing gains, both to mitigate communication link impairments such as fading, rain attenuation, and interference and to handle variable data rates on demand. Proof-of-concept hardware for a 6 MHz bandwidth transponder was developed; it uses offset QPSK (quadrature phase shift keying) and MSK (minimum shift keying) for direct-sequence spread spectrum modulation and handles data rates from 4 to 64 kb/s. An RS422 data interface and low-rate voice and H.261 video codecs are installed. The receiver is designed with a coherent matched-filter technique to achieve fast code acquisition, AFC (automatic frequency control), and coherent detection with minimum hardware losses in a single matched-filter circuit. This receiver structure facilitates variable data rates on demand during a call. This paper outlines the proposed system and reports the performance of the prototype equipment.
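
    The constant-bandwidth, variable-rate idea is that the chip rate (and hence occupied bandwidth) stays fixed while the data rate trades off against processing gain. The toy sketch below illustrates this with an illustrative chip rate and a random spreading code; these are not the prototype's actual parameters.

        # Toy direct-sequence spreading: fixed chip rate, variable processing gain.
        import numpy as np

        CHIP_RATE = 256_000  # chips/s, fixed -> fixed occupied bandwidth

        def spread(bits, data_rate, rng):
            gain = CHIP_RATE // data_rate              # chips per bit (processing gain)
            code = rng.choice([-1, 1], size=gain)      # one PN code per link
            symbols = 2 * np.asarray(bits) - 1         # 0/1 -> -1/+1
            return np.repeat(symbols, gain) * np.tile(code, len(bits)), code

        def despread(chips, code):
            corr = chips.reshape(-1, len(code)) @ code # matched-filter correlation
            return (corr > 0).astype(int)

        rng = np.random.default_rng(0)
        bits = rng.integers(0, 2, 64)
        for rate in (4_000, 16_000, 64_000):           # lower rate -> higher gain
            tx, code = spread(bits, rate, rng)
            assert (despread(tx, code) == bits).all()
            print(rate, "bps -> processing gain", CHIP_RATE // rate)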

  4. Prosody and Semantics Are Separate but Not Separable Channels in the Perception of Emotional Speech: Test for Rating of Emotions in Speech.

    PubMed

    Ben-David, Boaz M; Multani, Namita; Shakuf, Vered; Rudzicz, Frank; van Lieshout, Pascal H H M

    2016-02-01

    Our aim is to explore the complex interplay of prosody (tone of speech) and semantics (verbal content) in the perception of discrete emotions in speech. We implement a novel tool, the Test for Rating of Emotions in Speech. Eighty native English speakers were presented with spoken sentences made of different combinations of 5 discrete emotions (anger, fear, happiness, sadness, and neutral) presented in prosody and semantics. Listeners were asked to rate the sentence as a whole, integrating both speech channels, or to focus on one channel only (prosody or semantics). We observed supremacy of congruency, failure of selective attention, and prosodic dominance. Supremacy of congruency means that a sentence that presents the same emotion in both speech channels was rated highest; failure of selective attention means that listeners were unable to selectively attend to one channel when instructed; and prosodic dominance means that prosodic information plays a larger role than semantics in processing emotional speech. Emotional prosody and semantics are separate but not separable channels, and it is difficult to perceive one without the influence of the other. Our findings indicate that the Test for Rating of Emotions in Speech can reveal specific aspects in the processing of emotional speech and may in the future prove useful for understanding emotion-processing deficits in individuals with pathologies.

  5. A real-time phoneme counting algorithm and application for speech rate monitoring.

    PubMed

    Aharonson, Vered; Aharonson, Eran; Raichlin-Levi, Katia; Sotzianu, Aviv; Amir, Ofer; Ovadia-Blechman, Zehava

    2017-03-01

    Adults who stutter can learn to control and improve their speech fluency by modifying their speaking rate. Existing speech therapy technologies can assist this practice by monitoring speaking rate and providing feedback to the patient, but cannot provide an accurate, quantitative measurement of speaking rate. Moreover, most technologies are too complex and costly to be used for home practice. We developed an algorithm and a smartphone application that monitor a patient's speaking rate in real time and provide user-friendly feedback to both patient and therapist. Our speaking rate computation is performed by a phoneme counting algorithm which implements spectral transition measure extraction to estimate phoneme boundaries. The algorithm is implemented in real time in a mobile application that presents its results in a user-friendly interface. The application incorporates two modes: one provides the patient with visual feedback of his/her speech rate for self-practice, and another provides the speech therapist with recordings, speech rate analysis, and tools to manage the patient's practice. The algorithm's phoneme counting accuracy was validated on ten healthy subjects who read a paragraph at slow, normal, and fast paces, and was compared to manual counting by speech experts. Test-retest and intra-counter reliability were assessed. Preliminary results indicate differences of -4% to 11% between automatic and human phoneme counting, with the largest differences for slow speech. The application can thus provide reliable, user-friendly, real-time feedback for speaking rate control practice. Copyright © 2017 Elsevier Inc. All rights reserved.
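
    A spectral-transition-measure boundary estimate can be sketched as follows: frame the signal, compute per-frame log spectra, take the squared local slope of each spectral bin over a short regression window, and pick peaks of the pooled measure. The frame sizes, regression window, and peak threshold below are assumptions, not the application's actual tuning.

        # Sketch: spectral transition measure (STM) peaks as phoneme boundaries.
        import numpy as np
        from scipy.signal import find_peaks

        def phoneme_rate(x, fs, frame=0.025, hop=0.010, w=2):
            n, h = int(frame * fs), int(hop * fs)
            frames = np.lib.stride_tricks.sliding_window_view(x, n)[::h]
            spec = np.log(np.abs(np.fft.rfft(frames * np.hanning(n), axis=1)) + 1e-8)
            # STM: squared slope of a least-squares fit over +/- w frames, per bin.
            k = np.arange(-w, w + 1)
            pad = np.pad(spec, ((w, w), (0, 0)), mode="edge")
            slopes = sum(ki * pad[w + ki: w + ki + len(spec)] for ki in k) / (k @ k)
            stm = (slopes ** 2).mean(axis=1)
            # Each STM peak is taken as one phoneme boundary.
            peaks, _ = find_peaks(stm, height=stm.mean(), distance=3)
            return len(peaks) / (len(x) / fs)   # approximate phonemes per second

        fs = 16000
        x = np.random.default_rng(1).standard_normal(fs * 2)  # stand-in for speech
        print(round(phoneme_rate(x, fs), 1), "boundaries/s on noise (illustrative only)")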

  6. Speech Rate Normalization and Phonemic Boundary Perception in Cochlear-Implant Users

    ERIC Educational Resources Information Center

    Jaekel, Brittany N.; Newman, Rochelle S.; Goupell, Matthew J.

    2017-01-01

    Purpose: Normal-hearing (NH) listeners rate normalize, temporarily remapping phonemic category boundaries to account for a talker's speech rate. It is unknown if adults who use auditory prostheses called cochlear implants (CI) can rate normalize, as CIs transmit degraded speech signals to the auditory nerve. Ineffective adjustment to rate…

  7. The effect of presentation level and stimulation rate on speech perception and modulation detection for cochlear implant users.

    PubMed

    Brochier, Tim; McDermott, Hugh J; McKay, Colette M

    2017-06-01

    In order to improve speech understanding for cochlear implant users, it is important to maximize the transmission of temporal information. The combined effects of stimulation rate and presentation level on temporal information transfer and speech understanding remain unclear. The present study systematically varied presentation level (60, 50, and 40 dBA) and stimulation rate [500 and 2400 pulses per second per electrode (pps)] in order to observe how the effect of rate on speech understanding changes at different presentation levels. Speech recognition in quiet and noise, and acoustic amplitude modulation detection thresholds (AMDTs), were measured with acoustic stimuli presented to speech processors via direct audio input (DAI). With the 500 pps processor, results showed significantly better performance for consonant-vowel-nucleus-consonant words in quiet and a reduced effect of noise on sentence recognition. However, no rate or level effect was found for AMDTs, perhaps partly because of amplitude compression in the sound processor. AMDTs were found to be strongly correlated with the effect of noise on sentence perception at low levels. These results indicate that AMDTs, at least when measured with the CP910 Freedom speech processor via DAI, explain between-subject variance in speech understanding but do not explain within-subject variance across rates and levels.

  8. Speech and language therapy in Sure Start Local Programmes: a survey-based analysis of practice and innovation.

    PubMed

    Fuller, Alison

    2010-01-01

    Sure Start has been a flagship policy of the UK Labour Government since 1998. Its aim was to improve the life chances of children under five years of age living in areas of socio-economic disadvantage by means of multi-agency, multidisciplinary Sure Start Local Programmes (SSLPs). Speech and language therapists have played a key part in many SSLPs and have had the opportunity to extend their roles. Despite the scrutiny paid to Sure Start, there has been no comprehensive analysis of speech and language therapists' contribution to date. Studies have focused on individual programmes or small samples; there has been no attempt to collate the full range of practice. As Sure Start evolved and Children's Centres emerged, it became vital to learn from the Sure Start experience and inform the mainstreaming of practice before the window of opportunity closed. The survey aims were, first, to identify the range of practice amongst speech and language therapists working in SSLPs, highlighting new practice, and, second, to categorize the practices according to the tiered model of UK health and social services of the Royal College of Speech and Language Therapists (RCSLT 2006). An online mixed-method, semi-structured survey was designed to elicit primarily quantitative and categorical data. A total of 501 Sure Start Local Programmes were invited to take part; 128 speech and language therapists responded, giving a response rate of 26%. A descriptive analysis of the response data was undertaken. A total of 103 respondents (80%) reported maintaining a clinical role as well as extending their roles to include preventative services. Of those 103 respondents, 69% were able to see referred children at a younger average age and 80% saw them more quickly than before Sure Start. A wide variety of preventative practice was identified. A widening of access to speech and language therapy was reported in terms of venues used and hours offered. Respondents reported on their use of evaluation or outcome measures, which was higher for new practice than for established practice. A total of 121 respondents (95%) reported at least one example of new practice; 103 (80%) reported at least one use of evaluation or outcome measures. The tiered model of UK health and social services provided an effective way of categorizing practice. A categorized record of Sure Start speech and language therapy practice is presented that may contribute to establishing a broad curriculum of practice for speech and language therapists in the early years. The effectiveness of the practices is not investigated; suggestions are made for further research to develop the evidence base. 2010 Royal College of Speech & Language Therapists.

  9. Loss tolerant speech decoder for telecommunications

    NASA Technical Reports Server (NTRS)

    Prieto, Jr., Jaime L. (Inventor)

    1999-01-01

    A method and device for extrapolating past signal-history data for insertion into missing data segments in order to conceal digital speech frame errors. The extrapolation method uses past-signal history that is stored in a buffer. The method is implemented with a device that utilizes a finite-impulse response (FIR) multi-layer feed-forward artificial neural network that is trained by back-propagation for one-step extrapolation of speech compression algorithm (SCA) parameters. Once a speech connection has been established, the speech compression algorithm device begins sending encoded speech frames. As the speech frames are received, they are decoded and converted back into speech signal voltages. During the normal decoding process, pre-processing of the required SCA parameters will occur and the results stored in the past-history buffer. If a speech frame is detected to be lost or in error, then extrapolation modules are executed and replacement SCA parameters are generated and sent as the parameters required by the SCA. In this way, the information transfer to the SCA is transparent, and the SCA processing continues as usual. The listener will not normally notice that a speech frame has been lost because of the smooth transition between the last-received, lost, and next-received speech frames.
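
    The concealment loop itself is straightforward: maintain a history buffer of decoded SCA parameter vectors and, when a frame is flagged lost, substitute a one-step extrapolation so downstream decoding continues transparently. In the sketch below, a simple linear predictor stands in for the FIR neural network described above; all names are illustrative.

        # Concept sketch of frame-loss concealment by parameter extrapolation.
        import numpy as np
        from collections import deque

        HISTORY = 4  # past frames available to the extrapolator

        def extrapolate(history):
            # Stand-in predictor: linear extrapolation from the last two frames.
            h = list(history)
            return 2 * h[-1] - h[-2] if len(h) >= 2 else h[-1]

        def conceal_stream(frames, lost_flags):
            """frames: iterable of parameter vectors; lost_flags: bool per frame."""
            history = deque(maxlen=HISTORY)
            out = []
            for params, lost in zip(frames, lost_flags):
                if lost and history:
                    params = extrapolate(history)   # replacement SCA parameters
                out.append(params)
                history.append(params)              # decoding proceeds transparently
            return out

        frames = [np.array([i, 10 - i], float) for i in range(6)]
        print(conceal_stream(frames, [False, False, True, False, True, False]))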

  10. Children with a cochlear implant: characteristics and determinants of speech recognition, speech-recognition growth rate, and speech production.

    PubMed

    Wie, Ona Bø; Falkenberg, Eva-Signe; Tvete, Ole; Tomblin, Bruce

    2007-05-01

    The objectives of the study were to describe the characteristics of the first 79 prelingually deaf cochlear implant users in Norway and to investigate to what degree the variation in speech recognition, speech-recognition growth rate, and speech production could be explained by characteristics of the child, the cochlear implant, the family, and the educational setting. Data gathered longitudinally were analysed using descriptive statistics, multiple regression, and growth-curve analysis. The results show that more than 50% of the variation could be explained by these characteristics. Daily user time, non-verbal intelligence, mode of communication, length of CI experience, and educational placement had the largest effects on the outcome. The results also indicate that children educated with a bilingual approach have better speech perception and a faster speech-perception growth rate with increased focus on spoken language.

  11. Attentional Gain Control of Ongoing Cortical Speech Representations in a “Cocktail Party”

    PubMed Central

    Kerlin, Jess R.; Shahin, Antoine J.; Miller, Lee M.

    2010-01-01

    Normal listeners possess the remarkable perceptual ability to select a single speech stream among many competing talkers. However, few studies of selective attention have addressed the unique nature of speech as a temporally extended and complex auditory object. We hypothesized that sustained selective attention to speech in a multi-talker environment would act as gain control on the early auditory cortical representations of speech. Using high-density electroencephalography and a template-matching analysis method, we found selective gain to the continuous speech content of an attended talker, greatest at a frequency of 4–8 Hz, in auditory cortex. In addition, the difference in alpha power (8–12 Hz) at parietal sites across hemispheres indicated the direction of auditory attention to speech, as has been previously found in visual tasks. The strength of this hemispheric alpha lateralization, in turn, predicted an individual’s attentional gain of the cortical speech signal. These results support a model of spatial speech stream segregation, mediated by a supramodal attention mechanism, enabling selection of the attended representation in auditory cortex. PMID:20071526

  12. Prosody and Semantics Are Separate but Not Separable Channels in the Perception of Emotional Speech: Test for Rating of Emotions in Speech

    ERIC Educational Resources Information Center

    Ben-David, Boaz M.; Multani, Namita; Shakuf, Vered; Rudzicz, Frank; van Lieshout, Pascal H. H. M.

    2016-01-01

    Purpose: Our aim is to explore the complex interplay of prosody (tone of speech) and semantics (verbal content) in the perception of discrete emotions in speech. Method: We implement a novel tool, the Test for Rating of Emotions in Speech. Eighty native English speakers were presented with spoken sentences made of different combinations of 5…

  13. Optimal speech level for speech transmission in a noisy environment for young adults and aged persons

    NASA Astrophysics Data System (ADS)

    Sato, Hayato; Ota, Ryo; Morimoto, Masayuki; Sato, Hiroshi

    2005-04-01

    Assessing the sound environment of classrooms for the aged is a very important issue, because classrooms can be used by the aged for lifelong learning, especially in an aging society. Hence, hearing loss due to aging is an important consideration for classrooms. In this study, the optimal speech level in noisy fields for both young adults and aged persons was investigated. Listening difficulty ratings and word intelligibility scores for familiar words were used to evaluate speech transmission performance. The results of the tests demonstrated that the optimal speech level for moderate background noise (i.e., less than around 60 dBA) was fairly constant. Meanwhile, the optimal speech level depended on the speech-to-noise ratio when the background noise level exceeded around 60 dBA. The minimum speech level required to minimize difficulty ratings for the aged was higher than that for the young. However, the minimum difficulty ratings for both the young and the aged were obtained in the speech-level range of 70 to 80 dBA.

  14. Validity and Reliability of Visual Analog Scaling for Assessment of Hypernasality and Audible Nasal Emission in Children With Repaired Cleft Palate.

    PubMed

    Baylis, Adriane; Chapman, Kathy; Whitehill, Tara L; The Americleft Speech Group

    2015-11-01

    To investigate the validity and reliability of multiple listener judgments of hypernasality and audible nasal emission, in children with repaired cleft palate, using visual analog scaling (VAS) and equal-appearing interval (EAI) scaling. Prospective comparative study of multiple listener ratings of hypernasality and audible nasal emission. Multisite institutional. Five trained and experienced speech-language pathologist listeners from the Americleft Speech Project. Average VAS and EAI ratings of hypernasality and audible nasal emission/turbulence for 12 video-recorded speech samples from the Americleft Speech Project. Intrarater and interrater reliability was computed, as well as linear and polynomial models of best fit. Intrarater and interrater reliability was acceptable for both rating methods; however, reliability was higher for VAS as compared to EAI ratings. When VAS ratings were plotted against EAI ratings, results revealed a stronger curvilinear relationship. The results of this study provide additional evidence that alternate rating methods such as VAS may offer improved validity and reliability over EAI ratings of speech. VAS should be considered a viable method for rating hypernasality and nasal emission in speech in children with repaired cleft palate.

  15. Soldier experiments and assessments using SPEAR speech control system for UGVs

    NASA Astrophysics Data System (ADS)

    Brown, Jonathan; Blanco, Chris; Czerniak, Jeffrey; Hoffman, Brian; Hoffman, Orin; Juneja, Amit; Ngia, Lester; Pruthi, Tarun; Liu, Dongqing

    2010-04-01

    This paper reports on a Soldier Experiment performed by the Army Research Lab's Human Research Engineering Directorate (HRED) Field Element located at the Maneuver Center of Excellence, Ft. Benning, and a Limited Use Assessment conducted by the Marine Corps Forces Pacific Command Experimentation Center (MEC) at Camp Pendleton evaluating the effectiveness of using speech commands to control an Unmanned Ground Vehicle. SPEAR, developed by Think-A-Move, Ltd., provides speech control of UGVs. SPEAR detects user speech in the ear canal with an earpiece containing an in-ear microphone. The system design provides up to 30 dB of passive noise reduction, enabling it to work well in high-noise environments, where traditional speech systems, using external microphones, fail; it also utilizes a proprietary speech recognition engine. SPEAR has been integrated with iRobot's PackBot 510 with FasTac Kit, and with Multi-Robot Operator Control Unit (MOCU), developed by SPAWAR Systems Center Pacific. These integrated systems allow speech to supplement the hand-controller for multi-modal control of different UGV functions simultaneously. HRED's experiment measured the impact of SPEAR on reducing the cognitive load placed on UGV Operators and the time to complete specific tasks. Army NCOs and Officer School Candidates participated in this experiment, which found that speech control was faster than manual control to complete tasks requiring menu navigation, as well as reducing the cognitive load on UGV Operators. The MEC assessment examined speech commands used for two different missions: Route Clearance and Cordon and Search; participants included Explosive Ordnance Disposal Technicians and Combat Engineers. The majority of the Marines thought it was easier to complete the mission scenarios with SPEAR than with only using manual controls, and that using SPEAR improved their situational awareness. Overall results of these Assessments are reported in the paper, along with possible applications to autonomous mine detection systems.

  16. Two different phenomena in basic motor speech performance in premanifest Huntington disease.

    PubMed

    Skodda, Sabine; Grönheit, Wenke; Lukas, Carsten; Bellenberg, Barbara; von Hein, Sarah M; Hoffmann, Rainer; Saft, Carsten

    2016-03-09

    Dysarthria is a common feature of Huntington disease (HD). The aim of this cross-sectional pilot study was the description and objective analysis of different speech parameters, with special emphasis on the timing of connected speech and non-speech verbal utterances, in premanifest HD (preHD). A total of 28 preHD mutation carriers and 28 age- and sex-matched healthy speakers performed a reading task and several syllable repetition tasks. Results of computerized acoustic analysis of different variables measuring speech rate and regularity were correlated with clinical measures and MRI-based brain atrophy assessed by voxel-based morphometry. PreHD mutation carriers showed an impaired capacity to steadily repeat single syllables, with higher variability than healthy controls (variance 1: Cohen d = 1.46). Notably, speech rate was increased compared to controls and correlated with the volume of certain brain areas known to be involved in sensory-motor speech networks (net speech rate: Cohen d = 1.19). Furthermore, speech rate correlated with disease burden score, probability of disease onset, estimated years to onset, and clinical measures such as the cognitive score. Measurement of speech rate and regularity might be a helpful additional tool for monitoring subclinical functional disability in preHD. As one possible cause of the higher performance in preHD, we discuss huntingtin-dependent, temporarily advantageous developmental processes of the brain. © 2016 American Academy of Neurology.

  17. Post-treatment speech naturalness of comprehensive stuttering program clients and differences in ratings among listener groups.

    PubMed

    Teshima, Shelli; Langevin, Marilyn; Hagler, Paul; Kully, Deborah

    2010-03-01

    The purposes of this study were to investigate naturalness of the post-treatment speech of Comprehensive Stuttering Program (CSP) clients and differences in naturalness ratings by three listener groups. Listeners were 21 student speech-language pathologists, 9 community members, and 15 listeners who stutter. Listeners rated perceptually fluent speech samples of CSP clients obtained immediately post-treatment (Post) and at 5 years follow-up (F5), and speech samples of matched typically fluent (TF) speakers. A 9-point interval rating scale was used. A 3 (listener group) × 2 (time) × 2 (speaker) mixed ANOVA was used to test for differences among mean ratings. The difference between CSP Post and F5 mean ratings was statistically significant. The F5 mean rating was within the range reported for typically fluent speakers. Student speech-language pathologists were found to be less critical than community members and listeners who stutter in rating naturalness; however, there were no significant differences in ratings made by community members and listeners who stutter. Results indicate that the naturalness of post-treatment speech of CSP clients improves in the post-treatment period and that it is possible for clients to achieve levels of naturalness that appear to be acceptable to adults who stutter and that are within the range of naturalness ratings given to typically fluent speakers. Readers will be able to (a) summarize key findings of studies that have investigated naturalness ratings, and (b) interpret the naturalness ratings of Comprehensive Stuttering Program speaker samples and the ratings made by the three listener groups in this study.

  18. Effects of interior aircraft noise on speech intelligibility and annoyance

    NASA Technical Reports Server (NTRS)

    Pearsons, K. S.; Bennett, R. L.

    1977-01-01

    Recordings of the aircraft ambiance from ten different types of aircraft were used in conjunction with four distinct speech interference tests as stimuli to determine the effects of interior aircraft background levels and speech intelligibility on perceived annoyance in 36 subjects. Both speech intelligibility and background level significantly affected judged annoyance. However, the interaction between the two variables showed that above an 85 dB background level the speech intelligibility results had a minimal effect on annoyance ratings. Below this level, people rated the background as less annoying if there was adequate speech intelligibility.

  19. [A modified speech enhancement algorithm for electronic cochlear implant and its digital signal processing realization].

    PubMed

    Wang, Yulin; Tian, Xuelong

    2014-08-01

    In order to improve the speech quality and auditory perception of electronic cochlear implants against strong background noise, a speech enhancement system for the electronic cochlear implant front-end was constructed. With digital signal processing (DSP) at its core, the system combines the DSP's multi-channel buffered serial port (McBSP) data transmission channel with the extended audio interface chip TLV320AIC10, realizing high-speed speech signal acquisition and output. Meanwhile, because traditional speech enhancement methods suffer from poor adaptability, slow convergence, and large steady-state error, a versiera function and the de-correlation principle were used to improve the existing adaptive filtering algorithm, which effectively enhanced the quality of voice communication. Test results verified the stability of the system and the de-noising performance of the algorithm, and showed that it could provide clearer speech signals for deaf or tinnitus patients.
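
    One plausible reading of the modification (an assumption, since the abstract gives no equations) is a variable-step-size LMS filter whose step size follows a versiera (witch of Agnesi) shaped curve of the error: near convergence the step shrinks toward zero, reducing steady-state error, while large errors restore a large step for fast convergence. A hedged numpy sketch with illustrative constants:

        # Variable-step-size LMS noise canceller with a versiera-shaped step.
        import numpy as np

        def versiera_vss_lms(d, x, taps=16, mu_max=0.05, a=0.5):
            """d = desired signal (noisy speech), x = noise reference."""
            w = np.zeros(taps)
            out = np.zeros(len(d))
            for n in range(taps, len(d)):
                xn = x[n - taps:n][::-1]
                e = d[n] - w @ xn                      # error = enhanced sample
                # Versiera-shaped step: mu -> 0 as e -> 0, mu -> mu_max for large e.
                mu = mu_max * e * e / (e * e + 4 * a * a)
                w += mu * e * xn / (xn @ xn + 1e-8)    # normalized update
                out[n] = e
            return out

        rng = np.random.default_rng(0)
        noise = rng.standard_normal(8000)
        speech = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000)
        d = speech + np.convolve(noise, [0.5, 0.3, 0.1], mode="same")
        enhanced = versiera_vss_lms(d, noise)
        print(round(np.corrcoef(enhanced[2000:], speech[2000:])[0, 1], 2))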

  20. Affective Properties of Mothers' Speech to Infants With Hearing Impairment and Cochlear Implants

    PubMed Central

    Bergeson, Tonya R.; Xu, Huiping; Kitamura, Christine

    2015-01-01

    Purpose The affective properties of infant-directed speech influence the attention of infants with normal hearing to speech sounds. This study explored the affective quality of maternal speech to infants with hearing impairment (HI) during the 1st year after cochlear implantation as compared to speech to infants with normal hearing. Method Mothers of infants with HI and mothers of infants with normal hearing matched by age (NH-AM) or hearing experience (NH-EM) were recorded playing with their infants during 3 sessions over a 12-month period. Speech samples of 25 s were low-pass filtered, leaving intonation but not speech information intact. Sixty adults rated the stimuli along 5 scales: positive/negative affect and intention to express affection, to encourage attention, to comfort/soothe, and to direct behavior. Results Low-pass filtered speech to the HI and NH-EM groups was rated as more positive, affective, and comforting compared with such speech to the NH-AM group. Speech to infants with HI and with NH-AM was rated as more directive than speech to the NH-EM group. Mothers decreased affective qualities in speech to all infants but increased directive qualities in speech to infants with NH-EM over time. Conclusions Mothers fine-tune communicative intent in speech to their infant's developmental stage. They adjust affective qualities to infants' hearing experience rather than to chronological age but adjust directive qualities of speech to the chronological age of their infants. PMID:25679195

  1. Association of Orofacial Muscle Activity and Movement during Changes in Speech Rate and Intensity

    ERIC Educational Resources Information Center

    McClean, Michael D.; Tasko, Stephen M.

    2003-01-01

    Understanding how orofacial muscle activity and movement covary across changes in speech rate and intensity has implications for the neural control of speech production and the use of clinical procedures that manipulate speech prosody. The present study involved a correlation analysis relating average lower-lip and jaw-muscle activity to lip and…

  2. Of Mouths and Men: Non-Native Listeners' Identification and Evaluation of Varieties of English.

    ERIC Educational Resources Information Center

    Jarvella, Robert J.; Bang, Eva; Jakobsen, Arnt Lykke; Mees, Inger M.

    2001-01-01

    Advanced Danish students of English tried to identify the national origin of young men from Ireland, Scotland, England, and the United States from their speech and then rated the speech for attractiveness. Listeners rated speech produced by Englishmen as most attractive, and speech by Americans as least attractive. (Author/VWL)

  3. Utterances in infant-directed speech are shorter, not slower.

    PubMed

    Martin, Andrew; Igarashi, Yosuke; Jincho, Nobuyuki; Mazuka, Reiko

    2016-11-01

    It has become a truism in the literature on infant-directed speech (IDS) that IDS is pronounced more slowly than adult-directed speech (ADS). Using recordings of 22 Japanese mothers speaking to their infant and to an adult, we show that although IDS has an overall lower mean speech rate than ADS, this is not the result of an across-the-board slowing in which every vowel is expanded equally. Instead, the speech rate difference is entirely due to the effects of phrase-final lengthening, which disproportionally affects IDS because of its shorter utterances. These results demonstrate that taking utterance-internal prosodic characteristics into account is crucial to studies of speech rate. Copyright © 2016 Elsevier B.V. All rights reserved.

  4. Effect of delayed auditory feedback on normal speakers at two speech rates

    NASA Astrophysics Data System (ADS)

    Stuart, Andrew; Kalinowski, Joseph; Rastatter, Michael P.; Lynch, Kerry

    2002-05-01

    This study investigated the effect of short and long auditory feedback delays at two speech rates with normal speakers. Seventeen participants spoke under delayed auditory feedback (DAF) at 0, 25, 50, and 200 ms at normal and fast rates of speech. Participants displayed two to three times more dysfluencies at 200 ms (p<0.05) than at no delay or the shorter delays. There were also significantly more dysfluencies at the fast rate of speech (p=0.028). These findings implicate the peripheral feedback system(s) of fluent speakers in the disruptive effects of DAF on normal speech production at long auditory feedback delays. Considering the contrast in fluency/dysfluency exhibited between normal speakers and those who stutter at short and long delays, it appears that speech disruption of normal speakers under DAF is a poor analog of stuttering.

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    None

    Discussion Session - Accelerator System Design (Part II). Tutors: C. Darve, J. Weisend II, Ph. Lebrun, A. Dabrowski, U. Raich. Video conference with the CERN Control Center: experts in the field of accelerator science will be available to answer the students' questions. This session will link the CCC and SA (using Codec VC).

  6. Block-based scalable wavelet image codec

    NASA Astrophysics Data System (ADS)

    Bao, Yiliang; Kuo, C.-C. Jay

    1999-10-01

    This paper presents a high-performance block-based wavelet image coder designed to be of very low implementational complexity yet rich in features. In this image coder, the Dual-Sliding Wavelet Transform (DSWT) is first applied to image data to generate wavelet coefficients in fixed-size blocks. Here, a block consists only of wavelet coefficients from a single subband. The coefficient blocks are directly coded with the Low Complexity Binary Description (LCBiD) coefficient coding algorithm. Each block is encoded using binary context-based bitplane coding. No parent-child correlation is exploited in the coding process, and no intermediate buffering is needed between DSWT and LCBiD. The compressed bit stream generated by the proposed coder is both SNR and resolution scalable, as well as highly resilient to transmission errors. Both DSWT and LCBiD process the data in blocks whose size is independent of the size of the original image, which gives more flexibility in the implementation. The codec has very good coding performance even when the block size is 16 × 16.
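
    Bitplane coding is what makes such a bit stream SNR-scalable: magnitude bits are sent most-significant plane first, so truncating the stream still yields a valid, coarser reconstruction. The toy coder below illustrates the idea on one fixed-size block; the context modeling and entropy coding of the actual LCBiD algorithm are omitted.

        # Toy bitplane coder for one block of integer wavelet coefficients.
        import numpy as np

        def encode_bitplanes(block):
            mag, planes = np.abs(block), []
            n = max(int(np.max(mag)).bit_length(), 1)
            for p in range(n - 1, -1, -1):            # most significant plane first
                planes.append(((mag >> p) & 1, p))
            return np.sign(block), planes

        def decode_bitplanes(sign, planes, keep):
            mag = np.zeros(sign.shape, dtype=int)
            for bits, p in planes[:keep]:             # truncation = lower SNR
                mag |= bits << p
            return sign * mag

        rng = np.random.default_rng(0)
        block = rng.integers(-100, 100, (16, 16))     # stand-in subband block
        sign, planes = encode_bitplanes(block)
        for keep in (2, 4, len(planes)):
            err = np.abs(block - decode_bitplanes(sign, planes, keep)).max()
            print(f"{keep} planes kept -> max abs error {err}")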

  7. Slowed Speech Input has a Differential Impact on On-line and Off-line Processing in Children’s Comprehension of Pronouns

    PubMed Central

    Walenski, Matthew; Swinney, David

    2009-01-01

    The central question underlying this study revolves around how children process co-reference relationships—such as those evidenced by pronouns (him) and reflexives (himself)—and how a slowed rate of speech input may critically affect this process. Previous studies of child language processing have demonstrated that typical language developing (TLD) children as young as 4 years of age process co-reference relations in a manner similar to adults on-line. In contrast, off-line measures of pronoun comprehension suggest a developmental delay for pronouns (relative to reflexives). The present study examines dependency relations in TLD children (ages 5–13) and investigates how a slowed rate of speech input affects the unconscious (on-line) and conscious (off-line) parsing of these constructions. For the on-line investigations (using a cross-modal picture priming paradigm), results indicate that at a normal rate of speech TLD children demonstrate adult-like syntactic reflexes. At a slowed rate of speech the typical language developing children displayed a breakdown in automatic syntactic parsing (again, similar to the pattern seen in unimpaired adults). As demonstrated in the literature, our off-line investigations (sentence/picture matching task) revealed that these children performed much better on reflexives than on pronouns at a regular speech rate. However, at the slow speech rate, performance on pronouns was substantially improved, whereas performance on reflexives was not different than at the regular speech rate. We interpret these results in light of a distinction between fast automatic processes (relied upon for on-line processing in real time) and conscious reflective processes (relied upon for off-line processing), such that slowed speech input disrupts the former, yet improves the latter. PMID:19343495

  8. The development and validation of the speech quality instrument.

    PubMed

    Chen, Stephanie Y; Griffin, Brianna M; Mancuso, Dean; Shiau, Stephanie; DiMattia, Michelle; Cellum, Ilana; Harvey Boyd, Kelly; Prevoteau, Charlotte; Kohlberg, Gavriel D; Spitzer, Jaclyn B; Lalwani, Anil K

    2017-12-08

    Although speech perception tests are available to evaluate hearing, there is no standardized validated tool to quantify speech quality. The objective of this study is to develop a validated tool to measure quality of speech heard. Prospective instrument validation study of 35 normal hearing adults recruited at a tertiary referral center. Participants listened to 44 speech clips of male/female voices reciting the Rainbow Passage. Speech clips included original and manipulated excerpts capturing goal qualities such as mechanical and garbled. Listeners rated clips on a 10-point visual analog scale (VAS) of 18 characteristics (e.g. cartoonish, garbled). Skewed distribution analysis identified mean ratings in the upper and lower 2-point limits of the VAS (ratings of 8-10, 0-2, respectively); items with inconsistent responses were eliminated. The test was pruned to a final instrument of nine speech clips that clearly define qualities of interest: speech-like, male/female, cartoonish, echo-y, garbled, tinny, mechanical, rough, breathy, soothing, hoarse, like, pleasant, natural. Mean ratings were highest for original female clips (8.8) and lowest for not-speech manipulation (2.1). Factor analysis identified two subsets of characteristics: internal consistency demonstrated Cronbach's alpha of 0.95 and 0.82 per subset. Test-retest reliability of total scores was high, with an intraclass correlation coefficient of 0.76. The Speech Quality Instrument (SQI) is a concise, valid tool for assessing speech quality as an indicator for hearing performance. SQI may be a valuable outcome measure for cochlear implant recipients who, despite achieving excellent speech perception, often experience poor speech quality. 2b. Laryngoscope, 2017. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.
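
    For reference, Cronbach's alpha, the internal-consistency statistic reported above, can be computed directly from a listeners-by-items rating matrix; the sketch below uses synthetic data with shapes loosely modeled on the study (35 listeners, several rated characteristics), which is an assumption, not the study's data.

        # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of
        # total scores). Values near 1 indicate items measure a common construct.
        import numpy as np

        def cronbach_alpha(ratings):
            """ratings: (n_listeners, n_items) matrix of VAS scores."""
            k = ratings.shape[1]
            item_var = ratings.var(axis=0, ddof=1).sum()
            total_var = ratings.sum(axis=1).var(ddof=1)
            return k / (k - 1) * (1 - item_var / total_var)

        rng = np.random.default_rng(0)
        latent = rng.uniform(0, 10, 35)                      # per-listener latent score
        items = latent[:, None] + rng.normal(0, 1, (35, 9))  # 9 correlated items
        print(round(cronbach_alpha(items), 2))               # close to 1 here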

  9. Neurogenic Orofacial Weakness and Speech in Adults With Dysarthria

    PubMed Central

    Makashay, Matthew J.; Helou, Leah B.; Clark, Heather M.

    2017-01-01

    Purpose This study compared orofacial strength between adults with dysarthria and neurologically normal (NN) matched controls. In addition, orofacial muscle weakness was examined for potential relationships to speech impairments in adults with dysarthria. Method Matched groups of 55 adults with dysarthria and 55 NN adults generated maximum pressure (Pmax) against an air-filled bulb during lingual elevation, protrusion and lateralization, and buccodental and labial compressions. These orofacial strength measures were compared with speech intelligibility, perceptual ratings of speech, articulation rate, and fast syllable-repetition rate. Results The dysarthria group demonstrated significantly lower orofacial strength than the NN group on all tasks. Lingual strength correlated moderately and buccal strength correlated weakly with most ratings of speech deficits. Speech intelligibility was not sensitive to dysarthria severity. Individuals with severely reduced anterior lingual elevation Pmax (< 18 kPa) had normal to profoundly impaired sentence intelligibility (99%–6%) and moderately to severely impaired speech (26%–94% articulatory imprecision; 33%–94% overall severity). Conclusions Results support the presence of orofacial muscle weakness in adults with dysarthrias of varying etiologies but reinforce tenuous links between orofacial strength and speech production disorders. By examining individual data, preliminary evidence emerges to suggest that speech, but not necessarily intelligibility, is likely to be impaired when lingual weakness is severe. PMID:28763804

  10. The influence of speech rate and accent on access and use of semantic information.

    PubMed

    Sajin, Stanislav M; Connine, Cynthia M

    2017-04-01

    Circumstances in which the speech input is presented in sub-optimal conditions generally lead to processing costs affecting spoken word recognition. The current study indicates that some processing demands imposed by listening to difficult speech can be mitigated by feedback from semantic knowledge. A set of lexical decision experiments examined how foreign accented speech and word duration impact access to semantic knowledge in spoken word recognition. Results indicate that when listeners process accented speech, the reliance on semantic information increases. Speech rate was not observed to influence semantic access, except in the setting in which unusually slow accented speech was presented. These findings support interactive activation models of spoken word recognition in which attention is modulated based on speech demands.

  11. Spatiotemporal movement variability in ALS: Speaking rate effects on tongue, lower lip, and jaw motor control

    PubMed Central

    Kuruvilla-Dugdale, Mili; Mefferd, Antje

    2017-01-01

    Purpose Although it is frequently presumed that bulbar muscle degeneration in Amyotrophic Lateral Sclerosis (ALS) is associated with progressive loss of speech motor control, empirical evidence is limited. Furthermore, because speaking rate slows with disease progression and rate manipulations are used to improve intelligibility in ALS, this study sought to (i) determine between and within-group differences in articulatory motor control as a result of speaking rate changes and (ii) identify the strength of association between articulatory motor control and speech impairment severity. Method Ten talkers with ALS and 11 healthy controls repeated the target sentence at habitual, fast, and slow rates. The spatiotemporal variability index (STI) was calculated to determine tongue, lower lip, and jaw movement variability. Results During habitual speech, talkers with mild-moderate dysarthria displayed significantly lower tongue and lip movement variability whereas those with severe dysarthria showed greater variability compared to controls. Within-group rate effects were significant only for talkers with ALS. Specifically, lip and tongue movement variability significantly increased during slow speech relative to habitual and fast speech. Finally, preliminary associations between speech impairment severity and movement variability were moderate to strong in talkers with ALS. Conclusion Between-group differences for habitual speech and within-group effects for slow speech replicated previous findings for lower lip and jaw movements. Preliminary findings of moderate to strong associations between speech impairment severity and STI suggest that articulatory variability may vary from pathologically low (possibly indicating articulatory compensation) to pathologically high variability (possibly indicating loss of control) with dysarthria progression in ALS. PMID:28528293
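
    To make the STI concrete, here is a minimal, hedged sketch of the common formulation of the spatiotemporal variability index (amplitude- and time-normalize each repeated movement record, then sum the across-trial standard deviations on a fixed relative-time grid). The `sti` helper, the synthetic trials, and the 50-point (2%) grid are illustrative assumptions, not the study's exact procedure.

```python
import numpy as np

def sti(trials, n_points=50):
    """Spatiotemporal variability index over repeated movement records."""
    normalized = []
    for x in trials:
        x = np.asarray(x, dtype=float)
        # Amplitude-normalize: zero mean, unit standard deviation.
        x = (x - x.mean()) / x.std()
        # Time-normalize: resample onto a fixed relative-time grid (2% steps).
        t_old = np.linspace(0.0, 1.0, len(x))
        t_new = np.linspace(0.0, 1.0, n_points)
        normalized.append(np.interp(t_new, t_old, x))
    # STI = sum of across-trial standard deviations at each relative time point.
    return np.stack(normalized).std(axis=0).sum()

# Toy usage: three repetitions of "the same" movement with slight noise;
# higher STI means more trial-to-trial movement variability.
rng = np.random.default_rng(0)
trials = [np.sin(np.linspace(0, 2 * np.pi, n)) + 0.05 * rng.standard_normal(n)
          for n in (480, 500, 520)]
print(f"STI = {sti(trials):.2f}")
```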

  12. SUBTHALAMIC NUCLEUS NEURONS DIFFERENTIALLY ENCODE EARLY AND LATE ASPECTS OF SPEECH PRODUCTION.

    PubMed

    Lipski, W J; Alhourani, A; Pirnia, T; Jones, P W; Dastolfo-Hromack, C; Helou, L B; Crammond, D J; Shaiman, S; Dickey, M W; Holt, L L; Turner, R S; Fiez, J A; Richardson, R M

    2018-05-22

    Basal ganglia-thalamocortical loops mediate all motor behavior, yet little detail is known about the role of basal ganglia nuclei in speech production. Using intracranial recording during deep brain stimulation surgery in humans with Parkinson's disease, we tested the hypothesis that the firing rate of subthalamic nucleus neurons is modulated in sync with motor execution aspects of speech. Nearly half of seventy-nine unit recordings exhibited firing rate modulation during a syllable reading task across twelve subjects (male and female). Trial-to-trial timing of changes in subthalamic neuronal activity, relative to cue onset versus production onset, revealed that locking to cue presentation was associated more with units that decreased firing rate, while locking to speech onset was associated more with units that increased firing rate. These unique data indicate that subthalamic activity is dynamic during the production of speech, reflecting temporally-dependent inhibition and excitation of separate populations of subthalamic neurons. SIGNIFICANCE STATEMENT The basal ganglia are widely assumed to participate in speech production, yet no prior studies have reported detailed examination of speech-related activity in basal ganglia nuclei. Using microelectrode recordings from the subthalamic nucleus during a single syllable reading task, in awake humans undergoing deep brain stimulation implantation surgery, we show that the firing rate of subthalamic nucleus neurons is modulated in response to motor execution aspects of speech. These results are the first to establish a role for subthalamic nucleus neurons in encoding of aspects of speech production, and they lay the groundwork for launching a modern subfield to explore basal ganglia function in human speech. Copyright © 2018 the authors.

  13. Flexible digital modulation and coding synthesis for satellite communications

    NASA Technical Reports Server (NTRS)

    Vanderaar, Mark; Budinger, James; Hoerig, Craig; Tague, John

    1991-01-01

    An architecture and a hardware prototype of a flexible trellis modem/codec (FTMC) transmitter are presented. The theory of operation is built upon a pragmatic approach to trellis-coded modulation that emphasizes power and spectral efficiency. The system incorporates programmable modulation formats, variations of trellis-coding, digital baseband pulse-shaping, and digital channel precompensation. The modulation formats examined include (uncoded and coded) binary phase shift keying (BPSK), quaternary phase shift keying (QPSK), octal phase shift keying (8PSK), 16-ary quadrature amplitude modulation (16-QAM), and quadrature quadrature phase shift keying (Q squared PSK) at programmable rates up to 20 megabits per second (Mbps). The FTMC is part of the developing test bed to quantify modulation and coding concepts.
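
    As a rough illustration of the "programmable modulation formats" idea, the sketch below implements a generic Gray-mapped M-ary PSK symbol mapper covering the uncoded BPSK/QPSK/8PSK cases (16-QAM and Q squared PSK would need their own constellations). It is a hypothetical stand-in, not the FTMC's actual implementation.

```python
import numpy as np

def mpsk_map(bits, m):
    """Map a bit stream onto unit-circle M-PSK symbols with Gray labeling."""
    k = int(np.log2(m))                            # bits per symbol
    groups = np.asarray(bits).reshape(-1, k)
    d = groups.dot(1 << np.arange(k - 1, -1, -1))  # data value per bit group
    # Gray labeling: send data d on phase index p = gray^-1(d), so that
    # neighbouring constellation points differ in exactly one data bit.
    p, shift = d.copy(), 1
    while shift < k:
        p ^= p >> shift
        shift <<= 1
    return np.exp(2j * np.pi * p / m)              # complex baseband symbols

print(mpsk_map([0, 1, 1, 0, 0, 0], m=4))           # three QPSK symbols
```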

  14. A randomized controlled trial of the effects of multi-sensory stimulation (MSS) for people with dementia.

    PubMed

    Baker, R; Bell, S; Baker, E; Gibson, S; Holloway, J; Pearce, R; Dowling, Z; Thomas, P; Assey, J; Wareing, L A

    2001-03-01

    To investigate short-term effects of Multi-Sensory Stimulation (MSS) on behaviour, mood and cognition of older adults with dementia, the generalization of effects to day hospital and home environments and the endurance of any effects over time. A randomized controlled trial comparing MSS with a credible control of one-to-one activities. Fifty patients with diagnoses of moderate to severe dementia were randomized to either MSS or Activity groups. Patients participated in eight 30-minute sessions over a 4-week period. Ratings of behaviour and mood were taken before, during and after sessions to investigate immediate effects. Pre, mid, post-trial, and follow-up assessments were taken to investigate any generalization of effects on cognition, behaviour at the day hospital and behaviour and mood at home and endurance of effects once sessions had ceased. Immediately after MSS and Activity sessions patients talked more spontaneously, related better to others, did more from their own initiative, were less bored/inactive, and were more happy, active or alert. Both groups were more attentive to their environment than before, with a significantly greater improvement from the MSS group. At the day hospital, patients in the Activity group improved on their 'speech skills' (amount of speech; initiation of speech), whereas the MSS group remained unchanged during the trial. The MSS group showed a significant improvement in mood and behaviour at home compared to the Activity group whose behaviour deteriorated. No longer-term benefits were shown; indeed, behaviour declined sharply during the month follow-up period. Both MSS and Activity sessions appear to be effective and appropriate therapies for people with dementia.

  15. Cognitive control components and speech symptoms in people with schizophrenia.

    PubMed

    Becker, Theresa M; Cicero, David C; Cowan, Nelson; Kerns, John G

    2012-03-30

    Previous schizophrenia research suggests poor cognitive control is associated with schizophrenia speech symptoms. However, cognitive control is a broad construct. Two important cognitive control components are poor goal maintenance and poor verbal working memory storage. In the current research, people with schizophrenia (n=45) performed three cognitive tasks that varied in their goal maintenance and verbal working memory storage demands. Speech symptoms were assessed using clinical rating scales, ratings of disorganized speech from typed transcripts, and self-reported disorganization. Overall, alogia was associated with both goal maintenance and verbal working memory tasks. Objectively rated disorganized speech was associated with poor goal maintenance and with a task that included both goal maintenance and verbal working memory storage demands. In contrast, self-reported disorganization was unrelated to either amount of objectively rated disorganized speech or to cognitive control task performance, instead being associated with negative mood symptoms. Overall, our results suggest that alogia is associated with both poor goal maintenance and poor verbal working memory storage and that disorganized speech is associated with poor goal maintenance. In addition, patients' own assessment of their disorganization is related to negative mood, but perhaps not to objective disorganized speech or to cognitive control task performance. Published by Elsevier Ireland Ltd.

  16. Auditory perceptual simulation: Simulating speech rates or accents?

    PubMed

    Zhou, Peiyun; Christianson, Kiel

    2016-07-01

    When readers engage in Auditory Perceptual Simulation (APS) during silent reading, they mentally simulate characteristics of voices attributed to a particular speaker or a character depicted in the text. Previous research found that auditory perceptual simulation of a faster native English speaker during silent reading led to shorter reading times than auditory perceptual simulation of a slower non-native English speaker. Yet, it was uncertain whether this difference was triggered by the different speech rates of the speakers, or by the difficulty of simulating an unfamiliar accent. The current study investigates this question by comparing faster Indian-English speech and slower American-English speech in the auditory perceptual simulation paradigm. Analyses of reading times of individual words and the full sentence reveal that the auditory perceptual simulation effect again modulated reading rate, and auditory perceptual simulation of the faster Indian-English speech led to faster reading rates compared to auditory perceptual simulation of the slower American-English speech. The comparison between this experiment and the data from Zhou and Christianson (2016) demonstrates further that the "speakers'" speech rates, rather than the difficulty of simulating a non-native accent, are the primary mechanism underlying auditory perceptual simulation effects. Copyright © 2016 Elsevier B.V. All rights reserved.

  17. Spatial hearing benefits demonstrated with presentation of acoustic temporal fine structure cues in bilateral cochlear implant listeners.

    PubMed

    Churchill, Tyler H; Kan, Alan; Goupell, Matthew J; Litovsky, Ruth Y

    2014-09-01

    Most contemporary cochlear implant (CI) processing strategies discard acoustic temporal fine structure (TFS) information, and this may contribute to the observed deficits in bilateral CI listeners' ability to localize sounds when compared to normal hearing listeners. Additionally, for best speech envelope representation, most contemporary speech processing strategies use high-rate carriers (≥900 Hz) that exceed the limit for interaural pulse timing to provide useful binaural information. Many bilateral CI listeners are sensitive to interaural time differences (ITDs) in low-rate (<300 Hz) constant-amplitude pulse trains. This study explored the trade-off between superior speech temporal envelope representation with high-rate carriers and binaural pulse timing sensitivity with low-rate carriers. The effects of carrier pulse rate and pulse timing on ITD discrimination, ITD lateralization, and speech recognition in quiet were examined in eight bilateral CI listeners. Stimuli consisted of speech tokens processed at different electrical stimulation rates, and pulse timings that either preserved or did not preserve acoustic TFS cues. Results showed that CI listeners were able to use low-rate pulse timing cues derived from acoustic TFS when presented redundantly on multiple electrodes for ITD discrimination and lateralization of speech stimuli.

  18. Decoding the attended speech stream with multi-channel EEG: implications for online, daily-life applications

    NASA Astrophysics Data System (ADS)

    Mirkovic, Bojana; Debener, Stefan; Jaeger, Manuela; De Vos, Maarten

    2015-08-01

    Objective. Recent studies have provided evidence that temporal envelope driven speech decoding from high-density electroencephalography (EEG) and magnetoencephalography recordings can identify the attended speech stream in a multi-speaker scenario. The present work replicated the previous high density EEG study and investigated the necessary technical requirements for practical attended speech decoding with EEG. Approach. Twelve normal hearing participants attended to one out of two simultaneously presented audiobook stories, while high density EEG was recorded. An offline iterative procedure eliminating those channels contributing the least to decoding provided insight into the necessary channel number and optimal cross-subject channel configuration. Aiming towards the future goal of near real-time classification with an individually trained decoder, the minimum duration of training data necessary for successful classification was determined by using a chronological cross-validation approach. Main results. Close replication of the previously reported results confirmed the method robustness. Decoder performance remained stable from 96 channels down to 25. Furthermore, for less than 15 min of training data, the subject-independent (pre-trained) decoder performed better than an individually trained decoder did. Significance. Our study complements previous research and provides information suggesting that efficient low-density EEG online decoding is within reach.
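
    The backward channel-elimination idea can be pictured with a toy sketch: train a decoder that reconstructs the attended-speech envelope from the EEG channels, then repeatedly drop the channel whose removal costs the least reconstruction accuracy. The instantaneous (lag-free) ridge decoder and the synthetic data below are simplifying assumptions; the study used time-lagged decoders on real recordings.

```python
import numpy as np

def decode_score(eeg, env, lam=1e2):
    """Fit a ridge decoder on the first half, return Pearson r on the second."""
    n = eeg.shape[1] // 2
    Xtr, Xte = eeg[:, :n].T, eeg[:, n:].T
    w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(Xtr.shape[1]), Xtr.T @ env[:n])
    return np.corrcoef(Xte @ w, env[n:])[0, 1]

def backward_eliminate(eeg, env, keep=2):
    """Greedily drop the channel whose removal leaves the best score."""
    active = list(range(eeg.shape[0]))
    while len(active) > keep:
        scores = [decode_score(eeg[[c for c in active if c != d]], env)
                  for d in active]
        active.pop(int(np.argmax(scores)))
    return active

rng = np.random.default_rng(1)
env = rng.standard_normal(2000)                  # attended-speech envelope
eeg = 0.3 * rng.standard_normal((8, 2000))       # 8 channels of noise...
eeg[:3] += env * rng.uniform(0.5, 1.0, (3, 1))   # ...3 of them informative
print("channels kept:", backward_eliminate(eeg, env, keep=3))
```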

  19. Working Memory and Speech Recognition in Noise Under Ecologically Relevant Listening Conditions: Effects of Visual Cues and Noise Type Among Adults With Hearing Loss.

    PubMed

    Miller, Christi W; Stewart, Erin K; Wu, Yu-Hsiang; Bishop, Christopher; Bentler, Ruth A; Tremblay, Kelly

    2017-08-16

    This study evaluated the relationship between working memory (WM) and speech recognition in noise with different noise types as well as in the presence of visual cues. Seventy-six adults with bilateral, mild to moderately severe sensorineural hearing loss (mean age: 69 years) participated. Using a cross-sectional design, 2 measures of WM were taken: a reading span measure, and Word Auditory Recognition and Recall Measure (Smith, Pichora-Fuller, & Alexander, 2016). Speech recognition was measured with the Multi-Modal Lexical Sentence Test for Adults (Kirk et al., 2012) in steady-state noise and 4-talker babble, with and without visual cues. Testing was under unaided conditions. A linear mixed model revealed visual cues and pure-tone average as the only significant predictors of Multi-Modal Lexical Sentence Test outcomes. Neither WM measure nor noise type showed a significant effect. The contribution of WM in explaining unaided speech recognition in noise was negligible and not influenced by noise type or visual cues. We anticipate that with audibility partially restored by hearing aids, the effects of WM will increase. For clinical practice to be affected, more significant effect sizes are needed.

  20. Impairments of speech fluency in Lewy body spectrum disorder.

    PubMed

    Ash, Sharon; McMillan, Corey; Gross, Rachel G; Cook, Philip; Gunawardena, Delani; Morgan, Brianna; Boller, Ashley; Siderowf, Andrew; Grossman, Murray

    2012-03-01

    Few studies have examined connected speech in demented and non-demented patients with Parkinson's disease (PD). We assessed the speech production of 35 patients with Lewy body spectrum disorder (LBSD), including non-demented PD patients, patients with PD dementia (PDD), and patients with dementia with Lewy bodies (DLB), in a semi-structured narrative speech sample in order to characterize impairments of speech fluency and to determine the factors contributing to reduced speech fluency in these patients. Both demented and non-demented PD patients exhibited reduced speech fluency, characterized by reduced overall speech rate and long pauses between sentences. Reduced speech rate in LBSD correlated with measures of between-utterance pauses, executive functioning, and grammatical comprehension. Regression analyses related non-fluent speech, grammatical difficulty, and executive difficulty to atrophy in frontal brain regions. These findings indicate that multiple factors contribute to slowed speech in LBSD, and this is mediated in part by disease in frontal brain regions. Copyright © 2011 Elsevier Inc. All rights reserved.

  1. Fluency variation in adolescents.

    PubMed

    Furquim de Andrade, Claudia Regina; de Oliveira Martins, Vanessa

    2007-10-01

    The Speech Fluency Profile of fluent adolescent speakers of Brazilian Portuguese was examined with respect to gender and neurolinguistic variations. Speech samples of 130 male and female adolescents, aged between 12;0 and 17;11 years, were gathered. They were analysed according to type of speech disruption, speech rate, and frequency of speech disruptions. Statistical analysis did not find significant differences between genders for the variables studied. However, regarding the phases of adolescence (early: 12;0-14;11 years; late: 15;0-17;11 years), statistical differences were observed for all of the variables. As for neurolinguistic maturation, a decrease in the number of speech disruptions and an increase in speech rate occurred during the final phase of adolescence, indicating that the maturation of the motor and linguistic processes exerted an influence over the fluency profile of speech.

  2. Speech and gait in Parkinson's disease: When rhythm matters.

    PubMed

    Ricciardi, Lucia; Ebreo, Michela; Graziosi, Adriana; Barbuto, Marianna; Sorbera, Chiara; Morgante, Letterio; Morgante, Francesca

    2016-11-01

    Speech disturbances in Parkinson's disease (PD) are heterogeneous, ranging from hypokinetic to hyperkinetic types. Repetitive speech disorder has been demonstrated in more advanced disease stages and has been considered the speech equivalent of freezing of gait (FOG). We aimed to verify a possible relationship between speech and FOG in patients with PD. Forty-three consecutive PD patients and 20 healthy control subjects underwent standardized speech evaluation using the Italian version of the Dysarthria Profile (DP), for its motor component, and subsets of the Battery for the Analysis of the Aphasic Deficit (BADA), for its procedural component. DP is a scale composed of 7 sub-sections assessing different features of speech; the rate/prosody section of DP includes items investigating the presence of repetitive speech disorder. Severity of FOG was evaluated with the new freezing of gait questionnaire (NFGQ). PD patients performed worse on the DP and BADA compared to healthy controls; patients with FOG or with Hoehn-Yahr >2 reported lower scores in the articulation, intelligibility, and rate/prosody sections of DP and in the semantic verbal fluency test. Logistic regression analysis showed that only age and rate/prosody scores were significantly associated with FOG in PD. Multiple regression analysis showed that only the severity of FOG was associated with the rate/prosody score. Our data demonstrate that repetitive speech disorder is related to FOG, is associated with advanced disease stages, and is independent of disease duration. Speech dysfluency represents a disorder of motor speech control, possibly sharing pathophysiological mechanisms with FOG. Copyright © 2016 Elsevier Ltd. All rights reserved.

  3. Motor speech signature of behavioral variant frontotemporal dementia: Refining the phenotype.

    PubMed

    Vogel, Adam P; Poole, Matthew L; Pemberton, Hugh; Caverlé, Marja W J; Boonstra, Frederique M C; Low, Essie; Darby, David; Brodtmann, Amy

    2017-08-22

    To provide a comprehensive description of motor speech function in behavioral variant frontotemporal dementia (bvFTD). Forty-eight individuals (24 bvFTD and 24 age- and sex-matched healthy controls) provided speech samples. These varied in complexity and thus cognitive demand. Their language was assessed using the Progressive Aphasia Language Scale and verbal fluency tasks. Speech was analyzed perceptually to describe the nature of deficits and acoustically to quantify differences between patients with bvFTD and healthy controls. Cortical thickness and subcortical volume derived from MRI scans were correlated with speech outcomes in patients with bvFTD. Speech of affected individuals was significantly different from that of healthy controls. The speech signature of patients with bvFTD is characterized by a reduced rate (75%) and accuracy (65%) on alternating syllable production tasks, and prosodic deficits including reduced speech rate (45%), prolonged intervals (54%), and use of short phrases (41%). Groups differed on acoustic measures derived from the reading, unprepared monologue, and diadochokinetic tasks but not the days of the week or sustained vowel tasks. Variability of silence length was associated with cortical thickness of the inferior frontal gyrus and insula and speech rate with the precentral gyrus. One in 8 patients presented with moderate speech timing deficits with a further two-thirds rated as mild or subclinical. Subtle but measurable deficits in prosody are common in bvFTD and should be considered during disease management. Language function correlated with speech timing measures derived from the unprepared monologue only. © 2017 American Academy of Neurology.

  4. Multi-voxel Patterns Reveal Functionally Differentiated Networks Underlying Auditory Feedback Processing of Speech

    PubMed Central

    Zheng, Zane Z.; Vicente-Grabovetsky, Alejandro; MacDonald, Ewen N.; Munhall, Kevin G.; Cusack, Rhodri; Johnsrude, Ingrid S.

    2013-01-01

    The everyday act of speaking involves the complex processes of speech motor control. An important component of control is monitoring, detection and processing of errors when auditory feedback does not correspond to the intended motor gesture. Here we show, using fMRI and converging operations within a multi-voxel pattern analysis framework, that this sensorimotor process is supported by functionally differentiated brain networks. During scanning, a real-time speech-tracking system was employed to deliver two acoustically different types of distorted auditory feedback or unaltered feedback while human participants were vocalizing monosyllabic words, and to present the same auditory stimuli while participants were passively listening. Whole-brain analysis of neural-pattern similarity revealed three functional networks that were differentially sensitive to distorted auditory feedback during vocalization, compared to during passive listening. One network of regions appears to encode an ‘error signal’ irrespective of acoustic features of the error: this network, including right angular gyrus, right supplementary motor area, and bilateral cerebellum, yielded consistent neural patterns across acoustically different, distorted feedback types, only during articulation (not during passive listening). In contrast, a fronto-temporal network appears sensitive to the speech features of auditory stimuli during passive listening; this preference for speech features was diminished when the same stimuli were presented as auditory concomitants of vocalization. A third network, showing a distinct functional pattern from the other two, appears to capture aspects of both neural response profiles. Taken together, our findings suggest that auditory feedback processing during speech motor control may rely on multiple, interactive, functionally differentiated neural systems. PMID:23467350
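
    The core pattern-similarity logic can be conveyed with a toy sketch: within a region, correlate voxel-wise response patterns across conditions and ask whether two acoustically different distorted-feedback conditions evoke similar patterns during vocalization but not during listening. Everything below (the simulated voxel patterns, weights, and region size) is invented for illustration; the study's whole-brain analysis is far richer.

```python
import numpy as np

rng = np.random.default_rng(4)
n_vox = 200
error_pattern = rng.standard_normal(n_vox)      # shared "error signal" pattern

def pattern(shared, weight):
    """A condition's voxel pattern: weighted shared signal plus noise."""
    return weight * shared + rng.standard_normal(n_vox)

def similarity(p1, p2):
    """Neural pattern similarity as Pearson r across voxels."""
    return np.corrcoef(p1, p2)[0, 1]

# During speaking, both distortion types recruit the shared error pattern;
# during passive listening they do not (weight 0).
speak_a, speak_b = pattern(error_pattern, 1.0), pattern(error_pattern, 1.0)
listen_a, listen_b = pattern(error_pattern, 0.0), pattern(error_pattern, 0.0)

print(f"speaking  r = {similarity(speak_a, speak_b):.2f}")   # high
print(f"listening r = {similarity(listen_a, listen_b):.2f}") # near zero
```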

  5. Speech Recognition and Parent Ratings From Auditory Development Questionnaires in Children Who Are Hard of Hearing.

    PubMed

    McCreery, Ryan W; Walker, Elizabeth A; Spratford, Meredith; Oleson, Jacob; Bentler, Ruth; Holte, Lenore; Roush, Patricia

    2015-01-01

    Progress has been made in recent years in the provision of amplification and early intervention for children who are hard of hearing. However, children who use hearing aids (HAs) may have inconsistent access to their auditory environment due to limitations in speech audibility through their HAs or limited HA use. The effects of variability in children's auditory experience on parent-reported auditory skills questionnaires and on speech recognition in quiet and in noise were examined for a large group of children who were followed as part of the Outcomes of Children with Hearing Loss study. Parent ratings on auditory development questionnaires and children's speech recognition were assessed for 306 children who are hard of hearing. Children ranged in age from 12 months to 9 years. Three questionnaires involving parent ratings of auditory skill development and behavior were used, including the LittlEARS Auditory Questionnaire, Parents Evaluation of Oral/Aural Performance in Children rating scale, and an adaptation of the Speech, Spatial, and Qualities of Hearing scale. Speech recognition in quiet was assessed using the Open- and Closed-Set Test, Early Speech Perception test, Lexical Neighborhood Test, and Phonetically Balanced Kindergarten word lists. Speech recognition in noise was assessed using the Computer-Assisted Speech Perception Assessment. Children who are hard of hearing were compared with peers with normal hearing matched for age, maternal educational level, and nonverbal intelligence. The effects of aided audibility, HA use, and language ability on parent responses to auditory development questionnaires and on children's speech recognition were also examined. Children who are hard of hearing had poorer performance than peers with normal hearing on parent ratings of auditory skills and had poorer speech recognition. Significant individual variability among children who are hard of hearing was observed. Children with greater aided audibility through their HAs, more hours of HA use, and better language abilities generally had higher parent ratings of auditory skills and better speech-recognition abilities in quiet and in noise than peers with less audibility, more limited HA use, or poorer language abilities. In addition to the auditory and language factors that were predictive for speech recognition in quiet, phonological working memory was also a positive predictor for word recognition abilities in noise. Children who are hard of hearing continue to experience delays in auditory skill development and speech-recognition abilities compared with peers with normal hearing. However, significant improvements in these domains have occurred in comparison to similar data reported before the adoption of universal newborn hearing screening and early intervention programs for children who are hard of hearing. Increasing the audibility of speech has a direct positive effect on auditory skill development and speech-recognition abilities and also may enhance these skills by improving language abilities in children who are hard of hearing. Greater number of hours of HA use also had a significant positive impact on parent ratings of auditory skills and children's speech recognition.

  6. A large-scale video codec comparison of x264, x265 and libvpx for practical VOD applications

    NASA Astrophysics Data System (ADS)

    De Cock, Jan; Mavlankar, Aditya; Moorthy, Anush; Aaron, Anne

    2016-09-01

    Over the last years, we have seen exciting improvements in video compression technology, due to the introduction of HEVC and royalty-free coding specifications such as VP9. The potential compression gains of HEVC over H.264/AVC have been demonstrated in different studies, and are usually based on the HM reference software. For VP9, substantial gains over H.264/AVC have been reported in some publications, whereas others reported less optimistic results. Differences in configurations between these publications make it more difficult to assess the true potential of VP9. Practical open-source encoder implementations such as x265 and libvpx (VP9) have matured, and are now showing high compression gains over x264. In this paper, we demonstrate the potential of these encoder implementations, with settings optimized for non-real-time random access, as used in a video-on-demand encoding pipeline. We report results from a large-scale video codec comparison test, which includes x264, x265 and libvpx. A test set consisting of a variety of titles with varying spatio-temporal characteristics from our catalog is used, resulting in tens of millions of encoded frames, hence larger than test sets previously used in the literature. Results are reported in terms of PSNR, SSIM, MS-SSIM, VIF and the recently introduced VMAF quality metric. BD-rate calculations show that using x265 and libvpx vs. x264 can lead to significant bitrate savings for the same quality. x265 outperforms libvpx in most cases, but the performance gap narrows (or even reverses) at the higher resolutions.
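
    The BD-rate figure quoted in such comparisons can be reproduced with a short sketch. Below is a hedged implementation of the classic cubic-fit Bjontegaard-delta-rate computation (production comparisons often use piecewise-cubic interpolation instead); the rate-distortion points are invented for illustration.

```python
import numpy as np

def bd_rate(rate_a, q_a, rate_b, q_b):
    """Average % bitrate change of codec B vs. A at equal quality."""
    pa = np.polyfit(q_a, np.log(rate_a), 3)   # log-rate as a cubic in quality
    pb = np.polyfit(q_b, np.log(rate_b), 3)
    lo = max(min(q_a), min(q_b))              # overlapping quality interval
    hi = min(max(q_a), max(q_b))
    ia = np.polyval(np.polyint(pa), [lo, hi])
    ib = np.polyval(np.polyint(pb), [lo, hi])
    avg_diff = ((ib[1] - ib[0]) - (ia[1] - ia[0])) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100       # negative = bitrate savings

# Invented RD points: (bitrates in kbps, PSNR in dB) for two encoders.
x264 = ([1000, 2000, 4000, 8000], [34.0, 37.0, 40.0, 43.0])
x265 = ([600, 1300, 2700, 5600], [34.2, 37.1, 40.2, 43.1])
print(f"BD-rate x265 vs x264: {bd_rate(*x264, *x265):.1f}%")
```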

  7. Synthesized speech rate and pitch effects on intelligibility of warning messages for pilots

    NASA Technical Reports Server (NTRS)

    Simpson, C. A.; Marchionda-Frost, K.

    1984-01-01

    In civilian and military operations, a future threat-warning system with a voice display could warn pilots of other traffic, obstacles in the flight path, and/or terrain during low-altitude helicopter flights. The present study was conducted to learn whether speech rate and voice pitch of phoneme-synthesized speech affects pilot accuracy and response time to typical threat-warning messages. Helicopter pilots engaged in an attention-demanding flying task and listened for voice threat warnings presented in a background of simulated helicopter cockpit noise. Performance was measured by flying-task performance, threat-warning intelligibility, and response time. Pilot ratings were elicited for the different voice pitches and speech rates. Significant effects were obtained only for response time and for pilot ratings, both as a function of speech rate. For the few cases when pilots forgot to respond to a voice message, they remembered 90 percent of the messages accurately when queried for their response 8 to 10 sec later.

  8. The influence of speaking rate on nasality in the speech of hearing-impaired individuals.

    PubMed

    Dwyer, Claire H; Robb, Michael P; O'Beirne, Greg A; Gilbert, Harvey R

    2009-10-01

    The purpose of this study was to determine whether deliberate increases in speaking rate would serve to decrease the amount of nasality in the speech of severely hearing-impaired individuals. The participants were 11 severely to profoundly hearing-impaired students, ranging in age from 12 to 19 years (M = 16 years). Each participant provided a baseline speech sample (R1) followed by 3 training sessions during which participants were trained to increase their speaking rate. Following the training sessions, a second speech sample was obtained (R2). Acoustic and perceptual analyses of the speech samples obtained at R1 and R2 were undertaken. The acoustic analysis focused on changes in first (F(1)) and second (F(2)) formant frequency and formant bandwidths. The perceptual analysis involved listener ratings of the speech samples (at R1 and R2) for perceived nasality. Findings indicated a significant increase in speaking rate at R2. In addition, significantly narrower F(2) bandwidth and lower perceptual rating scores of nasality were obtained at R2 across all participants, suggesting a decrease in nasality as speaking rate increases. The nasality demonstrated by hearing-impaired individuals is amenable to change when speaking rate is increased. The influences of speaking rate changes on the perception and production of nasality in hearing-impaired individuals are discussed.

  9. Contextual modulation of reading rate for direct versus indirect speech quotations.

    PubMed

    Yao, Bo; Scheepers, Christoph

    2011-12-01

    In human communication, direct speech (e.g., Mary said: "I'm hungry") is perceived to be more vivid than indirect speech (e.g., Mary said [that] she was hungry). However, the processing consequences of this distinction are largely unclear. In two experiments, participants were asked to either orally (Experiment 1) or silently (Experiment 2, eye-tracking) read written stories that contained either a direct speech or an indirect speech quotation. The context preceding those quotations described a situation that implied either a fast-speaking or a slow-speaking quoted protagonist. It was found that this context manipulation affected reading rates (in both oral and silent reading) for direct speech quotations, but not for indirect speech quotations. This suggests that readers are more likely to engage in perceptual simulations of the reported speech act when reading direct speech as opposed to meaning-equivalent indirect speech quotations, as part of a more vivid representation of the former. Copyright © 2011 Elsevier B.V. All rights reserved.

  10. Speech acoustic markers of early stage and prodromal Huntington's disease: a marker of disease onset?

    PubMed

    Vogel, Adam P; Shirbin, Christopher; Churchyard, Andrew J; Stout, Julie C

    2012-12-01

    Speech disturbances (e.g., altered prosody) have been described in symptomatic Huntington's Disease (HD) individuals; however, the extent to which speech changes in gene positive pre-manifest (PreHD) individuals is largely unknown. The speech of individuals carrying the mutant HTT gene is a behavioural/motor/cognitive marker demonstrating some potential as an objective indicator of early HD onset and disease progression. Speech samples were acquired from 30 individuals carrying the mutant HTT gene (13 PreHD, 17 early stage HD) and 15 matched controls. Participants read a passage, produced a monologue and said the days of the week. Data were analysed acoustically for measures of timing, frequency and intensity. There was a clear effect of group across most acoustic measures, so that speech performance differed in line with disease progression. Comparisons across groups revealed significant differences between the control and the early stage HD group on measures of timing (e.g., speech rate). Participants carrying the mutant HTT gene presented with slower rates of speech, took longer to say words and produced greater silences between and within words compared to healthy controls. Importantly, speech rate showed a significant correlation to burden of disease scores. The speech of early stage HD differed significantly from controls. The speech of PreHD, although not reaching significance, tended to lie between the performance of controls and early stage HD. This suggests that changes in speech production appear to be developing prior to diagnosis. Copyright © 2012 Elsevier Ltd. All rights reserved.

  11. The influence of listener experience and academic training on ratings of nasality.

    PubMed

    Lewis, Kerry E; Watterson, Thomas L; Houghton, Sarah M

    2003-01-01

    This study assessed listener agreement levels for nasality ratings, and the strength of relationship between nasality ratings and nasalance scores on the one hand, and listener clinical experience and formal academic training in cleft palate speech on the other. The listeners were 12 adults who represented four levels of clinical experience and academic training in cleft palate speech. Three listeners were teachers with no clinical experience and no academic training (TR), three were graduate students in speech-language pathology (GS) with academic training but no clinical experience, three were craniofacial surgeons (MD) with extensive experience listening to cleft palate speech but with no academic training in speech disorders, and three were certified speech-language pathologists (SLP) with both extensive academic training and clinical experience. The speech samples were audio recordings from 20 persons representing a range of nasality from normal to severely hypernasal. Nasalance scores were obtained simultaneously with the audio recordings. Results revealed that agreement levels for nasality ratings were highest for the SLPs, followed by the MDs. Thus, the more experienced groups tended to be more reliable. Mean nasality ratings obtained for each of the rater groups revealed an inverse relationship with experience. That is, the two groups with clinical experience (SLP and MD) tended to rate nasality lower than the two groups without experience (GS and TR). Correlation coefficients between nasalance scores and nasality judgments were low to moderate for all groups and did not follow a pattern. EDUCATIONAL OUTCOMES: As a result of this activity, the reader will be able to (1) describe the influence of listener experience and academic training in cleft palate speech on perceptual ratings of nasality, (2) describe the influence of experience and training on the nasality/nasalance relationship, and (3) compare the present findings to previous findings reported in the literature.

  12. Predictions of Speech Chimaera Intelligibility Using Auditory Nerve Mean-Rate and Spike-Timing Neural Cues.

    PubMed

    Wirtzfeld, Michael R; Ibrahim, Rasha A; Bruce, Ian C

    2017-10-01

    Perceptual studies of speech intelligibility have shown that slow variations of acoustic envelope (ENV) in a small set of frequency bands provides adequate information for good perceptual performance in quiet, whereas acoustic temporal fine-structure (TFS) cues play a supporting role in background noise. However, the implications for neural coding are prone to misinterpretation because the mean-rate neural representation can contain recovered ENV cues from cochlear filtering of TFS. We investigated ENV recovery and spike-time TFS coding using objective measures of simulated mean-rate and spike-timing neural representations of chimaeric speech, in which either the ENV or the TFS is replaced by another signal. We (a) evaluated the levels of mean-rate and spike-timing neural information for two categories of chimaeric speech, one retaining ENV cues and the other TFS; (b) examined the level of recovered ENV from cochlear filtering of TFS speech; (c) examined and quantified the contribution to recovered ENV from spike-timing cues using a lateral inhibition network (LIN); and (d) constructed linear regression models with objective measures of mean-rate and spike-timing neural cues and subjective phoneme perception scores from normal-hearing listeners. The mean-rate neural cues from the original ENV and recovered ENV partially accounted for perceptual score variability, with additional variability explained by the recovered ENV from the LIN-processed TFS speech. The best model predictions of chimaeric speech intelligibility were found when both the mean-rate and spike-timing neural cues were included, providing further evidence that spike-time coding of TFS cues is important for intelligibility when the speech envelope is degraded.
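
    The chimaeric stimuli themselves are easy to sketch: split two signals into matching frequency bands, take the Hilbert envelope (ENV) of one and the fine structure (TFS) of the other, and recombine. The band edges, filter order, and toy signals below are arbitrary assumptions for illustration, not the study's synthesis parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def chimaera(env_src, tfs_src, fs, edges=(80, 300, 1000, 3000, 7000), order=4):
    """ENV of env_src carried on the TFS of tfs_src, band by band."""
    out = np.zeros(len(env_src))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
        a = sosfiltfilt(sos, env_src)        # band of the ENV donor
        b = sosfiltfilt(sos, tfs_src)        # band of the TFS donor
        env = np.abs(hilbert(a))             # Hilbert envelope of donor A
        tfs = np.cos(np.angle(hilbert(b)))   # unit-amplitude phase of donor B
        out += env * tfs
    return out

fs = 16000
t = np.arange(fs) / fs
speech_like = np.sin(2 * np.pi * 200 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
noise = np.random.default_rng(2).standard_normal(fs)
y = chimaera(speech_like, noise, fs)         # "speech-ENV / noise-TFS" chimaera
```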

  13. Optimal pattern synthesis for speech recognition based on principal component analysis

    NASA Astrophysics Data System (ADS)

    Korsun, O. N.; Poliyev, A. V.

    2018-02-01

    The algorithm for building an optimal pattern for the purpose of automatic speech recognition, which increases the probability of correct recognition, is developed and presented in this work. The optimal pattern forming is based on the decomposition of an initial pattern into principal components, which reduces the dimensionality of the multi-parameter optimization problem. In the next step, training samples are introduced and the optimal estimates for the principal component decomposition coefficients are obtained by a numeric parameter optimization algorithm. Finally, we consider experimental results that show the improvement in speech recognition achieved by the proposed optimization algorithm.
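
    A minimal sketch of that pipeline, under stated assumptions: PCA of the training patterns via SVD, a low-dimensional coefficient vector in place of the full pattern, and a derivative-free optimizer over those few coefficients. The least-squares objective below is a hypothetical stand-in for the authors' recognition-probability criterion.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
train = rng.standard_normal((40, 128))        # 40 training feature vectors
mean = train.mean(axis=0)

# PCA basis from the training set; keep k components (dimension 128 -> 5).
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
basis = vt[:5]

def objective(coeffs):
    """Stand-in criterion: mean squared distance from pattern to samples."""
    pattern = mean + coeffs @ basis           # rebuild pattern from 5 numbers
    return np.mean(np.sum((train - pattern) ** 2, axis=1))

init = (train[0] - mean) @ basis.T            # initial pattern's coefficients
res = minimize(objective, init, method="Nelder-Mead")
optimal_pattern = mean + res.x @ basis
print(f"objective: {objective(init):.1f} -> {res.fun:.1f}")
```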

  14. A laboratory study for assessing speech privacy in a simulated open-plan office.

    PubMed

    Lee, P J; Jeon, J Y

    2014-06-01

    The aim of this study is to assess speech privacy in open-plan offices using two recently introduced single-number quantities: the spatial decay rate of speech, DL2,S [dB], and the A-weighted sound pressure level of speech at a distance of 4 m, Lp,A,S,4m [dB]. Open-plan offices were modeled using a DL2,S of 4, 8, and 12 dB, and Lp,A,S,4m was changed in three steps, from 43 to 57 dB. Auditory experiments were conducted at three locations with source–receiver distances of 8, 16, and 24 m, while background noise level was fixed at 30 dBA. A total of 20 subjects were asked to rate the speech intelligibility and listening difficulty of 240 Korean sentences in such surroundings. The speech intelligibility scores were not affected by DL2,S or Lp,A,S,4m at a source–receiver distance of 8 m; however, listening difficulty ratings were significantly changed with increasing DL2,S and Lp,A,S,4m values. At other locations, the influences of DL2,S and Lp,A,S,4m on speech intelligibility and listening difficulty ratings were significant. It was also found that the speech intelligibility scores and listening difficulty ratings were considerably changed with increasing the distraction distance (rD). Furthermore, listening difficulty is more sensitive to variations in DL2,S and Lp,A,S,4m than intelligibility scores for sound fields with high speech transmission performance. The recently introduced single-number quantities in the ISO standard, based on the spatial distribution of sound pressure level, were associated with speech privacy in an open-plan office. The results support single-number quantities being suitable to assess speech privacy, mainly at large distances. This new information can be considered when designing open-plan offices and making acoustic guidelines for open-plan offices.
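
    Both single-number quantities follow from a regression of measured speech level against source-receiver distance, as in the minimal sketch below (assuming A-weighted speech levels measured along a line of workstations; the values are invented). DL2,S is the level decay per doubling of distance, and Lp,A,S,4m is the regression value at 4 m.

```python
import numpy as np

dist = np.array([2.0, 4.0, 8.0, 16.0])     # source-receiver distances (m)
lpa = np.array([54.0, 49.5, 45.5, 41.0])   # measured speech levels, dB(A)

# Straight-line fit of level vs. log2(distance).
slope, intercept = np.polyfit(np.log2(dist), lpa, 1)
dl2s = -slope                              # dB of decay per distance doubling
lpas4m = slope * np.log2(4.0) + intercept  # regression level at 4 m

print(f"DL2,S = {dl2s:.1f} dB, Lp,A,S,4m = {lpas4m:.1f} dB(A)")
```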

  15. Working memory capacity may influence perceived effort during aided speech recognition in noise.

    PubMed

    Rudner, Mary; Lunner, Thomas; Behrens, Thomas; Thorén, Elisabet Sundewall; Rönnberg, Jerker

    2012-09-01

    Recently there has been interest in using subjective ratings as a measure of perceived effort during speech recognition in noise. Perceived effort may be an indicator of cognitive load. Thus, subjective effort ratings during speech recognition in noise may covary both with signal-to-noise ratio (SNR) and individual cognitive capacity. The present study investigated the relation between subjective ratings of the effort involved in listening to speech in noise, speech recognition performance, and individual working memory (WM) capacity in hearing impaired hearing aid users. In two experiments, participants with hearing loss rated perceived effort during aided speech perception in noise. Noise type and SNR were manipulated in both experiments, and in the second experiment hearing aid compression release settings were also manipulated. Speech recognition performance was measured along with WM capacity. There were 46 participants in all with bilateral mild to moderate sloping hearing loss. In Experiment 1 there were 16 native Danish speakers (eight women and eight men) with a mean age of 63.5 yr (SD = 12.1) and average pure tone (PT) threshold of 47.6 dB (SD = 9.8). In Experiment 2 there were 30 native Swedish speakers (19 women and 11 men) with a mean age of 70 yr (SD = 7.8) and average PT threshold of 45.8 dB (SD = 6.6). A visual analog scale (VAS) was used for effort rating in both experiments. In Experiment 1, effort was rated at individually adapted SNRs while in Experiment 2 it was rated at fixed SNRs. Speech recognition in noise performance was measured using adaptive procedures in both experiments with Dantale II sentences in Experiment 1 and Hagerman sentences in Experiment 2. WM capacity was measured using a letter-monitoring task in Experiment 1 and the reading span task in Experiment 2. In both experiments, there was a strong and significant relation between rated effort and SNR that was independent of individual WM capacity, whereas the relation between rated effort and noise type seemed to be influenced by individual WM capacity. Experiment 2 showed that hearing aid compression setting influenced rated effort. Subjective ratings of the effort involved in speech recognition in noise reflect SNRs, and individual cognitive capacity seems to influence relative rating of noise type. American Academy of Audiology.

  16. Speech Rate as a Sticky Switch: A Multiple Lesion Case Analysis of Mutism and Hyperlalia

    ERIC Educational Resources Information Center

    Braun, Claude M. J.; Dumont, Mathieu; Duval, Julie; Hamel-Hebert, Isabelle

    2004-01-01

    Though it has long been known on the basis of clinical associations and serendipitous observation that speech rate is related to mood and psychomotor baseline, it is less known that speech rate is also related to libido and to immune function. We make the case for a bipolar phenomenon of "psychic tonus," encompassing all these dimensions. The…

  17. Utilizing Multi-Modal Literacies in Middle Grades Science

    ERIC Educational Resources Information Center

    Saurino, Dan; Ogletree, Tamra; Saurino, Penelope

    2010-01-01

    The nature of literacy is changing. Increased student use of computer-mediated, digital, and visual communication expands our understanding of adolescent multi-modal capabilities that reach beyond the traditional conventions of linear speech and written text in the science curriculum. Advancing technology opens doors to learning that involve…

  18. Accent, intelligibility, and comprehensibility in the perception of foreign-accented Lombard speech

    NASA Astrophysics Data System (ADS)

    Li, Chi-Nin

    2003-10-01

    Speech produced in noise (Lombard speech) has been reported to be more intelligible than speech produced in quiet (normal speech). This study examined the perception of non-native Lombard speech in terms of intelligibility, comprehensibility, and degree of foreign accent. Twelve Cantonese speakers and a comparison group of English speakers read simple true and false English statements in quiet and in 70 dB of masking noise. Lombard and normal utterances were mixed with noise at a constant signal-to-noise ratio, and presented along with noise-free stimuli to eight new English listeners who provided transcription scores, comprehensibility ratings, and accent ratings. Analyses showed that, as expected, utterances presented in noise were less well perceived than were noise-free sentences, and that the Cantonese speakers' productions were more accented, but less intelligible and less comprehensible than those of the English speakers. For both groups of speakers, the Lombard sentences were correctly transcribed more often than their normal utterances in noisy conditions. However, the Cantonese-accented Lombard sentences were not rated as easier to understand than was the normal speech in all conditions. The assigned accent ratings were similar throughout all listening conditions. Implications of these findings will be discussed.

  19. The Effect of Uni- and Bilateral Thalamic Deep Brain Stimulation on Speech in Patients With Essential Tremor: Acoustics and Intelligibility.

    PubMed

    Becker, Johannes; Barbe, Michael T; Hartinger, Mariam; Dembek, Till A; Pochmann, Jil; Wirths, Jochen; Allert, Niels; Mücke, Doris; Hermes, Anne; Meister, Ingo G; Visser-Vandewalle, Veerle; Grice, Martine; Timmermann, Lars

    2017-04-01

    Deep brain stimulation (DBS) of the ventral intermediate nucleus (VIM) is performed to suppress medically-resistant essential tremor (ET). However, stimulation induced dysarthria (SID) is a common side effect, limiting the extent to which tremor can be suppressed. To date, the exact pathogenesis of SID in VIM-DBS treated ET patients is unknown. We investigate the effect of inactivated, uni- and bilateral VIM-DBS on speech production in patients with ET. We employ acoustic measures, tempo, and intelligibility ratings and patients' self-estimated speech to quantify SID, with a focus on comparing bilateral to unilateral stimulation effects and the effect of electrode position on speech. Sixteen German ET patients participated in this study. Each patient was acoustically recorded with DBS-off, unilateral-right-hemispheric-DBS-on, unilateral-left-hemispheric-DBS-on, and bilateral-DBS-on during an oral diadochokinesis task and a read German standard text. To capture the extent of speech impairment, we measured syllable duration and intensity ratio during the DDK task. Naïve listeners rated speech tempo and speech intelligibility of the read text on a 5-point scale. Patients had to rate their "ability to speak". We found an effect of bilateral compared to unilateral and inactivated stimulation on syllable durations and intensity ratio, as well as on external intelligibility ratings and patients' VAS scores. Additionally, VAS scores are associated with more laterally located active contacts. For speech ratings, we found an effect of syllable duration such that tempo and intelligibility were rated worse for speakers exhibiting greater syllable durations. Our data confirm that SID is more pronounced under bilateral compared to unilateral stimulation. Laterally located electrodes are associated with more severe SID according to patients' self-ratings. We can confirm the relation between diadochokinetic rate and SID in that listeners' tempo and intelligibility ratings can be predicted by measured syllable durations from DDK tasks. © 2017 International Neuromodulation Society.

  20. Formant-frequency variation and its effects on across-formant grouping in speech perception.

    PubMed

    Roberts, Brian; Summers, Robert J; Bailey, Peter J

    2013-01-01

    How speech is separated perceptually from other speech remains poorly understood. In a series of experiments, perceptual organisation was probed by presenting three-formant (F1+F2+F3) analogues of target sentences dichotically, together with a competitor for F2 (F2C), or for F2+F3, which listeners must reject to optimise recognition. To control for energetic masking, the competitor was always presented in the opposite ear to the corresponding target formant(s). Sine-wave speech was used initially, and different versions of F2C were derived from F2 using separate manipulations of its amplitude and frequency contours. F2Cs with time-varying frequency contours were highly effective competitors, whatever their amplitude characteristics, whereas constant-frequency F2Cs were ineffective. Subsequent studies used synthetic-formant speech to explore the effects of manipulating the rate and depth of formant-frequency change in the competitor. Competitor efficacy was not tuned to the rate of formant-frequency variation in the target sentences; rather, the reduction in intelligibility increased with competitor rate relative to the rate for the target sentences. Therefore, differences in speech rate may not be a useful cue for separating the speech of concurrent talkers. Effects of competitors whose depth of formant-frequency variation was scaled by a range of factors were explored using competitors derived either by inverting the frequency contour of F2 about its geometric mean (plausibly speech-like pattern) or by using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Competitor efficacy depended on the overall depth of frequency variation, not depth relative to that for the other formants. Furthermore, the triangle-wave competitors were as effective as their more speech-like counterparts. Overall, the results suggest that formant-frequency variation is critical for the across-frequency grouping of formants but that this grouping does not depend on speech-specific constraints.
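
    The sine-wave analogues used in the first experiments can be sketched briefly: each formant is replaced by a single sinusoid that follows the formant's frequency and amplitude contours. The contours below are invented; real stimuli would be derived from analysis of natural sentences.

```python
import numpy as np

def sinewave_formant(freqs, amps, fs):
    """Sinusoid with time-varying frequency: integrate Hz to get phase."""
    phase = 2 * np.pi * np.cumsum(freqs) / fs
    return amps * np.sin(phase)

fs, dur = 16000, 0.5
n = int(fs * dur)
f1 = np.linspace(300, 700, n)     # rising F1 contour (Hz)
f2 = np.linspace(1800, 1200, n)   # falling F2 contour
f3 = np.full(n, 2500.0)           # flat F3
amp = np.hanning(n)               # simple shared amplitude contour
sw_speech = sum(sinewave_formant(f, amp, fs) for f in (f1, f2, f3))
```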

  1. Cutaneous sensory nerve as a substitute for auditory nerve in solving deaf-mutes’ hearing problem: an innovation in multi-channel-array skin-hearing technology

    PubMed Central

    Li, Jianwen; Li, Yan; Zhang, Ming; Ma, Weifang; Ma, Xuezong

    2014-01-01

    The current use of hearing aids and artificial cochleas for deaf-mute individuals depends on their auditory nerve. Skin-hearing technology, a patented system developed by our group, uses a cutaneous sensory nerve to substitute for the auditory nerve to help deaf-mutes to hear sound. This paper introduces a new solution, multi-channel-array skin-hearing technology, to solve the problem of speech discrimination. Based on the filtering principle of hair cells, external voice signals at different frequencies are converted to current signals at corresponding frequencies using electronic multi-channel bandpass filtering technology. Different positions on the skin can be stimulated by the electrode array, allowing the perception and discrimination of external speech signals to be determined by the skin response to the current signals. Through voice frequency analysis, the frequency range of the band-pass filter can also be determined. These findings demonstrate that the sensory nerves in the skin can help to transfer the voice signal and to distinguish the speech signal, suggesting that the skin sensory nerves are good candidates for the replacement of the auditory nerve in addressing deaf-mutes’ hearing problems. Scientific hearing experiments can be more safely performed on the skin. Compared with the artificial cochlea, multi-channel-array skin-hearing aids have lower operation risk in use, are cheaper and are more easily popularized. PMID:25317171
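
    The "electronic multi-channel bandpass filtering" stage can be illustrated with a small filter-bank sketch: split the voice signal into a few frequency channels and let each channel's rectified, smoothed energy drive the current for one skin electrode. Channel count, band edges, and smoothing cutoff below are invented for the example.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def electrode_drive(signal, fs, n_channels=8, fmin=100.0, fmax=4000.0):
    """One drive waveform per electrode from log-spaced bandpass channels."""
    edges = np.geomspace(fmin, fmax, n_channels + 1)  # tonotopic band edges
    smooth = butter(2, 50.0, btype="lowpass", fs=fs, output="sos")
    drives = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(2, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        # Rectify and smooth: the channel's slowly varying energy envelope.
        drives.append(sosfiltfilt(smooth, np.abs(band)))
    return np.stack(drives)

fs = 16000
t = np.arange(fs) / fs
voice = np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 1800 * t)
currents = electrode_drive(voice, fs)      # shape (8, fs): per-electrode drive
```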

  2. Articulatory-to-Acoustic Relations in Response to Speaking Rate and Loudness Manipulations

    ERIC Educational Resources Information Center

    Mefferd, Antje S.; Green, Jordan R.

    2010-01-01

    Purpose: In this investigation, the authors determined the strength of association between tongue kinematic and speech acoustics changes in response to speaking rate and loudness manipulations. Performance changes in the kinematic and acoustic domains were measured using two aspects of speech production presumably affecting speech clarity:…

  3. Methodological Choices in Rating Speech Samples

    ERIC Educational Resources Information Center

    O'Brien, Mary Grantham

    2016-01-01

    Much pronunciation research critically relies upon listeners' judgments of speech samples, but researchers have rarely examined the impact of methodological choices. In the current study, 30 German native listeners and 42 German L2 learners (L1 English) rated speech samples produced by English-German L2 learners along three continua: accentedness,…

  5. Tailoring auditory training to patient needs with single and multiple talkers: transfer-appropriate gains on a four-choice discrimination test.

    PubMed

    Barcroft, Joe; Sommers, Mitchell S; Tye-Murray, Nancy; Mauzé, Elizabeth; Schroy, Catherine; Spehar, Brent

    2011-11-01

    Our long-term objective is to develop an auditory training program that will enhance speech recognition in those situations where patients most want improvement. As a first step, the current investigation trained participants using either a single talker or multiple talkers to determine if auditory training leads to transfer-appropriate gains. The experiment implemented a 2 × 2 × 2 mixed design, with training condition as a between-participants variable and testing interval and test version as repeated-measures variables. Participants completed a computerized six-week auditory training program wherein they heard either the speech of a single talker or the speech of six talkers. Training gains were assessed with single-talker and multi-talker versions of the Four-choice discrimination test. Participants in both groups were tested on both versions. Sixty-nine adult hearing-aid users were randomly assigned to either single-talker or multi-talker auditory training. Both groups showed significant gains on both test versions. Participants who trained with multiple talkers showed greater improvement on the multi-talker version whereas participants who trained with a single talker showed greater improvement on the single-talker version. Transfer-appropriate gains occurred following auditory training, suggesting that auditory training can be designed to target specific patient needs.

  6. Methods of Improving Speech Intelligibility for Listeners with Hearing Resolution Deficit

    PubMed Central

    2012-01-01

    Abstract Methods developed for real-time time-scale modification (TSM) of the speech signal are presented. They are based on the non-uniform, speech-rate-dependent SOLA (Synchronous Overlap and Add) algorithm. The influence of the proposed methods on the intelligibility of speech was investigated for two separate groups of listeners, i.e. hearing-impaired children and elderly listeners. It was shown that for speech with an average rate equal to or higher than 6.48 vowels/s, all of the proposed methods have a statistically significant impact on the improvement of speech intelligibility for hearing-impaired children with reduced hearing resolution, and one of the proposed methods significantly improves comprehension of speech in the group of elderly listeners with reduced hearing resolution. PMID:23009662
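
    A minimal, uniform-factor SOLA sketch in Python/NumPy is shown below to illustrate the overlap-add mechanism the paper builds on; the actual method is non-uniform (the stretch factor follows the measured speech rate), and all frame, hop, and search parameters here are illustrative assumptions.

        import numpy as np

        def sola_stretch(x, alpha, frame_len=1024, analysis_hop=512, seek=64):
            # Read frames every analysis_hop samples and paste them every
            # alpha*analysis_hop samples, aligning each paste point by
            # cross-correlation before a linear cross-fade (SOLA).
            synth_hop = int(round(analysis_hop * alpha))
            overlap = frame_len - synth_hop
            assert overlap > 0, "frames must still overlap after scaling"
            n = (len(x) - frame_len) // analysis_hop + 1
            out = np.zeros(synth_hop * n + frame_len + seek)
            out[:frame_len] = x[:frame_len]
            for i in range(1, n):
                frame = np.asarray(x[i*analysis_hop : i*analysis_hop + frame_len], float)
                paste = i * synth_hop                      # nominal paste position
                shifts = range(max(-seek, -paste), seek + 1)
                corr = [np.dot(out[paste+k : paste+k+overlap], frame[:overlap])
                        for k in shifts]
                k = list(shifts)[int(np.argmax(corr))]
                fade = np.linspace(0.0, 1.0, overlap)
                seg = out[paste+k : paste+k+overlap]
                out[paste+k : paste+k+overlap] = (1 - fade) * seg + fade * frame[:overlap]
                out[paste+k+overlap : paste+k+frame_len] = frame[overlap:]
            return out[:(n - 1) * synth_hop + frame_len]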

  7. Neural and Behavioral Mechanisms of Clear Speech

    ERIC Educational Resources Information Center

    Luque, Jenna Silver

    2017-01-01

    Clear speech is a speaking style that has been shown to improve intelligibility in adverse listening conditions, for various listener and talker populations. Clear-speech phonetic enhancements include a slowed speech rate, expanded vowel space, and expanded pitch range. Although clear-speech phonetic enhancements have been demonstrated across a…

  8. Next-Generation Psychiatric Assessment: Using Smartphone Sensors to Monitor Behavior and Mental Health

    PubMed Central

    Ben-Zeev, Dror; Scherer, Emily A.; Wang, Rui; Xie, Haiyi; Campbell, Andrew T.

    2015-01-01

    Objective Optimal mental health care is dependent upon sensitive and early detection of mental health problems. The current study introduces a state-of-the-art method for remote behavioral monitoring that transports assessment out of the clinic and into the environments in which individuals negotiate their daily lives. The objective of this study was to examine whether the information captured with multi-modal smartphone sensors can serve as behavioral markers for one’s mental health. We hypothesized that: a) unobtrusively collected smartphone sensor data would be associated with individuals’ daily levels of stress, and b) sensor data would be associated with changes in depression, stress, and subjective loneliness over time. Methods A total of 47 young adults (age range: 19–30 y.o.) were recruited for the study. Individuals were enrolled as a single cohort and participated in the study over a 10-week period. Participants were provided with smartphones embedded with a range of sensors and software that enabled continuous tracking of their geospatial activity (using GPS and WiFi), kinesthetic activity (using multi-axial accelerometers), sleep duration (modeled using device use data, accelerometer inferences, ambient sound features, and ambient light levels), and time spent proximal to human speech (i.e., speech duration using microphone and speech detection algorithms). Participants completed daily ratings of stress, as well as pre/post measures of depression (Patient Health Questionnaire-9), stress (Perceived Stress Scale), and loneliness (Revised UCLA Loneliness Scale). Results Mixed-effects linear modeling showed that sensor-derived geospatial activity (p<.05), sleep duration (p<.05), and variability in geospatial activity (p<.05), were associated with daily stress levels. Penalized functional regression showed associations between changes in depression and sensor-derived speech duration (p<.05), geospatial activity (p<.05), and sleep duration (p<.05). Changes in loneliness were associated with sensor-derived kinesthetic activity (p<.01). Conclusions and implications for practice Smartphones can be harnessed as instruments for unobtrusive monitoring of several behavioral indicators of mental health. Creative leveraging of smartphone sensing will create novel opportunities for close-to-invisible psychiatric assessment at a scale and efficiency that far exceed what is currently feasible with existing assessment technologies. PMID:25844912
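
    For readers who want to see the shape of such an analysis, the sketch below fits a mixed-effects model of daily stress on sensor-derived features with statsmodels; the file and column names (daily_sensing.csv, stress, distance_km, sleep_hours, speech_minutes, participant_id) are hypothetical stand-ins for the study's variables.

        import pandas as pd
        import statsmodels.formula.api as smf

        # One row per participant-day: sensor features plus the daily stress rating.
        df = pd.read_csv("daily_sensing.csv")

        # Random intercept per participant; fixed effects for the sensor features.
        model = smf.mixedlm("stress ~ distance_km + sleep_hours + speech_minutes",
                            df, groups=df["participant_id"])
        print(model.fit().summary())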

  9. Fifty years of progress in speech and speaker recognition

    NASA Astrophysics Data System (ADS)

    Furui, Sadaoki

    2004-10-01

    Speech and speaker recognition technology has made very significant progress in the past 50 years. The progress can be summarized by the following changes: (1) from template matching to corpus-based statistical modeling, e.g., HMM and n-grams, (2) from filter bank/spectral resonance to cepstral features (cepstrum + Δcepstrum + ΔΔcepstrum), (3) from heuristic time-normalization to DTW/DP matching, (4) from “distance”-based to likelihood-based methods, (5) from maximum likelihood to discriminative approach, e.g., MCE/GPD and MMI, (6) from isolated word to continuous speech recognition, (7) from small vocabulary to large vocabulary recognition, (8) from context-independent units to context-dependent units for recognition, (9) from clean speech to noisy/telephone speech recognition, (10) from single speaker to speaker-independent/adaptive recognition, (11) from monologue to dialogue/conversation recognition, (12) from read speech to spontaneous speech recognition, (13) from recognition to understanding, (14) from single-modality (audio signal only) to multi-modal (audio/visual) speech recognition, (15) from hardware recognizer to software recognizer, and (16) from no commercial application to many practical commercial applications. Most of these advances have taken place in both the fields of speech recognition and speaker recognition. The majority of technological changes have been directed toward the purpose of increasing robustness of recognition, including many other additional important techniques not noted above.

  10. Prior Knowledge Guides Speech Segregation in Human Auditory Cortex.

    PubMed

    Wang, Yuanye; Zhang, Jianfeng; Zou, Jiajie; Luo, Huan; Ding, Nai

    2018-05-18

    Segregating concurrent sound streams is a computationally challenging task that requires integrating bottom-up acoustic cues (e.g. pitch) and top-down prior knowledge about sound streams. In a multi-talker environment, the brain can segregate different speakers in about 100 ms in auditory cortex. Here, we used magnetoencephalographic (MEG) recordings to investigate the temporal and spatial signature of how the brain utilizes prior knowledge to segregate 2 speech streams from the same speaker, which can hardly be separated based on bottom-up acoustic cues. In a primed condition, the participants know the target speech stream in advance while in an unprimed condition no such prior knowledge is available. Neural encoding of each speech stream is characterized by the MEG responses tracking the speech envelope. We demonstrate that neural tracking in bilateral superior temporal gyrus and superior temporal sulcus is much stronger in the primed condition than in the unprimed condition. Priming effects are observed at about 100 ms latency and last more than 600 ms. Interestingly, prior knowledge about the target stream facilitates speech segregation by mainly suppressing the neural tracking of the non-target speech stream. In sum, prior knowledge leads to reliable speech segregation in auditory cortex, even in the absence of reliable bottom-up speech segregation cues.
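
    A simple stand-in for envelope-tracking measures of this kind is a lagged correlation between a neural channel and the speech envelope, as sketched below in NumPy; the study's actual pipeline (MEG source analysis, trial structure) is not reproduced here.

        import numpy as np

        def envelope_tracking(response, envelope, max_lag=100):
            # Correlate a neural response with the speech envelope over a
            # range of sample lags and keep the peak value and its lag.
            r = (response - response.mean()) / response.std()
            e = (envelope - envelope.mean()) / envelope.std()
            corrs = [np.corrcoef(r[lag:], e[:len(e) - lag])[0, 1]
                     for lag in range(max_lag)]
            return max(corrs), int(np.argmax(corrs))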

  11. Systematic Studies of Modified Vocalization: The Effect of Speech Rate on Speech Production Measures during Metronome-Paced Speech in Persons Who Stutter

    ERIC Educational Resources Information Center

    Davidow, Jason H.

    2014-01-01

    Background: Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control…

  12. Cognitive Load in Voice Therapy Carry-Over Exercises.

    PubMed

    Iwarsson, Jenny; Morris, David Jackson; Balling, Laura Winther

    2017-01-01

    The cognitive load generated by online speech production may vary with the nature of the speech task. This article examines 3 speech tasks used in voice therapy carry-over exercises, in which a patient is required to adopt and automatize new voice behaviors, ultimately in daily spontaneous communication. Twelve subjects produced speech in 3 conditions: rote speech (weekdays), sentences in a set form, and semispontaneous speech. Subjects simultaneously performed a secondary visual discrimination task for which response times were measured. On completion of each speech task, subjects rated their experience on a questionnaire. Response times from the secondary, visual task were found to be shortest for the rote speech, longer for the semispontaneous speech, and longest for the sentences within the set framework. Principal components derived from the subjective ratings were found to be linked to response times on the secondary visual task. Acoustic measures reflecting fundamental frequency distribution and vocal fold compression varied across the speech tasks. The results indicate that consideration should be given to the selection of speech tasks during the process leading to automation of revised speech behavior and that self-reports may be a reliable index of cognitive load.

  13. Phonologically-based biomarkers for major depressive disorder

    NASA Astrophysics Data System (ADS)

    Trevino, Andrea Carolina; Quatieri, Thomas Francis; Malyska, Nicolas

    2011-12-01

    Of increasing importance in the civilian and military population is the recognition of major depressive disorder at its earliest stages and intervention before the onset of severe symptoms. Toward the goal of more effective monitoring of depression severity, we introduce vocal biomarkers that are derived automatically from phonologically-based measures of speech rate. To assess our measures, we use a 35-speaker free-response speech database of subjects treated for depression over a 6-week duration. We find that dissecting average measures of speech rate into phone-specific characteristics and, in particular, combined phone-duration measures uncovers stronger relationships between speech rate and depression severity than global measures previously reported for a speech-rate biomarker. Results of this study are supported by correlation of our measures with depression severity and classification of depression state with these vocal measures. Our approach provides a general framework for analyzing individual symptom categories through phonological units, and supports the premise that speaking rate can be an indicator of psychomotor retardation severity.
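
    The move from a global speech-rate measure to phone-specific ones can be sketched as below: per-session mean durations for each phone are correlated with per-session severity scores (SciPy); the data layout and names are assumptions, not the paper's code.

        import numpy as np
        from scipy.stats import spearmanr

        def phone_duration_biomarkers(durations, severity):
            # durations: dict mapping each phone to per-session mean durations (s);
            # severity: per-session depression scores, in the same session order.
            # Returns each phone's rank correlation with severity.
            return {phone: spearmanr(d, severity).correlation
                    for phone, d in durations.items()}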

  14. Incremental Phonological Encoding during Unscripted Sentence Production

    PubMed Central

    Jaeger, T. Florian; Furth, Katrina; Hilliard, Caitlin

    2012-01-01

    We investigate phonological encoding during unscripted sentence production, focusing on the effect of phonological overlap on phonological encoding. Previous work on this question has almost exclusively employed isolated word production or highly scripted multi-word production. These studies have led to conflicting results: some studies found that phonological overlap between two words facilitates phonological encoding, while others found inhibitory effects. One worry with many of these paradigms is that they involve processes that are not typical to everyday language use, which calls into question to what extent their findings speak to the architectures and mechanisms underlying language production. We present a paradigm to investigate the consequences of phonological overlap between words in a sentence while leaving speakers much of the lexical and structural choices typical in everyday language use. Adult native speakers of English described events in short video clips. We annotated the presence of disfluencies and the speech rate at various points throughout the sentence, as well as the constituent order. We find that phonological overlap has an inhibitory effect on phonological encoding. Specifically, if adjacent content words share their phonological onset (e.g., hand the hammer), they are preceded by production difficulty, as reflected in fluency and speech rate. We also find that this production difficulty affects speakers’ constituent order preferences during grammatical encoding. We discuss our results and previous works to isolate the properties of other paradigms that resulted in facilitatory or inhibitory results. The data from our paradigm also speak to questions about the scope of phonological planning in unscripted speech and as to whether phonological and grammatical encoding interact. PMID:23162515

  15. Quantification and Systematic Characterization of Stuttering-Like Disfluencies in Acquired Apraxia of Speech.

    PubMed

    Bailey, Dallin J; Blomgren, Michael; DeLong, Catharine; Berggren, Kiera; Wambaugh, Julie L

    2017-06-22

    The purpose of this article is to quantify and describe stuttering-like disfluencies in speakers with acquired apraxia of speech (AOS), utilizing the Lidcombe Behavioural Data Language (LBDL). Additional purposes include measuring test-retest reliability and examining the effect of speech sample type on disfluency rates. Two types of speech samples were elicited from 20 persons with AOS and aphasia: repetition of mono- and multisyllabic words from a protocol for assessing AOS (Duffy, 2013), and connected speech tasks (Nicholas & Brookshire, 1993). Sampling was repeated at 1 and 4 weeks following initial sampling. Stuttering-like disfluencies were coded using the LBDL, which is a taxonomy that focuses on motoric aspects of stuttering. Disfluency rates ranged from 0% to 13.1% for the connected speech task and from 0% to 17% for the word repetition task. There was no significant effect of speech sampling time on disfluency rate in the connected speech task, but there was a significant effect of time for the word repetition task. There was no significant effect of speech sample type. Speakers demonstrated both major types of stuttering-like disfluencies as categorized by the LBDL (fixed postures and repeated movements). Connected speech samples yielded more reliable tallies over repeated measurements. Suggestions are made for modifying the LBDL for use in AOS in order to further add to systematic descriptions of motoric disfluencies in this disorder.

  16. Long-Term Trajectories of the Development of Speech Sound Production in Pediatric Cochlear Implant Recipients

    PubMed Central

    Tomblin, J. Bruce; Peng, Shu-Chen; Spencer, Linda J.; Lu, Nelson

    2011-01-01

    Purpose This study characterized the development of speech sound production in prelingually deaf children with a minimum of 8 years of cochlear implant (CI) experience. Method Twenty-seven pediatric CI recipients' spontaneous speech samples from annual evaluation sessions were phonemically transcribed. Accuracy for these speech samples was evaluated in piecewise regression models. Results As a group, pediatric CI recipients showed steady improvement in speech sound production following implantation, but the improvement rate declined after 6 years of device experience. Piecewise regression models indicated that the slope estimating the participants' improvement rate was statistically greater than 0 during the first 6 years postimplantation, but not after 6 years. The group of pediatric CI recipients' accuracy of speech sound production after 4 years of device experience reasonably predicts their speech sound production after 5–10 years of device experience. Conclusions The development of speech sound production in prelingually deaf children stabilizes after 6 years of device experience, and typically approaches a plateau by 8 years of device use. Early growth in speech before 4 years of device experience did not predict later rates of growth or levels of achievement. However, good predictions could be made after 4 years of device use. PMID:18695018
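
    The piecewise regression structure described above can be written as a linear model with a slope change at 6 years of device experience; a sketch with statsmodels follows (variable names assumed).

        import numpy as np
        import statsmodels.api as sm

        def piecewise_slopes(years, accuracy, knot=6.0):
            # years, accuracy: 1-D NumPy arrays, one entry per observation.
            # accuracy ~ b0 + b1*years + b2*max(0, years - knot):
            # b1 is the improvement rate before the knot, b1 + b2 after it.
            X = sm.add_constant(np.column_stack([years,
                                                 np.maximum(0.0, years - knot)]))
            b0, b1, b2 = sm.OLS(accuracy, X).fit().params
            return b1, b1 + b2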

  17. A dynamic multi-channel speech enhancement system for distributed microphones in a car environment

    NASA Astrophysics Data System (ADS)

    Matheja, Timo; Buck, Markus; Fingscheidt, Tim

    2013-12-01

    Supporting multiple active speakers in automotive hands-free or speech dialog applications is an interesting issue, not least for reasons of comfort. Therefore, a multi-channel system for enhancement of speech signals captured by distributed distant microphones in a car environment is presented. Each of the potential speakers in the car has a dedicated directional microphone close to his position that captures the corresponding speech signal. The aim of the resulting overall system is twofold: on the one hand, a combination of an arbitrary pre-defined subset of speakers' signals can be performed, e.g., to create an output signal in a hands-free telephone conference call for a far-end communication partner. On the other hand, annoying cross-talk components from interfering sound sources occurring in multiple different mixed output signals are to be eliminated, motivated by the possibility of other hands-free applications being active in parallel. The system includes several signal processing stages. A dedicated signal processing block for interfering speaker cancellation attenuates the cross-talk components of undesired speech. Further signal enhancement comprises the reduction of residual cross-talk and background noise. Subsequently, a dynamic signal combination stage merges the processed single-microphone signals to obtain appropriate mixed signals at the system output that may be passed to applications such as telephony or a speech dialog system. Based on signal power ratios between the particular microphone signals, an appropriate speaker activity detection and therewith a robust control mechanism of the whole system is presented. The proposed system may be dynamically configured and has been evaluated for a car setup with four speakers sitting in the car cabin disturbed in various noise conditions.
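
    A crude stand-in for the power-ratio-based speaker activity detection described above: each seat's speaker is flagged active when their dedicated microphone's short-time power exceeds every other microphone by a margin (the threshold and framing here are assumptions).

        import numpy as np

        def speaker_activity(power, margin_db=6.0):
            # power: (n_mics, n_frames) short-time signal power per microphone.
            p = 10.0 * np.log10(power + 1e-12)
            active = np.zeros(p.shape, dtype=bool)
            for m in range(p.shape[0]):
                others = np.delete(p, m, axis=0).max(axis=0)
                active[m] = (p[m] - others) > margin_db
            return active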

  18. Classification of Parkinson's disease utilizing multi-edit nearest-neighbor and ensemble learning algorithms with speech samples.

    PubMed

    Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang

    2016-11-16

    The use of speech-based data in the classification of Parkinson disease (PD) has been shown to provide an effective, non-invasive mode of classification in recent years. Thus, there has been an increased interest in speech pattern analysis methods applicable to Parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classifications is to reduce noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effective, the ability to invoke instance selection has seldom been examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied for selecting optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is trained on the selected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. The proposed method was examined using recently deposited public datasets and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the highest degree of improved classification accuracy (29.44%) compared with the other algorithms that were examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method could improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.
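
    The MENN editing step can be sketched with scikit-learn as below: the training set is repeatedly split into folds, each fold is classified by a k-NN trained on another fold, and misclassified samples are discarded until a pass removes nothing; an ensemble (here a random forest) is then trained on the surviving samples. This is an illustration of the idea, not the authors' implementation.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.neighbors import KNeighborsClassifier

        def menn_edit(X, y, k=3, folds=5, seed=0):
            rng = np.random.default_rng(seed)
            keep = np.arange(len(y))
            changed = True
            while changed and len(keep) > folds * k:
                changed = False
                rng.shuffle(keep)
                parts = np.array_split(keep, folds)
                survivors = []
                for i, part in enumerate(parts):
                    ref = parts[(i + 1) % folds]        # classify with the next fold
                    knn = KNeighborsClassifier(n_neighbors=k).fit(X[ref], y[ref])
                    ok = knn.predict(X[part]) == y[part]
                    survivors.append(part[ok])
                    changed |= not ok.all()
                keep = np.concatenate(survivors)
            return keep

        # idx = menn_edit(X_train, y_train)
        # clf = RandomForestClassifier(n_estimators=300).fit(X_train[idx], y_train[idx])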

  19. Cleft audit protocol for speech (CAPS-A): a comprehensive training package for speech analysis.

    PubMed

    Sell, D; John, A; Harding-Bell, A; Sweeney, T; Hegarty, F; Freeman, J

    2009-01-01

    The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been paid to this issue. To design, execute, and evaluate a training programme for speech and language therapists on the systematic and reliable use of the Cleft Audit Protocol for Speech-Augmented (CAPS-A), addressing issues of standardized speech samples, data acquisition, recording, playback, and listening guidelines. Thirty-six specialist speech and language therapists undertook the training programme over four days. This consisted of two days' training on the CAPS-A tool followed by a third day, making independent ratings and transcriptions on ten new cases which had been previously recorded during routine audit data collection. This task was repeated on day 4, a minimum of one month later. Ratings were made using the CAPS-A record form with the CAPS-A definition table. An analysis was made of the speech and language therapists' CAPS-A ratings at occasion 1 and occasion 2 and the intra- and inter-rater reliability calculated. Trained therapists showed consistency in individual judgements on specific sections of the tool. Intraclass correlation coefficients were calculated for each section with good agreement on eight of 13 sections. There were only fair levels of agreement on anterior oral cleft speech characteristics, non-cleft errors/immaturities and voice. This was explained, at least in part, by their low prevalence which affects the calculation of the intraclass correlation coefficient statistic. Speech and language therapists benefited from training on the CAPS-A, focusing on specific aspects of speech using definitions of parameters and scalar points, in order to apply the tool systematically and reliably. Ratings are enhanced by ensuring a high degree of attention to the nature of the data, standardizing the speech sample, data acquisition, the listening process together with the use of high-quality recording and playback equipment. In addition, a method is proposed for maintaining listening skills following training as part of an individual's continuing education.

  20. The Effect of Speech Repetition Rate on Neural Activation in Healthy Adults: Implications for Treatment of Aphasia and Other Fluency Disorders.

    PubMed

    Marchina, Sarah; Norton, Andrea; Kumar, Sandeep; Schlaug, Gottfried

    2018-01-01

    Functional imaging studies have provided insight into the effect of rate on production of syllables, pseudowords, and naturalistic speech, but the influence of rate on repetition of commonly-used words/phrases suitable for therapeutic use merits closer examination. Aim: To identify speech-motor regions responsive to rate and test the hypothesis that those regions would provide greater support as rates increase, we used an overt speech repetition task and functional magnetic resonance imaging (fMRI) to capture rate-modulated activation within speech-motor regions and determine whether modulations occur linearly and/or show hemispheric preference. Methods: Twelve healthy, right-handed adults participated in an fMRI task requiring overt repetition of commonly-used words/phrases at rates of 1, 2, and 3 syllables/second (syll./sec.). Results: Across all rates, bilateral activation was found both in ventral portions of primary sensorimotor cortex and middle and superior temporal regions. A repeated measures analysis of variance with pairwise comparisons revealed an overall difference between rates in temporal lobe regions of interest (ROIs) bilaterally ( p < 0.001); all six comparisons reached significance ( p < 0.05). Five of the six were highly significant ( p < 0.008), while the left-hemisphere 2- vs. 3-syll./sec. comparison, though still significant, was less robust ( p = 0.037). Temporal ROI mean beta-values increased linearly across the three rates bilaterally. Significant rate effects observed in the temporal lobes were slightly more pronounced in the right-hemisphere. No significant overall rate differences were seen in sensorimotor ROIs, nor was there a clear hemispheric effect. Conclusion: Linear effects in superior temporal ROIs suggest that sensory feedback corresponds directly to task demands. The lesser degree of significance in left-hemisphere activation at the faster, closer-to-normal rate may represent an increase in neural efficiency (and therefore, decreased demand) when the task so closely approximates a highly-practiced function. The presence of significant bilateral activation during overt repetition of words/phrases at all three rates suggests that repetition-based speech production may draw support from either or both hemispheres. This bihemispheric redundancy in regions associated with speech-motor control and their sensitivity to changes in rate may play an important role in interventions for nonfluent aphasia and other fluency disorders, particularly when right-hemisphere structures are the sole remaining pathway for production of meaningful speech.

  1. Open-Source Multi-Language Audio Database for Spoken Language Processing Applications

    DTIC Science & Technology

    2012-12-01

    300 passages were collected in each of three languages—English, Mandarin, and Russian—from YouTube. Approximately 30 hours of speech were collected for each language. Each passage has been carefully transcribed at the ... manual and automatic methods. The Russian passages have not yet been marked at the phonetic level. Another phase of the work was to explore ...

  2. Working Memory and Speech Recognition in Noise Under Ecologically Relevant Listening Conditions: Effects of Visual Cues and Noise Type Among Adults With Hearing Loss

    PubMed Central

    Stewart, Erin K.; Wu, Yu-Hsiang; Bishop, Christopher; Bentler, Ruth A.; Tremblay, Kelly

    2017-01-01

    Purpose This study evaluated the relationship between working memory (WM) and speech recognition in noise with different noise types as well as in the presence of visual cues. Method Seventy-six adults with bilateral, mild to moderately severe sensorineural hearing loss (mean age: 69 years) participated. Using a cross-sectional design, 2 measures of WM were taken: a reading span measure, and Word Auditory Recognition and Recall Measure (Smith, Pichora-Fuller, & Alexander, 2016). Speech recognition was measured with the Multi-Modal Lexical Sentence Test for Adults (Kirk et al., 2012) in steady-state noise and 4-talker babble, with and without visual cues. Testing was under unaided conditions. Results A linear mixed model revealed visual cues and pure-tone average as the only significant predictors of Multi-Modal Lexical Sentence Test outcomes. Neither WM measure nor noise type showed a significant effect. Conclusion The contribution of WM in explaining unaided speech recognition in noise was negligible and not influenced by noise type or visual cues. We anticipate that with audibility partially restored by hearing aids, the effects of WM will increase. For clinical practice to be affected, more significant effect sizes are needed. PMID:28744550

  3. Multi-modal highlight generation for sports videos using an information-theoretic excitability measure

    NASA Astrophysics Data System (ADS)

    Hasan, Taufiq; Bořil, Hynek; Sangwan, Abhijeet; Hansen, John H. L.

    2013-12-01

    The ability to detect and organize 'hot spots' representing areas of excitement within video streams is a challenging research problem when techniques rely exclusively on video content. A generic method for sports video highlight selection is presented in this study which leverages both video/image structure as well as audio/speech properties. Processing begins by partitioning the video into small segments, after which several multi-modal features are extracted from each segment. Excitability is computed based on the likelihood of the segmental features residing in certain regions of their joint probability density function space which are considered both exciting and rare. The proposed measure is used to rank order the partitioned segments to compress the overall video sequence and produce a contiguous set of highlights. Experiments are performed on baseball videos based on signal processing advancements for excitement assessment in the commentators' speech, audio energy, slow motion replay, scene cut density, and motion activity as features. A detailed analysis of the correlation between user excitability and various speech production parameters is conducted and an effective scheme is designed to estimate the excitement level of the commentator's speech from the sports videos. Subjective evaluation of excitability and ranking of video segments demonstrate a higher correlation with the proposed measure compared to well-established techniques, indicating the effectiveness of the overall approach.
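
    The likelihood-based ranking can be approximated as below: fit a joint density to the segment features and rank segments by improbability. A single Gaussian (SciPy) stands in for the paper's density model, and treating rarity alone as a proxy for excitability is a simplification.

        import numpy as np
        from scipy.stats import multivariate_normal

        def rank_segments(feats, top_k=10):
            # feats: (n_segments, n_features) multi-modal features per segment
            # (e.g. audio energy, pitch, scene-cut density, motion activity).
            mu = feats.mean(axis=0)
            cov = np.cov(feats, rowvar=False)
            nll = -multivariate_normal(mu, cov, allow_singular=True).logpdf(feats)
            return np.argsort(nll)[::-1][:top_k]    # least likely segments first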

  4. Stuttered and Fluent Speakers' Heart Rate and Skin Conductance in Response to Fluent and Stuttered Speech

    ERIC Educational Resources Information Center

    Zhang, Jianliang; Kalinowski, Joseph; Saltuklaroglu, Tim; Hudock, Daniel

    2010-01-01

    Background: Previous studies have found simultaneous increases in skin conductance response and decreases in heart rate when normally fluent speakers watched and listened to stuttered speech compared with fluent speech, suggesting that stuttering induces arousal and emotional unpleasantness in listeners. However, physiological responses of persons…

  5. Not so fast: Fast speech correlates with lower lexical and structural information.

    PubMed

    Cohen Priva, Uriel

    2017-03-01

    Speakers dynamically adjust their speech rate throughout conversations. These adjustments have been linked to cognitive and communicative limitations: for example, speakers speak words that are contextually unexpected (and thus add more information) with slower speech rates. This raises the question whether limitations of this type vary wildly across speakers or are relatively constant. The latter predicts that across speakers (or conversations), speech rate and the amount of information content are inversely correlated: on average, speakers can either provide high information content or speak quickly, but not both. Using two corpus studies replicated across two corpora, I demonstrate that indeed, fast speech correlates with the use of less informative words and syntactic structures. Thus, while there are individual differences in overall information throughput, speakers are more similar in this aspect than differences in speech rate would suggest. The results suggest that information theoretic constraints on production operate at a higher level than was observed before and affect language throughout production, not only after words and structures are chosen. Copyright © 2016 Elsevier B.V. All rights reserved.
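
    The across-conversation correlation at the heart of this claim can be sketched as follows: estimate each word's information as its negative log corpus frequency, then correlate per-conversation mean information with per-conversation speaking rate (words per second here; the paper's measures are more refined).

        import numpy as np
        from collections import Counter

        def info_rate_correlation(conversations):
            # conversations: list of (words, duration_seconds) pairs.
            counts = Counter(w for words, _ in conversations for w in words)
            total = sum(counts.values())
            info = {w: -np.log2(c / total) for w, c in counts.items()}
            rates = [len(words) / dur for words, dur in conversations]
            means = [np.mean([info[w] for w in words])
                     for words, _ in conversations]
            return np.corrcoef(rates, means)[0, 1]  # expected to be negative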

  6. Neural Oscillations Carry Speech Rhythm through to Comprehension

    PubMed Central

    Peelle, Jonathan E.; Davis, Matthew H.

    2012-01-01

    A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging – particularly electroencephalography (EEG) and magnetoencephalography (MEG) – point to phase locking by ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain. PMID:22973251

  7. 78 FR 57648 - Notice of Issuance of Final Determination Concerning Video Teleconferencing Server

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-09-19

    ... the Chinese-origin Video Board and the Filter Board, impart the essential character to the video... includes the codec; a network filter electronic circuit board (``Filter Board''); a housing case; a power... (``Linux software''). The Linux software allows the Filter Board to inspect each Ethernet packet of...

  8. Using on-line altered auditory feedback treating Parkinsonian speech

    NASA Astrophysics Data System (ADS)

    Wang, Emily; Verhagen, Leo; de Vries, Meinou H.

    2005-09-01

    Patients with advanced Parkinson's disease tend to have dysarthric speech that is hesitant, accelerated, and repetitive, and that is often resistant to behavioral speech therapy. In this pilot study, the speech disturbances were treated using on-line altered feedback (AF) provided by SpeechEasy (SE), an in-the-ear device registered with the FDA for use in humans to treat chronic stuttering. Eight PD patients participated in the study. All had moderate to severe speech disturbances. In addition, two patients had moderate recurring stuttering at the onset of PD after long remission since adolescence, two had bilateral STN DBS, and two bilateral pallidal DBS. An effective combination of delayed auditory feedback and frequency-altered feedback was selected for each subject and provided via SE worn in one ear. All subjects produced speech samples (structured-monologue and reading) under three conditions: baseline, wearing SE without altered feedback, and wearing SE with altered feedback. The speech samples were randomly presented and rated for speech intelligibility using UPDRS-III item 18 and for speaking rate. The results indicated that SpeechEasy is well tolerated and AF can improve speech intelligibility in spontaneous speech. Further investigational use of this device for treating speech disorders in PD is warranted [Work partially supported by Janus Dev. Group, Inc.].

  9. Highly efficient codec based on significance-linked connected-component analysis of wavelet coefficients

    NASA Astrophysics Data System (ADS)

    Chai, Bing-Bing; Vass, Jozsef; Zhuang, Xinhua

    1997-04-01

    Recent success in wavelet coding is mainly attributed to the recognition of the importance of data organization. Several very competitive wavelet codecs have been developed, namely, Shapiro's Embedded Zerotree Wavelets (EZW), Servetto et al.'s Morphological Representation of Wavelet Data (MRWD), and Said and Pearlman's Set Partitioning in Hierarchical Trees (SPIHT). In this paper, we propose a new image compression algorithm called Significance-Linked Connected Component Analysis (SLCCA) of wavelet coefficients. SLCCA exploits both within-subband clustering of significant coefficients and cross-subband dependency in significant fields. A so-called significance link between connected components is designed to reduce the positional overhead of MRWD. In addition, the significant coefficients' magnitudes are encoded in bit-plane order to match the probability model of the adaptive arithmetic coder. Experiments show that SLCCA outperforms both EZW and MRWD, and is on par with SPIHT. Furthermore, it is observed that SLCCA generally performs best on images with a large portion of texture. When applied to fingerprint image compression, it outperforms the FBI's wavelet scalar quantization by about 1 dB.
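
    The within-subband clustering that SLCCA exploits can be illustrated with PyWavelets and SciPy (both assumptions, not the authors' code): threshold each detail subband and label connected components of significant coefficients.

        import numpy as np
        import pywt
        from scipy import ndimage

        def significant_components(img, wavelet="db4", level=3, k=2.0):
            # Returns, per detail subband, a label map of connected clusters
            # of significant coefficients and the cluster count.
            coeffs = pywt.wavedec2(np.asarray(img, float), wavelet, level=level)
            results = []
            for detail in coeffs[1:]:            # skip the approximation band
                for band in detail:              # (horizontal, vertical, diagonal)
                    mask = np.abs(band) > k * band.std()
                    labels, n = ndimage.label(mask)
                    results.append((labels, n))
            return results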

  10. Secure transport and adaptation of MC-EZBC video utilizing H.264-based transport protocols

    PubMed Central

    Hellwagner, Hermann; Hofbauer, Heinz; Kuschnig, Robert; Stütz, Thomas; Uhl, Andreas

    2012-01-01

    Universal Multimedia Access (UMA) calls for solutions where content is created once and subsequently adapted to given requirements. With regard to UMA and scalability, which is often required due to the wide variety of end clients, the best-suited codecs are wavelet-based (like the MC-EZBC) due to their inherently high number of scaling options. However, most transport technologies for delivering videos to end clients are targeted toward the H.264/AVC standard or, if scalability is required, the H.264/SVC. In this paper we will introduce a mapping of the MC-EZBC bitstream to existing H.264/SVC-based streaming and scaling protocols. This enables the use of highly scalable wavelet-based codecs on the one hand and the utilization of already existing network technologies without accruing high implementation costs on the other hand. Furthermore, we will evaluate different scaling options in order to choose the best option for given requirements. Additionally, we will evaluate different encryption options based on transport and bitstream encryption for use cases where digital rights management is required. PMID:26869746

  11. Backwards compatible high dynamic range video compression

    NASA Astrophysics Data System (ADS)

    Dolzhenko, Vladimir; Chesnokov, Vyacheslav; Edirisinghe, Eran A.

    2014-02-01

    This paper presents a two-layer codec architecture for high dynamic range video compression. The base layer contains the tone-mapped video stream encoded with 8 bits per component, which can be decoded using conventional equipment. The base layer content is optimized for rendering on low dynamic range displays. The enhancement layer contains the image difference, in a perceptually uniform color space, between the result of inverse tone mapping the base layer content and the original video stream. Prediction of the high dynamic range content reduces the redundancy in the transmitted data while still preserving highlights and out-of-gamut colors. A perceptually uniform color space enables the use of standard rate-distortion optimization algorithms. We present techniques for efficient implementation and encoding of non-uniform tone mapping operators with low overhead in terms of bitstream size and number of operations. The transform representation is based on a human visual system model and is suitable for global and local tone mapping operators. The compression techniques include predicting the transform parameters from previously decoded frames and from already decoded data for the current frame. Different video compression techniques are compared: backwards compatible and non-backwards compatible, using AVC and HEVC codecs.
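
    The two-layer split can be sketched as follows, with a global power-law curve standing in for the paper's tone-mapping operator (the real system uses more elaborate, possibly local, operators and a perceptually uniform color space for the residual).

        import numpy as np

        def split_layers(hdr, gamma=2.2):
            # hdr assumed normalized to [0, 1].
            # Base layer: 8-bit tone-mapped frame (encode with AVC/HEVC).
            # Enhancement layer: residual against the inverse-tone-mapped base.
            base = np.round(np.clip(hdr, 0.0, 1.0) ** (1.0 / gamma) * 255).astype(np.uint8)
            predicted = (base / 255.0) ** gamma      # inverse tone mapping
            return base, hdr - predicted

        def reconstruct(base, residual, gamma=2.2):
            return (base / 255.0) ** gamma + residual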

  12. Predicting chroma from luma with frequency domain intra prediction

    NASA Astrophysics Data System (ADS)

    Egge, Nathan E.; Valin, Jean-Marc

    2015-03-01

    This paper describes a technique for performing intra prediction of the chroma planes based on the reconstructed luma plane in the frequency domain. This prediction exploits the fact that while RGB to YUV color conversion has the property that it decorrelates the color planes globally across an image, there is still some correlation locally at the block level [1]. Previous proposals compute a linear model of the spatial relationship between the luma plane (Y) and the two chroma planes (U and V) [2]. In codecs that use lapped transforms this is not possible since transform support extends across the block boundaries [3] and thus neighboring blocks are unavailable during intra-prediction. We design a frequency domain intra predictor for chroma that exploits the same local correlation with lower complexity than the spatial predictor and which works with lapped transforms. We then describe a low-complexity algorithm that directly uses luma coefficients as a chroma predictor based on gain-shape quantization and band partitioning. An experiment is performed that compares these two techniques inside the experimental Daala video codec and shows the lower complexity algorithm to be a better chroma predictor.
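
    The gain-shape idea can be sketched as below: the reconstructed luma AC coefficients supply the "shape" of the chroma prediction, and a single scale factor plays the role of the gain (computed here from the true chroma, as an encoder would; the names and the plain DCT standing in for a lapped transform are assumptions).

        import numpy as np
        from scipy.fft import dctn

        def cfl_predict(luma_block, chroma_block):
            L = dctn(luma_block, norm="ortho").ravel()
            C = dctn(chroma_block, norm="ortho").ravel()
            l_ac, c_ac = L[1:], C[1:]            # AC coefficients only
            gain = np.dot(l_ac, c_ac) / (np.dot(l_ac, l_ac) + 1e-12)
            pred = np.empty_like(C)
            pred[0] = C[0]          # DC is predicted by other means in practice
            pred[1:] = gain * l_ac  # luma shape scaled by the signaled gain
            return pred.reshape(chroma_block.shape), gain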

  13. Intensive Speech and Language Therapy for Older Children with Cerebral Palsy: A Systems Approach

    ERIC Educational Resources Information Center

    Pennington, Lindsay; Miller, Nick; Robson, Sheila; Steen, Nick

    2010-01-01

    Aim: To investigate whether speech therapy using a speech systems approach to controlling breath support, phonation, and speech rate can increase the speech intelligibility of children with dysarthria and cerebral palsy (CP). Method: Sixteen children with dysarthria and CP participated in a modified time series design. Group characteristics were…

  14. Increasing Parental Involvement in Speech-Sound Remediation

    ERIC Educational Resources Information Center

    Roberts, Micah Renee Ferguson

    2014-01-01

    Speech therapy homework is a key component of a successful speech therapy program, increasing carryover of learned speech sounds. Poor return rate of homework assigned, with a lack of parental involvement, is a problem. The purpose of this project study was to examine what may increase parental participation in speech therapy homework. Guided by…

  15. Automated Speech Rate Measurement in Dysarthria.

    PubMed

    Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

    2015-06-01

    In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. The new algorithm was trained and tested using Dutch speech samples of 36 speakers with no history of speech impairment and 40 speakers with mild to moderate dysarthria. We tested the algorithm under various conditions: according to speech task type (sentence reading, passage reading, and storytelling) and algorithm optimization method (speaker group optimization and individual speaker optimization). Correlations between automated and human SR determination were calculated for each condition. High correlations between automated and human SR determination were found in the various testing conditions. The new algorithm measures SR in a sufficiently reliable manner. It is currently being integrated in a clinical software tool for assessing and managing prosody in dysarthric speech. Further research is needed to fine-tune the algorithm to severely dysarthric speech, to make the algorithm less sensitive to background noise, and to evaluate how the algorithm deals with syllabic consonants.
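
    One common basis for automated SR measurement, counting syllable nuclei as peaks in a smoothed intensity envelope, is sketched below with SciPy; this is an illustrative method, not the algorithm evaluated in the study.

        import numpy as np
        from scipy.signal import butter, filtfilt, find_peaks

        def speech_rate(x, fs, min_gap_s=0.15):
            # Smooth the rectified waveform to a ~10 Hz envelope, then count
            # peaks above a crude silence floor as syllable nuclei.
            b, a = butter(4, 10.0 / (fs / 2.0))
            env = filtfilt(b, a, np.abs(x))
            peaks, _ = find_peaks(env, height=0.5 * env.mean(),
                                  distance=int(min_gap_s * fs))
            return len(peaks) / (len(x) / fs)    # syllables per second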

  16. θ-Band and β-Band Neural Activity Reflects Independent Syllable Tracking and Comprehension of Time-Compressed Speech.

    PubMed

    Pefkou, Maria; Arnal, Luc H; Fontolan, Lorenzo; Giraud, Anne-Lise

    2017-08-16

    Recent psychophysics data suggest that speech perception is not limited by the capacity of the auditory system to encode fast acoustic variations through neural γ activity, but rather by the time given to the brain to decode them. Whether the decoding process is bounded by the capacity of θ rhythm to follow syllabic rhythms in speech, or constrained by a more endogenous top-down mechanism, e.g., involving β activity, is unknown. We addressed the dynamics of auditory decoding in speech comprehension by challenging syllable tracking and speech decoding using comprehensible and incomprehensible time-compressed auditory sentences. We recorded EEGs in human participants and found that neural activity in both θ and γ ranges was sensitive to syllabic rate. Phase patterns of slow neural activity consistently followed the syllabic rate (4-14 Hz), even when this rate went beyond the classical θ range (4-8 Hz). The power of θ activity increased linearly with syllabic rate but showed no sensitivity to comprehension. Conversely, the power of β (14-21 Hz) activity was insensitive to the syllabic rate, yet reflected comprehension on a single-trial basis. We found different long-range dynamics for θ and β activity, with β activity building up in time while more contextual information becomes available. This is consistent with the roles of θ and β activity in stimulus-driven versus endogenous mechanisms. These data show that speech comprehension is constrained by concurrent stimulus-driven θ and low-γ activity, and by endogenous β activity, but not primarily by the capacity of θ activity to track the syllabic rhythm. SIGNIFICANCE STATEMENT Speech comprehension partly depends on the ability of the auditory cortex to track syllable boundaries with θ-range neural oscillations. The reason comprehension drops when speech is accelerated could hence be because θ oscillations can no longer follow the syllabic rate. Here, we presented subjects with comprehensible and incomprehensible accelerated speech, and show that neural phase patterns in the θ band consistently reflect the syllabic rate, even when speech becomes too fast to be intelligible. The drop in comprehension, however, is signaled by a significant decrease in the power of low-β oscillations (14-21 Hz). These data suggest that speech comprehension is not limited by the capacity of θ oscillations to adapt to syllabic rate, but by an endogenous decoding process. Copyright © 2017 the authors.
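
    The band-power contrasts reported here have a simple computational core, sketched below with Welch's method in SciPy (the sampling rate is an assumption; band edges follow the abstract; the study's full analysis additionally involves phase patterns and single-trial statistics).

        import numpy as np
        from scipy.signal import welch

        def band_power(eeg, fs, band):
            # Mean PSD within [band[0], band[1]] Hz, per channel.
            f, p = welch(eeg, fs, nperseg=int(2 * fs))
            lo, hi = band
            return p[..., (f >= lo) & (f <= hi)].mean(axis=-1)

        # theta = band_power(trials, fs=500, band=(4, 8))    # tracks syllable rate
        # beta  = band_power(trials, fs=500, band=(14, 21))  # tracks comprehension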

  17. A Comparative Analysis of Fluent and Cerebral Palsied Speech.

    NASA Astrophysics Data System (ADS)

    van Doorn, Janis Lee

    Several features of the acoustic waveforms of fluent and cerebral palsied speech were compared, using six fluent and seven cerebral palsied subjects, with a major emphasis being placed on an investigation of the trajectories of the first three formants (vocal tract resonances). To provide an overall picture which included other acoustic features, fundamental frequency, intensity, speech timing (speech rate and syllable duration), and prevocalization (vocalization prior to initial stop consonants found in cerebral palsied speech) were also investigated. Measurements were made using repetitions of a test sentence which was chosen because it required large excursions of the speech articulators (lips, tongue and jaw), so that differences in the formant trajectories for the fluent and cerebral palsied speakers would be emphasized. The acoustic features were all extracted from the digitized speech waveform (10 kHz sampling rate): the fundamental frequency contours were derived manually, the intensity contours were measured using the signal covariance, speech rate and syllable durations were measured manually, as were the prevocalization durations, while the formant trajectories were derived from short time spectra which were calculated for each 10 ms of speech using linear prediction analysis. Differences which were found in the acoustic features can be summarized as follows. For cerebral palsied speakers, the fundamental frequency contours generally showed inappropriate exaggerated fluctuations, as did some of the intensity contours; the mean fundamental frequencies were either higher or the same as for the fluent subjects; speech rates were reduced, and syllable durations were longer; prevocalization was consistently present at the beginning of the test sentence; formant trajectories were found to have overall reduced frequency ranges, and to contain anomalous transitional features, but it is noteworthy that for any one cerebral palsied subject, the inappropriate trajectory pattern was generally reproducible. The anomalous transitional features took the form of (a) inappropriate transition patterns, (b) reduced frequency excursions, (c) increased transition durations, and (d) decreased maximum rates of frequency change.
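
    Formant estimation via linear prediction, the analysis used for the trajectories above, can be sketched as follows: fit an LPC polynomial to a pre-emphasized, windowed frame and take formant candidates from the angles of its complex roots (the order and thresholds are typical choices, not the study's exact settings).

        import numpy as np
        from scipy.linalg import solve_toeplitz
        from scipy.signal import lfilter

        def lpc_formants(frame, fs, order=10):
            x = lfilter([1.0, -0.97], [1.0], frame) * np.hamming(len(frame))
            # Autocorrelation method: solve the Toeplitz normal equations.
            r = np.correlate(x, x, "full")[len(x) - 1 : len(x) + order]
            a = solve_toeplitz(r[:-1], r[1:])
            roots = np.roots(np.concatenate(([1.0], -a)))
            roots = roots[np.imag(roots) > 0]      # keep one of each conjugate pair
            freqs = np.sort(np.angle(roots) * fs / (2 * np.pi))
            return freqs[freqs > 90.0][:3]         # F1-F3 candidates in Hz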

  18. Speech rate and fluency in children with phonological disorder.

    PubMed

    Novaes, Priscila Maronezi; Nicolielo-Carrilho, Ana Paola; Lopes-Herrera, Simone Aparecida

    2015-01-01

    To identify and describe the speech rate and fluency of children with phonological disorder (PD) with and without speech-language therapy. Thirty children, aged 5-8 years old, both genders, were divided into three groups: experimental group 1 (G1) — 10 children with PD in intervention; experimental group 2 (G2) — 10 children with PD without intervention; and control group (CG) — 10 children with typical development. Speech samples were collected and analyzed according to parameters of specific protocol. The children in CG had higher number of words per minute compared to those in G1, which, in turn, performed better in this aspect compared to children in G2. Regarding the number of syllables per minute, the CG showed the best result. In this aspect, the children in G1 showed better results than those in G2. Comparing children's performance in the assessed groups regarding the tests, those with PD in intervention had higher time of speech sample and adequate speech rate, which may be indicative of greater auditory monitoring of their own speech as a result of the intervention.

  19. Pilot Workload and Speech Analysis: A Preliminary Investigation

    NASA Technical Reports Server (NTRS)

    Bittner, Rachel M.; Begault, Durand R.; Christopher, Bonny R.

    2013-01-01

    Prior research has questioned the effectiveness of speech analysis to measure the stress, workload, truthfulness, or emotional state of a talker. The question remains regarding the utility of speech analysis for restricted vocabularies such as those used in aviation communications. A part-task experiment was conducted in which participants performed Air Traffic Control read-backs in different workload environments. Participant's subjective workload and the speech qualities of fundamental frequency (F0) and articulation rate were evaluated. A significant increase in subjective workload rating was found for high workload segments. F0 was found to be significantly higher during high workload while articulation rates were found to be significantly slower. No correlation was found to exist between subjective workload and F0 or articulation rate.

  20. Predicting clinical decline in progressive agrammatic aphasia and apraxia of speech.

    PubMed

    Whitwell, Jennifer L; Weigand, Stephen D; Duffy, Joseph R; Clark, Heather M; Strand, Edythe A; Machulda, Mary M; Spychalla, Anthony J; Senjem, Matthew L; Jack, Clifford R; Josephs, Keith A

    2017-11-28

    To determine whether baseline clinical and MRI features predict rate of clinical decline in patients with progressive apraxia of speech (AOS). Thirty-four patients with progressive AOS, with AOS either in isolation or in the presence of agrammatic aphasia, were followed up longitudinally for up to 4 visits, with clinical testing and MRI at each visit. Linear mixed-effects regression models including all visits (n = 94) were used to assess baseline clinical and MRI variables that predict rate of worsening of aphasia, motor speech, parkinsonism, and behavior. Clinical predictors included baseline severity and AOS type. MRI predictors included baseline frontal, premotor, motor, and striatal gray matter volumes. More severe parkinsonism at baseline was associated with faster rate of decline in parkinsonism. Patients with predominant sound distortions (AOS type 1) showed faster rates of decline in aphasia and motor speech, while patients with segmented speech (AOS type 2) showed faster rates of decline in parkinsonism. On MRI, we observed trends for fastest rates of decline in aphasia in patients with relatively small left, but preserved right, Broca area and precentral cortex. Bilateral reductions in lateral premotor cortex were associated with faster rates of decline of behavior. No associations were observed between volumes and decline in motor speech or parkinsonism. Rate of decline of each of the 4 clinical features assessed was associated with different baseline clinical and regional MRI predictors. Our findings could help improve prognostic estimates for these patients. © 2017 American Academy of Neurology.

  1. Research on the optoacoustic communication system for speech transmission by variable laser-pulse repetition rates

    NASA Astrophysics Data System (ADS)

    Jiang, Hongyan; Qiu, Hongbing; He, Ning; Liao, Xin

    2018-06-01

    For optoacoustic communication from in-air platforms to submerged apparatus, a method based on speech recognition and variable laser-pulse repetition rates is proposed, which realizes character encoding and transmission for speech. First, the theory and spectral characteristics of laser-generated underwater sound are analyzed; next, character conversion and encoding for speech, as well as the code patterns for laser modulation, are studied; finally, experiments to verify the system design are carried out. Results show that the optoacoustic system, in which laser modulation is controlled by speech-to-character baseband codes, improves flexibility in receiving location for underwater targets as well as real-time performance in information transmission. In the overwater transmitter, a pulse laser is controlled to radiate by speech signals with several repetition rates randomly selected in the range of one to fifty Hz, and then in the underwater receiver the laser pulse repetition rate and data can be acquired from the preamble and information codes of the corresponding laser-generated sound. When the energy of the laser pulse is appropriate, real-time transmission of speaker-independent speech can be realized in this way, which addresses the problem of limited underwater bandwidth and provides a technical approach for air-sea communication.
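
    A toy version of the character-to-pulse-rate mapping is sketched below: each character is split into base-4 digits, each digit selects one of four repetition rates, and the function emits absolute pulse-firing times. The rate set and framing are illustrative assumptions, not the paper's code table.

        import numpy as np

        def encode_text(text, rates_hz=(5.0, 10.0, 20.0, 40.0), pulses_per_symbol=8):
            # The receiver recovers each digit from the measured inter-pulse
            # period of the corresponding laser-generated sound.
            t, times = 0.0, []
            for ch in text.encode("ascii"):
                for digit in (ch >> 6 & 3, ch >> 4 & 3, ch >> 2 & 3, ch & 3):
                    period = 1.0 / rates_hz[digit]
                    for _ in range(pulses_per_symbol):
                        times.append(t)
                        t += period
            return np.array(times)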

  2. Movement goals and feedback and feedforward control mechanisms in speech production

    PubMed Central

    Perkell, Joseph S.

    2010-01-01

    Studies of speech motor control are described that support a theoretical framework in which fundamental control variables for phonemic movements are multi-dimensional regions in auditory and somatosensory spaces. Auditory feedback is used to acquire and maintain auditory goals and in the development and function of feedback and feedforward control mechanisms. Several lines of evidence support the idea that speakers with more acute sensory discrimination acquire more distinct goal regions and therefore produce speech sounds with greater contrast. Feedback modification findings indicate that fluently produced sound sequences are encoded as feedforward commands, and feedback control serves to correct mismatches between expected and produced sensory consequences. PMID:22661828

  3. Movement goals and feedback and feedforward control mechanisms in speech production.

    PubMed

    Perkell, Joseph S

    2012-09-01

    Studies of speech motor control are described that support a theoretical framework in which fundamental control variables for phonemic movements are multi-dimensional regions in auditory and somatosensory spaces. Auditory feedback is used to acquire and maintain auditory goals and in the development and function of feedback and feedforward control mechanisms. Several lines of evidence support the idea that speakers with more acute sensory discrimination acquire more distinct goal regions and therefore produce speech sounds with greater contrast. Feedback modification findings indicate that fluently produced sound sequences are encoded as feedforward commands, and feedback control serves to correct mismatches between expected and produced sensory consequences.

  4. The Reliability of Methodological Ratings for speechBITE Using the PEDro-P Scale

    ERIC Educational Resources Information Center

    Murray, Elizabeth; Power, Emma; Togher, Leanne; McCabe, Patricia; Munro, Natalie; Smith, Katherine

    2013-01-01

    Background: speechBITE (http://www.speechbite.com) is an online database established to help speech and language therapists gain faster access to relevant research that can be used in clinical decision-making. In addition to containing more than 3000 journal references, the database also provides methodological ratings on the PEDro-P (an…

  5. Measuring Severity of Involvement in Speech Delay: Segmental and Whole-Word Measures

    ERIC Educational Resources Information Center

    Flipsen, Peter, Jr.; Hammer, Jill B.; Yost, Kathryn M.

    2005-01-01

    Purpose: This study examined whether any of a series of segmental and whole-word measures of articulatory competence captured more of the variance in impressionistic ratings of severity of involvement in speech delay. It also examined whether knowing the age of the child affected severity ratings. Method: Ten very experienced speech-language…

  6. Individual Variability in Delayed Auditory Feedback Effects on Speech Fluency and Rate in Normally Fluent Adults

    ERIC Educational Resources Information Center

    Chon, HeeCheong; Kraft, Shelly Jo; Zhang, Jingfei; Loucks, Torrey; Ambrose, Nicoline G.

    2013-01-01

    Purpose: Delayed auditory feedback (DAF) is known to induce stuttering-like disfluencies (SLDs) and cause speech rate reductions in normally fluent adults, but the reason for speech disruptions is not fully known, and individual variation has not been well characterized. Studying individual variation in susceptibility to DAF may identify factors…

  7. Relationship between Perceptual Ratings of Nasality and Nasometry in Children/adolescents with Cleft Palate and/or Velopharyngeal Dysfunction

    ERIC Educational Resources Information Center

    Sweeney, Triona; Sell, Debbie

    2008-01-01

    Background: Nasometry has supplemented perceptual assessments of nasality, using speech stimuli that are devoid of nasal consonants. However, such speech stimuli are not representative of conversational speech. A weak relationship has been found in previous studies between perceptual ratings of hypernasality and nasalance scores for passages…

  8. An Empirical Investigation of Mode of Delivery, Ratings of Speech Characteristics, and Perceptions of Speaking Effectiveness.

    ERIC Educational Resources Information Center

    Vallin, Marlene Boyd

    A study tested those theories upon which instruction and curriculum in speech and public communication are based. The study investigated the effect of mode of delivery on ratings of individual speech characteristics, as well as the relationship of these ratings to perceptions of speaking effectiveness in a public communication setting. Twenty-four videotapes of…

  9. Revisiting Speech Rate and Utterance Length Manipulations in Stuttering Speakers

    ERIC Educational Resources Information Center

    Blomgren, Michael; Goberman, Alexander M.

    2008-01-01

    The goal of this study was to evaluate stuttering frequency across a multidimensional (2 x 2) hierarchy of speech performance tasks. Specifically, this study examined the interaction between changes in length of utterance and levels of speech rate stability. Forty-four adult male speakers participated in the study (22 stuttering speakers and 22…

  10. On mobile wireless ad hoc IP video transports

    NASA Astrophysics Data System (ADS)

    Kazantzidis, Matheos

    2006-05-01

    Multimedia transports in wireless, ad-hoc, multi-hop or mobile networks must be capable of obtaining information about the network and adaptively tuning sending and encoding parameters to the network response. Obtaining meaningful metrics to guide a stable congestion control mechanism in the transport (i.e., passive, simple, end-to-end and network-technology independent) is a complex problem. Equally difficult is obtaining a reliable QoS metric that agrees with user perception in a client/server or distributed environment. Existing metrics, objective or subjective, are commonly applied before or after a transmission, to test or report on it, and require access to both original and transmitted frames. In this paper, we propose that efficient and successful video delivery, and the optimization of overall network QoS, require innovation in (a) direct measurement of available and bottleneck capacity for congestion control and (b) a meaningful subjective QoS metric that is dynamically reported to the video sender. Once these are in place, a binomial (stable, fair and TCP-friendly) algorithm can be used to determine the sending rate and other packet-video parameters. An adaptive MPEG codec can then continually test and fit its parameters and its temporal-spatial data-error control balance using the perceived-QoS dynamic feedback. We suggest a new measurement based on a packet dispersion technique that is independent of underlying network mechanisms. We then present a binomial control based on direct measurements. We implement a QoS metric that is known to agree with user perception (MPQM) in a client/server, distributed environment by using predetermined table lookups and characterization of video content.
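    The binomial algorithms referenced here generalize TCP's additive-increase/multiplicative-decrease (AIMD): per round-trip the window grows by alpha/w^k, and on a congestion signal it shrinks by beta*w^l; k = 0, l = 1 recovers AIMD, and the family is TCP-friendly when k + l = 1. A minimal sketch of the update rule follows; the constants are illustrative, and the congestion signal would come from a measurement such as the packet-dispersion technique the paper proposes.

    ```python
    # Sketch: binomial congestion control update (Bansal-Balakrishnan
    # family). k = 0, l = 1 recovers TCP's AIMD; k = l = 0.5 ("SQRT")
    # reduces the window more gently, which suits packet video.
    # alpha/beta and the example loss pattern are illustrative.
    def binomial_update(w, loss, k=0.5, l=0.5, alpha=1.0, beta=0.5):
        if loss:
            w -= beta * (w ** l)     # decrease on a congestion signal
        else:
            w += alpha / (w ** k)    # increase once per round-trip time
        return max(w, 1.0)           # never drop below one packet

    w = 10.0
    for loss in (False, False, True, False):
        w = binomial_update(w, loss)
        print(f"window = {w:.2f}")
    ```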

  11. Verbal Short-Term Memory Span in Speech-Disordered Children: Implications for Articulatory Coding in Short-Term Memory.

    ERIC Educational Resources Information Center

    Raine, Adrian; And Others

    1991-01-01

    Children with speech disorders had lower short-term memory capacity and a smaller word-length effect than control children. Children with speech disorders also had reduced speech-motor activity during rehearsal. Results suggest that speech rate may be a causal determinant of verbal short-term memory capacity. (BC)

  12. Construct-related validity of the TOCS measures: comparison of intelligibility and speaking rate scores in children with and without speech disorders.

    PubMed

    Hodge, Megan M; Gotzke, Carrie L

    2014-01-01

    This study evaluated construct-related validity of the Test of Children's Speech (TOCS). Intelligibility scores obtained using open-set word identification tasks (orthographic transcription) for the TOCS word and sentence tests and rate scores for the TOCS sentence test (words per minute or WPM and intelligible words per minute or IWPM) were compared for a group of 15 adults (18-30 years of age) with normal speech production and three groups of children: 48 3-6 year-olds with typical speech development and neurological histories (TDS), 48 3-6 year-olds with a speech sound disorder of unknown origin and no identified neurological impairment (SSD-UNK), and 22 3-10 year-olds with dysarthria and cerebral palsy (DYS). As expected, mean intelligibility scores and rates increased with age in the TDS group. However, word test intelligibility, WPM and IWPM scores for the 6 year-olds in the TDS group were significantly lower than those for the adults. The DYS group had significantly lower word and sentence test intelligibility and WPM and IWPM scores than the TDS and SSD-UNK groups. Compared to the TDS group, the SSD-UNK group also had significantly lower intelligibility scores for the word and sentence tests, and significantly lower IWPM, but not WPM, scores on the sentence test. The results support the construct-related validity of TOCS as a tool for obtaining intelligibility and rate scores that are sensitive to group differences among 3-6 year-old children with and without speech sound disorders, and among 3+ year-old children with speech disorders, with and without dysarthria. Readers will be able to describe the word and sentence intelligibility and speaking rate performance of children with typically developing speech at age levels of 3, 4, 5 and 6 years, as measured by the Test of Children's Speech, and how these compare with those of adult speakers and two groups of children with speech disorders. They will also recognize what measures on this test differentiate children with speech sound disorders of unknown origin from children with cerebral palsy and dysarthria. Copyright © 2014 Elsevier Inc. All rights reserved.

  13. The effect of group music therapy on mood, speech, and singing in individuals with Parkinson's disease--a feasibility study.

    PubMed

    Elefant, Cochavit; Baker, Felicity A; Lotan, Meir; Lagesen, Simen Krogstie; Skeie, Geir Olve

    2012-01-01

    Parkinson's disease (PD) is a progressive neurodegenerative disorder in which patients exhibit impairments in speech production. Few studies have investigated the influence of music interventions on vocal abilities of individuals with PD. To evaluate the influence of a group voice and singing intervention on speech, singing, and depressive symptoms in individuals with PD. Ten patients diagnosed with PD participated in this one-group, repeated-measures study. Participants received the sixty-minute intervention in a small-group setting once a week for 20 consecutive weeks. Speech and singing quality were acoustically analyzed using a KayPentax Multi-Dimensional Voice Program, voice ability using the Voice Handicap Index (VHI), and depressive symptoms using the Montgomery-Asberg Depression Rating Scale (MADRS). Measures were taken at baseline (Time 1), after 10 weeks of weekly sessions (Time 2), and after 20 weeks of weekly sessions (Time 3). Significant changes were observed for five of the six singing quality outcomes at Times 2 and 3, as well as voice range and the VHI physical subscale at Time 3. No significant changes were found for speaking quality or depressive symptom outcomes; however, there was an absence of decline on speaking quality outcomes over the intervention period. Significant improvements in singing quality and voice range, coupled with the absence of decline in speaking quality, support group singing as a promising intervention for persons with PD. A two-group randomized controlled study is needed to determine whether the intervention contributes to maintenance of speaking quality in persons with PD.

  14. The right hemisphere is highlighted in connected natural speech production and perception.

    PubMed

    Alexandrou, Anna Maria; Saarinen, Timo; Mäkelä, Sasu; Kujala, Jan; Salmelin, Riitta

    2017-05-15

    Current understanding of the cortical mechanisms of speech perception and production stems mostly from studies that focus on single words or sentences. However, it has been suggested that processing of real-life connected speech may rely on additional cortical mechanisms. In the present study, we examined the neural substrates of natural speech production and perception with magnetoencephalography by modulating three central features related to speech: amount of linguistic content, speaking rate and social relevance. The amount of linguistic content was modulated by contrasting natural speech production and perception to speech-like non-linguistic tasks. Meaningful speech was produced and perceived at three speaking rates: normal, slow and fast. Social relevance was probed by having participants attend to speech produced by themselves and an unknown person. These speech-related features were each associated with distinct spatiospectral modulation patterns that involved cortical regions in both hemispheres. Natural speech processing markedly engaged the right hemisphere in addition to the left. In particular, the right temporo-parietal junction, previously linked to attentional processes and social cognition, was highlighted in the task modulations. The present findings suggest that its functional role extends to active generation and perception of meaningful, socially relevant speech. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  15. Adaptation to delayed auditory feedback induces the temporal recalibration effect in both speech perception and production.

    PubMed

    Yamamoto, Kosuke; Kawabata, Hideaki

    2014-12-01

    We ordinarily speak fluently, even though our perceptions of our own voices are disrupted by various environmental acoustic properties. The underlying mechanism of speech is supposed to monitor the temporal relationship between speech production and the perception of auditory feedback, as suggested by a reduction in speech fluency when the speaker is exposed to delayed auditory feedback (DAF). While many studies have reported that DAF influences speech motor processing, its relationship to the temporal tuning effect on multimodal integration, or temporal recalibration, remains unclear. We investigated whether the temporal aspects of both speech perception and production change due to adaptation to the delay between the motor sensation and the auditory feedback. This is a well-used method of inducing temporal recalibration. Participants continually read texts with specific DAF times in order to adapt to the delay. Then, they judged the simultaneity between the motor sensation and the vocal feedback. We measured the rates of speech with which participants read the texts in both the exposure and re-exposure phases. We found that exposure to DAF changed both the rate of speech and the simultaneity judgment, that is, participants' speech gained fluency. Although we also found that a delay of 200 ms appeared to be most effective in decreasing the rates of speech and shifting the distribution on the simultaneity judgment, there was no correlation between these measurements. These findings suggest that both speech motor production and multimodal perception are adaptive to temporal lag but are processed in distinct ways.

  16. Speech serial control in healthy speakers and speakers with hypokinetic or ataxic dysarthria: effects of sequence length and practice

    PubMed Central

    Reilly, Kevin J.; Spencer, Kristie A.

    2013-01-01

    The current study investigated the processes responsible for selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson’s disease, five adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson’s disease are consistent with an impairment of action selection during speech sequence production. The absent length effects observed in the speakers with ataxic dysarthria are consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involve processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria. PMID:24137121

  17. Speech and Voice Response to a Levodopa Challenge in Late-Stage Parkinson's Disease.

    PubMed

    Fabbri, Margherita; Guimarães, Isabel; Cardoso, Rita; Coelho, Miguel; Guedes, Leonor Correia; Rosa, Mario M; Godinho, Catarina; Abreu, Daisy; Gonçalves, Nilza; Antonini, Angelo; Ferreira, Joaquim J

    2017-01-01

    Parkinson's disease (PD) patients are affected by hypokinetic dysarthria, characterized by hypophonia and dysprosody, which worsens with disease progression. Levodopa's (l-dopa) effect on quality of speech is inconclusive; no data are currently available for late-stage PD (LSPD). To assess the modifications of speech and voice in LSPD following an acute l-dopa challenge. LSPD patients [Schwab and England score <50/Hoehn and Yahr stage >3 (MED ON)] performed several vocal tasks before and after an acute l-dopa challenge. The following were assessed: respiratory support for speech, voice quality, stability and variability, speech rate, and motor performance (MDS-UPDRS-III). All voice samples were recorded and analyzed by a speech and language therapist blinded to patients' therapeutic condition using Praat 5.1 software. 24/27 LSPD patients (14 men) succeeded in performing the voice tasks. Median age and disease duration of patients were 79 [IQR: 71.5-81.7] and 14.5 [IQR: 11-15.7] years, respectively. In MED OFF, respiratory breath support and pitch break time of LSPD patients were worse than normative values for non-parkinsonian speakers. A correlation was found between disease duration and voice quality (R = 0.51; p = 0.013) and speech rate (R = -0.55; p = 0.008). l-Dopa significantly improved MDS-UPDRS-III score (20%), with no effect on speech as assessed by clinical rating scales and automated analysis. Speech is severely affected in LSPD. Although l-dopa had some effect on motor performance, including axial signs, speech and voice did not improve. The applicability and efficacy of non-pharmacological treatment for speech impairment should be considered for speech disorder management in PD.

  18. Relationship Among Signal Fidelity, Hearing Loss, and Working Memory for Digital Noise Suppression.

    PubMed

    Arehart, Kathryn; Souza, Pamela; Kates, James; Lunner, Thomas; Pedersen, Michael Syskind

    2015-01-01

    This study considered speech modified by additive babble combined with noise-suppression processing. The purpose was to determine the relative importance of the signal modifications, individual peripheral hearing loss, and individual cognitive capacity on speech intelligibility and speech quality. The participant group consisted of 31 individuals with moderate high-frequency hearing loss ranging in age from 51 to 89 years (mean = 69.6 years). Speech intelligibility and speech quality were measured using low-context sentences presented in babble at several signal-to-noise ratios. Speech stimuli were processed with a binary mask noise-suppression strategy with systematic manipulations of two parameters (error rate and attenuation values). The cumulative effects of signal modification produced by babble and signal processing were quantified using an envelope-distortion metric. Working memory capacity was assessed with a reading span test. Analysis of variance was used to determine the effects of signal processing parameters on perceptual scores. Hierarchical linear modeling was used to determine the role of degree of hearing loss and working memory capacity in individual listener response to the processed noisy speech. The model also considered improvements in envelope fidelity caused by the binary mask and the degradations to envelope caused by error and noise. The participants showed significant benefits in terms of intelligibility scores and quality ratings for noisy speech processed by the ideal binary mask noise-suppression strategy. This benefit was observed across a range of signal-to-noise ratios and persisted when up to a 30% error rate was introduced into the processing. Average intelligibility scores and average quality ratings were well predicted by an objective metric of envelope fidelity. Degree of hearing loss and working memory capacity were significant factors in explaining individual listener's intelligibility scores for binary mask processing applied to speech in babble. Degree of hearing loss and working memory capacity did not predict listeners' quality ratings. The results indicate that envelope fidelity is a primary factor in determining the combined effects of noise and binary mask processing for intelligibility and quality of speech presented in babble noise. Degree of hearing loss and working memory capacity are significant factors in explaining variability in listeners' speech intelligibility scores but not in quality ratings.
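    The noise-suppression strategy manipulated here can be sketched as an ideal binary mask: time-frequency cells whose local SNR exceeds a criterion are kept, the rest are attenuated, and mask errors are injected at a controlled rate. The sketch below, using SciPy's STFT, is a generic illustration of that manipulation, not the study's actual processing chain; the attenuation and error-rate arguments mirror the two manipulated parameters described above.

    ```python
    # Sketch: binary-mask noise suppression with a controlled mask
    # error rate, along the lines of the manipulation described above.
    # Parameter values are illustrative, not the study's settings.
    import numpy as np
    from scipy.signal import stft, istft

    def binary_mask_process(clean, noise, fs, lc_db=-6.0,
                            atten_db=-30.0, error_rate=0.0, seed=0):
        """Mask the noisy mixture using oracle knowledge of the SNR."""
        _, _, S = stft(clean, fs)
        _, _, N = stft(noise, fs)
        _, _, Y = stft(clean + noise, fs)
        snr_db = 20 * np.log10(np.abs(S) / (np.abs(N) + 1e-12))
        mask = snr_db > lc_db                    # ideal binary mask
        flips = np.random.default_rng(seed).random(mask.shape) < error_rate
        mask = np.where(flips, ~mask, mask)      # inject mask errors
        gain = np.where(mask, 1.0, 10 ** (atten_db / 20))
        _, out = istft(Y * gain, fs)
        return out
    ```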

  19. Vocal Age Disguise: The Role of Fundamental Frequency and Speech Rate and Its Perceived Effects.

    PubMed

    Skoog Waller, Sara; Eriksson, Mårten

    2016-01-01

    The relationship between vocal characteristics and perceived age is of interest in various contexts, as is the possibility of affecting age perception through vocal manipulation. A few examples of such situations are when age is staged by actors, when ear witnesses make age assessments based on vocal cues only or when offenders (e.g., online groomers) disguise their voice to appear younger or older. This paper investigates how speakers spontaneously manipulate two age-related vocal characteristics (f0 and speech rate) in an attempt to sound younger versus older than their true age, and if the manipulations correspond to actual age-related changes in f0 and speech rate (Study 1). Further aims of the paper are to determine how successful vocal age disguise is by asking listeners to estimate the age of generated speech samples (Study 2) and to examine whether or not listeners use f0 and speech rate as cues to perceived age. In Study 1, participants from three age groups (20-25, 40-45, and 60-65 years) agreed to read a short text under three voice conditions. There were 12 speakers in each age group (six women and six men). They used their natural voice in one condition, attempted to sound 20 years younger in another and 20 years older in a third condition. In Study 2, 60 participants (listeners) listened to speech samples from the three voice conditions in Study 1 and estimated the speakers' age. Each listener was exposed to all three voice conditions. The results from Study 1 indicated that the speakers increased fundamental frequency (f0) and speech rate when attempting to sound younger and decreased f0 and speech rate when attempting to sound older. Study 2 showed that the voice manipulations had an effect in the sought-after direction, although the achieved mean effect was only 3 years, which is far less than the intended effect of 20 years. Moreover, listeners used speech rate, but not f0, as a cue to speaker age. It was concluded that age disguise by voice can be achieved by naïve speakers even though the perceived effect was smaller than intended.

  20. Revisiting speech rate and utterance length manipulations in stuttering speakers.

    PubMed

    Blomgren, Michael; Goberman, Alexander M

    2008-01-01

    The goal of this study was to evaluate stuttering frequency across a multidimensional (2 x 2) hierarchy of speech performance tasks. Specifically, this study examined the interaction between changes in length of utterance and levels of speech rate stability. Forty-four adult male speakers participated in the study (22 stuttering speakers and 22 non-stuttering speakers). Participants were audio and video recorded while producing a spontaneous speech task and four different experimental speaking tasks. The four experimental speaking tasks involved reading a list of 45 words and a list of 45 phrases two times each. One reading of each list involved speaking at a steady habitual rate (habitual rate tasks) and another reading involved producing each list at a variable speaking rate (variable rate tasks). For the variable rate tasks, participants were directed to produce words or phrases at randomly ordered slow, habitual, and fast rates. The stuttering speakers exhibited significantly more stuttering on the variable rate tasks than on the habitual rate tasks. In addition, the stuttering speakers exhibited significantly more stuttering on the first word of the phrase length tasks compared to the single word tasks. Overall, the results indicated that varying levels of both utterance length and temporal complexity function to modulate stuttering frequency in adult stuttering speakers. Discussion focuses on issues of speech performance according to stuttering severity and possible clinical implications. The reader will learn about and be able to: (1) describe the mediating effects of length of utterance and speech rate on the frequency of stuttering in stuttering speakers; (2) understand the rationale behind multidimensional skill performance matrices; and (3) describe possible applications of motor skill performance matrices to stuttering therapy.

  1. Visual-auditory integration during speech imitation in autism.

    PubMed

    Williams, Justin H G; Massaro, Dominic W; Peel, Natalie J; Bosseler, Alexis; Suddendorf, Thomas

    2004-01-01

    Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional 'mirror neuron' systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a 'virtual' head (Baldi), delivered speech stimuli for identification in auditory, visual or bimodal conditions. Children with ASD were poorer than controls at recognizing stimuli in the unimodal conditions, but once performance on this measure was controlled for, no group difference was found in the bimodal condition. A group of participants with ASD were also trained to develop their speech-reading ability. Training improved visual accuracy and this also improved the children's ability to utilize visual information in their processing of speech. Overall results were compared to predictions from mathematical models based on integration and non-integration, and were most consistent with the integration model. We conclude that, whilst they are less accurate in recognizing stimuli in the unimodal condition, children with ASD show normal integration of visual and auditory speech stimuli. Given that training in recognition of visual speech was effective, children with ASD may benefit from multi-modal approaches in imitative therapy and language training.

  2. Sequential Adaptive Multi-Modality Target Detection and Classification Using Physics Based Models

    DTIC Science & Technology

    2006-09-01

    estimation," R. Raghuram, R. Raich and A.O. Hero, IEEE Intl. Conf. on Acoustics, Speech , and Signal Processing, Toulouse France, June 2006, <http...can then be solved using off-the-shelf classifiers such as radial basis functions, SVM, or kNN classifier structures. When applied to mine detection we...stage waveform selection for adaptive resource constrained state estimation," 2006 IEEE Intl. Conf. on Acoustics, Speech , and Signal Processing

  3. Effect of concurrent walking and interlocutor distance on conversational speech intensity and rate in Parkinson's disease.

    PubMed

    McCaig, Cassandra M; Adams, Scott G; Dykstra, Allyson D; Jog, Mandar

    2016-01-01

    Previous studies have demonstrated a negative effect of concurrent walking and talking on gait in Parkinson's disease (PD), but there is limited information about the effect of concurrent walking on speech production. The present study examined the effect of sitting, standing, and three concurrent walking tasks (slow, normal, fast) on conversational speech intensity and speech rate in fifteen individuals with hypophonia related to idiopathic PD and fourteen age-equivalent controls. Interlocutor (talker-to-talker) distance effects and walking speed were also examined. Concurrent walking was found to produce a significant increase in speech intensity, relative to standing and sitting, in both the control and PD groups. Faster walking produced significantly greater speech intensity than slower walking. Concurrent walking had no effect on speech rate. Concurrent walking and talking produced significant reductions in walking speed in both the control and PD groups. In general, the results of the present study indicate that concurrent walking tasks and the speed of concurrent walking can have a significant positive effect on conversational speech intensity. These positive, "energizing" effects need to be given consideration in future attempts to develop a comprehensive model of speech intensity regulation and they may have important implications for the development of new evaluation and treatment procedures for individuals with hypophonia related to PD. Crown Copyright © 2015. Published by Elsevier B.V. All rights reserved.

  4. Eye’m talking to you: speakers’ gaze direction modulates co-speech gesture processing in the right MTG

    PubMed Central

    Toni, Ivan; Hagoort, Peter; Kelly, Spencer D.; Özyürek, Aslı

    2015-01-01

    Recipients process information from speech and co-speech gestures, but it is currently unknown how this processing is influenced by the presence of other important social cues, especially gaze direction, a marker of communicative intent. Such cues may modulate neural activity in regions associated either with the processing of ostensive cues, such as eye gaze, or with the processing of semantic information, provided by speech and gesture. Participants were scanned (fMRI) while taking part in triadic communication involving two recipients and a speaker. The speaker uttered sentences that were and were not accompanied by complementary iconic gestures. Crucially, the speaker alternated her gaze direction, thus creating two recipient roles: addressed (direct gaze) vs unaddressed (averted gaze) recipient. The comprehension of Speech&Gesture relative to SpeechOnly utterances recruited middle occipital, middle temporal and inferior frontal gyri, bilaterally. The calcarine sulcus and posterior cingulate cortex were sensitive to differences between direct and averted gaze. Most importantly, Speech&Gesture utterances, but not SpeechOnly utterances, produced additional activity in the right middle temporal gyrus when participants were addressed. Marking communicative intent with gaze direction modulates the processing of speech–gesture utterances in cerebral areas typically associated with the semantic processing of multi-modal communicative acts. PMID:24652857

  5. Binaural sluggishness in the perception of tone sequences and speech in noise.

    PubMed

    Culling, J F; Colburn, H S

    2000-01-01

    The binaural system is well-known for its sluggish response to changes in the interaural parameters to which it is sensitive. Theories of binaural unmasking have suggested that detection of signals in noise is mediated by detection of differences in interaural correlation. If these theories are correct, improvements in the intelligibility of speech in favorable binaural conditions are most likely mediated by spectro-temporal variations in interaural correlation of the stimulus which mirror the spectro-temporal amplitude modulations of the speech. However, binaural sluggishness should limit the temporal resolution of the representation of speech recovered by this means. The present study tested this prediction in two ways. First, listeners' masked discrimination thresholds for ascending vs descending pure-tone arpeggios were measured as a function of rate of frequency change in the NoSo and NoSpi binaural configurations. Three-tone arpeggios were presented repeatedly and continuously for 1.6 s, masked by a 1.6-s burst of noise. In a two-interval task, listeners determined the interval in which the arpeggios were ascending. The results showed a binaural advantage of 12-14 dB for NoSpi at 3.3 arpeggios per s (arp/s), which reduced to 3-5 dB at 10.4 arp/s. This outcome confirmed that the discrimination of spectro-temporal patterns in noise is susceptible to the effects of binaural sluggishness. Second, listeners' masked speech-reception thresholds were measured in speech-shaped noise using speech presented at 1, 1.5, and 2 times the original articulation rate. The articulation rate was increased using a phase-vocoder technique which increased all the modulation frequencies in the speech without altering its pitch. Speech-reception thresholds were, on average, 5.2 dB lower for the NoSpi than for the NoSo configuration, at the original articulation rate. This binaural masking release was reduced to 2.8 dB when the articulation rate was doubled, but the most notable effect was a 6-8 dB increase in thresholds with articulation rate for both configurations. These results suggest that higher modulation frequencies in masked signals cannot be temporally resolved by the binaural system, but that the useful modulation frequencies in speech are sufficiently low (<5 Hz) that they are invulnerable to the effects of binaural sluggishness, even at elevated articulation rates.
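    The articulation-rate increase was produced with a phase vocoder, which scales the temporal envelope (and hence all modulation frequencies) without altering pitch. A comparable manipulation is available off the shelf; the sketch below assumes librosa and a placeholder file name.

    ```python
    # Sketch: phase-vocoder time compression that doubles articulation
    # rate without shifting pitch, comparable to the manipulation used.
    import librosa
    import soundfile as sf

    y, sr = librosa.load("speech.wav", sr=None)          # placeholder path
    y_fast = librosa.effects.time_stretch(y, rate=2.0)   # 2x articulation rate
    sf.write("speech_2x.wav", y_fast, sr)
    ```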

  6. The effect of intensive speech rate and intonation therapy on intelligibility in Parkinson's disease.

    PubMed

    Martens, Heidi; Van Nuffelen, Gwen; Dekens, Tomas; Hernández-Díaz Huici, Maria; Kairuz Hernández-Díaz, Hector Arturo; De Letter, Miet; De Bodt, Marc

    2015-01-01

    Most studies on treatment of prosody in individuals with dysarthria due to Parkinson's disease are based on intensive treatment of loudness. The present study investigates the effect of intensive treatment of speech rate and intonation on the intelligibility of individuals with dysarthria due to Parkinson's disease. A one-group pretest-posttest design was used to compare intelligibility, speech rate, and intonation before and after treatment. Participants included eleven Dutch-speaking individuals with predominantly moderate dysarthria due to Parkinson's disease, who received five one-hour treatment sessions per week for three weeks. Treatment focused on lowering speech rate and magnifying the phrase-final intonation contrast between statements and questions. Intelligibility was perceptually assessed using a standardized sentence intelligibility test. Speech rate was automatically assessed during the sentence intelligibility test as well as during a passage reading task and a storytelling task. Intonation was perceptually assessed using a sentence reading task and a sentence repetition task, and also acoustically analyzed in terms of maximum fundamental frequency. After treatment, there was a significant improvement of sentence intelligibility (effect size .83), a significant increase of pause frequency during the passage reading task, a significant improvement of correct listener identification of statements and questions, and a significant increase of the maximum fundamental frequency in the final syllable of questions during both intonation tasks. The findings suggest that participants were more intelligible and more able to manipulate pause frequency and statement-question intonation after treatment. However, the relationship between the change in intelligibility on the one hand and the changes in speech rate and intonation on the other hand is not yet fully understood. Results should be interpreted in light of the research design used. The reader will be able to: (1) describe the effect of intensive speech rate and intonation treatment on intelligibility of speakers with dysarthria due to PD, (2) describe the effect of intensive speech rate treatment on rate manipulation by speakers with dysarthria due to PD, and (3) describe the effect of intensive intonation treatment on manipulation of the phrase-final intonation contrast between statements and questions by speakers with dysarthria due to PD. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. Novel modes and adaptive block scanning order for intra prediction in AV1

    NASA Astrophysics Data System (ADS)

    Hadar, Ofer; Shleifer, Ariel; Mukherjee, Debargha; Joshi, Urvang; Mazar, Itai; Yuzvinsky, Michael; Tavor, Nitzan; Itzhak, Nati; Birman, Raz

    2017-09-01

    The demand for streaming video content is on the rise and growing exponentially. Network bandwidth is very costly, and therefore there is a constant effort to improve video compression rates and enable the sending of reduced data volumes while retaining quality of experience (QoE). One basic feature that exploits the spatial correlation of pixels for video compression is intra prediction, which strongly influences the codec's compression efficiency. Intra prediction enables significant reduction of the intra-frame (I-frame) size and, therefore, contributes to efficient exploitation of bandwidth. In this presentation, we propose new intra-prediction algorithms that improve the AV1 prediction model and provide better compression ratios. Two types of methods are considered: (1) a new scanning-order method that maximizes spatial correlation in order to reduce prediction error; and (2) new intra-prediction modes implemented in AV1. Modern video coding standards, including the AV1 codec, utilize fixed scan orders in processing blocks during intra coding. The fixed scan orders typically result in residual blocks with high prediction error, mainly in blocks with edges. This means that the fixed scan orders cannot fully exploit the content-adaptive spatial correlations between adjacent blocks, so the bitrate after compression tends to be large. To reduce the bitrate induced by inaccurate intra prediction, the proposed approach adaptively chooses the scanning order of blocks, predicting first the blocks with the maximum number of surrounding, already inter-predicted blocks. Using the modified scanning-order method and the new modes reduced the MSE by up to five times compared with the conventional TM mode with raster scan, and by up to two times compared with the conventional CALIC mode with raster scan, depending on the image characteristics (which determine the percentage of blocks predicted with inter prediction, which in turn impacts the efficiency of the new scanning method). For the same cases, the PSNR was shown to improve by up to 7.4 dB and up to 4 dB, respectively. The new modes yielded a 5% improvement in BD-rate over traditionally used modes when run on key frames, which is expected to yield 1% of overall improvement.
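    The ordering criterion can be illustrated with a greedy toy version: repeatedly code the block whose 8-neighborhood contains the most already-processed blocks, so each intra prediction sees the richest available context. The sketch below simplifies the paper's criterion (which counts surrounding inter-predicted blocks) to counting already-coded neighbors on a plain block grid.

    ```python
    # Sketch: greedy adaptive block scan order - always code next the
    # uncoded block with the most already-coded 8-neighbors, breaking
    # ties in raster order. Toy illustration of the ordering criterion.
    def adaptive_scan_order(rows, cols):
        coded, order = set(), []
        remaining = [(r, c) for r in range(rows) for c in range(cols)]

        def coded_neighbors(r, c):
            return sum((r + dr, c + dc) in coded
                       for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                       if (dr, dc) != (0, 0))

        while remaining:
            best = max(remaining,
                       key=lambda rc: (coded_neighbors(*rc), -rc[0], -rc[1]))
            remaining.remove(best)
            coded.add(best)
            order.append(best)
        return order

    print(adaptive_scan_order(3, 3))
    ```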

  8. Systematic Studies of Modified Vocalization: Speech Production Changes During a Variation of Metronomic Speech in Persons Who Do and Do Not Stutter

    PubMed Central

    Davidow, Jason H.; Bothe, Anne K.; Ye, Jun

    2011-01-01

    The most common way to induce fluency using rhythm requires persons who stutter to speak one syllable or one word to each beat of a metronome, but stuttering can also be eliminated when the stimulus is of a particular duration (e.g., 1 s). The present study examined stuttering frequency, speech production changes, and speech naturalness during rhythmic speech that alternated 1 s of reading with 1 s of silence. A repeated-measures design was used to compare data obtained during a control reading condition and during rhythmic reading in 10 persons who stutter (PWS) and 10 normally fluent controls. Ratings for speech naturalness were also gathered from naïve listeners. Results showed that mean vowel duration increased significantly, and the percentage of short phonated intervals decreased significantly, for both groups from the control to the experimental condition. Mean phonated interval length increased significantly for the fluent controls. Mean speech naturalness ratings during the experimental condition were approximately 7 on a 1–9 scale (1 = highly natural; 9 = highly unnatural), and these ratings were significantly correlated with vowel duration and phonated intervals for PWS. The findings indicate that PWS may be altering vocal fold vibration duration to obtain fluency during this rhythmic speech style, and that vocal fold vibration duration may have an impact on speech naturalness during rhythmic speech. Future investigations should examine speech production changes and speech naturalness during variations of this rhythmic condition. Educational Objectives The reader will be able to: (1) describe changes (from a control reading condition) in speech production variables when alternating between 1 s of reading and 1 s of silence, (2) describe which rhythmic conditions have been found to sound and feel the most natural, (3) describe methodological issues for studies about alterations in speech production variables during fluency-inducing conditions, and (4) describe which fluency-inducing conditions have been shown to involve a reduction in short phonated intervals. PMID:21664528

  9. Now you hear it, now you don't: vowel devoicing in Japanese infant-directed speech.

    PubMed

    Fais, Laurel; Kajikawa, Sachiyo; Amano, Shigeaki; Werker, Janet F

    2010-03-01

    In this work, we examine a context in which a conflict arises between two roles that infant-directed speech (IDS) plays: making language structure salient and modeling the adult form of a language. Vowel devoicing in fluent adult Japanese creates violations of the canonical Japanese consonant-vowel word structure pattern by systematically devoicing particular vowels, yielding surface consonant clusters. We measured vowel devoicing rates in a corpus of infant- and adult-directed Japanese speech, for both read and spontaneous speech, and found that the mothers in our study preserve the fluent adult form of the language and mask underlying phonological structure by devoicing vowels in infant-directed speech at virtually the same rates as those for adult-directed speech. The results highlight the complex interrelationships among the modifications to adult speech that comprise infant-directed speech, and that form the input from which infants begin to build the eventual mature form of their native language.

  10. Age-related changes to spectral voice characteristics affect judgments of prosodic, segmental, and talker attributes for child and adult speech.

    PubMed

    Dilley, Laura C; Wieland, Elizabeth A; Gamache, Jessica L; McAuley, J Devin; Redford, Melissa A

    2013-02-01

    As children mature, changes in voice spectral characteristics co-vary with changes in speech, language, and behavior. In this study, spectral characteristics were manipulated to alter the perceived ages of talkers' voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Speech was modified by lowering formants and fundamental frequency, for 5-year-old children's utterances, or raising them, for adult caregivers' utterances. Next, participants differing in awareness of the manipulation (Experiment 1A) or amount of speech-language training (Experiment 1B) made judgments of prosodic, segmental, and talker attributes. Experiment 2 investigated the effects of spectral modification on intelligibility. Finally, in Experiment 3, trained analysts used formal prosody coding to assess prosodic characteristics of spectrally modified and unmodified speech. Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work.

  11. Frontal brain electrical activity (EEG) and heart rate in response to affective infant-directed (ID) speech in 9-month-old infants.

    PubMed

    Santesso, Diane L; Schmidt, Louis A; Trainor, Laurel J

    2007-10-01

    Many studies have shown that infants prefer infant-directed (ID) speech to adult-directed (AD) speech. ID speech functions to aid language learning, obtain and/or maintain an infant's attention, and create emotional communication between the infant and caregiver. We examined psychophysiological responses to ID speech that varied in affective content (i.e., love/comfort, surprise, fear) in a group of typically developing 9-month-old infants. Regional EEG and heart rate were collected continuously during stimulus presentation. We found the pattern of overall frontal EEG power was linearly related to affective intensity of the ID speech, such that EEG power was greatest in response to fear, followed by surprise, then love/comfort; this linear pattern was specific to the frontal region. We also noted that heart rate decelerated in response to ID speech, independent of affective content. As well, infants who were reported by their mothers as temperamentally distressed tended to exhibit greater relative right frontal EEG activity during baseline and in response to affective ID speech, consistent with previous work with visual stimuli and extending it to the auditory modality. Findings are discussed in terms of how increases in frontal EEG power in response to different affective intensity may reflect the cognitive aspects of emotional processing across sensory domains in infancy.

  12. MPCM: a hardware coder for super slow motion video sequences

    NASA Astrophysics Data System (ADS)

    Alcocer, Estefanía; López-Granado, Otoniel; Gutierrez, Roberto; Malumbres, Manuel P.

    2013-12-01

    In the last decade, improvements in VLSI integration and image sensor technologies have led to a frenetic rush to provide image sensors with higher resolutions and faster frame rates. As a result, video devices were designed to capture real-time video at high-resolution formats with frame rates reaching 1,000 fps and beyond. These ultrahigh-speed video cameras are widely used in scientific and industrial applications, such as car crash tests, combustion research, materials research and testing, fluid dynamics, and flow visualization, that demand real-time video capturing at extremely high frame rates with high-definition formats. Therefore, data storage capability, communication bandwidth, processing time, and power consumption are critical parameters that should be carefully considered in their design. In this paper, we propose a fast FPGA implementation of a simple codec called modulo-pulse code modulation (MPCM), which reduces bandwidth requirements by up to a factor of 1.7 at the same image quality compared with PCM coding. This allows current high-speed cameras to stream continuously over a 40-Gbit/s point-to-point Ethernet link.
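    The modulo-PCM principle can be sketched in a few lines: the encoder transmits only the k low-order bits of each sample, and the decoder restores the discarded high-order bits by choosing, among the values congruent to the received residue, the one closest to a prediction (here simply the previous reconstructed sample). This is a generic illustration of the idea, not the paper's FPGA design; reconstruction is exact only while sample-to-sample steps stay below half the modulus.

    ```python
    # Sketch: modulo-PCM on 8-bit samples. The encoder keeps only the
    # k low bits; the decoder restores the high bits by choosing the
    # candidate nearest its prediction (previous reconstructed sample).
    # Generic illustration of the MPCM principle, not the paper's coder.
    def mpcm_encode(samples, k=5):
        return [s % (1 << k) for s in samples]

    def mpcm_decode(residues, k=5, first=128):
        m = 1 << k
        pred, out = first, []
        for r in residues:
            # Candidates congruent to r (mod m) around the prediction.
            base = (pred // m) * m + r
            best = min((base - m, base, base + m),
                       key=lambda c: abs(c - pred))
            best = min(max(best, 0), 255)   # clamp to 8-bit range
            out.append(best)
            pred = best
        return out

    x = [120, 123, 130, 141, 150]
    assert mpcm_decode(mpcm_encode(x)) == x  # exact: steps < 2**(k-1)
    ```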

  13. Frontal Brain Electrical Activity (EEG) and Heart Rate in Response to Affective Infant-Directed (ID) Speech in 9-Month-Old Infants

    ERIC Educational Resources Information Center

    Santesso, Diane L.; Schmidt, Louis A.; Trainor, Laurel J.

    2007-01-01

    Many studies have shown that infants prefer infant-directed (ID) speech to adult-directed (AD) speech. ID speech functions to aid language learning, obtain and/or maintain an infant's attention, and create emotional communication between the infant and caregiver. We examined psychophysiological responses to ID speech that varied in affective…

  14. Increased vocal intensity due to the Lombard effect in speakers with Parkinson's disease: simultaneous laryngeal and respiratory strategies.

    PubMed

    Stathopoulos, Elaine T; Huber, Jessica E; Richardson, Kelly; Kamphaus, Jennifer; DeCicco, Devan; Darling, Meghan; Fulcher, Katrina; Sussman, Joan E

    2014-01-01

    The objective of the present study was to investigate whether speakers with hypophonia, secondary to Parkinson's disease (PD), would increase their vocal intensity when speaking in a noisy environment (Lombard effect). A second objective was to examine the underlying laryngeal and respiratory strategies used to increase vocal intensity. Thirty-three participants with PD were included in the study. Each participant was fitted with the SpeechVive™ device that played multi-talker babble noise into one ear during speech. Using acoustic, aerodynamic and respiratory kinematic techniques, the simultaneous laryngeal and respiratory mechanisms used to regulate vocal intensity were examined. Significant group results showed that most speakers with PD (26/33) were successful at increasing their vocal intensity when speaking in multi-talker babble noise. They were able to support their increased vocal intensity and subglottal pressure with combined strategies from both the laryngeal and respiratory mechanisms. Individual speaker analysis indicated that the particular laryngeal and respiratory interactions differed among speakers. The SpeechVive™ device elicited higher vocal intensities from patients with PD. Speakers used different combinations of laryngeal and respiratory physiologic mechanisms to increase vocal intensity, thus suggesting that the disease process does not uniformly affect the speech subsystems. Readers will be able to: (1) identify speech characteristics of people with Parkinson's disease (PD), (2) identify typical respiratory strategies for increasing sound pressure level (SPL), (3) identify typical laryngeal strategies for increasing SPL, (4) define the Lombard effect. Copyright © 2014 Elsevier Inc. All rights reserved.

  15. AT2 DS II - Accelerator System Design (Part II) - CCC Video Conference

    ScienceCinema

    None

    2017-12-09

    Discussion Session - Accelerator System Design (Part II). Tutors: C. Darve, J. Weisend II, Ph. Lebrun, A. Dabrowski, U. Raich. Video conference with the CERN Control Center. Experts in the field of accelerator science will be available to answer the students' questions. This session will link the CCC and SA (using Codec VC).

  16. Impact of speech presentation level on cognitive task performance: implications for auditory display design.

    PubMed

    Baldwin, Carryl L; Struckman-Johnson, David

    2002-01-15

    Speech displays and verbal response technologies are increasingly being used in complex, high-workload environments that require the simultaneous performance of visual and manual tasks. Examples of such environments include the flight decks of modern aircraft, advanced transport telematics systems providing in-vehicle route guidance and navigational information, and mobile communication equipment in emergency and public safety vehicles. Previous research has established an optimum range for speech intelligibility. However, the potential for variations in presentation levels within this range to affect attentional resources and cognitive processing of speech material has not previously been examined. Results of the current experimental investigation demonstrate that as presentation level increases within this 'optimum' range, participants in high-workload situations make fewer sentence-processing errors and generally respond faster. Processing errors were more sensitive to changes in presentation level than were measures of reaction time. Implications of these findings are discussed in terms of their application for the design of speech communications displays in complex multi-task environments.

  17. Dysfluencies in the speech of adults with intellectual disabilities and reported speech difficulties.

    PubMed

    Coppens-Hofman, Marjolein C; Terband, Hayo R; Maassen, Ben A M; van Schrojenstein Lantman-De Valk, Henny M J; van Zaalen-op't Hof, Yvonne; Snik, Ad F M

    2013-01-01

    In individuals with an intellectual disability, speech dysfluencies are more common than in the general population. In clinical practice, these fluency disorders are generally diagnosed and treated as stuttering rather than cluttering. To characterise the type of dysfluencies in adults with intellectual disabilities and reported speech difficulties, with an emphasis on manifestations of stuttering and cluttering, a distinction intended to help optimise treatment aimed at improving fluency and intelligibility. The dysfluencies in the spontaneous speech of 28 adults (18-40 years; 16 men) with mild and moderate intellectual disabilities (IQs 40-70), who were characterised as poorly intelligible by their caregivers, were analysed using the speech norms for typically developing adults and children. The speakers were subsequently assigned to different diagnostic categories by relating their resulting dysfluency profiles to mean articulatory rate and articulatory rate variability. Twenty-two of the participants (75%) showed clinically significant dysfluencies, with 21% classified as cluttering, 29% as cluttering-stuttering and 25% as clear cluttering at a normal articulatory rate. The characteristic pattern of stuttering did not occur. The dysfluencies in the speech of adults with intellectual disabilities and poor intelligibility show patterns that are specific for this population. Together, the results suggest that in this specific group of dysfluent speakers interventions should be aimed at cluttering rather than stuttering. The reader will be able to (1) describe patterns of dysfluencies in the speech of adults with intellectual disabilities that are specific for this group of people, (2) explain that a high rate of dysfluencies in speech is potentially a major determinant of poor intelligibility in adults with ID and (3) describe suggestions for intervention focusing on cluttering rather than stuttering in dysfluent speakers with ID. Copyright © 2013 Elsevier Inc. All rights reserved.

  18. Microscopic prediction of speech recognition for listeners with normal hearing in noise using an auditory model.

    PubMed

    Jürgens, Tim; Brand, Thomas

    2009-11-01

    This study compares the phoneme recognition performance in speech-shaped noise of a microscopic model for speech recognition with the performance of normal-hearing listeners. "Microscopic" is defined in terms of this model twofold. First, the speech recognition rate is predicted on a phoneme-by-phoneme basis. Second, microscopic modeling means that the signal waveforms to be recognized are processed by mimicking elementary parts of human's auditory processing. The model is based on an approach by Holube and Kollmeier [J. Acoust. Soc. Am. 100, 1703-1716 (1996)] and consists of a psychoacoustically and physiologically motivated preprocessing and a simple dynamic-time-warp speech recognizer. The model is evaluated while presenting nonsense speech in a closed-set paradigm. Averaged phoneme recognition rates, specific phoneme recognition rates, and phoneme confusions are analyzed. The influence of different perceptual distance measures and of the model's a-priori knowledge is investigated. The results show that human performance can be predicted by this model using an optimal detector, i.e., identical speech waveforms for both training of the recognizer and testing. The best model performance is yielded by distance measures which focus mainly on small perceptual distances and neglect outliers.
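    The "simple dynamic-time-warp speech recognizer" is the classical template matcher: a test token is aligned to each stored template by dynamic programming and labeled by the nearest one. Below is a minimal DTW distance over feature sequences (frames x coefficients), offered as a generic sketch rather than the authors' exact recognizer.

    ```python
    # Sketch: dynamic time warping distance between two feature
    # sequences, the kind of simple matcher the model pairs with its
    # psychoacoustically motivated preprocessing.
    import numpy as np

    def dtw_distance(a, b):
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.linalg.norm(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j],      # insertion
                                     D[i, j - 1],      # deletion
                                     D[i - 1, j - 1])  # match
        return float(D[n, m])

    # Classify a token by its nearest template, e.g.:
    # label = min(templates, key=lambda t: dtw_distance(token, t.feats))
    ```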

  19. Severity-Based Adaptation with Limited Data for ASR to Aid Dysarthric Speakers

    PubMed Central

    Mustafa, Mumtaz Begum; Salim, Siti Salwah; Mohamed, Noraini; Al-Qatab, Bassam; Siong, Chng Eng

    2014-01-01

    Automatic speech recognition (ASR) is currently used in many assistive technologies, such as helping individuals with speech impairment in their communication ability. One challenge in ASR for speech-impaired individuals is the difficulty of obtaining a good speech database of impaired speakers for building an effective acoustic model. Because there are very few existing databases of impaired speech, which are also limited in size, the obvious solution for building an acoustic model of impaired speech is to employ adaptation techniques. However, two issues have not been addressed in existing studies on adaptation for speech impairment: (1) identifying the most effective adaptation technique for impaired speech; and (2) the use of suitable source models to build an effective impaired-speech acoustic model. This research investigates these two issues for dysarthria, a type of speech impairment affecting millions of people. We applied both unimpaired and impaired speech as the source model with well-known adaptation techniques such as maximum likelihood linear regression (MLLR) and constrained MLLR (C-MLLR). The recognition accuracy of each impaired-speech acoustic model is measured in terms of word error rate (WER), with further assessments including phoneme insertion, substitution and deletion rates. Unimpaired speech, when combined with limited high-quality impaired-speech data, improves the performance of ASR systems in recognising severely impaired dysarthric speech. The C-MLLR adaptation technique was also found to be better than MLLR in recognising mildly and moderately impaired speech, based on statistical analysis of the WER. Phoneme substitution was found to be the biggest contributor to WER in dysarthric speech at all levels of severity. The results show that speech acoustic models derived from suitable adaptation techniques improve the performance of ASR systems in recognising impaired speech with limited adaptation data. PMID:24466004
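
    WER and the phoneme insertion, substitution and deletion rates reported here all derive from a standard Levenshtein alignment between reference and hypothesis sequences. The sketch below is illustrative, not the study's scoring tool; applied to phoneme strings instead of word lists, the same routine yields the phoneme-level rates.

        import numpy as np

        def wer_counts(ref, hyp):
            """Word error rate plus substitution/deletion/insertion counts
            via Levenshtein alignment of token lists."""
            n, m = len(ref), len(hyp)
            d = np.zeros((n + 1, m + 1), dtype=int)
            d[:, 0] = np.arange(n + 1)
            d[0, :] = np.arange(m + 1)
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    d[i, j] = min(d[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1]),
                                  d[i - 1, j] + 1,      # deletion
                                  d[i, j - 1] + 1)      # insertion
            i, j, S, D, I = n, m, 0, 0, 0
            while i > 0 or j > 0:   # backtrack to split the edit cost
                if i > 0 and j > 0 and d[i, j] == d[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1]):
                    S += ref[i - 1] != hyp[j - 1]
                    i, j = i - 1, j - 1
                elif i > 0 and d[i, j] == d[i - 1, j] + 1:
                    D += 1; i -= 1
                else:
                    I += 1; j -= 1
            return (S + D + I) / n, S, D, I

        # wer_counts("the cat sat".split(), "the hat sat down".split())
        # -> (0.667, 1 substitution, 0 deletions, 1 insertion)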

  20. Acoustic changes in the speech of children with cerebral palsy following an intensive program of dysarthria therapy.

    PubMed

    Pennington, Lindsay; Lombardo, Eftychia; Steen, Nick; Miller, Nick

    2018-01-01

    The speech intelligibility of children with dysarthria and cerebral palsy has been observed to increase following therapy focusing on respiration and phonation. The aim was to determine whether speech intelligibility change following intervention is associated with change in acoustic measures of voice. We recorded 16 young people with cerebral palsy and dysarthria (nine girls; mean age 14 years, SD = 2; nine spastic type, two dyskinetic, four mixed; one Worster-Drought) producing speech in two conditions (single words, connected speech) twice before and twice after therapy focusing on respiration, phonation and rate. In both single-word and connected speech we measured vocal intensity (root mean square, RMS), period-to-period variability (Shimmer APQ, Jitter RAP and PPQ) and harmonics-to-noise ratio (HNR). In connected speech we also measured mean fundamental frequency, utterance duration in seconds, and speech and articulation rate (syllables/s with and without pauses, respectively). All acoustic measures were made using Praat. Intelligibility was calculated in previous research. In single words, statistically significant but very small reductions were observed in period-to-period variability following therapy: Shimmer APQ -0.15 (95% CI = -0.21 to -0.09); Jitter RAP -0.08 (95% CI = -0.14 to -0.01); Jitter PPQ -0.08 (95% CI = -0.15 to -0.01). No changes in period-to-period perturbation across phrases in connected speech were detected. However, changes in connected speech were observed in phrase length, rate and intensity. Following therapy, mean utterance duration increased by 1.11 s (95% CI = 0.37-1.86) when measured with pauses and by 1.13 s (95% CI = 0.40-1.85) when measured without pauses. Articulation rate increased by 0.07 syllables/s (95% CI = 0.02-0.13); speech rate increased by 0.06 syllables/s (95% CI = < 0.01-0.12); and intensity increased by 0.03 Pascals (95% CI = 0.02-0.04). There was a gradual reduction in mean fundamental frequency across all time points (-11.85 Hz, 95% CI = -19.84 to -3.86). Only increases in the intensity of single words (0.37 Pascals, 95% CI = 0.10-0.65) and reductions in fundamental frequency (-0.11 Hz, 95% CI = -0.21 to -0.02) in connected speech were associated with gains in intelligibility. The mean reductions in vocal function impairment observed following therapy were small, and most are unlikely to be clinically significant. Changes in vocal control did not explain improved intelligibility. © 2017 Royal College of Speech and Language Therapists.
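
    The Praat measures listed above (RMS intensity, jitter, shimmer, HNR, mean f0) can be scripted from Python. A hedged sketch using the parselmouth package follows; the file name, the 75-500 Hz pitch range and the RAP/PPQ5/APQ5 variants are illustrative assumptions, not the study's exact settings.

        import parselmouth
        from parselmouth.praat import call

        snd = parselmouth.Sound("utterance.wav")       # placeholder path

        rms = call(snd, "Get root-mean-square", 0, 0)  # intensity in Pascals

        # Period-to-period perturbation needs a PointProcess of glottal pulses.
        pp = call(snd, "To PointProcess (periodic, cc)", 75, 500)
        jitter_rap = call(pp, "Get jitter (rap)", 0, 0, 0.0001, 0.02, 1.3)
        jitter_ppq = call(pp, "Get jitter (ppq5)", 0, 0, 0.0001, 0.02, 1.3)
        shimmer_apq = call([snd, pp], "Get shimmer (apq5)",
                           0, 0, 0.0001, 0.02, 1.3, 1.6)

        hnr = call(snd.to_harmonicity(), "Get mean", 0, 0)          # HNR in dB
        mean_f0 = call(snd.to_pitch(), "Get mean", 0, 0, "Hertz")   # mean f0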

  1. Pulse Vector-Excitation Speech Encoder

    NASA Technical Reports Server (NTRS)

    Davidson, Grant; Gersho, Allen

    1989-01-01

    Proposed pulse vector-excitation speech encoder (PVXC) encodes analog speech signals into digital representation for transmission or storage at rates below 5 kilobits per second. Produces reconstructed speech of high quality, but with less computation than required by comparable speech-encoding systems. Has some characteristics of multipulse linear predictive coding (MPLPC) and of code-excited linear prediction (CELP). System uses mathematical model of vocal tract in conjunction with set of excitation vectors and perceptually based error criterion to synthesize natural-sounding speech.
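
    The "mathematical model of vocal tract" in such analysis-by-synthesis coders is typically an all-pole linear predictive (LPC) filter fitted per frame. As a rough illustration, not the PVXC algorithm itself, the autocorrelation method with the Levinson-Durbin recursion looks like this:

        import numpy as np

        def lpc(frame, order=10):
            """All-pole vocal-tract coefficients for one windowed speech
            frame, via the autocorrelation method and Levinson-Durbin."""
            r = np.correlate(frame, frame, mode="full")[len(frame) - 1:
                                                        len(frame) + order]
            a = np.zeros(order + 1)
            a[0] = 1.0
            err = r[0]
            for i in range(1, order + 1):
                k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coeff.
                a[1:i] = a[1:i] + k * a[i - 1:0:-1]
                a[i] = k
                err *= 1.0 - k * k
            return a, err  # A(z) coefficients and residual (excitation) energy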

  2. Multi-modal Biomarkers to Discriminate Cognitive State

    DTIC Science & Technology

    2015-11-01

    [Record excerpt garbled in extraction; recoverable content only:] The report concerns multi-modal (speech and facial) biomarkers for discriminating cognitive state, citing prior work on speech measures in a large sample of Parkinson's disease patients (J. Speech Hear. Disord. 43(1), 47), the facial action coding work of Ekman, Friesen and Ancoli (1980), and cognitive-load classification by Yin et al., who achieved 77% accuracy using standard vocal features (e.g., mel-frequency based). Contributing authors include researchers from MIT Lincoln Laboratory.

  3. Why Should Speech Rate (Tempo) Be Integrated into Pronunciation Teaching Curriculum

    ERIC Educational Resources Information Center

    Yurtbasi, Meti

    2015-01-01

    The pace of speech, i.e. tempo, can be varied to suit our mood of the moment. Fast speech can convey urgency, whereas slower speech can be used for emphasis. In public speaking, orators produce powerful effects by varying the loudness and pace of their speech. The juxtaposition of very loud and very quiet utterances is a device often used by those trying…

  4. Talker Differences in Clear and Conversational Speech: Perceived Sentence Clarity for Young Adults with Normal Hearing and Older Adults with Hearing Loss

    ERIC Educational Resources Information Center

    Ferguson, Sarah Hargus; Morgan, Shae D.

    2018-01-01

    Purpose: The purpose of this study is to examine talker differences for subjectively rated speech clarity in clear versus conversational speech, to determine whether ratings differ for young adults with normal hearing (YNH listeners) and older adults with hearing impairment (OHI listeners), and to explore effects of certain talker characteristics…

  5. Improving Speech Perception in Noise with Current Focusing in Cochlear Implant Users

    PubMed Central

    Srinivasan, Arthi G.; Padilla, Monica; Shannon, Robert V.; Landsberger, David M.

    2013-01-01

    Cochlear implant (CI) users typically have excellent speech recognition in quiet but struggle with understanding speech in noise. It is thought that broad current spread from stimulating electrodes causes adjacent electrodes to activate overlapping populations of neurons, which results in interactions across adjacent channels. Current focusing has been studied as a way to reduce spread of excitation, and therefore reduce channel interactions. In particular, partial tripolar stimulation has been shown to reduce spread of excitation relative to monopolar stimulation. However, the crucial question is whether this benefit translates to improvements in speech perception. In this study, we compared speech perception in noise with experimental monopolar and partial tripolar speech processing strategies. The two strategies were matched in terms of number of active electrodes, microphone, filterbanks, stimulation rate and loudness (although both strategies used a lower stimulation rate than typical clinical strategies). The results of this study showed a significant improvement in speech perception in noise with partial tripolar stimulation. All subjects benefited from the current focused speech processing strategy. There was a mean improvement in speech recognition threshold of 2.7 dB in a digits in noise task and a mean improvement of 3 dB in a sentences in noise task with partial tripolar stimulation relative to monopolar stimulation. Although the experimental monopolar strategy was worse than the clinical strategy, presumably due to different microphones, frequency allocations and stimulation rates, the experimental partial-tripolar strategy, which had the same changes, showed no acute deficit relative to the clinical strategy. PMID:23467170
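
    The speech recognition thresholds behind the dB improvements above are typically estimated with an adaptive procedure that tracks SNR. Below is a minimal one-up/one-down staircase sketch; trial_fn, the 2-dB step and the reversal rule are assumptions for illustration, not the study's exact procedure.

        def measure_srt(trial_fn, start_snr=10.0, step=2.0, n_reversals=8):
            """1-up/1-down staircase converging on the SNR of 50% correct.
            trial_fn(snr) presents one digits/sentence-in-noise trial at
            the given SNR and returns True if the response was correct."""
            snr, direction, reversals = start_snr, None, []
            while len(reversals) < n_reversals:
                new_direction = -1 if trial_fn(snr) else +1  # harder after correct
                if direction is not None and new_direction != direction:
                    reversals.append(snr)                    # record reversal SNRs
                direction = new_direction
                snr += direction * step
            return sum(reversals[2:]) / (n_reversals - 2)    # mean of late reversals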

  6. Only Behavioral But Not Self-Report Measures of Speech Perception Correlate with Cognitive Abilities.

    PubMed

    Heinrich, Antje; Henshaw, Helen; Ferguson, Melanie A

    2016-01-01

    Good speech perception and communication skills in everyday life are crucial for participation and well-being, and are therefore an overarching aim of auditory rehabilitation. Both behavioral and self-report measures can be used to assess these skills. However, correlations between behavioral and self-report speech perception measures are often low. One possible explanation is that there is a mismatch between the specific situations used in the assessment of these skills in each method, and a more careful matching across situations might improve consistency of results. The role that cognition plays in specific speech situations may also be important for understanding communication, as speech perception tests vary in their cognitive demands. In this study, the role of executive function, working memory (WM) and attention in behavioral and self-report measures of speech perception was investigated. Thirty existing hearing aid users with mild-to-moderate hearing loss aged between 50 and 74 years completed a behavioral test battery with speech perception tests ranging from phoneme discrimination in modulated noise (easy) to words in multi-talker babble (medium) and keyword perception in a carrier sentence against a distractor voice (difficult). In addition, a self-report measure of aided communication, residual disability from the Glasgow Hearing Aid Benefit Profile, was obtained. Correlations between speech perception tests and self-report measures were higher when specific speech situations across both were matched. Cognition correlated with behavioral speech perception test results but not with self-report. Only the most difficult speech perception test, keyword perception in a carrier sentence with a competing distractor voice, engaged executive functions in addition to WM. In conclusion, any relationship between behavioral and self-report speech perception is not mediated by a shared correlation with cognition.

  8. NASA. Lewis Research Center Advanced Modulation and Coding Project: Introduction and overview

    NASA Technical Reports Server (NTRS)

    Budinger, James M.

    1992-01-01

    The Advanced Modulation and Coding Project at LeRC is sponsored by the Office of Space Science and Applications, Communications Division, Code EC, at NASA Headquarters and conducted by the Digital Systems Technology Branch of the Space Electronics Division. Advanced Modulation and Coding is one of three focused technology development projects within the branch's overall Processing and Switching Program. The program consists of industry contracts for developing proof-of-concept (POC) and demonstration model hardware, university grants for analyzing advanced techniques, and in-house integration and testing of performance verification and systems evaluation. The Advanced Modulation and Coding Project is broken into five elements: (1) bandwidth- and power-efficient modems; (2) high-speed codecs; (3) digital modems; (4) multichannel demodulators; and (5) very high-data-rate modems. At least one contract and one grant were awarded for each element.

  9. Multi-channel spatial auditory display for speech communications

    NASA Astrophysics Data System (ADS)

    Begault, Durand; Erbe, Tom

    1993-10-01

    A spatial auditory display for multiple speech communications was developed at NASA-Ames Research Center. Input is spatialized by use of simplified head-related transfer functions, adapted for FIR filtering on Motorola 56001 digital signal processors. Hardware and firmware design implementations are overviewed for the initial prototype developed for NASA-Kennedy Space Center. An adaptive staircase method was used to determine intelligibility levels of four letter call signs used by launch personnel at NASA, against diotic speech babble. Spatial positions at 30 deg azimuth increments were evaluated. The results from eight subjects showed a maximal intelligibility improvement of about 6 to 7 dB when the signal was spatialized to 60 deg or 90 deg azimuth positions.
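
    Rendering a source at a virtual azimuth in this way reduces to one FIR filter per ear. A minimal scipy sketch, in which the HRIR arrays are placeholders for measured (or simplified) head-related impulse responses:

        import numpy as np
        from scipy.signal import lfilter

        def spatialize(mono, hrir_left, hrir_right):
            """FIR-filter a mono speech signal with a left/right pair of
            head-related impulse responses for the desired azimuth."""
            left = lfilter(hrir_left, [1.0], mono)
            right = lfilter(hrir_right, [1.0], mono)
            return np.stack([left, right], axis=-1)  # (samples, 2) stereo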

  11. Deep Learning Based Binaural Speech Separation in Reverberant Environments.

    PubMed

    Zhang, Xueliang; Wang, DeLiang

    2017-05-01

    Speech signal is usually degraded by room reverberation and additive noises in real environments. This paper focuses on separating target speech signal in reverberant conditions from binaural inputs. Binaural separation is formulated as a supervised learning problem, and we employ deep learning to map from both spatial and spectral features to a training target. With binaural inputs, we first apply a fixed beamformer and then extract several spectral features. A new spatial feature is proposed and extracted to complement the spectral features. The training target is the recently suggested ideal ratio mask. Systematic evaluations and comparisons show that the proposed system achieves very good separation performance and substantially outperforms related algorithms under challenging multi-source and reverberant environments.
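
    The ideal ratio mask used as the training target can be computed directly from the premixed speech and noise during training; at test time the network's estimate replaces it. A minimal numpy/scipy sketch (the FFT size and beta exponent are common choices, not necessarily the paper's):

        import numpy as np
        from scipy.signal import stft, istft

        def ideal_ratio_mask(speech, noise, fs=16000, beta=0.5):
            """IRM per time-frequency unit, computable only in training
            where premixed speech and noise are available."""
            _, _, S = stft(speech, fs=fs, nperseg=512)
            _, _, N = stft(noise, fs=fs, nperseg=512)
            ps, pn = np.abs(S) ** 2, np.abs(N) ** 2
            return (ps / (ps + pn + 1e-12)) ** beta  # values in [0, 1]

        def apply_mask(mixture, mask, fs=16000):
            """Scale the mixture STFT by an (estimated) mask, resynthesize."""
            _, _, Y = stft(mixture, fs=fs, nperseg=512)
            _, y = istft(Y * mask, fs=fs, nperseg=512)
            return y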

  12. Methodology for speech assessment in the Scandcleft project--an international randomized clinical trial on palatal surgery: experiences from a pilot study.

    PubMed

    Lohmander, A; Willadsen, E; Persson, C; Henningsson, G; Bowden, M; Hutters, B

    2009-07-01

    To present the methodology for speech assessment in the Scandcleft project and discuss issues arising from a pilot study. Description of methodology and a blinded test of speech assessment. Speech samples and instructions for data collection and analysis, for comparisons of speech outcomes across the five included languages, were developed and tested. Randomly selected video recordings of ten 5-year-old children from each language (n = 50) were included in the project. Speech material consisted of test consonants in single words, connected speech, and syllable chains with nasal consonants. Five experienced speech and language pathologists participated as observers. Narrow phonetic transcription of test consonants was translated into cleft speech characteristics, with ordinal-scale rating of resonance and of perceived velopharyngeal closure (VPC). A velopharyngeal composite score (VPC-sum) was extrapolated from the raw data. Intra-rater agreement comparisons were performed. Intra-rater agreement for the consonant analysis ranged from 53% to 89%; for hypernasality on high vowels in single words the range was 20% to 80%; and the agreement between the VPC-sum and the overall rating of VPC was 78%. Pooling data from speakers of different languages in the same trial and comparing speech outcome across trials seems possible if the assessment of speech concerns consonants and is confined to speech units that are phonetically similar across languages. Agreed conventions and rules are important. A composite variable for perceptual assessment of velopharyngeal function during speech seems usable, whereas the method for hypernasality evaluation requires further testing.

  13. Temporal processing of speech in a time-feature space

    NASA Astrophysics Data System (ADS)

    Avendano, Carlos

    1997-09-01

    The performance of speech communication systems often degrades under realistic environmental conditions. Adverse environmental factors include additive noise sources, room reverberation, and transmission channel distortions. This work studies the processing of speech in the temporal-feature or modulation spectrum domain, aiming to alleviate the effects of such disturbances. Speech reflects the geometry of the vocal organs, and the linguistically dominant component is in the shape of the vocal tract. At any given point in time, the shape of the vocal tract is reflected in the short-time spectral envelope of the speech signal. The rate of change of the vocal tract shape appears to be important for the identification of linguistic components. This rate of change, or the rate of change of the short-time spectral envelope, can be described by the modulation spectrum, i.e. the spectrum of the time trajectories described by the short-time spectral envelope. For a wide range of frequency bands, the modulation spectrum of speech exhibits a maximum at about 4 Hz, the average syllabic rate. Disturbances often have modulation frequency components outside the speech range, and could in principle be attenuated without significantly affecting the range with relevant linguistic information. Early efforts to exploit the modulation spectrum domain (temporal processing), such as the dynamic cepstrum or RASTA processing, used ad hoc processing designs and appear to be suboptimal. As a major contribution, in this dissertation we aim for a systematic data-driven design of temporal processing. First we analytically derive and discuss some properties and merits of temporal processing for speech signals. We attempt to formalize the concept and provide a theoretical background which has been lacking in the field. In the experimental part we apply temporal processing to a number of problems including adaptive noise reduction in cellular telephone environments, reduction of reverberation for speech enhancement, and improvements on automatic recognition of speech degraded by linear distortions and reverberation.
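
    The modulation spectrum described here is just the spectrum of a band's short-time energy trajectory. A rough numpy/scipy sketch, in which the band edges, frame sizes and log compression are illustrative choices:

        import numpy as np
        from scipy.signal import butter, lfilter, spectrogram

        def modulation_spectrum(x, fs, band=(1000.0, 2000.0)):
            """Spectrum of one band's short-time log-energy trajectory;
            for speech this typically peaks near 4 Hz (syllabic rate)."""
            b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
            sub = lfilter(b, a, x)
            # 25-ms frames with a 10-ms hop -> envelope sampled at ~100 Hz.
            _, t, Sxx = spectrogram(sub, fs=fs, nperseg=int(0.025 * fs),
                                    noverlap=int(0.015 * fs))
            env = np.log(Sxx.sum(axis=0) + 1e-12)
            env -= env.mean()
            mod = np.abs(np.fft.rfft(env)) / len(env)
            freqs = np.fft.rfftfreq(len(env), d=t[1] - t[0])
            return freqs, mod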

  14. Auditory and cognitive factors underlying individual differences in aided speech-understanding among older adults

    PubMed Central

    Humes, Larry E.; Kidd, Gary R.; Lentz, Jennifer J.

    2013-01-01

    This study was designed to address individual differences in aided speech understanding among a relatively large group of older adults. The group of older adults consisted of 98 adults (50 female and 48 male) ranging in age from 60 to 86 (mean = 69.2). Hearing loss was typical for this age group and about 90% had not worn hearing aids. All subjects completed a battery of tests, including cognitive (6 measures), psychophysical (17 measures), and speech-understanding (9 measures), as well as the Speech, Spatial, and Qualities of Hearing (SSQ) self-report scale. Most of the speech-understanding measures made use of competing speech and the non-speech psychophysical measures were designed to tap phenomena thought to be relevant for the perception of speech in competing speech (e.g., stream segregation, modulation-detection interference). All measures of speech understanding were administered with spectral shaping applied to the speech stimuli to fully restore audibility through at least 4000 Hz. The measures used were demonstrated to be reliable in older adults and, when compared to a reference group of 28 young normal-hearing adults, age-group differences were observed on many of the measures. Principal-components factor analysis was applied successfully to reduce the number of independent and dependent (speech understanding) measures for a multiple-regression analysis. Doing so yielded one global cognitive-processing factor and five non-speech psychoacoustic factors (hearing loss, dichotic signal detection, multi-burst masking, stream segregation, and modulation detection) as potential predictors. To this set of six potential predictor variables were added subject age, Environmental Sound Identification (ESI), and performance on the text-recognition-threshold (TRT) task (a visual analog of interrupted speech recognition). These variables were used to successfully predict one global aided speech-understanding factor, accounting for about 60% of the variance. PMID:24098273
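
    The reduce-then-regress strategy described above can be illustrated with scikit-learn, using PCA as a loose stand-in for the principal-components factor analysis (which typically also involves rotation) reported in the study; the array shapes and random data are placeholders.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LinearRegression
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        X = rng.normal(size=(98, 23))   # subjects x predictor measures (placeholder)
        y = rng.normal(size=98)         # speech-understanding factor (placeholder)

        Xz = StandardScaler().fit_transform(X)           # z-score each measure
        factors = PCA(n_components=6).fit_transform(Xz)  # six predictor factors

        model = LinearRegression().fit(factors, y)
        print("R^2 (variance explained):", model.score(factors, y))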

  15. Accounting for rate-dependent category boundary shifts in speech perception.

    PubMed

    Bosker, Hans Rutger

    2017-01-01

    The perception of temporal contrasts in speech is known to be influenced by the speech rate in the surrounding context. This rate-dependent perception is suggested to involve general auditory processes because it is also elicited by nonspeech contexts, such as pure tone sequences. Two general auditory mechanisms have been proposed to underlie rate-dependent perception: durational contrast and neural entrainment. This study compares the predictions of these two accounts of rate-dependent speech perception by means of four experiments, in which participants heard tone sequences followed by Dutch target words ambiguous between /ɑs/ "ash" and /a:s/ "bait". Tone sequences varied in the duration of tones (short vs. long) and in the presentation rate of the tones (fast vs. slow). Results show that the duration of preceding tones did not influence target perception in any of the experiments, thus challenging durational contrast as explanatory mechanism behind rate-dependent perception. Instead, the presentation rate consistently elicited a category boundary shift, with faster presentation rates inducing more /a:s/ responses, but only if the tone sequence was isochronous. Therefore, this study proposes an alternative, neurobiologically plausible account of rate-dependent perception involving neural entrainment of endogenous oscillations to the rate of a rhythmic stimulus.

  16. High-frame-rate full-vocal-tract 3D dynamic speech imaging.

    PubMed

    Fu, Maojing; Barlaz, Marissa S; Holtrop, Joseph L; Perry, Jamie L; Kuehn, David P; Shosted, Ryan K; Liang, Zhi-Pei; Sutton, Bradley P

    2017-04-01

    To achieve high temporal frame rate, high spatial resolution and full-vocal-tract coverage for three-dimensional dynamic speech MRI by using low-rank modeling and sparse sampling. Three-dimensional dynamic speech MRI is enabled by integrating a novel data acquisition strategy and an image reconstruction method with the partial separability model: (a) a self-navigated sparse sampling strategy that accelerates data acquisition by collecting high-nominal-frame-rate cone navigators and imaging data within a single repetition time, and (b) a reconstruction method that recovers high-quality speech dynamics from sparse (k,t)-space data by enforcing joint low-rank and spatiotemporal total variation constraints. The proposed method has been evaluated through in vivo experiments. A nominal temporal frame rate of 166 frames per second (defined based on a repetition time of 5.99 ms) was achieved for an imaging volume covering the entire vocal tract with a spatial resolution of 2.2 × 2.2 × 5.0 mm³. Practical utility of the proposed method was demonstrated via both validation experiments and a phonetics investigation. Three-dimensional dynamic speech imaging is possible with full-vocal-tract coverage, high spatial resolution and high nominal frame rate, providing dynamic speech data useful for phonetic studies. Magn Reson Med 77:1619-1629, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
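
    The partial separability (low-rank) constraint at the heart of this reconstruction treats the dynamic series as a Casorati matrix (voxels × frames) of low rank. A toy numpy sketch of that single ingredient, with the rank L and the array shape as assumed placeholders:

        import numpy as np

        def low_rank_project(frames, L=16):
            """Enforce an order-L partial separability model by truncating
            the SVD of the Casorati matrix (voxels x frames). The actual
            reconstruction couples this constraint with spatiotemporal
            total variation and undersampled (k,t)-space data fidelity."""
            nx, ny, nz, nt = frames.shape
            C = frames.reshape(-1, nt)                 # Casorati matrix
            U, s, Vt = np.linalg.svd(C, full_matrices=False)
            C_L = (U[:, :L] * s[:L]) @ Vt[:L, :]       # best rank-L approximation
            return C_L.reshape(nx, ny, nz, nt)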

  17. Perceptual weighting of the envelope and fine structure across frequency bands for sentence intelligibility: Effect of interruption at the syllabic-rate and periodic-rate of speech

    PubMed Central

    Fogerty, Daniel

    2011-01-01

    Listeners often only have fragments of speech available to understand the intended message due to competing background noise. In order to maximize successful speech recognition, listeners must allocate their perceptual resources to the most informative acoustic properties. The speech signal contains temporally-varying acoustics in the envelope and fine structure that are present across the frequency spectrum. Understanding how listeners perceptually weigh these acoustic properties in different frequency regions during interrupted speech is essential for the design of assistive listening devices. This study measured the perceptual weighting of young normal-hearing listeners for the envelope and fine structure in each of three frequency bands for interrupted sentence materials. Perceptual weights were obtained during interruption at the syllabic rate (i.e., 4 Hz) and the periodic rate (i.e., 128 Hz) of speech. Potential interruption interactions with fundamental frequency information were investigated by shifting the natural pitch contour higher relative to the interruption rate. The availability of each acoustic property was varied independently by adding noise at different levels. Perceptual weights were determined by correlating a listener’s performance with the availability of each acoustic property on a trial-by-trial basis. Results demonstrated similar relative weights across the interruption conditions, with emphasis on the envelope in high-frequencies. PMID:21786914
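
    Envelope and temporal fine structure of a frequency band, the two acoustic properties weighted in this study, are conventionally separated via the analytic signal. A minimal sketch, with the band edges and filter order as illustrative assumptions:

        import numpy as np
        from scipy.signal import butter, hilbert, lfilter

        def envelope_and_fine_structure(x, fs, band=(500.0, 1000.0)):
            """Split one band into temporal envelope and temporal fine
            structure via the analytic signal (Hilbert transform)."""
            b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
            sub = lfilter(b, a, x)
            analytic = hilbert(sub)
            envelope = np.abs(analytic)           # slow amplitude modulations
            fine = np.cos(np.angle(analytic))     # unit-amplitude carrier
            return envelope, fine                 # note: envelope * fine ~ sub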

  18. Examining the relationship between speech intensity and self-rated communicative effectiveness in individuals with Parkinson's disease and hypophonia.

    PubMed

    Dykstra, Allyson D; Adams, Scott G; Jog, Mandar

    2015-01-01

    To examine the relationship between speech intensity and self-ratings of communicative effectiveness in speakers with Parkinson's disease (PD) and hypophonia. An additional purpose was to evaluate whether self-ratings of communicative effectiveness made by participants with PD differed from ratings made by primary communication partners. Thirty participants with PD and 15 healthy older adults completed the Communication Effectiveness Survey. Thirty primary communication partners rated the communicative effectiveness of their partner with PD. Speech intensity was calculated for participants with PD and control participants based on conversational utterances. Results revealed significant differences between groups in conversational speech intensity (p = .001). Participants with PD self-rated communicative effectiveness significantly lower than control participants (p < .001). Correlational analyses revealed a small, non-significant relationship between speech intensity and communicative effectiveness for participants with PD (r = 0.298, p = .110) and control participants (r = 0.327, p = .234). Self-ratings of communicative effectiveness made by participants with PD were not significantly different from ratings made by primary communication partners (p = .20). Obtaining information on communicative effectiveness may help to broaden outcome measurement and may aid in the provision of educational strategies. Findings also suggest that communicative effectiveness may be a separate and distinct construct that cannot necessarily be predicted from the severity of hypophonia. Copyright © 2015 Elsevier Inc. All rights reserved.

  19. Public and private language ideologies as reflected in language attitudes on the Island of Korcula.

    PubMed

    Sujoldzić, Anita; Simicić, Lucija

    2013-06-01

    Since languages are such powerful means of group identification, they may be considered constitutive of communities. Attitudes expressed toward certain linguistic varieties may thus be perceived as attitudes held toward the respective community members. However, as attitudes are not always easily accessible, and are rarely one-dimensional but rather multi-layered, insight into overt (publicly proclaimed) and covert (privately held) ideologies can enhance understanding of language attitudes and their meaning. This paper presents an analysis of these two types of attitudes held by adolescents in the three most populated places on the island of Korcula, Croatia. The analysis is based on results obtained by means of a questionnaire eliciting, among other things, overt attitudes toward six local, regional and supra-regional varieties, and covert attitudes toward the judges' local speech and the standard variety of Croatian. Although the results confirm some expected tendencies in the evaluation of different varieties, a subsequently conducted analysis of speech recognition rates offers some valuable insights and interesting implications for further interpretation of the results.

  20. The effect of emotion on articulation rate in persistence and recovery of childhood stuttering.

    PubMed

    Erdemir, Aysu; Walden, Tedra A; Jefferson, Caswell M; Choi, Dahye; Jones, Robin M

    2018-06-01

    This study investigated the possible association between emotional processes and articulation rate in pre-school age children who stutter and persist (persisting), children who stutter and recover (recovered) and children who do not stutter (nonstuttering). The participants were ten persisting, ten recovered, and ten nonstuttering children between the ages of 3 and 5 years, who were classified as persisting, recovered, or nonstuttering approximately 2-2.5 years after the experimental testing took place. The children were exposed to three emotionally arousing video clips (baseline, positive and negative) and produced a narrative based on a text-free storybook following each video clip. From the audio recordings of these narratives, individual utterances were transcribed and articulation rates were calculated. Results indicated that persisting children exhibited significantly slower articulation rates following the negative emotion condition, unlike recovered and nonstuttering children, whose articulation rates were not affected by either of the two emotion-inducing conditions. Moreover, all stuttering children displayed faster rates during fluent compared to stuttered speech; however, the recovered children were significantly faster than the persisting children during fluent speech. Negative emotion plays a detrimental role in the speech-motor control processes of children who persist, whereas children who eventually recover seem to exhibit a relatively more stable and mature speech-motor system. This suggests that complex interactions between speech-motor and emotional processes are at play in stuttering recovery and persistence, and that articulation rates following negative emotion, or during stuttered versus fluent speech, might be considered potential factors for prospectively predicting persistence and recovery from stuttering. Copyright © 2017 Elsevier Inc. All rights reserved.
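
    Articulation rate and speech rate are defined here, as elsewhere in this collection, as syllables per second excluding and including pauses, respectively. A trivial sketch of the computation (the function name and inputs are illustrative):

        def rates(n_syllables, utterance_duration, pause_durations):
            """Speech rate (incl. pauses) and articulation rate (excl.
            pauses), both in syllables per second, for one utterance."""
            speaking_time = utterance_duration - sum(pause_durations)
            return {"speech_rate": n_syllables / utterance_duration,
                    "articulation_rate": n_syllables / speaking_time}

        # rates(12, 4.0, [0.4, 0.6]) -> speech 3.0, articulation 4.0 syll/s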

  1. A speech-controlled environmental control system for people with severe dysarthria.

    PubMed

    Hawley, Mark S; Enderby, Pam; Green, Phil; Cunningham, Stuart; Brownsell, Simon; Carmichael, James; Parker, Mark; Hatzis, Athanassios; O'Neill, Peter; Palmer, Rebecca

    2007-06-01

    Automatic speech recognition (ASR) can provide a rapid means of controlling electronic assistive technology. Off-the-shelf ASR systems function poorly for users with severe dysarthria because of the increased variability of their articulations. We have developed a limited-vocabulary, speaker-dependent speech recognition application which has greater tolerance to variability of speech, coupled with a computerised training package which assists dysarthric speakers to improve the consistency of their vocalisations and provides more data for recogniser training. These applications, and their implementation as the interface for a speech-controlled environmental control system (ECS), are described. The results of field trials to evaluate the training program and the speech-controlled ECS are presented. The user-training phase increased the recognition rate from 88.5% to 95.4% (p < 0.001). Recognition rates were good for people with even the most severe dysarthria in everyday usage in the home (mean word recognition rate 86.9%). Speech-controlled ECS were less accurate (mean task completion accuracy 78.6% versus 94.8%) but were faster to use than switch-scanning systems, even taking into account the need to repeat unsuccessful operations (mean task completion time 7.7 s versus 16.9 s, p < 0.001). It is concluded that a speech-controlled ECS is a viable alternative to switch-scanning systems for some people with severe dysarthria and would lead, in many cases, to more efficient control of the home.

  2. Age-related changes to spectral voice characteristics affect judgments of prosodic, segmental, and talker attributes for child and adult speech

    PubMed Central

    Dilley, Laura C.; Wieland, Elizabeth A.; Gamache, Jessica L.; McAuley, J. Devin; Redford, Melissa A.

    2013-01-01

    Purpose: As children mature, changes in voice spectral characteristics covary with changes in speech, language, and behavior. Spectral characteristics were manipulated to alter the perceived ages of talkers' voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Method: Speech was modified by lowering formants and fundamental frequency, for 5-year-old children's utterances, or raising them, for adult caregivers' utterances. Next, participants differing in awareness of the manipulation (Exp. 1a) or amount of speech-language training (Exp. 1b) made judgments of prosodic, segmental, and talker attributes. Exp. 2 investigated the effects of spectral modification on intelligibility. Finally, in Exp. 3 trained analysts used formal prosody coding to assess prosodic characteristics of spectrally modified and unmodified speech. Results: Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Conclusions: Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work. PMID:23275414
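
    A rough way to mimic the joint lowering (or raising) of formants and f0 is whole-spectrum pitch shifting, since resample-based shifting moves the spectral envelope along with the pitch. A sketch using librosa, with the file name and shift amount as placeholders; the study's manipulation was more controlled:

        import librosa

        # n_steps is in semitones. Because this simple resample-based method
        # rescales the whole spectrum, formants move together with f0 --
        # only a crude stand-in for independent spectral manipulations.
        y, sr = librosa.load("child_utterance.wav", sr=None)
        y_older_sounding = librosa.effects.pitch_shift(y, sr=sr, n_steps=-4)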

  3. Effect of Dialect on Identification and Severity of Speech Impairment in Indigenous Australian Children

    ERIC Educational Resources Information Center

    Toohill, Bethany J.; Mcleod, Sharynne; Mccormack, Jane

    2012-01-01

    This study investigated the effect of dialectal difference on identification and rating of severity of speech impairment in children from Indigenous Australian backgrounds. The speech of 15 Indigenous Australian children identified by their parents/caregivers and teachers as having "difficulty talking and making speech sounds" was…

  4. Breathing-Impaired Speech after Brain Haemorrhage: A Case Study

    ERIC Educational Resources Information Center

    Heselwood, Barry

    2007-01-01

    Results are presented from an auditory and acoustic analysis of the speech of an adult male with impaired prosody and articulation due to brain haemorrhage. They show marked effects on phonation, speech rate and articulator velocity, and a speech rhythm disrupted by "intrusive" stresses. These effects are discussed in relation to the speaker's…

  5. Speech Intelligibility and Personality Peer-Ratings of Young Adults with Cochlear Implants

    ERIC Educational Resources Information Center

    Freeman, Valerie

    2018-01-01

    Speech intelligibility, or how well a speaker's words are understood by others, affects listeners' judgments of the speaker's competence and personality. Deaf cochlear implant (CI) users vary widely in speech intelligibility, and their speech may have a noticeable "deaf" quality, both of which could evoke negative stereotypes or…

  6. Speech Characteristics and Intelligibility in Adults with Mild and Moderate Intellectual Disabilities

    PubMed Central

    Coppens-Hofman, Marjolein C.; Terband, Hayo; Snik, Ad F.M.; Maassen, Ben A.M.

    2017-01-01

    Purpose: Adults with intellectual disabilities (ID) often show reduced speech intelligibility, which affects their social interaction skills. This study aims to establish the main predictors of this reduced intelligibility in order to ultimately optimise management. Method: Spontaneous speech and picture-naming tasks were recorded in 36 adults with mild or moderate ID. Twenty-five naïve listeners rated the intelligibility of the spontaneous speech samples. Performance on the picture-naming task was analysed by means of a phonological error analysis based on expert transcriptions. Results: The transcription analyses showed that the phonemic and syllabic inventories of the speakers were complete. However, multiple errors at the phonemic and syllabic level were found. The frequencies of specific types of errors were related to intelligibility and quality ratings. Conclusions: The development of the phonemic and syllabic repertoire appears to be completed in adults with mild-to-moderate ID. The charted speech difficulties can be interpreted as indicating speech motor control and planning difficulties. These findings may aid the development of diagnostic tests and speech therapies aimed at improving speech intelligibility in this specific group. PMID:28118637

  7. Patterns of Post-Stroke Brain Damage that Predict Speech Production Errors in Apraxia of Speech and Aphasia Dissociate

    PubMed Central

    Basilakos, Alexandra; Rorden, Chris; Bonilha, Leonardo; Moser, Dana; Fridriksson, Julius

    2015-01-01

    Background and Purpose: Acquired apraxia of speech (AOS) is a motor speech disorder caused by brain damage. AOS often co-occurs with aphasia, a language disorder in which patients may also demonstrate speech production errors. The overlap of speech production deficits in both disorders has raised questions regarding whether AOS emerges from a unique pattern of brain damage or as a sub-element of the aphasic syndrome. The purpose of this study was to determine whether speech production errors in AOS and aphasia are associated with distinctive patterns of brain injury. Methods: Forty-three patients with a history of a single left-hemisphere stroke underwent comprehensive speech and language testing. The Apraxia of Speech Rating Scale was used to rate speech errors specific to AOS versus speech errors that can also be associated with AOS and/or aphasia. Localized brain damage was identified using structural MRI, and voxel-based lesion-impairment mapping was used to evaluate the relationship between speech errors specific to AOS, those that can occur in AOS and/or aphasia, and brain damage. Results: The pattern of brain damage associated with AOS was most strongly associated with damage to cortical motor regions, with additional involvement of somatosensory areas. Speech production deficits that could be attributed to AOS and/or aphasia were associated with damage to the temporal lobe and the inferior pre-central frontal regions. Conclusion: AOS likely occurs in conjunction with aphasia due to the proximity of the brain areas supporting speech and language, but the neurobiological substrate for each disorder differs. PMID:25908457

  8. Influence of speech sample on perceptual rating of hypernasality.

    PubMed

    Medeiros, Maria Natália Leite de; Fukushiro, Ana Paula; Yamashita, Renata Paciello

    2016-07-07

    To investigate the influence of speech sample type (spontaneous conversation versus sentence repetition) on intra- and inter-rater reliability of hypernasality ratings. One hundred and twenty audio-recorded speech samples (60 containing spontaneous conversation and 60 containing repeated sentences) of individuals with repaired cleft palate ± lip, of both genders, aged between 6 and 52 years (mean = 21 ± 10), were selected and edited. Three experienced speech and language pathologists rated hypernasality according to their own criteria using a 4-point scale: 1 = absence of hypernasality, 2 = mild hypernasality, 3 = moderate hypernasality and 4 = severe hypernasality; first in the spontaneous speech samples and, 30 days later, in the sentence-repetition samples. Intra- and inter-rater agreements were calculated for both speech samples and were statistically compared by the Z test at a significance level of 5%. Comparison of intra-rater agreement between the two speech samples showed an increase in the coefficients obtained for sentence repetition relative to those obtained for spontaneous conversation. Comparison of inter-rater agreement showed no significant difference among the three raters for the two speech samples. Sentence repetition improved intra-rater reliability of perceptual judgement of hypernasality; however, the speech sample had no influence on reliability among different raters.

  9. Unified transform architecture for AVC, AVS, VC-1 and HEVC high-performance codecs

    NASA Astrophysics Data System (ADS)

    Dias, Tiago; Roma, Nuno; Sousa, Leonel

    2014-12-01

    A unified architecture for fast and efficient computation of the set of two-dimensional (2-D) transforms adopted by the most recent state-of-the-art digital video standards is presented in this paper. In contrast to other designs with similar functionality, the presented architecture is supported by a scalable, modular and completely configurable processing structure. This flexible structure not only allows the architecture to be easily reconfigured to support different transform kernels, but also permits resizing to efficiently support transforms of different orders (e.g. order-4, order-8, order-16 and order-32). Consequently, it is not only highly suitable for realizing high-performance multi-standard transform cores, but also offers highly efficient implementations of specialized processing structures addressing only the reduced subset of transforms used by a specific video standard. The experimental results obtained by prototyping several configurations of this processing structure in a Xilinx Virtex-7 FPGA show the superior performance and hardware efficiency levels provided by the proposed unified architecture for the implementation of transform cores for the Advanced Video Coding (AVC), Audio Video coding Standard (AVS), VC-1 and High Efficiency Video Coding (HEVC) standards. In addition, the results demonstrate the ability of this processing structure to realize multi-standard transform cores supporting all of the standards mentioned above, capable of processing the 8k Ultra High Definition Television (UHDTV) video format (7,680 × 4,320 at 30 fps) in real time.
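
    The kind of kernel such a unified datapath computes can be illustrated with the 4-point integer core transform of HEVC, whose larger-order matrices embed the same structure; the sketch below simplifies the staged rounding and right-shifts that the standard actually specifies.

        import numpy as np

        # HEVC 4-point integer core transform matrix.
        C4 = np.array([[64,  64,  64,  64],
                       [83,  36, -36, -83],
                       [64, -64, -64,  64],
                       [36, -83,  83, -36]], dtype=np.int64)

        def forward_4x4(block):
            """Separable 2-D transform: rows, then columns."""
            return C4 @ block.astype(np.int64) @ C4.T

        def inverse_4x4(coeffs):
            """Rows of C4 are mutually orthogonal with nearly equal norm
            (about 4 * 64**2 = 16384), so C4.T inverts C4 up to scale;
            the standard uses exact staged right-shifts instead."""
            return (C4.T @ coeffs @ C4) / 16384.0 ** 2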

  10. Vocal Age Disguise: The Role of Fundamental Frequency and Speech Rate and Its Perceived Effects

    PubMed Central

    Skoog Waller, Sara; Eriksson, Mårten

    2016-01-01

    The relationship between vocal characteristics and perceived age is of interest in various contexts, as is the possibility to affect age perception through vocal manipulation. A few examples of such situations are when age is staged by actors, when ear witnesses make age assessments based on vocal cues only or when offenders (e.g., online groomers) disguise their voice to appear younger or older. This paper investigates how speakers spontaneously manipulate two age related vocal characteristics (f0 and speech rate) in attempt to sound younger versus older than their true age, and if the manipulations correspond to actual age related changes in f0 and speech rate (Study 1). Further aims of the paper is to determine how successful vocal age disguise is by asking listeners to estimate the age of generated speech samples (Study 2) and to examine whether or not listeners use f0 and speech rate as cues to perceived age. In Study 1, participants from three age groups (20–25, 40–45, and 60–65 years) agreed to read a short text under three voice conditions. There were 12 speakers in each age group (six women and six men). They used their natural voice in one condition, attempted to sound 20 years younger in another and 20 years older in a third condition. In Study 2, 60 participants (listeners) listened to speech samples from the three voice conditions in Study 1 and estimated the speakers’ age. Each listener was exposed to all three voice conditions. The results from Study 1 indicated that the speakers increased fundamental frequency (f0) and speech rate when attempting to sound younger and decreased f0 and speech rate when attempting to sound older. Study 2 showed that the voice manipulations had an effect in the sought-after direction, although the achieved mean effect was only 3 years, which is far less than the intended effect of 20 years. Moreover, listeners used speech rate, but not f0, as a cue to speaker age. It was concluded that age disguise by voice can be achieved by naïve speakers even though the perceived effect was smaller than intended. PMID:27917144

  11. Classification of speech and language profiles in 4-year old children with cerebral palsy: A prospective preliminary study

    PubMed Central

    Hustad, Katherine C.; Gorton, Kristin; Lee, Jimin

    2010-01-01

    Purpose: Little is known about the speech and language abilities of children with cerebral palsy (CP) and there is currently no system for classifying speech and language profiles. Such a system would have epidemiological value and would have the potential to advance the development of interventions that improve outcomes. In this study, we propose and test a preliminary speech and language classification system by quantifying how well speech and language data differentiate among children classified into different hypothesized profile groups. Method: Speech and language assessment data were collected in a laboratory setting from 34 children with CP (18 males; 16 females) who were a mean age of 54 months (SD 1.8 months). Measures of interest were vowel area, speech rate, language comprehension scores, and speech intelligibility ratings. Results: Canonical discriminant function analysis showed that three functions accounted for 100% of the variance among profile groups, with speech variables accounting for 93% of the variance. Classification agreement varied from 74% to 97% using four different classification paradigms. Conclusions: Results provide preliminary support for the classification of speech and language abilities of children with CP into four initial profile groups. Further research is necessary to validate the full classification system. PMID:20643795

  12. Recovering With Acquired Apraxia of Speech: The First 2 Years.

    PubMed

    Haley, Katarina L; Shafer, Jennifer N; Harmon, Tyson G; Jacks, Adam

    2016-12-01

    This study was intended to document speech recovery for 1 person with acquired apraxia of speech quantitatively and on the basis of her lived experience. The second author sustained a traumatic brain injury that resulted in acquired apraxia of speech. Over a 2-year period, she documented her recovery through 22 video-recorded monologues. We analyzed these monologues using a combination of auditory perceptual, acoustic, and qualitative methods. Recovery was evident for all quantitative variables examined. For speech sound production, the recovery was most prominent during the first 3 months, but slower improvement was evident for many months. Measures of speaking rate, fluency, and prosody changed more gradually throughout the entire period. A qualitative analysis of topics addressed in the monologues was consistent with the quantitative speech recovery and indicated a subjective dynamic relationship between accuracy and rate, an observation that several factors made speech sound production variable, and a persisting need for cognitive effort while speaking. Speech features improved over an extended time, but the recovery trajectories differed, indicating dynamic reorganization of the underlying speech production system. The relationship among speech dimensions should be examined in other cases and in population samples. The combination of quantitative and qualitative analysis methods offers advantages for understanding clinically relevant aspects of recovery.

  13. Perception of speech rhythm in second language: the case of rhythmically similar L1 and L2

    PubMed Central

    Ordin, Mikhail; Polyanskaya, Leona

    2015-01-01

    We investigated the perception of developmental changes in timing patterns that happen in the course of second language (L2) acquisition, provided that the native and the target languages of the learner are rhythmically similar (German and English). It was found that speech rhythm in L2 English produced by German learners becomes increasingly stress-timed as acquisition progresses. This development is captured by the tempo-normalized rhythm measures of durational variability. Advanced learners also deliver speech at a faster rate. However, when native speakers have to classify the timing patterns characteristic of L2 English of German learners at different proficiency levels, they attend to speech rate cues and ignore the differences in speech rhythm. PMID:25859228
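
    One widely used tempo-normalized rhythm metric of the family referred to above is the normalized pairwise variability index (nPVI) over successive vocalic interval durations; higher values indicate more stress-timed rhythm. A minimal sketch:

        def npvi(durations):
            """nPVI over successive interval durations (seconds)."""
            pairs = zip(durations[:-1], durations[1:])
            return 100.0 * sum(abs(a - b) / ((a + b) / 2.0)
                               for a, b in pairs) / (len(durations) - 1)

        # npvi([0.08, 0.15, 0.07, 0.21])  # alternating long/short -> high nPVI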

  14. The Ways of the Hand: A Study of Hand Function among Blind, Visually Impaired and Visually Impaired Multi-Handicapped Children and Adolescents.

    ERIC Educational Resources Information Center

    Rogow, Sally M.

    1987-01-01

    The manual development of 148 blind, visually impaired, and visually impaired multi-handicapped students, aged 3-19, was studied. Results indicated a significant relationship between object manipulation and speech, and an inverse relationship between object manipulation and stereotypic hand mannerisms. Optimal development of manual functions and…

  15. Primer for Perception: A Manual Designed to Help Professionals, Para-Professionals and Volunteers Help Children "Learn to Learn".

    ERIC Educational Resources Information Center

    Goldzer, Beatrice F.

    This manual for use by professionals, paraprofessionals, and tutors provides 10 multi-level, multi-purpose units for teaching children with reading, writing, or speech problems. The units were designed for use with preschool through sixth-grade students and consist of games, exercises, drills, evaluation, and suggestions for activities. The manual…

  16. Vector Adaptive/Predictive Encoding Of Speech

    NASA Technical Reports Server (NTRS)

    Chen, Juin-Hwey; Gersho, Allen

    1989-01-01

    Vector adaptive/predictive technique for digital encoding of speech signals yields decoded speech of very good quality after transmission at coding rate of 9.6 kb/s and of reasonably good quality at 4.8 kb/s. Requires 3 to 4 million multiplications and additions per second. Combines advantages of adaptive/predictive coding and of code-excited linear prediction, which yields speech of high quality but requires 600 million multiplications and additions per second at encoding rate of 4.8 kb/s. Vector adaptive/predictive coding technique bridges gaps in performance and complexity between adaptive/predictive coding and code-excited linear prediction.

  17. Auditory Masking Effects on Speech Fluency in Apraxia of Speech and Aphasia: Comparison to Altered Auditory Feedback

    PubMed Central

    Haley, Katarina L.

    2015-01-01

    Purpose: To study the effects of masked auditory feedback (MAF) on speech fluency in adults with aphasia and/or apraxia of speech (APH/AOS). We hypothesized that adults with AOS would increase speech fluency when speaking with noise. Altered auditory feedback (AAF; i.e., delayed/frequency-shifted feedback) was included as a control condition not expected to improve speech fluency. Method: Ten participants with APH/AOS and 10 neurologically healthy (NH) participants were studied under both feedback conditions. To allow examination of individual responses, we used an ABACA design. Effects were examined on syllable rate, disfluency duration, and vocal intensity. Results: Seven of 10 APH/AOS participants increased fluency with masking by increasing rate, decreasing disfluency duration, or both. In contrast, none of the NH participants increased speaking rate with MAF. In the AAF condition, only 1 APH/AOS participant increased fluency. Four APH/AOS participants and 8 NH participants slowed their rate with AAF. Conclusions: Speaking with MAF appears to increase fluency in a subset of individuals with APH/AOS, indicating that overreliance on auditory feedback monitoring may contribute to their disorder presentation. The distinction between responders and nonresponders was not linked to AOS diagnosis, so additional work is needed to develop hypotheses for candidacy and underlying control mechanisms. PMID:26363508

  18. Speech intelligibility in complex acoustic environments in young children

    NASA Astrophysics Data System (ADS)

    Litovsky, Ruth

    2003-04-01

    While the auditory system undergoes tremendous maturation during the first few years of life, it has become clear that in complex scenarios when multiple sounds occur and when echoes are present, children's performance is significantly worse than that of their adult counterparts. The ability of children (3-7 years of age) to understand speech in a simulated multi-talker environment and to benefit from spatial separation of the target and competing sounds was investigated. In these studies, competing sources vary in number, location, and content (speech, modulated or unmodulated speech-shaped noise and time-reversed speech). The acoustic spaces were also varied in size and amount of reverberation. Finally, children with chronic otitis media who received binaural training were tested pre- and post-training on a subset of conditions. Results indicated the following. (1) Children experienced significantly more masking than adults, even in the simplest conditions tested. (2) When the target and competing sounds were spatially separated speech intelligibility improved, but the amount varied with age, type of competing sound, and number of competitors. (3) In a large reverberant classroom there was no benefit of spatial separation. (4) Binaural training improved speech intelligibility performance in children with otitis media. Future work includes similar studies in children with unilateral and bilateral cochlear implants. [Work supported by NIDCD, DRF, and NOHR.]

  19. An integrated analysis of speech and gestural characteristics in conversational child-computer interactions

    NASA Astrophysics Data System (ADS)

    Yildirim, Serdar; Montanari, Simona; Andersen, Elaine; Narayanan, Shrikanth S.

    2003-10-01

    Understanding the fine details of children's speech and gestural characteristics helps, among other things, in creating natural computer interfaces. We analyze the acoustic, lexical/non-lexical and spoken/gestural discourse characteristics of young children's speech using audio-video data gathered using a Wizard of Oz technique from 4 to 6 year old children engaged in resolving a series of age-appropriate cognitive challenges. Fundamental and formant frequencies exhibited greater variations between subjects, consistent with previous results on read speech [Lee et al., J. Acoust. Soc. Am. 105, 1455-1468 (1999)]. Also, our analysis showed that, in a given bandwidth, the phonemic information contained in the speech of young children is significantly less than that of older children and adults. To enable an integrated analysis, a multi-track annotation board was constructed using the ANVIL tool kit [M. Kipp, Eurospeech 1367-1370 (2001)]. Along with speech transcriptions and acoustic analysis, non-lexical and discourse characteristics, and the children's gestures (facial expressions, body movements, hand/head movements) were annotated in a synchronized multilayer system. Initial results showed that younger children rely more on gestures to emphasize their verbal assertions. Younger children use non-lexical speech (e.g., um, huh) associated with frustration and pondering/reflecting more frequently than older ones. Younger children also repair more with humans than with the computer.

  20. Masking release for words in amplitude-modulated noise as a function of modulation rate and task

    PubMed Central

    Buss, Emily; Whittle, Lisa N.; Grose, John H.; Hall, Joseph W.

    2009-01-01

    For normal-hearing listeners, masked speech recognition can improve with the introduction of masker amplitude modulation. The present experiments tested the hypothesis that this masking release is due in part to an interaction between the temporal distribution of cues necessary to perform the task and the probability of those cues temporally coinciding with masker modulation minima. Stimuli were monosyllabic words masked by speech-shaped noise, and masker modulation was introduced via multiplication with a raised sinusoid of 2.5–40 Hz. Tasks included detection, three-alternative forced-choice identification, and open-set identification. Overall, there was more masking release associated with the closed than the open-set tasks. The best rate of modulation also differed as a function of task; whereas low modulation rates were associated with best performance for the detection and three-alternative identification tasks, performance improved with modulation rate in the open-set task. This task-by-rate interaction was also observed when amplitude-modulated speech was presented in a steady masker, and for low- and high-pass filtered speech presented in modulated noise. These results were interpreted as showing that the optimal rate of amplitude modulation depends on the temporal distribution of speech cues and the information required to perform a particular task. PMID:19603883
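
    As a concrete illustration of the stimulus construction, the sketch below generates a roughly speech-shaped noise and multiplies it by a raised sinusoid. The sampling rate, spectral tilt, and 100% modulation depth are assumptions, not the study's exact parameters.

        import numpy as np

        fs = 22050                      # sampling rate (assumption)
        dur, fm = 1.0, 10.0             # 1 s masker; 10 Hz is within the 2.5-40 Hz range
        n = int(fs * dur)
        noise = np.random.default_rng(0).normal(size=n)

        # Crude "speech-shaped" spectrum: flat to 500 Hz, then falling ~6 dB/octave
        # (a stand-in for the long-term average speech spectrum).
        spec = np.fft.rfft(noise)
        f = np.fft.rfftfreq(n, 1 / fs)
        spec *= 1.0 / np.maximum(f / 500.0, 1.0)
        shaped = np.fft.irfft(spec, n)

        # Raised sinusoid: nonnegative modulator with assumed 100% modulation depth.
        t = np.arange(n) / fs
        masker = shaped * 0.5 * (1 + np.sin(2 * np.pi * fm * t))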

  1. Systematic studies of modified vocalization: speech production changes during a variation of metronomic speech in persons who do and do not stutter.

    PubMed

    Davidow, Jason H; Bothe, Anne K; Ye, Jun

    2011-06-01

    The most common way to induce fluency using rhythm requires persons who stutter to speak one syllable or one word to each beat of a metronome, but stuttering can also be eliminated when the stimulus is of a particular duration (e.g., 1 second [s]). The present study examined stuttering frequency, speech production changes, and speech naturalness during rhythmic speech that alternated 1s of reading with 1s of silence. A repeated-measures design was used to compare data obtained during a control reading condition and during rhythmic reading in 10 persons who stutter (PWS) and 10 normally fluent controls. Ratings for speech naturalness were also gathered from naïve listeners. Results showed that mean vowel duration increased significantly, and the percentage of short phonated intervals decreased significantly, for both groups from the control to the experimental condition. Mean phonated interval length increased significantly for the fluent controls. Mean speech naturalness ratings during the experimental condition were approximately "7" on a 1-9 scale (1=highly natural; 9=highly unnatural), and these ratings were significantly correlated with vowel duration and phonated intervals for PWS. The findings indicate that PWS may be altering vocal fold vibration duration to obtain fluency during this rhythmic speech style, and that vocal fold vibration duration may have an impact on speech naturalness during rhythmic speech. Future investigations should examine speech production changes and speech naturalness during variations of this rhythmic condition. The reader will be able to: (1) describe changes (from a control reading condition) in speech production variables when alternating between 1s of reading and 1s of silence, (2) describe which rhythmic conditions have been found to sound and feel the most natural, (3) describe methodological issues for studies about alterations in speech production variables during fluency-inducing conditions, and (4) describe which fluency-inducing conditions have been shown to involve a reduction in short phonated intervals. Copyright © 2011 Elsevier Inc. All rights reserved.

  2. Research on oral test modeling based on multi-feature fusion

    NASA Astrophysics Data System (ADS)

    Shi, Yuliang; Tao, Yiyue; Lei, Jun

    2018-04-01

    In this paper, the spectrogram of the speech signal is taken as the input for feature extraction. A PCNN, whose strengths lie in image segmentation and related processing, is applied to the spectrogram to extract features, exploring a new method that combines speech signal processing with image processing. Alongside the spectrogram features, MFCCs are computed as spectral features and fused with the image features to further improve the accuracy of spoken-language assessment. Because the fused input features are high-dimensional and discriminative, a Support Vector Machine (SVM) is used to construct the classifier, and the features extracted from test utterances are compared with standard-pronunciation features to detect how standard the spoken language is. Experiments show that extracting features from spectrograms with a PCNN is feasible, and that fusing image features with spectral features improves detection accuracy.
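
    The sketch below illustrates the fusion pipeline under stated assumptions: the PCNN feature extractor is replaced by simple per-band spectrogram statistics as a placeholder, the utterances are random arrays standing in for real recordings, and the labels are hypothetical.

        import numpy as np
        import librosa
        from sklearn.svm import SVC

        def features(y, sr):
            S = np.abs(librosa.stft(y, n_fft=512))                       # spectrogram "image"
            img_feats = np.concatenate([S.mean(axis=1), S.std(axis=1)])  # PCNN stand-in
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
            return np.concatenate([img_feats, mfcc])                     # early feature fusion

        # Hypothetical data: two labelled utterances (random arrays as placeholders).
        rng = np.random.default_rng(0)
        utts = [rng.normal(size=16000), rng.normal(size=16000)]
        X = np.array([features(u, 16000) for u in utts])
        labels = np.array([1, 0])                                        # 1 = standard pronunciation
        clf = SVC(kernel="rbf").fit(X, labels)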

  3. Standardization of Freeze Frame TV Codecs

    DTIC Science & Technology

    1990-06-01

    [Comparison-table fragments from the report: freeze-frame TV codecs surveyed include the Kodak SV9600 Still Video Transceiver, the Colorado Video, Inc. 286 Digital Transceiver, the Image Data Corp. CP-200 Photophone, and the Interand Corp. DISCON Imagephone; proprietary error recovery is by retransmission, and image build-up is sequential. A further fragment notes that commands effect information transfer among terminals, with their function and power indicated in an accompanying table.]

  4. Cohesive and coherent connected speech deficits in mild stroke.

    PubMed

    Barker, Megan S; Young, Breanne; Robinson, Gail A

    2017-05-01

    Spoken language production theories and lesion studies highlight several important prelinguistic conceptual preparation processes involved in the production of cohesive and coherent connected speech. Cohesion and coherence broadly connect sentences with preceding ideas and the overall topic. Broader cognitive mechanisms may mediate these processes. This study aims to investigate (1) whether stroke patients without aphasia exhibit impairments in cohesion and coherence in connected speech, and (2) the role of attention and executive functions in the production of connected speech. Eighteen stroke patients (8 right hemisphere stroke [RHS]; 6 left [LHS]) and 21 healthy controls completed two self-generated narrative tasks to elicit connected speech. A multi-level analysis of within and between-sentence processing ability was conducted. Cohesion and coherence impairments were found in the stroke group, particularly RHS patients, relative to controls. In the whole stroke group, better performance on the Hayling Test of executive function, which taps verbal initiation/suppression, was related to fewer propositional repetitions and global coherence errors. Better performance on attention tasks was related to fewer propositional repetitions, and decreased global coherence errors. In the RHS group, aspects of cohesive and coherent speech were associated with better performance on attention tasks. Better Hayling Test scores were related to more cohesive and coherent speech in RHS patients, and more coherent speech in LHS patients. Thus, we documented connected speech deficits in a heterogeneous stroke group without prominent aphasia. Our results suggest that broader cognitive processes may play a role in producing connected speech at the early conceptual preparation stage. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. An 8-PSK TDMA uplink modulation and coding system

    NASA Technical Reports Server (NTRS)

    Ames, S. A.

    1992-01-01

    The combination of 8-phase shift keying (8PSK) modulation and a spectral efficiency greater than 2 bits/sec/Hz drove the design of the Nyquist filter to one specified to have a rolloff factor of 0.2. When built and tested, this filter was found to produce too much intersymbol interference and was abandoned for a design with a rolloff factor of 0.4. The preamble is limited to 100 bit periods of the uncoded bit period of 5 ns, for a maximum preamble length of 500 ns, or 40 8PSK symbol times at 12.5 ns per symbol. For 8PSK modulation, the required maximum degradation of 1 dB in -20 dB cochannel interference (CCI) drove the requirement for forward error correction coding. Funding under this contract was not sufficient to develop the proposed codec, so the codec was limited to a paper design during the preliminary design phase. The mechanization of the demodulator is digital, starting from the output of the analog-to-digital converters, which quantize the outputs of the quadrature phase detectors. This approach is amenable to an application-specific integrated circuit (ASIC) replacement in the next phase of development.
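
    The quoted preamble timing can be checked with simple arithmetic, as in the snippet below; the implied code rate in the final comment is an inference from the abstract's numbers, not a figure it states.

        # Arithmetic check of the preamble timing quoted above.
        uncoded_bit_period = 5e-9                  # seconds
        preamble = 100 * uncoded_bit_period        # 100 bit periods -> 5.0e-7 s (500 ns)
        symbol_period = 12.5e-9                    # 8PSK symbol time
        print(preamble / symbol_period)            # 40.0 symbol times
        # Implied rates (an inference, not stated in the abstract): 1 bit / 5 ns
        # = 200 Mb/s uncoded; 8PSK at 1/12.5 ns = 80 Msym/s carries 240 Mb/s of
        # channel bits, consistent with rate-5/6 forward error correction.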

  6. Improved inter-layer prediction for light field content coding with display scalability

    NASA Astrophysics Data System (ADS)

    Conti, Caroline; Ducla Soares, Luís.; Nunes, Paulo

    2016-09-01

    Light field imaging based on microlens arrays - also known as plenoptic, holoscopic and integral imaging - has recently emerged as a feasible and promising technology due to its ability to support functionalities not straightforwardly available in conventional imaging systems, such as post-production refocusing and depth-of-field changing. However, to gradually reach the consumer market and to provide interoperability with current 2D and 3D representations, a display scalable coding solution is essential. In this context, this paper proposes an improved display scalable light field codec comprising a three-layer hierarchical coding architecture (previously proposed by the authors) that provides interoperability with 2D (Base Layer) and 3D stereo and multiview (First Layer) representations, while the Second Layer supports the complete light field content. To further improve compression performance, novel exemplar-based inter-layer coding tools are proposed here for the Second Layer, namely: (i) an inter-layer reference picture construction relying on an exemplar-based optimization algorithm for texture synthesis, and (ii) a direct prediction mode based on exemplar texture samples from lower layers. Experimental results show that the proposed solution performs better than the tested benchmark solutions, including the authors' previous scalable codec.

  7. Broadband set-top box using MAP-CA processor

    NASA Astrophysics Data System (ADS)

    Bush, John E.; Lee, Woobin; Basoglu, Chris

    2001-12-01

    Advances in broadband access are expected to exert a profound impact on our everyday life. They will be the key to the digital convergence of communication, computer and consumer equipment. A common thread that facilitates this convergence comprises digital media and the Internet. To address this market, Equator Technologies, Inc., is developing the Dolphin broadband set-top box reference platform using its MAP-CA Broadband Signal Processor™ chip. The Dolphin reference platform is a universal media platform for display and presentation of digital content on end-user entertainment systems. The objective of the Dolphin reference platform is to provide a complete set-top box system based on the MAP-CA processor. It includes all the necessary hardware and software components for the emerging broadcast and broadband digital media markets based on IP protocols. Such a reference design requires broadband Internet access and high-performance digital signal processing. By using the MAP-CA processor, the Dolphin reference platform is completely programmable, allowing various codecs to be implemented in software, such as MPEG-2, MPEG-4, H.263 and proprietary codecs. The software implementation also enables field upgrades to keep pace with evolving technology and industry demands.

  8. Hemispheric asymmetry in auditory processing of speech envelope modulations in prereading children.

    PubMed

    Vanvooren, Sophie; Poelmans, Hanne; Hofmann, Michael; Ghesquière, Pol; Wouters, Jan

    2014-01-22

    The temporal envelope of speech is an important cue contributing to speech intelligibility. Theories about the neural foundations of speech perception postulate that the left and right auditory cortices are functionally specialized in analyzing speech envelope information at different time scales: the right hemisphere is thought to be specialized in processing syllable rate modulations, whereas a bilateral or left hemispheric specialization is assumed for phoneme rate modulations. Recently, it has been found that this functional hemispheric asymmetry is different in individuals with language-related disorders such as dyslexia. Most studies were, however, performed in adults and school-aged children, and only a little is known about how neural auditory processing at these specific rates manifests and develops in very young children before reading acquisition. Yet, studying hemispheric specialization for processing syllable and phoneme rate modulations in preliterate children may reveal early neural markers for dyslexia. In the present study, human cortical evoked potentials to syllable and phoneme rate modulations were measured in 5-year-old children at high and low hereditary risk for dyslexia. The results demonstrate a right hemispheric preference for processing syllable rate modulations and a symmetric pattern for phoneme rate modulations, regardless of hereditary risk for dyslexia. These results suggest that, while hemispheric specialization for processing syllable rate modulations seems to be mature in prereading children, hemispheric specialization for phoneme rate modulation processing may still be developing. These findings could have important implications for the development of phonological and reading skills.

  9. Extending and Applying the EPIC Architecture for Human Cognition and Performance: Auditory and Spatial Components

    DTIC Science & Technology

    2016-03-01

    manual rather than verbal responses. The coordinate response measure (CRM) task and speech corpus is a highly simplified form of the command-and-control speech used in multi-talker speech experiments. The CRM corpus is a collection of recorded command utterances of the form "Ready <Callsign> go to <Color> <Digit> now." In the two-talker CRM listening task, participants respond to commands by pointing to the appropriate Color/Digit pair on a computer display.

  10. I Hear You Eat and Speak: Automatic Recognition of Eating Condition and Food Type, Use-Cases, and Impact on ASR Performance

    PubMed Central

    Hantke, Simone; Weninger, Felix; Kurle, Richard; Ringeval, Fabien; Batliner, Anton; Mousa, Amr El-Desoky; Schuller, Björn

    2016-01-01

    We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i.e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database featuring 1.6k utterances of 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender balanced, German speakers), six types of food (Apple, Nectarine, Banana, Haribo Smurfs, Biscuit, and Crisps), and read as well as spontaneous speech, which is made publicly available for research purposes. We start with demonstrating that for automatic speech recognition (ASR), it pays off to know whether speakers are eating or not. We also propose automatic classification both by brute-forcing of low-level acoustic features as well as higher-level features related to intelligibility, obtained from an Automatic Speech Recogniser. Prediction of the eating condition was performed with a Support Vector Machine (SVM) classifier employed in a leave-one-speaker-out evaluation framework. Results show that the binary prediction of eating condition (i.e., eating or not eating) can be easily solved independently of the speaking condition; the obtained average recalls are all above 90%. Low-level acoustic features provide the best performance on spontaneous speech, which reaches up to 62.3% average recall for multi-way classification of the eating condition, i.e., discriminating the six types of food, as well as not eating. The early fusion of features related to intelligibility with the brute-forced acoustic feature set improves the performance on read speech, reaching a 66.4% average recall for the multi-way classification task. Analysing features and classifier errors leads to a suitable ordinal scale for eating conditions, on which automatic regression can be performed with up to 56.2% determination coefficient. PMID:27176486
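
    A minimal sketch of the leave-one-speaker-out SVM evaluation is shown below; the feature matrix, labels, and speaker grouping are random placeholders, and scikit-learn's LeaveOneGroupOut is assumed as the cross-validation mechanism.

        import numpy as np
        from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X = rng.normal(size=(90, 20))           # 90 utterances x 20 acoustic features
        y = rng.integers(0, 2, size=90)         # 1 = eating, 0 = not eating
        speakers = np.repeat(np.arange(30), 3)  # 30 speakers, 3 utterances each

        # Each fold holds out all utterances of one speaker, as in the paper's setup.
        logo = LeaveOneGroupOut()
        scores = cross_val_score(SVC(kernel="linear"), X, y, groups=speakers, cv=logo)
        print("mean held-out-speaker accuracy:", scores.mean())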

  11. Acceptable range of speech level in noisy sound fields for young adults and elderly persons.

    PubMed

    Sato, Hayato; Morimoto, Masayuki; Ota, Ryo

    2011-09-01

    The acceptable range of speech level as a function of background noise level was investigated on the basis of word intelligibility scores and listening difficulty ratings. In the present study, the acceptable range is defined as the range that maximizes word intelligibility scores and simultaneously does not cause a significant increase in listening difficulty ratings from the minimum ratings. Listening tests with young adult and elderly listeners demonstrated the following. (1) The acceptable range of speech level for elderly listeners overlapped that for young listeners. (2) The lower limit of the acceptable speech level for both young and elderly listeners was 65 dB (A-weighted) for noise levels of 40 and 45 dB (A-weighted), a level with a speech-to-noise ratio of +15 dB for noise levels of 50 and 55 dB, and a level with a speech-to-noise ratio of +10 dB for noise levels from 60 to 70 dB. (3) The upper limit of the acceptable speech level for both young and elderly listeners was 80 dB for noise levels from 40 to 55 dB and 85 dB or above for noise levels from 55 to 70 dB. © 2011 Acoustical Society of America
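
    A rule-of-thumb encoding of these reported limits might look like the sketch below; it is not a clinical tool, and since the abstract only gives values at 40/45, 50/55, and 60-70 dB noise, the behaviour between and at those boundaries is an assumption.

        def acceptable_speech_level(noise_db):
            """Return (lower, upper) acceptable speech levels in A-weighted dB."""
            if not 40 <= noise_db <= 70:
                raise ValueError("outside the tested 40-70 dB range")
            if noise_db <= 45:
                lower = 65.0
            elif noise_db <= 55:
                lower = noise_db + 15.0     # +15 dB speech-to-noise around 50-55 dB noise
            else:
                lower = noise_db + 10.0     # +10 dB speech-to-noise for 60-70 dB noise
            upper = 80.0 if noise_db <= 55 else 85.0
            return lower, upper

        print(acceptable_speech_level(60))  # (70.0, 85.0)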

  12. Foreign-Accented Speech Perception Ratings: A Multifactorial Case Study

    ERIC Educational Resources Information Center

    Kraut, Rachel; Wulff, Stefanie

    2013-01-01

    Seventy-eight native English speakers rated the foreign-accented speech (FAS) of 24 international students enrolled in an Intensive English programme at a public university in Texas on degree of accent, comprehensibility and communicative ability. Variables considered to potentially impact listeners' ratings were the sex of the speaker, the first…

  13. When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.

    PubMed

    Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola

    2017-11-01

    Speech sound acoustic properties vary largely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences, in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, facilitating the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. An Approach to Co-Channel Talker Interference Suppression Using a Sinusoidal Model for Speech

    DTIC Science & Technology

    1988-02-05

    Massachusetts Institute of Technology, with the support of the Department of the Air Force under Contract F19628-85-C-0002. [List-of-figures residue from the report snippet: ...Extracted from Summed Vocalic Waveforms; Failure of the Least Squares Solution with Closely-Spaced Frequencies ((a) Crossing Frequency Tracks, (b) Crossing Pitch Contours); Multi-Frame Interpolation; Different Forms of Multi-Frame Interpolation; Recovery of Missing Lobe with Multi...]

  15. Adaptation to an electropalatograph palate: acoustic, impressionistic, and perceptual data.

    PubMed

    McLeod, Sharynne; Searl, Jeff

    2006-05-01

    The purpose of this study was to evaluate adaptation to the electropalatograph (EPG) from the perspective of consonant acoustics, listener perceptions, and speaker ratings. Seven adults with typical speech wore an EPG and pseudo-EPG palate over 2 days and produced syllables, read a passage, counted, and rated their adaptation to the palate. Consonant acoustics, listener ratings, and speaker ratings were analyzed. The spectral mean for the burst (/t/) and frication (/s/) was reduced for the first 60-120 min of wearing the pseudo-EPG palate. Temporal features (stop gap, frication, and syllable duration) were unaffected by wearing the pseudo-EPG palate. The EPG palate had a similar effect on consonant acoustics as the pseudo-EPG palate. Expert listener ratings indicated minimal to no change in speech naturalness or distortion from the pseudo-EPG or EPG palate. The sounds [see text] were most likely to be affected. Speaker self-ratings related to oral comfort, speech, tongue movement, appearance, and oral sensation were negatively affected by the presence of the palatal devices. Speakers detected a substantial difference when wearing a palatal device, but the effects on speech were minimal based on listener ratings. Spectral features of consonants were initially affected, although adaptation occurred. Wearing an EPG or pseudo-EPG palate for approximately 2 hr results in relatively normal-sounding speech with acoustic features similar to a no-palate condition.

  16. Positron Emission Tomography Imaging Reveals Auditory and Frontal Cortical Regions Involved with Speech Perception and Loudness Adaptation.

    PubMed

    Berding, Georg; Wilke, Florian; Rode, Thilo; Haense, Cathleen; Joseph, Gert; Meyer, Geerd J; Mamach, Martin; Lenarz, Minoo; Geworski, Lilli; Bengel, Frank M; Lenarz, Thomas; Lim, Hubert H

    2015-01-01

    Considerable progress has been made in the treatment of hearing loss with auditory implants. However, there are still many implanted patients that experience hearing deficiencies, such as limited speech understanding or vanishing perception with continuous stimulation (i.e., abnormal loudness adaptation). The present study aims to identify specific patterns of cerebral cortex activity involved with such deficiencies. We performed O-15-water positron emission tomography (PET) in patients implanted with electrodes within the cochlea, brainstem, or midbrain to investigate the pattern of cortical activation in response to speech or continuous multi-tone stimuli directly inputted into the implant processor that then delivered electrical patterns through those electrodes. Statistical parametric mapping was performed on a single subject basis. Better speech understanding was correlated with a larger extent of bilateral auditory cortex activation. In contrast to speech, the continuous multi-tone stimulus elicited mainly unilateral auditory cortical activity in which greater loudness adaptation corresponded to weaker activation and even deactivation. Interestingly, greater loudness adaptation was correlated with stronger activity within the ventral prefrontal cortex, which could be up-regulated to suppress the irrelevant or aberrant signals into the auditory cortex. The ability to detect these specific cortical patterns and differences across patients and stimuli demonstrates the potential for using PET to diagnose auditory function or dysfunction in implant patients, which in turn could guide the development of appropriate stimulation strategies for improving hearing rehabilitation. Beyond hearing restoration, our study also reveals a potential role of the frontal cortex in suppressing irrelevant or aberrant activity within the auditory cortex, and thus may be relevant for understanding and treating tinnitus.

  17. The Comprehension of Rapid Speech by the Blind: Part III. Final Report.

    ERIC Educational Resources Information Center

    Foulke, Emerson

    Accounts of completed and ongoing research conducted from 1964 to 1968 are presented on the subject of accelerated speech as a substitute for the written word. Included are a review of the research on intelligibility and comprehension of accelerated speech, some methods for controlling the word rate of recorded speech, and a comparison of…

  18. Auditory Brainstem Response to Complex Sounds Predicts Self-Reported Speech-in-Noise Performance

    ERIC Educational Resources Information Center

    Anderson, Samira; Parbery-Clark, Alexandra; White-Schwoch, Travis; Kraus, Nina

    2013-01-01

    Purpose: To compare the ability of the auditory brainstem response to complex sounds (cABR) to predict subjective ratings of speech understanding in noise on the Speech, Spatial, and Qualities of Hearing Scale (SSQ; Gatehouse & Noble, 2004) relative to the predictive ability of the Quick Speech-in-Noise test (QuickSIN; Killion, Niquette,…

  19. Auditory-Perceptual Assessment of Fluency in Typical and Neurologically Disordered Speech

    ERIC Educational Resources Information Center

    Penttilä, Nelly; Korpijaakko-Huuhka, Anna-Maija; Kent, Ray D.

    2018-01-01

    Purpose: The aim of this study is to investigate how speech fluency in typical and atypical speech is perceptually assessed by speech-language pathologists (SLPs). Our research questions were as follows: (a) How do SLPs rate fluency in speakers with and without neurological communication disorders? (b) Do they differentiate the speaker groups? and…

  1. How Our Own Speech Rate Influences Our Perception of Others

    ERIC Educational Resources Information Center

    Bosker, Hans Rutger

    2017-01-01

    In conversation, our own speech and that of others follow each other in rapid succession. Effects of the surrounding context on speech perception are well documented but, despite the ubiquity of the sound of our own voice, it is unknown whether our own speech also influences our perception of other talkers. This study investigated context effects…

  2. Recognition of Time-Compressed and Natural Speech with Selective Temporal Enhancements by Young and Elderly Listeners

    ERIC Educational Resources Information Center

    Gordon-Salant, Sandra; Fitzgibbons, Peter J.; Friedman, Sarah A.

    2007-01-01

    Purpose: The goal of this experiment was to determine whether selective slowing of speech segments improves recognition performance by young and elderly listeners. The hypotheses were (a) the benefits of time expansion occur for rapid speech but not for natural-rate speech, (b) selective time expansion of consonants produces greater score…

  3. Sinusoidal transform coding

    NASA Technical Reports Server (NTRS)

    Mcaulay, Robert J.; Quatieri, Thomas F.

    1988-01-01

    It has been shown that an analysis/synthesis system based on a sinusoidal representation of speech leads to synthetic speech that is essentially perceptually indistinguishable from the original. Strategies for coding the amplitudes, frequencies and phases of the sine waves have been developed that have led to a multirate coder operating at rates from 2400 to 9600 bps. The encoded speech is highly intelligible at all rates with a uniformly improving quality as the data rate is increased. A real-time fixed-point implementation has been developed using two ADSP2100 DSP chips. The methods used for coding and quantizing the sine-wave parameters for operation at the various frame rates are described.
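
    A minimal sketch of the analysis/synthesis idea, assuming simple per-frame spectral peak picking and ignoring the real coder's frame-to-frame parameter tracking and phase interpolation, might look like this:

        import numpy as np
        from scipy.signal import find_peaks

        def analyze_frame(frame, fs, n_peaks=20):
            """Pick the strongest spectral peaks: (frequencies, amplitudes, phases)."""
            win = np.hanning(len(frame))
            spec = np.fft.rfft(frame * win)
            mag, phase = np.abs(spec), np.angle(spec)
            peaks, _ = find_peaks(mag)
            top = peaks[np.argsort(mag[peaks])[-n_peaks:]]
            freqs = top * fs / len(frame)
            amps = mag[top] / (len(frame) / 4)      # rough Hann-window amplitude scaling
            return freqs, amps, phase[top]

        def synthesize_frame(freqs, amps, phases, n, fs):
            """Resynthesize the frame as a sum of sine waves."""
            t = np.arange(n) / fs
            return sum(a * np.cos(2 * np.pi * f * t + p)
                       for f, a, p in zip(freqs, amps, phases))

        # Toy usage: analyze one 50 ms frame and resynthesize it.
        fs = 8000
        t = np.arange(400) / fs
        frame = 0.8 * np.sin(2 * np.pi * 200 * t) + 0.4 * np.sin(2 * np.pi * 600 * t)
        f, a, p = analyze_frame(frame, fs, n_peaks=5)
        reconstruction = synthesize_frame(f, a, p, len(frame), fs)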

  4. Temporal Sensitivity Measured Shortly After Cochlear Implantation Predicts 6-Month Speech Recognition Outcome.

    PubMed

    Erb, Julia; Ludwig, Alexandra Annemarie; Kunke, Dunja; Fuchs, Michael; Obleser, Jonas

    2018-04-24

    Psychoacoustic tests assessed shortly after cochlear implantation are useful predictors of the rehabilitative speech outcome. While largely independent, both spectral and temporal resolution tests are important to provide an accurate prediction of speech recognition. However, rapid tests of temporal sensitivity are currently lacking. Here, we propose a simple amplitude modulation rate discrimination (AMRD) paradigm that is validated by predicting future speech recognition in adult cochlear implant (CI) patients. In 34 newly implanted patients, we used an adaptive AMRD paradigm, where broadband noise was modulated at the speech-relevant rate of ~4 Hz. In a longitudinal study, speech recognition in quiet was assessed using the closed-set Freiburger number test shortly after cochlear implantation (t0) as well as the open-set Freiburger monosyllabic word test 6 months later (t6). Both AMRD thresholds at t0 (r = -0.51) and speech recognition scores at t0 (r = 0.56) predicted speech recognition scores at t6. However, AMRD and speech recognition at t0 were uncorrelated, suggesting that those measures capture partially distinct perceptual abilities. A multiple regression model predicting 6-month speech recognition outcome with deafness duration and speech recognition at t0 improved from adjusted R² = 0.30 to adjusted R² = 0.44 when AMRD threshold was added as a predictor. These findings identify AMRD thresholds as a reliable, nonredundant predictor above and beyond established speech tests for CI outcome. This AMRD test could potentially be developed into a rapid clinical temporal-resolution test to be integrated into the postoperative test battery to improve the reliability of speech outcome prognosis.
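
    The adjusted R² figures can be reproduced from the standard formula, as sketched below; n = 34 is taken from the abstract, while the raw R² inputs are back-computed and therefore illustrative.

        # Adjusted R^2 penalizes extra predictors: adding AMRD (p = 2 -> 3) must
        # raise raw R^2 enough to offset the degrees-of-freedom penalty.
        def adjusted_r2(r2, n, p):
            return 1 - (1 - r2) * (n - 1) / (n - p - 1)

        n = 34
        print(adjusted_r2(0.342, n, 2))  # ~0.30 with two predictors
        print(adjusted_r2(0.491, n, 3))  # ~0.44 with AMRD threshold added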

  5. Patterns of poststroke brain damage that predict speech production errors in apraxia of speech and aphasia dissociate.

    PubMed

    Basilakos, Alexandra; Rorden, Chris; Bonilha, Leonardo; Moser, Dana; Fridriksson, Julius

    2015-06-01

    Acquired apraxia of speech (AOS) is a motor speech disorder caused by brain damage. AOS often co-occurs with aphasia, a language disorder in which patients may also demonstrate speech production errors. The overlap of speech production deficits in both disorders has raised questions on whether AOS emerges from a unique pattern of brain damage or as a subelement of the aphasic syndrome. The purpose of this study was to determine whether speech production errors in AOS and aphasia are associated with distinctive patterns of brain injury. Forty-three patients with history of a single left-hemisphere stroke underwent comprehensive speech and language testing. The AOS Rating Scale was used to rate speech errors specific to AOS versus speech errors that can also be associated with both AOS and aphasia. Localized brain damage was identified using structural magnetic resonance imaging, and voxel-based lesion-impairment mapping was used to evaluate the relationship between speech errors specific to AOS, those that can occur in AOS or aphasia, and brain damage. The pattern of brain damage associated with AOS was most strongly associated with damage to cortical motor regions, with additional involvement of somatosensory areas. Speech production deficits that could be attributed to AOS or aphasia were associated with damage to the temporal lobe and the inferior precentral frontal regions. AOS likely occurs in conjunction with aphasia because of the proximity of the brain areas supporting speech and language, but the neurobiological substrate for each disorder differs. © 2015 American Heart Association, Inc.

  6. Speech rate reduction and "nasality" in normal speakers.

    PubMed

    Brancewicz, T M; Reich, A R

    1989-12-01

    This study explored the effects of reduced speech rate on nasal/voice accelerometric measures and nasality ratings. Nasal/voice accelerometric measures were obtained from normal adults for various speech stimuli and speaking rates. Stimuli included three sentences (one obstruent-loaded, one semivowel-loaded, and one containing a single nasal), and /pv/ syllable trains. Speakers read the stimuli at their normal rate, half their normal rate, and as slowly as possible. In addition, a computer program paced each speaker at rates of 1, 2, and 3 syllables per second. The nasal/voice accelerometric values revealed significant stimulus effects but no rate effects. The nasality ratings of experienced listeners, evaluated as a function of stimulus and speaking rate, were compared to the accelerometric measures. The nasality scale values demonstrated small, but statistically significant, stimulus and rate effects. However, the nasality percepts were poorly correlated with the nasal/voice accelerometric measures.

  7. Improving speech perception in noise with current focusing in cochlear implant users.

    PubMed

    Srinivasan, Arthi G; Padilla, Monica; Shannon, Robert V; Landsberger, David M

    2013-05-01

    Cochlear implant (CI) users typically have excellent speech recognition in quiet but struggle with understanding speech in noise. It is thought that broad current spread from stimulating electrodes causes adjacent electrodes to activate overlapping populations of neurons, which results in interactions across adjacent channels. Current focusing has been studied as a way to reduce spread of excitation and, therefore, reduce channel interactions. In particular, partial tripolar stimulation has been shown to reduce spread of excitation relative to monopolar stimulation. However, the crucial question is whether this benefit translates to improvements in speech perception. In this study, we compared speech perception in noise with experimental monopolar and partial tripolar speech processing strategies. The two strategies were matched in terms of number of active electrodes, microphone, filterbanks, stimulation rate and loudness (although both strategies used a lower stimulation rate than typical clinical strategies). The results of this study showed a significant improvement in speech perception in noise with partial tripolar stimulation. All subjects benefited from the current focused speech processing strategy. There was a mean improvement in speech recognition threshold of 2.7 dB in a digits in noise task and a mean improvement of 3 dB in a sentences in noise task with partial tripolar stimulation relative to monopolar stimulation. Although the experimental monopolar strategy was worse than the clinical strategy, presumably due to different microphones, frequency allocations and stimulation rates, the experimental partial-tripolar strategy, which had the same changes, showed no acute deficit relative to the clinical strategy. Copyright © 2013 Elsevier B.V. All rights reserved.

  8. Quantitative assessment of motor speech abnormalities in idiopathic rapid eye movement sleep behaviour disorder.

    PubMed

    Rusz, Jan; Hlavnička, Jan; Tykalová, Tereza; Bušková, Jitka; Ulmanová, Olga; Růžička, Evžen; Šonka, Karel

    2016-03-01

    Patients with idiopathic rapid eye movement sleep behaviour disorder (RBD) are at substantial risk for developing Parkinson's disease (PD) or related neurodegenerative disorders. Speech is an important indicator of motor function and movement coordination, and therefore may be an extremely sensitive early marker of changes due to prodromal neurodegeneration. Speech data were acquired from 16 RBD subjects and 16 age- and sex-matched healthy control subjects. Objective acoustic assessment of 15 speech dimensions representing various phonatory, articulatory, and prosodic deviations was performed. Statistical models were applied to characterise speech disorders in RBD and to estimate sensitivity and specificity in differentiating between RBD and control subjects. Some form of speech impairment was revealed in 88% of RBD subjects. Articulatory deficits were the most prominent findings in RBD. In comparison to controls, the RBD group showed significant alterations in irregular alternating motion rates (p = 0.009) and articulatory decay (p = 0.01). The combination of four distinctive speech dimensions, including aperiodicity, irregular alternating motion rates, articulatory decay, and dysfluency, led to 96% sensitivity and 79% specificity in discriminating between RBD and control subjects. Speech impairment was significantly more pronounced in RBD subjects with the motor score of the Unified Parkinson's Disease Rating Scale greater than 4 points when compared to other RBD individuals. Simple quantitative speech motor measures may be suitable for the reliable detection of prodromal neurodegeneration in subjects with RBD, and therefore may provide important outcomes for future therapy trials. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. Psychovisual masks and intelligent streaming RTP techniques for the MPEG-4 standard

    NASA Astrophysics Data System (ADS)

    Mecocci, Alessandro; Falconi, Francesco

    2003-06-01

    In today's multimedia audio-video communication systems, data compression plays a fundamental role by reducing bandwidth waste and the cost of infrastructure and equipment. Among the different compression standards, MPEG-4 is becoming more and more accepted and widespread. Even though one of the fundamental aspects of this standard is the possibility of coding video objects separately (i.e., separating moving objects from the background and adapting the coding strategy to the video content), currently implemented codecs work only at the full-frame level. In this way, many advantages of the flexible MPEG-4 syntax are missed. This lack is due both to the difficulty of properly segmenting moving objects in real scenes (featuring arbitrary motion of the objects and of the acquisition sensor) and to the current use of these codecs, which are mainly oriented towards the market of DVD backups (a full-frame approach is enough for these applications).

    In this paper we propose a codec for MPEG-4 real-time object streaming that codes the moving objects and the scene background separately. The proposed codec is capable of adapting its strategy during the transmission by analysing the video currently transmitted and setting the coder parameters and modalities accordingly. For example, the background can be transmitted as a whole or divided into "slightly detailed" and "highly detailed" zones that are coded in different ways to reduce the bit rate while preserving the perceived quality. The coder can automatically switch from one modality to the other in real time during the transmission, depending on the current video content. Psychovisual masks and other video-content-based measurements have been used as inputs for a Self Learning Intelligent Controller (SLIC) that changes the parameters and the transmission modalities.

    The current implementation is based on the ISO 14496 standard code that allows Video Object (VO) transmission (other open-source codecs, such as DivX, Xvid, and Cisco's MPEG-4IP, have been analyzed but, as of today, do not support VO). The original code has been deeply modified to integrate the SLIC and to adapt it for real-time streaming. A custom RTP (Real-time Transport Protocol) scheme has been defined and a client-server application has been developed. The viewer can decode and demultiplex the stream in real time while adapting to the changing modalities adopted by the server according to the current video content.

    The proposed codec works as follows: the image background is separated by means of a segmentation module and transmitted by means of a wavelet compression scheme similar to that used in JPEG2000. The VO are coded separately and multiplexed with the background stream. At the receiver the stream is demultiplexed to obtain the background and the VO, which are subsequently pasted together. The final quality depends on many factors, in particular the quantization parameters, the Group of Video Objects (GOV) length, the GOV structure (i.e., the number of I-, P-, and B-VOPs), and the search area for motion compensation. These factors are strongly related to the following measurement parameters (defined during development): the Objects Apparent Size (OAS) in the scene, the Video Object Incidence factor (VOI), and the temporal correlation (measured through the Normalized Mean SAD, NMSAD). The SLIC module analyzes the currently transmitted video and selects the most appropriate settings by choosing from a predefined set of transmission modalities.

    For example, in the case of a highly temporally correlated sequence, the number of B-VOPs is increased to improve the compression ratio. The strategy for selecting the number of B-VOPs turns out to be very different from those reported in the literature for B-frames (adopted for MPEG-1 and MPEG-2), due to the different behaviour of the temporal correlation when limited only to moving objects. The SLIC module also decides how to transmit the background. In our implementation we adopted the Visual Brain theory, i.e., the study of what the "psychic eye" can get from a scene. According to this theory, a Psychomask Image Analysis (PIA) module has been developed to extract the visually homogeneous regions of the background. The PIA module produces two complementary masks, one for the visually low-variance zones and one for the highly variable zones; these zones are compressed with different strategies and encoded into two multiplexed streams. Practical experiments showed that the separate coding is advantageous only if the low-variance zones exceed 50% of the whole background area (due to the overhead of transmitting the zone masks). The SLIC module takes care of deciding the appropriate transmission modality by analyzing the results produced by the PIA module.

    The main features of this codec are low bit rate, good image quality and coding speed. The current implementation runs in real time on standard PC platforms, the major limitation being the fixed position of the acquisition sensor. This limitation is due to the difficulty of separating moving objects from the background when the acquisition sensor moves: our current real-time segmentation module does not produce suitable results if the sensor moves (only slight oscillatory movements are tolerated). In any case, the system is particularly suitable for telesurveillance applications at low bit rates, where the camera is usually fixed or alternates among predetermined positions (our segmentation module can accurately separate moving objects from the static background when the acquisition sensor stops, even if different scenes are seen as a result of sensor displacements). Moreover, the proposed architecture is general, in the sense that when robust real-time segmentation systems (capable of separating objects from the background while the sensor itself is moving) become available, they can easily be integrated while leaving the rest of the system unchanged. Experimental results for real sequences for traffic monitoring and for people tracking and safety control are reported and discussed in depth in the paper. The whole system has been implemented in standard ANSI C and currently runs on standard PCs under the Microsoft Windows operating system (Windows 2000 Pro and Windows XP).
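
    The kind of content measurement the SLIC module is described as using can be sketched as follows; the NMSAD normalization and the thresholds mapping low NMSAD (high temporal correlation) to more B-VOPs are invented for illustration, not taken from the paper.

        import numpy as np

        def nmsad(prev, curr):
            """Normalized mean sum of absolute differences between two frames."""
            return np.mean(np.abs(curr.astype(float) - prev.astype(float))) / 255.0

        def b_vops_for(nmsad_value):
            """Toy rule: more B-VOPs when object content is nearly static."""
            if nmsad_value < 0.02:
                return 3
            if nmsad_value < 0.08:
                return 2
            return 0                                 # fast motion: avoid B-VOPs

        prev = np.zeros((64, 64), dtype=np.uint8)
        curr = prev.copy()
        curr[10:20, 10:20] = 200                     # small moving patch
        print(b_vops_for(nmsad(prev, curr)))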

  10. Hemispheric asymmetry of auditory steady-state responses to monaural and diotic stimulation.

    PubMed

    Poelmans, Hanne; Luts, Heleen; Vandermosten, Maaike; Ghesquière, Pol; Wouters, Jan

    2012-12-01

    Amplitude modulations in the speech envelope are crucial elements for speech perception. These modulations occur at the rates at which syllabic (~3-7 Hz) and phonemic transitions take place in speech. Theories about speech perception hypothesize that each hemisphere in the auditory cortex is specialized in analyzing modulations at different timescales, and that phonemic-rate modulations of the speech envelope lateralize to the left hemisphere, whereas right lateralization occurs for slow, syllabic-rate modulations. In the present study, neural processing of phonemic- and syllabic-rate modulations was investigated with auditory steady-state responses (ASSRs). ASSRs to speech-weighted noise stimuli, amplitude modulated at 4, 20, and 80 Hz, were recorded in 30 normal-hearing adults. The 80 Hz ASSR is primarily generated by the brainstem, whereas 20 and 4 Hz ASSRs are mainly cortically evoked and relate to speech perception. Stimuli were presented diotically (same signal to both ears) and monaurally (one signal to the left or right ear). For 80 Hz, diotic ASSRs were larger than monaural responses. This binaural advantage decreased with decreasing modulation frequency. For 20 Hz, diotic ASSRs were equal to monaural responses, while for 4 Hz, diotic responses were smaller than monaural responses. Comparison of left and right ear stimulation demonstrated that, with decreasing modulation rate, a gradual change from ipsilateral to right lateralization occurred. Together, these results (1) suggest that ASSR enhancement to binaural stimulation decreases in the ascending auditory system and (2) indicate that right lateralization is more prominent for low-frequency ASSRs. These findings may have important consequences for electrode placement in clinical settings, as well as for the understanding of low-frequency ASSR generation.

  11. Use of the Progressive Aphasia Severity Scale (PASS) in monitoring speech and language status in PPA

    PubMed Central

    Sapolsky, Daisy; Domoto-Reilly, Kimiko; Dickerson, Bradford C.

    2014-01-01

    Background Primary progressive aphasia (PPA) is a devastating neurodegenerative syndrome involving the gradual development of aphasia, slowly impairing the patient’s ability to communicate. Pharmaceutical treatments do not currently exist and intervention often focuses on speech-language behavioral therapies, although further investigation is warranted to determine how best to harness functional benefits. Efforts to develop pharmaceutical and behavioral treatments have been hindered by a lack of standardized methods to monitor disease progression and treatment efficacy. Aims Here we describe our current approach to monitoring progression of PPA, including the development and applications of a novel clinical instrument for this purpose, the Progressive Aphasia Severity Scale (PASS). We also outline some of the issues related to initial evaluation and longitudinal monitoring of PPA. Methods & Procedures In our clinical and research practice we perform initial and follow-up assessments of PPA patients using a multi-faceted approach. In addition to standardized assessment measures, we use the PASS to rate presence and severity of symptoms across distinct domains of speech, language, and functional and pragmatic aspects of communication. Ratings are made using the clinician’s best judgment, integrating information from patient test performance in the office as well as a companion’s description of routine daily functioning. Outcomes & Results Monitoring symptom characteristics and severity with the PASS can assist in developing behavioral therapies, planning treatment goals, and counseling patients and families on clinical status and prognosis. The PASS also has potential to advance the implementation of PPA clinical trials. Conclusions PPA patients display heterogeneous language profiles that change over time given the progressive nature of the disease. The monitoring of symptom progression is therefore crucial to ensure that proposed treatments are appropriate at any given stage, including speech-language therapy and potentially pharmaceutical treatments once these become available. Because of the discrepancy that can exist between a patient’s daily functioning and standardized test performance, we believe a comprehensive assessment and monitoring battery must include performance-based instruments, interviews with the patient and partner, questionnaires about functioning in daily life, and measures of clinician judgment. We hope that our clinician judgment-based rating scale described here will be a valuable addition to the PPA assessment and monitoring battery. PMID:25419031

  12. The minor third communicates sadness in speech, mirroring its use in music.

    PubMed

    Curtis, Meagan E; Bharucha, Jamshed J

    2010-06-01

    There is a long history of attempts to explain why music is perceived as expressing emotion. The relationship between pitches serves as an important cue for conveying emotion in music. The musical interval referred to as the minor third is generally thought to convey sadness. We reveal that the minor third also occurs in the pitch contour of speech conveying sadness. Bisyllabic speech samples conveying four emotions were recorded by 9 actresses. Acoustic analyses revealed that the relationship between the 2 salient pitches of the sad speech samples tended to approximate a minor third. Participants rated the speech samples for perceived emotion, and the use of numerous acoustic parameters as cues for emotional identification was modeled using regression analysis. The minor third was the most reliable cue for identifying sadness. Additional participants rated musical intervals for emotion, and their ratings verified the historical association between the musical minor third and sadness. These findings support the theory that human vocal expressions and music share an acoustic code for communicating sadness.
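
    The interval measurement itself is simple: the distance in semitones between two pitches is 12·log2(f_high/f_low), and a minor third is 3 semitones (about a 6:5 frequency ratio). The example frequencies below are illustrative, not taken from the study's recordings.

        import math

        # Distance in semitones between two salient pitches.
        f_low, f_high = 220.0, 261.6               # illustrative values (A3 up to ~C4)
        semitones = 12 * math.log2(f_high / f_low)
        print(round(semitones, 2))                 # ~3.0, approximately a minor third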

  13. Virtual personal assistance

    NASA Astrophysics Data System (ADS)

    Aditya, K.; Biswadeep, G.; Kedar, S.; Sundar, S.

    2017-11-01

    Human-computer communication has seen growing demand in recent years. The new generation of autonomous technology aspires to give computer interfaces emotional states that take both the user and the system environment into consideration. The existing computational model is based on artificial intelligence, augmented externally by multi-modal expression with semi-human characteristics. The main problem with this multi-modal expression is that the hardware control given to the Artificial Intelligence (AI) is very limited. So, in our project we try to give the AI more control over the hardware. Two main parts, a Speech to Text (STT) engine and a Text to Speech (TTS) engine, are used to accomplish this requirement. In this work, we use a Raspberry Pi 3, a speaker, and a mic as hardware, and Python scripting for the programming part.
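
    A minimal sketch of such an STT/TTS loop is given below, assuming the third-party SpeechRecognition and pyttsx3 packages; the abstract does not name the libraries actually used.

        import speech_recognition as sr
        import pyttsx3

        recognizer = sr.Recognizer()
        tts = pyttsx3.init()

        with sr.Microphone() as source:                  # the mic attached to the Pi
            print("Listening...")
            audio = recognizer.listen(source)

        try:
            text = recognizer.recognize_google(audio)    # STT: speech to text
            print("Heard:", text)
            tts.say("You said " + text)                  # TTS: speak the reply
            tts.runAndWait()
        except sr.UnknownValueError:
            tts.say("Sorry, I did not catch that.")
            tts.runAndWait()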

  14. Alternating motion rate as an index of speech motor disorder in traumatic brain injury.

    PubMed

    Wang, Yu-Tsai; Kent, Ray D; Duffy, Joseph R; Thomas, Jack E; Weismer, Gary

    2004-01-01

    The task of syllable alternating motion rate (AMR) (also called diadochokinesis) is suitable for examining speech disorders of varying degrees of severity and in individuals with varying levels of linguistic and cognitive ability. However, very limited information on this task has been published for subjects with traumatic brain injury (TBI). This study is a quantitative and qualitative acoustic analysis of AMR in seven subjects with TBI. The primary goal was to use acoustic analyses to assess speech motor control disturbances for the group as a whole and for individual patients. Quantitative analyses included measures of syllable rate, syllable and intersyllable gap durations, energy maxima, and voice onset time (VOT). Qualitative analyses included classification of features evident in spectrograms and waveforms to provide a more detailed description. The TBI group had (1) a slowed syllable rate due mostly to lengthened syllables and, to a lesser degree, lengthened intersyllable gaps, (2) highly correlated syllable rates between AMR and conversation, (3) temporal and energy maxima irregularities within repetition sequences, (4) normal median VOT values but with large variation, and (5) a number of speech production abnormalities revealed by qualitative analysis, including explosive speech quality, breathy voice quality, phonatory instability, multiple or missing stop bursts, continuous voicing, and spirantization. The relationships between these findings and TBI speakers' neurological status and dysarthria types are also discussed. It was concluded that acoustic analyses of the AMR task provide specific information on motor speech limitations in individuals with TBI.
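
    For illustration only (this is not the authors' measurement procedure), a syllable rate of the kind reported above can be estimated by counting voiced runs in a short-time energy envelope; the frame length and threshold below are arbitrary choices.

        import numpy as np

        def syllable_rate(x: np.ndarray, fs: int, frame_ms: float = 10.0) -> float:
            """Estimate syllables/s from voiced runs in a short-time RMS envelope."""
            hop = int(fs * frame_ms / 1000.0)
            frames = x[: len(x) // hop * hop].reshape(-1, hop)
            rms = np.sqrt((frames ** 2).mean(axis=1))
            voiced = rms > 0.1 * rms.max()              # crude energy threshold
            onsets = int(voiced[0]) + np.sum(~voiced[:-1] & voiced[1:])
            return onsets / (len(x) / fs)               # rising edges per second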

  15. A speech processing study using an acoustic model of a multiple-channel cochlear implant

    NASA Astrophysics Data System (ADS)

    Xu, Ying

    1998-10-01

    A cochlear implant is an electronic device designed to provide sound information for adults and children who have bilateral profound hearing loss. The task of representing speech signals as electrical stimuli is central to the design and performance of cochlear implants. Studies have shown that the current speech-processing strategies provide significant benefits to cochlear implant users. However, the evaluation and development of speech-processing strategies have been complicated by hardware limitations and large variability in user performance. To alleviate these problems, an acoustic model of a cochlear implant with the SPEAK strategy is implemented in this study, in which a set of acoustic stimuli whose psychophysical characteristics are as close as possible to those produced by a cochlear implant are presented to normal-hearing subjects. To test the effectiveness and feasibility of this acoustic model, a psychophysical experiment was conducted to match the performance of a normal-hearing listener using model-processed signals to that of a cochlear implant user. Good agreement was found between an implanted patient and an age-matched normal-hearing subject in a dynamic signal discrimination experiment, indicating that this acoustic model is a reasonably good approximation of a cochlear implant with the SPEAK strategy. The acoustic model was then used to examine the potential of the SPEAK strategy in terms of its temporal and frequency encoding of speech. It was hypothesized that better temporal and frequency encoding of speech can be accomplished by higher stimulation rates and a larger number of activated channels. Vowel and consonant recognition tests were conducted on normal-hearing subjects using speech tokens processed by the acoustic model, with different combinations of stimulation rate and number of activated channels. The results showed that vowel recognition was best at 600 pps and 8 activated channels, but further increases in stimulation rate and channel numbers were not beneficial. Manipulations of stimulation rate and number of activated channels did not appreciably affect consonant recognition. These results suggest that overall speech performance may improve by appropriately increasing stimulation rate and number of activated channels. Future revision of this acoustic model is necessary to provide more accurate amplitude representation of speech.
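
    The flavor of such an acoustic model can be conveyed with a generic noise-band vocoder, sketched below; this is not the SPEAK implementation, and the band edges, filter order, and channel count are arbitrary.

        import numpy as np
        from scipy.signal import butter, filtfilt, hilbert

        def noise_vocode(x, fs, n_channels=8, f_lo=100.0, f_hi=6000.0):
            """Split x into bands, extract each band's envelope, modulate noise."""
            edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # log-spaced bands
            out = np.zeros(len(x))
            for lo, hi in zip(edges[:-1], edges[1:]):
                b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
                band = filtfilt(b, a, x)
                env = np.abs(hilbert(band))                    # band envelope
                carrier = filtfilt(b, a, np.random.randn(len(x)))
                out += env * carrier
            return out / np.max(np.abs(out))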

  16. Psychological Literacy Weakly Differentiates Students by Discipline and Year of Enrolment

    PubMed Central

    Heritage, Brody; Roberts, Lynne D.; Gasson, Natalie

    2016-01-01

    Psychological literacy, a construct developed to reflect the types of skills graduates of a psychology degree should possess and be capable of demonstrating, has recently been scrutinized in terms of its measurement adequacy. The recent development of a multi-item measure encompassing the facets of psychological literacy has provided the potential for improved validity in measuring the construct. We investigated the known-groups validity of this multi-item measure of psychological literacy to examine whether psychological literacy could predict (a) students’ course of enrolment and (b) students’ year of enrolment. Five hundred and fifteen undergraduate psychology students, 87 psychology/human resource management students, and 83 speech pathology students provided data. In the first year cohort, the reflective processes (RPs) factor significantly predicted psychology and psychology/human resource management course enrolment, although no facets significantly differentiated between psychology and speech pathology enrolment. Within the second year cohort, generic graduate attributes (GGAs) and RPs differentiated psychology and speech pathology course enrolment. GGAs differentiated first-year and second-year psychology students, with second-year students more likely to have higher scores on this factor. Due to weak support for known-groups validity, further measurement refinements are recommended to improve the construct’s utility. PMID:26909058

  17. Psychological Literacy Weakly Differentiates Students by Discipline and Year of Enrolment.

    PubMed

    Heritage, Brody; Roberts, Lynne D; Gasson, Natalie

    2016-01-01

    Psychological literacy, a construct developed to reflect the types of skills graduates of a psychology degree should possess and be capable of demonstrating, has recently been scrutinized in terms of its measurement adequacy. The recent development of a multi-item measure encompassing the facets of psychological literacy has provided the potential for improved validity in measuring the construct. We investigated the known-groups validity of this multi-item measure of psychological literacy to examine whether psychological literacy could predict (a) students' course of enrolment and (b) students' year of enrolment. Five hundred and fifteen undergraduate psychology students, 87 psychology/human resource management students, and 83 speech pathology students provided data. In the first year cohort, the reflective processes (RPs) factor significantly predicted psychology and psychology/human resource management course enrolment, although no facets significantly differentiated between psychology and speech pathology enrolment. Within the second year cohort, generic graduate attributes (GGAs) and RPs differentiated psychology and speech pathology course enrolment. GGAs differentiated first-year and second-year psychology students, with second-year students more likely to have higher scores on this factor. Due to weak support for known-groups validity, further measurement refinements are recommended to improve the construct's utility.

  18. Sub-band/transform compression of video sequences

    NASA Technical Reports Server (NTRS)

    Sauer, Ken; Bauer, Peter

    1992-01-01

    The progress on compression of video sequences is discussed. The overall goal of the research was the development of data compression algorithms for high-definition television (HDTV) sequences, but most of our research is general enough to be applicable to much more general problems. We have concentrated on coding algorithms based on both sub-band and transform approaches. Two very fundamental issues arise in designing a sub-band coder. First, the form of the signal decomposition must be chosen to yield band-pass images with characteristics favorable to efficient coding. A second basic consideration, whether coding is to be done in two or three dimensions, is the form of the coders to be applied to each sub-band. Computational simplicity is of the essence. We review the first portion of the year, during which we improved and extended some of the previous grant period's results. The pyramid nonrectangular sub-band coder limited to intra-frame application is discussed. Perhaps the most critical component of the sub-band structure is the design of band-splitting filters. We apply very simple recursive filters, which operate at alternating levels on rectangularly sampled and quincunx-sampled images. We also cover the techniques we have studied for the coding of the resulting bandpass signals. We discuss adaptive three-dimensional coding which takes advantage of the detection algorithm developed last year. To this point, all the work on this project has been done without the benefit of motion compensation (MC). Motion compensation is included in many proposed codecs, but adds significant computational burden and hardware expense. We have sought to find a lower-cost alternative featuring a simple adaptation to motion in the form of the codec. In sequences of high spatial detail and zooming or panning, it appears that MC will likely be necessary for the proposed quality and bit rates.
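
    The simplest instance of the band-splitting step is a one-level, two-band Haar split with perfect reconstruction; the sketch below illustrates the idea only and does not reproduce the report's recursive filters or quincunx sampling.

        import numpy as np

        def haar_analysis(x):
            x = x[: len(x) // 2 * 2]
            low = (x[0::2] + x[1::2]) / np.sqrt(2)    # low-pass subband
            high = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass subband
            return low, high

        def haar_synthesis(low, high):
            x = np.empty(2 * len(low))
            x[0::2] = (low + high) / np.sqrt(2)
            x[1::2] = (low - high) / np.sqrt(2)
            return x

        x = np.random.randn(16)
        lo, hi = haar_analysis(x)
        assert np.allclose(haar_synthesis(lo, hi), x)  # perfect reconstruction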

  19. Speech Comprehension Difficulties in Chronic Tinnitus and Its Relation to Hyperacusis

    PubMed Central

    Vielsmeier, Veronika; Kreuzer, Peter M.; Haubner, Frank; Steffens, Thomas; Semmler, Philipp R. O.; Kleinjung, Tobias; Schlee, Winfried; Langguth, Berthold; Schecklmann, Martin

    2016-01-01

    Objective: Many tinnitus patients complain about difficulties regarding speech comprehension. In spite of the high clinical relevance little is known about underlying mechanisms and predisposing factors. Here, we performed an exploratory investigation in a large sample of tinnitus patients to (1) estimate the prevalence of speech comprehension difficulties among tinnitus patients, to (2) compare subjective reports of speech comprehension difficulties with behavioral measurements in a standardized speech comprehension test and to (3) explore underlying mechanisms by analyzing the relationship between speech comprehension difficulties and peripheral hearing function (pure tone audiogram), as well as with co-morbid hyperacusis as a central auditory processing disorder. Subjects and Methods: Speech comprehension was assessed in 361 tinnitus patients presenting between 07/2012 and 08/2014 at the Interdisciplinary Tinnitus Clinic at the University of Regensburg. The assessment included standard audiological assessments (pure tone audiometry, tinnitus pitch, and loudness matching), the Goettingen sentence test (in quiet) for speech audiometric evaluation, two questions about hyperacusis, and two questions about speech comprehension in quiet and noisy environments (“How would you rate your ability to understand speech?”; “How would you rate your ability to follow a conversation when multiple people are speaking simultaneously?”). Results: Subjectively-reported speech comprehension deficits are frequent among tinnitus patients, especially in noisy environments (cocktail party situation). 74.2% of all investigated patients showed disturbed speech comprehension (indicated by values above 21.5 dB SPL in the Goettingen sentence test). Subjective speech comprehension complaints (both for general and in noisy environment) were correlated with hearing level and with audiologically-assessed speech comprehension ability. In contrast, co-morbid hyperacusis was only correlated with speech comprehension difficulties in noisy environments, but not with speech comprehension difficulties in general. Conclusion: Speech comprehension deficits are frequent among tinnitus patients. Whereas speech comprehension deficits in quiet environments are primarily due to peripheral hearing loss, speech comprehension deficits in noisy environments are related to both peripheral hearing loss and dysfunctional central auditory processing. Disturbed speech comprehension in noisy environments might be modulated by a central inhibitory deficit. In addition, attentional and cognitive aspects may play a role. PMID:28018209

  20. Speech Comprehension Difficulties in Chronic Tinnitus and Its Relation to Hyperacusis.

    PubMed

    Vielsmeier, Veronika; Kreuzer, Peter M; Haubner, Frank; Steffens, Thomas; Semmler, Philipp R O; Kleinjung, Tobias; Schlee, Winfried; Langguth, Berthold; Schecklmann, Martin

    2016-01-01

    Objective: Many tinnitus patients complain about difficulties regarding speech comprehension. In spite of the high clinical relevance little is known about underlying mechanisms and predisposing factors. Here, we performed an exploratory investigation in a large sample of tinnitus patients to (1) estimate the prevalence of speech comprehension difficulties among tinnitus patients, to (2) compare subjective reports of speech comprehension difficulties with behavioral measurements in a standardized speech comprehension test and to (3) explore underlying mechanisms by analyzing the relationship between speech comprehension difficulties and peripheral hearing function (pure tone audiogram), as well as with co-morbid hyperacusis as a central auditory processing disorder. Subjects and Methods: Speech comprehension was assessed in 361 tinnitus patients presenting between 07/2012 and 08/2014 at the Interdisciplinary Tinnitus Clinic at the University of Regensburg. The assessment included standard audiological assessments (pure tone audiometry, tinnitus pitch, and loudness matching), the Goettingen sentence test (in quiet) for speech audiometric evaluation, two questions about hyperacusis, and two questions about speech comprehension in quiet and noisy environments ("How would you rate your ability to understand speech?"; "How would you rate your ability to follow a conversation when multiple people are speaking simultaneously?"). Results: Subjectively-reported speech comprehension deficits are frequent among tinnitus patients, especially in noisy environments (cocktail party situation). 74.2% of all investigated patients showed disturbed speech comprehension (indicated by values above 21.5 dB SPL in the Goettingen sentence test). Subjective speech comprehension complaints (both for general and in noisy environment) were correlated with hearing level and with audiologically-assessed speech comprehension ability. In contrast, co-morbid hyperacusis was only correlated with speech comprehension difficulties in noisy environments, but not with speech comprehension difficulties in general. Conclusion: Speech comprehension deficits are frequent among tinnitus patients. Whereas speech comprehension deficits in quiet environments are primarily due to peripheral hearing loss, speech comprehension deficits in noisy environments are related to both peripheral hearing loss and dysfunctional central auditory processing. Disturbed speech comprehension in noisy environments might be modulated by a central inhibitory deficit. In addition, attentional and cognitive aspects may play a role.

  1. Perceptual rate normalization in naturally produced bilabial stops

    NASA Astrophysics Data System (ADS)

    Nagao, Kyoko; de Jong, Kenneth

    2003-10-01

    The perception of voicing categories is affected by speaking rate, so that listeners' category boundaries on a VOT continuum shift to a lower value when syllable duration decreases (Miller and Volaitis, 1989; Volaitis and Miller, 1992). Previous rate normalization effects have been found using computer-generated stimuli. This study examines the effect of speech rate on voicing categorization in naturally produced speech. Four native speakers of American English repeated syllables (/bi/ and /pi/) at increasing rates in time with a metronome. Three-syllable stimuli were spliced from the repetitive speech. These stimuli contained natural decreases in VOT with faster speech rates. Moreover, this rate effect on VOT was larger for /p/ than for /b/, so that VOT values for /b/ and /p/ overlapped at the fastest rates. Eighteen native listeners of American English were presented with 168 stimuli and asked to identify the consonant. Perceptual category boundaries occurred at VOT values 15 ms shorter than the values reported for synthesized stimuli. This difference may be due to the extraordinarily wide range of VOT values in previous studies. The values found in the current study closely match the actual division point for /b/ and /p/. The underlying mechanism of perceptual normalization is discussed.
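
    Category boundaries of this kind are commonly estimated by fitting a logistic function to identification proportions along the VOT continuum; a sketch with made-up response data (not the study's):

        import numpy as np
        from scipy.optimize import curve_fit

        def logistic(vot, boundary, slope):
            return 1.0 / (1.0 + np.exp(-slope * (vot - boundary)))

        vot_ms = np.array([0.0, 10, 20, 30, 40, 50, 60])
        p_resp = np.array([0.02, 0.05, 0.20, 0.55, 0.85, 0.97, 0.99])  # P("p") responses

        (boundary, slope), _ = curve_fit(logistic, vot_ms, p_resp, p0=(30.0, 0.2))
        print(f"category boundary ~ {boundary:.1f} ms VOT")  # 50% crossover point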

  2. Development of a good-quality speech coder for transmission over noisy channels at 2.4 kb/s

    NASA Astrophysics Data System (ADS)

    Viswanathan, V. R.; Berouti, M.; Higgins, A.; Russell, W.

    1982-03-01

    This report describes the development, study, and experimental results of a 2.4 kb/s speech coder called the harmonic deviations (HDV) vocoder, which transmits good-quality speech over noisy channels with bit-error rates of up to 1%. The HDV coder is based on the linear predictive coding (LPC) vocoder, and it transmits additional information over and above the data transmitted by the LPC vocoder, in the form of deviations between the speech spectrum and the LPC all-pole model spectrum at a selected set of frequencies. At the receiver, the spectral deviations are used to generate the excitation signal for the all-pole synthesis filter. The report describes and compares several methods for extracting the spectral deviations from the speech signal and for encoding them. To limit the bit rate of the HDV coder to 2.4 kb/s, the report discusses several methods, including orthogonal transformation and minimum-mean-square-error scalar quantization of log area ratios, two-stage vector-scalar quantization, and variable frame rate transmission. The report also presents the results of speech-quality optimization of the HDV coder at 2.4 kb/s.
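
    The two quantities an HDV-style coder works with can be sketched as follows: LPC coefficients from the autocorrelation method (Levinson-Durbin), and the dB deviation between the speech spectrum and the LPC all-pole model spectrum at selected frequencies. The order and frequency bins are illustrative, and the model gain is ignored.

        import numpy as np

        def lpc(frame, order):
            """Autocorrelation-method LPC via the Levinson-Durbin recursion."""
            r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            a = np.zeros(order + 1); a[0] = 1.0
            err = r[0]
            for i in range(1, order + 1):
                k = -(r[i] + a[1:i] @ r[i - 1:0:-1]) / err
                a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1][:i]
                err *= 1.0 - k * k
            return a

        def spectral_deviations(frame, a, n_fft=512, bins=(8, 16, 32, 64)):
            """dB difference between speech spectrum and all-pole model at chosen bins."""
            speech = np.abs(np.fft.rfft(frame, n_fft))
            model = 1.0 / np.abs(np.fft.rfft(a, n_fft))   # |1/A(e^jw)|, gain ignored
            idx = list(bins)
            return 20 * np.log10(speech[idx] / model[idx])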

  3. Social Anxiety, Affect, Cortisol Response and Performance on a Speech Task.

    PubMed

    Losiak, Wladyslaw; Blaut, Agata; Klosowska, Joanna; Slowik, Natalia

    2016-01-01

    Social anxiety is characterized by increased emotional reactivity to social stimuli, but results of studies focusing on affective reactions of socially anxious subjects in the situation of social exposition are inconclusive, especially in the case of endocrinological measures of affect. This study was designed to examine individual differences in endocrinological and affective reactions to social exposure as well as in performance on a speech task in a group of students (n = 44) comprising subjects with either high or low levels of social anxiety. Measures of salivary cortisol and positive and negative affect were taken before and after an impromptu speech. Self-ratings and observer ratings of performance were also obtained. Cortisol levels and negative affect increased in both groups after the speech task, and positive affect decreased; however, group × affect interactions were not significant. Assessments conducted after the speech task revealed that highly socially anxious participants had lower observer ratings of performance while cortisol increase and changes in self-reported affect were not related to performance. Socially anxious individuals do not differ from nonanxious individuals in affective reactions to social exposition, but reveal worse performance at a speech task. © 2015 S. Karger AG, Basel.

  4. Speech perception of young children using nucleus 22-channel or CLARION cochlear implants.

    PubMed

    Young, N M; Grohne, K M; Carrasco, V N; Brown, C

    1999-04-01

    This study compares the auditory perceptual skill development of 23 congenitally deaf children who received the Nucleus 22-channel cochlear implant with the SPEAK speech coding strategy, and 20 children who received the CLARION Multi-Strategy Cochlear Implant with the Continuous Interleaved Sampler (CIS) speech coding strategy. All were under 5 years old at implantation. Preimplantation, there were no significant differences between the groups in age, length of hearing aid use, or communication mode. Auditory skills were assessed at 6 months and 12 months after implantation. Postimplantation, the mean scores on all speech perception tests were higher for the Clarion group. These differences were statistically significant for the pattern perception and monosyllable subtests of the Early Speech Perception battery at 6 months, and for the Glendonald Auditory Screening Procedure at 12 months. Multiple regression analysis revealed that device type accounted for the greatest variance in performance after 12 months of implant use. We conclude that children using the CIS strategy implemented in the Clarion implant may develop better auditory perceptual skills during the first year postimplantation than children using the SPEAK strategy with the Nucleus device.

  5. Perception of temporally modified speech in auditory neuropathy.

    PubMed

    Hassan, Dalia Mohamed

    2011-01-01

    Disrupted auditory nerve activity in auditory neuropathy (AN) significantly impairs the sequential processing of auditory information, resulting in poor speech perception. This study investigated the ability of AN subjects to perceive temporally modified consonant-vowel (CV) pairs and shed light on their phonological awareness skills. Four Arabic CV pairs were selected: /ki/-/gi/, /to/-/do/, /si/-/sti/ and /so/-/zo/. The formant transitions in consonants and the pauses between CV pairs were prolonged. Rhyming, segmentation and blending skills were tested using words at a natural rate of speech and with prolongation of the speech stream. Fourteen adult AN subjects were compared to a matched group of cochlear-impaired patients in their perception of acoustically processed speech. The AN group distinguished the CV pairs at a low speech rate, in particular with modification of the consonant duration. Phonological awareness skills deteriorated in adult AN subjects but improved with prolongation of the speech inter-syllabic time interval. A rehabilitation program for AN should consider temporal modification of speech, training for auditory temporal processing and the use of devices with innovative signal processing schemes. Verbal modifications as well as visual imaging appear to be promising compensatory strategies for remediating the affected phonological processing skills.
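
    One generic way to prolong a speech stream without altering its pitch is a phase-vocoder time stretch, sketched here with librosa; the study's own manipulations targeted consonant durations and inter-syllabic pauses specifically, and the file name below is hypothetical.

        import librosa

        y, sr = librosa.load("cv_pair.wav", sr=None)        # hypothetical recording
        y_slow = librosa.effects.time_stretch(y, rate=0.5)  # twice as long, same pitch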

  6. Human phoneme recognition depending on speech-intrinsic variability.

    PubMed

    Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger

    2010-11-01

    The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).
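
    A rough sketch of a spectral level distance in the sense described, i.e., the mean dB difference between a speech segment's spectrum and the masking-noise spectrum; the study's exact measure may differ, and a true long-term noise spectrum would be averaged over many frames.

        import numpy as np

        def spectral_level_distance(segment, noise, n_fft=2048):
            """Mean dB level difference between segment and noise spectra."""
            s_db = 20 * np.log10(np.abs(np.fft.rfft(segment, n_fft)) + 1e-12)
            n_db = 20 * np.log10(np.abs(np.fft.rfft(noise, n_fft)) + 1e-12)
            return float(np.mean(s_db - n_db))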

  7. Everyday listeners' impressions of speech produced by individuals with adductor spasmodic dysphonia.

    PubMed

    Nagle, Kathleen F; Eadie, Tanya L; Yorkston, Kathryn M

    2015-01-01

    Individuals with adductor spasmodic dysphonia (ADSD) have reported that unfamiliar communication partners appear to judge them as sneaky, nervous or not intelligent, apparently based on the quality of their speech; however, there is minimal research into the actual everyday perspective of listening to ADSD speech. The purpose of this study was to investigate the impressions of listeners hearing ADSD speech for the first time using a mixed-methods design. Everyday listeners were interviewed following sessions in which they made ratings of ADSD speech. A semi-structured interview approach was used and data were analyzed using thematic content analysis. Three major themes emerged: (1) everyday listeners make judgments about speakers with ADSD; (2) ADSD speech does not sound normal to everyday listeners; and (3) rating overall severity is difficult for everyday listeners. Participants described ADSD speech similarly to existing literature; however, some listeners inaccurately extrapolated speaker attributes based solely on speech samples. Listeners may draw erroneous conclusions about individuals with ADSD and these biases may affect the communicative success of these individuals. Results have implications for counseling individuals with ADSD, as well as the need for education and awareness about ADSD. Copyright © 2015 Elsevier Inc. All rights reserved.

  8. Autonomic Correlates of Speech Versus Nonspeech Tasks in Children and Adults

    PubMed Central

    Arnold, Hayley S.; MacPherson, Megan K.; Smith, Anne

    2015-01-01

    Purpose To assess autonomic arousal associated with speech and nonspeech tasks in school-age children and young adults. Method Measures of autonomic arousal (electrodermal level, electrodermal response amplitude, blood pulse volume, and heart rate) were recorded prior to, during, and after the performance of speech and nonspeech tasks by twenty 7- to 9-year-old children and twenty 18- to 22-year-old adults. Results Across age groups, autonomic arousal was higher for speech tasks compared with nonspeech tasks, based on peak electrodermal response amplitude and blood pulse volume. Children demonstrated greater relative arousal, based on heart rate and blood pulse volume, for nonspeech oral motor tasks than adults but showed similar mean arousal levels for speech tasks as adults. Children demonstrated sex differences in autonomic arousal; specifically, autonomic arousal remained high for school-age boys but not girls in a more complex open-ended narrative task that followed a simple sentence production task. Conclusions Speech tasks elicit greater autonomic arousal than nonspeech tasks, and children demonstrate greater autonomic arousal for nonspeech oral motor tasks than adults. Sex differences in autonomic arousal associated with speech tasks in school-age children are discussed relative to speech-language differences between boys and girls. PMID:24686989

  9. Predefined Redundant Dictionary for Effective Depth Maps Representation

    NASA Astrophysics Data System (ADS)

    Sebai, Dorsaf; Chaieb, Faten; Ghorbel, Faouzi

    2016-01-01

    The multi-view video plus depth (MVD) video format consists of two components, texture and depth map, where a combination of these components enables a receiver to generate arbitrary virtual views. However, MVD is a very voluminous video format that requires compression for storage and especially for transmission. Conventional codecs are efficient for texture image compression but are not suited to the intrinsic properties of depth maps. Depth images are indeed characterized by areas of smoothly varying grey levels separated by sharp discontinuities at object boundaries. Preserving these characteristics is important to enable high-quality view synthesis at the receiver side. In this paper, sparse representation of depth maps is discussed. It is shown that a significant gain in sparsity is achieved when particular mixed dictionaries are used for approximating these types of images with greedy selection strategies. Experiments confirm the method's effectiveness at producing sparse representations and its competitiveness with respect to candidate state-of-the-art dictionaries. Finally, the resulting method is shown to be effective for depth-map compression and to offer an advantage over the ongoing 3D high efficiency video coding compression standard, particularly at medium and high bitrates.
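
    Greedy sparse approximation over a dictionary can be sketched with orthogonal matching pursuit, here via scikit-learn; the random dictionary and signal below merely stand in for the paper's mixed dictionaries and vectorized depth-map blocks.

        import numpy as np
        from sklearn.linear_model import OrthogonalMatchingPursuit

        rng = np.random.default_rng(0)
        D = rng.standard_normal((64, 256))      # 256 atoms of dimension 64
        D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
        y = rng.standard_normal(64)             # one vectorized 8x8 depth block

        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=8, fit_intercept=False)
        omp.fit(D, y)                           # greedy selection of 8 atoms
        approx = D @ omp.coef_
        print("residual energy:", float(np.sum((y - approx) ** 2)))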

  10. Development of The Viking Speech Scale to classify the speech of children with cerebral palsy.

    PubMed

    Pennington, Lindsay; Virella, Daniel; Mjøen, Tone; da Graça Andrada, Maria; Murray, Janice; Colver, Allan; Himmelmann, Kate; Rackauskaite, Gija; Greitane, Andra; Prasauskiene, Audrone; Andersen, Guro; de la Cruz, Javier

    2013-10-01

    Surveillance registers monitor the prevalence of cerebral palsy and the severity of resulting impairments across time and place. The motor disorders of cerebral palsy can affect children's speech production and limit their intelligibility. We describe the development of a scale to classify children's speech performance for use in cerebral palsy surveillance registers, and its reliability across raters and across time. Speech and language therapists, other healthcare professionals and parents classified the speech of 139 children with cerebral palsy (85 boys, 54 girls; mean age 6.03 years, SD 1.09) from observation and previous knowledge of the children. Another group of health professionals rated children's speech from information in their medical notes. With the exception of parents, raters reclassified children's speech at least four weeks after their initial classification. Raters were asked to rate how easy the scale was to use and how well the scale described the child's speech production using Likert scales. Inter-rater reliability was moderate to substantial (k>.58 for all comparisons). Test-retest reliability was substantial to almost perfect for all groups (k>.68). Over 74% of raters found the scale easy or very easy to use; 66% of parents and over 70% of health care professionals judged the scale to describe children's speech well or very well. We conclude that the Viking Speech Scale is a reliable tool to describe the speech performance of children with cerebral palsy, which can be applied through direct observation of children or through case note review. Copyright © 2013 Elsevier Ltd. All rights reserved.

  11. Analyzing crowdsourced ratings of speech-based take-over requests for automated driving.

    PubMed

    Bazilinskyy, P; de Winter, J C F

    2017-10-01

    Take-over requests in automated driving should fit the urgency of the traffic situation. The robustness of various published research findings on the valuations of speech-based warning messages is unclear. This research aimed to establish how people value speech-based take-over requests as a function of speech rate, background noise, spoken phrase, and speaker's gender and emotional tone. By means of crowdsourcing, 2669 participants from 95 countries listened to a random 10 out of 140 take-over requests, and rated each take-over request on urgency, commandingness, pleasantness, and ease of understanding. Our results replicate several published findings, in particular that an increase in speech rate results in a monotonic increase of perceived urgency. The female voice was easier to understand than a male voice when there was a high level of background noise, a finding that contradicts the literature. Moreover, a take-over request spoken with Indian accent was found to be easier to understand by participants from India than by participants from other countries. Our results replicate effects in the literature regarding speech-based warnings, and shed new light on effects of background noise, gender, and nationality. The results may have implications for the selection of appropriate take-over requests in automated driving. Additionally, our study demonstrates the promise of crowdsourcing for testing human factors and ergonomics theories with large sample sizes. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Is Language a Factor in the Perception of Foreign Accent Syndrome?

    PubMed

    Jose, Linda; Read, Jennifer; Miller, Nick

    2016-06-01

    Neurogenic foreign accent syndrome (FAS) is diagnosed when listeners perceive speech associated with motor speech impairments as foreign rather than disordered. Speakers with foreign accent syndrome typically have aphasia. It remains unclear how far language changes might contribute to the perception of foreign accent syndrome independent of accent. Judges with and without training in language analysis rated orthographic transcriptions of speech from people with foreign accent syndrome, speech-language disorder and no foreign accent syndrome, foreign accent without neurological impairment and healthy controls on scales of foreignness, normalness and disorderedness. Control speakers were judged as significantly more normal, less disordered and less foreign than other groups. Foreign accent syndrome speakers' transcriptions consistently profiled most closely to those of foreign speakers and significantly different to speakers with speech-language disorder. On normalness and foreignness ratings there were no significant differences between foreign and foreign accent syndrome speakers. For disorderedness, foreign accent syndrome participants fell midway between foreign speakers and those with speech-language impairment only. Slower rate, more hesitations, pauses within and between utterances influenced judgments, delineating control scripts from others. Word-level syntactic and morphological deviations and reduced syntactic and semantic repertoire linked strongly with foreignness perceptions. Greater disordered ratings related to word fragments, poorly intelligible grammatical structures and inappropriate word selection. Language changes influence foreignness perception. Clinical and theoretical issues are addressed.

  13. Emotional and physiological responses of fluent listeners while watching the speech of adults who stutter.

    PubMed

    Guntupalli, Vijaya K; Everhart, D Erik; Kalinowski, Joseph; Nanjundeswaran, Chayadevie; Saltuklaroglu, Tim

    2007-01-01

    People who stutter produce speech that is characterized by intermittent, involuntary part-word repetitions and prolongations. In addition to these signature acoustic manifestations, those who stutter often display repetitive and fixated behaviours outside the speech producing mechanism (e.g. in the head, arm, fingers, nares, etc.). Previous research has examined the attitudes and perceptions of those who stutter and people who frequently interact with them (e.g. relatives, parents, employers). Results have shown an unequivocal, powerful and robust negative stereotype despite a lack of defined differences in personality structure between people who stutter and normally fluent individuals. However, physiological investigations of listener responses during moments of stuttering are limited. There is a need for data that simultaneously examine physiological responses (e.g. heart rate and galvanic skin conductance) and subjective behavioural responses to stuttering. The pairing of these objective and subjective data may provide information that casts light on the genesis of negative stereotypes associated with stuttering, the development of compensatory mechanisms in those who stutter, and the true impact of stuttering on senders and receivers alike. To compare the emotional and physiological responses of fluent speakers while listening and observing fluent and severe stuttered speech samples. Twenty adult participants (mean age = 24.15 years, standard deviation = 3.40) observed speech samples of two fluent speakers and two speakers who stutter reading aloud. Participants' skin conductance and heart rate changes were measured as physiological responses to stuttered or fluent speech samples. Participants' subjective responses on arousal (excited-calm) and valence (happy-unhappy) dimensions were assessed via the Self-Assessment Manikin (SAM) rating scale with an additional questionnaire comprised of a set of nine bipolar adjectives. Results showed significantly increased skin conductance and lower mean heart rate during the presentation of stuttered speech relative to the presentation of fluent speech samples (p<0.05). Listeners also self-rated themselves as being more aroused, unhappy, nervous, uncomfortable, sad, tensed, unpleasant, avoiding, embarrassed, and annoyed while viewing stuttered speech relative to the fluent speech. These data support the notion that stutter-filled speech can elicit physiological and emotional responses in listeners. Clinicians who treat stuttering should be aware that listeners show involuntary physiological responses to moderate-severe stuttering that probably remain salient over time and contribute to the evolution of negative stereotypes of people who stutter. With this in mind, it is hoped that clinicians can work with people who stutter to develop appropriate coping strategies. The role of amygdala and mirror neural mechanism in physiological and subjective responses to stuttering is discussed.

  14. The role of accent imitation in sensorimotor integration during processing of intelligible speech

    PubMed Central

    Adank, Patti; Rueschemeyer, Shirley-Ann; Bekkering, Harold

    2013-01-01

    Recent theories on how listeners maintain perceptual invariance despite variation in the speech signal allocate a prominent role to imitation mechanisms. Notably, these simulation accounts propose that motor mechanisms support perception of ambiguous or noisy signals. Indeed, imitation of ambiguous signals, e.g., accented speech, has been found to aid effective speech comprehension. Here, we explored the possibility that imitation in speech benefits perception by increasing activation in speech perception and production areas. Participants rated the intelligibility of sentences spoken in an unfamiliar accent of Dutch in a functional Magnetic Resonance Imaging experiment. Next, participants in one group repeated the sentences in their own accent, while a second group vocally imitated the accent. Finally, both groups rated the intelligibility of accented sentences in a post-test. The neuroimaging results showed an interaction between type of training and pre- and post-test sessions in left Inferior Frontal Gyrus, Supplementary Motor Area, and left Superior Temporal Sulcus. Although alternative explanations such as task engagement and fatigue need to be considered as well, the results suggest that imitation may aid effective speech comprehension by supporting sensorimotor integration. PMID:24109447

  15. Early speech development in Koolen de Vries syndrome limited by oral praxis and hypotonia.

    PubMed

    Morgan, Angela T; Haaften, Leenke van; van Hulst, Karen; Edley, Carol; Mei, Cristina; Tan, Tiong Yang; Amor, David; Fisher, Simon E; Koolen, David A

    2018-01-01

    Communication disorder is common in Koolen de Vries syndrome (KdVS), yet its specific symptomatology has not been examined, limiting prognostic counselling and the application of targeted therapies. Here we examine the communication phenotype associated with KdVS. Twenty-nine participants (12 males, 4 with KANSL1 variants, 25 with 17q21.31 microdeletion), aged 1.0-27.0 years, were assessed for oral-motor, speech, language, literacy, and social functioning. Early history included hypotonia and feeding difficulties. Speech and language development was delayed and atypical from the onset of first words (2;5-3;5 years of age on average). Speech was characterised by apraxia (100%) and dysarthria (93%), with stuttering in some (17%). Speech therapy and multi-modal communication (e.g., sign language) were critical in preschool. Receptive and expressive language abilities were typically commensurate (79%), both being severely affected relative to peers. Children were sociable with a desire to communicate, although some (36%) had pragmatic impairments in domains where higher-level language was required. A common phenotype was identified, including an overriding 'double hit' of oral hypotonia and apraxia in infancy and preschool, associated with severely delayed speech development. Remarkably, however, speech prognosis was positive; apraxia resolved, and although dysarthria persisted, children were intelligible by mid-to-late childhood. In contrast, language and literacy deficits persisted, and pragmatic deficits were apparent. Children with KdVS require early, intensive speech motor and language therapy, with targeted literacy and social language interventions as developmentally appropriate. Greater understanding of the linguistic phenotype may help unravel the relevance of KANSL1 to child speech and language development.

  16. Multi-microphone adaptive array augmented with visual cueing.

    PubMed

    Gibson, Paul L; Hedin, Dan S; Davies-Venn, Evelyn E; Nelson, Peggy; Kramer, Kevin

    2012-01-01

    We present the development of an audiovisual array that enables hearing aid users to converse with multiple speakers in reverberant environments with significant speech-babble noise, where their hearing aids do not function well. The system concept consists of a smartphone, a smartphone accessory, and a smartphone software application. The smartphone accessory concept is a multi-microphone audiovisual array in a form factor that allows attachment to the back of the smartphone. The accessory will also contain a low-power radio by which it can transmit audio signals to compatible hearing aids. The smartphone software application concept will use the smartphone's built-in camera to acquire images and perform real-time face detection using the built-in face detection support of the smartphone. The audiovisual beamforming algorithm uses the location of talking targets to improve the signal-to-noise ratio and consequently improve the user's speech intelligibility. Since the proposed array system leverages a handheld consumer electronic device, it will be portable and low cost. A PC-based experimental system was developed to demonstrate the feasibility of an audiovisual multi-microphone array, and these results are presented.
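
    The audio side of such an array can be illustrated with a plain delay-and-sum beamformer steered toward a bearing supplied by face detection; the linear geometry and all values below are hypothetical, not the system's actual algorithm.

        import numpy as np

        def delay_and_sum(signals, mic_x, angle_rad, fs, c=343.0):
            """signals: (n_mics, n) array; mic_x: mic positions on a line (m)."""
            n_mics, n = signals.shape
            delays = mic_x * np.sin(angle_rad) / c      # per-mic steering delay (s)
            delays -= delays.min()
            freqs = np.fft.rfftfreq(n, d=1.0 / fs)
            out = np.zeros(n)
            for sig, tau in zip(signals, delays):       # fractional delay via frequency domain
                out += np.fft.irfft(np.fft.rfft(sig) * np.exp(-2j * np.pi * freqs * tau), n)
            return out / n_mics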

  17. The Role of Efficient XML Interchange (EXI) in Navy Wide-Area Network (WAN) Optimization

    DTIC Science & Technology

    2015-03-01

    ... compress, and re-encrypt data to continue providing optimization through compression; however, that capability requires careful consideration of ... optimization of encrypted data requires a careful analysis and comparison of performance improvements and IA vulnerabilities. It is important ... Contained EXI capitalizes on multiple techniques to improve compression, and they vary depending on a set of EXI options passed to the codec ...

  18. Comparison of single-microphone noise reduction schemes: can hearing impaired listeners tell the difference?

    PubMed

    Huber, Rainer; Bisitz, Thomas; Gerkmann, Timo; Kiessling, Jürgen; Meister, Hartmut; Kollmeier, Birger

    2018-06-01

    The perceived quality of nine different single-microphone noise reduction (SMNR) algorithms was evaluated and compared in subjective listening tests with normal-hearing and hearing-impaired (HI) listeners. Speech samples mixed with traffic noise or party noise were processed by the SMNR algorithms. Subjects rated the amount of speech distortion, the intrusiveness of the background noise, listening effort, and overall quality, using a simplified MUSHRA (ITU-R, 2003) assessment method. 18 normal-hearing and 18 moderately HI subjects participated in the study. Significant differences between the rating behaviours of the two subject groups were observed: while normal-hearing subjects clearly differentiated between the SMNR algorithms, HI subjects rated all processed signals very similarly. Moreover, HI subjects rated the speech distortions of the unprocessed, noisier signals as being more severe than the distortions of the processed signals, in contrast to normal-hearing subjects. It seems harder for HI listeners to distinguish between additive noise and speech distortions, and/or they may have a different understanding of the term "speech distortion" than normal-hearing listeners have. The findings confirm that the evaluation of SMNR schemes for hearing aids should always involve HI listeners.

  19. Content analysis of the professional journal of the Royal College of Speech and Language Therapists, III: 1966-2015-into the 21st century.

    PubMed

    Armstrong, Linda; Stansfield, Jois; Bloch, Steven

    2017-11-01

    Following content analyses of the first 30 years of the UK speech and language therapy professional body's journal, this study surveyed the published work of the speech (and language) therapy profession over the last 50 years and traced key changes and themes, in order to better understand the development of the UK speech and language therapy profession over that period. All volumes of the professional journal of the Royal College of Speech and Language Therapists published between 1966 and 2015 (British Journal of Disorders of Communication, European Journal of Disorders of Communication and International Journal of Language and Communication Disorders) were examined using content analysis. The content was compared with that of the same journal as it appeared from 1935 to 1965. The journal has shown a trend towards more multi-authored and international papers, and a formalization of research methodologies. The volume of papers has increased considerably. Topic areas have expanded, but retain many of the areas of study found in earlier issues of the journal. The journal and its articles reflect the growing complexity of conditions being researched by speech and language therapists and their professional colleagues and give an indication of the developing evidence base for intervention and the diverse routes which speech and language therapy practice has taken over the last 50 years. © 2017 Royal College of Speech and Language Therapists.

  20. Multi-sensory learning and learning to read.

    PubMed

    Blomert, Leo; Froyen, Dries

    2010-09-01

    The basis of literacy acquisition in alphabetic orthographies is the learning of the associations between the letters and the corresponding speech sounds. In spite of this primacy in learning to read, there is only scarce knowledge on how this audiovisual integration process works and which mechanisms are involved. Recent electrophysiological studies of letter-speech sound processing have revealed that normally developing readers take years to automate these associations and dyslexic readers hardly exhibit automation of these associations. It is argued that the reason for this effortful learning may reside in the nature of the audiovisual process that is recruited for the integration of in principle arbitrarily linked elements. It is shown that letter-speech sound integration does not resemble the processes involved in the integration of natural audiovisual objects such as audiovisual speech. The automatic symmetrical recruitment of the assumedly uni-sensory visual and auditory cortices in audiovisual speech integration does not occur for letter and speech sound integration. It is also argued that letter-speech sound integration only partly resembles the integration of arbitrarily linked unfamiliar audiovisual objects. Letter-sound integration and artificial audiovisual objects share the necessity of a narrow time window for integration to occur. However, they differ from these artificial objects, because they constitute an integration of partly familiar elements which acquire meaning through the learning of an orthography. Although letter-speech sound pairs share similarities with audiovisual speech processing as well as with unfamiliar, arbitrary objects, it seems that letter-speech sound pairs develop into unique audiovisual objects that furthermore have to be processed in a unique way in order to enable fluent reading and thus very likely recruit other neurobiological learning mechanisms than the ones involved in learning natural or arbitrary unfamiliar audiovisual associations. Copyright 2010 Elsevier B.V. All rights reserved.

  1. Sparse/DCT (S/DCT) two-layered representation of prediction residuals for video coding.

    PubMed

    Kang, Je-Won; Gabbouj, Moncef; Kuo, C-C Jay

    2013-07-01

    In this paper, we propose a cascaded sparse/DCT (S/DCT) two-layer representation of prediction residuals, and implement this idea on top of the state-of-the-art high efficiency video coding (HEVC) standard. First, a dictionary is adaptively trained to contain featured patterns of residual signals so that a high portion of the energy in a structured residual can be efficiently coded via sparse coding. It is observed that the sparse representation alone yields inferior rate-distortion (R-D) performance due to the side-information overhead at higher bit rates. To overcome this problem, the DCT representation is cascaded at the second stage. It is applied to the remaining signal to improve coding efficiency. The two representations successfully complement each other. Experimental results demonstrate that the proposed algorithm outperforms the HEVC reference codec HM5.0 under the Common Test Conditions.
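
    A toy sketch of the cascade: a few matching-pursuit steps over a dictionary remove the structured part of a residual block, and a DCT is then applied to what remains. The random dictionary stands in for the adaptively trained one, and no quantization or entropy coding is shown.

        import numpy as np
        from scipy.fft import dct

        rng = np.random.default_rng(1)
        D = rng.standard_normal((64, 128)); D /= np.linalg.norm(D, axis=0)
        r = rng.standard_normal(64)             # one 8x8 prediction-residual block

        # Layer 1: a few greedy (matching pursuit) steps -> sparse layer
        res, coeffs = r.copy(), {}
        for _ in range(4):
            idx = int(np.argmax(np.abs(D.T @ res)))
            c = float(D[:, idx] @ res)
            coeffs[idx] = coeffs.get(idx, 0.0) + c
            res -= c * D[:, idx]

        # Layer 2: DCT of the remaining signal -> transform layer
        dct_coeffs = dct(res, norm="ortho")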

  2. Carry-over fluency induced by extreme prolongations: A new behavioral paradigm.

    PubMed

    Briley, P M; Barnes, M P; Kalinowski, J S

    2016-04-01

    Extreme prolongations, which can be generated via extreme delayed auditory feedback (DAF) (e.g., 250-500 ms) or mediated cognitively with timing applications (e.g., an analog stopwatch) at 2 s per syllable, have long been behavioral techniques used to inhibit stuttering. Some therapies have used this rate solely to establish initial fluency, while others use extremely slowed speech to establish fluency and add other strategic techniques such as easy onsets and diaphragmatic breathing. Extreme prolongations generate effective, efficient, and immediate forward-flowing fluent speech, removing the signature behaviors of discrete stuttering (i.e., syllable repetitions and audible and inaudible postural fixations). Prolonged use of extreme prolongations establishes carry-over fluency, which is spontaneous, effortless speech absent of most, if not all, overt and covert manifestations of stuttering. The creation of this immediate fluency and the immense potential of extreme prolongations to generate long periods of carry-over fluency have been overlooked by researchers and clinicians alike. Clinicians depart from these longer prolongation durations as they attempt to achieve the same fluent results at a near-normal rate of speech. Clinicians assume they are re-teaching fluency and that slow rates will give rise to more normal rates with less control; but without carry-over fluency, controls and cognitive mediation are always needed for the inherently unstable speech systems of persons who stutter to experience fluent speech. The assumption is that the speech system is untenable without some ever-present level of cognitive and motoric monitoring. The goal becomes omnipresent "near normal rate sounding fluency" with continuous mediation via cognitive and motoric processes. This pursuit of "normal sounding fluency" continues despite ever-present relapse. Relapse has become so common that acceptance of stuttering is the new therapy modality, because relapse has come to be understood as somewhat inevitable. Researchers and clinicians fail to recognize that immediate amelioration of stuttering and its attendant carry-over fluency are signs of a different pathway to fluency. In this path, clinicians focus on extreme prolongations and the extent of their carry-over. While fluency is automatically generated under these extreme prolongations, communication at this rate in routine speaking tasks is not feasible. The perceived solution is a systematic reduction in the duration of these prolongations, which attempts to approximate "normal speech." Typically, the reintroduction of speech at a normalized rate precipitates a laborious style that is undesirable to the person who stutters (PWS) and is discontinued once the speaker departs from the comforts of the clinical setting. The inevitable typically occurs: the well-intentioned therapist instructs the PWS to focus on the techniques while speaking at a rate that is nearest normal speech, but the overlooked extreme prolongations are unlikely to ever be revisited. The foundation of this hypothesis is that the departure from fluency generators (e.g., extreme prolongations) is the cause of regression to the stuttering set point. In turn, we postulate that the continued use of extreme prolongations, as a solitary practice method, will establish and nurture different neural pathways that will create a modality of fluent speech, able to be experienced without cognitive or motoric mediation. This would therefore result in fewer occurrences of stuttering due to a phenomenon called carry-over fluency. Thus, we hypothesize that the use of extreme prolongations fosters neural pathways for fluent speech, which will result in carry-over fluency that does not require mediation by the speaker. Copyright © 2016 Elsevier Ltd. All rights reserved.

  3. The Role of Experience in the Perception of Phonetic Detail in Children's Speech: A Comparison between Speech-Language Pathologists and Clinically Untrained Listeners

    ERIC Educational Resources Information Center

    Munson, Benjamin; Johnson, Julie M.; Edwards, Jan

    2012-01-01

    Purpose: This study examined whether experienced speech-language pathologists (SLPs) differ from inexperienced people in their perception of phonetic detail in children's speech. Method: Twenty-one experienced SLPs and 21 inexperienced listeners participated in a series of tasks in which they used a visual-analog scale (VAS) to rate children's…

  4. Evaluation of NASA speech encoder

    NASA Technical Reports Server (NTRS)

    1976-01-01

    Techniques developed by NASA for spaceflight instrumentation were used in the design of a quantizer for speech decoding. Computer simulation of the actions of the quantizer was tested with synthesized and real speech signals. Results were evaluated by a phonetician. Topics discussed include the relationship between the number of quantizer levels and the required sampling rate; reconstruction of signals; digital filtering; speech recording, sampling, and storage; and processing of the results.

  5. Reduced efficiency of audiovisual integration for nonnative speech.

    PubMed

    Yi, Han-Gyol; Phelps, Jasmine E B; Smiljanic, Rajka; Chandrasekaran, Bharath

    2013-11-01

    The role of visual cues in native listeners' perception of speech produced by nonnative speakers has not been extensively studied. Native perception of English sentences produced by native English and Korean speakers in audio-only and audiovisual conditions was examined. Korean speakers were rated as more accented in audiovisual than in the audio-only condition. Visual cues enhanced word intelligibility for native English speech but less so for Korean-accented speech. Reduced intelligibility of Korean-accented audiovisual speech was associated with implicit visual biases, suggesting that listener-related factors partially influence the efficiency of audiovisual integration for nonnative speech perception.

  6. Expressed parental concern regarding childhood stuttering and the Test of Childhood Stuttering.

    PubMed

    Tumanova, Victoria; Choi, Dahye; Conture, Edward G; Walden, Tedra A

    The purpose of the present study was to determine whether the Test of Childhood Stuttering observational rating scales (TOCS; Gillam et al., 2009) (1) differed between parents who did versus did not express concern (independent from the TOCS) about their child's speech fluency; (2) correlated with children's frequency of stuttering measured during a child-examiner conversation; and (3) correlated with the length and complexity of children's utterances, as indexed by mean length of utterance (MLU). Participants were 183 young children ages 3:0-5:11. Ninety-one had parents who reported concern about their child's stuttering (65 boys, 26 girls) and 92 had parents who reported no such concern (50 boys, 42 girls). Participants' conversational speech during a child-examiner conversation was analyzed for (a) frequency of occurrence of stuttered and non-stuttered disfluencies, and (b) MLU. Besides expressing concern or lack thereof about their child's speech fluency, parents completed the TOCS observational rating scales documenting how often they observe different disfluency types in speech of their children, as well as disfluency-related consequences. There were three main findings. First, parents who expressed concern (independently from the TOCS) about their child's stuttering reported significantly higher scores on the TOCS Speech Fluency and Disfluency-Related Consequences rating scales. Second, children whose parents rated them higher on the TOCS Speech Fluency rating scale produced more stuttered disfluencies during a child-examiner conversation. Third, children with higher scores on the TOCS Disfluency-Related Consequences rating scale had shorter MLU during child-examiner conversation, across age and level of language ability. Findings support the use of the TOCS observational rating scales as one documentable, objective means to determine parental perception of and concern about their child's stuttering. Findings also support the notion that parents are reasonably accurate, if not reliable, judges of the quantity and quality (i.e., stuttered vs. non-stuttered) of their child's speech disfluencies. Lastly, findings that some children may decrease their verbal output in attempts to minimize instances of stuttering - as indexed by relatively low MLU and a high TOCS Disfluency-Related Consequences scores - provides strong support for sampling young children's speech and language across various situations to obtain the most representative index possible of the child's MLU and associated instances of stuttering. Copyright © 2018 Elsevier Inc. All rights reserved.

  7. Speaking Rate Characteristics of Elementary-School-Aged Children Who Do and Do Not Stutter

    ERIC Educational Resources Information Center

    Logan, Kenneth J.; Byrd, Courtney T.; Mazzocchi, Elizabeth M.; Gillam, Ronald B.

    2011-01-01

    Purpose: To compare articulation and speech rates of school-aged children who do and do not stutter across sentence priming, structured conversation, and narration tasks and to determine factors that predict children's speech and articulation rates. Method: 34 children who stutter (CWS) and 34 age- and gender-matched children who do not stutter…

  8. Assessing Speech Discrimination in Individual Infants

    ERIC Educational Resources Information Center

    Houston, Derek M.; Horn, David L.; Qi, Rong; Ting, Jonathan Y.; Gao, Sujuan

    2007-01-01

    Assessing speech discrimination skills in individual infants from clinical populations (e.g., infants with hearing impairment) has important diagnostic value. However, most infant speech discrimination paradigms have been designed to test group effects rather than individual differences. Other procedures suffer from high attrition rates. In this…

  9. Psychoacoustic cues to emotion in speech prosody and music.

    PubMed

    Coutinho, Eduardo; Dibben, Nicola

    2013-01-01

There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music that closely match the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain.

  10. Emotional speech synchronizes brains across listeners and engages large-scale dynamic brain networks

    PubMed Central

    Nummenmaa, Lauri; Saarimäki, Heini; Glerean, Enrico; Gotsopoulos, Athanasios; Jääskeläinen, Iiro P.; Hari, Riitta; Sams, Mikko

    2014-01-01

    Speech provides a powerful means for sharing emotions. Here we implement novel intersubject phase synchronization and whole-brain dynamic connectivity measures to show that networks of brain areas become synchronized across participants who are listening to emotional episodes in spoken narratives. Twenty participants' hemodynamic brain activity was measured with functional magnetic resonance imaging (fMRI) while they listened to 45-s narratives describing unpleasant, neutral, and pleasant events spoken in neutral voice. After scanning, participants listened to the narratives again and rated continuously their feelings of pleasantness–unpleasantness (valence) and of arousal–calmness. Instantaneous intersubject phase synchronization (ISPS) measures were computed to derive both multi-subject voxel-wise similarity measures of hemodynamic activity and inter-area functional dynamic connectivity (seed-based phase synchronization, SBPS). Valence and arousal time series were subsequently used to predict the ISPS and SBPS time series. High arousal was associated with increased ISPS in the auditory cortices and in Broca's area, and negative valence was associated with enhanced ISPS in the thalamus, anterior cingulate, lateral prefrontal, and orbitofrontal cortices. Negative valence affected functional connectivity of fronto-parietal, limbic (insula, cingulum) and fronto-opercular circuitries, and positive arousal affected the connectivity of the striatum, amygdala, thalamus, cerebellum, and dorsal frontal cortex. Positive valence and negative arousal had markedly smaller effects. We propose that high arousal synchronizes the listeners' sound-processing and speech-comprehension networks, whereas negative valence synchronizes circuitries supporting emotional and self-referential processing. PMID:25128711
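
    The core ISPS computation is compact enough to sketch. Below is a minimal Python illustration, assuming band-passed BOLD time series and using the Hilbert transform to extract instantaneous phase; the paper's full pipeline additionally computes seed-based connectivity (SBPS) and regresses the synchronization time series against the valence and arousal ratings:

        import numpy as np
        from scipy.signal import hilbert

        def intersubject_phase_sync(bold):
            # bold: (n_subjects, n_timepoints) band-passed BOLD signals
            # from one voxel. Returns R(t) in [0, 1], the across-subject
            # phase coherence at each timepoint (1 = perfect synchrony).
            phases = np.angle(hilbert(bold, axis=1))
            return np.abs(np.mean(np.exp(1j * phases), axis=0))

        # toy usage: 20 subjects, 200 timepoints of a noisy shared rhythm
        rng = np.random.default_rng(0)
        t = np.arange(200)
        bold = np.sin(0.2 * t) + 0.5 * rng.standard_normal((20, 200))
        r_t = intersubject_phase_sync(bold)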

  11. Effects of the rate of formant-frequency variation on the grouping of formants in speech perception.

    PubMed

    Summers, Robert J; Bailey, Peter J; Roberts, Brian

    2012-04-01

    How speech is separated perceptually from other speech remains poorly understood. Recent research suggests that the ability of an extraneous formant to impair intelligibility depends on the modulation of its frequency, but not its amplitude, contour. This study further examined the effect of formant-frequency variation on intelligibility by manipulating the rate of formant-frequency change. Target sentences were synthetic three-formant (F1 + F2 + F3) analogues of natural utterances. Perceptual organization was probed by presenting stimuli dichotically (F1 + F2C + F3C; F2 + F3), where F2C + F3C constitute a competitor for F2 and F3 that listeners must reject to optimize recognition. Competitors were derived using formant-frequency contours extracted from extended passages spoken by the same talker and processed to alter the rate of formant-frequency variation, such that rate scale factors relative to the target sentences were 0, 0.25, 0.5, 1, 2, and 4 (0 = constant frequencies). Competitor amplitude contours were either constant, or time-reversed and rate-adjusted in parallel with the frequency contour. Adding a competitor typically reduced intelligibility; this reduction increased with competitor rate until the rate was at least twice that of the target sentences. Similarity in the results for the two amplitude conditions confirmed that formant amplitude contours do not influence across-formant grouping. The findings indicate that competitor efficacy is not tuned to the rate of the target sentences; most probably, it depends primarily on the overall rate of frequency variation in the competitor formants. This suggests that, when segregating the speech of concurrent talkers, differences in speech rate may not be a significant cue for across-frequency grouping of formants.

  12. Support for context effects on segmentation and segments depends on the context.

    PubMed

    Heffner, Christopher C; Newman, Rochelle S; Idsardi, William J

    2017-04-01

    Listeners must adapt to differences in speech rate across talkers and situations. Speech rate adaptation effects are strong for adjacent syllables (i.e., proximal syllables). For studies that have assessed adaptation effects on speech rate information more than one syllable removed from a point of ambiguity in speech (i.e., distal syllables), the difference in strength between different types of ambiguity is stark. Studies of word segmentation have shown large shifts in perception as a result of distal rate manipulations, while studies of segmental perception have shown only weak, or even nonexistent, effects. However, no study has standardized methods and materials to study context effects for both types of ambiguity simultaneously. Here, a set of sentences was created that differed as minimally as possible except for whether the sentences were ambiguous to the voicing of a consonant or ambiguous to the location of a word boundary. The sentences were then rate-modified to slow down the distal context speech rate to various extents, dependent on three different definitions of distal context that were adapted from previous experiments, along with a manipulation of proximal context to assess whether proximal effects were comparable across ambiguity types. The results indicate that the definition of distal influenced the extent of distal rate effects strongly for both segments and segmentation. They also establish the presence of distal rate effects on word-final segments for the first time. These results were replicated, with some caveats regarding the perception of individual segments, in an Internet-based sample recruited from Mechanical Turk.

  13. Variable frame rate transmission - A review of methodology and application to narrow-band LPC speech coding

    NASA Astrophysics Data System (ADS)

    Viswanathan, V. R.; Makhoul, J.; Schwartz, R. M.; Huggins, A. W. F.

    1982-04-01

The variable frame rate (VFR) transmission methodology developed, implemented, and tested in the years 1973-1978 for efficiently transmitting linear predictive coding (LPC) vocoder parameters extracted from the input speech at a fixed frame rate is reviewed. With the VFR method, parameters are transmitted only when their values have changed sufficiently over the interval since their preceding transmission. Two distinct approaches to automatic implementation of the VFR method are discussed. The first bases the transmission decisions on comparisons between the parameter values of the present frame and the last transmitted frame. The second, which is based on a functional perceptual model of speech, compares the parameter values of all the frames that lie in the interval between the present frame and the last transmitted frame against a linear model of parameter variation over that interval. Also considered is the application of VFR transmission to the design of narrow-band LPC speech coders with average bit rates of 2000-2400 bits/s.
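
    The first (frame-comparison) approach reduces to a simple transmit-or-skip rule. A minimal sketch, assuming a Euclidean distance between parameter vectors as the change measure (the paper's distance measures and thresholds are more refined):

        import numpy as np

        def vfr_select(frames, threshold):
            # frames: (n_frames, n_params) LPC parameters extracted at a
            # fixed analysis rate. Transmit a frame only when it differs
            # enough from the last transmitted frame; the receiver
            # interpolates between transmitted frames.
            sent = [0]  # always transmit the first frame
            for i in range(1, len(frames)):
                if np.linalg.norm(frames[i] - frames[sent[-1]]) > threshold:
                    sent.append(i)
            return sent  # indices of transmitted frames

    The average bit rate then scales with the fraction of frames actually transmitted.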

  14. A Comparison of LBG and ADPCM Speech Compression Techniques

    NASA Astrophysics Data System (ADS)

    Bachu, Rajesh G.; Patel, Jignasa; Barkana, Buket D.

Speech compression is the technology of converting human speech into an efficiently encoded representation that can later be decoded to produce a close approximation of the original signal. In all speech there is a degree of predictability, and speech coding techniques exploit this to reduce bit rates while maintaining a suitable level of quality. This paper is a study and implementation of the Linde-Buzo-Gray (LBG) and Adaptive Differential Pulse Code Modulation (ADPCM) algorithms for compressing speech signals. We implemented the methods in MATLAB 7.0. Both methods gave good compression performance, and listening tests showed that efficient, high-quality coding was achieved.
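
    For concreteness, a minimal NumPy sketch of LBG codebook training by binary splitting (parameter names are illustrative; the paper's MATLAB implementation and its ADPCM counterpart are not reproduced here):

        import numpy as np

        def lbg(vectors, n_codewords, eps=0.01, n_iter=20):
            # vectors: (n_vectors, dim) training set of speech frames.
            # Grow the codebook by splitting each codeword into a
            # (1+eps)/(1-eps) pair, then refine with Lloyd iterations;
            # splitting rounds n_codewords up to a power of two.
            codebook = vectors.mean(axis=0, keepdims=True)
            while len(codebook) < n_codewords:
                codebook = np.vstack([codebook * (1 + eps),
                                      codebook * (1 - eps)])
                for _ in range(n_iter):
                    dist = ((vectors[:, None, :] - codebook[None]) ** 2).sum(-1)
                    nearest = dist.argmin(axis=1)
                    for k in range(len(codebook)):
                        members = vectors[nearest == k]
                        if len(members):
                            codebook[k] = members.mean(axis=0)
            return codebook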

  15. Speech analyzer

    NASA Technical Reports Server (NTRS)

    Lokerson, D. C. (Inventor)

    1977-01-01

    A speech signal is analyzed by applying the signal to formant filters which derive first, second and third signals respectively representing the frequency of the speech waveform in the first, second and third formants. A first pulse train having approximately a pulse rate representing the average frequency of the first formant is derived; second and third pulse trains having pulse rates respectively representing zero crossings of the second and third formants are derived. The first formant pulse train is derived by establishing N signal level bands, where N is an integer at least equal to two. Adjacent ones of the signal bands have common boundaries, each of which is a predetermined percentage of the peak level of a complete cycle of the speech waveform.
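
    The second and third pulse trains amount to zero-crossing detectors applied to the formant-filtered signal. A minimal sketch of that step (the formant filters themselves and the N-band first-formant logic are omitted):

        import numpy as np

        def zero_crossing_pulses(x):
            # Indices where the band-filtered formant signal changes
            # sign; the resulting pulse rate tracks formant frequency.
            return np.where(np.signbit(x[:-1]) != np.signbit(x[1:]))[0]

        # e.g. a 1 kHz tone sampled at 8 kHz yields ~2000 crossings/s
        fs = 8000
        t = np.arange(fs) / fs
        pulses = zero_crossing_pulses(np.sin(2 * np.pi * 1000 * t))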

  16. Investigations in mechanisms and strategies to enhance hearing with cochlear implants

    NASA Astrophysics Data System (ADS)

    Churchill, Tyler H.

    Cochlear implants (CIs) produce hearing sensations by stimulating the auditory nerve (AN) with current pulses whose amplitudes are modulated by filtered acoustic temporal envelopes. While this technology has provided hearing for multitudinous CI recipients, even bilaterally-implanted listeners have more difficulty understanding speech in noise and localizing sounds than normal hearing (NH) listeners. Three studies reported here have explored ways to improve electric hearing abilities. Vocoders are often used to simulate CIs for NH listeners. Study 1 was a psychoacoustic vocoder study examining the effects of harmonic carrier phase dispersion and simulated CI current spread on speech intelligibility in noise. Results showed that simulated current spread was detrimental to speech understanding and that speech vocoded with carriers whose components' starting phases were equal was the least intelligible. Cross-correlogram analyses of AN model simulations confirmed that carrier component phase dispersion resulted in better neural envelope representation. Localization abilities rely on binaural processing mechanisms in the brainstem and mid-brain that are not fully understood. In Study 2, several potential mechanisms were evaluated based on the ability of metrics extracted from stereo AN simulations to predict azimuthal locations. Results suggest that unique across-frequency patterns of binaural cross-correlation may provide a strong cue set for lateralization and that interaural level differences alone cannot explain NH sensitivity to lateral position. While it is known that many bilateral CI users are sensitive to interaural time differences (ITDs) in low-rate pulsatile stimulation, most contemporary CI processing strategies use high-rate, constant-rate pulse trains. In Study 3, we examined the effects of pulse rate and pulse timing on ITD discrimination, ITD lateralization, and speech recognition by bilateral CI listeners. Results showed that listeners were able to use low-rate pulse timing cues presented redundantly on multiple electrodes for ITD discrimination and lateralization of speech stimuli even when mixed with high rates on other electrodes. These results have contributed to a better understanding of those aspects of the auditory system that support speech understanding and binaural hearing, suggested vocoder parameters that may simulate aspects of electric hearing, and shown that redundant, low-rate pulse timing supports improved spatial hearing for bilateral CI listeners.

  17. Speech deterioration in amyotrophic lateral sclerosis (ALS) after manifestation of bulbar symptoms.

    PubMed

    Makkonen, Tanja; Ruottinen, Hanna; Puhto, Riitta; Helminen, Mika; Palmio, Johanna

    2018-03-01

The symptoms and their progression in amyotrophic lateral sclerosis (ALS) are typically studied after the diagnosis has been confirmed. However, many people with ALS already have severe dysarthria and loss of adequate speech at the time of diagnosis. Speech-and-language therapy interventions should be targeted in a timely manner, based on communicative need in ALS. To investigate how long natural speech will remain functional and to identify the changes in the speech of persons with ALS, altogether 30 consecutive participants were studied and divided into two groups based on the initial type of ALS, bulbar or spinal. Their speech disorder was evaluated for severity, articulation rate and intelligibility during the 2-year follow-up. The ability to speak deteriorated to a poor level, necessitating augmentative and alternative communication (AAC) methods, in 60% of the participants. Their speech remained adequate on average for 18 months from the first bulbar symptom. Severity, articulation rate and intelligibility declined in nearly all participants during the study. Initially, speech deteriorated more in the bulbar group than in the spinal group, and the difference remained throughout the follow-up with some exceptions. The onset of bulbar symptoms indicated the time to loss of speech better than the ALS diagnosis or the first speech therapy evaluation did. In clinical work, it is important to take the initial type of ALS into consideration when determining the urgency of AAC measures, as people with bulbar-onset ALS are more susceptible to delayed evaluation and AAC intervention. © 2017 Royal College of Speech and Language Therapists.

  18. Effects of hearing loss on speech recognition under distracting conditions and working memory in the elderly.

    PubMed

    Na, Wondo; Kim, Gibbeum; Kim, Gungu; Han, Woojae; Kim, Jinsook

    2017-01-01

The current study aimed to evaluate hearing-related changes in terms of speech-in-noise processing, fast-rate speech processing, and working memory, and to identify which of these three factors is significantly affected by age-related hearing loss. One hundred subjects aged 65-84 years participated in the study. They were classified into four groups ranging from normal hearing to moderate-to-severe hearing loss. All the participants were tested for speech perception in quiet and noisy conditions and for perception of time-compressed speech in quiet conditions. Forward- and backward-digit span tests were also conducted to measure the participants' working memory. 1) As the level of background noise increased, speech perception scores systematically decreased in all the groups. This pattern was more noticeable in the three hearing-impaired groups than in the normal hearing group. 2) As the speech rate increased, speech perception scores decreased. A significant interaction was found between speed of speech and hearing loss. In particular, sentences time-compressed by 30% revealed a clear differentiation between moderate hearing loss and moderate-to-severe hearing loss. 3) Although all the groups showed a longer span on the forward-digit span test than the backward-digit span test, there was no significant difference as a function of hearing loss. The degree of hearing loss strongly affects the recognition of babble-masked and time-compressed speech in the elderly but does not affect working memory. We expect these results to be applied to appropriate rehabilitation strategies for hearing-impaired elderly people who experience difficulty in communication.

  19. Co-speech iconic gestures and visuo-spatial working memory.

    PubMed

    Wu, Ying Choon; Coulson, Seana

    2014-11-01

    Three experiments tested the role of verbal versus visuo-spatial working memory in the comprehension of co-speech iconic gestures. In Experiment 1, participants viewed congruent discourse primes in which the speaker's gestures matched the information conveyed by his speech, and incongruent ones in which the semantic content of the speaker's gestures diverged from that in his speech. Discourse primes were followed by picture probes that participants judged as being either related or unrelated to the preceding clip. Performance on this picture probe classification task was faster and more accurate after congruent than incongruent discourse primes. The effect of discourse congruency on response times was linearly related to measures of visuo-spatial, but not verbal, working memory capacity, as participants with greater visuo-spatial WM capacity benefited more from congruent gestures. In Experiments 2 and 3, participants performed the same picture probe classification task under conditions of high and low loads on concurrent visuo-spatial (Experiment 2) and verbal (Experiment 3) memory tasks. Effects of discourse congruency and verbal WM load were additive, while effects of discourse congruency and visuo-spatial WM load were interactive. Results suggest that congruent co-speech gestures facilitate multi-modal language comprehension, and indicate an important role for visuo-spatial WM in these speech-gesture integration processes. Copyright © 2014 Elsevier B.V. All rights reserved.

  20. Cognitive-Perceptual Examination of Remediation Approaches to Hypokinetic Dysarthria

    ERIC Educational Resources Information Center

    McAuliffe, Megan J.; Kerr, Sarah E.; Gibson, Elizabeth M. R.; Anderson, Tim; LaShell, Patrick J.

    2014-01-01

    Purpose: To determine how increased vocal loudness and reduced speech rate affect listeners' cognitive-perceptual processing of hypokinetic dysarthric speech associated with Parkinson's disease. Method: Fifty-one healthy listener participants completed a speech perception experiment. Listeners repeated phrases produced by 5 individuals…

  1. Neural Entrainment to Rhythmically Presented Auditory, Visual, and Audio-Visual Speech in Children

    PubMed Central

    Power, Alan James; Mead, Natasha; Barnes, Lisa; Goswami, Usha

    2012-01-01

    Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal “samples” of information from the speech stream at different rates, phase resetting ongoing oscillations so that they are aligned with similar frequency bands in the input (“phase locking”). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable “ba,” presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a “talking head”). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the “ba” stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a “ba” in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling, such as dyslexia. PMID:22833726

  2. Adaptation of hidden Markov models for recognizing speech of reduced frame rate.

    PubMed

    Lee, Lee-Min; Jean, Fu-Rong

    2013-12-01

The frame rate of the observation sequence in distributed speech recognition applications may be reduced to suit a resource-limited front-end device. In order to use models trained on full-frame-rate data in the recognition of reduced frame-rate (RFR) data, we propose a method for adapting the transition probabilities of hidden Markov models (HMMs) to match the frame rate of the observation. Experiments on the recognition of clean and noisy connected digits are conducted to evaluate the proposed method. Experimental results show that the proposed method can effectively compensate for the frame-rate mismatch between the training and the test data. Using our adapted model to recognize the RFR speech data, one can significantly reduce the computation time and achieve the same level of accuracy as a method that restores the frame rate using data interpolation.
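
    One hedged reading of the frame-rate-matching idea: decimating the observations by an integer factor r means each surviving observation is r Markov steps from the previous one, so the full-rate transition matrix A can be replaced by its r-th power. A sketch under that assumption (the paper's exact adaptation formula may differ, e.g. for non-integer rate ratios):

        import numpy as np

        def adapt_transitions(A, r):
            # A: row-stochastic HMM transition matrix trained at the
            # full frame rate; r: integer decimation factor of the
            # reduced-frame-rate observations. Rows of A**r still sum
            # to 1, so the result is a valid transition matrix.
            return np.linalg.matrix_power(A, r)

        A = np.array([[0.9, 0.1, 0.0],
                      [0.0, 0.8, 0.2],
                      [0.0, 0.0, 1.0]])  # left-to-right digit model
        A_half_rate = adapt_transitions(A, 2)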

  3. Perceptually tuned low-bit-rate video codec for ATM networks

    NASA Astrophysics Data System (ADS)

    Chou, Chun-Hsien

    1996-02-01

In order to maintain high visual quality in transmitting low bit-rate video signals over asynchronous transfer mode (ATM) networks, a layered coding scheme that incorporates the human visual system (HVS), motion compensation (MC), and conditional replenishment (CR) is presented in this paper. An empirical perceptual model is proposed to estimate the spatio-temporal just-noticeable distortion (STJND) profile for each frame, by which perceptually important (PI) prediction-error signals can be located. Because of the limited channel capacity of the base layer, only coded data of motion vectors, the PI signals within a small strip of the prediction-error image and, if there are remaining bits, the PI signals outside the strip are transmitted by the cells of the base-layer channel. The rest of the coded data are transmitted by the second-layer cells, which may be lost due to channel error or network congestion. Simulation results show that visual quality of the reconstructed CIF sequence is acceptable when the base-layer channel is allocated 2 × 64 kbps and the cells of the second layer are all lost.
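
    The role of the STJND profile can be illustrated with a toy thresholding step: prediction-error samples below the local just-noticeable distortion are dropped, and only the perceptually important (PI) remainder competes for base-layer cells. (Illustrative only; the paper's empirical model derives the JND profile from combined spatial and temporal masking.)

        import numpy as np

        def perceptually_important(residual, jnd):
            # residual: motion-compensated prediction-error frame.
            # jnd: per-pixel spatio-temporal JND profile (same shape).
            # Samples at or below the visibility threshold are zeroed.
            mask = np.abs(residual) > jnd
            return np.where(mask, residual, 0), mask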

  4. Modifying Speech to Children based on their Perceived Phonetic Accuracy

    PubMed Central

    Julien, Hannah M.; Munson, Benjamin

    2014-01-01

Purpose: We examined the relationship between adults' perception of the accuracy of children's speech and the acoustic detail in their subsequent productions to children. Methods: Twenty-two adults participated in a task in which they rated the accuracy of 2- and 3-year-old children's word-initial /s/ and /∫/ using a visual analog scale (VAS), then produced a token of the same word as if they were responding to the child whose speech they had just rated. Results: The duration of adults' fricatives varied as a function of their perception of the accuracy of children's speech: longer fricatives were produced following productions that they rated as inaccurate. This tendency to modify duration in response to perceived inaccurate tokens was mediated by measures of self-reported experience interacting with children. However, speakers did not increase the spectral distinctiveness of their fricatives following the perception of inaccurate tokens. Conclusion: These results suggest that adults modify temporal features of their speech in response to perceiving children's inaccurate productions. These longer fricatives are potentially both enhanced input to children and an error-corrective signal. PMID:22744140

  5. Speech motor correlates of treatment-related changes in stuttering severity and speech naturalness.

    PubMed

    Tasko, Stephen M; McClean, Michael D; Runyan, Charles M

    2007-01-01

    Participants of stuttering treatment programs provide an opportunity to evaluate persons who stutter as they demonstrate varying levels of fluency. Identifying physiologic correlates of altered fluency levels may lead to insights about mechanisms of speech disfluency. This study examined respiratory, orofacial kinematic and acoustic measures in 35 persons who stutter prior to and as they were completing a 1-month intensive stuttering treatment program. Participants showed a marked reduction in stuttering severity as they completed the treatment program. Coincident with reduced stuttering severity, participants increased the amplitude and duration of speech breaths, reduced the rate of lung volume change during inspiration, reduced the amplitude and speed of lip movements early in the test utterance, increased lip and jaw movement durations, and reduced syllable rate. A multiple regression model that included two respiratory measures and one orofacial kinematic measure accounted for 62% of the variance in changes in stuttering severity. Finally, there was a weak but significant tendency for speech of participants with the largest reductions in stuttering severity to be rated as more unnatural as they completed the treatment program.

  6. Binary video codec for data reduction in wireless visual sensor networks

    NASA Astrophysics Data System (ADS)

    Khursheed, Khursheed; Ahmad, Naeem; Imran, Muhammad; O'Nils, Mattias

    2013-02-01

Wireless Visual Sensor Networks (WVSNs) are formed by deploying many Visual Sensor Nodes (VSNs) in the field. Typical applications of WVSNs include environmental monitoring, health care, industrial process monitoring, and stadium/airport monitoring for security, among many others. The energy budget in outdoor applications of WVSNs is limited to batteries, and frequent battery replacement is usually not desirable, so both the processing and the communication energy consumption of the VSN must be optimized for the network to remain functional for a long duration. The images captured by a VSN contain a huge amount of data and require efficient computational resources for processing and wide communication bandwidth for transmitting the results. Image processing algorithms must therefore be computationally simple and provide a high compression rate. For some applications of WVSNs, the captured images can be segmented into bi-level images, and bi-level image coding methods then efficiently reduce the information in these segmented images. But the compression rate of bi-level image coding is limited by the underlying compression algorithm, so there is a need for other intelligent, efficient algorithms that are computationally less complex and compress better. Change coding is one such algorithm: it is computationally simple (requiring only exclusive-OR operations) and compresses better than image coding, but it is effective only for applications with slight changes between adjacent frames of the video. Detecting and coding the Regions of Interest (ROIs) in the change frame further reduces the information in the change frame. However, if the number of objects in the change frames rises above a certain level, the compression efficiency of both change coding and ROI coding becomes worse than that of image coding. This paper explores the compression efficiency of the Binary Video Codec (BVC) for data reduction in WVSNs. We propose to implement all three compression techniques, i.e. image coding, change coding and ROI coding, at the VSN and then select the smallest bit stream among the three results. In this way the compression performance of the BVC never becomes worse than that of image coding. We conclude that the compression efficiency of BVC is always better than that of change coding and always better than or equal to that of ROI coding and image coding.
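
    The BVC selection rule itself is simple: encode the frame each way and keep the shortest bit stream. A toy sketch with run-length coding standing in for the underlying bi-level coder (an assumption; the paper leaves the coder choice open) and with the ROI stage omitted:

        import numpy as np

        def run_length(bits):
            # Toy bi-level coder: a list of (value, run-length) pairs.
            runs, i = [], 0
            while i < len(bits):
                j = i
                while j < len(bits) and bits[j] == bits[i]:
                    j += 1
                runs.append((int(bits[i]), j - i))
                i = j
            return runs

        def bvc_encode(prev, curr):
            # prev, curr: binary frames (uint8 arrays of 0/1).
            change = np.bitwise_xor(prev, curr)  # change coding: XOR only
            candidates = {
                'image': run_length(curr.ravel()),
                'change': run_length(change.ravel()),
            }
            # smallest bit stream wins, so the codec never does worse
            # than plain image coding
            return min(candidates.items(), key=lambda kv: len(kv[1]))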

  7. Speech Research: A Report on the Status and Progress of Studies on the Nature of Speech, Instrumentation for Its Investigation, and Practical Applications, January 1-March 31, 1981.

    ERIC Educational Resources Information Center

    Haskins Labs., New Haven, CT.

    Research reports on the nature of speech, instrumentation for the investigation of speech, and practical application of research are included in this status report for January 1-March 31, 1981. The reports deal with the following topics: (1) distinguishing temporal information for speaking rate from temporal information for intervocalic stop…

  8. Brain-to-text: decoding spoken phrases from phone representations in the brain.

    PubMed

    Herff, Christian; Heger, Dominic; de Pesters, Adriana; Telaar, Dominic; Brunner, Peter; Schalk, Gerwin; Schultz, Tanja

    2015-01-01

It has long been speculated whether communication between humans and machines based on natural speech related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones or one of a few isolated words. However, until now it remained an unsolved challenge to decode continuously spoken speech from the neural substrate associated with speech and language processing. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings. Specifically, we implemented a system, which we call Brain-To-Text, that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity while speaking into the corresponding textual representation. Our results demonstrate that our system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech.

  9. Brain-to-text: decoding spoken phrases from phone representations in the brain

    PubMed Central

    Herff, Christian; Heger, Dominic; de Pesters, Adriana; Telaar, Dominic; Brunner, Peter; Schalk, Gerwin; Schultz, Tanja

    2015-01-01

It has long been speculated whether communication between humans and machines based on natural speech related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones or one of a few isolated words. However, until now it remained an unsolved challenge to decode continuously spoken speech from the neural substrate associated with speech and language processing. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings. Specifically, we implemented a system, which we call Brain-To-Text, that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity while speaking into the corresponding textual representation. Our results demonstrate that our system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech. PMID:26124702

  10. Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes.

    PubMed

    Meyer, Bernd T; Brand, Thomas; Kollmeier, Birger

    2011-01-01

    The aim of this study is to quantify the gap between the recognition performance of human listeners and an automatic speech recognition (ASR) system with special focus on intrinsic variations of speech, such as speaking rate and effort, altered pitch, and the presence of dialect and accent. Second, it is investigated if the most common ASR features contain all information required to recognize speech in noisy environments by using resynthesized ASR features in listening experiments. For the phoneme recognition task, the ASR system achieved the human performance level only when the signal-to-noise ratio (SNR) was increased by 15 dB, which is an estimate for the human-machine gap in terms of the SNR. The major part of this gap is attributed to the feature extraction stage, since human listeners achieve comparable recognition scores when the SNR difference between unaltered and resynthesized utterances is 10 dB. Intrinsic variabilities result in strong increases of error rates, both in human speech recognition (HSR) and ASR (with a relative increase of up to 120%). An analysis of phoneme duration and recognition rates indicates that human listeners are better able to identify temporal cues than the machine at low SNRs, which suggests incorporating information about the temporal dynamics of speech into ASR systems.

  11. Studies in automatic speech recognition and its application in aerospace

    NASA Astrophysics Data System (ADS)

    Taylor, Michael Robinson

    Human communication is characterized in terms of the spectral and temporal dimensions of speech waveforms. Electronic speech recognition strategies based on Dynamic Time Warping and Markov Model algorithms are described and typical digit recognition error rates are tabulated. The application of Direct Voice Input (DVI) as an interface between man and machine is explored within the context of civil and military aerospace programmes. Sources of physical and emotional stress affecting speech production within military high performance aircraft are identified. Experimental results are reported which quantify fundamental frequency and coarse temporal dimensions of male speech as a function of the vibration, linear acceleration and noise levels typical of aerospace environments; preliminary indications of acoustic phonetic variability reported by other researchers are summarized. Connected whole-word pattern recognition error rates are presented for digits spoken under controlled Gz sinusoidal whole-body vibration. Correlations are made between significant increases in recognition error rate and resonance of the abdomen-thorax and head subsystems of the body. The phenomenon of vibrato style speech produced under low frequency whole-body Gz vibration is also examined. Interactive DVI system architectures and avionic data bus integration concepts are outlined together with design procedures for the efficient development of pilot-vehicle command and control protocols.

  12. Comparing the information conveyed by envelope modulation for speech intelligibility, speech quality, and music quality.

    PubMed

    Kates, James M; Arehart, Kathryn H

    2015-10-01

    This paper uses mutual information to quantify the relationship between envelope modulation fidelity and perceptual responses. Data from several previous experiments that measured speech intelligibility, speech quality, and music quality are evaluated for normal-hearing and hearing-impaired listeners. A model of the auditory periphery is used to generate envelope signals, and envelope modulation fidelity is calculated using the normalized cross-covariance of the degraded signal envelope with that of a reference signal. Two procedures are used to describe the envelope modulation: (1) modulation within each auditory frequency band and (2) spectro-temporal processing that analyzes the modulation of spectral ripple components fit to successive short-time spectra. The results indicate that low modulation rates provide the highest information for intelligibility, while high modulation rates provide the highest information for speech and music quality. The low-to-mid auditory frequencies are most important for intelligibility, while mid frequencies are most important for speech quality and high frequencies are most important for music quality. Differences between the spectral ripple components used for the spectro-temporal analysis were not significant in five of the six experimental conditions evaluated. The results indicate that different modulation-rate and auditory-frequency weights may be appropriate for indices designed to predict different types of perceptual relationships.
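
    The fidelity term at the heart of the analysis is a normalized cross-covariance between reference and degraded envelopes. A minimal sketch for a single auditory band (the full method applies this per band and per modulation rate, after an auditory-periphery model):

        import numpy as np

        def envelope_fidelity(env_ref, env_deg):
            # Normalized cross-covariance of two envelope signals;
            # 1.0 means the modulation is perfectly preserved.
            a = env_ref - env_ref.mean()
            b = env_deg - env_deg.mean()
            return float(a @ b / np.sqrt((a @ a) * (b @ b)))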

  13. Evaluating signal-to-noise ratios, loudness, and related measures as indicators of airborne sound insulation.

    PubMed

    Park, H K; Bradley, J S

    2009-09-01

Subjective ratings of the audibility, annoyance, and loudness of music and speech sounds transmitted through 20 different simulated walls were used to identify better single-number ratings of airborne sound insulation. The first part of this research considered standard measures such as the sound transmission class, the weighted sound reduction index (R(w)), and variations of these measures [H. K. Park and J. S. Bradley, J. Acoust. Soc. Am. 126, 208-219 (2009)]. This paper considers a number of other measures, including signal-to-noise ratios related to the intelligibility of speech and measures related to the loudness of sounds. An exploration of the importance of the included frequencies showed that the optimum ranges of included frequencies were different for speech and music sounds. Measures related to speech intelligibility were useful indicators of responses to speech sounds but were not as successful for music sounds. A-weighted level differences, signal-to-noise ratios and an A-weighted sound transmission loss measure were good predictors of responses when the included frequencies were optimized for each type of sound. The addition of new spectrum adaptation terms to R(w) values was found to be the most practical approach for achieving more accurate predictions of subjective ratings of transmitted speech and music sounds.

  14. Comparing the information conveyed by envelope modulation for speech intelligibility, speech quality, and music quality

    PubMed Central

    Kates, James M.; Arehart, Kathryn H.

    2015-01-01

    This paper uses mutual information to quantify the relationship between envelope modulation fidelity and perceptual responses. Data from several previous experiments that measured speech intelligibility, speech quality, and music quality are evaluated for normal-hearing and hearing-impaired listeners. A model of the auditory periphery is used to generate envelope signals, and envelope modulation fidelity is calculated using the normalized cross-covariance of the degraded signal envelope with that of a reference signal. Two procedures are used to describe the envelope modulation: (1) modulation within each auditory frequency band and (2) spectro-temporal processing that analyzes the modulation of spectral ripple components fit to successive short-time spectra. The results indicate that low modulation rates provide the highest information for intelligibility, while high modulation rates provide the highest information for speech and music quality. The low-to-mid auditory frequencies are most important for intelligibility, while mid frequencies are most important for speech quality and high frequencies are most important for music quality. Differences between the spectral ripple components used for the spectro-temporal analysis were not significant in five of the six experimental conditions evaluated. The results indicate that different modulation-rate and auditory-frequency weights may be appropriate for indices designed to predict different types of perceptual relationships. PMID:26520329

  15. Longitudinal changes in speech recognition in older persons.

    PubMed

    Dubno, Judy R; Lee, Fu-Shing; Matthews, Lois J; Ahlstrom, Jayne B; Horwitz, Amy R; Mills, John H

    2008-01-01

    Recognition of isolated monosyllabic words in quiet and recognition of key words in low- and high-context sentences in babble were measured in a large sample of older persons enrolled in a longitudinal study of age-related hearing loss. Repeated measures were obtained yearly or every 2 to 3 years. To control for concurrent changes in pure-tone thresholds and speech levels, speech-recognition scores were adjusted using an importance-weighted speech-audibility metric (AI). Linear-regression slope estimated the rate of change in adjusted speech-recognition scores. Recognition of words in quiet declined significantly faster with age than predicted by declines in speech audibility. As subjects aged, observed scores deviated increasingly from AI-predicted scores, but this effect did not accelerate with age. Rate of decline in word recognition was significantly faster for females than males and for females with high serum progesterone levels, whereas noise history had no effect. Rate of decline did not accelerate with age but increased with degree of hearing loss, suggesting that with more severe injury to the auditory system, impairments to auditory function other than reduced audibility resulted in faster declines in word recognition as subjects aged. Recognition of key words in low- and high-context sentences in babble did not decline significantly with age.

  16. Effect of Listeners' Linguistic Background on Perceptual Judgements of Hypernasality

    ERIC Educational Resources Information Center

    Lee, Alice; Brown, Susanna; Gibbon, Fiona E.

    2008-01-01

    Background: Many speech and language therapists work in a multilingual environment, making cross-linguistic studies of speech disorders clinically and theoretically important. Aims: To investigate the effect of listeners' linguistic background on their perceptual ratings of hypernasality and the reliability of the ratings. Methods &…

  17. Gender, status and 'powerless' speech: interactions of students and lecturers.

    PubMed

    McFadyen, R G

    1996-09-01

    The present study investigated whether the use of 'powerless' speech was affected by role status, speaker's gender and gender of another participant. Fifty-two university lecturers and 156 students participated. Students were paired with a lecturer or student of the same or opposite sex. The findings placed a question mark over the link between powerless speech and individuals of low role status. Moreover, against hypothesis, speaker's gender and gender of partner did not affect the use of qualifiers or fillers, although they affected the use of tag questions and some types of hesitation. A qualitative analysis was also conducted which suggested that the powerless features were, in fact, multi-functional with respect to power. In addition, the importance of a variety of interactional techniques, such as credibility techniques, in the creation or negotiation of relational power was documented. As a whole, these findings highlight problems with the concept of 'powerless' speech, at least with respect to relational power.

  18. Speech-language pathology students' self-reports on voice training: easier to understand or to do?

    PubMed

    Lindhe, Christina; Hartelius, Lena

    2009-01-01

The aim of the study was to describe the subjective ratings of the course 'Training of the student's own voice and speech' from a student-centred perspective. A questionnaire was completed after each of the six individual sessions. Six speech and language pathology (SLP) students rated how they perceived the practical exercises in terms of doing and understanding. The results showed that five of the six participants rated the exercises as significantly easier to understand than to do. The exercises were also rated as easier to do over time. Results are interpreted within a theoretical framework of approaches to learning. The findings support the importance of both the physical and reflective aspects of the voice training process.

  19. Development of Second Language French Oral Skills in an Instructed Setting: A Focus on Speech Ratings

    ERIC Educational Resources Information Center

    Trofimovich, Pavel; Kennedy, Sara; Blanchet, Josée

    2017-01-01

    This study examined the relationship between targeted pronunciation instruction in French as a second language (L2) and listener-based ratings of accent, comprehensibility, and fluency. The ratings by 20 French listeners evaluating the speech of 30 adult L2 French learners enrolled in a 15-week listening and speaking course targeting segments,…

  20. The effect of instantaneous input dynamic range setting on the speech perception of children with the nucleus 24 implant.

    PubMed

    Davidson, Lisa S; Skinner, Margaret W; Holstad, Beth A; Fears, Beverly T; Richter, Marie K; Matusofsky, Margaret; Brenner, Christine; Holden, Timothy; Birath, Amy; Kettel, Jerrica L; Scollie, Susan

    2009-06-01

The purpose of this study was to examine the effects of a wider instantaneous input dynamic range (IIDR) setting on speech perception and comfort in quiet and noise for children wearing the Nucleus 24 implant system and the Freedom speech processor. In addition, children's ability to understand soft and conversational-level speech in relation to aided sound-field thresholds was examined. Thirty children (ages 7 to 17 years) with the Nucleus 24 cochlear implant system and the Freedom speech processor with two different IIDR settings (30 versus 40 dB) were tested on the Consonant Nucleus Consonant (CNC) word test at 50 and 60 dB SPL, the Bamford-Kowal-Bench Speech in Noise Test, and a loudness rating task for four-talker speech noise. Aided thresholds for frequency-modulated tones, narrowband noise, and recorded Ling sounds were obtained with the two IIDRs and examined in relation to CNC scores at 50 dB SPL. Speech Intelligibility Indices were calculated using the long-term average speech spectrum of the CNC words at 50 dB SPL measured at each test site and aided thresholds. Group mean CNC scores at 50 dB SPL with the 40 IIDR were significantly higher (p < 0.001) than with the 30 IIDR. Group mean CNC scores at 60 dB SPL, loudness ratings, and the signal-to-noise ratios for 50% correct (SNR-50) on the Bamford-Kowal-Bench Speech in Noise Test were not significantly different for the two IIDRs. Significantly improved aided thresholds at 250 to 6000 Hz as well as higher Speech Intelligibility Indices afforded improved audibility for speech presented at soft levels (50 dB SPL). These results indicate that an increased IIDR provides improved word recognition for soft levels of speech without compromising the comfort of higher-level speech sounds or sentence recognition in noise.

  1. Predicting Intelligibility Gains in Dysarthria through Automated Speech Feature Analysis

    ERIC Educational Resources Information Center

    Fletcher, Annalise R.; Wisler, Alan A.; McAuliffe, Megan J.; Lansford, Kaitlin L.; Liss, Julie M.

    2017-01-01

    Purpose: Behavioral speech modifications have variable effects on the intelligibility of speakers with dysarthria. In the companion article, a significant relationship was found between measures of speakers' baseline speech and their intelligibility gains following cues to speak louder and reduce rate (Fletcher, McAuliffe, Lansford, Sinex, &…

  2. Building Searchable Collections of Enterprise Speech Data.

    ERIC Educational Resources Information Center

    Cooper, James W.; Viswanathan, Mahesh; Byron, Donna; Chan, Margaret

    The study has applied speech recognition and text-mining technologies to a set of recorded outbound marketing calls and analyzed the results. Since speaker-independent speech recognition technology results in a significantly lower recognition rate than that found when the recognizer is trained for a particular speaker, a number of post-processing…

  3. Measuring Speech Comprehensibility in Students with Down Syndrome

    ERIC Educational Resources Information Center

    Yoder, Paul J.; Woynaroski, Tiffany; Camarata, Stephen

    2016-01-01

    Purpose: There is an ongoing need to develop assessments of spontaneous speech that focus on whether the child's utterances are comprehensible to listeners. This study sought to identify the attributes of a stable ratings-based measure of speech comprehensibility, which enabled examining the criterion-related validity of an orthography-based…

  4. Immediate Effect of Alcohol on Voice Tremor Parameters and Speech Motor Control

    ERIC Educational Resources Information Center

    Krishnan, Gayathri; Ghosh, Vipin

    2017-01-01

    The complex neuro-muscular interplay of speech subsystems is susceptible to alcohol intoxication. Published reports have studied language formulation and fundamental frequency measures pre- and post-intoxication. This study aimed at tapping the speech motor control measure using rate, consistency, and accuracy measures of diadochokinesis and…

  5. Planning and production of grammatical and lexical verbs in multi-word messages.

    PubMed

    Michel Lange, Violaine; Messerschmidt, Maria; Harder, Peter; Siebner, Hartwig Roman; Boye, Kasper

    2017-01-01

Grammatical words represent the part of grammar that can be most directly contrasted with the lexicon. Aphasiological studies, linguistic theories and psycholinguistic studies suggest that their processing occurs at different stages in speech production. Models of sentence production propose that at the formulation stage, lexical words are processed at the functional level while grammatical words are processed at a later positional level. In this study we consider proposals made by linguistic theories and psycholinguistic models to derive two predictions for the processing of grammatical words compared to lexical words. First, based on the assumption that grammatical words are less crucial for communication and are therefore paid less attention, it is predicted that they show shorter articulation times and/or higher error rates than lexical words. Second, based on the assumption that grammatical words differ from lexical words in being dependent on a lexical host, it is hypothesized that the retrieval of a grammatical word has to be put on hold until its lexical host is available, and it is predicted that this is reflected in longer reaction times (RTs) for grammatical compared to lexical words. We investigated these predictions by comparing fully homonymous sentences with only a difference in verb status (grammatical vs. lexical) elicited by a specific context. We measured RTs, duration and accuracy rate. No difference in duration was observed. Longer RTs and a lower accuracy rate for grammatical words were reported, successfully reflecting grammatical word properties as defined by linguistic theories and psycholinguistic models. Importantly, this study provides insight into the span of encoding and grammatical encoding processes in speech production.

  6. Planning and production of grammatical and lexical verbs in multi-word messages

    PubMed Central

    Messerschmidt, Maria; Harder, Peter; Siebner, Hartwig Roman; Boye, Kasper

    2017-01-01

Grammatical words represent the part of grammar that can be most directly contrasted with the lexicon. Aphasiological studies, linguistic theories and psycholinguistic studies suggest that their processing occurs at different stages in speech production. Models of sentence production propose that at the formulation stage, lexical words are processed at the functional level while grammatical words are processed at a later positional level. In this study we consider proposals made by linguistic theories and psycholinguistic models to derive two predictions for the processing of grammatical words compared to lexical words. First, based on the assumption that grammatical words are less crucial for communication and are therefore paid less attention, it is predicted that they show shorter articulation times and/or higher error rates than lexical words. Second, based on the assumption that grammatical words differ from lexical words in being dependent on a lexical host, it is hypothesized that the retrieval of a grammatical word has to be put on hold until its lexical host is available, and it is predicted that this is reflected in longer reaction times (RTs) for grammatical compared to lexical words. We investigated these predictions by comparing fully homonymous sentences with only a difference in verb status (grammatical vs. lexical) elicited by a specific context. We measured RTs, duration and accuracy rate. No difference in duration was observed. Longer RTs and a lower accuracy rate for grammatical words were reported, successfully reflecting grammatical word properties as defined by linguistic theories and psycholinguistic models. Importantly, this study provides insight into the span of encoding and grammatical encoding processes in speech production. PMID:29091940

  7. Normative Topographic ERP Analyses of Speed of Speech Processing and Grammar Before and After Grammatical Treatment

    PubMed Central

    Yoder, Paul J.; Molfese, Dennis; Murray, Micah M.; Key, Alexandra P. F.

    2013-01-01

    Typically developing (TD) preschoolers and age-matched preschoolers with specific language impairment (SLI) received event-related potentials (ERPs) to four monosyllabic speech sounds prior to treatment and, in the SLI group, after 6 months of grammatical treatment. Before treatment, the TD group processed speech sounds faster than the SLI group. The SLI group increased the speed of their speech processing after treatment. Post-treatment speed of speech processing predicted later impairment in comprehending phrase elaboration in the SLI group. During the treatment phase, change in speed of speech processing predicted growth rate of grammar in the SLI group. PMID:24219693

  8. Gigabit Network Communications Research

    DTIC Science & Technology

    1992-12-31

additional BPF channels, raw bytesync support for video codecs, and others. All source file modifications were logged with RCS. Source and object trees were..." (RFCs). 20 RFCs were published this quarter: RFC 1366: Gerich, E., "Guidelines for Management of IP Address Space", Merit, October 1992. RFC 1367: Topolcic, C., "Schedule for IP Address Space Management Guidelines", CNRI, October 1992. RFC 1368: McMaster, D. (Synoptics Communications, Inc.), K

  9. CAGE IIIA Distributed Simulation Design Methodology

    DTIC Science & Technology

    2014-05-01

    VHF Very High Frequency; VLC Video LAN Codec, an open-source cross-platform multimedia player and framework; VM Virtual Machine; VOIP Voice Over... Implementing Defence Experimentation (GUIDEx). The key challenges for this methodology lie in understanding how to design it and to define the... operation and to be available in the other nation's simulations. The challenge for the CAGE campaign of experiments is to continue to build upon this

  10. Coprime and Nested Arrays: A New Paradigm for Sampling in Space and Time

    DTIC Science & Technology

    2015-09-14

    International Conference on Computers and Devices for Communication (CODEC), Kolkata, India, December 2015. 2) Plenary speaker at the Asia Pacific... National Symposium on Mathematical Methods and Applications, Chennai, India, Dec. 2013. This is held to honor the late Srinivasa Ramanujan every year... 4) Plenary speaker at the International Conf. on Comm. and Signal Processing (ICCSP), Calicut, India, 2011. List of publications during the above

  11. The Influence of Noise Reduction on Speech Intelligibility, Response Times to Speech, and Perceived Listening Effort in Normal-Hearing Listeners.

    PubMed

    van den Tillaart-Haverkate, Maj; de Ronde-Brons, Inge; Dreschler, Wouter A; Houben, Rolph

    2017-01-01

    Single-microphone noise reduction leads to subjective benefit, but not to objective improvements in speech intelligibility. We investigated whether response times (RTs) provide an objective measure of the benefit of noise reduction and whether the effect of noise reduction is reflected in rated listening effort. Twelve normal-hearing participants listened to digit triplets that were either unprocessed or processed with one of two noise-reduction algorithms: an ideal binary mask (IBM) and a more realistic minimum mean square error estimator (MMSE). For each of these three processing conditions, we measured (a) speech intelligibility, (b) RTs on two different tasks (identification of the last digit and arithmetic summation of the first and last digit), and (c) subjective listening effort ratings. All measurements were performed at four signal-to-noise ratios (SNRs): -5, 0, +5, and +∞ dB. Speech intelligibility was high (>97% correct) for all conditions. A significant decrease in response time, relative to the unprocessed condition, was found for both IBM and MMSE for the arithmetic but not the identification task. Listening effort ratings were significantly lower for IBM than for MMSE and unprocessed speech in noise. We conclude that RT for an arithmetic task can provide an objective measure of the benefit of noise reduction. For young normal-hearing listeners, both ideal and realistic noise reduction can reduce RTs at SNRs where speech intelligibility is close to 100%. Ideal noise reduction can also reduce perceived listening effort.
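
    The ideal binary mask mentioned above is a standard oracle technique: time-frequency cells of the noisy mixture are kept where the local speech-to-noise ratio exceeds a criterion and zeroed elsewhere. As a minimal sketch of the general idea (not the study's processing chain), assuming separate access to the clean speech and noise signals, scipy's STFT, and a 0 dB local criterion:

        import numpy as np
        from scipy.signal import stft, istft

        def ideal_binary_mask(speech, noise, fs, lc_db=0.0, nperseg=512):
            """Apply an IBM to the speech + noise mixture; lc_db is the local criterion."""
            _, _, S = stft(speech, fs, nperseg=nperseg)
            _, _, N = stft(noise, fs, nperseg=nperseg)
            local_snr = 10 * np.log10((np.abs(S) ** 2 + 1e-12) / (np.abs(N) ** 2 + 1e-12))
            mask = (local_snr > lc_db).astype(float)   # keep speech-dominated cells only
            _, _, Y = stft(speech + noise, fs, nperseg=nperseg)
            _, out = istft(mask * Y, fs, nperseg=nperseg)
            return out

        # Toy usage with white-noise placeholders for a digit triplet and a masker.
        fs = 16000
        speech = np.random.randn(fs)
        noise = np.random.randn(fs)
        enhanced = ideal_binary_mask(speech, noise, fs)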

  12. Identification of speech transients using variable frame rate analysis and wavelet packets.

    PubMed

    Rasetshwane, Daniel M; Boston, J Robert; Li, Ching-Chung

    2006-01-01

    Speech transients are important cues for identifying and discriminating speech sounds. Yoo et al. and Tantibundhit et al. were successful in identifying speech transients and, by emphasizing them, improving the intelligibility of speech in noise. However, their methods are computationally intensive and unsuitable for real-time applications. This paper presents a method to identify and emphasize speech transients that combines subband decomposition by the wavelet packet transform with variable frame rate (VFR) analysis and unvoiced consonant detection. The VFR analysis is applied to each wavelet packet to define a transitivity function that describes the extent to which the wavelet coefficients of that packet are changing. Unvoiced consonant detection is used to identify unvoiced consonant intervals, and the transitivity function is amplified during these intervals. The wavelet coefficients are multiplied by the transitivity function for that packet, amplifying coefficients localized at times when they are changing and attenuating coefficients at times when they are steady. An inverse transform of the modified wavelet packet coefficients produces a signal corresponding to speech transients similar to those identified by Yoo et al. and Tantibundhit et al. A preliminary implementation of the algorithm runs more efficiently than these earlier methods.
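
    As a rough sketch of the general approach (not the authors' implementation), the code below decomposes a signal into wavelet packets with PyWavelets, derives a per-packet transitivity function from the rate of change of the coefficient envelope, and scales the coefficients by it before reconstruction; the smoothing length, gain, and floor are illustrative assumptions:

        import numpy as np
        import pywt

        def emphasize_transients(x, wavelet="db8", level=4, gain=2.0, floor=0.5):
            """Scale wavelet-packet coefficients by how fast their envelope changes."""
            wp = pywt.WaveletPacket(data=x, wavelet=wavelet, mode="symmetric", maxlevel=level)
            for node in wp.get_level(level, order="natural"):
                env = np.abs(node.data)
                change = np.abs(np.diff(env, prepend=env[:1]))    # coefficient-to-coefficient change
                smooth = np.convolve(change, np.ones(5) / 5, mode="same")
                trans = smooth / (smooth.max() + 1e-12)           # "transitivity" in [0, 1]
                node.data = node.data * (floor + gain * trans)    # boost changing, damp steady
            return wp.reconstruct(update=True)

        # Example: a tone with an abrupt onset midway through the signal.
        fs = 8000
        t = np.arange(fs) / fs
        y = emphasize_transients(np.sin(2 * np.pi * 440 * t) * (t > 0.5))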

  13. Clear Speech Variants: An Acoustic Study in Parkinson's Disease.

    PubMed

    Lam, Jennifer; Tjaden, Kris

    2016-08-01

    The authors investigated how different variants of clear speech affect segmental and suprasegmental acoustic measures of speech in speakers with Parkinson's disease and a healthy control group. A total of 14 participants with Parkinson's disease and 14 control participants served as speakers. Each speaker produced 18 different sentences selected from the Sentence Intelligibility Test (Yorkston & Beukelman, 1996). All speakers produced stimuli in 4 speaking conditions (habitual, clear, overenunciate, and hearing impaired). Segmental acoustic measures included vowel space area and first moment (M1) coefficient difference measures for consonant pairs. Second formant slope of diphthongs and measures of vowel and fricative durations were also obtained. Suprasegmental measures included fundamental frequency, sound pressure level, and articulation rate. For the majority of adjustments, all variants of clear speech instruction differed from the habitual condition. The overenunciate condition elicited the greatest magnitude of change for segmental measures (vowel space area, vowel durations) and the slowest articulation rates. The hearing impaired condition elicited the greatest fricative durations and suprasegmental adjustments (fundamental frequency, sound pressure level). Findings have implications for a model of speech production for healthy speakers as well as for speakers with dysarthria. Findings also suggest that particular clear speech instructions may target distinct speech subsystems.

  14. Bilingual Listeners' Perception of Temporally Manipulated English Passages

    ERIC Educational Resources Information Center

    Shi, Lu-Feng; Farooq, Nadia

    2012-01-01

    Purpose: The current study measured, objectively and subjectively, how changes in speech rate affect recognition of English passages in bilingual listeners. Method: Ten native monolingual, 20 English-dominant bilingual, and 20 non-English-dominant bilingual listeners repeated target words in English passages at five speech rates (unprocessed, two…

  15. Listeners' Perceptions of Speech and Language Disorders

    ERIC Educational Resources Information Center

    Allard, Emily R.; Williams, Dale F.

    2008-01-01

    Using semantic differential scales with nine trait pairs, 445 adults rated five audio-taped speech samples, one depicting an individual without a disorder and four portraying communication disorders. Statistical analyses indicated that the no disorder sample was rated higher with respect to the trait of employability than were the articulation,…

  16. Speech perception at the interface of neurobiology and linguistics.

    PubMed

    Poeppel, David; Idsardi, William J; van Wassenhove, Virginie

    2008-03-12

    Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate discrete representations that make contact with the lexical representations stored in long-term memory as output. Because the perceptual objects recognized by the speech perception system enter into subsequent linguistic computation, the format used for lexical representation and processing fundamentally constrains the speech perceptual processes. Consequently, theories of speech perception must, at some level, be tightly linked to theories of lexical representation. Minimally, speech perception must yield representations that smoothly and rapidly interface with stored lexical items. Adopting the perspective of Marr, we argue and provide neurobiological and psychophysical evidence for the following research programme. First, at the implementational level, speech perception is a multi-time resolution process, with perceptual analyses occurring concurrently on at least two time scales (approx. 20-80 ms and approx. 150-300 ms), commensurate with (sub)segmental and syllabic analyses, respectively. Second, at the algorithmic level, we suggest that perception proceeds on the basis of internal forward models, or uses an 'analysis-by-synthesis' approach. Third, at the computational level (in the sense of Marr), the theory of lexical representation that we adopt is principally informed by phonological research and assumes that words are represented in the mental lexicon in terms of sequences of discrete segments composed of distinctive features. One important goal of the research programme is to develop linking hypotheses between putative neurobiological primitives (e.g. temporal primitives) and those primitives derived from linguistic inquiry, to arrive ultimately at a biologically sensible and theoretically satisfying model of representation and computation in speech.

  17. Is complex signal processing for bone conduction hearing aids useful?

    PubMed

    Kompis, Martin; Kurz, Anja; Pfiffner, Flurin; Senn, Pascal; Arnold, Andreas; Caversaccio, Marco

    2014-05-01

    To establish whether complex signal processing is beneficial for users of bone anchored hearing aids. Review and analysis of two studies from our own group, each comparing a speech processor with basic digital signal processing (either Baha Divino or Baha Intenso) and a processor with complex digital signal processing (either Baha BP100 or Baha BP110 power). The main differences between basic and complex signal processing are the number of audiologist accessible frequency channels and the availability and complexity of the directional multi-microphone noise reduction and loudness compression systems. Both studies show a small, statistically non-significant improvement of speech understanding in quiet with the complex digital signal processing. The average improvement for speech in noise is +0.9 dB, if speech and noise are emitted both from the front of the listener. If noise is emitted from the rear and speech from the front of the listener, the advantage of the devices with complex digital signal processing as opposed to those with basic signal processing increases, on average, to +3.2 dB (range +2.3 … +5.1 dB, p ≤ 0.0032). Complex digital signal processing does indeed improve speech understanding, especially in noise coming from the rear. This finding has been supported by another study, which has been published recently by a different research group. When compared to basic digital signal processing, complex digital signal processing can increase speech understanding of users of bone anchored hearing aids. The benefit is most significant for speech understanding in noise.

  18. Toward a dual-learning systems model of speech category learning

    PubMed Central

    Chandrasekaran, Bharath; Koslov, Seth R.; Maddox, W. T.

    2014-01-01

    More than two decades of work in vision posits the existence of dual-learning systems of category learning. The reflective system uses working memory to develop and test rules for classifying in an explicit fashion, while the reflexive system operates by implicitly associating perception with actions that lead to reinforcement. Dual-learning systems models hypothesize that in learning natural categories, learners initially use the reflective system and, with practice, transfer control to the reflexive system. The role of reflective and reflexive systems in auditory category learning and more specifically in speech category learning has not been systematically examined. In this article, we describe a neurobiologically constrained dual-learning systems theoretical framework that is currently being developed in speech category learning and review recent applications of this framework. Using behavioral and computational modeling approaches, we provide evidence that speech category learning is predominantly mediated by the reflexive learning system. In one application, we explore the effects of normal aging on non-speech and speech category learning. Prominently, we find a large age-related deficit in speech learning. The computational modeling suggests that older adults are less likely to transition from simple, reflective, unidimensional rules to more complex, reflexive, multi-dimensional rules. In a second application, we summarize a recent study examining auditory category learning in individuals with elevated depressive symptoms. We find a deficit in reflective-optimal and an enhancement in reflexive-optimal auditory category learning. Interestingly, individuals with elevated depressive symptoms also show an advantage in learning speech categories. We end with a brief summary and description of a number of future directions. PMID:25132827

  19. Classifier Subset Selection for the Stacked Generalization Method Applied to Emotion Recognition in Speech

    PubMed Central

    Álvarez, Aitor; Sierra, Basilio; Arruti, Andoni; López-Gil, Juan-Miguel; Garay-Vitoria, Nestor

    2015-01-01

    In this paper, a new supervised classification paradigm, called classifier subset selection for stacked generalization (CSS stacking), is presented to deal with speech emotion recognition. The new approach consists of an improvement of a bi-level multi-classifier system known as stacking generalization by means of an integration of an estimation of distribution algorithm (EDA) in the first layer to select the optimal subset from the standard base classifiers. The good performance of the proposed new paradigm was demonstrated over different configurations and datasets. First, several CSS stacking classifiers were constructed on the RekEmozio dataset, using some specific standard base classifiers and a total of 123 spectral, quality and prosodic features computed using in-house feature extraction algorithms. These initial CSS stacking classifiers were compared to other multi-classifier systems and the employed standard classifiers built on the same set of speech features. Then, new CSS stacking classifiers were built on RekEmozio using a different set of both acoustic parameters (extended version of the Geneva Minimalistic Acoustic Parameter Set (eGeMAPS)) and standard classifiers and employing the best meta-classifier of the initial experiments. The performance of these two CSS stacking classifiers was evaluated and compared. Finally, the new paradigm was tested on the well-known Berlin Emotional Speech database. We compared the performance of single, standard stacking and CSS stacking systems using the same parametrization of the second phase. All of the classifications were performed at the categorical level, including the six primary emotions plus the neutral one. PMID:26712757
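
    The subset-selection step can be pictured as a univariate EDA (in the spirit of PBIL) over inclusion probabilities for the base classifiers, with cross-validated stacking accuracy as the fitness. The sketch below, using scikit-learn and synthetic stand-in data, illustrates that idea under assumed population size, generation count, and learning rate; it is not a reconstruction of the paper's system:

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import StackingClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.tree import DecisionTreeClassifier

        # Stand-in data for a speech-emotion feature set (e.g., spectral/prosodic).
        X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                                   n_classes=3, random_state=0)

        base = [("nb", GaussianNB()),
                ("knn", KNeighborsClassifier()),
                ("tree", DecisionTreeClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))]

        def fitness(mask):
            """Cross-validated accuracy of a stacked ensemble over the chosen subset."""
            chosen = [base[i] for i in range(len(base)) if mask[i]]
            if len(chosen) < 2:
                return 0.0
            stack = StackingClassifier(estimators=chosen,
                                       final_estimator=LogisticRegression(max_iter=1000))
            return cross_val_score(stack, X, y, cv=3).mean()

        rng = np.random.default_rng(0)
        p = np.full(len(base), 0.5)                    # the EDA's probability model
        for _ in range(5):                             # a few PBIL-style generations
            pop = rng.random((8, len(base))) < p       # sample candidate subsets
            scores = np.array([fitness(m) for m in pop])
            p = 0.7 * p + 0.3 * pop[scores.argmax()]   # pull the model toward the best subset
        print("selected:", [name for (name, _), keep in zip(base, p > 0.5) if keep])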

  20. Impact of cognitive function and dysarthria on spoken language and perceived speech severity in multiple sclerosis

    NASA Astrophysics Data System (ADS)

    Feenaughty, Lynda

    Purpose: The current study sought to investigate the separate effects of dysarthria and cognitive status on global speech timing, speech hesitation, and linguistic complexity characteristics, and how these speech behaviors influence listener impressions, for three connected speech tasks presumed to differ in cognitive-linguistic demand, in four carefully defined speaker groups: (1) MS with cognitive deficits (MSCI), (2) MS with clinically diagnosed dysarthria and intact cognition (MSDYS), (3) MS without dysarthria or cognitive deficits (MS), and (4) healthy talkers (CON). The relationship between neuropsychological test scores and speech-language production and perceptual variables for speakers with cognitive deficits was also explored. Methods: 48 speakers participated, including 36 individuals with a neurological diagnosis of MS and 12 healthy talkers. The three MS groups and the control group each contained 12 speakers (8 women and 4 men). Cognitive function was quantified using standard clinical tests of memory, information processing speed, and executive function; a standard z-score of ≤ -1.50 indicated deficits in a given cognitive domain. Three certified speech-language pathologists determined the clinical diagnosis of dysarthria for speakers with MS. Experimental speech tasks included audio-recordings of an oral reading of the Grandfather passage and two spontaneous speech samples in the form of Familiar and Unfamiliar descriptive discourse. Suprasegmental acoustic measures included speech and articulatory rate. Speech hesitation measures included pause frequency (i.e., silent and filled pauses), mean silent pause duration, grammatical appropriateness of pauses, and interjection frequency. For the two discourse samples, three standard measures of language complexity were obtained: subordination index, inter-sentence cohesion adequacy, and lexical diversity. Ten listeners judged each speech sample on the perceptual construct of Speech Severity using a visual analog scale. Additional measures obtained to describe participants included the Sentence Intelligibility Test (SIT), the 10-item Communication Participation Item Bank (CPIB), and standard biopsychosocial measures of depression (Beck Depression Inventory-Fast Screen; BDI-FS), fatigue (Fatigue Severity Scale; FSS), and overall disease severity (Expanded Disability Status Scale; EDSS). Healthy controls completed all measures except the CPIB and EDSS. All data were analyzed using standard descriptive and parametric statistics. For the MSCI group, the relationships between neuropsychological test scores and speech-language variables, and between neuropsychological test scores and Speech Severity, were explored for each speech task using Pearson correlations. Results and Discussion: Topic familiarity for descriptive discourse did not strongly influence speech production or perceptual variables; however, results indicated predicted task-related differences for some spoken language measures. With the exception of the MSCI group, all speaker groups produced the same or slower global speech timing (i.e., speech and articulatory rates), more silent and filled pauses, more grammatically appropriate pauses, and longer silent pause durations in spontaneous discourse compared to reading aloud. No appreciable task differences emerged for linguistic complexity measures. Results indicated group differences for speech rate: the MSCI group produced significantly faster speech rates than the MSDYS group. Both the MSDYS and MSCI groups were judged to have significantly poorer perceived Speech Severity than typically aging adults. The Task x Group interaction was significant only for the number of silent pauses; the MSDYS group produced fewer silent pauses in spontaneous speech and more silent pauses in the reading task compared to other groups. Finally, correlation analysis revealed moderate relationships between neuropsychological test scores and speech hesitation measures within the MSCI group: slower information processing and poorer memory were significantly correlated with more silent pauses, and poorer executive function was associated with fewer filled pauses in the Unfamiliar discourse task. Results have both clinical and theoretical implications. Overall, clinicians should exercise caution when interpreting global measures of speech timing and perceptual measures in the absence of information about cognitive ability. Results also have implications for a comprehensive model of spoken language incorporating cognitive, linguistic, and motor variables.

  1. Controller design and consonantal contrast coding using a multi-finger tactual display

    PubMed Central

    Israr, Ali; Meckl, Peter H.; Reed, Charlotte M.; Tan, Hong Z.

    2009-01-01

    This paper presents the design and evaluation of a new controller for a multi-finger tactual display in speech communication. A two-degree-of-freedom controller consisting of a feedback controller and a prefilter, and its application in a consonant-contrast experiment, are presented. The feedback controller provides a stable, fast, and robust response of the fingerpad interface, and the prefilter shapes the frequency response of the closed-loop system to match the human detection-threshold function. The controller is subsequently used in a speech communication system that extracts spectral features from recorded speech signals and presents them as vibrational-motional waveforms to three digits on a receiver’s left hand. Performance on a consonantal contrast test suggests that participants are able to identify the tactual cues necessary for discriminating consonants in the initial position of consonant-vowel-consonant (CVC) segments. The average sensitivity indices for contrasting voicing, place, and manner features are 3.5, 2.7, and 3.4, respectively. The results show that consonantal features can be successfully transmitted by utilizing a broad range of the kinesthetic-cutaneous sensory system. The present study also demonstrates the validity of designing controllers that take into account not only the electromechanical properties of the hardware but also the sensory characteristics of the human user. PMID:19507975

  2. Spatial release of cognitive load measured in a dual-task paradigm in normal-hearing and hearing-impaired listeners.

    PubMed

    Xia, Jing; Nooraei, Nazanin; Kalluri, Sridhar; Edwards, Brent

    2015-04-01

    This study investigated whether spatial separation between talkers helps reduce cognitive processing load, and how hearing impairment interacts with the cognitive load of individuals listening in multi-talker environments. A dual-task paradigm was used in which performance on a secondary task (visual tracking) served as a measure of the cognitive load imposed by a speech recognition task. Visual tracking performance was measured under four conditions in which the target and the interferers were distinguished by (1) gender and spatial location, (2) gender only, (3) spatial location only, and (4) neither gender nor spatial location. Results showed that when gender cues were available, a 15° spatial separation between talkers reduced the cognitive load of listening even though it did not provide further improvement in speech recognition (Experiment I). Compared to normal-hearing listeners, large individual variability in spatial release of cognitive load was observed among hearing-impaired listeners. Cognitive load was lower when talkers were spatially separated by 60° than when talkers were of different genders, even though speech recognition was comparable in these two conditions (Experiment II). These results suggest that a measure of cognitive load might provide valuable insight into the benefit of spatial cues in multi-talker environments.

  3. Effect of a multi-level intervention on nurse-patient communication in the intensive care unit: Results of the SPEACS trial

    PubMed Central

    Happ, Mary Beth; Garrett, Kathryn L.; Tate, Judith A.; DiVirgilio, Dana; Houze, Martin P.; Demirci, Jill R.; George, Elisabeth; Sereika, Susan M.

    2014-01-01

    Objective To test the impact of two levels of intervention on communication frequency, quality, success, and ease between nurses and intubated intensive care unit (ICU) patients. Design Quasi-experimental, 3-phase sequential cohort study: (1) usual care, (2) basic communication skills training (BCST) for nurses, (3) additional training in augmentative and alternative communication devices and speech-language pathologist consultation (AAC + SLP). Trained observers rated four 3-min video-recordings for each nurse-patient dyad for communication frequency, quality and success. Patients self-rated communication ease. Setting Two ICUs in a university-affiliated medical center. Participants 89 intubated patients who were awake, responsive and unable to speak, and 30 ICU nurses. Main results Communication frequency (mean number of communication acts within a communication exchange) and positive nurse communication behaviors increased significantly in one ICU only. The percentage of successful communication exchanges about pain was greater for the two intervention groups than for the usual care/control group across both ICUs (p = .03), with more successful sessions about pain and other symptoms in the AAC + SLP group (p = .07). Patients in the AAC + SLP intervention group used significantly more AAC methods (p = .002) and less often rated communication as highly difficult (p < .01). Conclusions This study provides support for the feasibility, utility and efficacy of a multi-level intervention combining communication skills training, materials and SLP consultation in the ICU. PMID:24495519

  4. Comprehension of synthetic speech and digitized natural speech by adults with aphasia.

    PubMed

    Hux, Karen; Knollman-Porter, Kelly; Brown, Jessica; Wallace, Sarah E

    2017-09-01

    Using text-to-speech technology to provide simultaneous written and auditory content presentation may help compensate for chronic reading challenges if people with aphasia can understand synthetic speech output; however, inherent auditory comprehension challenges experienced by people with aphasia may make understanding synthetic speech difficult. This study's purpose was to compare the preferences and auditory comprehension accuracy of people with aphasia when listening to sentences generated with digitized natural speech, Alex synthetic speech (i.e., Macintosh platform), or David synthetic speech (i.e., Windows platform). The methodology required each of 20 participants with aphasia to select one of four images corresponding in meaning to each of 60 sentences comprising three stimulus sets. Results revealed significantly better accuracy given digitized natural speech than either synthetic speech option; however, individual participant performance analyses revealed three patterns: (a) comparable accuracy regardless of speech condition for 30% of participants, (b) comparable accuracy between digitized natural speech and one, but not both, synthetic speech option for 45% of participants, and (c) greater accuracy with digitized natural speech than with either synthetic speech option for remaining participants. Ranking and Likert-scale rating data revealed a preference for digitized natural speech and David synthetic speech over Alex synthetic speech. Results suggest many individuals with aphasia can comprehend synthetic speech options available on popular operating systems. Further examination of synthetic speech use to support reading comprehension through text-to-speech technology is thus warranted. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. [Development and equivalence evaluation of spondee lists of mandarin speech test materials].

    PubMed

    Zhang, Hua; Wang, Shuo; Wang, Liang; Chen, Jing; Chen, Ai-ting; Guo, Lian-sheng; Zhao, Xiao-yan; Ji, Chen

    2006-06-01

    To develop spondee (disyllabic) word lists as part of the Mandarin speech test materials (MSTM), to serve as basic speech materials for routine tests in clinics and laboratories. Two groups of professionals (audiologists, Chinese and Mandarin-language scientists, a linguist and a statistician) were first assembled, and the editing principles were established after three round-table meetings. Ten spondee lists, each with 50 words, were compiled and recorded onto cassettes. All lists were phonemically balanced along three dimensions: vowels, consonants and Chinese tones. Seventy-three college students with normal hearing were tested, with speech presented monaurally by earphone. Three statistical methods were used to analyze equivalence. Correlation analysis showed that all lists were strongly related, except List 5. Cluster analysis showed that the ten lists could be classified into two groups, but a Kappa test showed that the lists' homogeneity was not good. Spondee lists are among the most routine speech test materials. Their editing, recording and equivalence evaluation are affected by many factors and require multidisciplinary cooperation. All lists edited in the present study need further modification in recording and testing before they can be used clinically and in research. The phonemic balance should be maintained.

  6. Utility and accuracy of perceptual voice and speech distinctions in the diagnosis of Parkinson's disease, PSP and MSA-P.

    PubMed

    Miller, Nick; Nath, Uma; Noble, Emma; Burn, David

    2017-06-01

    To determine whether perceptual speech measures distinguish people with Parkinson's disease (PD), multiple system atrophy with predominant parkinsonism (MSA-P) and progressive supranuclear palsy (PSP). Speech-language therapists blind to patient characteristics employed clinical rating scales to evaluate speech/voice in 24 people with clinically diagnosed PD, 17 with PSP and 9 with MSA-P, matched for disease duration (mean 4.9 years, standard deviation 2.2). No consistent intergroup differences appeared on specific speech/voice variables. People with PD were significantly less impaired in overall speech/voice severity. Analyses by severity suggested that further investigation of laryngeal, resonance and fluency changes may help characterize the individual groups. MSA-P and PSP were distinguished from PD by the severity of speech/voice deterioration, but individual speech/voice parameters failed to differentiate the groups consistently.

  7. Accuracy of Cochlear Implant Recipients on Speech Reception in Background Music

    PubMed Central

    Gfeller, Kate; Turner, Christopher; Oleson, Jacob; Kliethermes, Stephanie; Driscoll, Virginia

    2012-01-01

    Objectives This study (a) examined speech recognition abilities of cochlear implant (CI) recipients in the spectrally complex listening condition of three contrasting types of background music, and (b) compared performance based upon listener groups: CI recipients using conventional long-electrode (LE) devices, Hybrid CI recipients (acoustic plus electric stimulation), and normal-hearing (NH) adults. Methods We tested 154 LE CI recipients using varied devices and strategies, 21 Hybrid CI recipients, and 49 NH adults on closed-set recognition of spondees presented in three contrasting forms of background music (piano solo, large symphony orchestra, vocal solo with small combo accompaniment) in an adaptive test. Outcomes Signal-to-noise thresholds for speech in music (SRTM) were examined in relation to measures of speech recognition in background noise and multi-talker babble, pitch perception, and music experience. Results SRTM thresholds varied as a function of category of background music, group membership (LE, Hybrid, NH), and age. Thresholds for speech in background music were significantly correlated with measures of pitch perception and speech in background noise thresholds; auditory status was an important predictor. Conclusions Evidence suggests that speech reception thresholds in background music change as a function of listener age (with more advanced age being detrimental), structural characteristics of different types of music, and hearing status (residual hearing). These findings have implications for everyday listening conditions such as communicating in social or commercial situations in which there is background music. PMID:23342550
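
    Adaptive speech reception tests of this kind typically lower the SNR after a correct response and raise it after an error, converging on the 50%-correct point. The abstract does not specify the exact procedure, so the sketch below shows a generic 1-down/1-up track with an assumed step size and a reversal-based threshold estimate:

        import random

        def adaptive_srt(trial_fn, start_snr=10.0, step=2.0, n_trials=30):
            """1-down/1-up adaptive SNR track; trial_fn(snr) -> True on a correct response."""
            snr, last_dir, reversals = start_snr, None, []
            for _ in range(n_trials):
                direction = -1 if trial_fn(snr) else +1    # harder after a hit, easier after a miss
                if last_dir is not None and direction != last_dir:
                    reversals.append(snr)                  # record reversal points
                last_dir = direction
                snr += direction * step
            tail = reversals[-6:]                          # estimate from the last reversals
            return sum(tail) / len(tail) if tail else snr

        # Simulated listener whose probability correct follows a logistic function
        # with a true 50% point at 2 dB SNR (hypothetical psychometric slope).
        listener = lambda snr: random.random() < 1.0 / (1.0 + 10 ** (-(snr - 2.0) / 4.0))
        print("estimated SRT (dB SNR):", round(adaptive_srt(listener), 1))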

  8. Neural Representations Used by Brain Regions Underlying Speech Production

    ERIC Educational Resources Information Center

    Segawa, Jennifer Anne

    2013-01-01

    Speech utterances are phoneme sequences but may not always be represented as such in the brain. For instance, electropalatography evidence indicates that as speaking rate increases, gestures within syllables are manipulated separately but those within consonant clusters act as one motor unit. Moreover, speech error data suggest that a syllable's…

  9. Autonomic Correlates of Speech versus Nonspeech Tasks in Children and Adults

    ERIC Educational Resources Information Center

    Arnold, Hayley S.; MacPherson, Megan K.; Smith, Anne

    2014-01-01

    Purpose: To assess autonomic arousal associated with speech and nonspeech tasks in school-age children and young adults. Method: Measures of autonomic arousal (electrodermal level, electrodermal response amplitude, blood pulse volume, and heart rate) were recorded prior to, during, and after the performance of speech and nonspeech tasks by twenty…

  10. A high quality voice coder with integrated echo canceller and voice activity detector for mobile satellite applications

    NASA Technical Reports Server (NTRS)

    Kondoz, A. M.; Evans, B. G.

    1993-01-01

    In the last decade, low bit rate speech coding research has received much attention, resulting in newly developed, good-quality speech coders operating at rates as low as 4.8 kb/s. Although speech quality at around 8 kb/s is acceptable for a wide variety of applications, at 4.8 kb/s further improvements in quality are necessary to make it acceptable to the majority of applications and users. In addition to a low bit rate with acceptable speech quality, other facilities such as integrated digital echo cancellation and voice activity detection are now becoming necessary to provide a cost-effective and compact solution. In this paper we describe a CELP speech coder with an integrated echo canceller and a voice activity detector, all of which have been implemented on a single DSP32C with 32 KBytes of SRAM. The quality of CELP-coded speech has been improved significantly by a new codebook implementation, which also simplifies the encoder/decoder complexity, making room for the integration of a 64-tap echo canceller together with a voice activity detector.
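
    A 64-tap echo canceller of the kind integrated here is conventionally an adaptive FIR filter driven by the far-end reference signal. The sketch below shows a normalized LMS (NLMS) canceller, a common choice at this tap count; the step size and regularization constant are illustrative assumptions, and this is not the paper's DSP32C implementation:

        import numpy as np

        def nlms_echo_canceller(far_end, mic, n_taps=64, mu=0.5, eps=1e-6):
            """Subtract an adaptively estimated echo of far_end from the mic signal."""
            w = np.zeros(n_taps)                       # FIR estimate of the echo path
            out = np.zeros(len(mic))
            for n in range(n_taps, len(mic)):
                x = far_end[n - n_taps:n][::-1]        # most recent reference samples
                e = mic[n] - w @ x                     # residual after echo removal
                w += mu * e * x / (x @ x + eps)        # normalized LMS update
                out[n] = e
            return out

        # Toy check: the echo is a delayed, attenuated copy of the far-end signal.
        rng = np.random.default_rng(0)
        far = rng.standard_normal(8000)
        echo = 0.5 * np.concatenate([np.zeros(10), far[:-10]])
        near = 0.1 * rng.standard_normal(8000)         # stands in for near-end speech
        cleaned = nlms_echo_canceller(far, near + echo)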

  11. "Having the heart to be evaluated": The differential effects of fears of positive and negative evaluation on emotional and cardiovascular responses to social threat.

    PubMed

    Weeks, Justin W; Zoccola, Peggy M

    2015-12-01

    Accumulating evidence supports fear of evaluation in general as important in social anxiety, including fear of positive evaluation (FPE) and fear of negative evaluation (FNE). The present study examined state responses to an impromptu speech task with a sample of 81 undergraduates. This study is the first to compare and contrast physiological responses associated with FPE and FNE, and to examine both FPE- and FNE-related changes in state anxiety/affect in response to perceived social evaluation during a speech. FPE uniquely predicted (relative to FNE/depression) increases in mean heart rate during the speech; in contrast, neither FNE nor depression related to changes in heart rate. Both FPE and FNE related uniquely to increases in negative affect and state anxiety during the speech. Furthermore, pre-speech state anxiety mediated the relationship between trait FPE and diminished positive affect during the speech. Implications for the theoretical conceptualization and treatment of social anxiety are discussed. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users.

    PubMed

    Mantokoudis, Georgios; Dähler, Claudia; Dubach, Patrick; Kompis, Martin; Caversaccio, Marco D; Senn, Pascal

    2013-01-01

    To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair-Schulz-Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for a live Skype™ video connection and live face-to-face communication were assessed. Higher frame rates (>7 fps), higher camera resolutions (>640 × 480 px) and shorter image/sound delays (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by the physical properties of the camera optics or the full-screen mode. There was a significant median gain of +8.5 percentage points (p = 0.009) in speech perception for all 21 CI users if visual cues were additionally shown. CI users with poor open-set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8 percentage points, p = 0.032). Webcameras have the potential to improve telecommunication of hearing-impaired individuals.

  13. Voice stress analysis

    NASA Technical Reports Server (NTRS)

    Brenner, Malcolm; Shipp, Thomas

    1988-01-01

    In a study of the validity of eight candidate voice measures (fundamental frequency, amplitude, speech rate, frequency jitter, amplitude shimmer, Psychological Stress Evaluator scores, energy distribution, and a measure derived from the above measures) for determining psychological stress, 17 males aged 21 to 35 were subjected to a tracking task on a microcomputer CRT while parameters of vocal production as well as heart rate were measured. Findings confirm those of earlier studies that increases in fundamental frequency, amplitude, and speech rate are found in speakers under extreme levels of stress. In addition, the same changes appear to occur in a regular fashion at a more subtle level of stress that may be characteristic, for example, of routine flying situations. None of the individual speech measures performed as robustly as heart rate.
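
    Several of the candidate measures have simple cycle-level definitions: frequency jitter is the average cycle-to-cycle perturbation of the pitch period, and amplitude shimmer is the corresponding perturbation of cycle peak amplitude, each normalized by its mean. A minimal sketch, assuming pitch periods and peak amplitudes have already been extracted from the recording:

        import numpy as np

        def jitter_shimmer(periods, peak_amps):
            """Relative jitter and shimmer from successive pitch cycles.

            periods:   pitch-period durations in seconds, one per cycle
            peak_amps: peak amplitude of each corresponding cycle
            """
            periods = np.asarray(periods, float)
            peak_amps = np.asarray(peak_amps, float)
            jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)
            shimmer = np.mean(np.abs(np.diff(peak_amps))) / np.mean(peak_amps)
            return jitter, shimmer

        # Example: a slightly perturbed 125 Hz voice (8 ms nominal period).
        rng = np.random.default_rng(1)
        T = 0.008 + 0.0001 * rng.standard_normal(100)
        A = 1.0 + 0.02 * rng.standard_normal(100)
        print(jitter_shimmer(T, A))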

  14. Measuring the rate of change of voice fundamental frequency in fluent speech during mental depression.

    PubMed

    Nilsonne, A; Sundberg, J; Ternström, S; Askenfelt, A

    1988-02-01

    A method of measuring the rate of change of fundamental frequency has been developed in an effort to find acoustic voice parameters that could be useful in psychiatric research. A minicomputer program was used to extract seven parameters from the fundamental frequency contour of tape-recorded speech samples: (1) the average rate of change of the fundamental frequency, (2) its standard deviation, (3) the absolute rate of fundamental frequency change, (4) the total reading time, (5) the pause time as a percentage of the total reading time, (6) the mean, and (7) the standard deviation of the fundamental frequency distribution. The method is demonstrated on (a) synthetic speech and (b) voice recordings of depressed patients who were examined during depression and after improvement.
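
    Given a frame-wise F0 contour with pauses marked, the seven parameters reduce to simple statistics of the contour and its first difference. The sketch below computes analogues of them under assumptions the abstract leaves open (10 ms frames, a semitone scale referenced to 100 Hz, and differencing across voiced frames only); it approximates the method rather than reproducing the original minicomputer program:

        import numpy as np

        def f0_contour_stats(f0_hz, frame_s=0.01):
            """Summary statistics of a frame-wise F0 contour (np.nan marks pauses)."""
            f0 = np.asarray(f0_hz, float)
            voiced = ~np.isnan(f0)
            st = 12 * np.log2(f0[voiced] / 100.0)           # semitone scale re 100 Hz
            rate = np.diff(st) / frame_s                    # change between successive voiced frames
            return {
                "mean_rate_of_change": rate.mean(),         # (1)
                "sd_rate_of_change": rate.std(),            # (2)
                "abs_rate_of_change": np.abs(rate).mean(),  # (3)
                "total_time_s": len(f0) * frame_s,          # (4)
                "percent_pause": 100.0 * (~voiced).mean(),  # (5)
                "mean_f0_st": st.mean(),                    # (6)
                "sd_f0_st": st.std(),                       # (7)
            }

        # Example: a 1 s contour at 10 ms frames with a pause in the middle.
        f0 = np.concatenate([np.linspace(110, 140, 40), np.full(20, np.nan),
                             np.linspace(140, 100, 40)])
        print(f0_contour_stats(f0))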

  15. Spectral and temporal changes to speech produced in the presence of energetic and informational maskers.

    PubMed

    Cooke, Martin; Lu, Youyi

    2010-10-01

    Talkers change the way they speak in noisy conditions. For energetic maskers, speech production changes are relatively well-understood, but less is known about how informational maskers such as competing speech affect speech production. The current study examines the effect of energetic and informational maskers on speech production by talkers speaking alone or in pairs. Talkers produced speech in quiet and in backgrounds of speech-shaped noise, speech-modulated noise, and competing speech. Relative to quiet, speech output level and fundamental frequency increased and spectral tilt flattened in proportion to the energetic masking capacity of the background. In response to modulated backgrounds, talkers were able to reduce substantially the degree of temporal overlap with the noise, with greater reduction for the competing speech background. Reduction in foreground-background overlap can be expected to lead to a release from both energetic and informational masking for listeners. Passive changes in speech rate, mean pause length or pause distribution cannot explain the overlap reduction, which appears instead to result from a purposeful process of listening while speaking. Talkers appear to monitor the background and exploit upcoming pauses, a strategy which is particularly effective for backgrounds containing intelligible speech.

  16. Pronunciation difficulty, temporal regularity, and the speech-to-song illusion.

    PubMed

    Margulis, Elizabeth H; Simchy-Gross, Rhimmon; Black, Justin L

    2015-01-01

    The speech-to-song illusion (Deutsch et al., 2011) tracks the perceptual transformation from speech to song across repetitions of a brief spoken utterance. Because it involves no change in the stimulus itself, but a dramatic change in its perceived affiliation to speech or to music, it presents a unique opportunity to comparatively investigate the processing of language and music. In this study, native English-speaking participants were presented with brief spoken utterances that were subsequently repeated ten times. The utterances were drawn either from languages that are relatively difficult for a native English speaker to pronounce, or languages that are relatively easy for a native English speaker to pronounce. Moreover, the repetition could occur at regular or irregular temporal intervals. Participants rated the utterances before and after the repetitions on a 5-point Likert-like scale ranging from "sounds exactly like speech" to "sounds exactly like singing." The difference in ratings before and after was taken as a measure of the strength of the speech-to-song illusion in each case. The speech-to-song illusion occurred regardless of whether the repetitions were spaced at regular temporal intervals or not; however, it occurred more readily if the utterance was spoken in a language difficult for a native English speaker to pronounce. Speech circuitry seemed more liable to capture native and easy-to-pronounce languages, and more reluctant to relinquish them to perceived song across repetitions.

  17. Young children's preferences for listening rates.

    PubMed

    Leeper, H A; Thomas, C L

    1978-12-01

    A paired-comparison paradigm was utilized to determine the preferences of 20 young children for listening rate for prose speech. An electronic expansion/compression technique yielded nine rates of speech ranging from 100 wpm to 200 wpm, with intervals of 25 wpm. The results indicated that the children most preferred a listening rate of 200 wpm and least preferred a rate of 100 wpm. Comparisons of the present findings with preference rates of older, post-adolescent children and adults are discussed. Direction for further research with temporal alteration and linguistic constraints on the message are considered.

  18. Speech Timing and Working Memory in Profoundly Deaf Children after Cochlear Implantation.

    ERIC Educational Resources Information Center

    Burkholder, Rose A.; Pisoni, David B.

    2003-01-01

    Compared speaking rates, digit span, and speech timing in profoundly deaf 8- and 9-year-olds with cochlear implants and normal-hearing children. Found that deaf children displayed longer sentence durations and pauses during recall and shorter digit spans than normal-hearing children. Articulation rates strongly correlated with immediate memory…

  19. Speech Errors across the Lifespan

    ERIC Educational Resources Information Center

    Vousden, Janet I.; Maylor, Elizabeth A.

    2006-01-01

    Dell, Burger, and Svec (1997) proposed that the proportion of speech errors classified as anticipations (e.g., "moot and mouth") can be predicted solely from the overall error rate, such that the greater the error rate, the lower the anticipatory proportion (AP) of errors. We report a study examining whether this effect applies to changes in error…

  20. Quadcopter Control Using Speech Recognition

    NASA Astrophysics Data System (ADS)

    Malik, H.; Darma, S.; Soekirno, S.

    2018-04-01

    This research compared the success rates of speech recognition systems built on two types of database, an existing database and a newly created one, implemented as motion control for a quadcopter. The speech recognition system used the Mel-frequency cepstral coefficient (MFCC) method for feature extraction and was trained using a recursive neural network (RNN). MFCC is one of the feature extraction methods most widely used for speech recognition, with reported success rates of 80%-95%. The existing database was used to measure the success rate of the RNN method; the new database was created in the Indonesian language, and its success rate was compared with the results from the existing database. Sound input from the microphone was processed on a DSP module with the MFCC method to obtain feature values. These feature values were then classified by the RNN, whose output was a command. The command served as a control input to a single-board computer (SBC), which produced the movement of the quadcopter. On the SBC, the Robot Operating System (ROS) was used as the software framework.
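
    MFCC front ends of the kind described are available off the shelf. The sketch below uses librosa rather than the paper's DSP module, with a hypothetical filename and a simple fixed-length feature summary; in the study, the frame sequence would instead be fed to the RNN classifier whose output is mapped to a quadcopter motion command:

        import numpy as np
        import librosa

        # Load a voice-command recording (hypothetical filename) at 16 kHz.
        y, sr = librosa.load("command.wav", sr=16000)

        # 13 MFCCs per frame, the usual front end for small-vocabulary recognition.
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

        # One simple fixed-length summary a downstream classifier could consume.
        features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
        print(features.shape)   # (26,)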
