Note: This page contains sample records for the topic objective perceptual video from Science.gov.
While these samples are representative of the content of Science.gov,
they are not comprehensive nor are they the most current set.
We encourage you to perform a real-time search of Science.gov
to obtain the most current and comprehensive results. Last update: November 12, 2013.
This paper first presents a study of video quality for H.264 compressed videos compared to MPEG-4 (simple profile) from a perceptual point of view. Traditionally, peak signal-to-noise ratio (PSNR) has been used to represent the quality of a compressed video sequence. However, PSNR has been found to correlate poorly with subjective quality ratings, particularly at much lower bitrates and
Ee Ping Ong; Xiaokang Yang; Weisi Lin; Zhongkang Lu; Susu Yao; Xiao Lin; Susanto Rahardja; Choong Seng Boon
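The PSNR metric discussed above has a simple closed form; a minimal sketch on flat lists of 8-bit pixel values (illustrative only, not a codec-integrated implementation):

```python
import math

def psnr(reference, distorted, max_val=255):
    """Peak signal-to-noise ratio between two equal-length pixel sequences."""
    if len(reference) != len(distorted):
        raise ValueError("frames must have the same number of pixels")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return float("inf")  # identical frames
    return 10 * math.log10(max_val ** 2 / mse)

# Example: a flat 8-bit block distorted by a uniform offset of 5 (MSE = 25)
ref = [128] * 64
dist = [133] * 64
print(round(psnr(ref, dist), 2))  # 34.15
```

The poor correlation with subjective ratings noted above comes precisely from this pixel-wise averaging: PSNR weighs every error equally, regardless of where the human visual system would actually notice it.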
For industry, the need to access accurate and reliable objective video metrics has become more pressing with the advent of new video applications and services such as mobile broadcasting, Internet video, and Internet Protocol television (IPTV). Industry-class objective quality-measurement models have a wide range of uses, including equipment testing (e.g., codec evaluation), transmission-planning and network-dimensioning tasks, head-end quality
Kjell Brunnström; David Hands; Filippo Speranza; Arthur Webster
In picture quality assessment, the amount of distortion perceived by a human observer differs from one region to another according to its particular local content. This subjective perception can be explained/predicted by considering some simple psychovisual properties (masking) of the Human Visual System (HVS). We have implemented an HVS model based on a pyramid decomposition for extracting the spatial frequencies, associated with a multi-resolution motion representation. Then the visibility of the decoded errors is computed by exploiting Kelly's spatio-velocity contrast sensitivity model. The resulting data is called a 'Quality-map.' Special attention has been paid to temporal/moving effects since, in the case of video sequences, motion strongly influences the subjective quality assessment. The quality of the motion information is thus preponderant. In the second part, two possible uses of these psychovisual properties for improving MPEG video encoding performance are described: (1) pre-processing of the pictures to remove non-visible information using motion-adapted filtering. This process is efficient in terms of bits saved, and the degradation is not significant, especially on consumer-electronics TV sets. (2) A perceptual quantizer based on a local adaptation scheme, designed to obtain Quality-maps that are as uniform as possible (homogeneous perceived distortion) at constant bit-rate. Further improvements have been considered, especially when the viewer is tracking a moving object in the scene.
Action video games have been shown to enhance behavioral performance on a wide variety of perceptual tasks, from those that require effective allocation of attentional resources across the visual scene, to those that demand the successful identification of fleetingly presented stimuli. Importantly, these effects have not only been shown in expert action video game players, but a causative link has
The growth of new imaging technologies has created a need for techniques that can be used for copyright protection of digital images and video. One approach for copyright protection is to introduce an invisible signal, known as a digital watermark, into an image or video sequence. In this paper, we describe digital watermarking techniques, known as perceptually based watermarks, that
Raymond B. Wolfgang; Christine I. Podilchuk; Edward J. Delp
Two experimental series are reported using both reaction time (RT) and a data-limited perceptual report to examine the effects of perceptual load on object-based attention. Perceptual load was manipulated across 3 levels by increasing the complexity of perceptual judgments. Data from the RT-based experiments showed object-based effects when the…
We have developed a video processing method that achieves human perceptual visual quality-oriented video coding. The patterns of moving objects are modeled by considering the limited human capacity for spatial-temporal resolution and the visual sensory memory together, and an online moving pattern classifier is devised by using the Hedge algorithm. The moving pattern classifier is embedded in the existing visual saliency with the purpose of providing a human perceptual video quality saliency model. In order to apply the developed saliency model to video coding, the conventional foveation filtering method is extended. The proposed foveation filter can smooth and enhance the video signals locally, in conformance with the developed saliency model, without causing any artifacts. The performance evaluation results confirm that the proposed video processing method shows reliable improvements in the perceptual quality for various sequences and at various bandwidths, compared to existing saliency-based video coding methods. PMID:23247854
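The Hedge algorithm named above is the standard multiplicative-weights update for combining experts online. The sketch below shows only the generic update, not the paper's moving-pattern classifier; the learning rate `eta` and the loss values are illustrative assumptions:

```python
import math

def hedge_update(weights, losses, eta=0.5):
    """One round of the Hedge (multiplicative-weights) update: each expert's
    weight is scaled by exp(-eta * loss) and the result is renormalized,
    so low-loss experts gain relative weight."""
    scaled = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(scaled)
    return [w / total for w in scaled]

# Two experts start equal; expert 0 incurs loss 1, expert 1 incurs loss 0.
w = hedge_update([0.5, 0.5], [1.0, 0.0])
print(w[1] > w[0])  # True: the low-loss expert's weight grows
```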
This paper discusses methods used in different objective video quality metrics. An experimental comparison of different objective methods is also conducted. This experiment shows that the influence of video content on subjective quality evaluation is not captured well by the objective metrics used.
In this paper, we first present a video quality metric based on human vision. We then propose a new perceptual video local activity metric. These two metrics are of great interest for the optimization of both the transmission and the coding of video sequences. Consequently, next we motivate the use of these metrics in order to increase the end-user perceptual
A key function of the sense of smell is to guide organisms towards rewards and away from dangers. However, because relatively few volatile chemicals in the environment carry intrinsic biological value, the meaning of an odor often needs to be acquired through learning and experience. The tremendous perceptual and neural plasticity of the olfactory system provides a design that is ideal for the establishment of links between odor cues and behaviorally relevant events, promoting appropriate adaptive responses to foods, friends, foes, and mates. This article describes recent human neuroimaging data showing the dynamic effects of olfactory perceptual learning and aversive conditioning on the behavioral discrimination of odor objects, with parallel plasticity and reorganization in the posterior piriform and orbitofrontal cortices. The findings presented here highlight the important role of experience in shaping odor object perception and in ensuring the human sense of smell achieves its full perceptual potential.
This paper presents a novel Multiresolution, Perceptual and Vector Quantization (MPVQ) based video coding scheme. In the intra-frame mode of operation, a wavelet transform is applied to the input frame and decorrelates it into its frequency subbands. The coefficients in each detail subband are pixel quantized using a uniform quantization factor divided by the perceptual weighting factor of that subband.
Akbar Sheikh Akbari; Pooneh Bagheri Zadeh; Tom Buggy; John Soraghan
A series of four experiments measured the transfer of perceptual learning in object recognition. Subjects viewed backward-masked, gray-scale images of common objects and practiced an object naming task for multiple days. In Experiment 1, recognition thresholds decreased on average by over 20% over 5 days of training but increased reliably following the transfer to a new set of objects. This suggests that the learning was specific to the practiced objects. Experiment 2 ruled out familiarity with strategies specific to the experimental context, such as stimulus discrimination, as the source of the improvement. Experiments 3 and 4 found that learning transferred across changes in image size. Learning could not be accounted for solely by an improvement in general perceptual abilities, nor by learning of the specific experimental context. Our results indicate that a large amount of learning took place in object-specific mechanisms that are insensitive to image size. PMID:10820606
It is generally recognized that severe video distortions that are transient in space and/or time have a large effect on overall perceived video quality. In order to understand this phenomenon, we study the distribution of spatio-temporally local quality scores obtained from several video quality assessment (VQA) algorithms on videos suffering from compression and lossy transmission over communication channels. We propose a content adaptive spatial and temporal pooling strategy based on the observed distribution. Our method adaptively emphasizes "worst" scores along both the spatial and temporal dimensions of a video sequence and also considers the perceptual effect of large-area cohesive motion flow such as egomotion. We demonstrate the efficacy of the method by testing it using three different VQA algorithms on the LIVE Video Quality database and the EPFL-PoliMI video quality database. PMID:23008260
Park, Jincheol; Seshadrinathan, Kalpana; Lee, Sanghoon; Bovik, Alan Conrad
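The "worst-scores" pooling idea above can be sketched as a percentile pool over local quality scores. This is only the core mechanism, without the content-adaptive and motion-aware weighting the paper adds; the 20% fraction is an illustrative assumption:

```python
def worst_percentile_pool(scores, fraction=0.2):
    """Pool local quality scores by averaging only the worst fraction
    (lower score = worse quality here). The published strategy is more
    elaborate (content- and motion-adaptive); this is just the core idea."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scores)                  # worst (lowest) scores first
    k = max(1, int(len(ranked) * fraction))  # number of scores to keep
    return sum(ranked[:k]) / k

# Ten local scores with two transient, severe drops (0.4 and 0.3); plain
# averaging would mask them, worst-20% pooling emphasizes them.
frame_scores = [0.9, 0.95, 0.4, 0.88, 0.92, 0.3, 0.91, 0.89, 0.93, 0.87]
print(round(worst_percentile_pool(frame_scores), 2))  # 0.35
```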
In this paper, a perceptual cryptography scheme combined with the JPEG2000 codec is proposed, which permutes a different number of bit-planes in each code block, encrypts a different number of coefficients' signs, and permutes code blocks in different frequency bands. Thus, the images or videos can be degraded to different degrees under the control of a quality factor. Theoretical analyses and experimental results show that,
Naïve conceptions and associated misconceptions about object motion arise in part from limitations on perceptual experience. Certain commercial video games, such as Enigmo, provide interactive experience with realistic trajectories and practice at purposefully manipulating those trajectories. We tested the possibility that this experience could modify naïve intuitions about object motion, bringing them into closer alignment with Newtonian principles of mechanics.
Michael E. J. Masson; Daniel N. Bub; Christopher E. Lalonde
Evaluating the perceptual quality of video is of tremendous importance in the design and optimization of wireless video processing and transmission systems. In an endeavor to emulate human perception of quality, various objective video quality assessment (VQA) algorithms have been developed. However, the only subjective video quality database that exists on which these algorithms can be tested is dated and
Anush Krishna Moorthy; Kalpana Seshadrinathan; Rajiv Soundararajan; Alan Conrad Bovik
We explore the use of tracked 2D object motion to enable novel approaches to interacting with video. These include moving annotations, video navigation by direct manipulation of objects, and creating an image composite from multiple video frames. Features in the video are automatically tracked and grouped in an off-line preprocess that enables later interactive manipulation. Examples of annotations include speech
Dan B. Goldman; Chris Gonterman; Brian Curless; David Salesin; Steven M. Seitz
In this paper, a perceptual encryption algorithm is proposed for MPEG-4 encoded videos. The algorithm encrypts the MPEG-4 stream selectively and progressively under the control of a quality factor. Based on the encryption algorithm, a secure video-on-demand system is presented. Theoretical analysis and experimental results show that the encryption algorithm is of high security and low cost, and supports direct bit-rate control, and the
Shiguo Lian; Dengpan Ye; Jinsheng Sun; Zhiquan Wang
The development of image and video archives has made multimedia retrieval important. Hence, an efficient system for multimedia retrieval is needed. In this study, we propose a content-based video object retrieval system to retrieve both moving and stationary objects. First, users must circumscribe a query object by a bounding box on the video, image, or poster. The central
The new video coding standard MPEG-4 is enabling content-based functionalities. It takes advantage of a prior decomposition of sequences into video object planes (VOPs) so that each VOP represents one moving object. A comprehensive review summarizes some of the most important motion segmentation and VOP generation techniques that have been proposed. Then, a new automatic video sequence segmentation algorithm that
This paper proposes a game theoretical rate control technique for video compression. Using a cooperative gaming approach, which has been utilized in several branches of natural and social sciences because of its enormous potential for solving constrained optimization problems, we propose a dual-level scheme to optimize the perceptual quality while guaranteeing "fairness" in bit allocation among macroblocks. At the frame level, the algorithm allocates target bits to frames based on their coding complexity. At the macroblock level, the algorithm distributes bits to macroblocks by defining a bargaining game. Macroblocks play cooperatively to compete for shares of resources (bits) to optimize their quantization scales while considering the Human Visual System's perceptual property. Since the whole frame is an entity perceived by viewers, macroblocks compete cooperatively under a global objective of achieving the best quality with the given bit constraint. The major advantage of the proposed approach is that the cooperative game leads to an optimal and fair bit allocation strategy based on the Nash Bargaining Solution. Another advantage is that it allows multi-objective optimization with multiple decision makers (macroblocks). The simulation results testify to the algorithm's ability to achieve an accurate bit rate with good perceptual quality, and to maintain a stable buffer level.
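The macroblock-level allocation described above can be caricatured as splitting a frame's bit budget in proportion to coding complexity after a minimum per-macroblock share (the "disagreement point" of the bargaining game). This proportional rule is a deliberate simplification, not the paper's Nash Bargaining Solution; the numbers are illustrative:

```python
def bargain_bits(complexities, total_bits, min_bits=8):
    """Allocate a frame's bit budget among macroblocks: each macroblock
    first receives a minimum 'disagreement' allocation, then the surplus
    is split in proportion to coding complexity."""
    n = len(complexities)
    surplus = total_bits - n * min_bits
    if surplus < 0:
        raise ValueError("budget cannot cover minimum allocations")
    total_c = sum(complexities)
    return [min_bits + surplus * c / total_c for c in complexities]

# Three macroblocks with complexities 1, 3, 6 share a 124-bit budget.
alloc = bargain_bits([1.0, 3.0, 6.0], total_bits=124, min_bits=8)
print(alloc)  # [18.0, 38.0, 68.0]
```

The minimum share guarantees the "fairness" property motivated above: even a flat, low-complexity macroblock never starves, while complex macroblocks receive most of the surplus.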
This study builds on a body of research in sports science that has used video as a means of measuring and training perceptual and decision-making skills in a variety of sports. Expert-novice studies using a video occlusion method have shown that expert athletes are able to make better and earlier recognition of an opponent's action, such as a
In a number of studies the context provided by a real-world scene has been claimed to have a mandatory, perceptual effect on the identification of individual objects in such a scene. This claim has provided a basis for challenging widely accepted data-driven models of visual perception in order to advocate alternative models with an outspoken top-down character. The present paper
Peter De Graef; Dominiek Christiaens; Géry d'Ydewalle
We describe a method for automatically obtaining object representations suitable for retrieval from generic video shots. The object representation consists of an association of frame regions. These regions provide exemplars of the object's possible visual appearances. Two ideas are developed: (i) associating regions within a single shot to represent a deforming object; (ii) associating regions from the multiple visual aspects
Josef Sivic; Frederik Schaffalitzky; Andrew Zisserman
Studies concerning how the brain might represent objects by means of a perceptual space have primarily focused on the visual domain. Here we want to show that the haptic modality can equally well recover the underlying structure of a physical object space, forming a perceptual space that is highly congruent to the visual perceptual space. By varying three shape parameters
Quality assessment is a very challenging problem and will remain so, since it is difficult to define universal tools. Subjective assessment is one suitable approach, but it is tedious, time consuming, and requires a normalized viewing room. Objective metrics can be full-reference, reduced-reference, or no-reference. This paper presents a study carried out for the development of a no-reference objective metric dedicated to the quality evaluation of display devices. Initially, a subjective study was devoted to this problem by asking a representative panel (15 male and 15 female; 10 young adults, 10 adults and 10 seniors) to answer questions regarding their perception of several criteria for quality assessment. These quality factors were hue, saturation, contrast and texture. The aim was to determine the importance of each perceptual criterion in human judgments of quality. Following the study, the factors that impact the quality evaluation of display devices were identified. The no-reference objective metric was then developed using statistical tools that separate the important axes. This no-reference metric, based on perceptual criteria and integrating some specificities of the human visual system (HVS), has a high correlation with the subjective data.
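Once per-criterion weights have been derived from such a subjective study, a no-reference score can be formed as a weighted combination of the measured criteria. The weights and feature values below are illustrative placeholders, not the study's fitted values:

```python
def no_reference_score(features, weights):
    """Combine per-criterion feature scores linearly. `features` maps a
    criterion name to a normalized measurement in [0, 1]; `weights` holds
    the per-criterion importances derived from subjective data."""
    missing = set(weights) - set(features)
    if missing:
        raise KeyError(f"missing criteria: {sorted(missing)}")
    return sum(weights[k] * features[k] for k in weights)

# Hypothetical weights and measurements for the four criteria in the study.
weights = {"hue": 0.2, "saturation": 0.2, "contrast": 0.35, "texture": 0.25}
features = {"hue": 0.8, "saturation": 0.7, "contrast": 0.9, "texture": 0.6}
print(round(no_reference_score(features, weights), 3))  # 0.765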
This paper presents an extensible architectural model for general content-based analysis and indexing of video data which can be customised for a given problem domain. Video interpretation is approached as a joint inference problem which can be solved through the use of modern machine learning and probabilistic inference techniques. An important aspect of the work concerns the use of
Motion estimation is one of the important procedures in all video encoders. Most of the complexity of the video coder depends on the complexity of the motion estimation step. The original motion estimation algorithm has a remarkable complexity, and therefore many improvements have been proposed to enhance the crude version of motion estimation. The basic idea of many of
Amin Banitalebi; Said Nader-Esfahani; Alireza Nasiri Avanaki
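The "original" motion estimation algorithm with its remarkable complexity is exhaustive block matching; a minimal sketch with toy frame sizes and a sum-of-absolute-differences cost shows the full search that fast methods improve on:

```python
def sad(cur, ref, cx, cy, rx, ry, n, w):
    """Sum of absolute differences between the n x n block of `cur` at
    (cx, cy) and the block of `ref` at (rx, ry); frames are flat lists
    of width w."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[(cy + y) * w + cx + x] - ref[(ry + y) * w + rx + x])
    return total

def full_search(cur, ref, cx, cy, n, w, h, radius=4):
    """Exhaustive block matching: try every candidate displacement within
    +/- radius and keep the lowest-SAD one. Fast-search schemes reduce
    this O(radius^2) candidate set."""
    best_cost, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - n and 0 <= ry <= h - n:
                cost = sad(cur, ref, cx, cy, rx, ry, n, w)
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost

# Toy 16x16 frames with unique pixel values, so only the true offset gives
# zero SAD. The current block at (6, 6) is reference content moved by (3, 1).
W = H = 16
ref = [y * W + x for y in range(H) for x in range(W)]
cur = ref[:]
for y in range(4):
    for x in range(4):
        cur[(6 + y) * W + 6 + x] = ref[(7 + y) * W + 9 + x]
mv, cost = full_search(cur, ref, 6, 6, 4, W, H)
print(mv, cost)  # (3, 1) 0
```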
This study tested the effect of screen size on the decision making performance of experienced and inexperienced basketball players during a video-based perceptual decision making task. Participants were 13 elite, 25 intermediate, and 34 novice participants who viewed 30 structured sequences of basketball games twice for between 4 to 6, once on a 43 cm (17 in.) computer monitor and
Concrete objects are used to help children understand math concepts. Research suggests that perceptually rich objects may hinder children's performance on math tasks relative to bland objects. However, previous studies have confounded the perceptual richness of objects with children's established knowledge of the objects. The present study examined how these two factors influence children's developing counting skill. Children (M age
This paper reports on tracking of multiple objects using color histogram backprojection and motion cues. Four tasks which facilitate this are discussed. The first is an adaptive color histogram backprojection (which builds upon the works of Swain and Ballard) and its application to tracking of multiple objects in video sequences. The second task is designing efficient fast blob detectors
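Swain and Ballard's histogram backprojection, which the first task builds on, can be sketched in a simplified form with one-dimensional color bins (real implementations bin 2-D or 3-D color values):

```python
from collections import Counter

def backproject(image, model_hist):
    """Histogram backprojection in the style of Swain and Ballard: each
    pixel is replaced by min(model_count / image_count, 1) for its color
    bin, so colors common in the target model and rare elsewhere light up."""
    image_hist = Counter(image)
    return [min(model_hist.get(px, 0) / image_hist[px], 1.0) for px in image]

# Toy example: the model object is four pixels of color 5; the image
# contains a patch of color 5 amid background colors 1 and 2.
model_hist = {5: 4}
image = [1, 1, 5, 5, 5, 5, 2, 2]
print(backproject(image, model_hist))
# [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```

A blob detector run on this confidence map then localizes the object; the "adaptive" variant mentioned above would additionally update `model_hist` over time.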
Two synchronized cameras are utilized to obtain independent video streams to detect moving objects from two different viewing angles. The video frames are directly correlated in time. Moving objects in image frames from the two cameras are identified and tagged for tracking. One advantage of such a system involves overcoming effects of occlusions that could result in an object in partial or full view in one camera, when the same object is fully visible in another camera. Object registration is achieved by determining the location of common features in the moving object across simultaneous frames. Perspective differences are adjusted. Combining information from images from multiple cameras increases robustness of the tracking process. Motion tracking is achieved by determining anomalies caused by the objects' movement across frames in time, both in each camera's video and in the combined video information. The path of each object is determined heuristically. Accuracy of detection is dependent on the speed of the object as well as variations in direction of motion. Fast cameras increase accuracy but limit the speed and complexity of the algorithm. Such an imaging system has applications in traffic analysis, surveillance and security, as well as object modeling from multi-view images. The system can easily be expanded by increasing the number of cameras such that there is an overlap between the scenes from at least two cameras in proximity. An object can then be tracked long distances or across multiple cameras continuously, applicable, for example, in wireless sensor networks for surveillance or navigation.
When people take videos, they always want to capture intended objects, which are essential for presenting what they want to express in their videos, and to share the intended objects with others. The concept of intended objects provides a novel perspective for video content analysis, and detecting intended objects may be beneficial for a wide range of applications such as video
The past few years have seen a dramatic increase in demand for semantic video analysis. Object based interpretation in real-time imposes increased challenges on resource management to maintain sufficient quality of service, and requires careful design of the system architecture. This paper focuses on the role of context for system performance in a multi-stage object detection process. We extract context from simple
In this paper, we discuss a region-based perceptual quality-regulable H.264 video encoder system that we developed. The ability to adjust the quality of specific regions of a source video to a predefined level of quality is an essential technique for region-based video applications. We use the structural similarity index as the quality metric for distortion-quantization modeling and develop a bit allocation and rate control scheme for enhancing regional perceptual quality. Exploiting the relationship between the reconstructed macroblock and the best predicted macroblock from mode decision, a novel quantization parameter prediction method is built and used to achieve the target video quality of the processed macroblock. Experimental results show that the system model has only 0.013 quality error on average. Moreover, the proposed region-based rate control system can encode video well under a bitrate constraint with a 0.1% bitrate error on average. For the situation of a low bitrate constraint, the proposed system can encode video with a 0.5% bitrate error on average and enhance the quality of the target regions. PMID:23542953
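The structural similarity index used as the quality metric above has a standard closed form. A single-window sketch on flat pixel lists follows; the encoder applies it per macroblock, and the constants follow the common K1 = 0.01, K2 = 0.03 choice:

```python
def ssim(x, y, max_val=255):
    """Global structural similarity between two equal-length pixel lists
    (single-window SSIM, combining luminance, contrast and structure)."""
    n = len(x)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx * mx + my * my + c1) * (vx + vy + c2)
    return num / den

block = [10, 20, 30, 40] * 4
print(round(ssim(block, block), 4))  # identical blocks -> 1.0
```

Unlike PSNR, the score rewards preserved local structure rather than raw pixel closeness, which is why it suits the perceptual quality regulation described above.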
Perceptual learning has widely been claimed to be attention driven; attention assists in choosing the relevant sensory information and attention may be necessary in many cases for learning. In this paper, we focus on the interaction of perceptual learning and attention – that perceptual learning can reduce or eliminate the limitations of attention, or, correspondingly, that perceptual learning depends on the attention condition. Object attention is a robust limit on performance. Two attributes of a single attended object may be reported without loss, while the same two attributes of different objects can exhibit a substantial dual-report deficit due to the sharing of attention between objects. The current experiments document that this fundamental dual-object report deficit can be reduced, or eliminated, through perceptual learning that is partially specific to retinal location. This suggests that alternative routes established by practice may reduce the competition between objects for processing resources.
The extent to which an irrelevant distractor is processed during selective attention depends critically on the level of perceptual load on the relevant task. Here we show that perceptual load also affects the tendency of graspable objects to afford associated actions. Participants carried out a letter-search task and identified a target letter with the right or left hand while ignoring a graspable object with a handle oriented on the left or the right side of the object. The target letter was presented either on its own (low perceptual load) or alongside five nontarget letters (high load). Responses were faster when the action afforded by the ignored object was compatible (vs. incompatible) with the current target response, but only when the perceptual load of the letter search task was low. This finding is the first to demonstrate the role of perceptual load in action affordances by ignored objects. PMID:22806449
Murphy, Sandra; van Velzen, José; de Fockert, Jan W
A system simulation model was used to create scene-dependent noise masks that reflect current performance of mobile phone cameras. Stimuli with different overall magnitudes of noise and with varying mixtures of red, green, blue, and luminance noises were included in the study. Eleven treatments in each of ten pictorial scenes were evaluated by twenty observers using the softcopy ruler method. In addition to determining the quality loss function in just noticeable differences (JNDs) for the average observer and scene, transformations for different combinations of observer sensitivity and scene susceptibility were derived. The psychophysical results were used to optimize an objective metric of isotropic noise based on system noise power spectra (NPS), which were integrated over a visual frequency weighting function to yield perceptually relevant variances and covariances in CIE L*a*b* space. Because the frequency weighting function is expressed in terms of cycles per degree at the retina, it accounts for display pixel size and viewing distance effects, so application-specific predictions can be made. Excellent results were obtained using only L* and a* variances and L*a* covariance, with relative weights of 100, 5, and 12, respectively. The positive a* weight suggests that the luminance (photopic) weighting is slightly narrow on the long wavelength side for predicting perceived noisiness. The L*a* covariance term, which is normally negative, reflects masking between L* and a* noise, as confirmed in informal evaluations. Test targets in linear sRGB and rendered L*a*b* spaces for each treatment are available at http://www.aptina.com/ImArch/ to enable other researchers to test metrics of their own design and calibrate them to JNDs of quality loss without performing additional observer experiments. Such JND-calibrated noise metrics are particularly valuable for comparing the impact of noise and other attributes, and for computing overall image quality.
Keelan, Brian W.; Jin, Elaine W.; Prokushkin, Sergey
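The optimized metric's weighted combination of L* and a* statistics (relative weights 100, 5, and 12) can be sketched directly on noise samples. The sample values below are illustrative, and the full metric first weights the noise power spectra by the visual frequency function before computing the variances:

```python
def noise_objective(L_samples, a_samples, wL=100, wa=5, wLa=12):
    """Objective noise score from L* variance, a* variance and their
    covariance, using the relative weights reported in the study
    (100, 5 and 12). Population (1/n) statistics are assumed."""
    n = len(L_samples)
    mL = sum(L_samples) / n
    ma = sum(a_samples) / n
    varL = sum((v - mL) ** 2 for v in L_samples) / n
    vara = sum((v - ma) ** 2 for v in a_samples) / n
    covLa = sum((l - mL) * (a - ma) for l, a in zip(L_samples, a_samples)) / n
    return wL * varL + wa * vara + wLa * covLa

# Illustrative noise samples around a mid-gray patch.
print(noise_objective([50, 52, 48, 50], [0, 1, -1, 0]))  # 214.5
```

Note the covariance term carries a positive weight here, but the study observes that L*a* covariance is normally negative, so in practice it reduces the score, modeling the masking between L* and a* noise.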
As a step towards a perceptual user interface, an object tracking algorithm is developed and demonstrated tracking human faces. Computer vision algorithms that are intended to form part of a perceptual user interface must be fast and efficient. They must be able to track in real time and yet not absorb a major share of computational resources. An efficient, new
In the intelligence community, aerial video has become one of the fastest growing data sources and it has been extensively used in intelligence, surveillance, reconnaissance, tactical and security applications. This paper presents a tracking approach to detect moving vehicles and persons in such videos taken from an aerial platform. In our approach, we combine the layer segmentation approach with background stabilization and post-tracking refinement to reliably detect small moving objects at a relatively low processing speed. For each individual moving object, a corresponding layer is created to maintain an independent appearance and motion model during the tracking process. After the online tracking process, we apply a post-tracking refinement process to link the track fragments into a long consistent track ID to further reduce false alarms and increase the detection rate. Furthermore, a vehicle and person classifier is also integrated into the approach to identify the moving object categories. The classifier is based on the histogram of oriented gradients (HOG), which is more robust to illumination variation or camera automatic gain change. Finally, we report the results of our algorithms on a large-scale EO and IR data set collected from the VIVID program, and the results show that our approach achieved a good and stable tracking performance on the data set, which is more than eight hours long.
In recent years, the development of novel video coding technologies has spurred interest in developing digital video communications. The definition of evaluation mechanisms to assess the quality of video will play a major role in the overall design of video communication systems. It is well known that simple energy-based metrics such as the Peak Signal-to-Noise Ratio (PSNR)
José Luis Martínez; Pedro Cuenca; Francisco Delicado; Francisco Quiles
In order to support content-based video database access over the Internet Protocol (IP), achieving the following objectives is important: (i) video query by a representative object (key object) or some statistical characterization of the target contents, (ii) bandwidth-efficient browsing over IP, and (iii) scalable and user-centric video transmission over a heterogeneous and variable-bandwidth network. We present a video object extraction
The new video coding standard MPEG-4 is enabling content-based functionalities as well as high coding efficiency considering shape information of moving objects. A novel algorithm for segmentation of moving objects in video sequences and VOP (video object planes) extraction is presented. This algorithm begins with a robust double edge map from the difference between two successive frames. After removing edges
A novel video retrieval tool, based on video objects (VOs) and object trajectories, is presented. The algorithm extends the concept of edge potential functions (EPF), already used in shape-based image retrieval, tailored to work on shapes extracted from video object planes (VOPs). First, the initial model is detected by using local motion and global motion information. After the moving object
Minh-Son Dao; Francesco G. B. DeNatale; Andrea Massa
Perceptual learning is an improvement in perceptual task performance reflecting plasticity in the perceptual system. Practice effects were studied in two object orientation tasks: a first-order, luminance object task and a second-order, texture object task. Perceptual learning was small or absent in the first-order task, but consistently occurred for the second-order (texture) task, where it was limited to
Most global motion estimation (GME) methods are oriented to video coding while video object segmentation methods either assume no global motion (GM) or directly adopt a coding-oriented method to compensate for GM. This paper proposes a hierarchical differential GME method oriented to video object segmentation. A scheme which combines three-step search and motion parameters prediction is proposed for initial estimation
In order to quantitatively study object perception, be it perception by biological systems or by machines, one needs to create objects and object categories with precisely definable, preferably naturalistic, properties [1]. Furthermore, for studies on perceptual learning, it is useful to create novel objects and object categories (or object classes) with such properties [2]. Many innovative and useful methods currently exist for creating novel objects and object categories [3-6] (also see refs. 7, 8). However, generally speaking, the existing methods have three broad types of shortcomings. First, shape variations are generally imposed by the experimenter [5,9,10], and may therefore be different from the variability in natural categories, and optimized for a particular recognition algorithm. It would be desirable to have the variations arise independently of the externally imposed constraints. Second, the existing methods have difficulty capturing the shape complexity of natural objects [11-13]. If the goal is to study natural object perception, it is desirable for objects and object categories to be naturalistic, so as to avoid possible confounds and special cases. Third, it is generally hard to quantitatively measure the available information in the stimuli created by conventional methods. It would be desirable to create objects and object categories where the available information can be precisely measured and, where necessary, systematically manipulated (or 'tuned'). This allows one to formulate the underlying object recognition tasks in quantitative terms. Here we describe a set of algorithms, or methods, that meet all three of the above criteria. Virtual morphogenesis (VM) creates novel, naturalistic virtual 3-D objects called 'digital embryos' by simulating the biological process of embryogenesis [14]. Virtual phylogenesis (VP) creates novel, naturalistic object categories by simulating the evolutionary process of natural selection [9,12,13].
Objects and object categories created by these simulations can be further manipulated by various morphing methods to generate systematic variations of shape characteristics [15,16]. The VP and morphing methods can also be applied, in principle, to novel virtual objects other than digital embryos, or to virtual versions of real-world objects [9,13]. Virtual objects created in this fashion can be rendered as visual images using a conventional graphical toolkit, with desired manipulations of surface texture, illumination, size, viewpoint and background. The virtual objects can also be 'printed' as haptic objects using a conventional 3-D prototyper. We also describe some implementations of these computational algorithms to help illustrate the potential utility of the algorithms. It is important to distinguish the algorithms from their implementations. The implementations are demonstrations offered solely as a 'proof of principle' of the underlying algorithms. It is important to note that, in general, an implementation of a computational algorithm often has limitations that the algorithm itself does not have. Together, these methods represent a set of powerful and flexible tools for studying object recognition and perceptual learning by biological and computational systems alike. With appropriate extensions, these methods may also prove useful in the study of morphogenesis and phylogenesis.
Hauffen, Karin; Bart, Eugene; Brady, Mark; Kersten, Daniel; Hegde, Jay
The goal of the video-in-the-classroom project at an elementary school was to increase student achievement by making learning come alive for students, engaging learners actively in the process of learning through authentic learning experiences, and making that learning meaningful to their lives. Fifth grade students created group video…
Objective assessment of visual comfort for stereoscopic video is of great importance for the stereoscopic image safety issue. We propose a novel visual comfort assessment metric framework that systematically exploits human visual attention models. In a stereoscopic video shot, perceptually significant regions, where human subjects pay more attention, are likely to play an essential role in determining the overall level of visual comfort. As a specific example of this concept, we develop a visual comfort metric that quantifies the level of visual discomfort caused by fast salient object motion. The performance of the proposed visual comfort metric has been evaluated using natural stereoscopic videos. The experimental results show that the proposed visual comfort metric significantly improves the correlations with subjective judgment.
Ju Jung, Yong; Lee, Seong-il; Sohn, Hosik; Wook Park, Hyun; Man Ro, Yong
In image and video compression and transmission, it is important to rely on an objective image/video quality metric which accurately represents the subjective quality of processed images and video sequences. In some scenarios, it is also important to evaluate the quality of the received video sequence with minimal reference to the transmitted one. For instance, for quality improvement of video transmission through closed-loop optimisation, the video quality measure can be evaluated at the receiver and provided as feedback information to the system controller. The original image/video sequence--prior to compression and transmission--is not usually available at the receiver side, so it is important for the receiver to rely on an objective video quality metric that needs no reference, or only minimal reference, to the original video sequence. The observation that the human eye is very sensitive to edge and contour information of an image underpins the proposal of our reduced-reference (RR) quality metric, which compares edge information between the distorted and the original image. Results highlight that the metric correlates well with subjective observations, also in comparison with commonly used full-reference metrics and with a state-of-the-art RR metric.
Martini, Maria G.; Villarini, Barbara; Fiorucci, Federico
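The edge-comparison idea behind the RR metric above can be illustrated with a toy sketch; the crude gradient detector and the plain agreement score below are stand-ins for the actual edge operator and metric:

```python
def edge_map(img, thresh=30):
    # Crude horizontal+vertical gradient edge detector (a stand-in for a
    # real operator such as Sobel); marks a pixel as edge when the
    # gradient magnitude exceeds the threshold.
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]
            gy = img[y + 1][x] - img[y][x]
            if abs(gx) + abs(gy) > thresh:
                edges[y][x] = True
    return edges

def edge_similarity(ref, dist):
    # Reduced-reference style score: fraction of edge decisions on which
    # the reference and distorted frames agree (1.0 = identical maps).
    e1, e2 = edge_map(ref), edge_map(dist)
    agree = sum(a == b for r1, r2 in zip(e1, e2) for a, b in zip(r1, r2))
    return agree / (len(e1) * len(e1[0]))
```

In a true RR setting only a compact summary of the reference edge map (not the full frame) would be sent alongside the video.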
In the intelligence community, aerial video has become one of the fastest growing data sources and it has been extensively used in intelligence, surveillance, reconnaissance, tactical and security applications. This paper presents a tracking approach to detect moving vehicles and persons in such videos taken from an aerial platform. In our approach, we combine the layer segmentation approach with background stabilization
Jiangjian Xiao; Hui Cheng; Han Feng; Changjiang Yang
The problem of automatic video restoration and object removal attracts the attention of many researchers. In this paper we present a new framework for video inpainting. We consider the case when the camera motion is approximately parallel to the plane of image projection. The scene may consist of a stationary background with a moving foreground, both of which may require inpainting. Moving objects can move differently, but should not change their size. The framework presented in this paper contains the following steps: moving object identification, moving object tracking and background/foreground segmentation, inpainting and, finally, video rendering. Some results on test video sequence processing are presented.
Frantc, V. A.; Voronin, V. V.; Marchuck, V. I.; Egiazarian, K. O.
We propose measures to evaluate the performance of video object segmentation and tracking methods quantitatively without ground-truth segmentation maps. The proposed measures are based on spatial differences of color and motion along the boundary of the estimated video object plane and temporal differences between the color histogram of the current object plane and its neighbors. They can be used
Çigdem Eroglu Erdem; Bülent Sankur; A. Murat Tekalp
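The temporal component of such ground-truth-free measures, the histogram difference between an object plane and its neighbours, can be sketched as follows (the bin count and the L1 distance are illustrative assumptions, not necessarily the paper's exact choices):

```python
def histogram(pixels, bins=8, maxval=256):
    # Normalised grey-level histogram of the pixels inside the segmented
    # object plane; normalisation makes masks of different sizes comparable.
    h = [0] * bins
    for p in pixels:
        h[p * bins // maxval] += 1
    n = len(pixels)
    return [c / n for c in h]

def temporal_histogram_diff(obj_pixels_t, obj_pixels_prev):
    # L1 distance between the object-plane histograms of consecutive
    # frames; small values suggest a temporally stable segmentation mask.
    h1 = histogram(obj_pixels_t)
    h2 = histogram(obj_pixels_prev)
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

A mask that suddenly leaks into the background changes the enclosed pixel statistics and shows up as a spike in this score, with no ground truth required.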
We address the problem of recognizing the pose of an object category from video sequences capturing the object under small camera movements. This scenario is relevant in applications such as robotic object manipulation or autonomous navigation. We introdu...
Five experiments examined what is learned based on the perceptual and semantic information of objects in visual statistical learning (VSL). In the familiarization phase, participants viewed a sequence of line drawings and detected repetitions of various objects. In a subsequent test phase, they watched 2 test sequences (statistically related…
Otsuka, Sachio; Nishiyama, Megumi; Nakahara, Fumitaka; Kawaguchi, Jun
Two studies of the exploratory behaviour of preschool children and first grade elementary school children using Hutt's novel object are reported. The novel object was a box with a movable lever. Manipulating the lever released sound and light effects from the box. The task was such that manipulatory behaviour dominated other forms of exploration, like perceptual investigation and asking questions.
Klaus Schneider; Matthias Moch; Rita Sandfort; Monika Auerswald; Karin Walther-Weckman
Our aim is to insert depth information into an existing 2D video sequence to provide content for 3D-TV applications, which we try to achieve through segmentation of the objects in the given 2D video sequence. To this effect, we present a method for temporal stabilization of video object segmentation algorithms for 3D-TV applications. First, two quantitative measures to evaluate
Çigdem Eroglu Erdem; Fabian Ernst; André Redert; Emile A. Hendriks
In this paper we investigate performance metrics for quantitative evaluation of object-based video segmentation algorithms. The metrics address the case when ground-truth video object planes are available. The proposed metrics are used to evaluate three essentially different approaches to video segmentation: an edge-based (1), a motion-clustering-based (2), and a total-feature-vector-clustering-based (3) algorithm.
Seeing an object on one occasion may facilitate or prime processing of the same object if it is later again encountered. Such priming may also be found — but at a reduced level — for different but perceptually similar objects that are alternative exemplars or ‘tokens’ of the initially presented object. We explored the neural correlates of this perceptual specificity
W. Koutstaal; A. D. Wagner; M. Rotte; A. Maril; R. L. Buckner; D. L. Schacter
Objective video quality assessment (VQA) refers to evaluation of the quality of a video by an algorithm. The performance of any such VQA algorithm is gauged by how well the algorithmic scores correlate with human perception of quality. Research in the area of VQA has produced a host of full-reference (FR) VQA algorithms. FR VQA algorithms are those in which
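The performance of a VQA algorithm is commonly summarized by the correlation between its scores and subjective ratings (mean opinion scores); a minimal Pearson correlation sketch:

```python
from math import sqrt

def pearson(xs, ys):
    # Pearson linear correlation between objective quality scores `xs`
    # and subjective mean opinion scores `ys`; 1.0 = perfect agreement.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In VQA evaluations a rank correlation (Spearman) is usually reported alongside Pearson, and the objective scores are often passed through a fitted logistic mapping first.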
This paper addresses the problem of objectively measuring quality in free-viewpoint video production. The accuracy of scene reconstruction is typically limited and an evaluation of free-viewpoint video should explicitly consider the quality of image production. A simple objective measure of accuracy is presented in terms of structural registration error in view synthesis. This technique can be applied as a
Joe Kilner; Jonathan Starck; Jean-Yves Guillemaut; Adrian Hilton
This study examined how aging affects the spatial patterns of repetition effects associated with perceptual priming of unfamiliar visual objects. Healthy young (N=14) and elderly adults (N=13) viewed four repetitions of structurally possible and impossible figures while being scanned with BOLD fMRI. Although explicit recognition memory for the figures was reduced in the elder subjects, repetition priming did not differ
Anja Soldan; Yunglin Gazes; H. John Hilton; Yaakov Stern
In this paper, we describe an efficient video indexing scheme based on the motion behavior of video objects for fast content-based browsing and retrieval in a video database. The proposed novel method constructs a dictionary of prototype objects. The first step in our approach extracts moving objects by analyzing layered images constructed from the coarse data in a 3D wavelet decomposition of the video sequence. These images capture motion information only. Moving objects are modeled as collections of interconnected rigid polygonal shapes in the motion sequences that we derive from the wavelet representation. The motion signatures of the objects are computed from the rotational and translational motions associated with the elemental polygons that form the objects. These signatures are finally stored as potential query terms.
Physics teachers have long employed video clips to study moving objects in their classrooms and instructional labs. A number of approaches exist, both free and commercial, for tracking the coordinates of a point using video. The main characteristics of the method described in this paper are: it is simple to use; coordinates can be tracked using…
This paper proposes an objective video quality metric based on an analysis of spatial and temporal distortions. Spatial quality features extracted from the spatiotemporal region of reference and distorted videos are used to express the spatial distortion. Temporal distortion, caused by frame freezing resulting from a packet loss, is derived from the spatial distortion before and after the frozen frames.
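The freeze-detection part of such a metric can be sketched as a consecutive-frame difference test; the (near-)zero-difference criterion below is one simple detector, not necessarily the paper's exact method:

```python
def detect_freezes(frames, eps=0):
    # Flag frame indices whose content is (near-)identical to the
    # previous frame; runs of flagged indices correspond to freeze
    # events caused by packet loss. `frames` are 2-D grey-level grids.
    frozen = []
    for t in range(1, len(frames)):
        diff = sum(abs(a - b)
                   for ra, rb in zip(frames[t], frames[t - 1])
                   for a, b in zip(ra, rb))
        if diff <= eps:
            frozen.append(t)
    return frozen
```

On noisy decoded video `eps` would be set above zero, since even a frozen display path can differ by a few code values after re-encoding.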
While recent studies have shed light on the mechanisms that generate gamma (>40 Hz) oscillations, the functional role of these oscillations is still debated. Here we suggest that the purported mechanism of gamma oscillations (feedback inhibition from local interneurons), coupled with lateral connections implementing “Gestalt” principles of object integration, naturally leads to a decomposition of the visual input into object-based “perceptual cycles,” in which neuron populations representing different objects within the scene will tend to fire at successive cycles of the local gamma oscillation. We describe a simple model of V1 in which such perceptual cycles emerge automatically from the interaction between lateral excitatory connections (linking oriented cells falling along a continuous contour) and fast feedback inhibition (implementing competitive firing and gamma oscillations). Despite its extreme simplicity, the model spontaneously gives rise to perceptual cycles even when faced with natural images. The robustness of the system to parameter variation and to image complexity, together with the paucity of assumptions built in the model, support the hypothesis that perceptual cycles occur in natural vision.
The video coding standard MPEG-4 enables content-based functionalities by introducing video object planes (VOPs), which represent semantically meaningful objects. In this paper, a novel fast, unsupervised semantic segmentation scheme is presented for stereoscopic sequences, which utilizes the provided depth information. Each stereo pair is first analyzed and the disparity field and occluded areas are estimated. Then a
Klimis S. Ntalianis; Nikolaos D. Doulamis; Anastasios D. Doulamis; Stefanos D. Kollias
3D video games use texture maps to improve the realism and the visual detail of graphical objects without significantly increasing rendering complexity. The general trend in the gaming industry towards the use of large texture maps has led to the use of texture compression techniques. Current color texture compression techniques treat all texture content uniformly, which can lead to reduced
This paper presents an architecture and methodology for high-speed activity-based digital video indexing and retrieval for physical security applications. State of the art computer vision algorithms detect objects in real-time video and determine basic activity information such as object type (human, vehicle, etc), object trajectory, and interactions with other objects. This information is encoded as a light-weight stream of activity-based
Previous research has clearly demonstrated action video game improvements in visual and spatial attention. The present study investigated action video game related changes in the resolution of representations for both dynamic and stationary objects by comparing video game players (VGP) and non-video game players (NVGP). In a color wheel task (adapted from Zhang & Luck, 2008) where viewers were asked to freely recall the color of briefly presented objects, we found that VGPs were more accurate than NVGPs. Furthermore, in the Multiple Identity Tracking task (Horowitz et al., 2007), we found that VGPs were able to track not only more objects but also maintain identity of tracked objects better than NVGPs. Finally, we demonstrated that VGPs had greater attentional breadth and higher spatial representation resolution. PMID:22266223
The most crucial backend algorithm of the new MPEG-4 standard is the computationally expensive rendering of arbitrarily shaped video objects into the final video scene. This co-processor architecture presents a solution for scene rendering of the CCIR 601 video format with an arbitrary number of video objects. To handle the very high data bandwidth, a hierarchical memory concept has been implemented. The total size of all rendered objects for one scene may reach two times the size of the CCIR 601 format. Running at a 100 MHz clock frequency, the co-processor achieves a peak performance of about two billion multiply-accumulate operations per second. The co-processor has been designed for a 0.35 micrometer CMOS technology. About 60% of the overall area of 52 mm2 is used for on-chip static memory. The power consumption of the co-processor has been estimated at 1 W.
This paper proposes a scrambling method for Motion JPEG (MJ) videos and moving objects (MOs) detection from scrambled videos. In the proposed method, both scrambling and MOs detection utilize the property of the positive and negative signs of discrete cosine transformed (DCT) coefficients. Since a DCT sign is encoded separately from its corresponding magnitude, the sign is processed without de-
A novel content-aware warping approach is introduced for video retargeting. The key to this technique is adapting videos to fit displays with various aspect ratios and sizes while preserving both visually salient content and temporal coherence. Most previous studies solve this spatiotemporal problem by consistently resizing content in frames. This strategy significantly improves the retargeting results, but does not fully consider object preservation, sometimes causing apparent distortions on visually salient objects. We propose an object-preserving warping scheme with object-based significance estimation to reduce this unpleasant distortion. In the proposed scheme, visually salient objects in 3D space-time space are forced to undergo as-rigid-as-possible warping, while low-significance objects are warped as close as possible to linear rescaling. These strategies enable our method to consistently preserve both the spatial shapes and temporal motions of visually salient objects, and avoid over-deformations on low-significance objects, yielding a pleasing motion-aware video retargeting. Qualitative analyses, including a user study with chi-squared test, and experiments on complex videos containing diverse cameras and dynamic motions, show a clear superiority of our method over state-of-the-art video retargeting methods. PMID:23609060
The representation of video information in terms of its content is at the foundation of many multimedia applications, such as broadcasting, content-based information retrieval, interactive video, remote surveillance and entertainment. In particular, object-based representation consists in decomposing the video content into a collection of meaningful objects. This approach offers a broad range of capabilities in terms of access, manipulation and interaction with the visual content. The basic difference when compared with pixel-based procedures is that instead of processing individual pixels, image objects are used in the representation. To exploit the benefits of object-based representation, multimedia applications need automatic techniques for extracting such objects from video data, a problem that still remains largely unsolved. In this paper, we first review the extraction techniques that enable the separation of foreground objects from the background. Their field of applicability and their limitations are discussed. Next, automatic tools for evaluating their performances are introduced. The major applications that benefit from an object-based approach are then analysed. Finally, we discuss some open research issues in object-based video.
Introduces a novel approach for rigid object pose estimation. The system rotates a reference frame of the object of interest until it reaches a view at which the rotated reference view and the unknown-pose view seem to be “similar”. A number of pose similarity measures were tested for different types of objects undergoing various amounts of rotation from the reference
Traditional video scene analysis depends on accurate background modeling to identify salient foreground objects. However, in many important surveillance applications, saliency is defined by the appearance of a new non-ephemeral object that is between the foreground and background. This midground realm is defined by a temporal window following the object's appearance; but it also depends on adaptive background modeling to
Brian Valentine; Senyo Apewokin; Linda M. Wills; D. Scott Wills; Antonio Gentile
A new fuzzy moving object segmentation algorithm for video sequences is presented in this paper. Our proposed efficient object segmentation algorithm consists of three steps, namely the spatial segmentation step, the temporal tracking step, and the step for identifying the moving object from the frame in a fuzzy way. In particular, our proposed algorithm can robustly distinguish the foreground
Kuo-liang Chung; Shih-wei Yu; Hsueh-ju Yeh; Yong-huai Huang; Ta-jen Yao
This paper describes a system that produces an object-based representation of video shots composed of a background (still) mosaic and moving objects. Segmentation of moving objects is based on ego-motion compensation and on background modeling using tools from robust statistics. Region matching is carried out by an algorithm that operates on the Mahalanobis distance between region
A new method to assess the presence and the strength of the blurring artifact in video frames, without using a reference ideal image, is presented. The estimation is performed first through a global and simple measure over the whole picture, then through a finer, local analysis of the sharpness of the objects' borders. The subjective relevance of the scene content
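The global step of such a no-reference blur measure can be sketched with a gradient-energy score; the specific formula is an illustrative assumption (note that a plain absolute-gradient sum would not separate the two cases in the usage below, since a box blur preserves total variation across a monotone edge):

```python
def sharpness(img):
    # Mean squared horizontal gradient over a grey-level frame.
    # Low-pass blurring spreads a step edge over several pixels: the
    # total intensity change stays the same, but the squared-gradient
    # energy drops, so lower values indicate a blurrier frame.
    h, w = len(img), len(img[0])
    total = sum((img[y][x + 1] - img[y][x]) ** 2
                for y in range(h) for x in range(w - 1))
    return total / (h * (w - 1))
```

For example, a sharp step edge `[0, 0, 100, 100]` scores higher than its blurred counterpart `[0, 33, 67, 100]`, even though both rise by 100 in total.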
Classification of moving objects in imaging through long-distance atmospheric path may be affected by distortions such as blur and spatiotemporal movements caused by air turbulence. This work aims to study and quantify the effects of these distortions on the ability to classify moving objects in atmospherically degraded video signals. For this purpose, we perform simulations and examine real long-range thermal video cases. In the simulation, we evaluate various geometrical (shape-based) object features for classification at different distortion levels. Furthermore, we examine the influence of image restoration on the classification performances in the real-degraded videos, using geometrical and textural features (combined and in separate) of the objects. Principal component analysis together with both k-nearest neighbor and support vector machines is used for the classification process. Results show how classification performances decrease as the level of blur increases, and how successful digital image restoration for real cases can significantly improve the classification performances.
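The k-nearest-neighbour stage of such a classifier can be sketched on toy shape features; the feature choices and values below are invented for illustration, and the PCA step is omitted:

```python
def knn_classify(train, query, k=3):
    # train: list of (feature_vector, label) pairs; query: feature_vector.
    # Majority vote among the k nearest neighbours under squared
    # Euclidean distance.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(vec, query)), label)
        for vec, label in train
    )
    votes = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

In practice the features would be normalised (or projected by PCA) first, since raw shape features such as area and aspect ratio live on very different scales.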
In this paper, a classification method for four types of moving objects, namely vehicles, humans, motorcycles and bicycles, in surveillance video is presented using a machine learning approach. The method can be described in three steps: feature selection, training of a Support Vector Machine (SVM) classifier, and performance evaluation. Firstly, a feature vector to represent the discriminability of an object is described. From the
This paper presents an automatic algorithm for segmenting and extracting moving objects suitable for indoor and outdoor video applications, where the background scene can be captured beforehand. Since edge detection is often used to extract accurate boundaries of the image's objects, the first step in our algorithm is accomplished by combining two edge maps that are detected from the frames
In this paper, we present an efficient technique for unsupervised semantically meaningful object segmentation of stereoscopic video sequences. Using this technique we extract semantic objects using the additional information a stereoscopic pair of frames provides. Each pair is analyzed and the disparity field, occluded areas and depth map are estimated. The key algorithm, which is applied on the stereo pair
Anastasios D. Doulamis; Nikolaos D. Doulamis; Klimis S. Ntalianis; Stefanos D. Kollias
Natural consonant vowel syllables are reliably classified by most listeners as voiced or voiceless. However, our previous research (Liederman et al., 2005) suggests that among synthetic stimuli varying systematically in voice onset time (VOT), syllables that are classified reliably as voiceless are nonetheless perceived differently within and between listeners. This perceptual ambiguity was measured by variation in the accuracy of matching two identical stimuli presented in rapid succession. In the current experiment, we used magnetoencephalography (MEG) to examine the differential contribution of objective (i.e., VOT) and subjective (i.e., perceptual ambiguity) acoustic features on speech processing. Distributed source models estimated cortical activation within two regions of interest in the superior temporal gyrus (STG) and one in the inferior frontal gyrus. These regions were differentially modulated by VOT and perceptual ambiguity. Ambiguity strongly influenced lateralization of activation; however, the influence on lateralization was different in the anterior and middle/posterior portions of the STG. The influence of ambiguity on the relative amplitude of activity in the right and left anterior STG activity depended on VOT, whereas that of middle/posterior portions of the STG did not. These data support the idea that early cortical responses are bilaterally distributed whereas late processes are lateralized to the dominant hemisphere and support a “how/what” dual-stream auditory model. This study helps to clarify the role of the anterior STG, especially in the right hemisphere, in syllable perception. Moreover, our results demonstrate that both objective phonological and subjective perceptual characteristics of syllables independently modulate spatiotemporal patterns of cortical activation.
Frye, Richard E.; Fisher, Janet McGraw; Witzel, Thomas; Ahlfors, Seppo P.; Swank, Paul; Liederman, Jacqueline; Halgren, Eric
Natural consonant-vowel syllables are reliably classified by most listeners as voiced or voiceless. However, our previous research [Liederman, J., Frye, R., Fisher, J.M., Greenwood, K., Alexander, R., 2005. A temporally dynamic context effect that disrupts voice onset time discrimination of rapidly successive stimuli. Psychon Bull Rev. 12, 380-386] suggests that among synthetic stimuli varying systematically in voice onset time (VOT), syllables that are classified reliably as voiceless are nonetheless perceived differently within and between listeners. This perceptual ambiguity was measured by variation in the accuracy of matching two identical stimuli presented in rapid succession. In the current experiment, we used magnetoencephalography (MEG) to examine the differential contribution of objective (i.e., VOT) and subjective (i.e., perceptual ambiguity) acoustic features on speech processing. Distributed source models estimated cortical activation within two regions of interest in the superior temporal gyrus (STG) and one in the inferior frontal gyrus. These regions were differentially modulated by VOT and perceptual ambiguity. Ambiguity strongly influenced lateralization of activation; however, the influence on lateralization was different in the anterior and middle/posterior portions of the STG. The influence of ambiguity on the relative amplitude of activity in the right and left anterior STG activity depended on VOT, whereas that of middle/posterior portions of the STG did not. These data support the idea that early cortical responses are bilaterally distributed whereas late processes are lateralized to the dominant hemisphere and support a "how/what" dual-stream auditory model. This study helps to clarify the role of the anterior STG, especially in the right hemisphere, in syllable perception. 
Moreover, our results demonstrate that both objective phonological and subjective perceptual characteristics of syllables independently modulate spatiotemporal patterns of cortical activation. PMID:18356082
Frye, Richard E; Fisher, Janet McGraw; Witzel, Thomas; Ahlfors, Seppo P; Swank, Paul; Liederman, Jacqueline; Halgren, Eric
Perception and encoding of object size is an important feature of sensory systems. In the visual system object size is encoded by the visual angle (visual aperture) on the retina, but the aperture depends on the distance of the object. As object distance is not unambiguously encoded in the visual system, higher computational mechanisms are needed. This phenomenon is termed "size constancy". It is assumed to reflect an automatic re-scaling of visual aperture with perceived object distance. Recently, it was found that in echolocating bats, the 'sonar aperture', i.e., the range of angles from which sound is reflected from an object back to the bat, is unambiguously perceived and neurally encoded. Moreover, it is well known that object distance is accurately perceived and explicitly encoded in bat sonar. Here, we addressed size constancy in bat biosonar, recruiting virtual-object techniques. Bats of the species Phyllostomus discolor learned to discriminate two simple virtual objects that only differed in sonar aperture. Upon successful discrimination, test trials were randomly interspersed using virtual objects that differed in both aperture and distance. It was tested whether the bats spontaneously assigned absolute width information to these objects by combining distance and aperture. The results showed that while the isolated perceptual cues encoding object width, aperture, and distance were all perceptually well resolved by the bats, the animals did not assign absolute width information to the test objects. This lack of sonar size constancy may result from the bats relying on different modalities to extract size information at different distances. Alternatively, it is conceivable that familiarity with a behaviorally relevant, conspicuous object is required for sonar size constancy, as it has been argued for visual size constancy. Based on the current data, it appears that size constancy is not necessarily an essential feature of sonar perception in bats. 
PMID:23630598
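The aperture geometry at the heart of size constancy is the same for vision and sonar; a small sketch showing how angular aperture confounds width and distance (the function name is assumed):

```python
from math import atan, degrees

def aperture_deg(width, distance):
    # Angular aperture subtended by an object of physical `width` seen
    # (or ensonified) from `distance`. The same angle can arise from a
    # small near object or a large far one, which is why aperture alone
    # does not determine absolute size.
    return degrees(2 * atan(width / (2 * distance)))
```

Doubling both width and distance leaves the aperture unchanged; recovering absolute width therefore requires combining the aperture with an independent distance estimate, which is exactly the re-scaling that the bats in the study did not perform.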
In this paper, we present a multiview approach to segment the foreground objects consisting of a group of people into individual human objects and track them across the video sequence. Depth and occlusion information recovered from multiple views of the scene is integrated into the object detection, segmentation, and tracking processes. Adaptive background penalty with occlusion reasoning is proposed to separate the foreground regions from the background in the initial frame. Multiple cues are employed to segment individual human objects from the group. To propagate the segmentation through the video, each object region is independently tracked by motion compensation and uncertainty refinement, and motion occlusion is tackled as layer transition. The experimental results, obtained on both our own sequences and others' sequences, have demonstrated the algorithm's efficiency in terms of subjective performance. Objective comparison with a state-of-the-art algorithm validates the superior performance of our method quantitatively. PMID:21659028
We introduce an approach for learning human actions as interactions between persons and objects in realistic videos. Previous work typically represents actions with low-level features such as image gradients or optical flow. In contrast, we explicitly localize in space and track over time both the object and the person, and represent an action as the trajectory of the object w.r.t. to the person position. Our approach relies on state-of-the-art techniques for human detection, object detection, and tracking. We show that this results in human and object tracks of sufficient quality to model and localize human-object interactions in realistic videos. Our human-object interaction features capture the relative trajectory of the object w.r.t. the human. Experimental results on the Coffee and Cigarettes dataset, the video dataset of, and the Rochester Daily Activities dataset show that 1) our explicit human-object model is an informative cue for action recognition; 2) it is complementary to traditional low-level descriptors such as 3D--HOG extracted over human tracks. We show that combining our human-object interaction features with 3D-HOG improves compared to their individual performance as well as over the state of the art. PMID:22889819
We present the Video Graph-Shifts (VGS) approach for efficiently incorporating temporal consistency into MRF energy minimization for multi-class video object segmentation. In contrast to previous methods, our dynamic temporal links avoid the computational overhead of using a fully connected spatiotemporal MRF, while still being able to deal with the uncertainties of the exact inter-frame pixel correspondence issues. The dynamic temporal
Touch-based displays (devices) have enabled rich interactions between videos and users. The objects appearing in videos often make users want to know more about them. In this paper, we propose a video playback system that lets users interactively query objects of interest in videos. Since the text information accompanying videos might not be strongly related
This paper presents a novel background modeling and subtraction approach for video object segmentation. A neural network (NN) architecture is proposed to form an unsupervised Bayesian classifier for this application domain. The constructed classifier efficiently handles the segmentation in natural-scene sequences with complex background motion and changes in illumination. The weights of the proposed NN serve as a model of
Dubravko Culibrk; Oge Marques; Daniel Socek; Hari Kalva; Borko Furht
Nowadays, the 3D video system using the MVD (multi-view video plus depth) data format is being actively studied. The system has many advantages with respect to virtual view synthesis such as an auto-stereoscopic functionality, but compression of huge input data remains a problem. Therefore, efficient 3D data compression is extremely important in the system, and problems of low temporal consistency and viewpoint correlation should be resolved for efficient depth video coding. In this paper, we propose an object-adaptive depth-compensated inter prediction method to resolve these problems, in which an object-adaptive mean-depth difference between a current block, to be coded, and a reference block is compensated during inter prediction. In addition, unique properties of depth video are exploited to reduce the side information required for signaling the decoder to conduct the same process. To evaluate the coding performance, we have implemented the proposed method in the MVC (multiview video coding) reference software, JMVC 8.2. Experimental results have demonstrated that our proposed method is especially efficient for depth videos estimated by DERS (depth estimation reference software) discussed in the MPEG 3DV coding group. The coding gain was up to 11.69% bit-saving, and it increased further when we evaluated it on synthesized views of virtual viewpoints.
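The mean-depth compensation step described in this record can be sketched as follows. The block layout and function names are illustrative assumptions, not the JMVC 8.2 implementation:

```python
# Sketch of object-adaptive mean-depth-difference compensation during
# inter prediction. Blocks are lists of rows; names are assumptions.

def mean_depth(block):
    """Average depth value over a block given as a list of rows."""
    vals = [v for row in block for v in row]
    return sum(vals) / len(vals)

def compensated_prediction(ref_block, cur_block):
    """Shift the reference block by the mean-depth difference between
    the current block and the reference block before prediction."""
    offset = mean_depth(cur_block) - mean_depth(ref_block)
    return [[v + offset for v in row] for row in ref_block]

ref = [[10, 10], [10, 10]]
cur = [[14, 15], [13, 14]]          # mean depth 14, offset +4
pred = compensated_prediction(ref, cur)
```

The residual after this compensation is smaller than a plain inter residual whenever the depth of an object shifts uniformly between views or frames, which is the situation the record targets.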
A novel motion-based object-oriented codec for video transmission at very low bit-rates is proposed. The object motion is modeled by quadratic transform with coefficients estimated via a nonlinear quasi-Newton method. The segmentation problem is put forward as a constrained optimization problem which interacts with the motion estimation process in the course of region growing. A context-based shape coding method which
When the object is similar to the background, it is difficult to segment the complete human body object from video images. To solve this problem, this paper proposes an object segmentation algorithm based on guided layering from video images. The algorithm adopts a layered, coarse-to-fine structure comprising three parts. Each part constructs a different energy function in terms of the spatiotemporal information to maximize the posterior probability of the segmentation label. In part one, energy functions are established with the frame-difference information in the first and second layers; by optimization, the initial segmentation is solved in the first layer, and the amended segmentation is then obtained in the second layer. In part two, the energy function is built between frames, with the shape feature as a prior guide, to eliminate interframe differences in the segmentation result. In part three, the segmentation results of the previous two parts are fused to suppress over-repaired segmentation and object shape variations between the two adjacent frames. Comparative experiments indicate that this algorithm can obtain the complete human body object even when the video image shows strong similarity between object and background.
This paper describes a framework for investigating and manipulating the attentional components of video game play in order to affect learning transfer across different task environments. Several groups of video game players (VGP) and non-video game players, both hockey and non-hockey groups (NVGPH, NVGP), will be tested at baseline on several aspects of visual processing skill. The NVGP
Desmond E. Mulligan; Michael W. Dobson; Janet Mccracken
This paper presents a novel algorithm for unsupervised discovery of objects of interest in multi-view video sequences. We classify a multi-view video sequence based on the degree of movement in the sequence. In a video sequence with movement, we first group video frames along and across views into groups of pictures (GOPs). Key points or feature vectors representing textures
We propose a novel method for removing irrelevant frames from a video given user-provided frame-level labeling for a very small number of frames. We first hypothesize a number of windows which possibly contain the object of interest, and then determine which window(s) truly contain the object of interest. Our method enjoys several favorable properties. First, compared to approaches where a single descriptor is used to describe a whole frame, each window's feature descriptor has the chance of genuinely describing the object of interest; hence it is less affected by background clutter. Second, by considering the temporal continuity of a video instead of treating frames as independent, we can hypothesize the location of the windows more accurately. Third, by infusing prior knowledge into the patch-level model, we can precisely follow the trajectory of the object of interest. This allows us to largely reduce the number of windows and hence reduce the chance of overfitting the data during learning. We demonstrate the effectiveness of the method by comparing it to several other semi-supervised learning approaches on challenging video clips. PMID:20975116
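The temporal-continuity property described here, hypothesizing candidate windows only near the object's previous location, can be sketched as follows. The center-point abstraction and all names are assumptions for illustration, not the authors' implementation:

```python
def prune_windows(prev_center, candidates, max_shift):
    """Keep only candidate window centers within max_shift pixels of the
    object's location in the previous frame (temporal continuity).
    Reducing the window set this way lowers the risk of overfitting."""
    px, py = prev_center
    return [(x, y) for (x, y) in candidates
            if abs(x - px) <= max_shift and abs(y - py) <= max_shift]

# The distant candidate (300, 40) is discarded; the two nearby survive.
kept = prune_windows((100, 80), [(104, 82), (300, 40), (95, 77)], max_shift=10)
```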
In this paper, we describe COGNIVA, a closed-loop Cognitive-Neural method and system for image and video analysis that combines recent technological breakthroughs in bio-vision cognitive algorithms and neural signatures of human visual processing. COGNIVA is an "operational neuroscience" framework for intelligent and rapid search and categorization of Items Of Interest (IOI) in imagery and video. The IOI could be a single object, a group of objects, specific image regions, a specific spatio-temporal pattern/sequence, or even the category that the image itself belongs to (e.g., vehicle or non-vehicle). There are two main approaches to rapid search and categorization of IOI in imagery and video. The first uses conventional machine vision or bio-inspired cognitive algorithms; these usually need a predefined set of IOI and suffer from high false-alarm rates. The second class of algorithms is based on neural signatures of target detection; these algorithms usually break the entire image into sub-images, process EEG data recorded while the sub-images are presented, and classify them accordingly. This approach may also suffer from high false alarms and is slow because the entire image is chipped and presented to the human observer. The proposed COGNIVA overcomes the limitations of both methods by combining them, resulting in a low false-alarm rate and high detection at high throughput, making it applicable to both image and video analysis. In its most basic form, COGNIVA first uses bio-inspired cognitive algorithms to decide potential IOI in a sequence of images or video. These potential IOI are then shown to a human observer, and neural signatures of visual detection of IOI are collected and processed. The resulting signatures are used to categorize and provide the final IOI. We present the concept and typical results of COGNIVA for detecting Items Of Interest in image data.
Khosla, Deepak; Huber, David J.; Bhattacharyya, Rajan; Daily, Mike; Tasinga, Penn
Atmospheric scattering causes significant degradation in the quality of video images, particularly when imaging over long distances. The principal problem is the reduction in contrast due to scattered light. It is known that when the scattering particles are not too large compared with the imaging wavelength (i.e. Mie scattering), high spatial resolution information may be contained within a low-contrast image. Unfortunately this information is not easily perceived by a human observer, particularly when using a standard video monitor. A secondary problem is the difficulty of achieving a sharp focus, since automatic focus techniques tend to fail in such conditions. Recently several commercial colour video processing systems have become available. These systems use various techniques to improve image quality in low-contrast conditions whilst retaining colour content. They produce improvements in subjective image quality in some situations, particularly in conditions of haze and light fog. There is also some evidence that video enhancement leads to improved ATR performance when used as a pre-processing stage. Psychological literature indicates that low contrast levels generally lead to a reduction in the performance of human observers in carrying out simple visual tasks. The aim of this paper is to present the results of an empirical study on object recognition in adverse viewing conditions. The chosen visual task was vehicle number plate recognition at long ranges (500 m and beyond). Two different commercial video enhancement systems are evaluated using the same protocol. The results show an increase in effective range, with some differences between the different enhancement systems.
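The contrast reduction described in this record is commonly countered by remapping the narrow recorded intensity range onto the full display range. A minimal linear-stretch sketch, not the algorithm of either commercial system evaluated:

```python
def stretch_contrast(pixels, out_min=0, out_max=255):
    """Linearly remap the observed intensity range [lo, hi] onto the
    full display range, recovering contrast lost to scattered light."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:                       # flat image: nothing to stretch
        return [out_min] * len(pixels)
    scale = (out_max - out_min) / (hi - lo)
    return [round(out_min + (p - lo) * scale) for p in pixels]

# A hazy patch spanning only 100..120 is spread over 0..255.
enhanced = stretch_contrast([100, 110, 120])
```

Real enhancement systems typically operate locally and per colour channel; this global sketch only illustrates why detail hidden in a compressed intensity range becomes perceptible after remapping.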
Recent studies suggest that visual object recognition is a proactive process through which perceptual evidence accumulates over time before a decision can be made about the object. However, the exact electrophysiological correlates and time-course of this complex process remain unclear. In addition, the potential influence of emotion on this process has not been investigated yet. We recorded high density EEG
Antonio Schettino; Tom Loeys; Sylvain Delplanque; Gilles Pourtois
This paper presents a new approach to combining real video and synthetic objects. The purpose of this work is to use the proposed technology in fields such as advanced animation, virtual reality, and games. Computer graphics has long been used in these fields. Recently, some applications have added real video to graphic scenes to augment the realism that computer graphics lacks. This approach, called augmented or mixed reality, can produce a more realistic environment than the use of computer graphics alone. Our approach differs from virtual reality and augmented reality in that computer-generated graphic objects are combined with a 3D structure extracted from monocular image sequences. The extraction of the 3D structure requires the estimation of 3D depth followed by the construction of a height map; graphic objects are then combined with the height map. The realization of our proposed approach is carried out in the following steps: (1) we derive the 3D structure from test image sequences; due to the contents of the test sequence, the height map represents the 3D structure. (2) The height map is modeled by Delaunay triangulation or a Bezier surface, and each planar surface is texture-mapped. (3) Finally, graphic objects are combined with the height map. Because the 3D structure of the height map is already known, step (3) is easily carried out. Following this procedure, we produced an animation video demonstrating the combination of the 3D structure and graphic models. Users can navigate the realistic 3D world whose associated image is rendered on the display monitor.
The ability to ignore task-irrelevant information and overcome distraction is central to our ability to efficiently carry out a number of tasks. One factor shown to strongly influence distraction is the perceptual load of the task being performed; as the perceptual load of task-relevant information processing increases, the likelihood that…
For videos transmitted in an error-prone network, it is necessary to protect the source bitstream. Based on our packet loss visibility model, we minimize the end-to-end video quality degradation when transmitted in an AWGN channel using rate-compatible punctured convolutional codes for a given channel rate budget. We transform the original problem into a binary-decision problem, then we solve this integer
In this paper, a novel image segmentation and a robust unsupervised video object tracking algorithm are proposed. The proposed method is able to track complete object regions in a sequence of video frames. In this work, object tracking is achieved by analysing the movement of contours frame by frame in the video stream. The proposed algorithm involves
A layered video object coding system is presented in this paper. The goal is to improve video coding efficiency by exploiting the layering of video and to support content-based functionality. These two objectives are accomplished using a sprite technique and an affine motion model on a per-object basis. Several novel algorithms have been developed for mask processing and coding, trajectory
Ming-Chieh Lee; Wei-Ge Chen; Chih-lung Bruce Lin; Chuang Gu; Tomislav Markoc; Steven I. Zabinsky; Richard Szeliski
This paper presents a novel background modeling and subtraction approach for video object segmentation. A neural network (NN) architecture is proposed to form an unsupervised Bayesian classifier for this application domain. The constructed classifier efficiently handles the segmentation in natural-scene sequences with complex background motion and changes in illumination. The weights of the proposed NN serve as a model of the background and are temporally updated to reflect the observed statistics of the background. The segmentation performance of the proposed NN is qualitatively and quantitatively examined and compared to two extant probabilistic object segmentation algorithms, based on a previously published test pool containing diverse surveillance-related sequences. The proposed algorithm is parallelized on a subpixel level and designed to enable efficient hardware implementation. PMID:18051181
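As a much-simplified stand-in for the NN-based Bayesian classifier in this record, a running-average background model with thresholded subtraction illustrates the basic modeling-and-subtraction loop (all names and parameter values are assumptions, not the authors' method):

```python
def update_background(bg, frame, alpha=0.1):
    """Exponential running average of per-pixel intensities; alpha
    controls how quickly the model adapts to background changes."""
    return [b + alpha * (f - b) for b, f in zip(bg, frame)]

def foreground_mask(bg, frame, thresh=30):
    """Pixels deviating from the background model by more than thresh
    are classified as foreground (moving object)."""
    return [abs(f - b) > thresh for b, f in zip(bg, frame)]

bg = [50.0, 50.0, 50.0]
frame = [52.0, 200.0, 49.0]          # only the middle pixel changed much
mask = foreground_mask(bg, frame)
bg = update_background(bg, frame)    # model drifts toward the new frame
```

The NN approach in the record replaces both the fixed threshold and the single running mean with learned per-pixel statistics, which is what lets it cope with complex background motion and illumination change.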
The extent of visual perceptual processing that occurs in the absence of awareness is as yet unclear. Here we examined event-related-potential (ERP) indices of visual and cognitive processes as awareness was manipulated through object-substitution masking (OSM), an awareness-disrupting effect that has been hypothesized to result from the disruption of reentrant signaling to low-level visual cortical areas. In OSM, a visual stimulus array is briefly presented that includes a parafoveal visual target denoted by a cue, typically consisting of several surrounding dots. When the offset of the target-surrounding cue dots is delayed relative to the rest of the array, a striking reduction in the perception of the target image surrounded by the dots is observed. Using faces and houses as the target stimuli, we found that successful OSM reduced or eliminated all the measured electrophysiological indices of visual processing stages after 130ms post-stimulus. More specifically, when targets were missed within the masked condition (i.e., on trials with effective OSM that disrupted awareness), we observed fully intact early feed-forward processing up through the visual extrastriate P1 ERP component peaking at 100ms, followed by reduced low-level activity over the occipital pole 130-170ms post-stimulus, reduced ERP indices of lateralized shifts of attention toward the parafoveal target, reduced object-generic visual processing, abolished object-category-specific (face-specific) processing, and reduced late visual short-term-memory processing activity. The results provide a comprehensive electrophysiological account of the neurocognitive underpinnings of effective OSM of visual-object images, including evidence for central roles of early reentrant signal disruption and insufficient visual attentional deployment. PMID:23751171
The Rapid Serial Visual Presentation (RSVP) protocol for EEG has recently been found to be a useful tool for high-throughput filtering of images into simple target and non-target categories. This concept can be extended to the detection of objects and anomalies in images and videos that are of interest to the user (observer) in an application-specific context. For example, an image analyst looking for a moving vehicle in wide-area imagery will consider such an object to be a target or Item Of Interest (IOI). The ordering of images in the RSVP sequence is expected to have an impact on the detection accuracy. In this paper, we describe an algorithm for learning the RSVP ordering that employs a user interaction step to maximize the detection accuracy while simultaneously minimizing false alarms. With user feedback, the algorithm learns the optimal balance of image distance metrics in order to closely emulate the human's own preference for image order. It then employs the fusion of various perceptual and bio-inspired image metrics to emulate the human's sequencing ability for groups of image chips, which are subsequently used in RSVP trials. Such a method can be employed in human-assisted threat assessment, in which the system must scan a wide field of view and report any detections or anomalies in the landscape. In these instances, automated classification methods might fail. We describe the algorithm and present preliminary results on real-world imagery.
Khosla, Deepak; Bhattacharyya, Rajan; Tasinga, Penn; Huber, David J.
In error-prone channels, forward error correction is necessary for protecting important data. In this paper, we use a packet loss visibility model to evaluate the visual importance of video packets to be transmitted. With the loss visibility of each packet, we use the Branch and Bound method to optimally allocate rates of Rate-Compatible Punctured Convolutional codes. The complexity of our
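The allocation problem in this record can be illustrated on a toy instance. Exhaustive search stands in for the Branch and Bound method, and the cost/residual-loss numbers are invented for the example:

```python
from itertools import product

def allocate_fec(visibility, options, budget):
    """Choose one protection option per packet to minimize expected
    visible distortion (visibility x residual loss probability) under a
    total cost budget. options: list of (cost, residual_loss_prob).
    Exhaustive search; Branch and Bound would prune this same space."""
    best_dist, best_choice = float('inf'), None
    for choice in product(range(len(options)), repeat=len(visibility)):
        cost = sum(options[i][0] for i in choice)
        if cost > budget:
            continue
        dist = sum(v * options[i][1] for v, i in zip(visibility, choice))
        if dist < best_dist:
            best_dist, best_choice = dist, choice
    return best_choice, best_dist

# Two packets: the first is highly visible when lost. Strong protection
# (cost 2, residual loss 0.1) goes to it; the cheap code to the other.
choice, dist = allocate_fec([0.9, 0.1], [(1, 0.5), (2, 0.1)], budget=3)
```

The key idea from the record survives the simplification: bits spent on channel coding follow the loss *visibility* of each packet, not just its size.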
The popularity of digital images and video is increasing rapidly. To help users navigate libraries of images and video, Content-Based Information Retrieval (CBIR) systems that can automatically index image and video documents are needed. However, due to the semantic gap between low-level machine descriptors and high-level semantic descriptors, the existing CBIR systems are still far from perfect. Text embedded
Although the veridicality of unconscious perception is increasingly accepted, core issues remain unresolved [Jack, A., & Shallice, T. (2001). Introspective physicalism as an approach to the science of consciousness. Cognition, 79, 161-196], and sharp disagreement persists regarding fundamental methodological and theoretical issues. The most critical problem is simple but tenacious: namely, how to definitively rule out weak conscious perception as an alternative explanation for putatively unconscious effects. Using a direct task and objectively undetectable stimuli, the current experiments demonstrate clearly reliable unconscious perceptual effects, which differ qualitatively from weakly conscious effects in fundamental ways. Most importantly, the current effects correlate negatively with stimulus detectability, directly rebutting the exhaustiveness, null sensitivity, and exclusiveness problems [Reingold, E., & Merikle, P. (1988). Using direct and indirect measures to study perception without awareness. Perception & Psychophysics, 44, 563-575; Reingold, E., & Merikle, P. (1990). On the inter-relatedness of theory and measurement in the study of unconscious processes. Mind and Language, 5, 9-28], which all predict positive correlations. Moreover, the current effects are entirely bidirectional [Katz (2001). Bidirectional experimental effects. Psychological Methods, 6, 270-281] and radically uncontrollable, including below-chance performance despite intentions to facilitate. In contrast, weakly conscious effects on direct measures are unidirectional, facilitative, and potentially controllable. Moreover, these qualitative differences also suggest that objective and subjective threshold phenomena are fundamentally distinct, rather than the former simply being a weaker version of the latter [Merikle, P., Smilek, D., & Eastwood, J. (2001). Perception without awareness: Perspectives from cognitive psychology. Cognition, 79, 115-134].
Accordingly, it is important to distinguish between rather than conflate these methods. Further, the current effects reinforce recent work [e.g. Naccache, L., Blandin, E., & Dehaene, S. (2002). Unconscious masked priming depends on temporal attention. Psychological Science, 13, 416-424] demonstrating that unconscious effects, although not selectively controllable, are nonetheless mediated by strategic and individual difference factors, rather than being immune to such influences as long thought. PMID:16289068
SIPHER was first revealed in a US Air Force Research Laboratory Information Directorate (AFRL/RIEC) project concerned with polarimetric and SAR processing techniques. It is a means to make objects in a digital image vary in intensity (amplitude) with respect to other objects or backgrounds, in an unusual manner which promotes object or target cognitive perception. We describe this phenomenon as objects being in or out of spatial intensity phase with one another, somewhat analogous to how different signals' amplitudes differ at any instant due to their relative phases. Simple surface reflectivity and a single, static illumination source provide no special means to distinguish objects from backgrounds, other than their reflectivity differences. However, if different surfaces are illuminated from different source positions or with different amplitudes, as from a moving spotlight, different pixels with the same reflectivity may have different amplitudes at different instants within the source's dynamic behavior. The problem is that we cannot necessarily control source dynamics or collect images over sufficient time to benefit from these dynamics. SIPHER simulates source dynamics in a single, static image. It creates apparent reflectivity changes in an image taken at one instant, as if the illumination source's intensity and position were changing, as a function of algorithm threshold settings. This produces a series of processed images wherein object and background pixel amplitudes are out of phase with one another due to their orientation and surface characteristics (flat, curved, etc.), and become more perceptible. Cognitive perception is enhanced by creating a video sequence of the processed image series. This produces an apparent motion effect in the object relative to its surroundings, or renders an apparent three-dimensional effect where the object appears to "jump out" from its surroundings.
We first define this spatial intensity phase quantity mathematically, then compare it to conventional signal phase relationships, and finally apply it to some images to demonstrate its behavior. We also discuss anticipated enhancement and normalization techniques which may improve the technique in the future.
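A rough sketch of the thresholding idea, not the AFRL SIPHER algorithm: amplifying pixels above a sweep of thresholds yields a frame series in which object and background amplitudes move in and out of phase with one another:

```python
def threshold_series(image, thresholds, gain=1.5):
    """For each threshold setting, amplify pixels at or above it,
    simulating a varying illumination source in a single static image.
    Playing the frames back as video produces the apparent-motion
    effect described in the record. Gain and clipping are assumptions."""
    frames = []
    for t in thresholds:
        frames.append([min(255, round(p * gain)) if p >= t else p
                       for p in image])
    return frames

# Sweeping the threshold makes the bright (object) pixel pulse while
# the dim (background) pixel stays fixed.
frames = threshold_series([10, 100], [50])
```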
The monitoring of human physiological data, in both normal and abnormal situations of activity, is interesting for the purpose of emergency event detection, especially in the case of elderly people living on their own. Several techniques have been proposed for identifying such distress situations using either motion, audio or video data from the monitored subject and the surrounding environment. This
In this paper, just noticeable distortion (JND) profile based upon the human visual system (HVS) has been exploited to guide the motion search and introduce an adaptive filter for residue error after motion compensation, in hybrid video coding (e.g., H.26x and MPEG-x). Because of the importance of accurate JND estimation, a new spatial-domain JND estimator (the nonlinear additivity model for
Xiaokang Yang; Weisi Lin; Zhongkang Lu; Ee Ping Ong; Susu Yao
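The residual-filtering step guided by a JND profile, as described in the record above, can be sketched as follows. The per-pixel threshold values are assumed inputs from whatever JND estimator is in use; this is not the authors' nonlinear additivity model:

```python
def jnd_filter(residual, jnd):
    """Zero out motion-compensation residues whose magnitude falls
    below the per-pixel just-noticeable-distortion threshold; such
    errors are presumed invisible and need not spend any bits."""
    return [0 if abs(r) < t else r for r, t in zip(residual, jnd)]

# With a uniform JND threshold of 4, the small residues vanish while
# the perceptible ones are kept for coding.
filtered = jnd_filter([2, -8, 5, -1], [4, 4, 4, 4])
```

The saving comes from the entropy coder: runs of zeroed residues cost almost nothing, while visual quality is preserved because every discarded error was below the visibility threshold by construction.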
A particularly unpleasant version of motion aftereffect was revealed after extensively playing proprietary video games in which the task is to co-ordinate spatially distributed responses in time with music. During playing, key musical and rhythmic phrases descend as coloured shapes from the top of the screen. After playing, static text is presented that appears to slide upwards, reflecting a neural reaction contrary to the falling shapes. The game both serves as a contemporary example of motion aftereffect and also highlights certain cross-modal associations between space, time, and sound in the design of stimulus-response relations. PMID:20301853
Two experiments investigated the contribution of space- and object-based coordinates to previously reported leftward perceptual biases (pseudoneglect) at various locations across visual space. Neurologically intact participants (n=34 and 27) made luminance discriminations between two left/right mirror-reversed luminance gradients (greyscales task), which were variously displaced around the midline in the participants’ left and right hemispaces. The orientations of the stimuli were
The present study aimed to investigate whether perceptual completion is available at birth, in the absence of any visual experience. An extremely underspecified kinetic visual display composed of four spatially separated fragments arranged to give rise to an illusory rectangle that occluded a vertical rod (illusory condition) or rotated so as not…
The cortical mechanisms of perceptual segregation of concurrent sound sources were examined, based on binaural detection of interaural timing differences. Auditory event-related potentials were measured from 11 healthy subjects. Binaural stimuli were created by introducing a dichotic delay of 500-ms duration to a narrow frequency region within a broadband noise, and resulted in a perception of a centrally located noise
With the gradual penetration of network video monitoring systems in China, driven by "green" construction projects, video surveillance plays an increasingly prominent role in maintaining social security and fighting crime. Faced with a deluge of live and recorded video content, traditional monitoring methods that rely solely on manual observation and discrimination can no longer meet the application
With high-definition television becoming the mainstream of technology development, people demand ever higher video quality. This paper analyses HD video quality based on distortion caused by compression, then proposes a method of feature extraction based on H.264. The metric shows good performance compared with MOS and can be used in real-time evaluation.
With the increasing demand for video-based applications, the reliable prediction of video quality has increased in importance. Numerous video quality assessment methods and metrics have been proposed over the past years with varying computational complexity and accuracy. In this paper, we introduce a classification scheme for full-reference and reduced-reference media-layer objective video quality assessment methods. Our classification scheme
Shyamprasad Chikkerur; Vijay Sundaram; Martin Reisslein; Lina J. Karam
We present metrics to evaluate the performance of video object segmentation and tracking methods quantitatively when ground-truth segmentation maps are not available. The proposed metrics are based on the color and motion differences along the boundary of the estimated video object plane and the color histogram differences between the current object plane and its temporal neighbors. These metrics
Çigdem Eroglu Erdem; Biilent Sankur; A. Murat Tekalp
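The color-histogram-difference component of such a ground-truth-free metric might look like the following sketch; the bin count, value range, and normalization are illustrative choices, not the authors' exact formulation:

```python
def histogram(values, bins=4, vmax=256):
    """Normalized intensity histogram of the pixels inside an
    estimated object plane."""
    h = [0] * bins
    w = vmax / bins
    for v in values:
        h[min(int(v // w), bins - 1)] += 1
    total = len(values)
    return [c / total for c in h]

def hist_difference(region_a, region_b):
    """Half the L1 distance between two normalized histograms: 0 for
    identical color statistics, 1 for disjoint ones. A high value
    between an object plane and its temporal neighbor suggests the
    segmentation drifted, without needing ground truth."""
    ha, hb = histogram(region_a), histogram(region_b)
    return 0.5 * sum(abs(x - y) for x, y in zip(ha, hb))

# Completely different intensity content yields the maximum score.
score = hist_difference([0, 0, 0, 0], [255, 255, 255, 255])
```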
This paper proposes an object-level rate control algorithm to jointly control the bit rates of multiple video objects. Utilizing noncooperative game theory, the proposed rate control algorithm mimics the behaviors of players representing video objects. Each player competes for available bits to optimize its visual quality. The algorithm finds an "optimal solution" in that it conforms to the mixed strategy
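As a much-simplified stand-in for the game-theoretic mixed-strategy solution in this record, the bit budget can be split among objects in proportion to each object's demand (demand values and the proportional rule are assumptions for illustration):

```python
def allocate_bits(total_bits, demands):
    """Split a frame's bit budget among video objects in proportion to
    their demand (e.g., complexity or target quality). The competitive
    equilibrium of the record's game would replace this fixed rule."""
    s = sum(demands)
    return [total_bits * d / s for d in demands]

# A foreground object three times as demanding as the background
# receives three quarters of the budget.
shares = allocate_bits(1000, [3, 1])
```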
We address the problem of segmenting highly articulated video objects in a wide variety of poses. The main idea of our approach is to model the prior information of object appearance via random forests. To automatically extract an object from a video sequence, we first build a random forest based on image patches sampled from the initial template. Owing
Boosted by technological advances in the area of communications and computer engineering, there has been an explosion in the amount of the distributed visual information in the last decade. Digital video and the expected fusion of television and internet ...
The MPEG-4 and MPEG-7 visual standards require that each frame of a video sequence be segmented in terms of video object planes (VOPs). This paper presents an image segmentation method for extracting video objects from image sequences. The method is based on a multiresolution application of wavelet and watershed transformations, followed by a wavelet coefficient-based region merging procedure. The
Perceptual learning refers to experience-induced improvements in the pick-up of information. Perceptual constancy describes the fact that, despite variable sensory input, perceptual representations typically correspond to stable properties of objects. Here, we show evidence of a strong link between perceptual learning and perceptual constancy: Perceptual learning depends on constancy-based perceptual representations. Perceptual learning may involve changes in early sensory analyzers, but such changes may in general be constrained by categorical distinctions among the high-level perceptual representations to which they contribute. Using established relations of perceptual constancy and sensory inputs, we tested the ability to discover regularities in tasks that dissociated perceptual and sensory invariants. We found that human subjects could learn to classify based on a perceptual invariant that depended on an underlying sensory invariant but could not learn the identical sensory invariant when it did not correlate with a perceptual invariant. These results suggest that constancy-based representations, known to be important for thought and action, also guide learning and plasticity.
The media industry is undergoing comprehensive change due to the shifting audience and consumption patterns fostered by the diffusion of the Internet. This article describes how these changes shape established practices of video production and redefine the cultural categories of video and broadcasting. Drawing on an empirical case study of the practices within the British Broadcasting Corporation (BBC), we
In order to eliminate the effects of noise, complex motion, and uncovered background in change-detection-based segmentation of video objects, a new method for video object extraction based on the displaced frame difference (DFD) between frames and threshold segmentation is proposed. In this method, the frames are filtered, the differences between two consecutive frames are obtained, and the difference images are then amended by
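The difference-and-threshold step this abstract builds on can be sketched in a few lines; this is a generic illustration with invented names, not the paper's full method (which also filters the frames and amends the difference images):

```python
# Minimal sketch of change detection by frame differencing and
# thresholding; frames are 2D lists of grayscale values.

def frame_difference(prev, curr):
    """Absolute difference between two consecutive frames."""
    return [[abs(c - p) for p, c in zip(pr, cr)]
            for pr, cr in zip(prev, curr)]

def threshold_mask(diff, t):
    """Binary object mask: 1 where the difference exceeds t."""
    return [[1 if v > t else 0 for v in row] for row in diff]

prev = [[10, 10, 10],
        [10, 10, 10]]
curr = [[10, 80, 10],
        [10, 90, 10]]

mask = threshold_mask(frame_difference(prev, curr), 30)
# the two changed pixels are flagged as foreground
```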
Parameters of tracked video objects (for example: the angles of moving objects) are discrete random variables and the amount of data increases over time. In this paper we use a new method to analyze the parameter angle: the video frame is segmented into small sections and in each section the angle values during some time period are gathered. Through analysis
The emerging video coding standard MPEG-4 enables various content-based functionalities for multimedia applications. To support such functionalities, as well as to improve coding efficiency, MPEG-4 relies on a decomposition of each frame of an image sequence into video object planes (VOPs). Each VOP corresponds to a single moving object in the scene. This paper presents a new
End-to-end quality of service for real-time video delivery over ATM networks depends on both video coding and network performance. Historical measures of QoS for data services have involved packet delivery parameters such as cell loss and delay, but these are not directly representative of a user's perception of the delivered video quality. To this end, this paper describes the use of a video quality measurement scheme that employs a human vision based picture quality model to objectively estimate end-user video quality. We present results to demonstrate that the video quality analysis based approach can be used to determine performance characteristics for distribution-quality video coding parameters over ATM networks.
3D video quality is of the highest importance for user adoption of a new technology. In this paper we evaluated the impact of coding artefacts on stereoscopic 3D video quality by making use of several existing full-reference 2D objective metrics. We analyzed the performance of the objective metrics by comparing them with the results of a subjective experiment. The results show that the pixel-based Visual Information Fidelity metric fits the subjective data best. The 2D video quality appears to have the dominant impact on the perceived quality of coding-artefact-impaired stereoscopic videos.
Wang, K.; Brunnström, K.; Barkowsky, Marcus; Urvoy, M.; Sjöström, M.; Le Callet, P.; Tourancheau, S.; Andrén, B.
Text extraction in video documents, as an important research field of content-based information indexing and retrieval, has been developing rapidly since the 1990s. This has led to much progress in text extraction, performance evaluation, and related applications. By reviewing the approaches proposed during the past five years, this paper introduces the progress made in this area and discusses promising directions for
Since video quality fluctuation significantly degrades visual perception in multimedia communication systems, it is important to maintain consistent objective quality over the entire video sequence. We propose a rate control algorithm to keep consistent objective quality in high efficiency video coding (HEVC), which is an upcoming standard video codec. In the proposed algorithm, the probability density function of transformed coefficients is modeled by a Laplacian function that accounts for the quadtree coding unit structure, one of the characteristics of HEVC. In controlling the video quality, distortion-quantization and rate-quantization models are derived from the Laplacian function. Based on those models, a quantization parameter is determined to control the quality of the encoded frames such that the fluctuation of video quality is minimized and buffer overflow and underflow are prevented. The simulation results show that the proposed rate control algorithm outperforms the other conventional schemes. PMID:23481856
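As a rough illustration of the modeling step, a Laplacian scale can be estimated from transformed coefficients and a toy rate model inverted to pick a quantization step. The model form R(q) = c / q and all values here are assumptions for illustration, not the HEVC rate control algorithm itself:

```python
# Toy sketch of quantization-step selection from a Laplacian-based
# rate model; the model form and constants are illustrative.

def laplacian_scale(coeffs):
    """MLE of the Laplacian scale b: the mean absolute value
    of the transformed coefficients."""
    return sum(abs(c) for c in coeffs) / len(coeffs)

def pick_step(target_rate, c):
    """Invert the toy rate model R(q) = c / q for the step size q."""
    return c / target_rate

coeffs = [4.0, -2.0, 1.0, -1.0, 0.5, -0.5, 0.0, 0.0]
b = laplacian_scale(coeffs)          # scale of the fitted Laplacian
q = pick_step(target_rate=0.5, c=b)  # step meeting the rate budget
```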
Motion estimation (ME) consumes most of the computational time in a video coding system, since matching between blocks must be carried out over numerous search points (SPs). Adaptive Search Range (ASR) schemes, one kind of fast ME algorithm, are widely used to reduce the number of SPs. However, all of the so-called ASR schemes are applied on the precondition that there already
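A minimal full-search block matcher makes the cost of the search points concrete; an ASR scheme would shrink the range `rng` adaptively. Names and frame data are illustrative:

```python
# Full-search block matching: evaluate the sum of absolute
# differences (SAD) at every search point in a square range.
# Frames are 2D lists of grayscale values.

def sad(ref, cur, bx, by, dx, dy, n):
    """SAD between an n x n block at (bx, by) in `cur` and the
    block displaced by (dx, dy) in `ref`."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def best_vector(ref, cur, bx, by, n, rng):
    """Exhaustively test all (2*rng + 1)**2 search points."""
    cands = [(dx, dy) for dy in range(-rng, rng + 1)
                      for dx in range(-rng, rng + 1)]
    return min(cands, key=lambda v: sad(ref, cur, bx, by, v[0], v[1], n))

cur = [[0, 0, 0, 0],
       [0, 5, 5, 0],
       [0, 5, 5, 0],
       [0, 0, 0, 0]]
ref = [[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 5, 5],
       [0, 0, 5, 5]]
mv = best_vector(ref, cur, bx=1, by=1, n=2, rng=1)  # block moved by (1, 1)
```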
Based on the classical fractal video compression method, an improved monocular fractal compression method is proposed, which uses a more effective macroblock partition scheme instead of the classical quadtree partition scheme, improved fast motion estimation to increase calculation speed, and an I-frame-like mechanism as in H.264. The monocular codec uses the motion compensated prediction (MCP) structure. Stereo fractal video coding is also proposed, which matches each macroblock against two reference frames in the left and right views, increasing the compression ratio and reducing the bit rate/bandwidth needed when transmitting compressed video data. The stereo codec combines MCP and disparity compensated prediction. In addition, a new method of object-based fractal video coding is proposed in which each object can be encoded and decoded independently, with higher compression ratio and speed and lower bit rate/bandwidth when transmitting compressed stereo video data. Experimental results indicate that the proposed monocular method can raise the compression ratio 3.6 to 7.5 times, speed up compression 5.3 to 22.3 times, and improve image quality by 3.81 to 9.24 dB in comparison with circular prediction mapping and non-contractive interframe mapping. The PSNR of the proposed stereo video coding is about 0.17 dB higher than that of the proposed monocular video coding, and 0.69 dB higher than that of JMVC 4.0 on average. Compared with the bit rates of the proposed monocular video coding and JMVC 4.0, the proposed stereo video coding achieves, on average, 2.53 and 21.14 Kbps of bit-rate savings, respectively. The proposed object-based fractal monocular and stereo video coding methods are simple and effective, and they make the applications of fractal monocular and stereo video coding more flexible and practicable.
A novel object-based fractal monocular and stereo video compression scheme with quadtree-based motion and disparity compensation is proposed in this paper. Fractal coding is adopted and each object is encoded independently by a prior image segmentation alpha plane, which is defined exactly as in MPEG-4. The first n frames of the right video sequence are encoded by using the Circular Prediction
We present a scalable object tracking framework, which is capable of tracking the contour of nonrigid objects in the presence of occlusion. The framework consists of open-loop boundary prediction and closed-loop boundary correction parts. The open-loop prediction block adaptively divides the object contour into subcontours, and estimates the mapping parameters for each subsegment. The closed-loop boundary correction block em-
Çigdem Eroglu Erdem; Bülent Sankur; A. Murat Tekalp
One of the decisive steps in automated surveillance and monitoring is object detection. A standard approach to constructing object detectors consists of annotating large data sets and using them to train a detector. Nevertheless, due to unavoidable constraints of a typical training data set, supervised approaches are inappropriate for building generic systems applicable to a wide diversity of camera setups
Hasan Celik; Alan Hanjalic; Emile A. Hendriks; Sabri Boughorbel
We propose a novel algorithm for object tracking in video pictures, based on image segmentation and pattern matching. With image segmentation, we can detect all objects in images, whether they are moving or not. Using the image segmentation results of successive frames, we exploit pattern matching in a simple feature space for tracking the objects. Consequently, the proposed algorithm can
Takashi Morimoto; Osamu Kiriyama; Yohmei Harada; Hidekazu Adachi; Tetsushi Koide; Hans Jürgen Mattausch
A novel method of using stereoscopic video images to synthesize the computer-generated hologram (CGH) patterns of a real 3D object is proposed. Stereoscopic video images of a real 3D object are captured by a 3D camera system. Disparity maps between the captured stereo image pairs are estimated and from these estimated maps the depth data for each pixel of the object can be extracted on a frame basis. By using these depth data and original color images, hologram patterns of a real object can be computationally generated. In experiments, stereoscopic video images of a real 3D object, a wooden rhinoceros doll, are captured by using the Wasol 3D adapter system and its depth data are extracted from them. Then, CGH patterns of 1280 pixels x 1024 pixels are generated with these depth-annotated images of the wooden rhinoceros doll, and the CGH patterns are experimentally displayed via a holographic display system. PMID:16855665
Kim, Seung-Cheol; Hwang, Dong-Choon; Lee, Dong-Hwi; Kim, Eun-Soo
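As a sketch of how depth-annotated pixels can drive CGH generation, each object point can contribute a quadratic (Fresnel zone) phase pattern to the hologram plane. This is a generic point-source formulation with illustrative geometry and parameters, not necessarily the authors' exact procedure:

```python
import cmath
import math

# Point-source CGH sketch under the Fresnel approximation:
# each (x, y, z, amplitude) object point adds a quadratic-phase
# contribution to every hologram pixel. Geometry is illustrative.

def fresnel_field(points, nx, ny, pitch, wavelength):
    """Complex hologram field on an nx x ny plane; coordinates and
    depth z are in metres."""
    k = 2 * math.pi / wavelength
    field = [[0j] * nx for _ in range(ny)]
    for px, py, pz, amp in points:
        for j in range(ny):
            for i in range(nx):
                dx = i * pitch - px
                dy = j * pitch - py
                phase = k * (dx * dx + dy * dy) / (2 * pz)
                field[j][i] += amp * cmath.exp(1j * phase)
    return field

# one on-axis point at 10 cm depth, tiny 4 x 4 hologram
h = fresnel_field([(0.0, 0.0, 0.1, 1.0)], 4, 4,
                  pitch=10e-6, wavelength=633e-9)
```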
Presents evidence that although patients with semantic deficits can sometimes show good performance on tests of object decision, this pattern applies when the nonreal objects do not respect the regularities of the domain. Patients with semantic dementia viewed line drawings of real and chimeric animals side-by-side and were asked to decide which was…
Rogers, Timothy T.; Hodges, John R.; Ralph, Matthew A. Lambon; Patterson, Karalyn
To study the dynamic interplay between different component processes involved in the identification of fragmented object outlines, the authors used a discrete-identification paradigm in which the masked presentation duration of fragmented object outlines was repeatedly increased until correct naming occurred. Survival analysis was used to…
Predictive coding theories posit that the perceptual system is structured as a hierarchically organized set of generative models with increasingly general models at higher levels. The difference between model predictions and the actual input (prediction error) drives model selection and adaptation processes minimizing the prediction error. Event-related brain potentials elicited by sensory deviance are thought to reflect the processing of prediction error at an intermediate level in the hierarchy. We review evidence from auditory and visual studies of deviance detection suggesting that the memory representations inferred from these studies meet the criteria set for perceptual object representations. Based on this evidence we then argue that these perceptual object representations are closely related to the generative models assumed by predictive coding theories. PMID:22047947
The main medium in multimedia communication is the moving image, and the development of an objective quality evaluation method for color moving images is strongly desired. In this research, as a preparatory step toward such a method, a subjective evaluation experiment was conducted using the semantic differential (SD) method to clarify the factors underlying subjective quality evaluation of moving images encoded with MC plus DCT. These factors are the basic components for objectively evaluating picture quality. Moreover, the differences in the evaluation factors were analyzed by comparing the results of subjective evaluation experiments on intra-frame and inter-frame coding. Next, a subjective evaluation experiment was conducted by the EBU method, and the relation between the subjective evaluation factors derived from the SD method and the scale of quality degradation (MOS) was investigated.
We present a novel unsupervised learning algorithm for discovering objects and their location in videos from moving cameras. The videos can switch between different shots, and contain cluttered background, occlusion, camera motion, and multiple independently moving objects. We exploit both appearance consistency and spatial configuration consistency of local patches across frames for object recognition and localization. The contributions of this paper are twofold. First, we propose a combined approach for simultaneous spatial context and temporal context generation. Local video patches are extracted and described using the generated spatial-temporal context words. Second, a dynamic topic model, based on the representation of a bag of spatial-temporal context words, is introduced to learn object category models in video sequences. The proposed model can categorize and localize multiple objects in a single video. Objects leaving or entering the scene at multiple times can also be handled efficiently in the dynamic framework. Experimental results on the CamVid data set and the VISAT™ data set demonstrate the effectiveness and robustness of the proposed method.
This paper presents a content-based approach for temporal segmentation of videos. Tracked objects are characterized by their 2D trajectories, which are used in a meaningful way to model visual semantics, i.e., the observed single video object activities and their interactions. To this end, hierarchical Semi-Markov Chains (SMCs) are computed in order to take into account the temporal causalities
Alexandre Hervieu; Patrick Bouthemy; Jean-pierre Le Cadre
Here, we demonstrate that action video game play enhances subjects’ ability in two tasks thought to indicate the number of items that can be apprehended. Using an enumeration task, in which participants have to determine the number of quickly flashed squares, accuracy measures showed a near ceiling performance for low numerosities and a sharp drop in performance once a critical number of squares was reached. Importantly, this critical number was higher by about two items in video game players (VGPs) than in non-video game players (NVGPs). A following control study indicated that this improvement was not due to an enhanced ability to instantly apprehend the numerosity of the display, a process known as subitizing, but rather due to an enhancement in the slower more serial process of counting. To confirm that video game play facilitates the processing of multiple objects at once, we compared VGPs and NVGPs on the multiple object tracking task (MOT), which requires the allocation of attention to several items over time. VGPs were able to successfully track approximately two more items than NVGPs. Furthermore, NVGPs trained on an action video game established the causal effect of game playing in the enhanced performance on the two tasks. Together, these studies confirm the view that playing action video games enhances the number of objects that can be apprehended and suggest that this enhancement is mediated by changes in visual short-term memory skills.
This article presents a novel representation for dynamic scenes composed of multiple rigid objects that may undergo different motions and are observed by a moving camera. Multi-view constraints associated with groups of affine-covariant scene patches and a normalized description of their appearance are used to segment a scene into its rigid components, construct three-dimensional models of these components, and
Fred Rothganger; Svetlana Lazebnik; Cordelia Schmid; Jean Ponce
In the present study two separate stimulus-response compatibility effects (functional affordance and Simon-like effects) were investigated with centrally presented pictures of an object tool (a torch) characterized by a structural separation between the graspable portion and the goal-directed portion. In Experiment 1, participants were required to decide whether the torch was red or blue, while in Experiment 2 they were required to decide whether the torch was upright or inverted. Our results showed that with the same stimulus two types of compatibility effect emerged: one based on the direction signalled by the goal-directed portion of the tool (a Simon-like effect as observed in Experiment 1), and the other based on the actions associated with an object (a functional affordance effect as observed in Experiment 2). Both effects emerged independently of the person's intention to act on the stimulus, but depended on the stimulus properties that were processed in order to perform the task. PMID:20589580
Pellicano, Antonello; Iani, Cristina; Borghi, Anna M; Rubichi, Sandro; Nicoletti, Roberto
Public safety practitioners increasingly use video for object recognition tasks. These end users need guidance regarding how to identify the level of video quality necessary for their application. The quality of video used in public safety applications must be evaluated in terms of its usability for specific tasks performed by the end user. The Public Safety Communication Research (PSCR) project performed a subjective test as one of the first in a series to explore visual intelligibility in video: a user's ability to recognize an object in a video stream given various conditions. The test sought to measure the effects on visual intelligibility of three scene parameters (target size, scene motion, scene lighting), several compression rates, and two resolutions (VGA (640x480) and CIF (352x288)). Seven similarly sized objects were used as targets in nine sets of near-identical source scenes, where each set was created using a different combination of the parameters under study. Viewers were asked to identify the objects via multiple choice questions. Objective measurements were performed on each of the scenes, and the ability of the measurement to predict visual intelligibility was studied.
The fusiform face area (FFA) is a region of human cortex that responds selectively to faces, but whether it supports a more general function relevant for perceptual expertise is debated. Although both faces and objects of expertise engage many brain areas, the FFA remains the focus of the strongest modular claims and the clearest predictions about expertise. Functional MRI studies at standard-resolution (SR-fMRI) have found responses in the FFA for nonface objects of expertise, but high-resolution fMRI (HR-fMRI) in the FFA [Grill-Spector K, et al. (2006) Nat Neurosci 9:1177-1185] and neurophysiology in face patches in the monkey brain [Tsao DY, et al. (2006) Science 311:670-674] reveal no reliable selectivity for objects. It is thus possible that FFA responses to objects with SR-fMRI are a result of spatial blurring of responses from nonface-selective areas, potentially driven by attention to objects of expertise. Using HR-fMRI in two experiments, we provide evidence of reliable responses to cars in the FFA that correlate with behavioral car expertise. Effects of expertise in the FFA for nonface objects cannot be attributed to spatial blurring beyond the scale at which modular claims have been made, and within the lateral fusiform gyrus, they are restricted to a small area (200 mm(2) on the right and 50 mm(2) on the left) centered on the peak of face selectivity. Experience with a category may be sufficient to explain the spatially clustered face selectivity observed in this region. PMID:23027970
McGugin, Rankin Williams; Gatenby, J. Christopher; Gore, John C.; Gauthier, Isabel
This study explored whether the reported inability of newborns to perceive object unity could result from the limited abilities of newborns to recognize the correspondence between 2 stimuli that were identical except for the presence or absence of an occluder. Five experiments were carried out using a visual habituation technique. The results of…
The increasing use of compression standards in broadcasting digital TV has raised the need for established criteria to measure perceived quality. Novel methods must take into account the specific artifacts introduced by digital compression techniques. This paper presents a methodology using circular backpropagation (CBP) neural networks for the objective quality assessment of motion picture expert group (MPEG) video streams. Objective
In this paper, a moving object detection method in video sequences is described. In the first step, the camera motion is eliminated using motion compensation. An adaptive subband decomposition structure is then used to analyze the motion compensated image. In the “low–high” and “high–low” subimages moving objects appear as outliers and they are detected using a statistical detection test based
In this paper, we present our work on automatic generation of textual metadata based on visual content analysis of video news. We present two methods for semantic object detection and recognition from a cross-modal image-text thesaurus. These thesauri represent a supervised association between models and semantic labels. This paper is concerned with two semantic objects: faces and TV logos.
This paper presents the results obtained in a real experiment for object recognition in a sequence of images captured by a mobile robot in an indoor environment. The purpose is that the robot learns to identify and locate objects of interest in its environment from samples of different views of the objects taken from video sequences. In this work, objects
The goal of the development phase of the CPR Instructor Real-time Review through Use of Simulation (CIRRUS) research program was to create a video library portraying a spectrum of objectively verified simulation chest compression performances. Investigators scripted and recorded 12 two-person cardiopulmonary resuscitation (CPR) videos with specific chest compression parameters encompassing a range of hand positions, rates, depths, and chest releases in combinations that proportionately reflected typical learner cohort performances. Six videos were designated to portray adequate chest compressions, whereas the other six videos were to feature inadequate compressions. All 12 final 2-minute videos showed chest compression parameters as originally specified within tolerances to comply with American Heart Association recommendations. Deviations from specification were 1 to 10 cpm (mode = 4 cpm) for compression rate and -1.4 to 1.3 cm (mode = 0.9 cm) for depth. The program's collection of simulated CPR videos with objectively verified chest compression performances may help researchers and educators study and improve CPR instruction and provider preparation for the effective delivery of optimal patient care. PMID:23230856
Al-Rasheed, Rakan S; Devine, Jeffrey; Dunbar-Viveiros, Jennifer A; Jones, Mark S; Dannecker, Max; Jay, Gregory D; Kobayashi, Leo
A simple and versatile method to collect and analyse motion data is described. A video camera is focused on a bright spot attached to an object, and an appropriately adapted computer performs the display and further processing. Thus it is possible to display large stroboscopic images and to derive other data from them, such as time derivatives and distribution functions.
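The described pipeline (locate the bright spot in each frame, then differentiate its position over time) can be sketched as follows, with invented helper names:

```python
# Bright-spot motion analysis sketch: intensity-weighted centroid
# per frame, then finite-difference velocity estimates.

def centroid(frame):
    """Intensity-weighted centroid (x, y) of a 2D grayscale frame."""
    total = sx = sy = 0
    for y, row in enumerate(frame):
        for x, v in enumerate(row):
            total += v
            sx += v * x
            sy += v * y
    return sx / total, sy / total

def velocities(positions, dt):
    """Forward-difference velocity estimates between frames."""
    return [((x2 - x1) / dt, (y2 - y1) / dt)
            for (x1, y1), (x2, y2) in zip(positions, positions[1:])]

frames = [
    [[0, 9, 0], [0, 0, 0]],   # spot at x = 1, y = 0
    [[0, 0, 9], [0, 0, 0]],   # spot at x = 2, y = 0
]
track = [centroid(f) for f in frames]
vel = velocities(track, dt=0.5)   # dt = 0.5 s between frames
```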
We conceive the problem of multiple semantic video object (SVO) extraction as an issue of designing extensive operators on a complete lattice of partitions. As a result, we propose a framework based on spatial partition generation and application of optimal operators on the generated partitions. Based on a statistical analysis of the watershed algorithm, we develop a multi-
A number of object-oriented coding algorithms have been proposed for coding video sequences at low bit rates. Instead of estimating motion of pixel blocks, these algorithms segment each image into regions of uniform motion and estimate the motion of these regions. Estimating the segmentation and computing motion parameters are evidently closely related. Most algorithms iteratively compute complex motion parameters and
Video foreground object detection faces the problems of moving backgrounds, illumination changes, and chaotic motion in real-world applications. This paper presents a hybrid pixel-based background (HPB) model, which is constructed from a single stable record and multi-layer astable records after initial learning. This HPB model can be used for background subtraction to extract objects precisely in various complex scenes. Using the multi-layer astable records, we also propose a homogeneous background subtraction that can detect foreground objects with a smaller memory load. Based on benchmark videos, the experimental results show that a single stable record and 3-layer astable records are sufficient for background model construction and are updated quickly to cope with background variation. The proposed approach improves the average error rate of foreground object detection by up to 86% compared with the latest works. Furthermore, our method can achieve real-time analysis of complex scenes on personal computers and embedded platforms.
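For contrast with the paper's HPB model, a plain single-record running-average background subtractor looks like this; the threshold and learning rate are assumptions, and this generic scheme lacks the multi-layer astable records the paper adds:

```python
# Pixel-wise running-average background subtraction sketch:
# a pixel is foreground when it deviates from the background
# estimate; only background pixels update the model.

def update(bg, frame, alpha=0.1, thresh=20):
    """Return (foreground mask, updated background)."""
    mask, new_bg = [], []
    for b_row, f_row in zip(bg, frame):
        m_row, nb_row = [], []
        for b, f in zip(b_row, f_row):
            fg = abs(f - b) > thresh
            m_row.append(1 if fg else 0)
            # foreground pixels leave the background model untouched
            nb_row.append(b if fg else (1 - alpha) * b + alpha * f)
        mask.append(m_row)
        new_bg.append(nb_row)
    return mask, new_bg

bg = [[100.0, 100.0]]
mask, bg = update(bg, [[102, 200]])
# pixel 0 drifts into the model; pixel 1 is flagged as foreground
```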
This paper proposes an object detection algorithm and a framework based on a combination of Normalized Central Moment Invariant (NCMI) and Normalized Geometric Radial Moment (NGRM). The developed framework allows detecting objects with offline pre-loaded signatures and/or using the tracker data in order to create an online object signature representation. The framework has been successfully applied to the target detection and has demonstrated its performance on real video and imagery scenes. In order to overcome the implementation constraints of the low-powered hardware, the developed framework uses a combination of image moment functions and utilizes a multi-layer neural network. The developed framework has been shown to be robust to false alarms on non-target objects. In addition, optimization for fast calculation of the image moments descriptors is discussed. This paper presents an overview of the developed framework and demonstrates its performance on real video and imagery scenes.
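One building block of such moment descriptors is the standard normalized central moment eta_pq; the paper's NCMI normalization may differ in detail, so treat this as a generic illustration:

```python
# Normalized central moment eta_pq of a 2D grayscale image,
# using the standard scale-invariant normalization.

def raw_moment(img, p, q):
    return sum(v * (x ** p) * (y ** q)
               for y, row in enumerate(img) for x, v in enumerate(row))

def normalized_central_moment(img, p, q):
    m00 = raw_moment(img, 0, 0)
    cx = raw_moment(img, 1, 0) / m00   # centroid x
    cy = raw_moment(img, 0, 1) / m00   # centroid y
    mu = sum(v * ((x - cx) ** p) * ((y - cy) ** q)
             for y, row in enumerate(img) for x, v in enumerate(row))
    return mu / (m00 ** (1 + (p + q) / 2))

img = [[1, 1],
       [1, 1]]
eta20 = normalized_central_moment(img, 2, 0)  # symmetric 2x2 blob
```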
A system for 3-D reconstruction of a rigid object from monocular video sequences is introduced. Initially, the object pose is estimated in each image by locating similar (unknown) texture, assuming a flat depth map for all images. Shape-from-silhouette, as stated in R. Szeliski (1993), is then applied to construct a 3-D model which is used to obtain better pose estimates using
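Shape-from-silhouette itself reduces to voxel carving: keep a voxel only if it projects inside the silhouette in every view. The toy orthographic projections below are assumptions for illustration, not the paper's camera model:

```python
# Visual-hull voxel carving sketch: a voxel survives only if it
# falls inside the object silhouette in all views.

def carve(voxels, views):
    """views: list of (project, silhouette) pairs, where `project`
    maps a voxel to pixel coordinates and `silhouette` is the set
    of pixels covered by the object in that view."""
    return [v for v in voxels
            if all(project(v) in sil for project, sil in views)]

voxels = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
front = (lambda v: (v[0], v[1]), {(0, 0), (1, 0)})  # carves away (0, 1, 0)
side  = (lambda v: (v[2], v[1]), {(0, 0)})          # keeps both x columns
hull = carve(voxels, [front, side])
```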
This paper describes a method of detecting, tracking and identifying moving objects in video scenes. The method is based on an adaptive change detector which detects and tracks moving objects and extracts silhouettes of the objects from the background so they can be classified by shape. The adaptive change detector uses estimates of image noise and contrast to dynamically adjust decision thresholds. A principal feature of this method is the synergistic interaction between the tracker, the segmenter and the classifier to eliminate uninteresting objects, to improve estimates of the background and noise, to guide threshold selection, and to influence feature selection for classification. 9 refs., 4 figs.
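The noise-adaptive threshold idea can be sketched as mean-plus-k-sigma over residuals from static regions; the value of k and the estimation source are assumptions, not the paper's exact rule:

```python
import statistics

# Adaptive change-detection threshold: raised automatically when
# the estimated image noise grows.

def adaptive_threshold(residuals, k=3.0):
    """Threshold = residual mean + k * residual standard deviation,
    estimated from difference-image residuals in static regions."""
    return statistics.fmean(residuals) + k * statistics.pstdev(residuals)

quiet = [0, 1, 0, 1, 0, 1, 0, 1]   # residuals in a low-noise scene
noisy = [0, 6, 0, 6, 0, 6, 0, 6]   # residuals in a high-noise scene
t_quiet = adaptive_threshold(quiet)   # low threshold
t_noisy = adaptive_threshold(noisy)   # raised threshold
```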
We present a new compressed domain method for tracking objects in airborne videos. In the proposed scheme, a statistical snake is used for object segmentation in I-frames, and motion vectors extracted from P-frames are used for tracking the object detected in I-frames. It is shown that the energy function of the statistical snake can be obtained directly from the compressed DCT coefficients without the need of full decompression. The number of snake deformation iterations can be also significantly reduced in compressed domain implementation. The computational cost is significantly reduced by using compressed domain processing while the performance is competitive to that of pixel domain processing. The proposed method is tested using several UAV video sequences, and experiments show that the tracking results are satisfactory.
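The compressed-domain shortcut rests on Parseval's relation: for an orthonormal DCT, a block's energy equals the sum of its squared coefficients, so region statistics driving the snake can be read without inverse transforming. A minimal sketch (the flat 1D coefficient lists are illustrative):

```python
# Block AC energy straight from DCT coefficients (orthonormal DCT
# assumed); coeffs[0] is the DC term, the rest are AC terms.

def block_energy(dct_coeffs):
    return sum(c * c for c in dct_coeffs[1:])

smooth = [80.0, 0.0, 0.0, 0.0]      # flat block: DC only
textured = [80.0, 10.0, -5.0, 2.0]  # block with AC detail
e_smooth = block_energy(smooth)
e_textured = block_energy(textured)
# the textured block carries all its detail in the AC terms
```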
Temporal pooling and temporal defects are the two differences between image and video quality assessment. Whereas temporal pooling has been the object of two recent studies, this paper focuses on the rarely addressed topic of compression-induced temporal artifacts, such as mosquito noise. To study temporal aspects in subjective quality assessment, we compared the perceived quality of two versions of a mosquito noise corrector: one purely spatial and the other spatio-temporal. We set up a paired-comparison experiment and chose videos whose compression mainly creates temporal artifacts. Results proved the existence of a purely temporal aspect in video quality perception. We investigate the correlation between subjective results from the experiment and three video metrics (VQM, MOVIE, VQEM), as well as two temporally-pooled image metrics (SSIM and PSNR). The SSIM and PSNR metrics find the corrected sequences of better quality than the compressed ones but do not distinguish spatial from spatio-temporal processing. The confrontation of those results with the VQM and MOVIE objective metrics shows that they do not account for this type of defect. A detailed study highlights that either they do not detect the defects or the response of their temporal component is masked by that of their spatial components.
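A temporally mean-pooled PSNR, one kind of image metric compared in the study, can be sketched as follows; by construction this pooling averages per-frame scores and is blind to purely temporal artifacts. Frame data are illustrative:

```python
import math

# PSNR per frame, then simple temporal mean pooling over frames
# (each frame is a flat list of 8-bit pixel values).

def psnr(ref, deg, peak=255.0):
    mse = sum((r - d) ** 2 for r, d in zip(ref, deg)) / len(ref)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def pooled_psnr(ref_frames, deg_frames):
    """Average the per-frame PSNR; frame ordering is irrelevant to
    this pooling, which is why it misses temporal defects."""
    scores = [psnr(r, d) for r, d in zip(ref_frames, deg_frames)]
    return sum(scores) / len(scores)

ref = [[100, 100, 100], [100, 100, 100]]
deg = [[100, 105, 100], [100, 100, 95]]
score = pooled_psnr(ref, deg)   # roughly 39 dB for this toy data
```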
Super resolution image reconstruction allows for the enhancement of images in a video sequence beyond the original pixel resolution of the imager. Difficulty arises when foreground objects move differently than the background; a common example is a car in motion in a video. Given the common occurrence of such situations, super resolution reconstruction becomes non-trivial. One method for dealing with this is to segment out foreground objects and quantify their pixel motion separately. First, we estimate local pixel motion using a standard block motion algorithm common to MPEG encoding. This is then combined with the image itself into a five-dimensional mean-shift kernel density estimation based image segmentation with mixed motion and color image feature information. This results in a tight segmentation of objects in terms of both motion and visible image features. The next step is to combine segments into a single master object. Statistically common motion and proximity are used to merge segments into master objects. To account for inconsistencies that can arise when tracking objects, we compute statistics over the object and fit it with a generalized linear model. Using the Kullback-Leibler divergence, we obtain a metric for the goodness of the track for an object between frames.
Mundhenk, T. Nathan; Sundareswara, Rashmi; Gerwe, David R.; Chen, Yang
In this paper, we present our work on the automatic generation of textual metadata based on visual content analysis of video news. We present two methods for semantic object detection and recognition from a cross-modal image-text thesaurus. These thesauri represent a supervised association between models and semantic labels. This paper is concerned with two semantic objects: faces and TV logos. In the first part, we present our work on efficient face detection and recognition with automatic name generation. This method also allows us to suggest textual annotations of shots through close-up estimation. In the second part, we automatically detect and recognize the different TV logos present in incoming news from different TV channels. This work was done jointly with the French TV channel TF1 within the "MediaWorks" project, a hybrid text-image indexing and retrieval platform for video news.
In this paper we propose a system that identifies and tracks the movement of an object that appears fully or partially hidden by occlusion in a video sequence, for the ultimate purpose of modeling the moving object in 2.5D space using structure-from-motion (SfM) concepts. This paper presents a novel algorithm to detect moving objects in video sequences by
Recently, a number of new low bit-rate block-based video-coding algorithms have been reported that focus on moving objects by dividing macroblocks into three classes based on the motion involved. In the conventional H.26X coding standards, macroblocks are divided into static (no motion) and active (with motion) classes. These new algorithms have improved coding efficiency as well as corresponding picture
Three-dimensional (3-D) video is a real 3-D movie recording the object's full 3-D shape, motion, and precise surface texture. This paper first proposes a parallel pipeline processing method for reconstructing a dynamic 3-D object shape from multiview video images, by which a temporal series of full 3-D voxel representations of the object behavior can be obtained in real time. To
For the purpose of extracting moving objects from an H.264/Advanced Video Coding (AVC) bit stream of a complex scene, an algorithm based on a maximum a posteriori Markov random field (MRF) framework, which extracts moving objects directly from H.264 compressed video, is proposed in this paper. It mainly uses the encoded motion vectors (MVs) and block partition modes in the H.264/AVC bit stream and exploits the temporal continuity and spatial consistency of the pieces of a moving object. First, it retrieves the MVs and block partition modes of identical 4×4 pixel blocks in P frames and establishes a Gaussian mixture model (GMM) of the phase of the MVs as a reference background; it then creates an MRF model based on the MVs, block partition modes, the GMM of the background, and spatial and temporal consistency. The moving objects are retrieved by solving the MRF model. The experimental results show that it performs robustly in complex environments, and precision and recall are improved over the existing algorithm.
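The MV-phase background model above can be illustrated with a deliberately simplified sketch: a single Gaussian per 4×4 block over time stands in for the paper's GMM, and blocks whose current phase is unlikely under that background are flagged as candidate foreground. Function names and the threshold k are assumptions, not the paper's method.

```python
import numpy as np

def mv_phase(mvs):
    # Phase (angle) of motion vectors; mvs has shape (H, W, 2)
    # with (x, y) components per block.
    return np.arctan2(mvs[..., 1], mvs[..., 0])

def foreground_mask(phase_history, phase_now, k=2.5):
    # Flag blocks whose current MV phase deviates from the
    # per-block temporal mean by more than k standard deviations.
    # phase_history: (T, H, W) stack of past phase maps.
    mu = phase_history.mean(axis=0)
    sigma = phase_history.std(axis=0) + 1e-6
    return np.abs(phase_now - mu) > k * sigma
```

In the full algorithm this per-block evidence would feed the MRF's data term, with the smoothness terms enforcing the spatial and temporal consistency the abstract describes.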
Mingsheng, Chen; Mingxin, Qin; Guangming, Liang; Jixiang, Sun; Xu, Ning
Sign language users are eager for the freedom and convenience of video communication over cellular devices. Compression of sign language video in this setting offers unique challenges. The low bitrates available make encoding decisions extremely important, while the power constraints of the device limit the encoder complexity. The ultimate goal is to maximize the intelligibility of the conversation given the rate-constrained cellular channel and power-constrained encoding device. This paper uses an objective measure of intelligibility, based on subjective testing with members of the Deaf community, for rate-distortion optimization of sign language video within the H.264 framework. Performance bounds are established by using the intelligibility metric in a Lagrangian cost function along with a trellis search to make optimal mode and quantizer decisions for each macroblock. The optimal QP values are analyzed, and the unique structure of sign language is exploited in order to reduce complexity by three orders of magnitude relative to the trellis search technique with no loss in rate-distortion performance. Further reductions in complexity are made by eliminating rarely occurring modes in the encoding process. The low-complexity SL optimization technique increases the measured intelligibility by up to 3.5 dB at fixed rates, and reduces rate by as much as 60% at fixed levels of intelligibility, with respect to a rate-control algorithm designed for aesthetic distortion as measured by MSE.
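The Lagrangian cost function mentioned above follows the standard rate-distortion form J = D + λR: for each macroblock, the encoder picks the mode minimizing J. The sketch below is generic; the mode names, D and R values, and λ are hypothetical, and in the paper D would be the intelligibility-based distortion rather than MSE.

```python
def best_mode(candidates, lam):
    # Pick the coding mode minimizing J = D + lambda * R.
    return min(candidates, key=lambda m: m["D"] + lam * m["R"])

# Hypothetical per-macroblock candidates: D is the distortion (e.g.,
# an intelligibility-based measure), R is the bit cost of the mode.
modes = [
    {"name": "SKIP",  "D": 40.0, "R": 1.0},
    {"name": "INTER", "D": 12.0, "R": 20.0},
    {"name": "INTRA", "D": 8.0,  "R": 60.0},
]
```

Sweeping λ traces out the rate-distortion curve: a small λ favors low-distortion, expensive modes, while a large λ favors cheap modes such as SKIP.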
Images and videos are subject to a wide variety of distortions during acquisition, digitizing, processing, restoration, compression, storage, transmission and reproduction, any of which may result in degradation in visual quality. That is why image quality assessment plays a major role in many image processing applications. Image and video quality metrics can be classified using a number of criteria such as the type of application domain, the predicted distortion (noise, blur, etc.) and the type of information needed to assess the quality (original image, distorted image, etc.). In the literature, the most reliable way of assessing the quality of an image or of a video is subjective evaluation, because human beings are the ultimate receivers in most applications. The subjective quality metric, obtained from a number of human observers, has been regarded for many years as the most reliable form of quality measurement. However, this approach is too cumbersome, slow and expensive for most applications. So, in recent years a great effort has been made towards the development of quantitative measures. Objective quality evaluation is automated, done in real time and needs no user interaction. But ideally, such a quality assessment system would perceive and measure image or video impairments just like a human being. Quality assessment is so important, and is still an active and evolving research topic, because it is a central issue in the design, implementation, and performance testing of all systems [4, 5]. Usually, the relevant literature and related work present only a state of the art of metrics limited to a specific application domain. The major goal of this paper is to present a wider state of the art of the most used metrics in several application domains such as compression, restoration, etc.
In this paper, we review the basic concepts and methods in subjective and objective image/video quality assessment research, and we discuss their performance and drawbacks in each application domain. We show that while in some domains a lot of work has been done and several metrics have been developed, in other domains much work remains and specific metrics still need to be developed.
In a recent paper, Shawn Green and Daphne Bavelier show that playing an action video game markedly improved subject performance on a range of visual skills related to detecting objects in briefly flashed displays. This is noteworthy as previous studies on perceptual learning, which have commonly focused on well-controlled and rather abstract tasks, found little transfer of learning to novel
With the proliferation of digital and video cameras, personal collections of multimedia materials such as amateur video clips are abundant nowadays. Most of these multimedia materials may be useful to others if they are shared and can be located easily. Semantic Web technologies hold promise for organizing and re-using such non-textual information effectively. However, annotation of multimedia contents is a tedious
In this paper, we propose a new system for video object detection based on user-defined models. Object models are described by 'model graphs', in which nodes represent image regions and edges denote spatial proximity. Each node is attributed with color and shape information about the corresponding image region. Model graphs are specified manually based on a sample image of the object. Object recognition starts with automatic color segmentation of the input image. For each region, the same features are extracted as specified in the model graph. Recognition is based on finding a subgraph in the image graph that matches the model graph. Evidently, it is not possible to find an isomorphic subgraph, since node and edge attributes will not match exactly. Furthermore, the automatic segmentation step leads to an oversegmented image. For this reason, we employ inexact graph matching, where several nodes of the image graph may be mapped onto a single node in the model graph. We have applied our object recognition algorithm to cartoon sequences. This class of sequences is difficult to handle with current automatic segmentation algorithms because motion estimation has difficulties arising from large homogeneous regions and because the object appearance is typically highly variable. Experiments show that our algorithm can robustly detect the specified objects and also accurately find the object boundary.
Farin, Dirk; de With, Peter H. N.; Effelsberg, Wolfgang
A highly challenging object detection task is the recognition of relevant events in outdoor applications, as is the case in sports broadcasts. Changing illumination, different weather conditions, and noise in the imaging process are the most important issues that require a truly robust detection system. The original contribution of this work is to take advantage of a dynamic integration of object beliefs from different evidences of spatial and temporal context to obtain a recursively updated object hypothesis, with the aim of rendering object detection more robust. The object representation is formulated in a probabilistic framework to enable reasoning on multiple instances of detection results and decision making based on statistical evaluations. The representation is based on the local appearances of the objects, and therefore makes the interpretation more robust to occlusion by enabling reasoning based on spatial context among the appearances of individual object parts. Reasoning is driven by the Bayesian decision fusion of the individual probabilistic local image interpretations. The detection system is evaluated on the detection of company logos in extensive video material from Formula One broadcasts. The experimental results demonstrate that fusion is crucial to improve the robustness and accuracy of the outdoor detection system.
For members of the Deaf Community in the United States, current communication tools include TTY/TTD services, video relay services, and text-based communication. With the growth of cellular technology, mobile sign language conversations are becoming a possibility. Proper coding techniques must be employed to compress American Sign Language (ASL) video for low-rate transmission while maintaining the quality of the conversation. In order to evaluate these techniques, an appropriate quality metric is needed. This paper demonstrates that traditional video quality metrics, such as PSNR, fail to predict subjective intelligibility scores. By considering the unique structure of ASL video, an appropriate objective metric is developed. Face and hand segmentation is performed using skin-color detection techniques. The distortions in the face and hand regions are optimally weighted and pooled across all frames to create an objective intelligibility score for a distorted sequence. The objective intelligibility metric performs significantly better than PSNR in terms of correlation with subjective responses.
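The region-weighted pooling described in the abstract above can be sketched as follows: per-pixel error is averaged separately over the segmented face/hand regions and the background, then combined with a region weight before pooling across frames. This is an illustrative sketch under assumed names; the weight w_roi is hypothetical, not the paper's optimally fitted value.

```python
import numpy as np

def region_weighted_score(err_frames, masks, w_roi=0.8):
    # Pool per-pixel squared error across frames, weighting the
    # face/hand regions (mask == True) more heavily than the rest.
    total = 0.0
    for err, mask in zip(err_frames, masks):
        roi = err[mask].mean() if mask.any() else 0.0
        bg = err[~mask].mean() if (~mask).any() else 0.0
        total += w_roi * roi + (1.0 - w_roi) * bg
    return total / len(err_frames)
```

Because distortion in the signing regions is weighted most heavily, the score tracks intelligibility rather than overall pixel fidelity, which is where frame-uniform metrics such as PSNR fall short.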
Automated video monitoring of mobile objects is a growing trend in many sectors, especially in surveillance applications. Many research groups are devoting substantial effort to developing autonomous applications able to recognize specified events. The reliability of such a system is mainly defined by its ability to extract and track features of interest in image sequences. Since the capacity to perform this basic task is strongly related to changes in image contrast, video monitoring units made up of mono-spectral sensors offer limited performance in many situations. The best example of such a limitation is the uselessness of a visible CCD camera in a low-brightness scene. To overcome these shortcomings, we developed an acquisition unit including an uncooled VOx thermal camera (8-12 µm) and a High-Dynamic-Range-CMOS(R) color camera more suitable for outdoor applications. Unlike similar systems, we perform image registration at the hardware level rather than at the software level. The advantageous characteristics of such a design are presented in this paper. A simple framework is also proposed in order to achieve context-independent event extraction from color and thermal information.
St. Laurent, Louis; Prévost, Donald; Maldague, Xavier P. V.
The present invention involves a mixed reality or video game authoring tool system and method which integrates design information in the mixed reality or video game interfaces and allows the authoring of both mixed reality and video game environment and f...
A. J. Nelson; E. H. Kirkley; J. R. Kirkley; S. C. Borland; S. J. Tomblin; W. R. Pendleton
Three uses of video games are described: 1. Video games as assessment and as measures of performance. 2. Video games as training and practice in cognitive and perceptual skills, biological and physiological functioning, and cooperation and teamwork. 3. Video games as entertainment. Selected studies are described for each domain. The purpose is to suggest the many possible roles for video
We reviewed studies of viewers' reactions to stereoscopic image sequences. The dimensions considered were perceived image quality, sharpness, depth and naturalness. Stereoscopic displays produced a reliable and consistent increase in the perceived depth of image sequences. By comparison, improvements on other dimensions were not as robust. The key conclusion is that viewers' responses to stereoscopic image sequences vary along a number of independent dimensions. Overall preference for stereoscopic images will occur only if the enhanced depth perceived in a stereoscopic image sequence is not accompanied by distortions created by excessive disparity, ghosting/crosstalk, or conflicts between monoscopic and stereoscopic depth information.
Stelmach, Lew B.; Tam, Wa James; Meegan, Daniel V.
In two experiments, it was investigated how preverbal infants perceive the relationship between a person and an object she is looking at. More specifically, it was examined whether infants interpret an adult's object-directed gaze as a marker of an intention to act or whether they relate the person and the object via a mechanism of associative learning. Fourteen-month-old infants observed an adult gazing repeatedly at one of two objects. When the adult reached out to grasp this object in the test trials, infants showed no systematic visual anticipations to it (i.e. first visual anticipatory gaze shifts) but only displayed longer looking times for this object than for another before her hand reached the object. However, they showed visual anticipatory gaze shifts to the correct action target when only the grasping action was presented. The second experiment shows that infants also look longer at the object a person has been gazing at when the person is still present, but is not performing any action during the test trials. Looking preferences for the objects were reversed, however, when the person was absent during the test trials. This study provides evidence for the claim that infants around 1 year of age do not employ other people's object-directed gaze to anticipate future actions, but to establish person-object associations. The implications of this finding for theoretical conceptions of infants' social-cognitive development are discussed. PMID:22010890
There has been an increased interest in adaptive video quality control and dynamically adjusting the output video bit rate based on the status of the network. However, network-level performance parameters cannot accurately reflect the video quality perceived by the end users. Our goal is to investigate an adaptive perceptual video quality control mechanism based on an application-level perceptual video quality
Xiaoxiang Lu; Ramon Orlando Morando; Magda El Zarki
In order to provide more efficient content-based functionalities for video applications, it is necessary to extract meaningful regions from scenes as a perceptually oriented representation of video content. We present a novel approach for salient region extraction; simple visual features bound to salient regions will better represent the video content in a perceptual manner. Since perceptual saliency for visual information is
Biased category payoff matrices engender separate reward- and accuracy-maximizing decision criteria. Although instructed to maximize reward, observers use suboptimal decision criteria that place greater emphasis on accuracy than is optimal. In this study, objective classifier feedback (the objectively correct response) was compared with optimal classifier feedback (the optimal classifier's response) at two levels of category discriminability when zero or negative
One of the major problems of modern industrial robots is the lack of reliable perceptual systems that are similar to human vision in their ability to understand a visual scene and to detect and unambiguously identify objects. The traditional linear bottom-up "segmentation-grouping-learning-recognition" approach to image processing and analysis cannot provide a reliable separation of an object from its background or clutter, while human vision unambiguously solves this problem. Modern computer vision can only recognize certain features from visual information, and it plays an auxiliary role, helping to build or choose appropriate 3-dimensional models of objects and the visual scene. As a result, designers of robotics systems must create artificial environments for industrial robots that allow precise computation of 3-dimensional models within such environments. However, outside of such an artificial environment, the robot is dysfunctional. Biologically inspired Network-Symbolic models do not compute precise 3-dimensional models, but convert image information into an "understandable" Network-Symbolic format, which is similar to relational knowledge models. Feature, symbol, and predicate are equivalent in Network-Symbolic systems. A linking mechanism binds these features or symbols into coherent structures, and the image is converted from a "raster" into a "vector" representation that can be better interpreted by higher-level knowledge structures. The logic of visual scenes can be captured in Network-Symbolic models and used for the disambiguation of visual information.
Readiness depends on how accessible categories are to the stimulated organism. Accessibility is a function of the likelihood of occurrence of previously learned events, and of one's need states and habits of daily living. Lack of perceptual readiness can be rectified by relearning the categories, or by constant close inspection of events and objects. Sensory stimuli are
Multiple levels of category inclusiveness in 4 object domains (animals, vehicles, fruit, and furniture) were examined using a sequential touching procedure and assessed in both individual and group analyses in eighty 12-, 18-, 24-, and 30-month-olds. The roles of stimulus discriminability and child motor development, fatigue, and actions were also investigated. More inclusive levels of categorization systematically emerged before less
Recent neurophysiological studies have shown that primary visual cortex, or V1, does more than passively process image features using the feedforward filters suggested by Hubel and Wiesel. It also uses horizontal interactions to group features preattentively into object representations, and feedback interactions to selectively attend to these groupings. All neocortical areas, including V1, are organized into layered circuits. We present
Perceptual learning refers to the phenomenon that practice or training in perceptual tasks often substantially improves perceptual performance. Often exhibiting stimulus or task specificities, perceptual learning differs from learning in the cognitive or motor domains. Research on perceptual learning reveals important plasticity in adult perceptual systems, as well as the limitations in the information processing of the human observer. In this article, we review the behavioral results, mechanisms, physiological basis, computational models, and applications of visual perceptual learning.
Lu, Zhong-Lin; Hua, Tianmiao; Huang, Chang-Bing; Zhou, Yifeng; Dosher, Barbara Anne
Three-dimensional (3-D) video technologies are becoming increasingly popular because they can provide a high-quality and immersive experience to end users. Depth image-based rendering (DIBR) is a key technology in 3-D video systems due to its low bandwidth cost as well as the arbitrary rendering viewpoint. We propose an object-based DIBR method with color-correction optimization. The proposed method first performs temporally consistent rendering to reduce the rendering complexity. Then, by segmenting the depth map into foreground and background, object-based scalable rendering is performed to improve the rendering quality and reduce the rendering complexity. Finally, the rendered virtual view is further optimized by a color-correction operation. Experimental results show that, compared to the results without the above optimization operations, the proposed method can reduce computational complexity by more than 40% while maintaining high rendering quality.
Hypervideo is the natural evolution of Hypertext. Interlinking images and text in modern Hypertext pages is well understood and widely used in commercial services. Links from text to images and from images to text are used interchangeably, and there is a plethora of development environments in the market today. Links from video sequences out to other pieces of information are not currently
Recently, there has been increasing interest in game artificial intelligence (AI). Game AI is a system that makes game characters behave like human beings, able to make smart decisions to achieve a target in a computer or video game. Thus, this study focuses on an automated method of generating an artificial neural network (ANN) controller that is
Tse Guan Tan; Patricia Anthony; Jason Teo; Jia Hui Ong
Debates about the local and the global continue to be prominent in cultural studies. By taking an example of Australian gay porn videos, which in some ways are convincingly ‘local’, the paper suggests that previous attempts to define ‘the local’ - in terms either of textual features or provenance of production - are problematic. It proposes instead the idea of
How does the brain group together different parts of an object into a coherent visual object representation? Different parts of an object may be processed by the brain at different rates and may thus become desynchronized. Perceptual framing is a process that resynchronizes cortical activities corresponding to the same retinal object. A neural network model is presented that is able
We present a method to monitor a patient and the equipment in a radiotherapy treatment room by exploiting the information in the treatment plan, enriched with other elements such as visual, geometric, and "semantic" information. Using all these information items, and a generic model, a virtual environment of the scene is created with maximum precision. The images resulting from video sequences with several cameras are also used to confront the filmed information on the scene with its numerical representation. The method is based on the features of the scene elements and on a fuzzy formalism. The feasibility of the method is being quantitatively evaluated in the absence of treatment, to be further exploited in a module for external control by video in real conditions. PMID:22127989
Portela Sotelo, M A; Desserée, É; Moreau, J-M; Shariat, B; Beuve, M
The Hammersmith Infant Neurological Examination (HINE) is a set of tests used for grading the neurological development of infants on a scale of 0 to 3. These tests help in assessing the neurophysiological development of babies, especially preterm infants born before the gestational age of 36 weeks. Such tests are often conducted in the follow-up clinics of hospitals for grading infants with suspected disabilities. Assessment based on HINE depends on the expertise of the physicians conducting the examinations. It has been noted that some of these tests, especially pulled-to-sit and lateral tilting, are difficult to assess solely on the basis of visual observation. For example, during the pulled-to-sit examination, the examiner needs to observe the relative movement of the head with respect to the torso while pulling the infant up by the wrists. The examiner may find it difficult to follow the head movement from the coronal view. Video-object-tracking-based automatic or semi-automatic analysis can be helpful in this case. In this paper, we present a video-based method to automate the analysis of the pulled-to-sit examination. In this context, an efficient video object tracking algorithm based on dynamic programming and node pruning is proposed. Pulled-to-sit event detection is handled by the proposed tracking algorithm, which uses a 2-D geometric model of the scene. The algorithm has been tested with normal as well as marker-based videos of the examination recorded at the neuro-development clinic of the SSKM Hospital, Kolkata, India. It is found that the proposed algorithm is capable of estimating the pulled-to-sit score with sensitivity (80%-92%) and specificity (89%-96%). PMID:22157070
A retinally stabilized object readily undergoes perceptual fading and disappears from consciousness. This startling phenomenon is commonly believed to arise from local bottom-up sensory adaptation to edge information that occurs early in the visual pathway, such as in the lateral geniculate nucleus of the thalamus or retinal ganglion cells. Here we use random dot stereograms to generate perceivable contours or shapes that are not present on the retina and ask whether perceptual fading occurs for such "cortical" contours. Our results show that perceptual fading occurs for "cortical" contours and that the time a contour requires to fade increases as a function of its size, suggesting that retinal adaptation is not necessary for the phenomenon and that perceptual fading may be based in the cortex. PMID:22250867
What is learned in perceptual learning? How does perceptual learning change the perceptual system? We investigate these questions using a systems analysis of the perceptual system during the course of perceptual learning using psychophysical methods and models of the observer. Effects of perceptual learning on an observer’s performance are characterized by external noise tests within the framework of noisy observer models. We find evidence that two independent mechanisms, external noise exclusion and stimulus enhancement support perceptual learning across a range of tasks. We suggest that both mechanisms may reflect re-weighting of stable early sensory representations.
A novel approach for the fast generation of video holograms of three-dimensional (3-D) moving objects using a motion compensation-based novel-look-up-table (MC-N-LUT) method is proposed. Motion compensation has been widely employed in the compression of conventional 2-D video data because of its ability to exploit the high temporal correlation between successive video frames. Here, this concept of motion compensation is first applied to the N-LUT, based on its inherent property of shift-invariance. That is, motion vectors of 3-D moving objects are extracted between two consecutive video frames, and with them the motions of the 3-D objects at each frame are compensated. Through this process, the 3-D object data to be calculated for the video holograms are massively reduced, which results in a dramatic increase in the computational speed of the proposed method. Experimental results with three kinds of 3-D video scenarios reveal that the average number of calculated object points and the average calculation time per object point of the proposed method are reduced to 86.95% and 86.53%, and 34.99% and 32.30%, respectively, compared to those of the conventional N-LUT and temporal redundancy-based N-LUT (TR-N-LUT) methods. PMID:23670014
Kim, Seung-Cheol; Dong, Xiao-Bin; Kwon, Min-Woo; Kim, Eun-Soo
Parameter adjustment of the transmitted light source yields a good-quality image for video measurement microscopy. However, a disadvantage is that the measured size of the objects changes simultaneously while the parameters of the transmitted light source are adjusted, which directly affects the measurement accuracy adversely. Furthermore, the variation of the objects' measured size leads to the magnification variation as well. In order to analyse how the measured size of the objects varies with the parameters, we apply a photometric method and an operator algebra method to derive a unified mathematical model that correlates the parameters of the transmitted light source with the measured size of the objects. Theoretical results show that factors including the illumination intensity, the diameter of the aperture stop and the field stop and the object thickness are important to the measured size of the objects. The simulation results have been verified thoroughly using a wide variety of internal and external diameter measurement experiments. The proposed method has proved to be suitable to determine how these parameters affect measured size of the objects. PMID:23581426
Timely detection of packages left unattended in public spaces is a security concern, and rapid detection is important for the prevention of potential threats. Because constant surveillance of such places is challenging and labor intensive, automated abandoned-object-detection systems that aid operators have come into wide use. In many studies, stationary objects, such as people sitting on a bench, are also detected as suspicious because abandoned items are defined as items newly added to the scene that remain stationary for a predefined time. Any stationary object therefore triggers an alarm, causing a high number of false alarms. These false alarms could be prevented by classifying suspicious items as living or nonliving objects. In this study, a system for abandoned object detection that aids operators surveilling indoor environments, such as airports and railway or metro stations, is proposed. By analyzing information from thermal- and visible-band cameras, people and the objects left behind can be detected and discriminated as living or nonliving, reducing the false-alarm rate. Experiments demonstrate that using data from a thermal camera in addition to a visible-band camera also increases the true detection rate of abandoned objects.
Object segmentation and tracking are problems within the scope of MPEG-4 and MPEG-7 standardization activities. A novel algorithm for both object segmentation and tracking is presented. The algorithm fuses motion, color, and accumulated previous segmentation data at `region level', in contrast to conventional `pixel level' approaches. The information fusion is achieved by a rule-based region processing unit which intelligently utilizes
The results generally support the hypotheses that the greater the agreement between the individual's self-description and an objective description of him, the less perceptual defense he will show and the more adequate his personal adjustment will be; and that the more adequate his personal adjustment, the less perceptual defense he will show.
This paper presents a novel multiresolution image segmentation method based on the discrete wavelet transform and Markov Random Field (MRF) modelling. A major contribution of this work is to add spatial scalability to the segmentation algorithm producing the same segmentation pattern at different resolutions. This property makes it suitable for the scalable object-based wavelet coding. The correlation between different resolutions
Fardin Akhlaghian Tab; Golshah Naghdy; Alfred Mertins
In this paper, we present a review of existing techniques and systems for tracking multiple occluding objects using one or more cameras. Following a formulation of the occlusion problem, we divide these techniques into two groups: merge- split (MS) approaches and straight-through (ST) approaches. Then, we consider tracking in ball game applications, with emphasis on soccer. Based on this assessment
Pierre F. Gabriel; Jacques G. Verly; Justus H. Piater; André Genon
Bag-of-features (BoF) deriving from local keypoints has recently appeared promising for object and scene classification. Whether BoF can naturally survive challenges such as the reliability and scalability of visual classification, nevertheless, remains uncertain due to various implementation choices. In this paper, we evaluate various factors which govern the performance of BoF. The factors include the choices of
In this paper, we propose a novel method for moving foreground object extraction in sequences taken by a wearable camera, with strong motion. We use camera motion compensated frame differencing, enhanced with a novel kernel-based estimation of the probability density function of background pixels. The probability density functions are used for filtering false foreground pixels on the motion compensated difference
D. Szolgay; J. Benois-Pineau; R. Megret; Y. Gaestel; J.-F. Dartigues
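The kernel-based background model described in the record above can be sketched as a per-pixel kernel density estimate over past observations. The following is a minimal illustration; the function names, the Gaussian kernel, and the bandwidth and threshold defaults are assumptions for demonstration, not details from the paper:

```python
import numpy as np

def kde_background_prob(samples, value, bandwidth=10.0):
    """Kernel density estimate of a pixel value under the background model.

    `samples` are past intensity observations for one pixel; a Gaussian
    kernel of width `bandwidth` is placed on each sample (illustrative
    choice, not necessarily the kernel used in the paper).
    """
    diffs = (value - np.asarray(samples, dtype=float)) / bandwidth
    return float(np.mean(np.exp(-0.5 * diffs ** 2)) / (bandwidth * np.sqrt(2 * np.pi)))

def is_foreground(samples, value, threshold=1e-3, bandwidth=10.0):
    """Flag a pixel as foreground when its density under the background
    model falls below `threshold` (an assumed cutoff)."""
    return kde_background_prob(samples, value, bandwidth) < threshold
```

In a full system this test would be applied only to pixels already flagged by motion-compensated frame differencing, to filter out false foreground detections.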
This paper describes a methodology that integrates recognition and segmentation, simultaneously with image tracking in a cooperative manner, for recognition of objects (or parts of them) in image sequences. A probabilistic general approach at pixel level is depicted together with a practical heuristic simplification in which pixels’ class probabilities are approximated by a finite small set of class possibility values.
Nicolás Amézquita Gómez; René Alquézar; Francesc Serratosa
We propose PQSvideo (Picture Quality Scale for moving images), a method of objective quality assessment for coded moving images. We expect the proposed PQSvideo to approximate subjective assessment well. In PQSvideo, we define essential distortion factors considering not only global distortions (such as random noise) but also distortions of local features. We then describe each distortion metrically, considering human visual perception. The PQSvideo is given by a linear combination of the essential distortion factors, with weights derived using principal component analysis and multiple regression analysis between the quantity of each essential distortion factor and the MOS (Mean Opinion Score) obtained from assessment tests. We have confirmed that the PQSvideo approximates MOS successfully.
Yamashita, Tetsuji; Kameda, Masashi; Miyahara, Makoto M.
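The linear-combination construction behind a PQSvideo-style score can be illustrated with ordinary least-squares regression between per-sequence distortion factors and MOS. This sketch uses plain least squares in place of the paper's combined PCA and multiple-regression procedure, and all names and data are hypothetical:

```python
import numpy as np

def fit_quality_weights(factors, mos):
    """Fit weights of a linear quality model by least squares.

    factors: (n_sequences, n_factors) distortion-factor quantities.
    mos:     (n_sequences,) mean opinion scores.
    Returns the regression weights, with the intercept appended last.
    """
    X = np.hstack([factors, np.ones((factors.shape[0], 1))])  # add intercept column
    w, *_ = np.linalg.lstsq(X, mos, rcond=None)
    return w

def predict_quality(factors, w):
    """Predicted quality score: linear combination of the factors."""
    X = np.hstack([factors, np.ones((factors.shape[0], 1))])
    return X @ w
```

Given a perfectly linear synthetic MOS, the fitted weights recover the generating coefficients exactly, which is the sense in which such a score "approximates MOS".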
Sensory perception is a learned trait. The brain strategies we use to perceive the world are constantly modified by experience. With practice, we subconsciously become better at identifying familiar objects or distinguishing fine details in our environment. Current theoretical models simulate some properties of perceptual learning, but neglect the underlying cortical circuits. Future neural network models must incorporate the top-down alteration of cortical function by expectation or perceptual tasks. These newly found dynamic processes are challenging earlier views of static and feedforward processing of sensory information.
A retinally stabilized object readily undergoes perceptual fading and disappears from consciousness. This startling phenomenon is commonly believed to arise from local bottom-up sensory adaptation to edge information that occurs early in the visual pathway, such as in the lateral geniculate nucleus of the thalamus or retinal ganglion cells. Here…
An on-road evaluation of two perceptual countermeasure treatments (an enhanced curve post treatment and peripheral transverse edgelines on the approach to an intersection) was conducted over one year to indicate potential for reducing travel speed. Measures included speed and deceleration profiles, braking, and lateral placement observations taken from video recordings at each site. Data were collected before treatment, immediately after treatment, and 12 months after treatment. The results obtained were quite variable across sites and treatments. At curves, speed effects were mixed with both speed reductions and increases observed immediately after and 12-months later. Braking results tended to support travel speed findings and some improvement in lateral placement were also observed at these locations. At intersections, the results were more stable where speed reductions were more common both immediately after treatment as well as longer-term. There were no differences in braking and lateral placement at these straight-road locations. The findings seem to have been unduly influenced to some degree by misadventure and wear and tear at these sites. It is argued that while the effectiveness of these treatments may be site specific to some degree, they do offer a low-cost solution to reducing travel speed at hazardous locations. PMID:16179136
To develop accurate objective measurements (models) for video quality assessment, subjective data is traditionally collected via human subject testing. The ITU has a series of Recommendations that address methodology for performing subjective tests in a rigorous manner. These methods are targeted at the entertainment application of video. However, video is often used for many applications outside the entertainment sector, and generally this class of video is used to perform a specific task. Examples of these applications include security, public safety, remote command and control, and sign language. For these applications, video is used to recognize objects, people or events. The existing methods, developed to assess a person's perceptual opinion of quality, are not appropriate for task-based video. The Institute for Telecommunication Sciences, under a program from the Department of Homeland Security and the National Institute of Standards and Technology's Office of Law Enforcement, has developed a subjective test method to determine a person's ability to perform recognition tasks using video, thereby rating quality according to the usefulness of the video within its application. This new method is presented, along with a discussion of two examples of subjective tests using this method.
Ford, Carolyn G.; McFarland, Mark A.; Stange, Irena W.
A good training video has a focused objective, well-written script, clear sound track, and visuals that enhance the communication of the message. Good visuals depend on lighting, camera angles, continuity, and motivation for the scene. (SK)
Similarities in anomalous perception of internal gastric states and sensitivity to distraction among the obese to variations in perceptual reactance suggest that the obese tend to augment the intensity of visceral cues associated with hunger. It was hypothesized that the obese would be overrepresented at the augmenter end of the perceptual reactance continuum. Thirteen obese (six male, seven female) and
In this paper, we present a perceptual distortion measure that predicts image integrity far better than mean-squared error. This perceptual distortion measure is based on a model of human visual processing that fits empirical measurements of: (1) the response properties of neurons in the primary visual cortex, and (2) the psychophysics of spatial pattern detection. We also illustrate the usefulness
The earliest stages in our perception of the world have a subtle but powerful influence on later thought processes; they provide the contextual cues within which our thoughts are framed and they adapt to many different environments throughout our lives. Understanding the changes in these cues is crucial to understanding how our perceptual ability develops, but these changes are often difficult to quantify in sufficiently complex tasks where objective measures of development are available. Here we simulate perceptual learning using neural networks and demonstrate fundamental changes in these cues as a function of skill. These cues are cognitively grouped together to form perceptual templates that enable rapid `whole scene' categorisation of complex stimuli. Such categories reduce the computational load on our capacity limited thought processes, they inform our higher cognitive processes and they suggest a framework of perceptual pre-processing that captures the central role of perception in expertise.
Image quality metrics (IQMs), such as the mean squared error (MSE) and the structural similarity index (SSIM), are quantitative measures to approximate perceived visual quality. In this paper, through analyzing the relationship between the MSE and the SSIM under an additive noise distortion model, we propose a perceptually relevant MSE-based IQM, MSE-SSIM, which is expressed in terms of the variance of the source image and the MSE between the source and distorted images. Evaluations on three publicly available databases (LIVE, CSIQ, and TID2008) show that the proposed metric, despite requiring less computation, compares favourably in performance to several existing IQMs. In addition, due to its simplicity, MSE-SSIM is amenable for the use in a wide range of image and video tasks that involve solving an optimization problem. As an example, MSE-SSIM is used as the objective function in designing a Wiener filter that aims at optimizing the perceptual visual quality of the output. Experimental results show that the images filtered with a MSE-SSIM-optimal Wiener filter have better visual quality than those filtered with a MSE-optimal Wiener filter. PMID:24057005
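Under a zero-mean additive-noise model, the SSIM contrast-structure term reduces to a function of the source variance and the MSE, which matches the spirit of the MSE-SSIM expression described in the record above. A minimal sketch follows; the stabilizing constant `c2` follows the usual SSIM convention, and the exact formula in the paper may differ:

```python
import numpy as np

def mse_ssim(src, dst, c2=(0.03 * 255) ** 2):
    """Variance/MSE approximation of SSIM under additive zero-mean noise.

    With sigma_xy ~ var(src) and MSE ~ noise variance, the SSIM
    contrast-structure term becomes
        (2*var(src) + c2) / (2*var(src) + MSE + c2).
    Returns 1.0 for identical images and decreases as MSE grows.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mse = np.mean((src - dst) ** 2)
    var = np.var(src)
    return float((2 * var + c2) / (2 * var + mse + c2))
```

Because the index is a simple monotone function of MSE for a fixed source, it is differentiable and convenient as an objective in filter design, which is how the record uses it for a perceptually tuned Wiener filter.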
Magic illusions provide the perceptual and cognitive scientist with a toolbox of experimental manipulations and testable hypotheses about the building blocks of conscious experience. Here we studied several sleight-of-hand manipulations in the performance of the classic “Cups and Balls” magic trick (where balls appear and disappear inside upside-down opaque cups). We examined a version inspired by the entertainment duo Penn & Teller, conducted with three opaque and subsequently with three transparent cups. Magician Teller used his right hand to load (i.e. introduce surreptitiously) a small ball inside each of two upside-down cups, one at a time, while using his left hand to remove a different ball from the upside-down bottom of the cup. The sleight at the third cup involved one of six manipulations: (a) standard maneuver, (b) standard maneuver without a third ball, (c) ball placed on the table, (d) ball lifted, (e) ball dropped to the floor, and (f) ball stuck to the cup. Seven subjects watched the videos of the performances while reporting, via button press, whenever balls were removed from the cups/table (button “1”) or placed inside the cups/on the table (button “2”). Subjects’ perception was more accurate with transparent than with opaque cups. Perceptual performance was worse for the conditions where the ball was placed on the table, or stuck to the cup, than for the standard maneuver. The condition in which the ball was lifted displaced the subjects’ gaze position the most, whereas the condition in which there was no ball caused the smallest gaze displacement. Training improved the subjects’ perceptual performance. Occlusion of the magician’s face did not affect the subjects’ perception, suggesting that gaze misdirection does not play a strong role in the Cups and Balls illusion. Our results have implications for how to optimize the performance of this classic magic trick, and for the types of hand and object motion that maximize magic misdirection.
We describe an approach to object retrieval which searches for and localizes all the occurrences of an object in a video, given a query image of the object. The object is represented by a set of viewpoint invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video
Three-dimensional television (3D-TV) has gained increasing popularity in the broadcasting domain, as it enables enhanced viewing experiences in comparison to conventional two-dimensional (2D) TV. However, its application has been constrained due to the lack of essential contents, i.e., stereoscopic videos. To alleviate such content shortage, an economical and practical solution is to reuse the huge media
Neurons in the visual cortex are responsive to the presentation of oriented and curved line segments, which are thought to act as primitives for the visual processing of shapes and objects. Prolonged adaptation to such stimuli gives rise to two related perceptual effects: a slow change in the appearance of the adapting stimulus (perceptual drift), and the distortion of subsequently presented test stimuli (adaptational aftereffects). Here we used a psychophysical nulling technique to dissociate and quantify these two classical observations in order to examine their underlying mechanisms and their relationship to one another. In agreement with previous work, we found that during adaptation horizontal and vertical straight lines serve as attractors for perceived orientation and curvature. However, the rate of perceptual drift for different stimuli was not predictive of the corresponding aftereffect magnitudes, indicating that the two perceptual effects are governed by distinct neural processes. Finally, the rate of perceptual drift for curved line segments did not depend on the spatial scale of the stimulus, suggesting that its mechanisms lie outside strictly retinotopic processing stages. These findings provide new evidence that the visual system relies on statistically salient intrinsic reference stimuli for the processing of visual patterns, and point to perceptual drift as an experimental window for studying the mechanisms of visual perception.
Muller, Kai-Markus; Schillinger, Frieder; Do, David H.; Leopold, David A.
The paper proposes a mechanism for the spontaneous formation of perceptually grounded meanings under the selectionist pressure of a discrimination task. The mechanism is defined formally and the results of some simulation experiments are reported.
3D video systems based on Depth-Image-Based Rendering rely on high-quality depth data. Errors distributed randomly in depth map sequences induce annoying temporal noise, such as flickering and object shifting. Prior studies on video quality assessment focused mainly on the spatial quality of the tested sequence and often ignored its temporal performance. In synthesized sequences, a large number of tiny geometric distortions and illumination differences are temporally constant and perceptually invisible. Dynamic noise impairs the subjective quality of the sequences more severely than static spatial noise does. Temporal quality plays a dominant role in the overall quality assessment of synthesized sequences with temporal instability problems. We propose a simple full-reference metric, Peak Signal to Perceptible Temporal Noise Ratio, to evaluate the quality of synthesized sequences by measuring the perceptible temporal noise in them.
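One plausible form of such a metric measures the difference between frame-to-frame changes in the test and reference sequences, discards components below a visibility threshold, and reports a PSNR-like ratio. This sketch is an illustrative reconstruction, not the metric's published definition; the JND threshold value is an assumption:

```python
import numpy as np

def psptnr(ref, test, jnd=3.0, peak=255.0):
    """Sketch of a Peak Signal to Perceptible Temporal Noise Ratio.

    Temporal noise is taken as the difference between consecutive-frame
    changes in the test and reference sequences; components with
    magnitude below the just-noticeable-difference threshold `jnd` are
    treated as imperceptible and zeroed.
    ref, test: (frames, height, width) arrays.
    """
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    tn = np.diff(test, axis=0) - np.diff(ref, axis=0)  # temporal noise
    tn = np.where(np.abs(tn) > jnd, tn, 0.0)           # keep perceptible part only
    mse = np.mean(tn ** 2)
    if mse == 0:
        return float("inf")
    return float(10 * np.log10(peak ** 2 / mse))
```

A sequence identical to its reference scores infinity; a flickering frame introduces perceptible temporal noise and lowers the ratio.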
In this paper, we propose a model for video metadata that supports video retrieval based on video content, video structure and/or video attributes. Our model supports video retrieval based on the relationships among videos and between videos and real world objects. This model is also appropriate for providing personalization and recommendation functionality in video based services, while it takes into
The human visual system is very complex and has been studied for many years, specifically for purposes of efficient encoding of visual content, e.g. video from digital TV. There is physiological and psychological evidence indicating that viewers do not pay equal attention to all exposed visual information, but focus only on certain areas known as focus of attention (FOA) or saliency regions. In this work, we propose a novel saliency-based objective quality assessment metric for assessing the perceptual quality of decoded video sequences affected by transmission errors and packet losses. The proposed method weights the Mean Square Error (MSE) according to the saliency map calculated at each pixel, yielding a Weighted MSE (WMSE). Our method was validated through subjective quality experiments.
Boujut, H.; Benois-Pineau, J.; Hadar, O.; Ahmed, T.; Bonnet, P.
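The saliency-weighted MSE described in the record above can be written as a normalized weighted sum of squared errors; a minimal sketch, where the saliency map is taken as any non-negative per-pixel weight array (how the map itself is computed is outside this sketch):

```python
import numpy as np

def weighted_mse(ref, dist, saliency):
    """Saliency-weighted MSE.

    Squared errors are weighted per pixel by the saliency map and
    normalized by the total saliency, so errors inside
    focus-of-attention regions count more than peripheral errors.
    """
    ref = np.asarray(ref, dtype=float)
    dist = np.asarray(dist, dtype=float)
    w = np.asarray(saliency, dtype=float)
    return float(np.sum(w * (ref - dist) ** 2) / np.sum(w))
```

With a uniform saliency map the measure reduces to the ordinary MSE; concentrating saliency on a damaged region raises the score for the same pixel errors.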
Discusses the potential for training perception and cognition in sport through video simulation of game situations. The paper reviews perceptual training studies and data from studies involving slide and video training in basketball. Issues that arise when creating simulations and assessment of transfer are discussed. (SM)
For a sensible application of Augmented Reality (AR) and Virtual Environments (VE) it is necessary to include basic human information processing resources and characteristics. Because there is no fully functional model of human perceptual, cognitive, and motor behavior, this requires empirical analyses. Moreover, these analyses are often based on subjective ratings rather than objective measures. With regard to perception as
A data model for long objects (such as video files) is introduced, to support general referencing structures, along with various system implementation strategies. Based on the data model, various indexing techniques for video are then introduced. A set of basic functionalities is described, including all the frame level control, indexing, and video clip editing. We show how the techniques can be used to automatically index video files based on closed captions with a typical video capture card, for both compressed and uncompressed video files. Applications are presented using those indexing techniques in security control and viewers' rating choice, general video search (from laser discs, CD ROMs, and regular disks), training videos, and video based user or system manuals.
Chen, C.-Y. Roger; Meliksetian, Dikran S.; Liu, Larry J.; Chang, Martin C.
Olfactory perceptual learning is a relatively long-term, learned increase in perceptual acuity, and has been described in both humans and animals. Data from recent electrophysiological studies have indicated that olfactory perceptual learning may be correlated with changes in odorant receptive fields of neurons in the olfactory bulb and piriform…
Wilson, Donald A.; Fletcher, Max L.; Sullivan, Regina M.
A pitch detector based on Licklider's (1979) duplex theory of pitch perception was implemented and tested on a variety of stimuli from human perceptual tests. It is believed that this approach accurately models how people perceive pitch. It is shown that it correctly identifies the pitch of complex harmonic and inharmonic stimuli and that it is robust in the face
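A duplex-theory pitch model forms running autocorrelations and reads the pitch from the dominant lag. The following toy estimator uses a single broadband channel rather than the full per-channel correlogram of Licklider's model, and the search band is an assumed parameter:

```python
import numpy as np

def autocorr_pitch(signal, fs, fmin=50.0, fmax=500.0):
    """Toy autocorrelation pitch estimator.

    Computes the autocorrelation of the (mean-removed) signal and
    returns the frequency of the strongest lag within [fmin, fmax].
    A correlogram model would do this per cochlear channel and sum.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags
    lo, hi = int(fs / fmax), int(fs / fmin)            # lag search band
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / lag
```

For a clean harmonic tone the strongest in-band lag is the fundamental period, so the estimate lands on the perceived pitch.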
Three experiments considered the development of perceptual causality in children from 3 to 9 years of age (N = 176 in total). Adults tend to see cause and effect even in schematic, two-dimensional motion events: Thus, if square A moves toward B, which moves upon contact, they report that A launches B—physical causality. If B moves before contact, adults
Anne Schlottmann; Deborah Allen; Carina Linderoth; Sarah Hesketh
In two experiments it was investigated which aspects of memory are influenced by emotion. Using a framework proposed by Roediger (American Psychologist 45 (1990) 1043–1056), two dimensions relevant for memory were distinguished: the implicit–explicit distinction and the perceptual versus conceptual distinction. In week 1, subjects viewed a series of slides accompanied by a spoken story in either of the two
We used adaptation to examine the relationship between perceptual norms--the stimuli observers describe as psychologically neutral, and response norms--the stimulus levels that leave visual sensitivity in a neutral or balanced state. Adapting to stimuli on opposite sides of a neutral point (e.g. redder or greener than white) biases appearance in opposite ways. Thus the adapting stimulus can be titrated to find the unique adapting level that does not bias appearance. We compared these response norms to subjectively defined neutral points both within the same observer (at different retinal eccentricities) and between observers. These comparisons were made for visual judgments of color, image focus, and human faces, stimuli that are very different and may depend on very different levels of processing, yet which share the property that for each there is a well defined and perceptually salient norm. In each case the adaptation aftereffects were consistent with an underlying sensitivity basis for the perceptual norm. Specifically, response norms were similar to and thus covaried with the perceptual norm, and under common adaptation differences between subjectively defined norms were reduced. These results are consistent with models of norm-based codes and suggest that these codes underlie an important link between visual coding and visual experience.
The central problems of vision are often divided into object identification and localization. Object identification, at least at fine levels of discrimination, may require the application of top-down knowledge to resolve ambiguous image information. Utilizing top-down knowledge, however, may require the initial rapid access of abstract object categories based on low-level image cues. Does object localization require a different set of operating principles than object identification or is category determination also part of the perception of depth and spatial layout? Three-dimensional graphics movies of objects and their cast shadows are used to argue that identifying perceptual categories is important for determining the relative depths of objects. Processes that can identify the causal class (e.g. the kind of material) that generates the image data can provide information to determine the spatial relationships between surfaces. Changes in the blurriness of an edge may be characteristically associated with shadows caused by relative motion between two surfaces. The early identification of abstract events such as moving object/shadow pairs may also be important for depth from shadows. Knowledge of how correlated motion in the image relates to an object and its shadow may provide a reliable cue to access such event categories.
This study took a close look at the mechanism behind gender disparity in video game usage by examining two perceptual variables: perceptions about others' video game usage and perceived influence of unrealistic video game character images on others. Both men and women perceived that young women play video games far less frequently than young men and also considered themselves less
We consider perceptual learning: experience-induced changes in the way perceivers extract information. Often neglected in scientific accounts of learning and in instruction, perceptual learning is a fundamental contributor to human expertise and is crucial in domains where humans show remarkable levels of attainment, such as language, chess, music, and mathematics. In Section 2, we give a brief history and discuss the relation of perceptual learning to other forms of learning. We consider in Section 3 several specific phenomena, illustrating the scope and characteristics of perceptual learning, including both discovery and fluency effects. We describe abstract perceptual learning, in which structural relationships are discovered and recognized in novel instances that do not share constituent elements or basic features. In Section 4, we consider primary concepts that have been used to explain and model perceptual learning, including receptive field change, selection, and relational recoding. In Section 5, we consider the scope of perceptual learning, contrasting recent research, focused on simple sensory discriminations, with earlier work that emphasized extraction of invariance from varied instances in more complex tasks. Contrary to some recent views, we argue that perceptual learning should not be confined to changes in early sensory analyzers. Phenomena at various levels, we suggest, can be unified by models that emphasize discovery and selection of relevant information. In a final section, we consider the potential role of perceptual learning in educational settings. Most instruction emphasizes facts and procedures that can be verbalized, whereas expertise depends heavily on implicit pattern recognition and selective extraction skills acquired through perceptual learning. 
We consider reasons why perceptual learning has not been systematically addressed in traditional instruction, and we describe recent successful efforts to create a technology of perceptual learning in areas such as aviation, mathematics, and medicine. Research in perceptual learning promises to advance scientific accounts of learning, and perceptual learning technology may offer similar promise in improving education.
IMVS (Intelligent Mobile Video Stream Monitoring System) is a mobile video surveillance system. The objective of IMVS is to design a high performance video stream monitoring system in a mobile computing environment. In particular, the technical questions to be addressed are: (1) how to minimize the amount of video signals to be transmitted between the front-end mobile device and the
People with autism have consistently been found to outperform controls on visuo-spatial tasks such as block design, embedded figures, and visual search tasks. Plaisted, O'Riordan, and others (Bonnel et al., 2003; O'Riordan & Plaisted, 2001; O'Riordan, Plaisted, Driver, & Baron-Cohen, 2001; Plaisted, O'Riordan, & Baron-Cohen, 1998a, 1998b) have suggested that these findings might be explained in terms of reduced perceptual
Lewis Bott; Jon Brock; Noellie Brockdorff; Jill Boucher; Koen Lamberts
This paper investigates the perceptual quality of video affected by packet losses. We focus on low-resolution and low bit-rate video coded by the H.264/AVC encoder and the packet loss patterns likely in low bit-rate wireless networks. We examine the impact of several factors on the perceptual quality, including the error length (the error propagation duration after a loss), the loss
Tao Liu; Yao Wang; Jill M. Boyce; Hua Yang; Zhenyu Wu
This talk will consider the implications of sensorineural hearing loss for auditory perceptual organization. In everyday environments, the listener is often faced with the difficulty of processing a target sound that intermingles acoustically with one or more extraneous sounds. Under such circumstances, several auditory processes enable the complex waveforms reaching the two ears to be interpreted in terms of putative auditory objects giving rise to the target and extraneous sounds. Such processes of perceptual organization depend upon the central analysis of cues that allow distributed spectral information to be either linked together or split apart on the basis of details related to such variables as synchrony of onset/modulation, harmonic relation, rhythm, and interaural differences. Efficient perceptual organization must depend not only upon such central auditory analyses but also upon the fidelity with which the peripheral auditory system encodes the spectral and temporal characteristics of sound. We will consider the implications of sensorineural hearing loss for perceptual organization in terms of both peripheral and central auditory processes.
Perceptual curiosity, as defined by Berlyne (1954), involves interest in and giving attention to novel perceptual stimulation, and motivates visual and sensory-inspection. A 33-item questionnaire constructed to assess individual differences in perceptual curiosity was administered to 320 undergraduate students (202 females; 118 males). The participants also responded to the trait scales of the State-Trait Personality Inventory (STPI), and to selected
Robert P Collins; Jordan A Litman; Charles D Spielberger
When viewing multimedia presentations, a user can only be attending to a relatively small part of the video display at any one point in time. Accordingly, by shifting allocation of bandwidth from peripheral regions of the screen to regions-of-interest (RoI), which are measured or highly probable positions of user attention, attentive displays can be produced. This paper reports a perceptual
This paper examines two fundamental challenges in two areas, respectively, which have been intensively researched in the field of image processing and communications, i.e., digital picture coding/compression and digital picture (including both video and still images) restoration (or de-noising). It reflects on historical developments and reviews the state-of-the-art in the area of digital picture coding. Quantitative perceptual distortion measure based
The omnifocus video camera takes videos, in which objects at different distances are all in focus in a single video display. The omnifocus video camera consists of an array of color video cameras combined with a unique distance mapping camera called the Divcam. The color video cameras are all aimed at the same scene, but each is focused at a different distance. The Divcam provides real-time distance information for every pixel in the scene. A pixel selection utility uses the distance information to select individual pixels from the multiple video outputs focused at different distances, in order to generate the final single video display that is everywhere in focus. This paper presents principle of operation, design consideration, detailed construction, and over all performance of the omnifocus video camera. The major emphasis of the paper is the proof of concept, but the prototype has been developed enough to demonstrate the superiority of this video camera over a conventional video camera. The resolution of the prototype is high, capturing even fine details such as fingerprints in the image. Just as the movie camera was a significant advance over the still camera, the omnifocus video camera represents a significant advance over all-focus cameras for still images.
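The pixel selection utility described in the record above can be approximated by a nearest-focus rule: for each pixel, take the output of the camera whose focus distance best matches the Divcam depth at that pixel. This is an illustrative reconstruction, not the camera's actual algorithm:

```python
import numpy as np

def omnifocus_compose(frames, focus_distances, depth_map):
    """Compose an everywhere-in-focus frame by per-pixel camera selection.

    frames:          (n_cams, h, w) simultaneous frames, one per camera.
    focus_distances: (n_cams,) distance at which each camera is focused.
    depth_map:       (h, w) per-pixel distances from the Divcam.
    Each output pixel is copied from the camera whose focus distance is
    closest to the scene depth at that pixel (a simple nearest-focus rule).
    """
    frames = np.asarray(frames, dtype=float)
    d = np.asarray(focus_distances, dtype=float)
    # index of the best-focused camera at every pixel
    best = np.argmin(np.abs(depth_map[None, :, :] - d[:, None, None]), axis=0)
    h_idx, w_idx = np.indices(depth_map.shape)
    return frames[best, h_idx, w_idx]
```

With two cameras focused at 1 m and 10 m, pixels whose depth is near 1 m come from the first camera and pixels near 10 m from the second, stitching a single all-in-focus display.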
The video differential planimeter is an instrument which measures variations in the projected area of any remote object with the aid of a flying spot scanner or television camera system. The composite video signal, caused by the scanning of the object and its contrasting background, is shaped to yield a sequence of constant amplitude rectangular pulses that are negative going
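A digital analogue of the pulse-based area measurement is easy to sketch. The helper below is a hypothetical illustration, not the instrument's analog circuitry: summing, over all scan lines, the widths of the thresholded pulses is equivalent to counting the pixels occupied by the contrasting object.

```python
def projected_area(frame, threshold=128):
    """Count pixels brighter than `threshold`. Summing the widths of the
    rectangular pulses produced on every scan line gives the projected
    area of a bright object on a dark background."""
    return sum(1 for row in frame for px in row if px > threshold)
```

Tracking this count frame to frame yields the *differential* area variation the instrument reports.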
A critical component of decision making is the ability to adjust criteria for classifying stimuli. fMRI and drift diffusion models were used to explore the neural representations of perceptual criteria in decision making. The specific focus was on the relative engagement of perceptual- and decision-related neural systems in response to adjustments in perceptual criteria. Human participants classified visual stimuli as big or small based on criteria of different sizes, which effectively biased their choices toward one response over the other. A drift diffusion model was fit to the behavioral data to extract estimates of stimulus size, criterion size, and difficulty for each participant and condition. These parameter values were used as modulated regressors to create a highly constrained model for the fMRI analysis that accounted for several components of the decision process. The results show that perceptual criteria values were reflected by activity in left inferior temporal cortex, a region known to represent objects and their physical properties, whereas stimulus size was reflected by activation in occipital cortex. A frontoparietal network of regions, including dorsolateral prefrontal cortex and superior parietal lobule, corresponded to the decision variables resulting from the downstream stimulus-criterion comparison, independent of stimulus type. The results provide novel evidence that perceptual criteria are represented in stimulus space and serve as inputs to be compared with the presented stimulus, recruiting a common network of decision regions shown to be active in other simple decisions. This work advances our understanding of the neural correlates of decision flexibility and adjustments of behavioral bias. PMID:23175825
White, Corey N; Mumford, Jeanette A; Poldrack, Russell A
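The drift diffusion model fit to the behavioral data can be illustrated with a minimal single-trial simulation. The parameter values and the `simulate_ddm` helper are illustrative assumptions, not the fitted model from the study: noisy evidence accumulates at a mean rate (the drift) until it crosses one of two decision bounds.

```python
import random

def simulate_ddm(drift, threshold=1.0, dt=0.001, noise=1.0, seed=0):
    """One random-walk trial of a drift diffusion model: evidence
    accumulates with mean rate `drift` plus Gaussian noise until it
    crosses +threshold (choice A) or -threshold (choice B).
    Returns (choice, decision_time)."""
    rng = random.Random(seed)
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0, 1)
        t += dt
    return ("A" if x > 0 else "B"), t
```

A larger drift (stronger or easier stimuli) produces faster, more accurate choices; shifting the start point toward one bound mimics the criterion-induced response bias described above.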
Increasing perceptual load reduces the processing of visual stimuli outside the focus of attention, but the mechanism underlying these effects remains unclear. Here we tested an account attributing the effects of perceptual load to modulations of visual cortex excitability. In contrast to stimulus competition accounts, which propose that load…
Carmel, David; Thorne, Jeremy D.; Rees, Geraint; Lavie, Nilli
In a variety of conflict paradigms, target and distractor stimuli are defined in terms of perceptual features. Interference evoked by distractor stimuli tends to be reduced when the ratio of congruent to incongruent trials is decreased, suggesting conflict-induced perceptual filtering (i.e., adjusting the processing weights assigned to stimuli…
Wendt, Mike; Luna-Rodriguez, Aquiles; Jacobsen, Thomas
To enhance task performance in a partially structured environment, enhancement of teleoperation was proposed by introducing autonomous behaviors. Such autonomy is implemented based on a reactive robotic architecture, where reactive motor agents that directly couple sensory inputs and motor actions become the building blocks. To this end, presented in this paper is a perceptual basis for the motor agents. The perceptual basis consists of perceptual agents that extract environmental information from a structured light vision system and provide action-oriented perception for the corresponding motor agents. Rather than performing general scene reconstruction, a perceptual agent directly provides the motion reference for the motor behavior. Various sensory mechanisms--sensor fission, fusion, and fashion--become the basic building blocks of the perception process. Since perception is a process deeply intertwined with motor actions, active perception may also incorporate motor behaviors as an integral perceptual process.
Park, Y. S.; Ewing, T. F.; Boyle, J. M.; Yule, T. J.
The criteria by which optically variable devices are judged are aesthetic, semantic, security, ergonomic, and physical/chemical. This paper addresses ergonomic aspects which relate to the human vision and perceptual-cognitive system. Applying some pertinent rules may greatly help to improve the visual information of the image for easier, more straightforward reception of a persistent security message. We consider two important aspects of the human visual system that help to determine the ergonomic response to visual displays created using optical diffraction. The first aspect treats the retinal source of information, which is the retinal signal produced when an image of the external world is projected on the retina. The other aspect is the underlying information-processing mechanism of our brains and its constructive operations, which yields the final perceptual information. In this paper we consider information-processing methods hidden in the biology of our cognitive system. Findings on the relationship between physiology and psychology, sensory results, the activities of the optic pathway, and subjective brightness sensations can be applied directly in designing images. Some effects are demonstrated by video tape.
Moser, Jean-Frederic; Staub, Rene; Tompkin, Wayne R.
Perceptual filling-in occurs when structures of the visual system interpolate information across regions of visual space where that information is physically absent. It is a ubiquitous and heterogeneous phenomenon, which takes place in different forms almost every time we view the world around us, such as when objects are occluded by other objects or when they fall behind the blind spot. Yet, to date, there is no clear framework for relating these various forms of perceptual filling-in. Similarly, whether these and other forms of filling-in share common mechanisms is not yet known. Here we present a new taxonomy to categorize the different forms of perceptual filling-in. We then examine experimental evidence for the processes involved in each type of perceptual filling-in. Finally, we use established theories of general surface perception to show how contextualizing filling-in using this framework broadens our understanding of the possible shared mechanisms underlying perceptual filling-in. In particular, we consider the importance of the presence of boundaries in determining the phenomenal experience of perceptual filling-in.
This microscope video shows how live Listeria move via actin filaments in an infected cell. This video is also featured on the DVD 2000 and Beyond: Confronting the Microbe Menace, available free from HHMI. This video is one minute and 7 seconds in length, and available in MOV (6 MB) and WMV (8 MB). All Infectious Disease videos are located at: http://www.hhmi.org/biointeractive/disease/video.html.
Dr. Brett Finlay (Howard Hughes Medical Institute)
In recent years, advertisers and magazine editors have been widely criticized for taking digital photo retouching to an extreme. Impossibly thin, tall, and wrinkle- and blemish-free models are routinely splashed onto billboards, advertisements, and magazine covers. The ubiquity of these unrealistic and highly idealized images has been linked to eating disorders and body image dissatisfaction in men, women, and children. In response, several countries have considered legislating the labeling of retouched photos. We describe a quantitative and perceptually meaningful metric of photo retouching. Photographs are rated on the degree to which they have been digitally altered by explicitly modeling and estimating geometric and photometric changes. This metric correlates well with perceptual judgments of photo retouching and can be used to objectively judge by how much a retouched photo has strayed from reality. PMID:22123980
Certain simple visual displays consisting of moving 2-D geometric shapes can give rise to percepts with high-level properties such as causality and animacy. This article reviews recent research on such phenomena, which began with the classic work of Michotte and of Heider and Simmel. The importance of such phenomena stems in part from the fact that these interpretations seem to be largely perceptual in nature - to be fairly fast, automatic, irresistible and highly stimulus driven - despite the fact that they involve impressions typically associated with higher-level cognitive processing. This research suggests that just as the visual system works to recover the physical structure of the world by inferring properties such as 3-D shape, so too does it work to recover the causal and social structure of the world by inferring properties such as causality and animacy. PMID:10904254
This paper proposes a robust video fingerprinting method based on 2-Dimensional Oriented Principal Component Analysis (2D-OPCA) of affine covariant regions. The goal of video fingerprinting is to identify a video clip using perceptual features called fingerprints. In the proposed method, to achieve the robustness against geometric transformations, fingerprints are extracted from local regions covariant with a class of
This paper proposes a novel approach to stitch videos fast and with high quality. In general, scene depth is frequently varying for dynamic video content. For example, moving foreground objects will move closely or far from a camera. The scene depth will be changing according to changed scene content. Therefore, accurate projection transform estimation corresponding to current scene depth is
To extend and clarify the nature of the perceptual processes used by sport experts to perceive schematic sport information, two experiments used schematic football diagrams that varied in structure (structured vs. unstructured) and complexity (complex vs. easy). The primary objective was to examine and characterize the nature of the perceptual structures (chunks) that are initially encoded, stored, and subsequently retrieved. In Experiment 1, compared with nonexperts, experts recalled larger perceptual structures following the initial stimulus presentation of structured stimuli only, replicating the recall findings of previous research in other skill domains. Experiment 2 used a long-term memory recognition task and a sorting task. Experts had superior recall and recognition of structured stimuli only, along with more discriminating sorting criteria of perceptual structures within long-term memory. This suggests that experts possess a highly refined semantic network or organized, structured schematic information. This research extends and clarifies the similarities between the perceptual processes of experts in sport (i.e., football) and experts in skill domains that require obvious cognitive involvement (i.e., chess). The results are discussed with reference to the perceptual and conceptual chunking hypotheses. It is proposed that sport experts' knowledge of a conceptual category enables them to retrieve elements using a "generate-and-test process." PMID:1862820
Interactive video and television viewers should have the power to control their viewing position. To make this a reality, we introduce the concept of Immersive Video, which employs computer vision and computer graphics technologies to provide remote users a sense of complete immersion when viewing an event. Immersive Video uses multiple videos of an event, captured from different perspectives, to
Interactive video and television viewers should have the power to control their viewing position. To realize this, we introduce the concept of immersive video, which employs computer vision and computer graphics technologies to provide viewers of live events a sense of total immersion by providing the viewer with a “virtual camera”. Immersive video uses multiple videos of an event, captured
Saied Moezzi; Arun Katkere; D. Y. Kuramura; Ramesh Jain
Last school year, I had a web link emailed to me entitled "A Dashboard Physics Lesson." The link, created and posted by Dale Basier on his "Lab Out Loud" blog, illustrates video of a car's speedometer synchronized with video of the road. These two separate video streams are compiled into one video that students can watch and analyze. After seeing…
The current investigation was conducted to elucidate whether errors of commission in the Sustained Attention to Response Task (SART) are indicators of perceptual or motor decoupling. Twenty-eight participants completed SARTs with motor and perceptual aspects of the task manipulated. The participants completed four different SART blocks whereby stimuli location uncertainty and stimuli acquisition were manipulated. In previous studies of more traditional sustained attention tasks stimuli location uncertainty reduces sustained attention performance. In the case of the SART the motor manipulation (stimuli acquisition), but not the perceptual manipulation (stimuli location uncertainty) significantly reduced commission errors. The results suggest that the majority of SART commission errors are likely to be indicators of motor decoupling not necessarily perceptual decoupling. PMID:23838467
An optimal agent will base judgments on the strength and reliability of decision-relevant evidence. However, previous investigations of the computational mechanisms of perceptual judgments have focused on integration of the evidence mean (i.e., strength), and overlooked the contribution of evidence variance (i.e., reliability). Here, using a multielement averaging task, we show that human observers process heterogeneous decision-relevant evidence more slowly and less accurately, even when signal strength, signal-to-noise ratio, category uncertainty, and low-level perceptual variability are controlled for. Moreover, observers tend to exclude or downweight extreme samples of perceptual evidence, as a statistician might exclude an outlying data point. These phenomena are captured by a probabilistic optimal model in which observers integrate the log odds of each choice option. Robust averaging may have evolved to mitigate the influence of untrustworthy evidence in perceptual judgments.
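The downweighting of outlying evidence described above resembles robust statistics. As a loose illustration only (not the paper's log-odds integration model), a trimmed mean discards the most extreme samples before averaging, much as the observers discounted extreme perceptual evidence:

```python
import numpy as np

def trimmed_average(samples, trim_frac=0.2):
    """Drop the most extreme samples (trim_frac of the total, split
    between both tails) before averaging -- a crude analogue of
    downweighting outlying perceptual evidence."""
    x = np.sort(np.asarray(samples, dtype=float))
    k = int(len(x) * trim_frac / 2)
    return float(x[k:len(x) - k].mean() if k else x.mean())
```

Unlike the plain mean, this estimate barely moves when a single wildly discrepant sample is added, which is exactly the robustness property the averaging behavior exhibits.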
This study evaluated visual perceptual grouping in schizophrenia to test the hypothesis that the disorganization syndrome in schizophrenia is related to a deficit in cognitive coordination. Perceptual grouping was examined with three psychophysically well-controlled tasks in patients with disorganized schizophrenia (n=11), non-disorganized schizophrenia (n=24), psychotic disorders other than schizophrenia (n=31) and non-psychotic psychiatric disorders (n=35). These measures assessed processing of
Peter J. Uhlhaas; William A. Phillips; Gordon Mitchell; Steven M. Silverstein
Humans can rapidly extract object and category information from an image despite surprising limitations in detecting changes to the individual parts of that image. In this article we provide evidence that the construction of a perceptual whole, or Gestalt, reduces awareness of changes to the parts of this object. This result suggests that the…
Visual perception involves the grouping of individual elements into coherent patterns, such as object representations, that reduce the descriptive complexity of a visual scene. The computational and physiological bases of this perceptual grouping remain poorly understood. We discuss recent fMRI evidence from our laboratory where we measured activity in a higher object processing area (LOC), and in primary visual cortex (V1)
Determining the acceleration of a free-falling object due to gravity is a standard experiment in physics. Different methods to do this have been developed over the years. This article discusses the use of video-analysis tools as another method. If there is a video available and a known scale it is possible to analyse the motion. The use of video…
If people represent concepts with perceptual simulations, two predictions follow in the property verification task (e.g., Is face a property of GORILLA?). First, perceptual variables such as property size should predict the performance of neutral subjects, because these variables determine the ease of processing properties in perceptual simulations (i.e., perceptual effort). Second, uninstructed neutral subjects should spontaneously construct simulations to verify properties and therefore perform similarly to imagery subjects asked explicitly to use images (i.e., instructional equivalence). As predicted, neutral subjects exhibited both perceptual effort and instructional equivalence, consistent with the assumption that they construct perceptual simulations spontaneously to verify properties. Notably, however, this pattern occurred only when highly associated false properties prevented the use of a word association strategy. In other conditions that used unassociated false properties, the associative strength between concept and property words became a diagnostic cue for true versus false responses, so that associative strength became a better predictor of verification than simulation. This pattern indicates that conceptual tasks engender mixtures of simulation and word association, and that researchers must deter word association strategies when the goal is to assess conceptual knowledge. PMID:15190717
An adequate ontology of color must face the empirical facts about perceptual variation. In this paper I begin by reviewing a range of data about perceptual variation, and showing how they tell against color physicalism and motivate color relationalism. Next I consider a series of objections to the argument from perceptual variation, and argue that they are unpersuasive.
Immersive multimedia requires not only realistic visual imagery but also a perceptually-accurate aural experience. A sound field may be presented simultaneously to a listener via a loudspeaker rendering system using the direct sound from acoustic sources as well as a simulation or "auralization" of room acoustics. Beginning with classical Wave-Field Synthesis (WFS), improvements are made to correct for asymmetries in loudspeaker array geometry. Presented is a new Spatially-Equalized WFS (SE-WFS) technique to maintain the energy-time balance of a simulated room by equalizing the reproduced spectrum at the listener for a distribution of possible source angles. Each reproduced source or reflection is filtered according to its incidence angle to the listener. An SE-WFS loudspeaker array of arbitrary geometry reproduces the sound field of a room with correct spectral and temporal balance, compared with classically-processed WFS systems. Localization accuracy of human listeners in SE-WFS sound fields is quantified by psychoacoustical testing. At a loudspeaker spacing of 0.17 m (equivalent to an aliasing cutoff frequency of 1 kHz), SE-WFS exhibits a localization blur of 3 degrees, nearly equal to real point sources. Increasing the loudspeaker spacing to 0.68 m (for a cutoff frequency of 170 Hz) results in a blur of less than 5 degrees. In contrast, stereophonic reproduction is less accurate with a blur of 7 degrees. The ventriloquist effect is psychometrically investigated to determine the effect of an intentional directional incongruence between audio and video stimuli. Subjects were presented with prerecorded full-spectrum speech and motion video of a talker's head as well as broadband noise bursts with a static image. The video image was displaced from the audio stimulus in azimuth by varying amounts, and the perceived auditory location measured. 
A strong bias was detectable for small angular discrepancies between audio and video stimuli for separations of less than 8 degrees for speech and less than 4 degrees with a pink noise burst. The results allow for the density of WFS systems to be selected from the required localization accuracy. Also, by exploiting the ventriloquist effect, the angular resolution of an audio rendering may be reduced when combined with spatially-accurate video.
Highly emotional events are associated with vivid "flashbulb" memories. Here we examine whether the flashbulb metaphor characterizes a previously unknown emotion-enhanced vividness (EEV) during initial perceptual experience. Using a magnitude estimation procedure, human observers estimated the relative magnitude of visual noise overlaid on scenes. After controlling for computational metrics of objective visual salience, emotional salience was associated with decreased noise, or heightened perceptual vividness, demonstrating EEV, which predicted later memory vividness. Event-related potentials revealed a posterior P2 component at ~200 ms that was associated with both increased emotional salience and decreased objective noise levels, consistent with EEV. Blood oxygenation level-dependent response in the lateral occipital complex (LOC), insula, and amygdala predicted online EEV. The LOC and insula represented complementary influences on EEV, with the amygdala statistically mediating both. These findings indicate that the metaphorical vivid light surrounding emotional memories is embodied directly in perceptual cortices during initial experience, supported by cortico-limbic interactions. PMID:22895705
Todd, Rebecca M; Talmi, Deborah; Schmitz, Taylor W; Susskind, Josh; Anderson, Adam K
Background Human resolution for object size is typically determined by psychophysical methods that are based on conscious perception. In contrast, grasping of the same objects might be less conscious. It is suggested that grasping is mediated by mechanisms other than those mediating conscious perception. In this study, we compared the visual resolution for object size of the visuomotor and the perceptual system. Methodology/Principal Findings In Experiment 1, participants discriminated the size of pairs of objects once through perceptual judgments and once by grasping movements toward the objects. Notably, the actual size differences were set below the Just Noticeable Difference (JND). We found that grasping trajectories reflected the actual size differences between the objects regardless of the JND. This pattern was observed even in trials in which the perceptual judgments were erroneous. The results of an additional control experiment showed that these findings were not confounded by task demands. Participants were not aware, therefore, that their size discrimination via grasp was veridical. Conclusions/Significance We conclude that human resolution is not fully tapped by perceptually determined thresholds. Grasping likely exhibits greater resolving power than people usually realize.
Ganel, Tzvi; Freud, Erez; Chajut, Eran; Algom, Daniel
Priming effects in perceptual tests of implicit memory are assumed to be perceptually specific. Surprisingly, changing object colors from study to test did not diminish priming in most previous studies. However, these studies used implicit tests that are based on object identification, which mainly depends on the analysis of the object shape and therefore operates color-independently. The present study shows
In recent years, many experiments have demonstrated that optic flow is sufficient for visually controlled action, with the suggestion that perceptual representations of 3-D space are superfluous. In contrast, recent research in our lab indicates that some visually controlled actions, including some thought to be based on optic flow, are indeed mediated by perceptual representations. For example, we have demonstrated that people are able to perform complex spatial behaviors, like walking, driving, and object interception, in virtual environments which are rendered visible solely by cyclopean stimulation (random-dot cinematograms). In such situations, the absence of any retinal optic flow that is correlated with the objects and surfaces within the virtual environment means that people are using stereo-based perceptual representations to perform the behavior. The fact that people can perform such behaviors without training suggests that the perceptual representations are likely the same as those used when retinal optic flow is present. Other research indicates that optic flow, whether retinal or a more abstract property of the perceptual representation, is not the basis for postural control, because postural instability is related to perceived relative motion between self and the visual surroundings rather than to optic flow, even in the abstract sense.
Loomis, Jack M.; Beall, Andrew C.; Kelly, Jonathan W.; Macuga, Kristen L.
The metaphor of film and TV permeates the design of software to support video on the PC. Simply transplanting the non-interactive, sequential experience of film to the PC fails to exploit the virtues of the new context. Video on the PC should be interactive and non-sequential. This paper experiments with a variety of tools for using video on the PC that exploit the new context of the PC. Some features are more successful than others. Applications that use these tools are explored, including primarily the home video archive but also streaming video servers on the Internet. The ability to browse, edit, abstract and index large volumes of video content such as home video and corporate video is a problem without an appropriate solution in today's market. The current tools available are complex, unfriendly video editors, requiring hours of work to prepare a short home video, far more work than a typical home user can be expected to provide. Our proposed solution treats video like a text document, providing functionality similar to a text editor. Users can browse, interact, edit and compose one or more video sequences with the same ease and convenience as handling text documents. With this level of text-like composition, we call what is normally a sequential medium a 'video document'. An important component of the proposed solution is shot detection, the ability to detect when a shot started or stopped. When combined with a spreadsheet of key frames, the shots become a grid of pictures that can be manipulated and viewed in the same way that a spreadsheet can be edited. Multiple video documents may be viewed, joined, manipulated, and seamlessly played back. Abstracts of unedited video content can be produced automatically to create novel video content for export to other venues. Edited and raw video content can be published to the net or burned to a CD-ROM with a self-installing viewer for Windows 98 and Windows NT 4.0.
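Shot detection of the kind described can be approximated by comparing gray-level histograms of consecutive frames. The threshold, bin count, and `detect_shots` helper below are illustrative choices, not the paper's algorithm: a large histogram change between adjacent frames is treated as a cut.

```python
import numpy as np

def detect_shots(frames, threshold=0.4, bins=16):
    """Flag shot boundaries where consecutive frames' gray-level
    histograms differ by more than `threshold` (histograms are
    L1-normalized; distance is half the L1 norm, in [0, 1])."""
    boundaries = []
    prev = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / hist.sum()
        if prev is not None and 0.5 * np.abs(hist - prev).sum() > threshold:
            boundaries.append(i)   # frame i starts a new shot
        prev = hist
    return boundaries
```

The detected boundaries, paired with a key frame per shot, give exactly the grid-of-pictures view the paper builds its spreadsheet-style editing on.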
Periodic bimanual movements are often the focus of studies of the basic organizational principles of human actions. In such movements there is a typical spontaneous tendency towards mirror symmetry. Even involuntary slips from asymmetrical movement patterns into symmetry occur, but not vice versa. Traditionally, this phenomenon has been interpreted as a tendency towards co-activation of homologous muscles, probably originating in motoric neuronal structures. Here we provide evidence contrary to this widespread assumption. We show for two prominent experimental models-bimanual finger oscillation and bimanual four-finger tapping-that the symmetry bias is actually towards spatial, perceptual symmetry, without regard to the muscles involved. We suggest that spontaneous coordination phenomena of this kind are purely perceptual in nature. In the case of a bimanual circling model, our findings reveal that highly complex, even 'impossible' movements can easily be performed with only simple visual feedback. A 'motoric' representation of the performed perceptual oscillation patterns is not necessary. Thus there is no need to translate such a 'motoric' into a 'perceptual' representation or vice versa, using 'internal models' (ref. 29). We suggest that voluntary movements are organized by way of a representation of the perceptual goals, whereas the corresponding motor activity, of sometimes high complexity, is spontaneously and flexibly tuned in. PMID:11689944
The quality assessment of impaired stereoscopic video is a key element in designing and deploying advanced immersive media distribution platforms. A widely accepted quality metric to measure impairments of stereoscopic video is, however, still to be developed. As a step toward finding a solution to this problem, this paper proposes a full reference stereoscopic video quality metric to measure the perceptual quality of compressed stereoscopic video. A comprehensive set of subjective experiments is performed with 14 different stereoscopic video sequences, which are encoded using both the H.264 and high efficiency video coding compliant video codecs, to develop a subjective test results database of 116 test stimuli. The subjective results are analyzed using statistical techniques to uncover different patterns of subjective scoring for symmetrically and asymmetrically encoded stereoscopic video. The subjective result database is subsequently used for training and validating a simple but effective stereoscopic video quality metric considering heuristics of binocular vision. The proposed metric performs significantly better than state-of-the-art stereoscopic image and video quality metrics in predicting the subjective scores. The proposed metric and the subjective result database will be made publicly available, and it is expected that the proposed metric and the subjective assessments will have important uses in advanced 3D media delivery systems. PMID:23771337
De Silva, Varuna; Arachchi, Hemantha Kodikara; Ekmekcioglu, Erhan; Kondoz, Ahmet
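Validation of an objective quality metric against a subjective database of the kind described is conventionally reported as the Pearson linear correlation between the metric's predictions and the mean opinion scores. The function name and interface below are assumptions, but the computation is the standard figure of merit:

```python
import numpy as np

def pearson_correlation(predicted, mos):
    """Linear correlation between objective metric outputs and subjective
    mean opinion scores (MOS); values near +1 indicate the metric tracks
    the subjective ratings well."""
    p = np.asarray(predicted, dtype=float)
    m = np.asarray(mos, dtype=float)
    p -= p.mean()
    m -= m.mean()
    return float((p * m).sum() / np.sqrt((p * p).sum() * (m * m).sum()))
```

Spearman rank correlation and RMSE after a monotonic fit are typically reported alongside it when comparing metrics on such databases.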
George Nauck of ENCORE invented and markets the Advanced Range Performance (ARPM) Video Golf System for measuring the result of a golf swing. After Nauck requested their assistance, Marshall Space Flight Center scientists suggested video and image process...
Salmonella are a common bacteria associated with food poisoning. Dr. Finlay shows live Salmonella under the microscope to demonstrate how far and fast they can move. This video is also featured on the DVD 2000 and Beyond: Confronting the Microbe Menace, available free from HHMI. This video is 37 seconds in length, and available in MOV (13 MB) and WMV (18 MB). All Infectious Disease videos are located at: http://www.hhmi.org/biointeractive/disease/video.html.
Howard Hughes Medical Institute
In the past few years, the video game explosion has permeated our everyday culture, particularly among the younger generation. Video games once found only in arcades are now being manufactured for home use. This study seeks to analyze the value of video games as an activity program for geriatric populations in skilled nursing home facilities. Areas examined include resident
Research on human infants has begun to shed light on early-developing processes for segmenting perceptual arrays into objects. Infants appear to perceive objects by analyzing three-dimensional surface arrangements and motions. Their perception does not accord with a general tendency to maximize figural goodness or to attend to nonaccidental geometric relations in visual arrays. Object perception does accord with principles
A perceptual analogue of Kelley's augmentation principle was created in animated films depicting the movements of two objects toward a goal. Experiment 1 examined children's causal attributions in the presence and absence of inhibitory causes. Experiment 2 investigated children's causal attributions in the presence of inhibitory causes of…
This study examined numerosity comparison in 3-year-old children. Predictions derived from the analog numerical model and the object-file model were contrasted by testing the effects of size and ratio between numerosities to be compared. Different perceptual controls were also introduced to evaluate the hypothesis that comparison by preschoolers…
The Individually Prescribed Instruction (IPI) Model developed by Bolvin and Glaser (1968) is applied to a perceptual development curriculum for children manifesting learning disabilities. The Model utilizes criterion referenced tests for behavioral objectives in four areas: general motor, visual motor, auditory motor, and integrative. Eight units…
Basic aspects of magnitude (such as luminance contrast) are directly represented by sensory representations in early visual areas. However, it is unclear how symbolic magnitudes (such as Arabic numerals) are represented in the brain. Here we show that symbolic magnitude affects binocular rivalry: perceptual dominance of numbers and objects of known size increases with their magnitude. Importantly, variations in symbolic magnitude acted like variations in luminance contrast: we found that a numerical increment of one led to the same increase in perceptual dominance as a contrast increment of 0.32%. Our results support the claim that magnitude is extracted automatically, since the increase in perceptual dominance came about in the absence of a magnitude-related task. Our findings show that symbolic, acculturated knowledge about magnitude interacts with visual perception and affects perception in a manner similar to lower-level aspects of magnitude such as luminance contrast. PMID:21316649
Stochastic accumulator models account for response time in perceptual decision-making tasks by assuming that perceptual evidence accumulates to a threshold. The present investigation mapped the firing rate of frontal eye field (FEF) visual neurons onto perceptual evidence and the firing rate of FEF movement neurons onto evidence accumulation to…
Purcell, Braden A.; Heitz, Richard P.; Cohen, Jeremiah Y.; Schall, Jeffrey D.; Logan, Gordon D.; Palmeri, Thomas J.
The structuring of the sensory scene (perceptual organization) profoundly affects what we perceive, and is of increasing clinical interest. In both vision and audition, many cues have been identified that influence perceptual organization, but little is known about its neural basis. Previous studies have suggested that auditory cortex may play a role in auditory perceptual organization (also called
Performance in perceptual tasks often improves with practice. This effect is known as "perceptual learning," and it has been the source of a great deal of interest and debate over the course of the last century. Here, we consider the effects of perceptual learning within the context of signal detection theory. According to signal detection…
Gold, Jason M.; Sekuler, Allison B.; Bennett, Patrick J.
To determine the effect of perceptual anticipation upon reaction time, two different types of experiment were carried out. In the first a skilled response had occasionally to be altered at a given point after a variable warning period. In the second the subject had to react to two auditory signals separated by a short time interval which was systematically varied,
Reaction time is significantly higher to homosexual and sexual words than to neutral words for both high and low scorers on a scale for manifest attitudes toward homosexuality. Fraternity men and applied majors score higher on this scale than nonfraternity men and liberal arts majors. Discussion suggests stereotyping rather than perceptual defense.
Two relevant dimensions are revealed within which developmental patterns of perceptual organization might be investigated. Within the local-integrative dimension, employing a contour integration task, we found indications that spatial integration develops slowly. We also found reduced contextual modulation of a local target in children employing the Ebbinghaus illusion. Within the action-perception dimension, we hypothesize a relatively slow development of the
Most current natural language processing (NLP) systems are built using statistical learning algorithms trained on large annotated corpora which can be expensive and time-consuming to collect. In contrast, humans can learn language through exposure to linguistic input in the context of a rich, relevant, perceptual environment. If a machine learning system can acquire language in a similar manner
This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that ...
A panorama video server system has been developed. This system produces a continuous panoramic view of the entire surrounding area in real time and allows multiple users to select and view visual fields independently. A significant feature of the system is that each user can select the visual field he or she wants to see at the same time. This new system is composed of video cameras, video signal conversion units, video busses, and visual field selection units. It can be equipped with up to 24 video cameras. The most appropriate camera arrangement can be decided by considering both the objects to be taken and the viewing angle of the cameras. The visual field selection unit picks up the required image data from video busses, on which all of the video data is provided. The number of users who can access simultaneously depends only on the number of visual field selection units. To smoothly connect two images captured by different cameras, a luminance-compensating function and a geometry-compensating function are included. This system has many interesting applications, such as in the distribution of beautiful scenery, sports, and monitoring.
A new television system based on the Autosophy information theory is now being developed for Internet video applications. Autosophy communication is already widely used on the Internet, with current applications including data compression in V.42bis modems, and the lossless still image compression standards GIF and TIF. Now Autosophy Internet video is being developed and is ready for demonstration. In conventional television, bit rates are determined entirely by screen size, resolution, and scanning rates. The images shown on the screen are irrelevant, such that random noise video requires the same bit rate as any other video content. In the new Autosophy-based television, in contrast, bit rates are determined entirely by the video content, essentially motion and complexity within the images. A very high degree of visually lossless video compression is possible because only moving portions of the video are transmitted. Transmitted codes represent multi-pixel image clusters found in a pre-grown hyperspace library. The system can dynamically reduce resolution of fast-moving objects when necessary to accommodate bandwidth restrictions. Ideally suited to the Internet environment, the new television also features high resistance to delayed or dropped packets, a universal hardware-independent communication format, and optional codebook encryption for secure communications.
The traditional study of associations is part of a more inclusive problem, one that concerns the relation between the objective structure of the stimulus and the coherence of its parts, as evident in perception and memory. The classical association experiment has restricted observation to a limited phase of this problem. It has concentrated on the coherence established between pairs of
According to the object-based view, visual attention can be deployed to "objects" or perceptual units, regardless of spatial locations. Recently, however, the notion of object has also been extended to the auditory domain, with some authors suggesting possible interactions between visual and auditory objects. Here we show that task-irrelevant…
The following shape segmentation problem is addressed: find the part decomposition of a 3D object that accounts for an observed pattern of similarities among several of the object's views. This represents the inverse, ill-posed version of the direct problem of computing perceptual similarities among object views when the object parts are known. The problem is solved by inverting a proposed
Alzheimer's disease (AD) is often accompanied by impaired object recognition, thereby reducing the ability to recognize common objects and familiar faces. Impaired recognition may stem from decreased efficacy in integrating visual information. Studies of perceptual abnormalities in AD indicate an impairment in organizing elements of the visual scene, thereby confusing components of individual forms. This type of impairment is consistent with the characteristics of neural loss, which impact cortical integration. To examine the extent to which perceptual organization is impaired in AD, psychophysical measurements were made of visual perceptual grouping based upon spatial relationships in a group of AD patients and demographically matched elderly control subjects. A comparison was also made between young and elderly control subjects to evaluate the effects of aging on these capacities. Deficits in perceptual organization were found for a subgroup of AD patients, which corresponded to impairment on facial recognition. A less profound functional decline was found for the elderly control group. The degree of impairment for AD subjects did not correlate to level of dementia, but instead appears to be idiosyncratic to individual patients. These results are consistent with impaired integrative function in AD, the degree of which reflects individual differences in the regional distribution of neuropathological changes. PMID:12719635
Kurylo, Daniel D; Allan, Walter C; Collins, T Edward; Baron, Joshua
This paper discusses the use of video analysis software and computer-generated animations for student activities. The use of artificial video affords the opportunity for students to study phenomena for which a real video may not be easy or even possible to procure, using analysis software with which the students are already familiar. We will present three example activities: two lab activity ``add ons'' as well as a complete virtual laboratory exercise.
The ability of 16 AD patients and 16 age-matched control Ss to discriminate degraded forms was compared. Also examined were the effects of aging on perceptual organization by comparison of performance of normal Ss ranging in age from 20 to 86 years. Ss discriminated 2 forms, a circle and a square, each composed of randomly distributed dots concurrently embedded in visual noise. By means of a forced-choice procedure, the threshold signal-to-noise ratios at 4 levels of figure degradation were obtained, each presented at 3 durations. Performance by the normal Ss did not vary with age for long-duration stimuli, but did decline with age for briefly presented stimuli. Relative to age-matched control Ss, AD patients had significantly elevated thresholds at all form densities. Disruption of visual processing at the level of perceptual organization is likely a contributing factor to impairment of high-order visual function. PMID:7893427
Grounded in scholarship on both the perceptual and behavioral components of the third-person effect, the present experimental study examined the effects of perceived impact of political parody videos on self and on others, by varying the perceived intent of the video producer and perceived level of exposure. Building on previous research on the behavioral consequences of such presumed influence, we
An investigation was made of the time course of perceptual grouping that is based on two qualitatively different spatial relationships: proximity and alignment. An index of grouping capacity was used to assess the processing time required before a backward pattern mask interfered with grouping. Stimuli consisted of bistable arrays of disjunct dots that were followed by a mask. Grouping cues,
Past studies have revealed that encountering negative events interferes with cognitive processing of subsequent stimuli. The present study investigates whether negative events affect semantic and perceptual processing differently. Presentation of negative pictures produced slower reaction times than neutral or positive pictures in tasks that require semantic processing, such as natural or man-made judgments about drawings of objects, commonness judgments about objects, and categorical judgments about pairs of words. In contrast, negative picture presentation did not slow down judgments in subsequent perceptual processing (e.g., color judgments about words, size judgments about objects). The subjective arousal level of negative pictures did not modulate the interference effects on semantic or perceptual processing. These findings indicate that encountering negative emotional events interferes with semantic processing of subsequent stimuli more strongly than perceptual processing, and that not all types of subsequent cognitive processing are impaired by negative events. PMID:22142207
Virtual reality technology may provide new options for conducting perceptual-motor assessment within simulated 3D environments for persons with a wide range of disabilities. This paper outlines our work developing a series of game-like VR scenarios to assess and rehabilitate eye-hand coordination, range of motion and other relevant perceptual-motor activities. Our efforts have focused on building engaging game-based stereoscopic graphic scenarios that allow clients to participate in perceptual-motor rehabilitation by interacting with 3D stimuli within a full 360-degree space using a head mounted display or by way of a "face-forward" format using 3D projection displays. Exploratory work using multiple video sensors to detect and track 3D body motion, identify body postures and quantify motor performance is also described. PMID:17271398
Rizzo, A A; Cohen, I; Weiss, P L; Kim, J G; Yeh, S C; Zali, B; Hwang, J
This collection of videos, produced by Global Learning and Observations to Benefit the Environment (GLOBE), features overviews of the GLOBE program as well as a variety of science topics. These include Earth as a system, atmospheric science, water chemistry, soil science, land cover, and remote sensing. The videos can be downloaded or viewed directly from the website and are accompanied by written scripts.
The GLOBE Program, University Corporation for Atmospheric Research (UCAR)
The paper introduces a transformed spherical model to represent the color space. A circular cone with a spherical top tightly circumscribing the RGB color cube is equipped with a spherical coordinate system. Every point in the color cube is represented by three spherical coordinates, with the radius ρ measuring the distance to the origin, indicating the brightness attribute of the color, the azimuthal angle θ measuring the angle on the horizontal plane, indicating the hue attribute of the color, and the polar angle φ measuring the opening of the circular cone with the vertical axis as its center, indicating the saturation attribute of the color. Similar to the commonly used perceptual color models including the HSV model, the spherical model specifies color by describing the color attributes recognized by human vision. The conversions between the spherical model and the RGB color model are mathematically simpler than those of the HSV model, and the interpretation of the model is more intuitive. Most importantly, color changes are perceptually smoother in the spherical color model than in the existing perceptual color models.
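The abstract above describes the conversion only qualitatively. As a hedged sketch (the paper's exact axis convention is not given here, so taking the blue axis as vertical is an assumption), the standard spherical parameterization of an RGB point looks like this:

```python
import math

def rgb_to_spherical(r, g, b):
    """Illustrative RGB -> (rho, theta, phi) conversion: plain spherical
    coordinates of the RGB point. rho is the distance to the origin
    (brightness-like), theta the azimuth on the horizontal plane
    (hue-like), phi the opening angle from the vertical axis
    (saturation-like). Axis choice is an assumption, not from the paper."""
    rho = math.sqrt(r * r + g * g + b * b)
    if rho == 0.0:
        return 0.0, 0.0, 0.0          # black: angles undefined, use 0
    theta = math.atan2(g, r)          # azimuth on the horizontal plane
    phi = math.acos(b / rho)          # opening angle from the vertical axis
    return rho, theta, phi

def spherical_to_rgb(rho, theta, phi):
    """Inverse conversion. Both directions need only one sqrt, one atan2,
    one acos and a few products, with none of HSV's piecewise max/min logic."""
    r = rho * math.sin(phi) * math.cos(theta)
    g = rho * math.sin(phi) * math.sin(theta)
    b = rho * math.cos(phi)
    return r, g, b
```

The absence of case analysis in both directions is what makes such a parameterization "mathematically simpler" than HSV-style conversions.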
The development of video quality metrics requires methods for measuring perceived video quality. Most of these metrics are designed and tested using databases of images degraded by compression and scored using opinion ratings. We studied video quality preferences for enhanced images of normally-sighted participants using the method of paired comparisons with a thorough statistical analysis. Participants (n=40) made pair-wise comparisons of high definition video clips enhanced at four different levels using a commercially available enhancement device. Perceptual scales were computed with binary logistic regression to estimate preferences for each level and to provide statistical inference of the differences among levels and the impact of other variables. While moderate preference for enhanced videos was found, two unexpected effects were also uncovered: 1) participants could be broadly classified into two groups: a) those who preferred enhancement ("Sharp") and b) those who disliked enhancement ("Smooth"); and 2) enhancement preferences depended on video content; in particular, viewers preferred less enhancement for human faces. The results suggest that algorithms to evaluate image quality (at least for enhancement) may need to be adjusted or applied differentially based on video content and viewer preferences. The possible impact of similar effects on image quality of compressed video needs to be evaluated. PMID:24107400
Satgunam, Prem Nandhini; Woods, Russell L; Bronstad, P Matthew; Peli, Eli
The stimulus complexity of naturally occurring odours presents unique challenges for central nervous systems that are aiming to internalize the external olfactory landscape. One mechanism by which the brain encodes perceptual representations of behaviourally relevant smells is through the synthesis of different olfactory inputs into a unified perceptual experience — an odour object. Recent evidence indicates that the identification, categorization
A robust finding among functional neuroimaging studies on visual priming is decreased neural activity in extrastriate and inferior prefrontal cortices for the second presentation of an object relative to its first presentation. This effect can also be observed for different but perceptually similar objects that are alternative exemplars of the initially presented object (e.g. two different pencils). An unanswered question
Video is composed of audio-visual information. Providing content-based access to video data is essential for the successful integration of video into computers. Organizing video for content-based access requires the use of video metadata. This paper explores the nature of video metadata. A data model for video databases is presented based on a study of the applications of video, the
A linked perceptual class consists of two distinct perceptual classes, A' and B', the members of which have become related to each other. For example, a linked perceptual class might be composed of many pictures of a woman (one perceptual class) and the sounds of that woman's voice (the other perceptual class). In this case, any sound of the…
Perceptual quality evaluation experiments are used to assess the excellence of multimedia quality. However, these studies disregard qualitative experiential descriptions, interpretations, and impressions of quality. The goal of this paper is to identify general descriptive characteristics of experienced quality of 3D video on mobile devices. We conducted five studies in which descriptive data was collected after the psychoperceptual quality evaluation
Satu Jumisko-Pyykkö; Dominik Strohmeier; Timo Utriainen; Kristina Kunze
The general goals and aims of this project were to evaluate the use of computer training software, and in particular action video games, as a tool to train and enhance perceptual skills, visual attention, and working memory capacities. The author's p...
From phenomenological and experimental perspectives, research in schizophrenia has emphasized deficits in “higher” cognitive functions, including attention, executive function, as well as memory. In contrast, general consensus has viewed dysfunctions in basic perceptual processes to be relatively unimportant in the explanation of more complex aspects of the disorder, including changes in self-experience and the development of symptoms such as delusions. We present evidence from phenomenology and cognitive neuroscience that changes in the perceptual field in schizophrenia may represent a core impairment. After introducing the phenomenological approach to perception (Husserl, the Gestalt School), we discuss the views of Paul Matussek, Klaus Conrad, Ludwig Binswanger, and Wolfgang Blankenburg on perception in schizophrenia. These 4 psychiatrists describe changes in perception and automatic processes that are related to the altered experience of self. The altered self-experience, in turn, may be responsible for the emergence of delusions. The phenomenological data are compatible with current research that conceptualizes dysfunctions in perceptual processing as a deficit in the ability to combine stimulus elements into coherent object representations. Relationships of deficits in perceptual organization to cognitive and social dysfunction as well as the possible neurobiological mechanisms are discussed.
Various types of video can be captured with fisheye lenses; their wide field of view is particularly suited to surveillance video. However, fisheye lenses introduce distortion, and this changes as objects in the scene move, making fisheye video difficult to interpret. Current still fisheye image correction methods are either limited to small angles of view, or are strongly content dependent, and therefore unsuitable for processing video streams. We present an efficient and robust scheme for fisheye video correction, which minimizes time-varying distortion and preserves salient content in a coherent manner. Our optimization process is controlled by user annotation, and takes into account a wide set of measures addressing different aspects of natural scene appearance. Each is represented as a quadratic term in an energy minimization problem, leading to a closed-form solution via a sparse linear system. We illustrate our method with a range of examples, demonstrating coherent natural-looking video output. The visual quality of individual frames is comparable to those produced by state-of-the-art methods for fisheye still photograph correction. PMID:21788670
Wei, Jin; Li, Chen-Feng; Hu, Shi-Min; Martin, Ralph R; Tai, Chiew-Lan
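The optimization pattern described in the fisheye-correction record above (every measure a quadratic term, giving a closed-form solution via a sparse linear system) can be sketched on a toy 1-D problem. The data term and smoothness term below are illustrative stand-ins, not the paper's actual measures:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy instance of "sum of quadratic terms -> sparse linear solve":
# minimize ||x - y||^2 + lam * ||D x||^2, where D differences
# neighboring samples. Setting the gradient to zero gives the sparse
# normal equations (I + lam * D^T D) x = y.
n = 100
rng = np.random.default_rng(0)
y = np.sin(np.linspace(0.0, 3.0, n)) + 0.1 * rng.standard_normal(n)

D = sp.diags([-1.0, 1.0], [0, 1], shape=(n - 1, n))  # forward differences
lam = 5.0                                            # smoothness weight
A = sp.identity(n) + lam * (D.T @ D)                 # sparse, tridiagonal
x = spla.spsolve(A.tocsc(), y)                       # closed-form minimizer
```

Because every term is quadratic, adding a further measure only contributes another A_k^T A_k block to the system matrix; no iterative non-linear solver is needed, which is what makes the closed-form per-frame solution practical for video.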
Theories of autism have proposed that a bias towards low-level perceptual information, or a featural/surface-biased information-processing style, may compromise higher-level language processing in such individuals. Two experiments, utilizing linguistic stimuli with competing low-level/perceptual and high-level/semantic information, tested processing biases in children with autism and matched controls. Whereas children with autism exhibited superior perceptual processing of speech relative to controls,
Anna Järvinen-Pasley; Gregory L. Wallace; Franck Ramus; Francesca Happé; Pamela Heaton
A new speech analysis-synthesis approach that is based on a perceptually-motivated all-pole (PMAP) modeling is described. The main idea is to directly estimate the perceptually relevant pole locations using an auditory excitation pattern-matching method. The all-pole model is synthesized using the perceptual poles and produces improved spectral fitting. We show that the prediction residual obtained from the PMAP analysis has
When an image is viewed at varying resolutions, it is known to create discrete perceptual jumps or transitions amid the continuous intensity changes. In this paper, we study a perceptual scale-space theory which differs from the traditional image scale-space theory in two aspects. (i) In representation, the perceptual scale-space adopts a full generative model. From a Gaussian pyramid
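The Gaussian pyramid mentioned above is the standard decomposition underlying scale-space representations: repeated blur-and-downsample, commonly with a 5-tap binomial kernel. The sketch below shows the generic construction, not the paper's specific generative model:

```python
import numpy as np

BINOMIAL_5 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # approximates a Gaussian

def _blur(img, kernel=BINOMIAL_5):
    """Separable low-pass filter: convolve each row, then each column."""
    img = np.apply_along_axis(np.convolve, 1, img, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, img, kernel, mode="same")

def gaussian_pyramid(img, levels=4):
    """Repeatedly blur and keep every second sample in each direction;
    each level halves the resolution of the one below it."""
    pyramid = [img]
    for _ in range(levels - 1):
        img = _blur(img)[::2, ::2]
        pyramid.append(img)
    return pyramid
```

Viewing the levels in order simulates the varying-resolution viewing condition the abstract describes; the perceptual "jumps" occur between levels even though the underlying blur is continuous.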
The problem addressed here is the classification of videos at the highest level into pre-defined genre. The approach adopted is based on the dynamic content of short sequences (~30 secs). This paper presents two methods of extracting motion from a video sequence: foreground object motion and background camera motion. These dynamics are extracted, processed and applied to classify 3 broad
Objective: The aim of this paper is to revisit the controversial issue of the association of violent video games and aggressive behaviour. Conclusions: Several lines of evidence suggest that there is a link between exposure to violent video games and aggressive behaviour. However, methodological shortcomings of research conducted so far make several interpretations of this relationship possible. Thus, aggressive behaviour may
Educators often use concrete objects to help children understand mathematics concepts. However, findings on the effectiveness of concrete objects are mixed. The present study examined how two factors-perceptual richness and established knowledge of the objects-combine to influence children's counting performance. In two experiments, preschoolers (N = 133; Mage = 3;10) were randomly assigned to counting tasks that used one of four types of objects in a 2 (perceptual richness: high or low) × 2 (established knowledge: high or low) factorial design. Findings suggest that perceptually rich objects facilitate children's performance when children have low knowledge of the objects but hinder performance when children have high knowledge of the objects. PMID:23240867
Large numbers of CCTV cameras collect colossal amounts of video data about people and their behaviour. However, this overwhelming amount of data also causes information overload if its content is not analysed in a wider context to provide selective focus and automated alert triggering. To date, truly semantics-based video analytics systems do not exist. There is an urgent need for the development of automated systems to monitor holistically the behaviours of people, vehicles and the whereabouts of objects of interest in public space. In this work, we highlight the challenges and recent progress towards building computer vision systems for holistic video detection in a distributed network of multiple cameras based on object localisation, categorisation and tagging from different views in highly cluttered scenes.
Perceptual learning is required for olfactory function to adapt appropriately to changing odor environments. We here show that newborn neurons in the olfactory bulb are not only involved in, but necessary for, olfactory perceptual learning. First, the discrimination of perceptually similar odorants improves in mice after repeated exposure to the odorants. Second, this improved discrimination is accompanied by an elevated survival rate of newborn inhibitory neurons, preferentially involved in processing of the learned odor, within the olfactory bulb. Finally, blocking neurogenesis before and during the odorant exposure period prevents this learned improvement in discrimination. Olfactory perceptual learning is thus mediated by the reinforcement of functional inhibition in the olfactory bulb by adult neurogenesis.
We investigated whether there exists a behavioral dependency between object detection and categorization. Previous work (Grill-Spector & Kanwisher, 2005) suggests that object detection and basic-level categorization may be the very same perceptual mechanism: As objects are parsed from the background they are categorized at the basic level. In the…
Direct observations of behavior evoked by a fetish object (wet shoe) in one patient are reported. The intrinsic qualities of an object endowing it with fetish power are examined. Such qualities may be related to human perceptual preferences, a product of phylogeny, stemming from such factors as the primate interest in body parts and extracorporeal objects. The capacity to relate
An objective image quality metric can be used to compare the output of different image processing algorithms, but objective measures are not always well correlated with subjective image quality assessment; the latter implies the use of human observers, thus objective methods able to emulate the Human Visual System (HVS) better than the classical measures are preferred. In this paper a full reference objective metric, based on perceptual criteria and oriented to demosaiced images is proposed. This technique is useful to evaluate the quality of the interpolation techniques implemented in the image processing pipeline of different Digital Still Cameras (DSC).
A video flowmeter is described that is capable of specifying flow nature and pattern and, at the same time, the quantitative value of the rate of volumetric flow. An image of a determinable volumetric region within a fluid containing entrained particles is formed and positioned by a rod optic lens assembly on the raster area of a low-light level television camera. The particles are illuminated by light transmitted through a bundle of glass fibers surrounding the rod optic lens assembly. Only particle images having speeds on the raster area below the raster line scanning speed may be used to form a video picture which is displayed on a video screen. The flowmeter is calibrated so that the locus of positions of origin of the video picture gives a determination of the volumetric flow rate of the fluid.
Lord, David E. (Livermore, CA); Carter, Gary W. (Livermore, CA); Petrini, Richard R. (Livermore, CA)
Cardiopulmonary resuscitation (CPR) is an essential skill taught within undergraduate nursing programmes. At the author's institution, students must pass the CPR objective structured clinical examination (OSCE) before progressing to second year. However, some students have difficulties developing competence in CPR and evidence suggests that resuscitation skills may only be retained for several months. This has implications for practice as nurses are required to be competent in CPR. Therefore, further opportunities for students to develop these skills are necessary. An action research project was conducted with six students who were assessed by an examiner at a video-recorded mock OSCE. Students self-assessed their skills using the video and a checklist. Semi-structured interviews were conducted to compare checklist scores, and explore students' thoughts and experiences of the OSCE. The findings indicate that students may need to repeat this exercise by comparing their previous and current performances to develop both their self-assessment and CPR skills. Although there were some differences between the examiner's and student's checklist scores, all students reported the benefits of participating in this project, e.g. discussion and identification of knowledge and skills deficits, thus emphasising the benefits of formative assessments to prepare students for summative assessments and ultimately clinical practice. PMID:20149746
To study the selectivity of visual perceptual impairment in children with early brain injury, eight visual perceptual tasks (L94) were administered to congenitally disabled children both with and without risk for cerebral visual impairment (CVI). The battery comprised six object-recognition and two visuoconstructive tasks. Seven tasks were newly designed; for these, normative data are presented (age 2.75–6.50 years). Because the
Peter Stiers; Bernadette M van den Hout; Monique Haers; Ria Vanderkelen; Linda S de Vries; Onno van Nieuwenhuizen; Erik Vandenbussche
Perceptual judgments are frequently made during uncertain situations. Previous human brain imaging studies have revealed multiple cortical and subcortical areas that are activated when decision uncertainty is linked to outcome probability. However, the neural mechanisms of uncertainty modulation in different perceptual decision tasks have not been systematically investigated. Uncertainty of perceptual decision can originate either from highly similar object categories (e.g. tasks based on criterion comparison) or from noise being added to visual stimuli (e.g. tasks based on signal detection). In this study, we used functional magnetic resonance imaging (fMRI) to investigate the neural mechanisms of task-dependent modulation of uncertainty in the human brain during perceptual judgements. We observed correlations between uncertainty levels and fMRI activity in a network of areas responsible for performance monitoring and sensory evidence comparison in both tasks. These areas are associated with late stages of perceptual decision, and include the posterior medial frontal cortex, dorsal lateral prefrontal cortex, and intraparietal sulcus. When the modulation of uncertainty on the two tasks was compared, dissociable cortical networks were identified. Uncertainty in the criterion comparison task modulated activity in the left lateral prefrontal cortex related to rule retrieval. In the signal detection task, uncertainty modulated activity in higher visual processing areas thought to be sensory information 'accumulators' that are active during early stages of perceptual decision. These findings offer insights into the mechanism of information processing during perceptual decision-making. PMID:23033853
Background Segregating auditory scenes into distinct objects or streams is one of our brain's greatest perceptual challenges. Streaming has classically been studied with bistable sound stimuli, perceived alternately as a single group or two separate groups. Throughout the last decade different methodologies have yielded inconsistent evidence about the role of auditory cortex in the maintenance of streams. In particular, studies using functional magnetic resonance imaging (fMRI) have been unable to show persistent activity within auditory cortex (AC) that distinguishes between perceptual states. Results We use bistable stimuli, an explicit perceptual categorization task, and a focused region of interest (ROI) analysis to demonstrate an effect of perceptual state within AC. We find that AC has more activity when listeners perceive the split percept rather than the grouped percept. In addition, within this ROI the pattern of acoustic response across voxels is significantly correlated with the pattern of perceptual modulation. In a whole-brain exploratory test, we corroborate previous work showing an effect of perceptual state in the intraparietal sulcus. Conclusions Our results show that the maintenance of auditory streams is reflected in AC activity, directly relating sound responses to perception, and that perceptual state is further represented in multiple, higher level cortical regions.
This paper presents a computational approach to perceptual grouping in dot patterns. Detection of perceptual organization is done in two steps. The first step called the lowest level grouping, extracts the perceptual segments of dots that group together b...
Abstract: In this paper, we describe a statistical video representation and modeling scheme. Video representation schemes are needed to segment a video stream into meaningful video-objects, useful for later indexing and retrieval applications. In the proposed methodology, unsupervised clustering via Gaussian mixture modeling extracts coherent space-time regions in feature space, and corresponding coherent segments (video-regions) in the video content. A
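The Gaussian-mixture clustering step described above can be sketched with a toy expectation-maximization fit. This is an illustrative 1-D numpy version on an invented feature (two well-separated "regions"), not the authors' actual space-time feature model:

```python
import numpy as np

def gmm_em_1d(x, k=2, iters=50):
    """Fit a 1-D Gaussian mixture with EM (toy stand-in for the
    space-time feature clustering described in the abstract)."""
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))  # spread-out init
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        dens = (pi / np.sqrt(2 * np.pi * var)
                * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

# Two coherent "regions" in a 1-D feature (e.g. luminance)
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.2, 0.05, 300),
                    rng.normal(0.8, 0.05, 300)])
pi, mu, var = gmm_em_1d(x)
```

The recovered component means land near the two region centers, which is the sense in which mixture components correspond to coherent segments.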
In this paper, a multimedia data mining framework for discovering important but previously unknown knowledge such as vehicle identification, traffic flow, and the spatio-temporal relations of the vehicles at the intersections from traffic video sequences is proposed. The proposed multimedia data mining framework analyzes the traffic video sequences by using background subtraction, image/video segmentation, object tracking, and modeling with multimedia
Shu-ching Chen; Mei-ling Shyu; Chengcui Zhang; Jeff Strickrott
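The background-subtraction stage of such a pipeline can be illustrated with a per-pixel median background model, a common baseline (the paper's actual subtraction method is not specified here):

```python
import numpy as np

def moving_object_mask(frames, thresh=0.1):
    """Median-background subtraction: pixels that differ strongly from
    the per-pixel temporal median are flagged as moving objects."""
    frames = np.asarray(frames, dtype=float)
    background = np.median(frames, axis=0)  # static scene estimate
    return np.abs(frames - background) > thresh

# Toy "traffic" sequence: uniform background, a bright 2x2 blob moves
frames = np.full((5, 4, 8), 0.5)
for t in range(5):
    frames[t, 1:3, t:t+2] = 1.0  # the moving vehicle
mask = moving_object_mask(frames)
```

Because the blob occupies each pixel in only a minority of frames, the median recovers the background and the mask isolates exactly the vehicle pixels.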
Objective video quality measurement has become an important issue, as multimedia services are now widely available over the Internet and other wireless communication media. Traditionally, professional CRT monitors have been used to measure subjective video quality. However, the majority of users have LCD, plasma display panel (PDP), or consumer-grade CRT monitors. We compared the subjective video quality of various TV
We change the behavior of actors in a video. For instance, the outcome of a 100-meter race in the Olympic game can be falsified. We track objects and segment motions using a modified mean shift mechanism. The resulting video layers can be played in different speeds and at different reference points with respect to the original video. In order to
Timothy K. Shih; Nick C. Tang; Joseph C. Tsai; Hsing-ying Zhong
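The mean-shift step at the core of the tracking above can be sketched as mode seeking with a flat kernel. This is the plain textbook version, not the authors' modified mechanism:

```python
import numpy as np

def mean_shift(points, start, bandwidth=1.0, iters=30):
    """One mean-shift trajectory: repeatedly move the estimate to the
    mean of points inside a flat kernel of radius `bandwidth`."""
    x = np.asarray(start, dtype=float)
    for _ in range(iters):
        d = np.linalg.norm(points - x, axis=1)
        window = points[d < bandwidth]
        if len(window) == 0:  # no support: stop
            break
        x = window.mean(axis=0)
    return x

# A cluster of feature points around (5, 5); start the search nearby,
# as a tracker would from the previous frame's object position
rng = np.random.default_rng(0)
pts = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(200, 2))
mode = mean_shift(pts, start=(4.0, 4.0), bandwidth=1.5)
```

The trajectory climbs to the density mode, which is how a tracker re-locates the object from frame to frame.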
Two experiments were conducted in which the effects of different feedback displays on decision criterion learning were examined in a perceptual categorization task with unequal cost-benefits. In Experiment 1, immediate versus delayed feedback was combined factorially with objective versus optimal classifier feedback. Immediate versus delayed feedback had no effect. Performance improved significantly over blocks with optimal classifier feedback and remained
In this paper we introduce SitCom, a novel software utility for developing context-aware services, editing and deploying context situation models, and simulating perceptual components. SitCom, which stands for Situation Composer, is constructed as an open and extensible framework with rich graphical rendering capabilities, including 3D visualization. One of SitCom’s main goals is to simulate interactions among people and objects in
This paper presents a laboratory exercise that introduces students to the use of video analysis software and the Lenz's law demonstration. Digital techniques have proved to be very useful for the understanding of physical concepts. In particular, the availability of affordable digital video offers students the opportunity to actively engage in kinematics in introductory-level physics.1,2 By using digital video's frame-advance features and ``marking'' the position of a moving object in each frame, students are able to determine the position of an object far more precisely, at much smaller time increments than would be possible with common timing devices. Once the student collects data consisting of positions and times, these values may be manipulated to determine velocity and acceleration. There are a variety of commercial and free applications that can be used for video analysis. Because the relevant technology has become inexpensive, video analysis has become a prevalent tool in introductory physics courses.
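The analysis step the exercise describes, turning per-frame marked positions into velocity and acceleration, amounts to finite differencing. A minimal sketch with an invented free-fall data set at 30 fps:

```python
import numpy as np

# Positions "marked" frame by frame for an object in free fall
# (video at 30 fps, so dt = 1/30 s); y = 0.5*g*t^2 with g = 9.8 m/s^2
dt = 1.0 / 30.0
t = np.arange(10) * dt
y = 0.5 * 9.8 * t**2

# Central differences recover velocity and acceleration from the marks
v = np.gradient(y, dt)   # interior points give exactly g*t
a = np.gradient(v, dt)   # interior points give exactly g
```

Since central differences are exact for quadratics, the interior samples reproduce v = g·t and a = g; only the one-sided endpoint estimates deviate.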
Cross correlation analysis of digitised grey scale patterns is based on at least two images which are compared with each other. Comparison is performed by means of a two-dimensional cross correlation algorithm applied to a set of local intensity submatrices taken from the pattern matrices of the reference and comparison images in the surrounding of predefined points of interest. Established as an outstanding NDE tool for 2D and 3D deformation field analysis with a focus on micro- and nanoscale applications (microDAC and nanoDAC), the method exhibits an additional potential for far wider applications that could be used for advancing homeland security. Because the cross correlation algorithm seems in some respects to imitate some of the "smart" properties of human vision, this "field-of-surface-related" method can provide alternative solutions to some object and process recognition problems that are difficult to solve with more classic "object-related" image processing methods. Detecting differences between two or more images using cross correlation techniques can open new and unusual applications in the identification and detection of hidden objects or objects of unknown origin, in movement or displacement field analysis, and in some aspects of biometric analysis that could be of special interest for homeland security.
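The core comparison can be illustrated with zero-normalized cross-correlation of two equal-sized submatrices; this is the standard formulation, not the microDAC/nanoDAC implementation itself:

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Zero-normalized cross-correlation of two equal-sized grey-scale
    submatrices; +1.0 is a perfect match, -1.0 a perfect inversion.
    Subtracting the means makes the score brightness-invariant."""
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    return float((a * b).sum() / np.sqrt((a**2).sum() * (b**2).sum()))

rng = np.random.default_rng(0)
ref = rng.random((8, 8))          # reference submatrix
shifted = ref * 2.0 + 0.3         # same pattern, changed brightness/contrast
```

Because the score is normalized, a global brightness or contrast change leaves the correlation at 1.0, which is why the method keys on surface structure rather than absolute intensity.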
Many companies have launched their products or services online as a new business focus, but only a few of them have survived the competition and made profits. The most important key to an online business's success is to create "brand value" for the customers. Although the concept of online brand has been discussed in previous studies, there is no empirical study on the measurement of online branding. As Web 2.0 emerges to be critical to online branding, the purpose of this study was to measure Taiwan's major Web sites with a number of personality traits to build a perceptual map for online brands. A pretest identified 10 most representative online brand perceptions. The results of the correspondence analysis showed five groups in the perceptual map. This study provided a practical view of the associations and similarities among online brands for potential alliance or branding strategies. The findings also suggested that brand perceptions can be used with identified consumer needs and behaviors to better position online services. The brand perception map in the study also contributed to a better understanding of the online brands in Taiwan. PMID:18785819
Theoretical models of unsupervised category learning postulate that humans "invent" categories to accommodate new patterns, but tend to group stimuli into a small number of categories. This "Occam's razor" principle is motivated by normative rules of statistical inference. If categories influence perception, then one should find effects of category invention on simple perceptual estimation. In a series of experiments, we tested this prediction by asking participants to estimate the number of colored circles on a computer screen, with the number of circles drawn from a color-specific distribution. When the distributions associated with each color overlapped substantially, participants' estimates were biased toward values intermediate between the two means, indicating that subjects ignored the color of the circles and grouped different-colored stimuli into one perceptual category. These data suggest that humans favor simpler explanations of sensory inputs. In contrast, when the distributions associated with each color overlapped minimally, the bias was reduced (i.e., the estimates for each color were closer to the true means), indicating that sensory evidence for more complex explanations can override the simplicity bias. We present a rational analysis of our task, showing how these qualitative patterns can arise from Bayesian computations. PMID:24137136
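The inward bias can be illustrated with textbook Gaussian-conjugate updating, an illustrative stand-in for the paper's full rational analysis: an observer who pools both colors into one category shrinks the noisy observation toward the pooled category mean, while an observer with separate color categories does not. All numbers below are invented:

```python
def posterior_mean(x_obs, mu_prior, var_prior, var_noise):
    """Bayesian estimate of a stimulus value: a Gaussian category prior
    combined with a noisy observation. The estimate shrinks toward the
    category mean in proportion to the observation noise."""
    w = var_prior / (var_prior + var_noise)  # weight on the observation
    return w * x_obs + (1 - w) * mu_prior

# Two overlapping colour distributions with means 8 and 12. An observer
# who merges them uses the pooled mean (10) and is biased inward; an
# observer with separate categories uses the correct mean (8).
est_pooled   = posterior_mean(x_obs=8.0, mu_prior=10.0, var_prior=4.0, var_noise=4.0)
est_separate = posterior_mean(x_obs=8.0, mu_prior=8.0,  var_prior=4.0, var_noise=4.0)
```

With equal prior and noise variances the pooled estimate lands midway between the observation (8) and the pooled mean (10), i.e. at an intermediate value, matching the qualitative bias reported above.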
Two sets of generalized, perceptual-based features are investigated for use in classifying animal vocalizations. Since many species, especially mammals, share similar physical sound perception mechanisms which vary in size, two features sets commonly used in human speech processing, mel-frequency cepstral coefficients (MFCCs) and perceptual linear prediction (PLP) analysis, are modified for use in other species. One modification made to the
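The mel warping underlying MFCC-style features can be sketched as a generic triangular filterbank; the species-specific modifications the authors make are not reproduced here, but rescaling the frequency range is one obvious adaptation point:

```python
import numpy as np

def mel_filterbank(n_filters=10, n_fft=512, sr=16000, fmax=None):
    """Triangular mel-spaced filterbank (the front end of MFCC
    features). Changing sr/fmax adapts the bank to a different
    hearing range."""
    fmax = fmax or sr / 2
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Filter edges equally spaced on the mel scale, mapped to FFT bins
    edges_hz = mel_inv(np.linspace(mel(0.0), mel(fmax), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges_hz / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        fb[i, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fb[i, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    return fb

fb = mel_filterbank()
```

Each row is one triangular filter peaking at 1.0; applying the bank to a power spectrum and taking log-energies gives the perceptually warped representation that cepstral coefficients are computed from.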
Demosaicing is an important part of the image-processing chain for many digital color cameras. The demosaicing operation converts a raw image acquired with a single sensor array, overlaid with a color filter array, into a full-color image. In this paper, we report the results of two perceptual experiments that compare the perceptual quality of the output of different demosaicing algorithms.
PHILIPPE LONGÈRE; Xuemei Zhang; PETER B. DELAHUNT; DAVID H. BRAINARD
The study presented here outlines a procedure for measuring and quantitatively representing the perceptual tempo of a musical excerpt. We also present a method for applying such measures of perceptual tempo to the design of automatic tempo-trackers in order to more accurately represent the perceived beat in music.
This paper deals with the possible benefits of Perceptual Learning in Artificial Intelligence. On the one hand, Perceptual Learning is more and more studied in neurobiology and is now considered an essential part of any living system. In fact, Perceptual Learning and Cognitive Learning are both necessary for learning and often depend on each other. On the
Nicolas Bredeche; Zhongzhi Shi; Jean-daniel Zucker
To discriminate and to recognize sound sources in a noisy, reverberant environment, listeners need to perceptually integrate the direct wave with the reflections of each sound source. It has been confirmed that perceptual fusion between direct and reflected waves of a speech sound helps listeners recognize this speech sound in a simulated reverberant environment with disrupting sound sources. When the
Measuring preferences for moving video quality is harder than for static images due to the fleeting and variable nature of moving video. Subjective preferences for image quality can be tested by observers indicating their preference for one image over another. Such pairwise comparisons can be analyzed using Thurstone scaling (Farrell, 1999). Thurstone (1927) scaling is widely used in applied psychology, marketing, food tasting and advertising research. Thurstone analysis constructs an arbitrary perceptual scale for the items that are compared (e.g. enhancement levels). However, Thurstone scaling does not determine the statistical significance of the differences between items on that perceptual scale. Recent papers have provided inferential statistical methods that produce an outcome similar to Thurstone scaling (Lipovetsky and Conklin, 2004). Here, we demonstrate that binary logistic regression can analyze preferences for enhanced video.
Woods, Russell L.; Satgunam, Premnandhini; Bronstad, P. Matthew; Peli, Eli
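Thurstone Case V scaling of pairwise preferences, as referenced above, can be sketched with the standard comparative-judgment recipe: z-transform the preference proportions, then average per item. The wins matrix below is invented for illustration:

```python
from statistics import NormalDist

def thurstone_case_v(wins):
    """Thurstone Case V scale values from a pairwise-preference count
    matrix: wins[i][j] = trials where item i was preferred over j.
    Each scale value is the mean z-scored proportion for that item."""
    n = len(wins)
    z = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                total = wins[i][j] + wins[j][i]
                p = wins[i][j] / total
                p = min(max(p, 0.01), 0.99)   # avoid infinite z-scores
                z[i][j] = NormalDist().inv_cdf(p)
    return [sum(row) / n for row in z]

# Three enhancement levels, 20 paired trials per pair; level 2 (last)
# is preferred most often
wins = [[0, 4, 2],
        [16, 0, 8],
        [18, 12, 0]]
scale = thurstone_case_v(wins)
```

The resulting values form the arbitrary perceptual scale the abstract mentions; as noted there, the scale alone carries no significance test, which is what motivates the logistic-regression alternative.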
As a controllable medium, video-realistic crowds are important for creating the illusion of a populated reality in special effects, games, and architectural visualization. While recent progress in simulation and motion capture-based techniques for crowd synthesis has focused on natural macroscale behavior, this paper addresses the complementary problem of synthesizing crowds with realistic microscale behavior and appearance. Example-based synthesis methods such as video textures are an appealing alternative to conventional model-based methods, but current techniques are unable to represent and satisfy constraints between video sprites and the scene. This paper describes how to synthesize crowds by segmenting pedestrians from input videos of natural crowds and optimally placing them into an output video while satisfying environmental constraints imposed by the scene. We introduce crowd tubes, a representation of video objects designed to compose a crowd of video billboards while avoiding collisions between static and dynamic obstacles. The approach consists of representing crowd tube samples and constraint violations with a conflict graph. The maximal independent set yields a dense constraint-satisfying crowd composition. We present a prototype system for the capture, analysis, synthesis, and control of video-based crowds. Several results demonstrate the system's ability to generate videos of crowds which exhibit a variety of natural behaviors. PMID:24029912
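The constraint-satisfaction step can be illustrated with a greedy maximal independent set on a small conflict graph; the paper's own MIS computation may differ, and the conflicts below are invented:

```python
def greedy_mis(n, conflicts):
    """Greedy maximal independent set on a conflict graph: keep adding
    crowd-tube samples that conflict with nothing already chosen.
    `conflicts` is a set of (i, j) pairs that collide."""
    adj = {i: set() for i in range(n)}
    for i, j in conflicts:
        adj[i].add(j)
        adj[j].add(i)
    # Prefer low-degree samples so the chosen set tends to be large
    chosen = set()
    for v in sorted(adj, key=lambda v: len(adj[v])):
        if not (adj[v] & chosen):
            chosen.add(v)
    return chosen

# 5 candidate tubes; tube 0 collides with 1 and 2, tube 3 with 4
mis = greedy_mis(5, {(0, 1), (0, 2), (3, 4)})
```

Every tube in the returned set is collision-free with the others, and no further tube can be added, which is the "dense constraint-satisfying composition" property in miniature.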
This study examined whether infants privilege shape over other perceptual properties when making inferences about the shared properties of novel objects. Forty-six 15-month-olds were presented with novel target objects that possessed a nonobvious property, followed by test objects that varied in shape, color, or texture relative to the target.…
We examined the effect of spatial iconicity (a perceptual simulation of canonical locations of objects) and word-order frequency on language processing and episodic memory of orientation. Participants made speeded relatedness judgments to pairs of words presented in locations typical to their real world arrangements (e.g., ceiling on top and floor on bottom). They then engaged in a surprise orientation recognition task for the word pairs. We replicated Louwerse’s finding (2008) that word-order frequency has a stronger effect on semantic relatedness judgments than spatial iconicity. This is consistent with recent suggestions that linguistic representations have a stronger impact on immediate decisions about verbal materials than perceptual simulations. In contrast, spatial iconicity enhanced episodic memory of orientation to a greater extent than word-order frequency did. This new finding indicates that perceptual simulations have an important role in episodic memory. Results are discussed with respect to theories of perceptual representation and linguistic processing.
The stimulus complexity of naturally occurring odours presents unique challenges for central nervous systems that are aiming to internalize the external olfactory landscape. One mechanism by which the brain encodes perceptual representations of behaviourally relevant smells is through the synthesis of different olfactory inputs into a unified perceptual experience — an odour object. Recent evidence indicates that the identification, categorization and discrimination of olfactory stimuli rely on the formation and modulation of odour objects in the piriform cortex. Convergent findings from human and rodent models suggest that distributed piriform ensemble patterns of olfactory qualities and categories are crucial for maintaining the perceptual constancy of ecologically inconstant stimuli.
A synthesizer is based on a nonlinear wave-shaping model of the glottal area, an algebraic model of the glottal aerodynamics as well as concatenated-tube models of the trachea and vocal tract. Voice disorders are simulated by way of models of vocal frequency jitter and tremor, vocal amplitude shimmer and tremor, as well as pulsatile additive noise. Six experiments have been carried out to assess the synthesizer perceptually. Three experiments involve the perceptual categorization of male synthetic and human stimuli and one the auditory discrimination between synthetic and human tokens. A fifth experiment reports the auditory discrimination between synthetic tokens with different levels of additive and modulation noise. A sixth experiment reports the scoring by expert listeners of male synthetic stimuli on equal-appearing interval scales grade-roughness-breathiness (GRB). A first objective is to demonstrate the ability of the synthesizer to simulate vowel sounds that are valid exemplars of speech sounds produced by humans with voice disorders. A second objective is to learn how human expert raters perceptually map vocal frequency, additive and modulation noise as well as vowel categories into scores on GRB scales. PMID:23039453
Stereoscopic displays are an increasingly prevalent tool for experiencing virtual environments, and the inclusion of stereo has the potential to improve distance perception within the virtual environment. When multiple users simultaneously view the same stereoscopic display, only one user experiences the projectively correct view of the virtual environment, and all other users view the same stereoscopic images while standing at locations displaced from the center of projection (CoP). This study was designed to evaluate the perceptual distortions caused by displacement from the CoP when viewing virtual objects in the context of a virtual scene containing stereo depth cues. Judgments of angles were distorted after leftward and rightward displacement from the CoP. Judgments of object depth were distorted after forward and backward displacement from the CoP. However, perceptual distortions of angle and depth were smaller than predicted by a ray-intersection model based on stereo viewing geometry. Furthermore, perceptual distortions were asymmetric, leading to different patterns of distortion depending on the direction of displacement. This asymmetry also conflicts with the predictions of the ray-intersection model. The presence of monocular depth cues might account for departures from model predictions.
Burton, Melissa; Pollock, Brice; Kelly, Jonathan W.; Gilbert, Stephen; Winer, Eliot; de La Cruz, Julio
For unattended persistent surveillance there is a need for a system which provides the following information: target classification, target quantity estimate, cargo presence and characterization, direction of travel, and action. Over highly bandwidth restricted links, such as Iridium, SATCOM or HF, the data rates of common techniques are too high, even after aggressive compression, to deliver the required intelligence in a timely, low power manner. We propose the following solution to this data rate problem: Profile Video. Profile video is a new technique which provides all of the required information in a very low data-rate package.
It has been shown experimentally that under certain combinations of sensory stimuli, human subjects can perceive one of several distinct illusions about their overall orientation in or movement through space. In at least some cases, the structure of such multistable illusory perceptions of orientation can be efficiently described by perceptual transformations that act on a current orientation estimate to yield an updated perceptual construct. Repeated application of identified generating transformations yields a limited set of predicted illusions for a given sensory environment. This approach is especially valuable for perceptual data that exhibits discretely differing classes of illusions between subjects or trials. In a previous study, application of a semigroup of perceptual centering transformations has succeeded in reproducing and simplifying data from an experiment in which subjects experiencing visual vection reported a range of illusions about the orientations of their gaze, head, and torso to gravity. After reviewing previously obtained results on perceptual centering, this article generalizes the approach, presenting the mathematics required to characterize perceptual transformations. The developed framework should be widely applicable in the understanding of perceptual illusions, particularly when these are guided by alignment with preferred constructs. Secondly, the article reveals the nontrivial mathematical process of perceptual semigroup formation and evaluation, deducing the complete description of the semigroup constructed in the previous study. Perceptual centering transformations identified in terrestrial experiments may predict illusions to be expected in spaceflight. For example, our results indicate that under certain conditions, many astronauts will misperceive a visual rotation axis to be centered in front of the head or even the torso. PMID:18626136
A new generation of industrial robots needs to have reliable perceptual systems that are similar to human vision. Human vision is based on the principles of image understanding and active vision. Both principles are possible in the form of Network-Symbolic systems. An Image/Video Analysis that is based on Network-Symbolic principles significantly differs from a traditional image analysis. Instead of precise computations of 3-dimensional models, such a system converts image into an "understandable" relational format similar to knowledge models. It is hard to use geometric operations for processing of natural images. Instead, the brain builds a relational network-symbolic structure of visual scene, using different clues to set up the relational order of surfaces and objects with respect to the observer and to each other. Spatial order can be represented as a connection graph. There is a generic logic of 3-D structures, which is based on relational changes of object views in the visual or object buffers. In Network-Symbolic Models, the derived structure and not the primary view is a subject for recognition. Such recognition is not affected by local changes and appearances of the object as seen from a set of similar views. Integrated into the industrial robotic systems, Network-Symbolic models can intelligently interpret images and video.
In this paper a frame-loss adaptive temporal pooling method for video quality assessment is proposed. Extensive subjective tests have been carried out to determine the duration of successive frames over which human observers can make a steady quality judgment. The resulting duration is applied to determining the length of a Group of Frames (GOF), where a flexible algorithm separates the input video into variable-sized GOFs. Short-term temporal pooling is first performed on each GOF to obtain the GOF quality, where the quality contribution of each frame is weighted with context and frame loss taken into account. The overall video quality is then obtained by long-term temporal pooling of the GOF qualities, reflecting the fact that perceptual video quality is predominantly determined by the worst parts of the video. Extensive experimental results have demonstrated the effectiveness of the proposed method for both regular and irregular frame loss.
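The two-stage pooling idea can be sketched as follows. This toy version uses fixed-length GOFs and a simple worst-fraction long-term pool, whereas the paper uses variable-sized GOFs and context/frame-loss weighting:

```python
import numpy as np

def pooled_video_quality(frame_scores, gof_len=15, worst_frac=0.2):
    """Two-stage temporal pooling sketch: average per-frame quality
    within each Group of Frames, then let the worst GOFs dominate,
    since perceived quality tracks the worst parts of the video."""
    scores = np.asarray(frame_scores, dtype=float)
    n_gof = len(scores) // gof_len
    gof_quality = scores[:n_gof * gof_len].reshape(n_gof, gof_len).mean(axis=1)
    # Long-term pool: mean of the worst `worst_frac` of GOF scores
    k = max(1, int(np.ceil(worst_frac * n_gof)))
    return float(np.sort(gof_quality)[:k].mean())

# 150 frames of high quality with one badly degraded stretch
frames = np.full(150, 0.9)
frames[60:75] = 0.2
q = pooled_video_quality(frames)
```

A plain global mean would barely register the 15 degraded frames; the worst-fraction pool pulls the overall score down sharply, matching the intuition in the abstract.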
Given two video sequences of different scenes, the problem of seamlessly transferring a 3D moving object from one sequence to the other is a complex task. In this paper, we present a method to extract the alpha matte of a moving 3D object from a source video, and then correctly augment the object into another target video. Our framework builds
There has been an increasing need recently to develop objective quality measurement techniques that can predict perceived video quality automatically. This paper introduces two video quality assessment models. The first one requires the original video as a reference and is a structural distortion measurement based approach, which is different from traditional error sensitivity based methods. Experiments on the video quality
Ligang Lu; Zhou Wang; Alan C. Bovik; Jack Kouloheris
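A structural-distortion measurement in the spirit of these models can be sketched as a single-window structural similarity index. The published models use local sliding windows and additional pooling; this global version only illustrates the luminance/contrast/structure comparison:

```python
import numpy as np

def structural_similarity(x, y, c1=1e-4, c2=9e-4):
    """Single-window structural-similarity sketch: compares luminance,
    contrast and structure of reference x and distorted y; 1.0 means
    identical signals. c1/c2 are small stabilizing constants."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx**2 + my**2 + c1) * (vx + vy + c2)))

rng = np.random.default_rng(0)
ref = rng.random((16, 16))                      # reference frame
noisy = ref + rng.normal(0, 0.2, ref.shape)     # distorted frame
```

Unlike PSNR, the score is driven by correlation of structure rather than raw error energy, which is the distinction the abstract draws from error-sensitivity methods.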
Schizophrenia patients exhibit perceptual and cognitive deficits, including in visual motion processing. Given that cognitive systems depend upon perceptual inputs, improving patients' perceptual abilities may be an effective means of cognitive intervention. In healthy people, motion perception can be enhanced through perceptual learning, but it…
Norton, Daniel J.; McBain, Ryan K.; Ongur, Dost; Chen, Yue
Based on the premise that writing for video is not like writing for text, this guide tells the writer how to create a video script that turns particular ideas into powerful and effective video. Defining a video script as a "blueprint for a video," the guide takes the writer step-by-step through the entire process of creating a video. Chapters in…
The Kinks, ONE FOR THE ROAD. Vestron Video. 60 min. $29.95. Neil Diamond, LIVE AT THE GREEK. Vestron Video. 60 min. (HiFi). Video 45s: XX½ Stray Cats. Sony Corp. Video 45 (13 min.). XXX David Bowie. Sony Corp. Video 45 (14 min.). Duran Duran. Sony Corp. Video 45 (11 min.).
Video content characterization is a challenging problem in video databases. The aim of such characterization is to generate indices that can describe a video clip in terms of objects and their actions in the clip. Generally, such indices are extracted by performing image analysis on the video clips. Many such indices can also be generated by analyzing the embedded audio information of video clips. Indices pertaining to context, scene emotion, and actors or characters present in a video clip appear especially suitable for generation via audio analysis techniques of keyword spotting, and speech and speaker recognition. In this paper, we examine the potential of speaker identification techniques for characterizing video clips in terms of actors present in them. We describe a three-stage processing system consisting of a shot boundary detection stage, an audio classification stage, and a speaker identification stage to determine the presence of different actors in isolated shots. Experimental results using the movie A Few Good Men are presented to show the efficacy of speaker identification for labeling video clips in terms of persons present in them.
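The first stage of the pipeline above, shot boundary detection, is commonly done by thresholding histogram differences between consecutive frames. A minimal sketch (the bin count and threshold are arbitrary choices here, not the paper's):

```python
import numpy as np

def shot_boundaries(frames, n_bins=16, thresh=0.5):
    """Histogram-difference shot boundary detection: a large jump in
    the normalized grey-level histogram between consecutive frames
    marks a cut."""
    hists = [np.histogram(f, bins=n_bins, range=(0.0, 1.0))[0] / f.size
             for f in frames]
    cuts = []
    for t in range(1, len(hists)):
        if np.abs(hists[t] - hists[t - 1]).sum() > thresh:
            cuts.append(t)
    return cuts

# Two "shots": three dark frames, then three bright frames
frames = [np.full((8, 8), 0.1)] * 3 + [np.full((8, 8), 0.9)] * 3
cuts = shot_boundaries(frames)
```

The detected cut indices delimit the isolated shots that the audio classification and speaker identification stages then operate on.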
Objective: Experiments to study voice quality have typically used rating scales or direct magnitude estimation to obtain listener judgments. Unfortunately, the data obtained using these tasks is context-dependent, which makes it difficult to compare perceptual judgments of voice quality across experiments. The present experiment describes a simple matching task to quantify voice quality. The data obtained through this task was compared to perceptual judgments obtained using rating scale and direct magnitude estimation tasks to evaluate whether the three tasks provide equivalent perceptual distances across stimuli. Methods: Ten synthetic vowel continua that varied in terms of their aspiration noise were evaluated for breathiness using each of the three tasks. Linear and nonlinear regression was used to compare the perceptual distances between stimuli obtained through each technique. Results: Results show that the perceptual distances estimated from the matching and direct magnitude estimation tasks are similar, but both differ from the rating scale task, suggesting that the matching task provides perceptual distances with ratio-level measurement properties. Conclusions: The matching task is advantageous for measurement of vocal quality because it provides reliable measurement with ratio-level scale properties. It allows the use of a fixed reference signal for all comparisons, thus allowing researchers to directly compare findings across different experiments.
Perceptual motor activities for physically handicapped children are presented in the areas of fine and gross motor skills. Also detailed are activities to develop body image, visual motor skills, and tactile and auditory perception. (JD)
The geometry of perceptual space needs to be known to model spatial orientation constancy or to create virtual environments. To examine one main aspect of this geometry, the angular relation between the three spatial axes was measured. Experiments were pe...
This study is the first step in the psychoacoustic exploration of perceptual differences between the sounds of different violins. A method was used which enabled the same performance to be replayed on different ``virtual violins''.
We employed a parametric psychophysical design in combination with functional imaging to examine the influence of metric changes in perceptual incongruence on perceptual alternation rates and cortical responses. Subjects viewed a bistable stimulus defined by incongruent depth cues; bistability resulted from incongruence between binocular disparity and monocular perspective cues that specify different slants (slant rivalry). Psychophysical results revealed that perceptual alternation rates were positively correlated with the degree of perceived incongruence. Functional imaging revealed systematic increases in activity that paralleled the psychophysical results within anterior intraparietal sulcus, prior to the onset of perceptual alternations. We suggest that this cortical activity predicts the frequency of subsequent alternations, implying a putative causal role for these areas in initiating bistable perception. In contrast, areas implicated in form and depth processing (LOC and V3A) were sensitive to the degree of slant, but failed to show increases in activity when these cues were in conflict.
Brouwer, Gijs Joost; Tong, Frank; Hagoort, Peter; van Ee, Raymond
Purpose of review This paper describes recent advances in perceptual, acoustic, aerodynamic, and endoscopic imaging methods for assessing voice production. Recent findings Perceptual assessment Speech-language pathologists are being encouraged to use the new CAPE-V inventory for auditory perceptual assessment of voice quality, and recent studies have provided new insights into listener reliability issues that have plagued subjective perceptual judgments of voice quality. Acoustic assessment Progress is being made on the development of algorithms that are more robust for analyzing disordered voices, including the capability to extract voice quality-related measures from running speech segments. Aerodynamic assessment New devices for measuring phonation threshold air pressures and air flows have the potential to serve as sensitive indices of glottal phonatory conditions, and recent developments in aeroacoustic theory may provide new insights into laryngeal sound production mechanisms. Endoscopic imaging The increased light sensitivity of new ultra high-speed color digital video processors is enabling high-quality endoscopic imaging of vocal fold tissue motion at unprecedented image capture rates, which promises to provide new insights into mechanisms of normal and disordered voice production. Summary Some of the recent research advances in voice quality assessment could be more readily adopted into clinical practice, while others will require further development.
Background There is unequal access to health care in Australia, particularly for the one-third of the population living in remote and rural areas. Video consultations delivered via the Internet present an opportunity to provide medical services to those who are underserviced, but this is not currently routine practice in Australia. There are advantages and shortcomings to using video consultations for diagnosis, and general practitioners (GPs) have varying opinions regarding their efficacy. Objective The aim of this Internet-based study was to explore the attitudes of Australian GPs toward video consultation by using a range of patient scenarios presenting different clinical problems. Methods Overall, 102 GPs were invited to view 6 video vignettes featuring patients presenting with acute and chronic illnesses. For each vignette, they were asked to offer a differential diagnosis and to complete a survey based on the theory of planned behavior documenting their views on the value of a video consultation. Results A total of 47 GPs participated in the study. The participants were younger than Australian GPs based on national data, and more likely to be working in a larger practice. Most participants (72%-100%) agreed on the differential diagnosis in all video scenarios. Approximately one-third of the study participants were positive about video consultations, one-third were ambivalent, and one-third were against them. In all, 91% opposed conducting a video consultation for the patient with symptoms of an acute myocardial infarction. Inability to examine the patient was most frequently cited as the reason for not conducting a video consultation. Australian GPs who were favorably inclined toward video consultations were more likely to work in larger practices, and were more established GPs, especially in rural areas. The survey results also suggest that the deployment of video technology will need to focus on follow-up consultations. 
Conclusions Patients with minor self-limiting illnesses and those with medical emergencies are unlikely to be offered access to a GP by video. The process of establishing video consultations as routine practice will need to be endorsed by senior members of the profession and funding organizations. Video consultation techniques will also need to be taught in medical schools.
Perceptual differences were investigated between 50 college students who were non-drug users and 50 hippies who used LSD. The groups were matched for race, sex, and age. No significant differences were found between the groups in intelligence or social class. The seven perceptual tests used were the Color-Form Attention Test, Judgment of Sounds, Autokinetic Effect, Six-Inch Estimation, Rod and Frame Test, Estimation of Head
Previous research has suggested that perceptual information about objects is activated during sentence comprehension [Zwaan, R. A., Stanfield, R. A., & Yaxley, R. H. (2002). Language comprehenders mentally represent the shapes of objects. "Psychological Science, 13"(2), 168-171]. The goal in the current study was to examine the role of the two…
Due to the improvement of image rendering processes, and the increasing importance of quantitative comparisons among synthetic color images, it is essential to define perceptually based metrics which enable objective assessment of the visual quality of digital simulations. In response to this need, this paper proposes a new methodology for the determination of an objective image quality metric, and gives
Stephane Albin; Gilles Rougeron; Bernard Péroche; Alain Trémeau
Objective image quality assessment (IQA) aims to evaluate image quality consistently with human perception. Most of the existing perceptual IQA metrics cannot accurately represent the degradations from different types of distortion, e.g., existing structural similarity metrics perform well on content-dependent distortions while not as well as peak signal-to-noise ratio (PSNR) on content-independent distortions. In this paper, we integrate the merits of the existing IQA metrics guided by the recently revealed internal generative mechanism (IGM). The IGM indicates that the human visual system actively predicts sensory information and tries to avoid residual uncertainty for image perception and understanding. Inspired by the IGM theory, we adopt an autoregressive prediction algorithm to decompose an input scene into two portions, the predicted portion with the predicted visual content and the disorderly portion with the residual content. Distortions on the predicted portion degrade the primary visual information, and structural similarity procedures are employed to measure its degradation; distortions on the disorderly portion mainly change the uncertain information and PSNR is employed for it. Finally, according to the noise energy deployment on the two portions, we combine the two evaluation results to acquire the overall quality score. Experimental results on six publicly available databases demonstrate that the proposed metric is comparable with the state-of-the-art quality metrics. PMID:22910116
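A minimal sketch of the decompose-then-weight idea described above, under two simplifying assumptions: a 3x3 local-mean predictor stands in for the paper's autoregressive model, and PSNR is used on both portions for brevity (the paper applies a structural-similarity procedure to the predicted portion):

```python
import numpy as np

def local_mean_predict(img):
    """Stand-in for the paper's autoregressive predictor: each pixel
    is predicted as the mean of its 3x3 neighborhood."""
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode='edge')
    pred = np.zeros((h, w))
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            pred += padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return pred / 9.0

def psnr(a, b, peak=255.0):
    mse = np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2)
    return float('inf') if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def decompose(img):
    pred = local_mean_predict(img)
    return pred, img - pred   # predicted portion, disorderly residual

def quality_score(ref, dist):
    ref_p, ref_d = decompose(ref)
    dist_p, dist_d = decompose(dist)
    # Weight the two per-portion scores by where the distortion
    # energy landed, echoing the paper's noise-energy deployment.
    e_p = np.mean((ref_p - dist_p) ** 2)
    e_d = np.mean((ref_d - dist_d) ** 2)
    w = e_p / (e_p + e_d + 1e-12)
    return w * psnr(ref_p, dist_p) + (1 - w) * psnr(ref_d, dist_d)

rng = np.random.default_rng(1)
ref = rng.integers(0, 256, (32, 32)).astype(float)
mild = ref + rng.normal(0, 2, ref.shape)
harsh = ref + rng.normal(0, 20, ref.shape)
print(quality_score(ref, mild) > quality_score(ref, harsh))  # → True
```

The sketch only illustrates the structure of the metric; reproducing the published results would require the actual autoregressive decomposition and SSIM component.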
Image-based rendering is one of the hottest new areas in computer graphics. Instead of using 3D modeling and painting tools to construct graphics models by hand, IBR uses real-world imagery to rapidly create photorealistic shape and appearance models. However, IBR results to date have mostly been restricted to static objects and scenes. Video-based rendering brings the same kind
This is a collection of links to educational videos related to engineering which are made for a general audience. Some of them explore the engineering feats accomplished by some of the great civilizations in history; some look at how engineers and designers use historic inventions and clues from the natural world in ingenious ways to develop new buildings and machines; and some deal with industrial design, product design, and our relationship with the manufactured objects that surround us.
This article characterizes video-based interactions that emerge from YouTube's video response feature, which allows users to discuss themes and to provide reviews for products or places using much richer media than text. Based on crawled data covering a representative subset of videos and users, we present a characterization from two perspectives: the video response view and the interaction network view.
Fabrício Benevenuto; Tiago Rodrigues; Virgilio Almeida; Jussara M. Almeida; Keith W. Ross
The objective of this study was to analyse the correctness of the offside judgements of the assistant referees during the final round of the FIFA 2002 World Cup. We also contrasted two hypotheses to explain the errors in judging offside. The optical error hypothesis is based on an incorrect viewing angle, while the flash-lag hypothesis refers to perceptual errors associated with the flash-lag effect (i.e. a moving object is perceived as spatially leading its real position at a discrete instant signalled by a briefly flashed stimulus). Across all 64 matches, 337 offsides were analysed using digital video technology. The error percentage was 26.2%. During the first 15 min match period, there were significantly more errors (38.5%) than during any other 15 min interval. As predicted by the flash-lag effect, we observed many more flag errors (86.6%) than non-flag errors (13.4%). Unlike the predictions of the optical error hypothesis, there was no significant difference between the correct and incorrect decisions in terms of the positioning of the assistant referees relative to the offside line (0.81 and 0.77 m ahead, respectively). To reduce the typical errors in judging offside, alternative ways need to be considered to teach assistant referees to better deal with flash-lag effects. PMID:16608766
In this work we describe a novel statistical video representation and modeling scheme. Video representation schemes are needed to enable segmenting a video stream into meaningful video-objects, useful for later indexing and retrieval applications. In the proposed methodology, unsupervised clustering via Gaussian mixture modeling extracts coherent space-time regions in feature space, and corresponding coherent segments (video-regions) in
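As a toy illustration of the unsupervised Gaussian-mixture clustering at the heart of the scheme above, here is a minimal 1-D, two-component EM fit on synthetic features (the paper clusters multi-dimensional space-time features; this sketch only shows the mechanism):

```python
import numpy as np

def fit_gmm_1d(x, n_iter=50):
    """Minimal EM for a two-component 1-D Gaussian mixture."""
    mu = np.array([x.min(), x.max()], dtype=float)   # spread-out init
    var = np.array([x.var(), x.var()]) + 1e-6
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each sample.
        dens = pi / np.sqrt(2 * np.pi * var) * \
            np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return mu, var, pi

# Synthetic "features" drawn from two well-separated regions.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(10, 1, 200)])
mu, _, _ = fit_gmm_1d(x)
print(np.sort(mu).round(2))   # means recovered near 0 and 10
```

In the full system, each mixture component corresponds to a coherent space-time region, and pixels are assigned to video-regions by their component responsibilities.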
A method for authoring video documents includes the steps of inputting video data to be processed, segmenting the video data into shots by identifying breaks between the shots, subdividing the shots into subshots using motion analysis to provide location information for motions of objects of interest, describing boundaries for the objects of interest in the video data such that the objects of interest are represented by the boundaries in the shots and creating an anchorable information unit file based on the boundaries of the objects of interest such that objects of interest are used to identify portions of the video data. A system is also included.
An important step towards gaining an understanding of how a particular medium can be used most effectively in education is to study its outstanding examples, regardless of their original purpose. It is assumed that
According to the object-based view, visual attention can be deployed to "objects" or perceptual units, regardless of spatial locations. Recently, however, the notion of object has also been extended to the auditory domain, with some authors suggesting possible interactions between visual and auditory objects. Here we show that task-irrelevant auditory objects may affect the deployment of visual attention, providing evidence that crossmodal links can also occur at an object-based level. Hence, in addition to the well documented control of visual objects over what we hear, our findings demonstrate that, in some cases, auditory objects can affect visual processing. PMID:15925569
This is a short activity intended to allow students to practice kinematics using a video of a familiar object: a spring-powered toy car. Students measure displacement and elapsed time from the video and use these measurements to calculate average speed. Observing that the car has an initial speed of zero, students can find the final speed and acceleration. Students will use a QuickTime video recorded at 240 frames per second, making measurements directly from the video using a ruler and a frame-counter overlaid on the video. The video at right is a preview of the video students use for the activity.
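The measurements described above feed directly into the constant-acceleration kinematics relations. A worked example with hypothetical numbers a student might read off the video (the actual values depend on the recording):

```python
# Hypothetical measurements from the ruler and frame counter:
displacement_m = 1.2   # distance travelled by the toy car
elapsed_s = 0.5        # elapsed time from the 240 fps frame counter

avg_speed = displacement_m / elapsed_s     # v_avg = dx / dt
# The car starts from rest, so with constant acceleration
# v_avg = (v0 + vf) / 2 gives vf = 2 * v_avg:
final_speed = 2 * avg_speed
acceleration = final_speed / elapsed_s     # a = (vf - v0) / t

print(avg_speed, final_speed, acceleration)  # → 2.4 4.8 9.6
```

Speeds are in m/s and the acceleration in m/s², assuming the ruler measurements are converted to meters.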
A fundamental task of visual perception is to group visual features - sometimes spatially separated and partially occluded - into coherent, unified representations of objects. Perceptual grouping can vastly simplify the description of a visual scene and is critical for our visual system to understand the three-dimensional visual world. Numerous neurophysiological and brain imaging studies have demonstrated that neural mechanisms of perceptual grouping are characterized by the enhancement of neural responses throughout the visual processing hierarchy, from lower visual areas processing grouped features to higher visual areas representing objects and shapes from grouping. In a series of psychophysical adaptation experiments, we made the counterintuitive observation that perceptual grouping amplified the shape aftereffect but meanwhile, reduced the tilt aftereffect and the threshold elevation aftereffect (TEAE). Furthermore, the modulation of perceptual grouping on the TEAE showed a partial interocular transfer. This finding suggests a 2-fold effect of perceptual grouping - enhancing the high-level shape representation and attenuating the low-level feature representation even at a monocular level. We propose that this effect is a functional manifestation of a predictive coding scheme and reflects an efficient code of visual information across lower and higher visual cortical areas. PMID:22578417
To support effective browsing, interfaces to digital video libraries should include video surrogates (i.e., smaller objects that can stand in for the videos in the collection, analogous to abstracts standing in for documents). The current study investigated four variations (i.e., speeds) of one form of video surrogate: a fast forward created by selecting every Nth frame from the full video.
Barbara M. Wildemuth; Gary Marchionini; Meng Yang; Gary Geisler; Todd Wilkens; Anthony Hughes; Richard Gruss
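The fast forward surrogate studied above reduces to simple frame subsampling. An illustrative sketch (the sampling rate here is hypothetical, not one of the four speeds compared in the study):

```python
def fast_forward(frames, n):
    """Build a fast-forward surrogate by keeping every nth frame."""
    return frames[::n]

clip = list(range(120))              # stand-in for 120 decoded frames
surrogate = fast_forward(clip, 16)   # n = 16 is a hypothetical speed
print(len(surrogate))  # → 8
```

Larger n yields a shorter, faster-playing surrogate, which is exactly the speed trade-off the study varied.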
Video databases became an active field of research during the last decade. The main objective in such systems is to provide users with capabilities to search, access, and play back distributed stored video data in the same way as they do for traditional distributed databases. Hence, such systems need to deal with hard issues: (a) video documents generate huge volumes of data and are time sensitive (streams must be delivered at a specific bitrate), and (b) the contents of video data are very hard to extract automatically and need to be annotated manually. To cope with these issues, many approaches have been proposed in the literature, including data models, query languages, and video indexing. In this paper, we present SIRSALE: a set of video database management tools that allow users to manipulate video documents and streams stored in large distributed repositories. All the proposed tools are based on generic models that can be customized for specific applications using ad-hoc adaptation modules. More precisely, SIRSALE allows users to: (a) browse video documents by structures (sequences, scenes, shots) and (b) query the video database content by using a graphical tool adapted to the nature of the target video documents. This paper also presents an annotation interface which allows archivists to describe the content of video documents. All these tools are coupled to a video player integrating remote VCR functionalities and are based on active network technology. We then present how dedicated active services allow optimized transport of video streams (with Tamanoir active nodes), and describe experiments using SIRSALE on an archive of news video and soccer matches. The system has been demonstrated to professionals with positive feedback. Finally, we discuss open issues and present some perspectives.
Brunie, Lionel; Favory, Loic; Gelas, J. P.; Lefevre, Laurent; Mostefaoui, Ahmed; Nait-Abdesselam, F.
Research on new human-computer interfaces has become a growing field in computer science, aiming at the development of more natural, intuitive, unobtrusive, and efficient interfaces. This objective has given rise to the concept of Perceptual User Interfaces (PUIs), which are becoming popular because they seek to make the user interface more natural and compelling by taking advantage of the ways in which people naturally interact with each other and with the world. PUIs can use speech and sound recognition and generation, computer vision, graphical animation and visualization, language understanding, touch-based sensing and feedback (haptics), learning, user modeling, and dialog management.
Correlating and fusing video frames from distributed and moving sensors is an important area of video matching. It is especially difficult for frames containing objects at long distances that are visible as single pixels, where algorithms cannot exploit the structure of each object. The proposed algorithm correlates partial frames with such small objects using an algebraic structural approach that exploits structural relations between objects, including ratios of areas. The algorithm is fully affine invariant, which includes any rotation, shift, and scaling.
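The claim that ratios of areas survive any affine map follows because an affine transform scales every area by the same factor, |det A|, so the factor cancels in a ratio. A small sketch verifying this invariance on hypothetical triangles (the matrix and translation are arbitrary):

```python
import numpy as np

def tri_area(pts):
    """Area of a triangle given three points as a (3, 2) array."""
    a, b, c = pts
    # Half the magnitude of the 2-D cross product of two edge vectors.
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1])
                     - (b[1] - a[1]) * (c[0] - a[0]))

# A hypothetical affine map: shear/scale matrix A plus translation t.
A = np.array([[1.3, 0.4],
              [-0.2, 0.9]])
t = np.array([5.0, -2.0])
affine = lambda p: p @ A.T + t

t1 = np.array([[0., 0.], [4., 0.], [0., 3.]])   # area 6
t2 = np.array([[1., 1.], [3., 1.], [1., 2.]])   # area 1
print(round(tri_area(t1) / tri_area(t2), 9))                  # → 6.0
print(round(tri_area(affine(t1)) / tri_area(affine(t2)), 9))  # → 6.0
```

Each area individually changes (by |det A| = 1.25 here), but their ratio is preserved, which is what makes such ratios usable as affine-invariant signatures for single-pixel object constellations.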
This video is composed of a sequence of time lapse films created by John Tyler Bonner in the 1940s to show the life cycle of the cellular slime mold, Dictyostelium discoideum. As only the second person to study slime molds, Bonner frequently encountered audiences who had never heard of, let alone seen, the unusual organism. He therefore decided to create a film to present at seminars in order to introduce his object of study. Bonner created the video for his senior thesis at Harvard University with the help of photographer Frank Smith. Bonner began to work at Princeton University in 1947, thus the mention of that university on the title screen of the film. It was digitized and narrated by developmental biologist Rachel Fink of Mount Holyoke College. Includes (approximate starting times given): Amoebae [00:02]; Aggregation [00:27]; Migrating Pseudoplasmodia [02:16]; Culmination [03:28]; Trisected Pseudoplasmodium [04:17].
John T. Bonner (Princeton Department of Ecology and Evolutionary Biology)
The human brain is renowned for its dynamic regulation of sensory inputs, which enables our brain to operate under an enormous range of physical energy with sensory neurons whose processing range is limited. Here we present a novel and strong brightness induction that reflects neural mechanisms underlying this dynamic regulation of sensory inputs. When physically identical, stationary and moving objects are viewed simultaneously, the stationary and moving objects appear largely different. Experiments reveal that normalization at multiple stages of visual processing provides a plausible account for the large shifts in perceptual experiences, observed in both the stationary and the moving objects. This novel brightness induction suggests that brightness of an object is influenced not only by variations in surrounding light (i.e. simultaneous contrast) but also by dynamically changing neural responses associated with stimulus motion. PMID:23954812
In many everyday situations, speed is of the essence. However, fast decisions typically mean more mistakes. To this day, it remains unknown whether reaction times can be reduced with appropriate training, within one individual, across a range of tasks, and without compromising accuracy. Here we review evidence that the very act of playing action video games significantly reduces reaction times without sacrificing accuracy. Critically, this increase in speed is observed across various tasks beyond game situations. Video gaming may therefore provide an efficient training regimen to induce a general speeding of perceptual reaction times without decreases in accuracy of performance.
Dye, Matthew W.G.; Green, C. Shawn; Bavelier, Daphne
This web site, hosted by Gateway Technical College, houses several learning objects and lectures using Flash animations and Camtasia videos in the following electronics topic areas: DC/AC circuits, digital electronics, electronic devices, transistor fundamentals, op amp fundamentals, electronic workbench fundamentals, electronic circuit analysis, and multi-simulation. It is a great source of fundamental learning objects in the field of electronics.
The extent to which different cognitive processes are "embodied" is widely debated. Previous studies have implicated sensorimotor regions such as lateral intraparietal (LIP) area in perceptual decision making. This has led to the view that perceptual decisions are embodied in the same sensorimotor networks that guide body movements. We use event-related fMRI and effective connectivity analysis to investigate whether the human sensorimotor system implements perceptual decisions. We show that when eye and hand motor preparation is disentangled from perceptual decisions, sensorimotor areas are not involved in accumulating sensory evidence toward a perceptual decision. Instead, inferior frontal cortex increases its effective connectivity with sensory regions representing the evidence, is modulated by the amount of evidence, and shows greater task-positive BOLD responses during the perceptual decision stage. Once eye movement planning can begin, however, an intraparietal sulcus (IPS) area, putative LIP, participates in motor decisions. Moreover, sensory evidence levels modulate decision and motor preparation stages differently in different IPS regions, suggesting functional heterogeneity of the IPS. This suggests that different systems implement perceptual versus motor decisions, using different neural signatures. PMID:23365248
Filimon, Flavia; Philiastides, Marios G; Nelson, Jonathan D; Kloosterman, Niels A; Heekeren, Hauke R
Nowadays most digital cameras have the functionality of taking short video clips, with the length of video ranging from several seconds to a couple of minutes. The purpose of this research is to develop an algorithm which extracts an optimal set of keyframes from each short video clip so that the user could obtain proper video frames to print out. In current video printing systems, keyframes are normally obtained by evenly sampling the video clip over time. Such an approach, however, may not reflect highlights or regions of interest in the video. Keyframes derived in this way may also be improper for video printing in terms of either content or image quality. In this paper, we present an intelligent keyframe extraction approach to derive an improved keyframe set by performing semantic analysis of the video content. For a video clip, a number of video and audio features are analyzed to first generate a candidate keyframe set. These features include accumulative color histogram and color layout differences, camera motion estimation, moving object tracking, face detection and audio event detection. Then, the candidate keyframes are clustered and evaluated to obtain a final keyframe set. The objective is to automatically generate a limited number of keyframes to show different views of the scene; to show different people and their actions in the scene; and to tell the story in the video shot. Moreover, frame extraction for video printing, which is a rather subjective problem, is considered in this work for the first time, and a semi-automatic approach is proposed.
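As a toy sketch of the candidate-keyframe generation step above, based only on the color histogram cue (the real system also uses camera motion, object tracking, face detection, and audio events, and the thresholds here are arbitrary assumptions):

```python
import numpy as np

def color_signature(frame, bins=8):
    """Normalized gray-level histogram used as a cheap frame signature."""
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h / h.sum()

def extract_keyframes(frames, max_keyframes=3, min_diff=0.3):
    """Greedy sketch: keep a frame as a candidate keyframe whenever its
    histogram differs enough from the last kept keyframe."""
    keyframes = [0]
    last = color_signature(frames[0])
    for i, f in enumerate(frames[1:], start=1):
        sig = color_signature(f)
        if np.abs(sig - last).sum() > min_diff:
            keyframes.append(i)
            last = sig
    # Cap the set by even subsampling, a stand-in for the paper's
    # clustering-and-evaluation stage.
    if len(keyframes) > max_keyframes:
        idx = np.linspace(0, len(keyframes) - 1,
                          max_keyframes).round().astype(int)
        keyframes = [keyframes[j] for j in idx]
    return keyframes

# Three distinct "views" of a scene, two frames each.
scene = [np.full((16, 16), v, dtype=np.uint8)
         for v in (10, 10, 120, 120, 240, 240)]
print(extract_keyframes(scene))  # → [0, 2, 4]
```

A printing system would then rank or cluster these candidates to pick the final frames to print.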
Universal Mobile Telecommunication System (UMTS) is a third-generation mobile communication system that supports wireless wideband multimedia applications. The objective of this paper is to present a new model for non-intrusive prediction of H.264 encoded video quality over UMTS networks and to illustrate its application to video quality monitoring and adaptation in mobile wireless streaming services. First, we present an
Asiya Khan; Lingfen Sun; Emmanuel Ifeachor; Jose Oscar Fajardo; Fidel Liberal
Executive Summary Objective The objective of this health technology policy assessment was to determine the effectiveness and cost-effectiveness of video-assisted laryngoscopy for tracheal intubation. The Technology Video-assisted, rigid laryngoscopes have been recently introduced that allow for the illumination of the airway and the accurate placement of the endotracheal tube. Two such devices are available in Canada: the Bullard® Laryngoscope that relies on fibre optics for illumination and the GlideScope® that uses a video camera and a light source to illuminate the airway. Both are connected to an external monitor so health professionals other than the operator can visualize the insertion of the tube. These devices therefore may be very useful as teaching aids for tracheal intubation. Review Strategy The objective of this review was to examine the effectiveness of the most commonly used video-assisted rigid laryngoscopes used in Canada for tracheal intubation. According to the Medical Advisory Secretariat standard search strategy, a literature search for current health technology assessments and peer-reviewed literature was conducted in Medline (full citations, in-process, and non-indexed citations) and Embase for citations from January 1994 to January 2004. Key words used in the search were as follows: video-assisted; video; emergency; airway management; tracheal intubation; and laryngoscopy. Summary of Findings Two video-assisted systems are available for use in Canada. The Bullard® video laryngoscope has a large body of literature associated with it and has been used for the last 10 years, although most of the studies are small and not well conducted. The literature on the GlideScope® is limited. In general, these devices provide better views of the airway but are much more expensive than conventional direct laryngoscopes. As with most medical procedures, video-assisted laryngoscopy requires training and skill maintenance for successful use.
There seems to be a discrepancy between the apparent advantages of these devices in the management of the difficult airway and their availability and uptake outside the operating room. The uptake of these devices by non-anesthetists in Ontario at this time may be limited because: (1) difficult intubation is relatively infrequent outside the operating room; (2) many alternative and inexpensive devices are available; and (3) there are no professional supports in place for the training and maintenance of skills for the use of these devices outside anesthesia. Video laryngoscopy has no obvious utility in preventing airborne viral transmission from patient to provider but may be useful for teaching purposes.
Is that YouTube video real or fake? This question comes up all the time. In this talk, I will briefly introduce Tracker Video Analysis (a free, Java-based application) and show how it can be used to determine the validity of videos.
Viewers of video now have more choices than ever. As the number of choices increases, the task of searching through these choices to locate video of interest is becoming more difficult. Current methods for learning a viewer's preferences in order to automate the search process rely either on video having content descriptions or on having been rated by other viewers
We tested whether lorazepam (a benzodiazepine) affects perceptual processes involved in the computation of contour information. Subjects matched incomplete forms whose contour was composed of line segments varying in their spacing and in their alignment. An initial centrally displayed object (a reference) was followed by two laterally displayed pictures, a target and a distractor. The distractor was the mirror-reversed version
A. Giersch; M. Boucart; J.-M. Danion; Pierre Vidailhet; François Legrand
In 2 experiments, the authors investigated the ability of high- and low-span comprehenders to construe subtle shades of meaning through perceptual representation. High- and low-span comprehenders responded to pictures that either matched or mismatched a target object's shape as implied by the preceding sentence context. At 750 ms after hearing the…
The authors investigated the role of perceptual attunement in an emergency braking task in which participants waited until the last possible moment to slam on the brakes. Effects of the size of the approached object and initial speed on the initiation of braking were used to identify the optical variables on which participants relied at various stages of practice. In
One of the unique applications of Mixed and Augmented Reality (MR/AR) systems is that hidden and occluded objects can be readily visualized. We call this specialized use of MR/AR Obscured Information Visualization (OIV). In this paper, we describe the beginning of a research program designed to develop such visualizations through the use of principles derived from perceptual psychology
The ability of organisms to categorize objects depends on their sensory experience in an environment. We studied the role of frequency and temporal order of stimuli in perceptual categorization on Darwin VI, a neuronal model interfaced with a behaving real-world device. The model consisted of several distinct biologically based networks, representing areas of the central nervous system. Darwin VI was
Jeffrey L. Krichmar; James A. Snook; Gerald M. Edelman; Olaf Sporns
Spatial demonstratives ("this/that") play a crucial role when indicating object locations using language. However, the relationship between the use of these proximal and distal linguistic descriptors and the near (peri-personal) versus far (extra-personal) perceptual space distinction is a source of controversy [Kemmerer, D. (1999). "Near" and…
Coventry, Kenny R.; Valdes, Berenice; Castillo, Alejandro; Guijarro-Fuentes, Pedro
When we view visual images in everyday life, our perception is oriented toward object identification. In contrast, when viewing visual images "as artworks", we also tend to experience subjective reactions to their stylistic and structural properties. This experiment sought to determine how cognitive control and perceptual facilitation contribute…
Cupchik, Gerald C.; Vartanian, Oshin; Crawley, Adrian; Mikulis, David J.
Among the existing block partitioning schemes, the pattern-based video coding (PVC) has already established its superiority at low bit-rates. Its innovative segmentation process with regular-shaped pattern templates is very fast as it avoids handling the exact shape of the moving objects. It also judiciously encodes the pattern-uncovered background segments, capturing a high level of inter-block temporal redundancy without any motion compensation, which is favoured by the rate-distortion optimizer at low bit-rates. The existing PVC technique, however, uses a number of content-sensitive thresholds, and setting them to any predefined values risks ignoring some of the macroblocks that would otherwise be encoded with patterns. Furthermore, occluded background can potentially degrade the performance of this technique. In this paper, a robust PVC scheme is proposed by removing all the content-sensitive thresholds, introducing a new similarity metric, considering multiple top-ranked patterns by the rate-distortion optimizer, and refining the Lagrangian multiplier of the H.264 standard for efficient embedding. A novel pattern-based residual encoding approach is also integrated to address the occlusion issue. Once embedded into the H.264 Baseline profile, the proposed PVC scheme improves perceptual image quality by at least 0.5 dB in low bit-rate video coding applications. A similar trend is observed for moderate to high bit-rate applications when the proposed scheme replaces the bi-directional predictive mode in the H.264 High profile. PMID:19789112
In the stereoscopic frame-compatible format, the separate high-definition left and high-definition right views are reduced in resolution and packed to fit within the same video frame as a conventional two-dimensional high-definition signal. This format has been suggested for 3DTV since it does not require additional transmission bandwidth and entails only small changes to the existing broadcasting infrastructure. In some instances, the frame-compatible format might be used to deliver both 2D and 3D services, e.g., for over-the-air television services. In those cases, the video quality of the 2D service is bound to decrease since the 2D signal will have to be generated by up-converting one of the two views. In this study, we investigated such loss by measuring the perceptual image quality of 1080i and 720p up-converted video as compared to that of full-resolution original 2D video. The video was encoded with either an MPEG-2 or an H.264/AVC codec at different bit rates and presented for viewing with either no polarized glasses (2D viewing mode) or with polarized glasses (3D viewing mode). The results confirmed a loss of video quality in the up-converted 2D material. The loss due to the sampling processes inherent to the frame-compatible format was rather small for both 1080i and 720p video formats; the loss became more substantial with encoding, particularly for MPEG-2 encoding. The 3D viewing mode provided higher quality ratings, possibly because the visibility of the degradations was reduced.
Speranza, Filippo; Tam, Wa James; Vázquez, Carlos; Renaud, Ronald; Blanchfield, Phil
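The packing and up-conversion steps described in the abstract above can be sketched in a few lines. This is a hypothetical illustration using simple column decimation and pixel repetition; a real broadcast chain would use proper anti-aliasing and interpolation filters:

```python
import numpy as np

def pack_side_by_side(left, right):
    """Frame-compatible side-by-side packing: halve each view horizontally
    (plain column decimation stands in for a filtered downsample)."""
    return np.hstack([left[:, ::2], right[:, ::2]])

def upconvert_left(packed):
    """Generate a 2D service frame from the packed left half by pixel
    repetition (a real up-converter would interpolate)."""
    half_width = packed.shape[1] // 2
    return np.repeat(packed[:, :half_width], 2, axis=1)

# Toy 4x4 "views"
left = np.arange(16, dtype=float).reshape(4, 4)
right = left + 100.0

packed = pack_side_by_side(left, right)
restored = upconvert_left(packed)
print(packed.shape, restored.shape)  # packed frame matches one view's size
```

Note that only the even columns of the original left view survive the round trip exactly; the quality loss measured in the study comes from everything discarded and re-synthesized in between.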
This study extends product placement research by testing the impact of interactivity on product placement effectiveness. The results suggest that when children cannot interact with the placements in video games, perceptual fluency is the underlying mechanism leading to positive affect. Therefore, the effects are only evident in a stimulus-based choice where the same stimulus is provided as a cue. However,
Cathode ray tubes (CRTs) display images refreshed at high frequency, and the temporal waveform of each pixel is a luminance impulse only a few milliseconds long. Although humans are perceptually oblivious to this flicker, we show in V1 in macaque monkeys and in humans that extracellularly recorded action potentials (spikes) and visual-evoked potentials (VEPs) align with the video impulses, particularly
Patrick E. Williams; Ferenc Mechler; James Gordon; Robert Shapley; Michael J. Hawken
We consider the problem of spatially and temporally registering multiple video sequences of dynamical scenes which contain, but are not limited to, nonrigid objects such as fireworks, flags fluttering in the wind, etc., taken from different vantage points. This problem is extremely challenging due to the presence of complex variations in the appearance of such dynamic scenes. In this paper, we propose a simple algorithm for matching such complex scenes. Our algorithm does not require the cameras to be synchronized, and is not based on frame-by-frame or volume-by-volume registration. Instead, we model each video as the output of a linear dynamical system and transform the task of registering the video sequences to that of registering the parameters of the corresponding dynamical models. As these parameters are not uniquely defined, one cannot directly compare them to perform registration. We resolve these ambiguities by jointly identifying the parameters from multiple video sequences, and converting the identified parameters to a canonical form. This reduces the video registration problem to a multiple image registration problem, which can be efficiently solved using existing image matching techniques. We test our algorithm on a wide variety of challenging video sequences and show that it matches the performance of significantly more computationally expensive existing methods. PMID:21088325
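The core idea of modeling each video as the output of a linear dynamical system can be sketched with a minimal subspace-style identification. This is a generic, dynamic-texture-like sketch under simplifying assumptions (noise-free, rank-reduced SVD), not the authors' joint-identification or canonical-form procedure:

```python
import numpy as np

def identify_lds(Y, n_states):
    """Fit y_t = C x_t, x_{t+1} = A x_t to a matrix of vectorized frames
    (one frame per column) via a rank-reduced SVD."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                          # observation matrix
    X = np.diag(s[:n_states]) @ Vt[:n_states]    # state trajectory
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])     # least-squares dynamics
    return A, C, X

# Toy "video": 64 pixels, 100 frames, an exactly rank-2 oscillating pattern
rng = np.random.default_rng(0)
t = np.arange(100)
Y = (np.outer(rng.random(64), np.sin(0.3 * t))
     + np.outer(rng.random(64), np.cos(0.3 * t)))

A, C, X = identify_lds(Y, n_states=2)
pred = C @ (A @ X[:, :-1])                       # one-step prediction
rel_err = np.linalg.norm(pred - Y[:, 1:]) / np.linalg.norm(Y[:, 1:])
print(rel_err < 1e-6)                            # exact linear dynamics recovered
```

Registration then operates on the identified (A, C) parameters rather than on pixels; as the abstract notes, these parameters are only defined up to a change of basis, which is why a canonical form is needed before comparison.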
A video sequence is more than a sequence of still images. It contains a strong spatial–temporal correlation between the regions of consecutive frames. The most important characteristic of videos is the perceived motion of foreground objects across frames. The motion of foreground objects dramatically changes the importance of those objects in a scene and leads to a different saliency map
Dynamic changes of object positions provide an important clue for video characterization. In the present work, we exploit the dynamic information present over different frames of a sports video to characterize the change in the configuration of players across different frames. For scene dynamics characterization, the locations of players are first detected using motion-based segmentation. We then
Currently, few studies focus on analysing the degree of the Player eXperience (PX) in video games. Video games have now become interactive entertainment systems with a high economic impact on society; interactive systems characterized by their subjectivity, which differ from other systems in that their main objective is to entertain and amuse the user (player). This work discusses the analysis
José Luis González Sánchez; Francisco Luis Gutierrez Vela; Francisco Montero Simarro; Natalia Padilla Zea
Virtual studio technology enables the mixing of physical and digital 3D objects and thus expands the way of representing design ideas in terms of virtual video prototypes, which offers new possibilities for designers by combining elements of prototypes, mock-ups, scenarios, and conventional video. In this article we report our initial experience in the domain of pervasive healthcare with producing virtual
Jakob Bardram; Claus Bossen; Andreas Lykke-Olesen; Rune Nielsen; Kim Halskov Madsen
Oak Ridge National Laboratory staff have developed a video compression system for low-bandwidth remote operations. The objective is to provide real-time video at data rates comparable to available tactical radio links, typically 16 to 64 thousand bits per...
Objective video quality measurement has become an important issue, as multimedia services are now widely available over the Internet and other wireless communication media. Traditionally, professional CRT monitors have been used to measure subjective video quality. However, the majority of users have LCD, plasma display panel (PDP), or consumer-grade CRT monitors. We compared the subjective video quality of various TV and LCD PC monitors. Subjective tests were performed with a wide range of video sequences using different monitors, and their correlations were analyzed. Although there were high correlations among the various display monitors, care should be taken in selecting a monitor for certain applications.
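The cross-display comparison described above boils down to correlating subjective scores gathered on the different monitors. A minimal sketch, using made-up mean opinion scores (the values below are hypothetical, not the study's data):

```python
import numpy as np

# Hypothetical mean opinion scores (1-5 scale) for seven clips
# rated on two different displays
mos_crt = np.array([4.2, 3.8, 2.5, 1.9, 3.1, 4.5, 2.2])
mos_lcd = np.array([4.0, 3.9, 2.7, 2.1, 3.0, 4.4, 2.6])

# Pearson correlation between the two rating sets
r = np.corrcoef(mos_crt, mos_lcd)[0, 1]
print(f"cross-display correlation: r = {r:.3f}")
```

A high r indicates the displays rank the clips similarly; systematic offsets between displays (one always rating higher) would not show up in r, which is one reason care is still needed when choosing a reference monitor.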
Psychophysical phenomena such as categorical perception and the perceptual magnet effect indicate that our auditory perceptual spaces are warped for some stimuli. This paper investigates the effects of two different kinds of training on auditory perceptual space. It is first shown that categorization training using non-speech stimuli, in which subjects learn to identify stimuli within a particular frequency range
Frank H. Guenther; Fatima T. Husain; Michael A. Cohen; Barbara G. Shinn-Cunningham
Multidimensional scaling (MDS) techniques provide a promising measurement strategy for characterizing individual differences in cognitive processing, which many clinical theories associate with the development, maintenance, and treatment of psychopathology. The authors describe the use of deterministic and probabilistic MDS techniques for investigating numerous aspects of perceptual organization, such as dimensional attention, perceptual correlation, within-attribute organization, and perceptual variability.
Teresa A. Treat; Richard M. McFall; Richard J. Viken; Robert M. Nosofsky; David B. MacKay; John K. Kruschke
Recent evidence from cognitive neuroscience suggests that certain cognitive processes employ perceptual representations. Inspired by this evidence, a few researchers have proposed that cognition is inherently perceptual. They have developed an innovative theoretical approach that rests on the notion of perceptual simulation and marshaled several…
Certain visual stimuli can give rise to contradictory perceptions. In this paper we examine the temporal dynamics of perceptual reversals experienced with biological motion, comparing these dynamics to those observed with other ambiguous structure from motion (SFM) stimuli. In our first experiment, naïve observers monitored perceptual alternations with an ambiguous rotating walker, a figure that randomly alternates between walking in clockwise (CW) and counter-clockwise (CCW) directions. While the number of reported reversals varied between observers, the observed dynamics (distribution of dominance durations, CW/CCW proportions) were comparable to those experienced with an ambiguous kinetic depth cylinder. In a second experiment, we compared reversal profiles with rotating and standard point-light walkers (i.e. non-rotating). Over multiple test repetitions, three out of four observers experienced consistently shorter mean percept durations with the rotating walker, suggesting that the added rotational component may speed up reversal rates with biomotion. For both stimuli, the drift in alternation rate across trial and across repetition was minimal. In our final experiment, we investigated whether reversals with the rotating walker and a non-biological object with similar global dimensions (rotating cuboid) occur at random phases of the rotation cycle. We found evidence that some observers experience peaks in the distribution of response locations that are relatively stable across sessions. Using control data, we discuss the role of eye movements in the development of these reversal patterns, and the related role of exogenous stimulus characteristics. In summary, we have demonstrated that the temporal dynamics of reversal with biological motion are similar to other forms of ambiguous SFM. We conclude that perceptual switching with biological motion is a robust bistable phenomenon.
This paper describes an automated video analysis based monitoring system with processing at the sensor edge to watch and report certain predetermined events and unusual activities in remote areas that may be in unfriendly zones. The prototype system developed here involves content extraction from video streams collected by unattended ground cameras, tracking of objects, detection of events, and assessment of scenes for anomalous situations. The application requirements impose efficiency constraints on video analysis algorithms due to the low power available on the sensor processing board. We present efficient video analysis algorithms for detection, tracking and classification of objects, analysis of extracted object and scene information to detect specific events as well as anomalous or novel situations at the video camera level. Our multi-tier and modular video analysis approach uses a fast space-based peripheral vision component for quick spatially based tracking of objects, detailed object- or scene-based feature extractors, and data-driven Support Vector Machine (SVM) classifiers that handle feature-based analysis at multiple data levels. Our algorithms are developed and tested on a PC platform but designed to match the processing and power limitations of the target hardware platform. The video object detection and tracking components have been implemented on a Texas Instruments DM642 evaluation board for assessing the feasibility of the prototype system.
Guler, Sadiye; Garg, Kshitiz; Silverstein, Jason A.
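The SVM classification stage of a system like the one above can be illustrated with a toy example. The features and classes here are hypothetical stand-ins for whatever object descriptors the tracker extracts, not the paper's actual feature set:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical per-object features: [aspect_ratio, normalized_speed]
vehicles = rng.normal(loc=[2.5, 0.8], scale=0.2, size=(50, 2))   # class 0
persons = rng.normal(loc=[0.4, 0.2], scale=0.05, size=(50, 2))   # class 1
X = np.vstack([vehicles, persons])
y = np.array([0] * 50 + [1] * 50)

# Train an RBF-kernel SVM on the labeled object features
clf = SVC(kernel="rbf").fit(X, y)

# Classify two new tracked objects
pred = clf.predict([[2.4, 0.7], [0.45, 0.18]])
print(pred)  # expect class 0 (vehicle-like) then class 1 (person-like)
```

On an embedded target like the DM642 mentioned above, only the trained model's evaluation would run on-board; training would happen offline on the PC platform.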
The Tracker Video Analysis and Modeling Tool allows students to model and analyze the motion of objects in videos. By overlaying simple dynamical models directly onto videos, students may see how well a model matches the real world. Interference patterns and spectra can also be analyzed with Tracker. Tracker 4.81 installers are available on Linux, Mac OS X, and Windows and include the Xuggle open source video engine. Tracker 4.81 Windows Installer Tracker 4.81 Mac OS X Installer Tracker 4.81 Linux 32-bit Installer - Instructions Tracker 4.81 Linux 64-bit Installer - Instructions Tracker is an Open Source Physics tool built on the OSP code library. Additional Tracker resources, demonstration experiments, and videos can be found by searching ComPADRE for "Tracker." Additional Tracker resources including Tracker help and sample videos are available from the Tracker home page at Cabrillo College below.
This 115-page annotated bibliography contains material on perceptual motor development. The introductory portion of the bibliography presents general reading on perception, learning, and development. The first portion contains annotated works by six specific authors. The second portion presents works grouped under the following headings: a)…
American Association for Health, Physical Education, and Recreation, Washington, DC.
Despite decades of studies of human infants, a still open question concerns the role of visual experience in the development of the ability to perceive complete shapes over partial occlusion. Previous studies show that newborns fail to manifest this ability, either because they lack the visual experience required for perceptual completion or…
Recent work demonstrates that learning to understand noise-vocoded (NV) speech alters sublexical perceptual processes but is enhanced by the simultaneous provision of higher-level, phonological, but not lexical content (Hervais-Adelman, Davis, Johnsrude, & Carlyon, 2008), consistent with top-down learning (Davis, Johnsrude, Hervais-Adelman,…
Hervais-Adelman, Alexis G.; Davis, Matthew H.; Johnsrude, Ingrid S.; Taylor, Karen J.; Carlyon, Robert P.
Face processing changes when a face is learned with personally relevant information. In a five-day learning paradigm, faces were presented with rich semantic stories that conveyed personal information about the faces. Event-related potentials were recorded before and after learning during a passive viewing task. When faces were novel, we observed the expected N170 repetition effect: a reduction in amplitude following face repetition. However, when faces were learned with personal information, the N170 repetition effect was eliminated, suggesting that semantic information modulates the N170 repetition effect. To control for the possibility that a simple perceptual effect contributed to the change in the N170 repetition effect, another experiment was conducted using stories that were not related to the person (i.e., stories about rocks and volcanoes). Although viewers were exposed to the faces an equal amount of time, the typical N170 repetition effect was observed, indicating that personal semantic information associated with a face, and not simply perceptual exposure, produced the observed reduction in the N170 repetition effect. These results are the first to reveal a critical perceptual change in face processing as a result of learning person-related information. The results have important implications for researchers studying face processing, as well as learning and memory in general, as they demonstrate that perceptual information alone is not enough to establish familiarity akin to real-world person learning. PMID:18752406
Designed for parents, the guide offers instructions for home activities to supplement the school program for children with perceptual motor disturbances. An individual program sheet is provided; behavioral characteristics and the child's need for structure are explained. Activities detailed include motor planning, body image, fine motor…
It is difficult to see how current models of discourse comprehension can be “scaled up” to account for the rich situation models that may be constructed during naturalistic language comprehension, as when readers are immersed in the story world. Recent proposals about embodied cognition and perceptual symbols, such as those put forth by Glenberg and Robertson and by Roth might
In this position paper we attempt to derive an architecture and mechanism for perceptual associative memory and learning for software agents and cognitive robots from what is known, or believed, about the same faculties in human and other animal cognition. Based on that of the IDA model of global workspace theory, a conceptual and computational model of cognition, this architecture,
This paper introduces a novel speech enhancement system based on a wavelet denoising framework. In this system, the noisy speech is first preprocessed using a generalized spectral subtraction method to initially lower the noise level with negligible speech distortion. A perceptual wavelet transform is then used to decompose the resulting speech signal into critical bands. Threshold estimation is implemented that is
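The spectral subtraction preprocessing step can be sketched as plain magnitude subtraction with a spectral floor. This is a deliberately simplified version (non-overlapping rectangular frames, a single noise-reference frame); the paper's generalized method and the subsequent perceptual wavelet stage are omitted:

```python
import numpy as np

def spectral_subtract(noisy, noise_ref, frame_len=256, floor=0.01):
    """Per-frame magnitude spectral subtraction; phase is kept unchanged.
    Non-overlapping frames, for brevity only."""
    noise_mag = np.abs(np.fft.rfft(noise_ref[:frame_len]))
    out = np.zeros_like(noisy)
    for start in range(0, len(noisy) - frame_len + 1, frame_len):
        spec = np.fft.rfft(noisy[start:start + frame_len])
        # Subtract the noise magnitude estimate, clamped at a spectral floor
        mag = np.maximum(np.abs(spec) - noise_mag, floor * np.abs(spec))
        out[start:start + frame_len] = np.fft.irfft(mag * np.exp(1j * np.angle(spec)))
    return out

# Toy demo: a 440 Hz tone at 8 kHz sampling plus white noise
rng = np.random.default_rng(1)
t = np.arange(4096) / 8000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noise = 0.3 * rng.standard_normal(t.size)
noisy = clean + noise

enhanced = spectral_subtract(noisy, noise)
# Every bin's magnitude strictly shrinks, so total signal energy must drop
print(np.sum(enhanced ** 2) < np.sum(noisy ** 2))
```

The spectral floor is what prevents the "musical noise" artifacts of naive subtraction from becoming complete spectral holes; a production system would also use overlapping windowed frames.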
This paper provides a classification of perceptual issues in augmented reality, created with a visual processing and interpretation pipeline in mind. We organize issues into ones related to the environment, capturing, augmentation, display, and individual user differences. We also illuminate issues associated with more recent platforms such as handhelds or projector-camera systems. Throughout, we describe current approaches to addressing these
To understand the brain mechanisms of olfaction we must understand the rules that govern the link between odorant structure and odorant perception. Natural odors are in fact mixtures made of many molecules, and there is currently no method to look at the molecular structure of such odorant-mixtures and predict their smell. In three separate experiments, we asked 139 subjects to rate the pairwise perceptual similarity of 64 odorant-mixtures ranging in size from 4 to 43 mono-molecular components. We then tested alternative models to link odorant-mixture structure to odorant-mixture perceptual similarity. Whereas a model that considered each mono-molecular component of a mixture separately provided a poor prediction of mixture similarity, a model that represented the mixture as a single structural vector provided consistent correlations between predicted and actual perceptual similarity (r ≥ 0.49, p < 0.001). An optimized version of this model yielded a correlation of r = 0.85 (p < 0.001) between predicted and actual mixture similarity. In other words, we developed an algorithm that can look at the molecular structure of two novel odorant-mixtures, and predict their ensuing perceptual similarity. That this goal was attained using a model that considers the mixtures as a single vector is consistent with a synthetic rather than analytical brain processing mechanism in olfaction. PMID:24068899
Snitz, Kobi; Yablonka, Adi; Weiss, Tali; Frumin, Idan; Khan, Rehan M; Sobel, Noam
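The "single structural vector per mixture" model can be sketched with toy descriptors. Here one-hot vectors are a hypothetical stand-in for real molecular descriptors, and the paper's optimized weighting is omitted; only the sum-then-compare idea is shown:

```python
import numpy as np

def mixture_similarity(components_a, components_b):
    """Sum each mixture's per-molecule descriptor vectors into one vector,
    then compare the mixtures by the cosine of the angle between the sums."""
    va = components_a.sum(axis=0)
    vb = components_b.sum(axis=0)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

# Ten hypothetical molecules with one-hot "descriptors" (a toy stand-in)
molecules = np.eye(10)
mix1 = molecules[[0, 1, 2, 3]]   # 4-component mixture
mix2 = molecules[[0, 1, 2, 4]]   # shares 3 of 4 components with mix1
mix3 = molecules[[5, 6, 7, 8]]   # shares no components with mix1

print(mixture_similarity(mix1, mix2))  # 0.75 (high overlap)
print(mixture_similarity(mix1, mix3))  # 0.0  (disjoint)
```

The contrast with a component-by-component model is that here two mixtures sharing no individual molecule can still score as similar if their summed descriptor vectors point in similar directions, which is the behavior the abstract links to synthetic processing.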
Olfactory memory is especially persistent. The current study explored whether this applies to a form of perceptual learning, in which experience of an odor mixture results in greater judged similarity between its elements. Experiment 1A contrasted 2 forms of interference procedure, "compound" (mixture AW, followed by presentation of new mixtures…
Stevenson, Richard J.; Case, Trevor I.; Tomiczek, Caroline
Two groups of hypothetically psychosis-prone subjects were chosen from among college students who scored deviantly high on scales of Physical Anhedonia (n = 50) or Perceptual Aberration (n = 65). Scores on these two scales had a small negative correlation, indicating that the scales identify different sets of deviant subjects. These experimental subjects and a control group (n = 66)
Loren J. Chapman; William S. Edell; Jean P. Chapman
In this paper I look at dynamic mental representations, motion detection under conditions of certainty or uncertainty, perceptual adaptation, and priming of motion direction. The goal is to bridge the boundaries created in part by the use of different terminology within different literatures. The most fruitful parallel may be between the phenomenon of dynamic mental representation and representational momentum
Pigeons responded in a perceptual categorization task with six different stimuli (shades of gray), three of which were to be classified as "light" or "dark", respectively. Reinforcement probability for correct responses was varied from 0.2 to 0.6 across blocks of sessions and was unequal for correct light and dark responses. Introduction of a new…
We present a framework for accelerating interactive rendering, grounded in psychophysical models of visual perception. This framework is applicable to multiresolution rendering techniques that use a hierarchy of local simplification operations. Our method drives those local operations directly by perceptual metrics; the effect of each simplification on the final image is considered in terms of the contrast the operation will
People categorized pairs of perceptual stimuli that varied in both category membership and pairwise similarity. Experiments 1 and 2 showed categorization of 1 color of a pair to be reliably contrasted from that of the other. This similarity-based contrast effect occurred only when the context stimulus was relevant for the categorization of the…
Hampton, James A.; Estes, Zachary; Simmons, Claire L.
Performance on perceptual tasks requiring the discrimination of brief, temporally proximate or temporally varying sensory stimuli (temporal processing tasks) is impaired in some individuals with developmental language disorder and/or dyslexia. Little is known about how these temporal processes in perception develop and how they relate to language and reading performance in the normal population. The present study examined performance on
Kerry M. M. Walker; Susan E. Hall; Raymond M. Klein; Dennis P. Phillips
Results of an Austrian-German comparison pertaining to different aspects of job satisfaction are presented. It is argued that even at this early stage in their development, perceptual indicators can be used to reveal overall trends and to point up trouble spots where socio-economic action on the part of the authorities is called for.
In this paper we propose an ontology and a software architecture for observing and modeling context and situation. We are especially concerned with the perceptual components for context awareness. We propose a model in which a user's context is described by a set of roles and relations. Different configurations of roles and relations correspond to situations within the context.
James L. Crowley; Joëlle Coutaz; Gaeten Rey; Patrick Reignier
Describes three-dimensional computer aided design (CAD) models for every component in a representative mechanical system; the CAD models made it easy to generate 3-D animations that are ideal for teaching perceptual skills in multimedia computer-based technical training. Fifteen illustrations are provided. (AEF)
This study investigated some of the differences in a perceptual discrimination performance task due to: (1) sex and (2) level of aspiration of each group of subjects. It was found that there is an interaction effect between femaleness, maleness, and aspiration level. (Author)
Perceptual psychology widely operationalizes color appearance as a construct with very close, even isomorphic, ties to color naming structure. Indeed, a considerable body of psychological and psychophysics research uses naming-based tasks to derive structural properties of color appearance space. New research investigating the relations linking color similarity and color naming structures suggests that assumptions involving strong structural correspondences between