Sample records for scenes motion-based object

  1. Modeling repetitive motions using structured light.

    PubMed

    Xu, Yi; Aliaga, Daniel G

    2010-01-01

    Obtaining models of dynamic 3D objects is an important part of content generation for computer graphics. Numerous methods have been extended from static scenarios to model dynamic scenes. If the states or poses of the dynamic object repeat often during a sequence (though not necessarily periodically), we call such a motion repetitive. Many objects, such as toys, machines, and humans, undergo repetitive motions. Our key observation is that when a motion state repeats, we can sample the scene under the same motion state again but with a different set of parameters, thus providing more information about each motion state. This enables robust acquisition of dense 3D information that would otherwise be difficult to obtain for objects with repetitive motions, using only simple hardware. After the motion sequence, we group temporally disjoint observations of the same motion state together and produce a smooth space-time reconstruction of the scene. Effectively, the dynamic scene modeling problem is converted into a series of static scene reconstructions, which are easier to tackle. The varying sampling parameters can be, for example, structured-light patterns, illumination directions, or viewpoints, each resulting in a different modeling technique. Based on this observation, we present an image-based motion-state framework and demonstrate our paradigm using either a synchronized or an unsynchronized structured-light acquisition method.
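
    As a rough illustration of the grouping step (a sketch, not the authors' code), the snippet below clusters frames of a repetitive sequence into motion states by greedy appearance matching; the function name group_by_state and its threshold are hypothetical:

      import numpy as np

      def group_by_state(frames, thresh):
          """frames: (T, H, W) array; returns one motion-state label per frame."""
          states, labels = [], []
          for f in frames:
              dists = [np.mean((f - s) ** 2) for s in states]
              if states and min(dists) < thresh:
                  labels.append(int(np.argmin(dists)))   # revisit a known state
              else:
                  labels.append(len(states))             # open a new state
                  states.append(f)
          return np.array(labels)

      a, b = np.zeros((8, 8)), np.ones((8, 8))
      print(group_by_state(np.stack([a, b, a, b]), thresh=0.1))  # [0 1 0 1]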

  2. Structure preserving clustering-object tracking via subgroup motion pattern segmentation

    NASA Astrophysics Data System (ADS)

    Fan, Zheyi; Zhu, Yixuan; Jiang, Jiao; Weng, Shuqin; Liu, Zhiwen

    2018-01-01

    Tracking clustering objects with similar appearances simultaneously in collective scenes is a challenging task in the field of collective motion analysis. Recent work on clustering-object tracking often suffers from poor tracking accuracy and poor real-time performance because the motion differences among objects are neglected or misjudged. To address this problem, we propose a subgroup motion pattern segmentation framework based on a multilayer clustering structure and establish spatial constraints only among objects in the same subgroup, whose members share a consistent motion direction and close spatial positions. In addition, the subgroup segmentation results are updated dynamically because crowd motion patterns are changeable and affected by objects' destinations and scene structures. The spatial structure information, combined with appearance similarity information, is used in the structure-preserving object tracking framework to track objects. Extensive experiments conducted on several datasets containing multiple real-world crowd scenes validate the accuracy and the robustness of the presented algorithm for tracking objects in collective scenes.

  3. A system for learning statistical motion patterns.

    PubMed

    Hu, Weiming; Xiao, Xuejuan; Fu, Zhouyu; Xie, Dan; Tan, Tieniu; Maybank, Steve

    2006-09-01

    Analysis of motion patterns is an effective approach for anomaly detection and behavior prediction. Current approaches for the analysis of motion patterns depend on known scenes, where objects move in predefined ways. It is highly desirable to automatically construct object motion patterns which reflect the knowledge of the scene. In this paper, we present a system for automatically learning motion patterns for anomaly detection and behavior prediction, based on a proposed algorithm for robustly tracking multiple objects. In the tracking algorithm, foreground pixels are clustered using a fast, accurate fuzzy K-means algorithm. Growth and prediction of the cluster centroids of foreground pixels ensure that each cluster centroid is associated with a moving object in the scene. In the algorithm for learning motion patterns, trajectories are clustered hierarchically using spatial and temporal information and then each motion pattern is represented with a chain of Gaussian distributions. Based on the learned statistical motion patterns, statistical methods are used to detect anomalies and predict behaviors. Our system is tested using image sequences acquired, respectively, from a crowded real traffic scene and a model traffic scene. Experimental results show the robustness of the tracking algorithm, the efficiency of the algorithm for learning motion patterns, and the encouraging performance of the algorithms for anomaly detection and behavior prediction.
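
    A minimal numpy sketch of the fuzzy K-means idea used in the tracking stage, assuming the standard fuzzy c-means updates (fuzzifier m, inverse-distance memberships); the function and its parameters are illustrative, not the authors' implementation:

      import numpy as np

      def fuzzy_kmeans(points, k, m=2.0, iters=50, seed=0):
          """Cluster foreground-pixel coordinates with soft memberships."""
          rng = np.random.default_rng(seed)
          centroids = points[rng.choice(len(points), k, replace=False)]
          for _ in range(iters):
              # distance of every point to every centroid, shape (n, k)
              d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2) + 1e-9
              # fuzzy membership: inverse-distance ratios raised to 2/(m-1)
              u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
              # move each centroid to the membership-weighted mean of the points
              w = u ** m
              centroids = (w.T @ points) / w.sum(axis=0)[:, None]
          return centroids, u

      # toy example: two blobs of "foreground pixels"
      pts = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 8])
      centers, memberships = fuzzy_kmeans(pts, k=2)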

  4. Method for separating video camera motion from scene motion for constrained 3D displacement measurements

    NASA Astrophysics Data System (ADS)

    Gauthier, L. R.; Jansen, M. E.; Meyer, J. R.

    2014-09-01

    Camera motion is a potential problem when a video camera is used to perform dynamic displacement measurements. If the scene camera moves at the wrong time, the apparent motion of the object under study can easily be confused with the real motion of the object. In some cases, it is practically impossible to prevent camera motion, as for instance, when a camera is used outdoors in windy conditions. A method to address this challenge is described that provides an objective means to measure the displacement of an object of interest in the scene, even when the camera itself is moving in an unpredictable fashion at the same time. The main idea is to synchronously measure the motion of the camera and to use those data ex post facto to subtract out the apparent motion in the scene that is caused by the camera motion. The motion of the scene camera is measured by using a reference camera that is rigidly attached to the scene camera and oriented towards a stationary reference object. For instance, this reference object may be on the ground, which is known to be stationary. It is necessary to calibrate the reference camera by simultaneously measuring the scene images and the reference images at times when it is known that the scene object is stationary and the camera is moving. These data are used to map camera movement data to apparent scene movement data in pixel space and subsequently used to remove the camera movement from the scene measurements.
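
    The ex post facto correction can be pictured as fitting, during the calibration window, a linear map from reference-camera pixel displacements to apparent scene displacements, and later subtracting the mapped component from raw measurements. A toy numpy sketch under that assumption (all names and the 2x2 linear model are hypothetical simplifications):

      import numpy as np

      rng = np.random.default_rng(1)
      A_true = np.array([[1.1, 0.02], [-0.03, 0.95]])  # camera-to-scene pixel map

      # calibration window: scene object known stationary, camera shaking,
      # so the apparent scene motion is purely camera-induced
      u_ref_cal = rng.normal(size=(200, 2))            # reference-camera displacements
      u_scene_cal = u_ref_cal @ A_true + 0.05 * rng.normal(size=(200, 2))

      # least-squares fit of the map: u_scene ~ u_ref @ A
      A, *_ = np.linalg.lstsq(u_ref_cal, u_scene_cal, rcond=None)

      def object_displacement(u_scene_raw, u_ref):
          """Apparent displacement minus the part explained by camera motion."""
          return u_scene_raw - u_ref @ A

      u_ref = rng.normal(size=(1, 2))
      u_obj = np.array([[0.4, -0.2]])                  # real object motion (px)
      print(object_displacement(u_obj + u_ref @ A_true, u_ref))  # ~ [0.4, -0.2]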

  5. A Saccade Based Framework for Real-Time Motion Segmentation Using Event Based Vision Sensors

    PubMed Central

    Mishra, Abhishek; Ghosh, Rohan; Principe, Jose C.; Thakor, Nitish V.; Kukreja, Sunil L.

    2017-01-01

    Motion segmentation is a critical pre-processing step for autonomous robotic systems to facilitate tracking of moving objects in cluttered environments. Event based sensors are low power analog devices that represent a scene by means of asynchronous information updates of only the dynamic details at high temporal resolution and, hence, require significantly fewer computations. However, motion segmentation using spatiotemporal data is a challenging task due to data asynchrony. Prior approaches for object tracking using neuromorphic sensors perform well while the sensor is static or when a known model of the object to be followed is available. To address these limitations, in this paper we develop a technique for generalized motion segmentation based on spatial statistics across time frames. First, we create micromotion on the platform to facilitate the separation of static and dynamic elements of a scene, inspired by human saccadic eye movements. Second, we introduce the concept of spike-groups as a methodology to partition spatio-temporal event groups, which facilitates computing scene statistics and characterizing the objects in the scene. Experimental results show that our algorithm is able to classify dynamic objects with a moving camera with a maximum accuracy of 92%. PMID:28316563

  6. Object Scene Flow

    NASA Astrophysics Data System (ADS)

    Menze, Moritz; Heipke, Christian; Geiger, Andreas

    2018-06-01

    This work investigates the estimation of dense three-dimensional motion fields, commonly referred to as scene flow. While great progress has been made in recent years, large displacements and adverse imaging conditions as observed in natural outdoor environments are still very challenging for current approaches to reconstruction and motion estimation. In this paper, we propose a unified random field model which reasons jointly about 3D scene flow as well as the location, shape and motion of vehicles in the observed scene. We formulate the problem as the task of decomposing the scene into a small number of rigidly moving objects sharing the same motion parameters. Thus, our formulation effectively introduces long-range spatial dependencies which commonly employed local rigidity priors are lacking. Our inference algorithm then estimates the association of image segments and object hypotheses together with their three-dimensional shape and motion. We demonstrate the potential of the proposed approach by introducing a novel challenging scene flow benchmark which allows for a thorough comparison of the proposed scene flow approach with respect to various baseline models. In contrast to previous benchmarks, our evaluation is the first to provide stereo and optical flow ground truth for dynamic real-world urban scenes at large scale. Our experiments reveal that rigid motion segmentation can be utilized as an effective regularizer for the scene flow problem, improving upon existing two-frame scene flow methods. At the same time, our method yields plausible object segmentations without requiring an explicitly trained recognition model for a specific object class.

  7. Detecting and Analyzing Multiple Moving Objects in Crowded Environments with Coherent Motion Regions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cheriyadat, Anil M.

    Understanding the world around us from large-scale video data requires vision systems that can perform automatic interpretation. While human eyes can unconsciously perceive independent objects in crowded scenes and other challenging operating environments, automated systems have difficulty detecting, counting, and understanding their behavior in similar scenes. Computer scientists at ORNL have developed a technology termed "Coherent Motion Region Detection" that involves identifying multiple independent moving objects in crowded scenes by aggregating low-level motion cues extracted from moving objects. Humans and other species exploit such low-level motion cues seamlessly to perform perceptual grouping for visual understanding. The algorithm detects and tracks feature points on moving objects, resulting in partial trajectories that span coherent 3D regions in the space-time volume defined by the video. In the case of multi-object motion, many possible coherent motion regions can be constructed around the set of trajectories. The unique approach in the algorithm is to identify all possible coherent motion regions, then extract a subset of motion regions based on an innovative measure to automatically locate moving objects in crowded environments. The software reports a snapshot of each object, a count, and derived statistics (count over time) from input video streams. The software can process video streamed over the internet or directly from a hardware device (camera).

  8. The Pop out of Scene-Relative Object Movement against Retinal Motion Due to Self-Movement

    ERIC Educational Resources Information Center

    Rushton, Simon K.; Bradshaw, Mark F.; Warren, Paul A.

    2007-01-01

    An object that moves is spotted almost effortlessly; it "pops out." When the observer is stationary, a moving object is uniquely identified by retinal motion. This is not so when the observer is also moving; as the eye travels through space all scene objects change position relative to the eye producing a complicated field of retinal motion.…

  9. Motion video analysis using planar parallax

    NASA Astrophysics Data System (ADS)

    Sawhney, Harpreet S.

    1994-04-01

    Motion and structure analysis in video sequences can lead to efficient descriptions of objects and their motions. Interesting events in videos can be detected using such an analysis--for instance, independent object motion when the camera itself is moving, or figure-ground segregation based on the saliency of a structure compared to its surroundings. In this paper we present a method for 3D motion and structure analysis that uses a planar surface in the environment as a reference coordinate system to describe a video sequence. The motion in the video sequence is described as the motion of the reference plane, and the parallax motion of all the non-planar components of the scene. It is shown how this method simplifies the otherwise hard general 3D motion analysis problem. In addition, a natural coordinate system in the environment is used to describe the scene, which can simplify motion based segmentation. This work is a part of an ongoing effort in our group towards video annotation and analysis for indexing and retrieval. Results from a demonstration system being developed are presented.
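
    A hedged OpenCV sketch of the planar-parallax decomposition: register the dominant plane between two grayscale frames with a RANSAC homography, then treat the residual motion of tracked points as parallax of non-planar structure or independent object motion. The function and thresholds are illustrative, not the paper's implementation:

      import cv2
      import numpy as np

      def planar_parallax(img1, img2):
          """img1, img2: consecutive grayscale frames (uint8)."""
          # track sparse features between the frames
          p1 = cv2.goodFeaturesToTrack(img1, maxCorners=500, qualityLevel=0.01, minDistance=7)
          p2, st, _ = cv2.calcOpticalFlowPyrLK(img1, img2, p1, None)
          p1, p2 = p1[st == 1].reshape(-1, 2), p2[st == 1].reshape(-1, 2)

          # homography of the reference plane (RANSAC keeps the dominant plane)
          H, _ = cv2.findHomography(p1, p2, cv2.RANSAC, 3.0)

          # residual after plane registration = parallax of off-plane points
          # plus independently moving objects
          p1_h = cv2.perspectiveTransform(p1.reshape(-1, 1, 2), H).reshape(-1, 2)
          residual = np.linalg.norm(p2 - p1_h, axis=1)
          return H, residual   # large residuals flag off-plane or moving points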

  10. Automatic acquisition of motion trajectories: tracking hockey players

    NASA Astrophysics Data System (ADS)

    Okuma, Kenji; Little, James J.; Lowe, David

    2003-12-01

    Computer systems that have the capability of analyzing complex and dynamic scenes play an essential role in video annotation. Scenes can be complex in such a way that there are many cluttered objects with different colors, shapes and sizes, and can be dynamic with multiple interacting moving objects and a constantly changing background. In reality, there are many scenes that are complex, dynamic, and challenging enough for computers to describe. These scenes include games of sports, air traffic, car traffic, street intersections, and cloud transformations. Our research is about the challenge of inventing a descriptive computer system that analyzes scenes of hockey games where multiple moving players interact with each other on a constantly moving background due to camera motions. Ultimately, such a computer system should be able to acquire reliable data by extracting the players' motions as trajectories, to query them by analyzing the descriptive information of the data, and to predict the motions of some hockey players based on the result of the query. Among these three major aspects of the system, we primarily focus on visual information of the scenes, that is, how to automatically acquire motion trajectories of hockey players from video. More precisely, we automatically analyze the hockey scenes by estimating parameters (i.e., pan, tilt, and zoom) of the broadcast cameras, tracking hockey players in those scenes, and constructing a visual description of the data by displaying trajectories of those players. Many technical problems in vision, such as fast and unpredictable player motions and rapid camera motions, make our challenge worth tackling. To the best of our knowledge, no automatic video annotation systems for hockey have been developed in the past. Although there are many obstacles to overcome, our efforts and accomplishments will hopefully establish the infrastructure of an automatic hockey annotation system and become a milestone for research in automatic video annotation in this domain.

  11. Motion of glossy objects does not promote separation of lighting and surface colour

    PubMed Central

    2017-01-01

    The surface properties of an object, such as texture, glossiness or colour, provide important cues to its identity. However, the actual visual stimulus received by the eye is determined by both the properties of the object and the illumination. We tested whether operational colour constancy for glossy objects (the ability to distinguish changes in the spectral reflectance of the object from changes in the spectrum of the illumination) was affected by rotational motion of either the object or the light source. The different chromatic and geometric properties of the specular and diffuse reflections provide the basis for this discrimination, and we systematically varied specularity to control the available information. Observers viewed animations of isolated objects undergoing either lighting or surface-based spectral transformations accompanied by motion. By varying the axis of rotation, and surface patterning or geometry, we manipulated: (i) motion-related information about the scene, (ii) relative motion between the surface patterning and the specular reflection of the lighting, and (iii) image disruption caused by this motion. Despite large individual differences in performance with static stimuli, motion manipulations neither improved nor degraded performance. As motion significantly disrupts frame-by-frame low-level image statistics, we infer that operational constancy depends on a high-level scene interpretation, which is maintained in all conditions. PMID:29291113

  12. Criterion-free measurement of motion transparency perception at different speeds

    PubMed Central

    Rocchi, Francesca; Ledgeway, Timothy; Webb, Ben S.

    2018-01-01

    Transparency perception often occurs when objects within the visual scene partially occlude each other or move at the same time, at different velocities, across the same spatial region. Although transparent motion perception has been extensively studied, we still do not understand how the distribution of velocities within a visual scene contributes to transparent perception. Here we use a novel psychophysical procedure to characterize the distribution of velocities in a scene that gives rise to transparent motion perception. To prevent participants from adopting a subjective decision criterion when discriminating transparent motion, we used an “odd-one-out,” three-alternative forced-choice procedure. Two intervals contained the standard—a random-dot kinematogram with dot speeds or directions sampled from a uniform distribution. The other interval contained the comparison—speeds or directions sampled from a distribution with the same range as the standard, but with a notch of different widths removed. Our results suggest that transparent motion perception is driven primarily by relatively slow speeds, and does not emerge when only very fast speeds are present within a visual scene. Transparent perception of moving surfaces is modulated by stimulus-based characteristics, such as the separation between the means of the overlapping distributions or the range of speeds presented within an image. Our work illustrates the utility of using objective, forced-choice methods to reveal the mechanisms underlying motion transparency perception. PMID:29614154

  13. A qualitative approach for recovering relative depths in dynamic scenes

    NASA Technical Reports Server (NTRS)

    Haynes, S. M.; Jain, R.

    1987-01-01

    This approach to dynamic scene analysis is a qualitative one. It computes relative depths using very general rules. The depths calculated are qualitative in the sense that the only information obtained is which object is in front of which others. The motion is qualitative in the sense that the only required motion data is whether objects are moving toward or away from the camera. Reasoning, which takes into account the temporal character of the data and the scene, is qualitative. This approach to dynamic scene analysis can tolerate imprecise data because in dynamic scenes the data are redundant.

  14. Camera pose estimation for augmented reality in a small indoor dynamic scene

    NASA Astrophysics Data System (ADS)

    Frikha, Rawia; Ejbali, Ridha; Zaied, Mourad

    2017-09-01

    Camera pose estimation remains a challenging task for augmented reality (AR) applications. Simultaneous localization and mapping (SLAM)-based methods are able to estimate the six-degrees-of-freedom camera motion while constructing a map of an unknown environment. However, these methods do not provide any reference for where to insert virtual objects, since they have no information about scene structure, and they may fail in cases of occlusion of three-dimensional (3-D) map points or of dynamic objects. This paper presents a real-time monocular piecewise planar SLAM method using the planar scene assumption. Using planar structures in the mapping process allows rendering virtual objects in a meaningful way on the one hand, and improving the precision of the camera pose and the quality of the 3-D reconstruction of the environment, by adding constraints on 3-D points and poses in the optimization process, on the other. We propose to exploit the rigid motion of the 3-D planes in the tracking process to enhance the system's robustness in the case of dynamic scenes. Experimental results show that using a constrained planar scene improves the accuracy and robustness of our system compared with classical SLAM systems.

  15. Computer-aided target tracking in motion analysis studies

    NASA Astrophysics Data System (ADS)

    Burdick, Dominic C.; Marcuse, M. L.; Mislan, J. D.

    1990-08-01

    Motion analysis studies require the precise tracking of reference objects in sequential scenes. In a typical situation, events of interest are captured at high frame rates using special cameras, and selected objects or targets are tracked on a frame by frame basis to provide necessary data for motion reconstruction. Tracking is usually done using manual methods which are slow and prone to error. A computer based image analysis system has been developed that performs tracking automatically. The objective of this work was to eliminate the bottleneck due to manual methods in high volume tracking applications such as the analysis of crash test films for the automotive industry. The system has proven to be successful in tracking standard fiducial targets and other objects in crash test scenes. Over 95 percent of target positions which could be located using manual methods can be tracked by the system, with a significant improvement in throughput over manual methods. Future work will focus on the tracking of clusters of targets and on tracking deformable objects such as airbags.
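
    A minimal sketch in the spirit of the described tracker, using normalized cross-correlation template matching (OpenCV's matchTemplate) to re-locate a fiducial target inside a per-frame search window; the function and its arguments are hypothetical, not the system's code:

      import cv2

      def track_target(frame_gray, template, search_tl, search_br):
          """Return the best match position of the template in a search window."""
          x0, y0 = search_tl
          x1, y1 = search_br
          window = frame_gray[y0:y1, x0:x1]
          score = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
          _, best, _, loc = cv2.minMaxLoc(score)        # peak of the NCC surface
          return (x0 + loc[0], y0 + loc[1]), best       # position, confidence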

  16. Accuracy and Tuning of Flow Parsing for Visual Perception of Object Motion During Self-Motion

    PubMed Central

    Niehorster, Diederick C.

    2017-01-01

    How do we perceive object motion during self-motion using visual information alone? Previous studies have reported that the visual system can use optic flow to identify and globally subtract the retinal motion component resulting from self-motion to recover scene-relative object motion, a process called flow parsing. In this article, we developed a retinal motion nulling method to directly measure and quantify the magnitude of flow parsing (i.e., flow parsing gain) in various scenarios to examine the accuracy and tuning of flow parsing for the visual perception of object motion during self-motion. We found that flow parsing gains were below unity for all displays in all experiments; and that increasing self-motion and object motion speed did not alter flow parsing gain. We conclude that visual information alone is not sufficient for the accurate perception of scene-relative motion during self-motion. Although flow parsing performs global subtraction, its accuracy also depends on local motion information in the retinal vicinity of the moving object. Furthermore, the flow parsing gain was constant across common self-motion or object motion speeds. These results can be used to inform and validate computational models of flow parsing. PMID:28567272
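
    The flow parsing gain can be read as the fraction of the flow-predicted retinal component that the visual system actually subtracts. A small numpy sketch of how a nulling measurement maps to a gain estimate, assuming the nulling velocity is collinear with the flow prediction (names and numbers illustrative):

      import numpy as np

      # v_flow: retinal velocity the optic flow predicts at the probe location;
      # v_null: retinal velocity that makes the probe appear scene-stationary.
      # Full subtraction would need v_null = -v_flow, so the gain is the
      # projection of -v_null onto v_flow.
      def flow_parsing_gain(v_null, v_flow):
          return -np.dot(v_null, v_flow) / np.dot(v_flow, v_flow)

      v_flow = np.array([3.0, 0.0])             # deg/s, from the flow field
      v_null = np.array([-2.4, 0.0])            # deg/s, measured nulling point
      print(flow_parsing_gain(v_null, v_flow))  # 0.8 -> below unity, as reported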

  17. Shape-based human detection for threat assessment

    NASA Astrophysics Data System (ADS)

    Lee, Dah-Jye; Zhan, Pengcheng; Thomas, Aaron; Schoenberger, Robert B.

    2004-07-01

    Detection of intrusions for early threat assessment requires the capability of distinguishing whether the intrusion is a human, an animal, or another object. Most low-cost security systems use simple electronic motion detection sensors to monitor motion or the location of objects within the perimeter. Although cost effective, these systems suffer from high rates of false alarm, especially when monitoring open environments. Any moving object, including an animal, can falsely trigger the security system. Other security systems that utilize video equipment require human interpretation of the scene in order to make real-time threat assessments. A shape-based human detection technique has been developed for accurate early threat assessment in open and remote environments. Potential threats are isolated from the static background scene using differential motion analysis, and contours of the intruding objects are extracted for shape analysis. Contour points are simplified by removing redundant points connecting short and straight line segments and preserving only those with shape significance. Contours are represented in tangent space for comparison with shapes stored in a database. A power cepstrum technique has been developed to search for the best-matched contour in the database and to distinguish a human from other objects at different viewing angles and distances.

  18. Encodings of implied motion for animate and inanimate object categories in the two visual pathways.

    PubMed

    Lu, Zhengang; Li, Xueting; Meng, Ming

    2016-01-15

    Previous research has proposed two separate pathways for visual processing: the dorsal pathway for "where" information vs. the ventral pathway for "what" information. Interestingly, the middle temporal cortex (MT) in the dorsal pathway is involved in representing implied motion from still pictures, suggesting an interaction between motion and object related processing. However, the relationship between how the brain encodes implied motion and how the brain encodes object/scene categories is unclear. To address this question, fMRI was used to measure activity along the two pathways corresponding to different animate and inanimate categories of still pictures with different levels of implied motion speed. In the visual areas of both pathways, activity induced by pictures of humans and animals was hardly modulated by the implied motion speed. By contrast, activity in these areas correlated with the implied motion speed for pictures of inanimate objects and scenes. The interaction between implied motion speed and stimulus category was significant, suggesting different encoding mechanisms of implied motion for the animate-inanimate distinction. Further multivariate pattern analysis of activity in the dorsal pathway revealed significant effects of stimulus category that are comparable to the ventral pathway. Moreover, still pictures of inanimate objects/scenes with higher implied motion speed evoked activation patterns that were difficult to differentiate from those evoked by pictures of humans and animals, indicating a functional role of implied motion in the representation of object categories. These results provide novel evidence to support integrated encoding of motion and object categories, suggesting a rethink of the relationship between the two visual pathways.

  19. A Motion Detection Algorithm Using Local Phase Information

    PubMed Central

    Lazar, Aurel A.; Ukani, Nikul H.; Zhou, Yiyin

    2016-01-01

    Previous research demonstrated that global phase alone can be used to faithfully represent visual scenes. Here we provide a reconstruction algorithm using only local phase information. We also demonstrate that local phase alone can be effectively used to detect local motion. The local phase-based motion detector is akin to models employed to detect motion in biological vision, for example, the Reichardt detector. The local phase-based motion detection algorithm introduced here consists of two building blocks. The first building block measures the temporal change of the local phase. The temporal derivative of the local phase is shown to exhibit the structure of a second order Volterra kernel with two normalized inputs. We provide an efficient, FFT-based algorithm for computing the change of the local phase. The second processing building block implements the detector; it compares the maximum of the Radon transform of the local phase derivative with a chosen threshold. We demonstrate examples of applying the local phase-based motion detection algorithm to several video sequences. We also show how the locally detected motion can be used for segmenting moving objects in video scenes, and we compare our local phase-based algorithm to segmentation achieved with a widely used optic flow algorithm. PMID:26880882
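
    A rough numpy sketch of the first building block, assuming block-wise FFTs as the local-phase measurement and a wrapped temporal phase difference as the motion cue; this reduces the paper's Volterra-kernel formulation to its core idea (all names illustrative):

      import numpy as np

      def local_phase(frame, block=16):
          """Phase of the 2D FFT over non-overlapping blocks of the frame."""
          h, w = frame.shape
          h, w = h - h % block, w - w % block
          blocks = frame[:h, :w].reshape(h // block, block, w // block, block)
          return np.angle(np.fft.fft2(blocks.swapaxes(1, 2)))

      def phase_change(frame_t, frame_t1, block=16):
          """Temporal change of local phase; large values indicate motion."""
          dphi = local_phase(frame_t1, block) - local_phase(frame_t, block)
          # wrap to (-pi, pi] so 2*pi jumps do not dominate the derivative
          dphi = (dphi + np.pi) % (2 * np.pi) - np.pi
          return np.abs(dphi).mean(axis=(2, 3))   # one motion score per block

      f0 = np.random.rand(64, 64)
      f1 = np.roll(f0, 2, axis=1)                 # global shift = motion everywhere
      print(phase_change(f0, f1).shape)           # (4, 4) block-wise motion map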

  20. Motion-seeded object-based attention for dynamic visual imagery

    NASA Astrophysics Data System (ADS)

    Huber, David J.; Khosla, Deepak; Kim, Kyungnam

    2017-05-01

    This paper describes a novel system that finds and segments "objects of interest" in dynamic imagery (video): it (1) processes each frame using an advanced motion algorithm that pulls out regions exhibiting anomalous motion, and (2) extracts the boundary of each object of interest using a biologically-inspired segmentation algorithm based on feature contours. The system uses a series of modular, parallel algorithms, which allows many complicated operations to be carried out very quickly, and it can be used as a front-end to a larger system that includes object recognition and scene understanding modules. Using this method, we show 90% accuracy with fewer than 0.1 false positives per frame of video, which represents a significant improvement over detection using a baseline attention algorithm.

  1. Moving Object Detection on a Vehicle Mounted Back-Up Camera

    PubMed Central

    Kim, Dong-Sun; Kwon, Jinsan

    2015-01-01

    In the detection of moving objects from vision sources, one usually assumes that the scene has been captured by stationary cameras. When backing up a vehicle, however, the camera mounted on the vehicle moves according to the vehicle's movement, resulting in ego-motion in the background. This produces mixed motion in the scene and makes it difficult to distinguish between the target objects and background motion. Without further treatment of the mixed motion, traditional fixed-viewpoint object detection methods lead to many false-positive detections. In this paper, we suggest a procedure to be used with traditional moving object detection methods that relaxes the stationary-camera restriction by introducing additional steps before and after detection. We also describe an implementation of the algorithm on an FPGA platform. The target application is road vehicles' rear-view camera systems. PMID:26712761
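
    A hedged OpenCV sketch of the pre-detection step described here: estimate the dominant background motion as an affine transform, warp the previous frame to cancel the ego-motion, then apply ordinary frame differencing. Names and thresholds are illustrative, not the paper's FPGA design:

      import cv2

      def moving_object_mask(prev_gray, cur_gray, thresh=25):
          """Ego-motion compensation followed by classic frame differencing."""
          # dominant (background) motion as a robust 2D affine transform
          p0 = cv2.goodFeaturesToTrack(prev_gray, 400, 0.01, 7)
          p1, st, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, p0, None)
          M, _ = cv2.estimateAffine2D(p0[st == 1], p1[st == 1], method=cv2.RANSAC)

          # warp the previous frame so the background aligns with the current one
          stabilized = cv2.warpAffine(prev_gray, M, prev_gray.shape[::-1])

          # residual difference is dominated by independently moving objects
          diff = cv2.absdiff(cur_gray, stabilized)
          _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
          return mask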

  2. Interactive object modelling based on piecewise planar surface patches.

    PubMed

    Prankl, Johann; Zillich, Michael; Vincze, Markus

    2013-06-01

    Detecting elements such as planes in 3D is essential to describe objects for applications such as robotics and augmented reality. While plane estimation is well studied, table-top scenes exhibit a large number of planes and methods often lock onto a dominant plane or do not estimate 3D object structure but only homographies of individual planes. In this paper we introduce MDL to the problem of incrementally detecting multiple planar patches in a scene using tracked interest points in image sequences. Planar patches are reconstructed and stored in a keyframe-based graph structure. In case different motions occur, separate object hypotheses are modelled from currently visible patches and patches seen in previous frames. We evaluate our approach on a standard data set published by the Visual Geometry Group at the University of Oxford [24] and on our own data set containing table-top scenes. Results indicate that our approach significantly improves over the state-of-the-art algorithms.

  3. Interactive object modelling based on piecewise planar surface patches

    PubMed Central

    Prankl, Johann; Zillich, Michael; Vincze, Markus

    2013-01-01

    Detecting elements such as planes in 3D is essential to describe objects for applications such as robotics and augmented reality. While plane estimation is well studied, table-top scenes exhibit a large number of planes and methods often lock onto a dominant plane or do not estimate 3D object structure but only homographies of individual planes. In this paper we introduce MDL to the problem of incrementally detecting multiple planar patches in a scene using tracked interest points in image sequences. Planar patches are reconstructed and stored in a keyframe-based graph structure. In case different motions occur, separate object hypotheses are modelled from currently visible patches and patches seen in previous frames. We evaluate our approach on a standard data set published by the Visual Geometry Group at the University of Oxford [24] and on our own data set containing table-top scenes. Results indicate that our approach significantly improves over the state-of-the-art algorithms. PMID:24511219

  4. Moving vehicles segmentation based on Gaussian motion model

    NASA Astrophysics Data System (ADS)

    Zhang, Wei; Fang, Xiang Z.; Lin, Wei Y.

    2005-07-01

    Moving object segmentation is a challenge in computer vision. This paper focuses on the segmentation of moving vehicles in dynamic scenes. We analyze the psychology of human vision and present a framework for segmenting moving vehicles on the highway. The proposed framework consists of two parts. First, we propose an adaptive background update method in which the background is updated according to the change of illumination conditions and thus can adapt to illumination changes sensitively. Second, we construct a Gaussian motion model to segment moving vehicles, in which the motion vectors of the moving pixels are modeled as a Gaussian distribution and an on-line EM algorithm is used to update the model. The Gaussian distribution of the adaptive model is evaluated to determine which motion vectors result from moving vehicles and which from other moving objects such as waving trees. Finally, the pixels whose motion vectors result from moving vehicles are segmented. Experimental results on several typical scenes show that the proposed model can detect moving vehicles correctly and is immune to the influence of other moving objects such as waving trees and camera vibration.
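
    A compact numpy sketch of the core idea, with an exponentially weighted online update standing in for the paper's on-line EM and a Mahalanobis gate deciding whether a motion vector is consistent with vehicle traffic; everything here is illustrative, not the authors' code:

      import numpy as np

      class OnlineGaussianMotion:
          """Single Gaussian over motion vectors, updated online."""
          def __init__(self, dim=2, lr=0.05):
              self.mu, self.cov, self.lr = np.zeros(dim), np.eye(dim), lr

          def mahalanobis2(self, v):
              d = v - self.mu
              return d @ np.linalg.solve(self.cov, d)

          def update(self, v):
              d = v - self.mu
              self.mu += self.lr * d
              self.cov = (1 - self.lr) * self.cov + self.lr * np.outer(d, d)

          def is_vehicle(self, v, gate=9.0):      # roughly a 3-sigma gate
              return self.mahalanobis2(v) < gate

      model = OnlineGaussianMotion()
      for v in np.random.normal([5, 0], 0.5, size=(500, 2)):  # traffic vectors
          model.update(v)
      print(model.is_vehicle(np.array([5.2, 0.1])))  # True: fits traffic flow
      print(model.is_vehicle(np.array([0.0, 3.0])))  # False: e.g. waving tree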

  5. Behavior analysis of video object in complicated background

    NASA Astrophysics Data System (ADS)

    Zhao, Wenting; Wang, Shigang; Liang, Chao; Wu, Wei; Lu, Yang

    2016-10-01

    This paper aims to achieve robust behavior recognition of video objects against complicated backgrounds. Features of the video object are described and modeled according to the depth information of three-dimensional video. Multi-dimensional eigenvectors are constructed and used to process the high-dimensional data. Stable object tracking in complex scenes can be achieved with multi-feature based behavior analysis, so as to obtain the motion trail. Subsequently, effective behavior recognition of the video object is obtained according to the decision criteria. Moreover, both the real-time performance of the algorithms and the accuracy of the analysis are greatly improved. The theory and methods for behavior analysis of video objects in real scenes put forward by this project have broad application prospects and practical significance in security, anti-terrorism, military, and many other fields.

  6. A novel framework for intelligent surveillance system based on abnormal human activity detection in academic environments.

    PubMed

    Al-Nawashi, Malek; Al-Hazaimeh, Obaida M; Saraee, Mohamad

    2017-01-01

    Abnormal activity detection plays a crucial role in surveillance applications, and a surveillance system that can perform robustly in an academic environment has become an urgent need. In this paper, we propose a novel framework for an automatic real-time video-based surveillance system which can simultaneously perform tracking, semantic scene learning, and abnormality detection in an academic environment. To develop our system, we have divided the work into three phases: a preprocessing phase, an abnormal human activity detection phase, and a content-based image retrieval phase. For moving-object detection, we used the temporal-differencing algorithm and then located the motion regions using a Gaussian function. Furthermore, a shape model based on the OMEGA equation was used to filter the detected objects into human and non-human. For object activity analysis, we evaluated and analyzed the human activities of the detected objects. We classified the human activities into two groups, normal and abnormal, using a support vector machine. The machine then provides an automatic warning in case of abnormal human activities. It also embeds a method to retrieve the detected object from the database for object recognition and identification using content-based image retrieval. Finally, a software-based simulation using MATLAB was performed, and the results of the conducted experiments showed an excellent surveillance system that can simultaneously perform tracking, semantic scene learning, and abnormality detection in an academic environment with no human intervention.
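
    A minimal numpy sketch of the temporal-differencing stage, assuming the common three-frame variant in which a pixel counts as moving only if it changes in both consecutive frame differences (function name and threshold illustrative):

      import numpy as np

      def temporal_difference_mask(f_prev, f_cur, f_next, thresh=20):
          """Three-frame temporal differencing for moving-object detection."""
          d1 = np.abs(f_cur.astype(np.int16) - f_prev.astype(np.int16))
          d2 = np.abs(f_next.astype(np.int16) - f_cur.astype(np.int16))
          # requiring change in both differences suppresses the ghost that
          # two-frame differencing leaves behind the object
          return ((d1 > thresh) & (d2 > thresh)).astype(np.uint8) * 255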

  7. Scene-based nonuniformity correction and enhancement: pixel statistics and subpixel motion.

    PubMed

    Zhao, Wenyi; Zhang, Chao

    2008-07-01

    We propose a framework for scene-based nonuniformity correction (NUC) and nonuniformity correction and enhancement (NUCE) that is required for focal-plane-array-like sensors to obtain clean, enhanced-quality images. The core of the proposed framework is a novel registration-based nonuniformity correction super-resolution (NUCSR) method that is bootstrapped by statistical scene-based NUC methods. Based on a comprehensive imaging model and accurate parametric motion estimation, we are able to remove severe, structured nonuniformity and, in the presence of subpixel motion, simultaneously improve image resolution. One important feature of our NUCSR method is the adoption of a parametric motion model that allows us to (1) handle many practical scenarios where parametric motion is present and (2) carry out, in principle, perfect super-resolution by exploiting the available subpixel motion. Experiments with real data demonstrate the efficiency of the proposed NUCE framework and the effectiveness of the NUCSR method.
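
    The statistical scene-based NUC methods that bootstrap the framework can be illustrated by the classic constant-statistics approach: if, over a long moving-scene sequence, every detector sees samples with the same temporal mean and variance, then each pixel's temporal mean estimates its offset and its temporal standard deviation its gain. A toy numpy sketch under that assumption:

      import numpy as np

      def constant_statistics_nuc(stack):
          """stack: (T, H, W) raw frames with fixed-pattern noise."""
          offset = stack.mean(axis=0)              # per-pixel offset estimate
          gain = stack.std(axis=0)
          gain /= gain.mean()                      # normalize to unit mean gain
          return (stack - offset) / gain           # corrected (zero-mean) frames

      rng = np.random.default_rng(0)
      truth = rng.random((500, 8, 8)) * 100.0      # scene irradiance samples
      g = 1 + 0.1 * rng.standard_normal((8, 8))    # per-pixel gain error
      o = 5 * rng.standard_normal((8, 8))          # per-pixel offset error
      corrected = constant_statistics_nuc(g * truth + o)  # fixed pattern removed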

  8. Motion/visual cueing requirements for vortex encounters during simulated transport visual approach and landing

    NASA Technical Reports Server (NTRS)

    Parrish, R. V.; Bowles, R. L.

    1983-01-01

    This paper addresses the issues of motion/visual cueing fidelity requirements for vortex encounters during simulated transport visual approaches and landings. Four simulator configurations were utilized to provide objective performance measures during simulated vortex penetrations, and subjective comments from pilots were collected. The configurations used were as follows: fixed base with visual degradation (delay), fixed base with no visual degradation, moving base with visual degradation (delay), and moving base with no visual degradation. The statistical comparisons of the objective measures and the subjective pilot opinions indicated that although both minimum visual delay and motion cueing are recommended for the vortex penetration task, the visual-scene delay characteristics were not as significant a fidelity factor as was the presence of motion cues. However, this indication was applicable to a restricted task, and to transport aircraft. Although they were statistically significant, the effects of visual delay and motion cueing on the touchdown-related measures were considered to be of no practical consequence.

  9. A bio-inspired system for spatio-temporal recognition in static and video imagery

    NASA Astrophysics Data System (ADS)

    Khosla, Deepak; Moore, Christopher K.; Chelian, Suhas

    2007-04-01

    This paper presents a bio-inspired method for spatio-temporal recognition in static and video imagery. It builds upon and extends our previous work on a bio-inspired Visual Attention and object Recognition System (VARS). The VARS approach locates and recognizes objects in a single frame. This work presents two extensions of VARS. The first extension is a Scene Recognition Engine (SCE) that learns to recognize spatial relationships between objects that compose a particular scene category in static imagery. This could be used for recognizing the category of a scene, e.g., office vs. kitchen scene. The second extension is the Event Recognition Engine (ERE) that recognizes spatio-temporal sequences or events in sequences. This extension uses a working memory model to recognize events and behaviors in video imagery by maintaining and recognizing ordered spatio-temporal sequences. The working memory model is based on an ARTSTORE neural network that combines an ART-based neural network with a cascade of sustained temporal order recurrent (STORE) neural networks. A series of Default ARTMAP classifiers ascribes event labels to these sequences. Our preliminary studies have shown that this extension is robust to variations in an object's motion profile. We evaluated the performance of the SCE and ERE on real datasets. The SCE module was tested on a visual scene classification task using the LabelMe dataset. The ERE was tested on real world video footage of vehicles and pedestrians in a street scene. Our system is able to recognize the events in this footage involving vehicles and pedestrians.

  10. Linearized motion estimation for articulated planes.

    PubMed

    Datta, Ankur; Sheikh, Yaser; Kanade, Takeo

    2011-04-01

    In this paper, we describe the explicit application of articulation constraints for estimating the motion of a system of articulated planes. We relate articulations to the relative homography between planes and show that these articulations translate into linearized equality constraints on a linear least-squares system, which can be solved efficiently using a Karush-Kuhn-Tucker system. The articulation constraints can be applied for both gradient-based and feature-based motion estimation algorithms and to illustrate this, we describe a gradient-based motion estimation algorithm for an affine camera and a feature-based motion estimation algorithm for a projective camera that explicitly enforces articulation constraints. We show that explicit application of articulation constraints leads to numerically stable estimates of motion. The simultaneous computation of motion estimates for all of the articulated planes in a scene allows us to handle scene areas where there is limited texture information and areas that leave the field of view. Our results demonstrate the wide applicability of the algorithm in a variety of challenging real-world cases such as human body tracking, motion estimation of rigid, piecewise planar scenes, and motion estimation of triangulated meshes.
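
    The linearized equality constraints can be folded into the least-squares solve through the standard KKT system. A small numpy sketch of that machinery on a toy articulation-style constraint (the data and the constraint are illustrative, not the paper's motion model):

      import numpy as np

      def constrained_lstsq(A, b, C, d):
          """Minimize ||A x - b||^2 subject to C x = d via the KKT system:
             [ 2 A^T A  C^T ] [x]        [ 2 A^T b ]
             [   C       0  ] [lambda] = [    d    ]
          """
          n, m = A.shape[1], C.shape[0]
          K = np.block([[2 * A.T @ A, C.T], [C, np.zeros((m, m))]])
          rhs = np.concatenate([2 * A.T @ b, d])
          sol = np.linalg.solve(K, rhs)
          return sol[:n]               # x; sol[n:] holds the Lagrange multipliers

      # toy example: fit x in R^2 under the equality constraint x0 = x1
      A = np.random.randn(20, 2)
      b = np.random.randn(20)
      x = constrained_lstsq(A, b, C=np.array([[1.0, -1.0]]), d=np.array([0.0]))
      print(np.isclose(x[0], x[1]))    # True: the constraint holds exactly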

  11. Three-directional motion-compensation mask-based novel look-up table on graphics processing units for video-rate generation of digital holographic videos of three-dimensional scenes.

    PubMed

    Kwon, Min-Woo; Kim, Seung-Cheol; Kim, Eun-Soo

    2016-01-20

    A three-directional motion-compensation mask-based novel look-up table method is proposed and implemented on graphics processing units (GPUs) for video-rate generation of digital holographic videos of three-dimensional (3D) scenes. Since the proposed method is designed to be well matched with the software and memory structures of GPUs, the number of compute-unified-device-architecture kernel function calls can be significantly reduced. This results in a great increase of the computational speed of the proposed method, allowing video-rate generation of the computer-generated hologram (CGH) patterns of 3D scenes. Experimental results reveal that the proposed method can generate 39.8 frames of Fresnel CGH patterns with 1920×1080 pixels per second for the test 3D video scenario with 12,088 object points on dual GPU boards of NVIDIA GTX TITANs, and they confirm the feasibility of the proposed method in the practical application fields of electroholographic 3D displays.

  12. Common and Innovative Visuals: A sparsity modeling framework for video.

    PubMed

    Abdolhosseini Moghadam, Abdolreza; Kumar, Mrityunjay; Radha, Hayder

    2014-05-02

    Efficient video representation models are critical for many video analysis and processing tasks. In this paper, we present a framework based on the concept of finding the sparsest solution to model video frames. To model the spatio-temporal information, frames from one scene are decomposed into two components: (i) a common frame, which describes the visual information common to all the frames in the scene/segment, and (ii) a set of innovative frames, which depicts the dynamic behaviour of the scene. The proposed approach exploits and builds on recent results in the field of compressed sensing to jointly estimate the common frame and the innovative frames for each video segment. We refer to the proposed modeling framework by CIV (Common and Innovative Visuals). We show how the proposed model can be utilized to find scene change boundaries and extend CIV to videos from multiple scenes. Furthermore, the proposed model is robust to noise and can be used for various video processing applications without relying on motion estimation and detection or image segmentation. Results for object tracking, video editing (object removal, inpainting) and scene change detection are presented to demonstrate the efficiency and the performance of the proposed model.
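
    As a crude stand-in for the joint compressed-sensing estimation, a temporal median can play the role of the common frame, with per-frame residuals as the (approximately sparse) innovations. A toy numpy sketch of the decomposition idea only:

      import numpy as np

      def decompose(frames):
          """frames: (T, H, W); returns common frame and innovation frames."""
          common = np.median(frames, axis=0)   # visual content shared by all frames
          innovations = frames - common        # sparse when the scene is mostly static
          return common, innovations

      frames = np.random.rand(30, 48, 64) * 0.01   # near-static background
      frames[:, 20:28, 30:40] += np.linspace(0, 1, 30)[:, None, None]  # moving patch
      common, innov = decompose(frames)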

  13. Independent motion detection with a rival penalized adaptive particle filter

    NASA Astrophysics Data System (ADS)

    Becker, Stefan; Hübner, Wolfgang; Arens, Michael

    2014-10-01

    Aggregation of pixel based motion detection into regions of interest, which include views of single moving objects in a scene, is an essential pre-processing step in many vision systems. Motion events of this type provide significant information about the object type or build the basis for action recognition. Further, motion is an essential saliency measure, which is able to effectively support high level image analysis. When applied to static cameras, background subtraction methods achieve good results. On the other hand, motion aggregation on freely moving cameras is still a widely unsolved problem. The image flow measured on a freely moving camera results from two major motion types: first, the ego-motion of the camera, and second, object motion that is independent of the camera motion. When a scene is captured with a camera, these two motion types are blended together. In this paper, we propose an approach to detect multiple moving objects from a mobile monocular camera system in an outdoor environment. The overall processing pipeline consists of a fast ego-motion compensation algorithm in the preprocessing stage. Real-time performance is achieved by using a sparse optical flow algorithm as an initial processing stage and a densely applied probabilistic filter in the post-processing stage. Thereby, we follow the idea proposed by Jung and Sukhatme. Normalized intensity differences originating from a sequence of ego-motion compensated difference images represent the probability of moving objects. Noise and registration artefacts are filtered out using a Bayesian formulation. The resulting a posteriori distribution is located on image regions showing strong amplitudes in the difference image which are in accordance with the motion prediction. In order to effectively estimate the a posteriori distribution, a particle filter is used. In addition to the fast ego-motion compensation, the main contribution of this paper is the design of the probabilistic filter for real-time detection and tracking of independently moving objects. The proposed approach introduces a competition scheme between particles in order to ensure improved multi-modality. Further, the filter design helps to generate a particle distribution that is homogeneous even in the presence of multiple targets showing non-rigid motion patterns. The effectiveness of the method is shown on exemplary outdoor sequences.

  14. Effect of Viewing Smoking Scenes in Motion Pictures on Subsequent Smoking Desire in Audiences in South Korea

    PubMed Central

    Sohn, Minsung

    2017-01-01

    Background In the modern era of heightened awareness of public health, smoking scenes in movies remain relatively free from public monitoring. The effect of smoking scenes in movies on the promotion of viewers’ smoking desire remains unknown. Objective The study aimed to explore whether exposure of adolescent smokers to images of smoking in films could stimulate smoking behavior. Methods Data were derived from a national Web-based sample survey of 748 Korean high-school students. Participants aged 16-18 years were randomly assigned to watch three short video clips with or without smoking scenes. After adjusting covariates using propensity score matching, paired sample t test and logistic regression analyses compared the difference in smoking desire before and after exposure of participants to smoking scenes. Results For male adolescents, cigarette craving was significantly higher in those who watched movies with smoking scenes than in the control group who did not view smoking scenes (t(307.96)=2.066, P<.05). In the experimental group, too, cigarette cravings of adolescents after viewing smoking scenes were significantly higher than they were before watching smoking scenes (t(161.00)=2.867, P<.01). After adjusting for covariates, more impulsive adolescents, particularly males, had significantly higher cigarette cravings: adjusted odds ratio (aOR) 3.40 (95% CI 1.40-8.23). However, those who actively sought health information had considerably lower cigarette cravings than those who did not engage in information-seeking: aOR 0.08 (95% CI 0.01-0.88). Conclusions Smoking scenes in motion pictures may increase male adolescent smoking desire. Establishing a standard that restricts the frequency of smoking scenes in films and assigning a smoking-related screening grade to films is warranted. PMID:28716768

  15. Perception of scene-relative object movement: Optic flow parsing and the contribution of monocular depth cues.

    PubMed

    Warren, Paul A; Rushton, Simon K

    2009-05-01

    We have recently suggested that the brain uses its sensitivity to optic flow in order to parse retinal motion into components arising due to self and object movement (e.g. Rushton, S. K., & Warren, P. A. (2005). Moving observers, 3D relative motion and the detection of object movement. Current Biology, 15, R542-R543). Here, we explore whether stereo disparity is necessary for flow parsing or whether other sources of depth information, which could theoretically constrain flow-field interpretation, are sufficient. Stationary observers viewed large field of view stimuli containing textured cubes, moving in a manner that was consistent with a complex observer movement through a stationary scene. Observers made speeded responses to report the perceived direction of movement of a probe object presented at different depths in the scene. Across conditions we varied the presence or absence of different binocular and monocular cues to depth order. In line with previous studies, results consistent with flow parsing (in terms of both perceived direction and response time) were found in the condition in which motion parallax and stereoscopic disparity were present. Observers were poorer at judging object movement when depth order was specified by parallax alone. However, as more monocular depth cues were added to the stimulus the results approached those found when the scene contained stereoscopic cues. We conclude that both monocular and binocular static depth information contribute to flow parsing. These findings are discussed in the context of potential architectures for a model of the flow parsing mechanism.

  16. Optic Flow Dominates Visual Scene Polarity in Causing Adaptive Modification of Locomotor Trajectory

    NASA Technical Reports Server (NTRS)

    Nomura, Y.; Mulavara, A. P.; Richards, J. T.; Brady, R.; Bloomberg, Jacob J.

    2005-01-01

    Locomotion and posture are influenced and controlled by vestibular, visual and somatosensory information. Optic flow and scene polarity are two characteristics of a visual scene that have been identified as critical in how they affect perceived body orientation and self-motion. The goal of this study was to determine the role of optic flow and visual scene polarity in adaptive modification of locomotor trajectory. Two computer-generated virtual reality scenes were shown to subjects during 20 minutes of treadmill walking. One scene was highly polarized, while the other was composed of objects displayed in a non-polarized fashion. Both virtual scenes depicted constant-rate self-motion equivalent to walking counterclockwise around the perimeter of a room. Subjects performed Stepping Tests blindfolded before and after scene exposure to assess adaptive changes in locomotor trajectory. Subjects showed a significant difference in heading direction between pre- and post-adaptation Stepping Tests when exposed to either scene during treadmill walking. However, there was no significant difference in the subjects' heading direction between the two visual scene polarity conditions. Therefore, it was inferred from these data that optic flow has a greater role than visual polarity in influencing adaptive locomotor function.

  17. Scene-Aware Adaptive Updating for Visual Tracking via Correlation Filters

    PubMed Central

    Zhang, Sirou; Qiao, Xiaoya

    2017-01-01

    In recent years, visual object tracking has been widely used in military guidance, human-computer interaction, road traffic, scene monitoring and many other fields. Tracking algorithms based on correlation filters have shown good performance in terms of accuracy and tracking speed. However, their performance is not satisfactory in scenes with scale variation, deformation, and occlusion. In this paper, we propose a scene-aware adaptive updating mechanism for visual tracking via a kernel correlation filter (KCF). First, a low-complexity scale estimation method is presented, in which the corresponding weights at five scales are employed to determine the final target scale. Then, an adaptive updating mechanism is presented based on scene classification. We classify the video scenes into four categories by video content analysis. According to the target scene, we exploit the adaptive updating mechanism to update the kernel correlation filter to improve the robustness of the tracker, especially in scenes with scale variation, deformation, and occlusion. We evaluate our tracker on the CVPR2013 benchmark. The experimental results obtained with the proposed algorithm are improved by 33.3%, 15%, 6%, 21.9% and 19.8% compared to those of the KCF tracker on scenes with scale variation, partial or long-duration large-area occlusion, deformation, fast motion and out-of-view targets, respectively. PMID:29140311
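
    The ridge-regression core that correlation filter trackers such as KCF build on fits in a few lines in the Fourier domain; KCF proper adds a kernel and circulant training samples, and adaptive updating blends a newly trained filter into the old one. A minimal numpy sketch (sizes and regularizer illustrative):

      import numpy as np

      def train(patch, target, lam=1e-2):
          """Filter (in the frequency domain) mapping patch -> desired response."""
          X, Y = np.fft.fft2(patch), np.fft.fft2(target)
          return (np.conj(X) * Y) / (np.conj(X) * X + lam)

      def respond(filt, patch):
          return np.real(np.fft.ifft2(filt * np.fft.fft2(patch)))

      # desired response: a Gaussian peak at the target position
      h = w = 64
      yy, xx = np.mgrid[0:h, 0:w]
      target = np.exp(-((yy - h // 2) ** 2 + (xx - w // 2) ** 2) / (2 * 3.0 ** 2))
      patch = np.random.rand(h, w)
      filt = train(patch, target)
      peak = np.unravel_index(np.argmax(respond(filt, patch)), (h, w))
      print(peak)  # (32, 32): the filter re-localizes the training patch
      # adaptive updating: filt = (1 - eta) * filt + eta * train(new_patch, target)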

  18. Characterization of a compact 6-band multifunctional camera based on patterned spectral filters in the focal plane

    NASA Astrophysics Data System (ADS)

    Torkildsen, H. E.; Hovland, H.; Opsahl, T.; Haavardsholm, T. V.; Nicolas, S.; Skauli, T.

    2014-06-01

    In some applications of multi- or hyperspectral imaging, it is important to have a compact sensor. The most compact spectral imaging sensors are based on spectral filtering in the focal plane. For hyperspectral imaging, it has been proposed to use a "linearly variable" bandpass filter in the focal plane, combined with scanning of the field of view. As the image of a given object in the scene moves across the field of view, it is observed through parts of the filter with varying center wavelength, and a complete spectrum can be assembled. However if the radiance received from the object varies with viewing angle, or with time, then the reconstructed spectrum will be distorted. We describe a camera design where this hyperspectral functionality is traded for multispectral imaging with better spectral integrity. Spectral distortion is minimized by using a patterned filter with 6 bands arranged close together, so that a scene object is seen by each spectral band in rapid succession and with minimal change in viewing angle. The set of 6 bands is repeated 4 times so that the spectral data can be checked for internal consistency. Still the total extent of the filter in the scan direction is small. Therefore the remainder of the image sensor can be used for conventional imaging with potential for using motion tracking and 3D reconstruction to support the spectral imaging function. We show detailed characterization of the point spread function of the camera, demonstrating the importance of such characterization as a basis for image reconstruction. A simplified image reconstruction based on feature-based image coregistration is shown to yield reasonable results. Elimination of spectral artifacts due to scene motion is demonstrated.

  19. Object Segmentation from Motion Discontinuities and Temporal Occlusions–A Biologically Inspired Model

    PubMed Central

    Beck, Cornelia; Ognibeni, Thilo; Neumann, Heiko

    2008-01-01

    Background Optic flow is an important cue for object detection. Humans are able to perceive objects in a scene using only kinetic boundaries, and can perform the task even when other shape cues are not provided. These kinetic boundaries are characterized by the presence of motion discontinuities in a local neighbourhood. In addition, temporal occlusions appear along the boundaries as the object in front covers the background and the objects that are spatially behind it. Methodology/Principal Findings From a technical point of view, the detection of motion boundaries for segmentation based on optic flow is a difficult task, because the flow estimated along such boundaries is generally unreliable. We propose a model derived from mechanisms found in visual areas V1, MT, and MSTl of human and primate cortex that achieves robust detection along motion boundaries. It includes two separate mechanisms for the detection of motion discontinuities and of occlusion regions, based on how neurons respond to spatial and temporal contrast, respectively. The mechanisms are embedded in a biologically inspired architecture that integrates information from different components of visual processing via feedback connections. In particular, mutual interactions between the detection of motion discontinuities and temporal occlusions considerably improve kinetic boundary detection. Conclusions/Significance A new model is proposed that uses optic flow cues to detect motion discontinuities and object occlusion. We suggest that by combining these results for motion discontinuities and object occlusion, object segmentation within the model can be improved. This idea could also be applied in other models for object segmentation. In addition, we discuss how this model is related to neurophysiological findings. The model was successfully tested both with artificial and real sequences including self and object motion. PMID:19043613
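
    For intuition, kinetic boundaries can be located in a dense flow field wherever the flow changes abruptly across space. The Python sketch below computes a generic motion-contrast map from flow gradients; it is a much simpler stand-in for the paper's V1/MT/MSTl circuit, not a reimplementation of it.

      import numpy as np

      def motion_discontinuity_map(u, v):
          # u, v: dense optic-flow components as 2D arrays.
          du_dy, du_dx = np.gradient(u)
          dv_dy, dv_dx = np.gradient(v)
          # Magnitude of the flow Jacobian: large where neighbouring pixels
          # move differently, i.e. along kinetic object boundaries.
          return np.sqrt(du_dx**2 + du_dy**2 + dv_dx**2 + dv_dy**2)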

  20. Blind image deblurring based on trained dictionary and curvelet using sparse representation

    NASA Astrophysics Data System (ADS)

    Feng, Liang; Huang, Qian; Xu, Tingfa; Li, Shao

    2015-04-01

    Motion blur is one of the most significant and common artifacts causing poor image quality in digital photography, and it can result from many factors. If objects move quickly in the scene, or the camera moves during the exposure interval, the image blurs along the direction of relative motion between the camera and the scene; camera shake and atmospheric turbulence are typical causes. Recently, the sparse representation model has been widely used in signal and image processing as an effective way to describe natural images. In this article, a new deblurring approach based on sparse representation is proposed. An overcomplete dictionary, learned from training image samples via the K-SVD algorithm, is designed to represent the latent image. The motion-blur kernel can be treated as a piecewise-smooth function in the image domain whose support is approximately a thin smooth curve, so we employ curvelets to represent the blur kernel. Both the overcomplete dictionary and the curvelet system yield highly sparse representations, which improves robustness to noise and better matches the observer's visual expectations. With these two priors, we construct a restoration model for blurred images and solve the resulting optimization problem with an alternating minimization technique. Experimental results show that the method preserves the texture of the original images and effectively suppresses ringing artifacts.
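
    A schematic Python sketch of the alternating minimization referred to above, with a plain soft-threshold standing in for both the K-SVD dictionary prior and the curvelet prior, so only the alternation between image and kernel updates is shown; step sizes and iteration counts are illustrative and convergence is not guaranteed.

      import numpy as np
      from scipy.signal import fftconvolve

      def soft_threshold(x, t):
          return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

      def alternating_deblur(y, k, n_outer=10, n_inner=20, step=0.5, lam=0.01):
          # y: blurred float image; k: initial blur kernel with odd dimensions.
          x = y.copy()
          for _ in range(n_outer):
              for _ in range(n_inner):  # image step: descend on ||k*x - y||^2
                  r = fftconvolve(x, k, mode="same") - y
                  x -= step * fftconvolve(r, k[::-1, ::-1], mode="same")
                  x = soft_threshold(x, lam)          # sparsity prior on x
              r = fftconvolve(x, k, mode="same") - y  # kernel step
              g = fftconvolve(r, x[::-1, ::-1], mode="same")
              cy, cx = g.shape[0] // 2, g.shape[1] // 2
              kh, kw = k.shape[0] // 2, k.shape[1] // 2
              k -= step * g[cy - kh:cy + kh + 1, cx - kw:cx + kw + 1]
              k = np.maximum(k, 0)
              k /= k.sum() + 1e-12  # a PSF is nonnegative and sums to one
          return x, k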

  1. Eye Movements Reveal the Dynamic Simulation of Speed in Language

    ERIC Educational Resources Information Center

    Speed, Laura J.; Vigliocco, Gabriella

    2014-01-01

    This study investigates how speed of motion is processed in language. In three eye-tracking experiments, participants were presented with visual scenes and spoken sentences describing fast or slow events (e.g., "The lion ambled/dashed to the balloon"). Results showed that looking time to relevant objects in the visual scene was affected…

  2. Invariant Visual Object and Face Recognition: Neural and Computational Bases, and a Model, VisNet

    PubMed Central

    Rolls, Edmund T.

    2012-01-01

    Neurophysiological evidence for invariant representations of objects and faces in the primate inferior temporal visual cortex is described. Then a computational approach to how invariant representations are formed in the brain is described that builds on the neurophysiology. A feature hierarchy model in which invariant representations can be built by self-organizing learning based on the temporal and spatial statistics of the visual input produced by objects as they transform in the world is described. VisNet can use temporal continuity in an associative synaptic learning rule with a short-term memory trace, and/or it can use spatial continuity in continuous spatial transformation learning which does not require a temporal trace. The model of visual processing in the ventral cortical stream can build representations of objects that are invariant with respect to translation, view, size, and also lighting. The model has been extended to provide an account of invariant representations in the dorsal visual system of the global motion produced by objects such as looming, rotation, and object-based movement. The model has been extended to incorporate top-down feedback connections to model the control of attention by biased competition in, for example, spatial and object search tasks. The approach has also been extended to account for how the visual system can select single objects in complex visual scenes, and how multiple objects can be represented in a scene. The approach has also been extended to provide, with an additional layer, for the development of representations of spatial scenes of the type found in the hippocampus. PMID:22723777
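
    The "associative synaptic learning rule with a short-term memory trace" mentioned here is Rolls' trace rule; below is a minimal Python sketch of one common form (the constants, and the exact normalization, vary across the VisNet papers).

      import numpy as np

      def trace_rule_update(w, x, y, y_trace_prev, alpha=0.1, eta=0.8):
          # w: weights of one output neuron; x: presynaptic rate vector;
          # y: current postsynaptic rate; eta: trace decay (eta = 0
          # recovers plain Hebbian learning with no temporal binding).
          y_trace = (1 - eta) * y + eta * y_trace_prev  # short-term memory trace
          w = w + alpha * y_trace * x                   # associative (Hebbian) term
          w = w / (np.linalg.norm(w) + 1e-12)           # keep weights bounded
          return w, y_trace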

  3. The lucky image-motion prediction for simple scene observation based soft-sensor technology

    NASA Astrophysics Data System (ADS)

    Li, Yan; Su, Yun; Hu, Bin

    2015-08-01

    High resolution is important for Earth remote sensors, and vibration of the sensor platform is a major factor restricting high-resolution imaging. Image-motion prediction and real-time compensation are key technologies for solving this problem. Because the traditional autocorrelation image algorithm cannot meet the demands of image stabilization for simple scenes, this paper proposes applying soft-sensor technology to image-motion prediction and focuses on optimizing the prediction algorithm. Simulation results indicate that an improved lucky image-motion stabilization algorithm combining a back-propagation neural network (BP NN) and a support vector machine (SVM) is the most suitable for simple-scene image stabilization. The relative error of the image-motion prediction based on the soft-sensor technology is below 5%, and the training and computation speed of the mathematical prediction model is fast enough for real-time image stabilization in aerial photography.

  4. Neural representations of contextual guidance in visual search of real-world scenes.

    PubMed

    Preston, Tim J; Guo, Fei; Das, Koel; Giesbrecht, Barry; Eckstein, Miguel P

    2013-05-01

    Exploiting scene context and object-object co-occurrence is critical in guiding eye movements and facilitating visual search, yet the mediating neural mechanisms are unknown. We used functional magnetic resonance imaging while observers searched for target objects in scenes and used multivariate pattern analyses (MVPA) to show that the lateral occipital complex (LOC) can predict the coarse spatial location of observers' expectations about the likely location of 213 different targets absent from the scenes. In addition, we found weaker but significant representations of context location in an area related to the orienting of attention (intraparietal sulcus, IPS) as well as a region related to scene processing (retrosplenial cortex, RSC). Importantly, the degree of agreement among 100 independent raters about the likely location to contain a target object in a scene correlated with LOC's ability to predict the contextual location while weaker but significant effects were found in IPS, RSC, the human motion area, and early visual areas (V1, V3v). When contextual information was made irrelevant to observers' behavioral task, the MVPA analysis of LOC and the other areas' activity ceased to predict the location of context. Thus, our findings suggest that the likely locations of targets in scenes are represented in various visual areas with LOC playing a key role in contextual guidance during visual search of objects in real scenes.

  5. Pursuit Eye Movements

    NASA Technical Reports Server (NTRS)

    Krauzlis, Rich; Stone, Leland; Null, Cynthia H. (Technical Monitor)

    1998-01-01

    When viewing objects, primates use a combination of saccadic and pursuit eye movements to stabilize the retinal image of the object of regard within the high-acuity region near the fovea. Although these movements involve widespread regions of the nervous system, they mix seamlessly in normal behavior. Saccades are discrete movements that quickly direct the eyes toward a visual target, thereby translating the image of the target from an eccentric retinal location to the fovea. In contrast, pursuit is a continuous movement that slowly rotates the eyes to compensate for the motion of the visual target, minimizing the blur that can compromise visual acuity. While other mammalian species can generate smooth optokinetic eye movements - which track the motion of the entire visual surround - only primates can smoothly pursue a single small element within a complex visual scene, regardless of the motion elsewhere on the retina. This ability likely reflects the greater ability of primates to segment the visual scene, to identify individual visual objects, and to select a target of interest.

  6. Maintaining the ties that bind: the role of an intermediate visual memory store in the persistence of awareness.

    PubMed

    Ferber, Susanne; Emrich, Stephen M

    2007-03-01

    Segregation and feature binding are essential to the perception and awareness of objects in a visual scene. When a fragmented line-drawing of an object moves relative to a background of randomly oriented lines, the previously hidden object is segregated from the background and consequently enters awareness. Interestingly, in such shape-from-motion displays, the percept of the object persists briefly when the motion stops, suggesting that the segregated and bound representation of the object is maintained in awareness. Here, we tested whether this persistence effect is mediated by capacity-limited working-memory processes, or by the amount of object-related information available. The experiments demonstrate that persistence is affected mainly by the proportion of object information available and is independent of working-memory limits. We suggest that this persistence effect can be seen as evidence for an intermediate, form-based memory store mediating between sensory and working memory.

  7. Perception of object trajectory: parsing retinal motion into self and object movement components.

    PubMed

    Warren, Paul A; Rushton, Simon K

    2007-08-16

    A moving observer needs to be able to estimate the trajectory of other objects moving in the scene. Without the ability to do so, it would be difficult to avoid obstacles or catch a ball. We hypothesized that neural mechanisms sensitive to the patterns of motion generated on the retina during self-movement (optic flow) play a key role in this process, "parsing" motion due to self-movement from that due to object movement. We investigated this "flow parsing" hypothesis by measuring the perceived trajectory of a moving probe placed within a flow field that was consistent with movement of the observer. In the first experiment, the flow field was consistent with an eye rotation; in the second experiment, it was consistent with a lateral translation of the eyes. We manipulated the distance of the probe in both experiments and assessed the consequences. As predicted by the flow parsing hypothesis, manipulating the distance of the probe had differing effects on the perceived trajectory of the probe in the two experiments. The results were consistent with the scene geometry and the type of simulated self-movement. In a third experiment, we explored the contribution of local and global motion processing to the results of the first two experiments. The data suggest that the parsing process involves global motion processing, not just local motion contrast. The findings of this study support a role for optic flow processing in the perception of object movement during self-movement.
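
    Flow parsing amounts to subtracting the flow field predicted from self-movement from the measured retinal flow. A Python sketch under a pinhole model, using the standard Longuet-Higgins/Prazdny self-motion flow equations (one common sign convention); the assumption of known per-pixel depth, and all names, are illustrative rather than the authors' stimulus model.

      import numpy as np

      def parse_object_flow(retinal_u, retinal_v, depth, T, R, f=1.0):
          # T = (Tx, Ty, Tz): observer translation; R = (Rx, Ry, Rz):
          # rotation rates; depth: per-pixel scene depth; f: focal length.
          h, w = retinal_u.shape
          y, x = np.mgrid[0:h, 0:w].astype(float)
          x -= w / 2.0
          y -= h / 2.0
          Tx, Ty, Tz = T
          Rx, Ry, Rz = R
          # Self-motion flow at each pixel (translation term + rotation term).
          u_self = (-f * Tx + x * Tz) / depth + (x * y * Rx / f
                    - (f + x**2 / f) * Ry + y * Rz)
          v_self = (-f * Ty + y * Tz) / depth + ((f + y**2 / f) * Rx
                    - x * y * Ry / f - x * Rz)
          # What remains after subtraction is candidate object motion.
          return retinal_u - u_self, retinal_v - v_self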

  8. Estimation of contour motion and deformation for nonrigid object tracking

    NASA Astrophysics Data System (ADS)

    Shao, Jie; Porikli, Fatih; Chellappa, Rama

    2007-08-01

    We present an algorithm for nonrigid contour tracking in heavily cluttered background scenes. Based on the properties of nonrigid contour movements, a sequential framework for estimating contour motion and deformation is proposed. We solve the nonrigid contour tracking problem by decomposing it into three subproblems: motion estimation, deformation estimation, and shape regulation. First, we employ a particle filter to estimate the global motion parameters of the affine transform between successive frames. Then we generate a probabilistic deformation map to deform the contour. To improve robustness, multiple cues are used for deformation probability estimation. Finally, we use a shape prior model to constrain the deformed contour. This enables us to retrieve the occluded parts of the contours and accurately track them while allowing shape changes specific to the given object types. Our experiments show that the proposed algorithm significantly improves the tracker performance.

  9. Scene-aware joint global and local homographic video coding

    NASA Astrophysics Data System (ADS)

    Peng, Xiulian; Xu, Jizheng; Sullivan, Gary J.

    2016-09-01

    Perspective motion is commonly represented in video content that is captured and compressed for various applications including cloud gaming, vehicle and aerial monitoring, etc. Existing approaches based on an eight-parameter homography motion model cannot deal with this efficiently, either due to low prediction accuracy or excessive bit rate overhead. In this paper, we consider the camera motion model and scene structure in such video content and propose a joint global and local homography motion coding approach for video with perspective motion. The camera motion is estimated by a computer vision approach, and camera intrinsic and extrinsic parameters are globally coded at the frame level. The scene is modeled as piece-wise planes, and three plane parameters are coded at the block level. Fast gradient-based approaches are employed to search for the plane parameters for each block region. In this way, improved prediction accuracy and low bit costs are achieved. Experimental results based on the HEVC test model show that up to 9.1% bit rate savings can be achieved (with equal PSNR quality) on test video content with perspective motion. Test sequences for the example applications showed a bit rate savings ranging from 3.7 to 9.1%.
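
    The block-level model here is the classic plane-induced homography: for a scene plane n.X = d between two views, H = K (R - t n^T / d) K^(-1), so once the intrinsics K and the camera pose R, t are coded globally per frame, only the three plane parameters n/d remain to be coded per block. A minimal Python sketch of the formula:

      import numpy as np

      def plane_homography(K, R, t, n, d):
          # K: 3x3 intrinsics; R, t: rotation and translation between the
          # two views; n, d: plane normal and offset (n . X = d).
          H = K @ (R - np.outer(t, n) / d) @ np.linalg.inv(K)
          return H / H[2, 2]  # normalize the projective scale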

  10. Invariant visual object recognition: a model, with lighting invariance.

    PubMed

    Rolls, Edmund T; Stringer, Simon M

    2006-01-01

    How are invariant representations of objects formed in the visual cortex? We describe a neurophysiological and computational approach which focusses on a feature hierarchy model in which invariant representations can be built by self-organizing learning based on the statistics of the visual input. The model can use temporal continuity in an associative synaptic learning rule with a short-term memory trace, and/or it can use spatial continuity in Continuous Transformation learning. The model of visual processing in the ventral cortical stream can build representations of objects that are invariant with respect to translation, view, size, and, as we show in this paper, lighting. The model has been extended to provide an account of invariant representations in the dorsal visual system of the global motion produced by objects such as looming, rotation, and object-based movement. The model has been extended to incorporate top-down feedback connections to model the control of attention by biased competition in, for example, spatial and object search tasks. The model has also been extended to account for how the visual system can select single objects in complex visual scenes, and how multiple objects can be represented in a scene.

  11. A rain pixel recovery algorithm for videos with highly dynamic scenes.

    PubMed

    Chen, Jie; Chau, Lap-Pui

    2014-03-01

    Rain removal is a very useful and important technique in applications such as security surveillance and movie editing. Several rain removal algorithms have been proposed in recent years, in which photometric, chromatic, and probabilistic properties of rain have been exploited to detect and remove the rainy effect. Current methods generally work well with light rain and relatively static scenes; when dealing with heavier rainfall in dynamic scenes, they give very poor visual results. The proposed algorithm is based on motion segmentation of the dynamic scene. After applying photometric and chromatic constraints for rain detection, rain removal filters are applied to pixels in a way that accounts for their dynamic properties as well as motion occlusion cues; both spatial and temporal information are then adaptively exploited during rain pixel recovery. Results show that the proposed algorithm performs much better on rainy scenes with large motion than existing algorithms.
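
    A simplified Python sketch of the adaptive spatial/temporal choice described above: rain pixels in static regions are repaired temporally from neighbouring frames, while rain pixels on moving objects are repaired spatially from clean neighbours. The masks and the median filters are illustrative stand-ins for the paper's detection and recovery filters.

      import numpy as np

      def recover_rain_pixels(frames, rain_mask, motion_mask, t, radius=2):
          # frames: list of 2D grayscale float images; rain_mask/motion_mask:
          # boolean masks for frame t (from rain detection / motion segmentation).
          out = frames[t].copy()
          for y, x in zip(*np.nonzero(rain_mask)):
              if motion_mask[y, x]:   # dynamic region: temporal info unreliable
                  y0, y1 = max(0, y - radius), y + radius + 1
                  x0, x1 = max(0, x - radius), x + radius + 1
                  clean = frames[t][y0:y1, x0:x1][~rain_mask[y0:y1, x0:x1]]
                  if clean.size:
                      out[y, x] = np.median(clean)
              else:                   # static region: use neighbouring frames
                  t0, t1 = max(0, t - radius), min(len(frames), t + radius + 1)
                  out[y, x] = np.median([frames[i][y, x]
                                         for i in range(t0, t1) if i != t])
          return out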

  12. Local statistics of retinal optic flow for self-motion through natural sceneries.

    PubMed

    Calow, Dirk; Lappe, Markus

    2007-12-01

    Image analysis in the visual system is well adapted to the statistics of natural scenes. Investigations of natural image statistics have so far mainly focused on static features. The present study is dedicated to the measurement and the analysis of the statistics of optic flow generated on the retina during locomotion through natural environments. Natural locomotion includes bouncing and swaying of the head and eye movement reflexes that stabilize gaze onto interesting objects in the scene while walking. We investigate the dependencies of the local statistics of optic flow on the depth structure of the natural environment and on the ego-motion parameters. To measure these dependencies we estimate the mutual information between correlated data sets. We analyze the results with respect to the variation of the dependencies over the visual field, since the visual motions in the optic flow vary depending on visual field position. We find that retinal flow direction and retinal speed show only minor statistical interdependencies. Retinal speed is statistically tightly connected to the depth structure of the scene. Retinal flow direction is statistically mostly driven by the relation between the direction of gaze and the direction of ego-motion. These dependencies differ at different visual field positions such that certain areas of the visual field provide more information about ego-motion and other areas provide more information about depth. The statistical properties of natural optic flow may be used to tune the performance of artificial vision systems based on human imitating behavior, and may be useful for analyzing properties of natural vision systems.
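
    The dependencies are quantified with mutual information between paired samples, e.g. retinal speed versus scene depth at one visual-field position. A basic histogram estimator in Python (ignoring the bias corrections a careful analysis would require; the bin count is arbitrary):

      import numpy as np

      def mutual_information(a, b, bins=32):
          # a, b: 1D samples of two quantities measured at the same pixels.
          joint, _, _ = np.histogram2d(a, b, bins=bins)
          p = joint / joint.sum()
          px = p.sum(axis=1, keepdims=True)   # marginal of a
          py = p.sum(axis=0, keepdims=True)   # marginal of b
          nz = p > 0
          return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))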

  13. Slow motion in films and video clips: Music influences perceived duration and emotion, autonomic physiological activation and pupillary responses.

    PubMed

    Wöllner, Clemens; Hammerschmidt, David; Albrecht, Henning

    2018-01-01

    Slow motion scenes are ubiquitous in screen-based audiovisual media and are typically accompanied by emotional music. The strong effects of slow motion on observers are hypothetically related to heightened emotional states in which time seems to pass more slowly. These states are simulated in films and video clips, and seem to resemble such experiences in daily life. The current study investigated time perception and emotional response to media clips containing decelerated human motion, with or without music using psychometric and psychophysiological testing methods. Participants were presented with slow-motion scenes taken from commercial films, ballet and sports footage, as well as the same scenes converted to real-time. Results reveal that slow-motion scenes, compared to adapted real-time scenes, led to systematic underestimations of duration, lower perceived arousal but higher valence, lower respiration rates and smaller pupillary diameters. The presence of music compared to visual-only presentations strongly affected results in terms of higher accuracy in duration estimates, higher perceived arousal and valence, higher physiological activation and larger pupillary diameters, indicating higher arousal. Video genre affected responses in addition. These findings suggest that perceiving slow motion is not related to states of high arousal, but rather affects cognitive dimensions of perceived time and valence. Music influences these experiences profoundly, thus strengthening the impact of stretched time in audiovisual media.

  14. Real-time detection of moving objects from moving vehicles using dense stereo and optical flow

    NASA Technical Reports Server (NTRS)

    Talukder, Ashit; Matthies, Larry

    2004-01-01

    Dynamic scene perception is very important for autonomous vehicles operating around other moving vehicles and humans. Most work on real-time object tracking from moving platforms has used sparse features or assumed flat scene structures. We have recently extended a real-time, dense stereo system to include real-time, dense optical flow, enabling more comprehensive dynamic scene analysis. We describe algorithms to robustly estimate 6-DOF robot egomotion in the presence of moving objects using dense flow and dense stereo. We then use dense stereo and egomotion estimates to identify other moving objects while the robot itself is moving. We present results showing accurate egomotion estimation and detection of moving people and vehicles under general 6-DOF motion of the robot and independently moving objects. The system runs at 18.3 Hz on a 1.4 GHz Pentium M laptop, computing 160x120 disparity maps and optical flow fields, egomotion, and moving object segmentation. We believe this is a significant step toward general unconstrained dynamic scene analysis for mobile robots, as well as for improved position estimation where GPS is unavailable.

  15. Acoustic facilitation of object movement detection during self-motion

    PubMed Central

    Calabro, F. J.; Soto-Faraco, S.; Vaina, L. M.

    2011-01-01

    In humans, as well as most animal species, perception of object motion is critical to successful interaction with the surrounding environment. Yet, as the observer also moves, the retinal projections of the various motion components add to each other and extracting accurate object motion becomes computationally challenging. Recent psychophysical studies have demonstrated that observers use a flow-parsing mechanism to estimate and subtract self-motion from the optic flow field. We investigated whether concurrent acoustic cues for motion can facilitate visual flow parsing, thereby enhancing the detection of moving objects during simulated self-motion. Participants identified an object (the target) that moved either forward or backward within a visual scene containing nine identical textured objects simulating forward observer translation. We found that spatially co-localized, directionally congruent, moving auditory stimuli enhanced object motion detection. Interestingly, subjects who performed poorly on the visual-only task benefited more from the addition of moving auditory stimuli. When auditory stimuli were not co-localized to the visual target, improvements in detection rates were weak. Taken together, these results suggest that parsing object motion from self-motion-induced optic flow can operate on multisensory object representations. PMID:21307050

  16. Three Small-Receptive-Field Ganglion Cells in the Mouse Retina Are Distinctly Tuned to Size, Speed, and Object Motion

    PubMed Central

    Jacoby, Jason

    2017-01-01

    Retinal ganglion cells (RGCs) are frequently divided into functional types by their ability to extract and relay specific features from a visual scene, such as the capacity to discern local or global motion, direction of motion, stimulus orientation, contrast or uniformity, or the presence of large or small objects. Here we introduce three previously uncharacterized, nondirection-selective ON–OFF RGC types that represent a distinct set of feature detectors in the mouse retina. The three high-definition (HD) RGCs possess small receptive-field centers and strong surround suppression. They respond selectively to objects of specific sizes, speeds, and types of motion. We present comprehensive morphological characterization of the HD RGCs and physiological recordings of their light responses, receptive-field size and structure, and synaptic mechanisms of surround suppression. We also explore the similarities and differences between the HD RGCs and a well characterized RGC with a comparably small receptive field, the local edge detector, in response to moving objects and textures. We model populations of each RGC type to study how they differ in their performance tracking a moving object. These results, besides introducing three new RGC types that together constitute a substantial fraction of mouse RGCs, provide insights into the role of different circuits in shaping RGC receptive fields and establish a foundation for continued study of the mechanisms of surround suppression and the neural basis of motion detection. SIGNIFICANCE STATEMENT The output cells of the retina, retinal ganglion cells (RGCs), are a diverse group of ∼40 distinct neuron types that are often assigned “feature detection” profiles based on the specific aspects of the visual scene to which they respond. Here we describe, for the first time, morphological and physiological characterization of three new RGC types in the mouse retina, substantially augmenting our understanding of feature selectivity. Experiments and modeling show that while these three “high-definition” RGCs share certain receptive-field properties, they also have distinct tuning to the size, speed, and type of motion on the retina, enabling them to occupy different niches in stimulus space. PMID:28100743

  17. Recollection and unitization in associating actors with extrinsic and intrinsic motions.

    PubMed

    Kersten, Alan W; Earles, Julie L; Berger, Johanna D

    2015-04-01

    Four experiments provide evidence for a distinction between 2 different kinds of motion representations. Extrinsic motions involve the path of an object with respect to an external frame of reference. Intrinsic motions involve the relative motions of the parts of an object. This research suggests that intrinsic motions are represented conjointly with information about the identities of the actors who perform them, whereas extrinsic motions are represented separately from identity information. Experiment 1 demonstrated that participants remembered which actor had performed a particular intrinsic motion better than they remembered which actor had performed a particular extrinsic motion. Experiment 2 replicated this effect with incidental encoding of actor information, suggesting that encoding intrinsic motions leads one to automatically encode identity information. The results of Experiments 3 and 4 were fit by Yonelinas's (1999) source-memory model to quantify the contributions of familiarity and recollection to memory for the actors who carried out the intrinsic and extrinsic motions. Successful performance with extrinsic motion items in Experiment 3 required participants to remember in which scene contexts an actor had appeared, whereas successful performance in Experiment 4 required participants to remember the exact path taken by an actor in each scene. In both experiments, discrimination of old and new combinations of actors and extrinsic motions relied strongly on recollection, suggesting independent but associated representations of actors and extrinsic motions. In contrast, participants discriminated old and new combinations of actors and intrinsic motions primarily on the basis of familiarity, suggesting unitized representations of actors and intrinsic motions.

  1. Determining the 3-D structure and motion of objects using a scanning laser range sensor

    NASA Technical Reports Server (NTRS)

    Nandhakumar, N.; Smith, Philip W.

    1993-01-01

    In order for the EVAHR robot to autonomously track and grasp objects, its vision system must be able to determine the 3-D structure and motion of an object from a sequence of sensory images. This task is accomplished by the use of a laser radar range sensor which provides dense range maps of the scene. Unfortunately, currently available laser radar range cameras use a sequential scanning approach which complicates image analysis. Although many algorithms have been developed for recognizing objects from range images, none are suited for use with single-beam, scanning, time-of-flight sensors because all previous algorithms assume instantaneous acquisition of the entire image. This assumption is invalid since the EVAHR robot is equipped with a sequential scanning laser range sensor. If an object is moving while being imaged by the device, the apparent structure of the object can be significantly distorted by the non-zero delay between sampling successive image pixels. If an estimate of the motion of the object can be determined, this distortion can be eliminated, but this leads to the motion-structure paradox: most existing algorithms for 3-D motion estimation use the structure of objects to parameterize their motions. The goal of this research is to design a rigid-body motion recovery technique which overcomes this limitation. The method being developed is an iterative, linear, feature-based approach which uses the non-zero image acquisition time constraint to accurately recover the motion parameters from the distorted structure of the 3-D range maps. Once the motion parameters are determined, the structural distortion in the range images is corrected.
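
    To make the correction step concrete: if every range pixel carries its own sample time and a rigid-body motion estimate is available, each 3D sample can be warped back to a common reference time. A small-angle Python sketch follows (an illustration of the idea, not the paper's iterative feature-based estimator).

      import numpy as np

      def correct_scan_distortion(points, times, omega, v):
          # points: Nx3 samples; times: N per-pixel sample times; omega, v:
          # estimated angular and linear velocity of the object (rad/s, m/s).
          corrected = np.empty_like(points, dtype=float)
          for i, (p, dt) in enumerate(zip(points, times - times[0])):
              w = -omega * dt  # rotation to undo, linearized (small angles)
              Rw = np.array([[1.0, -w[2], w[1]],
                             [w[2], 1.0, -w[0]],
                             [-w[1], w[0], 1.0]])
              corrected[i] = Rw @ (p - v * dt)  # undo translation, then rotation
          return corrected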

  2. Bilayer segmentation of webcam videos using tree-based classifiers.

    PubMed

    Yin, Pei; Criminisi, Antonio; Winn, John; Essa, Irfan

    2011-01-01

    This paper presents an automatic segmentation algorithm for video frames captured by a (monocular) webcam that closely approximates depth segmentation from a stereo camera. The frames are segmented into foreground and background layers that comprise a subject (participant) and other objects and individuals. The algorithm produces correct segmentations even in the presence of large background motion with a nearly stationary foreground. This research makes three key contributions: First, we introduce a novel motion representation, referred to as "motons," inspired by research in object recognition. Second, we propose estimating the segmentation likelihood from the spatial context of motion. The estimation is efficiently learned by random forests. Third, we introduce a general taxonomy of tree-based classifiers that facilitates both theoretical and experimental comparisons of several known classification algorithms and generates new ones. In our bilayer segmentation algorithm, diverse visual cues such as motion, motion context, color, contrast, and spatial priors are fused by means of a conditional random field (CRF) model. Segmentation is then achieved by binary min-cut. Experiments on many sequences of our videochat application demonstrate that our algorithm, which requires no initialization, is effective in a variety of scenes, and the segmentation results are comparable to those obtained by stereo systems.

  3. Robotic vision techniques for space operations

    NASA Technical Reports Server (NTRS)

    Krishen, Kumar

    1994-01-01

    Automation and robotics for space applications are being pursued for increased productivity, enhanced reliability, increased flexibility, higher safety, and for the automation of time-consuming tasks and those activities which are beyond the capacity of the crew. One of the key functional elements of an automated robotic system is sensing and perception. As the robotics era dawns in space, vision systems will be required to provide the key sensory data needed for multifaceted intelligent operations. In general, the three-dimensional scene/object description, along with location, orientation, and motion parameters will be needed. In space, the absence of diffused lighting due to a lack of atmosphere gives rise to: (a) high dynamic range (10^8) of scattered sunlight intensities, resulting in very high contrast between shadowed and specular portions of the scene; (b) intense specular reflections causing target/scene bloom; and (c) loss of portions of the image due to shadowing and presence of stars, Earth, Moon, and other space objects in the scene. In this work, developments for combating the adverse effects described earlier and for enhancing scene definition are discussed. Both active and passive sensors are used. The algorithm for selecting appropriate wavelength, polarization, look angle of vision sensors is based on environmental factors as well as the properties of the target/scene which are to be perceived. The environment is characterized on the basis of sunlight and other illumination incident on the target/scene and the temperature profiles estimated on the basis of the incident illumination. The unknown geometrical and physical parameters are then derived from the fusion of the active and passive microwave, infrared, laser, and optical data.

  4. Full-color digitized holography for large-scale holographic 3D imaging of physical and nonphysical objects.

    PubMed

    Matsushima, Kyoji; Sonobe, Noriaki

    2018-01-01

    Digitized holography techniques are used to reconstruct three-dimensional (3D) images of physical objects using large-scale computer-generated holograms (CGHs). The object field is captured at three wavelengths over a wide area at high densities. Synthetic aperture techniques using single sensors are used for image capture in phase-shifting digital holography. The captured object field is incorporated into a virtual 3D scene that includes nonphysical objects, e.g., polygon-meshed CG models. The synthetic object field is optically reconstructed as a large-scale full-color CGH using red-green-blue color filters. The CGH has a wide full-parallax viewing zone and reconstructs a deep 3D scene with natural motion parallax.

  5. Reconstruction of noisy and blurred images using blur kernel

    NASA Astrophysics Data System (ADS)

    Ellappan, Vijayan; Chopra, Vishal

    2017-11-01

    Blur is common in digital images. It can be caused by motion of the camera or of objects in the scene. In this work we propose a new method for deblurring images that uses sparse representation to identify the blur kernel. By analyzing image coordinates in a coarse-to-fine manner, we locate the kernel in image coordinates and, from that observation, obtain the motion angle of the shaken or blurred image. We then calculate the length of the motion kernel using the Radon transform and Fourier analysis, and we use the Lucy-Richardson algorithm, a non-blind deconvolution (NBID) method, to produce a cleaner, less noisy output image. All of these operations are performed in MATLAB.
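
    Once the motion angle and length have been estimated, the non-blind step needs an explicit linear motion PSF. A Python sketch that builds one by rasterizing a line segment (a simple stand-in for the MATLAB kernel fed to Lucy-Richardson):

      import numpy as np

      def motion_psf(length, angle_deg, size=None):
          # length: blur extent in pixels; angle_deg: motion direction.
          size = size or int(length) + 3
          psf = np.zeros((size, size))
          c = (size - 1) / 2.0
          theta = np.deg2rad(angle_deg)
          for s in np.linspace(-length / 2.0, length / 2.0, 4 * size):
              yy = int(round(c + s * np.sin(theta)))
              xx = int(round(c + s * np.cos(theta)))
              if 0 <= yy < size and 0 <= xx < size:
                  psf[yy, xx] = 1.0
          return psf / psf.sum()  # normalize so the kernel preserves energy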

  6. Two Mirrors: Infinite Images of DiCaprio

    NASA Astrophysics Data System (ADS)

    Fadeev, Pavel

    2015-11-01

    Movies are mostly viewed for entertainment. Mixing entertainment and physics gets students excited as we look at a famous movie scene from a different point of view. The following is a link to a fragment from the 2010 motion picture "Inception": http://www.youtube.com/watch?v=q3tBBhYJeAw. The following problem, based on images in facing mirrors from that movie, was given to high school students studying geometric objects.

  7. Two cloud-based cues for estimating scene structure and camera calibration.

    PubMed

    Jacobs, Nathan; Abrams, Austin; Pless, Robert

    2013-10-01

    We describe algorithms that use cloud shadows as a form of stochastically structured light to support 3D scene geometry estimation. Taking video captured from a static outdoor camera as input, we use the relationship of the time series of intensity values between pairs of pixels as the primary input to our algorithms. We describe two cues that relate the 3D distance between a pair of points to the pair of intensity time series. The first cue results from the fact that two pixels that are nearby in the world are more likely to be under a cloud at the same time than two distant points. We describe methods for using this cue to estimate focal length and scene structure. The second cue is based on the motion of cloud shadows across the scene; this cue results in a set of linear constraints on scene structure. These constraints have an inherent ambiguity, which we show how to overcome by combining the cloud motion cue with the spatial cue. We evaluate our method on several time lapses of real outdoor scenes.

  8. Object detection in natural scenes: Independent effects of spatial and category-based attention.

    PubMed

    Stein, Timo; Peelen, Marius V

    2017-04-01

    Humans are remarkably efficient in detecting highly familiar object categories in natural scenes, with evidence suggesting that such object detection can be performed in the (near) absence of attention. Here we systematically explored the influences of both spatial attention and category-based attention on the accuracy of object detection in natural scenes. Manipulating both types of attention additionally allowed for addressing how these factors interact: whether the requirement for spatial attention depends on the extent to which observers are prepared to detect a specific object category-that is, on category-based attention. The results showed that the detection of targets from one category (animals or vehicles) was better than the detection of targets from two categories (animals and vehicles), demonstrating the beneficial effect of category-based attention. This effect did not depend on the semantic congruency of the target object and the background scene, indicating that observers attended to visual features diagnostic of the foreground target objects from the cued category. Importantly, in three experiments the detection of objects in scenes presented in the periphery was significantly impaired when observers simultaneously performed an attentionally demanding task at fixation, showing that spatial attention affects natural scene perception. In all experiments, the effects of category-based attention and spatial attention on object detection performance were additive rather than interactive. Finally, neither spatial nor category-based attention influenced metacognitive ability for object detection performance. These findings demonstrate that efficient object detection in natural scenes is independently facilitated by spatial and category-based attention.

  9. 3D pose estimation and motion analysis of the articulated human hand-forearm limb in an industrial production environment

    NASA Astrophysics Data System (ADS)

    Hahn, Markus; Barrois, Björn; Krüger, Lars; Wöhler, Christian; Sagerer, Gerhard; Kummert, Franz

    2010-09-01

    This study introduces an approach to model-based 3D pose estimation and instantaneous motion analysis of the human hand-forearm limb in the application context of safe human-robot interaction. 3D pose estimation is performed using two approaches: The Multiocular Contracting Curve Density (MOCCD) algorithm is a top-down technique based on pixel statistics around a contour model projected into the images from several cameras. The Iterative Closest Point (ICP) algorithm is a bottom-up approach which uses a motion-attributed 3D point cloud to estimate the object pose. Due to their orthogonal properties, a fusion of these algorithms is shown to be favorable. The fusion is performed by a weighted combination of the extracted pose parameters in an iterative manner. The analysis of object motion is based on the pose estimation result and the motion-attributed 3D points belonging to the hand-forearm limb using an extended constraint-line approach which does not rely on any temporal filtering. A further refinement is obtained using the Shape Flow algorithm, a temporal extension of the MOCCD approach, which estimates the temporal pose derivative based on the current and the two preceding images, corresponding to temporal filtering with a short response time of two or at most three frames. Combining the results of the two motion estimation stages provides information about the instantaneous motion properties of the object. Experimental investigations are performed on real-world image sequences displaying several test persons performing different working actions typically occurring in an industrial production scenario. In all example scenes, the background is cluttered, and the test persons wear various kinds of clothes. For evaluation, independently obtained ground truth data are used.

  10. Spared Ability to Perceive Direction of Locomotor Heading and Scene-Relative Object Movement Despite Inability to Perceive Relative Motion

    PubMed Central

    Vaina, Lucia M.; Buonanno, Ferdinando; Rushton, Simon K.

    2014-01-01

    Background All contemporary models of perception of locomotor heading from optic flow (the characteristic patterns of retinal motion that result from self-movement) begin with relative motion. Therefore, it would be expected that an impairment of relative-motion perception should impact the ability to judge heading and other 3D motion tasks. Material/Methods We report two patients with occipital lobe lesions whom we tested on a battery of motion tasks. The patients were impaired on all tests that involved relative motion in the plane (motion discontinuity, form from differences in motion direction or speed). Despite this, they retained the ability to judge their direction of heading relative to a target. A potential confound is that observers can derive information about heading from scale changes, bypassing the need to use optic flow. Therefore, we ran further experiments in which we isolated optic flow and scale change. Results The patients' performance was in the normal range on both tests. The finding that the ability to perceive heading can be retained despite an impaired ability to judge relative motion questions the assumption that heading perception proceeds from initial processing of relative motion. Furthermore, on a collision detection task, SS and SR's performance was significantly better for simulated forward movement of the observer in the 3D scene than for the static observer. This suggests that, in spite of severe deficits for relative motion in the frontoparallel (x-y) plane, information from self-motion helped the patients identify objects moving along an intercepting 3D relative-motion trajectory. Conclusions This result suggests the potential use of a flow-parsing strategy to detect the trajectory of moving objects in a 3D world when the observer is moving forward. These results have implications for developing rehabilitation strategies for deficits in visually guided navigation. PMID:25183375

  11. Additional Crime Scenes for Projectile Motion Unit

    NASA Astrophysics Data System (ADS)

    Fullerton, Dan; Bonner, David

    2011-12-01

    Building students' ability to transfer physics fundamentals to real-world applications establishes a deeper understanding of underlying concepts while enhancing student interest. Forensic science offers a great opportunity for students to apply physics to highly engaging, real-world contexts. Integrating these opportunities into inquiry-based problem solving in a team environment provides a terrific backdrop for fostering communication, analysis, and critical thinking skills. One such activity, inspired jointly by the museum exhibit "CSI: The Experience"2 and David Bonner's TPT article "Increasing Student Engagement and Enthusiasm: A Projectile Motion Crime Scene,"3 provides students with three different crime scenes, each requiring an analysis of projectile motion. In this lesson students socially engage in higher-order analysis of two-dimensional projectile motion problems by collecting information from 3-D scale models and collaborating with one another on its interpretation, in addition to diagramming and mathematical analysis typical to problem solving in physics.
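
    As a flavour of the analysis each station calls for, the worked Python example below (all measurements made up for illustration) recovers a launch speed from a measured launch height, angle, and horizontal range by eliminating time from the projectile equations.

      import numpy as np

      g = 9.81                   # m/s^2
      h = 4.5                    # launch height above ground, m (hypothetical)
      theta = np.deg2rad(10.0)   # launch angle above horizontal (hypothetical)
      R = 12.0                   # horizontal range at the scene, m (hypothetical)

      # From R = v0*cos(theta)*t and 0 = h + v0*sin(theta)*t - g*t**2/2,
      # eliminating t gives v0**2 = g*R**2 / (2*cos(theta)**2*(h + R*tan(theta))).
      v0 = np.sqrt(g * R**2 / (2 * np.cos(theta)**2 * (h + R * np.tan(theta))))
      t = R / (v0 * np.cos(theta))
      print(f"launch speed = {v0:.1f} m/s, flight time = {t:.2f} s")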

  12. Richardson-Lucy deblurring for the star scene under a thinning motion path

    NASA Astrophysics Data System (ADS)

    Su, Laili; Shao, Xiaopeng; Wang, Lin; Wang, Haixin; Huang, Yining

    2015-05-01

    This paper puts emphasis on how to model and correct the image blur that arises from a camera's ego motion while observing a distant star scene. Because accurate estimation of the point spread function (PSF) is critical, a new method is employed that obtains the blur kernel by thinning the star motion path. In particular, we present how the blurred star image can be corrected, reconstructing the clear scene with a thinned motion-blur model that describes the camera's path. Building the blur kernel from the thinned motion path is more effective at modeling the spatially varying motion blur introduced by the camera's ego motion than conventional blind, kernel-based PSF parameterization. To obtain the reconstructed image, an improved thinning algorithm is first used to extract the star point trajectory and, from it, the blur kernel of the motion-blurred star image. We then detail how the motion-blur model is incorporated into the Richardson-Lucy (RL) deblurring algorithm, which reveals its overall effectiveness. In addition, compared with conventional blur kernel estimation, experimental results show that using the thinning algorithm to obtain the motion-blur kernel has lower complexity, higher efficiency, and better accuracy, which contributes to better restoration of motion-blurred star images.
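
    For reference, the RL deconvolution into which the thinned-trail kernel is plugged is the multiplicative update x <- x * (K^T (y / (K x))). A self-contained Python/NumPy sketch (the paper's exact implementation details are not given in the abstract):

      import numpy as np
      from scipy.signal import fftconvolve

      def richardson_lucy(blurred, psf, n_iter=30, eps=1e-12):
          # blurred: observed image y; psf: blur kernel K from the thinned trail.
          x = np.full(blurred.shape, float(blurred.mean()))  # flat initial guess
          psf_flip = psf[::-1, ::-1]
          for _ in range(n_iter):
              estimate = fftconvolve(x, psf, mode="same")     # K x
              ratio = blurred / (estimate + eps)              # y / (K x)
              x *= fftconvolve(ratio, psf_flip, mode="same")  # times K^T(...)
          return x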

  13. HDR video synthesis for vision systems in dynamic scenes

    NASA Astrophysics Data System (ADS)

    Shopovska, Ivana; Jovanov, Ljubomir; Goossens, Bart; Philips, Wilfried

    2016-09-01

    High dynamic range (HDR) image generation from a number of differently exposed low dynamic range (LDR) images has been extensively explored in the past few decades, and as a result of these efforts a large number of HDR synthesis methods have been proposed. Since HDR images are synthesized by combining well-exposed regions of the input images, one of the main challenges is dealing with camera or object motion. In this paper we propose a method for the synthesis of HDR video from a single camera using multiple, differently exposed video frames, with circularly alternating exposure times. One of the potential applications of the system is in driver assistance systems and autonomous vehicles, involving significant camera and object movement, non-uniform and temporally varying illumination, and the requirement of real-time performance. To achieve these goals simultaneously, we propose an HDR synthesis approach based on weighted averaging of aligned radiance maps. The computational complexity of high-quality optical flow methods for motion compensation is still prohibitively high for real-time applications. Instead, we rely on more efficient global projective transformations to handle camera movement, while moving objects are detected by thresholding the differences between the transformed and brightness-adapted images in the set. To attain temporal consistency of the camera motion in the consecutive HDR frames, the parameters of the perspective transformation are stabilized over time by means of computationally efficient temporal filtering. We evaluated our results on several reference HDR videos, on synthetic scenes, and using 14-bit raw images taken with a standard camera.
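
    A minimal Python sketch of the weighted averaging of aligned radiance maps: each aligned frame is mapped to radiance via an (assumed) inverse camera response and its exposure time, then averaged with weights favouring well-exposed mid-range pixels. Alignment and moving-object masking are assumed to have been done beforehand; crf_inv and sigma are placeholders.

      import numpy as np

      def fuse_radiance(frames, exposures, crf_inv=lambda z: z, sigma=0.2):
          # frames: aligned 8-bit images; exposures: exposure times in seconds.
          num = np.zeros(frames[0].shape, dtype=float)
          den = np.zeros(frames[0].shape, dtype=float)
          for img, t_exp in zip(frames, exposures):
              z = img.astype(float) / 255.0
              w = np.exp(-((z - 0.5) ** 2) / (2 * sigma ** 2))  # hat-like weight
              num += w * crf_inv(z) / t_exp   # per-pixel radiance estimate
              den += w
          return num / (den + 1e-12)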

  14. Threatening scenes but not threatening faces shorten time-to-contact estimates.

    PubMed

    DeLucia, Patricia R; Brendel, Esther; Hecht, Heiko; Stacy, Ryan L; Larsen, Jeff T

    2014-08-01

    We previously reported that time-to-contact (TTC) judgments of threatening scene pictures (e.g., frontal attacks) resulted in shortened estimations and were mediated by cognitive processes, and that judgments of threatening (e.g., angry) face pictures resulted in a smaller effect and did not seem cognitively mediated. In the present study, the effects of threatening scenes and faces were compared in two different tasks. An effect of threatening scene pictures occurred in a prediction-motion task, which putatively requires cognitive motion extrapolation, but not in a relative TTC judgment task, which was designed to be less reliant on cognitive processes. An effect of threatening face pictures did not occur in either task. We propose that an object's explicit potential of threat per se, and not only emotional valence, underlies the effect of threatening scenes on TTC judgments and that such an effect occurs only when the task allows sufficient cognitive processing. Results are consistent with distinctions between predator and social fear systems and different underlying physiological mechanisms. Not all threatening information elicits the same responses, and whether an effect occurs at all may depend on the task and the degree to which the task involves cognitive processes.

  15. FPGA-based architecture for motion recovering in real-time

    NASA Astrophysics Data System (ADS)

    Arias-Estrada, Miguel; Maya-Rueda, Selene E.; Torres-Huitzil, Cesar

    2002-03-01

    A key problem in the computer vision field is the measurement of object motion in a scene. The main goal is to compute an approximation of the 3D motion from the analysis of an image sequence. Once computed, this information can be used as a basis to reach higher-level goals in different applications. Motion estimation algorithms pose a significant computational load for sequential processors, limiting their use in practical applications. In this work we propose a hardware architecture for real-time motion estimation based on FPGA technology. The technique used for motion estimation is optical flow, chosen for its accuracy and the density of its velocity estimates, although other techniques are being explored. The architecture is composed of parallel modules working in a pipeline scheme to reach high throughput rates approaching gigaflops. The modules are organized in a regular structure to provide a high degree of flexibility covering different applications. Some results are presented, and the real-time performance is discussed and analyzed. The architecture is prototyped on an FPGA board with a Virtex device interfaced to a digital imager.

  16. A neural model of the temporal dynamics of figure-ground segregation in motion perception.

    PubMed

    Raudies, Florian; Neumann, Heiko

    2010-03-01

    How does the visual system manage to segment a visual scene into surfaces and objects and manage to attend to a target object? Based on psychological and physiological investigations, it has been proposed that the perceptual organization and segmentation of a scene is achieved by the processing at different levels of the visual cortical hierarchy. According to this, motion onset detection, motion-defined shape segregation, and target selection are accomplished by processes which bind together simple features into fragments of increasingly complex configurations at different levels in the processing hierarchy. As an alternative to this hierarchical processing hypothesis, it has been proposed that the processing stages for feature detection and segregation are reflected in different temporal episodes in the response patterns of individual neurons. Such temporal epochs have been observed in the activation pattern of neurons as low as in area V1. Here, we present a neural network model of motion detection, figure-ground segregation and attentive selection which explains these response patterns in a unifying framework. Based on known principles of functional architecture of the visual cortex, we propose that initial motion and motion boundaries are detected at different and hierarchically organized stages in the dorsal pathway. Visual shapes that are defined by boundaries, which were generated from juxtaposed opponent motions, are represented at different stages in the ventral pathway. Model areas in the different pathways interact through feedforward and modulating feedback, while mutual interactions enable the communication between motion and form representations. Selective attention is devoted to shape representations by sending modulating feedback signals from higher levels (working memory) to intermediate levels to enhance their responses. Areas in the motion and form pathway are coupled through top-down feedback with V1 cells at the bottom end of the hierarchy. We propose that the different temporal episodes in the response pattern of V1 cells, as recorded in recent experiments, reflect the strength of modulating feedback signals. This feedback results from the consolidated shape representations from coherent motion patterns and the attentive modulation of responses along the cortical hierarchy. The model makes testable predictions concerning the duration and delay of the temporal episodes of V1 cell responses as well as their response variations that were caused by modulating feedback signals.

  17. Visual memory for moving scenes.

    PubMed

    DeLucia, Patricia R; Maldia, Maria M

    2006-02-01

    In the present study, memory for picture boundaries was measured with scenes that simulated self-motion along the depth axis. The results indicated that boundary extension (a distortion in memory for picture boundaries) occurred with moving scenes in the same manner as reported previously for static scenes. Furthermore, motion affected memory for the boundaries, but this effect was not consistent with representational momentum of the self (memory being farther forward along the motion trajectory than actually shown). We also found that memory for the final position of the depicted self in a moving scene was influenced by properties of the optical expansion pattern. The results are consistent with a conceptual framework in which the mechanisms that underlie boundary extension and representational momentum (a) process different information and (b) both contribute to the integration of successive views of a scene while the scene is changing.

  18. Investigation of visually induced motion sickness in dynamic 3D contents based on subjective judgment, heart rate variability, and depth gaze behavior.

    PubMed

    Wibirama, Sunu; Hamamoto, Kazuhiko

    2014-01-01

    Visually induced motion sickness (VIMS) is an important safety issue in stereoscopic 3D technology. Accompanying the subjective judgment of VIMS with objective measurements is useful for identifying not only the biomedical effects of dynamic 3D contents, but also the provoking scenes that induce VIMS, the duration of VIMS, and user behavior during VIMS. Heart rate variability and depth gaze behavior are appropriate physiological indicators for such objective observation. However, there is no information about the relationship between the subjective judgment of VIMS, heart rate variability, and depth gaze behavior. In this paper, we present a novel investigation of VIMS based on the simulator sickness questionnaire (SSQ), electrocardiography (ECG), and 3D gaze tracking. Statistical analysis of the SSQ data shows that nausea and disorientation symptoms increase as the amount of dynamic motion increases (nausea: p < 0.005; disorientation: p < 0.05). To reduce VIMS, the SSQ and ECG data suggest that users should perform voluntary gaze fixation at one point when experiencing vertical motion (up or down) and horizontal motion (turning left or right) in dynamic 3D contents. Observation of the 3D gaze tracking data reveals that users who experienced VIMS tended to have a more unstable depth gaze than those who did not.

  19. Optic flow-based collision-free strategies: From insects to robots.

    PubMed

    Serres, Julien R; Ruffier, Franck

    2017-09-01

    Flying insects are able to fly smartly in an unpredictable environment. It has been found that flying insects have neurons inside their tiny brains that are sensitive to visual motion, also called optic flow. Consequently, flying insects rely mainly on visual motion during flight maneuvers such as takeoff or landing, terrain following, tunnel crossing, lateral and frontal obstacle avoidance, and adjusting flight speed in a cluttered environment. Optic flow can be defined as the vector field of the apparent motion of objects, surfaces, and edges in a visual scene generated by the relative motion between an observer (an eye or a camera) and the scene. Translational optic flow is particularly interesting for short-range navigation because it depends on the ratio between (i) the relative linear speed of the visual scene with respect to the observer and (ii) the distance of the observer from obstacles in the surrounding environment, without any direct measurement of either speed or distance. In flying insects, the roll stabilization reflex and yaw saccades attenuate any rotation at the eye level in roll and yaw, respectively (i.e., they cancel rotational optic flow), in order to ensure pure translational optic flow between two successive saccades. Our survey focuses on the feedback loops using translational optic flow that insects employ for collision-free navigation. Over the next decade, optic flow is likely to be one of the most important visual cues for explaining flying insects' behaviors during short-range navigation maneuvers in complex tunnels. Conversely, the biorobotic approach can help to develop innovative flight control systems for flying robots, with the aim of mimicking flying insects' abilities and better understanding their flight. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
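
    The speed/distance ratio the authors describe can be stated as a one-line formula. A minimal sketch, using the standard result for pure translation (symbols here are illustrative): a scene point at angle theta from the heading, at distance D, sweeps across the eye at omega = (v / D) * sin(theta).

        import math

        def translational_optic_flow(v, D, theta):
            """Angular speed (rad/s) of a scene point under pure translation.

            v: observer speed (m/s); D: distance to the point (m);
            theta: angle between the heading and the point (rad).
            """
            return (v / D) * math.sin(theta)

        # A flier holding lateral optic flow (theta = 90 deg) at 100 deg/s
        # while moving at 1 m/s implicitly keeps the wall at
        # D = v / omega = 1.0 / math.radians(100) ~= 0.57 m,
        # without measuring either v or D directly.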

  20. Global ensemble texture representations are critical to rapid scene perception.

    PubMed

    Brady, Timothy F; Shafer-Skelton, Anna; Alvarez, George A

    2017-06-01

    Traditionally, recognizing the objects within a scene has been treated as a prerequisite to recognizing the scene itself. However, research now suggests that the ability to rapidly recognize visual scenes could be supported by global properties of the scene itself rather than the objects within the scene. Here, we argue for a particular instantiation of this view: That scenes are recognized by treating them as a global texture and processing the pattern of orientations and spatial frequencies across different areas of the scene without recognizing any objects. To test this model, we asked whether there is a link between how proficient individuals are at rapid scene perception and how proficiently they represent simple spatial patterns of orientation information (global ensemble texture). We find a significant and selective correlation between these tasks, suggesting a link between scene perception and spatial ensemble tasks but not nonspatial summary statistics In a second and third experiment, we additionally show that global ensemble texture information is not only associated with scene recognition, but that preserving only global ensemble texture information from scenes is sufficient to support rapid scene perception; however, preserving the same information is not sufficient for object recognition. Thus, global ensemble texture alone is sufficient to allow activation of scene representations but not object representations. Together, these results provide evidence for a view of scene recognition based on global ensemble texture rather than a view based purely on objects or on nonspatially localized global properties. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  1. Illusory motion reveals velocity matching, not foveation, drives smooth pursuit of large objects

    PubMed Central

    Ma, Zheng; Watamaniuk, Scott N. J.; Heinen, Stephen J.

    2017-01-01

    When small objects move in a scene, we keep them foveated with smooth pursuit eye movements. Although large objects such as people and animals are common, it is nonetheless unknown how we pursue them since they cannot be foveated. It might be that the brain calculates an object's centroid, and then centers the eyes on it during pursuit as a foveation mechanism might. Alternatively, the brain merely matches the velocity by motion integration. We test these alternatives with an illusory motion stimulus that translates at a speed different from its retinal motion. The stimulus was a Gabor array that translated at a fixed velocity, with component Gabors that drifted with motion consistent or inconsistent with the translation. Velocity matching predicts different pursuit behaviors across drift conditions, while centroid matching predicts no difference. We also tested whether pursuit can segregate and ignore irrelevant local drifts when motion and centroid information are consistent by surrounding the Gabors with solid frames. Finally, observers judged the global translational speed of the Gabors to determine whether smooth pursuit and motion perception share mechanisms. We found that consistent Gabor motion enhanced pursuit gain while inconsistent, opposite motion diminished it, drawing the eyes away from the center of the stimulus and supporting a motion-based pursuit drive. Catch-up saccades tended to counter the position offset, directing the eyes opposite to the deviation caused by the pursuit gain change. Surrounding the Gabors with visible frames canceled both the gain increase and the compensatory saccades. Perceived speed was modulated analogous to pursuit gain. The results suggest that smooth pursuit of large stimuli depends on the magnitude of integrated retinal motion information, not its retinal location, and that the position system might be unnecessary for generating smooth velocity to large pursuit targets. PMID:29090315

  2. Laser-Based Slam with Efficient Occupancy Likelihood Map Learning for Dynamic Indoor Scenes

    NASA Astrophysics Data System (ADS)

    Li, Li; Yao, Jian; Xie, Renping; Tu, Jinge; Feng, Chen

    2016-06-01

    Location-Based Services (LBS) have attracted growing attention in recent years, especially in indoor environments. The fundamental technique underlying LBS is map building for unknown environments, also known as simultaneous localization and mapping (SLAM) in the robotics community. In this paper, we propose a novel approach for SLAM in dynamic indoor scenes based on a 2D laser scanner mounted on a mobile Unmanned Ground Vehicle (UGV), with the help of a grid-based occupancy likelihood map. Instead of applying scan matching between two adjacent scans, we propose to match the current scan with the occupancy likelihood map learned from all previous scans at multiple scales, to avoid the accumulation of matching errors. Because the points in a scan are acquired sequentially rather than simultaneously, scans are unavoidably distorted to varying extents. To compensate for the scan distortion caused by the motion of the UGV, we propose to integrate the velocity of the laser range finder (LRF) into the scan matching optimization framework. Besides, to reduce the effect of dynamic objects such as walking pedestrians, which often exist in indoor scenes, we propose a new occupancy likelihood map learning strategy that increases or decreases the probability of each occupancy grid after each scan matching. Experimental results in several challenging indoor scenes demonstrate that our proposed approach is capable of providing high-precision SLAM results.
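
    A minimal sketch of the central idea, matching the current scan against the learned occupancy likelihood map rather than against the previous scan; the brute-force pose search below stands in for the paper's multi-scale optimization, and the grid resolution and search ranges are illustrative assumptions.

        import numpy as np

        def scan_score(scan_xy, like_map, res, pose):
            """Sum of map likelihoods at scan endpoints placed by pose = (x, y, yaw)."""
            x, y, yaw = pose
            c, s = np.cos(yaw), np.sin(yaw)
            pts = scan_xy @ np.array([[c, s], [-s, c]]) + np.array([x, y])
            ij = np.floor(pts / res).astype(int)          # world -> grid indices
            ok = ((ij >= 0) & (ij < np.array(like_map.shape))).all(axis=1)
            return like_map[ij[ok, 0], ij[ok, 1]].sum()

        def match_scan(scan_xy, like_map, res, guess, dxy=0.2, dyaw=0.05, n=5):
            """Search a small pose window around the odometry guess."""
            candidates = [(x, y, yaw)
                          for x in np.linspace(guess[0] - dxy, guess[0] + dxy, n)
                          for y in np.linspace(guess[1] - dxy, guess[1] + dxy, n)
                          for yaw in np.linspace(guess[2] - dyaw, guess[2] + dyaw, n)]
            return max(candidates, key=lambda p: scan_score(scan_xy, like_map, res, p))

    After each accepted match, the paper's dynamic-object strategy would then raise or lower the likelihood of each grid cell the new scan confirms or contradicts.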

  3. IR characteristic simulation of city scenes based on radiosity model

    NASA Astrophysics Data System (ADS)

    Xiong, Xixian; Zhou, Fugen; Bai, Xiangzhi; Yu, Xiyu

    2013-09-01

    Reliable modeling of the thermal infrared (IR) signatures of real-world city scenes is required for signature management of civil and military platforms. Traditional modeling methods generally assume that scene objects are individual entities during the physical processes occurring in the infrared range. In reality, however, the physical scene involves convective and conductive interactions between objects as well as radiative interactions between objects. A method based on a radiosity model, which describes these complex effects, has been developed to enable an accurate simulation of the radiance distribution of city scenes. Firstly, the physical processes affecting the IR characteristics of city scenes were described. Secondly, heat balance equations were formed by combining the atmospheric conditions, shadow maps and the geometry of the scene. Finally, a finite difference method was used to calculate the kinetic temperature of object surfaces. A radiosity model was introduced to describe the scattering of radiation between surface elements in the scene. By synthesizing the radiance distribution of objects in the infrared range, the IR characteristics of the scene were obtained. Real infrared images and model predictions were shown and compared. The results demonstrate that this method can realistically simulate the IR characteristics of city scenes. It effectively displays infrared shadow effects and the radiative interactions between objects in city scenes.
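
    The radiosity step itself reduces to one linear system. A minimal sketch under the standard assumptions of gray, diffuse surface patches (emissions E from the heat-balance step, reflectivities rho, and form factors F precomputed from the scene geometry; the values below are placeholders): B = E + diag(rho) F B.

        import numpy as np

        def solve_radiosity(E, rho, F):
            """Solve B = E + diag(rho) @ F @ B for patch radiosities.

            E:   (n,) self-emitted radiance of each surface patch
            rho: (n,) reflectivity of each patch in the IR band
            F:   (n, n) form factors; F[i, j] = fraction of energy
                 leaving patch i that arrives at patch j
            """
            A = np.eye(len(E)) - np.diag(rho) @ F
            return np.linalg.solve(A, E)

        # Toy example: two facing patches exchanging radiation.
        E = np.array([10.0, 2.0])
        rho = np.array([0.3, 0.5])
        F = np.array([[0.0, 0.4],
                      [0.4, 0.0]])
        B = solve_radiosity(E, rho, F)   # each patch gains scattered radiance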

  4. A knowledge-based machine vision system for space station automation

    NASA Technical Reports Server (NTRS)

    Chipman, Laure J.; Ranganath, H. S.

    1989-01-01

    A simple knowledge-based approach to the recognition of objects in man-made scenes is being developed. Specifically, the system under development is a proposed enhancement to a robot arm for use in the space station laboratory module. The system will take a request from a user to find a specific object, and locate that object by using its camera input and information from a knowledge base describing the scene layout and attributes of the object types included in the scene. In order to use realistic test images in developing the system, researchers are using photographs of actual NASA simulator panels, which provide similar types of scenes to those expected in the space station environment. Figure 1 shows one of these photographs. In traditional approaches to image analysis, the image is transformed step by step into a symbolic representation of the scene. Often the first steps of the transformation are done without any reference to knowledge of the scene or objects. Segmentation of an image into regions generally produces a counterintuitive result in which regions do not correspond to objects in the image. After segmentation, a merging procedure attempts to group regions into meaningful units that will more nearly correspond to objects. Here, researchers avoid segmenting the image as a whole, and instead use a knowledge-directed approach to locate objects in the scene. The knowledge-based approach to scene analysis is described and the categories of knowledge used in the system are discussed.

  5. Visual fatigue modeling for stereoscopic video shot based on camera motion

    NASA Astrophysics Data System (ADS)

    Shi, Guozhong; Sang, Xinzhu; Yu, Xunbo; Liu, Yangdong; Liu, Jing

    2014-11-01

    As three-dimensional television (3-DTV) and 3D movies become popular, visual discomfort limits further applications of 3D display technology. The causes of visual discomfort from stereoscopic video include the conflict between accommodation and convergence, excessive binocular parallax, fast motion of objects, and so on. Here, a novel method for evaluating visual fatigue is demonstrated. Influence factors including spatial structure, motion scale and comfortable zone are analyzed. According to the human visual system (HVS), viewers need only converge their eyes on specific objects when the camera and background are static. Relative motion must be considered for other camera conditions, which determine different factor coefficients and weights. Compared with traditional visual fatigue prediction models, a novel visual fatigue prediction model is presented. The visual fatigue degree is predicted using the multiple linear regression method combined with subjective evaluation. Consequently, each factor reflects the characteristics of the scene, and the total visual fatigue score can be computed according to the proposed algorithm. Compared with conventional algorithms that ignore the status of the camera, our approach exhibits reliable performance in terms of correlation with subjective test results.
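
    The final prediction stage is ordinary multiple linear regression. A minimal sketch, assuming the per-shot factor scores named above (spatial structure, motion scale, comfortable zone) have already been extracted and paired with subjective fatigue ratings; all numbers below are placeholders.

        import numpy as np

        # Rows: shots. Columns: spatial structure, motion scale, comfort zone.
        X = np.array([[0.8, 0.2, 0.6],
                      [0.4, 0.9, 0.3],
                      [0.6, 0.5, 0.5],
                      [0.2, 0.1, 0.9]])
        y = np.array([3.1, 4.5, 3.8, 1.9])   # subjective fatigue scores

        # Least-squares fit of y ~ w0 + w1*x1 + w2*x2 + w3*x3.
        A = np.hstack([np.ones((len(X), 1)), X])
        w, *_ = np.linalg.lstsq(A, y, rcond=None)

        def predict_fatigue(features):
            """Predicted fatigue score for a new shot's factor vector."""
            return w[0] + features @ w[1:]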

  6. A Motion-Based Feature for Event-Based Pattern Recognition

    PubMed Central

    Clady, Xavier; Maro, Jean-Matthieu; Barré, Sébastien; Benosman, Ryad B.

    2017-01-01

    This paper introduces an event-based, luminance-free feature computed from the output of asynchronous event-based neuromorphic retinas. The feature consists of mapping the distribution of the optical flow along the contours of the moving objects in the visual scene into a matrix. Asynchronous event-based neuromorphic retinas are composed of autonomous pixels, each of them asynchronously generating "spiking" events that encode relative changes in pixel illumination at high temporal resolutions. The optical flow is computed at each event, and is integrated locally or globally in a grid based on a speed and direction coordinate frame, using speed-tuned temporal kernels. The latter ensure that the resulting feature equitably represents the distribution of the normal motion along the current moving edges, whatever their respective dynamics. The usefulness and the generality of the proposed feature are demonstrated in pattern recognition applications: local corner detection and global gesture recognition. PMID:28101001
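
    A minimal sketch of assembling such a feature: each event contributes its flow vector to a (speed, direction) grid, weighted by a speed-tuned exponential kernel so that fast and slow edges are represented on their own time scales; the bin counts and the kernel constant are illustrative assumptions.

        import numpy as np

        def flow_feature(events, t_now, n_speed=8, n_dir=16, v_max=1000.0, k=0.05):
            """Accumulate per-event optical flow into a speed x direction matrix.

            events: iterable of (t, vx, vy), time in seconds, flow in px/s.
            """
            M = np.zeros((n_speed, n_dir))
            for t, vx, vy in events:
                speed = np.hypot(vx, vy)
                if speed == 0.0 or speed >= v_max:
                    continue
                tau = k / speed                   # speed-tuned temporal kernel
                w = np.exp(-(t_now - t) / tau)    # fast edges decay faster
                si = int(speed / v_max * n_speed)
                di = int((np.arctan2(vy, vx) % (2 * np.pi)) / (2 * np.pi) * n_dir)
                M[si, di] += w
            return M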

  7. Model-based video segmentation for vision-augmented interactive games

    NASA Astrophysics Data System (ADS)

    Liu, Lurng-Kuo

    2000-04-01

    This paper presents an architecture and algorithms for model-based video object segmentation and its application to vision-augmented interactive games. We are especially interested in real-time, low-cost, vision-based applications that can be implemented in software on a PC. We use different models for the background and a player object. The object segmentation algorithm is performed at two levels: pixel level and object level. At the pixel level, the segmentation algorithm is formulated as a maximum a posteriori (MAP) estimation problem. The statistical likelihood of each pixel is calculated and used in the MAP problem. Object-level segmentation is used to improve segmentation quality by utilizing information about the spatial and temporal extent of the object. The concept of an active region, defined based on a motion histogram and trajectory prediction, is introduced to indicate the likelihood that a region belongs to a video object, for both background and foreground modeling. It also reduces the overall computational complexity. In contrast with other applications, the proposed video object segmentation system is able to create background and foreground models on the fly, even without introductory background frames. Furthermore, we apply different rates of self-tuning to the scene model so that the system can adapt to the environment when there is a scene change. We applied the proposed video object segmentation algorithms to several prototype virtual interactive games. In our prototype vision-augmented interactive games, a player can immerse himself/herself inside a game and virtually interact with other animated characters in real time without being constrained by helmets, gloves, special sensing devices, or the background environment. Potential applications of the proposed algorithms include human-computer gesture interfaces and object-based video coding such as MPEG-4.
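
    A minimal sketch of the pixel-level MAP decision, assuming per-pixel Gaussian background and foreground models (the priors, means and variances are placeholders, and the paper's object-level refinement and active regions are omitted): each pixel takes the label with the larger log-posterior.

        import numpy as np

        def map_segment(frame, bg_mu, bg_var, fg_mu, fg_var, p_fg=0.3):
            """Label each pixel foreground (True) or background (False) by MAP."""
            def log_lik(x, mu, var):
                # log of a Gaussian likelihood, elementwise
                return -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)

            score_bg = log_lik(frame, bg_mu, bg_var) + np.log(1.0 - p_fg)
            score_fg = log_lik(frame, fg_mu, fg_var) + np.log(p_fg)
            return score_fg > score_bg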

  8. Feature-based attentional modulations in the absence of direct visual stimulation.

    PubMed

    Serences, John T; Boynton, Geoffrey M

    2007-07-19

    When faced with a crowded visual scene, observers must selectively attend to behaviorally relevant objects to avoid sensory overload. Often this selection process is guided by prior knowledge of a target-defining feature (e.g., the color red when looking for an apple), which enhances the firing rate of visual neurons that are selective for the attended feature. Here, we used functional magnetic resonance imaging and a pattern classification algorithm to predict the attentional state of human observers as they monitored a visual feature (one of two directions of motion). We find that feature-specific attention effects spread across the visual field, even to regions of the scene that do not contain a stimulus. This spread of feature-based attention to empty regions of space may facilitate the perception of behaviorally relevant stimuli by increasing sensitivity to attended features at all locations in the visual field.

  9. Application of composite small calibration objects in traffic accident scene photogrammetry.

    PubMed

    Chen, Qiang; Xu, Hongguo; Tan, Lidong

    2015-01-01

    In order to address the difficulty of arranging large calibration objects and the low measurement accuracy of small calibration objects in traffic accident scene photogrammetry, a photogrammetric method based on a composite of small calibration objects is proposed. Several small calibration objects are placed around the traffic accident scene, and the coordinate system of the composite calibration object is given based on one of them. By maintaining the relative position and coplanar relationship of the small calibration objects, the local coordinate system of each small calibration object is transformed into the coordinate system of the composite calibration object. The two-dimensional direct linear transformation method is improved based on minimizing the reprojection error of the calibration points of all objects. A rectified image is obtained using the nonlinear optimization method. The increased accuracy of traffic accident scene photogrammetry using a composite small calibration object is demonstrated through the analysis of field experiments and case studies.
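
    The core mapping is a plane-to-plane homography estimated by the two-dimensional direct linear transformation. A minimal sketch of the standard DLT from four or more point correspondences (the paper's contributions, pooling calibration points from all the small objects and the nonlinear reprojection-error refinement, are omitted here).

        import numpy as np

        def dlt_homography(img_pts, world_pts):
            """Estimate H with world ~ H @ img (homogeneous coordinates)."""
            A = []
            for (u, v), (x, y) in zip(img_pts, world_pts):
                A.append([u, v, 1, 0, 0, 0, -x * u, -x * v, -x])
                A.append([0, 0, 0, u, v, 1, -y * u, -y * v, -y])
            _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
            return Vt[-1].reshape(3, 3)    # right singular vector of the
                                           # smallest singular value

        def image_to_ground(H, u, v):
            x, y, w = H @ np.array([u, v, 1.0])
            return x / w, y / w            # measured ground coordinates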

  10. Exploring direct 3D interaction for full horizontal parallax light field displays using leap motion controller.

    PubMed

    Adhikarla, Vamsi Kiran; Sodnik, Jaka; Szolgay, Peter; Jakus, Grega

    2015-04-14

    This paper reports on the design and evaluation of direct 3D gesture interaction with a full horizontal parallax light field display. A light field display defines a visual scene using directional light beams emitted from multiple light sources as if they were emitted from scene points. Each scene point is rendered individually, resulting in more realistic and accurate 3D visualization compared to other 3D display technologies. We propose an interaction setup combining the visualization of objects within the Field Of View (FOV) of a light field display and their selection through freehand gestures tracked by the Leap Motion Controller. The accuracy and usefulness of the proposed interaction setup were evaluated in a user study with test subjects. The results of the study revealed a high user preference for freehand interaction with the light field display, as well as the relatively low cognitive demand of this technique. Further, our results revealed some limitations of the proposed setup, and adjustments to be addressed in future work.

  11. Modeling Of Object- And Scene-Prototypes With Hierarchically Structured Classes

    NASA Astrophysics Data System (ADS)

    Ren, Z.; Jensch, P.; Ameling, W.

    1989-03-01

    The success of knowledge-based image analysis methodology and implementation tools depends largely on an appropriately and efficiently built model wherein the domain-specific context information and the inherent structure of the observed image scene are encoded. To identify an object in an application environment, a computer vision system needs to know, first, the description of the object to be found in an image or image sequence and, second, the corresponding relationships between object descriptions within the image sequence. This paper presents models of image objects and scenes by means of hierarchically structured classes. Using the topovisual formalism of graphs and higraphs, we are currently studying principally the relational aspect and data abstraction of the modeling, in order to visualize the structural nature resident in image objects and scenes and to formalize their descriptions. The goal is to expose the structure of the image scene and the correspondence of image objects in the low-level image interpretation process. The object-based system design approach has been applied to build the model base. We utilize the object-oriented programming language C++ for designing, testing and implementing the abstracted entity classes and the operation structures which have been modeled topovisually. The reference images used for modeling prototypes of objects and scenes are from industrial environments as well as medical applications.
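
    A minimal sketch of what hierarchically structured prototype classes can look like, transposed to Python for brevity (the paper uses C++); the class names, attributes and relations below are illustrative, not the authors'.

        class SceneEntity:
            """Root of the prototype hierarchy; carries relational links."""
            def __init__(self, name):
                self.name = name
                self.relations = []                 # (relation, other entity)

            def relate(self, relation, other):
                self.relations.append((relation, other))

        class ObjectPrototype(SceneEntity):
            """Description of a single object to be found in an image."""
            def __init__(self, name, shape, surface):
                super().__init__(name)
                self.shape, self.surface = shape, surface

        class ScenePrototype(SceneEntity):
            """A scene composed of object prototypes."""
            def __init__(self, name, parts):
                super().__init__(name)
                self.parts = parts

        bolt = ObjectPrototype("bolt", shape="hexagonal", surface="metallic")
        plate = ObjectPrototype("plate", shape="rectangular", surface="matte")
        bolt.relate("attached_to", plate)
        assembly = ScenePrototype("assembly", [bolt, plate])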

  12. Gaze control for an active camera system by modeling human pursuit eye movements

    NASA Astrophysics Data System (ADS)

    Toelg, Sebastian

    1992-11-01

    The ability to stabilize the image of one moving object in the presence of others by active movements of the visual sensor is an essential task for biological systems, as well as for autonomous mobile robots. An algorithm is presented that evaluates the necessary movements from acquired visual data and controls an active camera system (ACS) in a feedback loop. No a priori assumptions about the visual scene and objects are needed. The algorithm is based on functional models of human pursuit eye movements and is to a large extent influenced by structural principles of neural information processing. An intrinsic object definition based on the homogeneity of the optical flow field of relevant objects, i.e., those moving mainly fronto-parallel, is used. Velocity and spatial information are processed in separate pathways, resulting in either smooth or saccadic sensor movements. The program generates a dynamic shape model of the moving object and focuses its attention on regions where the object is expected. The system proved to behave in a stable manner under real-time conditions in complex natural environments and manages general object motion. In addition, it exhibits several interesting abilities well known from psychophysics, such as catch-up saccades, grouping due to coherent motion, and optokinetic nystagmus.

  13. Simultaneous two-view epipolar geometry estimation and motion segmentation by 4D tensor voting.

    PubMed

    Tong, Wai-Shun; Tang, Chi-Keung; Medioni, Gérard

    2004-09-01

    We address the problem of simultaneous two-view epipolar geometry estimation and motion segmentation from nonstatic scenes. Given a set of noisy image pairs containing matches of n objects, we propose an unconventional, efficient, and robust method, 4D tensor voting, for estimating the unknown n epipolar geometries and segmenting the static and motion matching pairs into n independent motions. By considering the 4D isotropic and orthogonal joint image space, only two tensor voting passes are needed, and a very high noise-to-signal ratio (up to five) can be tolerated. Epipolar geometries corresponding to multiple, rigid motions are extracted in succession. Only two uncalibrated frames are needed, and no simplifying assumption (such as an affine camera model or a homographic model between images) other than the pin-hole camera model is made. Our novel approach consists of propagating a local geometric smoothness constraint in the 4D joint image space, followed by global consistency enforcement for extracting the fundamental matrices corresponding to independent motions. We have performed extensive experiments to compare our method with some representative algorithms to show that better performance on nonstatic scenes is achieved. Results on challenging data sets are presented.

  14. Overt attention in natural scenes: objects dominate features.

    PubMed

    Stoll, Josef; Thrun, Michael; Nuthmann, Antje; Einhäuser, Wolfgang

    2015-02-01

    Whether overt attention in natural scenes is guided by object content or by low-level stimulus features has become a matter of intense debate. Experimental evidence seemed to indicate that once object locations in a scene are known, salience models provide little extra explanatory power. This approach has recently been criticized for using inadequate models of early salience; and indeed, state-of-the-art salience models outperform trivial object-based models that assume a uniform distribution of fixations on objects. Here we propose to use object-based models that take a preferred viewing location (PVL) close to the centre of objects into account. In experiment 1, we demonstrate that, when including this comparably subtle modification, object-based models are again on par with state-of-the-art salience models in predicting fixations in natural scenes. One possible interpretation of these results is that objects rather than early salience dominate attentional guidance. In this view, early-salience models predict fixations through the correlation of their features with object locations. To test this hypothesis directly, in two additional experiments we reduced low-level salience in image areas of high object content. For these modified stimuli, the object-based model predicted fixations significantly better than early salience. This finding held in an object-naming task (experiment 2) and a free-viewing task (experiment 3). These results provide further evidence for object-based fixation selection, and by inference object-based attentional guidance, in natural scenes. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.
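
    A minimal sketch of an object-based fixation predictor with a preferred viewing location: each object contributes a Gaussian centered on its centroid rather than a uniform distribution over its extent, and the map is normalized into a fixation probability distribution (the PVL spread relative to object size is an illustrative assumption).

        import numpy as np

        def pvl_fixation_map(shape, objects, sigma_frac=0.25):
            """Fixation density from object locations alone.

            objects: list of (cx, cy, w, h) object centres and sizes in pixels.
            """
            H, W = shape
            ys, xs = np.mgrid[0:H, 0:W]
            fmap = np.zeros(shape, dtype=float)
            for cx, cy, w, h in objects:
                sx, sy = sigma_frac * w, sigma_frac * h
                fmap += np.exp(-((xs - cx) ** 2 / (2 * sx ** 2)
                                 + (ys - cy) ** 2 / (2 * sy ** 2)))
            return fmap / fmap.sum()   # probability of fixating each pixel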

  15. Complex phase error and motion estimation in synthetic aperture radar imaging

    NASA Astrophysics Data System (ADS)

    Soumekh, M.; Yang, H.

    1991-06-01

    Attention is given to a SAR wave equation-based system model that accurately represents the interaction of the impinging radar signal with the target to be imaged. The model is used to estimate the complex phase error across the synthesized aperture from the measured corrupted SAR data by combining the two wave equation models governing the collected SAR data at two temporal frequencies of the radar signal. The SAR system model shows that the motion of an object in a static scene results in coupled Doppler shifts in both the temporal frequency domain and the spatial frequency domain of the synthetic aperture. The velocity of the moving object is estimated through these two Doppler shifts. It is shown that once the dynamic target's velocity is known, its reconstruction can be formulated via a squint-mode SAR geometry with parameters that depend upon the dynamic target's velocity.

  16. Firearms in major motion pictures, 1995-2004.

    PubMed

    Binswanger, Ingrid A; Cowan, John A

    2009-03-01

    Firearms are a major cause of injury and death. We sought to determine (1) the prevalence of movie scenes that depicted firearms and verbal firearm safety messages; (2) the context and health outcomes in firearm scenes; and (3) the association between the Motion Picture Association of America ratings and firearm scene characteristics. Ten top revenue-grossing motion pictures were selected for each year from 1995 to 2004 in descending order of gross revenues. Data on firearm scenes were collected by movie coders using dual-monitor computer workstations and real-time collection tools. Seventy of the 100 movies had scenes with firearms and the majority of movies with firearms were rated PG-13. Firearm scenes (N = 624) accounted for 17% of screen time in movies with firearms. Among firearm scenes, crime or illegal activity was involved in 45%, deaths occurred in 19%, and injuries occurred in 12%. A verbal reference to safety was made in 0.8%. Depictions of firearms in top revenue-grossing movies were common, but safety messages were exceedingly rare. Major motion pictures present an under-used opportunity for education about firearm safety.

  17. Flies and humans share a motion estimation strategy that exploits natural scene statistics

    PubMed Central

    Clark, Damon A.; Fitzgerald, James E.; Ales, Justin M.; Gohl, Daryl M.; Silies, Marion A.; Norcia, Anthony M.; Clandinin, Thomas R.

    2014-01-01

    Sighted animals extract motion information from visual scenes by processing spatiotemporal patterns of light falling on the retina. The dominant models for motion estimation exploit intensity correlations only between pairs of points in space and time. Moving natural scenes, however, contain more complex correlations. Here we show that fly and human visual systems encode the combined direction and contrast polarity of moving edges using triple correlations that enhance motion estimation in natural environments. Both species extract triple correlations with neural substrates tuned for light or dark edges, and sensitivity to specific triple correlations is retained even as light and dark edge motion signals are combined. Thus, both species separately process light and dark image contrasts to capture motion signatures that can improve estimation accuracy. This striking convergence argues that statistical structures in natural scenes have profoundly affected visual processing, driving a common computational strategy over 500 million years of evolution. PMID:24390225
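
    A minimal sketch of what a spatiotemporal triple correlation measures, on a 1D-space x time luminance array: a pairwise correlator averages products of two samples and is blind to contrast polarity, whereas the triple correlator averages products of three samples and flips sign when light and dark are exchanged. The offsets are illustrative.

        import numpy as np

        def pair_corr(s, dx, dt):
            """<s(x, t) * s(x + dx, t + dt)> for an array s[t, x]."""
            T, X = s.shape
            return (s[:T - dt, :X - dx] * s[dt:, dx:]).mean()

        def triple_corr(s, dx1, dt1, dx2, dt2):
            """<s(x, t) * s(x + dx1, t + dt1) * s(x + dx2, t + dt2)>.

            Odd order: replacing s with -s flips the sign, so the measure
            couples motion direction with contrast polarity (light/dark edges).
            """
            T, X = s.shape
            dx, dt = max(dx1, dx2), max(dt1, dt2)
            a = s[:T - dt, :X - dx]
            b = s[dt1:T - dt + dt1, dx1:X - dx + dx1]
            c = s[dt2:T - dt + dt2, dx2:X - dx + dx2]
            return (a * b * c).mean()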

  18. Coherence of structural visual cues and pictorial gravity paves the way for interceptive actions.

    PubMed

    Zago, Myrka; La Scaleia, Barbara; Miller, William L; Lacquaniti, Francesco

    2011-09-20

    Dealing with upside-down objects is difficult and takes time. Among the cues that are critical for defining object orientation, the visible influence of gravity on the object's motion has received limited attention. Here, we manipulated the alignment of visible gravity and structural visual cues between each other and relative to the orientation of the observer and physical gravity. Participants pressed a button triggering a hitter to intercept a target accelerated by a virtual gravity. A factorial design assessed the effects of scene orientation (normal or inverted) and target gravity (normal or inverted). We found that interception was significantly more successful when scene direction was concordant with target gravity direction, irrespective of whether both were upright or inverted. This was so independent of the hitter type and when performance feedback to the participants was either available (Experiment 1) or unavailable (Experiment 2). These results show that the combined influence of visible gravity and structural visual cues can outweigh both physical gravity and viewer-centered cues, leading to rely instead on the congruence of the apparent physical forces acting on people and objects in the scene.

  19. Semantic guidance of eye movements in real-world scenes

    PubMed Central

    Hwang, Alex D.; Wang, Hsueh-Cheng; Pomplun, Marc

    2011-01-01

    The perception of objects in our visual world is influenced by not only their low-level visual features such as shape and color, but also their high-level features such as meaning and semantic relations among them. While it has been shown that low-level features in real-world scenes guide eye movements during scene inspection and search, the influence of semantic similarity among scene objects on eye movements in such situations has not been investigated. Here we study guidance of eye movements by semantic similarity among objects during real-world scene inspection and search. By selecting scenes from the LabelMe object-annotated image database and applying Latent Semantic Analysis (LSA) to the object labels, we generated semantic saliency maps of real-world scenes based on the semantic similarity of scene objects to the currently fixated object or the search target. An ROC analysis of these maps as predictors of subjects’ gaze transitions between objects during scene inspection revealed a preference for transitions to objects that were semantically similar to the currently inspected one. Furthermore, during the course of a scene search, subjects’ eye movements were progressively guided toward objects that were semantically similar to the search target. These findings demonstrate substantial semantic guidance of eye movements in real-world scenes and show its importance for understanding real-world attentional control. PMID:21426914

  20. Semantic guidance of eye movements in real-world scenes.

    PubMed

    Hwang, Alex D; Wang, Hsueh-Cheng; Pomplun, Marc

    2011-05-25

    The perception of objects in our visual world is influenced by not only their low-level visual features such as shape and color, but also their high-level features such as meaning and semantic relations among them. While it has been shown that low-level features in real-world scenes guide eye movements during scene inspection and search, the influence of semantic similarity among scene objects on eye movements in such situations has not been investigated. Here we study guidance of eye movements by semantic similarity among objects during real-world scene inspection and search. By selecting scenes from the LabelMe object-annotated image database and applying latent semantic analysis (LSA) to the object labels, we generated semantic saliency maps of real-world scenes based on the semantic similarity of scene objects to the currently fixated object or the search target. An ROC analysis of these maps as predictors of subjects' gaze transitions between objects during scene inspection revealed a preference for transitions to objects that were semantically similar to the currently inspected one. Furthermore, during the course of a scene search, subjects' eye movements were progressively guided toward objects that were semantically similar to the search target. These findings demonstrate substantial semantic guidance of eye movements in real-world scenes and show its importance for understanding real-world attentional control. Copyright © 2011 Elsevier Ltd. All rights reserved.
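
    A minimal sketch of the semantic saliency map construction, assuming each annotated object already comes with an LSA vector for its label (the vectors and object masks below are stand-ins): every object's region is painted with the cosine similarity between its label and the currently fixated or searched-for label.

        import numpy as np

        def cosine(a, b):
            return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

        def semantic_saliency(shape, objects, target_vec):
            """Semantic saliency map for one scene.

            objects: list of (mask, lsa_vec); mask is a boolean array of
            `shape` marking the object's pixels, lsa_vec its label's vector.
            """
            smap = np.zeros(shape)
            for mask, vec in objects:
                smap[mask] = cosine(vec, target_vec)
            return smap    # candidate predictor of the next gaze transition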

  1. Manipulating the content of dynamic natural scenes to characterize response in human MT/MST.

    PubMed

    Durant, Szonya; Wall, Matthew B; Zanker, Johannes M

    2011-09-09

    Optic flow is one of the most important sources of information for enabling human navigation through the world. A striking finding from single-cell studies in monkeys is the rapid saturation of response of MT/MST areas with the density of optic flow type motion information. These results are reflected psychophysically in human perception in the saturation of motion aftereffects. We began by comparing responses to natural optic flow scenes in human visual brain areas to responses to the same scenes with inverted contrast (photo negative). This changes scene familiarity while preserving local motion signals. This manipulation had no effect; however, the response was only correlated with the density of local motion (calculated by a motion correlation model) in V1, not in MT/MST. To further investigate this, we manipulated the visible proportion of natural dynamic scenes and found that areas MT and MST did not increase in response over a 16-fold increase in the amount of information presented, i.e., response had saturated. This makes sense in light of the sparseness of motion information in natural scenes, suggesting that the human brain is well adapted to exploit a small amount of dynamic signal and extract information important for survival.

  2. Application of Composite Small Calibration Objects in Traffic Accident Scene Photogrammetry

    PubMed Central

    Chen, Qiang; Xu, Hongguo; Tan, Lidong

    2015-01-01

    In order to address the difficulty of arranging large calibration objects and the low measurement accuracy of small calibration objects in traffic accident scene photogrammetry, a photogrammetric method based on a composite of small calibration objects is proposed. Several small calibration objects are placed around the traffic accident scene, and the coordinate system of the composite calibration object is given based on one of them. By maintaining the relative position and coplanar relationship of the small calibration objects, the local coordinate system of each small calibration object is transformed into the coordinate system of the composite calibration object. The two-dimensional direct linear transformation method is improved based on minimizing the reprojection error of the calibration points of all objects. A rectified image is obtained using the nonlinear optimization method. The increased accuracy of traffic accident scene photogrammetry using a composite small calibration object is demonstrated through the analysis of field experiments and case studies. PMID:26011052

  3. Motion-based nonuniformity correction in DoFP polarimeters

    NASA Astrophysics Data System (ADS)

    Kumar, Rakesh; Tyo, J. Scott; Ratliff, Bradley M.

    2007-09-01

    Division of Focal Plane (DoFP) polarimeters operate by integrating an array of micropolarizer elements with a focal plane array. These devices have been investigated for over a decade, and example systems have been built in all regions of the optical spectrum. DoFP devices have the distinct advantage that they are mechanically rugged, inherently temporally synchronized, and optically aligned. They have the concomitant disadvantage that each pixel in the FPA has a different instantaneous field of view (IFOV), meaning that the polarization component measurements that go into estimating the Stokes vector across the image come from four different points in the field. In addition to IFOV errors, microgrid camera systems operating in the LWIR have the additional problem that FPA nonuniformity (NU) noise can be quite severe. The spatial differencing nature of a DoFP system exacerbates the residual NU noise remaining after calibration, which is often the largest source of false polarization signatures away from regions where IFOV error dominates. We have recently presented a scene-based algorithm that uses frame-to-frame motion to compensate for NU noise in unpolarized IR imagers. In this paper, we extend that algorithm so that it can be used to compensate for NU noise in a DoFP polarimeter. Furthermore, the additional information provided by the scene motion can be used to significantly reduce the IFOV error. We have found a reduction of the IFOV error by a factor of 10 when the scene motion is known exactly. Performance is reduced when the motion must be estimated from the scene, but still shows a marked improvement over static DoFP images.
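
    For context on why spatial differencing is so sensitive, a minimal sketch of how a 2x2 DoFP superpixel estimates the linear Stokes parameters from its four micropolarizer orientations (standard relations for 0/45/90/135-degree analyzers): the differences mean that residual per-pixel bias, and the fact that the four samples come from four different scene points, feed straight into S1 and S2.

        import numpy as np

        def stokes_from_superpixel(i0, i45, i90, i135):
            """Linear Stokes vector from one 2x2 DoFP superpixel."""
            s0 = 0.5 * (i0 + i45 + i90 + i135)
            s1 = i0 - i90          # residual bias adds directly here
            s2 = i45 - i135        # ... and here
            return np.array([s0, s1, s2])

        def dolp(s):
            """Degree of linear polarization."""
            return np.hypot(s[1], s[2]) / s[0]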

  4. The new generation of OpenGL support in ROOT

    NASA Astrophysics Data System (ADS)

    Tadel, M.

    2008-07-01

    OpenGL has been promoted to become the main 3D rendering engine of the ROOT framework. This required a major re-modularization of OpenGL support at all levels, from the basic window-system-specific interface to medium-level object representation and top-level scene management. This new architecture allows seamless integration of external scene-graph libraries into the ROOT OpenGL viewer, as well as inclusion of ROOT 3D scenes into external GUI and OpenGL-based 3D-rendering frameworks. Scene representation was removed from inside the viewer, allowing scene data to be shared among several viewers and providing for a natural implementation of multi-view canvas layouts. The object-graph traversal infrastructure allows free mixing of 3D and 2D-pad graphics and makes an implementation of the ROOT canvas in pure OpenGL possible. Scene elements representing ROOT objects trigger automatic instantiation of user-provided rendering objects based on the dictionary information and class-naming convention. Additionally, a finer, per-object control over scene updates is available to the user, allowing overhead-free maintenance of dynamic 3D scenes and creation of complex real-time animations. User-input handling was modularized as well, making it easy to support application-specific scene navigation, selection handling and tool management.

  5. 3D noise-resistant segmentation and tracking of unknown and occluded objects using integral imaging

    NASA Astrophysics Data System (ADS)

    Aloni, Doron; Jung, Jae-Hyun; Yitzhaky, Yitzhak

    2017-10-01

    Three-dimensional (3D) object segmentation and tracking can be useful in various computer vision applications, such as object surveillance for security uses and robot navigation. We present a method for 3D multiple-object tracking using computational integral imaging, based on accurate 3D object segmentation. The method does not employ object detection by motion analysis in a video as conventionally performed (such as background subtraction or block matching). This means that the movement properties do not significantly affect the detection quality. The object detection is performed by analyzing static 3D image data obtained through computational integral imaging. With regard to previous works that used integral imaging data in such a scenario, the proposed method performs 3D tracking of objects without prior information about the objects in the scene, and it is found to be efficient under severe noise conditions.

  6. Motion Field Estimation for a Dynamic Scene Using a 3D LiDAR

    PubMed Central

    Li, Qingquan; Zhang, Liang; Mao, Qingzhou; Zou, Qin; Zhang, Pin; Feng, Shaojun; Ochieng, Washington

    2014-01-01

    This paper proposes a novel motion field estimation method based on a 3D light detection and ranging (LiDAR) sensor for motion sensing for intelligent driverless vehicles and active collision avoidance systems. Unlike multiple target tracking methods, which estimate the motion state of detected targets, such as cars and pedestrians, motion field estimation regards the whole scene as a motion field in which each little element has its own motion state. Compared to multiple target tracking, segmentation errors and data association errors have much less significance in motion field estimation, making it more accurate and robust. This paper presents an intact 3D LiDAR-based motion field estimation method, including pre-processing, a theoretical framework for the motion field estimation problem and practical solutions. The 3D LiDAR measurements are first projected to small-scale polar grids, and then, after data association and Kalman filtering, the motion state of every moving grid is estimated. To reduce computing time, a fast data association algorithm is proposed. Furthermore, considering the spatial correlation of motion among neighboring grids, a novel spatial-smoothing algorithm is also presented to optimize the motion field. The experimental results using several data sets captured in different cities indicate that the proposed motion field estimation is able to run in real-time and performs robustly and effectively. PMID:25207868

  7. Motion field estimation for a dynamic scene using a 3D LiDAR.

    PubMed

    Li, Qingquan; Zhang, Liang; Mao, Qingzhou; Zou, Qin; Zhang, Pin; Feng, Shaojun; Ochieng, Washington

    2014-09-09

    This paper proposes a novel motion field estimation method based on a 3D light detection and ranging (LiDAR) sensor for motion sensing for intelligent driverless vehicles and active collision avoidance systems. Unlike multiple target tracking methods, which estimate the motion state of detected targets, such as cars and pedestrians, motion field estimation regards the whole scene as a motion field in which each little element has its own motion state. Compared to multiple target tracking, segmentation errors and data association errors have much less significance in motion field estimation, making it more accurate and robust. This paper presents an intact 3D LiDAR-based motion field estimation method, including pre-processing, a theoretical framework for the motion field estimation problem and practical solutions. The 3D LiDAR measurements are first projected to small-scale polar grids, and then, after data association and Kalman filtering, the motion state of every moving grid is estimated. To reduce computing time, a fast data association algorithm is proposed. Furthermore, considering the spatial correlation of motion among neighboring grids, a novel spatial-smoothing algorithm is also presented to optimize the motion field. The experimental results using several data sets captured in different cities indicate that the proposed motion field estimation is able to run in real-time and performs robustly and effectively.
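
    A minimal sketch of the per-grid filtering stage: each occupied grid element carries a constant-velocity Kalman filter over the state (x, y, vx, vy), updated with the position measurement associated to it in the next frame (the noise levels are illustrative, and the fast data association and spatial smoothing across neighboring grids are omitted).

        import numpy as np

        class GridKalman:
            """Constant-velocity Kalman filter for one grid element."""

            def __init__(self, x, y, q=0.5, r=0.2):
                self.s = np.array([x, y, 0.0, 0.0])   # state: x, y, vx, vy
                self.P = np.eye(4)
                self.q, self.r = q, r

            def step(self, z, dt):
                """Predict over dt, then update with position measurement z."""
                F = np.eye(4)
                F[0, 2] = F[1, 3] = dt                # x += vx*dt, y += vy*dt
                H = np.array([[1.0, 0, 0, 0],
                              [0, 1.0, 0, 0]])
                self.s = F @ self.s
                self.P = F @ self.P @ F.T + self.q * np.eye(4)
                innov = np.asarray(z) - H @ self.s
                S = H @ self.P @ H.T + self.r * np.eye(2)
                K = self.P @ H.T @ np.linalg.inv(S)
                self.s = self.s + K @ innov
                self.P = (np.eye(4) - K @ H) @ self.P
                return self.s                         # motion state of this grid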

  8. Motivational Objects in Natural Scenes (MONS): A Database of >800 Objects.

    PubMed

    Schomaker, Judith; Rau, Elias M; Einhäuser, Wolfgang; Wittmann, Bianca C

    2017-01-01

    In daily life, we are surrounded by objects with pre-existing motivational associations. However, these are rarely controlled for in experiments with natural stimuli. Research on natural stimuli would therefore benefit from stimuli with well-defined motivational properties; in turn, such stimuli also open new paths in research on motivation. Here we introduce a database of Motivational Objects in Natural Scenes (MONS). The database consists of 107 scenes. Each scene contains 2 to 7 objects placed at approximately equal distance from the scene center. Each scene was photographed creating 3 versions, with one object ("critical object") being replaced to vary the overall motivational value of the scene (appetitive, aversive, and neutral), while maintaining high visual similarity between the three versions. Ratings on motivation, valence, arousal and recognizability were obtained using internet-based questionnaires. Since the main objective was to provide stimuli of well-defined motivational value, three motivation scales were used: (1) Desire to own the object; (2) Approach/Avoid; (3) Desire to interact with the object. Three sets of ratings were obtained in independent sets of observers: for all 805 objects presented on a neutral background, for 321 critical objects presented in their scene context, and for the entire scenes. On the basis of the motivational ratings, objects were subdivided into aversive, neutral, and appetitive categories. The MONS database will provide a standardized basis for future studies on motivational value under realistic conditions.

  9. Fire flame detection based on GICA and target tracking

    NASA Astrophysics Data System (ADS)

    Rong, Jianzhong; Zhou, Dechuang; Yao, Wei; Gao, Wei; Chen, Juan; Wang, Jian

    2013-04-01

    To improve the video fire detection rate, a robust fire detection algorithm based on the color, motion and pattern characteristics of fire targets is proposed, which achieves a satisfactory detection rate across different fire scenes. In this fire detection algorithm: (a) a rule-based generic color model was developed based on the analysis of a large quantity of flame pixels; (b) building on the traditional GICA (Geometrical Independent Component Analysis) model, a Cumulative Geometrical Independent Component Analysis (C-GICA) model was developed for motion detection without a static background; and (c) a BP neural network fire recognition model based on multiple features of the fire pattern was developed. Fire detection tests on benchmark fire video clips of different scenes have shown the robustness, accuracy and fast response of the algorithm.
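
    A minimal sketch of a rule-based flame-color test of the kind described in step (a); the specific inequalities and threshold follow the widely used red-dominance heuristic and are illustrative, not the paper's exact model.

        import numpy as np

        def flame_color_mask(rgb, r_thresh=190):
            """Candidate flame pixels: red-dominated (R >= G > B) and bright."""
            r = rgb[..., 0].astype(int)
            g = rgb[..., 1].astype(int)
            b = rgb[..., 2].astype(int)
            return (r >= g) & (g > b) & (r > r_thresh)

    Pixels passing the color test would then be gated by the C-GICA motion detection and fed to the pattern classifier.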

  10. A systematic comparison between visual cues for boundary detection.

    PubMed

    Mély, David A; Kim, Junkyung; McGill, Mason; Guo, Yuliang; Serre, Thomas

    2016-03-01

    The detection of object boundaries is a critical first step for many visual processing tasks. Multiple cues (we consider luminance, color, motion and binocular disparity) available in the early visual system may signal object boundaries but little is known about their relative diagnosticity and how to optimally combine them for boundary detection. This study thus aims at understanding how early visual processes inform boundary detection in natural scenes. We collected color binocular video sequences of natural scenes to construct a video database. Each scene was annotated with two full sets of ground-truth contours (one set limited to object boundaries and another set which included all edges). We implemented an integrated computational model of early vision that spans all considered cues, and then assessed their diagnosticity by training machine learning classifiers on individual channels. Color and luminance were found to be most diagnostic while stereo and motion were least. Combining all cues yielded a significant improvement in accuracy beyond that of any cue in isolation. Furthermore, the accuracy of individual cues was found to be a poor predictor of their unique contribution for the combination. This result suggested a complex interaction between cues, which we further quantified using regularization techniques. Our systematic assessment of the accuracy of early vision models for boundary detection together with the resulting annotated video dataset should provide a useful benchmark towards the development of higher-level models of visual processing. Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. Characterizing head motion in three planes during combined visual and base of support disturbances in healthy and visually sensitive subjects.

    PubMed

    Keshner, E A; Dhaher, Y

    2008-07-01

    Multiplanar environmental motion could generate head instability, particularly if the visual surround moves in planes orthogonal to a physical disturbance. We combined sagittal plane surface translations with visual field disturbances in 12 healthy (29-31 years) and 3 visually sensitive (27-57 years) adults. Center of pressure (COP), peak head angles, and RMS values of head motion were calculated and a three-dimensional model of joint motion was developed to examine gross head motion in three planes. We found that subjects standing quietly in front of a visual scene translating in the sagittal plane produced significantly greater (p<0.003) head motion in yaw than when on a translating platform. However, when the platform was translated in the dark or with a visual scene rotating in roll, head motion orthogonal to the plane of platform motion significantly increased (p<0.02). Visually sensitive subjects having no history of vestibular disorder produced large, delayed compensatory head motion. Orthogonal head motions were significantly greater in visually sensitive than in healthy subjects in the dark (p<0.05) and with a stationary scene (p<0.01). We concluded that motion of the visual field could modify compensatory response kinematics of a freely moving head in planes orthogonal to the direction of a physical perturbation. These results suggest that the mechanisms controlling head orientation in space are distinct from those that control trunk orientation in space. These behaviors would have been missed if only COP data were considered. Data suggest that rehabilitation training can be enhanced by combining visual and mechanical perturbation paradigms.

  12. Research in interactive scene analysis

    NASA Technical Reports Server (NTRS)

    Tenenbaum, J. M.; Garvey, T. D.; Weyl, S. A.; Wolf, H. C.

    1975-01-01

    An interactive scene interpretation system (ISIS) was developed as a tool for constructing and experimenting with man-machine and automatic scene analysis methods tailored for particular image domains. A recently developed region analysis subsystem based on the paradigm of Brice and Fennema is described. Using this subsystem a series of experiments was conducted to determine good criteria for initially partitioning a scene into atomic regions and for merging these regions into a final partition of the scene along object boundaries. Semantic (problem-dependent) knowledge is essential for complete, correct partitions of complex real-world scenes. An interactive approach to semantic scene segmentation was developed and demonstrated on both landscape and indoor scenes. This approach provides a reasonable methodology for segmenting scenes that cannot be processed completely automatically, and is a promising basis for a future automatic system. A program is described that can automatically generate strategies for finding specific objects in a scene based on manually designated pictorial examples.

  13. Motion-induced error reduction by combining Fourier transform profilometry with phase-shifting profilometry.

    PubMed

    Li, Beiwen; Liu, Ziping; Zhang, Song

    2016-10-03

    We propose a hybrid computational framework to reduce motion-induced measurement error by combining the Fourier transform profilometry (FTP) and phase-shifting profilometry (PSP). The proposed method is composed of three major steps: Step 1 is to extract continuous relative phase maps for each isolated object with single-shot FTP method and spatial phase unwrapping; Step 2 is to obtain an absolute phase map of the entire scene using PSP method, albeit motion-induced errors exist on the extracted absolute phase map; and Step 3 is to shift the continuous relative phase maps from Step 1 to generate final absolute phase maps for each isolated object by referring to the absolute phase map with error from Step 2. Experiments demonstrate the success of the proposed computational framework for measuring multiple isolated rapidly moving objects.
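
    For reference, the PSP stage of such a pipeline recovers the wrapped phase in closed form. A minimal sketch for the common three-step case with 2*pi/3 shifts (the abstract does not state the step count, so this is an assumption); the FTP stage and the final phase transfer between the two maps are omitted.

        import numpy as np

        def three_step_phase(I1, I2, I3):
            """Wrapped phase from three fringe images with phase shifts
            -2*pi/3, 0, +2*pi/3, where I_k = I' + I'' * cos(phi + delta_k).

            Result lies in (-pi, pi]; unwrapping follows as a separate step.
            """
            return np.arctan2(np.sqrt(3.0) * (I1 - I3), 2.0 * I2 - I1 - I3)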

  14. Radiometrically accurate scene-based nonuniformity correction for array sensors.

    PubMed

    Ratliff, Bradley M; Hayat, Majeed M; Tyo, J Scott

    2003-10-01

    A novel radiometrically accurate scene-based nonuniformity correction (NUC) algorithm is described. The technique combines absolute calibration with a recently reported algebraic scene-based NUC algorithm. The technique is based on the following principle: First, detectors that are along the perimeter of the focal-plane array are absolutely calibrated; then the calibration is transported to the remaining uncalibrated interior detectors through the application of the algebraic scene-based algorithm, which utilizes pairs of image frames exhibiting arbitrary global motion. The key advantage of this technique is that it can obtain radiometric accuracy during NUC without disrupting camera operation. Accurate estimates of the bias nonuniformity can be achieved with relatively few frames, which can be fewer than ten frame pairs. Advantages of this technique are discussed, and a thorough performance analysis is presented with use of simulated and real infrared imagery.
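
    A minimal one-dimensional sketch of the transport idea, assuming a pure per-detector bias and an exactly known one-pixel shift between two frames (the real algorithm handles arbitrary global 2D motion): the scene cancels between registered samples, so bias differences chain outward from an absolutely calibrated perimeter detector.

        import numpy as np

        def transport_bias(frame1, frame2, b0):
            """Recover per-detector biases along one row.

            Model: reading = scene + bias. The scene value seen by detector i
            in frame1 is seen by detector i + 1 in frame2, so
            frame2[i + 1] - frame1[i] = b[i + 1] - b[i].
            b0 is the absolutely calibrated bias of perimeter detector 0.
            """
            b = np.empty(len(frame1))
            b[0] = b0
            for i in range(len(frame1) - 1):
                b[i + 1] = b[i] + (frame2[i + 1] - frame1[i])
            return b

        # Synthetic check: biases are recovered exactly from one frame pair.
        rng = np.random.default_rng(0)
        scene = rng.normal(size=65)
        bias = rng.normal(size=64)
        f1 = scene[1:] + bias          # detectors see scene[1..64]
        f2 = scene[:-1] + bias         # same row after a one-pixel scene shift
        assert np.allclose(transport_bias(f1, f2, bias[0]), bias)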

  15. Forces and Motion: How Young Children Understand Causal Events

    ERIC Educational Resources Information Center

    Goksun, Tilbe; George, Nathan R.; Hirsh-Pasek, Kathy; Golinkoff, Roberta M.

    2013-01-01

    How do children evaluate complex causal events? This study investigates preschoolers' representation of "force dynamics" in causal scenes, asking whether (a) children understand how single and dual forces impact an object's movement and (b) this understanding varies across cause types (Cause, Enable, Prevent). Three-and-a-half- to…

  16. An integrated framework for detecting suspicious behaviors in video surveillance

    NASA Astrophysics Data System (ADS)

    Zin, Thi Thi; Tin, Pyke; Hama, Hiromitsu; Toriu, Takashi

    2014-03-01

    In this paper, we propose an integrated framework for detecting suspicious behaviors in video surveillance systems installed in public places such as railway stations, airports, and shopping malls. In particular, people loitering suspiciously, objects left unattended, and suspicious objects exchanged between persons are common security concerns in airports and other transit scenarios. Detecting them involves understanding the scene and its events, analyzing human movements, recognizing controllable objects, and observing the effect of human movement on those objects. In the proposed framework, a multiple-background modeling technique, a high-level motion feature extraction method, and embedded Markov chain models are integrated for detecting suspicious behaviors in real-time video surveillance systems. Specifically, the framework employs a probability-based multiple-background modeling technique to detect moving objects. Velocity and distance measures are then computed as the high-level motion features of interest. By combining the computed features with the first-passage-time probabilities of the embedded Markov chain, suspicious behaviors are analyzed to detect loitering persons, objects left behind, and human interactions such as fighting. The framework has been tested on standard public datasets and on our own video surveillance scenarios.
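
    The abstract does not give the exact formulation, but first-passage-time probabilities of a Markov chain admit a standard recursion, sketched below in numpy: f_ij(n), the probability of reaching state j for the first time at step n from state i, satisfies f_ij(1) = p_ij and f_ij(n) = sum over k != j of p_ik f_kj(n-1). The loitering example is an illustrative assumption.

        import numpy as np

        def first_passage_probs(P, j, n_max):
            """First-passage probabilities f_ij(n) for n = 1..n_max,
            for every start state i, toward target state j.
            P is a row-stochastic transition matrix."""
            F = np.zeros((n_max, P.shape[0]))
            F[0] = P[:, j]                 # f_ij(1) = p_ij
            P_avoid = P.copy()
            P_avoid[:, j] = 0.0            # paths must avoid j until step n
            for n in range(1, n_max):
                F[n] = P_avoid @ F[n - 1]  # f_ij(n) = sum_{k!=j} p_ik f_kj(n-1)
            return F

        # Example: flag loitering when the probability of having left a
        # zone within n_max steps, F.sum(axis=0)[i], stays low.
        P = np.array([[0.9, 0.1],
                      [0.2, 0.8]])
        print(first_passage_probs(P, j=1, n_max=5)[:, 0])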

  17. Visual cognition

    PubMed Central

    Cavanagh, Patrick

    2011-01-01

    Visual cognition, high-level vision, mid-level vision and top-down processing all refer to decision-based scene analyses that combine prior knowledge with retinal input to generate representations. The label “visual cognition” is little used at present, but research and experiments on mid- and high-level, inference-based vision have flourished, becoming in the 21st century a significant, if often understated, part of current vision research. How does visual cognition work? What are its moving parts? This paper reviews the origins and architecture of visual cognition and briefly describes some work in the areas of routines, attention, surfaces, objects, and events (motion, causality, and agency). Most vision scientists avoid being too explicit when presenting concepts about visual cognition, having learned that explicit models invite easy criticism. What we see in the literature is ample evidence for visual cognition, but few or only cautious attempts to detail how it might work. This is the great unfinished business of vision research: at some point we will be done with characterizing how the visual system measures the world and we will have to return to the question of how vision constructs models of objects, surfaces, scenes, and events. PMID:21329719

  18. Dissipation function and adaptive gradient reconstruction based smoke detection in video

    NASA Astrophysics Data System (ADS)

    Li, Bin; Zhang, Qiang; Shi, Chunlei

    2017-11-01

    A method for smoke detection in video is proposed. The camera monitoring the scene is assumed to be stationary. In the atmospheric scattering model, the dissipation function describes the transmissivity between the background objects in the scene and the camera. A dark channel prior and a fast bilateral filter are used to estimate the dissipation function, which depends only on the depth of field. Based on the dissipation function, the visual background extractor (ViBe) can detect smoke through its motion characteristics, along with other moving targets. Since smoke has semi-transparent parts, the background covered by these parts can be recovered adaptively by solving a Poisson equation. The similarity between each recovered part and the original background part at the same position is computed by normalized cross-correlation (NCC), with the original background value taken from the frame nearest to the current frame. Parts with high similarity are considered smoke.
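
    The NCC comparison at the heart of the last step is compact enough to sketch; the snippet below is a generic zero-mean NCC, with the smoke-specific thresholding left out.

        import numpy as np

        def ncc(patch_a, patch_b):
            """Normalized cross-correlation between a recovered background
            patch and the stored background patch at the same position;
            values near 1 suggest the semi-transparent cover is smoke."""
            a = patch_a - patch_a.mean()
            b = patch_b - patch_b.mean()
            denom = np.sqrt((a * a).sum() * (b * b).sum())
            return (a * b).sum() / denom if denom > 0 else 0.0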

  19. Band registration of tuneable frame format hyperspectral UAV imagers in complex scenes

    NASA Astrophysics Data System (ADS)

    Honkavaara, Eija; Rosnell, Tomi; Oliveira, Raquel; Tommaselli, Antonio

    2017-12-01

    A recent revolution in miniaturised sensor technology has provided markets with novel hyperspectral imagers operating on the frame format principle. In the case of unmanned aerial vehicle (UAV) based remote sensing, the frame format technology is highly attractive in comparison to the commonly utilised pushbroom scanning technology, because it offers better stability and the possibility to capture stereoscopic data sets, bringing an opportunity for 3D hyperspectral object reconstruction. Tuneable filters are one of the approaches for capturing multi- or hyperspectral frame images. The individual bands are not aligned when operating a sensor based on tuneable filters from a mobile platform, such as a UAV, because the full spectrum is recorded time-sequentially. The objective of this investigation was to study the aspects of band registration of an imager based on tuneable filters and to develop a rigorous and efficient approach for band registration in complex 3D scenes, such as forests. The method first determines the orientations of selected reference bands and reconstructs the 3D scene using structure-from-motion and dense image matching technologies. The bands without orientation are then matched to the oriented bands, accounting for the 3D scene, to provide exterior orientations, and afterwards hyperspectral orthomosaics, or hyperspectral point clouds, are calculated. The uncertainty aspects of the novel approach were studied. An empirical assessment was carried out in a forested environment using hyperspectral images captured with a hyperspectral 2D frame format camera, based on a tuneable Fabry-Pérot interferometer (FPI) on board a multicopter and supported by a high spatial resolution consumer colour camera. A theoretical assessment showed that the method was capable of providing band registration accuracy better than 0.5 pixel. The empirical assessment proved the performance and showed that, with the novel method, most of the band misalignments were smaller than a pixel. Furthermore, it was shown that the performance of the band alignment was dependent on the spatial distance from the reference band.

  20. Vection: the contributions of absolute and relative visual motion.

    PubMed

    Howard, I P; Howard, A

    1994-01-01

    Inspection of a visual scene rotating about the vertical body axis induces a compelling sense of self rotation, or circular vection. Circular vection is suppressed by stationary objects seen beyond the moving display but not by stationary objects in the foreground. We hypothesised that stationary objects in the foreground facilitate vection because they introduce a relative-motion signal into what would otherwise be an absolute-motion signal. Vection latency and magnitude were measured with a full-field moving display and with stationary objects of various sizes and at various positions in the visual field. The results confirmed the hypothesis. Vection latency was longer when there were no stationary objects in view than when stationary objects were in view. The effect of stationary objects was particularly evident at low stimulus velocities. At low velocities a small stationary point significantly increased vection magnitude in spite of the fact that, at higher stimulus velocities and with other stationary objects in view, fixation on a stationary point, if anything, reduced vection. Changing the position of the stationary objects in the field of view did not affect vection latencies or magnitudes.

  1. Exploring Direct 3D Interaction for Full Horizontal Parallax Light Field Displays Using Leap Motion Controller

    PubMed Central

    Adhikarla, Vamsi Kiran; Sodnik, Jaka; Szolgay, Peter; Jakus, Grega

    2015-01-01

    This paper reports on the design and evaluation of direct 3D gesture interaction with a full horizontal parallax light field display. A light field display defines a visual scene using directional light beams emitted from multiple light sources as if they were emitted from scene points. Each scene point is rendered individually, resulting in more realistic and accurate 3D visualization compared with other 3D display technologies. We propose an interaction setup combining the visualization of objects within the field of view (FOV) of a light field display and their selection through freehand gestures tracked by the Leap Motion Controller. The accuracy and usefulness of the proposed interaction setup were evaluated in a user study with test subjects. The results of the study revealed a high user preference for freehand interaction with the light field display, as well as the relatively low cognitive demand of this technique. Further, the results revealed some limitations of the proposed setup and adjustments to be addressed in future work. PMID:25875189

  2. Active Segmentation.

    PubMed

    Mishra, Ajay; Aloimonos, Yiannis

    2009-01-01

    The human visual system observes and understands a scene/image by making a series of fixations. Every fixation point lies inside a particular region of arbitrary shape and size in the scene, which can either be an object or just a part of it. We define as a basic segmentation problem the task of segmenting the region containing the fixation point. Segmenting the region containing the fixation is equivalent to finding the enclosing contour (a connected set of boundary edge fragments in the edge map of the scene) around the fixation. This enclosing contour should be a depth boundary. We present here a novel algorithm that finds this bounding contour and achieves the segmentation of one object, given the fixation. The proposed segmentation framework combines monocular cues (color/intensity/texture) with stereo and/or motion, in a cue-independent manner. The semantic robots of the immediate future will be able to use this algorithm to automatically find objects in any environment. The capability of automatically segmenting objects in their visual field can bring visual processing to the next level. Our approach is different from current approaches: while existing work attempts to segment the whole scene at once into many areas, we segment only one image region, specifically the one containing the fixation point. Experiments with real imagery collected by our active robot and from known databases demonstrate the promise of the approach.

  3. Vision System for Coarsely Estimating Motion Parameters for Unknown Fast Moving Objects in Space

    PubMed Central

    Chen, Min; Hashimoto, Koichi

    2017-01-01

    Motivated by biological interest in analyzing the navigation behaviors of flying animals, we attempt to build a system measuring their motion states. To this end, we build a vision system to detect unknown fast-moving objects within a given space and calculate their motion parameters, represented by positions and poses. We propose a novel method to detect reliable interest points in images of moving objects, which can hardly be detected by general-purpose interest point detectors. 3D points reconstructed from these interest points are then grouped and maintained for each detected object, according to a careful schedule that considers appearance and perspective changes. In the estimation step, a method is introduced to adapt the robust estimation procedure used for dense point sets to the case of sparse sets, reducing the risk of heavily biased estimates. Experiments are conducted on real scenes, showing the system's capability to detect multiple unknown moving objects and estimate their positions and poses. PMID:29206189

  4. Real-time motion artifacts compensation of ToF sensors data on GPU

    NASA Astrophysics Data System (ADS)

    Lefloch, Damien; Hoegg, Thomas; Kolb, Andreas

    2013-05-01

    Over the last decade, ToF sensors have attracted many computer vision and graphics researchers. Nevertheless, ToF devices suffer from severe motion artifacts in dynamic scenes, as well as low-resolution depth data, which strongly motivates a valid correction. To counterbalance these effects, a pre-processing approach is introduced that greatly improves range image data on dynamic scenes. We first demonstrate the robustness of our approach using simulated data and then validate our method using sensor range data. Our GPU-based processing pipeline enhances range data reliability in real time.

  5. Motivational Objects in Natural Scenes (MONS): A Database of >800 Objects

    PubMed Central

    Schomaker, Judith; Rau, Elias M.; Einhäuser, Wolfgang; Wittmann, Bianca C.

    2017-01-01

    In daily life, we are surrounded by objects with pre-existing motivational associations. However, these are rarely controlled for in experiments with natural stimuli. Research on natural stimuli would therefore benefit from stimuli with well-defined motivational properties; in turn, such stimuli also open new paths in research on motivation. Here we introduce a database of Motivational Objects in Natural Scenes (MONS). The database consists of 107 scenes. Each scene contains 2 to 7 objects placed at approximately equal distance from the scene center. Each scene was photographed creating 3 versions, with one object (“critical object”) being replaced to vary the overall motivational value of the scene (appetitive, aversive, and neutral), while maintaining high visual similarity between the three versions. Ratings on motivation, valence, arousal and recognizability were obtained using internet-based questionnaires. Since the main objective was to provide stimuli of well-defined motivational value, three motivation scales were used: (1) Desire to own the object; (2) Approach/Avoid; (3) Desire to interact with the object. Three sets of ratings were obtained in independent sets of observers: for all 805 objects presented on a neutral background, for 321 critical objects presented in their scene context, and for the entire scenes. On the basis of the motivational ratings, objects were subdivided into aversive, neutral, and appetitive categories. The MONS database will provide a standardized basis for future studies on motivational value under realistic conditions. PMID:29033870

  6. Spatiotemporal Recurrent Convolutional Networks for Traffic Prediction in Transportation Networks

    PubMed Central

    Yu, Haiyang; Wu, Zhihai; Wang, Shuqin; Wang, Yunpeng; Ma, Xiaolei

    2017-01-01

    Predicting large-scale transportation network traffic has become an important and challenging topic in recent decades. Inspired by the domain knowledge of motion prediction, in which the future motion of an object can be predicted based on previous scenes, we propose a network grid representation method that can retain the fine-scale structure of a transportation network. Network-wide traffic speeds are converted into a series of static images and input into a novel deep architecture, namely, spatiotemporal recurrent convolutional networks (SRCNs), for traffic forecasting. The proposed SRCNs inherit the advantages of deep convolutional neural networks (DCNNs) and long short-term memory (LSTM) neural networks. The spatial dependencies of network-wide traffic can be captured by DCNNs, and the temporal dynamics can be learned by LSTMs. An experiment on a Beijing transportation network with 278 links demonstrates that SRCNs outperform other deep learning-based algorithms in both short-term and long-term traffic prediction. PMID:28672867

  7. Spatiotemporal Recurrent Convolutional Networks for Traffic Prediction in Transportation Networks.

    PubMed

    Yu, Haiyang; Wu, Zhihai; Wang, Shuqin; Wang, Yunpeng; Ma, Xiaolei

    2017-06-26

    Predicting large-scale transportation network traffic has become an important and challenging topic in recent decades. Inspired by the domain knowledge of motion prediction, in which the future motion of an object can be predicted based on previous scenes, we propose a network grid representation method that can retain the fine-scale structure of a transportation network. Network-wide traffic speeds are converted into a series of static images and input into a novel deep architecture, namely, spatiotemporal recurrent convolutional networks (SRCNs), for traffic forecasting. The proposed SRCNs inherit the advantages of deep convolutional neural networks (DCNNs) and long short-term memory (LSTM) neural networks. The spatial dependencies of network-wide traffic can be captured by DCNNs, and the temporal dynamics can be learned by LSTMs. An experiment on a Beijing transportation network with 278 links demonstrates that SRCNs outperform other deep learning-based algorithms in both short-term and long-term traffic prediction.
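
    The SRCN architecture lends itself to a compact sketch. Below is a minimal PyTorch version, assuming grid "images" of network-wide speeds; every layer size is an illustrative guess rather than the authors' configuration (only the 278-link output echoes the Beijing experiment).

        import torch
        import torch.nn as nn

        class SRCNSketch(nn.Module):
            """A small CNN encodes each traffic image (the grid form of
            network-wide speeds); an LSTM models the temporal dynamics of
            the image sequence; a linear head predicts next-step speeds."""
            def __init__(self, hidden=64, n_links=278):
                super().__init__()
                self.cnn = nn.Sequential(
                    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                    nn.MaxPool2d(2),
                    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(4), nn.Flatten())  # -> 16*4*4 = 256
                self.lstm = nn.LSTM(256, hidden, batch_first=True)
                self.head = nn.Linear(hidden, n_links)

            def forward(self, x):                 # x: (batch, time, 1, H, W)
                b, t = x.shape[:2]
                feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)
                out, _ = self.lstm(feats)
                return self.head(out[:, -1])      # (batch, n_links)

        pred = SRCNSketch()(torch.randn(2, 10, 1, 32, 32))  # -> (2, 278)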

  8. Graphics processing unit (GPU) real-time infrared scene generation

    NASA Astrophysics Data System (ADS)

    Christie, Chad L.; Gouthas, Efthimios (Themie); Williams, Owen M.

    2007-04-01

    VIRSuite, the GPU-based suite of software tools developed at DSTO for real-time infrared scene generation, is described. The tools include the painting of scene objects with radiometrically-associated colours, translucent object generation, polar plot validation and versatile scene generation. Special features include radiometric scaling within the GPU and the presence of zoom anti-aliasing at the core of VIRSuite. Extension of the zoom anti-aliasing construct to cover target embedding and the treatment of translucent objects is described.

  9. The roles of scene gist and spatial dependency among objects in the semantic guidance of attention in real-world scenes.

    PubMed

    Wu, Chia-Chien; Wang, Hsueh-Cheng; Pomplun, Marc

    2014-12-01

    A previous study (Vision Research 51 (2011) 1192-1205) found evidence for semantic guidance of visual attention during the inspection of real-world scenes, i.e., an influence of semantic relationships among scene objects on overt shifts of attention. In particular, the results revealed an observer bias toward gaze transitions between semantically similar objects. However, this effect is not necessarily indicative of semantic processing of individual objects but may be mediated by knowledge of the scene gist, which does not require object recognition, or by known spatial dependency among objects. To examine the mechanisms underlying semantic guidance, in the present study, participants were asked to view a series of displays with the scene gist excluded and spatial dependency varied. Our results show that spatial dependency among objects seems to be sufficient to induce semantic guidance. Scene gist, on the other hand, does not seem to affect how observers use semantic information to guide attention while viewing natural scenes. Extracting semantic information mainly based on spatial dependency may be an efficient strategy of the visual system that only adds little cognitive load to the viewing task. Copyright © 2014 Elsevier Ltd. All rights reserved.

  10. Spatial detection of tv channel logos as outliers from the content

    NASA Astrophysics Data System (ADS)

    Ekin, Ahmet; Braspenning, Ralph

    2006-01-01

    This paper proposes a purely image-based TV channel logo detection algorithm that can detect logos independently of their motion and transparency features. The proposed algorithm can robustly detect any type of logo, such as transparent and animated logos, without requiring any temporal constraints, whereas known methods have to wait for the occurrence of large motion in the scene and assume stationary logos. The algorithm models logo pixels as outliers from the actual scene content, which is represented by multiple 3-D histograms in the YCbCr space. We use four scene histograms, one for each image corner, because the content characteristics change from one corner to another. A further novelty of the proposed algorithm is that we define the image corners and the areas where we compute the scene histograms by a cinematic technique called the Golden Section Rule that is used by professionals. The robustness of the proposed algorithm is demonstrated over a dataset of representative TV content.
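
    A hedged numpy sketch of the outlier idea: a 3-D colour histogram summarizes the scene content sampled near one corner, and pixels whose colour is rare under that histogram become logo candidates. The bin count and threshold are assumptions, and the Golden-Section placement of the histogram areas is omitted.

        import numpy as np

        def logo_outlier_mask(scene_pixels, corner_region, bins=8, thresh=1e-3):
            """scene_pixels: (N, 3) colour samples of the scene content
            near one corner; corner_region: (H, W, 3) uint8 pixels to test.
            Returns a boolean mask of probable logo pixels."""
            hist, _ = np.histogramdd(scene_pixels.reshape(-1, 3), bins=bins,
                                     range=[(0, 256)] * 3)
            hist /= hist.sum()             # per-bin probability of scene colours
            idx = np.minimum(corner_region.reshape(-1, 3) // (256 // bins),
                             bins - 1).astype(int)
            probs = hist[idx[:, 0], idx[:, 1], idx[:, 2]]
            return (probs < thresh).reshape(corner_region.shape[:2])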

  11. Inferring the direction of implied motion depends on visual awareness

    PubMed Central

    Faivre, Nathan; Koch, Christof

    2014-01-01

    Visual awareness of an event, object, or scene is, in essence, an integrated experience, whereby different visual features composing an object (e.g., orientation, color, shape) appear as a unified percept and are processed as a whole. Here, we tested in human observers whether perceptual integration of static motion cues depends on awareness by measuring the capacity to infer the direction of motion implied by a static visible or invisible image under continuous flash suppression. Using measures of directional adaptation, we found that visible but not invisible implied motion adaptors biased the perception of real motion probes. In a control experiment, we found that invisible adaptors implying motion primed the perception of subsequent probes when they were identical (i.e., repetition priming), but not when they only shared the same direction (i.e., direction priming). Furthermore, using a model of visual processing, we argue that repetition priming effects are likely to arise as early as the primary visual cortex. We conclude that although invisible images implying motion undergo some form of nonconscious processing, visual awareness is necessary to make inferences about motion direction. PMID:24706951

  12. Inferring the direction of implied motion depends on visual awareness.

    PubMed

    Faivre, Nathan; Koch, Christof

    2014-04-04

    Visual awareness of an event, object, or scene is, in essence, an integrated experience, whereby different visual features composing an object (e.g., orientation, color, shape) appear as a unified percept and are processed as a whole. Here, we tested in human observers whether perceptual integration of static motion cues depends on awareness by measuring the capacity to infer the direction of motion implied by a static visible or invisible image under continuous flash suppression. Using measures of directional adaptation, we found that visible but not invisible implied motion adaptors biased the perception of real motion probes. In a control experiment, we found that invisible adaptors implying motion primed the perception of subsequent probes when they were identical (i.e., repetition priming), but not when they only shared the same direction (i.e., direction priming). Furthermore, using a model of visual processing, we argue that repetition priming effects are likely to arise as early as the primary visual cortex. We conclude that although invisible images implying motion undergo some form of nonconscious processing, visual awareness is necessary to make inferences about motion direction.

  13. Characterizing Head Motion in 3 Planes during Combined Visual and Base of Support Disturbances in Healthy and Visually Sensitive Subjects

    PubMed Central

    Keshner, E.A.; Dhaher, Y.

    2008-01-01

    Multiplanar environmental motion could generate head instability, particularly if the visual surround moves in planes orthogonal to a physical disturbance. We combined sagittal plane surface translations with visual field disturbances in 12 healthy (29–31 years) and 3 visually sensitive (27–57 years) adults. Center of pressure (COP), peak head angles, and RMS values of head motion were calculated and a 3-dimensional model of joint motion was developed to examine gross head motion in 3 planes. We found that subjects standing quietly in front of a visual scene translating in the sagittal plane produced significantly greater (p<0.003) head motion in yaw than when on a translating platform. However, when the platform was translated in the dark or with a visual scene rotating in roll, head motion orthogonal to the plane of platform motion significantly increased (p<0.02). Visually sensitive subjects having no history of vestibular disorder produced large, delayed compensatory head motion. Orthogonal head motions were significantly greater in visually sensitive than in healthy subjects in the dark (p<0.05) and with a stationary scene (p<0.01). We concluded that motion of the visual field can modify compensatory response kinematics of a freely moving head in planes orthogonal to the direction of a physical perturbation. These results suggest that the mechanisms controlling head orientation in space are distinct from those that control trunk orientation in space. These behaviors would have been missed if only COP data were considered. Data suggest that rehabilitation training can be enhanced by combining visual and mechanical perturbation paradigms. PMID:18162402

  14. A habituation based approach for detection of visual changes in surveillance camera

    NASA Astrophysics Data System (ADS)

    Sha'abani, M. N. A. H.; Adan, N. F.; Sabani, M. S. M.; Abdullah, F.; Nadira, J. H. S.; Yasin, M. S. M.

    2017-09-01

    This paper investigates a habituation-based approach to detecting visual changes using video surveillance systems in a passive environment. Various techniques have been introduced for dynamic environments, such as motion detection, object classification, and behaviour analysis. However, in a passive environment, most of the scenes recorded by the surveillance system are normal, so running a complex analysis at all times is computationally expensive, especially at high video resolution. Thus, a mechanism of attention is required, whereby the system responds only to an abnormal event. This paper proposes a novelty detection mechanism for detecting visual changes and a habituation-based approach to measure the level of novelty. The objective of the paper is to investigate the feasibility of the habituation-based approach to detecting visual changes. Experimental results show that the approach is able to accurately detect the presence of novelty as deviations from the learned knowledge.
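
    A small, hedged sketch of habituation-as-attention: each repeated presentation of a familiar feature vector weakens the response to it, so only deviations from learned knowledge keep scoring as novel. The decay constant, cosine matching, and similarity threshold are illustrative assumptions, not the paper's mechanism.

        import numpy as np

        class Habituation:
            """Respond strongly to novel feature vectors; habituate
            (decay the response) to stimuli seen before."""
            def __init__(self, decay=0.95, sim_thresh=0.9):
                self.protos, self.strength = [], []
                self.decay, self.sim_thresh = decay, sim_thresh

            def observe(self, feat):
                feat = feat / (np.linalg.norm(feat) + 1e-9)
                for i, p in enumerate(self.protos):
                    if float(p @ feat) > self.sim_thresh:  # familiar stimulus
                        self.strength[i] *= self.decay     # habituate further
                        return self.strength[i]            # low = ignorable
                self.protos.append(feat)                   # novel stimulus
                self.strength.append(1.0)
                return 1.0                                 # full response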

  15. A bio-inspired method and system for visual object-based attention and segmentation

    NASA Astrophysics Data System (ADS)

    Huber, David J.; Khosla, Deepak

    2010-04-01

    This paper describes a method and system of human-like attention and object segmentation in visual scenes that (1) attends to regions of a scene in rank order of their saliency, (2) extracts the boundary of an attended proto-object based on feature contours, and (3) can be biased to boost the attention paid to specific features in a scene, such as those of a desired target object in static and video imagery. The purpose of the system is to identify regions of a scene of potential importance and extract the region data for processing by an object recognition and classification algorithm. The attention process can be performed in a default, bottom-up manner or a directed, top-down manner that assigns a preference to certain features over others. One can apply this system to any static scene, whether that is a still photograph or imagery captured from video. We employ algorithms that are motivated by findings in neuroscience, psychology, and cognitive science to construct a system that is novel in its modular and stepwise approach to the problems of attention and region extraction, its application of a flooding algorithm to break apart an image into smaller proto-objects based on feature density, and its ability to join smaller regions of similar features into larger proto-objects. This approach allows many complicated operations to be carried out by the system in a very short time, approaching real time. A researcher can use this system as a robust front-end to a larger system that includes object recognition and scene understanding modules; it is engineered to function over a broad range of situations and can be applied to any scene with minimal tuning from the user.
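
    The flooding step can be illustrated with a hedged sketch: each proto-object grows over 4-connected neighbours whose feature value stays within a tolerance of its seed. The scalar feature map and tolerance are simplifying assumptions; the described system floods on richer feature densities.

        from collections import deque
        import numpy as np

        def flood_protoobjects(feature_map, tol=0.1):
            """Label an image into proto-object regions by flood fill."""
            h, w = feature_map.shape
            labels = -np.ones((h, w), dtype=int)
            current = 0
            for sy in range(h):
                for sx in range(w):
                    if labels[sy, sx] != -1:
                        continue
                    seed = feature_map[sy, sx]
                    labels[sy, sx] = current
                    queue = deque([(sy, sx)])
                    while queue:
                        y, x = queue.popleft()
                        for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                            if (0 <= ny < h and 0 <= nx < w
                                    and labels[ny, nx] == -1
                                    and abs(feature_map[ny, nx] - seed) <= tol):
                                labels[ny, nx] = current
                                queue.append((ny, nx))
                    current += 1
            return labels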

  16. Computational model for perception of objects and motions.

    PubMed

    Yang, WenLu; Zhang, LiQing; Ma, LiBo

    2008-06-01

    Perception of objects and motions in the visual scene is one of the basic problems in the visual system. There exist 'What' and 'Where' pathways in the superior visual cortex, starting from the simple cells in the primary visual cortex. The former is able to perceive objects such as forms, color, and texture, and the latter perceives 'where', for example, the velocity and direction of spatial movement of objects. This paper explores brain-like computational architectures of visual information processing. We propose a visual perceptual model and a computational mechanism for training it. The computational model is a three-layer network. The first layer is the input layer, which receives the stimuli from natural environments. The second layer is designed for representing the internal neural information. The connections between the first layer and the second layer, called the receptive fields of neurons, are self-adaptively learned based on the principle of sparse neural representation. To this end, we introduce the Kullback-Leibler divergence as the measure of independence between neural responses and derive the learning algorithm by minimizing the cost function. The proposed algorithm is applied to train the basis functions, namely the receptive fields, which are localized, oriented, and bandpass. The resultant receptive fields of neurons in the second layer have characteristics resembling those of simple cells in the primary visual cortex. Based on these basis functions, we further construct the third layer for perception of what and where in the superior visual cortex. The proposed model is able to perceive objects and their motions with high accuracy and strong robustness against additive noise. Computer simulation results in the final section show the feasibility of the proposed perceptual model and the high efficiency of the learning algorithm.
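
    A hedged sketch of the second-layer learning, substituting a plain L1 sparsity penalty and reconstruction objective for the paper's Kullback-Leibler independence measure: basis functions learned this way on natural image patches tend toward localized, oriented, bandpass receptive fields.

        import numpy as np

        def learn_receptive_fields(patches, n_basis=64, n_iter=200,
                                   lam=0.1, lr=0.01):
            """patches: (N, d) zero-mean image patches.  Returns a
            (d, n_basis) matrix whose columns are learned receptive fields."""
            rng = np.random.default_rng(0)
            W = 0.1 * rng.standard_normal((patches.shape[1], n_basis))
            for _ in range(n_iter):
                # Sparse responses: soft-thresholded projection (one ISTA step).
                a = patches @ W
                a = np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)
                # Gradient step on reconstruction error w.r.t. the basis.
                W += lr * (patches - a @ W.T).T @ a / len(patches)
                W /= np.linalg.norm(W, axis=0, keepdims=True) + 1e-9
            return W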

  17. Efficient structure from motion on large scenes using UAV with position and pose information

    NASA Astrophysics Data System (ADS)

    Teng, Xichao; Yu, Qifeng; Shang, Yang; Luo, Jing; Wang, Gang

    2018-04-01

    In this paper, we exploit prior information from global positioning systems and inertial measurement units to speed up large-scene reconstruction from images acquired by unmanned aerial vehicles. We utilize weak pose information and intrinsic parameters to obtain the projection matrix for each view. Since topographic relief is usually negligible compared with the flight altitude of unmanned aerial vehicles, we assume that the scene is flat and use a weak perspective camera model to obtain projective transformations between two views. Furthermore, we propose an overlap criterion and select potentially matching view pairs among the projectively transformed views. A robust global structure-from-motion method is used for image-based reconstruction. Our real-world experiments show that the approach is accurate, scalable, and computationally efficient. Moreover, the projective transformations between views can also be used to eliminate false matches.
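
    A hedged sketch of the flat-scene overlap test: using the pose prior, each image's corners are projected onto the ground plane, and view pairs whose footprints intersect are kept as candidate matches. The pinhole geometry and the bounding-box intersection below are simplifying assumptions, not the paper's exact criterion.

        import numpy as np

        def ground_footprint(K, R, t, w, h):
            """Project the four image corners onto the plane z = 0, given
            intrinsics K and a world-to-camera pose (R, t).  Returns a
            4x2 array of ground coordinates."""
            corners = np.array([[0, 0, 1], [w, 0, 1],
                                [w, h, 1], [0, h, 1]], dtype=float)
            center = -R.T @ t                      # camera centre in world
            pts = []
            for c in corners:
                ray = R.T @ np.linalg.solve(K, c)  # viewing ray in world
                s = -center[2] / ray[2]            # intersect plane z = 0
                pts.append((center + s * ray)[:2])
            return np.array(pts)

        def overlaps(fp_a, fp_b):
            """Cheap overlap criterion: do the ground bounding boxes of
            two footprints intersect?"""
            return bool(np.all(fp_a.min(0) <= fp_b.max(0)) and
                        np.all(fp_b.min(0) <= fp_a.max(0)))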

  18. Comparable mechanisms of working memory interference by auditory and visual motion in youth and aging

    PubMed Central

    Mishra, Jyoti; Zanto, Theodore; Nilakantan, Aneesha; Gazzaley, Adam

    2013-01-01

    Intrasensory interference during visual working memory (WM) maintenance by object stimuli (such as faces and scenes) has been shown to negatively impact WM performance, with greater detrimental impacts of interference observed in aging. Here we assessed the age-related impact of intrasensory WM interference from lower-level stimulus features such as visual and auditory motion stimuli. We consistently found that interference in the form of ignored distractions and secondary task interruptions presented during a WM maintenance period degraded memory accuracy in both the visual and auditory domains. However, in contrast to prior studies assessing WM for visual object stimuli, feature-based interference effects were not observed to be significantly greater in older adults. Analyses of neural oscillations in the alpha frequency band further revealed preserved mechanisms of interference processing in terms of post-stimulus alpha suppression, which was observed maximally for secondary task interruptions in visual and auditory modalities in both younger and older adults. These results suggest that age-related sensitivity of WM to interference may be limited to complex object stimuli, at least at low WM loads. PMID:23791629

  19. 3D Reasoning from Blocks to Stability.

    PubMed

    Zhaoyin Jia; Gallagher, Andrew C; Saxena, Ashutosh; Chen, Tsuhan

    2015-05-01

    Objects occupy physical space and obey physical laws. To truly understand a scene, we must reason about the space that objects in it occupy, and how each object is stably supported by others. In other words, we seek to understand which objects would, if moved, cause other objects to fall. This 3D volumetric reasoning is important for many scene understanding tasks, ranging from segmentation of objects to perception of a rich, physically well-founded 3D interpretation of the scene. In this paper, we propose a new algorithm to parse a single RGB-D image with 3D block units while jointly reasoning about the segments, volumes, supporting relationships, and object stability. Our algorithm is based on the intuition that a good 3D representation of the scene is one that fits the depth data well and is a stable, self-supporting arrangement of objects (i.e., one that does not topple). We design an energy function that represents the quality of the block representation based on these properties. Our algorithm fits 3D blocks to the depth values corresponding to image segments and iteratively optimizes the energy function. Our proposed algorithm is the first to consider the stability of objects in complex arrangements when reasoning about the underlying structure of the scene. Experimental results show that our stability-reasoning framework improves RGB-D segmentation and scene volumetric representation.

  20. Superordinate Level Processing Has Priority Over Basic-Level Processing in Scene Gist Recognition

    PubMed Central

    Sun, Qi; Zheng, Yang; Sun, Mingxia; Zheng, Yuanjie

    2016-01-01

    By combining a perceptual discrimination task and a visuospatial working memory task, the present study examined the effects of visuospatial working memory load on the hierarchical processing of scene gist. In the perceptual discrimination task, two scene images from the same (manmade–manmade pairing or natural–natural pairing) or different superordinate level categories (manmade–natural pairing) were presented simultaneously, and participants were asked to judge whether these two images belonged to the same basic-level category (e.g., street–street pairing) or not (e.g., street–highway pairing). In the concurrent working memory task, spatial load (position-based load in Experiment 1) and object load (figure-based load in Experiment 2) were manipulated. The results were as follows: (a) spatial load and object load have stronger effects on discrimination of same basic-level scene pairings than of same superordinate level scene pairings; (b) spatial load has a larger impact on the discrimination of scene pairings at early stages than at later stages, whereas object load has a larger influence at later stages than at early stages. It follows that superordinate level processing has priority over basic-level processing in scene gist recognition, and that spatial information contributes to the earlier stages and object information to the later stages of scene gist recognition. PMID:28382195

  1. Perceptual interaction of local motion signals

    PubMed Central

    Nitzany, Eyal I.; Loe, Maren E.; Palmer, Stephanie E.; Victor, Jonathan D.

    2016-01-01

    Motion signals are a rich source of information used in many everyday tasks, such as segregation of objects from background and navigation. Motion analysis by biological systems is generally considered to consist of two stages: extraction of local motion signals followed by spatial integration. Studies using synthetic stimuli show that there are many kinds and subtypes of local motion signals. When presented in isolation, these stimuli elicit behavioral and neurophysiological responses in a wide range of species, from insects to mammals. However, these mathematically-distinct varieties of local motion signals typically co-exist in natural scenes. This study focuses on interactions between two kinds of local motion signals: Fourier and glider. Fourier signals are typically associated with translation, while glider signals occur when an object approaches or recedes. Here, using a novel class of synthetic stimuli, we ask how distinct kinds of local motion signals interact and whether context influences sensitivity to Fourier motion. We report that local motion signals of different types interact at the perceptual level, and that this interaction can include subthreshold summation and, in some subjects, subtle context-dependent changes in sensitivity. We discuss the implications of these observations, and the factors that may underlie them. PMID:27902829

  2. Perceptual interaction of local motion signals.

    PubMed

    Nitzany, Eyal I; Loe, Maren E; Palmer, Stephanie E; Victor, Jonathan D

    2016-11-01

    Motion signals are a rich source of information used in many everyday tasks, such as segregation of objects from background and navigation. Motion analysis by biological systems is generally considered to consist of two stages: extraction of local motion signals followed by spatial integration. Studies using synthetic stimuli show that there are many kinds and subtypes of local motion signals. When presented in isolation, these stimuli elicit behavioral and neurophysiological responses in a wide range of species, from insects to mammals. However, these mathematically-distinct varieties of local motion signals typically co-exist in natural scenes. This study focuses on interactions between two kinds of local motion signals: Fourier and glider. Fourier signals are typically associated with translation, while glider signals occur when an object approaches or recedes. Here, using a novel class of synthetic stimuli, we ask how distinct kinds of local motion signals interact and whether context influences sensitivity to Fourier motion. We report that local motion signals of different types interact at the perceptual level, and that this interaction can include subthreshold summation and, in some subjects, subtle context-dependent changes in sensitivity. We discuss the implications of these observations, and the factors that may underlie them.

  3. Fourier power, subjective distance, and object categories all provide plausible models of BOLD responses in scene-selective visual areas

    PubMed Central

    Lescroart, Mark D.; Stansbury, Dustin E.; Gallant, Jack L.

    2015-01-01

    Perception of natural visual scenes activates several functional areas in the human brain, including the Parahippocampal Place Area (PPA), Retrosplenial Complex (RSC), and the Occipital Place Area (OPA). It is currently unclear what specific scene-related features are represented in these areas. Previous studies have suggested that PPA, RSC, and/or OPA might represent at least three qualitatively different classes of features: (1) 2D features related to Fourier power; (2) 3D spatial features such as the distance to objects in a scene; or (3) abstract features such as the categories of objects in a scene. To determine which of these hypotheses best describes the visual representation in scene-selective areas, we applied voxel-wise modeling (VM) to BOLD fMRI responses elicited by a set of 1386 images of natural scenes. VM provides an efficient method for testing competing hypotheses by comparing predictions of brain activity based on encoding models that instantiate each hypothesis. Here we evaluated three different encoding models that instantiate each of the three hypotheses listed above. We used linear regression to fit each encoding model to the fMRI data recorded from each voxel, and we evaluated each fit model by estimating the amount of variance it predicted in a withheld portion of the data set. We found that voxel-wise models based on Fourier power or the subjective distance to objects in each scene predicted much of the variance predicted by a model based on object categories. Furthermore, the response variance explained by these three models is largely shared, and the individual models explain little unique variance in responses. Based on an evaluation of previous studies and the data we present here, we conclude that there is currently no good basis to favor any one of the three alternative hypotheses about visual representation in scene-selective areas. We offer suggestions for further studies that may help resolve this issue. PMID:26594164
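
    The voxel-wise modeling procedure can be sketched compactly: one regularized linear model per voxel, scored by the variance it predicts in withheld data. Ridge regularization with a single shared penalty is an assumption here; the study's actual fitting and validation scheme may differ.

        import numpy as np

        def voxelwise_fit(X_train, Y_train, X_test, Y_test, alpha=1.0):
            """X: (images, features) stimulus features under one encoding
            model; Y: (images, voxels) BOLD responses.  Returns per-voxel
            R^2 on the withheld set."""
            d = X_train.shape[1]
            # Closed-form ridge weights; the design matrix is shared by
            # all voxels, so one solve fits every voxel at once.
            W = np.linalg.solve(X_train.T @ X_train + alpha * np.eye(d),
                                X_train.T @ Y_train)   # (features, voxels)
            pred = X_test @ W
            resid = ((Y_test - pred) ** 2).sum(axis=0)
            total = ((Y_test - Y_test.mean(axis=0)) ** 2).sum(axis=0)
            return 1.0 - resid / total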

  4. Objects predict fixations better than early saliency.

    PubMed

    Einhäuser, Wolfgang; Spain, Merrielle; Perona, Pietro

    2008-11-20

    Humans move their eyes while looking at scenes and pictures. Eye movements correlate with shifts in attention and are thought to be a consequence of optimal resource allocation for high-level tasks such as visual recognition. Models of attention, such as "saliency maps," are often built on the assumption that "early" features (color, contrast, orientation, motion, and so forth) drive attention directly. We explore an alternative hypothesis: Observers attend to "interesting" objects. To test this hypothesis, we measure the eye position of human observers while they inspect photographs of common natural scenes. Our observers perform different tasks: artistic evaluation, analysis of content, and search. Immediately after each presentation, our observers are asked to name objects they saw. Weighted with recall frequency, these objects predict fixations in individual images better than early saliency, irrespective of task. Also, saliency combined with object positions predicts which objects are frequently named. This suggests that early saliency has only an indirect effect on attention, acting through recognized objects. Consequently, rather than treating attention as mere preprocessing step for object recognition, models of both need to be integrated.

  5. Scene-based nonuniformity correction with video sequences and registration.

    PubMed

    Hardie, R C; Hayat, M M; Armstrong, E; Yasuda, B

    2000-03-10

    We describe a new, to our knowledge, scene-based nonuniformity correction algorithm for array detectors. The algorithm relies on the ability to register a sequence of observed frames in the presence of the fixed-pattern noise caused by pixel-to-pixel nonuniformity. In low-to-moderate levels of nonuniformity, sufficiently accurate registration may be possible with standard scene-based registration techniques. If the registration is accurate, and motion exists between the frames, then groups of independent detectors can be identified that observe the same irradiance (or true scene value). These detector outputs are averaged to generate estimates of the true scene values. With these scene estimates, and the corresponding observed values through a given detector, a curve-fitting procedure is used to estimate the individual detector response parameters. These can then be used to correct for detector nonuniformity. The strength of the algorithm lies in its simplicity and low computational complexity. Experimental results, to illustrate the performance of the algorithm, include the use of visible-range imagery with simulated nonuniformity and infrared imagery with real nonuniformity.
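
    A hedged sketch of the curve-fitting step, assuming registration has already produced, for every frame and detector, an estimate of the true scene value that detector observed (the average over detectors that saw the same irradiance); the linear gain/bias detector model follows the abstract.

        import numpy as np

        def fit_detector_response(observed, scene_est):
            """observed, scene_est: (frames, H, W) stacks.  Fit
            y = gain * x + bias per detector by least squares."""
            _, h, w = observed.shape
            gain = np.empty((h, w))
            bias = np.empty((h, w))
            for i in range(h):
                for j in range(w):
                    gain[i, j], bias[i, j] = np.polyfit(
                        scene_est[:, i, j], observed[:, i, j], 1)
            return gain, bias   # corrected frame: (frame - bias) / gain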

  6. On the Integration of Medium Wave Infrared Cameras for Vision-Based Navigation

    DTIC Science & Technology

    2015-03-01

    Excerpts recovered from the report: Visual Structure from Motion (VisualSFM) is an application that performs incremental SfM using images of a scene fed into it [20]. …too drastically in between frames. When this happens, VisualSFM will begin creating a new model with images that do not fit to the old one. These new…

  7. Robust multiple cue fusion-based high-speed and nonrigid object tracking algorithm for short track speed skating

    NASA Astrophysics Data System (ADS)

    Liu, Chenguang; Cheng, Heng-Da; Zhang, Yingtao; Wang, Yuxuan; Xian, Min

    2016-01-01

    This paper presents a methodology for tracking multiple skaters in short track speed skating competitions. Nonrigid skaters move at high speed, with severe occlusions happening frequently among them. The camera is panned quickly in order to capture the skaters in a large and dynamic scene. Automatically tracking the skaters and precisely outputting their trajectories is therefore a challenging object tracking task. We employ the global rink information to compensate for camera motion and obtain the global spatial information of the skaters, utilize a random forest to fuse multiple cues and predict the blob of each skater, and finally apply a silhouette- and edge-based template-matching and blob-evolving method to label pixels as belonging to a skater. The effectiveness and robustness of the proposed method are verified through thorough experiments.

  8. A model of proto-object based saliency

    PubMed Central

    Russell, Alexander F.; Mihalaş, Stefan; von der Heydt, Rudiger; Niebur, Ernst; Etienne-Cummings, Ralph

    2013-01-01

    Organisms use the process of selective attention to optimally allocate their computational resources to the instantaneously most relevant subsets of a visual scene, ensuring that they can parse the scene in real time. Many models of bottom-up attentional selection assume that elementary image features, like intensity, color and orientation, attract attention. Gestalt psychologists, however, argue that humans perceive whole objects before they analyze individual features. This is supported by recent psychophysical studies that show that objects predict eye-fixations better than features. In this report we present a neurally inspired algorithm of object-based, bottom-up attention. The model rivals the performance of state-of-the-art non-biologically-plausible feature-based algorithms (and outperforms biologically plausible feature-based algorithms) in its ability to predict perceptual saliency (eye fixations and subjective interest points) in natural scenes. The model achieves this by computing saliency as a function of proto-objects that establish the perceptual organization of the scene. All computational mechanisms of the algorithm have direct neural correlates, and our results provide evidence for the interface theory of attention. PMID:24184601

  9. Learning object-to-class kernels for scene classification.

    PubMed

    Zhang, Lei; Zhen, Xiantong; Shao, Ling

    2014-08-01

    High-level image representations have drawn increasing attention in visual recognition, e.g., scene classification, since the invention of the object bank. The object bank represents an image as a response map of a large number of pretrained object detectors and has achieved superior performance for visual recognition. In this paper, based on the object bank representation, we propose object-to-class (O2C) distances to model scene images. In particular, four variants of O2C distances are presented, and with the O2C distances we can represent the images from the object bank in lower-dimensional but more discriminative spaces, called distance spaces, which are spanned by the O2C distances. Due to the explicit computation of O2C distances based on the object bank, the obtained representations can carry more semantic meaning. To combine the discriminative ability of the O2C distances across all scene classes, we further propose to kernelize the distance representation for the final classification. We have conducted extensive experiments on four benchmark data sets, UIUC-Sports, Scene-15, MIT Indoor, and Caltech-101, which demonstrate that the proposed approaches can significantly improve the original object bank approach and achieve state-of-the-art performance.
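
    A hedged sketch of one plausible O2C variant and its kernelization; the paper presents four variants, and the minimum-distance form and RBF kernel below are illustrative assumptions rather than the authors' definitions.

        import numpy as np

        def o2c_distances(image_resp, class_resps):
            """image_resp: object-bank response vector of one image;
            class_resps: list of (n_c, d) arrays, one per scene class.
            O2C distance to a class = min distance to its training
            responses; the image is re-represented by this vector."""
            return np.array([np.min(np.linalg.norm(R - image_resp, axis=1))
                             for R in class_resps])

        def o2c_kernel(d_a, d_b, gamma=0.1):
            """RBF kernel over O2C distance representations, one simple
            way to kernelize the distance space for an SVM classifier."""
            return np.exp(-gamma * np.sum((d_a - d_b) ** 2))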

  10. Cortical Representations of Speech in a Multitalker Auditory Scene.

    PubMed

    Puvvada, Krishna C; Simon, Jonathan Z

    2017-09-20

    The ability to parse a complex auditory scene into perceptual objects is facilitated by a hierarchical auditory system. Successive stages in the hierarchy transform an auditory scene of multiple overlapping sources, from peripheral tonotopically based representations in the auditory nerve, into perceptually distinct auditory-object-based representations in the auditory cortex. Here, using magnetoencephalography recordings from men and women, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in distinct hierarchical stages of the auditory cortex. Using systems-theoretic methods of stimulus reconstruction, we show that the primary-like areas in the auditory cortex contain dominantly spectrotemporal-based representations of the entire auditory scene. Here, both attended and ignored speech streams are represented with almost equal fidelity, and a global representation of the full auditory scene with all its streams is a better candidate neural representation than that of individual streams being represented separately. We also show that higher-order auditory cortical areas, by contrast, represent the attended stream separately and with significantly higher fidelity than unattended streams. Furthermore, the unattended background streams are more faithfully represented as a single unsegregated background object rather than as separated objects. Together, these findings demonstrate the progression of the representations and processing of a complex acoustic scene up through the hierarchy of the human auditory cortex. SIGNIFICANCE STATEMENT Using magnetoencephalography recordings from human listeners in a simulated cocktail party environment, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in separate hierarchical stages of the auditory cortex. We show that the primary-like areas in the auditory cortex use a dominantly spectrotemporal-based representation of the entire auditory scene, with both attended and unattended speech streams represented with almost equal fidelity. We also show that higher-order auditory cortical areas, by contrast, represent an attended speech stream separately from, and with significantly higher fidelity than, unattended speech streams. Furthermore, the unattended background streams are represented as a single undivided background object rather than as distinct background objects. Copyright © 2017 the authors 0270-6474/17/379189-08$15.00/0.

  11. Complex scenes and situations visualization in hierarchical learning algorithm with dynamic 3D NeoAxis engine

    NASA Astrophysics Data System (ADS)

    Graham, James; Ternovskiy, Igor V.

    2013-06-01

    We applied a two-stage unsupervised hierarchical learning system to model complex dynamic surveillance and cyberspace monitoring systems using a non-commercial version of the NeoAxis visualization software. The hierarchical scene learning and recognition approach is based on hierarchical expectation maximization and was linked to a 3D graphics engine to validate learning and classification results and to study the human-autonomous system relationship. Scene recognition is performed by feeding synthetically generated data to a dynamic logic algorithm. The algorithm performs hierarchical recognition of the scene by first examining the features of the objects to determine which objects are present, and then determining the scene based on those objects. This paper presents a framework within which low-level data linked to higher-level visualization can provide support to a human operator and be evaluated in a detailed and systematic way.

  12. Attentional synchrony and the influence of viewing task on gaze behavior in static and dynamic scenes.

    PubMed

    Smith, Tim J; Mital, Parag K

    2013-07-17

    Does viewing task influence gaze during dynamic scene viewing? Research into the factors influencing gaze allocation during free viewing of dynamic scenes has reported that the gaze of multiple viewers clusters around points of high motion (attentional synchrony), suggesting that gaze may be primarily under exogenous control. However, the influence of viewing task on gaze behavior in static scenes and during real-world interaction has been widely demonstrated. To dissociate exogenous from endogenous factors during dynamic scene viewing we tracked participants' eye movements while they (a) freely watched unedited videos of real-world scenes (free viewing) or (b) quickly identified where the video was filmed (spot-the-location). Static scenes were also presented as controls for scene dynamics. Free viewing of dynamic scenes showed greater attentional synchrony, longer fixations, and more gaze to people and areas of high flicker compared with static scenes. These differences were minimized by the viewing task. In comparison with the free viewing of dynamic scenes, during the spot-the-location task fixation durations were shorter, saccade amplitudes were longer, and gaze exhibited less attentional synchrony and was biased away from areas of flicker and people. These results suggest that the viewing task can have a significant influence on gaze during a dynamic scene but that endogenous control is slow to kick in as initial saccades default toward the screen center, areas of high motion and people before shifting to task-relevant features. This default-like viewing behavior returns after the viewing task is completed, confirming that gaze behavior is more predictable during free viewing of dynamic than static scenes but that this may be due to natural correlation between regions of interest (e.g., people) and motion.

  13. Color appearance in stereoscopy

    NASA Astrophysics Data System (ADS)

    Gadia, Davide; Rizzi, Alessandro; Bonanomi, Cristian; Marini, Daniele; Galmonte, Alessandra; Agostini, Tiziano

    2011-03-01

    The relationship between color and lightness appearance and the perception of depth has been studied for some time in the fields of perceptual psychology and psychophysiology. It has been found that depth perception affects the final object color and lightness appearance. In the stereoscopy research field, many studies have addressed human physiological effects, considering e.g. geometry, motion sickness, etc., but few have considered lightness and color information. The goal of this paper is to present some preliminary experiments in Virtual Reality to determine the effects of depth perception on object color and lightness appearance. We created a virtual test scene with a simple 3D simultaneous-contrast configuration, in three versions, each with different relative positions and apparent sizes of the objects. We collected the perceptual responses of several users after observation of the test scene in the Virtual Theater of the University of Milan, a VR immersive installation characterized by a semi-cylindrical screen that covers 120° of horizontal field of view from an observation distance of 3.5 m. We present a description of the experimental setup and procedure, and we discuss the obtained results.

  14. Integrated VR platform for 3D and image-based models: a step toward interactive image-based virtual environments

    NASA Astrophysics Data System (ADS)

    Yoon, Jayoung; Kim, Gerard J.

    2003-04-01

    Traditionally, three-dimensional models have been used for building virtual worlds, and a data structure called the "scene graph" is often employed to organize these 3D objects in the virtual space. On the other hand, image-based rendering has recently been suggested as a probable alternative VR platform for its photo-realism; however, due to limited interactivity, it has only been used for simple navigation systems. To combine the merits of these two approaches to object/scene representation, this paper proposes a scene graph structure in which both 3D models and various image-based scenes/objects can be defined, traversed, and rendered together. In fact, as suggested by Shade et al., these different representations can be used as different LODs for a given object. For instance, an object might be rendered using a 3D model at close range, a billboard at an intermediate range, and as part of an environment map at far range. The ultimate objective of this mixed platform is to breathe more interactivity into image-based rendered VEs by employing 3D models as well. There are several technical challenges in devising such a platform: designing scene graph nodes for various types of image-based techniques, establishing criteria for LOD/representation selection, handling their transitions, implementing appropriate interaction schemes, and correctly rendering the overall scene. Currently, we have extended the scene graph structure of Sense8's WorldToolKit to accommodate new node types for environment maps, billboards, moving textures and sprites, a "Tour-into-the-Picture" structure, and view-interpolated objects. As for choosing the right LOD level, the usual viewing distance and image space criteria are used; however, the switching between the image and the 3D model occurs at the distance from the user at which the user starts to perceive the object's internal depth. Also, during interaction, regardless of the viewing distance, a 3D representation is used if it exists. Before rendering, objects are conservatively culled from the view frustum using the representation with the largest volume. Finally, we carried out experiments to verify the theoretical derivation of the switching rule and obtained positive results.
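
    A minimal sketch of the representation-switching rule, assuming hypothetical distance thresholds (a real implementation would place the model/billboard switch where the object's internal depth becomes perceivable, as described above):

      # Illustrative distance-based representation selection for a mixed scene graph.
      def select_representation(distance, interacting=False,
                                model_range=5.0, billboard_range=50.0):
          # During interaction a full 3D model is preferred regardless of distance,
          # mirroring the rule described in the abstract.
          if interacting:
              return "3d_model"
          if distance < model_range:        # close: internal depth is perceivable
              return "3d_model"
          if distance < billboard_range:    # intermediate: flat impostor suffices
              return "billboard"
          return "environment_map"          # far: folded into the environment map

      for d in (2.0, 20.0, 200.0):
          print(d, select_representation(d))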

  15. Visual Depth from Motion Parallax and Eye Pursuit

    PubMed Central

    Stroyan, Keith; Nawrot, Mark

    2012-01-01

    A translating observer viewing a rigid environment experiences “motion parallax,” the relative movement upon the observer’s retina of variously positioned objects in the scene. This retinal movement of images provides a cue to the relative depth of objects in the environment; however, retinal motion alone cannot mathematically determine the relative depth of the objects. Visual perception of depth from lateral observer translation uses both retinal image motion and eye movement. In (Nawrot & Stroyan, 2009, Vision Res. 49, p. 1969) we showed mathematically that the ratio of the rate of retinal motion to the rate of smooth eye pursuit determines depth relative to the fixation point in central vision. We also reported on psychophysical experiments indicating that this ratio is the important quantity for perception. Here we analyze the motion/pursuit cue for the more general, and more complicated, case when objects are distributed across the horizontal viewing plane beyond central vision. We show how the mathematical motion/pursuit cue varies with different points across the plane and with time as an observer translates. If the time-varying retinal motion and smooth eye pursuit are the only signals used for this visual process, it is important to know what it is mathematically possible to derive about depth and structure. Our analysis shows that the motion/pursuit ratio determines an excellent description of depth and structure in these broader stimulus conditions, provides a detailed quantitative hypothesis of these visual processes for the perception of depth and structure from motion parallax, and provides a computational foundation to analyze the dynamic geometry of future experiments. PMID:21695531
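
    For the central-vision case, a simplified small-angle reading of the motion/pursuit law can be written compactly (our paraphrase of the cited 2009 result; T is lateral translation speed, f fixation distance, d depth of a point relative to fixation, P the pursuit rate, and M the retinal motion rate):

      \[
        P \approx \frac{T}{f}, \qquad
        M \approx \frac{T\,d}{f\,(f+d)}
        \;\Longrightarrow\;
        \frac{M}{P} = \frac{d}{f+d}, \qquad
        \frac{d}{f} = \frac{M/P}{1 - M/P}.
      \]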

  16. Rendering visual events as sounds: Spatial attention capture by auditory augmented reality.

    PubMed

    Stone, Scott A; Tata, Matthew S

    2017-01-01

    Many salient visual events tend to coincide with auditory events, such as seeing and hearing a car pass by. Information from the visual and auditory senses can be used to create a stable percept of the stimulus. Having access to related coincident visual and auditory information can help for spatial tasks such as localization. However, not all visual information has analogous auditory percepts, such as viewing a computer monitor. Here, we describe a system capable of detecting salient visual events and augmenting them into localizable auditory events. The system uses a neuromorphic camera (DAVIS 240B) to detect logarithmic changes of brightness intensity in the scene, which can be interpreted as salient visual events. Participants were blindfolded and asked to use the device to detect new objects in the scene, as well as to determine the direction of motion of a moving visual object. Results suggest the system is robust enough to allow for the simple detection of new salient stimuli, as well as accurate encoding of the direction of visual motion. Future improvements are probable, as neuromorphic devices are likely to become faster and smaller, making this system much more feasible.

  17. Rendering visual events as sounds: Spatial attention capture by auditory augmented reality

    PubMed Central

    Tata, Matthew S.

    2017-01-01

    Many salient visual events tend to coincide with auditory events, such as seeing and hearing a car pass by. Information from the visual and auditory senses can be used to create a stable percept of the stimulus. Having access to related coincident visual and auditory information can help for spatial tasks such as localization. However, not all visual information has analogous auditory percepts, such as viewing a computer monitor. Here, we describe a system capable of detecting salient visual events and augmenting them into localizable auditory events. The system uses a neuromorphic camera (DAVIS 240B) to detect logarithmic changes of brightness intensity in the scene, which can be interpreted as salient visual events. Participants were blindfolded and asked to use the device to detect new objects in the scene, as well as to determine the direction of motion of a moving visual object. Results suggest the system is robust enough to allow for the simple detection of new salient stimuli, as well as accurate encoding of the direction of visual motion. Future improvements are probable, as neuromorphic devices are likely to become faster and smaller, making this system much more feasible. PMID:28792518

  18. Three-dimensional model-based object recognition and segmentation in cluttered scenes.

    PubMed

    Mian, Ajmal S; Bennamoun, Mohammed; Owens, Robyn

    2006-10-01

    Viewpoint independent recognition of free-form objects and their segmentation in the presence of clutter and occlusions is a challenging task. We present a novel 3D model-based algorithm which performs this task automatically and efficiently. A 3D model of an object is automatically constructed offline from its multiple unordered range images (views). These views are converted into multidimensional table representations (which we refer to as tensors). Correspondences are automatically established between these views by simultaneously matching the tensors of a view with those of the remaining views using a hash table-based voting scheme. This results in a graph of relative transformations used to register the views before they are integrated into a seamless 3D model. These models and their tensor representations constitute the model library. During online recognition, a tensor from the scene is simultaneously matched with those in the library by casting votes. Similarity measures are calculated for the model tensors which receive the most votes. The model with the highest similarity is transformed to the scene and, if it aligns accurately with an object in the scene, that object is declared as recognized and is segmented. This process is repeated until the scene is completely segmented. Experiments were performed on real and synthetic data comprising 55 models and 610 scenes, and an overall recognition rate of 95 percent was achieved. Comparison with spin images revealed that our algorithm is superior in terms of recognition rate and efficiency.
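
    The library lookup can be illustrated with a generic hash-table voting sketch; the coarse descriptor quantization below stands in for the paper's tensor representation, and all data are made up:

      from collections import defaultdict

      def build_hash_table(model_descriptors, q=0.25):
          # model_descriptors: {model_name: [tuple_of_floats, ...]} (assumed input).
          table = defaultdict(set)
          for model, descs in model_descriptors.items():
              for d in descs:
                  key = tuple(round(x / q) for x in d)   # coarse quantization
                  table[key].add(model)
          return table

      def vote(scene_descriptors, table, q=0.25):
          # Each scene descriptor casts one vote per model stored under its key.
          votes = defaultdict(int)
          for d in scene_descriptors:
              key = tuple(round(x / q) for x in d)
              for model in table.get(key, ()):
                  votes[model] += 1
          return sorted(votes.items(), key=lambda kv: -kv[1])

      models = {"chair": [(0.1, 0.5), (0.4, 0.2)], "mug": [(0.9, 0.9)]}
      table = build_hash_table(models)
      print(vote([(0.12, 0.48), (0.88, 0.91)], table))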

  19. Evaluating methods for controlling depth perception in stereoscopic cinematography

    NASA Astrophysics Data System (ADS)

    Sun, Geng; Holliman, Nick

    2009-02-01

    Existing stereoscopic imaging algorithms can create static stereoscopic images with perceived depth control function to ensure a compelling 3D viewing experience without visual discomfort. However, current algorithms do not normally support standard Cinematic Storytelling techniques. These techniques, such as object movement, camera motion, and zooming, can result in dynamic scene depth change within and between a series of frames (shots) in stereoscopic cinematography. In this study, we empirically evaluate the following three types of stereoscopic imaging approaches that aim to address this problem. (1) Real-Eye Configuration: set camera separation equal to the nominal human eye interpupillary distance. The perceived depth on the display is identical to the scene depth without any distortion. (2) Mapping Algorithm: map the scene depth to a predefined range on the display to avoid excessive perceived depth. A new method that dynamically adjusts the depth mapping from scene space to display space is presented in addition to an existing fixed depth mapping method. (3) Depth of Field Simulation: apply Depth of Field (DOF) blur effect to stereoscopic images. Only objects that are inside the DOF are viewed in full sharpness. Objects that are far away from the focus plane are blurred. We performed a human-based trial using the ITU-R BT.500-11 Recommendation to compare the depth quality of stereoscopic video sequences generated by the above-mentioned imaging methods. Our results indicate that viewers' practical 3D viewing volumes are different for individual stereoscopic displays and viewers can cope with a much larger perceived depth range in viewing stereoscopic cinematography in comparison to static stereoscopic images. Our new dynamic depth mapping method does have an advantage over the fixed depth mapping method in controlling stereo depth perception. The DOF blur effect does not provide the expected improvement for perceived depth quality control in 3D cinematography. We anticipate the results will be of particular interest to 3D filmmaking and real-time computer games.
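
    Approach (2) can be sketched as a simple range remapping; the numeric depth budgets below are hypothetical, and the dynamic variant would recompute the scene range per shot:

      # Minimal sketch of mapping scene depth into a fixed display depth budget.
      def map_depth(z, scene_near, scene_far, disp_near=-0.02, disp_far=0.03):
          # disp_near/disp_far bound the perceived depth in front of / behind the
          # screen plane (units are arbitrary here). A dynamic variant updates
          # scene_near/scene_far per shot as objects or the camera move.
          t = (z - scene_near) / (scene_far - scene_near)
          t = min(max(t, 0.0), 1.0)          # clamp out-of-range depths
          return disp_near + t * (disp_far - disp_near)

      print(map_depth(12.0, scene_near=5.0, scene_far=40.0))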

  20. Modeling human perception and estimation of kinematic responses during aircraft landing

    NASA Technical Reports Server (NTRS)

    Schmidt, David K.; Silk, Anthony B.

    1988-01-01

    The thrust of this research is to determine estimation accuracy of aircraft responses based on observed cues. By developing the geometric relationships between the outside visual scene and the kinematics during landing, visual and kinesthetic cues available to the pilot were modeled. Both foveal and peripheral vision were examined. The objective was first to determine estimation accuracy in a variety of flight conditions, and second to ascertain which parameters are most important and lead to the best achievable accuracy in estimating the actual vehicle response. It was found that altitude estimation was very sensitive to the FOV. For this model, the motion cue of perceived vertical acceleration was shown to be less important than the visual cues. The inclusion of runway geometry in the visual scene increased estimation accuracy in most cases. Finally, it was shown that for this model, if the pilot has an incorrect internal model of the system kinematics, the choice of observations thought to be 'optimal' may in fact be suboptimal.

  1. Effects of Spatio-Temporal Aliasing on Out-the-Window Visual Systems

    NASA Technical Reports Server (NTRS)

    Sweet, Barbara T.; Stone, Leland S.; Liston, Dorion B.; Hebert, Tim M.

    2014-01-01

    Designers of out-the-window visual systems face a challenge when attempting to simulate the outside world as viewed from a cockpit. Many methodologies have been developed and adopted to aid in the depiction of particular scene features, or levels of static image detail. However, because aircraft move, it is necessary to also consider the quality of the motion in the simulated visual scene. When motion is introduced in the simulated visual scene, perceptual artifacts can become apparent. A particular artifact related to image motion, spatio-temporal aliasing, will be addressed. The causes of spatio-temporal aliasing will be discussed, and current knowledge regarding the impact of these artifacts on both motion perception and simulator task performance will be reviewed. Methods of reducing the impact of this artifact are also addressed.
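
    The sampling constraint underlying the artifact is the standard one (our addition, not from the paper): a pattern of spatial frequency f_s moving at speed v has temporal frequency f_t = v f_s, so at an image update rate of F frames per second, aliased components appear once

      \[
        f_t = v\, f_s > \frac{F}{2}.
      \]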

  2. The elephant in the room: Inconsistency in scene viewing and representation.

    PubMed

    Spotorno, Sara; Tatler, Benjamin W

    2017-10-01

    We examined the extent to which semantic informativeness, consistency with expectations and perceptual salience contribute to object prioritization in scene viewing and representation. In scene viewing (Experiments 1-2), semantic guidance overshadowed perceptual guidance in determining fixation order, with the greatest prioritization for objects that were diagnostic of the scene's depicted event. Perceptual properties affected selection of consistent objects (regardless of their informativeness) but not of inconsistent objects. Semantic and perceptual properties also interacted in influencing foveal inspection, as inconsistent objects were fixated longer than low but not high salience diagnostic objects. While not studied in direct competition with each other (each studied in competition with diagnostic objects), we found that inconsistent objects were fixated earlier and for longer than consistent but marginally informative objects. In change detection (Experiment 3), perceptual guidance overshadowed semantic guidance, promoting detection of highly salient changes. A residual advantage for diagnosticity over inconsistency emerged only when selection prioritization could not be based on low-level features. Overall these findings show that semantic inconsistency is not prioritized within a scene when competing with other relevant information that is essential to scene understanding and respects observers' expectations. Moreover, they reveal that the relative dominance of semantic or perceptual properties during selection depends on ongoing task requirements. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  3. Developmental changes in attention to faces and bodies in static and dynamic scenes.

    PubMed

    Stoesz, Brenda M; Jakobson, Lorna S

    2014-01-01

    Typically developing individuals show a strong visual preference for faces and face-like stimuli; however, this may come at the expense of attending to bodies or to other aspects of a scene. The primary goal of the present study was to provide additional insight into the development of attentional mechanisms that underlie perception of real people in naturalistic scenes. We examined the looking behaviors of typical children, adolescents, and young adults as they viewed static and dynamic scenes depicting one or more people. Overall, participants showed a bias to attend to faces more than to other parts of the scenes. Adding motion cues led to a reduction in the number, but an increase in the average duration, of face fixations in single-character scenes. When multiple characters appeared in a scene, motion-related effects were attenuated and participants shifted their gaze from faces to bodies, or made off-screen glances. Children showed the largest effects related to the introduction of motion cues or additional characters, suggesting that they find dynamic faces difficult to process and are especially prone to look away from faces when viewing complex social scenes, a strategy that could reduce the cognitive and the affective load imposed by having to divide one's attention between multiple faces. Our findings provide new insights into the typical development of social attention during natural scene viewing, and lay the foundation for future work examining gaze behaviors in typical and atypical development.

  4. Effect of Viewing Smoking Scenes in Motion Pictures on Subsequent Smoking Desire in Audiences in South Korea.

    PubMed

    Sohn, Minsung; Jung, Minsoo

    2017-07-17

    In the modern era of heightened awareness of public health, smoking scenes in movies remain relatively free from public monitoring. The effect of smoking scenes in movies on the promotion of viewers' smoking desire remains unknown. The study aimed to explore whether exposure of adolescent smokers to images of smoking in films could stimulate smoking behavior. Data were derived from a national Web-based sample survey of 748 Korean high-school students. Participants aged 16-18 years were randomly assigned to watch three short video clips with or without smoking scenes. After adjusting covariates using propensity score matching, paired sample t test and logistic regression analyses compared the difference in smoking desire before and after exposure of participants to smoking scenes. For male adolescents, cigarette craving was significantly higher in those who watched movies with smoking scenes than in the control group who did not view smoking scenes (t(307.96) = 2.066, P < .05). In the experimental group, too, cigarette cravings of adolescents after viewing smoking scenes were significantly higher than they were before watching smoking scenes (t(161.00) = 2.867, P < .01). After adjusting for covariates, more impulsive adolescents, particularly males, had significantly higher cigarette cravings: adjusted odds ratio (aOR) 3.40 (95% CI 1.40-8.23). However, those who actively sought health information had considerably lower cigarette cravings than those who did not engage in information-seeking: aOR 0.08 (95% CI 0.01-0.88). Smoking scenes in motion pictures may increase male adolescent smoking desire. Establishing a standard that restricts the frequency of smoking scenes in films and assigning a smoking-related screening grade to films is warranted. ©Minsung Sohn, Minsoo Jung. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 17.07.2017.

  5. Peripheral Visual Cues Contribute to the Perception of Object Movement During Self-Movement

    PubMed Central

    Rogers, Cassandra; Warren, Paul A.

    2017-01-01

    Safe movement through the environment requires us to monitor our surroundings for moving objects or people. However, identification of moving objects in the scene is complicated by self-movement, which adds motion across the retina. To identify world-relative object movement, the brain thus has to ‘compensate for’ or ‘parse out’ the components of retinal motion that are due to self-movement. We have previously demonstrated that retinal cues arising from central vision contribute to solving this problem. Here, we investigate the contribution of peripheral vision, commonly thought to provide strong cues to self-movement. Stationary participants viewed a large field of view display, with radial flow patterns presented in the periphery, and judged the trajectory of a centrally presented probe. Across two experiments, we demonstrate and quantify the contribution of peripheral optic flow to flow parsing during forward and backward movement. PMID:29201335

  6. Method for targetless tracking subpixel in-plane movements.

    PubMed

    Espinosa, Julian; Perez, Jorge; Ferrer, Belen; Mas, David

    2015-09-01

    We present a targetless motion-tracking method for detecting planar movements with subpixel accuracy. The method is based on computing and tracking the intersection of two nonparallel straight-line segments in the image of a moving object in a scene. The method is simple and easy to implement because no complex structures have to be detected. It has been tested and validated using a lab experiment consisting of a vibrating object recorded with a high-speed camera working at 1000 fps. We managed to track displacements with an accuracy of hundredths of a pixel, or even thousandths of a pixel in the case of tracking harmonic vibrations. The method is widely applicable because it allows remote measurement of the amplitude and frequency of vibrations with a vision system.
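
    The core geometric step, the subpixel intersection of two lines given by endpoint pairs, is a small linear solve; segment detection in each frame is assumed to have happened already:

      import numpy as np

      def intersection(p1, p2, p3, p4):
          # Intersection of line (p1,p2) with line (p3,p4), at float precision:
          # solve p1 + t*d1 = p3 + s*d2 for (t, s).
          d1, d2 = np.subtract(p2, p1), np.subtract(p4, p3)
          A = np.array([d1, -d2]).T
          t, _ = np.linalg.solve(A, np.subtract(p3, p1))
          return np.asarray(p1) + t * d1

      # Tracking this point across frames yields subpixel in-plane motion.
      print(intersection((0, 0), (4, 4), (0, 3), (4, 1)))   # -> [2. 2.]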

  7. Vestibular nuclei and cerebellum put visual gravitational motion in context.

    PubMed

    Miller, William L; Maffei, Vincenzo; Bosco, Gianfranco; Iosa, Marco; Zago, Myrka; Macaluso, Emiliano; Lacquaniti, Francesco

    2008-04-01

    Animal survival in the forest, and human success on the sports field, often depend on the ability to seize a target on the fly. All bodies fall at the same rate in the gravitational field, but the corresponding retinal motion varies with apparent viewing distance. How then does the brain predict time-to-collision under gravity? A perspective context from natural or pictorial settings might afford accurate predictions of gravity's effects via the recovery of an environmental reference from the scene structure. We report that embedding motion in a pictorial scene facilitates interception of gravitational acceleration over unnatural acceleration, whereas a blank scene eliminates such bias. Functional magnetic resonance imaging (fMRI) revealed blood-oxygen-level-dependent correlates of these visual context effects on gravitational motion processing in the vestibular nuclei and posterior cerebellar vermis. Our results suggest an early stage of integration of high-level visual analysis with gravity-related motion information, which may represent the substrate for perceptual constancy of ubiquitous gravitational motion.

  8. SWCD: a sliding window and self-regulated learning-based background updating method for change detection in videos

    NASA Astrophysics Data System (ADS)

    Işık, Şahin; Özkan, Kemal; Günal, Serkan; Gerek, Ömer Nezih

    2018-03-01

    Change detection with background subtraction remains an unresolved issue and attracts research interest due to challenges encountered in static and dynamic scenes. The key challenge is how to update dynamically changing backgrounds from frames with an adaptive and self-regulated feedback mechanism. In order to achieve this, we present an effective change detection algorithm for pixelwise changes. A sliding window approach combined with dynamic control of update parameters is introduced for updating background frames, which we call sliding window-based change detection. Comprehensive experiments on related test videos show that the integrated algorithm yields good objective and subjective performance by overcoming illumination variations, camera jitter, and intermittent object motion. It is argued that the obtained method is a fair alternative in most types of foreground extraction scenarios, unlike case-specific methods, which normally fail in scenarios they were not designed for.
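
    A minimal sliding-window background model conveys the idea (illustrative only; SWCD's self-regulated feedback on the update parameters is omitted):

      import numpy as np
      from collections import deque

      class SlidingWindowBackground:
          def __init__(self, window=30, threshold=25):
              self.frames = deque(maxlen=window)   # last N grayscale frames
              self.threshold = threshold

          def apply(self, frame):
              # Return a binary change mask, then add the frame to the window.
              if self.frames:
                  background = np.median(np.stack(self.frames), axis=0)
                  mask = np.abs(frame.astype(float) - background) > self.threshold
              else:
                  mask = np.zeros_like(frame, dtype=bool)
              self.frames.append(frame)
              return mask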

  9. A novel role for visual perspective cues in the neural computation of depth.

    PubMed

    Kim, HyungGoo R; Angelaki, Dora E; DeAngelis, Gregory C

    2015-01-01

    As we explore a scene, our eye movements add global patterns of motion to the retinal image, complicating visual motion produced by self-motion or moving objects. Conventionally, it has been assumed that extraretinal signals, such as efference copy of smooth pursuit commands, are required to compensate for the visual consequences of eye rotations. We consider an alternative possibility: namely, that the visual system can infer eye rotations from global patterns of image motion. We visually simulated combinations of eye translation and rotation, including perspective distortions that change dynamically over time. We found that incorporating these 'dynamic perspective' cues allowed the visual system to generate selectivity for depth sign from motion parallax in macaque cortical area MT, a computation that was previously thought to require extraretinal signals regarding eye velocity. Our findings suggest neural mechanisms that analyze global patterns of visual motion to perform computations that require knowledge of eye rotations.

  10. Judder-Induced Edge Flicker at Zero Spatial Contrast

    NASA Technical Reports Server (NTRS)

    Larimer, James; Feng, Christine; Gille, Jennifer; Cheung, Victor

    2004-01-01

    Judder is a motion artifact that degrades the quality of video imagery. Smooth motion appears jerky and can appear to flicker along the leading and trailing edges of the moving object. In a previous paper, we demonstrated that the strength of the edge flicker signal depended upon the brightness of the scene and the contrast of the moving object relative to the background. Reducing the contrast between foreground and background reduced the flicker signal. In this report, we show that the contrast signal required for judder-induced edge flicker is due to temporal contrast and not simply to spatial contrast. Bars made of random dots of the same dot density as the background exhibit edge flicker when moved at a sufficient rate.

  11. Anticipation in Real-World Scenes: The Role of Visual Context and Visual Memory.

    PubMed

    Coco, Moreno I; Keller, Frank; Malcolm, George L

    2016-11-01

    The human sentence processor is able to make rapid predictions about upcoming linguistic input. For example, upon hearing the verb eat, anticipatory eye-movements are launched toward edible objects in a visual scene (Altmann & Kamide, 1999). However, the cognitive mechanisms that underlie anticipation remain to be elucidated in ecologically valid contexts. Previous research has, in fact, mainly used clip-art scenes and object arrays, raising the possibility that anticipatory eye-movements are limited to displays containing a small number of objects in a visually impoverished context. In Experiment 1, we confirm that anticipation effects occur in real-world scenes and investigate the mechanisms that underlie such anticipation. In particular, we demonstrate that real-world scenes provide contextual information that anticipation can draw on: When the target object is not present in the scene, participants infer and fixate regions that are contextually appropriate (e.g., a table upon hearing eat). Experiment 2 investigates whether such contextual inference requires the co-presence of the scene, or whether memory representations can be utilized instead. The same real-world scenes as in Experiment 1 are presented to participants, but the scene disappears before the sentence is heard. We find that anticipation occurs even when the screen is blank, including when contextual inference is required. We conclude that anticipatory language processing is able to draw upon global scene representations (such as scene type) to make contextual inferences. These findings are compatible with theories assuming contextual guidance, but posit a challenge for theories assuming object-based visual indices. Copyright © 2015 Cognitive Science Society, Inc.

  12. Semantic-based surveillance video retrieval.

    PubMed

    Hu, Weiming; Xie, Dan; Fu, Zhouyu; Zeng, Wenrong; Maybank, Steve

    2007-04-01

    Visual surveillance produces large amounts of video data. Effective indexing and retrieval from surveillance video databases are very important. Although there are many ways to represent the content of video clips in current video retrieval algorithms, there still exists a semantic gap between users and retrieval systems. Visual surveillance systems supply a platform for investigating semantic-based video retrieval. In this paper, a semantic-based video retrieval framework for visual surveillance is proposed. A cluster-based tracking algorithm is developed to acquire motion trajectories. The trajectories are then clustered hierarchically using spatial and temporal information to learn activity models. A hierarchical structure of semantic indexing and retrieval of object activities, where each individual activity automatically inherits all the semantic descriptions of the activity model to which it belongs, is proposed for accessing video clips and individual objects at the semantic level. The proposed retrieval framework supports various queries, including queries by keywords, multiple-object queries, and queries by sketch. For multiple-object queries, succession and simultaneity restrictions, together with depth- and breadth-first orders, are considered. For sketch-based queries, a method for matching trajectories drawn by users to stored spatial trajectories is proposed. The effectiveness and efficiency of our framework are tested in a crowded traffic scene.
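
    One generic way to realize sketch-to-trajectory matching (a baseline of our own choosing, not necessarily the paper's method) is to resample both polylines to a common length and rank stored trajectories by their mean point-to-point distance:

      import numpy as np

      def resample(points, n=32):
          # Resample a polyline to n points, evenly spaced by arc length.
          points = np.asarray(points, float)
          seg = np.r_[0, np.cumsum(np.linalg.norm(np.diff(points, axis=0), axis=1))]
          t = np.linspace(0, seg[-1], n)
          return np.c_[np.interp(t, seg, points[:, 0]),
                       np.interp(t, seg, points[:, 1])]

      def sketch_distance(sketch, trajectory, n=32):
          # Lower distance = better match; rank stored trajectories by this score.
          a, b = resample(sketch, n), resample(trajectory, n)
          return float(np.mean(np.linalg.norm(a - b, axis=1)))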

  13. Inertial navigation sensor integrated motion analysis for autonomous vehicle navigation

    NASA Technical Reports Server (NTRS)

    Roberts, Barry; Bhanu, Bir

    1992-01-01

    Recent work on INS integrated motion analysis is described. Results were obtained with a maximally passive system of obstacle detection (OD) for ground-based vehicles and rotorcraft. The OD approach involves motion analysis of imagery acquired by a passive sensor in the course of vehicle travel to generate range measurements to world points within the sensor FOV. INS data and scene analysis results are used to enhance interest point selection, the matching of the interest points, and the subsequent motion-based computations, tracking, and OD. The most important lesson learned from the research described here is that the incorporation of inertial data into the motion analysis program greatly improves the analysis and makes the process more robust.

  14. Implementation of jump-diffusion algorithms for understanding FLIR scenes

    NASA Astrophysics Data System (ADS)

    Lanterman, Aaron D.; Miller, Michael I.; Snyder, Donald L.

    1995-07-01

    Our pattern theoretic approach to the automated understanding of forward-looking infrared (FLIR) images brings the traditionally separate endeavors of detection, tracking, and recognition together into a unified jump-diffusion process. New objects are detected and object types are recognized through discrete jump moves. Between jumps, the location and orientation of objects are estimated via continuous diffusions. A hypothesized scene, simulated from the emissive characteristics of the hypothesized scene elements, is compared with the collected data by a likelihood function based on sensor statistics. This likelihood is combined with a prior distribution defined over the set of possible scenes to form a posterior distribution. The jump-diffusion process empirically generates the posterior distribution. Both the diffusion and jump operations involve the simulation of a scene produced by a hypothesized configuration. Scene simulation is most effectively accomplished by pipelined rendering engines such as Silicon Graphics hardware. We demonstrate the execution of our algorithm on a Silicon Graphics Onyx/RealityEngine.
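
    The structure of one jump-diffusion iteration can be sketched as follows; the posterior, jump moves, and diffusion step are stand-ins for the FLIR-specific likelihood and scene simulation:

      import math, random

      def jump_diffusion_step(scene, log_posterior, jump_moves, diffuse, p_jump=0.3):
          # With probability p_jump, propose a discrete jump (add/remove/relabel
          # an object) and accept it Metropolis-style; otherwise run a continuous
          # diffusion of object positions and orientations.
          if random.random() < p_jump:
              proposal = random.choice(jump_moves)(scene)
              accept = math.log(random.random() + 1e-300) < \
                       log_posterior(proposal) - log_posterior(scene)
              if accept:
                  scene = proposal
          else:
              scene = diffuse(scene)   # e.g., gradient step on poses
          return scene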

  15. Dual Use of Image Based Tracking Techniques: Laser Eye Surgery and Low Vision Prosthesis

    NASA Technical Reports Server (NTRS)

    Juday, Richard D.; Barton, R. Shane

    1994-01-01

    With a concentration on Fourier optics pattern recognition, we have developed several methods of tracking objects in dynamic imagery to automate certain space applications such as orbital rendezvous and spacecraft capture, or planetary landing. We are developing two of these techniques for Earth applications in real-time medical image processing. The first is warping of a video image, developed to evoke shift invariance to scale and rotation in correlation pattern recognition. The technology is being applied to compensation for certain field defects in low vision humans. The second is using the optical joint Fourier transform to track the translation of unmodeled scenes. Developed as an image fixation tool to assist in calculating shape from motion, it is being applied to tracking motions of the eyeball quickly enough to keep a laser photocoagulation spot fixed on the retina, thus avoiding collateral damage.

  16. Dual use of image based tracking techniques: Laser eye surgery and low vision prosthesis

    NASA Technical Reports Server (NTRS)

    Juday, Richard D.

    1994-01-01

    With a concentration on Fourier optics pattern recognition, we have developed several methods of tracking objects in dynamic imagery to automate certain space applications such as orbital rendezvous and spacecraft capture, or planetary landing. We are developing two of these techniques for Earth applications in real-time medical image processing. The first is warping of a video image, developed to evoke shift invariance to scale and rotation in correlation pattern recognition. The technology is being applied to compensation for certain field defects in low vision humans. The second is using the optical joint Fourier transform to track the translation of unmodeled scenes. Developed as an image fixation tool to assist in calculating shape from motion, it is being applied to tracking motions of the eyeball quickly enough to keep a laser photocoagulation spot fixed on the retina, thus avoiding collateral damage.

  17. Deconstructing Visual Scenes in Cortex: Gradients of Object and Spatial Layout Information

    PubMed Central

    Kravitz, Dwight J.; Baker, Chris I.

    2013-01-01

    Real-world visual scenes are complex, cluttered, and heterogeneous stimuli engaging scene- and object-selective cortical regions including the parahippocampal place area (PPA), retrosplenial complex (RSC), and lateral occipital complex (LOC). To understand the unique contribution of each region to distributed scene representations, we generated predictions based on a neuroanatomical framework adapted from the monkey and tested them using minimal scenes in which we independently manipulated both spatial layout (open, closed, and gradient) and object content (furniture, e.g., bed, dresser). Commensurate with its strong connectivity with posterior parietal cortex, RSC evidenced strong spatial layout information but no object information, and its response was not even modulated by object presence. In contrast, LOC, which lies within the ventral visual pathway, contained strong object information but no background information. Finally, PPA, which is connected with both the dorsal and the ventral visual pathway, showed information about both objects and spatial backgrounds and was sensitive to the presence or absence of either. These results suggest that 1) LOC, PPA, and RSC have distinct representations, emphasizing different aspects of scenes, 2) the specific representations in each region are predictable from their patterns of connectivity, and 3) PPA combines both spatial layout and object information as predicted by connectivity. PMID:22473894

  18. Topography-Dependent Motion Compensation: Application to UAVSAR Data

    NASA Technical Reports Server (NTRS)

    Jones, Cathleen E.; Hensley, Scott; Michel, Thierry

    2009-01-01

    The UAVSAR L-band synthetic aperture radar system has been designed for repeat track interferometry in support of Earth science applications that require high-precision measurements of small surface deformations over timescales from hours to years. Conventional motion compensation algorithms, which are based upon assumptions of a narrow beam and flat terrain, yield unacceptably large errors in areas with even moderate topographic relief, i.e., in most areas of interest. This often limits the ability to achieve sub-centimeter surface change detection over significant portions of an acquired scene. To reduce this source of error in the interferometric phase, we have implemented an advanced motion compensation algorithm that corrects for the scene topography and radar beam width. Here we discuss the algorithm used, its implementation in the UAVSAR data processor, and the improvement in interferometric phase and correlation achieved in areas with significant topographic relief.

  19. Visual Persons Behavior Diary Generation Model based on Trajectories and Pose Estimation

    NASA Astrophysics Data System (ADS)

    Gang, Chen; Bin, Chen; Yuming, Liu; Hui, Li

    2018-03-01

    Behavior patterns of persons are an important output of surveillance analysis. This paper focuses on a generation model for a visual person-behavior diary. The pipeline includes person detection, tracking, and behavior classification. We adopt the deep convolutional neural network YOLOv2 (You Only Look Once) for the person detection module. Multi-person tracking is built on top of the detection framework, with the Hungarian assignment algorithm used for matching. The person appearance model integrates an HSV color model and a hash code model, and person motion is estimated by a Kalman filter. Detected objects are matched to existing tracklets through appearance and motion-location distances using the Hungarian assignment method. A long continuous trajectory for each person is obtained by a spatio-temporal linking algorithm, and face recognition information is used to identify the trajectory. The identified trajectories can then be used to generate a visual diary of person behavior based on scene context information and person action estimation. The relevant modules are tested on public data sets and on our own captured video sets. The test results show that the method can generate a visual person-behavior diary with reasonable accuracy.
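
    The detection-to-tracklet matching step can be sketched with a combined cost matrix solved by the Hungarian algorithm; the attribute names, weight, and gate value below are hypothetical:

      import numpy as np
      from scipy.optimize import linear_sum_assignment

      def match(tracklets, detections, appearance_dist, w=0.5, gate=50.0):
          # tracklets: objects with .predicted_xy (e.g., from a Kalman filter);
          # detections: objects with .xy and appearance features. Returns matched
          # (tracklet_idx, detection_idx) pairs; gated pairs stay unmatched.
          cost = np.zeros((len(tracklets), len(detections)))
          for i, t in enumerate(tracklets):
              for j, d in enumerate(detections):
                  motion = np.linalg.norm(np.subtract(t.predicted_xy, d.xy))
                  cost[i, j] = w * appearance_dist(t, d) + (1 - w) * motion
          rows, cols = linear_sum_assignment(cost)
          return [(i, j) for i, j in zip(rows, cols) if cost[i, j] < gate]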

  20. Mathematical modelling of animate and intentional motion.

    PubMed Central

    Rittscher, Jens; Blake, Andrew; Hoogs, Anthony; Stein, Gees

    2003-01-01

    Our aim is to enable a machine to observe and interpret the behaviour of others. Mathematical models are employed to describe certain biological motions. The main challenge is to design models that are both tractable and meaningful. In the first part we will describe how computer vision techniques, in particular visual tracking, can be applied to recognize a small vocabulary of human actions in a constrained scenario. Mainly the problems of viewpoint and scale invariance need to be overcome to formalize a general framework. Hence the second part of the article is devoted to the question whether a particular human action should be captured in a single complex model or whether it is more promising to make extensive use of semantic knowledge and a collection of low-level models that encode certain motion primitives. Scene context plays a crucial role if we intend to give a higher-level interpretation rather than a low-level physical description of the observed motion. A semantic knowledge base is used to establish the scene context. This approach consists of three main components: visual analysis, the mapping from vision to language and the search of the semantic database. A small number of robust visual detectors is used to generate a higher-level description of the scene. The approach together with a number of results is presented in the third part of this article. PMID:12689374

  1. Scene analysis for effective visual search in rough three-dimensional-modeling scenes

    NASA Astrophysics Data System (ADS)

    Wang, Qi; Hu, Xiaopeng

    2016-11-01

    Visual search is a fundamental technology in the computer vision community. It is difficult to find an object in complex scenes when similar distracters exist in the background. We propose a target search method for rough three-dimensional-modeling scenes based on a visual salience theory and a camera imaging model. We define the salience of objects (or features) and explain how salience measurements of objects are calculated. We also present one type of search path that guides the search to the target through salient objects. Along the search path, as each previous object is localized, the search region of the subsequent object decreases; this is calculated through the imaging model and an optimization method. The experimental results indicate that the proposed method is capable of resolving the ambiguities resulting from distracters that share similar visual features with the target, leading to an improvement in search speed of over 50%.

  2. Smart unattended sensor networks with scene understanding capabilities

    NASA Astrophysics Data System (ADS)

    Kuvich, Gary

    2006-05-01

    Unattended sensor systems are new technologies that are supposed to provide enhanced situation awareness to military and law enforcement agencies. A network of such sensors cannot be very effective in field conditions if it can only transmit visual information to human operators or alert them to motion. In real field conditions, events may happen in many nodes of a network simultaneously. But the number of control personnel is always limited, and the attention of human operators can be drawn to particular network nodes while a more dangerous threat goes unnoticed in other nodes. Sensor networks would be more effective if equipped with a system that is similar to human vision in its ability to understand visual information. For that, human vision uses a rough but wide peripheral system that tracks motion and regions of interest, a narrow but precise foveal vision that analyzes and recognizes objects in the center of the selected region of interest, and visual intelligence that provides scene and object contexts and resolves ambiguity and uncertainty in the visual information. Biologically inspired Network-Symbolic models convert image information into an 'understandable' Network-Symbolic format, which is similar to relational knowledge models. The equivalent of the interaction between peripheral and foveal systems in the network-symbolic system is achieved via interaction between Visual and Object Buffers and the top-level knowledge system.

  3. Track Everything: Limiting Prior Knowledge in Online Multi-Object Recognition.

    PubMed

    Wong, Sebastien C; Stamatescu, Victor; Gatt, Adam; Kearney, David; Lee, Ivan; McDonnell, Mark D

    2017-10-01

    This paper addresses the problem of online tracking and classification of multiple objects in an image sequence. Our proposed solution is to first track all objects in the scene without relying on object-specific prior knowledge, which in other systems can take the form of hand-crafted features or user-based track initialization. We then classify the tracked objects with a fast-learning image classifier that is based on a shallow convolutional neural network architecture, and demonstrate that object recognition improves when this is combined with object state information from the tracking algorithm. We argue that by transferring the use of prior knowledge from the detection and tracking stages to the classification stage, we can design a robust, general purpose object recognition system with the ability to detect and track a variety of object types. We describe our biologically inspired implementation, which adaptively learns the shape and motion of tracked objects, and apply it to the Neovision2 Tower benchmark data set, which contains multiple object types. An experimental evaluation demonstrates that our approach is competitive with state-of-the-art video object recognition systems that do make use of object-specific prior knowledge in detection and tracking, while providing additional practical advantages by virtue of its generality.

  4. Assessing the performance of a motion tracking system based on optical joint transform correlation

    NASA Astrophysics Data System (ADS)

    Elbouz, M.; Alfalou, A.; Brosseau, C.; Ben Haj Yahia, N.; Alam, M. S.

    2015-08-01

    We present an optimized system specially designed for the tracking and recognition of moving subjects in a confined environment (such as an elderly person remaining at home). In the first step of our study, we use a VanderLugt correlator (VLC) with an adapted pre-processing treatment of the input plane and a post-processing of the correlation plane via a nonlinear function, allowing us to make a robust decision. The second step is based on an optical joint transform correlation (JTC) system (NZ-NL-correlation JTC) for achieving improved detection and tracking of moving persons in a confined space. The proposed system has been found to have significantly superior discrimination and robustness capabilities, allowing it to detect an unknown target in an input scene and to determine the target's trajectory when the target is in motion. The system offers robust tracking performance for a moving target in several scenarios, such as rotational variation of input faces. Test results obtained using various real-life video sequences show that the proposed system is particularly suitable for real-time detection and tracking of moving objects.
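
    A digital analogue of locating a target by correlation can be sketched with FFT-based cross-correlation; the optical NZ-NL JTC applies nonlinearities in the Fourier plane that are omitted here:

      import numpy as np

      def locate(scene, template):
          # Return the (row, col) offset of the best template match in the scene,
          # computed as the peak of the FFT-based cross-correlation surface.
          s = np.fft.fft2(scene)
          t = np.fft.fft2(template, s=scene.shape)
          corr = np.real(np.fft.ifft2(s * np.conj(t)))
          return np.unravel_index(np.argmax(corr), corr.shape)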

  5. Does scene context always facilitate retrieval of visual object representations?

    PubMed

    Nakashima, Ryoichi; Yokosawa, Kazuhiko

    2011-04-01

    An object-to-scene binding hypothesis maintains that visual object representations are stored as part of a larger scene representation or scene context, and that scene context facilitates retrieval of object representations (see, e.g., Hollingworth, Journal of Experimental Psychology: Learning, Memory and Cognition, 32, 58-69, 2006). Support for this hypothesis comes from data using an intentional memory task. In the present study, we examined whether scene context always facilitates retrieval of visual object representations. In two experiments, we investigated whether the scene context facilitates retrieval of object representations, using a new paradigm in which a memory task is appended to a repeated-flicker change detection task. Results indicated that in normal scene viewing, in which many simultaneous objects appear, scene context facilitation of the retrieval of object representations (henceforth termed object-to-scene binding) occurred only when the observer was required to retain much information for a task (i.e., an intentional memory task).

  6. Plenoptic Image Motion Deblurring.

    PubMed

    Chandramouli, Paramanand; Jin, Meiguang; Perrone, Daniele; Favaro, Paolo

    2018-04-01

    We propose a method to remove motion blur in a single light field captured with a moving plenoptic camera. Since motion is unknown, we resort to a blind deconvolution formulation, where one aims to identify both the blur point spread function and the latent sharp image. Even in the absence of motion, light field images captured by a plenoptic camera are affected by a non-trivial combination of both aliasing and defocus, which depends on the 3D geometry of the scene. Therefore, motion deblurring algorithms designed for standard cameras are not directly applicable. Moreover, many state-of-the-art blind deconvolution algorithms are based on iterative schemes, where blurry images are synthesized through the imaging model. However, current imaging models for plenoptic images are impractical due to their high dimensionality. We observe that plenoptic cameras introduce periodic patterns that can be exploited to obtain highly parallelizable numerical schemes to synthesize images. These schemes allow extremely efficient GPU implementations that enable the use of iterative methods. We can then cast blind deconvolution of a blurry light field image as a regularized energy minimization to recover a sharp high-resolution scene texture and the camera motion. Furthermore, the proposed formulation can handle non-uniform motion blur due to camera shake as demonstrated on both synthetic and real light field data.
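
    The resulting optimization has the familiar regularized blind-deconvolution shape (notation ours, not the paper's): with blurry light field g, camera motion k, imaging/blur operator A(k), sharp texture u, and regularizers R_u, R_k,

      \[
        (\hat{u}, \hat{k}) \;=\; \arg\min_{u,\,k}\;
        \tfrac{1}{2}\,\lVert g - A(k)\,u \rVert_2^2
        \;+\; \lambda_u\,R_u(u) \;+\; \lambda_k\,R_k(k).
      \]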

  7. Core geometry in perspective

    PubMed Central

    Dillon, Moira R.; Spelke, Elizabeth S.

    2015-01-01

    Research on animals, infants, children, and adults provides evidence that distinct cognitive systems underlie navigation and object recognition. Here we examine whether and how these systems interact when children interpret 2D edge-based perspectival line drawings of scenes and objects. Such drawings serve as symbols early in development, and they preserve scene and object geometry from canonical points of view. Young children show limits when using geometry both in non-symbolic tasks and in symbolic map tasks that present 3D contexts from unusual, unfamiliar points of view. When presented with the familiar viewpoints in perspectival line drawings, however, do children engage more integrated geometric representations? In three experiments, children successfully interpreted line drawings with respect to their depicted scene or object. Nevertheless, children recruited distinct processes when navigating based on the information in these drawings, and these processes depended on the context in which the drawings were presented. These results suggest that children are flexible but limited in using geometric information to form integrated representations of scenes and objects, even when interpreting spatial symbols that are highly familiar and faithful renditions of the visual world. PMID:25441089

  8. Amygdala activity for the modulation of goal-directed behavior in emotional contexts

    PubMed Central

    Kunimatsu, Jun; Hikosaka, Okihide

    2018-01-01

    Choosing valuable objects and rewarding actions is critical for survival. While such choices must be made in a way that suits the animal’s circumstances, the neural mechanisms underlying such context-appropriate behavior are unclear. To address this question, we devised a context-dependent reward-seeking task for macaque monkeys. Each trial started with the appearance of one of many visual scenes containing two or more objects, and the monkey had to choose the good object by saccade to get a reward. These scenes were categorized into two dimensions of emotional context: dangerous versus safe and rich versus poor. We found that many amygdala neurons were more strongly activated by dangerous scenes, by rich scenes, or by both. Furthermore, saccades to target objects occurred more quickly in dangerous than in safe scenes and were also quicker in rich than in poor scenes. Thus, amygdala neuronal activity and saccadic reaction times were negatively correlated in each monkey. These results suggest that amygdala neurons facilitate targeting saccades predictably based on aspects of emotional context, as is necessary for goal-directed and social behavior. PMID:29870524

  9. Towards Gesture-Based Multi-User Interactions in Collaborative Virtual Environments

    NASA Astrophysics Data System (ADS)

    Pretto, N.; Poiesi, F.

    2017-11-01

    We present a virtual reality (VR) setup that enables multiple users to participate in collaborative virtual environments and interact via gestures. A collaborative VR session is established through a network of users that is composed of a server and a set of clients. The server manages the communication amongst clients and is created by one of the users. Each user's VR setup consists of a Head Mounted Display (HMD) for immersive visualisation, a hand tracking system to interact with virtual objects and a single-hand joypad to move in the virtual environment. We use Google Cardboard as a HMD for the VR experience and a Leap Motion for hand tracking, thus making our solution low cost. We evaluate our VR setup through a forensics use case, where real-world objects pertaining to a simulated crime scene are included in a VR environment, acquired using a smartphone-based 3D reconstruction pipeline. Users can interact using virtual gesture-based tools such as pointers and rulers.

  10. Visual search in scenes involves selective and non-selective pathways

    PubMed Central

    Wolfe, Jeremy M; Vo, Melissa L-H; Evans, Karla K; Greene, Michelle R

    2010-01-01

    How do we find objects in scenes? For decades, visual search models have been built on experiments in which observers search for targets, presented among distractor items, isolated and randomly arranged on blank backgrounds. Are these models relevant to search in continuous scenes? This paper argues that the mechanisms that govern artificial, laboratory search tasks do play a role in visual search in scenes. However, scene-based information is used to guide search in ways that had no place in earlier models. Search in scenes may be best explained by a dual-path model: A “selective” path in which candidate objects must be individually selected for recognition and a “non-selective” path in which information can be extracted from global / statistical information. PMID:21227734

  11. Real-time visual tracking of less textured three-dimensional objects on mobile platforms

    NASA Astrophysics Data System (ADS)

    Seo, Byung-Kuk; Park, Jungsik; Park, Hanhoon; Park, Jong-Il

    2012-12-01

    Natural feature-based approaches are still challenging for mobile applications (e.g., mobile augmented reality), because they are feasible only in limited environments such as highly textured and planar scenes/objects, and they need powerful mobile hardware for fast and reliable tracking. In many cases where conventional approaches are not effective, three-dimensional (3-D) knowledge of target scenes would be beneficial. We present a well-established framework for real-time visual tracking of less textured 3-D objects on mobile platforms. Our framework is based on model-based tracking that efficiently exploits partially known 3-D scene knowledge such as object models and a background's distinctive geometric or photometric knowledge. Moreover, we elaborate on implementation in order to make it suitable for real-time vision processing on mobile hardware. The performance of the framework is tested and evaluated on recent commercially available smartphones, and its feasibility is shown by real-time demonstrations.

  12. Optimization of incremental structure from motion combining a random k-d forest and pHash for unordered images in a complex scene

    NASA Astrophysics Data System (ADS)

    Zhan, Zongqian; Wang, Chendong; Wang, Xin; Liu, Yi

    2018-01-01

    Against the backdrop of today's popular virtual reality and scientific visualization, three-dimensional (3-D) reconstruction is widely used in disaster relief, virtual shopping, reconstruction of cultural relics, etc. In the traditional incremental structure from motion (incremental SFM) method, the time cost of matching is one of the main factors restricting the method's popularity. To make the whole matching process more efficient, we propose a preprocessing step before matching: (1) we first construct a random k-d forest from the large-scale scale-invariant feature transform features in the images and combine this with the pHash method to obtain a relatedness value, (2) we then construct a connected weighted graph based on the relatedness values, and (3) we finally obtain a planned sequence for adding images according to the principle of the minimum spanning tree. On this basis, we attempt to thin the minimum spanning tree to reduce the number of matchings and ensure that the images are well distributed. The experimental results show a great reduction in the number of matchings with enough object points, with only a small influence on the inner stability, which proves that this method can quickly and reliably improve the efficiency of the SFM method with unordered multiview images in complex scenes.
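
    The preprocessing idea can be sketched with off-the-shelf pieces (the 'imagehash' and 'networkx' packages are assumed stand-ins; the paper combines pHash with a random k-d forest over SIFT features, which is omitted here): score pairwise relatedness, build a weighted graph, and take its minimum spanning tree to plan the matching order.

      import itertools
      import imagehash                  # assumed available: pip install imagehash
      import networkx as nx
      from PIL import Image

      def plan_matching(image_paths):
          # Perceptual hashes give a cheap pairwise relatedness score
          # (Hamming distance between hashes: lower = more related).
          hashes = {p: imagehash.phash(Image.open(p)) for p in image_paths}
          g = nx.Graph()
          for a, b in itertools.combinations(image_paths, 2):
              g.add_edge(a, b, weight=hashes[a] - hashes[b])
          mst = nx.minimum_spanning_tree(g)
          return list(mst.edges())      # candidate pairs to match first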

  13. Stuck on semantics: Processing of irrelevant object-scene inconsistencies modulates ongoing gaze behavior.

    PubMed

    Cornelissen, Tim H W; Võ, Melissa L-H

    2017-01-01

    People have an amazing ability to identify objects and scenes with only a glimpse. How automatic is this scene and object identification? Are scene and object semantics, let alone their semantic congruity, processed to a degree that modulates ongoing gaze behavior even if they are irrelevant to the task at hand? Objects that do not fit the semantics of the scene (e.g., a toothbrush in an office) are typically fixated longer and more often than objects that are congruent with the scene context. In this study, we overlaid a letter T onto photographs of indoor scenes and instructed participants to search for it. Some of these background images contained scene-incongruent objects. Despite their lack of relevance to the search, we found that participants spent more time in total looking at semantically incongruent compared to congruent objects in the same position of the scene. Subsequent tests of explicit and implicit memory showed that participants remembered few of the inconsistent objects, and no more of the consistent ones. We argue that when we view natural environments, scene and object relationships are processed obligatorily, such that irrelevant semantic mismatches between scene and object identity can modulate ongoing eye-movement behavior.

  14. Seek and you shall remember: Scene semantics interact with visual search to build better memories

    PubMed Central

    Draschkow, Dejan; Wolfe, Jeremy M.; Võ, Melissa L.-H.

    2014-01-01

    Memorizing critical objects and their locations is an essential part of everyday life. In the present study, incidental encoding of objects in naturalistic scenes during search was compared to explicit memorization of those scenes. To investigate if prior knowledge of scene structure influences these two types of encoding differently, we used meaningless arrays of objects as well as objects in real-world, semantically meaningful images. Surprisingly, when participants were asked to recall scenes, their memory performance was markedly better for searched objects than for objects they had explicitly tried to memorize, even though participants in the search condition were not explicitly asked to memorize objects. This finding held true even when objects were observed for an equal amount of time in both conditions. Critically, the recall benefit for searched over memorized objects in scenes was eliminated when objects were presented on uniform, non-scene backgrounds rather than in a full scene context. Thus, scene semantics not only help us search for objects in naturalistic scenes, but appear to produce a representation that supports our memory for those objects beyond intentional memorization. PMID:25015385

  15. Beyond scene gist: Objects guide search more than scene background.

    PubMed

    Koehler, Kathryn; Eckstein, Miguel P

    2017-06-01

    Although the facilitation of visual search by contextual information is well established, there is little understanding of the independent contributions of different types of contextual cues in scenes. Here we manipulated 3 types of contextual information: object co-occurrence, multiple object configurations, and background category. We isolated the benefits of each contextual cue to target detectability, its impact on decision bias, confidence, and the guidance of eye movements. We find that object-based information guides eye movements and facilitates perceptual judgments more than scene background. The degree of guidance and facilitation of each contextual cue can be related to its inherent informativeness about the target spatial location as measured by human explicit judgments about likely target locations. Our results improve the understanding of the contributions of distinct contextual scene components to search and suggest that the brain's utilization of cues to guide eye movements is linked to the cue's informativeness about the target's location.

  16. Effects of memory colour on colour constancy for unknown coloured objects.

    PubMed

    Granzier, Jeroen J M; Gegenfurtner, Karl R

    2012-01-01

    The perception of an object's colour remains constant despite large variations in the chromaticity of the illumination (colour constancy). Hering suggested that memory colours, the typical colours of objects, could help in estimating the illuminant's colour and therefore be an important factor in establishing colour constancy. Here we test whether the presence of objects with diagnostical colours (fruits, vegetables, etc) within a scene influences colour constancy for unknown coloured objects in the scene. Subjects matched one of four Munsell papers placed in a scene illuminated under either a reddish or a greenish lamp with the Munsell book of colour illuminated by a neutral lamp. The Munsell papers were embedded in four different scenes: one scene containing diagnostically coloured objects, one scene containing incongruent coloured objects, a third scene with geometrical objects of the same colour as the diagnostically coloured objects, and one scene containing non-diagnostically coloured objects (eg, a yellow coffee mug). All objects were placed against a black background. Colour constancy was on average significantly higher for the scene containing the diagnostically coloured objects compared with the other scenes tested. We conclude that the colours of familiar objects help in obtaining colour constancy for unknown objects.

  17. Auditory object salience: human cortical processing of non-biological action sounds and their acoustic signal attributes

    PubMed Central

    Lewis, James W.; Talkington, William J.; Tallaksen, Katherine C.; Frum, Chris A.

    2012-01-01

    Whether viewed or heard, an object in action can be segmented as a distinct salient event based on a number of different sensory cues. In the visual system, several low-level attributes of an image are processed along parallel hierarchies, involving intermediate stages wherein gross-level object form and/or motion features are extracted prior to stages that show greater specificity for different object categories (e.g., people, buildings, or tools). In the auditory system, though relying on a rather different set of low-level signal attributes, meaningful real-world acoustic events and “auditory objects” can also be readily distinguished from background scenes. However, the nature of the acoustic signal attributes or gross-level perceptual features that may be explicitly processed along intermediate cortical processing stages remains poorly understood. Examining mechanical and environmental action sounds, representing two distinct non-biological categories of action sources, we had participants assess the degree to which each sound was perceived as object-like versus scene-like. We re-analyzed data from two of our earlier functional magnetic resonance imaging (fMRI) task paradigms (Engel et al., 2009) and found that scene-like action sounds preferentially led to activation along several midline cortical structures, but with strong dependence on listening task demands. In contrast, bilateral foci along the superior temporal gyri (STG) showed parametrically increasing activation to action sounds rated as more “object-like,” independent of sound category or task demands. Moreover, these STG regions also showed parametric sensitivity to spectral structure variations (SSVs) of the action sounds—a quantitative measure of change in entropy of the acoustic signals over time—and the right STG additionally showed parametric sensitivity to measures of mean entropy and harmonic content of the environmental sounds. Analogous to the visual system, intermediate stages of the auditory system appear to process or extract a number of quantifiable low-order signal attributes that are characteristic of action events perceived as being object-like, representing stages that may begin to dissociate different perceptual dimensions and categories of everyday, real-world action sounds. PMID:22582038

  18. Mental Layout Extrapolations Prime Spatial Processing of Scenes

    ERIC Educational Resources Information Center

    Gottesman, Carmela V.

    2011-01-01

    Four experiments examined whether scene processing is facilitated by layout representation, including layout that was not perceived but could be predicted based on a previous partial view (boundary extension). In a priming paradigm (after Sanocki, 2003), participants judged objects' distances in photographs. In Experiment 1, full scenes (target),…

  19. Visual Acuity Using Head-fixed Displays During Passive Self and Surround Motion

    NASA Technical Reports Server (NTRS)

    Wood, Scott J.; Black, F. Owen; Stallings, Valerie; Peters, Brian

    2007-01-01

    The ability to read head-fixed displays on various motion platforms requires the suppression of vestibulo-ocular reflexes. This study examined dynamic visual acuity while viewing a head-fixed display during different self and surround rotation conditions. Twelve healthy subjects were asked to report the orientation of Landolt C optotypes presented on a micro-display fixed to a rotating chair at 50 cm distance. Acuity thresholds were determined by the lowest size at which the subjects correctly identified 3 of 5 optotype orientations at peak velocity. Visual acuity was compared across four different conditions, each tested at 0.05 and 0.4 Hz (peak amplitude of 57 deg/s). The four conditions included: subject rotated in semi-darkness (i.e., limited to background illumination of the display), subject stationary while visual scene rotated, subject rotated around a stationary visual background, and both subject and visual scene rotated together. Visual acuity performance was greatest when the subject rotated around a stationary visual background; i.e., when both vestibular and visual inputs provided concordant information about the motion. Visual acuity performance was most reduced when the subject and visual scene rotated together; i.e., when the visual scene provided discordant information about the motion. Ranges of 4-5 logMAR step sizes across the conditions indicated the acuity task was sufficient to discriminate visual performance levels. The background visual scene can influence the ability to read head-fixed displays during passive motion disturbances. Dynamic visual acuity using head-fixed displays can provide an operationally relevant screening tool for visual performance during exposure to novel acceleration environments.

  20. Progress in high-level exploratory vision

    NASA Astrophysics Data System (ADS)

    Brand, Matthew

    1993-08-01

    We have been exploring the hypothesis that vision is an explanatory process, in which causal and functional reasoning about potential motion plays an intimate role in mediating the activity of low-level visual processes. In particular, we have explored two of the consequences of this view for the construction of purposeful vision systems: Causal and design knowledge can be used to (1) drive focus of attention, and (2) choose between ambiguous image interpretations. An important result of visual understanding is an explanation of the scene's causal structure: How action is originated, constrained, and prevented, and what will happen in the immediate future. In everyday visual experience, most action takes the form of motion, and most causal analysis takes the form of dynamical analysis. This is even true of static scenes, where much of a scene's interest lies in how possible motions are arrested. This paper describes our progress in developing domain theories and visual processes for the understanding of various kinds of structured scenes, including structures built out of children's constructive toys and simple mechanical devices.

  1. Blind prediction of natural video quality.

    PubMed

    Saad, Michele A; Bovik, Alan C; Charrier, Christophe

    2014-03-01

    We propose a blind (no reference or NR) video quality evaluation model that is nondistortion specific. The approach relies on a spatio-temporal model of video scenes in the discrete cosine transform domain, and on a model that characterizes the type of motion occurring in the scenes, to predict video quality. We use the models to define video statistics and perceptual features that are the basis of a video quality assessment (VQA) algorithm that does not require the presence of a pristine video to compare against in order to predict a perceptual quality score. The contributions of this paper are threefold. 1) We propose a spatio-temporal natural scene statistics (NSS) model for videos. 2) We propose a motion model that quantifies motion coherency in video scenes. 3) We show that the proposed NSS and motion coherency models are appropriate for quality assessment of videos, and we utilize them to design a blind VQA algorithm that correlates highly with human judgments of quality. The proposed algorithm, called video BLIINDS, is tested on the LIVE VQA database and on the EPFL-PoliMi video database and shown to perform close to the level of top performing reduced and full reference VQA algorithms.
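
    As a rough illustration of the DCT-domain statistics involved, the hedged sketch below computes a simple per-block spectral statistic of frame differences (the ratio of standard deviation to mean absolute deviation, a coarse surrogate for a generalized-Gaussian shape parameter) and pools it over the frame. It is a simplification for exposition, not the published Video BLIINDS feature set.

```python
import numpy as np
from scipy.fft import dctn

def dct_diff_features(prev, curr, block=16):
    """Pooled DCT-domain statistics of the difference of two grayscale frames."""
    diff = curr.astype(float) - prev.astype(float)
    h, w = diff.shape
    feats = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            c = dctn(diff[y:y+block, x:x+block], norm='ortho').ravel()[1:]  # drop DC
            sigma = c.std()
            mad = np.mean(np.abs(c - c.mean())) + 1e-12
            feats.append(sigma / mad)  # peakier (less Gaussian) spectra give larger ratios
    return np.percentile(feats, [10, 50, 90])  # simple pooling over blocks
```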

  2. Virtual integral holography

    NASA Astrophysics Data System (ADS)

    Venolia, Dan S.; Williams, Lance

    1990-08-01

    A range of stereoscopic display technologies exist which are no more intrusive, to the user, than a pair of spectacles. Combining such a display system with sensors for the position and orientation of the user's point-of-view results in a greatly enhanced depiction of three-dimensional data. As the point of view changes, the stereo display channels are updated in real time. The face of a monitor or display screen becomes a window on a three-dimensional scene. Motion parallax naturally conveys the placement and relative depth of objects in the field of view. Most of the advantages of "head-mounted display" technology are achieved with a less cumbersome system. To derive the full benefits of stereo combined with motion parallax, both stereo channels must be updated in real time. This may limit the size and complexity of databases which can be viewed on processors of modest resources, and restrict the use of additional three-dimensional cues, such as texture mapping, depth cueing, and hidden surface elimination. Effective use of "full 3D" may still be undertaken in a non-interactive mode. Integral composite holograms have often been advanced as a powerful 3D visualization tool. Such a hologram is typically produced from a film recording of an object on a turntable, or a computer animation of an object rotating about one axis. The individual frames of film are multiplexed, in a composite hologram, in such a way as to be indexed by viewing angle. The composite may be produced as a cylinder transparency, which provides a stereo view of the object as if enclosed within the cylinder, which can be viewed from any angle. No vertical parallax is usually provided (this would require increasing the dimensionality of the multiplexing scheme), but the three dimensional image is highly resolved and easy to view and interpret. Even a modest processor can duplicate the effect of such a precomputed display, provided sufficient memory and bus bandwidth. This paper describes the components of a stereo display system with user point-of-view tracking for interactive 3D, and a digital realization of integral composite display which we term virtual integral holography. The primary drawbacks of holographic display - film processing turnaround time, and the difficulties of displaying scenes in full color - are obviated, and motion parallax cues provide easy 3D interpretation even for users who cannot see in stereo.

  3. A scientific theory of Ars Memoriae: Spatial view cells in a continuous attractor network with linked items.

    PubMed

    Rolls, Edmund T

    2017-05-01

    The art of memory (ars memoriae) used since classical times includes using a well-known scene to associate each view or part of the scene with a different item in a speech. This memory technique is also known as the "method of loci." The new theory proposed here is that this type of memory is implemented in the CA3 region of the hippocampus, where there are spatial view cells in primates that allow a particular view to be associated with a particular object in an event or episodic memory. Given that the CA3 cells, with their extensive recurrent collateral system connecting different CA3 cells and associative synaptic modifiability, form an autoassociation or attractor network, the spatial view cells with their approximately Gaussian view fields become linked in a continuous attractor network. As the view space is traversed continuously (e.g., by self-motion or imagined self-motion across the scene), the views are therefore successively recalled in the correct order, with no view missing, and with low interference between the items to be recalled. Given that each spatial view has been associated with a different discrete item, the items are recalled in the correct order, with none missing. This is the first neuroscience theory of ars memoriae. The theory provides a foundation for understanding how a key feature of ars memoriae, the ability to use a spatial scene to encode a sequence of items to be remembered, is implemented.

  4. Unsupervised motion-based object segmentation refined by color

    NASA Astrophysics Data System (ADS)

    Piek, Matthijs C.; Braspenning, Ralph; Varekamp, Chris

    2003-06-01

    For various applications, such as data compression, structure from motion, medical imaging and video enhancement, there is a need for an algorithm that divides video sequences into independently moving objects. Because our focus is on video enhancement and structure from motion for consumer electronics, we strive for a low-complexity solution. For still images, several approaches exist based on colour, but these lack in both speed and segmentation quality. For instance, colour-based watershed algorithms produce a so-called oversegmentation, with many segments covering each single physical object. Other colour segmentation approaches limit the number of segments to reduce this oversegmentation problem, but this often results in inaccurate edges or even missed objects. Most likely, colour is an inherently insufficient cue for real-world object segmentation, because real-world objects can display complex combinations of colours.
    For video sequences, however, an additional cue is available, namely the motion of objects. When different objects in a scene have different motion, the motion cue alone is often enough to reliably distinguish objects from one another and the background. However, because efficient motion estimators such as the 3DRS block matcher lack sufficient resolution, the resulting segmentation is not at pixel resolution but at block resolution. Existing pixel-resolution motion estimators are more sensitive to noise, suffer more from aperture problems, correspond less closely to the true motion of objects than block-based approaches, or are too computationally expensive.
    From its tendency to oversegmentation it is apparent that colour segmentation is particularly effective near the edges of homogeneously coloured areas. On the other hand, block-based true motion estimation is particularly effective in heterogeneous areas, because heterogeneity improves the chance that a block is unique and thus decreases the chance of a wrong position producing a good match. Consequently, a number of methods exist which combine motion and colour segmentation, either using colour segmentation as a base for the motion segmentation and estimation, or performing an independent colour segmentation in parallel that is in some way combined with the motion segmentation. The presented method instead uses the two techniques to complement each other, first segmenting on motion cues and then refining the segmentation with colour. To our knowledge, few methods adopt this approach. One example is [meshrefine], which uses an irregular mesh that hinders efficient implementation in consumer electronics devices, and which produces a foreground/background segmentation, while our applications call for the segmentation of multiple objects.
    NEW METHOD. As mentioned above, we start with motion segmentation and afterwards refine the edges of this segmentation with a pixel-resolution colour segmentation method. There are several reasons for this approach:
    + Motion segmentation does not produce the oversegmentation which colour segmentation methods normally produce, because objects are more likely to have colour discontinuities than motion discontinuities. In this way, colour segmentation only has to be done at the edges of segments, confining it to a smaller part of the image; in such a part, it is more likely that the colour of an object is homogeneous.
    + This approach restricts the computationally expensive pixel-resolution colour segmentation to a subset of the image. Together with the very efficient 3DRS motion estimation algorithm, this helps to reduce the computational complexity.
    + The motion cue alone is often enough to reliably distinguish objects from one another and the background.
    To obtain the motion vector fields, a variant of the 3DRS block-based motion estimator which analyses three frames of input was used. The 3DRS motion estimator is known for its ability to estimate motion vectors which closely resemble the true motion.
    BLOCK-BASED MOTION SEGMENTATION. We start with a block-resolution segmentation based on motion vectors. The presented method is inspired by the well-known K-means segmentation method [K-means]. Several other methods (e.g., [kmeansc]) adapt K-means for connectedness by adding a weighted shape error, which introduces the additional difficulty of finding the correct weights for the shape parameters and often biases one particular predefined shape. The presented method, which we call K-regions, encourages connectedness because only blocks at the edges of segments may be assigned to another segment. This constrains the segmentation to such a degree that it allows the method to use least squares for the robust fitting of an affine motion model for each segment. Contrary to [parmkm], the segmentation step still operates on vectors instead of model parameters. To make the segmentation temporally consistent, the segmentation of the previous frame is used as the initialisation for every new frame. We also present a scheme which makes the algorithm independent of the initially chosen number of segments.
    COLOUR-BASED INTRA-BLOCK SEGMENTATION. The block-resolution motion-based segmentation forms the starting point for the pixel-resolution segmentation, which is obtained by reclassifying pixels only at the edges of clusters. We assume that an edge between two objects can be found in either one of two neighbouring blocks that belong to different clusters, which allows us to perform the pixel-resolution segmentation on each pair of such neighbouring blocks separately. Because of the local nature of the segmentation, it largely avoids problems with heterogeneously coloured areas, and because no new segments are introduced in this step, it does not suffer from oversegmentation; the method also has no problems with bifurcations. For the pixel-resolution segmentation itself, we reclassify pixels so as to optimise an error norm which favours similarly coloured regions and straight edges.
    SEGMENTATION MEASURE. To assist in the evaluation of the proposed algorithm, we developed a quality metric. Because the problem does not have an exact specification, we define a ground-truth output which we find desirable for a given input, and we measure segmentation quality as the difference between the produced segmentation and this ground truth. The measure enables us to evaluate oversegmentation and undersegmentation separately, and to identify which parts of a frame suffer from either. The proposed algorithm has been tested on several typical sequences.
    CONCLUSIONS. We presented a new video segmentation method which performs well in segmenting multiple independently moving foreground objects from each other and from the background, combining the strong points of colour and motion segmentation in the way we expected. One weak point is that the method suffers from undersegmentation when adjacent objects display similar motion, and in sequences with detailed backgrounds the segmentation will sometimes display noisy edges. Apart from these results, we think that some of the techniques, in particular the K-regions technique, may be useful for other two-dimensional data segmentation problems.
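
    A minimal sketch of the K-regions reassignment rule described above, assuming a block-resolution motion-vector field as input; the affine motion models, temporal initialisation, and the scheme for choosing the number of segments are omitted, and all names are illustrative.

```python
import numpy as np

def k_regions(mv, k, iters=20):
    """mv: (H, W, 2) block motion vectors; returns an (H, W) label map."""
    H, W, _ = mv.shape
    labels = np.arange(H)[:, None] * k // H + np.zeros(W, dtype=int)  # striped init
    for _ in range(iters):
        centroids = np.array([mv[labels == c].mean(axis=0) if np.any(labels == c)
                              else np.zeros(2) for c in range(k)])
        for y in range(H):
            for x in range(W):
                nbrs = {labels[ny, nx]
                        for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
                        if 0 <= ny < H and 0 <= nx < W}
                if nbrs - {labels[y, x]}:  # only blocks on a segment edge may move
                    cand = sorted(nbrs | {labels[y, x]})
                    d = [np.sum((mv[y, x] - centroids[c])**2) for c in cand]
                    labels[y, x] = cand[int(np.argmin(d))]
    return labels
```

    Restricting reassignment to boundary blocks is what encourages connected segments without a weighted shape error, which is the design point the abstract contrasts against shape-biased K-means variants.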

  5. Visuo-Haptic Mixed Reality with Unobstructed Tool-Hand Integration.

    PubMed

    Cosco, Francesco; Garre, Carlos; Bruno, Fabio; Muzzupappa, Maurizio; Otaduy, Miguel A

    2013-01-01

    Visuo-haptic mixed reality consists of adding to a real scene the ability to see and touch virtual objects. It requires the use of see-through display technology for visually mixing real and virtual objects, and haptic devices for adding haptic interaction with the virtual objects. Unfortunately, the use of commodity haptic devices poses obstruction and misalignment issues that complicate the correct integration of a virtual tool and the user's real hand in the mixed reality scene. In this work, we propose a novel mixed reality paradigm where it is possible to touch and see virtual objects in combination with a real scene, using commodity haptic devices, and with a visually consistent integration of the user's hand and the virtual tool. We discuss the visual obstruction and misalignment issues introduced by commodity haptic devices, and then propose a solution that relies on four simple technical steps: color-based segmentation of the hand, tracking-based segmentation of the haptic device, background repainting using image-based models, and misalignment-free compositing of the user's hand. We have developed a successful proof-of-concept implementation, where a user can touch virtual objects and interact with them in the context of a real scene, and we have evaluated the impact on user performance of obstruction and misalignment correction.
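
    Of the four steps listed, the colour-based hand segmentation is the simplest to illustrate. The sketch below uses an HSV skin-colour threshold with OpenCV; the threshold values are illustrative assumptions rather than the authors' calibrated model.

```python
import cv2
import numpy as np

def hand_mask(frame_bgr):
    """Binary mask of skin-coloured pixels, for compositing the real hand in front."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 40, 60], dtype=np.uint8)    # assumed skin-tone bounds
    upper = np.array([25, 180, 255], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower, upper)
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # remove speckle noise
```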

  6. Small Moving Vehicle Detection in a Satellite Video of an Urban Area

    PubMed Central

    Yang, Tao; Wang, Xiwen; Yao, Bowei; Li, Jing; Zhang, Yanning; He, Zhannan; Duan, Wencheng

    2016-01-01

    Vehicle surveillance of a wide area allows us to learn much about daily activities and traffic information. With the rapid development of remote sensing, satellite video has become an important data source for vehicle detection, as it provides a broader field of surveillance. Existing work generally focuses on aerial video with moderately sized objects and relies on feature extraction. However, the moving vehicles in satellite video imagery range from just a few pixels to dozens of pixels and exhibit low contrast with respect to the background, which makes it hard to obtain usable appearance or shape information. In this paper, we look into the problem of moving vehicle detection in satellite imagery; to the best of our knowledge, this is the first work to deal with moving vehicle detection from satellite videos. Our approach consists of two stages: first, through foreground motion segmentation and trajectory accumulation, a scene motion heat map is dynamically built; following this, a novel saliency-based background model which intensifies moving objects is used to segment the vehicles in the hot regions. Qualitative and quantitative experiments on a sequence from a recent Skybox satellite video dataset demonstrate that our approach simultaneously achieves a high detection rate and a low false-alarm rate. PMID:27657091
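
    The first stage, building the scene motion heat map, can be approximated very simply: accumulate thresholded frame differences over time. The sketch below is a hedged stand-in (the trajectory accumulation and the saliency-based background model of the second stage are not modeled, and the threshold is an assumption).

```python
import numpy as np

def motion_heat_map(frames, thresh=10):
    """frames: iterable of equally sized grayscale frames (2-D uint8 arrays)."""
    heat, prev = None, None
    for f in frames:
        f = f.astype(np.int16)
        if prev is not None:
            moving = (np.abs(f - prev) > thresh).astype(np.float32)
            heat = moving if heat is None else heat + moving
        prev = f
    return heat / max(float(heat.max()), 1e-12)  # hot regions ~ busy roads
```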

  7. Does object view influence the scene consistency effect?

    PubMed

    Sastyin, Gergo; Niimi, Ryosuke; Yokosawa, Kazuhiko

    2015-04-01

    Traditional research on the scene consistency effect only used clearly recognizable object stimuli to show mutually interactive context effects for both the object and background components on scene perception (Davenport & Potter in Psychological Science, 15, 559-564, 2004). However, in real environments, objects are viewed from multiple viewpoints, including an accidental, hard-to-recognize one. When the observers named target objects in scenes (Experiments 1a and 1b, object recognition task), we replicated the scene consistency effect (i.e., there was higher accuracy for the objects with consistent backgrounds). However, there was a significant interaction effect between consistency and object viewpoint, which indicated that the scene consistency effect was more important for identifying objects in the accidental view condition than in the canonical view condition. Therefore, the object recognition system may rely more on the scene context when the object is difficult to recognize. In Experiment 2, the observers identified the background (background recognition task) while the scene consistency and object views were manipulated. The results showed that object viewpoint had no effect, while the scene consistency effect was observed. More specifically, the canonical and accidental views both equally provided contextual information for scene perception. These findings suggested that the mechanism for conscious recognition of objects could be dissociated from the mechanism for visual analysis of object images that were part of a scene. The "context" that the object images provided may have been derived from its view-invariant, relatively low-level visual features (e.g., color), rather than its semantic information.

  8. Joint Video Stitching and Stabilization from Moving Cameras.

    PubMed

    Guo, Heng; Liu, Shuaicheng; He, Tong; Zhu, Shuyuan; Zeng, Bing; Gabbouj, Moncef

    2016-09-08

    In this paper, we extend image stitching to video stitching for videos that are captured of the same scene simultaneously by multiple moving cameras. In practice, videos captured under this circumstance often appear shaky, and directly applying image stitching methods to shaky videos often suffers from strong spatial and temporal artifacts. To solve this problem, we propose a unified framework in which video stitching and stabilization are performed jointly. Specifically, our system takes several overlapping videos as inputs. We estimate both inter-motions (between different videos) and intra-motions (between neighboring frames within a video). Then, we solve for an optimal virtual 2D camera path from all original paths. An enlarged field of view along the virtual path is finally obtained by a spatio-temporal optimization that takes both inter- and intra-motions into consideration. Two important components of this optimization are that (1) a grid-based tracking method is designed for improved robustness, producing features that are distributed evenly within and across multiple views, and (2) a mesh-based motion model is adopted to handle scene parallax. Experimental results demonstrate the effectiveness of our approach on various consumer-level videos, and a plugin named "Video Stitcher" has been developed for Adobe After Effects CC 2015 to show the processed videos.
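
    The notion of solving for an optimal virtual 2D camera path can be illustrated with a least-squares smoother: stay close to the original path while penalizing second differences (accelerations). This is a hedged stand-in for the paper's joint optimization; the mesh-based parallax handling is not modeled.

```python
import numpy as np

def smooth_path(path, lam=50.0):
    """path: (N, 2) original camera positions; returns a smoothed (N, 2) path."""
    n = len(path)
    D = np.zeros((n - 2, n))
    for i in range(n - 2):            # second-difference (acceleration) operator
        D[i, i:i+3] = [1.0, -2.0, 1.0]
    A = np.eye(n) + lam * D.T @ D     # normal equations of ||X-path||^2 + lam||DX||^2
    return np.linalg.solve(A, path)
```

    Larger lam trades fidelity to the original path for smoothness, which is the same trade-off any stabilization objective has to strike.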

  9. The roles of scene priming and location priming in object-scene consistency effects

    PubMed Central

    Heise, Nils; Ansorge, Ulrich

    2014-01-01

    Presenting consistent objects in scenes facilitates object recognition as compared to inconsistent objects. Yet the mechanisms by which scenes influence object recognition are still not understood. According to one theory, consistent scenes facilitate visual search for objects at expected places. Here, we investigated two predictions following from this theory: if visual search is responsible for consistency effects, consistency effects could be weaker (1) with better-primed than less-primed object locations, and (2) with less-primed than better-primed scenes. In Experiments 1 and 2, locations of objects were varied within a scene to a different degree (one, two, or four possible locations). In addition, object-scene consistency was studied as a function of progressive numbers of repetitions of the backgrounds. Because repeating locations and backgrounds could facilitate visual search for objects, these repetitions might alter the object-scene consistency effect by lowering location uncertainty. Although we find evidence for a significant consistency effect, we find no clear support for effects of scene priming or location priming on the size of the consistency effect. Additionally, we find evidence that the consistency effect depends on the eccentricity of the target objects. These results point to only small influences of priming on object-scene consistency effects, but all in all the findings can be reconciled with a visual-search explanation of the consistency effect. PMID:24910628

  10. A novel role for visual perspective cues in the neural computation of depth

    PubMed Central

    Kim, HyungGoo R.; Angelaki, Dora E.; DeAngelis, Gregory C.

    2014-01-01

    As we explore a scene, our eye movements add global patterns of motion to the retinal image, complicating visual motion produced by self-motion or moving objects. Conventionally, it has been assumed that extra-retinal signals, such as efference copy of smooth pursuit commands, are required to compensate for the visual consequences of eye rotations. We consider an alternative possibility: namely, that the visual system can infer eye rotations from global patterns of image motion. We visually simulated combinations of eye translation and rotation, including perspective distortions that change dynamically over time. We demonstrate that incorporating these “dynamic perspective” cues allows the visual system to generate selectivity for depth sign from motion parallax in macaque area MT, a computation that was previously thought to require extra-retinal signals regarding eye velocity. Our findings suggest novel neural mechanisms that analyze global patterns of visual motion to perform computations that require knowledge of eye rotations. PMID:25436667

  11. A fuzzy measure approach to motion frame analysis for scene detection. M.S. Thesis - Houston Univ.

    NASA Technical Reports Server (NTRS)

    Leigh, Albert B.; Pal, Sankar K.

    1992-01-01

    This paper addresses a solution to the problem of scene estimation of motion video data in the fuzzy set theoretic framework. Using fuzzy image feature extractors, a new algorithm is developed to compute the change of information in each of two successive frames to classify scenes. This classification process of raw input visual data can be used to establish structure for correlation. The algorithm attempts to fulfill the need for nonlinear, frame-accurate access to video data for applications such as video editing and visual document archival/retrieval systems in multimedia environments.
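
    In the same spirit, a fuzzy per-frame feature can be compared across successive frames to score a scene change. The sketch below uses a membership-weighted (soft) brightness histogram as the fuzzy feature; the thesis's actual feature extractors and fuzzy measure differ, so treat this as an illustrative assumption.

```python
import numpy as np

def fuzzy_brightness_histogram(frame, bins=16):
    """Soft histogram: each pixel contributes to its two nearest bins."""
    x = frame.astype(float) / 255.0 * (bins - 1)
    lo = np.floor(x).astype(int)
    w = x - lo                                  # membership in the upper bin
    hist = np.bincount(lo.ravel(), (1 - w).ravel(), minlength=bins) \
         + np.bincount(np.minimum(lo + 1, bins - 1).ravel(), w.ravel(), minlength=bins)
    return hist / hist.sum()

def scene_change_score(prev, curr):
    """Change of information between two frames, in [0, 1]; threshold for a cut."""
    h1, h2 = fuzzy_brightness_histogram(prev), fuzzy_brightness_histogram(curr)
    return 0.5 * float(np.abs(h1 - h2).sum())
```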

  12. Modeling global scene factors in attention

    NASA Astrophysics Data System (ADS)

    Torralba, Antonio

    2003-07-01

    Models of visual attention have focused predominantly on bottom-up approaches that ignored structured contextual and scene information. I propose a model of contextual cueing for attention guidance based on the global scene configuration. It is shown that the statistics of low-level features across the whole image can be used to prime the presence or absence of objects in the scene and to predict their location, scale, and appearance before exploring the image. In this scheme, visual context information can become available early in the visual processing chain, which allows modulation of the saliency of image regions and provides an efficient shortcut for object detection and recognition.
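
    A hedged sketch of what such a global scene feature can look like: oriented gradient energy pooled over a coarse spatial grid, yielding one vector per image that a regressor could map to, e.g., the expected vertical location of a target class. The filters and pooling below are simplified assumptions, not the paper's exact representation.

```python
import numpy as np

def gist_like_features(img, grid=4):
    """img: 2-D grayscale array; returns pooled horizontal/vertical edge energy."""
    gy, gx = np.gradient(img.astype(float))
    H, W = img.shape
    feats = []
    for energy in (np.abs(gx), np.abs(gy)):          # two crude orientation channels
        for i in range(grid):
            for j in range(grid):
                cell = energy[i*H//grid:(i+1)*H//grid, j*W//grid:(j+1)*W//grid]
                feats.append(cell.mean())
    return np.array(feats)  # a 2 * grid * grid global descriptor of the scene
```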

  13. Effects of memory colour on colour constancy for unknown coloured objects

    PubMed Central

    Granzier, Jeroen J M; Gegenfurtner, Karl R

    2012-01-01

    The perception of an object's colour remains constant despite large variations in the chromaticity of the illumination—colour constancy. Hering suggested that memory colours, the typical colours of objects, could help in estimating the illuminant's colour and therefore be an important factor in establishing colour constancy. Here we test whether the presence of objects with diagnostical colours (fruits, vegetables, etc) within a scene influences colour constancy for unknown coloured objects in the scene. Subjects matched one of four Munsell papers placed in a scene illuminated under either a reddish or a greenish lamp with the Munsell book of colour illuminated by a neutral lamp. The Munsell papers were embedded in four different scenes—one scene containing diagnostically coloured objects, one scene containing incongruent coloured objects, a third scene with geometrical objects of the same colour as the diagnostically coloured objects, and one scene containing non-diagnostically coloured objects (eg, a yellow coffee mug). All objects were placed against a black background. Colour constancy was on average significantly higher for the scene containing the diagnostically coloured objects compared with the other scenes tested. We conclude that the colours of familiar objects help in obtaining colour constancy for unknown objects. PMID:23145282

  14. Direct versus indirect processing changes the influence of color in natural scene categorization.

    PubMed

    Otsuka, Sachio; Kawaguchi, Jun

    2009-10-01

    Using a negative priming (NP) paradigm, we examined whether participants would categorize color and grayscale images of natural scenes that were presented peripherally and ignored. We focused on (1) the attentional resources allocated to natural scenes and (2) direct versus indirect processing of them. We set up low and high attention-load conditions, based on the set size of the searched stimuli in the prime display (one or five). Participants were required to detect and categorize the target objects in natural scenes in a central visual search task, ignoring peripheral natural images in both the prime and probe displays. The results showed that, irrespective of attention load, NP was observed for color scenes but not for grayscale scenes. We did not observe any effect of color information in the central visual search, where participants responded directly to natural scenes. These results indicate that, in a situation in which participants indirectly process natural scenes, color information is critical to object categorization, but when the scenes are processed directly, color information does not contribute to categorization.

  15. Illusory Motion Reproduced by Deep Neural Networks Trained for Prediction

    PubMed Central

    Watanabe, Eiji; Kitaoka, Akiyoshi; Sakamoto, Kiwako; Yasugi, Masaki; Tanaka, Kenta

    2018-01-01

    The cerebral cortex predicts visual motion to adapt human behavior to surrounding objects moving in real time. Although the underlying mechanisms are still unknown, predictive coding is one of the leading theories. Predictive coding assumes that the brain's internal models (which are acquired through learning) predict the visual world at all times and that errors between the prediction and the actual sensory input further refine the internal models. In the past year, deep neural networks based on predictive coding were reported for a video prediction machine called PredNet. If the theory substantially reproduces the visual information processing of the cerebral cortex, then PredNet can be expected to represent the human visual perception of motion. In this study, PredNet was trained with natural scene videos of the self-motion of the viewer, and the motion prediction ability of the obtained computer model was verified using unlearned videos. We found that the computer model accurately predicted the magnitude and direction of motion of a rotating propeller in unlearned videos. Surprisingly, it also represented the rotational motion for illusion images that were not moving physically, much like human visual perception. While the trained network accurately reproduced the direction of illusory rotation, it did not detect motion components in negative control pictures wherein people do not perceive illusory motion. This research supports the exciting idea that the mechanism assumed by the predictive coding theory is one of the bases of motion illusion generation. Using sensory illusions as indicators of human perception, deep neural networks are expected to contribute significantly to the development of brain research. PMID:29599739

  16. Illusory Motion Reproduced by Deep Neural Networks Trained for Prediction.

    PubMed

    Watanabe, Eiji; Kitaoka, Akiyoshi; Sakamoto, Kiwako; Yasugi, Masaki; Tanaka, Kenta

    2018-01-01

    The cerebral cortex predicts visual motion to adapt human behavior to surrounding objects moving in real time. Although the underlying mechanisms are still unknown, predictive coding is one of the leading theories. Predictive coding assumes that the brain's internal models (which are acquired through learning) predict the visual world at all times and that errors between the prediction and the actual sensory input further refine the internal models. In the past year, deep neural networks based on predictive coding were reported for a video prediction machine called PredNet. If the theory substantially reproduces the visual information processing of the cerebral cortex, then PredNet can be expected to represent the human visual perception of motion. In this study, PredNet was trained with natural scene videos of the self-motion of the viewer, and the motion prediction ability of the obtained computer model was verified using unlearned videos. We found that the computer model accurately predicted the magnitude and direction of motion of a rotating propeller in unlearned videos. Surprisingly, it also represented the rotational motion for illusion images that were not moving physically, much like human visual perception. While the trained network accurately reproduced the direction of illusory rotation, it did not detect motion components in negative control pictures wherein people do not perceive illusory motion. This research supports the exciting idea that the mechanism assumed by the predictive coding theory is one of the bases of motion illusion generation. Using sensory illusions as indicators of human perception, deep neural networks are expected to contribute significantly to the development of brain research.

  17. Position estimation and driving of an autonomous vehicle by monocular vision

    NASA Astrophysics Data System (ADS)

    Hanan, Jay C.; Kayathi, Pavan; Hughlett, Casey L.

    2007-04-01

    Automatic adaptive tracking in real time for target recognition provided autonomous control of a scale-model electric truck. The two-wheel-drive truck was modified as an autonomous rover test-bed for vision-based guidance and navigation. Methods were implemented to monitor tracking error and ensure a safe, accurate arrival at the intended science target. Some methods are situation independent, relying only on the confidence error of the target recognition algorithm. Other methods take advantage of the scenario of combined motion and tracking to filter out anomalies. In either case, only a single calibrated camera was needed for position estimation. Results from real-time autonomous driving tests on the JPL simulated Mars yard are presented. Recognition error was often situation dependent. For the rover case, the background was in motion and may be characterized to provide visual cues on rover travel, such as rate, pitch, roll, and distance to objects of interest or hazards. Objects in the scene may be used as landmarks, or waypoints, for such estimations. As objects are approached, their scale increases and their orientation may change. In addition, particularly on rough terrain, these orientation and scale changes may be unpredictable. Feature extraction combined with the neural network algorithm was successful in providing visual odometry in the simulated Mars environment.

  18. Augmented reality three-dimensional object visualization and recognition with axially distributed sensing.

    PubMed

    Markman, Adam; Shen, Xin; Hua, Hong; Javidi, Bahram

    2016-01-15

    An augmented reality (AR) smartglass display combines real-world scenes with digital information enabling the rapid growth of AR-based applications. We present an augmented reality-based approach for three-dimensional (3D) optical visualization and object recognition using axially distributed sensing (ADS). For object recognition, the 3D scene is reconstructed, and feature extraction is performed by calculating the histogram of oriented gradients (HOG) of a sliding window. A support vector machine (SVM) is then used for classification. Once an object has been identified, the 3D reconstructed scene with the detected object is optically displayed in the smartglasses allowing the user to see the object, remove partial occlusions of the object, and provide critical information about the object such as 3D coordinates, which are not possible with conventional AR devices. To the best of our knowledge, this is the first report on combining axially distributed sensing with 3D object visualization and recognition for applications to augmented reality. The proposed approach can have benefits for many applications, including medical, military, transportation, and manufacturing.
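
    The recognition stage (HOG over a sliding window, classified by an SVM) follows a standard recipe, sketched below with scikit-image and scikit-learn as stand-ins; the window size, step, and training data are illustrative assumptions, not the authors' configuration.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def train_detector(pos_windows, neg_windows):
    """pos/neg_windows: lists of equally sized grayscale patches."""
    X = [hog(w) for w in pos_windows + neg_windows]
    y = [1] * len(pos_windows) + [0] * len(neg_windows)
    return SVC(kernel='linear').fit(X, y)

def detect(image, clf, win=(64, 64), step=16):
    """Slide a window over the reconstructed scene; return positive locations."""
    hits = []
    H, W = image.shape
    for y in range(0, H - win[0] + 1, step):
        for x in range(0, W - win[1] + 1, step):
            f = hog(image[y:y+win[0], x:x+win[1]])
            if clf.predict([f])[0] == 1:
                hits.append((y, x))
    return hits
```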

  19. Moving through a multiplex holographic scene

    NASA Astrophysics Data System (ADS)

    Mrongovius, Martina

    2013-02-01

    This paper explores how movement can be used as a compositional element in installations of multiplex holograms. My holographic images are created from montages of hand-held video and photo-sequences. These spatially dynamic compositions are visually complex but anchored to landmarks and hints of the capturing process - such as the appearance of the photographer's shadow - to establish a sense of connection to the holographic scene. Moving around in front of the hologram, the viewer animates the holographic scene. A perception of motion then results from the viewer's bodily awareness of physical motion and the visual reading of dynamics within the scene or movement of perspective through a virtual suggestion of space. By linking and transforming the physical motion of the viewer with the visual animation, the viewer's bodily awareness - including proprioception, balance and orientation - play into the holographic composition. How multiplex holography can be a tool for exploring coupled, cross-referenced and transformed perceptions of movement is demonstrated with a number of holographic image installations. Through this process I expanded my creative composition practice to consider how dynamic and spatial scenes can be conveyed through the fragmented view of a multiplex hologram. This body of work was developed through an installation art practice and was the basis of my recently completed doctoral thesis: 'The Emergent Holographic Scene — compositions of movement and affect using multiplex holographic images'.

  20. An HDR imaging method with DTDI technology for push-broom cameras

    NASA Astrophysics Data System (ADS)

    Sun, Wu; Han, Chengshan; Xue, Xucheng; Lv, Hengyi; Shi, Junxia; Hu, Changhong; Li, Xiangzhi; Fu, Yao; Jiang, Xiaonan; Huang, Liang; Han, Hongyin

    2018-03-01

    Conventionally, high dynamic-range (HDR) imaging is based on taking two or more pictures of the same scene with different exposures. However, due to the high-speed relative motion between the camera and the scene, this technique is hard to apply to push-broom remote sensing cameras. For the sake of HDR imaging in push-broom remote sensing applications, the present paper proposes an innovative method which can generate HDR images without redundant image sensors or optical components. Specifically, this paper adopts an area-array CMOS (complementary metal oxide semiconductor) sensor with the digital-domain time-delay-integration (DTDI) technology for imaging, instead of adopting more than one row of image sensors, thereby taking more than one picture with different exposures. A new HDR image can then be achieved by fusing the two original images with a simple algorithm. In our experiment, the dynamic range (DR) of the image increased by 26.02 dB. The proposed method is proved to be effective and has potential in other imaging applications where there is relative motion between the camera and the scene.
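
    To make the dynamic-range arithmetic concrete: DR in decibels is 20·log10 of the usable signal ratio, so a gain of 26.02 dB corresponds to a factor of 20 (20·log10(20) ≈ 26.02). The sketch below shows a deliberately simple two-exposure fusion rule, an assumption for illustration rather than the paper's algorithm.

```python
import numpy as np

def fuse_exposures(short, long_, ratio, sat=0.95 * 255):
    """short, long_: uint8 frames of one scene; ratio: exposure-time ratio long/short."""
    short = short.astype(np.float32) * ratio       # bring short exposure onto long scale
    long_ = long_.astype(np.float32)
    return np.where(long_ >= sat, short, long_)    # use short exposure where long saturates

def dr_gain_db(ratio):
    """DR extension in dB from widening the usable signal range by `ratio`."""
    return 20.0 * np.log10(ratio)                  # dr_gain_db(20) ~= 26.02
```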

  1. Fixations on objects in natural scenes: dissociating importance from salience

    PubMed Central

    't Hart, Bernard M.; Schmidt, Hannah C. E. F.; Roth, Christine; Einhäuser, Wolfgang

    2013-01-01

    The relation of selective attention to understanding of natural scenes has been subject to intense behavioral research and computational modeling, and gaze is often used as a proxy for such attention. The probability of an image region to be fixated typically correlates with its contrast. However, this relation does not imply a causal role of contrast. Rather, contrast may relate to an object's “importance” for a scene, which in turn drives attention. Here we operationalize importance by the probability that an observer names the object as characteristic for a scene. We modify luminance contrast of either a frequently named (“common”/“important”) or a rarely named (“rare”/“unimportant”) object, track the observers' eye movements during scene viewing and ask them to provide keywords describing the scene immediately after. When no object is modified relative to the background, important objects draw more fixations than unimportant ones. Increases of contrast make an object more likely to be fixated, irrespective of whether it was important for the original scene, while decreases in contrast have little effect on fixations. Any contrast modification makes originally unimportant objects more important for the scene. Finally, important objects are fixated more centrally than unimportant objects, irrespective of contrast. Our data suggest a dissociation between object importance (relevance for the scene) and salience (relevance for attention). If an object obeys natural scene statistics, important objects are also salient. However, when natural scene statistics are violated, importance and salience are differentially affected. Object salience is modulated by the expectation about object properties (e.g., formed by context or gist), and importance by the violation of such expectations. In addition, the dependence of fixated locations within an object on the object's importance suggests an analogy to the effects of word frequency on landing positions in reading. PMID:23882251

  2. Experiments in MPEG-4 content authoring, browsing, and streaming

    NASA Astrophysics Data System (ADS)

    Puri, Atul; Schmidt, Robert L.; Basso, Andrea; Civanlar, Mehmet R.

    2000-12-01

    In this paper, within the context of the MPEG-4 standard, we report on preliminary experiments in three areas: authoring of MPEG-4 content, a player/browser for MPEG-4 content, and streaming of MPEG-4 content. MPEG-4 is a new standard for the coding of audiovisual objects; the core of the MPEG-4 standard is complete, while amendments are in various stages of completion. MPEG-4 addresses compression of audio and visual objects, their integration by scene description, and interactivity of users with such objects. The MPEG-4 scene description is based on a VRML-like language for 3D scenes, extended to 2D scenes, and supports integration of 2D and 3D scenes. This scene description language is called BIFS. First, we introduce the basic concepts behind BIFS and then show, with an example, textual authoring of the different components needed to describe an audiovisual scene in BIFS; the textual BIFS is then saved as compressed binary file/s for storage or transmission. Then, we discuss a high-level design of an MPEG-4 player/browser that uses the main components from authoring, such as the encoded BIFS stream, the media files it refers to, and the multiplexed object descriptor stream, to play an MPEG-4 scene. We also discuss our extensions to such a player/browser. Finally, we present our work on streaming of MPEG-4: the payload format, modifications to the client MPEG-4 player/browser, the server-side infrastructure, and example content used in our MPEG-4 streaming experiments.

  3. Real-time full-motion color Flash lidar for target detection and identification

    NASA Astrophysics Data System (ADS)

    Nelson, Roy; Coppock, Eric; Craig, Rex; Craner, Jeremy; Nicks, Dennis; von Niederhausern, Kurt

    2015-05-01

    Greatly improved understanding of areas and objects of interest can be gained when real time, full-motion Flash LiDAR is fused with inertial navigation data and multi-spectral context imagery. On its own, full-motion Flash LiDAR provides the opportunity to exploit the z dimension for improved intelligence vs. 2-D full-motion video (FMV). The intelligence value of this data is enhanced when it is combined with inertial navigation data to produce an extended, georegistered data set suitable for a variety of analysis. Further, when fused with multispectral context imagery the typical point cloud now becomes a rich 3-D scene which is intuitively obvious to the user and allows rapid cognitive analysis with little or no training. Ball Aerospace has developed and demonstrated a real-time, full-motion LIDAR system that fuses context imagery (VIS to MWIR demonstrated) and inertial navigation data in real time, and can stream these information-rich geolocated/fused 3-D scenes from an airborne platform. In addition, since the higher-resolution context camera is boresighted and frame synchronized to the LiDAR camera and the LiDAR camera is an array sensor, techniques have been developed to rapidly interpolate the LIDAR pixel values creating a point cloud that has the same resolution as the context camera, effectively creating a high definition (HD) LiDAR image. This paper presents a design overview of the Ball TotalSight™ LIDAR system along with typical results over urban and rural areas collected from both rotary and fixed-wing aircraft. We conclude with a discussion of future work.
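
    The "HD LiDAR" step, interpolating the LiDAR array up to the boresighted, frame-synchronized context camera's resolution, can be sketched with a generic scattered-data interpolator; scipy's griddata below is a stand-in assumption for the system's actual method, and border pixels outside the LiDAR footprint come back as NaN.

```python
import numpy as np
from scipy.interpolate import griddata

def upsample_depth(depth_lr, hd_shape):
    """depth_lr: (h, w) LiDAR depths; hd_shape: (H, W) context camera resolution."""
    h, w = depth_lr.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.column_stack([ys.ravel() * hd_shape[0] / h,    # map LiDAR pixels to
                           xs.ravel() * hd_shape[1] / w])   # HD pixel coordinates
    grid_y, grid_x = np.mgrid[0:hd_shape[0], 0:hd_shape[1]]
    return griddata(pts, depth_lr.ravel(), (grid_y, grid_x), method='linear')
```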

  4. The Shuttle Mission Simulator computer generated imagery

    NASA Technical Reports Server (NTRS)

    Henderson, T. H.

    1984-01-01

    Equipment available in the primary training facility for the Space Transportation System (STS) flight crews includes the Fixed Base Simulator, the Motion Base Simulator, the Spacelab Simulator, and the Guidance and Navigation Simulator. The Shuttle Mission Simulator (SMS) consists of the Fixed Base Simulator and the Motion Base Simulator. The SMS utilizes four visual Computer Generated Image (CGI) systems. The Motion Base Simulator has a forward crew station with six-degrees of freedom motion simulation. Operation of the Spacelab Simulator is planned for the spring of 1983. The Guidance and Navigation Simulator went into operation in 1982. Aspects of orbital visual simulation are discussed, taking into account the earth scene, payload simulation, the generation and display of 1079 stars, the simulation of sun glare, and Reaction Control System jet firing plumes. Attention is also given to landing site visual simulation, and night launch and landing simulation.

  5. Fixation and saliency during search of natural scenes: the case of visual agnosia.

    PubMed

    Foulsham, Tom; Barton, Jason J S; Kingstone, Alan; Dewhurst, Richard; Underwood, Geoffrey

    2009-07-01

    Models of eye movement control in natural scenes often distinguish between stimulus-driven processes (which guide the eyes to visually salient regions) and those based on task and object knowledge (which depend on expectations or identification of objects and scene gist). In the present investigation, the eye movements of a patient with visual agnosia were recorded while she searched for objects within photographs of natural scenes and compared to those made by students and age-matched controls. Agnosia is assumed to disrupt the top-down knowledge available in this task, and so may increase the reliance on bottom-up cues. The patient's deficit in object recognition was seen in poor search performance and inefficient scanning. The low-level saliency of target objects had an effect on responses in visual agnosia, and the most salient region in the scene was more likely to be fixated by the patient than by controls. An analysis of model-predicted saliency at fixation locations indicated a closer match between fixations and low-level saliency in agnosia than in controls. These findings are discussed in relation to saliency-map models and the balance between high and low-level factors in eye guidance.

  6. Using an Improved SIFT Algorithm and Fuzzy Closed-Loop Control Strategy for Object Recognition in Cluttered Scenes

    PubMed Central

    Nie, Haitao; Long, Kehui; Ma, Jun; Yue, Dan; Liu, Jinguo

    2015-01-01

    Partial occlusions, large pose variations, and extreme ambient illumination conditions generally cause the performance degradation of object recognition systems. Therefore, this paper presents a novel approach for fast and robust object recognition in cluttered scenes, based on an improved scale-invariant feature transform (SIFT) algorithm and a fuzzy closed-loop control method. First, a fast SIFT algorithm is proposed by classifying SIFT features into several clusters based on several attributes computed from the sub-orientation histogram (SOH); in the feature matching phase, only features that share nearly the same corresponding attributes are compared. Second, a feature matching step is performed following a prioritized order based on the scale factor, which is calculated between the object image and the target object image, guaranteeing robust feature matching. Finally, a fuzzy closed-loop control strategy is applied to increase the accuracy of the object recognition, which is essential for the autonomous object manipulation process. Compared to the original SIFT algorithm for object recognition, the results of the proposed method show that the number of SIFT features extracted from an object increases significantly, and the computing speed of the object recognition process increases by more than 40%. The experimental results confirmed that the proposed method performs effectively and accurately in cluttered scenes. PMID:25714094
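
    The attribute-gated matching idea can be illustrated as below: descriptors are bucketed by a cheap attribute (here, the dominant bin of the descriptor's pooled 8-bin orientation histogram, a crude stand-in for the paper's SOH attributes), and candidate matches are drawn only from the same bucket, followed by a Lowe-style ratio test. Names and thresholds are illustrative assumptions.

```python
import numpy as np

def bucket_key(desc):
    """desc: 128-D SIFT descriptor (4x4 cells x 8 orientations)."""
    return int(np.argmax(desc.reshape(16, 8).sum(axis=0)))  # dominant orientation bin

def gated_match(descs_a, descs_b, thresh=0.7):
    buckets = {}
    for j, d in enumerate(descs_b):
        buckets.setdefault(bucket_key(d), []).append(j)
    matches = []
    for i, d in enumerate(descs_a):
        cand = buckets.get(bucket_key(d), [])       # compare within one bucket only
        if not cand:
            continue
        dists = [np.linalg.norm(d - descs_b[j]) for j in cand]
        order = np.argsort(dists)
        if len(cand) == 1 or dists[order[0]] < thresh * dists[order[1]]:
            matches.append((i, cand[order[0]]))     # ratio test against 2nd-best
    return matches
```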

  7. Observing human movements helps decoding environmental forces.

    PubMed

    Zago, Myrka; La Scaleia, Barbara; Miller, William L; Lacquaniti, Francesco

    2011-11-01

    Vision of human actions can affect several features of visual motion processing, as well as the motor responses of the observer. Here, we tested the hypothesis that action observation helps in decoding environmental forces during the interception of a decelerating target within a brief time window, an intrinsically very difficult task. We employed a factorial design to evaluate the effects of scene orientation (normal or inverted) and target gravity (normal or inverted). A button press triggered the motion of a bullet, a piston, or a human arm. We found that timing errors were smaller for upright scenes irrespective of gravity direction in the Bullet group, while errors were smaller for the standard condition of normal scene and gravity in the Piston group. In the Arm group, instead, performance was better when the directions of scene and target gravity were concordant, irrespective of whether both were upright or inverted. These results suggest that a default viewer-centered reference frame is used with inanimate scenes, such as those of the Bullet and Piston protocols. Instead, the presence of biological movements in animate scenes (as in the Arm protocol) may help the processing of target kinematics under the ecological condition of coherence between scene and target gravity directions.

  8. A Gaussian Process Based Multi-Person Interaction Model

    NASA Astrophysics Data System (ADS)

    Klinger, T.; Rottensteiner, F.; Heipke, C.

    2016-06-01

    Online multi-person tracking in image sequences is commonly guided by recursive filters, whose predictive models define the expected positions of future states. When a predictive model deviates too much from the true motion of a pedestrian, which is often the case in crowded scenes due to unpredicted accelerations, the data association is prone to fail. In this paper we propose a novel predictive model on the basis of Gaussian Process Regression. The model takes into account the motion of every tracked pedestrian in the scene and the prediction is executed with respect to the velocities of all interrelated persons. As shown by the experiments, the model is capable of yielding more plausible predictions even in the presence of mutual occlusions or missing measurements. The approach is evaluated on a publicly available benchmark and outperforms other state-of-the-art trackers.
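
    As a rough illustration of such a predictive model, the sketch below regresses a pedestrian's next-frame velocity on the current velocities of nearby people using scikit-learn's Gaussian Process implementation. The feature layout (three neighbours, stacked velocity components) and the kernel choice are assumptions made for this example, not the authors' design.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy training data: rows = past frames; features = stacked (vx, vy) of the
# 3 nearest neighbours; targets = the tracked person's own next velocity.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 6))
y_train = rng.normal(size=(50, 2))

gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(0.1))
gp.fit(X_train, y_train)

x_now = rng.normal(size=(1, 6))              # current neighbour velocities
v_pred, v_std = gp.predict(x_now, return_std=True)
# v_std quantifies prediction uncertainty, which a tracker can use to widen
# the data-association gate when the interaction context is ambiguous.
predicted_position = np.array([10.0, 4.0]) + v_pred[0]
```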

  9. Selective attention during scene perception: evidence from negative priming.

    PubMed

    Gordon, Robert D

    2006-10-01

    In two experiments, we examined the role of semantic scene content in guiding attention during scene viewing. In each experiment, performance on a lexical decision task was measured following the brief presentation of a scene. The lexical decision stimulus named an object that was either present or not present in the scene. The results of Experiment 1 revealed no priming from inconsistent objects (whose identities conflicted with the scene in which they appeared), but negative priming from consistent objects. The results of Experiment 2 indicated that negative priming from consistent objects occurs only when inconsistent objects are present in the scenes. Together, the results suggest that observers are likely to attend to inconsistent objects, and that representations of consistent objects are suppressed in the presence of an inconsistent object. Furthermore, the data suggest that inconsistent objects draw attention because they are relatively difficult to identify in an inappropriate context.

  10. Knowledge-Based Object Detection in Laser Scanning Point Clouds

    NASA Astrophysics Data System (ADS)

    Boochs, F.; Karmacharya, A.; Marbs, A.

    2012-07-01

    Object identification and object processing in 3D point clouds have always posed challenges in terms of effectiveness and efficiency. In practice, this process is highly dependent on human interpretation of the scene represented by the point cloud data, as well as on the set of modeling tools available. Such modeling algorithms are data-driven and concentrate on specific features of the objects that are accessible to numerical models. We present an approach that brings to the machine the human expert knowledge about the scene, the objects inside it, their representation by the data, and the behavior of algorithms. This "understanding" enables the machine to assist human interpretation of the scene inside the point cloud. Furthermore, it allows the machine to understand the possibilities and limitations of algorithms and to take these into account within the processing chain. This not only assists researchers in defining optimal processing steps, but also provides suggestions when certain changes or new details emerge from the point cloud. Our approach benefits from advances in knowledge technologies within the Semantic Web framework, which have provided a strong base for applications based on knowledge management. In the article we present the knowledge technologies used in our approach, such as the Web Ontology Language (OWL) for formulating the knowledge base, and the Semantic Web Rule Language (SWRL) with 3D-processing and topological built-ins, aiming to combine geometrical analysis of 3D point clouds with specialists' knowledge of the scene and of algorithmic processing.

  11. Space flight visual simulation.

    PubMed

    Xu, L

    1985-01-01

    In this paper, based on the scenes of stars seen by astronauts in orbital flight, we study the mathematical model that must be constructed for a CGI system to realize space flight visual simulation. Considering such factors as the revolution and rotation of the Earth, the exact date, time, and site of orbital injection of the spacecraft, as well as its orbital flight and attitude motion, we first define all the instantaneous lines of sight and visual fields of astronauts in space. Then, through a series of coordinate transforms, the pictures of the star scene changing over time and space are generated mathematically one by one. In the procedure, we designed a three-stage "mathematical cutting" method. Finally, we obtain each instantaneous picture of the star scene observed by astronauts through the cockpit window. The varying pictures can also display the dynamic occlusion of stars by the Earth.
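
    The core of such a simulation, stripped of the full transform chain, is a rotation of catalog star directions into the instantaneous camera frame followed by a perspective projection. The sketch below collapses Earth rotation, orbital motion, and spacecraft attitude into a single assumed rotation matrix `R_cam` for illustration.

```python
import numpy as np

def star_unit_vector(ra_deg, dec_deg):
    """Inertial unit vector of a catalog star from right ascension/declination."""
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    return np.array([np.cos(dec) * np.cos(ra),
                     np.cos(dec) * np.sin(ra),
                     np.sin(dec)])

def visible_stars(catalog, R_cam, half_fov_deg=20.0):
    """catalog: iterable of (ra, dec) in degrees; R_cam: 3x3 inertial-to-camera
    rotation with +z along the boresight. Returns image-plane coordinates."""
    cos_limit = np.cos(np.radians(half_fov_deg))
    picture = []
    for ra, dec in catalog:
        v = R_cam @ star_unit_vector(ra, dec)
        if v[2] > cos_limit:                        # inside the circular FOV
            picture.append((v[0] / v[2], v[1] / v[2]))  # pinhole projection
    return picture

frame = visible_stars([(10.0, 5.0), (200.0, -30.0)], np.eye(3))
```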

  12. Method for detecting surface motions and mapping small terrestrial or planetary surface deformations with synthetic aperture radar

    NASA Technical Reports Server (NTRS)

    Gabriel, Andrew K. (Inventor); Goldstein, Richard M. (Inventor); Zebker, Howard A. (Inventor)

    1990-01-01

    A technique based on synthetic aperture radar (SAR) interferometry is used to measure very small (1 cm or less) surface deformations with good resolution (10 m) over large areas (50 km). It can be used for accurate measurements of many geophysical phenomena, including swelling and buckling in fault zones; residual, vertical, and lateral displacements from seismic events; and prevolcanic swelling. Two SAR images of a scene are made by two spaced antennas and a difference interferogram of the scene is formed. After unwrapping the phases of the pixels of the difference interferogram, surface motion or deformation changes of the surface can be observed. A second interferogram of the same scene is made from a different pair of images, at least one of which is made after some elapsed time. The second interferogram is then compared with the first to detect changes in the line-of-sight position of pixels. By resolving line-of-sight observations into their vector components in other sets of interferograms along at least one other direction, lateral motions may be recovered in their entirety. Since the SAR images are in general made from separated flight tracks, it is not possible to distinguish surface changes from the parallax caused by topography; however, a third image may be used to remove the topography and leave only the surface changes.
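
    The phase arithmetic behind the method is compact enough to sketch. Assuming co-registered single-look complex images as NumPy arrays, an interferogram is the phase of the conjugate product, and differencing two interferograms of the same scene cancels the common topographic phase, leaving (after unwrapping) a phase that tracks line-of-sight displacement.

```python
import numpy as np

def interferogram(s1, s2):
    """Interferometric phase of two co-registered complex SAR images,
    wrapped to (-pi, pi]."""
    return np.angle(s1 * np.conj(s2))

def motion_phase(igram_before, igram_after):
    """Difference interferogram: topography is common to both acquisitions
    and cancels; the rewrapped residual maps to surface deformation at one
    fringe per half-wavelength of line-of-sight motion."""
    return np.angle(np.exp(1j * (igram_after - igram_before)))
```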

  13. Parietal cortex integrates contextual and saliency signals during the encoding of natural scenes in working memory.

    PubMed

    Santangelo, Valerio; Di Francesco, Simona Arianna; Mastroberardino, Serena; Macaluso, Emiliano

    2015-12-01

    The brief presentation of a complex scene entails that only a few objects can be selected, processed in depth, and stored in memory. Both low-level sensory salience and high-level context-related factors (e.g., the conceptual match/mismatch between objects and scene context) contribute to this selection process, but how the interplay between these factors affects memory encoding is largely unexplored. Here, during fMRI we presented participants with pictures of everyday scenes. After a short retention interval, participants judged the position of a target object extracted from the initial scene. The target object could be either congruent or incongruent with the context of the scene, and could be located in a region of the image with maximal or minimal salience. Behaviourally, we found a reduced impact of saliency on visuospatial working memory performance when the target was out of context. Encoding-related fMRI results showed that context-congruent targets activated dorsoparietal regions, while context-incongruent targets de-activated the ventroparietal cortex. Saliency modulated activity in both dorsal and ventral regions, with larger context-related effects for salient targets. These findings demonstrate the joint contribution of knowledge-based and saliency-driven attention to memory encoding, highlighting a dissociation between dorsal and ventral parietal regions. © 2015 Wiley Periodicals, Inc.

  14. Modeling heading and path perception from optic flow in the case of independently moving objects

    PubMed Central

    Raudies, Florian; Neumann, Heiko

    2013-01-01

    Humans are usually accurate when estimating heading or path from optic flow, even in the presence of independently moving objects (IMOs) in an otherwise rigid scene. To invoke significant biases in perceived heading, IMOs have to be large and obscure the focus of expansion (FOE) in the image plane, which is the point of approach. For the estimation of path during curvilinear self-motion no significant biases were found in the presence of IMOs. What makes humans robust in their estimation of heading or path using optic flow? We derive analytical models of optic flow for linear and curvilinear self-motion using geometric scene models. Heading biases of a linear least squares method, which builds upon these analytical models, are large, larger than those reported for humans. This motivated us to study segmentation cues that are available from optic flow. We derive models of accretion/deletion, expansion/contraction, acceleration/deceleration, local spatial curvature, and local temporal curvature, to be used as cues to segment an IMO from the background. Integrating these segmentation cues into our method of estimating heading or path now explains human psychophysical data and extends, as well as unifies, previous investigations. Our analysis suggests that various cues available from optic flow help to segment IMOs and, thus, make humans' heading and path perception robust in the presence of such IMOs. PMID:23554589
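
    The baseline estimator the authors start from can be written in a few lines. Under pure translation, every flow vector u at pixel p points away from the focus of expansion f, so (p - f) x u = 0 yields one linear equation per sample; the sketch below solves the stacked system, and it is exactly this fit that IMO samples bias unless segmentation cues remove them first.

```python
import numpy as np

def estimate_foe(points, flows):
    """Linear least-squares focus-of-expansion estimate.
    points, flows: (N, 2) arrays of pixel positions and flow vectors.
    Each sample contributes  u_y*f_x - u_x*f_y = u_y*p_x - u_x*p_y."""
    A = np.stack([flows[:, 1], -flows[:, 0]], axis=1)
    b = flows[:, 1] * points[:, 0] - flows[:, 0] * points[:, 1]
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe

# Segmentation cues (accretion/deletion, acceleration, local curvature) would
# be used to drop samples belonging to an IMO before calling estimate_foe.
```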

  15. The Interplay Between Strategic And Adaptive Control Mechanisms In Plastic Recalibration Of Locomotor Function

    NASA Technical Reports Server (NTRS)

    Richards, J. T.; Mulavara, A. P.; Bloomberg, J. J.

    2006-01-01

    We have previously shown that viewing simulated rotary self-motion during treadmill locomotion causes immediate strategic modifications (Richards et al. 2004) as well as an after-effect reflecting adaptive modification of the control of position and trajectory during over-ground locomotion (Mulavara et al. 2005). The process of sensorimotor adaptation comprises both strategic and adaptive control mechanisms. Strategic control involves cognitive, on-line corrections to limb movements once one is aware of a sensory discordance. Over an extended period of exposure to the sensory discordance, new strategic sensorimotor coordination patterns are reinforced until they become more automatic, and therefore adaptive, in nature. The objective of this study was to investigate how strategic changes in trunk control during exposure to simulated rotary self-motion during treadmill walking influence adaptive modification of locomotor heading direction during over-ground stepping. Subjects (n = 10) walked on a motorized linear treadmill while viewing a wide field-of-view virtual scene for 24 minutes. The scene was static for the first 4 minutes and then, for the last 20 minutes, depicted constant-rate self-motion equivalent to walking in a counter-clockwise, circular path around the perimeter of a room. Subjects performed five stepping trials both before and after the exposure period to assess after-effects. Results from our previous study showed a significant change in heading direction (HD) during post-exposure step tests that was opposite the direction in which the scene rotated during the adaptation period. For the present study, we quantified strategic modifications in trunk movement control during scene exposure using the normalized root-mean-square variation R(sub P) of the subject's 3D trunk positions and the normalized sum of standard deviations R(sub O) of 3D trunk orientations during scene rotation relative to that during static scene presentation. Associated 95% confidence intervals, CI(sub P) and CI(sub O), were calculated to investigate the variation of strategic modifications during scene exposure. Repeated-measures ANOVA and individual-subject regression analyses showed that R(sub P) and R(sub O) (i.e., strategic modifications) for trunk fore/aft (X) and yaw movements, respectively, decreased significantly over the exposure period. Furthermore, we found a significant correlation between the magnitude of change in HD and the rate at which the variation of strategic modifications in trunk X decreased. We also found evidence of a correlation between HD and the rate at which strategic modifications in trunk yaw decreased (p = .06). We infer that adaptive recalibration of locomotor trajectory using optic flow stimuli depends on the rate at which strategic intervention is reduced.

  16. A knowledge-based object recognition system for applications in the space station

    NASA Technical Reports Server (NTRS)

    Dhawan, Atam P.

    1988-01-01

    A knowledge-based three-dimensional (3D) object recognition system is being developed. The system uses primitive-based hierarchical relational and structural matching for the recognition of 3D objects in the two-dimensional (2D) image for interpretation of the 3D scene. At present, the pre-processing, low-level preliminary segmentation, rule-based segmentation, and feature extraction are completed. The data structure of the primitive viewing knowledge-base (PVKB) is also completed. Algorithms and programs based on attribute-tree matching for decomposing the segmented data into valid primitives were developed. Frame-based structural and relational descriptions of some objects were created and stored in a knowledge-base. This knowledge-base of frame-based descriptions was developed on the MICROVAX-AI microcomputer in a LISP environment. Simulated 3D scenes of simple non-overlapping objects, as well as real camera data of images of 3D objects of low complexity, have been successfully interpreted.

  17. Exocentric direction judgements in computer-generated displays and actual scenes

    NASA Technical Reports Server (NTRS)

    Ellis, Stephen R.; Smith, Stephen; Mcgreevy, Michael W.; Grunwald, Arthur J.

    1989-01-01

    One of the most remarkable perceptual properties of common experience is that the perceived shapes of known objects are constant despite movements about them which transform their projections on the retina. This perceptual ability is one aspect of shape constancy (Thouless, 1931; Metzger, 1953; Borresen and Lichte, 1962). It requires that the viewer be able to sense and discount his or her relative position and orientation with respect to a viewed object. This discounting of relative position may be derived directly from the ranging information provided by stereopsis, from motion parallax, from vestibularly sensed rotation and translation, or from corollary information associated with voluntary movement. It is argued that: (1) errors in exocentric judgements of the azimuth of a target generated on an electronic perspective display are not viewpoint-independent, but are influenced by the specific geometry of their perspective projection; (2) elimination of binocular conflict by replacing electronic displays with actual scenes eliminates a previously reported equidistance tendency in azimuth error, but the viewpoint dependence remains; (3) the pattern of exocentrically judged azimuth error in real scenes viewed with a viewing direction depressed 22 deg and rotated ±22 deg with respect to a reference direction could not be explained by overestimation of the depression angle, i.e., a slant overestimation.

  18. Piecewise-Planar StereoScan: Sequential Structure and Motion using Plane Primitives.

    PubMed

    Raposo, Carolina; Antunes, Michel; Barreto, Joao P.

    2017-08-09

    The article describes a pipeline that receives as input a sequence of stereo images, and outputs the camera motion and a Piecewise-Planar Reconstruction (PPR) of the scene. The pipeline, named Piecewise-Planar StereoScan (PPSS), works as follows: the planes in the scene are detected for each stereo view using semi-dense depth estimation; the relative pose is computed by a new closed-form minimal algorithm that only uses point correspondences whenever plane detections do not fully constrain the motion; the camera motion and the PPR are jointly refined by alternating between discrete optimization and continuous bundle adjustment; and, finally, the detected 3D planes are segmented in images using a new framework that handles low texture and visibility issues. PPSS is extensively validated in indoor and outdoor datasets, and benchmarked against two popular point-based SfM pipelines. The experiments confirm that plane-based visual odometry is resilient to situations of small image overlap, poor texture, specularity, and perceptual aliasing where the fast LIBVISO2 pipeline fails. The comparison against VisualSfM+CMVS/PMVS shows that, for a similar computational complexity, PPSS is more accurate and provides much more compelling and visually pleasant 3D models. These results strongly suggest that plane primitives are an advantageous alternative to point correspondences for applications of SfM and 3D reconstruction in man-made environments.

  19. Runway Texture and Grid Pattern Effects on Rate-of-Descent Perception

    NASA Technical Reports Server (NTRS)

    Schroeder, J. A.; Dearing, M. G.; Sweet, B. T.; Kaiser, M. K.; Rutkowski, Mike (Technical Monitor)

    2001-01-01

    To date, perceptual errors persist in determining descent rate from a computer-generated image in flight simulation. Pilots tend to touch down twice as hard in simulation as in flight, and more training time is needed in simulation before reaching steady-state performance. Barnes suggested that recognition of range may be the culprit, citing problems such as collimated objects, binocular vision, and poor resolution that lead to poor estimation of the velocity vector. Brown's study essentially ruled out the lack of binocular vision as the problem. Dorfel added specificity by showing that pilots underestimated range in simulated scenes by 50% when 800 ft from the runway threshold. Palmer and Petitt showed that pilots are able to distinguish between a 1.7 ft/sec and a 2.9 ft/sec sink rate when passively observing sink rates in a night scene. Platform motion also plays a role: previous research has shown that the addition of substantial platform motion improves pilot estimates of vertical velocity and results in simulated touchdown rates more closely resembling flight. This experiment examined how specific variations in visual scene properties affect a pilot's perception of sink rate. It extended another experiment that focused on the visual and motion cues necessary for helicopter autorotations. In that experiment, pilots performed steep approaches to a runway. The visual content of the runway and its surroundings varied in two ways: texture and rectangular grid spacing. Four textures, including a no-texture case, were evaluated, along with three grid spacings, including a no-grid case. The results showed that pilots controlled their vertical descent rates better when good texture cues were present; no significant differences were found for the grid manipulation. Using those visual scenes, a simple psychophysics experiment was then performed to determine whether the variations in the visual scenes allowed pilots to better perceive vertical velocity. Pilots passively viewed a particular visual scene in which the vehicle descended at two different rates and had to select which of the two rates they thought was faster. The difference between the two rates was adjusted with a staircase method, depending on whether or not the pilot was correct, until a minimum threshold between the two descent rates was reached. This process was repeated for all of the visual scenes to decide whether the visual scenes allowed pilots to perceive vertical velocity better. All of the data have yet to be analyzed; however, neither the grid nor the texture effects have revealed any statistically significant trends. On further examination of the staircase method employed, the lack of an evident trend may be due to the exit criterion used during the study. As such, the experiment will be repeated with an improved exit criterion in February. Results of this study will be presented in the submitted paper.

  20. Automatic facial animation parameters extraction in MPEG-4 visual communication

    NASA Astrophysics Data System (ADS)

    Yang, Chenggen; Gong, Wanwei; Yu, Lu

    2002-01-01

    Facial Animation Parameters (FAPs) are defined in MPEG-4 to animate a facial object. The algorithm proposed in this paper to extract these FAPs is applied to very low bit-rate video communication, in which the scene is composed of a head-and-shoulders object with a complex background. The paper addresses an algorithm that automatically extracts all FAPs needed to animate a generic facial model and estimates the 3D head motion from corresponding points. The proposed algorithm extracts the human facial region by color segmentation and intra-frame and inter-frame edge detection. Facial structure and the edge distribution of facial features, such as vertical and horizontal gradient histograms, are used to locate the facial feature regions. Parabola and circle deformable templates are employed to fit facial features and extract a subset of the FAPs. A special data structure is proposed to describe the deformable templates, reducing the time needed to compute the energy functions. The remaining FAPs, the 3D rigid head motion vectors, are estimated by a corresponding-points method. A 3D head wire-frame model provides facial semantic information for the selection of proper corresponding points, which helps to increase the accuracy of 3D rigid object motion estimation.
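
    As a simplified stand-in for the energy-minimizing parabola template, the sketch below fits a parabola to already-extracted edge points of a feature such as an eyebrow by least squares; the toy points and the reduction of template fitting to a polynomial fit are assumptions made purely for illustration.

```python
import numpy as np

# Toy edge points of an eyebrow-like feature (assumed already extracted by
# the gradient-histogram localization step).
edge_x = np.array([10.0, 14.0, 18.0, 22.0, 26.0, 30.0])
edge_y = np.array([40.0, 36.0, 34.0, 34.0, 37.0, 41.0])

# Least-squares parabola y = a*x^2 + b*x + c as a cheap template fit.
a, b, c = np.polyfit(edge_x, edge_y, deg=2)

apex_x = -b / (2.0 * a)                  # template parameters (curvature, apex)
apex_y = np.polyval([a, b, c], apex_x)   # would feed the FAP computation
```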

  1. A unified framework for building high performance DVEs

    NASA Astrophysics Data System (ADS)

    Lei, Kaibin; Ma, Zhixia; Xiong, Hua

    2011-10-01

    A unified framework for integrating PC cluster based parallel rendering with distributed virtual environments (DVEs) is presented in this paper. While various scene graphs have been proposed in DVEs, it is difficult to enable collaboration of different scene graphs. This paper proposes a technique for non-distributed scene graphs with the capability of object and event distribution. With the increase of graphics data, DVEs require more powerful rendering ability. But general scene graphs are inefficient in parallel rendering. The paper also proposes a technique to connect a DVE and a PC cluster based parallel rendering environment. A distributed multi-player video game is developed to show the interaction of different scene graphs and the parallel rendering performance on a large tiled display wall.

  2. The Neural Dynamics of Attentional Selection in Natural Scenes.

    PubMed

    Kaiser, Daniel; Oosterhof, Nikolaas N; Peelen, Marius V

    2016-10-12

    The human visual system can only represent a small subset of the many objects present in cluttered scenes at any given time, such that objects compete for representation. Despite these processing limitations, the detection of object categories in cluttered natural scenes is remarkably rapid. How does the brain efficiently select goal-relevant objects from cluttered scenes? In the present study, we used multivariate decoding of magnetoencephalography (MEG) data to track the neural representation of within-scene objects as a function of top-down attentional set. Participants detected categorical targets (cars or people) in natural scenes. The presence of these categories within a scene was decoded from MEG sensor patterns by training linear classifiers on differentiating cars and people in isolation and testing these classifiers on scenes containing one of the two categories. The presence of a specific category in a scene could be reliably decoded from MEG response patterns as early as 160 ms, despite substantial scene clutter and variation in the visual appearance of each category. Strikingly, we find that these early categorical representations fully depend on the match between visual input and top-down attentional set: only objects that matched the current attentional set were processed to the category level within the first 200 ms after scene onset. A sensor-space searchlight analysis revealed that this early attention bias was localized to lateral occipitotemporal cortex, reflecting top-down modulation of visual processing. These results show that attention quickly resolves competition between objects in cluttered natural scenes, allowing for the rapid neural representation of goal-relevant objects. Efficient attentional selection is crucial in many everyday situations. For example, when driving a car, we need to quickly detect obstacles, such as pedestrians crossing the street, while ignoring irrelevant objects. How can humans efficiently perform such tasks, given the multitude of objects contained in real-world scenes? Here we used multivariate decoding of magnetoencephalography data to characterize the neural underpinnings of attentional selection in natural scenes with high temporal precision. We show that brain activity quickly tracks the presence of objects in scenes, but crucially only for those objects that were immediately relevant for the participant. These results provide evidence for fast and efficient attentional selection that mediates the rapid detection of goal-relevant objects in real-world environments. Copyright © 2016 the authors.
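
    The cross-decoding logic (train on isolated exemplars, test on cluttered scenes) is easy to sketch. The toy arrays below stand in for MEG sensor patterns at one time point, and the use of logistic regression is an assumption; the study's point is the generalization step, not the particular linear classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_sensors = 300

# Training set: sensor patterns evoked by isolated cars (0) and people (1).
X_isolated = rng.normal(size=(200, n_sensors))
y_isolated = np.repeat([0, 1], 100)

clf = LogisticRegression(max_iter=1000).fit(X_isolated, y_isolated)

# Test set: patterns evoked by cluttered scenes containing one category.
X_scenes = rng.normal(size=(80, n_sensors))
predicted_category = clf.predict(X_scenes)
# Above-chance accuracy here would indicate that the isolated-object code
# re-activates when the category appears within a scene; repeating the fit
# at each time point traces when (e.g., ~160 ms) that code emerges.
```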

  3. Guidance of attention to objects and locations by long-term memory of natural scenes.

    PubMed

    Becker, Mark W; Rasmussen, Ian P

    2008-11-01

    Four flicker change-detection experiments demonstrate that scene-specific long-term memory guides attention to both behaviorally relevant locations and objects within a familiar scene. Participants performed an initial block of change-detection trials, detecting the addition of an object to a natural scene. After a 30-min delay, participants performed an unanticipated 2nd block of trials. When the same scene occurred in the 2nd block, the change within the scene was (a) identical to the original change, (b) a new object appearing in the original change location, (c) the same object appearing in a new location, or (d) a new object appearing in a new location. Results suggest that attention is rapidly allocated to previously relevant locations and then to previously relevant objects. This pattern of locations dominating objects remained when object identity information was made more salient. Eye tracking verified that scene memory results in more direct scan paths to previously relevant locations and objects. This contextual guidance suggests that a high-capacity long-term memory for scenes is used to ensure that limited attentional capacity is allocated efficiently rather than being squandered.

  4. Learning Relative Motion Concepts in Immersive and Non-immersive Virtual Environments

    NASA Astrophysics Data System (ADS)

    Kozhevnikov, Michael; Gurlitt, Johannes; Kozhevnikov, Maria

    2013-12-01

    The focus of the current study is to understand which unique features of an immersive virtual reality environment have the potential to improve learning relative motion concepts. Thirty-seven undergraduate students learned relative motion concepts using computer simulation either in immersive virtual environment (IVE) or non-immersive desktop virtual environment (DVE) conditions. Our results show that after the simulation activities, both IVE and DVE groups exhibited a significant shift toward a scientific understanding in their conceptual models and epistemological beliefs about the nature of relative motion, and also a significant improvement on relative motion problem-solving tests. In addition, we analyzed students' performance on one-dimensional and two-dimensional questions in the relative motion problem-solving test separately and found that after training in the simulation, the IVE group performed significantly better than the DVE group on solving two-dimensional relative motion problems. We suggest that egocentric encoding of the scene in IVE (where the learner constitutes a part of a scene they are immersed in), as compared to allocentric encoding on a computer screen in DVE (where the learner is looking at the scene from "outside"), is more beneficial than DVE for studying more complex (two-dimensional) relative motion problems. Overall, our findings suggest that such aspects of virtual realities as immersivity, first-hand experience, and the possibility of changing different frames of reference can facilitate understanding abstract scientific phenomena and help in displacing intuitive misconceptions with more accurate mental models.

  5. The Low Backscattering Objects Classification in Polsar Image Based on Bag of Words Model Using Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Yang, L.; Shi, L.; Li, P.; Yang, J.; Zhao, L.; Zhao, B.

    2018-04-01

    Due to forward scattering and blocking of the radar signal, water, bare soil, and shadow (collectively named low backscattering objects, LBOs) often present low backscattering intensity in polarimetric synthetic aperture radar (PolSAR) images. Because LBOs give rise to similar backscattering intensities and polarimetric responses, spectral-based classifiers such as the Wishart method are inefficient for LBO classification. Although some polarimetric features have been exploited to relieve this confusion, the backscattering features remain unstable when the system noise floor varies in the range direction. This paper introduces a simple but effective scene classification method based on a Bag of Words (BoW) model with a Support Vector Machine (SVM) to discriminate LBOs, without relying on any polarimetric features. In the proposed approach, square windows are first opened around the LBOs adaptively to determine the scene images, and Scale-Invariant Feature Transform (SIFT) points are then detected in the training and test scenes. The detected SIFT features are clustered using K-means, the resulting cluster centers serve as the visual word list, and scene images are represented by word frequencies. Finally, an SVM is trained to predict the LBO class of new scenes. The proposed method is executed over two AIRSAR data sets at C band and L band, including water, bare soil and shadow scenes. The experimental results illustrate the effectiveness of the scene-based method in distinguishing LBOs.
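
    A condensed sketch of that pipeline follows; descriptor arrays are faked with random data where OpenCV's SIFT would normally supply them, and the vocabulary size, kernel, and class labels are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def bow_histogram(descriptors, vocab):
    """Word-frequency representation of one scene window."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

rng = np.random.default_rng(2)
train_desc = [rng.random((200, 128)) for _ in range(30)]   # toy SIFT stacks
labels = rng.integers(0, 3, 30)                            # water/soil/shadow

# 1) Visual vocabulary: K-means over all training descriptors.
vocab = KMeans(n_clusters=64, n_init=10).fit(np.vstack(train_desc))

# 2) Histogram representation + SVM training.
X = np.array([bow_histogram(d, vocab) for d in train_desc])
svm = SVC(kernel="rbf").fit(X, labels)

# 3) Predict the LBO class of a new scene window.
pred = svm.predict([bow_histogram(rng.random((150, 128)), vocab)])
```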

  6. You shall know an object by the company it keeps: An investigation of semantic representations derived from object co-occurrence in visual scenes.

    PubMed

    Sadeghi, Zahra; McClelland, James L; Hoffman, Paul

    2015-09-01

    An influential position in lexical semantics holds that semantic representations for words can be derived through analysis of patterns of lexical co-occurrence in large language corpora. Firth (1957) famously summarised this principle as "you shall know a word by the company it keeps". We explored whether the same principle could be applied to non-verbal patterns of object co-occurrence in natural scenes. We performed latent semantic analysis (LSA) on a set of photographed scenes in which all of the objects present had been manually labelled. This resulted in a representation of objects in a high-dimensional space in which similarity between two objects indicated the degree to which they appeared in similar scenes. These representations revealed similarities among objects belonging to the same taxonomic category (e.g., items of clothing) as well as cross-category associations (e.g., between fruits and kitchen utensils). We also compared representations generated from this scene dataset with two established methods for elucidating semantic representations: (a) a published database of semantic features generated verbally by participants and (b) LSA applied to a linguistic corpus in the usual fashion. Statistical comparisons of the three methods indicated significant association between the structures revealed by each method, with the scene dataset displaying greater convergence with feature-based representations than did LSA applied to linguistic data. The results indicate that information about the conceptual significance of objects can be extracted from their patterns of co-occurrence in natural environments, opening the possibility for such data to be incorporated into existing models of conceptual representation. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.
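
    The core computation is classical LSA: build an object-by-scene count matrix, factor it, and compare objects in the reduced space. The tiny matrix and object names below are invented for illustration.

```python
import numpy as np

objects = ["apple", "knife", "shirt", "hat"]
# Rows = objects, columns = scenes; entries = co-occurrence counts.
counts = np.array([[3.0, 2.0, 0.0, 0.0],
                   [2.0, 3.0, 0.0, 1.0],
                   [0.0, 0.0, 4.0, 2.0],
                   [0.0, 1.0, 2.0, 3.0]])

U, s, Vt = np.linalg.svd(counts, full_matrices=False)
obj_vecs = U[:, :2] * s[:2]          # objects in a 2-D latent space

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Objects that keep similar scene company end up close together:
print(cosine(obj_vecs[0], obj_vecs[1]))   # apple vs. knife: high
print(cosine(obj_vecs[0], obj_vecs[2]))   # apple vs. shirt: low
```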

  7. A novel scene management technology for complex virtual battlefield environment

    NASA Astrophysics Data System (ADS)

    Sheng, Changchong; Jiang, Libing; Tang, Bo; Tang, Xiaoan

    2018-04-01

    The efficient scene management of a virtual environment is an important research topic in real-time computer visualization and has a decisive influence on rendering efficiency. However, traditional scene management methods are not suitable for complex virtual battlefield environments. This paper combines the advantages of traditional scene-graph technology and spatial data structure methods: using the idea of separating management from rendering, a loose object-oriented scene graph is established to manage the entity model data in the scene, and a performance-oriented quad-tree structure is created for traversal and rendering. In addition, a collaborative update relationship between these two structural trees is designed to achieve efficient scene management. Compared with previous scene management methods, this method is more efficient and meets the needs of real-time visualization.
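
    A minimal sketch of the render-side quad-tree is given below; the 2D ground-plane partitioning and the max-objects-per-node split rule are assumptions chosen for illustration, not details from the paper.

```python
class QuadTree:
    """Point-region quad-tree for culling/traversal during rendering."""

    def __init__(self, x, y, size, capacity=4):
        self.x, self.y, self.size, self.capacity = x, y, size, capacity
        self.objects, self.children = [], None

    def insert(self, obj, ox, oy):
        if self.children is not None:
            self._child_for(ox, oy).insert(obj, ox, oy)
            return
        self.objects.append((obj, ox, oy))
        # Split on overflow; the minimum node size guards against infinite
        # recursion when many objects share one position.
        if len(self.objects) > self.capacity and self.size > 1.0:
            self._split()

    def _split(self):
        h = self.size / 2.0
        self.children = [QuadTree(self.x + dx, self.y + dy, h, self.capacity)
                         for dx in (0.0, h) for dy in (0.0, h)]
        for obj, ox, oy in self.objects:
            self._child_for(ox, oy).insert(obj, ox, oy)
        self.objects = []

    def _child_for(self, ox, oy):
        h = self.size / 2.0
        idx = (2 if ox >= self.x + h else 0) + (1 if oy >= self.y + h else 0)
        return self.children[idx]
```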

  8. Thermal bioaerosol cloud tracking with Bayesian classification

    NASA Astrophysics Data System (ADS)

    Smith, Christian W.; Dupuis, Julia R.; Schundler, Elizabeth C.; Marinelli, William J.

    2017-05-01

    The development of a wide-area bioaerosol early-warning capability employing existing uncooled thermal imaging systems used for persistent perimeter surveillance is discussed. The capability exploits thermal imagers together with other available data streams, including meteorological data, and employs a recursive Bayesian classifier to detect, track, and classify observed thermal objects with attributes consistent with a bioaerosol plume. Target detection is achieved based on similarity to a phenomenological model which predicts the scene-dependent thermal signature of bioaerosol plumes. Change detection in thermal sensor data is combined with local meteorological data to locate targets with the appropriate thermal characteristics. Target motion is tracked utilizing a Kalman filter and a nearly-constant-velocity motion model for cloud state estimation. Track management is performed using a logic-based upkeep system, and data association is accomplished using a combinatorial optimization technique. Bioaerosol threat classification is determined using a recursive Bayesian classifier to quantify the threat probability of each tracked object. The classifier can accept additional inputs from visible imagers, acoustic sensors, and point biological sensors to improve classification confidence. This capability was successfully demonstrated for bioaerosol simulant releases during field testing at Dugway Proving Ground. Standoff detection at a range of 700 m was achieved for as little as 500 g of anthrax simulant. Developmental test results will be reviewed for a range of simulant releases, and future development and transition plans for the bioaerosol early-warning platform will be discussed.
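
    The recursive Bayesian update at the heart of such a classifier reduces to a one-line posterior computation per observation. In the sketch below the likelihood values are hand-chosen stand-ins; the real system would derive them from the plume phenomenology model and any auxiliary sensor inputs.

```python
def bayes_update(prior, p_obs_given_threat, p_obs_given_benign):
    """Posterior threat probability after one new observation."""
    num = p_obs_given_threat * prior
    return num / (num + p_obs_given_benign * (1.0 - prior))

p = 0.05   # initial threat prior assigned to a new thermal track
# Each frame contributes evidence, e.g. "motion consistent with the wind
# field" or "thermal contrast matches the plume model":
for like_threat, like_benign in [(0.8, 0.3), (0.7, 0.4), (0.9, 0.2)]:
    p = bayes_update(p, like_threat, like_benign)
print(p)   # climbs toward 1 as evidence consistently favours a plume
```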

  9. When Does Repeated Search in Scenes Involve Memory? Looking at versus Looking for Objects in Scenes

    ERIC Educational Resources Information Center

    Vo, Melissa L. -H.; Wolfe, Jeremy M.

    2012-01-01

    One might assume that familiarity with a scene or previous encounters with objects embedded in a scene would benefit subsequent search for those items. However, in a series of experiments we show that this is not the case: When participants were asked to subsequently search for multiple objects in the same scene, search performance remained…

  10. Visual motion combined with base of support width reveals variable field dependency in healthy young adults.

    PubMed

    Streepey, Jefferson W; Kenyon, Robert V; Keshner, Emily A

    2007-01-01

    We previously reported responses to induced postural instability in young healthy individuals viewing visual motion with a narrow (25 degrees in both directions) and wide (90 degrees and 55 degrees in the horizontal and vertical directions) field of view (FOV) as they stood on different sized blocks. Visual motion was achieved using an immersive virtual environment that moved realistically with head motion (natural motion) and translated sinusoidally at 0.1 Hz in the fore-aft direction (augmented motion). We observed that a subset of the subjects (steppers) could not maintain continuous stance on the smallest block when the virtual environment was in motion. We completed a posteriori analyses on the postural responses of the steppers and non-steppers that may inform us about the mechanisms underlying these differences in stability. We found that when viewing augmented motion with a wide FOV, there was a greater effect on the head and whole body center of mass and ankle angle root mean square (RMS) values of the steppers than of the non-steppers. FFT analyses revealed greater power at the frequency of the visual stimulus in the steppers compared to the non-steppers. Whole body COM time lags relative to the augmented visual scene revealed that the time-delay between the scene and the COM was significantly increased in the steppers. The increased responsiveness to visual information suggests a greater visual field-dependency of the steppers and suggests that the thresholds for shifting from a reliance on visual information to somatosensory information can differ even within a healthy population.

  11. Learning Visual Design through Hypermedia: Pathways to Visual Literacy.

    ERIC Educational Resources Information Center

    Lockee, Barbara; Hergert, Tom

    The interactive multimedia application described here attempts to provide learners and teachers with a common frame of reference for communicating about visual media. The system is based on a list of concepts related to composition, and illustrates those concepts with photographs, paintings, graphic designs, and motion picture scenes. The ability…

  12. Scene and Position Specificity in Visual Memory for Objects

    ERIC Educational Resources Information Center

    Hollingworth, Andrew

    2006-01-01

    This study investigated whether and how visual representations of individual objects are bound in memory to scene context. Participants viewed a series of naturalistic scenes, and memory for the visual form of a target object in each scene was examined in a 2-alternative forced-choice test, with the distractor object either a different object…

  13. Neuromorphic Event-Based 3D Pose Estimation

    PubMed Central

    Reverter Valeiras, David; Orchard, Garrick; Ieng, Sio-Hoi; Benosman, Ryad B.

    2016-01-01

    Pose estimation is a fundamental step in many artificial vision tasks. It consists of estimating the 3D pose of an object with respect to a camera from the object's 2D projection. Current state-of-the-art implementations operate on images. These implementations are computationally expensive, especially for real-time applications. Scenes with fast dynamics exceeding 30–60 Hz can rarely be processed in real time using conventional hardware. This paper presents a new method for event-based 3D object pose estimation, making full use of the high temporal resolution (1 μs) of asynchronous visual events output from a single neuromorphic camera. Given an initial estimate of the pose, each incoming event is used to update the pose by combining both 3D and 2D criteria. We show that the asynchronous high temporal resolution of the neuromorphic camera allows us to solve the problem in an incremental manner, achieving real-time performance at an update rate of several hundred kHz on a conventional laptop. We show that the high temporal resolution of neuromorphic cameras is a key feature for performing accurate pose estimation. Experiments are provided showing the performance of the algorithm on real data, including fast moving objects, occlusions, and cases where the neuromorphic camera and the object are both in motion. PMID:26834547

  14. Fast, accurate, small-scale 3D scene capture using a low-cost depth sensor

    PubMed Central

    Carey, Nicole; Nagpal, Radhika; Werfel, Justin

    2017-01-01

    Commercially available depth sensing devices are primarily designed for domains that are either macroscopic or static. We develop a solution for fast microscale 3D reconstruction using off-the-shelf components. By the addition of lenses, precise calibration of camera internals and positioning, and development of bespoke software, we turn an infrared depth sensor designed for human-scale motion and object detection into a device with mm-level accuracy capable of recording at up to 30 Hz. PMID:28758159

  15. SCEGRAM: An image database for semantic and syntactic inconsistencies in scenes.

    PubMed

    Öhlschläger, Sabine; Võ, Melissa Le-Hoa

    2017-10-01

    Our visual environment is not random, but follows compositional rules according to what objects are usually found where. Despite the growing interest in how such semantic and syntactic rules - a scene grammar - enable effective attentional guidance and object perception, no common image database containing highly controlled object-scene modifications has been publicly available. Such a database is essential in minimizing the risk that low-level features drive high-level effects of interest, which is discussed as a possible source of controversial study results. To generate the first database of this kind - SCEGRAM - we took photographs of 62 real-world indoor scenes in six consistency conditions that contain semantic and syntactic (both mild and extreme) violations as well as their combinations. Importantly, scenes were always paired, so that an object was semantically consistent in one scene (e.g., ketchup in a kitchen) and inconsistent in the other (e.g., ketchup in a bathroom). Low-level salience did not differ between object-scene conditions and was generally moderate. Additionally, SCEGRAM contains consistency ratings for every object-scene condition, as well as object-absent scenes and object-only images. Finally, a cross-validation using eye movements replicated previous results of longer dwell times for both semantic and syntactic inconsistencies compared to consistent controls. In sum, the SCEGRAM image database is the first to contain well-controlled semantic and syntactic object-scene inconsistencies that can be used in a broad range of cognitive paradigms (e.g., verbal and pictorial priming, change detection, object identification, etc.), including paradigms addressing developmental aspects of scene grammar. SCEGRAM can be retrieved for research purposes from http://www.scenegrammarlab.com/research/scegram-database/ .

  16. Research on 3D virtual campus scene modeling based on 3ds Max and VRML

    NASA Astrophysics Data System (ADS)

    Kang, Chuanli; Zhou, Yanliu; Liang, Xianyue

    2015-12-01

    With the rapid development of modern technology, digital information management and virtual reality simulation have become research hotspots. A 3D virtual campus model can not only represent real-world objects naturally, realistically, and vividly, but can also expand the campus in its real dimensions of time and space, combining the school environment with information. This paper mainly uses 3ds Max to create three-dimensional models of campus buildings, special land parcels, and other objects. Dynamic interactive functions are then realized by programming the object models from 3ds Max in VRML. This research focuses on virtual campus scene modeling technology and VRML scene design, and on optimization strategies for a variety of real-time processing techniques in the scene design process. The approach preserves texture-map image quality while improving the running speed of image texture mapping. According to the features and architecture of Guilin University of Technology, 3ds Max, AutoCAD and VRML were used to model the different objects of the virtual campus. Finally, the resulting virtual campus scene is summarized.

  17. GBS: Guidance by Semantics-Using High-Level Visual Inference to Improve Vision-Based Mobile Robot Localization

    DTIC Science & Technology

    2015-08-28

    …for the scene, and effectively isolates the points on buildings. We are now able to accurately filter in buildings, and filter out the ground… [Figure 3 caption: Our work distinguishes intentional action of an unknown agent (the kids in this example) from various other motions, such as the rolling ball, the crashing waves and the background motion.]

  18. Robust video super-resolution with registration efficiency adaptation

    NASA Astrophysics Data System (ADS)

    Zhang, Xinfeng; Xiong, Ruiqin; Ma, Siwei; Zhang, Li; Gao, Wen

    2010-07-01

    Super-resolution (SR) is a technique to construct a high-resolution (HR) frame by fusing a group of low-resolution (LR) frames describing the same scene. The effectiveness of conventional super-resolution techniques, when applied to video sequences, relies strongly on the quality of motion alignment achieved by image registration. Unfortunately, this quality is limited by the motion complexity of the video and the capability of the adopted motion model. In image regions with severe registration errors, annoying artifacts usually appear in the produced super-resolution video. This paper proposes a robust video super-resolution technique that adapts itself to the spatially varying registration quality. The reliability of each reference pixel is measured by the corresponding registration error and incorporated into the optimization objective function of the SR reconstruction. This makes the SR reconstruction highly immune to registration errors, as outliers with higher registration errors are assigned lower weights in the objective function. In particular, we carefully design a mechanism that assigns weights according to registration errors. The proposed super-resolution scheme has been tested with various video sequences, and experimental results clearly demonstrate the effectiveness of the proposed method.
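
    The weighting idea can be sketched directly. The Gaussian mapping from registration error to reliability weight below is an illustrative choice (the paper designs its own mechanism), and `downsample` stands for whatever observation model links the HR estimate to each LR frame.

```python
import numpy as np

def weights_from_registration_error(errors, sigma=2.0):
    """Map per-pixel registration errors to (0, 1] reliability weights."""
    return np.exp(-(errors / sigma) ** 2)

def weighted_sr_cost(hr_estimate, lr_frames, reg_errors, downsample):
    """Data-fit term of the SR objective with registration-adaptive weights:
    pixels with large registration error contribute little, so misaligned
    references no longer produce artifacts in the reconstruction."""
    cost = 0.0
    for lr, err in zip(lr_frames, reg_errors):
        w = weights_from_registration_error(err)
        residual = downsample(hr_estimate) - lr
        cost += float(np.sum(w * residual ** 2))
    return cost
```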

  19. A Theoretical and Experimental Analysis of the Outside World Perception Process

    NASA Technical Reports Server (NTRS)

    Wewerinke, P. H.

    1978-01-01

    The outside scene is often an important source of information for manual control tasks. Important examples of these are car driving and aircraft control. This paper deals with modelling this visual scene perception process on the basis of linear perspective geometry and the relative motion cues. Model predictions utilizing psychophysical threshold data from base-line experiments and literature of a variety of visual approach tasks are compared with experimental data. Both the performance and workload results illustrate that the model provides a meaningful description of the outside world perception process, with a useful predictive capability.

  20. The Deployment of Visual Attention

    DTIC Science & Technology

    2006-03-01


  1. A Vision-Based Wayfinding System for Visually Impaired People Using Situation Awareness and Activity-Based Instructions

    PubMed Central

    Kim, Eun Yi

    2017-01-01

    A significant challenge faced by visually impaired people is ‘wayfinding’, the ability to find one’s way to a destination in an unfamiliar environment. This study develops a novel wayfinding system for smartphones that can automatically recognize the situation and scene objects in real time. Through analyzing streaming images, the proposed system first classifies the current situation of a user in terms of their location. Next, based on the current situation, only the necessary context objects are found and interpreted using computer vision techniques. The system estimates the motions of the user with two inertial sensors and records the user’s trajectories toward the destination, which are also used as a guide for the return route after the destination is reached. To efficiently convey the recognized results through an auditory interface, activity-based instructions are generated that guide the user in a series of movements along a route. To assess the effectiveness of the proposed system, experiments were conducted in several indoor environments, in which the situation awareness accuracy was 90% and the object detection false alarm rate was 0.016. In addition, our field test results demonstrate that users can locate their paths with an accuracy of 97%. PMID:28813033

  2. When does repeated search in scenes involve memory? Looking AT versus looking FOR objects in scenes

    PubMed Central

    Võ, Melissa L.-H.; Wolfe, Jeremy M.

    2014-01-01

    One might assume that familiarity with a scene or previous encounters with objects embedded in a scene would benefit subsequent search for those items. However, in a series of experiments we show that this is not the case: When participants were asked to subsequently search for multiple objects in the same scene, search performance remained essentially unchanged over the course of searches despite increasing scene familiarity. Similarly, looking at target objects during previews, which included letter search, 30 seconds of free viewing, or even 30 seconds of memorizing a scene, also did not benefit search for the same objects later on. However, when the same object was searched for again memory for the previous search was capable of producing very substantial speeding of search despite many different intervening searches. This was especially the case when the previous search engagement had been active rather than supported by a cue. While these search benefits speak to the strength of memory-guided search when the same search target is repeated, the lack of memory guidance during initial object searches – despite previous encounters with the target objects - demonstrates the dominance of guidance by generic scene knowledge in real-world search. PMID:21688939

  3. When does repeated search in scenes involve memory? Looking at versus looking for objects in scenes.

    PubMed

    Võ, Melissa L-H; Wolfe, Jeremy M

    2012-02-01

    One might assume that familiarity with a scene or previous encounters with objects embedded in a scene would benefit subsequent search for those items. However, in a series of experiments we show that this is not the case: When participants were asked to subsequently search for multiple objects in the same scene, search performance remained essentially unchanged over the course of searches despite increasing scene familiarity. Similarly, looking at target objects during previews, which included letter search, 30 seconds of free viewing, or even 30 seconds of memorizing a scene, also did not benefit search for the same objects later on. However, when the same object was searched for again memory for the previous search was capable of producing very substantial speeding of search despite many different intervening searches. This was especially the case when the previous search engagement had been active rather than supported by a cue. While these search benefits speak to the strength of memory-guided search when the same search target is repeated, the lack of memory guidance during initial object searches-despite previous encounters with the target objects-demonstrates the dominance of guidance by generic scene knowledge in real-world search.

  4. Recognition of 3-D Scene with Partially Occluded Objects

    NASA Astrophysics Data System (ADS)

    Lu, Siwei; Wong, Andrew K. C.

    1987-03-01

    This paper presents a robot vision system which is capable of recognizing objects in a 3-D scene and interpreting their spatial relation even though some objects in the scene may be partially occluded by other objects. An algorithm is developed to transform the geometric information from the range data into an attributed hypergraph representation (AHR). A hypergraph monomorphism algorithm is then used to compare the AHR of objects in the scene with a set of complete AHR's of prototypes. The capability of identifying connected components and interpreting various types of edges in the 3-D scene enables us to distinguish objects which are partially blocking each other in the scene. Using structural information stored in the primitive area graph, a heuristic hypergraph monomorphism algorithm provides an effective way for recognizing, locating, and interpreting partially occluded objects in the range image.

  5. Seat belt and helmet depiction on the big screen: blockbuster injury prevention messages?

    PubMed

    Cowan, John A; Dubosh, Nicole; Hadley, Craig

    2009-03-01

    Injuries from vehicle crashes are a major cause of death among American youth, and many of these injuries are worsened by noncompliant safety practices. Messages delivered by mass media are omnipresent in young people's lives and influence their behavior patterns. In this investigation, we analyzed seat belt and helmet messages from a sample of top-grossing motion pictures, with emphasis on scene context and character demographics. We performed a content analysis of 50 top-grossing motion pictures for the years 2000 to 2004, with coding of seat belt and helmet usage by trained media coders. In 48 of 50 movies (53% PG-13; 33% R; 10% PG; 4% G) with vehicle scenes, 518 scenes (82% car/truck; 7% taxi/limo; 7% motorcycle; 4% bicycle/skateboard) were coded. Overall, seat belt and helmet usage rates were 15.4% and 33.3%, respectively, with verbal indications for seat belt or helmet use found in 1.0% of scenes. Safety compliance rates varied by character race (18.3% white; 6.5% black; p = 0.036). No differences in compliance rates were noted for high-speed or unsafe vehicle operation. The injury rate for noncompliant characters involved in crashes was 10.7%. A regression model showed black character race and escape scenes to be most predictive of noncompliant safety behavior. Safety compliance messages and images are starkly absent in top-grossing motion pictures, resulting in, at worst, a deleterious effect on vulnerable populations and public health initiatives and, at minimum, a lost opportunity to prevent injury and death. Healthcare providers should call on the motion picture industry to improve safety compliance messages and images in products delivered for mass consumption.

  6. Temporal and spatial adaptation of transient responses to local features

    PubMed Central

    O'Carroll, David C.; Barnett, Paul D.; Nordström, Karin

    2012-01-01

    Interpreting visual motion within the natural environment is a challenging task, particularly considering that natural scenes vary enormously in brightness, contrast and spatial structure. The performance of current models for the detection of self-generated optic flow depends critically on these very parameters, but despite this, animals manage to successfully navigate within a broad range of scenes. Within global scenes, local areas with more salient features are common. Recent work has highlighted the influence that local, salient features have on the encoding of optic flow, but it has been difficult to quantify how local transient responses affect responses to subsequent features and thus contribute to the global neural response. To investigate this in more detail, we used experimenter-designed stimuli and recorded intracellularly from motion-sensitive neurons. We limited the stimulus to a small vertically elongated strip to investigate local and global neural responses to pairs of local “doublet” features that were designed to interact with each other in the temporal and spatial domains. We show that the passage of a high-contrast doublet feature produces a complex transient response from local motion detectors consistent with predictions of a simple computational model. In the neuron, the passage of a high-contrast feature induces a local reduction in responses to subsequent low-contrast features. However, this neural contrast gain reduction appears to be recruited only when features stretch vertically (i.e., orthogonal to the direction of motion) across at least several aligned neighboring ommatidia. Horizontal displacement of the components of elongated features abolishes the local adaptation effect. It is thus likely that features in natural scenes with vertically aligned edges, such as tree trunks, recruit the greatest amount of response suppression. This property could emphasize the local responses to such features vs. those in nearby texture within the scene. PMID:23087617
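
    The "simple computational model" of local motion detectors referred to here belongs to a family whose classic representative in insect vision is the Hassenstein-Reichardt elementary motion detector (EMD). The following is an illustrative sketch of such a detector, not the authors' model; the filter time constant and stimulus are assumptions.

        # Illustrative Hassenstein-Reichardt EMD: correlate each input with a
        # low-pass filtered copy of its neighbor, then subtract the mirror pair.
        import numpy as np

        def emd_response(signal_a, signal_b, dt=1e-3, tau=35e-3):
            """signal_a, signal_b: luminance time series from adjacent photoreceptors."""
            alpha = dt / (tau + dt)              # first-order low-pass coefficient
            lp_a = np.zeros_like(signal_a)
            lp_b = np.zeros_like(signal_b)
            for t in range(1, len(signal_a)):    # discrete low-pass filtering
                lp_a[t] = lp_a[t-1] + alpha * (signal_a[t] - lp_a[t-1])
                lp_b[t] = lp_b[t-1] + alpha * (signal_b[t] - lp_b[t-1])
            # Opponent subtraction of the two mirror-symmetric half-detectors:
            return lp_a * signal_b - lp_b * signal_a

        t = np.arange(0, 1, 1e-3)
        a = np.sin(2 * np.pi * 4 * t)            # feature passing receptor A...
        b = np.sin(2 * np.pi * 4 * t - 0.5)      # ...reaches neighbor B later
        print(emd_response(a, b).mean())         # positive: preferred direction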

  7. Video Completion in Digital Stabilization Task Using Pseudo-Panoramic Technique

    NASA Astrophysics Data System (ADS)

    Favorskaya, M. N.; Buryachenko, V. V.; Zotin, A. G.; Pakhirka, A. I.

    2017-05-01

    Video completion is a necessary stage after stabilization of a non-stationary video sequence if it is desirable to make the resolution of the stabilized frames equal to the resolution of the original frames. Usually the cropped stabilized frames lose 10-20% of their area, which worsens the visibility of the reconstructed scenes. The field of view may need to be extended because of unwanted pan-tilt-zoom camera movement. Our approach prepares a pseudo-panoramic key frame during the stabilization stage as a pre-processing step for the subsequent inpainting. It is based on a multi-layered representation of each frame including the background and objects moving differently. The proposed algorithm involves four steps: background completion, local motion inpainting, local warping, and seamless blending. Our experiments show that the necessity of seamless stitching arises more often than the local warping step. Therefore, seamless blending was investigated in detail, including four main categories: feathering-based, pyramid-based, gradient-based, and optimal seam-based blending.
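
    Of those categories, feathering-based blending is the simplest to sketch: pixels in the overlap region are linearly cross-faded. The following is a minimal illustration under assumed shapes, not the authors' implementation.

        # Minimal feathering-based blend of two horizontally adjacent images.
        import numpy as np

        def feather_blend(left, right, overlap):
            """left, right: float arrays (H, W, 3); overlap: overlap width in pixels."""
            weights = np.linspace(1.0, 0.0, overlap)[None, :, None]  # ramp across seam
            seam = left[:, -overlap:] * weights + right[:, :overlap] * (1 - weights)
            return np.concatenate([left[:, :-overlap], seam, right[:, overlap:]], axis=1)

        left = np.ones((4, 8, 3)) * 0.2          # toy "stabilized frame" strip
        right = np.ones((4, 8, 3)) * 0.8         # toy "pseudo-panoramic" strip
        print(feather_blend(left, right, overlap=4).shape)  # (4, 12, 3)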

  8. A Methodology for Evaluating the Fidelity of Ground-Based Flight Simulators

    NASA Technical Reports Server (NTRS)

    Zeyada, Y.; Hess, R. A.

    1999-01-01

    An analytical and experimental investigation was undertaken to model the manner in which pilots perceive and utilize visual, proprioceptive, and vestibular cues in a ground-based flight simulator. The study was part of a larger research effort which has the creation of a methodology for determining flight simulator fidelity requirements as its ultimate goal. The study utilized a closed-loop feedback structure of the pilot/simulator system which included the pilot, the cockpit inceptor, the dynamics of the simulated vehicle, and the motion system. With the exception of time delays which accrued in visual scene production in the simulator, visual scene effects were not included in this study. The NASA Ames Vertical Motion Simulator was used in a simple, single-degree-of-freedom rotorcraft bob-up/down maneuver. Pilot/vehicle analysis and fuzzy-inference identification were employed to study the changes in fidelity which occurred as the characteristics of the motion system were varied over five configurations. The data from three of the five pilots that participated in the experimental study were analyzed in the fuzzy-inference identification. Results indicate that both the analytical pilot/vehicle analysis and the fuzzy-inference identification can be used to reflect changes in simulator fidelity for the task examined.

  10. Automated content and quality assessment of full-motion-video for the generation of meta data

    NASA Astrophysics Data System (ADS)

    Harguess, Josh

    2015-05-01

    Virtually all of the video data (and full-motion-video (FMV)) that is currently collected and stored in support of missions has been corrupted to various extents by image acquisition and compression artifacts. Additionally, video collected by wide-area motion imagery (WAMI) surveillance systems, unmanned aerial vehicles (UAVs), and similar sources is often of low quality or otherwise corrupted, so that it is not worth storing or analyzing. In order to make progress on the problem of automatic video analysis, the first problem that should be solved is deciding whether the content of the video is even worth analyzing to begin with. We present a work in progress to address three types of scenes which are typically found in real-world data stored in support of Department of Defense (DoD) missions: no or very little motion in the scene, large occlusions in the scene, and fast camera motion. Each of these produces video that is generally not usable by an analyst or automated algorithm for mission support and therefore should be removed or flagged to the user as such. We utilize recent computer vision advances in motion detection and optical flow to automatically assess FMV for the identification and generation of meta-data (or tagging) of video segments which exhibit the unwanted scenarios described above. Results are shown on representative real-world video data.
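
    As a rough sketch of how such flagging could be implemented with off-the-shelf tools (an illustrative assumption, not the author's pipeline), one might threshold dense optical-flow magnitudes per frame pair; the thresholds below are placeholders.

        # Flag frame pairs with almost no motion or fast global camera motion.
        import cv2
        import numpy as np

        def flag_frame_pair(prev_gray, next_gray, low=0.1, high=8.0):
            flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            mag = np.linalg.norm(flow, axis=2)      # per-pixel motion magnitude
            median_mag = np.median(mag)
            if median_mag < low:
                return "little-or-no-motion"        # likely not worth analyzing
            if median_mag > high:
                return "fast-camera-motion"         # likely unusable for analysis
            return "ok"

        prev = np.zeros((120, 160), np.uint8)       # toy stand-ins for real frames
        nxt = np.zeros((120, 160), np.uint8)
        print(flag_frame_pair(prev, nxt))           # little-or-no-motion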

  11. Scene Configuration and Object Reliability Affect the Use of Allocentric Information for Memory-Guided Reaching

    PubMed Central

    Klinghammer, Mathias; Blohm, Gunnar; Fiehler, Katja

    2017-01-01

    Previous research has shown that egocentric and allocentric information is used for coding target locations for memory-guided reaching movements. In particular, task-relevance determines the use of objects as allocentric cues. Here, we investigated the influence of scene configuration and object reliability as a function of task-relevance on allocentric coding for memory-guided reaching. For that purpose, we presented participants with images of a naturalistic breakfast scene with five objects on a table and six objects in the background. Six of these objects served as potential reach-targets (= task-relevant objects). Participants explored the scene, and after a short delay, a test scene appeared with one of the task-relevant objects missing, indicating the location of the reach target. After the test scene vanished, participants performed a memory-guided reaching movement toward the target location. Besides removing one object from the test scene, we also shifted the remaining task-relevant and/or task-irrelevant objects left- or rightwards, either coherently in the same direction or incoherently in opposite directions. By varying object coherence, we manipulated the reliability of task-relevant and task-irrelevant objects in the scene. In order to examine the influence of scene configuration (distributed vs. grouped arrangement of task-relevant objects) on allocentric coding, we compared the present data with our previously published data set (Klinghammer et al., 2015). We found that reaching errors systematically deviated in the direction of object shifts, but only when the objects were task-relevant and their reliability was high. However, this effect was substantially reduced when task-relevant objects were distributed across the scene, leading to a larger target-cue distance compared to a grouped configuration. No deviations of reach endpoints were observed in conditions with shifts of only task-irrelevant objects or with low object reliability, irrespective of task-relevance. Moreover, when solely task-relevant objects were shifted incoherently, the variability of reaching endpoints increased compared to coherent shifts of task-relevant objects. Our results suggest that the use of allocentric information for coding targets for memory-guided reaching depends on the scene configuration, in particular the average distance of the reach target to task-relevant objects, and the reliability of task-relevant allocentric information. PMID:28450826

  12. Scene Configuration and Object Reliability Affect the Use of Allocentric Information for Memory-Guided Reaching.

    PubMed

    Klinghammer, Mathias; Blohm, Gunnar; Fiehler, Katja

    2017-01-01

    Previous research has shown that egocentric and allocentric information is used for coding target locations for memory-guided reaching movements. In particular, task-relevance determines the use of objects as allocentric cues. Here, we investigated the influence of scene configuration and object reliability as a function of task-relevance on allocentric coding for memory-guided reaching. For that purpose, we presented participants with images of a naturalistic breakfast scene with five objects on a table and six objects in the background. Six of these objects served as potential reach-targets (= task-relevant objects). Participants explored the scene, and after a short delay, a test scene appeared with one of the task-relevant objects missing, indicating the location of the reach target. After the test scene vanished, participants performed a memory-guided reaching movement toward the target location. Besides removing one object from the test scene, we also shifted the remaining task-relevant and/or task-irrelevant objects left- or rightwards, either coherently in the same direction or incoherently in opposite directions. By varying object coherence, we manipulated the reliability of task-relevant and task-irrelevant objects in the scene. In order to examine the influence of scene configuration (distributed vs. grouped arrangement of task-relevant objects) on allocentric coding, we compared the present data with our previously published data set (Klinghammer et al., 2015). We found that reaching errors systematically deviated in the direction of object shifts, but only when the objects were task-relevant and their reliability was high. However, this effect was substantially reduced when task-relevant objects were distributed across the scene, leading to a larger target-cue distance compared to a grouped configuration. No deviations of reach endpoints were observed in conditions with shifts of only task-irrelevant objects or with low object reliability, irrespective of task-relevance. Moreover, when solely task-relevant objects were shifted incoherently, the variability of reaching endpoints increased compared to coherent shifts of task-relevant objects. Our results suggest that the use of allocentric information for coding targets for memory-guided reaching depends on the scene configuration, in particular the average distance of the reach target to task-relevant objects, and the reliability of task-relevant allocentric information.

  13. Traffic Video Image Segmentation Model Based on Bayesian and Spatio-Temporal Markov Random Field

    NASA Astrophysics Data System (ADS)

    Zhou, Jun; Bao, Xu; Li, Dawei; Yin, Yongwen

    2017-10-01

    Traffic video is a kind of dynamic imagery whose background and foreground change constantly, which results in occlusion. In this case, general-purpose methods have difficulty producing an accurate image segmentation. A segmentation algorithm based on Bayesian inference and a spatio-temporal Markov random field (ST-MRF) is put forward. It builds energy-function models of the observation field and the label field for a motion image sequence with the Markov property; then, following Bayes' rule, it uses the interaction between the label field and the observation field, that is, the relationship between the label field's prior probability and the observation field's likelihood, to obtain the maximum a posteriori estimate of the label field, and applies the ICM (iterated conditional modes) algorithm to extract the moving object, completing the segmentation. Finally, plain ST-MRF segmentation and Bayesian segmentation combined with ST-MRF were analyzed. Experimental results show that the segmentation time of the Bayesian algorithm combined with ST-MRF is shorter than that of ST-MRF alone and its computational load is smaller; in heavy-traffic dynamic scenes in particular, the method also achieves a better segmentation result.
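
    For illustration, here is a minimal sketch of one ICM sweep over a binary foreground/background label field, assuming a Gaussian observation model and a Potts-style smoothness prior; the class means, sigma, and beta are placeholders, not the paper's values.

        # One ICM sweep: each pixel takes the label minimizing local energy.
        import numpy as np

        def icm_sweep(obs, labels, mu=(0.1, 0.8), sigma=0.2, beta=1.5):
            """obs: (H, W) intensities; labels: (H, W) in {0, 1}."""
            h, w = obs.shape
            out = labels.copy()
            for y in range(h):
                for x in range(w):
                    best, best_e = out[y, x], np.inf
                    for k in (0, 1):
                        unary = (obs[y, x] - mu[k]) ** 2 / (2 * sigma ** 2)  # likelihood term
                        pair = sum(out[ny, nx] != k                          # smoothness term
                                   for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
                                   if 0 <= ny < h and 0 <= nx < w)
                        e = unary + beta * pair
                        if e < best_e:
                            best, best_e = k, e
                    out[y, x] = best
            return out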

  14. The Influence of Scene Context on Parafoveal Processing of Objects.

    PubMed

    Castelhano, Monica S; Pereira, Effie J

    2017-04-21

    Many studies in reading have shown the enhancing effect of context on the processing of a word before it is directly fixated (parafoveal processing of words; Balota et al., 1985; Balota & Rayner, 1983; Ehrlich & Rayner, 1981). Here, we examined whether scene context influences the parafoveal processing of objects and enhances the extraction of object information. Using a modified boundary paradigm (Rayner, 1975), the Dot-Boundary paradigm, participants fixated on a suddenly-onsetting cue before the preview object would onset 4° away. The preview object could be identical to the target, visually similar, visually dissimilar, or a control (black rectangle). The preview changed to the target object once a saccade toward the object was made. Critically, the objects were presented on either a consistent or an inconsistent scene background. Results revealed that there was a greater processing benefit for consistent than inconsistent scene backgrounds and that identical and visually similar previews produced greater processing benefits than other previews. In the second experiment, we added an additional context condition in which the target location was inconsistent, but the scene semantics remained consistent. We found that changing the location of the target object disrupted the processing benefit derived from the consistent context. Most importantly, across both experiments, the effect of preview was not enhanced by scene context. Thus, preview information and scene context appear to independently boost the parafoveal processing of objects without any interaction from object-scene congruency.

  15. [Perception of objects and scenes in age-related macular degeneration].

    PubMed

    Tran, T H C; Boucart, M

    2012-01-01

    Vision related quality of life questionnaires suggest that patients with AMD exhibit difficulties in finding objects and in mobility. In the natural environment, objects seldom appear in isolation. They appear in a spatial context which may obscure them in part or place obstacles in the patient's path. Furthermore, the luminance of a natural scene varies as a function of the hour of the day and the light source, which can alter perception. This study aims to evaluate recognition of objects and natural scenes by patients with AMD, by using photographs of such scenes. Studies demonstrate that AMD patients are able to categorize scenes as nature scenes or urban scenes and to discriminate indoor from outdoor scenes with a high degree of precision. They detect objects better in isolation, in color, or against a white background than in their natural contexts. These patients encounter more difficulties than normally sighted individuals in detecting objects in a low-contrast, black-and-white scene. These results may have implications for rehabilitation, for layout of texts and magazines for the reading-impaired and for the rearrangement of the spatial environment of older AMD patients in order to facilitate mobility, finding objects and reducing the risk of falls. Copyright © 2011 Elsevier Masson SAS. All rights reserved.

  16. A view not to be missed: Salient scene content interferes with cognitive restoration

    PubMed Central

    Van der Jagt, Alexander P. N.; Craig, Tony; Brewer, Mark J.; Pearson, David G.

    2017-01-01

    Attention Restoration Theory (ART) states that built scenes place greater load on attentional resources than natural scenes. This is explained in terms of "hard" and "soft" fascination of built and natural scenes. Given a lack of direct empirical evidence for this assumption we propose that perceptual saliency of scene content can function as an empirically derived indicator of fascination. Saliency levels were established by measuring speed of scene category detection using a Go/No-Go detection paradigm. Experiment 1 shows that built scenes are more salient than natural scenes. Experiment 2 replicates these findings using greyscale images, ruling out a colour-based response strategy, and additionally shows that built objects in natural scenes affect saliency to a greater extent than the reverse. Experiment 3 demonstrates that the saliency of scene content is directly linked to cognitive restoration using an established restoration paradigm. Overall, these findings demonstrate an important link between the saliency of scene content and related cognitive restoration. PMID:28723975

  17. A view not to be missed: Salient scene content interferes with cognitive restoration.

    PubMed

    Van der Jagt, Alexander P N; Craig, Tony; Brewer, Mark J; Pearson, David G

    2017-01-01

    Attention Restoration Theory (ART) states that built scenes place greater load on attentional resources than natural scenes. This is explained in terms of "hard" and "soft" fascination of built and natural scenes. Given a lack of direct empirical evidence for this assumption we propose that perceptual saliency of scene content can function as an empirically derived indicator of fascination. Saliency levels were established by measuring speed of scene category detection using a Go/No-Go detection paradigm. Experiment 1 shows that built scenes are more salient than natural scenes. Experiment 2 replicates these findings using greyscale images, ruling out a colour-based response strategy, and additionally shows that built objects in natural scenes affect saliency to a greater extent than the reverse. Experiment 3 demonstrates that the saliency of scene content is directly linked to cognitive restoration using an established restoration paradigm. Overall, these findings demonstrate an important link between the saliency of scene content and related cognitive restoration.

  18. The role of temporo-parietal junction (TPJ) in global Gestalt perception.

    PubMed

    Huberle, Elisabeth; Karnath, Hans-Otto

    2012-07-01

    Grouping processes enable the coherent perception of our environment. A number of brain areas have been suggested to be involved in the integration of elements into objects, including early and higher visual areas along the ventral visual pathway as well as motion-processing areas of the dorsal visual pathway. However, integration is required not only for the cortical representation of individual objects; it is also essential for the perception of more complex visual scenes consisting of several different objects and/or shapes. The present fMRI experiments aimed to address such integration processes. We investigated the neural correlates underlying the global Gestalt perception of hierarchically organized stimuli that allowed parametric degrading of the object at the global level. The comparison of intact versus disturbed perception of the global Gestalt revealed a network of cortical areas including the temporo-parietal junction (TPJ), anterior cingulate cortex and the precuneus. The TPJ location corresponds well with the areas known to be typically lesioned in stroke patients with simultanagnosia following bilateral brain damage. These patients typically show a deficit in identifying the global Gestalt of a visual scene. Further, we found the closest relation between behavioral performance and fMRI activation for the TPJ. Our data thus argue for a significant role of the TPJ in human global Gestalt perception.

  19. Cultural differences in the lateral occipital complex while viewing incongruent scenes

    PubMed Central

    Yang, Yung-Jui; Goh, Joshua; Hong, Ying-Yi; Park, Denise C.

    2010-01-01

    Converging behavioral and neuroimaging evidence indicates that culture influences the processing of complex visual scenes. Whereas Westerners focus on central objects and tend to ignore context, East Asians process scenes more holistically, attending to the context in which objects are embedded. We investigated cultural differences in contextual processing by manipulating the congruence of visual scenes presented in an fMR-adaptation paradigm. We hypothesized that East Asians would show greater adaptation to incongruent scenes, consistent with their tendency to process contextual relationships more extensively than Westerners. Sixteen Americans and 16 native Chinese were scanned while viewing sets of pictures consisting of a focal object superimposed upon a background scene. In half of the pictures objects were paired with congruent backgrounds, and in the other half objects were paired with incongruent backgrounds. We found that within both the right and left lateral occipital complexes, Chinese participants showed significantly greater adaptation to incongruent scenes than to congruent scenes relative to American participants. These results suggest that Chinese were more sensitive to contextual incongruity than were Americans and that they reacted to incongruent object/background pairings by focusing greater attention on the object. PMID:20083532

  20. Object-Based Attention and Cognitive Tunneling

    ERIC Educational Resources Information Center

    Jarmasz, Jerzy; Herdman, Chris M.; Johannsdottir, Kamilla Run

    2005-01-01

    Simulator-based research has shown that pilots cognitively tunnel their attention on head-up displays (HUDs). Cognitive tunneling has been linked to object-based visual attention on the assumption that HUD symbology is perceptually grouped into an object that is perceived and attended separately from the external scene. The present research…

  1. Machine learning-based augmented reality for improved surgical scene understanding.

    PubMed

    Pauly, Olivier; Diotte, Benoit; Fallavollita, Pascal; Weidert, Simon; Euler, Ekkehard; Navab, Nassir

    2015-04-01

    In orthopedic and trauma surgery, AR technology can support surgeons in the challenging task of understanding the spatial relationships between the anatomy, the implants and their tools. In this context, we propose a novel augmented visualization of the surgical scene that intelligently mixes the different sources of information provided by a mobile C-arm combined with a Kinect RGB-Depth sensor. Therefore, we introduce a learning-based paradigm that aims at (1) identifying the relevant objects or anatomy in both Kinect and X-ray data, and (2) creating an object-specific pixel-wise alpha map that permits relevance-based fusion of the video and the X-ray images within one single view. In 12 simulated surgeries, we show very promising results, aimed at providing surgeons with a better understanding of the surgical scene as well as improved depth perception. Copyright © 2014 Elsevier Ltd. All rights reserved.
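
    The relevance-based fusion itself reduces to pixel-wise alpha compositing. A minimal sketch follows, with the learned object-specific alpha map taken as a given input (the learning step is not reproduced here).

        # Pixel-wise alpha fusion of a video frame and an X-ray image.
        import numpy as np

        def fuse(video_rgb, xray_gray, alpha):
            """video_rgb: (H, W, 3) in [0, 1]; xray_gray: (H, W); alpha: (H, W) in [0, 1]."""
            xray_rgb = np.repeat(xray_gray[..., None], 3, axis=2)   # grayscale to RGB
            return alpha[..., None] * video_rgb + (1 - alpha[..., None]) * xray_rgb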

  2. Stereo-motion cooperation and the use of motion disparity in the visual perception of 3-D structure.

    PubMed

    Cornilleau-Pérès, V; Droulez, J

    1993-08-01

    When an observer views a moving scene binocularly, both motion parallax and binocular disparity provide depth information. In Experiments 1A-1C, we measured sensitivity to surface curvature when these depth cues were available either individually or simultaneously. When the depth cues yielded comparable sensitivity to surface curvature, we found that curvature detection was easier with the cues present simultaneously, rather than individually. For 2 of the 6 subjects, this effect was stronger when the component of frontal translation of the surface was vertical, rather than horizontal. No such anisotropy was found for the 4 other subjects. If a moving object is observed binocularly, the patterns of optic flow are different on the left and right retinae. We have suggested elsewhere (Cornilleau-Pérès & Droulez, in press) that this motion disparity might be used as a visual cue for the perception of a 3-D structure. Our model consisted in deriving binocular disparity from the left and right distributions of vertical velocities, rather than from luminous intensities, as has been done in classical studies on stereoscopic vision. The model led to some predictions concerning the detection of surface curvature from motion disparity in the presence or absence of intensity-based disparity (classically termed binocular disparity). In a second set of experiments, we attempted to test these predictions, and we failed to validate our theoretical scheme from a physiological point of view.

  3. Speed and direction changes induce the perception of animacy in 7-month-old infants

    PubMed Central

    Träuble, Birgit; Pauen, Sabina; Poulin-Dubois, Diane

    2014-01-01

    A large body of research has documented infants’ ability to classify animate and inanimate objects based on static or dynamic information. It has been shown that infants less than 1 year of age transfer animacy-specific expectations from dynamic point-light displays to static images. The present study examined whether basic motion cues that typically trigger judgments of perceptual animacy in older children and adults lead 7-month-olds to infer an ambiguous object’s identity from dynamic information. Infants were tested with a novel paradigm that required inferring the animacy status of an ambiguous moving shape. An ambiguous shape emerged from behind a screen and its identity could only be inferred from its motion. Its motion pattern varied distinctively between scenes: it either changed speed and direction in an animate way, or it moved along a straight path at a constant speed (i.e., in an inanimate way). At test, the identity of the shape was revealed and it was either consistent or inconsistent with its motion pattern. Infants looked longer on trials with the inconsistent outcome. We conclude that 7-month-olds’ representations of animates and inanimates include category-specific associations between static and dynamic attributes. Moreover, these associations seem to hold for simple dynamic cues that are considered minimal conditions for animacy perception. PMID:25346712

  4. Hyperspectral imaging simulation of object under sea-sky background

    NASA Astrophysics Data System (ADS)

    Wang, Biao; Lin, Jia-xuan; Gao, Wei; Yue, Hui

    2016-10-01

    Remote sensing image simulation plays an important role in spaceborne/airborne payload demonstration and algorithm development. Hyperspectral imaging is valuable in marine monitoring and search and rescue. To meet the demand for spectral imaging of objects in complex sea scenes, a physics-based method for simulating spectral images of objects under a sea scene is proposed. By developing an imaging simulation model that accounts for the object, background, atmospheric conditions, and sensor, it is possible to examine how changes in wind speed, atmospheric conditions, and other environmental factors affect spectral image quality in complex sea scenes. Firstly, the sea scattering model is established based on the Phillips sea spectrum model, rough-surface scattering theory, and the volume scattering characteristics of water. The measured bidirectional reflectance distribution function (BRDF) data of objects are fitted to a statistical model. MODTRAN software is used to obtain the solar illumination on the sea, the sky brightness, the atmospheric transmittance from sea to sensor, and the atmospheric backscattered radiance, and a Monte Carlo ray-tracing method is used to calculate the composite scattering of the sea-surface object and its spectral image. Finally, the object spectral image is obtained through spatial transformation, radiometric degradation, and the addition of noise. The model connects the spectral image with the environmental parameters, the object parameters, and the sensor parameters, providing a tool for payload demonstration and algorithm development.

  5. New insights into the role of motion and form vision in neurodevelopmental disorders.

    PubMed

    Johnston, Richard; Pitchford, Nicola J; Roach, Neil W; Ledgeway, Timothy

    2017-12-01

    A selective deficit in processing the global (overall) motion, but not form, of spatially extensive objects in the visual scene is frequently associated with several neurodevelopmental disorders, including preterm birth. Existing theories proposed to explain the origin of this visual impairment are, however, challenged by recent research. In this review, we explore alternative hypotheses for why deficits in the processing of global motion, relative to global form, might arise. We describe recent evidence that has utilised novel tasks of global motion and global form to elucidate the underlying nature of the visual deficit reported in different neurodevelopmental disorders. We also examine the role of IQ and how the sex of an individual can influence performance on these tasks, as these are factors that are associated with performance on global motion tasks, but have not been systematically controlled for in previous studies exploring visual processing in clinical populations. Finally, we suggest that a new theoretical framework is needed for visual processing in neurodevelopmental disorders and present recommendations for future research. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.

  6. The portrayal of coma in contemporary motion pictures.

    PubMed

    Wijdicks, Eelco F M; Wijdicks, Coen A

    2006-05-09

    Coma has been a theme of screenplays in motion pictures, but there is no information about its accuracy. The authors reviewed 30 movies from 1970 to 2004 with actors depicting prolonged coma. Accurate depiction of comatose patients was defined by appearance, the complexity of care, accurate cause of coma and probability of awakening, and appropriate compassionate discussion between the physician and family members. Twenty-two key scenes from 17 movies were rated for accuracy by a panel of neurointensivists and neuroscience nurses and then were shown to 72 nonmedical viewers. Accuracy of the scenes was assessed using a Likert scale. Coma was most often caused by motor vehicle accidents or violence (63%). The time in a comatose state varied from days to 10 years. Awakening occurred in 18 of 30 motion pictures (60%). Awakening was sudden, with cognition intact, even after prolonged time in a coma. Actors personified "Sleeping Beauty" (eyes closed, beautifully groomed). Physicians appeared as caricatures. Only two movies had a reasonably accurate representation (Dream Life of Angels and Reversal of Fortune). The majority of the surveyed viewers identified inaccuracy in the representation of coma, awakenings, and conversations on the experience of being in a coma, except in 8 of the 22 scenes (36%). Twenty-eight of the 72 viewers (39%) could potentially allow these scenes to influence decisions in real life. Misrepresentation of coma and awakening was common in motion pictures and impacted the public perception of coma. Neurologic advice regarding prolonged coma is needed.

  7. Semantic memory for contextual regularities within and across scene categories: evidence from eye movements.

    PubMed

    Brockmole, James R; Le-Hoa Võ, Melissa

    2010-10-01

    When encountering familiar scenes, observers can use item-specific memory to facilitate the guidance of attention to objects appearing in known locations or configurations. Here, we investigated how memory for relational contingencies that emerge across different scenes can be exploited to guide attention. Participants searched for letter targets embedded in pictures of bedrooms. In a between-subjects manipulation, targets were either always on a bed pillow or randomly positioned. When targets were systematically located within scenes, search for targets became more efficient. Importantly, this learning transferred to bedrooms without pillows, ruling out learning that is based on perceptual contingencies. Learning also transferred to living room scenes, but it did not transfer to kitchen scenes, even though both scene types contained pillows. These results suggest that statistical regularities abstracted across a range of stimuli are governed by semantic expectations regarding the presence of target-predicting local landmarks. Moreover, explicit awareness of these contingencies led to a central tendency bias in recall memory for precise target positions that is similar to the spatial category effects observed in landmark memory. These results broaden the scope of conditions under which contextual cuing operates and demonstrate how semantic memory plays a causal and independent role in the learning of associations between objects in real-world scenes.

  8. Guidance of Attention to Objects and Locations by Long-Term Memory of Natural Scenes

    ERIC Educational Resources Information Center

    Becker, Mark W.; Rasmussen, Ian P.

    2008-01-01

    Four flicker change-detection experiments demonstrate that scene-specific long-term memory guides attention to both behaviorally relevant locations and objects within a familiar scene. Participants performed an initial block of change-detection trials, detecting the addition of an object to a natural scene. After a 30-min delay, participants…

  9. Real-time people counting system using a single video camera

    NASA Astrophysics Data System (ADS)

    Lefloch, Damien; Cheikh, Faouzi A.; Hardeberg, Jon Y.; Gouton, Pierre; Picot-Clemente, Romain

    2008-02-01

    There is growing interest in video-based solutions for people monitoring and counting in business and security applications. Compared to classic sensor-based solutions, video-based ones allow for more versatile functionality and improved performance at lower cost. In this paper, we propose a real-time system for people counting based on a single low-end, non-calibrated video camera. The two main challenges addressed in this paper are robust estimation of the scene background and of the number of real persons in merge-split scenarios. The latter is likely to occur whenever multiple persons move closely together, e.g. in shopping centers. Several persons may be considered a single person by automatic segmentation algorithms, due to occlusions or shadows, leading to under-counting. Therefore, to account for noise, illumination changes, and changes in static objects, background subtraction is performed using an adaptive background model (updated over time based on motion information) and automatic thresholding. Furthermore, post-processing of the segmentation results is performed in the HSV color space to remove shadows. Moving objects are tracked using an adaptive Kalman filter, allowing a robust estimation of the objects' future positions even under heavy occlusion. The system is implemented in Matlab and gives encouraging results even at high frame rates. Experimental results obtained on the PETS2006 datasets are presented at the end of the paper.
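
    A minimal sketch of such an adaptive background model follows, assuming a simple running-average update gated by the detected motion mask; the learning rate and threshold are illustrative values, not the paper's calibration.

        # Running-average background model with motion-gated updates.
        import numpy as np

        def update_background(bg, frame, fg_mask, rate=0.02):
            """Update the background only where no motion was detected."""
            bg = bg.copy()
            still = ~fg_mask
            bg[still] = (1 - rate) * bg[still] + rate * frame[still]
            return bg

        def segment_foreground(bg, frame, thresh=25.0):
            """Threshold the absolute difference between frame and background."""
            return np.abs(frame.astype(float) - bg) > thresh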

  10. A Drastic Change in Background Luminance or Motion Degrades the Preview Benefit.

    PubMed

    Osugi, Takayuki; Murakami, Ikuya

    2017-01-01

    When some distractors (old items) precede some others (new items) in an inefficient visual search task, the search is restricted to new items, and yields a phenomenon termed the preview benefit. It has recently been demonstrated that, in this preview search task, the onset of repetitive changes in the background disrupts the preview benefit, whereas a single transient change in the background does not. In the present study, we explored this effect with dynamic background changes occurring in the context of realistic scenes, to examine the robustness and usefulness of visual marking. We examined whether preview benefit in a preview search task survived through task-irrelevant changes in the scene, namely a luminance change and the initiation of coherent motion, both occurring in the background. Luminance change of the background disrupted preview benefit if it was synchronized with the onset of the search display. Furthermore, although the presence of coherent background motion per se did not affect preview benefit, its synchronized initiation with the onset of the search display did disrupt preview benefit if the motion speed was sufficiently high. These results suggest that visual marking can be destroyed by a transient event in the scene if that event is sufficiently drastic.

  11. Recognition and attention guidance during contextual cueing in real-world scenes: evidence from eye movements.

    PubMed

    Brockmole, James R; Henderson, John M

    2006-07-01

    When confronted with a previously encountered scene, what information is used to guide search to a known target? We contrasted the role of a scene's basic-level category membership with its specific arrangement of visual properties. Observers were repeatedly shown photographs of scenes that contained consistently but arbitrarily located targets, allowing target positions to be associated with scene content. Learned scenes were then unexpectedly mirror reversed, spatially translating visual features as well as the target across the display while preserving the scene's identity and concept. Mirror reversals produced a cost as the eyes initially moved toward the position in the display in which the target had previously appeared. The cost was not complete, however; when initial search failed, the eyes were quickly directed to the target's new position. These results suggest that in real-world scenes, shifts of attention are initially based on scene identity, and subsequent shifts are guided by more detailed information regarding scene and object layout.

  12. Perception of 3D spatial relations for 3D displays

    NASA Astrophysics Data System (ADS)

    Rosen, Paul; Pizlo, Zygmunt; Hoffmann, Christoph; Popescu, Voicu S.

    2004-05-01

    We test perception of 3D spatial relations in 3D images rendered by a 3D display (Perspecta from Actuality Systems) and compare it to that of a high-resolution flat panel display. 3D images provide the observer with such depth cues as motion parallax and binocular disparity. Our 3D display is a device that renders a 3D image by displaying, in rapid succession, radial slices through the scene on a rotating screen. The image is contained in a glass globe and can be viewed from virtually any direction. In the psychophysical experiment several families of 3D objects are used as stimuli: primitive shapes (cylinders and cuboids), and complex objects (multi-story buildings, cars, and pieces of furniture). Each object has at least one plane of symmetry. On each trial an object or its "distorted" version is shown at an arbitrary orientation. The distortion is produced by stretching an object in a random direction by 40%. This distortion must eliminate the symmetry of an object. The subject's task is to decide whether or not the presented object is distorted under several viewing conditions (monocular/binocular, with/without motion parallax, and near/far). The subject's performance is measured by the discriminability d', which is a conventional dependent variable in signal detection experiments.
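
    For reference, the conventional signal-detection definition of the discriminability index used in such experiments is

        d' = \Phi^{-1}(H) - \Phi^{-1}(F)

    where H is the hit rate, F is the false-alarm rate, and \Phi^{-1} is the inverse of the standard normal cumulative distribution function.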

  13. Subliminal encoding and flexible retrieval of objects in scenes.

    PubMed

    Wuethrich, Sergej; Hannula, Deborah E; Mast, Fred W; Henke, Katharina

    2018-04-27

    Our episodic memory stores what happened when and where in life. Episodic memory requires the rapid formation and flexible retrieval of where things are located in space. Consciousness of the encoding scene is considered crucial for episodic memory formation. Here, we question the necessity of consciousness and hypothesize that humans can form unconscious episodic memories. Participants were presented with subliminal scenes, i.e., scenes invisible to the conscious mind. The scenes displayed objects at certain locations for participants to form unconscious object-in-space memories. Later, the same scenes were presented supraliminally, i.e., visibly, for retrieval testing. Scenes were presented absent the objects and rotated by 90°-270° in perspective to assess the representational flexibility of unconsciously formed memories. During the test phase, participants performed a forced-choice task that required them to place an object in one of two highlighted scene locations and their eye movements were recorded. Evaluation of the eye tracking data revealed that participants remembered object locations unconsciously, irrespective of changes in viewing perspective. This effect of gaze was related to correct placements of objects in scenes, and an intuitive decision style was necessary for unconscious memories to influence intentional behavior to a significant degree. We conclude that conscious perception is not mandatory for spatial episodic memory formation. This article is protected by copyright. All rights reserved. © 2018 Wiley Periodicals, Inc.

  14. Real-Time Motion Tracking for Indoor Moving Sphere Objects with a LiDAR Sensor.

    PubMed

    Huang, Lvwen; Chen, Siyuan; Zhang, Jianfeng; Cheng, Bang; Liu, Mingqing

    2017-08-23

    Object tracking is a crucial research subfield in computer vision, with wide applications in navigation, robotics, military systems, and other areas. In this paper, real-time visualization of 3D point-cloud data from the VLP-16 3D Light Detection and Ranging (LiDAR) sensor is achieved; on the basis of preprocessing, fast ground segmentation, Euclidean clustering of outliers, Viewpoint Feature Histogram (VFH) feature extraction, object-model construction, and search-and-match of a moving spherical target, a Kalman filter and an adaptive particle filter are used to estimate the position of the moving spherical target in real time. Experimental results on three kinds of scenes, under conditions of partial target occlusion and interference, different moving speeds, and different trajectories, show that the Kalman filter offers high efficiency while the adaptive particle filter offers high robustness and high precision. The research can be applied to fruit identification and tracking in natural environments, robot navigation and control, and other fields.
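
    For illustration, here is a minimal constant-velocity Kalman filter over the sphere's 3D position, with placeholder noise covariances; the paper's tuned values and the adaptive particle filter are not reproduced here.

        # Constant-velocity Kalman filter; state = (x, y, z, vx, vy, vz).
        import numpy as np

        def make_cv_model(dt):
            F = np.eye(6)                                   # state transition
            F[:3, 3:] = dt * np.eye(3)
            H = np.hstack([np.eye(3), np.zeros((3, 3))])    # observe position only
            return F, H

        def kalman_step(x, P, z, F, H, q=1e-3, r=1e-2):
            x = F @ x                                       # predict
            P = F @ P @ F.T + q * np.eye(6)
            S = H @ P @ H.T + r * np.eye(3)                 # innovation covariance
            K = P @ H.T @ np.linalg.inv(S)                  # Kalman gain
            x = x + K @ (z - H @ x)                         # update with LiDAR centroid z
            P = (np.eye(6) - K @ H) @ P
            return x, P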

  15. Camouflage and visual perception

    PubMed Central

    Troscianko, Tom; Benton, Christopher P.; Lovell, P. George; Tolhurst, David J.; Pizlo, Zygmunt

    2008-01-01

    How does an animal conceal itself from visual detection by other animals? This review paper seeks to identify general principles that may apply in this broad area. It considers mechanisms of visual encoding, of grouping and object encoding, and of search. In most cases, the evidence base comes from studies of humans or species whose vision approximates to that of humans. The effort is hampered by a relatively sparse literature on visual function in natural environments and with complex foraging tasks. However, some general constraints emerge as being potentially powerful principles in understanding concealment—a ‘constraint’ here means a set of simplifying assumptions. Strategies that disrupt the unambiguous encoding of discontinuities of intensity (edges), and of other key visual attributes, such as motion, are key here. Similar strategies may also defeat grouping and object-encoding mechanisms. Finally, the paper considers how we may understand the processes of search for complex targets in complex scenes. The aim is to provide a number of pointers towards issues, which may be of assistance in understanding camouflage and concealment, particularly with reference to how visual systems can detect the shape of complex, concealed objects. PMID:18990671

  16. Real-Time Motion Tracking for Indoor Moving Sphere Objects with a LiDAR Sensor

    PubMed Central

    Chen, Siyuan; Zhang, Jianfeng; Cheng, Bang; Liu, Mingqing

    2017-01-01

    Object tracking is a crucial research subfield in computer vision, with wide applications in navigation, robotics, military systems, and other areas. In this paper, real-time visualization of 3D point-cloud data from the VLP-16 3D Light Detection and Ranging (LiDAR) sensor is achieved; on the basis of preprocessing, fast ground segmentation, Euclidean clustering of outliers, Viewpoint Feature Histogram (VFH) feature extraction, object-model construction, and search-and-match of a moving spherical target, a Kalman filter and an adaptive particle filter are used to estimate the position of the moving spherical target in real time. Experimental results on three kinds of scenes, under conditions of partial target occlusion and interference, different moving speeds, and different trajectories, show that the Kalman filter offers high efficiency while the adaptive particle filter offers high robustness and high precision. The research can be applied to fruit identification and tracking in natural environments, robot navigation and control, and other fields. PMID:28832520

  17. [Visual representation of natural scenes in flicker changes].

    PubMed

    Nakashima, Ryoichi; Yokosawa, Kazuhiko

    2010-08-01

    Coherence theory in scene perception (Rensink, 2002) assumes the retention of volatile object representations on which attention is not focused. On the other hand, visual memory theory in scene perception (Hollingworth & Henderson, 2002) assumes that robust object representations are retained. In this study, we hypothesized that the difference between these two theories is derived from the difference of the experimental tasks that they are based on. In order to verify this hypothesis, we examined the properties of visual representation by using a change detection and memory task in a flicker paradigm. We measured the representations when participants were instructed to search for a change in a scene, and compared them with the intentional memory representations. The visual representations were retained in visual long-term memory even in the flicker paradigm, and were as robust as the intentional memory representations. However, the results indicate that the representations are unavailable for explicitly localizing a scene change, but are available for answering the recognition test. This suggests that coherence theory and visual memory theory are compatible.

  18. Extraction of edge-based and region-based features for object recognition

    NASA Astrophysics Data System (ADS)

    Coutts, Benjamin; Ravi, Srinivas; Hu, Gongzhu; Shrikhande, Neelima

    1993-08-01

    One of the central problems of computer vision is object recognition. A catalogue of model objects is described as a set of features such as edges and surfaces. The same features are extracted from the scene and matched against the models for object recognition. Edges and surfaces extracted from the scenes are often noisy and imperfect. In this paper, algorithms are described for improving low-level edge and surface features. Existing edge extraction algorithms are applied to the intensity image to obtain edge features. Initial edges are traced by following the directions of the current contour. These are improved by using corresponding depth and intensity information for decision making at branch points. Surface fitting routines are applied to the range image to obtain planar surface patches. A region-growing algorithm is developed that starts with a coarse segmentation and uses quadric surface fitting to iteratively merge adjacent regions into quadric surfaces based on approximate orthogonal distance regression. Surface information obtained is returned to the edge extraction routine to detect and remove fake edges. This process repeats until no more merging or edge improvement can take place. Both synthetic (with Gaussian noise) and real images containing multiple-object scenes have been tested using the merging criteria. Results appeared quite encouraging.
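
    A minimal sketch of such a merging criterion follows, simplified to planar least-squares fits with an RMS residual standing in for quadric fitting with approximate orthogonal distance regression; the tolerance is an assumption.

        # Merge two regions if their union is still well fit by one surface.
        import numpy as np

        def plane_fit_error(points):
            """RMS residual of the best-fit plane z = ax + by + c to Nx3 points."""
            A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
            coeffs, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
            return np.sqrt(np.mean((A @ coeffs - points[:, 2]) ** 2))

        def try_merge(region_a, region_b, tol=0.01):
            merged = np.vstack([region_a, region_b])
            return merged if plane_fit_error(merged) < tol else None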

  19. Staging Scenes of Co-Cultural Communication: Acting out Aspects of Marginalized and Dominant Identities

    ERIC Educational Resources Information Center

    Root, Elizabeth

    2018-01-01

    Courses: Intercultural Communication, Interracial Communication, or an Interpersonal Communication class that covers co-cultural theory. Objectives: Students will be able to demonstrate a practical application of co-cultural theory by creating scenes that illustrate different communicative approaches and desired outcomes based on communication…

  20. Finding and recognizing objects in natural scenes: complementary computations in the dorsal and ventral visual systems

    PubMed Central

    Rolls, Edmund T.; Webb, Tristan J.

    2014-01-01

    Searching for and recognizing objects in complex natural scenes is implemented by multiple saccades until the eyes reach within the reduced receptive field sizes of inferior temporal cortex (IT) neurons. We analyze and model how the dorsal and ventral visual streams both contribute to this. Saliency detection in the dorsal visual system including area LIP is modeled by graph-based visual saliency, and allows the eyes to fixate potential objects within several degrees. Visual information at the fixated location subtending approximately 9° corresponding to the receptive fields of IT neurons is then passed through a four layer hierarchical model of the ventral cortical visual system, VisNet. We show that VisNet can be trained using a synaptic modification rule with a short-term memory trace of recent neuronal activity to capture both the required view and translation invariances to allow in the model approximately 90% correct object recognition for 4 objects shown in any view across a range of 135° anywhere in a scene. The model was able to generalize correctly within the four trained views and the 25 trained translations. This approach analyses the principles by which complementary computations in the dorsal and ventral visual cortical streams enable objects to be located and recognized in complex natural scenes. PMID:25161619
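
    The synaptic modification rule mentioned here combines a Hebbian update with an exponentially decaying memory trace of recent postsynaptic activity. A minimal sketch follows; the constants are illustrative, not the paper's parameters.

        # Trace learning rule: Hebbian update weighted by a short-term trace.
        import numpy as np

        def trace_update(w, x, y, trace_prev, alpha=0.05, eta=0.8):
            """w: weights; x: presynaptic firing; y: postsynaptic firing;
            trace_prev: trace of recent postsynaptic activity."""
            trace = (1 - eta) * y + eta * trace_prev    # exponential memory trace
            w = w + alpha * trace * x                   # Hebbian update with trace
            return w / np.linalg.norm(w), trace         # keep weight vector bounded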

  1. Small-size pedestrian detection in large scene based on fast R-CNN

    NASA Astrophysics Data System (ADS)

    Wang, Shengke; Yang, Na; Duan, Lianghua; Liu, Lu; Dong, Junyu

    2018-04-01

    Pedestrian detection is a canonical sub-problem of object detection that has been in high demand in recent years. Although recent deep learning object detectors such as Fast/Faster R-CNN have shown excellent performance for general object detection, they have had limited success at detecting small pedestrians in large-view scenes. We find that the insufficient resolution of feature maps leads to unsatisfactory accuracy when handling small instances. In this paper, we investigate issues involving Fast R-CNN for pedestrian detection. Driven by these observations, we propose a very simple but effective baseline for pedestrian detection based on Fast R-CNN, employing the DPM detector to generate accurate proposals and training a Fast R-CNN style network to jointly optimize small-size pedestrian detection, with skip connections concatenating features from different layers to address the coarseness of the feature maps. Accuracy for small-size pedestrian detection in real large-view scenes is thereby improved in our research.
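
    The skip-connection idea amounts to upsampling a coarse, semantically rich feature map and concatenating it with a finer one before the detection head. A toy sketch with hypothetical layer names and shapes:

        # Nearest-neighbor upsample of a deep map, concatenated with a shallow map.
        import numpy as np

        def concat_skip(shallow, deep):
            """shallow: (H, W, C1) fine features; deep: (h, w, C2) coarse features."""
            sy = shallow.shape[0] // deep.shape[0]
            sx = shallow.shape[1] // deep.shape[1]
            upsampled = deep.repeat(sy, axis=0).repeat(sx, axis=1)
            return np.concatenate([shallow, upsampled], axis=2)

        conv3 = np.random.rand(64, 64, 256)     # finer, higher-resolution map
        conv5 = np.random.rand(16, 16, 512)     # coarser, more semantic map
        print(concat_skip(conv3, conv5).shape)  # (64, 64, 768)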

  2. Scene recognition following locomotion around a scene.

    PubMed

    Motes, Michael A; Finlay, Cory A; Kozhevnikov, Maria

    2006-01-01

    Effects of locomotion on scene-recognition reaction time (RT) and accuracy were studied. In experiment 1, observers memorized an 11-object scene and made scene-recognition judgments on subsequently presented scenes from the encoded view or different views (i.e., scenes were rotated or observers moved around the scene, both from 40 degrees to 360 degrees). In experiment 2, observers viewed different 5-object scenes on each trial and made scene-recognition judgments from the encoded view or after moving around the scene, from 36 degrees to 180 degrees. Across experiments, scene-recognition RT increased (and in experiment 2 accuracy decreased) with angular distance between encoded and judged views, regardless of how the viewpoint changes occurred. The findings raise questions about the conditions in which locomotion produces spatially updated representations of scenes.

  3. Increasing Student Engagement and Enthusiasm: A Projectile Motion Crime Scene

    NASA Astrophysics Data System (ADS)

    Bonner, David

    2010-05-01

    Connecting physics concepts with real-world events allows students to establish a strong conceptual foundation. When such events are particularly interesting to students, it can greatly impact their engagement and enthusiasm in an activity. Activities that involve studying real-world events of high interest can provide students a long-lasting understanding and positive memorable experiences, both of which heighten the learning experiences of those students. One such activity, described in depth in this paper, utilizes a murder mystery and crime scene investigation as an application of basic projectile motion.
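
    As a flavor of the computation students perform in such an activity, consider recovering a launch speed from a horizontal launch at a known height; the numbers below are invented for illustration.

        # Horizontal launch: fall time from h = (1/2) g t^2, speed from range = v0 * t.
        import math

        g = 9.8                         # m/s^2
        height = 1.2                    # launch height in m (made up)
        rng = 3.0                       # horizontal distance travelled in m (made up)

        t = math.sqrt(2 * height / g)   # time to fall to the ground
        v0 = rng / t                    # horizontal launch speed
        print(f"flight time {t:.2f} s, launch speed {v0:.2f} m/s")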

  4. Individual differences in visual motion perception and neurotransmitter concentrations in the human brain.

    PubMed

    Takeuchi, Tatsuto; Yoshimoto, Sanae; Shimada, Yasuhiro; Kochiyama, Takanori; Kondo, Hirohito M

    2017-02-19

    Recent studies have shown that interindividual variability can be a rich source of information regarding the mechanisms of human visual perception. In this study, we examined the mechanisms underlying interindividual variability in the perception of visual motion, one of the fundamental components of visual scene analysis, by measuring neurotransmitter concentrations using magnetic resonance spectroscopy. First, by psychophysically examining two types of motion phenomena, motion assimilation and motion contrast, we found that, following the presentation of the same stimulus, some participants perceived motion assimilation, while others perceived motion contrast. Furthermore, we found that the concentration of the excitatory neurotransmitter glutamate-glutamine (Glx) in the dorsolateral prefrontal cortex (Brodmann area 46) was positively correlated with a participant's tendency toward motion assimilation over motion contrast; however, this effect was not observed in the visual areas. The concentration of the inhibitory neurotransmitter γ-aminobutyric acid had only a weak effect compared with that of Glx. We conclude that an excitatory process in this suprasensory area is important for an individual's tendency in determining antagonistically perceived visual motion phenomena. This article is part of the themed issue 'Auditory and visual scene analysis'. © 2017 The Author(s).

  5. Statistics of high-level scene context.

    PubMed

    Greene, Michelle R

    2013-01-01

    Context is critical for recognizing environments and for searching for objects within them: contextual associations have been shown to modulate reaction time and object recognition accuracy, as well as influence the distribution of eye movements and patterns of brain activations. However, we have not yet systematically quantified the relationships between objects and their scene environments. Here I seek to fill this gap by providing descriptive statistics of object-scene relationships. A total of 48,167 objects were hand-labeled in 3499 scenes using the LabelMe tool (Russell et al., 2008). From these data, I computed a variety of descriptive statistics at three different levels of analysis: the ensemble statistics that describe the density and spatial distribution of unnamed "things" in the scene; the bag of words level where scenes are described by the list of objects contained within them; and the structural level where the spatial distribution and relationships between the objects are measured. The utility of each level of description for scene categorization was assessed through the use of linear classifiers, and the plausibility of each level for modeling human scene categorization is discussed. Of the three levels, ensemble statistics were found to be the most informative (per feature), and also best explained human patterns of categorization errors. Although a bag of words classifier had similar performance to human observers, it had a markedly different pattern of errors. However, certain objects are more useful than others, and ceiling classification performance could be achieved using only the 64 most informative objects. As object location tends not to vary as a function of category, structural information provided little additional information. Additionally, these data provide valuable information on natural scene redundancy that can be exploited for machine vision, and can help the visual cognition community to design experiments guided by statistics rather than intuition.

  6. Motor mapping of implied actions during perception of emotional body language.

    PubMed

    Borgomaneri, Sara; Gazzola, Valeria; Avenanti, Alessio

    2012-04-01

    Perceiving and understanding emotional cues is critical for survival. Using the International Affective Picture System (IAPS), previous TMS studies have found that watching humans in emotional pictures increases motor excitability relative to seeing landscapes or household objects, suggesting that emotional cues may prime the body for action. Here we tested whether motor facilitation to emotional pictures may reflect the simulation of the human motor behavior implied in the pictures, occurring independently of its emotional valence. Motor-evoked potentials (MEPs) to single-pulse TMS of the left motor cortex were recorded from hand muscles during observation and categorization of emotional and neutral pictures. In Experiment 1, participants watched neutral, positive, and negative IAPS stimuli, while in Experiment 2 they watched pictures depicting emotional (joyful, fearful) human body movements, neutral body movements, and neutral static postures. Experiment 1 confirmed the increase in excitability for emotional IAPS stimuli found in previous research, but also showed that more implied motion is perceived in emotional relative to neutral scenes. Experiment 2 showed that motor excitability and implied motion scores for emotional and neutral body actions were comparable and greater than for static body postures. In keeping with embodied simulation theories, the motor response to emotional pictures may reflect the simulation of the action implied in the emotional scenes. Action simulation may occur independently of whether the observed implied action carries emotional or neutral meaning. Our study suggests the need to control for implied motion when exploring motor responses to emotional pictures of humans.

  7. Virgil Gus Grissom's Visit to LaRC

    NASA Image and Video Library

    1963-02-22

    Astronaut Virgil "Gus" Grissom at the controls of the Visual Docking Simulator. From A.W. Vogeley, "Piloted Space-Flight Simulation at Langley Research Center," Paper presented at the American Society of Mechanical Engineers 1966 Winter Meeting, New York, NY, November 27-December 1, 1966. "This facility was [later known as the Visual-Optical Simulator.] It presents to the pilot an out-the-window view of his target in correct 6 degrees of freedom motion. The scene is obtained by a television camera pick-up viewing a small-scale gimbaled model of the target." "For docking studies, the docking target picture was projected onto the surface of a 20-foot-diameter sphere and the pilot could, effectively, maneuver into contact. This facility was used in a comparison study with the Rendezvous Docking Simulator - one of the few comparison experiments in which conditions were carefully controlled and a reasonable sample of pilots used. All pilots preferred the more realistic RDS visual scene. The pilots generally liked the RDS angular motion cues although some objected to the false gravity cues that these motions introduced. Training time was shorter on the RDS, but final performance on both simulators was essentially equal." "For station-keeping studies, since close approach is not required, the target was presented to the pilot through a virtual-image system which projects his view to infinity, providing a more realistic effect. In addition to the target, the system also projects a star and horizon background."

  8. Usage of cornea and sclera back reflected images captured in security cameras for forensic and card games applications

    NASA Astrophysics Data System (ADS)

    Zalevsky, Zeev; Ilovitsh, Asaf; Beiderman, Yevgeny

    2013-10-01

    We present an approach that allows seeing objects that are hidden and not positioned in a direct line of sight with security inspection cameras. The approach is based on inspecting the back reflections obtained from the cornea and the sclera of the eyes of people attending the inspected scene who are positioned in front of the hidden objects we aim to image, after performing proper calibration with a point light source (e.g., a LED). The scene can be a forensic scene or, for instance, a casino in which the application is to see the cards of poker players sitting in front of you.

  9. Scene incongruity and attention.

    PubMed

    Mack, Arien; Clarke, Jason; Erol, Muge; Bert, John

    2017-02-01

    Does scene incongruity (a mismatch between scene gist and a semantically incongruent object) capture attention and lead to conscious perception? We explored this question using four different procedures: inattention (Experiment 1), scene description (Experiment 2), change detection (Experiment 3), and iconic memory (Experiment 4). We found no differences between scene incongruity and scene congruity in Experiments 1, 2, and 4, although in Experiment 3 change detection was faster for scenes containing an incongruent object. We offer an explanation for why the change detection results differ from the results of the other three experiments. In all four experiments, participants invariably failed to report the incongruity and routinely mis-described it by normalizing the incongruent object. None of the results supports the claim that semantic incongruity within a scene invariably captures attention; rather, they provide strong evidence of the dominant role of scene gist in determining what is perceived.

  10. Stochastic correlative firing for figure-ground segregation.

    PubMed

    Chen, Zhe

    2005-03-01

    Segregation of sensory inputs into separate objects is a central aspect of perception and arises in all sensory modalities. The figure-ground segregation problem requires identifying an object of interest in a complex scene, in many cases given binaural auditory or binocular visual observations. The computations required for visual and auditory figure-ground segregation share many common features and can be cast within a unified framework. Sensory perception can be viewed as a problem of optimizing information transmission. Here we suggest a stochastic correlative firing mechanism and an associative learning rule for figure-ground segregation in several classic sensory perception tasks, including the cocktail party problem in binaural hearing, binocular fusion of stereo images, and Gestalt grouping in motion perception.

  11. Distributed and Dynamic Neural Encoding of Multiple Motion Directions of Transparently Moving Stimuli in Cortical Area MT

    PubMed Central

    Xiao, Jianbo

    2015-01-01

    Segmenting visual scenes into distinct objects and surfaces is a fundamental visual function. To better understand the underlying neural mechanism, we investigated how neurons in the middle temporal cortex (MT) of macaque monkeys represent overlapping random-dot stimuli moving transparently in slightly different directions. It has been shown that the neuronal response elicited by two stimuli approximately follows the average of the responses elicited by the constituent stimulus components presented alone. In this scheme of response pooling, the ability to segment two simultaneously presented motion directions is limited by the width of the tuning curve to motion in a single direction. We found that, although the population-averaged neuronal tuning showed response averaging, subgroups of neurons showed distinct patterns of response tuning and were capable of representing component directions that were separated by a small angle—less than the tuning width to unidirectional stimuli. One group of neurons preferentially represented the component direction at a specific side of the bidirectional stimuli, weighting one stimulus component more strongly than the other. Another group of neurons pooled the component responses nonlinearly and showed two separate peaks in their tuning curves even when the average of the component responses was unimodal. We also show for the first time that the direction tuning of MT neurons evolved from initially representing the vector-averaged direction of slightly different stimuli to gradually representing the component directions. Our results reveal important neural processes underlying image segmentation and suggest that information about slightly different stimulus components is computed dynamically and distributed across neurons. SIGNIFICANCE STATEMENT Natural scenes often contain multiple entities. The ability to segment visual scenes into distinct objects and surfaces is fundamental to sensory processing and is crucial for generating the perception of our environment. Because cortical neurons are broadly tuned to a given visual feature, segmenting two stimuli that differ only slightly is a challenge for the visual system. In this study, we discovered that many neurons in the visual cortex are capable of representing individual components of slightly different stimuli by selectively and nonlinearly pooling the responses elicited by the stimulus components. We also show for the first time that the neural representation of individual stimulus components developed over a period of ∼70–100 ms, revealing a dynamic process of image segmentation. PMID:26658869

  12. Reconstruction and simplification of urban scene models based on oblique images

    NASA Astrophysics Data System (ADS)

    Liu, J.; Guo, B.

    2014-08-01

    We describe multi-view stereo reconstruction and simplification algorithms for urban scene models based on oblique images. The complexity, diversity, and density of objects within urban scenes make it difficult to build city models from oblique images, but urban scenes also contain many flat surfaces. One of our key contributions is a dense matching algorithm based on Self-Adaptive Patches tailored to urban scenes. The basic idea of match propagation based on Self-Adaptive Patches is to build patches centred on seed points that are already matched. The extent and shape of the patches adapt automatically to the objects of the urban scene: when the surface is flat, the patch grows larger; when the surface is rough, the patch shrinks. The other contribution is that the mesh generated by Graph Cuts is a 2-manifold surface satisfying the half-edge data structure, obtained by clustering and re-marking tetrahedrons in the s-t graph. The purpose of obtaining a 2-manifold surface is to simplify the mesh with an edge-collapse algorithm that preserves and accentuates the features of buildings.
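
    Edge collapse, the simplification step named above, can be caricatured in a few lines: repeatedly merge the two endpoints of the shortest edge and drop the triangles that degenerate. This toy version ignores the feature-preserving cost the authors use; the uniform shortest-edge ordering and midpoint merging are illustrative assumptions.

    ```python
    import numpy as np

    def collapse_shortest_edge(vertices, faces):
        """One edge-collapse step on a triangle mesh.
        vertices: (V, 3) float array; faces: list of (i, j, k) index triples."""
        # Collect the unique edges appearing in the face list.
        edges = {tuple(sorted(e)) for f in faces
                 for e in ((f[0], f[1]), (f[1], f[2]), (f[2], f[0]))}
        # Pick the shortest edge (a real simplifier would use a feature-aware cost).
        u, v = min(edges, key=lambda e: np.linalg.norm(vertices[e[0]] - vertices[e[1]]))
        vertices[u] = (vertices[u] + vertices[v]) / 2.0   # merge v into u at the midpoint
        new_faces = []
        for f in faces:
            g = tuple(u if idx == v else idx for idx in f)
            if len(set(g)) == 3:                           # drop degenerate triangles
                new_faces.append(g)
        return vertices, new_faces
    ```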

  13. On-chip visual perception of motion: a bio-inspired connectionist model on FPGA.

    PubMed

    Torres-Huitzil, César; Girau, Bernard; Castellanos-Sánchez, Claudio

    2005-01-01

    Visual motion provides useful information for understanding the dynamics of a scene and allows intelligent systems to interact with their environment. Motion computation is usually constrained by real-time requirements that call for the design and implementation of specific hardware architectures. In this paper, the design of a hardware architecture for a bio-inspired neural model for motion estimation is presented. The motion estimation is based on a strongly localized bio-inspired connectionist model with a particular adaptation of spatio-temporal Gabor-like filtering. The architecture consists of three main modules that perform spatial, temporal, and excitatory-inhibitory connectionist processing. The biomimetic architecture is modeled, simulated, and validated in VHDL. Synthesis results on a Field Programmable Gate Array (FPGA) device show that real-time performance can potentially be achieved at an affordable silicon area.
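
    A spatio-temporal Gabor-like filter of the kind mentioned here is a Gaussian envelope modulated by a drifting sinusoid. The particular form and parameters below are generic illustrations, not the model's actual filter bank.

    ```python
    import numpy as np

    def st_gabor(size=15, frames=7, theta=0.0, freq=0.15, speed=1.0,
                 sigma=3.0, tau=2.0):
        """Spatio-temporal Gabor kernel: a Gaussian envelope in (x, y, t)
        times a sinusoid drifting at `speed` pixels/frame along `theta`."""
        xs = np.arange(size) - size // 2
        ts = np.arange(frames) - frames // 2
        x, y, t = np.meshgrid(xs, xs, ts, indexing="ij")
        u = x * np.cos(theta) + y * np.sin(theta)          # axis of motion
        envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2) - t**2 / (2 * tau**2))
        carrier = np.cos(2 * np.pi * freq * (u - speed * t))
        return envelope * carrier                          # shape (size, size, frames)
    ```

    Convolving an image sequence with a bank of such kernels (varying theta and speed) yields direction- and velocity-selective responses, which is what the excitatory-inhibitory stage of the model then processes.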

  14. Scene Integration Without Awareness: No Conclusive Evidence for Processing Scene Congruency During Continuous Flash Suppression.

    PubMed

    Moors, Pieter; Boelens, David; van Overwalle, Jaana; Wagemans, Johan

    2016-07-01

    A recent study showed that scenes with an object-background relationship that is semantically incongruent break interocular suppression faster than scenes with a semantically congruent relationship. These results implied that semantic relations between the objects and the background of a scene could be extracted in the absence of visual awareness of the stimulus. In the current study, we assessed the replicability of this finding and tried to rule out an alternative explanation dependent on low-level differences between the stimuli. Furthermore, we used a Bayesian analysis to quantify the evidence in favor of the presence or absence of a scene-congruency effect. Across three experiments, we found no convincing evidence for a scene-congruency effect or a modulation of scene congruency by scene inversion. These findings question the generalizability of previous observations and cast doubt on whether genuine semantic processing of object-background relationships in scenes can manifest during interocular suppression.

  15. Interactive MPEG-4 low-bit-rate speech/audio transmission over the Internet

    NASA Astrophysics Data System (ADS)

    Liu, Fang; Kim, JongWon; Kuo, C.-C. Jay

    1999-11-01

    The recently developed MPEG-4 technology enables the coding and transmission of natural and synthetic audio-visual data in the form of objects. In an effort to extend the object-based functionality of MPEG-4 to real-time Internet applications, architectural prototypes of the multiplex layer and transport layer tailored for transmission of MPEG-4 data over IP are under debate in the Internet Engineering Task Force (IETF) and the MPEG-4 Systems Ad Hoc group. In this paper, we present an architecture for an interactive MPEG-4 speech/audio transmission system over the Internet. It utilizes a framework of Real Time Streaming Protocol (RTSP) over Real-time Transport Protocol (RTP) to provide controlled, on-demand delivery of real-time speech/audio data. Based on a client-server model, a pair of low-bit-rate bit streams (real-time speech/audio and pre-encoded speech/audio) are multiplexed and transmitted via a single RTP channel to the receiver. The MPEG-4 Scene Description (SD) and Object Descriptor (OD) bit streams are securely sent through the RTSP control channel. Upon reception, an initial MPEG-4 audio-visual scene is constructed after de-multiplexing, decoding of bit streams, and scene composition. A receiver is allowed to manipulate the initial audio-visual scene presentation locally, or to interactively arrange scene changes by sending requests to the server. A server may also choose to update the client with new streams and a list of contents for user selection.

  16. Image registration for multi-exposed HDRI and motion deblurring

    NASA Astrophysics Data System (ADS)

    Lee, Seok; Wey, Ho-Cheon; Lee, Seong-Deok

    2009-02-01

    In multi-exposure image fusion, alignment is an essential prerequisite to prevent ghost artifacts after blending. Compared to the usual matching problem, registration is more difficult when each image is captured under different photographing conditions. In HDR imaging, we use long- and short-exposure images, which differ in brightness and contain over/under-saturated regions. In motion deblurring, we use a blurred and noisy image pair, and the amount of motion blur varies from one image to another due to the different exposure times. The main difficulty is that the luminance levels of the two images are not linearly related, so we cannot perfectly equalize or normalize the brightness of each image, which leads to unstable and inaccurate alignment. To solve this problem, we apply a probabilistic measure, mutual information, to represent the similarity between images after alignment. In this paper, we describe the characteristics of multi-exposed input images from the registration point of view and analyze the magnitude of camera hand shake. By exploiting the luminance independence of mutual information, we propose a fast and practically useful image registration technique for multiple capturing. Our algorithm can be applied to extreme HDR scenes and motion-blurred scenes with over a 90% success rate, and its simplicity enables it to be embedded in digital cameras and mobile camera phones. The effectiveness of our registration algorithm is examined by various experiments on real HDR and motion-deblurring cases using a hand-held camera.
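
    Mutual information as a registration score can be computed directly from the joint histogram of the two exposures, which is why it tolerates the nonlinear brightness mapping described above. The bin count and the exhaustive shift search below are illustrative choices, not the paper's exact procedure.

    ```python
    import numpy as np

    def mutual_information(a, b, bins=32):
        """MI between two same-size grayscale images via the joint histogram.
        Higher MI means the intensities predict one another well, even when
        the brightness mapping between exposures is nonlinear."""
        joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
        pxy = joint / joint.sum()
        px = pxy.sum(axis=1, keepdims=True)   # marginal of image a
        py = pxy.sum(axis=0, keepdims=True)   # marginal of image b
        nz = pxy > 0
        return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))

    def best_shift(ref, mov, max_shift=4):
        """Brute-force integer alignment: pick the (dy, dx) maximizing MI."""
        scores = {}
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                shifted = np.roll(np.roll(mov, dy, axis=0), dx, axis=1)
                scores[(dy, dx)] = mutual_information(ref, shifted)
        return max(scores, key=scores.get)
    ```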

  17. Hierarchy-associated semantic-rule inference framework for classifying indoor scenes

    NASA Astrophysics Data System (ADS)

    Yu, Dan; Liu, Peng; Ye, Zhipeng; Tang, Xianglong; Zhao, Wei

    2016-03-01

    Typically, the initial task of classifying indoor scenes is challenging, because the spatial layout and decoration of a scene can vary considerably. Recent efforts at classifying object relationships commonly depend on the results of scene annotation and predefined rules, making classification inflexible. Furthermore, annotation results are easily affected by external factors. Inspired by human cognition, a scene-classification framework was proposed using the empirically based annotation (EBA) and a match-over rule-based (MRB) inference system. The semantic hierarchy of images is exploited by EBA to construct rules empirically for MRB classification. The problem of scene classification is divided into low-level annotation and high-level inference from a macro perspective. Low-level annotation involves detecting the semantic hierarchy and annotating the scene with a deformable-parts model and a bag-of-visual-words model. In high-level inference, hierarchical rules are extracted to train the decision tree for classification. The categories of testing samples are generated from the parts to the whole. Compared with traditional classification strategies, the proposed semantic hierarchy and corresponding rules reduce the effect of a variable background and improve the classification performance. The proposed framework was evaluated on a popular indoor scene dataset, and the experimental results demonstrate its effectiveness.

  18. Visual search for arbitrary objects in real scenes

    PubMed Central

    Alvarez, George A.; Rosenholtz, Ruth; Kuzmova, Yoana I.; Sherman, Ashley M.

    2011-01-01

    How efficient is visual search in real scenes? In searches for targets among arrays of randomly placed distractors, efficiency is often indexed by the slope of the reaction time (RT) × Set Size function. However, it may be impossible to define set size for real scenes. As an approximation, we hand-labeled 100 indoor scenes and used the number of labeled regions as a surrogate for set size. In Experiment 1, observers searched for named objects (a chair, bowl, etc.). With set size defined as the number of labeled regions, search was very efficient (~5 ms/item). When we controlled for a possible guessing strategy in Experiment 2, slopes increased somewhat (~15 ms/item), but they were much shallower than search for a random object among other distinctive objects outside of a scene setting (Exp. 3: ~40 ms/item). In Experiments 4–6, observers searched repeatedly through the same scene for different objects. Increased familiarity with scenes had modest effects on RTs, while repetition of target items had large effects (>500 ms). We propose that visual search in scenes is efficient because scene-specific forms of attentional guidance can eliminate most regions from the “functional set size” of items that could possibly be the target. PMID:21671156

  19. Visual search for arbitrary objects in real scenes.

    PubMed

    Wolfe, Jeremy M; Alvarez, George A; Rosenholtz, Ruth; Kuzmova, Yoana I; Sherman, Ashley M

    2011-08-01

    How efficient is visual search in real scenes? In searches for targets among arrays of randomly placed distractors, efficiency is often indexed by the slope of the reaction time (RT) × Set Size function. However, it may be impossible to define set size for real scenes. As an approximation, we hand-labeled 100 indoor scenes and used the number of labeled regions as a surrogate for set size. In Experiment 1, observers searched for named objects (a chair, bowl, etc.). With set size defined as the number of labeled regions, search was very efficient (~5 ms/item). When we controlled for a possible guessing strategy in Experiment 2, slopes increased somewhat (~15 ms/item), but they were much shallower than search for a random object among other distinctive objects outside of a scene setting (Exp. 3: ~40 ms/item). In Experiments 4-6, observers searched repeatedly through the same scene for different objects. Increased familiarity with scenes had modest effects on RTs, while repetition of target items had large effects (>500 ms). We propose that visual search in scenes is efficient because scene-specific forms of attentional guidance can eliminate most regions from the "functional set size" of items that could possibly be the target.

  20. Robust Approach for Nonuniformity Correction in Infrared Focal Plane Array.

    PubMed

    Boutemedjet, Ayoub; Deng, Chenwei; Zhao, Baojun

    2016-11-10

    In this paper, we propose a new scene-based nonuniformity correction technique for infrared focal plane arrays. Our work builds on two well-known scene-based methods: an adaptive method and an interframe registration-based method exploiting a pure-translation motion model between frames. The two approaches have complementary benefits and drawbacks, making each extremely effective in certain conditions and poorly adapted to others. We therefore developed a method that is robust to the various conditions which may slow or degrade the correction process, by elaborating a decision criterion that switches the process to the more effective technique, ensuring fast and reliable correction. In addition, problems such as bad pixels and ghosting artifacts are dealt with to enhance the overall quality of the correction. The performance of the proposed technique is investigated and compared to the two state-of-the-art techniques cited above.

  1. Robust Approach for Nonuniformity Correction in Infrared Focal Plane Array

    PubMed Central

    Boutemedjet, Ayoub; Deng, Chenwei; Zhao, Baojun

    2016-01-01

    In this paper, we propose a new scene-based nonuniformity correction technique for infrared focal plane arrays. Our work builds on two well-known scene-based methods: an adaptive method and an interframe registration-based method exploiting a pure-translation motion model between frames. The two approaches have complementary benefits and drawbacks, making each extremely effective in certain conditions and poorly adapted to others. We therefore developed a method that is robust to the various conditions which may slow or degrade the correction process, by elaborating a decision criterion that switches the process to the more effective technique, ensuring fast and reliable correction. In addition, problems such as bad pixels and ghosting artifacts are dealt with to enhance the overall quality of the correction. The performance of the proposed technique is investigated and compared to the two state-of-the-art techniques cited above. PMID:27834893

  2. Cortical systems mediating visual attention to both objects and spatial locations

    PubMed Central

    Shomstein, Sarah; Behrmann, Marlene

    2006-01-01

    Natural visual scenes consist of many objects occupying a variety of spatial locations. Given that the plethora of information cannot be processed simultaneously, the multiplicity of inputs compete for representation. Using event-related functional MRI, we show that attention, the mechanism by which a subset of the input is selected, is mediated by the posterior parietal cortex (PPC). Of particular interest is that PPC activity is differentially sensitive to the object-based properties of the input, with enhanced activation for those locations bound by an attended object. Of great interest too is the ensuing modulation of activation in early cortical regions, reflected as differences in the temporal profile of the blood oxygenation level-dependent (BOLD) response for within-object versus between-object locations. These findings indicate that object-based selection results from an object-sensitive reorienting signal issued by the PPC. The dynamic circuit between the PPC and earlier sensory regions then enables observers to attend preferentially to objects of interest in complex scenes. PMID:16840559

  3. Representation of Gravity-Aligned Scene Structure in Ventral Pathway Visual Cortex.

    PubMed

    Vaziri, Siavash; Connor, Charles E

    2016-03-21

    The ventral visual pathway in humans and non-human primates is known to represent object information, including shape and identity [1]. Here, we show the ventral pathway also represents scene structure aligned with the gravitational reference frame in which objects move and interact. We analyzed shape tuning of recently described macaque monkey ventral pathway neurons that prefer scene-like stimuli to objects [2]. Individual neurons did not respond to a single shape class, but to a variety of scene elements that are typically aligned with gravity: large planes in the orientation range of ground surfaces under natural viewing conditions, planes in the orientation range of ceilings, and extended convex and concave edges in the orientation range of wall/floor/ceiling junctions. For a given neuron, these elements tended to share a common alignment in eye-centered coordinates. Thus, each neuron integrated information about multiple gravity-aligned structures as they would be seen from a specific eye and head orientation. This eclectic coding strategy provides only ambiguous information about individual structures but explicit information about the environmental reference frame and the orientation of gravity in egocentric coordinates. In the ventral pathway, this could support perceiving and/or predicting physical events involving objects subject to gravity, recognizing object attributes like animacy based on movement not caused by gravity, and/or stabilizing perception of the world against changes in head orientation [3-5]. Our results, like the recent discovery of object weight representation [6], imply that the ventral pathway is involved not just in recognition, but also in physical understanding of objects and scenes.

  4. Enhanced Algorithms for EO/IR Electronic Stabilization, Clutter Suppression, and Track-Before-Detect for Multiple Low Observable Targets

    NASA Astrophysics Data System (ADS)

    Tartakovsky, A.; Brown, A.; Brown, J.

    The paper describes the development and evaluation of a suite of advanced algorithms which provide significantly improved capabilities for finding, fixing, and tracking multiple ballistic and flying low observable objects in highly stressing cluttered environments. The algorithms have been developed for use in satellite-based staring and scanning optical surveillance suites for applications including theatre and intercontinental ballistic missile early warning, trajectory prediction, and multi-sensor track handoff for midcourse discrimination and intercept. The functions performed by the algorithms include electronic sensor motion compensation providing sub-pixel stabilization (to 1/100 of a pixel), as well as advanced temporal-spatial clutter estimation and suppression to below sensor noise levels, followed by statistical background modeling and Bayesian multiple-target track-before-detect filtering. The multiple-target tracking is performed in physical world coordinates to allow for multi-sensor fusion, trajectory prediction, and intercept. Output of detected object cues and data visualization are also provided. The algorithms are designed to handle a wide variety of real-world challenges. Imaged scenes may be highly complex and infinitely varied: the scene background may contain significant celestial, earth limb, or terrestrial clutter. For example, when viewing combined earth limb and terrestrial scenes, a combination of stationary and non-stationary clutter may be present, including cloud formations, varying atmospheric transmittance and reflectance of sunlight and other celestial light sources, aurora, glint off sea surfaces, and varied natural and man-made terrain features. The targets of interest may also appear dim relative to the scene background, rendering much of the existing deployed software useless for optical target detection and tracking. Additionally, it may be necessary to detect and track a large number of objects in the threat cloud, and these objects may not always be resolvable in individual data frames. In the present paper, the performance of the developed algorithms is demonstrated using real-world data containing resident space objects observed from the MSX platform, with backgrounds varying from celestial to combined celestial and earth limb, with instances of extremely bright aurora clutter. Simulation results are also presented for parameterized variations in signal-to-clutter levels (down to 1/1000) and signal-to-noise levels (down to 1/6) for simulated targets against real-world terrestrial clutter backgrounds. We also discuss algorithm processing requirements and C++ software processing capabilities from our ongoing MDA- and AFRL-sponsored development of an image processing toolkit (iPTK). In the current effort, the iPTK is being developed to a Technology Readiness Level (TRL) of 6 by mid-2010, in preparation for possible integration with STSS-like, SBIRS high-like and SBSS-like surveillance suites.
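
    Electronic stabilization of the kind described is commonly built on phase correlation. The integer-pixel numpy sketch below conveys the idea; interpolating the correlation peak (or Fourier-domain upsampling) would then refine the estimate to a fraction of a pixel. This is a generic illustration, not the suite's actual algorithm.

    ```python
    import numpy as np

    def phase_correlation_shift(ref, frame):
        """Estimate the integer (dy, dx) translation of `frame` relative to
        `ref` from the peak of the normalized cross-power spectrum."""
        F1, F2 = np.fft.fft2(ref), np.fft.fft2(frame)
        cross = F1 * np.conj(F2)
        cross /= np.abs(cross) + 1e-12              # keep phase only
        corr = np.abs(np.fft.ifft2(cross))
        dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
        # Map wrap-around indices to signed shifts.
        if dy > ref.shape[0] // 2:
            dy -= ref.shape[0]
        if dx > ref.shape[1] // 2:
            dx -= ref.shape[1]
        return dy, dx
    ```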

  5. Generalized algebraic scene-based nonuniformity correction algorithm.

    PubMed

    Ratliff, Bradley M; Hayat, Majeed M; Tyo, J Scott

    2005-02-01

    A generalization of a recently developed algebraic scene-based nonuniformity correction algorithm for focal plane array (FPA) sensors is presented. The new technique uses pairs of image frames exhibiting arbitrary one- or two-dimensional translational motion to compute compensator quantities that are then used to remove nonuniformity in the bias of the FPA response. Unlike its predecessor, the generalization does not require the use of either a blackbody calibration target or a shutter. The algorithm has a low computational overhead, lending itself to real-time hardware implementation. The high-quality correction ability of this technique is demonstrated through application to real IR data from both cooled and uncooled infrared FPAs. A theoretical and experimental error analysis is performed to study the accuracy of the bias compensator estimates in the presence of two main sources of error.
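
    The core algebraic idea, that a known inter-frame shift lets the same scene sample be read through two different detectors so their difference isolates the fixed-pattern bias, can be caricatured in a few lines. The one-dimensional shift, the slicing, and the variable names below are illustrative assumptions, not the published estimator.

    ```python
    import numpy as np

    def bias_difference_map(frame1, frame2, dx):
        """Toy algebraic NUC step for a pure horizontal shift of dx pixels.
        With observations y = x + b and the scene x identical at registered
        positions, y2[:, j] - y1[:, j+dx] = b2[:, j] - b1[:, j+dx]:
        the scene cancels and only the bias pattern remains."""
        h, w = frame1.shape
        diff = np.zeros((h, w))
        diff[:, : w - dx] = frame2[:, : w - dx] - frame1[:, dx:]
        return diff  # feed into a chained/averaged estimate of per-pixel bias
    ```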

  6. Eye movements during change detection: implications for search constraints, memory limitations, and scanning strategies.

    PubMed

    Zelinsky, G J

    2001-02-01

    Search, memory, and strategy constraints on change detection were analyzed in terms of oculomotor variables. Observers viewed a repeating sequence of three displays (Scene 1-->Mask-->Scene 2-->Mask...) and indicated the presence-absence of a changing object between Scenes 1 and 2. Scenes depicted real-world objects arranged on a surface. Manipulations included set size (one, three, or nine items) and the orientation of the changing objects (similar or different). Eye movements increased with the number of potentially changing objects in the scene, with this set size effect suggesting a relationship between change detection and search. A preferential fixation analysis determined that memory constraints are better described by the operation comparing the pre- and postchange objects than as a capacity limitation, and a scanpath analysis revealed a change detection strategy relying on the peripheral encoding and comparison of display items. These findings support a signal-in-noise interpretation of change detection in which the signal varies with the similarity of the changing objects and the noise is determined by the distractor objects and scene background.

  7. Mining Very High Resolution INSAR Data Based On Complex-GMRF Cues And Relevance Feedback

    NASA Astrophysics Data System (ADS)

    Singh, Jagmal; Popescu, Anca; Soccorsi, Matteo; Datcu, Mihai

    2012-01-01

    With the increase in the number of remote sensing satellites, the number of image-data scenes in our repositories is also increasing, and a large quantity of these scenes are never retrieved or used. Automatic retrieval of desired image data using query by image content, to fully utilize the huge repository volume, is therefore becoming of great interest. Generally, different users are interested in scenes containing different kinds of objects and structures, so it is important to analyze all image information mining (IIM) methods so that a user can more easily select a method depending upon his/her requirements. We concentrate our study on high-resolution SAR images, and we propose to use InSAR observations instead of single look complex (SLC) images alone for mining scenes containing coherent objects such as high-rise buildings. However, for objects with low coherence, such as areas with vegetation cover, SLC images exhibit better performance. We demonstrate an IIM performance comparison using complex Gauss-Markov Random Fields as texture descriptors for image patches and SVM relevance feedback.

  8. Basic level scene understanding: categories, attributes and structures

    PubMed Central

    Xiao, Jianxiong; Hays, James; Russell, Bryan C.; Patterson, Genevieve; Ehinger, Krista A.; Torralba, Antonio; Oliva, Aude

    2013-01-01

    A longstanding goal of computer vision is to build a system that can automatically understand a 3D scene from a single image. This requires extracting semantic concepts and 3D information from 2D images which can depict an enormous variety of environments that comprise our visual world. This paper summarizes our recent efforts toward these goals. First, we describe the richly annotated SUN database which is a collection of annotated images spanning 908 different scene categories with object, attribute, and geometric labels for many scenes. This database allows us to systematically study the space of scenes and to establish a benchmark for scene and object recognition. We augment the categorical SUN database with 102 scene attributes for every image and explore attribute recognition. Finally, we present an integrated system to extract the 3D structure of the scene and objects depicted in an image. PMID:24009590

  9. NASA Fundamental Remote Sensing Science Research Program

    NASA Technical Reports Server (NTRS)

    1984-01-01

    The NASA Fundamental Remote Sensing Research Program is described. The program provides a dynamic scientific base which is continually broadened and from which future applied research and development can draw support. In particular, the overall objectives and current studies of the scene radiation and atmospheric effect characterization (SRAEC) project are reviewed. The SRAEC research can be generically structured into four types of activities including observation of phenomena, empirical characterization, analytical modeling, and scene radiation analysis and synthesis. The first three activities are the means by which the goal of scene radiation analysis and synthesis is achieved, and thus are considered priority activities during the early phases of the current project. Scene radiation analysis refers to the extraction of information describing the biogeophysical attributes of the scene from the spectral, spatial, and temporal radiance characteristics of the scene including the atmosphere. Scene radiation synthesis is the generation of realistic spectral, spatial, and temporal radiance values for a scene with a given set of biogeophysical attributes and atmospheric conditions.

  10. Gist in time: scene semantics and structure enhance recall of searched objects

    PubMed Central

    Wolfe, Jeremy M.; Võ, Melissa L.-H.

    2016-01-01

    Previous work has shown that recall of objects that are incidentally encountered as targets in visual search is better than recall of objects that have been intentionally memorized (Draschkow, Wolfe & Võ, 2014). However, this counter-intuitive result is not seen when these tasks are performed with non-scene stimuli. The goal of the current paper is to determine what features of search in a scene contribute to higher recall rates when compared to a memorization task. In each of four experiments, we compare the free recall rate for target objects following a search to the rate following a memorization task. Across the experiments, the stimuli include progressively more scene-related information. Experiment 1 provides the spatial relations between objects. Experiment 2 adds relative size and depth of objects. Experiments 3 and 4 include scene layout and semantic information. We find that search leads to better recall than explicit memorization in cases where scene layout and semantic information are present, as long as the participant has ample time (2500 ms) to integrate this information with knowledge about the target object (Exp. 4). These results suggest that the integration of scene and target information not only leads to more efficient search, but can also contribute to stronger memory representations than intentional memorization. PMID:27270227

  11. Gist in time: Scene semantics and structure enhance recall of searched objects.

    PubMed

    Josephs, Emilie L; Draschkow, Dejan; Wolfe, Jeremy M; Võ, Melissa L-H

    2016-09-01

    Previous work has shown that recall of objects that are incidentally encountered as targets in visual search is better than recall of objects that have been intentionally memorized (Draschkow, Wolfe, & Võ, 2014). However, this counter-intuitive result is not seen when these tasks are performed with non-scene stimuli. The goal of the current paper is to determine what features of search in a scene contribute to higher recall rates when compared to a memorization task. In each of four experiments, we compare the free recall rate for target objects following a search to the rate following a memorization task. Across the experiments, the stimuli include progressively more scene-related information. Experiment 1 provides the spatial relations between objects. Experiment 2 adds relative size and depth of objects. Experiments 3 and 4 include scene layout and semantic information. We find that search leads to better recall than explicit memorization in cases where scene layout and semantic information are present, as long as the participant has ample time (2500 ms) to integrate this information with knowledge about the target object (Exp. 4). These results suggest that the integration of scene and target information not only leads to more efficient search, but can also contribute to stronger memory representations than intentional memorization.

  12. Object tracking using multiple camera video streams

    NASA Astrophysics Data System (ADS)

    Mehrubeoglu, Mehrube; Rojas, Diego; McLauchlan, Lifford

    2010-05-01

    Two synchronized cameras are utilized to obtain independent video streams and detect moving objects from two different viewing angles. The video frames are directly correlated in time. Moving objects in image frames from the two cameras are identified and tagged for tracking. One advantage of such a system is overcoming occlusions, in which an object is in partial or full view in one camera while fully visible in the other. Object registration is achieved by determining the location of common features of the moving object across simultaneous frames, with perspective differences adjusted. Combining information from multiple cameras increases the robustness of the tracking process. Motion tracking is achieved by detecting the anomalies caused by each object's movement across frames in time, both in each stream and in the combined video information. The path of each object is determined heuristically. Detection accuracy depends on the speed of the object as well as variations in its direction of motion; faster cameras increase accuracy but limit the speed and complexity of the algorithm. Such an imaging system has applications in traffic analysis, surveillance and security, as well as object modeling from multi-view images. The system can easily be expanded by increasing the number of cameras such that the scenes of at least two nearby cameras overlap. An object can then be tracked continuously over long distances or across multiple cameras, applicable, for example, in wireless sensor networks for surveillance or navigation.
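
    Per-camera detection of moving objects along these lines is often done with background subtraction; a minimal OpenCV sketch for one stream follows. The MOG2 subtractor, the area threshold, and centroid tagging are generic choices, not the authors' pipeline; cross-camera correspondence would then be formed by matching features of the tagged blobs across time-synchronized frames.

    ```python
    import cv2

    def track_centroids(video_path, min_area=500):
        """Yield, per frame, the centroids of moving blobs in one camera stream."""
        cap = cv2.VideoCapture(video_path)
        bg = cv2.createBackgroundSubtractorMOG2()
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            mask = bg.apply(frame)                                 # foreground mask
            mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # remove specks
            contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)
            centroids = []
            for c in contours:
                if cv2.contourArea(c) >= min_area:                 # ignore noise blobs
                    m = cv2.moments(c)
                    centroids.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
            yield centroids
        cap.release()
    ```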

  13. A holographic technique for recording a hypervelocity projectile with front surface resolution.

    PubMed

    Kurtz, R L; Loh, H Y

    1970-05-01

    Any motion of the scene during the exposure of a hologram results in a spatial modulation of the recorded fringe contrast. On reconstruction, this produces a spatial amplitude modulation of the reconstructed wavefront, which results in a blurring of the image, not unlike that of a conventional photograph. For motion of the scene sufficient to change the path length of the signal arm by a half wavelength, this blurring is generally prohibitive. This paper describes a proposed holographic technique which offers promise for front-surface resolution of targets moving at high speeds, heretofore unobtainable by conventional methods.

  14. Augmented reality glass-free three-dimensional display with the stereo camera

    NASA Astrophysics Data System (ADS)

    Pang, Bo; Sang, Xinzhu; Chen, Duo; Xing, Shujun; Yu, Xunbo; Yan, Binbin; Wang, Kuiru; Yu, Chongxiu

    2017-10-01

    An improved method for Augmented Reality (AR) glass-free three-dimensional (3D) display, based on a stereo camera and a lenticular lens array used to present parallax content from different angles, is proposed. Compared with previous AR implementations based on two-dimensional (2D) panel displays with only one viewpoint, the proposed method realizes glass-free 3D display of virtual objects and the real scene with 32 virtual viewpoints. Accordingly, viewers can obtain rich 3D stereo information from different viewing angles based on binocular parallax. Experimental results show that this improved stereo-camera-based method realizes AR glass-free 3D display, and both the virtual objects and the real scene exhibit realistic and obvious stereo performance.

  15. Statistics of high-level scene context

    PubMed Central

    Greene, Michelle R.

    2013-01-01

    Context is critical for recognizing environments and for searching for objects within them: contextual associations have been shown to modulate reaction time and object recognition accuracy, as well as influence the distribution of eye movements and patterns of brain activations. However, we have not yet systematically quantified the relationships between objects and their scene environments. Here I seek to fill this gap by providing descriptive statistics of object-scene relationships. A total of 48,167 objects were hand-labeled in 3499 scenes using the LabelMe tool (Russell et al., 2008). From these data, I computed a variety of descriptive statistics at three different levels of analysis: the ensemble statistics that describe the density and spatial distribution of unnamed “things” in the scene; the bag of words level where scenes are described by the list of objects contained within them; and the structural level where the spatial distribution and relationships between the objects are measured. The utility of each level of description for scene categorization was assessed through the use of linear classifiers, and the plausibility of each level for modeling human scene categorization is discussed. Of the three levels, ensemble statistics were found to be the most informative (per feature), and also best explained human patterns of categorization errors. Although a bag of words classifier had similar performance to human observers, it had a markedly different pattern of errors. However, certain objects are more useful than others, and ceiling classification performance could be achieved using only the 64 most informative objects. As object location tends not to vary as a function of category, structural information provided little additional information. Additionally, these data provide valuable information on natural scene redundancy that can be exploited for machine vision, and can help the visual cognition community to design experiments guided by statistics rather than intuition. PMID:24194723

  16. Detection of gait characteristics for scene registration in video surveillance system.

    PubMed

    Havasi, László; Szlávik, Zoltán; Szirányi, Tamás

    2007-02-01

    This paper presents a robust walk-detection algorithm, based on our symmetry approach which can be used to extract gait characteristics from video-image sequences. To obtain a useful descriptor of a walking person, we temporally track the symmetries of a person's legs. Our method is suitable for use in indoor or outdoor surveillance scenes. Determining the leading leg of the walking subject is important, and the presented method can identify this from two successive walk steps (one walk cycle). We tested the accuracy of the presented walk-detection method in a possible application: Image registration methods are presented which are applicable to multicamera systems viewing human subjects in motion.

  17. Real time ray tracing based on shader

    NASA Astrophysics Data System (ADS)

    Gui, JiangHeng; Li, Min

    2017-07-01

    Ray tracing is a rendering algorithm that generates an image by tracing light paths into an image plane; it can simulate complicated optical phenomena like refraction, depth of field, and motion blur. Compared with rasterization, ray tracing can achieve more realistic rendering results, but at far greater computational cost: even simple scenes can take a long time to render. With improving GPU performance and the advent of the programmable rendering pipeline, complicated algorithms can now be implemented directly in a shader. This paper therefore proposes a new method that implements ray tracing directly in a fragment shader, mainly comprising surface intersection, importance sampling, and progressive rendering. With the help of the GPU's powerful throughput, it can render simple scenes in real time.
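
    Two of the pieces named above, surface intersection and progressive rendering, reduce to short formulas; the CPU-side Python sketch below mirrors what the fragment shader would do per pixel. The sphere primitive and the running-average scheme are illustrative assumptions, not the paper's shader code.

    ```python
    import numpy as np

    def ray_sphere(origin, direction, center, radius):
        """Smallest positive t with |origin + t*direction - center| = radius,
        from the quadratic t^2 + 2bt + c = 0 (direction assumed normalized)."""
        oc = origin - center
        b = np.dot(oc, direction)
        c = np.dot(oc, oc) - radius * radius
        disc = b * b - c
        if disc < 0:
            return None                       # ray misses the sphere
        t = -b - np.sqrt(disc)
        return t if t > 0 else None

    def progressive_average(accum, new_sample, n):
        """Frame n of progressive rendering: a running mean of noisy samples,
        the same accumulation a shader would do into an offscreen buffer."""
        return accum + (new_sample - accum) / n
    ```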

  18. Velocity-based motion categorization by pigeons.

    PubMed

    Cook, Robert G; Beale, Kevin; Koban, Angie

    2011-04-01

    To examine whether animals can learn action-like categorizations in a manner similar to noun-based categories, eight pigeons were trained to categorize rates of object motion. Testing 40 different objects in a go/no-go discrimination, pigeons were first trained to discriminate between fast and slow rates of object rotation around the central y-axis. They easily learned this velocity discrimination and transferred it to novel objects and rates. The discrimination also transferred to novel types of motion, including the other two axes of rotation and two new translations around the display. Comparable tests with rapid and slow changes in the objects' size, color, and shape failed to support comparable transfer. This difference in discrimination transfer between motion-based and property-based changes suggests the pigeons had learned a motion concept rather than one based on change per se. The results provide evidence that pigeons can acquire an understanding of motion-based actions, at least with regard to object velocity. This may be similar to our use of verbs and adverbs to categorize different classes of behavior or motion (e.g., walking, jogging, or running slow vs. fast).

  19. Two Mirrors: Infinite Images of DiCaprio

    ERIC Educational Resources Information Center

    Fadeev, Pavel

    2015-01-01

    Movies are mostly viewed for entertainment. Mixing entertainment and physics gets students excited as we look at a famous movie scene from a different point of view. The following is a link to a fragment from the 2010 motion picture "Inception": http://www.youtube.com/watch?v=q3tBBhYJeAw. The following problem, based on images in facing…

  20. A shape-based segmentation method for mobile laser scanning point clouds

    NASA Astrophysics Data System (ADS)

    Yang, Bisheng; Dong, Zhen

    2013-07-01

    Segmentation of mobile laser point clouds of urban scenes into objects is an important step for post-processing (e.g., interpretation) of point clouds. Point clouds of urban scenes contain numerous objects with significant size variability, complex and incomplete structures, and holes or variable point densities, raising great challenges for the segmentation of mobile laser point clouds. This paper addresses these challenges by proposing a shape-based segmentation method. The proposed method first calculates the optimal neighborhood size of each point to derive the geometric features associated with it, and then classifies the point clouds according to the geometric features using support vector machines (SVMs). Second, a set of rules is defined to segment the classified point clouds, and a similarity criterion for segments is proposed to overcome over-segmentation. Finally, the segmentation output is merged based on topological connectivity into a meaningful geometrical abstraction. The proposed method has been tested on point clouds of two urban scenes obtained by different mobile laser scanners. The results show that the proposed method segments large-scale mobile laser point clouds with good accuracy at a computationally efficient time cost, and that it segments pole-like objects particularly well.
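
    Geometric features derived from a point's neighborhood are commonly the eigenvalue-based shape descriptors shown below; the exact feature set fed to the SVMs in the paper may differ, so treat this as a generic illustration.

    ```python
    import numpy as np

    def shape_features(neighborhood):
        """Linearity / planarity / scattering of a (k, 3) point neighborhood,
        from the sorted eigenvalues of its 3x3 covariance matrix."""
        centered = neighborhood - neighborhood.mean(axis=0)
        cov = centered.T @ centered / len(neighborhood)
        l3, l2, l1 = np.linalg.eigvalsh(cov)       # ascending, so l1 is largest
        l1 = max(l1, 1e-12)                        # guard degenerate neighborhoods
        linearity = (l1 - l2) / l1                 # 1D structures (poles, wires)
        planarity = (l2 - l3) / l1                 # 2D structures (facades, ground)
        scattering = l3 / l1                       # 3D clutter (vegetation)
        return np.array([linearity, planarity, scattering])
    ```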

  1. Representational Momentum in Aviation

    ERIC Educational Resources Information Center

    Blattler, Colin; Ferrari, Vincent; Didierjean, Andre; Marmeche, Evelyne

    2011-01-01

    The purpose of this study was to examine the effects of expertise on motion anticipation. We conducted 2 experiments in which novices and expert pilots viewed simulated aircraft landing scenes. The scenes were interrupted by the display of a black screen and then started again after a forward or backward shift. The participant's task was to…

  2. Perceptual salience affects the contents of working memory during free-recollection of objects from natural scenes

    PubMed Central

    Pedale, Tiziana; Santangelo, Valerio

    2015-01-01

    One of the most important issues in the study of cognition is to understand which factors determine the internal representation of the external world. Previous literature has started to highlight the impact of low-level sensory features (indexed by saliency maps) in driving attention selection, hence increasing the probability for objects presented in complex and natural scenes to be successfully encoded into working memory (WM) and then correctly remembered. Here we asked whether the probability of retrieving high-saliency objects modulates the overall contents of WM by decreasing the probability of retrieving other, lower-saliency objects. We presented pictures of natural scenes for 4 s. After a retention period of 8 s, we asked participants to verbally report as many objects/details as possible from the previous scenes. We then computed how many times the objects located at either the peak of maximal or minimal saliency in the scene (as indexed by a saliency map; Itti et al., 1998) were recollected by participants. Results showed that maximal-saliency objects were recollected more often and earlier in the stream of successfully reported items than minimal-saliency objects, indicating that bottom-up sensory salience increases recollection probability and facilitates access to memory representations at retrieval. Moreover, recollection of the maximal- (but not the minimal-) saliency objects predicted the overall amount of successfully recollected objects: the higher the probability of having successfully reported the most salient object in the scene, the lower the number of recollected objects. These findings highlight that bottom-up sensory saliency modulates the current contents of WM during recollection of objects from natural scenes, most likely by reducing the resources available to encode and then retrieve other (lower-saliency) objects. PMID:25741266

  3. Model-Based Building Detection from Low-Cost Optical Sensors Onboard Unmanned Aerial Vehicles

    NASA Astrophysics Data System (ADS)

    Karantzalos, K.; Koutsourakis, P.; Kalisperakis, I.; Grammatikopoulos, L.

    2015-08-01

    The automated and cost-effective detection of buildings at ultra-high spatial resolution is of major importance for various engineering and smart-city applications. To this end, in this paper, a model-based building detection technique has been developed that is able to extract and reconstruct buildings from UAV aerial imagery and low-cost imaging sensors. In particular, the developed approach computes, through advanced structure from motion, bundle adjustment, and dense image matching, a DSM and a true orthomosaic from the numerous GoPro images, which are characterised by significant geometric distortions and a fish-eye effect. An unsupervised multi-region graph-cut segmentation and a rule-based classification are responsible for delivering the initial multi-class classification map. The DTM is then calculated based on an inpainting and mathematical morphology process. A data fusion process between the buildings detected from the DSM/DTM and the classification map feeds a grammar-based building reconstruction, and the buildings of the scene are extracted and reconstructed. Preliminary experimental results appear quite promising, with the quantitative evaluation indicating detection rates at the object level of 88% regarding correctness and above 75% regarding detection completeness.
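
    A common approximation for the DTM step is grey-scale morphology: opening the DSM with a window larger than the biggest building suppresses elevated objects, and the residual (the normalized DSM) flags building candidates. The scipy-based sketch below illustrates that idea under those assumptions; it is not necessarily the authors' exact inpainting-plus-morphology pipeline.

    ```python
    import numpy as np
    from scipy import ndimage

    def ndsm_building_mask(dsm, window=35, height_thresh=2.5):
        """Approximate the DTM by grey opening (erosion then dilation) with a
        window wider than any building footprint, then threshold the
        normalized DSM to flag above-ground (building/tree) candidates."""
        dtm = ndimage.grey_opening(dsm, size=(window, window))
        ndsm = dsm - dtm                           # height above estimated ground
        return ndsm > height_thresh                # boolean candidate mask
    ```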

  4. Active visual search in non-stationary scenes: coping with temporal variability and uncertainty

    NASA Astrophysics Data System (ADS)

    Ušćumlić, Marija; Blankertz, Benjamin

    2016-02-01

    Objective. State-of-the-art experiments for studying neural processes underlying visual cognition often constrain sensory inputs (e.g., static images) and our behavior (e.g., fixed eye-gaze, long eye fixations), isolating or simplifying the interaction of neural processes. Motivated by the non-stationarity of our natural visual environment, we investigated the electroencephalography (EEG) correlates of visual recognition while participants overtly performed visual search in non-stationary scenes. We hypothesized that visual effects (such as those typically used in human-computer interfaces) may increase temporal uncertainty (with reference to fixation onset) of cognition-related EEG activity in an active search task and therefore require novel techniques for single-trial detection. Approach. We addressed fixation-related EEG activity in an active search task with respect to stimulus-appearance styles and dynamics. Alongside popping-up stimuli, our experimental study embraces two composite appearance styles based on fading-in, enlarging, and motion effects. Additionally, we explored whether the knowledge obtained in the pop-up experimental setting can be exploited to boost the EEG-based intention-decoding performance when facing transitional changes of visual content. Main results. The results confirmed our initial hypothesis that the dynamic of visual content can increase temporal uncertainty of the cognition-related EEG activity in active search with respect to fixation onset. This temporal uncertainty challenges the pivotal aim to keep the decoding performance constant irrespective of visual effects. Importantly, the proposed approach for EEG decoding based on knowledge transfer between the different experimental settings gave a promising performance. Significance. Our study demonstrates that the non-stationarity of visual scenes is an important factor in the evolution of cognitive processes, as well as in the dynamic of ocular behavior (i.e., dwell time and fixation duration) in an active search task. In addition, our method to improve single-trial detection performance in this adverse scenario is an important step in making brain-computer interfacing technology available for human-computer interaction applications.

  5. How long did it last? You would better ask a human

    PubMed Central

    Lacquaniti, Francesco; Carrozzo, Mauro; d’Avella, Andrea; La Scaleia, Barbara; Moscatelli, Alessandro; Zago, Myrka

    2014-01-01

    In the future, human-like robots will live among people to provide company and help carry out tasks in cooperation with humans. These interactions require that robots understand not only human actions, but also the way in which we perceive the world. Human perception heavily relies on the time dimension, especially when it comes to processing visual motion. Critically, human time perception for dynamic events is often inaccurate. Robots interacting with humans may want to see the world and tell time the way humans do: if so, they must incorporate human-like fallacy. Observers asked to judge the duration of brief scenes are prone to errors: perceived duration often does not match the physical duration of the event. Several kinds of temporal distortions have been described in the specialized literature. Here we review the topic with a special emphasis on our work dealing with time perception of animate actors versus inanimate actors. This work shows the existence of specialized time bases for different categories of targets. The time base used by the human brain to process visual motion appears to be calibrated against the specific predictions regarding the motion of human figures in case of animate motion, while it can be calibrated against the predictions of motion of passive objects in case of inanimate motion. Human perception of time appears to be strictly linked with the mechanisms used to control movements. Thus, neural time can be entrained by external cues in a similar manner for both perceptual judgments of elapsed time and in motor control tasks. One possible strategy could be to implement in humanoids a unique architecture for dealing with time, which would apply the same specialized mechanisms to both perception and action, similarly to humans. This shared implementation might render the humanoids more acceptable to humans, thus facilitating reciprocal interactions. PMID:24478694

  7. A method of 3D object recognition and localization in a cloud of points

    NASA Astrophysics Data System (ADS)

    Bielicki, Jerzy; Sitnik, Robert

    2013-12-01

    The method proposed in this article is designed for the analysis of data in the form of point clouds obtained directly from 3D measurements. It is intended for end-user applications and can be integrated directly with 3D scanning software. The method utilizes locally calculated feature vectors (FVs) in point cloud data. Recognition is based on comparison of the analyzed scene with a reference object library. A global descriptor in the form of a set of spatially distributed FVs is created for each reference model. During the detection process, the correlation of subsets of reference FVs with FVs calculated in the scene is computed. The features utilized in the algorithm are based on parameters that qualitatively estimate mean and Gaussian curvatures. Replacing differentiation with averaging in the curvature estimation makes the algorithm more resistant to discontinuities and poor quality of the input data. Utilization of the FV subsets allows partially occluded and cluttered objects to be detected in the scene, while additional spatial information keeps the false positive rate at a reasonably low level.
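
    A minimal sketch of the matching idea, with local covariance eigenvalues standing in for the paper's averaged curvature parameters (all names and thresholds are illustrative):

        import numpy as np
        from scipy.spatial import cKDTree

        def point_features(cloud, k=20):
            """Per-point FVs from local covariance eigenvalues (a rough
            stand-in for the qualitative curvature features in the paper)."""
            tree = cKDTree(cloud)
            feats = np.empty((len(cloud), 3))
            for i, p in enumerate(cloud):
                _, idx = tree.query(p, k=k)
                nbrs = cloud[idx] - cloud[idx].mean(axis=0)
                evals = np.linalg.eigvalsh(nbrs.T @ nbrs)   # ascending order
                feats[i] = evals / (evals.sum() + 1e-12)    # scale-normalised
            return feats

        def match_to_reference(scene_feats, ref_feats, max_dist=0.05):
            """Correlate scene FVs with reference FVs by nearest neighbour."""
            tree = cKDTree(ref_feats)
            d, idx = tree.query(scene_feats)
            keep = d < max_dist
            return np.flatnonzero(keep), idx[keep]   # scene/reference indices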

  8. Object tracking mask-based NLUT on GPUs for real-time generation of holographic videos of three-dimensional scenes.

    PubMed

    Kwon, M-W; Kim, S-C; Yoon, S-E; Ho, Y-S; Kim, E-S

    2015-02-09

    A new object tracking mask-based novel-look-up-table (OTM-NLUT) method is proposed and implemented on graphics processing units (GPUs) for real-time generation of holographic videos of three-dimensional (3-D) scenes. Since the proposed method is designed to match the software and memory structures of the GPU, the number of compute-unified-device-architecture (CUDA) kernel function calls and the computer-generated hologram (CGH) buffer size of the proposed method have been significantly reduced. This results in a great increase in the computational speed of the proposed method and enables real-time generation of CGH patterns of 3-D scenes. Experimental results show that the proposed method can generate 31.1 frames of Fresnel CGH patterns with 1,920 × 1,080 pixels per second, on average, for three test 3-D video scenarios with 12,666 object points on three NVIDIA GTX TITAN GPU boards, and confirm the feasibility of the proposed method in the practical application of electro-holographic 3-D displays.
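
    For orientation, NLUT-family methods accelerate the classic point-source summation of Fresnel fringes. A brute-force numpy reference of that summation (illustrative parameters, not the OTM-NLUT implementation itself):

        import numpy as np

        def fresnel_cgh(points, nx=1920, ny=1080, pitch=8e-6, wavelength=532e-9):
            """Accumulate Fresnel fringes of object points on the hologram plane.

            points: iterable of (x, y, z, amplitude) in metres; NLUT methods
            replace the per-point cosine below with shifted copies of a
            precomputed fringe table, which is what makes them fast.
            """
            xs = (np.arange(nx) - nx / 2) * pitch
            ys = (np.arange(ny) - ny / 2) * pitch
            X, Y = np.meshgrid(xs, ys)
            H = np.zeros((ny, nx))
            for x0, y0, z0, a in points:
                r2 = (X - x0) ** 2 + (Y - y0) ** 2
                H += a * np.cos(np.pi * r2 / (wavelength * z0))
            return H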

  9. Identifying sports videos using replay, text, and camera motion features

    NASA Astrophysics Data System (ADS)

    Kobla, Vikrant; DeMenthon, Daniel; Doermann, David S.

    1999-12-01

    Automated classification of digital video is emerging as an important piece of the puzzle in the design of content management systems for digital libraries. The ability to classify videos into various classes such as sports, news, movies, or documentaries increases the efficiency of indexing, browsing, and retrieval of video in large databases. In this paper, we discuss the extraction of features that enable identification of sports videos directly from the compressed domain of MPEG video. These features include detecting the presence of action replays, determining the amount of scene text in the video, and calculating various statistics on camera and/or object motion. The features are derived from the macroblock, motion, and bit-rate information that is readily accessible from MPEG video with very minimal decoding, leading to substantial gains in processing speed. Full decoding of selective frames is required only for text analysis. A decision tree classifier built using these features is able to identify sports clips with an accuracy of about 93 percent.
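
    A toy sketch of the final classification stage, assuming the compressed-domain measurements have already been reduced to one feature vector per clip (feature names and values below are hypothetical):

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        # Per-clip features in the spirit of the paper:
        # [replay_score, scene_text_fraction, camera_motion, object_motion]
        X_train = np.array([[0.8, 0.05, 0.9, 0.7],   # sports
                            [0.0, 0.40, 0.1, 0.1],   # news
                            [0.1, 0.02, 0.4, 0.5],   # movie
                            [0.7, 0.10, 0.8, 0.6]])  # sports
        y_train = np.array([1, 0, 0, 1])             # 1 = sports

        clf = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
        print(clf.predict([[0.75, 0.08, 0.85, 0.65]]))  # -> [1], i.e. sports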

  10. Speed Limits: Orientation and Semantic Context Interactions Constrain Natural Scene Discrimination Dynamics

    ERIC Educational Resources Information Center

    Rieger, Jochem W.; Kochy, Nick; Schalk, Franziska; Gruschow, Marcus; Heinze, Hans-Jochen

    2008-01-01

    The visual system rapidly extracts information about objects from the cluttered natural environment. In 5 experiments, the authors quantified the influence of orientation and semantics on the classification speed of objects in natural scenes, particularly with regard to object-context interactions. Natural scene photographs were presented in an…

  11. Influence of semantic consistency and perceptual features on visual attention during scene viewing in toddlers.

    PubMed

    Helo, Andrea; van Ommen, Sandrien; Pannasch, Sebastian; Danteny-Dordoigne, Lucile; Rämä, Pia

    2017-11-01

    Conceptual representations of everyday scenes are built in interaction with the visual environment, and these representations guide our visual attention. Perceptual features and object-scene semantic consistency have been found to attract our attention during scene exploration. The present study examined how visual attention in 24-month-old toddlers is attracted by semantic violations and how perceptual features (i.e., saliency, centre distance, clutter and object size) and linguistic properties (i.e., object label frequency and label length) affect gaze distribution. We compared eye movements of 24-month-old toddlers and adults while they explored everyday scenes which contained either an inconsistent (e.g., soap on a breakfast table) or consistent (e.g., soap in a bathroom) object. Perceptual features such as saliency, centre distance and clutter of the scene affected looking times in the toddler group during the whole viewing time, whereas looking times in adults were affected only by centre distance during the early viewing time. Adults looked longer at inconsistent than consistent objects whether the objects had high or low saliency. In contrast, toddlers showed a semantic consistency effect only when objects were highly salient. Additionally, toddlers with lower vocabulary skills looked longer at inconsistent objects, while toddlers with higher vocabulary skills looked equally long at both consistent and inconsistent objects. Our results indicate that 24-month-old children use scene context to guide visual attention when exploring the visual environment. However, perceptual features have a stronger influence on eye movement guidance in toddlers than in adults. Our results also indicate that language skills influence cognitive but not perceptual guidance of eye movements during scene perception in toddlers.

  12. Synchronous contextual irregularities affect early scene processing: replication and extension.

    PubMed

    Mudrik, Liad; Shalgi, Shani; Lamy, Dominique; Deouell, Leon Y

    2014-04-01

    Whether contextual regularities facilitate perceptual stages of scene processing is widely debated, and empirical evidence is still inconclusive. Specifically, it was recently suggested that contextual violations affect early processing of a scene only when the incongruent object and the scene are presented asynchronously, creating expectations. We compared event-related potentials (ERPs) evoked by scenes that depicted a person performing an action using either a congruent or an incongruent object (e.g., a man shaving with a razor or with a fork) when scene and object were presented simultaneously. We also explored the role of attention in contextual processing by using a pre-cue to direct subjects' attention towards or away from the congruent/incongruent object. Subjects' task was to determine how many hands the person in the picture used in order to perform the action. We replicated our previous findings of frontocentral negativity for incongruent scenes that started ~210 ms post stimulus presentation, even earlier than previously found. Surprisingly, this incongruency ERP effect was negatively correlated with the reaction-time cost on incongruent scenes. The results did not allow us to draw conclusions about the role of attention in detecting the regularity, due to a weak attention manipulation. By replicating the 200-300 ms incongruity effect with a new group of subjects at even earlier latencies than previously reported, the results strengthen the evidence for contextual processing during this time window even when simultaneous presentation of the scene and object prevents the formation of prior expectations. We discuss possible methodological limitations that may account for previous failures to find this effect, and conclude that contextual information affects object model selection processes prior to full object identification, with semantic knowledge activation stages unfolding only later on.

  13. Adherent Raindrop Modeling, Detection and Removal in Video.

    PubMed

    You, Shaodi; Tan, Robby T; Kawakami, Rei; Mukaigawa, Yasuhiro; Ikeuchi, Katsushi

    2016-09-01

    Raindrops adhered to a windscreen or window glass can significantly degrade the visibility of a scene. Modeling, detecting and removing raindrops will, therefore, benefit many computer vision applications, particularly outdoor surveillance systems and intelligent vehicle systems. In this paper, a method that automatically detects and removes adherent raindrops is introduced. The core idea is to exploit the local spatio-temporal derivatives of raindrops. To accomplish this idea, we first model adherent raindrops using the laws of physics, and detect raindrops based on these models in combination with the motion and intensity temporal derivatives of the input video. Having detected the raindrops, we remove them and restore the images based on the observation that some areas of a raindrop completely occlude the scene, while other areas occlude it only partially. For partially occluding areas, we restore them by retrieving as much information about the scene as possible, namely, by solving a blending function on the detected partially occluding areas using the temporal intensity derivative. For completely occluding areas, we recover them by using a video completion technique. Experimental results using various real videos show the effectiveness of our method.

  14. An algebraic algorithm for nonuniformity correction in focal-plane arrays.

    PubMed

    Ratliff, Bradley M; Hayat, Majeed M; Hardie, Russell C

    2002-09-01

    A scene-based algorithm is developed to compensate for bias nonuniformity in focal-plane arrays. Nonuniformity can be extremely problematic, especially for mid- to far-infrared imaging systems. The technique is based on use of estimates of interframe subpixel shifts in an image sequence, in conjunction with a linear-interpolation model for the motion, to extract information on the bias nonuniformity algebraically. The performance of the proposed algorithm is analyzed by using real infrared and simulated data. One advantage of this technique is its simplicity; it requires relatively few frames to generate an effective correction matrix, thereby permitting the execution of frequent on-the-fly nonuniformity correction as drift occurs. Additionally, the performance is shown to exhibit considerable robustness with respect to lack of the common types of temporal and spatial irradiance diversity that are typically required by statistical scene-based nonuniformity correction techniques.
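
    To see why known interframe shifts expose the bias algebraically, consider the much-simplified case of an exact one-pixel horizontal shift between two frames; the published algorithm handles subpixel shifts through its linear-interpolation motion model:

        import numpy as np

        def bias_from_shifted_pair(y1, y2):
            """Estimate fixed-pattern bias from two frames of the same scene,
            with the scene shifted right by exactly one pixel in frame 2.

            Model: y = x + b with x2[i, j] = x1[i, j-1], so that
                   y2[i, j] - y1[i, j-1] = b[i, j] - b[i, j-1].
            """
            diff = y2[:, 1:] - y1[:, :-1]          # bias increments along rows
            b = np.zeros_like(y1, dtype=float)
            b[:, 1:] = np.cumsum(diff, axis=1)     # integrate with b[:, 0] := 0
            return b                               # bias up to a per-row offset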

  15. Simulator scene display evaluation device

    NASA Technical Reports Server (NTRS)

    Haines, R. F. (Inventor)

    1986-01-01

    An apparatus for aligning and calibrating scene displays in an aircraft simulator has a base on which all of the instruments for the aligning and calibrating are mounted. A laser directs a beam at a double right prism which is attached to a pivoting support on the base. The pivot point of the prism is located at the design eye point (DEP) of the simulator during the aligning and calibrating. The objective lens in the base is movable on a track to follow the laser beam at different angles within the field of vision at the DEP. An eyepiece and a precision diopter are movable into a position behind the prism during the scene evaluation. A photometer or illuminometer is pivotable about the pivot point into and out of position behind the eyepiece.

  16. Three-dimensional scene reconstruction from a two-dimensional image

    NASA Astrophysics Data System (ADS)

    Parkins, Franz; Jacobs, Eddie

    2017-05-01

    We propose and simulate a method of reconstructing a three-dimensional scene from a two-dimensional image for developing and augmenting world models for autonomous navigation. This is an extension of the Perspective-n-Point (PnP) method, which uses a sampling of 3D scene and 2D image point pairings together with Random Sample Consensus (RANSAC) to infer the pose of the object and produce a 3D mesh of the original scene. Using object recognition and segmentation, we simulate the implementation on a scene of 3D objects with an eye to implementation on embeddable hardware. The final solution will be deployed on the NVIDIA Tegra platform.
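
    A minimal sketch of the PnP-with-RANSAC step using OpenCV; the correspondences below are random placeholders standing in for recognised 3D-2D point pairings:

        import numpy as np
        import cv2

        obj_pts = np.random.rand(12, 3).astype(np.float32)   # 3D model points
        img_pts = np.random.rand(12, 2).astype(np.float32)   # their 2D detections
        K = np.array([[800, 0, 320],                         # assumed intrinsics
                      [0, 800, 240],
                      [0, 0, 1]], dtype=np.float32)

        ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj_pts, img_pts, K, None)
        if ok:
            R, _ = cv2.Rodrigues(rvec)   # rotation matrix of the object pose
            print(R, tvec)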

  17. Extracting heading and temporal range from optic flow: Human performance issues

    NASA Technical Reports Server (NTRS)

    Kaiser, Mary K.; Perrone, John A.; Stone, Leland; Banks, Martin S.; Crowell, James A.

    1993-01-01

    Pilots are able to extract information about their vehicle motion and environmental structure from dynamic transformations in the out-the-window scene. In this presentation, we focus on the information in the optic flow which specifies vehicle heading and distance to objects in the environment, scaled to a temporal metric. In particular, we are concerned with modeling how the human operators extract the necessary information, and what factors impact their ability to utilize the critical information. In general, the psychophysical data suggest that the human visual system is fairly robust to degradations in the visual display, e.g., reduced contrast and resolution or restricted field of view. However, extraneous motion flow, i.e., introduced by sensor rotation, greatly compromises human performance. The implications of these models and data for enhanced/synthetic vision systems are discussed.

  18. Accurate estimation of object location in an image sequence using helicopter flight data

    NASA Technical Reports Server (NTRS)

    Tang, Yuan-Liang; Kasturi, Rangachar

    1994-01-01

    In autonomous navigation, it is essential to obtain a three-dimensional (3D) description of the static environment in which the vehicle is traveling. For a rotorcraft conducting low-altitude flight, this description is particularly useful for obstacle detection and avoidance. In this paper, we address the problem of 3D position estimation for static objects from a monocular sequence of images captured from a low-altitude flying helicopter. Since the environment is static, it is well known that the optical flow in the image will produce a radiating pattern from the focus of expansion. We propose a motion analysis system which utilizes the epipolar constraint to accurately estimate 3D positions of scene objects in a real-world image sequence taken from a low-altitude flying helicopter. Results show that this approach gives good estimates of object positions near the rotorcraft's intended flight path.
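
    The radiating-pattern observation can be made concrete: under pure translation every flow vector is parallel to the line joining its image point to the focus of expansion, giving one linear equation per vector. A least-squares sketch of that geometry (not the authors' epipolar-constraint system):

        import numpy as np

        def focus_of_expansion(pts, flows):
            """Least-squares FOE from a radiating optical flow field.

            Each flow v at point p satisfies (p - e) x v = 0 for the
            FOE e = (ex, ey), i.e.  ex*vy - ey*vx = px*vy - py*vx.
            """
            A = np.column_stack([flows[:, 1], -flows[:, 0]])
            c = pts[:, 0] * flows[:, 1] - pts[:, 1] * flows[:, 0]
            e, *_ = np.linalg.lstsq(A, c, rcond=None)
            return e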

  19. Modulation of Visually Evoked Postural Responses by Contextual Visual, Haptic and Auditory Information: A ‘Virtual Reality Check’

    PubMed Central

    Meyer, Georg F.; Shao, Fei; White, Mark D.; Hopkins, Carl; Robotham, Antony J.

    2013-01-01

    Externally generated visual motion signals can cause the illusion of self-motion in space (vection) and corresponding visually evoked postural responses (VEPR). These VEPRs are not simple responses to optokinetic stimulation, but are modulated by the configuration of the environment. The aim of this paper is to explore what factors modulate VEPRs in a high quality virtual reality (VR) environment where real and virtual foreground objects served as static visual, auditory and haptic reference points. Data from four experiments on visually evoked postural responses show that: 1) visually evoked postural sway in the lateral direction is modulated by the presence of static anchor points that can be haptic, visual and auditory reference signals; 2) real objects and their matching virtual reality representations as visual anchors have different effects on postural sway; 3) visual motion in the anterior-posterior plane induces robust postural responses that are not modulated by the presence of reference signals or the reality of objects that can serve as visual anchors in the scene. We conclude that automatic postural responses for laterally moving visual stimuli are strongly influenced by the configuration and interpretation of the environment and draw on multisensory representations. Different postural responses were observed for real and virtual visual reference objects. On the basis that automatic visually evoked postural responses in high fidelity virtual environments should mimic those seen in real situations we propose to use the observed effect as a robust objective test for presence and fidelity in VR. PMID:23840760

  20. The statistics of local motion signals in naturalistic movies

    PubMed Central

    Nitzany, Eyal I.; Victor, Jonathan D.

    2014-01-01

    Extraction of motion from visual input plays an important role in many visual tasks, such as separation of figure from ground and navigation through space. Several kinds of local motion signals have been distinguished based on mathematical and computational considerations (e.g., motion based on spatiotemporal correlation of luminance, and motion based on spatiotemporal correlation of flicker), but little is known about the prevalence of these different kinds of signals in the real world. To address this question, we first note that different kinds of local motion signals (e.g., Fourier, non-Fourier, and glider) are characterized by second- and higher-order correlations in slanted spatiotemporal regions. The prevalence of local motion signals in natural scenes can thus be estimated by measuring the extent to which each of these correlations are present in space-time patches and whether they are coherent across spatiotemporal scales. We apply this technique to several popular movies. The results show that all three kinds of local motion signals are present in natural movies. While the balance of the different kinds of motion signals varies from segment to segment during the course of each movie, the overall pattern of prevalence of the different kinds of motion and their subtypes, and the correlations between them, is strikingly similar across movies (but is absent from white noise movies). In sum, naturalistic movies contain a diversity of local motion signals that occur with a consistent prevalence and pattern of covariation, indicating a substantial regularity of their high-order spatiotemporal image statistics. PMID:24732243

  2. The cognitive structural approach for image restoration

    NASA Astrophysics Data System (ADS)

    Mardare, Igor; Perju, Veacheslav; Casasent, David

    2008-03-01

    The important and timely problem of restoring defective images of scenes is analyzed. The proposed approach provides restoration of scenes by a system that reproduces phenomena of human intelligence used for the restoration and recognition of images. Cognitive models of the restoration process are elaborated. The models are realized by intellectual processors constructed on the basis of neural networks and associative memory, using the neural network simulator NNToolbox from MATLAB 7.0. The models provide restoration and semantic reconstruction of scene images from defective images of their separate objects.
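
    As a generic stand-in for the associative-memory component (the paper used the NNToolbox from MATLAB), a minimal Hopfield-style network illustrates how a corrupted pattern can settle onto a stored one:

        import numpy as np

        def train_hopfield(patterns):
            """Hebbian weights storing +/-1 patterns as attractors."""
            P = np.asarray(patterns, dtype=float)
            W = P.T @ P / P.shape[1]
            np.fill_diagonal(W, 0.0)
            return W

        def restore(W, probe, steps=20):
            """Settle a defective probe onto the nearest stored pattern
            (synchronous updates, kept simple for the sketch)."""
            s = np.asarray(probe, dtype=float).copy()
            for _ in range(steps):
                s = np.where(W @ s >= 0, 1.0, -1.0)
            return s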

  3. Temporal and peripheral extraction of contextual cues from scenes during visual search.

    PubMed

    Koehler, Kathryn; Eckstein, Miguel P

    2017-02-01

    Scene context is known to facilitate object recognition and guide visual search, but little work has focused on isolating image-based cues and evaluating their contributions to eye movement guidance and search performance. Here, we explore three types of contextual cues (a co-occurring object, the configuration of other objects, and the superordinate category of background elements) and assess their joint contributions to search performance in the framework of cue-combination and the temporal unfolding of their extraction. We also assess whether observers' ability to extract each contextual cue in the visual periphery is a bottleneck that determines the utilization and contribution of each cue to search guidance and decision accuracy. We find that during the first four fixations of a visual search task observers first utilize the configuration of objects for coarse eye movement guidance and later use co-occurring object information for finer guidance. In the absence of contextual cues, observers were suboptimally biased to report the target object as being absent. The presence of the co-occurring object was the only contextual cue that had a significant effect in reducing decision bias. The early influence of object-based cues on eye movements is corroborated by a clear demonstration of observers' ability to extract object cues up to 16° into the visual periphery. The joint contributions of the cues to decision search accuracy approximates that expected from the combination of statistically independent cues and optimal cue combination. Finally, the lack of utilization and contribution of the background-based contextual cue to search guidance cannot be explained by the availability of the contextual cue in the visual periphery; instead it is related to background cues providing the least inherent information about the precise location of the target in the scene.
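
    The optimal cue combination the abstract benchmarks against is the standard maximum-likelihood rule for statistically independent cues: each cue's estimate is weighted by its inverse variance, and reliabilities add:

        \hat{s} = \sum_i w_i \hat{s}_i, \qquad
        w_i = \frac{1/\sigma_i^2}{\sum_j 1/\sigma_j^2}, \qquad
        \frac{1}{\sigma^2} = \sum_i \frac{1}{\sigma_i^2}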

  4. Modeling human pilot cue utilization with applications to simulator fidelity assessment.

    PubMed

    Zeyada, Y; Hess, R A

    2000-01-01

    An analytical investigation to model the manner in which pilots perceive and utilize visual, proprioceptive, and vestibular cues in a ground-based flight simulator was undertaken. Data from a NASA Ames Research Center vertical motion simulator study of a simple, single-degree-of-freedom rotorcraft bob-up/down maneuver were employed in the investigation. The study was part of a larger research effort that has the creation of a methodology for determining flight simulator fidelity requirements as its ultimate goal. The study utilized a closed-loop feedback structure of the pilot/simulator system that included the pilot, the cockpit inceptor, the dynamics of the simulated vehicle, and the motion system. With the exception of time delays that accrued in visual scene production in the simulator, visual scene effects were not included in this study. Pilot/vehicle analysis and fuzzy-inference identification were employed to study the changes in fidelity that occurred as the characteristics of the motion system were varied over five configurations. The data from three of the five pilots who participated in the experimental study were analyzed in the fuzzy-inference identification. Results indicate that both the analytical pilot/vehicle analysis and the fuzzy-inference identification can be used to identify changes in simulator fidelity for the task examined.

  5. Functional correlates of the lateral and medial entorhinal cortex: objects, path integration and local-global reference frames.

    PubMed

    Knierim, James J; Neunuebel, Joshua P; Deshmukh, Sachin S

    2014-02-05

    The hippocampus receives its major cortical input from the medial entorhinal cortex (MEC) and the lateral entorhinal cortex (LEC). It is commonly believed that the MEC provides spatial input to the hippocampus, whereas the LEC provides non-spatial input. We review new data which suggest that this simple dichotomy between 'where' versus 'what' needs revision. We propose a refinement of this model, which is more complex than the simple spatial-non-spatial dichotomy. MEC is proposed to be involved in path integration computations based on a global frame of reference, primarily using internally generated, self-motion cues and external input about environmental boundaries and scenes; it provides the hippocampus with a coordinate system that underlies the spatial context of an experience. LEC is proposed to process information about individual items and locations based on a local frame of reference, primarily using external sensory input; it provides the hippocampus with information about the content of an experience.

  7. Proscene: A feature-rich framework for interactive environments

    NASA Astrophysics Data System (ADS)

    Charalambos, Jean Pierre

    We introduce Proscene, a feature-rich, open-source framework for interactive environments. The design of Proscene comprises a three-layered, onion-like software architecture, promoting different possible development scenarios. The framework's innermost layer decouples user gesture parsing from user-defined actions. The in-between layer implements a feature-rich set of widely used motion actions allowing the selection and manipulation of objects, including the scene viewpoint. The outermost layer exposes those features as a Processing library. The results have shown the feasibility of our approach together with the simplicity and flexibility of the Proscene framework API.

  8. A self-adaptive algorithm for traffic sign detection in motion image based on color and shape features

    NASA Astrophysics Data System (ADS)

    Zhang, Ka; Sheng, Yehua; Gong, Zhijun; Ye, Chun; Li, Yongqiang; Liang, Cheng

    2007-06-01

    As an important sub-system of intelligent transportation systems (ITS), the detection and recognition of traffic signs from mobile images is becoming one of the hot spots in the international ITS research field. To address the problem of automatic traffic sign detection in motion images, a new self-adaptive algorithm for traffic sign detection based on color and shape features is proposed in this paper. Firstly, global statistical color features of different images are computed based on statistics theory. Secondly, self-adaptive thresholds and special segmentation rules for image segmentation are designed according to these global color features. Then, for red, yellow and blue traffic signs, the color image is segmented into three binary images by these thresholds and rules. Thirdly, if the number of white pixels in a segmented binary image exceeds the filtering threshold, the binary image is further filtered. Fourthly, the method of gray-value projection is used to confirm the top, bottom, left and right boundaries of candidate traffic sign regions in the segmented binary image. Lastly, if the shape features of a candidate region satisfy those of a real traffic sign, the candidate region is confirmed as a detected traffic sign region. The new algorithm is applied to actual motion images of natural scenes taken by a CCD camera of the mobile photogrammetry system in Nanjing at different times. The experimental results show that the algorithm is not only simple, robust and adaptive to natural scene images, but also reliable and fast in real traffic sign detection.
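
    A compact OpenCV sketch of the colour-segmentation step for red signs; the fixed HSV bounds below are illustrative, whereas the paper derives its thresholds adaptively from each image's global colour statistics:

        import cv2

        img = cv2.imread("frame.jpg")                 # one motion-image frame
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

        # Red wraps around the hue axis, hence two ranges.
        red1 = cv2.inRange(hsv, (0, 90, 60), (10, 255, 255))
        red2 = cv2.inRange(hsv, (170, 90, 60), (180, 255, 255))
        mask = red1 | red2

        # Gray-value (here: binary) projections bound candidate regions.
        row_proj = mask.sum(axis=1)
        col_proj = mask.sum(axis=0)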

  9. Immersive Virtual Moon Scene System Based on Panoramic Camera Data of Chang'E-3

    NASA Astrophysics Data System (ADS)

    Gao, X.; Liu, J.; Mu, L.; Yan, W.; Zeng, X.; Zhang, X.; Li, C.

    2014-12-01

    The system "Immersive Virtual Moon Scene" is used to show the virtual environment of Moon surface in immersive environment. Utilizing stereo 360-degree imagery from panoramic camera of Yutu rover, the system enables the operator to visualize the terrain and the celestial background from the rover's point of view in 3D. To avoid image distortion, stereo 360-degree panorama stitched by 112 images is projected onto inside surface of sphere according to panorama orientation coordinates and camera parameters to build the virtual scene. Stars can be seen from the Moon at any time. So we render the sun, planets and stars according to time and rover's location based on Hipparcos catalogue as the background on the sphere. Immersing in the stereo virtual environment created by this imaged-based rendering technique, the operator can zoom, pan to interact with the virtual Moon scene and mark interesting objects. Hardware of the immersive virtual Moon system is made up of four high lumen projectors and a huge curve screen which is 31 meters long and 5.5 meters high. This system which take all panoramic camera data available and use it to create an immersive environment, enable operator to interact with the environment and mark interesting objects contributed heavily to establishment of science mission goals in Chang'E-3 mission. After Chang'E-3 mission, the lab with this system will be open to public. Besides this application, Moon terrain stereo animations based on Chang'E-1 and Chang'E-2 data will be showed to public on the huge screen in the lab. Based on the data of lunar exploration,we will made more immersive virtual moon scenes and animations to help the public understand more about the Moon in the future.

  10. Vertical gaze angle: absolute height-in-scene information for the programming of prehension.

    PubMed

    Gardner, P L; Mon-Williams, M

    2001-02-01

    One possible source of information regarding the distance of a fixated target is provided by the height of the object within the visual scene. It is accepted that this cue can provide ordinal information, but generally it has been assumed that the nervous system cannot extract "absolute" information from height-in-scene. In order to use height-in-scene, the nervous system would need to be sensitive to ocular position with respect to the head and to head orientation with respect to the shoulders (i.e. vertical gaze angle or VGA). We used a perturbation technique to establish whether the nervous system uses vertical gaze angle as a distance cue. Vertical gaze angle was perturbed using ophthalmic prisms with the base oriented either up or down. In experiment 1, participants were required to carry out an open-loop pointing task whilst wearing: (1) no prisms; (2) a base-up prism; or (3) a base-down prism. In experiment 2, the participants reached to grasp an object under closed-loop viewing conditions whilst wearing: (1) no prisms; (2) a base-up prism; or (3) a base-down prism. Experiment 1 and 2 provided clear evidence that the human nervous system uses vertical gaze angle as a distance cue. It was found that the weighting attached to VGA decreased with increasing target distance. The weighting attached to VGA was also affected by the discrepancy between the height of the target, as specified by all other distance cues, and the height indicated by the initial estimate of the position of the supporting surface. We conclude by considering the use of height-in-scene information in the perception of surface slant and highlight some of the complexities that must be involved in the computation of environmental layout.
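
    For intuition on how vertical gaze angle could specify absolute distance: for a fixated target on the support surface, with eye height h above that surface and gaze declination \theta below the horizontal (the quantity perturbed by the base-up and base-down prisms), elementary geometry gives the fixation distance (a schematic reading, not the authors' model):

        d = \frac{h}{\tan\theta}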

  11. Active polarization descattering.

    PubMed

    Treibitz, Tali; Schechner, Yoav Y

    2009-03-01

    Vision in scattering media is important but challenging. Images suffer from poor visibility due to backscattering and attenuation. Most prior methods for scene recovery use active illumination scanners (structured and gated), which can be slow and cumbersome, while natural illumination is inapplicable to dark environments. The current paper addresses the need for a non-scanning recovery method that uses active scene irradiance. We study the formation of images under widefield artificial illumination. Based on the formation model, the paper presents an approach for recovering the object signal. It also yields rough information about the 3D scene structure. The approach can work with compact, simple hardware, having active widefield, polychromatic polarized illumination. The camera is fitted with a polarization analyzer. Two frames of the scene are taken, with different states of the analyzer or polarizer. A recovery algorithm follows the acquisition. It allows both the backscatter and the object reflection to be partially polarized. It thus unifies and generalizes prior polarization-based methods, which had assumed exclusive polarization of either of these components. The approach is limited to an effective range, due to image noise and illumination falloff. Thus, the limits and noise sensitivity are analyzed. We demonstrate the approach in underwater field experiments.
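
    The two-frame recovery admits a closed form. With analyzer-parallel and analyzer-perpendicular images and degrees of polarization p_obj and p_scat for the object and backscatter components, the per-pixel 2x2 linear system inverts as below (illustrative DOP values; the paper estimates these quantities from the scene):

        import numpy as np

        def descatter(i_par, i_perp, p_obj=0.1, p_scat=0.8):
            """Separate object signal from backscatter, allowing both
            components to be partially polarized (requires p_obj != p_scat)."""
            signal = (i_par * (1 - p_scat) - i_perp * (1 + p_scat)) \
                     / (p_obj - p_scat)
            backscatter = (i_par + i_perp) - signal
            return signal, backscatter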

  12. 3D reconstruction based on light field images

    NASA Astrophysics Data System (ADS)

    Zhu, Dong; Wu, Chunhong; Liu, Yunluo; Fu, Dongmei

    2018-04-01

    This paper proposes a method of reconstructing a three-dimensional (3D) scene from two light field images captured by a Lytro Illum. The work is carried out by first extracting the sub-aperture images from the light field images and using the scale-invariant feature transform (SIFT) for feature registration on the selected sub-aperture images. A structure-from-motion (SFM) algorithm is then used on the registered sub-aperture images to reconstruct the three-dimensional scene, and a 3D sparse point cloud is obtained in the end. The method shows that 3D reconstruction can be implemented with only two light field captures, rather than the dozen or more captures required by traditional cameras. This can effectively solve the time-consuming and laborious issues of 3D reconstruction based on traditional digital cameras, achieving a more rapid, convenient and accurate reconstruction.
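
    A minimal OpenCV sketch of the SIFT registration step between two sub-aperture images (file names are placeholders):

        import cv2

        img1 = cv2.imread("subaperture_0.png", cv2.IMREAD_GRAYSCALE)
        img2 = cv2.imread("subaperture_1.png", cv2.IMREAD_GRAYSCALE)

        sift = cv2.SIFT_create()
        k1, d1 = sift.detectAndCompute(img1, None)
        k2, d2 = sift.detectAndCompute(img2, None)

        matcher = cv2.BFMatcher(cv2.NORM_L2)
        matches = matcher.knnMatch(d1, d2, k=2)
        good = [m for m, n in matches if m.distance < 0.7 * n.distance]
        # 'good' feeds the SFM stage that triangulates the sparse point cloud.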

  13. An improved artifact removal in exposure fusion with local linear constraints

    NASA Astrophysics Data System (ADS)

    Zhang, Hai; Yu, Mali

    2018-04-01

    In exposure fusion, it is challenging to remove artifacts caused by camera motion and moving objects in the scene. An improved artifact removal method is proposed in this paper, which performs local linear adjustment in the artifact removal process. After determining a reference image, we first perform high-dynamic-range (HDR) deghosting to generate an intermediate image stack from the input image stack. Then, a linear Intensity Mapping Function (IMF) in each window is extracted based on the intensities of the intermediate and reference images and the intensity mean and variance of the reference image. Finally, with the extracted local linear constraints, we reconstruct a target image stack, which can be directly used for fusing a single HDR-like image. Experimental results demonstrate that the proposed method is robust and effective in removing artifacts, especially in the saturated regions of the reference image.
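
    A simplified take on the per-window linear IMF: fit t = a*r + b between reference and intermediate intensities by least squares in each window (window size and regularisation are illustrative):

        import numpy as np

        def local_linear_imf(reference, intermediate, win=16, eps=1e-4):
            """Per-window linear intensity mapping t = a*r + b, fitted in
            closed form from window means, variances and covariances."""
            h, w = reference.shape
            a = np.ones((h // win, w // win))
            b = np.zeros_like(a)
            for i in range(0, h - h % win, win):
                for j in range(0, w - w % win, win):
                    r = reference[i:i + win, j:j + win].ravel()
                    t = intermediate[i:i + win, j:j + win].ravel()
                    cov = np.mean(r * t) - r.mean() * t.mean()
                    a[i // win, j // win] = cov / (r.var() + eps)
                    b[i // win, j // win] = t.mean() - a[i // win, j // win] * r.mean()
            return a, b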

  14. A motion deblurring method with long/short exposure image pairs

    NASA Astrophysics Data System (ADS)

    Cui, Guangmang; Hua, Weiping; Zhao, Jufeng; Gong, Xiaoli; Zhu, Liyao

    2018-01-01

    In this paper, a motion deblurring method with long/short exposure image pairs is presented. The long/short exposure image pairs are captured for the same scene under different exposure times. The image pairs serve as the input of the deblurring method, so that more information can be used to obtain a deblurred result of high image quality. Firstly, a luminance equalization process is applied to the short exposure image, and the blur kernel is estimated from the image pair under the maximum a posteriori (MAP) framework using a conjugate gradient algorithm. Then an L0-image-smoothing-based denoising method is applied to the luminance-equalized image, and the final deblurring result is obtained through a gain-controlled residual image deconvolution process with the edge map as the gain map. Furthermore, a real experimental optical system is built to capture the image pairs in order to demonstrate the effectiveness of the proposed deblurring framework. The long/short image pairs are obtained under different exposure times and camera gain settings. Experimental results show that the proposed method provides a superior deblurring result in both subjective and objective assessment compared with other deblurring approaches.

  15. Human ocular responses to translation of the observer and of the scene: dependence on viewing distance.

    PubMed

    Busettini, C; Miles, F A; Schwarz, U; Carl, J R

    1994-01-01

    Recent experiments on monkeys have indicated that the eye movements induced by brief translation of either the observer or the visual scene are a linear function of the inverse of the viewing distance. For the movements of the observer, the room was dark and responses were attributed to a translational vestibulo-ocular reflex (TVOR) that senses the motion through the otolith organs; for the movements of the scene, which elicit ocular following, the scene was projected and adjusted in size and speed so that the retinal stimulation was the same at all distances. The shared dependence on viewing distance was consistent with the hypothesis that the TVOR and ocular following are synergistic and share central pathways. The present experiments looked for such dependencies on viewing distance in human subjects. When briefly accelerated along the interaural axis in the dark, human subjects generated compensatory eye movements that were also a linear function of the inverse of the viewing distance to a previously fixated target. These responses, which were attributed to the TVOR, were somewhat weaker than those previously recorded from monkeys using similar methods. When human subjects faced a tangent screen onto which patterned images were projected, brief motion of those images evoked ocular following responses that showed statistically significant dependence on viewing distance only with low-speed stimuli (10 degrees/s). This dependence was at best weak and in the reverse direction of that seen with the TVOR, i.e., responses increased as viewing distance increased. We suggest that in generating an internal estimate of viewing distance subjects may have used a confounding cue in the ocular-following paradigm--the size of the projected scene--which was varied directly with the viewing distance in these experiments (in order to preserve the size of the retinal image). When movements of the subject were randomly interleaved with the movements of the scene--to encourage the expectation of ego-motion--the dependence of ocular following on viewing distance altered significantly: with higher speed stimuli (40 degrees/s) many responses (63%) now increased significantly as viewing distance decreased, though less vigorously than the TVOR. We suggest that the expectation of motion results in the subject placing greater weight on cues such as vergence and accommodation that provide veridical distance information in our experimental situation: cue selection is context specific.

  16. Intrinsic and contextual features in object recognition.

    PubMed

    Schlangen, Derrick; Barenholtz, Elan

    2015-01-28

    The context in which an object is found can facilitate its recognition. Yet, it is not known how effective this contextual information is relative to the object's intrinsic visual features, such as color and shape. To address this, we performed four experiments using rendered scenes with novel objects. In each experiment, participants first performed a visual search task, searching for a uniquely shaped target object whose color and location within the scene was experimentally manipulated. We then tested participants' tendency to use their knowledge of the location and color information in an identification task when the objects' images were degraded due to blurring, thus eliminating the shape information. In Experiment 1, we found that, in the absence of any diagnostic intrinsic features, participants identified objects based purely on their locations within the scene. In Experiment 2, we found that participants combined an intrinsic feature, color, with contextual location in order to uniquely specify an object. In Experiment 3, we found that when an object's color and location information were in conflict, participants identified the object using both sources of information equally. Finally, in Experiment 4, we found that participants used whichever source of information, either color or location, was more statistically reliable in order to identify the target object. Overall, these experiments show that the context in which objects are found can play as important a role as intrinsic features in identifying the objects.

  17. Expertise in crime scene examination: comparing search strategies of expert and novice crime scene examiners in simulated crime scenes.

    PubMed

    Baber, Chris; Butler, Mark

    2012-06-01

    The strategies of novice and expert crime scene examiners were compared in searching crime scenes. Previous studies have demonstrated that experts frame a scene by reconstructing the likely actions of a criminal and use contextual cues to develop hypotheses that guide the subsequent search for evidence. Novice (first-year undergraduate students of forensic sciences) and expert (experienced crime scene examiners) participants examined two "simulated" crime scenes. Performance was captured through a combination of concurrent verbal protocol and own-point-of-view recording, using head-mounted cameras. Although both groups paid attention to the likely modus operandi of the perpetrator (in terms of possible actions taken), the novices paid more attention to individual objects, whereas the experts paid more attention to objects with "evidential value." Novices explore the scene in terms of the objects that it contains, whereas experts consider the evidence analysis that can be performed as a consequence of the examination. The suggestion is that the novices put effort into detailing the scene in terms of its features, whereas the experts put effort into the likely actions that can be performed as a consequence of the examination. The findings have helped in developing the expertise of novice crime scene examiners and approaches to the training of expertise within this population.

  18. Probabilistic multi-person localisation and tracking in image sequences

    NASA Astrophysics Data System (ADS)

    Klinger, T.; Rottensteiner, F.; Heipke, C.

    2017-05-01

    The localisation and tracking of persons in image sequences is commonly guided by recursive filters. Especially in a multi-object tracking environment, where mutual occlusions are inherent, the predictive model is prone to drift away from the actual target position when context is not taken into account. Further, if the image-based observations are imprecise, the trajectory is prone to be updated towards a wrong position. In this work we address both of these problems by using a new predictive model on the basis of Gaussian Process Regression, and by using generic object detection, as well as instance-specific classification, for refined localisation. The predictive model takes into account the motion of every tracked pedestrian in the scene, and the prediction is executed with respect to the velocities of neighbouring persons. In contrast to existing methods, our approach uses a Dynamic Bayesian Network in which the state vector of a recursive Bayes filter, as well as the location of the tracked object in the image, are modelled as unknowns. This allows the detection to be corrected before it is incorporated into the recursive filter. Our method is evaluated on a publicly available benchmark dataset and outperforms related methods in terms of geometric precision and tracking accuracy.

  19. Learning and Inferring "Dark Matter" and Predicting Human Intents and Trajectories in Videos.

    PubMed

    Xie, Dan; Shu, Tianmin; Todorovic, Sinisa; Zhu, Song-Chun

    2018-07-01

    This paper presents a method for localizing functional objects and predicting human intents and trajectories in surveillance videos of public spaces, with no supervision in training. People in public spaces are expected to intentionally take shortest paths (subject to obstacles) toward certain objects (e.g., a vending machine, picnic table, or dumpster) where they can satisfy certain needs (e.g., quench thirst). Since these objects are typically very small or heavily occluded, they cannot be inferred from their visual appearance, but only indirectly through their influence on people's trajectories. Therefore, we call them "dark matter", by analogy to cosmology, since their presence can only be observed as attractive or repulsive "fields" in the public space. A person in the scene is modeled as an intelligent agent engaged in one of the "fields", selected depending on his/her intent. An agent's trajectory is derived from agent-based Lagrangian mechanics. The agents can change their intents in the middle of motion and thus alter the trajectory. For evaluation, we compiled and annotated a new dataset. The results demonstrate our effectiveness in predicting human intent behaviors and trajectories, and in localizing and discovering distinct types of "dark matter" in wide public spaces.
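
    A toy potential-field analogue of the agent model: the "dark matter" source attracts, obstacles repel, and a trajectory is rolled out by following the net field (a schematic stand-in, not the paper's agent-based Lagrangian mechanics):

        import numpy as np

        def simulate_agent(start, goal, obstacles, steps=200, dt=0.1):
            """Unit-speed agent descending an attractive/repulsive field."""
            p = np.asarray(start, dtype=float)
            g = np.asarray(goal, dtype=float)
            obs = [np.asarray(o, dtype=float) for o in obstacles]
            path = [p.copy()]
            for _ in range(steps):
                force = g - p                                     # attraction
                for o in obs:
                    d = p - o
                    force += d / (np.linalg.norm(d) ** 3 + 1e-6)  # repulsion
                p = p + dt * force / (np.linalg.norm(force) + 1e-9)
                path.append(p.copy())
                if np.linalg.norm(g - p) < 0.1:                   # reached source
                    break
            return np.array(path)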

  20. Multilayered nonuniform sampling for three-dimensional scene representation

    NASA Astrophysics Data System (ADS)

    Lin, Huei-Yung; Xiao, Yu-Hua; Chen, Bo-Ren

    2015-09-01

    The representation of a three-dimensional (3-D) scene is essential in multiview imaging technologies. We present a unified geometry and texture representation based on global resampling of the scene. A layered data map representation with a distance-dependent nonuniform sampling strategy is proposed. It is capable of increasing the details of the 3-D structure locally and is compact in size. The 3-D point cloud obtained from the multilayered data map is used for view rendering. For any given viewpoint, image synthesis with different levels of detail is carried out using the quadtree-based nonuniformly sampled 3-D data points. Experimental results are presented using the 3-D models of reconstructed real objects.

  1. Reduced depth inversion illusions in schizophrenia are state-specific and occur for multiple object types and viewing conditions.

    PubMed

    Keane, Brian P; Silverstein, Steven M; Wang, Yushi; Papathomas, Thomas V

    2013-05-01

    Schizophrenia patients are less susceptible to depth inversion illusions (DIIs) in which concave faces appear as convex, but what stimulus attributes generate this effect and how does it vary with clinical state? To address these issues, we had 30 schizophrenia patients and 25 well-matched healthy controls make convexity judgments on physically concave faces and scenes. Patients were selectively sampled from three levels of care to ensure symptom heterogeneity. Half of the concave objects were painted with realistic texture to enhance the convexity illusion; the remaining objects were painted uniform beige to reduce the illusion. Subjects viewed the objects with one eye while laterally moving in front of the stimulus (to see depth via motion parallax) or with two eyes while remaining motionless (to see depth stereoscopically). For each group, DIIs were stronger with texture than without, and weaker with stereoscopic information than without, indicating that patients responded normally to stimulus alterations. More importantly, patients experienced fewer illusions than controls irrespective of the face/scene category, texture, or viewing condition (parallax/stereo). Illusions became less frequent as patients experienced more positive symptoms and required more structured treatment. Taken together, these results indicate that people with schizophrenia experience fewer DIIs with a variety of object types and viewing conditions, perhaps because of a lessened tendency to construe any type of object as convex. Moreover, positive symptoms and the need for structured treatment are associated with more accurate 3-D perception, suggesting that DII may serve as a state marker for the illness.

  2. Real time moving scene holographic camera system

    NASA Technical Reports Server (NTRS)

    Kurtz, R. L. (Inventor)

    1973-01-01

    A holographic motion picture camera system producing resolution of front surface detail is described. The system utilizes a beam of coherent light and means for dividing the beam into a reference beam for direct transmission to a conventional movie camera and two reflection signal beams for transmission to the movie camera by reflection from the front side of a moving scene. The system is arranged so that critical parts of the system are positioned on the foci of a pair of interrelated, mathematically derived ellipses. The camera has the theoretical capability of producing motion picture holograms of projectiles moving at speeds as high as 900,000 cm/sec (about 21,450 mph).

  3. Eye Movements and Visual Memory for Scenes

    DTIC Science & Technology

    2005-01-01

    Scene memory research has demonstrated that the memory representation of a semantically inconsistent object in a scene is more detailed and/or complete... memory during scene viewing, then changes to semantically inconsistent objects (which should be represented more completely) should be detected more... semantic description. Due to the surprise nature of the visual memory test, any learning that occurred during the search portion of the experiment was

  4. Transient cardio-respiratory responses to visually induced tilt illusions

    NASA Technical Reports Server (NTRS)

    Wood, S. J.; Ramsdell, C. D.; Mullen, T. J.; Oman, C. M.; Harm, D. L.; Paloski, W. H.

    2000-01-01

    Although the orthostatic cardio-respiratory response is primarily mediated by the baroreflex, studies have shown that vestibular cues also contribute in both humans and animals. We have demonstrated a visually mediated response to illusory tilt in some human subjects. Blood pressure, heart and respiration rate, and lung volume were monitored in 16 supine human subjects during two types of visual stimulation, and compared with responses to real passive whole body tilt from supine to head 80 degrees upright. Visual tilt stimuli consisted of either a static scene from an overhead mirror or constant velocity scene motion along different body axes generated by an ultra-wide dome projection system. Visual vertical cues were initially aligned with the longitudinal body axis. Subjective tilt and self-motion were reported verbally. Although significant changes in cardio-respiratory parameters to illusory tilts could not be demonstrated for the entire group, several subjects showed significant transient decreases in mean blood pressure resembling their initial response to passive head-up tilt. Changes in pulse pressure and a slight elevation in heart rate were noted. These transient responses are consistent with the hypothesis that visual-vestibular input contributes to the initial cardiovascular adjustment to a change in posture in humans. On average the static scene elicited perceived tilt without rotation. Dome scene pitch and yaw elicited perceived tilt and rotation, and dome roll motion elicited perceived rotation without tilt. A significant correlation between the magnitude of physiological and subjective reports could not be demonstrated.

  5. Scene text recognition in mobile applications by character descriptor and structure configuration.

    PubMed

    Yi, Chucai; Tian, Yingli

    2014-07-01

    Text characters and strings in natural scenes can provide valuable information for many applications. Extracting text directly from natural scene images or videos is a challenging task because of diverse text patterns and variant background interference. This paper proposes a method of scene text recognition from detected text regions. In text detection, our previously proposed algorithms are applied to obtain text regions from a scene image. First, we design a discriminative character descriptor by combining several state-of-the-art feature detectors and descriptors. Second, we model character structure for each character class by designing stroke configuration maps. Our algorithm design is compatible with the application of scene text extraction on smart mobile devices. An Android-based demo system is developed to show the effectiveness of our proposed method on scene text information extraction from nearby objects. The demo system also provides some insight into algorithm design and performance improvement of scene text extraction. The evaluation results on benchmark data sets demonstrate that our proposed scheme of text recognition is comparable with the best existing methods.

  6. Translucent Radiosity: Efficiently Combining Diffuse Inter-Reflection and Subsurface Scattering.

    PubMed

    Sheng, Yu; Shi, Yulong; Wang, Lili; Narasimhan, Srinivasa G

    2014-07-01

    It is hard to efficiently model the light transport in scenes with translucent objects for interactive applications. The inter-reflection between objects and their environments and the subsurface scattering through the materials intertwine to produce visual effects like color bleeding, light glows, and soft shading. Monte-Carlo based approaches have demonstrated impressive results but are computationally expensive, and faster approaches model either only inter-reflection or only subsurface scattering. In this paper, we present a simple analytic model that combines diffuse inter-reflection and isotropic subsurface scattering. Our approach extends the classical work in radiosity by including a subsurface scattering matrix that operates in conjunction with the traditional form factor matrix. This subsurface scattering matrix can be constructed using analytic, measurement-based or simulation-based models and can capture both homogeneous and heterogeneous translucencies. Using a fast iterative solution to radiosity, we demonstrate scene relighting and dynamically varying object translucencies at near interactive rates.
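
    The iterative solve described above can be conveyed with a toy numerical sketch. The exact coupling between the form factor matrix and the subsurface scattering matrix in the paper may differ; here a scattering matrix S simply redistributes the gathered light F @ B in each Jacobi-style iteration, with all matrices invented for illustration.

    ```python
    # Minimal numeric sketch of augmenting radiosity with a subsurface
    # term: radiosity gathers F @ B as usual, and a scattering matrix S
    # redistributes that gathered light among patches of a translucent
    # object. The coupling shown is an assumption for illustration only.
    import numpy as np

    n = 4                                  # patches
    E = np.array([1.0, 0.0, 0.0, 0.0])     # emission
    F = np.full((n, n), 0.2)               # toy form factors
    np.fill_diagonal(F, 0.0)
    S = 0.6 * np.eye(n) + 0.1 * np.ones((n, n))   # toy scattering matrix

    B = E.copy()
    for _ in range(100):                   # fast iterative (Jacobi) solve
        B_new = E + S @ (F @ B)
        if np.max(np.abs(B_new - B)) < 1e-9:
            break
        B = B_new
    print(B)                               # converged patch radiosities
    ```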

  7. Robust tracking of respiratory rate in high-dynamic range scenes using mobile thermal imaging

    PubMed Central

    Cho, Youngjun; Julier, Simon J.; Marquardt, Nicolai; Bianchi-Berthouze, Nadia

    2017-01-01

    The ability to monitor the respiratory rate, one of the vital signs, is extremely important for the medical treatment, healthcare and fitness sectors. In many situations, mobile methods, which allow users to undertake everyday activities, are required. However, current monitoring systems can be obtrusive, requiring users to wear respiration belts or nasal probes. Alternatively, contactless digital image sensor based remote-photoplethysmography (PPG) can be used. However, remote PPG requires an ambient source of light, and does not work properly in dark places or under varying lighting conditions. Recent advances in thermographic systems have shrunk their size, weight and cost, to the point where it is possible to create smart-phone based respiration rate monitoring devices that are not affected by lighting conditions. However, mobile thermal imaging is challenged in scenes with high thermal dynamic ranges (e.g. due to the different environmental temperature distributions indoors and outdoors). This challenge is further amplified by general problems such as motion artifacts and low spatial resolution, leading to unreliable breathing signals. In this paper, we propose a novel and robust approach for respiration tracking which compensates for the negative effects of variations in the ambient temperature and motion artifacts and can accurately extract breathing rates in highly dynamic thermal scenes. The approach is based on tracking the nostril of the user and using local temperature variations to infer inhalation and exhalation cycles. It has three main contributions. The first is a novel Optimal Quantization technique which adaptively constructs a color mapping of absolute temperature to improve segmentation, classification and tracking. The second is the Thermal Gradient Flow method that computes thermal gradient magnitude maps to enhance the accuracy of the nostril region tracking. Finally, we introduce the Thermal Voxel method to increase the reliability of the captured respiration signals compared to the traditional averaging method. We demonstrate the extreme robustness of our system to track the nostril-region and measure the respiratory rate by evaluating it during controlled respiration exercises in high thermal dynamic scenes (e.g. strong correlation (r = 0.9987) with the ground truth from the respiration-belt sensor). We also demonstrate how our algorithm outperformed standard algorithms in settings with different amounts of environmental thermal changes and human motion. We open the tracked ROI sequences of the datasets collected for these studies (i.e. under both controlled and unconstrained real-world settings) to the community to foster work in this area. PMID:29082079
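
    To make the signal chain concrete, a minimal sketch follows: once a nostril region of interest (ROI) is tracked, breathing appears as a quasi-periodic temperature signal whose spectral peak in the physiological band gives the rate. This is an illustration only; the paper's Optimal Quantization, Thermal Gradient Flow, and Thermal Voxel methods are not reproduced, and the signal and frame rate here are synthetic assumptions.

    ```python
    # Illustrative sketch (not the paper's pipeline): a tracked nostril
    # ROI yields a quasi-periodic temperature signal; its spectral peak
    # in the breathing band gives the respiratory rate.
    import numpy as np

    fs = 8.0                                   # assumed thermal frame rate (Hz)
    t = np.arange(0, 60, 1 / fs)
    # Synthetic ROI temperature: 0.25 Hz breathing plus sensor noise.
    roi_temp = 34 + 0.3 * np.sin(2 * np.pi * 0.25 * t) \
                  + 0.05 * np.random.randn(t.size)

    sig = roi_temp - roi_temp.mean()
    spec = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(sig.size, 1 / fs)
    band = (freqs > 0.1) & (freqs < 0.85)      # ~6-51 breaths per minute
    rate_hz = freqs[band][np.argmax(spec[band])]
    print(f"estimated rate: {rate_hz * 60:.1f} breaths/min")   # ~15
    ```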

  8. Robust tracking of respiratory rate in high-dynamic range scenes using mobile thermal imaging.

    PubMed

    Cho, Youngjun; Julier, Simon J; Marquardt, Nicolai; Bianchi-Berthouze, Nadia

    2017-10-01

    The ability to monitor the respiratory rate, one of the vital signs, is extremely important for the medical treatment, healthcare and fitness sectors. In many situations, mobile methods, which allow users to undertake everyday activities, are required. However, current monitoring systems can be obtrusive, requiring users to wear respiration belts or nasal probes. Alternatively, contactless digital image sensor based remote-photoplethysmography (PPG) can be used. However, remote PPG requires an ambient source of light, and does not work properly in dark places or under varying lighting conditions. Recent advances in thermographic systems have shrunk their size, weight and cost, to the point where it is possible to create smart-phone based respiration rate monitoring devices that are not affected by lighting conditions. However, mobile thermal imaging is challenged in scenes with high thermal dynamic ranges (e.g. due to the different environmental temperature distributions indoors and outdoors). This challenge is further amplified by general problems such as motion artifacts and low spatial resolution, leading to unreliable breathing signals. In this paper, we propose a novel and robust approach for respiration tracking which compensates for the negative effects of variations in the ambient temperature and motion artifacts and can accurately extract breathing rates in highly dynamic thermal scenes. The approach is based on tracking the nostril of the user and using local temperature variations to infer inhalation and exhalation cycles. It has three main contributions. The first is a novel Optimal Quantization technique which adaptively constructs a color mapping of absolute temperature to improve segmentation, classification and tracking. The second is the Thermal Gradient Flow method that computes thermal gradient magnitude maps to enhance the accuracy of the nostril region tracking. Finally, we introduce the Thermal Voxel method to increase the reliability of the captured respiration signals compared to the traditional averaging method. We demonstrate the extreme robustness of our system to track the nostril-region and measure the respiratory rate by evaluating it during controlled respiration exercises in high thermal dynamic scenes (e.g. strong correlation (r = 0.9987) with the ground truth from the respiration-belt sensor). We also demonstrate how our algorithm outperformed standard algorithms in settings with different amounts of environmental thermal changes and human motion. We open the tracked ROI sequences of the datasets collected for these studies (i.e. under both controlled and unconstrained real-world settings) to the community to foster work in this area.

  9. Towards photorealistic and immersive virtual-reality environments for simulated prosthetic vision: integrating recent breakthroughs in consumer hardware and software.

    PubMed

    Zapf, Marc P; Matteucci, Paul B; Lovell, Nigel H; Zheng, Steven; Suaning, Gregg J

    2014-01-01

    Simulated prosthetic vision (SPV) in normally sighted subjects is an established way of investigating the prospective efficacy of visual prosthesis designs in visually guided tasks such as mobility. To perform meaningful SPV mobility studies in computer-based environments, a credible representation of both the virtual scene to navigate and the experienced artificial vision has to be established. It is therefore prudent to make optimal use of existing hardware and software solutions when establishing a testing framework. The authors aimed to improve the realism and immersion of SPV by integrating state-of-the-art yet low-cost consumer technology. The feasibility of body motion tracking to control movement in photorealistic virtual environments was evaluated in a pilot study. Five subjects were recruited and performed an obstacle avoidance and wayfinding task using either keyboard and mouse, a gamepad, or Kinect motion tracking. Walking speed and collisions were analyzed as basic measures of task performance. Kinect motion tracking resulted in lower performance compared to classical input methods, yet results were more uniform across vision conditions. The chosen framework was successfully applied in a basic virtual task and is suited to realistically simulate real-world scenes under SPV in mobility research. Classical input peripherals remain a feasible and effective way of controlling virtual movement. Motion tracking, despite its limitations and early state of implementation, is intuitive and can eliminate between-subject differences due to varying familiarity with established input methods.

  10. Using articulated scene models for dynamic 3d scene analysis in vista spaces

    NASA Astrophysics Data System (ADS)

    Beuter, Niklas; Swadzba, Agnes; Kummert, Franz; Wachsmuth, Sven

    2010-09-01

    In this paper we describe an efficient but detailed new approach to analyze complex dynamic scenes directly in 3D. The resulting information is important for mobile robots to solve tasks in the area of household robotics. In our work a mobile robot builds an articulated scene model by observing the environment in the visual field or rather in the so-called vista space. The articulated scene model consists of essential knowledge about the static background, about autonomously moving entities like humans or robots and finally, in contrast to existing approaches, information about articulated parts. These parts describe movable objects like chairs, doors or other tangible entities, which could be moved by an agent. The combination of the static scene, the self-moving entities and the movable objects in one articulated scene model enhances the calculation of each single part. The reconstruction process for parts of the static scene benefits from the removal of the dynamic parts, and in turn, the moving parts can be extracted more easily through knowledge of the background. In our experiments we show that the system simultaneously delivers an accurate static background model, moving persons, and movable objects. This information from the articulated scene model enables a mobile robot to detect and keep track of interaction partners, to navigate safely through the environment and finally, to strengthen the interaction with the user through the knowledge about the 3D articulated objects and 3D scene analysis.

  11. Scene perception in posterior cortical atrophy: categorization, description and fixation patterns.

    PubMed

    Shakespeare, Timothy J; Yong, Keir X X; Frost, Chris; Kim, Lois G; Warrington, Elizabeth K; Crutch, Sebastian J

    2013-01-01

    Partial or complete Balint's syndrome is a core feature of the clinico-radiological syndrome of posterior cortical atrophy (PCA), in which individuals experience a progressive deterioration of cortical vision. Although multi-object arrays are frequently used to detect simultanagnosia in the clinical assessment and diagnosis of PCA, to date there have been no group studies of scene perception in patients with the syndrome. The current study involved three linked experiments conducted in PCA patients and healthy controls. Experiment 1 evaluated the accuracy and latency of complex scene perception relative to individual faces and objects (color and grayscale) using a categorization paradigm. PCA patients were both less accurate (faces < scenes < objects) and slower (scenes < objects < faces) than controls on all categories, with performance strongly associated with their level of basic visual processing impairment; patients also showed a small advantage for color over grayscale stimuli. Experiment 2 involved free description of real world scenes. PCA patients generated fewer features and more misperceptions than controls, though perceptual errors were always consistent with the patient's global understanding of the scene (whether correct or not). Experiment 3 used eye tracking measures to compare patient and control eye movements over initial and subsequent fixations of scenes. Patients' fixation patterns were significantly different to those of young and age-matched controls, with comparable group differences for both initial and subsequent fixations. Overall, these findings describe the variability in everyday scene perception exhibited by individuals with PCA, and indicate the importance of exposure duration in the perception of complex scenes.

  12. Real-Time Motion Tracking for Mobile Augmented/Virtual Reality Using Adaptive Visual-Inertial Fusion

    PubMed Central

    Fang, Wei; Zheng, Lianyu; Deng, Huanjun; Zhang, Hongbo

    2017-01-01

    In mobile augmented/virtual reality (AR/VR), real-time 6-Degree of Freedom (DoF) motion tracking is essential for the registration between virtual scenes and the real world. However, due to the limited computational capacity of mobile terminals today, the latency between consecutive arriving poses would damage the user experience in mobile AR/VR. Thus, a visual-inertial based real-time motion tracking for mobile AR/VR is proposed in this paper. By means of high frequency and passive outputs from the inertial sensor, the real-time performance of arriving poses for mobile AR/VR is achieved. In addition, to alleviate the jitter phenomenon during the visual-inertial fusion, an adaptive filter framework is established to cope with different motion situations automatically, enabling the real-time 6-DoF motion tracking by balancing the jitter and latency. Besides, the robustness of the traditional visual-only based motion tracking is enhanced, giving rise to a better mobile AR/VR performance when motion blur is encountered. Finally, experiments are carried out to demonstrate the proposed method, and the results show that this work is capable of providing a smooth and robust 6-DoF motion tracking for mobile AR/VR in real-time. PMID:28475145
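
    A toy one-dimensional complementary filter conveys the jitter/latency balancing idea: the high-rate gyro is integrated for low-latency prediction and corrected toward the lower-rate, jitter-prone visual pose, with a blend weight that adapts to the motion level. The adaptation rule and all constants are assumptions for illustration, not the authors' filter.

    ```python
    # Toy 1-D complementary filter: integrate the gyro for low latency,
    # correct toward the visual measurement when available, and trust
    # vision less when motion is fast (more blur/jitter risk). The
    # adaptive rule below is an illustrative assumption.
    import numpy as np

    def fuse(gyro, visual, dt, base_alpha=0.05):
        """gyro: angular rate per step; visual: absolute angle or NaN."""
        theta, out = 0.0, []
        for w, z in zip(gyro, visual):
            theta += w * dt                      # low-latency prediction
            if not np.isnan(z):
                alpha = base_alpha / (1.0 + 10.0 * abs(w))
                theta = (1 - alpha) * theta + alpha * z
            out.append(theta)
        return np.array(out)

    dt, n = 0.005, 2000                          # 200 Hz IMU for 10 s
    true_rate = 0.5 * np.ones(n)
    gyro = true_rate + 0.01 * np.random.randn(n)
    angle_true = np.cumsum(true_rate) * dt
    visual = np.full(n, np.nan)
    visual[::10] = angle_true[::10] + 0.02 * np.random.randn(n // 10)  # 20 Hz vision
    print(fuse(gyro, visual, dt)[-1], angle_true[-1])  # both ≈ 5.0 rad
    ```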

  13. Real-Time Motion Tracking for Mobile Augmented/Virtual Reality Using Adaptive Visual-Inertial Fusion.

    PubMed

    Fang, Wei; Zheng, Lianyu; Deng, Huanjun; Zhang, Hongbo

    2017-05-05

    In mobile augmented/virtual reality (AR/VR), real-time 6-Degree of Freedom (DoF) motion tracking is essential for the registration between virtual scenes and the real world. However, due to the limited computational capacity of mobile terminals today, the latency between consecutive arriving poses would damage the user experience in mobile AR/VR. Thus, a visual-inertial based real-time motion tracking for mobile AR/VR is proposed in this paper. By means of high frequency and passive outputs from the inertial sensor, the real-time performance of arriving poses for mobile AR/VR is achieved. In addition, to alleviate the jitter phenomenon during the visual-inertial fusion, an adaptive filter framework is established to cope with different motion situations automatically, enabling the real-time 6-DoF motion tracking by balancing the jitter and latency. Besides, the robustness of the traditional visual-only based motion tracking is enhanced, giving rise to a better mobile AR/VR performance when motion blur is encountered. Finally, experiments are carried out to demonstrate the proposed method, and the results show that this work is capable of providing a smooth and robust 6-DoF motion tracking for mobile AR/VR in real-time.

  14. Multiview human activity recognition system based on spatiotemporal template for video surveillance system

    NASA Astrophysics Data System (ADS)

    Kushwaha, Alok Kumar Singh; Srivastava, Rajeev

    2015-09-01

    An efficient view-invariant framework for the recognition of human activities from an input video sequence is presented. The proposed framework is composed of three consecutive modules: (i) detecting and locating people by background subtraction, (ii) creating view-invariant spatiotemporal templates for different activities, and (iii) performing template matching for view-invariant activity recognition. The foreground objects present in a scene are extracted using change detection and background modeling. The view-invariant templates are constructed using the motion history images and object shape information for different human activities in a video sequence. For matching the spatiotemporal templates of the various activities, moment invariants and the Mahalanobis distance are used. The proposed approach is tested successfully on our own viewpoint dataset, the KTH action recognition dataset, the i3DPost multiview dataset, the MSR viewpoint action dataset, the VideoWeb multiview dataset, and the WVU multiview human action recognition dataset. From the experimental results and analysis over the chosen datasets, it is observed that the proposed framework is robust, flexible, and efficient with respect to multiple-view activity recognition, scale, and phase variations.
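
    The template pipeline can be sketched in a few lines: accumulate a motion history image (MHI) from foreground masks, describe it with Hu moment invariants, and match it to stored activity templates via the Mahalanobis distance. Parameter values and the toy class model below are illustrative, not the authors' settings.

    ```python
    # Sketch of the template pipeline: MHI from foreground masks,
    # Hu-moment descriptor, Mahalanobis matching. Constants are
    # illustrative assumptions.
    import numpy as np
    import cv2

    def motion_history(masks, tau=20.0):
        """masks: list of binary foreground masks (uint8 0/1)."""
        mhi = np.zeros(masks[0].shape, np.float32)
        for m in masks:
            mhi = np.where(m > 0, tau, np.maximum(mhi - 1.0, 0.0))
        return mhi / tau                       # normalise to [0, 1]

    def hu_descriptor(mhi):
        hu = cv2.HuMoments(cv2.moments(mhi)).ravel()
        return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)  # usual log scaling

    def mahalanobis(x, mean, cov_inv):
        d = x - mean
        return float(np.sqrt(d @ cov_inv @ d))

    # Toy usage: a blob drifting rightward produces the MHI, matched
    # against a stand-in activity class model.
    masks = []
    for i in range(20):
        m = np.zeros((64, 64), np.uint8)
        m[28:36, 10 + i:18 + i] = 1
        masks.append(m)
    x = hu_descriptor(motion_history(masks))
    class_mean, class_cov_inv = x.copy(), np.eye(7)   # stand-in model
    print("distance to class:", mahalanobis(x, class_mean, class_cov_inv))  # 0.0
    ```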

  15. Figure-ground segregation modulates apparent motion.

    PubMed

    Ramachandran, V S; Anstis, S

    1986-01-01

    We explored the relationship between figure-ground segmentation and apparent motion. Results suggest that: static elements in the surround can eliminate apparent motion of a cluster of dots in the centre, but only if the cluster and surround have similar "grain" or texture; outlines that define occluding surfaces are taken into account by the motion mechanism; the brain uses a hierarchy of precedence rules in attributing motion to different segments of the visual scene. Being designated as "figure" confers a high rank in this scheme of priorities.

  16. A comparison of moving object detection methods for real-time moving object detection

    NASA Astrophysics Data System (ADS)

    Roshan, Aditya; Zhang, Yun

    2014-06-01

    Moving object detection has a wide variety of applications, from traffic monitoring, site monitoring, automatic theft identification, and face detection to military surveillance. Many methods have been developed for moving object detection, but it is very difficult to find one which works in all situations and with different types of videos. The purpose of this paper is to evaluate existing moving object detection methods which can be implemented in software on a desktop or laptop for real-time object detection. Several moving object detection methods are noted in the literature, but few of them are suitable for real-time moving object detection, and most of those that run in real time are further limited by the number of objects and the scene complexity. This paper evaluates the four most commonly used moving object detection methods: background subtraction, Gaussian mixture models, wavelet-based methods, and optical flow-based methods. The work is based on an evaluation of these four methods using two different sets of cameras and two different scenes. The methods have been implemented in MATLAB, and results are compared based on completeness of detected objects, noise, sensitivity to lighting change, processing time, etc. After comparison, it is observed that the optical flow-based method took the least processing time and successfully detected the boundaries of moving objects, which implies that it can be implemented for real-time moving object detection.
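
    Two of the surveyed detector families are easy to sketch side by side: plain frame differencing and OpenCV's Gaussian-mixture background model. Thresholds and frame contents below are illustrative.

    ```python
    # Minimal sketch contrasting two of the surveyed detectors on the
    # same frames: frame differencing vs. a Gaussian-mixture background
    # model (MOG2). Threshold values are illustrative.
    import numpy as np
    import cv2

    mog2 = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16)

    def detect_frame_diff(prev_gray, gray, thresh=25):
        """Foreground mask by absolute frame differencing."""
        diff = cv2.absdiff(gray, prev_gray)
        _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
        return mask

    # Toy frames: a bright square appears over a static background.
    prev = np.zeros((120, 160, 3), np.uint8)
    cur = prev.copy()
    cur[40:60, 70:90] = 255

    fd_mask = detect_frame_diff(cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY),
                                cv2.cvtColor(cur, cv2.COLOR_BGR2GRAY))
    mog2.apply(prev)                 # initialise the background model
    bg_mask = mog2.apply(cur)        # 255 = foreground, 127 = shadow
    print(int(fd_mask.sum() / 255), "px by frame diff;",
          int((bg_mask == 255).sum()), "px by MOG2")
    ```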

  17. Using virtual reality to test the regularity priors used by the human visual system

    NASA Astrophysics Data System (ADS)

    Palmer, Eric; Kwon, TaeKyu; Pizlo, Zygmunt

    2017-09-01

    Virtual reality applications provide an opportunity to test human vision in well-controlled scenarios that would be difficult to generate in real physical spaces. This paper presents a study intended to evaluate the importance of the regularity priors used by the human visual system. Using a CAVE simulation, subjects viewed virtual objects in a variety of experimental manipulations. In the first experiment, the subject was asked to count the objects in a scene that was viewed either right-side-up or upside-down for 4 seconds. The subject counted more accurately in the right-side-up condition regardless of the presence of binocular disparity or color. In the second experiment, the subject was asked to reconstruct the scene from a different viewpoint. Reconstructions were accurate, but the position and orientation error was twice as high when the scene was rotated by 45°, compared to 22.5°. Similarly to the first experiment, there was little difference between monocular and binocular viewing. In the third experiment, the subject was asked to adjust the position of one object to match the depth extent to the frontal extent among three objects. Performance was best with symmetrical objects and became poorer with asymmetrical objects and poorest with only small circular markers on the floor. Finally, in the fourth experiment, we demonstrated reliable performance in monocular and binocular recovery of 3D shapes of objects standing naturally on the simulated horizontal floor. Based on these results, we conclude that gravity, horizontal ground, and symmetry priors play an important role in veridical perception of scenes.

  18. Estimating the number of people in crowded scenes

    NASA Astrophysics Data System (ADS)

    Kim, Minjin; Kim, Wonjun; Kim, Changick

    2011-01-01

    This paper presents a method to estimate the number of people in crowded scenes without using explicit object segmentation or tracking. The proposed method consists of three steps: (1) extracting space-time interest points using eigenvalues of the local spatio-temporal gradient matrix, (2) generating crowd regions based on the space-time interest points, and (3) estimating the crowd density based on multiple regression. In experimental results, the efficiency and robustness of the proposed method are demonstrated using the PETS 2009 dataset.
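
    A rough sketch of the interest-point and regression steps follows; the window size, threshold, and the simple linear fit are stand-ins for the paper's actual multiple regression.

    ```python
    # Sketch: (1) space-time interest points from the smallest eigenvalue
    # of the local spatio-temporal gradient (structure) matrix, then
    # (3) a toy regression from point counts to people counts. All
    # constants and the toy data are illustrative assumptions.
    import numpy as np
    from scipy.ndimage import uniform_filter

    def st_interest_points(volume, thresh):
        """volume: (T, H, W) grayscale video block -> boolean point map."""
        gt, gy, gx = np.gradient(volume.astype(float))
        s = lambda a, b: uniform_filter(a * b, size=3)  # local averaging
        M = np.stack([np.stack([s(gx, gx), s(gx, gy), s(gx, gt)], -1),
                      np.stack([s(gx, gy), s(gy, gy), s(gy, gt)], -1),
                      np.stack([s(gx, gt), s(gy, gt), s(gt, gt)], -1)], -2)
        lmin = np.linalg.eigvalsh(M)[..., 0]            # smallest eigenvalue
        return lmin > thresh

    vol = np.random.rand(8, 32, 32)
    print(int(st_interest_points(vol, 0.01).sum()), "space-time interest points")

    # Toy regression: interest-point counts vs. annotated people counts.
    pts = np.array([120.0, 260.0, 400.0, 550.0])
    people = np.array([5.0, 11.0, 17.0, 24.0])
    a, b = np.polyfit(pts, people, 1)
    print(f"predicted people for 330 points: {a * 330 + b:.1f}")
    ```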

  19. Research and Technology Development for Construction of 3d Video Scenes

    NASA Astrophysics Data System (ADS)

    Khlebnikova, Tatyana A.

    2016-06-01

    For the last two decades, surface information in the form of conventional digital and analogue topographic maps has been supplemented by new digital geospatial products, also known as 3D models of real objects. It is shown that currently there are no defined standards for 3D scene construction technologies that could be used by Russian surveying and cartographic enterprises. Requirements on source data, their capture, and their transfer for creating 3D scenes have not been defined yet, and the accuracy of 3D video scenes used for measuring purposes is rarely addressed in publications. The practicability of developing, researching, and implementing a technology for constructing 3D video scenes is substantiated by the capability of 3D video scenes to expand the field of data analysis applications in environmental monitoring, urban planning, and managerial decision problems. A technology for constructing 3D video scenes that meets specified metric requirements is offered. Techniques and methodological background are recommended for this technology, which constructs 3D video scenes based on DTMs created from satellite and aerial survey data. The results of an accuracy estimation of 3D video scenes are presented.

  20. A novel scene-based non-uniformity correction method for SWIR push-broom hyperspectral sensors

    NASA Astrophysics Data System (ADS)

    Hu, Bin-Lin; Hao, Shi-Jing; Sun, De-Xin; Liu, Yin-Nian

    2017-09-01

    A novel scene-based non-uniformity correction (NUC) method for short-wavelength infrared (SWIR) push-broom hyperspectral sensors is proposed and evaluated. The method relies on the assumption that, for each band, ground objects with similar reflectance will form uniform regions once a sufficient number of scanning lines are acquired. The uniform regions are extracted automatically through a sorting algorithm and are used to compute the corresponding NUC coefficients. SWIR hyperspectral data from an airborne experiment are used to verify and evaluate the proposed method, and results show that stripes in the scenes are well corrected without any significant information loss, with non-uniformity below 0.5%. In addition, the proposed method is compared to two other standard methods in terms of adaptability to various scenes, non-uniformity, roughness, and spectral fidelity. The proposed method shows strong adaptability, high accuracy, and efficiency.
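
    A simplified moment-matching sketch conveys the scene-based NUC idea for one band: each image column of a push-broom sensor comes from one detector element, and the rows judged most uniform (here by low spatial variance, a stand-in for the paper's sorting step) supply per-detector gain and offset. Constants and the toy data are illustrative.

    ```python
    # Simplified scene-based NUC for one push-broom band: pick the most
    # uniform scan lines, then moment-match each column (detector) to the
    # global statistics of those lines. A stand-in for the paper's
    # sorting-based extraction, not its exact algorithm.
    import numpy as np

    def scene_nuc(band, n_uniform=50):
        """band: (lines, detectors) raw image for one spectral band."""
        row_var = band.var(axis=1)
        uniform = band[np.argsort(row_var)[:n_uniform]]   # most uniform lines
        mu = uniform.mean(axis=0)
        sigma = uniform.std(axis=0) + 1e-12
        gain = sigma.mean() / sigma                       # match column moments
        offset = mu.mean() - gain * mu
        return gain * band + offset

    # Toy band with fixed-pattern column stripes (gain and offset errors).
    rng = np.random.default_rng(0)
    scene = rng.uniform(100, 200, (500, 64))
    striped = scene * rng.normal(1, 0.05, 64) + rng.normal(0, 3, 64)
    corrected = scene_nuc(striped)
    print("column-mean spread before/after:",
          striped.mean(0).std().round(2), corrected.mean(0).std().round(2))
    ```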

  1. Space-variant restoration of images degraded by camera motion blur.

    PubMed

    Sorel, Michal; Flusser, Jan

    2008-02-01

    We examine the problem of restoration from multiple images degraded by camera motion blur. We consider scenes with significant depth variations resulting in space-variant blur. The proposed algorithm can be applied if the camera moves along an arbitrary curve parallel to the image plane, without any rotations. The knowledge of camera trajectory and camera parameters is not necessary. At the input, the user selects a region where depth variations are negligible. The algorithm belongs to the group of variational methods that estimate simultaneously a sharp image and a depth map, based on the minimization of a cost functional. To initialize the minimization, it uses an auxiliary window-based depth estimation algorithm. Feasibility of the algorithm is demonstrated by three experiments with real images.
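
    One plausible form of such a cost functional, written here as an assumption consistent with the description above (the paper's exact data and regularization terms may differ):

    ```latex
    % Joint variational energy over the sharp image u and depth map d,
    % with blur operators H_k[d] determined by the camera path c_k and
    % the depth; alpha, beta are regularization weights (assumed form).
    \begin{equation*}
    E(u, d) \;=\; \sum_{k=1}^{K} \int_\Omega
      \bigl( H_k[d]\,u \,-\, I_k \bigr)^2 \, dx
    \;+\; \alpha \int_\Omega \lvert \nabla u \rvert \, dx
    \;+\; \beta \int_\Omega \lvert \nabla d \rvert \, dx ,
    \end{equation*}
    ```

    minimized by alternating updates of u and d, with d initialized from the auxiliary window-based depth estimate mentioned in the abstract.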

  2. Neural correlates of contextual cueing are modulated by explicit learning.

    PubMed

    Westerberg, Carmen E; Miller, Brennan B; Reber, Paul J; Cohen, Neal J; Paller, Ken A

    2011-10-01

    Contextual cueing refers to the facilitated ability to locate a particular visual element in a scene due to prior exposure to the same scene. This facilitation is thought to reflect implicit learning, as it typically occurs without the observer's knowledge that scenes repeat. Unlike most other implicit learning effects, contextual cueing can be impaired following damage to the medial temporal lobe. Here we investigated neural correlates of contextual cueing and explicit scene memory in two participant groups. Only one group was explicitly instructed about scene repetition. Participants viewed a sequence of complex scenes that depicted a landscape with five abstract geometric objects. Superimposed on each object was a letter T or L rotated left or right by 90°. Participants responded according to the target letter (T) orientation. Responses were highly accurate for all scenes. Response speeds were faster for repeated versus novel scenes. The magnitude of this contextual cueing did not differ between the two groups. Also, in both groups repeated scenes yielded reduced hemodynamic activation compared with novel scenes in several regions involved in visual perception and attention, and reductions in some of these areas were correlated with response-time facilitation. In the group given instructions about scene repetition, recognition memory for scenes was superior and was accompanied by medial temporal and more anterior activation. Thus, strategic factors can promote explicit memorization of visual scene information, which appears to engage additional neural processing beyond what is required for implicit learning of object configurations and target locations in a scene.

  3. Neural correlates of contextual cueing are modulated by explicit learning

    PubMed Central

    Westerberg, Carmen E.; Miller, Brennan B.; Reber, Paul J.; Cohen, Neal J.; Paller, Ken A.

    2011-01-01

    Contextual cueing refers to the facilitated ability to locate a particular visual element in a scene due to prior exposure to the same scene. This facilitation is thought to reflect implicit learning, as it typically occurs without the observer’s knowledge that scenes repeat. Unlike most other implicit learning effects, contextual cueing can be impaired following damage to the medial temporal lobe. Here we investigated neural correlates of contextual cueing and explicit scene memory in two participant groups. Only one group was explicitly instructed about scene repetition. Participants viewed a sequence of complex scenes that depicted a landscape with five abstract geometric objects. Superimposed on each object was a letter T or L rotated left or right by 90°. Participants responded according to the target letter (T) orientation. Responses were highly accurate for all scenes. Response speeds were faster for repeated versus novel scenes. The magnitude of this contextual cueing did not differ between the two groups. Also, in both groups repeated scenes yielded reduced hemodynamic activation compared with novel scenes in several regions involved in visual perception and attention, and reductions in some of these areas were correlated with response-time facilitation. In the group given instructions about scene repetition, recognition memory for scenes was superior and was accompanied by medial temporal and more anterior activation. Thus, strategic factors can promote explicit memorization of visual scene information, which appears to engage additional neural processing beyond what is required for implicit learning of object configurations and target locations in a scene. PMID:21889947

  4. Robust Motion Vision For A Vehicle Moving On A Plane

    NASA Astrophysics Data System (ADS)

    Moni, Shankar; Weldon, E. J.

    1987-05-01

    A vehicle equipped with a computer vision system moves on a plane. We show that, subject to certain constraints, the system can determine the motion of the vehicle (one rotational and two translational degrees of freedom) and the depth of the scene in front of the vehicle. The constraints include limits on the speed of the vehicle, presence of texture on the plane, and absence of pitch and roll in the vehicular motion. It is possible to decouple the problems of finding the vehicle's motion and the depth of the scene in front of the vehicle by using two rigidly connected cameras. One views a field with known depth (i.e., the ground plane) and estimates the motion parameters, and the other determines the depth map knowing the motion parameters. The motion is constrained to be planar to increase robustness. We use a least squares method of fitting the vehicle motion to observed brightness gradients. With this method, no correspondence between image points needs to be established, and information from the entire image is used in calculating motion. The algorithm performs very reliably on real image sequences and these results have been included. The results compare favourably to the performance of the algorithm of Negahdaripour and Horn [2], where six degrees of freedom are assumed.
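
    The correspondence-free least-squares idea can be sketched directly. Assuming known depth Z (the ground-plane case) and planar motion restricted to translation (Tx, Tz) and yaw rate w, the brightness-constancy residual Ix·u + Iy·v + It is linear in the three motion parameters, so every pixel contributes one equation. This is a hedged reconstruction of the general technique, not the authors' exact formulation.

    ```python
    # Direct least-squares planar motion from image gradients, assuming
    # known depth Z. Uses the Longuet-Higgins motion field restricted to
    # (Tx, Tz, yaw w); constants and synthetic data are illustrative.
    import numpy as np

    def planar_motion(Ix, Iy, It, Z, f=300.0):
        """Solve Ix*u + Iy*v + It = 0 in least squares for (Tx, Tz, w)."""
        h, w = Ix.shape
        y, x = np.mgrid[0:h, 0:w].astype(float)
        x -= w / 2.0
        y -= h / 2.0
        # u = (x*Tz - f*Tx)/Z - w*(f + x**2/f);  v = y*Tz/Z - w*x*y/f
        A = np.stack([(-f * Ix / Z).ravel(),
                      ((x * Ix + y * Iy) / Z).ravel(),
                      (-Ix * (f + x**2 / f) - Iy * x * y / f).ravel()], axis=1)
        sol, *_ = np.linalg.lstsq(A, -It.ravel(), rcond=None)
        return sol                              # (Tx, Tz, w)

    # Toy check: build It from the exact flow model and recover the motion.
    rng = np.random.default_rng(1)
    Ix, Iy = rng.normal(size=(2, 48, 64))
    Z = np.full((48, 64), 5.0)
    f = 300.0
    y, x = np.mgrid[0:48, 0:64].astype(float)
    x -= 32.0
    y -= 24.0
    Tx, Tz, wy = 0.2, 1.0, 0.01
    u = (x * Tz - f * Tx) / Z - wy * (f + x**2 / f)
    v = y * Tz / Z - wy * (x * y / f)
    It = -(Ix * u + Iy * v)
    print(planar_motion(Ix, Iy, It, Z, f))      # ≈ [0.2, 1.0, 0.01]
    ```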

  5. Moving object detection using dynamic motion modelling from UAV aerial images.

    PubMed

    Saif, A F M Saifuddin; Prabuwono, Anton Satria; Mahayuddin, Zainal Rasyid

    2014-01-01

    Motion-analysis-based moving object detection from UAV aerial images is still an unsolved issue due to the lack of proper motion estimation. Existing moving object detection approaches for UAV aerial images do not use motion-based pixel intensity measurements to detect moving objects robustly. Moreover, current research on moving object detection from UAV aerial images mostly depends on either a frame difference or a segmentation approach used separately. This research has two main purposes: first, to develop a new motion model called DMM (dynamic motion model), and second, to apply the proposed segmentation approach SUED (segmentation using edge-based dilation) with frame difference embedded together with the DMM model. The proposed DMM model provides effective search windows based on the highest pixel intensity, so that SUED segments only the specific area containing a moving object rather than searching the whole frame. At each stage of the proposed scheme, the fusion of DMM and SUED extracts moving objects faithfully. Experimental results demonstrate the validity of the proposed methodology.
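
    A loose illustration of the frame-difference-plus-edge-dilation idea follows; the window rule and all parameters are assumptions, not the authors' DMM or SUED definitions.

    ```python
    # Loose sketch: frame differencing proposes motion pixels, a search
    # window is grown around their centroid (a stand-in for the DMM
    # search window), and edge-guided dilation fills in the object
    # region (in the spirit of SUED). Parameters are illustrative.
    import numpy as np
    import cv2

    def detect_moving(prev, cur, win=40):
        diff = cv2.absdiff(cur, prev)
        _, motion = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
        ys, xs = np.nonzero(motion)
        if xs.size == 0:
            return None
        cy, cx = int(ys.mean()), int(xs.mean())        # window centre
        y0, x0 = max(cy - win, 0), max(cx - win, 0)
        window = cur[y0:cy + win, x0:cx + win]
        edges = cv2.Canny(window, 50, 150)             # edges inside window
        mask = cv2.dilate(edges, np.ones((5, 5), np.uint8))
        return (y0, x0), mask

    prev = np.zeros((120, 160), np.uint8)
    cur = prev.copy()
    cur[50:70, 80:110] = 200
    print(detect_moving(prev, cur)[0])                 # window origin
    ```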

  6. Demonstrating the Potential for Dynamic Auditory Stimulation to Contribute to Motion Sickness

    PubMed Central

    Keshavarz, Behrang; Hettinger, Lawrence J.; Kennedy, Robert S.; Campos, Jennifer L.

    2014-01-01

    Auditory cues can create the illusion of self-motion (vection) in the absence of visual or physical stimulation. The present study aimed to determine whether auditory cues alone can also elicit motion sickness and how auditory cues contribute to motion sickness when added to visual motion stimuli. Twenty participants were seated in front of a curved projection display and were exposed to a virtual scene that constantly rotated around the participant's vertical axis. The virtual scene contained either visual-only, auditory-only, or a combination of corresponding visual and auditory cues. All participants performed all three conditions in a counterbalanced order. Participants tilted their heads alternately towards the right or left shoulder in all conditions during stimulus exposure in order to create pseudo-Coriolis effects and to maximize the likelihood for motion sickness. Measurements of motion sickness (onset, severity), vection (latency, strength, duration), and postural steadiness (center of pressure) were recorded. Results showed that adding auditory cues to the visual stimuli did not, on average, affect motion sickness and postural steadiness, but it did reduce vection onset times and increased vection strength compared to pure visual or pure auditory stimulation. Eighteen of the 20 participants reported at least slight motion sickness in the two conditions including visual stimuli. More interestingly, six participants also reported slight motion sickness during pure auditory stimulation and two of the six participants stopped the pure auditory test session due to motion sickness. The present study is the first to demonstrate that motion sickness may be caused by pure auditory stimulation, which we refer to as “auditorily induced motion sickness”. PMID:24983752

  7. Graphics to H.264 video encoding for 3D scene representation and interaction on mobile devices using region of interest

    NASA Astrophysics Data System (ADS)

    Le, Minh Tuan; Nguyen, Congdu; Yoon, Dae-Il; Jung, Eun Ku; Jia, Jie; Kim, Hae-Kwang

    2007-12-01

    In this paper, we propose a method of 3D-graphics-to-video encoding and streaming, embedded into a remote interactive 3D visualization system, for rapidly presenting a 3D scene on mobile devices without having to download it from the server. In particular, a graphics-to-video framework is presented that increases the visual quality of regions of interest (ROI) of the video by allocating more bits to the ROI during H.264 video encoding. The ROI are identified by projecting 3D objects onto a 2D plane during rasterization. The system lets users navigate the 3D scene and interact with objects of interest to query their descriptions. We developed an adaptive media streaming server that can provide an adaptive video stream, in terms of object-based quality, to the client according to the user's preferences and the variation of network bandwidth. Results show that with ROI mode selection, the PSNR of the test samples changes only slightly while the visual quality of the objects increases evidently.
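
    The ROI-weighted bit allocation can be illustrated with a per-macroblock quantization-parameter (QP) map derived from an ROI mask; lower QP means more bits and higher quality. Wiring such offsets into a real H.264 encoder is encoder-specific and omitted here; the map itself conveys the idea.

    ```python
    # Illustrative per-macroblock QP map from an ROI mask (the mask would
    # come from projecting 3D objects to the image plane). Values and the
    # frame size are assumptions for illustration.
    import numpy as np

    def qp_map(roi_mask, base_qp=32, roi_drop=6, mb=16):
        """roi_mask: (H, W) bool image; returns per-16x16-macroblock QP."""
        h, w = roi_mask.shape
        qps = np.full((h // mb, w // mb), base_qp, int)
        for i in range(qps.shape[0]):
            for j in range(qps.shape[1]):
                block = roi_mask[i * mb:(i + 1) * mb, j * mb:(j + 1) * mb]
                if block.any():        # lower QP => more bits, higher quality
                    qps[i, j] = base_qp - roi_drop
        return qps

    roi = np.zeros((144, 176), bool)           # QCIF frame
    roi[48:96, 64:128] = True                  # projected object footprint
    print(qp_map(roi))
    ```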

  8. Projection of controlled repeatable real-time moving targets to test and evaluate motion imagery quality

    NASA Astrophysics Data System (ADS)

    Scopatz, Stephen D.; Mendez, Michael; Trent, Randall

    2015-05-01

    The projection of controlled moving targets is key to the quantitative testing of video capture and post-processing for Motion Imagery. This presentation discusses several implementations of target projectors that create moving or apparently moving targets whose motion is captured by the camera under test. The targets presented are broadband (UV-VIS-IR) and move in a predictable, repeatable, and programmable way; several short videos are included in the presentation. Among the technical approaches are targets that move independently in the camera's field of view, as well as targets that change size and shape. The development of a rotating IR and VIS 4-bar target projector with programmable rotational velocity and acceleration control for testing hyperspectral cameras is discussed. A related issue for motion imagery is evaluated by simulating a blinding flash, an impulse of broadband photons lasting fewer than 2 milliseconds, to assess the camera's reaction to a large, fast change in signal. A traditional approach of gimbal-mounting the camera in combination with the moving target projector is discussed as an alternative to high-priced flight simulators. Based on the use of the moving target projector, several standard tests are proposed that measure MTF (resolution), SNR, and minimum detectable signal at velocity. Several unique metrics are suggested for Motion Imagery, including Maximum Velocity Resolved (the greatest velocity that is accurately tracked by the camera system) and Missing Object Tolerance (a measurement of tracking ability when the target is obscured in the images). These metrics are applicable to UV-VIS-IR wavelengths and can be used to assist in camera and algorithm development, as well as to compare various systems by presenting identical scenes to the cameras in a repeatable way.

  9. A kind of graded sub-pixel motion estimation algorithm combining time-domain characteristics with frequency-domain phase correlation

    NASA Astrophysics Data System (ADS)

    Xie, Bing; Duan, Zhemin; Chen, Yu

    2017-11-01

    Scene-matching navigation can help a UAV achieve autonomous navigation and other missions. However, aerial multi-frame images captured by a UAV in a complex flight environment are easily affected by jitter, noise, and exposure, which lead to image blur, deformation, and other issues, and reduce the detection rate of targets in regions of interest. To address this problem, we propose a graded sub-pixel motion estimation algorithm combining time-domain characteristics with frequency-domain phase correlation. Experimental results demonstrate the validity and accuracy of the proposed algorithm.
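
    The frequency-domain stage can be sketched self-containedly: FFT phase correlation yields an integer shift, refined to sub-pixel precision here by a parabolic fit around the correlation peak. The parabolic refinement is one common choice; the paper's grading strategy and time-domain stage are not reproduced.

    ```python
    # Self-contained sketch of sub-pixel phase correlation: normalized
    # cross-power spectrum, inverse FFT, integer peak, then a 1-D
    # parabolic fit per axis for sub-pixel refinement.
    import numpy as np

    def phase_correlate(a, b):
        """Return the (row, col) shift that maps image b onto image a."""
        F = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
        corr = np.fft.ifft2(F / (np.abs(F) + 1e-12)).real
        py, px = np.unravel_index(np.argmax(corr), corr.shape)

        def subpix(c, p, n):               # parabolic refinement + wrap
            l, r = c[(p - 1) % n], c[(p + 1) % n]
            d = (l - r) / (2 * (l - 2 * c[p] + r) + 1e-12)
            return ((p + d + n / 2) % n) - n / 2   # signed shift

        return (subpix(corr[:, px], py, corr.shape[0]),
                subpix(corr[py, :], px, corr.shape[1]))

    # Toy usage: b is a circularly shifted by a known offset.
    a = np.random.rand(64, 64)
    b = np.roll(a, (3, -5), axis=(0, 1))
    print(phase_correlate(b, a))           # ≈ (3.0, -5.0)
    ```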

  10. 3D Traffic Scene Understanding From Movable Platforms.

    PubMed

    Geiger, Andreas; Lauer, Martin; Wojek, Christian; Stiller, Christoph; Urtasun, Raquel

    2014-05-01

    In this paper, we present a novel probabilistic generative model for multi-object traffic scene understanding from movable platforms which reasons jointly about the 3D scene layout as well as the location and orientation of objects in the scene. In particular, the scene topology, geometry, and traffic activities are inferred from short video sequences. Inspired by the impressive driving capabilities of humans, our model does not rely on GPS, lidar, or map knowledge. Instead, it takes advantage of a diverse set of visual cues in the form of vehicle tracklets, vanishing points, semantic scene labels, scene flow, and occupancy grids. For each of these cues, we propose likelihood functions that are integrated into a probabilistic generative model. We learn all model parameters from training data using contrastive divergence. Experiments conducted on videos of 113 representative intersections show that our approach successfully infers the correct layout in a variety of very challenging scenarios. To evaluate the importance of each feature cue, experiments using different feature combinations are conducted. Furthermore, we show how by employing context derived from the proposed method we are able to improve over the state-of-the-art in terms of object detection and object orientation estimation in challenging and cluttered urban environments.

  11. Event-Based Computation of Motion Flow on a Neuromorphic Analog Neural Platform

    PubMed Central

    Giulioni, Massimiliano; Lagorce, Xavier; Galluppi, Francesco; Benosman, Ryad B.

    2016-01-01

    Estimating the speed and direction of moving objects is a crucial component of agents behaving in a dynamic world. Biological organisms perform this task by means of the neural connections originating from their retinal ganglion cells. In artificial systems the optic flow is usually extracted by comparing activity of two or more frames captured with a vision sensor. Designing artificial motion flow detectors which are as fast, robust, and efficient as the ones found in biological systems is however a challenging task. Inspired by the architecture proposed by Barlow and Levick in 1965 to explain the spiking activity of the direction-selective ganglion cells in the rabbit's retina, we introduce an architecture for robust optical flow extraction with an analog neuromorphic multi-chip system. The task is performed by a feed-forward network of analog integrate-and-fire neurons whose inputs are provided by contrast-sensitive photoreceptors. Computation is supported by the precise time of spike emission, and the extraction of the optical flow is based on time lag in the activation of nearby retinal neurons. Mimicking ganglion cells our neuromorphic detectors encode the amplitude and the direction of the apparent visual motion in their output spiking pattern. Hereby we describe the architectural aspects, discuss its latency, scalability, and robustness properties and demonstrate that a network of mismatched delicate analog elements can reliably extract the optical flow from a simple visual scene. This work shows how precise time of spike emission used as a computational basis, biological inspiration, and neuromorphic systems can be used together for solving specific tasks. PMID:26909015
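
    The timing principle (though not the analog hardware) can be sketched numerically: when a moving edge triggers neighbouring pixels at slightly different times, the gradient g of the latest-timestamp surface t(x, y) points along the motion, and velocity is its "inverse", v = g/|g|². The clean synthetic timestamp map below is an illustrative assumption.

    ```python
    # Sketch of time-lag-based flow: velocity from the gradient of the
    # per-pixel latest-event-timestamp surface. Synthetic, noise-free
    # data for illustration; real event streams need smoothing/fitting.
    import numpy as np

    def flow_from_timestamps(ts):
        """ts: (H, W) map of each pixel's most recent event time (s)."""
        gy, gx = np.gradient(ts)               # seconds per pixel
        mag2 = gx**2 + gy**2 + 1e-12
        return gx / mag2, gy / mag2            # pixels per second

    # Toy edge sweeping rightward at 50 px/s: pixel x fires at time x/50.
    H, W = 32, 32
    ts = np.tile(np.arange(W, dtype=float) / 50.0, (H, 1))
    vx, vy = flow_from_timestamps(ts)
    print(vx[16, 16], vy[16, 16])              # ≈ 50.0, 0.0
    ```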

  12. Event-Based Computation of Motion Flow on a Neuromorphic Analog Neural Platform.

    PubMed

    Giulioni, Massimiliano; Lagorce, Xavier; Galluppi, Francesco; Benosman, Ryad B

    2016-01-01

    Estimating the speed and direction of moving objects is a crucial component of agents behaving in a dynamic world. Biological organisms perform this task by means of the neural connections originating from their retinal ganglion cells. In artificial systems the optic flow is usually extracted by comparing activity of two or more frames captured with a vision sensor. Designing artificial motion flow detectors which are as fast, robust, and efficient as the ones found in biological systems is however a challenging task. Inspired by the architecture proposed by Barlow and Levick in 1965 to explain the spiking activity of the direction-selective ganglion cells in the rabbit's retina, we introduce an architecture for robust optical flow extraction with an analog neuromorphic multi-chip system. The task is performed by a feed-forward network of analog integrate-and-fire neurons whose inputs are provided by contrast-sensitive photoreceptors. Computation is supported by the precise time of spike emission, and the extraction of the optical flow is based on time lag in the activation of nearby retinal neurons. Mimicking ganglion cells our neuromorphic detectors encode the amplitude and the direction of the apparent visual motion in their output spiking pattern. Hereby we describe the architectural aspects, discuss its latency, scalability, and robustness properties and demonstrate that a network of mismatched delicate analog elements can reliably extract the optical flow from a simple visual scene. This work shows how precise time of spike emission used as a computational basis, biological inspiration, and neuromorphic systems can be used together for solving specific tasks.

  13. Realism and Perspectivism: a Reevaluation of Rival Theories of Spatial Vision.

    NASA Astrophysics Data System (ADS)

    Thro, E. Broydrick

    1990-01-01

    My study reevaluates two theories of human space perception, a trigonometric surveying theory I call perspectivism and a "scene recognition" theory I call realism. Realists believe that retinal image geometry can supply no unambiguous information about an object's size and distance--and that, as a result, viewers can locate objects in space only by making discretionary interpretations based on familiar experience of object types. Perspectivists, in contrast, think viewers can disambiguate object sizes/distances on the basis of retinal image information alone. More specifically, they believe the eye responds to perspective image geometry with an automatic trigonometric calculation that not only fixes the directions and shapes, but also roughly fixes the sizes and distances of scene elements in space. Today this surveyor theory has been largely superseded by the realist approach, because most vision scientists believe retinal image geometry is ambiguous about the scale of space. However, I show that there is a considerable body of neglected evidence, both past and present, tending to call this scale ambiguity claim into question. I maintain that this evidence against scale ambiguity could hardly be more important, if one considers its subversive implications for the scene recognition theory that is not only today's reigning approach to spatial vision, but also the foundation for computer scientists' efforts to create space-perceiving robots. If viewers were deemed to be capable of automatic surveying calculations, the discretionary scene recognition theory would lose its main justification. Clearly, it would be difficult for realists to maintain that we viewers rely on scene recognition for space perception in spite of our ability to survey. And in reality, as I show, the surveyor theory does a much better job of describing the everyday space we viewers actually see--a space featuring stable, unambiguous relationships among scene elements, and a single horizon and vanishing point for (meter-scale) receding objects. In addition, I argue, the surveyor theory raises fewer philosophical difficulties, because it is more in harmony with our everyday concepts of material objects, human agency and the self.

  14. Neural Correlates of Fixation Duration during Real-world Scene Viewing: Evidence from Fixation-related (FIRE) fMRI.

    PubMed

    Henderson, John M; Choi, Wonil

    2015-06-01

    During active scene perception, our eyes move from one location to another via saccadic eye movements, with the eyes fixating objects and scene elements for varying amounts of time. Much of the variability in fixation duration is accounted for by attentional, perceptual, and cognitive processes associated with scene analysis and comprehension. For this reason, current theories of active scene viewing attempt to account for the influence of attention and cognition on fixation duration. Yet almost nothing is known about the neurocognitive systems associated with variation in fixation duration during scene viewing. We addressed this topic using fixation-related fMRI, which involves coregistering high-resolution eye tracking and magnetic resonance scanning to conduct event-related fMRI analysis based on characteristics of eye movements. We observed that activation in visual and prefrontal executive control areas was positively correlated with fixation duration, whereas activation in ventral areas associated with scene encoding and medial superior frontal and paracentral regions associated with changing action plans was negatively correlated with fixation duration. The results suggest that fixation duration in scene viewing is controlled by cognitive processes associated with real-time scene analysis interacting with motor planning, consistent with current computational models of active vision for scene perception.

  15. Misperceptions in the Trajectories of Objects undergoing Curvilinear Motion

    PubMed Central

    Yilmaz, Ozgur; Tripathy, Srimant P.; Ogmen, Haluk

    2012-01-01

    Trajectory perception is crucial in scene understanding and action. A variety of trajectory misperceptions have been reported in the literature. In this study, we quantify earlier observations that reported distortions in the perceived shape of bilinear trajectories and in the perceived positions of their deviation. Our results show that bilinear trajectories with deviation angles smaller than 90° are perceived as smoothed, while those with deviation angles larger than 90° are perceived as sharpened. The sharpening effect is weaker in magnitude than the smoothing effect. We also found a correlation between the distortion of perceived trajectories and the perceived shift of their deviation point. Finally, using a dual-task paradigm, we found that reducing the attentional resources allocated to the moving target increases the perceived shift of the deviation point of the trajectory. We interpret these results in the context of interactions between the motion and position systems. PMID:22615775

  16. Scene perception in posterior cortical atrophy: categorization, description and fixation patterns

    PubMed Central

    Shakespeare, Timothy J.; Yong, Keir X. X.; Frost, Chris; Kim, Lois G.; Warrington, Elizabeth K.; Crutch, Sebastian J.

    2013-01-01

    Partial or complete Balint's syndrome is a core feature of the clinico-radiological syndrome of posterior cortical atrophy (PCA), in which individuals experience a progressive deterioration of cortical vision. Although multi-object arrays are frequently used to detect simultanagnosia in the clinical assessment and diagnosis of PCA, to date there have been no group studies of scene perception in patients with the syndrome. The current study involved three linked experiments conducted in PCA patients and healthy controls. Experiment 1 evaluated the accuracy and latency of complex scene perception relative to individual faces and objects (color and grayscale) using a categorization paradigm. PCA patients were both less accurate (faces < scenes < objects) and slower (scenes < objects < faces) than controls on all categories, with performance strongly associated with their level of basic visual processing impairment; patients also showed a small advantage for color over grayscale stimuli. Experiment 2 involved free description of real world scenes. PCA patients generated fewer features and more misperceptions than controls, though perceptual errors were always consistent with the patient's global understanding of the scene (whether correct or not). Experiment 3 used eye tracking measures to compare patient and control eye movements over initial and subsequent fixations of scenes. Patients' fixation patterns were significantly different to those of young and age-matched controls, with comparable group differences for both initial and subsequent fixations. Overall, these findings describe the variability in everyday scene perception exhibited by individuals with PCA, and indicate the importance of exposure duration in the perception of complex scenes. PMID:24106469

  17. Autonomous landing and ingress of micro-air-vehicles in urban environments based on monocular vision

    NASA Astrophysics Data System (ADS)

    Brockers, Roland; Bouffard, Patrick; Ma, Jeremy; Matthies, Larry; Tomlin, Claire

    2011-06-01

    Unmanned micro air vehicles (MAVs) will play an important role in future reconnaissance and search and rescue applications. In order to conduct persistent surveillance and to conserve energy, MAVs need the ability to land, and they need the ability to enter (ingress) buildings and other structures to conduct reconnaissance. To be safe and practical under a wide range of environmental conditions, landing and ingress maneuvers must be autonomous, using real-time, onboard sensor feedback. To address these key behaviors, we present a novel method for vision-based autonomous MAV landing and ingress using a single camera for two urban scenarios: landing on an elevated surface, representative of a rooftop, and ingress through a rectangular opening, representative of a door or window. Real-world scenarios will not include special navigation markers, so we rely on tracking arbitrary scene features; however, we do currently exploit planarity of the scene. Our vision system uses a planar homography decomposition to detect navigation targets and to produce approach waypoints as inputs to the vehicle control algorithm. Scene perception, planning, and control run onboard in real-time; at present we obtain aircraft position knowledge from an external motion capture system, but we expect to replace this in the near future with a fully self-contained, onboard, vision-aided state estimation algorithm. We demonstrate autonomous vision-based landing and ingress target detection with two different quadrotor MAV platforms. To our knowledge, this is the first demonstration of onboard, vision-based autonomous landing and ingress algorithms that do not use special purpose scene markers to identify the destination.
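
    The geometric core, estimating and decomposing a homography of the assumed-planar scene, can be sketched with OpenCV; the correspondences and intrinsics K below are synthetic assumptions, so this only illustrates the call sequence, not the authors' full pipeline.

    ```python
    # Sketch of planar homography estimation and decomposition into
    # candidate rotations, translations, and plane normals (OpenCV >= 3
    # provides decomposeHomographyMat). Synthetic data for illustration.
    import numpy as np
    import cv2

    K = np.array([[400.0, 0, 320],
                  [0, 400.0, 240],
                  [0, 0, 1]])                    # assumed camera intrinsics

    # Synthetic correspondences on a plane (e.g. rooftop or window frame).
    pts1 = np.float32([[100, 100], [500, 120], [480, 400], [120, 380]])
    pts2 = pts1 + np.float32([15, -8])           # small camera motion

    H, _ = cv2.findHomography(pts1, pts2, cv2.RANSAC)
    n, Rs, ts, normals = cv2.decomposeHomographyMat(H, K)
    print(f"{n} candidate (R, t, n) solutions; t[0] =", ts[0].ravel())
    ```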

  18. Autonomous Landing and Ingress of Micro-Air-Vehicles in Urban Environments Based on Monocular Vision

    NASA Technical Reports Server (NTRS)

    Brockers, Roland; Bouffard, Patrick; Ma, Jeremy; Matthies, Larry; Tomlin, Claire

    2011-01-01

    Unmanned micro air vehicles (MAVs) will play an important role in future reconnaissance and search and rescue applications. In order to conduct persistent surveillance and to conserve energy, MAVs need the ability to land, and they need the ability to enter (ingress) buildings and other structures to conduct reconnaissance. To be safe and practical under a wide range of environmental conditions, landing and ingress maneuvers must be autonomous, using real-time, onboard sensor feedback. To address these key behaviors, we present a novel method for vision-based autonomous MAV landing and ingress using a single camera for two urban scenarios: landing on an elevated surface, representative of a rooftop, and ingress through a rectangular opening, representative of a door or window. Real-world scenarios will not include special navigation markers, so we rely on tracking arbitrary scene features; however, we do currently exploit planarity of the scene. Our vision system uses a planar homography decomposition to detect navigation targets and to produce approach waypoints as inputs to the vehicle control algorithm. Scene perception, planning, and control run onboard in real-time; at present we obtain aircraft position knowledge from an external motion capture system, but we expect to replace this in the near future with a fully self-contained, onboard, vision-aided state estimation algorithm. We demonstrate autonomous vision-based landing and ingress target detection with two different quadrotor MAV platforms. To our knowledge, this is the first demonstration of onboard, vision-based autonomous landing and ingress algorithms that do not use special purpose scene markers to identify the destination.
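
    To make the planar-homography step described above concrete, the following is a minimal OpenCV sketch of recovering pose candidates relative to a scene plane from tracked feature points. The intrinsics K, the RANSAC threshold, and the function names are illustrative assumptions, not the authors' implementation; turning the chosen pose into approach waypoints is likewise application-specific.

      import cv2
      import numpy as np

      # Assumed pinhole intrinsics for illustration; real values come from
      # camera calibration, not from this sketch.
      K = np.array([[500.0, 0.0, 320.0],
                    [0.0, 500.0, 240.0],
                    [0.0, 0.0, 1.0]])

      def plane_relative_pose(pts_prev, pts_curr):
          """Estimate camera motion relative to a planar scene region from
          2D point tracks between two frames."""
          H, inliers = cv2.findHomography(pts_prev, pts_curr, cv2.RANSAC, 3.0)
          # Decomposition yields up to four {R, t, n} candidates; selecting
          # the physically valid one (e.g., positive depth of the plane) is
          # left to the application.
          n_sols, Rs, ts, normals = cv2.decomposeHomographyMat(H, K)
          return Rs, ts, normals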

  19. Optical system for object detection and delineation in space

    NASA Astrophysics Data System (ADS)

    Handelman, Amir; Shwartz, Shoam; Donitza, Liad; Chaplanov, Loran

    2018-01-01

    Object recognition and delineation is an important task in many environments, such as in crime scenes and operating rooms. Marking evidence or surgical tools and attracting the attention of the surrounding staff to the marked objects can affect people's lives. We present an optical system comprising a camera, computer, and small laser projector that can detect and delineate objects in the environment. To prove the optical system's concept, we show that it can operate in a hypothetical crime scene in which a pistol is present and automatically recognize and segment it by various computer-vision algorithms. Based on such segmentation, the laser projector illuminates the actual boundaries of the pistol and thus allows the persons in the scene to comfortably locate and measure the pistol without holding any intermediary device, such as an augmented reality handheld device, glasses, or screens. Using additional optical devices, such as a diffraction grating and a cylindrical lens, the pistol size can be estimated. The exact location of the pistol in space remains static, even after its removal. Our optical system can be fixed or dynamically moved, making it suitable for various applications that require marking of objects in space.

  20. An Indoor SLAM Method Based on Kinect and Multi-Feature Extended Information Filter

    NASA Astrophysics Data System (ADS)

    Chang, M.; Kang, Z.

    2017-09-01

    Based on the framework of ORB-SLAM, in this paper the transformation parameters between adjacent Kinect image frames are computed using ORB keypoints, from which the a-priori information matrix and information vector are calculated, realizing the motion update of the multi-feature extended information filter. From the point cloud formed from the depth image, the ICP algorithm is used to extract point features of the scene and build an observation model, while the a-posteriori information matrix and information vector are calculated, weakening the influence of error accumulation in the positioning process. Furthermore, this paper applies the ORB-SLAM framework to realize real-time autonomous positioning in unknown indoor environments. Finally, Lidar data of the scene were acquired in order to assess the positioning accuracy of the method put forward in this paper.
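
    The information-filter bookkeeping behind such a back end can be summarized compactly. Below is a minimal numpy sketch of generic extended-information-filter updates in the linear case, with the state held as an information vector xi and information matrix Omega; it illustrates the algebra only and is not the paper's multi-feature implementation.

      import numpy as np

      def eif_motion_update(xi, Omega, F, Q):
          """Prediction for x' = F x + w, w ~ N(0, Q): convert information
          form to moments, propagate, and convert back."""
          Sigma = np.linalg.inv(Omega)
          mu = Sigma @ xi
          Sigma_pred = F @ Sigma @ F.T + Q
          Omega_pred = np.linalg.inv(Sigma_pred)
          return Omega_pred @ (F @ mu), Omega_pred

      def eif_measurement_update(xi, Omega, z, H, R):
          """Update for z = H x + v, v ~ N(0, R): information adds linearly,
          which is what makes the information form attractive for SLAM."""
          Rinv = np.linalg.inv(R)
          return xi + H.T @ Rinv @ z, Omega + H.T @ Rinv @ H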

  1. Inferring segmented dense motion layers using 5D tensor voting.

    PubMed

    Min, Changki; Medioni, Gérard

    2008-09-01

    We present a novel local spatiotemporal approach to produce motion segmentation and dense temporal trajectories from an image sequence. A common representation of image sequences is a 3D spatiotemporal volume, (x,y,t), and its corresponding mathematical formalism is the fiber bundle. However, directly enforcing the spatiotemporal smoothness constraint is difficult in the fiber bundle representation. Thus, we convert the representation into a new 5D space (x,y,t,vx,vy) with an additional velocity domain, where each moving object produces a separate 3D smooth layer. The smoothness constraint is now enforced by extracting 3D layers using the tensor voting framework in a single step that solves both correspondence and segmentation simultaneously. Motion segmentation is achieved by identifying those layers, and the dense temporal trajectories are obtained by converting the layers back into the fiber bundle representation. We proceed to address three applications (tracking, mosaic, and 3D reconstruction) that are hard to solve from the video stream directly because of the segmentation and dense matching steps, but become straightforward with our framework. The approach does not make restrictive assumptions about the observed scene or camera motion and is therefore generally applicable. We present results on a number of data sets.

  2. Visual Object Recognition with 3D-Aware Features in KITTI Urban Scenes

    PubMed Central

    Yebes, J. Javier; Bergasa, Luis M.; García-Garrido, Miguel Ángel

    2015-01-01

    Driver assistance systems and autonomous robotics rely on the deployment of several sensors for environment perception. Compared to LiDAR systems, the inexpensive vision sensors can capture the 3D scene as perceived by a driver in terms of appearance and depth cues. Indeed, providing 3D image understanding capabilities to vehicles is an essential target in order to infer scene semantics in urban environments. One of the challenges that arises from the navigation task in naturalistic urban scenarios is the detection of road participants (e.g., cyclists, pedestrians and vehicles). In this regard, this paper tackles the detection and orientation estimation of cars, pedestrians and cyclists, employing the challenging and naturalistic KITTI images. This work proposes 3D-aware features computed from stereo color images in order to capture the appearance and depth peculiarities of the objects in road scenes. The successful part-based object detector, known as DPM, is extended to learn richer models from the 2.5D data (color and disparity), while also carrying out a detailed analysis of the training pipeline. A large set of experiments evaluate the proposals, and the best performing approach is ranked on the KITTI website. Indeed, this is the first work that reports results with stereo data for the KITTI object challenge, achieving increased detection ratios for the classes car and cyclist compared to a baseline DPM. PMID:25903553

  3. Visual Object Recognition with 3D-Aware Features in KITTI Urban Scenes.

    PubMed

    Yebes, J Javier; Bergasa, Luis M; García-Garrido, Miguel Ángel

    2015-04-20

    Driver assistance systems and autonomous robotics rely on the deployment of several sensors for environment perception. Compared to LiDAR systems, the inexpensive vision sensors can capture the 3D scene as perceived by a driver in terms of appearance and depth cues. Indeed, providing 3D image understanding capabilities to vehicles is an essential target in order to infer scene semantics in urban environments. One of the challenges that arises from the navigation task in naturalistic urban scenarios is the detection of road participants (e.g., cyclists, pedestrians and vehicles). In this regard, this paper tackles the detection and orientation estimation of cars, pedestrians and cyclists, employing the challenging and naturalistic KITTI images. This work proposes 3D-aware features computed from stereo color images in order to capture the appearance and depth peculiarities of the objects in road scenes. The successful part-based object detector, known as DPM, is extended to learn richer models from the 2.5D data (color and disparity), while also carrying out a detailed analysis of the training pipeline. A large set of experiments evaluate the proposals, and the best performing approach is ranked on the KITTI website. Indeed, this is the first work that reports results with stereo data for the KITTI object challenge, achieving increased detection ratios for the classes car and cyclist compared to a baseline DPM.
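
    The 2.5D input (color plus disparity) used by the extended detector can be produced with standard stereo matching. A hedged OpenCV sketch follows, with illustrative SGBM parameters rather than the paper's configuration:

      import cv2

      def disparity_channel(left_bgr, right_bgr):
          """Dense disparity map to pair with color channels as 2.5D data."""
          gl = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
          gr = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)
          sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                       blockSize=5, P1=8 * 5 * 5, P2=32 * 5 * 5)
          # OpenCV returns fixed-point disparities scaled by 16.
          return sgbm.compute(gl, gr).astype("float32") / 16.0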

  4. CHAMP: a locally adaptive unmixing-based hyperspectral anomaly detection algorithm

    NASA Astrophysics Data System (ADS)

    Crist, Eric P.; Thelen, Brian J.; Carrara, David A.

    1998-10-01

    Anomaly detection offers a means by which to identify potentially important objects in a scene without prior knowledge of their spectral signatures. As such, this approach is less sensitive to variations in target class composition, atmospheric and illumination conditions, and sensor gain settings than would be a spectral matched filter or similar algorithm. The best existing anomaly detectors generally fall into one of two categories: those based on local Gaussian statistics, and those based on linear mixing models. Unmixing-based approaches better represent the real distribution of data in a scene, but are typically derived and applied on a global or scene-wide basis. Locally adaptive approaches allow detection of more subtle anomalies by accommodating the spatial non-homogeneity of background classes in a typical scene, but provide a poorer representation of the true underlying background distribution. The CHAMP algorithm combines the best attributes of both approaches, applying a linear-mixing model approach in a spatially adaptive manner. The algorithm itself, and test results on simulated and actual hyperspectral image data, are presented in this paper.
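
    For contrast with the unmixing approach, the local-Gaussian family of detectors the abstract mentions can be sketched in a few lines. This is a plain sliding-window RX score (illustrative and unoptimized), not the CHAMP algorithm itself:

      import numpy as np

      def local_rx(cube, win=15, eps=1e-6):
          """Locally adaptive RX anomaly score for an (rows, cols, bands)
          hyperspectral cube: Mahalanobis distance of each pixel from the
          statistics of its surrounding window."""
          rows, cols, bands = cube.shape
          half = win // 2
          scores = np.zeros((rows, cols))
          for r in range(half, rows - half):
              for c in range(half, cols - half):
                  block = cube[r-half:r+half+1, c-half:c+half+1].reshape(-1, bands)
                  mu = block.mean(axis=0)
                  cov = np.cov(block, rowvar=False) + eps * np.eye(bands)
                  d = cube[r, c] - mu
                  scores[r, c] = d @ np.linalg.solve(cov, d)
          return scores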

  5. An Analysis of the High Frequency Vibrations in Early Thematic Mapper Scenes

    NASA Technical Reports Server (NTRS)

    Kogut, J.; Larduinat, E.

    1985-01-01

    The motion of the mirrors in the thematic mapper (TM) and multispectral scanner (MSS) instruments, and the motion of other devices, such as the TDRSS antenna drive and solar array drives onboard LANDSAT-4, cause vibrations to propagate through the spacecraft. These vibrations, as well as nonlinearities in the scanning motion of the TM mirror, can cause the TM detectors to point away from their nominal positions. Two computer programs, JITTER and SCDFT, were developed as part of the LANDSAT-D Assessment System (LAS), Products and Procedures Analysis (PAPA) program to evaluate the potential effect of high frequency vibrations on the final TM image. The maximum overlap and underlap which were observed for early TM scenes are well within specifications for the ground processing system. The cross scan and along scan high frequency vibrations are also within the specifications cited for the flight system.

  6. Only "efficient" emotional stimuli affect the content of working memory during free-recollection from natural scenes.

    PubMed

    Buttafuoco, Arianna; Pedale, Tiziana; Buchanan, Tony W; Santangelo, Valerio

    2018-02-01

    Emotional events are thought to have privileged access to attention and memory, consuming resources needed to encode competing emotionally neutral stimuli. However, it is not clear whether this detrimental effect is automatic or depends on the successful maintenance of the specific emotional object within working memory. Here, participants viewed everyday scenes including an emotional object among other neutral objects followed by a free-recollection task. Results showed that emotional objects-irrespective of their perceptual saliency-were recollected more often than neutral objects. The probability of being recollected increased as a function of the arousal of the emotional objects, specifically for negative objects. Successful recollection of emotional objects (positive or negative) from a scene reduced the overall number of recollected neutral objects from the same scene. This indicates that only emotional stimuli that are efficient in grabbing (and then consuming) available attentional resources play a crucial role during the encoding of competing information, with a subsequent bias in the recollection of neutral representations.

  7. Spatial attention is attracted in a sustained fashion toward singular points in the optic flow.

    PubMed

    Wang, Shuo; Fukuchi, Masaki; Koch, Christof; Tsuchiya, Naotsugu

    2012-01-01

    While a single approaching object is known to attract spatial attention, it is unknown how attention is directed when the background looms towards the observer as s/he moves forward in a quasi-stationary environment. In Experiment 1, we used a cued speeded discrimination task to quantify where and how spatial attention is directed towards the target superimposed onto a cloud of moving dots. We found that when the motion was expansive, attention was attracted towards the singular point of the optic flow (the focus of expansion, FOE) in a sustained fashion. The effects were less pronounced when the motion was contractive. The more ecologically valid the motion features became (e.g., temporal expansion of each dot, spatial depth structure implied by distribution of the size of the dots), the stronger the attentional effects. Further, the attentional effects were sustained over 1000 ms. Experiment 2 quantified these attentional effects using a change detection paradigm by zooming into or out of photographs of natural scenes. Spatial attention was attracted in a sustained manner such that change detection was facilitated or delayed depending on the location of the FOE only when the motion was expansive. Our results suggest that focal attention is strongly attracted towards singular points that signal the direction of forward ego-motion.

  8. Spatial Attention Is Attracted in a Sustained Fashion toward Singular Points in the Optic Flow

    PubMed Central

    Wang, Shuo; Fukuchi, Masaki; Koch, Christof; Tsuchiya, Naotsugu

    2012-01-01

    While a single approaching object is known to attract spatial attention, it is unknown how attention is directed when the background looms towards the observer as s/he moves forward in a quasi-stationary environment. In Experiment 1, we used a cued speeded discrimination task to quantify where and how spatial attention is directed towards the target superimposed onto a cloud of moving dots. We found that when the motion was expansive, attention was attracted towards the singular point of the optic flow (the focus of expansion, FOE) in a sustained fashion. The effects were less pronounced when the motion was contractive. The more ecologically valid the motion features became (e.g., temporal expansion of each dot, spatial depth structure implied by distribution of the size of the dots), the stronger the attentional effects. Further, the attentional effects were sustained over 1000 ms. Experiment 2 quantified these attentional effects using a change detection paradigm by zooming into or out of photographs of natural scenes. Spatial attention was attracted in a sustained manner such that change detection was facilitated or delayed depending on the location of the FOE only when the motion was expansive. Our results suggest that focal attention is strongly attracted towards singular points that signal the direction of forward ego-motion. PMID:22905096
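
    The singular point (FOE) that anchors these attentional effects can be estimated from a flow field by least squares, since each expansive flow vector is ideally collinear with the line from the FOE to its point. A minimal numpy sketch of that geometry (an illustration, not the stimulus-generation code used in the study):

      import numpy as np

      def estimate_foe(points, flows):
          """Least-squares focus of expansion from Nx2 point positions and
          their Nx2 flow vectors: each vector gives one collinearity
          constraint v x (p - foe) = 0, linear in the FOE coordinates."""
          x, y = points[:, 0], points[:, 1]
          vx, vy = flows[:, 0], flows[:, 1]
          A = np.stack([-vy, vx], axis=1)        # coefficients for (x0, y0)
          b = vx * y - vy * x
          foe, *_ = np.linalg.lstsq(A, b, rcond=None)
          return foe                             # (x0, y0) in image coordinates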

  9. Ground-plane influences on size estimation in early visual processing.

    PubMed

    Champion, Rebecca A; Warren, Paul A

    2010-07-21

    Ground-planes have an important influence on the perception of 3D space (Gibson, 1950) and it has been shown that the assumption that a ground-plane is present in the scene plays a role in the perception of object distance (Bruno & Cutting, 1988). Here, we investigate whether this influence is exerted at an early stage of processing, to affect the rapid estimation of 3D size. Participants performed a visual search task in which they searched for a target object that was larger or smaller than distracter objects. Objects were presented against a background that contained either a frontoparallel or slanted 3D surface, defined by texture gradient cues. We measured the effect on search performance of target location within the scene (near vs. far) and how this was influenced by scene orientation (which, e.g., might be consistent with a ground or ceiling plane, etc.). In addition, we investigated how scene orientation interacted with texture gradient information (indicating surface slant), to determine how these separate cues to scene layout were combined. We found that the difference in target detection performance between targets at the front and rear of the simulated scene was maximal when the scene was consistent with a ground-plane - consistent with the use of an elevation cue to object distance. In addition, we found a significant increase in the size of this effect when texture gradient information (indicating surface slant) was present, but no interaction between texture gradient and scene orientation information. We conclude that scene orientation plays an important role in the estimation of 3D size at an early stage of processing, and suggest that elevation information is linearly combined with texture gradient information for the rapid estimation of 3D size. Copyright 2010 Elsevier Ltd. All rights reserved.

  10. Guidance of visual attention by semantic information in real-world scenes

    PubMed Central

    Wu, Chia-Chien; Wick, Farahnaz Ahmed; Pomplun, Marc

    2014-01-01

    Recent research on attentional guidance in real-world scenes has focused on object recognition within the context of a scene. This approach has been valuable for determining some factors that drive the allocation of visual attention and determine visual selection. This article provides a review of experimental work on how different components of context, especially semantic information, affect attentional deployment. We review work from the areas of object recognition, scene perception, and visual search, highlighting recent studies examining semantic structure in real-world scenes. A better understanding on how humans parse scene representations will not only improve current models of visual attention but also advance next-generation computer vision systems and human-computer interfaces. PMID:24567724

  11. Sharp-Wave Ripples in Primates Are Enhanced near Remembered Visual Objects.

    PubMed

    Leonard, Timothy K; Hoffman, Kari L

    2017-01-23

    The hippocampus plays an important role in memory for events that are distinct in space and time. One of the strongest, most synchronous neural signals produced by the hippocampus is the sharp-wave ripple (SWR), observed in a variety of mammalian species during offline behaviors, such as slow-wave sleep [1-3] and quiescent waking and pauses in exploration [4-8], leading to long-standing and widespread theories of its contribution to plasticity and memory during these inactive or immobile states [9-14]. Indeed, during sleep and waking inactivity, hippocampal SWRs in rodents appear to support spatial long-term and working memory [4, 15-23], but so far, they have not been linked to memory in primates. More recently, SWRs have been observed during active, visual scene exploration in macaques [24], opening up the possibility that these active-state ripples in the primate hippocampus are linked to memory for objects embedded in scenes. By measuring hippocampal SWRs in macaques during search for scene-contextualized objects, we found that SWR rate increased with repeated presentations. Furthermore, gaze during SWRs was more likely to be near the target object on repeated than on novel presentations, even after accounting for overall differences in gaze location with scene repetition. This proximity bias with repetition occurred near the time of target object detection for remembered targets. The increase in ripple likelihood near remembered visual objects suggests a link between ripples and memory in primates; specifically, SWRs may reflect part of a mechanism supporting the guidance of search based on past experience. Copyright © 2017 Elsevier Ltd. All rights reserved.
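
    SWR detection in such studies typically follows a band-pass-and-envelope recipe. A hedged scipy sketch with conventional band and threshold choices, not the study's exact parameters:

      import numpy as np
      from scipy.signal import butter, filtfilt, hilbert

      def detect_ripples(lfp, fs, band=(100.0, 250.0), n_sd=3.0):
          """Mark putative ripple samples: band-pass the LFP in the ripple
          band, take the Hilbert envelope, and threshold it at
          mean + n_sd standard deviations."""
          b, a = butter(3, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
          env = np.abs(hilbert(filtfilt(b, a, lfp)))
          thresh = env.mean() + n_sd * env.std()
          return env > thresh   # boolean mask; event grouping is separate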

  12. Positive emotion can protect against source memory impairment.

    PubMed

    MacKenzie, Graham; Powell, Tim F; Donaldson, David I

    2015-01-01

    Despite widespread belief that memory is enhanced by emotion, evidence also suggests that emotion can impair memory. Here we test predictions inspired by object-based binding theory, which states that memory enhancement or impairment depends on the nature of the information to be retrieved. We investigated emotional memory in the context of source retrieval, using images of scenes that were negative, neutral or positive in valence. At study each scene was paired with a colour and during retrieval participants reported the source colour for recognised scenes. Critically, we isolated effects of valence by equating stimulus arousal across conditions. In Experiment 1 colour borders surrounded scenes at study: memory impairment was found for both negative and positive scenes. Experiment 2 used colours superimposed over scenes at study: valence affected source retrieval, with memory impairment for negative scenes only. These findings challenge current theories of emotional memory by showing that emotion can impair memory for both intrinsic and extrinsic source information, even when arousal is equated between emotional and neutral stimuli, and by dissociating the effects of positive and negative emotion on episodic memory retrieval.

  13. Significance of perceptually relevant image decolorization for scene classification

    NASA Astrophysics Data System (ADS)

    Viswanathan, Sowmya; Divakaran, Govind; Soman, Kutti Padanyl

    2017-11-01

    Color images contain luminance and chrominance components representing the intensity and color information, respectively. The objective of this paper is to show the significance of incorporating chrominance information into the task of scene classification. An improved color-to-grayscale image conversion algorithm that effectively incorporates chrominance information is proposed using the color-to-gray structure similarity index and singular value decomposition to improve the perceptual quality of the converted grayscale images. The experimental results based on an image quality assessment for image decolorization and its success rate (using the Cadik and COLOR250 datasets) show that the proposed image decolorization technique performs better than eight existing benchmark algorithms for image decolorization. In the second part of the paper, the effectiveness of incorporating the chrominance component for scene classification tasks is demonstrated using a deep belief network-based image classification system developed using dense scale-invariant feature transforms. The amount of chrominance information incorporated by the proposed image decolorization technique is confirmed by the improvement in overall scene classification accuracy. Moreover, the overall scene classification performance was further improved by combining the models obtained using the proposed method and conventional decolorization methods.

  14. Effect of predictive sign of acceleration on heart rate variability in passive translation situation: preliminary evidence using visual and vestibular stimuli in VR environment

    PubMed Central

    Watanabe, Hiroshi; Teramoto, Wataru; Umemura, Hiroyuki

    2007-01-01

    Objective: We studied the effects of the presentation of a visual sign that warned subjects of acceleration around the yaw and pitch axes in virtual reality (VR) on their heart rate variability. Methods: Synchronization of the immersive virtual reality equipment (CAVE) and motion base system generated a driving scene and provided subjects with dynamic and wide-ranging depth information and vestibular input. The heart rate variability of 21 subjects was measured while the subjects observed a simulated driving scene for 16 minutes under three different conditions. Results: When the predictive sign of the acceleration appeared 3500 ms before the acceleration, the index of the activity of the autonomic nervous system (low/high frequency ratio; LF/HF ratio) of subjects did not change much, whereas when no sign appeared the LF/HF ratio increased over the observation time. When the predictive sign of the acceleration appeared 750 ms before the acceleration, no systematic change occurred. Conclusion: The visual sign which informed subjects of the acceleration affected the activity of the autonomic nervous system when it appeared long enough before the acceleration. Also, our results showed the importance of the interval between the sign and the event and the relationship between the gradual representation of events and their quantity. PMID:17903267
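
    The LF/HF index used here is a standard spectral heart-rate-variability measure: resample the RR tachogram evenly, estimate its power spectral density, and integrate the low-frequency (0.04-0.15 Hz) and high-frequency (0.15-0.4 Hz) bands. A generic scipy sketch, not the study's pipeline:

      import numpy as np
      from scipy.interpolate import interp1d
      from scipy.signal import welch

      def lf_hf_ratio(rr_ms, fs=4.0):
          """LF/HF ratio from an array of RR intervals in milliseconds."""
          rr_ms = np.asarray(rr_ms, dtype=float)
          t = np.cumsum(rr_ms) / 1000.0                    # beat times (s)
          grid = np.arange(t[0], t[-1], 1.0 / fs)          # even resampling
          tach = interp1d(t, rr_ms, kind="cubic")(grid)
          f, pxx = welch(tach - tach.mean(), fs=fs,
                         nperseg=min(256, len(grid)))
          df = f[1] - f[0]
          lf = pxx[(f >= 0.04) & (f < 0.15)].sum() * df    # rectangle rule
          hf = pxx[(f >= 0.15) & (f < 0.40)].sum() * df
          return lf / hf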

  15. Advanced interactive display formats for terminal area traffic control

    NASA Technical Reports Server (NTRS)

    Grunwald, Arthur J.

    1995-01-01

    The basic design considerations for perspective Air Traffic Control displays are described. A software framework has been developed for manual viewing parameter setting (MVPS) in preparation for continued, ongoing developments on automated viewing parameter setting (AVPS) schemes. The MVPS system is based on indirect manipulation of the viewing parameters. Requests for changes in viewing parameter setting are entered manually by the operator by moving viewing parameter manipulation pointers on the screen. The motion of these pointers, which are an integral part of the 3-D scene, is limited to the boundaries of the screen. This arrangement has been chosen in order to preserve the correspondence between the new and the old viewing parameter setting, a feature which contributes to preventing spatial disorientation of the operator. For all viewing operations, e.g. rotation, translation, and ranging, the actual change is executed automatically by the system, through gradual transitions with an exponentially damped, sinusoidal velocity profile, in this work referred to as 'slewing' motions. The slewing functions, which eliminate discontinuities in the viewing parameter changes, are designed primarily for enhancing the operator's impression that he or she is dealing with an actually existing physical system, rather than an abstract computer-generated scene. Ongoing efforts deal with the development of automated viewing parameter setting schemes. These schemes employ an optimization strategy, aimed at identifying the best possible vantage point, from which the Air Traffic Control scene can be viewed, for a given traffic situation.
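
    The slewing transition can be written down directly. Below is a small numpy sketch of one plausible reading of the exponentially damped, sinusoidal velocity profile; the damping constant and the half-sine shape are assumptions, not the paper's stated parameters:

      import numpy as np

      def slew(p0, p1, n=200, damping=3.0):
          """Interpolate a viewing parameter from p0 to p1 along a smooth
          profile: velocity rises and falls sinusoidally while decaying
          exponentially, so the transition starts and ends at rest."""
          t = np.linspace(0.0, 1.0, n)
          v = np.exp(-damping * t) * np.sin(np.pi * t)
          s = np.cumsum(v)
          s /= s[-1]                      # normalize displacement to [0, 1]
          return p0 + (p1 - p0) * s       # sampled trajectory of the parameter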

  16. Cat and mouse search: the influence of scene and object analysis on eye movements when targets change locations during search.

    PubMed

    Hillstrom, Anne P; Segabinazi, Joice D; Godwin, Hayward J; Liversedge, Simon P; Benson, Valerie

    2017-02-19

    We explored the influence of early scene analysis and visible object characteristics on eye movements when searching for objects in photographs of scenes. On each trial, participants were shown sequentially either a scene preview or a uniform grey screen (250 ms), a visual mask, the name of the target and the scene, now including the target at a likely location. During the participant's first saccade during search, the target location was changed to: (i) a different likely location, (ii) an unlikely but possible location or (iii) a very implausible location. The results showed that the first saccade landed more often on the likely location in which the target re-appeared than on unlikely or implausible locations, and overall the first saccade landed nearer the first target location with a preview than without. Hence, rapid scene analysis influenced initial eye movement planning, but availability of the target rapidly modified that plan. After the target moved, it was found more quickly when it appeared in a likely location than when it appeared in an unlikely or implausible location. The findings show that both scene gist and object properties are extracted rapidly, and are used in conjunction to guide saccadic eye movements during visual search. This article is part of the themed issue 'Auditory and visual scene analysis'. © 2017 The Author(s).

  17. View-invariant object category learning, recognition, and search: how spatial and object attention are coordinated using surface-based attentional shrouds.

    PubMed

    Fazl, Arash; Grossberg, Stephen; Mingolla, Ennio

    2009-02-01

    How does the brain learn to recognize an object from multiple viewpoints while scanning a scene with eye movements? How does the brain avoid the problem of erroneously classifying parts of different objects together? How are attention and eye movements intelligently coordinated to facilitate object learning? A neural model provides a unified mechanistic explanation of how spatial and object attention work together to search a scene and learn what is in it. The ARTSCAN model predicts how an object's surface representation generates a form-fitting distribution of spatial attention, or "attentional shroud". All surface representations dynamically compete for spatial attention to form a shroud. The winning shroud persists during active scanning of the object. The shroud maintains sustained activity of an emerging view-invariant category representation while multiple view-specific category representations are learned and are linked through associative learning to the view-invariant object category. The shroud also helps to restrict scanning eye movements to salient features on the attended object. Object attention plays a role in controlling and stabilizing the learning of view-specific object categories. Spatial attention hereby coordinates the deployment of object attention during object category learning. Shroud collapse releases a reset signal that inhibits the active view-invariant category in the What cortical processing stream. Then a new shroud, corresponding to a different object, forms in the Where cortical processing stream, and search using attention shifts and eye movements continues to learn new objects throughout a scene. The model mechanistically clarifies basic properties of attention shifts (engage, move, disengage) and inhibition of return. It simulates human reaction time data about object-based spatial attention shifts, and learns with 98.1% accuracy and a compression of 430 on a letter database whose letters vary in size, position, and orientation. The model provides a powerful framework for unifying many data about spatial and object attention, and their interactions during perception, cognition, and action.

  18. Tracker Toolkit

    NASA Technical Reports Server (NTRS)

    Lewis, Steven J.; Palacios, David M.

    2013-01-01

    This software can track multiple moving objects within a video stream simultaneously, use visual features to aid in the tracking, and initiate tracks based on object detection in a subregion. A simple programmatic interface allows plugging into larger image chain modeling suites. It extracts unique visual features for aid in tracking and later analysis, and includes sub-functionality for extracting visual features about an object identified within an image frame. Tracker Toolkit utilizes a feature extraction algorithm to tag each object with metadata features about its size, shape, color, and movement. Its functionality is independent of the scale of objects within a scene. The only assumption made on the tracked objects is that they move. There are no constraints on size within the scene, shape, or type of movement. The Tracker Toolkit is also capable of following an arbitrary number of objects in the same scene, identifying and propagating the track of each object from frame to frame. Target objects may be specified for tracking beforehand, or may be dynamically discovered within a tripwire region. Initialization of the Tracker Toolkit algorithm includes two steps: initializing the data structures for tracked target objects, including targets preselected for tracking; and initializing the tripwire region. If no tripwire region is desired, this step is skipped. The tripwire region is an area within the frames that is always checked for new objects, and all new objects discovered within the region will be tracked until lost (by leaving the frame, stopping, or blending into the background).
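
    The frame-to-frame propagation plus tripwire initiation logic can be illustrated with a greedy nearest-centroid tracker. This is a behavioral sketch with assumed data structures, not the Tracker Toolkit code:

      import numpy as np

      def update_tracks(tracks, detections, tripwire, max_dist=50.0):
          """tracks: dict id -> (x, y); detections: list of (x, y);
          tripwire: (x0, y0, x1, y1) region where new tracks may start."""
          unmatched = list(detections)
          for tid, pos in list(tracks.items()):
              if not unmatched:
                  break
              d = [np.hypot(p[0] - pos[0], p[1] - pos[1]) for p in unmatched]
              j = int(np.argmin(d))
              if d[j] < max_dist:
                  tracks[tid] = unmatched.pop(j)   # propagate existing track
          x0, y0, x1, y1 = tripwire
          for p in unmatched:                      # initiate only inside tripwire
              if x0 <= p[0] <= x1 and y0 <= p[1] <= y1:
                  tracks[max(tracks, default=0) + 1] = p
          return tracks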

  19. A Context-Aware-Based Audio Guidance System for Blind People Using a Multimodal Profile Model

    PubMed Central

    Lin, Qing; Han, Youngjoon

    2014-01-01

    A wearable guidance system is designed to provide context-dependent guidance messages to blind people while they traverse local pathways. The system is composed of three parts: moving scene analysis, walking context estimation and audio message delivery. The combination of a downward-pointing laser scanner and a camera is used to solve the challenging problem of moving scene analysis. By integrating laser data profiles and image edge profiles, a multimodal profile model is constructed to estimate jointly the ground plane, object locations and object types, by using a Bayesian network. The outputs of the moving scene analysis are further employed to estimate the walking context, which is defined as a fuzzy safety level that is inferred through a fuzzy logic model. Depending on the estimated walking context, the audio messages that best suit the current context are delivered to the user in a flexible manner. The proposed system is tested under various local pathway scenes, and the results confirm its efficiency in assisting blind people to attain autonomous mobility. PMID:25302812

  20. Ventral-stream-like shape representation: from pixel intensity values to trainable object-selective COSFIRE models

    PubMed Central

    Azzopardi, George; Petkov, Nicolai

    2014-01-01

    The remarkable abilities of the primate visual system have inspired the construction of computational models of some visual neurons. We propose a trainable hierarchical object recognition model, which we call S-COSFIRE (S stands for Shape and COSFIRE stands for Combination Of Shifted FIlter REsponses) and use it to localize and recognize objects of interest embedded in complex scenes. It is inspired by the visual processing in the ventral stream (V1/V2 → V4 → TEO). Recognition and localization of objects embedded in complex scenes is important for many computer vision applications. Most existing methods require prior segmentation of the objects from the background, which in turn requires recognition. An S-COSFIRE filter is automatically configured to be selective for an arrangement of contour-based features that belong to a prototype shape specified by an example. The configuration comprises selecting relevant vertex detectors and determining certain blur and shift parameters. The response is computed as the weighted geometric mean of the blurred and shifted responses of the selected vertex detectors. S-COSFIRE filters share similar properties with some neurons in inferotemporal cortex, which provided inspiration for this work. We demonstrate the effectiveness of S-COSFIRE filters in two applications: letter and keyword spotting in handwritten manuscripts and object spotting in complex scenes for the computer vision system of a domestic robot. S-COSFIRE filters are effective in recognizing and localizing (deformable) objects in images of complex scenes without requiring prior segmentation. They are versatile trainable shape detectors, conceptually simple and easy to implement. The presented hierarchical shape representation contributes to a better understanding of the brain and to more robust computer vision algorithms. PMID:25126068

  1. The scene and the unseen: manipulating photographs for experiments on change blindness and scene memory: image manipulation for change blindness.

    PubMed

    Ball, Felix; Elzemann, Anne; Busch, Niko A

    2014-09-01

    The change blindness paradigm, in which participants often fail to notice substantial changes in a scene, is a popular tool for studying scene perception, visual memory, and the link between awareness and attention. Some of the most striking and popular examples of change blindness have been demonstrated with digital photographs of natural scenes; in most studies, however, much simpler displays, such as abstract stimuli or "free-floating" objects, are typically used. Although simple displays have undeniable advantages, natural scenes remain a very useful and attractive stimulus for change blindness research. To assist researchers interested in using natural-scene stimuli in change blindness experiments, we provide here a step-by-step tutorial on how to produce changes in natural-scene images with a freely available image-processing tool (GIMP). We explain how changes in a scene can be made by deleting objects or relocating them within the scene or by changing the color of an object, in just a few simple steps. We also explain how the physical properties of such changes can be analyzed using GIMP and MATLAB (a high-level scientific programming tool). Finally, we present an experiment confirming that scenes manipulated according to our guidelines are effective in inducing change blindness and demonstrating the relationship between change blindness and the physical properties of the change and inter-individual differences in performance measures. We expect that this tutorial will be useful for researchers interested in studying the mechanisms of change blindness, attention, or visual memory using natural scenes.
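
    Measuring the physical properties of a change, as the tutorial does with GIMP and MATLAB, reduces to simple image arithmetic. A numpy/PIL sketch in which the difference tolerance is an assumption:

      import numpy as np
      from PIL import Image

      def change_properties(path_a, path_b):
          """Return the changed region's area (percent of image) and its
          mean per-pixel color difference between two scene versions."""
          a = np.asarray(Image.open(path_a).convert("RGB"), dtype=float)
          b = np.asarray(Image.open(path_b).convert("RGB"), dtype=float)
          diff = np.abs(a - b).sum(axis=2)
          mask = diff > 10.0               # tolerance for compression noise
          area_pct = 100.0 * mask.mean()
          mean_dcolor = diff[mask].mean() if mask.any() else 0.0
          return area_pct, mean_dcolor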

  2. Classification of MLS Point Clouds in Urban Scenes Using Detrended Geometric Features from Supervoxel-Based Local Contexts

    NASA Astrophysics Data System (ADS)

    Sun, Z.; Xu, Y.; Hoegner, L.; Stilla, U.

    2018-05-01

    In this work, we propose a classification method designed for the labeling of MLS point clouds, with detrended geometric features extracted from the points of the supervoxel-based local context. To achieve the analysis of complex 3D urban scenes, the acquired points of the scene should be tagged with individual labels of different classes. Thus, assigning a unique label to the points of an object that belong to the same category plays an essential role in the entire 3D scene analysis workflow. Although plenty of studies in this field have been reported, the task remains challenging. Specifically, in this work: 1) A novel geometric feature extraction method, detrending the redundant and non-salient information in the local context, is proposed, which proves effective for extracting local geometric features from the 3D scene. 2) Instead of using individual points as basic elements, the supervoxel-based local context is designed to encapsulate the geometric characteristics of points, providing a flexible and robust solution for feature extraction. 3) Experiments using a complex urban scene with manually labeled ground truth are conducted, and the performance of the proposed method with respect to different methods is analyzed. With the testing dataset, we obtained an overall accuracy of 0.92 for assigning eight semantic classes.
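
    Local geometric features of the kind this method detrends are conventionally derived from the eigenvalues of a neighborhood's covariance matrix. A generic numpy sketch of those descriptors follows; the paper's detrending step is not reproduced here:

      import numpy as np

      def covariance_features(points):
          """Eigenvalue-based shape descriptors for an Nx3 local point set."""
          w = np.linalg.eigvalsh(np.cov(points, rowvar=False))[::-1]
          l1, l2, l3 = np.maximum(w, 1e-12)              # l1 >= l2 >= l3
          return {"linearity": (l1 - l2) / l1,           # line-like structure
                  "planarity": (l2 - l3) / l1,           # surface-like structure
                  "sphericity": l3 / l1}                 # volumetric scatter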

  3. Effect of stationary objects on illusory forward self-motion induced by a looming display.

    PubMed

    Ohmi, M; Howard, I P

    1988-01-01

    It has previously been shown that when a moving and a stationary display are superimposed, illusory self-rotation (circular vection) is induced only when the moving display appears as the background. Three experiments are reported on the extent to which illusory forward self-motion (forward vection) induced by a looming display is inhibited by a superimposed stationary display as a function of the size and location of the stationary display and of the depth between the stationary and looming displays. Results showed that forward vection was controlled by the display that was perceived as the background, and background stationary displays suppressed forward vection by about the same amount whatever their size and eccentricity. Also, the perception of foreground-background properties of competing displays determined which controlled forward vection, and this control was not tied to specific depth cues. The inhibitory effect of a stationary background on forward vection was, however, weaker than that found with circular vection. This difference makes sense because, for forward body motion, the image of a distant scene is virtually stationary whereas, when the body rotates, it is not.

  4. Modeling visual clutter perception using proto-object segmentation

    PubMed Central

    Yu, Chen-Ping; Samaras, Dimitris; Zelinsky, Gregory J.

    2014-01-01

    We introduce the proto-object model of visual clutter perception. This unsupervised model segments an image into superpixels, then merges neighboring superpixels that share a common color cluster to obtain proto-objects—defined here as spatially extended regions of coherent features. Clutter is estimated by simply counting the number of proto-objects. We tested this model using 90 images of realistic scenes that were ranked by observers from least to most cluttered. Comparing this behaviorally obtained ranking to a ranking based on the model clutter estimates, we found a significant correlation between the two (Spearman's ρ = 0.814, p < 0.001). We also found that the proto-object model was highly robust to changes in its parameters and was generalizable to unseen images. We compared the proto-object model to six other models of clutter perception and demonstrated that it outperformed each, in some cases dramatically. Importantly, we also showed that the proto-object model was a better predictor of clutter perception than an actual count of the number of objects in the scenes, suggesting that the set size of a scene may be better described by proto-objects than objects. We conclude that the success of the proto-object model is due in part to its use of an intermediate level of visual representation—one between features and objects—and that this is evidence for the potential importance of a proto-object representation in many common visual percepts and tasks. PMID:24904121
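
    The pipeline lends itself to a compact sketch: superpixels, color clustering of superpixel means, merging of same-cluster neighbors, and a count of the resulting regions. The following is a simplified approximation with assumed parameter values, not the authors' released model:

      import numpy as np
      from skimage.measure import label as cc_label
      from skimage.segmentation import slic
      from sklearn.cluster import KMeans

      def proto_object_count(image, n_segments=400, n_colors=8):
          """Estimate clutter as the number of proto-objects in an RGB image."""
          sp = slic(image, n_segments=n_segments, start_label=0)
          means = np.array([image[sp == i].mean(axis=0)
                            for i in range(sp.max() + 1)])
          color = KMeans(n_clusters=n_colors, n_init=10).fit_predict(means)
          merged = color[sp]      # adjacent same-cluster superpixels fuse
          # Each connected region of the merged cluster map is one proto-object.
          return sum(cc_label(merged == k).max() for k in range(n_colors))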

  5. An Evaluation of Pixel-Based Methods for the Detection of Floating Objects on the Sea Surface

    NASA Astrophysics Data System (ADS)

    Borghgraef, Alexander; Barnich, Olivier; Lapierre, Fabian; Van Droogenbroeck, Marc; Philips, Wilfried; Acheroy, Marc

    2010-12-01

    Ship-based automatic detection of small floating objects on an agitated sea surface remains a hard problem. Our main concern is the detection of floating mines, which proved a real threat to shipping in confined waterways during the first Gulf War, but applications include salvaging, search-and-rescue operations, and perimeter or harbour defense. Detection in infrared (IR) is challenging because a rough sea is seen as a dynamic background of moving objects whose size, shape, and temperature are similar to those of a floating mine. In this paper we have applied a selection of background subtraction algorithms to the problem, and we show that recent algorithms such as ViBe and behaviour subtraction, which take into account spatial and temporal correlations within the dynamic scene, significantly outperform the more conventional parametric techniques, while making few prior assumptions about the physical properties of the scene.
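
    Since ViBe and behaviour subtraction are not bundled with mainstream libraries, a runnable stand-in uses OpenCV's MOG2 model. This sketch only illustrates the background-subtraction framing of the problem, with illustrative settings:

      import cv2

      def floating_object_masks(frames):
          """Yield per-frame foreground masks for a maritime IR sequence."""
          bg = cv2.createBackgroundSubtractorMOG2(history=300, varThreshold=16,
                                                  detectShadows=False)
          kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
          for frame in frames:
              fg = bg.apply(frame)
              # Suppress small wave clutter with a morphological opening.
              yield cv2.morphologyEx(fg, cv2.MORPH_OPEN, kernel)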

  6. Modulation of visually evoked movement responses in moving virtual environments.

    PubMed

    Reed-Jones, Rebecca J; Vallis, Lori Ann

    2009-01-01

    Virtual-reality technology is being increasingly used to understand how humans perceive and act in the moving world around them. What is currently not clear is how virtual reality technology is perceived by human participants and what virtual scenes are effective in evoking movement responses to visual stimuli. We investigated the effect of virtual-scene context on human responses to a virtual visual perturbation. We hypothesised that exposure to a natural scene that matched the visual expectancies of the natural world would create a perceptual set towards presence, and thus visual guidance of body movement in a subsequently presented virtual scene. Results supported this hypothesis; responses to a virtual visual perturbation presented in an ambiguous virtual scene were increased when participants first viewed a scene that consisted of natural landmarks which provided 'real-world' visual motion cues. Further research in this area will provide a basis of knowledge for the effective use of this technology in the study of human movement responses.

  7. Utilization of DIRSIG in support of real-time infrared scene generation

    NASA Astrophysics Data System (ADS)

    Sanders, Jeffrey S.; Brown, Scott D.

    2000-07-01

    Real-time infrared scene generation for hardware-in-the-loop testing has traditionally been a difficult challenge. Infrared scenes are usually generated using commercial hardware that was not designed to properly handle the thermal and environmental physics involved. Real-time infrared scenes typically lack details that are included in scenes rendered in non-real-time by ray-tracing programs such as the Digital Imaging and Remote Sensing Scene Generation (DIRSIG) program. However, executing DIRSIG in real-time while retaining all the physics is beyond current computational capabilities for many applications. DIRSIG is a first-principles-based synthetic image generation model that produces multi- or hyper-spectral images in the 0.3 to 20 micron region of the electromagnetic spectrum. The DIRSIG model is an integrated collection of independent, first-principles-based sub-models, which work in conjunction to produce radiance field images with high radiometric fidelity. DIRSIG uses the MODTRAN radiation propagation model for exo-atmospheric irradiance, emitted and scattered radiances (upwelled and downwelled), and path transmission predictions. This radiometry submodel utilizes bidirectional reflectance data, accounts for specular and diffuse background contributions, and features path-length-dependent extinction and emission for transmissive bodies (plumes, clouds, etc.) which may be present in any target, background, or solar path. This detailed environmental modeling greatly enhances the number of rendered features and, hence, the fidelity of a rendered scene. While DIRSIG itself cannot currently be executed in real-time, its outputs can be used to provide scene inputs for real-time scene generators. These inputs can incorporate significant features such as target-to-background thermal interactions, static background object thermal shadowing, and partially transmissive countermeasures. All of these features represent significant improvements over the current state of the art in real-time IR scene generation.

  8. Cybersickness in the presence of scene rotational movements along different axes.

    PubMed

    Lo, W T; So, R H

    2001-02-01

    Compelling scene movements in a virtual reality (VR) system can cause symptoms of motion sickness (i.e., cybersickness). A within-subject experiment has been conducted to investigate the effects of scene oscillations along different axes on the level of cybersickness. Sixteen male participants were exposed to four 20-min VR simulation sessions. The four sessions used the same virtual environment but with scene oscillations along different axes, i.e., pitch, yaw, roll, or no oscillation (speed: 30 degrees/s, range: +/- 60 degrees). Verbal ratings of the level of nausea were taken at 5-min intervals during the sessions and sickness symptoms were also measured before and after the sessions using the Simulator Sickness Questionnaire (SSQ). In the presence of scene oscillation, both nausea ratings and SSQ scores increased at significantly higher rates than with no oscillation. While individual participants exhibited different susceptibilities to nausea associated with VR simulation containing scene oscillations along different rotational axes, the overall effects of axis among our group of 16 randomly selected participants were not significant. The main effects of, and interactions among, scene oscillation, duration, and participants are discussed in the paper.

  9. Motion planning for autonomous vehicle based on radial basis function neural network in unstructured environment.

    PubMed

    Chen, Jiajia; Zhao, Pan; Liang, Huawei; Mei, Tao

    2014-09-18

    The autonomous vehicle is an automated system equipped with features like environment perception, decision-making, motion planning, and control and execution technology. Navigating in an unstructured and complex environment is a huge challenge for autonomous vehicles, due to the irregular shape of the road, the requirement of real-time planning, and the nonholonomic constraints of the vehicle. This paper presents a motion planning method, based on the Radial Basis Function (RBF) neural network, to guide the autonomous vehicle in unstructured environments. The proposed algorithm extracts the drivable region from the perception grid map based on the global path, which is available in the road network. Sample points are randomly selected in the drivable region, and a gradient descent method is used to train the RBF network. The parameters of the motion-planning algorithm are verified through simulation and experiment. It is observed that the proposed approach produces a flexible, smooth, and safe path that can fit any road shape. The method is implemented on an autonomous vehicle and verified against many outdoor scenes; furthermore, a comparison of the proposed method with the existing well-known Rapidly-exploring Random Tree (RRT) method is presented. The experimental results show that the proposed method is highly effective in planning the vehicle path and offers better motion quality.

  10. Motion Planning for Autonomous Vehicle Based on Radial Basis Function Neural Network in Unstructured Environment

    PubMed Central

    Chen, Jiajia; Zhao, Pan; Liang, Huawei; Mei, Tao

    2014-01-01

    The autonomous vehicle is an automated system equipped with features like environment perception, decision-making, motion planning, and control and execution technology. Navigating in an unstructured and complex environment is a huge challenge for autonomous vehicles, due to the irregular shape of the road, the requirement of real-time planning, and the nonholonomic constraints of the vehicle. This paper presents a motion planning method, based on the Radial Basis Function (RBF) neural network, to guide the autonomous vehicle in unstructured environments. The proposed algorithm extracts the drivable region from the perception grid map based on the global path, which is available in the road network. Sample points are randomly selected in the drivable region, and a gradient descent method is used to train the RBF network. The parameters of the motion-planning algorithm are verified through simulation and experiment. It is observed that the proposed approach produces a flexible, smooth, and safe path that can fit any road shape. The method is implemented on an autonomous vehicle and verified against many outdoor scenes; furthermore, a comparison of the proposed method with the existing well-known Rapidly-exploring Random Tree (RRT) method is presented. The experimental results show that the proposed method is highly effective in planning the vehicle path and offers better motion quality. PMID:25237902
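
    The core fitting step, training RBF weights by gradient descent on points sampled from the drivable region, can be sketched generically. Centers, widths, and learning rate below are assumptions, and the real planner must additionally handle the vehicle's kinematic constraints:

      import numpy as np

      def fit_rbf_path(xs, ys, n_centers=10, lr=0.05, epochs=2000, width=5.0):
          """Fit y(x) = sum_i w_i * exp(-(x - c_i)^2 / (2 s^2)) to sampled
          drivable-region points by gradient descent on the weights
          (fixed centers, shared width)."""
          centers = np.linspace(xs.min(), xs.max(), n_centers)
          Phi = np.exp(-((xs[:, None] - centers[None, :]) ** 2)
                       / (2 * width ** 2))
          w = np.zeros(n_centers)
          for _ in range(epochs):
              err = Phi @ w - ys
              w -= lr * Phi.T @ err / len(xs)   # mean-squared-error gradient
          return lambda x: np.exp(-((np.atleast_1d(x)[:, None]
                                     - centers[None, :]) ** 2)
                                  / (2 * width ** 2)) @ w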

  11. Contextual Congruency Effect in Natural Scene Categorization: Different Strategies in Humans and Monkeys (Macaca mulatta)

    PubMed Central

    Collet, Anne-Claire; Fize, Denis; VanRullen, Rufin

    2015-01-01

    Rapid visual categorization is a crucial ability for survival of many animal species, including monkeys and humans. In real conditions, objects (either animate or inanimate) are never isolated but embedded in a complex background made of multiple elements. It has been shown in humans and monkeys that the contextual background can either enhance or impair object categorization, depending on context/object congruency (for example, an animal in a natural vs. man-made environment). Moreover, a scene is not only a collection of objects; it also has global physical features (i.e., phase and amplitude of Fourier spatial frequencies) which help define its gist. In our experiment, we aimed to explore and compare the contribution of the amplitude spectrum of scenes to the context-object congruency effect in monkeys and humans. We designed a rapid visual categorization task, Animal versus Non-Animal, using as contexts both real scene photographs and noisy backgrounds built from the amplitude spectrum of real scenes but with randomized phase spectrum. We showed that even if the contextual congruency effect was comparable in both species when the context was a real scene, it differed when the foreground object was surrounded by a noisy background: in monkeys we found a similar congruency effect in both conditions, but in humans the congruency effect was absent (or even reversed) when the context was a noisy background. PMID:26207915

  12. Robust object tracking techniques for vision-based 3D motion analysis applications

    NASA Astrophysics Data System (ADS)

    Knyaz, Vladimir A.; Zheltov, Sergey Y.; Vishnyakov, Boris V.

    2016-04-01

    Automated and accurate spatial motion capture of an object is necessary for a wide variety of applications in industry and science, virtual reality and film, medicine and sports. For most applications, the reliability and accuracy of the data obtained, as well as convenience for the user, are the main characteristics defining the quality of a motion capture system. Among the existing systems for 3D data acquisition, based on different physical principles (accelerometry, magnetometry, time-of-flight, vision-based), optical motion capture systems have a set of advantages such as high speed of acquisition, potential for high accuracy, and automation based on advanced image processing algorithms. For vision-based motion capture, accurate and robust detection and tracking of object features through the video sequence are the key elements, along with the level of automation of the capturing process. To provide high accuracy of the obtained spatial data, the developed vision-based motion capture system "Mosca" is based on photogrammetric principles of 3D measurement and supports high-speed image acquisition in synchronized mode. It includes two to four machine vision cameras for capturing video sequences of object motion. The original camera calibration and external orientation procedures provide the basis for high accuracy of 3D measurements. A set of algorithms is developed and tested, both for detecting, identifying, and tracking similar targets and for marker-less object motion capture. The evaluation results show high robustness and reliability for various motion analysis tasks in technical and biomechanical applications.
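
    At the heart of such photogrammetric capture is two-view triangulation of tracked points from calibrated, synchronized cameras. A minimal OpenCV sketch; the projection matrices P1 and P2 come from the calibration and external orientation step, which is not shown:

      import cv2
      import numpy as np

      def triangulate_markers(P1, P2, uv1, uv2):
          """Triangulate 3D points from two views.
          P1, P2: 3x4 camera projection matrices; uv1, uv2: 2xN pixel
          coordinates of the same tracked targets in each view."""
          X_h = cv2.triangulatePoints(P1, P2, uv1.astype(np.float64),
                                      uv2.astype(np.float64))
          return (X_h[:3] / X_h[3]).T   # homogeneous -> Nx3 metric points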

  13. An Analysis of the High Frequency Vibrations in Early Thematic Mapper Scenes

    NASA Technical Reports Server (NTRS)

    Kogut, J.; Larduinat, E.

    1984-01-01

    The potential effects of high frequency vibrations on the final Thematic Mapper (TM) image are evaluated for 26 scenes. The angular displacements of the TM detectors from their nominal pointing directions, as measured by the TM Angular Displacement Sensor (ADS) and the spacecraft Dry Rotor Inertial Reference Unit (DRIRU), give data on the along scan and cross scan high frequency vibrations present in each scan of a scene. These measurements are used to find the maximum overlap and underlap between successive scans, and to analyze the spectrum of the high frequency vibrations acting on the detectors. The Fourier spectrum of the along scan and cross scan vibrations for each scene was also evaluated. The spectra of the scenes examined indicate that the high frequency vibrations arise primarily from the motion of the TM and MSS mirrors, and that their amplitudes are well within expected ranges.

  14. Large Scale Structure From Motion for Autonomous Underwater Vehicle Surveys

    DTIC Science & Technology

    2004-09-01

    Govern the Formation of Multiple Images of a Scene and Some of Their Applications. MIT Press, 2001. [26] O. Faugeras and S. Maybank. Motion from point... Machine Vision Conference, volume 1, pages 384-393, September 2002. [69] S. Maybank and O. Faugeras. A theory of self-calibration of a moving camera

  15. View-Invariant Object Category Learning, Recognition, and Search: How Spatial and Object Attention are Coordinated Using Surface-Based Attentional Shrouds

    ERIC Educational Resources Information Center

    Fazl, Arash; Grossberg, Stephen; Mingolla, Ennio

    2009-01-01

    How does the brain learn to recognize an object from multiple viewpoints while scanning a scene with eye movements? How does the brain avoid the problem of erroneously classifying parts of different objects together? How are attention and eye movements intelligently coordinated to facilitate object learning? A neural model provides a unified…

  16. Scene perception and memory revealed by eye movements and receiver-operating characteristic analyses: does a cultural difference truly exist?

    PubMed

    Evans, Kris; Rotello, Caren M; Li, Xingshan; Rayner, Keith

    2009-02-01

    Cultural differences have been observed in scene perception and memory: Chinese participants purportedly attend to background information more than American participants do. We investigated the influence of culture by recording eye movements during scene perception and while participants made recognition memory judgements. Real-world pictures with a focal object on a background were shown to both American and Chinese participants while their eye movements were recorded. Later, memory for the focal object in each scene was tested, and the relationship between the focal object (studied, new) and the background context (studied, new) was manipulated. Receiver-operating characteristic (ROC) curves show that both sensitivity and response bias changed when objects were tested in new contexts. However, neither the decrease in accuracy nor the response bias shift differed with culture. The eye movement patterns were also similar across cultural groups. Both groups made longer and more fixations on the focal objects than on the contexts. The similarity of eye movement patterns and recognition memory behaviour suggests that both Americans and Chinese use the same strategies in scene perception and memory.

  17. Deep learning based hand gesture recognition in complex scenes

    NASA Astrophysics Data System (ADS)

    Ni, Zihan; Sang, Nong; Tan, Cheng

    2018-03-01

    Recently, region-based convolutional neural networks (R-CNNs) have achieved significant success in the field of object detection, but their accuracy is limited for small and similar-looking objects, such as gestures. To address this problem, we present an online hard example testing (OHET) technique that evaluates the confidence of the R-CNN outputs and regards those outputs with low confidence as hard examples. In this paper, we propose a cascaded network to recognize gestures. First, we use the region-based fully convolutional network (R-FCN), which is well suited to detecting small objects, to detect the gestures, and then use OHET to select the hard examples. To enhance the accuracy of gesture recognition, we re-classify the hard examples through a VGG-19 classification network to obtain the final output of the gesture recognition system. Comparative experiments with other methods show that the cascaded network combined with OHET reaches state-of-the-art results of 99.3% mAP on small and similar gestures in complex scenes.
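
    The cascade logic can be summarized in a few lines. The sketch below is an illustration of the idea only; rfcn_detect, vgg19_classify, and the confidence threshold are hypothetical stand-ins, not APIs from the paper.

      CONF_THRESHOLD = 0.8  # assumed cut-off separating easy from hard examples

      def recognize_gestures(image, rfcn_detect, vgg19_classify):
          """Cascade: keep confident detections, re-classify the rest."""
          results = []
          for box, label, score in rfcn_detect(image):
              if score >= CONF_THRESHOLD:
                  # Confident detection: keep the detector's output as-is.
                  results.append((box, label, score))
              else:
                  # Hard example: crop the region and let the deeper
                  # classification network make the final call.
                  x0, y0, x1, y1 = box
                  label, score = vgg19_classify(image[y0:y1, x0:x1])
                  results.append((box, label, score))
          return results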

  18. The effect of scene context on episodic object recognition: parahippocampal cortex mediates memory encoding and retrieval success.

    PubMed

    Hayes, Scott M; Nadel, Lynn; Ryan, Lee

    2007-01-01

    Previous research has investigated intentional retrieval of contextual information and contextual influences on object identification and word recognition, yet few studies have investigated context effects in episodic memory for objects. To address this issue, unique objects embedded in a visually rich scene or on a white background were presented to participants. At test, objects were presented either in the original scene or on a white background. A series of behavioral studies with young adults demonstrated a context shift decrement (CSD)-decreased recognition performance when context is changed between encoding and retrieval. The CSD was not attenuated by encoding or retrieval manipulations, suggesting that binding of object and context may be automatic. A final experiment explored the neural correlates of the CSD, using functional Magnetic Resonance Imaging. Parahippocampal cortex (PHC) activation (right greater than left) during incidental encoding was associated with subsequent memory of objects in the context shift condition. Greater activity in right PHC was also observed during successful recognition of objects previously presented in a scene. Finally, a subset of regions activated during scene encoding, such as bilateral PHC, was reactivated when the object was presented on a white background at retrieval. Although participants were not required to intentionally retrieve contextual information, the results suggest that PHC may reinstate visual context to mediate successful episodic memory retrieval. The CSD is attributed to automatic and obligatory binding of object and context. The results suggest that PHC is important not only for processing of scene information, but also plays a role in successful episodic memory encoding and retrieval. These findings are consistent with the view that spatial information is stored in the hippocampal complex, one of the central tenets of Multiple Trace Theory. (c) 2007 Wiley-Liss, Inc.

  19. Probability distributions of whisker-surface contact: quantifying elements of the rat vibrissotactile natural scene.

    PubMed

    Hobbs, Jennifer A; Towal, R Blythe; Hartmann, Mitra J Z

    2015-08-01

    Analysis of natural scene statistics has been a powerful approach for understanding neural coding in the auditory and visual systems. In the field of somatosensation, it has been more challenging to quantify the natural tactile scene, in part because somatosensory signals are so tightly linked to the animal's movements. The present work takes a step towards quantifying the natural tactile scene for the rat vibrissal system by simulating rat whisking motions to systematically investigate the probabilities of whisker-object contact in naturalistic environments. The simulations permit an exhaustive search through the complete space of possible contact patterns, thereby allowing for the characterization of the patterns that would most likely occur during long sequences of natural exploratory behavior. We specifically quantified the probabilities of 'concomitant contact', that is, given that a particular whisker makes contact with a surface during a whisk, what is the probability that each of the other whiskers will also make contact with the surface during that whisk? Probabilities of concomitant contact were quantified in simulations that assumed increasingly naturalistic conditions: first, the space of all possible head poses; second, the space of behaviorally preferred head poses as measured experimentally; and third, common head poses in environments such as cages and burrows. As environments became more naturalistic, the probability distributions shifted from exhibiting a 'row-wise' structure to a more diagonal structure. Results also reveal that the rat appears to use motor strategies (e.g. head pitches) that generate contact patterns that are particularly well suited to extract information in the presence of uncertainty. © 2015. Published by The Company of Biologists Ltd.

  20. Signature modelling and radiometric rendering equations in infrared scene simulation systems

    NASA Astrophysics Data System (ADS)

    Willers, Cornelius J.; Willers, Maria S.; Lapierre, Fabian

    2011-11-01

    The development and optimisation of modern infrared systems necessitates the use of simulation systems to create radiometrically realistic representations (e.g. images) of infrared scenes. Such simulation systems are used in signature prediction, the development of surveillance and missile sensors, signal/image processing algorithm development and aircraft self-protection countermeasure system development and evaluation. Even the most cursory investigation reveals a multitude of factors affecting the infrared signatures of real-world objects. Factors such as spectral emissivity, spatial/volumetric radiance distribution, specular reflection, reflected direct sunlight, reflected ambient light, atmospheric degradation and more, all affect the presentation of an object's instantaneous signature. The signature is furthermore dynamically varying as a result of internal and external influences on the object, resulting from the heat balance comprising insolation, internal heat sources, aerodynamic heating (airborne objects), conduction, convection and radiation. In order to accurately render the object's signature in a computer simulation, the rendering equations must therefore account for all the elements of the signature. In this overview paper, the signature models, rendering equations and application frameworks of three infrared simulation systems are reviewed and compared. The paper first considers the problem of infrared scene simulation in a framework for simulation validation. This approach provides concise definitions and a convenient context for considering signature models and subsequent computer implementation. The primary radiometric requirements for an infrared scene simulator are presented next. The signature models and rendering equations implemented in OSMOSIS (Belgian Royal Military Academy), DIRSIG (Rochester Institute of Technology) and OSSIM (CSIR & Denel Dynamics) are reviewed. In spite of these three simulation systems' different application focus areas, their underlying physics-based approach is similar. The commonalities and differences between the different systems are investigated, in the context of their somewhat different application areas. The application of an infrared scene simulation system towards the development of imaging missiles and missile countermeasures is briefly described. Flowing from the review of the available models and equations, recommendations are made to further enhance and improve the signature models and rendering equations in infrared scene simulators.
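
    A minimal form of the per-surface rendering equation reviewed in such systems can be sketched as follows (a band-integrated, grey-body simplification with illustrative numbers; real simulators integrate spectrally and treat specular terms separately).

      import math

      SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant, W m^-2 K^-4

      def apparent_radiance(T_surface, emissivity, E_sun, E_ambient,
                            tau_atm, L_path):
          """Band-integrated apparent radiance of an opaque diffuse surface."""
          L_emitted = emissivity * SIGMA * T_surface**4 / math.pi
          reflectivity = 1.0 - emissivity            # opaque grey surface
          L_reflected = reflectivity * (E_sun + E_ambient) / math.pi
          return tau_atm * (L_emitted + L_reflected) + L_path

      # Illustrative numbers only: a 320 K plate under 600 W/m^2 insolation.
      print(apparent_radiance(T_surface=320.0, emissivity=0.85,
                              E_sun=600.0, E_ambient=50.0,
                              tau_atm=0.7, L_path=4.0))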

  1. Three-dimensional simulation and auto-stereoscopic 3D display of the battlefield environment based on the particle system algorithm

    NASA Astrophysics Data System (ADS)

    Ning, Jiwei; Sang, Xinzhu; Xing, Shujun; Cui, Huilong; Yan, Binbin; Yu, Chongxiu; Dou, Wenhua; Xiao, Liquan

    2016-10-01

    The army's combat training is very important now, and the simulation of the real battlefield environment is of great significance. Two-dimensional information can no longer meet the demand. With the development of virtual reality technology, three-dimensional (3D) simulation of the battlefield environment is possible. In the simulation of a 3D battlefield environment, in addition to the terrain, combat personnel, and combat tools, the simulation of explosions, fire, smoke and other effects is also very important, since these effects enhance the sense of realism and immersion of the 3D scene. However, these special effects are irregular objects, which makes them difficult to simulate with conventional geometry. The simulation of irregular objects has therefore always been a difficult and active research topic in computer graphics. Here, the particle system algorithm is used for simulating irregular objects. We designed simulations of explosions, fire, and smoke based on the particle system and applied them to the 3D battlefield scene. In addition, glasses-free 3D display of the simulated battlefield scene is achieved with a GPU-based 4K super-multiview real-time 3D video transformation algorithm. Together with human-computer interaction, this ultimately realizes a glasses-free 3D display of a more realistic and immersive 3D battlefield environment.
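
    The particle-system idea is straightforward to sketch. The following minimal smoke/explosion emitter (an illustration with arbitrary constants, not the paper's implementation) advances particle positions, velocities, and lifetimes each frame and respawns expired particles at the emitter.

      import numpy as np

      N = 500                                        # particles in the plume
      rng = np.random.default_rng(0)
      pos = np.zeros((N, 3))                         # all born at the emitter
      vel = rng.normal([0, 0, 2.0], 0.5, (N, 3))     # drift upward, jittered
      life = rng.uniform(0.5, 3.0, N)                # seconds remaining

      def step(dt, wind=(0.5, 0.0, 0.0)):
          global pos, vel, life
          pos += (vel + np.asarray(wind)) * dt       # advect with the wind
          vel[:, 2] += 0.8 * dt                      # buoyancy of hot gas
          life -= dt
          dead = life <= 0.0                         # respawn expired particles
          pos[dead] = 0.0
          vel[dead] = rng.normal([0, 0, 2.0], 0.5, (dead.sum(), 3))
          life[dead] = rng.uniform(0.5, 3.0, dead.sum())

      for _ in range(60):                            # one second at 60 fps
          step(1 / 60)
      print(pos.mean(axis=0))                        # plume centroid has drifted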

  2. An orbital emulator for pursuit-evasion game theoretic sensor management

    NASA Astrophysics Data System (ADS)

    Shen, Dan; Wang, Tao; Wang, Gang; Jia, Bin; Wang, Zhonghai; Chen, Genshe; Blasch, Erik; Pham, Khanh

    2017-05-01

    This paper develops and evaluates an orbital emulator (OE) for space situational awareness (SSA). The OE can reproduce 3D satellite movements using capabilities built from omni-wheeled robots and robotic-arm motion methods. The 3D motion of a satellite is partitioned into movements in the equatorial plane and up-down motions in the vertical plane. The equatorial-plane movements are emulated by omni-wheeled robots, while the up-down motions are performed by a stepper-motor-controlled ball moving along a rod (robotic arm) attached to each robot. For multiple satellites, a fast map-merging algorithm is integrated into the robot operating system (ROS) and simultaneous localization and mapping (SLAM) routines to locate the multiple robots in the scene. The OE is used to demonstrate a pursuit-evasion (PE) game theoretic sensor management algorithm, which models conflicts between a space-based-visible (SBV) satellite (as pursuer) and a geosynchronous (GEO) satellite (as evader). The cost function of the PE game is based on the informational entropy of the SBV-tracking-GEO scenario. The GEO satellite can maneuver using a continuous low-thrust engine. The hardware-in-the-loop space emulator visually illustrates the SSA problem solution based on the PE game.

  3. Misidentifying a tennis racket as keys: object identification in people with age-related macular degeneration.

    PubMed

    Thibaut, Miguel; Tran, Thi Ha Chau; Delerue, Céline; Boucart, Muriel

    2015-05-01

    Previous studies showed that people with age-related macular degeneration (AMD) can categorise a pre-defined target object or scene with high accuracy (above 80%). In these studies participants were asked to detect the target (e.g. an animal) in serial visual presentation. People with AMD must rely on peripheral vision, which is better adapted to the low resolution required for detection than to the higher resolution required to identify a specific exemplar. We investigated the ability of people with central vision loss to identify photographs of objects and scenes. Photographs of isolated objects, natural scenes and objects in scenes were centrally displayed for 2 s each. Participants were asked to name the stimuli. We measured accuracy and naming times in 20 patients with AMD, 15 age-matched controls and 12 young controls. Accuracy was lower (by about 30%) and naming times were longer (by about 300 ms) in people with AMD than in age-matched controls for all three categories of images. Correct identification occurred for 62-66% of the stimuli in patients. More than 20% of the misidentifications resulted from a structural and/or semantic similarity between the target object and the object named (e.g. spectacles for dog plates or dolphin for shark). Accuracy and naming times did not differ significantly between young and older normally sighted participants, indicating that the deficits resulted from pathology rather than from normal ageing. These results show that, in contrast to performance in categorisation of a single pre-defined target, people with central vision loss are impaired at identifying various objects and scenes. The decrease in accuracy and the increase in response times in patients with AMD indicate that peripheral vision might be sufficient for object and scene categorisation but not for precise scene or object identification. © 2015 The Authors Ophthalmic & Physiological Optics © 2015 The College of Optometrists.

  4. Sensory Bias Predicts Postural Stability, Anxiety, and Cognitive Performance in Healthy Adults Walking in Novel Discordant Conditions

    NASA Technical Reports Server (NTRS)

    Brady, Rachel A.; Batson, Crystal D.; Peters, Brian T.; Mulavara, Ajitkumar P.; Bloomberg, Jacob J.

    2010-01-01

    We designed a gait training study that presented combinations of visual flow and support surface manipulations to investigate the response of healthy adults to novel discordant sensorimotor conditions. We aimed to determine whether a relationship existed between subjects' visual dependence and their scores on a collective measure of anxiety, cognition, and postural stability in a new discordant environment presented at the conclusion of training (Transfer Test). A treadmill was mounted on a motion base platform positioned 2 m behind a large visual screen. Training consisted of three walking sessions, each within a week of the previous visit, that presented four 5-minute exposures to various combinations of support surface and visual scene manipulations, all lateral sinusoids. The conditions were scene translation only, support surface translation only, simultaneous scene and support surface translations in phase, and simultaneous scene and support surface translations 180° out of phase. During the Transfer Test, the trained participants received a 2-minute novel exposure: a visual sinusoidal roll perturbation, with twice the original flow rate, was superimposed on a sinusoidal support surface roll perturbation that was 90° out of phase with the scene. A high correlation existed between normalized torso translation, measured in the scene-only condition at the first visit, and a combined measure of normalized heart rate, stride frequency, and reaction time at the Transfer Test. Results suggest that visually dependent participants experience decreased postural stability, increased anxiety, and increased reaction times compared with their less visually dependent counterparts when negotiating novel discordant conditions.

  5. Differences in the effects of crowding on size perception and grip scaling in densely cluttered 3-D scenes.

    PubMed

    Chen, Juan; Sperandio, Irene; Goodale, Melvyn Alan

    2015-01-01

    Objects rarely appear in isolation in natural scenes. Although many studies have investigated how nearby objects influence perception in cluttered scenes (i.e., crowding), none has studied how nearby objects influence visually guided action. In Experiment 1, we found that participants could scale their grasp to the size of a crowded target even when they could not perceive its size, demonstrating for the first time that neurologically intact participants can use visual information that is not available to conscious report to scale their grasp to real objects in real scenes. In Experiments 2 and 3, we found that changing the eccentricity of the display and the orientation of the flankers had no effect on grasping but strongly affected perception. The differential effects of eccentricity and flanker orientation on perception and grasping show that the known differences in retinotopy between the ventral and dorsal streams are reflected in the way in which people deal with targets in cluttered scenes. © The Author(s) 2014.

  6. Steering and positioning targets for HWIL IR testing at cryogenic conditions

    NASA Astrophysics Data System (ADS)

    Perkes, D. W.; Jensen, G. L.; Higham, D. L.; Lowry, H. S.; Simpson, W. R.

    2006-05-01

    In order to increase the fidelity of hardware-in-the-loop ground-truth testing, it is desirable to create a dynamic scene of multiple, independently controlled IR point sources. ATK-Mission Research has developed and supplied the steering mirror systems for the 7V and 10V Space Simulation Test Chambers at the Arnold Engineering Development Center (AEDC), Air Force Materiel Command (AFMC). A portion of the 10V system incorporates multiple target sources beam-combined at the focal point of a 20K cryogenic collimator. Each IR source consists of a precision blackbody with cryogenic aperture and filter wheels mounted on a cryogenic two-axis translation stage. This point source target scene is steered by a high-speed steering mirror to produce further complex motion. The scene changes dynamically in order to simulate an actual operational scene as viewed by the System Under Test (SUT) as it executes various dynamic look-direction changes during its flight to a target. Synchronization and real-time hardware-in-the-loop control is accomplished using reflective memory for each subsystem control and feedback loop. This paper focuses on the steering mirror system and the required tradeoffs of optical performance, precision, repeatability and high-speed motion as well as the complications of encoder feedback calibration and operation at 20K.

  7. Dynamic thermal signature prediction for real-time scene generation

    NASA Astrophysics Data System (ADS)

    Christie, Chad L.; Gouthas, Efthimios (Themie); Williams, Owen M.; Swierkowski, Leszek

    2013-05-01

    At DSTO, a real-time scene generation framework, VIRSuite, has been developed in recent years, within which trials data are predominantly used for modelling the radiometric properties of the simulated objects. Since in many cases the data are insufficient, a physics-based simulator capable of predicting the infrared signatures of objects and their backgrounds has been developed as a new VIRSuite module. It includes transient heat conduction within the materials, and boundary conditions that take into account the heat fluxes due to solar radiation, wind convection and radiative transfer. In this paper, an overview is presented, covering both the steady-state and transient performance.
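
    The transient model described can be illustrated with a 1-D explicit finite-difference sketch (ours, with steel-like material constants and illustrative boundary fluxes, not VIRSuite code): conduction through a plate with solar gain, wind convection, and radiative loss at the outer surface.

      import numpy as np

      SIGMA = 5.670374419e-8                  # Stefan-Boltzmann, W m^-2 K^-4
      k, rho, cp = 50.0, 7800.0, 500.0        # steel-like material constants
      alpha = k / (rho * cp)                  # thermal diffusivity
      dx, dt = 0.002, 0.05                    # grid spacing (m), time step (s)
      assert alpha * dt / dx**2 < 0.5         # explicit-scheme stability limit

      T = np.full(20, 290.0)                  # 4 cm plate, initially 290 K
      q_solar, h_wind, eps, T_air = 400.0, 15.0, 0.9, 285.0

      for _ in range(int(3600 / dt)):         # one hour of simulated time
          T_new = T.copy()
          # Interior nodes: classic explicit conduction update.
          T_new[1:-1] = T[1:-1] + alpha * dt / dx**2 * (T[2:] - 2 * T[1:-1] + T[:-2])
          # Outer surface: solar gain, wind convection, radiative loss.
          q = q_solar + h_wind * (T_air - T[0]) - eps * SIGMA * (T[0]**4 - T_air**4)
          T_new[0] = T[0] + dt / (rho * cp * dx) * (q + k * (T[1] - T[0]) / dx)
          T_new[-1] = T_new[-2]               # insulated inner surface
          T = T_new
      print(f"outer surface after 1 h: {T[0]:.1f} K")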

  8. Empirical comparison of a fixed-base and a moving-base simulation of a helicopter engaged in visually conducted slalom runs

    NASA Technical Reports Server (NTRS)

    Parrish, R. V.; Houck, J. A.; Martin, D. J., Jr.

    1977-01-01

    Combined visual, motion, and aural cues for a helicopter engaged in visually conducted slalom runs at low altitude were studied. The evaluation of the visual and aural cues was subjective, whereas the motion cues were evaluated both subjectively and objectively. Subjective and objective results coincided in the area of control activity. Generally, less control activity is present under motion conditions than under fixed-base conditions, a fact attributed subjectively to the feeling of realistic limitations of a machine (helicopter) given by the addition of motion cues. The objective data also revealed that the slalom runs were conducted at significantly higher altitudes under motion conditions than under fixed-base conditions.

  9. Initial Scene Representations Facilitate Eye Movement Guidance in Visual Search

    ERIC Educational Resources Information Center

    Castelhano, Monica S.; Henderson, John M.

    2007-01-01

    What role does the initial glimpse of a scene play in subsequent eye movement guidance? In 4 experiments, a brief scene preview was followed by object search through the scene via a small moving window that was tied to fixation position. Experiment 1 demonstrated that the scene preview resulted in more efficient eye movements compared with a…

  10. Ubiquitous Creation of Bas-Relief Surfaces with Depth-of-Field Effects Using Smartphones.

    PubMed

    Sohn, Bong-Soo

    2017-03-11

    This paper describes a new method to automatically generate digital bas-reliefs with depth-of-field effects from general scenes. Most previous methods for bas-relief generation take input in the form of 3D models. However, obtaining 3D models of real scenes or objects is often difficult, inaccurate, and time-consuming. From this motivation, we developed a method that takes as input a set of photographs that can be quickly and ubiquitously captured by ordinary smartphone cameras. A depth map is computed from the input photographs. The value range of the depth map is compressed and used as a base map representing the overall shape of the bas-relief. However, the resulting base map contains little information on details of the scene. Thus, we construct a detail map using pixel values of the input image to express the details. The base and detail maps are blended to generate a new depth map that reflects both overall depth and scene detail information. This map is selectively blurred to simulate the depth-of-field effects. The final depth map is converted to a bas-relief surface mesh. Experimental results show that our method generates a realistic bas-relief surface of general scenes with no expensive manual processing.
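
    The pipeline lends itself to a compact sketch. The following illustration (ours; the weights, blur settings, and toy inputs are assumptions, not values from the paper) compresses the depth range into a base map, mixes in an intensity-derived detail map, and blurs regions far from an assumed focal plane.

      import numpy as np
      from scipy.ndimage import gaussian_filter

      def bas_relief_depth(depth, image, focus_depth=0.5,
                           base_weight=0.8, detail_weight=0.2):
          # Base map: overall shape, with the depth range compressed to [0, 1].
          base = (depth - depth.min()) / (np.ptp(depth) + 1e-9)
          # Detail map: normalized image intensity carries the fine structure.
          detail = (image - image.min()) / (np.ptp(image) + 1e-9)
          blended = base_weight * base + detail_weight * detail
          # Depth of field: blur more where depth is far from the focal plane.
          blurred = gaussian_filter(blended, sigma=3.0)
          defocus = np.clip(np.abs(base - focus_depth) / 0.5, 0.0, 1.0)
          return (1.0 - defocus) * blended + defocus * blurred

      rng = np.random.default_rng(1)
      depth = rng.random((64, 64)).cumsum(axis=0)    # toy depth map
      image = rng.random((64, 64))                   # toy intensity image
      relief = bas_relief_depth(depth, image)        # ready for meshing
      print(relief.shape, float(relief.min()), float(relief.max()))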

  12. Core Geometry in Perspective

    ERIC Educational Resources Information Center

    Dillon, Moira R.; Spelke, Elizabeth S.

    2015-01-01

    Research on animals, infants, children, and adults provides evidence that distinct cognitive systems underlie navigation and object recognition. Here we examine whether and how these systems interact when children interpret 2D edge-based perspectival line drawings of scenes and objects. Such drawings serve as symbols early in development, and they…

  13. The Effect of Consistency on Short-Term Memory for Scenes.

    PubMed

    Gong, Mingliang; Xuan, Yuming; Xu, Xinwen; Fu, Xiaolan

    2017-01-01

    Which is more detectable, the change of a consistent or an inconsistent object in a scene? This question has been debated for decades. We noted that the change of objects in scenes might simultaneously be accompanied by gist changes. In the present study we aimed to examine how the alteration of gist, as well as the consistency of the changed objects, modulated change detection. In Experiment 1, we manipulated the semantic content by either keeping or changing the consistency of the scene. Results showed that the changes of consistent and inconsistent scenes were equally detected. More importantly, the changes were more accurately detected when scene consistency changed than when the consistency remained unchanged, regardless of the consistency of the memory scenes. A phase-scrambled version of stimuli was adopted in Experiment 2 to decouple the possible confounding effect of low-level factors. The results of Experiment 2 demonstrated that the effect found in Experiment 1 was indeed due to the change of high-level semantic consistency rather than the change of low-level physical features. Together, the study suggests that the change of consistency plays an important role in scene short-term memory, which might be attributed to the sensitivity to the change of semantic content.

  15. Fast generation of video holograms of three-dimensional moving objects using a motion compensation-based novel look-up table.

    PubMed

    Kim, Seung-Cheol; Dong, Xiao-Bin; Kwon, Min-Woo; Kim, Eun-Soo

    2013-05-06

    A novel approach for fast generation of video holograms of three-dimensional (3-D) moving objects using a motion compensation-based novel-look-up-table (MC-N-LUT) method is proposed. Motion compensation has been widely employed in compression of conventional 2-D video data because of its ability to exploit the high temporal correlation between successive video frames. Here, this concept of motion compensation is applied for the first time to the N-LUT, based on its inherent property of shift-invariance. That is, motion vectors of the 3-D moving objects are extracted between two consecutive video frames, and with them the motions of the 3-D objects at each frame are compensated. Through this process, the 3-D object data that must be calculated for the video holograms are massively reduced, which results in a dramatic increase in the computational speed of the proposed method. Experimental results with three kinds of 3-D video scenarios reveal that the average number of calculated object points was reduced to 86.95% and 86.53%, and the average calculation time per object point to 34.99% and 32.30%, of those of the conventional N-LUT and the temporal redundancy-based N-LUT (TR-N-LUT) methods, respectively.
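
    The motion-compensation step presupposes motion vectors between consecutive frames. A generic exhaustive block-matching sketch of that step is given below (an illustration, not the authors' implementation; block size and search range are arbitrary).

      import numpy as np

      def motion_vector(prev, curr, top, left, block=16, search=8):
          """Best (dy, dx) displacement of one block from prev to curr."""
          ref = prev[top:top + block, left:left + block].astype(int)
          best, best_mv = np.inf, (0, 0)
          for dy in range(-search, search + 1):
              for dx in range(-search, search + 1):
                  y, x = top + dy, left + dx
                  if (y < 0 or x < 0 or y + block > curr.shape[0]
                          or x + block > curr.shape[1]):
                      continue                    # candidate block out of frame
                  sad = np.abs(curr[y:y + block, x:x + block].astype(int)
                               - ref).sum()       # sum of absolute differences
                  if sad < best:
                      best, best_mv = sad, (dy, dx)
          return best_mv

      rng = np.random.default_rng(2)
      prev = rng.integers(0, 255, (64, 64))
      curr = np.roll(prev, shift=(3, -2), axis=(0, 1))  # frame shifted by (3, -2)
      print(motion_vector(prev, curr, 16, 16))          # -> (3, -2)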

  16. Object-based spatial attention when objects have sufficient depth cues.

    PubMed

    Takeya, Ryuji; Kasai, Tetsuko

    2015-01-01

    Attention directed to a part of an object tends to obligatorily spread over all of the spatial regions that belong to the object, which may be critical for rapid object-recognition in cluttered visual scenes. Previous studies have generally used simple rectangles as objects and have shown that attention spreading is reflected by amplitude modulation in the posterior N1 component (150-200 ms poststimulus) of event-related potentials, while other interpretations (i.e., rectangular holes) may arise implicitly in early visual processing stages. By using modified Kanizsa-type stimuli that provided less ambiguity of depth ordering, the present study examined early event-related potential spatial-attention effects for connected and separated objects, both of which were perceived in front of (Experiment 1) and in back of (Experiment 2) the surroundings. Typical P1 (100-140 ms) and N1 (150-220 ms) attention effects of ERP in response to unilateral probes were observed in both experiments. Importantly, the P1 attention effect was decreased for connected objects compared to separated objects only in Experiment 1, and the typical object-based modulations of N1 were not observed in either experiment. These results suggest that spatial attention spreads over a figural object at earlier stages of processing than previously indicated, in three-dimensional visual scenes with multiple depth cues.

  17. Laser based imaging of time depending microscopic scenes with strong light emission

    NASA Astrophysics Data System (ADS)

    Hahlweg, Cornelius; Wilhelm, Eugen; Rothe, Hendrik

    2011-10-01

    Investigating volume scatterometry methods based on short-range LIDAR devices for non-static objects, we achieved interesting results aside from the intended micro-LIDAR: the high-speed camera recording of the illuminated scene of an exploding wire (intended for Doppler LIDAR tests) delivered a very effective method of observing details of objects with extremely strong light emission. As a side effect, a schlieren movie is gathered without any special effort. The fact that microscopic features of short-time processes with high emission and material flow can be imaged without endangering valuable equipment makes this technique at least as interesting as the intended one. We therefore present these results, including the latest video and photo material, instead of a more theoretical paper on our progress concerning the primary goal.

  18. Key frame extraction based on spatiotemporal motion trajectory

    NASA Astrophysics Data System (ADS)

    Zhang, Yunzuo; Tao, Ran; Zhang, Feng

    2015-05-01

    A spatiotemporal motion trajectory can accurately reflect changes of motion state. Motivated by this observation, this letter proposes a method for key frame extraction based on the motion trajectory on the spatiotemporal slice. Different from the well-known motion-related methods, the proposed method utilizes the inflexions of the motion trajectories of all moving objects on the spatiotemporal slice. Experimental results show that, although a similar performance is achieved on single-object scenes, the proposed method outperforms the state-of-the-art methods based on motion energy or acceleration on multi-object video.
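
    The core idea reduces to locating inflexions of a 1-D trajectory. The sketch below (with a synthetic trajectory standing in for a slice track) marks frames where the trajectory's second difference changes sign as key-frame candidates.

      import numpy as np

      t = np.arange(100)
      # Synthetic slice trajectory: the object's position per frame.
      trajectory = np.sin(t / 8.0) * 20 + 0.05 * t

      second_diff = np.diff(trajectory, n=2)
      # Inflexions: consecutive second differences with opposite signs.
      sign_change = np.sign(second_diff[:-1]) * np.sign(second_diff[1:]) < 0
      key_frames = np.where(sign_change)[0] + 2   # offset from double differencing
      print("key frame candidates:", key_frames)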

  19. Infants' Individuation of Rigid and Plastic Objects Based on Shape

    ERIC Educational Resources Information Center

    Schaub, Simone; Bertin, Evelyn; Cacchione, Trix

    2013-01-01

    Recent research suggests that 12-month-old infants use shape to individuate the number of objects present in a scene. This study addressed the question of whether infants would also rely on shape when shape is only a temporary attribute of an object. Specifically, we investigated whether infants realize that shape changes reliably indicate…

  20. Creating photo-realistic works in a 3D scene using layers styles to create an animation

    NASA Astrophysics Data System (ADS)

    Avramescu, A. M.

    2015-11-01

    Creating realistic objects in a 3D scene is not easy work; the creation must be very detailed. Even without experience in photo-realistic work, the right techniques and a good reference photo allow an amazing amount of detail and realism to be created. This article presents some of these detailed methods, from which the techniques necessary to make beautiful, realistic objects in a scene can be learned. More precisely, we present how to create an animated 3D scene, mainly using the Pen Tool and Blending Options. The work is based on teaching simple ways of using Layer Styles to create convincing shadows, lights, and textures and a realistic sense of three dimensions. It also shows how the illumination and rendering options can be used to create a realistic effect in a scene, and how to create photo-realistic 3D models from a digital image. We propose to show how to use Illustrator paths, texturing, basic lighting and rendering, how to apply textures, and how to parent the building and object components. We also propose to use this approach to recreate smaller details or 3D objects from a 2D image. After a critical review of the state of the art, we present the architecture of a design method for creating an animation. The aim is a conceptual and methodological tutorial that addresses this issue both scientifically and in practice, including a model, proposed on a strong scientific basis, that enables a better understanding of the techniques necessary to create a realistic animation.

  1. Benefit from NASA

    NASA Image and Video Library

    1999-06-01

    Two scientists at NASA Marshall Space Flight Center, atmospheric scientist Paul Meyer (left) and solar physicist Dr. David Hathaway, have developed promising new software, called Video Image Stabilization and Registration (VISAR), that may help law enforcement agencies catch criminals by improving the quality of video recorded at crime scenes. VISAR stabilizes camera motion in the horizontal and vertical directions as well as rotation and zoom effects; produces clearer images of moving objects; smoothes jagged edges; enhances still images; and reduces video noise such as snow. VISAR could also have applications in medical and meteorological imaging. It could steady ultrasound images, which are infamous for their grainy, blurred quality, and would be especially useful for studying tornadoes, tracking whirling objects and helping to determine a tornado's wind speed. This image shows two scientists reviewing an enhanced video image of a license plate taken from a moving automobile.

  2. The what, where and how of auditory-object perception.

    PubMed

    Bizley, Jennifer K; Cohen, Yale E

    2013-10-01

    The fundamental perceptual unit in hearing is the 'auditory object'. Similar to visual objects, auditory objects are the computational result of the auditory system's capacity to detect, extract, segregate and group spectrotemporal regularities in the acoustic environment; the multitude of acoustic stimuli around us together form the auditory scene. However, unlike the visual scene, resolving the component objects within the auditory scene crucially depends on their temporal structure. Neural correlates of auditory objects are found throughout the auditory system. However, neural responses do not become correlated with a listener's perceptual reports until the level of the cortex. The roles of different neural structures and the contribution of different cognitive states to the perception of auditory objects are not yet fully understood.

  3. Dual processing of visual rotation for bipedal stance control.

    PubMed

    Day, Brian L; Muller, Timothy; Offord, Joanna; Di Giulio, Irene

    2016-10-01

    When standing, the gain of the body-movement response to a sinusoidally moving visual scene has been shown to get smaller with faster stimuli, possibly through changes in the apportioning of visual flow to self-motion or environment motion. We investigated whether visual-flow speed similarly influences the postural response to a discrete, unidirectional rotation of the visual scene in the frontal plane. Contrary to expectation, the evoked postural response consisted of two sequential components with opposite relationships to visual motion speed. With faster visual rotation the early component became smaller, not through a change in gain but by changes in its temporal structure, while the later component grew larger. We propose that the early component arises from the balance control system minimising apparent self-motion, while the later component stems from the postural system realigning the body with gravity. The source of visual motion is inherently ambiguous such that movement of objects in the environment can evoke self-motion illusions and postural adjustments. Theoretically, the brain can mitigate this problem by combining visual signals with other types of information. A Bayesian model that achieves this was previously proposed and predicts a decreasing gain of postural response with increasing visual motion speed. Here we test this prediction for discrete, unidirectional, full-field visual rotations in the frontal plane of standing subjects. The speed (0.75-48 deg/s) and direction of visual rotation were pseudo-randomly varied and mediolateral responses were measured from displacements of the trunk and horizontal ground reaction forces. The behaviour evoked by this visual rotation was more complex than has hitherto been reported, consisting broadly of two consecutive components with respective latencies of ∼190 ms and >0.7 s. Both components were sensitive to visual rotation speed, but with diametrically opposite relationships. Thus, the early component decreased with faster visual rotation, while the later component increased. Furthermore, the decrease in size of the early component was not achieved by a simple attenuation of gain, but by a change in its temporal structure. We conclude that the two components represent expressions of different motor functions, both pertinent to the control of bipedal stance. We propose that the early response stems from the balance control system attempting to minimise unintended body motion, while the later response arises from the postural control system attempting to align the body with gravity. © 2016 The Authors. The Journal of Physiology published by John Wiley & Sons Ltd on behalf of The Physiological Society.

  4. Use of Linear Perspective Scene Cues in a Simulated Height Regulation Task

    NASA Technical Reports Server (NTRS)

    Levison, W. H.; Warren, R.

    1984-01-01

    As part of a long-term effort to quantify the effects of visual scene cuing and non-visual motion cuing in flight simulators, an experimental study of the pilot's use of linear perspective cues in a simulated height-regulation task was conducted. Six test subjects performed a fixed-base tracking task with a visual display consisting of a simulated horizon and a perspective view of a straight, infinitely-long roadway of constant width. Experimental parameters were (1) the central angle formed by the roadway perspective and (2) the display gain. The subject controlled only the pitch/height axis; airspeed, bank angle, and lateral track were fixed in the simulation. The average RMS height error score for the least effective display configuration was about 25% greater than the score for the most effective configuration. Overall, larger and more highly significant effects were observed for the pitch and control scores. Model analysis was performed with the optimal control pilot model to characterize the pilot's use of visual scene cues, with the goal of obtaining a consistent set of independent model parameters to account for display effects.

  5. Contributions of low- and high-level properties to neural processing of visual scenes in the human brain.

    PubMed

    Groen, Iris I A; Silson, Edward H; Baker, Chris I

    2017-02-19

    Visual scene analysis in humans has been characterized by the presence of regions in extrastriate cortex that are selectively responsive to scenes compared with objects or faces. While these regions have often been interpreted as representing high-level properties of scenes (e.g. category), they also exhibit substantial sensitivity to low-level (e.g. spatial frequency) and mid-level (e.g. spatial layout) properties, and it is unclear how these disparate findings can be united in a single framework. In this opinion piece, we suggest that this problem can be resolved by questioning the utility of the classical low- to high-level framework of visual perception for scene processing, and discuss why low- and mid-level properties may be particularly diagnostic for the behavioural goals specific to scene perception as compared to object recognition. In particular, we highlight the contributions of low-level vision to scene representation by reviewing (i) retinotopic biases and receptive field properties of scene-selective regions and (ii) the temporal dynamics of scene perception that demonstrate overlap of low- and mid-level feature representations with those of scene category. We discuss the relevance of these findings for scene perception and suggest a more expansive framework for visual scene analysis.This article is part of the themed issue 'Auditory and visual scene analysis'. © 2017 The Author(s).

  7. Interaction between object-based attention and pertinence values shapes the attentional priority map of a multielement display.

    PubMed

    Gillebert, Celine R; Petersen, Anders; Van Meel, Chayenne; Müller, Tanja; McIntyre, Alexandra; Wagemans, Johan; Humphreys, Glyn W

    2016-06-01

    Previous studies have shown that the perceptual organization of the visual scene constrains the deployment of attention. Here we investigated how the organization of multiple elements into larger configurations alters their attentional weight, depending on the "pertinence" or behavioral importance of the elements' features. We assessed object-based effects on distinct aspects of the attentional priority map: top-down control, reflecting the tendency to encode targets rather than distracters, and the spatial distribution of attention weights across the visual scene, reflecting the tendency to report elements belonging to the same rather than different objects. In 2 experiments participants had to report the letters in briefly presented displays containing 8 letters and digits, in which pairs of characters could be connected with a line. Quantitative estimates of top-down control were obtained using Bundesen's Theory of Visual Attention (1990). The spatial distribution of attention weights was assessed using the "paired response index" (PRI), indicating responses for within-object pairs of letters. In Experiment 1, grouping along the task-relevant dimension (targets with targets and distracters with distracters) increased top-down control and enhanced the PRI; in contrast, task-irrelevant grouping (targets with distracters) did not affect performance. In Experiment 2, we disentangled the effect of target-target and distracter-distracter grouping: Pairwise grouping of distracters enhanced top-down control whereas pairwise grouping of targets changed the PRI. We conclude that object-based perceptual representations interact with pertinence values (of the elements' features and location) in the computation of attention weights, thereby creating a widespread pattern of attentional facilitation across the visual scene. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  8. A review of the findings and theories on surface size effects on visual attention

    PubMed Central

    Peschel, Anne O.; Orquin, Jacob L.

    2013-01-01

    That surface size has an impact on attention has been well-known in advertising research for almost a century; however, theoretical accounts of this effect have been sparse. To address this issue, we review studies on surface size effects on eye movements in this paper. While most studies find that large objects are more likely to be fixated, receive more fixations, and are fixated faster than small objects, a comprehensive explanation of this effect is still lacking. To bridge the theoretical gap, we relate the findings from this review to three theories of surface size effects suggested in the literature: a linear model based on the assumption of random fixations (Lohse, 1997), a theory of surface size as visual saliency (Pieters et al., 2007), and a theory based on competition for attention (CA; Janiszewski, 1998). We furthermore suggest a fourth model, demand for attention, which we derive from the theory of CA by revising the underlying model assumptions. In order to test the models against each other, we reanalyze data from an eye tracking study investigating surface size and saliency effects on attention. The reanalysis revealed little support for the first three theories while the demand for attention model showed a much better alignment with the data. We conclude that surface size effects may best be explained as an increase in object signal strength which depends on object size, number of objects in the visual scene, and object distance to the center of the scene. Our findings suggest that advertisers should take into account how objects in the visual scene interact in order to optimize attention to, for instance, brands and logos. PMID:24367343

  10. Stroboscopic Goggles for Reduction of Motion Sickness

    NASA Technical Reports Server (NTRS)

    Reschke, M. F.; Somers, Jeffrey T.

    2005-01-01

    A device built around a pair of electronic shutters has been demonstrated to be effective as a prototype of stroboscopic goggles or eyeglasses for preventing or reducing motion sickness. The momentary opening of the shutters helps to suppress a phenomenon that is known in the art as retinal slip and is described more fully below. While a number of different environmental factors can induce motion sickness, a common factor associated with every known motion environment is sensory confusion or sensory mismatch. Motion sickness is a product of misinformation arriving at a central point in the nervous system from the senses from which one determines one's spatial orientation. When information from the eyes, ears, joints, and pressure receptors is all in agreement as to one's orientation, there is no motion sickness. When one or more sensory input(s) to the brain is not expected, or conflicts with what is anticipated, the end product is motion sickness. Normally, an observer's eye moves, compensating for the anticipated effect of motion, in such a manner that the image of an object moving relatively to an observer is held stationary on the retina. In almost every known environment that induces motion sickness, a change in the gain (in the signal-processing sense) of the vestibular system causes the motion of the eye to fail to hold images stationary on the retina, and the resulting motion of the images is termed retinal slip. The present concept of stroboscopic goggles or eyeglasses (see figure) is based on the proposition that prevention of retinal slip, and hence, the prevention of sensory mismatch, can be expected to reduce the tendency toward motion sickness. A device according to this concept helps to prevent retinal slip by providing snapshots of the visual environment through electronic shutters that are brief enough that each snapshot freezes the image on each retina. The exposure time for each snapshot is less than 5 ms. In the event that a higher rate of strobing is necessary for adequate viewing of the changing scene during rapid head movements, the rate of strobing (but not the exposure time) can be controlled in response to the readings of rate-of-rotation sensors attached to the device.

  11. Acceptable bit-rates for human face identification from CCTV imagery

    NASA Astrophysics Data System (ADS)

    Tsifouti, Anastasia; Triantaphillidou, Sophie; Bilissi, Efthimia; Larabi, Mohamed-Chaker

    2013-01-01

    The objective of this investigation is to produce recommendations for acceptable bit-rates of CCTV footage of people onboard London buses. The majority of CCTV recorders on buses use a proprietary format based on the H.264/AVC video coding standard, exploiting both spatial and temporal redundancy. Low bit-rates are favored in the CCTV industry, but they compromise the usefulness of the recorded imagery. In this context, usefulness is defined by the presence of enough facial information remaining in the compressed image to allow a specialist to identify a person. The investigation includes four steps: 1) collection of representative video footage; 2) grouping of video scenes based on content attributes; 3) psychophysical investigations to identify key scenes, which are most affected by compression; 4) testing of recording systems using the key scenes and further psychophysical investigations. The results are highly dependent upon scene content. For example, very dark and very bright scenes were the most challenging to compress, requiring higher bit-rates to maintain useful information. The acceptable bit-rates were also found to be dependent upon the specific CCTV system used to compress the footage, presenting challenges in drawing conclusions about universal "average" bit-rates.

  12. A dual-waveband dynamic IR scene projector based on DMD

    NASA Astrophysics Data System (ADS)

    Hu, Yu; Zheng, Ya-wei; Gao, Jiao-bo; Sun, Ke-feng; Li, Jun-na; Zhang, Lei; Zhang, Fang

    2016-10-01

    An infrared scene simulation system can simulate manifold objects and backgrounds to perform dynamic tests and evaluate EO detection systems in hardware-in-the-loop testing. The basic structure of a dual-waveband dynamic IR scene projector is introduced in this paper. The system's core device is an IR Digital Micro-mirror Device (DMD), and the radiant source is a compact high-temperature IR plane blackbody. An IR collimation optical system whose transmission range covers 3-5 μm and 8-12 μm is designed as the projection optical system. Scene simulation software was developed with Visual C++ and Vega software tools, and a software flow chart is presented. The parameters and testing results of the system are given; the system was applied with satisfying performance in IR imaging simulation testing.

  13. Televised Whorf: Cognitive Restructuring in Advanced Foreign Language Learners as a Function of Audiovisual Media Exposure

    ERIC Educational Resources Information Center

    Bylund, Emanuel; Athanasopoulos, Panos

    2015-01-01

    The encoding of goal-oriented motion events varies across different languages. Speakers of languages without grammatical aspect (e.g., Swedish) tend to mention motion endpoints when describing events (e.g., "two nuns walk to a house") and attach importance to event endpoints when matching scenes from memory. Speakers of aspect languages…

  14. Attention in natural scenes: Affective-motivational factors guide gaze independently of visual salience.

    PubMed

    Schomaker, Judith; Walper, Daniel; Wittmann, Bianca C; Einhäuser, Wolfgang

    2017-04-01

    In addition to low-level stimulus characteristics and current goals, our previous experience with stimuli can also guide attentional deployment. It remains unclear, however, whether such effects act independently or whether they interact in guiding attention. In the current study, we presented natural scenes including everyday objects that differed in affective-motivational impact. In the first, free-viewing experiment, we presented visually matched triads of scenes in which one critical object was replaced; the object varied mainly in terms of motivational value, but also in terms of valence and arousal, as confirmed by ratings from a large set of observers. Treating motivation as a categorical factor, we found that it affected gaze. A linear-effect model showed that arousal, valence, and motivation predicted fixations above and beyond visual characteristics such as object size, eccentricity, and visual salience. In a second experiment, we investigated experimentally whether the effects of emotion and motivation could be modulated by visual salience. In a medium-salience condition, we presented the same unmodified scenes as in the first experiment. In a high-salience condition, we retained the saturation of the critical object in the scene and decreased the saturation of the background; in a low-salience condition, we desaturated the critical object while retaining the original saturation of the background. We found that highly salient objects guided gaze, but we still found additional additive effects of arousal, valence, and motivation, confirming that higher-level factors can also guide attention, as measured by fixations towards objects in natural scenes.
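
    The regression reported above can be pictured as an ordinary linear model with one row per critical object. The following Python sketch shows the general shape of such an analysis; the file name and column names are assumptions, and the authors' actual model specification may differ.

        import pandas as pd
        import statsmodels.api as sm

        df = pd.read_csv("object_fixations.csv")      # hypothetical per-object table

        X = sm.add_constant(df[["salience", "size", "eccentricity",   # low-level
                                "arousal", "valence", "motivation"]]) # higher-level
        y = df["fixation_count"]

        fit = sm.OLS(y, X).fit()
        print(fit.summary())   # do affect/motivation predict beyond visual factors?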

  15. Mapping Eroded Areas on Mountain Grassland with Terrestrial Photogrammetry and Object-Based Image Analysis

    NASA Astrophysics Data System (ADS)

    Mayr, Andreas; Rutzinger, Martin; Bremer, Magnus; Geitner, Clemens

    2016-06-01

    In the Alps, as well as in other mountain regions, steep grassland is frequently affected by shallow erosion. Often, small landslides or snow movements displace the vegetation together with soil and/or unconsolidated material, resulting in bare-earth patches within the grass-covered slope. Close-range and remote sensing techniques are promising for both mapping and monitoring these eroded areas; this is essential for a better understanding of geomorphological processes, for assessing past and recent developments, and for planning mitigation measures. Recent developments in image matching techniques make it feasible to produce high-resolution orthophotos and digital elevation models from terrestrial oblique images. In this paper, we propose to delineate the boundary of eroded areas for selected scenes of a study area using close-range photogrammetric data. Striving for an efficient, objective, and reproducible workflow for this task, we developed an approach for automated classification of the scenes into the classes grass and eroded. We propose an object-based image analysis (OBIA) workflow that consists of image segmentation and automated threshold selection for classification using the Excess Green Vegetation Index (ExG). The automated workflow is tested with ten different scenes. Compared to a manual classification, grass and eroded areas are classified with an overall accuracy between 90.7% and 95.5%, depending on the scene. The method proved to be insensitive to differences in the illumination of the scenes and in the greenness of the grass. The proposed workflow reduces user interaction and is transferable to other study areas. We conclude that close-range photogrammetry is a valuable low-cost tool for mapping this type of eroded area in the field with a high level of detail and quality. In the future, the output will be used as ground truth for area-wide mapping of eroded areas in coarser-resolution aerial orthophotos acquired at the same time.
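
    The classification step pairs a simple vegetation index with an automatically chosen threshold. Below is a minimal Python sketch, assuming segmentation has already produced per-pixel segment labels and using Otsu's method for the automated threshold selection (the paper's exact threshold-selection procedure may differ).

        import numpy as np
        from skimage.filters import threshold_otsu

        def excess_green(rgb):
            """ExG = 2g - r - b on chromaticity-normalised bands of an (H, W, 3) image."""
            s = rgb.sum(axis=2, keepdims=True) + 1e-8   # avoid division by zero
            r, g, b = np.moveaxis(rgb / s, 2, 0)
            return 2 * g - r - b

        def classify_segments(rgb, segment_ids):
            """Mean ExG per segment, thresholded automatically: True = grass."""
            exg = excess_green(rgb)
            means = np.array([exg[segment_ids == i].mean()
                              for i in np.unique(segment_ids)])
            return means > threshold_otsu(means)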

  16. Object Interpolation in Three Dimensions

    ERIC Educational Resources Information Center

    Kellman, Philip J.; Garrigan, Patrick; Shipley, Thomas F.

    2005-01-01

    Perception of objects in ordinary scenes requires interpolation processes connecting visible areas across spatial gaps. Most research has focused on 2-D displays, and models have been based on 2-D, orientation-sensitive units. The authors present a view of interpolation processes as intrinsically 3-D and producing representations of contours and…

  17. Pixel-By-Pixel Estimation of Scene Motion in Video

    NASA Astrophysics Data System (ADS)

    Tashlinskii, A. G.; Smirnov, P. V.; Tsaryov, M. G.

    2017-05-01

    The paper considers the effectiveness of motion estimation in video using pixel-by-pixel recurrent algorithms. The algorithms use stochastic gradient descent to find the inter-frame shifts of all pixels of a frame; these shifts form a shift-vector field. As the estimated parameters of the vectors, the paper studies their projections and their polar parameters. It considers two methods for estimating the shift-vector field. The first method uses a stochastic gradient descent algorithm to sequentially process all nodes of the image row by row. It processes each row bidirectionally, i.e., from left to right and from right to left; subsequent joint processing of the two results compensates for the inertia of the recursive estimation. The second method uses the correlation between rows to increase processing efficiency. It processes the rows one after another, changing direction after each row, and uses the obtained values to form the resulting estimate. The paper studies two criteria for forming this estimate: the minimum of the gradient estimate and the maximum of the correlation coefficient. The paper gives examples of experimental results of pixel-by-pixel estimation for a video with a moving object, and of the estimation of a moving object's trajectory using the shift-vector field.
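
    The core of the estimator can be caricatured in a few lines: for each pixel of a row, a handful of stochastic-gradient steps refine the inter-frame shift, and the running estimate is carried to the next pixel, which is the source of the inertia that the bidirectional pass compensates for. The Python sketch below is a toy version under that reading; the step size and iteration count are assumptions.

        import numpy as np
        from scipy.ndimage import map_coordinates

        def row_shifts(f0, f1, y, mu=0.05, iters=5):
            """Left-to-right pass: per-pixel (dy, dx) shifts for row y of frame f0."""
            gy, gx = np.gradient(f1.astype(float))     # image gradients of frame f1
            shift = np.zeros(2)                        # recurrent initial estimate
            out = np.zeros((f0.shape[1], 2))
            for x in range(f0.shape[1]):
                for _ in range(iters):
                    coords = [[y + shift[0]], [x + shift[1]]]
                    err = map_coordinates(f1, coords, order=1)[0] - f0[y, x]
                    grad = np.array([map_coordinates(gy, coords, order=1)[0],
                                     map_coordinates(gx, coords, order=1)[0]])
                    shift -= mu * err * grad           # stochastic-gradient step
                out[x] = shift                         # carried to the next pixel
            return out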

  18. Impact of age-related macular degeneration on object searches in realistic panoramic scenes.

    PubMed

    Thibaut, Miguel; Tran, Thi-Ha-Chau; Szaffarczyk, Sebastien; Boucart, Muriel

    2018-05-01

    This study investigated whether realistic immersive conditions, with dynamic indoor scenes presented on a large hemispheric panoramic screen covering 180° of the visual field, improved the visual search abilities of participants with age-related macular degeneration (AMD). Twenty-one participants with AMD, 16 age-matched controls, and 16 young observers were included. Realistic indoor scenes were presented on a panoramic five-metre-diameter screen. Twelve different objects were used as targets. The participants were asked to search for a target object, shown on paper before each trial, within a room composed of various objects. A joystick was used for navigation within the scene views. A target object was present in 24 trials and absent in 24 trials. The percentage of correct detections of the target, the percentage of false alarms (that is, detections of the target when it was absent), the number of scene views explored, and the search time were measured. The search time was slower for participants with AMD than for the age-matched controls, who in turn were slower than the young participants. The participants with AMD were able to accomplish the task with a performance of 75 per cent correct detections. This was slightly lower than that of the older controls (79.2 per cent), while the young controls were at ceiling (91.7 per cent). Errors were mainly due to false alarms resulting from confusion, in the target-absent trials, between the target object and another object present in the scene. The outcomes of the present study indicate that, under realistic conditions, although slower than age-matched, normally sighted controls, participants with AMD were able to accomplish visual searches for objects with high accuracy.

  19. High Accuracy Monocular SFM and Scale Correction for Autonomous Driving.

    PubMed

    Song, Shiyu; Chandraker, Manmohan; Guest, Clark C

    2016-04-01

    We present a real-time monocular visual odometry system that achieves high accuracy in real-world autonomous driving applications. First, we demonstrate robust monocular SFM that exploits multithreading to handle driving scenes with large motions and rapidly changing imagery. To correct for scale drift, we use the known height of the camera above the ground plane. Our second contribution is a novel data-driven mechanism for cue combination that allows highly accurate ground plane estimation by adapting the observation covariances of multiple cues, such as sparse feature matching and dense inter-frame stereo, based on their relative confidences inferred from visual data on a per-frame basis. Finally, we demonstrate extensive benchmark performance and comparisons on the challenging KITTI dataset, achieving accuracy comparable to stereo and exceeding prior monocular systems. Our SFM system is optimized to output pose within 50 ms in the worst case, while average-case operation is over 30 fps. Our framework also significantly boosts the accuracy of applications such as object localization that rely on the ground plane.
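
    The scale correction itself reduces to one ratio: monocular SFM recovers translation only up to scale, so the known mounting height of the camera divided by the estimated height of the ground plane fixes the metric scale. A minimal illustration follows; variable names are assumed, and the paper's cue-combination machinery is omitted.

        CAMERA_HEIGHT_M = 1.7   # known height above the road (assumed value)

        def correct_scale(t_est, ground_height_est):
            """Rescale an up-to-scale translation (NumPy array) into metres."""
            return (CAMERA_HEIGHT_M / ground_height_est) * t_est

        # E.g., if SFM places the ground plane 0.85 units below the camera,
        # all translations are multiplied by 1.7 / 0.85 = 2.0.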

  20. Gamifying Video Object Segmentation.

    PubMed

    Spampinato, Concetto; Palazzo, Simone; Giordano, Daniela

    2017-10-01

    Video object segmentation can be considered one of the most challenging computer vision problems. Indeed, so far, no existing solution is able to deal effectively with the peculiarities of real-world videos, especially in cases of articulated motion and object occlusions; these limitations appear even more evident when the performance of automated methods is compared with that of humans. However, manually segmenting objects in videos is largely impractical, as it requires a great deal of time and concentration. To address this problem, in this paper we propose an interactive video object segmentation method that exploits, on the one hand, the capability of humans to correctly identify objects in visual scenes, and on the other hand, collective human brainpower to solve challenging, large-scale tasks. In particular, our method relies on a game with a purpose to collect human inputs on object locations, followed by an accurate segmentation phase achieved by optimizing an energy function that encodes spatial and temporal constraints between object regions as well as the human-provided location priors. Performance analysis carried out on complex video benchmarks, exploiting data provided by over 60 users, demonstrated that our method achieves a better trade-off between annotation time and segmentation accuracy than interactive video annotation and automated video object segmentation approaches.
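
    The energy function has the classic shape of a labelling energy: a unary term pulling each region toward the human-provided location prior, plus pairwise penalties for spatial and temporal label disagreement. The Python sketch below is an assumed simplification in that spirit, not the paper's actual formulation.

        import numpy as np

        def energy(labels, prior, spatial_edges, temporal_edges, lam_s=1.0, lam_t=1.0):
            """labels: 0/1 per region; prior: P(object) per region from game inputs."""
            unary = np.sum(labels * -np.log(prior + 1e-8) +
                           (1 - labels) * -np.log(1 - prior + 1e-8))
            spatial = sum(labels[i] != labels[j] for i, j in spatial_edges)
            temporal = sum(labels[i] != labels[j] for i, j in temporal_edges)
            return unary + lam_s * spatial + lam_t * temporal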
