Bag of Lines (BoL) for Improved Aerial Scene Representation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sridharan, Harini; Cheriyadat, Anil M.
2014-09-22
Feature representation is a key step in automated visual content interpretation. In this letter, we present a robust feature representation technique, referred to as bag of lines (BoL), for high-resolution aerial scenes. The proposed technique involves extracting and compactly representing low-level line primitives from the scene. The compact scene representation is generated by counting the different types of lines representing various linear structures in the scene. Through extensive experiments, we show that the proposed scene representation is invariant to scale changes and scene conditions and can discriminate urban scene categories accurately. We compare the BoL representation with the popular scale-invariant feature transform (SIFT) and Gabor wavelets for their classification and clustering performance on an aerial scene database consisting of images acquired by sensors with different spatial resolutions. The proposed BoL representation outperforms the SIFT- and Gabor-based representations.
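The counting step described above can be illustrated with a minimal Python sketch. It assumes line segments have already been extracted by some line detector (not shown) and defines a hypothetical line "type" as an orientation bin combined with a length bin. The letter's actual line taxonomy and its scale handling are not given here; fixed pixel-length bins like these are not, by themselves, scale-invariant.

```python
import math
from collections import Counter


# Hypothetical line "types": orientation bin x length bin (not the letter's exact taxonomy).
def line_type(segment, n_orient=4, length_edges=(10.0, 30.0)):
    (x0, y0), (x1, y1) = segment
    # Undirected orientation folded into [0, pi).
    angle = math.atan2(y1 - y0, x1 - x0) % math.pi
    o_bin = int(angle / (math.pi / n_orient)) % n_orient
    length = math.hypot(x1 - x0, y1 - y0)
    # Number of length thresholds the segment meets or exceeds.
    l_bin = sum(length >= edge for edge in length_edges)
    return o_bin, l_bin


def bag_of_lines(segments, n_orient=4, length_edges=(10.0, 30.0)):
    """Normalized histogram of line types for one scene."""
    n_len = len(length_edges) + 1
    counts = Counter(line_type(s, n_orient, length_edges) for s in segments)
    total = sum(counts.values()) or 1  # avoid division by zero for an empty scene
    return [counts[(o, l)] / total for o in range(n_orient) for l in range(n_len)]
```

Histograms of this kind could then be passed to a classifier or clustering algorithm, which is the setting in which BoL is compared with SIFT and Gabor features.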
Multilayered nonuniform sampling for three-dimensional scene representation
NASA Astrophysics Data System (ADS)
Lin, Huei-Yung; Xiao, Yu-Hua; Chen, Bo-Ren
2015-09-01
The representation of a three-dimensional (3-D) scene is essential in multiview imaging technologies. We present a unified geometry and texture representation based on global resampling of the scene. A layered data map representation with a distance-dependent nonuniform sampling strategy is proposed. It is capable of increasing the details of the 3-D structure locally and is compact in size. The 3-D point cloud obtained from the multilayered data map is used for view rendering. For any given viewpoint, image synthesis with different levels of detail is carried out using the quadtree-based nonuniformly sampled 3-D data points. Experimental results are presented using the 3-D models of reconstructed real objects.
NASA Astrophysics Data System (ADS)
Yoon, Jayoung; Kim, Gerard J.
2003-04-01
Traditionally, three-dimensional (3D) models have been used to build virtual worlds, and a data structure called the "scene graph" is often employed to organize these 3D objects in the virtual space. Image-based rendering, on the other hand, has recently been suggested as an alternative VR platform because of its photorealism; however, due to its limited interactivity, it has been used only for simple navigation systems. To combine the merits of these two approaches to object/scene representation, this paper proposes a scene graph structure in which both 3D models and various image-based scenes/objects can be defined, traversed, and rendered together. In fact, as suggested by Shade et al., these different representations can serve as different levels of detail (LODs) for a given object. For instance, an object might be rendered using a 3D model at close range, a billboard at an intermediate range, and as part of an environment map at far range. The ultimate objective of this mixed platform is to breathe more interactivity into image-based rendered virtual environments (VEs) by also employing 3D models. There are several technical challenges in devising such a platform: designing scene graph nodes for various types of image-based techniques, establishing criteria for LOD/representation selection, handling the transitions between representations, implementing appropriate interaction schemes, and correctly rendering the overall scene. Currently, we have extended the scene graph structure of Sense8's WorldToolKit to accommodate new node types for environment maps, billboards, moving textures and sprites, the "Tour-into-the-Picture" structure, and view-interpolated objects. To choose the right LOD, the usual viewing-distance and image-space criteria are used; however, the switch between the image and the 3D model occurs at the distance from the user at which the user starts to perceive the object's internal depth. Also, during interaction, a 3D representation is used regardless of the viewing distance, if one exists. Before rendering, objects are conservatively culled from the view frustum using the representation with the largest volume. Finally, we carried out experiments to verify the theoretical derivation of the switching rule and obtained positive results.
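The representation-selection rule described above can be sketched as a small decision function. This is a minimal illustration, not the WorldToolKit extension itself: the class `MixedObject`, its two distance thresholds, and the returned labels are hypothetical, and the frustum-culling step is not modelled.

```python
from dataclasses import dataclass


@dataclass
class MixedObject:
    has_3d_model: bool
    depth_perception_distance: float  # hypothetical: below this, internal depth is perceptible
    billboard_max_distance: float     # hypothetical: beyond this, fold into the environment map


def select_representation(obj: MixedObject, distance: float, interacting: bool) -> str:
    # During interaction, use the 3D model regardless of distance, if one exists.
    if interacting and obj.has_3d_model:
        return "3d_model"
    # Close enough that the user perceives the object's internal depth.
    if obj.has_3d_model and distance < obj.depth_perception_distance:
        return "3d_model"
    # Intermediate range: image-based billboard.
    if distance < obj.billboard_max_distance:
        return "billboard"
    # Far range: render as part of the environment map.
    return "environment_map"
```

An object without a 3D model simply falls through to the image-based representations at every distance.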
Learning object-to-class kernels for scene classification.
Zhang, Lei; Zhen, Xiantong; Shao, Ling
2014-08-01
High-level image representations have drawn increasing attention in visual recognition, e.g., scene classification, since the invention of the object bank. The object bank represents an image as a response map of a large number of pretrained object detectors and has achieved superior performance for visual recognition. In this paper, based on the object bank representation, we propose object-to-class (O2C) distances to model scene images. In particular, four variants of O2C distances are presented, and with the O2C distances, we can represent images using the object bank in lower-dimensional but more discriminative spaces, called distance spaces, which are spanned by the O2C distances. Because the O2C distances are computed explicitly from the object bank, the resulting representations can carry more semantic meaning. To combine the discriminative ability of the O2C distances to all scene classes, we further propose to kernelize the distance representation for the final classification. We have conducted extensive experiments on four benchmark data sets, UIUC-Sports, Scene-15, MIT Indoor, and Caltech-101, which demonstrate that the proposed approaches significantly improve on the original object bank approach and achieve state-of-the-art performance.
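To make the idea of an object-to-class distance concrete, here is a minimal Python sketch of one plausible variant, assumed for illustration: for each object detector, take the nearest-neighbour distance from the test image's response to the training responses of a class, then sum over objects to obtain one coordinate per class. This is not necessarily one of the paper's four variants, and the kernelization step is not shown.

```python
def o2c_distances(x, class_samples):
    """Per-object distances from image x to each class (hypothetical variant).

    x: list of K object-detector responses for one image.
    class_samples: dict mapping class name -> list of training response vectors (length K).
    Returns dict class -> list of K nearest-neighbour distances (one per object).
    """
    return {
        cls: [min(abs(xk - s[k]) for s in samples) for k, xk in enumerate(x)]
        for cls, samples in class_samples.items()
    }


def distance_space_vector(x, class_samples):
    """Collapse per-object distances into one value per class (sorted by class name)."""
    d = o2c_distances(x, class_samples)
    return [sum(d[cls]) for cls in sorted(d)]
```

The resulting vector has one dimension per class rather than one per detector, which illustrates how a distance space can be lower-dimensional than the raw object bank response.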
The Relationship Between Online Visual Representation of a Scene and Long-Term Scene Memory
ERIC Educational Resources Information Center
Hollingworth, Andrew
2005-01-01
In 3 experiments the author investigated the relationship between the online visual representation of natural scenes and long-term visual memory. In a change detection task, a target object either changed or remained the same from an initial image of a natural scene to a test image. Two types of changes were possible: rotation in depth, or…
Physics Based Modeling and Rendering of Vegetation in the Thermal Infrared
NASA Technical Reports Server (NTRS)
Smith, J. A.; Ballard, J. R., Jr.
1999-01-01
We outline a procedure for rendering physically based thermal infrared images of simple vegetation scenes. Our approach incorporates the biophysical processes that affect the temperature distribution of the elements within a scene. Computer graphics plays a key role in two respects: first, in computing the distribution of shaded and sunlit facets in the scene, and second, in the final image rendering once the temperatures of all the elements in the scene have been computed. We illustrate our approach for a simple corn scene in which the three-dimensional geometry is constructed from measured morphological attributes of the row crop. Statistical methods are used to construct a representation of the scene that agrees with the measured characteristics. Our results are quite good: the rendered images exhibit realistic behavior in directional properties as a function of view and sun angle, and the root-mean-square error between measured and predicted brightness temperatures for the scene was 2.1 °C.
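The accuracy figure quoted above is a root-mean-square error, RMSE = sqrt((1/N) * sum_i (T_measured,i - T_predicted,i)^2). A minimal Python version is sketched below; the values in any usage are illustrative, not the study's data.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between paired measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

Because errors are squared before averaging, a few large temperature mismatches weigh more heavily than many small ones.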
Blind image deblurring based on trained dictionary and curvelet using sparse representation
NASA Astrophysics Data System (ADS)
Feng, Liang; Huang, Qian; Xu, Tingfa; Li, Shao
2015-04-01
Motion blur is one of the most significant and common artifacts degrading image quality in digital photography, and it can result from many factors. During imaging, if objects in the scene move quickly or the camera moves during the exposure interval, the image of the scene blurs along the direction of relative motion between the camera and the scene, e.g. through camera shake or atmospheric turbulence. Recently, sparse representation models, which are an effective way to describe natural images, have been widely used in signal and image processing. In this article, a new deblurring approach based on sparse representation is proposed. An overcomplete dictionary learned from training image samples via the K-SVD algorithm is designed to represent the latent image. The motion-blur kernel can be treated as a piecewise smooth function in the image domain whose support is approximately a thin smooth curve, so we employ curvelets to represent the blur kernel. Both the overcomplete dictionary and the curvelet system yield highly sparse representations, which improves robustness to noise and produces results that better satisfy the observer's visual expectations. Using these two priors, we construct a restoration model for blurred images and solve the resulting optimization problem with an alternating minimization technique. Experimental results show that the method preserves the texture of the original images and effectively suppresses ringing artifacts.
Cichy, Radoslaw Martin; Khosla, Aditya; Pantazis, Dimitrios; Oliva, Aude
2017-01-01
Human scene recognition is a rapid multistep process evolving over time from single scene image to spatial layout processing. We used multivariate pattern analyses on magnetoencephalography (MEG) data to unravel the time course of this cortical process. Following an early signal for lower-level visual analysis of single scenes at ~100 ms, we found a marker of real-world scene size, i.e. spatial layout processing, at ~250 ms indexing neural representations robust to changes in unrelated scene properties and viewing conditions. For a quantitative model of how scene size representations may arise in the brain, we compared MEG data to a deep neural network model trained on scene classification. Representations of scene size emerged intrinsically in the model, and resolved emerging neural scene size representation. Together our data provide a first description of an electrophysiological signal for layout processing in humans, and suggest that deep neural networks are a promising framework to investigate how spatial layout representations emerge in the human brain. PMID:27039703
3D Reasoning from Blocks to Stability.
Zhaoyin Jia; Gallagher, Andrew C; Saxena, Ashutosh; Chen, Tsuhan
2015-05-01
Objects occupy physical space and obey physical laws. To truly understand a scene, we must reason about the space that objects in it occupy and how each object is stably supported by the others. In other words, we seek to understand which objects would, if moved, cause other objects to fall. This 3D volumetric reasoning is important for many scene understanding tasks, ranging from segmentation of objects to perception of rich, physically well-founded 3D interpretations of the scene. In this paper, we propose a new algorithm to parse a single RGB-D image with 3D block units while jointly reasoning about the segments, volumes, supporting relationships, and object stability. Our algorithm is based on the intuition that a good 3D representation of the scene is one that fits the depth data well and is a stable, self-supporting arrangement of objects (i.e., one that does not topple). We design an energy function for representing the quality of the block representation based on these properties. Our algorithm fits 3D blocks to the depth values corresponding to image segments and iteratively optimizes the energy function. Our proposed algorithm is the first to consider stability of objects in complex arrangements for reasoning about the underlying structure of the scene. Experimental results show that our stability-reasoning framework improves RGB-D segmentation and scene volumetric representation.
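The notion of a "stable, self-supporting arrangement" can be illustrated with a classic geometric test. The sketch below is a simplified 2D stand-in, not the paper's energy function: it assumes a single vertical stack of unit-height, uniform-density blocks given as horizontal intervals (each with positive width), and ignores friction and side-by-side supports.

```python
def stack_is_stable(blocks):
    """Check a 2D vertical stack of blocks for static stability.

    blocks: list of (x_min, x_max) intervals from bottom to top; every block has
    unit height and uniform density, so its mass is proportional to its width.
    For each contact between block i and block i + 1, the combined centre of mass
    of all blocks above that contact must lie within the contact interval.
    """
    for i in range(len(blocks) - 1):
        above = blocks[i + 1:]
        masses = [hi - lo for lo, hi in above]
        centres = [(lo + hi) / 2 for lo, hi in above]
        com = sum(m * c for m, c in zip(masses, centres)) / sum(masses)
        contact_lo = max(blocks[i][0], blocks[i + 1][0])
        contact_hi = min(blocks[i][1], blocks[i + 1][1])
        # No overlap means no support; otherwise the load must sit over the contact.
        if contact_lo > contact_hi or not (contact_lo <= com <= contact_hi):
            return False
    return True
```

In an energy-based formulation, a check like this would more likely appear as a soft penalty than as a hard boolean test.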
Martin Cichy, Radoslaw; Khosla, Aditya; Pantazis, Dimitrios; Oliva, Aude
2017-06-01
Human scene recognition is a rapid multistep process evolving over time from single scene image to spatial layout processing. We used multivariate pattern analyses on magnetoencephalography (MEG) data to unravel the time course of this cortical process. Following an early signal for lower-level visual analysis of single scenes at ~100ms, we found a marker of real-world scene size, i.e. spatial layout processing, at ~250ms indexing neural representations robust to changes in unrelated scene properties and viewing conditions. For a quantitative model of how scene size representations may arise in the brain, we compared MEG data to a deep neural network model trained on scene classification. Representations of scene size emerged intrinsically in the model, and resolved emerging neural scene size representation. Together our data provide a first description of an electrophysiological signal for layout processing in humans, and suggest that deep neural networks are a promising framework to investigate how spatial layout representations emerge in the human brain. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Interactive distributed hardware-accelerated LOD-sprite terrain rendering with stable frame rates
NASA Astrophysics Data System (ADS)
Swan, J. E., II; Arango, Jesus; Nakshatrala, Bala K.
2002-03-01
A stable frame rate is important for interactive rendering systems. Image-based modeling and rendering (IBMR) techniques, which model parts of the scene with image sprites, are promising for interactive systems because they allow the sprite to be manipulated instead of the underlying scene geometry. However, a frequent problem with IBMR techniques is an unstable frame rate, because generating an image sprite (with 3D rendering) is time-consuming relative to manipulating the sprite (with 2D image resampling). This paper describes one solution to this problem: distributing an IBMR technique across a collection of cooperating threads and executable programs running on two computers. The particular IBMR technique distributed here is the LOD-Sprite algorithm. This technique uses a multiple level-of-detail (LOD) scene representation. It first renders a keyframe from a high-LOD representation and then caches the frame as an image sprite. It renders subsequent spriteframes by texture-mapping the cached image sprite onto a lower-LOD representation. We describe a distributed architecture and implementation of LOD-Sprite, in the context of terrain rendering, that takes advantage of graphics hardware. We present timing results indicating that we have achieved a stable frame rate. In addition to LOD-Sprite, our distribution method holds promise for other IBMR techniques.
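The keyframe/spriteframe cycle described above can be sketched as a small cache. This is a single-threaded illustration, not the distributed implementation. The re-render criterion (a view-angle change exceeding `max_angle_deg`) is a hypothetical stand-in because the abstract does not say when a new keyframe is triggered. The texture-mapping onto low-LOD geometry is only indicated by a comment.

```python
class SpriteCache:
    """Keyframe/spriteframe scheduling in the spirit of LOD-Sprite.

    render_keyframe: callable doing the expensive high-LOD 3D render for a view.
    max_angle_deg: hypothetical re-render threshold on view-angle change.
    """

    def __init__(self, render_keyframe, max_angle_deg=5.0):
        self.render_keyframe = render_keyframe
        self.max_angle_deg = max_angle_deg
        self.key_angle = None
        self.sprite = None

    def frame(self, view_angle_deg):
        stale = (self.key_angle is None
                 or abs(view_angle_deg - self.key_angle) > self.max_angle_deg)
        if stale:
            # Expensive path: render from the high-LOD representation and cache it.
            self.sprite = self.render_keyframe(view_angle_deg)
            self.key_angle = view_angle_deg
            return "keyframe", self.sprite
        # Cheap path: reuse the cached sprite (it would be texture-mapped onto
        # low-LOD geometry here; not shown).
        return "spriteframe", self.sprite
```

The frame-rate problem the paper addresses is visible here: every "keyframe" call blocks on the expensive render, which is what moving that work to separate threads or machines would hide.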
A knowledge-based machine vision system for space station automation
NASA Technical Reports Server (NTRS)
Chipman, Laure J.; Ranganath, H. S.
1989-01-01
A simple knowledge-based approach to the recognition of objects in man-made scenes is being developed. Specifically, the system under development is a proposed enhancement to a robot arm for use in the space station laboratory module. The system will take a request from a user to find a specific object, and locate that object by using its camera input and information from a knowledge base describing the scene layout and attributes of the object types included in the scene. In order to use realistic test images in developing the system, researchers are using photographs of actual NASA simulator panels, which provide similar types of scenes to those expected in the space station environment. Figure 1 shows one of these photographs. In traditional approaches to image analysis, the image is transformed step by step into a symbolic representation of the scene. Often the first steps of the transformation are done without any reference to knowledge of the scene or objects. Segmentation of an image into regions generally produces a counterintuitive result in which regions do not correspond to objects in the image. After segmentation, a merging procedure attempts to group regions into meaningful units that will more nearly correspond to objects. Here, researchers avoid segmenting the image as a whole, and instead use a knowledge-directed approach to locate objects in the scene. The knowledge-based approach to scene analysis is described and the categories of knowledge used in the system are discussed.
Pasqualotto, Achille; Esenkaya, Tayfun
2016-01-01
Visual-to-auditory sensory substitution is used to convey visual information through audition, and it was initially created to compensate for blindness; it consists of software converting the visual images captured by a video camera into equivalent auditory images, or "soundscapes". Here, it was used by blindfolded sighted participants to learn the spatial positions of simple shapes depicted in images arranged on the floor. Very few studies have used sensory substitution to investigate spatial representation, whereas it has been widely used to investigate object recognition. Additionally, with sensory substitution we could study the performance of participants actively exploring the environment through audition, rather than passively localizing sound sources. Blindfolded participants egocentrically learnt the positions of six images by using sensory substitution, and then a judgment of relative direction (JRD) task was used to determine how this scene was represented. This task consists of imagining being in a given location, oriented in a given direction, and pointing towards the required image. Before performing the JRD task, participants explored a map that provided allocentric information about the scene. Although spatial exploration was egocentric, surprisingly we found that performance in the JRD task was better for allocentric perspectives. This suggests that the egocentric representation of the scene was updated. This result is in line with previous studies using visual and somatosensory scenes, thus supporting the notion that different sensory modalities produce equivalent spatial representation(s). Moreover, our results have practical implications for improving training methods with sensory substitution devices (SSD).
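The JRD task described above has a well-defined geometric answer: imagine standing at one location, facing a second, and pointing to a third. A minimal Python sketch of that geometry follows. Scoring by absolute angular error is an assumption here, since the abstract does not state how pointing performance was measured.

```python
import math


def jrd_correct_angle(stand, facing, target):
    """Correct pointing direction for a JRD trial, in degrees.

    Imagine standing at `stand`, facing `facing`, and pointing at `target`
    (all (x, y) positions). Returns the signed angle from the facing direction
    to the target, counter-clockwise positive, in [-180, 180).
    """
    heading = math.atan2(facing[1] - stand[1], facing[0] - stand[0])
    bearing = math.atan2(target[1] - stand[1], target[0] - stand[0])
    diff = math.degrees(bearing - heading)
    return (diff + 180.0) % 360.0 - 180.0


def absolute_angular_error(response_deg, correct_deg):
    """Smallest absolute difference between two directions, in [0, 180]."""
    d = abs(response_deg - correct_deg) % 360.0
    return min(d, 360.0 - d)
```

For example, standing at the origin and facing north (0, 1), a target due east (1, 0) lies 90 degrees clockwise, i.e. at -90 in this convention.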
Teng, Santani
2017-01-01
In natural environments, visual and auditory stimulation elicit responses across a large set of brain regions in a fraction of a second, yielding representations of the multimodal scene and its properties. The rapid and complex neural dynamics underlying visual and auditory information processing pose major challenges to human cognitive neuroscience. Brain signals measured non-invasively are inherently noisy, the format of neural representations is unknown, and transformations between representations are complex and often nonlinear. Further, no single non-invasive brain measurement technique provides a spatio-temporally integrated view. In this opinion piece, we argue that progress can be made by a concerted effort based on three pillars of recent methodological development: (i) sensitive analysis techniques such as decoding and cross-classification, (ii) complex computational modelling using models such as deep neural networks, and (iii) integration across imaging methods (magnetoencephalography/electroencephalography, functional magnetic resonance imaging) and models, e.g. using representational similarity analysis. We showcase two recent efforts that have been undertaken in this spirit and provide novel results about visual and auditory scene analysis. Finally, we discuss the limits of this perspective and sketch a concrete roadmap for future research. This article is part of the themed issue ‘Auditory and visual scene analysis’. PMID:28044019
Cichy, Radoslaw Martin; Teng, Santani
2017-02-19
In natural environments, visual and auditory stimulation elicit responses across a large set of brain regions in a fraction of a second, yielding representations of the multimodal scene and its properties. The rapid and complex neural dynamics underlying visual and auditory information processing pose major challenges to human cognitive neuroscience. Brain signals measured non-invasively are inherently noisy, the format of neural representations is unknown, and transformations between representations are complex and often nonlinear. Further, no single non-invasive brain measurement technique provides a spatio-temporally integrated view. In this opinion piece, we argue that progress can be made by a concerted effort based on three pillars of recent methodological development: (i) sensitive analysis techniques such as decoding and cross-classification, (ii) complex computational modelling using models such as deep neural networks, and (iii) integration across imaging methods (magnetoencephalography/electroencephalography, functional magnetic resonance imaging) and models, e.g. using representational similarity analysis. We showcase two recent efforts that have been undertaken in this spirit and provide novel results about visual and auditory scene analysis. Finally, we discuss the limits of this perspective and sketch a concrete roadmap for future research. This article is part of the themed issue 'Auditory and visual scene analysis'. © 2017 The Authors.
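Representational similarity analysis (RSA), mentioned in pillar (iii), compares two representations of the same set of conditions by building a representational dissimilarity matrix (RDM) for each and then correlating the RDMs. The minimal pure-Python sketch below assumes 1 minus Pearson correlation as the dissimilarity and Spearman rank correlation for comparing RDMs. These are common choices, not necessarily those used in the showcased studies. For time-resolved MEG, the comparison would be repeated at each time point.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def rdm(patterns):
    """Upper triangle of an RDM: 1 - Pearson r for every pair of condition patterns."""
    return [1.0 - pearson(patterns[i], patterns[j])
            for i, j in combinations(range(len(patterns)), 2)]


def ranks(values):
    """1-based ranks, with ties assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def rsa(patterns_a, patterns_b):
    """Spearman correlation between the RDMs of two representations of the same conditions."""
    return pearson(ranks(rdm(patterns_a)), ranks(rdm(patterns_b)))
```

Here `patterns_a` might hold MEG sensor patterns and `patterns_b` the activations of one network layer, one row per stimulus condition. Because only the RDMs are compared, the two representations need not have the same dimensionality.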
Robust algebraic image enhancement for intelligent control systems
NASA Technical Reports Server (NTRS)
Lerner, Bao-Ting; Morrelli, Michael
1993-01-01
Robust vision capability for intelligent control systems has been an elusive goal in image processing. The computationally intensive techniques necessary for conventional image processing make real-time applications, such as object tracking and collision avoidance, difficult. In order to endow an intelligent control system with the needed vision robustness, an adequate image enhancement subsystem capable of compensating for the wide variety of real-world degradations must exist between the image capturing and the object recognition subsystems. This enhancement stage must be adaptive and must operate consistently in the presence of both statistical and shape-based noise. To deal with this problem, we have developed an innovative algebraic approach that provides a sound mathematical framework for image representation and manipulation. Our image model provides a natural platform from which to pursue dynamic scene analysis, and its incorporation into a vision system would serve as the front end to an intelligent control system. We have developed a unique polynomial representation of gray-level imagery and applied this representation to develop polynomial operators on complex gray-level scenes. This approach is highly advantageous because polynomials can be manipulated very easily and are readily understood, thus providing a very convenient environment for image processing. Our model presents a highly structured and compact algebraic representation of gray-level images, which can be viewed as fuzzy sets.
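The abstract does not specify its polynomial form, so the sketch below swaps in a common, related technique: a facet-model-style least-squares fit of the bilinear polynomial g(x, y) ≈ a + b·x + c·y + d·x·y to a patch of gray levels. A simple "polynomial operator" (the x-derivative) can then be evaluated directly from the coefficients. Both the polynomial degree and the operator are illustrative assumptions.

```python
def _solve(matrix, vector):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(vector)
    a = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        p[r] = (a[r][n] - sum(a[r][c] * p[c] for c in range(r + 1, n))) / a[r][r]
    return p


def fit_bilinear(patch):
    """Least-squares fit of g(x, y) ~ a + b*x + c*y + d*x*y to a 2D list of gray levels."""
    basis, values = [], []
    for y, row in enumerate(patch):
        for x, g in enumerate(row):
            basis.append([1.0, float(x), float(y), float(x * y)])
            values.append(float(g))
    # Normal equations: (B^T B) p = B^T g
    btb = [[sum(b[i] * b[j] for b in basis) for j in range(4)] for i in range(4)]
    btg = [sum(b[i] * g for b, g in zip(basis, values)) for i in range(4)]
    return _solve(btb, btg)


def x_derivative(coeffs, y):
    """Polynomial operator example: d/dx of the fitted surface at row y is b + d*y."""
    _, b, _, d = coeffs
    return b + d * y
```

The patch needs at least a 2 x 2 grid for the four coefficients to be determined. Once an image region is summarized by coefficients, operators such as derivatives reduce to algebra on those coefficients, which reflects the convenience the abstract attributes to polynomials.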
Correlated Topic Vector for Scene Classification.
Wei, Pengxu; Qin, Fei; Wan, Fang; Zhu, Yi; Jiao, Jianbin; Ye, Qixiang
2017-07-01
Scene images usually involve semantic correlations, particularly in large-scale image data sets. This paper proposes a novel generative image representation, the correlated topic vector, to model such semantic correlations. Derived from the correlated topic model, the correlated topic vector is intended to exploit the correlations among topics, which are seldom considered in conventional feature encodings such as the Fisher vector but do exist in scene images. Including these correlations is expected to increase the discriminative capability of the learned generative model and consequently improve recognition accuracy. Combined with the Fisher kernel method, the correlated topic vector inherits the advantages of the Fisher vector. The contributions of visual words to the topics are further exploited within the Fisher kernel framework to indicate the differences among scenes. Combined with deep convolutional neural network (CNN) features and a Gibbs sampling solution, the correlated topic vector shows great potential for processing large-scale and complex scene image data sets. Experiments on two scene image data sets demonstrate that the correlated topic vector significantly improves on deep CNN features and outperforms existing Fisher kernel-based features.
Pasqualotto, Achille; Esenkaya, Tayfun
2016-01-01
Visual-to-auditory sensory substitution is used to convey visual information through audition, and it was initially created to compensate for blindness; it consists of software converting the visual images captured by a video camera into equivalent auditory images, or “soundscapes”. Here, it was used by blindfolded sighted participants to learn the spatial positions of simple shapes depicted in images arranged on the floor. Very few studies have used sensory substitution to investigate spatial representation, whereas it has been widely used to investigate object recognition. Additionally, with sensory substitution we could study the performance of participants actively exploring the environment through audition, rather than passively localizing sound sources. Blindfolded participants egocentrically learnt the positions of six images by using sensory substitution, and then a judgment of relative direction (JRD) task was used to determine how this scene was represented. This task consists of imagining being in a given location, oriented in a given direction, and pointing towards the required image. Before performing the JRD task, participants explored a map that provided allocentric information about the scene. Although spatial exploration was egocentric, surprisingly we found that performance in the JRD task was better for allocentric perspectives. This suggests that the egocentric representation of the scene was updated. This result is in line with previous studies using visual and somatosensory scenes, thus supporting the notion that different sensory modalities produce equivalent spatial representation(s). Moreover, our results have practical implications for improving training methods with sensory substitution devices (SSD). PMID:27148000
Three-dimensional model-based object recognition and segmentation in cluttered scenes.
Mian, Ajmal S; Bennamoun, Mohammed; Owens, Robyn
2006-10-01
Viewpoint-independent recognition of free-form objects and their segmentation in the presence of clutter and occlusions is a challenging task. We present a novel 3D model-based algorithm that performs this task automatically and efficiently. A 3D model of an object is automatically constructed offline from its multiple unordered range images (views). These views are converted into multidimensional table representations (which we refer to as tensors). Correspondences are automatically established between these views by simultaneously matching the tensors of a view with those of the remaining views using a hash table-based voting scheme. This results in a graph of relative transformations used to register the views before they are integrated into a seamless 3D model. These models and their tensor representations constitute the model library. During online recognition, a tensor from the scene is simultaneously matched with those in the library by casting votes. Similarity measures are calculated for the model tensors that receive the most votes. The model with the highest similarity is transformed to the scene and, if it aligns accurately with an object in the scene, that object is declared recognized and is segmented. This process is repeated until the scene is completely segmented. Experiments were performed on real and synthetic data comprising 55 models and 610 scenes, and an overall recognition rate of 95 percent was achieved. Comparison with spin images showed that our algorithm is superior in terms of recognition rate and efficiency.
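The hash table-based voting step can be sketched as follows. This is a minimal illustration under a simplifying assumption: each tensor is treated as a set of occupied, quantized cells (any hashable keys). The paper's actual tensor layout and hash indexing may differ, and the subsequent similarity computation and alignment check are not shown. The model names in any usage are placeholders.

```python
from collections import Counter, defaultdict


def build_hash_table(library):
    """Index library tensors by the cells they occupy.

    library: dict mapping (model_id, tensor_id) -> iterable of occupied cells,
    where a cell is any hashable key (e.g. a quantized (i, j, k) grid index).
    """
    table = defaultdict(list)
    for key, cells in library.items():
        for cell in cells:
            table[cell].append(key)
    return table


def cast_votes(table, scene_cells, top_k=3):
    """Each occupied cell of the scene tensor votes for every library tensor sharing it."""
    votes = Counter()
    for cell in scene_cells:
        votes.update(table.get(cell, ()))
    return votes.most_common(top_k)
```

Only the top-voted candidates would then go on to the more expensive similarity measure, which is where the efficiency of a voting scheme comes from.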
Agricultural mapping using Support Vector Machine-Based Endmember Extraction (SVM-BEE)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Archibald, Richard K; Filippi, Anthony M; Bhaduri, Budhendra L
Extracting endmembers from remotely sensed images of vegetated areas can present difficulties. In this research, we applied a recently developed endmember-extraction algorithm based on Support Vector Machines (SVMs) to the problem of semi-autonomous estimation of vegetation endmembers from a hyperspectral image. This algorithm, referred to as Support Vector Machine-Based Endmember Extraction (SVM-BEE), accurately and rapidly yields a computed representation of hyperspectral data that can accommodate multiple distributions. The number of distributions is identified without prior knowledge, based upon this representation. Prior work established that SVM-BEE is robustly noise-tolerant and can semi-automatically and effectively estimate endmembers; synthetic data and a geologic scene were previously analyzed. Here we compared the efficacies of the SVM-BEE and N-FINDR algorithms in extracting endmembers from a predominantly agricultural scene. SVM-BEE was able to estimate vegetation and other endmembers for all classes in the image, which N-FINDR failed to do. Classifications based on SVM-BEE endmembers were markedly more accurate compared with those based on N-FINDR endmembers.
Predicting perceptual quality of images in realistic scenario using deep filter banks
NASA Astrophysics Data System (ADS)
Zhang, Weixia; Yan, Jia; Hu, Shiyong; Ma, Yang; Deng, Dexiang
2018-03-01
Classical models for assessing perceptual image quality usually rely on natural scene statistics, which assume that certain statistical regularities hold for undistorted images and are disrupted by introduced distortions. However, these models usually fail to accurately predict the degradation severity of images in realistic scenarios, because such images typically contain multiple, complex, and interacting authentic distortions. We propose a quality prediction model based on a convolutional neural network. Quality-aware features extracted from the filter banks of multiple convolutional layers are aggregated into the image representation. Furthermore, an easy-to-implement and effective feature selection strategy is used to further refine the image representation, and finally a linear support vector regression model is trained to map the image representation to subjective perceptual quality scores. Experimental results on benchmark databases demonstrate the effectiveness and generalizability of the proposed model.
Interactive 2D to 3D stereoscopic image synthesis
NASA Astrophysics Data System (ADS)
Feldman, Mark H.; Lipton, Lenny
2005-03-01
Advances in stereoscopic display technologies, graphics cards, and digital imaging algorithms have opened up new possibilities for synthesizing stereoscopic images. The power of today's DirectX/OpenGL-optimized graphics cards, together with new and creative imaging tools found in software products such as Adobe Photoshop, provides a powerful environment for converting planar drawings and photographs into stereoscopic images. The basis for such a creative process is the focus of this paper. This article presents a novel technique that uses advanced imaging features and custom Windows-based software built on the DirectX 9 API to provide the user with an interactive stereo image synthesizer. By creating an accurate, interactive world scene with movable, flexible textured surfaces altered by depth maps, and perspective stereoscopic cameras with both visible frustums and zero-parallax planes, a user can precisely model a virtual three-dimensional representation of a real-world scene. Current versions of Adobe Photoshop provide a creative user with a rich assortment of tools needed to highlight elements of a 2D image, simulate hidden areas, and creatively shape them for a 3D scene representation. The technique described has been implemented as a Photoshop plug-in and thus allows a seamless transition of these 2D image elements into 3D surfaces, which are subsequently rendered to create stereoscopic views.
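The zero-parallax plane mentioned above follows from standard stereo-camera geometry. For two parallel cameras separated by an interaxial distance e, with their images shifted so that points at distance C have zero parallax, a point at depth Z has parallax p = e·(1 - C/Z) at the scale of that plane. The sketch below encodes only this textbook relation; it is not code from the described plug-in, and the function and parameter names are hypothetical.

```python
def screen_parallax(depth, interaxial, convergence):
    """Horizontal parallax for a point at `depth` (same length units throughout).

    Assumes two parallel cameras separated by `interaxial`, with their images
    shifted so that the zero-parallax plane lies at distance `convergence`.
    The result is expressed at the scale of the zero-parallax plane:
    0 at that plane, positive (behind the screen) beyond it, negative in front,
    approaching `interaxial` as depth goes to infinity.
    """
    if depth <= 0 or convergence <= 0:
        raise ValueError("depth and convergence must be positive")
    return interaxial * (1.0 - convergence / depth)
```

Applied per pixel to a depth map, this relation gives the horizontal offset between the left and right views. Moving the zero-parallax plane changes which parts of the scene appear in front of or behind the screen.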
Chromatic information and feature detection in fast visual analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Del Viva, Maria M.; Punzi, Giovanni; Shevell, Steven K.; ...
2016-08-01
The visual system is able to recognize a scene based on a sketch made of very simple features. This ability is likely crucial for survival, when fast image recognition is necessary, and it is believed that a primal sketch is extracted very early in visual processing. Such highly simplified representations can be sufficient for accurate object discrimination, but an open question is the role played by color in this process. Rich color information is available in natural scenes, yet artists' sketches are usually monochromatic, and black-and-white movies provide compelling representations of real-world scenes. Also, contrast sensitivity for color is low at fine spatial scales. We approach the question from the perspective of optimal information processing by a system endowed with limited computational resources. We show that when such limitations are taken into account, the intrinsic statistical properties of natural scenes imply that the most effective strategy is to ignore fine-scale color features and devote most of the bandwidth to gray-scale information. We find confirmation of these information-based predictions in psychophysics measurements of fast-viewing discrimination of natural scenes. As a result, we conclude that the lack of colored features in our visual representation, and our overall low sensitivity to high-frequency color components, are a consequence of an adaptation process that optimizes the size and power consumption of our brain for the visual world we live in.
A Wavelet Polarization Decomposition Net Model for Polarimetric SAR Image Classification
NASA Astrophysics Data System (ADS)
He, Chu; Ou, Dan; Yang, Teng; Wu, Kun; Liao, Mingsheng; Chen, Erxue
2014-11-01
In this paper, a deep model based on wavelet texture is proposed for Polarimetric Synthetic Aperture Radar (PolSAR) image classification, inspired by the recent success of deep learning methods. The model is intended to learn powerful and informative representations that improve generalization in complex scene classification tasks. Given the influence of speckle noise in PolSAR images, wavelet polarization decomposition is applied first to obtain basic, discriminative texture features. These features are then embedded into a Deep Neural Network (DNN) to compose multi-layer, higher-level representations. We demonstrate that the model produces a powerful representation that captures information from PolSAR images that is otherwise difficult to extract. The model also achieves promising results compared with traditional SAR image classification methods on the SAR image dataset.
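Wavelet subbands are a common source of texture features. As a hedged illustration only, the sketch below computes one level of a 2D Haar transform on a single intensity channel and summarizes each detail subband by its mean energy. The paper's wavelet polarization decomposition, which operates on polarimetric data, and its DNN are not reproduced here.

```python
def haar2d(img):
    """One level of the orthonormal 2D Haar transform (both image sides must be even).

    Returns (approx, dx, dy, dxy): dx responds to changes along x (vertical edges),
    dy to changes along y (horizontal edges), dxy to diagonal structure.
    """
    approx, dx, dy, dxy = [], [], [], []
    for i in range(0, len(img), 2):
        row_a, row_x, row_y, row_xy = [], [], [], []
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            row_a.append((a + b + c + d) / 2)
            row_x.append((a - b + c - d) / 2)
            row_y.append((a + b - c - d) / 2)
            row_xy.append((a - b - c + d) / 2)
        approx.append(row_a)
        dx.append(row_x)
        dy.append(row_y)
        dxy.append(row_xy)
    return approx, dx, dy, dxy


def subband_energies(img):
    """Mean squared coefficient of each detail subband: a simple texture descriptor."""
    _, dx, dy, dxy = haar2d(img)

    def mean_square(band):
        values = [v for row in band for v in row]
        return sum(v * v for v in values) / len(values)

    return mean_square(dx), mean_square(dy), mean_square(dxy)
```

Vertical stripes such as `[[0, 1, 0, 1], [0, 1, 0, 1]]` put all detail energy into `dx`. Horizontal stripes put it into `dy`.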
REKRIATE: A Knowledge Representation System for Object Recognition and Scene Interpretation
NASA Astrophysics Data System (ADS)
Meystel, Alexander M.; Bhasin, Sanjay; Chen, X.
1990-02-01
What humans actually observe and how they comprehend this information is complex because of Gestalt processes and the interaction of context in predicting the course of thinking, enforcing one idea while repressing another. How we extract knowledge from the scene, what we actually get from the scene, and what we bring from our own mechanisms of perception are areas separated by a thin, ill-defined line. The purpose of this paper is to present a system for Representing Knowledge and Recognizing and Interpreting Attention Trailed Entities, dubbed REKRIATE. It will be used as a tool for discovering the underlying principles of the knowledge representation required for conceptual learning. REKRIATE has some inherited knowledge and is given a vocabulary that is used to form rules for identifying objects. It has various sensing modalities and can measure the distance between objects in the image as well as the similarity between different images of presumably the same object. All sensations received from the matrix of different sensors are put into an adequate form. The proposed methodology is applicable not only to pictorial or visual world representation but to any sensing modality. It is based on two premises: a) the inseparability of all domains of world representation, including the linguistic domain as well as those formed by various sensor modalities; and b) the representativity of the object at several levels of resolution simultaneously.
NASA Astrophysics Data System (ADS)
Wu, Wei; Zhao, Dewei; Zhang, Huan
2015-12-01
Super-resolution image reconstruction is an effective method for improving image quality and has important research significance in the field of image processing. However, the choice of dictionary directly affects the efficiency of image reconstruction. Here, sparse representation theory is introduced into the nearest-neighbor selection problem. Building on sparse-representation-based super-resolution reconstruction, a super-resolution image reconstruction algorithm based on a multi-class dictionary is analyzed. This method avoids the redundancy of training only a single overcomplete dictionary and makes each sub-dictionary more representative. It also replaces the traditional Euclidean distance computation to improve the quality of the whole reconstructed image. In addition, non-local self-similarity regularization is introduced to address the ill-posed problem. Experimental results show that the algorithm achieves much better results than state-of-the-art algorithms in terms of both PSNR and visual perception.
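One reading of "sparse representation for nearest-neighbor selection" over a multi-class dictionary is the following: code a patch with each class sub-dictionary, then pick the class with the smallest reconstruction residual instead of the smallest Euclidean distance. The sketch below assumes that reading and uses plain (non-orthogonal) matching pursuit over unit-norm atoms, with toy 3-D atoms. It is not the paper's algorithm, and it omits the non-local self-similarity regularization.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, atoms, n_iter):
    """Greedy matching pursuit over unit-norm atoms; returns (coefficients, residual)."""
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Pick the atom most correlated with what is still unexplained.
        k = max(range(len(atoms)), key=lambda i: abs(dot(residual, atoms[i])))
        c = dot(residual, atoms[k])
        coeffs[k] += c
        residual = [r - c * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


def select_subdictionary(patch, subdictionaries, n_iter=2):
    """Index of the class whose sub-dictionary codes the patch with the least residual energy."""
    def residual_energy(atoms):
        _, res = matching_pursuit(patch, atoms, n_iter)
        return dot(res, res)

    return min(range(len(subdictionaries)), key=lambda k: residual_energy(subdictionaries[k]))
```

Selecting by coding residual rewards the sub-dictionary that can actually explain the patch with a few atoms. This can differ from the class whose centroid happens to be closest.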
[Visual representation of natural scenes in flicker changes].
Nakashima, Ryoichi; Yokosawa, Kazuhiko
2010-08-01
Coherence theory in scene perception (Rensink, 2002) assumes the retention of volatile object representations on which attention is not focused. On the other hand, visual memory theory in scene perception (Hollingworth & Henderson, 2002) assumes that robust object representations are retained. In this study, we hypothesized that the difference between these two theories stems from differences in the experimental tasks on which they are based. To verify this hypothesis, we examined the properties of visual representations using a change detection and memory task in a flicker paradigm. We measured the representations formed when participants were instructed to search for a change in a scene and compared them with intentional memory representations. The visual representations were retained in visual long-term memory even in the flicker paradigm and were as robust as the intentional memory representations. However, the results indicate that these representations cannot be used to explicitly localize a scene change but can be used to answer the recognition test. This suggests that coherence theory and visual memory theory are compatible.
Using eye movements to explore mental representations of space.
Fourtassi, Maryam; Rode, Gilles; Pisella, Laure
2017-06-01
Visual mental imagery is a cognitive experience characterised by the activation of the mental representation of an object or scene in the absence of the corresponding stimulus. According to the analogical theory, mental representations have a pictorial nature that preserves the spatial characteristics of the environment that is mentally represented. This cognitive experience shares many similarities with the experience of visual perception, including eye movements. The mental visualisation of a scene is accompanied by eye movements that reflect the spatial content of the mental image, and which can mirror the deformations of this mental image with respect to the real image, such as asymmetries or size reduction. The present article offers a concise overview of the main theories explaining the interactions between eye movements and mental representations, with some examples of the studies supporting them. It also aims to explain how eye tracking could be a useful tool in exploring the dynamics of spatial mental representations, especially in pathological situations where these representations can be altered, for instance in unilateral spatial neglect. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
Statistical regularities of art images and natural scenes: spectra, sparseness and nonlinearities.
Graham, Daniel J; Field, David J
2007-01-01
Paintings are the product of a process that begins with ordinary vision in the natural world and ends with manipulation of pigments on canvas. Because artists must produce images that can be seen by a visual system that is thought to take advantage of statistical regularities in natural scenes, artists are likely to replicate many of these regularities in their painted art. We have tested this notion by computing basic statistical properties and modeled cell response properties for a large set of digitized paintings and natural scenes. We find that both representational and non-representational (abstract) paintings from our sample (124 images) show basic similarities to a sample of natural scenes in terms of their spatial frequency amplitude spectra, but the paintings and natural scenes show significantly different mean amplitude spectrum slopes. We also find that the intensity distributions of paintings show a lower skewness and sparseness than natural scenes. We account for this by considering the range of luminances found in the environment compared to the range available in the medium of paint. A painting's range is limited by the reflective properties of its materials. We argue that artists do not simply scale the intensity range down but use a compressive nonlinearity. In our studies, modeled retinal and cortical filter responses to the images were less sparse for the paintings than for the natural scenes. But when a compressive nonlinearity was applied to the images, both the paintings' sparseness and the modeled responses to the paintings showed the same or greater sparseness compared to the natural scenes. This suggests that artists achieve some degree of nonlinear compression in their paintings. Because paintings have captivated humans for millennia, finding basic statistical regularities in paintings' spatial structure could grant insights into the range of spatial patterns that humans find compelling.
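The statistics compared above can be computed directly from pixel intensities. The sketch below computes skewness and kurtosis. Kurtosis is used here as a common sparseness proxy, which is an assumption rather than the paper's exact measure. The sketch then applies a square-root power law as one illustrative compressive nonlinearity. The sample intensities are made up.

```python
def _moments(xs):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return n, mean, var


def skewness(xs):
    """Third standardised moment; positive for a long bright tail."""
    n, mean, var = _moments(xs)
    return sum((x - mean) ** 3 for x in xs) / n / var ** 1.5


def kurtosis(xs):
    """Non-excess kurtosis; larger values indicate a heavier-tailed (sparser) distribution."""
    n, mean, var = _moments(xs)
    return sum((x - mean) ** 4 for x in xs) / n / var ** 2


def compress(xs, exponent=0.5):
    """Illustrative compressive power-law nonlinearity for non-negative intensities."""
    return [x ** exponent for x in xs]
```

For a right-skewed sample such as `[1, 1, 1, 2, 2, 3, 10, 50]`, `skewness(compress(sample))` is smaller than `skewness(sample)`. This is the direction of change that a compressive mapping of a wide luminance range into a narrow one would produce.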
Cortical Representations of Speech in a Multitalker Auditory Scene.
Puvvada, Krishna C; Simon, Jonathan Z
2017-09-20
The ability to parse a complex auditory scene into perceptual objects is facilitated by a hierarchical auditory system. Successive stages in the hierarchy transform an auditory scene of multiple overlapping sources, from peripheral tonotopically based representations in the auditory nerve, into perceptually distinct auditory-object-based representations in the auditory cortex. Here, using magnetoencephalography recordings from men and women, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in distinct hierarchical stages of the auditory cortex. Using systems-theoretic methods of stimulus reconstruction, we show that the primary-like areas in the auditory cortex contain dominantly spectrotemporal-based representations of the entire auditory scene. Here, both attended and ignored speech streams are represented with almost equal fidelity, and a global representation of the full auditory scene with all its streams is a better candidate neural representation than that of individual streams being represented separately. We also show that higher-order auditory cortical areas, by contrast, represent the attended stream separately and with significantly higher fidelity than unattended streams. Furthermore, the unattended background streams are more faithfully represented as a single unsegregated background object rather than as separated objects. Together, these findings demonstrate the progression of the representations and processing of a complex acoustic scene up through the hierarchy of the human auditory cortex.
SIGNIFICANCE STATEMENT Using magnetoencephalography recordings from human listeners in a simulated cocktail party environment, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in separate hierarchical stages of the auditory cortex.
We show that the primary-like areas in the auditory cortex use a dominantly spectrotemporal-based representation of the entire auditory scene, with both attended and unattended speech streams represented with almost equal fidelity. We also show that higher-order auditory cortical areas, by contrast, represent an attended speech stream separately from, and with significantly higher fidelity than, unattended speech streams. Furthermore, the unattended background streams are represented as a single undivided background object rather than as distinct background objects. Copyright © 2017 the authors 0270-6474/17/379189-08$15.00/0.
Multi-Sensor Scene Synthesis and Analysis
1981-09-01
Quad Trees for Image Representation and Processing
2.6.2 Databases
2.6.2.1 Definitions and ... Basic Concepts
2.6.3 Use of Databases in Hierarchical Scene Analysis
2.6.4 Use of Relational Tables ... Multisensor Image Database Systems (MIDAS)
2.7.2 Relational Database System for Pictures
2.7.3 Relational Pictorial Database
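The outline above mentions quad trees for image representation. The sketch below is a minimal region quadtree. It assumes a square image whose side is a power of two and splits a block into four quadrants until each block is uniform. It illustrates the general data structure and is not code from the report.

```python
def build_quadtree(img, x=0, y=0, size=None):
    """Uniform blocks become leaves (their pixel value); other blocks become
    4-tuples of children in the order (NW, NE, SW, SE)."""
    if size is None:
        size = len(img)
    first = img[y][x]
    if all(img[y + i][x + j] == first for i in range(size) for j in range(size)):
        return first  # a 1x1 block is always uniform, so recursion terminates
    half = size // 2
    return (
        build_quadtree(img, x, y, half),
        build_quadtree(img, x + half, y, half),
        build_quadtree(img, x, y + half, half),
        build_quadtree(img, x + half, y + half, half),
    )


def count_leaves(node):
    """Number of uniform blocks needed to represent the image."""
    if isinstance(node, tuple):
        return sum(count_leaves(child) for child in node)
    return 1
```

A 4x4 binary image with a single set pixel in its top-left corner needs 7 leaves rather than 16 pixels. Large uniform regions are what make the representation compact.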
Infrared small target detection in heavy sky scene clutter based on sparse representation
NASA Astrophysics Data System (ADS)
Liu, Depeng; Li, Zhengzhou; Liu, Bing; Chen, Wenhao; Liu, Tianmei; Cao, Lei
2017-09-01
A novel infrared small target detection method based on the sparse representation of sky clutter and targets is proposed in this paper to cope with the uncertainty in representing clutter and targets. The sky scene background clutter is described by a fractal random field, and it is perceived and eliminated via sparse representation on a fractal background over-complete dictionary (FBOD). The infrared small target signal is simulated by a generalized Gaussian intensity model and expressed by a generalized Gaussian target over-complete dictionary (GGTOD), which can describe small targets more efficiently than traditional structured dictionaries. The infrared image is decomposed on the union of FBOD and GGTOD. The sparse representation energies of the target signal and the background clutter on GGTOD differ so distinctly that this energy is used to distinguish the target from clutter. Experimental results show that the proposed approach improves small target detection performance, especially under heavy clutter. This is because background clutter can be efficiently perceived and suppressed by FBOD, and the changing target can be represented accurately by GGTOD.
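The following is a simplified illustration of the target-dictionary idea. It builds unit-norm generalized Gaussian atoms of the assumed form exp(-(r/sigma)^beta) and measures what fraction of a patch's energy the single best atom captures. This is a one-atom stand-in for a full sparse decomposition on the union of FBOD and GGTOD. The fractal background dictionary is omitted, and the atom form and the parameter grid are assumptions.

```python
import math


def gg_atom(size, sigma, beta):
    """Unit-norm size x size patch exp(-(r / sigma) ** beta), r = distance to the centre."""
    c = size // 2
    vals = [[math.exp(-((math.hypot(x - c, y - c) / sigma) ** beta)) for x in range(size)]
            for y in range(size)]
    norm = math.sqrt(sum(v * v for row in vals for v in row))
    return [[v / norm for v in row] for row in vals]


def target_energy_ratio(patch, atoms):
    """Largest squared projection of the patch onto a unit-norm atom divided by the
    patch energy; 1.0 means a single atom explains the patch exactly."""
    energy = sum(v * v for row in patch for v in row)
    best = max(
        sum(p * a for prow, arow in zip(patch, atom) for p, a in zip(prow, arow)) ** 2
        for atom in atoms
    )
    return best / energy


# Hypothetical target dictionary: a small grid of widths (sigma) and shapes (beta).
ATOMS = [gg_atom(5, s, b) for s in (0.5, 1.0, 2.0) for b in (1.0, 2.0)]
```

A blob-like patch matching one of the atoms yields a ratio near 1. A high-frequency clutter-like pattern such as a checkerboard yields a small ratio. Thresholding this ratio gives a crude target/clutter decision.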
Content Representation in the Human Medial Temporal Lobe
Liang, Jackson C.; Wagner, Anthony D.
2013-01-01
Current theories of medial temporal lobe (MTL) function focus on event content as an important organizational principle that differentiates MTL subregions. Perirhinal and parahippocampal cortices may play content-specific roles in memory, whereas hippocampal processing is alternately hypothesized to be content specific or content general. Despite anatomical evidence for content-specific MTL pathways, empirical data for content-based MTL subregional dissociations are mixed. Here, we combined functional magnetic resonance imaging with multiple statistical approaches to characterize MTL subregional responses to different classes of novel event content (faces, scenes, spoken words, sounds, visual words). Univariate analyses revealed that responses to novel faces and scenes were distributed across the anterior–posterior axis of MTL cortex, with face responses distributed more anteriorly than scene responses. Moreover, multivariate pattern analyses of perirhinal and parahippocampal data revealed spatially organized representational codes for multiple content classes, including nonpreferred visual and auditory stimuli. In contrast, anterior hippocampal responses were content general, with less accurate overall pattern classification relative to MTL cortex. Finally, posterior hippocampal activation patterns consistently discriminated scenes more accurately than other forms of content. Collectively, our findings indicate differential contributions of MTL subregions to event representation via a distributed code along the anterior–posterior axis of MTL that depends on the nature of event content. PMID:22275474
Local spatial frequency analysis for computer vision
NASA Technical Reports Server (NTRS)
Krumm, John; Shafer, Steven A.
1990-01-01
A sense of vision is a prerequisite for a robot to function in an unstructured environment. However, real-world scenes contain many interacting phenomena that lead to complex images which are difficult to interpret automatically. Typical computer vision research proceeds by analyzing various effects in isolation (e.g., shading, texture, stereo, defocus), usually on images devoid of realistic complicating factors. This leads to specialized algorithms which fail on real-world images. Part of this failure is due to the dichotomy of useful representations for these phenomena. Some effects are best described in the spatial domain, while others are more naturally expressed in frequency. In order to resolve this dichotomy, we present the combined space/frequency representation which, for each point in an image, shows the spatial frequencies at that point. Within this common representation, we develop a set of simple, natural theories describing phenomena such as texture, shape, aliasing and lens parameters. We show these theories lead to algorithms for shape from texture and for dealiasing image data. The space/frequency representation should be a key aid in untangling the complex interaction of phenomena in images, allowing automatic understanding of real-world scenes.
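The combined space/frequency representation assigns a local spectrum to each point. A common way to approximate this is the magnitude of the DFT of a Hann-windowed segment centred on the point, shown below as a hedged 1-D sketch. The window choice and the wrap-around at the borders are assumptions, not the authors' formulation.

```python
import cmath
import math


def local_spectrum(signal, center, width):
    """Magnitudes of DFT bins 0..width//2 for a Hann-windowed segment of length
    `width` starting at center - width//2 (indices wrap around the ends)."""
    start = center - width // 2
    segment = [signal[(start + n) % len(signal)] for n in range(width)]
    # A periodic Hann window tapers the segment to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / width) for n in range(width)]
    return [
        abs(sum(segment[n] * window[n] * cmath.exp(-2j * math.pi * k * n / width)
                for n in range(width)))
        for k in range(width // 2 + 1)
    ]


def dominant_bin(spectrum):
    """Index of the strongest frequency bin (cycles per window)."""
    return max(range(len(spectrum)), key=spectrum.__getitem__)
```

Consider a 64-sample cosine whose period changes from 4 to 8 samples halfway through. With a 16-sample window, the dominant bin is 4 at point 16 and 2 at point 48. This is the point-by-point frequency information that a global Fourier transform would mix together.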
Joint image restoration and location in visual navigation system
NASA Astrophysics Data System (ADS)
Wu, Yuefeng; Sang, Nong; Lin, Wei; Shao, Yuanjie
2018-02-01
Image location is a key technology in visual navigation. Most previous image location methods simply assume ideal inputs without taking into account real-world degradations (e.g., low resolution and blur). To handle such degradations, conventional image location methods first perform image restoration and then match the restored image against the reference image. However, because restoration and location are handled separately, a defective restoration output can degrade the localization result. In this paper, we present a joint image restoration and location (JRL) method that uses a sparse representation prior to handle the challenging problem of low-quality image location. The sparse representation prior states that the degraded input image, if correctly restored, will have a good sparse representation in terms of the dictionary constructed from the reference image. By iteratively solving the image restoration in pursuit of the sparsest representation, our method achieves simultaneous restoration and location. Based on this sparse representation prior, we demonstrate that the image restoration task and the location task can benefit greatly from each other. Extensive experiments on real scene images with Gaussian blur are carried out, and our joint model outperforms conventional methods that treat the two tasks independently.
Target recognition for ladar range image using slice image
NASA Astrophysics Data System (ADS)
Xia, Wenze; Han, Shaokun; Wang, Liang
2015-12-01
A shape descriptor and a complete shape-based recognition system for ladar range images, using slice images as the geometric feature descriptor, are introduced. A slice image is a two-dimensional image generated by a three-dimensional Hough transform and the corresponding mathematical transformation. The system consists of two processes: model library construction and recognition. In the model library construction process, a series of range images is obtained by sampling the model object at preset attitude angles. All the range images are then converted into slice images. The number of slice images is reduced by clustering analysis and by finding a representative for each cluster, which reduces the size of the model library. In the recognition process, the slice image of the scene is compared with the slice images in the model library, and the recognition result depends on this comparison. Simulated ladar range images are used to analyze the recognition and misjudgment rates, and the slice image representation method is compared with the moment invariants representation method. The experimental results show that, both without noise and with ladar noise, the system has a high recognition rate and a low misjudgment rate. The comparison experiment demonstrates that the slice image has better representation ability than moment invariants.
Lescroart, Mark D.; Stansbury, Dustin E.; Gallant, Jack L.
2015-01-01
Perception of natural visual scenes activates several functional areas in the human brain, including the Parahippocampal Place Area (PPA), Retrosplenial Complex (RSC), and the Occipital Place Area (OPA). It is currently unclear what specific scene-related features are represented in these areas. Previous studies have suggested that PPA, RSC, and/or OPA might represent at least three qualitatively different classes of features: (1) 2D features related to Fourier power; (2) 3D spatial features such as the distance to objects in a scene; or (3) abstract features such as the categories of objects in a scene. To determine which of these hypotheses best describes the visual representation in scene-selective areas, we applied voxel-wise modeling (VM) to BOLD fMRI responses elicited by a set of 1386 images of natural scenes. VM provides an efficient method for testing competing hypotheses by comparing predictions of brain activity based on encoding models that instantiate each hypothesis. Here we evaluated three different encoding models that instantiate each of the three hypotheses listed above. We used linear regression to fit each encoding model to the fMRI data recorded from each voxel, and we evaluated each fit model by estimating the amount of variance it predicted in a withheld portion of the data set. We found that voxel-wise models based on Fourier power or the subjective distance to objects in each scene predicted much of the variance predicted by a model based on object categories. Furthermore, the response variance explained by these three models is largely shared, and the individual models explain little unique variance in responses. Based on an evaluation of previous studies and the data we present here, we conclude that there is currently no good basis to favor any one of the three alternative hypotheses about visual representation in scene-selective areas. We offer suggestions for further studies that may help resolve this issue. PMID:26594164
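Voxel-wise modeling fits a separate linear encoding model per voxel and scores it by the variance it predicts in held-out data. The sketch below shows that procedure for one simulated voxel, using a tiny ridge solver and held-out R². The features, the ridge penalty `lam` (which here also penalizes the intercept column) and the train/test split are illustrative assumptions, not the study's pipeline.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_ridge(X, y, lam=1e-6):
    """Weights minimising ||X w - y||^2 + lam * ||w||^2 via the normal equations."""
    d = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(d)]
    return solve(A, b)


def predict(w, X):
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in X]


def r_squared(y, y_hat):
    """Fraction of held-out variance predicted (negative if worse than the mean)."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```

Each competing hypothesis corresponds to a different feature matrix `X`, for example Fourier power, distance or object-category features. The models are compared by their held-out `r_squared` for the same voxel.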
NASA Astrophysics Data System (ADS)
Fan, Jiayuan; Tan, Hui Li; Toomik, Maria; Lu, Shijian
2016-10-01
Spatial pyramid matching has demonstrated its power in image recognition tasks by pooling features from increasingly fine spatial sub-regions. Motivated by the concept of feature pooling at multiple pyramid levels, we propose a novel spectral-spatial hyperspectral image classification approach using a superpixel-based spatial pyramid representation. This technique first generates multiple superpixel maps, gradually decreasing the number of superpixels as the spatial regions around labelled samples increase. Using each superpixel map, the sparse representation of pixels within each spatial region is then computed through local max pooling. Finally, features learned from training samples are aggregated and used to train a support vector machine (SVM) classifier. The proposed technique has been evaluated on two public hyperspectral datasets. The first is the Indian Pines image, which contains 16 agricultural scene categories at 20 m resolution and was acquired by AVIRIS. The second is the University of Pavia image, which contains 9 land-use categories at 1.3 m spatial resolution and was acquired by the ROSIS-03 sensor. Experimental results show significantly improved performance compared with state-of-the-art works. The major contributions of this technique include (1) a new spectral-spatial classification approach that generates a feature representation for hyperspectral images, (2) a complementary yet effective feature pooling approach, i.e., the superpixel-based spatial pyramid representation used to study spatial correlation, and (3) evaluation on two public hyperspectral image datasets with superior classification performance.
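The pyramid pooling step can be illustrated with a regular grid in place of superpixels. The sketch below max-pools per-channel activations over 1x1, 2x2 and 4x4 grids and concatenates the results. Replacing the paper's superpixel regions with grid cells is a deliberate simplification.

```python
def pyramid_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool per-channel activations over an n x n grid for each pyramid level n
    and concatenate the results (level by level, cells in row-major order).

    feature_map[y][x] is a list of channel activations; each side must be >= max(levels)
    so that no grid cell is empty.
    """
    h, w = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    pooled = []
    for n in levels:
        for gy in range(n):
            for gx in range(n):
                cell = [feature_map[y][x]
                        for y in range(gy * h // n, (gy + 1) * h // n)
                        for x in range(gx * w // n, (gx + 1) * w // n)]
                pooled.extend(max(v[c] for v in cell) for c in range(channels))
    return pooled
```

The output length is `channels * sum(n * n for n in levels)` regardless of image size. This fixed length is what lets pooled features feed a standard classifier such as an SVM.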
NASA Astrophysics Data System (ADS)
Qi, K.; Qingfeng, G.
2017-12-01
With the popular use of High-Resolution Satellite (HRS) images, more and more research effort has been placed on land-use scene classification. However, the complex backgrounds and multiple land-cover classes or objects in HRS images make the task difficult. This article presents a multiscale deeply described correlaton model for land-use scene classification. Specifically, a convolutional neural network is introduced to learn and characterize local features at different scales. The learned multiscale deep features are then used to generate visual words. The spatial arrangement of visual words is captured by introducing adaptive vector-quantized correlograms at different scales. Experiments on two publicly available land-use scene datasets demonstrate that the proposed model is compact yet discriminative for efficient representation of land-use scene images. The model also achieves classification results competitive with state-of-the-art methods.
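Correlograms summarize how often pairs of visual words occur near each other. The sketch below assigns descriptors to their nearest codeword and counts word pairs within a fixed Chebyshev radius. The fixed radius and the plain nearest-codeword quantization are assumptions standing in for the adaptive, multiscale version described above.

```python
def quantize(features, codebook):
    """Index of the nearest codeword (squared Euclidean distance) for each descriptor."""
    return [
        min(range(len(codebook)),
            key=lambda k: sum((f - c) ** 2 for f, c in zip(feat, codebook[k])))
        for feat in features
    ]


def word_correlogram(words, positions, n_words, radius):
    """counts[a][b] = number of ordered pairs (i, j), i != j, where word a at position i
    and word b at position j lie within Chebyshev distance `radius`."""
    counts = [[0] * n_words for _ in range(n_words)]
    for i, (wi, (xi, yi)) in enumerate(zip(words, positions)):
        for j, (wj, (xj, yj)) in enumerate(zip(words, positions)):
            if i != j and max(abs(xi - xj), abs(yi - yj)) <= radius:
                counts[wi][wj] += 1
    return counts
```

Unlike a bag-of-words histogram, the correlogram distinguishes images that contain the same words in different spatial arrangements. Repeating the count at several radii gives a multiscale version.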
Stages as models of scene geometry.
Nedović, Vladimir; Smeulders, Arnold W M; Redert, André; Geusebroek, Jan-Mark
2010-09-01
Reconstruction of 3D scene geometry is an important element for scene understanding, autonomous vehicle and robot navigation, image retrieval, and 3D television. We propose accounting for the inherent structure of the visual world when trying to solve the scene reconstruction problem. Consequently, we identify geometric scene categorization as the first step toward robust and efficient depth estimation from single images. We introduce 15 typical 3D scene geometries called stages, each with a unique depth profile, which roughly correspond to a large majority of broadcast video frames. Stage information serves as a first approximation of global depth, narrowing down the search space in depth estimation and object localization. We propose different sets of low-level features for depth estimation, and perform stage classification on two diverse data sets of television broadcasts. Classification results demonstrate that stages can often be efficiently learned from low-dimensional image representations.
Behavioral and Neural Representations of Spatial Directions across Words, Schemas, and Images.
Weisberg, Steven M; Marchette, Steven A; Chatterjee, Anjan
2018-05-23
Modern spatial navigation requires fluency with multiple representational formats, including visual scenes, signs, and words. These formats convey different information. Visual scenes are rich and specific but contain extraneous details. Arrows, as an example of signs, are schematic representations in which the extraneous details are eliminated but analog spatial properties are preserved. Words eliminate all spatial information and convey spatial directions in a purely abstract form. How does the human brain compute spatial directions within and across these formats? To investigate this question, we conducted two experiments on men and women: a preregistered behavioral study, and a neuroimaging study using multivoxel pattern analysis of fMRI data to uncover similarities and differences among representational formats. Participants in the behavioral study viewed spatial directions presented as images, schemas, or words (e.g., "left") and responded to each trial, indicating whether the spatial direction was the same as or different from the one viewed previously. They responded more quickly to schemas and words than to images, despite the visual complexity of the stimuli being matched. Participants in the fMRI study performed the same task but responded only to occasional catch trials. Spatial directions in images were decodable in the intraparietal sulcus bilaterally, but spatial directions in schemas and words were not. Spatial directions were also decodable between all three formats. These results suggest that the intraparietal sulcus plays a role in calculating spatial directions in visual scenes, but that this neural circuitry may be bypassed when the spatial directions are presented as schemas or words.
SIGNIFICANCE STATEMENT Human navigators encounter spatial directions in various formats: words ("turn left"), schematic signs (an arrow showing a left turn), and visual scenes (a road turning left). The brain must transform these spatial directions into a plan for action.
Here, we investigate similarities and differences between neural representations of these formats. We found that bilateral intraparietal sulci represent spatial directions in visual scenes and across the three formats. We also found that participants respond quickest to schemas, then words, then images, suggesting that spatial directions in abstract formats are easier to interpret than concrete formats. These results support a model of spatial direction interpretation in which spatial directions are either computed for real world action or computed for efficient visual comparison. Copyright © 2018 the authors 0270-6474/18/384996-12$15.00/0.
Interactive Scene Analysis Module - A sensor-database fusion system for telerobotic environments
NASA Technical Reports Server (NTRS)
Cooper, Eric G.; Vazquez, Sixto L.; Goode, Plesent W.
1992-01-01
Accomplishing a task with telerobotics typically involves a combination of operator control/supervision and a 'script' of preprogrammed commands. These commands usually assume that the locations of various objects in the task space conform to some internal representation (database) of that task space. The ability to quickly and accurately verify the task environment against the internal database would improve the robustness of these preprogrammed commands. In addition, on-line initialization and maintenance of a task space database is difficult for operators using Cartesian coordinates alone. This paper describes the Interactive Scene Analysis Module (ISAM), developed to provide task space database initialization and verification using 3-D graphic overlay modelling, video imaging, and laser-radar-based range imaging. Through the fusion of task space database information and image sensor data, a verifiable task space model is generated that provides location and orientation data for objects in a task space. This paper also describes applications of the ISAM in the Intelligent Systems Research Laboratory (ISRL) at NASA Langley Research Center and discusses its performance with respect to representation accuracy and operator interface efficiency.
Global ensemble texture representations are critical to rapid scene perception.
Brady, Timothy F; Shafer-Skelton, Anna; Alvarez, George A
2017-06-01
Traditionally, recognizing the objects within a scene has been treated as a prerequisite to recognizing the scene itself. However, research now suggests that the ability to rapidly recognize visual scenes could be supported by global properties of the scene itself rather than the objects within the scene. Here, we argue for a particular instantiation of this view: that scenes are recognized by treating them as a global texture and processing the pattern of orientations and spatial frequencies across different areas of the scene without recognizing any objects. To test this model, we asked whether there is a link between how proficient individuals are at rapid scene perception and how proficiently they represent simple spatial patterns of orientation information (global ensemble texture). We find a significant and selective correlation between these tasks, suggesting a link between scene perception and spatial ensemble tasks but not nonspatial summary statistics. In a second and third experiment, we additionally show that global ensemble texture information is not only associated with scene recognition, but that preserving only global ensemble texture information from scenes is sufficient to support rapid scene perception; however, preserving the same information is not sufficient for object recognition. Thus, global ensemble texture alone is sufficient to allow activation of scene representations but not object representations. Together, these results provide evidence for a view of scene recognition based on global ensemble texture rather than a view based purely on objects or on nonspatially localized global properties. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
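To make the notion of a global ensemble texture concrete, the sketch below summarizes an image as a coarse grid of orientation histograms, one per region, without any object recognition. It is an illustrative assumption, not the stimulus construction used in these experiments: the function name `ensemble_texture`, the 4 × 4 grid and the 8 orientation bins are hypothetical choices, and orientations are estimated from simple finite-difference gradients.

```python
import numpy as np

def ensemble_texture(img, grid=4, bins=8):
    """Hypothetical sketch: per-cell orientation histograms on a coarse grid.

    Returns an array of shape (grid, grid, bins); each cell's histogram is
    weighted by gradient magnitude and normalized to sum to 1.
    """
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)                 # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    # Orientation is axial (0..pi): a gradient and its opposite count the same.
    theta = np.mod(np.arctan2(gy, gx), np.pi)
    h, w = img.shape
    out = np.zeros((grid, grid, bins))
    for i in range(grid):
        for j in range(grid):
            rows = slice(i * h // grid, (i + 1) * h // grid)
            cols = slice(j * w // grid, (j + 1) * w // grid)
            hist, _ = np.histogram(theta[rows, cols], bins=bins,
                                   range=(0, np.pi), weights=mag[rows, cols])
            total = hist.sum()
            out[i, j] = hist / total if total > 0 else hist
    return out
```

A vertically striped image, for example, puts all of its weight in the first bin of every cell, since all of its gradients point along the horizontal axis.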
Does scene context always facilitate retrieval of visual object representations?
Nakashima, Ryoichi; Yokosawa, Kazuhiko
2011-04-01
An object-to-scene binding hypothesis maintains that visual object representations are stored as part of a larger scene representation or scene context, and that scene context facilitates retrieval of object representations (see, e.g., Hollingworth, Journal of Experimental Psychology: Learning, Memory and Cognition, 32, 58-69, 2006). Support for this hypothesis comes from data using an intentional memory task. In the present study, we examined whether scene context always facilitates retrieval of visual object representations. In two experiments, we investigated whether the scene context facilitates retrieval of object representations, using a new paradigm in which a memory task is appended to a repeated-flicker change detection task. Results indicated that in normal scene viewing, in which many objects appear simultaneously, scene context facilitated the retrieval of object representations (henceforth termed object-to-scene binding) only when the observer was required to retain a large amount of information for a task (i.e., an intentional memory task).
Intrinsic dimensionality predicts the saliency of natural dynamic scenes.
Vig, Eleonora; Dorr, Michael; Martinetz, Thomas; Barth, Erhardt
2012-06-01
Since visual attention-based computer vision applications have gained popularity, ever more complex, biologically inspired models seem to be needed to predict salient locations (or interest points) in naturalistic scenes. In this paper, we explore how far one can go in predicting eye movements by using only basic signal processing, such as image representations derived from efficient coding principles, and machine learning. To this end, we gradually increase the complexity of a model from simple single-scale saliency maps computed on grayscale videos to spatiotemporal multiscale and multispectral representations. Supervised learning techniques, trained on a large collection of eye movements recorded on high-resolution videos, fine-tune the free parameters whose addition becomes inevitable as complexity increases. The proposed model, although very simple, demonstrates a significant improvement in predicting salient locations in naturalistic videos over four selected baseline models and two distinct data labeling scenarios.
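Intrinsic dimensionality distinguishes regions where the signal is constant (i0D), varies along a single direction, as at a straight edge (i1D), or varies in all directions, as at corners and line ends (i2D). One common way to measure it is through the structure tensor, whose determinant vanishes unless the local signal is i2D. The sketch below applies this idea to a single grayscale frame; it is a simplified, hypothetical illustration (single scale, 2D rather than spatiotemporal, box smoothing with an assumed radius `r`) and not the authors' multiscale, multispectral model.

```python
import numpy as np

def box_blur(a, r=2):
    """Mean filter of radius r (edge-padded), used to average tensor entries."""
    k = 2 * r + 1
    p = np.pad(a, r, mode="edge")
    # Summed-area table: each k x k window sum costs four lookups.
    s = np.cumsum(np.cumsum(p, axis=0), axis=1)
    s = np.pad(s, ((1, 0), (1, 0)))
    win = s[k:, k:] - s[:-k, k:] - s[k:, :-k] + s[:-k, :-k]
    return win / (k * k)

def i2d_saliency(img, r=2):
    """Determinant of the smoothed 2D structure tensor (hypothetical sketch).

    det(J) = l1 * l2 is ~0 where the image is constant (i0D) or varies along
    one direction only (i1D, straight edges), and positive at i2D points such
    as corners and line ends.
    """
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)
    jxx = box_blur(gx * gx, r)
    jyy = box_blur(gy * gy, r)
    jxy = box_blur(gx * gy, r)
    return jxx * jyy - jxy ** 2
```

On a bright square, for instance, the score is zero in flat areas and along the middle of each side, and positive only near the four corners.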
Plenoptic image watermarking to preserve copyright
NASA Astrophysics Data System (ADS)
Ansari, A.; Dorado, A.; Saavedra, G.; Martinez Corral, M.
2017-05-01
A conventional camera loses a large amount of the information available from a scene because it does not record the values of the individual rays passing through a point; it merely keeps the sum of the intensities of all the rays passing through that point. Plenoptic images can be exploited to provide a 3D representation of the scene, and watermarking such images can help protect their ownership. In this paper, we propose a method for watermarking plenoptic images to achieve this aim. The performance of the proposed method is validated by experimental results, and a trade-off is maintained between imperceptibility and robustness.
Generating Text from Functional Brain Images
Pereira, Francisco; Detre, Greg; Botvinick, Matthew
2011-01-01
Recent work has shown that it is possible to take brain images acquired during viewing of a scene and reconstruct an approximation of the scene from those images. Here we show that it is also possible to generate text about the mental content reflected in brain images. We began with images collected as participants read names of concrete items (e.g., “Apartment”) while also seeing line drawings of the item named. We built a model of the mental semantic representation of concrete concepts from text data and learned to map aspects of such representation to patterns of activation in the corresponding brain image. To validate this mapping, we generated from each left-out individual brain image, without accessing information about the items viewed, a collection of semantically pertinent words (e.g., “door,” “window” for “Apartment”). Furthermore, we show that the ability to generate such words allows us to perform a classification task and thus validate our method quantitatively. PMID:21927602
A Two-Stream Deep Fusion Framework for High-Resolution Aerial Scene Classification
Yu, Yunlong; Liu, Fuxian
2018-01-01
One of the challenging problems in understanding high-resolution remote sensing images is aerial scene classification. A well-designed feature representation method and classifier can improve classification accuracy. In this paper, we construct a new two-stream deep architecture for aerial scene classification. First, we use two pretrained convolutional neural networks (CNNs) as feature extractors to learn deep features from the original aerial image and from the aerial image processed through saliency detection, respectively. Second, two feature fusion strategies are adopted to fuse the two different types of deep convolutional features extracted by the original RGB stream and the saliency stream. Finally, we use the extreme learning machine (ELM) classifier for final classification with the fused features. The effectiveness of the proposed architecture is tested on four challenging datasets: the UC-Merced dataset with 21 scene categories, the WHU-RS dataset with 19 scene categories, the AID dataset with 30 scene categories, and the NWPU-RESISC45 dataset with 45 challenging scene categories. The experimental results demonstrate that our architecture achieves a significant improvement in classification accuracy over all state-of-the-art references. PMID:29581722
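The final classification stage can be sketched as a minimal extreme learning machine operating on fused features. The sketch assumes the two CNN streams have already produced feature matrices; concatenation stands in for one simple fusion strategy and is not necessarily either of the two strategies used in the paper. The class name `ELM`, the tanh activation, the hidden-layer size and the ridge term `reg` are illustrative assumptions.

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine sketch (hypothetical names).

    Hidden-layer weights are random and fixed; only the output weights are
    solved in closed form by ridge-regularized least squares.
    """
    def __init__(self, n_hidden=200, reg=1e-3, seed=0):
        self.n_hidden, self.reg = n_hidden, reg
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        T = np.eye(n_classes)[y]                      # one-hot targets
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # beta = (H^T H + reg * I)^-1 H^T T
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

def fuse_concat(f_rgb, f_sal):
    """One simple fusion strategy: concatenate the two streams' features."""
    return np.hstack([f_rgb, f_sal])
```

Because only the output weights `beta` are learned, training reduces to a single regularized linear solve, which is the main appeal of an ELM as a lightweight classifier on top of fixed deep features.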
CHAMP: a locally adaptive unmixing-based hyperspectral anomaly detection algorithm
NASA Astrophysics Data System (ADS)
Crist, Eric P.; Thelen, Brian J.; Carrara, David A.
1998-10-01
Anomaly detection offers a means by which to identify potentially important objects in a scene without prior knowledge of their spectral signatures. As such, this approach is less sensitive to variations in target class composition, atmospheric and illumination conditions, and sensor gain settings than would be a spectral matched filter or similar algorithm. The best existing anomaly detectors generally fall into one of two categories: those based on local Gaussian statistics, and those based on linear mixing models. Unmixing-based approaches better represent the real distribution of data in a scene, but are typically derived and applied on a global or scene-wide basis. Locally adaptive approaches allow detection of more subtle anomalies by accommodating the spatial non-homogeneity of background classes in a typical scene, but provide a poorer representation of the true underlying background distribution. The CHAMP algorithm combines the best attributes of both approaches, applying a linear-mixing model approach in a spatially adaptive manner. The algorithm itself, and test results on simulated and actual hyperspectral image data, are presented in this paper.
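The idea of applying a linear mixing model in a spatially adaptive way can be illustrated with the following sketch. It is a simplified stand-in, not the CHAMP algorithm: for each pixel, the leading singular vectors of a local background window (excluding a small guard region around the pixel) play the role of endmembers, and the anomaly score is the norm of the residual left after unconstrained least-squares unmixing. The function name and the window, guard and endmember counts are assumed parameters.

```python
import numpy as np

def local_unmixing_residual(cube, win=7, guard=1, n_end=3):
    """Hypothetical sketch of a locally adaptive, linear-mixing anomaly score.

    cube: (rows, cols, bands) array. For each pixel, the local background is a
    win x win window minus a (2*guard+1)^2 guard region; its leading n_end
    right-singular vectors stand in for endmembers. The score is the norm of
    the pixel's residual after least-squares projection onto that subspace.
    """
    rows, cols, _ = cube.shape
    r = win // 2
    scores = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            i0, i1 = max(0, i - r), min(rows, i + r + 1)
            j0, j1 = max(0, j - r), min(cols, j + r + 1)
            mask = np.ones((i1 - i0, j1 - j0), dtype=bool)
            # Exclude the pixel and its immediate neighbours (guard region).
            mask[max(0, i - guard - i0):i - i0 + guard + 1,
                 max(0, j - guard - j0):j - j0 + guard + 1] = False
            bg = cube[i0:i1, j0:j1][mask]             # (n_background, bands)
            _, _, vt = np.linalg.svd(bg, full_matrices=False)
            E = vt[:n_end].T                          # orthonormal "endmembers"
            x = cube[i, j]
            resid = x - E @ (E.T @ x)                 # unconstrained unmixing
            scores[i, j] = np.linalg.norm(resid)
    return scores
```

Pixels that are well explained by their local background receive scores near zero, while a spectrum outside the local mixing subspace leaves a large residual.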
System and method for extracting dominant orientations from a scene
Straub, Julian; Rosman, Guy; Freifeld, Oren; Leonard, John J.; Fisher, John W., III
2017-05-30
In one embodiment, a method of identifying the dominant orientations of a scene comprises representing a scene as a plurality of directional vectors. The scene may comprise a three-dimensional representation of a scene, and the plurality of directional vectors may comprise a plurality of surface normals. The method further comprises determining, based on the plurality of directional vectors, a plurality of orientations describing the scene. The determined plurality of orientations explains the directionality of the plurality of directional vectors. In certain embodiments, the plurality of orientations may have independent axes of rotation. The plurality of orientations may be determined by representing the plurality of directional vectors as lying on a mathematical representation of a sphere, and inferring the parameters of a statistical model to adapt the plurality of orientations to explain the positioning of the plurality of directional vectors lying on the mathematical representation of the sphere.
The importance of context: evidence that contextual representations increase intrusive memories.
Pearson, David G; Ross, Fiona D C; Webster, Victoria L
2012-03-01
Intrusive memories appear to enter consciousness via involuntary rather than deliberate recollection. Some clinical accounts of PTSD seek to explain this phenomenon by making a clear distinction between the encoding of sensory-based and contextual representations. Contextual representations have been claimed to actively reduce intrusions by anchoring encoded perceptual data for an event in memory. The current analogue trauma study examined this hypothesis by manipulating contextual information independently from encoded sensory-perceptual information. Participants viewed images selected from the International Affective Picture System that depicted scenes of violence and bodily injury. Images were viewed either under neutral conditions or paired with contextual information. Two experiments revealed a significant increase in memory intrusions for images paired with contextual information in comparison to the same images viewed under neutral conditions. In contrast to the observed increase in intrusion frequency, there was no effect of contextual representations on voluntary memory for the images. The vividness and emotionality of memory intrusions were also unaffected. The analogue trauma paradigm may fail to replicate the effect of extreme stress on encoding postulated to occur during PTSD. These findings question the assertion that intrusive memories develop from a lack of integration between sensory-based and contextual representations in memory. Instead, it is argued that contextual representations play a causal role in increasing the frequency of intrusions by increasing the sensitivity of memory to involuntary retrieval by associated internal and external cues. Copyright © 2011 Elsevier Ltd. All rights reserved.
Cant, Jonathan S; Xu, Yaoda
2017-02-01
Our visual system can extract summary statistics from large collections of objects without forming detailed representations of the individual objects in the ensemble. In a region in ventral visual cortex encompassing the collateral sulcus and the parahippocampal gyrus and overlapping extensively with the scene-selective parahippocampal place area (PPA), we have previously reported fMRI adaptation to object ensembles when ensemble statistics repeated, even when local image features differed across images (e.g., two different images of the same strawberry pile). We additionally showed that this ensemble representation is similar to (but still distinct from) how visual texture patterns are processed in this region and is not explained by appealing to differences in the color of the elements that make up the ensemble. To further explore the nature of ensemble representation in this brain region, here we used PPA as our ROI and investigated in detail how the shape and surface properties (i.e., both texture and color) of the individual objects constituting an ensemble affect the ensemble representation in anterior-medial ventral visual cortex. We photographed object ensembles of stone beads that varied in shape and surface properties. A given ensemble always contained beads of the same shape and surface properties (e.g., an ensemble of star-shaped rose quartz beads). A change to the shape and/or surface properties of all the beads in an ensemble resulted in a significant release from adaptation in PPA compared with conditions in which no ensemble feature changed. In contrast, in the object-sensitive lateral occipital area (LO), we only observed a significant release from adaptation when the shape of the ensemble elements varied, and found no significant results in additional scene-sensitive regions, namely, the retrosplenial complex and occipital place area. 
Together, these results demonstrate that the shape and surface properties of the individual objects comprising an ensemble both contribute significantly to object ensemble representation in anterior-medial ventral visual cortex and further demonstrate a functional dissociation between object- (LO) and scene-selective (PPA) visual cortical regions and within the broader scene-processing network itself.
Kuhl, Brice A.; Rissman, Jesse; Wagner, Anthony D.
2012-01-01
Successful encoding of episodic memories is thought to depend on contributions from prefrontal and temporal lobe structures. Neural processes that contribute to successful encoding have been extensively explored through univariate analyses of neuroimaging data that compare mean activity levels elicited during the encoding of events that are subsequently remembered vs. those subsequently forgotten. Here, we applied pattern classification to fMRI data to assess the degree to which distributed patterns of activity within prefrontal and temporal lobe structures elicited during the encoding of word-image pairs were diagnostic of the visual category (Face or Scene) of the encoded image. We then assessed whether representation of category information was predictive of subsequent memory. Classification analyses indicated that temporal lobe structures contained information robustly diagnostic of visual category. Information in prefrontal cortex was less diagnostic of visual category, but was nonetheless associated with highly reliable classifier-based evidence for category representation. Critically, trials associated with greater classifier-based estimates of category representation in temporal and prefrontal regions were associated with a higher probability of subsequent remembering. Finally, consideration of trial-by-trial variance in classifier-based measures of category representation revealed positive correlations between prefrontal and temporal lobe representations, with the strength of these correlations varying as a function of the category of image being encoded. Together, these results indicate that multi-voxel representations of encoded information can provide unique insights into how visual experiences are transformed into episodic memories. PMID:21925190
Representation, Modeling and Recognition of Outdoor Scenes
1994-04-01
B. C. Vemuri and R. Malladi. Deformable models: Canonical parameters for surface representation and multiple view integration. In Conference on... or a high disparity gradient. If both L-R and R-L disparity images are made available, then mirror images of this pattern may be sought in the two... et al., 1991, Terzopoulos and Vasilescu, 1991, Vemuri and Malladi, 1991], parameterized surfaces [Stokely and Wu, 1992, Lowe, 1991], local surfaces
A 360-degree floating 3D display based on light field regeneration.
Xia, Xinxing; Liu, Xu; Li, Haifeng; Zheng, Zhenrong; Wang, Han; Peng, Yifan; Shen, Weidong
2013-05-06
Using a light field reconstruction technique, we can display a floating 3D scene in the air that is viewable from all 360 degrees around it with correct occlusion effects. A high-frame-rate color projector and a flat light field scanning screen are used in the system to create the light field of a real 3D scene in the air above the spinning screen. The principle and display performance of this approach are investigated in this paper. The image synthesis method for all the surrounding viewpoints is analyzed, and the 3D spatial resolution and angular resolution of the common display zone are used to evaluate display performance. A prototype was built, and real 3D color animated images were presented vividly. The experimental results verify the representation capability of this method.
High compression image and image sequence coding
NASA Technical Reports Server (NTRS)
Kunt, Murat
1989-01-01
The digital representation of an image requires a very large number of bits. This number is even larger for an image sequence. The goal of image coding is to reduce this number, as much as possible, and reconstruct a faithful duplicate of the original picture or image sequence. Early efforts in image coding, solely guided by information theory, led to a plethora of methods. The compression ratio reached a plateau around 10:1 a couple of years ago. Recent progress in the study of the brain mechanism of vision and scene analysis has opened new vistas in picture coding. Directional sensitivity of the neurones in the visual pathway combined with the separate processing of contours and textures has led to a new class of coding methods capable of achieving compression ratios as high as 100:1 for images and around 300:1 for image sequences. Recent progress on some of the main avenues of object-based methods is presented. These second generation techniques make use of contour-texture modeling, new results in neurophysiology and psychophysics and scene analysis.
NASA Technical Reports Server (NTRS)
Langevin, Maurice L. (Inventor); Moynihan, Philip I. (Inventor)
2000-01-01
An optical-to-tactile translator provides an aid for the visually impaired by translating a near-field scene to a tactile signal corresponding to said near-field scene. An optical sensor using a plurality of active pixel sensors (APS) converts the optical image within the near-field scene to a digital signal. The digital signal is then processed by a microprocessor, and a simple shape signal is generated based on the digital signal. The shape signal is then communicated to a tactile transmitter, where it is converted into a tactile signal using a series of contacts. The shape signal may be an outline of the significant shapes determined in the near-field scene, or it may comprise a simple symbolic representation of common items encountered repeatedly. The user is thus made aware of the unseen near-field scene, including potential obstacles and dangers, through a series of tactile contacts. In a preferred embodiment, a range-determining device, such as those commonly found on auto-focusing cameras, is included to limit the distance over which the optical sensor interprets the near-field scene.
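The processing chain described above (optical image, simple shape signal, tactile contacts) can be sketched as follows, assuming a grayscale frame scaled to [0, 1], a fixed threshold that separates the significant shape from its background and a hypothetical 8 × 8 array of contacts. The function names and parameters are illustrative and are not taken from the patent.

```python
import numpy as np

def shape_outline(gray, thresh=0.5):
    """Binary shape and its outline: shape pixels with a 4-neighbour outside."""
    mask = np.asarray(gray, dtype=float) > thresh
    p = np.pad(mask, 1, constant_values=False)
    # A pixel is interior if its up, down, left and right neighbours are set.
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return mask & ~interior

def to_pin_grid(outline, pins=(8, 8)):
    """Max-pool the outline onto a coarse grid of contacts (True = raised)."""
    h, w = outline.shape
    ph, pw = pins
    grid = np.zeros(pins, dtype=bool)
    for i in range(ph):
        for j in range(pw):
            block = outline[i * h // ph:(i + 1) * h // ph,
                            j * w // pw:(j + 1) * w // pw]
            grid[i, j] = block.any()
    return grid
```

A square in the middle of the frame, for example, becomes a hollow ring of raised contacts, which is the kind of outline signal the abstract describes.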
Polarimetric SAR image classification based on discriminative dictionary learning model
NASA Astrophysics Data System (ADS)
Sang, Cheng Wei; Sun, Hong
2018-03-01
Polarimetric SAR (PolSAR) image classification is one of the important applications of PolSAR remote sensing. It is a difficult, high-dimensional nonlinear mapping problem, and sparse representations based on learned overcomplete dictionaries have shown great potential for solving it. The overcomplete dictionary plays an important role in PolSAR image classification; however, in complex PolSAR scenes, features shared by different classes weaken the discriminability of the learned dictionary and thus degrade classification performance. In this paper, we propose a novel overcomplete dictionary learning model to enhance the discriminability of the dictionary. The overcomplete dictionary learned by the proposed model is more discriminative and well suited to PolSAR classification.
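To show how a learned overcomplete dictionary is typically used for classification, the sketch below implements sparse-representation-based classification: a sample is coded over the whole dictionary with orthogonal matching pursuit, then assigned to the class whose atoms alone reconstruct it with the smallest error. This is a generic baseline under assumed inputs (column-normalized dictionary `D`, per-atom class `labels`, sparsity `k`), not the discriminative dictionary learning model proposed in the paper.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal matching pursuit: k-sparse code of x over column-normalized D."""
    residual, support = x.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        idx = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if idx not in support:
            support.append(idx)
        # Re-fit all selected atoms jointly, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    a = np.zeros(D.shape[1])
    a[support] = coef
    return a

def src_classify(D, labels, x, k=3):
    """Assign x to the class whose atoms alone reconstruct it best."""
    a = omp(D, x, k)
    classes = np.unique(labels)
    errs = [np.linalg.norm(x - D @ np.where(labels == c, a, 0.0)) for c in classes]
    return classes[int(np.argmin(errs))]
```

Features shared by several classes make the class-wise reconstruction errors similar, which is the loss of discrimination that the proposed model aims to reduce.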
An image understanding system using attributed symbolic representation and inexact graph-matching
NASA Astrophysics Data System (ADS)
Eshera, M. A.; Fu, K.-S.
1986-09-01
A powerful image understanding system using a semantic-syntactic representation scheme consisting of attributed relational graphs (ARGs) is proposed for the analysis of the global information content of images. A multilayer graph transducer scheme performs the extraction of ARG representations from images, with ARG nodes representing the global image features, and the relations between features represented by the attributed branches between corresponding nodes. An efficient dynamic programming technique is employed to derive the distance between two ARGs and the inexact matching of their respective components. Noise, distortion and ambiguity in real-world images are handled through modeling in the transducer mapping rules and through the appropriate cost of error-transformation for the inexact matching of the representation. The system is demonstrated for the case of locating objects in a scene composed of complex overlapped objects, and for the case of target detection in noisy and distorted synthetic aperture radar images.
Image correlation method for DNA sequence alignment.
Curilem Saldías, Millaray; Villarroel Sassarini, Felipe; Muñoz Poblete, Carlos; Vargas Vásquez, Asticio; Maureira Butler, Iván
2012-01-01
The complexity of searches and the volume of genomic data make sequence alignment one of bioinformatics' most active research areas. New alignment approaches have incorporated digital signal processing techniques. Among these, correlation methods are highly sensitive. This paper proposes a novel sequence alignment method based on 2-dimensional images, where each nucleic acid base is represented as a fixed gray intensity pixel. Query and known database sequences are coded to their pixel representation, and sequence alignment is handled as an object-recognition-in-a-scene problem. Query and database become object and scene, respectively. An image correlation process is carried out in order to search for the best match between them. Given that this procedure can be implemented in an optical correlator, the correlation could eventually be accomplished at light speed. This paper shows an initial research stage where results were "digitally" obtained by simulating an optical correlation of DNA sequences represented as images. A total of 303 queries (variable lengths from 50 to 4500 base pairs) and 100 scenes represented by 100 x 100 images each (in total, a one-million-base-pair database) were considered for the image correlation analysis. The results showed that correlations reached very high sensitivity (99.01%) and specificity (98.99%) and outperformed BLAST when mutation numbers increased. However, digital correlation processes were a hundred times slower than BLAST. We are currently starting an initiative to evaluate the correlation speed of a real experimental optical correlator. By doing this, we expect to fully exploit the light properties of optical correlation. As the optical correlator works jointly with the computer, digital algorithms should also be optimized. The results presented in this paper are encouraging and support the study of image correlation methods on sequence alignment.
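The encoding and matching steps can be sketched in one dimension. The gray levels assigned to each base below are hypothetical (the abstract does not give them), and the sketch slides the query along a flattened scene and scores each offset with normalized cross-correlation; the paper instead correlates 2-D images, possibly optically.

```python
import numpy as np

# Hypothetical gray levels; the paper's actual intensities are not given here.
GRAY = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}

def encode(seq):
    """Map each nucleotide to a fixed gray intensity."""
    return np.array([GRAY[b] for b in seq.upper()], dtype=float)

def best_match(query, scene):
    """Offset in `scene` maximizing normalized cross-correlation with `query`."""
    q, s = encode(query), encode(scene)
    windows = np.lib.stride_tricks.sliding_window_view(s, len(q))
    qz = (q - q.mean()) / (q.std() + 1e-12)
    wz = (windows - windows.mean(axis=1, keepdims=True)) / (
        windows.std(axis=1, keepdims=True) + 1e-12)
    ncc = wz @ qz / len(q)            # Pearson correlation at each offset
    return int(np.argmax(ncc)), float(ncc.max())
```

An exact occurrence of the query scores a correlation of essentially 1 at its offset; mutations lower the peak gradually rather than breaking the match outright.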
Mizuhara, Hiroaki; Sato, Naoyuki; Yamaguchi, Yoko
2015-05-01
Neural oscillations are crucial for revealing dynamic cortical networks and for serving as a possible mechanism of inter-cortical communication, especially in association with mnemonic function. The interplay of the slow and fast oscillations might dynamically coordinate the mnemonic cortical circuits to rehearse stored items during working memory retention. We recorded simultaneous EEG-fMRI during a working memory task involving a natural scene to verify whether the cortical networks emerge with the neural oscillations for memory of the natural scene. Slow EEG power was enhanced in association with better accuracy of working memory retention and accompanied cortical activity in the mnemonic circuits for the natural scene. The fast oscillation showed phase-amplitude coupling to the slow oscillation, and its power was tightly coupled with the cortical activity representing the visual images of natural scenes. The mnemonic cortical circuit with the slow neural oscillations would rehearse the distributed natural scene representations with the fast oscillation for working memory retention. The coincidence of the natural scene representations could be obtained by the slow oscillation phase to create a coherent whole of the natural scene in working memory. Copyright © 2015 Elsevier Inc. All rights reserved.
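Phase-amplitude coupling of this kind is often quantified with the mean vector length: the fast-band amplitude envelope is averaged as a vector whose angle is the slow-band phase. The sketch below is a generic illustration, not the analysis pipeline of this study; it uses ideal FFT band-pass filters, and the theta (4-8 Hz) and gamma (30-50 Hz) bands as well as the function names are assumed.

```python
import numpy as np

def bandpass_analytic(x, fs, lo, hi):
    """Analytic signal of x restricted to [lo, hi] Hz (ideal FFT filter)."""
    X = np.fft.fft(x)
    f = np.fft.fftfreq(len(x), d=1 / fs)
    # Keep positive in-band frequencies, doubled; drop negative ones.
    H = np.where((f >= lo) & (f <= hi), 2.0, 0.0)
    return np.fft.ifft(X * H)

def mean_vector_length(x, fs, phase_band=(4, 8), amp_band=(30, 50)):
    """Phase-amplitude coupling as |mean(A_fast * exp(i * phi_slow))|."""
    phi = np.angle(bandpass_analytic(x, fs, *phase_band))
    amp = np.abs(bandpass_analytic(x, fs, *amp_band))
    return float(np.abs(np.mean(amp * np.exp(1j * phi))))
```

When the fast amplitude is larger at a consistent slow-oscillation phase, the vectors do not cancel and the length grows; with no coupling, it stays near zero.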
HDR Imaging for Feature Detection on Detailed Architectural Scenes
NASA Astrophysics Data System (ADS)
Kontogianni, G.; Stathopoulou, E. K.; Georgopoulos, A.; Doulamis, A.
2015-02-01
3D reconstruction relies on accurate detection, extraction, description and matching of image features. This is even more true for complex architectural scenes that require high-quality 3D models without any loss of detail in geometry or color. Illumination conditions influence the radiometric quality of images, as standard sensors cannot properly depict a wide range of intensities in the same scene. Indeed, overexposed or underexposed pixels cause irrecoverable information loss and degrade the digital representation. Images taken under extreme lighting conditions may thus be prohibitive for feature detection/extraction and consequently for matching and 3D reconstruction. High Dynamic Range (HDR) images could be helpful for these operators because they broaden the range of illumination that Standard or Low Dynamic Range (SDR/LDR) images can capture, thereby increasing the amount of detail contained in the image. The experimental results of this study, which apply state-of-the-art feature detectors to both standard dynamic range and HDR images, support this assumption.
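How HDR images extend the captured range can be illustrated by merging a bracketed exposure stack into a single radiance map. This is a minimal sketch under an assumed linear camera response with pixel values in [0, 1]; practical pipelines first recover the response curve (for example with Debevec and Malik's method), which is omitted here, and `merge_hdr` is a hypothetical name.

```python
import numpy as np

def merge_hdr(images, times):
    """Weighted radiance estimate from bracketed exposures.

    Assumes a linear camera response with pixel values in [0, 1]; the hat
    weight favors mid-tones and ignores clipped (0 or 1) pixels.
    """
    num = np.zeros_like(np.asarray(images[0], dtype=float))
    den = np.zeros_like(num)
    for img, t in zip(images, times):
        z = np.asarray(img, dtype=float)
        w = 1.0 - np.abs(2.0 * z - 1.0)   # hat weight: 0 at 0 and 1, 1 at 0.5
        num += w * z / t                  # each exposure's radiance estimate
        den += w
    return num / np.maximum(den, 1e-8)
```

Pixels that are clipped in one exposure are recovered from the others, so both very dark and very bright scene regions keep usable detail for feature detection.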
Computer Vision Research and Its Applications to Automated Cartography
1984-09-01
reflecting from scene surfaces, and the film and digitization processes that result in the computer representation of the image. These models, when...alone. Specifically, interpretations that are in some sense "orthogonal" are preferred. A method for finding such interpretations for right-angle...saturated colors are not precisely representable and the colors recorded with different films or cameras may differ, but the tricomponent representation is t
Generating descriptive visual words and visual phrases for large-scale image applications.
Zhang, Shiliang; Tian, Qi; Hua, Gang; Huang, Qingming; Gao, Wen
2011-09-01
Bag-of-visual Words (BoWs) representation has been applied to various problems in the fields of multimedia and computer vision. The basic idea is to represent images as visual documents composed of repeatable and distinctive visual elements, which are comparable to text words. Notwithstanding its great success and wide adoption, a visual vocabulary created from single-image local descriptors is often shown to be less effective than desired. In this paper, descriptive visual words (DVWs) and descriptive visual phrases (DVPs) are proposed as the visual counterparts of text words and phrases, where visual phrases refer to frequently co-occurring visual word pairs. Since images are the carriers of visual objects and scenes, a descriptive visual element set can be composed of the visual words and their combinations that are effective in representing certain visual objects or scenes. Based on this idea, a general framework is proposed for generating DVWs and DVPs for image applications. In a large-scale image database containing 1506 object and scene categories, the visual words and visual word pairs descriptive of certain objects or scenes are identified and collected as the DVWs and DVPs. Experiments show that the DVWs and DVPs are informative and descriptive and, thus, are more comparable with text words than the classic visual words are. We apply the identified DVWs and DVPs in several applications, including large-scale near-duplicate image retrieval, image search re-ranking, and object recognition. The combination of DVW and DVP performs better than the state of the art in large-scale near-duplicate image retrieval in terms of accuracy, efficiency and memory consumption. The proposed image search re-ranking algorithm, DWPRank, outperforms the state-of-the-art algorithm by 12.4% in mean average precision and is about 11 times faster.
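The basic bag-of-visual-words pipeline, and the notion of visual phrases as co-occurring word pairs, can be sketched as follows. Descriptors are quantized to their nearest codeword, each image becomes a normalized word histogram, and candidate phrases are word pairs that co-occur in enough images. This is an illustrative assumption: co-occurrence here simply means appearing in the same image, whereas DVPs may be defined over spatial neighborhoods and filtered for descriptiveness, and all function names are hypothetical.

```python
import numpy as np
from collections import Counter
from itertools import combinations

def assign_words(desc, codebook):
    """Index of the nearest codeword (squared Euclidean) for each descriptor."""
    d2 = ((desc[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_histogram(words, vocab_size):
    """L1-normalized bag-of-visual-words histogram."""
    h = np.bincount(words, minlength=vocab_size).astype(float)
    return h / max(h.sum(), 1.0)

def frequent_pairs(word_sets, min_count=2):
    """Unordered visual-word pairs that co-occur in at least min_count images."""
    counts = Counter()
    for ws in word_sets:
        counts.update(combinations(sorted(set(ws)), 2))
    return {p for p, c in counts.items() if c >= min_count}
```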
Sadeghi, Zahra; McClelland, James L; Hoffman, Paul
2015-09-01
An influential position in lexical semantics holds that semantic representations for words can be derived through analysis of patterns of lexical co-occurrence in large language corpora. Firth (1957) famously summarised this principle as "you shall know a word by the company it keeps". We explored whether the same principle could be applied to non-verbal patterns of object co-occurrence in natural scenes. We performed latent semantic analysis (LSA) on a set of photographed scenes in which all of the objects present had been manually labelled. This resulted in a representation of objects in a high-dimensional space in which similarity between two objects indicated the degree to which they appeared in similar scenes. These representations revealed similarities among objects belonging to the same taxonomic category (e.g., items of clothing) as well as cross-category associations (e.g., between fruits and kitchen utensils). We also compared representations generated from this scene dataset with two established methods for elucidating semantic representations: (a) a published database of semantic features generated verbally by participants and (b) LSA applied to a linguistic corpus in the usual fashion. Statistical comparisons of the three methods indicated significant association between the structures revealed by each method, with the scene dataset displaying greater convergence with feature-based representations than did LSA applied to linguistic data. The results indicate that information about the conceptual significance of objects can be extracted from their patterns of co-occurrence in natural environments, opening the possibility for such data to be incorporated into existing models of conceptual representation. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.
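The analysis treats objects like words and scenes like documents. A minimal sketch, assuming a small object-by-scene count matrix, computes a rank-k LSA embedding by truncated SVD and compares objects with cosine similarity; the term weighting usually applied before the SVD in LSA (for example log-entropy) is omitted, and the function names are hypothetical.

```python
import numpy as np

def lsa_object_vectors(counts, k=2):
    """Rank-k LSA embedding of objects from an object x scene count matrix.

    Rows are objects, columns are scenes; the truncated SVD keeps the k
    largest singular values, and objects are represented by U_k * S_k.
    """
    U, S, _ = np.linalg.svd(np.asarray(counts, dtype=float), full_matrices=False)
    return U[:, :k] * S[:k]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

For example, objects that keep appearing in the same kinds of scenes (say, cutlery in kitchens) end up with nearly parallel vectors, while objects from disjoint scene types end up nearly orthogonal.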
Number of perceptually distinct surface colors in natural scenes.
Marín-Franch, Iván; Foster, David H
2010-09-30
The ability to perceptually identify distinct surfaces in natural scenes by virtue of their color depends not only on the relative frequency of surface colors but also on the probabilistic nature of observer judgments. Previous methods of estimating the number of discriminable surface colors, whether based on theoretical color gamuts or recorded from real scenes, have taken a deterministic approach: a three-dimensional representation of the gamut of colors is divided into elementary cells or points spaced at intervals of one discrimination-threshold unit, which are then counted. In this study, information-theoretic methods were used to take into account both differing surface-color frequencies and observer response uncertainty. Spectral radiances were calculated from 50 hyperspectral images of natural scenes and were represented in an approximately perceptually uniform color space. The average number of perceptually distinct surface colors was estimated as 7.3 × 10^3, much smaller than estimates based on counting methods. This number is also much smaller than the number of distinct points in a scene that are, in principle, available for reliable identification under illuminant changes, suggesting that color constancy, or the lack of it, does not generally determine the limit on the use of color for surface identification.
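A toy illustration of why frequency weighting lowers the count: counting occupied cells treats a rare color like a common one, whereas 2^H, with H the Shannon entropy in bits of the cell frequencies, discounts rare cells. This sketch covers only the frequency part; it does not model observer-response uncertainty, it is not the estimator used in the study, and the cell labels are made up.

```python
import math
from collections import Counter


def occupied_cells(labels):
    """Deterministic count: every cell that contains at least one sample."""
    return len(set(labels))


def effective_number(labels):
    """2**H, where H is the Shannon entropy (bits) of the cell frequencies."""
    counts = Counter(labels)
    n = sum(counts.values())
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 2 ** h


# Hypothetical scene: four common color cells and four rare ones (1% of samples each).
labels = ["a"] * 240 + ["b"] * 240 + ["c"] * 240 + ["d"] * 240 + ["e", "f", "g", "h"] * 10
print(occupied_cells(labels))  # 8
print(round(effective_number(labels), 2))
```

Here 8 cells are occupied, but 2^H is only about 4.7, because four of the cells hold just 1% of the samples each.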
Correspondence Search Mitigation Using Feature Space Anti-Aliasing
2007-01-01
trackers are widely used in astro-inertial navigation systems for long-range aircraft, space navigation, and ICBM guidance. When ground images are to be... frequency-domain representation of the point spread function, H(f_x, f_y), is called the optical transfer function. Applying the Fourier transform to the... frequency-domain representation of the image:

I(f_x, f_y, t) = O(f_x, f_y, t) H(f_x, f_y)    (4)

In most conditions, the projected scene can be treated as a
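The relation I(f_x, f_y, t) = O(f_x, f_y, t) H(f_x, f_y) is the convolution theorem applied to imaging: convolving the object with the point spread function in the spatial domain equals multiplying its spectrum by the optical transfer function. A minimal NumPy check of that identity, assuming periodic (circular) boundaries, a single time instant, and a made-up 3 × 3 box PSF:

```python
import numpy as np


def image_via_otf(obj: np.ndarray, psf: np.ndarray) -> np.ndarray:
    """Form the image as IFFT(FFT(object) * OTF), where OTF = FFT(PSF)."""
    otf = np.fft.fft2(psf, s=obj.shape)  # H(f_x, f_y), PSF zero-padded to the object size
    return np.real(np.fft.ifft2(np.fft.fft2(obj) * otf))


def image_via_convolution(obj: np.ndarray, psf: np.ndarray) -> np.ndarray:
    """Direct circular convolution of the object with the PSF, for comparison."""
    out = np.zeros_like(obj, dtype=float)
    for (dy, dx), weight in np.ndenumerate(psf):
        out += weight * np.roll(obj, shift=(dy, dx), axis=(0, 1))
    return out


rng = np.random.default_rng(0)
obj = rng.random((16, 16))  # hypothetical object radiance O at one instant
psf = np.full((3, 3), 1 / 9)  # hypothetical box-blur PSF
print(np.allclose(image_via_otf(obj, psf), image_via_convolution(obj, psf)))  # True
```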
Wilkinson, Krista M; Stutzman, Allyson; Seisler, Andrea
2015-03-01
Augmentative and alternative communication (AAC) systems are often implemented for individuals whose speech cannot meet their full communication needs. One type of aided display is called a Visual Scene Display (VSD). VSDs consist of integrated scenes (such as photographs) in which language concepts are embedded. Often, the representations of concepts on VSDs are perceptually similar to their referents. Given this physical resemblance, one may ask how well VSDs support the development of symbolic functioning. We used brain imaging techniques to examine whether matches and mismatches between the content of spoken messages and photographic images of scenes evoke neural activity similar to the activity that occurs in response to spoken or written words. Electroencephalography (EEG) was recorded from 15 college students who were shown photographs paired with spoken phrases that were either matched or mismatched to the concepts embedded within each photograph. Of interest was the N400 component, a negative-going deflection about 400 ms post-stimulus that is considered an index of semantic functioning. An N400 response in the mismatched condition (but not the matched condition) would replicate brain responses to traditional linguistic symbols. An N400 was found exclusively in the mismatched condition, suggesting that mismatches between spoken messages and VSD-type representations set the stage for the N400 in ways similar to traditional linguistic symbols.
Deciding what is possible and impossible following hippocampal damage in humans.
McCormick, Cornelia; Rosenthal, Clive R; Miller, Thomas D; Maguire, Eleanor A
2017-03-01
There is currently much debate about whether the precise role of the hippocampus in scene processing is predominantly constructive, perceptual, or mnemonic. Here, we developed a novel experimental paradigm designed to control for general perceptual and mnemonic demands, thus enabling us to specifically vary the requirement for constructive processing. We tested the ability of patients with selective bilateral hippocampal damage and matched control participants to detect either semantic (e.g., an elephant with butterflies for ears) or constructive (e.g., an endless staircase) violations in realistic images of scenes. Thus, scenes could be semantically or constructively 'possible' or 'impossible'. Importantly, general perceptual and memory requirements were similar for both types of scene. We found that the patients performed comparably to control participants when deciding whether scenes were semantically possible or impossible, but were selectively impaired at judging if scenes were constructively possible or impossible. Post-task debriefing indicated that control participants constructed flexible mental representations of the scenes in order to make constructive judgements, whereas the patients were more constrained and typically focused on specific fragments of the scenes, with little indication of having constructed internal scene models. These results suggest that one contribution the hippocampus makes to scene processing is to construct internal representations of spatially coherent scenes, which may be vital for modelling the world during both perception and memory recall. © 2016 The Authors. Hippocampus Published by Wiley Periodicals, Inc.
Image analysis by integration of disparate information
NASA Technical Reports Server (NTRS)
Lemoigne, Jacqueline
1993-01-01
Image analysis often starts with some preliminary segmentation, which provides a representation of the scene needed for further interpretation. Segmentation can be performed in several ways, which are categorized as pixel-based, edge-based, and region-based. Each of these approaches is affected differently by various factors, and the final result may be improved by integrating several or all of these methods, thus taking advantage of their complementary nature. In this paper, we propose an approach that integrates pixel-based and edge-based results by utilizing an iterative relaxation technique. This approach has been implemented on a massively parallel computer and tested on remotely sensed imagery from the Landsat Thematic Mapper (TM) sensor.
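The idea of letting an edge map gate a relaxation process can be illustrated on a single scan line. This is a hypothetical sketch, not the method of the paper: pixel-based class probabilities are refined with a classic relaxation-labeling update, p ← p(1 + αq) followed by renormalization, where the neighbor support q is weighted by (1 - edge strength) so that support does not flow across detected edges. The weighting scheme and the parameter values are assumptions.

```python
import numpy as np


def relax_1d(prob: np.ndarray, link_edges: np.ndarray, alpha: float = 1.0, iters: int = 20) -> np.ndarray:
    """Relaxation labeling on a 1D scan line.

    prob: (N, C) pixel-based class probabilities.
    link_edges: (N-1,) edge strength in [0, 1] between samples i and i+1;
        a strong edge blocks support from flowing across that link.
    """
    p = prob.copy()
    w = 1.0 - link_edges  # compatibility weight for each neighboring pair
    for _ in range(iters):
        support = np.zeros_like(p)
        support[:-1] += w[:, None] * p[1:]  # support from the right neighbor
        support[1:] += w[:, None] * p[:-1]  # support from the left neighbor
        p = p * (1.0 + alpha * support)
        p /= p.sum(axis=1, keepdims=True)
    return p


# Hypothetical scan line: samples 0-3 belong to class 0, samples 4-7 to class 1.
prob = np.array([[0.7, 0.3]] * 4 + [[0.3, 0.7]] * 4)
prob[1] = [0.45, 0.55]  # a noisy pixel-based decision
edges = np.zeros(7)
edges[3] = 1.0  # the edge detector fires between samples 3 and 4
print(prob.argmax(axis=1))  # [0 1 0 0 1 1 1 1]
print(relax_1d(prob, edges).argmax(axis=1))  # [0 0 0 0 1 1 1 1]
```

The noisy sample is pulled back to its neighbors' class, while the edge keeps the two regions from contaminating each other at the boundary.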
NASA Astrophysics Data System (ADS)
Moorhead, Ian R.; Gilmore, Marilyn A.; Houlbrook, Alexander W.; Oxford, David E.; Filbee, David R.; Stroud, Colin A.; Hutchings, G.; Kirk, Albert
2001-09-01
Assessment of camouflage, concealment, and deception (CCD) methodologies is not a trivial problem; conventionally, the only method has been to carry out field trials, which are both expensive and subject to the vagaries of the weather. In recent years computing power has increased, such that there are now many research programs using synthetic environments for CCD assessments. Such an approach is attractive; the user has complete control over the environmental parameters and many more scenarios can be investigated. The UK Ministry of Defence is currently developing a synthetic scene generation tool for assessing the effectiveness of air vehicle camouflage schemes. The software is sufficiently flexible to allow it to be used in a broader range of applications, including full CCD assessment. The synthetic scene simulation system (CAMEO-SIM) has been developed, as an extensible system, to provide imagery within the 0.4 to 14 micrometer spectral band with as high a physical fidelity as possible. It consists of a scene design tool; an image generator that incorporates both radiosity and ray-tracing processes; and an experimental trials tool. The scene design tool allows the user to develop a 3D representation of the scenario of interest from a fixed viewpoint. Targets of interest can be placed anywhere within this 3D representation and may be either static or moving. Different illumination conditions and effects of the atmosphere can be modeled together with directional reflectance effects. The user has complete control over the level of fidelity of the final image. The output from the rendering tool is a sequence of radiance maps, which may be used by sensor models or for experimental trials in which observers carry out target acquisition tasks. The software also maintains an audit trail of all data selected to generate a particular image, in terms of both the material properties used and the rendering options chosen.
A range of verification tests has shown that the software computes the correct values for analytically tractable scenarios. Validation tests using simple scenes have also been undertaken. More complex validation tests using observer trials are planned. The current version of CAMEO-SIM and the way its images are used for camouflage assessment are described. The verification and validation tests undertaken are discussed. In addition, example images are used to demonstrate the significance of different effects, such as spectral rendering and shadows. Planned developments of CAMEO-SIM are also outlined.
Reconstruction of noisy and blurred images using blur kernel
NASA Astrophysics Data System (ADS)
Ellappan, Vijayan; Chopra, Vishal
2017-11-01
Blur is common in many digital images. It can be caused by motion of the camera or of objects in the scene. In this work, we propose a new method for deblurring images that uses a sparse representation to identify the blur kernel. By analyzing the image coordinates at coarse and fine scales, we obtain kernel-based image coordinates and, from that observation, estimate the motion angle of the shaken (blurred) image. We then calculate the length of the motion kernel using the Radon and Fourier transforms, and we apply the Lucy-Richardson algorithm, a non-blind image deconvolution (NBID) algorithm, to obtain a cleaner, less noisy output image. All operations are performed in the MATLAB IDE.
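Once the angle and length of the motion kernel have been estimated, the non-blind step can be sketched as below. This is a minimal NumPy version of Richardson-Lucy deconvolution, assuming circular boundary conditions and a kernel whose length and angle are already known; the Radon/Fourier estimation of those parameters is not reproduced, and the `motion_kernel` rasterization is an illustrative assumption, not the MATLAB implementation used in the work.

```python
import numpy as np


def motion_kernel(length: int, angle_deg: float, size: int = 15) -> np.ndarray:
    """Linear motion-blur PSF with the given length (pixels) and angle, summing to 1."""
    k = np.zeros((size, size))
    c = size // 2
    theta = np.deg2rad(angle_deg)
    for t in np.linspace(-(length - 1) / 2, (length - 1) / 2, 4 * length):
        k[int(round(c - t * np.sin(theta))), int(round(c + t * np.cos(theta)))] = 1.0
    return k / k.sum()


def psf_to_otf(psf: np.ndarray, shape: tuple) -> np.ndarray:
    """Zero-pad the PSF to `shape`, move its center to (0, 0), and take the 2D FFT."""
    padded = np.zeros(shape)
    kh, kw = psf.shape
    padded[:kh, :kw] = psf
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)


def convolve(img: np.ndarray, otf: np.ndarray) -> np.ndarray:
    """Circular convolution via the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(img) * otf))


def richardson_lucy(blurred: np.ndarray, psf: np.ndarray, iters: int = 50, eps: float = 1e-12) -> np.ndarray:
    """Non-blind Richardson-Lucy deconvolution (circular boundaries, known PSF)."""
    otf = psf_to_otf(psf, blurred.shape)
    otf_flipped = np.conj(otf)  # FFT of the mirrored PSF, i.e. correlation
    estimate = np.full_like(blurred, blurred.mean())
    for _ in range(iters):
        ratio = blurred / (convolve(estimate, otf) + eps)
        estimate *= convolve(ratio, otf_flipped)
    return estimate


# Hypothetical test scene: bright rectangles on a dim background.
img = np.full((64, 64), 0.1)
img[20:30, 15:40] = 1.0
img[40:55, 35:45] = 0.6
psf = motion_kernel(length=9, angle_deg=30.0)
blurred = convolve(img, psf_to_otf(psf, img.shape))
restored = richardson_lucy(blurred, psf)
```

The multiplicative update keeps the estimate nonnegative, which is one reason Richardson-Lucy is a common choice for the non-blind stage.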
NASA Astrophysics Data System (ADS)
Madokoro, H.; Yamanashi, A.; Sato, K.
2013-08-01
This paper presents an unsupervised scene classification method for realizing semantic recognition of indoor scenes. Background and foreground features are extracted using Gist and color scale-invariant feature transform (SIFT), respectively, as context-based feature representations. We used hue, saturation, and value SIFT (HSV-SIFT) because of its simple algorithm and low calculation cost. Our method creates bags of features by voting visual words created from both feature descriptors into a two-dimensional histogram. Moreover, our method generates labels as category candidates for time-series images while maintaining both stability and plasticity. Automatic labeling of category maps is realized by using labels created with adaptive resonance theory (ART) as teaching signals for counter propagation networks (CPNs). We evaluated our method for semantic scene classification using KTH's image database for robot localization (KTH-IDOL), which is widely used for robot localization and navigation. The mean classification accuracies of Gist, gray SIFT, one-class support vector machines (OC-SVM), position-invariant robust features (PIRF), and our method are 39.7, 58.0, 56.0, 63.6, and 79.4%, respectively. The result of our method is 15.8 percentage points higher than that of PIRF. Moreover, we applied our method to fine classification using our original mobile robot and obtained a mean classification accuracy of 83.2% for six zones.
Is moral beauty different from facial beauty? Evidence from an fMRI study
Wang, Tingting; Mo, Ce; Tan, Li Hai; Cant, Jonathan S.; Zhong, Luojin; Cupchik, Gerald
2015-01-01
Is moral beauty different from facial beauty? Two functional magnetic resonance imaging experiments were performed to answer this question. Experiment 1 investigated the network of moral aesthetic judgments and facial aesthetic judgments. Participants performed aesthetic judgments and gender judgments on both faces and scenes containing moral acts. The conjunction analysis of the contrasts ‘facial aesthetic judgment > facial gender judgment’ and ‘scene moral aesthetic judgment > scene gender judgment’ identified the common involvement of the orbitofrontal cortex (OFC), inferior temporal gyrus and medial superior frontal gyrus, suggesting that both types of aesthetic judgments are based on the orchestration of perceptual, emotional and cognitive components. Experiment 2 examined the network of facial beauty and moral beauty during implicit perception. Participants performed a non-aesthetic judgment task on both faces (beautiful vs common) and scenes (containing morally beautiful vs neutral information). We observed that facial beauty (beautiful faces > common faces) involved both the cortical reward region OFC and the subcortical reward region putamen, whereas moral beauty (moral beauty scenes > moral neutral scenes) only involved the OFC. Moreover, compared with facial beauty, moral beauty spanned a larger-scale cortical network, indicating more advanced and complex cerebral representations characterizing moral beauty. PMID:25298010
Common and Innovative Visuals: A sparsity modeling framework for video.
Abdolhosseini Moghadam, Abdolreza; Kumar, Mrityunjay; Radha, Hayder
2014-05-02
Efficient video representation models are critical for many video analysis and processing tasks. In this paper, we present a framework based on the concept of finding the sparsest solution to model video frames. To model the spatio-temporal information, frames from one scene are decomposed into two components: (i) a common frame, which describes the visual information common to all the frames in the scene/segment, and (ii) a set of innovative frames, which depicts the dynamic behaviour of the scene. The proposed approach exploits and builds on recent results in the field of compressed sensing to jointly estimate the common frame and the innovative frames for each video segment. We refer to the proposed modeling framework as CIV (Common and Innovative Visuals). We show how the proposed model can be utilized to find scene change boundaries and extend CIV to videos from multiple scenes. Furthermore, the proposed model is robust to noise and can be used for various video processing applications without relying on motion estimation and detection or image segmentation. Results for object tracking, video editing (object removal, inpainting) and scene change detection are presented to demonstrate the efficiency and the performance of the proposed model.
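A simplified stand-in for the common/innovative split is an L1-regularized decomposition, minimizing sum_t 0.5 * ||F_t - C - S_t||^2 + lam * sum_t ||S_t||_1, solved by block-coordinate descent: given the innovations, the best common frame C is the mean of the frames with innovations removed; given C, each innovation frame S_t is a soft-thresholded residual. This is not the compressed-sensing joint recovery used for CIV, and `lam` is an assumed tuning parameter.

```python
import numpy as np


def soft_threshold(x: np.ndarray, lam: float) -> np.ndarray:
    """Proximal operator of lam * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def common_innovative(frames: np.ndarray, lam: float = 0.1, iters: int = 50):
    """Split frames (T, H, W) into one common frame and T sparse innovation frames."""
    common = np.median(frames, axis=0)  # robust initial guess for the static content
    for _ in range(iters):
        innov = soft_threshold(frames - common, lam)  # sparse per-frame changes
        common = (frames - innov).mean(axis=0)  # least-squares common frame
    innov = soft_threshold(frames - common, lam)
    return common, innov


# Hypothetical segment: a static random background with a bright square moving right.
rng = np.random.default_rng(0)
background = rng.random((16, 16))
frames = np.repeat(background[None], 5, axis=0)
for t in range(5):
    frames[t, 4:7, 2 + 2 * t:5 + 2 * t] += 2.0  # the square moves 2 px per frame
common, innov = common_innovative(frames, lam=0.1)
```

In this synthetic example the recovered common frame stays close to the static background, and each innovation frame is nonzero only on the 3 × 3 moving square.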
Neural representations of contextual guidance in visual search of real-world scenes.
Preston, Tim J; Guo, Fei; Das, Koel; Giesbrecht, Barry; Eckstein, Miguel P
2013-05-01
Exploiting scene context and object-object co-occurrence is critical in guiding eye movements and facilitating visual search, yet the mediating neural mechanisms are unknown. We used functional magnetic resonance imaging while observers searched for target objects in scenes and used multivariate pattern analyses (MVPA) to show that the lateral occipital complex (LOC) can predict the coarse spatial location of observers' expectations about the likely location of 213 different targets absent from the scenes. In addition, we found weaker but significant representations of context location in an area related to the orienting of attention (intraparietal sulcus, IPS) as well as a region related to scene processing (retrosplenial cortex, RSC). Importantly, the degree of agreement among 100 independent raters about the likely location to contain a target object in a scene correlated with LOC's ability to predict the contextual location, while weaker but significant effects were found in IPS, RSC, the human motion area, and early visual areas (V1, V3v). When contextual information was made irrelevant to observers' behavioral task, MVPA of activity in LOC and the other areas ceased to predict the location of context. Thus, our findings suggest that the likely locations of targets in scenes are represented in various visual areas, with LOC playing a key role in contextual guidance during visual search of objects in real scenes.
Blind subjects construct conscious mental images of visual scenes encoded in musical form.
Cronly-Dillon, J; Persaud, K C; Blore, R
2000-01-01
Blind (previously sighted) subjects are able to analyse, describe and graphically represent a number of high-contrast visual images translated into musical form de novo. We presented musical transforms of a random assortment of photographic images of objects and urban scenes to such subjects, a few of which depicted architectural and other landmarks that may be useful in navigating a route to a particular destination. Our blind subjects were able to use the sound representation to construct a conscious mental image that was revealed by their ability to depict a visual target by drawing it. We noted the similarity between the way the visual system integrates information from successive fixations to form a representation that is stable across eye movements and the way a succession of image frames (encoded in sound) which depict different portions of the image are integrated to form a seamless mental image. Finally, we discuss the profound resemblance between the way a professional musician carries out a structural analysis of a musical composition in order to relate its structure to the perception of musical form and the strategies used by our blind subjects in isolating structural features that collectively reveal the identity of visual form. PMID:11413637
Mental Layout Extrapolations Prime Spatial Processing of Scenes
ERIC Educational Resources Information Center
Gottesman, Carmela V.
2011-01-01
Four experiments examined whether scene processing is facilitated by layout representation, including layout that was not perceived but could be predicted based on a previous partial view (boundary extension). In a priming paradigm (after Sanocki, 2003), participants judged objects' distances in photographs. In Experiment 1, full scenes (target),…
Greene, Michelle R; Baldassano, Christopher; Fei-Fei, Li; Beck, Diane M; Baker, Chris I
2018-01-01
Inherent correlations between visual and semantic features in real-world scenes make it difficult to determine how different scene properties contribute to neural representations. Here, we assessed the contributions of multiple properties to scene representation by partitioning the variance explained in human behavioral and brain measurements by three feature models whose inter-correlations were minimized a priori through stimulus preselection. Behavioral assessments of scene similarity reflected unique contributions from a functional feature model indicating potential actions in scenes as well as high-level visual features from a deep neural network (DNN). In contrast, similarity of cortical responses in scene-selective areas was uniquely explained by mid- and high-level DNN features only, while an object label model did not contribute uniquely to either domain. The striking dissociation between functional and DNN features in their contribution to behavioral and brain representations of scenes indicates that scene-selective cortex represents only a subset of behaviorally relevant scene information. PMID:29513219
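Variance partitioning of this kind is often computed from nested regressions: the unique contribution of a model is the drop in R² when that model is left out of the full fit. A minimal sketch with synthetic data, in which the response stands in for a behavioral or neural similarity measure; the model names mirror the abstract, but the data and the ordinary-least-squares setup are assumptions:

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an ordinary least-squares fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()


def unique_variance(models: dict, y: np.ndarray) -> dict:
    """Unique R^2 of each model: R^2(all models) - R^2(all models except this one)."""
    full = r_squared(np.column_stack(list(models.values())), y)
    out = {}
    for name in models:
        others = [m for n, m in models.items() if n != name]
        out[name] = full - r_squared(np.column_stack(others), y)
    return out


# Synthetic stand-ins for the three feature models (one predictor each).
rng = np.random.default_rng(0)
n = 200
models = {
    "functions": rng.normal(size=n),
    "dnn": rng.normal(size=n),
    "objects": rng.normal(size=n),
}
y = 2.0 * models["functions"] + 1.0 * models["dnn"] + rng.normal(size=n)
print(unique_variance(models, y))
```

Shared variance between models is what the study's stimulus preselection tries to minimize; with correlated models, much of the explained variance would fall into shared rather than unique partitions.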
Groen, Iris Ia; Greene, Michelle R; Baldassano, Christopher; Fei-Fei, Li; Beck, Diane M; Baker, Chris I
2018-03-07
NASA Astrophysics Data System (ADS)
Luo, Chang; Wang, Jie; Feng, Gang; Xu, Suhui; Wang, Shiqiang
2017-10-01
Deep convolutional neural networks (CNNs) have been widely used to obtain high-level representations in various computer vision tasks. However, for remote scene classification, there are not enough images to train a very deep CNN from scratch. Based on two viewpoints on generalization power, we propose two promising kinds of deep CNNs for remote scenes and investigate whether deep CNNs need to be deep for remote scene classification. First, we transfer successful pretrained deep CNNs to remote scenes, based on the theory that the depth of CNNs provides generalization power by learning available hypotheses for finite data samples. Second, following the opposite viewpoint that the generalization power of deep CNNs comes from massive memorization and that shallow CNNs with enough neural nodes have perfect finite-sample expressivity, we design a lightweight deep CNN (LDCNN) for remote scene classification. With five well-known pretrained deep CNNs, experimental results on two independent remote-sensing datasets demonstrate that transferred deep CNNs can achieve state-of-the-art results in an unsupervised setting. However, because of its shallow architecture, LDCNN cannot obtain satisfactory performance in an unsupervised, semisupervised, or supervised setting. CNNs really do need depth to obtain general features for remote scenes. This paper also provides a baseline for applying deep CNNs to other remote sensing tasks.
Kumar, Manoj; Federmeier, Kara D; Fei-Fei, Li; Beck, Diane M
2017-07-15
A long-standing core question in cognitive science is whether different modalities and representation types (pictures, words, sounds, etc.) access a common store of semantic information. Although different input types have been shown to activate a shared network of brain regions, this does not necessarily imply a common representation, as the neurons in these regions could still process the different modalities differently. However, multi-voxel pattern analysis can be used to assess whether, e.g., pictures and words evoke a similar pattern of activity, such that the patterns that separate categories in one modality transfer to the other. Prior work using this method has found support for a common code, but it has two limitations: studies have either examined only disparate categories (e.g., animals vs. tools) that are known to activate different brain regions, raising the possibility that the pattern separation and inferred similarity reflect only large-scale differences between the categories, or they have been limited to individual object representations. By using natural scene categories, we not only extend the current literature on cross-modal representations beyond objects but also, because natural scene categories activate a common set of brain regions, identify a more fine-grained (i.e., higher spatial resolution) common representation. Specifically, we studied picture- and word-based representations of natural scene stimuli from four different categories: beaches, cities, highways, and mountains. Participants passively viewed blocks of either phrases (e.g., "sandy beach") describing scenes or photographs from those same scene categories. To determine whether the phrases and pictures evoke a common code, we asked whether a classifier trained on one stimulus type (e.g., phrase stimuli) would transfer (i.e., cross-decode) to the other stimulus type (e.g., picture stimuli).
The analysis revealed cross-decoding in the occipitotemporal, posterior parietal and frontal cortices. This similarity of neural activity patterns across the two input types, for categories that co-activate local brain regions, provides strong evidence of a common semantic code for pictures and words in the brain. Copyright © 2017 Elsevier Inc. All rights reserved.
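Cross-decoding can be sketched as training a classifier on one stimulus type and scoring it on the other. The classifier here is a correlation-based nearest-centroid rule chosen for brevity, not necessarily the one used in the study, and the voxel patterns are simulated under the assumption of a category code shared across modalities plus a modality-specific offset:

```python
import numpy as np


def train_centroids(X, y):
    """Mean activity pattern per category (nearest-centroid classifier)."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}


def predict(centroids, X):
    """Label each pattern with the category whose centroid it correlates with most."""
    labels = list(centroids)
    preds = []
    for x in X:
        r = [np.corrcoef(x, centroids[c])[0, 1] for c in labels]
        preds.append(labels[int(np.argmax(r))])
    return np.array(preds)


def simulate(shared, offset, rng, per_class=10):
    """Hypothetical voxel patterns: shared category code + modality offset + noise."""
    y = np.repeat(np.arange(len(shared)), per_class)
    X = shared[y] + offset + rng.normal(size=(len(y), shared.shape[1]))
    return X, y


rng = np.random.default_rng(0)
shared = rng.normal(size=(4, 100))  # beaches, cities, highways, mountains
X_phr, y_phr = simulate(shared, rng.normal(size=100), rng)  # phrase blocks
X_pic, y_pic = simulate(shared, rng.normal(size=100), rng)  # picture blocks
clf = train_centroids(X_phr, y_phr)  # train on phrases only
accuracy = (predict(clf, X_pic) == y_pic).mean()  # test on pictures; chance is 0.25
print(accuracy)
```

Above-chance accuracy in this setup arises only because the simulated category code is shared across modalities; with independent codes per modality, accuracy would fall to chance.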
The new generation of OpenGL support in ROOT
NASA Astrophysics Data System (ADS)
Tadel, M.
2008-07-01
OpenGL has been promoted to become the main 3D rendering engine of the ROOT framework. This required a major re-modularization of OpenGL support on all levels, from the basic window-system-specific interface to medium-level object representation and top-level scene management. This new architecture allows seamless integration of external scene-graph libraries into the ROOT OpenGL viewer as well as inclusion of ROOT 3D scenes into external GUI and OpenGL-based 3D-rendering frameworks. Scene representation was removed from the viewer, allowing scene data to be shared among several viewers and providing a natural implementation of multi-view canvas layouts. The object-graph traversal infrastructure allows free mixing of 3D and 2D-pad graphics and makes implementation of ROOT canvas in pure OpenGL possible. Scene-elements representing ROOT objects trigger automatic instantiation of user-provided rendering-objects based on the dictionary information and class-naming convention. Additionally, a finer, per-object control over scene-updates is available to the user, allowing overhead-free maintenance of dynamic 3D scenes and creation of complex real-time animations. User-input handling was modularized as well, making it easy to support application-specific scene navigation, selection handling and tool management.
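The decoupling described above (scene data held outside the viewer, renderers chosen by class-naming convention, per-object updates) can be illustrated with a small Python model. This is not ROOT's C++ API; the registry, the `<ClassName>GL` lookup with fallback along the class hierarchy, and all class names are hypothetical.

```python
RENDERERS = {}  # hypothetical registry: renderer class name -> renderer class


def renderer(cls):
    """Class decorator that registers a user-provided renderer."""
    RENDERERS[cls.__name__] = cls
    return cls


def find_renderer(obj):
    """Instantiate '<ClassName>GL', falling back along the class hierarchy."""
    for klass in type(obj).__mro__:
        gl = RENDERERS.get(klass.__name__ + "GL")
        if gl is not None:
            return gl(obj)
    raise LookupError(f"no renderer registered for {type(obj).__name__}")


class Scene:
    """Scene data kept outside any viewer, so several viewers can share it."""

    def __init__(self):
        self.elements = []
        self.versions = {}  # id(element) -> change counter

    def add(self, obj):
        self.elements.append(obj)
        self.versions[id(obj)] = 0

    def mark_changed(self, obj):
        """Per-object update: only this element will be rebuilt by viewers."""
        self.versions[id(obj)] += 1


class Viewer:
    def __init__(self):
        self.scenes = []
        self.cache = {}  # id(element) -> (version, renderer)
        self.rebuilds = 0

    def draw(self):
        for scene in self.scenes:
            for obj in scene.elements:
                version = scene.versions[id(obj)]
                cached = self.cache.get(id(obj))
                if cached is None or cached[0] != version:
                    self.cache[id(obj)] = (version, find_renderer(obj))
                    self.rebuilds += 1


# Usage with made-up scene classes.
class PointSet3D:
    pass


class Marker(PointSet3D):  # no MarkerGL exists, so PointSet3DGL is used
    pass


@renderer
class PointSet3DGL:
    def __init__(self, obj):
        self.obj = obj


scene = Scene()
a, b = PointSet3D(), Marker()
scene.add(a)
scene.add(b)
v1, v2 = Viewer(), Viewer()
v1.scenes.append(scene)
v2.scenes.append(scene)  # the same scene is shown in two viewers
v1.draw()
v2.draw()
scene.mark_changed(b)
v1.draw()
print(v1.rebuilds, v2.rebuilds)  # 3 2
```

After one element changes, only that element is rebuilt in the viewer that redraws, and the other viewer keeps its cached renderers until it draws again.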
Hyperspectral image denoising and anomaly detection based on low-rank and sparse representations
NASA Astrophysics Data System (ADS)
Zhuang, Lina; Gao, Lianru; Zhang, Bing; Bioucas-Dias, José M.
2017-10-01
The very high spectral resolution of hyperspectral images (HSIs) enables the identification of materials with subtle differences and the extraction of subpixel information. However, increasing spectral resolution often increases the noise associated with the image formation process. This degradation limits the quality of the extracted information and its potential applications. Since HSIs represent natural scenes and their spectral channels are highly correlated, they are characterized by a high level of self-similarity and are well approximated by low-rank representations. These characteristics underlie the state of the art in HSI denoising. However, in the presence of rare pixels, the denoising performance of those methods is not optimal and, in addition, it may compromise the later detection of those pixels. To address these hurdles, we introduce RhyDe (Robust hyperspectral Denoising), a powerful HSI denoiser that implements an explicit low-rank representation, promotes self-similarity, and, by using a form of collaborative sparsity, preserves rare pixels. The denoising and detection effectiveness of the proposed robust HSI denoiser is illustrated using semi-real data.
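The low-rank property that underlies these denoisers can be illustrated by projecting every pixel spectrum onto the leading spectral subspace. This minimal sketch is a plain truncated-SVD projection, not RhyDe: it has no self-similarity or collaborative-sparsity terms, so it is exactly the kind of method that can wash out rare pixels. The synthetic cube and the rank are assumptions.

```python
import numpy as np


def lowrank_denoise(cube: np.ndarray, rank: int) -> np.ndarray:
    """Project each pixel spectrum of an (H, W, B) cube onto the top-`rank` spectral subspace."""
    h, w, b = cube.shape
    X = cube.reshape(-1, b)  # pixels x bands
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    E = vt[:rank].T  # B x rank basis of the signal subspace
    Z = (X - mean) @ E  # low-dimensional coefficients per pixel
    return (Z @ E.T + mean).reshape(h, w, b)


# Hypothetical scene: 3 material spectra mixed with random abundances, plus Gaussian noise.
rng = np.random.default_rng(0)
endmembers = rng.random((3, 50))
abundances = rng.dirichlet(np.ones(3), size=(32, 32))
clean = abundances @ endmembers  # (32, 32, 50), spectrally low-rank
noisy = clean + rng.normal(scale=0.05, size=clean.shape)
denoised = lowrank_denoise(noisy, rank=3)
```

Because the noise is spread over all 50 bands while the signal lives in a few dimensions, the projection removes most of the noise energy.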
Recognition of 3-D Scene with Partially Occluded Objects
NASA Astrophysics Data System (ADS)
Lu, Siwei; Wong, Andrew K. C...
1987-03-01
This paper presents a robot vision system which is capable of recognizing objects in a 3-D scene and interpreting their spatial relation even though some objects in the scene may be partially occluded by other objects. An algorithm is developed to transform the geometric information from the range data into an attributed hypergraph representation (AHR). A hypergraph monomorphism algorithm is then used to compare the AHR of objects in the scene with a set of complete AHRs of prototypes. The capability of identifying connected components and interpreting various types of edges in the 3-D scene enables us to distinguish objects which are partially blocking each other in the scene. Using structural information stored in the primitive area graph, a heuristic hypergraph monomorphism algorithm provides an effective way of recognizing, locating, and interpreting partially occluded objects in the range image.
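The matching step can be illustrated with a backtracking search for attribute-preserving monomorphisms. As a simplification, this sketch uses ordinary attributed graphs (faces as nodes, face adjacencies as edges) rather than hypergraphs, and it enumerates all matches instead of using the paper's heuristic; the attribute names are made up. Because a monomorphism only requires the scene's edges to exist in the prototype, a partially occluded fragment can still match a complete prototype:

```python
def monomorphisms(pattern_nodes, pattern_edges, target_nodes, target_edges):
    """Yield injective node maps pattern -> target that preserve node and edge attributes.

    *_nodes: dict node -> attribute; *_edges: dict frozenset({u, v}) -> attribute.
    """
    order = list(pattern_nodes)

    def consistent(mapping, p, t):
        # Node attributes must agree and each target node may be used only once.
        if pattern_nodes[p] != target_nodes[t] or t in mapping.values():
            return False
        # Every pattern edge to an already-mapped node must exist with the same attribute.
        for q, tq in mapping.items():
            e = pattern_edges.get(frozenset((p, q)))
            if e is not None and target_edges.get(frozenset((t, tq))) != e:
                return False
        return True

    def extend(mapping):
        if len(mapping) == len(order):
            yield dict(mapping)
            return
        p = order[len(mapping)]
        for t in target_nodes:
            if consistent(mapping, p, t):
                mapping[p] = t
                yield from extend(mapping)
                del mapping[p]

    yield from extend({})


# Hypothetical prototypes.
box_nodes = {"top": "planar", "front": "planar", "side": "planar"}
box_edges = {frozenset(p): "convex" for p in [("top", "front"), ("top", "side"), ("front", "side")]}
cyl_nodes = {"cap": "planar", "body": "curved"}
cyl_edges = {frozenset(("cap", "body")): "convex"}

# Scene fragment: two visible planar faces meeting at a convex edge (the rest is occluded).
frag_nodes = {"f1": "planar", "f2": "planar"}
frag_edges = {frozenset(("f1", "f2")): "convex"}

print(len(list(monomorphisms(frag_nodes, frag_edges, box_nodes, box_edges))))  # 6
print(len(list(monomorphisms(frag_nodes, frag_edges, cyl_nodes, cyl_edges))))  # 0
```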
Riecke, Lars; Peters, Judith C; Valente, Giancarlo; Kemper, Valentin G; Formisano, Elia; Sorger, Bettina
2017-05-01
A sound of interest may be tracked amid other salient sounds by focusing attention on its characteristic features, including its frequency. Functional magnetic resonance imaging findings have indicated that frequency representations in human primary auditory cortex (AC) contribute to this feat. However, attentional modulations were examined at relatively low spatial and spectral resolutions, and frequency-selective contributions outside the primary AC could not be established. To address these issues, we compared blood oxygenation level-dependent (BOLD) responses in the superior temporal cortex of human listeners while they identified single frequencies versus listened selectively for various frequencies within a multifrequency scene. Using best-frequency mapping, we observed that the detailed spatial layout of attention-induced BOLD response enhancements in primary AC follows the tonotopy of stimulus-driven frequency representations, analogous to the "spotlight" of attention enhancing visuospatial representations in retinotopic visual cortex. Moreover, using an algorithm trained to discriminate stimulus-driven frequency representations, we could successfully decode the focus of frequency-selective attention from listeners' BOLD response patterns in nonprimary AC. Our results indicate that the human brain facilitates selective listening to a frequency of interest in a scene by reinforcing the fine-grained activity pattern throughout the entire superior temporal cortex that would be evoked if that frequency were present alone. © The Author 2016. Published by Oxford University Press. All rights reserved.
Sparsity based target detection for compressive spectral imagery
NASA Astrophysics Data System (ADS)
Boada, David Alberto; Arguello Fuentes, Henry
2016-09-01
Hyperspectral imagery provides significant information about the spectral characteristics of objects and materials present in a scene. It enables object and feature detection, classification, or identification based on the acquired spectral characteristics. However, it relies on sophisticated acquisition and data processing systems able to acquire, process, store, and transmit hundreds or thousands of image bands from a given area of interest, which demands enormous computational resources in terms of storage, computation, and I/O throughput. Specialized optical architectures have been developed for the compressed acquisition of spectral images using a reduced set of coded measurements, in contrast to traditional architectures that need a complete set of measurements of the data cube for image acquisition, thereby addressing the storage and acquisition limitations. Despite this improvement, if any processing is desired, the image has to be reconstructed by an inverse algorithm before it can be processed, which is also an expensive task. In this paper, a sparsity-based algorithm for target detection in compressed spectral images is presented. Specifically, the target detection model adapts a sparsity-based target detector to work in a compressive domain, modifying the sparse representation basis in the compressive sensing problem by means of over-complete training dictionaries and a wavelet basis representation. Simulations show that the presented method can achieve even better detection results than state-of-the-art methods.
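A common form of sparsity-based detection codes a pixel over the union of background and target dictionaries and compares the two class-wise reconstruction residuals. The sketch below applies that rule directly to compressive measurements by mapping the dictionaries through the sensing matrix. It assumes a random Gaussian sensing matrix in place of the optical coding, random dictionaries, and orthogonal matching pursuit as the sparse solver; the wavelet-basis component mentioned above is omitted, and all names are hypothetical.

```python
import numpy as np


def omp(D, y, k):
    """Orthogonal matching pursuit: greedy k-sparse approximation of y over the columns of D."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with the residual
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x


def compressive_detect(y, Phi, A_b, A_t, k=3):
    """Residual-difference detector computed directly on measurements y = Phi @ pixel."""
    D = Phi @ np.hstack([A_b, A_t])  # dictionaries mapped to the measurement domain
    norms = np.linalg.norm(D, axis=0)
    x = omp(D / norms, y, k) / norms  # coefficients w.r.t. the unnormalized D
    nb = A_b.shape[1]
    r_background = np.linalg.norm(y - D[:, :nb] @ x[:nb])
    r_target = np.linalg.norm(y - D[:, nb:] @ x[nb:])
    return r_background - r_target  # large positive values suggest a target


rng = np.random.default_rng(0)
bands, meas = 60, 30
A_b = rng.normal(size=(bands, 10))  # hypothetical background dictionary
A_t = rng.normal(size=(bands, 5))  # hypothetical target dictionary
Phi = rng.normal(size=(meas, bands)) / np.sqrt(meas)  # stand-in sensing matrix
target_pixel = A_t[:, 0]
background_pixel = 0.6 * A_b[:, 1] + 0.4 * A_b[:, 4]
print(compressive_detect(Phi @ target_pixel, Phi, A_b, A_t))
print(compressive_detect(Phi @ background_pixel, Phi, A_b, A_t))
```

The point of working with `Phi @ A` is that no spectral cube is ever reconstructed: the detector only needs the 30 measurements per pixel.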
NASA Astrophysics Data System (ADS)
Wong, Erwin
2000-03-01
Traditional methods of linear-based imaging limit the viewer to a single fixed-point perspective. By means of a single-lens, multiple-perspective mirror system, a 360-degree representation of the area around the camera is reconstructed. This reconstruction is used to overcome the limitations of a traditional camera by providing the viewer with many different perspectives. By constructing the mirror as a hemispherical surface with multiple focal lengths at various diameters, and by placing a parabolic mirror overhead, a stereoscopic image can be extracted from the image captured by a high-resolution camera placed beneath the mirror. Image extraction and correction are performed by computer processing of the image obtained by the camera; the image presents up to five distinguishable viewpoints from which a computer can extrapolate pseudo-perspective data. Geometry and depth of field can be extrapolated by comparing and isolating objects within a virtual scene post-processed by the computer. Combining these data with scene-rendering software gives the viewer the ability to choose a desired viewing position, multiple dynamic perspectives, and virtually constructed perspectives based on minimal existing data. An examination of the workings of the mirror relay system is provided, including possible image extrapolation and correction methods. The generation of virtual, interpolated, and constructed data is also discussed.
NASA Astrophysics Data System (ADS)
da Silva, Nuno Pinho; Marques, Manuel; Carneiro, Gustavo; Costeira, João P.
2011-03-01
Painted tile panels (Azulejos) are one of the most representative Portuguese art forms. Most of these panels are inspired by, and sometimes are literal copies of, famous paintings or prints of those paintings. In order to study the Azulejos, art historians need to trace these roots. To do so, they manually search art image databases for images similar to the representation on the tile panel. This is an overwhelming task that should be automated as much as possible. Among several cues, the pose of humans and the general composition of people in a scene are quite discriminative. We build an image descriptor that combines the kinematic chain of each character with contextual information about their composition in the scene. Given a query image, our system computes its similarity profile over the database. Using nearest neighbors in the space of the descriptors, the proposed system retrieves the prints that most likely inspired the tiles' work.
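The retrieval step (a similarity profile over the database, then nearest neighbors in descriptor space) can be sketched as below. This assumes each image has already been reduced to a fixed-length descriptor vector and that Euclidean distance is the comparison measure; the function names and the toy database are hypothetical, and the pose-and-composition descriptor itself is not reproduced.

```python
import math


def similarity_profile(query, database):
    """Distances from the query descriptor to every database descriptor, nearest first.

    `database` maps an image id to a fixed-length descriptor (assumed precomputed).
    """
    return sorted((math.dist(query, desc), image_id) for image_id, desc in database.items())


def retrieve(query, database, k=3):
    """Ids of the k nearest neighbours of the query in descriptor space."""
    return [image_id for _, image_id in similarity_profile(query, database)[:k]]


prints = {"print_a": [0.0, 0.0], "print_b": [1.0, 1.0], "print_c": [5.0, 5.0]}
print(retrieve([0.9, 1.1], prints, k=2))  # ['print_b', 'print_a']
```

For large collections a spatial index would replace the full sort, but the ranking it returns is the same.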
Dependence of Adaptive Cross-correlation Algorithm Performance on the Extended Scene Image Quality
NASA Technical Reports Server (NTRS)
Sidick, Erkin
2008-01-01
Recently, we reported an adaptive cross-correlation (ACC) algorithm that estimates, with high accuracy, shifts as large as several pixels between two extended-scene sub-images captured by a Shack-Hartmann wavefront sensor. It determines the positions of all extended-scene image cells relative to a reference cell in the same frame using an FFT-based iterative image-shifting algorithm. It works with both point-source spot images and extended-scene images. We have previously demonstrated, based on measured images, that the ACC algorithm can determine image shifts with an accuracy as high as 0.01 pixel for shifts as large as 3 pixels, and that it yields similar results for point-source spot images and extended-scene images. The shift-estimate accuracy of the ACC algorithm depends on illumination level, background, and scene content, in addition to the amount of shift between two image cells. In this paper we investigate how the performance of the ACC algorithm depends on the quality and the frequency content of extended-scene images captured by a Shack-Hartmann camera. We also compare the performance of the ACC algorithm with those of several other approaches, and introduce a failsafe criterion for ACC-based extended-scene Shack-Hartmann sensors.
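The basic ingredient, locating a cross-correlation peak and refining it to sub-pixel precision, can be sketched in one dimension. This is a simplified stand-in, not the ACC algorithm: it computes the correlation directly in the spatial domain rather than with FFTs, does not iterate the image shifting, and uses a three-point Gaussian (log-parabola) peak fit as an assumed refinement rule. Names are hypothetical.

```python
import math


def cross_correlation(ref, img, max_shift):
    """c[k] = sum_n ref[n] * img[n + k] for integer shifts -max_shift..max_shift."""
    n = len(ref)
    return {k: sum(ref[i] * img[i + k] for i in range(n) if 0 <= i + k < n)
            for k in range(-max_shift, max_shift + 1)}


def estimate_shift(ref, img, max_shift=5):
    """Sub-pixel estimate of s such that img[n] ~= ref[n - s]."""
    c = cross_correlation(ref, img, max_shift)
    k0 = max(c, key=c.get)
    if abs(k0) == max_shift:
        return float(k0)  # peak on the search boundary: no neighbours to refine with
    left, mid, right = c[k0 - 1], c[k0], c[k0 + 1]
    if min(left, mid, right) > 0:
        # Fitting a parabola to log values is exact for a Gaussian-shaped peak.
        left, mid, right = math.log(left), math.log(mid), math.log(right)
    denom = left - 2.0 * mid + right
    return k0 + (0.5 * (left - right) / denom if denom != 0 else 0.0)


def spot(centre, length=64, sigma=3.0):
    """A sampled Gaussian spot, standing in for a point-source sub-image."""
    return [math.exp(-((i - centre) ** 2) / (2 * sigma * sigma)) for i in range(length)]


print(round(estimate_shift(spot(30.0), spot(32.3)), 3))  # 2.3
```

For textured extended-scene cells the correlation peak is no longer Gaussian, which is one reason accuracy depends on scene content as the abstract notes.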
Generating virtual training samples for sparse representation of face images and face recognition
NASA Astrophysics Data System (ADS)
Du, Yong; Wang, Yu
2016-03-01
There are many challenges in face recognition. In real-world scenes, images of the same face vary with changing illumination, different expressions and poses, various ornaments, or even altered mental states. Limited training samples cannot sufficiently convey these possible changes during training, and this has become one of the obstacles to improving face recognition accuracy. In this article, we treat the product of two face images as a virtual face image to expand the training set, and we devise a representation-based method to perform face recognition. The generated virtual samples reflect some possible appearance and pose variations of the face. By multiplying a training sample with another sample from the same subject, we can strengthen the facial contour feature and greatly suppress noise, so that more essential facial information is retained. In addition, the uncertainty of the training data is reduced as the number of training samples increases, which benefits the training phase. The devised representation-based classifier uses both the original and the newly generated samples to perform classification. In the classification phase, we first determine the K training samples nearest to the current test sample by calculating the Euclidean distances between the test sample and the training samples. Then, a linear combination of these selected training samples is used to represent the test sample, and the representation result is used to classify it. The experimental results show that the proposed method outperforms some state-of-the-art face recognition methods.
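The two ingredients described above, virtual samples built from same-subject pairs and a K-nearest-then-linear-combination classifier, can be sketched as follows. Several details are assumptions, not taken from the paper: "multiplication" is read as a pixel-wise product of flattened images, virtual samples are rescaled to unit norm, the linear combination is fitted by ridge-regularized least squares with a hypothetical parameter `mu`, and the class decision uses the per-class reconstruction residual.

```python
import math


def _dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def virtual_sample(a, b):
    """Pixel-wise product of two flattened same-subject images, rescaled to unit norm."""
    v = [x * y for x, y in zip(a, b)]
    norm = math.sqrt(_dot(v, v)) or 1.0
    return [x / norm for x in v]


def _solve(a, b):
    """Solve the square system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def classify(test, samples, labels, k=4, mu=0.01):
    """Two-phase representation classifier over original plus virtual samples."""
    # Phase 1: keep the k training samples nearest to the test sample (Euclidean).
    idx = sorted(range(len(samples)), key=lambda i: math.dist(test, samples[i]))[:k]
    chosen = [samples[i] for i in idx]
    # Phase 2: ridge least squares, min ||test - sum_j b_j x_j||^2 + mu ||b||^2.
    gram = [[_dot(u, v) + (mu if i == j else 0.0) for j, v in enumerate(chosen)]
            for i, u in enumerate(chosen)]
    coef = _solve(gram, [_dot(u, test) for u in chosen])

    def class_error(cls):
        # How well this class's share of the combination rebuilds the test sample.
        recon = [0.0] * len(test)
        for c, i, x in zip(coef, idx, chosen):
            if labels[i] == cls:
                recon = [r + c * xv for r, xv in zip(recon, x)]
        return math.dist(test, recon)

    return min(sorted({labels[i] for i in idx}), key=class_error)


train = [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.9, 0.1]]
labels = ["A", "A", "B", "B"]
train += [virtual_sample(train[0], train[1]), virtual_sample(train[2], train[3])]
labels += ["A", "B"]
print(classify([0.95, 0.05, 0.0, 0.0], train, labels))  # A
```

The ridge term keeps the normal equations solvable even when a virtual sample duplicates an original one, as happens in this toy data.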
Dynamic Textures Modeling via Joint Video Dictionary Learning.
Wei, Xian; Li, Yuanxiang; Shen, Hao; Chen, Fang; Kleinsteuber, Martin; Wang, Zhongfeng
2017-04-06
Video representation is an important and challenging task in the computer vision community. In this paper, we consider the problem of modeling and classifying video sequences of dynamic scenes that can be modeled in a dynamic textures (DT) framework. First, we assume that the image frames of a moving scene can be modeled as a Markov random process. We propose a sparse coding framework, named joint video dictionary learning (JVDL), to model a video adaptively. By treating the sparse coefficients of image frames over a learned dictionary as the underlying "states", we learn an efficient and robust linear transition matrix between the sparse codes of adjacent frames in the time series. Hence, a dynamic scene sequence is represented by an appropriate transition matrix associated with a dictionary. To ensure the stability of JVDL, we impose several constraints on the transition matrix and the dictionary. The developed framework captures the dynamics of a moving scene by exploiting both the sparse properties and the temporal correlations of consecutive video frames. Moreover, the learned JVDL parameters can be used for various DT applications, such as DT synthesis and recognition. Experimental results demonstrate the strong competitiveness of the proposed JVDL approach in comparison with state-of-the-art video representation methods. In particular, it performs significantly better at DT synthesis and recognition on heavily corrupted data.
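Once sparse codes are available, the "states" view reduces to fitting a linear map between consecutive codes. The sketch below shows only that step, assuming the codes are already given and using an unconstrained ridge least-squares fit with a hypothetical regularizer `lam`; JVDL's joint dictionary learning and its stability constraints are not reproduced.

```python
def _solve(a, b):
    """Solve the square system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_transition(codes, lam=1e-9):
    """Ridge least-squares A such that codes[t + 1] ~= A @ codes[t].

    A = X2 X1^T (X1 X1^T + lam I)^-1. The Gram matrix G = X1 X1^T + lam I is
    symmetric, so row i of A solves G a_i = (X2 X1^T)[i].
    """
    prev, nxt = codes[:-1], codes[1:]
    d = len(codes[0])
    gram = [[sum(x[j] * x[k] for x in prev) + (lam if j == k else 0.0) for k in range(d)]
            for j in range(d)]
    return [_solve(gram, [sum(y[i] * x[j] for x, y in zip(prev, nxt)) for j in range(d)])
            for i in range(d)]


def predict(a, code):
    """Next-frame sparse code predicted by the transition matrix."""
    return [sum(aij * cj for aij, cj in zip(row, code)) for row in a]


true_a = [[0.9, -0.2], [0.2, 0.9]]
codes = [[1.0, 0.0]]
for _ in range(6):
    codes.append(predict(true_a, codes[-1]))
print([[round(v, 4) for v in row] for row in fit_transition(codes)])  # [[0.9, -0.2], [0.2, 0.9]]
```

With the fitted matrix, synthesis amounts to repeatedly applying `predict` to the last code and decoding each code through the dictionary.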
Automatic event recognition and anomaly detection with attribute grammar by learning scene semantics
NASA Astrophysics Data System (ADS)
Qi, Lin; Yao, Zhenyu; Li, Li; Dong, Junyu
2007-11-01
In this paper we present a novel framework for automatic event recognition and abnormal behavior detection with attribute grammar by learning scene semantics. This framework combines learning scene semantics by trajectory analysis with constructing an attribute grammar-based event representation. The scene and event information is learned automatically. Abnormal behaviors that violate scene semantics or event grammar rules are detected. In this way, an approach to understanding video scenes is achieved. Furthermore, with this prior knowledge, the accuracy of abnormal event detection is increased.
Is moral beauty different from facial beauty? Evidence from an fMRI study.
Wang, Tingting; Mo, Lei; Mo, Ce; Tan, Li Hai; Cant, Jonathan S; Zhong, Luojin; Cupchik, Gerald
2015-06-01
Is moral beauty different from facial beauty? Two functional magnetic resonance imaging experiments were performed to answer this question. Experiment 1 investigated the network of moral aesthetic judgments and facial aesthetic judgments. Participants performed aesthetic judgments and gender judgments on both faces and scenes containing moral acts. The conjunction analysis of the contrasts 'facial aesthetic judgment > facial gender judgment' and 'scene moral aesthetic judgment > scene gender judgment' identified the common involvement of the orbitofrontal cortex (OFC), inferior temporal gyrus and medial superior frontal gyrus, suggesting that both types of aesthetic judgments are based on the orchestration of perceptual, emotional and cognitive components. Experiment 2 examined the network of facial beauty and moral beauty during implicit perception. Participants performed a non-aesthetic judgment task on both faces (beautiful vs common) and scenes (containing morally beautiful vs neutral information). We observed that facial beauty (beautiful faces > common faces) involved both the cortical reward region OFC and the subcortical reward region putamen, whereas moral beauty (moral beauty scenes > moral neutral scenes) only involved the OFC. Moreover, compared with facial beauty, moral beauty spanned a larger-scale cortical network, indicating more advanced and complex cerebral representations characterizing moral beauty. © The Author (2014). Published by Oxford University Press.
A moving observer in a three-dimensional world
2016-01-01
For many tasks such as retrieving a previously viewed object, an observer must form a representation of the world at one location and use it at another. A world-based three-dimensional reconstruction of the scene built up from visual information would fulfil this requirement, something computer vision now achieves with great speed and accuracy. However, I argue that it is neither easy nor necessary for the brain to do this. I discuss biologically plausible alternatives, including the possibility of avoiding three-dimensional coordinate frames such as ego-centric and world-based representations. For example, the distance, slant and local shape of surfaces dictate the propensity of visual features to move in the image with respect to one another as the observer's perspective changes (through movement or binocular viewing). Such propensities can be stored without the need for three-dimensional reference frames. The problem of representing a stable scene in the face of continual head and eye movements is an appropriate starting place for understanding the goal of three-dimensional vision, more so, I argue, than the case of a static binocular observer. This article is part of the themed issue ‘Vision in our three-dimensional world’. PMID:27269608
The neural bases of spatial frequency processing during scene perception
Kauffmann, Louise; Ramanoël, Stephen; Peyrin, Carole
2014-01-01
Theories of visual perception agree that scenes are processed in terms of spatial frequencies. Low spatial frequencies (LSF) carry coarse information, whereas high spatial frequencies (HSF) carry fine details of the scene. However, how and where spatial frequencies are processed within the brain remain unresolved questions. The present review addresses these issues and aims to identify the cerebral regions differentially involved in low and high spatial frequency processing, and to clarify their attributes during scene perception. Results from a number of behavioral and neuroimaging studies suggest that spatial frequency processing is lateralized, with the right and left hemispheres predominantly involved in the categorization of LSF and HSF scenes, respectively. There is also evidence that spatial frequency processing is retinotopically mapped in the visual cortex. HSF scenes (as opposed to LSF) activate occipital areas in relation to foveal representations, while categorization of LSF scenes (as opposed to HSF) activates occipital areas in relation to more peripheral representations. In addition, a number of studies have demonstrated that LSF information may reach high-order areas rapidly, allowing an initial coarse parsing of the visual scene, which could then be sent back through feedback into the occipito-temporal cortex to guide finer HSF-based analysis. Finally, the review addresses spatial frequency processing within scene-selective regions of the occipito-temporal cortex. PMID:24847226
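The LSF/HSF distinction can be made concrete with a simple decomposition: a Gaussian low-pass filter gives the coarse (LSF) image, and the residual carries the fine (HSF) detail. This is a minimal sketch; the cutoff is expressed as a blur width `sigma` in pixels, which is an assumption (the studies reviewed above specify cutoffs in cycles per degree of visual angle), and the function names are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [x / total for x in w]


def _clamp(v, size):
    return min(max(v, 0), size - 1)


def low_pass(img, sigma=2.0):
    """Separable Gaussian blur of a 2-D list image, replicating border pixels."""
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel(sigma, radius)
    h, w = len(img), len(img[0])
    horiz = [[sum(k[j] * img[r][_clamp(c + j - radius, w)] for j in range(len(k)))
              for c in range(w)] for r in range(h)]
    return [[sum(k[j] * horiz[_clamp(r + j - radius, h)][c] for j in range(len(k)))
             for c in range(w)] for r in range(h)]


def split_frequencies(img, sigma=2.0):
    """Return (LSF, HSF): the blurred image and the residual fine detail."""
    lsf = low_pass(img, sigma)
    hsf = [[p - q for p, q in zip(row_i, row_l)] for row_i, row_l in zip(img, lsf)]
    return lsf, hsf


checker = [[float((r + c) % 2) for c in range(16)] for r in range(16)]
lsf, hsf = split_frequencies(checker)
print(round(lsf[8][8], 2))  # the fine checker pattern is averaged away in the LSF image
```

By construction LSF + HSF reproduces the original image, so the two filtered versions used in behavioral studies carry complementary information.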
Estimating pixel variances in the scenes of staring sensors
Simonson, Katherine M [Cedar Crest, NM]; Ma, Tian J [Albuquerque, NM]
2012-01-24
A technique for detecting changes in a scene perceived by a staring sensor is disclosed. The technique includes acquiring a reference image frame and a current image frame of a scene with the staring sensor. A raw difference frame is generated based upon differences between the reference image frame and the current image frame. Pixel error estimates are generated for each pixel in the raw difference frame based at least in part upon spatial error estimates related to spatial intensity gradients in the scene. The pixel error estimates are used to mitigate effects of camera jitter in the scene between the current image frame and the reference image frame.
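The role of gradient-based pixel error estimates can be illustrated with a small change detector. The error model here is an assumption, not the patented formula: each pixel's standard deviation combines a fixed sensor noise term with `jitter` times the local gradient magnitude, on the reasoning that a sub-pixel camera shift changes intensity by roughly jitter times the gradient. Parameter values and names are hypothetical.

```python
import math


def gradient_magnitude(img, r, c):
    """Central-difference gradient magnitude, replicating border pixels."""
    h, w = len(img), len(img[0])
    gx = (img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]) / 2.0
    gy = (img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]) / 2.0
    return math.hypot(gx, gy)


def detect_changes(reference, current, jitter=0.5, noise=1.0, threshold=3.0):
    """Flag pixels whose raw difference exceeds `threshold` times its error estimate.

    sigma = sqrt(noise^2 + (jitter * |grad|)^2): high-gradient pixels (edges)
    get wider error bars, so small camera jitter there is not reported as change.
    """
    flags = []
    for r in range(len(reference)):
        row = []
        for c in range(len(reference[0])):
            diff = current[r][c] - reference[r][c]  # raw difference frame
            sigma = math.hypot(noise, jitter * gradient_magnitude(reference, r, c))
            row.append(abs(diff) > threshold * sigma)
        flags.append(row)
    return flags


ref = [[0.0] * 4 + [100.0] * 4 for _ in range(4)]  # vertical step edge
cur = [row[:] for row in ref]
for r in range(4):
    cur[r][3] = 50.0  # half-pixel jitter smears the edge
cur[1][1] = 20.0      # a genuine change in a flat region
for row in detect_changes(ref, cur):
    print("".join("X" if f else "." for f in row))
```

Without the gradient term (sigma = noise everywhere), every smeared edge pixel would exceed the threshold and be reported as a false change.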
Seek and you shall remember: Scene semantics interact with visual search to build better memories
Draschkow, Dejan; Wolfe, Jeremy M.; Võ, Melissa L.-H.
2014-01-01
Memorizing critical objects and their locations is an essential part of everyday life. In the present study, incidental encoding of objects in naturalistic scenes during search was compared to explicit memorization of those scenes. To investigate whether prior knowledge of scene structure influences these two types of encoding differently, we used meaningless arrays of objects as well as objects in real-world, semantically meaningful images. Surprisingly, when participants were asked to recall scenes, their memory performance was markedly better for searched objects than for objects they had explicitly tried to memorize, even though participants in the search condition were not explicitly asked to memorize objects. This finding held true even when objects were observed for an equal amount of time in both conditions. Critically, the recall benefit for searched over memorized objects in scenes was eliminated when objects were presented on uniform, non-scene backgrounds rather than in a full scene context. Thus, scene semantics not only help us search for objects in naturalistic scenes, but appear to produce a representation that supports our memory for those objects beyond intentional memorization. PMID:25015385
Updating representations of learned scenes.
Finlay, Cory A; Motes, Michael A; Kozhevnikov, Maria
2007-05-01
Two experiments were designed to compare scene recognition reaction time (RT) and accuracy patterns following observer versus scene movement. In Experiment 1, participants memorized a scene from a single perspective. Then, either the scene was rotated or the participants moved around the scene (0 to 360 degrees in 36-degree increments), and participants judged whether the objects' positions had changed. Regardless of whether the scene was rotated or the observer moved, RT increased with greater angular distance between judged and encoded views. In Experiment 2, we varied the delay (0, 6, or 12 s) between scene encoding and locomotion. Regardless of the delay, accuracy decreased and RT increased with angular distance. Thus, our data show that observer movement does not necessarily update representations of spatial layouts, and they raise questions about the effects of duration limitations and encoding points of view on the automatic spatial updating of representations of scenes.
Deconstructing Visual Scenes in Cortex: Gradients of Object and Spatial Layout Information
Kravitz, Dwight J.; Baker, Chris I.
2013-01-01
Real-world visual scenes are complex, cluttered, and heterogeneous stimuli that engage scene- and object-selective cortical regions, including the parahippocampal place area (PPA), retrosplenial complex (RSC), and lateral occipital complex (LOC). To understand the unique contribution of each region to distributed scene representations, we generated predictions based on a neuroanatomical framework adapted from the monkey and tested them using minimal scenes in which we independently manipulated both spatial layout (open, closed, and gradient) and object content (furniture, e.g., bed, dresser). Consistent with its strong connectivity with posterior parietal cortex, RSC showed strong spatial layout information but no object information, and its response was not even modulated by object presence. In contrast, LOC, which lies within the ventral visual pathway, contained strong object information but no background information. Finally, PPA, which is connected with both the dorsal and the ventral visual pathway, showed information about both objects and spatial backgrounds and was sensitive to the presence or absence of either. These results suggest that 1) LOC, PPA, and RSC have distinct representations, emphasizing different aspects of scenes, 2) the specific representations in each region are predictable from their patterns of connectivity, and 3) PPA combines both spatial layout and object information as predicted by connectivity. PMID:22473894
Infrared and visible image fusion with spectral graph wavelet transform.
Yan, Xiang; Qin, Hanlin; Li, Jia; Zhou, Huixin; Zong, Jing-guo
2015-09-01
Infrared and visible image fusion is a popular topic in image analysis because it can integrate complementary information and obtain a reliable and accurate description of scenes. Multiscale transform theory, as a signal representation method, is widely used in image fusion. In this paper, a novel infrared and visible image fusion method is proposed based on the spectral graph wavelet transform (SGWT) and the bilateral filter. The main novelty of this study is the use of SGWT for image fusion. On the one hand, source images are decomposed by SGWT in its transform domain; the proposed approach not only effectively preserves the details of the different source images but also represents their irregular areas well. On the other hand, a novel weighted-average method based on the bilateral filter is proposed to fuse the low- and high-frequency subbands by taking advantage of the spatial consistency of natural images. Experimental results demonstrate that the proposed method outperforms seven recently proposed image fusion methods in terms of both visual effect and objective evaluation metrics.
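The SGWT decomposition and the bilateral-filter weighting are not reproduced here. As a much simpler stand-in for the general two-band idea (combine low-frequency content by averaging, keep the stronger high-frequency detail), a two-scale box-filter fusion can be sketched as follows; the box filter, its `radius`, and the max-absolute detail rule are all assumptions chosen for brevity, not the paper's method.

```python
def box_blur(img, radius=1):
    """Mean over a (2 * radius + 1)^2 window, truncated at the image border."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            vals = [img[rr][cc]
                    for rr in range(max(0, r - radius), min(h, r + radius + 1))
                    for cc in range(max(0, c - radius), min(w, c + radius + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def fuse(ir, vis, radius=1):
    """Average the low-frequency bases; keep the stronger high-frequency detail per pixel."""
    base_ir, base_vis = box_blur(ir, radius), box_blur(vis, radius)
    fused = []
    for r in range(len(ir)):
        row = []
        for c in range(len(ir[0])):
            d_ir = ir[r][c] - base_ir[r][c]    # infrared detail layer
            d_vis = vis[r][c] - base_vis[r][c]  # visible detail layer
            detail = d_ir if abs(d_ir) >= abs(d_vis) else d_vis
            row.append(0.5 * (base_ir[r][c] + base_vis[r][c]) + detail)
        fused.append(row)
    return fused


ir = [[10.0] * 5 for _ in range(5)]   # smooth, warm infrared frame
vis = [[0.0] * 5 for _ in range(5)]
vis[2][2] = 9.0                       # small bright visible feature
print(fuse(ir, vis)[2][2])            # the visible detail survives fusion
```

The bilateral-filter weights in the paper play the role of this hard selection rule but vary smoothly and respect edges, which avoids the blocky artifacts a per-pixel max can produce.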
Recognising the forest, but not the trees: an effect of colour on scene perception and recognition.
Nijboer, Tanja C W; Kanai, Ryota; de Haan, Edward H F; van der Smagt, Maarten J
2008-09-01
Colour has been shown to facilitate the recognition of scene images, but only when these images contain natural scenes, for which colour is 'diagnostic'. Here we investigate whether colour can also facilitate memory for scene images, and whether this would hold for natural scenes in particular. In the first experiment, participants studied a set of colour and greyscale natural and man-made scene images. Next, the same images were presented, randomly mixed with a different set. Participants were asked to indicate whether they had seen the images during the study phase. Surprisingly, performance was better for greyscale than for coloured images, and this difference was due to a higher false alarm rate for both natural and man-made coloured scenes. We hypothesized that this increase in false alarm rate was due to a shift from scrutinizing details of the image to recognition of the gist of the (coloured) image. A second experiment, utilizing images without a nameable gist, confirmed this hypothesis as participants now performed equally on greyscale and coloured images. In the final experiment we specifically targeted the more detail-based perception and recognition for greyscale images versus the more gist-based perception and recognition for coloured images with a change detection paradigm. The results show that changes to images were detected faster when image pairs were presented in greyscale than in colour. This counterintuitive result held for both natural and man-made scenes (but not for scenes without nameable gist) and thus corroborates the shift from more detailed processing of images in greyscale to more gist-based processing of coloured images.
Robust infrared targets tracking with covariance matrix representation
NASA Astrophysics Data System (ADS)
Cheng, Jian
2009-07-01
Robust infrared target tracking is an important and challenging research topic in many military and security applications, such as infrared imaging guidance, infrared reconnaissance, and scene surveillance. To effectively tackle the nonlinear and non-Gaussian state estimation problems, particle filtering is introduced to construct the theoretical framework of infrared target tracking. Under this framework, the observation probabilistic model is one of the main factors in infrared target tracking performance. To improve tracking performance, covariance matrices are introduced to represent infrared targets with multiple features. The observation probabilistic model can be constructed by computing the distance between the covariance matrix of the reference target and those of the target samples. Because the covariance matrix provides a natural tool for integrating multiple features, and is scale and illumination independent, target representation with covariance matrices can provide strong discriminating ability and robustness. Two experiments demonstrate that the proposed method is effective and robust for different infrared target tracking scenarios, such as a scene with sensor ego-motion and a sea-clutter scene.
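A covariance descriptor and a distance between two of them can be sketched as below. The abstract does not name the distance; this sketch assumes the commonly used Förstner metric, the square root of the summed squared logarithms of the generalized eigenvalues. To keep the eigenvalues in closed form it is restricted to two features per pixel (for example intensity and gradient magnitude), whereas practical trackers use more. Names are hypothetical.

```python
import math


def covariance(features):
    """Sample covariance (n - 1 denominator) of a list of 2-D feature vectors."""
    n = len(features)
    mean = [sum(f[i] for f in features) / n for i in range(2)]
    return [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in features) / (n - 1)
             for j in range(2)] for i in range(2)]


def generalized_eigenvalues(a, b):
    """Roots of det(a - lam * b) = 0 for symmetric positive-definite 2x2 matrices."""
    qa = b[0][0] * b[1][1] - b[0][1] ** 2
    qb = -(a[0][0] * b[1][1] + a[1][1] * b[0][0] - 2.0 * a[0][1] * b[0][1])
    qc = a[0][0] * a[1][1] - a[0][1] ** 2
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))  # clamp rounding noise
    return ((-qb - disc) / (2.0 * qa), (-qb + disc) / (2.0 * qa))


def covariance_distance(a, b):
    """Forstner metric: sqrt(sum_i ln^2 lam_i) over the generalized eigenvalues."""
    return math.sqrt(sum(math.log(lam) ** 2 for lam in generalized_eigenvalues(a, b)))


reference = [[2.0, 0.5], [0.5, 1.0]]
candidate = [[4.0, 1.0], [1.0, 2.0]]  # same structure, doubled variances
print(covariance_distance(reference, candidate))  # sqrt(2) * ln 2
```

In a particle filter, a likelihood such as exp(-d^2 / s) with this distance d would weight each particle; the scale s is a tuning choice not given in the abstract.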
Basic research planning in mathematical pattern recognition and image analysis
NASA Technical Reports Server (NTRS)
Bryant, J.; Guseman, L. F., Jr.
1981-01-01
Fundamental problems encountered while attempting to develop automated techniques for applications of remote sensing are discussed under the following categories: (1) geometric and radiometric preprocessing; (2) spatial, spectral, temporal, syntactic, and ancillary digital image representation; (3) image partitioning, proportion estimation, and error models in object scene interference; (4) parallel processing and image data structures; and (5) continuing studies in polarization; computer architectures and parallel processing; and the applicability of "expert systems" to interactive analysis.
Increasing situation awareness of the CBRNE robot operators
NASA Astrophysics Data System (ADS)
Jasiobedzki, Piotr; Ng, Ho-Kong; Bondy, Michel; McDiarmid, Carl H.
2010-04-01
Situational awareness of CBRN robot operators is quite limited, as they rely on images and measurements from on-board detectors. This paper describes a novel framework that enables uniform and intuitive access to live and recent data via 2D and 3D representations of visited sites. These representations are created automatically and augmented with images, models and CBRNE measurements. This framework has been developed for the CBRNE Crime Scene Modeler (C2SM), a mobile CBRNE mapping system. The system creates representations (2D floor plans and 3D photorealistic models) of the visited sites, which are then automatically augmented with CBRNE detector measurements. The data stored in a database is accessed using a variety of user interfaces providing different perspectives and increasing operators' situational awareness.
Straube, Thomas; Preissler, Sandra; Lipka, Judith; Hewig, Johannes; Mentzel, Hans-Joachim; Miltner, Wolfgang H R
2010-01-01
Some people search for intense sensations, such as being scared by frightening movies, while others do not. The brain mechanisms underlying such inter-individual differences are not clear. Testing theoretical models, we investigated neural correlates of anxiety and the personality trait sensation seeking in 40 subjects who watched threatening and neutral scenes from scary movies during functional magnetic resonance imaging. Threat versus neutral scenes induced increased activation in anterior cingulate cortex, insula, thalamus, and visual areas. Movie-induced anxiety correlated positively with activation in dorsomedial prefrontal cortex, indicating a role for this area in the subjective experience of being scared. Sensation-seeking scores correlated positively with brain activation to threat versus neutral scenes in visual areas and in thalamus and anterior insula, i.e. regions involved in the induction and representation of arousal states. For the insula and thalamus, these outcomes were partly due to an inverse relation between sensation-seeking scores and brain activation during neutral film clips. These results support models predicting cerebral hypoactivation in high sensation seekers during neutral stimulation, which may be compensated by more intense sensations such as watching scary movies. © 2009 Wiley-Liss, Inc.
2016-06-01
theories of the mammalian visual system, and exploiting descriptive text that may accompany a still image for improved inference. The focus of the Brown ... test, computer vision, semantic description, street scenes, belief propagation, generative models, nonlinear filtering, sufficient statistics 16 ... visual system, and exploiting descriptive text that may accompany a still image for improved inference. The focus of the Brown team was on single images
Binary-space-partitioned images for resolving image-based visibility.
Fu, Chi-Wing; Wong, Tien-Tsin; Tong, Wai-Shun; Tang, Chi-Keung; Hanson, Andrew J
2004-01-01
We propose a novel 2D representation for 3D visibility sorting, the Binary-Space-Partitioned Image (BSPI), to accelerate real-time image-based rendering. BSPI is an efficient 2D realization of a 3D BSP tree, which is commonly used in computer graphics for time-critical visibility sorting. Since the overall structure of a BSP tree is encoded in a BSPI, traversing a BSPI is comparable to traversing the corresponding BSP tree. BSPI performs visibility sorting efficiently and accurately in the 2D image space by warping the reference image triangle-by-triangle instead of pixel-by-pixel. Multiple BSPIs can be combined to solve "disocclusion," when an occluded portion of the scene becomes visible at a novel viewpoint. Our method is highly automatic, including a tensor voting preprocessing step that generates candidate image partition lines for BSPIs, filters the noisy input data by rejecting outliers, and interpolates missing information. Our system has been applied to a variety of real data, including stereo, motion, and range images.
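The visibility ordering that a BSPI encodes comes from the standard back-to-front traversal of a BSP tree. The sketch below shows that traversal for a hand-built 2D object-space tree, not the paper's image-space BSPI or its triangle-by-triangle warping; the node layout and labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BSPNode:
    point: Tuple[float, float]   # any point on the partition line
    normal: Tuple[float, float]  # normal pointing into the "front" half-plane
    label: str                   # primitive lying on the partition line
    front: Optional["BSPNode"] = None
    back: Optional["BSPNode"] = None


def back_to_front(node: Optional[BSPNode], eye: Tuple[float, float],
                  out: Optional[List[str]] = None) -> List[str]:
    """Painter's-order traversal: primitives farther from `eye` come first."""
    if out is None:
        out = []
    if node is None:
        return out
    side = ((eye[0] - node.point[0]) * node.normal[0]
            + (eye[1] - node.point[1]) * node.normal[1])
    # The subtree on the viewer's side is nearer, so draw the other one first.
    near, far = (node.front, node.back) if side >= 0 else (node.back, node.front)
    back_to_front(far, eye, out)
    out.append(node.label)
    back_to_front(near, eye, out)
    return out


tree = BSPNode((0.0, 0.0), (1.0, 0.0), "wall",
               front=BSPNode((2.0, 0.0), (1.0, 0.0), "front_wall"),
               back=BSPNode((-2.0, 0.0), (1.0, 0.0), "back_wall"))
print(back_to_front(tree, (5.0, 0.0)))   # ['back_wall', 'wall', 'front_wall']
print(back_to_front(tree, (-5.0, 0.0)))  # ['front_wall', 'wall', 'back_wall']
```

The traversal costs one side test per node for each new viewpoint, which is why a BSP structure suits time-critical visibility sorting.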
The medium and the message: a revisionist view of image quality
NASA Astrophysics Data System (ADS)
Ferwerda, James A.
2010-02-01
In his book "Understanding Media," social theorist Marshall McLuhan declared: "The medium is the message." The thesis of this paper is that with respect to image quality, imaging system developers have taken McLuhan's dictum too much to heart. Efforts focus on improving the technical specifications of the media (e.g. dynamic range, color gamut, resolution, temporal response) with little regard for the visual messages the media will be used to communicate. We present a series of psychophysical studies that investigate the visual system's ability to "see through" the limitations of imaging media to perceive the messages (object and scene properties) the images represent. The purpose of these studies is to understand the relationships between the signal characteristics of an image and the fidelity of the visual information the image conveys. The results of these studies provide a new perspective on image quality, showing that images that differ greatly in "quality" can be visually equivalent as realistic representations of objects and scenes.
The effect of non-visual working memory load on top-down modulation of visual processing
Rissman, Jesse; Gazzaley, Adam; D'Esposito, Mark
2009-01-01
While a core function of the working memory (WM) system is the active maintenance of behaviorally relevant sensory representations, it is also critical that distracting stimuli are appropriately ignored. We used functional magnetic resonance imaging to examine the role of domain-general WM resources in the top-down attentional modulation of task-relevant and irrelevant visual representations. In our dual-task paradigm, each trial began with the auditory presentation of six random (high load) or sequentially-ordered (low load) digits. Next, two relevant visual stimuli (e.g., faces), presented amongst two temporally interspersed visual distractors (e.g., scenes), were to be encoded and maintained across a 7-sec delay interval, after which memory for the relevant images and digits was probed. When taxed by high load digit maintenance, participants exhibited impaired performance on the visual WM task and a selective failure to attenuate the neural processing of task-irrelevant scene stimuli. The over-processing of distractor scenes under high load was indexed by elevated encoding activity in a scene-selective region-of-interest relative to low load and passive viewing control conditions, as well as by improved long-term recognition memory for these items. In contrast, the load manipulation did not affect participants' ability to upregulate activity in this region when scenes were task-relevant. These results highlight the critical role of domain-general WM resources in the goal-directed regulation of distractor processing. Moreover, the consequences of increased WM load in young adults closely resemble the effects of cognitive aging on distractor filtering [Gazzaley et al., (2005) Nature Neuroscience 8, 1298-1300], suggesting the possibility of a common underlying mechanism. PMID:19397858
Invariant Visual Object and Face Recognition: Neural and Computational Bases, and a Model, VisNet
Rolls, Edmund T.
2012-01-01
Neurophysiological evidence for invariant representations of objects and faces in the primate inferior temporal visual cortex is described. Then a computational approach to how invariant representations are formed in the brain is described that builds on the neurophysiology. A feature hierarchy model in which invariant representations can be built by self-organizing learning based on the temporal and spatial statistics of the visual input produced by objects as they transform in the world is described. VisNet can use temporal continuity in an associative synaptic learning rule with a short-term memory trace, and/or it can use spatial continuity in continuous spatial transformation learning which does not require a temporal trace. The model of visual processing in the ventral cortical stream can build representations of objects that are invariant with respect to translation, view, size, and also lighting. The model has been extended to provide an account of invariant representations in the dorsal visual system of the global motion produced by objects such as looming, rotation, and object-based movement. The model has been extended to incorporate top-down feedback connections to model the control of attention by biased competition in, for example, spatial and object search tasks. The approach has also been extended to account for how the visual system can select single objects in complex visual scenes, and how multiple objects can be represented in a scene. The approach has also been extended to provide, with an additional layer, for the development of representations of spatial scenes of the type found in the hippocampus. PMID:22723777
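The temporal-continuity mechanism can be illustrated with one common form of trace learning rule, in which a short-term memory trace of the neuron's firing, rather than its instantaneous firing, gates the Hebbian update. This is a minimal single-neuron sketch under stated assumptions: the trace form ȳ(t) = (1 - η) y(t) + η ȳ(t - 1), the update Δw = α ȳ x, weight renormalization, and the default `eta` and `alpha` values are illustrative choices, not VisNet's full architecture or parameters.

```python
import math


def trace_rule_step(weights, x, prev_trace, eta=0.8, alpha=0.1):
    """One trace-rule update for a single output neuron.

    y     = w . x                            (current firing)
    trace = (1 - eta) * y + eta * prev_trace (short-term memory trace)
    w    += alpha * trace * x, then w is renormalized to unit length
    """
    y = sum(w * xi for w, xi in zip(weights, x))
    trace = (1 - eta) * y + eta * prev_trace
    new_w = [w + alpha * trace * xi for w, xi in zip(weights, x)]
    norm = math.sqrt(sum(w * w for w in new_w)) or 1.0
    return [w / norm for w in new_w], trace


# A neuron tuned to view 1 sees view 2 of the same object right afterwards.
w, t = trace_rule_step([1.0, 0.0], [0.0, 1.0], prev_trace=1.0, eta=0.5, alpha=1.0)
print(w, t)  # view 2 gains weight although the neuron did not respond to it
```

With `eta=0` the rule reduces to plain Hebbian learning and the unresponsive neuron would not learn the new view; the trace is what binds successive transforms of one object onto the same output.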
Web GIS in practice VII: stereoscopic 3-D solutions for online maps and virtual globes
Boulos, Maged N Kamel; Robinson, Larry R
2009-01-01
Because our pupils are about 6.5 cm apart, each eye views a scene from a different angle and sends a unique image to the visual cortex, which then merges the images from both eyes into a single picture. The slight difference between the right and left images allows the brain to properly perceive the 'third dimension' or depth in a scene (stereopsis). However, when a person views a conventional 2-D (two-dimensional) image representation of a 3-D (three-dimensional) scene on a conventional computer screen, each eye receives essentially the same information. Depth in such cases can only be approximately inferred from visual cues in the image, such as perspective, as only one image is offered to both eyes. The goal of stereoscopic 3-D displays is to project a slightly different image into each eye to achieve a truer, more realistic perception of depth, of different scene planes, and of object relief. This paper presents a brief review of a number of stereoscopic 3-D hardware and software solutions for creating and displaying online maps and virtual globes (such as Google Earth) in "true 3D", with costs ranging from almost free to several thousand pounds sterling. A practical account is also given of the experience of the USGS BRD UMESC (United States Geological Survey's Biological Resources Division, Upper Midwest Environmental Sciences Center) in setting up a low-cost, full-colour stereoscopic 3-D system. PMID:19849837
Rover imaging system for the Mars rover/sample return mission
NASA Technical Reports Server (NTRS)
1993-01-01
In the past year, the conceptual design of a panoramic imager for the Mars Environmental Survey (MESUR) Pathfinder was finished. A prototype camera was built and its performance was tested in the laboratory. The performance of this camera was excellent. Based on this work, we have recently proposed a small, lightweight, rugged, and highly capable Mars Surface Imager (MSI) instrument for the MESUR Pathfinder mission. A key aspect of our approach to optimizing the MSI design is that we treat image gathering, coding, and restoration as a whole, rather than as separate and independent tasks. Our approach leads to higher image quality, especially in the representation of fine detail with good contrast and clarity, without increasing either the complexity of the camera or the amount of data transmission. We have made significant progress over the past year in both the overall MSI system design and the detailed design of the MSI optics. We have taken a simple panoramic camera and upgraded it substantially to become a prototype of the MSI flight instrument. The most recent version of the camera uses miniature wide-angle optics that image directly onto a 3-color, 2096-element CCD line array. There are several data-taking modes, providing resolution as high as 0.3 mrad/pixel. Analysis tasks that were performed or are underway with the test data from the prototype camera include the construction of 3-D models of imaged scenes from stereo data, first for controlled scenes and later for field scenes, and checks on geometric fidelity, including alignment errors, mast vibration, and oscillation in the drive system. We have outlined a number of tasks planned for Fiscal Year '93 to prepare us for submission of a flight instrument proposal for MESUR Pathfinder.
Multi-Depth-Map Raytracing for Efficient Large-Scene Reconstruction.
Arikan, Murat; Preiner, Reinhold; Wimmer, Michael
2016-02-01
With the enormous advances in acquisition technology in recent years, fast processing and high-quality visualization of large point clouds have gained increasing attention. Commonly, a mesh surface is reconstructed from the point cloud and a high-resolution texture is generated over the mesh from the images taken at the site to represent surface materials. However, this global reconstruction and texturing approach becomes impractical with increasing data sizes. Recently, due to its potential for scalability and extensibility, a method has been proposed for representing large scenes by texturing a set of depth maps in a preprocessing step and stitching them at runtime. However, the rendering performance of this method is strongly dependent on the number of depth maps and their resolution. Moreover, for the proposed scene representation, every single depth map has to be textured by the images, which in practice heavily increases processing costs. In this paper, we present a novel method to break these dependencies by introducing an efficient raytracing of multiple depth maps. In a preprocessing phase, we first generate high-resolution textured depth maps by rendering the input points from image cameras and then perform a graph-cut based optimization to assign a small subset of these points to the images. At runtime, we use the resulting point-to-image assignments (1) to identify for each view ray which depth map contains the closest ray-surface intersection and (2) to efficiently compute this intersection point. The resulting algorithm accelerates both the texturing and the rendering of the depth maps by an order of magnitude.
Modeling Of Object- And Scene-Prototypes With Hierarchically Structured Classes
NASA Astrophysics Data System (ADS)
Ren, Z.; Jensch, P.; Ameling, W.
1989-03-01
The success of knowledge-based image analysis methodology and implementation tools depends largely on an appropriately and efficiently built model in which the domain-specific context information about, and the inherent structure of, the observed image scene have been encoded. To identify an object in an application environment, a computer vision system needs to know, first, the description of the object to be found in an image or image sequence and, second, the corresponding relationships between object descriptions within the image sequence. This paper presents models of image objects and scenes by means of hierarchically structured classes. Using the topovisual formalism of graphs and higraphs, we are currently studying principally the relational aspect and data abstraction of the modeling, in order to visualize the structural nature resident in image objects and scenes and to formalize their descriptions. The goal is to expose the structure of the image scene and the correspondence of image objects in the low-level image interpretation process. The object-based system design approach has been applied to build the model base. We use the object-oriented programming language C++ to design, test, and implement the abstracted entity classes and the operation structures that have been modeled topovisually. The reference images used for modeling prototypes of objects and scenes come from industrial environments as well as medical applications.
Feature maps driven no-reference image quality prediction of authentically distorted images
NASA Astrophysics Data System (ADS)
Ghadiyaram, Deepti; Bovik, Alan C.
2015-03-01
Current blind image quality prediction models rely on benchmark databases composed of singly and synthetically distorted images, thereby learning image features that are adequate only for predicting human-perceived visual quality on such inauthentic distortions. However, real-world images often contain complex mixtures of multiple distortions. Rather than a) discounting the effect of these mixtures of distortions on an image's perceptual quality and considering only the dominant distortion, or b) using features that are proven to be efficient only for singly distorted images, we study in depth the natural scene statistics of authentically distorted images, in different color spaces and transform domains. We propose a feature-maps-driven statistical approach that avoids any latent assumptions about the type of distortion(s) contained in an image, and focuses instead on modeling the remarkable consistencies in the scene statistics of real-world images in the absence of distortions. We design a deep belief network that takes model-based statistical image features derived from a very large database of authentically distorted images as input and discovers good feature representations by generalizing over different distortion types, mixtures, and severities; these representations are later used to learn a regressor for quality prediction. We demonstrate the remarkable competence of our features for improving automatic perceptual quality prediction on a benchmark database and on the newly designed LIVE Authentic Image Quality Challenge Database, and show that our approach of combining robust statistical features and the deep belief network dramatically outperforms the state of the art.
Image fusion via nonlocal sparse K-SVD dictionary learning.
Li, Ying; Li, Fangyi; Bai, Bendu; Shen, Qiang
2016-03-01
Image fusion aims to merge two or more images of the same scene, captured via various sensors, into a more informative image by integrating their details. Generally, such integration is achieved through the manipulation of the representations of the images concerned. Sparse representation plays an important role in the effective description of images, offering great potential in a variety of image processing tasks, including image fusion. Supported by sparse representation, this paper proposes an approach to image fusion using a novel dictionary learning scheme. The nonlocal self-similarity property of the images is exploited not only at the stage of learning the underlying description dictionary but also during the process of image fusion. In particular, the property of nonlocal self-similarity is combined with the traditional sparse dictionary. This results in an improved learned dictionary, hereafter referred to as the nonlocal sparse K-SVD dictionary (where K-SVD stands for the K times singular value decomposition that is commonly used in the literature), abbreviated to NL_SK_SVD. The NL_SK_SVD dictionary is applied to image fusion using simultaneous orthogonal matching pursuit. The proposed approach is evaluated with different types of images and compared with a number of alternative image fusion techniques. The superior fused images obtained with this approach demonstrate the efficacy of the NL_SK_SVD dictionary in sparse image representation.
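Sparse coding over a dictionary is the step that simultaneous OMP performs jointly across the source images. It can be sketched with plain, single-signal orthogonal matching pursuit in NumPy. This is a generic OMP, not the NL_SK_SVD method. The dictionary is assumed to have unit-norm columns, and the function name `omp` is illustrative.

```python
import numpy as np


def omp(D, y, n_nonzero):
    """Greedy sparse code x with at most n_nonzero atoms such that D @ x ~= y.

    D is (n_features, n_atoms) with unit-norm columns; y is (n_features,).
    """
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # Select the atom most correlated with what is still unexplained.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # "Orthogonal" step: re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-12:
            break
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x
```

In a fusion setting, each source image patch would be coded this way, or jointly with simultaneous OMP. The coefficients would then be combined by a fusion rule before the fused patch is reconstructed as D @ x.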
Nonnegative Matrix Factorization for Efficient Hyperspectral Image Projection
NASA Technical Reports Server (NTRS)
Iacchetta, Alexander S.; Fienup, James R.; Leisawitz, David T.; Bolcar, Matthew R.
2015-01-01
Hyperspectral imaging for remote sensing has prompted the development of hyperspectral image projectors that can be used to characterize hyperspectral imaging cameras and techniques in the lab. One such emerging astronomical hyperspectral imaging technique is wide-field double-Fourier interferometry. NASA's current state-of-the-art Wide-field Imaging Interferometry Testbed (WIIT) uses a Calibrated Hyperspectral Image Projector (CHIP) to generate test scenes and provide a more complete understanding of wide-field double-Fourier interferometry. Given enough time, the CHIP is capable of projecting scenes with astronomically realistic spatial and spectral complexity. However, this would require a very lengthy data collection process. For accurate but time-efficient projection of complicated hyperspectral images with the CHIP, the field must be decomposed both spectrally and spatially in a way that provides a favorable trade-off between accurately projecting the hyperspectral image and the time required for data collection. We apply nonnegative matrix factorization (NMF) to decompose hyperspectral astronomical datacubes into eigenspectra and eigenimages that allow time-efficient projection with the CHIP. Included is a brief analysis of NMF parameters that affect accuracy, including the number of eigenspectra and eigenimages used to approximate the hyperspectral image to be projected. For the chosen field, the normalized mean squared synthesis error is under 0.01 with just 8 eigenspectra. NMF of hyperspectral astronomical fields better utilizes the CHIP's capabilities, providing time-efficient and accurate representations of astronomical scenes to be imaged with the WIIT.
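One way to obtain nonnegative eigenspectra and eigenimages is Lee-Seung multiplicative-update NMF, sketched below in NumPy. The abstract does not state which NMF algorithm or error normalization was used. Both the update rule and the `normalized_mse` definition (squared error divided by squared signal) are therefore assumptions. The datacube is assumed to be reshaped into a pixels x wavelengths matrix.

```python
import numpy as np


def nmf(X, k, n_iter=500, eps=1e-10, seed=0):
    """Factor nonnegative X (pixels x wavelengths) as W @ H with W, H >= 0.

    Uses Lee-Seung multiplicative updates for the Frobenius-norm objective.
    Columns of W play the role of eigenimages, rows of H of eigenspectra.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def normalized_mse(X, W, H):
    """Squared reconstruction error normalized by the energy of X (assumed definition)."""
    return float(np.sum((X - W @ H) ** 2) / np.sum(X ** 2))
```

Sweeping `k` and recording `normalized_mse` gives the kind of accuracy-versus-component-count trade-off the abstract describes. Fewer components mean fewer projected frames, at the cost of synthesis error.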
NASA Astrophysics Data System (ADS)
Pesaresi, Martino; Ouzounis, Georgios K.; Gueguen, Lionel
2012-06-01
A new compact representation of differential morphological profile (DMP) vector fields is presented. It is referred to as the CSL model and is conceived to radically reduce the dimensionality of the DMP descriptors. The model maps three characteristic parameters, namely scale, saliency, and level, into RGB space through an HSV transform. The result is a medium-abstraction semantic layer used for visual exploration, image information mining, and pattern classification. Fused with the PANTEX built-up presence index, the CSL model converges to an approximate building-footprint representation layer in which color represents building class labels. This process is demonstrated on the first high-resolution (HR) global human settlement layer (GHSL) computed from multi-modal HR and VHR satellite images. Results of the first massive processing exercise, involving several thousand scenes around the globe, are reported along with validation figures.
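Below is a per-pixel sketch of mapping a (scale, saliency, level) triple into RGB via HSV, using Python's standard `colorsys` module. The abstract does not say which parameter drives which HSV channel. The assignment below (scale to hue, saliency to saturation, level to value) is an illustrative assumption, as is the function name `csl_to_rgb`.

```python
import colorsys


def csl_to_rgb(scale, saliency, level, n_scales):
    """Map one pixel's (scale, saliency, level) triple to an RGB colour in [0, 1].

    Assumed mapping (illustrative only): scale index in range(n_scales) -> hue,
    saliency -> saturation, level -> value; the last two are clipped to [0, 1].
    """
    hue = scale / n_scales  # distinct hues in [0, 1) for distinct scale indices
    sat = min(max(saliency, 0.0), 1.0)
    val = min(max(level, 0.0), 1.0)
    return colorsys.hsv_to_rgb(hue, sat, val)
```

Under this choice, low-saliency pixels fade toward grey regardless of scale. Hue alone then encodes the dominant scale, which is one plausible reason to route three parameters through HSV rather than directly to RGB.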
NASA Technical Reports Server (NTRS)
Sidick, Erkin; Morgan, Rhonda M.; Green, Joseph J.; Ohara, Catherine M.; Redding, David C.
2007-01-01
We have developed a new adaptive cross-correlation (ACC) algorithm to estimate, with high accuracy, shifts as large as several pixels between two extended-scene images captured by a Shack-Hartmann wavefront sensor (SH-WFS). It determines the positions of all of the extended-scene image cells relative to a reference cell using an FFT-based iterative image-shifting algorithm. It works with point-source spot images as well as extended-scene images. We have also set up a testbed for extended-scene SH-WFS and tested the ACC algorithm with measured data from both point-source and extended-scene images. In this paper we describe our algorithm and present our experimental results.
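The core FFT step of cross-correlation shift estimation can be sketched in NumPy as follows. This covers only the integer-pixel peak search. The ACC algorithm described above additionally iterates with FFT-based image shifting to refine the estimate, which this hypothetical `estimate_shift` helper does not attempt.

```python
import numpy as np


def estimate_shift(ref, img):
    """Estimate the integer (dy, dx) translation of img relative to ref.

    Computes the circular cross-correlation sum_n ref[n] * img[n + s] via FFTs
    and returns the location of its peak, wrapped to signed offsets.
    """
    R = np.fft.fft2(ref - ref.mean())
    I = np.fft.fft2(img - img.mean())
    xcorr = np.fft.ifft2(np.conj(R) * I).real
    dy, dx = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Peaks past the midpoint correspond to negative shifts (circular wrap).
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return int(dy), int(dx)
```

Because the correlation is circular, real sub-aperture images would normally be windowed or padded first. Subpixel accuracy would require interpolating around the peak or iterating as the ACC method does.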
Generation, recognition, and consistent fusion of partial boundary representations from range images
NASA Astrophysics Data System (ADS)
Kohlhepp, Peter; Hanczak, Andrzej M.; Li, Gang
1994-10-01
This paper presents SOMBRERO, a new system for recognizing and locating 3D, rigid, non-moving objects from range data. The objects may be polyhedral or curved, and may partially occlude, touch, or lie flush with each other. For data collection, we employ 2D time-of-flight laser scanners mounted on a moving gantry robot. By combining sensor and robot coordinates, we obtain 3D Cartesian coordinates. Boundary representations (Breps) provide view-independent geometry models that are both efficiently recognizable and automatically derivable from sensor data. SOMBRERO's methods for generating, matching, and fusing Breps are highly synergetic. A split-and-merge segmentation algorithm with dynamic triangular builds a partial (2½D) Brep from scattered data. The recognition module matches this scene description with a model database and outputs recognized objects, their positions and orientations, and possibly surfaces corresponding to unknown objects. We present preliminary results in scene segmentation and recognition. Partial Breps corresponding to different range sensors or viewpoints can be merged into a consistent, complete, and irredundant 3D object or scene model. This fusion algorithm itself uses the recognition and segmentation methods.
The Role of Visual Experience on the Representation and Updating of Novel Haptic Scenes
ERIC Educational Resources Information Center
Pasqualotto, Achille; Newell, Fiona N.
2007-01-01
We investigated the role of visual experience on the spatial representation and updating of haptic scenes by comparing recognition performance across sighted, congenitally and late blind participants. We first established that spatial updating occurs in sighted individuals to haptic scenes of novel objects. All participants were required to…
Three-Dimensional Images For Robot Vision
NASA Astrophysics Data System (ADS)
McFarland, William D.
1983-12-01
Robots are attracting increased attention in the industrial productivity crisis. Since robotics is one significant approach by which this nation can maintain technological leadership, the need for robot vision has become critical. The "blind" robot, while occupying an economical niche at present, is severely limited and job-specific, being only one step up from numerically controlled machines. To satisfy robot vision requirements, a three-dimensional representation of a real scene must be provided. Several image acquisition techniques are discussed, with more emphasis on laser-radar-type instruments. The autonomous vehicle is also discussed as a robot form, and the requirements for these applications are considered. The total computer vision system requirement is reviewed, with some discussion of the major techniques in the literature for three-dimensional scene analysis.
How color enhances visual memory for natural scenes.
Spence, Ian; Wong, Patrick; Rusan, Maria; Rastegar, Naghmeh
2006-01-01
We offer a framework for understanding how color operates to improve visual memory for images of the natural environment, and we present an extensive data set that quantifies the contribution of color in the encoding and recognition phases. Using a continuous recognition task with colored and monochrome gray-scale images of natural scenes at short exposure durations, we found that color enhances recognition memory by conferring an advantage during encoding and by strengthening the encoding-specificity effect. Furthermore, because the pattern of performance was similar at all exposure durations, and because form and color are processed in different areas of cortex, the results imply that color must be bound as an integral part of the representation at the earliest stages of processing.
Dillon, Moira R.; Spelke, Elizabeth S.
2015-01-01
Research on animals, infants, children, and adults provides evidence that distinct cognitive systems underlie navigation and object recognition. Here we examine whether and how these systems interact when children interpret 2D edge-based perspectival line drawings of scenes and objects. Such drawings serve as symbols early in development, and they preserve scene and object geometry from canonical points of view. Young children show limits when using geometry both in non-symbolic tasks and in symbolic map tasks that present 3D contexts from unusual, unfamiliar points of view. When presented with the familiar viewpoints in perspectival line drawings, however, do children engage more integrated geometric representations? In three experiments, children successfully interpreted line drawings with respect to their depicted scene or object. Nevertheless, children recruited distinct processes when navigating based on the information in these drawings, and these processes depended on the context in which the drawings were presented. These results suggest that children are flexible but limited in using geometric information to form integrated representations of scenes and objects, even when interpreting spatial symbols that are highly familiar and faithful renditions of the visual world. PMID:25441089
Azzopardi, George; Petkov, Nicolai
2014-01-01
The remarkable abilities of the primate visual system have inspired the construction of computational models of some visual neurons. We propose a trainable hierarchical object recognition model, which we call S-COSFIRE (S stands for Shape and COSFIRE stands for Combination Of Shifted FIlter REsponses), and use it to localize and recognize objects of interest embedded in complex scenes. It is inspired by the visual processing in the ventral stream (V1/V2 → V4 → TEO). Recognition and localization of objects embedded in complex scenes is important for many computer vision applications. Most existing methods require prior segmentation of the objects from the background, which in turn requires recognition. An S-COSFIRE filter is automatically configured to be selective for an arrangement of contour-based features that belong to a prototype shape specified by an example. The configuration comprises selecting relevant vertex detectors and determining certain blur and shift parameters. The response is computed as the weighted geometric mean of the blurred and shifted responses of the selected vertex detectors. S-COSFIRE filters share similar properties with some neurons in inferotemporal cortex, which provided inspiration for this work. We demonstrate the effectiveness of S-COSFIRE filters in two applications: letter and keyword spotting in handwritten manuscripts, and object spotting in complex scenes for the computer vision system of a domestic robot. S-COSFIRE filters are effective at recognizing and localizing (deformable) objects in images of complex scenes without requiring prior segmentation. They are versatile trainable shape detectors, conceptually simple and easy to implement. The presented hierarchical shape representation contributes to a better understanding of the brain and to more robust computer vision algorithms. PMID:25126068
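The final combination step, a weighted geometric mean of the already blurred and shifted vertex-detector responses, can be sketched as follows. How the responses are blurred and shifted, and how the weights are chosen, is outside this sketch. The inputs are simply assumed to be rectified (nonnegative) responses with positive weights, and the function name is illustrative.

```python
import math


def weighted_geometric_mean(responses, weights):
    """Combine sub-unit responses as (prod r_i ** w_i) ** (1 / sum w_i).

    Responses are assumed rectified; a non-positive response forces the output
    to zero, so the combined filter responds only when all features are present.
    """
    if len(responses) != len(weights) or not responses:
        raise ValueError("responses and weights must be non-empty and equal length")
    if any(r <= 0.0 for r in responses):
        return 0.0
    total_w = sum(weights)
    # Work in log space to avoid overflow/underflow of the raw product.
    log_sum = sum(w * math.log(r) for r, w in zip(responses, weights))
    return math.exp(log_sum / total_w)
```

The AND-like behaviour is the point of using a geometric rather than an arithmetic mean. A single missing vertex suppresses the whole response, which makes the filter selective for the full arrangement of features.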
Development of MPEG standards for 3D and free viewpoint video
NASA Astrophysics Data System (ADS)
Smolic, Aljoscha; Kimata, Hideaki; Vetro, Anthony
2005-11-01
An overview of 3D and free viewpoint video is given in this paper, with special focus on related standardization activities in MPEG. Free viewpoint video allows the user to navigate freely within real-world visual scenes, as known from virtual worlds in computer graphics. Suitable 3D scene representation formats are classified and the processing chain is explained. Examples are shown for image-based and model-based free viewpoint video systems, highlighting standards-conformant realization using MPEG-4. Then the principles of 3D video, which provides the user with a 3D depth impression of the observed scene, are introduced. Example systems are described, again focusing on their realization based on MPEG-4. Finally, multi-view video coding is described as a key component for 3D and free viewpoint video systems. MPEG is currently working on a new standard for multi-view video coding. The conclusion is that the necessary technology, including standard media formats for 3D and free viewpoint video, is available or will be available in the near future, and that there is clear demand from industry and users for such applications. 3DTV at home and free viewpoint video on DVD will be available soon and will create huge new markets.
Graham, Daniel J; Field, David J
2008-01-01
Two recent studies suggest that natural scenes and paintings show similar statistical properties. But does the content or region of origin of an artwork affect its statistical properties? We addressed this question by having judges place paintings from a large, diverse collection into one of three subject-matter categories using a forced-choice paradigm. Basic statistics for images whose categorization was agreed upon by all judges showed no significant differences between those judged to be 'landscape' and 'portrait/still-life', but these two classes differed from paintings judged to be 'abstract'. All categories showed basic spatial statistical regularities similar to those typical of natural scenes. A test of the full painting collection (140 images) with respect to the works' place of origin (provenance) showed significant differences between Eastern and Western works, differences that we find are likely related to the materials and the choice of background color. Although artists deviate slightly from reproducing natural statistics in abstract art (compared to representational art), the great majority of human art likely shares basic statistical limitations. We argue that statistical regularities in art are rooted in the need to make art visible to the eye, not in the inherent aesthetic value of natural-scene statistics, and we suggest that variability in spatial statistics may be generally imposed by manufacture.
The occipital place area represents the local elements of scenes
Kamps, Frederik S.; Julian, Joshua B.; Kubilius, Jonas; Kanwisher, Nancy; Dilks, Daniel D.
2016-01-01
Neuroimaging studies have identified three scene-selective regions in human cortex: parahippocampal place area (PPA), retrosplenial complex (RSC), and occipital place area (OPA). However, precisely what scene information each region represents is not clear, especially for the least studied, more posterior OPA. Here we hypothesized that OPA represents local elements of scenes within two independent, yet complementary scene descriptors: spatial boundary (i.e., the layout of external surfaces) and scene content (e.g., internal objects). If OPA processes the local elements of spatial boundary information, then it should respond to these local elements (e.g., walls) themselves, regardless of their spatial arrangement. Indeed, we found that OPA, but not PPA or RSC, responded similarly to images of intact rooms and these same rooms in which the surfaces were fractured and rearranged, disrupting the spatial boundary. Next, if OPA represents the local elements of scene content information, then it should respond more when more such local elements (e.g., furniture) are present. Indeed, we found that OPA, but not PPA or RSC, responded more to multiple than single pieces of furniture. Taken together, these findings reveal that OPA analyzes local scene elements – both in spatial boundary and scene content representation – while PPA and RSC represent global scene properties. PMID:26931815
Anticipatory Scene Representation in Preschool Children's Recall and Recognition Memory
ERIC Educational Resources Information Center
Kreindel, Erica; Intraub, Helene
2017-01-01
Behavioral and neuroscience research on boundary extension (false memory beyond the edges of a view of a scene) has provided new insights into the constructive nature of scene representation, and motivates questions about development. Early research with children (as young as 6-7 years) was consistent with boundary extension, but relied on an…
Feature diagnosticity and task context shape activity in human scene-selective cortex.
Lowe, Matthew X; Gallivan, Jason P; Ferber, Susanne; Cant, Jonathan S
2016-01-15
Scenes are constructed from multiple visual features, yet previous research investigating scene processing has often focused on the contributions of single features in isolation. In the real world, features rarely exist independently of one another and likely converge to inform scene identity in unique ways. Here, we utilize fMRI and pattern classification techniques to examine the interactions between task context (i.e., attend to diagnostic global scene features; texture or layout) and high-level scene attributes (content and spatial boundary) to test the novel hypothesis that scene-selective cortex represents multiple visual features, the importance of which varies according to their diagnostic relevance across scene categories and task demands. Our results show for the first time that scene representations are driven by interactions between multiple visual features and high-level scene attributes. Specifically, univariate analysis of scene-selective cortex revealed that task context and feature diagnosticity shape activity differentially across scene categories. Examination using multivariate decoding methods revealed results consistent with univariate findings, but also evidence for an interaction between high-level scene attributes and diagnostic visual features within scene categories. Critically, these findings suggest visual feature representations are not distributed uniformly across scene categories but are shaped by task context and feature diagnosticity. Thus, we propose that scene-selective cortex constructs a flexible representation of the environment by integrating multiple diagnostically relevant visual features, the nature of which varies according to the particular scene being perceived and the goals of the observer. Copyright © 2015 Elsevier Inc. All rights reserved.
Modeling visual clutter perception using proto-object segmentation
Yu, Chen-Ping; Samaras, Dimitris; Zelinsky, Gregory J.
2014-01-01
We introduce the proto-object model of visual clutter perception. This unsupervised model segments an image into superpixels, then merges neighboring superpixels that share a common color cluster to obtain proto-objects—defined here as spatially extended regions of coherent features. Clutter is estimated by simply counting the number of proto-objects. We tested this model using 90 images of realistic scenes that were ranked by observers from least to most cluttered. Comparing this behaviorally obtained ranking to a ranking based on the model clutter estimates, we found a significant correlation between the two (Spearman's ρ = 0.814, p < 0.001). We also found that the proto-object model was highly robust to changes in its parameters and was generalizable to unseen images. We compared the proto-object model to six other models of clutter perception and demonstrated that it outperformed each, in some cases dramatically. Importantly, we also showed that the proto-object model was a better predictor of clutter perception than an actual count of the number of objects in the scenes, suggesting that the set size of a scene may be better described by proto-objects than objects. We conclude that the success of the proto-object model is due in part to its use of an intermediate level of visual representation—one between features and objects—and that this is evidence for the potential importance of a proto-object representation in many common visual percepts and tasks. PMID:24904121
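The counting step described in the abstract above can be sketched with a union-find pass. This is a minimal illustration that assumes superpixel segmentation and colour clustering have already been done. The hypothetical function `count_proto_objects` takes one colour-cluster label per superpixel and a list of neighbouring superpixel pairs. It merges neighbours that share a cluster and returns the number of merged regions as the clutter estimate.

```python
def count_proto_objects(cluster_of, adjacency):
    """Count proto-objects: connected groups of superpixels sharing a colour cluster.

    cluster_of: list where cluster_of[i] is the colour-cluster label of superpixel i.
    adjacency: iterable of (i, j) pairs of spatially neighbouring superpixels.
    """
    parent = list(range(len(cluster_of)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in adjacency:
        # Merge only neighbours that fall in the same colour cluster.
        if cluster_of[i] == cluster_of[j]:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj

    # Clutter estimate = number of distinct merged regions.
    return len({find(i) for i in range(len(cluster_of))})
```

For example, five superpixels in a row with cluster labels `[0, 0, 1, 1, 0]` and chain adjacency form three proto-objects.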
Higher-order scene statistics of breast images
NASA Astrophysics Data System (ADS)
Abbey, Craig K.; Sohl-Dickstein, Jascha N.; Olshausen, Bruno A.; Eckstein, Miguel P.; Boone, John M.
2009-02-01
Researchers studying human and computer vision have found description and construction of these systems greatly aided by analysis of the statistical properties of naturally occurring scenes. More specifically, it has been found that receptive fields with directional selectivity and bandwidth properties similar to mammalian visual systems are more closely matched to the statistics of natural scenes. It is argued that this allows for sparse representation of the independent components of natural images [Olshausen and Field, Nature, 1996]. These theories have important implications for medical image perception. For example, will a system that is designed to represent the independent components of natural scenes, where objects occlude one another and illumination is typically reflected, be appropriate for X-ray imaging, where features superimpose on one another and illumination is transmissive? In this research we begin to examine these issues by evaluating higher-order statistical properties of breast images from X-ray projection mammography (PM) and dedicated breast computed tomography (bCT). We evaluate kurtosis in responses of octave bandwidth Gabor filters applied to PM and to coronal slices of bCT scans. We find that kurtosis in PM rises and quickly saturates for filter center frequencies with an average value above 0.95. By contrast, kurtosis in bCT peaks near 0.20 cyc/mm with kurtosis of approximately 2. Our findings suggest that the human visual system may be tuned to represent breast tissue more effectively in bCT over a specific range of spatial frequencies.
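The measurement described above (kurtosis of octave-bandwidth Gabor filter responses) can be sketched in NumPy as follows. This is not the authors' code. It makes several assumptions: frequencies are in cycles/pixel (converting to cyc/mm needs the detector pixel pitch), "one octave" means a symmetric half-amplitude band from 2f0/3 to 4f0/3, filtering is circular via the FFT, and the statistic is excess kurtosis (0 for Gaussian data). The abstract does not say which kurtosis convention was used.

```python
import numpy as np

def gabor_kernel(f0, theta, size=None):
    """Even-symmetric (cosine) Gabor kernel with ~1-octave bandwidth.

    f0: centre frequency in cycles/pixel; theta: orientation in radians.
    """
    # One-octave half-amplitude bandwidth -> Gaussian std in frequency, then in space.
    sigma_f = f0 / (3.0 * np.sqrt(2.0 * np.log(2.0)))
    sigma_x = 1.0 / (2.0 * np.pi * sigma_f)
    if size is None:
        size = int(2 * np.ceil(3 * sigma_x) + 1)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    u = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma_x**2))
    kernel = envelope * np.cos(2 * np.pi * f0 * u)
    return kernel - kernel.mean()  # remove DC so flat regions give zero response

def filter_fft(image, kernel):
    """Circular convolution via FFT (edges wrap; assumes kernel smaller than image)."""
    padded = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    # Shift so the kernel centre sits at the origin.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def excess_kurtosis(r):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    r = np.asarray(r, dtype=float).ravel()
    d = r - r.mean()
    return np.mean(d**4) / np.mean(d**2) ** 2 - 3.0
```

Gaussian noise passed through any linear filter stays Gaussian, so its response kurtosis is near zero. A sparse image (a few isolated features) gives strongly positive kurtosis, which is the sense in which such filters yield sparse codes.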
Fuzzy Emotional Semantic Analysis and Automated Annotation of Scene Images
Cao, Jianfang; Chen, Lichao
2015-01-01
With the advances in electronic and imaging techniques, the production of digital images has rapidly increased, and the extraction and automated annotation of emotional semantics implied by images have become issues that must be urgently addressed. To better simulate human subjectivity and ambiguity for understanding scene images, the current study proposes an emotional semantic annotation method for scene images based on fuzzy set theory. A fuzzy membership degree was calculated to describe the emotional degree of a scene image and was implemented using the Adaboost algorithm and a back-propagation (BP) neural network. The automated annotation method was trained and tested using scene images from the SUN Database. The annotation results were then compared with those based on artificial annotation. Our method showed an annotation accuracy rate of 91.2% for basic emotional values and 82.4% after extended emotional values were added, which correspond to increases of 5.5% and 8.9%, respectively, compared with the results from using a single BP neural network algorithm. Furthermore, the retrieval accuracy rate based on our method reached approximately 89%. This study attempts to lay a solid foundation for the automated emotional semantic annotation of more types of images and therefore is of practical significance. PMID:25838818
Multispectral Terrain Background Simulation Techniques For Use In Airborne Sensor Evaluation
NASA Astrophysics Data System (ADS)
Weinberg, Michael; Wohlers, Ronald; Conant, John; Powers, Edward
1988-08-01
A background simulation code developed at Aerodyne Research, Inc., called AERIE is designed to reflect the major sources of clutter that are of concern to staring and scanning sensors of the type being considered for various airborne threat warning (both aircraft and missiles) sensors. The code is a first principles model that could be used to produce a consistent image of the terrain for various spectral bands, i.e., provide the proper scene correlation both spectrally and spatially. The code utilizes both topographic and cultural features to model terrain, typically from DMA data, with a statistical overlay of the critical underlying surface properties (reflectance, emittance, and thermal factors) to simulate the resulting texture in the scene. Strong solar scattering from water surfaces is included with allowance for wind driven surface roughness. Clouds can be superimposed on the scene using physical cloud models and an analytical representation of the reflectivity obtained from scattering off spherical particles. The scene generator is augmented by collateral codes that allow for the generation of images at finer resolution. These codes provide interpolation of the basic DMA databases using fractal procedures that preserve the high frequency power spectral density behavior of the original scene. Scenes are presented illustrating variations in altitude, radiance, resolution, material, thermal factors, and emissivities. The basic models utilized for simulation of the various scene components and various "engineering level" approximations are incorporated to reduce the computational complexity of the simulation.
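The abstract does not specify AERIE's fractal interpolation procedure. As a generic, hypothetical illustration of the idea, random midpoint displacement refines a coarse elevation profile while keeping the original samples fixed. The noise added at each level shrinks by a factor of 2^(-H), which gives the refined profile power-law (fractal) high-frequency behaviour. The function name, the 1-D restriction, and the Hurst parameter `hurst` are all assumptions made for this sketch.

```python
import random

def midpoint_refine(profile, levels, hurst=0.8, scale=1.0, seed=None):
    """Refine a coarse 1-D terrain profile by random midpoint displacement.

    Each level doubles the sampling density. Original samples are kept, and each
    new midpoint is the mean of its neighbours plus Gaussian noise whose std
    shrinks by 2**(-hurst) per level (fractional-Brownian-motion-like scaling).
    """
    rng = random.Random(seed)
    pts = list(profile)
    sigma = scale
    for _ in range(levels):
        sigma *= 2.0 ** (-hurst)
        refined = []
        for a, b in zip(pts[:-1], pts[1:]):
            refined.append(a)
            refined.append(0.5 * (a + b) + rng.gauss(0.0, sigma))
        refined.append(pts[-1])
        pts = refined
    return pts
```

Each level maps n samples to 2n - 1, so two levels applied to a 5-sample DEM transect yield 17 samples, with the original values at every fourth position.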
Serial grouping of 2D-image regions with object-based attention in humans.
Jeurissen, Danique; Self, Matthew W; Roelfsema, Pieter R
2016-06-13
After an initial stage of local analysis within the retina and early visual pathways, the human visual system creates a structured representation of the visual scene by co-selecting image elements that are part of behaviorally relevant objects. The mechanisms underlying this perceptual organization process are only partially understood. We here investigate the time-course of perceptual grouping of two-dimensional image-regions by measuring the reaction times of human participants and report that it is associated with the gradual spread of object-based attention. Attention spreads fastest over large and homogeneous areas and is slowed down at locations that require small-scale processing. We find that the time-course of the object-based selection process is well explained by a 'growth-cone' model, which selects surface elements in an incremental, scale-dependent manner. We discuss how the visual cortical hierarchy can implement this scale-dependent spread of object-based attention, leveraging the different receptive field sizes in distinct cortical areas.
Extended census transform histogram for land-use scene classification
NASA Astrophysics Data System (ADS)
Yuan, Baohua; Li, Shijin
2017-04-01
With the widespread use of high-resolution satellite images, research efforts have increasingly focused on land-use scene classification. In scene classification, effective visual features can significantly boost the final performance. As a typical texture descriptor, the census transform histogram (CENTRIST) has emerged as a very powerful tool due to its effective representation ability. However, the most prominent limitation of CENTRIST is its small spatial support area, which may not be adequate for capturing the key texture characteristics. We propose an extended CENTRIST (eCENTRIST), which is made up of three subschemes at a larger neighborhood scale. The proposed eCENTRIST not only inherits the advantages of CENTRIST but also encodes more useful information about local structures. In addition, a multichannel eCENTRIST, which can capture the interactions among the channels of multichannel images, is developed to obtain higher categorization accuracy. Experimental results demonstrate that the proposed method can achieve competitive performance compared with state-of-the-art methods.
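The baseline CENTRIST descriptor referred to above can be sketched in NumPy. Each interior pixel receives an 8-bit code from comparing its 3x3 neighbours with the centre, and the descriptor is the 256-bin histogram of those codes. This covers only the standard 3x3 version. The three larger-neighbourhood eCENTRIST subschemes are not specified in the abstract and are not shown. Bit ordering and the comparison direction vary between implementations; the choices below are assumptions.

```python
import numpy as np

def census_transform(img):
    """3x3 census transform: one 8-bit code per interior pixel.

    Bit k (most significant first, raster order, centre skipped) is set when
    the k-th neighbour is less than or equal to the centre pixel.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    c = img[1:-1, 1:-1]
    code = np.zeros(c.shape, dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (neighbour <= c).astype(np.uint8) << (7 - bit)
    return code

def centrist(img):
    """CENTRIST descriptor: 256-bin histogram of census-transform codes."""
    return np.bincount(census_transform(img).ravel(), minlength=256)
```

Because each code depends only on the ordering of intensities, the histogram is unchanged by monotonic brightness changes. This is part of why CENTRIST is attractive as a texture descriptor.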
Tachistoscopic illumination and masking of real scenes.
Chichka, David; Philbeck, John W; Gajewski, Daniel A
2015-03-01
Tachistoscopic presentation of scenes has been valuable for studying the emerging properties of visual scene representations. The spatial aspects of this work have generally been focused on the conceptual locations (e.g., next to the refrigerator) and directional locations of objects in 2-D arrays and/or images. Less is known about how the perceived egocentric distance of objects develops. Here we describe a novel system for presenting brief glimpses of a real-world environment, followed by a mask. The system includes projectors with mechanical shutters for projecting the fixation and masking images, a set of LED floodlights for illuminating the environment, and computer-controlled electronics to set the timing and initiate the process. Because a real environment is used, most visual distance and depth cues can be manipulated using traditional methods. The system is inexpensive and robust, and its components are readily available in the marketplace. This article describes the system and the timing characteristics of each component. We verified the system's ability to control exposure on time scales as short as a few milliseconds.
Significance of perceptually relevant image decolorization for scene classification
NASA Astrophysics Data System (ADS)
Viswanathan, Sowmya; Divakaran, Govind; Soman, Kutti Padanyl
2017-11-01
Color images contain luminance and chrominance components representing the intensity and color information, respectively. The objective of this paper is to show the significance of incorporating chrominance information into the task of scene classification. An improved color-to-grayscale image conversion algorithm that effectively incorporates chrominance information is proposed, using the color-to-gray structure similarity index and singular value decomposition to improve the perceptual quality of the converted grayscale images. The experimental results, based on an image quality assessment for image decolorization and its success rate (using the Cadik and COLOR250 datasets), show that the proposed image decolorization technique performs better than eight existing benchmark algorithms for image decolorization. In the second part of the paper, the effectiveness of incorporating the chrominance component into scene classification tasks is demonstrated using a deep belief network-based image classification system developed using dense scale-invariant feature transforms. The contribution of the chrominance information incorporated by the proposed image decolorization technique is confirmed by the improvement in overall scene classification accuracy. Moreover, the overall scene classification performance improved further when the models obtained using the proposed method and conventional decolorization methods were combined.
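The sketch below is not the authors' SSIM/SVD algorithm, which the abstract does not detail. It illustrates why chrominance matters for decolorization. A fixed-weight luma conversion (BT.601 weights) maps two isoluminant colours to the same gray level. A simple chrominance-aware alternative, assumed here for illustration, projects each pixel onto the image's dominant colour axis found by SVD, and that projection keeps the two colours apart.

```python
import numpy as np

def luminance_gray(rgb):
    """Fixed-weight luma (ITU-R BT.601 coefficients); ignores scene-specific chroma."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def pca_gray(rgb):
    """Project each pixel onto the dominant colour axis found by SVD.

    rgb: float array of shape (H, W, 3) with values in [0, 1].
    Returns an (H, W) image rescaled to [0, 1].
    """
    pixels = rgb.reshape(-1, 3).astype(float)
    centered = pixels - pixels.mean(axis=0)
    # First right-singular vector = direction of largest colour variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    # Orient the axis so that brighter pixels tend to map to larger gray values.
    if axis @ np.ones(3) < 0:
        axis = -axis
    g = centered @ axis
    span = g.max() - g.min()
    g = (g - g.min()) / span if span > 0 else np.zeros_like(g)
    return g.reshape(rgb.shape[:2])
```

With a red patch (1, 0, 0) next to an isoluminant green patch (0, 0.299/0.587, 0), `luminance_gray` returns identical values for both, while `pca_gray` maps them to the two ends of the gray range.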
Near-Space TOPSAR Large-Scene Full-Aperture Imaging Scheme Based on Two-Step Processing
Zhang, Qianghui; Wu, Junjie; Li, Wenchao; Huang, Yulin; Yang, Jianyu; Yang, Haiguang
2016-01-01
Free of the constraints of orbital mechanics, weather conditions, and minimum antenna area, synthetic aperture radar (SAR) mounted on a near-space platform is better suited to sustained large-scene imaging than its spaceborne and airborne counterparts. Terrain observation by progressive scans (TOPS), a novel wide-swath imaging mode in which the SAR beam scans along the azimuth direction, can reduce the echo acquisition time for large scenes. Thus, near-space TOPS-mode SAR (NS-TOPSAR) provides a new opportunity for sustained large-scene imaging. An efficient full-aperture imaging scheme for NS-TOPSAR is proposed in this paper. In this scheme, two-step processing (TSP) is first adopted to eliminate the Doppler aliasing of the echo. The data are then focused in the two-dimensional frequency domain (FD) using Stolt interpolation. Finally, a modified TSP (MTSP) is performed to remove the azimuth aliasing. Simulations are presented to demonstrate the validity of the proposed imaging scheme for near-space large-scene imaging applications. PMID:27472341
Secure access control and large scale robust representation for online multimedia event detection.
Liu, Changyu; Lu, Bin; Li, Huiling
2014-01-01
We developed an online multimedia event detection (MED) system. However, integrating traditional event detection algorithms into an online environment raises two issues: secure access control and large-scale robust representation. For the first issue, we proposed a tree proxy-based and service-oriented access control (TPSAC) model based on the traditional role-based access control model. Verification experiments were conducted on the CloudSim simulation platform, and the results showed that the TPSAC model is suitable for access control in dynamic online environments. For the second issue, inspired by the object-bank scene descriptor, we proposed a 1000-object-bank (1000OBK) event descriptor. Feature vectors of the 1000OBK were extracted from response pyramids of 1000 generic object detectors trained on standard annotated image datasets, such as the ImageNet dataset. A spatial bag-of-words tiling approach was then adopted to encode these feature vectors, bridging the gap between objects and events. Furthermore, we performed event classification experiments on the challenging TRECVID MED 2012 dataset, and the results showed that the robust 1000OBK event descriptor outperforms state-of-the-art approaches.
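Object-bank-style descriptors build a fixed-length vector by pooling each detector's response map over a spatial grid. The sketch below shows that pooling idea: max-pooling over 1x1, 2x2 and 4x4 grids, then concatenation. The pyramid levels, the use of max-pooling, and the function name are assumptions made for illustration. The abstract does not give the exact tiling used for 1000OBK.

```python
import numpy as np

def spatial_pyramid_max_pool(responses, levels=(1, 2, 4)):
    """Pool detector response maps over a spatial pyramid.

    responses: array of shape (D, H, W), one response map per object detector
    (assumes H, W >= max(levels) so every cell is non-empty).
    For each level L the map is split into an L x L grid and the maximum
    response in every cell is kept; all cells are concatenated.
    Output length = D * sum(L * L for L in levels).
    """
    d, h, w = responses.shape
    feats = []
    for L in levels:
        # Integer cell boundaries that cover the whole map.
        ys = np.linspace(0, h, L + 1).astype(int)
        xs = np.linspace(0, w, L + 1).astype(int)
        for i in range(L):
            for j in range(L):
                cell = responses[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                feats.append(cell.max(axis=(1, 2)))
    return np.concatenate(feats)
```

With 1000 detectors and levels (1, 2, 4), this yields a 21,000-dimensional vector per frame. Per-frame vectors would still need to be aggregated over a video before classification.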
Volumetric calibration of a plenoptic camera.
Hall, Elise Munz; Fahringer, Timothy W; Guildenbecher, Daniel R; Thurow, Brian S
2018-02-01
The volumetric calibration of a plenoptic camera is explored to correct for inaccuracies due to real-world lens distortions and thin-lens assumptions in current processing methods. Two methods of volumetric calibration based on a polynomial mapping function that does not require knowledge of specific lens parameters are presented and compared to a calibration based on thin-lens assumptions. The first method, volumetric dewarping, is performed by creating a volumetric representation of a scene using the thin-lens assumptions, which is then corrected in post-processing using a polynomial mapping function. The second method, direct light-field calibration, uses the polynomial mapping in creating the initial volumetric representation to relate locations in object space directly to image sensor locations. The accuracy and feasibility of these methods are examined experimentally by capturing images of a known dot card at a variety of depths. Results suggest that use of a 3D polynomial mapping function provides a significant increase in reconstruction accuracy and that the achievable accuracy is similar using either polynomial-mapping-based method. Additionally, direct light-field calibration provides significant computational benefits by eliminating some intermediate processing steps found in other methods. Finally, the flexibility of this method is shown for a nonplanar calibration.
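A polynomial mapping of the kind described above can be fitted by ordinary least squares from dot-card correspondences. The sketch assumes a full cubic in object-space coordinates (x, y, z) that predicts sensor coordinates (u, v). The abstract does not state the polynomial degree or form, and direct light-field calibration may use additional (e.g. angular) inputs, so treat the structure as illustrative.

```python
import itertools
import numpy as np

def poly_terms(xyz, degree):
    """All monomials x^a y^b z^c with a + b + c <= degree (including the constant)."""
    x, y, z = xyz.T
    cols = []
    for a, b, c in itertools.product(range(degree + 1), repeat=3):
        if a + b + c <= degree:
            cols.append(x**a * y**b * z**c)
    return np.stack(cols, axis=1)

def fit_mapping(object_pts, sensor_pts, degree=3):
    """Least-squares fit of sensor coordinates (u, v) as polynomials in (x, y, z)."""
    A = poly_terms(object_pts, degree)
    coeffs, *_ = np.linalg.lstsq(A, sensor_pts, rcond=None)
    return coeffs

def apply_mapping(coeffs, object_pts, degree=3):
    """Evaluate the fitted mapping at new object-space points."""
    return poly_terms(object_pts, degree) @ coeffs
```

A cubic in three variables has 20 coefficients per output coordinate, so a dot card imaged at several depths easily over-determines the fit.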
Anticipation in Real-World Scenes: The Role of Visual Context and Visual Memory.
Coco, Moreno I; Keller, Frank; Malcolm, George L
2016-11-01
The human sentence processor is able to make rapid predictions about upcoming linguistic input. For example, upon hearing the verb eat, anticipatory eye-movements are launched toward edible objects in a visual scene (Altmann & Kamide, 1999). However, the cognitive mechanisms that underlie anticipation remain to be elucidated in ecologically valid contexts. Previous research has, in fact, mainly used clip-art scenes and object arrays, raising the possibility that anticipatory eye-movements are limited to displays containing a small number of objects in a visually impoverished context. In Experiment 1, we confirm that anticipation effects occur in real-world scenes and investigate the mechanisms that underlie such anticipation. In particular, we demonstrate that real-world scenes provide contextual information that anticipation can draw on: When the target object is not present in the scene, participants infer and fixate regions that are contextually appropriate (e.g., a table upon hearing eat). Experiment 2 investigates whether such contextual inference requires the co-presence of the scene, or whether memory representations can be utilized instead. The same real-world scenes as in Experiment 1 are presented to participants, but the scene disappears before the sentence is heard. We find that anticipation occurs even when the screen is blank, including when contextual inference is required. We conclude that anticipatory language processing is able to draw upon global scene representations (such as scene type) to make contextual inferences. These findings are compatible with theories assuming contextual guidance, but pose a challenge for theories assuming object-based visual indices. Copyright © 2015 Cognitive Science Society, Inc.
Extracting flat-field images from scene-based image sequences using phase correlation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Caron, James N., E-mail: Caron@RSImd.com; Montes, Marcos J.; Obermark, Jerome L.
Flat-field image processing is an essential step in producing high-quality and radiometrically calibrated images. Flat-fielding corrects for variations in the gain of focal plane array electronics and unequal illumination from the system optics. Typically, a flat-field image is captured by imaging a radiometrically uniform surface. The flat-field image is normalized and removed from the images. There are circumstances, such as with remote sensing, where a flat-field image cannot be acquired in this manner. For these cases, we developed a phase-correlation method that allows the extraction of an effective flat-field image from a sequence of scene-based displaced images. The method uses sub-pixel phase correlation image registration to align the sequence to estimate the static scene. The scene is removed from the sequence, producing a sequence of misaligned flat-field images. An average flat-field image is derived from the realigned flat-field sequence.
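The registration step described above relies on phase correlation. The inverse FFT of the normalized cross-power spectrum of two translated images peaks at their relative shift. The minimal NumPy sketch below recovers integer shifts only. The sub-pixel refinement mentioned in the abstract (for example, fitting the neighbourhood of the peak) and the subsequent flat-field averaging are not shown, and the function name is illustrative.

```python
import numpy as np

def phase_correlation_shift(ref, moved):
    """Estimate the integer (dy, dx) translation taking `ref` to `moved`.

    Uses the normalized cross-power spectrum; its inverse FFT peaks at the shift.
    """
    F_ref = np.fft.fft2(ref)
    F_mov = np.fft.fft2(moved)
    cross = F_mov * np.conj(F_ref)
    cross /= np.abs(cross) + 1e-12  # keep phase only
    corr = np.real(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map shifts larger than half the image size to negative values.
    h, w = ref.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

Because only spectral phase is used, the estimate is insensitive to overall brightness differences between frames. This matters when the frames also carry a slowly varying flat-field pattern.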
Songnian, Zhao; Qi, Zou; Chang, Liu; Xuemin, Liu; Shousi, Sun; Jun, Qiu
2014-04-23
How it is possible to "faithfully" represent a three-dimensional stereoscopic scene using Cartesian coordinates on a plane, and how three-dimensional perceptions differ between an actual scene and an image of the same scene, are questions that have not yet been explored in depth. They seem like commonplace phenomena, but in fact they are important and difficult issues for visual information processing, neural computation, physics, psychology, cognitive psychology, and neuroscience. The results of this study show that the use of plenoptic (or all-optical) functions and their dual plane parameterizations can not only explain the nature of information processing from the retina to the primary visual cortex and, in particular, the characteristics of the visual pathway's optical system and its affine transformation, but can also clarify why the vanishing point and line exist in a visual image. In addition, they can better explain why a three-dimensional Cartesian coordinate system can be introduced into the two-dimensional plane to express a real three-dimensional scene. 1. We introduce two different mathematical expressions of the plenoptic functions, P_w and P_v, that can describe the objective world. We also analyze the differences between these two functions when describing visual depth perception, that is, the difference between how these two functions obtain the depth information of an external scene. 2. The main results include a basic method for introducing a three-dimensional Cartesian coordinate system into a two-dimensional plane to express the depth of a scene, its constraints, and algorithmic implementation. In particular, we include a method to separate the plenoptic function and proceed with the corresponding transformation in the retina and visual cortex. 3. We propose that size constancy, the vanishing point, and vanishing line form the basis of visual perception of the outside world, and that the introduction of a three-dimensional Cartesian coordinate system into a two-dimensional plane reveals a corresponding mapping between a retinal image and the vanishing point and line.
PMID:24755246
Signature modelling and radiometric rendering equations in infrared scene simulation systems
NASA Astrophysics Data System (ADS)
Willers, Cornelius J.; Willers, Maria S.; Lapierre, Fabian
2011-11-01
The development and optimisation of modern infrared systems necessitates the use of simulation systems to create radiometrically realistic representations (e.g. images) of infrared scenes. Such simulation systems are used in signature prediction, the development of surveillance and missile sensors, signal/image processing algorithm development, and aircraft self-protection countermeasure system development and evaluation. Even the most cursory investigation reveals a multitude of factors affecting the infrared signatures of real-world objects. Factors such as spectral emissivity, spatial/volumetric radiance distribution, specular reflection, reflected direct sunlight, reflected ambient light, atmospheric degradation, and more all affect the presentation of an object's instantaneous signature. Furthermore, the signature varies dynamically as a result of internal and external influences on the object, arising from the heat balance comprising insolation, internal heat sources, aerodynamic heating (airborne objects), conduction, convection and radiation. In order to accurately render the object's signature in a computer simulation, the rendering equations must therefore account for all the elements of the signature. In this overview paper, the signature models, rendering equations and application frameworks of three infrared simulation systems are reviewed and compared. The paper first considers the problem of infrared scene simulation in a framework for simulation validation. This approach provides concise definitions and a convenient context for considering signature models and subsequent computer implementation. The primary radiometric requirements for an infrared scene simulator are presented next. The signature models and rendering equations implemented in OSMOSIS (Belgian Royal Military Academy), DIRSIG (Rochester Institute of Technology) and OSSIM (CSIR & Denel Dynamics) are reviewed.
In spite of these three simulation systems' different application focus areas, their underlying physics-based approach is similar. The commonalities and differences between the systems are investigated in the context of their somewhat different application areas. The application of an infrared scene simulation system to the development of imaging missiles and missile countermeasures is briefly described. Based on the review of the available models and equations, recommendations are made to further enhance and improve the signature models and rendering equations in infrared scene simulators.
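To make the term "rendering equation" concrete, the sketch below evaluates a textbook simplification of an infrared signature. It is not taken from OSMOSIS, DIRSIG or OSSIM. It assumes an opaque graybody surface with emissivity ε that reflects (1 − ε) of a uniform ambient (sky) radiance, seen through an atmosphere with transmittance τ and path radiance L_path: L = τ[ε L_bb(T) + (1 − ε) L_sky] + L_path. L_bb is Planck's law, and in-band radiance is obtained by trapezoidal integration over wavelength. Specular sunlight, spatial radiance variation and the dynamic heat balance listed in the abstract are all omitted.

```python
import math

H = 6.62607015e-34   # Planck constant [J s]
C = 2.99792458e8     # speed of light [m/s]
KB = 1.380649e-23    # Boltzmann constant [J/K]

def planck_radiance(wavelength_m, temp_k):
    """Blackbody spectral radiance [W m^-2 sr^-1 m^-1] at one wavelength."""
    a = 2.0 * H * C**2 / wavelength_m**5
    return a / math.expm1(H * C / (wavelength_m * KB * temp_k))

def apparent_radiance(wavelength_m, temp_k, emissivity, sky_radiance, tau, path_radiance):
    """Simplified opaque-graybody signature seen through the atmosphere.

    L = tau * (eps * L_bb(T) + (1 - eps) * L_sky) + L_path
    (self-emission plus reflected ambient, attenuated by transmittance tau,
    plus atmospheric path radiance; all radiances in the same spectral units).
    """
    surface = (emissivity * planck_radiance(wavelength_m, temp_k)
               + (1.0 - emissivity) * sky_radiance)
    return tau * surface + path_radiance

def band_radiance(lo_m, hi_m, temp_k, n=200):
    """In-band blackbody radiance [W m^-2 sr^-1] by the trapezoidal rule."""
    step = (hi_m - lo_m) / n
    total = 0.0
    for i in range(n + 1):
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * planck_radiance(lo_m + i * step, temp_k)
    return total * step
```

As a sanity check, a 300 K blackbody at 10 µm gives about 9.9 W m^-2 sr^-1 µm^-1, and any in-band radiance must stay below the total σT⁴/π.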
NASA Astrophysics Data System (ADS)
van Aardt, J. A.; van Leeuwen, M.; Kelbe, D.; Kampe, T.; Krause, K.
2015-12-01
Remote sensing is widely accepted as a useful technology for characterizing the Earth surface in an objective, reproducible, and economically feasible manner. To date, the calibration and validation of remote sensing data sets and biophysical parameter estimates remain challenging due to the requirements to sample large areas for ground-truth data collection, and restrictions to sample these data within narrow temporal windows centered around flight campaigns or satellite overpasses. The computer graphics community has taken significant steps to ameliorate some of these challenges by making it possible to generate synthetic images based on geometrically and optically realistic representations of complex targets and imaging instruments. These synthetic data can be used for conceptual and diagnostic tests of instrumentation prior to sensor deployment or to examine linkages between biophysical characteristics of the Earth surface and at-sensor radiance. In the last two decades, the use of image generation techniques for remote sensing of the vegetated environment has evolved from the simulation of simple homogeneous, hypothetical vegetation canopies to advanced scenes and renderings with a high degree of photo-realism. Reported virtual scenes comprise up to 100M surface facets; however, due to the tighter coupling between hardware and software development, the potential of image generation techniques for forestry applications has yet to be fully explored. In this presentation, we examine the potential of computer graphics techniques for the analysis of forest structure-function relationships and demonstrate techniques that enable the modeling of extremely high-faceted virtual forest canopies, comprising billions of scene elements.
We demonstrate the use of ray tracing simulations for the analysis of gap size distributions and characterization of foliage clumping within spatial footprints that allow for a tight matching between characteristics derived from these virtual scenes and typical pixel resolutions of remote sensing imagery.
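A toy version of the ray-tracing gap analysis mentioned above can be run without any rendering engine. The sketch casts vertical rays through a single layer of randomly placed horizontal disc "leaves" and counts the rays that reach the ground. It then compares that gap fraction with the Poisson (Beer-Lambert) expectation exp(-LAI) for randomly distributed foliage. Clumped foliage would show a larger gap fraction than this expectation, which is what clumping analyses look for. The disc geometry, periodic boundaries and function names are assumptions made for illustration, not the authors' virtual-canopy setup.

```python
import math
import random

def simulate_gap_fraction(n_leaves, leaf_radius, extent=1.0, n_rays=5000, seed=0):
    """Fraction of vertical rays passing through a layer of random disc-shaped leaves.

    Leaves are horizontal discs with centres uniform on a square of side `extent`
    (periodic boundaries); rays are cast straight down at random positions.
    """
    rng = random.Random(seed)
    leaves = [(rng.random() * extent, rng.random() * extent) for _ in range(n_leaves)]
    r2 = leaf_radius ** 2
    gaps = 0
    for _ in range(n_rays):
        x, y = rng.random() * extent, rng.random() * extent
        hit = False
        for cx, cy in leaves:
            # Wrap-around distance so leaves near the edges still shade the domain.
            dx = min(abs(x - cx), extent - abs(x - cx))
            dy = min(abs(y - cy), extent - abs(y - cy))
            if dx * dx + dy * dy <= r2:
                hit = True
                break
        gaps += not hit
    return gaps / n_rays

def poisson_gap_fraction(n_leaves, leaf_radius, extent=1.0):
    """Beer-Lambert / Poisson expectation exp(-LAI) for randomly placed leaves."""
    lai = n_leaves * math.pi * leaf_radius ** 2 / extent ** 2
    return math.exp(-lai)
```

With 400 leaves of radius 0.025 on a unit square (LAI of about 0.79), the simulated gap fraction should fall close to exp(-0.79), roughly 0.46.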
A Physics-Based Deep Learning Approach to Shadow Invariant Representations of Hyperspectral Images.
Windrim, Lloyd; Ramakrishnan, Rishi; Melkumyan, Arman; Murphy, Richard J
2018-02-01
This paper proposes the Relit Spectral Angle-Stacked Autoencoder, a novel unsupervised feature learning approach for mapping pixel reflectances to illumination invariant encodings. This work extends the Spectral Angle-Stacked Autoencoder so that it can learn a shadow-invariant mapping. The method is inspired by a deep learning technique, Denoising Autoencoders, with the incorporation of a physics-based model for illumination such that the algorithm learns a shadow invariant mapping without the need for any labelled training data, additional sensors, a priori knowledge of the scene or the assumption of Planckian illumination. The method is evaluated using datasets captured from several different cameras, with experiments to demonstrate the illumination invariance of the features and how they can be used practically to improve the performance of high-level perception algorithms that operate on images acquired outdoors.
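The name of the Spectral Angle-Stacked Autoencoder indicates that it builds on the spectral angle between reflectance vectors. The minimal sketch below computes that angle. It shows the invariance that the spectral angle provides on its own: the angle is unchanged when a spectrum is scaled by a positive factor (a uniform brightness change). It does not remove spectrally varying illumination differences, such as those between sunlit and shadowed pixels. Handling those is what the physics-based relighting in the abstract addresses. The function is illustrative and is not the authors' code.

```python
import numpy as np

def spectral_angle(a, b):
    """Angle (radians) between two reflectance spectra.

    Invariant to positive scaling of either spectrum (uniform brightness change),
    but not to illumination changes whose strength varies with wavelength.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))
```

For instance, a spectrum and the same spectrum at half brightness have an angle of zero, while two orthogonal spectra have an angle of π/2.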
An Automatic Procedure for Combining Digital Images and Laser Scanner Data
NASA Astrophysics Data System (ADS)
Moussa, W.; Abdel-Wahab, M.; Fritsch, D.
2012-07-01
Besides improving both the geometry and the visual quality of the model, the integration of close-range photogrammetry and terrestrial laser scanning techniques aims to fill gaps in laser scanner point clouds to avoid modeling errors, reconstruct more details at higher resolution, and recover simple structures with less geometric detail. Thus, this paper presents a flexible approach for the automatic combination of digital images and laser scanner data. Our approach comprises two methods for data fusion. The first method starts with a marker-free registration of digital images based on a point-based environment model (PEM) of a scene, which stores the 3D laser scanner point clouds associated with intensity and RGB values. The PEM allows the extraction of accurate control information for the direct computation of absolute camera orientations with redundant information by means of accurate space resection methods. In order to use the computed relations between the digital images and the laser scanner data, an extended Helmert (seven-parameter) transformation is introduced and its parameters are estimated. Prior to that, in the second method, the local relative orientation parameters of the camera images are calculated by means of an optimized Structure and Motion (SaM) reconstruction method. Applying the determined transformation parameters then yields absolutely oriented images in relation to the laser scanner data. With the resulting absolute orientations, we have employed robust dense image reconstruction algorithms to create oriented dense image point clouds, which are automatically combined with the laser scanner data to form a complete, detailed representation of a scene. Examples of different data sets are shown, and experimental results demonstrate the effectiveness of the presented procedures.
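For reference, the standard seven-parameter Helmert (similarity) transformation maps points as X' = T + s·R·X: three translations, one scale and three rotation angles. The sketch below only applies the transformation. The paper's "extended" variant and its parameter estimation are not described in the abstract and are not shown. The rotation-angle convention is an assumption, because photogrammetric packages differ on it.

```python
import numpy as np

def rotation_matrix(omega, phi, kappa):
    """Rotation R = Rz(kappa) @ Ry(phi) @ Rx(omega), angles in radians.

    Photogrammetric software uses several angle conventions; this is one of them.
    """
    co, so = np.cos(omega), np.sin(omega)
    cp, sp = np.cos(phi), np.sin(phi)
    ck, sk = np.cos(kappa), np.sin(kappa)
    rx = np.array([[1, 0, 0], [0, co, -so], [0, so, co]])
    ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    rz = np.array([[ck, -sk, 0], [sk, ck, 0], [0, 0, 1]])
    return rz @ ry @ rx

def helmert_transform(points, tx, ty, tz, scale, omega, phi, kappa):
    """Seven-parameter similarity transform X' = T + s * R @ X, applied to each row."""
    R = rotation_matrix(omega, phi, kappa)
    return np.array([tx, ty, tz]) + scale * points @ R.T
```

Because R is orthonormal, every inter-point distance is multiplied by exactly s. This is a quick way to check an estimated parameter set against known control distances.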
NASA Astrophysics Data System (ADS)
Assadi, Amir H.
2001-11-01
Perceptual geometry is an emerging field of interdisciplinary research whose objectives focus on the study of geometry from the perspective of visual perception and, in turn, on applying such geometric findings to the ecological study of vision. Perceptual geometry attempts to answer fundamental questions in perception of form and representation of space through synthesis of cognitive and biological theories of visual perception with geometric theories of the physical world. Perception of form and space are among the fundamental problems in vision science. In recent cognitive and computational models of human perception, natural scenes are used systematically as preferred visual stimuli. Among the key problems in perception of form and space, we have examined perception of the geometry of natural surfaces and curves, e.g., as in the observer's environment. Besides a systematic mathematical foundation for a remarkably general framework, the advantages of the Gestalt theory of natural surfaces include a concrete computational approach to simulate or recreate images whose geometric invariants and quantities might be perceived and estimated by an observer. The latter is at the very foundation of understanding the nature of perception of space and form, and of the (computer graphics) problem of rendering scenes to visually invoke virtual presence.
The elephant in the room: Inconsistency in scene viewing and representation.
Spotorno, Sara; Tatler, Benjamin W
2017-10-01
We examined the extent to which semantic informativeness, consistency with expectations and perceptual salience contribute to object prioritization in scene viewing and representation. In scene viewing (Experiments 1-2), semantic guidance overshadowed perceptual guidance in determining fixation order, with the greatest prioritization for objects that were diagnostic of the scene's depicted event. Perceptual properties affected the selection of consistent objects (regardless of their informativeness) but not of inconsistent objects. Semantic and perceptual properties also interacted in influencing foveal inspection, as inconsistent objects were fixated longer than low-salience but not high-salience diagnostic objects. Although inconsistent objects and consistent but marginally informative objects were not studied in direct competition with each other (each was studied in competition with diagnostic objects), we found that inconsistent objects were fixated earlier and for longer than consistent but marginally informative objects. In change detection (Experiment 3), perceptual guidance overshadowed semantic guidance, promoting detection of highly salient changes. A residual advantage for diagnosticity over inconsistency emerged only when selection prioritization could not be based on low-level features. Overall, these findings show that semantic inconsistency is not prioritized within a scene when it competes with other relevant information that is essential to scene understanding and respects observers' expectations. Moreover, they reveal that the relative dominance of semantic or perceptual properties during selection depends on ongoing task requirements. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Brain mechanisms underlying cue-based memorizing during free viewing of movie Memento.
Kauttonen, Janne; Hlushchuk, Yevhen; Jääskeläinen, Iiro P; Tikka, Pia
2018-05-15
How does the human brain recall and connect relevant memories with unfolding events? To study this, we presented the movie 'Memento' (director C. Nolan) to 25 healthy subjects during functional magnetic resonance imaging. In this movie, scenes are presented in chronologically reverse order, with certain scenes briefly overlapping previously presented scenes. Such overlapping "key-frames" serve as effective memory cues for the viewers, prompting recall of relevant memories of the previously seen scene and connecting them with the concurrent scene. We hypothesized that these repeating key-frames serve as immediate recall cues and would facilitate reconstruction of the story piece by piece. The chronological version of Memento, shown in a separate experiment to another group of subjects, served as a control condition. Using a multivariate event-related pattern analysis method and representational similarity analysis, we found focal fingerprint patterns of hemodynamic activity that emerged during the presentation of key-frame scenes. This effect was present in a higher-order cortical network including the precuneus, angular gyrus and cingulate gyrus, as well as the lateral, superior and middle frontal gyri within the frontal poles. This network was right-hemisphere dominant. These distributed patterns of brain activity appear to underlie the ability to recall relevant memories and connect them with ongoing events, i.e., "what goes with what" in a complex story. Given the real-life likeness of cinematic experience, these results provide new insight into how the human brain, given proper cues, recalls relevant memories to facilitate understanding and prediction of everyday life events. Copyright © 2018 Elsevier Inc. All rights reserved.
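Representational similarity analysis, as named in the abstract, compares the geometry of brain response patterns with a model of how conditions should relate. The sketch below is a generic version, not the study's pipeline. It assumes NumPy, uses 1 − Pearson r as the dissimilarity, and scores agreement with Spearman's rank correlation over the off-diagonal RDM entries.

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between condition patterns.

    patterns: (n_conditions, n_features), e.g. voxel responses per scene.
    """
    return 1.0 - np.corrcoef(patterns)

def upper_triangle(m):
    """Off-diagonal upper-triangle entries (the RDM is symmetric with a zero diagonal)."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]

def rank(v):
    """Average ranks, so tied values share their mean rank (as Spearman's rho requires)."""
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v))
    ranks[order] = np.arange(len(v), dtype=float)
    _, inv = np.unique(v, return_inverse=True)
    return (np.bincount(inv, weights=ranks) / np.bincount(inv))[inv]

def rsa_score(neural_patterns, model_rdm):
    """Spearman correlation between the neural RDM and a model RDM."""
    a = rank(upper_triangle(rdm(neural_patterns)))
    b = rank(upper_triangle(model_rdm))
    return float(np.corrcoef(a, b)[0, 1])
```

A model RDM could, for instance, be categorical: 0 for pairs of key-frame scenes and 1 otherwise. That coding is a hypothetical example; it is not the model used in the study.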
A Query Expansion Framework in Image Retrieval Domain Based on Local and Global Analysis
Rahman, M. M.; Antani, S. K.; Thoma, G. R.
2011-01-01
We present an image retrieval framework based on automatic query expansion in a concept feature space by generalizing the vector space model of information retrieval. In this framework, images are represented by vectors of weighted concepts similar to the keyword-based representation used in text retrieval. To generate the concept vocabularies, a statistical model is built by utilizing Support Vector Machine (SVM)-based classification techniques. The images are represented as “bag of concepts” that comprise perceptually and/or semantically distinguishable color and texture patches from local image regions in a multi-dimensional feature space. To explore the correlation between the concepts and overcome the assumption of feature independence in this model, we propose query expansion techniques in the image domain from a new perspective based on both local and global analysis. For the local analysis, the correlations between the concepts based on the co-occurrence pattern, and the metrical constraints based on the neighborhood proximity between the concepts in encoded images, are analyzed by considering local feedback information. We also analyze the concept similarities in the collection as a whole in the form of a similarity thesaurus and propose an efficient query expansion based on the global analysis. The experimental results on a photographic collection of natural scenes and a biomedical database of different imaging modalities demonstrate the effectiveness of the proposed framework in terms of precision and recall. PMID:21822350
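The global-analysis idea can be sketched in a few lines. Images are vectors of concept weights, a similarity thesaurus records how concepts co-occur across the whole collection, and the query is expanded with associated concepts before cosine ranking. This is a minimal sketch assuming NumPy. The thesaurus definition (cosine between concept columns) and the weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cosine_scores(query, images):
    """Cosine similarity between a query concept vector and each image's concept vector."""
    q = query / np.linalg.norm(query)
    d = images / np.linalg.norm(images, axis=1, keepdims=True)
    return d @ q

def similarity_thesaurus(images):
    """Concept-concept cosine similarity over the whole collection (global analysis).

    images: (n_images, n_concepts) matrix of concept weights.
    """
    c = images.T.astype(float)                       # one row per concept
    norms = np.linalg.norm(c, axis=1, keepdims=True)
    c = c / np.where(norms == 0, 1.0, norms)
    sim = c @ c.T
    np.fill_diagonal(sim, 0.0)                       # keep only cross-concept associations
    return sim

def expand_query(query, sim, alpha=0.5):
    """Add concepts associated with the query's concepts (alpha is a hypothetical weight)."""
    return query + alpha * sim @ query
```

With concepts (sky, water, grass), the expansion works as follows. A query for "sky" initially scores 0 against an image containing only water. Once sky and water co-occur elsewhere in the collection, that image receives a positive score.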
Hyperspectral image visualization based on a human visual model
NASA Astrophysics Data System (ADS)
Zhang, Hongqin; Peng, Honghong; Fairchild, Mark D.; Montag, Ethan D.
2008-02-01
Hyperspectral image data can provide very fine spectral resolution with more than 200 bands, yet displaying such rich information on a tristimulus monitor challenges visualization techniques. This study developed a visualization technique, based on a biologically inspired visual attention model, that takes advantage of both the consistent natural appearance of a true color image and the feature separation of a PCA image. The key step is to extract the informative regions in the scene. The model takes into account human contrast sensitivity functions and generates a topographic saliency map for both images. This is accomplished using a set of linear "center-surround" operations that simulate visual receptive fields as the difference between fine and coarse scales. A difference map between the saliency map of the true color image and that of the PCA image is derived and used as a mask on the true color image to select a small number of interesting locations where the PCA image has more salient features than are available in the visible bands. The resulting representations preserve hue for vegetation, water, roads, etc., while the selected attentional locations may be analyzed by more advanced algorithms.
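The center-surround and difference-mask steps can be sketched as follows. Saliency is the absolute difference between a fine and a coarse Gaussian blur. The mask selects pixels where the PCA image is clearly more salient than the true-color luminance. This is a simplified sketch assuming NumPy. It uses a single pair of scales, omits contrast sensitivity weighting, and its sigmas and threshold are arbitrary illustrative values rather than the paper's settings.

```python
import numpy as np

def gaussian_kernel(sigma):
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur; edge padding plus 'valid' convolution keeps the input size."""
    k = gaussian_kernel(sigma)
    p = np.pad(img, len(k) // 2, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def center_surround_saliency(img, center_sigma=1.0, surround_sigma=4.0):
    """|fine - coarse| response, rescaled to [0, 1] (all zeros for a featureless image)."""
    s = np.abs(blur(img, center_sigma) - blur(img, surround_sigma))
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)

def attention_mask(true_color_lum, pca_band, threshold=0.2):
    """True where the PCA image is markedly more salient than the true-color image."""
    diff = center_surround_saliency(pca_band) - center_surround_saliency(true_color_lum)
    return diff > threshold
```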
Web Map Services (WMS) Global Mosaic
NASA Technical Reports Server (NTRS)
Percivall, George; Plesea, Lucian
2003-01-01
The WMS Global Mosaic provides access to imagery of the global landmass using an open standard for web mapping. The seamless image is a geographically accurate mosaic of Landsat 7 scenes at 30 m and 15 m resolutions. By using the OpenGIS Web Map Service (WMS) interface, any organization can use the global mosaic as a layer in its geospatial applications. Based on a trade study, an implementation approach was chosen that extends a previously developed WMS hosting a Landsat 5 CONUS mosaic developed by JPL. The WMS Global Mosaic supports the NASA Geospatial Interoperability Office goal of providing an integrated digital representation of the Earth, widely accessible for humanity's critical decisions.
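In practice, a client uses the mosaic "as a layer" by issuing standard WMS requests. The sketch below builds a WMS 1.1.1 GetMap URL using only Python's standard library. The endpoint URL and the layer name are hypothetical placeholders; they are not the actual address or layer identifiers of the Global Mosaic service.

```python
from urllib.parse import urlencode

def getmap_url(base_url, layer, bbox, width, height, fmt="image/jpeg"):
    """Build an OGC WMS 1.1.1 GetMap request URL.

    bbox: (min_lon, min_lat, max_lon, max_lat) in EPSG:4326 (lon/lat order in WMS 1.1.1).
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                 # empty string selects the server's default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint and layer name, for illustration only.
print(getmap_url("https://example.org/wms", "global_mosaic", (-10, 35, 5, 45), 512, 256))
```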
Comparing object recognition from binary and bipolar edge images for visual prostheses.
Jung, Jae-Hyun; Pu, Tian; Peli, Eli
2016-11-01
Visual prostheses require an effective representation method because their displays are limited to 2 or 3 gray levels at low resolution. Edges derived from abrupt luminance changes in images carry essential information for object recognition, and binary (black-and-white) edge images have typically been used to convey this information. However, in scenes with a complex, cluttered background, human observers' recognition of binary edge images is limited, and additional information is required. The polarity of edges and cusps (black or white features on a gray background) carries important additional information: it may provide the shape-from-shading information that is missing in the binary edge image. This depth information may be restored by using bipolar edges. To determine the possible impact of bipolar filtering in visual prostheses with 3 or more gray levels, we compared recognition rates of 26 subjects for 16 images presented as binary and as bipolar edge images. Recognition rates were higher with bipolar edge images, and the improvement was significant in scenes with complex backgrounds. The results also suggest that erroneous shape-from-shading interpretations of bipolar edges that arise from pigmentation rather than from shape boundaries may confound recognition.
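The difference between the two representations can be made concrete in code. A binary edge image keeps only the magnitude of an edge filter's response. A bipolar image also keeps its sign, displayed as black or white on mid-gray. This is a minimal sketch assuming NumPy. It uses a plain discrete Laplacian (sign-flipped so that the brighter side of an edge turns white) as a stand-in filter, and the threshold is arbitrary; the filter used in the study may differ.

```python
import numpy as np

def center_minus_surround(img):
    """4 * center - sum of the 4 neighbours (the negated 5-point Laplacian), edge-padded."""
    p = np.pad(img, 1, mode="edge")
    neighbours = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
    return 4 * p[1:-1, 1:-1] - neighbours

def binary_edges(img, thresh):
    """Two levels: 1 (white) where the response magnitude exceeds thresh, else 0 (black)."""
    return (np.abs(center_minus_surround(img)) > thresh).astype(float)

def bipolar_edges(img, thresh):
    """Three levels on a mid-gray (0.5) background; the response sign sets the polarity."""
    r = center_minus_surround(img)
    out = np.full(img.shape, 0.5)
    out[r > thresh] = 1.0      # pixel brighter than its surround -> white
    out[r < -thresh] = 0.0     # pixel darker than its surround -> black
    return out
```

For a dark-to-bright step, the binary image marks both sides of the edge identically. The bipolar image instead shows a black line on the dark side next to a white line on the bright side, which preserves the luminance polarity.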
Tachistoscopic illumination and masking of real scenes
Chichka, David; Philbeck, John W.; Gajewski, Daniel A.
2014-01-01
Tachistoscopic presentation of scenes has been valuable for studying the emerging properties of visual scene representations. The spatial aspects of this work have generally focused on the conceptual locations (e.g., next to the refrigerator) and the directional locations of objects in 2D arrays and/or images. Less is known about how the perceived egocentric distance of objects develops. Here we describe a novel system for presenting brief glimpses of a real-world environment, followed by a mask. The system includes projectors with mechanical shutters for projecting the fixation and masking images, a set of LED floodlights for illuminating the environment, and computer-controlled electronics to set the timing and initiate the process. Because a real environment is used, most visual distance and depth cues can be manipulated using traditional methods. The system is inexpensive and robust, and its components are readily available in the marketplace. This paper describes the system and the timing characteristics of each component, and demonstrates that exposure can be controlled at time scales as short as a few milliseconds. PMID:24519496
Dima, Diana C; Perry, Gavin; Singh, Krish D
2018-06-11
In navigating our environment, we rapidly process and extract meaning from visual cues. However, the relationship between visual features and categorical representations in natural scene perception is still not well understood. Here, we used natural scene stimuli from different categories, filtered at different spatial frequencies, to address this question in a passive viewing paradigm. Using representational similarity analysis (RSA) and cross-decoding of magnetoencephalography (MEG) data, we show that categorical representations emerge in human visual cortex at ∼180 ms and are linked to spatial frequency processing. Furthermore, dorsal and ventral stream areas reveal temporally and spatially overlapping representations of low- and high-level layer activations extracted from a feedforward neural network. Our results suggest that neural patterns from extrastriate visual cortex switch from low-level to categorical representations within 200 ms, highlighting the rapid cascade of processing stages essential in human visual perception. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
The Neural Dynamics of Attentional Selection in Natural Scenes.
Kaiser, Daniel; Oosterhof, Nikolaas N; Peelen, Marius V
2016-10-12
The human visual system can only represent a small subset of the many objects present in cluttered scenes at any given time, such that objects compete for representation. Despite these processing limitations, the detection of object categories in cluttered natural scenes is remarkably rapid. How does the brain efficiently select goal-relevant objects from cluttered scenes? In the present study, we used multivariate decoding of magnetoencephalography (MEG) data to track the neural representation of within-scene objects as a function of top-down attentional set. Participants detected categorical targets (cars or people) in natural scenes. The presence of these categories within a scene was decoded from MEG sensor patterns by training linear classifiers to differentiate cars and people in isolation and testing these classifiers on scenes containing one of the two categories. The presence of a specific category in a scene could be reliably decoded from MEG response patterns as early as 160 ms, despite substantial scene clutter and variation in the visual appearance of each category. Strikingly, we find that these early categorical representations fully depend on the match between visual input and top-down attentional set: only objects that matched the current attentional set were processed to the category level within the first 200 ms after scene onset. A sensor-space searchlight analysis revealed that this early attention bias was localized to lateral occipitotemporal cortex, reflecting top-down modulation of visual processing. These results show that attention quickly resolves competition between objects in cluttered natural scenes, allowing for the rapid neural representation of goal-relevant objects.
Efficient attentional selection is crucial in many everyday situations. For example, when driving a car, we need to quickly detect obstacles, such as pedestrians crossing the street, while ignoring irrelevant objects. How can humans efficiently perform such tasks, given the multitude of objects contained in real-world scenes? Here we used multivariate decoding of magnetoencephalography data to characterize the neural underpinnings of attentional selection in natural scenes with high temporal precision. We show that brain activity quickly tracks the presence of objects in scenes, but crucially only for those objects that were immediately relevant for the participant. These results provide evidence for fast and efficient attentional selection that mediates the rapid detection of goal-relevant objects in real-world environments. Copyright © 2016 the authors.
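The cross-decoding logic can be sketched as follows: train linear classifiers on isolated cars versus people, then test them on scenes, separately at every time point. This toy assumes NumPy and synthetic data. It uses a nearest-centroid classifier for brevity (for two classes its decision boundary is a hyperplane, i.e. linear). The study's actual classifier, preprocessing and MEG data are not reproduced.

```python
import numpy as np

def fit_centroids(X, y):
    """Per-class mean patterns of the training trials."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict(classes, centroids, X):
    # squared Euclidean distance from every trial to every class centroid
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def cross_decode(train_X, train_y, test_X, test_y):
    """Train on one condition (isolated objects), test on another (scenes), per time point.

    train_X, test_X: (trials, sensors, times). Returns accuracy for every time point.
    """
    acc = np.empty(train_X.shape[2])
    for t in range(train_X.shape[2]):
        classes, cents = fit_centroids(train_X[:, :, t], train_y)
        acc[t] = np.mean(predict(classes, cents, test_X[:, :, t]) == test_y)
    return acc

# Synthetic demo: a category-specific sensor pattern appears from time index ONSET onward.
rng = np.random.default_rng(0)
N_SENSORS, N_TIMES, ONSET = 30, 20, 10
PATTERN = rng.normal(size=N_SENSORS)

def simulate(n_trials, noise):
    y = rng.integers(0, 2, n_trials)                  # 0 = "car", 1 = "person" (labels only)
    X = rng.normal(scale=noise, size=(n_trials, N_SENSORS, N_TIMES))
    X[:, :, ONSET:] += np.where(y == 1, 1.0, -1.0)[:, None, None] * PATTERN[None, :, None]
    return X, y

train_X, train_y = simulate(100, 1.0)   # "isolated objects"
test_X, test_y = simulate(200, 2.0)     # "cluttered scenes": same signal, more noise
accuracy = cross_decode(train_X, train_y, test_X, test_y)
```

In this toy, accuracy stays near chance before `ONSET` and rises well above it afterwards. That pattern is the kind of time course the study reads as the emergence of a categorical representation.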
Intelligent bandwidth compression
NASA Astrophysics Data System (ADS)
Tseng, D. Y.; Bullock, B. L.; Olin, K. E.; Kandt, R. K.; Olsen, J. D.
1980-02-01
The feasibility of a 1000:1 bandwidth compression ratio for image transmission has been demonstrated using image-analysis algorithms and a rule-based controller. Such a high compression ratio was achieved by first analyzing scene content using auto-cueing and feature-extraction algorithms, and then transmitting only the pertinent information consistent with mission requirements. A rule-based controller directs the flow of analysis and performs priority allocations on the extracted scene content. The reconstructed bandwidth-compressed image consists of an edge map of the scene background, with primary and secondary target windows embedded in the edge map. The bandwidth-compressed images are updated at a basic rate of 1 frame per second, with the high-priority target window updated at 7.5 frames per second. The scene-analysis algorithms used in this system, together with the adaptive priority controller, are described. Results of simulated 1000:1 bandwidth-compressed images are presented. A videotape simulation of the Intelligent Bandwidth Compression system has been produced using a sequence of video input from the database.
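A worked example makes the 1000:1 figure concrete. The source format used here (512 × 512 pixels, 8 bits per pixel, 30 frames per second) is an assumption for illustration only, because the abstract does not state the input format.

```latex
\begin{aligned}
R_{\text{raw}} &= 512 \times 512 \times 8\,\text{bit} \times 30\,\text{s}^{-1} = 62\,914\,560\,\text{bit/s} \approx 62.9\,\text{Mbit/s},\\
R_{\text{channel}} &= R_{\text{raw}} / 1000 \approx 62.9\,\text{kbit/s}.
\end{aligned}
```

Under the same assumption, refreshing the background edge map once per second instead of 30 times per second accounts for a factor of 30 on that component. The rest of the reduction would have to come from sending only the edge map and the target windows.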
Research on the generation of the background with sea and sky in infrared scene
NASA Astrophysics Data System (ADS)
Dong, Yan-zhi; Han, Yan-li; Lou, Shu-li
2008-03-01
Preserving the texture of infrared images is important for scene generation in simulations of anti-ship infrared imaging guidance. We studied the fractal method and applied it to infrared scene generation. We adopted horizontal-vertical (HV) partitioning to encode the original image. Based on the properties of infrared images with a sea-sky background, we used a Local Iteration Function System (LIFS) to decrease the computational complexity and increase the processing rate. Several results are presented. They show that the fractal method preserves the texture of infrared images well and could be widely used for infrared scene generation in the future.
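The core of fractal (iterated-function-system) image coding is as follows. Each small range block is approximated as a contrast-scaled, brightness-shifted copy of a larger, downsampled domain block. Decoding iterates those maps from any starting image. The sketch below assumes NumPy. It uses fixed square blocks rather than the HV partition and omits the LIFS speed-ups described in the abstract; block size, clipping bound and iteration count are illustrative choices.

```python
import numpy as np

def downsample2(block):
    """Average 2x2 neighbourhoods so a 2B x 2B domain block matches a B x B range block."""
    h, w = block.shape
    return block.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def encode(img, B=4):
    """Fixed-square-partition fractal encoder. Image sides must be multiples of 2 * B."""
    H, W = img.shape
    domains = [((y, x), downsample2(img[y:y + 2 * B, x:x + 2 * B]))
               for y in range(0, H - 2 * B + 1, 2 * B)
               for x in range(0, W - 2 * B + 1, 2 * B)]
    code = []
    for y in range(0, H, B):
        for x in range(0, W, B):
            R = img[y:y + B, x:x + B]
            best = None
            for (dy, dx), D in domains:
                dm, rm = D.mean(), R.mean()
                var = ((D - dm) ** 2).sum()
                s = ((D - dm) * (R - rm)).sum() / var if var > 1e-12 else 0.0
                s = float(np.clip(s, -0.9, 0.9))       # keep every map contractive
                o = rm - s * dm                         # least-squares brightness offset
                err = ((s * D + o - R) ** 2).sum()
                if best is None or err < best[0]:
                    best = (err, dy, dx, s, o)
            code.append((y, x) + best[1:])
    return code

def decode(code, shape, B=4, iterations=10):
    """Iterate the block maps from a blank image; they converge to the attractor."""
    img = np.zeros(shape)
    for _ in range(iterations):
        new = np.empty(shape)
        for y, x, dy, dx, s, o in code:
            new[y:y + B, x:x + B] = s * downsample2(img[dy:dy + 2 * B, dx:dx + 2 * B]) + o
        img = new
    return img
```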
Visual memory for moving scenes.
DeLucia, Patricia R; Maldia, Maria M
2006-02-01
In the present study, memory for picture boundaries was measured with scenes that simulated self-motion along the depth axis. The results indicated that boundary extension (a distortion in memory for picture boundaries) occurred with moving scenes in the same manner as that reported previously for static scenes. Furthermore, motion affected memory for the boundaries but this effect of motion was not consistent with representational momentum of the self (memory being further forward in a motion trajectory than actually shown). We also found that memory for the final position of the depicted self in a moving scene was influenced by properties of the optical expansion pattern. The results are consistent with a conceptual framework in which the mechanisms that underlie boundary extension and representational momentum (a) process different information and (b) both contribute to the integration of successive views of a scene while the scene is changing.
A spectral water index based on visual bands
NASA Astrophysics Data System (ADS)
Basaeed, Essa; Bhaskar, Harish; Al-Mualla, Mohammed
2013-10-01
Land-water segmentation is an important preprocessing step in a number of remote sensing applications such as target detection, environmental monitoring, and map updating. A Normalized Optical Water Index (NOWI) is proposed to accurately discriminate between land and water regions in multispectral satellite imagery from DubaiSat-1. NOWI exploits the spectral characteristics of water content (using visible bands) and uses a non-linear normalization procedure that places strong emphasis on small changes in lower brightness values while guaranteeing that the segmentation process remains image-independent. The NOWI representation is validated through systematic experiments, evaluated using robust metrics, and compared against various supervised classification algorithms. Analysis has indicated that NOWI has the following advantages: it a) is a pixel-based method that requires no global knowledge of the scene under investigation, b) can be easily implemented in parallel processing, c) is image-independent and requires no training, d) works in different environmental conditions, e) provides high accuracy and efficiency, and f) works directly on the input image without any form of pre-processing.
Secure Access Control and Large Scale Robust Representation for Online Multimedia Event Detection
Liu, Changyu; Li, Huiling
2014-01-01
We developed an online multimedia event detection (MED) system. However, integrating traditional event detection algorithms into an online environment raises two issues: secure access control and large-scale robust representation. For the first issue, we proposed a tree proxy-based and service-oriented access control (TPSAC) model based on the traditional role-based access control model. Verification experiments were conducted on the CloudSim simulation platform, and the results showed that the TPSAC model is suitable for access control in dynamic online environments. For the second issue, inspired by the object-bank scene descriptor, we proposed a 1000-object-bank (1000OBK) event descriptor. Feature vectors of the 1000OBK were extracted from the response pyramids of 1000 generic object detectors trained on standard annotated image datasets, such as ImageNet. A spatial bag-of-words tiling approach was then adopted to encode these feature vectors, bridging the gap between objects and events. Furthermore, we performed event classification experiments on the challenging TRECVID MED 2012 dataset, and the results showed that the robust 1000OBK event descriptor outperforms state-of-the-art approaches. PMID:25147840
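Spatial bag-of-words tiling, as mentioned in the abstract, quantizes each feature vector to its nearest codeword and builds one normalized histogram per spatial tile, at several tilings. The histograms are then concatenated. This is a generic sketch assuming NumPy. The feature vectors stand in for the 1000OBK detector features, which are not computed here, and the codebook and the 1×1 plus 2×2 grid layout are illustrative assumptions.

```python
import numpy as np

def assign_codewords(features, codebook):
    """Index of the nearest codeword (squared Euclidean distance) for each feature vector."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def spatial_bow(features, positions, codebook, grids=((1, 1), (2, 2))):
    """Concatenated per-tile codeword histograms.

    features: (n, d); positions: (n, 2) normalised (x, y) in [0, 1); codebook: (k, d).
    Output length: k * sum(gx * gy for each grid).
    """
    words = assign_codewords(features, codebook)
    k = len(codebook)
    hists = []
    for gx, gy in grids:
        cx = np.minimum((positions[:, 0] * gx).astype(int), gx - 1)
        cy = np.minimum((positions[:, 1] * gy).astype(int), gy - 1)
        cell = cy * gx + cx                       # row-major tile index
        for c in range(gx * gy):
            h = np.bincount(words[cell == c], minlength=k).astype(float)
            hists.append(h / max(h.sum(), 1.0))   # L1-normalise; empty tiles stay zero
    return np.concatenate(hists)
```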
Volumetric calibration of a plenoptic camera
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hall, Elise Munz; Fahringer, Timothy W.; Guildenbecher, Daniel Robert; ...
2018-02-01
Here, the volumetric calibration of a plenoptic camera is explored to correct for inaccuracies due to real-world lens distortions and thin-lens assumptions in current processing methods. Two methods of volumetric calibration based on a polynomial mapping function that does not require knowledge of specific lens parameters are presented and compared to a calibration based on thin-lens assumptions. The first method, volumetric dewarping, is executed by creation of a volumetric representation of a scene using the thin-lens assumptions, which is then corrected in post-processing using a polynomial mapping function. The second method, direct light-field calibration, uses the polynomial mapping in creation of the initial volumetric representation to relate locations in object space directly to image sensor locations. The accuracy and feasibility of these methods are examined experimentally by capturing images of a known dot card at a variety of depths. Results suggest that use of a 3D polynomial mapping function provides a significant increase in reconstruction accuracy and that the achievable accuracy is similar using either polynomial-mapping-based method. Additionally, direct light-field calibration provides significant computational benefits by eliminating some intermediate processing steps found in other methods. Finally, the flexibility of this method is shown for a nonplanar calibration.
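Both calibration methods rely on a polynomial mapping fitted without lens parameters. The sketch below shows the basic least-squares fit: dot-card points at known 3D positions (X, Y, Z) are mapped to measured sensor coordinates (u, v) with all monomials up to a chosen order. It assumes NumPy; the cubic order and the full monomial set are assumptions. A real plenoptic calibration would likely also involve the angular (microlens or sub-aperture) coordinate, which is omitted here.

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_terms(X, order=3):
    """All monomials of (x, y, z) up to `order`, including the constant (20 terms for order 3)."""
    cols = [np.ones(len(X))]
    for d in range(1, order + 1):
        for idx in combinations_with_replacement(range(3), d):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.stack(cols, axis=1)

def fit_mapping(object_pts, sensor_pts, order=3):
    """Least-squares coefficients mapping object-space points (n, 3) to sensor points (n, 2)."""
    A = poly_terms(object_pts, order)
    coeffs, *_ = np.linalg.lstsq(A, sensor_pts, rcond=None)
    return coeffs

def apply_mapping(coeffs, object_pts, order=3):
    """Predicted sensor coordinates for new object-space points."""
    return poly_terms(object_pts, order) @ coeffs
```

Fitting directly from object space to the sensor, rather than correcting a thin-lens reconstruction afterwards, mirrors the abstract's "direct" variant in spirit only. The paper's exact formulation is not reproduced here.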
Serial grouping of 2D-image regions with object-based attention in humans
Jeurissen, Danique; Self, Matthew W; Roelfsema, Pieter R
2016-01-01
After an initial stage of local analysis within the retina and early visual pathways, the human visual system creates a structured representation of the visual scene by co-selecting image elements that are part of behaviorally relevant objects. The mechanisms underlying this perceptual organization process are only partially understood. Here we investigate the time course of perceptual grouping of two-dimensional image regions by measuring the reaction times of human participants, and we report that it is associated with the gradual spread of object-based attention. Attention spreads fastest over large and homogeneous areas and is slowed down at locations that require small-scale processing. We find that the time course of the object-based selection process is well explained by a 'growth-cone' model, which selects surface elements in an incremental, scale-dependent manner. We discuss how the visual cortical hierarchy can implement this scale-dependent spread of object-based attention, leveraging the different receptive field sizes in distinct cortical areas. DOI: http://dx.doi.org/10.7554/eLife.14320.001 PMID:27291188
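The qualitative claim that attention spreads fastest over large, homogeneous areas can be illustrated with a toy spreading process. In this toy, the cost of entering a pixel is the inverse of its local scale, taken as its distance to the object boundary, and arrival times are computed with Dijkstra's algorithm. This is a hypothetical illustration assuming NumPy and the standard library. It is not the authors' growth-cone model, which the abstract does not specify.

```python
import heapq
from collections import deque
import numpy as np

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def boundary_distance(mask):
    """4-connected BFS distance from each object pixel to the nearest background pixel."""
    m = np.pad(mask, 1)                        # pad with False: the image border counts as background
    dist = np.where(m, np.inf, 0.0)
    queue = deque(zip(*np.nonzero(~m)))
    while queue:
        y, x = queue.popleft()
        for dy, dx in STEPS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < m.shape[0] and 0 <= nx < m.shape[1] and dist[ny, nx] > dist[y, x] + 1:
                dist[ny, nx] = dist[y, x] + 1
                queue.append((ny, nx))
    return dist[1:-1, 1:-1]

def spread_times(mask, seed):
    """Arrival times within the object when entering a pixel costs 1 / (local scale)."""
    scale = boundary_distance(mask)
    times = np.full(mask.shape, np.inf)
    times[seed] = 0.0
    heap = [(0.0, seed)]
    while heap:
        t, (y, x) = heapq.heappop(heap)
        if t > times[y, x]:
            continue                           # stale heap entry
        for dy, dx in STEPS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1] and mask[ny, nx]:
                nt = t + 1.0 / scale[ny, nx]
                if nt < times[ny, nx]:
                    times[ny, nx] = nt
                    heapq.heappush(heap, (nt, (ny, nx)))
    return times
```

In this toy, spreading 19 pixels along a one-pixel-wide line takes 19 time units. Spreading the same distance along the middle of a wide bar takes markedly less, because the wide bar offers a larger local scale.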
Plenoptic layer-based modeling for image based rendering.
Pearson, James; Brookes, Mike; Dragotti, Pier Luigi
2013-09-01
Image based rendering is an attractive alternative to model based rendering for generating novel views because of its lower complexity and potential for photo-realistic results. To reduce the number of images necessary for alias-free rendering, some geometric information for the 3D scene is normally necessary. In this paper, we present a fast automatic layer-based method for synthesizing an arbitrary new view of a scene from a set of existing views. Our algorithm takes advantage of the knowledge of the typical structure of multiview data to perform occlusion-aware layer extraction. In addition, the number of depth layers used to approximate the geometry of the scene is chosen based on plenoptic sampling theory with the layers placed non-uniformly to account for the scene distribution. The rendering is achieved using a probabilistic interpolation approach and by extracting the depth layer information on a small number of key images. Numerical results demonstrate that the algorithm is fast and yet is only 0.25 dB away from the ideal performance achieved with the ground-truth knowledge of the 3D geometry of the scene of interest. This indicates that there are measurable benefits from following the predictions of plenoptic theory and that they remain true when translated into a practical system for real world data.
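Placing depth layers non-uniformly often means spacing them uniformly in inverse depth (disparity) rather than in depth. This puts more layers close to the cameras, where a given depth error causes larger disparity errors. The sketch below shows that placement and the assignment of pixels to their nearest layer. It assumes NumPy; the layer count is taken as given. The paper additionally adapts the placement to the scene's depth distribution, which this sketch does not attempt.

```python
import numpy as np

def layer_depths(z_min, z_max, n_layers):
    """Depth planes spaced uniformly in inverse depth between z_min and z_max."""
    return 1.0 / np.linspace(1.0 / z_min, 1.0 / z_max, n_layers)

def assign_to_layers(depth_map, layers):
    """Index of the nearest layer, measured in inverse depth, for every pixel."""
    return np.abs(1.0 / depth_map[..., None] - 1.0 / layers).argmin(axis=-1)
```

For example, four layers between depths 1 and 10 fall at 1, about 1.43, 2.5 and 10. The gaps grow with distance, as intended.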
Graphic representation of STS-99 Endeavour during SRTM
2000-02-04
JSC2000E-01557 (January 2000) --- This partially computer-generated scene depicts anticipated coverage by the Shuttle Radar Topography Mission (SRTM) of topographic features on Earth. Heavy cloud cover, hurricanes and cyclonic storms can prevent optical cameras on satellites or aircraft from imaging some areas. SRTM radar, with its long wavelength, will penetrate clouds and provide its own illumination, making it independent of daylight.
Literacy shapes thought: the case of event representation in different cultures
Dobel, Christian; Enriquez-Geppert, Stefanie; Zwitserlood, Pienie; Bölte, Jens
2013-01-01
There has been a lively debate about whether conceptual representations of actions or scenes follow a left-to-right spatial transient when participants depict such events or scenes. It has even been suggested that conceptualizing the agent on the left side is a universal. We review the current literature with an emphasis on event representation and on cross-cultural studies. While there is considerable evidence for spatial biases in representations of events and scenes in diverse cultures, their extent and direction depend on task demands, one's native language, and, importantly, reading and writing direction. Whether transients arise only in subject-verb-object languages, due to the linear sentential position of event participants, is still an open issue. We investigated a group of illiterate speakers of Yucatec Maya, a language with a predominant verb-object-subject structure, and compared them with illiterate native speakers of Spanish. Neither group displayed a spatial transient. Given the current literature, we argue that learning to read and write has a strong impact on representations of actions and scenes. Thus, while it is still under debate whether language shapes thought, there is firm evidence that literacy does. PMID:24795665
A new framework for interactive quality assessment with application to light field coding
NASA Astrophysics Data System (ADS)
Viola, Irene; Ebrahimi, Touradj
2017-09-01
In recent years, light field imaging has experienced a surge of popularity, mainly due to recent advances in acquisition and rendering technologies that have made it more accessible to the public. Thanks to image-based rendering techniques, light field contents can be rendered in real time on common 2D screens, allowing interactive virtual navigation through the captured scenes. However, this richer representation of the scene poses the problem of reliable quality assessment for light field contents. In particular, while subjective methodologies that enable interaction have already been proposed, no work has been done on assessing how users interact with light field contents. In this paper, we propose a new framework to subjectively assess the quality of light field contents in an interactive manner while simultaneously tracking users' behaviour. The framework is successfully used to perform subjective assessment of two coding solutions. Moreover, statistical analysis of the results shows an interesting correlation between subjective scores and average interaction time.
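The analysis step at the end of the abstract has two parts. Individual ratings are aggregated into mean opinion scores (MOS) with confidence intervals, and MOS is then correlated with a behavioural measure such as mean interaction time per stimulus. The sketch below is a minimal version assuming NumPy. The normal-approximation interval and Pearson correlation are common choices but are not necessarily the statistics used in the paper, and the demo numbers in the test are made up.

```python
import numpy as np

def mos_with_ci(ratings):
    """ratings: (n_subjects, n_stimuli). Returns MOS and a 95% CI half-width per stimulus.

    Uses the normal approximation (1.96); a Student-t quantile suits small panels better.
    """
    n = ratings.shape[0]
    mos = ratings.mean(axis=0)
    ci = 1.96 * ratings.std(axis=0, ddof=1) / np.sqrt(n)
    return mos, ci

def pearson(a, b):
    """Pearson linear correlation, e.g. between MOS and mean interaction time."""
    return float(np.corrcoef(a, b)[0, 1])
```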
Reconstruction and simplification of urban scene models based on oblique images
NASA Astrophysics Data System (ADS)
Liu, J.; Guo, B.
2014-08-01
We describe multi-view stereo reconstruction and simplification algorithms for urban scene models based on oblique images. The complexity, diversity, and density of urban scenes make it difficult to build city models from oblique images, but urban scenes also contain many flat surfaces. One of our key contributions is a dense matching algorithm based on a Self-Adaptive Patch, designed for urban scenes. The basic idea of match propagation based on the Self-Adaptive Patch is to build patches centred on seed points that have already been matched. The extent and shape of the patches adapt automatically to the objects in the urban scene: where the surface is flat, the patch becomes larger; where the surface is very rough, the patch becomes smaller. The other contribution is that the mesh generated by Graph Cuts is a 2-manifold surface that satisfies the half-edge data structure. This is achieved by clustering and re-marking tetrahedrons in the s-t graph. The purpose of obtaining a 2-manifold surface is to simplify the mesh with an edge collapse algorithm that can preserve and highlight the features of buildings.
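A half-edge structure, which edge-collapse simplification relies on, can only be built when the mesh meets two conditions. Each undirected edge must be shared by at most two triangles, and each directed half-edge must occur at most once (consistent winding). The sketch below checks both conditions using only the standard library. It does not check vertex manifoldness (a single fan of triangles around each vertex), which a full 2-manifold test would also need.

```python
from collections import Counter

def edge_manifold_report(faces):
    """faces: iterable of (a, b, c) vertex-index triangles.

    Returns (edges shared by more than two faces, directed half-edges used more than once).
    """
    undirected = Counter()
    directed = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            undirected[frozenset((u, v))] += 1
            directed[(u, v)] += 1
    non_manifold = [tuple(sorted(e)) for e, n in undirected.items() if n > 2]
    inconsistent = [h for h, n in directed.items() if n > 1]
    return non_manifold, inconsistent
```

A consistently oriented tetrahedron passes both checks. Three triangles hinged on the same edge fail the first check.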
Improved disparity map analysis through the fusion of monocular image segmentations
NASA Technical Reports Server (NTRS)
Perlant, Frederic P.; Mckeown, David M.
1991-01-01
The focus is to examine how estimates of three-dimensional scene structure, as encoded in a scene disparity map, can be improved by analysis of the original monocular imagery. Surface illumination information is exploited by segmenting the monocular image into fine surface patches of nearly homogeneous intensity, which are used to remove mismatches generated during stereo matching. These patches guide a statistical analysis of the disparity map, based on the assumption that such patches correspond closely to physical surfaces in the scene. The technique is quite independent of whether the initial disparity map was generated by automated area-based or feature-based stereo matching. Stereo analysis results are presented on a complex urban scene containing various man-made and natural features. This scene poses a variety of problems, including low building height relative to the stereo baseline, buildings and roads in complex terrain, and highly textured buildings and terrain. Improvements due to monocular fusion are demonstrated with a set of different region-based image segmentations. The generality of this approach to stereo analysis and its utility in developing general three-dimensional scene interpretation systems are also discussed.
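As a rough illustration of segmentation-guided disparity cleanup, the sketch below treats each monocular segment as one surface and replaces disparities that deviate strongly from the segment's median. The abstract does not say which statistic is used; the median test, the `max_dev` threshold, and the function name `refine_disparity` are assumptions for illustration only.

```python
import numpy as np


def refine_disparity(disparity, labels, max_dev=2.0):
    """Replace disparities that deviate strongly from their segment's median.

    disparity: 2-D float array produced by any stereo matcher.
    labels:    2-D int array of the same shape from a monocular segmentation.
    max_dev:   hypothetical outlier threshold, in disparity units.
    """
    refined = disparity.astype(float).copy()
    for label in np.unique(labels):
        mask = labels == label
        # Assume each segment is roughly one physical surface, so its
        # median disparity is a robust estimate of that surface's depth.
        median = np.median(refined[mask])
        outliers = mask & (np.abs(refined - median) > max_dev)
        refined[outliers] = median
    return refined


disp = np.array([[5.0, 5.0, 9.0],
                 [5.0, 20.0, 9.0]])   # 20.0 plays the role of a stereo mismatch
labels = np.array([[0, 0, 1],
                   [0, 0, 1]])
print(refine_disparity(disp, labels))
```

A per-segment median only suits roughly fronto-parallel patches; fitting a plane per segment would be a natural variant for slanted surfaces.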
Artistic Representation with Pulsed Holography
NASA Astrophysics Data System (ADS)
Ishii, S.
2013-02-01
This thesis describes artistic representation through pulsed holography. One of the prevalent practical problems in making holograms is object movement. Any movement of the object or film, including movement caused by acoustic vibration, has the same fatal results. One way of reducing the chance of movement is to ensure that the exposure is very quick; using a pulsed laser fulfills this objective. The attractiveness of using a pulsed laser lies in the variety of materials or objects that can be recorded (e.g., liquids or the instantaneous scene of a moving object). One of the most interesting points about pulsed holograms is that some reconstructed images present us with completely different views of the real world. For example, the holographic image of a liquid does not appear fluid; it looks like a piece of hard glass that would produce a sharp sound upon tapping. In everyday life, we are unfamiliar with such instantaneous scenes. On the other hand, soft-textured materials such as feathers or wool differ from liquids when observed through holography. Using a pulsed hologram, we can sense the soft touch of the object or material with the help of realistic three-dimensional (3-D) images. The images allow us to realize the sense of touch in a way that resembles touching real objects. I had the opportunity to use a pulsed ruby laser soon after I started to work in the field of holography in 1979. Since then, I have made pulsed holograms of activities including pouring water, breaking eggs, blowing soap bubbles, and scattering feathers and popcorn. I have also created holographic art with materials and objects such as silk fiber, fabric, balloons, glass, flowers, and even the human body. Whenever I create art, I like to present the spectator with a new experience in perception. Therefore, I would like to introduce my experimental artwork through these pulsed holograms.
The cognitive structural approach for image restoration
NASA Astrophysics Data System (ADS)
Mardare, Igor; Perju, Veacheslav; Casasent, David
2008-03-01
This work analyzes the important and topical problem of restoring defective images of scenes. The proposed approach restores scenes with a system that reproduces the phenomena of human intelligence used for the restoration and recognition of images. Cognitive models of the restoration process are developed. The models are realized by intelligent processors built on neural networks and associative memory, using the neural network simulator NNToolbox from MATLAB 7.0. The models provide restoration and semantic design of scene images when the images of individual objects are defective.
Restoration and reconstruction from overlapping images
NASA Technical Reports Server (NTRS)
Reichenbach, Stephen E.; Kaiser, Daniel J.; Hanson, Andrew L.; Li, Jing
1997-01-01
This paper describes a technique for restoring and reconstructing a scene from overlapping images. In situations where there are multiple, overlapping images of the same scene, it may be desirable to create a single image that most closely approximates the scene, based on all of the data in the available images. For example, successive swaths acquired by NASA's planned Moderate Resolution Imaging Spectroradiometer (MODIS) will overlap, particularly at wide scan angles, creating a severe visual artifact in the output image. Resampling the overlapping swaths to produce a more accurate image on a uniform grid requires restoration and reconstruction. The one-pass restoration and reconstruction technique developed in this paper yields mean-square-optimal resampling, based on a comprehensive end-to-end system model that accounts for image overlap, and subject to user-defined and data-availability constraints on the spatial support of the filter.
A graph theoretic approach to scene matching
NASA Technical Reports Server (NTRS)
Ranganath, Heggere S.; Chipman, Laure J.
1991-01-01
The ability to match two scenes is a fundamental requirement in a variety of computer vision tasks. A graph theoretic approach to inexact scene matching is presented which is useful in dealing with problems due to imperfect image segmentation. A scene is described by a set of graphs, with nodes representing objects and arcs representing relationships between objects. Each node has a set of values representing the relations between pairs of objects, such as angle, adjacency, or distance. With this method of scene representation, the task in scene matching is to match two sets of graphs. Because of segmentation errors, variations in camera angle, illumination, and other conditions, an exact match between the sets of observed and stored graphs is usually not possible. In the developed approach, the problem is represented as an association graph, in which each node represents a possible mapping of an observed region to a stored object, and each arc represents the compatibility of two mappings. Nodes and arcs have weights indicating the merit of a region-object mapping and the degree of compatibility between two mappings. A match between the two graphs corresponds to a clique, or fully connected subgraph, in the association graph. The task is to find the clique that represents the best match. Fuzzy relaxation is used to update the node weights using the contextual information contained in the arcs and neighboring nodes. This simplifies the evaluation of cliques. A method of handling oversegmentation and undersegmentation problems is also presented. The approach is tested with a set of realistic images which exhibit many types of segmentation errors.
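A small sketch of the association-graph formulation: nodes are region-to-object mappings weighted by merit, arcs join mutually compatible mappings, and the best match is the highest-weight clique. The exhaustive clique search below is a stand-in that only scales to tiny graphs; the paper instead uses fuzzy relaxation to update node weights, and the `merit` and `compatible` functions here are hypothetical.

```python
from itertools import combinations


def build_association_graph(regions, objects, merit, compatible):
    """Nodes are (region, object) mappings weighted by merit; arcs join compatible pairs."""
    nodes = {(r, o): merit(r, o) for r in regions for o in objects}
    arcs = set()
    for a, b in combinations(nodes, 2):
        # One region cannot map to two objects, and two regions cannot share an object
        if a[0] != b[0] and a[1] != b[1] and compatible(a, b):
            arcs.add(frozenset((a, b)))
    return nodes, arcs


def best_clique(nodes, arcs):
    """Exhaustively search for the clique with the largest total node weight."""
    best, best_score = (), float("-inf")
    keys = list(nodes)
    for size in range(1, len(keys) + 1):
        for subset in combinations(keys, size):
            # A clique needs an arc between every pair of its nodes
            if all(frozenset(pair) in arcs for pair in combinations(subset, 2)):
                score = sum(nodes[n] for n in subset)
                if score > best_score:
                    best, best_score = subset, score
    return best, best_score


regions = ["r1", "r2"]
objects = ["house", "road"]
merits = {("r1", "house"): 0.9, ("r1", "road"): 0.2,
          ("r2", "house"): 0.3, ("r2", "road"): 0.8}
nodes, arcs = build_association_graph(regions, objects,
                                      merit=lambda r, o: merits[(r, o)],
                                      compatible=lambda a, b: True)
clique, score = best_clique(nodes, arcs)
print(clique, score)
```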
Enhancement of multispectral thermal infrared images - Decorrelation contrast stretching
NASA Technical Reports Server (NTRS)
Gillespie, Alan R.
1992-01-01
Decorrelation contrast stretching is an effective method for displaying information from multispectral thermal infrared (TIR) images. The technique involves transformation of the data to principal components ('decorrelation'), independent contrast 'stretching' of data from the new 'decorrelated' image bands, and retransformation of the stretched data back to the approximate original axes, based on the inverse of the principal component rotation. The enhancement is robust in that colors of the same scene components are similar in enhanced images of similar scenes, or of the same scene imaged at different times. Decorrelation contrast stretching is reviewed in the context of other enhancements applied to TIR images.
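The three steps described above (rotate to principal components, stretch each component independently, rotate back) can be sketched with NumPy as below. Stretching every component to a common standard deviation `target_std` is an assumption for illustration; practical implementations also rescale or clip the result to the display range.

```python
import numpy as np


def decorrelation_stretch(image, target_std=50.0):
    """Decorrelation stretch of a (rows, cols, bands) multispectral image.

    target_std is a hypothetical display parameter: every principal component
    is stretched to this standard deviation before rotating back.
    """
    rows, cols, bands = image.shape
    pixels = image.reshape(-1, bands).astype(float)
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    # Principal components ("decorrelation") from the band covariance matrix
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    # Independent stretch of each component to a common standard deviation
    scale = target_std / np.sqrt(np.maximum(eigvals, 1e-12))
    # Rotate into PC space, stretch, and apply the inverse rotation back to band axes
    transform = eigvecs @ np.diag(scale) @ eigvecs.T
    stretched = centered @ transform + mean
    return stretched.reshape(rows, cols, bands)


rng = np.random.default_rng(0)
base = rng.normal(size=(32, 32))
# Three synthetic "TIR bands"; the first two are strongly correlated
image = np.stack([base,
                  0.9 * base + 0.1 * rng.normal(size=(32, 32)),
                  rng.normal(size=(32, 32))], axis=-1)
out = decorrelation_stretch(image)
print(np.round(np.corrcoef(out.reshape(-1, 3), rowvar=False), 3))
```

Because the stretch is applied in principal-component space and then rotated back, the output bands stay close to the original band axes (so colors remain interpretable) while their mutual correlation is removed.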
Jouen, A L; Ellmore, T M; Madden, C J; Pallier, C; Dominey, P F; Ventre-Dominey, J
2015-02-01
This research tests the hypothesis that comprehension of human events will engage an extended semantic representation system, independent of the input modality (sentence vs. picture). To investigate this, we examined brain activation and connectivity in 19 subjects who read sentences and viewed pictures depicting everyday events, in a combined fMRI and DTI study. Conjunction of activity in understanding sentences and pictures revealed a common fronto-temporo-parietal network that included the middle and inferior frontal gyri, the parahippocampal-retrosplenial complex, the anterior and middle temporal gyri, and the inferior parietal lobe, in particular the temporo-parietal cortex. DTI tractography seeded from this temporo-parietal cortex hub revealed a multi-component network reaching into the temporal pole, the ventral frontal pole and premotor cortex. A significant correlation was found between the relative pathway density issued from the temporo-parietal cortex and the imageability of sentences for individual subjects, suggesting a potential functional link between comprehension and temporo-parietal connectivity strength. These data help to define a "meaning" network that includes components of recently characterized systems for semantic memory, embodied simulation, and visuo-spatial scene representation. The network substantially overlaps with the "default mode" network implicated as part of a core network of semantic representation, along with brain systems related to mental model formation and reasoning. These data are consistent with a model of real-world situational understanding that is highly embodied. Crucially, the neural basis of this embodied understanding is not limited to sensorimotor systems, but extends to the highest levels of cognition, including autobiographical memory, scene analysis, mental model formation, reasoning and theory of mind. Copyright © 2014 Elsevier Inc. All rights reserved.
Multidimensional brain activity dictated by winner-take-all mechanisms.
Tozzi, Arturo; Peters, James F
2018-06-21
A novel demon-based architecture is introduced to elucidate brain functions such as pattern recognition during human perception and mental interpretation of visual scenes. Starting from the topological concepts of invariance and persistence, we introduce a Selfridge pandemonium variant of brain activity that takes into account a novel feature, namely, demons that recognize short straight-line segments, curved lines and scene shapes, such as shape interior, density and texture. Low-level representations of objects can be mapped to higher-level views (our mental interpretations): a series of transformations can be gradually applied to a pattern in a visual scene without affecting its invariant properties. This makes it possible to construct a symbolic multi-dimensional representation of the environment. These representations can be projected continuously to an object that we have seen and continue to see, thanks to the mapping from shapes in our memory to shapes in Euclidean space. Although perceived shapes are 3-dimensional (plus time), the evaluation of shape features (volume, color, contour, closeness, texture, and so on) leads to n-dimensional brain landscapes. Here we discuss the advantages of our parallel, hierarchical model in pattern recognition, computer vision and the evolution of biological nervous systems. Copyright © 2018 Elsevier B.V. All rights reserved.
Fu, Kun; Jin, Junqi; Cui, Runpeng; Sha, Fei; Zhang, Changshui
2017-12-01
Recent progress on automatic generation of image captions has shown that it is possible to describe the most salient information conveyed by images with accurate and meaningful sentences. In this paper, we propose an image captioning system that exploits the parallel structures between images and sentences. In our model, the process of generating the next word, given the previously generated ones, is aligned with the visual perception experience in which attention shifts among visual regions; such transitions impose a thread of ordering on visual perception. This alignment characterizes the flow of latent meaning, which encodes what is semantically shared by both the visual scene and the text description. Our system also makes another novel modeling contribution by introducing scene-specific contexts that capture higher-level semantic information encoded in an image. The contexts adapt language models for word generation to specific scene types. We benchmark our system and compare it with published results on several popular datasets, using both automatic evaluation metrics and human evaluation. We show that either region-based attention or scene-specific contexts improve systems without those components. Furthermore, combining these two modeling ingredients attains state-of-the-art performance.
Invariant visual object recognition: a model, with lighting invariance.
Rolls, Edmund T; Stringer, Simon M
2006-01-01
How are invariant representations of objects formed in the visual cortex? We describe a neurophysiological and computational approach that focusses on a feature hierarchy model in which invariant representations can be built by self-organizing learning based on the statistics of the visual input. The model can use temporal continuity in an associative synaptic learning rule with a short-term memory trace, and/or it can use spatial continuity in Continuous Transformation learning. The model of visual processing in the ventral cortical stream can build representations of objects that are invariant with respect to translation, view, and size, and, as we show in this paper, lighting. The model has been extended to provide an account of invariant representations in the dorsal visual system of the global motion produced by objects, such as looming, rotation, and object-based movement. It has also been extended to incorporate top-down feedback connections that model the control of attention by biased competition in, for example, spatial and object search tasks. Finally, the model has been extended to account for how the visual system can select single objects in complex visual scenes, and how multiple objects can be represented in a scene.
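The "associative synaptic learning rule with a short-term memory trace" is commonly written as a trace rule, $\bar{y}^{\tau} = (1-\eta)\,y^{\tau} + \eta\,\bar{y}^{\tau-1}$ and $\delta w_j = \alpha\,\bar{y}^{\tau} x_j^{\tau}$. The sketch below applies that form to a single linear neuron with weight normalisation; the linear neuron, the absence of competition, and the parameter values are simplifying assumptions, not the full model described in the abstract.

```python
import math


def trace_rule_train(views, weights, alpha=0.1, eta=0.8):
    """Train one linear neuron on successive views of the same object.

    views:   list of input vectors (lists of floats), presented in temporal order.
    weights: initial synaptic weights.
    eta:     trace persistence; eta = 0 reduces to a plain Hebbian rule.
    """
    trace = 0.0
    weights = list(weights)
    for x in views:
        y = sum(w * xi for w, xi in zip(weights, x))  # postsynaptic firing (linear)
        trace = (1 - eta) * y + eta * trace           # short-term memory trace
        weights = [w + alpha * trace * xi for w, xi in zip(weights, x)]
    norm = math.sqrt(sum(w * w for w in weights)) or 1.0
    return [w / norm for w in weights]  # length normalisation keeps weights bounded


# Two "views" of one object activate different input lines in succession
views = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
print(trace_rule_train(views, [1.0, 0.0, 0.0, 0.0]))           # second view gains weight
print(trace_rule_train(views, [1.0, 0.0, 0.0, 0.0], eta=0.0))  # Hebbian: it does not
```

The trace carries activity from the first view into the second presentation, so the neuron strengthens connections from the second view even though it did not fire to it; this temporal association is what lets transform-invariant responses develop.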
Initial Scene Representations Facilitate Eye Movement Guidance in Visual Search
ERIC Educational Resources Information Center
Castelhano, Monica S.; Henderson, John M.
2007-01-01
What role does the initial glimpse of a scene play in subsequent eye movement guidance? In 4 experiments, a brief scene preview was followed by object search through the scene via a small moving window that was tied to fixation position. Experiment 1 demonstrated that the scene preview resulted in more efficient eye movements compared with a…
Multiscale deep features learning for land-use scene recognition
NASA Astrophysics Data System (ADS)
Yuan, Baohua; Li, Shijin; Li, Ning
2018-01-01
The features extracted from deep convolutional neural networks (CNNs) have shown promise as generic descriptors for land-use scene recognition. However, most existing work adopts the deep features directly for the classification of remote sensing images and does not encode them to improve their discriminative power, which can limit the performance of deep feature representations. To address this issue, we propose an effective framework, LASC-CNN, obtained by locality-constrained affine subspace coding (LASC) pooling of a CNN filter bank. LASC-CNN obtains more discriminative deep features than those extracted directly from CNNs. Furthermore, LASC-CNN builds on the top convolutional layers of CNNs, which can incorporate multiscale information and regions of arbitrary resolution and size. Our experiments were conducted on two widely used remote sensing image databases, and the results show that the proposed method significantly improves performance compared with other state-of-the-art methods.
The Nature of Change Detection and Online Representations of Scenes
ERIC Educational Resources Information Center
Ryan, Jennifer D.; Cohen, Neal J.
2004-01-01
This article provides evidence for implicit change detection and for the contribution of multiple memory sources to online representations. Multiple eye-movement measures distinguished original from changed scenes, even when college students had no conscious awareness of the change. Patients with amnesia showed a systematic deficit on 1 class of…
Research on polarization imaging information parsing method
NASA Astrophysics Data System (ADS)
Yuan, Hongwu; Zhou, Pucheng; Wang, Xiaolong
2016-11-01
Polarization information parsing plays an important role in polarization imaging detection. This paper focuses on polarization information parsing methods. First, the general process of polarization information parsing is given, mainly comprising polarization image preprocessing, calculation of multiple polarization parameters, polarization image fusion, and polarization image tracking. Then the research results for each stage are presented. For polarization image preprocessing, a polarization image registration method based on maximum mutual information is designed; experiments show that it improves registration precision and meets the needs of polarization information parsing. For the calculation of multiple polarization parameters, an omnidirectional polarization inversion model is built, from which a variety of polarization parameter images are obtained, and the inversion precision is clearly improved. For polarization image fusion, an adaptive optimal fusion method for multiple polarization parameters is given using fuzzy integrals and sparse representation, and target detection in complex scenes is completed with a clustering image segmentation algorithm based on fractal characteristics. For polarization image tracking, a fusion tracking algorithm combining average-displacement polarization image features with auxiliary particle filtering is put forward to achieve smooth tracking of moving targets. Finally, the polarization information parsing method is applied to polarization imaging detection of typical targets such as camouflaged targets, fog, and latent fingerprints.
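Registration by maximum mutual information can be sketched as an exhaustive search over integer translations, scoring each candidate with mutual information estimated from a joint histogram. The circular shift via `np.roll`, the search radius `max_shift`, and the histogram size `bins` are simplifying assumptions; the paper's actual transform model and optimizer are not given in the abstract.

```python
import numpy as np


def mutual_information(a, b, bins=16):
    """Mutual information (in nats) estimated from the joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def register_by_max_mi(fixed, moving, max_shift=3, bins=16):
    """Return the integer (dy, dx) shift of `moving` that maximizes MI with `fixed`."""
    best_shift, best_mi = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            mi = mutual_information(fixed, shifted, bins)
            if mi > best_mi:
                best_shift, best_mi = (dy, dx), mi
    return best_shift, best_mi


rng = np.random.default_rng(1)
fixed = rng.random((40, 40))
moving = np.roll(fixed, (-2, 1), axis=(0, 1))   # simulated misregistration
print(register_by_max_mi(fixed, moving))
```

Mutual information is a natural criterion for polarization images because different polarization channels can have different intensity mappings of the same scene, which would defeat a plain correlation or squared-difference score.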
Visual information processing; Proceedings of the Meeting, Orlando, FL, Apr. 20-22, 1992
NASA Technical Reports Server (NTRS)
Huck, Friedrich O. (Editor); Juday, Richard D. (Editor)
1992-01-01
Topics discussed in these proceedings include nonlinear processing and communications; feature extraction and recognition; image gathering, interpolation, and restoration; image coding; and wavelet transform. Papers are presented on noise reduction for signals from nonlinear systems; driving nonlinear systems with chaotic signals; edge detection and image segmentation of space scenes using fractal analyses; a vision system for telerobotic operation; a fidelity analysis of image gathering, interpolation, and restoration; restoration of images degraded by motion; and information, entropy, and fidelity in visual communication. Attention is also given to image coding methods and their assessment, hybrid JPEG/recursive block coding of images, modified wavelets that accommodate causality, modified wavelet transform for unbiased frequency representation, and continuous wavelet transform of one-dimensional signals by Fourier filtering.
NASA Astrophysics Data System (ADS)
Yang, L.; Shi, L.; Li, P.; Yang, J.; Zhao, L.; Zhao, B.
2018-04-01
Owing to forward scattering and blocking of the radar signal, water, bare soil, and shadow, termed low-backscattering objects (LBOs), often show low backscattering intensity in polarimetric synthetic aperture radar (PolSAR) images. Because LBOs exhibit similar backscattering intensities and polarimetric responses, spectral-based classifiers such as the Wishart method are ineffective for LBO classification. Although some polarimetric features have been exploited to reduce this confusion, backscattering features are still unstable when the system noise floor varies in the range direction. This paper introduces a simple but effective scene classification method based on the Bag of Words (BoW) model and a Support Vector Machine (SVM) to discriminate LBOs without relying on any polarimetric features. In the proposed approach, square windows are first opened adaptively around the LBOs to define the scene images, and Scale-Invariant Feature Transform (SIFT) points are then detected in the training and test scenes. The detected SIFT features are clustered using K-means, the cluster centers serve as the visual word list, and scene images are represented by word frequencies. Finally, an SVM is trained and used to classify new scenes into LBO types. The proposed method is evaluated on two AIRSAR data sets at C band and L band, including water, bare soil, and shadow scenes. The experimental results illustrate the effectiveness of the scene classification method in distinguishing LBOs.
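The vocabulary-building and word-frequency steps described above can be sketched as follows. SIFT extraction and the SVM are not shown; the 2-D points stand in for descriptors, and the plain Lloyd's k-means with random initialisation is an illustrative assumption rather than the paper's exact configuration.

```python
import numpy as np


def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the cluster centers become the visual words."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Distance from every descriptor to every center, shape (n, k)
        dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = descriptors[assign == j]
            if len(members):              # leave a center in place if it lost all members
                centers[j] = members.mean(axis=0)
    return centers


def bow_histogram(descriptors, vocabulary):
    """Represent a scene by the normalized frequency of its nearest visual words."""
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    counts = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return counts / counts.sum()


train = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
vocabulary = kmeans(train, k=2)
scene = np.array([[0.2, 0.1], [0.5, 0.5], [0.1, 0.9], [10.5, 10.2]])
print(bow_histogram(scene, vocabulary))
```

The resulting fixed-length histograms are what a classifier such as an SVM would be trained on, regardless of how many keypoints each scene window contains.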
Real-time generation of infrared ocean scene based on GPU
NASA Astrophysics Data System (ADS)
Jiang, Zhaoyi; Wang, Xun; Lin, Yun; Jin, Jianqiu
2007-12-01
Infrared (IR) image synthesis of ocean scenes has become increasingly important, especially for remote sensing and military applications. Although a number of works present ready-to-use simulations, those techniques cover only a few of the possible ways in which water interacts with the environment, and detailed calculation of ocean temperature has rarely been considered by previous investigators. With the advance of programmable graphics cards, many algorithms previously limited to offline processing have become feasible in real time. In this paper, we propose an efficient algorithm for real-time rendering of infrared ocean scenes using the newest features of programmable graphics processing units (GPUs). It differs from previous work in three aspects: adaptive GPU-based ocean surface tessellation, a sophisticated thermal balance equation for the ocean surface, and GPU-based rendering of the infrared ocean scene. Finally, some infrared image results are shown, which agree well with real images.
Forced to remember: when memory is biased by salient information.
Santangelo, Valerio
2015-04-15
Recent decades have seen rapidly growing efforts to understand the key factors involved in the internal memory representation of the external world. Visual salience has been found to make a major contribution to predicting the probability that an item or object embedded in a complex setting (i.e., a natural scene) is encoded and later remembered. Here I review the existing literature, highlighting the impact of perceptual salience (based on low-level sensory features) and semantics-related salience (based on high-level knowledge) on short-term memory representation, along with the neural mechanisms underpinning the interplay between these factors. The available evidence reveals that both perceptual and semantics-related factors affect attention selection mechanisms during the encoding of natural scenes. By biasing internal memory representation, both perceptual and semantic factors increase the probability of remembering high-saliency items to the detriment of low-saliency ones. The available evidence also highlights an interplay between these factors, with perceptual salience having a reduced impact on memory representation as the availability of semantics-related salient information increases. The neural mechanisms underpinning this interplay involve the activation of different portions of the frontoparietal attention control network. Ventral regions support the assignment of selection/encoding priorities based on high-level semantics, while the involvement of dorsal regions reflects priority assignment based on low-level sensory features. Copyright © 2015 Elsevier B.V. All rights reserved.
Modeling global scene factors in attention
NASA Astrophysics Data System (ADS)
Torralba, Antonio
2003-07-01
Models of visual attention have focused predominantly on bottom-up approaches that ignored structured contextual and scene information. I propose a model of contextual cueing for attention guidance based on the global scene configuration. It is shown that the statistics of low-level features across the whole image can be used to prime the presence or absence of objects in the scene and to predict their location, scale, and appearance before exploring the image. In this scheme, visual context information can become available early in the visual processing chain, which allows modulation of the saliency of image regions and provides an efficient shortcut for object detection and recognition. © 2003 Optical Society of America
Poyneer, Lisa A; Bauman, Brian J
2015-03-31
Reference-free compensated imaging estimates the Fourier phase of a series of images of a target. The Fourier magnitude of the series of images is obtained by dividing the power spectral density of the series of images by an estimate of the power spectral density of atmospheric turbulence from a series of scene-based wavefront sensor (SBWFS) measurements of the target. A high-resolution image of the target is recovered from the Fourier phase and the Fourier magnitude.
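The final recombination step can be sketched as below: average the image power spectral density over the series, divide by the turbulence PSD estimate, and combine the resulting magnitude with a given phase estimate before an inverse FFT. Taking the square root (a PSD ratio is a squared magnitude), the `eps` guard, and treating the phase estimate as an input are assumptions; how the phase and the SBWFS-based turbulence PSD are estimated is not shown.

```python
import numpy as np


def recover_image(frames, turbulence_psd, fourier_phase, eps=1e-12):
    """Combine an estimated Fourier phase with a PSD-derived Fourier magnitude.

    frames:         (n, rows, cols) stack of short-exposure images of the target.
    turbulence_psd: (rows, cols) estimate of the turbulence power spectral density.
    fourier_phase:  (rows, cols) estimated object Fourier phase, in radians.
    """
    spectra = np.fft.fft2(frames, axes=(-2, -1))
    image_psd = np.mean(np.abs(spectra) ** 2, axis=0)   # average over the series
    # A ratio of power spectra is a squared magnitude, so take the square root
    magnitude = np.sqrt(image_psd / np.maximum(turbulence_psd, eps))
    return np.real(np.fft.ifft2(magnitude * np.exp(1j * fourier_phase)))


# Sanity check: with no turbulence (flat PSD of ones) and the true phase,
# the recombination should return the original object.
rng = np.random.default_rng(2)
obj = rng.random((16, 16))
frames = np.stack([obj, obj])
restored = recover_image(frames, np.ones((16, 16)), np.angle(np.fft.fft2(obj)))
print(np.max(np.abs(restored - obj)))
```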
Comparing object recognition from binary and bipolar edge images for visual prostheses
Jung, Jae-Hyun; Pu, Tian; Peli, Eli
2017-01-01
Visual prostheses require an effective representation method because of their limited display conditions, which offer only 2 or 3 grayscale levels at low resolution. Edges derived from abrupt luminance changes in images carry essential information for object recognition. Typical binary (black and white) edge images have been used to represent features that convey this information. However, in scenes with a complex, cluttered background, human observers' recognition rate for binary edge images is limited and additional information is required. The polarity of edges and cusps (black or white features on a gray background) carries important additional information: it may provide the shape-from-shading information missing from the binary edge image. This depth information may be restored by using bipolar edges. We compared object recognition rates for 16 binary edge images and their bipolar counterparts, viewed by 26 subjects, to determine the possible impact of bipolar filtering in visual prostheses with 3 or more grayscale levels. Recognition rates were higher with bipolar edge images, and the improvement was significant in scenes with complex backgrounds. The results also suggest that erroneous shape-from-shading interpretation of bipolar edges arising from pigment, rather than from shape boundaries, may confound recognition. PMID:28458481
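To make the binary/bipolar distinction concrete, the sketch below derives both representations from the same edge filter. The 4-neighbour Laplacian and the `threshold` value are assumptions (the abstract does not name the filter); the bipolar output uses three grey levels, matching a display with 3 levels.

```python
import numpy as np


def laplacian(image):
    """4-neighbour Laplacian with edge padding: positive where a pixel is darker than its neighbours."""
    img = image.astype(float)
    p = np.pad(img, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * img


def binary_edges(image, threshold=10.0):
    """Black edges on a white background: edge polarity is discarded."""
    out = np.full(image.shape, 255, dtype=np.uint8)
    out[np.abs(laplacian(image)) > threshold] = 0
    return out


def bipolar_edges(image, threshold=10.0):
    """Three grey levels: black on the dark side of an edge, white on the bright side, grey elsewhere."""
    lap = laplacian(image)
    out = np.full(image.shape, 128, dtype=np.uint8)
    out[lap > threshold] = 0      # pixel darker than its surroundings
    out[lap < -threshold] = 255   # pixel brighter than its surroundings
    return out


step = np.zeros((3, 6))
step[:, 3:] = 100.0               # dark left half, bright right half
print(binary_edges(step)[1])
print(bipolar_edges(step)[1])
```

For a dark-to-bright step, the binary image marks both sides of the edge identically, while the bipolar image keeps the black/white pairing that indicates which side is darker, the polarity cue the abstract links to shape from shading.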
Utilization of DIRSIG in support of real-time infrared scene generation
NASA Astrophysics Data System (ADS)
Sanders, Jeffrey S.; Brown, Scott D.
2000-07-01
Real-time infrared scene generation for hardware-in-the-loop testing has traditionally been a difficult challenge. Infrared scenes are usually generated using commercial hardware that was not designed to properly handle the thermal and environmental physics involved. Real-time infrared scenes typically lack details that are included in scenes rendered in non-real-time by ray-tracing programs such as the Digital Imaging and Remote Sensing Scene Generation (DIRSIG) program. However, executing DIRSIG in real time while retaining all the physics is beyond current computational capabilities for many applications. DIRSIG is a first-principles-based synthetic image generation model that produces multispectral or hyperspectral images in the 0.3 to 20 micron region of the electromagnetic spectrum. The DIRSIG model is an integrated collection of independent first-principles-based sub-models, which work in conjunction to produce radiance field images with high radiometric fidelity. DIRSIG uses the MODTRAN radiation propagation model for exo-atmospheric irradiance, emitted and scattered radiances (upwelled and downwelled), and path transmission predictions. This radiometry sub-model uses bidirectional reflectance data, accounts for specular and diffuse background contributions, and models path-length-dependent extinction and emission for transmissive bodies (plumes, clouds, etc.) that may be present in any target, background, or solar path. This detailed environmental modeling greatly increases the number of rendered features and hence the fidelity of a rendered scene. While DIRSIG itself cannot currently be executed in real time, its outputs can be used to provide scene inputs for real-time scene generators. These inputs can incorporate significant features such as target-to-background thermal interactions, static background object thermal shadowing, and partially transmissive countermeasures. All of these features represent significant improvements over the current state of the art in real-time IR scene generation.
The lucky image-motion prediction for simple scene observation based soft-sensor technology
NASA Astrophysics Data System (ADS)
Li, Yan; Su, Yun; Hu, Bin
2015-08-01
High resolution is important for Earth remote sensors, but vibration of the sensor platforms is a major factor restricting high-resolution imaging. Image-motion prediction and real-time compensation are key technologies for solving this problem. Because the traditional autocorrelation image algorithm cannot meet the demands of simple-scene image stabilization, this paper proposes using soft-sensor technology for image-motion prediction and focuses on optimizing the image-motion prediction algorithm. Simulation results indicate that the improved lucky image-motion stabilization algorithm, which combines a Back Propagation neural network (BP NN) and a support vector machine (SVM), is the most suitable for simple-scene image stabilization. The relative error of image-motion prediction based on soft-sensor technology is below 5%, and the training speed of the prediction model keeps pace with real-time image stabilization in aerial photography.
NASA Astrophysics Data System (ADS)
Lo, Mei-Chun; Hsieh, Tsung-Hsien; Perng, Ruey-Kuen; Chen, Jiong-Qiao
2010-01-01
The aim of this research is to derive illuminant-independent HDR imaging modules that can optimally reconstruct, multispectrally, every color of interest across the high dynamic range of the original images, for preferable cross-media color reproduction. Each module, based on either a broadband or a multispectral approach, incorporates models of perceptual HDR tone mapping and device characterization. In this study, an HDR digital camera using the xvYCC format was used to capture HDR scene images for testing. A tone-mapping module was derived based on a multiscale representation of the human visual system, using equations similar to the Michaelis-Menten photoreceptor adaptation equation. Additionally, an adaptive bilateral gamut mapping algorithm based on multiple converging points (previously derived) was incorporated, with or without adaptive unsharp masking (USM), to optimize HDR image rendering. An LCD with the Adobe RGB (D65) standard color space was used as a soft-proofing platform to display the original HDR RGB images and to evaluate both the rendition quality and the prediction performance of the derived modules. Another LCD with the sRGB standard color space was used to test the gamut-mapping algorithms to be integrated with the derived tone-mapping module.
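A photoreceptor-style tone curve of the Michaelis-Menten form, $R = L^n / (L^n + \sigma^n)$, can be sketched as below. This single-scale global version is only illustrative; the exponent `n` and the choice of the log-average luminance as the semi-saturation constant `sigma` are assumptions, not the multiscale module described in the abstract.

```python
import math


def photoreceptor_tone_map(luminances, n=1.0, sigma=None):
    """Compress HDR luminances with R = L^n / (L^n + sigma^n), giving values in [0, 1).

    n:     hypothetical contrast exponent (n = 1 is the plain Michaelis-Menten form).
    sigma: semi-saturation constant; defaults to the log-average luminance (assumption).
    """
    if sigma is None:
        logs = [math.log(max(lum, 1e-6)) for lum in luminances]
        sigma = math.exp(sum(logs) / len(logs))
    return [lum ** n / (lum ** n + sigma ** n) for lum in luminances]


# Four decades of luminance squeezed into the unit interval
print(photoreceptor_tone_map([1.0, 100.0, 10000.0]))
```

Luminance equal to `sigma` maps to 0.5, so choosing `sigma` from the scene's own statistics centres the compressed range on the scene's typical brightness.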
Spontaneous Action Representation in Smokers when Watching Movie Characters Smoke
Wagner, Dylan D.; Cin, Sonya Dal; Sargent, James D.; Kelley, William M.; Heatherton, Todd F.
2013-01-01
Do smokers simulate smoking when they see someone else smoke? For regular smokers, smoking is such a highly practiced motor skill that it often occurs automatically, without conscious awareness. Research on the brain basis of action observation has delineated a frontoparietal network that is commonly recruited when people observe, plan or imitate actions. Here, we investigated whether this action observation network would be preferentially recruited in smokers when viewing complex smoking cues, such as those occurring in motion pictures. Seventeen right-handed smokers and seventeen non-smokers watched a popular movie while undergoing functional magnetic resonance imaging. Using a natural stimulus, such as a movie, allowed us to keep both smoking and non-smoking participants naïve to the goals of the experiment. Brain activity evoked by scenes of movie smoking was contrasted with non-smoking control scenes that were matched for frequency and duration. Compared to non-smokers, smokers showed greater activity in left anterior intraparietal sulcus and inferior frontal gyrus, both regions involved in the simulation of contralateral hand-based gestures, when viewing smoking vs. control scenes. These results demonstrate that smokers spontaneously represent the action of smoking when viewing others smoke, the consequence of which may make it more difficult to abstain from smoking. PMID:21248113
Groen, Iris I A; Silson, Edward H; Baker, Chris I
2017-02-19
Visual scene analysis in humans has been characterized by the presence of regions in extrastriate cortex that are selectively responsive to scenes compared with objects or faces. While these regions have often been interpreted as representing high-level properties of scenes (e.g. category), they also exhibit substantial sensitivity to low-level (e.g. spatial frequency) and mid-level (e.g. spatial layout) properties, and it is unclear how these disparate findings can be united in a single framework. In this opinion piece, we suggest that this problem can be resolved by questioning the utility of the classical low- to high-level framework of visual perception for scene processing, and discuss why low- and mid-level properties may be particularly diagnostic for the behavioural goals specific to scene perception as compared to object recognition. In particular, we highlight the contributions of low-level vision to scene representation by reviewing (i) retinotopic biases and receptive field properties of scene-selective regions and (ii) the temporal dynamics of scene perception that demonstrate overlap of low- and mid-level feature representations with those of scene category. We discuss the relevance of these findings for scene perception and suggest a more expansive framework for visual scene analysis. This article is part of the themed issue 'Auditory and visual scene analysis'. © 2017 The Author(s). PMID:28044013
NASA Astrophysics Data System (ADS)
Anwer, Rao Muhammad; Khan, Fahad Shahbaz; van de Weijer, Joost; Molinier, Matthieu; Laaksonen, Jorma
2018-04-01
Designing powerful, discriminative texture features that are robust to realistic imaging conditions is a challenging computer vision problem with many applications, including material recognition and the analysis of satellite or aerial imagery. In the past, most texture description approaches were based on dense, orderless statistical distributions of local features. However, most recent approaches to texture recognition and remote sensing scene classification are based on Convolutional Neural Networks (CNNs). The de facto practice when learning these CNN models is to use RGB patches as input, with training performed on large amounts of labeled data (ImageNet). In this paper, we show that Local Binary Patterns (LBP) encoded CNN models, codenamed TEX-Nets, trained using mapped coded images with explicit LBP-based texture information, provide complementary information to the standard RGB deep models. Additionally, two deep architectures, namely early and late fusion, are investigated to combine the texture and color information. To the best of our knowledge, we are the first to investigate Binary Patterns encoded CNNs and different deep network fusion architectures for texture recognition and remote sensing scene classification. We perform comprehensive experiments on four texture recognition datasets and four remote sensing scene classification benchmarks: UC-Merced with 21 scene categories, WHU-RS19 with 19 scene classes, RSSCN7 with 7 categories, and the recently introduced large-scale Aerial Image Dataset (AID) with 30 aerial scene types. We demonstrate that TEX-Nets provide complementary information to a standard RGB deep model of the same network architecture. Our late fusion TEX-Net architecture always improves the overall performance compared to the standard RGB network on both recognition problems. Furthermore, our final combination leads to consistent improvement over the state of the art for remote sensing scene classification.
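TEX-Nets take LBP-coded images as input. The sketch below computes the basic 8-neighbour, 3x3 LBP code for each interior pixel; the bit ordering (clockwise from the top-left neighbour) and the >= comparison are common conventions assumed here, and the mapping of codes into the input representation mentioned in the abstract is not reproduced.

```python
import numpy as np


def lbp_3x3(gray):
    """Basic 8-neighbour Local Binary Pattern code for each interior pixel.

    Each neighbour that is >= the centre pixel sets one bit; bits are
    assigned clockwise starting at the top-left neighbour (bit 0).
    """
    g = np.asarray(gray, dtype=float)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # (row offset, col offset) of the 8 neighbours, clockwise from top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dr, dc) in enumerate(offsets):
        neighbour = g[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes |= (neighbour >= center).astype(np.uint8) << bit
    return codes
```

Each output value lies in 0..255, so the code map can be viewed as a single-channel image; a flat region yields 255 everywhere because every neighbour ties with its centre.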
Virtual viewpoint generation for three-dimensional display based on the compressive light field
NASA Astrophysics Data System (ADS)
Meng, Qiao; Sang, Xinzhu; Chen, Duo; Guo, Nan; Yan, Binbin; Yu, Chongxiu; Dou, Wenhua; Xiao, Liquan
2016-10-01
Virtual viewpoint generation is one of the key technologies for three-dimensional (3D) display: it renders new perspective images of a scene from existing viewpoints. The 3D scene information can thus be recovered effectively at different viewing angles, allowing users to switch between views. However, when N free viewpoints are received, every pair of viewpoints must be matched, i.e., C(N, 2) = N(N-1)/2 matchings, and errors can arise from differing baselines during matching. To reduce the high complexity of the traditional virtual viewpoint generation process, this paper presents a novel, fast virtual viewpoint generation algorithm that uses actual light field information rather than geometric information. To keep the data physically meaningful, we mainly use nonnegative tensor factorization (NTF). A tensor representation is introduced for virtual multilayer displays. The light field emitted by an N-layer, M-frame display is represented by a sparse set of non-zero elements restricted to a plane within an Nth-order, rank-M tensor. The tensor representation allows for optimal decomposition of a light field into time-multiplexed, light-attenuating layers using NTF. Finally, the compressive light field information of the multilayer display is synthesized, and the virtual viewpoint is obtained by multiplying the layers. Experimental results show that the approach not only restores the original light field with high image quality (PSNR of 25.6 dB) but also overcomes the deficiencies of traditional matching, so that any viewpoint can be obtained from N free viewpoints.
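For the simplest case of N = 2 layers, the factorization reduces to a matrix problem: the ray through front-layer pixel a and rear-layer pixel b is approximated as L(a, b) ≈ Σ_m f_m(a) g_m(b) over M time-multiplexed frames, i.e. a rank-M nonnegative matrix factorization. The sketch below uses the standard Lee-Seung multiplicative updates; restricting to two layers, the Frobenius loss, and omitting the [0, 1] transmittance limit and the time-averaging factor are assumptions, so this illustrates the idea rather than the paper's NTF solver.

```python
import numpy as np


def two_layer_nmf(light_field, frames, iters=500, seed=0):
    """Factor a nonnegative 2-layer light field L (A x B) as F @ G.

    Column m of F (A x frames) is the front-layer pattern shown in frame m;
    row m of G (frames x B) is the matching rear-layer pattern. Lee-Seung
    multiplicative updates keep both factors nonnegative and do not
    increase ||L - F @ G|| (Frobenius norm).
    """
    L = np.asarray(light_field, dtype=float)
    rng = np.random.default_rng(seed)
    a, b = L.shape
    F = rng.random((a, frames)) + 0.1
    G = rng.random((frames, b)) + 0.1
    eps = 1e-12  # avoids division by zero
    for _ in range(iters):
        G *= (F.T @ L) / (F.T @ F @ G + eps)
        F *= (L @ G.T) / (F @ G @ G.T + eps)
    return F, G


rng = np.random.default_rng(1)
target = rng.random((6, 2)) @ rng.random((2, 5))  # toy rank-2 light field
front, rear = two_layer_nmf(target, frames=2)
rel_err = np.linalg.norm(target - front @ rear) / np.linalg.norm(target)
```

Nonnegativity is what makes the factors physically displayable as attenuation patterns; an unconstrained factorization such as the SVD would generally produce negative "transmittances".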
NASA Astrophysics Data System (ADS)
Yang, Xue; Sun, Hao; Fu, Kun; Yang, Jirui; Sun, Xian; Yan, Menglong; Guo, Zhi
2018-01-01
Ship detection has long played a significant role in remote sensing, but it is still full of challenges. The main limitations of traditional ship detection methods usually lie in the complexity of application scenarios, the difficulty of detecting densely arranged objects, and the redundancy of detection regions. To solve these problems, we propose a framework called Rotation Dense Feature Pyramid Networks (R-DFPN) that can effectively detect ships in different scenes, including ocean and port. Specifically, we put forward the Dense Feature Pyramid Network (DFPN), which is aimed at the problem resulting from the narrow width of ships. Compared with previous multi-scale detectors such as the Feature Pyramid Network (FPN), DFPN builds high-level semantic feature maps for all scales by means of dense connections, which enhances feature propagation and encourages feature reuse. Additionally, to handle ship rotation and dense arrangement, we design a rotation anchor strategy to predict the minimum circumscribed rectangle of each object, so as to reduce redundant detection regions and improve recall. Furthermore, we propose multi-scale ROI Align to maintain the completeness of semantic and spatial information. Experiments on remote sensing images from Google Earth show that our R-DFPN-based detection method achieves state-of-the-art performance for ship detection.
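The rotation anchor strategy can be sketched as enumerating, at each feature-map location, anchors (cx, cy, w, h, θ) over combinations of scale, aspect ratio, and angle. The elongated ratios and the six angles below are hypothetical settings chosen for illustration, not values taken from the abstract; a small helper returns a rotated box's corners so the orientation convention is explicit.

```python
import itertools
import math


def rotated_anchors(cx, cy, base_size, scales, ratios, angles_deg):
    """Enumerate rotated anchors (cx, cy, w, h, theta_deg) centred at (cx, cy).

    ratio = w / h; every anchor at a given scale keeps the same area,
    (base_size * scale) ** 2, whatever its ratio or angle.
    """
    anchors = []
    for scale, ratio, angle in itertools.product(scales, ratios, angles_deg):
        area = (base_size * scale) ** 2
        w = math.sqrt(area * ratio)
        h = area / w
        anchors.append((cx, cy, w, h, angle))
    return anchors


def box_corners(cx, cy, w, h, theta_deg):
    """Corners of a w x h rectangle rotated by theta with the standard
    rotation matrix (counter-clockwise when the y axis points up)."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    half = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]


# Hypothetical settings: elongated ratios and six orientations for ships.
anchors = rotated_anchors(16, 16, base_size=32, scales=[1.0],
                          ratios=[1/3, 3, 1/5, 5, 1/7, 7],
                          angles_deg=[-15, -30, -45, -60, -75, -90])
print(len(anchors))  # 36
```

Keeping the area fixed per scale means ratio and angle only change the anchor's shape and orientation, so a narrow, tilted ship can be matched by an anchor of similar shape instead of a large axis-aligned box full of background.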
Paleolithic art and cognition.
Halverson, J
1992-05-01
In this article, I have explored some of the possible relationships between the first appearance of representational art in human history and the early development of human cognition. I argue that most Upper Paleolithic depictions directly represent generalized mental images of their animal subjects rather than percepts or recollected scenes from life and that these images, in turn, are representations of concepts at the basic level of categorization. A common feature of Paleolithic art forms is the salience of parts, and the treatment of parts indicates analytic and synthetic (recombinative) abilities. There are some indications of superordinate categorization. An expansion of conceptual thinking seems to be implied as well as the beginnings of operational thought. The presence and practice of depiction may have had the effect of bringing concepts into consciousness and thus inducing reflection in at least a partially abstract mode.
Fixed Pattern Noise pixel-wise linear correction for crime scene imaging CMOS sensor
NASA Astrophysics Data System (ADS)
Yang, Jie; Messinger, David W.; Dube, Roger R.; Ientilucci, Emmett J.
2017-05-01
Filtered multispectral imaging may be a useful method for crime scene documentation and evidence detection, owing to its abundant spectral information and its non-contact, non-destructive nature. A low-cost, portable multispectral crime scene imaging device would be highly useful and efficient. The second-generation crime scene imaging system uses a CMOS imaging sensor to capture the spatial scene and bandpass Interference Filters (IFs) to capture spectral information. Unfortunately, CMOS sensors suffer from severe spatial non-uniformity compared to CCD sensors, and the major cause is Fixed Pattern Noise (FPN). IFs suffer from a "blue shift" effect and introduce spatially and spectrally correlated errors. Therefore, FPN correction is critical to enhancing crime scene image quality and also helps de-correlate spatial-spectral noise. In this paper, a pixel-wise linear model converting radiance to Digital Count (DC) is constructed for the crime scene imaging CMOS sensor. The pixel-wise conversion gain G(i,j) and Dark Signal Non-Uniformity (DSNU) Z(i,j) are calculated. The conversion gain is further divided into four components: an FPN row component, an FPN column component, a defects component, and an effective photo-response signal component. The conversion gain is then corrected by averaging out the FPN column, FPN row, and defects components so that the sensor conversion gain is uniform. Using the corrected conversion gain and the incident image radiance estimated by inverting the pixel-wise linear radiance-to-DC model, the spatial uniformity of the corrected image is enhanced to 7 times that of the raw image; the larger the image DC value within its dynamic range, the greater the enhancement.
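The pixel-wise linear model DC(i,j) = G(i,j) * L + Z(i,j) can be calibrated and inverted as sketched below. Calibrating from two uniform flat-field frames at known radiances, and re-rendering with the array-mean gain and offset, are simplifying assumptions; the paper's separate handling of the defects component is not reproduced.

```python
import numpy as np


def calibrate_gain_offset(dc_low, dc_high, radiance_low, radiance_high):
    """Fit DC = G * L + Z per pixel from two uniform flat-field frames."""
    gain = (dc_high - dc_low) / (radiance_high - radiance_low)
    offset = dc_low - gain * radiance_low
    return gain, offset


def gain_row_col_components(gain):
    """Row and column FPN components: deviations of row/column means
    of the gain map from its global mean."""
    g_mean = gain.mean()
    rows = gain.mean(axis=1, keepdims=True) - g_mean
    cols = gain.mean(axis=0, keepdims=True) - g_mean
    return rows, cols


def correct_frame(dc, gain, offset):
    """Invert the per-pixel model to estimate radiance, then re-render the
    frame with one uniform gain and offset (the array means), removing FPN."""
    radiance = (dc - offset) / gain
    return gain.mean() * radiance + offset.mean()
```

With noise-free data, a uniformly lit frame comes out perfectly flat after correction; with real data, residual non-uniformity is limited by temporal noise and by how well the linear model holds across the dynamic range.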
Pattern-histogram-based temporal change detection using personal chest radiographs
NASA Astrophysics Data System (ADS)
Ugurlu, Yucel; Obi, Takashi; Hasegawa, Akira; Yamaguchi, Masahiro; Ohyama, Nagaaki
1999-05-01
Accurate and reliable detection of temporal changes from a pair of images is of considerable interest in medical science. Traditional registration and subtraction techniques can be applied to extract temporal differences when the object is rigid or corresponding points are obvious. However, in radiological imaging, the loss of depth information, the elasticity of the object, the absence of clearly defined landmarks, and three-dimensional positioning differences constrain the performance of conventional registration techniques. In this paper, we propose a new method to detect interval changes accurately without using an image registration technique. The method is based on the construction of a so-called pattern histogram and a comparison procedure. The pattern histogram is a graphic representation of the frequency counts of all allowable patterns in the multi-dimensional pattern vector space. The K-means algorithm is employed to partition the pattern vector space successively. Any differences in the pattern histograms imply that different patterns are involved in the scenes. In our experiment, a pair of chest radiographs of pneumoconiosis is employed, and the changing histogram bins are visualized on both images. We found that the method can be used as an alternative approach to temporal change detection, particularly when precise image registration is not available.
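The pattern-histogram idea can be sketched as follows: every 3x3 neighbourhood becomes a pattern vector, k-means partitions the pooled pattern vectors of both images, and per-cluster counts form each image's histogram. The patch size, k, and the use of a single plain k-means run (rather than the successive partitioning the abstract mentions) are assumptions for illustration.

```python
import numpy as np


def patch_vectors(img, size=3):
    """All size x size patches of a 2-D image, flattened to rows."""
    h, w = img.shape
    return np.array([img[r:r + size, c:c + size].ravel()
                     for r in range(h - size + 1)
                     for c in range(w - size + 1)], dtype=float)


def _assign(X, centres):
    """Index of the nearest centre (squared Euclidean) for each row of X."""
    return np.argmin(((X[:, None, :] - centres[None]) ** 2).sum(-1), axis=1)


def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns the cluster centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = _assign(X, centres)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return centres


def pattern_histogram(img, centres, size=3):
    """Count how many patches of img fall into each pattern cluster."""
    return np.bincount(_assign(patch_vectors(img, size), centres),
                       minlength=len(centres))


def histogram_difference(img_a, img_b, k=8, size=3):
    """Per-bin count change from img_a to img_b over a shared pattern codebook."""
    X = np.vstack([patch_vectors(img_a, size), patch_vectors(img_b, size)])
    centres = kmeans(X, k)
    return pattern_histogram(img_b, centres, size) - pattern_histogram(img_a, centres, size)
```

Because histograms discard patch positions, small misalignments between the two radiographs leave the counts largely unchanged, which is why this comparison can work without registration; bins with large changes can then be traced back to the patches that populate them for visualization.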
Neural Similarity Between Encoding and Retrieval is Related to Memory Via Hippocampal Interactions
Ritchey, Maureen; Wing, Erik A.; LaBar, Kevin S.; Cabeza, Roberto
2013-01-01
A fundamental principle in memory research is that memory is a function of the similarity between encoding and retrieval operations. Consistent with this principle, many neurobiological models of declarative memory assume that memory traces are stored in cortical regions, and the hippocampus facilitates the reactivation of these traces during retrieval. The present investigation tested the novel prediction that encoding–retrieval similarity can be observed and related to memory at the level of individual items. Multivariate representational similarity analysis was applied to functional magnetic resonance imaging data collected during encoding and retrieval of emotional and neutral scenes. Memory success tracked fluctuations in encoding–retrieval similarity across frontal and posterior cortices. Importantly, memory effects in posterior regions reflected increased similarity between item-specific representations during successful recognition. Mediation analyses revealed that the hippocampus mediated the link between cortical similarity and memory success, providing crucial evidence for hippocampal–cortical interactions during retrieval. Finally, because emotional arousal is known to modulate both perceptual and memory processes, similarity effects were compared for emotional and neutral scenes. Emotional arousal was associated with enhanced similarity between encoding and retrieval patterns. These findings speak to the promise of pattern similarity measures for evaluating memory representations and hippocampal–cortical interactions. PMID:22967731
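Encoding-retrieval similarity at the item level can be sketched as a correlation matrix between encoding and retrieval patterns: the diagonal holds same-item similarity, and subtracting the mean off-diagonal value gives an item-specific score. The array layout (items x voxels) and the use of Pearson correlation are assumptions here; the study's preprocessing, region definitions, and mediation analysis are not shown.

```python
import numpy as np


def _zscore_rows(X):
    """Standardize each row (one item's voxel pattern) to mean 0, std 1."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


def encoding_retrieval_similarity(enc, ret):
    """enc, ret: (n_items, n_voxels). Returns sim[i, j] = Pearson r between
    item i's encoding pattern and item j's retrieval pattern."""
    enc_z = _zscore_rows(enc)
    ret_z = _zscore_rows(ret)
    return enc_z @ ret_z.T / enc_z.shape[1]


def item_specific_similarity(sim):
    """Same-item similarity (diagonal) minus mean similarity to other items."""
    n = sim.shape[0]
    off_diag_mean = (sim.sum(axis=1) - np.diag(sim)) / (n - 1)
    return np.diag(sim) - off_diag_mean
```

Averaging `item_specific_similarity` separately over remembered and forgotten items (using a per-item boolean mask) yields the kind of memory-success contrast described above.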
ERIC Educational Resources Information Center
Altmann, Gerry T. M.; Kamide, Yuki
2009-01-01
Two experiments explored the mapping between language and mental representations of visual scenes. In both experiments, participants viewed, for example, a scene depicting a woman, a wine glass and bottle on the floor, an empty table, and various other objects. In Experiment 1, participants concurrently heard either "The woman will put the glass…
ERIC Educational Resources Information Center
Greene, Michelle R.; Oliva, Aude
2009-01-01
Human observers are able to rapidly and accurately categorize natural scenes, but the representation mediating this feat is still unknown. Here we propose a framework of rapid scene categorization that does not segment a scene into objects and instead uses a vocabulary of global, ecological properties that describe spatial and functional aspects…
Scene and Position Specificity in Visual Memory for Objects
ERIC Educational Resources Information Center
Hollingworth, Andrew
2006-01-01
This study investigated whether and how visual representations of individual objects are bound in memory to scene context. Participants viewed a series of naturalistic scenes, and memory for the visual form of a target object in each scene was examined in a 2-alternative forced-choice test, with the distractor object either a different object…
Color constancy in natural scenes explained by global image statistics
Foster, David H.; Amano, Kinjiro; Nascimento, Sérgio M. C.
2007-01-01
To what extent do observers' judgments of surface color with natural scenes depend on global image statistics? To address this question, a psychophysical experiment was performed in which images of natural scenes under two successive daylights were presented on a computer-controlled high-resolution color monitor. Observers reported whether there was a change in reflectance of a test surface in the scene. The scenes were obtained with a hyperspectral imaging system and included variously trees, shrubs, grasses, ferns, flowers, rocks, and buildings. Discrimination performance, quantified on a scale of 0 to 1 with a color-constancy index, varied from 0.69 to 0.97 over 21 scenes and two illuminant changes, from a correlated color temperature of 25,000 K to 6700 K and from 4000 K to 6700 K. The best account of these effects was provided by receptor-based rather than colorimetric properties of the images. Thus, in a linear regression, 43% of the variance in constancy index was explained by the log of the mean relative deviation in spatial cone-excitation ratios evaluated globally across the two images of a scene. A further 20% was explained by including the mean chroma of the first image and its difference from that of the second image and a further 7% by the mean difference in hue. Together, all four global color properties accounted for 70% of the variance and provided a good fit to the effects of scene and of illuminant change on color constancy, and, additionally, of changing test-surface position. By contrast, a spatial-frequency analysis of the images showed that the gradient of the luminance amplitude spectrum accounted for only 5% of the variance. PMID:16961965
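The strongest predictor above, the log of the mean relative deviation in spatial cone-excitation ratios, can be sketched as follows. The study's exact definitions are not given here, so the random sampling of point pairs and the deviation measure |q2 - q1| / q1 are assumptions, and cone excitations are taken as already computed from the hyperspectral images. A per-cone (von Kries-style) scaling leaves every ratio e(x) / e(y) unchanged, so this statistic measures how far an illuminant change departs from that ideal.

```python
import numpy as np


def mean_relative_ratio_deviation(lms1, lms2, n_pairs=10000, seed=0):
    """Mean relative deviation of spatial cone-excitation ratios.

    lms1, lms2: (n_points, 3) positive L, M, S excitations of the same scene
    points under two illuminants. For random pairs of distinct points x, y and
    each cone class, q = e(x) / e(y); returns the mean of |q2 - q1| / q1.
    """
    lms1 = np.asarray(lms1, dtype=float)
    lms2 = np.asarray(lms2, dtype=float)
    n = lms1.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.integers(0, n, size=n_pairs)
    y = (x + rng.integers(1, n, size=n_pairs)) % n  # guarantees y != x
    q1 = lms1[x] / lms1[y]
    q2 = lms2[x] / lms2[y]
    return float(np.mean(np.abs(q2 - q1) / q1))
```

The regression predictor would then be `np.log(mean_relative_ratio_deviation(lms1, lms2))`, evaluated once per scene and illuminant change.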
Research on hyperspectral dynamic scene and image sequence simulation
NASA Astrophysics Data System (ADS)
Sun, Dandan; Liu, Fang; Gao, Jiaobo; Sun, Kefeng; Hu, Yu; Li, Yu; Xie, Junhu; Zhang, Lei
2016-10-01
This paper presents a method for simulating hyperspectral dynamic scenes and image sequences for evaluating hyperspectral equipment and target detection algorithms. Because of its high spectral resolution, strong band continuity, resistance to interference, and other advantages, hyperspectral imaging technology has developed rapidly in recent years and is widely used in areas such as optoelectronic target detection, military defense, and remote sensing systems. Digital imaging simulation, a crucial part of hardware-in-the-loop simulation, can be applied to testing and evaluating hyperspectral imaging equipment at lower development cost and with a shorter development period. Meanwhile, visual simulation can produce large amounts of original image data under various conditions for hyperspectral image feature extraction and classification algorithms. Based on a radiation physics model and material characteristic parameters, this paper proposes a method for generating digital scenes. By building multiple sensor models for different bands and bandwidths, hyperspectral scenes in the visible, MWIR, and LWIR bands with spectral resolutions of 0.01 μm, 0.05 μm, and 0.1 μm have been simulated. The resulting dynamic scenes are realistic and are generated in real time at frame rates of up to 100 Hz. An image sequence is obtained by saving all the scene gray-level data from the same viewpoint. The analysis shows that, in both the infrared and the visible bands, the grayscale variations of the simulated hyperspectral images are consistent with the theoretical analysis.
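The band-dependent sensor models can be illustrated by averaging a finely sampled spectral radiance over contiguous bands of a chosen width (0.1 μm below, one of the spectral resolutions quoted above). A blackbody source from Planck's law and an ideal rectangular band response are simplifying assumptions; the paper's material parameters, atmospheric effects, and sensor response curves are not modeled.

```python
import numpy as np

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def planck_radiance(wavelength_um, temperature_k):
    """Blackbody spectral radiance in W m^-2 sr^-1 um^-1."""
    lam = np.asarray(wavelength_um, dtype=float) * 1e-6  # um -> m
    per_metre = (2 * H * C**2 / lam**5) / np.expm1(H * C / (lam * K_B * temperature_k))
    return per_metre * 1e-6  # per metre of wavelength -> per um


def band_average(wavelength_um, radiance, start_um, stop_um, bandwidth_um):
    """Average a finely sampled spectrum over contiguous rectangular bands.

    The spectrum must be sampled more finely than bandwidth_um so that
    every band contains at least one sample.
    """
    n_bands = int(round((stop_um - start_um) / bandwidth_um))
    edges = np.linspace(start_um, stop_um, n_bands + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    values = np.array([radiance[(wavelength_um >= lo) & (wavelength_um < hi)].mean()
                       for lo, hi in zip(edges[:-1], edges[1:])])
    return centres, values


wl = np.linspace(8.0, 12.0, 4001)        # 1 nm sampling across the LWIR window
spectrum = planck_radiance(wl, 300.0)    # 300 K blackbody as a toy source
centres, bands = band_average(wl, spectrum, 8.0, 12.0, 0.1)
```

Swapping `bandwidth_um` between 0.01, 0.05, and 0.1 gives the three resolutions mentioned above from the same underlying spectrum; a real sensor model would replace the rectangular mask with a measured spectral response.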
Real-time maritime scene simulation for ladar sensors
NASA Astrophysics Data System (ADS)
Christie, Chad L.; Gouthas, Efthimios; Swierkowski, Leszek; Williams, Owen M.
2011-06-01
Continuing interest exists in the development of cost-effective synthetic environments for testing Laser Detection and Ranging (ladar) sensors. In this paper we describe a PC-based system for real-time ladar scene simulation of ships and small boats in a dynamic maritime environment. In particular, we describe the techniques employed to generate range imagery accompanied by passive radiance imagery. Our ladar scene generation system is an evolutionary extension of the VIRSuite infrared scene simulation program and includes all previous features such as ocean wave simulation, the physically-realistic representation of boat and ship dynamics, wake generation and simulation of whitecaps, spray, wake trails and foam. A terrain simulation extension is also under development. In this paper we outline the development, capabilities and limitations of the VIRSuite extensions.
The role of iconic memory in change-detection tasks.
Becker, M W; Pashler, H; Anstis, S M
2000-01-01
In three experiments, subjects attempted to detect the change of a single item in a visually presented array of items. Subjects' ability to detect a change was greatly reduced if a blank interstimulus interval (ISI) was inserted between the original array and an array in which one item had changed ('change blindness'). However, change detection improved when the location of the change was cued during the blank ISI. This suggests that people represent more information about a scene than change blindness might suggest. We test two possible hypotheses for why, in the absence of a cue, this representation fails to produce good change detection. The first claims that the intervening events employed to create change blindness result in multiple neural transients which co-occur with the to-be-detected change. Poor detection rates occur because a serial search of all the transient locations is required to detect the change, during which time the representation of the original scene fades. The second claims that the occurrence of the second frame overwrites the representation of the first frame, unless that information is insulated against overwriting by attention. The results support the second hypothesis. We conclude that people may have a fairly rich visual representation of a scene while the scene is present, but fail to detect changes because they lack the ability to simultaneously represent two complete visual representations.
Scalable Coding of Plenoptic Images by Using a Sparse Set and Disparities.
Li, Yun; Sjostrom, Marten; Olsson, Roger; Jennehag, Ulf
2016-01-01
Focused plenoptic capture is one light field capture technique. By placing a microlens array in front of the photosensor, focused plenoptic cameras capture both spatial and angular information of a scene in each microlens image and across microlens images. The capture results in a significant amount of redundant information, and the captured image usually has a high resolution. A coding scheme that removes the redundancy before coding can be advantageous for efficient compression, transmission, and rendering. In this paper, we propose a lossy coding scheme to efficiently represent plenoptic images. The format contains a sparse image set and its associated disparities. The reconstruction is performed by disparity-based interpolation and inpainting, and the reconstructed image is then employed as a prediction reference for coding the full plenoptic image. As a consequence of this representation, the proposed scheme has a scalable structure with three layers. The results show that plenoptic images are compressed efficiently, with over 60 percent bit rate reduction compared with High Efficiency Video Coding intra coding, and over 20 percent compared with a High Efficiency Video Coding block copying mode.
Anesthesiology training using 3D imaging and virtual reality
NASA Astrophysics Data System (ADS)
Blezek, Daniel J.; Robb, Richard A.; Camp, Jon J.; Nauss, Lee A.
1996-04-01
Current training for regional nerve block procedures by anesthesiology residents requires expert supervision and the use of cadavers, both of which are relatively expensive commodities in today's cost-conscious medical environment. We are developing methods to augment and eventually replace these training procedures with real-time and realistic computer visualizations and manipulations of the anatomical structures involved in anesthesiology procedures, such as nerve plexus injections (e.g., celiac blocks). The initial work is focused on visualizations: both static images and rotational renderings. From the initial results, a coherent paradigm for virtual patient and scene representation will be developed.
On validating remote sensing simulations using coincident real data
NASA Astrophysics Data System (ADS)
Wang, Mingming; Yao, Wei; Brown, Scott; Goodenough, Adam; van Aardt, Jan
2016-05-01
The remote sensing community often requires data simulation, either via spectral/spatial downsampling or through virtual, physics-based models, to assess systems and algorithms. The Digital Imaging and Remote Sensing Image Generation (DIRSIG) model is one such first-principles, physics-based model for simulating imagery for a range of modalities. As scene rendering technology and software have advanced, complex simulation of vegetation environments has become possible. This in turn has raised questions about the validity of such complex models, given phenomena such as multiple scattering and the bidirectional reflectance distribution function (BRDF) that could affect results for complex vegetation scenes. We selected three sites located in the Pacific Southwest domain (Fresno, CA) of the National Ecological Observatory Network (NEON). These sites represent oak savanna, hardwood forests, and conifer-manzanita mixed forests. We constructed corresponding virtual scenes, using airborne LiDAR and imaging spectroscopy data from NEON, ground-based LiDAR data, and field-collected spectra to characterize the scenes. Imaging spectroscopy data for these virtual sites were then generated using the DIRSIG simulation environment. This simulated imagery was compared to real AVIRIS imagery (15 m spatial resolution; 12 pixels/scene) and NEON Airborne Observation Platform (AOP) data (1 m spatial resolution; 180 pixels/scene). These tests used a distribution-comparison approach on selected spectral statistics (e.g., statistics describing the spectra's shape) for each simulated-versus-real distribution pair. The initial comparison of the spectral distributions indicated that the spectral shapes of the virtual and real sites were closely matched.
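One common way to compare a simulated and a real distribution of a per-pixel spectral statistic is the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical CDFs. The abstract does not name its comparison measure, so this choice, and whichever statistic is fed into it, are assumptions; the sketch only shows the mechanics, which work for unequal sample sizes such as 12 versus 180 pixels per scene.

```python
import numpy as np


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|.

    Both empirical CDFs are evaluated at every observed value, which is
    where the maximum gap between two step functions must occur.
    """
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))
```

A value near 0 means the simulated and real distributions are nearly indistinguishable; `scipy.stats.ks_2samp` computes the same statistic together with a p-value if a significance test is wanted.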
Contrast discrimination, non-uniform patterns and change blindness.
Scott-Brown, K C; Orbach, H S
1998-01-01
Change blindness--our inability to detect large changes in natural scenes when saccades, blinks and other transients interrupt visual input--seems to contradict psychophysical evidence for our exquisite sensitivity to contrast changes. Can the type of effects described as 'change blindness' be observed with simple, multi-element stimuli, amenable to psychophysical analysis? Such stimuli, composed of five mixed contrast elements, elicited a striking increase in contrast increment thresholds compared to those for an isolated element. Cue presentation prior to the stimulus substantially reduced thresholds, as for change blindness with natural scenes. On one hand, explanations for change blindness based on abstract and sketchy representations in short-term visual memory seem inappropriate for this low-level image property of contrast where there is ample evidence for exquisite performance on memory tasks. On the other hand, the highly increased thresholds for mixed contrast elements, and the decreased thresholds when a cue is present, argue against any simple early attentional or sensory explanation for change blindness. Thus, psychophysical results for very simple patterns cannot straightforwardly predict results even for the slightly more complicated patterns studied here. PMID:9872004
Inferring segmented dense motion layers using 5D tensor voting.
Min, Changki; Medioni, Gérard
2008-09-01
We present a novel local spatiotemporal approach to produce motion segmentation and dense temporal trajectories from an image sequence. A common representation of image sequences is a 3D spatiotemporal volume, (x,y,t), and its corresponding mathematical formalism is the fiber bundle. However, directly enforcing the spatiotemporal smoothness constraint is difficult in the fiber bundle representation. Thus, we convert the representation into a new 5D space (x,y,t,vx,vy) with an additional velocity domain, where each moving object produces a separate 3D smooth layer. The smoothness constraint is now enforced by extracting 3D layers using the tensor voting framework in a single step that solves both correspondence and segmentation simultaneously. Motion segmentation is achieved by identifying those layers, and the dense temporal trajectories are obtained by converting the layers back into the fiber bundle representation. We proceed to address three applications (tracking, mosaic, and 3D reconstruction) that are hard to solve from the video stream directly because of the segmentation and dense matching steps, but become straightforward with our framework. The approach does not make restrictive assumptions about the observed scene or camera motion and is therefore generally applicable. We present results on a number of data sets.
Yi, Chucai; Tian, Yingli
2012-09-01
In this paper, we propose a novel framework to extract text regions from scene images with complex backgrounds and multiple text appearances. This framework consists of three main steps: boundary clustering (BC), stroke segmentation, and string fragment classification. In BC, we propose a new bigram-color-uniformity-based method to model both text and attachment surface, and cluster edge pixels based on color pairs and spatial positions into boundary layers. Then, stroke segmentation is performed at each boundary layer by color assignment to extract character candidates. We propose two algorithms to combine the structural analysis of text stroke with color assignment and filter out background interferences. Further, we design a robust string fragment classification based on Gabor-based text features. The features are obtained from feature maps of gradient, stroke distribution, and stroke width. The proposed framework of text localization is evaluated on scene images, born-digital images, broadcast video images, and images of handheld objects captured by blind persons. Experimental results on respective datasets demonstrate that the framework outperforms state-of-the-art localization algorithms.
Scene-based nonuniformity correction and enhancement: pixel statistics and subpixel motion.
Zhao, Wenyi; Zhang, Chao
2008-07-01
We propose a framework for scene-based nonuniformity correction (NUC) and nonuniformity correction and enhancement (NUCE), which focal-plane-array-like sensors require to obtain clean, enhanced-quality images. The core of the proposed framework is a novel registration-based nonuniformity correction super-resolution (NUCSR) method that is bootstrapped by statistical scene-based NUC methods. Based on a comprehensive imaging model and accurate parametric motion estimation, we are able to remove severe or structured nonuniformity and, in the presence of subpixel motion, simultaneously improve image resolution. One important feature of our NUCSR method is the adoption of a parametric motion model that allows us to (1) handle many practical scenarios where parametric motions are present and (2) in principle, carry out perfect super-resolution by exploiting the available subpixel motions. Experiments with real data demonstrate the efficiency of the proposed NUCE framework and the effectiveness of the NUCSR method.
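The abstract says NUCSR is bootstrapped by statistical scene-based NUC. As a rough illustration of what such a statistical method does, the Python sketch below implements the classic constant-statistics idea, not the authors' NUCSR method. It assumes a linear detector model y = g·x + o and a long sequence with enough scene motion that every pixel sees the same temporal mean and standard deviation of the scene; the function name and the rescaling to array-average statistics are illustrative choices.

```python
import numpy as np


def constant_statistics_nuc(frames: np.ndarray) -> np.ndarray:
    """Remove per-pixel gain/offset nonuniformity from a (T, H, W) frame stack.

    Assumes a linear detector model y = g * x + o and that the scene signal x
    has the same temporal mean and standard deviation at every pixel.
    """
    frames = np.asarray(frames, dtype=np.float64)
    offset = frames.mean(axis=0)     # estimates g * mean(x) + o for each pixel
    gain = frames.std(axis=0)        # estimates g * std(x) for each pixel
    gain[gain == 0] = 1.0            # leave constant (e.g. dead) pixels unscaled
    normalized = (frames - offset) / gain
    # Map back to the array-average level and contrast so output stays in sensor units.
    return normalized * gain.mean() + offset.mean()
```

Because the per-pixel statistics only converge over many frames, an estimate like this is better suited to bootstrapping than to final correction.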
Reliable Classification of Geologic Surfaces Using Texture Analysis
NASA Astrophysics Data System (ADS)
Foil, G.; Howarth, D.; Abbey, W. J.; Bekker, D. L.; Castano, R.; Thompson, D. R.; Wagstaff, K.
2012-12-01
Communication delays and bandwidth constraints are major obstacles for remote exploration spacecraft. Because of such restrictions, spacecraft could make use of onboard science data analysis to maximize scientific gain, through capabilities such as the generation of bandwidth-efficient representative maps of scenes, autonomous instrument targeting to exploit targets of opportunity between communications, and downlink prioritization to ensure fast delivery of tactically important data. Of particular importance to remote exploration is the precision of such methods and their ability to reliably reproduce consistent results in novel environments. Spacecraft resources are highly oversubscribed, so any onboard data analysis must provide a high degree of confidence in its assessment. The TextureCam project is constructing a "smart camera" that can analyze surface images to autonomously identify scientifically interesting targets and direct narrow field-of-view instruments. The TextureCam instrument incorporates onboard scene interpretation and mapping to assist these autonomous science activities. Computer vision algorithms map scenes such as those encountered during rover traverses. The approach, based on a machine learning strategy, trains a statistical model to recognize different geologic surface types and then classifies every pixel in a new scene according to these categories. We describe three methods for increasing the precision of the TextureCam instrument. The first uses ancillary data to segment challenging scenes into smaller regions having homogeneous properties. These subproblems are individually easier to solve, preventing uncertainty in one region from contaminating those that can be confidently classified. The second involves a Bayesian approach that maximizes the likelihood of correct classifications by abstaining from ambiguous ones. We evaluate these two techniques on a set of images acquired during field expeditions in the Mojave Desert. 
Finally, we extend the algorithm to perform robust texture classification across a wide range of lighting conditions. We characterize both the increase in precision achieved using different input data representations and the range of conditions under which reliable performance can be achieved. An ensemble learning approach is used to increase performance by leveraging the illumination-dependent statistics of an image. Our results show that the three algorithmic modifications lead to a significant increase in classification performance as well as an increase in precision using an adjustable and human-understandable metric of confidence.
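The second method gains precision by abstaining from ambiguous classifications. A minimal sketch of that idea is a posterior-threshold reject option: label a pixel with its most probable class only when that posterior clears a threshold, otherwise abstain. This assumes per-pixel class posteriors are already available; the threshold, the ABSTAIN label, and the function names are hypothetical, and TextureCam's actual Bayesian formulation may differ.

```python
import numpy as np

ABSTAIN = -1  # hypothetical label for pixels the classifier declines to classify


def classify_with_abstention(posteriors: np.ndarray, threshold: float) -> np.ndarray:
    """Most probable class per pixel, or ABSTAIN where its posterior is below threshold.

    posteriors: array of shape (..., n_classes) whose last axis sums to 1.
    """
    labels = posteriors.argmax(axis=-1)
    confidence = posteriors.max(axis=-1)
    return np.where(confidence >= threshold, labels, ABSTAIN)


def precision_and_coverage(pred: np.ndarray, truth: np.ndarray):
    """Precision over the pixels that were classified, and the fraction classified."""
    classified = pred != ABSTAIN
    coverage = float(classified.mean())
    if not classified.any():
        return float("nan"), coverage
    precision = float((pred[classified] == truth[classified]).mean())
    return precision, coverage
```

Raising the threshold trades coverage for precision, which is one way to obtain an adjustable, human-understandable confidence knob of the kind the abstract describes.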
The fate of object memory traces under change detection and change blindness.
Busch, Niko A
2013-07-03
Observers often fail to detect substantial changes in a visual scene. This so-called change blindness is often taken as evidence that visual representations are sparse and volatile. This notion rests on the assumption that the failure to detect a change implies that representations of the changing objects are lost altogether. However, recent evidence suggests that under change blindness, object memory representations may be formed and stored, but not retrieved. This study investigated the fate of object memory representations when changes go unnoticed. Participants were presented with scenes consisting of real-world objects, one of which changed on each trial, while event-related potentials (ERPs) were recorded. Participants were first asked to localize where the change had occurred. In an additional recognition task, participants then discriminated old objects, either from the pre-change or the post-change scene, from entirely new objects. Neural traces of object memories were studied by comparing ERPs for old and novel objects. Participants performed poorly in the detection task and often failed to recognize objects from the scene, especially pre-change objects. However, a robust old/novel effect was observed in the ERP, even when participants were change blind and did not recognize the old object. This implicit memory trace was found both for pre-change and post-change objects. These findings suggest that object memories are stored even under change blindness. Thus, visual representations may not be as sparse and volatile as previously thought. Rather, change blindness may point to a failure to retrieve and use these representations for change detection. Copyright © 2013 Elsevier B.V. All rights reserved.
Generative technique for dynamic infrared image sequences
NASA Astrophysics Data System (ADS)
Zhang, Qian; Cao, Zhiguo; Zhang, Tianxu
2001-09-01
This paper discusses a technique for generating dynamic infrared image sequences. Because an infrared sensor differs from a CCD camera in its imaging mechanism, it forms the infrared image by receiving the infrared radiation of the scene (including target and background). Infrared imaging sensors are strongly affected by atmospheric radiation, environmental radiation, and atmospheric attenuation during radiative transfer. Therefore, the paper first analyzes the imaging influence of each kind of radiation and provides the formulas for calculating it, treating passive and active scenes separately. It then provides calculation methods for the passive scene and explains the functions of the scene model, the atmospheric transmission model, and the material physical attribute databases. Second, based on the infrared imaging model, it introduces the design concept, the implementation approach, and the software framework of simulation software for infrared image sequences on an SGI workstation. Third, following this design, it presents an example of a simulated infrared image sequence that uses sea and sky as the background, a warship as the target, and an aircraft as the viewpoint. Finally, the simulation is evaluated as a whole and improvements are proposed.
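The abstract does not give its radiation formulas. A common simplified form of the sensor-reaching radiance for a grey-body target is L = τ·(ε·L_bb(T_target) + (1 − ε)·L_bb(T_env)) + L_path, where L_bb is Planck's law integrated over the sensor band. The Python sketch below assumes that form, an 8 to 12 μm band by default, and transmittance and path radiance supplied externally (in the paper these would come from the atmospheric transmission model). It is an illustration, not the paper's model.

```python
import numpy as np

H = 6.62607015e-34   # Planck constant (J s)
C = 2.99792458e8     # speed of light (m/s)
K_B = 1.380649e-23   # Boltzmann constant (J/K)


def planck_radiance(wavelength_m, temperature_k):
    """Blackbody spectral radiance, W / (m^2 sr m)."""
    wavelength_m = np.asarray(wavelength_m, dtype=np.float64)
    numerator = 2.0 * H * C**2 / wavelength_m**5
    return numerator / np.expm1(H * C / (wavelength_m * K_B * temperature_k))


def band_radiance(temperature_k, lo_um=8.0, hi_um=12.0, n=2001):
    """Blackbody radiance integrated over [lo_um, hi_um] micrometres, W / (m^2 sr)."""
    wl = np.linspace(lo_um, hi_um, n) * 1e-6
    spectral = planck_radiance(wl, temperature_k)
    # Trapezoid rule over the wavelength grid.
    return float(np.sum(0.5 * (spectral[1:] + spectral[:-1]) * np.diff(wl)))


def sensor_radiance(t_target, emissivity, t_env, tau, path_radiance, band=(8.0, 12.0)):
    """Band radiance reaching the sensor from a grey-body target.

    Target emission plus reflected environment radiance, both attenuated by the
    atmospheric transmittance tau, plus the atmosphere's own path radiance.
    """
    emitted = emissivity * band_radiance(t_target, *band)
    reflected = (1.0 - emissivity) * band_radiance(t_env, *band)
    return tau * (emitted + reflected) + path_radiance
```

Evaluating sensor_radiance per pixel, with temperatures and emissivities taken from a material database, yields one simulated frame; repeating it as the geometry changes yields a sequence.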
Structural scene analysis and content-based image retrieval applied to bone age assessment
NASA Astrophysics Data System (ADS)
Fischer, Benedikt; Brosig, André; Deserno, Thomas M.; Ott, Bastian; Günther, Rolf W.
2009-02-01
Radiological bone age assessment is based on global or local image regions of interest (ROI), such as epiphyseal regions or the area of carpal bones. Usually, these regions are compared to a standardized reference, and a score determining skeletal maturity is calculated. For computer-assisted diagnosis, automatic ROI extraction has so far relied on heuristic approaches. In this work, we apply a high-level approach of scene analysis for knowledge-based ROI segmentation. Based on a set of 100 reference images from the IRMA database, a so-called structural prototype (SP) is trained. In this graph-based structure, the 14 phalanges and 5 metacarpal bones are represented by nodes, with associated location, shape, and texture parameters modeled by Gaussians. Accordingly, Gaussians describing the relative positions, relative orientation, and other relative parameters between two nodes are associated with the edges. Thereafter, segmentation of a hand radiograph is done in several steps: (i) a multi-scale region merging scheme is applied to extract visually prominent regions; (ii) a graph/sub-graph matching to the SP robustly identifies a subset of the 19 bones; (iii) the SP is registered to the current image for complete scene reconstruction; and (iv) the epiphyseal regions are extracted from the reconstructed scene. The evaluation is based on 137 images of Caucasian males from the USC hand atlas. Overall, an error rate of 32% is achieved; for the 6 middle distal and medial/distal epiphyses, 23% of all extractions need adjustment. On average, 9.58 of the 14 epiphyseal regions were extracted successfully per image. This is promising for further use in content-based image retrieval (CBIR) and CBIR-based automatic bone age assessment.
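Because nodes and edges of the structural prototype carry Gaussian models, a candidate assignment of image regions to SP nodes can be scored by summing Gaussian log-likelihoods of node features and of relative features along edges. The Python sketch below assumes diagonal Gaussians and hypothetical feature vectors such as (x, y, orientation). It shows only the scoring idea, not the authors' graph/sub-graph matching algorithm. Edges whose nodes are unassigned are skipped, mirroring the fact that matching identifies only a subset of the bones.

```python
import numpy as np


def gaussian_loglik(x, mean, var):
    """Log-density of x under an axis-aligned (diagonal) Gaussian."""
    x, mean, var = (np.asarray(a, dtype=np.float64) for a in (x, mean, var))
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var))


def score_assignment(assignment, region_feats, node_models, edge_models):
    """Score a partial mapping {node_id: region_id} against a Gaussian prototype.

    region_feats: {region_id: feature vector} (hypothetical, e.g. x, y, angle)
    node_models:  {node_id: (mean, var)} for absolute node features
    edge_models:  {(node_a, node_b): (mean, var)} for relative features b - a
    """
    score = 0.0
    for node, region in assignment.items():
        mean, var = node_models[node]
        score += gaussian_loglik(region_feats[region], mean, var)
    for (a, b), (mean, var) in edge_models.items():
        if a in assignment and b in assignment:   # skip edges to unmatched bones
            rel = (np.asarray(region_feats[assignment[b]], dtype=np.float64)
                   - np.asarray(region_feats[assignment[a]], dtype=np.float64))
            score += gaussian_loglik(rel, mean, var)
    return score
```

A matcher would then search for the assignment with the highest score.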
Research on hyperspectral dynamic scene and image sequence simulation
NASA Astrophysics Data System (ADS)
Sun, Dandan; Gao, Jiaobo; Sun, Kefeng; Hu, Yu; Li, Yu; Xie, Junhu; Zhang, Lei
2016-10-01
This paper presents a method for simulating hyperspectral dynamic scenes and image sequences for evaluating hyperspectral equipment and target detection algorithms. Because of its high spectral resolution, strong band continuity, resistance to interference, and other advantages, hyperspectral imaging technology has developed rapidly in recent years and is widely used in areas such as optoelectronic target detection, military defense, and remote sensing systems. Digital imaging simulation, a crucial part of hardware-in-the-loop simulation, can be applied to testing and evaluating hyperspectral imaging equipment at lower development cost and with a shorter development period. Meanwhile, visual simulation can produce large amounts of raw image data under various conditions for hyperspectral feature extraction and classification algorithms. Based on a radiation physics model and material characteristic parameters, this paper proposes a method for generating digital scenes. By building multiple sensor models for different bands and bandwidths, hyperspectral scenes in the visible, MWIR, and LWIR bands with spectral resolutions of 0.01 μm, 0.05 μm, and 0.1 μm have been simulated. The resulting dynamic scenes are realistic and are generated in real time at frame rates up to 100 Hz. An image sequence is obtained by saving the gray-level data of all scenes from the same viewpoint. The analysis shows that, in both the infrared and the visible bands, the grayscale variations of the simulated hyperspectral images are consistent with the theoretical analysis.
Li, Yiyang; Jin, Weiqi; Li, Shuo; Zhang, Xu; Zhu, Jin
2017-05-08
Cooled infrared detector arrays always suffer from undesired ripple residual nonuniformity (RNU) in sky scene observations. Ripple RNU seriously degrades imaging quality, especially for small target detection, and it is difficult to eliminate with calibration-based techniques or current scene-based nonuniformity correction algorithms. In this paper, we present a modified temporal high-pass nonuniformity correction algorithm using fuzzy scene classification. The fuzzy scene classification controls the correction threshold so that the algorithm can remove ripple RNU without degrading scene details. We test the algorithm on a real infrared sequence by comparing it to several well-established methods. The results show that, for ripple RNU correction, the algorithm has clear advantages over the tested methods in detail preservation and convergence speed. Furthermore, we demonstrate our architecture with a prototype built on a Xilinx Virtex-5 XC5VLX50T field-programmable gate array (FPGA), which has two advantages: (1) low resource consumption; and (2) small hardware delay (less than 10 image rows). It has been successfully applied in an actual system.
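For context, the baseline that the paper modifies, a temporal high-pass NUC, keeps a per-pixel recursive mean f_n = f_{n-1} + (x_n − f_{n-1})/m as an estimate of the slowly varying fixed pattern and outputs x_n − f_n. The Python sketch below adds a simple gate that updates a pixel's estimate only when the pixel changed by more than a threshold since the previous frame, so static scene content is not absorbed into the correction. That gate is an assumption standing in for the paper's fuzzy-classification-controlled threshold, which the abstract does not specify; m and the threshold are hypothetical parameters.

```python
import numpy as np


def temporal_highpass_nuc(frames, m=32.0, threshold=None):
    """Temporal high-pass nonuniformity correction of a (T, H, W) sequence.

    A per-pixel recursive mean f_n = f_{n-1} + (x_n - f_{n-1}) / m tracks the slowly
    varying fixed-pattern level; each output frame is x_n - f_n plus the array mean
    of f_n. If threshold is set, a pixel's estimate is updated only when its value
    changed by more than threshold since the previous frame (a simple de-ghosting
    gate; the paper instead sets its threshold by fuzzy scene classification).
    """
    frames = np.asarray(frames, dtype=np.float64)
    estimate = np.full(frames.shape[1:], frames[0].mean())  # start with no correction
    out = np.empty_like(frames)
    out[0] = frames[0]
    for n in range(1, len(frames)):
        x = frames[n]
        update = (x - estimate) / m
        if threshold is not None:
            moved = np.abs(x - frames[n - 1]) > threshold
            update = np.where(moved, update, 0.0)
        estimate = estimate + update
        out[n] = x - estimate + estimate.mean()
    return out
```

Without the gate, a static scene is gradually learned as fixed pattern and fades from the output (ghosting). With the gate, a static scene passes through unchanged. This illustrates why the choice of threshold governs the trade-off between removing ripple RNU and preserving scene detail.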
Hausfeld, Lars; Riecke, Lars; Formisano, Elia
2018-06-01
Often, in everyday life, we encounter auditory scenes comprising multiple simultaneous sounds and succeed in selectively attending to only one sound, typically the one most relevant for ongoing behavior. Studies using basic sounds and two-talker stimuli have shown that auditory selective attention aids this by enhancing the neural representations of the attended sound in auditory cortex. It remains unknown, however, whether and how this selective attention mechanism operates on representations of auditory scenes containing natural sounds of different categories. In this high-field fMRI study, we presented participants with simultaneous voices and musical instruments while manipulating their focus of attention. We found an attentional enhancement of neural sound representations in temporal cortex, as defined by spatial activation patterns, at locations that depended on the attended category (i.e., voices or instruments). In contrast, in frontal cortex the site of enhancement was independent of the attended category, and the same regions could flexibly represent any attended sound regardless of its category. These results are relevant to elucidating the interacting mechanisms of bottom-up and top-down processing when listening to real-life scenes composed of multiple sound categories. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Kim, Sungho
2015-01-01
Sea-based infrared search and track (IRST) is important for homeland security because it detects missiles and asymmetric boats. This paper proposes a novel scheme to interpret various infrared scenes by classifying the infrared background types and detecting the coastal regions in omni-directional images. A small infrared target detector selected according to background type or region should be deployed to maximize the detection rate and minimize the number of false alarms. A spatial filter-based small target detector is suitable for identifying stationary incoming targets in remote sea areas with sky only. If an image sector contains a coastal region, however, many false detections can occur, because ground clutter makes it difficult to find true targets with the same spatial filter-based detector. A temporal filter-based detector was used to handle these problems. Therefore, scene type and coastal region information is critical to the success of IRST in real-world applications. In this paper, the infrared scene type was determined using the relationship between the sensor line-of-sight (LOS) and a horizontal line in the image. The proposed coastal region detector is activated if the background type of the probing sector is determined to be a coastal region. Coastal regions can be detected by fusing the region map and the curve map. The experimental results on real infrared images highlight the feasibility of the proposed sea-based scene interpretation. In addition, the effects of the proposed scheme were analyzed further by applying region-adaptive small target detection. PMID:26404308
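The abstract does not say which spatial filter is used for sky-only sectors. A minimal, generic example of a spatial filter-based small target detector subtracts a large-window local background mean from a small-window mean and thresholds the resulting contrast at k standard deviations. In the Python sketch below, the window sizes, k, and the function names are all hypothetical choices, not the paper's detector.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_mean(img, size):
    """Mean over a size x size window centred on each pixel (size must be odd; edges padded)."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    return sliding_window_view(padded, (size, size)).mean(axis=(-2, -1))


def detect_small_targets(img, target_size=3, background_size=15, k=5.0):
    """Flag pixels whose small-window mean exceeds the local background by k sigma."""
    img = np.asarray(img, dtype=np.float64)
    contrast = local_mean(img, target_size) - local_mean(img, background_size)
    return contrast > contrast.mean() + k * contrast.std()
```

On a smooth sky background this suppresses large-scale structure and keeps compact bright blobs. Along a coastline, by contrast, ground clutter produces many high-contrast responses, which is why the paper switches to a temporal filter there.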
A Novel Method to Increase LinLog CMOS Sensors’ Performance in High Dynamic Range Scenarios
Martínez-Sánchez, Antonio; Fernández, Carlos; Navarro, Pedro J.; Iborra, Andrés
2011-01-01
Images from high dynamic range (HDR) scenes must be obtained with minimum loss of information. For this purpose, it is necessary to take full advantage of the quantization levels provided by the CCD/CMOS image sensor. LinLog CMOS sensors satisfy this demand by offering an adjustable response curve that combines linear and logarithmic responses. This paper presents a novel method to quickly adjust the parameters that control the response curve of a LinLog CMOS image sensor. We propose using an adaptive proportional-integral-derivative (PID) controller to adjust the exposure time of the sensor, together with control algorithms based on the saturation level and the entropy of the images. With this method, the sensor's maximum dynamic range (120 dB) can be used to acquire good-quality images from HDR scenes with fast, automatic adaptation to scene conditions. Adaptation to a new scene is rapid: the sensor response is adjusted in fewer than eight frames when working in real-time video mode. At least 67% of the scene entropy can be retained with this method. PMID:22164083
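The control loop described above feeds image statistics (saturation level and entropy) into a PID controller acting on exposure time. The Python sketch below shows only the building blocks: the Shannon entropy of the grey-level histogram, the fraction of saturated pixels, and a textbook discrete PID with fixed gains. The paper's controller is adaptive, and its gains, setpoints, and error definitions are not given in the abstract, so everything here, including the names, is an illustrative assumption.

```python
import numpy as np


def image_entropy(img, levels=256):
    """Shannon entropy (bits) of the grey-level histogram of a non-negative integer image."""
    hist = np.bincount(np.asarray(img).ravel(), minlength=levels).astype(np.float64)
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))


def saturation_fraction(img, max_level=255):
    """Fraction of pixels at (or above) the top code value."""
    return float(np.mean(np.asarray(img) >= max_level))


class PID:
    """Textbook discrete PID controller with fixed gains (the paper's is adaptive)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt=1.0):
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

In a closed loop, one could, for example, pass the difference between a target saturation fraction and saturation_fraction(frame) to PID.step each frame and add the output to the exposure time, using entropy to judge whether the response-curve parameters preserve enough information.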
Framework of passive millimeter-wave scene simulation based on material classification
NASA Astrophysics Data System (ADS)
Park, Hyuk; Kim, Sung-Hyun; Lee, Ho-Jin; Kim, Yong-Hoon; Ki, Jae-Sug; Yoon, In-Bok; Lee, Jung-Min; Park, Soon-Jun
2006-05-01
Over the past few decades, passive millimeter-wave (PMMW) sensors have emerged as useful tools in transportation and military applications such as autonomous landing systems, smart weapons, and night and all-weather vision systems. To predict the performance of a PMMW sensor efficiently and integrate it into a system, software-in-the-loop (SWIL) testing is required. PMMW scene simulation is a key component for implementing such a simulator. However, no commercial off-the-shelf tool is available for PMMW scene simulation, and only a few studies on this technology exist. We have studied PMMW scene simulation methods in order to develop a PMMW sensor SWIL simulator. This paper describes the framework of the PMMW scene simulation and tentative results. The purpose of the PMMW scene simulation is to generate sensor outputs (or images) from a visible image and environmental conditions. We organize it into four parts: material classification mapping, PMMW environmental setting, PMMW scene forming, and millimeter-wave (MMW) sensorworks. The background and the objects in the scene are classified based on properties related to MMW radiation and reflectivity. The environmental setting part calculates the following PMMW phenomenology: atmospheric propagation and emission, including sky temperature, weather conditions, and physical temperature. Then, PMMW raw images are formed using surface geometry. Finally, PMMW sensor outputs are generated from the PMMW raw images by applying sensor characteristics such as aperture size and noise level. Through this simulation process, PMMW phenomenology and sensor characteristics are reproduced in the output scene. We have finished designing the simulator framework and are working on the detailed implementation. As a tentative result, a flight observation was simulated under specific conditions. 
After completing the implementation, we plan to increase the reliability of the simulation by collecting data with actual PMMW sensors. With a reliable PMMW scene simulator, it will be more efficient to apply PMMW sensors to various applications.
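The environmental-setting, scene-forming, and sensor steps can be illustrated with textbook radiometric approximations, which may not match the authors' models. First, a surface's apparent temperature is its emission plus reflected sky: T_s = ε·T_phys + (1 − ε)·T_sky. Second, assuming an isothermal atmosphere, the antenna sees T_A = τ·T_s + (1 − τ)·T_atm. Third, an ideal total-power radiometer adds Gaussian noise with ΔT = (T_A + T_rec)/sqrt(B·t_int). In the Python sketch below, all parameter values and function names are hypothetical.

```python
import numpy as np


def apparent_temperature(emissivity, t_physical, t_sky):
    """Brightness temperature of a surface: own emission plus reflected sky (K)."""
    return emissivity * t_physical + (1.0 - emissivity) * t_sky


def antenna_temperature(t_scene, transmittance, t_atmosphere):
    """Scene temperature attenuated along the path plus path emission (isothermal atmosphere, K)."""
    return transmittance * t_scene + (1.0 - transmittance) * t_atmosphere


def radiometer_output(t_antenna, t_receiver, bandwidth_hz, integration_s, rng=None):
    """Add Gaussian noise with the ideal total-power radiometer sensitivity.

    Delta T = (T_A + T_rec) / sqrt(B * t_int)
    """
    rng = np.random.default_rng() if rng is None else rng
    t_antenna = np.asarray(t_antenna, dtype=np.float64)
    delta_t = (t_antenna + t_receiver) / np.sqrt(bandwidth_hz * integration_s)
    return t_antenna + rng.normal(0.0, 1.0, t_antenna.shape) * delta_t
```

Under this model a low-emissivity material such as metal mostly reflects the cold sky and therefore appears cold, which is why the material classification step matters so much for PMMW imagery.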
Eye Movements and Visual Memory for Scenes
2005-01-01
Scene memory research has demonstrated that the memory representation of a semantically inconsistent object in a scene is more detailed and/or complete... memory during scene viewing, then changes to semantically inconsistent objects (which should be represented more completely) should be detected more... semantic description. Due to the surprise nature of the visual memory test, any learning that occurred during the search portion of the experiment was
NASA Astrophysics Data System (ADS)
Kuvychko, Igor
2001-10-01
Vision is a part of a larger information system that converts visual information into knowledge structures. These structures drive the vision process, resolving ambiguity and uncertainty via feedback, and provide image understanding, that is, an interpretation of visual information in terms of such knowledge models. A computer vision system based on such principles requires a unifying representation of perceptual and conceptual information. Computer simulation models are built on the basis of graphs/networks, and the human brain has been found to be able to emulate similar graph/network models. This implies an important paradigm shift in our knowledge of the brain, from neural networks to "cortical software". Starting from the primary visual areas, the brain analyzes an image as a graph-type spatial structure. Primary areas provide active fusion of image features on a spatial grid-like structure, where nodes are cortical columns. The spatial combination of different neighboring features cannot be described as a statistical/integral characteristic of the analyzed region, but uniquely characterizes the region itself. Spatial logic and topology are naturally present in such structures. Mid-level vision processes like clustering, perceptual grouping, multilevel hierarchical compression, separation of figure from ground, etc. are special kinds of graph/network transformations. They convert low-level image structure into a set of more abstract structures that represent objects and the visual scene, making them easy to analyze by higher-level knowledge structures. Higher-level vision phenomena like shape from shading, occlusion, etc. are results of such analysis. This approach offers an opportunity not only to explain frequently unexplained results of cognitive science, but also to create intelligent computer vision systems that simulate perceptual processes in both the "what" and "where" visual pathways. Such systems can open new horizons for the robotics and computer vision industries.
Maximum entropy perception-action space: a Bayesian model of eye movement selection
NASA Astrophysics Data System (ADS)
Colas, Francis; Bessière, Pierre; Girard, Benoît
2011-03-01
In this article, we investigate the selection of eye movements in a free-eye Multiple Object Tracking task. We propose a Bayesian model of retinotopic maps with a complex logarithmic mapping. This model is structured in two parts: a representation of the visual scene and a decision model based on that representation. We compare different decision models based on different features of the representation and show that taking uncertainty into account helps predict the eye movements of subjects recorded in a psychophysics experiment. Finally, based on experimental data, we postulate that the complex logarithmic mapping has functional relevance, as the density of objects in this space is more uniform than expected. This may indicate that the representation space and control strategies are such that the object density is of maximum entropy.
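The complex logarithmic mapping and the maximum-entropy argument can be made concrete with a small Python sketch. It uses the log-polar form of the complex logarithm, log z = log|z| + i·arg z, with an offset a added to |z| so the fovea is not singular. It also measures how uniformly mapped object positions fill that space via the entropy of a 2D histogram. The offset value, the binning, and the function names are hypothetical. The paper's retinotopic model and its Bayesian decision layer are not reproduced here.

```python
import numpy as np


def complex_log_map(x, y, a=0.5):
    """Map visual-field positions (fovea at the origin) to log-polar coordinates.

    Uses the complex logarithm log(z) = log|z| + i*arg(z) with z = x + i*y, but
    replaces |z| by |z| + a so that the fovea (z = 0) is not singular.
    Returns (u, v) = (log(|z| + a), arg z).
    """
    z = np.asarray(x, dtype=np.float64) + 1j * np.asarray(y, dtype=np.float64)
    return np.log(np.abs(z) + a), np.angle(z)


def occupancy_entropy(u, v, bins=8):
    """Shannon entropy (bits) of the 2D histogram of mapped object positions."""
    hist, _, _ = np.histogram2d(np.ravel(u), np.ravel(v), bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))
```

Higher occupancy entropy means object density is more uniform in the mapped space, which is the quantity the maximum-entropy hypothesis concerns.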
Light field rendering with omni-directional camera
NASA Astrophysics Data System (ADS)
Todoroki, Hiroshi; Saito, Hideo
2003-06-01
This paper presents an approach to capturing the visual appearance of a real environment, such as the interior of a room. We propose a method for generating arbitrary-viewpoint images by building a light field with an omni-directional camera, which can capture a wide field of view. The omni-directional camera used in this technique is a special camera with a hyperbolic mirror in its upper part, so that it can capture the luminosity of the environment over a full 360 degrees in one image. We apply the light field method, an image-based rendering (IBR) technique, to generate the arbitrary-viewpoint images. The light field is a kind of database that records luminosity information in object space. We employ the omni-directional camera to construct the light field so that we can collect many view directions in it. Thus our method allows the user to explore a wide scene and achieves a realistic representation of the virtual environment. To demonstrate the proposed method, we capture an image sequence of our lab's interior with an omni-directional camera and successfully generate arbitrary-viewpoint images for a virtual tour of the environment.
Text String Detection from Natural Scenes by Structure-based Partition and Grouping
Yi, Chucai; Tian, YingLi
2012-01-01
Text information in natural scene images provides important clues for many image-based applications such as scene understanding, content-based image retrieval, assistive navigation, and automatic geocoding. However, locating text against a complex background with multiple colors is a challenging task. In this paper, we explore a new framework to detect text strings with arbitrary orientations in complex natural scene images. Our proposed framework of text string detection consists of two steps: 1) Image partition to find text character candidates based on local gradient features and color uniformity of character components. 2) Character candidate grouping to detect text strings based on joint structural features of text characters in each text string, such as character size differences, distances between neighboring characters, and character alignment. By assuming that a text string has at least three characters, we propose two algorithms of text string detection: 1) adjacent character grouping method, and 2) text line grouping method. The adjacent character grouping method calculates the sibling groups of each character candidate as string segments and then merges the intersecting sibling groups into a text string. The text line grouping method performs a Hough transform to fit text lines among the centroids of text candidates. Each fitted text line describes the orientation of a potential text string. The detected text string is represented by a rectangular region covering all characters whose centroids are cascaded in its text line. To improve efficiency and accuracy, our algorithms are carried out at multiple scales. The proposed methods outperform the state-of-the-art results on the public Robust Reading Dataset, which contains text only in horizontal orientation. 
Furthermore, the effectiveness of our methods in detecting text strings with arbitrary orientations is evaluated on the Oriented Scene Text Dataset, which we collected and which contains text strings in non-horizontal orientations. PMID:21411405
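The text line grouping method fits lines through character centroids with a Hough transform, under the assumption that a string has at least three characters. The Python sketch below is a minimal version of that idea. Each centroid votes for every line ρ = x·cos θ + y·sin θ through it. Accumulator cells with at least three votes become candidate text lines, and cells whose members are a subset of an already accepted line are skipped. The quantization steps and the subset-suppression rule are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def hough_text_lines(centroids, rho_step=2.0, n_theta=180, min_chars=3):
    """Group character-candidate centroids that lie on a common straight line.

    Every centroid (x, y) votes for the lines rho = x*cos(theta) + y*sin(theta)
    through it. Accumulator cells holding at least min_chars centroids become
    candidate text lines; cells whose centroids are a subset of an already
    accepted line are skipped. Returns a list of (theta, rho, member_indices).
    """
    pts = np.asarray(centroids, dtype=np.float64)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rhos = np.outer(pts[:, 0], np.cos(thetas)) + np.outer(pts[:, 1], np.sin(thetas))
    rho_bins = np.round(rhos / rho_step).astype(int)

    cells = {}
    for i in range(len(pts)):
        for t in range(n_theta):
            cells.setdefault((t, int(rho_bins[i, t])), set()).add(i)

    accepted = []
    # Visit the most-voted cells first so that a full line is accepted before its subsets.
    for (t, r), members in sorted(cells.items(), key=lambda item: -len(item[1])):
        if len(members) < min_chars or any(members <= kept for _, _, kept in accepted):
            continue
        accepted.append((float(thetas[t]), r * rho_step, members))
    return [(theta, rho, sorted(m)) for theta, rho, m in accepted]
```

Each returned θ gives the orientation of a potential text string, and the member centroids define the rectangle that covers it.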
Text string detection from natural scenes by structure-based partition and grouping.
Yi, Chucai; Tian, YingLi
2011-09-01
Text information in natural scene images provides important clues for many image-based applications such as scene understanding, content-based image retrieval, assistive navigation, and automatic geocoding. However, locating text against a complex background with multiple colors is a challenging task. In this paper, we explore a new framework to detect text strings with arbitrary orientations in complex natural scene images. Our proposed framework of text string detection consists of two steps: 1) image partition to find text character candidates based on local gradient features and color uniformity of character components and 2) character candidate grouping to detect text strings based on joint structural features of text characters in each text string, such as character size differences, distances between neighboring characters, and character alignment. By assuming that a text string has at least three characters, we propose two algorithms of text string detection: 1) adjacent character grouping method and 2) text line grouping method. The adjacent character grouping method calculates the sibling groups of each character candidate as string segments and then merges the intersecting sibling groups into a text string. The text line grouping method performs a Hough transform to fit text lines among the centroids of text candidates. Each fitted text line describes the orientation of a potential text string. The detected text string is represented by a rectangular region covering all characters whose centroids are cascaded in its text line. To improve efficiency and accuracy, our algorithms are carried out at multiple scales. The proposed methods outperform the state-of-the-art results on the public Robust Reading Dataset, which contains text only in horizontal orientation. 
Furthermore, the effectiveness of our methods in detecting text strings with arbitrary orientations is evaluated on the Oriented Scene Text Dataset, which we collected and which contains text strings in nonhorizontal orientations.
Improving depth estimation from a plenoptic camera by patterned illumination
NASA Astrophysics Data System (ADS)
Marshall, Richard J.; Meah, Chris J.; Turola, Massimo; Claridge, Ela; Robinson, Alex; Bongs, Kai; Gruppetta, Steve; Styles, Iain B.
2015-05-01
Plenoptic (light-field) imaging is a technique that allows a simple CCD-based imaging device to acquire both spatially and angularly resolved information about the "light-field" from a scene. It requires a microlens array to be placed between the objective lens and the sensor of the imaging device, and the images under each microlens (which typically span many pixels) can be computationally post-processed to shift perspective, refocus digitally, extend the depth of field, manipulate the aperture synthetically, and generate a depth map from a single image. Some of these capabilities are rigid functions that do not depend upon the scene and work by manipulating and combining a well-defined set of pixels in the raw image. However, depth mapping requires specific features in the scene to be identified and registered between consecutive microimages. This process requires that the image have sufficient features for the registration; in the absence of such features, the algorithms become less reliable and incorrect depths are generated. The aim of this study is to investigate the generation of depth maps from light-field images of scenes with insufficient features for accurate registration, using projected patterns to impose a texture on the scene that provides sufficient landmarks for the registration methods.
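Digital refocusing is one of the "rigid", scene-independent operations mentioned above: it only shifts and combines fixed sets of pixels. The Python sketch below assumes the raw plenoptic image has already been decoded into a (U, V, H, W) stack of sub-aperture views. It refocuses by shift-and-add with whole-pixel shifts. The array layout, the sign convention for slope, and the integer shifts are simplifying assumptions; real decoders need calibration and subpixel interpolation. This is not the depth-estimation method studied in the paper.

```python
import numpy as np


def refocus(views, slope):
    """Shift-and-add refocusing of a (U, V, H, W) stack of sub-aperture images.

    View (u, v) is shifted by -slope * (u - u_c, v - v_c) pixels (rounded to whole
    pixels) and all views are averaged, so scene points whose disparity is `slope`
    pixels per view step line up and appear sharp; other depths are blurred.
    """
    views = np.asarray(views, dtype=np.float64)
    n_u, n_v, h, w = views.shape
    u_c, v_c = (n_u - 1) / 2.0, (n_v - 1) / 2.0
    out = np.zeros((h, w))
    for u in range(n_u):
        for v in range(n_v):
            dy = int(round(-slope * (u - u_c)))
            dx = int(round(-slope * (v - v_c)))
            out += np.roll(views[u, v], shift=(dy, dx), axis=(0, 1))
    return out / (n_u * n_v)
```

Depth mapping is harder because it must find, per region, the disparity that best aligns the views. That search fails on textureless regions, which is what the projected patterns are meant to address.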
Research in interactive scene analysis
NASA Technical Reports Server (NTRS)
Tenenbaum, J. M.; Garvey, T. D.; Weyl, S. A.; Wolf, H. C.
1975-01-01
An interactive scene interpretation system (ISIS) was developed as a tool for constructing and experimenting with man-machine and automatic scene analysis methods tailored for particular image domains. A recently developed region analysis subsystem based on the paradigm of Brice and Fennema is described. Using this subsystem a series of experiments was conducted to determine good criteria for initially partitioning a scene into atomic regions and for merging these regions into a final partition of the scene along object boundaries. Semantic (problem-dependent) knowledge is essential for complete, correct partitions of complex real-world scenes. An interactive approach to semantic scene segmentation was developed and demonstrated on both landscape and indoor scenes. This approach provides a reasonable methodology for segmenting scenes that cannot be processed completely automatically, and is a promising basis for a future automatic system. A program is described that can automatically generate strategies for finding specific objects in a scene based on manually designated pictorial examples.
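The region analysis step partitions a scene into atomic regions and then merges them into a final partition. The Python sketch below shows the merging stage in its simplest form: repeatedly merge the pair of 4-adjacent regions with the most similar mean intensity until no pair differs by less than a threshold, tracking merged regions with union-find. Brice and Fennema's paradigm used boundary-weakness heuristics, so the mean-difference criterion here is a swapped-in stand-in, and the threshold and function name are hypothetical.

```python
import numpy as np


def merge_regions(labels, image, threshold):
    """Greedily merge 4-adjacent regions with similar mean intensity.

    labels: (H, W) int array of atomic region ids (0..n-1); image: (H, W) intensities.
    Repeatedly merges the most similar adjacent pair until no pair's mean
    difference is below threshold. Returns a relabelled (H, W) array.
    """
    labels = np.asarray(labels)
    image = np.asarray(image, dtype=np.float64)
    n = int(labels.max()) + 1
    sums = np.bincount(labels.ravel(), weights=image.ravel(), minlength=n)
    counts = np.bincount(labels.ravel(), minlength=n).astype(np.float64)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Adjacent label pairs from horizontal and vertical neighbours.
    pairs = set()
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        mask = a != b
        pairs.update(zip(a[mask].tolist(), b[mask].tolist()))

    while True:
        best = None
        for a, b in pairs:
            ra, rb = find(a), find(b)
            if ra == rb:
                continue
            diff = abs(sums[ra] / counts[ra] - sums[rb] / counts[rb])
            if diff < threshold and (best is None or diff < best[0]):
                best = (diff, ra, rb)
        if best is None:
            break
        _, ra, rb = best
        parent[rb] = ra
        sums[ra] += sums[rb]
        counts[ra] += counts[rb]

    roots = np.array([find(i) for i in range(n)])
    _, relabelled = np.unique(roots, return_inverse=True)
    return relabelled.reshape(-1)[labels]
```

A purely intensity-based criterion like this cannot tell object boundaries from texture edges, which is consistent with the abstract's point that semantic knowledge is needed for complete, correct partitions.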
Scene recognition following locomotion around a scene.
Motes, Michael A; Finlay, Cory A; Kozhevnikov, Maria
2006-01-01
Effects of locomotion on scene-recognition reaction time (RT) and accuracy were studied. In experiment 1, observers memorized an 11-object scene and made scene-recognition judgments on subsequently presented scenes from the encoded view or different views (i.e., scenes were rotated or observers moved around the scene, both from 40 degrees to 360 degrees). In experiment 2, observers viewed different 5-object scenes on each trial and made scene-recognition judgments from the encoded view or after moving around the scene, from 36 degrees to 180 degrees. Across experiments, scene-recognition RT increased (in experiment 2, accuracy decreased) with angular distance between encoded and judged views, regardless of how the viewpoint changes occurred. The findings raise questions about conditions in which locomotion produces spatially updated representations of scenes.
Rational-operator-based depth-from-defocus approach to scene reconstruction.
Li, Ang; Staunton, Richard; Tjahjadi, Tardi
2013-09-01
This paper presents a rational-operator-based approach to depth from defocus (DfD) for the reconstruction of three-dimensional scenes from two-dimensional images, which enables fast DfD computation that is independent of scene textures. Two variants of the approach are considered: one using Gaussian rational operators (ROs) based on the Gaussian point spread function (PSF), and the other based on the generalized Gaussian PSF. A novel DfD correction method is also presented to further improve the performance of the approach. Experimental results on real scenes show that both variants outperform existing RO-based methods.
A HWIL test facility of infrared imaging laser radar using direct signal injection
NASA Astrophysics Data System (ADS)
Wang, Qian; Lu, Wei; Wang, Chunhui; Wang, Qi
2005-01-01
Laser radar has been widely used in recent years, and hardware-in-the-loop (HWIL) testing of laser radar has become important because of its low cost compared with on-the-fly testing and its high fidelity compared with all-digital simulation. Scene generation and projection are the two key technologies of HWIL testing of laser radar, and both are complicated problems because the 3D images result from time delay. The scene generation process begins with the definition of the target geometry, reflectivity and range. The real-time 3D scene generation computer is PC-based hardware, and the 3D target models were built in 3ds Max. The scene generation software was written in C with OpenGL and extracts the Z-buffer from the bit planes to main memory as a range image. These pixels contain each target position x, y, z and its respective intensity and range value. Expensive optical injection technologies for scene projection, such as LDP arrays, VCSEL arrays and DMDs, and the associated scene generation are under development, but optical scene projection is complicated and often unaffordable. In this paper, a cheaper test facility is described that uses direct electronic injection to provide range images for laser radar testing. The electronic delay and pulse-shaping circuits inject the scenes directly into the seeker's signal processing unit.
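The range-image extraction step can be sketched as below, assuming a standard OpenGL perspective projection with near/far clipping planes `near` and `far` and the default depth range of [0, 1]; the paper's actual conversion code is not given, and both function names are hypothetical. Z-buffer values are nonlinear in distance, so they must be linearized before use as range, and a distance can then be turned into the round-trip delay that the injection circuits have to reproduce. The result is distance along the viewing axis; slant range for off-axis pixels would need an extra per-pixel correction.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def zbuffer_to_depth(d, near, far):
    """Convert a window-space depth value d in [0, 1] to eye-space distance
    along the viewing axis (same units as near/far)."""
    z_ndc = 2.0 * d - 1.0  # window depth -> normalized device coordinate
    return 2.0 * near * far / (far + near - z_ndc * (far - near))


def depth_to_delay(distance_m):
    """Round-trip time of flight, in seconds, for a target at distance_m."""
    return 2.0 * distance_m / SPEED_OF_LIGHT


print(zbuffer_to_depth(0.0, 1.0, 100.0))  # 1.0 (near plane)
print(zbuffer_to_depth(1.0, 1.0, 100.0))  # 100.0 (far plane)
print(round(zbuffer_to_depth(0.5, 1.0, 100.0), 3))  # 1.98: precision is concentrated near the camera
```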
Mining Very High Resolution INSAR Data Based On Complex-GMRF Cues And Relevance Feedback
NASA Astrophysics Data System (ADS)
Singh, Jagmal; Popescu, Anca; Soccorsi, Matteo; Datcu, Mihai
2012-01-01
With the increasing number of remote sensing satellites, the number of image-data scenes in our repositories is also increasing, and a large quantity of these scenes are never received and used. Automatic retrieval of desired image data using query by image content, so as to fully utilize the huge repository volume, is therefore of growing interest. Different users are generally interested in scenes containing different kinds of objects and structures, so it is important to analyze all image information mining (IIM) methods to make it easier for users to select a method according to their requirements. We concentrate our study on high-resolution SAR images and propose to use InSAR observations instead of a single look complex (SLC) image alone for mining scenes containing coherent objects such as high-rise buildings. However, for objects with lower coherence, such as areas with vegetation cover, SLC images exhibit better performance. We demonstrate an IIM performance comparison using complex Gauss-Markov random fields as texture descriptors for image patches and SVM relevance feedback.
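Why InSAR pairs help with coherent targets can be illustrated with the standard sample coherence estimator, which measures how consistently the interferometric phase behaves within a window. This is a minimal sketch, assuming two co-registered complex SLC windows given as Python lists; it is not the complex-GMRF descriptor or the SVM relevance-feedback loop of the paper, and `coherence` is a hypothetical helper name.

```python
import math


def coherence(s1, s2):
    """Sample coherence |sum(s1 * conj(s2))| / sqrt(sum|s1|^2 * sum|s2|^2), in [0, 1]."""
    cross = sum(a * b.conjugate() for a, b in zip(s1, s2))
    p1 = sum(abs(a) ** 2 for a in s1)
    p2 = sum(abs(b) ** 2 for b in s2)
    return abs(cross) / math.sqrt(p1 * p2)


window = [1 + 2j, -0.5 + 1j, 3 - 1j, 0.2 + 0.1j]
# A stable scatterer (e.g. a building) keeps its phase relationship: coherence 1.
phase = complex(math.cos(0.3), math.sin(0.3))
print(round(coherence(window, [v * phase for v in window]), 6))  # 1.0
# Decorrelated phase (e.g. vegetation) lowers it; here it cancels completely.
print(coherence([1, 1j, -1, -1j], [1, -1, 1, -1]))  # 0.0
```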
Re-engaging with the past: recapitulation of encoding operations during episodic retrieval
Morcom, Alexa M.
2014-01-01
Recollection of events is accompanied by selective reactivation of cortical regions which responded to specific sensory and cognitive dimensions of the original events. This reactivation is thought to reflect the reinstatement of stored memory representations and therefore to reflect memory content, but it may also reveal processes which support both encoding and retrieval. The present study used event-related functional magnetic resonance imaging to investigate whether regions selectively engaged in encoding face and scene context with studied words are also re-engaged when the context is later retrieved. As predicted, encoding face and scene context with visually presented words elicited activity in distinct, context-selective regions. Retrieval of face and scene context also re-engaged some of the regions which had shown successful encoding effects. However, this recapitulation of encoding activity did not show the same context selectivity observed at encoding. Successful retrieval of both face and scene context re-engaged regions which had been associated with encoding of the other type of context, as well as those associated with encoding the same type of context. This recapitulation may reflect retrieval attempts which are not context-selective, but use shared retrieval cues to re-engage encoding operations in service of recollection. PMID:24904386
A review on brightness preserving contrast enhancement methods for digital image
NASA Astrophysics Data System (ADS)
Rahman, Md Arifur; Liu, Shilong; Li, Ruowei; Wu, Hongkun; Liu, San Chi; Jahan, Mahmuda Rawnak; Kwok, Ngaiming
2018-04-01
Image enhancement is an essential step for many vision-based applications. For image contrast enhancement, popular methods adopt the principle of spreading the captured intensities throughout the allowed dynamic range according to predefined distributions. However, these algorithms take little or no account of maintaining the mean brightness of the original scene, which is of paramount importance for conveying the true scene illumination characteristics to the viewer. Although a significant number of reviews on contrast enhancement methods have been published, an updated review of brightness-preserving image enhancement methods is still lacking. In this paper, a detailed survey is performed on methods that specifically aim to maintain the overall scene illumination characteristics while enhancing the digital image.
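As an example of the class of methods surveyed, the sketch below implements brightness preserving bi-histogram equalization (BBHE), one of the classic brightness-preserving techniques: the histogram is split at the mean gray level and each half is equalized only within its own range, which keeps dark pixels dark and bright pixels bright. It is a minimal illustration on a flat list of 8-bit gray levels, not an algorithm taken from the review, and the function name is hypothetical.

```python
def bbhe(pixels, levels=256):
    """Brightness preserving bi-histogram equalization of a flat list of ints."""
    split = int(sum(pixels) / len(pixels))  # X_m: the (floored) mean gray level

    def equalize(sub, lo, hi):
        # Map each level v present in `sub` to lo + (hi - lo) * CDF_sub(v).
        hist = [0] * levels
        for p in sub:
            hist[p] += 1
        lut, cumulative = {}, 0
        for v in range(levels):
            cumulative += hist[v]
            if hist[v]:
                lut[v] = lo + int((hi - lo) * cumulative / len(sub) + 0.5)
        return lut

    lut = equalize([p for p in pixels if p <= split], 0, split)
    lut.update(equalize([p for p in pixels if p > split], split + 1, levels - 1))
    return [lut[p] for p in pixels]


image = [20, 20, 30, 40, 40, 40, 90, 200]
print(bbhe(image))  # [20, 20, 30, 60, 60, 60, 158, 255]
```

A constant image is returned unchanged, whereas plain global equalization would push it to the top of the range.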
GeoPAT: A toolbox for pattern-based information retrieval from large geospatial databases
NASA Astrophysics Data System (ADS)
Jasiewicz, Jarosław; Netzel, Paweł; Stepinski, Tomasz
2015-07-01
Geospatial Pattern Analysis Toolbox (GeoPAT) is a collection of GRASS GIS modules for carrying out pattern-based geospatial analysis of images and other spatial datasets. The need for pattern-based analysis arises when images/rasters contain rich spatial information either because of their very high resolution or their very large spatial extent. Elementary units of pattern-based analysis are scenes - patches of surface consisting of a complex arrangement of individual pixels (patterns). GeoPAT modules implement popular GIS algorithms, such as query, overlay, and segmentation, to operate on the grid of scenes. To achieve these capabilities GeoPAT includes a library of scene signatures - compact numerical descriptors of patterns, and a library of distance functions - providing numerical means of assessing dissimilarity between scenes. Ancillary GeoPAT modules use these functions to construct a grid of scenes or to assign signatures to individual scenes having regular or irregular geometries. Thus GeoPAT combines knowledge retrieval from patterns with mapping tasks within a single integrated GIS environment. GeoPAT is designed to identify and analyze complex, highly generalized classes in spatial datasets. Examples include distinguishing between different styles of urban settlements using VHR images, delineating different landscape types in land cover maps, and mapping physiographic units from DEM. The concept of pattern-based spatial analysis is explained and the roles of all modules and functions are described. A case study example pertaining to delineation of landscape types in a subregion of NLCD is given. Performance evaluation is included to highlight GeoPAT's applicability to very large datasets. The GeoPAT toolbox is available for download from
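The idea of a scene signature paired with a distance function can be sketched as follows, assuming the simplest possible signature (the normalized histogram of categorical classes in a scene) and the Jensen-Shannon divergence as the dissimilarity. Both are illustrative choices, not a description of GeoPAT's actual modules or interfaces, and the function names are hypothetical. A composition histogram ignores spatial arrangement, which richer pattern signatures are meant to capture.

```python
import math
from collections import Counter


def composition_signature(scene, classes):
    """Normalized class histogram of a 2D scene of categorical labels."""
    counts = Counter(value for row in scene for value in row)
    total = sum(counts.values())
    return [counts[c] / total for c in classes]


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2): 0 for identical, 1 for disjoint histograms."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


classes = ["urban", "forest", "water"]
dense_city = [["urban", "urban"], ["urban", "forest"]]
suburb = [["urban", "forest"], ["forest", "forest"]]
lake = [["water", "water"], ["water", "water"]]

sig_city = composition_signature(dense_city, classes)  # [0.75, 0.25, 0.0]
sig_suburb = composition_signature(suburb, classes)    # [0.25, 0.75, 0.0]
sig_lake = composition_signature(lake, classes)        # [0.0, 0.0, 1.0]
print(jensen_shannon(sig_city, sig_lake))  # 1.0
print(jensen_shannon(sig_city, sig_suburb) < 1.0)  # True
```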
An efficient framework for modeling clouds from Landsat8 images
NASA Astrophysics Data System (ADS)
Yuan, Chunqiang; Guo, Jing
2015-03-01
Clouds play an important role in creating realistic outdoor scenes for video game and flight simulation applications. Classic methods have been proposed for cumulus cloud modeling. However, these methods are not flexible enough for modeling large cloud scenes with hundreds of clouds, because the user must repeatedly model each cloud and adjust its various properties. This paper presents a meteorologically based method to reconstruct cumulus clouds from high-resolution Landsat 8 satellite images. From these input satellite images, the clouds are first segmented from the background. Then, the cloud top surface is estimated from the temperature in the infrared image. Next, under the mild assumption that cumulus clouds have flat bases, the base height of each cloud is computed by averaging the top heights of the pixels on the cloud edge. The extinction is then generated from the visible image. Finally, we enrich the initial cloud shapes using a fractal method and represent the recovered clouds as a particle system. The experimental results demonstrate that our method can yield realistic cloud scenes resembling those in the satellite images.
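The top-height and base-height steps can be sketched as below. This minimal sketch assumes that cloud-top height is obtained from brightness temperature with a constant lapse rate of 6.5 K/km below a surface reference temperature (the paper's exact temperature-to-height conversion is not given), and that edge pixels are cloud pixels with at least one 4-neighbour outside the cloud mask. Function names and the lapse-rate value are assumptions.

```python
LAPSE_RATE_K_PER_KM = 6.5  # assumed standard-atmosphere lapse rate


def top_height_km(t_cloud_k, t_surface_k):
    """Height above the surface at which the air cools to the cloud-top temperature."""
    return (t_surface_k - t_cloud_k) / LAPSE_RATE_K_PER_KM


def base_height_km(mask, temps_k, t_surface_k):
    """Flat-base assumption: average the top heights of the cloud's edge pixels."""
    rows, cols = len(mask), len(mask[0])
    edge_heights = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            on_edge = any(
                not (0 <= rr < rows and 0 <= cc < cols) or not mask[rr][cc]
                for rr, cc in neighbours
            )
            if on_edge:
                edge_heights.append(top_height_km(temps_k[r][c], t_surface_k))
    return sum(edge_heights) / len(edge_heights)


# 3x3 cloud inside a 5x5 scene: cold (tall) centre, warmer rim.
mask = [[0] * 5] + [[0, 1, 1, 1, 0] for _ in range(3)] + [[0] * 5]
temps = [[294.5] * 5 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 4):
        temps[r][c] = 281.5
temps[2][2] = 268.5
print(top_height_km(temps[2][2], 294.5))  # 4.0 km at the centre
print(base_height_km(mask, temps, 294.5))  # 2.0 km from the rim pixels
```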
Study on general design of dual-DMD based infrared two-band scene simulation system
NASA Astrophysics Data System (ADS)
Pan, Yue; Qiao, Yang; Xu, Xi-ping
2017-02-01
The mid-wave infrared (MWIR) and long-wave infrared (LWIR) two-band scene simulation system is a kind of test equipment used for infrared two-band imaging seekers. It must not only cover the working wavebands but also meet the essential requirement that its infrared radiation characteristics correspond to those of the real scene. Earlier infrared scene simulation systems based on a single digital micromirror device (DMD) do not take the large difference between target and background radiation into account and cannot modulate the two-band light beams separately. Consequently, a single-DMD infrared scene simulation system cannot accurately express the thermal scene model built by the host computer and is of limited practical use. To solve this problem, we design a dual-DMD, dual-channel, co-aperture, compact infrared two-band scene simulation system. The operating principle of the system is introduced in detail, and the energy transfer process of the hardware-in-the-loop simulation experiment is analyzed. An equation for the signal-to-noise ratio of the infrared detector in the seeker is also established to guide the overall system design. The general design scheme of the system is given, including the creation of the infrared scene model, overall control, optical-mechanical structure design and image registration. By analyzing and comparing past designs, we discuss the arrangement of the optical engine framework in the system. Based on the working principle and overall design, we summarize the key techniques of the system.
- and Scene-Guided Integration of TLS and Photogrammetric Point Clouds for Landslide Monitoring
NASA Astrophysics Data System (ADS)
Zieher, T.; Toschi, I.; Remondino, F.; Rutzinger, M.; Kofler, Ch.; Mejia-Aguilar, A.; Schlögel, R.
2018-05-01
Terrestrial and airborne 3D imaging sensors are well-suited data acquisition systems for the area-wide monitoring of landslide activity. State-of-the-art surveying techniques, such as terrestrial laser scanning (TLS) and photogrammetry based on unmanned aerial vehicle (UAV) imagery or terrestrial acquisitions, have advantages and limitations associated with their individual measurement principles. In this study we present an integration approach for 3D point clouds derived from these techniques, aiming at improving the topographic representation of landslide features while enabling a more accurate assessment of landslide-induced changes. Four expert-based rules involving local morphometric features computed from eigenvectors, elevation and the agreement of the individual point clouds are used to choose, within voxels of selectable size, which sensor's data to keep. Based on the integrated point clouds, digital surface models and shaded reliefs are computed. Using an image correlation technique, displacement vectors are finally derived from the multi-temporal shaded reliefs. All results show comparable patterns of landslide movement rates and directions. However, depending on the applied integration rule, differences in spatial coverage and correlation strength emerge.
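The voxel-wise selection can be sketched as follows. The four expert rules (morphometric features, elevation, point-cloud agreement) are not spelled out in the abstract, so this minimal sketch substitutes a single hypothetical rule: in each voxel, keep whichever sensor contributed more points, with ties going to TLS. Points are plain (x, y, z) tuples, and `integrate_point_clouds` is a hypothetical name.

```python
import math
from collections import defaultdict


def integrate_point_clouds(tls_points, uav_points, voxel_size):
    """Keep, per voxel, the points of one sensor chosen by a (hypothetical) density rule."""
    voxels = defaultdict(lambda: {"tls": [], "uav": []})
    for sensor, points in (("tls", tls_points), ("uav", uav_points)):
        for p in points:
            key = tuple(math.floor(coord / voxel_size) for coord in p)
            voxels[key][sensor].append(p)

    merged = []
    for cell in voxels.values():
        keep = "tls" if len(cell["tls"]) >= len(cell["uav"]) else "uav"
        merged.extend(cell[keep])
    return merged


tls = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (1.5, 0.1, 0.1)]
uav = [(0.3, 0.3, 0.3), (1.6, 0.2, 0.1), (1.7, 0.3, 0.1)]
print(sorted(integrate_point_clouds(tls, uav, voxel_size=1.0)))
# [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (1.6, 0.2, 0.1), (1.7, 0.3, 0.1)]
```

Changing `voxel_size` trades spatial detail against robustness of the per-voxel decision, mirroring the selectable voxel size mentioned in the abstract.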
HMM for hyperspectral spectrum representation and classification with endmember entropy vectors
NASA Astrophysics Data System (ADS)
Arabi, Samir Y. W.; Fernandes, David; Pizarro, Marco A.
2015-10-01
Owing to their good spectral resolution, hyperspectral images are extensively used for classification, but their large number of bands requires higher data-transmission bandwidth, greater data storage capacity and greater computational capability in processing systems. This work presents a new methodology for hyperspectral data classification that can work with a reduced number of spectral bands and achieve good results, comparable to processing methods that require all hyperspectral bands. The proposed method for hyperspectral spectrum classification is based on a Hidden Markov Model (HMM) associated with each endmember (EM) of a scene and on the conditional probabilities of each EM belonging to each other EM. The EM conditional probabilities are transformed into EM entropy vectors, and those vectors are used as reference vectors for the classes in the scene. The conditional probability of a spectrum to be classified is likewise transformed into a spectrum entropy vector, which is assigned to a class by the minimum Euclidean distance (ED) between it and the EM entropy vectors. The methodology was tested with good results using AVIRIS spectra of a scene with 13 EMs, considering the full 209 bands and reduced sets of 128, 64 and 32 bands. For the test area, the results show that only 32 spectral bands can be used instead of the original 209 without significant loss in the classification process.
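The final assignment step is a minimum-Euclidean-distance classifier. A minimal sketch, assuming the entropy vectors have already been computed (the HMM training and the probability-to-entropy transform are not reproduced here); the class labels and vector values below are made up for illustration. `math.dist` requires Python 3.8 or later.

```python
import math


def classify_min_ed(vector, references):
    """Return the class label whose reference vector is nearest in Euclidean distance."""
    return min(references, key=lambda label: math.dist(vector, references[label]))


# Hypothetical 3-element entropy vectors for two endmember classes.
references = {"vegetation": [0.2, 0.9, 0.4], "soil": [0.8, 0.3, 0.5]}
print(classify_min_ed([0.7, 0.35, 0.45], references))  # soil
```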
Locus Coeruleus Activity Strengthens Prioritized Memories Under Arousal.
Clewett, David V; Huang, Ringo; Velasco, Rico; Lee, Tae-Ho; Mather, Mara
2018-02-07
Recent models posit that bursts of locus ceruleus (LC) activity amplify neural gain such that limited attention and encoding resources focus even more on prioritized mental representations under arousal. Here, we tested this hypothesis in human males and females using fMRI, neuromelanin MRI, and pupil dilation, a biomarker of arousal and LC activity. During scanning, participants performed a monetary incentive encoding task in which threat of punishment motivated them to prioritize encoding of scene images over superimposed objects. Threat of punishment elicited arousal and selectively enhanced memory for goal-relevant scenes. Furthermore, trial-level pupil dilations predicted better scene memory under threat, but were not related to object memory outcomes. fMRI analyses revealed that greater threat-evoked pupil dilations were positively associated with greater scene encoding activity in LC and parahippocampal cortex, a region specialized to process scene information. Across participants, this pattern of LC engagement for goal-relevant encoding was correlated with neuromelanin signal intensity, providing the first evidence that LC structure relates to its activation pattern during cognitive processing. Threat also reduced dynamic functional connectivity between high-priority (parahippocampal place area) and lower-priority (lateral occipital cortex) category-selective visual cortex in ways that predicted increased memory selectivity. Together, these findings support the idea that, under arousal, LC activity selectively strengthens prioritized memory representations by modulating local and functional network-level patterns of information processing. SIGNIFICANCE STATEMENT Adaptive behavior relies on the ability to select and store important information amid distraction. Prioritizing encoding of task-relevant inputs is especially critical in threatening or arousing situations, when forming these memories is essential for avoiding danger in the future. 
However, little is known about the arousal mechanisms that support such memory selectivity. Using fMRI, neuromelanin MRI, and pupil measures, we demonstrate that locus ceruleus (LC) activity amplifies neural gain such that limited encoding resources focus even more on prioritized mental representations under arousal. For the first time, we also show that LC structure relates to its involvement in threat-related encoding processes. These results shed new light on the brain mechanisms by which we process important information when it is most needed. Copyright © 2018 the authors 0270-6474/18/381558-17$15.00/0.
Using the structure of natural scenes and sounds to predict neural response properties in the brain
NASA Astrophysics Data System (ADS)
Deweese, Michael
2014-03-01
The natural scenes and sounds we encounter in the world are highly structured. The fact that animals and humans are so efficient at processing these sensory signals compared with the latest algorithms running on the fastest modern computers suggests that our brains can exploit this structure. We have developed a sparse mathematical representation of speech that minimizes the number of active model neurons needed to represent typical speech sounds. The model learns several well-known acoustic features of speech such as harmonic stacks, formants, onsets and terminations, but we also find more exotic structures in the spectrogram representation of sound, such as localized checkerboard patterns and frequency-modulated excitatory subregions flanked by suppressive sidebands. Moreover, several of these novel features resemble neuronal receptive fields reported in the Inferior Colliculus (IC), as well as auditory thalamus (MGBv) and primary auditory cortex (A1), and our model neurons exhibit the same tradeoff in spectrotemporal resolution as has been observed in IC. To our knowledge, this is the first demonstration that receptive fields of neurons in the ascending mammalian auditory pathway beyond the auditory nerve can be predicted based on coding principles and the statistical properties of recorded sounds. We have also developed a biologically inspired neural network model of primary visual cortex (V1) that can learn a sparse representation of natural scenes using spiking neurons and strictly local plasticity rules. The representation learned by our model is in good agreement with measured receptive fields in V1, demonstrating that sparse sensory coding can be achieved in a realistic biological setting.
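The goal of representing a signal with as few active units as possible is often formalized as L1-regularized least squares: minimize 0.5*||x - D a||^2 + lam*||a||_1 over the code a. The sketch below solves it with the iterative shrinkage-thresholding algorithm (ISTA), a standard solver swapped in here for illustration; it is not the spiking network or learning rule of the models described above, and the dictionary is fixed rather than learned. `step` must be at most 1/L, where L is the largest eigenvalue of D^T D.

```python
def soft_threshold(values, t):
    """Shrink each value toward zero by t; values within [-t, t] become exactly 0."""
    return [max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0) for v in values]


def ista(x, atoms, lam, step, iters=200):
    """Sparse code of signal x over a dictionary given as a list of atoms (columns)."""
    n_atoms, n = len(atoms), len(x)
    a = [0.0] * n_atoms
    for _ in range(iters):
        recon = [sum(a[k] * atoms[k][i] for k in range(n_atoms)) for i in range(n)]
        resid = [xi - ri for xi, ri in zip(x, recon)]
        # Gradient step on the squared error, then shrinkage for the L1 penalty.
        moved = [a[k] + step * sum(atoms[k][i] * resid[i] for i in range(n))
                 for k in range(n_atoms)]
        a = soft_threshold(moved, lam * step)
    return a


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
code = ista([3.0, 0.05, -2.0], identity, lam=0.1, step=1.0)
print([round(c, 3) for c in code])  # [2.9, 0.0, -1.9]: the weak component is switched off
```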
Manhole Cover Detection Using Vehicle-Based Multi-Sensor Data
NASA Astrophysics Data System (ADS)
Ji, S.; Shi, Y.; Shi, Z.
2012-07-01
A new method combining multi-view matching and feature extraction techniques is developed to detect manhole covers on streets using close-range images combined with GPS/IMU and LIDAR data. Covers are an important road-traffic target, like traffic signs, traffic lights and zebra crossings, but with more uniform shapes. However, varying shooting angles and distances, ground materials, complex street scenes (especially their shadows), and cars on the road have a great impact on the cover detection rate. The paper introduces a new edge detection and feature extraction method to overcome these difficulties and greatly improve the detection rate. The LIDAR data are used for scene segmentation, so that street scenery and cars are excluded from the roads. A Canny-based edge detection method that is sensitive to arcs and ellipses is applied to the segmented road scene, and the regions of interest containing arcs are extracted and fitted to ellipses. The ellipses are then resampled for invariance to shooting angle and distance and matched against adjacent images to further check whether they are covers. More than 1000 images of different scenes are used in our tests, and the detection rate is analyzed. The results verify the advantages of our method for correct cover detection in complex street scenes.
Ghost detection and removal based on super-pixel grouping in exposure fusion
NASA Astrophysics Data System (ADS)
Jiang, Shenyu; Xu, Zhihai; Li, Qi; Chen, Yueting; Feng, Huajun
2014-09-01
A novel multi-exposure image fusion method for dynamic scenes is proposed. The commonly used techniques for high dynamic range (HDR) imaging are based on combining multiple differently exposed images of the same scene. The drawback of these methods is that ghosting artifacts are introduced into the final HDR image if the scene is not static. In this paper, a super-pixel grouping based method is proposed to detect ghosts in the image sequences. We introduce the zero-mean normalized cross-correlation (ZNCC) as a measure of similarity between a given exposure image and the reference. ZNCC is computed at the super-pixel level, and super-pixels that have low correlation with the reference are excluded by adjusting the weight maps for fusion. Without any prior information on the camera response function or exposure settings, the proposed method generates low dynamic range (LDR) images that can be shown directly on conventional display devices, with details preserved and ghost effects reduced. Experimental results show that the proposed method generates high-quality images that have fewer ghost artifacts and better visual quality than previous approaches.
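The super-pixel ghost test can be sketched as follows, assuming the super-pixel segmentation is already available as one integer label per pixel and that images are flattened into lists. ZNCC subtracts each region's mean and divides by its spread, so a region that is merely brighter or darker because of a different exposure still correlates perfectly, while moved content does not. The threshold of 0.5 and the function names are hypothetical, and a flat (zero-variance) region is treated here as uncorrelated by assumption.

```python
import math


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom > 0 else 0.0


def ghost_superpixels(image, reference, labels, threshold=0.5):
    """Labels of super-pixels whose ZNCC with the reference falls below threshold."""
    regions = {}
    for value, ref, label in zip(image, reference, labels):
        img_vals, ref_vals = regions.setdefault(label, ([], []))
        img_vals.append(value)
        ref_vals.append(ref)
    return {label for label, (a, b) in regions.items() if zncc(a, b) < threshold}


reference = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
# Region 0 is just twice as bright (another exposure); region 1 has moved content.
exposure = [2, 4, 6, 8, 8, 7, 6, 5]
print(ghost_superpixels(exposure, reference, labels))  # {1}
```

In a fusion pipeline, the weights of the flagged super-pixels would then be set to zero for that exposure.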
Investigation of TM Band-to-band Registration Using the JSC Registration Processor
NASA Technical Reports Server (NTRS)
Yao, S. S.; Amis, M. L.
1984-01-01
The JSC registration processor performs scene-to-scene (or band-to-band) correlation based on edge images. The edge images are derived from a percentage of the edge pixels calculated from the raw scene data, excluding clouds and other extraneous data in the scene. Correlations are performed on patches (blocks) of the edge images, and the correlation peak location in each patch is estimated iteratively to fractional-pixel accuracy. Peak offset locations from all patches over the scene are then considered together, and a variety of tests are made to weed out outliers and other inconsistencies before a distortion model is assumed. Thus, the correlation peak offset locations in each patch indicate quantitatively how well the two TM bands register to each other over that patch of scene data. The average of these offsets indicates the overall accuracy of the band-to-band registration. The registration processor was also used to register one acquisition to another acquisition of multitemporal TM data acquired over the same ground track. Band 4 images from both acquisitions were correlated, and an rms error of a fraction of a pixel was routinely obtained.
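The fractional-pixel peak estimate can be illustrated in one dimension. This minimal sketch, assuming two 1D profiles rather than 2D edge-image patches, finds the integer lag with the largest cross-correlation and refines it by fitting a parabola through the peak and its two neighbours. The JSC processor's iterative estimator is not described in the abstract, so parabolic interpolation is a swapped-in technique, and `correlation_peak` and `blob` are hypothetical names.

```python
import math


def correlation_peak(ref, moving, max_shift):
    """Sub-pixel lag s that best aligns moving[i + s] with ref[i]."""
    n = len(ref)

    def score(s):
        return sum(ref[i] * moving[i + s] for i in range(n) if 0 <= i + s < n)

    scores = {s: score(s) for s in range(-max_shift - 1, max_shift + 2)}
    peak = max(range(-max_shift, max_shift + 1), key=scores.get)
    left, mid, right = scores[peak - 1], scores[peak], scores[peak + 1]
    curvature = left - 2.0 * mid + right
    # Vertex of the parabola through (peak-1, left), (peak, mid), (peak+1, right).
    frac = 0.0 if curvature == 0 else 0.5 * (left - right) / curvature
    return peak + frac


def blob(center, n=64, sigma=2.0):
    return [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(n)]


print(round(correlation_peak(blob(30.0), blob(32.3), max_shift=5), 2))  # 2.29 (true shift 2.3)
```

Averaging such per-patch offsets, as the abstract describes, gives the overall registration accuracy.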
Integrated framework for developing search and discrimination metrics
NASA Astrophysics Data System (ADS)
Copeland, Anthony C.; Trivedi, Mohan M.
1997-06-01
This paper presents an experimental framework for evaluating target signature metrics as models of human visual search and discrimination. This framework is based on a prototype eye tracking testbed, the Integrated Testbed for Eye Movement Studies (ITEMS). ITEMS determines an observer's visual fixation point while he studies a displayed image scene, by processing video of the observer's eye. The utility of this framework is illustrated with an experiment using gray-scale images of outdoor scenes that contain randomly placed targets. Each target is a square region of a specific size containing pixel values from another image of an outdoor scene. The real-world analogy of this experiment is that of a military observer looking upon the sensed image of a static scene to find camouflaged enemy targets that are reported to be in the area. ITEMS provides the data necessary to compute various statistics for each target to describe how easily the observers located it, including the likelihood the target was fixated or identified and the time required to do so. The computed values of several target signature metrics are compared to these statistics, and a second-order metric based on a model of image texture was found to be the most highly correlated.
An optical systems analysis approach to image resampling
NASA Technical Reports Server (NTRS)
Lyon, Richard G.
1997-01-01
All types of image registration require some type of resampling, either during the registration or as a final step in the registration process. Thus the image(s) must be regridded into a spatially uniform, or angularly uniform, coordinate system with some predefined resolution. Frequently the final resolution is not the resolution at which the data were observed. The registration algorithm designer and end-product user are presented with a multitude of possible resampling methods, each of which modifies the spatial frequency content of the data in some way. The purpose of this paper is threefold: (1) to show how an imaging system modifies the scene, using an end-to-end optical systems analysis approach; (2) to develop a generalized resampling model; and (3) to apply the model empirically to simulated radiometric scene data and tabulate the results. A Hanning-windowed sinc interpolator will be developed based upon the optical characterization of the system. It will be discussed in terms of the effects and limitations of sampling, aliasing, spectral leakage and computational complexity. Simulated radiometric scene data will be used to demonstrate each of the algorithms. A high-resolution scene will be "grown" using a fractal growth algorithm based on midpoint recursion techniques. The resulting scene data will be convolved with a point spread function representing the optical response. The resultant scene will then be convolved with the detection system's response and subsampled to the desired resolution. The resulting data product will subsequently be resampled to the correct grid using the Hanning-windowed sinc interpolator, and the results and errors will be tabulated and discussed.
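A Hanning-windowed sinc interpolator of the kind named above can be sketched in one dimension. This is a minimal sketch, assuming unit sample spacing, a window half-width `radius` of 4 samples, and simply dropping taps that fall outside the data; the paper derives its kernel from the optical characterization of the system, which is not reproduced here, and the function names are hypothetical.

```python
import math


def sinc(t):
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)


def hann_sinc_interp(samples, x, radius=4):
    """Estimate the signal at fractional position x from unit-spaced samples."""
    total = 0.0
    base = math.floor(x)
    for k in range(base - radius + 1, base + radius + 1):
        t = x - k
        if 0 <= k < len(samples) and abs(t) < radius:
            window = 0.5 + 0.5 * math.cos(math.pi * t / radius)  # Hann taper
            total += samples[k] * sinc(t) * window
    return total


signal = [math.sin(2 * math.pi * 0.05 * i) for i in range(40)]
estimate = hann_sinc_interp(signal, 20.5)
print(estimate, math.sin(2 * math.pi * 0.05 * 20.5))
```

The window trades a slightly wider transition band for much lower ringing and spectral leakage than a truncated, unwindowed sinc; a larger `radius` sharpens the response at higher computational cost.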
Reference View Selection in DIBR-Based Multiview Coding.
Maugey, Thomas; Petrazzuoli, Giovanni; Frossard, Pascal; Cagnazzo, Marco; Pesquet-Popescu, Beatrice
2016-04-01
Augmented reality, interactive navigation in 3D scenes, multiview video, and other emerging multimedia applications require large sets of images, hence larger data volumes and increased resources compared with traditional video services. The significant increase in the number of images in multiview systems leads to new challenging problems in data representation and data transmission to provide high quality of experience in resource-constrained environments. In order to reduce the size of the data, different multiview video compression strategies have been proposed recently. Most of them use the concept of reference or key views that are used to estimate other images when there is high correlation in the data set. In such coding schemes, the following two questions become fundamental: 1) how many reference views have to be chosen to keep a good reconstruction quality under coding cost constraints? And 2) where should these key views be placed in the multiview data set? As these questions are largely overlooked in the literature, we study the reference view selection problem and propose an algorithm for the optimal selection of reference views in multiview coding systems. Based on a novel metric that measures the similarity between the views, we formulate an optimization problem for the positioning of the reference views, such that both the distortion of the view reconstruction and the coding rate cost are minimized. We solve this new problem with a shortest path algorithm that determines both the optimal number of reference views and their positions in the image set. We experimentally validate our solution in a practical multiview distributed coding system and in the standardized 3D-HEVC multiview coding scheme. We show that considering the 3D scene geometry in the reference view positioning problem brings significant rate-distortion improvements and outperforms the traditional coding strategy that simply selects key frames based on the distance between cameras.
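Because the views are ordered, choosing reference views can be cast as a shortest path over a directed acyclic graph whose nodes are views and whose edge (i, j) means that views i and j are consecutive references. The sketch below solves it by dynamic programming, assuming the first and last views are always references, a fixed rate per reference view, and a hypothetical distortion function for views synthesized between two references. The paper's similarity metric and rate model are not reproduced, and all names are hypothetical.

```python
def select_reference_views(n_views, ref_rate, distortion, lam):
    """Return (cost, reference view indices) minimising rate + lam * distortion.

    distortion(i, j, k): hypothetical cost of synthesizing view k from refs i and j.
    """
    best = [float("inf")] * n_views
    prev = [None] * n_views
    best[0] = ref_rate  # view 0 is always coded as a reference
    for j in range(1, n_views):
        for i in range(j):
            edge = ref_rate + lam * sum(distortion(i, j, k) for k in range(i + 1, j))
            if best[i] + edge < best[j]:
                best[j], prev[j] = best[i] + edge, i
    refs, node = [], n_views - 1
    while node is not None:
        refs.append(node)
        node = prev[node]
    return best[-1], refs[::-1]


def nearest_ref_distortion(i, j, k):
    # Hypothetical: quality degrades with distance to the nearest reference view.
    return min(k - i, j - k) ** 2


print(select_reference_views(5, ref_rate=10, distortion=nearest_ref_distortion, lam=1))    # (26, [0, 4])
print(select_reference_views(5, ref_rate=10, distortion=nearest_ref_distortion, lam=100))  # (50, [0, 1, 2, 3, 4])
```

The Lagrange multiplier `lam` sets the rate-distortion trade-off: a small value favours few reference views, a large one favours coding every view.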
Direct versus indirect processing changes the influence of color in natural scene categorization.
Otsuka, Sachio; Kawaguchi, Jun
2009-10-01
Using a negative priming (NP) paradigm, we examined whether participants would categorize color and grayscale images of natural scenes that were presented peripherally and ignored. We focused on (1) attentional resources allocated to natural scenes and (2) direct versus indirect processing of them. We set up low and high attention-load conditions, based on the set size of the searched stimuli in the prime display (one and five). Participants were required to detect and categorize the target objects in natural scenes in a central visual search task, ignoring peripheral natural images in both the prime and probe displays. The results showed that, irrespective of attention load, NP was observed for color scenes but not for grayscale scenes. We did not observe any effect of color information in central visual search, where participants responded directly to natural scenes. These results indicate that, in a situation in which participants indirectly process natural scenes, color information is critical to object categorization, but when the scenes are processed directly, color information does not contribute to categorization.
Local structure preserving sparse coding for infrared target recognition
Han, Jing; Yue, Jiang; Zhang, Yi; Bai, Lianfa
2017-01-01
Sparse coding performs well in image classification. However, robust target recognition requires many comprehensive template images, and the sparse learning process is complex. We incorporate sparsity into a template matching concept to construct a local sparse structure matching (LSSM) model for general infrared target recognition. A local structure preserving sparse coding (LSPSc) formulation is proposed to simultaneously preserve the local sparse and structural information of objects. By adding a spatial local structure constraint to the classical sparse coding algorithm, LSPSc can improve the stability of sparse representation for targets and suppress background interference in infrared images. Furthermore, a kernel LSPSc (K-LSPSc) formulation is proposed, which extends LSPSc to the kernel space to weaken the influence of the linear structure constraint on nonlinear natural data. Because of their anti-interference and fault-tolerant capabilities, both LSPSc- and K-LSPSc-based LSSM can perform target identification based on a simple template set, which needs only a few images containing enough local sparse structures to learn a sufficient sparse structure dictionary for a target class. Specifically, this LSSM approach has stable performance in target detection under variations in scene, shape and occlusion. High performance is demonstrated on several datasets, indicating robust infrared target recognition in diverse environments and imaging conditions. PMID:28323824
Automated, on-board terrain analysis for precision landings
NASA Technical Reports Server (NTRS)
Rahman, Zia-ur; Jobson, Daniel J.; Woodell, Glenn A.; Hines, Glenn D.
2006-01-01
Advances in space robotics technology hinge to a large extent upon the development and deployment of sophisticated new vision-based methods for automated in-space mission operations and scientific survey. To this end, we have developed a new concept for automated terrain analysis that is based upon a generic image enhancement platform: multi-scale retinex (MSR) and visual servo (VS) processing. This pre-conditioning with the MSR and the VS produces a "canonical" visual representation that is largely independent of lighting variations and exposure errors. Enhanced imagery is then processed with a biologically inspired two-channel edge detection process, followed by a smoothness-based criterion for image segmentation. Landing sites can be automatically determined by examining the results of the smoothness-based segmentation, which shows those areas in the image that surpass a minimum degree of smoothness. Though the MSR has proven to be a very strong enhancement engine, the other elements of the approach (the VS, terrain map generation, and smoothness-based segmentation) are in early stages of development. Experimental results on data from the Mars Global Surveyor show that the imagery can be processed to automatically obtain smooth landing sites. In this paper, we describe the method used to obtain these landing sites and also examine the smoothness criteria in terms of the imager and scene characteristics. Several examples of applying this method to simulated and real imagery are shown.
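The lighting independence attributed to the MSR pre-conditioning comes from subtracting, at several scales, the log of a Gaussian-smoothed surround from the log image. Below is a minimal sketch of that core step only, assuming equal scale weights and the commonly cited scales (15, 80, 250). Color restoration and the VS stage are not modeled, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel1d(sigma):
    """Normalized 1-D Gaussian truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflected borders, using numpy only."""
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")
    # Filter along rows, then columns; mode="valid" trims the padding back off.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def multiscale_retinex(img, sigmas=(15, 80, 250), weights=None):
    """Core MSR: weighted sum over scales of log(I) - log(Gaussian_sigma * I).

    Expects strictly positive intensities (offset the image first if it has zeros).
    """
    img = np.asarray(img, dtype=float)
    if weights is None:
        weights = np.full(len(sigmas), 1.0 / len(sigmas))
    log_img = np.log(img)
    out = np.zeros_like(img)
    for w, s in zip(weights, sigmas):
        out += w * (log_img - np.log(gaussian_blur(img, s)))
    return out
```

Because the blur is linear, a global gain c on the input adds log c to both terms, and that term cancels. This is one concrete sense in which the output is "largely independent" of overall illumination.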
Anomaly detection in hyperspectral imagery: statistics vs. graph-based algorithms
NASA Astrophysics Data System (ADS)
Berkson, Emily E.; Messinger, David W.
2016-05-01
Anomaly detection (AD) algorithms are frequently applied to hyperspectral imagery, but different algorithms produce different outlier results depending on the image scene content and the assumed background model. This work provides the first comparison of anomaly score distributions between common statistics-based anomaly detection algorithms (RX and subspace-RX) and the graph-based Topological Anomaly Detector (TAD). Anomaly scores in statistical AD algorithms should theoretically approximate a chi-squared distribution; however, this is rarely the case with real hyperspectral imagery. The expected distribution of scores found with graph-based methods remains unclear. We also look for general trends in algorithm performance with varied scene content. Three separate scenes were extracted from the hyperspectral MegaScene image taken over downtown Rochester, NY with the VIS-NIR-SWIR ProSpecTIR instrument. In order of most to least cluttered, we study an urban, suburban, and rural scene. The three AD algorithms were applied to each scene, and the distributions of the most anomalous 5% of pixels were compared. We find that subspace-RX performs better than RX, because the data becomes more normal when the highest variance principal components are removed. We also see that compared to statistical detectors, anomalies detected by TAD are easier to separate from the background. Due to their different underlying assumptions, the statistical and graph-based algorithms highlighted different anomalies within the urban scene. These results will lead to a deeper understanding of these algorithms and their applicability across different types of imagery.
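For concreteness, here is a minimal sketch of the two statistical detectors compared above, assuming the global (whole-scene) variants. RX scores each pixel spectrum by its squared Mahalanobis distance from the scene mean. Subspace-RX first discards the highest-variance principal components and then applies RX. TAD is not sketched, and the function names are hypothetical.

```python
import numpy as np

def rx_scores(pixels):
    """Global RX: squared Mahalanobis distance of each spectrum from the scene mean.

    pixels: (n_pixels, n_bands) array of spectra.
    """
    X = np.asarray(pixels, dtype=float)
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(centered, rowvar=False))
    # Row-wise quadratic form c_i^T S^-1 c_i
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

def subspace_rx_scores(pixels, n_remove):
    """Subspace-RX: project out the n_remove largest-variance principal components, then RX."""
    X = np.asarray(pixels, dtype=float)
    centered = X - X.mean(axis=0)
    _, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))  # eigenvalues ascending
    keep = eigvecs[:, : X.shape[1] - n_remove]  # drop the top-variance directions
    return rx_scores(centered @ keep)
```

For Gaussian background data, these scores approximately follow a chi-squared distribution with as many degrees of freedom as retained bands. With sample estimates, the mean score is exactly (n − 1)·p / n for p retained dimensions, which makes a convenient sanity check.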
Relational Memory Is Evident in Eye Movement Behavior despite the Use of Subliminal Testing Methods
Nickel, Allison E.; Henke, Katharina; Hannula, Deborah E.
2015-01-01
While it is generally agreed that perception can occur without awareness, there continues to be debate about the type of representational content that is accessible when awareness is minimized or eliminated. Most investigations that have addressed this issue evaluate access to well-learned representations. Far fewer studies have evaluated whether or not associations encountered just once prior to testing might also be accessed and influence behavior. Here, eye movements were used to examine whether or not memory for studied relationships is evident following the presentation of subliminal cues. Participants assigned to experimental or control groups studied scene-face pairs and test trials evaluated implicit and explicit memory for these pairs. Each test trial began with a subliminal scene cue, followed by three visible studied faces. For experimental group participants, one face was the studied associate of the scene (implicit test); for controls none were a match. Subsequently, the display containing a match was presented to both groups, but now it was preceded by a visible scene cue (explicit test). Eye movements were recorded and recognition memory responses were made. Participants in the experimental group looked disproportionately at matching faces on implicit test trials and participants from both groups looked disproportionately at matching faces on explicit test trials, even when that face had not been successfully identified as the associate. Critically, implicit memory-based viewing effects seemed not to depend on residual awareness of subliminal scene cues, as subjective and objective measures indicated that scenes were successfully masked from view. The reported outcomes indicate that memory for studied relationships can be expressed in eye movement behavior without awareness. PMID:26512726
Range data description based on multiple characteristics
NASA Technical Reports Server (NTRS)
Al-Hujazi, Ezzet; Sood, Arun
1988-01-01
An algorithm for describing range images based on mean curvature (H) and Gaussian curvature (K) is presented. Range images are unique in that they directly approximate the physical surfaces of a real-world 3-D scene. The curvature parameters are derived from the fundamental theorems of differential geometry and provide visible-invariant pixel labels that can be used to characterize the scene. The signs of H and K can be used to classify each pixel into one of eight possible surface types. Because these parameters are sensitive to noise, the resulting HK-sign map does not directly identify surfaces in the range images and must be further processed. A region growing algorithm based on modeling the scene points with a Markov Random Field (MRF) of variable neighborhood size and edge models is suggested. This approach allows the integration of information from multiple characteristics in an efficient way. The performance of the proposed algorithm on a number of synthetic and real range images is discussed.
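The HK-sign labeling step can be sketched as follows. This assumes the range image is a graph surface z = f(x, y) on a unit-spaced grid, finite-difference derivatives from `np.gradient`, and the standard HK-sign table. The zero thresholds `eps_h` and `eps_k` are hypothetical tuning parameters. The eight types arise because H = 0 with K > 0 is impossible (H² ≥ K). The MRF region growing that follows in the paper is not sketched.

```python
import numpy as np

# (sign of H, sign of K) -> surface type; (0, +1) cannot occur since H^2 >= K.
SURFACE_TYPES = {
    (-1, 1): "peak", (1, 1): "pit",
    (-1, 0): "ridge", (1, 0): "valley", (0, 0): "flat",
    (-1, -1): "saddle ridge", (1, -1): "saddle valley", (0, -1): "minimal surface",
}

def mean_gaussian_curvature(z):
    """H and K of the graph surface z = f(x, y); axis 0 is y, axis 1 is x."""
    z = np.asarray(z, dtype=float)
    zy, zx = np.gradient(z)
    zxy, zxx = np.gradient(zx)
    zyy, _ = np.gradient(zy)
    g = 1.0 + zx**2 + zy**2
    K = (zxx * zyy - zxy**2) / g**2
    H = ((1 + zy**2) * zxx - 2 * zx * zy * zxy + (1 + zx**2) * zyy) / (2 * g**1.5)
    return H, K

def hk_labels(H, K, eps_h=1e-3, eps_k=1e-3):
    """Per-pixel surface type from the signs of H and K (values near zero count as 0)."""
    sh = np.where(np.abs(H) < eps_h, 0, np.sign(H)).astype(int)
    sk = np.where(np.abs(K) < eps_k, 0, np.sign(K)).astype(int)
    labels = [SURFACE_TYPES.get((int(a), int(b)), "undefined")
              for a, b in zip(sh.ravel(), sk.ravel())]
    return np.array(labels).reshape(np.shape(H))
```

On noisy range data, the sign map produced this way is fragmented, which is why the abstract notes that it must be further processed before surfaces can be identified.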
Superordinate Level Processing Has Priority Over Basic-Level Processing in Scene Gist Recognition
Sun, Qi; Zheng, Yang; Sun, Mingxia; Zheng, Yuanjie
2016-01-01
By combining a perceptual discrimination task and a visuospatial working memory task, the present study examined the effects of visuospatial working memory load on the hierarchical processing of scene gist. In the perceptual discrimination task, two scene images from the same (manmade–manmade pairing or natural–natural pairing) or different superordinate level categories (manmade–natural pairing) were presented simultaneously, and participants were asked to judge whether these two images belonged to the same basic-level category (e.g., street–street pairing) or not (e.g., street–highway pairing). In the concurrent working memory task, spatial load (position-based load in Experiment 1) and object load (figure-based load in Experiment 2) were manipulated. The results were as follows: (a) spatial load and object load have stronger effects on discrimination of same basic-level scene pairings than of same superordinate-level scene pairings; (b) spatial load has a larger impact on the discrimination of scene pairings at early stages than at later stages, whereas object information has a larger influence at later stages than at early stages. It follows that superordinate-level processing has priority over basic-level processing in scene gist recognition, and that spatial information contributes to the earlier stages and object information to the later stages of scene gist recognition. PMID:28382195
Vision and the representation of the surroundings in spatial memory
Tatler, Benjamin W.; Land, Michael F.
2011-01-01
One of the paradoxes of vision is that the world as it appears to us and the image on the retina at any moment are not much like each other. The visual world seems to be extensive and continuous across time. However, the manner in which we sample the visual environment is neither extensive nor continuous. How does the brain reconcile these differences? Here, we consider existing evidence from both static and dynamic viewing paradigms together with the logical requirements of any representational scheme that would be able to support active behaviour. While static scene viewing paradigms favour extensive, but perhaps abstracted, memory representations, dynamic settings suggest sparser and task-selective representation. We suggest that in dynamic settings where movement within extended environments is required to complete a task, visual input and egocentric and allocentric representations work together to allow efficient behaviour. The egocentric model serves as a coding scheme in which actions can be planned, but also offers a potential means of providing the perceptual stability that we experience. PMID:21242146
Image based performance analysis of thermal imagers
NASA Astrophysics Data System (ADS)
Wegner, D.; Repasi, E.
2016-05-01
Due to advances in technology, modern thermal imagers resemble sophisticated image processing systems in functionality. Advanced signal and image processing tools built into the camera body extend the basic image capturing capability of thermal cameras in order to enhance the display presentation of the captured scene or specific scene details. Usually, the implemented methods are proprietary company expertise, distributed without extensive documentation. This makes comparing thermal imagers, especially from different companies, a difficult task (or at least a very time-consuming and expensive one, e.g., requiring a field trial and/or an observer trial). A thermal camera equipped with turbulence mitigation capability is one example of such a closed system. The Fraunhofer IOSB has started to build a system for testing thermal imagers by image-based methods in the lab environment. This will extend our capability of measuring the classical IR-system parameters (e.g., MTF, MTDP) in the lab. The system is set up around an IR scene projector, which is necessary for the thermal display (projection) of an image sequence for the IR camera under test. The same set of thermal test sequences can be presented to every unit under test; for turbulence mitigation tests, this could be, e.g., the same turbulence sequence. During system tests, gradual variation of input parameters (e.g., thermal contrast) can be applied. First ideas on the selection of test scenes and on how to assemble an imaging suite (a set of image sequences) for the analysis of imaging thermal systems containing such black boxes in the image forming path are discussed.
Preoperative simulation for the planning of microsurgical clipping of intracranial aneurysms.
Marinho, Paulo; Vermandel, Maximilien; Bourgeois, Philippe; Lejeune, Jean-Paul; Mordon, Serge; Thines, Laurent
2014-12-01
The safety and success of intracranial aneurysm (IA) surgery could be improved through the dedicated application of simulation covering the procedure from the 3-dimensional (3D) description of the surgical scene to the visual representation of the clip application. We aimed in this study to validate the technical feasibility and clinical relevance of such a protocol. All patients preoperatively underwent 3D magnetic resonance imaging and 3D computed tomography angiography to build 3D reconstructions of the brain, cerebral arteries, and surrounding cranial bone. These 3D models were segmented and merged using Osirix, a DICOM image processing application. This provided the surgical scene that was subsequently imported into Blender, a modeling platform for 3D animation. Digitized clips and appliers could then be manipulated in the virtual operative environment, allowing the visual simulation of clipping. This simulation protocol was assessed in a series of 10 IAs by 2 neurosurgeons. The protocol was feasible in all patients. The visual similarity between the surgical scene and the operative view was excellent in 100% of the cases, and the identification of the vascular structures was accurate in 90% of the cases. The neurosurgeons found the simulation helpful for planning the surgical approach (ie, the bone flap, cisternal opening, and arterial tree exposure) in 100% of the cases. The correct number of final clip(s) needed was predicted from the simulation in 90% of the cases. The preoperatively expected characteristics of the optimal clip(s) (ie, their number, shape, size, and orientation) were validated during surgery in 80% of the cases. This study confirmed that visual simulation of IA clipping based on the processing of high-resolution 3D imaging can be effective. This is a new and important step toward the development of a more sophisticated integrated simulation platform dedicated to cerebrovascular surgery.
A strongly goal-directed close-range vision system for spacecraft docking
NASA Technical Reports Server (NTRS)
Boyer, Kim L.; Goddard, Ralph E.
1991-01-01
In this presentation, we will propose a strongly goal-oriented stereo vision system to establish proper docking approach motions for automated rendezvous and capture (AR&C). From an input sequence of stereo video image pairs, the system produces a current best estimate of: contact position; contact vector; contact velocity; and contact orientation. The processing demands imposed by this particular problem and its environment dictate a special case solution; such a system should necessarily be, in some sense, minimalist. By this we mean the system should construct a scene description just sufficiently rich to solve the problem at hand and should do no more processing than is absolutely necessary. In addition, the imaging resolution should be just sufficient. Extracting additional information and constructing higher level scene representations wastes energy and computational resources and injects an unnecessary degree of complexity, increasing the likelihood of malfunction. We therefore take a departure from most prior stereopsis work, including our own, and propose a system based on associative memory. The purpose of the memory is to immediately associate a set of motor commands with a set of input visual patterns in the two cameras. That is, rather than explicitly computing point correspondences and object positions in world coordinates and trying to reason forward from this information to a plan of action, we are trying to capture the essence of reflex behavior through the action of associative memory. The explicit construction of point correspondences and 3D scene descriptions, followed by online velocity and point of impact calculations, is prohibitively expensive from a computational point of view for the problem at hand. Learned patterns on the four image planes, left and right at two discrete but closely spaced instants in time, will be used directly to infer the spacecraft reaction. This will be a continuing online process as the docking collar approaches.
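To illustrate the idea of mapping visual patterns directly to motor commands without explicit 3D reconstruction, here is a minimal sketch of a textbook least-squares linear associator. It is not the authors' design. The assumptions are that each input pattern is a flattened vector (for example, the four image planes concatenated) and that commands are small real vectors. Both function names are hypothetical.

```python
import numpy as np

def train_linear_associator(patterns, commands):
    """Least-squares linear associator W with W @ p_i ~= c_i for each stored pair.

    patterns: (n_pairs, pattern_dim); commands: (n_pairs, command_dim).
    With the pseudo-inverse, recall is exact when the patterns are linearly independent.
    """
    P = np.asarray(patterns, dtype=float).T  # columns are stored visual patterns
    C = np.asarray(commands, dtype=float).T  # columns are the associated motor commands
    return C @ np.linalg.pinv(P)

def recall(W, pattern):
    """Map an input visual pattern straight to a motor command (one matrix product)."""
    return W @ np.asarray(pattern, dtype=float)
```

At run time, recall costs one matrix-vector product per frame. This reflects the kind of computational economy the abstract argues for, at the price of only generalizing between stored patterns.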
Li, Yiyang; Jin, Weiqi; Li, Shuo; Zhang, Xu; Zhu, Jin
2017-01-01
Cooled infrared detector arrays always suffer from undesired ripple residual nonuniformity (RNU) in sky scene observations. The ripple RNU seriously affects imaging quality, especially for small target detection, and it is difficult to eliminate using calibration-based techniques and current scene-based nonuniformity correction algorithms. In this paper, we present a modified temporal high-pass nonuniformity correction algorithm using fuzzy scene classification. The fuzzy scene classification is designed to control the correction threshold so that the algorithm can remove ripple RNU without degrading scene details. We test the algorithm on a real infrared sequence by comparing it with several well-established methods. The results show that the algorithm has clear advantages over the tested methods in terms of detail preservation and convergence speed for ripple RNU correction. Furthermore, we demonstrate our architecture on a prototype built on a Xilinx Virtex-5 XC5VLX50T field-programmable gate array (FPGA), which has two advantages: (1) low resource consumption and (2) small hardware delay (less than 10 image rows). It has been successfully applied in an actual system. PMID:28481320
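The temporal high-pass approach can be sketched as follows. A per-pixel recursive low-pass filter tracks each detector's temporal mean (scene plus fixed-pattern offset), and subtracting that estimate removes the offset. This sketch assumes the classic recursive filter with time constant M. It stands in a fixed, hypothetical `threshold` gate for the paper's fuzzy-classification-controlled threshold, which is not modeled here.

```python
import numpy as np

class TemporalHighPassNUC:
    """Temporal high-pass offset correction with an optional per-pixel update gate.

    Assumption: `threshold` is a fixed number here; in the paper the threshold is
    controlled by fuzzy scene classification, which this sketch does not model.
    """

    def __init__(self, time_constant=32.0, threshold=None):
        self.M = float(time_constant)
        self.threshold = threshold
        self.lowpass = None  # running per-pixel temporal mean estimate

    def correct(self, frame):
        x = np.asarray(frame, dtype=float)
        if self.lowpass is None:
            self.lowpass = x.copy()
        else:
            diff = x - self.lowpass
            update = diff / self.M  # f_n = f_{n-1} + (x_n - f_{n-1}) / M
            if self.threshold is not None:
                # Freeze the estimate where the frame changes strongly (likely scene detail).
                update = np.where(np.abs(diff) < self.threshold, update, 0.0)
            self.lowpass += update
        # Subtract the per-pixel offset estimate but keep the global brightness level.
        return x - self.lowpass + self.lowpass.mean()
```

Without a gate, static scene content is gradually absorbed into the estimate and erased. Gating the update near strong changes is one simple way to trade convergence speed against detail preservation.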
The use of vision-based image quality metrics to predict low-light performance of camera phones
NASA Astrophysics Data System (ADS)
Hultgren, B.; Hertel, D.
2010-01-01
Small digital camera modules such as those in mobile phones have become ubiquitous. Their low-light performance is of utmost importance since a high percentage of images are made under low lighting conditions where image quality failure may occur due to blur, noise, and/or underexposure. These modes of image degradation are not mutually exclusive: they share common roots in the physics of the imager, the constraints of image processing, and the general trade-off situations in camera design. A comprehensive analysis of failure modes is needed in order to understand how their interactions affect overall image quality. Low-light performance is reported for DSLR, point-and-shoot, and mobile phone cameras. The measurements target blur, noise, and exposure error. Image sharpness is evaluated from three different physical measurements: static spatial frequency response, handheld motion blur, and statistical information loss due to image processing. Visual metrics for sharpness, graininess, and brightness are calculated from the physical measurements, and displayed as orthogonal image quality metrics to illustrate the relative magnitude of image quality degradation as a function of subject illumination. The impact of each of the three sharpness measurements on overall sharpness quality is displayed for different light levels. The power spectrum of the statistical information target is a good representation of natural scenes, thus providing a defined input signal for the measurement of power-spectrum based signal-to-noise ratio to characterize overall imaging performance.
Efficient space-time sampling with pixel-wise coded exposure for high-speed imaging.
Liu, Dengyu; Gu, Jinwei; Hitomi, Yasunobu; Gupta, Mohit; Mitsunaga, Tomoo; Nayar, Shree K
2014-02-01
Cameras face a fundamental trade-off between spatial and temporal resolution. Digital still cameras can capture images with high spatial resolution, but most high-speed video cameras have relatively low spatial resolution. It is hard to overcome this trade-off without incurring a significant increase in hardware costs. In this paper, we propose techniques for sampling, representing, and reconstructing the space-time volume to overcome this trade-off. Our approach has two important distinctions compared to previous works: 1) We achieve sparse representation of videos by learning an overcomplete dictionary on video patches, and 2) we adhere to practical hardware constraints on sampling schemes imposed by architectures of current image sensors, which means that our sampling function can be implemented on CMOS image sensors with modified control units in the future. We evaluate components of our approach, sampling function and sparse representation, by comparing them to several existing approaches. We also implement a prototype imaging system with pixel-wise coded exposure control using a liquid crystal on silicon device. System characteristics such as field of view and modulation transfer function are evaluated for our imaging system. Both simulations and experiments on a wide range of scenes show that our method can effectively reconstruct a video from a single coded image while maintaining high spatial resolution.
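The sampling side of this approach can be sketched as a forward model. Each pixel of the coded image integrates the space-time volume only over the frames its exposure code leaves open: I(x, y) = Σ_t S(x, y, t) · E(x, y, t). This sketch assumes, as one hardware-friendly choice, that each pixel is open for a single contiguous window of fixed length with a random start. The dictionary-based reconstruction is not shown, and the function names are hypothetical.

```python
import numpy as np

def single_bump_mask(shape, n_frames, bump_len, rng):
    """Sampling function S(t, y, x): each pixel is exposed for one contiguous window."""
    h, w = shape
    starts = rng.integers(0, n_frames - bump_len + 1, size=(h, w))
    t = np.arange(n_frames)[:, None, None]  # broadcast against the (h, w) start times
    return ((t >= starts) & (t < starts + bump_len)).astype(float)

def coded_image(video, mask):
    """Forward model: each pixel sums the light over its open frames (axis 0 is time)."""
    return (mask * np.asarray(video, dtype=float)).sum(axis=0)
```

Recovering the video then means inverting this many-to-one map, which is where a sparse prior such as a learned overcomplete dictionary comes in.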
High dynamic range image acquisition based on multiplex cameras
NASA Astrophysics Data System (ADS)
Zeng, Hairui; Sun, Huayan; Zhang, Tinghua
2018-03-01
High dynamic range (HDR) imaging is an important technology for photoelectric information acquisition: it provides a higher dynamic range and more image detail, and it better reflects the real environment, lighting, and color information. Current methods of HDR image synthesis based on sequences of differently exposed images cannot adapt to dynamic scenes. They fail to overcome the effects of moving targets, resulting in ghosting artifacts. Therefore, a new HDR image acquisition method based on a multiplex camera system was proposed. First, image sequences with different exposures were captured with the camera array, and a derivative optical flow method based on color gradients was used to compute the deviation between images and align them. Then, an HDR image fusion weighting function was established by combining the inverse camera response function and the deviation between images, and it was applied to generate an HDR image. Experiments show that the proposed method can effectively obtain HDR images of dynamic scenes and achieves good results.
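The fusion step can be sketched as a weighted average of per-exposure radiance estimates in the log domain. Each exposure contributes g(Z) − ln Δt, where g is the inverse camera response, weighted to favor well-exposed mid-tones. This sketch assumes the images are already aligned, pixel values are normalized to [0, 1], and the response is linear (g = ln) unless another function is passed in. The paper's inter-image deviation term for ghost suppression is left out, and the names are hypothetical.

```python
import numpy as np

def hat_weight(z):
    """Triangle weight on normalized pixel values: mid-tones count most, 0 and 1 not at all."""
    return 1.0 - np.abs(2.0 * z - 1.0)

def merge_hdr(images, exposure_times, inv_response=np.log, eps=1e-6):
    """Weighted log-domain merge: ln E = sum w(Z) (g(Z) - ln t) / sum w(Z)."""
    num = np.zeros(np.shape(images[0]))
    den = np.zeros(np.shape(images[0]))
    for z, t in zip(images, exposure_times):
        z = np.clip(np.asarray(z, dtype=float), eps, 1.0)
        w = hat_weight(z)
        num += w * (inv_response(z) - np.log(t))
        den += w
    return np.exp(num / np.maximum(den, eps))
```

Saturated pixels (Z = 1) receive zero weight, so the radiance there comes from the shorter exposures. A deviation-aware weight, as the abstract describes, would additionally down-weight pixels that disagree after alignment.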
The hologram as a space of illusion
NASA Astrophysics Data System (ADS)
Oliveira, Rosa M.
2013-03-01
One of the most interesting aspects of art holography is the study of the 3D holographic image. Over the centuries, artists have sought the best way to represent the third dimension as close to reality as possible. Several steps have been taken in this direction, first with perspective, then photography, and later film, but none of these representations of reality fully reached that objective. The realism of a 3D representation on a 2D support (paper, canvas, celluloid) is completely surpassed by holography. Although the holographic plate or film is also a 2D support, the holographic image is a recording of all the information about the object contained in light. Our perception does not need to translate the object as real. It is real. Though immaterial, the holographic image is real because it exists in light, with the same parallax and the same shape. The representation is no longer an imitation of reality but a replacement of the real object or scene. The space where it exists is a space of illusion, and multiple objects can occupy the same place in the hologram, depending on the viewer's time and place. This introduces the fourth dimension into the hologram, time, as well as the apparent conflict between the presence and the absence of images, which is possible only in holography.
Intensity dependent spread theory
NASA Technical Reports Server (NTRS)
Holben, Richard
1990-01-01
The Intensity Dependent Spread (IDS) procedure is an image-processing technique based on a model of the processing which occurs in the human visual system. IDS processing is relevant to many aspects of machine vision and image processing. For quantum-limited images, it produces an ideal trade-off between spatial resolution and noise averaging, performs edge enhancement, thus requiring only mean-crossing detection for the subsequent extraction of scene edges, and yields edge responses whose amplitudes are independent of scene illumination, depending only upon the ratio of the reflectance on the two sides of the edge. These properties suggest that the IDS process may provide significant bandwidth reduction while losing only minimal scene information when used as a preprocessor at or near the image plane.
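A minimal sketch of one common reading of IDS follows. Each input point spreads a fixed volume over a disk whose area is inversely proportional to its intensity. Brighter points therefore spread more narrowly, and a uniform field produces the same output whatever its brightness. The disk shape, `area_scale`, and the border handling are assumptions of this sketch, and `ids_filter` is a hypothetical name. It is a brute-force loop meant for small images.

```python
import numpy as np

def ids_filter(img, area_scale=20.0, volume=1.0):
    """Intensity Dependent Spread sketch for a 2-D image of positive intensities.

    Each input pixel spreads a fixed `volume` uniformly over a disk whose area is
    about area_scale / intensity, so brighter pixels spread more narrowly.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    out = np.zeros_like(img)
    for (py, px), intensity in np.ndenumerate(img):
        radius = np.sqrt(area_scale / (np.pi * intensity))
        disk = (ys - py) ** 2 + (xs - px) ** 2 <= radius ** 2
        # Normalize by the discretized disk size so every pixel's spread sums to `volume`.
        out[disk] += volume / disk.sum()
    return out
```

Away from the image borders, a uniform region of any brightness sums to exactly `volume`. This is the discrete counterpart of the illumination independence claimed above: only intensity ratios across an edge change the response.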
Huang, Wei; Xiao, Liang; Liu, Hongyi; Wei, Zhihui
2015-01-19
Due to instrumental and imaging optics limitations, it is difficult to acquire high spatial resolution hyperspectral imagery (HSI). Super-resolution (SR) aims to infer high-quality images of a given scene from degraded versions of the same scene. This paper proposes a novel hyperspectral imagery super-resolution (HSI-SR) method via dictionary learning and spatial-spectral regularization. The main contributions of this paper are twofold. First, inspired by the compressive sensing (CS) framework, for learning the high-resolution dictionary, we encourage stronger sparsity on image patches and promote smaller coherence between the learned dictionary and the sensing matrix. Thus, a sparsity and incoherence restricted dictionary learning method is proposed to achieve a more efficient sparse representation. Second, a variational regularization model combining a spatial sparsity regularization term and a new local spectral similarity preserving term is proposed to integrate the spectral and spatial-contextual information of the HSI. Experimental results show that the proposed method can effectively recover spatial information and better preserve spectral information. The high spatial resolution HSI reconstructed by the proposed method outperforms the results of other well-known methods in terms of both objective measurements and visual evaluation.
Liu, Dan; Liu, Xuejun; Wu, Yiguang
2018-04-24
This paper presents an effective approach for depth reconstruction from a single image through the incorporation of semantic information and local details from the image. A unified framework for depth acquisition is constructed by joining a deep Convolutional Neural Network (CNN) and a continuous pairwise Conditional Random Field (CRF) model. Semantic information and relative depth trends of local regions inside the image are integrated into the framework. A deep CNN is first used to automatically learn a hierarchical feature representation of the image. To capture more local detail, the relative depth trends of local regions are incorporated into the network. Combined with the semantic information of the image, a continuous pairwise CRF is then established and used as the loss function of the unified model. Experiments on real scenes demonstrate that the proposed approach is effective and obtains satisfactory results.
A mobile unit for memory retrieval in daily life based on image and sensor processing
NASA Astrophysics Data System (ADS)
Takesumi, Ryuji; Ueda, Yasuhiro; Nakanishi, Hidenobu; Nakamura, Atsuyoshi; Kakimori, Nobuaki
2003-10-01
We developed a mobile unit whose purpose is to support memory retrieval in daily life. In this paper, we describe two characteristic features of this unit: (1) behavior classification with an acceleration sensor, and (2) extraction of changes in the environment with image processing technology. In (1), by analyzing the power and frequency of an acceleration sensor aligned with the direction of gravity, the user's activities can be classified into walking, staying, and so on. In (2), by using image processing to extract the difference between the beginning and ending scenes of a stay, what the user did is recognized as a change in the environment. Using these two techniques, specific scenes of daily life can be extracted, and important information at scene changes can be recorded. In particular, we describe how the unit supports retrieving important things, such as an item left behind or a task left half-finished.
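The power-and-frequency analysis in (1) can be sketched as follows. After removing the gravity (DC) component, a low signal power suggests the user is staying. Otherwise, the dominant frequency of the vertical acceleration is checked against a typical step-rate band. The thresholds, the 1 to 3 Hz walking band, and the extra "other" label are hypothetical choices, not values from the paper.

```python
import numpy as np

def classify_activity(accel, fs, power_thresh=0.05, walk_band=(1.0, 3.0)):
    """Label a window of vertical acceleration as 'stay', 'walk', or 'other'.

    accel: samples along the gravity axis; fs: sampling rate in Hz.
    power_thresh and walk_band are hypothetical tuning values.
    """
    a = np.asarray(accel, dtype=float)
    a = a - a.mean()  # remove the gravity (DC) component
    power = np.mean(a ** 2)
    if power < power_thresh:
        return "stay"
    spectrum = np.abs(np.fft.rfft(a))
    freqs = np.fft.rfftfreq(len(a), d=1.0 / fs)
    dominant = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
    return "walk" if walk_band[0] <= dominant <= walk_band[1] else "other"
```

In a system like the one described, the boundaries between "stay" windows would mark where the start and end scene images for the change detection in (2) are taken.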
Conjoint representation of texture ensemble and location in the parahippocampal place area.
Park, Jeongho; Park, Soojin
2017-04-01
Texture provides crucial information about the category or identity of a scene. Nonetheless, not much is known about how the texture information in a scene is represented in the brain. Previous studies have shown that the parahippocampal place area (PPA), a scene-selective part of visual cortex, responds to simple patches of texture ensemble. However, in natural scenes textures exist in spatial context within a scene. Here we tested two hypotheses that make different predictions on how textures within a scene context are represented in the PPA. The Texture-Only hypothesis suggests that the PPA represents texture ensemble (i.e., the kind of texture) as is, irrespective of its location in the scene. On the other hand, the Texture and Location hypothesis suggests that the PPA represents texture and its location within a scene (e.g., ceiling or wall) conjointly. We tested these two hypotheses across two experiments, using different but complementary methods. In experiment 1, by using multivoxel pattern analysis (MVPA) and representational similarity analysis, we found that the representational similarity of the PPA activation patterns was significantly explained by the Texture-Only hypothesis but not by the Texture and Location hypothesis. In experiment 2, using a repetition suppression paradigm, we found no repetition suppression for scenes that had the same texture ensemble but differed in location (supporting the Texture and Location hypothesis). On the basis of these results, we propose a framework that reconciles contrasting results from MVPA and repetition suppression and draw conclusions about how texture is represented in the PPA. NEW & NOTEWORTHY This study investigates how the parahippocampal place area (PPA) represents texture information within a scene context.
We claim that texture is represented in the PPA at multiple levels: the texture ensemble information at the across-voxel level and the conjoint information of texture and its location at the within-voxel level. The study proposes a working hypothesis that reconciles contrasting results from multivoxel pattern analysis and repetition suppression, suggesting that the methods are complementary to each other but not necessarily interchangeable. Copyright © 2017 the American Physiological Society.
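The representational similarity logic of experiment 1 can be sketched as follows. Build a neural dissimilarity matrix (RDM) from condition-wise voxel patterns, then correlate its upper triangle with a hypothesis model RDM. The four-condition design and the simulated patterns are hypothetical illustrations, not the study's data. Pearson correlation is used for simplicity, although rank (Spearman) correlation is also common.

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between condition patterns.

    patterns: (n_conditions, n_voxels).
    """
    return 1.0 - np.corrcoef(patterns)

def category_model(labels):
    """Binary model RDM: 0 where two conditions share a label, 1 otherwise."""
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(float)

def model_fit(neural_rdm, model_rdm):
    """Correlate the upper triangles (off-diagonal) of a neural RDM and a model RDM."""
    iu = np.triu_indices_from(neural_rdm, k=1)
    return np.corrcoef(neural_rdm[iu], model_rdm[iu])[0, 1]
```

For example, with conditions (texture A, ceiling), (A, wall), (B, ceiling), (B, wall), a Texture-Only model is `category_model([0, 0, 1, 1])`. Simulated patterns driven only by texture fit that model closely.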
Fazl, Arash; Grossberg, Stephen; Mingolla, Ennio
2009-02-01
How does the brain learn to recognize an object from multiple viewpoints while scanning a scene with eye movements? How does the brain avoid the problem of erroneously classifying parts of different objects together? How are attention and eye movements intelligently coordinated to facilitate object learning? A neural model provides a unified mechanistic explanation of how spatial and object attention work together to search a scene and learn what is in it. The ARTSCAN model predicts how an object's surface representation generates a form-fitting distribution of spatial attention, or "attentional shroud". All surface representations dynamically compete for spatial attention to form a shroud. The winning shroud persists during active scanning of the object. The shroud maintains sustained activity of an emerging view-invariant category representation while multiple view-specific category representations are learned and are linked through associative learning to the view-invariant object category. The shroud also helps to restrict scanning eye movements to salient features on the attended object. Object attention plays a role in controlling and stabilizing the learning of view-specific object categories. Spatial attention thereby coordinates the deployment of object attention during object category learning. Shroud collapse releases a reset signal that inhibits the active view-invariant category in the What cortical processing stream. Then a new shroud, corresponding to a different object, forms in the Where cortical processing stream, and search using attention shifts and eye movements continues to learn new objects throughout a scene. The model mechanistically clarifies basic properties of attention shifts (engage, move, disengage) and inhibition of return. It simulates human reaction time data about object-based spatial attention shifts, and learns with 98.1% accuracy and a compression of 430 on a letter database whose letters vary in size, position, and orientation. 
The model provides a powerful framework for unifying many data about spatial and object attention, and their interactions during perception, cognition, and action.
NASA Astrophysics Data System (ADS)
Nakagawa, M.; Akano, K.; Kobayashi, T.; Sekiguchi, Y.
2017-09-01
Image-based virtual reality (VR) is a virtual space generated with panoramic images projected onto a primitive model. In image-based VR, realistic VR scenes can be generated with lower rendering cost, and network data can be described as relationships among VR scenes. The camera network data are generated manually or by an automated procedure using camera position and rotation data. When panoramic images are acquired in indoor environments, network data should be generated without Global Navigation Satellite Systems (GNSS) positioning data. Thus, we focused on image-based VR generation using a panoramic camera in indoor environments. We propose a methodology to automate network data generation using panoramic images for an image-based VR space. We verified and evaluated our methodology through five experiments in indoor environments, including a corridor, elevator hall, room, and stairs. We confirmed that our methodology can automatically reconstruct network data using panoramic images for image-based VR in indoor environments without GNSS position data.
Extensions of algebraic image operators: An approach to model-based vision
NASA Technical Reports Server (NTRS)
Lerner, Bao-Ting; Morelli, Michael V.
1990-01-01
The researchers extend their previous work on a highly structured and compact algebraic representation of grey-level images, which can be viewed as fuzzy sets. Addition and multiplication are defined for the set of all grey-level images, which can then be described as polynomials of two variables. Utilizing this new algebraic structure, the researchers devised an innovative, efficient edge detection scheme. An accurate method for deriving gradient component information from this edge detector is presented. Based upon this new edge detection system, the researchers developed a robust method for linear feature extraction by combining the techniques of a Hough transform and a line follower. The major advantage of this feature extractor is its general, object-independent nature. Target attributes, such as line segment lengths, intersections, angles of intersection, and endpoints, are derived by the feature extraction algorithm and employed during model matching. The algebraic operators are global operations which are easily reconfigured to operate on any size or shape region. This provides a natural platform from which to pursue dynamic scene analysis. A method for optimizing the linear feature extractor that capitalizes on the spatially reconfigurable nature of the edge detector/gradient component operator is discussed.
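The Hough-transform stage of such a linear feature extractor can be sketched as a voting step over an edge map. This minimal sketch assumes a binary edge mask (not the algebraic edge detector described above), the normal parameterization rho = x cos(theta) + y sin(theta), and 1-degree, 1-pixel bins; the line follower that would trace segment endpoints is not shown.

```python
import numpy as np


def hough_accumulator(edge_mask, n_theta=180):
    """Vote every edge pixel into (rho, theta) bins, with rho = x cos(theta) + y sin(theta)."""
    ys, xs = np.nonzero(edge_mask)
    diag = int(np.ceil(np.hypot(*edge_mask.shape)))
    thetas = np.deg2rad(np.arange(n_theta) * 180.0 / n_theta)  # [0, 180) degrees
    rhos = np.arange(-diag, diag + 1)
    acc = np.zeros((len(rhos), n_theta), dtype=int)
    for t_idx, theta in enumerate(thetas):
        rho = np.rint(xs * np.cos(theta) + ys * np.sin(theta)).astype(int)
        np.add.at(acc, (rho + diag, t_idx), 1)  # offset so negative rho maps to valid rows
    return acc, rhos, thetas


def strongest_line(edge_mask):
    """Return (rho, theta in degrees, votes) of the accumulator peak."""
    acc, rhos, thetas = hough_accumulator(edge_mask)
    r_idx, t_idx = np.unravel_index(np.argmax(acc), acc.shape)
    return int(rhos[r_idx]), float(np.rad2deg(thetas[t_idx])), int(acc[r_idx, t_idx])


# A horizontal line of edge pixels at row y = 7 in a 100 x 100 image.
mask = np.zeros((100, 100), dtype=bool)
mask[7, :] = True
print(strongest_line(mask))
```

`np.add.at` is used instead of `acc[...] += 1` so that several pixels voting for the same bin are all counted.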
[Glossary of terms used by radiologists in image processing].
Rolland, Y; Collorec, R; Bruno, A; Ramée, A; Morcet, N; Haigron, P
1995-01-01
We define 166 terms used in image processing: adaptivity, aliasing, analog-digital converter, analysis, approximation, arc, artifact, artificial intelligence, attribute, autocorrelation, bandwidth, boundary, brightness, calibration, class, classification, classify, centre, cluster, coding, color, compression, contrast, connectivity, convolution, correlation, data base, decision, decomposition, deconvolution, deduction, descriptor, detection, digitization, dilation, discontinuity, discretization, discrimination, disparity, display, distance, distortion, distribution, dynamic, edge, energy, enhancement, entropy, erosion, estimation, event, extrapolation, feature, file, filter, filter floaters, fitting, Fourier transform, frequency, fusion, fuzzy, Gaussian, gradient, graph, gray level, group, growing, histogram, Hough transform, Hounsfield, image, impulse response, inertia, intensity, interpolation, interpretation, invariance, isotropy, iterative, JPEG, knowledge base, label, Laplacian, learning, least squares, likelihood, matching, Markov field, mask, matching, mathematical morphology, merge (to), MIP, median, minimization, model, moiré, moment, MPEG, neural network, neuron, node, noise, norm, normal, operator, optical system, optimization, orthogonal, parametric, pattern recognition, periodicity, photometry, pixel, polygon, polynomial, prediction, pulsation, pyramidal, quantization, raster, reconstruction, recursive, region, rendering, representation space, resolution, restoration, robustness, ROC, thinning, transform, sampling, saturation, scene analysis, segmentation, separable function, sequential, smoothing, spline, split (to), shape, threshold, tree, signal, speckle, spectrum, spline, stationarity, statistical, stochastic, structuring element, support, syntactic, synthesis, texture, truncation, variance, vision, voxel, windowing.
Object Interpolation in Three Dimensions
ERIC Educational Resources Information Center
Kellman, Philip J.; Garrigan, Patrick; Shipley, Thomas F.
2005-01-01
Perception of objects in ordinary scenes requires interpolation processes connecting visible areas across spatial gaps. Most research has focused on 2-D displays, and models have been based on 2-D, orientation-sensitive units. The authors present a view of interpolation processes as intrinsically 3-D and producing representations of contours and…
Scene text recognition in mobile applications by character descriptor and structure configuration.
Yi, Chucai; Tian, Yingli
2014-07-01
Text characters and strings in natural scenes can provide valuable information for many applications. Extracting text directly from natural scene images or videos is a challenging task because of diverse text patterns and varied background interference. This paper proposes a method of scene text recognition from detected text regions. For text detection, our previously proposed algorithms are applied to obtain text regions from the scene image. First, we design a discriminative character descriptor by combining several state-of-the-art feature detectors and descriptors. Second, we model the character structure of each character class by designing stroke configuration maps. Our algorithm design is compatible with the application of scene text extraction on smart mobile devices. An Android-based demo system is developed to show the effectiveness of our proposed method on scene text information extraction from nearby objects. The demo system also provides some insight into algorithm design and performance improvement of scene text extraction. The evaluation results on benchmark data sets demonstrate that our proposed scheme of text recognition is comparable with the best existing methods.
Image/video understanding systems based on network-symbolic models
NASA Astrophysics Data System (ADS)
Kuvich, Gary
2004-03-01
Vision is a part of a larger information system that converts visual information into knowledge structures. These structures drive the vision process, resolve ambiguity and uncertainty via feedback projections, and provide image understanding, that is, an interpretation of visual information in terms of such knowledge models. Computer simulation models are built on the basis of graphs/networks. The human brain is found to be able to emulate similar graph/network models. Symbols, predicates, and grammars naturally emerge in such networks, and logic is simply a way of restructuring such models. The brain analyzes an image as a graph-type relational structure created via multilevel hierarchical compression of visual information. Primary areas provide active fusion of image features on a spatial grid-like structure, where nodes are cortical columns. Spatial logic and topology are naturally present in such structures. Mid-level vision processes, such as perceptual grouping and separation of figure from ground, are special kinds of network transformations. They convert the primary image structure into a set of more abstract ones, which represent objects and the visual scene, making them easy to analyze with higher-level knowledge structures. Higher-level vision phenomena are the results of such analysis. Composition of network-symbolic models combines learning, classification, and analogy together with higher-level model-based reasoning into a single framework, and it works similarly to frames and agents. Computational intelligence methods transform images into model-based knowledge representations. Based on such principles, an Image/Video Understanding system can convert images into knowledge models and resolve uncertainty and ambiguity. This allows the creation of intelligent computer vision systems for design and manufacturing.
A survey of infrared and visual image fusion methods
NASA Astrophysics Data System (ADS)
Jin, Xin; Jiang, Qian; Yao, Shaowen; Zhou, Dongming; Nie, Rencan; Hai, Jinjin; He, Kangjian
2017-09-01
Infrared (IR) and visual (VI) image fusion is designed to fuse multiple source images into a comprehensive image that boosts imaging quality and reduces redundant information; it is widely used in various imaging equipment to improve the visual ability of humans and robots. The accurate, reliable, and complementary descriptions of the scene in fused images have led to the wide use of these techniques in various fields. In recent years, a large number of fusion methods for IR and VI images have been proposed owing to ever-growing demands and progress in image representation methods; however, no integrated survey of this field has been published in the last several years. Therefore, we survey the algorithmic developments in IR and VI image fusion. In this paper, we first characterize applications based on IR and VI image fusion to give an overview of the research status. Then we present a synthesized survey of the state of the art. Third, the frequently used image fusion quality measures are introduced. Fourth, we perform experiments with typical methods and analyze the results. Finally, we summarize the tendencies and challenges in IR and VI image fusion. This survey concludes that although various IR and VI image fusion methods have been proposed, further improvements and potential research directions remain in different applications of IR and VI image fusion.
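As a point of reference for the surveyed methods, the simplest pixel-level IR/VI fusion rule is a weighted average of co-registered images, and the information entropy (EN) of the fused image is one commonly used no-reference quality measure. This sketch assumes 8-bit grayscale inputs that are already registered and equal in size; the weight w = 0.5 and the toy images are arbitrary illustrative choices, not a method from the survey.

```python
import numpy as np


def fuse_weighted_average(ir, vi, w=0.5):
    """Pixel-wise weighted average of two co-registered 8-bit images."""
    fused = w * ir.astype(float) + (1.0 - w) * vi.astype(float)
    return np.clip(np.rint(fused), 0, 255).astype(np.uint8)


def entropy_bits(image, levels=256):
    """Shannon entropy (bits) of the grey-level histogram."""
    counts = np.bincount(image.ravel(), minlength=levels).astype(float)
    p = counts[counts > 0] / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))


ir = np.zeros((4, 4), dtype=np.uint8)
ir[:, 2:] = 200                               # warm object on the right half
vi = np.full((4, 4), 100, dtype=np.uint8)     # flat visible-band background
fused = fuse_weighted_average(ir, vi)
print(entropy_bits(ir), entropy_bits(vi), entropy_bits(fused))
```

Averaging is easy to reason about but tends to reduce contrast; most surveyed methods instead fuse in a transform or learned feature domain.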
Spatiotemporal Recurrent Convolutional Networks for Traffic Prediction in Transportation Networks
Yu, Haiyang; Wu, Zhihai; Wang, Shuqin; Wang, Yunpeng; Ma, Xiaolei
2017-01-01
Predicting large-scale transportation network traffic has become an important and challenging topic in recent decades. Inspired by the domain knowledge of motion prediction, in which the future motion of an object can be predicted based on previous scenes, we propose a network grid representation method that can retain the fine-scale structure of a transportation network. Network-wide traffic speeds are converted into a series of static images and input into a novel deep architecture, namely, spatiotemporal recurrent convolutional networks (SRCNs), for traffic forecasting. The proposed SRCNs inherit the advantages of deep convolutional neural networks (DCNNs) and long short-term memory (LSTM) neural networks. The spatial dependencies of network-wide traffic can be captured by DCNNs, and the temporal dynamics can be learned by LSTMs. An experiment on a Beijing transportation network with 278 links demonstrates that SRCNs outperform other deep learning-based algorithms in both short-term and long-term traffic prediction. PMID:28672867
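The step of converting network-wide traffic speeds into a series of static images can be sketched as follows. The sketch assumes each link has already been mapped to a list of grid cells (the `link_cells` mapping and the toy 3 x 3 grid are hypothetical), that a cell covered by several links takes their mean speed, and that uncovered cells are 0; consecutive frames are then stacked as inputs with the next frame as the target. The SRCN model itself is not shown.

```python
import numpy as np


def speeds_to_image(link_speeds, link_cells, grid_shape):
    """Rasterize one time step: each cell gets the mean speed of the links covering it."""
    total = np.zeros(grid_shape)
    count = np.zeros(grid_shape)
    for speed, cells in zip(link_speeds, link_cells):
        for row, col in cells:
            total[row, col] += speed
            count[row, col] += 1
    image = np.zeros(grid_shape)
    covered = count > 0
    image[covered] = total[covered] / count[covered]  # uncovered cells stay 0
    return image


def make_sequences(frames, history):
    """Stack `history` consecutive frames as inputs and the following frame as the target."""
    n = len(frames) - history
    inputs = np.stack([frames[t:t + history] for t in range(n)])
    targets = np.stack([frames[t + history] for t in range(n)])
    return inputs, targets


# Hypothetical 3 x 3 grid with two links that share the centre cell.
link_cells = [[(1, 0), (1, 1), (1, 2)], [(0, 1), (1, 1), (2, 1)]]
speed_series = [[40.0, 20.0], [35.0, 25.0], [30.0, 30.0], [25.0, 35.0]]
frames = np.stack([speeds_to_image(s, link_cells, (3, 3)) for s in speed_series])
x, y = make_sequences(frames, history=2)
print(frames[0][1, 1], x.shape, y.shape)
```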
Spatiotemporal Recurrent Convolutional Networks for Traffic Prediction in Transportation Networks.
Yu, Haiyang; Wu, Zhihai; Wang, Shuqin; Wang, Yunpeng; Ma, Xiaolei
2017-06-26
Predicting large-scale transportation network traffic has become an important and challenging topic in recent decades. Inspired by the domain knowledge of motion prediction, in which the future motion of an object can be predicted based on previous scenes, we propose a network grid representation method that can retain the fine-scale structure of a transportation network. Network-wide traffic speeds are converted into a series of static images and input into a novel deep architecture, namely, spatiotemporal recurrent convolutional networks (SRCNs), for traffic forecasting. The proposed SRCNs inherit the advantages of deep convolutional neural networks (DCNNs) and long short-term memory (LSTM) neural networks. The spatial dependencies of network-wide traffic can be captured by DCNNs, and the temporal dynamics can be learned by LSTMs. An experiment on a Beijing transportation network with 278 links demonstrates that SRCNs outperform other deep learning-based algorithms in both short-term and long-term traffic prediction.
The Characteristics and Limits of Rapid Visual Categorization
Fabre-Thorpe, Michèle
2011-01-01
Visual categorization appears both effortless and virtually instantaneous. The study by Thorpe et al. (1996) was the first to estimate the processing time necessary to perform fast visual categorization of animals in briefly flashed (20 ms) natural photographs. They observed a large differential EEG activity between target and distracter correct trials that developed from 150 ms after stimulus onset, a value that was later shown to be even shorter in monkeys! With such strong processing time constraints, it was difficult to escape the conclusion that rapid visual categorization was relying on massively parallel, essentially feed-forward processing of visual information. Since 1996, we have conducted a large number of studies to determine the characteristics and limits of fast visual categorization. The present chapter will review some of the main results obtained. I will argue that rapid object categorizations in natural scenes can be done without focused attention and are most likely based on coarse and unconscious visual representations activated with the first available (magnocellular) visual information. Fast visual processing proved efficient for the categorization of large superordinate object or scene categories, but shows its limits when more detailed basic representations are required. The representations for basic objects (dogs, cars) or scenes (mountain or sea landscapes) need additional processing time to be activated. This finding is at odds with the widely accepted idea that such basic representations are at the entry level of the system. Interestingly, focused attention is still not required to perform these time consuming basic categorizations. Finally we will show that object and context processing can interact very early in an ascending wave of visual information processing. We will discuss how such data could result from our experience with a highly structured and predictable surrounding world that shaped neuronal visual selectivity. PMID:22007180
NASA Astrophysics Data System (ADS)
Buford, James A., Jr.; Cosby, David; Bunfield, Dennis H.; Mayhall, Anthony J.; Trimble, Darian E.
2007-04-01
AMRDEC has successfully tested hardware and software for Real-Time Scene Generation for IR and SAL Sensors on COTS PC-based hardware and video cards. AMRDEC personnel worked with nVidia and Concurrent Computer Corporation to develop a Scene Generation system capable of frame rates of at least 120 Hz while frame-locked to an external source (such as a missile seeker) with no dropped frames. Latency measurements and image validation were performed using COTS and in-house developed hardware and software. Software for the Scene Generation system was developed using OpenSceneGraph.
Edge detection and localization with edge pattern analysis and inflection characterization
NASA Astrophysics Data System (ADS)
Jiang, Bo
2012-05-01
In general, edges are considered to be abrupt changes or discontinuities in the intensity distribution of a two-dimensional image signal. The accuracy of front-end edge detection methods in image processing affects the eventual success of higher-level pattern analysis downstream. To generalize edge detectors designed from a simple ideal step-function model to the real distortions in natural images, this research on one-dimensional edge pattern analysis proposes an edge detection algorithm built on three basic edge patterns: ramp, impulse, and step. After mathematical analysis, general rules for edge representation based upon the classification of edge types into these three categories (ramp, impulse, and step; RIS) are developed to reduce detection and localization errors, especially the "double edge" effect, which is an important drawback of the derivative method. However, when one-dimensional edge patterns are applied in two-dimensional image processing, a new issue naturally arises: the edge detector should correctly mark inflections or junctions of edges. Research on human visual perception of objects and on information theory has pointed out that a pattern lexicon of "inflection micro-patterns" carries more information than a straight line. Research on scene perception has also suggested that contours carrying more information are a more important factor in determining the success of scene categorization. Inflections or junctions are therefore extremely useful features, whose accurate description and reconstruction are significant in solving correspondence problems in computer vision. Accordingly, in addition to edge pattern analysis, inflection or junction characterization is also used to extend the traditional derivative edge detection algorithm. Experiments were conducted to test my propositions about edge detection and localization accuracy improvements. 
The results support the idea that these edge detection method improvements are effective in enhancing the accuracy of edge detection and localization.
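The double-edge problem and the ramp/impulse/step (RIS) distinction can be illustrated on a 1-D intensity profile. The sketch below is a simplified heuristic written for illustration, not the author's algorithm: it groups consecutive same-sign intensity differences into runs, reports a one-sample rise immediately followed by a one-sample fall as a single impulse edge at the peak sample, a one-sample jump as a step, and a longer monotonic run as a ramp located at its midpoint. The threshold is an assumed parameter.

```python
import numpy as np


def derivative_edges(profile, thresh):
    """Naive detector: an edge between samples i and i+1 wherever |difference| >= thresh."""
    d = np.diff(np.asarray(profile, dtype=float))
    return [i + 0.5 for i in range(len(d)) if abs(d[i]) >= thresh]


def sign_runs(d, eps):
    """Maximal runs (start, end, sign) of consecutive differences sharing a nonzero sign."""
    runs, i = [], 0
    while i < len(d):
        if abs(d[i]) <= eps:
            i += 1
            continue
        sign, j = np.sign(d[i]), i
        while j + 1 < len(d) and abs(d[j + 1]) > eps and np.sign(d[j + 1]) == sign:
            j += 1
        runs.append((i, j, sign))
        i = j + 1
    return runs


def ris_edges(profile, thresh, eps=1e-9):
    """Label edges as ramp, impulse, or step and return (kind, position) pairs."""
    x = np.asarray(profile, dtype=float)
    runs = sign_runs(np.diff(x), eps)
    edges, k = [], 0
    while k < len(runs):
        start, end, sign = runs[k]
        if abs(x[end + 1] - x[start]) < thresh:  # total change of the run is too small
            k += 1
            continue
        if start == end and k + 1 < len(runs):
            n_start, n_end, n_sign = runs[k + 1]
            if (n_start == end + 1 and n_start == n_end and n_sign == -sign
                    and abs(x[n_end + 1] - x[n_start]) >= thresh):
                edges.append(("impulse", float(end + 1)))  # one edge at the peak sample
                k += 2
                continue
        if start == end:
            edges.append(("step", start + 0.5))
        else:
            edges.append(("ramp", (start + end + 1) / 2.0))
        k += 1
    return edges


impulse = [0, 0, 0, 10, 0, 0, 0]
print(derivative_edges(impulse, 5))   # two responses: the "double edge"
print(ris_edges(impulse, 5))
print(ris_edges([0, 0, 0, 10, 10, 10], 5))
print(ris_edges([0, 0, 2, 4, 6, 8, 10, 10], 5))
```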
Part-based deep representation for product tagging and search
NASA Astrophysics Data System (ADS)
Chen, Keqing
2017-06-01
Despite previous studies, tagging and indexing product images remain challenging because of the large intra-class variation of products. Traditional methods extract quantized hand-crafted features such as SIFT as the representation of product images, and these are not discriminative enough to handle the intra-class variation. For discriminative image representation, this paper first presents a novel deep convolutional neural network (DCNN) architecture pre-trained on a large-scale general image dataset. Compared to traditional features, our DCNN representation has more discriminative power with fewer dimensions. Moreover, we incorporate a part-based model into the framework to overcome the negative effects of poor alignment and cluttered backgrounds, which further enhances the descriptive ability of the deep representation. Finally, we collect and contribute a well-labeled shoe image database, TBShoes, on which we apply the part-based deep representation for product image tagging and search. The experimental results highlight the advantages of the proposed part-based deep representation.
Yoon, Jong H.; Tamir, Diana; Minzenberg, Michael J.; Ragland, J. Daniel; Ursu, Stefan; Carter, Cameron S.
2009-01-01
Background: Multivariate pattern analysis is an alternative method of analyzing fMRI data that is capable of decoding distributed neural representations. We applied this method to test the hypothesis that distributed representations are impaired in schizophrenia. We also compared the results of this method with traditional GLM-based univariate analysis. Methods: Nineteen schizophrenia patients and 15 control subjects viewed two runs of stimuli (exemplars of faces, scenes, objects, and scrambled images). To verify engagement with the stimuli, subjects completed a 1-back matching task. A multi-voxel pattern classifier was trained to identify category-specific activity patterns on one run of fMRI data. Classification testing was conducted on the remaining run. Correlation of voxel-wise activity across runs was used to evaluate variance over time in activity patterns. Results: Patients performed the task less accurately. This group difference was reflected in the pattern analysis results, with diminished classification accuracy in patients compared to controls (59% and 72%, respectively). In contrast, there was no group difference in GLM-based univariate measures. In both groups, classification accuracy was significantly correlated with behavioral measures. Both groups showed a highly significant correlation between inter-run correlations and classification accuracy. Conclusions: Distributed representations of visual objects are impaired in schizophrenia. This impairment is correlated with diminished task performance, suggesting that decreased integrity of cortical activity patterns is reflected in impaired behavior. Comparisons with univariate results suggest greater sensitivity of pattern analysis in detecting group differences in neural activity and reduced likelihood of non-specific factors driving these results. PMID:18822407
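The split-run procedure (train on one run, test on the other, and correlate patterns across runs) can be sketched with a correlation-based nearest-centroid classifier. The abstract does not name the classifier, so this classifier, the four-category labels, and the synthetic voxel data are assumptions for illustration only.

```python
import numpy as np


def class_centroids(X, y):
    labels = np.unique(y)
    return labels, np.stack([X[y == c].mean(axis=0) for c in labels])


def correlation_classify(train_X, train_y, test_X):
    """Assign each test pattern to the training-class mean it correlates with most."""
    labels, centroids = class_centroids(train_X, train_y)
    n = len(centroids)
    r = np.corrcoef(np.vstack([centroids, test_X]))[n:, :n]  # test x class correlations
    return labels[np.argmax(r, axis=1)]


def inter_run_correlation(run1_X, run1_y, run2_X, run2_y):
    """Mean Pearson r between each category's mean pattern in run 1 and in run 2."""
    _, c1 = class_centroids(run1_X, run1_y)
    _, c2 = class_centroids(run2_X, run2_y)
    return float(np.mean([np.corrcoef(a, b)[0, 1] for a, b in zip(c1, c2)]))


# Synthetic data: 4 categories (faces, scenes, objects, scrambled), 10 trials each per run.
rng = np.random.default_rng(1)
prototypes = rng.standard_normal((4, 200))
y = np.repeat(np.arange(4), 10)
run1 = prototypes[y] + 0.5 * rng.standard_normal((40, 200))
run2 = prototypes[y] + 0.5 * rng.standard_normal((40, 200))

accuracy = np.mean(correlation_classify(run1, y, run2) == y)
print(accuracy, inter_run_correlation(run1, y, run2, y))
```

Raising the noise level in this toy setup lowers both the decoding accuracy and the inter-run correlation together, which is the kind of relationship the study reports between the two measures.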
Wen, Haiguang; Shi, Junxing; Chen, Wei; Liu, Zhongming
2018-02-28
The brain represents visual objects with topographic cortical patterns. To address how distributed visual representations enable object categorization, we established predictive encoding models based on a deep residual network, and trained them to predict cortical responses to natural movies. Using this predictive model, we mapped human cortical representations to 64,000 visual objects from 80 categories with high throughput and accuracy. Such representations covered both the ventral and dorsal pathways, reflected multiple levels of object features, and preserved semantic relationships between categories. Across the entire visual cortex, object representations were organized into three clusters of categories: biological objects, non-biological objects, and background scenes. At a finer scale specific to each cluster, object representations revealed sub-clusters for further categorization. Such hierarchical clustering of category representations was driven mostly by cortical representations of middle- to high-level object features. In summary, this study demonstrates a useful computational strategy to characterize the cortical organization and representations of visual features for rapid categorization.
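A common way to fit voxel-wise encoding models is regularized linear regression from model features to each voxel's response, evaluated by the correlation between predicted and held-out responses. This sketch assumes closed-form ridge regression and uses random numbers as stand-ins for deep-network features and fMRI responses; the study's actual features, regularization, and training procedure are not reproduced here.

```python
import numpy as np


def fit_ridge(features, responses, alpha=1.0):
    """Closed-form ridge weights W = (X^T X + alpha I)^-1 X^T Y, one column per voxel."""
    n_features = features.shape[1]
    gram = features.T @ features + alpha * np.eye(n_features)
    return np.linalg.solve(gram, features.T @ responses)


def voxelwise_correlation(predicted, observed):
    """Pearson r between predicted and observed time courses, per voxel (column)."""
    p = predicted - predicted.mean(axis=0)
    o = observed - observed.mean(axis=0)
    return (p * o).sum(axis=0) / np.sqrt((p ** 2).sum(axis=0) * (o ** 2).sum(axis=0))


rng = np.random.default_rng(2)
true_w = rng.standard_normal((50, 5))                  # 50 features -> 5 voxels
X_train, X_test = rng.standard_normal((400, 50)), rng.standard_normal((100, 50))
Y_train = X_train @ true_w + 0.5 * rng.standard_normal((400, 5))
Y_test = X_test @ true_w + 0.5 * rng.standard_normal((100, 5))

W = fit_ridge(X_train, Y_train, alpha=10.0)
print(voxelwise_correlation(X_test @ W, Y_test).round(2))
```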
Briefly Cuing Memories Leads to Suppression of Their Neural Representations
Norman, Kenneth A.
2014-01-01
Previous studies have linked partial memory activation with impaired subsequent memory retrieval (e.g., Detre et al., 2013) but have not provided an account of this phenomenon at the level of memory representations: How does partial activation change the neural pattern subsequently elicited when the memory is cued? To address this question, we conducted a functional magnetic resonance imaging (fMRI) experiment in which participants studied word-scene paired associates. Later, we weakly reactivated some memories by briefly presenting the cue word during a rapid serial visual presentation (RSVP) task; other memories were more strongly reactivated or not reactivated at all. We tested participants' memory for the paired associates before and after RSVP. Cues that were briefly presented during RSVP triggered reduced levels of scene activity on the post-RSVP memory test, relative to the other conditions. We used pattern similarity analysis to assess how representations changed as a function of the RSVP manipulation. For briefly cued pairs, we found that neural patterns elicited by the same cue on the pre- and post-RSVP tests (preA–postA; preB–postB) were less similar than neural patterns elicited by different cues (preA–postB; preB–postA). These similarity reductions were predicted by neural measures of memory activation during RSVP. Through simulation, we show that our pattern similarity results are consistent with a model in which partial memory activation triggers selective weakening of the strongest parts of the memory. PMID:24899722
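The pre/post comparison described here can be summarized as a single index: the mean correlation for same-cue pairs (preA-postA, preB-postB) minus the mean for different-cue pairs (preA-postB, preB-postA). A negative index corresponds to the pattern reported for briefly cued pairs. The function name and synthetic patterns are illustrative assumptions; the "swapped" call is only a sign check, not a simulation of memory weakening.

```python
import numpy as np


def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])


def same_minus_different(pre_a, pre_b, post_a, post_b):
    """Mean same-cue similarity minus mean different-cue similarity.

    Positive: each cue re-evokes its own pre-test pattern; negative: same-cue
    patterns are less similar than different-cue patterns.
    """
    same = (pearson(pre_a, post_a) + pearson(pre_b, post_b)) / 2.0
    different = (pearson(pre_a, post_b) + pearson(pre_b, post_a)) / 2.0
    return same - different


# Synthetic voxel patterns (300 voxels) for two cues, A and B.
rng = np.random.default_rng(3)
pre_a, pre_b = rng.standard_normal((2, 300))
post_a = pre_a + 0.5 * rng.standard_normal(300)   # stable pattern for cue A
post_b = pre_b + 0.5 * rng.standard_normal(300)   # stable pattern for cue B

stable = same_minus_different(pre_a, pre_b, post_a, post_b)
swapped = same_minus_different(pre_a, pre_b, post_b, post_a)  # sign check only
print(round(stable, 2), round(swapped, 2))
```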
Digital Simulation Of Precise Sensor Degradations Including Non-Linearities And Shift Variance
NASA Astrophysics Data System (ADS)
Kornfeld, Gertrude H.
1987-09-01
Realistic atmospheric and Forward Looking Infrared Radiometer (FLIR) degradations were digitally simulated. Inputs to the routine are environmental observables and the FLIR specifications. Realism in the thermal domain was achieved within acceptable computer time and random access memory (RAM) requirements for two reasons: a shift-variant recursive convolution algorithm that describes thermal properties well was invented, and each picture element (pixel) carries a radiative temperature, a material parameter, and range and altitude information. The computer generation steps start with the image synthesis of an undegraded scene, followed by atmospheric and sensor degradation. The final result is a realistic representation of an image seen on the display of a specific FLIR.
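The abstract does not give the shift-variant recursive convolution itself. As a hedged illustration of the general idea only, the sketch below applies a first-order recursive (IIR) smoothing filter whose coefficient changes from pixel to pixel, run forward and then backward along an image row. The per-pixel coefficient array `a` is a hypothetical stand-in for a range-dependent blur strength.

```python
import numpy as np


def shift_variant_recursive_blur(row, a):
    """Two-pass first-order recursive filter with a per-pixel coefficient a[i] in [0, 1).

    a[i] = 0 leaves pixel i unchanged; values closer to 1 blur more strongly.
    """
    row = np.asarray(row, dtype=float)
    a = np.asarray(a, dtype=float)
    forward = np.empty_like(row)
    forward[0] = row[0]
    for i in range(1, len(row)):               # causal (left-to-right) pass
        forward[i] = a[i] * forward[i - 1] + (1.0 - a[i]) * row[i]
    out = np.empty_like(row)
    out[-1] = forward[-1]
    for i in range(len(row) - 2, -1, -1):      # anti-causal (right-to-left) pass
        out[i] = a[i] * out[i + 1] + (1.0 - a[i]) * forward[i]
    return out


row = np.array([0.0] * 5 + [100.0] * 5)        # a sharp thermal edge
near = shift_variant_recursive_blur(row, np.zeros(10))
far = shift_variant_recursive_blur(row, np.full(10, 0.6))
print(near)
print(far.round(1))
```

Each output pixel costs a constant number of operations regardless of blur strength, which is the usual reason recursive filters are chosen over direct convolution when the kernel varies across the image.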
Basic level scene understanding: categories, attributes and structures
Xiao, Jianxiong; Hays, James; Russell, Bryan C.; Patterson, Genevieve; Ehinger, Krista A.; Torralba, Antonio; Oliva, Aude
2013-01-01
A longstanding goal of computer vision is to build a system that can automatically understand a 3D scene from a single image. This requires extracting semantic concepts and 3D information from 2D images which can depict an enormous variety of environments that comprise our visual world. This paper summarizes our recent efforts toward these goals. First, we describe the richly annotated SUN database which is a collection of annotated images spanning 908 different scene categories with object, attribute, and geometric labels for many scenes. This database allows us to systematically study the space of scenes and to establish a benchmark for scene and object recognition. We augment the categorical SUN database with 102 scene attributes for every image and explore attribute recognition. Finally, we present an integrated system to extract the 3D structure of the scene and objects depicted in an image. PMID:24009590
Visual wetness perception based on image color statistics.
Sawayama, Masataka; Adelson, Edward H; Nishida, Shin'ya
2017-05-01
Color vision provides humans and animals with the abilities to discriminate colors based on the wavelength composition of light and to determine the location and identity of objects of interest in cluttered scenes (e.g., ripe fruit among foliage). However, we argue that color vision can inform us about much more than color alone. Since a trichromatic image carries more information about the optical properties of a scene than a monochromatic image does, color can help us recognize complex material qualities. Here we show that human vision uses color statistics of an image for the perception of an ecologically important surface condition (i.e., wetness). Psychophysical experiments showed that overall enhancement of chromatic saturation, combined with a luminance tone change that increases the darkness and glossiness of the image, tended to make dry scenes look wetter. Theoretical analysis along with image analysis of real objects indicated that our image transformation, which we call the wetness enhancing transformation, is consistent with actual optical changes produced by surface wetting. Furthermore, we found that the wetness enhancing transformation operator was more effective for the images with many colors (large hue entropy) than for those with few colors (small hue entropy). The hue entropy may be used to separate surface wetness from other surface states having similar optical properties. While surface wetness and surface color might seem to be independent, there are higher order color statistics that can influence wetness judgments, in accord with the ecological statistics. The present findings indicate that the visual system uses color image statistics in an elegant way to help estimate the complex physical status of a scene.
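Hue entropy, the statistic the study links to the effectiveness of the wetness enhancing transformation, can be computed from a hue histogram. This sketch assumes RGB values in [0, 1], 16 equal-width hue bins, and the exclusion of low-saturation pixels (whose hue is ill-defined); the paper's exact binning and pixel selection may differ.

```python
import colorsys

import numpy as np


def hue_entropy(rgb, n_bins=16, min_saturation=0.2):
    """Shannon entropy (bits) of the hue histogram of an RGB image with values in [0, 1].

    Pixels with saturation below `min_saturation` are skipped; the bin count and
    this threshold are illustrative choices.
    """
    hues = []
    for r, g, b in np.asarray(rgb, dtype=float).reshape(-1, 3):
        h, s, _ = colorsys.rgb_to_hsv(r, g, b)
        if s >= min_saturation:
            hues.append(h)
    if not hues:
        return 0.0
    counts, _ = np.histogram(hues, bins=n_bins, range=(0.0, 1.0))
    p = counts[counts > 0] / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))


one_colour = np.tile([0.8, 0.2, 0.2], (4, 4, 1))     # uniform reddish patch
four_colours = np.array([[[1, 0, 0], [1, 1, 0]],
                         [[0, 1, 0], [0, 0, 1]]], dtype=float)
print(hue_entropy(one_colour), hue_entropy(four_colours))
```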
Johansson, Roger; Oren, Franziska; Holmqvist, Kenneth
2018-06-01
When recalling something you have previously read, to what degree will such episodic remembering activate a situation model of described events versus a memory representation of the text itself? The present study was designed to address this question by recording eye movements of participants who recalled previously read texts while looking at a blank screen. An accumulating body of research has demonstrated that spontaneous eye movements occur during episodic memory retrieval and that fixation locations from such gaze patterns to a large degree overlap with the visuospatial layout of the recalled information. Here we used this phenomenon to investigate to what degree participants' gaze patterns corresponded with the visuospatial configuration of the text itself versus a visuospatial configuration described in it. The texts to be recalled were scene descriptions, where the spatial configuration of the scene content was manipulated to be either congruent or incongruent with the spatial configuration of the text itself. Results show that participants' gaze patterns were more likely to correspond with a visuospatial representation of the described scene than with a visuospatial representation of the text itself, but also that the contribution of those representations of space is sensitive to the text content. This is the first demonstration that eye movements can be used to discriminate on which representational level texts are remembered and the findings provide novel insight into the underlying dynamics in play. Copyright © 2018 Elsevier B.V. All rights reserved.
Modulation of Temporal Precision in Thalamic Population Responses to Natural Visual Stimuli
Desbordes, Gaëlle; Jin, Jianzhong; Alonso, Jose-Manuel; Stanley, Garrett B.
2010-01-01
Natural visual stimuli have highly structured spatial and temporal properties which influence the way visual information is encoded in the visual pathway. In response to natural scene stimuli, neurons in the lateral geniculate nucleus (LGN) are temporally precise, on a time scale of 10–25 ms, both within single cells and across cells within a population. This time scale, established by non-stimulus-driven elements of neuronal firing, is significantly shorter than that of natural scenes, yet is critical for the neural representation of the spatial and temporal structure of the scene. Here, a generalized linear model (GLM) that combines stimulus-driven elements with spike-history dependence associated with intrinsic cellular dynamics is shown to predict the fine timing precision of LGN responses to natural scene stimuli, the corresponding correlation structure across nearby neurons in the population, and the continuous modulation of spike timing precision and latency across neurons. A single model captured the experimentally observed neural response, across different levels of contrast and different classes of visual stimuli, through interactions between the stimulus correlation structure and the nonlinearity in spike generation and spike history dependence. Given the sensitivity of the thalamocortical synapse to closely timed spikes and the importance of fine timing precision for the faithful representation of natural scenes, the modulation of thalamic population timing over these time scales is likely important for cortical representations of the dynamic natural visual environment. PMID:21151356
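The structure of such a model can be sketched as a conditional intensity that combines a causal stimulus filter k with a spike-history filter h, rate(t) = exp(b + (k * s)(t) + sum_j h[j] n(t - 1 - j)), simulated in discrete time bins. The filter values, 1 ms bin width, and Bernoulli spiking approximation below are assumptions for illustration, not the fitted model from the study; a strongly negative first history weight acts like a refractory period.

```python
import numpy as np


def simulate_glm_spikes(stimulus, k, h, bias, dt=0.001, seed=0):
    """Bernoulli approximation of a spiking GLM with stimulus filter k and history filter h.

    rate(t) = exp(bias + (k * stimulus)(t) + sum_j h[j] * spikes[t - 1 - j])  [spikes/s]
    """
    rng = np.random.default_rng(seed)
    T = len(stimulus)
    drive = np.convolve(stimulus, k)[:T]       # causal filtering: k[0] acts at lag 0
    spikes = np.zeros(T, dtype=int)
    for t in range(T):
        history = sum(h[j] * spikes[t - 1 - j] for j in range(len(h)) if t - 1 - j >= 0)
        rate = np.exp(bias + drive[t] + history)
        spikes[t] = rng.random() < 1.0 - np.exp(-rate * dt)  # P(at least one spike in bin)
    return spikes


stimulus = np.ones(2000)                       # 2 s of constant input in 1 ms bins
with_history = simulate_glm_spikes(stimulus, k=[0.5], h=[-50.0, -2.0], bias=np.log(100.0))
no_history = simulate_glm_spikes(stimulus, k=[0.5], h=[0.0], bias=np.log(100.0))
print(with_history.sum(), no_history.sum())
```

With the refractory history term, spikes never occur in adjacent bins, which regularizes spike timing; without it, spiking is memoryless and adjacent-bin spikes are common.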
Menzel, Claudia; Hayn-Leichsenring, Gregor U; Langner, Oliver; Wiese, Holger; Redies, Christoph
2015-01-01
We investigated whether low-level processed image properties that are shared by natural scenes and artworks, but not by veridical face photographs, affect the perception of facial attractiveness and age. Specifically, we considered the slope of the radially averaged Fourier power spectrum in a log-log plot. This slope is a measure of the distribution of spatial frequency power in an image. Compared with face images, images of natural scenes and artworks have a relatively shallow slope (i.e., increased high spatial frequency power). Since aesthetic perception might be based on the efficient processing of images with natural scene statistics, we assumed that the perception of facial attractiveness might also be affected by these properties. We calculated Fourier slope and other beauty-associated measurements in face images and correlated them with ratings of attractiveness and age of the depicted persons (Study 1). We found that Fourier slope, in contrast to the other tested image properties, did not predict attractiveness ratings when we controlled for age. In Study 2A, we overlaid face images with random-phase patterns with different statistics. Patterns with a slope similar to those in natural scenes and artworks resulted in lower attractiveness and higher age ratings. In Studies 2B and 2C, we directly manipulated the Fourier slope of face images and found that images with shallower slopes were rated as more attractive. Additionally, attractiveness of unaltered faces was affected by the Fourier slope of a random-phase background (Study 3). Faces in front of backgrounds with statistics similar to natural scenes and faces were rated as more attractive. We conclude that facial attractiveness ratings are affected by specific image properties. An explanation might be the efficient coding hypothesis. PMID:25835539
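The Fourier slope used in these studies can be sketched as follows. The preprocessing choices here are assumptions, not the study's exact procedure: mean subtraction, integer-radius rings for the radial average, no windowing, and a least-squares line fitted in log-log space over radii 1 to N/2. Natural images, whose amplitude spectrum falls roughly as 1/f, give a power slope near -2; a shallower slope (closer to 0) means relatively more high spatial frequency power.

```python
import numpy as np

def fourier_slope(img):
    """Slope of the radially averaged Fourier power spectrum in log-log coordinates."""
    img = np.asarray(img, dtype=float)
    img = img - img.mean()                             # remove DC so it does not dominate
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h // 2, x - w // 2).astype(int)   # ring index = distance from centre
    # Radial average: mean power within each integer-radius ring.
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    radial = sums / np.maximum(counts, 1)
    f = np.arange(len(radial))
    keep = (f >= 1) & (f <= min(h, w) // 2)            # skip DC and the incomplete corner rings
    slope, _intercept = np.polyfit(np.log(f[keep]), np.log(radial[keep]), 1)
    return slope
```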
Generalized parallel-perspective stereo mosaics from airborne video.
Zhu, Zhigang; Hanson, Allen R; Riseman, Edward M
2004-02-01
In this paper, we present a new method for automatically and efficiently generating stereoscopic mosaics by seamless registration of images collected by a video camera mounted on an airborne platform. Using a parallel-perspective representation, a pair of geometrically registered stereo mosaics can be precisely constructed under quite general motion. A novel parallel ray interpolation for stereo mosaicing (PRISM) approach is proposed to make stereo mosaics seamless in the presence of obvious motion parallax and for rather arbitrary scenes. Parallel-perspective stereo mosaics generated with the PRISM method have better depth resolution than perspective stereo due to the adaptive baseline geometry. Moreover, unlike previous results showing that parallel-perspective stereo has a constant depth error, we conclude that the depth estimation error of stereo mosaics is in fact a linear function of the absolute depths of a scene. Experimental results on long video sequences are given.
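For context, the standard error model of conventional two-view perspective stereo (a textbook result, not taken from the paper): with focal length f (in pixels), a fixed baseline B and disparity d, depth and its first-order error are

```latex
Z = \frac{fB}{d}, \qquad
|\delta Z| \approx \left|\frac{\partial Z}{\partial d}\right| |\delta d|
= \frac{fB}{d^{2}}\,|\delta d|
= \frac{Z^{2}}{fB}\,|\delta d|
```

For a fixed baseline the error therefore grows quadratically with depth, which is why an error that grows only linearly with depth, as the abstract reports for stereo mosaics, is attractive for scenes with a large depth range.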
Collerton, Daniel; Perry, Elaine; McKeith, Ian
2005-12-01
As many as two million people in the United Kingdom repeatedly see people, animals, and objects that have no objective reality. Hallucinations on the border of sleep, dementing illnesses, delirium, eye disease, and schizophrenia account for 90% of these. The remainder have rarer disorders. We review existing models of recurrent complex visual hallucinations (RCVH) in the awake person, including cortical irritation, cortical hyperexcitability and cortical release, top-down activation, misperception, dream intrusion, and interactive models. We provide evidence that these can account neither for the phenomenology of RCVH nor for variations in the frequency of RCVH in different disorders. We propose a novel Perception and Attention Deficit (PAD) model for RCVH. A combination of impaired attentional binding and poor sensory activation of a correct proto-object, in conjunction with a relatively intact scene representation, biases perception to allow the intrusion of a hallucinatory proto-object into a scene perception. Incorporation of this image into a context-specific hallucinatory scene representation accounts for repetitive hallucinations. We suggest that these impairments are underpinned by disturbances in a lateral frontal cortex-ventral visual stream system. We show how the frequency of RCVH in different diseases is related to the coexistence of attentional and visual perceptual impairments; how attentional and perceptual processes can account for their phenomenology; and that diseases and other states with high rates of RCVH have cholinergic dysfunction in both frontal cortex and the ventral visual stream. Several tests of the model are indicated, together with a number of treatment options that it generates.
Ahmad, Fahad N; Moscovitch, Morris; Hockley, William E
2017-04-01
Konkle, Brady, Alvarez and Oliva (Psychological Science, 21, 1551-1556, 2010) showed that participants have an exceptional long-term memory (LTM) for photographs of scenes. We examined to what extent participants' exceptional LTM for scenes is determined by presentation time during encoding. In addition, at retrieval, we varied the nature of the lures in a forced-choice recognition task so that they resembled the target in gist (i.e., global or categorical) information but were distinct in verbatim information (e.g., an "old" beach scene and a similar "new" beach scene; exemplar condition), or vice versa (e.g., a beach scene and a new scene from a novel category; novel condition). In Experiment 1, half of the list of scenes was presented for 1 s, whereas the other half was presented for 4 s. We found lower performance for the shorter study presentation time in the exemplar test condition and similar performance for both study presentation times in the novel test condition. In Experiment 2, participants showed similar performance in an exemplar test in which the lure was from a different category, but one that had been used at study. In Experiment 3, when presentation time was lowered to 500 ms, recognition accuracy was reduced in both the novel and exemplar test conditions. After a short study presentation time, compared with a long one, the memorial representation of the studied scene retrieved from LTM is less detailed and contains more gist (i.e., meaning) than verbatim (i.e., surface or perceptual details) information. We conclude that our findings support fuzzy-trace theory.
Wing, Erik A.; Ritchey, Maureen; Cabeza, Roberto
2015-01-01
Neurobiological memory models assume memory traces are stored in neocortex, with pointers in the hippocampus, and are then reactivated during retrieval, yielding the experience of remembering. Whereas most prior neuroimaging studies on reactivation have focused on the reactivation of sets or categories of items, the current study sought to identify cortical patterns pertaining to memory for individual scenes. During encoding, participants viewed pictures of scenes paired with matching labels (e.g., “barn,” “tunnel”), and, during retrieval, they recalled the scenes in response to the labels and rated the quality of their visual memories. Using representational similarity analyses, we interrogated the similarity between activation patterns during encoding and retrieval both at the item level (individual scenes) and the set level (all scenes). The study yielded four main findings. First, in occipitotemporal cortex, memory success increased with encoding-retrieval similarity (ERS) at the item level but not at the set level, indicating the reactivation of individual scenes. Second, in ventrolateral pFC, memory increased with ERS for both item and set levels, indicating the recapitulation of memory processes that benefit encoding and retrieval of all scenes. Third, in retrosplenial/posterior cingulate cortex, ERS was sensitive to individual scene information irrespective of memory success, suggesting automatic activation of scene contexts. Finally, consistent with neurobiological models, hippocampal activity during encoding predicted the subsequent reactivation of individual items. These findings show the promise of studying memory with greater specificity by isolating individual mnemonic representations and determining their relationship to factors like the detail with which past events are remembered. PMID:25313659
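Encoding-retrieval similarity (ERS) can be sketched as a correlation between activation patterns. The sketch assumes pattern matrices of shape (items, voxels) and Pearson correlation as the similarity measure. Item-level ERS is taken as each item's encoding pattern correlated with its own retrieval pattern, and set-level ERS as its mean correlation with the retrieval patterns of the other items; this is one common way to operationalize the distinction, not necessarily the study's exact analysis.

```python
import numpy as np

def ers(encoding, retrieval):
    """Item-level and set-level encoding-retrieval similarity for each item.

    encoding, retrieval: arrays of shape (n_items, n_voxels).
    """
    def zscore_rows(a):
        a = np.asarray(a, dtype=float)
        return (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)

    e, r = zscore_rows(encoding), zscore_rows(retrieval)
    # sim[i, j] = Pearson r between encoding of item i and retrieval of item j
    sim = e @ r.T / e.shape[1]
    item_level = np.diag(sim).copy()
    n = sim.shape[0]
    # Mean similarity of each item's encoding pattern to the *other* items' retrieval patterns.
    set_level = (sim.sum(axis=1) - item_level) / (n - 1)
    return item_level, set_level
```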
Bio-inspired display of polarization information using selected visual cues
NASA Astrophysics Data System (ADS)
Yemelyanov, Konstantin M.; Lin, Shih-Schon; Luis, William Q.; Pugh, Edward N., Jr.; Engheta, Nader
2003-12-01
For imaging systems, the polarization of electromagnetic waves carries much potentially useful information about features of the world such as surface shape, material content and the local curvature of objects, as well as about the relative locations of the source, object and imaging system. The human eye, however, is "polarization-blind" and cannot use the polarization of light without the aid of an artificial, polarization-sensitive instrument. Therefore, polarization information captured by a man-made polarimetric imaging system must be displayed to a human observer in the form of visual cues that are naturally processed by the human visual system, while essentially preserving the other important non-polarization information (such as spectral and intensity information) in the image. In other words, some form of sensory substitution is needed to represent polarization "signals" without affecting other visual information such as color and brightness. We are investigating several bio-inspired representational methodologies for mapping polarization information into visual cues readily perceived by the human visual system, and determining which mappings are most suitable for specific applications such as object detection, navigation, sensing, scene classification and surface deformation. The visual cues and strategies we are exploring include coherently moving dots superimposed on the image to represent various ranges of polarization signals, overlaid textures with spatial and/or temporal signatures to segregate image regions with differing polarization, modulation of the luminance and/or color contrast of scenes according to certain aspects of the polarization values, and fusion of polarization images into intensity-only images. In this talk, we will present samples of our findings in this area.
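One simple instance of the "modulate color contrast according to polarization values" strategy is the common HSV mapping sketched below: angle of polarization drives hue, degree of linear polarization drives saturation, and total intensity is kept as value. This mapping, and the assumption of four analyzer images at 0°, 45°, 90° and 135°, are illustrative choices rather than the authors' methods. Note that it overrides the scene's own color, which is exactly the trade-off the abstract aims to avoid.

```python
import colorsys
import numpy as np

def stokes_from_four(i0, i45, i90, i135):
    """Linear Stokes parameters from intensities behind analyzers at 0/45/90/135 degrees."""
    i0, i45, i90, i135 = (np.asarray(a, dtype=float) for a in (i0, i45, i90, i135))
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity (I0 + I90 = I45 + I135 ideally)
    s1 = i0 - i90
    s2 = i45 - i135
    return s0, s1, s2

def polarization_to_rgb(i0, i45, i90, i135):
    """Map angle of polarization -> hue, degree of linear polarization -> saturation."""
    s0, s1, s2 = stokes_from_four(i0, i45, i90, i135)
    dolp = np.clip(np.hypot(s1, s2) / np.maximum(s0, 1e-12), 0.0, 1.0)
    aop = 0.5 * np.arctan2(s2, s1)                 # angle of polarization in (-pi/2, pi/2]
    hue = np.mod(aop, np.pi) / np.pi               # 180-degree periodic angle -> hue in [0, 1)
    val = s0 / max(float(s0.max()), 1e-12)         # brightness follows total intensity
    rgb = [colorsys.hsv_to_rgb(h, s, v)
           for h, s, v in zip(hue.ravel(), dolp.ravel(), val.ravel())]
    return np.array(rgb).reshape(s0.shape + (3,))
```

An unpolarized pixel has zero saturation and stays gray, so only polarized regions acquire color.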
An HDR imaging method with DTDI technology for push-broom cameras
NASA Astrophysics Data System (ADS)
Sun, Wu; Han, Chengshan; Xue, Xucheng; Lv, Hengyi; Shi, Junxia; Hu, Changhong; Li, Xiangzhi; Fu, Yao; Jiang, Xiaonan; Huang, Liang; Han, Hongyin
2018-03-01
Conventionally, high dynamic range (HDR) imaging is based on taking two or more pictures of the same scene with different exposures. However, because of the high-speed relative motion between the camera and the scene, this technique is hard to apply to push-broom remote sensing cameras. To enable HDR imaging in push-broom remote sensing applications, the present paper proposes an innovative method that can generate HDR images without redundant image sensors or optical components. Specifically, rather than using more than one row of image sensors, this paper adopts an area-array CMOS (complementary metal-oxide-semiconductor) sensor with digital-domain time-delay-integration (DTDI) technology to take more than one picture with different exposures. A new HDR image can then be obtained by fusing two original images with a simple algorithm. In the experiment, the dynamic range (DR) of the image increased by 26.02 dB. The results show that the proposed method is effective, and it has potential in other imaging applications involving relative motion between cameras and scenes.
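Two quick illustrations, as a sketch. Under the usual convention DR(dB) = 20·log10(ratio), a 26.02 dB gain corresponds to extending the dynamic range by a factor of about 20. The fusion function is a hypothetical minimal two-exposure merge (keep the long exposure where it is not saturated, otherwise use the short exposure scaled by the exposure ratio); the paper's "simple algorithm" is not specified in the abstract and may differ.

```python
import numpy as np

def dr_gain_db(ratio):
    """Dynamic-range gain in dB for an amplitude ratio, using 20*log10."""
    return 20.0 * np.log10(ratio)

def fuse_two_exposures(long_img, short_img, ratio, sat_level):
    """Hypothetical two-exposure HDR merge (not the paper's algorithm).

    ratio: long exposure time / short exposure time.
    sat_level: digital value at which the long exposure saturates.
    """
    long_img = np.asarray(long_img, dtype=float)
    short_img = np.asarray(short_img, dtype=float)
    # Where the long exposure is saturated, rescale the short exposure
    # onto the long exposure's radiometric scale.
    return np.where(long_img < sat_level, long_img, short_img * ratio)
```

For example, `dr_gain_db(20)` is about 26.02, matching the reported improvement if the effective exposure ratio were 20; the abstract does not state the ratio used.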
Thermal-to-visible transducer (TVT) for thermal-IR imaging
NASA Astrophysics Data System (ADS)
Flusberg, Allen; Swartz, Stephen; Huff, Michael; Gross, Steven
2008-04-01
We have been developing a novel thermal-to-visible transducer (TVT), an uncooled thermal-IR imager that is based on a Fabry-Perot Interferometer (FPI). The FPI-based IR imager can convert a thermal-IR image to a video electronic image. IR radiation that is emitted by an object in the scene is imaged onto an IR-absorbing material that is located within an FPI. Temperature variations generated by the spatial variations in the IR image intensity cause variations in optical thickness, modulating the reflectivity seen by a probe laser beam. The reflected probe is imaged onto a visible array, producing a visible image of the IR scene. This technology can provide low-cost IR cameras with excellent sensitivity, low power consumption, and the potential for self-registered fusion of thermal-IR and visible images. We will describe characteristics of requisite pixelated arrays that we have fabricated.
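The readout principle can be illustrated with the textbook Airy reflectance of an ideal, lossless Fabry-Perot etalon with two identical mirrors. This model, the mirror reflectance and the normal-incidence default are assumptions, not parameters of the TVT. A temperature change alters the optical thickness n·L, which shifts the round-trip phase and therefore the reflectance seen by the probe beam.

```python
import numpy as np

def fp_reflectance(n, thickness, wavelength, mirror_r, theta=0.0):
    """Reflectance of a lossless Fabry-Perot etalon with identical mirrors (Airy formula).

    n: refractive index of the cavity, thickness: physical thickness L (same units
    as wavelength), mirror_r: intensity reflectance of each mirror, theta: internal angle.
    """
    delta = 4.0 * np.pi * n * thickness * np.cos(theta) / wavelength  # round-trip phase
    F = 4.0 * mirror_r / (1.0 - mirror_r) ** 2                        # coefficient of finesse
    s = np.sin(delta / 2.0) ** 2
    return F * s / (1.0 + F * s)
```

On resonance (round-trip phase a multiple of 2π) the reflectance drops to zero, and halfway between resonances it reaches F/(1+F); a thermally induced change in n·L moves the operating point along this curve.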
Schmid, Anita M.; Victor, Jonathan D.
2014-01-01
When analyzing a visual image, the brain has to achieve several goals quickly. One crucial goal is to rapidly detect parts of the visual scene that might be behaviorally relevant, while another one is to segment the image into objects, to enable an internal representation of the world. Both of these processes can be driven by local variations in any of several image attributes such as luminance, color, and texture. Here, focusing on texture defined by local orientation, we propose that the two processes are mediated by separate mechanisms that function in parallel. More specifically, differences in orientation can cause an object to “pop out” and attract visual attention, if its orientation differs from that of the surrounding objects. Differences in orientation can also signal a boundary between objects and therefore provide useful information for image segmentation. We propose that contextual response modulations in primary visual cortex (V1) are responsible for orientation pop-out, while a different kind of receptive field nonlinearity in secondary visual cortex (V2) is responsible for orientation-based texture segmentation. We review a recent experiment that led us to put forward this hypothesis along with other research literature relevant to this notion. PMID:25064441
Regional information guidance system based on hypermedia concept
NASA Astrophysics Data System (ADS)
Matoba, Hiroshi; Hara, Yoshinori; Kasahara, Yutako
1990-08-01
A regional information guidance system has been developed on an image workstation. Its two main features are a hypermedia data structure and a friendly visual interface realized by a full-color frame memory system. Because the hypermedia data structure manages regional information such as maps, pictures and explanations of points of interest, users can retrieve this information item by item, moving from one item to the next as their interests change. For example, users can retrieve the explanation of a picture through the link between pictures and text explanations. Users can also traverse from one document to another by using keywords as cross-reference indices. The second feature is the use of a full-color, high-resolution, wide-space frame memory for visual interface design. This frame memory system enables real-time processing of image data and natural scene representation. The system also provides a halftone representation function that enables fade-in/out presentations. These fade-in/out functions, used when displaying and erasing menus and image data, make the visual interface easier on the eyes. The system we have developed is a typical example of a multimedia application. We expect the image workstation to play an important role as a platform for multimedia applications.
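The hypermedia organization described above (typed nodes, explicit links between pictures and their explanations, and a keyword index used for cross-references) can be sketched as a small data structure. All class, field and node names are hypothetical; the abstract does not describe the system's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    kind: str                                    # e.g. "map", "picture", "text"
    content: str
    links: list = field(default_factory=list)    # ids of directly linked nodes
    keywords: set = field(default_factory=set)   # terms used for cross-referencing

class HyperBase:
    """Hypothetical hypermedia store: explicit links plus a keyword index."""

    def __init__(self):
        self.nodes = {}
        self.index = {}                          # keyword -> set of node ids

    def add(self, node):
        self.nodes[node.node_id] = node
        for kw in node.keywords:
            self.index.setdefault(kw, set()).add(node.node_id)

    def link(self, src, dst):
        self.nodes[src].links.append(dst)        # e.g. picture -> its text explanation

    def follow_links(self, node_id):
        return [self.nodes[n] for n in self.nodes[node_id].links]

    def by_keyword(self, kw, exclude=None):
        """Other documents sharing a keyword (cross-reference traversal)."""
        return sorted(n for n in self.index.get(kw, set()) if n != exclude)
```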
Scene-based nonuniformity correction algorithm based on interframe registration.
Zuo, Chao; Chen, Qian; Gu, Guohua; Sui, Xiubao
2011-06-01
In this paper, we present a simple and effective scene-based nonuniformity correction (NUC) method for infrared focal plane arrays based on interframe registration. The method estimates the global translation between two adjacent frames and minimizes the mean square error between the two properly registered images, so that any two detectors viewing the same scene point produce the same output value. In this way, the accumulation of registration error can be avoided and NUC can be achieved. The advantages of the proposed algorithm lie in its low computational complexity and storage requirements and its ability to capture temporal drifts in the nonuniformity parameters. The performance of the proposed technique is thoroughly studied with infrared image sequences with simulated nonuniformity and with infrared imagery with real nonuniformity. The results show fast and reliable fixed-pattern noise reduction and effective frame-by-frame adaptive estimation of each detector's gain and offset.
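The register-then-equalize idea can be sketched as follows. Several assumptions do not come from the abstract. The global translation is treated as an integer shift estimated by phase correlation. The correction is simplified to per-detector offsets only, whereas the paper also estimates gains. The offsets are updated by an LMS-style step with a hypothetical step size mu.

```python
import numpy as np

def estimate_shift(prev, curr):
    """Integer (dy, dx) such that curr ~= np.roll(prev, (dy, dx), axis=(0, 1)),
    estimated by phase correlation (an assumed choice of registration method)."""
    cross = np.conj(np.fft.fft2(prev)) * np.fft.fft2(curr)
    cross /= np.abs(cross) + 1e-12                 # keep only phase information
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    dy = dy - h if dy > h // 2 else dy             # map wrapped indices to signed shifts
    dx = dx - w if dx > w // 2 else dx
    return int(dy), int(dx)

def nuc_offset_step(prev_raw, curr_raw, offset, mu=0.05):
    """One offset-only update from a registered pair of frames (simplified sketch)."""
    prev_c = prev_raw - offset                     # offset-corrected previous frame
    curr_c = curr_raw - offset                     # offset-corrected current frame
    dy, dx = estimate_shift(prev_c, curr_c)
    h, w = curr_c.shape
    # Detector (y, x) now sees the scene point that detector (y - dy, x - dx)
    # saw in the previous frame; only the overlapping region is compared.
    cy, cx = slice(max(dy, 0), h + min(dy, 0)), slice(max(dx, 0), w + min(dx, 0))
    py, px = slice(max(-dy, 0), h + min(-dy, 0)), slice(max(-dx, 0), w + min(-dx, 0))
    err = curr_c[cy, cx] - prev_c[py, px]
    new_offset = offset.copy()
    new_offset[cy, cx] += mu * err                 # LMS-style step reducing the squared error
    return new_offset, (dy, dx)
```

If the frames differ only by a pure shift (no fixed-pattern noise), the registered error is zero and the offsets stay unchanged; with real nonuniformity, repeated steps pull detectors that see the same scene point toward the same output.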
NASA Astrophysics Data System (ADS)
Le Goff, Alain; Cathala, Thierry; Latger, Jean
2015-10-01
To provide technical assessments of EO/IR flares and self-protection systems for aircraft, DGA Information Superiority uses synthetic image generation to model the operational battlefield of an aircraft as viewed by EO/IR threats. For this purpose, it completed the SE-Workbench suite from OKTAL-SE with functionalities to predict a realistic aircraft IR signature, and it is currently integrating the real-time EO/IR rendering engine of SE-Workbench, called SE-FAST-IR. This engine is a set of physics-based software and libraries for preparing and visualizing a 3D scene for the EO/IR domain. It takes advantage of recent advances in GPU computing techniques. Recent developments mainly concern the realistic, physical rendering of reflections, the rendering of both radiative and thermal shadows, the use of procedural techniques for managing and rendering very large terrains, the implementation of image-based rendering for the dynamic interpolation of static plume signatures and, lastly, the dynamic interpolation of aircraft thermal states. The next step is the representation of the spectral, directional, spatial and temporal signature of flares by Lacroix Defense using OKTAL-SE technology. This representation is prepared from experimental data acquired during windblast tests and high-speed track tests. It is based on particle system mechanisms to model the different components of a flare. The validation of a flare model will comprise a simulation of real trials and a comparison of simulation outputs with experimental results concerning the flare signature and, above all, the behavior of the stimulated threat.
Guidance of visual attention by semantic information in real-world scenes
Wu, Chia-Chien; Wick, Farahnaz Ahmed; Pomplun, Marc
2014-01-01
Recent research on attentional guidance in real-world scenes has focused on object recognition within the context of a scene. This approach has been valuable for determining some factors that drive the allocation of visual attention and determine visual selection. This article provides a review of experimental work on how different components of context, especially semantic information, affect attentional deployment. We review work from the areas of object recognition, scene perception, and visual search, highlighting recent studies examining semantic structure in real-world scenes. A better understanding of how humans parse scene representations will not only improve current models of visual attention but also advance next-generation computer vision systems and human-computer interfaces. PMID:24567724
Hyperspectral imaging simulation of object under sea-sky background
NASA Astrophysics Data System (ADS)
Wang, Biao; Lin, Jia-xuan; Gao, Wei; Yue, Hui
2016-10-01
Remote sensing image simulation plays an important role in spaceborne/airborne payload demonstration and algorithm development. Hyperspectral imaging is valuable in marine monitoring and in search and rescue. To meet the demand for spectral imaging of objects in complex sea scenes, a physics-based method for simulating spectral images of objects against a sea background is proposed. Because the imaging simulation model accounts for the object, the background, atmospheric conditions and the sensor, it can be used to examine how changes in wind speed, atmospheric conditions and other environmental factors influence spectral image quality in complex sea scenes. First, the sea scattering model is established based on the Phillips sea spectrum model, rough-surface scattering theory and the volume scattering characteristics of water. The measured bidirectional reflectance distribution function (BRDF) data of objects are fitted to a statistical model. MODTRAN software is used to obtain the solar illumination on the sea, the sky brightness, the atmospheric transmittance from sea to sensor and the atmospheric backscattered radiance, and a Monte Carlo ray tracing method is used to calculate the composite scattering of the sea surface and object and the spectral image. Finally, the object spectrum is acquired through spatial transformation, radiometric degradation and the addition of noise. The model connects the spectral image with the environmental parameters, the object parameters and the sensor parameters, and thus provides a tool for payload demonstration and algorithm development.
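As an illustration of the wind-dependent sea-surface statistics used in the first step, the sketch below evaluates the Phillips wave spectrum in the form popularized by Tessendorf, P(k) = A·exp(-1/(kL)²)/k⁴·|k̂·ŵ|² with L = V²/g. The abstract names the spectrum but not its parameterization, so this form, the amplitude A and the directional term are assumptions.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def phillips(kx, ky, wind_speed, wind_dir, A=1.0):
    """Phillips wave spectrum (Tessendorf form) at wavevector (kx, ky) in rad/m.

    wind_speed in m/s, wind_dir in radians; A is a hypothetical amplitude constant.
    """
    kx = np.asarray(kx, dtype=float)
    ky = np.asarray(ky, dtype=float)
    k = np.hypot(kx, ky)
    L = wind_speed ** 2 / G                        # largest wave arising from a steady wind
    wx, wy = np.cos(wind_dir), np.sin(wind_dir)
    with np.errstate(divide="ignore", invalid="ignore"):
        cos_term = (kx * wx + ky * wy) / k         # alignment of wave and wind directions
        p = A * np.exp(-1.0 / (k * L) ** 2) / k ** 4 * cos_term ** 2
    return np.where(k > 0, p, 0.0)                 # spectrum defined as 0 at k = 0
```

Stronger wind shifts energy toward longer waves, which is one route by which wind speed changes the simulated sea-surface scattering and hence the image quality.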
Representation of Gravity-Aligned Scene Structure in Ventral Pathway Visual Cortex.
Vaziri, Siavash; Connor, Charles E
2016-03-21
The ventral visual pathway in humans and non-human primates is known to represent object information, including shape and identity [1]. Here, we show the ventral pathway also represents scene structure aligned with the gravitational reference frame in which objects move and interact. We analyzed shape tuning of recently described macaque monkey ventral pathway neurons that prefer scene-like stimuli to objects [2]. Individual neurons did not respond to a single shape class, but to a variety of scene elements that are typically aligned with gravity: large planes in the orientation range of ground surfaces under natural viewing conditions, planes in the orientation range of ceilings, and extended convex and concave edges in the orientation range of wall/floor/ceiling junctions. For a given neuron, these elements tended to share a common alignment in eye-centered coordinates. Thus, each neuron integrated information about multiple gravity-aligned structures as they would be seen from a specific eye and head orientation. This eclectic coding strategy provides only ambiguous information about individual structures but explicit information about the environmental reference frame and the orientation of gravity in egocentric coordinates. In the ventral pathway, this could support perceiving and/or predicting physical events involving objects subject to gravity, recognizing object attributes like animacy based on movement not caused by gravity, and/or stabilizing perception of the world against changes in head orientation [3-5]. Our results, like the recent discovery of object weight representation [6], imply that the ventral pathway is involved not just in recognition, but also in physical understanding of objects and scenes. Copyright © 2016 Elsevier Ltd. All rights reserved.
The genesis of errors in drawing.
Chamberlain, Rebecca; Wagemans, Johan
2016-06-01
The difficulty adults find in drawing objects or scenes from real life is puzzling, assuming that there are few gross individual differences in the phenomenology of visual scenes and in fine motor control in the neurologically healthy population. A review of research concerning the perceptual, motoric and memorial correlates of drawing ability was conducted in order to understand why most adults err when trying to produce faithful representations of objects and scenes. The findings reveal that accurate perception of the subject and of the drawing is at the heart of drawing proficiency, although not to the extent that drawing skill elicits fundamental changes in visual perception. Instead, the decisive role of representational decisions reveals the importance of appropriate segmentation of the visual scene and of the influence of pictorial schemas. This leads to the conclusion that domain-specific, flexible, top-down control of visual attention plays a critical role in development of skill in visual art and may also be a window into creative thinking. Copyright © 2016 Elsevier Ltd. All rights reserved.
Kwon, TaeKyu; Agrawal, Kunal; Li, Yunfeng; Pizlo, Zygmunt
2015-01-01
Finding the occluding contours of objects in real 2D retinal images of natural 3D scenes is done by determining which contour fragments are relevant and the order in which they should be connected. We developed a model that finds the closed contour represented in the image by solving a shortest path problem that uses a log-polar representation of the image, the kind of representation known to exist in area V1 of the primate cortex. The shortest path in a log-polar representation favors smooth, convex and closed contours in the retinal image that have the smallest number of gaps. This approach is practical because finding a globally optimal solution to a shortest path problem is computationally easy. Our model was tested in four psychophysical experiments. In the first two experiments, the subject was presented with a fragmented convex or concave polygon target among a large number of unrelated pieces of contour (distracters). The density of these pieces of contour was uniform over the whole screen to minimize spatially local cues. The orientation of each target contour fragment was randomly perturbed by varying levels of jitter. Subjects drew a closed contour that represented the target's contour on a screen. The subjects' performance was nearly perfect when the jitter level was low, and it deteriorated as the jitter level increased. The performance of our model was very similar to that of our subjects. In two subsequent experiments, the subject was asked to discriminate a briefly presented egg-shaped object while maintaining fixation at several different positions relative to the closed contour of the shape. The subject's discrimination performance was affected by the fixation position in much the same way as the model's. PMID:26241462
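The idea of finding a closed contour as a shortest path in log-polar coordinates can be sketched as below, under assumptions of our own. The sketch uses a hypothetical cost grid indexed by angle bin and log-radius bin, with low cost where edge fragments lie. Each angle bin gets exactly one radius, and neighbouring bins may differ by at most one radius step. Closure is enforced by requiring the path to end within one bin of where it started. A circle centred on the log-polar origin has constant log r, so smooth convex contours around the centre become short, nearly straight paths. The sketch uses dynamic programming; the authors' solver and cost function are not described in the abstract.

```python
import numpy as np

def to_log_polar(points, center):
    """Map (x, y) points to (log r, theta) about a chosen centre."""
    d = np.asarray(points, dtype=float) - np.asarray(center, dtype=float)
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.mod(np.arctan2(d[:, 1], d[:, 0]), 2 * np.pi)
    return np.log(r), theta

def closed_min_path(cost):
    """Cheapest closed path through cost[t, j] (angle bin t, log-radius bin j)."""
    n_theta, n_r = cost.shape
    best_total, best_path = np.inf, None
    for j0 in range(n_r):                          # fix the radius bin at theta = 0
        acc = np.full(n_r, np.inf)
        acc[j0] = cost[0, j0]
        back = np.zeros((n_theta, n_r), dtype=int)
        for t in range(1, n_theta):
            new = np.full(n_r, np.inf)
            for j in range(n_r):
                for k in (j - 1, j, j + 1):        # radius may change by at most one bin
                    if 0 <= k < n_r and acc[k] + cost[t, j] < new[j]:
                        new[j] = acc[k] + cost[t, j]
                        back[t, j] = k
            acc = new
        # Closure: the last angle bin must connect back to j0 (within one bin).
        for j_end in range(max(j0 - 1, 0), min(j0 + 2, n_r)):
            if acc[j_end] < best_total:
                best_total = acc[j_end]
                path = [j_end]
                for t in range(n_theta - 1, 0, -1):
                    path.append(back[t, path[-1]])
                best_path = path[::-1]
    return best_total, best_path
```

With a low-cost ridge at a constant radius bin, the cheapest closed path follows that ridge all the way around, i.e. a circle in image space.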
Spatial Modulation Improves Performance in CTIS
NASA Technical Reports Server (NTRS)
Bearman, Gregory H.; Wilson, Daniel W.; Johnson, William R.
2009-01-01
Suitably formulated spatial modulation of a scene imaged by a computed-tomography imaging spectrometer (CTIS) has been found to be useful as a means of improving the imaging performance of the CTIS. As used here, "spatial modulation" signifies the imposition of additional, artificial structure on a scene from within the CTIS optics. The basic principles of a CTIS were described in "Improvements in Computed-Tomography Imaging Spectrometry" (NPO-20561), NASA Tech Briefs, Vol. 24, No. 12 (December 2000), page 38, and "All-Reflective Computed-Tomography Imaging Spectrometers" (NPO-20836), NASA Tech Briefs, Vol. 26, No. 11 (November 2002), page 7a. To recapitulate: A CTIS offers capabilities for imaging a scene with spatial, spectral, and temporal resolution. The spectral disperser in a CTIS is a two-dimensional diffraction grating. It is positioned between two relay lenses (or on one of two relay mirrors) in a video imaging system. If the disperser were removed, the system would produce ordinary images of the scene in its field of view. In the presence of the grating, the image on the focal plane of the system contains both spectral and spatial information because the multiple diffraction orders of the grating give rise to multiple, spectrally dispersed images of the scene. By use of algorithms adapted from computed tomography, the image on the focal plane can be processed into an image cube: a three-dimensional collection of data on the image intensity as a function of the two spatial dimensions (x and y) in the scene and of wavelength (lambda). Thus, both spectrally and spatially resolved information on the scene at a given instant of time can be obtained, without scanning, from a single snapshot; this is what makes the CTIS such a potentially powerful tool for spatially, spectrally, and temporally resolved imaging. A CTIS performs poorly in imaging some types of scenes, in particular scenes that contain little spatial or spectral variation.
The computed spectra of such scenes tend to approximate correct values to within acceptably small errors near the edges of the field of view but to be poor approximations away from the edges. The additional structure imposed on a scene according to the present method enables the CTIS algorithms to reconstruct acceptable approximations of the spectral data throughout the scene.
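To make "algorithms adapted from computed tomography" concrete, the sketch below uses multiplicative expectation-maximization (MLEM), a standard tomographic reconstruction method. It recovers a nonnegative image cube f (flattened over x, y and wavelength) from focal-plane data g = H f. Treating the CTIS this way, with a known system matrix H, is an assumption for illustration: the abstract does not say which algorithm is used, and the tiny H in the test is a toy system, not a CTIS model.

```python
import numpy as np

def mlem(H, g, n_iter=100):
    """Multiplicative MLEM estimate of f >= 0 from g = H @ f.

    H: (n_pixels, n_voxels) nonnegative system matrix mapping each cube voxel
       (x, y, wavelength) to focal-plane pixels; g: measured focal-plane image.
    """
    H = np.asarray(H, dtype=float)
    g = np.asarray(g, dtype=float)
    f = np.ones(H.shape[1])                  # flat, strictly positive starting cube
    sens = H.sum(axis=0)                     # H^T 1: total weight of each voxel
    for _ in range(n_iter):
        proj = H @ f                         # forward-project the current estimate
        ratio = np.divide(g, proj, out=np.zeros_like(g), where=proj > 0)
        f *= (H.T @ ratio) / sens            # back-project the data/model ratio
    return f
```

Because each update only rescales the current estimate, voxels stay nonnegative, which suits intensity data.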
Benoit, Michel; Guerchouche, Rachid; Petit, Pierre-David; Chapoulie, Emmanuelle; Manera, Valeria; Chaurasia, Gaurav; Drettakis, George; Robert, Philippe
2015-01-01
Virtual reality (VR) opens up a vast number of possibilities in many domains of therapy. The primary objective of the present study was to evaluate the acceptability for elderly subjects of a VR experience using the image-based rendering virtual environment (IBVE) approach, and the secondary objective was to test the hypothesis that visual cues using VR may enhance the generation of autobiographical memories. Eighteen healthy volunteers (mean age 68.2 years) presenting memory complaints with a Mini-Mental State Examination score higher than 27 and no history of neuropsychiatric disease were included. Participants were asked to perform an autobiographical fluency task in four conditions. The first condition was a baseline grey screen, the second was a photograph of a well-known location in the participant's home city (FamPhoto), and the last two conditions displayed VR, i.e., a familiar image-based virtual environment (FamIBVE) consisting of an image-based representation of a known landmark square in the center of the city of experimentation (Nice) and an unknown image-based virtual environment (UnknoIBVE), which was captured in a public housing neighborhood containing unrecognizable building fronts. After each of the four experimental conditions, participants filled in self-report questionnaires to assess the task acceptability (levels of emotion, motivation, security, fatigue, and familiarity). CyberSickness and Presence questionnaires were also administered after the two VR conditions. Autobiographical memory was assessed using a verbal fluency task, and the quality of the recollection was assessed using the "remember/know" procedure. All subjects completed the experiment. Sense of security and fatigue were not significantly different between the conditions with and without VR. The FamPhoto condition yielded a higher emotion score than the other conditions (P<0.05).
The CyberSickness questionnaire showed that participants did not experience sickness during the experiment across the VR conditions. VR stimulates autobiographical memory, as demonstrated by the increased total number of responses on the autobiographical fluency task and the increased number of conscious recollections of memories for familiar versus unknown scenes (P<0.01). The study indicates that VR using the FamIBVE system is well tolerated by the elderly. VR can also stimulate recollections of autobiographical memory and convey familiarity of a given scene, which is an essential requirement for use of VR during reminiscence therapy.
Benoit, Michel; Guerchouche, Rachid; Petit, Pierre-David; Chapoulie, Emmanuelle; Manera, Valeria; Chaurasia, Gaurav; Drettakis, George; Robert, Philippe
2015-01-01
Background: Virtual reality (VR) opens up a vast number of possibilities in many domains of therapy. The primary objective of the present study was to evaluate the acceptability for elderly subjects of a VR experience using the image-based rendering virtual environment (IBVE) approach; the secondary objective was to test the hypothesis that visual cues using VR may enhance the generation of autobiographical memories. Methods: Eighteen healthy volunteers (mean age 68.2 years) presenting memory complaints, with a Mini-Mental State Examination score higher than 27 and no history of neuropsychiatric disease, were included. Participants were asked to perform an autobiographical fluency task in four conditions. The first condition was a baseline grey screen, the second was a photograph of a well-known location in the participant’s home city (FamPhoto), and the last two conditions displayed VR, i.e., a familiar image-based virtual environment (FamIBVE) consisting of an image-based representation of a known landmark square in the center of the city of experimentation (Nice) and an unknown image-based virtual environment (UnknoIBVE), which was captured in a public housing neighborhood containing unrecognizable building fronts. After each of the four experimental conditions, participants filled in self-report questionnaires to assess task acceptability (levels of emotion, motivation, security, fatigue, and familiarity). CyberSickness and Presence questionnaires were also administered after the two VR conditions. Autobiographical memory was assessed using a verbal fluency task, and the quality of recollection was assessed using the “remember/know” procedure. Results: All subjects completed the experiment. Sense of security and fatigue did not differ significantly between the conditions with and without VR. The FamPhoto condition yielded a higher emotion score than the other conditions (P<0.05).
The CyberSickness questionnaire showed that participants did not experience sickness during the experiment in either VR condition. VR stimulates autobiographical memory, as demonstrated by the increased total number of responses on the autobiographical fluency task and the increased number of conscious recollections of memories for familiar versus unknown scenes (P<0.01). Conclusion: The study indicates that VR using the FamIBVE system is well tolerated by the elderly. VR can also stimulate recollection of autobiographical memories and convey the familiarity of a given scene, which is an essential requirement for the use of VR in reminiscence therapy. PMID:25834437
Invariant recognition drives neural representations of action sequences
Poggio, Tomaso
2017-01-01
Recognizing the actions of others from visual stimuli is a crucial aspect of human perception that allows individuals to respond to social cues. Humans are able to discriminate between similar actions despite transformations, like changes in viewpoint or actor, that substantially alter the visual appearance of a scene. This ability to generalize across complex transformations is a hallmark of human visual intelligence. Advances in understanding action recognition at the neural level have not always translated into precise accounts of the computational principles that determine which representations of action sequences human visual cortex constructs. Here we test the hypothesis that invariant action discrimination might fill this gap. Recently, the study of artificial systems for static object perception has produced models, Convolutional Neural Networks (CNNs), that achieve human-level performance in complex discriminative tasks. Within this class, architectures that better support invariant object recognition also produce image representations that better match those implied by human and primate neural data. However, whether these models produce representations of action sequences that support recognition across complex transformations and closely follow neural representations of actions remains unknown. Here we show that spatiotemporal CNNs accurately categorize video stimuli into action classes, and that deliberate model modifications that improve performance on an invariant action recognition task lead to data representations that better match human neural recordings. Our results support the hypothesis that performance on invariant discrimination dictates the neural representations of actions computed in the brain. These results broaden the scope of the invariant recognition framework for understanding visual intelligence from the perception of inanimate objects and faces in static images to the study of human perception of action sequences. PMID:29253864
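The abstract does not say how model representations were compared with the neural recordings. One common way to make such a comparison is representational similarity analysis (RSA): compute a representational dissimilarity matrix (RDM) over the same stimuli for model features and for neural responses, then correlate the two. The Python sketch below assumes each stimulus is already summarized as a feature vector; the function names, the 1 - Pearson dissimilarity, and the Spearman comparison are illustrative choices, not necessarily the study's.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def rdm_upper(patterns: List[List[float]]) -> List[float]:
    """Upper triangle of a representational dissimilarity matrix (1 - Pearson r)."""
    n = len(patterns)
    return [1.0 - pearson(patterns[i], patterns[j])
            for i in range(n) for j in range(i + 1, n)]


def rsa_score(model: List[List[float]], neural: List[List[float]]) -> float:
    """Spearman correlation between model and neural RDMs (rows = same stimuli, same order)."""
    return pearson(ranks(rdm_upper(model)), ranks(rdm_upper(neural)))
```

A score near 1 means the two systems order stimulus pairs by dissimilarity in the same way; for example, rescaling every neural pattern by a positive gain plus an offset leaves the score at 1, because Pearson-based dissimilarities ignore such changes.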
Evaluation of feature-based 3-d registration of probabilistic volumetric scenes
NASA Astrophysics Data System (ADS)
Restrepo, Maria I.; Ulusoy, Ali O.; Mundy, Joseph L.
2014-12-01
Automatic estimation of the world's surfaces from aerial images has received much attention and made considerable progress in recent years. Among current modeling technologies, probabilistic volumetric models (PVMs) have evolved as an alternative representation that can learn geometry and appearance in a dense and probabilistic manner. Recent progress in the storage and speed of volumetric modeling opens the opportunity to develop new frameworks that use the PVM to pursue the ultimate goal of creating an entire map of the earth, in which one can reason about the semantics and dynamics of the 3-d world. Aligning 3-d models collected at different time instances is an important step toward the successful fusion of large spatio-temporal information. This paper evaluates how effectively probabilistic volumetric models can be aligned using robust feature-matching techniques, considering different scenarios that reflect the kind of variability observed across aerial video collections from different time instances. More precisely, this work investigates variability in terms of discretization, resolution and sampling density, errors in camera orientation, and changes in illumination and geographic characteristics. All results are given for large-scale outdoor sites. To facilitate comparison of the registration performance of PVMs with that of other 3-d reconstruction techniques, the registration pipeline is also carried out using the Patch-based Multi-View Stereo (PMVS) algorithm. Registration performance is similar for scenes that have favorable geometry and the appearance characteristics necessary for high-quality reconstruction. In scenes containing trees, such as a park, or many buildings, such as a city center, registration is significantly more accurate when using the PVM.
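Feature-based registration typically turns putative 3-d feature matches into a transform while rejecting wrong matches. As a minimal sketch of that robust-matching idea, the hypothetical Python function below runs RANSAC for a pure 3-d translation; the evaluated pipeline presumably estimates a richer transform (for example rotation, translation, and scale from three or more matches), which is omitted here for brevity.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def ransac_translation(src: Sequence[Point], dst: Sequence[Point],
                       threshold: float = 0.5, iterations: int = 200,
                       seed: int = 0) -> Tuple[Point, List[int]]:
    """Robustly estimate t with dst[i] ~= src[i] + t from putative feature matches.

    Each hypothesis comes from one random match (a pure translation has 3 unknowns,
    which a single 3-d correspondence determines); matches within `threshold` of the
    hypothesis vote for it, and the largest consensus set wins.
    """
    rng = random.Random(seed)
    best: List[int] = []
    for _ in range(iterations):
        k = rng.randrange(len(src))
        t = [d - s for s, d in zip(src[k], dst[k])]
        inliers = [i for i in range(len(src))
                   if sum((src[i][a] + t[a] - dst[i][a]) ** 2 for a in range(3))
                   <= threshold ** 2]
        if len(inliers) > len(best):
            best = inliers
    # Least-squares refinement on the consensus set: the mean offset of the inliers.
    t_ref = tuple(sum(dst[i][a] - src[i][a] for i in best) / len(best) for a in range(3))
    return t_ref, best
```

Replacing the one-match hypothesis with a closed-form rigid or similarity fit on a minimal sample of three matches gives the usual full-transform variant; the voting and refinement steps stay the same.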
Compressive Coded-Aperture Multimodal Imaging Systems
NASA Astrophysics Data System (ADS)
Rueda-Chacon, Hoover F.
Multimodal imaging refers to the framework of capturing images that span different physical domains such as space, spectrum, depth, time, polarization, and others. For instance, spectral images are modeled as 3D cubes with two spatial and one spectral coordinate. Three-dimensional cubes spanning only the spatial domain are referred to as depth volumes. Imaging cubes varying in time, spectrum, or depth are referred to as 4D images. Nature itself spans different physical domains, so imaging the real world demands capturing information in at least 6 different domains simultaneously, giving rise to 3D-spatial+spectral+polarized dynamic sequences. Conventional imaging devices, however, can capture dynamic sequences with only up to 3 spectral channels in real time, by using color sensors. Capturing more spectral channels requires scanning methodologies, which are time-consuming. In general, multimodal imaging to date requires a series of different imaging sensors, placed in tandem, to simultaneously capture the different physical properties of a scene. Different fusion techniques are then employed to merge the individual information into a single image. Therefore, new ways to efficiently capture more than 3 spectral channels of 3D time-varying spatial information, with a single sensor or a few sensors, are of high interest. Compressive spectral imaging (CSI) is an imaging framework that seeks to optimally capture spectral imagery (tens of spectral channels of 2D spatial information) using fewer measurements than required by traditional sensing procedures that follow Shannon-Nyquist sampling. Instead of capturing direct one-to-one representations of natural scenes, CSI systems acquire linear random projections of the scene and then solve an optimization problem to estimate the 3D spatio-spectral data cube by exploiting the theory of compressive sensing (CS).
To date, the coding procedure in CSI has been realized through the use of "block-unblock" coded apertures, commonly implemented as chrome-on-quartz photomasks. These apertures block or pass the entire spectrum of the scene at given spatial locations, thus modulating the spatial characteristics of the scene. In the first part, this thesis aims to expand the framework of CSI by replacing the traditional block-unblock coded apertures with patterned optical filter arrays, referred to as "color" coded apertures. These apertures are formed by tiny pixelated optical filters, which allow the input image to be modulated not only spatially but also spectrally, enabling more powerful coding strategies. The proposed colored coded apertures are either synthesized through linear combinations of low-pass, high-pass, and band-pass filters paired with binary pattern ensembles realized by a digital micromirror device (DMD), or experimentally realized through thin-film color-patterned filter arrays. The optical forward model of the proposed CSI architectures is presented along with the design and proof-of-concept implementations, which achieve noticeable improvements in reconstruction quality compared with conventional block-unblock coded aperture-based CSI architectures. On another front, given the rich information contained in the infrared spectrum and in the depth domain, this thesis explores multimodal imaging by extending the sensitivity range of current CSI systems to a dual-band visible+near-infrared spectral domain, and it also proposes, for the first time, a new imaging device that simultaneously captures 4D data cubes (2D spatial+1D spectral+depth imaging) from as few as a single snapshot. Owing to the snapshot advantage of this camera, video sequences are possible, enabling the joint capture of 5D imagery. The aim is to create super-human sensing that will enable the perception of our world in new and exciting ways.
With this, we intend to advance the state of the art in compressive sensing systems to extract depth while accurately capturing spatial and spectral material properties. The applications of such a sensor are self-evident in fields such as computer/robotic vision, because it would allow an artificial intelligence to make informed decisions not only about the location of objects within a scene but also about their material properties.
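To make the difference between block-unblock and color coded apertures concrete, the following sketch implements a generic discrete forward model in the style of a single-disperser CSI system: each band is multiplied by the aperture transmittance, shifted by one detector column per band by the disperser, and summed on the detector. The one-column-per-band dispersion, the cube layout, and the function names are assumptions for illustration; the thesis's DMD-synthesized, thin-film, dual-band, and depth-capable architectures are not modeled. A block-unblock aperture is the special case in which the transmittance is identical for every band at a pixel, whereas a color aperture lets it vary with the band.

```python
from typing import List

Cube = List[List[List[float]]]  # indexed [row][column][band]


def cassi_forward(f: Cube, t: Cube) -> List[List[float]]:
    """Detector image g[y][u] = sum over k of t[y][u-k][k] * f[y][u-k][k].

    Band k is coded by the aperture transmittance at (y, x) and then shifted k columns
    by the (assumed linear, one-column-per-band) disperser before integration.
    """
    rows, cols, bands = len(f), len(f[0]), len(f[0][0])
    g = [[0.0] * (cols + bands - 1) for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            for k in range(bands):
                g[y][x + k] += t[y][x][k] * f[y][x][k]
    return g


def block_unblock(mask: List[List[int]], bands: int) -> Cube:
    """Binary aperture: each pixel passes (1) or blocks (0) all bands alike."""
    return [[[float(m)] * bands for m in row] for row in mask]
```

An H x W x L cube is thus measured with H x (W + L - 1) detector values in one snapshot; recovering the cube from g is the sparsity-regularized inverse problem that CSI solves.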
Cross-label Suppression: a Discriminative and Fast Dictionary Learning with Group Regularization.
Wang, Xiudong; Gu, Yuantao
2017-05-10
This paper addresses image classification by efficiently learning a compact and discriminative dictionary. Given a structured dictionary in which each atom (column of the dictionary matrix) is related to a label, we propose a cross-label suppression constraint to enlarge the difference among representations for different classes. Meanwhile, we introduce group regularization to enforce representations to preserve the label properties of the original samples, meaning that representations of the same class are encouraged to be similar. Building on cross-label suppression, we do not resort to the frequently used ℓ0-norm or ℓ1-norm for coding, and we obtain computational efficiency without losing discriminative power for categorization. Moreover, two simple classification schemes are developed to take full advantage of the learned dictionary. Extensive experiments on six data sets covering face recognition, object categorization, scene classification, texture recognition, and sport action categorization show that the proposed approach can outperform many recently presented dictionary learning algorithms in both recognition accuracy and computational efficiency.
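The abstract states that cross-label suppression avoids ℓ0/ℓ1 coding but does not give the objective. One hypothetical reading, sketched below, is a ridge-type code in which atoms belonging to labels other than the hypothesized class receive an extra quadratic penalty, so the code has a closed form; classification then picks the class whose own atoms give the smallest reconstruction residual. The penalty weights `lam` and `beta`, the residual rule, and all names are assumptions, not the paper's formulation, and group regularization (which acts during dictionary learning) is not shown.

```python
from typing import List, Sequence

Vector = List[float]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def code_for_class(atoms: Sequence[Vector], labels: Sequence[int], x: Vector, cls: int,
                   lam: float = 0.01, beta: float = 10.0) -> Vector:
    """argmin_a ||x - sum_j a_j d_j||^2 + lam*||a||^2 + beta * sum_{labels[j] != cls} a_j^2.

    Closed form: (D^T D + diag(penalties)) a = D^T x, with atoms d_j as columns of D.
    """
    k = len(atoms)
    gram = [[sum(p * q for p, q in zip(atoms[i], atoms[j])) for j in range(k)]
            for i in range(k)]
    for j in range(k):
        gram[j][j] += lam + (beta if labels[j] != cls else 0.0)
    rhs = [sum(p * q for p, q in zip(atoms[j], x)) for j in range(k)]
    return solve(gram, rhs)


def classify(atoms: Sequence[Vector], labels: Sequence[int], x: Vector,
             lam: float = 0.01, beta: float = 10.0) -> int:
    """Choose the class whose own atoms reconstruct x with the smallest squared residual."""
    best_cls, best_err = labels[0], float("inf")
    for cls in sorted(set(labels)):
        a = code_for_class(atoms, labels, x, cls, lam, beta)
        recon = [sum(a[j] * atoms[j][d] for j in range(len(atoms)) if labels[j] == cls)
                 for d in range(len(x))]
        err = sum((u - v) ** 2 for u, v in zip(x, recon))
        if err < best_err:
            best_cls, best_err = cls, err
    return best_cls
```

Because every code comes from one small linear solve per class, the cost per test sample is a handful of dense solves rather than an iterative sparse-coding loop, which is the kind of efficiency the abstract emphasizes.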
Phase-amplitude coupling supports phase coding in human ECoG
Watrous, Andrew J; Deuker, Lorena; Fell, Juergen; Axmacher, Nikolai
2015-01-01
Prior studies have shown that high-frequency activity (HFA) is modulated by the phase of low-frequency activity. This phenomenon of phase-amplitude coupling (PAC) is often interpreted as reflecting phase coding of neural representations, although evidence for this link is still lacking in humans. Here, we show that PAC indeed supports phase-dependent stimulus representations for categories. Six patients with medication-resistant epilepsy viewed images of faces, tools, houses, and scenes during simultaneous acquisition of intracranial recordings. Analyzing 167 electrodes, we observed PAC at 43% of electrodes. Further inspection of PAC revealed that category-specific HFA modulations occurred at different phases and frequencies of the underlying low-frequency rhythm, permitting decoding of categorical information using the phase at which HFA events occurred. These results provide evidence for categorical phase-coded neural representations and are the first to show that PAC coincides with phase-dependent coding in the human brain. DOI: http://dx.doi.org/10.7554/eLife.07886.001 PMID:26308582
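The abstract does not specify which PAC measure was used. A widely used one is the mean vector length: each high-frequency amplitude sample is treated as a vector pointing at the simultaneous low-frequency phase, the length of their average measures coupling, and its angle gives the preferred phase. The sketch assumes phase and amplitude series have already been extracted (typically by band-pass filtering and a Hilbert transform, not shown) and normalizes by mean amplitude so values fall between 0 and 1; this normalization is one of several variants.

```python
import cmath
import math
from typing import Sequence


def _resultant(phase: Sequence[float], amplitude: Sequence[float]) -> complex:
    """Sum of amplitude-weighted unit phasors, A_t * exp(i * phi_t)."""
    return sum(a * cmath.exp(1j * p) for p, a in zip(phase, amplitude))


def mean_vector_length(phase: Sequence[float], amplitude: Sequence[float]) -> float:
    """Normalized MVL: |mean(A exp(i phi))| / mean(A), between 0 (no coupling) and 1."""
    return abs(_resultant(phase, amplitude)) / sum(amplitude)


def preferred_phase(phase: Sequence[float], amplitude: Sequence[float]) -> float:
    """Low-frequency phase (radians, in [0, 2*pi)) at which high-frequency amplitude peaks."""
    return cmath.phase(_resultant(phase, amplitude)) % (2 * math.pi)
```

For a phase that sweeps uniformly through complete cycles, an amplitude of 1 + 0.5 cos(phi - pi/2) gives a normalized length of 0.25 with preferred phase pi/2, whereas a constant amplitude gives 0. Category-specific preferred phases are what would make the phase of HFA events informative for decoding.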
Modular Representation of Luminance Polarity in the Superficial Layers of Primary Visual Cortex
Smith, Gordon B.; Whitney, David E.; Fitzpatrick, David
2016-01-01
Summary: The spatial arrangement of luminance increments (ON) and decrements (OFF) falling on the retina provides a wealth of information used by central visual pathways to construct coherent representations of visual scenes. But how the polarity of luminance change is represented in the activity of cortical circuits remains unclear. Using wide-field epifluorescence and two-photon imaging, we demonstrate a robust modular representation of luminance polarity (ON or OFF) in the superficial layers of ferret primary visual cortex. Polarity-specific domains are found with both uniform changes in luminance and single light/dark edges, and include neurons selective for orientation and direction of motion. The integration of orientation and polarity preference is evident in the selectivity and discrimination capabilities of most layer 2/3 neurons. We conclude that polarity selectivity is an integral feature of layer 2/3 neurons, ensuring that the distinction between light and dark stimuli is available for further processing in downstream extrastriate areas. PMID:26590348
Curvelet-based compressive sensing for InSAR raw data
NASA Astrophysics Data System (ADS)
Costa, Marcello G.; da Silva Pinho, Marcelo; Fernandes, David
2015-10-01
The aim of this work is to evaluate the compression performance of SAR raw data for interferometric applications, collected with the airborne BRADAR (Brazilian SAR System operating in X and P bands), using a new approach based on compressive sensing (CS) that achieves effective recovery with good phase preservation. Real-time capability is desirable in this framework, so that the collected data can be compressed to reduce onboard storage and the bandwidth required for transmission. In CS theory, sparse unknown signals can be recovered from a small number of random or pseudo-random measurements by sparsity-promoting nonlinear recovery algorithms. Therefore, the amount of data needed to represent the original signal can be significantly reduced. To obtain a sparse representation of the SAR signal, a curvelet transform was used. Curvelets constitute a directional frame that allows an optimal sparse representation of objects with discontinuities along smooth curves, as observed in raw data, and provides advanced denoising. For the tests, a scene of 8192 x 2048 samples in range and azimuth in X-band with 2 m resolution was used. The sparse representation was compressed using low-dimensional measurement matrices in each curvelet subband. An iterative CS reconstruction method based on IST (iterative soft/shrinkage thresholding) was then adjusted to recover the curvelet coefficients and, from them, the original signal. To evaluate compression performance, the compression ratio (CR) and signal-to-noise ratio (SNR) were computed; because interferometric applications require higher reconstruction accuracy, phase parameters such as the standard deviation of the phase (PSD) and the mean phase error (MPE) were also computed. Moreover, in the image domain, a single-look complex image was generated to evaluate the compression effects.
All results were analyzed as a function of sparsity, showing efficient compression and recovery quality appropriate for InSAR applications and thus demonstrating the feasibility of compressive sensing for this application.
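Iterative soft thresholding alternates a gradient step on the data-fidelity term with the shrinkage operator that promotes sparsity. The minimal Python sketch below solves 0.5||y - Ax||^2 + lam||x||_1 for a real-valued signal assumed sparse in the canonical basis, with a dense Gaussian measurement matrix; the curvelet transform, per-subband measurement matrices, complex SAR samples, and the parameter values are not modeled and are assumptions of the example. An SNR helper mirrors the kind of quality metric the abstract reports.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def gaussian_matrix(m: int, n: int, seed: int = 1) -> Matrix:
    """Dense m x n measurement matrix with N(0, 1/m) entries (assumed, not the paper's)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(n)] for _ in range(m)]


def matvec(a: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(r[j] * x[j] for j in range(len(x))) for r in a]


def rmatvec(a: Matrix, y: Sequence[float]) -> List[float]:
    """Transpose product a^T y."""
    return [sum(a[i][j] * y[i] for i in range(len(y))) for j in range(len(a[0]))]


def soft(v: float, thr: float) -> float:
    """Soft-thresholding (shrinkage): sign(v) * max(|v| - thr, 0)."""
    return math.copysign(max(abs(v) - thr, 0.0), v)


def ista(a: Matrix, y: Sequence[float], lam: float = 0.01,
         iterations: int = 1500) -> List[float]:
    """Minimize 0.5*||y - a x||^2 + lam*||x||_1 by iterative soft thresholding."""
    n = len(a[0])
    # Lipschitz constant of the gradient = largest eigenvalue of a^T a (power iteration),
    # inflated by 10% so the step 1/lip stays safely below 1/lambda_max.
    v, lip = [1.0] * n, 1.0
    for _ in range(100):
        w = rmatvec(a, matvec(a, v))
        lip = math.sqrt(sum(t * t for t in w))
        v = [t / lip for t in w]
    lip *= 1.1
    x = [0.0] * n
    for _ in range(iterations):
        resid = [yi - ri for yi, ri in zip(y, matvec(a, x))]
        step = rmatvec(a, resid)  # negative gradient a^T (y - a x)
        x = [soft(xi + si / lip, lam / lip) for xi, si in zip(x, step)]
    return x


def snr_db(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Reconstruction SNR in dB: 10*log10(||ref||^2 / ||ref - est||^2)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal / noise)
```

With a 20 x 40 matrix from `gaussian_matrix` and a signal with two nonzero entries, the two largest entries of `ista(a, matvec(a, x_true))` should typically land on the true support. In the setting described above, the same iteration would run on curvelet coefficients, and phase metrics such as PSD and MPE would be computed on the reconstructed complex image.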
Real Time Intelligent Target Detection and Analysis with Machine Vision
NASA Technical Reports Server (NTRS)
Howard, Ayanna; Padgett, Curtis; Brown, Kenneth
2000-01-01
We present an algorithm for detecting a specified set of targets for an Automatic Target Recognition (ATR) application. ATR involves processing images for detecting, classifying, and tracking targets embedded in a background scene. We address the problem of discriminating between targets and nontarget objects in a scene by evaluating 40x40 image blocks belonging to an image. Each image block is first projected onto a set of templates specifically designed to separate images of targets embedded in a typical background scene from background images without targets. These templates are found using directed principal component analysis, which maximally separates the two groups. The projected images are then clustered into one of n classes based on the minimum distance to a set of n cluster prototypes. These cluster prototypes have previously been identified using a modified clustering algorithm based on prior sensed data. Each projected image pattern is then fed into the associated cluster's trained neural network for classification. A detailed description of our algorithm is given in this paper. We outline our methodology for designing the templates, describe our modified clustering algorithm, and provide details on the neural network classifiers. Evaluation of the overall algorithm demonstrates that our detection rates approach 96% with a false positive rate of less than 0.03%.
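The detection chain described above (template projection, nearest-prototype clustering, per-cluster classifier) can be sketched as follows. The templates, prototypes, and classifiers are assumed to have been produced offline (by directed PCA, the modified clustering algorithm, and neural-network training, none of which is reproduced here); each per-cluster classifier is stood in for by any callable on the projected features, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence

Classifier = Callable[[List[float]], bool]


def project(block: Sequence[float], templates: Sequence[Sequence[float]]) -> List[float]:
    """Project a flattened image block onto each template (one inner product per template)."""
    return [sum(b * t for b, t in zip(block, tmpl)) for tmpl in templates]


def nearest_prototype(features: Sequence[float],
                      prototypes: Sequence[Sequence[float]]) -> int:
    """Index of the cluster prototype at minimum squared Euclidean distance."""
    return min(range(len(prototypes)),
               key=lambda k: sum((f - p) ** 2 for f, p in zip(features, prototypes[k])))


def detect(block: Sequence[float], templates: Sequence[Sequence[float]],
           prototypes: Sequence[Sequence[float]],
           classifiers: Sequence[Classifier]) -> bool:
    """Route the projected block to the classifier trained for its cluster."""
    features = project(block, templates)
    return classifiers[nearest_prototype(features, prototypes)](features)
```

A 40x40 block would be flattened to a 1600-element vector before projection, so the cost per block is one dot product per template plus one distance per prototype before the chosen network runs.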
A Subdivision-Based Representation for Vector Image Editing.
Liao, Zicheng; Hoppe, Hugues; Forsyth, David; Yu, Yizhou
2012-11-01
Vector graphics has been employed in a wide variety of applications due to its scalability and editability. Editability is a high priority for artists and designers who wish to produce vector-based graphical content with user interaction. In this paper, we introduce a new vector image representation based on piecewise smooth subdivision surfaces, which is a simple, unified and flexible framework that supports a variety of operations, including shape editing, color editing, image stylization, and vector image processing. These operations effectively create novel vector graphics by reusing and altering existing image vectorization results. Because image vectorization yields an abstraction of the original raster image, controlling the level of detail of this abstraction is highly desirable. To this end, we design a feature-oriented vector image pyramid that offers multiple levels of abstraction simultaneously. Our new vector image representation can be rasterized efficiently using GPU-accelerated subdivision. Experiments indicate that our vector image representation achieves high visual quality and better supports editing operations than existing representations.
Subjective time in near and far representational space.
Zäch, Peter; Brugger, Peter
2008-03-01
We set out to measure healthy subjects' estimates of temporal duration during the imagination of left and right sides of an object located in either near or far representational space. Duration estimates during the observation of small-scale scenes are shorter than those during the observation of the same scenes presented in a larger scale. It is not known whether a similar space-time relationship also exists for objects merely imagined and whether subjective time varies with a forced focus on either the left or the right side of a mental image. With eyes closed, 40 healthy right-handed subjects (20 women) imagined a standard Swiss railway clock at a distance of either 30 cm or 6 m. They were required to focus on the imagined movement of the second hand and provide estimates of elapsed durations of 15 and 30 seconds. Separate estimates for the left and right side of the clockface were obtained. The magnitude of implicit line bisection error was assessed in a separate task. Irrespective of the side of the clockface, duration estimates were shorter for the clockface imagined in far space than for the one imagined immediately in front of the inner eye. For men, but not women, duration judgments (left relative to right side of the clockface) correlated with the relative lengths of left and right line segments in the bisection task. Subjective time seems to run faster during the inspection of a small-size compared with a larger-size mental image. This finding underlines the equivalence of the laws that guide both the exploration and the representation of space. Together with the observed correlation between spatial and temporal measures of lateral asymmetries, the result also illustrates the conceptual similarities in the processing of space and time. The normative data presented here may be useful for clinical applications of the paradigm in patients with hemispatial neglect or a distorted perception of time.
Neural Codes for One's Own Position and Direction in a Real-World "Vista" Environment.
Sulpizio, Valentina; Boccia, Maddalena; Guariglia, Cecilia; Galati, Gaspare
2018-01-01
Humans, like animals, rely on accurate knowledge of their spatial position and facing direction to stay oriented in the surrounding space. Although previous neuroimaging studies demonstrated that scene-selective regions (the parahippocampal place area or PPA, the occipital place area or OPA, and the retrosplenial complex or RSC) and the hippocampus (HC) are implicated in coding position and facing direction within small-scale (room-sized) and large-scale navigational environments, little is known about how these regions represent these spatial quantities in a large open-field environment. Here, we used functional magnetic resonance imaging (fMRI) in humans to explore the neural codes for this navigationally relevant information while participants viewed images that varied in position and facing direction within a familiar, real-world circular square. We observed neural adaptation for repeated directions in the HC, even though no navigational task was required. Further, we found that the amount of knowledge of the environment interacts with PPA selectivity in encoding positions: individuals who needed more time to memorize positions in the square during a preliminary training task showed less neural attenuation in this scene-selective region. We also observed adaptation effects reflecting the real distances between consecutive positions in scene-selective regions but not in the HC. When examining multi-voxel patterns of activity, we observed that scene-responsive regions and the HC encoded both types of spatial information, and that RSC classification accuracy for positions was higher in individuals scoring higher on a self-report questionnaire of spatial abilities.
Our findings provide new insight into how the human brain represents a real, large-scale "vista" space, demonstrating the presence of neural codes for position and direction in both scene-selective and hippocampal regions, and revealing the existence, in the former regions, of a map-like spatial representation reflecting real-world distance between consecutive positions.
Immersive Virtual Moon Scene System Based on Panoramic Camera Data of Chang'E-3
NASA Astrophysics Data System (ADS)
Gao, X.; Liu, J.; Mu, L.; Yan, W.; Zeng, X.; Zhang, X.; Li, C.
2014-12-01
The "Immersive Virtual Moon Scene" system displays a virtual environment of the lunar surface in an immersive setting. Using stereo 360-degree imagery from the panoramic camera of the Yutu rover, the system enables the operator to visualize the terrain and the celestial background from the rover's point of view in 3D. To avoid image distortion, a stereo 360-degree panorama stitched from 112 images is projected onto the inside surface of a sphere, according to the panorama orientation coordinates and camera parameters, to build the virtual scene. Because stars can be seen from the Moon at any time, we render the Sun, planets, and stars as the background on the sphere according to the time and the rover's location, based on the Hipparcos catalogue. Immersed in the stereo virtual environment created by this image-based rendering technique, the operator can zoom and pan to interact with the virtual Moon scene and mark objects of interest. The hardware of the immersive virtual Moon system consists of four high-lumen projectors and a large curved screen 31 meters long and 5.5 meters high. This system, which uses all available panoramic camera data to create an immersive environment in which the operator can interact and mark objects of interest, contributed heavily to establishing the science mission goals of the Chang'E-3 mission. After the Chang'E-3 mission, the lab housing this system will be open to the public. Besides this application, Moon terrain stereo animations based on Chang'E-1 and Chang'E-2 data will be shown to the public on the large screen in the lab. Based on lunar exploration data, we will make more immersive virtual Moon scenes and animations in the future to help the public understand more about the Moon.
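Projecting a 360-degree panorama onto the inside of a sphere amounts to mapping each panorama pixel to a view direction. The sketch below assumes an equirectangular panorama with no orientation offset and an arbitrary axis convention; the real system also applies the panorama orientation coordinates and camera parameters, which are omitted. A companion helper turns a star catalogue position (right ascension and declination) into a celestial unit vector; the time- and location-dependent rotation into the rover's frame is likewise omitted, and both function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def pixel_to_direction(u: float, v: float, width: int, height: int) -> Vec3:
    """Unit view direction for pixel (u, v) of an equirectangular panorama.

    Assumed convention: longitude runs from -pi (u = 0) to +pi (u = width), latitude
    from +pi/2 (v = 0, zenith) to -pi/2 (v = height); axes x = right, y = up, z = forward.
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    return (math.cos(lat) * math.sin(lon), math.sin(lat), math.cos(lat) * math.cos(lon))


def radec_to_vector(ra_deg: float, dec_deg: float) -> Vec3:
    """Celestial unit vector from right ascension/declination in degrees.

    Rotating this into the rover's local frame for a given time and landing site is
    not done here.
    """
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))
```

Texturing the sphere then means sampling the panorama at the pixel whose direction matches each sphere vertex, and placing each star at its (rotated) direction on the same sphere behind the terrain.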
Use of 3D techniques for virtual production
NASA Astrophysics Data System (ADS)
Grau, Oliver; Price, Marc C.; Thomas, Graham A.
2000-12-01
Virtual production for broadcast is currently used mainly in the form of virtual studios, where the resulting media is a sequence of 2D images. With the steady increase of 3D computing power in home PCs and technical progress in 3D display technology, the content industry is looking for new kinds of program material that make use of 3D technology. Applications range from the analysis of sports scenes and 3DTV up to the creation of fully immersive content. In a virtual studio, a camera films one or more actors in a controlled environment. The pictures of the actors can be segmented very accurately in real time using chroma keying techniques. The isolated silhouette can be integrated into a new synthetic virtual environment using a studio mixer. So far, the resulting shape description of the actors is only 2D. For more sophisticated optical interactions of the actors with the virtual environment, such as occlusions and shadows, an object-based 3D description of scenes is needed. However, the requirements on shape accuracy, and the kind of representation, differ according to the application. This contribution gives an overview of requirements and approaches for the generation of an object-based 3D description in various applications studied by the BBC R&D department. An enhanced Virtual Studio for 3D programs is proposed that covers a range of applications for virtual production.
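The chroma keying step can be sketched as a per-pixel test against the backing color, followed by compositing the resulting 2D silhouette over a virtual background. This is a deliberately simple hard key based on RGB distance; production keyers in broadcast studios use more refined color-difference and soft-edge methods, and the key color, tolerance, and function names here are assumptions.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def chroma_key_mask(image: List[List[RGB]], key: RGB = (0, 255, 0),
                    tolerance: float = 100.0) -> List[List[bool]]:
    """True (foreground) where a pixel lies farther than `tolerance` from the key color."""
    def distance(p: RGB) -> float:
        return sum((a - b) ** 2 for a, b in zip(p, key)) ** 0.5
    return [[distance(p) > tolerance for p in row] for row in image]


def composite(fg: List[List[RGB]], bg: List[List[RGB]],
              mask: List[List[bool]]) -> List[List[RGB]]:
    """Keep foreground pixels where the mask is True, otherwise show the virtual set."""
    return [[f if m else b for f, b, m in zip(fr, br, mr)]
            for fr, br, mr in zip(fg, bg, mask)]
```

The mask is exactly the kind of 2D shape description the abstract refers to: it carries no depth, which is why effects such as occlusions and shadows need an object-based 3D description.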
Human machine interface by using stereo-based depth extraction
NASA Astrophysics Data System (ADS)
Liao, Chao-Kang; Wu, Chi-Hao; Lin, Hsueh-Yi; Chang, Ting-Ting; Lin, Tung-Yang; Huang, Po-Kuan
2014-03-01
The ongoing success of three-dimensional (3D) cinema fuels increasing efforts to spread the commercial success of 3D to new markets. The possibilities of a convincing 3D experience at home, such as three-dimensional television (3DTV), have generated a great deal of interest within the research and standardization community. A central issue for 3DTV is the creation and representation of 3D content. Acquiring scene depth information is a fundamental task in computer vision, yet complex and error-prone. Dedicated range sensors, such as the Time-of-Flight (ToF) camera, can simplify the scene depth capture process and overcome shortcomings of traditional solutions, such as active or passive stereo analysis. Admittedly, currently available ToF sensors deliver only a limited spatial resolution. However, sophisticated depth upscaling approaches use texture information to match depth and video resolution. At Electronic Imaging 2012 we proposed an upscaling routine based on error energy minimization, weighted with edge information from an accompanying video source. In this article we develop our algorithm further. By adding temporal consistency constraints to the upscaling process, we reduce disturbing depth jumps and flickering artifacts in the final 3DTV content. Temporal consistency in depth maps enhances the 3D experience, leading to a wider acceptance of 3D media content. More content in better quality can boost the commercial success of 3DTV.
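The abstract describes depth upscaling as error-energy minimization weighted by edges in the accompanying video, extended with temporal consistency constraints. The authors' routine is not reproduced here; the following is a hypothetical 1D formulation in that spirit. Sparse low-resolution depth samples act as data terms, neighboring depths are tied by smoothness weights that fall toward zero across intensity edges of the guide row, and an optional term pulls the result toward the previous frame's depth. Setting the gradient of the quadratic energy to zero gives a tridiagonal linear system, solved with the Thomas algorithm. Names, weights, and sigma are assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence


def thomas(lower: List[float], diag: List[float], upper: List[float],
           rhs: List[float]) -> List[float]:
    """Solve a tridiagonal system; row i is lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1]."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def upscale_row(guide: Sequence[float], samples: Dict[int, float],
                prev: Optional[Sequence[float]] = None, smooth: float = 1.0,
                sigma: float = 0.1, temporal: float = 0.0) -> List[float]:
    """Dense depth for one row (needs at least one sample or a previous frame), minimizing
       sum_{i in samples} (d_i - z_i)^2
     + smooth * sum_i w_i (d_{i+1} - d_i)^2,  w_i = exp(-(g_{i+1} - g_i)^2 / sigma^2)
     + temporal * sum_i (d_i - prev_i)^2
    """
    n = len(guide)
    w = [smooth * math.exp(-((guide[i + 1] - guide[i]) ** 2) / sigma ** 2)
         for i in range(n - 1)]
    lower, diag, upper, rhs = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(n):
        if i in samples:                       # data term from the low-res depth
            diag[i] += 1.0
            rhs[i] += samples[i]
        if prev is not None and temporal > 0:  # temporal-consistency term
            diag[i] += temporal
            rhs[i] += temporal * prev[i]
        if i > 0:                              # edge-aware smoothness to the left
            diag[i] += w[i - 1]
            lower[i] = -w[i - 1]
        if i < n - 1:                          # edge-aware smoothness to the right
            diag[i] += w[i]
            upper[i] = -w[i]
    return thomas(lower, diag, upper, rhs)
```

With the guide row stepping from dark to bright at one pixel, depth samples on either side stay separated instead of being blurred across the edge; increasing `temporal` trades responsiveness for reduced flicker between frames.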
A color fusion method of infrared and low-light-level images based on visual perception
NASA Astrophysics Data System (ADS)
Han, Jing; Yan, Minmin; Zhang, Yi; Bai, Lianfa
2014-11-01
Color fusion images, obtained by fusing infrared and low-light-level images, contain information from both sources and can help observers understand the multichannel images comprehensively. However, simple fusion may lose target information because targets are inconspicuous in long-distance infrared and low-light-level images, while extracting targets indiscriminately seriously degrades perception of the scene. To solve this problem, a new fusion method based on visual perception is proposed in this paper. Visual target extraction ("what" information) and a parallel processing mechanism are incorporated into traditional color fusion methods. The infrared and low-light-level color fusion images are obtained through efficient learning of typical targets. Experimental results show the effectiveness of the proposed method: the fusion images produced by our algorithm not only improve the target detection rate but also retain rich natural information about the scenes.
Storage and retrieval of large digital images
Bradley, J.N.
1998-01-20
Image compression and viewing are implemented with (1) a method for performing DWT-based compression on a large digital image with a computer system possessing a two-level system of memory and (2) a method for selectively viewing areas of the image from its compressed representation at multiple resolutions and, if desired, in a client-server environment. The compression of a large digital image I(x,y) is accomplished by first defining a plurality of discrete tile image data subsets T_ij(x,y) that, upon superposition, form the complete set of image data I(x,y). A seamless wavelet-based compression process is performed on I(x,y) by successively inputting the tiles T_ij(x,y) in a selected sequence to a DWT routine and storing the resulting DWT coefficients in a first, primary memory. These coefficients are periodically compressed and transferred to a secondary memory to maintain sufficient space in the primary memory for data processing. The sequence of DWT operations on the tiles T_ij(x,y) effectively calculates a seamless DWT of I(x,y). Data retrieval consists of specifying a resolution and a region of I(x,y) for display. The subset of stored DWT coefficients corresponding to each requested scene is determined and then decompressed for input to an inverse DWT, the output of which forms the image display. The repeated process by which image views are specified may take the form of an interaction with a computer pointing device on an image display from a previous retrieval. 6 figs.
Radiometrically accurate scene-based nonuniformity correction for array sensors.
Ratliff, Bradley M; Hayat, Majeed M; Tyo, J Scott
2003-10-01
A novel radiometrically accurate scene-based nonuniformity correction (NUC) algorithm is described. The technique combines absolute calibration with a recently reported algebraic scene-based NUC algorithm. The technique is based on the following principle: first, detectors along the perimeter of the focal-plane array are absolutely calibrated; then the calibration is transported to the remaining uncalibrated interior detectors through the application of the algebraic scene-based algorithm, which utilizes pairs of image frames exhibiting arbitrary global motion. The key advantage of this technique is that it can obtain radiometric accuracy during NUC without disrupting camera operation. Accurate estimates of the bias nonuniformity can be achieved with relatively few frames, in some cases fewer than ten frame pairs. Advantages of this technique are discussed, and a thorough performance analysis is presented using simulated and real infrared imagery.
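The "transport" of calibration from the perimeter inward can be illustrated with a deliberately simplified model: one detector row, a bias-only nonuniformity y[i] = s[i] + b[i], and a scene shift of exactly one detector between the two frames of each pair. Under those assumptions, frame_b[i+1] - frame_a[i] = b[i+1] - b[i], so the bias propagates from the calibrated edge detector. This is a sketch of the principle, not the published algorithm, which handles 2-D arrays and arbitrary global motion.

```python
def transport_bias(pairs, b0):
    """Propagate an absolute edge calibration b0 along a 1-D detector row.

    pairs: list of (frame_a, frame_b) tuples; in each pair the scene is
    assumed to shift by exactly +1 detector from frame_a to frame_b.
    Averaging the differences over several pairs reduces noise.
    Returns estimated biases b[0..n-1], with b[0] = b0.
    """
    n = len(pairs[0][0])
    biases = [b0]
    for i in range(n - 1):
        # Detector i+1 in frame_b sees the radiance detector i saw in frame_a.
        diffs = [fb[i + 1] - fa[i] for fa, fb in pairs]
        biases.append(biases[-1] + sum(diffs) / len(diffs))
    return biases
```

Because each bias is built from its neighbor, errors accumulate along the row; this is one reason averaging over several frame pairs matters in practice.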
Active polarization descattering.
Treibitz, Tali; Schechner, Yoav Y
2009-03-01
Vision in scattering media is important but challenging. Images suffer from poor visibility due to backscattering and attenuation. Most prior methods for scene recovery use active illumination scanners (structured and gated), which can be slow and cumbersome, while natural illumination is inapplicable to dark environments. The current paper addresses the need for a non-scanning recovery method that uses active scene irradiance. We study the formation of images under widefield artificial illumination. Based on the formation model, the paper presents an approach for recovering the object signal. It also yields rough information about the 3D scene structure. The approach can work with compact, simple hardware having active widefield, polychromatic polarized illumination. The camera is fitted with a polarization analyzer. Two frames of the scene are taken, with different states of the analyzer or polarizer. A recovery algorithm follows the acquisition. It allows both the backscatter and the object reflection to be partially polarized. It thus unifies and generalizes prior polarization-based methods, which had assumed exclusive polarization of either of these components. The approach is limited to an effective range due to image noise and illumination falloff, so the limits and noise sensitivity are analyzed. We demonstrate the approach in underwater field experiments.
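The two-frame recovery with partially polarized backscatter B and object signal S can be written as a per-pixel linear system. With I_tot = I_max + I_min = S + B and dI = I_max - I_min = p_obj*S + p_scat*B, solving gives S = (p_scat*I_tot - dI) / (p_scat - p_obj). The sketch below assumes the degrees of polarization p_scat and p_obj are known and constant across the image (in practice they must be estimated), works on flattened pixel lists, and ignores the attenuation compensation and noise handling discussed in the paper.

```python
def recover_object(i_max, i_min, p_scat, p_obj=0.0):
    """Per-pixel separation of object signal S and backscatter B.

    i_max, i_min: flattened frames taken with two orthogonal analyzer states.
    p_scat, p_obj: degrees of polarization of backscatter and object signal
    (assumed known and spatially constant in this sketch).
    Setting p_obj = 0 recovers the older unpolarized-object special case.
    """
    if p_scat == p_obj:
        raise ValueError("p_scat and p_obj must differ")
    s_out, b_out = [], []
    for hi, lo in zip(i_max, i_min):
        total, diff = hi + lo, hi - lo  # total = S + B, diff = p_obj*S + p_scat*B
        s_out.append((p_scat * total - diff) / (p_scat - p_obj))
        b_out.append((diff - p_obj * total) / (p_scat - p_obj))
    return s_out, b_out
```

The denominator p_scat - p_obj also shows why the method degrades when object and backscatter are similarly polarized: noise in dI is amplified by 1 / (p_scat - p_obj).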
Side information in coded aperture compressive spectral imaging
NASA Astrophysics Data System (ADS)
Galvis, Laura; Arguello, Henry; Lau, Daniel; Arce, Gonzalo R.
2017-02-01
Coded aperture compressive spectral imagers sense a three-dimensional spectral data cube by using two-dimensional projections of the coded and spectrally dispersed source. These imaging systems often rely on focal-plane array (FPA) detectors, spatial light modulators (SLMs), digital micromirror devices (DMDs), and dispersive elements. Using DMDs to implement the coded apertures facilitates the capture of multiple projections, each admitting a different coded aperture pattern. The DMD makes it possible not only to collect a sufficient number of measurements for spectrally rich or spatially detailed scenes but also to design the spatial structure of the coded apertures to maximize the information content of the compressive measurements. Although sparsity is usually the only signal characteristic assumed for reconstruction in compressive sensing, other forms of prior information, such as side information, have been included as a way to improve reconstruction quality. This paper presents the coded aperture design in a compressive spectral imager with side information in the form of RGB images of the scene. Using RGB images as side information in the compressive sensing architecture has two main advantages: the RGB image is used not only to improve the reconstruction quality but also to optimally design the coded apertures for the sensing process. The coded aperture design is based on the RGB scene, so the coded aperture structure exploits key features such as scene edges. Reconstructions from real noisy compressed measurements demonstrate the benefit of the designed coded apertures, in addition to the improvement in reconstruction quality obtained by using side information.
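To make "two-dimensional projections of the coded and spectrally dispersed source" concrete, the sketch below implements a simplified single-disperser forward model: each band is masked by the coded aperture, shifted by one detector column per band index, and summed on the detector. The one-column-per-band dispersion and the function name are illustrative assumptions; the imager in the paper may use a different geometry. Capturing several DMD shots simply repeats this with different code patterns.

```python
def cassi_measure(cube, code):
    """Simplified single-disperser coded-aperture forward model.

    cube[k][m][n]: spectral band k of an M x N scene.
    code[m][n]: binary coded aperture (1 passes light, 0 blocks it).
    The disperser is assumed to shift band k by k columns, so the
    detector has M rows and N + L - 1 columns for L bands:
        g[m][n + k] += code[m][n] * cube[k][m][n]
    """
    bands, rows, cols = len(cube), len(cube[0]), len(cube[0][0])
    g = [[0] * (cols + bands - 1) for _ in range(rows)]
    for k in range(bands):
        for m in range(rows):
            for n in range(cols):
                g[m][n + k] += code[m][n] * cube[k][m][n]
    return g
```

Reconstruction inverts this many-to-one mapping using priors (sparsity, and here RGB side information), and designing `code` to preserve scene edges is where the RGB image enters the sensing stage.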
Scene-based nonuniformity corrections for optical and SWIR pushbroom sensors.
Leathers, Robert; Downes, Trijntje; Priest, Richard
2005-06-27
We propose and evaluate several scene-based methods for computing nonuniformity corrections for visible or near-infrared pushbroom sensors. These methods can be used to compute new nonuniformity correction values or to repair or refine existing radiometric calibrations. For a given data set, the preferred method depends on the quality of the data, the type of scenes being imaged, and the existence and quality of a laboratory calibration. We demonstrate our methods with data from several different sensor systems and provide a generalized approach to be taken for any new data set.
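One widely used scene-based correction for pushbroom data, shown here only as an illustration and not necessarily one of the methods evaluated in the paper, is moment matching: because each cross-track detector images a full column of the along-track scene, its gain and offset can be adjusted so that its mean and standard deviation match the image-wide statistics. The sketch assumes rows are along-track lines, columns are detectors, and every detector sees statistically similar scene content.

```python
from statistics import mean, pstdev


def moment_matching_nuc(image):
    """Scene-based NUC for a pushbroom image by moment matching.

    image: 2-D list, rows = along-track lines, columns = cross-track detectors.
    Each column is linearly rescaled so its mean and population standard
    deviation match those of the whole image.
    """
    all_vals = [v for row in image for v in row]
    g_mean, g_std = mean(all_vals), pstdev(all_vals)
    corrected_cols = []
    for col in zip(*image):  # iterate over detectors
        c_mean, c_std = mean(col), pstdev(col)
        gain = g_std / c_std if c_std > 0 else 1.0
        corrected_cols.append([(v - c_mean) * gain + g_mean for v in col])
    return [list(r) for r in zip(*corrected_cols)]
```

The assumption of similar statistics fails when, for example, a river runs along-track under a few detectors, which is one reason the choice of method depends on the scene type and on any existing laboratory calibration.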
3D reconstruction based on light field images
NASA Astrophysics Data System (ADS)
Zhu, Dong; Wu, Chunhong; Liu, Yunluo; Fu, Dongmei
2018-04-01
This paper proposes a method for reconstructing a three-dimensional (3D) scene from two light field images captured by a Lytro Illum camera. Sub-aperture images are first extracted from the light field images, and the scale-invariant feature transform (SIFT) is used for feature registration on selected sub-aperture images. A structure-from-motion (SfM) algorithm is then applied to the registered sub-aperture images to reconstruct the 3D scene, yielding a sparse 3D point cloud. The method shows that 3D reconstruction can be achieved with only two light field camera captures, rather than the dozen or more captures required with traditional cameras. This reduces the time and labor of 3D reconstruction with traditional digital cameras and enables faster, more convenient, and accurate reconstruction.
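The first step, extracting sub-aperture images, can be sketched for an idealized plenoptic image in which every microlens covers an n x n block of sensor pixels on a square, axis-aligned grid. Taking pixel (u, v) under every lenslet gives the view through one sub-aperture of the main lens. This is an assumption for illustration: real Lytro Illum raw images use a hexagonal lenslet grid and must be calibrated and resampled first. SIFT matching and SfM would then run on the extracted views (not shown).

```python
def sub_aperture(raw, n, u, v):
    """Extract sub-aperture image (u, v) from an idealized lenslet image.

    raw: 2-D list of sensor pixels; each microlens covers an n x n block.
    Pixel (u, v) under every lenslet sees the scene through the same
    sub-aperture of the main lens, so collecting them forms one view.
    """
    if not (0 <= u < n and 0 <= v < n):
        raise ValueError("(u, v) must index a pixel under the lenslet")
    rows, cols = len(raw) // n, len(raw[0]) // n
    return [[raw[s * n + u][t * n + v] for t in range(cols)] for s in range(rows)]
```

Views with different (u, v) differ by a small baseline inside the main-lens aperture, which is what lets a single capture contribute multiple viewpoints to the reconstruction.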
Spatial detection of tv channel logos as outliers from the content
NASA Astrophysics Data System (ADS)
Ekin, Ahmet; Braspenning, Ralph
2006-01-01
This paper proposes a purely image-based TV channel logo detection algorithm that can detect logos independently of their motion and transparency features. The proposed algorithm can robustly detect any type of logo, such as transparent and animated logos, without requiring any temporal constraints, whereas known methods have to wait for large motion to occur in the scene and assume stationary logos. The algorithm models logo pixels as outliers from the actual scene content, which is represented by multiple 3-D histograms in the YCbCr color space. We use four scene histograms, one for each of the four image corners, because content characteristics change from one corner to another. A further novelty of the proposed algorithm is that the image corners and the areas in which the scene histograms are computed are defined by the Golden Section Rule, a cinematic technique used by professionals. The robustness of the proposed algorithm is demonstrated on a dataset of representative TV content.
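The outlier idea can be sketched as follows: build a coarse 3-D YCbCr histogram from scene pixels in a corner area, then flag candidate pixels whose histogram cell is rare. The corner-box geometry (each box spanning about 0.382 of the frame, i.e. 1 - 1/phi), the 8-level quantization, the threshold `tau`, and all function names are illustrative assumptions, not the paper's exact definitions. The color conversion is the full-range BT.601 (JFIF) form.

```python
from collections import Counter

GOLDEN = 1 - 2 / (1 + 5 ** 0.5)  # 1 - 1/phi, about 0.382


def corner_boxes(width, height):
    """Hypothetical golden-section corner boxes (x0, y0, x1, y1): TL, TR, BL, BR."""
    cw, ch = round(width * GOLDEN), round(height * GOLDEN)
    return [(0, 0, cw, ch), (width - cw, 0, width, ch),
            (0, height - ch, cw, height), (width - cw, height - ch, width, height)]


def rgb_to_ycbcr(r, g, b):
    # Full-range ITU-R BT.601 conversion as used by JPEG/JFIF.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def bin_of(pixel, levels=8):
    # Quantize each YCbCr channel into `levels` bins -> one 3-D histogram cell.
    step = 256 / levels
    return tuple(min(levels - 1, int(ch // step)) for ch in rgb_to_ycbcr(*pixel))


def logo_outliers(scene_pixels, candidate_pixels, tau=0.01, levels=8):
    """Flag candidate RGB pixels whose color cell is rare in the scene histogram."""
    hist = Counter(bin_of(p, levels) for p in scene_pixels)
    total = len(scene_pixels)
    return [hist[bin_of(p, levels)] / total < tau for p in candidate_pixels]
```

Keeping one histogram per corner, as the abstract describes, means a pixel is judged only against the content statistics of its own corner rather than the whole frame.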
Atmosphere-based image classification through luminance and hue
NASA Astrophysics Data System (ADS)
Xu, Feng; Zhang, Yujin
2005-07-01
In this paper, a novel image classification system is proposed. Atmosphere plays an important role in establishing a scene's topic or conveying the message behind the scene's story, and it belongs to the abstract attribute level among semantic levels. First, five atmosphere semantic categories are defined according to the rules of photo and film grammar, followed by the definition of global luminance and hue features. Then hierarchical SVM classifiers are applied. In each classification stage, the corresponding features are extracted and a trained linear SVM separates the input into two classes. After three stages of classification, the five atmosphere categories are obtained. Finally, a text annotation of the atmosphere semantics and the corresponding features is defined in Extensible Markup Language (XML) within MPEG-7, so that it can be integrated into other multimedia applications (such as searching, indexing, and accessing multimedia content). The experiments are performed on Corel images and film frames. The classification results demonstrate the effectiveness of the defined atmosphere semantic classes and the corresponding features.
Image Reconstruction from Data Collected with an Imaging Interferometer
NASA Astrophysics Data System (ADS)
DeSantis, Z. J.; Thurman, S. T.; Hix, T. T.; Ogden, C. E.
The intensity distribution of an incoherent source and the spatial coherence function at some distance away are related by a Fourier transform, via the Van Cittert-Zernike theorem. Imaging interferometers measure the spatial coherence of light propagated from the incoherently illuminated object by combining light from spatially separated points to measure interference fringes. The contrast and phase of the fringe are the amplitude and phase of a Fourier component of the source’s intensity distribution. The Fiber-Coupled Interferometer (FCI) testbed is a visible light, lab-based imaging interferometer designed to test aspects of an envisioned ground-based interferometer for imaging geosynchronous satellites. The front half of the FCI testbed consists of the scene projection optics, which include an incoherently backlit scene located at the focus of a 1 m aperture f/100 telescope. The projected light was collected by the back half of the FCI testbed. The collection optics consisted of three 11 mm aperture fiber-coupled telescopes. Light in the fibers was combined pairwise and dispersed onto a sensor to measure the interference fringe as a function of wavelength, which produces a radial spoke of measurements in the Fourier domain. The visibility function was sampled throughout the Fourier domain by recording fringe data at many different scene rotations and collection telescope separations. Our image reconstruction algorithm successfully produced images for the three scenes we tested: an asymmetric pair of pinholes, a U.S. Air Force resolution bar target, and a satellite scene. The bar target reconstruction shows detail and resolution near the predicted resolution limit. This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions and/or findings expressed are those of the author(s) and should not be interpreted as reflecting the official views or policies of the Department of Defense or the U.S. Government.
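The statement that fringe contrast and phase equal the amplitude and phase of a Fourier component of the source intensity can be checked numerically for a 1-D source. The sketch computes the normalized complex visibility V(f) = sum_x I(x) exp(-2*pi*i*f*x) / sum_x I(x) at a spatial frequency f set by the baseline (and wavelength). For two equal point sources separated by d, V(f) = cos(pi*f*d). The discrete 1-D source and the function name are simplifying assumptions for illustration.

```python
import cmath


def complex_visibility(intensity, positions, freq):
    """Normalized Fourier component of a 1-D incoherent source.

    intensity[i] is the source intensity at positions[i]; freq is the spatial
    frequency sampled by the baseline (cycles per unit of `positions`).
    By the Van Cittert-Zernike theorem, abs() of the result is the fringe
    contrast and cmath.phase() of the result is the fringe phase.
    """
    total = sum(intensity)
    acc = sum(i * cmath.exp(-2j * cmath.pi * freq * x)
              for i, x in zip(intensity, positions))
    return acc / total
```

For the symmetric pinhole pair the visibility is real, so the fringe phase carries no information; an asymmetric pair, like the one tested on the FCI testbed, produces nonzero fringe phases, which the reconstruction needs to recover the source orientation.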
Lossless Compression of Classification-Map Data
NASA Technical Reports Server (NTRS)
Hua, Xie; Klimesh, Matthew
2009-01-01
A lossless image-data-compression algorithm intended specifically for application to classification-map data is based on prediction, context modeling, and entropy coding. The algorithm was formulated, in consideration of the differences between classification maps and ordinary images of natural scenes, so as to compress classification-map data more effectively than general-purpose image-data-compression algorithms do. Classification maps are typically generated from remote-sensing images acquired by instruments aboard aircraft (see figure) and spacecraft. A classification map is a synthetic image that summarizes information derived from one or more original remote-sensing image(s) of a scene. The value assigned to each pixel in such a map is the index of a class that represents some type of content deduced from the original image data (for example, a type of vegetation, a mineral, or a body of water) at the corresponding location in the scene. When classification maps are generated onboard the aircraft or spacecraft, it is desirable to compress the classification-map data in order to reduce the volume of data that must be transmitted to a ground station.
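Why context modeling pays off on classification maps can be shown with a small experiment: estimate the ideal code length of a map under an adaptive frequency model, with and without conditioning on the classes of the west and north neighbors. An arithmetic coder driven by the same model would come close to this total. This is a generic illustration under assumed choices (Laplace-initialized counts, a (west, north) context, no separate prediction step), not the NASA algorithm itself.

```python
import math
from collections import defaultdict


def adaptive_code_length(cmap, num_classes, use_context=True):
    """Ideal code length in bits of a classification map.

    Uses adaptive per-context class counts initialized to 1 so that every
    class always has nonzero probability. With use_context=False, a single
    order-0 model is shared by all pixels.
    """
    counts = defaultdict(lambda: [1] * num_classes)
    bits = 0.0
    for r, row in enumerate(cmap):
        for c, label in enumerate(row):
            if use_context:
                west = row[c - 1] if c > 0 else -1  # -1 marks "outside the map"
                north = cmap[r - 1][c] if r > 0 else -1
                ctx = (west, north)
            else:
                ctx = None
            table = counts[ctx]
            bits -= math.log2(table[label] / sum(table))
            table[label] += 1  # update the model after coding the pixel
    return bits
```

On maps with large homogeneous regions, most contexts quickly become nearly deterministic, so the context-conditioned total falls far below the order-0 total, which is the property general-purpose image coders tuned for natural images do not exploit.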
A view not to be missed: Salient scene content interferes with cognitive restoration
Van der Jagt, Alexander P. N.; Craig, Tony; Brewer, Mark J.; Pearson, David G.
2017-01-01
Attention Restoration Theory (ART) states that built scenes place greater load on attentional resources than natural scenes. This is explained in terms of "hard" and "soft" fascination of built and natural scenes. Given a lack of direct empirical evidence for this assumption, we propose that perceptual saliency of scene content can function as an empirically derived indicator of fascination. Saliency levels were established by measuring speed of scene category detection using a Go/No-Go detection paradigm. Experiment 1 shows that built scenes are more salient than natural scenes. Experiment 2 replicates these findings using greyscale images, ruling out a colour-based response strategy, and additionally shows that built objects in natural scenes affect saliency to a greater extent than the reverse. Experiment 3 demonstrates that the saliency of scene content is directly linked to cognitive restoration using an established restoration paradigm. Overall, these findings demonstrate an important link between the saliency of scene content and related cognitive restoration. PMID:28723975
An earth imaging camera simulation using wide-scale construction of reflectance surfaces
NASA Astrophysics Data System (ADS)
Murthy, Kiran; Chau, Alexandra H.; Amin, Minesh B.; Robinson, M. Dirk
2013-10-01
Developing and testing advanced ground-based image processing systems for earth-observing remote sensing applications presents a unique challenge that requires advanced imagery simulation capabilities. This paper presents an earth-imaging multispectral framing camera simulation system called PayloadSim (PaySim) capable of generating terabytes of photorealistic simulated imagery. PaySim leverages previous work in 3-D scene-based image simulation, adding a novel method for automatically and efficiently constructing 3-D reflectance scenes by draping tiled orthorectified imagery over a geo-registered Digital Elevation Map (DEM). PaySim's modeling chain is presented in detail, with emphasis given to the techniques used to achieve computational efficiency. These techniques as well as cluster deployment of the simulator have enabled tuning and robust testing of image processing algorithms, and production of realistic sample data for customer-driven image product development. Examples of simulated imagery of Skybox's first imaging satellite are shown.
Neural codes of seeing architectural styles
Choo, Heeyoung; Nasar, Jack L.; Nikrahei, Bardia; Walther, Dirk B.
2017-01-01
Images of iconic buildings, such as the CN Tower, instantly transport us to specific places, such as Toronto. Despite the substantial impact of architectural design on people’s visual experience of built environments, we know little about its neural representation in the human brain. In the present study, we have found patterns of neural activity associated with specific architectural styles in several high-level visual brain regions, but not in primary visual cortex (V1). This finding suggests that the neural correlates of the visual perception of architectural styles stem from style-specific complex visual structure beyond the simple features computed in V1. Surprisingly, the network of brain regions representing architectural styles included the fusiform face area (FFA) in addition to several scene-selective regions. Hierarchical clustering of error patterns further revealed that the FFA participated to a much larger extent in the neural encoding of architectural styles than entry-level scene categories. We conclude that the FFA is involved in fine-grained neural encoding of scenes at a subordinate-level, in our case, architectural styles of buildings. This study for the first time shows how the human visual system encodes visual aspects of architecture, one of the predominant and longest-lasting artefacts of human culture. PMID:28071765
Phase information contained in meter-scale SAR images
NASA Astrophysics Data System (ADS)
Datcu, Mihai; Schwarz, Gottfried; Soccorsi, Matteo; Chaabouni, Houda
2007-10-01
The properties of single-look complex SAR satellite images have already been analyzed by many investigators. A common belief is that, apart from inverse SAR methods or polarimetric applications, no information can be gained from the phase of each pixel. This belief is based on the assumption that phases are uniformly distributed at random when a sufficient number of small-scale scatterers are mixed in each image pixel. However, the random phase assumption no longer holds for typical high-resolution urban remote sensing scenes, where a limited number of prominent human-made scatterers with near-regular shapes and sub-meter sizes lead to correlated phase patterns. When the pixel size shrinks to a critical threshold of about 1 meter, the reflectance of built-up urban scenes becomes dominated by typical metal reflectors, corner-like structures, and multiple scattering. The resulting phases are hard to model, but one can try to classify a scene based on the phase characteristics of neighboring image pixels. We provide a "cooking recipe" for analyzing existing phase patterns that extend over neighboring pixels.
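A simple way to test the random-phase assumption on a patch of a single-look complex image is to measure how consistent the phase differences between neighboring pixels are. The sketch computes the mean resultant length of exp(i * (phi[n+1] - phi[n])) over horizontally adjacent pixels: values near 0 are consistent with independent uniform phases, values near 1 indicate a correlated phase pattern. This statistic is a generic circular-statistics illustration chosen here as an assumption, not the specific recipe given in the paper.

```python
import cmath


def neighbor_phase_coherence(phases):
    """Mean resultant length of horizontal neighbor phase differences.

    phases: 2-D list of pixel phases in radians (e.g. cmath.phase of SLC pixels).
    Returns a value in [0, 1]: about 0 for independent uniform phases,
    1 when the phase difference is identical everywhere.
    """
    acc, n = 0j, 0
    for row in phases:
        for a, b in zip(row, row[1:]):
            acc += cmath.exp(1j * (b - a))  # unit phasor of the phase step
            n += 1
    return abs(acc) / n
```

Computing the statistic in a sliding window and in several directions would give a per-pixel feature map that a classifier could use, which is the spirit of classifying scenes from neighboring-pixel phase characteristics.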
NASA Astrophysics Data System (ADS)
Altschuler, Bruce R.; Monson, Keith L.
1998-03-01
Representation of crime scenes as virtual reality 3D computer displays promises to become a useful and important tool for law enforcement evaluation and analysis, forensic identification, pathological study, and archival presentation during court proceedings. Use of these methods for assessment of evidentiary materials demands complete accuracy of reproduction of the original scene, both in data collection and in its eventual virtual reality representation. Recording spatially accurate information as soon as possible after law enforcement personnel first arrive is advantageous for unstable or hazardous crime scenes and reduces the possibility that either inadvertent measurement error or deliberate falsification may occur or be alleged concerning the processing of a scene. Detailed measurements and multimedia archiving of critical surface topographical details in a calibrated, uniform, consistent, and standardized quantitative 3D coordinate method are needed. These methods would give professional personnel in initial contact with a crime scene the means for remote, non-contacting, immediate, thorough, and unequivocal documentation of the contents of the scene. Measurements of the relative and absolute global positions of objects and victims, and their dispositions within the scene before their relocation and detailed examination, could be made. Resolution must be sufficient to map both small and large objects. Equipment must be able to map regions at varied resolution as collected from different perspectives. Progress is presented in devising methods for collecting and archiving 3D spatial numerical data from crime scenes, sufficient for law enforcement needs, by remote laser structured light and video imagery. Two types of simulation studies were done. One study evaluated the potential of 3D topographic mapping and 3D telepresence using a robotic platform for explosive ordnance disassembly. The second study involved using the laser mapping system on a fixed optical bench with simulated crime scene models of people and furniture to assess the feasibility, requirements, and utility of such a system for crime scene documentation and analysis.
NASA Astrophysics Data System (ADS)
Torkildsen, H. E.; Hovland, H.; Opsahl, T.; Haavardsholm, T. V.; Nicolas, S.; Skauli, T.
2014-06-01
In some applications of multi- or hyperspectral imaging, it is important to have a compact sensor. The most compact spectral imaging sensors are based on spectral filtering in the focal plane. For hyperspectral imaging, it has been proposed to use a "linearly variable" bandpass filter in the focal plane, combined with scanning of the field of view. As the image of a given object in the scene moves across the field of view, it is observed through parts of the filter with varying center wavelength, and a complete spectrum can be assembled. However, if the radiance received from the object varies with viewing angle or with time, then the reconstructed spectrum will be distorted. We describe a camera design where this hyperspectral functionality is traded for multispectral imaging with better spectral integrity. Spectral distortion is minimized by using a patterned filter with 6 bands arranged close together, so that a scene object is seen by each spectral band in rapid succession and with minimal change in viewing angle. The set of 6 bands is repeated 4 times so that the spectral data can be checked for internal consistency. Still, the total extent of the filter in the scan direction is small. Therefore, the remainder of the image sensor can be used for conventional imaging, with potential for using motion tracking and 3D reconstruction to support the spectral imaging function. We show detailed characterization of the point spread function of the camera, demonstrating the importance of such characterization as a basis for image reconstruction. A simplified image reconstruction based on feature-based image coregistration is shown to yield reasonable results. Elimination of spectral artifacts due to scene motion is demonstrated.
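The value of repeating the 6-band pattern 4 times can be sketched as follows. Assume (as a simplification) that the filter has one sensor row per band stripe, with band index = row mod 6, and that scanning moves a scene point exactly one row per frame, so its track yields 24 samples. Averaging the repeats gives the spectrum, and the per-band spread flags inconsistent data, for example from scene motion or angle-dependent radiance. Row layout and names are illustrative assumptions.

```python
NUM_BANDS, REPEATS = 6, 4  # from the text: 6 bands, pattern repeated 4 times


def assemble_spectrum(track):
    """Assemble a spectrum from one scene point's samples along the filter.

    track[r]: signal of the point when imaged on filter row r, for
    r = 0 .. NUM_BANDS * REPEATS - 1 (assumes one row per band stripe and
    one row of motion per frame). Returns (mean spectrum, per-band spread);
    a large spread indicates spectrally inconsistent data.
    """
    if len(track) != NUM_BANDS * REPEATS:
        raise ValueError("need one sample per filter row")
    samples = [[] for _ in range(NUM_BANDS)]
    for row, value in enumerate(track):
        samples[row % NUM_BANDS].append(value)
    spectrum = [sum(s) / len(s) for s in samples]
    spread = [max(s) - min(s) for s in samples]
    return spectrum, spread
```

Because the 6 bands of one repeat are adjacent, each band is sampled within a few frames of the others, which keeps the viewing-angle change small compared with a long linearly variable filter.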
Acceptable bit-rates for human face identification from CCTV imagery
NASA Astrophysics Data System (ADS)
Tsifouti, Anastasia; Triantaphillidou, Sophie; Bilissi, Efthimia; Larabi, Mohamed-Chaker
2013-01-01
The objective of this investigation is to produce recommendations for acceptable bit-rates of CCTV footage of people onboard London buses. The majority of CCTV recorders on buses use a proprietary format based on the H.264/AVC video coding standard, exploiting both spatial and temporal redundancy. Low bit-rates are favored in the CCTV industry, but they compromise the usefulness of the recorded imagery. In this context, usefulness is defined by the presence of enough facial information remaining in the compressed image to allow a specialist to identify a person. The investigation includes four steps: 1) collection of representative video footage; 2) grouping of video scenes based on content attributes; 3) psychophysical investigations to identify key scenes, which are most affected by compression; 4) testing of recording systems using the key scenes and further psychophysical investigations. The results are highly dependent upon scene content. For example, very dark and very bright scenes were the most challenging to compress, requiring higher bit-rates to maintain useful information. The acceptable bit-rates were also found to depend on the specific CCTV system used to compress the footage, which makes it difficult to draw conclusions about universal 'average' bit-rates.
Visual encoding and fixation target selection in free viewing: presaccadic brain potentials
Nikolaev, Andrey R.; Jurica, Peter; Nakatani, Chie; Plomp, Gijs; van Leeuwen, Cees
2013-01-01
In scrutinizing a scene, the eyes alternate between fixations and saccades. During a fixation, two component processes can be distinguished: visual encoding and selection of the next fixation target. We aimed to distinguish the neural correlates of these processes in the electrical brain activity prior to a saccade onset. Participants viewed color photographs of natural scenes, in preparation for a change detection task. Then, for each participant and each scene we computed an image heat map, with temperature representing the duration and density of fixations. The temperature difference between the start and end points of saccades was taken as a measure of the expected task-relevance of the information concentrated in specific regions of a scene. Visual encoding was evaluated according to whether subsequent change was correctly detected. Saccades with larger temperature difference were more likely to be followed by correct detection than ones with smaller temperature differences. The amplitude of presaccadic activity over anterior brain areas was larger for correct detection than for detection failure. This difference was observed for short “scrutinizing” but not for long “explorative” saccades, suggesting that presaccadic activity reflects top-down saccade guidance. Thus, successful encoding requires local scanning of scene regions which are expected to be task-relevant. Next, we evaluated fixation target selection. Saccades “moving up” in temperature were preceded by presaccadic activity of higher amplitude than those “moving down”. This finding suggests that presaccadic activity reflects attention deployed to the following fixation location. Our findings illustrate how presaccadic activity can elucidate concurrent brain processes related to the immediate goal of planning the next saccade and the larger-scale goal of constructing a robust representation of the visual scene. PMID:23818877
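The heat-map measure described above can be sketched as a sum of Gaussian kernels centered on fixations and weighted by fixation duration, so that "temperature" grows with both duration and density. A saccade's temperature change is then the temperature at its end point minus that at its start point. The Gaussian kernel, the sigma value in pixels, and the function names are illustrative assumptions; the study's exact heat-map construction may differ.

```python
import math


def temperature(point, fixations, sigma=30.0):
    """Heat-map 'temperature' at point (x, y), in pixels.

    fixations: list of (x, y, duration_ms). Each fixation contributes a
    Gaussian weighted by its duration; sigma is an assumed kernel width.
    """
    px, py = point
    return sum(d * math.exp(-((px - x) ** 2 + (py - y) ** 2) / (2 * sigma ** 2))
               for x, y, d in fixations)


def saccade_temperature_change(start, end, fixations, sigma=30.0):
    """Positive: the saccade 'moves up' in temperature; negative: 'moves down'."""
    return temperature(end, fixations, sigma) - temperature(start, fixations, sigma)
```

Sorting saccades by this signed change is what allows presaccadic activity to be compared between saccades "moving up" and "moving down" in temperature.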
Systems and Methods for Automated Water Detection Using Visible Sensors
NASA Technical Reports Server (NTRS)
Rankin, Arturo L. (Inventor); Matthies, Larry H. (Inventor); Bellutta, Paolo (Inventor)
2016-01-01
Systems and methods are disclosed that include automated machine vision that can utilize images of scenes captured by a 3D imaging system configured to image light within the visible light spectrum to detect water. One embodiment includes autonomously detecting water bodies within a scene including capturing at least one 3D image of a scene using a sensor system configured to detect visible light and to measure distance from points within the scene to the sensor system, and detecting water within the scene using a processor configured to detect regions within each of the at least one 3D images that possess at least one characteristic indicative of the presence of water.
A statistical model for radar images of agricultural scenes
NASA Technical Reports Server (NTRS)
Frost, V. S.; Shanmugan, K. S.; Holtzman, J. C.; Stiles, J. A.
1982-01-01
The statistical model derived and validated here for radar images containing many different homogeneous fields predicts the probability density functions of radar images of entire agricultural scenes, allowing histograms of large scenes composed of a variety of crops to be described. The model accurately predicts Seasat-A SAR images of agricultural scenes on the basis of three assumptions: each field has the same SNR, all target classes cover approximately the same area, and the true reflectivity characterizing each individual target class is a uniformly distributed random variable. The model is expected to be useful in the design of data processing algorithms and for scene analysis using radar images.
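The structure of such a scene-level model can be illustrated by mixing a per-field speckle distribution over a uniformly distributed mean reflectivity. The sketch assumes single-look intensity with exponential speckle, p(I | s) = exp(-I/s)/s (which gives every field the same SNR), and s uniform on [s_lo, s_hi], so p(I) = (1/(s_hi - s_lo)) * integral of exp(-I/s)/s ds, evaluated with the midpoint rule. These distributional choices are assumptions for illustration; the paper's exact speckle model may differ.

```python
import math


def scene_intensity_pdf(intensity, sigma_lo, sigma_hi, steps=1000):
    """Scene-level intensity pdf under illustrative assumptions.

    Per field: exponential (single-look) speckle with mean reflectivity s.
    Across fields: s uniformly distributed on [sigma_lo, sigma_hi].
    The integral over s is approximated with the midpoint rule.
    """
    if not 0 < sigma_lo < sigma_hi:
        raise ValueError("require 0 < sigma_lo < sigma_hi")
    width = (sigma_hi - sigma_lo) / steps
    total = 0.0
    for k in range(steps):
        s = sigma_lo + (k + 0.5) * width
        total += math.exp(-intensity / s) / s
    return total * width / (sigma_hi - sigma_lo)
```

At I = 0 the integral has the closed form ln(s_hi / s_lo) / (s_hi - s_lo), which is a convenient check on the numerical integration.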
Navigation domain representation for interactive multiview imaging.
Maugey, Thomas; Daribo, Ismael; Cheung, Gene; Frossard, Pascal
2013-09-01
Enabling users to interactively navigate through different viewpoints of a static scene is an interesting new functionality in 3D streaming systems. While it opens exciting perspectives toward rich multimedia applications, it requires the design of novel representations and coding techniques to solve the new challenges imposed by interactive navigation. In particular, the encoder must prepare a priori a compressed media stream that is flexible enough to enable the free selection of multiview navigation paths by different streaming media clients. Interactivity clearly brings new design constraints: the encoder is unaware of the exact decoding process, while the decoder has to reconstruct information from incomplete subsets of data, since the server generally cannot transmit images for all possible viewpoints due to resource constraints. In this paper, we propose a novel multiview data representation that permits us to satisfy bandwidth and storage constraints in an interactive multiview streaming system. In particular, we partition the multiview navigation domain into segments, each of which is described by a reference image (color and depth data) and some auxiliary information. The auxiliary information enables the client to recreate any viewpoint in the navigation segment via view synthesis. The decoder is then able to navigate freely in the segment without further data requests to the server; it requests additional data only when it moves to a different segment. We discuss the benefits of this novel representation in interactive navigation systems and further propose a method to optimize the partitioning of the navigation domain into independent segments under bandwidth and storage constraints. Experimental results confirm the potential of the proposed representation; namely, our system achieves compression performance similar to that of classical inter-view coding, while providing the high level of flexibility required for interactive streaming. Because of these unique properties, our new framework represents a promising solution for 3D data representation in novel interactive multimedia services.
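The segment-based navigation rule can be made concrete with a minimal sketch. It assumes a one-dimensional navigation domain (for example, camera positions along a line) that has already been partitioned; the class `NavigationClient` and its methods are hypothetical names, and neither view synthesis nor the partition optimization is modeled. Only the rule that the client contacts the server when it crosses into a new segment is shown.

```python
from bisect import bisect_right


class NavigationClient:
    """Hypothetical client for a 1-D navigation domain split into segments.

    segment_starts: sorted viewpoint positions at which each segment begins.
    """

    def __init__(self, segment_starts):
        self.segment_starts = list(segment_starts)
        self.cached_segment = None  # segment whose reference image + side info is held
        self.requests = 0           # number of server requests issued so far

    def segment_of(self, viewpoint):
        """Index of the segment that contains `viewpoint`."""
        idx = bisect_right(self.segment_starts, viewpoint) - 1
        if idx < 0:
            raise ValueError("viewpoint lies before the first segment")
        return idx

    def navigate(self, viewpoint):
        idx = self.segment_of(viewpoint)
        if idx != self.cached_segment:
            # Crossing into a new segment: fetch its reference (color + depth)
            # image and auxiliary information from the server.
            self.requests += 1
            self.cached_segment = idx
        # Inside the cached segment the view would be synthesized locally
        # (view synthesis itself is not modeled in this sketch).
        return idx
```

For example, with segments starting at 0.0, 3.0 and 7.0, visiting viewpoints 0.5, 1.2, 2.9, 3.1, 6.0 and 7.5 triggers only three server requests, one per segment entered.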
NASA Astrophysics Data System (ADS)
Kuvich, Gary
2003-08-01
Vision is part of a larger information system that converts visual information into knowledge structures. These structures drive the vision process, resolve ambiguity and uncertainty via feedback projections, and provide image understanding, that is, an interpretation of visual information in terms of such knowledge models. The human brain has been found to emulate knowledge structures in the form of network-symbolic models, which implies an important paradigm shift in our knowledge about the brain: from neural networks to "cortical software". Symbols, predicates and grammars naturally emerge in such active multilevel hierarchical networks, and logic is simply a way of restructuring such models. The brain analyzes an image as a graph-type decision structure created via multilevel hierarchical compression of visual information. Mid-level vision processes, such as clustering, perceptual grouping, and separation of figure from ground, are special kinds of graph/network transformations. They convert low-level image structure into a set of more abstract structures, which represent objects and the visual scene, making them easy to analyze with higher-level knowledge structures. Higher-level vision phenomena are the results of such analysis. Composition of network-symbolic models works similarly to frames and agents, combining learning, classification, and analogy with higher-level model-based reasoning in a single framework. Such models do not require supercomputers. Based on these principles, and using methods of computational intelligence, an image understanding system can convert images into network-symbolic knowledge models and effectively resolve uncertainty and ambiguity, providing a unifying representation for perception and cognition. This allows the creation of new intelligent computer vision systems for the robotics and defense industries.
NASA Astrophysics Data System (ADS)
Keane, Tommy P.; Saber, Eli; Rhody, Harvey; Savakis, Andreas; Raj, Jeffrey
2012-04-01
Contemporary research in automated panorama creation utilizes camera calibration or extensive knowledge of camera locations and relations to each other to achieve successful results. Research in image registration attempts to restrict these same camera parameters or apply complex point-matching schemes to overcome the complications found in real-world scenarios. This paper presents a novel automated panorama creation algorithm by developing an affine transformation search based on maximized mutual information (MMI) for region-based registration. Standard MMI techniques have been limited to applications with airborne/satellite imagery or medical images. We show that a novel MMI algorithm can approximate an accurate registration between views of realistic scenes of varying depth distortion. The proposed algorithm has been developed using stationary, color, surveillance video data for a scenario with no a priori camera-to-camera parameters. This algorithm is robust for strict- and nearly-affine-related scenes, while providing a useful approximation for the overlap regions in scenes related by a projective homography or a more complex transformation, allowing for a set of efficient and accurate initial conditions for pixel-based registration.
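The score that an MMI registration maximizes can be illustrated with a short sketch that computes mutual information from a joint intensity histogram. It assumes flattened grayscale images of equal size with integer intensities in [0, 256) and a 16-bin quantization; the function name and bin count are illustrative choices, not taken from the paper. In a full registration, a search over affine parameters would warp one region and keep the transformation that maximizes this value; that search is not shown.

```python
import math
from collections import Counter


def mutual_information(a, b, bins=16, max_value=256):
    """Mutual information (in bits) between two equally sized grayscale images.

    a, b: flat sequences of integer intensities in [0, max_value).
    """
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    width = max_value / bins
    qa = [int(v // width) for v in a]   # quantize intensities into histogram bins
    qb = [int(v // width) for v in b]
    n = len(qa)
    joint = Counter(zip(qa, qb))        # joint histogram of co-located pixels
    pa, pb = Counter(qa), Counter(qb)   # marginal histograms
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        mi += p_ij * math.log2(p_ij / ((pa[i] / n) * (pb[j] / n)))
    return mi
```

An image compared with itself yields its own entropy (1 bit for an image with two equally frequent levels), while an image compared with a constant image yields 0, because knowing one tells nothing about the other.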
ASTER cloud coverage reassessment using MODIS cloud mask products
NASA Astrophysics Data System (ADS)
Tonooka, Hideyuki; Omagari, Kunjuro; Yamamoto, Hirokazu; Tachikawa, Tetsushi; Fujita, Masaru; Paitaer, Zaoreguli
2010-10-01
In the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Project, two kinds of algorithms are used for cloud assessment in Level-1 processing. The first algorithm, based on the LANDSAT-5 TM Automatic Cloud Cover Assessment (ACCA) algorithm, is used for some daytime scenes observed with only the VNIR bands and for all nighttime scenes; the second algorithm, based on the LANDSAT-7 ETM+ ACCA algorithm, is used for most daytime scenes observed with all spectral bands. However, the first algorithm does not work well because it lacks some spectral bands sensitive to cloud detection, and both algorithms have been less accurate over snow/ice-covered areas since April 2008, when the SWIR subsystem developed problems. In addition, they perform less well for some combinations of surface type and sun elevation angle. We have therefore developed an ASTER cloud coverage reassessment system using MODIS cloud mask (MOD35) products and have reassessed cloud coverage for all archived ASTER scenes (>1.7 million scenes). All of the new cloud coverage data are included in the Image Management System (IMS) databases of the ASTER Ground Data System (GDS) and NASA's Land Processes Distributed Active Archive Center (LP DAAC) and are used for ASTER product searches by users, and cloud mask images are distributed to users via the Internet. Newly acquired scenes (about 400 scenes per day) are reassessed and inserted into the IMS databases within 5 to 7 days of each scene's observation date. Some validation studies of the new cloud coverage data and some mission-related analyses using those data are also presented in this paper.
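Once a cloud mask has been resampled onto a scene footprint, the per-scene cloud coverage figure reduces to a percentage of cloudy pixels. The sketch below assumes the mask has already been mapped to the usual four MOD35 confidence classes (0 cloudy, 1 probably cloudy, 2 probably clear, 3 confident clear) and that `None` marks pixels without valid data; both the class mapping and the choice to count classes 0 and 1 as cloudy are assumptions, not the paper's documented rule.

```python
def cloud_coverage_percent(mask, cloudy_classes=(0, 1)):
    """Percentage of valid pixels flagged as cloudy.

    mask: 2-D list of per-pixel cloud-mask classes already resampled onto the
    scene footprint; None marks pixels without valid data (assumed encoding).
    """
    cloudy = valid = 0
    for row in mask:
        for value in row:
            if value is None:
                continue  # no-data pixels do not count toward coverage
            valid += 1
            if value in cloudy_classes:
                cloudy += 1
    if valid == 0:
        raise ValueError("no valid pixels in mask")
    return 100.0 * cloudy / valid
```

Counting "probably cloudy" as cloudy makes the estimate conservative for users searching for clear scenes; passing `cloudy_classes=(0,)` gives the stricter alternative.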
Ontological Representation of Light Wave Camera Data to Support Vision-Based AmI
Serrano, Miguel Ángel; Gómez-Romero, Juan; Patricio, Miguel Ángel; García, Jesús; Molina, José Manuel
2012-01-01
Recent advances in video capture technologies have opened up many new application areas in visual sensor networks. Among them, the incorporation of light wave cameras into Ambient Intelligence (AmI) environments provides more accurate tracking capabilities for activity recognition. Although the performance of tracking algorithms has improved quickly, the symbolic models used to represent the resulting knowledge have not yet been adapted to smart environments. This lack of representation prevents systems from taking advantage of the semantic quality of the information provided by new sensors. This paper advocates the introduction of a part-based representational level in cognitive-based systems in order to accurately represent the novel sensors' knowledge. The paper also reviews the theoretical and practical issues in part-whole relationships, proposing a specific taxonomy for computer vision approaches. General part-based patterns for the human body, as well as transitive part-based representation and inference, are incorporated into a previous ontology-based framework to enhance scene interpretation in the area of video-based AmI. The advantages and new features of the model are demonstrated in a Social Signal Processing (SSP) application for conducting live market research.
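Transitive part-based inference means that if a hand is part of an arm and an arm is part of a body, the hand is also part of the body. A minimal sketch of that closure follows, assuming part-whole facts arrive as simple `(part, whole)` pairs; the function name and body-part labels are illustrative and do not reproduce the paper's ontology, which would express this with an ontology reasoner rather than Python code.

```python
from collections import defaultdict


def transitive_parts(part_of):
    """Map each whole to every part reachable through partOf links.

    part_of: iterable of (part, whole) pairs, e.g. ("hand", "arm").
    """
    direct = defaultdict(set)
    for part, whole in part_of:
        direct[whole].add(part)

    def collect(whole, seen):
        # Depth-first walk over direct parts; `seen` also guards against cycles.
        for part in direct.get(whole, ()):
            if part not in seen:
                seen.add(part)
                collect(part, seen)
        return seen

    return {whole: collect(whole, set()) for whole in list(direct)}
```

With the facts finger-hand, hand-arm, arm-body and head-body, the closure infers that `finger` and `hand` are parts of `body` even though neither was asserted directly.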
IR characteristic simulation of city scenes based on radiosity model
NASA Astrophysics Data System (ADS)
Xiong, Xixian; Zhou, Fugen; Bai, Xiangzhi; Yu, Xiyu
2013-09-01
Reliable modeling of thermal infrared (IR) signatures of real-world city scenes is required for signature management of civil and military platforms. Traditional modeling methods generally assume that scene objects are individual entities during the physical processes occurring in the infrared range. In reality, however, the physical scene involves convective and conductive interactions between objects as well as radiative interactions between them. A method based on a radiosity model, which describes these complex effects, has been developed to enable accurate simulation of the radiance distribution of city scenes. First, the physical processes affecting the IR characteristics of city scenes were described. Second, heat balance equations were formed by combining the atmospheric conditions, shadow maps and the geometry of the scene. Finally, the finite difference method was used to calculate the kinetic temperature of object surfaces. A radiosity model was introduced to describe the scattering of radiation between surface elements in the scene. By synthesizing the objects' radiance distribution in the infrared range, we could obtain the IR characteristics of the scene. Real infrared images and model predictions were shown and compared. The results demonstrate that this method can realistically simulate the IR characteristics of city scenes, effectively displaying the infrared shadow effects and the radiation interactions between objects.
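The radiosity model couples surface elements through the classic radiosity equation, B_i = E_i + rho_i * sum_j F_ij * B_j, where B is radiosity, E is emitted exitance, rho is reflectance and F_ij are form factors. The sketch below solves that generic equation by Jacobi iteration. It assumes the emission terms and form factors are already known; in the paper's setting the emission would follow from the surface temperatures computed by the heat balance step, which is not modeled here.

```python
def solve_radiosity(emission, reflectance, form_factors, tol=1e-10, max_iter=1000):
    """Solve B_i = E_i + rho_i * sum_j F_ij * B_j by Jacobi iteration.

    Converges when every reflectance is below 1 and each row of the form-factor
    matrix sums to at most 1 (energy conservation).
    """
    n = len(emission)
    b = list(emission)  # start from direct emission only
    for _ in range(max_iter):
        new_b = [
            emission[i]
            + reflectance[i] * sum(form_factors[i][j] * b[j] for j in range(n))
            for i in range(n)
        ]
        if max(abs(x - y) for x, y in zip(new_b, b)) < tol:
            return new_b
        b = new_b
    raise RuntimeError("radiosity iteration did not converge")
```

For two facing patches that exchange all their radiation (F_12 = F_21 = 1), with only the first emitting (E = [1, 0]) and both reflecting half of the incident energy, the fixed point is B = [4/3, 2/3]: the second patch glows purely from inter-reflection, which is the kind of object-to-object interaction the abstract emphasizes.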
Guided filter-based fusion method for multiexposure images
NASA Astrophysics Data System (ADS)
Hou, Xinglin; Luo, Haibo; Qi, Feng; Zhou, Peipei
2016-11-01
It is challenging to capture a high-dynamic range (HDR) scene using a low-dynamic range camera. A weighted sum-based image fusion (IF) algorithm is proposed to represent an HDR scene with a high-quality image. The method consists of three main steps. First, two image features, i.e., gradients and well-exposedness, are measured to estimate the initial weight maps. Second, the initial weight maps are refined by a guided filter, in which the source image is used as the guidance image. This step reduces the noise in the initial weight maps and preserves more texture consistent with the original images. Finally, the fused image is constructed as a weighted sum of the source images in the spatial domain. The main contributions of this method are the estimation of the initial weight maps and the appropriate use of guided filter-based weight map refinement, which together provide accurate weight maps for IF. Compared to traditional IF methods, this algorithm avoids image segmentation, combination, and camera response curve calibration. Furthermore, experimental results demonstrate the superiority of the proposed method in both subjective and objective evaluations.
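The first and last steps of the method, computing per-pixel weights and forming a normalized weighted sum, can be sketched as follows. This is a simplified sketch under stated assumptions: only the well-exposedness feature is used (a Gaussian centred on mid-gray with sigma = 0.2, a common choice in exposure-fusion work), the gradient feature and the guided-filter refinement of the paper are omitted, and images are flat lists of grayscale values in [0, 1].

```python
import math


def well_exposedness(pixel, sigma=0.2):
    """Weight that peaks for mid-gray pixels and decays toward 0 and 1."""
    return math.exp(-((pixel - 0.5) ** 2) / (2 * sigma ** 2))


def fuse(exposures, sigma=0.2, eps=1e-12):
    """Per-pixel normalized weighted sum of differently exposed images.

    exposures: list of equally sized flat grayscale images in [0, 1].
    Weights are NOT refined by a guided filter in this simplified sketch.
    """
    n_pixels = len(exposures[0])
    fused = []
    for p in range(n_pixels):
        weights = [well_exposedness(img[p], sigma) for img in exposures]
        total = sum(weights) + eps  # eps avoids division by zero
        fused.append(sum(w * img[p] for w, img in zip(weights, exposures)) / total)
    return fused
```

A pixel that is well exposed (0.5) in one shot and nearly saturated (0.95) in another fuses to about 0.53, dominated by the well-exposed sample. Raw per-pixel weights like these tend to be noisy, which is what the guided-filter refinement step in the paper addresses.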
Cortical mechanisms for the segregation and representation of acoustic textures.
Overath, Tobias; Kumar, Sukhbinder; Stewart, Lauren; von Kriegstein, Katharina; Cusack, Rhodri; Rees, Adrian; Griffiths, Timothy D
2010-02-10
Auditory object analysis requires two fundamental perceptual processes: the definition of the boundaries between objects, and the abstraction and maintenance of an object's characteristic features. Although it is intuitive to assume that the detection of the discontinuities at an object's boundaries precedes the subsequent precise representation of the object, the specific underlying cortical mechanisms for segregating and representing auditory objects within the auditory scene are unknown. We investigated the cortical bases of these two processes for one type of auditory object, an "acoustic texture," composed of multiple frequency-modulated ramps. In these stimuli, we independently manipulated the statistical rules governing (1) the frequency-time space within individual textures (comprising ramps with a given spectrotemporal coherence) and (2) the boundaries between textures (adjacent textures with different spectrotemporal coherences). Using functional magnetic resonance imaging, we show mechanisms defining boundaries between textures with different coherences in primary and association auditory cortices, whereas texture coherence is represented only in association cortex. Furthermore, participants' superior detection of boundaries across which texture coherence increased (as opposed to decreased) was reflected in a greater neural response in auditory association cortex at these boundaries. The results suggest a hierarchical mechanism for processing acoustic textures that is relevant to auditory object analysis: boundaries between objects are first detected as a change in statistical rules over frequency-time space, before a representation that corresponds to the characteristics of the perceived object is formed.
Surface Color Perception and Equivalent Illumination Models
Brainard, David H.; Maloney, Laurence T.
2011-01-01
Vision provides information about the properties and identity of objects. The ease with which we make such judgments belies the difficulty of the information-processing task that accomplishes it. In the case of object color, retinal information about object reflectance is confounded with information about the illumination as well as about the object’s shape and pose. Because of these factors, there is no obvious rule that allows transformation of the retinal images of an object to a color representation that depends primarily on the object’s surface reflectance properties. Despite the difficulty of this task, however, under many circumstances object color appearance is remarkably stable across scenes in which the object is viewed. Here we review experiments and theory that aim to understand how the visual system stabilizes the color appearance of object surfaces. Our emphasis is on a class of models derived from explicit analysis of the computational problem of estimating the physical properties of illuminants and surfaces from the information available in the retinal image and experiments that test these models. We argue that this approach has considerable promise for allowing generalization from simplified laboratory experiments to richer scenes that more closely approximate natural viewing. PMID:21536727
Neurons in the human hippocampus and amygdala respond to both low- and high-level image properties
Cabrales, Elaine; Wilson, Michael S.; Baker, Christopher P.; Thorp, Christopher K.; Smith, Kris A.; Treiman, David M.
2011-01-01
A large number of studies have demonstrated that structures within the medial temporal lobe, such as the hippocampus, are intimately involved in declarative memory for objects and people. Although these items are abstractions of the visual scene, specific visual details can change the speed and accuracy of their recall. By recording from 415 neurons in the hippocampus and amygdala of human epilepsy patients as they viewed images drawn from 10 image categories, we showed that the firing rates of 8% of these neurons encode image illuminance and contrast, low-level properties not directly pertinent to task performance, whereas in 7% of the neurons, firing rates encode the category of the item depicted in the image, a high-level property pertinent to the task. This simultaneous representation of high- and low-level image properties within the same brain areas may serve to bind separate aspects of visual objects into a coherent percept and allow episodic details of objects to influence mnemonic performance. PMID:21471400
Evans, Benjamin D; Stringer, Simon M
2015-04-01
Learning to recognise objects and faces is an important and challenging problem tackled by the primate ventral visual system. One major difficulty lies in recognising an object despite profound differences in the retinal images it projects, due to changes in view, scale, position and other identity-preserving transformations. Several models of the ventral visual system have been successful in coping with these issues, but have typically been privileged by exposure to only one object at a time. In natural scenes, however, the challenges of object recognition are typically further compounded by the presence of several objects which should be perceived as distinct entities. In the present work, we explore one possible mechanism by which the visual system may overcome these two difficulties simultaneously, through segmenting unseen (artificial) stimuli using information about their category encoded in plastic lateral connections. We demonstrate that these experience-guided lateral interactions robustly organise input representations into perceptual cycles, allowing feed-forward connections trained with spike-timing-dependent plasticity to form independent, translation-invariant output representations. We present these simulations as a functional explanation for the role of plasticity in the lateral connectivity of visual cortex.
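The feed-forward connections in the model are trained with spike-timing-dependent plasticity (STDP). As a rough illustration of that learning rule, here is a generic pair-based STDP update. The amplitudes, time constants and weight bounds are hypothetical values, and the exact formulation used in the paper (for example, a trace-based variant) may differ.

```python
import math


def stdp_update(weight, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0, w_min=0.0, w_max=1.0):
    """Generic pair-based STDP: spike times in ms, illustrative parameters."""
    dt = t_post - t_pre
    if dt > 0:
        # Presynaptic spike precedes postsynaptic spike: potentiate.
        dw = a_plus * math.exp(-dt / tau_plus)
    elif dt < 0:
        # Postsynaptic spike precedes presynaptic spike: depress.
        dw = -a_minus * math.exp(dt / tau_minus)
    else:
        dw = 0.0
    # Keep the weight within its allowed range.
    return min(w_max, max(w_min, weight + dw))
```

Because potentiation requires the input to fire just before the output neuron, inputs that reliably co-occur with a stimulus are strengthened, which is how such rules can form stimulus-selective output representations.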
Stimulus dependence of local field potential spectra: experiment versus theory.
Barbieri, Francesca; Mazzoni, Alberto; Logothetis, Nikos K; Panzeri, Stefano; Brunel, Nicolas
2014-10-29
The local field potential (LFP) captures different neural processes, including integrative synaptic dynamics that cannot be observed by measuring only the spiking activity of small populations. Therefore, investigating how LFP power is modulated by external stimuli can offer important insights into sensory neural representations. However, gaining such insight requires developing data-driven computational models that can identify and disambiguate the neural contributions to the LFP. Here, we investigated how networks of excitatory and inhibitory integrate-and-fire neurons responding to time-dependent inputs can be used to interpret sensory modulations of LFP spectra. We computed analytically from such models the LFP spectra and the information that they convey about input and used these analytical expressions to fit the model to LFPs recorded in V1 of anesthetized macaques (Macaca mulatta) during the presentation of color movies. Our expressions explain 60%-98% of the variance of the LFP spectrum shape and its dependency upon movie scenes and we achieved this with realistic values for the best-fit parameters. In particular, synaptic best-fit parameters were compatible with experimental measurements and the predictions of firing rates, based only on the fit of LFP data, correlated with the multiunit spike rate recorded from the same location. Moreover, the parameters characterizing the input to the network across different movie scenes correlated with cross-scene changes of several image features. Our findings suggest that analytical descriptions of spiking neuron networks may become a crucial tool for the interpretation of field recordings. Copyright © 2014 the authors 0270-6474/14/3414589-17$15.00/0.
NASA Astrophysics Data System (ADS)
Toadere, Florin
2017-12-01
We present a spectral image processing algorithm that simulates illuminating a scene with different illuminants and reconstructs the scene's reflectance. A color checker spectral image is illuminated with CIE A (warm light, 2700 K), D65 (cold light, 6500 K) and Cree TW Series LED T8 (4000 K) illuminants. The illuminants used in the simulations have different spectra, so the colors of the scene change from one illuminant to another. The influence of the illuminants on the reconstruction of the scene's reflectance is estimated. Demonstration images and reflectance curves illustrate the operation of the algorithm.
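The core relation behind relighting a spectral image is that, at each sampled wavelength, the color signal is the product of surface reflectance and illuminant power, and a sensor channel integrates that signal against its spectral sensitivity. The sketch below illustrates this under simplifying assumptions: three coarse wavelength samples, hypothetical illuminant and sensitivity values, no noise, and a known illuminant (which is what makes the reflectance recovery a simple division).

```python
def color_signal(reflectance, illuminant):
    """Spectral signal leaving the surface: reflectance x illuminant, per sample."""
    return [r * e for r, e in zip(reflectance, illuminant)]


def channel_response(signal, sensitivity):
    """Sensor channel response: sum over wavelengths of signal x sensitivity."""
    return sum(s * q for s, q in zip(signal, sensitivity))


def recover_reflectance(signal, illuminant, eps=1e-12):
    """Invert color_signal when the illuminant spectrum is known (noise-free)."""
    return [s / e if e > eps else 0.0 for s, e in zip(signal, illuminant)]
```

With samples ordered from short to long wavelengths, a warm illuminant such as `[50, 100, 150]` raises the long-wavelength ("red") channel response of the same surface compared with a cold illuminant such as `[150, 100, 50]`. This shows why scene colors shift between illuminants while the underlying reflectance stays the same.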
NASA Astrophysics Data System (ADS)
Zalevsky, Zeev; Ilovitsh, Asaf; Beiderman, Yevgeny
2013-10-01
We present an approach for seeing hidden objects that are not in the direct line of sight of security inspection cameras. The approach is based on inspecting the back reflections from the cornea and sclera of the eyes of people present in the inspected scene who are positioned in front of the hidden objects we aim to image, after proper calibration with a point light source (e.g., an LED). The scene can be a forensic scene or, for instance, a casino, where the application is to see the cards of poker players sitting in front of you.
NASA Astrophysics Data System (ADS)
Barsai, Gabor
Creating accurate, current digital maps and 3-D scenes is a high priority in today's fast-changing environment. The nation's maps are in a constant state of revision, with many alterations or new additions each day. Digital maps have become quite common; Google Maps, MapQuest and others are examples. These also have 3-D viewing capability. Many details, such as the height of low bridges, are now included in the attribute data of the objects displayed on digital maps and scenes. To expedite the updating of these datasets, they should be created autonomously, without human intervention, from data streams. Although systems exist that attain fast, or even real-time, mapping and reconstruction performance, they are typically restricted to creating sketches from the data stream rather than accurate maps or scenes. The ever-increasing amount of image data available from private companies, governments and the internet suggests that developing an automated system is of utmost importance. The proposed framework can create 3-D views autonomously, which extends the functionality of digital mapping. The first step in creating 3-D views is to reconstruct the scene of the area to be mapped. To reconstruct a scene from heterogeneous sources, the data have to be registered, either to each other or, preferably, to a general, absolute coordinate system. Registering an image is based on reconstructing the geometric relationship between the image and the coordinate system at the time of imaging. Registration is the process of determining the geometric transformation parameters of a dataset in one coordinate system (the source) with respect to another coordinate system (the target). The advantage of fusing these datasets through registration lies in the complementary information that datasets of different modalities contain.
The complementary characteristics of these systems can be fully utilized only after successful registration of the photogrammetric and alternative data relative to a common reference frame. This research provides a novel approach to finding registration parameters without the explicit use of conjugate points, using conjugate features instead. These features are open or closed free-form linear features; there is no need for a parametric or any other type of representation of these features. The proposed method will use datasets of different modalities covering the same area: lidar data, image data and GIS data. There are two datasets: one from the Ohio State University and the other from San Bernardino, California. The reconstruction of scenes from imagery and range data, using laser and radar data, has been an active research area in the fields of photogrammetry and computer vision. Automation, or even just less human intervention, would greatly help alleviate the "bottleneck" that characterizes the current state of creating knowledge from data. Pixels or laser points, the output of the sensor, represent a discretization of the real world. By themselves, these data points do not contain representative information. The values associated with them, intensity values and coordinates, do not define an object, and thus accurate maps cannot be produced from data alone. Data is not an end product, nor does it directly provide answers to applications, although the information about the object in question is implicitly contained in the data. The data from the initial acquisition by the sensor must be further processed to create usable information, and this information must be combined with facts, procedures and heuristics that can be used to make inferences for reconstruction. Reconstructing a scene perfectly, whether it is an urban or rural scene, requires prior knowledge and heuristics.
Buildings usually have smooth surfaces, and many buildings are blocky, with orthogonal, straight edges and sides; streets are smooth; vegetation is rough, with trees and bushes of different shapes and sizes. This research provides a path to fuse data from lidar, GIS and digital multispectral images and to reconstruct a precise 3-D scene model without human intervention, regardless of the type of data or features in the data. The data are initially registered to each other using initial GPS/INS position values; conjugate features are then found in the datasets to refine the registration. The novelty of the research is that no conjugate points are necessary in the various datasets, and registration is performed without human intervention. The proposed system uses the original lidar and GIS data and finds the edges of buildings with the help of the digital images, using the exterior orientation parameters to project the lidar points onto the edge-extracted image/map. These edge points are then used to orient and locate the datasets correctly with respect to each other.
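Projecting lidar points onto an image with known exterior orientation is typically done with the photogrammetric collinearity equations. The sketch below uses the common omega-phi-kappa rotation convention (M = R(kappa) R(phi) R(omega), object-to-image) and assumes the principal point is at (x0, y0) with no lens distortion. These conventions are assumptions; the research described above may parameterize the exterior orientation differently.

```python
import math


def rotation_matrix(omega, phi, kappa):
    """Object-to-image rotation M = R(kappa) R(phi) R(omega); angles in radians."""
    so, co = math.sin(omega), math.cos(omega)
    sp, cp = math.sin(phi), math.cos(phi)
    sk, ck = math.sin(kappa), math.cos(kappa)
    return [
        [cp * ck, so * sp * ck + co * sk, -co * sp * ck + so * sk],
        [-cp * sk, -so * sp * sk + co * ck, co * sp * sk + so * ck],
        [sp, -so * cp, co * cp],
    ]


def project(point, camera_center, omega, phi, kappa, focal_length, x0=0.0, y0=0.0):
    """Collinearity equations: ground point (X, Y, Z) -> image coordinates (x, y)."""
    m = rotation_matrix(omega, phi, kappa)
    d = [p - c for p, c in zip(point, camera_center)]  # vector camera -> point
    u, v, w = (sum(m[r][k] * d[k] for k in range(3)) for r in range(3))
    if w >= 0:
        raise ValueError("point is not in front of the camera")
    return x0 - focal_length * u / w, y0 - focal_length * v / w
```

For a vertical photograph (all angles zero) taken from 1000 m with a 0.15 m focal length, a ground point 100 m east and 50 m north of nadir lands at (0.015, 0.0075) m in image space. In an edge-based registration, projected lidar edge points like this would be compared with edges extracted from the image, and the orientation refined until they coincide.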
Tomaszewski, Michał; Ruszczak, Bogdan; Michalski, Paweł
2018-06-01
Electrical insulators are elements of power lines that require periodic diagnostics. Because of their location on the components of high-voltage power lines, imaging them can be cumbersome and time-consuming, especially under varying lighting conditions. Visual insulator diagnostics may require localizing the insulators in the scene. Studies focused on insulator localization apply a number of methods, including texture analysis, MRF (Markov Random Field), Gabor filters and GLCM (Gray Level Co-Occurrence Matrix) [1], [2]. Some methods, e.g., those that localize insulators based on colour analysis [3], depend on object and scene illumination, which is why the images in the dataset were taken under varying lighting conditions. The dataset may also be used to compare the effectiveness of different methods of localizing insulators in images. This article presents high-resolution images depicting a long rod electrical insulator under varying lighting conditions and against different backgrounds: crops, forest and grass. The dataset contains images with visible laser spots (generated by a device emitting light at a wavelength of 532 nm) and images without such spots, as well as complementary data concerning the illumination level and insulator position in the scene, the number of registered laser spots, and their coordinates in the image. The laser spots may be used to support object-localizing algorithms, while the images without spots may serve as a source of information for algorithms that do not need spots to localize an insulator.
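As an example of the texture features mentioned above, a gray-level co-occurrence matrix (GLCM) counts how often gray level i occurs next to gray level j at a fixed pixel offset. Haralick statistics such as contrast are then computed from it. The sketch below assumes an image already quantized to a small number of integer gray levels. The function names, default offset and level count are illustrative, and the cited localization methods may use different offsets or statistics.

```python
def glcm(image, dx=1, dy=0, levels=8, symmetric=False):
    """Normalized gray-level co-occurrence matrix for pixel offset (dx, dy).

    image: 2-D list of integer gray levels in [0, levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Haralick contrast: sum over i, j of p(i, j) * (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))
```

Regularly ribbed structures such as insulator sheds produce periodic gray-level transitions, so statistics like contrast computed over image windows can help separate them from smoother backgrounds.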
Understanding Deep Representations Learned in Modeling Users Likes.
Guntuku, Sharath Chandra; Zhou, Joey Tianyi; Roy, Sujoy; Lin, Weisi; Tsang, Ivor W
2016-08-01
Automatically understanding and discriminating different users' liking for an image is a challenging problem, because the relationship between image features (even semantic ones extracted by existing tools, e.g., faces and objects) and users' likes is non-linear and influenced by several subtle factors. This paper presents a deep bi-modal knowledge representation of images based on their visual content and associated tags (text). A mapping step between the different levels of visual and textual representations allows for the transfer of semantic knowledge between the two modalities. Feature selection is applied before learning the deep representation to identify the features that matter most for whether a user likes an image. The proposed representation is shown to be effective in discriminating users based on the images they like and in recommending images that a given user likes, outperforming state-of-the-art feature representations by approximately 15%-20%. Beyond this test-set performance, an attempt is made to qualitatively understand the representations learned by the deep architecture used to model user likes.
Warren, Wayne; Brinkley, James F.
2005-01-01
Few biomedical subjects of study are as resource-intensive to teach as gross anatomy. Medical education stands to benefit greatly from applications which deliver virtual representations of human anatomical structures. While many applications have been created to achieve this goal, their utility to the student is limited because of a lack of interactivity or customizability by expert authors. Here we describe the first version of the Biolucida system, which allows an expert anatomist author to create knowledge-based, customized, and fully interactive scenes and lessons for students of human macroscopic anatomy. Implemented in Java and VRML, Biolucida allows the sharing of these instructional 3D environments over the internet. The system simplifies the process of authoring immersive content while preserving its flexibility and expressivity. PMID:16779148
Representational Momentum in Aviation
ERIC Educational Resources Information Center
Blattler, Colin; Ferrari, Vincent; Didierjean, Andre; Marmeche, Evelyne
2011-01-01
The purpose of this study was to examine the effects of expertise on motion anticipation. We conducted two experiments in which novices and expert pilots viewed simulated aircraft landing scenes. The scenes were interrupted by the display of a black screen and then started again after a forward or backward shift. The participant's task was to…
Task demands determine the specificity of the search template.
Bravo, Mary J; Farid, Hany
2012-01-01
When searching for an object, an observer holds a representation of the target in mind while scanning the scene. If the observer repeats the search, performance may become more efficient as the observer hones this target representation, or "search template," to match the specific demands of the search task. An effective search template must have two characteristics: It must reliably discriminate the target from the distractors, and it must tolerate variability in the appearance of the target. The present experiment examined how the tolerance of the search template is affected by the search task. Two groups of 18 observers trained on the same set of stimuli blocked either by target image (block-by-image group) or by target category (block-by-category group). One or two days after training, both groups were tested on a related search task. The pattern of test results revealed that the two groups of observers had developed different search templates, and that the templates of the block-by-category observers better captured the general characteristics of the category. These results demonstrate that observers match their search templates to the demands of the search task.
Research on inosculation between master of ceremonies or players and virtual scene in virtual studio
NASA Astrophysics Data System (ADS)
Li, Zili; Zhu, Guangxi; Zhu, Yaoting
2003-04-01
A technical principle for constructing a virtual studio has been proposed, in which an orientation tracker and a telemeter are used to augment a conventional BETACAM pickup camera and connect it to the host's software module. A virtual camera model named Camera & Post-camera Coupling Pair has been put forward; it differs from the common model in computer graphics and is bound to the real BETACAM pickup camera for shooting. A formula has been derived to compute the foreground and background frame buffer images of the virtual scene, whose boundary is based on the depth of the target point of the real BETACAM pickup camera's projective ray. Real-time consistency in spatial position, perspective relationship and image object masking has been achieved between the video image sequences of the master of ceremonies or players and the CG video image sequences of the virtual scene. Experimental results show that the proposed scheme for constructing a virtual studio is feasible, and more practical and effective than existing technology that builds virtual studios from color keying and image synthesis with backgrounds using non-linear video editing.
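Splitting the virtual scene into foreground and background buffers by depth can be sketched as follows. The assumptions are a single presenter depth (as a telemeter aimed at the presenter might supply), a per-pixel presenter mask (for example, from keying), and per-pixel depths for the rendered virtual scene. The function names are hypothetical, and the paper's Camera & Post-camera Coupling Pair model is not reproduced.

```python
def split_virtual_scene(virtual_rgb, virtual_depth, presenter_depth):
    """Split rendered virtual pixels into layers in front of / behind the presenter."""
    fg = [c if z < presenter_depth else None for c, z in zip(virtual_rgb, virtual_depth)]
    bg = [c if z >= presenter_depth else None for c, z in zip(virtual_rgb, virtual_depth)]
    return fg, bg


def composite(fg, presenter_rgb, presenter_mask, bg):
    """Layer order: virtual foreground over presenter over virtual background."""
    out = []
    for f, p, m, b in zip(fg, presenter_rgb, presenter_mask, bg):
        if f is not None:
            out.append(f)   # virtual object closer than the presenter masks them
        elif m:
            out.append(p)   # presenter pixel
        else:
            out.append(b)   # virtual background shows through
    return out
```

For a three-pixel row where a virtual object at 1 m sits in front of a presenter standing at 3 m, the object pixel stays virtual, the presenter pixel shows the live video, and the remaining pixel shows the virtual background. This reproduces the "image object masking" behavior without a color key deciding occlusion.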
Neural-net-based image matching
NASA Astrophysics Data System (ADS)
Jerebko, Anna K.; Barabanov, Nikita E.; Luciv, Vadim R.; Allinson, Nigel M.
2000-04-01
The paper describes a neural-based method for matching spatially distorted image sets. The matching of partially overlapping images is important in many applications: integrating information from images formed in different spectral ranges, detecting changes in a scene, and identifying objects of differing orientations and sizes. Our approach consists of extracting contour features from both images, describing the contour curves as sets of line segments, comparing these sets, determining the corresponding curves and their common reference points, and calculating the image-to-image coordinate transformation parameters on the basis of the most successful variant of the derived curve relationships. The main steps are performed by custom neural networks. The algorithms described in this paper have been successfully tested on a large set of images of the same terrain taken in different spectral ranges, in different seasons, and rotated by various angles. In general, this experimental verification indicates that the proposed method for image fusion allows robust detection of similar objects in noisy, distorted scenes where traditional approaches often fail.
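The final step, computing image-to-image transformation parameters from matched reference points, can be illustrated with a plain least-squares fit. This sketch swaps in a closed-form complex-number solution for a similarity transform (rotation, uniform scale, translation) in place of the paper's custom neural networks; `fit_similarity` is a hypothetical name, and the paper may estimate a more general transformation.

```python
import cmath


def fit_similarity(src, dst):
    """Least-squares similarity transform dst ~ a * src + b using complex numbers.

    src, dst: equal-length lists of corresponding (x, y) reference points.
    Minimizes sum |w_i - a*z_i - b|^2; the solution uses centred coordinates.
    Returns (scale, rotation_radians, (tx, ty)).
    """
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    n = len(z)
    z_mean = sum(z) / n
    w_mean = sum(w) / n
    num = sum((wi - w_mean) * (zi - z_mean).conjugate() for zi, wi in zip(z, w))
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    a = num / den              # encodes scale (modulus) and rotation (argument)
    b = w_mean - a * z_mean    # translation
    return abs(a), cmath.phase(a), (b.real, b.imag)
```

With at least two distinct point pairs the fit is determined; extra pairs average out localization noise in the reference points.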
Band registration of tuneable frame format hyperspectral UAV imagers in complex scenes
NASA Astrophysics Data System (ADS)
Honkavaara, Eija; Rosnell, Tomi; Oliveira, Raquel; Tommaselli, Antonio
2017-12-01
A recent revolution in miniaturised sensor technology has provided markets with novel hyperspectral imagers operating on the frame format principle. For unmanned aerial vehicle (UAV) based remote sensing, frame format technology is highly attractive compared with the commonly used pushbroom scanning technology, because it offers better stability and the possibility to capture stereoscopic data sets, enabling 3D hyperspectral object reconstruction. Tuneable filters are one approach for capturing multi- or hyperspectral frame images. When a sensor based on tuneable filters is operated from a mobile platform such as a UAV, the individual bands are not aligned, because the full spectrum is recorded time-sequentially. The objective of this investigation was to study the aspects of band registration of an imager based on tuneable filters and to develop a rigorous and efficient approach for band registration in complex 3D scenes, such as forests. The method first determines the orientations of selected reference bands and reconstructs the 3D scene using structure-from-motion and dense image matching technologies. The bands without orientation are then matched to the oriented bands, taking the 3D scene into account, to provide exterior orientations; afterwards, hyperspectral orthomosaics or hyperspectral point clouds are calculated. The uncertainty aspects of the novel approach were studied. An empirical assessment was carried out in a forested environment using hyperspectral images captured with a hyperspectral 2D frame format camera, based on a tuneable Fabry-Pérot interferometer (FPI), on board a multicopter and supported by a high spatial resolution consumer colour camera. A theoretical assessment showed that the method was capable of providing band registration accuracy better than 0.5 pixel.
The empirical assessment confirmed the performance and showed that, with the novel method, most band misalignments were smaller than the pixel size. Furthermore, it was shown that the accuracy of the band alignment depended on the spatial distance from the reference band.
Iconic memory for the gist of natural scenes.
Clarke, Jason; Mack, Arien
2014-11-01
Does iconic memory contain the gist of multiple scenes? Three experiments were conducted. In the first, four scenes from different basic-level categories were briefly presented in one of two conditions: a cue or a no-cue condition. The cue condition was designed to provide an index of the contents of iconic memory of the display. Subjects were more sensitive to scene gist in the cue condition than in the no-cue condition. In the second, the scenes came from the same basic-level category. We found no difference in sensitivity between the two conditions. In the third, six scenes from different basic-level categories were presented in the visual periphery. Subjects were more sensitive to scene gist in the cue condition. These results suggest that scene gist is contained in iconic memory even in the visual periphery; however, iconic representations are not sufficiently detailed to distinguish between scenes coming from the same category. Copyright © 2014 Elsevier Inc. All rights reserved.
Semantic Segmentation of Building Elements Using Point Cloud Hashing
NASA Astrophysics Data System (ADS)
Chizhova, M.; Gurianov, A.; Hess, M.; Luhmann, T.; Brunn, A.; Stilla, U.
2018-05-01
For the interpretation of point clouds, the semantic definition of segments extracted from point clouds or images is a common problem. Usually, the semantics of geometrically pre-segmented point cloud elements are determined using probabilistic networks and scene databases. The proposed semantic segmentation method is based on the psychological human interpretation of geometric objects, especially on fundamental rules of primary comprehension. Starting from these rules, buildings can be classified quite well and simply by a human operator (e.g., an architect) into different building types and structural elements (dome, nave, transept, etc.), including particular building parts that are visually detected. The key part of the procedure is a novel hashing-based method in which point cloud projections are transformed into binary pixel representations. The segmentation approach, demonstrated on the example of classical Orthodox churches, is suitable for other buildings and objects characterized by a particular typology in their construction (e.g., industrial objects in standardized environments with strict component design allowing clear semantic modelling).
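The idea of turning a point cloud projection into a binary pixel representation that can be hashed and compared can be sketched as follows. This is an illustrative assumption, not the authors' hashing scheme: it projects onto two coordinate axes, normalizes the extent per axis, rasterizes into a fixed `size x size` grid, and packs the bits into an integer compared by Hamming distance. `binary_projection_hash` and `hamming` are hypothetical names.

```python
def binary_projection_hash(points, axes=(0, 1), size=8):
    """Project 3D points orthographically onto two coordinate axes, rasterize the
    projection into a size x size binary image, and pack the bits into an int.

    The extent of the projection is normalized per axis first, so translated or
    scaled copies of the same shape map to the same hash.
    """
    i, j = axes
    us = [p[i] for p in points]
    vs = [p[j] for p in points]
    u0, v0 = min(us), min(vs)
    su = (max(us) - u0) or 1.0  # avoid division by zero for flat projections
    sv = (max(vs) - v0) or 1.0
    grid = [[0] * size for _ in range(size)]
    for u, v in zip(us, vs):
        col = min(int((u - u0) / su * size), size - 1)
        row = min(int((v - v0) / sv * size), size - 1)
        grid[row][col] = 1  # cell is occupied by at least one point
    bits = 0
    for row in grid:
        for b in row:
            bits = (bits << 1) | b
    return bits


def hamming(a, b):
    """Number of differing pixels between two binary projection hashes."""
    return bin(a ^ b).count("1")
```

Per-axis normalization discards aspect ratio; a real system that distinguishes, for example, a tall nave from a wide one would likely need to keep it.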
Radiologists remember mountains better than radiographs, or do they?
Evans, Karla K; Marom, Edith M; Godoy, Myrna C B; Palacio, Diana; Sagebiel, Tara; Cuellar, Sonia Betancourt; McEntee, Mark; Tian, Charles; Brennan, Patrick C; Haygood, Tamara Miner
2016-01-01
Expertise with encoding material has been shown to aid long-term memory for that material. It is not clear how relevant this expertise is for image memorability (e.g., radiologists' memory for radiographs), or how robust it is over time. In two studies, we tested scene memory using a standard long-term memory paradigm. One compared the performance of radiologists to that of naïve observers on two image sets, chest radiographs and everyday scenes; the other compared radiologists' memory on immediate versus delayed recognition tests using musculoskeletal radiographs and forest scenes. Radiologists' memory was better than that of novices for images of expertise but no different for everyday scenes. With the heterogeneity of image sets equated, radiologists' expertise with radiographs afforded them better memory for the musculoskeletal radiographs than for forest scenes. Enhanced memory for images of expertise disappeared over time, resulting in chance-level performance for both image sets after weeks of delay. Expertise with the material is important for visual memorability but not to the same extent as idiosyncratic detail and variability of the image set. Similar memory decline with time for images of expertise as for everyday scenes further suggests that extended familiarity with an image is not a robust factor for visual memorability.
Pretty and powerless: nurses in advertisements, 1930-1950.
Lusk, B
2000-06-01
Images of nurses in pictorial advertisements from all issues of hospital administration journals published in 1930, 1940, and 1950 (N = 598) were examined. Content analysis of the data was based on Goffman's classic 1979 study on gender advertisements. Nurses also were compared with other figures in the advertisements and nursing activities were described. Nurses were predominantly portrayed as female, young, eager to please, and without the appearance of wisdom. In group scenes, nurses were placed as subordinate to physicians and hospital administrators. Nurses in 1940 performed more complex, autonomous activities than in 1930 and 1950. These findings support previous research focused on more recent portrayals of women and nurses in communication media. The overt and subtle subordinate representation of nurses in these advertisements, compared with physicians and administrators, reveals one facet of nursing's heritage as a woman's profession. Copyright 2000 John Wiley & Sons, Inc.
A bio-inspired method and system for visual object-based attention and segmentation
NASA Astrophysics Data System (ADS)
Huber, David J.; Khosla, Deepak
2010-04-01
This paper describes a method and system for human-like attention and object segmentation in visual scenes that (1) attends to regions in a scene in order of their saliency in the image, (2) extracts the boundary of an attended proto-object based on feature contours, and (3) can be biased to boost the attention paid to specific features in a scene, such as those of a desired target object in static and video imagery. The purpose of the system is to identify regions of a scene of potential importance and extract the region data for processing by an object recognition and classification algorithm. The attention process can be performed in a default, bottom-up manner or a directed, top-down manner that assigns a preference to certain features over others. The system can be applied to any static scene, whether a still photograph or imagery captured from video. We employ algorithms motivated by findings in neuroscience, psychology, and cognitive science to construct a system that is novel in its modular and stepwise approach to the problems of attention and region extraction, its application of a flooding algorithm to break an image apart into smaller proto-objects based on feature density, and its ability to join smaller regions of similar features into larger proto-objects. This approach allows many complicated operations to be carried out by the system in a very short time, approaching real time. A researcher can use this system as a robust front end to a larger system that includes object recognition and scene understanding modules; it is engineered to function over a broad range of situations and can be applied to any scene with minimal tuning from the user.
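The flooding step that breaks an image into proto-objects, followed by ranking regions for attention, can be sketched with a breadth-first flood fill. This is a simplified stand-in, not the authors' algorithm: it assumes a single precomputed feature map, a fixed threshold, 4-connectivity, and uses each region's mean feature value as its saliency score. `proto_objects` is a hypothetical name.

```python
from collections import deque


def proto_objects(feature, threshold):
    """Flood-fill 4-connected pixels whose feature value exceeds `threshold`
    into regions, then rank the regions by mean feature value (a stand-in for
    saliency). Returns a list of (mean_value, [(row, col), ...]), most salient first.
    """
    rows, cols = len(feature), len(feature[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or feature[r][c] <= threshold:
                continue
            # Breadth-first flood from this seed pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and feature[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            mean = sum(feature[y][x] for y, x in pixels) / len(pixels)
            regions.append((mean, pixels))
    regions.sort(key=lambda reg: reg[0], reverse=True)  # attend in order of saliency
    return regions
```

Top-down biasing, as described in the abstract, could be approximated by reweighting the feature map toward target features before flooding.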
A fast color image enhancement algorithm based on Max Intensity Channel
Sun, Wei; Han, Long; Guo, Baolong; Jia, Wenyan; Sun, Mingui
2014-01-01
In this paper, we extend image enhancement techniques based on retinex theory, which imitates human visual perception of scenes containing high illumination variations. This extension achieves simultaneous dynamic range modification, color consistency, and lightness rendition without multi-scale Gaussian filtering, which introduces halo effects. The reflection component is analyzed based on the illumination and reflection imaging model. A new prior named Max Intensity Channel (MIC) is introduced, assuming that the reflectance of some points in the scene is very high in at least one color channel. Using this prior, the illumination of the scene is obtained directly by applying a gray-scale closing operation and fast cross-bilateral filtering to the MIC of the input color image. The reflection component of each RGB color channel can then be determined from the illumination and reflection imaging model. The proposed algorithm estimates an illumination component that is relatively smooth and preserves edge details in different regions. Satisfactory color rendition is achieved for a class of images that do not satisfy the gray-world assumption implicit in the theoretical foundation of retinex. Experiments compare the new method with several spatial- and transform-domain methods. Our results indicate that the new method is superior in enhancement applications, improves computation speed, and performs better than other methods on images with high illumination variations. Further comparisons using images from the National Aeronautics and Space Administration and from the wearable camera eButton show that the new method achieves better color restoration and preservation of image details. PMID:25110395
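The MIC pipeline can be sketched on a tiny image: take the per-pixel maximum over the color channels, apply a gray-scale closing to estimate illumination, smooth it, and divide each channel by the illumination to recover reflectance. This is a minimal sketch under stated assumptions: a plain box filter is swapped in for the paper's fast cross-bilateral filter, windows are clipped at image borders, and `enhance` and the helper names are hypothetical.

```python
def max_intensity_channel(img):
    """img: rows of (r, g, b) tuples in [0, 1]. Returns per-pixel max over channels."""
    return [[max(px) for px in row] for row in img]


def _window_op(gray, radius, op):
    """Apply `op` to each (2r+1) x (2r+1) neighbourhood, clipped at the borders."""
    rows, cols = len(gray), len(gray[0])
    out = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            vals = [gray[j][i]
                    for j in range(max(0, y - radius), min(rows, y + radius + 1))
                    for i in range(max(0, x - radius), min(cols, x + radius + 1))]
            out_row.append(op(vals))
        out.append(out_row)
    return out


def grey_closing(gray, radius=1):
    """Gray-scale closing: dilation (max filter) followed by erosion (min filter)."""
    return _window_op(_window_op(gray, radius, max), radius, min)


def box_smooth(gray, radius=1):
    # Assumption: box filter used in place of the paper's cross-bilateral filter.
    return _window_op(gray, radius, lambda v: sum(v) / len(v))


def enhance(img, radius=1, eps=1e-3):
    """Reflectance R_c = I_c / L, with illumination L estimated from the MIC."""
    illum = box_smooth(grey_closing(max_intensity_channel(img), radius), radius)
    return [[tuple(min(1.0, c / max(L, eps)) for c in px)
             for px, L in zip(row, l_row)]
            for row, l_row in zip(img, illum)]
```

On a uniformly dark image with pixel `(0.2, 0.1, 0.05)`, the estimated illumination is 0.2, so the output pixel is approximately `(1.0, 0.5, 0.25)`: the brightest channel is stretched to full range while color ratios are kept.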
A fast color image enhancement algorithm based on Max Intensity Channel.
Sun, Wei; Han, Long; Guo, Baolong; Jia, Wenyan; Sun, Mingui
2014-03-30
A fast color image enhancement algorithm based on Max Intensity Channel
NASA Astrophysics Data System (ADS)
Sun, Wei; Han, Long; Guo, Baolong; Jia, Wenyan; Sun, Mingui
2014-03-01
Helo, Andrea; van Ommen, Sandrien; Pannasch, Sebastian; Danteny-Dordoigne, Lucile; Rämä, Pia
2017-11-01
Conceptual representations of everyday scenes are built in interaction with the visual environment, and these representations guide our visual attention. Perceptual features and object-scene semantic consistency have been found to attract our attention during scene exploration. The present study examined how visual attention in 24-month-old toddlers is attracted by semantic violations and how perceptual features (i.e., saliency, centre distance, clutter, and object size) and linguistic properties (i.e., object label frequency and label length) affect gaze distribution. We compared eye movements of 24-month-old toddlers and adults while they explored everyday scenes that contained either an inconsistent (e.g., soap on a breakfast table) or a consistent (e.g., soap in a bathroom) object. Perceptual features such as saliency, centre distance, and clutter of the scene affected looking times in the toddler group during the whole viewing time, whereas looking times in adults were affected only by centre distance during the early viewing time. Adults looked longer at inconsistent than at consistent objects regardless of whether the objects had high or low saliency. In contrast, toddlers showed a semantic consistency effect only when objects were highly salient. Additionally, toddlers with lower vocabulary skills looked longer at inconsistent objects, while toddlers with higher vocabulary skills looked equally long at both consistent and inconsistent objects. Our results indicate that 24-month-old children use scene context to guide visual attention when exploring the visual environment. However, perceptual features have a stronger influence on eye movement guidance in toddlers than in adults. Our results also indicate that language skills influence cognitive but not perceptual guidance of eye movements during scene perception in toddlers. Copyright © 2017 Elsevier Inc. All rights reserved.
Scene analysis for a breadboard Mars robot functioning in an indoor environment
NASA Technical Reports Server (NTRS)
Levine, M. D.
1973-01-01
This paper deals with the problem of computer perception in an indoor laboratory environment containing rocks of various sizes. The sensory data processing is required for the NASA/JPL breadboard mobile robot, a test system for an adaptive, variably autonomous vehicle that will conduct scientific explorations on the surface of Mars. Scene analysis is discussed in terms of object segmentation followed by feature extraction, which results in a representation of the scene in the robot's world model.
Small maritime target detection through false color fusion
NASA Astrophysics Data System (ADS)
Toet, Alexander; Wu, Tirui
2008-04-01
We present an algorithm that produces a fused false color representation of a combined multiband IR and visual imaging system for maritime applications. Multispectral IR imaging techniques are increasingly deployed in maritime operations, to detect floating mines or to find small dinghies and swimmers during search and rescue operations. However, maritime backgrounds usually contain a large amount of clutter that severely hampers the detection of small targets. Our new algorithm exploits the correlation between the target signatures in two different IR frequency bands (3-5 and 8-12 μm) to construct a fused IR image with a reduced amount of clutter. The fused IR image is then combined with a visual image in a false color RGB representation for display to a human operator. The algorithm works as follows. First, both individual IR bands are filtered with a morphological opening top-hat transform to extract small details. Second, a common image is extracted from the two filtered IR bands and assigned to the red channel of an RGB image. Regions of interest that appear in both IR bands remain in this common image, while most uncorrelated noise details are filtered out. Third, the visual band is assigned to the green channel and also, after multiplication by a constant (typically 1.6), to the blue channel. Fourth, the brightness and colors of this intermediate false color image are renormalized by adjusting its first-order statistics to those of a representative reference scene. The result of these four steps is a fused color image with naturalistic colors (bluish sky and grayish water), in which small targets are clearly visible.
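The first three steps can be sketched on small grayscale arrays in [0, 1]. This is a minimal sketch under stated assumptions: the abstract does not say how the "common image" is extracted, so the pixelwise minimum of the two top-hat images is assumed here; a 3x3 square structuring element is assumed; the blue channel is clipped to 1.0 (an addition); and step 4 (matching first-order statistics to a reference scene) is omitted. All function names are hypothetical.

```python
def _window(img, op, r=1):
    """Apply `op` (min or max) over a (2r+1) x (2r+1) window, clipped at borders."""
    rows, cols = len(img), len(img[0])
    return [[op(img[j][i]
                for j in range(max(0, y - r), min(rows, y + r + 1))
                for i in range(max(0, x - r), min(cols, x + r + 1)))
             for x in range(cols)]
            for y in range(rows)]


def top_hat(img, r=1):
    """Opening top-hat: image minus its grey opening (erosion, then dilation).
    Keeps bright details smaller than the window, suppresses the background."""
    opened = _window(_window(img, min, r), max, r)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(img, opened)]


def fuse(ir_mw, ir_lw, visual, blue_gain=1.6):
    """Steps 1-3 of the described fusion; returns rows of (R, G, B) tuples."""
    t_mw, t_lw = top_hat(ir_mw), top_hat(ir_lw)
    # Assumption: the "common image" is the pixelwise minimum of the two
    # top-hat images, so only details present in both IR bands survive.
    red = [[min(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(t_mw, t_lw)]
    return [[(red[y][x], visual[y][x], min(1.0, blue_gain * visual[y][x]))  # clip added
             for x in range(len(visual[0]))]
            for y in range(len(visual))]
```

A small target seen in both bands keeps a red component, while a clutter spot present in only one band drops to zero in the red channel.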
Scene analysis for effective visual search in rough three-dimensional-modeling scenes
NASA Astrophysics Data System (ADS)
Wang, Qi; Hu, Xiaopeng
2016-11-01
Visual search is a fundamental technology in the computer vision community. It is difficult to find an object in complex scenes when similar distracters exist in the background. We propose a target search method for rough three-dimensional-modeling scenes based on a visual salience theory and a camera imaging model. We define the salience of objects (or features) and explain how salience measurements of objects are calculated. We also present a type of search path that leads to the target through salient objects. Along the search path, once the preceding objects are localized, the search region of each subsequent object shrinks; this region is calculated through the imaging model and an optimization method. The experimental results indicate that the proposed method can resolve the ambiguities caused by distracters that share visual features with the target, improving search speed by over 50%.
Scene-based nonuniformity correction using local constant statistics.
Zhang, Chao; Zhao, Wenyi
2008-06-01
In scene-based nonuniformity correction, the statistical approach assumes that all possible values of the true-scene pixel are seen at each pixel location. This global-constant-statistics assumption does not distinguish fixed pattern noise from spatial variations in the average image. This often causes "ghosting" artifacts in the corrected images, since the existing spatial variations are treated as noise. We introduce a new statistical method to reduce the ghosting artifacts. Our method uses a local-constant-statistics assumption: the temporal signal distribution is not constant across the whole array but is constant locally. That is, the distribution is statistically constant in a local region around each pixel but uneven on a larger scale. Under the assumption that fixed pattern noise is concentrated at higher spatial frequencies than the distribution variation, we apply a wavelet method to the gain and offset images of the noise and separate the pattern noise from the spatial variations in the temporal distribution of the scene. We compare the results with the global-constant-statistics method using a clean sequence with large artificial pattern noise. We also apply the method to a challenging CCD video sequence and an LWIR sequence to show how effective it is in reducing noise and ghosting artifacts.
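The global-constant-statistics baseline that this method improves upon can be sketched in a few lines: each pixel's temporal mean and standard deviation serve as estimates of its offset and gain, and every frame is normalized with them. This sketch implements only that baseline, not the local-constant-statistics and wavelet separation proposed here; `constant_statistics_nuc` is a hypothetical name.

```python
import statistics


def constant_statistics_nuc(frames):
    """Global constant-statistics nonuniformity correction (baseline only).

    frames: list of 2D frames (lists of lists of numbers). Each pixel's temporal
    mean and standard deviation are taken as its offset and gain estimates, and
    every frame is normalized as (x - mean) / std.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    mean = [[statistics.fmean(f[y][x] for f in frames) for x in range(cols)]
            for y in range(rows)]
    # Guard against stationary pixels with zero temporal variance.
    std = [[statistics.pstdev([f[y][x] for f in frames]) or 1.0 for x in range(cols)]
           for y in range(rows)]
    return [[[(f[y][x] - mean[y][x]) / std[y][x] for x in range(cols)]
             for y in range(rows)]
            for f in frames]
```

Because the temporal mean absorbs any persistent scene structure as well as the offset, a static bright object gets partly subtracted out of every frame; this is the ghosting mechanism the local-statistics approach targets.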
Delcasso, Sébastien; Huh, Namjung; Byeon, Jung Seop; Lee, Jihyun; Jung, Min Whan; Lee, Inah
2014-11-19
The hippocampus is important for contextual behavior, and the striatum plays key roles in decision making. When studying the functional relationships with the hippocampus, prior studies have focused mostly on the dorsolateral striatum (DLS), emphasizing the antagonistic relationships between the hippocampus and DLS in spatial versus response learning. By contrast, the functional relationships between the dorsomedial striatum (DMS) and hippocampus are relatively unknown. The current study reports that lesions to both the hippocampus and DMS profoundly impaired performance of rats in a visual scene-based memory task in which the animals were required to make a choice response by using visual scenes displayed in the background. Analysis of simultaneous recordings of local field potentials revealed that gamma oscillatory power was higher in the DMS, but not in CA1, when the rat performed the task using familiar rather than novel scenes. In addition, the CA1-DMS networks increased coherence at γ, but not at θ, rhythm as the rat mastered the task. At the single-unit level, the neuronal populations in CA1 and DMS showed differential firing patterns when responses were made using familiar rather than novel visual scenes. Such learning-dependent firing patterns were observed earlier in the DMS than in CA1 before the rat made choice responses. The present findings suggest that both the hippocampus and DMS process memory representations for visual scenes in parallel with different time courses and that flexible choice action using background visual scenes requires coordinated operations of the hippocampus and DMS at γ frequencies. Copyright © 2014 the authors.
NASA Technical Reports Server (NTRS)
2004-01-01
12 November 2004 This Mars Global Surveyor (MGS) Mars Orbiter Camera (MOC) image shows light-toned, sedimentary rock outcrops in the Aureum Chaos region of Mars. On the brightest and steepest slope in this scene, dry talus shed from the outcrop has formed a series of dark fans along its base. These outcrops are located near 3.4°S, 27.5°W. The image covers an area approximately 3 km (1.9 mi) across and sunlight illuminates the scene from the upper left.
Liedtke, C E; Aeikens, B
1980-01-01
In this work, segmentation of cell images means the automated decomposition of microscopic cell scenes into nucleus, plasma, and background. Segmentation is achieved using information from the microscope image and prior knowledge about the content of the scene. Different algorithms have been investigated and applied to samples of urothelial cells. A particular algorithm based on a histogram approach, which can easily be implemented in hardware, is discussed in more detail.
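The specific histogram algorithm is not given here, so this sketch swaps in a well-known histogram technique for three-class splitting: two-threshold (multi-level) Otsu, which picks thresholds `t1 < t2` that maximize between-class variance. It assumes the three classes occupy distinct gray-level ranges (for example, nucleus darkest and background brightest in a bright-field image). `two_threshold_otsu` and `classify` are hypothetical names.

```python
def two_threshold_otsu(hist):
    """Choose thresholds t1 < t2 splitting a gray-level histogram into three
    classes: levels [0, t1), [t1, t2), [t2, L). Maximizing between-class variance
    is equivalent to maximizing sum_k S_k**2 / w_k, where w_k is the pixel count
    and S_k the level-weighted sum of class k (the total mean is fixed).
    """
    L = len(hist)
    P = [0] * (L + 1)  # prefix sums of counts
    S = [0] * (L + 1)  # prefix sums of level * count
    for i, h in enumerate(hist):
        P[i + 1] = P[i] + h
        S[i + 1] = S[i] + i * h
    best, best_t = -1.0, (1, 2)
    for t1 in range(1, L - 1):
        for t2 in range(t1 + 1, L):
            score = 0.0
            for a, b in ((0, t1), (t1, t2), (t2, L)):
                w = P[b] - P[a]
                if w:
                    score += (S[b] - S[a]) ** 2 / w
            if score > best:
                best, best_t = score, (t1, t2)
    return best_t


def classify(img, t1, t2):
    """Label each pixel 0, 1 or 2 for levels below t1, in [t1, t2), and >= t2."""
    return [[0 if v < t1 else 1 if v < t2 else 2 for v in row] for row in img]
```

The exhaustive search is O(L^2) over threshold pairs, which is cheap for 8-bit histograms thanks to the prefix sums; a hardware-oriented design would likely use a simpler rule.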
The Influence of Familiarity on Affective Responses to Natural Scenes
NASA Astrophysics Data System (ADS)
Sanabria Z., Jorge C.; Cho, Youngil; Yamanaka, Toshimasa
This kansei study explored how familiarity with image-word combinations influences affective states. Stimuli were obtained from Japanese print advertisements (ads), and consisted of images (e.g., natural-scene backgrounds) and their corresponding headlines (advertising copy). Initially, a group of subjects evaluated their level of familiarity with images and headlines independently, and stimuli were filtered based on the results. In the main experiment, a different group of subjects rated their pleasure and arousal to, and familiarity with, image-headline combinations. The Self-Assessment Manikin (SAM) scale was used to evaluate pleasure and arousal, and a bipolar scale was used to evaluate familiarity. The results showed a high correlation between familiarity and pleasure, but low correlation between familiarity and arousal. The characteristics of the stimuli, and their effect on the variables of pleasure, arousal and familiarity, were explored through ANOVA. It is suggested that, in the case of natural-scene ads, familiarity with image-headline combinations may increase the pleasure response to the ads, and that certain components in the images (e.g., water) may increase arousal levels.
Measurable realistic image-based 3D mapping
NASA Astrophysics Data System (ADS)
Liu, W.; Wang, J.; Wang, J. J.; Ding, W.; Almagbile, A.
2011-12-01
Maps with 3D visual models are becoming a remarkable feature of 3D map services. High-resolution image data is obtained for the construction of 3D visualized models. The 3D map not only provides the capabilities of 3D measurements and knowledge mining, but also provides the virtual experience of places of interest, as demonstrated in Google Earth. Applications of 3D maps are expanding into the areas of architecture, property management, and urban environment monitoring. However, the reconstruction of high-quality 3D models is time consuming and requires robust hardware and powerful software to handle the enormous amount of data. This is especially true for the automatic implementation of 3D models and the representation of complicated surfaces, which still need improvements in visualisation techniques. The shortcoming of 3D model-based maps is their limited detailed coverage, since a user can only view and measure objects that are already modelled in the virtual environment. This paper proposes and demonstrates a 3D map concept that is realistic and image-based and that enables geometric measurements and geo-location services. Additionally, image-based 3D maps provide more detailed information about the real world than 3D model-based maps. The image-based 3D maps use geo-referenced stereo images or panoramic images. The geometric relationships between objects in the images can be resolved from the geometric model of stereo images. The panoramic function not only makes 3D maps more interactive for users but also creates an immersive experience. Unmeasurable image-based 3D maps already exist, such as Google Street View, but they only provide virtual experiences in terms of photos; topographic and terrain attributes, such as shapes and heights, are omitted.
This paper also discusses the potential for using a low-cost land Mobile Mapping System (MMS) to implement realistic image-based 3D mapping, and evaluates the positioning accuracy that a measurable realistic image-based (MRI) system can produce. The major contribution here is the implementation of measurable images on 3D maps to obtain various measurements from real scenes.
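How a measurement can be resolved from the geometric model of stereo images is easiest to see with an idealized rectified pinhole pair, where depth follows from disparity as Z = f·B/d. This is a textbook sketch, not the paper's MMS processing chain: it assumes rectified images, a known focal length `f` in pixels, baseline `baseline` in metres, and principal point `(cx, cy)`. `triangulate` and `measure` are hypothetical names.

```python
import math


def triangulate(x_left, y, disparity, f, baseline, cx, cy):
    """Rectified pinhole stereo: recover camera-frame (X, Y, Z) in metres from a
    pixel (x_left, y) in the left image and its horizontal disparity in pixels."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    Z = f * baseline / disparity  # depth from disparity
    X = (x_left - cx) * Z / f     # back-project through the pinhole model
    Y = (y - cy) * Z / f
    return X, Y, Z


def measure(p, q):
    """Euclidean distance between two triangulated points."""
    return math.dist(p, q)
```

With `f = 1000`, `baseline = 0.5`, and principal point `(500, 500)`, two points at disparity 50 and columns 600 and 400 lie 10 m away and 2 m apart. Depth error grows roughly with Z squared for a fixed disparity error, which is one reason the positioning accuracy of such a system needs the kind of evaluation the paper reports.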
Selective attention during scene perception: evidence from negative priming.
Gordon, Robert D
2006-10-01
In two experiments, we examined the role of semantic scene content in guiding attention during scene viewing. In each experiment, performance on a lexical decision task was measured following the brief presentation of a scene. The lexical decision stimulus named an object that was either present or not present in the scene. The results of Experiment 1 revealed no priming from inconsistent objects (whose identities conflicted with the scene in which they appeared), but negative priming from consistent objects. The results of Experiment 2 indicated that negative priming from consistent objects occurs only when inconsistent objects are present in the scenes. Together, the results suggest that observers are likely to attend to inconsistent objects, and that representations of consistent objects are suppressed in the presence of an inconsistent object. Furthermore, the data suggest that inconsistent objects draw attention because they are relatively difficult to identify in an inappropriate context.
Contextual descriptors and neural networks for scene analysis in VHR SAR images
NASA Astrophysics Data System (ADS)
Del Frate, Fabio; Picchiani, Matteo; Falasco, Alessia; Schiavon, Giovanni
2016-10-01
The development of SAR technology during the last decade has made it possible to collect a huge amount of data over many regions of the world. In particular, the availability of SAR images from different sensors with metric or sub-metric spatial resolution offers novel opportunities in fields such as land cover, urban monitoring, and soil consumption. On the other hand, automatic approaches become crucial for exploiting such a huge amount of information. In this scenario, especially if single-polarization images are considered, the main issue is to select appropriate contextual descriptors, since the backscattering coefficient of a single pixel may not be sufficient to classify an object in the scene. This paper presents a comparison of three approaches for defining contextual features, with the aim of designing optimum procedures for VHR SAR scene understanding. The first approach is based on the Gray-Level Co-Occurrence Matrix (GLCM), which is widely accepted and has been used in several studies for land cover classification with SAR data. The second approach is based on Fourier spectra and has already been proposed with positive results for this kind of problem. The third is based on Auto-associative Neural Networks, which have already proven effective for feature extraction from polarimetric SAR images. The three methods are evaluated in terms of the accuracy of the classified scene when the features extracted by each method are used as input to a neural network classifier and applied to different Cosmo-SkyMed spotlight products.
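The GLCM approach, the first of the three descriptor families, can be sketched as follows: count how often gray level `i` occurs next to level `j` at a fixed offset, normalize, and summarize the matrix with scalar texture features. This is a generic sketch, not the paper's configuration: the quantization to `levels` gray levels, the single offset, and the choice of contrast and homogeneity features are assumptions, and in practice the features would be computed in a sliding window around each pixel before being fed to the classifier.

```python
def glcm(img, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for offset (dx, dy).
    img must already be quantized to integers in [0, levels)."""
    m = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                a, b = img[y][x], img[ny][nx]
                m[a][b] += 1  # count the pair in both directions (symmetric GLCM)
                m[b][a] += 1
    total = sum(map(sum, m))
    return [[v / total for v in row] for row in m] if total else m


def contrast(p):
    """Weights co-occurrences by squared gray-level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def homogeneity(p):
    """High when co-occurring levels are similar (smooth texture)."""
    n = len(p)
    return sum(p[i][j] / (1 + abs(i - j)) for i in range(n) for j in range(n))
```

A flat patch gives contrast 0 and homogeneity 1, while a checkerboard of two levels gives the maximum contrast for that level pair.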
NASA Astrophysics Data System (ADS)
Li, Shuo; Jin, Weiqi; Li, Li; Li, Yiyang
2018-05-01
Infrared thermal images can reflect the thermal-radiation distribution of a particular scene. However, the contrast of infrared images is usually low. Hence, it is generally necessary to enhance the contrast of infrared images in advance to facilitate subsequent recognition and analysis. Based on adaptive double-plateau histogram equalization, this paper presents an improved contrast enhancement algorithm for infrared thermal images. In the proposed algorithm, the normalized coefficient of variation of the histogram, which characterizes the level of contrast enhancement, is introduced as feedback information to adjust the upper and lower plateau thresholds. Experiments on actual infrared images show that, compared with three typical contrast-enhancement algorithms, the proposed algorithm has better scene adaptability and yields better contrast-enhancement results for infrared images with more dark areas or a higher dynamic range. Hence, it has high application value in contrast enhancement, dynamic range compression, and digital detail enhancement for infrared thermal images.
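The core of double-plateau histogram equalization can be sketched in a few lines. Bins above the upper plateau are capped so that a large uniform background cannot dominate the mapping, and non-empty bins below the lower plateau are raised so that sparse details keep some gray levels. In this minimal sketch both thresholds are fixed inputs. The paper's feedback adjustment via the normalized coefficient of variation is not reproduced, and the function name is hypothetical.

```python
import numpy as np

def double_plateau_equalize(image, t_up, t_down, levels=256):
    """Histogram equalization with upper and lower plateau clipping (fixed thresholds)."""
    img = np.asarray(image, dtype=np.int64)
    hist = np.bincount(img.ravel(), minlength=levels).astype(np.float64)
    clipped = hist.copy()
    # Upper plateau: cap dominant bins (e.g. a large uniform background).
    clipped[clipped > t_up] = t_up
    # Lower plateau: raise small but non-empty bins so sparse details keep gray levels.
    clipped[(hist > 0) & (clipped < t_down)] = t_down
    # Map each gray level through the cumulative distribution of the clipped histogram.
    cdf = np.cumsum(clipped)
    lut = np.round((levels - 1) * cdf / cdf[-1]).astype(np.int64)
    return lut[img]
```

Consider an image with 1000 pixels at level 10 and 10 pixels at level 200. With `t_up = 100` and `t_down = 50`, these levels map to 170 and 255. Plain equalization would map them to about 252 and 255, which nearly erases the small bright detail.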
1989-03-01
Toys, is a model of the dinosaur Tyrannosaurus Rex. This particular test case is characterized by sharply discontinuous depths varying over a wide ... are not shown in these figures.
Figures 7 and 8: T. Rex scene, left and right images of the Tinker Toy object. Figures 9 and 10: T. Rex scene, connected contours extracted from the left and right images.
HDR imaging and color constancy: two sides of the same coin?
NASA Astrophysics Data System (ADS)
McCann, John J.
2011-01-01
At first glance, we may think that High Dynamic Range (HDR) imaging is a technique for improved recordings of scene radiances. Many of us think that human color constancy is a variation of a camera's automatic white balance algorithm. However, on closer inspection, glare limits the range of light we can detect in cameras and on retinas. All scene regions below middle gray are influenced, more or less, by the glare from the bright scene segments. Instead of accurate radiance reproduction, HDR imaging works well because it preserves the details in the scene's spatial contrast. Similarly, on closer inspection, human color constancy depends on spatial comparisons that synthesize appearances from all the scene segments. Can spatial image processing play similar principal roles in both HDR imaging and color constancy?
The research of multi-frame target recognition based on laser active imaging
NASA Astrophysics Data System (ADS)
Wang, Can-jin; Sun, Tao; Wang, Tin-feng; Chen, Juan
2013-09-01
Laser active imaging is suited to conditions such as no temperature difference between target and background, pitch-black night, and poor visibility. It can also be used to detect faint targets at long range or small targets in deep space, with the advantages of high definition and good contrast. In short, it is immune to environmental conditions. However, because of the long distance, limited laser energy, and atmospheric backscatter, it is impossible to illuminate the whole scene at the same time. As a result, the target in every single frame is unevenly or partly illuminated, which makes recognition more difficult. At the same time, the speckle noise that is common in laser active imaging blurs the images. In this paper, we study laser active imaging and propose a new target recognition method based on multi-frame images. First, multiple laser pulses are used to obtain sub-images of different parts of the scene. A denoising method that combines a homomorphic filter with wavelet-domain SURE is used to suppress speckle noise, and blind deconvolution is introduced to obtain low-noise, clear sub-images. These sub-images are then registered and stitched to form a completely and uniformly illuminated scene image. After that, a new target recognition method based on contour moments is proposed. First, the Canny operator is used to obtain contours. For each contour, the seven invariant Hu moments are calculated to generate the feature vectors. Finally, the feature vectors are input into a BP neural network with two hidden layers for classification. Experimental results indicate that the proposed algorithm can achieve a high recognition rate and satisfactory real-time performance for laser active imaging.
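The contour-moment step can be illustrated with a minimal NumPy computation of the seven Hu invariant moments. This sketch assumes the input is a binary mask, either a filled region or a rasterized contour from an edge detector. The Canny and BP-network stages are not shown, and the function name `hu_moments` is hypothetical. OpenCV offers an equivalent through `cv2.moments` and `cv2.HuMoments`, but it is not used here so that the example stays dependency-free.

```python
import numpy as np

def hu_moments(region):
    """Seven Hu invariant moments of a 2-D intensity or binary region."""
    img = np.asarray(region, dtype=np.float64)
    y, x = np.indices(img.shape)  # row index = y, column index = x
    m00 = img.sum()
    xc = (x * img).sum() / m00
    yc = (y * img).sum() / m00

    def eta(p, q):
        # Central moment normalized for scale: mu_pq / m00^(1 + (p + q) / 2).
        mu = (((x - xc) ** p) * ((y - yc) ** q) * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03  # recurring sums in the higher-order invariants
    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    h3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    h4 = a ** 2 + b ** 2
    h5 = ((n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
          + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2))
    h6 = (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b
    h7 = ((3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
          - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2))
    return np.array([h1, h2, h3, h4, h5, h6, h7])
```

The moments of translated or 90-degree-rotated copies of a shape agree up to floating-point error. Scale invariance holds only approximately on a pixel grid.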
A New Color Correction Method for Underwater Imaging
NASA Astrophysics Data System (ADS)
Bianco, G.; Muzzupappa, M.; Bruno, F.; Garcia, R.; Neumann, L.
2015-04-01
Recovering correct, or at least realistic, colors of underwater scenes is a very challenging issue for imaging techniques, since illumination conditions in a refractive and turbid medium such as the sea are seriously altered. Correcting the colors of underwater images or videos is an important task required in all image-based applications such as 3D imaging, navigation, and documentation. Many image enhancement methods have been proposed in the literature for these purposes. The advantage of these methods is that they do not require knowledge of the physical parameters of the medium, while some image adjustments can be performed manually (such as histogram stretching) or automatically by algorithms based on criteria suggested by computational color constancy methods. One of the most popular criteria is based on the gray-world hypothesis, which assumes that the average of the captured image should be gray. An interesting application of this assumption is performed in the Ruderman opponent color space lαβ, used in a previous work for hue correction of images captured under colored light sources, which allows the luminance component of the scene to be separated from its chromatic components. In this work, we present the first proposal for color correction of underwater images using the lαβ color space. In particular, the chromatic components are changed by moving their distributions around the white point (white balancing), and histogram cutoff and stretching of the luminance component are performed to improve image contrast. The experimental results demonstrate the effectiveness of this method under the gray-world assumption and assuming uniform illumination of the scene. Moreover, due to its low computational cost, it is suitable for real-time implementation.
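The lαβ-based correction can be sketched as follows. RGB is converted to lαβ, the α and β distributions are shifted so that their means sit at the white point (zero), and the luminance channel is cut off and stretched. This is a sketch under stated assumptions, not the authors' implementation. It uses the widely cited RGB→LMS matrix and lαβ decorrelation from Reinhard et al.'s color-transfer work, expects float RGB in [0, 1], and uses a 1% percentile cut-off. The function names are hypothetical.

```python
import numpy as np

# RGB -> LMS matrix and lαβ decorrelation as given by Reinhard et al. (assumed here).
RGB2LMS = np.array([[0.3811, 0.5783, 0.0402],
                    [0.1967, 0.7244, 0.0782],
                    [0.0241, 0.1288, 0.8444]])
LMS2LAB = np.diag([1 / np.sqrt(3), 1 / np.sqrt(6), 1 / np.sqrt(2)]) @ np.array(
    [[1, 1, 1], [1, 1, -2], [1, -1, 0]], dtype=np.float64)
EPS = 1e-6  # avoids log10(0) for black pixels

def rgb_to_lab(rgb):
    """(H, W, 3) RGB -> (H*W, 3) lαβ, computed in log10 LMS space."""
    lms = rgb.reshape(-1, 3) @ RGB2LMS.T
    return np.log10(np.maximum(lms, EPS)) @ LMS2LAB.T

def lab_to_rgb(lab, shape):
    """Inverse of rgb_to_lab, reshaped back to the original image shape."""
    lms = 10 ** (lab @ np.linalg.inv(LMS2LAB).T)
    return (lms @ np.linalg.inv(RGB2LMS).T).reshape(shape)

def gray_world_lab(rgb, clip_percent=1.0):
    """Gray-world correction in lαβ: center α and β on zero, then stretch l."""
    img = np.asarray(rgb, dtype=np.float64)
    lab = rgb_to_lab(img)
    # White balancing: shift the chromatic distributions so their means sit at 0.
    lab[:, 1:] -= lab[:, 1:].mean(axis=0)
    # Luminance cut-off and stretch: map the [p, 100 - p] percentile range of l
    # onto the full original range of l.
    l = lab[:, 0]
    lo, hi = np.percentile(l, [clip_percent, 100 - clip_percent])
    if hi > lo:
        lab[:, 0] = np.clip((l - lo) / (hi - lo), 0.0, 1.0) * (l.max() - l.min()) + l.min()
    return np.clip(lab_to_rgb(lab, img.shape), 0.0, 1.0)
```

Mean-centering α and β is the white-balancing step. The percentile cut-off and stretch of l stands in for the paper's histogram cutoff and stretching, whose exact parameters are not given in the abstract.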
NASA Astrophysics Data System (ADS)
Kuvich, Gary
2004-08-01
Vision is only a part of a system that converts visual information into knowledge structures. These structures drive the vision process, resolving ambiguity and uncertainty via feedback, and provide image understanding, which is an interpretation of visual information in terms of these knowledge models. These mechanisms provide reliable recognition even if the object is occluded or cannot be recognized as a whole. It is hard to split the entire system apart, and reliable solutions to target recognition problems are possible only within the solution of a more generic Image Understanding Problem. The brain reduces informational and computational complexity using implicit symbolic coding of features, hierarchical compression, and selective processing of visual information. A biologically inspired Network-Symbolic representation, in which both systematic structural/logical methods and neural/statistical methods are parts of a single mechanism, is the most feasible for such models. It converts visual information into relational Network-Symbolic structures, avoiding artificially precise computations of 3-dimensional models. Network-Symbolic Transformations derive abstract structures, which allow invariant recognition of an object as an exemplar of a class. Active vision helps create consistent models. Attention, separation of figure from ground, and perceptual grouping are special kinds of network-symbolic transformations. Such Image/Video Understanding Systems will be able to recognize targets reliably.
Bilinear Convolutional Neural Networks for Fine-grained Visual Recognition.
Lin, Tsung-Yu; RoyChowdhury, Aruni; Maji, Subhransu
2017-07-04
We present a simple and effective architecture for fine-grained recognition called Bilinear Convolutional Neural Networks (B-CNNs). These networks represent an image as a pooled outer product of features derived from two CNNs and capture localized feature interactions in a translationally invariant manner. B-CNNs are related to orderless texture representations built on deep features but can be trained in an end-to-end manner. Our most accurate model obtains 84.1%, 79.4%, 84.5%, and 91.3% per-image accuracy on the Caltech-UCSD birds [66], NABirds [63], FGVC aircraft [42], and Stanford cars [33] datasets, respectively, and runs at 30 frames per second on an NVIDIA Titan X GPU. We then present a systematic analysis of these networks and show that (1) the bilinear features are highly redundant and can be reduced by an order of magnitude in size without significant loss in accuracy, (2) they are also effective for other image classification tasks such as texture and scene recognition, and (3) they can be trained from scratch on the ImageNet dataset, offering consistent improvements over the baseline architecture. Finally, we present visualizations of these models on various datasets using top activations of neural units and gradient-based inversion techniques. The source code for the complete system is available at http://vis-www.cs.umass.edu/bcnn.
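The pooled outer product at the heart of a B-CNN can be sketched independently of the CNNs themselves. Given two feature maps over the same H × W grid, the descriptor is the average over locations of the outer product of the two feature vectors. In this sketch, random arrays stand in for the two CNN outputs. The signed square root and L2 normalization follow common B-CNN practice and should be treated as assumptions, and the function name is hypothetical.

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    """Orderless bilinear pooling of two feature maps of shape (H, W, C_a) and (H, W, C_b)."""
    h, w, ca = feat_a.shape
    cb = feat_b.shape[2]
    a = feat_a.reshape(h * w, ca)
    b = feat_b.reshape(h * w, cb)
    # Sum of per-location outer products a_i b_i^T, averaged; spatial layout is discarded.
    phi = (a.T @ b).ravel() / (h * w)
    # Signed square root followed by L2 normalization.
    phi = np.sign(phi) * np.sqrt(np.abs(phi))
    norm = np.linalg.norm(phi)
    return phi / norm if norm > 0 else phi
```

Because the outer products are summed over all locations, shuffling the spatial positions of both maps in the same way leaves the descriptor unchanged. This is the orderless, translation-invariant property the abstract refers to. The descriptor length is C_a × C_b, which explains why the redundancy analysis in point (1) matters in practice.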
Referenceless perceptual fog density prediction model
NASA Astrophysics Data System (ADS)
Choi, Lark Kwon; You, Jaehee; Bovik, Alan C.
2014-02-01
We propose a perceptual fog density prediction model based on natural scene statistics (NSS) and "fog-aware" statistical features, which can predict the visibility in a foggy scene from a single image without reference to a corresponding fogless image, without side geographical camera information, without training on human-rated judgments, and without dependency on salient objects such as lane markings or traffic signs. The proposed fog density predictor only makes use of measurable deviations from statistical regularities observed in natural foggy and fog-free images. A fog-aware collection of statistical features is derived from a corpus of foggy and fog-free images by using a space-domain NSS model and observed characteristics of foggy images such as low contrast, faint color, and shifted intensity. The proposed model not only predicts perceptual fog density for the entire image but also provides a local fog density index for each patch. The fog density predicted by the model correlates well with the visibility of foggy scenes as judged by participants in a human subjective study on a large foggy image database. As one application, the proposed model accurately evaluates the performance of defog algorithms designed to enhance the visibility of foggy images.
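The abstract does not list the individual fog-aware features. However, space-domain NSS models of this kind typically start from mean-subtracted contrast-normalized (MSCN) coefficients and the local standard deviation field. The sketch below computes both using a 7-tap Gaussian window (sigma = 7/6) and a stabilizing constant C = 1. These parameters are borrowed from common NSS practice and are assumed here, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma=7 / 6, radius=3):
    """Normalized 1-D Gaussian taps (7 taps for radius 3)."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, k):
    """Separable Gaussian blur with edge padding; output keeps the input shape."""
    r = len(k) // 2
    p = np.pad(img, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def mscn(image, c=1.0):
    """Return MSCN coefficients and the local standard deviation field."""
    img = np.asarray(image, dtype=np.float64)
    k = gaussian_kernel()
    mu = blur(img, k)
    # abs() guards against tiny negative variances caused by floating-point rounding.
    sigma = np.sqrt(np.abs(blur(img * img, k) - mu * mu))
    return (img - mu) / (sigma + c), sigma
```

Under the simple atmospheric fog model I' = tI + (1 - t)A, with transmission t and airlight A, the local standard deviation returned by `mscn` is scaled by exactly t. This is one reason contrast-related statistics of this kind respond to fog.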
NASA Astrophysics Data System (ADS)
Hu, Ruiguang; Xiao, Liping; Zheng, Wenjuan
2015-12-01
In this paper, multi-kernel learning (MKL) is used for drug-related webpage classification. First, body text and image-label text are extracted through HTML parsing, and valid images are chosen by the FOCARSS algorithm. Second, a text-based bag-of-words (BOW) model is used to generate the text representation, and an image-based BOW model is used to generate the image representation. Last, the text and image representations are fused using several methods. Experimental results demonstrate that the classification accuracy of MKL is higher than that of all other fusion methods at the decision level and feature level, and much higher than the accuracy of single-modal classification.
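The fusion step can be illustrated at the kernel level. In multi-kernel learning, the combined kernel is a weighted sum of per-modality kernels. An MKL solver learns the weights jointly with the classifier, whereas the sketch below simply takes them as input. The histogram-intersection kernel on BOW histograms and the function names are assumptions, not the kernels used in the paper.

```python
import numpy as np

def histogram_intersection(X, Y):
    """K[i, j] = sum_k min(X[i, k], Y[j, k]); a common kernel for BOW histograms."""
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

def combined_kernel(text_bow, image_bow, weights):
    """Convex combination of per-modality kernels, K = sum_m beta_m * K_m."""
    beta = np.asarray(weights, dtype=np.float64)
    if np.any(beta < 0) or not np.isclose(beta.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    kernels = [histogram_intersection(text_bow, text_bow),
               histogram_intersection(image_bow, image_bow)]
    return sum(b * k for b, k in zip(beta, kernels))
```

A non-negative combination of positive semidefinite kernels is itself positive semidefinite, so the fused matrix can be passed to any kernel classifier that accepts a precomputed Gram matrix, such as scikit-learn's `SVC(kernel="precomputed")`.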
Improved integral images compression based on multi-view extraction
NASA Astrophysics Data System (ADS)
Dricot, Antoine; Jung, Joel; Cagnazzo, Marco; Pesquet, Béatrice; Dufaux, Frédéric
2016-09-01
Integral imaging is a technology based on plenoptic photography that captures and samples the light field of a scene through a micro-lens array. It provides views of the scene from several angles and is therefore foreseen as a key technology for future immersive video applications. However, integral images have a large resolution and a structure based on micro-images, which is challenging to encode. A compression scheme for integral images based on view extraction has previously been proposed, with average BD-rate gains of 15.7% (up to 31.3%) reported over HEVC when using one single extracted view. As the efficiency of the scheme depends on a tradeoff between the bitrate required to encode the view and the quality of the image reconstructed from the view, it is proposed to increase the number of extracted views. Several configurations are tested with different positions and different numbers of extracted views. Compression efficiency is increased, with average BD-rate gains of 22.2% (up to 31.1%) reported over the HEVC anchor, with a realistic runtime increase.
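The BD-rate figures quoted above measure the average bitrate difference between two rate-distortion curves at equal quality. Below is a minimal sketch of the classic Bjøntegaard calculation. It assumes four or more (bitrate, PSNR) points per codec and the original cubic-polynomial fit of log-rate against PSNR. The function name is hypothetical, and tools used in HEVC experiments may use a piecewise-cubic variant, so small numerical differences from published values are expected. A negative result corresponds to a "gain" as reported in the abstract.

```python
import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Bjøntegaard delta rate (%) of a test codec relative to a reference.

    Negative values mean the test codec needs fewer bits for the same PSNR.
    """
    lr_ref, lr_test = np.log10(rate_ref), np.log10(rate_test)
    # Fit log10(rate) as a cubic polynomial of PSNR for each curve.
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)
    p_test = np.polyfit(psnr_test, lr_test, 3)
    # Integrate both fits over the overlapping PSNR interval.
    lo = max(float(np.min(psnr_ref)), float(np.min(psnr_test)))
    hi = min(float(np.max(psnr_ref)), float(np.max(psnr_test)))
    i_ref, i_test = np.polyint(p_ref), np.polyint(p_test)
    int_ref = np.polyval(i_ref, hi) - np.polyval(i_ref, lo)
    int_test = np.polyval(i_test, hi) - np.polyval(i_test, lo)
    # Average log-rate difference, converted back to a percentage.
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (10 ** avg_diff - 1) * 100
```

For instance, a test codec that needs 80% of the reference bitrate at every PSNR point yields a BD-rate of -20%.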
A Novel Framework for Remote Sensing Image Scene Classification
NASA Astrophysics Data System (ADS)
Jiang, S.; Zhao, H.; Wu, W.; Tan, Q.
2018-04-01
High-resolution remote sensing (HRRS) image scene classification aims to label an image with a specific semantic category. HRRS images contain more details of ground objects and their spatial distribution patterns than low-spatial-resolution images. Scene classification can bridge the gap between low-level features and high-level semantics, and it can be applied in urban planning, target detection, and other fields. This paper proposes a novel framework for HRRS image scene classification that combines a convolutional neural network (CNN) and XGBoost, using the CNN as a feature extractor and XGBoost as a classifier. The framework is evaluated on two different HRRS image datasets, the UC-Merced dataset and the NWPU-RESISC45 dataset, on which it achieves accuracies of 95.57% and 83.35%, respectively. The experimental results show that the framework is effective for remote sensing image classification. Furthermore, we believe this framework will be more practical for further HRRS scene classification, since it requires less time in the training stage.
Gray-world-assumption-based illuminant color estimation using color gamuts with high and low chroma
NASA Astrophysics Data System (ADS)
Kawamura, Harumi; Yonemura, Shunichi; Ohya, Jun; Kojima, Akira
2013-02-01
A new approach is proposed for estimating illuminant colors from color images under an unknown scene illuminant. The approach is based on a combination of a gray-world-assumption-based illuminant color estimation method and a method using color gamuts. The former method, which is one we had previously proposed, improved on the original method that hypothesizes that the average of all the object colors in a scene is achromatic. Since the original method estimates scene illuminant colors by calculating the average of all the image pixel values, its estimations are incorrect when certain image colors are dominant. Our previous method improves on it by choosing several colors on the basis of an opponent-color property, which is that the average color of opponent colors is achromatic, instead of using all colors. However, it cannot estimate illuminant colors when there are only a few image colors or when the image colors are unevenly distributed in local areas in the color space. The approach we propose in this paper combines our previous method and one using high chroma and low chroma gamuts, which makes it possible to find colors that satisfy the gray world assumption. High chroma gamuts are used for adding appropriate colors to the original image and low chroma gamuts are used for narrowing down illuminant color possibilities. Experimental results obtained using actual images show that even if the image colors are localized in a certain area of the color space, the illuminant colors are accurately estimated, with a smaller average estimation error than that of the conventional method.
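For context, the original gray-world estimate that the previous method improves on fits in a few lines. The illuminant color is taken as the mean of all pixel values, and a diagonal (von Kries-style) gain brings it to neutral. This sketch covers that baseline only, not the authors' opponent-color or gamut-based method, and the function names are hypothetical.

```python
import numpy as np

def gray_world_illuminant(rgb):
    """Original gray-world estimate: the illuminant color is the mean pixel color."""
    mean = np.asarray(rgb, dtype=np.float64).reshape(-1, 3).mean(axis=0)
    return mean / np.linalg.norm(mean)  # keep chromaticity only (unit length)

def correct(rgb, illuminant):
    """Diagonal (von Kries-style) correction toward a neutral, equal-RGB illuminant."""
    gains = illuminant.mean() / illuminant
    return np.asarray(rgb, dtype=np.float64) * gains
```

For a scene of achromatic surfaces under a colored light, the estimate recovers the light's chromaticity exactly, and the correction makes the three channels equal. For a scene dominated by one surface color, such as a large green area under white light, the mean is pulled toward that color. This is the failure mode the abstract describes when certain image colors are dominant.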
ERIC Educational Resources Information Center
Brady, Timothy F.; Tenenbaum, Joshua B.
2013-01-01
When remembering a real-world scene, people encode both detailed information about specific objects and higher order information like the overall gist of the scene. However, formal models of change detection, like those used to estimate visual working memory capacity, assume observers encode only a simple memory representation that includes no…
Hierarchy-associated semantic-rule inference framework for classifying indoor scenes
NASA Astrophysics Data System (ADS)
Yu, Dan; Liu, Peng; Ye, Zhipeng; Tang, Xianglong; Zhao, Wei
2016-03-01
Classifying indoor scenes is typically challenging because the spatial layout and decoration of a scene can vary considerably. Recent efforts at classifying object relationships commonly depend on the results of scene annotation and predefined rules, making classification inflexible. Furthermore, annotation results are easily affected by external factors. Inspired by human cognition, a scene-classification framework was proposed using empirically based annotation (EBA) and a match-over rule-based (MRB) inference system. The semantic hierarchy of images is exploited by EBA to construct rules empirically for MRB classification. The problem of scene classification is divided into low-level annotation and high-level inference from a macro perspective. Low-level annotation involves detecting the semantic hierarchy and annotating the scene with a deformable-parts model and a bag-of-visual-words model. In high-level inference, hierarchical rules are extracted to train the decision tree for classification. The categories of testing samples are generated from the parts to the whole. Compared with traditional classification strategies, the proposed semantic hierarchy and corresponding rules reduce the effect of a variable background and improve the classification performance. The proposed framework was evaluated on a popular indoor scene dataset, and the experimental results demonstrate its effectiveness.
Concurrent-scene/alternate-pattern analysis for robust video-based docking systems
NASA Technical Reports Server (NTRS)
Udomkesmalee, Suraphol
1991-01-01
A typical docking target employs a three-point design of retroreflective tape, one at each endpoint of the center-line, and one on the tip of the central post. Scenes sensed via laser diode illumination produce pictures with spots corresponding to the desired reflections from the retroreflectors as well as other reflections. Control corrections for each axis of the vehicle can then be properly applied if the desired spots are accurately tracked. However, initial acquisition of these three spots (a detection and identification problem) is non-trivial in a severe noise environment. Signal-to-noise enhancement, accomplished by subtracting the non-illuminated scene from the target scene illuminated by laser diodes, cannot eliminate every false spot. Hence, minimizing docking failures due to target mistracking calls for additional processing features pertaining to target locations. In this paper, we present a concurrent processing scheme for a modified docking target scene which could lead to a perfect docking system. Since the non-illuminated target scene is already available, adding another feature to the three-point design by marking two non-reflective lines, one between the two endpoints and one from the tip of the central post to the center-line, would allow this line feature to be picked up only when capturing the background scene (sensor data without laser illumination). Therefore, instead of performing image subtraction to generate a picture with a high signal-to-noise ratio, a processed line image based on a robust line detection technique (the Hough transform) can be fused with the actively sensed three-point target image to deduce the true locations of the docking target. This dual-channel confirmation scheme is necessary if a fail-safe system is to be realized from both the sensing and processing points of view. Detailed algorithms and preliminary results are presented.
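The line-detection step relies on the Hough transform. Below is a minimal sketch of the standard (rho, theta) accumulator over a binary edge map. The 1-degree angular resolution, the integer rho bins, and the function name `hough_lines` are assumptions. This is the textbook transform rather than the paper's implementation. In the scheme described, it would be applied to the non-illuminated background frame, where the two non-reflective lines appear.

```python
import numpy as np

def hough_lines(edges, n_theta=180, top_k=1):
    """Standard (rho, theta) Hough transform over a binary edge map.

    Returns the top_k (rho, theta) pairs with the most votes, where a line is
    x*cos(theta) + y*sin(theta) = rho and theta lies in [0, pi).
    """
    ys, xs = np.nonzero(edges)
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))  # bound on |rho|
    thetas = np.arange(n_theta) * np.pi / n_theta
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)
    # Each edge pixel votes once per theta for the rho of the line through it.
    rhos = np.round(np.outer(xs, np.cos(thetas)) + np.outer(ys, np.sin(thetas))).astype(int)
    cols = np.broadcast_to(np.arange(n_theta), rhos.shape)
    np.add.at(acc, (rhos + diag, cols), 1)
    # Pick the accumulator cells with the most votes.
    flat = np.argsort(acc.ravel())[::-1][:top_k]
    r_idx, t_idx = np.unravel_index(flat, acc.shape)
    return [(int(r - diag), float(thetas[t])) for r, t in zip(r_idx, t_idx)]
```

A vertical edge at column 10 is returned as (10, 0.0), and a horizontal edge at row 5 as (5, pi/2). Because votes accumulate along the whole line, isolated false spots contribute little. This is why a line feature can serve as an independent confirmation channel.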
Detection of chromatic and luminance distortions in natural scenes.
Jennings, Ben J; Wang, Karen; Menzies, Samantha; Kingdom, Frederick A A
2015-09-01
A number of studies have measured visual thresholds for detecting spatial distortions applied to images of natural scenes. In one study, Bex [J. Vis. 10(2), 1 (2010), doi:10.1167/10.2.23] measured sensitivity to sinusoidal spatial modulations of image scale. Here, we measure sensitivity to sinusoidal scale distortions applied to the chromatic, luminance, or both layers of natural scene images. We first established that sensitivity does not depend on whether the undistorted comparison image was of the same or of a different scene. Next, we found that, when the luminance but not chromatic layer was distorted, performance was the same regardless of whether the chromatic layer was present, absent, or phase-scrambled; in other words, the chromatic layer, in whatever form, did not affect sensitivity to the luminance layer distortion. However, when the chromatic layer was distorted, sensitivity was higher when the luminance layer was intact compared to when absent or phase-scrambled. These detection threshold results complement the appearance of periodic distortions of the image scale: when the luminance layer is distorted visibly, the scene appears distorted, but when the chromatic layer is distorted visibly, there is little apparent scene distortion. We conclude that (a) observers have a built-in sense of how a normal image of a natural scene should appear, and (b) the detection of distortion in, as well as the apparent distortion of, natural scene images is mediated predominantly by the luminance layer and not chromatic layer.
Bornik, Alexander; Urschler, Martin; Schmalstieg, Dieter; Bischof, Horst; Krauskopf, Astrid; Schwark, Thorsten; Scheurer, Eva; Yen, Kathrin
2018-06-01
Three-dimensional (3D) crime scene documentation using 3D scanners and medical imaging modalities such as computed tomography (CT) and magnetic resonance imaging (MRI) is increasingly applied in forensic casework. Together with digital photography, these modalities enable comprehensive and non-invasive recording of forensically relevant information regarding injuries/pathologies inside the body and on its surface. Furthermore, it is possible to capture traces and items at crime scenes. Such digitally secured evidence has the potential to increase case understanding by forensic experts and non-experts in court alike. Unlike photographs and 3D surface models, images from CT and MRI are not self-explanatory; their interpretation and understanding require radiological knowledge. Findings in tomography data must not only be revealed but should also be studied jointly with all the available 2D and 3D data in order to clarify spatial interrelations and to optimally exploit the data at hand. This is technically challenging due to the heterogeneous data representations, including volumetric data, polygonal 3D models, and images. This paper presents a novel computer-aided forensic toolbox providing tools to support the analysis, documentation, annotation, and illustration of forensic cases using heterogeneous digital data. Conjoint visualization of data from different modalities in their native form and efficient tools to visually extract and emphasize findings help experts to reveal unrecognized correlations and thereby enhance their case understanding. Moreover, the 3D case illustrations created for case analysis represent an efficient means of conveying the insights gained from case analysis to forensic non-experts involved in court proceedings, such as jurists and laymen.
The capability of the presented approach in the context of case analysis, its potential to speed up legal procedures and to ultimately enhance legal certainty is demonstrated by introducing a number of representative forensic cases. Copyright © 2018 The Author(s). Published by Elsevier B.V. All rights reserved.
Use of the TM tasseled cap transform for interpretation of spectral contrasts in an urban scene
NASA Technical Reports Server (NTRS)
Goward, S. N.; Wharton, S. W.
1984-01-01
Investigations are being conducted with the objective of developing automated numerical image analysis procedures. In this context, physically based multispectral data transforms are examined as a means of incorporating a priori knowledge of land radiance properties into the analysis process. A physically based transform of TM observations was developed, extending the Landsat MSS Tasseled Cap transform reported by Kauth and Thomas (1976) to TM data observations. The present study aims to examine the utility of the TM Tasseled Cap transform as applied to TM data from an urban landscape. The analysis is based on a 512 × 512 subset of the Washington, DC, November 2, 1982 TM scene, centered on Springfield, VA. The TM Tasseled Cap transformation appears to provide a good means of explaining the physical attributes of the land in the Washington scene. This result suggests a direction by which a priori knowledge of landscape spectral patterns may be incorporated into numerical image analysis.
Robust, Efficient Depth Reconstruction With Hierarchical Confidence-Based Matching.
Sun, Li; Chen, Ke; Song, Mingli; Tao, Dacheng; Chen, Gang; Chen, Chun
2017-07-01
In recent years, taking photos and capturing videos with mobile devices has become increasingly popular. Emerging applications based on the depth reconstruction technique have been developed, such as Google Lens Blur. However, depth reconstruction is difficult due to occlusions, non-diffuse surfaces, repetitive patterns, and textureless surfaces, and it has become even more difficult due to the unstable image quality and uncontrolled scene conditions in the mobile setting. In this paper, we present a novel hierarchical framework with multi-view confidence-based matching for robust, efficient depth reconstruction in uncontrolled scenes. In particular, the proposed framework combines local cost aggregation with global cost optimization in a complementary manner that increases efficiency and accuracy. A depth map is efficiently obtained in a coarse-to-fine manner by using an image pyramid. Moreover, confidence maps are computed to robustly fuse multi-view matching cues and to constrain the stereo matching at a finer scale. The proposed framework has been evaluated on challenging indoor and outdoor scenes and has achieved robust and efficient depth reconstruction.
Recent Experiments Conducted with the Wide-Field Imaging Interferometry Testbed (WIIT)
NASA Technical Reports Server (NTRS)
Leisawitz, David T.; Juanola-Parramon, Roser; Bolcar, Matthew; Iacchetta, Alexander S.; Maher, Stephen F.; Rinehart, Stephen A.
2016-01-01
The Wide-field Imaging Interferometry Testbed (WIIT) was developed at NASA's Goddard Space Flight Center to demonstrate and explore the practical limitations inherent in wide field-of-view double Fourier (spatio-spectral) interferometry. The testbed delivers high-quality interferometric data and is capable of observing spatially and spectrally complex hyperspectral test scenes. Although WIIT operates at visible wavelengths, by design the data are representative of those from a space-based far-infrared observatory. We used WIIT to observe a calibrated, independently characterized test scene of modest spatial and spectral complexity, and an astronomically realistic test scene of much greater spatial and spectral complexity. This paper describes the experimental setup, summarizes the performance of the testbed, and presents representative data.
Fly-through viewpoint video system for multi-view soccer movie using viewpoint interpolation
NASA Astrophysics Data System (ADS)
Inamoto, Naho; Saito, Hideo
2003-06-01
This paper presents a novel method for virtual view generation that allows viewers to fly through a real soccer scene. A soccer match is captured by multiple cameras at a stadium, and images from arbitrary viewpoints are synthesized by view interpolation of two real camera images near the given viewpoint. In the proposed method, the cameras do not need to be strongly calibrated; epipolar geometry between the cameras is sufficient for view interpolation. Therefore, the method can easily be applied to a dynamic event even in a large space, because the effort required for camera calibration is reduced. A soccer scene is classified into several regions, and virtual view images are generated based on the epipolar geometry in each region. Superimposing these images completes the virtual views for the whole soccer scene. An application for fly-through observation of a soccer match is introduced, along with the view-synthesis algorithm and experimental results.
On-line object feature extraction for multispectral scene representation
NASA Technical Reports Server (NTRS)
Ghassemian, Hassan; Landgrebe, David
1988-01-01
A new on-line unsupervised object-feature extraction method is presented that reduces the complexity and costs associated with the analysis of multispectral image data and with data transmission, storage, archival, and distribution. The ambiguity in the object detection process can be reduced if the spatial dependencies that exist among adjacent pixels are intelligently incorporated into the decision-making process. A unity relation that must exist among the pixels of an object is defined. The Automatic Multispectral Image Compaction Algorithm (AMICA) uses the within-object pixel-feature gradient vector as valuable contextual information to construct the object's features, which preserve the class separability information within the data. For on-line object extraction, the path hypothesis and the basic mathematical tools for its realization are introduced in terms of a specific similarity measure and adjacency relation. AMICA is applied to several sets of real image data, and the performance and reliability of the features are evaluated.
Cronly-Dillon, J; Persaud, K; Gregory, R P
1999-01-01
This study demonstrates the ability of blind (previously sighted) and blindfolded (sighted) subjects to reconstruct and identify a number of visual targets transformed into equivalent musical representations. Visual images are deconstructed through a process that selectively segregates different features of the image into separate packages. These are then encoded in sound and presented as a polyphonic musical melody resembling a Baroque fugue with many voices. This allows subjects to analyse the component voices selectively, either in combination or separately in sequence, so that they can patch together the different features of the object and mentally bind them into a percept of a single recognizable entity. The visual targets used in this study included a variety of geometrical figures, simple high-contrast line drawings of man-made objects, natural and urban scenes, etc., translated into sound and presented to the subject in polyphonic musical form. PMID:10643086
Radiologists remember mountains better than radiographs, or do they?
Evans, Karla K.; Marom, Edith M.; Godoy, Myrna C. B.; Palacio, Diana; Sagebiel, Tara; Cuellar, Sonia Betancourt; McEntee, Mark; Tian, Charles; Brennan, Patrick C.; Haygood, Tamara Miner
2015-01-01
Expertise with encoding material has been shown to aid long-term memory for that material. It is not clear how relevant this expertise is for image memorability (e.g., radiologists’ memory for radiographs), or how robust it is over time. In two studies, we tested scene memory using a standard long-term memory paradigm. One compared the performance of radiologists with that of naïve observers on two image sets, chest radiographs and everyday scenes; the other compared radiologists’ memory on immediate versus delayed recognition tests using musculoskeletal radiographs and forest scenes. Radiologists’ memory was better than that of novices for images of expertise but no different for everyday scenes. With the heterogeneity of image sets equated, radiologists’ expertise with radiographs afforded them better memory for the musculoskeletal radiographs than for forest scenes. Enhanced memory for images of expertise disappeared over time, resulting in chance-level performance for both image sets after weeks of delay. Expertise with the material is important for visual memorability, but not to the same extent as idiosyncratic detail and variability of the image set. The similar decline in memory over time for images of expertise and for everyday scenes further suggests that extended familiarity with an image is not a robust factor for visual memorability. PMID:26870748
Fuzzy Classification of High Resolution Remote Sensing Scenes Using Visual Attention Features.
Li, Linyi; Xu, Tingbao; Chen, Yun
2017-01-01
In recent years the spatial resolution of remote sensing images has improved greatly. However, a higher spatial resolution does not always lead to better automatic scene classification. Visual attention is an important characteristic of the human visual system that can effectively help to classify remote sensing scenes. In this study, a novel visual attention feature extraction algorithm was proposed, which extracted visual attention features through a multiscale process, and a fuzzy classification method using visual attention features (FC-VAF) was developed to perform high resolution remote sensing scene classification. FC-VAF was evaluated on scenes from widely used high resolution remote sensing images, including IKONOS, QuickBird, and ZY-3 images. According to the quantitative accuracy evaluation indices, FC-VAF achieved more accurate classification results than the other methods compared. We also discussed the role and impact of different decomposition levels and different wavelets on classification accuracy. FC-VAF improves the accuracy of high resolution scene classification and therefore advances research on digital image analysis and the applications of high resolution remote sensing images. PMID:28761440
Locality constrained joint dynamic sparse representation for local matching based face recognition.
Wang, Jianzhong; Yi, Yugen; Zhou, Wei; Shi, Yanjiao; Qi, Miao; Zhang, Ming; Zhang, Baoxue; Kong, Jun
2014-01-01
Recently, Sparse Representation-based Classification (SRC) has attracted a lot of attention for its applications to various tasks, especially in biometric techniques such as face recognition. However, factors such as lighting, expression, pose and disguise variations in face images degrade the performance of SRC and most other face recognition techniques. To overcome these limitations, we propose a robust face recognition method named Locality Constrained Joint Dynamic Sparse Representation-based Classification (LCJDSRC) in this paper. In our method, a face image is first partitioned into several smaller sub-images. Then, these sub-images are sparsely represented using the proposed locality constrained joint dynamic sparse representation algorithm. Finally, the representation results for all sub-images are aggregated to obtain the final recognition result. Unlike other algorithms, which process each sub-image of a face image independently, the proposed algorithm treats local matching-based face recognition as a multi-task learning problem. Thus, the latent relationships among the sub-images from the same face image are taken into account. Meanwhile, the locality information of the data is also considered in our algorithm. We evaluate our algorithm by comparing it with other state-of-the-art approaches. Extensive experiments on four benchmark face databases (ORL, Extended YaleB, AR and LFW) demonstrate the effectiveness of LCJDSRC.
NASA Astrophysics Data System (ADS)
Wang, Yao-yao; Zhang, Juan; Zhao, Xue-wei; Song, Li-pei; Zhang, Bo; Zhao, Xing
2018-03-01
To improve depth extraction accuracy, a method using the moving array lenslet technique (MALT) in the pickup stage is proposed, which can decrease the depth interval caused by pixelation. In this method, the lenslet array is moved along the horizontal and vertical directions simultaneously N times within one pitch to obtain N sets of elemental images. A computational integral imaging reconstruction method for MALT is used to obtain slice images of the 3D scene, and the sum modulus difference (SMD) blur metric is applied to these slice images to obtain the depth information of the 3D scene. Simulation and optical experiments are carried out to verify the feasibility of this method.
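The depth-from-focus step can be illustrated with a minimal sketch, assuming one common form of the SMD metric: the sum of absolute differences between horizontally and vertically adjacent pixels, where sharper (in-focus) slices score higher. The abstract does not give the authors' exact formula or say whether it is evaluated per image or per window, so `smd` and `estimate_depth` are hypothetical helpers, and slice images are plain lists of grayscale rows.

```python
def smd(img):
    """Sum modulus difference of a 2D grayscale image given as a list of rows.

    Adds |f(x, y) - f(x + 1, y)| and |f(x, y) - f(x, y + 1)| over the image;
    in-focus content has stronger local contrast and therefore a larger SMD.
    """
    rows, cols = len(img), len(img[0])
    total = 0.0
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols:
                total += abs(img[y][x] - img[y][x + 1])
            if y + 1 < rows:
                total += abs(img[y][x] - img[y + 1][x])
    return total


def estimate_depth(slices):
    """Return the depth whose reconstructed slice has the largest SMD.

    `slices` maps a depth value to the slice image reconstructed at that depth.
    """
    return max(slices, key=lambda depth: smd(slices[depth]))
```

For a depth map rather than a single depth, the same comparison would be made window by window across the stack of slice images.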
Scene-based nonuniformity correction with reduced ghosting using a gated LMS algorithm.
Hardie, Russell C; Baxley, Frank; Brys, Brandon; Hytla, Patrick
2009-08-17
In this paper, we present a scene-based nonuniformity correction (NUC) method using a modified adaptive least mean square (LMS) algorithm with a novel gating operation on the updates. The gating is designed to significantly reduce ghosting artifacts produced by many scene-based NUC algorithms by halting updates when temporal variation is lacking. We define the algorithm and present a number of experimental results to demonstrate the efficacy of the proposed method in comparison to several previously published methods, including other LMS and constant-statistics-based methods. The experimental results include simulated imagery and a real infrared image sequence. We show that the proposed method significantly reduces ghosting artifacts, but has a slightly longer convergence time. (c) 2009 Optical Society of America
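A minimal sketch of the gating idea, under stated assumptions: each pixel of a single detector row has a gain `w` and an offset `b`, the LMS target is the 3-pixel local mean of the corrected row (a common choice in LMS-based NUC, not necessarily the authors'), and the gate skips a pixel's update when its raw value changed by less than `gate_threshold` since the previous frame. The function name and its parameters are hypothetical, not taken from the paper.

```python
def gated_lms_nuc(frames, mu=1e-5, gate_threshold=1.0):
    """Estimate per-pixel gain/offset for a 1D detector row with gated LMS updates.

    frames: list of frames, each a list of raw pixel values of equal length.
    Returns (w, b) such that the corrected value is w[i] * y[i] + b[i].
    """
    n = len(frames[0])
    w = [1.0] * n  # gains start at unity
    b = [0.0] * n  # offsets start at zero
    prev = None
    for y in frames:
        corrected = [w[i] * y[i] + b[i] for i in range(n)]
        for i in range(n):
            # Desired value: local spatial mean of the corrected row.
            lo, hi = max(0, i - 1), min(n, i + 2)
            target = sum(corrected[lo:hi]) / (hi - lo)
            err = corrected[i] - target
            # Gate: update only if this pixel saw enough temporal change.
            # The first frame has no predecessor, so it never updates.
            if prev is not None and abs(y[i] - prev[i]) >= gate_threshold:
                w[i] -= mu * err * y[i]  # LMS step on the gain
                b[i] -= mu * err         # LMS step on the offset
        prev = y
    return w, b
```

With a static scene the gate stays closed, so the parameters never drift toward the scene content; that is the mechanism by which gating limits ghosting, at the cost of fewer updates and slower convergence.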
[An image of Saint Ottilia with reading stones].
Daxecker, F; Broucek, A
1995-01-01
Reading stones to facilitate reading in cases of presbyopia are mentioned in the literature, for example in the works of the Middle High German poet Albrecht and of Konrad of Würzburg. Most representations of the abbess, Saint Ottilia, show her holding a book with a pair of eyes in her hands. A gothic altarpiece (1485-1490), kept in the museum of the Premonstratensian Canons of Wilten in Innsbruck, Tyrol, shows a triune representation of St. Anne, the mother of the Virgin, with Mary and Jesus, and St. Ursula with her companions. St. Ottilia is depicted on the edge of the painting. Two lenses, one on either side of the open book in her hand, magnify the letters underneath. As the two lenses are not held together by bows or similar devices, they are probably a rare representation of reading stones. The altar showing scenes from the lives of St. Mary and St. Ursula is the work of Ludwig Konraiter. A panel on the same altar, depicting the death of the Virgin, shows an apostle with rivet spectacles.
A maximally stable extremal region based scene text localization method
NASA Astrophysics Data System (ADS)
Xiao, Chengqiu; Ji, Lixin; Gao, Chao; Li, Shaomei
2015-07-01
Text localization in natural scene images is an important prerequisite for many content-based image analysis tasks. This paper proposes a novel text localization algorithm. First, a fast pruning algorithm is designed to extract Maximally Stable Extremal Regions (MSER) as basic character candidates. Second, these candidates are filtered using the properties of a fitted ellipse and the distribution properties of characters to exclude most non-characters. Finally, a new extremal regions projection merging algorithm is designed to group character candidates into words. Experimental results show that the proposed method has an advantage in speed and achieves higher precision and recall rates than the latest published algorithms.
Knowledge-based machine vision systems for space station automation
NASA Technical Reports Server (NTRS)
Ranganath, Heggere S.; Chipman, Laure J.
1989-01-01
Computer vision techniques which have the potential for use on the space station and related applications are assessed. A knowledge-based vision system (expert vision system) and the development of a demonstration system for it are described. This system implements some of the capabilities that would be necessary in a machine vision system for the robot arm of the laboratory module in the space station. A Perceptics 9200e image processor, on a host VAXstation, was used to develop the demonstration system. In order to use realistic test images, photographs of actual space shuttle simulator panels were used. The system's capabilities of scene identification and scene matching are discussed.
Research on 3D virtual campus scene modeling based on 3ds Max and VRML
NASA Astrophysics Data System (ADS)
Kang, Chuanli; Zhou, Yanliu; Liang, Xianyue
2015-12-01
With the rapid development of modern technology, digital information management and virtual reality simulation have become research hotspots. A 3D virtual campus model can not only represent real-world objects naturally, realistically and vividly, but can also extend the campus beyond the time and space dimensions of reality, combining the school environment with information. This paper mainly uses 3ds Max to create three-dimensional models of campus buildings, special land areas, etc. Dynamic interactive functions are then realized by programming the object models created in 3ds Max with VRML. This research focuses on virtual campus scene modeling technology, VRML scene design, and optimization strategies for a variety of real-time processing techniques in the scene design process. The approach maintains texture map image quality while improving the running speed of image texture mapping. According to the features and architecture of Guilin University of Technology, 3ds Max, AutoCAD and VRML were used to model the different objects of the virtual campus. Finally, the resulting virtual campus scene is summarized.
Designing Tracking Software for Image-Guided Surgery Applications: IGSTK Experience
Enquobahrie, Andinet; Gobbi, David; Turek, Matt; Cheng, Patrick; Yaniv, Ziv; Lindseth, Frank; Cleary, Kevin
2009-01-01
Objective Many image-guided surgery applications require tracking devices as part of their core functionality. The Image-Guided Surgery Toolkit (IGSTK) was designed and developed to interface tracking devices with software applications incorporating medical images. Methods IGSTK was designed as an open source C++ library that provides the basic components needed for fast prototyping and development of image-guided surgery applications. This library follows a component-based architecture with several components designed for specific sets of image-guided surgery functions. At the core of the toolkit is the tracker component that handles communication between a control computer and navigation device to gather pose measurements of surgical instruments present in the surgical scene. The representations of the tracked instruments are superimposed on anatomical images to provide visual feedback to the clinician during surgical procedures. Results The initial version of the IGSTK toolkit has been released in the public domain and several trackers are supported. The toolkit and related information are available at www.igstk.org. Conclusion With the increased popularity of minimally invasive procedures in health care, several tracking devices have been developed for medical applications. Designing and implementing high-quality and safe software to handle these different types of trackers in a common framework is a challenging task. It requires establishing key software design principles that emphasize abstraction, extensibility, reusability, fault-tolerance, and portability. IGSTK is an open source library that satisfies these needs for the image-guided surgery community. PMID:20037671
Sixteen-month-olds can use language to update their expectations about the visual world.
Ganea, Patricia A; Fitch, Allison; Harris, Paul L; Kaldy, Zsuzsa
2016-11-01
The capacity to use language to form new representations and to revise existing knowledge is a crucial aspect of human cognition. Here we examined whether infants can use language to adjust their representation of a recently encoded scene. Using an eye-tracking paradigm, we asked whether 16-month-old infants (N=26; mean age=16;0 [months;days], range=14;15-17;15) can use language about an occluded event to inform their expectation about what the world will look like when the occluder is removed. We compared looking time to outcome scenes that matched the language input with looking time to those that did not. Infants looked significantly longer at the event outcome when the outcome did not match the language input, suggesting that they generated an expectation of the outcome based on that input alone. This effect was unrelated to infants' vocabulary size. Thus, using language to adjust expectations about the visual world is present at an early developmental stage even when language skills are rudimentary. Copyright © 2016 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Wang, Z.; Li, T.; Pan, L.; Kang, Z.
2017-09-01
With increasing attention to indoor environments and the development of low-cost RGB-D sensors, indoor RGB-D images are easily acquired. However, semantic scene segmentation is still an open problem, which restricts indoor applications. Depth information can help to distinguish regions that are difficult to segment from RGB images alone because of similar color or texture in indoor scenes. How to utilize depth information is the key problem in semantic segmentation of RGB-D images. In this paper, we propose an encoder-decoder fully convolutional network for RGB-D image classification. We use the Multiple Kernel Maximum Mean Discrepancy (MK-MMD) as a distance measure to find common and specific features of the RGB and depth images in the network, automatically enhancing classification performance. To explore better ways of applying MMD, we designed two strategies: the first calculates MMD for each feature map, and the other calculates MMD for the features of the whole batch. Based on the classification result, we use fully connected CRFs for semantic segmentation. The experimental results show that our method achieves good performance on indoor RGB-D image semantic segmentation.
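The MMD statistic itself can be sketched in a few lines. This is a minimal illustration, assuming a biased empirical estimate of squared MMD and an equal-weight average of Gaussian kernels with hand-picked bandwidths (`sigmas`); MK-MMD as usually defined learns or tunes the kernel weights, and the abstract does not say how the authors weight kernels or where in the network the term is applied. Feature vectors here are plain lists, standing in for per-feature-map or per-batch features; `mk_mmd2` is a hypothetical name.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel exp(-||a - b||^2 / (2 sigma^2))."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def mk_mmd2(xs, ys, sigmas=(0.5, 1.0, 2.0)):
    """Biased estimate of squared MMD between sample sets xs and ys.

    The kernel is an equal-weight average of Gaussian kernels with the
    given bandwidths (an assumption for this sketch).
    """
    def k(a, b):
        return sum(gaussian_kernel(a, b, s) for s in sigmas) / len(sigmas)

    def mean_k(p, q):
        return sum(k(a, b) for a in p for b in q) / (len(p) * len(q))

    # MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)
```

Identical sample sets give a value of zero, and the value grows as the two feature distributions move apart, which is what makes it usable as a training-time distance between RGB and depth features.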
Mirage: a visible signature evaluation tool
NASA Astrophysics Data System (ADS)
Culpepper, Joanne B.; Meehan, Alaster J.; Shao, Q. T.; Richards, Noel
2017-10-01
This paper presents the Mirage visible signature evaluation tool, designed to provide a visible signature evaluation capability that appropriately reflects the effect of scene content on the detectability of targets, so that visible signatures can be assessed in the context of the environment. Mirage is based on a parametric evaluation of input images: it assesses the values of a range of image metrics and combines them using the boosted decision tree machine learning method to produce target detectability estimates. It has been developed using experimental data from photosimulation experiments, where human observers search for vehicle targets in a variety of digital images. The images used for tool development are synthetic (computer generated) images, showing vehicles in many different scenes and exhibiting a wide variation in scene content. A preliminary validation has been performed using k-fold cross validation, where 90% of the image data set was used for training and 10% of the image data set was used for testing. The results of the k-fold validation from 200 independent tests show a correlation between Mirage predictions of detection probability and observed probability of detection of r(262) = 0.63, p < 0.0001 (Pearson correlation), and a mean absolute error (MAE) of 0.21.
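The validation quantities named above can be made concrete with a small sketch. It shows a 10-fold split (each fold holds out roughly 10% of the items for testing), the Pearson correlation between predicted and observed detection probabilities, and the mean absolute error. The helper names and the shuffling seed are assumptions for illustration; the abstract does not describe how folds were drawn. In the usual `r(df)` reporting convention, df = n - 2 for n prediction/observation pairs.

```python
import math
import random


def kfold_splits(n, k=10, seed=0):
    """Yield (train_indices, test_indices) for k shuffled folds over n items."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mae(predicted, observed):
    """Mean absolute error between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)
```

In use, a model would be trained on each `train` index set, its predicted detection probabilities collected for the matching `test` items, and `pearson_r` and `mae` computed over the pooled predictions against the observed probabilities.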
NASA Astrophysics Data System (ADS)
Mayr, Andreas; Rutzinger, Martin; Bremer, Magnus; Geitner, Clemens
2016-06-01
In the Alps, as in other mountain regions, steep grassland is frequently affected by shallow erosion. Small landslides or snow movements often displace the vegetation together with soil and/or unconsolidated material. This results in bare earth surface patches within the grass-covered slope. Close-range and remote sensing techniques are promising for both mapping and monitoring these eroded areas. This is essential for a better understanding of geomorphological processes, to assess past and recent developments, and to plan mitigation measures. Recent developments in image matching techniques make it feasible to produce high resolution orthophotos and digital elevation models from terrestrial oblique images. In this paper we propose to delineate the boundary of eroded areas for selected scenes of a study area, using close-range photogrammetric data. Striving for an efficient, objective and reproducible workflow for this task, we developed an approach for automated classification of the scenes into the classes grass and eroded. We propose an object-based image analysis (OBIA) workflow which consists of image segmentation and automated threshold selection for classification using the Excess Green Vegetation Index (ExG). The automated workflow is tested with ten different scenes. Compared to a manual classification, grass and eroded areas are classified with an overall accuracy between 90.7% and 95.5%, depending on the scene. The methods proved to be insensitive to differences in illumination of the scenes and greenness of the grass. The proposed workflow reduces user interaction and is transferable to other study areas. We conclude that close-range photogrammetry is a valuable low-cost tool for mapping this type of eroded area in the field with a high level of detail and quality. In the future, the output will be used as ground truth for area-wide mapping of eroded areas in coarser resolution aerial orthophotos acquired at the same time.
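The classification step can be sketched as follows, with two loudly stated assumptions. First, ExG is computed in its common form, 2g - r - b on chromatic coordinates (each channel divided by R + G + B). Second, the abstract does not name the automated threshold method, so Otsu's method (maximizing between-class variance) stands in for it here. In an OBIA workflow the inputs would be per-segment mean colours rather than single pixels; `classify_segments` and the other names are hypothetical.

```python
def excess_green(r, g, b):
    """ExG = 2g - r - b on chromatic coordinates; 0 for a black input."""
    total = r + g + b
    if total == 0:
        return 0.0
    rn, gn, bn = r / total, g / total, b / total
    return 2 * gn - rn - bn


def otsu_threshold(values):
    """Otsu's threshold over raw values: maximize w0 * w1 * (m0 - m1)^2."""
    vals = sorted(values)
    n = len(vals)
    total_sum = sum(vals)
    best_t, best_var = vals[0], -1.0
    left_sum = 0.0
    for i in range(n - 1):
        left_sum += vals[i]
        if vals[i] == vals[i + 1]:
            continue  # no valid split between equal values
        w0 = (i + 1) / n
        w1 = 1.0 - w0
        m0 = left_sum / (i + 1)
        m1 = (total_sum - left_sum) / (n - i - 1)
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var = between
            best_t = (vals[i] + vals[i + 1]) / 2  # midpoint of the split
    return best_t


def classify_segments(mean_rgb):
    """Label each segment (mean R, G, B) as 'grass' or 'eroded' via ExG + Otsu."""
    exg = [excess_green(*rgb) for rgb in mean_rgb]
    t = otsu_threshold(exg)
    return ["grass" if v > t else "eroded" for v in exg]
```

Because the threshold is derived from each scene's own ExG distribution rather than fixed in advance, a scheme like this adapts to overall illumination and greenness, which is consistent with the insensitivity reported above.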
NASA Astrophysics Data System (ADS)
Kutulakos, Kyros N.; O'Toole, Matthew
2015-03-01
Conventional cameras record all light falling on their sensor regardless of the path that light followed to get there. In this paper we give an overview of a new family of computational cameras that offers many more degrees of freedom. These cameras record just a fraction of the light coming from a controllable source, based on the actual 3D light path followed. Photos and live video captured this way offer an unconventional view of everyday scenes in which the effects of scattering, refraction and other phenomena can be selectively blocked or enhanced, visual structures that are too subtle to notice with the naked eye can become apparent, and object appearance can depend on depth. We give an overview of the basic theory behind these cameras and their DMD-based implementation, and discuss three applications: (1) live indirect-only imaging of complex everyday scenes, (2) reconstructing the 3D shape of scenes whose geometry or material properties make them hard or impossible to scan with conventional methods, and (3) acquiring time-of-flight images that are free of multi-path interference.
The auditory scene: an fMRI study on melody and accompaniment in professional pianists.
Spada, Danilo; Verga, Laura; Iadanza, Antonella; Tettamanti, Marco; Perani, Daniela
2014-11-15
The auditory scene is a mental representation of individual sounds extracted from the summed sound waveform reaching the ears of the listeners. Musical contexts represent particularly complex cases of auditory scenes. In such a scenario, melody may be seen as the main object moving on a background represented by the accompaniment. Both melody and accompaniment vary in time according to harmonic rules, forming a typical texture with melody in the most prominent, salient voice. In the present sparse acquisition functional magnetic resonance imaging study, we investigated the interplay between melody and accompaniment in trained pianists, by observing the activation responses elicited by processing: (1) melody placed in the upper and lower texture voices, leading to, respectively, a higher and lower auditory salience; (2) harmonic violations occurring in either the melody, the accompaniment, or both. The results indicated that the neural activation elicited by the processing of polyphonic compositions in expert musicians depends upon the upper versus lower position of the melodic line in the texture, and showed an overall greater activation for the harmonic processing of melody over accompaniment. Both of these predominant effects were characterized by the involvement of the posterior cingulate cortex and precuneus, among other associative brain regions. We discuss the prominent role of the posterior medial cortex in the processing of melodic and harmonic information in the auditory stream, and propose to frame this processing in relation to the cognitive construction of complex multimodal sensory imagery scenes. Copyright © 2014 Elsevier Inc. All rights reserved.
Medical Image Fusion Based on Feature Extraction and Sparse Representation
Wei, Gao; Zongxi, Song
2017-01-01
As a novel multiscale geometric analysis tool, sparse representation has shown many advantages over conventional image representation methods. However, standard sparse representation takes neither intrinsic structure nor time complexity into consideration. In this paper, a new fusion mechanism for multimodal medical images based on sparse representation and a decision map is proposed to address both problems simultaneously. Three decision maps are designed, a structure information map (SM), an energy information map (EM), and a structure and energy map (SEM), so that the results preserve more energy and edge information. SM contains the local structure feature captured by the Laplacian of Gaussian (LoG), and EM contains the energy and energy distribution feature detected by the mean square deviation. The decision map is added to the normal sparse representation based method to improve the speed of the algorithm. The proposed approach also improves the quality of the fused results by enhancing the contrast and preserving more structure and energy information from the source images. Experimental results on 36 groups of CT/MR, MR-T1/MR-T2, and CT/PET images demonstrate that the method based on SR and SEM outperforms five state-of-the-art methods. PMID:28321246
NASA Astrophysics Data System (ADS)
Coubard, F.; Brédif, M.; Paparoditis, N.; Briottet, X.
2011-04-01
Terrestrial geolocalized images are nowadays widely used on the Internet, mainly in urban areas, through immersion services such as Google Street View. In the long run, we seek to enhance the visualization of these images; for that purpose, radiometric corrections must be performed to free them from the illumination conditions at the time of acquisition. Given the simultaneously acquired 3D geometric model of the scene (from LIDAR or vision techniques), we face an inverse problem where the illumination and the geometry of the scene are known and the reflectance of the scene is to be estimated. Our main contribution is the introduction of a symbolic ray-tracing rendering that generates parametric images for quick evaluation and comparison with the acquired images. The proposed approach is then based on an iterative estimation of the reflectance parameters of the materials, using a single rendering pre-processing step. We validate the method on synthetic data with linear BRDF models and discuss the limitations of the proposed approach with more general non-linear BRDF models.
Piao, Xinglin; Zhang, Yong; Li, Tingshu; Hu, Yongli; Liu, Hao; Zhang, Ke; Ge, Yun
2016-01-01
Received Signal Strength (RSS) fingerprint-based indoor localization is an important research topic in wireless network communications. Most current RSS fingerprint-based indoor localization methods do not explore and utilize the spatial or temporal correlation existing in fingerprint data and measurement data, although such correlation can help improve localization accuracy. In this paper, we propose an RSS fingerprint-based indoor localization method that integrates spatio-temporal constraints into the sparse representation model. The proposed model utilizes the inherent spatial correlation of fingerprint data in fingerprint matching and uses the temporal continuity of the RSS measurement data in the localization phase. Experiments on simulated data and localization tests in real scenes show that the proposed method effectively improves localization accuracy and stability compared with state-of-the-art indoor localization methods. PMID:27827882
Application of composite small calibration objects in traffic accident scene photogrammetry.
Chen, Qiang; Xu, Hongguo; Tan, Lidong
2015-01-01
In order to address the difficulty of arranging large calibration objects and the low measurement accuracy of small calibration objects in traffic accident scene photogrammetry, a photogrammetric method based on a composite of small calibration objects is proposed. Several small calibration objects are placed around the traffic accident scene, and the coordinate system of the composite calibration object is given based on one of them. By maintaining the relative position and coplanar relationship of the small calibration objects, the local coordinate system of each small calibration object is transformed into the coordinate system of the composite calibration object. The two-dimensional direct linear transformation method is improved based on minimizing the reprojection error of the calibration points of all objects. A rectified image is obtained using the nonlinear optimization method. The increased accuracy of traffic accident scene photogrammetry using a composite small calibration object is demonstrated through the analysis of field experiments and case studies.
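The two-dimensional direct linear transformation at the core of this method maps image points on the road plane to plane coordinates through a 3x3 homography H. A minimal sketch, assuming the textbook linear form with h33 fixed to 1 and exactly four point pairs, is below. It does not include the authors' improvement (minimizing the reprojection error over the calibration points of all objects, followed by nonlinear optimization); `dlt_homography`, `apply_h` and `solve_linear` are hypothetical helpers.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dlt_homography(src, dst):
    """Estimate H (with h33 = 1) from four correspondences src[i] -> dst[i].

    From u = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), and likewise for v,
    each pair contributes two linear equations in h1..h8.
    """
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_h(H, x, y):
    """Map (x, y) through H, including the perspective division."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return u, v
```

With more than four points (for example, the corners of several small calibration objects), the same equations would be stacked and solved in a least-squares sense, which is where combining objects into one coordinate system pays off.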
Knowledge-Based Object Detection in Laser Scanning Point Clouds
NASA Astrophysics Data System (ADS)
Boochs, F.; Karmacharya, A.; Marbs, A.
2012-07-01
Object identification and object processing in 3D point clouds have always posed challenges in terms of effectiveness and efficiency. In practice, this process is highly dependent on human interpretation of the scene represented by the point cloud data, as well as the set of modeling tools available for use. Such modeling algorithms are data-driven and concentrate on specific features of the objects, being accessible to numerical models. We present an approach that brings the human expert knowledge about the scene, the objects inside, and their representation by the data and the behavior of algorithms to the machine. This "understanding" enables the machine to assist human interpretation of the scene inside the point cloud. Furthermore, it allows the machine to understand possibilities and limitations of algorithms and to take this into account within the processing chain. This not only assists the researchers in defining optimal processing steps, but also provides suggestions when certain changes or new details emerge from the point cloud. Our approach benefits from the advancement in knowledge technologies within the Semantic Web framework. This advancement has provided a strong base for applications based on knowledge management. In the article we will present and describe the knowledge technologies used for our approach such as Web Ontology Language (OWL), used for formulating the knowledge base and the Semantic Web Rule Language (SWRL) with 3D processing and topologic built-ins, aiming to combine geometrical analysis of 3D point clouds, and specialists' knowledge of the scene and algorithmic processing.
Three-dimensional scene reconstruction from a two-dimensional image
NASA Astrophysics Data System (ADS)
Parkins, Franz; Jacobs, Eddie
2017-05-01
We propose and simulate a method of reconstructing a three-dimensional scene from a two-dimensional image for developing and augmenting world models for autonomous navigation. This is an extension of the Perspective-n-Point (PnP) method, which uses a sample of 3D scene to 2D image point pairings and Random Sample Consensus (RANSAC) to infer the pose of the object and produce a 3D mesh of the original scene. Using object recognition and segmentation, we simulate the implementation on a scene of 3D objects with an eye to implementation on embeddable hardware. The final solution will be deployed on the NVIDIA Tegra platform.
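RANSAC's hypothesize-and-verify loop can be shown on a simpler model than PnP. In the method above, each hypothesis would be a camera pose computed from a minimal set of 2D-3D correspondences. This sketch swaps in a 2D line (y = m x + c) fitted from point pairs so that the loop stays short: sample a minimal set, fit a model, count inliers within a tolerance, and keep the model with the most support. The function name, iteration count, tolerance and seed are assumptions.

```python
import random


def ransac_line(points, iterations=200, tol=0.1, seed=0):
    """Fit y = m*x + c robustly; return ((m, c), inliers) for the best hypothesis."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # Minimal sample: two points define a line hypothesis.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair cannot be expressed as y = m*x + c
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        # Verify: count points whose vertical residual is within tolerance.
        inliers = [(x, y) for x, y in points if abs(y - (m * x + c)) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (m, c), inliers
    return best_model, best_inliers
```

A typical refinement, omitted here, is to re-fit the model by least squares on the final inlier set; a PnP pipeline would do the analogous refinement on the pose.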
Guest Editor's introduction: Special issue on distributed virtual environments
NASA Astrophysics Data System (ADS)
Lea, Rodger
1998-09-01
Distributed virtual environments (DVEs) combine technology from 3D graphics, virtual reality and distributed systems to provide an interactive 3D scene that supports multiple participants. Each participant has a representation in the scene, often known as an avatar, and is free to navigate through the scene and interact with both the scene and other viewers of the scene. Changes to the scene, for example, position changes of one avatar as the associated viewer navigates through the scene, or changes to objects in the scene via manipulation, are propagated in real time to all viewers. This ensures that all viewers of a shared scene 'see' the same representation of it, allowing sensible reasoning about the scene. Early work on such environments was restricted to their use in simulation, in particular in military simulation. However, over recent years a number of interesting and potentially far-reaching attempts have been made to exploit the technology for a range of other uses, including: Social spaces. Such spaces can be seen as logical extensions of the familiar text chat space. In 3D social spaces avatars, representing participants, can meet in shared 3D scenes and in addition to text chat can use visual cues and even in some cases spatial audio. Collaborative working. A number of recent projects have attempted to explore the use of DVEs to facilitate computer-supported collaborative working (CSCW), where the 3D space provides a context and work space for collaboration. Gaming. The shared 3D space is already familiar, albeit in a constrained manner, to the gaming community. DVEs are a logical superset of existing 3D games and can provide a rich framework for advanced gaming applications. E-commerce. The ability to navigate through a virtual shopping mall and to look at, and even interact with, 3D representations of articles has appealed to the e-commerce community as it searches for the best method of presenting merchandise to electronic consumers.
The technology needed to support these systems crosses a number of disciplines in computer science. These include, but are certainly not limited to, real-time graphics for the accurate and realistic representation of scenes, group communications for the efficient update of shared consistent scene data, user interface modelling to exploit the use of the 3D representation, and multimedia systems technology for the delivery of streamed graphics and audio-visual data into the shared scene. It is this intersection of technologies and the overriding need to provide visual realism that places such high demands on the underlying distributed systems infrastructure and makes DVEs such fertile ground for distributed systems research. Two examples serve to show how DVE developers have exploited the unique aspects of their domain. Communications. The usual tension between latency and throughput is particularly noticeable within DVEs. Ensuring the timely update of multiple viewers of a particular scene requires that such updates be propagated quickly. However, the sheer volume of changes to any one scene calls for techniques that minimize the number of distinct updates that are sent to the network. Several techniques have been used to address this tension; these include the use of multicast communications, and in particular multicast in wide-area networks to reduce actual message traffic. Multicast has been combined with general group communications to partition updates to related objects or users of a scene. A less traditional approach has been the use of dead reckoning, whereby a client application that visualizes the scene calculates position updates by extrapolating movement based on previous information. This allows the system to reduce the number of communications needed to update objects that move in a stable manner within the scene.
Scaling. DVEs, especially those used for social spaces, are required to support large numbers of simultaneous users in potentially large shared scenes. The desire for scalability has driven different architectural designs, for example, the use of fully distributed architectures, which scale well but often suffer performance costs, versus centralized and hierarchical architectures, in which the inverse is true. However, DVEs have also exploited the spatial nature of their domain to address scalability and have pioneered techniques that exploit the semantics of the shared space to reduce data updates and so allow greater scalability. Several of the systems reported in this special issue apply a notion of area of interest to partition the scene and so reduce the number of participants in any data update. The specification of area of interest differs between systems. One approach has been to exploit a geographical notion, i.e. a regular portion of a scene, or a semantic unit, such as a room or building. Another approach has been to define the area of interest as a spatial area associated with an avatar in the scene. The five papers in this special issue have been chosen to highlight the distributed systems aspects of the DVE domain. The first paper, on the DIVE system, described by Emmanuel Frécon and Mårten Stenius, explores the use of multicast and group communication in a fully peer-to-peer architecture. The developers of DIVE have focused on its use as the basis for collaborative work environments and have explored the issues associated with maintaining and updating large complicated scenes. The second paper, by Hiroaki Harada et al., describes the AGORA system, a DVE concentrating on social spaces and employing a novel communication technique that incorporates position update and vector information to support dead reckoning. The paper by Simon Powers et al. explores the application of DVEs to the gaming domain.
They propose a novel architecture that separates three layers: higher-level game semantics (the conceptual model) and lower-level scene attributes (the dynamic model), both running on servers, and the actual visual representation (the visual model), running on the client. They claim a number of benefits from this approach, including better predictability and consistency. Wolfgang Broll discusses the SmallView system, which is an attempt to provide a toolkit for DVEs. One of the key features of SmallView is a sophisticated application-level protocol, DWTP, that provides support for a variety of communication models. The final paper, by Chris Greenhalgh, discusses the MASSIVE system, which has been used to explore the notion of awareness in the 3D space via the concept of 'auras'. These auras define an area of interest for users and support a mapping between what a user is aware of and the data update rate the communications infrastructure can support. We hope that this selection of papers will provide a clear introduction to the distributed systems issues faced by the DVE community and the approaches it has taken to solving them. Finally, we wish to thank Hubert Le Van Gong for his tireless efforts in pulling together all these papers, and both the referees and the authors for the time and effort they put into ensuring that their contributions teased out the interesting distributed systems issues for this special issue.
† E-mail address: rodger@arch.sel.sony.com
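The dead-reckoning idea described in the Communications discussion above can be illustrated with a minimal first-order sketch. It assumes each update carries a timestamp, a 2D position and a velocity, and that the sender applies a drift threshold to decide when a fresh update is worth sending. The threshold rule and the names `EntityState`, `extrapolate` and `needs_update` are illustrative assumptions, not taken from DIVE, AGORA or any other system in this issue.

```python
import math
from dataclasses import dataclass


@dataclass
class EntityState:
    """Last state broadcast for an entity: time, 2D position and velocity."""
    t: float
    x: float
    y: float
    vx: float
    vy: float


def extrapolate(state: EntityState, now: float) -> tuple:
    """First-order dead reckoning: assume constant velocity since the last update."""
    dt = now - state.t
    return (state.x + state.vx * dt, state.y + state.vy * dt)


def needs_update(last_sent: EntityState, true_x: float, true_y: float,
                 now: float, threshold: float) -> bool:
    """Assumed sender-side rule: broadcast a new state only when the position
    that receivers extrapolate has drifted more than `threshold` from the truth."""
    ex, ey = extrapolate(last_sent, now)
    return math.hypot(true_x - ex, true_y - ey) > threshold


# An avatar last reported at the origin, moving at 1 unit/s along x.
last = EntityState(t=0.0, x=0.0, y=0.0, vx=1.0, vy=0.0)
print(extrapolate(last, 2.0))                  # (2.0, 0.0)
print(needs_update(last, 2.0, 0.1, 2.0, 0.5))  # False: still close to the prediction
print(needs_update(last, 1.0, 1.5, 2.0, 0.5))  # True: the avatar turned, so send an update
```

Receivers call `extrapolate` every frame, while the sender transmits only when `needs_update` is true; this is how extrapolation trades a small, bounded positional error for fewer messages on the network.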
A Photo Album of Earth Scheduling Landsat 7 Mission Daily Activities
NASA Technical Reports Server (NTRS)
Potter, William; Gasch, John; Bauer, Cynthia
1998-01-01
Landsat 7 is a member of a new generation of Earth observation satellites. Landsat 7 will carry on the mission of the aging Landsat 5 spacecraft by acquiring high-resolution, multi-spectral images of the Earth's surface for strategic, environmental, commercial, agricultural and civil analysis and research. One of the primary mission goals of Landsat 7 is to accumulate and seasonally refresh an archive of global images with full coverage of Earth's landmass, except for the central portion of Antarctica. This archive will enable further seasonal, annual and long-range trend analysis in research areas as diverse as crop yields, deforestation, population growth and pollution control, to name just a few. A secondary goal of Landsat 7 is to fulfill imaging requests from our international partners in the mission. Landsat 7 will transmit raw image data from the spacecraft to 25 ground stations in 20 subscribing countries. Whereas earlier Landsat missions were scheduled manually (as are the majority of current low-orbit satellite missions), manually planning and scheduling Landsat 7 mission activities would be overwhelmingly complex given the large volume of image requests, the limited resources available, spacecraft instrument limitations and the limited ground image processing capacity, not to mention the need to avoid foul weather systems. The Landsat 7 Mission Operation Center (MOC) includes an image scheduler subsystem designed to automate the majority of mission planning and scheduling, including selecting the images to be acquired, managing the recording and playback of the images by the spacecraft, scheduling ground station contacts for downlink of images, and generating the spacecraft commands that control the imager, recorder, transmitters and antennas. The image scheduler subsystem autonomously generates 90% of the spacecraft commanding with minimal manual intervention.
The image scheduler produces a conflict-free schedule for acquiring images of the "best" 250 scenes daily for refreshing the global archive. It then equitably distributes the remaining resources for acquiring up to 430 scenes to satisfy requests by international subscribers. The image scheduler selects candidate scenes based on the priority and age of the requests and on the predicted cloud cover and sun angle at each scene. It also selects scenes so as to avoid instrument constraint violations, and it maximizes the efficiency of resource usage by encouraging the acquisition of scenes in clusters. Of particular interest to the mission planners, it produces the resulting schedule in a reasonable time, typically within 15 minutes.
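The selection criteria summarized above (request priority and age, predicted cloud cover and sun angle, an archive quota filled before partner requests) can be sketched as a greedy ranking. This is a minimal illustration under stated assumptions, not the MOC scheduler: the scoring weights, the 15° minimum sun elevation and all names are hypothetical, and instrument constraints, recorder and downlink scheduling, conflict resolution and the clustering preference are not modeled.

```python
from dataclasses import dataclass

NEG_INF = float("-inf")


@dataclass
class Candidate:
    scene_id: str
    priority: float       # request priority, assumed to lie in 0..1
    age_days: int         # age of the oldest outstanding request for the scene
    cloud_cover: float    # predicted cloud fraction, 0..1
    sun_elevation: float  # predicted sun elevation in degrees


def score(c: Candidate, min_sun: float = 15.0) -> float:
    """Hypothetical figure of merit; scenes with too low a sun are infeasible."""
    if c.sun_elevation < min_sun:
        return NEG_INF
    return c.priority + 0.01 * c.age_days - c.cloud_cover


def select(candidates, capacity):
    """Greedily keep the `capacity` best-scoring feasible scenes."""
    feasible = [c for c in candidates if score(c) != NEG_INF]
    feasible.sort(key=score, reverse=True)
    return [c.scene_id for c in feasible[:capacity]]


def daily_schedule(archive, requests, archive_quota=250, request_quota=430):
    """Fill the global-archive quota first, then partner requests for other scenes."""
    chosen = select(archive, archive_quota)
    taken = set(chosen)
    extra = select([c for c in requests if c.scene_id not in taken], request_quota)
    return chosen, extra


scenes = [
    Candidate("A", 0.9, 10, 0.2, 40.0),
    Candidate("B", 0.5, 0, 0.1, 40.0),
    Candidate("C", 1.0, 0, 0.0, 5.0),   # sun too low: never selected
]
print(select(scenes, 2))  # ['A', 'B']
```

A greedy ranking is used here only because it is easy to read; a real scheduler that must respect recorder capacity and instrument duty cycles would need a constraint-aware search.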
Effects of Scene Modulation Image Blur and Noise Upon Human Target Acquisition Performance.
1997-06-01
Report AFRL-HE-WP-TR-1998-0012, United States Air Force Research Laboratory: Effects of Scene Modulation Image Blur and Noise Upon Human Target... Interim report covering July 1996 to August 1996. ...dilemma in image transmission and display is that we must compromise between the conflicting constraints of dynamic range and noise. Three target...
Object-based media and stream-based computing
NASA Astrophysics Data System (ADS)
Bove, V. Michael, Jr.
1998-03-01
Object-based media refers to the representation of audiovisual information as a collection of objects (the result of scene-analysis algorithms) and a script describing how they are to be rendered for display. Such multimedia presentations can adapt to viewing circumstances as well as to viewer preferences and behavior, and can provide a richer link between content creator and consumer. With faster networks and processors, such ideas become applicable to live interpersonal communications as well, creating a more natural and productive alternative to traditional videoconferencing. This paper outlines examples of object-based media algorithms and applications developed by my group and presents new hardware architectures and software methods that we have developed to meet the computational requirements of object-based and other advanced media representations. In particular, we describe stream-based processing, which enables automatic run-time parallelization of multidimensional signal processing tasks even on heterogeneous computational resources.
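The abstract does not describe how stream-based processing partitions work, so the following is only a generic sketch of the underlying idea of data-parallel decomposition: a multidimensional signal (here a small 2D image) is cut into independent row strips that a pool of workers filters concurrently, and the results are reassembled in order. It uses Python's standard `concurrent.futures`; the filter, strip size and function names are illustrative assumptions, and run-time scheduling across heterogeneous processors is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_row(row):
    """3-tap horizontal box filter with edge replication (touches one row only)."""
    n = len(row)
    return [(row[max(i - 1, 0)] + row[i] + row[min(i + 1, n - 1)]) / 3 for i in range(n)]


def process_strip(strip):
    """Filter every row in one strip; strips are independent of each other."""
    return [smooth_row(r) for r in strip]


def parallel_rows(image, workers=4, strip_height=2):
    """Split the image into row strips, filter them concurrently, reassemble in order."""
    strips = [image[i:i + strip_height] for i in range(0, len(image), strip_height)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_strip, strips))  # map preserves input order
    return [row for strip in results for row in strip]


image = [[float(5 * r + c) for c in range(5)] for r in range(7)]
result = parallel_rows(image)  # same values as the serial process_strip(image)
```

A horizontal-only filter is chosen so that strips need no overlapping border rows; a vertical filter would require each strip to carry halo rows from its neighbors. For CPU-bound pure-Python work, `ProcessPoolExecutor` would be needed for a real parallel speed-up, because threads share the interpreter lock.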
Video based object representation and classification using multiple covariance matrices.
Zhang, Yurong; Liu, Quan
2017-01-01
Video-based object recognition and classification has been widely studied in computer vision and image processing. One main issue in this task is developing an effective representation for video. This problem can generally be formulated as image set representation. In this paper, we present a new method called Multiple Covariance Discriminative Learning (MCDL) for image set representation and classification. The core idea of MCDL is to represent an image set using multiple covariance matrices, each representing one cluster of images. First, we use Nonnegative Matrix Factorization (NMF) to cluster the images within each image set, and then apply Covariance Discriminative Learning to each cluster (subset) of images. Finally, we adopt kernel linear discriminant analysis (KLDA) and nearest-neighbor classification for image set classification. Promising experimental results on several datasets show the effectiveness of the MCDL method.
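The core representational step described above, one covariance matrix per image cluster, can be sketched in plain Python. This assumes each image has already been reduced to a feature vector and assigned a cluster label; the NMF clustering, the discriminative learning on covariance matrices and the KLDA classifier are not reproduced. The small diagonal `ridge` that keeps each matrix positive definite is an assumption, and the function names are hypothetical.

```python
def covariance(vectors, ridge=0.0):
    """Sample covariance (n - 1 denominator) of equal-length feature vectors,
    plus `ridge` on the diagonal to keep the matrix positive definite."""
    n = len(vectors)
    if n < 2:
        raise ValueError("need at least two vectors per cluster")
    d = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for v in vectors:
        dev = [v[k] - mean[k] for k in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += dev[a] * dev[b]
    for a in range(d):
        for b in range(d):
            cov[a][b] /= n - 1
        cov[a][a] += ridge
    return cov


def image_set_descriptor(features, labels, ridge=1e-3):
    """One covariance matrix per cluster label; labels are assumed to come from
    a prior clustering step (NMF in the paper, not reproduced here)."""
    clusters = {}
    for f, label in zip(features, labels):
        clusters.setdefault(label, []).append(f)
    return {label: covariance(members, ridge) for label, members in sorted(clusters.items())}


features = [[1, 2], [3, 4], [5, 6], [0, 0], [2, 0]]
descriptor = image_set_descriptor(features, [0, 0, 0, 1, 1], ridge=0.0)
```

Representing a set by several covariance matrices rather than one lets each matrix describe a more homogeneous group of frames, which is the motivation the abstract gives for clustering first.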
Visual communication with retinex coding.
Huck, F O; Fales, C L; Davis, R E; Alter-Gartenberg, R
2000-04-10
Visual communication with retinex coding seeks to suppress the spatial variation of the irradiance (e.g., shadows) across natural scenes and preserve only the spatial detail and the reflectance (or the lightness) of the surface itself. The separation of reflectance from irradiance begins with nonlinear retinex coding that sharply and clearly enhances edges and preserves their contrast, and it ends with a Wiener filter that restores images from this edge and contrast information. An approximate small-signal model of image gathering with retinex coding is found to consist of the familiar difference-of-Gaussian bandpass filter and a locally adaptive automatic-gain control. A linear representation of this model is used to develop expressions within the small-signal constraint for the information rate and the theoretical minimum data rate of the retinex-coded signal and for the maximum-realizable fidelity of the images restored from this signal. Extensive computations and simulations demonstrate that predictions based on these figures of merit correlate closely with perceptual and measured performance. Hence these predictions can serve as a general guide for the design of visual communication channels that produce images with a visual quality that consistently approaches the best possible sharpness, clarity, and reflectance constancy, even for nonuniform irradiances. The suppression of shadows in the restored image is found to be constrained inherently more by the sharpness of their penumbra than by their depth.
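The abstract identifies a difference-of-Gaussian (DoG) bandpass filter as part of the small-signal model of retinex coding. The sketch below shows only that stage, in one dimension: a narrow center Gaussian minus a wide surround Gaussian, each normalized to unit sum so that uniform regions produce no response while edges do. The sigma values, the edge replication at borders and the function names are assumptions; the locally adaptive gain control and the Wiener restoration are not modeled.

```python
import math


def gaussian_kernel(sigma, radius):
    """Sampled Gaussian normalized to unit sum."""
    w = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


def dog_kernel(sigma_center, sigma_surround, radius):
    """Center-surround difference of Gaussians; sums to ~0, so it rejects uniform regions."""
    c = gaussian_kernel(sigma_center, radius)
    s = gaussian_kernel(sigma_surround, radius)
    return [a - b for a, b in zip(c, s)]


def filter_same(signal, kernel):
    """Apply a symmetric kernel with edge replication; output has the input's length."""
    r = len(kernel) // 2
    n = len(signal)
    return [sum(w * signal[min(max(i + k - r, 0), n - 1)] for k, w in enumerate(kernel))
            for i in range(n)]


edge = [0.0] * 10 + [1.0] * 10  # a step edge between samples 9 and 10
resp = filter_same(edge, dog_kernel(1.0, 3.0, 9))
# The response is negative just before the edge and positive just after it,
# which is the edge enhancement the abstract attributes to retinex coding.
```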
Visual Communication with Retinex Coding
NASA Astrophysics Data System (ADS)
Huck, Friedrich O.; Fales, Carl L.; Davis, Richard E.; Alter-Gartenberg, Rachel
2000-04-01
Figure-Ground Organization in Visual Cortex for Natural Scenes
2016-01-01
Figure-ground organization and border-ownership assignment are essential for understanding natural scenes. It has been shown that many neurons in the macaque visual cortex signal border-ownership in displays of simple geometric shapes such as squares, but how well these neurons resolve border-ownership in natural scenes is not known. We studied area V2 neurons in behaving macaques with static images of complex natural scenes. We found that about half of the neurons were border-ownership selective for contours in natural scenes, and this selectivity originated from the image context. The border-ownership signals emerged within 70 ms after stimulus onset, only ∼30 ms after response onset. A substantial fraction of neurons were highly consistent across scenes. Thus, the cortical mechanisms of figure-ground organization are fast and efficient even in images of complex natural scenes. Understanding how the brain performs this task so fast remains a challenge. PMID:28058269
Pedale, Tiziana; Santangelo, Valerio
2015-01-01
One of the most important issues in the study of cognition is to understand which factors determine the internal representation of the external world. Previous literature has started to highlight the impact of low-level sensory features (indexed by saliency maps) in driving attentional selection, hence increasing the probability that objects presented in complex natural scenes are successfully encoded into working memory (WM) and then correctly remembered. Here we asked whether the probability of retrieving high-saliency objects modulates the overall contents of WM by decreasing the probability of retrieving other, lower-saliency objects. We presented pictures of natural scenes for 4 s. After a retention period of 8 s, we asked participants to verbally report as many objects/details as possible from the previous scenes. We then computed how many times the objects located at either the peak of maximal or minimal saliency in the scene (as indexed by a saliency map; Itti et al., 1998) were recollected by participants. Results showed that maximal-saliency objects were recollected more often, and earlier in the stream of successfully reported items, than minimal-saliency objects. This indicates that bottom-up sensory salience increases the probability of recollection and facilitates access to the memory representation at retrieval. Moreover, recollection of the maximal- (but not the minimal-) saliency objects predicted the overall number of successfully recollected objects: the higher the probability of having successfully reported the most salient object in the scene, the lower the number of recollected objects. These findings highlight that bottom-up sensory saliency modulates the current contents of WM during recollection of objects from natural scenes, most likely by reducing the resources available to encode and then retrieve other (lower-saliency) objects. PMID:25741266
The Dynamic Photometric Stereo Method Using a Multi-Tap CMOS Image Sensor.
Yoda, Takuya; Nagahara, Hajime; Taniguchi, Rin-Ichiro; Kagawa, Keiichiro; Yasutomi, Keita; Kawahito, Shoji
2018-03-05
The photometric stereo method enables estimation of surface normals from images captured under different but known lighting directions. The classical photometric stereo method requires at least three images to determine the normals in a given scene. However, it cannot be applied to dynamic scenes because it assumes that the scene remains static while the required images are captured. In this work, we present a dynamic photometric stereo method for estimating the surface normals in a dynamic scene. We use a multi-tap complementary metal-oxide-semiconductor (CMOS) image sensor to capture the input images required for the proposed method. This image sensor can distribute the electrons generated by the photodiode of a single pixel among different exposure taps and can thus capture multiple images under different lighting conditions with almost identical timing. We implemented a camera-lighting system and created a software application to estimate the normal map in real time. We also evaluated the accuracy of the estimated surface normals and demonstrated that the proposed method can estimate the surface normals of dynamic scenes.
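The classical three-image method mentioned above can be sketched for a single pixel. Under the usual Lambertian assumption, each intensity is I_k = albedo × (l_k · n) for a known unit light direction l_k, so stacking three lights gives a 3×3 linear system for g = albedo × n; the length of g is the albedo and its direction is the normal. This sketch assumes no shadows or specularities and linearly independent light directions, uses Cramer's rule for the solve, and does not model the multi-tap capture that is the paper's contribution; the function names are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve a x = b for a 3x3 system using Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("lighting directions must be linearly independent")
    x = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        x.append(det3(m) / d)
    return x


def normal_from_intensities(lights, intensities):
    """Lambertian model I_k = albedo * (l_k . n): solve for g = albedo * n,
    then split g into its length (albedo) and direction (unit normal)."""
    g = solve3(lights, intensities)
    albedo = math.sqrt(sum(v * v for v in g))
    if albedo == 0:
        raise ValueError("zero intensities: normal is undefined")
    return albedo, [v / albedo for v in g]


lights = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8]]  # unit light directions
albedo, normal = normal_from_intensities(lights, [0.5, 0.4, 0.4])
# albedo is approximately 0.5 and normal approximately [0, 0, 1]
```

Applying the same solve at every pixel yields the normal map; with more than three lights, a least-squares solve would replace the exact 3×3 inversion.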
Capturing the plenoptic function in a swipe
NASA Astrophysics Data System (ADS)
Lawson, Michael; Brookes, Mike; Dragotti, Pier Luigi
2016-09-01
Blur in images, caused by camera motion, is typically thought of as a problem. The approach described in this paper shows instead that it is possible to use the blur caused by the integration of light rays at different positions along a moving camera trajectory to extract information about the light rays present within the scene. Retrieving the light rays of a scene from different viewpoints is equivalent to retrieving the plenoptic function of the scene. In this paper, we focus on a specific case in which the blurred image of a scene, containing a flat plane with a texture signal that is a sum of sine waves, is analysed to recreate the plenoptic function. The image is captured by a single-lens camera with its shutter open, moving in a straight line between two points, resulting in a swiped image. It is shown that finite rate of innovation sampling theory can be used to recover the scene geometry and therefore the epipolar plane image from the single swiped image. This epipolar plane image can be used to generate unblurred images for a given camera location.
Space Shuttle Columbia views the world with imaging radar: The SIR-A experiment
NASA Technical Reports Server (NTRS)
Ford, J. P.; Cimino, J. B.; Elachi, C.
1983-01-01
Images acquired by the Shuttle Imaging Radar (SIR-A) in November 1981 demonstrate the capability of this microwave remote sensor system to perceive and map a wide range of different surface features around the Earth. A selection of 60 scenes displays this capability with respect to Earth resources: geology, hydrology, agriculture, forest cover, ocean surface features, and prominent man-made structures. The combined area covered by the scenes presented amounts to about 3% of the total acquired. Most of the SIR-A images are accompanied by a LANDSAT multispectral scanner (MSS) or SEASAT synthetic-aperture radar (SAR) image of the same scene for comparison. Differences between the SIR-A image and its companion LANDSAT or SEASAT image at each scene are related to the characteristics of the respective imaging systems, and to seasonal or other changes that occurred in the time interval between acquisition of the images.
NASA Astrophysics Data System (ADS)
Unaldi, Numan; Asari, Vijayan K.; Rahman, Zia-ur
2009-05-01
Recently we proposed a wavelet-based dynamic range compression algorithm to improve the visual quality of digital images captured from high dynamic range scenes with non-uniform lighting conditions. This fast image enhancement algorithm, which provides dynamic range compression while preserving local contrast and tonal rendition, is also a good candidate for real-time video processing applications. Although the colors of the enhanced images produced by the algorithm are consistent with the colors of the original image, the algorithm fails to produce color-constant results for some "pathological" scenes that have very strong spectral characteristics in a single band. The linear color restoration process is the main reason for this drawback. Hence, a different approach is required for the final color restoration process. This paper presents the latest version of the algorithm, which deals with this issue. The results obtained by applying the algorithm to numerous natural images show strong robustness and high image quality.
Multi-viewpoint Image Array Virtual Viewpoint Rapid Generation Algorithm Based on Image Layering
NASA Astrophysics Data System (ADS)
Jiang, Lu; Piao, Yan
2018-04-01
The use of multi-view image arrays combined with virtual viewpoint generation technology to record 3D scene information in large scenes has become one of the key technologies for the development of integrated imaging. This paper presents a virtual viewpoint rendering method based on an image layering algorithm. First, the depth information of the reference viewpoint image is quickly obtained, using the sum of absolute differences (SAD) as the similarity measure. The reference image is then divided into layers, and the parallax of each layer is calculated from the depth information. According to the relative distance between the virtual viewpoint and the reference viewpoint, the image layers are weighted and panned. Finally, the virtual viewpoint image is rendered layer by layer according to the distance between each image layer and the viewer. This method avoids the disadvantages of depth-image-based rendering (DIBR), such as its high-precision requirements on the depth map and its complex mapping operations. Experiments show that this algorithm can synthesize virtual viewpoints at any position within a 2×2 viewpoint range, with high rendering speed. The averaged results show that this method yields satisfactory image quality: relative to real viewpoint images, the average SSIM reaches 0.9525, the PSNR reaches 38.353 and the image histogram similarity reaches 93.77%.
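The abstract names SAD as the similarity measure for the depth step. A minimal block-matching sketch on one rectified scanline is shown below: for a pixel in the left view, each candidate disparity is scored by the sum of absolute differences over a small window, and the lowest-cost disparity wins. The 1D scanline, the window half-width, the winner-take-all search and the function names are assumptions for illustration; the paper's own matching details are not given in the abstract.

```python
def sad(left, right, x, d, half):
    """Sum of absolute differences between a window around left[x] and right[x - d]."""
    return sum(abs(left[x + k] - right[x + k - d]) for k in range(-half, half + 1))


def disparity_at(left, right, x, max_disp, half=1):
    """Winner-take-all disparity for one pixel on a rectified scanline."""
    if x - half < 0 or x + half >= len(left):
        raise ValueError("window must fit inside the scanline")
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - half - d < 0:  # window would leave the right image
            break
        cost = sad(left, right, x, d, half)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


right = [1, 4, 9, 2, 7, 3, 8, 6, 5, 0]
left = [0, 0] + right[:-2]  # the same content seen 2 pixels further right
print(disparity_at(left, right, 5, 4))  # 2
```

For a rectified pair, the recovered disparity is inversely proportional to depth. In a layering scheme like the one described, pixels could then be grouped into layers by disparity and each layer shifted by a fraction of its disparity matching the virtual viewpoint's relative position; that step is not shown here.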
Ryals, Anthony J.; Wang, Jane X.; Polnaszek, Kelly L.; Voss, Joel L.
2015-01-01
Although the hippocampus unequivocally supports explicit/declarative memory, fewer findings have demonstrated its role in implicit expressions of memory. We tested for hippocampal contributions to an implicit expression of configural/relational memory for complex scenes using eye-movement tracking during functional magnetic resonance imaging (fMRI) scanning. Participants studied scenes and were later tested using scenes that resembled study scenes in their overall feature configuration but comprised different elements. These configurally similar scenes were used to limit explicit memory, and were intermixed with new scenes that did not resemble studied scenes. Scene configuration memory was expressed through eye movements reflecting exploration overlap (EO), which is the viewing of the same scene locations at both study and test. EO reliably discriminated similar study-test scene pairs from study-new scene pairs, was reliably greater for similarity-based recognition hits than for misses, and correlated with hippocampal fMRI activity. In contrast, subjects could not reliably discriminate similar from new scenes by overt judgments, although ratings of familiarity were slightly higher for similar than new scenes. Hippocampal fMRI correlates of this weak explicit memory were distinct from EO-related activity. These findings collectively suggest that EO was an implicit expression of scene configuration memory associated with hippocampal activity. Visual exploration can therefore reflect implicit hippocampal-related memory processing that can be observed in eye-movement behavior during naturalistic scene viewing. PMID:25620526
Perfect 3-D movies and stereoscopic movies on TV and projection screens: an appraisement
NASA Astrophysics Data System (ADS)
Klein, Susanne; Dultz, Wolfgang
1990-09-01
Since the invention of stereoscopy (WHEATSTONE 1838), reasons for and against 3-dimensional images have occupied the literature, but there has never been much doubt about the preference for autostereoscopic systems showing a scene that is 3-dimensional and true to life from all sides (perfect 3-dimensional image, HESSE 1939), especially since most stereoscopic movies of the past show serious imperfections in image quality and technical operation. Although no convincing perfect 3D-TV system is in sight, the stereoscopic movie has properties that are advantageous for certain representations on TV and important for the 3-dimensional motion picture. In this paper we investigate the influence of apparent motions of 3-dimensional images and classify the different projection systems with respect to the presence or absence of these spectacular illusions. Apparent motions bring dramatic effects into stereoscopic movies that cannot be created with perfect 3-dimensional systems. In this study we describe their applications and limits for television.
Online coupled camera pose estimation and dense reconstruction from video
Medioni, Gerard; Kang, Zhuoliang
2016-11-01
A product may receive each image in a stream of video images of a scene and, before processing the next image, generate information indicative of the position and orientation of the image capture device that captured the image at the time of capturing it. The product may do so by identifying distinguishable image feature points in the image; determining a coordinate for each identified image feature point; and, for each identified image feature point, attempting to identify one or more distinguishable model feature points in a three-dimensional (3D) model of at least a portion of the scene that appear likely to correspond to the identified image feature point. Thereafter, the product may find each of the following that, in combination, produce a consistent projection transformation of the 3D model onto the image: a subset of the identified image feature points for which one or more corresponding model feature points were identified; and, for each image feature point that has multiple likely corresponding model feature points, one of the corresponding model feature points. The product may update a 3D model of at least a portion of the scene following the receipt of each video image and before processing the next video image, based on the generated information indicative of the position and orientation of the image capture device at the time of capturing the received image. The product may display the updated 3D model after each update to the model.
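The consistency check described above (a subset of image feature points, each paired with one of its candidate model points, that agree with a single projection of the 3D model) can be sketched for one given candidate pose. The sketch assumes a pinhole camera with focal length `f` and principal point `(cx, cy)`, a rotation `R` and translation `t` supplied by some earlier hypothesis step (not shown, for example a RANSAC-style search), and a hypothetical reprojection tolerance `max_err` in pixels. For each image point, it keeps the candidate model point with the smallest reprojection error, or drops the point if no candidate is within tolerance.

```python
import math


def project(point, R, t, f, cx, cy):
    """Pinhole projection of a 3D model point under rotation R and translation t."""
    x, y, z = (sum(R[i][k] * point[k] for k in range(3)) + t[i] for i in range(3))
    if z <= 0:
        return None  # behind the camera
    return (f * x / z + cx, f * y / z + cy)


def select_consistent(matches, R, t, f, cx, cy, max_err=2.0):
    """For each (image point, candidate model points) pair, keep the candidate
    that reprojects closest to the image point, if it is within max_err pixels."""
    kept = []
    for img_pt, candidates in matches:
        best, best_err = None, max_err
        for m in candidates:
            p = project(m, R, t, f, cx, cy)
            if p is None:
                continue
            err = math.dist(p, img_pt)
            if err <= best_err:
                best, best_err = m, err
        if best is not None:
            kept.append((img_pt, best))
    return kept


R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # identity rotation
t = (0, 0, 5)                          # model sits 5 units in front of the camera
matches = [((50, 50), [(0, 0, 0)]),
           ((70, 50), [(0, 1, 0), (1, 0, 0)]),  # two candidates; only one fits
           ((10, 10), [(0, 1, 0)])]             # outlier for this pose
kept = select_consistent(matches, R, t, f=100, cx=50, cy=50)
```

The number of points kept can serve as a score for comparing candidate poses; the pose estimate itself would then be refined from the kept correspondences.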
Effects of chromatic image statistics on illumination induced color differences.
Lucassen, Marcel P; Gevers, Theo; Gijsenij, Arjan; Dekker, Niels
2013-09-01
We measure the color fidelity of visual scenes that are rendered under different (simulated) illuminants and shown on a calibrated LCD display. Observers make triad illuminant comparisons involving the renderings from two chromatic test illuminants and one achromatic reference illuminant shown simultaneously. Four chromatic test illuminants are used: two along the daylight locus (yellow and blue), and two perpendicular to it (red and green). The observers select the rendering having the best color fidelity, thereby indirectly judging which of the two test illuminants induces the smallest color differences compared to the reference. Both multicolor test scenes and natural scenes are studied. The multicolor scenes are synthesized and represent ellipsoidal distributions in CIELAB chromaticity space having the same mean chromaticity but different chromatic orientations. We show that, for those distributions, color fidelity is best when the vector of the illuminant change (pointing from neutral to chromatic) is parallel to the major axis of the scene's chromatic distribution. For our selection of natural scenes, which generally have much broader chromatic distributions, we measure a higher color fidelity for the yellow and blue illuminants than for red and green. Scrambled versions of the natural images are also studied to exclude possible semantic effects. We quantitatively predict the average observer response (i.e., the illuminant probability) with four types of models, differing in the extent to which they incorporate information processing by the visual system. Results show different levels of performance for the models, and different levels for the multicolor scenes and the natural scenes. Overall, models based on the scene averaged color difference have the best performance. We discuss how color constancy algorithms may be improved by exploiting knowledge of the chromatic distribution of the visual scene.
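The abstract reports that models based on the scene-averaged color difference perform best. A minimal sketch of such a model is shown below, assuming the CIE 1976 ΔE*ab metric (Euclidean distance in CIELAB) and renderings given as equal-length lists of L*a*b* triples. The abstract does not say which color-difference formula the authors used, and their models predict a choice probability rather than the deterministic choice returned here; the function names are hypothetical.

```python
import math


def delta_e76(lab1, lab2):
    """CIE 1976 color difference: Euclidean distance in CIELAB."""
    return math.dist(lab1, lab2)


def mean_delta_e(reference, rendering):
    """Scene-averaged color difference between two renderings, pixel by pixel."""
    if len(reference) != len(rendering) or not reference:
        raise ValueError("renderings must be non-empty and the same size")
    return sum(delta_e76(r, t) for r, t in zip(reference, rendering)) / len(reference)


def predict_choice(reference, rendering_a, rendering_b):
    """Predict which rendering looks more faithful to the reference: 'a' or 'b'."""
    da = mean_delta_e(reference, rendering_a)
    db = mean_delta_e(reference, rendering_b)
    return "a" if da <= db else "b"


reference = [(50, 0, 0), (60, 0, 0)]  # achromatic-reference rendering
test_a = [(50, 3, 4), (60, 0, 0)]     # mean difference 2.5
test_b = [(50, 0, 0), (60, 6, 8)]     # mean difference 5.0
print(predict_choice(reference, test_a, test_b))  # a
```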
Orbiting passive microwave sensor simulation applied to soil moisture estimation
NASA Technical Reports Server (NTRS)
Newton, R. W. (Principal Investigator); Clark, B. V.; Pitchford, W. M.; Paris, J. F.
1979-01-01
A sensor/scene simulation program was developed and used to determine the effects of scene heterogeneity, resolution, frequency, look angle, and surface and temperature relations on the performance of a spaceborne passive microwave system designed to estimate soil water information. The ground scene is based on classified LANDSAT images, which provide realistic ground classes as well as geometries. It was determined that the average sensitivity of antenna temperature to soil moisture improves as the antenna footprint size increases. Also, the precision (or variability) of the sensitivity changes as a function of resolution.
Image fusion based on Bandelet and sparse representation
NASA Astrophysics Data System (ADS)
Zhang, Jiuxing; Zhang, Wei; Li, Xuzhi
2018-04-01
The Bandelet transform can capture geometrically regular directions and geometric flow, and sparse representation can represent signals with as few atoms as possible from an over-complete dictionary; both can be used for image fusion. Therefore, a new fusion method based on the Bandelet transform and sparse representation is proposed to fuse the Bandelet coefficients of multi-source images and obtain high-quality fusion results. Tests were performed on remote sensing images and simulated multi-focus images; experimental results show that the new method outperforms the tested methods according to objective evaluation indexes and subjective visual effects.
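The abstract does not state which fusion rule is applied to the coefficients. A common choice in transform-domain and sparse-representation fusion is "choose max": keep, at each position, the coefficient with the larger magnitude, or, for sparse codes of corresponding patches, keep the code with the larger l1 norm as a proxy for activity. The sketch below shows only these two rules on flat lists, as an assumption about what such a rule might look like; the Bandelet transform, dictionary learning and sparse coding are not shown, and the function names are hypothetical.

```python
def fuse_max_abs(coeffs_a, coeffs_b):
    """Keep, position by position, the coefficient with the larger magnitude."""
    if len(coeffs_a) != len(coeffs_b):
        raise ValueError("coefficient arrays must have the same length")
    return [a if abs(a) >= abs(b) else b for a, b in zip(coeffs_a, coeffs_b)]


def fuse_sparse_l1(codes_a, codes_b):
    """For sparse codes of corresponding patches, keep the code with the larger l1 norm."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both images must have the same number of patches")
    return [ca if sum(map(abs, ca)) >= sum(map(abs, cb)) else cb
            for ca, cb in zip(codes_a, codes_b)]


print(fuse_max_abs([1, -5, 2], [3, 4, -1]))  # [3, -5, 2]
```

Larger-magnitude coefficients usually correspond to sharper, in-focus structure, which is why this rule is a natural baseline for multi-focus images.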
Brédart, Serge; Cornet, Alyssa; Rakic, Jean-Marie
2014-01-01
Color-deficient (dichromat) and normal observers' recognition memory for colored and black-and-white natural scenes was evaluated through several parameters: the rate of recognition, discrimination (A'), response bias (B''D), response confidence, and the proportion of conscious recollections (Remember responses) among hits. At the encoding phase, 36 images of natural scenes were each presented for 1 sec. Half of the images were shown in color and half in black-and-white. At the recognition phase, these 36 pictures were intermixed with 36 new images. The participants' task was to indicate whether an image had been presented at the encoding phase, to rate their level of confidence in their response and, in the case of a positive response, to classify the response as a Remember, Know or Guess response. Results indicated that accuracy, response discrimination, response bias and confidence ratings were higher for colored than for black-and-white images; this advantage for colored images was similar in both groups of participants. Rates of Remember responses were not higher for colored images than for black-and-white ones in either group. Interestingly, however, Remember responses were significantly more often based on color information for colored than for black-and-white images in normal observers only, not in dichromats.
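The discrimination and bias measures named above are computed from the hit rate H and the false-alarm rate F. The sketch below uses the commonly cited nonparametric formulas for A' and B''D; it assumes, without confirmation from the abstract, that these are the exact variants the authors used. A' runs from 0.5 (chance) to 1.0 (perfect discrimination); B''D runs from -1 (liberal) to +1 (conservative).

```python
def a_prime(hit_rate: float, fa_rate: float) -> float:
    """Nonparametric sensitivity A' (0.5 = chance, 1.0 = perfect discrimination)."""
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


def b_double_prime_d(hit_rate: float, fa_rate: float) -> float:
    """Nonparametric bias B''D: positive = conservative, negative = liberal."""
    h, f = hit_rate, fa_rate
    num = (1 - h) * (1 - f) - h * f
    den = (1 - h) * (1 - f) + h * f
    if den == 0:
        raise ValueError("B''D is undefined for (H, F) = (1, 0) or (0, 1)")
    return num / den


# 80% hits and 20% false alarms: good discrimination, no response bias.
print(round(a_prime(0.8, 0.2), 3))           # 0.875
print(round(b_double_prime_d(0.8, 0.2), 3))  # 0.0 (may print -0.0)
```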
Guaranteeing Failsafe Operation of Extended-Scene Shack-Hartmann Wavefront Sensor Algorithm
NASA Technical Reports Server (NTRS)
Sidick, Erikin
2009-01-01
A Shack-Hartmann sensor (SHS) is an optical instrument consisting of a lenslet array and a camera. It is widely used for wavefront sensing in optical testing and astronomical adaptive optics. The camera is placed at the focal point of the lenslet array and points at a star or any other point source. The captured image is an array of spot images. When the wavefront error at the lenslet array changes, the position of each spot measurably shifts from its original position. Determining the shifts of the spot images from their reference points shows the extent of the wavefront error. An adaptive cross-correlation (ACC) algorithm has been developed to use extended scenes as well as point sources for wavefront error detection. Qualifying an extended-scene image is often not an easy task due to changing conditions in scene content, illumination level, background, Poisson noise, read-out noise, dark current, sampling format, and field of view. The proposed new technique based on the ACC algorithm analyzes the effects of these conditions on the performance of the ACC algorithm and determines the viability of an extended-scene image. If the image is viable, it can be used for error correction; if it is not, the image fails and is not processed further. By potentially testing for a wide variety of conditions, the algorithm's accuracy can be virtually guaranteed. In a typical application, the ACC algorithm finds image shifts of more than 500 Shack-Hartmann camera sub-images relative to a reference sub-image or cell when performing one wavefront sensing iteration. In the proposed new technique, a pair of test and reference cells is selected from the same frame, preferably from two well-separated locations. The test cell is shifted by an integer number of pixels, for example from m = -5 to 5 along the x-direction, by choosing a different area on the same sub-image, and the shifts are estimated using the ACC algorithm. The same is done in the y-direction.
If the resulting shift estimate errors are less than a pre-determined threshold (e.g., 0.03 pixel), the image is accepted. Otherwise, it is rejected.
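The integer-shift self-test described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the published ACC algorithm: plain FFT cross-correlation with a three-point parabolic peak fit stands in for ACC, the reference and test cells are taken at the same central location of two sub-images that are assumed to be registered with no baseline offset, and the names `estimate_shift` and `qualify_extended_scene` are hypothetical.

```python
import numpy as np


def _parabolic_offset(c_minus, c0, c_plus):
    """Sub-pixel vertex of the parabola through (-1, c_minus), (0, c0), (1, c_plus)."""
    denom = c_minus - 2.0 * c0 + c_plus
    if denom == 0.0:  # flat correlation surface: no sub-pixel information
        return 0.0
    return 0.5 * (c_minus - c_plus) / denom


def estimate_shift(ref, test):
    """Estimate (dy, dx) such that test[i, j] ~= ref[i + dy, j + dx].

    Stand-in for ACC (assumption): circular FFT cross-correlation of
    mean-removed cells, integer peak search, then a parabolic fit per axis.
    """
    r = ref - ref.mean()
    t = test - test.mean()
    corr = np.fft.ifft2(np.fft.fft2(r) * np.conj(np.fft.fft2(t))).real
    ny, nx = corr.shape
    py, px = np.unravel_index(np.argmax(corr), corr.shape)
    c0 = corr[py, px]
    dy = py + _parabolic_offset(corr[(py - 1) % ny, px], c0, corr[(py + 1) % ny, px])
    dx = px + _parabolic_offset(corr[py, (px - 1) % nx], c0, corr[py, (px + 1) % nx])
    # Circular lags above N/2 correspond to negative shifts.
    if dy > ny / 2:
        dy -= ny
    if dx > nx / 2:
        dx -= nx
    return dy, dx


def qualify_extended_scene(ref_sub, test_sub, cell=32, max_shift=5, threshold=0.03):
    """Return (accepted, max_error) for the integer-shift self-test.

    The test cell is moved by m = -max_shift..max_shift pixels along x and
    then y by slicing a different area of test_sub; each estimate is
    compared with the known shift m.
    """
    h, w = ref_sub.shape
    y0, x0 = (h - cell) // 2, (w - cell) // 2
    if min(y0, x0) < max_shift:
        raise ValueError("sub-image too small for the requested shift range")
    ref = ref_sub[y0:y0 + cell, x0:x0 + cell]
    errors = []
    for m in range(-max_shift, max_shift + 1):
        _, dx = estimate_shift(ref, test_sub[y0:y0 + cell, x0 + m:x0 + m + cell])
        dy, _ = estimate_shift(ref, test_sub[y0 + m:y0 + m + cell, x0:x0 + cell])
        errors.extend([abs(dx - m), abs(dy - m)])
    max_error = max(errors)
    return max_error < threshold, max_error
```

The default 0.03-pixel threshold mirrors the example in the text; whether this simple stand-in meets it depends on scene content and noise. A featureless (constant) cell produces a flat correlation surface, so every nonzero shift is missed and the scene is rejected.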
Re-presentations of space in Hollywood movies: an event-indexing analysis.
Cutting, James; Iricinschi, Catalina
2015-03-01
Popular movies present chunk-like events (scenes and subscenes) that promote episodic, serial updating of viewers' representations of the ongoing narrative. Event-indexing theory would suggest that the beginnings of new scenes trigger these updates, which in turn require more cognitive processing. Typically, a new movie event is signaled by an establishing shot, one providing more background information and a longer look than the average shot. Our analysis of 24 films reconfirms this. More important, we show that, when returning to a previously shown location, the re-establishing shot reduces both context and duration while remaining greater than the average shot. In general, location shifts dominate character and time shifts in event segmentation of movies. In addition, over the last 70 years re-establishing shots have become more like the noninitial shots of a scene. Establishing shots have also approached noninitial shot scales, but not their durations. Such results suggest that film form is evolving, perhaps to suit more rapid encoding of narrative events. Copyright © 2014 Cognitive Science Society, Inc.
(In) Sensitivity to spatial distortion in natural scenes
Bex, Peter J.
2010-01-01
The perception of object structure in the natural environment is remarkably stable under large variation in image size and projection, especially given our insensitivity to spatial position outside the fovea. Sensitivity to periodic spatial distortions that were introduced into one quadrant of gray-scale natural images was measured in a 4AFC task. Observers were able to detect the presence of distortions in unfamiliar images even though they did not significantly affect the amplitude spectrum. Sensitivity depended on the spatial period of the distortion and on the image structure at the location of the distortion. The results suggest that the detection of distortion involves decisions made in the late stages of image perception and is based on an expectation of the typical structure of natural scenes. PMID:20462324
A Novel Locally Linear KNN Method With Applications to Visual Recognition.
Liu, Qingfeng; Liu, Chengjun
2017-09-01
A locally linear K Nearest Neighbor (LLK) method is presented in this paper with applications to robust visual recognition. Specifically, the concept of an ideal representation is first presented, which improves upon the traditional sparse representation in many ways. The objective function based on a host of criteria for sparsity, locality, and reconstruction is then optimized to derive a novel representation, which is an approximation to the ideal representation. The novel representation is further processed by two classifiers, namely, an LLK-based classifier and a locally linear nearest mean-based classifier, for visual recognition. The proposed classifiers are shown to connect to the Bayes decision rule for minimum error. Additional new theoretical analysis is presented, such as the nonnegative constraint, the group regularization, and the computational efficiency of the proposed LLK method. New methods such as a shifted power transformation for improving reliability, a coefficients' truncating method for enhancing generalization, and an improved marginal Fisher analysis method for feature extraction are proposed to further improve visual recognition performance. Extensive experiments are conducted to evaluate the proposed LLK method for robust visual recognition. In particular, eight representative data sets are used to assess the performance of the LLK method in various visual recognition applications, such as action recognition, scene recognition, object recognition, and face recognition.
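As a rough illustration of combining locality with reconstruction, the hypothetical function below picks the K nearest training samples, computes ridge-regularized reconstruction coefficients over them, and assigns the class whose share of the reconstruction leaves the smallest residual. This is a simplified sketch, not the LLK objective: it omits the sparsity criterion, the nonnegative constraint, group regularization, and the other refinements listed above, and the ridge weight `lam` and the residual decision rule are assumptions.

```python
import numpy as np


def llk_like_predict(x, X_train, y_train, k=5, lam=1e-2):
    """Classify x from a ridge reconstruction over its k nearest training samples."""
    # Locality: keep only the k nearest training samples (Euclidean distance).
    dist = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(dist)[:k]
    D = X_train[nn].T                      # (n_features, k), one neighbour per column
    labels = y_train[nn]
    # Reconstruction: ridge-regularized least squares x ~= D @ a.
    a = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ x)
    # Decision: class whose neighbours alone leave the smallest residual.
    best_label, best_res = None, np.inf
    for c in np.unique(y_train):
        mask = labels == c
        res = np.linalg.norm(x - D[:, mask] @ a[mask])
        if res < best_res:
            best_label, best_res = c, res
    return best_label
```

A class with no sample among the K neighbours contributes a zero reconstruction, so its residual is simply the norm of x.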
Local coding based matching kernel method for image classification.
Song, Yan; McLoughlin, Ian Vince; Dai, Li-Rong
2014-01-01
This paper focuses on how to measure visual similarity effectively and efficiently for local-feature-based representations. Among existing methods, metrics based on Bag of Visual Words (BoV) techniques are efficient and conceptually simple, at the expense of effectiveness. By contrast, kernel-based metrics are more effective, but at the cost of greater computational complexity and increased storage requirements. We show that a unified visual matching framework can be developed to encompass both BoV and kernel-based metrics, in which a local kernel plays an important role between feature pairs or between features and their reconstructions. Generally, local kernels are defined using Euclidean distance or its derivatives, based either explicitly or implicitly on an assumption of Gaussian noise. However, local features such as SIFT and HOG often follow a heavy-tailed distribution, which tends to undermine the motivation for Euclidean metrics. Motivated by recent advances in feature coding techniques, we propose a novel, efficient local coding based matching kernel (LCMK) method. It exploits the manifold structures in Hilbert space derived from local kernels. The proposed method combines the advantages of both BoV and kernel-based metrics and achieves linear computational complexity, which enables efficient and scalable visual matching on large-scale image sets. To evaluate the effectiveness of the proposed LCMK method, we conduct extensive experiments with widely used benchmark datasets, including 15-Scenes, Caltech101/256, and the PASCAL VOC 2007 and 2011 datasets. Experimental results confirm the effectiveness of the relatively efficient LCMK method.
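To make the linear-complexity idea concrete, the sketch below avoids comparing every feature pair between two images: each local feature is encoded once against a codebook, codes are mean-pooled per image, and two images are compared by the cosine of their pooled vectors, so the cost grows linearly with the number of features. This is a generic coding-and-pooling match kernel, not the LCMK formulation; the random codebook, the soft assignment restricted to the `k` nearest codewords, and the weight `beta` are illustrative assumptions.

```python
import numpy as np


def local_soft_codes(features, codebook, k=3, beta=1.0):
    """Encode each feature with Gaussian weights on its k nearest codewords only."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]                  # (n_features, k)
    rows = np.arange(features.shape[0])[:, None]
    local_d2 = d2[rows, nearest]
    # Subtract each row's smallest distance so the nearest weight is exp(0) = 1.
    w = np.exp(-beta * (local_d2 - local_d2[:, :1]))
    codes = np.zeros_like(d2)
    codes[rows, nearest] = w / w.sum(axis=1, keepdims=True)
    return codes


def match_kernel(features_x, features_y, codebook, k=3):
    """Cosine similarity of mean-pooled codes: one encoding pass per feature."""
    px = local_soft_codes(features_x, codebook, k).mean(axis=0)
    py = local_soft_codes(features_y, codebook, k).mean(axis=0)
    return float(px @ py / (np.linalg.norm(px) * np.linalg.norm(py)))
```

Because the pooled vectors are nonnegative, the similarity lies in [0, 1], and an image compared with itself scores 1.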
Adaptive fusion of infrared and visible images in dynamic scene
NASA Astrophysics Data System (ADS)
Yang, Guang; Yin, Yafeng; Man, Hong; Desai, Sachi
2011-11-01
Multimodal sensor fusion has been widely employed in various surveillance and military applications. A variety of image fusion techniques, including PCA, wavelet, curvelet, and HSV-based methods, have been proposed in recent years to improve human visual perception for object detection. One of the main challenges for visible and infrared image fusion is to automatically determine an optimal fusion strategy for different input scenes at an acceptable computational cost. In this paper, we propose a fast and adaptive image fusion method based on feature selection to obtain a high-contrast image from visible and infrared sensors for target detection. First, fuzzy c-means clustering is applied to the infrared image to highlight possible hotspot regions, which are considered potential target locations. Next, the region surrounding the target area is segmented as the background region. Image fusion is then applied locally to the selected target and background regions by computing different linear combinations of color components from the registered visible and infrared images. After the different fused images are obtained, histogram distributions are computed on these locally fused images as the fusion feature set. The variance ratio, a measure based on Linear Discriminant Analysis (LDA), is used to rank the feature set, and the most discriminative feature is selected for fusing the whole image. Because the feature selection is performed over time, the process dynamically determines the most suitable feature for image fusion in different scenes. Experiments are conducted on the OSU Color-Thermal database and the TNO Human Factor dataset. The fusion results indicate that the proposed method achieves competitive performance compared with other fusion algorithms at a relatively low computational cost.
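The selection step can be illustrated with a simplified sketch. Assumptions: a single grayscale visible channel instead of color components, a fixed grid of linear weights, a target mask supplied directly in place of the fuzzy c-means hotspot step, and a two-class Fisher ratio (mean_t - mean_b)^2 / (var_t + var_b) on pixel intensities standing in for the histogram-based variance ratio. The function names are hypothetical.

```python
import numpy as np


def fisher_ratio(values, mask):
    """Two-class separability of target (mask) vs background pixels."""
    t, b = values[mask], values[~mask]
    return (t.mean() - b.mean()) ** 2 / (t.var() + b.var() + 1e-12)


def select_fusion_weight(ir, visible, target_mask, weights=np.linspace(0.0, 1.0, 11)):
    """Pick alpha in fused = alpha * ir + (1 - alpha) * visible that best separates target from background."""
    scores = [fisher_ratio(a * ir + (1.0 - a) * visible, target_mask) for a in weights]
    best = int(np.argmax(scores))
    return float(weights[best]), scores
```

Re-running the selection on each new frame (or on a sliding window of frames) is one way to let the chosen combination adapt as the scene changes.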
NASA Astrophysics Data System (ADS)
Zhang, Chun-Sen; Zhang, Meng-Meng; Zhang, Wei-Xing
2017-01-01
This paper outlines a low-cost, user-friendly photogrammetric technique that uses nonmetric cameras to obtain digital image sequences of an excavation site, based on photogrammetry and computer vision. Digital camera calibration, automatic aerial triangulation, image feature extraction, image sequence matching, and dense digital differential rectification are combined with a number of global control points at the excavation site to reconstruct high-precision, measurable three-dimensional (3-D) models. Using the acrobatic figurines in the Qin Shi Huang mausoleum excavation as an example, our method solves the problems of a small base-to-height ratio, high inclination, unstable altitudes, and significant ground elevation changes that affect image matching. Compared with 3-D laser scanning, the 3-D color point cloud obtained by this method achieves the same visual result and has the advantages of low project cost, simple data processing, and high accuracy. Structure-from-motion (SfM) is often used to reconstruct 3-D models of large scenes but has lower accuracy when reconstructing small scenes at close range. Results indicate that this method quickly achieves 3-D reconstruction of large archaeological sites and produces orthophotos of the heritage site distribution, providing a scientific basis for the accurate location of cultural relics, archaeological excavation, investigation, and site protection planning. The proposed method has broad application value.
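Why a small base-to-height ratio hurts can be shown with the textbook normal-case stereo relation, a standard photogrammetric approximation rather than a formula from this paper: with object distance Z, base B, focal length f in pixels, and disparity precision sigma_d in pixels, depth is Z = f * B / d and its precision is approximately sigma_Z = (Z / B) * (Z / f) * sigma_d. The camera values below are illustrative assumptions.

```python
def depth_precision(distance_m, base_m, focal_px, disparity_sigma_px):
    """Normal-case stereo depth error: sigma_Z = (Z / B) * (Z / f) * sigma_d."""
    if base_m <= 0 or focal_px <= 0:
        raise ValueError("base and focal length must be positive")
    return (distance_m / base_m) * (distance_m / focal_px) * disparity_sigma_px


# Same camera and matching precision at Z = 10 m, two base-to-height ratios.
for base in (1.0, 0.25):
    sigma_mm = 1000 * depth_precision(10.0, base, 4000.0, 0.5)
    print(f"B/H = {base / 10:.3f}: sigma_Z = {sigma_mm:.1f} mm")
```

Because sigma_Z is inversely proportional to B at fixed Z, shrinking the base fourfold makes the depth error four times larger.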
Temporal consistent depth map upscaling for 3DTV
NASA Astrophysics Data System (ADS)
Schwarz, Sebastian; Sjöström, Mårten; Olsson, Roger
2014-03-01
The ongoing success of three-dimensional (3D) cinema fuels increasing efforts to extend the commercial success of 3D to new markets. The possibility of a convincing 3D experience at home, such as three-dimensional television (3DTV), has generated a great deal of interest within the research and standardization community. A central issue for 3DTV is the creation and representation of 3D content. Acquiring scene depth information is a fundamental task in computer vision, yet it is complex and error-prone. Dedicated range sensors, such as the Time-of-Flight (ToF) camera, can simplify the scene depth capture process and overcome shortcomings of traditional solutions, such as active or passive stereo analysis. Admittedly, currently available ToF sensors deliver only limited spatial resolution. However, sophisticated depth upscaling approaches use texture information to match depth and video resolution. At Electronic Imaging 2012, we proposed an upscaling routine based on error energy minimization, weighted with edge information from an accompanying video source. In this article, we develop our algorithm further. By adding temporal consistency constraints to the upscaling process, we reduce disturbing depth jumps and flickering artifacts in the final 3DTV content. Temporal consistency in depth maps enhances the 3D experience, leading to wider acceptance of 3D media content. More content in better quality can boost the commercial success of 3DTV.
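The combination of edge-weighted error-energy minimization with a temporal term can be sketched on a single scanline as a weighted least-squares problem. This is a generic formulation assumed for illustration, not the authors' published energy: low-resolution depth samples are placed every `factor` pixels, neighbouring depths are smoothed with weights exp(-|guide difference| / sigma) so smoothing is relaxed across texture edges, and a term lam_t * (D - D_prev)^2 pulls the result toward the previous frame's depth to suppress flicker. All parameter names are hypothetical.

```python
import numpy as np


def upscale_scanline(samples, factor, guide, prev=None, lam_s=1.0, lam_t=0.0, sigma=0.1):
    """Upscale one row of depth by minimizing
        sum_k (D[k*factor] - samples[k])^2
      + lam_s * sum_i w_i * (D[i+1] - D[i])^2
      + lam_t * sum_i (D[i] - prev[i])^2.
    """
    guide = np.asarray(guide, dtype=float)
    n = guide.size
    idx = np.arange(len(samples)) * factor
    if idx[-1] >= n:
        raise ValueError("samples do not fit on the guide row")
    A = np.zeros((n, n))
    b = np.zeros(n)
    A[idx, idx] = 1.0                                        # data term
    b[idx] = samples
    # Edge-aware smoothness: small weight where the guide (texture) row jumps.
    w = np.exp(-np.abs(np.diff(guide)) / sigma) + 1e-6
    G = np.diff(np.eye(n), axis=0)                           # forward differences
    A += lam_s * G.T @ (w[:, None] * G)
    if prev is not None and lam_t > 0.0:                     # temporal consistency
        A += lam_t * np.eye(n)
        b += lam_t * np.asarray(prev, dtype=float)
    return np.linalg.solve(A, b)
```

Setting the normal-equation gradient to zero gives the linear system solved above; the small constant added to the weights keeps it nonsingular when a texture edge would otherwise disconnect a segment that contains no depth sample.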
A hybrid multiview stereo algorithm for modeling urban scenes.
Lafarge, Florent; Keriven, Renaud; Brédif, Mathieu; Vu, Hoang-Hiep
2013-01-01
We present an original multiview stereo reconstruction algorithm that allows the 3D modeling of urban scenes as a combination of meshes and geometric primitives. The method provides a compact model while preserving details: irregular elements such as statues and ornaments are described by meshes, whereas regular structures such as columns and walls are described by primitives (planes, spheres, cylinders, cones, and tori). We adopt a two-step strategy consisting first of segmenting the initial mesh-based surface using a multilabel Markov Random Field-based model and second of sampling primitive and mesh components simultaneously on the obtained partition by a Jump-Diffusion process. The quality of a reconstruction is measured by a multi-object energy model which takes into account both photo-consistency and semantic considerations (i.e., geometry and shape layout). The segmentation and sampling steps are embedded into an iterative refinement procedure which provides an increasingly accurate hybrid representation. Experimental results on complex urban structures and large scenes are presented and compared to state-of-the-art multiview stereo meshing algorithms.
Implementation of jump-diffusion algorithms for understanding FLIR scenes
NASA Astrophysics Data System (ADS)
Lanterman, Aaron D.; Miller, Michael I.; Snyder, Donald L.
1995-07-01
Our pattern-theoretic approach to the automated understanding of forward-looking infrared (FLIR) images brings the traditionally separate endeavors of detection, tracking, and recognition together into a unified jump-diffusion process. New objects are detected and object types are recognized through discrete jump moves. Between jumps, the locations and orientations of objects are estimated via continuous diffusions. A hypothesized scene, simulated from the emissive characteristics of the hypothesized scene elements, is compared with the collected data by a likelihood function based on sensor statistics. This likelihood is combined with a prior distribution defined over the set of possible scenes to form a posterior distribution. The jump-diffusion process empirically generates samples from the posterior distribution. Both the diffusion and jump operations involve simulating the scene produced by a hypothesized configuration. Scene simulation is most effectively accomplished by pipelined rendering engines such as those from Silicon Graphics. We demonstrate the execution of our algorithm on a Silicon Graphics Onyx/RealityEngine.
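The interplay of discrete jumps and continuous moves can be illustrated with a deliberately small 1-D toy, which is an assumption-laden stand-in rather than the authors' FLIR system: objects are Gaussian bumps of known width and amplitude, the "renderer" simply sums them, Gaussian noise replaces the FLIR sensor statistics, the prior is a Poisson number of objects with uniform positions, jumps are birth/death Metropolis-Hastings moves, and a random-walk Metropolis update of positions stands in for the Langevin-type diffusion. All constants and names are hypothetical.

```python
import numpy as np

WIDTH, AMP = 2.0, 1.0   # assumed object shape: 1-D Gaussian bump
NOISE_SIGMA = 0.2       # assumed Gaussian sensor noise (stand-in for FLIR statistics)
PRIOR_RATE = 2.0        # assumed Poisson mean for the number of objects


def render(positions, grid):
    """Simulate the signal produced by a hypothesized configuration."""
    scene = np.zeros_like(grid)
    for p in positions:
        scene += AMP * np.exp(-0.5 * ((grid - p) / WIDTH) ** 2)
    return scene


def log_likelihood(positions, data, grid):
    """Gaussian log-likelihood of the data given the rendered hypothesis (up to a constant)."""
    residual = data - render(positions, grid)
    return -0.5 * float(np.sum(residual ** 2)) / NOISE_SIGMA ** 2


def jump_diffusion(data, grid, n_iter=2000, step=0.3, seed=0):
    """Toy sampler: birth/death jumps plus random-walk position moves."""
    rng = np.random.default_rng(seed)
    lo, hi = float(grid[0]), float(grid[-1])
    pos, ll = [], log_likelihood([], data, grid)
    for _ in range(n_iter):
        k = len(pos)
        if rng.random() < 0.5:                      # jump move
            if rng.random() < 0.5:                  # birth at a uniform location
                new = pos + [lo + (hi - lo) * rng.random()]
                log_extra = np.log(PRIOR_RATE / (k + 1))   # prior x proposal factor
            elif k > 0:                             # death of a uniformly chosen object
                i = int(rng.integers(k))
                new = pos[:i] + pos[i + 1:]
                log_extra = np.log(k / PRIOR_RATE)
            else:
                continue                            # death on an empty scene: stay put
        else:                                       # diffusion stand-in
            if k == 0:
                continue
            new = [p + step * rng.standard_normal() for p in pos]
            if any(p < lo or p > hi for p in new):
                continue                            # zero prior density: reject
            log_extra = 0.0
        ll_new = log_likelihood(new, data, grid)
        log_accept = ll_new - ll + log_extra
        if log_accept >= 0.0 or rng.random() < np.exp(log_accept):
            pos, ll = new, ll_new
    return pos
```

The function returns only the final state of the chain; a real analysis would collect states over many iterations to summarize the posterior.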
The forensic holodeck: an immersive display for forensic crime scene reconstructions.
Ebert, Lars C; Nguyen, Tuan T; Breitbeck, Robert; Braun, Marcel; Thali, Michael J; Ross, Steffen
2014-12-01
In forensic investigations, crime scene reconstructions are created based on a variety of three-dimensional image modalities. Although the data gathered are three-dimensional, their presentation on computer screens and paper is two-dimensional, which incurs a loss of information. By applying immersive virtual reality (VR) techniques, we propose a system that allows a crime scene to be viewed as if the investigator were present at the scene. Our system uses a low-cost VR headset originally developed for computer gaming. The headset offers a large viewing volume and tracks the user's head orientation in real time, and an optical tracker is used for positional information. In addition, we created a crime scene reconstruction to demonstrate the system. In this article, we present a low-cost system that allows immersive, three-dimensional and interactive visualization of forensic incident scene reconstructions.
Embedded, real-time UAV control for improved, image-based 3D scene reconstruction
Jean Liénard; Andre Vogs; Demetrios Gatziolis; Nikolay Strigul
2016-01-01
Unmanned Aerial Vehicles (UAVs) are already broadly employed for 3D modeling of large objects such as trees and monuments via photogrammetry. The usual workflow includes two distinct steps: image acquisition with a UAV and computationally demanding post-flight image processing. Insufficient feature overlap across images is a common shortcoming in post-flight image...
Efficient summary statistical representation when change localization fails.
Haberman, Jason; Whitney, David
2011-10-01
People are sensitive to the summary statistics of the visual world (e.g., average orientation/speed/facial expression). We readily derive this information from complex scenes, often without explicit awareness. Given the fundamental and ubiquitous nature of summary statistical representation, we tested whether this kind of information is subject to the attentional constraints imposed by change blindness. We show that information regarding the summary statistics of a scene is available despite limited conscious access. In a novel experiment, we found that while observers can suffer from change blindness (i.e., not localize where change occurred between two views of the same scene), observers could nevertheless accurately report changes in the summary statistics (or "gist") about the very same scene. In the experiment, observers saw two successively presented sets of 16 faces that varied in expression. Four of the faces in the first set changed from one emotional extreme (e.g., happy) to another (e.g., sad) in the second set. Observers performed poorly when asked to locate any of the faces that changed (change blindness). However, when asked about the ensemble (which set was happier, on average), observer performance remained high. Observers were sensitive to the average expression even when they failed to localize any specific object change. That is, even when observers could not locate the very faces driving the change in average expression between the two sets, they nonetheless derived a precise ensemble representation. Thus, the visual system may be optimized to process summary statistics in an efficient manner, allowing it to operate despite minimal conscious access to the information presented.
Hayes, Scott M.; Baena, Elsa; Truong, Trong-Kha; Cabeza, Roberto
2011-01-01
Although people do not normally try to remember associations between faces and physical contexts, these associations are established automatically, as indicated by the difficulty of recognizing familiar faces in different contexts (“butcher-on-the-bus” phenomenon). The present functional MRI (fMRI) study investigated the automatic binding of faces and scenes. In the Face-Face (F-F) condition, faces were presented alone during both encoding and retrieval, whereas in the Face/Scene-Face (FS-F) condition, they were presented overlaid on scenes during encoding but alone during retrieval (context change). Although participants were instructed to focus only on the faces during both encoding and retrieval, recognition performance was worse in the FS-F than the F-F condition (“context shift decrement”—CSD), confirming automatic face-scene binding during encoding. This binding was mediated by the hippocampus as indicated by greater subsequent memory effects (remembered > forgotten) in this region for the FS-F than the F-F condition. Scene memory was mediated by the right parahippocampal cortex, which was reactivated during successful retrieval when the faces were associated with a scene during encoding (FS-F condition). Analyses using the CSD as a regressor yielded a clear hemispheric asymmetry in medial temporal lobe activity during encoding: left hippocampal and parahippocampal activity was associated with a smaller CSD, indicating more flexible memory representations immune to context changes, whereas right hippocampal/rhinal activity was associated with a larger CSD, indicating less flexible representations sensitive to context change. Taken together, the results clarify the neural mechanisms of context effects on face recognition. PMID:19925208
A study of payload specialist station monitor size constraints. [space shuttle orbiters]
NASA Technical Reports Server (NTRS)
Kirkpatrick, M., III; Shields, N. L., Jr.; Malone, T. B.
1975-01-01
Constraints on the CRT display size for the shuttle orbiter cabin are studied. The viewing requirements placed on these monitors were assumed to involve the display of imaged scenes that provide visual feedback during payload operations and the display of alphanumeric characters. Data on target recognition/resolution, target recognition, and range rate detection by human observers were used to determine viewing requirements for imaged scenes. Field-of-view and acuity requirements for a variety of payload operations were obtained, along with the necessary detection capability in terms of range-to-target size ratios. The monitor size necessary to meet the acuity requirements was established. An empirical test was conducted to determine the required recognition sizes for displayed alphanumeric characters. The test results were used to determine how many characters could be displayed simultaneously, given the recognition size requirements and the proposed monitor size. A CRT display of 20 x 20 cm is recommended. A portion of the display area is used for displaying imaged scenes, and the remaining area is used for alphanumeric characters pertaining to the displayed scene. In the character-only mode, the entire display is used for characters.
Zhao, Hui-Jie; Jiang, Cheng; Jia, Guo-Rui
2014-01-01
Adjacency effects may introduce errors in quantitative applications of hyperspectral remote sensing, the most significant source of which is the earth-atmosphere coupling radiance. However, the surrounding relief and shadows induce strong changes in hyperspectral images acquired over rugged terrain, which makes it difficult to describe the spectral characteristics accurately. Furthermore, the radiative coupling process between the earth and the atmosphere is more complex over rugged scenes. To meet the requirements of real-time processing in data simulation, an equivalent background reflectance was developed that takes into account the topography and the geometry between surroundings and targets, based on the radiative transfer process. The contributions of the coupling to the signal at sensor level were then evaluated. This approach was integrated into the sensor-level radiance simulation model and validated by simulating a set of actual radiance data. The results show that the visual effect of the simulated images is consistent with that of the observed images. It was also shown that the spectral similarity is improved over rugged scenes. In addition, the model precision is maintained at the same level over flat scenes.
Brand, John; Johnson, Aaron P.
2014-01-01
In four experiments, we investigated how attention to local and global levels of hierarchical Navon figures affected the selection of diagnostic spatial scale information used in scene categorization. We explored this issue by asking observers to classify hybrid images (i.e., images that contain the low spatial frequency (LSF) content of one image and the high spatial frequency (HSF) content of a second image) immediately following global and local Navon tasks. Hybrid images can be classified according to either their LSF or their HSF content, which makes them ideal for investigating diagnostic spatial scale preference. Although observers were sensitive to both spatial scales (Experiment 1), they overwhelmingly preferred to classify hybrids based on LSF content (Experiment 2). In Experiment 3, we demonstrated that LSF-based hybrid categorization was faster following global Navon tasks, suggesting that LSF processing associated with global Navon tasks primed the selection of LSFs in hybrid images. Experiment 4 examined this hypothesis by replicating Experiment 3 while suppressing the LSF information in the Navon letters through contrast balancing of the stimuli. As in Experiment 3, observers preferred to classify hybrids based on LSF content; however, in contrast, LSF-based hybrid categorization was slower following global than local Navon tasks. PMID:25520675
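Hybrid stimuli of the kind described can be generated by combining the low-spatial-frequency band of one image with the high-spatial-frequency band of another. The sketch below assumes an ideal (hard) radial cutoff in the Fourier domain, specified in cycles per image and chosen for simplicity; it is not necessarily the filtering used in these experiments, and `cutoff_cpi` is a hypothetical parameter.

```python
import numpy as np


def radial_lowpass_mask(shape, cutoff_cpi):
    """Boolean mask of FFT frequencies whose radius is <= cutoff (cycles per image)."""
    fy = np.fft.fftfreq(shape[0]) * shape[0]
    fx = np.fft.fftfreq(shape[1]) * shape[1]
    radius = np.hypot(fy[:, None], fx[None, :])
    return radius <= cutoff_cpi


def hybrid_image(lsf_source, hsf_source, cutoff_cpi=8.0):
    """LSF content of lsf_source plus HSF content of hsf_source."""
    mask = radial_lowpass_mask(lsf_source.shape, cutoff_cpi)
    spectrum = np.where(mask, np.fft.fft2(lsf_source), np.fft.fft2(hsf_source))
    return np.fft.ifft2(spectrum).real
```

Because the mask is symmetric under frequency negation, the combined spectrum stays Hermitian and the inverse transform is real up to rounding. Low-pass filtering the hybrid with the same cutoff recovers the LSF band of the first image, and the residual is the HSF band of the second, which is what lets a hybrid be classified by either band.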
Scene-based nonuniformity correction technique for infrared focal-plane arrays.
Liu, Yong-Jin; Zhu, Hong; Zhao, Yi-Gong
2009-04-20
A scene-based nonuniformity correction algorithm, which can be separated into three parts, is presented to compensate for the gain and bias nonuniformity in infrared focal-plane array sensors. First, because nonuniformity correction is a typical blind-estimation problem in which both the scene values and the detector parameters are unavailable, an interframe-prediction method is used to estimate the true scene. Second, the estimated scene, along with the corresponding data observed by the detectors, is used to update the gain and the bias by means of a line-fitting technique. Finally, with these nonuniformity parameters, the compensated output of each detector is obtained with a very simple formula. The advantages of the proposed algorithm lie in its low computational complexity and storage requirements and its ability to capture temporal drifts in the nonuniformity parameters. The performance of every module is demonstrated with simulated and real infrared image sequences. Experimental results indicate that the proposed algorithm achieves a superior correction effect.
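The line-fitting and compensation steps can be sketched per detector, assuming the linear model observed = gain * scene + bias: given a stack of estimated true-scene frames and the corresponding observed frames, a least-squares line is fitted for every pixel over time, and the compensated output is (observed - bias) / gain. The interframe-prediction step that produces the scene estimate is not reproduced here, so the estimate is simply an input; fitting over a fixed batch of frames, rather than a running update that tracks drift, is also an assumption of this sketch.

```python
import numpy as np


def fit_gain_bias(scene_est, observed):
    """Per-pixel least-squares line observed = gain * scene + bias over the time axis.

    scene_est, observed: arrays of shape (frames, rows, cols). Pixels whose
    estimated scene is constant over time have no defined gain.
    """
    x_mean = scene_est.mean(axis=0)
    y_mean = observed.mean(axis=0)
    dx = scene_est - x_mean
    gain = (dx * (observed - y_mean)).sum(axis=0) / (dx ** 2).sum(axis=0)
    bias = y_mean - gain * x_mean
    return gain, bias


def correct(frame, gain, bias):
    """Compensated detector output for one frame."""
    return (frame - bias) / gain
```

With noise-free synthetic data the fit recovers each detector's gain and bias exactly, and the correction returns the scene.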