A Review of Feature Extraction Software for Microarray Gene Expression Data
Tan, Ching Siang; Ting, Wai Soon; Mohamad, Mohd Saberi; Chan, Weng Howe; Deris, Safaai; Ali Shah, Zuraini
2014-01-01
When gene expression data are too large to be processed, they are transformed into a reduced representation set of genes. Transforming large-scale gene expression data into a set of genes is called feature extraction. If the genes extracted are carefully chosen, this gene set can capture the relevant information in the large-scale gene expression data, allowing further analysis to use this reduced representation instead of the full-size data. In this paper, we review numerous software applications that can be used for feature extraction. The software reviewed is mainly for Principal Component Analysis (PCA), Independent Component Analysis (ICA), Partial Least Squares (PLS), and Local Linear Embedding (LLE). A summary and sources of the software are provided in the last section for each feature extraction method. PMID:25250315
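A minimal sketch of the kind of feature extraction the review covers, here PCA via scikit-learn; the matrix shape and names are illustrative, not from the paper:

```python
# Minimal sketch: PCA feature extraction on a gene expression matrix.
# Assumes a samples x genes layout; all values are synthetic.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
expression = rng.normal(size=(60, 5000))   # 60 samples, 5000 genes (synthetic)

pca = PCA(n_components=20)                 # reduced representation
reduced = pca.fit_transform(expression)    # shape: (60, 20)
print(reduced.shape, pca.explained_variance_ratio_[:3])
```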
Schwämmle, Veit; León, Ileana Rodríguez; Jensen, Ole Nørregaard
2013-09-06
Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry-driven proteomics experiments are frequently performed with few biological or technical replicates due to sample scarcity, duty-cycle or sensitivity constraints, or the limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes on the peptide level, for example, in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches by using simulated and experimental data sets with varying numbers of missing values. We applied three tools for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values: the standard t test, the moderated t test (also known as limma), and rank products. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using the limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets. This approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type. An R script that implements the improved rank products algorithm and the combined analysis is available.
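As a rough illustration of the missing-value-tolerant rank products idea, the hedged sketch below ranks each replicate over its observed entries only and takes the geometric mean of the available ranks per feature; the paper's actual algorithm may differ in detail:

```python
# Hedged sketch of a rank-product statistic tolerant of missing values.
import numpy as np
from scipy.stats import rankdata

def rank_product(data):
    """data: features x replicates array of fold-changes, NaN = missing."""
    n_feat, n_rep = data.shape
    log_ranks = np.full_like(data, np.nan)
    for j in range(n_rep):
        observed = ~np.isnan(data[:, j])
        log_ranks[observed, j] = np.log(rankdata(data[observed, j]))
    # geometric mean over the replicates actually observed per feature
    return np.exp(np.nanmean(log_ranks, axis=1))

scores = rank_product(np.array([[1.2, np.nan, 0.9],
                                [3.1, 2.8, 2.6],
                                [0.2, 0.4, np.nan]]))
print(scores)  # lower rank product = consistently extreme feature
```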
Feature engineering for MEDLINE citation categorization with MeSH.
Jimeno Yepes, Antonio Jose; Plaza, Laura; Carrillo-de-Albornoz, Jorge; Mork, James G; Aronson, Alan R
2015-04-08
Research in biomedical text categorization has mostly used the bag-of-words representation. Other more sophisticated representations of text based on syntactic, semantic and argumentative properties have been less studied. In this paper, we evaluate the impact of different text representations of biomedical texts as features for reproducing the MeSH annotations of some of the most frequent MeSH headings. In addition to unigrams and bigrams, these features include noun phrases, citation meta-data, citation structure, and semantic annotation of the citations. Traditional features like unigrams and bigrams exhibit strong performance compared to other feature sets. Little or no improvement is obtained when using meta-data or citation structure. Noun phrases are too sparse and thus have lower performance compared to more traditional features. Conceptual annotation of the texts by MetaMap shows similar performance compared to unigrams, but adding concepts from the UMLS taxonomy does not improve the performance of using only mapped concepts. The combination of all the features performs considerably better than any individual feature set considered. In addition, this combination improves the performance of a state-of-the-art MeSH indexer. Concerning the machine learning algorithms, we find that those that are more resilient to class imbalance obtain markedly better performance. We conclude that even though traditional features such as unigrams and bigrams have strong performance compared to other features, it is possible to combine them to effectively improve the performance of the bag-of-words representation. We have also found that the combination of the learning algorithm and feature sets has an influence on the overall performance of the system. Moreover, using learning algorithms resilient to class imbalance substantially improves performance. However, when using a large set of features, care needs to be taken in choosing the algorithm due to the risk of over-fitting. Specific combinations of learning algorithms and features for individual MeSH headings could further increase the performance of an indexing system.
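A small sketch of the baseline representation the study builds on, unigram+bigram bag-of-words with an imbalance-aware learner; data, labels, and model choice are placeholders:

```python
# Sketch: unigram+bigram bag-of-words features with a class-imbalance-aware
# learner, mirroring the finding that such combinations work well for MeSH
# categorization. Citations and labels are toy placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

citations = ["protein binding assay in mice", "randomized clinical trial of aspirin"]
has_heading = [0, 1]  # toy labels for one MeSH heading

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),          # unigrams + bigrams
    LogisticRegression(class_weight="balanced"),  # resilient to imbalance
)
model.fit(citations, has_heading)
print(model.predict(["clinical trial of statins"]))
```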
Classification of large-scale fundus image data sets: a cloud-computing framework.
Roychowdhury, Sohini
2016-08-01
Large medical image data sets with high dimensionality require a substantial amount of computation time for data creation and data processing. This paper presents a novel generalized method that finds optimal image-based feature sets that reduce computational time complexity while maximizing overall classification accuracy for detection of diabetic retinopathy (DR). First, region-based and pixel-based features are extracted from fundus images for classification of DR lesions and vessel-like structures. Next, feature ranking strategies are used to distinguish the optimal classification feature sets. DR lesion and vessel classification accuracies are computed using the boosted decision tree and decision forest classifiers in the Microsoft Azure Machine Learning Studio platform, respectively. For images from the DIARETDB1 data set, 40 of its highest-ranked features are used to classify four DR lesion types with an average classification accuracy of 90.1% in 792 seconds. Also, for classification of red lesion regions and hemorrhages from microaneurysms, accuracies of 85% and 72% are observed, respectively. For images from the STARE data set, 40 high-ranked features can classify minor blood vessels with an accuracy of 83.5% in 326 seconds. Such cloud-based fundus image analysis systems can significantly enhance the borderline classification performances in automated screening systems.
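The rank-then-select step can be sketched as below, with scikit-learn standing in for the Azure ML Studio components used in the paper; the scorer and classifier choices are assumptions:

```python
# Sketch: score features, keep the 40 highest-ranked, train a boosted tree.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 120))      # 200 lesion candidates, 120 features (toy)
y = rng.integers(0, 2, size=200)     # toy lesion / non-lesion labels

selector = SelectKBest(mutual_info_classif, k=40).fit(X, y)
clf = GradientBoostingClassifier().fit(selector.transform(X), y)
print(selector.get_support().sum(), clf.score(selector.transform(X), y))
```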
Nagarajan, Mahesh B.; Coan, Paola; Huber, Markus B.; Diemoz, Paul C.; Wismüller, Axel
2015-01-01
Phase contrast X-ray computed tomography (PCI-CT) has been demonstrated as a novel imaging technique that can visualize human cartilage with high spatial resolution and soft tissue contrast. Different textural approaches have been previously investigated for characterizing chondrocyte organization on PCI-CT to enable classification of healthy and osteoarthritic cartilage. However, the large size of feature sets extracted in such studies motivates an investigation into algorithmic feature reduction for computing efficient feature representations without compromising their discriminatory power. For this purpose, geometrical feature sets derived from the scaling index method (SIM) were extracted from 1392 volumes of interest (VOI) annotated on PCI-CT images of ex vivo human patellar cartilage specimens. The extracted feature sets were subject to linear and non-linear dimension reduction techniques as well as feature selection based on evaluation of mutual information criteria. The reduced feature set was subsequently used in a machine learning task with support vector regression to classify VOIs as healthy or osteoarthritic; classification performance was evaluated using the area under the receiver-operating characteristic (ROC) curve (AUC). Our results show that the classification performance achieved by 9-D SIM-derived geometric feature sets (AUC: 0.96 ± 0.02) can be maintained with 2-D representations computed from both dimension reduction and feature selection (AUC values as high as 0.97 ± 0.02). Thus, such feature reduction techniques can offer a high degree of compaction to large feature sets extracted from PCI-CT images while maintaining their ability to characterize the underlying chondrocyte patterns. PMID:25710875
A Feature-based Approach to Big Data Analysis of Medical Images
Toews, Matthew; Wachinger, Christian; Estepar, Raul San Jose; Wells, William M.
2015-01-01
This paper proposes an inference method well-suited to large sets of medical images. The method is based upon a framework where distinctive 3D scale-invariant features are indexed efficiently to identify approximate nearest-neighbor (NN) feature matches in O(log N) computational complexity in the number of images N. It thus scales well to large data sets, in contrast to methods based on pair-wise image registration or feature matching requiring O(N) complexity. Our theoretical contribution is a density estimator based on a generative model that generalizes kernel density estimation and K-nearest neighbor (KNN) methods. The estimator can be used for on-the-fly queries, without requiring explicit parametric models or an off-line training phase. The method is validated on a large multi-site data set of 95,000,000 features extracted from 19,000 lung CT scans. Subject-level classification identifies all images of the same subjects across the entire data set despite deformation due to breathing state, including unintentional duplicate scans. State-of-the-art performance is achieved in predicting chronic obstructive pulmonary disease (COPD) severity across the 5-category GOLD clinical rating, with an accuracy of 89% if both exact and one-off predictions are considered correct. PMID:26221685
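A k-d tree gives the flavor of logarithmic-time nearest-neighbor feature matching, though the paper's indexing scheme differs in detail; sizes here are toy values:

```python
# Sketch of O(log N)-style nearest-neighbor feature matching with a k-d tree.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)
database = rng.normal(size=(100_000, 64))   # descriptors from many images
tree = cKDTree(database)

query = rng.normal(size=(5, 64))            # descriptors from a new scan
dist, idx = tree.query(query, k=3)          # 3 nearest matches per feature
print(idx.shape)                            # (5, 3)
```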
Roles and Responsibilities in Feature Teams
NASA Astrophysics Data System (ADS)
Eckstein, Jutta
Agile development requires self-organizing teams. The set-up of a (feature) team has to enable self-organization. Special care has to be taken if the project is not only distributed, but also large, with more than one feature team involved. In such a setting, every feature team needs a product owner who ensures a continuous focus on business delivery. The product owners collaborate by working together in a virtual team. Each feature team is supported by a coach who safeguards the agile process not only within the individual feature team but also across all feature teams. An architect (or, if necessary, a team of architects) takes care that the system is technically sound. In contrast to small co-located projects, large global projects require a project manager who deals with, among other things, internal and especially external politics.
NASA Astrophysics Data System (ADS)
Rees, S. J.; Jones, Bryan F.
1992-11-01
Once feature extraction has occurred in a processed image, the recognition problem becomes one of defining a set of features which maps sufficiently well onto one of the defined shape/object models to permit a claimed recognition. This process is usually handled by aggregating features until a large enough weighting is obtained to claim membership, or an adequate number of located features are matched to the reference set. A requirement has existed for an operator or measure capable of a more direct assessment of membership/occupancy between feature sets, particularly where the feature sets may be defective representations. Such feature set errors may be caused by noise, by overlapping of objects, and by partial obscuration of features. These problems occur at the point of acquisition: repairing the data would then assume a priori knowledge of the solution. The technique described in this paper offers a set theoretical measure for partial occupancy defined in terms of the set of minimum additions to permit full occupancy and the set of locations of occupancy if such additions are made. As is shown, this technique permits recognition of partial feature sets with quantifiable degrees of uncertainty. A solution to the problems of obscuration and overlapping is therefore available.
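A toy illustration of the set-theoretic idea, with hypothetical feature names: the minimum additions are the model features missing from a defective observed set, and their count quantifies the degree of partial occupancy:

```python
# Toy illustration: minimum additions needed for full occupancy of a model
# feature set, given a defective (occluded/noisy) observed feature set.
model_features = {"edge_a", "edge_b", "corner_c", "hole_d"}
observed = {"edge_a", "corner_c"}                 # occluded / noisy detection

minimum_additions = model_features - observed     # what full occupancy needs
occupancy = 1 - len(minimum_additions) / len(model_features)
print(minimum_additions, occupancy)               # {'edge_b', 'hole_d'} 0.5
```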
OligoIS: Scalable Instance Selection for Class-Imbalanced Data Sets.
García-Pedrajas, Nicolás; Perez-Rodríguez, Javier; de Haro-García, Aida
2013-02-01
In current research, an enormous amount of information is constantly being produced, which poses a challenge for data mining algorithms. Many of the problems in extremely active research areas, such as bioinformatics, security and intrusion detection, or text mining, share the following two features: large data sets and class-imbalanced distribution of samples. Although many methods have been proposed for dealing with class-imbalanced data sets, most of these methods are not scalable to the very large data sets common to those research fields. In this paper, we propose a new approach to dealing with the class-imbalance problem that is scalable to data sets with many millions of instances and hundreds of features. This proposal is based on the divide-and-conquer principle combined with application of the selection process to balanced subsets of the whole data set. This divide-and-conquer principle allows the execution of the algorithm in linear time. Furthermore, the proposed method is easy to implement using a parallel environment and can work without loading the whole data set into memory. Using 40 class-imbalanced medium-sized data sets, we will demonstrate our method's ability to improve the results of state-of-the-art instance selection methods for class-imbalanced data sets. Using three very large data sets, we will show the scalability of our proposal to millions of instances and hundreds of features.
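The divide-and-conquer step can be sketched as below: the majority class is split across chunks so that each subset is roughly balanced, and a base instance-selection method (a placeholder here) is then run per subset:

```python
# Sketch of balanced-subset division for class-imbalanced instance selection.
# `instance_select` is a placeholder for any base selection method.
import numpy as np

def balanced_subsets(X, y, minority_label=1):
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    np.random.shuffle(majority)
    for chunk in np.array_split(majority, max(1, len(majority) // len(minority))):
        idx = np.concatenate([minority, chunk])   # roughly balanced subset
        yield X[idx], y[idx]

# for X_sub, y_sub in balanced_subsets(X, y): instance_select(X_sub, y_sub)
```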
2012-01-01
Computational approaches to generate hypotheses from biomedical literature have been studied intensively in recent years. Nevertheless, it still remains a challenge to automatically discover novel, cross-silo biomedical hypotheses from large-scale literature repositories. In order to address this challenge, we first model a biomedical literature repository as a comprehensive network of biomedical concepts and formulate hypotheses generation as a process of link discovery on the concept network. We extract the relevant information from the biomedical literature corpus and generate a concept network and concept-author map on a cluster using the MapReduce framework. We extract a set of heterogeneous features such as random walk based features, neighborhood features and common author features. The potential number of links to consider for the possibility of link discovery is large in our concept network, and to address this scalability problem the features are extracted on a cluster with the MapReduce framework. We further model link discovery as a classification problem carried out on a training data set automatically extracted from two network snapshots taken in two consecutive time durations. A set of heterogeneous features, which cover both topological and semantic features derived from the concept network, have been studied with respect to their impacts on the accuracy of the proposed supervised link discovery process. A case study of hypotheses generation based on the proposed method is presented in the paper. PMID:22759614
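Two of the neighborhood features named above can be sketched on a toy concept graph; the paper computes such features at scale with MapReduce, which is omitted here:

```python
# Sketch of two topological link-discovery features on a toy concept graph
# stored as adjacency sets; node names are illustrative.
graph = {"geneA": {"diseaseX", "drugY"},
         "diseaseX": {"geneA", "drugY"},
         "drugY": {"geneA", "diseaseX", "pathwayZ"},
         "pathwayZ": {"drugY"}}

def common_neighbors(u, v):
    return len(graph[u] & graph[v])

def jaccard(u, v):                       # a standard neighborhood feature
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0

print(common_neighbors("geneA", "pathwayZ"), jaccard("geneA", "pathwayZ"))
```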
Radial sets: interactive visual analysis of large overlapping sets.
Alsallakh, Bilal; Aigner, Wolfgang; Miksch, Silvia; Hauser, Helwig
2013-12-01
In many applications, data tables contain multi-valued attributes that often store the memberships of the table entities in multiple sets, such as which languages a person masters, which skills an applicant documents, or which features a product comes with. With a growing number of entities, the resulting element-set membership matrix becomes very rich in information about how these sets overlap. Many analysis tasks targeted at set-typed data are concerned with these overlaps as salient features of such data. This paper presents Radial Sets, a novel visual technique to analyze set memberships for a large number of elements. Our technique uses frequency-based representations to enable quickly finding and analyzing different kinds of overlaps between the sets, and relating these overlaps to other attributes of the table entities. Furthermore, it enables various interactions to select elements of interest, find out if they are over-represented in specific sets or overlaps, and if they exhibit a different distribution for a specific attribute compared to the rest of the elements. These interactions allow formulating highly expressive visual queries on the elements in terms of their set memberships and attribute values. As we demonstrate via two usage scenarios, Radial Sets enable revealing and analyzing a multitude of overlapping patterns between large sets, beyond the limits of state-of-the-art techniques.
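The overlap quantities that Radial Sets visualizes can be sketched from a boolean element-set membership matrix; the data below are illustrative:

```python
# Sketch: pairwise set-overlap counts from an element-set membership matrix.
import numpy as np

sets = ["python", "sql", "golang"]                # skills as sets (toy)
membership = np.array([[1, 1, 0],                 # applicant 1
                       [1, 0, 1],                 # applicant 2
                       [1, 1, 1],                 # applicant 3
                       [0, 1, 0]], bool)          # applicant 4

overlap = membership.T.astype(int) @ membership.astype(int)
print(dict(zip(sets, overlap[0])))                # how 'python' overlaps each set
```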
WND-CHARM: Multi-purpose image classification using compound image transforms
Orlov, Nikita; Shamir, Lior; Macura, Tomasz; Johnston, Josiah; Eckley, D. Mark; Goldberg, Ilya G.
2008-01-01
We describe a multi-purpose image classifier that can be applied to a wide variety of image classification tasks without modifications or fine-tuning, and yet provide classification accuracy comparable to state-of-the-art task-specific image classifiers. The proposed image classifier first extracts a large set of 1025 image features including polynomial decompositions, high contrast features, pixel statistics, and textures. These features are computed on the raw image, transforms of the image, and transforms of transforms of the image. The feature values are then used to classify test images into a set of pre-defined image classes. This classifier was tested on several different problems including biological image classification and face recognition. Although we cannot make a claim of universality, our experimental results show that this classifier performs as well or better than classifiers developed specifically for these image classification tasks. Our classifier’s high performance on a variety of classification problems is attributed to (i) a large set of features extracted from images; and (ii) an effective feature selection and weighting algorithm sensitive to specific image classification problems. The algorithms are available for free download from openmicroscopy.org. PMID:18958301
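The "transforms of transforms" idea can be sketched as below, with toy statistics standing in for the WND-CHARM feature bank:

```python
# Sketch: compute simple statistics on the raw image, on its Fourier
# transform, and on the transform of the transform. The statistics here are
# illustrative, not the actual WND-CHARM feature set.
import numpy as np

def stats(img):
    return [img.mean(), img.std(), np.percentile(img, 90)]

image = np.random.default_rng(3).random((64, 64))
fourier = np.abs(np.fft.fft2(image))

features = stats(image) + stats(fourier) + stats(np.abs(np.fft.fft2(fourier)))
print(len(features))   # 9 features: image, transform, transform-of-transform
```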
Feature Selection Methods for Zero-Shot Learning of Neural Activity.
Caceres, Carlos A; Roos, Matthew J; Rupp, Kyle M; Milsap, Griffin; Crone, Nathan E; Wolmetz, Michael E; Ratto, Christopher R
2017-01-01
Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception; A novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy.
NASA Technical Reports Server (NTRS)
Bradley, D. B.; Cain, J. B., III; Williard, M. W.
1978-01-01
The task was to evaluate the ability of a set of timing/synchronization subsystem features to provide a set of desirable characteristics for the evolving Defense Communications System digital communications network. The set of features related to the approaches by which timing/synchronization information could be disseminated throughout the network and the manner in which this information could be utilized to provide a synchronized network. These features, which could be utilized in a large number of different combinations, included mutual control, directed control, double ended reference links, independence of clock error measurement and correction, phase reference combining, and self organizing.
Design of 240,000 orthogonal 25mer DNA barcode probes.
Xu, Qikai; Schlabach, Michael R; Hannon, Gregory J; Elledge, Stephen J
2009-02-17
DNA barcodes linked to genetic features greatly facilitate screening these features in pooled formats using microarray hybridization, and new tools are needed to design large sets of barcodes to allow construction of large barcoded mammalian libraries such as shRNA libraries. Here we report a framework for designing large sets of orthogonal barcode probes. We demonstrate the utility of this framework by designing 240,000 barcode probes and testing their performance by hybridization. From the test hybridizations, we also discovered new probe design rules that significantly reduce cross-hybridization after their introduction into the framework of the algorithm. These rules should improve the performance of DNA microarray probe designs for many applications.
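One plausible orthogonality screen, shown as a hedged sketch: reject a candidate 25mer if it shares a long exact substring with any accepted probe, a common proxy for cross-hybridization risk; the paper's actual design rules are more involved:

```python
# Hedged sketch of a shared-k-mer screen for barcode probe orthogonality.
def kmers(seq, k=12):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def accept(candidate, accepted_probes, k=12):
    cand = kmers(candidate, k)
    return not any(cand & kmers(probe, k) for probe in accepted_probes)

probes = []
for cand in ["ACGTACGTACGTACGTACGTACGTA", "TTGCATTGCATTGCATTGCATTGCA"]:
    if accept(cand, probes):
        probes.append(cand)
print(len(probes))
```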
NASA Astrophysics Data System (ADS)
Tadini, A.; Bisson, M.; Neri, A.; Cioni, R.; Bevilacqua, A.; Aspinall, W. P.
2017-06-01
This study presents new and revised data sets about the spatial distribution of past volcanic vents, eruptive fissures, and regional/local structures of the Somma-Vesuvio volcanic system (Italy). The innovative features of the study are the identification and quantification of important sources of uncertainty affecting interpretations of the data sets. In this regard, the spatial uncertainty of each feature is modeled by an uncertainty area, i.e., a geometric element typically represented by a polygon drawn around points or lines. The new data sets have been assembled as an updatable geodatabase that integrates and complements existing databases for Somma-Vesuvio. The data are organized into 4 data sets and stored as 11 feature classes (points and lines for feature locations and polygons for the associated uncertainty areas), totaling more than 1700 elements. More specifically, volcanic vent and eruptive fissure elements are subdivided into feature classes according to their associated eruptive styles: (i) Plinian and sub-Plinian eruptions (i.e., large- or medium-scale explosive activity); (ii) violent Strombolian and continuous ash emission eruptions (i.e., small-scale explosive activity); and (iii) effusive eruptions (including eruptions from both parasitic vents and eruptive fissures). Regional and local structures (i.e., deep faults) are represented as linear feature classes. To support interpretation of the eruption data, additional data sets are provided for Somma-Vesuvio geological units and caldera morphological features. In the companion paper, the data presented here, and the associated uncertainties, are used to develop a first vent opening probability map for the Somma-Vesuvio caldera, with specific attention focused on large or medium explosive events.
Design of an efficient music-speech discriminator.
Tardón, Lorenzo J; Sammartino, Simone; Barbancho, Isabel
2010-01-01
In this paper, the problem of the design of a simple and efficient music-speech discriminator for large audio data sets, in which advanced music playing techniques are taught and voice and music are intrinsically interleaved, is addressed. In the process, a number of features used in speech-music discrimination are defined and evaluated over the available data set. Specifically, the data set contains pieces of classical music played with different and unspecified instruments (or even lyrics) and the voice of a teacher (a top music performer) or even the overlapped voice of the translator and other persons. After an initial test of the performance of the features implemented, a selection process is started, which takes into account the type of classifier selected beforehand, to achieve good discrimination performance and computational efficiency, as shown in the experiments. The discrimination application has been defined and tested on a large data set supplied by Fundacion Albeniz, containing a large variety of classical music pieces played with different instruments, including comments and speeches by famous performers.
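One classic feature evaluated in such discriminators is the zero-crossing rate, typically higher for noisy speech segments than for sustained musical notes; the sketch below uses synthetic signals and an arbitrary setup:

```python
# Sketch: zero-crossing rate, a typical music-speech discrimination feature.
import numpy as np

def zero_crossing_rate(frame):
    return np.mean(np.abs(np.diff(np.signbit(frame).astype(int))))

t = np.linspace(0, 1, 16000, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)                     # music-like signal
noise = np.random.default_rng(4).normal(size=16000)    # fricative-like signal
print(zero_crossing_rate(tone), zero_crossing_rate(noise))
```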
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heredia-Langner, Alejandro; Cort, John; Bailey, Vanessa
2016-07-21
The Fishing for Features Signature Discovery project developed a framework for discovering signature features in challenging environments involving large and complex data sets or where phenomena may be poorly characterized or understood. Researchers at PNNL have applied the framework to the optimization of biofuels blending and to discover signatures of climate change on microbial soil communities.
Mof-Tree: A Spatial Access Method To Manipulate Multiple Overlapping Features.
ERIC Educational Resources Information Center
Manolopoulos, Yannis; Nardelli, Enrico; Papadopoulos, Apostolos; Proietti, Guido
1997-01-01
Investigates the manipulation of large sets of two-dimensional data representing multiple overlapping features, and presents a new access method, the MOF-tree. Analyzes storage requirements and time with respect to window query operations involving multiple features. Examines both the pointer-based and pointerless MOF-tree representations.…
Permafrost features on Earth and Mars: Similarities, differences
NASA Technical Reports Server (NTRS)
Joens, H. P.
1985-01-01
Typical permafrost features on Earth are polygonal structures, pingos and soli-/gelifluxion features. In areas around the poles and in mountain ranges the precipitation accumulates as inland ice or ice streams. On Mars the same features were identified: polygonal features cover the larger part of the northern lowlands, probably indicating an ice-wedge/sand-wedge system or desiccation cracks. These features indicate the extent of large mud accumulations which seem to be related to large outflow events of the chaotic terrains. The shoreline of this mud accumulation is indicated by a special set of relief types. In some areas large pingo-like hills were identified. In the vicinity of the largest martian volcano, Olympus Mons, the melting of underlying permafrost and/or ground ice led to the downslope sliding of large parts of the primary shield, which formed the aureole around Olympus Mons. Glacier-like features are identified along the escarpment which separates the Southern Uplands from the Northern Lowlands.
Hwang, Wonjun; Wang, Haitao; Kim, Hyunwoo; Kee, Seok-Cheol; Kim, Junmo
2011-04-01
The authors present a robust face recognition system for large-scale data sets taken under uncontrolled illumination variations. The proposed face recognition system consists of a novel illumination-insensitive preprocessing method, a hybrid Fourier-based facial feature extraction, and a score fusion scheme. First, in the preprocessing stage, a face image is transformed into an illumination-insensitive image, called an "integral normalized gradient image," by normalizing and integrating the smoothed gradients of a facial image. Then, for feature extraction of complementary classifiers, multiple face models based upon hybrid Fourier features are applied. The hybrid Fourier features are extracted from different Fourier domains in different frequency bandwidths, and then each feature is individually classified by linear discriminant analysis. In addition, multiple face models are generated by plural normalized face images that have different eye distances. Finally, to combine scores from multiple complementary classifiers, a log likelihood ratio-based score fusion scheme is applied. The proposed system is evaluated using the Face Recognition Grand Challenge (FRGC) experimental protocols; FRGC is a large publicly available data set. Experimental results on the FRGC version 2.0 data sets have shown that the proposed method achieves an average verification rate of 81.49% on 2-D face images under various environmental variations such as illumination changes, expression changes, and elapsed time.
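A very rough sketch of the preprocessing idea follows: smooth the image gradients and normalize them by local magnitude; the actual method then integrates this normalized gradient field back into an image (e.g., via a Poisson-type solver), which is omitted here:

```python
# Very rough sketch of gradient normalization for illumination insensitivity.
# The integration step of the actual method is omitted for brevity.
import numpy as np
from scipy.ndimage import gaussian_filter

def normalized_gradients(face, sigma=2.0, eps=1e-6):
    gy, gx = np.gradient(face.astype(float))
    gy, gx = gaussian_filter(gy, sigma), gaussian_filter(gx, sigma)
    mag = gaussian_filter(np.hypot(gx, gy), sigma) + eps
    return gx / mag, gy / mag          # illumination-insensitive gradient field

gx, gy = normalized_gradients(np.random.default_rng(5).random((128, 128)))
print(gx.shape, gy.shape)
```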
Feature Selection Methods for Zero-Shot Learning of Neural Activity
Caceres, Carlos A.; Roos, Matthew J.; Rupp, Kyle M.; Milsap, Griffin; Crone, Nathan E.; Wolmetz, Michael E.; Ratto, Christopher R.
2017-01-01
Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception: a novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy. PMID:28690513
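The correlation-based stability baseline can be sketched as scoring each feature by how consistently it responds across repeated presentations of the same stimuli; the data and subset size below are synthetic stand-ins:

```python
# Sketch of correlation-based feature stability across stimulus repetitions.
import numpy as np

rng = np.random.default_rng(6)
responses = rng.normal(size=(4, 30, 500))   # repetitions x stimuli x features

def stability(responses):
    reps, _, n_feat = responses.shape
    scores = np.zeros(n_feat)
    for f in range(n_feat):
        profiles = responses[:, :, f]              # reps x stimuli
        corr = np.corrcoef(profiles)               # rep-to-rep correlations
        scores[f] = corr[np.triu_indices(reps, 1)].mean()
    return scores

stable = np.argsort(stability(responses))[::-1][:50]   # top-50 stable features
print(stable[:5])
```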
Correlated Topic Vector for Scene Classification.
Wei, Pengxu; Qin, Fei; Wan, Fang; Zhu, Yi; Jiao, Jianbin; Ye, Qixiang
2017-07-01
Scene images usually involve semantic correlations, particularly when considering large-scale image data sets. This paper proposes a novel generative image representation, the correlated topic vector, to model such semantic correlations. Derived from the correlated topic model, the correlated topic vector naturally utilizes the correlations among topics, which are seldom considered in conventional feature encoding, e.g., the Fisher vector, but do exist in scene images. It is expected that the involvement of correlations can increase the discriminative capability of the learned generative model and consequently improve the recognition accuracy. Incorporated with the Fisher kernel method, the correlated topic vector inherits the advantages of the Fisher vector. The contributions to the topics of visual words have been further employed by incorporating the Fisher kernel framework to indicate the differences among scenes. Combined with deep convolutional neural network (CNN) features and a Gibbs sampling solution, the correlated topic vector shows great potential when processing large-scale and complex scene image data sets. Experiments on two scene image data sets demonstrate that the correlated topic vector significantly improves on the deep CNN features and outperforms existing Fisher kernel-based features.
Action recognition using mined hierarchical compound features.
Gilbert, Andrew; Illingworth, John; Bowden, Richard
2011-05-01
The field of Action Recognition has seen a large increase in activity in recent years. Much of the progress has been through incorporating ideas from single-frame object recognition and adapting them for temporal-based action recognition. Inspired by the success of interest points in the 2D spatial domain, their 3D (space-time) counterparts typically form the basic components used to describe actions, and in action recognition the features used are often engineered to fire sparsely. This is to ensure that the problem is tractable; however, this can sacrifice recognition accuracy as it cannot be assumed that the optimum features in terms of class discrimination are obtained from this approach. In contrast, we propose to initially use an overcomplete set of simple 2D corners in both space and time. These are grouped spatially and temporally using a hierarchical process, with an increasing search area. At each stage of the hierarchy, the most distinctive and descriptive features are learned efficiently through data mining. This allows large amounts of data to be searched for frequently reoccurring patterns of features. At each level of the hierarchy, the mined compound features become more complex, discriminative, and sparse. This results in fast, accurate recognition with real-time performance on high-resolution video. As the compound features are constructed and selected based upon their ability to discriminate, their speed and accuracy increase at each level of the hierarchy. The approach is tested on four state-of-the-art data sets, the popular KTH data set to provide a comparison with other state-of-the-art approaches, the Multi-KTH data set to illustrate performance at simultaneous multiaction classification, despite no explicit localization information provided during training. Finally, the recent Hollywood and Hollywood2 data sets provide challenging complex actions taken from commercial movie sequences. For all four data sets, the proposed hierarchical approach outperforms all other methods reported thus far in the literature and can achieve real-time operation.
Friberg, Anders; Schoonderwaldt, Erwin; Hedblad, Anton; Fabiani, Marco; Elowsson, Anders
2014-10-01
The notion of perceptual features is introduced for describing general music properties based on human perception. This is an attempt at rethinking the concept of features, aiming to approach the underlying human perception mechanisms. Instead of using concepts from music theory such as tones, pitches, and chords, a set of nine features describing overall properties of the music was selected. They were chosen from qualitative measures used in psychology studies and motivated from an ecological approach. The perceptual features were rated in two listening experiments using two different data sets. They were modeled both from symbolic and audio data using different sets of computational features. Ratings of emotional expression were predicted using the perceptual features. The results indicate that (1) at least some of the perceptual features are reliable estimates; (2) emotion ratings could be predicted by a small combination of perceptual features with an explained variance from 75% to 93% for the emotional dimensions activity and valence; (3) the perceptual features could only to a limited extent be modeled using existing audio features. Results clearly indicated that a small number of dedicated features were superior to a "brute force" model using a large number of general audio features.
NASA Astrophysics Data System (ADS)
Saha, Ashirbani; Harowicz, Michael R.; Grimm, Lars J.; Kim, Connie E.; Ghate, Sujata V.; Walsh, Ruth; Mazurowski, Maciej A.
2018-02-01
One of the methods widely used to measure the proliferative activity of cells in breast cancer patients is the immunohistochemical (IHC) measurement of the percentage of cells stained for nuclear antigen Ki-67. Use of Ki-67 expression as a prognostic marker is still under investigation. However, numerous clinical studies have reported an association between high Ki-67 and overall survival (OS) and disease free survival (DFS). On the other hand, to offer a non-invasive alternative for determining Ki-67 expression, researchers have made recent attempts to study the association of Ki-67 expression with magnetic resonance (MR) imaging features of breast cancer in small cohorts (<30). Here, we present a large-scale evaluation of the relationship between imaging features and Ki-67 score as follows: (a) we used a set of 450 invasive breast cancer patients, (b) we extracted a set of 529 imaging features of shape and enhancement from the breast, tumor, and fibroglandular tissue of the patients, (c) we used a subset of patients as the training set to select features and trained a multivariate logistic regression model to predict high versus low Ki-67 values, and (d) we validated the performance of the trained model in an independent test set using the area under the receiver operating characteristic (ROC) curve (AUC) of the predicted values. Our model was able to predict high versus low Ki-67 in the test set with an AUC of 0.67 (95% CI: 0.58-0.75, p<1.1e-04). Thus, a moderate strength of association between Ki-67 values and MR-extracted imaging features was demonstrated in our experiments.
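The train/validate protocol described in (c) and (d) can be sketched with scikit-learn on synthetic stand-ins for the imaging features and Ki-67 labels:

```python
# Sketch: fit a logistic model on a training split of imaging features and
# report AUC on a held-out test set. All data are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(450, 529))            # 450 patients, 529 imaging features
y = rng.integers(0, 2, size=450)           # toy high/low Ki-67 labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```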
Impact of experimental design on PET radiomics in predicting somatic mutation status.
Yip, Stephen S F; Parmar, Chintan; Kim, John; Huynh, Elizabeth; Mak, Raymond H; Aerts, Hugo J W L
2017-12-01
PET-based radiomic features have demonstrated great promise in predicting genetic data. However, various experimental parameters can influence the feature extraction pipeline and, hence, the resulting feature values. Here, we investigated how experimental settings affect the performance of radiomic features in predicting somatic mutation status in non-small cell lung cancer (NSCLC) patients. 348 NSCLC patients with somatic mutation testing and diagnostic PET images were included in our analysis. Radiomic feature extractions were analyzed for varying voxel sizes, filters, and bin widths. 66 radiomic features were evaluated. The performance of features in predicting mutation status was assessed using the area under the receiver-operating-characteristic curve (AUC). The influence of experimental parameters on feature predictability was quantified as the relative difference between the minimum and maximum AUC (δ). The large majority of features (n=56, 85%) were significantly predictive for EGFR mutation status (AUC≥0.61). 29 radiomic features significantly predicted EGFR mutations and were robust to experimental settings, with δOverall<5%. The overall influence (δOverall) of the voxel size, filter, and bin width across all features ranged from 5% to 15%. For all features, none of the experimental designs could distinguish KRAS+ from KRAS- (AUC≤0.56). The predictability of 29 radiomic features was robust to the choice of experimental settings; however, these settings need to be carefully chosen for all other features. The combined effect of the investigated processing methods can be substantial and must be considered. Optimized settings that maximize the predictive performance of individual radiomic features should be investigated in the future. Copyright © 2017 Elsevier B.V. All rights reserved.
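The robustness measure δ, described as the relative difference between the minimum and maximum AUC, can be sketched as below; the exact normalization used in the paper is an assumption:

```python
# Sketch: relative AUC spread of one feature across experimental settings.
import numpy as np

def delta(aucs):
    aucs = np.asarray(aucs, float)
    return (aucs.max() - aucs.min()) / aucs.max() * 100  # percent spread

print(delta([0.66, 0.64, 0.65, 0.63]))  # AUCs across voxel/filter/bin settings
```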
Complex extreme learning machine applications in terahertz pulsed signals feature sets.
Yin, X-X; Hadjiloucas, S; Zhang, Y
2014-11-01
This paper presents a novel approach to the automatic classification of very large data sets composed of terahertz pulse transient signals, highlighting their potential use in biochemical, biomedical, pharmaceutical and security applications. Two different types of THz spectra are considered in the classification process. Firstly a binary classification study of poly-A and poly-C ribonucleic acid samples is performed. This is then contrasted with a difficult multi-class classification problem of spectra from six different powder samples that, although they have fairly indistinguishable features in the optical spectrum, possess a few discernible spectral features in the terahertz part of the spectrum. Classification is performed using a complex-valued extreme learning machine algorithm that takes into account features in both the amplitude and the phase of the recorded spectra. Classification speed and accuracy are contrasted with those achieved using a support vector machine classifier. The study systematically compares the classifier performance achieved after adopting different Gaussian kernels when separating amplitude and phase signatures. The two signatures are presented as feature vectors for both training and testing purposes. The study confirms the utility of complex-valued extreme learning machine algorithms for classification of the very large data sets generated with current terahertz imaging spectrometers. The classifier can take into consideration heterogeneous layers within an object as would be required within a tomographic setting and is sufficiently robust to detect patterns hidden inside noisy terahertz data sets. The proposed study opens up the opportunity for the establishment of complex-valued extreme learning machine algorithms as new chemometric tools that will assist the wider proliferation of terahertz sensing technology for chemical sensing, quality control, security screening and clinical diagnosis. Furthermore, the proposed algorithm should also be very useful in other applications requiring the classification of very large data sets. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
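A basic real-valued extreme learning machine conveys the core of the classifier: random hidden weights and closed-form output weights; the paper's complex-valued variant over amplitude and phase features is not reproduced here:

```python
# Sketch of a basic (real-valued) extreme learning machine.
import numpy as np

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 40))            # spectral features (synthetic)
y = np.eye(2)[rng.integers(0, 2, 300)]    # one-hot class targets

W = rng.normal(size=(40, 200))            # random input-to-hidden weights
b = rng.normal(size=200)
H = np.tanh(X @ W + b)                    # hidden activations
beta = np.linalg.pinv(H) @ y              # least-squares output weights

pred = (H @ beta).argmax(axis=1)
print((pred == y.argmax(axis=1)).mean())  # training accuracy
```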
Machine-assisted discovery of relationships in astronomy
NASA Astrophysics Data System (ADS)
Graham, Matthew J.; Djorgovski, S. G.; Mahabal, Ashish A.; Donalek, Ciro; Drake, Andrew J.
2013-05-01
High-volume feature-rich data sets are becoming the bread-and-butter of 21st century astronomy but present significant challenges to scientific discovery. In particular, identifying scientifically significant relationships between sets of parameters is non-trivial. Similar problems in biological and geosciences have led to the development of systems which can explore large parameter spaces and identify potentially interesting sets of associations. In this paper, we describe the application of automated discovery systems of relationships to astronomical data sets, focusing on an evolutionary programming technique and an information-theory technique. We demonstrate their use with classical astronomical relationships - the Hertzsprung-Russell diagram and the Fundamental Plane of elliptical galaxies. We also show how they work with the issue of binary classification which is relevant to the next generation of large synoptic sky surveys, such as the Large Synoptic Survey Telescope (LSST). We find that comparable results to more familiar techniques, such as decision trees, are achievable. Finally, we consider the reality of the relationships discovered and how this can be used for feature selection and extraction.
SETL for Internet Data Processing
2000-01-01
early phase in the evolution of most large software systems, especially those featuring novel designs [135, 173, 66, 70, 71]. Second, SETL's strong...into SETL/E [66, 61], a revision of SETL that was extended with a process creation operator and renamed ProSet [71] to signify its role in prototyping...set-oriented languages having intrinsic persistence features [60, 64, 67, 70, 71, 62] that sought to spare the programmer the trouble of coding data
Dimensionality Reduction Through Classifier Ensembles
NASA Technical Reports Server (NTRS)
Oza, Nikunj C.; Tumer, Kagan; Norwig, Peter (Technical Monitor)
1999-01-01
In data mining, one often needs to analyze datasets with a very large number of attributes. Performing machine learning directly on such data sets is often impractical because of extensive run times, excessive complexity of the fitted model (often leading to overfitting), and the well-known "curse of dimensionality." In practice, to avoid such problems, feature selection and/or extraction are often used to reduce data dimensionality prior to the learning step. However, existing feature selection/extraction algorithms either evaluate features by their effectiveness across the entire data set or simply disregard class information altogether (e.g., principal component analysis). Furthermore, feature extraction algorithms such as principal components analysis create new features that are often meaningless to human users. In this article, we present input decimation, a method that provides "feature subsets" that are selected for their ability to discriminate among the classes. These features are subsequently used in ensembles of classifiers, yielding results superior to single classifiers, ensembles that use the full set of features, and ensembles based on principal component analysis on both real and synthetic datasets.
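Input decimation can be sketched as below: each ensemble member keeps the features most correlated with one class and trains on that subset; the classifier and subset size are arbitrary choices:

```python
# Sketch of input decimation: class-discriminative feature subsets feeding
# an ensemble of classifiers. Data are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(9)
X, y = rng.normal(size=(300, 100)), rng.integers(0, 3, 300)

ensemble = []
for cls in np.unique(y):
    corr = np.abs(np.corrcoef(X.T, (y == cls).astype(float))[-1, :-1])
    feats = np.argsort(corr)[::-1][:20]          # class-discriminative subset
    ensemble.append((feats, DecisionTreeClassifier().fit(X[:, feats], y)))

votes = np.array([clf.predict(X[:, f]) for f, clf in ensemble])
print(votes.shape)   # member predictions to be combined by voting/averaging
```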
Efficient feature selection using a hybrid algorithm for the task of epileptic seizure detection
NASA Astrophysics Data System (ADS)
Lai, Kee Huong; Zainuddin, Zarita; Ong, Pauline
2014-07-01
Feature selection is a very important aspect of the field of machine learning. It entails the search for an optimal subset of features from a very large data set with a high-dimensional feature space. Apart from eliminating redundant features and reducing computational cost, a good selection of features also leads to higher prediction and classification accuracy. In this paper, an efficient feature selection technique is introduced for the task of epileptic seizure detection. The raw data are electroencephalography (EEG) signals. Using the discrete wavelet transform, the biomedical signals were decomposed into several sets of wavelet coefficients. To reduce the dimension of these wavelet coefficients, a feature selection method that combines the strengths of both filter and wrapper methods is proposed. Principal component analysis (PCA) is used as part of the filter method. As for the wrapper method, the evolutionary harmony search (HS) algorithm is employed. This metaheuristic method aims at finding the best discriminating set of features from the original data. The obtained features were then used as input for an automated classifier, namely wavelet neural networks (WNNs). The WNNs model was trained to perform a binary classification task, that is, to determine whether a given EEG signal was normal or epileptic. For comparison purposes, different sets of features were also used as input. Simulation results showed that the WNNs that used the features chosen by the hybrid algorithm achieved the highest overall classification accuracy.
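The signal-to-features pipeline can be sketched as below, with PCA standing in for the filter stage; the harmony-search wrapper and the WNN classifier are omitted, and the PyWavelets package is assumed:

```python
# Sketch: wavelet-decompose EEG segments, then reduce the coefficient set.
import numpy as np
import pywt
from sklearn.decomposition import PCA

rng = np.random.default_rng(10)
eeg = rng.normal(size=(100, 4096))                    # 100 EEG segments (toy)

coeffs = [np.concatenate(pywt.wavedec(sig, "db4", level=5)) for sig in eeg]
features = PCA(n_components=10).fit_transform(np.vstack(coeffs))
print(features.shape)                                 # (100, 10)
```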
NASA Astrophysics Data System (ADS)
Nemoto, Mitsutaka; Hayashi, Naoto; Hanaoka, Shouhei; Nomura, Yukihiro; Miki, Soichiro; Yoshikawa, Takeharu; Ohtomo, Kuni
2016-03-01
The purpose of this study is to evaluate the feasibility of a novel feature generation approach, based on multiple deep neural networks (DNNs) with boosting, for computer-assisted detection (CADe). It is hard and time-consuming to optimize the hyperparameters for DNNs such as the stacked denoising autoencoder (SdA). The proposed method allows using SdA-based features without the burden of hyperparameter setting. The proposed method was evaluated by an application for detecting cerebral aneurysms on magnetic resonance angiograms (MRA). A baseline CADe process included four components: scaling, candidate area limitation, candidate detection, and candidate classification. The proposed feature generation method was applied to extract the optimal features for candidate classification, and only required setting the range of the hyperparameters for SdA. The optimal feature set was selected from a large quantity of SdA-based features produced by multiple SdAs, each of which was trained using a different hyperparameter set. The feature selection was performed through the AdaBoost ensemble learning method. Training of the baseline CADe process and the proposed feature generation were performed with 200 MRA cases, and the evaluation was performed with 100 MRA cases. The proposed method successfully provided SdA-based features by just setting the range of some hyperparameters for SdA. The CADe process using both previous voxel features and SdA-based features had the best performance, with an area under the ROC curve of 0.838 and an ANODE score of 0.312. The results showed that the proposed method was effective in the application for detecting cerebral aneurysms on MRA.
Large Margin Multi-Modal Multi-Task Feature Extraction for Image Classification.
Luo, Yong; Wen, Yonggang; Tao, Dacheng; Gui, Jie; Xu, Chao
2016-01-01
The features used in many image analysis-based applications are frequently of very high dimension. Feature extraction offers several advantages in high-dimensional cases, and many recent studies have used multi-task feature extraction approaches, which often outperform single-task feature extraction approaches. However, most of these methods are limited in that they only consider data represented by a single type of feature, even though features usually represent images from multiple modalities. We, therefore, propose a novel large margin multi-modal multi-task feature extraction (LM3FE) framework for handling multi-modal features for image classification. In particular, LM3FE simultaneously learns the feature extraction matrix for each modality and the modality combination coefficients. In this way, LM3FE not only handles correlated and noisy features, but also utilizes the complementarity of different modalities to further help reduce feature redundancy in each modality. The large margin principle employed also helps to extract strongly predictive features, so that they are more suitable for prediction (e.g., classification). An alternating algorithm is developed for problem optimization, and each subproblem can be efficiently solved. Experiments on two challenging real-world image data sets demonstrate the effectiveness and superiority of the proposed method.
Using open-source programs to create a web-based portal for hydrologic information
NASA Astrophysics Data System (ADS)
Kim, H.
2013-12-01
Some hydrologic data sets, such as basin climatology, precipitation, and terrestrial water storage, are not easily obtainable and distributable due to their size and complexity. We present a Hydrologic Information Portal (HIP) that has been implemented at the University of California Center for Hydrologic Modeling (UCCHM) and that has been organized around the large river basins of North America. This portal can be accessed through a modern web browser, which enables easy access and visualization of such hydrologic data sets. Some of the main features of our HIP include a set of data visualization features so that users can search, retrieve, analyze, integrate, organize, and map data within large river basins. Recent information technologies such as Google Maps, Tornado (a Python asynchronous web server), NumPy/SciPy (scientific libraries for Python) and d3.js (a JavaScript visualization library) were incorporated into the HIP to ease navigation of large data sets. With such open source libraries, the HIP gives public users a way to combine and explore various data sets by generating multiple chart types (line, bar, pie, scatter plot) directly from the Google Maps viewport. Every rendered object, such as a basin shape in the viewport, is clickable, and this is the first step toward accessing the visualization of the data sets.
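A minimal sketch of the server side under the stack named above: a Tornado handler returning a basin time series as JSON for the browser-side charts; the route and data are hypothetical:

```python
# Minimal Tornado handler sketch; route, basin IDs, and data are stubs.
import json
import tornado.ioloop
import tornado.web

class BasinDataHandler(tornado.web.RequestHandler):
    def get(self, basin_id):
        series = {"basin": basin_id, "precip_mm": [31.2, 44.0, 12.7]}  # stub
        self.set_header("Content-Type", "application/json")
        self.write(json.dumps(series))

app = tornado.web.Application([(r"/basins/([A-Za-z0-9_]+)", BasinDataHandler)])

if __name__ == "__main__":
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()
```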
Kupas, Katrin; Ultsch, Alfred; Klebe, Gerhard
2008-05-15
A new method to discover similar substructures in protein binding pockets, independently of sequence and folding patterns or secondary structure elements, is introduced. The solvent-accessible surface of a binding pocket, automatically detected as a depression on the protein surface, is divided into a set of surface patches. Each surface patch is characterized by its shape as well as by its physicochemical characteristics. Wavelets defined on surfaces are used for the description of the shape, as they have the great advantage of allowing a comparison at different resolutions. The number of coefficients used to describe the wavelets can be chosen with respect to the size of the considered data set. The physicochemical characteristics of the patches are described by the assignment of the exposed amino acid residues to one or more of five different properties determinant for molecular recognition. A self-organizing neural network is used to project the high-dimensional feature vectors onto a two-dimensional layer of neurons, called a map. To find similarities between the binding pockets, in both geometrical and physicochemical features, a clustering of the projected feature vectors is performed using an automatic distance- and density-based clustering algorithm. The method was validated with a small training data set of 109 binding cavities originating from a set of enzymes covering 12 different EC numbers. A second test data set of 1378 binding cavities, extracted from enzymes of 13 different EC numbers, was then used to prove the discriminating power of the algorithm and to demonstrate its applicability to large-scale analyses. In all cases, members of the data set with the same EC number were placed into coherent regions on the map, with small distances between them, while different EC numbers were separated by large distances between the feature vectors. A third data set comprising three subfamilies of endopeptidases was used to demonstrate the ability of the algorithm to detect similar substructures between functionally related active sites. The algorithm can also be used to predict the function of novel proteins not considered in the training data set.
Geospatial Analytics in Retail Site Selection and Sales Prediction.
Ting, Choo-Yee; Ho, Chiung Ching; Yee, Hui Jia; Matsah, Wan Razali
2018-03-01
Studies have shown that certain features from geography, demography, trade area, and environment can play a vital role in retail site selection, largely due to the impact they exert on retail performance. Although the relevant features can be elicited by domain experts, determining the optimal feature set can be an intractable and labor-intensive exercise. The challenges center around (1) determining the features that are important to a particular retail business and (2) estimating retail sales performance for a new location; they become more apparent when the features vary across time. In this light, this study proposed a nonintervening approach that employs feature selection algorithms and subsequently predicts sales through similarity-based methods. The results of prediction were validated by domain experts. In this study, data sets from different sources were transformed and aggregated before an analytics-ready data set could be obtained. The data sets included location features, population counts, property types, education status, and monthly sales from 96 branches of a telecommunication company in Malaysia. The findings suggest that (1) optimal retail performance can only be achieved through fulfillment of specific location features together with the surrounding trade area characteristics and (2) similarity-based methods can provide a solution to retail sales prediction.
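A minimal sketch of the two-stage idea, assuming synthetic stand-in data and hypothetical feature columns: filter-select the most relevant location features, then predict sales for a new site from its most similar existing branches.

```python
# Stage 1: filter feature selection; stage 2: similarity-based (kNN) prediction.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(96, 12))        # 96 branches, 12 candidate site features
sales = X[:, 0] * 3.0 + X[:, 4] + rng.normal(scale=0.1, size=96)  # synthetic target

selector = SelectKBest(f_regression, k=4).fit(X, sales)   # keep 4 strongest features
Xs = selector.transform(X)

model = KNeighborsRegressor(n_neighbors=5, weights="distance").fit(Xs, sales)
new_site = selector.transform(rng.normal(size=(1, 12)))   # a hypothetical new location
print("predicted monthly sales:", model.predict(new_site)[0])
```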
MiniWall Tool for Analyzing CFD and Wind Tunnel Large Data Sets
NASA Technical Reports Server (NTRS)
Schuh, Michael J.; Melton, John E.; Stremel, Paul M.
2017-01-01
It is challenging to review and assimilate large data sets created by Computational Fluid Dynamics (CFD) simulations and wind tunnel tests. Over the past 10 years, NASA Ames Research Center has developed and refined a software tool dubbed the MiniWall to increase productivity in reviewing and understanding large CFD-generated data sets. Under the recent NASA ERA project, the application of the tool expanded to enable rapid comparison of experimental and computational data. The MiniWall software is browser based so that it runs on any computer or device that can display a web page. It can also be used remotely and securely by using web server software such as the Apache HTTP server. The MiniWall software has recently been rewritten and enhanced to make it even easier for analysts to review large data sets and extract knowledge and understanding from these data sets. This paper describes the MiniWall software and demonstrates how the different features are used to review and assimilate large data sets.
Classification Influence of Features on Given Emotions and Its Application in Feature Selection
NASA Astrophysics Data System (ADS)
Xing, Yin; Chen, Chuang; Liu, Li-Long
2018-04-01
In order to address the large amount of redundant data in high-dimensional speech emotion features, we deeply analyze the extracted speech emotion features and select the better ones. Firstly, a given emotion is classified by each feature individually. Secondly, the features are ranked by recognition rate in descending order. Then, the optimal threshold is determined by a recognition-rate criterion. Finally, the better features are obtained. When applied to the Berlin and a Chinese emotional data set, the experimental results show that this feature selection method outperforms the other traditional methods.
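The ranking step can be sketched as follows, with random stand-in features and labels and an SVM as the single-feature classifier; the averaged recognition rate is a plausible stand-in for the paper's rate criterion.

```python
# Rank each feature by the cross-validated recognition rate of a classifier
# trained on that feature alone, then keep features above a threshold.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))       # 20 candidate emotion features (placeholder)
y = rng.integers(0, 3, 300)          # 3 emotion classes (placeholder)

rates = [cross_val_score(SVC(), X[:, [j]], y, cv=5).mean() for j in range(X.shape[1])]
order = np.argsort(rates)[::-1]      # descending recognition rate
threshold = np.mean(rates)           # stand-in for the optimal-threshold criterion
selected = [j for j in order if rates[j] >= threshold]
print("kept features:", selected)
```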
Model-Based Learning of Local Image Features for Unsupervised Texture Segmentation
NASA Astrophysics Data System (ADS)
Kiechle, Martin; Storath, Martin; Weinmann, Andreas; Kleinsteuber, Martin
2018-04-01
Features that capture well the textural patterns of a certain class of images are crucial for the performance of texture segmentation methods. The manual selection of features or designing new ones can be a tedious task. Therefore, it is desirable to automatically adapt the features to a certain image or class of images. Typically, this requires a large set of training images with similar textures and ground truth segmentation. In this work, we propose a framework to learn features for texture segmentation when no such training data is available. The cost function for our learning process is constructed to match a commonly used segmentation model, the piecewise constant Mumford-Shah model. This means that the features are learned such that they provide an approximately piecewise constant feature image with a small jump set. Based on this idea, we develop a two-stage algorithm which first learns suitable convolutional features and then performs a segmentation. We note that the features can be learned from a small set of images, from a single image, or even from image patches. The proposed method achieves a competitive rank in the Prague texture segmentation benchmark, and it is effective for segmenting histological images.
[Application of Kohonen Self-Organizing Feature Maps in QSAR of human ADMET and kinase data sets].
Hegymegi-Barakonyi, Bálint; Orfi, László; Kéri, György; Kövesdi, István
2013-01-01
QSAR predictions have proven very useful in a large number of studies for drug design, such as the design of kinase inhibitors as targets for cancer therapy; however, the overall predictability often remains unsatisfactory. To improve the predictability of ADMET features and kinase inhibitory data, we present a new method using Kohonen's Self-Organizing Feature Map (SOFM) to cluster molecules based on explanatory variables (X) and separate dissimilar ones. We calculated SOFM clusters for a large number of molecules with human ADMET and kinase inhibitory data, and we showed that chemically similar molecules fell into the same SOFM cluster, and that within such clusters the QSAR models had significantly better predictability. We also used target variables (Y, e.g. ADMET) jointly with X variables to create a novel type of clustering. With our method, cells of loosely coupled XY data could be identified and separated into different model-building sets.
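A minimal SOFM clustering sketch, assuming the third-party minisom package and random descriptor vectors in place of real molecular data; per-cell QSAR models would then be fit on each cluster.

```python
# Map molecules onto a self-organizing feature map; each map cell is a cluster.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 16))       # descriptor (X) vectors for 500 molecules

som = MiniSom(8, 8, X.shape[1], sigma=1.5, learning_rate=0.5, random_seed=3)
som.train_random(X, num_iteration=2000)

cells = [som.winner(x) for x in X]   # winning map cell for each molecule
print("molecules in cell (0, 0):", sum(c == (0, 0) for c in cells))
```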
Voxel classification based airway tree segmentation
NASA Astrophysics Data System (ADS)
Lo, Pechin; de Bruijne, Marleen
2008-03-01
This paper presents a voxel classification based method for segmenting the human airway tree in volumetric computed tomography (CT) images. In contrast to standard methods that use only voxel intensities, our method uses a more complex appearance model based on a set of local image appearance features and Kth nearest neighbor (KNN) classification. The optimal set of features for classification is selected automatically from a large set of features describing the local image structure at several scales. The use of multiple features enables the appearance model to differentiate between airway tree voxels and other voxels of similar intensities in the lung, thus making the segmentation robust to pathologies such as emphysema. The classifier is trained on imperfect segmentations that can easily be obtained using region growing with a manual threshold selection. Experiments show that the proposed method results in a more robust segmentation that can grow into the smaller airway branches without leaking into emphysematous areas, and is able to segment many branches that are not present in the training set.
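A minimal sketch of this style of appearance model, assuming a random placeholder volume and stand-in labels: local features are computed at several Gaussian scales and classified per voxel with KNN.

```python
# Multi-scale local features (smoothed intensity and gradient magnitude)
# stacked per voxel, then a k-nearest-neighbor voxel classifier.
import numpy as np
from scipy import ndimage
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
volume = rng.normal(size=(32, 32, 32))        # placeholder CT volume

feats = []
for sigma in (1.0, 2.0, 4.0):                 # several scales
    feats.append(ndimage.gaussian_filter(volume, sigma))
    feats.append(ndimage.gaussian_gradient_magnitude(volume, sigma))
X = np.stack([f.ravel() for f in feats], axis=1)

labels = (volume.ravel() < -1.0).astype(int)  # stand-in "airway" labels
knn = KNeighborsClassifier(n_neighbors=5).fit(X, labels)
print("airway voxels predicted:", knn.predict(X).sum())
```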
Vajda, Szilárd; Rangoni, Yves; Cecotti, Hubert
2015-01-01
For training supervised classifiers to recognize different patterns, large data collections with accurate labels are necessary. In this paper, we propose a generic, semi-automatic labeling technique for large handwritten character collections. In order to speed up the creation of a large-scale ground truth, the method combines unsupervised clustering and minimal expert knowledge. To exploit the potential discriminant complementarities across features, each character is projected into five different feature spaces. After clustering the images in each feature space, the human expert labels the cluster centers, and each data point inherits the label of its cluster's center. A majority (or unanimity) vote decides the label of each character image. The amount of human involvement (labeling) is strictly controlled by the number of clusters produced by the chosen clustering approach. To test the efficiency of the proposed approach, we compared and evaluated three state-of-the-art clustering methods (k-means, self-organizing maps, and growing neural gas) on the MNIST digit data set and a Lampung Indonesian character data set, respectively. Considering a k-nn classifier, we show that manually labeling only 1.3% (MNIST) and 3.2% (Lampung) of the training data provides the same range of performance as a completely labeled data set would. PMID:25870463
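The labeling loop can be sketched as follows, with synthetic feature spaces and the "expert" simulated by a hidden ground-truth array; only cluster centers receive labels, which then propagate and are majority-voted across spaces.

```python
# Cluster each feature space, label only the image nearest each cluster center,
# propagate that label to the cluster, then majority-vote across feature spaces.
import numpy as np
from scipy.stats import mode
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
n = 1000
true = rng.integers(0, 10, n)                 # labels the expert would assign
spaces = [rng.normal(size=(n, d)) + true[:, None] for d in (16, 32, 64)]

votes = []
for X in spaces:                              # one clustering per feature space
    km = KMeans(n_clusters=50, n_init=4, random_state=0).fit(X)
    center_label = {}
    for c in range(50):                       # "expert" labels the center image only
        nearest = np.argmin(np.linalg.norm(X - km.cluster_centers_[c], axis=1))
        center_label[c] = true[nearest]
    votes.append(np.array([center_label[c] for c in km.labels_]))

final = mode(np.stack(votes), axis=0).mode.ravel()   # majority vote across spaces
print("agreement with truth: %.3f" % (final == true).mean())
```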
Detecting spam comments on Indonesia’s Instagram posts
NASA Astrophysics Data System (ADS)
Septiandri, Ali Akbar; Wibisono, Okiriza
2017-01-01
In this paper we experimented with several feature sets for detecting spam comments in social media contents authored by Indonesian public figures. We define spam comments as comments which have promotional purposes (e.g. referring other users to products and services) and are thus not related to the content to which the comments are posted. Three sets of features are evaluated for detecting spam: (1) hand-engineered features such as comment length, number of capital letters, and number of emojis, (2) keyword features such as whether the comment contains advertising words or product-related words, and (3) text features, namely bag-of-words, TF-IDF, and fastText embeddings, each combined with latent semantic analysis. With 24,000 manually annotated comments scraped from Instagram posts authored by more than 100 Indonesian public figures, we compared the performance of these feature sets and their combinations using 3 popular classification algorithms: Naïve Bayes, SVM, and XGBoost. We find that using all three feature sets (with fastText embeddings for the text features) gave the best F1-score of 0.9601 on a holdout dataset. More interestingly, fastText embeddings combined with hand-engineered features (i.e. without keyword features) yielded a similar F1-score of 0.9523, and McNemar's test failed to reject the null hypothesis that the two results do not differ significantly. This result is important as keyword features are largely dependent on the dataset and may not be as generalisable as the other feature sets when applied to new data. For future work, we hope to collect a bigger and more diverse dataset of Indonesian spam comments, improve our model's performance and generalisability, and publish a programming package for others to reliably detect spam comments.
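A minimal sketch of the "text features plus latent semantic analysis" pipeline, using TF-IDF and truncated SVD in place of fastText and a handful of toy comments as placeholders for the real dataset.

```python
# TF-IDF text features reduced by LSA (truncated SVD), then a linear classifier.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

comments = ["cek ig kami promo murah", "keren banget kak",
            "jual followers aktif", "semangat terus"]   # toy examples
labels = [1, 0, 1, 0]                                   # 1 = spam, 0 = not spam

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    TruncatedSVD(n_components=2),       # the LSA step
                    LinearSVC())
clf.fit(comments, labels)
print(clf.predict(["promo murah gan"]))
```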
Real-Time Feature Tracking Using Homography
NASA Technical Reports Server (NTRS)
Clouse, Daniel S.; Cheng, Yang; Ansar, Adnan I.; Trotz, David C.; Padgett, Curtis W.
2010-01-01
This software finds feature point correspondences in sequences of images. It is designed for feature matching in aerial imagery. Feature matching is a fundamental step in a number of important image processing operations: calibrating the cameras in a camera array, stabilizing images in aerial movies, geo-registration of images, and generating high-fidelity surface maps from aerial movies. The method uses a Shi-Tomasi corner detector and normalized cross-correlation. This process is likely to produce some mismatches, so the feature set is cleaned up using the assumption that there is a large planar patch visible in both images; at high altitude, this assumption is often reasonable. A mathematical transformation, called a homography, is developed that allows the position in image 2 of any point on the plane in image 1 to be predicted, and any feature pair that is inconsistent with the homography is thrown out. The output of the process is a set of feature pairs and the homography. The algorithms in this innovation are well known, but the new implementation improves the process in several ways. It runs in real time at 2 Hz on 64-megapixel imagery. The new Shi-Tomasi corner detector tries to produce the requested number of features by automatically adjusting the minimum distance between found features. The homography-finding code now uses an implementation of the RANSAC algorithm that adjusts the number of iterations automatically to achieve a preset probability of missing a set of inliers. The new interface allows the caller to pass in a set of predetermined points in one of the images, which makes it possible to track the same set of points through multiple frames.
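An OpenCV sketch of the same pipeline, under stated assumptions: the frame filenames are hypothetical, and pyramidal Lucas-Kanade tracking stands in for the normalized cross-correlation matching described above.

```python
# Shi-Tomasi corners, tracked into the next frame, then a RANSAC homography
# that discards pairs inconsistent with the dominant plane.
import cv2
import numpy as np

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)  # hypothetical aerial frames
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

pts1 = cv2.goodFeaturesToTrack(img1, maxCorners=500, qualityLevel=0.01, minDistance=10)
# LK tracking as a stand-in for the NCC matching step.
pts2, status, _ = cv2.calcOpticalFlowPyrLK(img1, img2, pts1, None)

good1, good2 = pts1[status.ravel() == 1], pts2[status.ravel() == 1]
H, inliers = cv2.findHomography(good1, good2, cv2.RANSAC, ransacReprojThreshold=3.0)
print("inlier pairs:", int(inliers.sum()))
print("homography:\n", H)
```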
Kolchinsky, A; Lourenço, A; Li, L; Rocha, L M
2013-01-01
Drug-drug interaction (DDI) is a major cause of morbidity and mortality. DDI research includes the study of different aspects of drug interactions, from in vitro pharmacology, which deals with drug interaction mechanisms, to pharmaco-epidemiology, which investigates the effects of DDI on drug efficacy and adverse drug reactions. Biomedical literature mining can aid both kinds of approaches by extracting relevant DDI signals from either the published literature or large clinical databases. However, though drug interaction is an ideal area for translational research, the inclusion of literature mining methodologies in DDI workflows is still very preliminary. One area that can benefit from literature mining is the automatic identification of a large number of potential DDIs, whose pharmacological mechanisms and clinical significance can then be studied via in vitro pharmacology and in populo pharmaco-epidemiology. We implemented a set of classifiers for identifying published articles relevant to experimental pharmacokinetic DDI evidence. These documents are important for identifying causal mechanisms behind putative drug-drug interactions, an important step in the extraction of large numbers of potential DDIs. We evaluated the performance of several linear classifiers on PubMed abstracts under different feature transformation and dimensionality reduction methods. In addition, we investigated the performance benefits of including various publicly available named entity recognition features, as well as a set of internally developed pharmacokinetic dictionaries. We found that several classifiers performed well in distinguishing relevant and irrelevant abstracts, that the combination of unigram and bigram textual features gave better performance than unigram features alone, and that normalization transforms that adjusted for feature frequency and document length improved classification. For some classifiers, such as linear discriminant analysis (LDA), proper dimensionality reduction had a large impact on performance. Finally, the inclusion of NER features and dictionaries was found not to help classification.
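A small sketch of the reported recipe: unigram plus bigram counts, frequency and length normalization, dimensionality reduction, and an LDA classifier. The toy abstracts and labels are placeholders, not the study's data.

```python
# Unigrams+bigrams -> TF-IDF normalization -> SVD reduction -> LDA classifier.
from sklearn.decomposition import TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import make_pipeline

abstracts = ["ketoconazole raised midazolam AUC in volunteers",
             "we sequenced the yeast genome",
             "ritonavir inhibited CYP3A4 metabolism of the substrate",
             "soil bacteria diversity was surveyed"]
relevant = [1, 0, 1, 0]          # pharmacokinetic DDI evidence? (placeholder)

pipe = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                     TfidfTransformer(),
                     TruncatedSVD(n_components=2),
                     LinearDiscriminantAnalysis())
pipe.fit(abstracts, relevant)
print(pipe.predict(["grapefruit juice increased felodipine AUC"]))
```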
Comparison of Feature Selection Techniques in Machine Learning for Anatomical Brain MRI in Dementia.
Tohka, Jussi; Moradi, Elaheh; Huttunen, Heikki
2016-07-01
We present a comparative split-half resampling analysis of various data driven feature selection and classification methods for the whole brain voxel-based classification analysis of anatomical magnetic resonance images. We compared support vector machines (SVMs), with or without filter based feature selection, several embedded feature selection methods and stability selection. While comparisons of the accuracy of various classification methods have been reported previously, the variability of the out-of-training sample classification accuracy and the set of selected features due to independent training and test sets have not been previously addressed in a brain imaging context. We studied two classification problems: 1) Alzheimer's disease (AD) vs. normal control (NC) and 2) mild cognitive impairment (MCI) vs. NC classification. In AD vs. NC classification, the variability in the test accuracy due to the subject sample did not vary between different methods and exceeded the variability due to different classifiers. In MCI vs. NC classification, particularly with a large training set, embedded feature selection methods outperformed SVM-based ones with the difference in the test accuracy exceeding the test accuracy variability due to the subject sample. The filter and embedded methods produced divergent feature patterns for MCI vs. NC classification that suggests the utility of the embedded feature selection for this problem when linked with the good generalization performance. The stability of the feature sets was strongly correlated with the number of features selected, weakly correlated with the stability of classification accuracy, and uncorrelated with the average classification accuracy.
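To illustrate the two families being compared, here is a minimal sketch with random stand-in data in place of anatomical MRI: a univariate filter keeps a fixed number of voxels, while an embedded L1-penalized model selects voxels as a side effect of fitting the classifier.

```python
# Filter selection (univariate F-test) vs. embedded selection (L1 logistic).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 5000))     # subjects x voxels (placeholder)
y = rng.integers(0, 2, 200)          # AD vs. NC (placeholder)

filter_idx = SelectKBest(f_classif, k=100).fit(X, y).get_support(indices=True)
embedded = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
embedded_idx = np.flatnonzero(embedded.coef_.ravel())
print(len(filter_idx), "filter voxels;", len(embedded_idx), "embedded voxels")
```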
Large-Scale Multiobjective Static Test Generation for Web-Based Testing with Integer Programming
ERIC Educational Resources Information Center
Nguyen, M. L.; Hui, Siu Cheung; Fong, A. C. M.
2013-01-01
Web-based testing has become a ubiquitous self-assessment method for online learning. One useful feature that is missing from today's web-based testing systems is the reliable capability to fulfill different assessment requirements of students based on a large-scale question data set. A promising approach for supporting large-scale web-based…
NASA Astrophysics Data System (ADS)
Castillo, Richard; Castillo, Edward; Fuentes, David; Ahmad, Moiz; Wood, Abbie M.; Ludwig, Michelle S.; Guerrero, Thomas
2013-05-01
Landmark point-pairs provide a strategy to assess deformable image registration (DIR) accuracy in terms of the spatial registration of the underlying anatomy depicted in medical images. In this study, we propose to augment a publicly available database (www.dir-lab.com) of medical images with large sets of manually identified anatomic feature pairs between breath-hold computed tomography (BH-CT) images for DIR spatial accuracy evaluation. Ten BH-CT image pairs were randomly selected from the COPDgene study cases. Each patient had received CT imaging of the entire thorax in the supine position at one-fourth dose normal expiration and maximum effort full dose inspiration. Using dedicated in-house software, an imaging expert manually identified large sets of anatomic feature pairs between images. Estimates of inter- and intra-observer spatial variation in feature localization were determined by repeat measurements of multiple observers over subsets of randomly selected features. 7298 anatomic landmark features were manually paired between the 10 sets of images. The number of feature pairs per case ranged from 447 to 1172. Average 3D Euclidean landmark displacements varied substantially among cases, ranging from 12.29 (SD: 6.39) to 30.90 (SD: 14.05) mm. Repeat registration of uniformly sampled subsets of 150 landmarks for each case yielded estimates of observer localization error, which ranged on average from 0.58 (SD: 0.87) to 1.06 (SD: 2.38) mm per case. The additions to the online web database (www.dir-lab.com) described in this work will broaden the applicability of the reference data, providing a freely available common dataset for targeted critical evaluation of DIR spatial accuracy performance in multiple clinical settings. Estimates of observer variance in feature localization suggest consistent spatial accuracy for all observers across both four-dimensional CT and COPDgene patient cohorts.
Munitions related feature extraction from LIDAR data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roberts, Barry L.
2010-06-01
The characterization of former military munitions ranges is critical in the identification of areas likely to contain residual unexploded ordnance (UXO). Although these ranges are large, often covering tens-of-thousands of acres, the actual target areas represent only a small fraction of the sites. The challenge is that many of these sites do not have records indicating locations of former target areas. The identification of target areas is critical in the characterization and remediation of these sites. The Strategic Environmental Research and Development Program (SERDP) and Environmental Security Technology Certification Program (ESTCP) of the DoD have been developing and implementing techniques for the efficient characterization of large munitions ranges. As part of this process, high-resolution LIDAR terrain data sets have been collected over several former ranges. These data sets have been shown to contain information relating to former munitions usage at these ranges, specifically terrain cratering due to high-explosives detonations. The location and relative intensity of crater features can provide information critical in reconstructing the usage history of a range, and indicate areas most likely to contain UXO. We have developed an automated procedure using an adaptation of the Circular Hough Transform for the identification of crater features in LIDAR terrain data. The Circular Hough Transform is highly adept at finding circular features (craters) in noisy terrain data sets. This technique has the ability to find features of a specific radius, providing a means of filtering features based on expected scale and providing additional spatial characterization of the identified feature. This method of automated crater identification has been applied to several former munitions ranges with positive results.
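A minimal sketch of circular-feature detection with a circular Hough transform, using scikit-image and a random grid as a placeholder for a real LIDAR-derived terrain raster.

```python
# Edge-detect a terrain grid, then search for circular (crater-like) features
# over a range of expected radii with the circular Hough transform.
import numpy as np
from skimage.feature import canny
from skimage.transform import hough_circle, hough_circle_peaks

rng = np.random.default_rng(7)
dem = rng.normal(size=(256, 256))    # placeholder terrain grid

edges = canny(dem, sigma=2.0)
radii = np.arange(5, 20)             # expected crater radii, in grid cells
accum = hough_circle(edges, radii)
_, cx, cy, r = hough_circle_peaks(accum, radii, total_num_peaks=10)
print("candidate craters (x, y, radius):", list(zip(cx, cy, r)))
```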
Litho-kinematic facies model for large landslide deposits in arid settings
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yarnold, J.C.; Lombard, J.P.
1989-04-01
Reconnaissance field studies of six large landslide deposits in the southern Basin and Range suggest that a set of characteristic features is common to the deposits of large landslides in an arid setting. These include a coarse boulder cap, an upper massive zone, a lower disrupted zone, and a mixed zone overlying disturbed substrate. The upper massive zone is dominated by crackle breccia. This grades downward into a lower disrupted zone composed of a more matrix-rich breccia that is internally sheared, intruded by clastic dikes, and often contains a cataclasite layer at its base. An underlying discontinuous mixed zone is composed of material from the overlying breccia mixed with material entrained from the underlying substrate. Bedding in the substrate sometimes displays folding and contortion that die out downward. The authors' work suggests a spatial zonation of these characteristic features within many landslide deposits. In general, clastic dikes, the basal cataclasite, and folding in the substrate are observed mainly in distal parts of landslides. In most cases, total thickness, thickness of the basal disturbed and mixed zones, and the degree of internal shearing increase distally, whereas maximum clast size commonly decreases distally. Zonation of these features is interpreted to result from kinematics of emplacement that cause generally increased deformation in the distal regions of the landslide.
Capela, Nicole A; Lemaire, Edward D; Baddour, Natalie
2015-01-01
Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low-cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations. PMID:25885272
Cross-indexing of binary SIFT codes for large-scale image search.
Liu, Zhen; Li, Houqiang; Zhang, Liyan; Zhou, Wengang; Tian, Qi
2014-05-01
In recent years, there has been growing interest in mapping visual features into compact binary codes for applications on large-scale image collections. Encoding high-dimensional data as compact binary codes reduces the memory cost for storage. Besides, it benefits computational efficiency, since similarity can be efficiently measured by Hamming distance. In this paper, we propose a novel flexible scale invariant feature transform (SIFT) binarization (FSB) algorithm for large-scale image search. The FSB algorithm explores the magnitude patterns of the SIFT descriptor. It is unsupervised, and the generated binary codes are demonstrated to be distance-preserving. Besides, we propose a new searching strategy to find target features based on cross-indexing in the binary SIFT space and the original SIFT space. We evaluate our approach on two publicly released data sets. The experiments on a large-scale partial-duplicate image retrieval system demonstrate the effectiveness and efficiency of the proposed algorithm.
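To show why binary codes are cheap to compare, here is a minimal sketch with random stand-in codes: descriptors are bit-packed and Hamming distances are computed with XOR and a popcount lookup table.

```python
# Hamming distance between packed binary SIFT codes via XOR + popcount table.
import numpy as np

rng = np.random.default_rng(9)
codes = rng.integers(0, 2, size=(10000, 256)).astype(np.uint8)  # binary codes
packed = np.packbits(codes, axis=1)          # 256 bits -> 32 bytes per descriptor

# Popcount of every possible byte value, used as a lookup table.
popcount = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1).sum(1)

query = packed[0]
dists = popcount[np.bitwise_xor(packed, query)].sum(axis=1)     # Hamming distances
print("nearest neighbors:", np.argsort(dists)[:5])
```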
Visual Saliency Detection Based on Multiscale Deep CNN Features.
Guanbin Li; Yizhou Yu
2016-11-01
Visual saliency is a fundamental problem in both cognitive and computational sciences, including computer vision. In this paper, we discover that a high-quality visual saliency model can be learned from multiscale features extracted using deep convolutional neural networks (CNNs), which have had many successes in visual recognition tasks. For learning such saliency models, we introduce a neural network architecture, which has fully connected layers on top of CNNs responsible for feature extraction at three different scales. The penultimate layer of our neural network has been confirmed to be a discriminative high-level feature vector for saliency detection, which we call deep contrast feature. To generate a more robust feature, we integrate handcrafted low-level features with our deep contrast feature. To promote further research and evaluation of visual saliency models, we also construct a new large database of 4447 challenging images and their pixelwise saliency annotations. Experimental results demonstrate that our proposed method is capable of achieving the state-of-the-art performance on all public benchmarks, improving the F-measure by 6.12% and 10%, respectively, on the DUT-OMRON data set and our new data set (HKU-IS), and lowering the mean absolute error by 9% and 35.3%, respectively, on these two data sets.
Variable importance in nonlinear kernels (VINK): classification of digitized histopathology.
Ginsburg, Shoshana; Ali, Sahirzeeshan; Lee, George; Basavanhally, Ajay; Madabhushi, Anant
2013-01-01
Quantitative histomorphometry is the process of modeling appearance of disease morphology on digitized histopathology images via image-based features (e.g., texture, graphs). Due to the curse of dimensionality, building classifiers with large numbers of features requires feature selection (which may require a large training set) or dimensionality reduction (DR). DR methods map the original high-dimensional features in terms of eigenvectors and eigenvalues, which limits the potential for feature transparency or interpretability. Although methods exist for variable selection and ranking on embeddings obtained via linear DR schemes (e.g., principal components analysis (PCA)), similar methods do not yet exist for nonlinear DR (NLDR) methods. In this work we present a simple yet elegant method for approximating the mapping between the data in the original feature space and the transformed data in the kernel PCA (KPCA) embedding space; this mapping provides the basis for quantification of variable importance in nonlinear kernels (VINK). We show how VINK can be implemented in conjunction with the popular Isomap and Laplacian eigenmap algorithms. VINK is evaluated in the contexts of three different problems in digital pathology: (1) predicting five year PSA failure following radical prostatectomy, (2) predicting Oncotype DX recurrence risk scores for ER+ breast cancers, and (3) distinguishing good and poor outcome p16+ oropharyngeal tumors. We demonstrate that subsets of features identified by VINK provide similar or better classification or regression performance compared to the original high dimensional feature sets.
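A hedged sketch of the underlying idea: approximate the map from the original features to a kernel PCA embedding with a linear surrogate, and read variable importance off the regression weights. This is an illustration of the concept, not the authors' exact estimator, and the data are random placeholders.

```python
# Fit kernel PCA, approximate the nonlinear map with linear regression, and
# rank original variables by the magnitude of their surrogate weights.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(10)
X = rng.normal(size=(300, 40))       # histomorphometric features (placeholder)

Z = KernelPCA(n_components=5, kernel="rbf").fit_transform(X)
approx = LinearRegression().fit(X, Z)            # linear surrogate of the map
importance = np.abs(approx.coef_).sum(axis=0)    # aggregate weight per variable
print("top variables:", np.argsort(importance)[::-1][:5])
```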
SoFoCles: feature filtering for microarray classification based on gene ontology.
Papachristoudis, Georgios; Diplaris, Sotiris; Mitkas, Pericles A
2010-02-01
Marker gene selection has been an important research topic in the classification analysis of gene expression data. Current methods try to reduce the "curse of dimensionality" by using statistical intra-feature set calculations, or classifiers that are based on the given dataset. In this paper, we present SoFoCles, an interactive tool that enables semantic feature filtering in microarray classification problems with the use of external, well-defined knowledge retrieved from the Gene Ontology. The notion of semantic similarity is used to derive genes that are involved in the same biological path during the microarray experiment, by enriching a feature set that has been initially produced with legacy methods. Among its other functionalities, SoFoCles offers a large repository of semantic similarity methods that are used in order to derive feature sets and marker genes. The structure and functionality of the tool are discussed in detail, as well as its ability to improve classification accuracy. Through experimental evaluation, SoFoCles is shown to outperform other classification schemes in terms of classification accuracy in two real datasets using different semantic similarity computation approaches.
Density-Dependent Quantized Least Squares Support Vector Machine for Large Data Sets.
Nan, Shengyu; Sun, Lei; Chen, Badong; Lin, Zhiping; Toh, Kar-Ann
2017-01-01
Based on the knowledge that input data distribution is important for learning, a data density-dependent quantization scheme (DQS) is proposed for sparse input data representation. The usefulness of the representation scheme is demonstrated by using it as a data preprocessing unit attached to the well-known least squares support vector machine (LS-SVM) for application on big data sets. Essentially, the proposed DQS adopts a single shrinkage threshold to obtain a simple quantization scheme, which adapts its outputs to input data density. With this quantization scheme, a large data set is quantized to a small subset where considerable sample size reduction is generally obtained. In particular, the sample size reduction can save significant computational cost when using the quantized subset for feature approximation via the Nyström method. Based on the quantized subset, the approximated features are incorporated into LS-SVM to develop a data density-dependent quantized LS-SVM (DQLS-SVM), where an analytic solution is obtained in the primal solution space. The developed DQLS-SVM is evaluated on synthetic and benchmark data with particular emphasis on large data sets. Extensive experimental results show that the learning machine incorporating DQS attains not only high computational efficiency but also good generalization performance.
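A minimal sketch of the pipeline shape, under stated assumptions: k-means centers stand in for the density-dependent quantizer, the quantized subset serves as Nyström landmarks for kernel feature approximation, and a ridge classifier plays the role of the primal least-squares SVM solution.

```python
# Quantize a large set to a small subset, use it for Nystroem kernel feature
# approximation, then solve a ridge (LS-SVM style) classifier in the primal.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import RidgeClassifier

rng = np.random.default_rng(11)
X = rng.normal(size=(20000, 10))
y = (X[:, 0] * X[:, 1] > 0).astype(int)          # nonlinear synthetic labels

landmarks = KMeans(n_clusters=200, n_init=4, random_state=0).fit(X).cluster_centers_
nys = Nystroem(kernel="rbf", n_components=200).fit(landmarks)
clf = RidgeClassifier(alpha=1.0).fit(nys.transform(X), y)
print("training accuracy:", clf.score(nys.transform(X), y))
```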
Bremer, Peer-Timo; Weber, Gunther; Tierny, Julien; Pascucci, Valerio; Day, Marcus S; Bell, John B
2011-09-01
Large-scale simulations are increasingly being used to study complex scientific and engineering phenomena. As a result, advanced visualization and data analysis are also becoming an integral part of the scientific process. Often, a key step in extracting insight from these large simulations involves the definition, extraction, and evaluation of features in the space and time coordinates of the solution. However, in many applications, these features involve a range of parameters and decisions that will affect the quality and direction of the analysis. Examples include particular level sets of a specific scalar field, or local inequalities between derived quantities. A critical step in the analysis is to understand how these arbitrary parameters/decisions impact the statistical properties of the features, since such a characterization will help to evaluate the conclusions of the analysis as a whole. We present a new topological framework that in a single pass extracts and encodes entire families of possible feature definitions as well as their statistical properties. For each time step we construct a hierarchical merge tree, a highly compact yet flexible feature representation. While this data structure is more than two orders of magnitude smaller than the raw simulation data, it allows us to extract a set of features for any given parameter selection in a postprocessing step. Furthermore, we augment the trees with additional attributes, making it possible to gather a large number of useful global, local, and conditional statistics that would otherwise be extremely difficult to compile. We also use this representation to create tracking graphs that describe the temporal evolution of the features over time. Our system provides a linked-view interface to explore the time evolution of the graph interactively alongside the segmentation, thus making it possible to perform extensive data analysis in a very efficient manner. We demonstrate our framework by extracting and analyzing burning cells from a large-scale turbulent combustion simulation. In particular, we show how the statistical analysis enabled by our techniques provides new insight into the combustion process.
Sentiment analysis of feature ranking methods for classification accuracy
NASA Astrophysics Data System (ADS)
Joseph, Shashank; Mugauri, Calvin; Sumathy, S.
2017-11-01
Text pre-processing and feature selection are important and critical steps in text mining. Pre-processing large volumes of data is a difficult task, as unstructured raw data must be converted into a structured format. Traditional methods of processing and weighting took much time and were less accurate. To overcome this challenge, feature ranking techniques have been devised. The feature set produced by text pre-processing is fed as input to feature selection, which helps improve text classification accuracy. Of the three feature selection categories available, the filter category is the focus here. Five feature ranking methods, namely document frequency, standard deviation, information gain, chi-square, and weighted log-likelihood ratio, are analyzed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Galavis, P; Friedman, K; Chandarana, H
Purpose: Radiomics involves the extraction of texture features from different imaging modalities with the purpose of developing models to predict patient treatment outcomes. The purpose of this study is to investigate texture feature reproducibility across [18F]FDG PET/CT and [18F]FDG PET/MR imaging in patients with primary malignancies. Methods: Twenty-five prospective patients with solid tumors underwent a clinical [18F]FDG PET/CT scan followed by an [18F]FDG PET/MR scan. In all patients the lesions were identified using nuclear medicine reports. The images were co-registered and segmented using an in-house auto-segmentation method. Fifty features, based on the intensity histogram and second- and high-order matrices, were extracted from the segmented regions of both image data sets. A one-way random-effects ANOVA model of the intra-class correlation coefficient (ICC) was used to establish texture feature correlations between both data sets. Results: The fifty features, whose ICC values ranged from 0.1 to 0.86, were classified into three categories: high, intermediate, and low. Ten features extracted from second- and high-order matrices showed large ICC ≥ 0.70. Seventeen features presented intermediate 0.5 ≤ ICC ≤ 0.65, and the remaining twenty-three presented low ICC ≤ 0.45. Conclusion: Features with large ICC values could be reliable candidates for quantification, as they lead to similar results from both imaging modalities. Features with small ICC indicate a lack of correlation; therefore, using these features as quantitative measures will lead to different assessments of the same lesion depending on the imaging modality from which they are extracted. This study shows the importance of further investigation and standardization of features across multiple imaging modalities.
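A hedged sketch of the one-way random-effects ICC used above, comparing a feature measured on the two modalities for the same lesions; the feature values are placeholders.

```python
# One-way random-effects ICC(1,1): (MSB - MSW) / (MSB + (k-1) * MSW).
import numpy as np

def icc_oneway(ratings):
    """ratings: array of shape (n_targets, k_raters); returns ICC(1,1)."""
    n, k = ratings.shape
    grand = ratings.mean()
    ms_between = k * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_within = ((ratings - ratings.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

pet_ct = np.array([1.0, 2.1, 3.4, 2.8, 0.9])   # placeholder feature values
pet_mr = np.array([1.1, 2.0, 3.6, 2.5, 1.0])
print("ICC:", icc_oneway(np.column_stack([pet_ct, pet_mr])))
```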
NASA Astrophysics Data System (ADS)
Poncelet, Carine; Merz, Ralf; Merz, Bruno; Parajka, Juraj; Oudin, Ludovic; Andréassian, Vazken; Perrin, Charles
2017-08-01
Most previous assessments of hydrologic model performance are fragmented, based on small numbers of catchments, different methods or time periods, and do not link the results to landscape or climate characteristics. This study uses large-sample hydrology to identify major catchment controls on daily runoff simulations. It is based on a conceptual lumped hydrological model (GR6J), a collection of 29 catchment characteristics, a multinational set of 1103 catchments located in Austria, France, and Germany, and four runoff model efficiency criteria. Two analyses are conducted to assess how features and criteria are linked: (i) a one-dimensional analysis based on the Kruskal-Wallis test and (ii) a multidimensional analysis based on regression trees, investigating the interplay between features. The catchment features most affecting model performance are the flashiness of precipitation and streamflow (computed as the ratio of absolute day-to-day fluctuations to the total amount in a year), the seasonality of evaporation, the catchment area, and the catchment aridity. Nonflashy, nonseasonal, large, and nonarid catchments show the best performance for all the tested criteria. We argue that this higher performance is due to fewer nonlinear responses (higher correlation between precipitation and streamflow) and lower input and output variability for such catchments. Finally, we show that, compared to national sets, multinational sets increase the transferability of results because they explore a wider range of hydroclimatic conditions.
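The two analyses can be sketched as follows, with synthetic stand-in features and a synthetic efficiency score in place of the real catchment data.

```python
# (i) Kruskal-Wallis test of performance across classes of one feature;
# (ii) regression tree linking performance to several features jointly.
import numpy as np
from scipy.stats import kruskal
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(8)
flashiness = rng.uniform(0, 1, 1103)
aridity = rng.uniform(0, 2, 1103)
eff = 0.9 - 0.4 * flashiness + rng.normal(scale=0.05, size=1103)  # placeholder criterion

groups = np.digitize(flashiness, [0.33, 0.66])          # low / mid / high flashiness
print(kruskal(*[eff[groups == g] for g in range(3)]))   # one-dimensional analysis

tree = DecisionTreeRegressor(max_depth=3).fit(np.column_stack([flashiness, aridity]), eff)
print("feature importances:", tree.feature_importances_)  # multidimensional analysis
```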
Online Feature Transformation Learning for Cross-Domain Object Category Recognition.
Zhang, Xuesong; Zhuang, Yan; Wang, Wei; Pedrycz, Witold
2017-06-09
In this paper, we introduce a new research problem termed online feature transformation learning in the context of multiclass object category recognition. The learning of a feature transformation is viewed as learning a global similarity metric function in an online manner. We first consider the problem of online learning a feature transformation matrix expressed in the original feature space and propose an online passive aggressive feature transformation algorithm. Then these original features are mapped to kernel space and an online single kernel feature transformation (OSKFT) algorithm is developed to learn a nonlinear feature transformation. Based on the OSKFT and the existing Hedge algorithm, a novel online multiple kernel feature transformation algorithm is also proposed, which can further improve the performance of online feature transformation learning in large-scale application. The classifier is trained with k nearest neighbor algorithm together with the learned similarity metric function. Finally, we experimentally examined the effect of setting different parameter values in the proposed algorithms and evaluate the model performance on several multiclass object recognition data sets. The experimental results demonstrate the validity and good performance of our methods on cross-domain and multiclass object recognition application.
Assessment of features for automatic CTG analysis based on expert annotation.
Chudácek, Vacláv; Spilka, Jirí; Lhotská, Lenka; Janku, Petr; Koucký, Michal; Huptych, Michal; Bursa, Miroslav
2011-01-01
Cardiotocography (CTG), the monitoring of fetal heart rate (FHR) and uterine contractions (TOCO), has been used routinely by obstetricians since the 1960s to detect fetal hypoxia. The evaluation of the FHR in clinical settings is based on an evaluation of macroscopic morphological features and so far has managed to avoid adopting any achievements from the HRV research field. In this work, most of the features ever used for FHR characterization, including FIGO, HRV, nonlinear, wavelet, and time- and frequency-domain features, are investigated and assessed based on their statistical significance in the task of classifying the FHR into three FIGO classes. Annotation derived from a panel of experts, instead of the commonly utilized pH values, was used for evaluation of the features on a large data set (552 records). We conclude the paper by presenting the best uncorrelated features and their individual ranks of importance according to a meta-analysis of three different ranking methods. The number of accelerations and decelerations, the interval index, as well as Lempel-Ziv complexity and Higuchi's fractal dimension are among the top five features.
Machine learning techniques in searches for $t\bar{t}h$ in the $h \to b\bar{b}$ decay channel
Santos, Robert; Nguyen, M.; Webster, Jordan; ...
2017-04-10
Study of the production of pairs of top quarks in association with a Higgs boson is one of the primary goals of the Large Hadron Collider over the next decade, as measurements of this process may help us to understand whether the uniquely large mass of the top quark plays a special role in electroweak symmetry breaking. Higgs bosons decay predominantly to $b\bar{b}$, yielding signatures for the signal that are similar to $t\bar{t}$ + jets with heavy flavor. Though particularly challenging to study due to the similar kinematics between signal and background events, such final states ($t\bar{t}b\bar{b}$) are an important channel for studying the top quark Yukawa coupling. This paper presents a systematic study of machine learning (ML) methods for detecting $t\bar{t}h$ in the $h \to b\bar{b}$ decay channel. Among the seven ML methods tested, we show that neural network models outperform alternative methods. In addition, two neural models used in this paper outperform NeuroBayes, one of the standard algorithms used in current particle physics experiments. We further study the effectiveness of ML algorithms by investigating the impact of feature set and data size, as well as the depth of the networks for neural models. We demonstrate that an extended feature set leads to improvement of performance over basic features. Furthermore, the availability of large samples for training is found to be important for improving the performance of the techniques. For the features and the data set studied here, neural networks of more layers deliver comparable performance to their simpler counterparts.
Shilov, Ignat V; Seymour, Sean L; Patel, Alpesh A; Loboda, Alex; Tang, Wilfred H; Keating, Sean P; Hunter, Christie L; Nuwaysir, Lydia M; Schaeffer, Daniel A
2007-09-01
The Paragon Algorithm, a novel database search engine for the identification of peptides from tandem mass spectrometry data, is presented. Sequence Temperature Values are computed using a sequence tag algorithm, allowing the degree of implication by an MS/MS spectrum of each region of a database to be determined on a continuum. Counter to conventional approaches, features such as modifications, substitutions, and cleavage events are modeled with probabilities rather than by discrete user-controlled settings to consider or not consider a feature. The use of feature probabilities in conjunction with Sequence Temperature Values allows for a very large increase in the effective search space with only a very small increase in the actual number of hypotheses that must be scored. The algorithm has a new kind of user interface that removes the user expertise requirement, presenting control settings in the language of the laboratory that are translated to optimal algorithmic settings. To validate this new algorithm, a comparison with Mascot is presented for a series of analogous searches to explore the relative impact of increasing search space probed with Mascot by relaxing the tryptic digestion conformance requirements from trypsin to semitrypsin to no enzyme and with the Paragon Algorithm using its Rapid mode and Thorough mode with and without tryptic specificity. Although they performed similarly for small search space, dramatic differences were observed in large search space. With the Paragon Algorithm, hundreds of biological and artifact modifications, all possible substitutions, and all levels of conformance to the expected digestion pattern can be searched in a single search step, yet the typical cost in search time is only 2-5 times that of conventional small search space. Despite this large increase in effective search space, there is no drastic loss of discrimination that typically accompanies the exploration of large search space.
Chudáček, V; Spilka, J; Janků, P; Koucký, M; Lhotská, L; Huptych, M
2011-08-01
Cardiotocography is the monitoring of fetal heart rate (FHR) and uterine contractions (TOCO), used routinely since the 1960s by obstetricians to detect fetal hypoxia. The evaluation of the FHR in clinical settings is based on an evaluation of macroscopic morphological features and so far has managed to avoid adopting any achievements from the HRV research field. In this work, most of the features utilized for FHR characterization, including FIGO, HRV, nonlinear, wavelet, and time and frequency domain features, are investigated and assessed based on their statistical significance in the task of distinguishing the FHR into three FIGO classes. We assess the features on a large data set (552 records) and unlike in other published papers we use three-class expert evaluation of the records instead of the pH values. We conclude the paper by presenting the best uncorrelated features and their individual rank of importance according to the meta-analysis of three different ranking methods. The number of accelerations and decelerations, interval index, as well as Lempel-Ziv complexity and Higuchi's fractal dimension are among the top five features.
Optimal number of features as a function of sample size for various classification rules.
Hua, Jianping; Xiong, Zixiang; Lowey, James; Suh, Edward; Dougherty, Edward R
2005-04-15
Given the joint feature-label distribution, increasing the number of features always results in decreased classification error; however, this is not the case when a classifier is designed via a classification rule from sample data. Typically (but not always), for fixed sample size, the error of a designed classifier decreases and then increases as the number of features grows. The potential downside of using too many features is most critical for small samples, which are commonplace for gene-expression-based classifiers for phenotype discrimination. For fixed sample size and feature-label distribution, the issue is to find an optimal number of features. Since only in rare cases is there a known distribution of the error as a function of the number of features and sample size, this study employs simulation for various feature-label distributions and classification rules, and across a wide range of sample and feature-set sizes. To achieve the desired end, finding the optimal number of features as a function of sample size, it employs massively parallel computation. Seven classifiers are treated: 3-nearest-neighbor, Gaussian kernel, linear support vector machine, polynomial support vector machine, perceptron, regular histogram and linear discriminant analysis. Three Gaussian-based models are considered: linear, nonlinear and bimodal. In addition, real patient data from a large breast-cancer study is considered. To mitigate the combinatorial search for finding optimal feature sets, and to model the situation in which subsets of genes are co-regulated and correlation is internal to these subsets, we assume that the covariance matrix of the features is blocked, with each block corresponding to a group of correlated features. Altogether there are a large number of error surfaces for the many cases. These are provided in full on a companion website, which is meant to serve as a resource for those working with small-sample classification. For the companion website, please visit http://public.tgen.org/tamu/ofs/. Contact: e-dougherty@ee.tamu.edu.
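A small simulation in the spirit of the study, under stated assumptions (a synthetic Gaussian model, LDA as the classification rule): for fixed sample size, test error tends to fall and then rise as features are added.

```python
# Estimate test error of a designed LDA classifier as the feature count grows.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(12)

def error(d, n_train=30, n_test=2000, shift=0.25):
    mu = np.full(d, shift)                       # class means at -mu and +mu
    Xtr = np.vstack([rng.normal(-mu, 1, (n_train, d)), rng.normal(mu, 1, (n_train, d))])
    ytr = np.repeat([0, 1], n_train)
    Xte = np.vstack([rng.normal(-mu, 1, (n_test, d)), rng.normal(mu, 1, (n_test, d))])
    yte = np.repeat([0, 1], n_test)
    return 1 - LinearDiscriminantAnalysis().fit(Xtr, ytr).score(Xte, yte)

for d in (1, 2, 5, 10, 20, 40):                  # number of features
    print(d, round(np.mean([error(d) for _ in range(10)]), 3))
```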
Analysis of the IJCNN 2011 UTL Challenge
2012-01-13
We made available large datasets from various application domains: handwriting recognition, image recognition, video processing, text processing, and ecology (http://clopinet.com/ul). The evaluation sets consist of 4096 examples each. The recoverable rows of the challenge's dataset table are: AVICENNA (handwriting; 120 features; 0% sparsity; 150205 development and 50000 transfer examples) and HARRY (video; 5000 features; 98.1% sparsity).
Remote sensing techniques in cultural resource management archaeology
NASA Astrophysics Data System (ADS)
Johnson, Jay K.; Haley, Bryan S.
2003-04-01
Cultural resource management archaeology in the United States concerns compliance with legislation set in place to protect archaeological resources from the impact of modern activities. Traditionally, surface collection, shovel testing, test excavation, and mechanical stripping are used in these projects. These methods are expensive, time consuming, and may poorly represent the features within archaeological sites. The use of remote sensing techniques in cultural resource management archaeology may provide an answer to these problems. Near-surface geophysical techniques, including magnetometry, resistivity, electromagnetics, and ground penetrating radar, have proven to be particularly successful at efficiently locating archaeological features. Research has also indicated that airborne and satellite remote sensing may hold some promise in the future for large-scale archaeological survey, although this is difficult in many areas of the world where ground cover reflects archaeological features only indirectly. A cost simulation of a hypothetical data recovery project on a large complex site in Mississippi is presented to illustrate the potential advantages of remote sensing in a cultural resource management setting. The results indicate these techniques can save a substantial amount of time and money for these projects.
Obermeier, S.F.
1996-01-01
Liquefaction features can be used in many field settings to estimate the recurrence interval and magnitude of strong earthquakes through much of the Holocene. These features include dikes, craters, vented sand, sills, and laterally spreading landslides. The relatively high seismic shaking level required for their formation makes them particularly valuable as records of strong paleo-earthquakes. This state-of-the-art summary for using liquefaction-induced features for paleoseismic interpretation and analysis takes into account both geological and geotechnical engineering perspectives. The driving mechanism for formation of the features is primarily the increased pore-water pressure associated with liquefaction of sand-rich sediment. The role of this mechanism is often supplemented greatly by the direct action of seismic shaking at the ground surface, which strains and breaks the clay-rich cap that lies immediately above the sediment that liquefied. Discussed in the text are the processes involved in formation of the features, as well as their morphology and characteristics in field settings. Whether liquefaction occurs is controlled mainly by sediment grain size, sediment packing, depth to the water table, and strength and duration of seismic shaking. Formation of recognizable features in the field generally requires a low-permeability cap above the sediment that liquefied. Field manifestations are controlled largely by the severity of liquefaction and the thickness and properties of the low-permeability cap. Criteria are presented for determining whether observed sediment deformation in the field originated by seismically induced liquefaction. These criteria have been developed mainly by observing historic effects of liquefaction in varied field settings. The most important criterion is that a seismic liquefaction origin requires widespread, regional development of features around a core area where the effects are most severe. In addition, the features must have a morphology that is consistent with a very sudden application of a large hydraulic force. This article discusses case studies in widely separated and different geological settings: coastal South Carolina, the New Madrid seismic zone, the Wabash Valley seismic zone, and coastal Washington State. These studies encompass most of the range of settings and the types of liquefaction-induced features likely to be encountered anywhere. The case studies describe the observed features and the logic for assigning a seismic liquefaction origin to them. Also discussed are some types of sediment deformations that can be misinterpreted as having a seismic origin. Two independent methods for estimating prehistoric magnitude are discussed briefly. One method is based on determination of the maximum distance from the epicenter over which liquefaction-induced effects have formed. The other method is based on use of geotechnical engineering techniques at sites of marginal liquefaction, in order to bracket the peak accelerations as a function of epicentral distance; these accelerations can then be compared with predictions from seismological models.
Learning feature representations with a cost-relevant sparse autoencoder.
Längkvist, Martin; Loutfi, Amy
2015-02-01
There is an increasing interest in the machine learning community to automatically learn feature representations directly from the (unlabeled) data instead of using hand-designed features. The autoencoder is one method that can be used for this purpose. However, for data sets with a high degree of noise, a large amount of the representational capacity in the autoencoder is used to minimize the reconstruction error for these noisy inputs. This paper proposes a method that improves the feature learning process by focusing on the task relevant information in the data. This selective attention is achieved by weighting the reconstruction error and reducing the influence of noisy inputs during the learning process. The proposed model is trained on a number of publicly available image data sets and the test error rate is compared to a standard sparse autoencoder and other methods, such as the denoising autoencoder and contractive autoencoder.
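A minimal sketch of the core idea above: weighting the reconstruction error so that noisy inputs contribute less to learning. The network sizes, the sigmoid and L1 choices, and the assumption that per-input weights are given are illustrative; the paper defines its own cost-relevant weighting scheme.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, n_in=784, n_hidden=196):
        super().__init__()
        self.enc = nn.Linear(n_in, n_hidden)
        self.dec = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))
        return torch.sigmoid(self.dec(h)), h

def weighted_reconstruction_loss(x, x_hat, h, weights, sparsity_coef=1e-3):
    # `weights` down-weight inputs judged task-irrelevant or noisy (assumed
    # given here); the L1 activation penalty is one common sparsity choice.
    recon = ((weights * (x - x_hat)) ** 2).mean()
    return recon + sparsity_coef * h.abs().mean()
```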
Size matters: large objects capture attention in visual search.
Proulx, Michael J
2010-12-23
Can objects or events ever capture one's attention in a purely stimulus-driven manner? A recent review of the literature set out the criteria required to find stimulus-driven attentional capture independent of goal-directed influences, and concluded that no published study has satisfied those criteria. Here, visual search experiments assessed whether an irrelevantly large object can capture attention. Capture of attention by this static visual feature was found. The results suggest that a large object can indeed capture attention in a stimulus-driven manner, independent of displaywide features of the task that might encourage a goal-directed bias for large items. It is concluded that these results are either consistent with the stimulus-driven criteria published previously or, alternatively, consistent with a flexible, goal-directed mechanism of saliency detection.
Subsurface failure in spherical bodies. A formation scenario for linear troughs on Vesta’s surface
Stickle, Angela M.; Schultz, P. H.; Crawford, D. A.
2014-10-13
Many asteroids in the Solar System exhibit unusual, linear features on their surface. The Dawn mission recently observed two sets of linear features on the surface of the asteroid 4 Vesta. Geologic observations indicate that these features are related to the two large impact basins at the south pole of Vesta, though no specific mechanism of origin has been determined. Furthermore, the orientation of the features is offset from the center of the basins. Experimental and numerical results reveal that the offset angle is a natural consequence of oblique impacts into a spherical target. We demonstrate that a set of shear planes develops in the subsurface of the body opposite to the point of first contact. Moreover, these subsurface failure zones then propagate to the surface under combined tensile-shear stress fields after the impact to create sets of approximately linear faults on the surface. Comparison between the orientation of damage structures in the laboratory and failure regions within Vesta can be used to constrain impact parameters (e.g., the approximate impact point and likely impact trajectory).
Coherent Image Layout using an Adaptive Visual Vocabulary
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dillard, Scott E.; Henry, Michael J.; Bohn, Shawn J.
When querying a huge image database containing millions of images, the result of the query may still contain many thousands of images that need to be presented to the user. We consider the problem of arranging such a large set of images into a visually coherent layout, one that places similar images next to each other. Image similarity is determined using a bag-of-features model, and the layout is constructed from a hierarchical clustering of the image set by mapping an in-order traversal of the hierarchy tree into a space-filling curve. This layout method provides strong locality guarantees so we are able to quantitatively evaluate performance using standard image retrieval benchmarks. Performance of the bag-of-features method is best when the vocabulary is learned on the image set being clustered. Because learning a large, discriminative vocabulary is a computationally demanding task, we present a novel method for efficiently adapting a generic visual vocabulary to a particular dataset. We evaluate our clustering and vocabulary adaptation methods on a variety of image datasets and show that adapting a generic vocabulary to a particular set of images improves performance on both hierarchical clustering and image retrieval tasks.
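A rough sketch of the layout stage under stated simplifications: hierarchically cluster the bag-of-features histograms, then place the in-order leaf traversal along a serpentine (boustrophedon) grid path as a simple stand-in for the paper's space-filling curve. The feature matrix is a placeholder.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

# One bag-of-features histogram per image (assumed precomputed).
features = np.random.rand(100, 256)
order = leaves_list(linkage(features, method="average"))

# Map the leaf order onto a grid, reversing every other row so that
# neighbors in the traversal stay adjacent on screen.
side = int(np.ceil(np.sqrt(len(order))))
positions = {}
for rank, img_idx in enumerate(order):
    row, col = divmod(rank, side)
    if row % 2 == 1:
        col = side - 1 - col
    positions[img_idx] = (row, col)
```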
Martínez-Bartolomé, Salvador; Medina-Aunon, J Alberto; López-García, Miguel Ángel; González-Tejedo, Carmen; Prieto, Gorka; Navajas, Rosana; Salazar-Donate, Emilio; Fernández-Costa, Carolina; Yates, John R; Albar, Juan Pablo
2018-04-06
Mass-spectrometry-based proteomics has evolved into a high-throughput technology in which numerous large-scale data sets are generated from diverse analytical platforms. Furthermore, several scientific journals and funding agencies have emphasized the storage of proteomics data in public repositories to facilitate its evaluation, inspection, and reanalysis. (1) As a consequence, public proteomics data repositories are growing rapidly. However, tools are needed to integrate multiple proteomics data sets to compare different experimental features or to perform quality control analysis. Here, we present a new stand-alone Java tool, Proteomics Assay COMparator (PACOM), that is able to import, combine, and simultaneously compare numerous proteomics experiments to check the integrity of the proteomic data as well as verify data quality. With PACOM, the user can detect sources of error that may have been introduced in any step of a proteomics workflow and that influence the final results. Data sets can be easily compared and integrated, and data quality and reproducibility can be visually assessed through a rich set of graphical representations of proteomics data features as well as a wide variety of data filters. Its flexibility and easy-to-use interface make PACOM a unique tool for daily use in a proteomics laboratory. PACOM is available at https://github.com/smdb21/pacom .
Self-adaptive MOEA feature selection for classification of bankruptcy prediction data.
Gaspar-Cunha, A; Recio, G; Costa, L; Estébanez, C
2014-01-01
Bankruptcy prediction is a vast area of finance and accounting whose importance lies in its relevance for creditors and investors in evaluating the likelihood that a company will go bankrupt. As companies become more complex, they develop sophisticated schemes to hide their real situation. In turn, estimating the credit risks associated with counterparts or predicting bankruptcy becomes harder. Evolutionary algorithms have been shown to be an excellent tool for dealing with complex problems in finance and economics where a large number of irrelevant features are involved. This paper provides a methodology for feature selection in the classification of bankruptcy data sets using an evolutionary multiobjective approach that simultaneously minimises the number of features and maximises a classifier quality measure (e.g., accuracy). The proposed methodology makes use of self-adaptation by applying the feature selection algorithm while simultaneously optimising the parameters of the classifier used. The methodology was applied to four different sets of data. The obtained results showed the utility of the self-adaptation of the classifier.
Shift-invariant discrete wavelet transform analysis for retinal image classification.
Khademi, April; Krishnan, Sridhar
2007-12-01
This work involves retinal image classification, for which a novel analysis system was developed. From the compressed domain, the proposed scheme extracts textural features from wavelet coefficients, which describe the relative homogeneity of localized areas of the retinal images. Since the discrete wavelet transform (DWT) is shift-variant, a shift-invariant DWT was explored to ensure that a robust feature set was extracted. To combat the small database size, linear discriminant analysis classification was used with the leave-one-out method. 38 normal and 48 abnormal images (exudates, large drusens, fine drusens, choroidal neovascularization, central vein and artery occlusion, histoplasmosis, arteriosclerotic retinopathy, hemi-central retinal vein occlusion and more) were used, and a specificity of 79% and sensitivity of 85.4% were achieved (the average classification rate is 82.2%). The success of the system can be attributed to the highly robust feature set, which included translation-, scale- and semi-rotation-invariant features. Additionally, this technique is database independent since the features were specifically tuned to the pathologies of the human eye.
NASA Astrophysics Data System (ADS)
Lesniak, J. M.; Hupse, R.; Blanc, R.; Karssemeijer, N.; Székely, G.
2012-08-01
False positive (FP) marks represent an obstacle for effective use of computer-aided detection (CADe) of breast masses in mammography. Typically, the problem can be approached either by developing more discriminative features or by employing different classifier designs. In this paper, the use of support vector machine (SVM) classification for FP reduction in CADe is investigated, presenting a systematic quantitative evaluation against neural networks, k-nearest-neighbor classification, linear discriminant analysis and random forests. A large database of 2516 film mammography examinations and 73 input features was used to train the classifiers and evaluate their performance on correctly diagnosed exams as well as false negatives. Further, classifier robustness was investigated using varying training data and feature sets as input. The evaluation was based on the mean exam sensitivity at 0.05-1 FPs on normals on the free-response receiver operating characteristic (FROC) curve, incorporated into a tenfold cross-validation framework. It was found that SVM classification using a Gaussian kernel offered significantly increased detection performance (P = 0.0002) compared to the reference methods. Varying training data and input features, SVMs showed improved exploitation of large feature sets. It is concluded that with the SVM-based CADe a significant reduction of FPs is possible, outperforming other state-of-the-art approaches for breast mass CADe.
Enhanced regulatory sequence prediction using gapped k-mer features.
Ghandi, Mahmoud; Lee, Dongwon; Mohammad-Noori, Morteza; Beer, Michael A
2014-07-01
Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem.
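To make the gapped k-mer idea concrete: for each length-l window, keep k informative positions and wildcard the rest, so counts are pooled across many similar l-mers and remain estimable even for large l. The naive enumeration below is for illustration only; gkm-SVM computes the corresponding kernel efficiently with a tree data structure rather than materializing these features.

```python
from itertools import combinations
from collections import Counter

def gapped_kmer_counts(seq: str, l: int = 6, k: int = 4) -> Counter:
    """Count gapped k-mers: length-l windows with l-k positions wildcarded."""
    counts = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for keep in combinations(range(l), k):  # choose k informative columns
            key = ''.join(window[i] if i in keep else '.' for i in range(l))
            counts[key] += 1
    return counts

print(gapped_kmer_counts("ACGTACGTAC", l=6, k=4).most_common(3))
```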
Dexter, Alex; Race, Alan M; Steven, Rory T; Barnes, Jennifer R; Hulme, Heather; Goodwin, Richard J A; Styles, Iain B; Bunch, Josephine
2017-11-07
Clustering is widely used in MSI to segment anatomical features and differentiate tissue types, but existing approaches are both CPU and memory-intensive, limiting their application to small, single data sets. We propose a new approach that uses a graph-based algorithm with a two-phase sampling method that overcomes this limitation. We demonstrate the algorithm on a range of sample types and show that it can segment anatomical features that are not identified using commonly employed algorithms in MSI, and we validate our results on synthetic MSI data. We show that the algorithm is robust to fluctuations in data quality by successfully clustering data with a designed-in variance using data acquired with varying laser fluence. Finally, we show that this method is capable of generating accurate segmentations of large MSI data sets acquired on the newest generation of MSI instruments and evaluate these results by comparison with histopathology.
Interictal epileptiform discharge characteristics underlying expert interrater agreement.
Bagheri, Elham; Dauwels, Justin; Dean, Brian C; Waters, Chad G; Westover, M Brandon; Halford, Jonathan J
2017-10-01
The presence of interictal epileptiform discharges (IED) in the electroencephalogram (EEG) is a key finding in the medical workup of a patient with suspected epilepsy. However, inter-rater agreement (IRA) regarding the presence of IED is imperfect, leading to incorrect and delayed diagnoses. An improved understanding of which IED attributes mediate expert IRA might help in developing automatic methods for IED detection able to emulate the abilities of experts. Therefore, using a set of IED scored by a large number of experts, we set out to determine which attributes of IED predict expert agreement regarding the presence of IED. IED were annotated on a 5-point scale by 18 clinical neurophysiologists within 200 30-s EEG segments from recordings of 200 patients. 5538 signal analysis features were extracted from the waveforms, including wavelet coefficients, morphological features, signal energy, nonlinear energy operator response, electrode location, and spectrogram features. Feature selection was performed by applying elastic net regression, and support vector regression (SVR) was applied to predict expert opinion, with and without the feature selection procedure and with and without several types of signal normalization. Multiple types of features were useful for predicting expert annotations, but particular types of wavelet features performed best. Local EEG normalization also enhanced best model performance. As the size of the group of EEGers used to train the models was increased, the performance of the models leveled off at a group size of around 11. The features that best predict inter-rater agreement among experts regarding the presence of IED are wavelet features, using locally standardized EEG. Our models for predicting expert opinion based on EEGers' scores perform best with a large group of EEGers (more than 10). By examining a large group of EEG signal analysis features, we found that wavelet features with certain wavelet basis functions performed best for identifying IED. Local normalization also improves predictability, suggesting the importance of IED morphology over amplitude-based features. Although most IED detection studies in the past have used opinions from three or fewer experts, our study suggests a "wisdom of the crowd" effect, such that pooling over a larger number of expert opinions produces a better correlation between expert opinion and objectively quantifiable features of the EEG. Copyright © 2017 International Federation of Clinical Neurophysiology. Published by Elsevier B.V. All rights reserved.
Chang, Kaowen Grace; Chien, Hungju
2017-07-05
Studies have suggested that visiting and viewing landscaping at hospitals accelerates patients' recovery from surgery and helps staff recover from mental fatigue. To plan and construct such landscapes, we need to unravel which landscape features are desirable to different groups so that the space can benefit a wide range of hospital users. Using discrete choice modeling, we developed experimental choice sets to investigate how landscape features influence the visits of different users in a large regional hospital in Taiwan. The empirical survey provides quantitative estimates of the influence of each landscape feature on four user groups: patients, caregivers, staff, and neighborhood residents. Our findings suggest that different types of features promote visits from specific user groups. Landscape features facilitating physical activities effectively encourage visits across user groups, especially caregivers and staff. Patients in this study express a strong need for contact with nature. The nearby community favors features designed for children's play and family activities. People across user groups value features that provide a comfortably mitigated microclimate, such as a shelter. Study implications and limitations are also discussed. Our study provides information essential for creating a better healing environment in a hospital setting.
Developing a benchmark for emotional analysis of music.
Aljanaki, Anna; Yang, Yi-Hsuan; Soleymani, Mohammad
2017-01-01
The field of music emotion recognition (MER) has expanded rapidly in the last decade. Many new methods and new audio features have been developed to improve the performance of MER algorithms. However, it is very difficult to compare the performance of the new methods because of the diversity of data representations and the scarcity of publicly available data. In this paper, we address these problems by creating a data set and a benchmark for MER. The data set that we release, the MediaEval Database for Emotional Analysis in Music (DEAM), is the largest available data set of dynamic annotations (valence and arousal annotations for 1,802 songs and song excerpts licensed under Creative Commons, with 2 Hz time resolution). Using DEAM, we organized the 'Emotion in Music' task at the MediaEval Multimedia Evaluation Campaign from 2013 to 2015. The benchmark attracted, in total, 21 active teams to participate in the challenge. We analyze the results of the benchmark: the winning algorithms and feature sets. We also describe the design of the benchmark, the evaluation procedures and the data cleaning and transformations that we suggest. The results from the benchmark suggest that recurrent-neural-network-based approaches combined with large feature sets work best for dynamic MER.
NASA Astrophysics Data System (ADS)
Zhang, Yunlu; Yan, Lei; Liou, Frank
2018-05-01
The quality of the initial guess of deformation parameters in digital image correlation (DIC) has a serious impact on the convergence, robustness, and efficiency of the subsequent subpixel-level searching stage. In this work, an improved feature-based initial guess (FB-IG) scheme is presented to provide initial guesses for points of interest (POIs) inside a large region. Oriented FAST and Rotated BRIEF (ORB) features are semi-uniformly extracted from the region of interest (ROI) and matched to provide initial deformation information. False matched pairs are eliminated by a novel feature-guided Gaussian mixture model (FG-GMM) point set registration algorithm, and nonuniform deformation parameters of the versatile reproducing kernel Hilbert space (RKHS) function are calculated simultaneously. Validations on simulated images and a real-world mini tensile test verify that this scheme can robustly and accurately compute initial guesses with semi-subpixel-level accuracy in cases with small or large translation, deformation, or rotation.
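A minimal OpenCV sketch of the feature-based initial-guess idea: match ORB features between reference and deformed images and read off displacement vectors. Brute-force Hamming matching with cross-checking stands in for the paper's FG-GMM outlier rejection, and the file names are placeholders.

```python
import cv2

# Reference and deformed images (paths are placeholders).
ref = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
cur = cv2.imread("deformed.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(ref, None)
kp2, des2 = orb.detectAndCompute(cur, None)

# Hamming distance with cross-checking; a crude stand-in for the
# feature-guided GMM registration used in the paper.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Each surviving match yields a displacement vector usable as a DIC
# initial guess near that point of interest.
for m in matches[:5]:
    (x1, y1), (x2, y2) = kp1[m.queryIdx].pt, kp2[m.trainIdx].pt
    print(f"({x1:.1f},{y1:.1f}) -> dx={x2 - x1:.2f}, dy={y2 - y1:.2f}")
```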
Quantum algorithms for topological and geometric analysis of data
Lloyd, Seth; Garnerone, Silvano; Zanardi, Paolo
2016-01-01
Extracting useful information from large data sets can be a daunting task. Topological methods for analysing data sets provide a powerful technique for extracting such information. Persistent homology is a sophisticated tool for identifying topological features and for determining how such features persist as the data is viewed at different scales. Here we present quantum machine learning algorithms for calculating Betti numbers—the numbers of connected components, holes and voids—in persistent homology, and for finding eigenvectors and eigenvalues of the combinatorial Laplacian. The algorithms provide an exponential speed-up over the best currently known classical algorithms for topological data analysis. PMID:26806491
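For intuition, the Betti numbers of a small simplicial complex can be computed classically from boundary-matrix ranks; the quantum algorithms above accelerate exactly this kind of linear-algebraic computation. The triangle example below is illustrative and is not drawn from the paper.

```python
import numpy as np

# Boundary matrix of a triangle graph (vertices A, B, C; edges AB, BC, CA).
# beta_0 counts connected components, beta_1 independent cycles.
d1 = np.array([[-1,  0,  1],   # vertex A in edges AB, BC, CA
               [ 1, -1,  0],   # vertex B
               [ 0,  1, -1]])  # vertex C
rank = np.linalg.matrix_rank(d1)
beta0 = d1.shape[0] - rank     # 3 - 2 = 1 connected component
beta1 = d1.shape[1] - rank     # 3 - 2 = 1 independent cycle (the triangle)
print(beta0, beta1)            # 1 1
```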
Large-scale Exploration of Neuronal Morphologies Using Deep Learning and Augmented Reality.
Li, Zhongyu; Butler, Erik; Li, Kang; Lu, Aidong; Ji, Shuiwang; Zhang, Shaoting
2018-02-12
Recently released large-scale neuron morphological data has greatly facilitated research in neuroinformatics. However, the sheer volume and complexity of these data pose significant challenges for efficient and accurate neuron exploration. In this paper, we propose an effective retrieval framework to address these problems, based on frontier techniques of deep learning and binary coding. For the first time, we develop a deep-learning-based feature representation method for neuron morphological data, in which the 3D neurons are first projected into binary images and features are then learned using an unsupervised deep neural network, i.e., stacked convolutional autoencoders (SCAEs). The deep features are subsequently fused with hand-crafted features for a more accurate representation. Considering that exhaustive search is usually very time-consuming in large-scale databases, we employ a novel binary coding method to compress feature vectors into short binary codes. Our framework is validated on a public data set including 58,000 neurons, showing promising retrieval precision and efficiency compared with state-of-the-art methods. In addition, we develop a novel neuron visualization program based on augmented reality (AR) techniques, which can help users explore neuron morphologies in an interactive and immersive manner.
Large-scale urban point cloud labeling and reconstruction
NASA Astrophysics Data System (ADS)
Zhang, Liqiang; Li, Zhuqiang; Li, Anjian; Liu, Fangyu
2018-04-01
The large number of object categories and the many overlapping or closely neighboring objects in large-scale urban scenes pose great challenges for point cloud classification. In this paper, a novel framework is proposed for the classification and reconstruction of airborne laser scanning point cloud data. To label point clouds, we present a rectified-linear-unit neural network, named ReLu-NN, in which rectified linear units (ReLU) replace the traditional sigmoid as the activation function in order to speed up convergence. Since the features of the point cloud are sparse, we reduce the number of neurons via dropout to avoid over-fitting during training. The set of feature descriptors for each 3D point is encoded through self-taught learning, forming a discriminative feature representation that is taken as the input of the ReLu-NN. The segmented building points are consolidated through an edge-aware point set resampling algorithm, and then reconstructed into 3D lightweight models using the 2.5D contouring method (Zhou and Neumann, 2010). Compared with deep learning approaches, the ReLu-NN can easily classify unorganized point clouds without rasterizing the data, and it does not need a large number of training samples. Most of the parameters in the network are learned, so the cost of intensive parameter tuning is significantly reduced. Experimental results on various datasets demonstrate that the proposed framework achieves better performance than other related algorithms in terms of classification accuracy and reconstruction quality.
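A minimal PyTorch stand-in for the ReLu-NN classifier described above: ReLU activations in place of sigmoids and dropout to curb over-fitting on sparse features. Layer widths and the number of classes are assumptions for illustration, not values from the paper.

```python
import torch.nn as nn

# ReLU speeds up convergence relative to sigmoid; dropout randomly zeroes
# hidden units during training to reduce over-fitting on sparse inputs.
relu_nn = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(256, 128), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(128, 8),   # one logit per object category (assumed 8 classes)
)
```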
BSIFT: toward data-independent codebook for large scale image search.
Zhou, Wengang; Li, Houqiang; Hong, Richang; Lu, Yijuan; Tian, Qi
2015-03-01
The Bag-of-Words (BoW) model based on the Scale Invariant Feature Transform (SIFT) has been widely used in large-scale image retrieval applications. Feature quantization by vector quantization plays a crucial role in the BoW model: it generates visual words from the high-dimensional SIFT features so as to adapt to the inverted-file structure for scalable retrieval. Traditional feature quantization approaches suffer from several issues, such as the necessity of visual codebook training, limited reliability, and update inefficiency. To avoid these problems, in this paper a novel feature quantization scheme is proposed to efficiently quantize each SIFT descriptor to a descriptive and discriminative bit-vector, called binary SIFT (BSIFT). Our quantizer is independent of image collections. In addition, by taking the first 32 bits of BSIFT as the code word, the generated BSIFT naturally lends itself to the classic inverted-file structure for image indexing. Moreover, the quantization error is reduced by feature filtering, code word expansion, and query-sensitive mask shielding. Without any explicit codebook for quantization, our approach can be readily applied to image search in resource-limited scenarios. We evaluate the proposed algorithm for large-scale image search on two public image data sets. Experimental results demonstrate the index efficiency and retrieval accuracy of our approach.
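A sketch of data-independent binarization in the spirit of BSIFT: threshold each SIFT component against the descriptor's own statistics and use the leading 32 bits as the inverted-file code word. The median-threshold rule is an assumption for illustration; the paper's exact bit-generation scheme may differ.

```python
import numpy as np

def binary_sift(descriptor: np.ndarray) -> np.ndarray:
    """Quantize a 128-D SIFT descriptor to a bit-vector by thresholding
    each component against the descriptor's own median (no codebook)."""
    return (descriptor > np.median(descriptor)).astype(np.uint8)

desc = np.random.rand(128).astype(np.float32)   # stand-in SIFT descriptor
bits = binary_sift(desc)
# First 32 bits index the inverted file; the rest refine the match.
code_word = int("".join(map(str, bits[:32])), 2)
print(code_word)
```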
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heredia-Langner, Alejandro; Amidan, Brett G.; Matzner, Shari
We present results from the optimization of a re-identification process using two sets of biometric data obtained from the Civilian American and European Surface Anthropometry Resource Project (CAESAR) database. The datasets contain real measurements of features for 2378 individuals in a standing (43 features) and seated (16 features) position. A genetic algorithm (GA) was used to search a large combinatorial space where different features are available between the probe (seated) and gallery (standing) datasets. Results show that optimized model predictions obtained using less than half of the 43 gallery features and data from roughly 16% of the individuals available produce better re-identification rates than two other approaches that use all the information available.
Hadoop neural network for parallel and distributed feature selection.
Hodge, Victoria J; O'Keefe, Simon; Austin, Jim
2016-06-01
In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel), allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation, and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high-dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
Hypergraph Based Feature Selection Technique for Medical Diagnosis.
Somu, Nivethitha; Raman, M R Gauthama; Kirthivasan, Kannan; Sriram, V S Shankar
2016-11-01
The impact of the internet and information systems across various domains has resulted in the substantial generation of multidimensional datasets. The use of data mining and knowledge discovery techniques to extract the information contained in multidimensional datasets plays a significant role in exploiting their full benefit. The presence of a large number of features in high-dimensional datasets incurs high computational cost in terms of computing power and time. Hence, feature selection techniques are commonly used to build robust machine learning models by selecting a subset of relevant features that projects the maximal information content of the original dataset. In this paper, a novel Rough Set based K-Helly feature selection technique (RSKHT), which hybridizes Rough Set Theory (RST) and the K-Helly property of hypergraph representations, was designed to identify the optimal feature subset, or reduct, for medical diagnostic applications. Experiments carried out using medical datasets from the UCI repository prove the dominance of RSKHT over other feature selection techniques with respect to reduct size, classification accuracy and time complexity. The performance of RSKHT was validated using the WEKA tool, showing that RSKHT is computationally attractive and flexible over massive datasets.
Bommert, Andrea; Rahnenführer, Jörg; Lang, Michel
2017-01-01
Finding a good predictive model for a high-dimensional data set can be challenging. For genetic data, it is not only important to find a model with high predictive accuracy; it is also important that this model uses only few features and that the selection of these features is stable. This is because, in bioinformatics, the models are used not only for prediction but also for drawing biological conclusions, which makes the interpretability and reliability of the model crucial. We suggest using three target criteria when fitting a predictive model to a high-dimensional data set: the classification accuracy, the stability of the feature selection, and the number of chosen features. As it is unclear which measure is best for evaluating stability, we first compare a variety of stability measures. We conclude that the Pearson correlation has the best theoretical and empirical properties. Also, we find that for the assessment of stability it is most important that a measure contain a correction for chance or for large numbers of chosen features. Then, we analyse Pareto fronts and conclude that it is possible to find models with a stable selection of few features without losing much predictive accuracy.
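One way to operationalize the recommended Pearson-based stability is to correlate the binary selection-indicator vectors produced by two resampled fits; a higher correlation means the same features keep being chosen. A minimal sketch with made-up selections (the paper's exact estimator and chance correction are not reproduced here):

```python
import numpy as np

def selection_stability(sel_a, sel_b, n_features):
    """Pearson correlation between two binary feature-selection vectors,
    one common stability measure of the kind compared in the study."""
    a = np.zeros(n_features); a[list(sel_a)] = 1
    b = np.zeros(n_features); b[list(sel_b)] = 1
    return np.corrcoef(a, b)[0, 1]

# Two of three selected features agree across resamples.
print(selection_stability({0, 3, 7}, {0, 3, 9}, n_features=1000))
```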
Feature selection for elderly faller classification based on wearable sensors.
Howcroft, Jennifer; Kofman, Jonathan; Lemaire, Edward D
2017-05-30
Wearable sensors can be used to derive numerous gait pattern features for elderly fall risk and faller classification; however, an appropriate feature set is required to avoid high computational costs and the inclusion of irrelevant features. The objectives of this study were to identify and evaluate smaller feature sets for faller classification from large feature sets derived from wearable accelerometer and pressure-sensing insole gait data. A convenience sample of 100 older adults (75.5 ± 6.7 years; 76 non-fallers, 24 fallers based on 6 month retrospective fall occurrence) walked 7.62 m while wearing pressure-sensing insoles and tri-axial accelerometers at the head, pelvis, left and right shanks. Feature selection was performed using correlation-based feature selection (CFS), fast correlation based filter (FCBF), and Relief-F algorithms. Faller classification was performed using multi-layer perceptron neural network, naïve Bayesian, and support vector machine classifiers, with 75:25 single stratified holdout and repeated random sampling. The best performing model was a support vector machine with 78% accuracy, 26% sensitivity, 95% specificity, 0.36 F1 score, and 0.31 MCC and one posterior pelvis accelerometer input feature (left acceleration standard deviation). The second best model achieved better sensitivity (44%) and used a support vector machine with 74% accuracy, 83% specificity, 0.44 F1 score, and 0.29 MCC. This model had ten input features: maximum, mean and standard deviation posterior acceleration; maximum, mean and standard deviation anterior acceleration; mean superior acceleration; and three impulse features. The best multi-sensor model sensitivity (56%) was achieved using posterior pelvis and both shank accelerometers and a naïve Bayesian classifier. The best single-sensor model sensitivity (41%) was achieved using the posterior pelvis accelerometer and a naïve Bayesian classifier. Feature selection provided models with smaller feature sets and improved faller classification compared to faller classification without feature selection. CFS and FCBF provided the best feature subset (one posterior pelvis accelerometer feature) for faller classification. However, better sensitivity was achieved by the second best model based on a Relief-F feature subset with three pressure-sensing insole features and seven head accelerometer features. Feature selection should be considered as an important step in faller classification using wearable sensors.
Image segmentation using association rule features.
Rushing, John A; Ranganath, Heggere; Hinke, Thomas H; Graves, Sara J
2002-01-01
A new type of texture feature based on association rules is described. Association rules have been used in applications such as market basket analysis to capture relationships present among items in large data sets. It is shown that association rules can be adapted to capture frequently occurring local structures in images. The frequency of occurrence of these structures can be used to characterize texture. Methods for the segmentation of textured images based on association rule features are described. Simulation results using images consisting of man-made and natural textures show that association rule features perform well compared to other widely used texture features. Association rule features are used to detect cumulus cloud fields in GOES satellite images and are found to achieve higher accuracy than other statistical texture features for this problem.
Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm.
Martinez, Emmanuel; Alvarez, Mario Moises; Trevino, Victor
2010-08-01
Biomarker discovery is a typical application of functional genomics. Due to the large number of genes studied simultaneously in microarray data, feature selection is a key step. Swarm intelligence has emerged as a solution for the feature selection problem. However, typical swarm intelligence settings for feature selection fail to select small feature subsets. We have proposed a swarm intelligence feature selection algorithm based on the initialization and update of only a subset of particles in the swarm. In this study, we tested our algorithm on 11 microarray datasets for brain, leukemia, lung, prostate, and other cancers. We show that the proposed swarm intelligence algorithm successfully increases the classification accuracy and decreases the number of selected features compared to other swarm intelligence methods. Copyright © 2010 Elsevier Ltd. All rights reserved.
Ordinal feature selection for iris and palmprint recognition.
Sun, Zhenan; Wang, Libin; Tan, Tieniu
2014-09-01
Ordinal measures have been demonstrated to be an effective feature representation model for iris and palmprint recognition. However, ordinal measures are a general concept of image analysis, and numerous variants with different parameter settings, such as location, scale, and orientation, can be derived to construct a huge feature space. This paper proposes a novel optimization formulation for ordinal feature selection with successful applications to both iris and palmprint recognition. The objective function of the proposed feature selection method has two parts, i.e., the misclassification error of intra- and interclass matching samples and the weighted sparsity of ordinal feature descriptors. The feature selection therefore aims to achieve an accurate and sparse representation of ordinal measures. The optimization is subject to a number of linear inequality constraints, which require that all intra- and interclass matching pairs be well separated with a large margin. Ordinal feature selection is formulated as a linear programming (LP) problem so that a solution can be efficiently obtained even on a large-scale feature pool and training database. Extensive experimental results demonstrate that the proposed LP formulation is advantageous over existing feature selection methods, such as mRMR, ReliefF, Boosting, and Lasso, for biometric recognition, reporting state-of-the-art accuracy on the CASIA and PolyU databases.
Pilling, Michael; Gellatly, Angus
2013-07-01
We investigated the influence of dimensional set on report of object feature information using an immediate memory probe task. Participants viewed displays containing up to 36 coloured geometric shapes which were presented for several hundred milliseconds before one item was abruptly occluded by a probe. A cue presented simultaneously with the probe instructed participants to report either about the colour or shape of the probe item. A dimensional set towards the colour or shape of the presented items was induced by manipulating task probability - the relative probability with which the two feature dimensions required report. This was done across two participant groups: One group was given trials where there was a higher report probability of colour, the other a higher report probability of shape. Two experiments showed that features were reported most accurately when they were of high task probability, though in both cases the effect was largely driven by the colour dimension. Importantly the task probability effect did not interact with display set size. This is interpreted as tentative evidence that this manipulation influences feature processing in a global manner and at a stage prior to visual short term memory. Copyright © 2013 Elsevier B.V. All rights reserved.
Non-specific filtering of beta-distributed data.
Wang, Xinhui; Laird, Peter W; Hinoue, Toshinori; Groshen, Susan; Siegmund, Kimberly D
2014-06-19
Non-specific feature selection is a dimension reduction procedure performed prior to cluster analysis of high dimensional molecular data. Not all measured features are expected to show biological variation, so only the most varying are selected for analysis. In DNA methylation studies, DNA methylation is measured as a proportion, bounded between 0 and 1, with variance a function of the mean. Filtering on standard deviation biases the selection of probes to those with mean values near 0.5. We explore the effect this has on clustering, and develop alternate filter methods that utilize a variance stabilizing transformation for Beta distributed data and do not share this bias. We compared results for 11 different non-specific filters on eight Infinium HumanMethylation data sets, selected to span a variety of biological conditions. We found that for data sets having a small fraction of samples showing abnormal methylation of a subset of normally unmethylated CpGs, a characteristic of the CpG island methylator phenotype in cancer, a novel filter statistic that utilized a variance-stabilizing transformation for Beta distributed data outperformed the common filter of using standard deviation of the DNA methylation proportion, or its log-transformed M-value, in its ability to detect the cancer subtype in a cluster analysis. However, the standard deviation filter always performed among the best for distinguishing subgroups of normal tissue. The novel filter and standard deviation filter tended to favour features in different genome contexts; for the same data set, the novel filter always selected more features from CpG island promoters and the standard deviation filter always selected more features from non-CpG island intergenic regions. Interestingly, despite selecting largely non-overlapping sets of features, the two filters did find sample subsets that overlapped for some real data sets. We found two different filter statistics that tended to prioritize features with different characteristics, each performed well for identifying clusters of cancer and non-cancer tissue, and identifying a cancer CpG island hypermethylation phenotype. Since cluster analysis is for discovery, we would suggest trying both filters on any new data sets, evaluating the overlap of features selected and clusters discovered.
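A minimal sketch of a non-specific filter that uses a variance-stabilizing transformation for proportions. The arcsine square-root transform is the textbook VST for Beta/binomial-like data and stands in here for the paper's novel statistic, which is not reproduced; the cutoff of 1000 features is illustrative.

```python
import numpy as np

def vst_filter(beta_values: np.ndarray, n_keep: int = 1000) -> np.ndarray:
    """Rank features by standard deviation after a variance-stabilizing
    transform, avoiding the bias toward mean methylation near 0.5 that a
    plain SD filter has. `beta_values` is features x samples, in [0, 1]."""
    z = np.arcsin(np.sqrt(beta_values))      # arcsine square-root VST
    sd = z.std(axis=1)
    return np.argsort(sd)[::-1][:n_keep]     # indices of most-variable features

probes = np.random.beta(0.5, 0.5, size=(10000, 50))  # stand-in methylation data
keep = vst_filter(probes)
```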
The perceptual processing capacity of summary statistics between and within feature dimensions
Attarha, Mouna; Moore, Cathleen M.
2015-01-01
The simultaneous–sequential method was used to test the processing capacity of statistical summary representations both within and between feature dimensions. Sixteen gratings varied with respect to their size and orientation. In Experiment 1, the gratings were equally divided into four separate smaller sets, one of which had a mean size that was larger or smaller than that of the other three sets, and one of which had a mean orientation that was tilted more leftward or rightward. The task was to report the mean size and orientation of the oddball sets, and therefore required four summary representations for size and another four for orientation. The sets were presented at the same time in the simultaneous condition or across two temporal frames in the sequential condition. Experiment 1 showed evidence of a sequential advantage, suggesting that the system may be limited with respect to establishing multiple within-feature summaries. Experiment 2 eliminates the possibility that some aspect of the task other than averaging was contributing to this observed limitation. In Experiment 3, the same 16 gratings appeared as one large superset, so the task only required one summary representation for size and another for orientation. Equal simultaneous–sequential performance indicated that between-feature summaries are capacity free. These findings challenge the view that within-feature summaries drive a global sense of visual continuity across areas of the peripheral visual field, and suggest a shift in focus to seeking an understanding of how between-feature summaries in one area of the environment control behavior. PMID:26360153
Quality of clinical brain tumor MR spectra judged by humans and machine learning tools.
Kyathanahally, Sreenath P; Mocioiu, Victor; Pedrosa de Barros, Nuno; Slotboom, Johannes; Wright, Alan J; Julià-Sapé, Margarida; Arús, Carles; Kreis, Roland
2018-05-01
To investigate and compare human judgment and machine learning tools for quality assessment of clinical MR spectra of brain tumors. A very large set of 2574 single-voxel spectra with short and long echo times from the eTUMOUR and INTERPRET databases was used for this analysis. Original human quality ratings from these studies, as well as new human guidelines, were used to train different machine learning algorithms for automatic quality control (AQC) based on various feature extraction methods and classification tools. The performance was compared with the variance in human judgment. AQC built using the RUSBoost classifier, which combats imbalanced training data, performed best. When furnished with a large range of spectral and derived features, from which the most crucial ones had been selected by the TreeBagger algorithm, it showed better specificity (98%) in judging spectra from an independent test set than previously published methods. Optimal performance was reached with a virtual three-class ranking system. Our results suggest that the feature space should be relatively large for the case of MR tumor spectra and that three-class labels may be beneficial for AQC. The best AQC algorithm showed a performance in rejecting spectra comparable to that of a panel of human expert spectroscopists. Magn Reson Med 79:2500-2510, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
An interactive environment for the analysis of large Earth observation and model data sets
NASA Technical Reports Server (NTRS)
Bowman, Kenneth P.; Walsh, John E.; Wilhelmson, Robert B.
1994-01-01
Envision is an interactive environment that provides researchers in the earth sciences with convenient ways to manage, browse, and visualize large observed or model data sets. Its main features are support for the netCDF and HDF file formats, an easy-to-use X/Motif user interface, a client-server configuration, and portability to many UNIX workstations. The Envision package also provides new ways to view and change metadata in a set of data files. It permits a scientist to conveniently and efficiently manage large data sets consisting of many data files. It also provides links to popular visualization tools so that data can be quickly browsed. Envision is a public domain package, freely available to the scientific community. Envision software (binaries and source code) and documentation can be obtained from either of these servers: ftp://vista.atmos.uiuc.edu/pub/envision/ and ftp://csrp.tamu.edu/pub/envision/. Detailed descriptions of Envision capabilities and operations can be found in the User's Guide and Reference Manuals distributed with Envision software.
Detection of pesticide (Cyantraniliprole) residue on grapes using hyperspectral sensing
NASA Astrophysics Data System (ADS)
Mohite, Jayantrao; Karale, Yogita; Pappula, Srinivasu; Shabeer, Ahammed T. P.; Sawant, S. D.; Hingmire, Sandip
2017-05-01
Pesticide residues in fruits, vegetables, and other agricultural commodities are harmful to humans and are a growing health concern. Detecting pesticide residues on various commodities in an open environment is a challenging task. Hyperspectral sensing is one of the recent technologies used to detect pesticide residues. This paper addresses the problem of detecting residues of Cyantraniliprole on grapes in open fields using multi-temporal hyperspectral remote sensing data. The reflectance data of 686 samples of grapes with no, single, and double dose applications of Cyantraniliprole were collected by a handheld spectroradiometer (MS-720) with a wavelength range of 350 nm to 1052 nm. The data collection was carried out over a large feature set of 213 spectral bands during the period of March to May 2015. This large feature set may cause model over-fitting as well as increase the computational time, so in order to obtain the most relevant features, various feature selection techniques, viz. Principal Component Analysis (PCA), LASSO, and Elastic Net regularization, were used. Using these selected features, we evaluate the performance of various classifiers, such as Artificial Neural Networks (ANN), Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), to classify grape samples with no, single, or double application of Cyantraniliprole. The key finding of this paper is that most of the features selected by LASSO lie consistently in the 350-373 nm and 940-990 nm ranges for all days. Experimental results also show that, using the relevant features selected by LASSO, SVM performs best among all classifiers, with an average prediction accuracy of 91.98% across all days.
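A rough sketch of the band-selection-plus-classification pipeline described above. The data, the LASSO penalty, and the regression-on-labels shortcut for selecting bands are all illustrative assumptions; the authors' exact preprocessing and parameter choices are not reproduced.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Placeholder data: 686 samples x 213 reflectance bands, dose class in {0,1,2}.
X, y = np.random.rand(686, 213), np.random.randint(0, 3, 686)

# Treating the class label as a numeric target is a crude shortcut used here
# only to obtain a sparse set of wavelength bands; alpha is illustrative.
lasso = Lasso(alpha=0.001).fit(X, y)
selected = np.flatnonzero(lasso.coef_)   # surviving bands

svm = SVC(kernel="rbf")
print(cross_val_score(svm, X[:, selected], y, cv=5).mean())
```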
Materials prediction via classification learning
Balachandran, Prasanna V.; Theiler, James; Rondinelli, James M.; ...
2015-08-25
In the paradigm of materials informatics for accelerated materials discovery, the choice of feature set (i.e. attributes that capture aspects of structure, chemistry and/or bonding) is critical. Ideally, the feature sets should provide a simple physical basis for extracting major structural and chemical trends and furthermore, enable rapid predictions of new material chemistries. Orbital radii calculated from model pseudopotential fits to spectroscopic data are potential candidates to satisfy these conditions. Although these radii (and their linear combinations) have been utilized in the past, their functional forms are largely justified with heuristic arguments. Here we show that machine learning methods naturally uncover the functional forms that mimic most frequently used features in the literature, thereby providing a mathematical basis for feature set construction without a priori assumptions. We apply these principles to study two broad materials classes: (i) wide band gap AB compounds and (ii) rare earth-main group RM intermetallics. The AB compounds serve as a prototypical example to demonstrate our approach, whereas the RM intermetallics show how these concepts can be used to rapidly design new ductile materials. In conclusion, our predictive models indicate that ScCo, ScIr, and YCd should be ductile, whereas each was previously proposed to be brittle.
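As a toy illustration of classification learning on radii-derived features, the sketch below fits a shallow decision tree whose axis-aligned cuts play the role of the heuristic feature combinations discussed above; the feature names, values, and label rule are invented placeholders, not the paper's data.

```python
# Hedged sketch: a decision tree on hypothetical orbital-radii features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.uniform(0.2, 2.5, size=(200, 2))           # fake [r_sigma, r_pi] radii
y = (X[:, 0] + 0.5 * X[:, 1] > 2.0).astype(int)    # fake structure-type label

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["r_sigma", "r_pi"]))
```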
ERIC Educational Resources Information Center
Scholl, Daniel
2012-01-01
The results of international school achievement studies have had major educational implications in many European countries, especially for the control concepts of education. This is exemplified by Germany, where a large-scale educational reform was set in motion and the education system was shifted from an input- to an output-oriented…
NASA Astrophysics Data System (ADS)
Laura, Jason; Skinner, James A.; Hunter, Marc A.
2017-08-01
In this paper we present the Large Crater Clustering (LCC) tool set, an ArcGIS plugin that supports the quantitative approximation of a primary impact location from user-identified locations of possible secondary impact craters or the long axes of clustered secondary craters. The identification of primary impact craters directly supports planetary geologic mapping and topical science studies where the chronostratigraphic age of some geologic units may be known, but more distant features have questionable geologic ages. Previous works (e.g., McEwen et al., 2005; Dundas and McEwen, 2007) have shown that the location of a source primary crater can be estimated from its secondary impact craters. This work adapts those methods into a statistically robust tool set. We describe the four individual tools within the LCC tool set, which support: (1) processing individually digitized point observations (craters), (2) estimating the directional distribution of a clustered set of craters and back-projecting the potential flight paths (crater clusters or linearly approximated catenae or lineaments), (3) intersecting the projected paths, and (4) intersecting back-projected trajectories to approximate the location of potential source primary craters. We present two case studies using secondary impact features mapped in two regions of Mars. We demonstrate that the tool is able to quantitatively identify primary impacts and supports the improved qualitative interpretation of potential secondary crater flight trajectories.
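The core geometric step, intersecting back-projected trajectories, reduces to line intersection; here is a small planar sketch in NumPy (purely illustrative; the LCC tool set itself is an ArcGIS plugin, and the coordinates below are made up).

```python
# Hedged sketch: intersect two back-projected secondary-crater trajectories
# in the plane to approximate a primary impact location.
import numpy as np

def intersect(p1, d1, p2, d2):
    """Intersect lines p1 + t*d1 and p2 + s*d2; returns the point or None."""
    A = np.column_stack([d1, -d2])
    if abs(np.linalg.det(A)) < 1e-12:      # parallel trajectories: no solution
        return None
    t, _ = np.linalg.solve(A, p2 - p1)
    return p1 + t * d1

print(intersect(np.array([0.0, 0.0]), np.array([1.0, 1.0]),
                np.array([4.0, 0.0]), np.array([-1.0, 1.0])))   # -> [2. 2.]
```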
A Robust Shape Reconstruction Method for Facial Feature Point Detection.
Tan, Shuqiu; Chen, Dongyi; Guo, Chenggang; Huang, Zhiqi
2017-01-01
Facial feature point detection has seen great research advances in recent years. Numerous methods have been developed and applied in practical face analysis systems. However, it remains quite a challenging task because of the large variability in expressions and gestures and the presence of occlusions in real-world photographs. In this paper, we present a robust sparse reconstruction method for face alignment problems. Instead of a direct regression between the feature space and the shape space, the concept of shape increment reconstruction is introduced. Moreover, a set of coupled overcomplete dictionaries, termed the shape increment dictionary and the local appearance dictionary, are learned in a regressive manner to select robust features and fit shape increments. Additionally, to make the learned model more generalized, we select the best-matched parameter set through extensive validation tests. Experimental results on three public datasets demonstrate that the proposed method achieves better robustness than state-of-the-art methods.
Chang, Kaowen Grace; Chien, Hungju
2017-01-01
Studies have suggested that visiting and viewing landscaping at hospitals accelerates patients' recovery from surgery and helps staff recover from mental fatigue. To plan and construct such landscapes, we need to unravel which landscape features are desirable to different groups so that the space can benefit a wide range of hospital users. Using discrete choice modeling, we developed experimental choice sets to investigate how landscape features influence the visitations of different users in a large regional hospital in Taiwan. The empirical survey provides quantitative estimates of the influence of each landscape feature on four user groups, including patients, caregivers, staff, and neighborhood residents. Our findings suggest that different types of features promote visits from specific user groups. Landscape features facilitating physical activities effectively encourage visits across user groups, especially caregivers and staff. Patients in this study express a strong need for contact with nature. The nearby community favors features designed for children's play and family activities. People across user groups value features that provide a comfortable, mitigated microclimate, such as shelter. Study implications and limitations are also discussed. Our study provides information essential for creating a better healing environment in a hospital setting. PMID:28678168
Fahimi, Fatemeh; Guan, Cuntai; Wooi Boon Goh; Kai Keng Ang; Choon Guan Lim; Tih Shih Lee
2017-07-01
Measuring attention from electroencephalogram (EEG) has found applications in the treatment of Attention Deficit Hyperactivity Disorder (ADHD). It is of great interest to understand which features in EEG are most representative of attention. Intensive research has been done in the past, and it has been shown that frequency band powers and their ratios are effective features for detecting attention. However, there are still unanswered questions, such as: which features in EEG are most discriminative between attentive and non-attentive states? Are these features common among all subjects, or are they subject-specific and must be optimized for each subject? Using Mutual Information (MI) to perform subject-specific feature selection on a large data set including 120 ADHD children, we found that besides the theta-beta ratio (TBR), which is commonly used in attention detection and neurofeedback, the relative beta power and the theta/(alpha+beta) ratio (TBAR) are equally significant and informative for attention detection. Interestingly, we found that the relative theta power (which is also commonly used) may not carry sufficient discriminative information by itself (it is informative for only 3.26% of ADHD children). We have also demonstrated that although these features (relative beta power, TBR, and TBAR) are the most important measures for detecting attention on average, different subjects have different sets of most discriminative features.
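The band-power features named above (relative beta, TBR, TBAR) are straightforward to compute from a power spectral density; a hedged sketch using SciPy's Welch estimator follows (the random signal and sampling rate are stand-ins for real EEG).

```python
# Hedged sketch: relative beta power, TBR and TBAR from one EEG channel.
import numpy as np
from scipy.signal import welch

fs = 256                                  # assumed sampling rate (Hz)
eeg = np.random.randn(30 * fs)            # 30 s of fake single-channel EEG

f, psd = welch(eeg, fs=fs, nperseg=2 * fs)

def band_power(lo, hi):
    return psd[(f >= lo) & (f < hi)].sum()

theta, alpha, beta = band_power(4, 8), band_power(8, 13), band_power(13, 30)
print("relative beta:", beta / band_power(1, 45))
print("TBR:", theta / beta)
print("TBAR:", theta / (alpha + beta))
```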
Jimeno Yepes, Antonio
2017-09-01
Word sense disambiguation helps identify the proper sense of ambiguous words in text. With large terminologies such as the UMLS Metathesaurus, ambiguities appear and highly effective disambiguation methods are required. Supervised learning methods are one of the approaches used to perform disambiguation. Features extracted from the context of an ambiguous word are used to identify its proper sense. The types of features have an impact on machine learning methods and thus affect disambiguation performance. In this work, we have evaluated several types of features derived from the context of the ambiguous word, and we have also explored more global features derived from MEDLINE using word embeddings. Results show that word embeddings improve the performance of more traditional features and also enable recurrent neural network classifiers based on Long Short-Term Memory (LSTM) nodes. The combination of unigrams and word embeddings with an SVM sets a new state-of-the-art performance with a macro accuracy of 95.97 on the MSH WSD data set. Copyright © 2017 Elsevier Inc. All rights reserved.
An ensemble method for extracting adverse drug events from social media.
Liu, Jing; Zhao, Songzheng; Zhang, Xiaodi
2016-06-01
Because adverse drug events (ADEs) are a serious health problem and a leading cause of death, it is of vital importance to identify them correctly and in a timely manner. With the development of Web 2.0, social media has become a large data source for information on ADEs. The objective of this study is to develop a relation extraction system that uses natural language processing techniques to effectively distinguish between ADEs and non-ADEs in informal text on social media. We develop a feature-based approach that utilizes various lexical, syntactic, and semantic features. Information-gain-based feature selection is performed to address high-dimensional features. Then, we evaluate the effectiveness of four well-known kernel-based approaches (i.e., subset tree kernel, tree kernel, shortest dependency path kernel, and all-paths graph kernel) and several ensembles that are generated by adopting different combination methods (i.e., majority voting, weighted averaging, and stacked generalization). All of the approaches are tested using three data sets: two health-related discussion forums and one general social networking site (i.e., Twitter). When investigating the contribution of each feature subset, the feature-based approach attains the best area under the receiver operating characteristic curve (AUC) values, which are 78.6%, 72.2%, and 79.2% on the three data sets. When individual methods are used, we attain the best AUC values of 82.1%, 73.2%, and 77.0% using the subset tree kernel, shortest dependency path kernel, and feature-based approach on the three data sets, respectively. When using classifier ensembles, we achieve the best AUC values of 84.5%, 77.3%, and 84.5% on the three data sets, outperforming the baselines. Our experimental results indicate that ADE extraction from social media can benefit from feature selection. With respect to the effectiveness of different feature subsets, lexical features and semantic features can enhance the ADE extraction capability. Kernel-based approaches, which avoid the feature sparsity issue, are well suited to the ADE extraction problem. Combining different individual classifiers using suitable combination methods can further enhance ADE extraction effectiveness. Copyright © 2016 Elsevier B.V. All rights reserved.
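Of the combination methods listed above, majority voting and weighted probability averaging are the simplest to reproduce; a hedged scikit-learn sketch follows, with synthetic features standing in for the lexical/syntactic/semantic feature sets.

```python
# Hedged sketch: a soft-voting ensemble over heterogeneous classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=40, random_state=0)

vote = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("svm", SVC(probability=True)),
                ("rf", RandomForestClassifier(n_estimators=100))],
    voting="soft")                        # average predicted probabilities
print("mean AUC:", cross_val_score(vote, X, y, cv=5, scoring="roc_auc").mean())
```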
2011-01-01
Background: Cardiotocography (CTG) is the most widely used tool for fetal surveillance. The visual analysis of fetal heart rate (FHR) traces largely depends on the expertise and experience of the clinician involved. Several approaches have been proposed for the effective interpretation of FHR. In this paper, a new approach for FHR feature extraction based on empirical mode decomposition (EMD) is proposed, which was used along with a support vector machine (SVM) for the classification of FHR recordings as 'normal' or 'at risk'. Methods: The FHR signals were recorded from 15 subjects at a sampling rate of 4 Hz, and a dataset consisting of 90 randomly selected records of 20 minutes' duration was formed from these. All records were labelled as 'normal' or 'at risk' by two experienced obstetricians. A training set was formed from 60 records, with the remaining 30 left as the testing set. The standard deviations of the EMD components are input as features to the SVM to classify FHR samples. Results: For the training set, a five-fold cross-validation test resulted in an accuracy of 86%, whereas the overall geometric mean of sensitivity and specificity was 94.8%. The Kappa value for the training set was 0.923. Application of the proposed method to the testing set (30 records) resulted in a geometric mean of 81.5%. The Kappa value for the testing set was 0.684. Conclusions: Based on the overall performance of the system, it can be stated that the proposed methodology is a promising new approach for the feature extraction and classification of FHR signals. PMID:21244712
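The feature pipeline, EMD components' standard deviations fed to an SVM, can be sketched as follows; this assumes the third-party PyEMD package (pip name EMD-signal) and uses synthetic traces in place of real FHR recordings.

```python
# Hedged sketch: EMD-based features (IMF standard deviations) for an SVM.
# Assumes the PyEMD package; signals are random stand-ins for 20-min FHR at 4 Hz.
import numpy as np
from PyEMD import EMD
from sklearn.svm import SVC

def emd_features(signal, n_imfs=5):
    imfs = EMD().emd(signal, max_imf=n_imfs)       # rows: IMFs (+ residue)
    sd = imfs.std(axis=1)
    out = np.zeros(n_imfs + 1)                     # fixed-length feature vector
    out[: min(len(sd), n_imfs + 1)] = sd[: n_imfs + 1]
    return out

rng = np.random.default_rng(0)
X = np.array([emd_features(rng.standard_normal(4 * 60 * 20))
              for _ in range(20)])
y = rng.integers(0, 2, size=20)                    # fake 'normal'/'at risk' labels
print("training accuracy:", SVC().fit(X, y).score(X, y))
```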
Mapping Surface Features Produced by an Active Landslide
NASA Astrophysics Data System (ADS)
Parise, Mario; Gueguen, Erwan; Vennari, Carmela
2016-10-01
A large landslide reactivated in December 2013 at Montescaglioso, southern Italy, after 56 hours of rainfall. The landslide disrupted over 500 m of a freeway and affected a few warehouses, a supermarket, and private homes. After the event, field surveys were performed, aided by visual analysis of terrestrial and helicopter photographs, to compile a map of the surface deformations. The geomorphological features mapped included single fractures, sets of fractures, tension cracks, trenches, and pressure ridges. In this paper we present the methodology used, the map obtained through the intensive field work, and a discussion of the main surface features produced by the landslide.
Saliency image of feature building for image quality assessment
NASA Astrophysics Data System (ADS)
Ju, Xinuo; Sun, Jiyin; Wang, Peng
2011-11-01
The purpose and method of image quality assessment are quite different for automatic target recognition (ATR) than for traditional applications. Local invariant feature detectors, mainly including corner detectors, blob detectors, and region detectors, are widely applied for ATR. In this paper, a feature saliency model is proposed to evaluate the feasibility of ATR. The first step consists of computing the first-order derivatives in the horizontal and vertical orientations and computing DoG maps at different scales. Next, feature saliency images are built based on the auto-correlation matrix at each scale. Then, the feature saliency images from the different scales are amalgamated. Experiments were performed on a large test set, including infrared images and optical images, and the results showed that the salient regions computed by this model were consistent with the real feature regions computed by most local invariant feature extraction algorithms.
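The DoG-map step of the model is easy to reproduce; a hedged SciPy sketch follows (sigmas and the random test image are illustrative choices, not the paper's settings).

```python
# Hedged sketch: difference-of-Gaussian (DoG) response maps at several scales.
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_maps(image, sigmas=(1.0, 2.0, 4.0), k=1.6):
    """One DoG map per scale: blur at k*sigma minus blur at sigma."""
    return [gaussian_filter(image, k * s) - gaussian_filter(image, s)
            for s in sigmas]

image = np.random.rand(128, 128)          # stand-in for an IR/optical frame
for s, m in zip((1.0, 2.0, 4.0), dog_maps(image)):
    print(f"sigma={s}: response range [{m.min():.3f}, {m.max():.3f}]")
```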
A statistical parts-based appearance model of inter-subject variability.
Toews, Matthew; Collins, D Louis; Arbel, Tal
2006-01-01
In this article, we present a general statistical parts-based model for representing the appearance of an image set, applied to the problem of inter-subject MR brain image matching. In contrast with global image representations such as active appearance models, the parts-based model consists of a collection of localized image parts whose appearance, geometry and occurrence frequency are quantified statistically. The parts-based approach explicitly addresses the case where one-to-one correspondence does not exist between subjects due to anatomical differences, as parts are not expected to occur in all subjects. The model can be learned automatically, discovering structures that appear with statistical regularity in a large set of subject images, and can be robustly fit to new images, all in the presence of significant inter-subject variability. As parts are derived from generic scale-invariant features, the framework can be applied in a wide variety of image contexts, in order to study the commonality of anatomical parts or to group subjects according to the parts they share. Experimentation shows that a parts-based model can be learned from a large set of MR brain images, and used to determine parts that are common within the group of subjects. Preliminary results indicate that the model can be used to automatically identify distinctive features for inter-subject image registration despite large changes in appearance.
Deep learning with non-medical training used for chest pathology identification
NASA Astrophysics Data System (ADS)
Bar, Yaniv; Diamant, Idit; Wolf, Lior; Greenspan, Hayit
2015-03-01
In this work, we examine the strength of deep learning approaches for pathology detection in chest radiograph data. Convolutional neural network (CNN) deep architecture classification approaches have gained popularity due to their ability to learn mid- and high-level image representations. We explore the ability of a CNN to identify different types of pathologies in chest x-ray images. Moreover, since very large training sets are generally not available in the medical domain, we explore the feasibility of using a deep learning approach based on non-medical learning. We tested our algorithm on a dataset of 93 images. We use a CNN that was trained with ImageNet, a well-known large-scale non-medical image database. The best performance was achieved using a combination of features extracted from the CNN and a set of low-level features. We obtained an area under the curve (AUC) of 0.93 for right pleural effusion detection, 0.89 for enlarged heart detection, and 0.79 for classification between healthy and abnormal chest x-rays, where all pathologies are combined into one large class. This is a first-of-its-kind experiment that shows that deep learning with large-scale non-medical image databases may be sufficient for general medical image recognition tasks.
Semi-Supervised Geographical Feature Detection
NASA Astrophysics Data System (ADS)
Yu, H.; Yu, L.; Kuo, K. S.
2016-12-01
Extracting and tracking geographical features is a fundamental requirement in many geoscience fields. However, this operation has become an increasingly challenging task for domain scientists when tackling large amounts of geoscience data. Although domain scientists may have a relatively clear definition of features, it is difficult to capture the presence of features in an accurate and efficient fashion. We propose a semi-supervised approach to address large-scale geographical feature detection. Our approach has two main components. First, we represent heterogeneous geoscience data in a unified high-dimensional space, which facilitates evaluating the similarity of data points with respect to geolocation, time, and variable values. We characterize the data with these measures and use a set of hash functions to parameterize the initial knowledge of the data. Second, for any user query, our approach can automatically extract initial results based on the hash functions. To improve querying accuracy, our approach provides a visualization interface to display the query results and allow users to interactively explore and refine them. The user feedback is used to enhance our knowledge base in an iterative manner. In our implementation, we use high-performance computing techniques to accelerate the construction of the hash functions. Our design facilitates a parallelization scheme for feature detection and extraction, which is a traditionally challenging problem for large-scale data. We evaluate our approach and demonstrate its effectiveness using both synthetic and real-world datasets.
Accurate prediction of personalized olfactory perception from large-scale chemoinformatic features.
Li, Hongyang; Panwar, Bharat; Omenn, Gilbert S; Guan, Yuanfang
2018-02-01
The olfactory stimulus-percept problem has been studied for more than a century, yet it is still hard to precisely predict the odor given the large-scale chemoinformatic features of an odorant molecule. A major challenge is that the perceived qualities vary greatly among individuals due to different genetic and cultural backgrounds. Moreover, the combinatorial interactions between multiple odorant receptors and diverse molecules significantly complicate olfaction prediction. Many attempts have been made to establish structure-odor relationships for intensity and pleasantness, but no models are available to predict the personalized multi-odor attributes of molecules. In this study, we describe our winning algorithm for predicting individual and population perceptual responses to various odorants in the DREAM Olfaction Prediction Challenge. We find that a random forest model consisting of multiple decision trees is well suited to this prediction problem, given the large feature spaces and high variability of perceptual ratings among individuals. Integrating both population and individual perceptions into our model effectively reduces the influence of noise and outliers. By analyzing the importance of each chemical feature, we find that a small set of low- and nondegenerative features is sufficient for accurate prediction. Our random forest model successfully predicts personalized odor attributes of structurally diverse molecules. This model, together with the top discriminative features, has the potential to extend our understanding of olfactory perception mechanisms and provide an alternative for rational odorant design.
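A hedged sketch of the model class described, random forest regression from high-dimensional descriptors to a perceptual rating, with synthetic data in place of the DREAM challenge set:

```python
# Hedged sketch: random forest regression on many chemoinformatic-style
# descriptors; feature importances expose the small informative subset.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=1000, n_informative=30,
                       noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)
print("test R^2:", rf.score(X_te, y_te))
print("top-5 importances:", sorted(rf.feature_importances_)[-5:])
```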
Lê Cao, Kim-Anh; Boitard, Simon; Besse, Philippe
2011-06-22
Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits. A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework. sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.
Method and system for data clustering for very large databases
NASA Technical Reports Server (NTRS)
Livny, Miron (Inventor); Zhang, Tian (Inventor); Ramakrishnan, Raghu (Inventor)
1998-01-01
Multi-dimensional data contained in very large databases is efficiently and accurately clustered to determine patterns therein and extract useful information from such patterns. Conventional computer processors may be used which have limited memory capacity and conventional operating speed, allowing massive data sets to be processed in a reasonable time and with reasonable computer resources. The clustering process is organized using a clustering feature tree structure wherein each clustering feature comprises the number of data points in the cluster, the linear sum of the data points in the cluster, and the square sum of the data points in the cluster. A dense region of data points is treated collectively as a single cluster, and points in sparsely occupied regions can be treated as outliers and removed from the clustering feature tree. The clustering can be carried out continuously with new data points being received and processed, and with the clustering feature tree being restructured as necessary to accommodate the information from the newly received data points.
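The clustering feature triple defined above, (N, LS, SS), and the resulting tree are implemented in scikit-learn's Birch estimator; a hedged sketch of both the CF arithmetic and the estimator follows (synthetic data, illustrative parameters).

```python
# Hedged sketch: the CF triple (N, LS, SS) and BIRCH clustering via sklearn.
import numpy as np
from sklearn.cluster import Birch

X = np.random.default_rng(0).normal(size=(10000, 2))

N, LS, SS = len(X), X.sum(axis=0), (X ** 2).sum()  # the clustering feature
centroid = LS / N
radius = np.sqrt(SS / N - (centroid ** 2).sum())   # RMS distance to centroid
print("CF radius:", radius)

# Merging two clusters is just adding their CF triples; Birch maintains a
# tree of such entries and can treat sparse points as removable outliers.
labels = Birch(threshold=0.5, n_clusters=3).fit_predict(X)
print("cluster sizes:", np.bincount(labels))
```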
Feature relevance assessment for the semantic interpretation of 3D point cloud data
NASA Astrophysics Data System (ADS)
Weinmann, M.; Jutzi, B.; Mallet, C.
2013-10-01
The automatic analysis of large 3D point clouds represents a crucial task in photogrammetry, remote sensing and computer vision. In this paper, we propose a new methodology for the semantic interpretation of such point clouds which involves feature relevance assessment in order to reduce both processing time and memory consumption. Given a standard benchmark dataset with 1.3 million 3D points, we first extract a set of 21 geometric 3D and 2D features. Subsequently, we apply a classifier-independent ranking procedure which involves a general relevance metric in order to derive compact and robust subsets of versatile features which are generally applicable for a large variety of subsequent tasks. This metric is based on 7 different feature selection strategies and thus addresses different intrinsic properties of the given data. For the example of semantically interpreting 3D point cloud data, we demonstrate the great potential of smaller subsets consisting of only the most relevant features with 4 different state-of-the-art classifiers. The results reveal that, instead of including as many features as possible in order to compensate for lack of knowledge, a crucial task such as scene interpretation can be carried out with only a few versatile features and even improved accuracy.
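Typical members of such a geometric feature set are the eigenvalue-based measures of a point's local neighborhood; a hedged sketch follows (standard formulas, not necessarily the paper's exact 21 features).

```python
# Hedged sketch: eigenvalue-based 3D features for one point neighborhood.
import numpy as np

def covariance_features(neighbors):
    """neighbors: (k, 3) array of 3D points around a query point."""
    l1, l2, l3 = sorted(np.linalg.eigvalsh(np.cov(neighbors.T)), reverse=True)
    return {"linearity": (l1 - l2) / l1,
            "planarity": (l2 - l3) / l1,
            "scattering": l3 / l1}

pts = np.random.default_rng(0).normal(size=(50, 3)) * [5.0, 1.0, 0.1]
print(covariance_features(pts))    # an elongated, flat neighborhood
```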
Data Downloads | ECHO | US EPA
The ECHO website with its facility search features is designed to provide easy access to EPA's compliance and enforcement data with customizable onscreen display and download. For those with larger data needs, ECHO has several types of data sets available. These large data sets may be of particular use to developers, programmers, academics, and analysts. The data available here can be downloaded and used for many different functions and are intended to meet a wide range of data retrieval needs.
Chaibub Neto, Elias; Bare, J. Christopher; Margolin, Adam A.
2014-01-01
New algorithms are continuously proposed in computational biology. Performance evaluation of novel methods is important in practice. Nonetheless, the field experiences a lack of rigorous methodology aimed at systematically and objectively evaluating competing approaches. Simulation studies are frequently used to show that a particular method outperforms another. Oftentimes, however, simulation studies are not well designed, and it is hard to characterize the particular conditions under which different methods perform better. In this paper we propose the adoption of well-established techniques in the design of computer and physical experiments for developing effective simulation studies. By following best practices in the planning of experiments, we are better able to understand the strengths and weaknesses of competing algorithms, leading to more informed decisions about which method to use for a particular task. We illustrate the application of our proposed simulation framework with a detailed comparison of the ridge-regression, lasso and elastic-net algorithms in a large-scale study investigating the effects on predictive performance of sample size, number of features, true model sparsity, signal-to-noise ratio, and feature correlation, in situations where the number of covariates is usually much larger than sample size. Analysis of data sets containing tens of thousands of features but only a few hundred samples is nowadays routine in computational biology, where “omics” features such as gene expression, copy number variation and sequence data are frequently used in the predictive modeling of complex phenotypes such as anticancer drug response. The penalized regression approaches investigated in this study are popular choices in this setting, and our simulations corroborate well-established results concerning the conditions under which each one of these methods is expected to perform best, while providing several novel insights. PMID:25289666
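One cell of such a simulation grid can be reproduced in a few lines; the sketch below compares ridge, lasso, and elastic net on a sparse p >> n synthetic problem (all generator settings are illustrative assumptions).

```python
# Hedged sketch: penalized regression comparison on sparse, high-dimensional data.
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=10000, n_informative=20,
                       noise=10.0, random_state=0)     # p >> n, sparse truth

for name, model in [("ridge", Ridge(alpha=1.0)),
                    ("lasso", Lasso(alpha=1.0, max_iter=10000)),
                    ("enet", ElasticNet(alpha=1.0, l1_ratio=0.5, max_iter=10000))]:
    print(name, cross_val_score(model, X, y, cv=5, scoring="r2").mean())
```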
Discovering semantic features in the literature: a foundation for building functional associations
Chagoyen, Monica; Carmona-Saez, Pedro; Shatkay, Hagit; Carazo, Jose M; Pascual-Montano, Alberto
2006-01-01
Background Experimental techniques such as DNA microarray, serial analysis of gene expression (SAGE) and mass spectrometry proteomics, among others, are generating large amounts of data related to genes and proteins at different levels. As in any other experimental approach, it is necessary to analyze these data in the context of previously known information about the biological entities under study. The literature is a particularly valuable source of information for experiment validation and interpretation. Therefore, the development of automated text mining tools to assist in such interpretation is one of the main challenges in current bioinformatics research. Results We present a method to create literature profiles for large sets of genes or proteins based on common semantic features extracted from a corpus of relevant documents. These profiles can be used to establish pair-wise similarities among genes, utilized in gene/protein classification or can be even combined with experimental measurements. Semantic features can be used by researchers to facilitate the understanding of the commonalities indicated by experimental results. Our approach is based on non-negative matrix factorization (NMF), a machine-learning algorithm for data analysis, capable of identifying local patterns that characterize a subset of the data. The literature is thus used to establish putative relationships among subsets of genes or proteins and to provide coherent justification for this clustering into subsets. We demonstrate the utility of the method by applying it to two independent and vastly different sets of genes. Conclusion The presented method can create literature profiles from documents relevant to sets of genes. The representation of genes as additive linear combinations of semantic features allows for the exploration of functional associations as well as for clustering, suggesting a valuable methodology for the validation and interpretation of high-throughput experimental data. PMID:16438716
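The core decomposition is reproducible with scikit-learn's NMF; a hedged sketch on a toy gene-by-term count matrix follows (the matrix and component count are invented placeholders, not the paper's corpus).

```python
# Hedged sketch: NMF literature profiles; genes become additive mixes of
# a few "semantic features" (rows of H), enabling pairwise similarity.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
gene_term = rng.poisson(0.3, size=(30, 200)).astype(float)  # toy counts

nmf = NMF(n_components=5, init="nndsvda", max_iter=500)
W = nmf.fit_transform(gene_term)   # gene loadings on semantic features
H = nmf.components_                # term weights of each semantic feature

Wn = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
print("similarity of genes 0 and 1:", Wn[0] @ Wn[1])
```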
An Optimization-Based Method for Feature Ranking in Nonlinear Regression Problems.
Bravi, Luca; Piccialli, Veronica; Sciandrone, Marco
2017-04-01
In this paper, we consider the feature ranking problem, where, given a set of training instances, the task is to associate a score with the features in order to assess their relevance. Feature ranking is a very important tool for decision support systems, and may be used as an auxiliary step of feature selection to reduce the high dimensionality of real-world data. We focus on regression problems by assuming that the process underlying the generated data can be approximated by a continuous function (for instance, a feedforward neural network). We formally state the notion of relevance of a feature by introducing a minimum zero-norm inversion problem of a neural network, which is a nonsmooth, constrained optimization problem. We employ a concave approximation of the zero-norm function, and we define a smooth, global optimization problem to be solved in order to assess the relevance of the features. We present the new feature ranking method based on the solution of instances of the global optimization problem depending on the available training data. Computational experiments on both artificial and real data sets show that the proposed feature ranking method is a valid alternative to existing methods in terms of effectiveness. The obtained results also show that the method is costly in terms of CPU time, which may be a limitation in the solution of large-dimensional problems.
The association of color memory and the enumeration of multiple spatially overlapping sets.
Poltoratski, Sonia; Xu, Yaoda
2013-07-09
Using dot displays, Halberda, Sires, and Feigenson (2006) showed that observers could simultaneously encode the numerosity of two spatially overlapping sets and the superset of all items at a glance. With the brief display and the masking used in Halberda et al., the task required observers to encode the colors of each set in order to select and enumerate all the dots in that set. As such, the observed capacity limit for set enumeration could reflect a limit in visual short-term memory (VSTM) capacity for the set color rather than a limit in set enumeration per se. Here, we largely replicated Halberda et al. and found successful enumeration of approximately two sets (the superset was not probed). We also found that only about two and a half colors could be remembered from the colored dot displays whether or not the enumeration task was performed concurrently with the color VSTM task. Because observers must remember the color of a set prior to enumerating it, the under-three-item VSTM capacity for color necessarily dictates that set enumeration capacity in this paradigm could not exceed two sets. Thus, the ability to enumerate multiple spatially overlapping sets is likely limited by VSTM capacity to retain the discriminating feature of these sets. This relationship suggests that the capacity for set enumeration cannot be considered independently from the capacity for the set's defining features.
Receptive fields selection for binary feature description.
Fan, Bin; Kong, Qingqun; Trzcinski, Tomasz; Wang, Zhiheng; Pan, Chunhong; Fua, Pascal
2014-06-01
Feature description of local image patches is widely used in computer vision. While the conventional way to design local descriptors is based on expert experience and knowledge, learning-based methods for designing local descriptors have become more and more popular because of their good performance and data-driven property. This paper proposes a novel data-driven method for designing binary feature descriptors, which we call the receptive fields descriptor (RFD). Technically, RFD is constructed by thresholding responses of a set of receptive fields, which are selected from a large number of candidates according to their distinctiveness and correlations in a greedy way. Using two different kinds of receptive fields (namely rectangular pooling area and Gaussian pooling area) for selection, we obtain two binary descriptors, RFDR and RFDG, accordingly. Image matching experiments on the well-known patch data set and Oxford data set demonstrate that RFD significantly outperforms the state-of-the-art binary descriptors, and is comparable with the best float-valued descriptors at a fraction of the processing time. Finally, experiments on object recognition tasks confirm that both RFDR and RFDG successfully bridge the performance gap between binary descriptors and their floating-point competitors.
Joint classification and contour extraction of large 3D point clouds
NASA Astrophysics Data System (ADS)
Hackel, Timo; Wegner, Jan D.; Schindler, Konrad
2017-08-01
We present an effective and efficient method for point-wise semantic classification and extraction of object contours of large-scale 3D point clouds. What makes point cloud interpretation challenging is the sheer size of several millions of points per scan and the non-grid, sparse, and uneven distribution of points. Standard image processing tools like texture filters, for example, cannot handle such data efficiently, which calls for dedicated point cloud labeling methods. It turns out that one of the major drivers for efficient computation and handling of strong variations in point density is a careful formulation of per-point neighborhoods at multiple scales. This allows, both, to define an expressive feature set and to extract topologically meaningful object contours. Semantic classification and contour extraction are interlaced problems. Point-wise semantic classification enables extracting a meaningful candidate set of contour points while contours help generating a rich feature representation that benefits point-wise classification. These methods are tailored to have fast run time and small memory footprint for processing large-scale, unstructured, and inhomogeneous point clouds, while still achieving high classification accuracy. We evaluate our methods on the semantic3d.net benchmark for terrestrial laser scans with more than 10^9 points.
Breaking the polar-nonpolar division in solvation free energy prediction.
Wang, Bao; Wang, Chengzhang; Wu, Kedi; Wei, Guo-Wei
2018-02-05
Implicit solvent models divide solvation free energies into polar and nonpolar additive contributions, whereas polar and nonpolar interactions are inseparable and nonadditive. We present a feature functional theory (FFT) framework to break this ad hoc division. The essential ideas of FFT are as follows: (i) representability assumption: there exists a microscopic feature vector that can uniquely characterize and distinguish one molecule from another; (ii) feature-function relationship assumption: the macroscopic features of a molecule, including its solvation free energy, are functionals of the microscopic feature vectors; and (iii) similarity assumption: molecules with similar microscopic features have similar macroscopic properties, such as solvation free energies. Based on these assumptions, solvation free energy prediction is carried out in the following protocol. First, we construct a molecular microscopic feature vector that is efficient in characterizing the solvation process using quantum mechanics and Poisson-Boltzmann theory. Microscopic feature vectors are combined with macroscopic features, that is, physical observables, to form extended feature vectors. Additionally, we partition a solvation dataset into queries according to molecular compositions. Moreover, for each target molecule, we adopt a machine learning algorithm for its nearest neighbor search, based on the selected microscopic feature vectors. Finally, from the extended feature vectors of the obtained nearest neighbors, we construct a functional of solvation free energy, which is employed to predict the solvation free energy of the target molecule. The proposed FFT model has been extensively validated via a large dataset of 668 molecules. The leave-one-out test gives an optimal root-mean-square error (RMSE) of 1.05 kcal/mol. FFT predictions of the SAMPL0, SAMPL1, SAMPL2, SAMPL3, and SAMPL4 challenge sets deliver RMSEs of 0.61, 1.86, 1.64, 0.86, and 1.14 kcal/mol, respectively. Using a test set of 94 molecules and its associated training set, the present approach was carefully compared with a classic solvation model based on weighted solvent accessible surface area. © 2017 Wiley Periodicals, Inc.
A generalized geologic map of Mars.
NASA Technical Reports Server (NTRS)
Carr, M. H.; Masursky, H.; Saunders, R. S.
1973-01-01
A geologic map of Mars has been constructed largely on the basis of photographic evidence. Four classes of units are recognized: (1) primitive cratered terrain, (2) sparsely cratered volcanic eolian plains, (3) circular radially symmetric volcanic constructs such as shield volcanoes, domes, and craters, and (4) tectonic erosional units such as chaotic and channel deposits. Grabens are the main structural features; compressional and strike slip features are almost completely absent. Most grabens are part of a set radial to the main volcanic area, Tharsis.
Proportional plus integral MIMO controller for regulation and tracking with anti-wind-up features
DOE Office of Scientific and Technical Information (OSTI.GOV)
Puleston, P.F.; Mantz, R.J.
1993-11-01
A proportional plus integral matrix control structure for MIMO systems is proposed. Based on a standard optimal control structure with integral action, it permits a greater degree of independence of the design and tuning of the regulating and tracking features, without considerably increasing the controller complexity. Fast recovery from load disturbances is achieved, while large overshoots associated with set-point changes and reset wind-up problems can be reduced. A simple effective procedure for practical tuning is introduced.
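The anti-wind-up idea generalizes the classic single-loop back-calculation scheme; a hedged scalar sketch follows (a toy first-order plant and made-up gains, not the paper's MIMO controller).

```python
# Hedged sketch: PI control with back-calculation anti-windup on a toy plant.
def pi_step(err, integ, kp=2.0, ki=1.0, kb=1.0, dt=0.01, u_max=1.0):
    """One control sample; integ is the integrator state."""
    u_unsat = kp * err + ki * integ
    u = max(-u_max, min(u_max, u_unsat))          # actuator saturation
    integ += dt * (err + kb * (u - u_unsat))      # bleed integrator if saturated
    return u, integ

integ, y = 0.0, 0.0
for _ in range(500):                              # first-order plant: dy = (u - y) dt
    u, integ = pi_step(1.0 - y, integ)            # set-point = 1.0
    y += 0.01 * (u - y)
print("output after 5 s:", round(y, 3))
```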
Automatic alignment of individual peaks in large high-resolution spectral data sets
NASA Astrophysics Data System (ADS)
Stoyanova, Radka; Nicholls, Andrew W.; Nicholson, Jeremy K.; Lindon, John C.; Brown, Truman R.
2004-10-01
Pattern recognition techniques are effective tools for reducing the information contained in large spectral data sets to a much smaller number of significant features which can then be used to make interpretations about the chemical or biochemical system under study. Often the effectiveness of such approaches is impeded by experimental and instrument induced variations in the position, phase, and line width of the spectral peaks. Although characterizing the cause and magnitude of these fluctuations could be important in its own right (pH-induced NMR chemical shift changes, for example) in general they obscure the process of pattern discovery. One major area of application is the use of large databases of 1H NMR spectra of biofluids such as urine for investigating perturbations in metabolic profiles caused by drugs or disease, a process now termed metabonomics. Frequency shifts of individual peaks are the dominant source of such unwanted variations in this type of data. In this paper, an automatic procedure for aligning the individual peaks in the data set is described and evaluated. The proposed method will be vital for the efficient and automatic analysis of large metabonomic data sets and should also be applicable to other types of data.
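A much-simplified version of the alignment idea, finding the integer shift that best overlaps a spectrum with a reference via cross-correlation, can be sketched as follows (synthetic Gaussian peaks; the published procedure aligns individual peaks, not whole spectra).

```python
# Hedged sketch: estimate and remove a peak-position shift by cross-correlation.
import numpy as np

x = np.linspace(0, 10, 1000)
ref = np.exp(-((x - 5.0) ** 2) / 0.01)     # reference peak at 5.0
spec = np.exp(-((x - 5.2) ** 2) / 0.01)    # same peak, shifted

corr = np.correlate(spec, ref, mode="full")
shift = corr.argmax() - (len(ref) - 1)     # lag of maximal overlap
aligned = np.roll(spec, -shift)
print("estimated shift (samples):", shift) # ~20 samples here
```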
Online feature selection with streaming features.
Wu, Xindong; Yu, Kui; Ding, Wei; Wang, Hao; Zhu, Xingquan
2013-05-01
We propose a new online feature selection framework for applications with streaming features, where knowledge of the full feature space is unknown in advance. We define streaming features as features that flow in one by one over time, whereas the number of training examples remains fixed. This is in contrast with traditional online learning methods that only deal with sequentially added observations, with little attention being paid to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In this paper, we present a novel Online Streaming Feature Selection method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also with a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.
A Statistical Analysis of Corona Topography: New Insights into Corona Formation and Evolution
NASA Technical Reports Server (NTRS)
Stofan, E. R.; Glaze, L. S.; Smrekar, S. E.; Baloga, S. M.
2003-01-01
Extensive mapping of the surface of Venus and continued analysis of Magellan data have allowed a more comprehensive survey of coronae to be conducted. Our updated corona database contains 514 features, an increase from the 326 coronae of the previous survey. We include a new set of 106 Type 2 or stealth coronae, which have a topographic rather than a fracture annulus. The large increase in the number of coronae over the 1992 survey results from several factors, including the use of the full Magellan data set and the addition of features identified as part of the systematic geologic mapping of Venus. Parameters of the population that we have analyzed to date include size and topography.
Fast 3D Surface Extraction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sewell, Christopher Meyer; Patchett, John M.; Ahrens, James P.
Ocean scientists searching for isosurfaces and/or thresholds of interest in high-resolution 3D datasets previously faced a tedious and time-consuming interactive exploration experience. PISTON research and development activities are enabling ocean scientists to rapidly and interactively explore isosurfaces and thresholds in their large data sets using a simple slider with real-time calculation and visualization of these features. Ocean scientists can now visualize more features in less time, helping them gain a better understanding of the high-resolution data sets they work with on a daily basis. Isosurface timings (512^3 grid): VTK 7.7 s, Parallel VTK (48-core) 1.3 s, PISTON OpenMP (48-core) 0.2 s, PISTON CUDA (Quadro 6000) 0.1 s.
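Outside PISTON, the same slider-driven workflow can be mimicked with scikit-image's marching cubes, re-extracting the isosurface at each new level (a hedged, CPU-only illustration, not the PISTON implementation).

```python
# Hedged sketch: isosurface extraction with skimage's marching cubes.
import numpy as np
from skimage.measure import marching_cubes

g = np.linspace(-1, 1, 64)
X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
field = np.sqrt(X**2 + Y**2 + Z**2)        # toy 64^3 scalar field

# Moving an iso-level slider would simply re-call this with a new level:
verts, faces, normals, values = marching_cubes(field, level=0.5)
print(len(verts), "vertices,", len(faces), "triangles")
```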
A new computer approach to mixed feature classification for forestry application
NASA Technical Reports Server (NTRS)
Kan, E. P.
1976-01-01
A computer approach for mapping mixed forest features (i.e., types, classes) from computer classification maps is discussed. Mixed features such as mixed softwood/hardwood stands are treated as admixtures of softwood and hardwood areas. Large-area mixed features are identified and small-area features neglected when the nominal size of a mixed feature can be specified. The computer program merges small isolated areas into surrounding areas by the iterative manipulation of the postprocessing algorithm that eliminates small connected sets. For a forestry application, computer-classified LANDSAT multispectral scanner data of the Sam Houston National Forest were used to demonstrate the proposed approach. The technique was successful in cleaning the salt-and-pepper appearance of multiclass classification maps and in mapping admixtures of softwood areas and hardwood areas. However, the computer-mapped mixed areas matched very poorly with the ground truth because of inadequate resolution and inappropriate definition of mixed features.
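The postprocessing idea, eliminating small connected sets, maps directly onto connected-component labeling; a hedged sketch with SciPy on a toy class map follows (threshold and map are invented).

```python
# Hedged sketch: remove small connected components from a binary class map.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
class_map = (rng.random((64, 64)) < 0.5).astype(int)   # toy 2-class map
min_size = 10                                          # nominal feature size

labels, n = ndimage.label(class_map)                   # components of class 1
sizes = ndimage.sum(class_map, labels, index=np.arange(1, n + 1))
small = np.isin(labels, np.where(sizes < min_size)[0] + 1)
class_map[small] = 0                  # merge small areas into surroundings
print("removed", int(small.sum()), "pixels in small components")
```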
Cascaded ensemble of convolutional neural networks and handcrafted features for mitosis detection
NASA Astrophysics Data System (ADS)
Wang, Haibo; Cruz-Roa, Angel; Basavanhally, Ajay; Gilmore, Hannah; Shih, Natalie; Feldman, Mike; Tomaszewski, John; Gonzalez, Fabio; Madabhushi, Anant
2014-03-01
Breast cancer (BCa) grading plays an important role in predicting disease aggressiveness and patient outcome. A key component of BCa grade is mitotic count, which involves quantifying the number of cells in the process of dividing (i.e. undergoing mitosis) at a specific point in time. Currently mitosis counting is done manually by a pathologist looking at multiple high power fields on a glass slide under a microscope, an extremely laborious and time consuming process. The development of computerized systems for automated detection of mitotic nuclei, while highly desirable, is confounded by the highly variable shape and appearance of mitoses. Existing methods use either handcrafted features that capture certain morphological, statistical or textural attributes of mitoses or features learned with convolutional neural networks (CNN). While handcrafted features are inspired by the domain and the particular application, the data-driven CNN models tend to be domain agnostic and attempt to learn additional feature bases that cannot be represented through any of the handcrafted features. On the other hand, CNN is computationally more complex and needs a large number of labeled training instances. Since handcrafted features attempt to model domain pertinent attributes and CNN approaches are largely unsupervised feature generation methods, there is an appeal to attempting to combine these two distinct classes of feature generation strategies to create an integrated set of attributes that can potentially outperform either class of feature extraction strategies individually. In this paper, we present a cascaded approach for mitosis detection that intelligently combines a CNN model and handcrafted features (morphology, color and texture features). By employing a light CNN model, the proposed approach is far less demanding computationally, and the cascaded strategy of combining handcrafted features and CNN-derived features enables the possibility of maximizing performance by leveraging the disconnected feature sets. Evaluation on the public ICPR12 mitosis dataset that has 226 mitoses annotated on 35 High Power Fields (HPF, x400 magnification) by several pathologists and 15 testing HPFs yielded an F-measure of 0.7345. Apart from this being the second best performance ever recorded for this MITOS dataset, our approach is faster and requires fewer computing resources compared to extant methods, making this feasible for clinical use.
Modeling Geometric-Temporal Context With Directional Pyramid Co-Occurrence for Action Recognition.
Yuan, Chunfeng; Li, Xi; Hu, Weiming; Ling, Haibin; Maybank, Stephen J
2014-02-01
In this paper, we present a new geometric-temporal representation for visual action recognition based on local spatio-temporal features. First, we propose a modified covariance descriptor under the log-Euclidean Riemannian metric to represent the spatio-temporal cuboids detected in the video sequences. Compared with previously proposed covariance descriptors, our descriptor can be measured and clustered in Euclidean space. Second, to capture the geometric-temporal contextual information, we construct a directional pyramid co-occurrence matrix (DPCM) to describe the spatio-temporal distribution of the vector-quantized local feature descriptors extracted from a video. DPCM characterizes the co-occurrence statistics of local features as well as the spatio-temporal positional relationships among the concurrent features. These statistics provide strong descriptive power for action recognition. To use DPCM for action recognition, we propose a directional pyramid co-occurrence matching kernel to measure the similarity of videos. The proposed method achieves state-of-the-art performance and improves on the recognition performance of bag-of-visual-words (BOVW) models by a large margin on six public data sets. For example, on the KTH data set, it achieves 98.78% accuracy while the BOVW approach only achieves 88.06%. On both the Weizmann and UCF CIL data sets, the highest possible accuracy of 100% is achieved.
Scalable Nearest Neighbor Algorithms for High Dimensional Data.
Muja, Marius; Lowe, David G
2014-11-01
For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.
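FLANN ships inside OpenCV; a hedged sketch of binary-descriptor matching through cv2.FlannBasedMatcher follows (the LSH index parameters are the commonly documented recipe for binary features, and the random images are stand-ins).

```python
# Hedged sketch: FLANN-based matching of binary ORB descriptors via OpenCV.
import cv2
import numpy as np

img1 = (np.random.rand(240, 320) * 255).astype(np.uint8)  # stand-in images
img2 = (np.random.rand(240, 320) * 255).astype(np.uint8)

orb = cv2.ORB_create()
_, des1 = orb.detectAndCompute(img1, None)
_, des2 = orb.detectAndCompute(img2, None)

FLANN_INDEX_LSH = 6
matcher = cv2.FlannBasedMatcher(
    dict(algorithm=FLANN_INDEX_LSH, table_number=6, key_size=12,
         multi_probe_level=1), {})
if des1 is not None and des2 is not None:
    matches = matcher.knnMatch(des1, des2, k=2)          # 2-NN per descriptor
    print(len(matches), "candidate matches")
```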
Genetic basis of climatic adaptation in scots pine by bayesian quantitative trait locus analysis.
Hurme, P; Sillanpää, M J; Arjas, E; Repo, T; Savolainen, O
2000-01-01
We examined the genetic basis of large adaptive differences in timing of bud set and frost hardiness between natural populations of Scots pine. As a mapping population, we considered an "open-pollinated backcross" progeny obtained by collecting seeds of a single F1 tree (a cross between trees from southern and northern Finland) growing in southern Finland. Due to the special features of the design (no marker information available on the grandparents or the father), we applied a Bayesian quantitative trait locus (QTL) mapping method developed previously for outcrossed offspring. We found four potential QTL for timing of bud set and seven for frost hardiness. Bayesian analyses detected more QTL than ANOVA for frost hardiness, but the opposite was true for bud set. These QTL included alleles with rather large effects, and additionally smaller QTL were supported. The largest QTL for bud set date accounted for about a fourth of the mean difference between populations. Thus, natural selection during adaptation has resulted in selection of at least some alleles of rather large effect. PMID:11063704
Schulze, H Georg; Turner, Robin F B
2013-04-01
Raman spectra often contain undesirable, randomly positioned, intense, narrow-bandwidth, positive, unidirectional spectral features generated when cosmic rays strike charge-coupled device cameras. These must be removed prior to analysis, but doing so manually is not feasible for large data sets. We developed a quick, simple, effective, semi-automated procedure to remove cosmic ray spikes from spectral data sets that contain large numbers of relatively homogeneous spectra. Some inhomogeneous spectral data sets can also be accommodated, by replacing excessively modified spectra with the originals and removing their spikes with a median filter instead, but caution is advised when processing such data sets. In addition, the technique is suitable for interpolating missing spectra or replacing aberrant spectra with good spectral estimates. The method is applied to baseline-flattened spectra and relies on fitting a third-order (or higher) polynomial through all the spectra at every wavenumber. Pixel intensities in excess of a threshold of 3× the noise standard deviation above the fit are reduced to the threshold level. Because only two parameters (with readily specified default values) might require further adjustment, the method is easily implemented for semi-automated processing of large spectral sets.
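The despiking scheme itself is a short loop: fit a polynomial through all spectra at each wavenumber and clip what exceeds the fit by 3x the noise SD; a hedged NumPy sketch with synthetic spectra follows.

```python
# Hedged sketch: per-wavenumber polynomial fit across spectra, then clipping
# intensities above fit + 3*sigma (synthetic data with one injected spike).
import numpy as np

rng = np.random.default_rng(0)
n_spec, n_wn = 100, 500
spectra = rng.normal(0.0, 0.05, (n_spec, n_wn)) + np.linspace(0, 1, n_wn)
spectra[17, 250] += 20.0                      # a cosmic-ray spike

idx = np.arange(n_spec)
cleaned = spectra.copy()
for j in range(n_wn):                         # third-order fit per wavenumber
    fit = np.polyval(np.polyfit(idx, spectra[:, j], 3), idx)
    sigma = np.std(spectra[:, j] - fit)
    thresh = fit + 3 * sigma
    mask = spectra[:, j] > thresh
    cleaned[mask, j] = thresh[mask]
print("spike reduced to:", round(cleaned[17, 250], 3))
```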
Applicability of PM3 to transphosphorylation reaction path: Toward designing a minimal ribozyme
NASA Technical Reports Server (NTRS)
Manchester, John I.; Shibata, Masayuki; Setlik, Robert F.; Ornstein, Rick L.; Rein, Robert
1993-01-01
A growing body of evidence shows that RNA can catalyze many of the reactions necessary both for replication of genetic material and for the possible transition into the modern protein-based world. However, contemporary ribozymes are too large to have self-assembled from a prebiotic oligonucleotide pool. Still, it is likely that the major features of the earliest ribozymes have been preserved as molecular fossils in the catalytic RNA of today. Therefore, the search for a minimal ribozyme has been aimed at finding the necessary structural features of a modern ribozyme (Beaudry and Joyce, 1990). Both a three-dimensional model and quantum chemical calculations are required to quantitatively determine the effects of structural features of the ribozyme on the reaction it catalyzes. Previous studies of the reaction path have been conducted at the ab initio level, but these methods are limited to small models due to enormous computational requirements. Semiempirical methods have been applied to large systems in the past; here, we assess the accuracy of one such method, PM3, on a simple model of the ribozyme-catalyzed reaction, the hydrolysis of phosphoric acid. We find that the results are qualitatively similar to ab initio results using large basis sets. Therefore, PM3 is suitable for studying the reaction path of the ribozyme-catalyzed reaction.
Curve Set Feature-Based Robust and Fast Pose Estimation Algorithm
Hashimoto, Koichi
2017-01-01
Bin picking refers to picking randomly piled objects from a bin for industrial production purposes, and robotic bin picking is widely used in automated assembly lines. To achieve higher productivity, a fast and robust pose estimation algorithm is necessary to recognize and localize the randomly piled parts. This paper proposes a pose estimation algorithm for bin picking tasks using point cloud data. A novel descriptor, the Curve Set Feature (CSF), is proposed to describe a point by the surface fluctuation around it; it is also capable of evaluating poses. The Rotation Match Feature (RMF) is proposed to match CSFs efficiently. The matching process combines the 2D-space matching idea of the original Point Pair Feature (PPF) algorithm with nearest neighbor search. A voxel-based pose verification method is introduced to evaluate the poses and is shown to be more than 30 times faster than kd-tree-based verification. Our algorithm is evaluated on a large number of synthetic and real scenes and proves robust to noise, able to detect metal parts, and both more accurate and more than 10 times faster than PPF and Oriented, Unique and Repeatable (OUR)-Clustered Viewpoint Feature Histogram (CVFH). PMID:28771216
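The voxel-based verification idea lends itself to a compact illustration. The sketch below scores a candidate pose by the fraction of transformed model points that land in voxels occupied by the scene cloud; this is a simplified reading of the verification step, and the voxel size, scoring rule, and function names are assumptions for illustration.

```python
import numpy as np

def voxel_score(scene_pts, model_pts, pose, voxel=2.0):
    """Score a candidate pose by the fraction of transformed model points
    that fall into voxels occupied by the scene cloud.

    scene_pts, model_pts : (N, 3) arrays; pose : 4x4 rigid transform.
    voxel : voxel edge length, same units as the point clouds (assumed).
    """
    occupied = {tuple(v) for v in np.floor(scene_pts / voxel).astype(int)}
    homog = np.c_[model_pts, np.ones(len(model_pts))]
    moved = (pose @ homog.T).T[:, :3]                # apply the candidate pose
    keys = np.floor(moved / voxel).astype(int)
    hits = sum(tuple(k) in occupied for k in keys)   # set lookup, no kd-tree needed
    return hits / len(model_pts)

# Sanity check: the identity pose on the scene itself scores 1.0
pts = np.random.default_rng(0).random((500, 3)) * 50
print(voxel_score(pts, pts, np.eye(4)))
```

Because occupancy lookup is a hash-set membership test rather than a nearest-neighbor query, this style of verification avoids the per-point tree traversal that makes kd-tree-based scoring slow.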
Context-based automated defect classification system using multiple morphological masks
Gleason, Shaun S.; Hunt, Martin A.; Sari-Sarraf, Hamed
2002-01-01
Automatic detection of defects during the fabrication of semiconductor wafers is largely automated, but the classification of those defects is still performed manually by technicians. This invention includes novel digital image analysis techniques that generate unique feature vector descriptions of semiconductor defects as well as classifiers that use these descriptions to automatically categorize the defects into one of a set of pre-defined classes. Feature extraction techniques based on multiple-focus images, multiple-defect mask images, and segmented semiconductor wafer images are used to create unique feature-based descriptions of the semiconductor defects. These feature-based defect descriptions are subsequently classified by a defect classifier into categories that depend on defect characteristics and defect contextual information, that is, the semiconductor process layer(s) with which the defect comes in contact. At the heart of the system is a knowledge database that stores and distributes historical semiconductor wafer and defect data to guide the feature extraction and classification processes. In summary, this invention takes as its input a set of images containing semiconductor defect information, and generates as its output a classification for the defect that describes not only the defect itself, but also the location of that defect with respect to the semiconductor process layers.
Jekova, Irena; Krasteva, Vessela; Leber, Remo; Schmid, Ramun; Twerenbold, Raphael; Müller, Christian; Reichlin, Tobias; Abächerli, Roger
Electrocardiogram (ECG) biometrics is an advanced technology, not yet covered by guidelines on the criteria, features, and leads for maximal authentication accuracy. This study aims to define a minimal set of morphological metrics in the 12-lead ECG by optimization towards high reliability and security, and to validate it in a person verification model across a large population. A standard 12-lead resting ECG database from 574 non-cardiac patients with two remote recordings (>1 year apart) was used. A commercial ECG analysis module (Schiller AG) measured 202 morphological features, including lead-specific amplitudes, durations, ST-metrics, and axes. The coefficient of variation (CV, intersubject variability) and the percent mean absolute difference (PMAD, intrasubject reproducibility) defined the optimization (PMAD/CV→min) and restriction (CV<30%) criteria for selecting the most stable and distinctive features. Linear discriminant analysis (LDA) validated the non-redundant feature set for person verification. Maximal LDA verification sensitivity (85.3%) and specificity (86.4%) were validated for 11 optimal features: R-amplitude (I,II,V1,V2,V3,V5), S-amplitude (V1,V2), negative T-amplitude (aVR), and R-duration (aVF,V1). Copyright © 2016 Elsevier Inc. All rights reserved.
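The PMAD/CV selection criterion can be reproduced with a few lines of numpy. The sketch below assumes two feature matrices from the two remote recordings of the same subjects; the exact normalization used here for CV and PMAD is an assumption rather than the paper's published formula.

```python
import numpy as np

def select_stable_features(rec1, rec2, cv_max=0.30, n_keep=11):
    """Rank features by intrasubject reproducibility relative to
    intersubject variability (PMAD/CV -> min, restricted to CV < cv_max).

    rec1, rec2 : (n_subjects, n_features) measurements from two recordings.
    """
    avg = (rec1 + rec2) / 2
    mean_feat = np.abs(avg.mean(axis=0))
    cv = avg.std(axis=0) / mean_feat                        # intersubject variability
    pmad = np.mean(np.abs(rec1 - rec2), axis=0) / mean_feat  # intrasubject difference
    score = pmad / cv                                        # smaller = stable AND distinctive
    candidates = np.where(cv < cv_max)[0]                    # restriction criterion
    ranked = candidates[np.argsort(score[candidates])]
    return ranked[:n_keep]

rng = np.random.default_rng(0)
base = rng.normal(1.0, 0.2, (50, 20))            # 50 subjects, 20 features
rec1 = base + rng.normal(0, 0.02, base.shape)    # repeat recordings differ
rec2 = base + rng.normal(0, 0.02, base.shape)    # only by measurement noise
print(select_stable_features(rec1, rec2))
```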
Gadd, C S; Baskaran, P; Lobach, D F
1998-01-01
Extensive utilization of point-of-care decision support systems will be largely dependent on the development of user interaction capabilities that make them effective clinical tools in patient care settings. This research identified critical design features of point-of-care decision support systems that are preferred by physicians, through a multi-method formative evaluation of an evolving prototype of an Internet-based clinical decision support system. Clinicians used four versions of the system--each highlighting a different functionality. Surveys and qualitative evaluation methodologies assessed clinicians' perceptions regarding system usability and usefulness. Our analyses identified features that improve perceived usability, such as telegraphic representations of guideline-related information, facile navigation, and a forgiving, flexible interface. Users also preferred features that enhance usefulness and motivate use, such as an encounter documentation tool and the availability of physician instruction and patient education materials. In addition to identifying design features that are relevant to efforts to develop clinical systems for point-of-care decision support, this study demonstrates the value of combining quantitative and qualitative methods of formative evaluation with an iterative system development strategy to implement new information technology in complex clinical settings.
MATE: Machine Learning for Adaptive Calibration Template Detection
Donné, Simon; De Vylder, Jonas; Goossens, Bart; Philips, Wilfried
2016-01-01
The problem of camera calibration is two-fold. On the one hand, the parameters are estimated from known correspondences between the captured image and the real world. On the other, these correspondences themselves—typically in the form of chessboard corners—need to be found. Many distinct approaches for this feature template extraction are available, often of large computational and/or implementational complexity. We exploit the generalized nature of deep learning networks to detect checkerboard corners: our proposed method is a convolutional neural network (CNN) trained on a large set of example chessboard images, which generalizes several existing solutions. The network is trained explicitly against noisy inputs, as well as inputs with large degrees of lens distortion. The trained network that we evaluate is as accurate as existing techniques while offering improved execution time and increased adaptability to specific situations with little effort. The proposed method is not only robust against the types of degradation present in the training set (lens distortions, and large amounts of sensor noise), but also to perspective deformations, e.g., resulting from multi-camera set-ups. PMID:27827920
Portable Automatic Text Classification for Adverse Drug Reaction Detection via Multi-corpus Training
Sarker, Abeed; Gonzalez, Graciela
2015-02-01
Automatic detection of adverse drug reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media, where enormous amounts of user-posted data are available and have potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing (NLP) approaches for generating useful features from text and utilizing them in optimized machine learning algorithms for automatic classification of ADR-assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user-posted internet data; and (iii) to investigate whether combining training data from distinct corpora can improve automatic classification accuracies. One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Our feature-rich classification approach performs significantly better than previously published approaches, with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538, and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (an improvement of 5.9 units) and 0.704 (an improvement of 2.6 units), respectively. Our results indicate that using advanced NLP techniques for generating information-rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integrating information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
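As a rough illustration of multi-corpus training, the sketch below pools two tiny invented corpora and trains a linear classifier on n-gram features; the example texts are placeholders, and the simple tf-idf pipeline stands in for the paper's much richer semantic feature set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Invented toy stand-ins for two compatible corpora
clinical = [("patient developed severe rash after starting the drug", 1),
            ("no adverse events were reported during follow-up", 0)]
social = [("this med gave me the worst headache ever", 1),
          ("been on it a month and feeling totally fine", 0)]

texts, labels = zip(*(clinical + social))   # pool training data across corpora

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),    # unigram + bigram features
    LinearSVC(class_weight="balanced"),     # mitigates class imbalance
)
model.fit(texts, labels)
print(model.predict(["started the pills and broke out in hives"]))
```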
Named Entity Recognition in Chinese Clinical Text Using Deep Neural Network.
Wu, Yonghui; Jiang, Min; Lei, Jianbo; Xu, Hua
2015-01-01
Rapid growth in the use of electronic health records (EHRs) has led to an unprecedented expansion of available clinical data in electronic formats. However, much of the important healthcare information is locked in narrative documents. Therefore, Natural Language Processing (NLP) technologies, e.g., Named Entity Recognition (NER), which identifies the boundaries and types of entities, have been extensively studied to unlock important clinical information in free text. In this study, we investigated a novel deep learning method to recognize clinical entities in Chinese clinical documents using a minimal feature engineering approach. We developed a deep neural network (DNN) to generate word embeddings from a large unlabeled corpus through unsupervised learning and another DNN for the NER task. The experimental results showed that the DNN with word embeddings trained from the large unlabeled corpus outperformed the state-of-the-art CRF model in the minimal feature engineering setting, achieving the highest F1-score of 0.9280. Further analysis showed that word embeddings derived through unsupervised learning from the large unlabeled corpus substantially outperformed the same DNN initialized with randomized embeddings, denoting the usefulness of unsupervised feature learning.
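A minimal sketch of the two-stage idea (unsupervised embeddings from unlabeled text, then a supervised tagger on top) follows; logistic regression stands in for the paper's second DNN, the toy corpus is invented, and the gensim hyperparameters are arbitrary.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

# Stage 1: unsupervised embeddings from an unlabeled corpus (toy token lists)
unlabeled = [["患者", "出现", "头痛"], ["给予", "阿司匹林", "治疗"]] * 50
emb = Word2Vec(unlabeled, vector_size=32, window=2, min_count=1, sg=1)

def token_vec(tok):
    """Embedding lookup with a zero vector for out-of-vocabulary tokens."""
    return emb.wv[tok] if tok in emb.wv else np.zeros(32)

# Stage 2: supervised entity tagger on top of the learned embeddings
# (token, is-entity) pairs; a real system would use context windows too
train = [("头痛", 1), ("阿司匹林", 1), ("出现", 0), ("给予", 0)]
X = np.array([token_vec(t) for t, _ in train])
y = [label for _, label in train]
clf = LogisticRegression().fit(X, y)
```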
The McIntosh Archive: A solar feature database spanning four solar cycles
NASA Astrophysics Data System (ADS)
Gibson, S. E.; Malanushenko, A. V.; Hewins, I.; McFadden, R.; Emery, B.; Webb, D. F.; Denig, W. F.
2016-12-01
The McIntosh Archive consists of a set of hand-drawn solar Carrington maps created by Patrick McIntosh from 1964 to 2009. McIntosh used mainly H-alpha, He I 10830, and photospheric magnetic measurements from both ground-based and NASA satellite observations. With these he traced coronal holes, polarity inversion lines, filaments, sunspots and plage, yielding a unique 45-year record of the features associated with the large-scale solar magnetic field. We will present the results of recent efforts to preserve and digitize this archive. Most of the original hand-drawn maps have been scanned, a method for processing these scans into a digital, searchable format has been developed and streamlined, and an archival repository at NOAA's National Centers for Environmental Information (NCEI) has been created. We will demonstrate how Solar Cycle 23 data may now be accessed and how it may be utilized for scientific applications. In addition, we will discuss how this database of human-recognized features, which overlaps with the onset of high-resolution, continuous modern solar data, may act as a training set for computer feature recognition algorithms.
From tiger to panda: animal head detection.
Zhang, Weiwei; Sun, Jian; Tang, Xiaoou
2011-06-01
Robust object detection has many important applications in real-world online photo processing. For example, both Google image search and MSN live image search have integrated human face detectors to retrieve face or portrait photos. Inspired by the success of such face filtering approaches, in this paper we focus on another popular online photo category, animals, which is one of the top five categories in the MSN live image search query log. As a first attempt, we focus on the problem of animal head detection for a set of relatively large land animals that are popular on the internet, such as cat, tiger, panda, fox, and cheetah. First, we propose a new set of gradient-oriented features, Haar of Oriented Gradients (HOOG), to effectively capture the shape and texture of animal heads. Then, we propose two detection algorithms, namely brute-force detection and deformable detection, to exploit the shape and texture features simultaneously. Experimental results on 14,379 well-labeled animal images validate the superiority of the proposed approach. Additionally, we apply the animal head detector to improve image search results through text-based online photo search result filtering.
Detecting natural occlusion boundaries using local cues
DiMattina, Christopher; Fox, Sean A.; Lewicki, Michael S.
2012-01-01
Occlusion boundaries and junctions provide important cues for inferring three-dimensional scene organization from two-dimensional images. Although several investigators in machine vision have developed algorithms for detecting occlusions and other edges in natural images, relatively few psychophysics or neurophysiology studies have investigated what features are used by the visual system to detect natural occlusions. In this study, we addressed this question using a psychophysical experiment where subjects discriminated image patches containing occlusions from patches containing surfaces. Image patches were drawn from a novel occlusion database containing labeled occlusion boundaries and textured surfaces in a variety of natural scenes. Consistent with related previous work, we found that relatively large image patches were needed to attain reliable performance, suggesting that human subjects integrate complex information over a large spatial region to detect natural occlusions. By defining machine observers using a set of previously studied features measured from natural occlusions and surfaces, we demonstrate that simple features defined at the spatial scale of the image patch are insufficient to account for human performance in the task. To define machine observers using a more biologically plausible multiscale feature set, we trained standard linear and neural network classifiers on the rectified outputs of a Gabor filter bank applied to the image patches. We found that simple linear classifiers could not match human performance, while a neural network classifier combining filter information across location and spatial scale compared well. These results demonstrate the importance of combining a variety of cues defined at multiple spatial scales for detecting natural occlusions. PMID:23255731
Li, Sui-Xian
2018-05-07
Previous research has shown the effectiveness of selecting filter sets from among a large set of commercial broadband filters using a vector analysis method based on maximum linear independence (MLI). However, the traditional MLI approach is suboptimal because the first filter of the selected set must be predefined as the filter with the maximum ℓ₂ norm among all available filters. An exhaustive imaging simulation, with every single filter serving as the first filter, was conducted to investigate the features of the most competent filter set. From the simulation, the characteristics of the most competent filter set were discovered. Besides minimizing the condition number, the geometric features of the best-performing filter set comprise a distinct transmittance peak along the wavelength axis for the first filter, a generally uniform distribution of the filters' peaks, and substantial overlaps of the transmittance curves of adjacent filters. The best-performing filter sets can therefore be recognized intuitively by simple vector analysis and just a few experimental verifications. A practical two-step framework for selecting an optimal filter set is recommended, which guarantees a significant enhancement of system performance. This work should be useful for optimizing the spectral sensitivity of broadband multispectral imaging sensors.
NASA Astrophysics Data System (ADS)
Hassanat, Ahmad B. A.; Jassim, Sabah
2010-04-01
In this paper, the automatic lip reading problem is investigated and an innovative approach to solving it is proposed. This new visual speech recognition (VSR) approach depends on the signature of the word itself, obtained with a hybrid feature extraction method based on geometric, appearance, and image transform features. The proposed VSR approach is termed "visual words". It consists of two main parts: 1) feature extraction/selection, and 2) visual speech feature recognition. After localizing the face and lips, several visual features of the lips are extracted: the height and width of the mouth; the mutual information and a quality measure between the DWT of the current region of interest (ROI) and the DWT of the previous ROI; the ratio of vertical to horizontal features taken from the DWT of the ROI; the ratio of vertical edges to horizontal edges of the ROI; the appearance of the tongue; and the appearance of teeth. Each spoken word is represented by 8 signals, one for each feature. These signals preserve the dynamics of the spoken word, which carry a good portion of the information. The system is then trained on these features using K-nearest neighbors (KNN) and dynamic time warping (DTW). The approach has been evaluated using a large database of different speakers and large experiment sets. The evaluation has demonstrated the efficiency of the visual words approach and shown that VSR is a speaker-dependent problem.
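DTW is the component that lets word signatures of different durations be compared. A minimal implementation for one feature signal is sketched below; in the visual-words setting one would presumably sum such distances over the 8 feature signals of a word, which is our assumption rather than the paper's stated procedure.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature signals."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of: step in a, step in b, or step in both
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two utterances of different duration still align sensibly:
print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 2, 1, 0]))  # -> 0.0 (perfect warp)
```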
2013-01-01
Background While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 amino acid descriptor sets have been benchmarked with respect to their ability to establish bioactivity models. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI, BLOSUM, a novel protein descriptor set (termed ProtFP (4 variants)), and in addition we created and benchmarked three pairs of descriptor combinations. Prediction performance was evaluated in seven structure-activity benchmarks which comprise Angiotensin Converting Enzyme (ACE) dipeptidic inhibitor data and three proteochemometric data sets, namely (1) GPCR ligands modeled against a GPCR panel, (2) enzyme inhibitors (NNRTIs) with associated bioactivities against a set of HIV enzyme mutants, and (3) enzyme inhibitors (PIs) with associated bioactivities on a large set of HIV enzyme mutants. Results The amino acid descriptor sets compared here show similar performance (<0.1 log units RMSE difference and <0.1 difference in MCC), while errors for individual proteins were in some cases found to be larger than those resulting from descriptor set differences (>0.3 log units RMSE difference and >0.7 difference in MCC). Combining different descriptor sets generally leads to better modeling performance than utilizing individual sets. The best performers were Z-scales (3) combined with ProtFP (Feature), or Z-scales (3) combined with an average Z-scale value for each target, while ProtFP (PCA8), ST-scales, and ProtFP (Feature) rank last. Conclusions While amino acid descriptor sets capture different aspects of amino acids, their ability to be used for bioactivity modeling is still, on average, surprisingly similar. Still, combining sets that describe complementary information leads to small but consistent improvements in modeling performance (average MCC 0.01 better, average RMSE 0.01 log units lower). Finally, performance differences exist between the targets compared, underlining that choosing an appropriate descriptor set is fundamental for bioactivity modeling, both from the ligand as well as the protein side. PMID:24059743
Enrichr: a comprehensive gene set enrichment analysis web server 2016 update
Kuleshov, Maxim V.; Jones, Matthew R.; Rouillard, Andrew D.; Fernandez, Nicolas F.; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L.; Jagodnik, Kathleen M.; Lachmann, Alexander; McDermott, Michael G.; Monteiro, Caroline D.; Gundersen, Gregory W.; Ma'ayan, Avi
2016-01-01
Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. PMID:27141961
Newcombe, Nora S; Ratliff, Kristin R; Shallcross, Wendy L; Twyman, Alexandra D
2010-01-01
Proponents of a geometric module have argued that instances of young children's use of features as well as geometry to reorient can be explained by a two-stage process. In this model, only the first stage is a true reorientation, accomplished by using geometric information alone; features are considered in a second stage using association (Lee, Shusterman & Spelke, 2006). This account is contradicted by the data from two experiments. Experiment 1a sets the stage for Experiment 1b by showing that young children use geometric information to reorient in a complex geometric figure without a single principal axis of symmetry (an octagon). In such a figure, there are two sets of geometrically congruent corners, with four corners in each set. The addition of a colored wall leads to the existence of three geometrically congruent but, crucially, all unmarked corners; using the colored wall to distinguish among them could not be done associatively. In Experiment 1b, both 3- and 5-year-old children showed true non-associative reorientation using features by performing at above-chance levels on all-white trials. Experiment 2 used a paradigm without distinctive geometry, modeled on Lee et al. (2006), involving an equilateral triangle of hiding places located within a circular enclosure, but with a large stable feature rather than a small moveable one. Four-year-olds (the age group studied by Lee et al.) used features at above-chance levels. Thus, features can be used to reorient, in a way not dependent on association, in contradiction to the two-stage version of the modular view.
NASA Technical Reports Server (NTRS)
Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)
1993-01-01
A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
Comparative analysis and visualization of multiple collinear genomes
2012-01-01
Background Genome browsers are a common tool used by biologists to visualize genomic features including genes, polymorphisms, and many others. However, existing genome browsers and visualization tools are not well-suited to perform meaningful comparative analysis among a large number of genomes. With the increasing quantity and availability of genomic data, there is an increased burden to provide useful visualization and analysis tools for comparison of multiple collinear genomes such as the large panels of model organisms which are the basis for much of the current genetic research. Results We have developed a novel web-based tool for visualizing and analyzing multiple collinear genomes. Our tool illustrates genome-sequence similarity through a mosaic of intervals representing local phylogeny, subspecific origin, and haplotype identity. Comparative analysis is facilitated through reordering and clustering of tracks, which can vary throughout the genome. In addition, we provide local phylogenetic trees as an alternate visualization to assess local variations. Conclusions Unlike previous genome browsers and viewers, ours allows for simultaneous and comparative analysis. Our browser provides intuitive selection and interactive navigation about features of interest. Dynamic visualizations adjust to scale and data content making analysis at variable resolutions and of multiple data sets more informative. We demonstrate our genome browser for an extensive set of genomic data sets composed of almost 200 distinct mouse laboratory strains. PMID:22536897
Clustering-based Feature Learning on Variable Stars
NASA Astrophysics Data System (ADS)
Mackenzie, Cristóbal; Pichara, Karim; Protopapas, Pavlos
2016-04-01
The success of automatic classification of variable stars depends strongly on the lightcurve representation. Usually, lightcurves are represented as a vector of many descriptors designed by astronomers, called features. These descriptors are expensive to compute, require substantial research effort to develop, and do not guarantee good classification. Today, lightcurve representation is not entirely automatic; algorithms must be designed and manually tuned for every survey. The amounts of data that will be generated in the future mean astronomers must develop scalable and automated analysis pipelines. In this work we present a feature learning algorithm designed for variable objects. Our method works by extracting a large number of lightcurve subsequences from a given set, which are then clustered to find common local patterns in the time series. Representatives of these common patterns are then used to transform lightcurves of a labeled set into a new representation that can be used to train a classifier. The proposed algorithm learns the features from both labeled and unlabeled lightcurves, overcoming the bias of using only labeled data. We test our method on data sets from the Massive Compact Halo Object survey and the Optical Gravitational Lensing Experiment; the results show that our classification performance is as good as, and in some cases better than, the performance achieved using traditional statistical features, while the computational cost is significantly lower. With these promising results, we believe that our method constitutes a significant step toward the automation of the lightcurve classification pipeline.
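The core pipeline (pool subsequences from unlabeled lightcurves, cluster them, then represent each lightcurve by its cluster-assignment histogram) can be sketched briefly. The example below uses k-means and synthetic sine-wave curves for concreteness; the original work's clustering algorithm and window parameters differ, so treat every constant here as an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def sliding_windows(lightcurve, width=20, step=5):
    """Extract fixed-width subsequences from a magnitude time series."""
    return np.array([lightcurve[i:i + width]
                     for i in range(0, len(lightcurve) - width + 1, step)])

rng = np.random.default_rng(0)
curves = [np.sin(np.linspace(0, rng.uniform(4, 40), 300)) +
          0.1 * rng.normal(size=300) for _ in range(30)]   # toy lightcurves

# Pool subsequences from all (possibly unlabeled) curves and cluster them
pool = np.vstack([sliding_windows(c) for c in curves])
km = KMeans(n_clusters=16, n_init=10, random_state=0).fit(pool)

def represent(curve):
    """Histogram of cluster assignments = the learned feature vector."""
    labels = km.predict(sliding_windows(curve))
    return np.bincount(labels, minlength=16) / len(labels)

features = np.array([represent(c) for c in curves])  # feed to any classifier
```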
Brosch, Tom; Tang, Lisa Y W; Youngjin Yoo; Li, David K B; Traboulsee, Anthony; Tam, Roger
2016-05-01
We propose a novel segmentation approach based on deep 3D convolutional encoder networks with shortcut connections and apply it to the segmentation of multiple sclerosis (MS) lesions in magnetic resonance images. Our model is a neural network that consists of two interconnected pathways, a convolutional pathway, which learns increasingly more abstract and higher-level image features, and a deconvolutional pathway, which predicts the final segmentation at the voxel level. The joint training of the feature extraction and prediction pathways allows for the automatic learning of features at different scales that are optimized for accuracy for any given combination of image types and segmentation task. In addition, shortcut connections between the two pathways allow high- and low-level features to be integrated, which enables the segmentation of lesions across a wide range of sizes. We have evaluated our method on two publicly available data sets (MICCAI 2008 and ISBI 2015 challenges) with the results showing that our method performs comparably to the top-ranked state-of-the-art methods, even when only relatively small data sets are available for training. In addition, we have compared our method with five freely available and widely used MS lesion segmentation methods (EMS, LST-LPA, LST-LGA, Lesion-TOADS, and SLS) on a large data set from an MS clinical trial. The results show that our method consistently outperforms these other methods across a wide range of lesion sizes.
Adult Education in India & Abroad.
ERIC Educational Resources Information Center
Roy, Nikhil Ranjan
A survey is made of various aspects of adult education in India since 1947, together with comparative accounts of the origin, development, and notable features of adult education in Denmark, Great Britain, the Soviet Union, and the United States. Needs and objectives in India, largely in the eradication of illiteracy, are set forth, and pertinent…
Early Childhood Education: Pathways to Better Health. Preschool Policy Brief Issue 25
ERIC Educational Resources Information Center
Friedman-Krauss, Allison; Barnett, W. Steven
2013-01-01
The potential health benefits of early childhood education programs are quite large, especially for children living in poverty. In this report, authors Allison Friedman-Krauss and Steve Barnett set out the evidence regarding the short and long term health benefits to children from early childhood education programs, identify the features of…
Youth Unemployment and Labour Market Transitions in Hungary
ERIC Educational Resources Information Center
Audas, Rick; Berde, Eva; Dolton, Peter
2005-01-01
Unemployment and labour market adjustment have featured prominently in the problems of transitional economies. However, the position of young people and their transitions from school to work in these new market economies has been virtually ignored. This paper examines a new large longitudinal data set relating to young people in Hungary over the…
NASA Astrophysics Data System (ADS)
Frikha, Mayssa; Fendri, Emna; Hammami, Mohamed
2017-09-01
Using semantic attributes such as gender, clothes, and accessories to describe people's appearance is an appealing modeling method for video surveillance applications. We propose a midlevel appearance signature based on extracting a list of nameable semantic attributes describing the body under uncontrolled acquisition conditions. Conventional approaches extract the same set of low-level features to learn all the semantic classifiers uniformly; their critical limitation is an inability to capture the dominant visual characteristics of each trait separately. The proposed approach extracts low-level features in an attribute-adaptive way by automatically selecting the most relevant features for each attribute separately. Furthermore, relying on a small training dataset would easily lead to poor performance given the large intraclass and interclass variations. We therefore annotated large-scale people images collected from different person reidentification benchmarks, covering a large attribute sample and reflecting the challenges of uncontrolled acquisition conditions. These annotations were gathered into an appearance semantic attribute dataset that contains 3590 images annotated with 14 attributes. Various experiments show that features carefully designed for learning the visual characteristics of an attribute improve correct classification accuracy and reduce both spatial and temporal complexity relative to state-of-the-art approaches.
A feature-based developmental model of the infant brain in structural MRI.
Toews, Matthew; Wells, William M; Zöllei, Lilla
2012-01-01
In this paper, anatomical development is modeled as a collection of distinctive image patterns localized in space and time. A Bayesian posterior probability is defined over a random variable of subject age, conditioned on data in the form of scale-invariant image features. The model is automatically learned from a large set of images exhibiting significant variation, used to discover anatomical structure related to age and development, and fit to new images to predict age. The model is applied to a set of 230 infant structural MRIs of 92 subjects acquired at multiple sites over an age range of 8-590 days. Experiments demonstrate that the model can be used to identify age-related anatomical structure, and to predict the age of new subjects with an average error of 72 days.
Families of FPGA-Based Accelerators for Approximate String Matching
Van Court, Tom; Herbordt, Martin C.
2011-01-01
Dynamic programming for approximate string matching is a large family of different algorithms, which vary significantly in purpose, complexity, and hardware utilization. Many implementations have reported impressive speed-ups, but have typically been point solutions – highly specialized and addressing only one or a few of the many possible options. The problem to be solved is creating a hardware description that implements a broad range of behavioral options without losing efficiency due to feature bloat. We report a set of three component types that address different parts of the approximate string matching problem. This allows each application to choose the feature set required, then make maximum use of the FPGA fabric according to that application’s specific resource requirements. Multiple, interchangeable implementations are available for each component type. We show that these methods allow the efficient generation of a large, if not complete, family of accelerators for this application. This flexibility was obtained while retaining high performance: We have evaluated a sample against serial reference codes and found speed-ups of from 150× to 400× over a high-end PC. PMID:21603598
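The common core of this algorithm family is the dynamic-programming recurrence. A minimal software rendition (a Levenshtein-style edit distance with unit costs, the simplest member of the family) is given below for reference; the FPGA designs implement hardware variants of this same recurrence.

```python
def edit_distance(s, t, sub=1, gap=1):
    """Classic dynamic-programming recurrence shared by the approximate
    string matching family (Levenshtein variant, two-row formulation)."""
    prev = list(range(0, (len(t) + 1) * gap, gap))
    for i, a in enumerate(s, 1):
        cur = [i * gap]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + gap,                          # deletion
                           cur[j - 1] + gap,                       # insertion
                           prev[j - 1] + (0 if a == b else sub)))  # match/substitute
        prev = cur
    return prev[-1]

print(edit_distance("GATTACA", "GACTATA"))  # -> 2
```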
Ghiassian, Sina; Greiner, Russell; Jin, Ping; Brown, Matthew R. G.
2016-01-01
A clinical tool that can diagnose psychiatric illness using functional or structural magnetic resonance (MR) brain images has the potential to greatly assist physicians and improve treatment efficacy. Working toward the goal of automated diagnosis, we propose an approach for automated classification of ADHD and autism based on histogram of oriented gradients (HOG) features extracted from MR brain images, as well as personal characteristic data features. We describe a learning algorithm that can produce effective classifiers for ADHD and autism when run on two large public datasets. The algorithm is able to distinguish ADHD from control with hold-out accuracy of 69.6% (over baseline 55.0%) using personal characteristics and structural brain scan features when trained on the ADHD-200 dataset (769 participants in training set, 171 in test set). It is able to distinguish autism from control with hold-out accuracy of 65.0% (over baseline 51.6%) using functional images with personal characteristic data when trained on the Autism Brain Imaging Data Exchange (ABIDE) dataset (889 participants in training set, 222 in test set). These results outperform all previously presented methods on both datasets. To our knowledge, this is the first demonstration of a single automated learning process that can produce classifiers for distinguishing patients vs. controls from brain imaging data with above-chance accuracy on large datasets for two different psychiatric illnesses (ADHD and autism). Working toward clinical applications requires robustness against real-world conditions, including the substantial variability that often exists among data collected at different institutions. It is therefore important that our algorithm was successful with the large ADHD-200 and ABIDE datasets, which include data from hundreds of participants collected at multiple institutions. While the resulting classifiers are not yet clinically relevant, this work shows that there is a signal in the (f)MRI data that a learning algorithm is able to find. We anticipate this will lead to yet more accurate classifiers, over these and other psychiatric disorders, working toward the goal of a clinical tool for high accuracy differential diagnosis. PMID:28030565
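A schematic of the feature pipeline (HOG features from image slices concatenated with personal characteristic data, then a linear classifier) is sketched below; the random images, the single age covariate, and all parameter values are placeholders, not the study's actual data or settings.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
images = rng.random((20, 64, 64))        # toy stand-ins for MR image slices
labels = rng.integers(0, 2, 20)          # 0 = control, 1 = patient (invented)

# Histogram of oriented gradients per image slice
X_hog = np.array([hog(img, orientations=9, pixels_per_cell=(8, 8),
                      cells_per_block=(2, 2)) for img in images])
age = rng.uniform(8, 18, (20, 1))        # toy personal characteristic feature
X = np.hstack([X_hog, age])              # combine both feature types
clf = LinearSVC().fit(X, labels)
```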
Extending GIS Technology to Study Karst Features of Southeastern Minnesota
NASA Astrophysics Data System (ADS)
Gao, Y.; Tipping, R. G.; Alexander, E. C.; Alexander, S. C.
2001-12-01
This paper summarizes ongoing research on the karst feature distribution of southeastern Minnesota. The main goals of this interdisciplinary research are: 1) to look for large-scale patterns in the rate and distribution of sinkhole development; 2) to conduct statistical tests of hypotheses about the formation of sinkholes; 3) to create management tools for land-use managers and planners; and 4) to deliver geomorphic and hydrogeologic criteria for making scientifically valid land-use policies and ethical decisions in karst areas of southeastern Minnesota. Existing county and sub-county karst feature datasets of southeastern Minnesota have been assembled into a large GIS-based database capable of analyzing the entire data set. The central database management system (DBMS) is a relational GIS-based system interacting with three modules: GIS, statistical, and hydrogeologic. ArcInfo and ArcView were used to generate a series of 2D and 3D maps depicting karst feature distributions in southeastern Minnesota. IRIS Explorer™ was used to produce detailed 3D maps and animations using data exported from the GIS-based database. Nearest-neighbor analysis has been used to test sinkhole distributions in different topographic and geologic settings. All current nearest-neighbor analyses indicate that sinkholes in southeastern Minnesota are not evenly distributed in this area (i.e., they tend to be clustered). More detailed statistical methods such as cluster analysis, histograms, probability estimation, correlation, and regression have been used to study the spatial distributions of some mapped karst features of southeastern Minnesota. A sinkhole probability map for Goodhue County has been constructed based on sinkhole distribution, bedrock geology, depth to bedrock, GIS buffer analysis, and nearest-neighbor analysis. A series of karst features for Winona County, including sinkholes, springs, seeps, stream sinks, and outcrops, has been mapped and entered into the Karst Feature Database of Southeastern Minnesota. The Karst Feature Database of Winona County is being expanded to include all the mapped karst features of southeastern Minnesota. Air photos from the 1930s to the 1990s of the Spring Valley Cavern area in Fillmore County were scanned and geo-referenced into our GIS system. This approach has proved very useful for identifying sinkholes and studying the rate of sinkhole development.
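The nearest-neighbor clustering test mentioned above is commonly computed as the Clark-Evans ratio. A short sketch, assuming a 2-D point pattern in a region of known area, is given below; whether the original study used exactly this statistic is not stated, so this is illustrative only.

```python
import numpy as np
from scipy.spatial import cKDTree

def clark_evans(points, area):
    """Clark-Evans nearest-neighbor ratio R for a 2-D point pattern.
    R < 1 suggests clustering, R ~ 1 complete spatial randomness (CSR).
    Edge effects are ignored in this simple version."""
    pts = np.asarray(points)
    d, _ = cKDTree(pts).query(pts, k=2)      # column 1: distance to nearest neighbor
    r_obs = d[:, 1].mean()
    r_exp = 0.5 / np.sqrt(len(pts) / area)   # expected mean NN distance under CSR
    return r_obs / r_exp

rng = np.random.default_rng(0)
uniform_pts = rng.uniform(0, 10, (200, 2))                        # random pattern
clustered_pts = rng.normal(loc=(5, 5), scale=0.5, size=(200, 2))  # one tight cluster
print(clark_evans(uniform_pts, 100.0))    # close to 1
print(clark_evans(clustered_pts, 100.0))  # well below 1 -> clustered
```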
Palm vein recognition based on directional empirical mode decomposition
NASA Astrophysics Data System (ADS)
Lee, Jen-Chun; Chang, Chien-Ping; Chen, Wei-Kuei
2014-04-01
Directional empirical mode decomposition (DEMD) has recently been proposed to make empirical mode decomposition suitable for texture analysis. Using DEMD, samples are decomposed into a series of images, referred to as two-dimensional intrinsic mode functions (2-D IMFs), from fine to coarse scale. A DEMD-based two-dimensional linear discriminant analysis (2DLDA) for palm vein recognition is proposed. The proposed method progresses through three steps: (i) a set of 2-D IMF features of various scales and orientations is extracted using DEMD, (ii) the 2DLDA method is applied to reduce the dimensionality of the feature space in both the row and column directions, and (iii) the nearest neighbor classifier is used for classification. We also propose two strategies for using the set of 2-D IMF features: ensemble DEMD vein representation (EDVR) and multichannel DEMD vein representation (MDVR). In experiments using palm vein databases, the proposed MDVR-based 2DLDA method achieved a recognition accuracy of 99.73%, demonstrating its feasibility for palm vein recognition.
A multiparametric assay for quantitative nerve regeneration evaluation.
Weyn, B; van Remoortere, M; Nuydens, R; Meert, T; van de Wouwer, G
2005-08-01
We introduce an assay for the semi-automated quantification of nerve regeneration by image analysis. Digital images of histological sections of regenerated nerves are recorded using an automated inverted microscope and merged into high-resolution mosaic images representing the entire nerve. These are analysed by a dedicated image-processing package that computes nerve-specific features (e.g. nerve area, fibre count, myelinated area) and fibre-specific features (area, perimeter, myelin sheet thickness). The assay's performance and correlation of the automatically computed data with visually obtained data are determined on a set of 140 semithin sections from the distal part of a rat tibial nerve from four different experimental treatment groups (control, sham, sutured, cut) taken at seven different time points after surgery. Results show a high correlation between the manually and automatically derived data, and a high discriminative power towards treatment. Extra value is added by the large feature set. In conclusion, the assay is fast and offers data that currently can be obtained only by a combination of laborious and time-consuming tests.
Cloud computing for genomic data analysis and collaboration.
Langmead, Ben; Nellore, Abhinav
2018-04-01
Next-generation sequencing has made major strides in the past decade. Studies based on large sequencing data sets are growing in number, and public archives for raw sequencing data have been doubling in size every 18 months. Leveraging these data requires researchers to use large-scale computational resources. Cloud computing, a model whereby users rent computers and storage from large data centres, is a solution that is gaining traction in genomics research. Here, we describe how cloud computing is used in genomics for research and large-scale collaborations, and argue that its elasticity, reproducibility and privacy features make it ideally suited for the large-scale reanalysis of publicly available archived data, including privacy-protected data.
Multiple logic functions from extended blockade region in a silicon quantum-dot transistor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Youngmin; Lee, Sejoon, E-mail: sejoon@dongguk.edu; Im, Hyunsik
2015-02-14
We demonstrate multiple logic functions at room temperature in a single Si single-electron transistor (SET) device. Owing to the formation of a multi-dot system, the device exhibits enhanced Coulomb blockade characteristics (e.g., a large peak-to-valley current ratio of ∼200) that can improve the reliability of SET-based logic circuits. The SET displays a unique feature useful for logic applications: the Coulomb oscillation peaks are systematically shifted by changing either the gate voltage alone or the drain voltage alone. This enables the SET to act as a multi-functional one-transistor logic gate with AND, OR, NAND, and XOR functions.
Outdoor environmental assessment of attention promoting settings for preschool children.
Mårtensson, F; Boldemann, C; Söderström, M; Blennow, M; Englund, J-E; Grahn, P
2009-12-01
The restorative potential of green outdoor environments for children in preschool settings was investigated by measuring the attention of children playing in settings with different environmental features. Eleven preschools with outdoor environments typical of the Stockholm area were assessed using the outdoor play environment categories (OPEC) and the fraction of visible sky from play structures (sky view factor), and 198 children, aged 4.5-6.5 years, were rated by the staff for inattentive, hyperactive, and impulsive behaviors with the ECADDES tool. Children playing in large and integrated outdoor areas containing large areas of trees, shrubbery, and hilly terrain less often showed behaviors of inattention (p<.05). The choice of tool for assessing attention is discussed in relation to the characteristics of outdoor stay and play in Swedish preschool settings. The results indicate that the restorative potential of green outdoor environments applies also to preschool children and that environmental assessment tools such as OPEC can be useful when locating and developing health-promoting land adjacent to preschools.
SING: Subgraph search In Non-homogeneous Graphs
2010-01-01
Background Finding the subgraphs of a graph database that are isomorphic to a given query graph has practical applications in several fields, from cheminformatics to image understanding. Since subgraph isomorphism is a computationally hard problem, indexing techniques have been intensively exploited to speed up the process. Such systems filter out those graphs which cannot contain the query, and apply a subgraph isomorphism algorithm to each residual candidate graph. The applicability of such systems is limited to databases of small graphs, because their filtering power degrades on large graphs. Results In this paper, SING (Subgraph search In Non-homogeneous Graphs), a novel indexing system able to cope with large graphs, is presented. The method uses the notion of feature, which can be a small subgraph, subtree or path. Each graph in the database is annotated with the set of all its features. The key point is to make use of feature locality information. This idea is used to both improve the filtering performance and speed up the subgraph isomorphism task. Conclusions Extensive tests on chemical compounds, biological networks and synthetic graphs show that the proposed system outperforms the most popular systems in query time over databases of medium and large graphs. Other specific tests show that the proposed system is effective for single large graphs. PMID:20170516
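The filter-then-verify idea behind such indexes can be illustrated with a toy feature extractor. The sketch below uses node-label paths of up to two edges as features and keeps only database graphs whose feature sets cover the query's; the real system also exploits feature locality and still runs a subgraph isomorphism test on the surviving candidates, which this sketch omits.

```python
def path_features(graph, max_len=2):
    """All node-label sequences along simple paths of <= max_len edges.
    graph: dict mapping node -> (label, set of neighbor nodes)."""
    feats = set()
    def walk(node, path, seen):
        feats.add(tuple(path))
        if len(path) <= max_len:                 # path may still grow
            for nxt in graph[node][1]:
                if nxt not in seen:
                    walk(nxt, path + [graph[nxt][0]], seen | {nxt})
    for n in graph:
        walk(n, [graph[n][0]], {n})
    return feats

def filter_candidates(db, query):
    """Filtering step: a graph can contain the query only if its
    feature set is a superset of the query's feature set."""
    qf = path_features(query)
    return [gid for gid, g in db.items() if qf <= path_features(g)]

# Tiny database of labeled graphs: node -> (label, neighbors)
g1 = {1: ("C", {2}), 2: ("O", {1})}
g2 = {1: ("C", {2}), 2: ("N", {1})}
query = {1: ("C", {2}), 2: ("O", {1})}
print(filter_candidates({"g1": g1, "g2": g2}, query))  # ['g1']
```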
Huser, Vojtech; Cimino, James J.
2013-01-01
Integrated data repositories (IDRs) are indispensable tools for numerous biomedical research studies. We compare three large IDRs (Informatics for Integrating Biology and the Bedside (i2b2), HMO Research Network’s Virtual Data Warehouse (VDW) and Observational Medical Outcomes Partnership (OMOP) repository) in order to identify common architectural features that enable efficient storage and organization of large amounts of clinical data. We define three high-level classes of underlying data storage models and we analyze each repository using this classification. We look at how a set of sample facts is represented in each repository and conclude with a list of desiderata for IDRs that deal with the information storage model, terminology model, data integration and value-sets management. PMID:24551366
Competitive code-based fast palmprint identification using a set of cover trees
NASA Astrophysics Data System (ADS)
Yue, Feng; Zuo, Wangmeng; Zhang, David; Wang, Kuanquan
2009-06-01
A palmprint identification system recognizes a query palmprint image by searching for its nearest neighbor among all the templates in a database. In a large-scale identification system, it is often necessary to speed up this nearest-neighbor search. We use competitive code, which offers very fast feature extraction and matching, for palmprint identification. To speed up identification, we extend the cover tree method and propose using a set of cover trees to facilitate fast and accurate nearest-neighbor searching. The cover tree method is applicable because, as we show, the angular distance used in competitive code can be decomposed into a set of metrics. Using the Hong Kong PolyU palmprint database (version 2) and a large-scale palmprint database, our experimental results show that the proposed method finds nearest neighbors faster than brute-force search.
Wu, Lingfei; Wu, Kesheng; Sim, Alex; ...
2016-06-01
A novel algorithm and implementation of real-time identification and tracking of blob-filaments in fusion reactor data is presented. Similar spatio-temporal features are important in many other applications, for example, ignition kernels in combustion and tumor cells in a medical image. This work presents an approach for extracting these features by dividing the overall task into three steps: local identification of feature cells, grouping feature cells into extended features, and tracking the movement of features through overlap in space. Through our extensive work in parallelization, we demonstrate that this approach can effectively make use of a large number of compute nodes to detect and track blob-filaments in real time in fusion plasma. Here, on a set of 30 GB of fusion simulation data, we observed linear speedup on 1024 processes and completed blob detection in less than three milliseconds using Edison, a Cray XC30 system at NERSC.
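The three-step decomposition maps naturally onto standard image-processing primitives. The sketch below thresholds each frame, groups flagged cells with connected-component labeling, and links blobs across frames by spatial overlap; the threshold rule and the synthetic frames are assumptions for illustration only.

```python
import numpy as np
from scipy import ndimage

def detect_blobs(frame, nsigma=4.0):
    """Steps 1-2: flag high-intensity cells, then group them into
    extended features with connected-component labeling."""
    mask = frame > frame.mean() + nsigma * frame.std()
    labels, n_blobs = ndimage.label(mask)
    return labels, n_blobs

def track(prev_labels, cur_labels):
    """Step 3: match each current blob to the previous blob it
    overlaps the most in space."""
    matches = {}
    for blob in range(1, cur_labels.max() + 1):
        overlap = prev_labels[cur_labels == blob]
        overlap = overlap[overlap > 0]
        if overlap.size:
            matches[blob] = int(np.bincount(overlap).argmax())
    return matches

rng = np.random.default_rng(1)
f0 = rng.normal(size=(128, 128)); f0[40:50, 40:50] += 6.0   # synthetic blob
f1 = rng.normal(size=(128, 128)); f1[42:52, 43:53] += 6.0   # blob has drifted
l0, _ = detect_blobs(f0)
l1, _ = detect_blobs(f1)
print(track(l0, l1))  # e.g. {1: 1}: current blob 1 continues previous blob 1
```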
Object-based benefits without object-based representations.
Fougnie, Daryl; Cormiea, Sarah M; Alvarez, George A
2013-08-01
Influential theories of visual working memory have proposed that the basic units of memory are integrated object representations. Key support for this proposal is provided by the same object benefit: It is easier to remember multiple features of a single object than the same set of features distributed across multiple objects. Here, we replicate the object benefit but demonstrate that features are not stored as single, integrated representations. Specifically, participants could remember 10 features better when arranged in 5 objects compared to 10 objects, yet memory for one object feature was largely independent of memory for the other object feature. These results rule out the possibility that integrated representations drive the object benefit and require a revision of the concept of object-based memory representations. We propose that working memory is object-based in regard to the factors that enhance performance but feature based in regard to the level of representational failure. PsycINFO Database Record (c) 2013 APA, all rights reserved.
Dissolution-Enlarged Fractures Imaging Using Electrical Resistivity Tomography (ERT)
NASA Astrophysics Data System (ADS)
Siami-Irdemoosa, Elnaz
In recent years, electrical imaging techniques have been widely applied to geotechnical and environmental investigations. These techniques have proven to be among the best geophysical methods for site investigations in karst terrain, particularly when the overburden soil is clay-dominated. Karst is terrain with a special landscape and a distinctive hydrological system developed by dissolution of rocks, particularly carbonate rocks such as limestone and dolomite, in which fractures are enlarged into underground conduits that can grow into caverns and, in some cases, collapse to form sinkholes. Bedding planes, joints, and faults are the principal structural guides for underground flow and dissolution in almost all karstified rocks. Despite the important role of fractures in karst development, the geometry of dissolution-enlarged fractures remains poorly known. These features are characterized by a strong contrast with the surrounding formations in terms of physical properties, such as electrical resistivity. Electrical resistivity tomography (ERT) was used as the primary geophysical tool to image the subsurface in a karst terrain in Greene County, Missouri. The pattern, orientation, and density of the joint sets at the investigation site were interpreted from the ERT data. The Multichannel Analysis of Surface Waves (MASW) method and coring were employed to validate the interpretation results. Two sets of orthogonal, visually prominent joints have been identified at the investigation site: north-south trending joint sets and west-east trending joint sets. However, most of the visually prominent joint sets are associated with either cultural features that concentrate runoff or natural surface drainage features.
Park, Sang-Hoon; Lee, David; Lee, Sang-Goog
2018-02-01
For the last few years, many feature extraction methods have been proposed based on biological signals. Among these, brain signals have the advantage that they can be obtained even from people with peripheral nervous system damage. Motor imagery electroencephalograms (EEG) are inexpensive to measure, offer a high temporal resolution, and are intuitive. Therefore, they have received a significant amount of attention in various fields, including signal processing, cognitive science, and medicine. The common spatial pattern (CSP) algorithm is a useful method for feature extraction from motor imagery EEG. However, performance degradation occurs in a small-sample setting (SSS), because CSP depends on sample-based covariance. Since the active frequency range differs for each subject, it is also inconvenient to set the frequency range anew every time. In this paper, we propose a feature extraction method based on a filter bank to solve these problems. The proposed method consists of five steps. First, the motor imagery EEG is divided using a filter bank. Second, the regularized CSP (R-CSP) is applied to the divided EEG. Third, we select features according to mutual information based on the individual feature algorithm. Fourth, parameter sets are selected for the ensemble. Finally, we classify using an ensemble based on the selected features. The brain-computer interface competition III data set IVa is used to evaluate the performance of the proposed method. The proposed method improves the mean classification accuracy by 12.34%, 11.57%, 9%, 4.95%, and 4.47% compared with CSP, SR-CSP, R-CSP, filter bank CSP (FBCSP), and SR-FBCSP, respectively. Compared with the filter bank R-CSP, a parameter-selection version of the proposed method, the classification accuracy is improved by 3.49%. In particular, the proposed method shows a large improvement in performance in the SSS.
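For reference, the CSP step at the heart of this pipeline reduces to a generalized eigendecomposition of the class covariance matrices. A minimal sketch (without the regularization, filter bank, and ensemble stages of the proposed method) follows; the synthetic data and all parameter choices are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_pairs=2):
    """Common spatial patterns via a generalized eigendecomposition.
    trials_* : (n_trials, n_channels, n_samples) arrays for two classes."""
    def mean_cov(trials):
        covs = [x @ x.T / np.trace(x @ x.T) for x in trials]  # normalized covariances
        return np.mean(covs, axis=0)
    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    evals, evecs = eigh(Ca, Ca + Cb)                 # solves Ca w = lambda (Ca + Cb) w
    order = np.argsort(evals)
    keep = np.r_[order[:n_pairs], order[-n_pairs:]]  # most discriminative extremes
    return evecs[:, keep].T

def log_var_features(trials, W):
    """The classic CSP feature: log of normalized variance after filtering."""
    proj = np.einsum("fc,ncs->nfs", W, trials)
    var = proj.var(axis=2)
    return np.log(var / var.sum(axis=1, keepdims=True))

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 8, 200)); A[:, 0] *= 3.0  # class A: channel 0 strong
B = rng.normal(size=(30, 8, 200)); B[:, 1] *= 3.0  # class B: channel 1 strong
W = csp_filters(A, B)
X = log_var_features(np.vstack([A, B]), W)         # features for LDA/SVM/ensemble
```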
Enrichr: a comprehensive gene set enrichment analysis web server 2016 update.
Kuleshov, Maxim V; Jones, Matthew R; Rouillard, Andrew D; Fernandez, Nicolas F; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L; Jagodnik, Kathleen M; Lachmann, Alexander; McDermott, Michael G; Monteiro, Caroline D; Gundersen, Gregory W; Ma'ayan, Avi
2016-07-08
Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180,184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, an improved application programming interface, and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
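As a usage illustration, the sketch below queries Enrichr's REST interface (list upload followed by enrichment against one library). The endpoint paths, form fields, and the KEGG_2015 library name follow the server's public API documentation as we understand it and may have changed since publication, so treat this as a hedged example rather than authoritative client code.

```python
# Hedged sketch of the Enrichr REST API: upload a gene list, then enrich.
import requests

BASE = "http://amp.pharm.mssm.edu/Enrichr"
genes = ["TP53", "BRCA1", "EGFR", "MYC", "PTEN"]   # toy gene list

# Step 1: upload the gene list; the server returns a userListId handle.
payload = {"list": (None, "\n".join(genes)), "description": (None, "demo")}
resp = requests.post(f"{BASE}/addList", files=payload)
user_list_id = resp.json()["userListId"]

# Step 2: run enrichment against one of the gene set libraries.
resp = requests.get(
    f"{BASE}/enrich",
    params={"userListId": user_list_id, "backgroundType": "KEGG_2015"},
)
# Each result entry is a list: [rank, term name, p-value, z-score, ...].
for entry in resp.json()["KEGG_2015"][:5]:
    rank, term, pval = entry[0], entry[1], entry[2]
    print(rank, term, pval)
```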
Universal relations for range corrections to Efimov features
Ji, Chen; Braaten, Eric; Phillips, Daniel R.; ...
2015-09-09
In a three-body system of identical bosons interacting through a large S-wave scattering length a, there are several sets of features related to the Efimov effect that are characterized by discrete scale invariance. Effective field theory was recently used to derive universal relations between these Efimov features that include the first-order correction due to a nonzero effective range r_s. We reveal a simple pattern in these range corrections that had not been previously identified. The pattern is explained by the renormalization group for the effective field theory, which implies that the Efimov three-body parameter runs logarithmically with the momentum scale at a rate proportional to r_s/a. The running Efimov parameter also explains the empirical observation that range corrections can be largely taken into account by shifting the Efimov parameter by an adjustable parameter divided by a. Furthermore, the accuracy of universal relations that include first-order range corrections is verified by comparing them with various theoretical calculations using models with nonzero range.
Sacchet, Matthew D; Prasad, Gautam; Foland-Ross, Lara C; Thompson, Paul M; Gotlib, Ian H
2014-04-01
Graph theory is increasingly used in the field of neuroscience to understand the large-scale network structure of the human brain. There is also considerable interest in applying machine learning techniques in clinical settings, for example, to make diagnoses or predict treatment outcomes. Here we used support-vector machines (SVMs), in conjunction with whole-brain tractography, to identify graph metrics that best differentiate individuals with Major Depressive Disorder (MDD) from nondepressed controls. To do this, we applied a novel feature-scoring procedure that incorporates iterative classifier performance to assess feature robustness. We found that small-worldness, a measure of the balance between global integration and local specialization, most reliably differentiated MDD from nondepressed individuals. Post-hoc regional analyses suggested that heightened connectivity of the subcallosal cingulate gyrus (SCG) in MDDs contributes to these differences. The current study provides a novel way to assess the robustness of classification features and reveals anomalies in large-scale neural networks in MDD.
Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach
Aerts, Hugo J. W. L.; Velazquez, Emmanuel Rios; Leijenaar, Ralph T. H.; Parmar, Chintan; Grossmann, Patrick; Cavalho, Sara; Bussink, Johan; Monshouwer, René; Haibe-Kains, Benjamin; Rietveld, Derek; Hoebers, Frank; Rietbergen, Michelle M.; Leemans, C. René; Dekker, Andre; Quackenbush, John; Gillies, Robert J.; Lambin, Philippe
2014-01-01
Human cancers exhibit strong phenotypic differences that can be visualized noninvasively by medical imaging. Radiomics refers to the comprehensive quantification of tumour phenotypes by applying a large number of quantitative image features. Here we present a radiomic analysis of 440 features quantifying tumour image intensity, shape and texture, which are extracted from computed tomography data of 1,019 patients with lung or head-and-neck cancer. We find that a large number of radiomic features have prognostic power in independent data sets of lung and head-and-neck cancer patients, many of which were not identified as significant before. Radiogenomics analysis reveals that a prognostic radiomic signature, capturing intratumour heterogeneity, is associated with underlying gene-expression patterns. These data suggest that radiomics identifies a general prognostic phenotype existing in both lung and head-and-neck cancer. This may have a clinical impact as imaging is routinely used in clinical practice, providing an unprecedented opportunity to improve decision-support in cancer treatment at low cost. PMID:24892406
ANDERSON, JR; MOHAMMED, S; GRIMM, B; JONES, BW; KOSHEVOY, P; TASDIZEN, T; WHITAKER, R; MARC, RE
2011-01-01
Modern microscope automation permits the collection of vast amounts of continuous anatomical imagery in both two and three dimensions. These large data sets present significant challenges for data storage, access, viewing, annotation and analysis. The cost and overhead of collecting and storing the data can be extremely high. Large data sets quickly exceed an individual's capability for timely analysis and present challenges in efficiently applying transforms, if needed. Finally, annotated anatomical data sets can represent a significant investment of resources and should be easily accessible to the scientific community. The Viking application is our solution, created to view and annotate a 16.5 TB ultrastructural retinal connectome volume, and we demonstrate its utility in reconstructing neural networks for a distinctive retinal amacrine cell class. Viking has several key features. (1) It works over the internet using HTTP and supports many concurrent users limited only by hardware. (2) It supports a multi-user, collaborative annotation strategy. (3) It cleanly demarcates viewing and analysis from data collection and hosting. (4) It is capable of applying transformations in real-time. (5) It has an easily extensible user interface, allowing addition of specialized modules without rewriting the viewer. PMID:21118201
Viewpoints: A High-Performance High-Dimensional Exploratory Data Analysis Tool
NASA Astrophysics Data System (ADS)
Gazis, P. R.; Levit, C.; Way, M. J.
2010-12-01
Scientific data sets continue to increase in both size and complexity. In the past, dedicated graphics systems at supercomputing centers were required to visualize large data sets, but as the price of commodity graphics hardware has dropped and its capability has increased, it is now possible, in principle, to view large complex data sets on a single workstation. To do this in practice, an investigator will need software that is written to take advantage of the relevant graphics hardware. The Viewpoints visualization package described herein is an example of such software. Viewpoints is an interactive tool for exploratory visual analysis of large high-dimensional (multivariate) data. It leverages the capabilities of modern graphics boards (GPUs) to run on a single workstation or laptop. Viewpoints is minimalist: it attempts to do a small set of useful things very well (or at least very quickly) in comparison with similar packages today. Its basic feature set includes linked scatter plots with brushing, dynamic histograms, normalization, and outlier detection/removal. Viewpoints was originally designed for astrophysicists, but it has since been used in a variety of fields that range from astronomy, quantum chemistry, fluid dynamics, machine learning, bioinformatics, and finance to information technology server log mining. In this article, we describe the Viewpoints package and show examples of its usage.
Kiranyaz, Serkan; Mäkinen, Toni; Gabbouj, Moncef
2012-10-01
In this paper, we propose a novel framework based on a collective network of evolutionary binary classifiers (CNBC) to address the problems of feature and class scalability. The main goal of the proposed framework is to achieve a high classification performance over dynamic audio and video repositories. The proposed framework adopts a "Divide and Conquer" approach in which an individual network of binary classifiers (NBC) is allocated to discriminate each audio class. An evolutionary search is applied to find the best binary classifier in each NBC with respect to a given criterion. Through the incremental evolution sessions, the CNBC framework can dynamically adapt to each new incoming class or feature set without resorting to a full-scale re-training or re-configuration. Therefore, the CNBC framework is particularly designed for dynamically varying databases where no conventional static classifiers can adapt to such changes. In short, it is an entirely novel topology and an unprecedented approach to dynamic, content/data-adaptive, and scalable audio classification. A large set of audio features can be effectively used in the framework, where the CNBCs make appropriate selections and combinations so as to achieve the highest discrimination among individual audio classes. Experiments demonstrate a high classification accuracy (above 90%) and efficiency of the proposed framework over large and dynamic audio databases. Copyright © 2012 Elsevier Ltd. All rights reserved.
Online tracking of outdoor lighting variations for augmented reality with moving cameras.
Liu, Yanli; Granier, Xavier
2012-04-01
In augmented reality, one of the key tasks in achieving a convincing visual consistency between virtual objects and video scenes is to maintain coherent illumination along the whole sequence. As outdoor illumination is largely dependent on the weather, the lighting condition may change from frame to frame. In this paper, we propose a fully image-based approach for online tracking of outdoor illumination variations from videos captured with moving cameras. Our key idea is to estimate the relative intensities of sunlight and skylight via a sparse set of planar feature-points extracted from each frame. To address the inevitable feature misalignments, a set of constraints are introduced to select the most reliable ones. Exploiting the spatial and temporal coherence of illumination, the relative intensities of sunlight and skylight are finally estimated by using an optimization process. We validate our technique on a set of real-life videos and show that the results with our estimations are visually coherent along the video sequences.
A method for feature selection of APT samples based on entropy
NASA Astrophysics Data System (ADS)
Du, Zhenyu; Li, Yihong; Hu, Jinsong
2018-05-01
By studying known APT attack events in depth, this paper proposes a feature selection method for APT samples and a logic expression generation algorithm, IOCG (Indicator of Compromise Generate). The algorithm automatically generates machine-readable IOCs (Indicators of Compromise), addressing the limitations of existing IOCs, whose logical relationships are fixed, whose number of logical items cannot change, and which are large in scale and cannot be generated automatically from samples. At the same time, it reduces the time spent processing redundant and useless APT samples, improves the sharing rate of analysis information, and supports an active response to a complex and volatile APT attack situation. The samples were divided into a test set and a training set, and the algorithm was then used to generate logical expressions for the training set with the IOC_Aware plug-in. The generated expressions were compared both in their structure and in their detection results. The experimental results show that the algorithm is effective and can improve the detection effect.
Uber, Amy; Sadler, Richard C; Chassee, Todd; Reynolds, Joshua C
2017-08-01
Geographic clustering of bystander cardiopulmonary resuscitation (CPR) is associated with demographic and socioeconomic features of the community where out-of-hospital cardiac arrest (OHCA) occurred, although this association remains largely untested in rural areas. With a significant rural component and relative racial homogeneity, Kent County, Michigan, provides a unique setting to externally validate or identify new community features associated with bystander CPR. Using a large, countywide data set, we tested for geographic clustering of bystander CPR and its associations with community socioeconomic features. Secondary analysis of adult OHCA subjects (2010-2015) in the Cardiac Arrest Registry to Enhance Survival (CARES) data set for Kent County, Michigan. After linking geocoded OHCA cases to U.S. census data, we used Moran's I-test to assess for spatial autocorrelation of population-weighted cardiac arrest rate by census block group. Getis-Ord Gi statistic assessed for spatial clustering of bystander CPR and mixed-effects hierarchical logistic regression estimated adjusted associations between community features and bystander CPR. Of 1,592 subjects, 1,465 met inclusion criteria. Geospatial analysis revealed significant clustering of OHCA in more populated/urban areas. Conversely, bystander CPR was less likely in these areas (99% confidence) and more likely in suburban and rural areas (99% confidence). Adjusting for clinical, demographic, and socioeconomic covariates, bystander CPR was associated with public location (odds ratio [OR] = 1.19; 95% confidence interval [CI] = 1.03-1.39), initially shockable rhythms (OR = 1.48; 95% CI = 1.12-1.96), and those in urban neighborhoods (OR = 0.54; 95% CI = 0.38-0.77). Out-of-hospital cardiac arrest and bystander CPR are geographically clustered in Kent County, Michigan, but bystander CPR is inversely associated with urban designation. These results offer new insight into bystander CPR patterns in mixed urban and rural regions and afford the opportunity for targeted community CPR education in areas of low bystander CPR prevalence. © 2017 by the Society for Academic Emergency Medicine.
A recurrent neural model for proto-object based contour integration and figure-ground segregation.
Hu, Brian; Niebur, Ernst
2017-12-01
Visual processing of objects makes use of both feedforward and feedback streams of information. However, the nature of feedback signals is largely unknown, as is the identity of the neuronal populations in lower visual areas that receive them. Here, we develop a recurrent neural model to address these questions in the context of contour integration and figure-ground segregation. A key feature of our model is the use of grouping neurons whose activity represents tentative objects ("proto-objects") based on the integration of local feature information. Grouping neurons receive input from an organized set of local feature neurons, and project modulatory feedback to those same neurons. Additionally, inhibition at both the local feature level and the object representation level biases the interpretation of the visual scene in agreement with principles from Gestalt psychology. Our model explains several sets of neurophysiological results (Zhou et al. Journal of Neuroscience, 20(17), 6594-6611 2000; Qiu et al. Nature Neuroscience, 10(11), 1492-1499 2007; Chen et al. Neuron, 82(3), 682-694 2014), and makes testable predictions about the influence of neuronal feedback and attentional selection on neural responses across different visual areas. Our model also provides a framework for understanding how object-based attention is able to select both objects and the features associated with them.
Evolutionary optimization of radial basis function classifiers for data mining applications.
Buchtala, Oliver; Klimek, Manuel; Sick, Bernhard
2005-10-01
In many data mining applications that address classification problems, feature and model selection are considered key tasks. That is, appropriate input features of the classifier must be selected from a given (and often large) set of possible features, and structure parameters of the classifier must be adapted with respect to these features and a given data set. This paper describes an evolutionary algorithm (EA) that performs feature and model selection simultaneously for radial basis function (RBF) classifiers. In order to reduce the optimization effort, various techniques are integrated that accelerate and improve the EA significantly: hybrid training of RBF networks, lazy evaluation, consideration of soft constraints by means of penalty terms, and temperature-based adaptive control of the EA. The feasibility and the benefits of the approach are demonstrated by means of four data mining problems: intrusion detection in computer networks, biometric signature verification, customer acquisition with direct marketing methods, and optimization of chemical production processes. It is shown that, compared to earlier EA-based RBF optimization techniques, the runtime is reduced by up to 99% while error rates are lowered by up to 86%, depending on the application. The algorithm is independent of specific applications so that many ideas and solutions can be transferred to other classifier paradigms.
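The following is a minimal sketch of the simultaneous feature selection idea, assuming toy data: a simple genetic loop evolves feature bitmasks whose fitness is cross-validated accuracy minus a soft penalty on feature count. An RBF-kernel SVM stands in for the paper's RBF network, and the accelerations the paper describes (hybrid training, lazy evaluation, temperature-based adaptive control) are deliberately omitted.

```python
# Sketch: evolutionary feature selection with a soft-constraint penalty.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X, y = make_classification(n_samples=300, n_features=30, n_informative=6,
                           random_state=1)

def fitness(mask):
    if not mask.any():
        return 0.0
    acc = cross_val_score(SVC(kernel="rbf"), X[:, mask], y, cv=3).mean()
    return acc - 0.002 * mask.sum()          # soft penalty on feature count

pop = rng.random((20, X.shape[1])) < 0.5     # random bitmask population
for gen in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]  # truncation selection
    cut = rng.integers(1, X.shape[1], size=10)
    kids = np.array([np.concatenate([parents[i][:c],
                                     parents[(i + 1) % 10][c:]])
                     for i, c in enumerate(cut)])       # one-point crossover
    flips = rng.random(kids.shape) < 0.02               # bit-flip mutation
    pop = np.vstack([parents, kids ^ flips])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
```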
Gadd, C. S.; Baskaran, P.; Lobach, D. F.
1998-01-01
Extensive utilization of point-of-care decision support systems will be largely dependent on the development of user interaction capabilities that make them effective clinical tools in patient care settings. This research identified critical design features of point-of-care decision support systems that are preferred by physicians, through a multi-method formative evaluation of an evolving prototype of an Internet-based clinical decision support system. Clinicians used four versions of the system--each highlighting a different functionality. Surveys and qualitative evaluation methodologies assessed clinicians' perceptions regarding system usability and usefulness. Our analyses identified features that improve perceived usability, such as telegraphic representations of guideline-related information, facile navigation, and a forgiving, flexible interface. Users also preferred features that enhance usefulness and motivate use, such as an encounter documentation tool and the availability of physician instruction and patient education materials. In addition to identifying design features that are relevant to efforts to develop clinical systems for point-of-care decision support, this study demonstrates the value of combining quantitative and qualitative methods of formative evaluation with an iterative system development strategy to implement new information technology in complex clinical settings. PMID:9929188
3-D seismic study into the origin of a large seafloor depression on the Chatham Rise, New Zealand
NASA Astrophysics Data System (ADS)
Pecher, I. A.; Waghorn, K. A.; Strachan, L. J.; Crutchley, G. J.; Bialas, J.; Sarkar, S.; Davy, B. W.; Papenberg, C. A.; Koch, S.; Eckardt, T.; Kroeger, K. F.; Rose, P. S.; Coffin, R. B.
2014-12-01
Vast areas of the Chatham Rise, east of New Zealand's South Island, are covered by circular to elliptical seafloor depressions. Distribution and size of these seafloor depressions appear to be linked to bathymetry: small depressions several hundred meters in diameter are found in a depth range of ~500-800 m, while two types of larger depressions, 2-5 km and >10 km in diameter, respectively, are present in water depths of 800-1100 m. Here we evaluate 3-D seismic reflection data acquired from the R/V Sonne in 2013 over one of the 2-5 km depressions. We interpret that the seafloor bathymetry associated with the 2-5 km depressions was most likely created by contour current erosion and deposition. These contourite features are underlain by structures that indicate upward fluid flow, including polygonal fault networks and a conical feature that we interpret to result from sediment re-mobilization. We also discovered a set of smaller buried depressions immediately beneath the contourites. These features are directly connected to the stratigraphy containing the conical feature through sets of polygonal faults which truncate against the base of the paleo-depressions. We interpret these depressions as paleo-pockmarks resulting from fluid expulsion, presumably including gas. Based on interpretation and age correlation of a regional-scale seismic line, the paleo-pockmarks could be as old as 5.5 Ma. We suggest the resulting paleo-topography provided the initial roughness required to form mounded contourite deposits that lead to depressions in seafloor bathymetry.
Analysis Of The IJCNN 2011 UTL Challenge
2012-01-13
large datasets from various application domains: handwriting recognition, image recognition, video processing, text processing, and ecology. The goal... The validation and final evaluation sets consist of 4096 examples each. [Table of dataset properties (Dataset, Domain, Features, Sparsity, Development/Transfer examples) omitted; only a fragment survives.] ...documents [3]. Transfer learning methods could accelerate the application of handwriting recognizers to historical manuscripts by reducing the need for...
Automated Scoring of L2 Spoken English with Random Forests
ERIC Educational Resources Information Center
Kobayashi, Yuichiro; Abe, Mariko
2016-01-01
The purpose of the present study is to assess second language (L2) spoken English using automated scoring techniques. Automated scoring aims to classify a large set of learners' oral performance data into a small number of discrete oral proficiency levels. In automated scoring, objectively measurable features such as the frequencies of lexical and…
The Effects of School Desegregation on Crime. NBER Working Paper No. 15380
ERIC Educational Resources Information Center
Weiner, David A.; Lutz, Byron F.; Ludwig, Jens
2009-01-01
One of the most striking features of crime in America is its disproportionate concentration in disadvantaged, racially segregated communities. In this paper we estimate the effects of court-ordered school desegregation on crime by exploiting plausibly random variation in the timing of when these orders go into effect across the set of large urban…
T-ray relevant frequencies for osteosarcoma classification
NASA Astrophysics Data System (ADS)
Withayachumnankul, W.; Ferguson, B.; Rainsford, T.; Findlay, D.; Mickan, S. P.; Abbott, D.
2006-01-01
We investigate the classification of the T-ray response of normal human bone cells and human osteosarcoma cells, grown in culture. Given the magnitude and phase responses within a reliable spectral range as features for input vectors, a trained support vector machine can correctly classify the two cell types to some extent. Performance of the support vector machine is degraded by the curse of dimensionality, resulting from the comparatively large number of features in the input vectors. Feature subset selection methods are used to select only an optimal number of relevant features for inputs. As a result, an improvement in generalization performance is attainable, and the selected frequencies can be used to further describe the different mechanisms by which the cells respond to T-rays. We demonstrate a consistent classification accuracy of 89.6% while only one fifth of the original features is retained in the data set.
Efficient and robust computation of PDF features from diffusion MR signal.
Assemlal, Haz-Edine; Tschumperlé, David; Brun, Luc
2009-10-01
We present a method for the estimation of various features of the tissue micro-architecture using diffusion magnetic resonance imaging. The considered features are designed from the displacement probability density function (PDF). The estimation is based on two steps: first, the approximation of the signal by a series expansion made of Gaussian-Laguerre and spherical harmonic functions, followed by a projection onto a finite-dimensional space. In addition, we tackle the problem of robustness to the Rician noise that corrupts in-vivo acquisitions. Our feature estimation is expressed as a variational minimization process, leading to a variational framework which is robust to noise. This approach is very flexible regarding the number of samples and enables the computation of a large set of various features of the local tissue structure. We demonstrate the effectiveness of the method with results on both a synthetic phantom and real MR datasets acquired in a clinical time-frame.
Angular description for 3D scattering centers
NASA Astrophysics Data System (ADS)
Bhalla, Rajan; Raynal, Ann Marie; Ling, Hao; Moore, John; Velten, Vincent J.
2006-05-01
The electromagnetic scattered field from an electrically large target can often be well modeled as if it is emanating from a discrete set of scattering centers. In the scattering center extraction tool we developed previously based on the shooting and bouncing ray technique, no correspondence is maintained amongst the 3D scattering centers extracted at adjacent angles. In this paper we present a multi-dimensional clustering algorithm to track the angular and spatial behaviors of 3D scattering centers and group them into features. The extracted features for the Slicy and backhoe targets are presented. We also describe two metrics for measuring the angular persistence and spatial mobility of the 3D scattering centers that make up these features, in order to gather insights into target physics and feature stability. We find that the features that are most persistent are also the most mobile, and discuss implications for optimal SAR imaging.
van der Kloet, Frans M; Hendriks, Margriet; Hankemeier, Thomas; Reijmers, Theo
2013-11-01
Because of its high sensitivity and specificity, hyphenated mass spectrometry has become the predominant method to detect and quantify metabolites present in bio-samples relevant for all sorts of life science studies being executed. In contrast to targeted methods that are dedicated to specific features, global profiling acquisition methods allow new, unspecific metabolites to be analyzed. The challenge with these so-called untargeted methods is the proper and automated extraction and integration of features that could be of relevance. We propose a new algorithm that enables untargeted integration of samples that are measured with high resolution liquid chromatography-mass spectrometry (LC-MS). In contrast to other approaches, limited user interaction is needed, allowing less experienced users to integrate their data as well. The large number of single features found within a sample is combined into a smaller list of compound-related, grouped feature-sets representative of that sample. These feature-sets allow for easier interpretation and identification and, as importantly, easier matching across samples. We show that the automatically obtained integration results for a set of known target metabolites match those generated with vendor software, but that at least 10 times more feature-sets are extracted as well. We demonstrate our approach using high resolution LC-MS data acquired for 128 samples on a lipidomics platform. The data was also processed in a targeted manner (with a combination of automatic and manual integration) using vendor software for a set of 174 targets. As our untargeted extraction procedure is run per sample and per mass trace, its implementation is scalable. Because of the generic approach, we envision that this data extraction method will be used in targeted as well as untargeted analyses of many different kinds of TOF-MS data, and even CE-MS, GC-MS, or MRM data. The Matlab package is available for download on request and efforts are directed toward a user-friendly Windows executable. Copyright © 2013 Elsevier B.V. All rights reserved.
Critical Song Features for Auditory Pattern Recognition in Crickets
Meckenhäuser, Gundula; Hennig, R. Matthias; Nawrot, Martin P.
2013-01-01
Many different invertebrate and vertebrate species use acoustic communication for pair formation. In the cricket Gryllus bimaculatus, females recognize their species-specific calling song and localize singing males by positive phonotaxis. The song pattern of males has a clear structure consisting of brief and regular pulses that are grouped into repetitive chirps. Information is thus present on a short and a long time scale. Here, we ask which structural features of the song critically determine the phonotactic performance. To this end we employed artificial neural networks to analyze a large body of behavioral data that measured females’ phonotactic behavior under systematic variation of artificially generated song patterns. In a first step we used four non-redundant descriptive temporal features to predict the female response. The model prediction showed a high correlation with the experimental results. We used this behavioral model to explore the integration of the two different time scales. Our result suggested that only an attractive pulse structure in combination with an attractive chirp structure reliably induced phonotactic behavior to signals. In a further step we investigated all feature sets, each one consisting of a different combination of eight proposed temporal features. We identified feature sets of size two, three, and four that achieve highest prediction power by using the pulse period from the short time scale plus additional information from the long time scale. PMID:23437054
Robustly Aligning a Shape Model and Its Application to Car Alignment of Unknown Pose.
Li, Yan; Gu, Leon; Kanade, Takeo
2011-09-01
Precisely localizing in an image a set of feature points that form the shape of an object, such as a car or a face, is called alignment. Previous shape alignment methods attempted to fit a whole shape model to the observed data, based on the assumption of Gaussian observation noise and an associated regularization process. However, such an approach, though able to deal with Gaussian noise in feature detection, turns out not to be robust or precise, because it is vulnerable to gross feature detection errors or outliers resulting from partial occlusions or spurious features from the background or neighboring objects. We address this problem by adopting a randomized hypothesis-and-test approach. First, a Bayesian inference algorithm is developed to generate a shape-and-pose hypothesis of the object from a partial shape, or a subset of the feature points. For alignment, a large number of hypotheses are generated by randomly sampling subsets of feature points, and then evaluated to find the one that minimizes the shape prediction error. This method of randomized subset-based matching can effectively handle outliers and recover the correct object shape. We apply this approach to a challenging data set of over 5,000 different-posed car images, spanning a wide variety of car types, lighting, background scenes, and partial occlusions. Experimental results demonstrate favorable improvements over previous methods in both accuracy and robustness.
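A minimal sketch of the hypothesis-and-test loop described above: similarity transforms are repeatedly fitted to random point subsets and scored by a robust prediction error. The closed-form Procrustes fit (which ignores the reflection case) and the synthetic outliers are illustrative simplifications of the paper's Bayesian shape-and-pose inference.

```python
# Sketch: randomized subset-based shape alignment robust to gross outliers.
import numpy as np

def fit_similarity(src, dst):
    # Least-squares similarity transform (scale, rotation, translation);
    # the reflection-handling step of a full Umeyama fit is omitted.
    mu_s, mu_d = src.mean(0), dst.mean(0)
    s, d = src - mu_s, dst - mu_d
    u, sig, vt = np.linalg.svd(d.T @ s)
    rot = u @ vt
    scale = sig.sum() / (s ** 2).sum()
    return lambda p: scale * (p - mu_s) @ rot.T + mu_d

rng = np.random.default_rng(0)
shape = rng.random((20, 2))                      # model shape points
obs = 2.0 * shape + 1.0 + 0.01 * rng.standard_normal((20, 2))
obs[:5] += rng.standard_normal((5, 2))           # gross outliers

best_err, best_T = np.inf, None
for _ in range(200):                             # randomized hypotheses
    idx = rng.choice(len(shape), 4, replace=False)
    T = fit_similarity(shape[idx], obs[idx])
    residual = np.linalg.norm(T(shape) - obs, axis=1)
    err = np.median(residual)                    # robust to the outliers
    if err < best_err:
        best_err, best_T = err, T
print("median alignment error:", best_err)
```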
Unsupervised Feature Selection Based on the Morisita Index for Hyperspectral Images
NASA Astrophysics Data System (ADS)
Golay, Jean; Kanevski, Mikhail
2017-04-01
Hyperspectral sensors are capable of acquiring images with hundreds of narrow and contiguous spectral bands. Compared with traditional multispectral imagery, the use of hyperspectral images allows better performance in discriminating between land-cover classes, but it also results in large redundancy and high computational data processing. To alleviate such issues, unsupervised feature selection techniques for redundancy minimization can be implemented. Their goal is to select the smallest subset of features (or bands) in such a way that all the information content of a data set is preserved as much as possible. The present research deals with the application to hyperspectral images of a recently introduced technique of unsupervised feature selection: the Morisita-Based filter for Redundancy Minimization (MBRM). MBRM is based on the (multipoint) Morisita index of clustering and on the Morisita estimator of Intrinsic Dimension (ID). The fundamental idea of the technique is to retain only the bands which contribute to increasing the ID of an image. In this way, redundant bands are disregarded, since they have no impact on the ID. Besides, MBRM has several advantages over benchmark techniques: in addition to its ability to deal with large data sets, it can capture highly-nonlinear dependences and its implementation is straightforward in any programming environment. Experimental results on freely available hyperspectral images show the effectiveness of MBRM in remote sensing data processing. Comparisons with benchmark techniques are carried out and random forests are used to assess the performance of MBRM in reducing the data dimensionality without loss of relevant information.
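A hedged sketch of the core idea (keep a band only if it raises an intrinsic dimension estimate) follows, on synthetic "bands". The simple TwoNN estimator used here is a stand-in for the Morisita estimator of ID, and the acceptance threshold is an arbitrary illustrative choice.

```python
# Sketch: greedy band selection driven by an intrinsic-dimension estimate.
import numpy as np

def twonn_id(X):
    # TwoNN estimator: ID ~ N / sum(log(r2 / r1)) over all points.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    r = np.sort(d, axis=1)[:, :2]                # two nearest-neighbor radii
    return len(X) / np.log(r[:, 1] / r[:, 0]).sum()

rng = np.random.default_rng(0)
n = 400
base = rng.random((n, 3))                        # 3 informative "bands"
redundant = base[:, :1] * 2 + 0.01 * rng.random((n, 1))  # nearly a copy
cube = np.hstack([base, redundant])

selected = [0]
for band in range(1, cube.shape[1]):
    before = twonn_id(cube[:, selected])
    after = twonn_id(cube[:, selected + [band]])
    if after > before + 0.05:                    # band adds new information
        selected.append(band)
print("retained bands:", selected)   # the nearly redundant band should drop out
```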
A Feature-based Developmental Model of the Infant Brain in Structural MRI
Toews, Matthew; Wells, William M.; Zöllei, Lilla
2014-01-01
In this paper, anatomical development is modeled as a collection of distinctive image patterns localized in space and time. A Bayesian posterior probability is defined over a random variable of subject age, conditioned on data in the form of scale-invariant image features. The model is automatically learned from a large set of images exhibiting significant variation, used to discover anatomical structure related to age and development, and fit to new images to predict age. The model is applied to a set of 230 infant structural MRIs of 92 subjects acquired at multiple sites over an age range of 8-590 days. Experiments demonstrate that the model can be used to identify age-related anatomical structure, and to predict the age of new subjects with an average error of 72 days. PMID:23286050
The Centre for Speech, Language and the Brain (CSLB) concept property norms.
Devereux, Barry J; Tyler, Lorraine K; Geertzen, Jeroen; Randall, Billi
2014-12-01
Theories of the representation and processing of concepts have been greatly enhanced by models based on information available in semantic property norms. This information relates both to the identity of the features produced in the norms and to their statistical properties. In this article, we introduce a new and large set of property norms that are designed to be a more flexible tool to meet the demands of many different disciplines interested in conceptual knowledge representation, from cognitive psychology to computational linguistics. As well as providing all features listed by 2 or more participants, we also show the considerable linguistic variation that underlies each normalized feature label and the number of participants who generated each variant. Our norms are highly comparable with the largest extant set (McRae, Cree, Seidenberg, & McNorgan, 2005) in terms of the number and distribution of features. In addition, we show how the norms give rise to a coherent category structure. We provide these norms in the hope that the greater detail available in the Centre for Speech, Language and the Brain norms should further promote the development of models of conceptual knowledge. The norms can be downloaded at www.csl.psychol.cam.ac.uk/propertynorms.
Cyberhubs: Virtual Research Environments for Astronomy
NASA Astrophysics Data System (ADS)
Herwig, Falk; Andrassy, Robert; Annau, Nic; Clarkson, Ondrea; Côté, Benoit; D’Sa, Aaron; Jones, Sam; Moa, Belaid; O’Connell, Jericho; Porter, David; Ritter, Christian; Woodward, Paul
2018-05-01
Collaborations in astronomy and astrophysics are faced with numerous cyber-infrastructure challenges, such as large data sets, the need to combine heterogeneous data sets, and the challenge to effectively collaborate on those large, heterogeneous data sets with significant processing requirements and complex science software tools. The cyberhubs system is an easy-to-deploy package for small- to medium-sized collaborations based on the Jupyter and Docker technology, which allows web-browser-enabled, remote, interactive analytic access to shared data. It offers an initial step to address these challenges. The features and deployment steps of the system are described, as well as the requirements collection through an account of the different approaches to data structuring, handling, and available analytic tools for the NuGrid and PPMstar collaborations. NuGrid is an international collaboration that creates stellar evolution and explosion physics and nucleosynthesis simulation data. The PPMstar collaboration performs large-scale 3D stellar hydrodynamics simulations of interior convection in the late phases of stellar evolution. Examples of science that is currently performed on cyberhubs, in the areas of 3D stellar hydrodynamic simulations, stellar evolution and nucleosynthesis, and Galactic chemical evolution, are presented.
Długosz, Maciej; Trylska, Joanna
2008-01-01
We present a method for describing and comparing global electrostatic properties of biomolecules based on the spherical harmonic decomposition of electrostatic potential data. Unlike other approaches, our method does not require any prior three-dimensional structural alignment. The electrostatic potential, given as a volumetric data set from a numerical solution of the Poisson or Poisson–Boltzmann equation, is represented with descriptors that are rotation invariant. The method can be applied to large and structurally diverse sets of biomolecules, making it possible to cluster them according to their electrostatic features. PMID:18624502
OpenCL based machine learning labeling of biomedical datasets
NASA Astrophysics Data System (ADS)
Amoros, Oscar; Escalera, Sergio; Puig, Anna
2011-03-01
In this paper, we propose a two-stage labeling method of large biomedical datasets through a parallel approach in a single GPU. Diagnostic methods, structures volume measurements, and visualization systems are of major importance for surgery planning, intra-operative imaging and image-guided surgery. In all cases, providing an automatic and interactive method to label or to tag different structures contained in input data becomes imperative. Several approaches to label or segment biomedical datasets have been proposed to discriminate different anatomical structures in an output tagged dataset. Among existing methods, supervised learning methods for segmentation have been devised to let a non-expert user easily analyze biomedical datasets. However, they still have some problems concerning practical application, such as slow learning and testing speeds. In addition, recent technological developments have led to widespread availability of multi-core CPUs and GPUs, as well as new software languages, such as NVIDIA's CUDA and OpenCL, making it possible to apply parallel programming paradigms in conventional personal computers. Adaboost is one of the most widely applied classifiers for labeling in the Machine Learning community. In a first stage, Adaboost trains a binary classifier from a set of pre-labeled samples described by a set of features. This binary classifier is defined as a weighted combination of weak classifiers. Each weak classifier is a simple decision function estimated on a single feature value. Then, at the testing stage, each weak classifier is independently applied on the features of a set of unlabeled samples. In this work, we propose an alternative representation of the Adaboost binary classifier. We use this proposed representation to define a new GPU-based parallelized Adaboost testing stage using OpenCL. We provide numerical experiments based on large available data sets and we compare our results to CPU-based strategies in terms of time and labeling speeds.
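To illustrate the testing stage that the paper parallelizes, the sketch below evaluates a weighted ensemble of single-feature threshold classifiers on many samples at once. This vectorized NumPy form mirrors the data-parallel map that would be offloaded to the GPU with OpenCL; all sizes and the random "trained" parameters are illustrative.

```python
# Sketch of the Adaboost testing stage: every weak classifier thresholds one
# feature, and the strong classifier is their weighted vote.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_weak = 10_000, 16, 64

X = rng.standard_normal((n_samples, n_features))
feat = rng.integers(0, n_features, n_weak)     # feature index per weak learner
thresh = rng.standard_normal(n_weak)           # decision threshold
polarity = rng.choice([-1.0, 1.0], n_weak)     # direction of the inequality
alpha = rng.random(n_weak)                     # weights from the training stage

# Every weak classifier evaluated on every sample in one shot.
votes = polarity * np.sign(X[:, feat] - thresh)   # (n_samples, n_weak)
labels = np.sign(votes @ alpha)                   # weighted combination
print(labels[:10])
```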
Covert photo classification by fusing image features and visual attributes.
Lang, Haitao; Ling, Haibin
2015-10-01
In this paper, we study a novel problem of classifying covert photos, whose acquisition processes are intentionally concealed from the subjects being photographed. Covert photos are often privacy invasive and, if distributed over the Internet, can cause serious consequences. Automatic identification of such photos, therefore, serves as an important initial step toward further privacy protection operations. The problem is, however, very challenging due to the large semantic similarity between covert and noncovert photos, the enormous diversity in the photographing process and environment of covert photos, and the difficulty of collecting an effective data set for the study. Attacking these challenges, we make three consecutive contributions. First, we collect a large data set containing 2500 covert photos, each of them verified rigorously and carefully. Second, we conduct a user study on how humans distinguish covert photos from noncovert ones. The user study not only provides an important evaluation baseline, but also suggests fusing heterogeneous information for an automatic solution. Our third contribution is a covert photo classification algorithm that fuses various image features and visual attributes in the multiple kernel learning framework. We evaluate the proposed approach on the collected data set in comparison with other modern image classifiers. The results show that our approach achieves an average classification rate (1-EER) of 0.8940, which significantly outperforms other competitors as well as human performance.
Identifying well-formed biomedical phrases in MEDLINE® text.
Kim, Won; Yeganova, Lana; Comeau, Donald C; Wilbur, W John
2012-12-01
In the modern world, people frequently interact with retrieval systems to satisfy their information needs. Humanly understandable, well-formed phrases represent a crucial interface between humans and the web, and the ability to index and search with such phrases is beneficial for human-web interactions. In this paper we consider the problem of identifying humanly understandable, well-formed, and high-quality biomedical phrases in MEDLINE documents. The main approaches used previously for detecting such phrases are syntactic, statistical, and a hybrid approach combining these two. In this paper we propose a supervised learning approach for identifying high-quality phrases. First we obtain a set of known well-formed, useful phrases from an existing source and label these phrases as positive. We then extract from MEDLINE a large set of multiword strings that do not contain stop words or punctuation. We believe this unlabeled set contains many well-formed phrases. Our goal is to identify these additional high-quality phrases. We examine various feature combinations and several machine learning strategies designed to solve this problem. A proper choice of machine learning methods and features identifies, in this large collection, strings that are likely to be high-quality phrases. We evaluate our approach by making human judgments on multiword strings extracted from MEDLINE using our methods. We find that over 85% of such extracted phrase candidates are humanly judged to be of high quality. Published by Elsevier Inc.
Intrapartum fetal heart rate classification from trajectory in Sparse SVM feature space.
Spilka, J; Frecon, J; Leonarduzzi, R; Pustelnik, N; Abry, P; Doret, M
2015-01-01
Intrapartum fetal heart rate (FHR) constitutes a prominent source of information for the assessment of fetal reactions to stress events during delivery. Yet, early detection of fetal acidosis remains a challenging signal processing task. The originality of the present contribution is threefold: multiscale representations and wavelet-leader-based multifractal analysis are used to quantify FHR variability; supervised classification is achieved by means of Sparse-SVMs that jointly aim to achieve optimal detection performance and to select relevant features in a multivariate setting; and trajectories in the feature space, accounting for the evolution of the features over time as labor progresses, are involved in the construction of indices quantifying fetal health. The classification performance permitted by this combination of tools is quantified on a large intrapartum FHR database (≃ 1250 subjects) collected at a French academic public hospital.
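A minimal sketch of the joint classify-and-select ingredient, assuming toy data: an l1-penalized linear SVM drives most feature weights to exactly zero, in the spirit of the Sparse-SVM used above (the actual multifractal FHR features and database are not reproduced here).

```python
# Sketch: sparse (l1-penalized) linear SVM doing feature selection while
# training the classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=40, n_informative=5,
                           random_state=0)
clf = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=10_000)
clf.fit(X, y)

selected = np.flatnonzero(np.abs(clf.coef_[0]) > 1e-8)
print("features kept by the l1 penalty:", selected)
```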
Using Activity-Related Behavioural Features towards More Effective Automatic Stress Detection
Giakoumis, Dimitris; Drosou, Anastasios; Cipresso, Pietro; Tzovaras, Dimitrios; Hassapis, George; Gaggioli, Andrea; Riva, Giuseppe
2012-01-01
This paper introduces activity-related behavioural features that can be automatically extracted from a computer system, with the aim to increase the effectiveness of automatic stress detection. The proposed features are based on processing of appropriate video and accelerometer recordings taken from the monitored subjects. For the purposes of the present study, an experiment was conducted that utilized a stress-induction protocol based on the Stroop colour word test. Video, accelerometer and biosignal (Electrocardiogram and Galvanic Skin Response) recordings were collected from nineteen participants. Then, an explorative study was conducted by following a methodology mainly based on spatiotemporal descriptors (Motion History Images) that are extracted from video sequences. A large set of activity-related behavioural features, potentially useful for automatic stress detection, were proposed and examined. Experimental evaluation showed that several of these behavioural features significantly correlate to self-reported stress. Moreover, it was found that the use of the proposed features can significantly enhance the performance of typical automatic stress detection systems, commonly based on biosignal processing. PMID:23028461
2014-01-01
Background In complex large-scale experiments, in addition to simultaneously considering a large number of features, multiple hypotheses are often being tested for each feature. This leads to a problem of multi-dimensional multiple testing. For example, in gene expression studies over ordered categories (such as time-course or dose-response experiments), interest is often in testing differential expression across several categories for each gene. In this paper, we consider a framework for testing multiple sets of hypotheses, which can be applied to a wide range of problems. Results We adopt the concept of the overall false discovery rate (OFDR) for controlling false discoveries on the hypothesis set level. Based on an existing procedure for identifying differentially expressed gene sets, we discuss a general two-step hierarchical hypothesis set testing procedure, which controls the overall false discovery rate under independence across hypothesis sets. In addition, we discuss the concept of the mixed-directional false discovery rate (mdFDR), and extend the general procedure to enable directional decisions for two-sided alternatives. We applied the framework to the case of microarray time-course/dose-response experiments, and proposed three procedures for testing differential expression and making multiple directional decisions for each gene. Simulation studies confirm the control of the OFDR and mdFDR by the proposed procedures under independence and positive correlations across genes. Simulation results also show that two of our new procedures achieve higher power than previous methods. Finally, the proposed methodology is applied to a microarray dose-response study, to identify 17β-estradiol-sensitive genes in breast cancer cells that are induced at low concentrations. Conclusions The framework we discuss provides a platform for multiple testing procedures covering situations involving two (or potentially more) sources of multiplicity. The framework is easy to use and adaptable to various practical settings that frequently occur in large-scale experiments. Procedures generated from the framework are shown to maintain control of the OFDR and mdFDR, quantities that are especially relevant in the case of multiple hypothesis set testing. The procedures work well in both simulations and real datasets, and are shown to have better power than existing methods. PMID:24731138
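As a rough illustration of two-step hierarchical testing of this kind, the sketch below screens hypothesis sets (genes) with Benjamini-Hochberg on combined p-values and then tests within the selected sets at a level scaled by the selection fraction. This generic recipe is an assumption-laden stand-in, not the paper's exact OFDR/mdFDR procedure.

```python
# Sketch: generic two-step hierarchical multiple testing (screen, then drill).
import numpy as np

def bh_reject(pvals, q):
    # Benjamini-Hochberg step-up procedure.
    m = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= q * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected

rng = np.random.default_rng(0)
q = 0.05
# 100 genes x 4 time points; the first 10 genes carry real signal.
p = rng.random((100, 4))
p[:10] *= 0.001

# Step 1: one screening p-value per gene (Simes combination), BH across genes.
simes = np.min(np.sort(p, axis=1) * p.shape[1] /
               np.arange(1, p.shape[1] + 1), axis=1)
selected = bh_reject(simes, q)

# Step 2: within selected genes, test individual time points at a level
# scaled by the selection fraction, keeping the overall error rate near q.
q2 = q * selected.sum() / len(selected)
hits = p[selected] <= q2
print(f"{selected.sum()} genes selected; {hits.sum()} single tests rejected")
```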
A Transform-Based Feature Extraction Approach for Motor Imagery Tasks Classification
Khorshidtalab, Aida; Mesbah, Mostefa; Salami, Momoh J. E.
2015-01-01
In this paper, we present a new motor imagery classification method in the context of electroencephalography (EEG)-based brain–computer interface (BCI). This method uses a signal-dependent orthogonal transform, referred to as linear prediction singular value decomposition (LP-SVD), for feature extraction. The transform defines the mapping as the left singular vectors of the LP coefficient filter impulse response matrix. Using a logistic tree-based model classifier, the extracted features are classified into one of four motor imagery movements. The proposed approach was first benchmarked against two related state-of-the-art feature extraction approaches, namely, discrete cosine transform (DCT) and adaptive autoregressive (AAR)-based methods. By achieving an accuracy of 67.35%, the LP-SVD approach outperformed the other approaches by large margins (25% compared with DCT and 6% compared with AAR-based methods). To further improve the discriminatory capability of the extracted features and reduce the computational complexity, we enlarged the extracted feature subset by incorporating two extra features, namely, the Q- and Hotelling's T² statistics of the transformed EEG, and introduced a new EEG channel selection method. The performance of the EEG classification based on the expanded feature set and channel selection method was compared with that of a number of state-of-the-art classification methods previously reported with the BCI IIIa competition data set. Our method came second with an average accuracy of 81.38%. PMID:27170898
NASA Astrophysics Data System (ADS)
Xiong, Wei; Qiu, Bo; Tian, Qi; Mueller, Henning; Xu, Changsheng
2005-04-01
Medical image retrieval is still mainly a research domain with a large variety of applications and techniques. With the ImageCLEF 2004 benchmark, an evaluation framework has been created that includes a database, query topics and ground truth data. Eleven systems (with a total of more than 50 runs) compared their performance in various configurations. The results show that no single feature performs well on all query tasks. The key to successful retrieval is rather the selection of features and feature weights for a specific set of input features, and thus for the query task. In this paper we propose a novel method based on query topic dependent image features (QTDIF) for content-based medical image retrieval. These feature sets are designed to capture both inter-category and intra-category statistical variations to achieve good retrieval performance in terms of recall and precision. We have used Gaussian Mixture Models (GMM) and blob representation to model medical images and construct the proposed novel QTDIF for CBIR. Finally, trained multi-class support vector machines (SVM) are used for image similarity ranking. The proposed methods have been tested on the Casimage database of around 9000 images, for the 26 image topics used for ImageCLEF 2004. The retrieval performance has been compared with the medGIFT system, which is based on the GNU Image Finding Tool (GIFT). The experimental results show that the proposed QTDIF-based CBIR can provide significantly better performance than systems based on general features only.
Liu, Jingfang; Zhang, Pengzhu; Lu, Yingjie
2014-11-01
User-generated medical messages on the Internet contain extensive information related to adverse drug reactions (ADRs) and are known as valuable resources for post-marketing drug surveillance. The aim of this study was to find an effective method to automatically identify messages related to ADRs from online user reviews. We conducted experiments on online user reviews using different feature sets and different classification techniques. First, messages from three communities, an allergy community, a schizophrenia community and a pain management community, were collected, and 3000 messages were annotated. Second, n-gram-based feature sets and medical domain-specific feature sets were generated. Third, three classification techniques, SVM, C4.5 and Naïve Bayes, were used to perform the classification tasks separately. Finally, we evaluated the performance of each combination of feature set and classification technique by comparing metrics including accuracy and F-measure. In terms of accuracy, the SVM classifier exceeded 0.8, while the C4.5 and Naïve Bayes classifiers remained below 0.8; meanwhile, combined feature sets including both the n-gram-based and the domain-specific features consistently outperformed any single feature set. In terms of F-measure, the highest value, 0.895, was achieved by using the combined feature sets with an SVM classifier. Overall, combining both feature sets with an SVM classifier yields an effective method for automatically identifying ADR-related messages in online user reviews.
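A hedged sketch of the best-performing configuration reported above (combined n-gram and domain-specific features feeding an SVM), using a tiny invented corpus and a hand-picked ADR lexicon as stand-ins for the annotated community messages.

```python
# Sketch: combined n-gram + domain-lexicon features with a linear SVM.
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

ADR_LEXICON = ["rash", "nausea", "dizziness", "headache"]  # toy domain terms

texts = ["this drug gave me a terrible rash and nausea",
         "switched pharmacies, much cheaper now",
         "felt dizziness every morning after the new dose",
         "the support group here is great"]
labels = [1, 0, 1, 0]                          # 1 = ADR-related message

features = FeatureUnion([
    ("ngrams", TfidfVectorizer(ngram_range=(1, 2))),       # n-gram feature set
    ("domain", TfidfVectorizer(vocabulary=ADR_LEXICON)),   # domain feature set
])
model = Pipeline([("features", features), ("svm", LinearSVC())])
model.fit(texts, labels)
print(model.predict(["any tips for the headache this medicine causes?"]))
```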
Sorted Index Numbers for Privacy Preserving Face Recognition
NASA Astrophysics Data System (ADS)
Wang, Yongjin; Hatzinakos, Dimitrios
2009-12-01
This paper presents a novel approach for changeable and privacy preserving face recognition. We first introduce a new method of biometric matching using the sorted index numbers (SINs) of feature vectors. Since it is impossible to recover any of the exact values of the original features, the transformation from original features to the SIN vectors is noninvertible. To address the irrevocable nature of biometric signals whilst obtaining stronger privacy protection, a random projection-based method is employed in conjunction with the SIN approach to generate changeable and privacy preserving biometric templates. The effectiveness of the proposed method is demonstrated on a large generic data set, which contains images from several well-known face databases. Extensive experimentation shows that the proposed solution may improve the recognition accuracy.
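The following minimal sketch illustrates the SIN idea on toy vectors: a user-specific random projection followed by keeping only the sorting order, so the original feature values cannot be recovered and the template can be revoked by drawing a new projection matrix. The rank-correlation matcher is an illustrative choice, not necessarily the paper's.

```python
# Sketch: sorted-index-number templates from a random projection.
import numpy as np

rng = np.random.default_rng(0)
dim, proj_dim = 64, 32
R = rng.standard_normal((proj_dim, dim))       # user-specific random matrix

def sin_template(feature_vec):
    # Only the index order survives; exact values are not recoverable.
    return np.argsort(R @ feature_vec)

def match(t1, t2):
    # Spearman-style similarity between two rank templates.
    r1, r2 = np.argsort(t1), np.argsort(t2)
    return np.corrcoef(r1, r2)[0, 1]

face = rng.standard_normal(dim)
same = face + 0.1 * rng.standard_normal(dim)   # noisy capture, same person
other = rng.standard_normal(dim)
print(match(sin_template(face), sin_template(same)))    # close to 1
print(match(sin_template(face), sin_template(other)))   # near 0
```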
A mixture model-based approach to the clustering of microarray expression data.
McLachlan, G J; Bean, R W; Peel, D
2002-03-01
This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets. EMMIX-GENE is available at http://www.maths.uq.edu.au/~gjm/emmix-gene/
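To illustrate the gene-ranking step, the sketch below fits one- and two-component mixtures to a gene's expression values across tissues and computes the likelihood ratio statistic; genes whose expression splits into two clusters score high. Gaussian mixtures stand in here for the t mixtures fitted by EMMIX-GENE, and the data are synthetic.

```python
# Sketch: rank genes by the 1-vs-2-component mixture likelihood ratio.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n_tissues = 60
flat_gene = rng.normal(0.0, 1.0, n_tissues)                 # one cluster
split_gene = np.concatenate([rng.normal(-2, 1, 30),
                             rng.normal(2, 1, 30)])         # two clusters

def lr_statistic(expr):
    x = expr.reshape(-1, 1)
    ll = [GaussianMixture(k, n_init=5, random_state=0).fit(x).score(x) * len(x)
          for k in (1, 2)]
    return 2.0 * (ll[1] - ll[0])   # -2 log Lambda for 1 vs 2 components

print("flat gene :", lr_statistic(flat_gene))    # small statistic
print("split gene:", lr_statistic(split_gene))   # large statistic
```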
Regional climate model sensitivity to domain size
NASA Astrophysics Data System (ADS)
Leduc, Martin; Laprise, René
2009-05-01
Regional climate models are increasingly used to add small-scale features that are not present in their lateral boundary conditions (LBC). It is well known that the limited area over which a model is integrated must be large enough to allow the full development of small-scale features. On the other hand, integrations on very large domains have shown important departures from the driving data, unless large-scale nudging is applied. The issue of domain size is studied here by using the “perfect model” approach. This method consists first of generating a high-resolution climatic simulation, nicknamed big brother (BB), over a large domain of integration. The next step is to degrade this dataset with a low-pass filter emulating the usual coarse-resolution LBC. The filtered nesting data (FBB) are then used to drive a set of four simulations (LBs, for Little Brothers), with the same model, but on progressively smaller domain sizes. The LB statistics for a climate sample of four winter months are compared with BB over a common region. The time-average (stationary) and transient-eddy standard deviation patterns of the LB atmospheric fields generally improve in terms of spatial correlation with the reference (BB) when the domain gets smaller. The extraction of the small-scale features by using a spectral filter allows detecting important underestimations of the transient-eddy variability in the vicinity of the inflow boundary, which can penalize the use of small domains (less than 100 × 100 grid points). The permanent “spatial spin-up” corresponds to the characteristic distance that the large-scale flow needs to travel before developing small-scale features. The spin-up distance tends to grow in size at higher levels in the atmosphere.
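The degradation step of the perfect-model approach amounts to spectral low-pass filtering of the big-brother fields. A minimal sketch with a sharp cutoff (the study's actual filter shape and cutoff wavenumber are not specified in this abstract):

```python
import numpy as np

def low_pass(field, keep_fraction=0.2):
    """Degrade a 2-D high-resolution field by zeroing spectral
    coefficients above a cutoff wavenumber. A sharp cutoff is used for
    brevity; the study's actual filter shape may differ."""
    F = np.fft.fft2(field)
    ny, nx = field.shape
    ky = np.fft.fftfreq(ny)[:, None]
    kx = np.fft.fftfreq(nx)[None, :]
    F[np.hypot(ky, kx) > 0.5 * keep_fraction] = 0.0  # Nyquist is 0.5
    return np.fft.ifft2(F).real

rng = np.random.default_rng(1)
bb = rng.standard_normal((200, 200))   # stand-in for a big-brother field
fbb = low_pass(bb)                     # filtered nesting data (FBB)
small_scales = bb - fbb                # what the little brothers must regenerate
```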
Multiscale Simulations of ALD in Cross Flow Reactors
Yanguas-Gil, Angel; Libera, Joseph A.; Elam, Jeffrey W.
2014-08-13
In this study, we have developed a multiscale simulation code that allows us to study the impact of surface chemistry on the coating of large-area substrates with high-surface-area/high-aspect-ratio features. Our code, based on open-source libraries, takes advantage of the ALD surface chemistry to achieve an extremely efficient two-way coupling between reactor and feature length scales, and it can provide simulated quartz crystal microbalance and mass spectrometry data at any point of the reactor. By combining experimental surface characterization with simple analysis of growth profiles in a tubular cross-flow reactor, we are able to extract a minimal set of reactions that effectively models the surface chemistry, including the presence of spurious CVD, and to evaluate the impact of surface chemistry on the coating of large, high-surface-area substrates.
Natural scene logo recognition by joint boosting feature selection in salient regions
NASA Astrophysics Data System (ADS)
Fan, Wei; Sun, Jun; Naoi, Satoshi; Minagawa, Akihiro; Hotta, Yoshinobu
2011-01-01
Logos are considered valuable intellectual property and a key component of the goodwill of a business. In this paper, we propose a natural scene logo recognition method which is segmentation-free and capable of processing images extremely rapidly while achieving high recognition rates. The classifiers for each logo are trained jointly, rather than independently. In this way, common features can be shared across multiple classes for better generalization. To deal with the large range of aspect ratios across different logos, a set of salient regions of interest (ROIs) is extracted to describe each class. We ensure that the selected ROIs are both individually informative and pairwise weakly dependent by using a class conditional entropy maximization criterion. Experimental results on a large logo database demonstrate the effectiveness and efficiency of our proposed method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jurrus, Elizabeth R.; Hodas, Nathan O.; Baker, Nathan A.
Forensic analysis of nanoparticles is often conducted through the collection and identification of electron microscopy images to determine the origin of suspected nuclear material. Each image is carefully studied by experts for classification of materials based on texture, shape, and size. Manually inspecting large image datasets takes enormous amounts of time. However, automatic classification of large image datasets is a challenging problem due to the complexity involved in choosing image features, the lack of training data available for effective machine learning methods, and the availability of user interfaces to parse through images. Therefore, a significant need exists for automated and semi-automated methods to help analysts perform accurate image classification in large image datasets. We present INStINCt, our Intelligent Signature Canvas, as a framework for quickly organizing image data in a web-based canvas framework. Images are partitioned using small sets of example images, chosen by users, and presented in an optimal layout based on features derived from convolutional neural networks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Spittler, T.E.; Sydnor, R.H.; Manson, M.W.
1990-01-01
The Loma Prieta earthquake of October 17, 1989 triggered landslides throughout the Santa Cruz Mountains in central California. The California Department of Conservation, Division of Mines and Geology (DMG) responded to a request for assistance from the County of Santa Cruz, Office of Emergency Services to evaluate the geologic hazard from major reactivated large landslides. DMG prepared a set of geologic maps showing the landslide features that resulted from the October 17 earthquake. The principal purposes of large-scale mapping of these landslides are: (1) to provide county officials with regional landslide information that can be used for timely recovery of damaged areas; (2) to identify disturbed ground which is potentially vulnerable to landslide movement during winter rains; (3) to provide county planning officials with timely geologic information that will be used for effective land-use decisions; (4) to document regional landslide features that may not otherwise be available for individual site reconstruction permits and for future development.
Automatic trajectory measurement of large numbers of crowded objects
NASA Astrophysics Data System (ADS)
Li, Hui; Liu, Ye; Chen, Yan Qiu
2013-06-01
Complex motion patterns of natural systems, such as fish schools, bird flocks, and cell groups, have attracted great attention from scientists for years. Trajectory measurement of individuals is vital for quantitative and high-throughput study of their collective behaviors. However, such data are rare, mainly due to the challenges of detecting and tracking large numbers of objects with similar visual features and frequent occlusions. We present an automatic and effective framework to measure trajectories of large numbers of crowded oval-shaped objects, such as fish and cells. We first use a novel dual ellipse locator to detect the coarse position of each individual and then propose a variance minimization active contour method to obtain optimal segmentation results. For tracking, the cost matrix for assignment between consecutive frames is learned via a random forest classifier using many spatial, texture, and shape features. The optimal trajectories are found for the whole image sequence by solving two linear assignment problems. We evaluate the proposed method on many challenging data sets.
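The tracking stage reduces to linear assignment between consecutive frames. A minimal sketch using SciPy's Hungarian solver, with plain Euclidean distance standing in for the paper's random-forest-learned cost:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def link_frames(prev_centroids, curr_centroids, max_dist=30.0):
    """Assign current-frame detections to previous-frame tracks by
    solving a linear assignment problem. Plain Euclidean distance
    stands in for the paper's random-forest-learned cost matrix."""
    cost = np.linalg.norm(
        prev_centroids[:, None, :] - curr_centroids[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    # reject implausible links so new tracks can be spawned instead
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
```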
True polar wander on Europa from global-scale small-circle depressions.
Schenk, Paul; Matsuyama, Isamu; Nimmo, Francis
2008-05-15
The tectonic patterns and stress history of Europa are exceedingly complex and many large-scale features remain unexplained. True polar wander, involving reorientation of Europa's floating outer ice shell about the tidal axis with Jupiter, has been proposed as a possible explanation for some of the features. This mechanism is possible if the icy shell is latitudinally variable in thickness and decoupled from the rocky interior. It would impose high stress levels on the shell, leading to predictable fracture patterns. No satisfactory match to global-scale features has hitherto been found for polar wander stress patterns. Here we describe broad arcuate troughs and depressions on Europa that do not fit other proposed stress mechanisms in their current position. Using imaging from three spacecraft, we have mapped two global-scale organized concentric antipodal sets of arcuate troughs up to hundreds of kilometres long and 300 m to approximately 1.5 km deep. An excellent match to these features is found with stresses caused by an episode of approximately 80 degrees true polar wander. These depressions also appear to be geographically related to other large-scale bright and dark lineaments, suggesting that many of Europa's tectonic patterns may also be related to true polar wander.
A Realistic Seizure Prediction Study Based on Multiclass SVM.
Direito, Bruno; Teixeira, César A; Sales, Francisco; Castelo-Branco, Miguel; Dourado, António
2017-05-01
A patient-specific algorithm for epileptic seizure prediction, based on multiclass support vector machines (SVM) and using multi-channel high-dimensional feature sets, is presented. The feature sets, combined with multiclass classification and post-processing schemes, aim at the generation of alarms and reduced influence of false positives. This study considers 216 patients from the European Epilepsy Database, and includes 185 patients with scalp EEG recordings and 31 with intracranial data. The strategy was tested over a total of 16,729.80 h of inter-ictal data, including 1206 seizures. We found an overall sensitivity of 38.47% and a false positive rate of 0.20 per hour. The performance of the method achieved statistical significance in 24 patients (11% of the patients). Despite the encouraging results previously reported on specific datasets, prospective demonstration on long-term EEG recordings has been limited. Our study presents a prospective analysis of a large, heterogeneous, multicentric dataset. The statistical framework, based on conservative assumptions, reflects a realistic approach compared to constrained datasets and/or in-sample evaluations. The improvement of these results, through the definition of an appropriate set of features able to improve the distinction between the pre-ictal and non-pre-ictal states and hence minimize the effect of confounding variables, remains a key aspect.
NASA Technical Reports Server (NTRS)
Toossi, Mostafa; Weisenburger, Richard; Hashemi-Kia, Mostafa
1993-01-01
This paper presents a summary of some of the work performed by McDonnell Douglas Helicopter Company under the NASA Langley-sponsored rotorcraft structural dynamics program known as DAMVIBS (Design Analysis Methods for VIBrationS). A set of guidelines applicable to dynamic modeling, analysis, testing, and correlation of both helicopter airframes and a large variety of structural finite element models is presented. Utilization of these guidelines and the key features of their application to vibration modeling of helicopter airframes are discussed. Correlation studies with the test data, together with the development and application of a set of efficient finite element model checkout procedures, are demonstrated on a large helicopter airframe finite element model. Finally, the lessons learned and the benefits resulting from this program are summarized.
ERIC Educational Resources Information Center
Lavonen, Jari; Juuti, Kalle; Meisalo, Veijo
2003-01-01
In this study we analyse how the experiences of chemistry teachers with the use of a Microcomputer-Based Laboratory (MBL), gathered by a Likert-scale instrument, can be utilized to develop the new package "Empirica 2000." We used exploratory factor analysis to identify the essential features in a large set of questionnaire data to see how…
NASA Astrophysics Data System (ADS)
Wang, Weibao; Overall, Gary; Riggs, Travis; Silveston-Keith, Rebecca; Whitney, Julie; Chiu, George; Allebach, Jan P.
2013-01-01
Assessment of macro-uniformity is a capability that is important for the development and manufacture of printer products. Our goal is to develop a metric that will predict macro-uniformity, as judged by human subjects, by scanning and analyzing printed pages. We consider two different machine learning frameworks for the metric: linear regression and the support vector machine. We have implemented the image quality ruler, based on the recommendations of the INCITS W1.1 macro-uniformity team. Using 12 subjects at Purdue University and 20 subjects at Lexmark, evenly balanced with respect to gender, we conducted subjective evaluations with a set of 35 uniform b/w prints from seven different printers with five levels of tint coverage. Our results suggest that the image quality ruler method provides a reliable means to assess macro-uniformity. We then defined and implemented separate features to measure graininess, mottle, large-area variation, jitter, and large-scale non-uniformity. The algorithms that we used are largely based on ISO image quality standards. Finally, we used these features computed for a set of test pages, together with the subjects' image quality ruler assessments of these pages, to train the two predictors: one based on linear regression and the other on the support vector machine (SVM). Using five-fold cross-validation, we confirmed the efficacy of our predictors.
Searching for substructures in fragment spaces.
Ehrlich, Hans-Christian; Volkamer, Andrea; Rarey, Matthias
2012-12-21
A common task in drug development is the selection of compounds fulfilling specific structural features from a large data pool. While several methods that iteratively search through such data sets exist, their application is limited compared to the infinite character of molecular space. The introduction of the concept of fragment spaces (FSs), which are composed of molecular fragments and their connection rules, made the representation of large combinatorial data sets feasible. At the same time, search algorithms face the problem of structural features spanning over multiple fragments. Due to the combinatorial nature of FSs, an enumeration of all products is impossible. In order to overcome these time and storage issues, we present a method that is able to find substructures in FSs without explicit product enumeration. This is accomplished by splitting substructures into subsubstructures and mapping them onto fragments with respect to fragment connectivity rules. The method has been evaluated on three different drug discovery scenarios considering the exploration of a molecule class, the elaboration of decoration patterns for a molecular core, and the exhaustive query for peptides in FSs. FSs can be searched in seconds, and found products contain novel compounds not present in the PubChem database which may serve as hints for new lead structures.
Photometric Supernova Classification with Machine Learning
NASA Astrophysics Data System (ADS)
Lochner, Michelle; McEwen, Jason D.; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.
2016-08-01
Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
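The final stage of such a pipeline, boosted decision trees scored by ROC AUC, can be sketched as follows; the estimator settings are representative assumptions, not the paper's exact configuration:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def classify_and_score(features, is_type_ia):
    """features: per-supernova vectors (e.g. SALT2 fit parameters or
    wavelet coefficients); is_type_ia: binary labels. The estimator
    settings are representative, not the paper's exact configuration."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, is_type_ia, test_size=0.3, stratify=is_type_ia)
    bdt = GradientBoostingClassifier(n_estimators=200).fit(X_tr, y_tr)
    return roc_auc_score(y_te, bdt.predict_proba(X_te)[:, 1])
```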
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mackenzie, Cristóbal; Pichara, Karim; Protopapas, Pavlos
The success of automatic classification of variable stars depends strongly on the lightcurve representation. Usually, lightcurves are represented as a vector of many descriptors designed by astronomers, called features. These descriptors are expensive in terms of computing, require substantial research effort to develop, and do not guarantee a good classification. Today, lightcurve representation is not entirely automatic; algorithms must be designed and manually tuned up for every survey. The amounts of data that will be generated in the future mean astronomers must develop scalable and automated analysis pipelines. In this work we present a feature learning algorithm designed for variable objects. Our method works by extracting a large number of lightcurve subsequences from a given set, which are then clustered to find common local patterns in the time series. Representatives of these common patterns are then used to transform lightcurves of a labeled set into a new representation that can be used to train a classifier. The proposed algorithm learns the features from both labeled and unlabeled lightcurves, overcoming the bias of using only labeled data. We test our method on data sets from the Massive Compact Halo Object survey and the Optical Gravitational Lensing Experiment; the results show that our classification performance is as good as, and in some cases better than, the performance achieved using traditional statistical features, while the computational cost is significantly lower. With these promising results, we believe that our method constitutes a significant step toward the automation of the lightcurve classification pipeline.
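The core of the feature-learning idea, clustering lightcurve subsequences and re-describing each lightcurve by the learned local patterns, can be sketched as a bag-of-patterns transform. Window length, step, and cluster count below are assumed values, and the sketch treats lightcurves as evenly sampled arrays, ignoring the irregular sampling of real surveys:

```python
import numpy as np
from sklearn.cluster import KMeans

def subsequences(lc, w=20, step=5):
    # z-normalized sliding windows from one lightcurve (assumed sizes)
    subs = np.array([lc[i:i + w] for i in range(0, len(lc) - w + 1, step)])
    return (subs - subs.mean(1, keepdims=True)) / (subs.std(1, keepdims=True) + 1e-8)

def learn_patterns(lightcurves, n_patterns=100):
    # cluster pooled subsequences to find common local patterns
    pool = np.vstack([subsequences(lc) for lc in lightcurves])
    return KMeans(n_clusters=n_patterns, n_init=3).fit(pool)

def represent(lc, km):
    """Bag-of-patterns vector: frequency of each learned local pattern
    in this lightcurve; this representation feeds a standard classifier."""
    ids = km.predict(subsequences(lc))
    return np.bincount(ids, minlength=km.n_clusters) / len(ids)
```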
Waldner, François; Hansen, Matthew C; Potapov, Peter V; Löw, Fabian; Newby, Terence; Ferreira, Stefanus; Defourny, Pierre
2017-01-01
The lack of sufficient ground truth data has always constrained supervised learning, thereby hindering the generation of up-to-date satellite-derived thematic maps. This is all the more true for those applications requiring frequent updates over large areas, such as cropland mapping. Therefore, we present a method enabling the automated production of spatially consistent cropland maps at the national scale, based on spectral-temporal features and outdated land cover information. Following an unsupervised approach, this method extracts reliable calibration pixels based on their labels in the outdated map and their spectral signatures. To ensure spatial consistency and coherence in the map, we first propose to generate seamless input images by normalizing the time series and deriving spectral-temporal features that target salient cropland characteristics. Second, we reduce the spatial variability of the class signatures by stratifying the country and by classifying each stratum independently. Finally, we remove speckle with a weighted majority filter accounting for per-pixel classification confidence. Capitalizing on a wall-to-wall validation data set, the method was tested in South Africa using a 16-year-old land cover map and multi-sensor Landsat time series. The overall accuracy of the resulting cropland map reached 92%. A spatially explicit validation revealed large variations across the country and suggests that intensive grain-growing areas were better characterized than smallholder farming systems. Informative features in the classification process vary from one stratum to another, but features targeting the minimum of vegetation as well as short-wave infrared features were consistently important throughout the country. Overall, the approach showed potential for routinely delivering consistent cropland maps over large areas as required for operational crop monitoring.
Effective traffic features selection algorithm for cyber-attacks samples
NASA Astrophysics Data System (ADS)
Li, Yihong; Liu, Fangzheng; Du, Zhenyu
2018-05-01
By studying defense schemes against network attacks, this paper proposes an effective traffic feature selection algorithm based on k-means++ clustering to deal with the high dimensionality of the traffic features extracted from cyber-attack samples. First, the algorithm divides the original feature set into an attack traffic feature set and a background traffic feature set by clustering. Then, it calculates the variation in clustering performance after removing a certain feature. Finally, the degree of distinctiveness of the feature vector is evaluated according to this result; an effective feature is one whose degree of distinctiveness exceeds a set threshold. The purpose of this paper is to select the effective features from the extracted original feature set. In this way, the dimensionality of the features is reduced, which reduces the space-time overhead of subsequent detection. The experimental results show that the proposed algorithm is feasible and has advantages over other selection algorithms.
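One reading of this procedure is sketched below: features are clustered into two groups with k-means++, and each feature is scored by the change in clustering quality when it is removed. Silhouette is used as the quality measure and the threshold is illustrative; the paper's exact clustering target and criterion may differ:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def effective_features(X, threshold=0.01):
    """X: samples x features matrix of traffic statistics. Features
    (columns) are clustered into two groups (attack vs. background)
    with k-means++, and each feature is scored by how much the
    clustering quality changes when it is removed."""
    F = X.T                                   # one row per feature

    def quality(rows):
        labels = KMeans(2, init="k-means++", n_init=10).fit_predict(rows)
        return silhouette_score(rows, labels)

    base = quality(F)
    effective = []
    for j in range(F.shape[0]):
        delta = abs(base - quality(np.delete(F, j, axis=0)))
        if delta > threshold:                 # removing j changes the split
            effective.append(j)
    return effective
```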
Mathieson, Luke; Mendes, Alexandre; Marsden, John; Pond, Jeffrey; Moscato, Pablo
2017-01-01
This chapter introduces a new method for knowledge extraction from databases for the purpose of finding a discriminative set of features that is also a robust set for within-class classification. Our method is generic, and we introduce it here in the field of breast cancer diagnosis from digital mammography data. The mathematical formalism is based on a generalization of the k-Feature Set problem called the (α, β)-k-Feature Set problem, introduced by Cotta and Moscato (J Comput Syst Sci 67(4):686-690, 2003). The method proceeds in two steps: first, an optimal (α, β)-k-feature set of minimum cardinality is identified, and then a set of classification rules using these features is obtained. We obtain the (α, β)-k-feature set in two phases: first, a series of extremely powerful reduction techniques, which do not lose the optimal solution, is employed; second, a metaheuristic search identifies the remaining features to be considered or disregarded. Two algorithms were tested with a public domain digital mammography dataset composed of 71 malignant and 75 benign cases. Based on the results provided by the algorithms, we obtain classification rules that employ only a subset of these features.
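For intuition, the (α, β)-k-Feature Set requirement can be attacked with a simple greedy cover, shown below for binary feature matrices. This sketch replaces the chapter's exact reductions and metaheuristic search with a plain greedy heuristic and gives no optimality guarantee:

```python
import itertools
import numpy as np

def greedy_ab_k(X, y, alpha=1, beta=1):
    """Greedy sketch of the (alpha, beta)-k-Feature Set problem: pick
    features so every between-class pair differs in >= alpha chosen
    features and every within-class pair agrees in >= beta of them.
    X: binary samples x features matrix; y: class labels."""
    pairs = list(itertools.combinations(range(len(y)), 2))
    need = {p: (alpha if y[p[0]] != y[p[1]] else beta) for p in pairs}
    covers = {p: set(np.flatnonzero((X[p[0]] != X[p[1]])
                                    if y[p[0]] != y[p[1]]
                                    else (X[p[0]] == X[p[1]]))) for p in pairs}
    chosen = []
    while any(v > 0 for v in need.values()):
        gains = np.zeros(X.shape[1])
        for p, feats in covers.items():
            if need[p] > 0:
                gains[list(feats)] += 1
        gains[chosen] = -1               # never re-pick a feature
        j = int(np.argmax(gains))
        if gains[j] <= 0:
            return None                  # infeasible with these features
        chosen.append(j)
        for p, feats in covers.items():
            if j in feats:
                need[p] -= 1
    return chosen
```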
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rohrer, Brandon Robinson
2011-09-01
Events of interest to data analysts are sometimes difficult to characterize in detail. Rather, they consist of anomalies: events that are unpredicted, unusual, or otherwise incongruent. The purpose of this LDRD was to test the hypothesis that a biologically-inspired anomaly detection algorithm could be used to detect contextual, multi-modal anomalies. There currently is no other solution to this problem, but the existence of a solution would have a great national security impact. The technical focus of this research was the application of a brain-emulating cognition and control architecture (BECCA) to the problem of anomaly detection. One aspect of BECCA in particular was discovered to be critical to improved anomaly detection capabilities: its feature creator. During the course of this project the feature creator was developed and tested against multiple data types. Development direction was drawn from psychological and neurophysiological measurements. Major technical achievements include the creation of hierarchical feature sets from both audio and imagery data.
Deep SOMs for automated feature extraction and classification from big data streaming
NASA Astrophysics Data System (ADS)
Sakkari, Mohamed; Ejbali, Ridha; Zaied, Mourad
2017-03-01
In this paper, we propose a deep self-organizing map model (Deep-SOMs) for automated feature extraction and learning from streaming big data, taking advantage of the Spark framework for real-time stream processing and highly parallel data processing. The deep SOM architecture is based on the notion of abstraction (patterns are automatically extracted from the raw data, from less to more abstract). The proposed model consists of three hidden self-organizing layers, an input layer, and an output layer. Each layer is made up of a multitude of SOMs, each map focusing only on a local sub-region of the input image. Each layer then aggregates the local information to generate more global information in the next higher layer. The proposed Deep-SOMs model is unique in terms of its layer architecture and its SOM sampling and learning methods. During the learning stage we use a set of unsupervised SOMs for feature extraction. We validate the effectiveness of our approach on large data sets such as the Leukemia and SRBCT data sets. Comparison results have shown that the Deep-SOMs model performs better than many existing algorithms for image classification.
PCAN: Probabilistic Correlation Analysis of Two Non-normal Data Sets
Zoh, Roger S.; Mallick, Bani; Ivanov, Ivan; Baladandayuthapani, Veera; Manyam, Ganiraju; Chapkin, Robert S.; Lampe, Johanna W.; Carroll, Raymond J.
2016-01-01
Most cancer research now involves one or more assays profiling various biological molecules, e.g., messenger RNA and micro RNA, in samples collected on the same individuals. The main interest with these genomic data sets lies in the identification of a subset of features that are active in explaining the dependence between platforms. To quantify the strength of the dependency between two variables, correlation is often preferred. However, expression data obtained from next-generation sequencing platforms are integer with very low counts for some important features. In this case, the sample Pearson correlation is not a valid estimate of the true correlation matrix, because the sample correlation estimate between two features/variables with low counts will often be close to zero, even when the natural parameters of the Poisson distribution are, in actuality, highly correlated. We propose a model-based approach to correlation estimation between two non-normal data sets, via a method we call Probabilistic Correlations ANalysis, or PCAN. PCAN takes into consideration the distributional assumption about both data sets and suggests that correlations estimated at the model natural parameter level are more appropriate than correlations estimated directly on the observed data. We demonstrate through a simulation study that PCAN outperforms other standard approaches in estimating the true correlation between the natural parameters. We then apply PCAN to the joint analysis of a microRNA (miRNA) and a messenger RNA (mRNA) expression data set from a squamous cell lung cancer study, finding a large number of negative correlation pairs when compared to the standard approaches. PMID:27037601
PCAN: Probabilistic correlation analysis of two non-normal data sets.
Zoh, Roger S; Mallick, Bani; Ivanov, Ivan; Baladandayuthapani, Veera; Manyam, Ganiraju; Chapkin, Robert S; Lampe, Johanna W; Carroll, Raymond J
2016-12-01
Most cancer research now involves one or more assays profiling various biological molecules, e.g., messenger RNA and micro RNA, in samples collected on the same individuals. The main interest with these genomic data sets lies in the identification of a subset of features that are active in explaining the dependence between platforms. To quantify the strength of the dependency between two variables, correlation is often preferred. However, expression data obtained from next-generation sequencing platforms are integer with very low counts for some important features. In this case, the sample Pearson correlation is not a valid estimate of the true correlation matrix, because the sample correlation estimate between two features/variables with low counts will often be close to zero, even when the natural parameters of the Poisson distribution are, in actuality, highly correlated. We propose a model-based approach to correlation estimation between two non-normal data sets, via a method we call Probabilistic Correlations ANalysis, or PCAN. PCAN takes into consideration the distributional assumption about both data sets and suggests that correlations estimated at the model natural parameter level are more appropriate than correlations estimated directly on the observed data. We demonstrate through a simulation study that PCAN outperforms other standard approaches in estimating the true correlation between the natural parameters. We then apply PCAN to the joint analysis of a microRNA (miRNA) and a messenger RNA (mRNA) expression data set from a squamous cell lung cancer study, finding a large number of negative correlation pairs when compared to the standard approaches. © 2016, The International Biometric Society.
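The attenuation problem that motivates PCAN is easy to reproduce. The simulation below (illustrative only; it is not the PCAN estimator) draws correlated natural parameters, generates low-count Poisson data from them, and shows how Pearson correlation on the counts understates the latent correlation:

```python
import numpy as np

rng = np.random.default_rng(42)
n, rho = 500, 0.8

# Correlated natural parameters (log-means) for two features
theta = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
lam = np.exp(theta - 3.0)          # shift down to force very low counts

counts = rng.poisson(lam)          # what the sequencing platform reports

r_counts = np.corrcoef(counts.T)[0, 1]   # naive Pearson on observed counts
r_theta = np.corrcoef(theta.T)[0, 1]     # correlation at the parameter level
print(f"latent corr ~ {rho}; Pearson on counts = {r_counts:.2f}; "
      f"on natural parameters = {r_theta:.2f}")
```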
Dimitriadis, S I; Liparas, Dimitris; Tsolaki, Magda N
2018-05-15
In the era of computer-assisted diagnostic tools for various brain diseases, Alzheimer's disease (AD) covers a large percentage of neuroimaging research, with the main scope being its use in daily practice. However, there has been no study attempting to simultaneously discriminate among healthy controls (HC), early mild cognitive impairment (MCI), late MCI (cMCI), and stable AD using features derived from a single modality, namely MRI. Based on preprocessed MRI images from the organizers of a neuroimaging challenge, we attempted to quantify the prediction accuracy of multiple morphological MRI features to simultaneously discriminate among HC, MCI, cMCI, and AD. We explored the efficacy of a novel scheme that includes multiple feature selections via Random Forest from subsets of the whole set of features (e.g., whole set, left/right hemisphere, etc.), Random Forest classification using a fusion approach, and ensemble classification via majority voting. From the ADNI database, 60 HC, 60 MCI, 60 cMCI, and 60 AD subjects were used as a training set with known labels. An extra dataset of 160 subjects (HC: 40, MCI: 40, cMCI: 40, AD: 40) was used as an external blind validation dataset to evaluate the proposed machine learning scheme. On this blind dataset, we achieved a four-class classification accuracy of 61.9% by combining MRI-based features with a Random Forest-based ensemble strategy. We achieved the best classification accuracy of all teams that participated in this neuroimaging competition. The results demonstrate the effectiveness of the proposed scheme in simultaneously discriminating among four groups using morphological MRI features, for the first time in the literature. Hence, the proposed machine learning scheme can be used to define single- and multi-modal biomarkers for AD. Copyright © 2017 Elsevier B.V. All rights reserved.
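A simplified rendering of the fusion idea, one Random Forest per morphological feature subset with majority voting across forests, is sketched below; subset definitions and forest sizes are assumptions, and class labels are assumed to be encoded as integers 0-3:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def ensemble_predict(X_train, y_train, X_test, subsets):
    """subsets: dict mapping a name to column indices of one
    morphological feature subset (e.g. whole set, left hemisphere,
    right hemisphere). One forest per subset, fused by majority vote.
    Labels are assumed encoded 0..3 for HC/MCI/cMCI/AD."""
    votes = []
    for cols in subsets.values():
        rf = RandomForestClassifier(n_estimators=500, random_state=0)
        rf.fit(X_train[:, cols], y_train)
        votes.append(rf.predict(X_test[:, cols]))
    votes = np.asarray(votes, dtype=int)       # forests x test subjects
    return np.array([np.bincount(col, minlength=4).argmax()
                     for col in votes.T])      # majority vote per subject
```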
NASA Astrophysics Data System (ADS)
Marble, Jay A.; Gorman, John D.
1999-08-01
A feature-based approach is taken to reduce the occurrence of false alarms in foliage-penetrating, ultra-wideband, synthetic aperture radar data. A set of 'generic' features is defined based on target size, shape, and pixel intensity. A second set of features is defined that contains the generic features combined with features based on scattering phenomenology. Each set is combined using a quadratic polynomial discriminant (QPD), and performance is characterized by generating a receiver operating characteristic (ROC) curve. Results show that the feature set containing phenomenological features improves performance against both broadside and end-on targets. The improvement for end-on targets, however, is especially pronounced.
Mass wasting features in Juventae Chasma, Mars
NASA Astrophysics Data System (ADS)
Sarkar, Ranjan; Singh, Pragya; Porwal, Alok; Ganesh, Indujaa
2016-07-01
Introduction: We report mass-wasting features preserved as debris aprons in Juventae Chasma. Diverse lines of evidence and associated geomorphological features indicate that fluidized ice or water within the wall rocks of the chasma could be responsible for mobilizing the debris. Description: The distinctive features of the landslides in Juventae Chasma are: (1) lack of a well-defined crown or a clear-cut section at their point of origin and, instead, the presence of amphitheatre-headed tributary canyons; (2) absence of slump blocks; (3) overlapping of debris aprons; (4) a variety of surface textures, from fresh and grooved to degraded and chaotic; (5) rounded lobes of debris aprons; (6) large variation of sizes, from small lumps (~0.52 m2) to large tongue-shaped ones (~80 m2); (7) smaller average size of landslides compared to other chasmas; and (8) occasional preservation of fresh surficial features indicating recent emplacement. Discussion: Amphitheatre-headed tributary canyons, which are formed by groundwater sapping, indicate that the same process was responsible for wall-section collapse, although a structural control cannot be completely ruled out. The emplacement of the mass-wasting features preferentially at the mouths of amphitheatre-headed tributary canyons, along with the rounded flow fronts of the debris, suggests fluids may have played a vital role in their emplacement. The mass-wasting features in Juventae Chasma are unique compared to other landslides in Valles Marineris despite commonalities such as the radial furrows, fan-shaped outlines, overlapping aprons, and overtopped obstacles. The unique set of features and close association with amphitheatre-headed tributary canyons imply that the trigger of the landslides was not structural or tectonic but possibly weakness imparted by the presence of water or ice in the pore spaces of the wall. Craters with fluidized ejecta blankets and scalloped depressions in the surrounding plateau also support this possibility. Depending on the amounts of fluids involved at the time of emplacement, these mass movements may also qualify as debris flows. The role of fluids in the Valles Marineris landslides is still debated; however, in the Juventae Chasma landslides we see unique features which set them apart from other landslides in Valles Marineris. Further study is required to fully investigate the mechanism of emplacement of these debris aprons.
Underwater target classification using wavelet packets and neural networks.
Azimi-Sadjadi, M R; Yao, D; Huang, Q; Dobeck, G J
2000-01-01
In this paper, a new subband-based classification scheme is developed for classifying underwater mines and mine-like targets from the acoustic backscattered signals. The system consists of a feature extractor using wavelet packets in conjunction with linear predictive coding (LPC), a feature selection scheme, and a backpropagation neural-network classifier. The data set used for this study consists of the backscattered signals from six different objects: two mine-like targets and four nontargets for several aspect angles. Simulation results on ten different noisy realizations and for signal-to-noise ratio (SNR) of 12 dB are presented. The receiver operating characteristic (ROC) curve of the classifier generated based on these results demonstrated excellent classification performance of the system. The generalization ability of the trained network was demonstrated by computing the error and classification rate statistics on a large data set. A multiaspect fusion scheme was also adopted in order to further improve the classification performance.
A Temporal Pattern Mining Approach for Classifying Electronic Health Record Data
Batal, Iyad; Valizadegan, Hamed; Cooper, Gregory F.; Hauskrecht, Milos
2013-01-01
We study the problem of learning classification models from complex multivariate temporal data encountered in electronic health record systems. The challenge is to define a good set of features that are able to represent well the temporal aspect of the data. Our method relies on temporal abstractions and temporal pattern mining to extract the classification features. Temporal pattern mining usually returns a large number of temporal patterns, most of which may be irrelevant to the classification task. To address this problem, we present the Minimal Predictive Temporal Patterns framework to generate a small set of predictive and non-spurious patterns. We apply our approach to the real-world clinical task of predicting patients who are at risk of developing heparin induced thrombocytopenia. The results demonstrate the benefit of our approach in efficiently learning accurate classifiers, which is a key step for developing intelligent clinical monitoring systems. PMID:25309815
Response comment: Carbon sequestration on Mars
Edwards, Christopher; Ehlmann, Bethany L.
2016-01-01
Martian atmospheric pressure has important implications for the past and present habitability of the planet, including the timing and causes of environmental change. The ancient Martian surface is strewn with evidence for early water bound in minerals (e.g., Ehlmann and Edwards, 2014) and recorded in surface features such as large catastrophically created outflow channels (e.g., Carr, 1979), valley networks (Hynek et al., 2010; Irwin et al., 2005), and crater lakes (e.g., Fassett and Head, 2008). Using orbital spectral data sets coupled with geologic maps and a set of numerical spectral analysis models, Edwards and Ehlmann (2015) constrained the amount of atmospheric sequestration in early Martian rocks and found that the majority of this sequestration occurred prior to the formation of the early Hesperian/late Noachian valley networks (Fassett and Head, 2011; Hynek et al., 2010), thus implying the atmosphere was already thin by the time these surface-water-related features were formed.
Two-photon calcium imaging during fictive navigation in virtual environments.
Ahrens, Misha B; Huang, Kuo Hua; Narayan, Sujatha; Mensh, Brett D; Engert, Florian
2013-01-01
A full understanding of nervous system function requires recording from large populations of neurons during naturalistic behaviors. Here we enable paralyzed larval zebrafish to fictively navigate two-dimensional virtual environments while we record optically from many neurons with two-photon imaging. Electrical recordings from motor nerves in the tail are decoded into intended forward swims and turns, which are used to update a virtual environment displayed underneath the fish. Several behavioral features, such as turning responses to whole-field motion and dark avoidance, are well replicated in this virtual setting. We readily observed neuronal populations in the hindbrain with laterally selective responses that correlated with right or left optomotor behavior. We also observed neurons in the habenula, pallium, and midbrain with response properties specific to environmental features. Beyond single-cell correlations, the classification of network activity in such virtual settings promises to reveal principles of brainwide neural dynamics during behavior.
A Semisupervised Support Vector Machines Algorithm for BCI Systems
Qin, Jianzhao; Li, Yuanqing; Sun, Wei
2007-01-01
As an emerging technology, brain-computer interfaces (BCIs) bring us new communication interfaces which translate brain activities into control signals for devices like computers, robots, and so forth. In this study, we propose a semisupervised support vector machine (SVM) algorithm for brain-computer interface (BCI) systems, aiming at reducing the time-consuming training process. In this algorithm, we apply a semisupervised SVM for translating the features extracted from the electrical recordings of brain into control signals. This SVM classifier is built from a small labeled data set and a large unlabeled data set. Meanwhile, to reduce the time for training semisupervised SVM, we propose a batch-mode incremental learning method, which can also be easily applied to the online BCI systems. Additionally, it is suggested in many studies that common spatial pattern (CSP) is very effective in discriminating two different brain states. However, CSP needs a sufficient labeled data set. In order to overcome the drawback of CSP, we suggest a two-stage feature extraction method for the semisupervised learning algorithm. We apply our algorithm to two BCI experimental data sets. The offline data analysis results demonstrate the effectiveness of our algorithm. PMID:18368141
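A minimal stand-in for the semisupervised step is scikit-learn's self-training wrapper around a probabilistic SVM; this is not the paper's batch-mode incremental algorithm, only a sketch of the same small-labeled/large-unlabeled training setup:

```python
import numpy as np
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

def train_bci_classifier(X_labeled, y_labeled, X_unlabeled):
    """Self-training with a probabilistic SVM, standing in for the
    paper's batch-mode incremental semisupervised SVM. Unlabeled
    trials are marked with -1, per scikit-learn's convention."""
    X = np.vstack([X_labeled, X_unlabeled])
    y = np.concatenate([y_labeled, -np.ones(len(X_unlabeled), dtype=int)])
    model = SelfTrainingClassifier(SVC(probability=True), threshold=0.9)
    return model.fit(X, y)
```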
Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights.
Pasolli, Edoardo; Truong, Duy Tin; Malik, Faizan; Waldron, Levi; Segata, Nicola
2016-07-01
Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the "healthy" microbiome can be considered a first step toward defining general microbial dysbiosis. The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml.
NASA Astrophysics Data System (ADS)
Kilpatrick, Brian; Cubillos, Patricio; Bruno, Giovanni; Lewis, Nikole K.; Stevenson, Kevin B.; Wakeford, Hannah; Blecic, Jasmina; Burrows, Adam Seth; Deming, Drake; Heng, Kevin; Line, Michael R.; Madhusudhan, Nikku; Morley, Caroline; Waldmann, Ingo P.; Transiting Exoplanet Early Release Science Community
2017-06-01
We present observations from the Hubble Space Telescope (HST) program "A Preparatory Program to Identify the Single Best Transiting Exoplanet for JWST Early Release Science" of WASP-63b, one of the community targets proposed for the James Webb Space Telescope (JWST) Early Release Science (ERS) program. A large collaboration of transiting exoplanet scientists identified a set of "community targets" which meet a certain set of criteria for ecliptic latitude, period, host star brightness, well-constrained orbital parameters, and strength of spectroscopic features. WASP-63b was one of the targets identified as a potential candidate for the ERS program. It is presented as an inflated planet with a large signal. It will be accessible to JWST approximately six months after the planned start of Cycle 1/ERS in April 2019, making it an ideal candidate should there be any delays in the JWST timetable. Here, we observe WASP-63b to evaluate its suitability as the best target to test the capabilities of JWST. Ideally, a clear atmosphere will be best suited for benchmarking the instruments' ability to detect spectroscopic features. We can use the strength of the water absorption feature at 1.4 μm as a way to determine the presence of obscuring clouds/hazes. The results of atmospheric retrieval are presented along with a discussion of the suitability of WASP-63b as the best target to be observed during the ERS program.
Detection of explosive cough events in audio recordings by internal sound analysis.
Rocha, B M; Mendes, L; Couceiro, R; Henriques, J; Carvalho, P; Paiva, R P
2017-07-01
We present a new method for the discrimination of explosive cough events, which is based on a combination of spectral content descriptors and pitch-related features. After the removal of near-silent segments, a vector of event boundaries is obtained and a proposed set of 9 features is extracted for each event. Two data sets, recorded using electronic stethoscopes and comprising a total of 46 healthy subjects and 13 patients, were employed to evaluate the method. The proposed feature set is compared to three other sets of descriptors: a baseline, a combination of both sets, and an automatic selection of the best 10 features from both sets. The combined feature set yields good results on the cross-validated database, attaining a sensitivity of 92.3±2.3% and a specificity of 84.7±3.3%. Besides, this feature set seems to generalize well when it is trained on a small data set of patients, with a variety of respiratory and cardiovascular diseases, and tested on a bigger data set of mostly healthy subjects: a sensitivity of 93.4% and a specificity of 83.4% are achieved in those conditions. These results demonstrate that complementing the proposed feature set with a baseline set is a promising approach.
Robust k-mer frequency estimation using gapped k-mers
Ghandi, Mahmoud; Mohammad-Noori, Morteza
2013-01-01
Oligomers of fixed length, k, commonly known as k-mers, are often used as fundamental elements in the description of DNA sequence features of diverse biological function, or as intermediate elements in the construction of more complex descriptors of sequence features such as position weight matrices. k-mers are very useful as general sequence features because they constitute a complete and unbiased feature set, and do not require parameterization based on incomplete knowledge of biological mechanisms. However, a fundamental limitation in the use of k-mers as sequence features is that as k is increased, larger spatial correlations in DNA sequence elements can be described, but the frequency of observing any specific k-mer becomes very small, and rapidly approaches a sparse matrix of binary counts. Thus any statistical learning approach using k-mers will be susceptible to noisy estimation of k-mer frequencies once k becomes large. Because all molecular DNA interactions have limited spatial extent, gapped k-mers often carry the relevant biological signal. Here we use gapped k-mer counts to more robustly estimate the ungapped k-mer frequencies, by deriving an equation for the minimum norm estimate of k-mer frequencies given an observed set of gapped k-mer frequencies. We demonstrate that this approach provides a more accurate estimate of the k-mer frequencies in real biological sequences using a sample of CTCF binding sites in the human genome. PMID:23861010
Robust k-mer frequency estimation using gapped k-mers.
Ghandi, Mahmoud; Mohammad-Noori, Morteza; Beer, Michael A
2014-08-01
Oligomers of fixed length, k, commonly known as k-mers, are often used as fundamental elements in the description of DNA sequence features of diverse biological function, or as intermediate elements in the construction of more complex descriptors of sequence features such as position weight matrices. k-mers are very useful as general sequence features because they constitute a complete and unbiased feature set, and do not require parameterization based on incomplete knowledge of biological mechanisms. However, a fundamental limitation in the use of k-mers as sequence features is that as k is increased, larger spatial correlations in DNA sequence elements can be described, but the frequency of observing any specific k-mer becomes very small, and rapidly approaches a sparse matrix of binary counts. Thus any statistical learning approach using k-mers will be susceptible to noisy estimation of k-mer frequencies once k becomes large. Because all molecular DNA interactions have limited spatial extent, gapped k-mers often carry the relevant biological signal. Here we use gapped k-mer counts to more robustly estimate the ungapped k-mer frequencies, by deriving an equation for the minimum norm estimate of k-mer frequencies given an observed set of gapped k-mer frequencies. We demonstrate that this approach provides a more accurate estimate of the k-mer frequencies in real biological sequences using a sample of CTCF binding sites in the human genome.
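At small scale the minimum-norm estimator can be written down directly: build the 0/1 matrix mapping full k-mer frequencies to gapped k-mer frequencies and apply its pseudoinverse. The paper derives a closed form that avoids constructing this matrix explicitly; the toy sizes below are for illustration only:

```python
import itertools
import numpy as np

ALPH = "ACGT"
L, K = 4, 2        # toy sizes: total length L, K informative positions

lmers = ["".join(p) for p in itertools.product(ALPH, repeat=L)]
gapped = [(pos, letters)
          for pos in itertools.combinations(range(L), K)
          for letters in itertools.product(ALPH, repeat=K)]

# A[i, j] = 1 if full l-mer j is consistent with gapped k-mer i
A = np.zeros((len(gapped), len(lmers)))
for i, (pos, letters) in enumerate(gapped):
    for j, w in enumerate(lmers):
        if all(w[p] == c for p, c in zip(pos, letters)):
            A[i, j] = 1.0

def min_norm_kmer_estimate(gapped_counts):
    """Minimum-norm solution f of A f = g via the pseudoinverse;
    gapped_counts must be ordered as in the `gapped` list above."""
    return np.linalg.pinv(A) @ np.asarray(gapped_counts)
```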
Anderson, J R; Mohammed, S; Grimm, B; Jones, B W; Koshevoy, P; Tasdizen, T; Whitaker, R; Marc, R E
2011-01-01
Modern microscope automation permits the collection of vast amounts of continuous anatomical imagery in both two and three dimensions. These large data sets present significant challenges for data storage, access, viewing, annotation and analysis. The cost and overhead of collecting and storing the data can be extremely high. Large data sets quickly exceed an individual's capability for timely analysis and present challenges in efficiently applying transforms, if needed. Finally annotated anatomical data sets can represent a significant investment of resources and should be easily accessible to the scientific community. The Viking application was our solution created to view and annotate a 16.5 TB ultrastructural retinal connectome volume and we demonstrate its utility in reconstructing neural networks for a distinctive retinal amacrine cell class. Viking has several key features. (1) It works over the internet using HTTP and supports many concurrent users limited only by hardware. (2) It supports a multi-user, collaborative annotation strategy. (3) It cleanly demarcates viewing and analysis from data collection and hosting. (4) It is capable of applying transformations in real-time. (5) It has an easily extensible user interface, allowing addition of specialized modules without rewriting the viewer. © 2010 The Authors Journal of Microscopy © 2010 The Royal Microscopical Society.
NASA Astrophysics Data System (ADS)
Revollo Sarmiento, G. N.; Cipolletti, M. P.; Perillo, M. M.; Delrieux, C. A.; Perillo, Gerardo M. E.
2016-03-01
Tidal flats generally exhibit ponds of diverse size, shape, orientation, and origin. Studying the genesis, evolution, stability, and erosive mechanisms of these geographic features is critical to understanding the dynamics of coastal wetlands. However, monitoring these locations through direct access is hard and expensive, not always feasible, and environmentally damaging. Processing remote sensing images is a natural alternative for the extraction of qualitative and quantitative data due to its non-invasive nature. In this work, a robust methodology for automatic classification of ponds and tidal creeks in tidal flats using Google Earth images is proposed. The applicability of our method is tested in nine zones with different morphological settings. Each zone is processed by a segmentation stage, where ponds and tidal creeks are identified. Next, each geographical feature is measured and a set of shape descriptors is calculated. This dataset, together with an a priori classification of each geographical feature, is used to define a regression model, which allows extensive automatic classification of large volumes of data, discriminating ponds and tidal creeks from various other geographical features. In all cases, we identified and automatically classified different geographic features with an average accuracy over 90% (89.7% in the worst case and 99.4% in the best case). These results show the feasibility of using freely available Google Earth imagery for the automatic identification and classification of complex geographical features. The presented methodology may also be easily applied in other wetlands of the world, perhaps employing other remote sensing imagery.
Form drag in rivers due to small-scale natural topographic features: 2. Irregular sequences
Kean, J.W.; Smith, J.D.
2006-01-01
The size, shape, and spacing of small-scale topographic features found on the boundaries of natural streams, rivers, and floodplains can be quite variable. Consequently, a procedure for determining the form drag on irregular sequences of different-sized topographic features is essential for calculating near-boundary flows and sediment transport. A method for carrying out such calculations is developed in this paper. This method builds on the work of Kean and Smith (2006), which describes the flow field for the simpler case of a regular sequence of identical topographic features. Both approaches model topographic features as two-dimensional elements with Gaussian-shaped cross sections defined in terms of three parameters. Field measurements of bank topography are used to show that (1) the magnitude of these shape parameters can vary greatly between adjacent topographic features and (2) the variability of these shape parameters follows a lognormal distribution. Simulations using an irregular set of topographic roughness elements show that the drag on an individual element is primarily controlled by the size and shape of the feature immediately upstream and that the spatial average of the boundary shear stress over a large set of randomly ordered elements is relatively insensitive to the sequence of the elements. In addition, a method to transform the topography of irregular surfaces into an equivalently rough surface of regularly spaced, identical topographic elements also is given. The methods described in this paper can be used to improve predictions of flow resistance in rivers as well as quantify bank roughness.
Yu, Jin; Abidi, Syed Sibte Raza; Artes, Paul; McIntyre, Andy; Heywood, Malcolm
2005-01-01
The availability of modern imaging techniques such as Confocal Scanning Laser Tomography (CSLT) for capturing high-quality optic nerve images offers the potential for developing automatic and objective methods for diagnosing glaucoma. We present a hybrid approach that features the analysis of CSLT images using moment methods to derive abstract image-defining features. The features are then used to train classifiers for automatically distinguishing CSLT images of normal and glaucoma patients. As a first step, in this paper, we present investigations in feature subset selection methods for reducing the relatively large input space produced by the moment methods. We use neural networks and support vector machines to determine a subset of moments that offer high classification accuracy. We demonstrate the efficacy of our methods to discriminate between healthy and glaucomatous optic disks based on shape information automatically derived from optic disk topography and reflectance images.
Effects of achievement contexts on the meaning structure of emotion words.
Gentsch, Kornelia; Loderer, Kristina; Soriano, Cristina; Fontaine, Johnny R J; Eid, Michael; Pekrun, Reinhard; Scherer, Klaus R
2018-03-01
Little is known about the impact of context on the meaning of emotion words. In the present study, we used a semantic profiling instrument (GRID) to investigate features representing five emotion components (appraisal, bodily reaction, expression, action tendencies, and feeling) of 11 emotion words in situational contexts involving success or failure. We compared these to the data from an earlier study in which participants evaluated the typicality of features out of context. Profile analyses identified features for which typicality changed as a function of context for all emotion words, except contentment, with appraisal features being most frequently affected. Those context effects occurred for both hypothesised basic and non-basic emotion words. Moreover, both data sets revealed a four-dimensional structure. The four dimensions were largely similar (valence, power, arousal, and novelty). The results suggest that context may not change the underlying dimensionality but affects facets of the meaning of emotion words.
Dissuasive exit signage for building fire evacuation.
Olander, Joakim; Ronchi, Enrico; Lovreglio, Ruggiero; Nilsson, Daniel
2017-03-01
This work presents the result of a questionnaire study which investigates the design of dissuasive emergency signage, i.e. signage conveying a message of not utilizing a specific exit door. The work analyses and tests a set of key features of dissuasive emergency signage using the Theory of Affordances. The variables having the largest impact on observer preference, interpretation and noticeability of the signage have been identified. Results show that features which clearly negate the exit-message of the original positive exit signage are most effective, for instance a red X-marking placed across the entirety of the exit signage conveys a clear dissuasive message. Other features of note are red flashing lights and alternation of colour. The sense of urgency conveyed by the sign is largely affected by sensory inputs such as red flashing lights or other features which cause the signs to break the tendencies of normalcy.
A robust dataset-agnostic heart disease classifier from Phonocardiogram.
Banerjee, Rohan; Dutta Choudhury, Anirban; Deshpande, Parijat; Bhattacharya, Sakyajit; Pal, Arpan; Mandana, K M
2017-07-01
Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we have analysed a wide range of Phonocardiogram (PCG) features in the time and frequency domains, along with morphological and statistical features, to construct a robust and discriminative feature set for dataset-agnostic classification of normal and cardiac patients. The large and open access database made available in the Physionet 2016 challenge was used for feature selection, internal validation and creation of training models. A second dataset of 41 PCG segments, collected using our in-house smartphone-based digital stethoscope from an Indian hospital, was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75 respectively on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior art approaches when applied on the same dataset.
Blöchliger, Nicolas; Caflisch, Amedeo; Vitalis, Andreas
2015-11-10
Data mining techniques depend strongly on how the data are represented and how distance between samples is measured. High-dimensional data often contain a large number of irrelevant dimensions (features) for a given query. These features act as noise and obfuscate relevant information. Unsupervised approaches to mine such data require distance measures that can account for feature relevance. Molecular dynamics simulations produce high-dimensional data sets describing molecules observed in time. Here, we propose to globally or locally weight simulation features based on effective rates. This emphasizes, in a data-driven manner, slow degrees of freedom that often report on the metastable states sampled by the molecular system. We couple this idea to several unsupervised learning protocols. Our approach unmasks slow side chain dynamics within the native state of a miniprotein and reveals additional metastable conformations of a protein. The approach can be combined with most algorithms for clustering or dimensionality reduction.
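The following is a loose illustration of the idea, not the authors' exact scheme: each feature of a simulated time series is weighted by an estimate of how slowly it decorrelates, so that slow degrees of freedom dominate the distance measure used for clustering.

```python
# Illustrative rate-based feature weighting for time-series data (assumption:
# a crude integrated autocorrelation time stands in for the effective rate).
import numpy as np

def autocorr_time(x, max_lag=200):
    """Crude integrated autocorrelation time of a 1-D series."""
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:len(x) - 1 + max_lag]
    acf /= acf[0]
    return 1.0 + 2.0 * acf[1:][acf[1:] > 0].sum()

rng = np.random.default_rng(2)
data = rng.normal(size=(5000, 10))          # stand-in for simulation features
data[:, 0] = np.cumsum(data[:, 0]) * 0.01   # make feature 0 a slow degree of freedom

weights = np.array([autocorr_time(data[:, j]) for j in range(data.shape[1])])
weights /= weights.sum()

def weighted_distance(a, b):
    """Distance in which slowly decorrelating features count more."""
    return np.sqrt(np.sum(weights * (a - b) ** 2))

print(weights.round(3))
```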
2012-01-01
Background Because of the large volume of data and the intrinsic variation of data intensity observed in microarray experiments, different statistical methods have been used to systematically extract biological information and to quantify the associated uncertainty. The simplest method to identify differentially expressed genes is to evaluate the ratio of average intensities in two different conditions and consider all genes that differ by more than an arbitrary cut-off value to be differentially expressed. This filtering approach is not a statistical test, and there is no associated value that can indicate the level of confidence in the designation of genes as differentially expressed or not differentially expressed. At the same time, the fold change by itself provides valuable information, and it is important to find unambiguous ways of using this information in expression data treatment. Results A new method of finding differentially expressed genes, called the distributional fold change (DFC) test, is introduced. The method is based on an analysis of the intensity distribution of all microarray probe sets mapped to a three-dimensional feature space composed of average expression level, average difference of gene expression and total variance. The proposed method allows one to rank each feature based on the signal-to-noise ratio and to ascertain for each feature the confidence level and power for being differentially expressed. The performance of the new method was evaluated using the total and partial area under receiver operating characteristic curves and tested on 11 data sets from the Gene Expression Omnibus database with independently verified differentially expressed genes, and compared with the t-test and shrinkage t-test. Overall the DFC test performed the best – on average it had higher sensitivity and partial AUC, and its elevation was most prominent in the low range of differentially expressed features, typical for formalin-fixed paraffin-embedded sample sets. Conclusions The distributional fold change test is an effective method for finding and ranking differentially expressed probesets on microarrays. The application of this test is advantageous to data sets using formalin-fixed paraffin-embedded samples or other systems where degradation effects diminish the applicability of correlation adjusted methods to the whole feature set. PMID:23122055
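As a hedged sketch of the three-dimensional feature space described above, the snippet below maps each probe set to (average expression, average difference, total variance) and ranks by a simple signal-to-noise ratio; this illustrates the idea, not the published DFC statistic.

```python
# Illustration only: 3-D feature space per probe set and SNR-based ranking.
import numpy as np

rng = np.random.default_rng(3)
a = rng.normal(8.0, 1.0, size=(1000, 5))   # condition A, 1000 probe sets
b = rng.normal(8.0, 1.0, size=(1000, 5))   # condition B
b[:50] += 2.0                              # spike-in differential probe sets

avg_expr = np.hstack([a, b]).mean(axis=1)            # average expression level
avg_diff = b.mean(axis=1) - a.mean(axis=1)           # average difference
total_var = np.hstack([a, b]).var(axis=1, ddof=1)    # total variance

snr = np.abs(avg_diff) / np.sqrt(total_var)
ranking = np.argsort(snr)[::-1]
print("top-ranked probe sets:", ranking[:10])
```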
Stanislawski, Larry V.; Liu, Yan; Buttenfield, Barbara P.; Survila, Kornelijus; Wendel, Jeffrey; Okok, Abdurraouf
2016-01-01
The National Hydrography Dataset (NHD) for the United States furnishes a comprehensive set of vector features representing the surface waters in the country (U.S. Geological Survey 2000). The high-resolution (HR) layer of the NHD is largely composed of hydrographic features originally derived from 1:24,000-scale (24K) U.S. Topographic maps. However, in recent years (2009 to present), densified hydrographic feature content, from sources as large as 1:2,400, has been incorporated into some watersheds of the HR NHD within the conterminous United States to better support the needs of various local and state organizations. As such, the HR NHD is a multiresolution dataset with obvious data density variations because of scale changes. In addition, data density variations exist within the HR NHD that are particularly evident in the surface-water flow network (NHD flowlines) because of natural variations in local geographic conditions, and also because of unintentional compilation inconsistencies due to variations in data collection standards and climate conditions over the many years of 24K hydrographic data collection (US Geological Survey 1955).
Construction of Penrose Diagrams for Dynamic Black Holes
NASA Technical Reports Server (NTRS)
Brown, Beth A.; Lindesay, James
2008-01-01
A set of Penrose diagrams is constructed in order to examine the large-scale causal structure of black holes with dynamic horizons. Coordinate dependencies of significant features, such as the event horizon and radial mass scale, are demonstrated on the diagrams. Unlike in static Schwarzschild geometries, the radial mass scale is clearly seen to differ from the horizon. Trajectories for photons near the horizon are briefly discussed.
ERIC Educational Resources Information Center
Leonelli, Sabina
2013-01-01
The collection and dissemination of data on human and nonhuman organisms has become a central feature of 21st-century biology and has been endorsed by funding agencies in the United States and Europe as crucial to translating biological research into therapeutic and agricultural innovation. Large molecular data sets, often referred to as "big…
Multi-Sensor Information Integration and Automatic Understanding
2008-11-01
…also produced a real-time implementation of the tracking and anomalous behavior detection system that runs on real-world data – either using real-time… surveillance and airborne IED detection. SUBJECT TERMS: multi-hypothesis tracking, particle filters, anomalous behavior detection, Bayesian… analyst to support decision making with large data sets. A key feature of the real-time tracking and behavior detection system developed is that the…
[Tyramine and serotonin syndromes. Pharmacological, medical and legal remarks].
Toro-Martínez, Esteban
2005-01-01
The tyramine syndrome and the serotonin syndrome are complexes of signs and symptoms that are thought to be largely attributable to drug-drug or drug-food interactions that enhance norepinephrine or serotonin activity. This article reviews the pharmacological basis of these syndromes, their clinical features, forbidden foods, drug-drug interactions, and treatment options. Finally, a set of legal recommendations is proposed to avoid liability litigation.
NASA Astrophysics Data System (ADS)
Andreon, S.; Gargiulo, G.; Longo, G.; Tagliaferri, R.; Capuano, N.
2000-12-01
Astronomical wide-field imaging performed with new large-format CCD detectors poses data reduction problems of unprecedented scale, which are difficult to deal with using traditional interactive tools. We present here NExt (Neural Extractor), a new neural network (NN) based package capable of detecting objects and performing both deblending and star/galaxy classification in an automatic way. Traditionally, in astronomical images, objects are first distinguished from the noisy background by searching for sets of connected pixels having brightnesses above a given threshold; they are then classified as stars or as galaxies through diagnostic diagrams having variables chosen according to the astronomer's taste and experience. In the extraction step, assuming that images are well sampled, NExt requires only the simplest a priori definition of `what an object is' (i.e. it keeps all structures composed of more than one pixel) and performs the detection via an unsupervised NN, approaching detection as a clustering problem that has been thoroughly studied in the artificial intelligence literature. The first part of the NExt procedure consists of an optimal compression of the redundant information contained in the pixels via a mapping from pixel intensities to a subspace individualized through principal component analysis. At magnitudes fainter than the completeness limit, stars are usually almost indistinguishable from galaxies, and therefore the parameters characterizing the two classes do not lie in disconnected subspaces, thus preventing the use of unsupervised methods. We therefore adopted a supervised NN (i.e. a NN that first finds the rules to classify objects from examples and then applies them to the whole data set). In practice, each object is classified depending on its membership of the regions mapping the input feature space in the training set. In order to obtain an objective and reliable classification, instead of using an arbitrarily defined set of features we use a NN to select the most significant features among the large number of measured ones, and then we use these selected features to perform the classification task. In order to optimize the performance of the system, we implemented and tested several different models of NN. The comparison of the NExt performance with that of the best detection and classification package known to the authors (SExtractor) shows that NExt is at least as effective as the best traditional packages.
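A minimal sketch of the two NExt stages, using scikit-learn stand-ins rather than the authors' implementation: PCA to compress redundant pixel features, then a supervised neural network for star/galaxy classification. Layer sizes and data are assumptions.

```python
# Sketch: PCA compression of pixel features followed by a supervised NN.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 64))        # stand-in for per-object pixel features
y = rng.integers(0, 2, size=500)      # 0 = star, 1 = galaxy (toy labels)

X_compressed = PCA(n_components=8).fit_transform(X)   # remove redundant info
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500).fit(X_compressed, y)
print(f"training accuracy: {clf.score(X_compressed, y):.2f}")
```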
The JPL Tropical Cyclone Information System: Data and Tools for Researchers
NASA Astrophysics Data System (ADS)
Knosp, B. W.; Ao, C. O.; Chao, Y.; Dang, V.; Garay, M.; Haddad, Z.; Hristova-Veleva, S.; Lambrigtsen, B.; Li, P. P.; Park, K.; Poulsen, W. L.; Rosenman, M. A.; Su, H.; Vane, D.; Vu, Q. A.; Willis, J. K.; Wu, D.
2008-12-01
The JPL Tropical Cyclone Information System (TCIS) is now open to the public. This web portal is designed to assist researchers by providing a one-stop shop for hurricane-related data and analysis tools. While there are currently many places that offer storm data, plots, and other information, none offer an extensive archive of data files and images in a common space. The JPL TCIS was created to fill this gap. As currently configured, the JPL Tropical Cyclone Portal has three main features for researchers. The first feature consists of storm-scale data and plots for both observed and modeled data. As of the TCIS' first release, the entire 2005 storm season has been populated with data and plots from AIRS, MLS, AMSU-A, QuikSCAT, Argo floats, WRF models, GPS, and others. Storm data is subsetted to a 1000x1000 km window around the hurricane track for all six oceanic cyclone basins, and all the available data during the lifetime of any storm can be downloaded with one mouse click. Users can also view pre-generated storm-scale plots from all these data sets, co-located to the same temporal and spatial parameters. Work is currently underway to backfill all storm seasons to 1998 with as many relevant data sets as possible. The second feature of this web portal is large-scale data sets and associated visualization tools powered by Google Maps. On this interactive map, researchers can view a particular storm's intensity and track. Users may also overlay large-scale data such as aerosol maps from MODIS and MISR, and a blended microwave sea-surface temperature (SST), to gain an understanding of the large-scale environment of the storm. For example, by using this map, the cold sea-surface temperature wake can be tracked as a storm passes by. The third feature of this portal deals with interactive model and data analysis. A single-parameter analysis tool has recently been developed and added to this portal, with which users can plot maps, profiles, and histograms of any given data set on this portal and also get several statistics, such as the mean, standard deviation, and median of the data they are viewing. Also available is the ability to compare and condition data sets with each other. For example, users can choose to view sea surface temperature when wind speed is X m/s. Additional data sets continue to be added to this tool, and it will eventually expand to include multi-parameter analyses. In this presentation, we will describe the current configuration of the JPL Tropical Cyclone Portal and demonstrate how it will be an asset to researchers. Future plans for the site will also be discussed.
Zhao, Weixiang; Sankaran, Shankar; Ibáñez, Ana M; Dandekar, Abhaya M; Davis, Cristina E
2009-08-04
This study introduces two-dimensional (2-D) wavelet analysis to the classification of gas chromatography / differential mobility spectrometry (GC/DMS) data, which are composed of retention time, compensation voltage, and corresponding intensities. One reported method to process such large data sets is to convert 2-D signals to 1-D signals by summing intensities either across retention time or compensation voltage, but this can lose important signal information in one data dimension. A 2-D wavelet analysis approach keeps the 2-D structure of the original signals while significantly reducing data size. We applied this feature extraction method to 2-D GC/DMS signals measured from control and disordered fruit and then employed two typical classification algorithms to test the effects of the resultant features on chemical pattern recognition. Yielding a 93.3% accuracy in separating data from control and disordered fruit samples, 2-D wavelet analysis not only proves its feasibility for extracting features from original 2-D signals but also shows its superiority over conventional feature extraction methods, including converting 2-D to 1-D and selecting distinguishable pixels from the training set. Furthermore, this process does not require coupling with specific pattern recognition methods, which may help ensure wide applications of this method to 2-D spectrometry data.
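A hedged sketch of the 2-D wavelet feature extraction step, using PyWavelets on a synthetic retention-time x compensation-voltage matrix; the wavelet family, decomposition level, and matrix size are assumptions.

```python
# Multilevel 2-D DWT keeping only coarse approximation coefficients as a
# compact feature vector that preserves the 2-D signal structure.
import numpy as np
import pywt

rng = np.random.default_rng(5)
signal_2d = rng.normal(size=(256, 128))   # stand-in for one GC/DMS measurement

coeffs = pywt.wavedec2(signal_2d, wavelet="db2", level=3)
features = coeffs[0].ravel()              # coarse approximation as features
print(f"original size: {signal_2d.size}, feature size: {features.size}")
```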
Radiomics: Extracting more information from medical images using advanced feature analysis
Lambin, Philippe; Rios-Velazquez, Emmanuel; Leijenaar, Ralph; Carvalho, Sara; van Stiphout, Ruud G.P.M.; Granton, Patrick; Zegers, Catharina M.L.; Gillies, Robert; Boellard, Ronald; Dekker, André; Aerts, Hugo J.W.L.
2015-01-01
Solid cancers are spatially and temporally heterogeneous. This limits the use of invasive biopsy-based molecular assays but gives huge potential for medical imaging, which has the ability to capture intra-tumoural heterogeneity in a non-invasive way. During the past decades, medical imaging innovations with new hardware, new imaging agents and standardised protocols have allowed the field to move towards quantitative imaging. This, in turn, requires the development of automated and reproducible analysis methodologies to extract more information from image-based features. Radiomics – the high-throughput extraction of large amounts of image features from radiographic images – addresses this problem and is one of the approaches that hold great promise but need further validation in multi-centric settings and in the laboratory. PMID:22257792
Gaussian mixture models-based ship target recognition algorithm in remote sensing infrared images
NASA Astrophysics Data System (ADS)
Yao, Shoukui; Qin, Xiaojuan
2018-02-01
Since the resolution of remote sensing infrared images is low, the features of ship targets become unstable. How to recognize ships with fuzzy features is an open problem. In this paper, we propose a novel ship target recognition algorithm based on Gaussian mixture models (GMMs). The proposed algorithm has two main steps. In the first step, the Hu moments of the ship target images are calculated, and the GMMs are trained on the moment features of the ships. In the second step, the moment feature of each ship image is assigned to the trained GMMs for recognition. Because of the scale, rotation, and translation invariance of Hu moments and the powerful feature-space description ability of GMMs, the GMMs-based ship target recognition algorithm can recognize ships reliably. Experimental results on a large simulated image set show that our approach is effective in distinguishing different ship types and obtains a satisfactory ship recognition performance.
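The two steps described above can be sketched as follows, with OpenCV Hu moments and one scikit-learn Gaussian mixture per ship class; class names, GMM size, and the toy images are assumptions.

```python
# Sketch: invariant Hu-moment features + per-class GMM scoring at recognition.
import numpy as np
import cv2
from sklearn.mixture import GaussianMixture

def hu_features(img):
    """7 log-scaled Hu moment invariants of a grayscale image."""
    hu = cv2.HuMoments(cv2.moments(img)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

rng = np.random.default_rng(6)
train = {c: np.stack([hu_features(rng.random((64, 64)).astype(np.float32))
                      for _ in range(50)]) for c in ("cargo", "tanker")}

models = {c: GaussianMixture(n_components=3).fit(X) for c, X in train.items()}

test_img = rng.random((64, 64)).astype(np.float32)
x = hu_features(test_img)[None, :]
scores = {c: m.score(x) for c, m in models.items()}   # log-likelihood per class
print(max(scores, key=scores.get))
```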
Fine-tuning convolutional deep features for MRI based brain tumor classification
NASA Astrophysics Data System (ADS)
Ahmed, Kaoutar B.; Hall, Lawrence O.; Goldgof, Dmitry B.; Liu, Renhao; Gatenby, Robert A.
2017-03-01
Prediction of survival time from brain tumor magnetic resonance images (MRI) is not commonly performed and would ordinarily be a time-consuming process. However, current cross-sectional imaging techniques, particularly MRI, can be used to generate many features that may provide information on the patient's prognosis, including survival. This information can potentially be used to identify individuals who would benefit from more aggressive therapy. Rather than using pre-defined and hand-engineered features as with current radiomics methods, we investigated the use of deep features extracted from pre-trained convolutional neural networks (CNNs) in predicting survival time. We also provide evidence for the power of domain-specific fine-tuning in improving the performance of a pre-trained CNN, even though our data set is small. We fine-tuned a CNN initially trained on a large natural image recognition dataset (Imagenet ILSVRC) and transferred the learned feature representations to the survival time prediction task, obtaining over 81% accuracy in leave-one-out cross-validation.
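A hedged Keras sketch of transfer learning in the spirit described above: freeze a convolutional base pre-trained on ImageNet and retrain a small head for the new task. VGG16 stands in for the authors' network; input size and class count are assumptions.

```python
# Sketch of fine-tuning: frozen ImageNet-pretrained base + trainable head.
import tensorflow as tf

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                      # freeze pre-trained feature layers

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.5),           # regularization for a small data set
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g. short vs. long survival
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```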
Insights into multimodal imaging classification of ADHD
Colby, John B.; Rudie, Jeffrey D.; Brown, Jesse A.; Douglas, Pamela K.; Cohen, Mark S.; Shehzad, Zarrar
2012-01-01
Attention deficit hyperactivity disorder (ADHD) currently is diagnosed in children by clinicians via subjective ADHD-specific behavioral instruments and by reports from the parents and teachers. Considering its high prevalence and large economic and societal costs, a quantitative tool that aids in diagnosis by characterizing underlying neurobiology would be extremely valuable. This provided motivation for the ADHD-200 machine learning (ML) competition, a multisite collaborative effort to investigate imaging classifiers for ADHD. Here we present our ML approach, which used structural and functional magnetic resonance imaging data, combined with demographic information, to predict diagnostic status of individuals with ADHD from typically developing (TD) children across eight different research sites. Structural features included quantitative metrics from 113 cortical and non-cortical regions. Functional features included Pearson correlation functional connectivity matrices, nodal and global graph theoretical measures, nodal power spectra, voxelwise global connectivity, and voxelwise regional homogeneity. We performed feature ranking for each site and modality using the multiple support vector machine recursive feature elimination (SVM-RFE) algorithm, and feature subset selection by optimizing the expected generalization performance of a radial basis function kernel SVM (RBF-SVM) trained across a range of the top features. Site-specific RBF-SVMs using these optimal feature sets from each imaging modality were used to predict the class labels of an independent hold-out test set. A voting approach was used to combine these multiple predictions and assign final class labels. With this methodology we were able to predict diagnosis of ADHD with 55% accuracy (versus a 39% chance level in this sample), 33% sensitivity, and 80% specificity. This approach also allowed us to evaluate predictive structural and functional features giving insight into abnormal brain circuitry in ADHD. PMID:22912605
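A minimal scikit-learn sketch of the SVM-RFE ranking step described above, followed by an RBF-kernel SVM on the top-ranked features; feature counts and data are illustrative assumptions.

```python
# Sketch: rank features with SVM-RFE, then train an RBF-SVM on the top subset.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 300))        # e.g. imaging features per subject
y = rng.integers(0, 2, size=200)       # 0 = TD, 1 = ADHD (toy labels)

# RFE with a linear SVM repeatedly drops the weakest-weighted features.
ranker = RFE(SVC(kernel="linear"), n_features_to_select=20, step=0.1).fit(X, y)
X_top = X[:, ranker.support_]

clf = SVC(kernel="rbf").fit(X_top, y)   # RBF-SVM on the selected subset
print(f"selected {ranker.support_.sum()} features")
```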
A unified framework of image latent feature learning on Sina microblog
NASA Astrophysics Data System (ADS)
Wei, Jinjin; Jin, Zhigang; Zhou, Yuan; Zhang, Rui
2015-10-01
Large-scale user-contributed images with texts are rapidly increasing on social media websites, such as Sina microblog. However, the noise and incomplete correspondence between the images and the texts give rise to difficulty in precise image retrieval and ranking. In this paper, a hypergraph-based learning framework is proposed for image ranking, which simultaneously utilizes visual features, textual content and social link information to estimate the relevance between images. By representing each image as a vertex in the hypergraph, complex relationships between images can be reflected exactly. Then, by updating the weights of the hyperedges throughout the hypergraph learning process, the effects of different edges can be adaptively modulated in the constructed hypergraph. Furthermore, the popularity degree of the image is employed to re-rank the retrieval results. Comparative experiments on a large-scale Sina microblog data set demonstrate the effectiveness of the proposed approach.
Volatility return intervals analysis of the Japanese market
NASA Astrophysics Data System (ADS)
Jung, W.-S.; Wang, F. Z.; Havlin, S.; Kaizoji, T.; Moon, H.-T.; Stanley, H. E.
2008-03-01
We investigate scaling and memory effects in return intervals between price volatilities above a certain threshold q for the Japanese stock market using daily and intraday data sets. We find that the distribution of return intervals can be approximated by a scaling function that depends only on the ratio between the return interval τ and its mean <τ>. We also find memory effects such that a large (or small) return interval follows a large (or small) interval by investigating the conditional distribution and mean return interval. The results are similar to previous studies of other markets and indicate that similar statistical features appear in different financial markets. We also compare our results between the period before and after the big crash at the end of 1989. We find that scaling and memory effects of the return intervals show similar features although the statistical properties of the returns are different.
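The return-interval computation described above can be sketched in a few lines; the volatility series here is synthetic.

```python
# Sketch: intervals between volatility exceedances of a threshold q, rescaled
# by their mean, i.e. the scaling variable tau / <tau>.
import numpy as np

rng = np.random.default_rng(8)
volatility = np.abs(rng.standard_t(df=3, size=100_000))  # heavy-tailed stand-in

q = np.quantile(volatility, 0.95)
exceed_times = np.flatnonzero(volatility > q)
tau = np.diff(exceed_times)              # return intervals between exceedances

scaled = tau / tau.mean()
print(f"<tau> = {tau.mean():.1f}, P(scaled > 3) = {(scaled > 3).mean():.3f}")
```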
Policy Driven Development: Flexible Policy Insertion for Large Scale Systems.
Demchak, Barry; Krüger, Ingolf
2012-07-01
The success of a software system depends critically on how well it reflects and adapts to stakeholder requirements. Traditional development methods often frustrate stakeholders by creating long latencies between requirement articulation and system deployment, especially in large scale systems. One source of latency is the maintenance of policy decisions encoded directly into system workflows at development time, including those involving access control and feature set selection. We created the Policy Driven Development (PDD) methodology to address these development latencies by enabling the flexible injection of decision points into existing workflows at runtime, thus enabling policy composition that integrates requirements furnished by multiple, oblivious stakeholder groups. Using PDD, we designed and implemented a production cyberinfrastructure that demonstrates policy and workflow injection that quickly implements stakeholder requirements, including features not contemplated in the original system design. PDD provides a path to quickly and cost effectively evolve such applications over a long lifetime.
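A hypothetical sketch of the core idea, runtime injection of policy decision points into an existing workflow, is given below; the names and structure are assumptions, not the PDD API.

```python
# Hypothetical decision-point registry: workflows call named decision points,
# and stakeholder policies can be registered or replaced after deployment.
_policies = {}

def register_policy(name, fn):
    _policies[name] = fn

def decide(name, *args, default=True):
    return _policies.get(name, lambda *a: default)(*args)

def download_workflow(user, dataset):
    if not decide("access_control", user, dataset):   # injected decision point
        raise PermissionError(f"{user} may not access {dataset}")
    return f"{user} downloads {dataset}"

# A stakeholder group injects its own access policy at runtime:
register_policy("access_control", lambda user, ds: user != "guest")
print(download_workflow("alice", "survey.dat"))
```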
NASA Astrophysics Data System (ADS)
Dicente Cid, Yashin; Mamonov, Artem; Beers, Andrew; Thomas, Armin; Kovalev, Vassili; Kalpathy-Cramer, Jayashree; Müller, Henning
2017-03-01
The analysis of large data sets can help to gain knowledge about specific organs or specific diseases, just as big data analysis does in many non-medical areas. This article aims to gain information from 3D volumes, namely the visual content of lung CT scans of a large number of patients. For the described data set, only little annotation is available: the patients were all part of an ongoing screening program, and besides age and gender, no information on the patients or the findings was available for this work. This is a scenario that can happen regularly, as image data sets are produced and become available in increasingly large quantities, but manual annotations are often not available, and clinical data such as text reports are often harder to share. We extracted a set of visual features from 12,414 CT scans of 9,348 patients who had CT scans of the lung taken in the context of a national lung screening program in Belarus. Lung fields were segmented by two segmentation algorithms, and only cases where both algorithms were able to find the left and right lung and had a Dice coefficient above 0.95 were analyzed. This assures that only segmentations of good quality were used to extract features of the lung. Patients ranged in age from 0 to 106 years. Data analysis shows that age can be predicted with fairly high accuracy for persons under 15 years. Relatively good results were also obtained between 30 and 65 years, where a steady trend is seen. For young adults and older people the results are not as good, as variability is very high in these groups. Several visualizations of the data show the evolution patterns of lung texture, size and density with age. The experiments allow learning about the evolution of the lung, and the results show that even with limited metadata we can extract interesting information from large-scale visual data. These age-related changes (for example of the lung volume, the density histogram of the tissue) can also be taken into account for the interpretation of new cases. The database used includes patients that had suspicious findings on a chest X-ray, so it is not a group of healthy people, and only tendencies, not a model of a healthy lung at a specific age, can be derived.
Probabilistic combination of static and dynamic gait features for verification
NASA Astrophysics Data System (ADS)
Bazin, Alex I.; Nixon, Mark S.
2005-03-01
This paper describes a novel probabilistic framework for biometric identification and data fusion. Based on intra- and inter-class variation extracted from training data, posterior probabilities describing the similarity between two feature vectors may be directly calculated from the data using the logistic function and Bayes rule. Using a large publicly available database, we show that two imbalanced gait modalities may be fused using this framework. All fusion methods tested provide an improvement over the best modality, with the weighted sum rule giving the best performance, hence showing that highly imbalanced classifiers may be fused in a probabilistic setting, improving not only the performance but also generalized application capability.
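A hedged sketch of the fusion scheme described above: per-modality scores are mapped to posterior probabilities with a logistic function and combined by the weighted-sum rule. The logistic coefficients and weights are illustrative; in the real system they are derived from training data.

```python
# Sketch: logistic mapping from raw scores to posteriors, weighted-sum fusion.
import numpy as np

def posterior(score, a, b):
    """Logistic mapping from a matching score to P(same identity | score)."""
    return 1.0 / (1.0 + np.exp(-(a * score + b)))

# Two imbalanced modalities, e.g. static and dynamic gait features.
p_static = posterior(score=0.62, a=8.0, b=-4.0)
p_dynamic = posterior(score=0.30, a=5.0, b=-2.0)

w = np.array([0.7, 0.3])                       # weights favour the stronger modality
p_fused = w @ np.array([p_static, p_dynamic])  # weighted-sum rule
print(f"fused posterior: {p_fused:.3f}")
```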
Büssow, Konrad; Hoffmann, Steve; Sievert, Volker
2002-12-19
Functional genomics involves the parallel experimentation with large sets of proteins. This requires management of large sets of open reading frames as a prerequisite of the cloning and recombinant expression of these proteins. A Java program was developed for retrieval of protein and nucleic acid sequences and annotations from NCBI GenBank, using the XML sequence format. Annotations retrieved by ORFer include sequence name, organism and also the completeness of the sequence. The program has a graphical user interface, although it can be used in a non-interactive mode. For protein sequences, the program also extracts the open reading frame sequence, if available, and checks its correct translation. ORFer accepts user input in the form of single or lists of GenBank GI identifiers or accession numbers. It can be used to extract complete sets of open reading frames and protein sequences from any kind of GenBank sequence entry, including complete genomes or chromosomes. Sequences are either stored with their features in a relational database or can be exported as text files in Fasta or tabulator delimited format. The ORFer program is freely available at http://www.proteinstrukturfabrik.de/orfer. The ORFer program allows for fast retrieval of DNA sequences, protein sequences and their open reading frames and sequence annotations from GenBank. Furthermore, storage of sequences and features in a relational database is supported. Such a database can supplement a laboratory information system (LIMS) with appropriate sequence information.
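ORFer itself is a Java tool; as a hedged illustration of the same kind of retrieval, the sketch below fetches a GenBank record and extracts its coding-sequence (ORF) annotations using Biopython's Entrez utilities. The accession number and e-mail address are placeholders.

```python
# Sketch (Biopython, not ORFer): fetch a GenBank record, list ORF annotations.
from Bio import Entrez, SeqIO

Entrez.email = "you@example.org"   # required by NCBI; placeholder address
handle = Entrez.efetch(db="nucleotide", id="NM_000546",  # placeholder accession
                       rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")
handle.close()

print(record.id, record.description)
for feat in record.features:
    if feat.type == "CDS":         # coding sequence, i.e. the open reading frame
        print("ORF at", feat.location, "->", feat.qualifiers.get("protein_id"))
```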
Parallel/distributed direct method for solving linear systems
NASA Technical Reports Server (NTRS)
Lin, Avi
1990-01-01
A new family of parallel schemes for directly solving linear systems is presented and analyzed. It is shown that these schemes exhibit a near optimal performance and enjoy several important features: (1) For large enough linear systems, the design of the appropriate parallel algorithm is insensitive to the number of processors, as its performance grows monotonically with them; (2) It is especially good for large matrices, with dimensions large relative to the number of processors in the system; (3) It can be used in both distributed parallel computing environments and tightly coupled parallel computing systems; and (4) This set of algorithms can be mapped onto any parallel architecture without any major programming difficulties or algorithmic changes.
Constraints on inflation with LSS surveys: features in the primordial power spectrum
NASA Astrophysics Data System (ADS)
Palma, Gonzalo A.; Sapone, Domenico; Sypsas, Spyros
2018-06-01
We analyse the efficiency of future large-scale structure surveys to unveil the presence of scale-dependent features in the primordial spectrum—resulting from cosmic inflation—imprinted in the distribution of galaxies. Features may appear as a consequence of non-trivial dynamics during cosmic inflation, in which one or more background quantities experienced small but rapid deviations from their characteristic slow-roll evolution. We consider two families of features: localised features and oscillatory extended features. To characterise them we employ various possible templates parametrising their scale dependence and provide forecasts on the constraints on these parametrisations for LSST-like surveys. We perform a Fisher matrix analysis for three observables: cosmic microwave background (CMB), galaxy clustering and weak lensing. We find that the combined data set of these observables will be able to limit the presence of features down to levels that are more restrictive than current constraints coming from CMB observations only. In particular, we address the possibility of gaining information on currently known deviations from scale invariance inferred from CMB data, such as the feature appearing at the l ~ 20 multipole (which is the main contribution to the low-l deficit) and another one around l ~ 800.
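A generic sketch of the Fisher-matrix machinery used for such forecasts, with a toy featured power spectrum standing in for the real observables and finite-difference derivatives; everything about the model and the errors is an assumption for illustration.

```python
# Sketch: F_ij = sum_k (dO_k/dp_i)(dO_k/dp_j) / sigma_k^2 for a toy spectrum
# with an oscillatory feature of amplitude A and frequency w.
import numpy as np

k = np.linspace(0.01, 0.3, 100)                  # wavenumbers
sigma = 0.05 * np.ones_like(k)                   # assumed measurement errors

def observable(params):
    A, w = params
    return k**-1.5 * (1.0 + A * np.sin(w * k))   # featured spectrum (toy model)

def fisher(params, eps=1e-4):
    p = np.asarray(params, dtype=float)
    grads = []
    for i in range(len(p)):
        dp = np.zeros_like(p); dp[i] = eps
        grads.append((observable(p + dp) - observable(p - dp)) / (2 * eps))
    G = np.array(grads)
    return (G / sigma) @ (G / sigma).T           # Fisher matrix F_ij

F = fisher([0.1, 50.0])
errors = np.sqrt(np.diag(np.linalg.inv(F)))      # marginalised 1-sigma forecasts
print(errors)
```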
Object Classification With Joint Projection and Low-Rank Dictionary Learning.
Foroughi, Homa; Ray, Nilanjan; Hong Zhang
2018-02-01
For an object classification system, the most critical obstacles toward real-world applications are often caused by large intra-class variability, arising from different lightings, occlusion, and corruption, in limited sample sets. Most methods in the literature would fail when the training samples are heavily occluded, corrupted or have significant illumination or viewpoint variations. Besides, most of the existing methods, especially deep learning-based methods, need large training sets to achieve a satisfactory recognition performance. Although using a network pre-trained on a generic large-scale data set and fine-tuning it to the small-sized target data set is a widely used technique, this does not help when the contents of the base and target data sets are very different. To address these issues simultaneously, we propose a joint projection and low-rank dictionary learning method using dual graph constraints. Specifically, a structured class-specific dictionary is learned in the low-dimensional space, and the discrimination is further improved by imposing a graph constraint on the coding coefficients that maximizes the intra-class compactness and inter-class separability. We enforce structural incoherence and low-rank constraints on sub-dictionaries to reduce the redundancy among them, and also make them robust to variations and outliers. To preserve the intrinsic structure of data, we introduce a supervised neighborhood graph into the framework to make the proposed method robust to small-sized and high-dimensional data sets. Experimental results on several benchmark data sets verify the superior performance of our method for object classification of small-sized data sets, which include a considerable amount of different kinds of variation and may have high-dimensional feature vectors.
Discrimination Enhancement with Transient Feature Analysis of a Graphene Chemical Sensor.
Nallon, Eric C; Schnee, Vincent P; Bright, Collin J; Polcha, Michael P; Li, Qiliang
2016-01-19
A graphene chemical sensor is subjected to a set of structurally and chemically similar hydrocarbon compounds consisting of toluene, o-xylene, p-xylene, and mesitylene. The fractional change in resistance of the sensor upon exposure to these compounds exhibits a similar response magnitude among compounds, whereas large variation is observed within repetitions for each compound, causing a response overlap. Therefore, traditional features depending on the maximum response change will cause confusion during further discrimination and classification analysis. More robust features that are less sensitive to concentration, sampling, and drift variability would provide higher quality information. In this work, we have explored the advantage of using transient-based exponential fitting coefficients to enhance the discrimination of similar compounds. The advantage of such feature analysis in discriminating each compound is evaluated using principal component analysis (PCA). In addition, machine learning-based classification algorithms were used to compare the prediction accuracies when using fitting coefficients as features. The additional features greatly enhanced the discrimination between compounds when performing PCA and also improved the prediction accuracy by 34% when using linear discriminant analysis.
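The transient-feature idea can be sketched with SciPy: fit an exponential to a response transient and use the fitted coefficients as features. The exponential form and the synthetic transient are assumptions.

```python
# Sketch: exponential fit to a sensor transient; (a, tau, c) become features.
import numpy as np
from scipy.optimize import curve_fit

def transient(t, a, tau, c):
    """Exponential approach to steady state after exposure."""
    return a * (1.0 - np.exp(-t / tau)) + c

t = np.linspace(0, 30, 200)                       # seconds
rng = np.random.default_rng(9)
response = transient(t, a=0.8, tau=5.0, c=0.02) + rng.normal(0, 0.01, t.size)

popt, _ = curve_fit(transient, t, response, p0=(1.0, 1.0, 0.0))
a_fit, tau_fit, c_fit = popt                      # feature vector for this exposure
print(f"a={a_fit:.3f}, tau={tau_fit:.2f} s, c={c_fit:.3f}")
```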
Ordinal measures for iris recognition.
Sun, Zhenan; Tan, Tieniu
2009-12-01
Images of a human iris contain rich texture information useful for identity authentication. A key and still open issue in iris recognition is how best to represent such textural information using a compact set of features (iris features). In this paper, we propose using ordinal measures for iris feature representation with the objective of characterizing qualitative relationships between iris regions rather than precise measurements of iris image structures. Such a representation may lose some image-specific information, but it achieves a good trade-off between distinctiveness and robustness. We show that ordinal measures are intrinsic features of iris patterns and largely invariant to illumination changes. Moreover, compactness and low computational complexity of ordinal measures enable highly efficient iris recognition. Ordinal measures are a general concept useful for image analysis and many variants can be derived for ordinal feature extraction. In this paper, we develop multilobe differential filters to compute ordinal measures with flexible intralobe and interlobe parameters such as location, scale, orientation, and distance. Experimental results on three public iris image databases demonstrate the effectiveness of the proposed ordinal feature models.
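As a loose illustration of an ordinal measure (not the authors' multilobe differential filters), the sketch below compares average intensities of paired iris regions and keeps only the sign of each comparison; lobe placement and size are assumptions.

```python
# Sketch: qualitative "brighter than" relations between region pairs as bits.
import numpy as np

rng = np.random.default_rng(10)
iris_patch = rng.random((32, 64))   # stand-in for a normalized iris texture

def ordinal_bit(img, lobe_a, lobe_b):
    """1 if region A is on average brighter than region B, else 0."""
    (r0, r1, c0, c1), (s0, s1, d0, d1) = lobe_a, lobe_b
    return int(img[r0:r1, c0:c1].mean() > img[s0:s1, d0:d1].mean())

# A small ordinal code from a few region pairs at different locations.
pairs = [((0, 8, 0, 8), (0, 8, 8, 16)),
         ((8, 16, 0, 8), (8, 16, 8, 16)),
         ((16, 24, 16, 24), (16, 24, 24, 32))]
code = [ordinal_bit(iris_patch, a, b) for a, b in pairs]
print("ordinal code:", code)
```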
a Critical Review of Automated Photogrammetric Processing of Large Datasets
NASA Astrophysics Data System (ADS)
Remondino, F.; Nocerino, E.; Toschi, I.; Menna, F.
2017-08-01
The paper reports some comparisons between commercial software able to automatically process image datasets for 3D reconstruction purposes. The main aspects investigated in the work are the capability to correctly orient large sets of images of complex environments, the metric quality of the results, replicability and redundancy. Different datasets are employed, each one featuring a diverse number of images, GSDs at cm and mm resolutions, and ground truth information to perform statistical analyses of the 3D results. A summary of (photogrammetric) terms is also provided, in order to offer rigorous terms of reference for comparisons and critical analyses.
Seismic classification through sparse filter dictionaries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hickmann, Kyle Scott; Srinivasan, Gowri
We tackle a multi-label classification problem involving the relation between acoustic-profile features and the measured seismogram. To isolate components of the seismograms unique to each class of acoustic profile, we build dictionaries of convolutional filters. The convolutional-filter dictionaries for the individual classes are then combined into a large dictionary for the entire seismogram set. A given seismogram is classified by computing its representation in the large dictionary and then comparing reconstruction accuracy with this representation using each of the sub-dictionaries. The sub-dictionary with the minimal reconstruction error identifies the seismogram class.
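A hedged sketch of classification by reconstruction error with per-class dictionaries, using scikit-learn stand-ins for the convolutional-filter dictionaries; dictionary sizes and the toy data are assumptions.

```python
# Sketch: learn one dictionary per class, classify by reconstruction error.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(11)
classes = {0: rng.normal(0, 1, (100, 64)),                           # class 0
           1: rng.normal(0, 1, (100, 64)) + np.sin(np.arange(64))}   # class 1

dicts = {c: MiniBatchDictionaryLearning(n_components=16,
                                        transform_algorithm="omp",
                                        transform_n_nonzero_coefs=5,
                                        random_state=0).fit(X)
         for c, X in classes.items()}

def classify(x):
    errors = {}
    for c, d in dicts.items():
        code = d.transform(x[None, :])
        recon = code @ d.components_
        errors[c] = np.linalg.norm(x - recon)   # reconstruction error per class
    return min(errors, key=errors.get)

print(classify(classes[1][0]))
```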
Deep machine learning provides state-of-the-art performance in image-based plant phenotyping.
Pound, Michael P; Atkinson, Jonathan A; Townsend, Alexandra J; Wilson, Michael H; Griffiths, Marcus; Jackson, Aaron S; Bulat, Adrian; Tzimiropoulos, Georgios; Wells, Darren M; Murchie, Erik H; Pridmore, Tony P; French, Andrew P
2017-10-01
In plant phenotyping, it has become important to be able to measure many features on large image sets in order to aid genetic discovery. The size of the datasets, now often captured robotically, often precludes manual inspection, hence the motivation for finding a fully automated approach. Deep learning is an emerging field that promises unparalleled results on many data analysis problems. Building on artificial neural networks, deep approaches have many more hidden layers in the network, and hence have greater discriminative and predictive power. We demonstrate the use of such approaches as part of a plant phenotyping pipeline. We show the success offered by such techniques when applied to the challenging problem of image-based plant phenotyping and demonstrate state-of-the-art results (>97% accuracy) for root and shoot feature identification and localization. We use fully automated trait identification using deep learning to identify quantitative trait loci in root architecture datasets. The majority (12 out of 14) of manually identified quantitative trait loci were also discovered using our automated approach based on deep learning detection to locate plant features. We have shown deep learning-based phenotyping to have very good detection and localization accuracy in validation and testing image sets. We have shown that such features can be used to derive meaningful biological traits, which in turn can be used in quantitative trait loci discovery pipelines. This process can be completely automated. We predict a paradigm shift in image-based phenotyping brought about by such deep learning approaches, given sufficient training sets.
PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lochner, Michelle; Peiris, Hiranya V.; Lahav, Ofer
Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
Ooi, Chia Huey; Chetty, Madhu; Teng, Shyh Wei
2006-06-23
Due to the large number of genes in a typical microarray dataset, feature selection looks set to play an important role in reducing noise and computational cost in gene expression-based tissue classification while improving accuracy at the same time. Surprisingly, this does not appear to be the case for all multiclass microarray datasets. The reason is that many feature selection techniques applied on microarray datasets are either rank-based and hence do not take into account correlations between genes, or are wrapper-based, which require high computational cost, and often yield difficult-to-reproduce results. In studies where correlations between genes are considered, attempts to establish the merit of the proposed techniques are hampered by evaluation procedures which are less than meticulous, resulting in overly optimistic estimates of accuracy. We present two realistically evaluated correlation-based feature selection techniques which incorporate, in addition to the two existing criteria involved in forming a predictor set (relevance and redundancy), a third criterion called the degree of differential prioritization (DDP). DDP functions as a parameter to strike the balance between relevance and redundancy, providing our techniques with the novel ability to differentially prioritize the optimization of relevance against redundancy (and vice versa). This ability proves useful in producing optimal classification accuracy while using reasonably small predictor set sizes for nine well-known multiclass microarray datasets. For multiclass microarray datasets, especially the GCM and NCI60 datasets, DDP enables our filter-based techniques to produce accuracies better than those reported in previous studies which employed similarly realistic evaluation procedures.
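As a loose illustration only (not the authors' exact DDP formula), the sketch below grows a predictor set greedily while trading off relevance against antiredundancy through an exponent alpha.

```python
# Hypothetical relevance/antiredundancy tradeoff with a prioritization
# exponent alpha in [0, 1]; data and scoring are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(12)
relevance = rng.random(50)              # e.g. per-gene class-separability score
corr = np.abs(rng.random((50, 50)))     # pairwise gene correlations

def score(gene, selected, alpha):
    if not selected:
        return relevance[gene]
    antiredundancy = 1.0 - np.mean([corr[gene, s] for s in selected])
    return relevance[gene] ** alpha * antiredundancy ** (1.0 - alpha)

selected = []
for _ in range(10):                     # greedy forward selection
    cand = [g for g in range(50) if g not in selected]
    selected.append(max(cand, key=lambda g: score(g, selected, alpha=0.7)))
print(selected)
```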
Three-Class Mammogram Classification Based on Descriptive CNN Features
Jadoon, M Mohsin; Zhang, Qianni; Haq, Ihsan Ul; Butt, Sharjeel; Jadoon, Adeel
2017-01-01
In this paper, a novel classification technique for a large data set of mammograms using a deep learning method is proposed. The proposed model targets a three-class classification study (normal, malignant, and benign cases). In our model we present two methods, namely convolutional neural network-discrete wavelet (CNN-DW) and convolutional neural network-curvelet transform (CNN-CT). An augmented data set is generated by using mammogram patches. To enhance the contrast of mammogram images, the data set is filtered by contrast limited adaptive histogram equalization (CLAHE). In the CNN-DW method, enhanced mammogram images are decomposed into their four subbands by means of the two-dimensional discrete wavelet transform (2D-DWT), while in the second method the discrete curvelet transform (DCT) is used. In both methods, dense scale invariant features (DSIFT) for all subbands are extracted. An input data matrix containing these subband features of all the mammogram patches is created and processed as input to a convolutional neural network (CNN). A softmax layer and a support vector machine (SVM) layer are used to train the CNN for classification. The proposed methods have been compared with existing methods in terms of accuracy rate, error rate, and various validation assessment measures. CNN-DW and CNN-CT achieved accuracy rates of 81.83% and 83.74%, respectively. Simulation results clearly validate the significance and impact of our proposed model as compared to other well-known existing techniques. PMID:28191461
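The two preprocessing steps named above can be sketched as follows, with skimage's adaptive histogram equalization standing in for CLAHE and a single-level 2-D DWT producing the four subbands; the patch and its size are assumptions.

```python
# Sketch: CLAHE-style contrast enhancement, then one-level 2-D DWT subbands.
import numpy as np
import pywt
from skimage import exposure

rng = np.random.default_rng(13)
patch = rng.random((128, 128))    # stand-in mammogram patch with values in [0, 1]

enhanced = exposure.equalize_adapthist(patch, clip_limit=0.02)   # CLAHE step
cA, (cH, cV, cD) = pywt.dwt2(enhanced, "db1")     # the four subbands
print([band.shape for band in (cA, cH, cV, cD)])  # each 64x64
```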
NASA Astrophysics Data System (ADS)
Strohmeier, Dominik; Kunze, Kristina; Göbel, Klemens; Liebetrau, Judith
2013-01-01
Assessing audiovisual Quality of Experience (QoE) is a key element to ensure quality acceptance of today's multimedia products. The use of descriptive evaluation methods allows evaluating QoE preferences and the underlying QoE features jointly. From our previous evaluations on QoE for mobile 3D video we found that mainly one dimension, video quality, dominates the descriptive models. Large variations of the visual video quality in the tests may be the reason for these findings. A new study was conducted to investigate whether test sets of low QoE are described differently than those of high audiovisual QoE. Reanalysis of previous data sets seems to confirm this hypothesis. Our new study consists of a pre-test and a main test, using the Descriptive Sorted Napping method. Data sets of good-only and bad-only video quality were evaluated separately. The results show that the perception of bad QoE is mainly determined one-dimensionally by visual artifacts, whereas the perception of good quality shows multiple dimensions. Here, mainly semantic-related features of the content and affective descriptors are used by the naïve test participants. The results show that, with increasing QoE of audiovisual systems, content semantics and users' affective involvement will become important for assessing QoE differences.
Sensor-oriented feature usability evaluation in fingerprint segmentation
NASA Astrophysics Data System (ADS)
Li, Ying; Yin, Yilong; Yang, Gongping
2013-06-01
Existing fingerprint segmentation methods usually process fingerprint images captured by different sensors with the same feature or feature set. We propose to improve fingerprint segmentation results in view of the important fact that images from different sensors have different characteristics for segmentation. Feature usability evaluation means evaluating the usability of features in order to find the personalized feature or feature set for each sensor and thereby improve segmentation performance. The need for feature usability evaluation for fingerprint segmentation is raised and analyzed as a new issue. To address this issue, we present a decision-tree-based feature-usability evaluation method, which utilizes the C4.5 decision tree algorithm to evaluate and pick the most suitable feature or feature set for fingerprint segmentation from a typical candidate feature set. We apply the novel method to the FVC2002 database of fingerprint images, which were acquired with four different sensors and technologies. Experimental results show that the accuracy of segmentation is improved, and the time consumption for feature extraction is dramatically reduced with the selected feature(s).
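A minimal sketch of the decision-tree-based evaluation described above: an entropy-criterion (C4.5-style) tree is trained on candidate segmentation features, and the features the tree actually uses are kept for that sensor. Feature names and data are illustrative assumptions.

```python
# Sketch: entropy-based tree as a feature-usability filter per sensor.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(14)
feature_names = ["mean", "variance", "coherence", "gradient"]
X = rng.random((1000, 4))                         # per-block candidate features
y = (X[:, 1] + 0.3 * X[:, 2] > 0.7).astype(int)   # toy foreground/background labels

tree = DecisionTreeClassifier(criterion="entropy", max_depth=4).fit(X, y)
usable = [n for n, imp in zip(feature_names, tree.feature_importances_)
          if imp > 0.05]
print("features selected for this sensor:", usable)
```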
NASA Astrophysics Data System (ADS)
Giles, A. N.; Wilkie, K. M.
2008-12-01
Photo-projects have long been utilized as a way of getting students in introductory geology courses to apply what they have learned in lecture to the outcrop and landscape. While such projects have many benefits, we have found that with large-format classes of 200+ students, where a mandatory field trip is logistically impossible, many problems can arise. One problem has been that of consistent and timely grading, which can be addressed by a project that can be turned in throughout the course of the semester and by utilizing a grading rubric. Also, in many cases, students simply take photographs of "scenery" and then try to identify features/processes with little thought as to whether a particular feature/process can occur in that geologic setting (such as identifying features as having a glacial origin in a non-glaciated terrain). These types of problems can be attributed to the student's lack of knowledge of the geology of the area within which the photographs were taken and to having little to no field instruction. Many of these problems can be addressed by utilizing a term project that combines elements of both research and the traditional photo project. The student chooses a specific area/region (i.e. a national park) that the student will visit or has visited and is then required to do background research before attempting to identify features and processes in photographs they have taken in the area. Here we present details of such a project, which involves students performing research activities in three stages: the history/geologic setting of the area, the specific lithology of the area, and the hydrology of the area, with each stage completed at a specified time during the semester. The final stage is the photo-project component, where the student identifies and interprets the features/processes in photographs from the area. The research provides the student with a framework within which they can identify and interpret the features/processes that are likely to be seen in their area.
Einhäuser, Wolfgang; Nuthmann, Antje
2016-09-01
During natural scene viewing, humans typically attend and fixate selected locations for about 200-400 ms. Two variables characterize such "overt" attention: the probability of a location being fixated, and the fixation's duration. Both variables have been widely researched, but little is known about their relation. We use a two-step approach to investigate the relation between fixation probability and duration. In the first step, we use a large corpus of fixation data. We demonstrate that fixation probability (empirical salience) predicts fixation duration across different observers and tasks. Linear mixed-effects modeling shows that this relation is explained neither by joint dependencies on simple image features (luminance, contrast, edge density) nor by spatial biases (central bias). In the second step, we experimentally manipulate some of these features. We find that fixation probability from the corpus data still predicts fixation duration for this new set of experimental data. This holds even if stimuli are deprived of low-level image features, as long as higher level scene structure remains intact. Together, this shows a robust relation between fixation duration and probability, which does not depend on simple image features. Moreover, the study exemplifies the combination of empirical research on a large corpus of data with targeted experimental manipulations.
Samala, Ravi K; Chan, Heang-Ping; Hadjiiski, Lubomir; Helvie, Mark A; Wei, Jun; Cha, Kenny
2016-12-01
To develop a computer-aided detection (CAD) system for masses in digital breast tomosynthesis (DBT) volumes using a deep convolutional neural network (DCNN) with transfer learning from mammograms. A data set containing 2282 digitized film and digital mammograms and 324 DBT volumes was collected with IRB approval. The mass of interest on the images was marked by an experienced breast radiologist as the reference standard. The data set was partitioned into a training set (2282 mammograms with 2461 masses and 230 DBT views with 228 masses) and an independent test set (94 DBT views with 89 masses). For DCNN training, the region of interest (ROI) containing the mass (true positive) was extracted from each image. False positive (FP) ROIs were identified at prescreening by the authors' previously developed CAD systems. After data augmentation, a total of 45,072 mammographic ROIs and 37,450 DBT ROIs were obtained. Data normalization and reduction of non-uniformity in the ROIs across the heterogeneous data were achieved using a background correction method applied to each ROI. A DCNN with four convolutional layers and three fully connected (FC) layers was first trained on the mammography data. Jittering and dropout techniques were used to reduce overfitting. After training with the mammographic ROIs, all weights in the first three convolutional layers were frozen, and only the last convolutional layer and the FC layers were randomly initialized again and trained using the DBT training ROIs. The authors compared the performances of two CAD systems for mass detection in DBT: one used the DCNN-based approach and the other used their previously developed feature-based approach for FP reduction. The prescreening stage was identical in both systems, passing the same set of mass candidates to the FP reduction stage. For the feature-based CAD system, a 3D clustering and active contour method was used for segmentation; morphological, gray level, and texture features were extracted and merged with a linear discriminant classifier to score the detected masses. For the DCNN-based CAD system, ROIs from five consecutive slices centered at each candidate were passed through the trained DCNN and a mass likelihood score was generated. The performances of the CAD systems were evaluated using free-response ROC curves and the performance difference was analyzed using a non-parametric method. Before transfer learning, the DCNN trained only on mammograms (AUC of 0.99) classified DBT masses with an AUC of 0.81 in the DBT training set. After transfer learning with DBT, the AUC improved to 0.90. For breast-based CAD detection in the test set, the sensitivity of the feature-based and the DCNN-based CAD systems was 83% and 91%, respectively, at 1 FP/DBT volume. The difference between the performances of the two systems was statistically significant (p-value < 0.05). The image patterns learned from the mammograms were transferred to mass detection on DBT slices through the DCNN. This study demonstrated that large data sets collected from mammography are useful for developing new CAD systems for DBT, alleviating the effort of collecting entirely new large data sets for the new modality.
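The transfer-learning step can be illustrated with a PyTorch sketch: freeze the first three convolutional blocks trained on mammograms, then re-initialize the last convolutional layer and the FC layers for retraining on DBT ROIs. Layer sizes and the 256×256 input are assumptions, not the paper's exact architecture.

```python
# Sketch: freeze early conv layers, re-initialize and retrain the rest.
import torch.nn as nn

class MassDCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Four conv blocks; channels are placeholders.
        self.convs = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                          nn.ReLU(), nn.MaxPool2d(2))
            for cin, cout in [(1, 16), (16, 32), (32, 64), (64, 64)]
        ])
        # Three FC layers; assumes 256x256 input ROIs (-> 16x16 after pooling).
        self.fcs = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 16 * 16, 256), nn.ReLU(),
            nn.Dropout(0.5), nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, x):
        return self.fcs(self.convs(x))

model = MassDCNN()           # assume weights pre-trained on mammographic ROIs
for block in model.convs[:3]:            # freeze first three conv blocks
    for p in block.parameters():
        p.requires_grad = False
for sub in [model.convs[3], model.fcs]:  # re-initialize the rest for DBT
    for m in sub.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            m.reset_parameters()
```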
From big data to rich data: The key features of athlete wheelchair mobility performance.
van der Slikke, R M A; Berger, M A M; Bregman, D J J; Veeger, H E J
2016-10-03
Quantitative assessment of an athlete's individual wheelchair mobility performance is one prerequisite needed to evaluate game performance, improve wheelchair settings and optimize training routines. Inertial Measurement Unit (IMU) based methods can be used to perform such quantitative assessment, providing a large amount of kinematic data. The goal of this research was to reduce that large amount of data to a set of key features best describing wheelchair mobility performance in match play, and to present them in a meaningful way for both scientists and athletes. To test the discriminative power, wheelchair mobility characteristics of athletes with different performance levels were compared. The wheelchair kinematics of 29 (inter-)national level athletes were measured during a match using three inertial sensors mounted on the wheelchair. Principal component analysis was used to reduce 22 kinematic outcomes to a set of six outcomes covering linear and rotational movement; speed and acceleration; and average and best performance. In addition, it was explored whether groups of athletes with known performance differences based on their impairment classification also differed with respect to these key outcomes, using univariate general linear models. For all six key outcomes, classification was a significant factor (p<0.05). We composed a set of six key kinematic outcomes that accurately describe wheelchair mobility performance in match play. The key kinematic outcomes were displayed in an easy-to-interpret way, usable for athletes, coaches and scientists. This standardized representation enables comparison of different wheelchair sports regarding wheelchair mobility, as well as evaluation at the level of an individual athlete. By this means, the tool could enhance further development of wheelchair sports in general. Copyright © 2016 Elsevier Ltd. All rights reserved.
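A sketch of the PCA-based reduction under one plausible reading of the method: keep, for each of six principal components, the original outcome loading most strongly on it. The outcome names and toy data are invented.

```python
# Sketch: reduce many kinematic outcomes to a few key representatives via PCA.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
outcome_names = [f"kinematic_{i}" for i in range(22)]
X = rng.normal(size=(29, 22))                 # 29 athletes x 22 outcomes

Z = StandardScaler().fit_transform(X)         # PCA on standardized outcomes
pca = PCA(n_components=6).fit(Z)
key = [outcome_names[np.abs(comp).argmax()] for comp in pca.components_]
print("key outcomes:", key)
print("explained variance:", pca.explained_variance_ratio_.round(2))
```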
Quantitative Wood Anatomy-Practical Guidelines.
von Arx, Georg; Crivellaro, Alan; Prendin, Angela L; Čufar, Katarina; Carrer, Marco
2016-01-01
Quantitative wood anatomy analyzes the variability of xylem anatomical features in trees, shrubs, and herbaceous species to address research questions related to plant functioning, growth, and environment. Among the more frequently considered anatomical features are lumen dimensions and wall thickness of conducting cells, fibers, and several ray properties. The structural properties of each xylem anatomical feature are mostly fixed once they are formed, and define to a large extent its functionality, including transport and storage of water, nutrients, sugars, and hormones, and the provision of mechanical support. The anatomical features can often be localized within an annual growth ring, which makes it possible to establish intra-annual, past and present structure-function relationships and their sensitivity to environmental variability. However, there are many methodological challenges to handle when aiming to produce (large) sets of xylem anatomical data. Here we describe the different steps from wood sample collection to xylem anatomical data, provide guidance and identify pitfalls, and present different image-analysis tools for the quantification of anatomical features, in particular conducting cells. We show that each data production step, from sample collection in the field, through microslide preparation in the lab and image capture through an optical microscope, to image analysis with specific tools, can readily introduce measurement errors of 5-30% or more, and the error magnitude usually increases as the anatomical features get smaller. Such measurement errors, if not avoided or corrected, may make it impossible to extract meaningful xylem anatomical data in light of the rather small range of variability in many anatomical features as observed, for example, within time series of individual plants. Following a strict protocol and the quality control proposed in this paper is thus mandatory if quantitative data on xylem anatomical features are to serve as a powerful source for many research topics. PMID:27375641
Feature Selection and Pedestrian Detection Based on Sparse Representation.
Yao, Shihong; Wang, Tao; Shen, Weiming; Pan, Shaoming; Chong, Yanwen; Ding, Fei
2015-01-01
Research on pedestrian detection is currently devoted to the extraction of effective pedestrian features, which has become one of the obstacles to practical application given the variety of pedestrian features and their high dimensionality. Based on a theoretical analysis of six frequently used features (SIFT, SURF, Haar, HOG, LBP and LSS) and a comparison of their experimental results, this paper screens out sparse feature subsets via sparse representation to investigate whether the sparse subsets retain the same descriptive ability and which features are most stable. When any two of the six features are fused, the fused feature is sparsely represented to obtain its important components. Sparse subsets of the fused features can be generated rapidly, avoiding calculation over the full range of dimension indices of these feature descriptors; thus, feature dimension reduction is faster and pedestrian detection time is reduced. Experimental results show that sparse feature subsets keep the important components of these six feature descriptors. The sparse features of HOG and LSS possess the same descriptive ability as, and consume less time than, their full features. The ratios of the sparse feature subsets of HOG and LSS to their full sets are the highest among the six features; these two features therefore best describe pedestrian characteristics, and the sparse subsets of the combined HOG-LSS feature show better distinguishing ability and parsimony.
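One common way to realize this kind of sparse screening is an L1-penalized linear model over the fused descriptor, as in the hedged sketch below; the dimensions, toy data, and the use of Lasso rather than the paper's specific sparse-representation solver are assumptions.

```python
# Sketch: pick the important components of a fused descriptor (e.g., HOG+LSS)
# by L1-penalized regression; nonzero coefficients form the sparse subset.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 200))            # fused descriptors (toy)
w_true = np.zeros(200)
w_true[:15] = 1.0                          # only 15 dimensions matter
y = X @ w_true + rng.normal(0, 0.5, 600)   # pedestrian score (toy)

lasso = Lasso(alpha=0.1).fit(X, y)
subset = np.flatnonzero(lasso.coef_)       # indices of retained components
print(f"kept {subset.size} of 200 fused dimensions")
```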
Zhang, Junming; Wu, Yan
2018-03-28
Many systems have been developed for automatic sleep stage classification. However, nearly all models are based on handcrafted features. Because the feature space is large, feature selection must be used. Meanwhile, designing handcrafted features is a difficult and time-consuming task, because feature design requires the domain knowledge of experienced experts. Results vary when different sets of features are chosen to identify sleep stages. Additionally, many features that we may be unaware of exist, and these features may be important for sleep stage classification. Therefore, a new sleep stage classification system, based on a complex-valued convolutional neural network (CCNN), is proposed in this study. Unlike existing sleep stage methods, our method can automatically extract features from raw electroencephalography data and then classify sleep stages based on the learned features. Additionally, we prove that the decision boundaries for the real and imaginary parts of a complex-valued convolutional neuron intersect orthogonally. The classification performance of handcrafted features is compared with that of features learned via the CCNN. Experimental results show that the proposed method is comparable to existing methods, and that the CCNN obtains better classification performance and considerably faster convergence than a real-valued convolutional neural network. The results also show that the proposed method is a useful decision-support tool for automatic sleep stage classification.
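The core building block, a complex-valued convolution, can be written as two real convolutions; a minimal PyTorch sketch follows, with the EEG input shape and layer sizes as placeholders rather than the paper's CCNN configuration.

```python
# Sketch: a complex-valued convolution via the standard decomposition
# (W_r + i*W_i)(x_r + i*x_i) = (W_r x_r - W_i x_i) + i*(W_r x_i + W_i x_r).
import torch
import torch.nn as nn

class ComplexConv1d(nn.Module):
    def __init__(self, cin, cout, k):
        super().__init__()
        self.wr = nn.Conv1d(cin, cout, k)   # real part of the weights
        self.wi = nn.Conv1d(cin, cout, k)   # imaginary part of the weights

    def forward(self, xr, xi):
        return self.wr(xr) - self.wi(xi), self.wr(xi) + self.wi(xr)

eeg = torch.randn(8, 1, 3000)               # e.g., 30 s epochs at 100 Hz
conv = ComplexConv1d(1, 16, 50)
real, imag = conv(eeg, torch.zeros_like(eeg))  # raw EEG as the real part
print(real.shape, imag.shape)
```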
Amano, Ken-Ichi; Yoshidome, Takashi; Iwaki, Mitsuhiro; Suzuki, Makoto; Kinoshita, Masahiro
2010-07-28
We report new progress in elucidating the mechanism of the unidirectional movement of a linear-motor protein (e.g., myosin) along a filament (e.g., F-actin). The basic concept emphasized here is that a potential field is entropically formed for the protein on the filament immersed in solvent due to the effect of the translational displacement of solvent molecules. The entropic potential field is strongly dependent on geometric features of the protein and the filament, their overall shapes as well as details of the polyatomic structures. The features and the corresponding field are judiciously adjusted by the binding of adenosine triphosphate (ATP) to the protein, hydrolysis of ATP into adenosine diphosphate (ADP)+Pi, and release of Pi and ADP. As the first step, we propose the following physical picture: The potential field formed along the filament for the protein without the binding of ATP or ADP+Pi to it is largely different from that for the protein with the binding, and the directed movement is realized by repeated switches from one of the fields to the other. To illustrate the picture, we analyze the spatial distribution of the entropic potential between a large solute and a large body using the three-dimensional integral equation theory. The solute is modeled as a large hard sphere. Two model filaments are considered as the body: model 1 is a set of one-dimensionally connected large hard spheres and model 2 is a double helical structure formed by two sets of connected large hard spheres. The solute and the filament are immersed in small hard spheres forming the solvent. The major findings are as follows. The solute is strongly confined within a narrow space in contact with the filament. Within the space there are locations with sharply deep local potential minima along the filament, and the distance between two adjacent locations is equal to the diameter of the large spheres constituting the filament. The potential minima form a ringlike domain in model 1 while they form a pointlike one in model 2. We then examine the effects of geometric features of the solute on the amplitudes and asymmetry of the entropic potential field acting on the solute along the filament. A large aspherical solute with a cleft near the solute-filament interface, which mimics the myosin motor domain, is considered in the examination. Thus, the two fields in our physical picture described above are qualitatively reproduced. The factors to be taken into account in further studies are also discussed.
Custom Super-Resolution Microscope for the Structural Analysis of Nanostructures
2018-05-29
research community. As part of our validation of the new design approach, we performed two-color imaging of pairs of adjacent oligo probes hybridized...nanostructures and biological targets. Our microscope features a large field of view and custom optics that facilitate 3D imaging and enhanced contrast in...our imaging throughput by creating two microscopy platforms for high-throughput, super-resolution materials characterization, with the AO set-up being
Brock, John C.; Krabill, William; Sallenger, Asbury H.
2004-01-01
In order to reap the potential of airborne lidar surveys to provide geological information useful in understanding coastal sedimentary processes acting on various time scales, a new set of analysis methods is needed. This paper presents a multi-temporal lidar analysis of north Assateague Island, Maryland, and demonstrates the calculation of lidar metrics that condense barrier island morphology and morphological change into attributed linear features that may be used to analyze trends in coastal evolution. The new methods proposed in this paper are also of significant practical value, because lidar metric analysis reduces large volumes of point elevations into linear features attributed with essential morphological variables that are ideally suited for inclusion in Geographic Information Systems. A morphodynamic classification of north Assateague Island for a recent 10-month time period, based on the recognition of simple patterns described by lidar change metrics, is presented. Such morphodynamic classification reveals the relative magnitude and the fine-scale alongshore variation in the importance of coastal changes over the study area during a defined time period. More generally, through the presentation of this morphodynamic classification of north Assateague Island, the value of lidar metrics both in examining large lidar data sets for coherent trends and in building hypotheses regarding the processes driving barrier evolution is demonstrated.
NASA Astrophysics Data System (ADS)
Selsam, Peter; Schwartze, Christian
2016-10-01
Providing software solutions via the internet has been known for quite some time and is now an increasing trend marketed as "software as a service". Many business units accept the new methods and streamlined IT strategies by offering web-based infrastructures for external software usage, but geospatial applications featuring very specialized services or functionalities on demand are still rare. Originally a desktop application, the ILMSimage tool for remote sensing image analysis and classification was modified in its communication structures and enabled to run on a high-power server, benefiting from Tavema software. On top, a GIS-like, web-based user interface guides the user through the different steps in ILMSimage. ILMSimage combines object-oriented image segmentation with pattern recognition features. Basic image elements form a construction set for modeling large image objects with diverse and complex appearance. There is no need for the user to set up detailed object definitions. Training is done by delineating one or more typical examples (templates) of the desired object using a simple vector polygon. The template can be large and does not need to be homogeneous, and it is completely independent of the segmentation. The object definition is done entirely by the software.
Distinguishing features of Excited Delirium Syndrome in non-fatal use of force encounters.
Baldwin, Simon; Hall, Christine; Bennell, Craig; Blaskovits, Brittany; Lawrence, Chris
2016-07-01
The frequency with which the police encounter non-fatal cases of Excited Delirium Syndrome (ExDS) has not been well studied. To date, only a single prospective, epidemiologic study has been completed to determine the prevalence of the features of ExDS in police use of force (UoF) encounters. We examined a cluster of previously published features associated with ExDS to establish whether these features were consistently recognizable across policing populations, thus demonstrating reproducibility. We further sought to determine whether any feature or number of concomitant features was likely to have physiologic significance. These are important first steps in determining a case definition of ExDS in law enforcement and medical settings. A prospective evaluation of a consecutive cohort of subjects involved in UoF encounters with police was conducted. Data were collected through the UoF reporting database of a large Canadian law enforcement agency from January 2012 to December 2013. The ten core characteristics of ExDS observed in past research were documented by officers and, consistent with previous research, the presence of six or more features was used to identify probable cases of ExDS and a state of medical emergency. UoF occurred in 4799 of 5.4 million police-public interactions (0.09%). Of the UoF encounters, 73 (1.5%) subjects displayed six or more features of ExDS. Upwards of 9.2% of these subjects could be expected to be at risk of sudden and unexpected arrest-related death (ARD). Features with the highest odds of presenting with a large number of concomitant features included "Does not Fatigue", "Superhuman Strength" and "Tactile Hyperthermia" (287, 137 and 93 times higher, respectively). Moreover, "Tactile Hyperthermia" demonstrated the highest odds of presenting in individuals with a large number of features as opposed to those with fewer (33 times higher). We demonstrate that law enforcement officers can consistently recognize and report features of ExDS that have been associated with ARD. The varying presence of features across the examined categories indicates that some features are more distinguishing than others, which may enable narrowing the scope of features that represent ExDS and understanding its pathophysiology. The current debate surrounding whether or not ExDS exists limits first responders and emergency physicians in their ability to increase awareness, improve training and interventions, and design appropriate policy and response protocols to reduce ARDs. Crown Copyright © 2016. Published by Elsevier Ltd. All rights reserved.
Comparing Pattern Recognition Feature Sets for Sorting Triples in the FIRST Database
NASA Astrophysics Data System (ADS)
Proctor, D. D.
2006-07-01
Pattern recognition techniques have been used with increasing success to cope with the tremendous amounts of data being generated by automated surveys. Usually this process involves construction of training sets, the typical examples of data with known classifications. Given a feature set, along with the training set, statistical methods can be employed to generate a classifier, which is then applied to the remaining data. Feature set selection, however, is still an issue. This paper presents techniques developed to accommodate data for which a substantive portion of the training set cannot be classified unambiguously, a typical case for low-resolution data. Significance tests on the sort-ordered, sample-size-normalized vote distribution of an ensemble of decision trees are introduced as a method of evaluating the relative quality of feature sets. The technique is applied to comparing feature sets for sorting a particular radio galaxy morphology, bent-doubles, from the Faint Images of the Radio Sky at Twenty Centimeters (FIRST) database. Alternative functional forms for feature sets are also examined. Associated standard deviations provide the means to evaluate the effect of the number of folds, the number of classifiers per fold, and the sample size on the resulting classifications. The technique may also be applied to situations in which accurate classifications are available but the feature set is clearly inadequate, and one nonetheless wishes to make the best of the available information.
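A generic sketch of comparing two feature sets via the vote distributions of a tree ensemble, using a two-sample Kolmogorov-Smirnov test; the paper's exact statistic on sort-ordered, sample-size-normalized votes and its fold structure may differ.

```python
# Sketch: evaluate feature-set quality by comparing ensemble vote distributions.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestClassifier

def sorted_votes(X, y):
    """Sort-ordered fraction of trees voting 'bent-double' for each source."""
    forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    votes = np.mean([tree.predict(X) for tree in forest.estimators_], axis=0)
    return np.sort(votes)

rng = np.random.default_rng(4)
y = rng.integers(0, 2, 300)                       # toy morphology labels
X_a = rng.normal(size=(300, 10)) + y[:, None]     # informative feature set A
X_b = rng.normal(size=(300, 10))                  # uninformative feature set B
stat, p = ks_2samp(sorted_votes(X_a, y), sorted_votes(X_b, y))
print(f"KS distance between vote distributions: {stat:.3f} (p={p:.2g})")
```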
Feature Selection for Ridge Regression with Provable Guarantees.
Paul, Saurabh; Drineas, Petros
2016-04-01
We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees on the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets, including a subset of the TechTC-300 data sets, to support our theory. Experimental results indicate that the proposed methods perform better than existing feature selection methods.
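A sketch of leverage-score sampling in its generic form, assuming scores computed from the top right singular vectors of the data matrix; the rank choice, sample size, and rescaling constant are illustrative.

```python
# Sketch: unsupervised leverage-score sampling of feature columns.
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 500))            # n samples x d features
k, r = 80, 20                              # features to keep, working rank

_, _, Vt = np.linalg.svd(X, full_matrices=False)
lev = np.sum(Vt[:r] ** 2, axis=0)          # leverage of each feature column
p = lev / lev.sum()
keep = rng.choice(X.shape[1], size=k, replace=False, p=p)
X_sampled = X[:, keep] / np.sqrt(k * p[keep])   # rescale sampled columns
print(X_sampled.shape)
```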
Liu, Wei; Li, Dong; Zhang, Jiyang; Zhu, Yunping; He, Fuchu
2006-11-27
Measuring each protein's importance in signaling networks helps to identify the crucial proteins in a cellular process, find the fragile portions of the biological system, and further assist in disease therapy. However, there are relatively few methods to evaluate the importance of proteins in signaling networks. We developed a novel network feature, which we call SigFlux, to evaluate the importance of proteins in signal transduction networks, based on the concept of minimal path sets (MPSs). An MPS is a minimal set of nodes that can perform signal propagation from ligands to target genes or feedback loops. We define SigFlux as the number of MPSs in which each protein is involved. We applied this network feature to the large signal transduction network in the hippocampal CA1 neuron of the mouse. Significant correlations were simultaneously observed between SigFlux and both the essentiality and the evolutionary rate of genes. Compared with connectivity, another commonly used network feature, SigFlux has a similar or better ability to reflect a protein's essentiality. Further classification according to protein function demonstrates that high-SigFlux, low-connectivity proteins are abundant among receptors and transcription factors, indicating that SigFlux can describe the importance of proteins within the context of the entire network. SigFlux is a useful network feature in signal transduction networks that allows the prediction of the essentiality and conservation of proteins. With this novel network feature, proteins that participate in more pathways or feedback loops within a signaling network are shown to be far more likely to be essential and conserved during evolution than their counterparts.
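A SigFlux-like score can be approximated by counting the ligand-to-target paths each node participates in; the sketch below uses simple paths on a toy directed network, whereas the paper's minimal path sets also cover feedback loops and impose a precise minimality condition.

```python
# Sketch: count, for each protein, the ligand-to-target paths it lies on.
import networkx as nx
from collections import Counter

G = nx.DiGraph([("ligand", "R1"), ("ligand", "R2"), ("R1", "K1"),
                ("R2", "K1"), ("K1", "TF"), ("R2", "TF"), ("TF", "gene")])

sigflux = Counter()
for path in nx.all_simple_paths(G, "ligand", "gene"):
    for node in path:
        sigflux[node] += 1          # node participates in one more path
print(sigflux.most_common())
```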
The Distribution and Behaviour of Photospheric Magnetic Features
NASA Astrophysics Data System (ADS)
Parnell, C. E.; Lamb, D. A.; DeForest, C. E.
2014-12-01
Over the past two decades, enormous amounts of data on the magnetic fields of the solar photosphere have been produced by both ground-based (Kitt Peak & SOLIS) and space-based instruments (MDI, Hinode & HMI). In order to study the behaviour and distribution of photospheric magnetic features, efficient automated detection routines need to be utilised to identify and track magnetic features. In this talk, I will discuss the pros and cons of different automated magnetic feature identification and tracking routines, with a special focus on the requirements these codes must meet to deal with the large data sets produced by HMI. By patching together results from Hinode and MDI (high-res & full-disk), the fluxes of magnetic features were found to follow a power law over 5 orders of magnitude. At the strong-flux tail of this distribution, the power law was found to fall off at solar minimum, but was maintained over all fluxes during solar maximum. However, the point of deflection in the power-law distribution occurs at a patching point between instruments, and so questions remain over the reasons for the deflection. The feature fluxes determined from the superb high-resolution HMI data cover almost all of the 5 orders of magnitude. Considering both solar minimum and solar maximum HMI data sets, we investigate whether the power law over 5 orders of magnitude in flux still holds. Furthermore, we investigate the behaviour of magnetic features in order to probe the nature of their origin. In particular, we analyse small-scale flux emergence events using HMI data to investigate the existence of a small-scale dynamo just below the solar photosphere.
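As an illustration of the distribution analysis, the sketch below fits a power-law slope to logarithmically binned synthetic fluxes; a maximum-likelihood fit (e.g., with the powerlaw package) would be the more rigorous choice, and the exponent and flux range are invented.

```python
# Sketch: check a power-law flux distribution with a log-log binned fit.
import numpy as np

rng = np.random.default_rng(6)
alpha = 1.85                                           # assumed exponent
fluxes = (rng.pareto(alpha - 1, 100000) + 1) * 1e17    # toy fluxes in Mx

bins = np.logspace(17, 22, 40)
hist, edges = np.histogram(fluxes, bins=bins, density=True)
centers = np.sqrt(edges[:-1] * edges[1:])              # geometric bin centers
mask = hist > 0
slope, _ = np.polyfit(np.log10(centers[mask]), np.log10(hist[mask]), 1)
print("fitted power-law exponent:", round(slope, 2))   # expect about -alpha
```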
Uppal, Karan; Soltow, Quinlyn A; Strobel, Frederick H; Pittard, W Stephen; Gernert, Kim M; Yu, Tianwei; Jones, Dean P
2013-01-16
Detection of low-abundance metabolites is important for de novo mapping of metabolic pathways related to diet, the microbiome or environmental exposures. Multiple algorithms are available to extract m/z features from liquid chromatography-mass spectral data in a conservative manner, which tends to preclude detection of low-abundance chemicals and chemicals found in small subsets of samples. The present study provides software to enhance such algorithms for feature detection, quality assessment, and annotation. xMSanalyzer is a set of utilities for automated processing of metabolomics data. The utilities can be classified into four main modules to: 1) improve feature detection for replicate analyses by systematic re-extraction with multiple parameter settings and data merger to optimize the balance between sensitivity and reliability, 2) evaluate sample quality and feature consistency, 3) detect feature overlap between datasets, and 4) characterize high-resolution m/z matches to small molecule metabolites and biological pathways using multiple chemical databases. The package was tested with plasma samples and shown to more than double the number of features extracted while improving the quantitative reliability of detection. MS/MS analysis of a random subset of peaks that were exclusively detected using xMSanalyzer confirmed that the optimization scheme improves detection of real metabolites. xMSanalyzer is a package of utilities for data extraction, quality control assessment, detection of overlapping and unique metabolites in multiple datasets, and batch annotation of metabolites. The program was designed to integrate with existing packages such as apLCMS and XCMS, but the framework can also be used to enhance data extraction for other LC/MS data software.
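Module 1's merging idea can be sketched as follows: features found under different parameter settings are merged when their m/z and retention times agree within tolerances. The tolerances and greedy matching are illustrative and do not reproduce the package's actual R implementation.

```python
# Sketch: merge feature tables from multiple extraction parameter settings.
import numpy as np

def merge_feature_tables(tables, mz_ppm=10.0, rt_tol=30.0):
    """Each table is an (n, 2) array of [m/z, retention time in s]."""
    merged = [tuple(row) for row in tables[0]]
    for table in tables[1:]:
        for mz, rt in table:
            dup = any(abs(mz - m) / m * 1e6 <= mz_ppm and abs(rt - r) <= rt_tol
                      for m, r in merged)
            if not dup:                       # keep only genuinely new features
                merged.append((mz, rt))
    return np.array(merged)

run1 = np.array([[180.0634, 120.0], [255.2330, 310.5]])
run2 = np.array([[180.0637, 122.0], [301.1410, 88.0]])   # first is a duplicate
print(merge_feature_tables([run1, run2]))
```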
Investigating a link between large and small-scale chaos features on Europa
NASA Astrophysics Data System (ADS)
Tognetti, L.; Rhoden, A.; Nelson, D. M.
2017-12-01
Chaos is one of the most recognizable, and most studied, features on Europa's surface. Most models of chaos formation invoke liquid water at shallow depths within the ice shell; the liquid destabilizes the overlying ice layer, breaking it into mobile rafts and destroying pre-existing terrain. This class of model has been applied to both large-scale chaos like Conamara and small-scale features (i.e., microchaos), which are typically <10 km in diameter. Currently unknown, however, is whether large-scale and small-scale features are produced together, e.g., through a network of smaller sills linked to a larger liquid water pocket. If microchaos features do form as satellites of large-scale chaos features, we would expect a drop-off in the number density of microchaos with increasing distance from the large chaos feature; this trend should not be observed in regions without large-scale chaos features. Here, we test the hypothesis that large chaos features create "satellite" systems of smaller chaos features. Either outcome will help us better understand the relationship between large-scale chaos and microchaos. We focus first on the regions surrounding the large chaos features Conamara and Murias (e.g., the Mitten). We map all chaos features within 90,000 sq km of the main chaos feature and assign each one a ranking (High Confidence, Probable, or Low Confidence) based on the observed characteristics of each feature. In particular, we look for a distinct boundary, loss of pre-existing terrain, the existence of rafts or blocks, and the overall smoothness of the feature. We also note features that are chaos-like but lack sufficient characteristics to be classified as chaos. We then apply the same criteria to map microchaos features in regions of similar area (~90,000 sq km) that lack large chaos features. By plotting the distribution of microchaos with distance from the center point of the large chaos feature or of the mapping region (for the cases without a large feature), we determine whether there is a distinct signature linking large-scale chaos features with nearby microchaos. We discuss the implications of these results for the process of chaos formation and the extent of liquid water within Europa's ice shell.
Davie, Stuart J; Di Pasquale, Nicodemo; Popelier, Paul L A
2016-10-15
Machine learning algorithms have been demonstrated to predict atomistic properties approaching the accuracy of quantum chemical calculations at significantly less computational cost. Difficulties arise, however, when attempting to apply these techniques to large systems, or systems possessing excessive conformational freedom. In this article, the machine learning method kriging is applied to predict both the intra-atomic and interatomic energies, as well as the electrostatic multipole moments, of the atoms of a water molecule at the center of a 10 water molecule (decamer) cluster. Unlike previous work, where the properties of small water clusters were predicted using a molecular local frame, and where training set inputs (features) were based on atomic index, a variety of feature definitions and coordinate frames are considered here to increase prediction accuracy. It is shown that, for a water molecule at the center of a decamer, no single method of defining features or coordinate schemes is optimal for every property. However, explicitly accounting for the structure of the first solvation shell in the definition of the features of the kriging training set, and centring the coordinate frame on the atom-of-interest will, in general, return better predictions than models that apply the standard methods of feature definition, or a molecular coordinate frame. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
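Kriging is essentially Gaussian-process regression, so a minimal stand-in looks like the sketch below; the feature vectors, target property, and kernel are toy assumptions, and the authors' machinery (multipole-moment targets, bespoke coordinate frames) is far richer.

```python
# Sketch: kriging as Gaussian-process regression from solvation-environment
# features to an atomic property (here a made-up energy).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 12))             # features of surrounding waters
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] + rng.normal(0, 0.01, 200)

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X[:150], y[:150])
pred, std = gpr.predict(X[150:], return_std=True)   # mean and uncertainty
print("mean abs error:", np.abs(pred - y[150:]).mean())
```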
NASA Astrophysics Data System (ADS)
Arav, Reuma; Filin, Sagi
2016-06-01
Airborne laser scans provide an optimal tool for describing geomorphological features in natural environments. However, a challenge arises in the detection of such phenomena, as they are embedded in the topography, tend to blend into their surroundings, and leave only a subtle signature within the data. Most object-recognition studies address mainly urban environments and follow a general pipeline in which the data are partitioned into segments with uniform properties. These approaches are restricted to the man-made domain and can handle only a limited set of features that conform to well-defined geometric forms. As natural environments present a more complex set of features, interpretation of such data is still largely manual. In this paper, we propose a data-aware detection scheme, unbound to specific domains or shapes. We pose the recognition question as an energy optimization problem, solved by variational means. Our approach, based on the level-set method, geometrically characterizes local surfaces within the data and uses these characteristics as a potential field for the minimization. The main advantage is that it allows topological changes of the evolving curves, such as merging and breaking. We demonstrate the proposed methodology on the detection of collapse sinkholes.
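A generic level-set style segmentation on a rasterized elevation patch can be sketched with scikit-image's morphological Chan-Vese solver; note that the paper builds its own potential field from local surface characteristics, which this intensity-based stand-in does not reproduce.

```python
# Sketch: topology-free front evolution over a toy DEM with a sinkhole.
import numpy as np
from skimage.segmentation import morphological_chan_vese

rng = np.random.default_rng(8)
dem = rng.normal(0, 0.05, (128, 128))          # toy ground surface (noise)
yy, xx = np.mgrid[:128, :128]
dem -= 2.0 * np.exp(-((yy - 64) ** 2 + (xx - 64) ** 2) / 200.0)  # sinkhole

mask = morphological_chan_vese(dem, 100)       # 100 evolution iterations
print("detected pixels:", int(mask.sum()))
```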
When will Low-Contrast Features be Visible in a STEM X-Ray Spectrum Image?
Parish, Chad M
2015-06-01
When will a small or low-contrast feature, such as an embedded second-phase particle, be visible in a scanning transmission electron microscopy (STEM) X-ray map? This work illustrates a computationally inexpensive method to simulate X-ray maps and spectrum images (SIs), based upon the equations of X-ray generation and detection. To particularize the general procedure, the example of a nanostructured ferritic alloy (NFA) containing nm-sized Y2Ti2O7 precipitates embedded in a ferritic stainless steel matrix is chosen. The proposed model produces physically realistic simulated SI data sets, which can either be reduced to X-ray dot maps or analyzed via multivariate statistical analysis. NFA X-ray maps acquired using three different STEM instruments match the simulations quite well, despite the large number of simplifying assumptions used. A figure of merit, electron dose multiplied by X-ray collection solid angle, is proposed to compare feature detectability from one data set (simulated or experimental) to another. The proposed method can be used to scope which experiments are feasible under specific analysis conditions on a given microscope. Future applications, such as spallation proton-neutron irradiations, core-shell nanoparticles, or dopants in polycrystalline photovoltaic solar cells, are proposed.
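A toy version of the detectability question, with expected counts proportional to dose times collection solid angle and all generation/detection physics collapsed into one arbitrary constant:

```python
# Sketch: Poisson-noisy X-ray map of a low-contrast embedded particle.
import numpy as np

rng = np.random.default_rng(12)
dose, solid_angle, k = 1e4, 0.7, 1e-4        # e-/pixel, sr, arbitrary yield

comp = np.full((64, 64), 0.02)               # matrix Ti fraction (toy)
comp[28:36, 28:36] = 0.10                    # embedded Y2Ti2O7-like particle
expected = dose * solid_angle * k * comp     # expected Ti counts per pixel
x_ray_map = rng.poisson(expected)            # one noisy simulated map

snr = (expected[32, 32] - expected[0, 0]) / np.sqrt(expected[0, 0])
print("single-pixel SNR:", round(float(snr), 2))
print("observed mean counts, particle vs matrix:",
      x_ray_map[28:36, 28:36].mean(), x_ray_map[:16, :16].mean())
```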
Proposed patient motion monitoring system using feature point tracking with a web camera.
Miura, Hideharu; Ozawa, Shuichi; Matsuura, Takaaki; Yamada, Kiyoshi; Nagata, Yasushi
2017-12-01
Patient motion monitoring systems play an important role in providing accurate treatment dose delivery. We propose a system that utilizes a web camera (frame rate up to 30 fps, maximum resolution of 640 × 480 pixels) and in-house image processing software (developed using Microsoft Visual C++ and OpenCV). This system is simple to use and convenient to set up. The pyramidal Lucas-Kanade method was applied to calculate the motion of each feature point by analysing two consecutive frames. The image processing software employs a color scheme in which the defined feature points are blue under stable (no movement) conditions and turn red, along with a warning message and an audio signal (beeping alarm), for large patient movements. The initial position of the marker was used by the program to determine the marker positions in all subsequent frames. The software generates a text file containing the calculated motion for each frame and saves the captured video as a compressed audio video interleave (AVI) file. We propose a patient motion monitoring system using a web camera, which is simple and convenient to set up, to increase the safety of treatment delivery.
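The tracking core maps directly onto OpenCV, as in this sketch of pyramidal Lucas-Kanade between two consecutive frames; the synthetic frames and the alarm threshold are placeholders for the in-house software's settings.

```python
# Sketch: pyramidal Lucas-Kanade tracking with a large-motion alarm.
import numpy as np
import cv2

prev = np.zeros((480, 640), np.uint8)
cv2.rectangle(prev, (300, 220), (340, 260), 255, -1)   # marker in frame k
curr = np.zeros((480, 640), np.uint8)
cv2.rectangle(curr, (306, 220), (346, 260), 255, -1)   # marker shifted 6 px

p0 = cv2.goodFeaturesToTrack(prev, maxCorners=50, qualityLevel=0.1,
                             minDistance=5)
p1, status, _err = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None)
motion = np.linalg.norm((p1 - p0).reshape(-1, 2), axis=1)[status.ravel() == 1]
if motion.max() > 3.0:                                  # alarm threshold in px
    print("WARNING: large patient movement:", round(float(motion.max()), 1))
```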
Long term orbital storage of cryogenic propellants for advanced space transportation missions
NASA Technical Reports Server (NTRS)
Schuster, John R.; Brown, Norman S.
1987-01-01
A comprehensive study has developed the major features of a large capacity orbital propellant depot for the space-based, cryogenic OTV. The study has treated both the Dual-Keel Space Station and co-orbiting platforms as the accommodations base for the propellant storage facilities, and trades have examined both tethered and hard-docked options. Five tank set concepts were developed for storing the propellants, and along with layout options for the station and platform, were evaluated from the standpoints of servicing, propellant delivery, boiloff, micrometeoroid/debris shielding, development requirements, and cost. These trades led to the recommendation that an all-passive storage concept be considered for the platform and an actively refrigerated concept providing for reliquefaction of all boiloff be considered for the Space Station. The tank sets are modular, each storing up to 45,400 kg of LO2/LH2, and employ many advanced features to provide for microgravity fluid management and to limit boiloff. The features include such technologies as zero-gravity mass gauging, total communication capillary liquid acquisition devices, autogenous pressurization, thermodynamic vent systems, thick multilayer insulation, vapor-cooled shields, solar-selective coatings, advanced micrometeoroid/debris protection systems, and long-lived cryogenic refrigeration systems.
ViA: a perceptual visualization assistant
NASA Astrophysics Data System (ADS)
Healey, Chris G.; St. Amant, Robert; Elhaddad, Mahmoud S.
2000-05-01
This paper describes an automated visualization assistant called ViA. ViA is designed to help users construct perceptually optimal visualizations to represent, explore, and analyze large, complex, multidimensional datasets. We have approached this problem by studying what is known about the control of human visual attention. By harnessing the low-level human visual system, we can support our dual goals of rapid and accurate visualization. Perceptual guidelines that we have built using psychophysical experiments form the basis for ViA. ViA uses modified mixed-initiative planning algorithms from artificial intelligence to search for perceptually optimal mappings from data attributes to visual features. Our perceptual guidelines are integrated into evaluation engines that provide evaluation weights for a given data-feature mapping, and hints on how that mapping might be improved. ViA begins by asking users a set of simple questions about their dataset and the analysis tasks they want to perform. Answers to these questions are used in combination with the evaluation engines to identify and intelligently pursue promising data-feature mappings. The result is an automatically generated set of mappings that are perceptually salient, but that also respect the context of the dataset and users' preferences about how they want to visualize their data.
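A toy stand-in for the search over data-attribute-to-visual-feature mappings: exhaustively score candidate assignments with an invented evaluation weight table, where the real system uses psychophysically derived evaluation engines and mixed-initiative planning.

```python
# Sketch: pick the data-attribute-to-visual-feature mapping with best score.
from itertools import permutations

attributes = ["temperature", "pressure", "salinity"]
features = ["hue", "size", "orientation"]

def evaluate(mapping):
    # Toy evaluation engine: fixed weights for preferred pairings.
    weights = {("temperature", "hue"): 1.0, ("pressure", "size"): 0.8,
               ("salinity", "orientation"): 0.6}
    return sum(weights.get(pair, 0.3) for pair in mapping)

best = max((tuple(zip(attributes, perm)) for perm in permutations(features)),
           key=evaluate)
print(best, evaluate(best))
```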
Hot spine loops and the nature of a late-phase solar flare
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sun, Xudong; Todd Hoeksema, J.; Liu, Yang
2013-12-01
The fan-spine magnetic topology is believed to be responsible for many curious features in solar explosive events. A spine field line links distinct flux domains, but direct observation of such a feature has been rare. Here we report a unique event observed by the Solar Dynamics Observatory where a set of hot coronal loops (over 10 MK) connected to a quasi-circular chromospheric ribbon at one end and a remote brightening at the other. Magnetic field extrapolation suggests that these loops are partly tracers of the evolving spine field line. Continuous slipping- and null-point-type reconnections were likely at work, energizing the loop plasma and transferring magnetic flux within and across the fan quasi-separatrix layer. We argue that the initial reconnection is of the 'breakout' type, which then transitioned to a more violent flare reconnection with an eruption from the fan dome. Significant magnetic field changes are expected and indeed ensued. This event also features an extreme-ultraviolet (EUV) late phase, i.e., a delayed secondary emission peak in warm EUV lines (about 2-7 MK). We show that this peak comes from the cooling of large post-reconnection loops beside and above the compact fan, a direct product of eruption in such topological settings. The long cooling time of the large arcades contributes to the long delay; additional heating may also be required. Our result demonstrates the critical nature of cross-scale magnetic coupling: topological change in a sub-system may lead to explosions on a much larger scale.
Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE.
Chen, Qi; Meng, Zhaopeng; Liu, Xinyi; Jin, Qianguo; Su, Ran
2018-06-15
Feature selection, which identifies a set of the most informative features from the original feature space, has been widely used to simplify predictors. Recursive feature elimination (RFE), one of the most popular feature selection approaches, is effective in data dimension reduction and efficiency increase. A ranking of features, as well as candidate subsets with the corresponding accuracy, is produced through RFE. The subset with the highest accuracy (HA) or a preset number of features (PreNum) is often used as the final subset. However, this may lead to a large number of features being selected, and if there is no prior knowledge about the preset number, the final subset selection is often ambiguous and subjective. A proper decision variant is in high demand to automatically determine the optimal subset. In this study, we conduct pioneering work to explore the decision variant after obtaining a list of candidate subsets from RFE. We provide a detailed analysis and comparison of several decision variants for automatically selecting the optimal feature subset. A random forest recursive feature elimination (RF-RFE) algorithm and a voting strategy are introduced. We validated the variants on two completely different molecular biology datasets, one from a toxicogenomic study and the other from protein sequence analysis. The study provides an automated way to determine the optimal feature subset when using RF-RFE.
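For the baseline "highest accuracy" variant, scikit-learn's RFECV already automates the subset-size choice, as sketched below; the paper's decision variants and voting strategy go beyond this plain rule.

```python
# Sketch: RF-RFE with cross-validated automatic subset-size selection.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

X, y = make_classification(n_samples=300, n_features=60, n_informative=8,
                           random_state=0)
selector = RFECV(RandomForestClassifier(n_estimators=100, random_state=0),
                 step=1, cv=5).fit(X, y)
print("optimal subset size:", selector.n_features_)
```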
Automatic morphological classification of galaxy images
Shamir, Lior
2009-01-01
We describe an image analysis supervised learning algorithm that can automatically classify galaxy images. The algorithm is first trained using a set of manually classified images of elliptical, spiral, and edge-on galaxies. A large set of image features is extracted from each image, and the most informative features are selected using Fisher scores. Test images can then be classified using a simple Weighted Nearest Neighbor rule such that the Fisher scores are used as the feature weights. Experimental results show that galaxy images from Galaxy Zoo can be classified automatically into spiral, elliptical and edge-on galaxies with an accuracy of ~90% compared to classifications carried out by the author. Full compilable source code of the algorithm is available for free download, and its general-purpose nature makes it suitable for other uses that involve automatic image analysis of celestial objects. PMID:20161594
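A compact sketch of the described pipeline (Fisher-score weights inside a nearest-neighbor metric) on toy data; the real system first extracts a large set of image features from the galaxy images.

```python
# Sketch: Fisher scores as feature weights in a nearest-neighbor rule.
import numpy as np

def fisher_scores(X, y):
    classes = np.unique(y)
    mu = X.mean(axis=0)
    num = sum((y == c).sum() * (X[y == c].mean(axis=0) - mu) ** 2
              for c in classes)
    den = sum((y == c).sum() * X[y == c].var(axis=0) for c in classes) + 1e-12
    return num / den

def classify(x, X_train, y_train, w):
    d = np.sqrt(((X_train - x) ** 2 * w).sum(axis=1))  # Fisher-weighted metric
    return y_train[d.argmin()]

rng = np.random.default_rng(9)
y_train = rng.integers(0, 3, 200)               # spiral/elliptical/edge-on
X_train = rng.normal(size=(200, 40)) + y_train[:, None] * 0.5
w = fisher_scores(X_train, y_train)
# Sanity check: a training sample recovers its own label.
print(classify(X_train[0], X_train, y_train, w), y_train[0])
```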
NASA Technical Reports Server (NTRS)
Connerney, John E.; Acuna, Mario H.; Ness, Norman F.; Wasilewski, Peter J.
1999-01-01
The Mars Global Surveyor spacecraft, in a highly elliptical polar orbit about Mars, obtained vector magnetic field measurements just above the surface of Mars (altitudes > 100 kilometers). Crustal magnetization, largely confined to the most ancient, heavily cratered Mars highlands, is frequently organized in east-west trending linear features, the largest of which extends over 2000 km. A representative set of survey passes is modeled using uniformly magnetized thin plates and a generalized inverse methodology. Crustal remanent magnetization exceeds that deduced for the largest terrestrial magnetic anomalies by more than an order of magnitude. Groups of quasi-parallel linear features of alternating magnetic polarity are found. They are reminiscent of similar magnetic features associated with sea floor spreading and crustal genesis on Earth, but with a much larger spatial scale.
A novel feature extraction approach for microarray data based on multi-algorithm fusion
Jiang, Zhu; Xu, Rong
2015-01-01
Feature extraction is one of the most important and effective methods for reducing dimensionality in data mining, given the emergence of high-dimensional data such as microarray gene expression data. Feature extraction for gene selection mainly serves two purposes. One is to identify certain disease-related genes. The other is to find a compact set of discriminative genes to build a pattern classifier with reduced complexity and improved generalization capabilities. Depending on the purpose of gene selection, two types of feature extraction algorithms, ranking-based feature extraction and set-based feature extraction, are employed in microarray gene expression data analysis. In ranking-based feature extraction, features are evaluated on an individual basis, without considering inter-relationships between features in general, while set-based feature extraction evaluates features based on their role in a feature set, taking into account dependencies between features. Just as with learning methods, feature extraction has a problem in its generalization ability, namely robustness. However, the issue of robustness is often overlooked in feature extraction. In order to improve the accuracy and robustness of feature extraction for microarray data, a novel approach based on multi-algorithm fusion is proposed. By fusing different types of feature extraction algorithms to select features from the sample set, the proposed approach is able to improve feature extraction performance. The new approach is tested on gene expression datasets including the Colon cancer, CNS, DLBCL, and Leukemia data sets. The testing results show that the performance of this algorithm is better than that of existing solutions. PMID:25780277
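One simple instance of fusion is averaging the feature ranks produced by two selectors, as sketched below with two ranking-based criteria; the paper additionally fuses set-based algorithms, which this toy does not capture.

```python
# Sketch: fuse two feature-ranking criteria by averaging their ranks.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif

X, y = make_classification(n_samples=100, n_features=500, n_informative=10,
                           random_state=0)

rank_f = np.argsort(np.argsort(-f_classif(X, y)[0]))          # ANOVA F ranks
rank_mi = np.argsort(np.argsort(-mutual_info_classif(X, y)))  # MI ranks
fused = (rank_f + rank_mi) / 2.0                              # average rank
selected = np.argsort(fused)[:20]                             # top fused genes
print(selected)
```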
Facial recognition using multisensor images based on localized kernel eigen spaces.
Gundimada, Satyanadh; Asari, Vijayan K
2009-06-01
A feature selection technique, along with an information fusion procedure, for improving the recognition accuracy of a visual and thermal image-based facial recognition system is presented in this paper. A novel modular kernel eigenspaces approach is developed and implemented on the phase congruency feature maps extracted from the visual and thermal images individually. Smaller sub-regions from a predefined neighborhood within the phase congruency images of the training samples are merged to obtain a large set of features. These features are then projected into higher-dimensional spaces using kernel methods. The proposed localized nonlinear feature selection procedure helps to overcome the bottlenecks of illumination variations, partial occlusions, expression variations and variations due to temperature changes that affect visual and thermal face recognition techniques. The AR and Equinox databases are used for experimentation and evaluation of the proposed technique. The proposed feature selection procedure has greatly improved the recognition accuracy for both the visual and thermal images when compared to conventional techniques. Also, a decision-level fusion methodology is presented which, along with the feature selection procedure, has outperformed various other face recognition techniques in terms of recognition accuracy.
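The generic operation behind the modular kernel eigenspaces idea is projecting a merged local sub-region into a kernel eigenspace, for example with KernelPCA; the region size, kernel, and toy phase-congruency maps below are assumptions.

```python
# Sketch: kernel eigenspace projection of one local face-image module.
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(14)
faces = rng.random((100, 32, 32))                 # toy phase-congruency maps
block = faces[:, 8:24, 8:24].reshape(100, -1)     # one merged local region

kpca = KernelPCA(n_components=20, kernel="rbf").fit(block)
projected = kpca.transform(block)                 # features for this module
print(projected.shape)
```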
NASA Astrophysics Data System (ADS)
Stelten, S. A.; Gallus, W. A., Jr.
2015-12-01
A large portion of the precipitation seen in the Great Plains region of the United States falls from nocturnal convection. Quite often, nocturnally initiated convection may grow upscale into a Mesoscale Convective System (MCS) that in turn may cause high-impact weather events such as severe wind, flooding, and even tornadoes. Thus, correctly predicting nocturnal convective initiation is an integral part of forecasting for the Great Plains. Unfortunately, it is also one of the most challenging aspects of forecasting for this region. Many forecasters familiar with the Great Plains region have noted that elevated nocturnal convective initiation seems to favor a few distinct and rather diverse modes, which pose varying degrees of forecasting difficulty. This study investigates four of these modes: initiation caused by the interaction of the low-level jet and a frontal feature; initiation at the nose of the low-level jet without the presence of a frontal feature; linear features ahead of and perpendicular to a forward-propagating MCS; and initiation occurring with no discernible large-scale forcing mechanism. Improving elevated nocturnal convective initiation forecasts was one of the primary goals of the Plains Elevated Convection At Night (PECAN) field campaign that took place from June 1 to July 15, 2015, and that collected a wealth of convective initiation data. To complement these data sets, nocturnal convective initiation episodes from the 2015 summer season were classified into each of the aforementioned groups. This allowed a thorough investigation of the frequency of each type of initiation event, as well as identification of typical characteristics of the atmosphere (forcing mechanisms present, available instability, strength/location of the low-level jet, etc.) during each event type. Then, using archived model data and the vast data sets collected during the PECAN field campaign, model performance for each convective initiation mode was compared against the high-quality observations in order to flesh out why certain convective initiation modes may be more difficult to forecast than others.
Diagnostic and prognostic histopathology system using morphometric indices
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parvin, Bahram; Chang, Hang; Han, Ju
Determining at least one of a prognosis or a therapy for a patient based on a stained tissue section of the patient. An image of a stained tissue section of a patient is processed by a processing device. A set of feature values for a set of cell-based features is extracted from the processed image, and the processed image is associated with a particular cluster of a plurality of clusters based on the set of feature values, where the plurality of clusters is defined with respect to a feature space corresponding to the set of features.
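A minimal sketch of the final association step, assuming a pre-fit k-means model over cell-based feature vectors; the feature dimensionality and cluster count are invented.

```python
# Sketch: associate a processed image with a cluster in cell-feature space.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(13)
cohort = rng.normal(size=(200, 6))        # cell-based feature vectors (toy)
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit(cohort)

new_image_features = rng.normal(size=(1, 6))
cluster_id = clusters.predict(new_image_features)[0]
print("look up prognosis/therapy for cluster", cluster_id)
```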
NASA Astrophysics Data System (ADS)
Schumacher, R.; Schimpf, H.; Schiller, J.
2011-06-01
The most challenging problem of Automatic Target Recognition (ATR) is the extraction of robust and independent target features which describe the target unambiguously. These features have to be robust and invariant in different senses: in time, between aspect views (azimuth and elevation angle), between target motions (translation and rotation) and between different target variants. Especially for ground moving targets in military applications, an irregular target motion is typical, so that a strong variation of the backscattered radar signal with azimuth and elevation angle makes the extraction of stable and robust features most difficult. For ATR based on High Range Resolution (HRR) profiles and/or Inverse Synthetic Aperture Radar (ISAR) images it is crucial that the reference dataset consists of stable and robust features, which will depend on the target aspect and depression angle, among other factors. Here it is important to find an adequate data grid for efficient data coverage in the reference dataset for ATR. In this paper, the variability of the backscattered radar signals of target scattering centers is analyzed for different HRR profiles and ISAR images from measured turntable datasets of ground targets under controlled conditions. In particular, the dependency of the features on the elevation angle is analyzed with regard to the ATR of large strip SAR data with a large range of depression angles, using available (I)SAR datasets as reference. In this work, the robustness of these scattering centers is analyzed by extracting their amplitude, phase and position. For this purpose, turntable measurements under controlled conditions were performed on an artificial military reference object called STANDCAM. Measures referring to the variability, similarity, robustness and separability of the scattering centers are defined. The dependency of the scattering behaviour on azimuth and elevation variations is analyzed. Additionally, generic types of features (geometrical, statistical), which can be derived especially from (I)SAR images, are applied to the ATR task. Subsequently, the dependence of individual feature values as well as the feature statistics on aspect (i.e., azimuth and elevation) is presented. The Kolmogorov-Smirnov distance is used to show how the feature statistics are influenced by varying elevation angles. Finally, confusion matrices are computed for the STANDCAM target at all eleven elevation angles. This helps to assess the robustness of ATR performance under aspect angle deviations between training set and test set.
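The Kolmogorov-Smirnov comparison of feature statistics across elevation angles reduces to a two-sample test, as in this sketch with synthetic feature samples:

```python
# Sketch: KS distance between a feature's distributions at two elevations.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(10)
feat_elev10 = rng.normal(1.00, 0.2, 500)   # feature values at 10 deg elevation
feat_elev30 = rng.normal(1.15, 0.2, 500)   # same feature at 30 deg elevation

stat, p = ks_2samp(feat_elev10, feat_elev30)
print(f"KS distance = {stat:.3f}, p = {p:.3g}")
```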
Incremental wind tunnel testing of high lift systems
NASA Astrophysics Data System (ADS)
Victor, Pricop Mihai; Mircea, Boscoianu; Daniel-Eugeniu, Crunteanu
2016-06-01
Efficiency of trailing edge high lift systems is essential for long range future transport aircraft evolving in the direction of laminar wings, because they have to compensate for the low performance of the leading edge devices. Modern high lift systems are subject to high performance requirements and constrained to simple actuation, combined with a reduced number of aerodynamic elements. Passive or active flow control is thus required for performance enhancement. An experimental investigation of a reduced-kinematics flap combined with passive flow control took place in a low speed wind tunnel. The most important features of the experimental setup are the relatively large size, corresponding to a Reynolds number of about 2 million; the sweep angle of 30 degrees, corresponding to long range airliners with high-sweep wings; and the large number of flap settings and mechanical vortex generators. The model description, flap settings, methodology and results are presented.
Matching by linear programming and successive convexification.
Jiang, Hao; Drew, Mark S; Li, Ze-Nian
2007-06-01
We present a novel convex programming scheme to solve matching problems, focusing on the challenging problem of matching in a large search range and with cluttered background. Matching is formulated as metric labeling with L1 regularization terms, for which we propose a novel linear programming relaxation method and an efficient successive convexification implementation. The unique feature of the proposed relaxation scheme is that a much smaller set of basis labels is used to represent the original label space. This greatly reduces the size of the search space. A successive convexification scheme solves the labeling problem in a coarse-to-fine manner. Importantly, the original cost function is reconvexified at each stage, in the new focus region only, and the focus region is updated so as to refine the search result. This makes the method well-suited for large label set matching. Experiments demonstrate successful applications of the proposed matching scheme in object detection, motion estimation, and tracking.
The FRUITY database on AGB stars: past, present and future
NASA Astrophysics Data System (ADS)
Cristallo, S.; Piersanti, L.; Straniero, O.
2016-01-01
We present and show the features of the FRUITY database, an interactive web-based interface devoted to the nucleosynthesis in AGB stars. We describe the currently available set of AGB models (largely expanded with respect to the original one) with masses in the range 1.3 ≤ M/M⊙ ≤ 3.0 and metallicities -2.15 ≤ [Fe/H] ≤ +0.15. We illustrate the details of our s-process surface distributions and we compare our results to observations. Moreover, we introduce a new set of models where the effects of rotation are taken into account. Finally, we briefly describe the next planned upgrades.
Anomaly Detection Using an Ensemble of Feature Models
Noto, Keith; Brodley, Carla; Slonim, Donna
2011-01-01
We present a new approach to semi-supervised anomaly detection. Given a set of training examples believed to come from the same distribution or class, the task is to learn a model that will be able to distinguish examples in the future that do not belong to the same class. Traditional approaches typically compare the position of a new data point to the set of “normal” training data points in a chosen representation of the feature space. For some data sets, the normal data may not have discernible positions in feature space, but do have consistent relationships among some features that fail to appear in the anomalous examples. Our approach learns to predict the values of training set features from the values of other features. After we have formed an ensemble of predictors, we apply this ensemble to new data points. To combine the contribution of each predictor in our ensemble, we have developed a novel, information-theoretic anomaly measure that, as our experimental results show, selects against noisy and irrelevant features. Our results on 47 data sets show that for most data sets, this approach significantly improves performance over current state-of-the-art feature space distance and density-based approaches. PMID:22020249
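A minimal sketch of the ensemble idea: one regressor per feature is trained to predict it from the remaining features, and a new point is scored by its prediction errors. Note that the paper's actual anomaly measure is information-theoretic; the squared-error score below is a simplified stand-in, and the data are synthetic:

    import numpy as np
    from sklearn.linear_model import Ridge

    def fit_feature_predictors(X_train):
        """Train one regressor per feature, predicting it from all others."""
        models = []
        for j in range(X_train.shape[1]):
            other = np.delete(X_train, j, axis=1)
            models.append(Ridge().fit(other, X_train[:, j]))
        return models

    def anomaly_score(x, models):
        """Sum of squared prediction errors across the ensemble; large
        values flag points whose inter-feature relationships are atypical."""
        errors = [(models[j].predict(np.delete(x, j).reshape(1, -1))[0] - x[j]) ** 2
                  for j in range(len(models))]
        return float(np.sum(errors))

    # Hypothetical usage on synthetic "normal" training data.
    X = np.random.rand(500, 6)
    models = fit_feature_predictors(X)
    print(anomaly_score(X[0], models))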
Hanselman, Paul; Rozek, Christopher S.; Grigg, Jeffrey; Borman, Geoffrey D.
2016-01-01
Brief, targeted self-affirmation writing exercises have recently been offered as a way to reduce racial achievement gaps, but evidence about their effects in educational settings is mixed, leaving ambiguity about the likely benefits of these strategies if implemented broadly. A key limitation in interpreting these mixed results is that they come from studies conducted by different research teams with different procedures in different settings; it is therefore impossible to isolate whether different effects are the result of theorized heterogeneity, unidentified moderators, or idiosyncratic features of the different studies. We addressed this limitation by conducting a well-powered replication of self-affirmation in a setting where a previous large-scale field experiment demonstrated significant positive impacts, using the same procedures. We found no evidence of effects in this replication study and estimates were precise enough to reject benefits larger than an effect size of 0.10. These null effects were significantly different from persistent benefits in the prior study in the same setting, and extensive testing revealed that currently theorized moderators of self-affirmation effects could not explain the difference. These results highlight the potential fragility of self-affirmation in educational settings when implemented widely and the need for new theory, measures, and evidence about the necessary conditions for self-affirmation success. PMID:28450753
Fast detection of vascular plaque in optical coherence tomography images using a reduced feature set
NASA Astrophysics Data System (ADS)
Prakash, Ammu; Ocana Macias, Mariano; Hewko, Mark; Sowa, Michael; Sherif, Sherif
2018-03-01
Vascular plaque can be detected in optical coherence tomography (OCT) images by using the full set of 26 Haralick textural features and a standard K-means clustering algorithm. However, the use of the full set of 26 textural features is computationally expensive and may not be feasible for real time implementation. In this work, we identified a reduced set of 3 textural features that characterize vascular plaque and used a generalized Fuzzy C-means clustering algorithm. Our work involves three steps: 1) reduction of the full set of 26 textural features to a reduced set of 3 by using a genetic algorithm (GA) optimization method, 2) implementation of an unsupervised generalized clustering algorithm (Fuzzy C-means) on the reduced feature space, and 3) validation of our results using histology and actual photographic images of vascular plaque. Our results show an excellent match with histology and actual photographic images of vascular tissue. Therefore, our results could provide an efficient pre-clinical tool for the detection of vascular plaque in real time OCT imaging.
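A minimal NumPy sketch of the clustering step, assuming the 3 GA-selected texture features are already computed; this is a generic fuzzy C-means, not the authors' exact generalized variant, and the data are hypothetical:

    import numpy as np

    def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, seed=0):
        """Minimal fuzzy C-means: returns soft memberships U and centers V."""
        rng = np.random.default_rng(seed)
        U = rng.random((len(X), n_clusters))
        U /= U.sum(axis=1, keepdims=True)
        for _ in range(n_iter):
            Um = U ** m
            V = (Um.T @ X) / Um.sum(axis=0)[:, None]           # cluster centers
            d = np.linalg.norm(X[:, None, :] - V[None], axis=2) + 1e-12
            U = 1.0 / (d ** (2.0 / (m - 1.0)))                 # inverse-distance weights
            U /= U.sum(axis=1, keepdims=True)                  # normalized memberships
        return U, V

    # Hypothetical: rows = OCT image regions, columns = the 3 selected
    # Haralick features; cluster into plaque vs. non-plaque tissue.
    X = np.random.rand(200, 3)
    U, V = fuzzy_c_means(X, n_clusters=2)
    labels = U.argmax(axis=1)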
The Landsat Image Mosaic of Antarctica
NASA Astrophysics Data System (ADS)
Bindschadler, R.; Vornberger, P.; Fleming, A.; Fox, A.; Morin, P.
2008-12-01
The first-ever true-color, high-resolution digital mosaic of Antarctica has been produced from nearly 1100 Landsat-7 ETM+ images collected between 1999 and 2003. The Landsat Image Mosaic of Antarctica (LIMA) project was an early benchmark data set of the International Polar Year and represents a close and successful collaboration between NASA, USGS, the British Antarctic Survey and the National Science Foundation. The mosaic was successfully merged with lower resolution MODIS data south of Landsat coverage to produce a complete true-color data set of the entire continent. LIMA is being used as a platform for a variety of education and outreach activities. Central to this effort is the NASA website 'Faces of Antarctica' that offers the web visitor the opportunity to explore the data set and to learn how these data are used to support scientific research. Content is delivered through a set of mysteries designed to pique the user's interest and to motivate them to delve deeper into the website, where there are various videos and scientific articles for downloading. Detailed lesson plans written by teachers are provided for classroom use, and Java applets let the user track the motion of ice in sequential Landsat images. Web links take the user to other sites where they can roam over the imagery using standard pan and zoom functions, or search for any named feature in the Antarctic Geographic Names database, which returns to the user a centered true-color view of that feature. LIMA has also appeared in a host of external venues, from museum exhibits to postcards and large posters. It has attracted various value-added providers that increase LIMA's accessibility by allowing users to specify subsets of the very large data set for individual downloads. The ultimate goal of LIMA in the public and educational sector is to enable everyone to become more familiar with Antarctica.
Interobserver Agreement on Endoscopic Classification of Oesophageal Varices in Children.
D'Antiga, Lorenzo; Betalli, Pietro; De Angelis, Paola; Davenport, Mark; Di Giorgio, Angelo; McKiernan, Patrick J; McLin, Valerie; Ravelli, Paolo; Durmaz, Ozlem; Talbotec, Cecile; Sturm, Ekkehard; Woynarowski, Marek; Burroughs, Andrew K
2015-08-01
Data regarding agreement on endoscopic features of oesophageal varices in children with portal hypertension (PH) are scant. The aim of this study was to evaluate endoscopic visualisation and classification of oesophageal varices in children by several European clinicians, to build a rational basis for future multicentre trials. Endoscopic pictures of the distal oesophagus of 100 children with a clinical diagnosis of PH were distributed to 10 endoscopists. Observers were requested to classify variceal size according to a 3-degree scale (small, medium, and large, class A), a 2-degree scale (small and large, class B), and to recognise red wales (presence or absence, class Red). Overall agreement was considered fair if Fleiss and Cohen κ test was ≥0.30, good if ≥0.40, excellent if ≥0.60, and perfect if ≥0.80. Agreement between observers was fair with class A (κ = 0.34) and class B (κ = 0.38), and good with class Red (κ = 0.49). The agreement was good on presence versus absence of varices (class A = 0.53, class B = 0.48). The agreement among the observers was good in class A when endoscopic features of severe PH (medium and large sizes, red marks) were grouped and compared with mild features (absent and small varices) (κ = 0.58). Experts working in different centres show a fairly good agreement on endoscopic features of PH in children, although a better training of paediatric endoscopists may improve the agreement in grading severity of varices in this setting.
Eyben, Florian; Weninger, Felix; Lehment, Nicolas; Schuller, Björn; Rigoll, Gerhard
2013-01-01
Without doubt general video and sound, as found in large multimedia archives, carry emotional information. Thus, audio and video retrieval by certain emotional categories or dimensions could play a central role for tomorrow's intelligent systems, enabling search for movies with a particular mood, computer aided scene and sound design in order to elicit certain emotions in the audience, etc. Yet, the lion's share of research in affective computing focuses exclusively on signals conveyed by humans, such as affective speech. Uniting the fields of multimedia retrieval and affective computing is believed to lead to a multiplicity of interesting retrieval applications, and at the same time to benefit affective computing research by moving its methodology “out of the lab” to real-world, diverse data. In this contribution, we address the problem of finding “disturbing” scenes in movies, a scenario that is highly relevant for computer-aided parental guidance. We apply large-scale segmental feature extraction combined with audio-visual classification to the particular task of detecting violence. Our system performs fully data-driven analysis including automatic segmentation. We evaluate the system in terms of mean average precision (MAP) on the official data set of the MediaEval 2012 evaluation campaign's Affect Task, which consists of 18 original Hollywood movies, achieving up to .398 MAP on unseen test data in full realism. An in-depth analysis of the worth of individual features with respect to the target class and the system errors is carried out and reveals the importance of peak-related audio feature extraction and low-level histogram-based video analysis. PMID:24391704
Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold
2014-12-01
In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings.
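A rough sketch of the clustering-based graph Laplacian construction under stated assumptions (Gaussian edge weights with an arbitrary kernel width, synthetic states); the smoothest Laplacian eigenvectors then serve as basis functions for value function approximation:

    import numpy as np
    from scipy.spatial.distance import cdist
    from sklearn.cluster import KMeans

    # Hypothetical: sampled states from an MDP with a continuous state space.
    states = np.random.rand(5000, 2)

    # 1) Subsample the state space via K-means clustering.
    centers = KMeans(n_clusters=100, n_init=10).fit(states).cluster_centers_

    # 2) Build a weighted graph on the cluster centers and its normalized
    #    Laplacian L = I - D^{-1/2} W D^{-1/2}.
    W = np.exp(-cdist(centers, centers) ** 2 / 0.05)
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    L = np.eye(len(W)) - (W / np.sqrt(d)[:, None]) / np.sqrt(d)[None, :]

    # 3) The smoothest eigenvectors of L serve as basis functions for VFA.
    eigvals, eigvecs = np.linalg.eigh(L)
    basis = eigvecs[:, :10]  # 10 proto-value basis functions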
Using R for large spatiotemporal data sets
NASA Astrophysics Data System (ADS)
Pebesma, Edzer
2017-04-01
Writing and sharing scientific software is a means to communicate scientific ideas for finding scientific consensus, no more and no less than writing and sharing scientific papers is. Important factors for successful communication are adopting an open source environment, and using a language that is understood by many. For many scientists, R's combination of rich data abstraction and highly exposed data structures makes it an attractive communication tool. This paper discusses the development of spatial and spatiotemporal data handling and analysis with R since 2000, and will point to some of R's strengths and weaknesses in a historical perspective. We will also discuss a new, S3-based package for feature data ("Simple Features for R"), and point to a way forward into the data science realm, where pipeline-based workflows are assumed. Finally, we will discuss how, in a similar vein, massive satellite or climate model data sets, potentially held in a cloud environment, can be handled and analyzed with R.
Jiang, Xiong; Chevillet, Mark A; Rauschecker, Josef P; Riesenhuber, Maximilian
2018-04-18
Grouping auditory stimuli into common categories is essential for a variety of auditory tasks, including speech recognition. We trained human participants to categorize auditory stimuli from a large novel set of morphed monkey vocalizations. Using fMRI-rapid adaptation (fMRI-RA) and multi-voxel pattern analysis (MVPA) techniques, we gained evidence that categorization training results in two distinct sets of changes: sharpened tuning to monkey call features (without explicit category representation) in left auditory cortex and category selectivity for different types of calls in lateral prefrontal cortex. In addition, the sharpness of neural selectivity in left auditory cortex, as estimated with both fMRI-RA and MVPA, predicted the steepness of the categorical boundary, whereas categorical judgment correlated with release from adaptation in the left inferior frontal gyrus. These results support the theory that auditory category learning follows a two-stage model analogous to the visual domain, suggesting general principles of perceptual category learning in the human brain. Copyright © 2018 Elsevier Inc. All rights reserved.
Fast Semantic Segmentation of 3d Point Clouds with Strongly Varying Density
NASA Astrophysics Data System (ADS)
Hackel, Timo; Wegner, Jan D.; Schindler, Konrad
2016-06-01
We describe an effective and efficient method for point-wise semantic classification of 3D point clouds. The method can handle unstructured and inhomogeneous point clouds such as those derived from static terrestrial LiDAR or photogrammetric reconstruction; and it is computationally efficient, making it possible to process point clouds with many millions of points in a matter of minutes. The key issue, both to cope with strong variations in point density and to bring down computation time, turns out to be careful handling of neighborhood relations. By choosing appropriate definitions of a point's (multi-scale) neighborhood, we obtain a feature set that is both expressive and fast to compute. We evaluate our classification method both on benchmark data from a mobile mapping platform and on a variety of large, terrestrial laser scans with greatly varying point density. The proposed feature set outperforms the state of the art with respect to per-point classification accuracy, while at the same time being much faster to compute.
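A simplified sketch of multi-scale neighborhood features of the kind described above, using eigenvalues of the local covariance at several neighborhood sizes; the exact feature definitions and scale selection in the paper differ, and the point cloud here is synthetic:

    import numpy as np
    from scipy.spatial import cKDTree

    def covariance_features(points, k):
        """Eigenvalue-based shape features from each point's k-neighborhood."""
        tree = cKDTree(points)
        _, idx = tree.query(points, k=k)
        feats = []
        for nb in points[idx]:                      # (k, 3) neighborhood
            evals = np.linalg.eigvalsh(np.cov(nb.T))
            evals = np.clip(evals, 1e-12, None) / evals.sum()
            l1, l2, l3 = evals[::-1]                # sorted descending
            feats.append([l1 - l2,                  # linearity (unnormalized)
                          l2 - l3,                  # planarity
                          l3,                       # scattering
                          -np.sum(evals * np.log(evals))])  # eigenentropy
        return np.asarray(feats)

    pts = np.random.rand(1000, 3)
    # Multi-scale feature set: concatenate features over several k values.
    X = np.hstack([covariance_features(pts, k) for k in (10, 25, 50)])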
Social Networking Adapted for Distributed Scientific Collaboration
NASA Technical Reports Server (NTRS)
Karimabadi, Homa
2012-01-01
Sci-Share is a social networking site with novel, specially designed feature sets to enable simultaneous remote collaboration and sharing of large data sets among scientists. The site will include not only the standard features found on popular consumer-oriented social networking sites such as Facebook and Myspace, but also a number of powerful tools that extend its functionality to a science collaboration site. A Virtual Observatory is a promising technology for making data accessible from various missions and instruments through a Web browser. Sci-Share augments services provided by Virtual Observatories by enabling distributed collaboration and sharing of downloaded and/or processed data among scientists. This will, in turn, increase science returns from NASA missions. Sci-Share also enables better utilization of NASA's high-performance computing resources by providing an easy and central mechanism to access and share large files in users' space or those saved on mass storage. The most common means of remote scientific collaboration today remains the trio of e-mail for electronic communication, FTP for file sharing, and personalized Web sites for dissemination of papers and research results. Each of these tools has well-known limitations. Sci-Share transforms the social networking paradigm into a scientific collaboration environment by offering powerful tools for cooperative discourse and digital content sharing. Sci-Share differentiates itself by serving as an online repository for users' digital content with the following unique features: a) Sharing of any file type, any size, from anywhere; b) Creation of projects and groups for controlled sharing; c) Module for sharing files on HPC (High Performance Computing) sites; d) Universal accessibility of staged files as embedded links on other sites (e.g. Facebook) and tools (e.g. e-mail); e) Drag-and-drop transfer of large files, replacing awkward e-mail attachments (and file size limitations); f) Enterprise-level data and messaging encryption; and g) Easy-to-use intuitive workflow.
Moore, Katherine Sledge; Weissman, Daniel H
2010-08-01
In the present study, we investigated whether involuntarily directing attention to a target-colored distractor causes the corresponding attentional set to enter a limited-capacity focus of attention, thereby facilitating the identification of a subsequent target whose color matches the same attentional set. As predicted, in Experiment 1, contingent attentional capture effects from a target-colored distractor were only one half to one third as large when subsequent target identification relied on the same (vs. a different) attentional set. In Experiment 2, this effect was eliminated when all of the target colors matched the same attentional set, arguing against bottom-up perceptual priming of the distractor's color as an alternative account of our findings. In Experiment 3, this effect was reversed when a target-colored distractor appeared after the target, ruling out a feature-based interference account of our findings. We conclude that capacity limitations in working memory strongly influence contingent attentional capture when multiple attentional sets guide selection.
Tsai, Wen-Ting; Hassan, Ahmed; Sarkar, Purbasha; Correa, Joaquin; Metlagel, Zoltan; Jorgens, Danielle M.; Auer, Manfred
2014-01-01
Modern 3D electron microscopy approaches have recently allowed unprecedented insight into the 3D ultrastructural organization of cells and tissues, enabling the visualization of large macromolecular machines, such as adhesion complexes, as well as higher-order structures, such as the cytoskeleton and cellular organelles in their respective cell and tissue context. Given the inherent complexity of cellular volumes, it is essential to first extract the features of interest in order to allow visualization, quantification, and therefore comprehension of their 3D organization. Each data set is defined by distinct characteristics, e.g., signal-to-noise ratio, crispness (sharpness) of the data, heterogeneity of its features, crowdedness of features, presence or absence of characteristic shapes that allow for easy identification, and the percentage of the entire volume that a specific region of interest occupies. All these characteristics need to be considered when deciding on which approach to take for segmentation. The six different 3D ultrastructural data sets presented were obtained by three different imaging approaches: resin-embedded stained electron tomography, and focused ion beam- and serial block face-scanning electron microscopy (FIB-SEM, SBF-SEM) of mildly stained and heavily stained samples, respectively. For these data sets, four different segmentation approaches have been applied: (1) fully manual model building followed solely by visualization of the model, (2) manual tracing segmentation of the data followed by surface rendering, (3) semi-automated approaches followed by surface rendering, or (4) automated custom-designed segmentation algorithms followed by surface rendering and quantitative analysis. Depending on the combination of data set characteristics, it was found that typically one of these four categorical approaches outperforms the others, but depending on the exact sequence of criteria, more than one approach may be successful. Based on these data, we propose a triage scheme that categorizes both objective data set characteristics and subjective personal criteria for the analysis of the different data sets. PMID:25145678
Household Energy Consumption Segmentation Using Hourly Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kwac, J; Flora, J; Rajagopal, R
2014-01-01
The increasing US deployment of residential advanced metering infrastructure (AMI) has made hourly energy consumption data widely available. Using California smart meter data, we investigate a household electricity segmentation methodology that uses an encoding system with a pre-processed load shape dictionary. Structured approaches using features derived from the encoded data drive five sample program- and policy-relevant energy lifestyle segmentation strategies. We also ensure that the methodologies developed scale to large data sets.
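A minimal sketch of the load-shape encoding idea, assuming one representative daily profile per household; the dictionary size and data are hypothetical, and K-means stands in for the pre-processing used to build the dictionary:

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical AMI data: one average daily profile (24 hourly kWh
    # values) per household.
    daily_kwh = np.random.rand(10000, 24)

    # Normalize each day to a unit-sum "load shape" so that shape, not
    # total consumption, drives the encoding.
    shapes = daily_kwh / daily_kwh.sum(axis=1, keepdims=True)

    # Learn a load-shape dictionary; the cluster index is the encoding,
    # and households sharing an index form a segment.
    dictionary = KMeans(n_clusters=16, n_init=10).fit(shapes)
    segments = dictionary.predict(shapes)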
Medium power amplifiers covering 90 - 130 GHz for telescope local oscillators
NASA Technical Reports Server (NTRS)
Samoska, Lorene A.; Bryerton, Eric; Pukala, David; Peralta, Alejandro; Hu, Ming; Schmitz, Adele
2005-01-01
This paper describes a set of power amplifier (PA) modules containing InP High Electron Mobility Transistor (HEMT) Monolithic Millimeter-wave Integrated Circuit (MMIC) chips. The chips were designed and optimized for local oscillator sources in the 90-130 GHz band for the Atacama Large Millimeter Array telescope. The modules feature 20-45 mW of output power, to date the highest power from solid state HEMT MMIC modules above 110 GHz.
Neural network post-processing of grayscale optical correlator
NASA Technical Reports Server (NTRS)
Lu, Thomas T; Hughlett, Casey L.; Zhoua, Hanying; Chao, Tien-Hsin; Hanan, Jay C.
2005-01-01
In this paper we present the use of a radial basis function neural network (RBFNN) as a post-processor to assist the optical correlator in identifying objects and rejecting false alarms. Image plane features near the correlation peaks are extracted and fed to the neural network for analysis. The approach is capable of handling a large number of object variations and filter sets. Preliminary experimental results are presented and the performance is analyzed.
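A generic radial basis function network can illustrate the post-processor role; the sketch below (K-means centers, Gaussian activations, ridge-trained readout) is a stand-in for the authors' network under those assumptions, with hypothetical features and labels:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import Ridge

    class RBFNN:
        """Minimal radial basis function network: K-means centers, Gaussian
        activations, and a linear readout trained by ridge regression."""
        def __init__(self, n_centers=20, gamma=1.0):
            self.n_centers, self.gamma = n_centers, gamma
        def _phi(self, X):
            d = np.linalg.norm(X[:, None] - self.centers[None], axis=2)
            return np.exp(-self.gamma * d ** 2)
        def fit(self, X, y):
            self.centers = KMeans(self.n_centers, n_init=10).fit(X).cluster_centers_
            self.readout = Ridge().fit(self._phi(X), y)
            return self
        def predict(self, X):
            return self.readout.predict(self._phi(X))

    # Hypothetical: features extracted near correlation peaks; label 1 for
    # true targets, 0 for false alarms.
    X = np.random.rand(500, 6)
    y = (X[:, 0] > 0.5).astype(float)
    scores = RBFNN().fit(X, y).predict(X)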
Toward natural selection in virtual reality.
Sherstyuk, Andrei; Vincent, Dale; Treskunov, Anton
2010-01-01
Here we describe a vision of VR games that combine the best features of gaming and VR: large, persistent worlds experienced in photorealistic settings with full immersion. For example, Figure 1 illustrates a hypothetical immersive VR game that could be developed using current technologies, including real-time, cinematic-quality graphics; a panoramic head-mounted display (HMD); and wide-area tracking. We also examine the gap between available VR and gaming technologies, and offer solutions for bridging it.
Signal Feature Analysis Using Neural Networks & Psychoacoustics
1993-05-01
... large class file on the DAT recording. This processing produced signals which ranged in length from 13200 to 39650 points. The extractions produced ... recorded. This signal set, denoted as "Air" signals, lacked the parameter of angle but added the parameter of striker (metal, plastic, and wood) ... the subjects were recorded. These became raw data for confusion matrices which described how often a subject confused the class of a signal.
Automatic Beam Path Analysis of Laser Wakefield Particle Acceleration Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rubel, Oliver; Geddes, Cameron G.R.; Cormier-Michel, Estelle
2009-10-19
Numerical simulations of laser wakefield particle accelerators play a key role in the understanding of the complex acceleration process and in the design of expensive experimental facilities. As the size and complexity of simulation output grows, an increasingly acute challenge is the practical need for computational techniques that aid in scientific knowledge discovery. To that end, we present a set of data-understanding algorithms that work in concert in a pipeline fashion to automatically locate and analyze high energy particle bunches undergoing acceleration in very large simulation datasets. These techniques work cooperatively by first identifying features of interest in individual timesteps, then integrating features across timesteps, and, based on the information derived, performing analysis of temporally dynamic features. This combination of techniques supports accurate detection of particle beams, enabling a deeper level of scientific understanding of physical phenomena than has been possible before. By combining efficient data analysis algorithms and state-of-the-art data management we enable high-performance analysis of extremely large particle datasets in 3D. We demonstrate the usefulness of our methods for a variety of 2D and 3D datasets and discuss the performance of our analysis pipeline.
Dynamic security contingency screening and ranking using neural networks.
Mansour, Y; Vaahedi, E; El-Sharkawi, M A
1997-01-01
This paper summarizes BC Hydro's experience in applying neural networks to dynamic security contingency screening and ranking. The idea is to use the information on the prevailing operating condition and directly provide contingency screening and ranking using a trained neural network. To train the two neural networks for the large scale systems of BC Hydro and Hydro Quebec, a total of 1691 detailed transient stability simulations were conducted, 1158 for the BC Hydro system and 533 for the Hydro Quebec system. The simulation program was equipped with the energy margin calculation module (second kick) to measure the energy margin in each run. The first set of results showed poor performance for the neural networks in assessing dynamic security. However, a number of corrective measures improved the results significantly. These corrective measures included: 1) the effectiveness of output; 2) the number of outputs; 3) the type of features (static versus dynamic); 4) the number of features; 5) system partitioning; and 6) the ratio of training samples to features. The final results obtained using the large scale systems of BC Hydro and Hydro Quebec demonstrate a good potential for neural networks in dynamic security contingency screening and ranking.
Ahmad, Riaz; Naz, Saeeda; Afzal, Muhammad Zeshan; Amin, Sayed Hassan; Breuel, Thomas
2015-01-01
The presence of a large number of unique shapes called ligatures in cursive languages, along with variations due to scaling, orientation and location, provides one of the most challenging pattern recognition problems. Recognition of the large number of ligatures is often a complicated task in oriental languages such as Pashto, Urdu, Persian and Arabic. Research on cursive script recognition often ignores the fact that scaling, orientation, location and font variations are common in printed cursive text. Therefore, these variations are not included in image databases and in experimental evaluations. This research uncovers challenges faced by Arabic cursive script recognition in a holistic framework by considering Pashto as a test case, because the Pashto language has a larger alphabet set than Arabic, Persian and Urdu. A database containing 8000 images of 1000 unique ligatures with scaling, orientation and location variations is introduced. In this article, a feature space based on the scale invariant feature transform (SIFT), along with a segmentation framework, has been proposed for overcoming the above mentioned challenges. The experimental results show a significantly improved performance of the proposed scheme over traditional feature extraction techniques such as principal component analysis (PCA). PMID:26368566
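A brief example of extracting SIFT descriptors from a ligature image with OpenCV, as a stand-in for the proposed feature space; the file name is hypothetical:

    import cv2

    # Hypothetical ligature image from the Pashto database.
    image = cv2.imread("ligature_0001.png", cv2.IMREAD_GRAYSCALE)

    # SIFT keypoints/descriptors are invariant to scale and rotation and
    # robust to location shifts, matching the variations in the database.
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(image, None)
    print(len(keypoints), descriptors.shape)  # N keypoints x 128 dimensions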
Magnostics: Image-Based Search of Interesting Matrix Views for Guided Network Exploration.
Behrisch, Michael; Bach, Benjamin; Hund, Michael; Delz, Michael; Von Ruden, Laura; Fekete, Jean-Daniel; Schreck, Tobias
2017-01-01
In this work we address the problem of retrieving potentially interesting matrix views to support the exploration of networks. We introduce Matrix Diagnostics (or Magnostics), following in spirit related approaches for rating and ranking other visualization techniques, such as Scagnostics for scatter plots. Our approach ranks matrix views according to the appearance of specific visual patterns, such as blocks and lines, indicating the existence of topological motifs in the data, such as clusters, bi-graphs, or central nodes. Magnostics can be used to analyze, query, or search for visually similar matrices in large collections, or to assess the quality of matrix reordering algorithms. While many feature descriptors for image analysis exist, there is no evidence of how they perform for detecting patterns in matrices. In order to make an informed choice of feature descriptors for matrix diagnostics, we evaluate 30 feature descriptors (27 existing ones and three new descriptors that we designed specifically for Magnostics) with respect to four criteria: pattern response, pattern variability, pattern sensibility, and pattern discrimination. We conclude with an informed set of six descriptors as most appropriate for Magnostics and demonstrate their application in two scenarios: exploring a large collection of matrices and analyzing temporal networks.
NASA Technical Reports Server (NTRS)
Ardanuy, Phillip E.; Hucek, Richard R.; Groveman, Brian S.; Kyle, H. Lee
1987-01-01
A deconvolution technique is employed that permits recovery of daily averaged earth radiation budget (ERB) parameters at the top of the atmosphere from a set of the Nimbus 7 ERB wide field of view (WFOV) measurements. Improvements in both the spatial resolution of the resultant fields and in the fidelity of the time averages are obtained. The algorithm is evaluated on a set of months during the period 1980-1983. The albedo, outgoing long-wave radiation, and net radiation parameters are analyzed. The amplitude and phase of the quasi-stationary patterns that appear in the spatially deconvolved fields describe the radiation budget components for 'normal' as well as El Nino/Southern Oscillation (ENSO) episode years. They delineate the seasonal development of large-scale features inherent in the earth's radiation budget as well as the natural variability of interannual differences. These features are underscored by the powerful emergence of the 1982-1983 ENSO event in the fields displayed. The conclusion is that with this type of resolution enhancement, WFOV radiometers provide a useful tool for the observation of the contemporary climate and its variability.
Feature Extraction and Selection for Myoelectric Control Based on Wearable EMG Sensors.
Phinyomark, Angkoon; N Khushaba, Rami; Scheme, Erik
2018-05-18
Specialized myoelectric sensors have been used in prosthetics for decades, but, with recent advancements in wearable sensors, wireless communication and embedded technologies, wearable electromyographic (EMG) armbands are now commercially available for the general public. Due to physical, processing, and cost constraints, however, these armbands typically sample EMG signals at a lower frequency (e.g., 200 Hz for the Myo armband) than their clinical counterparts. It remains unclear whether existing EMG feature extraction methods, which largely evolved based on EMG signals sampled at 1000 Hz or above, are still effective for use with these emerging lower-bandwidth systems. In this study, the effects of sampling rate (low: 200 Hz vs. high: 1000 Hz) on the classification of hand and finger movements were evaluated for twenty-six different individual features and eight sets of multiple features using a variety of datasets comprised of both able-bodied and amputee subjects. The results show that, on average, classification accuracies drop significantly (p ...).
NASA Astrophysics Data System (ADS)
Ma, Ligang; Ma, Fenglan; Li, Jiadan; Gu, Qing; Yang, Shengtian; Ding, Jianli
2017-04-01
Land degradation, specifically soil salinization, has rendered large areas of western China sterile and unproductive while diminishing the productivity of adjacent lands and other areas where salting is less severe. Despite decades of research in soil mapping, little accurate and up-to-date information on the spatial extent and variability of soil salinity is available for large geographic regions. This study explores the potential of assessing soil salinity via linear and random forest modeling of remote sensing based environmental factors and indirect indicators. A case study is presented for the arid oases of the Tarim and Junggar Basins, Xinjiang, China, using time series land surface temperature (LST), evapotranspiration (ET), TRMM precipitation (TRM), a DEM product and vegetation indexes, as well as their second order products. In particular, the location of the oasis, the best feature sets, different salinity degrees and modeling approaches were fully examined. All constructed models were evaluated for their fit to the whole data set and their performance in a leave-one-field-out spatial cross-validation. In addition, the Kruskal-Wallis rank test was adopted for the statistical comparison of different models. Overall, the random forest model outperformed the linear model for the two basins, all salinity degrees and datasets. As for the feature set, LST and ET were consistently identified as the most important factors for the two basins, while the contribution of vegetation indexes varies with location. Moreover, model performance is promising for the salinity ranges that are most relevant to agricultural productivity.
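A hedged sketch of the random forest modeling with leave-one-field-out spatial cross-validation, using synthetic stand-ins for the remote sensing covariates and salinity measurements:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import LeaveOneGroupOut

    # Hypothetical design matrix: rows = soil samples; columns = remote
    # sensing covariates (LST, ET, TRMM precipitation, DEM, vegetation
    # indexes, ...); y = measured soil salinity; groups = field IDs.
    X = np.random.rand(300, 8)
    y = np.random.rand(300)
    groups = np.random.randint(0, 30, size=300)

    model = RandomForestRegressor(n_estimators=500, random_state=0)
    errors = []
    for train, test in LeaveOneGroupOut().split(X, y, groups):
        model.fit(X[train], y[train])
        errors.append(np.mean((model.predict(X[test]) - y[test]) ** 2))
    print("leave-one-field-out MSE:", np.mean(errors))

    # Feature importances indicate which covariates (e.g., LST, ET) drive
    # the predictions.
    print(model.fit(X, y).feature_importances_)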
Understanding the heavy-tailed dynamics in human behavior
NASA Astrophysics Data System (ADS)
Ross, Gordon J.; Jones, Tim
2015-06-01
The recent availability of electronic data sets containing large volumes of communication data has made it possible to study human behavior on a larger scale than ever before. From this, it has been discovered that across a diverse range of data sets, the interevent times between consecutive communication events obey heavy-tailed power law dynamics. Explaining this has proved controversial, and two distinct hypotheses have emerged. The first holds that these power laws are fundamental, and arise from the mechanisms such as priority queuing that humans use to schedule tasks. The second holds that they are statistical artifacts which only occur in aggregated data when features such as circadian rhythms and burstiness are ignored. We use a large social media data set to test these hypotheses, and find that although models that incorporate circadian rhythms and burstiness do explain part of the observed heavy tails, there is residual unexplained heavy-tail behavior which suggests a more fundamental cause. Based on this, we develop a quantitative model of human behavior which improves on existing approaches and gives insight into the mechanisms underlying human interactions.
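As a small illustration of diagnosing heavy-tailed interevent times, the sketch below inspects the empirical tail and applies a crude Hill estimate of the tail exponent; the data are synthetic draws, not the social media set used in the study, and the 10% tail cutoff is an arbitrary assumption:

    import numpy as np

    # Hypothetical interevent times; a power law appears as a straight line
    # on a log-log plot of the empirical CCDF.
    t = np.sort(np.random.pareto(1.5, size=10000) + 1)
    ccdf = 1.0 - np.arange(1, len(t) + 1) / len(t)

    # Crude tail-exponent estimate (Hill estimator over the top 10%).
    tail = t[int(0.9 * len(t)):]
    alpha_hat = 1 + len(tail) / np.sum(np.log(tail / tail[0]))
    print(f"estimated tail exponent: {alpha_hat:.2f}")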
Saund, Eric
2013-10-01
Effective object and scene classification and indexing depend on extraction of informative image features. This paper shows how large families of complex image features in the form of subgraphs can be built out of simpler ones through construction of a graph lattice—a hierarchy of related subgraphs linked in a lattice. Robustness is achieved by matching many overlapping and redundant subgraphs, which allows the use of inexpensive exact graph matching, instead of relying on expensive error-tolerant graph matching to a minimal set of ideal model graphs. Efficiency in exact matching is gained by exploitation of the graph lattice data structure. Additionally, the graph lattice enables methods for adaptively growing a feature space of subgraphs tailored to observed data. We develop the approach in the domain of rectilinear line art, specifically for the practical problem of document forms recognition. We are especially interested in methods that require only one or very few labeled training examples per category. We demonstrate two approaches to using the subgraph features for this purpose. Using a bag-of-words feature vector we achieve essentially single-instance learning on a benchmark forms database, following an unsupervised clustering stage. Further performance gains are achieved on a more difficult dataset using a feature voting method and feature selection procedure.
NASA Astrophysics Data System (ADS)
Pandremmenou, K.; Shahid, M.; Kondi, L. P.; Lövström, B.
2015-03-01
In this work, we propose a No-Reference (NR) bitstream-based model for predicting the quality of H.264/AVC video sequences affected by both compression artifacts and transmission impairments. The proposed model is based on a feature extraction procedure, where a large number of features are calculated from the packet-loss impaired bitstream. Many of the features are first proposed in this work, and the specific set of features as a whole is applied for the first time to making NR video quality predictions. All feature observations are taken as input to the Least Absolute Shrinkage and Selection Operator (LASSO) regression method. LASSO indicates the most important features, and using only them, it is possible to estimate the Mean Opinion Score (MOS) with high accuracy. Indicatively, we point out that only 13 features are able to produce a Pearson Correlation Coefficient of 0.92 with the MOS. Interestingly, the performance statistics we computed in order to assess our method for predicting the Structural Similarity Index and the Video Quality Metric are equally good. Thus, the obtained experimental results verified the suitability of the features selected by LASSO as well as the ability of LASSO to make accurate predictions through sparse modeling.
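A compact sketch of LASSO-based feature selection for MOS prediction; the data dimensions are hypothetical, and LassoCV's cross-validated penalty is a generic stand-in for the authors' exact procedure:

    import numpy as np
    from sklearn.linear_model import LassoCV
    from sklearn.preprocessing import StandardScaler

    # Hypothetical: rows = impaired video sequences, columns = bitstream
    # features; y = subjective MOS for each sequence.
    X = np.random.rand(150, 50)
    y = np.random.rand(150) * 4 + 1

    # LASSO's L1 penalty drives most coefficients to exactly zero, leaving
    # a small subset of informative features (13 in the paper).
    Xs = StandardScaler().fit_transform(X)
    lasso = LassoCV(cv=5).fit(Xs, y)
    selected = np.flatnonzero(lasso.coef_)
    print(f"{len(selected)} features selected:", selected)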
Extraordinarily Adaptive Properties of the Genetically Encoded Amino Acids
Ilardo, Melissa; Meringer, Markus; Freeland, Stephen; Rasulev, Bakhtiyor; Cleaves II, H. James
2015-01-01
Using novel advances in computational chemistry, we demonstrate that the set of 20 genetically encoded amino acids, used nearly universally to construct all coded terrestrial proteins, has been highly influenced by natural selection. We defined an adaptive set of amino acids as one whose members thoroughly cover relevant physico-chemical properties, or “chemistry space.” Using this metric, we compared the encoded amino acid alphabet to random sets of amino acids. These random sets were drawn from a computationally generated compound library containing 1913 alternative amino acids that lie within the molecular weight range of the encoded amino acids. Sets that cover chemistry space better than the genetically encoded alphabet are extremely rare and energetically costly. Further analysis of more adaptive sets reveals common features and anomalies, and we explore their implications for synthetic biology. We present these computations as evidence that the set of 20 amino acids found within the standard genetic code is the result of considerable natural selection. The amino acids used for constructing coded proteins may represent a largely global optimum, such that any aqueous biochemistry would use a very similar set. PMID:25802223
A Semiautomated Framework for Integrating Expert Knowledge into Disease Marker Identification
Wang, Jing; Webb-Robertson, Bobbie-Jo M.; Matzke, Melissa M.; Varnum, Susan M.; Brown, Joseph N.; Riensche, Roderick M.; Adkins, Joshua N.; Jacobs, Jon M.; Hoidal, John R.; Scholand, Mary Beth; Pounds, Joel G.; Blackburn, Michael R.; Rodland, Karin D.; McDermott, Jason E.
2013-01-01
Background. The availability of large complex data sets generated by high throughput technologies has enabled the recent proliferation of disease biomarker studies. However, a recurring problem in deriving biological information from large data sets is how to best incorporate expert knowledge into the biomarker selection process. Objective. To develop a generalizable framework that can incorporate expert knowledge into data-driven processes in a semiautomated way while providing a metric for optimization in a biomarker selection scheme. Methods. The framework was implemented as a pipeline consisting of five components for the identification of signatures from integrated clustering (ISIC). Expert knowledge was integrated into the biomarker identification process using the combination of two distinct approaches; a distance-based clustering approach and an expert knowledge-driven functional selection. Results. The utility of the developed framework ISIC was demonstrated on proteomics data from a study of chronic obstructive pulmonary disease (COPD). Biomarker candidates were identified in a mouse model using ISIC and validated in a study of a human cohort. Conclusions. Expert knowledge can be introduced into a biomarker discovery process in different ways to enhance the robustness of selected marker candidates. Developing strategies for extracting orthogonal and robust features from large data sets increases the chances of success in biomarker identification. PMID:24223463
Mougiakakou, Stavroula G; Valavanis, Ioannis K; Nikita, Alexandra; Nikita, Konstantina S
2007-09-01
The aim of the present study is to define an optimally performing computer-aided diagnosis (CAD) architecture for the classification of liver tissue from non-enhanced computed tomography (CT) images into normal liver (C1), hepatic cyst (C2), hemangioma (C3), and hepatocellular carcinoma (C4). To this end, various CAD architectures, based on texture features and ensembles of classifiers (ECs), are comparatively assessed. A number of regions of interest (ROIs) corresponding to C1-C4 were defined by experienced radiologists in non-enhanced liver CT images. For each ROI, five distinct sets of texture features were extracted using first order statistics, the spatial gray level dependence matrix, the gray level difference method, Laws' texture energy measures, and fractal dimension measurements. Two different ECs were constructed and compared. The first one consists of five multilayer perceptron neural networks (NNs), each using as input one of the computed texture feature sets or its reduced version after genetic algorithm-based feature selection. The second EC comprised five different primary classifiers, namely one multilayer perceptron NN, one probabilistic NN, and three k-nearest neighbor classifiers, each fed with the combination of the five texture feature sets or their reduced versions. The final decision of each EC was extracted by using appropriate voting schemes, while bootstrap re-sampling was utilized in order to estimate the generalization ability of the CAD architectures based on the available, relatively small-sized data set. The best mean classification accuracy (84.96%) is achieved by the second EC using a fused feature set and the weighted voting scheme. The fused feature set was obtained after appropriate feature selection applied to specific subsets of the original feature set. The comparative assessment of the various CAD architectures shows that combining three types of classifiers with a voting scheme, fed with identical feature sets obtained after appropriate feature selection and fusion, may result in an accurate system able to assist differential diagnosis of focal liver lesions from non-enhanced CT images.
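A minimal sketch of a weighted voting scheme of the kind used to combine the primary classifiers; the weights and labels below are hypothetical, not the study's values:

    import numpy as np

    def weighted_vote(predictions, weights, n_classes=4):
        """Combine class labels from several primary classifiers (e.g., an
        MLP, a probabilistic NN, and three k-NN classifiers) into one
        decision by accumulating per-classifier weights on each class."""
        scores = np.zeros(n_classes)
        for label, w in zip(predictions, weights):
            scores[label] += w
        return int(np.argmax(scores))

    # Hypothetical: five classifiers vote on a liver ROI (classes C1-C4
    # encoded as 0-3); weights could reflect each classifier's validation
    # accuracy.
    print(weighted_vote([1, 1, 3, 1, 2], [0.85, 0.80, 0.78, 0.82, 0.75]))  # -> 1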
Mass tracking and material accounting in the Integral Fast Reactor (IFR)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Orechwa, Y.; Adams, C.H.; White, A.M.
1991-01-01
The Integral Fast Reactor (IFR) is a generic advanced liquid metal cooled reactor concept being developed at Argonne National Laboratory (ANL). There are a number of technical features of the IFR which contribute to its potential as a next-generation reactor. These are associated with large safety margins with regard to off-normal events involving the heat transport system, and the use of metallic fuel which makes possible the utilization of innovative fuel cycle processes. The latter feature permits fuel cycle closure in compact, low-cost reprocessing facilities collocated with the reactor plant. These primary features are being demonstrated in the facilities at ANL-West, utilizing Experimental Breeder Reactor 2 and the associated Fuel Cycle Facility (FCF) as an IFR prototype. The demonstration of this IFR prototype includes the design and implementation of the Mass-Tracking System (MTG). In this system, data from the operations of the FCF, including weights and batch-process parameters, are collected and maintained by the MTG running on distributed workstations. The components of the MTG System include: (1) an Oracle database manager with a Fortran interface, (2) a set of "MTG Tasks" which collect, manipulate and report data, (3) a set of "MTG Terminal Sessions" which provide some interactive control of the Tasks, and (4) a set of servers which manage the Tasks and which provide the communications link between the MTG System and Operator Control Stations, which control process equipment and monitoring devices within the FCF.
Serial vs. parallel models of attention in visual search: accounting for benchmark RT-distributions.
Moran, Rani; Zehetleitner, Michael; Liesefeld, Heinrich René; Müller, Hermann J; Usher, Marius
2016-10-01
Visual search is central to the investigation of selective visual attention. Classical theories propose that items are identified by serially deploying focal attention to their locations. While this accounts for set-size effects over a continuum of task difficulties, it has been suggested that parallel models can account for such effects equally well. We compared the serial Competitive Guided Search model with a parallel model in their ability to account for RT distributions and error rates from a large visual search data-set featuring three classical search tasks: 1) a spatial configuration search (2 vs. 5); 2) a feature-conjunction search; and 3) a unique feature search (Wolfe, Palmer & Horowitz Vision Research, 50(14), 1304-1311, 2010). In the parallel model, each item is represented by a diffusion to two boundaries (target-present/absent); the search corresponds to a parallel race between these diffusors. The parallel model was highly flexible in that it allowed both for a parametric range of capacity-limitation and for set-size adjustments of identification boundaries. Furthermore, a quit unit allowed for a continuum of search-quitting policies when the target is not found, with "single-item inspection" and exhaustive searches comprising its extremes. The serial model was found to be superior to the parallel model, even before penalizing the parallel model for its increased complexity. We discuss the implications of the results and the need for future studies to resolve the debate.
Sample variance in weak lensing: How many simulations are required?
Petri, Andrea; May, Morgan; Haiman, Zoltan
2016-03-24
Constraining cosmology using weak gravitational lensing consists of comparing a measured feature vector of dimension N_b with its simulated counterpart. An accurate estimate of the N_b × N_b feature covariance matrix C is essential to obtain accurate parameter confidence intervals. When C is measured from a set of simulations, an important question is how large this set should be. To answer this question, we construct different ensembles of N_r realizations of the shear field, using a common randomization procedure that recycles the outputs from a smaller number N_s ≤ N_r of independent ray-tracing N-body simulations. We study parameter confidence intervals as a function of (N_s, N_r) in the range 1 ≤ N_s ≤ 200 and 1 ≤ N_r ≲ 10^5. Previous work [S. Dodelson and M. D. Schneider, Phys. Rev. D 88, 063537 (2013)] has shown that Gaussian noise in the feature vectors (from which the covariance is estimated) leads, at quadratic order, to an O(1/N_r) degradation of the parameter confidence intervals. Using a variety of lensing features measured in our simulations, including shear-shear power spectra and peak counts, we show that cubic and quartic covariance fluctuations lead to an additional O(1/N_r^2) error degradation that is not negligible when N_r is only a factor of a few larger than N_b. We study the large-N_r limit and find that a single 240 Mpc/h, 512^3-particle N-body simulation (N_s = 1) can be repeatedly recycled to produce as many as N_r = a few × 10^4 shear maps whose power spectra and high-significance peak counts can be treated as statistically independent. Lastly, a small number of simulations (N_s = 1 or 2) is sufficient to forecast parameter confidence intervals at percent accuracy.
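Since the abstract's central point is that a covariance matrix estimated from too few realizations inflates parameter errors, a small numerical sketch can make the effect tangible. The setup below is illustrative (identity covariance, a toy amplitude parameter), not the paper's lensing pipeline; the Hartlap debiasing factor and the roughly 1 + N_b/N_r error scaling are standard results.

```python
import numpy as np

# Toy demonstration of covariance-estimation noise: draw N_r Gaussian
# feature vectors of dimension N_b, estimate the covariance, and watch the
# parameter-error degradation grow as N_r approaches N_b.
rng = np.random.default_rng(0)
N_b = 50                          # feature vector dimension
true_cov = np.eye(N_b)

for N_r in (60, 100, 500, 5000):  # realizations used to estimate C
    X = rng.multivariate_normal(np.zeros(N_b), true_cov, size=N_r)
    C_hat = np.cov(X, rowvar=False)
    # Hartlap factor debiases the inverse-covariance estimate
    hartlap = (N_r - N_b - 2) / (N_r - 1)
    C_inv = hartlap * np.linalg.inv(C_hat)
    # Fisher error on a toy amplitude parameter with unit response vector
    dmu = np.ones(N_b)
    sigma_est = 1.0 / np.sqrt(dmu @ C_inv @ dmu)
    sigma_true = 1.0 / np.sqrt(dmu @ np.linalg.inv(true_cov) @ dmu)
    print(f"N_r={N_r:5d}  sigma ratio = {sigma_est / sigma_true:.3f}")
```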
Isotropy analyses of the Planck convergence map
NASA Astrophysics Data System (ADS)
Marques, G. A.; Novaes, C. P.; Bernui, A.; Ferreira, I. S.
2018-01-01
The presence of matter in the path of relic photons causes distortions in the angular pattern of the cosmic microwave background (CMB) temperature fluctuations, modifying their properties in a slight but measurable way. Recently, the Planck Collaboration released the estimated convergence map, an integrated measure of the large-scale matter distribution that produced the weak gravitational lensing (WL) phenomenon observed in Planck CMB data. We perform exhaustive analyses of this convergence map, calculating the variance in small and large regions of the sky, excluding the area masked due to Galactic contamination, and compare them with the features expected in the set of simulated convergence maps also released by the Planck Collaboration. Our goal is to search for sky directions or regions where the WL imprints anomalous signatures in the variance estimator, revealed through a χ² analysis at a statistically significant level. In the local analysis of the Planck convergence map, we identified eight patches of the sky in disagreement, at more than 2σ, with what is observed in the average of the simulations. In contrast, in the large-region analysis we found no statistically significant discrepancies; interestingly, though, the regions with the highest χ² values surround the ecliptic poles. Thus, our results show good agreement with the features expected by the Λ cold dark matter concordance model, as given by the simulations. Yet, the outlier regions found here could suggest that the data still contain residual contamination, such as noise, due to over- or underestimation of systematic effects in the simulation data set.
A quantitative perspective on ethics in large team science.
Petersen, Alexander M; Pavlidis, Ioannis; Semendeferi, Ioanna
2014-12-01
The gradual crowding out of singleton and small team science by large team endeavors is challenging key features of research culture. It is therefore important for the future of scientific practice to reflect upon the individual scientist's ethical responsibilities within teams. To facilitate this reflection we show labor force trends in the US revealing a skewed growth in academic ranks and increased levels of competition for promotion within the system; we analyze teaming trends across disciplines and national borders demonstrating why it is becoming difficult to distribute credit and to avoid conflicts of interest; and we use more than a century of Nobel prize data to show how science is outgrowing its old institutions of singleton awards. Of particular concern within the large team environment is the weakening of the mentor-mentee relation, which undermines the cultivation of virtue ethics across scientific generations. These trends and emerging organizational complexities call for a universal set of behavioral norms that transcend team heterogeneity and hierarchy. To this end, our expository analysis provides a survey of ethical issues in team settings to inform science ethics education and science policy.
Occurrence of neanderthal features in mandibles from the Atapuerca-SH site.
Rosas, A
2001-01-01
Analysis of variation and distribution of evolutionary novelties is meaningful in understanding evolutionary processes. The mandible, as a morphological complex, comprises a large number of derived Neanderthal features. The present study investigates whether the features usually considered as European lineage apomorphies evolved independently; the occurrence of these features is studied in the mandibles from the Sima de los Huesos (SH) site (Atapuerca, Spain). For comparative purposes, a large sample of Neanderthal mandibles as well as older fossil Homo specimens has been used for the study. Chi-square tests were employed to test for independence. The SH mandibles present a set of features that clearly show the basic architecture of the Neanderthal mandible. A highly significant association is detected in the variation of the position of the mental foramen, the lateral prominence, and the anterior marginal tubercle, as well as in the development of the retromolar space. However, a much weaker association is detected in the features of the internal aspect of the mandible, with a few exceptions. Features of the external aspect of the mandible occur chronologically earlier than those observed in the internal aspect. The hypothesis that two distinct and consecutive morphological processes have driven the emergence of the European lineage throughout the Middle Pleistocene is proposed. A first transformation affects the mandible by means of backwards displacement of the structures located at the external aspect, as well as the position of the condyle. A second process would modify the features of the internal aspect of the mandible, in which the relief of the masseteric and pterygoid fossae is affected, in association with a spatial rearrangement of the corpus and ramus. Analyzed individually, some of the considered features may be questioned as Neanderthal apomorphies (Trinkaus, 1993; Franciscus and Trinkaus, 1995); however, the joint occurrence of many of them suggests that the complex is an evolutionary novelty. Copyright 2001 Wiley-Liss, Inc.
Pérez-Hernández, Guillermo; Noé, Frank
2016-12-13
Analysis of molecular dynamics, for example using Markov models, often requires the identification of order parameters that are good indicators of the rare events, i.e. good reaction coordinates. Recently, it has been shown that time-lagged independent component analysis (TICA) finds the linear combinations of input coordinates that optimally represent the slow kinetic modes and may serve to define reaction coordinates between the metastable states of the molecular system. A limitation of the method is that both computing time and memory requirements scale with the square of the number of input features. For large protein systems, this makes extensive feature sets, such as the distances between all pairs of residues or even heavy atoms, impractical. Here we derive a hierarchical TICA (hTICA) method that approximates the full TICA solution by a hierarchical, divide-and-conquer calculation. By using hTICA on distances between heavy atoms we identify previously unknown relaxation processes in the bovine pancreatic trypsin inhibitor.
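TICA itself reduces to a generalized eigenvalue problem between the instantaneous and time-lagged covariance matrices; the hierarchical variant builds on this core. A minimal sketch of plain TICA (not the paper's hTICA), assuming mean-free features and an ad hoc lag:

```python
import numpy as np
from scipy.linalg import eigh

def tica(X, lag=10, n_components=2):
    """Minimal TICA sketch: solve C_tau v = lambda C_0 v for the
    slowest linear modes of the feature time series X (frames x features)."""
    X = X - X.mean(axis=0)                      # mean-free features
    C0 = X.T @ X / len(X)                       # instantaneous covariance
    Ct = X[:-lag].T @ X[lag:] / (len(X) - lag)  # time-lagged covariance
    Ct = 0.5 * (Ct + Ct.T)                      # symmetrize
    eigvals, eigvecs = eigh(Ct, C0)             # generalized eigenproblem
    order = np.argsort(eigvals)[::-1]           # slowest modes first
    return X @ eigvecs[:, order[:n_components]]

# usage on random data standing in for pairwise-distance features
Y = tica(np.random.default_rng(1).normal(size=(5000, 30)))
```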
Multiagent model and mean field theory of complex auction dynamics
NASA Astrophysics Data System (ADS)
Chen, Qinghua; Huang, Zi-Gang; Wang, Yougui; Lai, Ying-Cheng
2015-09-01
Recent years have witnessed a growing interest in analyzing a variety of socio-economic phenomena using methods from statistical and nonlinear physics. We study a class of complex systems arising from economics: lowest unique bid auction (LUBA) systems, a recently emerged class of online auction games. Through analyzing large empirical data sets of LUBA, we identify a general feature of the bid price distribution: an inverted J-shaped function with exponential decay in the large bid price region. To account for this distribution, we propose a multi-agent model in which each agent bids stochastically in the field of the winner's attractiveness, and develop a theoretical framework to obtain analytic solutions of the model based on mean field analysis. The theory produces bid-price distributions that are in excellent agreement with those from the real data. Our model and theory capture the essential features of human behavior in the competitive environment exemplified by LUBA, and may provide significant quantitative insights into complex socio-economic phenomena.
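For readers unfamiliar with LUBA mechanics, a toy round is easy to simulate: every agent submits a bid and the lowest bid held by exactly one agent wins. The bidding rule below (a geometric preference for low prices) is a stand-in for illustration, not the paper's attractiveness-field model:

```python
import numpy as np
from collections import Counter

# Toy LUBA rounds: each agent submits an integer bid; the lowest bid chosen
# by exactly one agent wins. Aggregating many rounds exposes the empirical
# bid-price distribution for inspection.
rng = np.random.default_rng(2)
n_agents, n_rounds = 200, 5000
all_bids, winners = [], []
for _ in range(n_rounds):
    # stand-in bidding rule: geometric preference for low prices
    bids = rng.geometric(p=0.05, size=n_agents).clip(max=100)
    counts = Counter(bids.tolist())
    unique = sorted(b for b, c in counts.items() if c == 1)
    if unique:
        winners.append(unique[0])        # lowest unique bid takes the prize
    all_bids.extend(bids.tolist())

bid_distribution = Counter(all_bids)     # compare against the inverted-J shape
```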
The reliability of continuous brain responses during naturalistic listening to music.
Burunat, Iballa; Toiviainen, Petri; Alluri, Vinoo; Bogert, Brigitte; Ristaniemi, Tapani; Sams, Mikko; Brattico, Elvira
2016-01-01
Low-level (timbral) and high-level (tonal and rhythmical) musical features during continuous listening to music, studied by functional magnetic resonance imaging (fMRI), have been shown to elicit large-scale responses in cognitive, motor, and limbic brain networks. Using a similar methodological approach and a similar group of participants, we aimed to study the replicability of previous findings. Participants' fMRI responses during continuous listening to a tango Nuevo piece were correlated voxelwise against the time series of a set of perceptually validated musical features computationally extracted from the music. The replicability of previous results and the present study was assessed by two approaches: (a) correlating the respective activation maps, and (b) computing the overlap of active voxels between datasets at variable levels of ranked significance. Activity elicited by timbral features was more replicable than activity elicited by tonal and rhythmical ones. These results indicate more reliable processing mechanisms for low-level musical features compared to high-level ones. The processing of such high-level features is probably more sensitive to the state and traits of the listeners, as well as to their musical background. Copyright © 2015 Elsevier Inc. All rights reserved.
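The voxelwise analysis step is essentially one Pearson correlation per voxel between the BOLD time series and each musical-feature regressor. A bare-bones sketch, assuming a (time × voxel) data matrix and ignoring hemodynamic lag and preprocessing:

```python
import numpy as np

def voxelwise_feature_map(bold, feature):
    """Pearson r between one musical-feature time series and every voxel.
    bold: (T, V) fMRI data; feature: (T,) perceptual-feature regressor.
    A bare-bones stand-in for the study's voxelwise correlation step."""
    b = (bold - bold.mean(axis=0)) / bold.std(axis=0)
    f = (feature - feature.mean()) / feature.std()
    return (b * f[:, None]).mean(axis=0)   # one correlation value per voxel
```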
High-resolution face verification using pore-scale facial features.
Li, Dong; Zhou, Huiling; Lam, Kin-Man
2015-08-01
Face recognition methods, which usually represent face images using holistic or local facial features, rely heavily on alignment. Their performance also suffers severe degradation under variations in expression or pose, especially when there is only one gallery image per subject. With the easy access to high-resolution (HR) face images nowadays, some HR face databases have recently been developed. However, few studies have tackled the use of HR information for face recognition or verification. In this paper, we propose a pose-invariant face-verification method, robust to alignment errors, that uses the HR information of pore-scale facial features. A new keypoint descriptor, namely pore-Principal Component Analysis (PCA)-Scale Invariant Feature Transform (PPCASIFT), adapted from PCA-SIFT, is devised for the extraction of a compact set of distinctive pore-scale facial features. Having matched the pore-scale features of two face regions, an effective robust-fitting scheme is proposed for the face-verification task. Experiments show that, with only one frontal-view gallery image per subject, our proposed method outperforms a number of standard verification methods, and can achieve excellent accuracy even when the faces are under large variations in expression and pose.
Land use classification using texture information in ERTS-A MSS imagery
NASA Technical Reports Server (NTRS)
Haralick, R. M. (Principal Investigator); Shanmugam, K. S.; Bosley, R.
1973-01-01
The author has identified the following significant results. Preliminary digital analysis of ERTS-1 MSS imagery reveals that the textural features of the imagery are very useful for land use classification. A procedure for extracting the textural features of ERTS-1 imagery is presented and the results of a land use classification scheme based on the textural features are also presented. The land use classification algorithm using textural features was tested on a 5100 square mile area covered by part of an ERTS-1 MSS band 5 image over the California coastline. The image covering this area was blocked into 648 subimages of size 8.9 square miles each. Based on a color composite of the image set, a total of 7 land use categories were identified. These land use categories are: coastal forest, woodlands, annual grasslands, urban areas, large irrigated fields, small irrigated fields, and water. The automatic classifier was trained to identify the land use categories using only the textural characteristics of the subimages; 75 percent of the subimages were assigned correct identifications. Since texture and spectral features provide completely different kinds of information, a significant increase in identification accuracy will take place when both features are used together.
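Gray-level co-occurrence statistics of the kind pioneered in this work are now a library call away. A sketch using scikit-image (version ≥ 0.19 for the graycomatrix/graycoprops names) to compute a handful of co-occurrence features for one image block; the distances, angles, and property subset are illustrative choices:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def texture_features(block):
    """Co-occurrence texture features for one image block; a modern
    stand-in for the 1973 textural-feature procedure."""
    glcm = graycomatrix(block, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    return [graycoprops(glcm, p).mean()
            for p in ("contrast", "homogeneity", "energy", "correlation")]

block = (np.random.default_rng(3).random((64, 64)) * 255).astype(np.uint8)
print(texture_features(block))
```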
Identification of informative features for predicting proinflammatory potentials of engine exhausts.
Wang, Chia-Chi; Lin, Ying-Chi; Lin, Yuan-Chung; Jhang, Syu-Ruei; Tung, Chun-Wei
2017-08-18
The immunotoxicity of engine exhausts is of high concern to human health due to the increasing prevalence of immune-related diseases. However, the evaluation of immunotoxicity of engine exhausts is currently based on expensive and time-consuming experiments. It is desirable to develop efficient methods for immunotoxicity assessment. To accelerate the development of safe alternative fuels, this study proposed a computational method for identifying informative features for predicting proinflammatory potentials of engine exhausts. A principal component regression (PCR) algorithm was applied to develop prediction models. The informative features were identified by a sequential backward feature elimination (SBFE) algorithm. A total of 19 informative chemical and biological features were successfully identified by the SBFE algorithm. The informative features were utilized to develop a computational method named FS-CBM for predicting proinflammatory potentials of engine exhausts. The FS-CBM model achieved high performance, with correlation coefficient values of 0.997 and 0.943 obtained from training and independent test sets, respectively. The FS-CBM model was developed for predicting proinflammatory potentials of engine exhausts with a large improvement in prediction performance compared with our previous CBM model. The proposed method could be further applied to construct models for bioactivities of mixtures.
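Sequential backward feature elimination is a generic greedy loop: repeatedly drop the feature whose removal hurts a cross-validated model least. A sketch wrapped around a principal component regression, with fold counts and component numbers as placeholder choices rather than the paper's settings:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def sbfe(X, y, n_keep, n_pcs=5):
    """Sequential backward feature elimination around a PCR model."""
    kept = list(range(X.shape[1]))
    while len(kept) > n_keep:
        scores = []
        for f in kept:                      # try dropping each feature in turn
            trial = [k for k in kept if k != f]
            pcr = make_pipeline(PCA(n_components=min(n_pcs, len(trial))),
                                LinearRegression())
            scores.append((cross_val_score(pcr, X[:, trial], y, cv=3).mean(), f))
        _, worst = max(scores)              # removing `worst` hurts least
        kept.remove(worst)
    return kept
```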
ECG based Myocardial Infarction detection using Hybrid Firefly Algorithm.
Kora, Padmavathi
2017-12-01
Myocardial Infarction (MI) is one of the most frequent cardiovascular diseases and can cause death, disability, and monetary loss in patients who suffer from cardiovascular disorders. Physicians' diagnostic methods for this ailment are typically invasive, yet they do not achieve the required detection accuracy. Recent feature extraction methods, for example Auto Regressive (AR) modelling, Magnitude Squared Coherence (MSC), and Wavelet Coherence (WTC) using the Physionet database, yield huge feature sets. Many of these features may be inconsequential, containing redundant and non-discriminative components that add computational burden and degrade performance. Therefore, a Hybrid Firefly and Particle Swarm Optimization (FFPSO) is used directly to optimise the raw ECG signal instead of extracting features with the above techniques. The results in this paper show that, for the detection of the MI class, the FFPSO algorithm with an ANN gives 99.3% accuracy, a sensitivity of 99.97%, and a specificity of 98.7% on the MIT-BIH database, with the NSR database also included. The proposed approach shows that methods based on feature optimization of ECG signals are well suited to diagnosing the condition of heart patients. Copyright © 2017 Elsevier B.V. All rights reserved.
Chan, Heang-Ping; Hadjiiski, Lubomir; Helvie, Mark A.; Wei, Jun; Cha, Kenny
2016-01-01
Purpose: Develop a computer-aided detection (CAD) system for masses in digital breast tomosynthesis (DBT) volumes using a deep convolutional neural network (DCNN) with transfer learning from mammograms. Methods: A data set containing 2282 digitized film and digital mammograms and 324 DBT volumes was collected with IRB approval. The mass of interest on the images was marked by an experienced breast radiologist as reference standard. The data set was partitioned into a training set (2282 mammograms with 2461 masses and 230 DBT views with 228 masses) and an independent test set (94 DBT views with 89 masses). For DCNN training, the region of interest (ROI) containing the mass (true positive) was extracted from each image. False positive (FP) ROIs were identified at prescreening by their previously developed CAD systems. After data augmentation, a total of 45 072 mammographic ROIs and 37 450 DBT ROIs were obtained. Data normalization and reduction of non-uniformity in the ROIs across heterogeneous data were achieved using a background correction method applied to each ROI. A DCNN with four convolutional layers and three fully connected (FC) layers was first trained on the mammography data. Jittering and dropout techniques were used to reduce overfitting. After training with the mammographic ROIs, all weights in the first three convolutional layers were frozen, and only the last convolution layer and the FC layers were randomly initialized again and trained using the DBT training ROIs. The authors compared the performances of two CAD systems for mass detection in DBT: one used the DCNN-based approach and the other used their previously developed feature-based approach for FP reduction. The prescreening stage was identical in both systems, passing the same set of mass candidates to the FP reduction stage. For the feature-based CAD system, a 3D clustering and active contour method was used for segmentation; morphological, gray level, and texture features were extracted and merged with a linear discriminant classifier to score the detected masses. For the DCNN-based CAD system, ROIs from five consecutive slices centered at each candidate were passed through the trained DCNN and a mass likelihood score was generated. The performances of the CAD systems were evaluated using free-response ROC curves and the performance difference was analyzed using a non-parametric method. Results: Before transfer learning, the DCNN trained only on mammograms with an AUC of 0.99 classified DBT masses with an AUC of 0.81 in the DBT training set. After transfer learning with DBT, the AUC improved to 0.90. For breast-based CAD detection in the test set, the sensitivity for the feature-based and the DCNN-based CAD systems was 83% and 91%, respectively, at 1 FP/DBT volume. The difference between the performances for the two systems was statistically significant (p-value < 0.05). Conclusions: The image patterns learned from the mammograms were transferred to the mass detection on DBT slices through the DCNN. This study demonstrated that large data sets collected from mammography are useful for developing new CAD systems for DBT, alleviating the problem and effort of collecting entirely new large data sets for the new modality. PMID:27908154
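The freeze-and-retrain transfer step can be expressed compactly in PyTorch. The architecture below only approximates the described four-convolutional-layer, three-FC-layer network (filter counts and kernel sizes are guesses), but the freezing mechanics are the point:

```python
import torch.nn as nn

# Sketch of the transfer step: freeze the first three conv layers learned on
# mammograms; the last conv layer and the FC head stay trainable for DBT.
class MassDCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(1, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU())
        self.head = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))

model = MassDCNN()               # assume weights pre-trained on mammograms
for layer in list(model.convs.children())[:8]:  # first three conv blocks
    for p in layer.parameters():
        p.requires_grad = False  # frozen during DBT fine-tuning
```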
Discriminative prediction of mammalian enhancers from DNA sequence
Lee, Dongwon; Karchin, Rachel; Beer, Michael A.
2011-01-01
Accurately predicting regulatory sequences and enhancers in entire genomes is an important but difficult problem, especially in large vertebrate genomes. With the advent of ChIP-seq technology, experimental detection of genome-wide EP300/CREBBP bound regions provides a powerful platform to develop predictive tools for regulatory sequences and to study their sequence properties. Here, we develop a support vector machine (SVM) framework which can accurately identify EP300-bound enhancers using only genomic sequence and an unbiased set of general sequence features. Moreover, we find that the predictive sequence features identified by the SVM classifier reveal biologically relevant sequence elements enriched in the enhancers, but we also identify other features that are significantly depleted in enhancers. The predictive sequence features are evolutionarily conserved and spatially clustered, providing further support of their functional significance. Although our SVM is trained on experimental data, we also predict novel enhancers and show that these putative enhancers are significantly enriched in both ChIP-seq signal and DNase I hypersensitivity signal in the mouse brain and are located near relevant genes. Finally, we present results of comparisons between other EP300/CREBBP data sets using our SVM and uncover sequence elements enriched and/or depleted in the different classes of enhancers. Many of these sequence features play a role in specifying tissue-specific or developmental-stage-specific enhancer activity, but our results indicate that some features operate in a general or tissue-independent manner. In addition to providing a high confidence list of enhancer targets for subsequent experimental investigation, these results contribute to our understanding of the general sequence structure of vertebrate enhancers. PMID:21875935
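An unbiased, general sequence feature set is often realized as k-mer composition. The sketch below pairs normalized 4-mer counts with a linear SVM; `enhancer_seqs` and `background_seqs` are hypothetical inputs, and the paper's exact feature definition may differ:

```python
from itertools import product
import numpy as np
from sklearn.svm import LinearSVC

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # unbiased 4-mer set

def kmer_features(seq):
    # normalized 4-mer counts; str.count is non-overlapping, a simplification
    # relative to typical overlapping k-mer counting
    v = np.array([seq.count(k) for k in KMERS], dtype=float)
    return v / max(v.sum(), 1.0)

# enhancer_seqs: EP300-bound sequences; background_seqs: random genomic
# windows (both hypothetical inputs for this sketch)
X = np.array([kmer_features(s) for s in enhancer_seqs + background_seqs])
y = np.array([1] * len(enhancer_seqs) + [0] * len(background_seqs))
classifier = LinearSVC().fit(X, y)
```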
Zhu, Xiaolei; Mitchell, Julie C
2011-09-01
Hot spots constitute a small fraction of protein-protein interface residues, yet they account for a large fraction of the binding affinity. Based on our previous method (KFC), we present two new methods (KFC2a and KFC2b) that outperform other methods at hot spot prediction. A number of improvements were made in developing these new methods. First, we created a training data set that contained a similar number of hot spot and non-hot spot residues. In addition, we generated 47 different features, and different numbers of features were used to train the models to avoid over-fitting. Finally, two feature combinations were selected: One (used in KFC2a) is composed of eight features that are mainly related to solvent accessible surface area and local plasticity; the other (KFC2b) is composed of seven features, only two of which are identical to those used in KFC2a. The two models were built using support vector machines (SVM). The two KFC2 models were then tested on a mixed independent test set, and compared with other methods such as Robetta, FOLDEF, HotPoint, MINERVA, and KFC. KFC2a showed the highest predictive accuracy for hot spot residues (True Positive Rate: TPR = 0.85); however, the false positive rate was somewhat higher than for other models. KFC2b showed the best predictive accuracy for hot spot residues (True Positive Rate: TPR = 0.62) among all methods other than KFC2a, and the False Positive Rate (FPR = 0.15) was comparable with other highly predictive methods. Copyright © 2011 Wiley-Liss, Inc.
Automatic detection of anomalies in screening mammograms
2013-01-01
Background Diagnostic performance in breast screening programs may be influenced by the prior probability of disease. Since breast cancer incidence is roughly half a percent in the general population, there is a large probability that the screening exam will be normal. That factor may contribute to false negatives. Screening programs typically exhibit about 83% sensitivity and 91% specificity. This investigation was undertaken to determine if a system could be developed to pre-sort screening images into normal and suspicious bins based on their likelihood of containing disease. Wavelets were investigated as a method to parse the image data, potentially removing confounding information. The development of a classification system based on features extracted from wavelet-transformed mammograms is reported. Methods In the multi-step procedure, images were processed using 2D discrete wavelet transforms to create a set of maps at different size scales. Next, statistical features were computed from each map, and a subset of these features was the input for a concerted-effort set of naïve Bayesian classifiers. The classifier network was constructed to calculate the probability that the parent mammography image contained an abnormality. The abnormalities were not identified, nor were they regionalized. The algorithm was tested on two publicly available databases: the Digital Database for Screening Mammography (DDSM) and the Mammographic Image Analysis Society's database (MIAS). These databases contain radiologist-verified images and feature common abnormalities including spiculations, masses, geometric deformations, and fibroid tissues. Results The classifier-network designs tested achieved sensitivities and specificities sufficient to be potentially useful in a clinical setting. This first series of tests identified networks with 100% sensitivity and up to 79% specificity for abnormalities. This performance significantly exceeds the mean sensitivity reported in the literature for the unaided human expert. Conclusions Classifiers based on wavelet-derived features proved to be highly sensitive to a range of pathologies; as a result, Type II errors were nearly eliminated. Pre-sorting the images changed the prior probability in the sorted database from 37% to 74%. PMID:24330643
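The pipeline's first two stages, multi-scale wavelet decomposition followed by per-subband statistics feeding a naive Bayes classifier, can be sketched with PyWavelets and scikit-learn. The wavelet choice, decomposition level, and statistics computed are illustrative; `X_imgs` and `y` are assumed inputs:

```python
import numpy as np
import pywt
from sklearn.naive_bayes import GaussianNB

def wavelet_stats(img, wavelet="db2", level=3):
    """Statistics of each subband of a multi-scale 2D DWT; a generic
    stand-in for the paper's wavelet-derived feature maps."""
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    maps = [coeffs[0]] + [m for triple in coeffs[1:] for m in triple]
    return np.array([f(np.abs(m)) for m in maps
                     for f in (np.mean, np.std, np.max)])

# X_imgs: list of 2-D mammogram arrays; y: 0 = normal, 1 = suspicious
# (assumed inputs)
X = np.array([wavelet_stats(im) for im in X_imgs])
model = GaussianNB().fit(X, y)
p_suspicious = model.predict_proba(X)[:, 1]  # probability of abnormality
```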
Compact and Hybrid Feature Description for Building Extraction
NASA Astrophysics Data System (ADS)
Li, Z.; Liu, Y.; Hu, Y.; Li, P.; Ding, Y.
2017-05-01
Building extraction in aerial orthophotos is crucial for various applications. Currently, deep learning has been shown to address building extraction with high accuracy and high robustness. However, quite a large number of samples is required to train a classifier when using a deep learning model. In order to realize accurate and semi-interactive labelling, the performance of feature description is crucial, as it has a significant effect on the accuracy of classification. In this paper, we put forward a compact and hybrid feature description method that guarantees desirable classification accuracy of the corners on building roof contours. The proposed descriptor is a hybrid description of an image patch constructed from 4 sets of binary intensity tests. Experiments show that, benefiting from binary description and making full use of color channels, this descriptor is not only computationally frugal, but also more accurate than SURF for building extraction.
Process service quality evaluation based on Dempster-Shafer theory and support vector machine.
Pei, Feng-Que; Li, Dong-Bo; Tong, Yi-Fei; He, Fei
2017-01-01
Human involvement influences traditional service quality evaluations, giving them low accuracy, poor reliability, and unimpressive predictability. This paper proposes a method, called SVMs-DS, that employs support vector machines (SVM) and Dempster-Shafer evidence theory to evaluate the service quality of a production process while handling a large number of input features with a small sample data set. Features that can affect production quality are extracted by a large number of sensors. Preprocessing steps such as feature simplification and normalization are reduced. Based on three individual SVM models, basic probability assignments (BPAs) are constructed, which support the evaluation in both a qualitative and a quantitative way. The process service quality evaluation results are validated by Dempster's rules; the decision threshold to resolve conflicting results is generated from the three SVM models. A case study is presented to demonstrate the effectiveness of the SVMs-DS method.
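The evidence-fusion core of such a system is Dempster's rule of combination. A minimal implementation over frozenset focal elements, with two hypothetical BPAs standing in for the SVM-derived ones:

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions whose focal
    elements are frozensets; the SVM-to-BPA mapping is not reproduced here."""
    combined, conflict = {}, 0.0
    for a, w1 in m1.items():
        for b, w2 in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + w1 * w2
            else:
                conflict += w1 * w2          # mass assigned to contradiction
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# two hypothetical BPAs over the frame {good, poor}
GOOD, POOR, BOTH = frozenset("g"), frozenset("p"), frozenset("gp")
m_svm1 = {GOOD: 0.7, POOR: 0.1, BOTH: 0.2}
m_svm2 = {GOOD: 0.6, POOR: 0.2, BOTH: 0.2}
print(dempster_combine(m_svm1, m_svm2))
```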
Gameiro, Ana; Gouveia, Miguel; Cardoso, José Carlos; Tellechea, Oscar
2016-01-01
Rosai-Dorfman disease is a benign histiocytic proliferative disorder of unknown etiology. The disease mainly affects lymph node tissue, although it is rarely confined to the skin. Here, we describe a 53-year-old woman with purely cutaneous Rosai-Dorfman disease. The patient presented with a large pigmented plaque on her left leg, and sparse erythematous papules on her face and arms. A complete clinical response was achieved with thalidomide, followed by recurrence at the initial site one year later. The histological examination displayed the typical features of Rosai-Dorfman disease in the recent lesions but not in the older lesions. In the setting of no lymphadenopathy, the histopathological features of Rosai-Dorfman disease are commonly misinterpreted. Therefore, awareness of the histological aspects present at different stages, not always featuring the hallmark microscopic signs of Rosai-Dorfman disease, is particularly important for a correct diagnosis of this rare disorder.
Detection of pigment network in dermatoscopy images using texture analysis
Anantha, Murali; Moss, Randy H.; Stoecker, William V.
2011-01-01
Dermatoscopy, also known as dermoscopy or epiluminescence microscopy (ELM), is a non-invasive, in vivo technique, which permits visualization of features of pigmented melanocytic neoplasms that are not discernable by examination with the naked eye. ELM offers a completely new range of visual features. One such prominent feature is the pigment network. Two texture-based algorithms are developed for the detection of pigment network. These methods are applicable to various texture patterns in dermatoscopy images, including patterns that lack fine lines such as cobblestone, follicular, or thickened network patterns. Two texture algorithms, Laws energy masks and the neighborhood gray-level dependence matrix (NGLDM) large number emphasis, were optimized on a set of 155 dermatoscopy images and compared. Results suggest superiority of Laws energy masks for pigment network detection in dermatoscopy images. For both methods, a texel width of 10 pixels or approximately 0.22 mm is found for dermatoscopy images. PMID:15249068
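Laws texture-energy measures, one of the two methods compared, are built from outer products of small 1-D kernels followed by local energy pooling. A generic sketch (the mask subset and window size are illustrative, not the optimized configuration reported):

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter

# Classic 1-D Laws kernels
L5 = np.array([1, 4, 6, 4, 1], float)     # level
E5 = np.array([-1, -2, 0, 2, 1], float)   # edge
S5 = np.array([-1, 0, 2, 0, -1], float)   # spot

def laws_energy(img, win=15):
    """Per-pixel Laws texture-energy maps for a 2-D grayscale image."""
    feats = []
    for a in (L5, E5, S5):
        for b in (L5, E5, S5):
            if a is L5 and b is L5:
                continue                   # L5L5 is usually dropped
            resp = convolve(img, np.outer(a, b))   # 2-D Laws mask response
            feats.append(uniform_filter(np.abs(resp), size=win))
    return np.stack(feats)
```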
Centimeter to Decimeter Size Spherical and Cylindrical Features in Gale Crater Sediments
NASA Technical Reports Server (NTRS)
Wiens, R. C.; Maurice, S.; Gasnault, O.; Clegg, S.; Fabre, C.; Nachon, M.; Rubin, D.; Goetz, W.; Mangold, N.; Schroeder, S.;
2015-01-01
The Curiosity rover traverse in Gale crater has explored a large series of sedimentary deposits in an ancient lake on Mars. Over the nine kilometers of traverse, a recurrent observation has been southward-dipping sedimentary strata, from Shaler at the edge of Yellowknife Bay to the striated units near the Kimberley. Within the sedimentary strata, cm- to decimeter-size hollow spheroidal objects and some apparent cylindrical objects have been observed. These features have not been seen by previous landed missions. The first of these were observed on sol 122 in the Gillespie Lake member at Yellowknife Bay. Additional hollow features were observed in the Point Lake outcrop in the same area. More recently, a spherical and apparently hollow object, Winnipesaukee, was observed by ChemCam and Mastcam on sol 653. Here we describe the settings, morphology, and associated compositions, and we discuss possible origins of these objects.
Study on Huizhou architecture of point cloud registration based on optimized ICP algorithm
NASA Astrophysics Data System (ADS)
Zhang, Runmei; Wu, Yulu; Zhang, Guangbin; Zhou, Wei; Tao, Yuqian
2018-03-01
Current point cloud registration software has high hardware requirements and a heavy, highly interactive workload, and the software with better processing results is not open source. In view of this, a two-step registration method, combining a normal-vector distribution feature with a coarse-feature-based iterative closest point (ICP) algorithm, is proposed in this paper. The method combines the fast point feature histogram (FPFH) algorithm with a calculation model of the point cloud adjacency region and the distribution of normal vectors, setting up a local coordinate system for each key point and obtaining the transformation matrix that completes the rough registration; the rough registration results of the two stations are then accurately registered using the ICP algorithm. Experimental results show that, compared with the traditional ICP algorithm, the method used in this paper has obvious time and precision advantages for large point clouds.
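The fine-registration half of such a pipeline is the classic point-to-point ICP loop: find closest-point correspondences, solve the rigid motion in closed form, iterate. A bare NumPy/SciPy sketch that assumes the FPFH-based coarse alignment has already been applied:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(src, dst, iters=50):
    """Point-to-point ICP refinement between two (N, 3) point clouds.
    Returns the accumulated rotation R and translation t mapping src to dst."""
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(dst)
    for _ in range(iters):
        moved = src @ R.T + t
        _, idx = tree.query(moved)             # closest-point correspondences
        p = moved - moved.mean(0)
        q = dst[idx] - dst[idx].mean(0)
        U, _, Vt = np.linalg.svd(p.T @ q)      # Kabsch rigid-motion estimate
        D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])
        R_step = Vt.T @ D @ U.T                # reflection-safe rotation
        t_step = dst[idx].mean(0) - moved.mean(0) @ R_step.T
        R, t = R_step @ R, R_step @ t + t_step # compose with running transform
    return R, t
```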
Fractal based modelling and analysis of electromyography (EMG) to identify subtle actions.
Arjunan, Sridhar P; Kumar, Dinesh K
2007-01-01
The paper reports the use of fractal theory and fractal dimension to study the non-linear properties of surface electromyogram (sEMG) and to use these properties to classify subtle hand actions. It identifies a new fractal-dimension-related feature, the bias, which has been found useful in modelling the muscle activity underlying sEMG. Experimental results demonstrate that a feature set consisting of the bias values and the fractal dimension of the recordings is suitable for classifying sEMG into the different hand gestures. Scatter plots demonstrate simple relationships between these features and the four hand gestures. The results indicate small inter-experimental variation but large inter-subject variation. This may be due to differences in the size and shape of muscles for different subjects. Possible applications of this research include the development of prosthetic hands and the control of machines and computers.
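Fractal dimension of a 1-D signal such as sEMG is commonly estimated with Higuchi's method; whether this matches the paper's exact estimator is an assumption. A compact implementation:

```python
import numpy as np

def higuchi_fd(x, kmax=8):
    """Higuchi fractal dimension of a 1-D signal: average curve lengths
    L(k) at coarser samplings scale as k**(-D); D is the fitted slope."""
    N, L = len(x), []
    for k in range(1, kmax + 1):
        Lk = []
        for m in range(k):                       # k offset sub-series
            idx = np.arange(m, N, k)
            if len(idx) < 2:
                continue
            length = np.abs(np.diff(x[idx])).sum()
            norm = (N - 1) / ((len(idx) - 1) * k)  # Higuchi normalization
            Lk.append(length * norm / k)
        L.append(np.mean(Lk))
    # slope of log L(k) vs log(1/k) estimates the fractal dimension
    slope, _ = np.polyfit(np.log(1.0 / np.arange(1, kmax + 1)), np.log(L), 1)
    return slope
```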
Finessing filter scarcity problem in face recognition via multi-fold filter convolution
NASA Astrophysics Data System (ADS)
Low, Cheng-Yaw; Teoh, Andrew Beng-Jin
2017-06-01
The deep convolutional neural networks for face recognition, from DeepFace to the recent FaceNet, demand a sufficiently large volume of filters for feature extraction, in addition to being deep. The shallow filter-bank approaches, e.g., the principal component analysis network (PCANet), binarized statistical image features (BSIF), and other analogous variants, suffer from the filter scarcity problem: not all available PCA and ICA filters are discriminative enough to abstract noise-free features. This paper extends our previous work on multi-fold filter convolution (ℳ-FFC), where the pre-learned PCA and ICA filter sets are exponentially diversified by ℳ folds to instantiate PCA, ICA, and PCA-ICA offspring. The experimental results show that the 2-FFC operation alleviates the filter scarcity problem. The 2-FFC descriptors are also shown to be superior to those of PCANet, BSIF, and other face descriptors, in terms of rank-1 identification rate (%).
Large-scale oscillation of structure-related DNA sequence features in human chromosome 21
NASA Astrophysics Data System (ADS)
Li, Wentian; Miramontes, Pedro
2006-08-01
Human chromosome 21 is the only chromosome in the human genome that exhibits oscillation of the (G+C) content with a cycle length of hundreds of kilobases (kb) (~500 kb near the right telomere). We aim at establishing the existence of a similar periodicity in structure-related sequence features in order to relate this (G+C)% oscillation to other biological phenomena. The following quantities are shown to oscillate with the same ~500 kb periodicity in human chromosome 21: binding energy calculated by two sets of dinucleotide-based thermodynamic parameters, AA/TT and AAA/TTT bi- and tri-nucleotide density, 5'-TA-3' dinucleotide density, and the signal for 10- or 11-base periodicity of AA/TT or AAA/TTT. These intrinsic quantities are related to structural features of the DNA double helix, such as base-pair binding, untwisting or unwinding, stiffness, and a putative tendency for nucleosome formation.
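The underlying measurement is simply (G+C)% in sliding windows; a Fourier transform of the resulting profile would show the reported periodicity as a low-frequency peak. A sketch with window and step sizes chosen for illustration:

```python
import numpy as np

def gc_profile(seq, window=100_000, step=10_000):
    """(G+C)% in sliding windows along a chromosome sequence string."""
    codes = np.frombuffer(seq.upper().encode(), dtype=np.uint8)
    is_gc = (codes == ord("G")) | (codes == ord("C"))
    starts = range(0, len(codes) - window + 1, step)
    return np.array([is_gc[s:s + window].mean() * 100 for s in starts])

# np.fft.rfft(profile - profile.mean()) would then expose a ~500 kb cycle
# as a peak at the corresponding spatial frequency.
```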
Ng, Hui Wen; Doughty, Stephen W; Luo, Heng; Ye, Hao; Ge, Weigong; Tong, Weida; Hong, Huixiao
2015-12-21
Some chemicals in the environment possess the potential to interact with the endocrine system in the human body. Multiple receptors are involved in the endocrine system; estrogen receptor α (ERα) plays very important roles in endocrine activity and is the most studied receptor. Understanding and predicting estrogenic activity of chemicals facilitates the evaluation of their endocrine activity. Hence, we have developed a decision forest classification model to predict chemical binding to ERα using a large training data set of 3308 chemicals obtained from the U.S. Food and Drug Administration's Estrogenic Activity Database. We tested the model using cross validations and external data sets of 1641 chemicals obtained from the U.S. Environmental Protection Agency's ToxCast project. The model showed good performance in both internal (92% accuracy) and external validations (∼ 70-89% relative balanced accuracies), where the latter involved the validations of the model across different ER pathway-related assays in ToxCast. The important features that contribute to the prediction ability of the model were identified through informative descriptor analysis and were related to current knowledge of ER binding. Prediction confidence analysis revealed that the model had both high prediction confidence and accuracy for most predicted chemicals. The results demonstrated that the model constructed based on the large training data set is more accurate and robust for predicting ER binding of chemicals than the published models that have been developed using much smaller data sets. The model could be useful for the evaluation of ERα-mediated endocrine activity potential of environmental chemicals.
Shrivastava, Vimal K; Londhe, Narendra D; Sonawane, Rajendra S; Suri, Jasjit S
2015-10-01
A large percentage of a dermatologist's decision in psoriasis disease assessment is based on color. The current computer-aided diagnosis systems for psoriasis risk stratification and classification lack the vigor of the color paradigm. The paper presents an automated psoriasis computer-aided diagnosis (pCAD) system for classification of psoriasis skin images into psoriatic lesion and healthy skin, which solves two major challenges: (i) fulfilling the color feature requirements and (ii) selecting the powerful dominant color features while retaining high classification accuracy. Fourteen color spaces are explored for psoriasis disease analysis, leading to 86 color features. The pCAD system is implemented in a support-vector-based machine learning framework where the offline image data set is used for computing offline color machine learning parameters. These are then used to transform the online color features and predict the class labels for healthy vs. diseased cases. The above paradigm uses principal component analysis for selection of the dominant color features, keeping the original color features unaltered. Using a cross-validation protocol, the above machine learning protocol is compared against standalone grayscale features (60 features) and against a combined grayscale and color feature set of 146. Using a fixed data size of 540 images with equal numbers of healthy and diseased cases, a 10-fold cross-validation protocol, and an SVM with a polynomial kernel of degree two, the pCAD system shows an accuracy of 99.94% with sensitivity and specificity of 99.93% and 99.96%. Using a varying data size protocol, the mean classification accuracies for the color, grayscale, and combined scenarios are 92.85%, 93.83%, and 93.99%, respectively. The reliability of the system in these three scenarios is 94.42%, 97.39%, and 96.00%, respectively. We conclude that the pCAD system using color space alone is comparable to grayscale space or combined color and grayscale spaces. We validated our pCAD system against facial color databases and the results are consistent in accuracy and reliability. Copyright © 2015 Elsevier Ltd. All rights reserved.
WebViz:A Web-based Collaborative Interactive Visualization System for large-Scale Data Sets
NASA Astrophysics Data System (ADS)
Yuen, D. A.; McArthur, E.; Weiss, R. M.; Zhou, J.; Yao, B.
2010-12-01
WebViz is a web-based application designed to conduct collaborative, interactive visualizations of large data sets for multiple users, allowing researchers situated all over the world to utilize the visualization services offered by the University of Minnesota's Laboratory for Computational Sciences and Engineering (LCSE). This ongoing project has been built upon over the last 3 1/2 years. The motivation behind WebViz lies primarily with the need to parse through an increasing amount of data produced by the scientific community as a result of larger and faster multicore and massively parallel computers coming to the market, including the use of general purpose GPU computing. WebViz allows these large data sets to be visualized online by anyone with an account. The application allows users to save time and resources by visualizing data 'on the fly', wherever he or she may be located. By leveraging AJAX via the Google Web Toolkit (http://code.google.com/webtoolkit/), we are able to provide users with a remote web portal to LCSE's (http://www.lcse.umn.edu) large-scale interactive visualization system already in place at the University of Minnesota. LCSE's custom hierarchical volume rendering software provides high-resolution visualizations on the order of 15 million pixels and has been employed for visualizing data primarily from simulations in astrophysics and geophysical fluid dynamics. In the current version of WebViz, we have implemented a highly extensible back-end framework built around HTTP "server push" technology. The web application is accessible via a variety of devices including netbooks, iPhones, and other web- and javascript-enabled cell phones. Features in the current version include the ability for users to (1) securely login, (2) launch multiple visualizations, (3) conduct collaborative visualization sessions, (4) delegate control aspects of a visualization to others, and (5) engage in collaborative chats with other users within the user interface of the web application. These features are all in addition to a full range of essential visualization functions including 3-D camera and object orientation, position manipulation, time-stepping control, and custom color/alpha mapping.
Reproducibility of radiomics for deciphering tumor phenotype with imaging
NASA Astrophysics Data System (ADS)
Zhao, Binsheng; Tan, Yongqiang; Tsai, Wei-Yann; Qi, Jing; Xie, Chuanmiao; Lu, Lin; Schwartz, Lawrence H.
2016-03-01
Radiomics (radiogenomics) characterizes tumor phenotypes based on quantitative image features derived from routine radiologic imaging to improve cancer diagnosis, prognosis, prediction and response to therapy. Although radiomic features must be reproducible to qualify as biomarkers for clinical care, little is known about how routine imaging acquisition techniques/parameters affect reproducibility. To begin to fill this knowledge gap, we assessed the reproducibility of a comprehensive, commonly-used set of radiomic features using a unique, same-day repeat computed tomography data set from lung cancer patients. Each scan was reconstructed at 6 imaging settings, varying slice thicknesses (1.25 mm, 2.5 mm and 5 mm) and reconstruction algorithms (sharp, smooth). Reproducibility was assessed using the repeat scans reconstructed at identical imaging setting (6 settings in total). In separate analyses, we explored differences in radiomic features due to different imaging parameters by assessing the agreement of these radiomic features extracted from the repeat scans reconstructed at the same slice thickness but different algorithms (3 settings in total). Our data suggest that radiomic features are reproducible over a wide range of imaging settings. However, smooth and sharp reconstruction algorithms should not be used interchangeably. These findings will raise awareness of the importance of properly setting imaging acquisition parameters in radiomics/radiogenomics research.
Intelligent System Development Using a Rough Sets Methodology
NASA Technical Reports Server (NTRS)
Anderson, Gray T.; Shelton, Robert O.
1997-01-01
The purpose of this research was to examine the potential of the rough sets technique for developing intelligent models of complex systems from limited information. Rough sets are a simple but promising technology for extracting easily understood rules from data. The rough set methodology has been shown to perform well when used with a large set of exemplars, but its performance with sparse data sets is less certain. The difficulty is that rules will be developed based on just a few examples, each of which might have a large amount of noise associated with it. The question then becomes: what is the probability of a useful rule being developed from such limited information? One nice feature of rough sets is that in unusual situations the technique can give an answer of 'I don't know'. That is, if a case arises that is different from the cases the rough set rules were developed on, the methodology can recognize this and alert human operators. It can also be trained to do this when the desired action is unknown because conflicting examples apply to the same set of inputs. This summer's project was to look at combining rough set theory with statistical theory to develop confidence limits for rules developed by rough sets. Often it is important not to make a certain type of mistake (e.g., false positives or false negatives), so the rules must be biased toward preventing a catastrophic error rather than giving the most likely course of action. A method to determine the best course of action in the light of such constraints was examined. The resulting technique was tested with files containing electrical power line 'signatures' from the space shuttle and with decompression sickness data.
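The core rough-set construction partitions objects into indiscernibility classes and takes lower and upper approximations of a target concept; boundary objects are exactly the 'I don't know' cases the summary mentions. A minimal sketch on a toy decision table:

```python
def approximations(universe, target, attrs):
    """Lower/upper rough-set approximations of a target concept.
    universe: dict id -> attribute tuple; target: set of ids;
    attrs: attribute positions used for indiscernibility."""
    blocks = {}
    for obj, values in universe.items():        # indiscernibility classes
        key = tuple(values[a] for a in attrs)
        blocks.setdefault(key, set()).add(obj)
    lower = {o for b in blocks.values() if b <= target for o in b}  # certain
    upper = {o for b in blocks.values() if b & target for o in b}   # possible
    return lower, upper

# toy decision table: objects 1 and 2 are indiscernible, so object 1
# lands in the boundary region (the "I don't know" zone)
U = {1: ("hi", "y"), 2: ("hi", "y"), 3: ("lo", "n"), 4: ("hi", "n")}
print(approximations(U, target={1, 4}, attrs=[0, 1]))
```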
Enhanced attentional gain as a mechanism for generalized perceptual learning in human visual cortex.
Byers, Anna; Serences, John T
2014-09-01
Learning to better discriminate a specific visual feature (i.e., a specific orientation in a specific region of space) has been associated with plasticity in early visual areas (sensory modulation) and with improvements in the transmission of sensory information from early visual areas to downstream sensorimotor and decision regions (enhanced readout). However, in many real-world scenarios that require perceptual expertise, observers need to efficiently process numerous exemplars from a broad stimulus class as opposed to just a single stimulus feature. Some previous data suggest that perceptual learning leads to highly specific neural modulations that support the discrimination of specific trained features. However, the extent to which perceptual learning acts to improve the discriminability of a broad class of stimuli via the modulation of sensory responses in human visual cortex remains largely unknown. Here, we used functional MRI and a multivariate analysis method to reconstruct orientation-selective response profiles based on activation patterns in the early visual cortex before and after subjects learned to discriminate small offsets in a set of grating stimuli that were rendered in one of nine possible orientations. Behavioral performance improved across 10 training sessions, and there was a training-related increase in the amplitude of orientation-selective response profiles in V1, V2, and V3 when orientation was task relevant compared with when it was task irrelevant. These results suggest that generalized perceptual learning can lead to modified responses in the early visual cortex in a manner that is suitable for supporting improved discriminability of stimuli drawn from a large set of exemplars. Copyright © 2014 the American Physiological Society.
Bartesaghi, Alberto; Sapiro, Guillermo; Subramaniam, Sriram
2006-01-01
Electron tomography allows for the determination of the three-dimensional structures of cells and tissues at resolutions significantly higher than that which is possible with optical microscopy. Electron tomograms contain, in principle, vast amounts of information on the locations and architectures of large numbers of subcellular assemblies and organelles. The development of reliable quantitative approaches for the analysis of features in tomograms is an important problem, and a challenging prospect due to the low signal-to-noise ratios that are inherent to biological electron microscopic images. This is, in part, a consequence of the tremendous complexity of biological specimens. We report on a new method for the automated segmentation of HIV particles and selected cellular compartments in electron tomograms recorded from fixed, plastic-embedded sections derived from HIV-infected human macrophages. Individual features in the tomogram are segmented using a novel robust algorithm that finds their boundaries as global minimal surfaces in a metric space defined by image features. The optimization is carried out in a transformed spherical domain with the center an interior point of the particle of interest, providing a proper setting for the fast and accurate minimization of the segmentation energy. This method provides tools for the semi-automated detection and statistical evaluation of HIV particles at different stages of assembly in the cells and presents opportunities for correlation with biochemical markers of HIV infection. The segmentation algorithm developed here forms the basis of the automated analysis of electron tomograms and will be especially useful given the rapid increases in the rate of data acquisition. It could also enable studies of much larger data sets, such as those which might be obtained from the tomographic analysis of HIV-infected cells from studies of large populations. PMID:16190467
W-tree indexing for fast visual word generation.
Shi, Miaojing; Xu, Ruixin; Tao, Dacheng; Xu, Chao
2013-03-01
The bag-of-visual-words representation has been widely used in image retrieval and visual recognition. The most time-consuming step in obtaining this representation is visual word generation, i.e., assigning visual words to the corresponding local features in a high-dimensional space. Recently, structures based on multibranch trees and forests have been adopted to reduce the time cost. However, these approaches cannot perform well without a large number of backtrackings. In this paper, by considering the spatial correlation of local features, we significantly speed up the time-consuming visual word generation process while maintaining accuracy. In particular, visual words associated with certain structures frequently co-occur; hence, we can build a co-occurrence table for each visual word for a large-scale data set. By associating each visual word with a probability according to the corresponding co-occurrence table, we can assign a probabilistic weight to each node of a certain index structure (e.g., a KD-tree or a K-means tree), in order to re-direct the searching path close to its global optimum within a small number of backtrackings. We carefully study the proposed scheme by comparing it with the fast library for approximate nearest neighbors and random KD-trees on the Oxford data set. Thorough experimental results suggest the efficiency and effectiveness of the new scheme.
Minutia Tensor Matrix: A New Strategy for Fingerprint Matching
Fu, Xiang; Feng, Jufu
2015-01-01
Establishing correspondences between two minutia sets is a fundamental issue in fingerprint recognition. This paper proposes a new tensor matching strategy. First, the concept of the minutia tensor matrix (simplified as MTM) is proposed. It describes the first-order and second-order features of a matching pair. In the MTM, diagonal elements indicate similarities of minutia pairs and non-diagonal elements indicate pairwise compatibilities between minutia pairs. Correct minutia pairs are likely to establish both large similarities and large compatibilities, so they form a dense sub-block. Minutia matching is then formulated as recovering the dense sub-block in the MTM. This is a new tensor matching strategy for fingerprint recognition. Second, as fingerprint images show both local rigidity and global nonlinearity, we design two different kinds of MTMs: a local MTM and a global MTM. Meanwhile, a two-level matching algorithm is proposed. At the local matching level, the local MTM is constructed and a novel local similarity calculation strategy is proposed; it makes full use of local rigidity in fingerprints. At the global matching level, the global MTM is constructed to calculate similarities of entire minutia sets; it makes full use of global compatibility in fingerprints. The proposed method has stronger descriptive ability and better robustness to noise and nonlinearity. Experiments conducted on the Fingerprint Verification Competition databases (FVC2002 and FVC2004) demonstrate its effectiveness and efficiency. PMID:25822489
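One standard way to recover a dense sub-block of an affinity matrix like the MTM is spectral: the principal eigenvector concentrates weight on mutually compatible pairs. The power-iteration sketch below illustrates that idea; the paper's own recovery procedure may differ:

```python
import numpy as np

def recover_dense_subblock(M, iters=100):
    """Principal eigenvector of a non-negative pairwise-affinity matrix M
    via power iteration; a standard spectral-matching heuristic."""
    v = np.ones(len(M)) / np.sqrt(len(M))
    for _ in range(iters):
        v = M @ v                # power iteration step
        v /= np.linalg.norm(v)
    return v   # large entries mark candidate correct minutia pairs
```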
Feature Selection for Chemical Sensor Arrays Using Mutual Information
Wang, X. Rosalind; Lizier, Joseph T.; Nowotny, Thomas; Berna, Amalia Z.; Prokopenko, Mikhail; Trowell, Stephen C.
2014-01-01
We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set, and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study using a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near optimal features for chemical sensor arrays. PMID:24595058
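In its simplest form, the filter approach scores each feature by mutual information with the class label and keeps the top k. The univariate sketch below, using scikit-learn's estimator, is a stand-in for the paper's set-wise criterion, which evaluates the joint information of feature combinations:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def top_k_features(X, y, k=8):
    """Rank sensor-array features by estimated mutual information with the
    chemical identity and keep the k most informative ones."""
    mi = mutual_info_classif(X, y, random_state=0)
    return np.argsort(mi)[::-1][:k]   # indices of the top-k features
```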
LoBue, Vanessa; Baker, Lewis; Thrasher, Cat
2017-08-10
Researchers have been interested in the perception of human emotional expressions for decades. Importantly, most empirical work in this domain has relied on controlled stimulus sets of adults posing for various emotional expressions. Recently, the Child Affective Facial Expression (CAFE) set was introduced to the scientific community, featuring a large validated set of photographs of preschool aged children posing for seven different emotional expressions. Although the CAFE set was extensively validated using adult participants, the set was designed for use with children. It is therefore necessary to verify that adult validation applies to child performance. In the current study, we examined 3- to 4-year-olds' identification of a subset of children's faces in the CAFE set, and compared it to adult ratings cited in previous research. Our results demonstrate an exceptionally strong relationship between adult ratings of the CAFE photos and children's ratings, suggesting that the adult validation of the set can be applied to preschool-aged participants. The results are discussed in terms of methodological implications for the use of the CAFE set with children, and theoretical implications for using the set to study the development of emotion perception in early childhood.
Walsh-Hadamard transform kernel-based feature vector for shot boundary detection.
Lakshmi, Priya G G; Domnic, S
2014-12-01
Video shot boundary detection (SBD) is the first step of video analysis, summarization, indexing, and retrieval. In the SBD process, videos are segmented into basic units called shots. In this paper, a new SBD method is proposed using color, edge, texture, and motion strength as a vector of features (feature vector). Features are extracted by projecting the frames on selected basis vectors of the Walsh-Hadamard transform (WHT) kernel and the WHT matrix. After extracting the features, weights are calculated based on the significance of the features. The weighted features are combined to form a single continuity signal, used as input for the Procedure Based shot transition Identification process (PBI). Using the procedure, shot transitions are classified into abrupt and gradual transitions. Experimental results are examined using large-scale test sets provided by TRECVID 2007, which evaluated hard cut and gradual transition detection. To evaluate the robustness of the proposed method, a system-level evaluation is performed. The proposed method yields an F1-score of 97.4% for cut, 78% for gradual, and 96.1% for overall transitions. We have also evaluated the proposed feature vector with a support vector machine classifier. The results show that WHT-based features perform better than other existing methods. In addition, a few more video sequences are taken from the Open Video Project and the performance of the proposed method is compared with a recent existing SBD method.
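The WHT projection step can be sketched compactly; the block size, kernel normalization, and choice of retained coefficients below are assumptions for illustration, not the paper's exact basis-vector selection.

```python
# Sketch: project a frame block onto Walsh-Hadamard basis vectors to get
# a compact feature vector, one ingredient of the SBD feature set above.
import numpy as np
from scipy.linalg import hadamard

N = 8
H = hadamard(N)                        # 8x8 WHT kernel (+1/-1 entries)
frame_block = np.random.rand(N, N)     # toy grayscale block
coeffs = H @ frame_block @ H.T / N     # 2-D WHT projection
feature = coeffs[:2, :2].ravel()       # keep a few low-order coefficients
print(feature)
```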
X-ray EM simulation tool for ptychography dataset construction
NASA Astrophysics Data System (ADS)
Stoevelaar, L. Pjotr; Gerini, Giampiero
2018-03-01
In this paper, we present an electromagnetic full-wave modeling framework, as a supporting EM tool providing data sets for X-ray ptychographic imaging. Modeling the entire scattering problem with Finite Element Method (FEM) tools is, in fact, a prohibitive task, because of the large area illuminated by the beam (due to the poor focusing power at these wavelengths) and the very small features to be imaged. To overcome this problem, the spectrum of the illumination beam is decomposed into a discrete set of plane waves. This allows reducing the electromagnetic modeling volume to the one enclosing the area to be imaged. The total scattered field is reconstructed by superimposing the solutions for each plane-wave illumination.
Feature Selection Has a Large Impact on One-Class Classification Accuracy for MicroRNAs in Plants.
Yousef, Malik; Saçar Demirci, Müşerref Duygu; Khalifa, Waleed; Allmer, Jens
2016-01-01
MicroRNAs (miRNAs) are short RNA sequences involved in posttranscriptional gene regulation. Their experimental analysis is complicated and, therefore, needs to be supplemented with computational miRNA detection. Currently, computational miRNA detection is mainly performed using machine learning and in particular two-class classification. For machine learning, the miRNAs need to be parametrized, and more than 700 features have been described. Positive training examples for machine learning are readily available, but negative data are hard to come by. Therefore, it seems preferable to use one-class classification instead of two-class classification. Previously, we were able to almost reach two-class classification accuracy using one-class classifiers. In this work, we employ feature selection procedures in conjunction with one-class classification and show that there is up to a 36% difference in accuracy among these feature selection methods. The best feature set allowed the training of a one-class classifier which achieved an average accuracy of ~95.6%, thereby outperforming previous two-class-based plant miRNA detection approaches by about 0.5%. We believe that this can be improved upon in the future by rigorous filtering of the positive training examples and by improving current feature clustering algorithms to better target pre-miRNA feature selection.
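A minimal sketch of the one-class setup, assuming a positive-only feature matrix and using a variance filter plus a one-class SVM as stand-ins for the feature selection procedures and classifiers compared in the paper:

```python
# Sketch: train only on positive pre-miRNA feature vectors, then score
# unseen candidates. The feature matrix and filter step are toy assumptions.
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X_pos = rng.normal(loc=1.0, size=(200, 50))   # toy positive examples only
selector = VarianceThreshold(threshold=0.5).fit(X_pos)
clf = OneClassSVM(nu=0.1, gamma="scale").fit(selector.transform(X_pos))

X_new = rng.normal(size=(10, 50))             # unseen candidates
print(clf.predict(selector.transform(X_new))) # +1 = predicted pre-miRNA
```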
Andrabi, Munazah; Hutchins, Andrew Paul; Miranda-Saavedra, Diego; Kono, Hidetoshi; Nussinov, Ruth; Mizuguchi, Kenji; Ahmad, Shandar
2017-06-22
DNA shape is emerging as an important determinant of transcription factor binding beyond just the DNA sequence. The only tool for large-scale DNA shape estimates, DNAshape, was derived from Monte Carlo simulations and predicts four broad and static DNA shape features: Propeller twist, Helical twist, Minor groove width, and Roll. The contributions of other shape features, e.g. Shift, Slide, and Opening, cannot be evaluated using DNAshape. Here, we report a novel method, DynaSeq, which predicts molecular dynamics-derived ensembles of a more exhaustive set of DNA shape features. We compared the DNAshape and DynaSeq predictions for the common features and applied both to predict the genome-wide binding sites of 1312 TFs available from protein interaction quantification (PIQ) data. The results indicate a good agreement between the two methods for the common shape features and point to advantages in using DynaSeq. Predictive models employing ensembles from individual conformational parameters revealed that base-pair opening - known to be important in strand separation - was the best predictor of transcription factor-binding sites (TFBS), followed by the features employed by DNAshape. Of note, TFBS could be predicted not only from the features at the target motif sites, but also from those as far as 200 nucleotides away from the motif.
NASA Astrophysics Data System (ADS)
Székely, B.; Karátson, D.; Koma, Zs.; Dorninger, P.; Wörner, G.; Brandmeier, M.; Nothegger, C.
2012-04-01
The Western slope of the Central Andes between 22° and 17°S is characterized by large, quasi-planar landforms with tilted ignimbrite surfaces and overlying younger sedimentary deposits (e.g. Nazca, Oxaya, Huaylillas ignimbrites). These surfaces were modified only by tectonic uplift and tilting of the Western Cordillera, preserving minor, now-fossilized drainage systems. Several deep canyons started to form from about 5 Ma ago. Due to tectonic oversteepening in an arid region of very low erosion rates, gravitational collapses and landslides additionally modified the Andean slope and valley flanks. Large areas of fossil surfaces, however, remain. The age of these surfaces has been dated between 11 Ma and 25 Ma at elevations of 3500 m in the Precordillera and at c. 1000 m near the coast. Due to their excellent preservation, our aim is to identify, delineate, and reconstruct these original ignimbrite and sediment surfaces via a sophisticated evaluation of SRTM DEMs. The technique we use here is a robust morphological segmentation method that is insensitive to a certain amount of outliers, even if they are spatially correlated. This paves the way to identify common local planar features and to combine these into larger areas of a particular surface segment. Erosional dissection and faulting, tilting and folding define subdomains, and thus the original quasi-planar surfaces are modified. Additional processes may create younger surfaces, such as sedimentary floodplains and salt pans. The procedure is tuned to provide a distinction of these features. The technique is based on the evaluation of local normal vectors (perpendicular to the actual surface) that are obtained by determination of locally fitting planes. This initial set of normal vectors is then gradually classified into groups with similar properties, providing candidate point clouds that are quasi-coplanar. The quasi-coplanar sets of points are analysed further against other criteria, such as the minimum number of points, the maximum standard deviation of spatial scatter, the maximum point-to-plane distance, etc. SRTM DEMs of selected areas of the Western slope of the Central Andes have been processed with various parameter sets. The resulting domain structure shows strong correlation with tectonic features (e.g. faulting) and younger depositional surfaces, whereas other segmentation features appear or disappear depending on the parameters of the analysis. For example, a fine segmentation results - for a given study area - in ca. 2500 planar features (of course, not all are geologically meaningful), whereas a more meaningful result has an order of magnitude fewer planes, ca. 270. The latter segmentation still covers the key areas, and the dissecting features (e.g., large incised canyons) are typically identified. For the fine segmentation version, an area of 3863 km2 is covered by fitted planes for the ignimbrite surfaces, whereas for the more robust segmentation this area is 2555 km2. The same values for the sedimentary surfaces are 3162 km2 and 2080 km2, respectively. The total processed area was 14498 km2. As the previous numbers and the 18.1% and 18.6% decrease in the coverage suggest, the robust segmentation remains meaningful for large parts of the area while the number of planar features decreased by an order of magnitude. This result also emphasizes the importance of the initial parameters.
To verify the results in more detail, residuals (the difference between measured and modelled elevation) are also evaluated, and the results are fed back into the segmentation procedure. Steeper landscapes (young volcanic edifices) are clearly separated from higher-order (long-wavelength) structures. This method allows us to quantitatively identify uniform surface segments and to relate these to geologically and morphologically meaningful parameters (type of depositional surface, rock type, surface age).
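The first step of the segmentation, estimating local normal vectors by fitting planes to DEM neighborhoods, can be sketched as below; the window size, cell size, and plain least-squares fit are illustrative assumptions rather than the authors' robust estimator.

```python
# Sketch: local surface normals on a DEM from per-pixel plane fits.
import numpy as np

def local_normals(dem, cell=90.0, w=3):
    h = w // 2
    normals = np.zeros(dem.shape + (3,))
    ys, xs = np.mgrid[-h:h + 1, -h:h + 1]
    A = np.c_[xs.ravel() * cell, ys.ravel() * cell, np.ones(w * w)]
    for i in range(h, dem.shape[0] - h):
        for j in range(h, dem.shape[1] - h):
            z = dem[i - h:i + h + 1, j - h:j + h + 1].ravel()
            a, b, _ = np.linalg.lstsq(A, z, rcond=None)[0]  # z = a*x + b*y + c
            n = np.array([-a, -b, 1.0])
            normals[i, j] = n / np.linalg.norm(n)           # unit normal
    return normals

dem = np.cumsum(np.random.rand(20, 20), axis=0)  # toy tilted surface
print(local_normals(dem)[10, 10])
```

Grouping these normals into quasi-coplanar clusters would then follow as the classification stage described in the abstract.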
How Many Dystonias? Clinical Evidence.
Albanese, Alberto
2017-01-01
Literary reports on dystonia date back to post-Medieval times. Medical reports are instead more recent. We review here the early descriptions and the historical establishment of a consensus on the clinical phenomenology and the diagnostic features of dystonia syndromes. Lumping and splitting exercises have characterized this area of knowledge, and it remains largely unclear how many dystonia types we are to count. This review describes the history leading to recognize that focal dystonia syndromes are a coherent clinical set encompassing cranial dystonia (including blepharospasm), oromandibular dystonia, spasmodic torticollis, truncal dystonia, writer's cramp, and other occupational dystonias. Papers describing features of dystonia and diagnostic criteria are critically analyzed and put into historical perspective. Issues and inconsistencies in this lumping effort are discussed, and the currently unmet needs are critically reviewed.
FEATURE 3, LARGE GUN POSITION, ARMCO HUT (FEATURE 4) IN ...
FEATURE 3, LARGE GUN POSITION, ARMCO HUT (FEATURE 4) IN BACKGROUND, VIEW FACING NORTH. - Naval Air Station Barbers Point, Anti-Aircraft Battery Complex-Large Gun Position, East of Coral Sea Road, northwest of Hamilton Road, Ewa, Honolulu County, HI
Fox, Eric W; Hill, Ryan A; Leibowitz, Scott G; Olsen, Anthony R; Thornbrugh, Darren J; Weber, Marc H
2017-07-01
Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological data sets, there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables is used or stepwise procedures are employed which iteratively remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating data set consists of the good/poor condition of n = 1365 stream survey sites from the 2008/2009 National Rivers and Streams Assessment, and a large set (p = 212) of landscape features from the StreamCat data set as potential predictors. We compare two types of RF models: a full variable set model with all 212 predictors and a reduced variable set model selected using a backward elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substantial improvement in cross-validated accuracy as a result of variable reduction. Moreover, the backward elimination procedure tended to select too few variables and exhibited numerous issues such as upwardly biased out-of-bag accuracy estimates and instabilities in the spatial predictions. We use simulations to further support and generalize results from the analysis of real data. A main purpose of this work is to elucidate issues of model selection bias and instability to ecologists interested in using RF to develop predictive models with large environmental data sets.
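A minimal sketch of importance-based backward elimination with out-of-bag tracking, under toy data and an assumed drop-20%-per-iteration schedule (the paper's exact schedule and stopping rule may differ):

```python
# Sketch: backward elimination for a random forest, tracking the OOB score
# as the least-important predictors are dropped each iteration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=60, n_informative=8,
                           random_state=0)
keep = np.arange(X.shape[1])
while len(keep) > 5:
    rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                                random_state=0).fit(X[:, keep], y)
    print(len(keep), round(rf.oob_score_, 3))
    drop = np.argsort(rf.feature_importances_)[:len(keep) // 5 or 1]
    keep = np.delete(keep, drop)       # remove the least-important ~20%
```

Note that, as the abstract warns, reusing the OOB score inside the selection loop can upwardly bias the accuracy estimate; an external validation fold avoids this.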
Wang, Dayong; Otto, Charles; Jain, Anil K
2017-06-01
Given the prevalence of social media websites, one challenge facing computer vision researchers is to devise methods to search for persons of interest among the billions of shared photos on these websites. Despite significant progress in face recognition, searching a large collection of unconstrained face images remains a difficult problem. To address this challenge, we propose a face search system which combines a fast search procedure, coupled with a state-of-the-art commercial off-the-shelf (COTS) matcher, in a cascaded framework. Given a probe face, we first filter the large gallery of photos to find the top-k most similar faces using features learned by a convolutional neural network. The k retrieved candidates are re-ranked by combining similarities based on deep features and those output by the COTS matcher. We evaluate the proposed face search system on a gallery containing 80 million web-downloaded face images. Experimental results demonstrate that while the deep features perform worse than the COTS matcher on a mugshot dataset (93.7 percent versus 98.6 percent TAR@FAR of 0.01 percent), fusing the deep features with the COTS matcher improves the overall performance (99.5 percent TAR@FAR of 0.01 percent). This shows that the learned deep features provide complementary information over representations used in state-of-the-art face matchers. On the unconstrained face image benchmarks, the performance of the learned deep features is competitive with reported accuracies. LFW database: 98.20 percent accuracy under the standard protocol and 88.03 percent TAR@FAR of 0.1 percent under the BLUFR protocol; IJB-A benchmark: 51.0 percent TAR@FAR of 0.1 percent (verification), rank-1 retrieval of 82.2 percent (closed-set search), 61.5 percent FNIR@FAR of 1 percent (open-set search). The proposed face search system offers an excellent trade-off between accuracy and scalability on galleries with millions of images. Additionally, in a face search experiment involving photos of the Tsarnaev brothers, convicted of the Boston Marathon bombing, the proposed cascade face search system could find the younger brother's (Dzhokhar Tsarnaev) photo at rank 1 in 1 second on a 5 M gallery and at rank 8 in 7 seconds on an 80 M gallery.
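The cascade can be sketched in a few lines: a fast deep-feature filter followed by fused re-ranking of the top-k. Here `cots_score` is a hypothetical stand-in for the commercial matcher, and the fusion weight is an assumption.

```python
# Sketch: two-stage cascade search over a gallery of deep feature vectors.
import numpy as np

def cots_score(probe, idx):                   # hypothetical slow matcher
    return np.random.rand(len(idx))

def search(probe, gallery, k=100, alpha=0.5):
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    p = probe / np.linalg.norm(probe)
    sims = g @ p                              # stage 1: fast cosine scores
    top_k = np.argsort(sims)[::-1][:k]
    fused = alpha * sims[top_k] + (1 - alpha) * cots_score(probe, top_k)
    return top_k[np.argsort(fused)[::-1]]     # stage 2: re-ranked candidates

gallery = np.random.rand(10000, 256)          # toy 256-d deep features
print(search(np.random.rand(256), gallery)[:5])
```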
Optimizing methods for linking cinematic features to fMRI data.
Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia
2015-04-15
One of the challenges of naturalistic neurosciences using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than of story-driven films, new methods need to be developed for the analysis of less story-driven contents. To optimize the linkage between our fMRI data, collected during viewing of the deliberately non-narrative silent film 'At Land' by Maya Deren (1944), and its annotated content, we combined the method of elastic-net regularization with model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with the time-series of a total of 36 binary-valued annotations and one real-valued tactile annotation of film features. Elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of the regressors; the results were compared against both partial least-squares (PLS) regression and the un-regularized full-model regression. A non-parametric permutation testing scheme was applied to evaluate the statistical significance of the regression. We found statistically significant correlation between the annotation model and 9 out of 40 ICs. The regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found the elastic-net based regression more sensitive than PLS and un-regularized regression, since it detected a larger number of significant ICs and ROIs. Along with the ISC ranking methods, our regression analysis proved to be a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. The novelty of our method lies - in comparison to the hypothesis-driven manual pre-selection and observation of some individual regressors biased by choice - in applying a data-driven approach to all content features simultaneously. We found especially the combination of regularized regression and ICA useful when analyzing fMRI data obtained using a non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.
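A hedged sketch of the regularized regression step, fitting one ROI time-series on a binary annotation matrix with cross-validated elastic net; the toy regressors below stand in for the 37 film annotations.

```python
# Sketch: elastic-net fit of an ROI time-series on stimulus annotations.
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(2)
annotations = rng.integers(0, 2, size=(600, 37)).astype(float)  # 37 regressors
roi = annotations[:, [3, 11]].sum(axis=1) + rng.normal(0, 1, 600)

model = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5).fit(annotations, roi)
print(np.nonzero(model.coef_)[0])   # annotations retained by the penalty
```

The L1 component handles the multicollinearity the abstract mentions by zeroing redundant regressors, while the L2 component keeps correlated informative ones grouped.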
Internal Gravity Waves: Generation and Breaking Mechanisms by Laboratory Experiments
NASA Astrophysics Data System (ADS)
la Forgia, Giovanni; Adduce, Claudia; Falcini, Federico
2016-04-01
Internal gravity waves (IGWs), occurring within estuaries and the coastal oceans, are manifest as large-amplitude undulations of the pycnocline. IGWs propagating horizontally in a two-layer stratified fluid are studied. The breaking of an IGW of depression shoaling upon a uniformly sloping boundary is investigated experimentally. Breaking dynamics beneath the shoaling waves cause both mixing and wave-induced near-bottom vortices that suspend and redistribute the bed material. Laboratory experiments are conducted in a Perspex tank through the standard lock-release method, following the technique described in Sutherland et al. (2013). Each experiment is analysed and the instantaneous pycnocline position is measured, in order to obtain both the geometric and kinematic features of the IGW: amplitude, wavelength, and celerity. The main features of IGWs depend on the geometrical parameters that define the initial experimental setting: the density difference between the layers, the total depth, the layer depth ratio, the aspect ratio, and the displacement between the pycnoclines. Relations between the IGWs' geometric and kinematic features and the initial setting parameters are analysed. The approach of the IGWs toward a uniform slope is investigated in the present experiments. Depending on wave and slope characteristics, different breaking and mixing processes are observed. Sediments are sprinkled on the slope to visualize boundary-layer separation in order to analyse the suspension and redistribution mechanisms due to the wave breaking.
When will low-contrast features be visible in a STEM X-ray spectrum image?
Parish, Chad M.
2015-04-01
When will a small or low-contrast feature, such as an embedded second-phase particle, be visible in a scanning transmission electron microscopy (STEM) X-ray map? This work illustrates a computationally inexpensive method to simulate X-ray maps and spectrum images (SIs), based upon the equations of X-ray generation and detection. To particularize the general procedure, an example of a nanostructured ferritic alloy (NFA) containing nm-sized Y2Ti2O7 precipitates embedded in a ferritic stainless steel matrix is chosen. The proposed model produces physically realistic simulated SI data sets, which can either be reduced to X-ray dot maps or analyzed via multivariate statistical analysis. Comparison to NFA X-ray maps acquired using three different STEM instruments matches the generated simulations quite well, despite the large number of simplifying assumptions used. A figure of merit of electron dose multiplied by X-ray collection solid angle is proposed to compare feature detectability from one data set (simulated or experimental) to another. The proposed method can scope experiments that are feasible under specific analysis conditions on a given microscope. As a result, future applications, such as spallation proton–neutron irradiations, core-shell nanoparticles, or dopants in polycrystalline photovoltaic solar cells, are proposed.
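The proposed figure of merit is simply electron dose multiplied by X-ray collection solid angle; a trivial helper makes the comparison explicit (units and values below are illustrative assumptions).

```python
# Sketch: dose x solid-angle figure of merit for comparing acquisitions.
def figure_of_merit(dose_e_per_nm2, solid_angle_sr):
    """Electron dose multiplied by X-ray collection solid angle."""
    return dose_e_per_nm2 * solid_angle_sr

# Two hypothetical sessions: the higher FOM suggests better detectability.
print(figure_of_merit(1e4, 0.7), figure_of_merit(5e3, 1.2))
```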
Optical phased array configuration for an extremely large telescope.
Meinel, Aden Baker; Meinel, Marjorie Pettit
2004-01-20
Extremely large telescopes are currently under consideration by several groups in several countries. Extrapolation of current technology up to 30 m indicates a cost of over dollars 1 billion. Innovative concepts are being explored to find significant cost reductions. We explore the concept of an Optical Phased Array (OPA) telescope. Each element of the OPA is a separate Cassegrain telescope. Collimated beams from the array are sent via an associated set of delay lines to a central beam combiner. This array of small telescope elements offers the possibility of starting with a low-cost array of a few rings of elements, adding structure and additional Cass elements until the desired diameter telescope is attained. We address the salient features of such an extremely large telescope and cost elements relative to more conventional options.
Speech recognition features for EEG signal description in detection of neonatal seizures.
Temko, A; Boylan, G; Marnane, W; Lightbody, G
2010-01-01
In this work, features that are usually employed in automatic speech recognition (ASR) are used for the detection of neonatal seizures in newborn EEG. Three conventional ASR feature sets are compared to the feature set previously developed for this task. The results indicate that the thoroughly studied spectral-envelope-based ASR features perform reasonably well on their own. Additionally, the SVM Recursive Feature Elimination routine is applied to all extracted features pooled together. It is shown that ASR features consistently appear among the top-ranked features.
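A sketch of the SVM-RFE ranking step on pooled features, with toy data standing in for the combined ASR and EEG feature sets:

```python
# Sketch: SVM Recursive Feature Elimination over pooled features.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 40))        # pooled ASR + EEG features (toy)
y = rng.integers(0, 2, size=300)      # seizure / non-seizure labels (toy)

rfe = RFE(SVC(kernel="linear"), n_features_to_select=10).fit(X, y)
print(np.where(rfe.support_)[0])      # indices of top-ranked features
```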
Wessler, Benjamin S; Thaler, David E; Ruthazer, Robin; Weimar, Christian; Di Tullio, Marco R; Elkind, Mitchell S V; Homma, Shunichi; Lutz, Jennifer S; Mas, Jean-Louis; Mattle, Heinrich P; Meier, Bernhard; Nedeltchev, Krassen; Papetti, Federica; Di Angelantonio, Emanuele; Reisman, Mark; Serena, Joaquín; Kent, David M
2014-01-01
Patent foramen ovale (PFO) is associated with cryptogenic stroke (CS), although the pathogenicity of a discovered PFO in the setting of CS is typically unclear. Transesophageal echocardiography features such as PFO size, an associated hypermobile septum, and the presence of a right-to-left shunt at rest have all been proposed as markers of risk. The association of these transesophageal echocardiography features with other markers of pathogenicity has not been examined. We used a recently derived score based on clinical and neuroimaging features to stratify patients with PFO and CS by the probability that their stroke is PFO-attributable. We examined whether high-risk transesophageal echocardiography features are seen more frequently in patients more likely to have had a PFO-attributable stroke (n=637) compared with those less likely to have a PFO-attributable stroke (n=657). A large physiologic shunt size was not seen more frequently among those with probable PFO-attributable strokes (odds ratio [OR], 0.92; P=0.53). Neither a hypermobile septum nor a right-to-left shunt at rest was detected more often in those with a probable PFO-attributable stroke (OR, 0.80; P=0.45; OR, 1.15; P=0.11, respectively). We found no evidence that the proposed transesophageal echocardiography risk markers of large PFO size, hypermobile septum, and presence of right-to-left shunt at rest are associated with clinical features suggesting that a CS is PFO-attributable. Additional tools to describe PFOs may be useful in helping to determine whether an observed PFO is incidental or pathogenically related to CS.
Diller, Kyle I; Bayden, Alexander S; Audie, Joseph; Diller, David J
2018-01-01
There is growing interest in peptide-based drug design and discovery. Due to their relatively large size, polymeric nature, and chemical complexity, the design of peptide-based drugs presents an interesting "big data" challenge. Here, we describe an interactive computational environment, PeptideNavigator, for naturally exploring the tremendous amount of information generated during a peptide drug design project. The purpose of PeptideNavigator is the presentation of large and complex experimental and computational data sets, particularly 3D data, so as to enable multidisciplinary scientists to make optimal decisions during a peptide drug discovery project. PeptideNavigator provides users with numerous viewing options, such as scatter plots, sequence views, and sequence frequency diagrams. These views allow for the collective visualization and exploration of many peptides and their properties, ultimately enabling the user to focus on a small number of peptides of interest. To drill down into the details of individual peptides, PeptideNavigator provides users with a Ramachandran plot viewer and a fully featured 3D visualization tool. Each view is linked, allowing the user to seamlessly navigate from collective views of large peptide data sets to the details of individual peptides with promising property profiles. Two case studies, based on MHC-1A activating peptides and MDM2 scaffold design, are presented to demonstrate the utility of PeptideNavigator in the context of disparate peptide-design projects. Copyright © 2017 Elsevier Ltd. All rights reserved.
Automatic feature design for optical character recognition using an evolutionary search procedure.
Stentiford, F W
1985-03-01
An automatic evolutionary search is applied to the problem of feature extraction in an OCR application. A performance measure based on feature independence is used to generate features which do not appear to suffer from peaking effects [17]. Features are extracted from a training set of 30,600 machine-printed 34-class alphanumeric characters derived from British mail. Classification results on the training set and a test set of 10,200 characters are reported for an increasing number of features. A 1.01 percent forced-decision error rate is obtained on the test data using 316 features. The hardware implementation should be cheap and fast to operate. The performance compares favorably with current low-cost OCR page readers.
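A hedged sketch of an evolutionary (mutate-and-select) search over binary feature masks; plain cross-validated accuracy stands in for the paper's independence-based performance measure, and the digits data are a stand-in for the mail characters.

```python
# Sketch: hill-climbing evolutionary search over binary feature masks.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(4)
mask = rng.random(X.shape[1]) < 0.5        # random initial feature mask

def fitness(m):
    if m.sum() == 0:
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, m], y, cv=3).mean()

best = fitness(mask)
for _ in range(20):                        # mutate-and-select loop
    cand = mask.copy()
    flip = rng.integers(0, len(cand), size=3)
    cand[flip] = ~cand[flip]               # flip a few feature bits
    f = fitness(cand)
    if f > best:
        mask, best = cand, f
print(int(mask.sum()), round(best, 3))
```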
Wong, Gerard; Leckie, Christopher; Kowalczyk, Adam
2012-01-15
Feature selection is a key concept in machine learning for microarray datasets, where the number of features, represented by probesets, is typically several orders of magnitude larger than the available sample size. Computational tractability is a key challenge for feature selection algorithms in handling very high-dimensional datasets beyond a hundred thousand features, such as datasets produced on single nucleotide polymorphism microarrays. In this article, we present a novel feature set reduction approach that enables scalable feature selection on datasets with hundreds of thousands of features and beyond. Our approach enables more efficient handling of higher-resolution datasets to achieve better disease subtype classification of samples for potentially more accurate diagnosis and prognosis, which allows clinicians to make more informed decisions with regard to patient treatment options. We applied our feature set reduction approach to several publicly available cancer single nucleotide polymorphism (SNP) array datasets and evaluated its performance in terms of its multiclass predictive classification accuracy over different cancer subtypes, its speedup in execution, as well as its scalability with respect to sample size and array resolution. Feature Set Reduction (FSR) was able to reduce the dimensions of an SNP array dataset by more than two orders of magnitude while achieving at least equal, and in most cases superior, predictive classification performance compared with that achieved on features selected by existing feature selection methods alone. An examination of the biological relevance of frequently selected features from FSR-reduced feature sets revealed strong enrichment in association with cancer. FSR was implemented in MATLAB R2010b and is available at http://ww2.cs.mu.oz.au/~gwong/FSR.
Ataer-Cansizoglu, E; Kalpathy-Cramer, J; You, S; Keck, K; Erdogmus, D; Chiang, M F
2015-01-01
Inter-expert variability in image-based clinical diagnosis has been demonstrated in many diseases, including retinopathy of prematurity (ROP), a disease affecting low-birth-weight infants and a major cause of childhood blindness. In order to better understand the underlying causes of variability among experts, we propose a method to quantify the variability of expert decisions and analyze the relationship between expert diagnoses and features computed from the images. Identification of these features is relevant for the development of computer-based decision support systems and educational systems in ROP, and these methods may be applicable to other diseases where inter-expert variability is observed. The experiments were carried out on a dataset of 34 retinal images, each with diagnoses provided independently by 22 experts. Analysis was performed using concepts of Mutual Information (MI) and Kernel Density Estimation. A large set of structural features (a total of 66) was extracted from the retinal images. Feature selection was utilized to identify the most important features that correlated to the actual clinical decisions of the 22 study experts. The best three features for each observer were selected by an exhaustive search over all possible feature subsets, using joint MI as the relevance criterion. We also compared our results with those of Cohen's Kappa [36] as an inter-rater reliability measure. The results demonstrate that a group of observers (17 of 22) decide consistently with each other. The mean and second central moment of arteriolar tortuosity are among the reasons for disagreement between this group and the rest of the observers, meaning that this group of experts considers the amount of tortuosity as well as the variation of tortuosity in the image. Given a set of image-based features, the proposed analysis method can identify critical image-based features that lead to expert agreement and disagreement in the diagnosis of ROP. Although tree-based features and various statistics such as the central moment are not popular in the literature, our results suggest that they are important for diagnosis.
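The exhaustive three-feature search with joint MI can be sketched by discretizing the features and counting; the binning, the toy data, and the joint-label trick below are assumptions, not the authors' estimator.

```python
# Sketch: exhaustive search for the 3-feature subset with maximal joint MI
# against an observer's diagnoses, using coarse discretization.
import itertools
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(5)
X = rng.normal(size=(34, 8))                 # toy image features
y = rng.integers(0, 3, size=34)              # toy expert diagnoses

Xd = np.digitize(X, np.quantile(X, [0.33, 0.66]))   # 3-level bins
best = max(
    itertools.combinations(range(X.shape[1]), 3),
    key=lambda s: mutual_info_score(
        y, [hash(tuple(row)) for row in Xd[:, s]])   # joint discrete label
)
print(best)
```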
Data Exploration using Unsupervised Feature Extraction for Mixed Micro-Seismic Signals
NASA Astrophysics Data System (ADS)
Meyer, Matthias; Weber, Samuel; Beutel, Jan
2017-04-01
We present a system for the analysis of data originating from a multi-sensor and multi-year experiment focusing on slope stability and its underlying processes in fractured permafrost rock walls, undertaken at 3500 m a.s.l. on the Matterhorn Hörnligrat (Zermatt, Switzerland). This system incorporates facilities for the transmission, management and storage of large volumes of data (about 7 GB/day), preprocessing and aggregation of multiple sensor types, machine-learning based automatic feature extraction for micro-seismic and acoustic emission data, and interactive web-based visualization of the data. Specifically, a combination of three types of sensors is used to profile the frequency spectrum from 1 Hz to 80 kHz, with the goal of identifying the relevant destructive processes (e.g. micro-cracking and fracture propagation) leading to the eventual destabilization of large rock masses. The sensors installed for this profiling experiment (2 geophones, 1 accelerometer and 2 piezo-electric sensors for detecting acoustic emission) are further augmented with sensors originating from a previous activity focusing on long-term monitoring of temperature evolution and rock kinematics with the help of wireless sensor networks (crackmeters, cameras, weather station, rock temperature profiles, differential GPS) [Hasler2012]. In raw format, the data generated by the different types of sensors, specifically the micro-seismic and acoustic emission sensors, are strongly heterogeneous, in part unsynchronized, and the storage and processing demand is large. Therefore, a purpose-built signal preprocessing and event-detection system is used. While the analysis of data from each individual sensor follows established methods, the application of all these sensor types in combination within a field experiment is unique. Furthermore, experience and methods from using such sensors in laboratory settings cannot be readily transferred to the mountain field site setting with its scale and full exposure to the natural environment. Consequently, many state-of-the-art algorithms for big data analysis and event classification requiring a ground truth dataset cannot be applied. The above-mentioned challenges require a tool for data exploration. In the presented system, data exploration is supported by unsupervised feature learning based on convolutional neural networks, which is used to automatically extract common features for preliminary clustering and outlier detection. With this information, an interactive web-tool allows for a fast identification of interesting time segments, on which segment-selective algorithms for visualization, feature extraction and statistics can be applied. The combination of manual labeling and unsupervised feature extraction provides an event catalog for the classification of different characteristic events related to the internal progression of micro-cracks in steep fractured bedrock permafrost. References: Hasler, A., S. Gruber, and J. Beutel (2012), Kinematics of steep bedrock permafrost, J. Geophys. Res., 117, F01016, doi:10.1029/2011JF001981.
Search time critically depends on irrelevant subset size in visual search.
Benjamins, Jeroen S; Hooge, Ignace T C; van Elst, Jacco C; Wertheim, Alexander H; Verstraten, Frans A J
2009-02-01
In order for our visual system to deal with the massive amount of sensory input, some of this input is discarded, while other parts are processed [Wolfe, J. M. (1994). Guided search 2.0: a revised model of visual search. Psychonomic Bulletin and Review, 1, 202-238]. From the visual search literature it is unclear how well observers can select one set of items that differs in only one feature from the target (a 1F set), while ignoring another set of items that differs in two features from the target (a 2F set). We systematically varied the percentage of 2F non-targets to determine the contribution of these non-targets to search behaviour. Increasing the percentage of 2F non-targets that have to be ignored was expected to result in increasingly faster search, since it decreases the size of the 1F set that has to be searched. Observers searched large displays for a target in the 1F set with a variable percentage of 2F non-targets. Interestingly, when the search displays contained 5% 2F non-targets, search times were longer than in the other conditions. This effect of 2F non-targets on performance was independent of set size. An inspection of the saccades revealed that saccade target selection did not contribute to the longer search times in displays with 5% 2F non-targets. The occurrence of longer search times in displays containing 5% 2F non-targets might be attributed to covert processes related to visual analysis of the fixated part of the display. Apparently, visual search performance critically depends on the percentage of irrelevant 2F non-targets.
Shrivastava, Vimal K; Londhe, Narendra D; Sonawane, Rajendra S; Suri, Jasjit S
2016-04-01
Psoriasis is an autoimmune skin disease with red and scaly plaques on the skin, affecting about 125 million people worldwide. Currently, dermatologists use visual and haptic methods to diagnose disease severity. This does not help them in stratification and risk assessment of the lesion stage and grade. Further, current methods add complexity during the monitoring and follow-up phase. The current diagnostic tools lead to subjectivity in decision making and are unreliable and laborious. This paper presents a first comparative performance study of its kind using a principal component analysis (PCA) based CADx system for psoriasis risk stratification and image classification utilizing: (i) 11 higher order spectra (HOS) features, (ii) 60 texture features, and (iii) 86 color features, and their seven combinations. In aggregate, 540 image samples (270 healthy and 270 diseased) from 30 psoriasis patients of Indian ethnic origin are used in our database. Machine learning using PCA is used for dominant feature selection, and the selected features are then fed to a support vector machine (SVM) classifier to obtain optimized performance. Three different protocols are implemented using the three kinds of feature sets. A reliability index of the CADx is computed. Among all feature combinations, the CADx system shows optimal performance of 100% accuracy, 100% sensitivity and specificity, when all three sets of features are combined. Further, our experimental results with increasing data size show that all feature combinations yield a high reliability index throughout the PCA cutoffs, except the color feature set and the combination of the color and texture feature sets. HOS features are powerful in psoriasis disease classification and stratification. Even though, independently, all three feature sets (HOS, texture, and color) perform competitively, the machine learning system performs best when they are combined. The system is fully automated, reliable and accurate. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
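A minimal sketch of the PCA-to-SVM pipeline, with a random matrix standing in for the combined HOS/texture/color features and a variance-retention cutoff as an assumed PCA criterion:

```python
# Sketch: PCA-based dominant feature selection feeding an SVM classifier.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
X = rng.normal(size=(540, 157))          # 11 HOS + 60 texture + 86 color (toy)
y = np.repeat([0, 1], 270)               # healthy vs. diseased labels

cadx = make_pipeline(StandardScaler(), PCA(n_components=0.95), SVC())
print(round(cross_val_score(cadx, X, y, cv=5).mean(), 3))
```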
Robust object matching for persistent tracking with heterogeneous features.
Guo, Yanlin; Hsu, Steve; Sawhney, Harpreet S; Kumar, Rakesh; Shan, Ying
2007-05-01
This paper addresses the problem of matching vehicles across multiple sightings under variations in illumination and camera poses. Since multiple observations of a vehicle are separated in large temporal and/or spatial gaps, thus prohibiting the use of standard frame-to-frame data association, we employ features extracted over a sequence during one time interval as a vehicle fingerprint that is used to compute the likelihood that two or more sequence observations are from the same or different vehicles. Furthermore, since our domain is aerial video tracking, in order to deal with poor image quality and large resolution and quality variations, our approach employs robust alignment and match measures for different stages of vehicle matching. Most notably, we employ a heterogeneous collection of features such as lines, points, and regions in an integrated matching framework. Heterogeneous features are shown to be important. Line and point features provide accurate localization and are employed for robust alignment across disparate views. The challenges of change in pose, aspect, and appearances across two disparate observations are handled by combining a novel feature-based quasi-rigid alignment with flexible matching between two or more sequences. However, since lines and points are relatively sparse, they are not adequate to delineate the object and provide a comprehensive matching set that covers the complete object. Region features provide a high degree of coverage and are employed for continuous frames to provide a delineation of the vehicle region for subsequent generation of a match measure. Our approach reliably delineates objects by representing regions as robust blob features and matching multiple regions to multiple regions using Earth Mover's Distance (EMD). Extensive experimentation under a variety of real-world scenarios and over hundreds of thousands of Confirmatory Identification (CID) trails has demonstrated about 95 percent accuracy in vehicle reacquisition with both visible and Infrared (IR) imaging cameras.
VizieR Online Data Catalog: Radial velocities of the Be star HR 2142 (Peters+, 2016)
NASA Astrophysics Data System (ADS)
Peters, G. J.; Wang, L.; Gies, D. R.; Grundstrom, E. D.
2016-11-01
Radial velocity measurements were made using the set of spectra summarized in Table 1. The main focus of this work is a set of 88 high-resolution SWP HIRES FUV spectra acquired over the lifetime of the International Ultraviolet Explorer (IUE) observatory. These were downloaded from MAST and resampled. We also collected a set of 49 LWR and LWP near-UV spectra that were used to inspect the orbital variations in the Mg II λλ2796, 2803 feature. The UV spectra were supplemented with a large collection of Hα spectra that we secured with the KPNO Coude Feed telescope and that were obtained by amateur astronomers participating in the Be Star Spectra database project (Pollmann 2007IBVS.5778....1P; Neiner et al. 2011AJ....142..149N). (2 data files).
One in the Dance: Musical Correlates of Group Synchrony in a Real-World Club Environment
Ellamil, Melissa; Berson, Joshua; Wong, Jen; Buckley, Louis; Margulies, Daniel S.
2016-01-01
Previous research on interpersonal synchrony has mainly investigated small groups in isolated laboratory settings, which may not fully reflect the complex and dynamic interactions of real-life social situations. The present study expands on this by examining group synchrony across a large number of individuals in a naturalistic environment. Smartphone acceleration measures were recorded from participants during a music set in a dance club and assessed to identify how group movement synchrony covaried with various features of the music. In an evaluation of different preprocessing and analysis methods, giving more weight to front-back movement provided the most sensitive and reliable measure of group synchrony. During the club music set, group synchrony of torso movement was most strongly associated with pulsations that approximate walking rhythm (100–150 beats per minute). Songs with higher real-world play counts were also correlated with greater group synchrony. Group synchrony thus appears to be constrained by familiarity of the movement (walking action and rhythm) and of the music (song popularity). These findings from a real-world, large-scale social and musical setting can guide the development of methods for capturing and examining collective experiences in the laboratory and for effectively linking them to synchrony across people in daily life. PMID:27764167
Working memory for visual features and conjunctions in schizophrenia.
Gold, James M; Wilk, Christopher M; McMahon, Robert P; Buchanan, Robert W; Luck, Steven J
2003-02-01
The visual working memory (WM) storage capacity of patients with schizophrenia was investigated using a change detection paradigm. Participants were presented with 2, 3, 4, or 6 colored bars, with testing of both single-feature (color, orientation) and feature-conjunction conditions. Patients performed significantly worse than controls at all set sizes but demonstrated normal feature binding. Unlike controls, patients' WM capacity declined at set size 6 relative to set size 4. Impairments with subcapacity arrays suggest a deficit in task-set maintenance; greater impairment for supercapacity set sizes suggests a deficit in the ability to selectively encode information for WM storage. Thus, the WM impairment in schizophrenia appears to be a consequence of attentional deficits rather than a reduction in storage capacity.
3D variational brain tumor segmentation on a clustered feature set
NASA Astrophysics Data System (ADS)
Popuri, Karteek; Cobzas, Dana; Jagersand, Martin; Shah, Sirish L.; Murtha, Albert
2009-02-01
Tumor segmentation from MRI data is a particularly challenging and time consuming task. Tumors have a large diversity in shape and appearance with intensities overlapping the normal brain tissues. In addition, an expanding tumor can also deflect and deform nearby tissue. Our work addresses these last two difficult problems. We use the available MRI modalities (T1, T1c, T2) and their texture characteristics to construct a multi-dimensional feature set. Further, we extract clusters which provide a compact representation of the essential information in these features. The main idea in this paper is to incorporate these clustered features into the 3D variational segmentation framework. In contrast to the previous variational approaches, we propose a segmentation method that evolves the contour in a supervised fashion. The segmentation boundary is driven by the learned inside and outside region voxel probabilities in the cluster space. We incorporate prior knowledge about the normal brain tissue appearance, during the estimation of these region statistics. In particular, we use a Dirichlet prior that discourages the clusters in the ventricles to be in the tumor and hence better disambiguate the tumor from brain tissue. We show the performance of our method on real MRI scans. The experimental dataset includes MRI scans, from patients with difficult instances, with tumors that are inhomogeneous in appearance, small in size and in proximity to the major structures in the brain. Our method shows good results on these test cases.
Nikfarjam, Azadeh; Sarker, Abeed; O'Connor, Karen; Ginn, Rachel; Gonzalez, Graciela
2015-05-01
Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media. We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words' semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique. ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F-measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance. It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is particularly scalable, suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
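The word-cluster feature idea can be sketched independently of the CRF machinery: cluster pretrained embeddings and emit each token's cluster ID as a feature. The tiny vocabulary and random vectors below are toy assumptions standing in for embeddings learned from unlabeled posts.

```python
# Sketch: k-means clusters over word embeddings as token features.
import numpy as np
from sklearn.cluster import KMeans

vocab = ["nausea", "headache", "dizzy", "tablet", "pill", "dose"]
emb = np.random.default_rng(7).normal(size=(len(vocab), 50))  # stand-in vectors

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(emb)
cluster_of = dict(zip(vocab, km.labels_))

def token_features(tok):
    # Cluster IDs let a sequence labeler generalize across similar words.
    return {"word": tok.lower(), "cluster": str(cluster_of.get(tok, -1))}

print(token_features("nausea"))
```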
Spatiotemporal property and predictability of large-scale human mobility
NASA Astrophysics Data System (ADS)
Zhang, Hai-Tao; Zhu, Tao; Fu, Dongfei; Xu, Bowen; Han, Xiao-Pu; Chen, Duxin
2018-04-01
Spatiotemporal characteristics of human mobility emerging from complexity on the individual scale have been extensively studied due to their application potential in human behavior prediction and recommendation, and in the control of epidemic spreading. We collect and investigate a comprehensive data set of human activities on large geographical scales, including both website browsing and mobile tower visits. Numerical results show that the degree of activity decays as a power law, indicating that human behaviors are reminiscent of the scale-free random walks known as Lévy flights. More significantly, this study suggests that human activities on large geographical scales have specific non-Markovian characteristics, such as a two-segment power-law distribution of dwelling time and high predictability. Furthermore, a scale-free mobility model with two essential ingredients, i.e., preferential return and exploration, and a Gaussian distribution assumption on the exploration-tendency parameter is proposed; the model outperforms existing human mobility models under scenarios of large geographical scales.
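A hedged sketch of the two-ingredient model named above: explore a new location with probability scaling as S^(-gamma) (S = number of visited locations), otherwise return preferentially to frequently visited ones. The parameter values and the Gaussian draw of the exploration tendency are illustrative assumptions.

```python
# Sketch: preferential-return / exploration mobility model.
import numpy as np

rng = np.random.default_rng(8)

def simulate(steps=1000, gamma=0.21):
    rho = abs(rng.normal(0.6, 0.1))          # per-individual exploration tendency
    visits = {0: 1}                          # location id -> visit count
    for _ in range(steps):
        S = len(visits)
        if rng.random() < rho * S ** (-gamma):
            visits[S] = 1                    # explore a brand-new location
        else:
            locs, cnts = zip(*visits.items())
            p = np.array(cnts) / sum(cnts)   # preferential return
            visits[rng.choice(locs, p=p)] += 1
    return visits

print(len(simulate()))                       # number of distinct locations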
Perceptual quality estimation of H.264/AVC videos using reduced-reference and no-reference models
NASA Astrophysics Data System (ADS)
Shahid, Muhammad; Pandremmenou, Katerina; Kondi, Lisimachos P.; Rossholm, Andreas; Lövström, Benny
2016-09-01
Reduced-reference (RR) and no-reference (NR) models for video quality estimation, using features that account for the impact of coding artifacts, spatio-temporal complexity, and packet losses, are proposed. The purpose of this study is to analyze a number of potentially quality-relevant features in order to select the most suitable set of features for building the desired models. The proposed sets of features have not been used in the literature, and some of the features are used for the first time in this study. The features are employed by the least absolute shrinkage and selection operator (LASSO), which selects only those most influential toward perceptual quality. For comparison, we apply feature selection to the complete feature sets and ridge regression to the reduced sets. The models are validated using a database of H.264/AVC encoded videos that were subjectively assessed for quality in an ITU-T compliant laboratory. We infer that just two features selected by RR LASSO and two bitstream-based features selected by NR LASSO are able to estimate perceptual quality with high accuracy, higher than that of ridge regression, which uses more features. The comparisons with competing works and two full-reference metrics also verify the superiority of our models.
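A minimal sketch of the LASSO selection step with a ridge fit on the reduced set for comparison; the features and quality scores below are toy stand-ins for the RR/NR features and subjective MOS values.

```python
# Sketch: LASSO feature selection followed by ridge regression on the
# reduced set, mirroring the comparison described above.
import numpy as np
from sklearn.linear_model import LassoCV, Ridge

rng = np.random.default_rng(9)
X = rng.normal(size=(100, 20))            # candidate RR/NR features (toy)
mos = X[:, 0] - 0.5 * X[:, 3] + rng.normal(0, 0.1, 100)

lasso = LassoCV(cv=5).fit(X, mos)
kept = np.nonzero(lasso.coef_)[0]         # features surviving the L1 penalty
ridge = Ridge().fit(X[:, kept], mos)
print(kept, round(ridge.score(X[:, kept], mos), 3))
```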
Evidence of tampering in watermark identification
NASA Astrophysics Data System (ADS)
McLauchlan, Lifford; Mehrübeoglu, Mehrübe
2009-08-01
In this work, watermarks are embedded in digital images in the discrete wavelet transform (DWT) domain. Principal component analysis (PCA) is performed on the DWT coefficients. Next, higher-order statistics based on the principal components and the eigenvalues are determined for different sets of images. Feature sets are analyzed for different types of attacks in m-dimensional space. The results demonstrate the separability of the features for the tampered digital copies. Different feature sets are studied to determine more effective tamper-evident feature sets. In digital forensics, the probable manipulation(s) or modification(s) performed on the digital information can be identified using the described technique.
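The feature pipeline described above (DWT, then PCA, then higher-order statistics) can be sketched as follows; the wavelet, number of components, and chosen statistics are assumptions for illustration.

```python
# Sketch: DWT sub-bands -> PCA -> higher-order statistics as a feature vector.
import numpy as np
import pywt
from scipy.stats import skew, kurtosis
from sklearn.decomposition import PCA

img = np.random.rand(64, 64)                      # toy (watermarked) image
cA, (cH, cV, cD) = pywt.dwt2(img, "haar")         # one-level 2-D DWT
coeffs = np.hstack([cH, cV, cD])                  # detail coefficients

pcs = PCA(n_components=5).fit_transform(coeffs)
feats = np.r_[skew(pcs, axis=0), kurtosis(pcs, axis=0)]
print(feats.shape)                                # 10-d tamper-evidence vector
```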
Karstic slope "breathing": morpho-structural influence and hazard implications
NASA Astrophysics Data System (ADS)
Devoti, Roberto; Falcucci, Emanuela; Gori, Stefano; Eliana Poli, Maria; Zanferrari, Adriano; Braitenberg, Carla; Fabris, Paolo; Grillo, Barbara; Zuliani, David
2016-04-01
The study refers to the active slope deformation detected by GPS and tiltmeter stations on the Cansiglio karstic plateau, located in the western Carnic Prealps (NE Italy). The observed transient deformation clearly correlates with rainfall: the southernmost border of the plateau reacts instantly to heavy rains, displaying a "back and forth" deformation up to a few centimeters wide, with different time constants, demonstrating a response to different catchment volumes. We carried out a field survey along the southern Cansiglio slope to achieve a structural characterization of the relief and to verify the possible relation between structural features and the peculiar geomorphological setting dominated by widespread karstic features. The Cansiglio plateau develops on the frontal ramp anticline of the Cansiglio thrust, an approximately ENE-WSW trending, SSE-verging, low-angle thrust belonging to the Neogene-Quaternary front of the eastern Southern Alps. The Cansiglio thrust outcrops at the base of the Cansiglio plateau, where it overlaps the Mesozoic carbonates onto the Miocene-Quaternary terrigenous succession. All along its length, cataclastic limestones largely outcrop. The Cansiglio thrust is bordered by two transfer zones probably inherited from the Mesozoic paleogeography: the Caneva fault in the west and the Col Longone fault in the east. The carbonatic massif is also characterized by a series of roughly northward, steeply dipping reverse minor faults and a set of subvertical joints parallel to the axes of the Cansiglio anticline. Other NNW-SSE and NNE-SSW conjugate faults and fractures perpendicular to the southern Cansiglio slope are also identified. This structural setting pervasively affects the whole slope and may define centimetre- to metre-scale rock prisms. Interestingly, along the topmost portion of the slope, some dolines and swallow holes show an incipient coalescence that trends parallel to the massif front and to the deformation zones related to the reverse fault. This doline alignment forms a ridge-parallel elongated trench, about 4 km long, which is a typical morpho-structural feature of slopes undergoing large-scale gravitational instability (deep-seated gravitational slope deformations). The trench is interrupted towards the NE by several coalescent and slide scarps. Such geomorphic evidence testifies to the occurrence of landslide events (mainly rockslides and rock falls) that sourced from the top portion of the slope, as local collapses of the sector affected by the trench. Our observations, as a whole, suggest that the morpho-structural framework of the south-eastern Cansiglio slope is highly influenced by tectonic features related to the complex tectonic deformation. The structural setting locally favors the nucleation of karstic landforms (dolines, swallow holes and hypokarstic features). Moreover, the widespread tectonic features promote gravitational instability of the slope which, combined with the high local relief of the mountain front, may trigger the collapse of slope sectors as rock-fall phenomena. In this perspective, therefore, the continuous "back and forth" movements of the slope observed in the GPS time series, induced by rainfall, may progressively weaken the slope and render it prone to landsliding.
Parametric modeling studies of turbulent non-premixed jet flames with thin reaction zones
NASA Astrophysics Data System (ADS)
Wang, Haifeng
2013-11-01
The Sydney piloted jet flame series (Flames L, B, and M) features thinner reaction zones and hence poses greater modeling challenges than the Sandia piloted jet flames (Flames D, E, and F). The Sydney flames have recently received renewed interest due to these challenges, and several new modeling efforts have emerged. However, no systematic parametric modeling studies have been reported for the Sydney flames. A large set of modeling computations of the Sydney flames is presented here using the coupled large eddy simulation (LES)/probability density function (PDF) method. Parametric studies are performed to gain insight into the model performance, its sensitivity, and the effect of numerics.
COSMOS: Carnegie Observatories System for MultiObject Spectroscopy
NASA Astrophysics Data System (ADS)
Oemler, A.; Clardy, K.; Kelson, D.; Walth, G.; Villanueva, E.
2017-05-01
COSMOS (Carnegie Observatories System for MultiObject Spectroscopy) reduces multislit spectra obtained with the IMACS and LDSS3 spectrographs on the Magellan Telescopes. It can be used for quick-look analysis of data at the telescope as well as for pipeline reduction of large data sets. COSMOS is based on a precise optical model of the spectrographs, which allows (after alignment and calibration) an accurate prediction of the location of spectral features. This eliminates the line-search procedure that is fundamental to many spectral reduction programs and allows a robust data pipeline to be run in an almost fully automatic mode, so that large amounts of data can be reduced with minimal intervention.
Fast support vector data descriptions for novelty detection.
Liu, Yi-Hung; Liu, Yan-Chen; Chen, Yen-Jen
2010-08-01
Support vector data description (SVDD) has become a very attractive kernel method due to its good results in many novelty detection problems. However, the decision function of SVDD is expressed in terms of a kernel expansion, which results in a run-time complexity linear in the number of support vectors. For applications where a fast real-time response is needed, speeding up the decision function is crucial. This paper addresses the issue of reducing the testing time complexity of SVDD. A method called fast SVDD (F-SVDD) is proposed. Unlike traditional methods, which all try to compress a kernel expansion into one with fewer terms, the proposed F-SVDD directly finds the preimage of a feature vector and then uses a simple relationship between this feature vector and the SVDD sphere center to re-express the center with a single vector. The decision function of F-SVDD contains only one kernel term, and thus the decision boundary of F-SVDD is simply a sphere in the original space. Hence, the run-time complexity of the F-SVDD decision function is no longer linear in the number of support vectors but is constant, no matter how large the training set is. We also propose a novel direct preimage-finding method, which is non-iterative and involves no free parameters. The unique preimage can be obtained in real time by the proposed direct method without trial and error. For demonstration, several real-world data sets and a large-scale data set, the extended MIT face data set, are used in experiments. In addition, a practical industry example concerning liquid crystal display micro-defect inspection is used to compare the applicability of SVDD and the proposed F-SVDD when faced with massive data input. The results are very encouraging.
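A minimal numerical sketch of the speed difference described above, assuming an RBF kernel; the preimage computation that yields the single center vector is the paper's contribution and is not reproduced here, so the preimage is taken as given, and all names and parameter values are illustrative:

    import numpy as np

    def rbf(x, y, gamma=0.5):
        # Gaussian (RBF) kernel between two vectors
        return np.exp(-gamma * np.sum((x - y) ** 2))

    def svdd_decision(x, support_vectors, alphas, gamma=0.5):
        # Standard SVDD: squared feature-space distance to the center,
        # written via the kernel expansion; cost grows with the number
        # of support vectors (the x-independent ||center||^2 term is omitted).
        cross = sum(a * rbf(x, sv, gamma) for a, sv in zip(alphas, support_vectors))
        return rbf(x, x, gamma) - 2.0 * cross

    def fsvdd_decision(x, preimage, gamma=0.5):
        # F-SVDD style: the center is re-expressed through a single preimage
        # vector, so each test point needs only one kernel evaluation.
        return rbf(x, x, gamma) - 2.0 * rbf(x, preimage, gamma)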
NASA Astrophysics Data System (ADS)
Song, Bowen; Zhang, Guopeng; Wang, Huafeng; Zhu, Wei; Liang, Zhengrong
2013-02-01
Various types of features, e.g., geometric features, texture features, and projection features, have been introduced for polyp detection and differentiation via computer-aided detection and diagnosis (CAD) for computed tomography colonography (CTC). Although these features together cover more information in the data, some of them are statistically highly related to others, which makes the feature set redundant and burdens the computation task of CAD. In this paper, we propose a new dimension reduction method that combines hierarchical clustering and principal component analysis (PCA) for the false-positive (FP) reduction task. First, we group all the features based on their similarity using hierarchical clustering, and then PCA is employed within each group. Different numbers of principal components are selected from each group to form the final feature set. A support vector machine is used to perform the classification. The results show that when three principal components are chosen from each group, we achieve an area under the receiver operating characteristic curve of 0.905, as high as with the original feature set, while the computation time is reduced by 70% and the feature set size by 77%. We conclude that the proposed method captures the most important information in the feature set and that classification accuracy is not affected by the dimension reduction. The result is promising, and further investigation, such as automatic threshold setting, is worthwhile and in progress.
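A sketch of the two-stage reduction under stated assumptions: features are grouped by hierarchical clustering on a correlation-based distance, and a fixed number of principal components is kept per group. The grouping criterion and all parameter values are assumptions, not the authors' exact settings:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform
    from sklearn.decomposition import PCA

    def cluster_then_pca(X, n_groups=10, n_components=3):
        # Distance between features: 1 - |correlation| of feature columns
        dist = 1.0 - np.abs(np.corrcoef(X.T))
        np.fill_diagonal(dist, 0.0)
        Z = linkage(squareform(dist, checks=False), method='average')
        labels = fcluster(Z, t=n_groups, criterion='maxclust')
        # PCA within each feature group; concatenate the kept components
        parts = []
        for g in np.unique(labels):
            cols = X[:, labels == g]
            k = min(n_components, cols.shape[1])
            parts.append(PCA(n_components=k).fit_transform(cols))
        return np.hstack(parts)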
NASA Technical Reports Server (NTRS)
1985-01-01
A usable database, the Pilot Climate Data System (PCDS), is described. The PCDS is designed to be an interactive, easy-to-use, on-line generalized scientific information system. It efficiently provides uniform data catalogs, inventories, and access methods, as well as manipulation and display tools, for a large assortment of Earth, ocean, and atmospheric data for the climate-related research community. Researchers can employ the PCDS to scan, manipulate, compare, display, and study climate parameters from diverse data sets. Software features and applications of the PCDS are highlighted.
ERIC Educational Resources Information Center
Sivtseva-Maksimova, Praskovia Vasilevna
2016-01-01
The relevance of the study is determined by the increasing interest in the new interpretations of social issues of living in the early 20th century, and from this perspective, in the scientific heritage of A. E. Kulakovsky (1877-1926) as an original thinker, who worried about the fate of the indigenous people inhabiting a large territory of the…
The Nonlinear Magnetosphere: Expressions in MHD and in Kinetic Models
NASA Technical Reports Server (NTRS)
Hesse, Michael; Birn, Joachim
2011-01-01
Like most plasma systems, the magnetosphere of the Earth is governed by nonlinear dynamic evolution equations. The impact of nonlinearities ranges from large scales, where overall dynamical features exhibit nonlinear behavior, to small-scale kinetic processes, where nonlinear behavior governs, among other things, energy conversion and dissipation. In this talk we present a select set of examples of such behavior, with a specific emphasis on how nonlinear effects manifest themselves in MHD and in kinetic models of magnetospheric plasma dynamics.
1990-02-15
DoD 5000.51-G, Final Draft (2/15/90): Key Features of the DoD Implementation of Total Quality Management. Foreword: Government and industry ... away with all government inspectors. Rather, government oversight will change from the large-scale product inspection and specifying the "how to" ... "best in class" - set the course for the future, and - provide a baseline for measuring progress. Benchmarking is a continuous process of comparing an ...
A Robust Geometric Model for Argument Classification
NASA Astrophysics Data System (ADS)
Giannone, Cristina; Croce, Danilo; Basili, Roberto; de Cao, Diego
Argument classification is the task of assigning semantic roles to syntactic structures in natural language sentences. Supervised learning techniques for frame semantics have recently been shown to benefit from rich sets of syntactic features. However, argument classification is also highly dependent on the semantics of the lexical items involved. Empirical studies have shown that the domain dependence of lexical information causes large performance drops in out-of-domain tests. In this paper, a distributional approach is proposed to improve the robustness of the learning model against out-of-domain lexical phenomena.
AlphaSpace: Fragment-Centric Topographical Mapping To Target Protein–Protein Interaction Interfaces
2016-01-01
Inhibition of protein–protein interactions (PPIs) is emerging as a promising therapeutic strategy despite the difficulty in targeting such interfaces with drug-like small molecules. PPIs generally feature large and flat binding surfaces as compared to typical drug targets. These features pose a challenge for structural characterization of the surface using geometry-based pocket-detection methods. An attractive mapping strategy—that builds on the principles of fragment-based drug discovery (FBDD)—is to detect the fragment-centric modularity at the protein surface and then characterize the large PPI interface as a set of localized, fragment-targetable interaction regions. Here, we introduce AlphaSpace, a computational analysis tool designed for fragment-centric topographical mapping (FCTM) of PPI interfaces. Our approach uses the alpha sphere construct, a geometric feature of a protein’s Voronoi diagram, to map out concave interaction space at the protein surface. We introduce two new features—alpha-atom and alpha-space—and the concept of the alpha-atom/alpha-space pair to rank pockets for fragment-targetability and to facilitate the evaluation of pocket/fragment complementarity. The resulting high-resolution interfacial map of targetable pocket space can be used to guide the rational design and optimization of small molecule or biomimetic PPI inhibitors. PMID:26225450
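A rough illustration of the alpha-sphere idea, assuming atom coordinates as an (N, 3) array: vertices of the Voronoi diagram are equidistant from their defining atoms, and those whose radius falls in an intermediate window flag concave, fragment-sized space at the surface. The radius window here is a hypothetical choice, not the tool's actual parameters:

    import numpy as np
    from scipy.spatial import Voronoi

    def alpha_spheres(coords, r_min=3.2, r_max=5.4):
        # Candidate pocket probes: Voronoi vertices whose distance to the
        # nearest atom lies in a window excluding buried and bulk space.
        vor = Voronoi(coords)
        spheres = []
        for v in vor.vertices:
            r = np.min(np.linalg.norm(coords - v, axis=1))
            if r_min <= r <= r_max:
                spheres.append((v, r))
        return spheres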
Electrical features of eighteen automated external defibrillators: a systematic evaluation.
Kette, Fulvio; Locatelli, Aldo; Bozzola, Marcella; Zoli, Alberto; Li, Yongqin; Salmoiraghi, Marco; Ristagno, Giuseppe; Andreassi, Aida
2013-11-01
Assessment and comparison of electrical parameters (energy, current, and first- and second-phase waveform duration) among eighteen AEDs. Engineering bench tests for a descriptive systematic evaluation of commercially available AEDs. AEDs were tested using an ECG simulator, an impedance simulator, an oscilloscope, and a measuring device detecting delivered energy, peak and average current, and the duration of the first and second phases of the biphasic waveforms. All tests were performed at the engineering facility of the Lombardia Regional Emergency Service (AREU). Large variations in the energy delivered at the first shock were observed. The current showed a progressive decline as impedance increased. First- and second-phase durations varied substantially among the AEDs using exponential biphasic waveforms, unlike rectilinear-waveform AEDs, in which phase duration remained relatively constant. There is large variability in the electrical features of the AEDs tested. Energy is likely not the best indicator for dose selection. Both current and shock duration should be considered when evaluating the technical features of AEDs. These findings may prompt further investigations to define the optimal current and duration of the shock waves to increase the success rate in the clinical setting. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Feature selection gait-based gender classification under different circumstances
NASA Astrophysics Data System (ADS)
Sabir, Azhin; Al-Jawad, Naseer; Jassim, Sabah
2014-05-01
This paper proposes a gender classification method based on human gait features and investigates two variations in addition to the normal gait sequence: clothing (wearing coats) and carrying a bag. The feature vectors in the proposed system are constructed after applying a wavelet transform. Three different feature sets are proposed. The first, spatio-temporal distances, measures the distances between different parts of the human body (feet, knees, hands, shoulders, and overall height) during one gait cycle. The second and third feature sets are constructed from the approximation and non-approximation (detail) coefficients of the human body, respectively. To extract these two feature sets, we divided the human body into upper and lower parts based on the golden ratio proportion. We adopted a statistical method for constructing the feature vector from the above sets. The dimension of the constructed feature vector is reduced based on the Fisher score as a feature selection method to optimize its discriminating significance. Finally, k-Nearest Neighbor is applied as the classification method. Experimental results demonstrate that our approach provides a more realistic scenario and relatively better performance compared with existing approaches.
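The selection-plus-classification stage lends itself to a compact sketch: Fisher scores rank features by between-class over within-class variance, and the top-ranked subset feeds a k-NN classifier. Feature counts and k are illustrative, and the wavelet feature construction itself is omitted:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def fisher_scores(X, y):
        # Between-class variance over within-class variance, per feature
        classes = np.unique(y)
        mu = X.mean(axis=0)
        num = np.zeros(X.shape[1])
        den = np.zeros(X.shape[1])
        for c in classes:
            Xc = X[y == c]
            num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
            den += len(Xc) * Xc.var(axis=0)
        return num / np.maximum(den, 1e-12)

    def select_and_fit(X, y, k_features=20, k_neighbors=5):
        # Keep the top-scoring features, then fit k-NN on the reduced set
        idx = np.argsort(fisher_scores(X, y))[::-1][:k_features]
        clf = KNeighborsClassifier(n_neighbors=k_neighbors).fit(X[:, idx], y)
        return clf, idx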
MINC 2.0: A Flexible Format for Multi-Modal Images.
Vincent, Robert D; Neelin, Peter; Khalili-Mahani, Najmeh; Janke, Andrew L; Fonov, Vladimir S; Robbins, Steven M; Baghdadi, Leila; Lerch, Jason; Sled, John G; Adalat, Reza; MacDonald, David; Zijdenbos, Alex P; Collins, D Louis; Evans, Alan C
2016-01-01
It is often useful for an imaging data format to afford rich metadata, be flexible, scale to very large file sizes, support multi-modal data, and have strong inbuilt mechanisms for data provenance. Beginning in 1992, MINC was developed as a system for flexible, self-documenting representation of neuroscientific imaging data with arbitrary orientation and dimensionality. The MINC system incorporates three broad components: a file format specification, a programming library, and a growing set of tools. In the early 2000s the MINC developers created MINC 2.0, which added support for 64-bit file sizes, internal compression, and a number of other modern features. Because of its extensible design, it has been easy to incorporate details of provenance in the header metadata, including an explicit processing history, unique identifiers, and vendor-specific scanner settings. This makes MINC ideal for use in large-scale imaging studies and databases. It also makes it easy to adapt to new scanning sequences and modalities.
Sigoillot, Frederic D; Huckins, Jeremy F; Li, Fuhai; Zhou, Xiaobo; Wong, Stephen T C; King, Randall W
2011-01-01
Automated time-lapse microscopy can visualize proliferation of large numbers of individual cells, enabling accurate measurement of the frequency of cell division and the duration of interphase and mitosis. However, extraction of quantitative information by manual inspection of time-lapse movies is too time-consuming to be useful for analysis of large experiments. Here we present an automated time-series approach that can measure changes in the duration of mitosis and interphase in individual cells expressing fluorescent histone 2B. The approach requires analysis of only two features: nuclear area and average intensity. Compared to supervised learning approaches, this method reduces processing time and does not require generation of training data sets. We demonstrate that this method is as sensitive as manual analysis in identifying small changes in interphase or mitotic duration induced by drug or siRNA treatment. This approach should facilitate automated analysis of high-throughput time-lapse data sets to identify small molecules or gene products that influence the timing of cell division.
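Since the method relies on just two per-frame measurements, a toy version of the duration extraction can be sketched as thresholding followed by run-length counting. The mitosis rule used here, a small nuclear area with high average intensity from chromosome condensation, is our assumption for illustration, not the authors' classifier:

    import numpy as np

    def mitosis_durations(area, intensity, area_thr, int_thr):
        # Label a frame mitotic when the nucleus is compact and bright,
        # then return the length (in frames) of each contiguous mitotic run.
        mitotic = (np.asarray(area) < area_thr) & (np.asarray(intensity) > int_thr)
        durations, run = [], 0
        for m in mitotic:
            if m:
                run += 1
            elif run:
                durations.append(run)
                run = 0
        if run:
            durations.append(run)
        return durations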
Two-stage opening of the Dover Strait and the origin of island Britain
Gupta, Sanjeev; Collier, Jenny S.; Garcia-Moreno, David; Oggioni, Francesca; Trentesaux, Alain; Vanneste, Kris; De Batist, Marc; Camelbeeck, Thierry; Potter, Graeme; Van Vliet-Lanoë, Brigitte; Arthur, John C. R.
2017-01-01
Late Quaternary separation of Britain from mainland Europe is considered to be a consequence of spillover of a large proglacial lake in the Southern North Sea basin. Lake spillover is inferred to have caused breaching of a rock ridge at the Dover Strait, although this hypothesis remains untested. Here we show that opening of the Strait involved at least two major episodes of erosion. Sub-bottom records reveal a remarkable set of sediment-infilled depressions that are deeply incised into bedrock that we interpret as giant plunge pools. These support a model of initial erosion of the Dover Strait by lake overspill, plunge pool erosion by waterfalls and subsequent dam breaching. Cross-cutting of these landforms by a prominent bedrock-eroded valley that is characterized by features associated with catastrophic flooding indicates final breaching of the Strait by high-magnitude flows. These events set up conditions for island Britain during sea-level highstands and caused large-scale re-routing of NW European drainage. PMID:28375202
EuPathDB: the eukaryotic pathogen genomics database resource
Aurrecoechea, Cristina; Barreto, Ana; Basenko, Evelina Y.; Brestelli, John; Brunk, Brian P.; Cade, Shon; Crouch, Kathryn; Doherty, Ryan; Falke, Dave; Fischer, Steve; Gajria, Bindu; Harb, Omar S.; Heiges, Mark; Hertz-Fowler, Christiane; Hu, Sufen; Iodice, John; Kissinger, Jessica C.; Lawrence, Cris; Li, Wei; Pinney, Deborah F.; Pulman, Jane A.; Roos, David S.; Shanmugasundram, Achchuthan; Silva-Franco, Fatima; Steinbiss, Sascha; Stoeckert, Christian J.; Spruill, Drew; Wang, Haiming; Warrenfeltz, Susanne; Zheng, Jie
2017-01-01
The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host–pathogen interactions. PMID:27903906
A Computer-Assisted Personalized Approach in an Undergraduate Plant Physiology Class
Artus, Nancy N.; Nadler, Kenneth D.
1999-01-01
We used Computer-Assisted Personalized Approach (CAPA), a networked teaching and learning tool that generates computer individualized homework problem sets, in our large-enrollment introductory plant physiology course. We saw significant improvement in student examination performance with regular homework assignments, with CAPA being an effective and efficient substitute for hand-graded homework. Using CAPA, each student received a printed set of similar but individualized problems of a conceptual (qualitative) and/or quantitative nature with quality graphics. Because each set of problems is unique, students were encouraged to work together to clarify concepts but were required to do their own work for credit. Students could enter answers multiple times without penalty, and they were able to obtain immediate feedback and hints until the due date. These features increased student time on task, allowing higher course standards and student achievement in a diverse student population. CAPA handles routine tasks such as grading, recording, summarizing, and posting grades. In anonymous surveys, students indicated an overwhelming preference for homework in CAPA format, citing several features such as immediate feedback, multiple tries, and on-line accessibility as reasons for their preference. We wrote and used more than 170 problems on 17 topics in introductory plant physiology, cataloging them in a computer library for general access. Representative problems are compared and discussed. PMID:10198076
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harvey, Neal R; Ruggiero, Christy E; Pawley, Norma H
2009-01-01
Detecting complex targets, such as facilities, in commercially available satellite imagery is a difficult problem that human analysts try to solve by applying world knowledge. Often there are known observables that can be extracted by pixel-level feature detectors that can assist in the facility detection process. Individually, each of these observables is not sufficient for an accurate and reliable detection, but in combination, these auxiliary observables may provide sufficient context for detection by a machine learning algorithm. We describe an approach for automatic detection of facilities that uses an automated feature extraction algorithm to extract auxiliary observables, and a semi-supervised assisted target recognition algorithm to then identify facilities of interest. We illustrate the approach using an example of finding schools in Quickbird image data of Albuquerque, New Mexico. We use Los Alamos National Laboratory's Genie Pro automated feature extraction algorithm to find a set of auxiliary features that should be useful in the search for schools, such as parking lots, large buildings, sports fields and residential areas and then combine these features using Genie Pro's assisted target recognition algorithm to learn a classifier that finds schools in the image data.
The morphometrics of "masculinity" in human faces.
Mitteroecker, Philipp; Windhager, Sonja; Müller, Gerd B; Schaefer, Katrin
2015-01-01
In studies of social inference and human mate preference, a wide but inconsistent array of tools for computing facial masculinity has been devised. Several of these approaches implicitly assumed that the individual expression of sexually dimorphic shape features, which we refer to as maleness, resembles facial shape features perceived as masculine. We outline a morphometric strategy for estimating separately the face shape patterns that underlie perceived masculinity and maleness, and for computing individual scores for these shape patterns. We further show how faces with different degrees of masculinity or maleness can be constructed in a geometric morphometric framework. In an application of these methods to a set of human facial photographs, we found that shape features typically perceived as masculine are wide faces with a wide inter-orbital distance, a wide nose, thin lips, and a large and massive lower face. The individual expressions of this combination of shape features--the masculinity shape scores--were the best predictor of rated masculinity among the compared methods (r = 0.5). The shape features perceived as masculine only partly resembled the average face shape difference between males and females (sexual dimorphism). Discriminant functions and Procrustes distances to the female mean shape were poor predictors of perceived masculinity.
NASA Astrophysics Data System (ADS)
Parker, L.; Dye, R. A.; Perez, J.; Rinsland, P.
2012-12-01
Over the past decade the Atmospheric Science Data Center (ASDC) at NASA Langley Research Center has archived and distributed a variety of satellite mission and aircraft campaign data sets. These data sets posed unique challenges to the user community at large due to the sheer volume and variety of the data and the lack of intuitive features in the order tools available to the investigator. Some of these data sets also lack sufficient metadata to provide rudimentary data discovery. To meet the needs of emerging users, the ASDC addressed issues in data discovery and delivery through the use of standards in data and access methods, and distribution through appropriate portals. The ASDC is currently undergoing a refresh of its webpages and ordering tools that will leverage updated collection-level metadata in an effort to enhance the user experience. The ASDC is now providing search and subset capability for key mission satellite data sets. The ASDC has collaborated with Science Teams to accommodate prospective science users in the climate and modeling communities. The ASDC is using a common framework that enables more rapid development and deployment of search and subset tools providing enhanced access features for the user community. Features of the Search and Subset web application enable a more sophisticated approach to selecting and ordering data subsets by parameter, date, time, and geographic area. The ASDC has also applied key practices from satellite missions to the multi-campaign aircraft missions executed for Earth Venture-1 and MEaSUREs.
Rough sets and Laplacian score based cost-sensitive feature selection
Yu, Shenglong; Zhao, Hong
2018-01-01
Cost-sensitive feature selection learning is an important preprocessing step in machine learning and data mining. Most existing cost-sensitive feature selection algorithms are heuristic: they evaluate the importance of each feature individually and select features one by one, and thus do not consider the relationships among features. In this paper, we propose a new algorithm for minimal-cost feature selection, called rough sets and Laplacian score based cost-sensitive feature selection. The importance of each feature is evaluated by both rough sets and the Laplacian score. Compared with heuristic algorithms, the proposed algorithm takes the relationships among features into consideration through the locality preservation of the Laplacian score. We select a feature subset with maximal feature importance and minimal cost, where the cost is given by three different distributions to simulate different applications. Unlike existing cost-sensitive feature selection algorithms, our algorithm selects a predetermined number of "good" features simultaneously. Extensive experimental results show that the approach is efficient and able to effectively obtain the minimum-cost subset. In addition, the results of our method are more promising than those of other cost-sensitive feature selection algorithms. PMID:29912884
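The Laplacian-score half of the importance measure is standard and can be sketched directly; lower scores indicate features that better preserve local structure on a nearest-neighbor heat-kernel graph. The graph size k and bandwidth t are illustrative, and the rough-set term and the cost trade-off are not reproduced here:

    import numpy as np

    def laplacian_scores(X, k=5, t=1.0):
        n = X.shape[0]
        # Heat-kernel weights on a symmetric k-nearest-neighbor graph
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        W = np.zeros((n, n))
        for i in range(n):
            nn = np.argsort(d2[i])[1:k + 1]
            W[i, nn] = np.exp(-d2[i, nn] / t)
        W = np.maximum(W, W.T)
        D = W.sum(axis=1)
        L = np.diag(D) - W
        scores = []
        for f in X.T:
            f = f - (f @ D) / D.sum()      # remove the constant component
            scores.append((f @ L @ f) / max(f ** 2 @ D, 1e-12))
        return np.array(scores)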
Vendruscolo, M; Najmanovich, R; Domany, E
2000-02-01
We present a method to derive contact energy parameters from large sets of proteins. The basic requirement on which our method is based is that, for each protein in the database, the native contact map has lower energy than all of its decoy conformations obtained by threading. Only when this condition is satisfied can one use the proposed energy function for fold identification. Such a set of parameters can be found (by perceptron learning) if Mp, the number of proteins in the database, is not too large. Other aspects that influence the existence of such a solution are the exact definition of contact and the value of the critical distance Rc below which two residues are considered to be in contact. Another important novel feature of our approach is its ability to determine whether an energy function of some suitable proposed form can or cannot be parameterized in a way that satisfies our basic requirement. As a demonstration, we determine the region in the (Rc, Mp) plane in which the problem is solvable, i.e., where we can find a set of contact parameters that simultaneously stabilize all the native conformations. We show that for large enough databases the contact approximation to the energy cannot stabilize all the native folds, even against decoys obtained by gapless threading.
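Because the contact energy is linear in the parameters, the basic requirement is a set of linear inequalities, and perceptron updates apply directly. A minimal sketch under the assumption that each structure is summarized by a vector of contact counts per residue-pair type:

    import numpy as np

    def learn_contact_energies(native_counts, decoy_counts, epochs=200, lr=0.1):
        # native_counts: list of contact-count vectors, one per protein
        # decoy_counts:  list of arrays (decoys x pair-types), per protein
        w = np.zeros(native_counts[0].shape[0])
        for _ in range(epochs):
            violated = False
            for x_nat, decoys in zip(native_counts, decoy_counts):
                for x_dec in decoys:
                    # Require E(native) = w.x_nat < w.x_dec = E(decoy);
                    # on violation, move w to lower the native energy.
                    if w @ x_nat >= w @ x_dec:
                        w += lr * (x_dec - x_nat)
                        violated = True
            if not violated:
                break        # every native map is now lower in energy
        return w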
Chin, Wei-Chien-Benny; Wen, Tzai-Hung
2015-01-01
A network approach, which simplifies geographic settings into a form of nodes and links, emphasizes the connectivity and relationships of spatial features. Topological networks of spatial features are used to explore geographical connectivity and structures. The PageRank algorithm, a network metric, is often used in the geographical literature to help identify important locations where people or automobiles concentrate. However, geographic considerations, including proximity and location attractiveness, are ignored in most network metrics. The objective of the present study is to propose two geographically modified PageRank algorithms, Distance-Decay PageRank (DDPR) and Geographical PageRank (GPR), that incorporate geographic considerations into the PageRank algorithm to identify the spatial concentration of human movement in a geospatial network. Our findings indicate that in both intercity and within-city settings the proposed algorithms capture the spatial locations where people reside more effectively than traditional, commonly used network metrics. Comparing location attractiveness and distance decay, we conclude that the concentration of human movement is largely determined by distance decay, implying that geographic proximity remains a key factor in human mobility.
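The exact DDPR and GPR formulations are not given in the abstract, but the general idea, damping each link's transition weight by a decay function of its length before running the power iteration, can be sketched as follows; the power-law decay, beta, and the damping factor are assumptions:

    import numpy as np

    def distance_decay_pagerank(A, coords, beta=1.0, d=0.85, tol=1e-10):
        # A: binary adjacency matrix (no self-loops); coords: node positions
        n = A.shape[0]
        dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
        W = A / np.maximum(dist, 1e-9) ** beta      # longer links weigh less
        P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
        r = np.full(n, 1.0 / n)
        while True:
            r_new = (1 - d) / n + d * (P.T @ r)     # damped power iteration
            if np.abs(r_new - r).sum() < tol:
                return r_new
            r = r_new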
Persistent homology and non-Gaussianity
NASA Astrophysics Data System (ADS)
Cole, Alex; Shiu, Gary
2018-03-01
In this paper, we introduce the topological persistence diagram as a statistic for Cosmic Microwave Background (CMB) temperature anisotropy maps. A central concept in Topological Data Analysis (TDA), the idea of persistence is to represent a data set by a family of topological spaces. One then examines how long topological features 'persist' as the family of spaces is traversed. We compute persistence diagrams for simulated CMB temperature anisotropy maps featuring various levels of primordial non-Gaussianity of local type. Postponing the analysis of observational effects, we show that persistence diagrams are more sensitive to local non-Gaussianity than previous topological statistics, including the genus and Betti number curves, and can constrain Δf_NL^loc = 35.8 at the 68% confidence level on the simulation set, compared to Δf_NL^loc = 60.6 for the Betti number curves. Given the resolution of our simulations, we expect that applying persistence diagrams to observational data will give constraints competitive with those of the Minkowski Functionals. This is the first in a series of papers in which we plan to apply TDA to different shapes of non-Gaussianity in the CMB and Large Scale Structure.
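As a concrete starting point, sublevel-set persistence of a pixelized temperature map can be computed with a cubical complex. A sketch using the GUDHI library (assumed installed; the input is any 2-D array of temperature values, and this is a generic TDA recipe rather than the authors' exact pipeline):

    import numpy as np
    import gudhi  # assumed available: pip install gudhi

    def persistence_diagram(temperature_map):
        # Sublevel-set filtration of a 2-D scalar field via a cubical complex;
        # returns (dimension, (birth, death)) pairs, i.e. the persistence diagram.
        cc = gudhi.CubicalComplex(
            dimensions=temperature_map.shape,
            top_dimensional_cells=temperature_map.ravel())
        return cc.persistence()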
Cassini UVIS Auroral Observations in 2016 and 2017
NASA Astrophysics Data System (ADS)
Pryor, Wayne R.; Esposito, Larry W.; Jouchoux, Alain; Radioti, Aikaterini; Grodent, Denis; Gustin, Jacques; Gerard, Jean-Claude; Lamy, Laurent; Badman, Sarah; Dyudina, Ulyana A.; Cassini UVIS Team, Cassini VIMS Team, Cassini ISS Team, HST Saturn Auroral Team
2017-10-01
In 2016 and 2017, the Cassini Saturn orbiter executed a final series of high-inclination, low-periapsis orbits ideal for studies of Saturn's polar regions. The Cassini Ultraviolet Imaging Spectrograph (UVIS) obtained an extensive set of auroral images, some at the highest spatial resolution obtained during Cassini's long orbital mission (2004-2017). In some cases, two or three spacecraft slews at right angles to the long slit of the spectrograph were required to cover the entire auroral region to form auroral images. We will present selected images from this set showing narrow arcs of emission, more diffuse auroral emissions, multiple auroral arcs in a single image, discrete spots of emission, small-scale vortices, large-scale spiral forms, and parallel linear features that appear to cross in places like twisted wires. Some shorter features are transverse to the main auroral arcs, like barbs on a wire. UVIS observations were in some cases simultaneous with auroral observations from the Cassini Imaging Science Subsystem (ISS), the Cassini Visual and Infrared Mapping Spectrometer (VIMS), and the Space Telescope Imaging Spectrograph (STIS) on the Hubble Space Telescope, which will also be presented.
The UCSC Genome Browser database: extensions and updates 2013.
Meyer, Laurence R; Zweig, Ann S; Hinrichs, Angie S; Karolchik, Donna; Kuhn, Robert M; Wong, Matthew; Sloan, Cricket A; Rosenbloom, Kate R; Roe, Greg; Rhead, Brooke; Raney, Brian J; Pohl, Andy; Malladi, Venkat S; Li, Chin H; Lee, Brian T; Learned, Katrina; Kirkup, Vanessa; Hsu, Fan; Heitner, Steve; Harte, Rachel A; Haeussler, Maximilian; Guruvadoo, Luvina; Goldman, Mary; Giardine, Belinda M; Fujita, Pauline A; Dreszer, Timothy R; Diekhans, Mark; Cline, Melissa S; Clawson, Hiram; Barber, Galt P; Haussler, David; Kent, W James
2013-01-01
The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic datasets. As of September 2012, genomic sequence and a basic set of annotation 'tracks' are provided for 63 organisms, including 26 mammals, 13 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms, yeast and sea hare. In the past year 19 new genome assemblies have been added, and we anticipate releasing another 28 in early 2013. Further, a large number of annotation tracks have been either added, updated by contributors or remapped to the latest human reference genome. Among these are an updated UCSC Genes track for human and mouse assemblies. We have also introduced several features to improve usability, including new navigation menus. This article provides an update to the UCSC Genome Browser database, which has been previously featured in the Database issue of this journal.
NASA Astrophysics Data System (ADS)
Ma, Lei; Cheng, Liang; Li, Manchun; Liu, Yongxue; Ma, Xiaoxue
2015-04-01
Unmanned Aerial Vehicles (UAVs) have been used increasingly for natural resource applications in recent years due to their greater availability and the miniaturization of sensors. In addition, Geographic Object-Based Image Analysis (GEOBIA) has received more attention as a novel paradigm for remote sensing earth observation data. However, GEOBIA introduces some new problems compared with pixel-based methods. In this study, we developed a strategy for the semi-automatic optimization of object-based classification, which involves an area-based accuracy assessment that analyzes the relationship between scale and training set size. We found that the Overall Accuracy (OA) increased as the training set ratio (the proportion of segmented objects used for training) increased when the Segmentation Scale Parameter (SSP) was fixed. The OA increased more slowly as the training set ratio became larger, and a similar rule was obtained in pixel-based image analysis. The OA decreased as the SSP increased when the training set ratio was fixed. Consequently, the SSP should not be too large when classification uses a small training set ratio; conversely, a large training set ratio is required if classification is performed with a high SSP. In addition, we suggest that the optimal SSP for each class has a strong positive correlation with the mean area obtained by manual interpretation, which can be summarized by a linear correlation equation. We expect these results to be applicable to UAV imagery classification to determine the optimal SSP for each class.
Considerations for observational research using large data sets in radiation oncology.
Jagsi, Reshma; Bekelman, Justin E; Chen, Aileen; Chen, Ronald C; Hoffman, Karen; Shih, Ya-Chen Tina; Smith, Benjamin D; Yu, James B
2014-09-01
The radiation oncology community has witnessed growing interest in observational research conducted using large-scale data sources such as registries and claims-based data sets. With the growing emphasis on observational analyses in health care, the radiation oncology community must possess a sophisticated understanding of the methodological considerations of such studies in order to evaluate evidence appropriately to guide practice and policy. Because observational research has unique features that distinguish it from clinical trials and other forms of traditional radiation oncology research, the International Journal of Radiation Oncology, Biology, Physics assembled a panel of experts in health services research to provide a concise and well-referenced review, intended to be informative for the lay reader, as well as for scholars who wish to embark on such research without prior experience. This review begins by discussing the types of research questions relevant to radiation oncology that large-scale databases may help illuminate. It then describes major potential data sources for such endeavors, including information regarding access and insights regarding the strengths and limitations of each. Finally, it provides guidance regarding the analytical challenges that observational studies must confront, along with discussion of the techniques that have been developed to help minimize the impact of certain common analytical issues in observational analysis. Features characterizing a well-designed observational study include clearly defined research questions, careful selection of an appropriate data source, consultation with investigators with relevant methodological expertise, inclusion of sensitivity analyses, caution not to overinterpret small but significant differences, and recognition of limitations when trying to evaluate causality. This review concludes that carefully designed and executed studies using observational data that possess these qualities hold substantial promise for advancing our understanding of many unanswered questions of importance to the field of radiation oncology. Copyright © 2014 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Bryant, Gerald
2015-04-01
Large-scale soft-sediment deformation features in the Navajo Sandstone have been a topic of interest for nearly 40 years, ever since they were first explored as a criterion for discriminating between marine and continental processes in the depositional environment. For much of this time, evidence for large-scale sediment displacements was commonly attributed to processes of mass wasting, that is, gravity-driven movements of surficial sand. These slope failures were attributed to the inherent susceptibility of dune sand responding to environmental triggers such as earthquakes, floods, impacts, and the differential loading associated with dune topography. During the last decade, a new wave of research has focused on the event significance of deformation features in more detail, revealing a broad diversity of large-scale deformation morphologies. This research has led to a better appreciation of subsurface dynamics in the early Jurassic deformation events recorded in the Navajo Sandstone, including the important role of intrastratal sediment flow. This report documents two illustrative examples of large-scale sediment displacement represented in extensive outcrops of the Navajo Sandstone along the Utah/Arizona border. Architectural relationships in these outcrops provide definitive constraints that enable the recognition of a large-scale sediment outflow at one location and an equally large-scale subsurface flow at the other. At both sites, evidence for associated processes of liquefaction appears at depths of at least 40 m below the original depositional surface, nearly an order of magnitude deeper than has commonly been reported from modern settings. The surficial mass-flow feature displays attributes consistent with much smaller-scale sediment eruptions (sand volcanoes) often documented from modern earthquake zones, including the development of hydraulic pressure from localized subsurface liquefaction and the subsequent escape of fluidized sand toward the unconfined conditions of the surface. The origin of the forces that produced the lateral subsurface movement of a large body of sand at the other site is not readily apparent. The various constraints on modeling the generation of the lateral force required to produce the observed displacement are considered here, along with photodocumentation of key outcrop relationships.
Pian, Cong; Zhang, Guangle; Chen, Zhi; Chen, Yuanyuan; Zhang, Jin; Yang, Tao; Zhang, Liangyun
2016-01-01
As a novel class of noncoding RNAs, long noncoding RNAs (lncRNAs) have been verified to be associated with various diseases. As large numbers of transcripts are generated every year, it is important to identify lncRNAs accurately and quickly among thousands of assembled transcripts. To accurately discover new lncRNAs, we develop a random forest (RF) classification tool named LncRNApred based on a new hybrid feature set. This hybrid feature set includes three newly proposed features: MaxORF, RMaxORF, and SNR. LncRNApred is effective for classifying lncRNAs and protein-coding transcripts accurately and quickly. Moreover, our RF model only requires training on data from human coding and non-coding transcripts; other species can also be predicted using LncRNApred. The results show that our method is more effective than the Coding Potential Calculator (CPC). The web server of LncRNApred is available for free at http://mm20132014.wicp.net:57203/LncRNApred/home.jsp.
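Of the three proposed features, MaxORF is the most self-describing. A sketch under the assumption that it is the length of the longest start-to-stop open reading frame on the forward strand; the exact definition, and the RMaxORF and SNR features, are not specified in the abstract:

    def max_orf_length(seq):
        # Longest ATG..stop open reading frame across the three forward frames
        seq = seq.upper()
        stops = {'TAA', 'TAG', 'TGA'}
        best = 0
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if codon == 'ATG' and start is None:
                    start = i
                elif codon in stops and start is not None:
                    best = max(best, i + 3 - start)
                    start = None
        return best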
Røislien, Jo; Winje, Brita
2013-09-20
Clinical studies frequently include repeated measurements of individuals, often for long periods. We present a methodology for extracting common temporal features across a set of individual time series observations. In particular, the methodology explores extreme observations within the time series, such as spikes, as a possible common temporal phenomenon. Wavelet basis functions are attractive in this sense, as they are localized in both time and frequency domains simultaneously, allowing for localized feature extraction from a time-varying signal. We apply wavelet basis function decomposition of individual time series, with corresponding wavelet shrinkage to remove noise. We then extract common temporal features using linear principal component analysis on the wavelet coefficients, before inverse transformation back to the time domain for clinical interpretation. We demonstrate the methodology on a subset of a large fetal activity study aiming to identify temporal patterns in fetal movement (FM) count data in order to explore formal FM counting as a screening tool for identifying fetal compromise and thus preventing adverse birth outcomes. Copyright © 2013 John Wiley & Sons, Ltd.
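A compact sketch of the pipeline: decompose each series with a discrete wavelet transform, soft-threshold the detail coefficients (wavelet shrinkage), then run PCA across subjects' coefficient vectors. The wavelet choice, decomposition level, and universal threshold are illustrative, and the series are assumed to be of equal length:

    import numpy as np
    import pywt
    from sklearn.decomposition import PCA

    def common_temporal_features(series_list, wavelet='db4', level=4, n_components=2):
        rows = []
        for x in series_list:
            coeffs = pywt.wavedec(x, wavelet, level=level)
            # Noise scale from the finest details, then the universal threshold
            sigma = np.median(np.abs(coeffs[-1])) / 0.6745
            thr = sigma * np.sqrt(2 * np.log(len(x)))
            coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode='soft')
                                    for c in coeffs[1:]]
            rows.append(np.concatenate(coeffs))
        # Principal components of the denoised coefficients = common features
        return PCA(n_components=n_components).fit(np.vstack(rows))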
van Gemert, Jan C; Veenman, Cor J; Smeulders, Arnold W M; Geusebroek, Jan-Mark
2010-07-01
This paper studies automatic image classification by modeling soft assignment in the popular codebook model. The codebook model describes an image as a bag of discrete visual words selected from a vocabulary, where the frequency distributions of visual words in an image allow classification. One inherent component of the codebook model is the assignment of discrete visual words to continuous image features. Despite the clear mismatch of this hard assignment with the nature of continuous features, the approach has been successfully applied for some years. In this paper, we investigate four types of soft assignment of visual words to image features. We demonstrate that explicitly modeling visual word assignment ambiguity improves classification performance compared to the hard assignment of the traditional codebook model. The traditional codebook model is compared against our method for five well-known data sets: 15 natural scenes, Caltech-101, Caltech-256, and Pascal VOC 2007/2008. We demonstrate that large codebook vocabulary sizes completely deteriorate the performance of the traditional model, whereas the proposed model performs consistently. Moreover, we show that our method profits in high-dimensional feature spaces and reaps higher benefits when increasing the number of image categories.
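The difference between hard and soft assignment reduces to a few lines: in the soft (kernel codebook) variant, each descriptor votes for every visual word with a Gaussian weight instead of only for its nearest word. The smoothing parameter sigma is hypothetical:

    import numpy as np

    def hard_histogram(descriptors, codebook):
        # Each descriptor votes only for its single nearest visual word
        d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        hist = np.bincount(d2.argmin(axis=1), minlength=len(codebook))
        return hist / hist.sum()

    def soft_histogram(descriptors, codebook, sigma=1.0):
        # Kernel codebook: Gaussian-weighted votes over all visual words
        d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2.0 * sigma ** 2))
        w /= w.sum(axis=1, keepdims=True)
        return w.mean(axis=0)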
Discriminative Multi-View Interactive Image Re-Ranking.
Li, Jun; Xu, Chang; Yang, Wankou; Sun, Changyin; Tao, Dacheng
2017-07-01
Given unreliable visual patterns and insufficient query information, content-based image retrieval is often suboptimal and requires image re-ranking using auxiliary information. In this paper, we propose discriminative multi-view interactive image re-ranking (DMINTIR), which integrates user relevance feedback capturing users' intentions with multiple features that sufficiently describe the images. In DMINTIR, heterogeneous property features are incorporated in a multi-view learning scheme to exploit their complementarity. In addition, a discriminatively learned weight vector is obtained to reassign updated scores and target images for re-ranking. Compared with other multi-view learning techniques, our scheme not only generates a compact representation in the latent space from the redundant multi-view features but also maximally preserves the discriminative information in feature encoding via the large-margin principle. Furthermore, the generalization error bound of the proposed algorithm is theoretically analyzed and shown to be improved by the interactions between the latent space and discriminant function learning. Experimental results on two benchmark data sets demonstrate that our approach boosts baseline retrieval quality and is competitive with other state-of-the-art re-ranking strategies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nawrocki, J; Chino, J; Das, S
Purpose: This study examines the effect on texture analysis due to variable reconstruction of PET images in the context of an adaptive FDG PET protocol for node-positive gynecologic cancer patients. By measuring variability in texture features from baseline and intra-treatment PET-CT, we can isolate unreliable texture features due to large variation. Methods: A subset of seven patients with node-positive gynecological cancers visible on PET was selected for this study. Prescribed dose varied between 45–50.4 Gy, with a 55–70 Gy boost to the PET-positive nodes. A baseline and intra-treatment (between 30–36 Gy) PET-CT were obtained on a Siemens Biograph mCT. Each clinical PET image set was reconstructed 6 times using a TrueX+TOF algorithm with varying iterations and Gaussian filter. Baseline and intra-treatment primary GTVs were segmented using PET Edge (MIM Software Inc., Cleveland, OH), a semi-automatic gradient-based algorithm, on the clinical PET and transferred to the other reconstructed sets. Using an in-house MATLAB program, four 3D texture matrices describing relationships between voxel intensities in the GTV were generated: co-occurrence, run length, size zone, and neighborhood difference. From these, 39 textural features were calculated in addition to SUV histogram features. The percent variability among parameters was first calculated. Each reconstructed texture feature from baseline and intra-treatment per patient was normalized to the clinical baseline scan and compared using the Wilcoxon signed-rank test in order to isolate variations due to reconstruction parameters. Results: For the baseline scans, 13 texture features showed a mean range greater than 10%. For the intra-treatment scans, 28 texture features showed a mean range greater than 10%. Comparing baseline to intra-treatment scans, 25 texture features showed p < 0.05. Conclusion: Variability due to different reconstruction parameters increased with treatment; however, the majority of texture features showed significant changes during treatment independent of reconstruction effects.
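For reference, a 2-D co-occurrence feature extraction sketch with scikit-image; the study builds 3-D matrices over GTV voxels in MATLAB, so the gray-level count and offsets here are purely illustrative:

    import numpy as np
    from skimage.feature import graycomatrix, graycoprops

    def glcm_features(image_2d, levels=64):
        # Quantize intensities, build the co-occurrence matrix, and pull
        # a few standard Haralick-style texture descriptors from it.
        bins = np.linspace(image_2d.min(), image_2d.max(), levels)
        q = np.clip(np.digitize(image_2d, bins) - 1, 0, levels - 1).astype(np.uint8)
        glcm = graycomatrix(q, distances=[1], angles=[0, np.pi / 2],
                            levels=levels, normed=True)
        return {p: float(graycoprops(glcm, p).mean())
                for p in ('contrast', 'homogeneity', 'energy', 'correlation')}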
Contingent attentional capture across multiple feature dimensions in a temporal search task.
Ito, Motohiro; Kawahara, Jun I
2016-01-01
The present study examined whether attention can be flexibly controlled to monitor two different feature dimensions (shape and color) in a temporal search task. Specifically, we investigated the occurrence of contingent attentional capture (i.e., interference from task-relevant distractors) and the resulting set reconfiguration (i.e., enhancement of a single task-relevant set). If observers can restrict search to a specific value for each relevant feature dimension independently, the capture and reconfiguration effects should occur only when the single relevant distractor in each dimension appears. Participants identified a target letter surrounded by a non-green square or a non-square green frame. The results revealed contingent attentional capture, as target identification accuracy was lower when the distractor contained a target-defining feature than when it contained a nontarget feature. Set reconfiguration was also obtained, in that accuracy was superior when the current target's feature (e.g., shape) corresponded to the defining feature of the present distractor (shape) than when it did not match the distractor's feature (color). This enhancement was not due to perceptual priming. The present study demonstrated that the principles of contingent attentional capture and resulting set reconfiguration held even when multiple target feature dimensions were monitored. Copyright © 2015 Elsevier B.V. All rights reserved.
New Features for Neuron Classification.
Hernández-Pérez, Leonardo A; Delgado-Castillo, Duniel; Martín-Pérez, Rainer; Orozco-Morales, Rubén; Lorenzo-Ginori, Juan V
2018-04-28
This paper addresses the problem of obtaining new neuron features capable of improving the results of neuron classification. Most studies on neuron classification using morphological features have been based on Euclidean geometry. Here, three one-dimensional (1D) time series are instead derived from the three-dimensional (3D) structure of the neuron, and a spatial time series is then constructed from which the features are calculated. Digitally reconstructed neurons were separated into control and pathological sets, relating to three categories of alterations caused by epilepsy, Alzheimer's disease (long and local projections), and ischemia. These neuron sets were then subjected to supervised classification, and the results were compared for three sets of features: morphological features, features obtained from the time series, and a combination of both. The best results were obtained using features from the time series, which outperformed classification using only morphological features, with correct classification rates higher by 5.15%, 3.75%, and 5.33% for epilepsy and Alzheimer's disease (long and local projections), respectively. The morphological features were better for the ischemia set, with a difference of 3.05%. Features related to the time series, such as variance, Spearman auto-correlation, partial auto-correlation, mutual information, and local minima and maxima, exhibited the best performance. We also compared different feature evaluators, among which ReliefF ranked best.
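Several of the best-performing descriptors named above are easy to state precisely. A sketch operating on one derived 1-D series; how the three series are extracted from the 3-D reconstruction is the paper's method and is not assumed here:

    import numpy as np
    from scipy.stats import spearmanr

    def series_features(x):
        x = np.asarray(x, dtype=float)
        # Lag-1 Spearman auto-correlation of the series
        rho = spearmanr(x[:-1], x[1:])[0]
        # Counts of strict local extrema
        n_max = int(np.sum((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])))
        n_min = int(np.sum((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:])))
        return {'variance': float(np.var(x)),
                'spearman_autocorr': float(rho),
                'local_maxima': n_max,
                'local_minima': n_min}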
Dragas, Jelena; Jäckel, David; Hierlemann, Andreas; Franke, Felix
2017-01-01
Reliable real-time low-latency spike sorting with large data throughput is essential for studies of neural network dynamics and for brain-machine interfaces (BMIs), in which the stimulation of neural networks is based on the networks' most recent activity. However, the majority of existing multi-electrode spike-sorting algorithms are unsuited for processing high quantities of simultaneously recorded data. Recording from large neuronal networks using large high-density electrode sets (thousands of electrodes) imposes high demands on the data-processing hardware regarding computational complexity and data transmission bandwidth; this, in turn, entails demanding requirements in terms of chip area, memory resources and processing latency. This paper presents computational complexity optimization techniques, which facilitate the use of spike-sorting algorithms in large multi-electrode-based recording systems. The techniques are then applied to a previously published algorithm, on its own, unsuited for large electrode set recordings. Further, a real-time low-latency high-performance VLSI hardware architecture of the modified algorithm is presented, featuring a folded structure capable of processing the activity of hundreds of neurons simultaneously. The hardware is reconfigurable “on-the-fly” and adaptable to the nonstationarities of neuronal recordings. By transmitting exclusively spike time stamps and/or spike waveforms, its real-time processing offers the possibility of data bandwidth and data storage reduction. PMID:25415989
Linked Scatter Plots, A Powerful Exploration Tool For Very Large Sets of Spectra
NASA Astrophysics Data System (ADS)
Carbon, Duane Francis; Henze, Christopher
2015-08-01
We present a new tool, based on linked scatter plots, that is designed to efficiently explore very large spectrum data sets such as the SDSS, APOGEE, LAMOST, GAIA, and RAVE data sets. The tool works in two stages: the first uses batch processing and the second runs interactively. In the batch stage, spectra are processed through our data pipeline, which computes the depths relative to the local continuum at preselected feature wavelengths. These depths, and any additional available variables such as local S/N level, magnitudes, colors, positions, and radial velocities, are the basic measured quantities used in the interactive stage. The interactive stage employs the NASA hyperwall, a configuration of 128 workstation displays (8x16 array) controlled by a parallelized software suite running on NASA's Pleiades supercomputer. Each hyperwall panel is used to display a fully linked 2-D scatter plot showing the depth of feature A vs the depth of feature B for all of the spectra. A and B change from panel to panel. The relationships between the various (A,B) strengths and any distinctive clustering, as well as unique outlier groupings, are visually apparent when examining and inter-comparing the different panels on the hyperwall. In addition, the data links between the scatter plots allow the user to apply a logical algebra to the measurements. By graphically selecting the objects in any interesting region of any 2-D plot on the hyperwall, the tool immediately and clearly shows how the selected objects are distributed in all the other 2-D plots. The selection process may be repeated multiple times and, at each step, the selections can represent a sequence of logical constraints on the measurements, revealing those objects which satisfy all the constraints thus far. The spectra of the selected objects may be examined at any time on a connected workstation display. Using over 945,000,000 depth measurements from 569,738 SDSS DR10 stellar spectra, we illustrate how to quickly isolate and examine such interesting stellar subsets as EMP stars, C-rich EMP stars, and CV stars.
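The two computational ideas here, depth relative to the local continuum and selection-as-logical-algebra, are simple enough to sketch. In the Python sketch below, the side-window median continuum and all window widths are our assumptions (the pipeline's actual continuum estimate is not specified in the abstract), and the depth columns are random toy data.

    import numpy as np

    def feature_depth(wave, flux, line_wave, half_width=0.5, cont_width=5.0):
        """Depth of an absorption feature relative to the local continuum;
        0 = feature absent, approaching 1 = saturated. Minimal sketch."""
        core = np.abs(wave - line_wave) < half_width
        side = (np.abs(wave - line_wave) < cont_width) & ~core
        continuum = np.median(flux[side])          # crude local continuum
        return 1.0 - flux[core].min() / continuum

    wave = np.linspace(4990.0, 5010.0, 400)
    flux = 1.0 - 0.5 * np.exp(-((wave - 5000.0) ** 2) / 0.5)
    print("depth:", round(feature_depth(wave, flux, 5000.0), 3))

    # Linked selection as logical algebra: each scatter-plot selection is a
    # boolean mask over objects; chaining selections ANDs the constraints.
    depths = np.random.default_rng(1).random((1000, 2))    # (N spectra, N features)
    selected = (depths[:, 0] > 0.6) & (depths[:, 1] < 0.2)
    print(selected.sum(), "objects satisfy all constraints so far")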
Inflationary features and shifts in cosmological parameters from Planck 2015 data
NASA Astrophysics Data System (ADS)
Obied, Georges; Dvorkin, Cora; Heinrich, Chen; Hu, Wayne; Miranda, Vinicius
2017-10-01
We explore the relationship between features in the Planck 2015 temperature and polarization data, shifts in the cosmological parameters, and features from inflation. Residuals in the temperature data from the best-fit power-law ΛCDM model at low multipole ℓ ≲ 40 are mainly responsible for the high H0 and low σ8Ωm^(1/2) values when comparing the ℓ < 1000 portion to the full data set. These same residuals are better fit to inflationary features, with a 1.9σ preference for running of the running of the tilt or a stronger 99% C.L. local-significance preference for a sharp drop in power around k = 0.004 Mpc^(-1), relieving the internal tension with H0. At ℓ > 1000, the same in-phase acoustic residuals that drive the global H0 constraints and appear as a lensing anomaly also favor running parameters which allow even lower H0, but not once lensing reconstruction is considered. Polarization spectra are intrinsically highly sensitive to these parameter shifts, and even more so in the Planck 2015 TE data due to an anomalous suppression in power at ℓ ≈ 165, which disfavors the best-fit H0 ΛCDM solution by more than 2σ, and the high H0 value at almost 3σ. Current polarization data also slightly enhance the significance of a sharp suppression of large-scale power but leave room for large improvements in the future with cosmic-variance-limited E-mode measurements.
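For orientation, "running of the running" refers to the second logarithmic derivative of the scalar spectral tilt. A standard parameterization of the primordial power spectrum (our addition for context, not quoted from this paper) is

    \mathcal{P}_{\mathcal{R}}(k) = A_s \left(\frac{k}{k_0}\right)^{\,n_s - 1
        + \frac{1}{2}\alpha_s \ln(k/k_0)
        + \frac{1}{6}\beta_s \ln^{2}(k/k_0)}

where \alpha_s = dn_s/d\ln k is the running and \beta_s = d\alpha_s/d\ln k is the running of the running; the 1.9σ preference in the abstract concerns \beta_s.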
Panni, Simona; Montecchi-Palazzi, Luisa; Kiemer, Lars; Cabibbo, Andrea; Paoluzi, Serena; Santonico, Elena; Landgraf, Christiane; Volkmer-Engert, Rudolf; Bachi, Angela; Castagnoli, Luisa; Cesareni, Gianni
2011-01-01
Large-scale interaction studies contribute the largest fraction of the protein interaction information in databases. However, co-purification of non-specific or indirect ligands often results in data sets affected by a considerable number of false positives. For the fraction of interactions mediated by short linear peptides, we present here a combined experimental and computational strategy for ranking the reliability of the inferred partners. We apply this strategy to the family of 14-3-3 domains. We first characterized the recognition specificity of this domain family, largely confirming the results of previous analyses while revealing new features of the preferred sequence context of 14-3-3 phospho-peptide partners. Notably, a proline immediately on the carboxy side of the phospho-amino acid functions as a potent inhibitor of 14-3-3 binding. The position-specific information about residue preference was encoded in a scoring matrix and two regular expressions. The integration of these three features in a single predictive model outperforms publicly available prediction tools. Next, we combined, by a naïve Bayesian approach, these "peptide features" with "protein features" such as protein co-expression and co-localization. Our approach provides an orthogonal reliability assessment and maps the 14-3-3 peptide targets on the partner proteins with high confidence. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
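The scoring-matrix-plus-regular-expression idea is easy to sketch. In the Python fragment below, the matrix values and the regex are placeholders, not the published model; the only detail taken from the abstract is that a proline immediately C-terminal to the phospho-site (the [^P] at +1) blocks binding.

    import re
    import numpy as np

    AA = "ACDEFGHIKLMNPQRSTVWY"
    rng = np.random.default_rng(0)
    # Placeholder PSSM: one weight per residue per position -3..+3
    pssm = {aa: rng.normal(size=7) for aa in AA}

    # Toy motif filter; [^P] right after [ST] encodes the paper's finding
    # that proline at +1 inhibits 14-3-3 binding.
    MOTIF = re.compile(r"R..[ST][^P].")

    def score_peptide(pep7):
        """Sum PSSM contributions over a 7-mer centered on the phospho-site."""
        return sum(pssm[aa][i] for i, aa in enumerate(pep7) if aa in pssm)

    pep = "RSKSAPE"                      # hypothetical candidate 7-mer
    if MOTIF.search(pep):
        print("motif match, PSSM score:", round(score_peptide(pep), 2))

In the paper's strategy, such peptide-level scores are then combined with protein-level evidence (co-expression, co-localization) in a naïve Bayesian model.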
A generic multi-dimensional feature extraction method using multiobjective genetic programming.
Zhang, Yang; Rockett, Peter I
2009-01-01
In this paper, we present a generic feature extraction method for pattern classification using multiobjective genetic programming. This not only evolves the (near-)optimal set of mappings from a pattern space to a multi-dimensional decision space, but also simultaneously optimizes the dimensionality of that decision space. The presented framework evolves vector-to-vector feature extractors that maximize class separability. We demonstrate the efficacy of our approach by making statistically-founded comparisons with a wide variety of established classifier paradigms over a range of datasets and find that for most of the pairwise comparisons, our evolutionary method delivers statistically smaller misclassification errors. At very worst, our method displays no statistical difference in a few pairwise comparisons with established classifier/dataset combinations; crucially, none of the misclassification results produced by our method is worse than any comparator classifier. Although principally focused on feature extraction, feature selection is also performed as an implicit side effect; we show that both feature extraction and selection are important to the success of our technique. The presented method has the practical consequence of obviating the need to exhaustively evaluate a large family of conventional classifiers when faced with a new pattern recognition problem in order to attain a good classification accuracy.
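To make the fitness side of this concrete: the sketch below scores a candidate vector-to-vector mapping by the class separability of the projected data. The Fisher-style between/within ratio and the random linear map (standing in for an evolved GP tree) are our stand-ins; the paper's actual objectives and representation may differ.

    import numpy as np

    def separability(Z, y):
        """Fisher-style criterion: between-class over within-class scatter
        in the decision space Z; larger = better separated."""
        classes = np.unique(y)
        mu = Z.mean(axis=0)
        within = sum(((Z[y == c] - Z[y == c].mean(axis=0)) ** 2).sum()
                     for c in classes)
        between = sum((y == c).sum() * ((Z[y == c].mean(axis=0) - mu) ** 2).sum()
                      for c in classes)
        return between / within

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(1, 1, (50, 10))])
    y = np.repeat([0, 1], 50)
    W = rng.normal(size=(10, 3))   # candidate mapping into a 3-D decision space
    print("fitness:", round(separability(X @ W, y), 3))

In the multiobjective setting, this separability score would be optimized jointly with the dimensionality of the decision space, yielding a Pareto front of extractors.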
Van Landeghem, Sofie; Abeel, Thomas; Saeys, Yvan; Van de Peer, Yves
2010-09-15
In the field of biomolecular text mining, black box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black box behavior and the end-user who has to interpret the results. We show that our FS methodology successfully discards a large fraction of machine-generated features, improving classification performance of state-of-the-art text mining algorithms. Furthermore, we illustrate how FS can be applied to gain understanding in the predictions of a framework for biomolecular event extraction from text. We include numerous examples of highly discriminative features that model either biological reality or common linguistic constructs. Finally, we discuss a number of insights from our FS analyses that will provide the opportunity to considerably improve upon current text mining tools. The FS algorithms and classifiers are available in Java-ML (http://java-ml.sf.net). The datasets are publicly available from the BioNLP'09 Shared Task web site (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/).
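The abstract points to Java-ML for the authors' implementations; purely to illustrate the shape of the discard-uninformative-features step, here is a scikit-learn sketch in Python. The toy corpus, labels, and k are invented, and chi-squared ranking over bag-of-words counts is a simple stand-in for the paper's FS methodology over rich machine-generated features.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import SelectKBest, chi2

    docs = ["protein binds kinase", "kinase phosphorylates substrate",
            "the weather is nice", "nice weather again today"]
    labels = [1, 1, 0, 0]            # 1 = biomolecular event, 0 = not

    X = CountVectorizer().fit_transform(docs)      # bag-of-words counts
    selector = SelectKBest(chi2, k=4).fit(X, labels)
    X_small = selector.transform(X)                # keep only 4 best features
    print(X.shape[1], "features ->", X_small.shape[1])

Beyond shrinking the model, inspecting the surviving features (as the paper does) is what bridges the gap between black-box predictions and end-user interpretation.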
Algorithms for Learning Preferences for Sets of Objects
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri L.; desJardins, Marie; Eaton, Eric
2010-01-01
A method is being developed that provides for an artificial-intelligence system to learn a user's preferences for sets of objects and to thereafter automatically select subsets of objects according to those preferences. The method was originally intended to enable automated selection, from among large sets of images acquired by instruments aboard spacecraft, of image subsets considered to be scientifically valuable enough to justify use of limited communication resources for transmission to Earth. The method is also applicable to other sets of objects: examples of sets of objects considered in the development of the method include food menus, radio-station music playlists, and assortments of colored blocks for creating mosaics. The method does not require the user to perform the often-difficult task of quantitatively specifying preferences; instead, the user provides examples of preferred sets of objects. This method goes beyond related prior artificial-intelligence methods for learning which individual items are preferred by the user: this method supports a concept of set-based preferences, which include not only preferences for individual items but also preferences regarding types and degrees of diversity of items in a set. Consideration of diversity in this method involves recognition that members of a set may interact with each other in the sense that when considered together, they may be regarded as being complementary, redundant, or incompatible to various degrees. The effects of such interactions are loosely summarized in the term portfolio effect. The learning method relies on a preference representation language, denoted DD-PREF, to express set-based preferences. In DD-PREF, a preference is represented by a tuple that includes quality (depth) functions to estimate how desired a specific value is, weights for each feature preference, the desired diversity of feature values, and the relative importance of diversity versus depth. The system applies statistical concepts to estimate quantitative measures of the user's preferences from training examples (preferred subsets) specified by the user. Once preferences have been learned, the system uses those preferences to select preferred subsets from new sets. The method was found to be viable when tested in computational experiments on menus, music playlists, and rover images. Contemplated future development efforts include further tests on more diverse sets and development of a sub-method for (a) estimating the parameter that represents the relative importance of diversity versus depth, and (b) incorporating background knowledge about the nature of quality functions, which are special functions that specify depth preferences for features.
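Since the abstract spells out the tuple (quality/depth functions, feature weights, desired diversity, depth-versus-diversity importance), a single-feature scoring sketch in Python follows. The Gaussian quality function, the standard-deviation diversity proxy, and all parameter values are our illustrative assumptions, not DD-PREF's actual definitions.

    import numpy as np

    def set_score(values, quality, weight, target_div, alpha):
        """DD-PREF-style score for one feature of a candidate set:
        alpha trades off diversity against depth. Minimal sketch."""
        depth = weight * np.mean([quality(v) for v in values])
        observed_div = np.std(values)               # crude diversity measure
        diversity = 1.0 - abs(observed_div - target_div)
        return alpha * diversity + (1 - alpha) * depth

    quality = lambda v: np.exp(-(v - 0.7) ** 2 / 0.02)   # prefers values near 0.7
    print(round(set_score([0.6, 0.7, 0.8], quality,
                          weight=1.0, target_div=0.1, alpha=0.4), 3))

A full implementation would sum such terms over all features and search for the subset of a new set that maximizes the total score.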
Hassan, Ahnaf Rashik; Bhuiyan, Mohammed Imamul Hassan
2016-09-15
Automatic sleep scoring is essential because, conventionally, physicians must visually analyze large volumes of data, which is onerous, time-consuming, and error-prone. There is therefore a dire need for an automated sleep staging scheme. In this work, we decompose sleep-EEG signal segments using the tunable-Q factor wavelet transform (TQWT). Various spectral features are then computed from the TQWT sub-bands. The performance of spectral features in the TQWT domain has been determined by intuitive and graphical analyses, statistical validation, and Fisher criteria. A random forest is used to perform classification. Optimal choices and the effects of TQWT and random forest parameters have been determined and expounded. Experimental outcomes manifest the efficacy of our feature generation scheme in terms of ANOVA p-values and Fisher criteria. The proposed scheme yields accuracies of 90.38%, 91.50%, 92.11%, 94.80%, and 97.50% for 6-stage to 2-stage classification of sleep states on the benchmark Sleep-EDF data set. In addition, its performance on the DREAMS Subjects data set is also promising. The performance of the proposed method is significantly better than that of existing ones in terms of accuracy and Cohen's kappa coefficient. Additionally, the proposed scheme gives high detection accuracy for the sleep stages non-REM 1 and REM. Spectral features in the TQWT domain can efficaciously discriminate sleep-EEG signals corresponding to various sleep states. The proposed scheme will alleviate the burden on physicians, speed up sleep disorder diagnosis, and expedite sleep research. Copyright © 2016 Elsevier B.V. All rights reserved.
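The pipeline shape (sub-band decomposition, per-band features, random forest) can be sketched in a few lines of Python. Note the substitutions: we use an ordinary discrete wavelet transform from PyWavelets in place of the paper's TQWT, which lacks a standard library implementation, and the two per-band features, epoch length, and labels are toy choices.

    import numpy as np
    import pywt
    from sklearn.ensemble import RandomForestClassifier

    def epoch_features(epoch, wavelet="db4", level=4):
        """Simple spectral/statistical features per wavelet sub-band.
        Stand-in for the paper's TQWT-domain spectral features."""
        feats = []
        for band in pywt.wavedec(epoch, wavelet, level=level):
            feats += [np.log(np.mean(band ** 2) + 1e-12),   # log band energy
                      np.std(band)]
        return feats

    rng = np.random.default_rng(0)
    X = np.array([epoch_features(rng.normal(size=3000)) for _ in range(40)])
    y = rng.integers(0, 2, size=40)        # toy 2-stage labels, not real EEG
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print("train accuracy:", clf.score(X, y))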
Schmidt, Joseph; MacNamara, Annmarie; Proudfit, Greg Hajcak; Zelinsky, Gregory J.
2014-01-01
The visual-search literature has assumed that the top-down target representation used to guide search resides in visual working memory (VWM). We directly tested this assumption using contralateral delay activity (CDA) to estimate the VWM load imposed by the target representation. In Experiment 1, observers previewed four photorealistic objects and were cued to remember the two objects appearing to the left or right of central fixation; Experiment 2 was identical except that observers previewed two photorealistic objects and were cued to remember one. CDA was measured during a delay following preview offset but before onset of a four-object search array. One of the targets was always present, and observers were asked to make an eye movement to it and press a button. We found lower-magnitude CDA on trials when the initial search saccade was directed to the target (strong guidance) compared to when it was not (weak guidance). This difference also tended to be larger shortly before search-display onset and was largely unaffected by VWM item-capacity limits or the number of previews. Moreover, the difference between mean strong- and weak-guidance CDA was proportional to the increase in search time between mean strong- and weak-guidance trials (as measured by time-to-target and reaction-time difference scores). Contrary to most search models, our data suggest that maintaining more target features results in poorer search guidance to the target. We interpret these counterintuitive findings as evidence for strong search guidance using a small set of highly discriminative target features that remain after pruning from a larger set, with the load imposed on VWM varying with this feature-consolidation process. PMID:24599946