Unbiased feature selection in learning random forests for high-dimensional data.
Nguyen, Thanh-Tung; Huang, Joshua Zhexue; Nguyen, Thuy Thi
2015-01-01
Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This makes RFs have poor accuracy when working with high-dimensional data. Besides that, RFs have bias in the feature selection process where multivalued features are favored. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees, while allowing one to reduce dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets including image datasets. The experimental results have shown that RFs with the proposed approach outperformed the existing random forests in increasing the accuracy and the AUC measures.
Ortega, Julio; Asensio-Cubero, Javier; Gan, John Q; Ortiz, Andrés
2016-07-15
Brain-computer interfacing (BCI) applications based on the classification of electroencephalographic (EEG) signals require solving high-dimensional pattern classification problems with such a relatively small number of training patterns that curse of dimensionality problems usually arise. Multiresolution analysis (MRA) has useful properties for signal analysis in both temporal and spectral analysis, and has been broadly used in the BCI field. However, MRA usually increases the dimensionality of the input data. Therefore, some approaches to feature selection or feature dimensionality reduction should be considered for improving the performance of the MRA based BCI. This paper investigates feature selection in the MRA-based frameworks for BCI. Several wrapper approaches to evolutionary multiobjective feature selection are proposed with different structures of classifiers. They are evaluated by comparing with baseline methods using sparse representation of features or without feature selection. The statistical analysis, by applying the Kolmogorov-Smirnoff and Kruskal-Wallis tests to the means of the Kappa values evaluated by using the test patterns in each approach, has demonstrated some advantages of the proposed approaches. In comparison with the baseline MRA approach used in previous studies, the proposed evolutionary multiobjective feature selection approaches provide similar or even better classification performances, with significant reduction in the number of features that need to be computed.
Zhang, Yu; Wu, Jianxin; Cai, Jianfei
2016-05-01
In large-scale visual recognition and image retrieval tasks, feature vectors, such as Fisher vector (FV) or the vector of locally aggregated descriptors (VLAD), have achieved state-of-the-art results. However, the combination of the large numbers of examples and high-dimensional vectors necessitates dimensionality reduction, in order to reduce its storage and CPU costs to a reasonable range. In spite of the popularity of various feature compression methods, this paper shows that the feature (dimension) selection is a better choice for high-dimensional FV/VLAD than the feature (dimension) compression methods, e.g., product quantization. We show that strong correlation among the feature dimensions in the FV and the VLAD may not exist, which renders feature selection a natural choice. We also show that, many dimensions in FV/VLAD are noise. Throwing them away using feature selection is better than compressing them and useful dimensions altogether using feature compression methods. To choose features, we propose an efficient importance sorting algorithm considering both the supervised and unsupervised cases, for visual recognition and image retrieval, respectively. Combining with the 1-bit quantization, feature selection has achieved both higher accuracy and less computational cost than feature compression methods, such as product quantization, on the FV and the VLAD image representations.
Cheng, Qiang; Zhou, Hongbo; Cheng, Jie
2011-06-01
Selecting features for multiclass classification is a critically important task for pattern recognition and machine learning applications. Especially challenging is selecting an optimal subset of features from high-dimensional data, which typically have many more variables than observations and contain significant noise, missing components, or outliers. Existing methods either cannot handle high-dimensional data efficiently or scalably, or can only obtain local optimum instead of global optimum. Toward the selection of the globally optimal subset of features efficiently, we introduce a new selector--which we call the Fisher-Markov selector--to identify those features that are the most useful in describing essential differences among the possible groups. In particular, in this paper we present a way to represent essential discriminating characteristics together with the sparsity as an optimization objective. With properly identified measures for the sparseness and discriminativeness in possibly high-dimensional settings, we take a systematic approach for optimizing the measures to choose the best feature subset. We use Markov random field optimization techniques to solve the formulated objective functions for simultaneous feature selection. Our results are noncombinatorial, and they can achieve the exact global optimum of the objective function for some special kernels. The method is fast; in particular, it can be linear in the number of features and quadratic in the number of observations. We apply our procedure to a variety of real-world data, including mid--dimensional optical handwritten digit data set and high-dimensional microarray gene expression data sets. The effectiveness of our method is confirmed by experimental results. In pattern recognition and from a model selection viewpoint, our procedure says that it is possible to select the most discriminating subset of variables by solving a very simple unconstrained objective function which in fact can be obtained with an explicit expression.
Effective traffic features selection algorithm for cyber-attacks samples
NASA Astrophysics Data System (ADS)
Li, Yihong; Liu, Fangzheng; Du, Zhenyu
2018-05-01
By studying the defense scheme of Network attacks, this paper propose an effective traffic features selection algorithm based on k-means++ clustering to deal with the problem of high dimensionality of traffic features which extracted from cyber-attacks samples. Firstly, this algorithm divide the original feature set into attack traffic feature set and background traffic feature set by the clustering. Then, we calculates the variation of clustering performance after removing a certain feature. Finally, evaluating the degree of distinctiveness of the feature vector according to the result. Among them, the effective feature vector is whose degree of distinctiveness exceeds the set threshold. The purpose of this paper is to select out the effective features from the extracted original feature set. In this way, it can reduce the dimensionality of the features so as to reduce the space-time overhead of subsequent detection. The experimental results show that the proposed algorithm is feasible and it has some advantages over other selection algorithms.
Dimensionality Reduction Through Classifier Ensembles
NASA Technical Reports Server (NTRS)
Oza, Nikunj C.; Tumer, Kagan; Norwig, Peter (Technical Monitor)
1999-01-01
In data mining, one often needs to analyze datasets with a very large number of attributes. Performing machine learning directly on such data sets is often impractical because of extensive run times, excessive complexity of the fitted model (often leading to overfitting), and the well-known "curse of dimensionality." In practice, to avoid such problems, feature selection and/or extraction are often used to reduce data dimensionality prior to the learning step. However, existing feature selection/extraction algorithms either evaluate features by their effectiveness across the entire data set or simply disregard class information altogether (e.g., principal component analysis). Furthermore, feature extraction algorithms such as principal components analysis create new features that are often meaningless to human users. In this article, we present input decimation, a method that provides "feature subsets" that are selected for their ability to discriminate among the classes. These features are subsequently used in ensembles of classifiers, yielding results superior to single classifiers, ensembles that use the full set of features, and ensembles based on principal component analysis on both real and synthetic datasets.
Bommert, Andrea; Rahnenführer, Jörg; Lang, Michel
2017-01-01
Finding a good predictive model for a high-dimensional data set can be challenging. For genetic data, it is not only important to find a model with high predictive accuracy, but it is also important that this model uses only few features and that the selection of these features is stable. This is because, in bioinformatics, the models are used not only for prediction but also for drawing biological conclusions which makes the interpretability and reliability of the model crucial. We suggest using three target criteria when fitting a predictive model to a high-dimensional data set: the classification accuracy, the stability of the feature selection, and the number of chosen features. As it is unclear which measure is best for evaluating the stability, we first compare a variety of stability measures. We conclude that the Pearson correlation has the best theoretical and empirical properties. Also, we find that for the stability assessment behaviour it is most important that a measure contains a correction for chance or large numbers of chosen features. Then, we analyse Pareto fronts and conclude that it is possible to find models with a stable selection of few features without losing much predictive accuracy.
Advancing three-dimensional MEMS by complimentary laser micro manufacturing
NASA Astrophysics Data System (ADS)
Palmer, Jeremy A.; Williams, John D.; Lemp, Tom; Lehecka, Tom M.; Medina, Francisco; Wicker, Ryan B.
2006-01-01
This paper describes improvements that enable engineers to create three-dimensional MEMS in a variety of materials. It also provides a means for selectively adding three-dimensional, high aspect ratio features to pre-existing PMMA micro molds for subsequent LIGA processing. This complimentary method involves in situ construction of three-dimensional micro molds in a stand-alone configuration or directly adjacent to features formed by x-ray lithography. Three-dimensional micro molds are created by micro stereolithography (MSL), an additive rapid prototyping technology. Alternatively, three-dimensional features may be added by direct femtosecond laser micro machining. Parameters for optimal femtosecond laser micro machining of PMMA at 800 nanometers are presented. The technical discussion also includes strategies for enhancements in the context of material selection and post-process surface finish. This approach may lead to practical, cost-effective 3-D MEMS with the surface finish and throughput advantages of x-ray lithography. Accurate three-dimensional metal microstructures are demonstrated. Challenges remain in process planning for micro stereolithography and development of buried features following femtosecond laser micro machining.
Thomas, Minta; De Brabanter, Kris; De Moor, Bart
2014-05-10
DNA microarrays are potentially powerful technology for improving diagnostic classification, treatment selection, and prognostic assessment. The use of this technology to predict cancer outcome has a history of almost a decade. Disease class predictors can be designed for known disease cases and provide diagnostic confirmation or clarify abnormal cases. The main input to this class predictors are high dimensional data with many variables and few observations. Dimensionality reduction of these features set significantly speeds up the prediction task. Feature selection and feature transformation methods are well known preprocessing steps in the field of bioinformatics. Several prediction tools are available based on these techniques. Studies show that a well tuned Kernel PCA (KPCA) is an efficient preprocessing step for dimensionality reduction, but the available bandwidth selection method for KPCA was computationally expensive. In this paper, we propose a new data-driven bandwidth selection criterion for KPCA, which is related to least squares cross-validation for kernel density estimation. We propose a new prediction model with a well tuned KPCA and Least Squares Support Vector Machine (LS-SVM). We estimate the accuracy of the newly proposed model based on 9 case studies. Then, we compare its performances (in terms of test set Area Under the ROC Curve (AUC) and computational time) with other well known techniques such as whole data set + LS-SVM, PCA + LS-SVM, t-test + LS-SVM, Prediction Analysis of Microarrays (PAM) and Least Absolute Shrinkage and Selection Operator (Lasso). Finally, we assess the performance of the proposed strategy with an existing KPCA parameter tuning algorithm by means of two additional case studies. We propose, evaluate, and compare several mathematical/statistical techniques, which apply feature transformation/selection for subsequent classification, and consider its application in medical diagnostics. Both feature selection and feature transformation perform well on classification tasks. Due to the dynamic selection property of feature selection, it is hard to define significant features for the classifier, which predicts classes of future samples. Moreover, the proposed strategy enjoys a distinctive advantage with its relatively lesser time complexity.
Ensemble of sparse classifiers for high-dimensional biological data.
Kim, Sunghan; Scalzo, Fabien; Telesca, Donatello; Hu, Xiao
2015-01-01
Biological data are often high in dimension while the number of samples is small. In such cases, the performance of classification can be improved by reducing the dimension of data, which is referred to as feature selection. Recently, a novel feature selection method has been proposed utilising the sparsity of high-dimensional biological data where a small subset of features accounts for most variance of the dataset. In this study we propose a new classification method for high-dimensional biological data, which performs both feature selection and classification within a single framework. Our proposed method utilises a sparse linear solution technique and the bootstrap aggregating algorithm. We tested its performance on four public mass spectrometry cancer datasets along with two other conventional classification techniques such as Support Vector Machines and Adaptive Boosting. The results demonstrate that our proposed method performs more accurate classification across various cancer datasets than those conventional classification techniques.
A stereo remote sensing feature selection method based on artificial bee colony algorithm
NASA Astrophysics Data System (ADS)
Yan, Yiming; Liu, Pigang; Zhang, Ye; Su, Nan; Tian, Shu; Gao, Fengjiao; Shen, Yi
2014-05-01
To improve the efficiency of stereo information for remote sensing classification, a stereo remote sensing feature selection method is proposed in this paper presents, which is based on artificial bee colony algorithm. Remote sensing stereo information could be described by digital surface model (DSM) and optical image, which contain information of the three-dimensional structure and optical characteristics, respectively. Firstly, three-dimensional structure characteristic could be analyzed by 3D-Zernike descriptors (3DZD). However, different parameters of 3DZD could descript different complexity of three-dimensional structure, and it needs to be better optimized selected for various objects on the ground. Secondly, features for representing optical characteristic also need to be optimized. If not properly handled, when a stereo feature vector composed of 3DZD and image features, that would be a lot of redundant information, and the redundant information may not improve the classification accuracy, even cause adverse effects. To reduce information redundancy while maintaining or improving the classification accuracy, an optimized frame for this stereo feature selection problem is created, and artificial bee colony algorithm is introduced for solving this optimization problem. Experimental results show that the proposed method can effectively improve the computational efficiency, improve the classification accuracy.
High Dimensional Classification Using Features Annealed Independence Rules.
Fan, Jianqing; Fan, Yingying
2008-01-01
Classification using high-dimensional features arises frequently in many contemporary statistical studies such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classifications is largely poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as the random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as bad as the random guessing. Thus, it is paramountly important to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistics are proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
Effects of band selection on endmember extraction for forestry applications
NASA Astrophysics Data System (ADS)
Karathanassi, Vassilia; Andreou, Charoula; Andronis, Vassilis; Kolokoussis, Polychronis
2014-10-01
In spectral unmixing theory, data reduction techniques play an important role as hyperspectral imagery contains an immense amount of data, posing many challenging problems such as data storage, computational efficiency, and the so called "curse of dimensionality". Feature extraction and feature selection are the two main approaches for dimensionality reduction. Feature extraction techniques are used for reducing the dimensionality of the hyperspectral data by applying transforms on hyperspectral data. Feature selection techniques retain the physical meaning of the data by selecting a set of bands from the input hyperspectral dataset, which mainly contain the information needed for spectral unmixing. Although feature selection techniques are well-known for their dimensionality reduction potentials they are rarely used in the unmixing process. The majority of the existing state-of-the-art dimensionality reduction methods set criteria to the spectral information, which is derived by the whole wavelength, in order to define the optimum spectral subspace. These criteria are not associated with any particular application but with the data statistics, such as correlation and entropy values. However, each application is associated with specific land c over materials, whose spectral characteristics present variations in specific wavelengths. In forestry for example, many applications focus on tree leaves, in which specific pigments such as chlorophyll, xanthophyll, etc. determine the wavelengths where tree species, diseases, etc., can be detected. For such applications, when the unmixing process is applied, the tree species, diseases, etc., are considered as the endmembers of interest. This paper focuses on investigating the effects of band selection on the endmember extraction by exploiting the information of the vegetation absorbance spectral zones. More precisely, it is explored whether endmember extraction can be optimized when specific sets of initial bands related to leaf spectral characteristics are selected. Experiments comprise application of well-known signal subspace estimation and endmember extraction methods on a hyperspectral imagery that presents a forest area. Evaluation of the extracted endmembers showed that more forest species can be extracted as endmembers using selected bands.
Chen, Yifei; Sun, Yuxing; Han, Bing-Qing
2015-01-01
Protein interaction article classification is a text classification task in the biological domain to determine which articles describe protein-protein interactions. Since the feature space in text classification is high-dimensional, feature selection is widely used for reducing the dimensionality of features to speed up computation without sacrificing classification performance. Many existing feature selection methods are based on the statistical measure of document frequency and term frequency. One potential drawback of these methods is that they treat features separately. Hence, first we design a similarity measure between the context information to take word cooccurrences and phrase chunks around the features into account. Then we introduce the similarity of context information to the importance measure of the features to substitute the document and term frequency. Hence we propose new context similarity-based feature selection methods. Their performance is evaluated on two protein interaction article collections and compared against the frequency-based methods. The experimental results reveal that the context similarity-based methods perform better in terms of the F1 measure and the dimension reduction rate. Benefiting from the context information surrounding the features, the proposed methods can select distinctive features effectively for protein interaction article classification.
Li, Jinyan; Fong, Simon; Wong, Raymond K; Millham, Richard; Wong, Kelvin K L
2017-06-28
Due to the high-dimensional characteristics of dataset, we propose a new method based on the Wolf Search Algorithm (WSA) for optimising the feature selection problem. The proposed approach uses the natural strategy established by Charles Darwin; that is, 'It is not the strongest of the species that survives, but the most adaptable'. This means that in the evolution of a swarm, the elitists are motivated to quickly obtain more and better resources. The memory function helps the proposed method to avoid repeat searches for the worst position in order to enhance the effectiveness of the search, while the binary strategy simplifies the feature selection problem into a similar problem of function optimisation. Furthermore, the wrapper strategy gathers these strengthened wolves with the classifier of extreme learning machine to find a sub-dataset with a reasonable number of features that offers the maximum correctness of global classification models. The experimental results from the six public high-dimensional bioinformatics datasets tested demonstrate that the proposed method can best some of the conventional feature selection methods up to 29% in classification accuracy, and outperform previous WSAs by up to 99.81% in computational time.
Fukunaga-Koontz transform based dimensionality reduction for hyperspectral imagery
NASA Astrophysics Data System (ADS)
Ochilov, S.; Alam, M. S.; Bal, A.
2006-05-01
Fukunaga-Koontz Transform based technique offers some attractive properties for desired class oriented dimensionality reduction in hyperspectral imagery. In FKT, feature selection is performed by transforming into a new space where feature classes have complimentary eigenvectors. Dimensionality reduction technique based on these complimentary eigenvector analysis can be described under two classes, desired class and background clutter, such that each basis function best represent one class while carrying the least amount of information from the second class. By selecting a few eigenvectors which are most relevant to desired class, one can reduce the dimension of hyperspectral cube. Since the FKT based technique reduces data size, it provides significant advantages for near real time detection applications in hyperspectral imagery. Furthermore, the eigenvector selection approach significantly reduces computation burden via the dimensionality reduction processes. The performance of the proposed dimensionality reduction algorithm has been tested using real-world hyperspectral dataset.
NASA Technical Reports Server (NTRS)
Eigen, D. J.; Fromm, F. R.; Northouse, R. A.
1974-01-01
A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.
A Fast, Open EEG Classification Framework Based on Feature Compression and Channel Ranking
Han, Jiuqi; Zhao, Yuwei; Sun, Hongji; Chen, Jiayun; Ke, Ang; Xu, Gesen; Zhang, Hualiang; Zhou, Jin; Wang, Changyong
2018-01-01
Superior feature extraction, channel selection and classification methods are essential for designing electroencephalography (EEG) classification frameworks. However, the performance of most frameworks is limited by their improper channel selection methods and too specifical design, leading to high computational complexity, non-convergent procedure and narrow expansibility. In this paper, to remedy these drawbacks, we propose a fast, open EEG classification framework centralized by EEG feature compression, low-dimensional representation, and convergent iterative channel ranking. First, to reduce the complexity, we use data clustering to compress the EEG features channel-wise, packing the high-dimensional EEG signal, and endowing them with numerical signatures. Second, to provide easy access to alternative superior methods, we structurally represent each EEG trial in a feature vector with its corresponding numerical signature. Thus, the recorded signals of many trials shrink to a low-dimensional structural matrix compatible with most pattern recognition methods. Third, a series of effective iterative feature selection approaches with theoretical convergence is introduced to rank the EEG channels and remove redundant ones, further accelerating the EEG classification process and ensuring its stability. Finally, a classical linear discriminant analysis (LDA) model is employed to classify a single EEG trial with selected channels. Experimental results on two real world brain-computer interface (BCI) competition datasets demonstrate the promising performance of the proposed framework over state-of-the-art methods. PMID:29713262
Classification Influence of Features on Given Emotions and Its Application in Feature Selection
NASA Astrophysics Data System (ADS)
Xing, Yin; Chen, Chuang; Liu, Li-Long
2018-04-01
In order to solve the problem that there is a large amount of redundant data in high-dimensional speech emotion features, we analyze deeply the extracted speech emotion features and select better features. Firstly, a given emotion is classified by each feature. Secondly, the recognition rate is ranked in descending order. Then, the optimal threshold of features is determined by rate criterion. Finally, the better features are obtained. When applied in Berlin and Chinese emotional data set, the experimental results show that the feature selection method outperforms the other traditional methods.
Feature weight estimation for gene selection: a local hyperlinear learning approach
2014-01-01
Background Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments, nevertheless,it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes, buried in high-dimensional irrelevant noises. RELIEF is a popular and widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers. Results We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR is based on RELIEF for feature weighting using classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than global measurement, which is typically used in existing methods. The weights obtained by our method are very robust in terms of degradation of noisy features, even those with vast dimensions. To demonstrate the performance of our method, extensive experiments involving classification tests have been carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB). Conclusion Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability using to various classification algorithms. PMID:24625071
Derivation of an artificial gene to improve classification accuracy upon gene selection.
Seo, Minseok; Oh, Sejong
2012-02-01
Classification analysis has been developed continuously since 1936. This research field has advanced as a result of development of classifiers such as KNN, ANN, and SVM, as well as through data preprocessing areas. Feature (gene) selection is required for very high dimensional data such as microarray before classification work. The goal of feature selection is to choose a subset of informative features that reduces processing time and provides higher classification accuracy. In this study, we devised a method of artificial gene making (AGM) for microarray data to improve classification accuracy. Our artificial gene was derived from a whole microarray dataset, and combined with a result of gene selection for classification analysis. We experimentally confirmed a clear improvement of classification accuracy after inserting artificial gene. Our artificial gene worked well for popular feature (gene) selection algorithms and classifiers. The proposed approach can be applied to any type of high dimensional dataset. Copyright © 2011 Elsevier Ltd. All rights reserved.
Intrinsic two-dimensional features as textons
NASA Technical Reports Server (NTRS)
Barth, E.; Zetzsche, C.; Rentschler, I.
1998-01-01
We suggest that intrinsic two-dimensional (i2D) features, computationally defined as the outputs of nonlinear operators that model the activity of end-stopped neurons, play a role in preattentive texture discrimination. We first show that for discriminable textures with identical power spectra the predictions of traditional models depend on the type of nonlinearity and fail for energy measures. We then argue that the concept of intrinsic dimensionality, and the existence of end-stopped neurons, can help us to understand the role of the nonlinearities. Furthermore, we show examples in which models without strong i2D selectivity fail to predict the correct ranking order of perceptual segregation. Our arguments regarding the importance of i2D features resemble the arguments of Julesz and co-workers regarding textons such as terminators and crossings. However, we provide a computational framework that identifies textons with the outputs of nonlinear operators that are selective to i2D features.
Decorrelation of the true and estimated classifier errors in high-dimensional settings.
Hanczar, Blaise; Hua, Jianping; Dougherty, Edward R
2007-01-01
The aim of many microarray experiments is to build discriminatory diagnosis and prognosis models. Given the huge number of features and the small number of examples, model validity which refers to the precision of error estimation is a critical issue. Previous studies have addressed this issue via the deviation distribution (estimated error minus true error), in particular, the deterioration of cross-validation precision in high-dimensional settings where feature selection is used to mitigate the peaking phenomenon (overfitting). Because classifier design is based upon random samples, both the true and estimated errors are sample-dependent random variables, and one would expect a loss of precision if the estimated and true errors are not well correlated, so that natural questions arise as to the degree of correlation and the manner in which lack of correlation impacts error estimation. We demonstrate the effect of correlation on error precision via a decomposition of the variance of the deviation distribution, observe that the correlation is often severely decreased in high-dimensional settings, and show that the effect of high dimensionality on error estimation tends to result more from its decorrelating effects than from its impact on the variance of the estimated error. We consider the correlation between the true and estimated errors under different experimental conditions using both synthetic and real data, several feature-selection methods, different classification rules, and three error estimators commonly used (leave-one-out cross-validation, k-fold cross-validation, and .632 bootstrap). Moreover, three scenarios are considered: (1) feature selection, (2) known-feature set, and (3) all features. Only the first is of practical interest; however, the other two are needed for comparison purposes. We will observe that the true and estimated errors tend to be much more correlated in the case of a known feature set than with either feature selection or using all features, with the better correlation between the latter two showing no general trend, but differing for different models.
A New Direction of Cancer Classification: Positive Effect of Low-Ranking MicroRNAs.
Li, Feifei; Piao, Minghao; Piao, Yongjun; Li, Meijing; Ryu, Keun Ho
2014-10-01
Many studies based on microRNA (miRNA) expression profiles showed a new aspect of cancer classification. Because one characteristic of miRNA expression data is the high dimensionality, feature selection methods have been used to facilitate dimensionality reduction. The feature selection methods have one shortcoming thus far: they just consider the problem of where feature to class is 1:1 or n:1. However, because one miRNA may influence more than one type of cancer, human miRNA is considered to be ranked low in traditional feature selection methods and are removed most of the time. In view of the limitation of the miRNA number, low-ranking miRNAs are also important to cancer classification. We considered both high- and low-ranking features to cover all problems (1:1, n:1, 1:n, and m:n) in cancer classification. First, we used the correlation-based feature selection method to select the high-ranking miRNAs, and chose the support vector machine, Bayes network, decision tree, k-nearest-neighbor, and logistic classifier to construct cancer classification. Then, we chose Chi-square test, information gain, gain ratio, and Pearson's correlation feature selection methods to build the m:n feature subset, and used the selected miRNAs to determine cancer classification. The low-ranking miRNA expression profiles achieved higher classification accuracy compared with just using high-ranking miRNAs in traditional feature selection methods. Our results demonstrate that the m:n feature subset made a positive impression of low-ranking miRNAs in cancer classification.
Saini, Harsh; Lal, Sunil Pranit; Naidu, Vimal Vikash; Pickering, Vincel Wince; Singh, Gurmeet; Tsunoda, Tatsuhiko; Sharma, Alok
2016-12-05
High dimensional feature space generally degrades classification in several applications. In this paper, we propose a strategy called gene masking, in which non-contributing dimensions are heuristically removed from the data to improve classification accuracy. Gene masking is implemented via a binary encoded genetic algorithm that can be integrated seamlessly with classifiers during the training phase of classification to perform feature selection. It can also be used to discriminate between features that contribute most to the classification, thereby, allowing researchers to isolate features that may have special significance. This technique was applied on publicly available datasets whereby it substantially reduced the number of features used for classification while maintaining high accuracies. The proposed technique can be extremely useful in feature selection as it heuristically removes non-contributing features to improve the performance of classifiers.
Jamieson, Andrew R; Giger, Maryellen L; Drukker, Karen; Li, Hui; Yuan, Yading; Bhooshan, Neha
2010-01-01
In this preliminary study, recently developed unsupervised nonlinear dimension reduction (DR) and data representation techniques were applied to computer-extracted breast lesion feature spaces across three separate imaging modalities: Ultrasound (U.S.) with 1126 cases, dynamic contrast enhanced magnetic resonance imaging with 356 cases, and full-field digital mammography with 245 cases. Two methods for nonlinear DR were explored: Laplacian eigenmaps [M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Comput. 15, 1373-1396 (2003)] and t-distributed stochastic neighbor embedding (t-SNE) [L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res. 9, 2579-2605 (2008)]. These methods attempt to map originally high dimensional feature spaces to more human interpretable lower dimensional spaces while preserving both local and global information. The properties of these methods as applied to breast computer-aided diagnosis (CADx) were evaluated in the context of malignancy classification performance as well as in the visual inspection of the sparseness within the two-dimensional and three-dimensional mappings. Classification performance was estimated by using the reduced dimension mapped feature output as input into both linear and nonlinear classifiers: Markov chain Monte Carlo based Bayesian artificial neural network (MCMC-BANN) and linear discriminant analysis. The new techniques were compared to previously developed breast CADx methodologies, including automatic relevance determination and linear stepwise (LSW) feature selection, as well as a linear DR method based on principal component analysis. Using ROC analysis and 0.632+bootstrap validation, 95% empirical confidence intervals were computed for the each classifier's AUC performance. In the large U.S. data set, sample high performance results include, AUC0.632+ = 0.88 with 95% empirical bootstrap interval [0.787;0.895] for 13 ARD selected features and AUC0.632+ = 0.87 with interval [0.817;0.906] for four LSW selected features compared to 4D t-SNE mapping (from the original 81D feature space) giving AUC0.632+ = 0.90 with interval [0.847;0.919], all using the MCMC-BANN. Preliminary results appear to indicate capability for the new methods to match or exceed classification performance of current advanced breast lesion CADx algorithms. While not appropriate as a complete replacement of feature selection in CADx problems, DR techniques offer a complementary approach, which can aid elucidation of additional properties associated with the data. Specifically, the new techniques were shown to possess the added benefit of delivering sparse lower dimensional representations for visual interpretation, revealing intricate data structure of the feature space.
A Selective Overview of Variable Selection in High Dimensional Feature Space
Fan, Jianqing
2010-01-01
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. What limits of the dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties are rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods. PMID:21572976
A real negative selection algorithm with evolutionary preference for anomaly detection
NASA Astrophysics Data System (ADS)
Yang, Tao; Chen, Wen; Li, Tao
2017-04-01
Traditional real negative selection algorithms (RNSAs) adopt the estimated coverage (c0) as the algorithm termination threshold, and generate detectors randomly. With increasing dimensions, the data samples could reside in the low-dimensional subspace, so that the traditional detectors cannot effectively distinguish these samples. Furthermore, in high-dimensional feature space, c0 cannot exactly reflect the detectors set coverage rate for the nonself space, and it could lead the algorithm to be terminated unexpectedly when the number of detectors is insufficient. These shortcomings make the traditional RNSAs to perform poorly in high-dimensional feature space. Based upon "evolutionary preference" theory in immunology, this paper presents a real negative selection algorithm with evolutionary preference (RNSAP). RNSAP utilizes the "unknown nonself space", "low-dimensional target subspace" and "known nonself feature" as the evolutionary preference to guide the generation of detectors, thus ensuring the detectors can cover the nonself space more effectively. Besides, RNSAP uses redundancy to replace c0 as the termination threshold, in this way RNSAP can generate adequate detectors under a proper convergence rate. The theoretical analysis and experimental result demonstrate that, compared to the classical RNSA (V-detector), RNSAP can achieve a higher detection rate, but with less detectors and computing cost.
Simultaneous grouping pursuit and feature selection over an undirected graph*
Zhu, Yunzhang; Shen, Xiaotong; Pan, Wei
2013-01-01
Summary In high-dimensional regression, grouping pursuit and feature selection have their own merits while complementing each other in battling the curse of dimensionality. To seek a parsimonious model, we perform simultaneous grouping pursuit and feature selection over an arbitrary undirected graph with each node corresponding to one predictor. When the corresponding nodes are reachable from each other over the graph, regression coefficients can be grouped, whose absolute values are the same or close. This is motivated from gene network analysis, where genes tend to work in groups according to their biological functionalities. Through a nonconvex penalty, we develop a computational strategy and analyze the proposed method. Theoretical analysis indicates that the proposed method reconstructs the oracle estimator, that is, the unbiased least squares estimator given the true grouping, leading to consistent reconstruction of grouping structures and informative features, as well as to optimal parameter estimation. Simulation studies suggest that the method combines the benefit of grouping pursuit with that of feature selection, and compares favorably against its competitors in selection accuracy and predictive performance. An application to eQTL data is used to illustrate the methodology, where a network is incorporated into analysis through an undirected graph. PMID:24098061
LFSPMC: Linear feature selection program using the probability of misclassification
NASA Technical Reports Server (NTRS)
Guseman, L. F., Jr.; Marion, B. P.
1975-01-01
The computational procedure and associated computer program for a linear feature selection technique are presented. The technique assumes that: a finite number, m, of classes exists; each class is described by an n-dimensional multivariate normal density function of its measurement vectors; the mean vector and covariance matrix for each density function are known (or can be estimated); and the a priori probability for each class is known. The technique produces a single linear combination of the original measurements which minimizes the one-dimensional probability of misclassification defined by the transformed densities.
Kumar Myakalwar, Ashwin; Spegazzini, Nicolas; Zhang, Chi; Kumar Anubham, Siva; Dasari, Ramachandra R; Barman, Ishan; Kumar Gundawar, Manoj
2015-08-19
Despite its intrinsic advantages, translation of laser induced breakdown spectroscopy for material identification has been often impeded by the lack of robustness of developed classification models, often due to the presence of spurious correlations. While a number of classifiers exhibiting high discriminatory power have been reported, efforts in establishing the subset of relevant spectral features that enable a fundamental interpretation of the segmentation capability and avoid the 'curse of dimensionality' have been lacking. Using LIBS data acquired from a set of secondary explosives, we investigate judicious feature selection approaches and architect two different chemometrics classifiers -based on feature selection through prerequisite knowledge of the sample composition and genetic algorithm, respectively. While the full spectral input results in classification rate of ca.92%, selection of only carbon to hydrogen spectral window results in near identical performance. Importantly, the genetic algorithm-derived classifier shows a statistically significant improvement to ca. 94% accuracy for prospective classification, even though the number of features used is an order of magnitude smaller. Our findings demonstrate the impact of rigorous feature selection in LIBS and also hint at the feasibility of using a discrete filter based detector thereby enabling a cheaper and compact system more amenable to field operations.
Variable importance in nonlinear kernels (VINK): classification of digitized histopathology.
Ginsburg, Shoshana; Ali, Sahirzeeshan; Lee, George; Basavanhally, Ajay; Madabhushi, Anant
2013-01-01
Quantitative histomorphometry is the process of modeling appearance of disease morphology on digitized histopathology images via image-based features (e.g., texture, graphs). Due to the curse of dimensionality, building classifiers with large numbers of features requires feature selection (which may require a large training set) or dimensionality reduction (DR). DR methods map the original high-dimensional features in terms of eigenvectors and eigenvalues, which limits the potential for feature transparency or interpretability. Although methods exist for variable selection and ranking on embeddings obtained via linear DR schemes (e.g., principal components analysis (PCA)), similar methods do not yet exist for nonlinear DR (NLDR) methods. In this work we present a simple yet elegant method for approximating the mapping between the data in the original feature space and the transformed data in the kernel PCA (KPCA) embedding space; this mapping provides the basis for quantification of variable importance in nonlinear kernels (VINK). We show how VINK can be implemented in conjunction with the popular Isomap and Laplacian eigenmap algorithms. VINK is evaluated in the contexts of three different problems in digital pathology: (1) predicting five year PSA failure following radical prostatectomy, (2) predicting Oncotype DX recurrence risk scores for ER+ breast cancers, and (3) distinguishing good and poor outcome p16+ oropharyngeal tumors. We demonstrate that subsets of features identified by VINK provide similar or better classification or regression performance compared to the original high dimensional feature sets.
Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification.
Fan, Jianqing; Feng, Yang; Jiang, Jiancheng; Tong, Xin
We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.
Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification
Feng, Yang; Jiang, Jiancheng; Tong, Xin
2015-01-01
We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing. PMID:27185970
Online feature selection with streaming features.
Wu, Xindong; Yu, Kui; Ding, Wei; Wang, Hao; Zhu, Xingquan
2013-05-01
We propose a new online feature selection framework for applications with streaming features where the knowledge of the full feature space is unknown in advance. We define streaming features as features that flow in one by one over time whereas the number of training examples remains fixed. This is in contrast with traditional online learning methods that only deal with sequentially added observations, with little attention being paid to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In the paper, we present a novel Online Streaming Feature Selection method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also with a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.
Spectral feature design in high dimensional multispectral data
NASA Technical Reports Server (NTRS)
Chen, Chih-Chien Thomas; Landgrebe, David A.
1988-01-01
The High resolution Imaging Spectrometer (HIRIS) is designed to acquire images simultaneously in 192 spectral bands in the 0.4 to 2.5 micrometers wavelength region. It will make possible the collection of essentially continuous reflectance spectra at a spectral resolution sufficient to extract significantly enhanced amounts of information from return signals as compared to existing systems. The advantages of such high dimensional data come at a cost of increased system and data complexity. For example, since the finer the spectral resolution, the higher the data rate, it becomes impractical to design the sensor to be operated continuously. It is essential to find new ways to preprocess the data which reduce the data rate while at the same time maintaining the information content of the high dimensional signal produced. Four spectral feature design techniques are developed from the Weighted Karhunen-Loeve Transforms: (1) non-overlapping band feature selection algorithm; (2) overlapping band feature selection algorithm; (3) Walsh function approach; and (4) infinite clipped optimal function approach. The infinite clipped optimal function approach is chosen since the features are easiest to find and their classification performance is the best. After the preprocessed data has been received at the ground station, canonical analysis is further used to find the best set of features under the criterion that maximal class separability is achieved. Both 100 dimensional vegetation data and 200 dimensional soil data were used to test the spectral feature design system. It was shown that the infinite clipped versions of the first 16 optimal features had excellent classification performance. The overall probability of correct classification is over 90 percent while providing for a reduced downlink data rate by a factor of 10.
Simultaneous Spectral-Spatial Feature Selection and Extraction for Hyperspectral Images.
Zhang, Lefei; Zhang, Qian; Du, Bo; Huang, Xin; Tang, Yuan Yan; Tao, Dacheng
2018-01-01
In hyperspectral remote sensing data mining, it is important to take into account of both spectral and spatial information, such as the spectral signature, texture feature, and morphological property, to improve the performances, e.g., the image classification accuracy. In a feature representation point of view, a nature approach to handle this situation is to concatenate the spectral and spatial features into a single but high dimensional vector and then apply a certain dimension reduction technique directly on that concatenated vector before feed it into the subsequent classifier. However, multiple features from various domains definitely have different physical meanings and statistical properties, and thus such concatenation has not efficiently explore the complementary properties among different features, which should benefit for boost the feature discriminability. Furthermore, it is also difficult to interpret the transformed results of the concatenated vector. Consequently, finding a physically meaningful consensus low dimensional feature representation of original multiple features is still a challenging task. In order to address these issues, we propose a novel feature learning framework, i.e., the simultaneous spectral-spatial feature selection and extraction algorithm, for hyperspectral images spectral-spatial feature representation and classification. Specifically, the proposed method learns a latent low dimensional subspace by projecting the spectral-spatial feature into a common feature space, where the complementary information has been effectively exploited, and simultaneously, only the most significant original features have been transformed. Encouraging experimental results on three public available hyperspectral remote sensing datasets confirm that our proposed method is effective and efficient.
Jamieson, Andrew R.; Giger, Maryellen L.; Drukker, Karen; Li, Hui; Yuan, Yading; Bhooshan, Neha
2010-01-01
Purpose: In this preliminary study, recently developed unsupervised nonlinear dimension reduction (DR) and data representation techniques were applied to computer-extracted breast lesion feature spaces across three separate imaging modalities: Ultrasound (U.S.) with 1126 cases, dynamic contrast enhanced magnetic resonance imaging with 356 cases, and full-field digital mammography with 245 cases. Two methods for nonlinear DR were explored: Laplacian eigenmaps [M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural Comput. 15, 1373–1396 (2003)] and t-distributed stochastic neighbor embedding (t-SNE) [L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res. 9, 2579–2605 (2008)]. Methods: These methods attempt to map originally high dimensional feature spaces to more human interpretable lower dimensional spaces while preserving both local and global information. The properties of these methods as applied to breast computer-aided diagnosis (CADx) were evaluated in the context of malignancy classification performance as well as in the visual inspection of the sparseness within the two-dimensional and three-dimensional mappings. Classification performance was estimated by using the reduced dimension mapped feature output as input into both linear and nonlinear classifiers: Markov chain Monte Carlo based Bayesian artificial neural network (MCMC-BANN) and linear discriminant analysis. The new techniques were compared to previously developed breast CADx methodologies, including automatic relevance determination and linear stepwise (LSW) feature selection, as well as a linear DR method based on principal component analysis. Using ROC analysis and 0.632+bootstrap validation, 95% empirical confidence intervals were computed for the each classifier’s AUC performance. Results: In the large U.S. data set, sample high performance results include, AUC0.632+=0.88 with 95% empirical bootstrap interval [0.787;0.895] for 13 ARD selected features and AUC0.632+=0.87 with interval [0.817;0.906] for four LSW selected features compared to 4D t-SNE mapping (from the original 81D feature space) giving AUC0.632+=0.90 with interval [0.847;0.919], all using the MCMC-BANN. Conclusions: Preliminary results appear to indicate capability for the new methods to match or exceed classification performance of current advanced breast lesion CADx algorithms. While not appropriate as a complete replacement of feature selection in CADx problems, DR techniques offer a complementary approach, which can aid elucidation of additional properties associated with the data. Specifically, the new techniques were shown to possess the added benefit of delivering sparse lower dimensional representations for visual interpretation, revealing intricate data structure of the feature space. PMID:20175497
NASA Astrophysics Data System (ADS)
Taşkin Kaya, Gülşen
2013-10-01
Recently, earthquake damage assessment using satellite images has been a very popular ongoing research direction. Especially with the availability of very high resolution (VHR) satellite images, a quite detailed damage map based on building scale has been produced, and various studies have also been conducted in the literature. As the spatial resolution of satellite images increases, distinguishability of damage patterns becomes more cruel especially in case of using only the spectral information during classification. In order to overcome this difficulty, textural information needs to be involved to the classification to improve the visual quality and reliability of damage map. There are many kinds of textural information which can be derived from VHR satellite images depending on the algorithm used. However, extraction of textural information and evaluation of them have been generally a time consuming process especially for the large areas affected from the earthquake due to the size of VHR image. Therefore, in order to provide a quick damage map, the most useful features describing damage patterns needs to be known in advance as well as the redundant features. In this study, a very high resolution satellite image after Iran, Bam earthquake was used to identify the earthquake damage. Not only the spectral information, textural information was also used during the classification. For textural information, second order Haralick features were extracted from the panchromatic image for the area of interest using gray level co-occurrence matrix with different size of windows and directions. In addition to using spatial features in classification, the most useful features representing the damage characteristic were selected with a novel feature selection method based on high dimensional model representation (HDMR) giving sensitivity of each feature during classification. The method called HDMR was recently proposed as an efficient tool to capture the input-output relationships in high-dimensional systems for many problems in science and engineering. The HDMR method is developed to improve the efficiency of the deducing high dimensional behaviors. The method is formed by a particular organization of low dimensional component functions, in which each function is the contribution of one or more input variables to the output variables.
NASA Astrophysics Data System (ADS)
Lestari, A. W.; Rustam, Z.
2017-07-01
In the last decade, breast cancer has become the focus of world attention as this disease is one of the primary leading cause of death for women. Therefore, it is necessary to have the correct precautions and treatment. In previous studies, Fuzzy Kennel K-Medoid algorithm has been used for multi-class data. This paper proposes an algorithm to classify the high dimensional data of breast cancer using Fuzzy Possibilistic C-means (FPCM) and a new method based on clustering analysis using Normed Kernel Function-Based Fuzzy Possibilistic C-Means (NKFPCM). The objective of this paper is to obtain the best accuracy in classification of breast cancer data. In order to improve the accuracy of the two methods, the features candidates are evaluated using feature selection, where Laplacian Score is used. The results show the comparison accuracy and running time of FPCM and NKFPCM with and without feature selection.
Fisher, Charles K; Mehta, Pankaj
2015-06-01
Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. Here, we introduce a new approach--the Bayesian Ising Approximation (BIA)-to rapidly calculate posterior probabilities for feature relevance in L2 penalized linear regression. In the regime where the regression problem is strongly regularized by the prior, we show that computing the marginal posterior probabilities for features is equivalent to computing the magnetizations of an Ising model with weak couplings. Using a mean field approximation, we show it is possible to rapidly compute the feature selection path described by the posterior probabilities as a function of the L2 penalty. We present simulations and analytical results illustrating the accuracy of the BIA on some simple regression problems. Finally, we demonstrate the applicability of the BIA to high-dimensional regression by analyzing a gene expression dataset with nearly 30 000 features. These results also highlight the impact of correlations between features on Bayesian feature selection. An implementation of the BIA in C++, along with data for reproducing our gene expression analyses, are freely available at http://physics.bu.edu/∼pankajm/BIACode. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Fuzzy feature selection based on interval type-2 fuzzy sets
NASA Astrophysics Data System (ADS)
Cherif, Sahar; Baklouti, Nesrine; Alimi, Adel; Snasel, Vaclav
2017-03-01
When dealing with real world data; noise, complexity, dimensionality, uncertainty and irrelevance can lead to low performance and insignificant judgment. Fuzzy logic is a powerful tool for controlling conflicting attributes which can have similar effects and close meanings. In this paper, an interval type-2 fuzzy feature selection is presented as a new approach for removing irrelevant features and reducing complexity. We demonstrate how can Feature Selection be joined with Interval Type-2 Fuzzy Logic for keeping significant features and hence reducing time complexity. The proposed method is compared with some other approaches. The results show that the number of attributes is proportionally small.
Improved method for predicting protein fold patterns with ensemble classifiers.
Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C
2012-01-27
Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.
Wang, Hailing; Ip, Chengteng; Fu, Shimin; Sun, Pei
2017-05-01
Face recognition theories suggest that our brains process invariant (e.g., gender) and changeable (e.g., emotion) facial dimensions separately. To investigate whether these two dimensions are processed in different time courses, we analyzed the selection negativity (SN, an event-related potential component reflecting attentional modulation) elicited by face gender and emotion during a feature selective attention task. Participants were instructed to attend to a combination of face emotion and gender attributes in Experiment 1 (bi-dimensional task) and to either face emotion or gender in Experiment 2 (uni-dimensional task). The results revealed that face emotion did not elicit a substantial SN, whereas face gender consistently generated a substantial SN in both experiments. These results suggest that face gender is more sensitive to feature-selective attention and that face emotion is encoded relatively automatically on SN, implying the existence of different underlying processing mechanisms for invariant and changeable facial dimensions. Copyright © 2017 Elsevier Ltd. All rights reserved.
A study of metaheuristic algorithms for high dimensional feature selection on microarray data
NASA Astrophysics Data System (ADS)
Dankolo, Muhammad Nasiru; Radzi, Nor Haizan Mohamed; Sallehuddin, Roselina; Mustaffa, Noorfa Haszlinna
2017-11-01
Microarray systems enable experts to examine gene profile at molecular level using machine learning algorithms. It increases the potentials of classification and diagnosis of many diseases at gene expression level. Though, numerous difficulties may affect the efficiency of machine learning algorithms which includes vast number of genes features comprised in the original data. Many of these features may be unrelated to the intended analysis. Therefore, feature selection is necessary to be performed in the data pre-processing. Many feature selection algorithms are developed and applied on microarray which including the metaheuristic optimization algorithms. This paper discusses the application of the metaheuristics algorithms for feature selection in microarray dataset. This study reveals that, the algorithms have yield an interesting result with limited resources thereby saving computational expenses of machine learning algorithms.
A Generic multi-dimensional feature extraction method using multiobjective genetic programming.
Zhang, Yang; Rockett, Peter I
2009-01-01
In this paper, we present a generic feature extraction method for pattern classification using multiobjective genetic programming. This not only evolves the (near-)optimal set of mappings from a pattern space to a multi-dimensional decision space, but also simultaneously optimizes the dimensionality of that decision space. The presented framework evolves vector-to-vector feature extractors that maximize class separability. We demonstrate the efficacy of our approach by making statistically-founded comparisons with a wide variety of established classifier paradigms over a range of datasets and find that for most of the pairwise comparisons, our evolutionary method delivers statistically smaller misclassification errors. At very worst, our method displays no statistical difference in a few pairwise comparisons with established classifier/dataset combinations; crucially, none of the misclassification results produced by our method is worse than any comparator classifier. Although principally focused on feature extraction, feature selection is also performed as an implicit side effect; we show that both feature extraction and selection are important to the success of our technique. The presented method has the practical consequence of obviating the need to exhaustively evaluate a large family of conventional classifiers when faced with a new pattern recognition problem in order to attain a good classification accuracy.
Using learning automata to determine proper subset size in high-dimensional spaces
NASA Astrophysics Data System (ADS)
Seyyedi, Seyyed Hossein; Minaei-Bidgoli, Behrouz
2017-03-01
In this paper, we offer a new method called FSLA (Finding the best candidate Subset using Learning Automata), which combines the filter and wrapper approaches for feature selection in high-dimensional spaces. Considering the difficulties of dimension reduction in high-dimensional spaces, FSLA's multi-objective functionality is to determine, in an efficient manner, a feature subset that leads to an appropriate tradeoff between the learning algorithm's accuracy and efficiency. First, using an existing weighting function, the feature list is sorted and selected subsets of the list of different sizes are considered. Then, a learning automaton verifies the performance of each subset when it is used as the input space of the learning algorithm and estimates its fitness upon the algorithm's accuracy and the subset size, which determines the algorithm's efficiency. Finally, FSLA introduces the fittest subset as the best choice. We tested FSLA in the framework of text classification. The results confirm its promising performance of attaining the identified goal.
Kopp, Bruno; Tabeling, Sandra; Moschner, Carsten; Wessel, Karl
2007-08-17
Decision-making is a fundamental capacity which is crucial to many higher-order psychological functions. We recorded event-related potentials (ERPs) during a visual target-identification task that required go-nogo choices. Targets were identified on the basis of cross-dimensional conjunctions of particular colors and forms. Color discriminability was manipulated in three conditions to determine the effects of color distinctiveness on component processes of decision-making. Target identification was accompanied by the emergence of prefrontal P2a and P3b. Selection negativity (SN) revealed that target-compatible features captured attention more than target-incompatible features, suggesting that intra-dimensional attentional capture was goal-contingent. No changes of cross-dimensional selection priorities were measurable when color discriminability was altered. Peak latencies of the color-related SN provided a chronometric measure of the duration of attention-related neural processing. ERPs recorded over the frontocentral scalp (N2c, P3a) revealed that color-overlap distractors, more than form-overlap distractors, required additional late selection. The need for additional response selection induced by color-overlap distractors was severely reduced when color discriminability decreased. We propose a simple model of cross-dimensional perceptual decision-making. The temporal synchrony of separate color-related and form-related choices determines whether or not distractor processing includes post-perceptual stages. ERP measures contribute to a comprehensive explanation of the temporal dynamics of component processes of perceptual decision-making.
Low-Dimensional Feature Representation for Instrument Identification
NASA Astrophysics Data System (ADS)
Ihara, Mizuki; Maeda, Shin-Ichi; Ikeda, Kazushi; Ishii, Shin
For monophonic music instrument identification, various feature extraction and selection methods have been proposed. One of the issues toward instrument identification is that the same spectrum is not always observed even in the same instrument due to the difference of the recording condition. Therefore, it is important to find non-redundant instrument-specific features that maintain information essential for high-quality instrument identification to apply them to various instrumental music analyses. For such a dimensionality reduction method, the authors propose the utilization of linear projection methods: local Fisher discriminant analysis (LFDA) and LFDA combined with principal component analysis (PCA). After experimentally clarifying that raw power spectra are actually good for instrument classification, the authors reduced the feature dimensionality by LFDA or by PCA followed by LFDA (PCA-LFDA). The reduced features achieved reasonably high identification performance that was comparable or higher than those by the power spectra and those achieved by other existing studies. These results demonstrated that our LFDA and PCA-LFDA can successfully extract low-dimensional instrument features that maintain the characteristic information of the instruments.
Ghayab, Hadi Ratham Al; Li, Yan; Abdulla, Shahab; Diykh, Mohammed; Wan, Xiangkui
2016-06-01
Electroencephalogram (EEG) signals are used broadly in the medical fields. The main applications of EEG signals are the diagnosis and treatment of diseases such as epilepsy, Alzheimer, sleep problems and so on. This paper presents a new method which extracts and selects features from multi-channel EEG signals. This research focuses on three main points. Firstly, simple random sampling (SRS) technique is used to extract features from the time domain of EEG signals. Secondly, the sequential feature selection (SFS) algorithm is applied to select the key features and to reduce the dimensionality of the data. Finally, the selected features are forwarded to a least square support vector machine (LS_SVM) classifier to classify the EEG signals. The LS_SVM classifier classified the features which are extracted and selected from the SRS and the SFS. The experimental results show that the method achieves 99.90, 99.80 and 100 % for classification accuracy, sensitivity and specificity, respectively.
Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data.
Becker, Natalia; Toedt, Grischa; Lichter, Peter; Benner, Axel
2011-05-09
Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net.We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone.Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution. Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error.Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning parameters.The penalized SVM classification algorithms as well as fixed grid and interval search for finding appropriate tuning parameters were implemented in our freely available R package 'penalizedSVM'.We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets.
Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data
2011-01-01
Background Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution. Results Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error. Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. Conclusions The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning parameters. The penalized SVM classification algorithms as well as fixed grid and interval search for finding appropriate tuning parameters were implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets. PMID:21554689
Sparse Zero-Sum Games as Stable Functional Feature Selection
Sokolovska, Nataliya; Teytaud, Olivier; Rizkalla, Salwa; Clément, Karine; Zucker, Jean-Daniel
2015-01-01
In large-scale systems biology applications, features are structured in hidden functional categories whose predictive power is identical. Feature selection, therefore, can lead not only to a problem with a reduced dimensionality, but also reveal some knowledge on functional classes of variables. In this contribution, we propose a framework based on a sparse zero-sum game which performs a stable functional feature selection. In particular, the approach is based on feature subsets ranking by a thresholding stochastic bandit. We provide a theoretical analysis of the introduced algorithm. We illustrate by experiments on both synthetic and real complex data that the proposed method is competitive from the predictive and stability viewpoints. PMID:26325268
Torres-Valencia, Cristian A; Álvarez, Mauricio A; Orozco-Gutiérrez, Alvaro A
2014-01-01
Human emotion recognition (HER) allows the assessment of an affective state of a subject. Until recently, such emotional states were described in terms of discrete emotions, like happiness or contempt. In order to cover a high range of emotions, researchers in the field have introduced different dimensional spaces for emotion description that allow the characterization of affective states in terms of several variables or dimensions that measure distinct aspects of the emotion. One of the most common of such dimensional spaces is the bidimensional Arousal/Valence space. To the best of our knowledge, all HER systems so far have modelled independently, the dimensions in these dimensional spaces. In this paper, we study the effect of modelling the output dimensions simultaneously and show experimentally the advantages in modeling them in this way. We consider a multimodal approach by including features from the Electroencephalogram and a few physiological signals. For modelling the multiple outputs, we employ a multiple output regressor based on support vector machines. We also include an stage of feature selection that is developed within an embedded approach known as Recursive Feature Elimination (RFE), proposed initially for SVM. The results show that several features can be eliminated using the multiple output support vector regressor with RFE without affecting the performance of the regressor. From the analysis of the features selected in smaller subsets via RFE, it can be observed that the signals that are more informative into the arousal and valence space discrimination are the EEG, Electrooculogram/Electromiogram (EOG/EMG) and the Galvanic Skin Response (GSR).
McTwo: a two-step feature selection algorithm based on maximal information coefficient.
Ge, Ruiquan; Zhou, Manli; Luo, Youxi; Meng, Qinghan; Mai, Guoqin; Ma, Dongli; Wang, Guoqing; Zhou, Fengfeng
2016-03-23
High-throughput bio-OMIC technologies are producing high-dimension data from bio-samples at an ever increasing rate, whereas the training sample number in a traditional experiment remains small due to various difficulties. This "large p, small n" paradigm in the area of biomedical "big data" may be at least partly solved by feature selection algorithms, which select only features significantly associated with phenotypes. Feature selection is an NP-hard problem. Due to the exponentially increased time requirement for finding the globally optimal solution, all the existing feature selection algorithms employ heuristic rules to find locally optimal solutions, and their solutions achieve different performances on different datasets. This work describes a feature selection algorithm based on a recently published correlation measurement, Maximal Information Coefficient (MIC). The proposed algorithm, McTwo, aims to select features associated with phenotypes, independently of each other, and achieving high classification performance of the nearest neighbor algorithm. Based on the comparative study of 17 datasets, McTwo performs about as well as or better than existing algorithms, with significantly reduced numbers of selected features. The features selected by McTwo also appear to have particular biomedical relevance to the phenotypes from the literature. McTwo selects a feature subset with very good classification performance, as well as a small feature number. So McTwo may represent a complementary feature selection algorithm for the high-dimensional biomedical datasets.
Integrated feature extraction and selection for neuroimage classification
NASA Astrophysics Data System (ADS)
Fan, Yong; Shen, Dinggang
2009-02-01
Feature extraction and selection are of great importance in neuroimage classification for identifying informative features and reducing feature dimensionality, which are generally implemented as two separate steps. This paper presents an integrated feature extraction and selection algorithm with two iterative steps: constrained subspace learning based feature extraction and support vector machine (SVM) based feature selection. The subspace learning based feature extraction focuses on the brain regions with higher possibility of being affected by the disease under study, while the possibility of brain regions being affected by disease is estimated by the SVM based feature selection, in conjunction with SVM classification. This algorithm can not only take into account the inter-correlation among different brain regions, but also overcome the limitation of traditional subspace learning based feature extraction methods. To achieve robust performance and optimal selection of parameters involved in feature extraction, selection, and classification, a bootstrapping strategy is used to generate multiple versions of training and testing sets for parameter optimization, according to the classification performance measured by the area under the ROC (receiver operating characteristic) curve. The integrated feature extraction and selection method is applied to a structural MR image based Alzheimer's disease (AD) study with 98 non-demented and 100 demented subjects. Cross-validation results indicate that the proposed algorithm can improve performance of the traditional subspace learning based classification.
Complex Environmental Data Modelling Using Adaptive General Regression Neural Networks
NASA Astrophysics Data System (ADS)
Kanevski, Mikhail
2015-04-01
The research deals with an adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high dimensional environmental data. GRNN [1,2,3] are efficient modelling tools both for spatial and temporal data and are based on nonparametric kernel methods closely related to classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can be also applied for features selection tasks when working with high dimensional data [1,3]. In the present research Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three dimensional monthly precipitation data or monthly wind speeds embedded into 13 dimensional space constructed by geographical coordinates and geo-features calculated from digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all possible models N [in case of wind fields N=(2^13 -1)=8191] and rank them according to the cross-validation error. In both cases training were carried out applying leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictabilities of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, were studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN with their ability to select features and efficient modelling of complex high dimensional data can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data. Theory, applications and software. EPFL Press. With a CD: data, software, guides. (2009). 2. Kanevski M. Spatial Predictions of Soil Contamination Using General Regression Neural Networks. Systems Research and Information Systems, Volume 8, number 4, 1999. 3. Robert S., Foresti L., Kanevski M. Spatial prediction of monthly wind speeds in complex terrain with adaptive general regression neural networks. International Journal of Climatology, 33 pp. 1793-1804, 2013.
NASA Astrophysics Data System (ADS)
Shahiri, Amirah Mohamed; Husain, Wahidah; Rashid, Nur'Aini Abd
2017-10-01
Huge amounts of data in educational datasets may cause the problem in producing quality data. Recently, data mining approach are increasingly used by educational data mining researchers for analyzing the data patterns. However, many research studies have concentrated on selecting suitable learning algorithms instead of performing feature selection process. As a result, these data has problem with computational complexity and spend longer computational time for classification. The main objective of this research is to provide an overview of feature selection techniques that have been used to analyze the most significant features. Then, this research will propose a framework to improve the quality of students' dataset. The proposed framework uses filter and wrapper based technique to support prediction process in future study.
NASA Astrophysics Data System (ADS)
Mahrooghy, Majid; Ashraf, Ahmed B.; Daye, Dania; Mies, Carolyn; Rosen, Mark; Feldman, Michael; Kontos, Despina
2014-03-01
We evaluate the prognostic value of sparse representation-based features by applying the K-SVD algorithm on multiparametric kinetic, textural, and morphologic features in breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). K-SVD is an iterative dimensionality reduction method that optimally reduces the initial feature space by updating the dictionary columns jointly with the sparse representation coefficients. Therefore, by using K-SVD, we not only provide sparse representation of the features and condense the information in a few coefficients but also we reduce the dimensionality. The extracted K-SVD features are evaluated by a machine learning algorithm including a logistic regression classifier for the task of classifying high versus low breast cancer recurrence risk as determined by a validated gene expression assay. The features are evaluated using ROC curve analysis and leave one-out cross validation for different sparse representation and dimensionality reduction numbers. Optimal sparse representation is obtained when the number of dictionary elements is 4 (K=4) and maximum non-zero coefficients is 2 (L=2). We compare K-SVD with ANOVA based feature selection for the same prognostic features. The ROC results show that the AUC of the K-SVD based (K=4, L=2), the ANOVA based, and the original features (i.e., no dimensionality reduction) are 0.78, 0.71. and 0.68, respectively. From the results, it can be inferred that by using sparse representation of the originally extracted multi-parametric, high-dimensional data, we can condense the information on a few coefficients with the highest predictive value. In addition, the dimensionality reduction introduced by K-SVD can prevent models from over-fitting.
Target oriented dimensionality reduction of hyperspectral data by Kernel Fukunaga-Koontz Transform
NASA Astrophysics Data System (ADS)
Binol, Hamidullah; Ochilov, Shuhrat; Alam, Mohammad S.; Bal, Abdullah
2017-02-01
Principal component analysis (PCA) is a popular technique in remote sensing for dimensionality reduction. While PCA is suitable for data compression, it is not necessarily an optimal technique for feature extraction, particularly when the features are exploited in supervised learning applications (Cheriyadat and Bruce, 2003) [1]. Preserving features belonging to the target is very crucial to the performance of target detection/recognition techniques. Fukunaga-Koontz Transform (FKT) based supervised band reduction technique can be used to provide this requirement. FKT achieves feature selection by transforming into a new space in where feature classes have complimentary eigenvectors. Analysis of these eigenvectors under two classes, target and background clutter, can be utilized for target oriented band reduction since each basis functions best represent target class while carrying least information of the background class. By selecting few eigenvectors which are the most relevant to the target class, dimension of hyperspectral data can be reduced and thus, it presents significant advantages for near real time target detection applications. The nonlinear properties of the data can be extracted by kernel approach which provides better target features. Thus, we propose constructing kernel FKT (KFKT) to present target oriented band reduction. The performance of the proposed KFKT based target oriented dimensionality reduction algorithm has been tested employing two real-world hyperspectral data and results have been reported consequently.
Joint Feature Selection and Classification for Multilabel Learning.
Huang, Jun; Li, Guorong; Huang, Qingming; Wu, Xindong
2018-03-01
Multilabel learning deals with examples having multiple class labels simultaneously. It has been applied to a variety of applications, such as text categorization and image annotation. A large number of algorithms have been proposed for multilabel learning, most of which concentrate on multilabel classification problems and only a few of them are feature selection algorithms. Current multilabel classification models are mainly built on a single data representation composed of all the features which are shared by all the class labels. Since each class label might be decided by some specific features of its own, and the problems of classification and feature selection are often addressed independently, in this paper, we propose a novel method which can perform joint feature selection and classification for multilabel learning, named JFSC. Different from many existing methods, JFSC learns both shared features and label-specific features by considering pairwise label correlations, and builds the multilabel classifier on the learned low-dimensional data representations simultaneously. A comparative study with state-of-the-art approaches manifests a competitive performance of our proposed method both in classification and feature selection for multilabel learning.
Feature Selection Methods for Zero-Shot Learning of Neural Activity.
Caceres, Carlos A; Roos, Matthew J; Rupp, Kyle M; Milsap, Griffin; Crone, Nathan E; Wolmetz, Michael E; Ratto, Christopher R
2017-01-01
Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception; A novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy.
Feature Grouping and Selection Over an Undirected Graph.
Yang, Sen; Yuan, Lei; Lai, Ying-Cheng; Shen, Xiaotong; Wonka, Peter; Ye, Jieping
2012-01-01
High-dimensional regression/classification continues to be an important and challenging problem, especially when features are highly correlated. Feature selection, combined with additional structure information on the features has been considered to be promising in promoting regression/classification performance. Graph-guided fused lasso (GFlasso) has recently been proposed to facilitate feature selection and graph structure exploitation, when features exhibit certain graph structures. However, the formulation in GFlasso relies on pairwise sample correlations to perform feature grouping, which could introduce additional estimation bias. In this paper, we propose three new feature grouping and selection methods to resolve this issue. The first method employs a convex function to penalize the pairwise l ∞ norm of connected regression/classification coefficients, achieving simultaneous feature grouping and selection. The second method improves the first one by utilizing a non-convex function to reduce the estimation bias. The third one is the extension of the second method using a truncated l 1 regularization to further reduce the estimation bias. The proposed methods combine feature grouping and feature selection to enhance estimation accuracy. We employ the alternating direction method of multipliers (ADMM) and difference of convex functions (DC) programming to solve the proposed formulations. Our experimental results on synthetic data and two real datasets demonstrate the effectiveness of the proposed methods.
Zhang, Yifan; Gao, Xunzhang; Peng, Xuan; Ye, Jiaqi; Li, Xiang
2018-05-16
The High Resolution Range Profile (HRRP) recognition has attracted great concern in the field of Radar Automatic Target Recognition (RATR). However, traditional HRRP recognition methods failed to model high dimensional sequential data efficiently and have a poor anti-noise ability. To deal with these problems, a novel stochastic neural network model named Attention-based Recurrent Temporal Restricted Boltzmann Machine (ARTRBM) is proposed in this paper. RTRBM is utilized to extract discriminative features and the attention mechanism is adopted to select major features. RTRBM is efficient to model high dimensional HRRP sequences because it can extract the information of temporal and spatial correlation between adjacent HRRPs. The attention mechanism is used in sequential data recognition tasks including machine translation and relation classification, which makes the model pay more attention to the major features of recognition. Therefore, the combination of RTRBM and the attention mechanism makes our model effective for extracting more internal related features and choose the important parts of the extracted features. Additionally, the model performs well with the noise corrupted HRRP data. Experimental results on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset show that our proposed model outperforms other traditional methods, which indicates that ARTRBM extracts, selects, and utilizes the correlation information between adjacent HRRPs effectively and is suitable for high dimensional data or noise corrupted data.
Random forest feature selection approach for image segmentation
NASA Astrophysics Data System (ADS)
Lefkovits, László; Lefkovits, Szidónia; Emerich, Simina; Vaida, Mircea Florin
2017-03-01
In the field of image segmentation, discriminative models have shown promising performance. Generally, every such model begins with the extraction of numerous features from annotated images. Most authors create their discriminative model by using many features without using any selection criteria. A more reliable model can be built by using a framework that selects the important variables, from the point of view of the classification, and eliminates the unimportant once. In this article we present a framework for feature selection and data dimensionality reduction. The methodology is built around the random forest (RF) algorithm and its variable importance evaluation. In order to deal with datasets so large as to be practically unmanageable, we propose an algorithm based on RF that reduces the dimension of the database by eliminating irrelevant features. Furthermore, this framework is applied to optimize our discriminative model for brain tumor segmentation.
Zheng, Weili; Ackley, Elena S; Martínez-Ramón, Manel; Posse, Stefan
2013-02-01
In previous works, boosting aggregation of classifier outputs from discrete brain areas has been demonstrated to reduce dimensionality and improve the robustness and accuracy of functional magnetic resonance imaging (fMRI) classification. However, dimensionality reduction and classification of mixed activation patterns of multiple classes remain challenging. In the present study, the goals were (a) to reduce dimensionality by combining feature reduction at the voxel level and backward elimination of optimally aggregated classifiers at the region level, (b) to compare region selection for spatially aggregated classification using boosting and partial least squares regression methods and (c) to resolve mixed activation patterns using probabilistic prediction of individual tasks. Brain activation maps from interleaved visual, motor, auditory and cognitive tasks were segmented into 144 functional regions. Feature selection reduced the number of feature voxels by more than 50%, leaving 95 regions. The two aggregation approaches further reduced the number of regions to 30, resulting in more than 75% reduction of classification time and misclassification rates of less than 3%. Boosting and partial least squares (PLS) were compared to select the most discriminative and the most task correlated regions, respectively. Successful task prediction in mixed activation patterns was feasible within the first block of task activation in real-time fMRI experiments. This methodology is suitable for sparsifying activation patterns in real-time fMRI and for neurofeedback from distributed networks of brain activation. Copyright © 2013 Elsevier Inc. All rights reserved.
Feature Selection Methods for Zero-Shot Learning of Neural Activity
Caceres, Carlos A.; Roos, Matthew J.; Rupp, Kyle M.; Milsap, Griffin; Crone, Nathan E.; Wolmetz, Michael E.; Ratto, Christopher R.
2017-01-01
Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception; A novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy. PMID:28690513
PCA based feature reduction to improve the accuracy of decision tree c4.5 classification
NASA Astrophysics Data System (ADS)
Nasution, M. Z. F.; Sitompul, O. S.; Ramli, M.
2018-03-01
Splitting attribute is a major process in Decision Tree C4.5 classification. However, this process does not give a significant impact on the establishment of the decision tree in terms of removing irrelevant features. It is a major problem in decision tree classification process called over-fitting resulting from noisy data and irrelevant features. In turns, over-fitting creates misclassification and data imbalance. Many algorithms have been proposed to overcome misclassification and overfitting on classifications Decision Tree C4.5. Feature reduction is one of important issues in classification model which is intended to remove irrelevant data in order to improve accuracy. The feature reduction framework is used to simplify high dimensional data to low dimensional data with non-correlated attributes. In this research, we proposed a framework for selecting relevant and non-correlated feature subsets. We consider principal component analysis (PCA) for feature reduction to perform non-correlated feature selection and Decision Tree C4.5 algorithm for the classification. From the experiments conducted using available data sets from UCI Cervical cancer data set repository with 858 instances and 36 attributes, we evaluated the performance of our framework based on accuracy, specificity and precision. Experimental results show that our proposed framework is robust to enhance classification accuracy with 90.70% accuracy rates.
Lancaster, Matthew E; Shelhamer, Ryan; Homa, Donald
2013-04-01
Two experiments investigated category inference when categories were composed of correlated or uncorrelated dimensions and the categories overlapped minimally or moderately. When the categories minimally overlapped, the dimensions were strongly correlated with the category label. Following a classification learning phase, subsequent transfer required the selection of either a category label or a feature when one, two, or three features were missing. Experiments 1 and 2 differed primarily in the number of learning blocks prior to transfer. In each experiment, the inference of the category label or category feature was influenced by both dimensional and category correlations, as well as their interaction. The number of cues available at test impacted performance more when the dimensional correlations were zero and category overlap was high. However, a minimal number of cues were sufficient to produce high levels of inference when the dimensions were highly correlated; additional cues had a positive but reduced impact, even when overlap was high. Subjects were generally more accurate in inferring the category label than a category feature regardless of dimensional correlation, category overlap, or number of cues available at test. Whether the category label functioned as a special feature or not was critically dependent upon these embedded correlations, with feature inference driven more strongly by dimensional correlations.
Multi-level gene/MiRNA feature selection using deep belief nets and active learning.
Ibrahim, Rania; Yousri, Noha A; Ismail, Mohamed A; El-Makky, Nagwa M
2014-01-01
Selecting the most discriminative genes/miRNAs has been raised as an important task in bioinformatics to enhance disease classifiers and to mitigate the dimensionality curse problem. Original feature selection methods choose genes/miRNAs based on their individual features regardless of how they perform together. Considering group features instead of individual ones provides a better view for selecting the most informative genes/miRNAs. Recently, deep learning has proven its ability in representing the data in multiple levels of abstraction, allowing for better discrimination between different classes. However, the idea of using deep learning for feature selection is not widely used in the bioinformatics field yet. In this paper, a novel multi-level feature selection approach named MLFS is proposed for selecting genes/miRNAs based on expression profiles. The approach is based on both deep and active learning. Moreover, an extension to use the technique for miRNAs is presented by considering the biological relation between miRNAs and genes. Experimental results show that the approach was able to outperform classical feature selection methods in hepatocellular carcinoma (HCC) by 9%, lung cancer by 6% and breast cancer by around 10% in F1-measure. Results also show the enhancement in F1-measure of our approach over recently related work in [1] and [2].
Guo, Xinyu; Dominick, Kelli C; Minai, Ali A; Li, Hailong; Erickson, Craig A; Lu, Long J
2017-01-01
The whole-brain functional connectivity (FC) pattern obtained from resting-state functional magnetic resonance imaging data are commonly applied to study neuropsychiatric conditions such as autism spectrum disorder (ASD) by using different machine learning models. Recent studies indicate that both hyper- and hypo- aberrant ASD-associated FCs were widely distributed throughout the entire brain rather than only in some specific brain regions. Deep neural networks (DNN) with multiple hidden layers have shown the ability to systematically extract lower-to-higher level information from high dimensional data across a series of neural hidden layers, significantly improving classification accuracy for such data. In this study, a DNN with a novel feature selection method (DNN-FS) is developed for the high dimensional whole-brain resting-state FC pattern classification of ASD patients vs. typical development (TD) controls. The feature selection method is able to help the DNN generate low dimensional high-quality representations of the whole-brain FC patterns by selecting features with high discriminating power from multiple trained sparse auto-encoders. For the comparison, a DNN without the feature selection method (DNN-woFS) is developed, and both of them are tested with different architectures (i.e., with different numbers of hidden layers/nodes). Results show that the best classification accuracy of 86.36% is generated by the DNN-FS approach with 3 hidden layers and 150 hidden nodes (3/150). Remarkably, DNN-FS outperforms DNN-woFS for all architectures studied. The most significant accuracy improvement was 9.09% with the 3/150 architecture. The method also outperforms other feature selection methods, e.g., two sample t -test and elastic net. In addition to improving the classification accuracy, a Fisher's score-based biomarker identification method based on the DNN is also developed, and used to identify 32 FCs related to ASD. These FCs come from or cross different pre-defined brain networks including the default-mode, cingulo-opercular, frontal-parietal, and cerebellum. Thirteen of them are statically significant between ASD and TD groups (two sample t -test p < 0.05) while 19 of them are not. The relationship between the statically significant FCs and the corresponding ASD behavior symptoms is discussed based on the literature and clinician's expert knowledge. Meanwhile, the potential reason of obtaining 19 FCs which are not statistically significant is also provided.
NASA Astrophysics Data System (ADS)
Adi Putra, Januar
2018-04-01
In this paper, we propose a new mammogram classification scheme to classify the breast tissues as normal or abnormal. Feature matrix is generated using Local Binary Pattern to all the detailed coefficients from 2D-DWT of the region of interest (ROI) of a mammogram. Feature selection is done by selecting the relevant features that affect the classification. Feature selection is used to reduce the dimensionality of data and features that are not relevant, in this paper the F-test and Ttest will be performed to the results of the feature extraction dataset to reduce and select the relevant feature. The best features are used in a Neural Network classifier for classification. In this research we use MIAS and DDSM database. In addition to the suggested scheme, the competent schemes are also simulated for comparative analysis. It is observed that the proposed scheme has a better say with respect to accuracy, specificity and sensitivity. Based on experiments, the performance of the proposed scheme can produce high accuracy that is 92.71%, while the lowest accuracy obtained is 77.08%.
Fault Diagnosis for Rotating Machinery: A Method based on Image Processing
Lu, Chen; Wang, Yang; Ragulskis, Minvydas; Cheng, Yujie
2016-01-01
Rotating machinery is one of the most typical types of mechanical equipment and plays a significant role in industrial applications. Condition monitoring and fault diagnosis of rotating machinery has gained wide attention for its significance in preventing catastrophic accident and guaranteeing sufficient maintenance. With the development of science and technology, fault diagnosis methods based on multi-disciplines are becoming the focus in the field of fault diagnosis of rotating machinery. This paper presents a multi-discipline method based on image-processing for fault diagnosis of rotating machinery. Different from traditional analysis method in one-dimensional space, this study employs computing method in the field of image processing to realize automatic feature extraction and fault diagnosis in a two-dimensional space. The proposed method mainly includes the following steps. First, the vibration signal is transformed into a bi-spectrum contour map utilizing bi-spectrum technology, which provides a basis for the following image-based feature extraction. Then, an emerging approach in the field of image processing for feature extraction, speeded-up robust features, is employed to automatically exact fault features from the transformed bi-spectrum contour map and finally form a high-dimensional feature vector. To reduce the dimensionality of the feature vector, thus highlighting main fault features and reducing subsequent computing resources, t-Distributed Stochastic Neighbor Embedding is adopt to reduce the dimensionality of the feature vector. At last, probabilistic neural network is introduced for fault identification. Two typical rotating machinery, axial piston hydraulic pump and self-priming centrifugal pumps, are selected to demonstrate the effectiveness of the proposed method. Results show that the proposed method based on image-processing achieves a high accuracy, thus providing a highly effective means to fault diagnosis for rotating machinery. PMID:27711246
Fault Diagnosis for Rotating Machinery: A Method based on Image Processing.
Lu, Chen; Wang, Yang; Ragulskis, Minvydas; Cheng, Yujie
2016-01-01
Rotating machinery is one of the most typical types of mechanical equipment and plays a significant role in industrial applications. Condition monitoring and fault diagnosis of rotating machinery has gained wide attention for its significance in preventing catastrophic accident and guaranteeing sufficient maintenance. With the development of science and technology, fault diagnosis methods based on multi-disciplines are becoming the focus in the field of fault diagnosis of rotating machinery. This paper presents a multi-discipline method based on image-processing for fault diagnosis of rotating machinery. Different from traditional analysis method in one-dimensional space, this study employs computing method in the field of image processing to realize automatic feature extraction and fault diagnosis in a two-dimensional space. The proposed method mainly includes the following steps. First, the vibration signal is transformed into a bi-spectrum contour map utilizing bi-spectrum technology, which provides a basis for the following image-based feature extraction. Then, an emerging approach in the field of image processing for feature extraction, speeded-up robust features, is employed to automatically exact fault features from the transformed bi-spectrum contour map and finally form a high-dimensional feature vector. To reduce the dimensionality of the feature vector, thus highlighting main fault features and reducing subsequent computing resources, t-Distributed Stochastic Neighbor Embedding is adopt to reduce the dimensionality of the feature vector. At last, probabilistic neural network is introduced for fault identification. Two typical rotating machinery, axial piston hydraulic pump and self-priming centrifugal pumps, are selected to demonstrate the effectiveness of the proposed method. Results show that the proposed method based on image-processing achieves a high accuracy, thus providing a highly effective means to fault diagnosis for rotating machinery.
Aksu, Yaman; Miller, David J; Kesidis, George; Yang, Qing X
2010-05-01
Feature selection for classification in high-dimensional spaces can improve generalization, reduce classifier complexity, and identify important, discriminating feature "markers." For support vector machine (SVM) classification, a widely used technique is recursive feature elimination (RFE). We demonstrate that RFE is not consistent with margin maximization, central to the SVM learning approach. We thus propose explicit margin-based feature elimination (MFE) for SVMs and demonstrate both improved margin and improved generalization, compared with RFE. Moreover, for the case of a nonlinear kernel, we show that RFE assumes that the squared weight vector 2-norm is strictly decreasing as features are eliminated. We demonstrate this is not true for the Gaussian kernel and, consequently, RFE may give poor results in this case. MFE for nonlinear kernels gives better margin and generalization. We also present an extension which achieves further margin gains, by optimizing only two degrees of freedom--the hyperplane's intercept and its squared 2-norm--with the weight vector orientation fixed. We finally introduce an extension that allows margin slackness. We compare against several alternatives, including RFE and a linear programming method that embeds feature selection within the classifier design. On high-dimensional gene microarray data sets, University of California at Irvine (UCI) repository data sets, and Alzheimer's disease brain image data, MFE methods give promising results.
SNP selection and classification of genome-wide SNP data using stratified sampling random forests.
Wu, Qingyao; Ye, Yunming; Liu, Yang; Ng, Michael K
2012-09-01
For high dimensional genome-wide association (GWA) case-control data of complex disease, there are usually a large portion of single-nucleotide polymorphisms (SNPs) that are irrelevant with the disease. A simple random sampling method in random forest using default mtry parameter to choose feature subspace, will select too many subspaces without informative SNPs. Exhaustive searching an optimal mtry is often required in order to include useful and relevant SNPs and get rid of vast of non-informative SNPs. However, it is too time-consuming and not favorable in GWA for high-dimensional data. The main aim of this paper is to propose a stratified sampling method for feature subspace selection to generate decision trees in a random forest for GWA high-dimensional data. Our idea is to design an equal-width discretization scheme for informativeness to divide SNPs into multiple groups. In feature subspace selection, we randomly select the same number of SNPs from each group and combine them to form a subspace to generate a decision tree. The advantage of this stratified sampling procedure can make sure each subspace contains enough useful SNPs, but can avoid a very high computational cost of exhaustive search of an optimal mtry, and maintain the randomness of a random forest. We employ two genome-wide SNP data sets (Parkinson case-control data comprised of 408 803 SNPs and Alzheimer case-control data comprised of 380 157 SNPs) to demonstrate that the proposed stratified sampling method is effective, and it can generate better random forest with higher accuracy and lower error bound than those by Breiman's random forest generation method. For Parkinson data, we also show some interesting genes identified by the method, which may be associated with neurological disorders for further biological investigations.
Hypergraph Based Feature Selection Technique for Medical Diagnosis.
Somu, Nivethitha; Raman, M R Gauthama; Kirthivasan, Kannan; Sriram, V S Shankar
2016-11-01
The impact of internet and information systems across various domains have resulted in substantial generation of multidimensional datasets. The use of data mining and knowledge discovery techniques to extract the original information contained in the multidimensional datasets play a significant role in the exploitation of complete benefit provided by them. The presence of large number of features in the high dimensional datasets incurs high computational cost in terms of computing power and time. Hence, feature selection technique has been commonly used to build robust machine learning models to select a subset of relevant features which projects the maximal information content of the original dataset. In this paper, a novel Rough Set based K - Helly feature selection technique (RSKHT) which hybridize Rough Set Theory (RST) and K - Helly property of hypergraph representation had been designed to identify the optimal feature subset or reduct for medical diagnostic applications. Experiments carried out using the medical datasets from the UCI repository proves the dominance of the RSKHT over other feature selection techniques with respect to the reduct size, classification accuracy and time complexity. The performance of the RSKHT had been validated using WEKA tool, which shows that RSKHT had been computationally attractive and flexible over massive datasets.
Group sparse multiview patch alignment framework with view consistency for image classification.
Gui, Jie; Tao, Dacheng; Sun, Zhenan; Luo, Yong; You, Xinge; Tang, Yuan Yan
2014-07-01
No single feature can satisfactorily characterize the semantic concepts of an image. Multiview learning aims to unify different kinds of features to produce a consensual and efficient representation. This paper redefines part optimization in the patch alignment framework (PAF) and develops a group sparse multiview patch alignment framework (GSM-PAF). The new part optimization considers not only the complementary properties of different views, but also view consistency. In particular, view consistency models the correlations between all possible combinations of any two kinds of view. In contrast to conventional dimensionality reduction algorithms that perform feature extraction and feature selection independently, GSM-PAF enjoys joint feature extraction and feature selection by exploiting l(2,1)-norm on the projection matrix to achieve row sparsity, which leads to the simultaneous selection of relevant features and learning transformation, and thus makes the algorithm more discriminative. Experiments on two real-world image data sets demonstrate the effectiveness of GSM-PAF for image classification.
Polarization-selective transmission in stacked two-dimensional complementary plasmonic crystal slabs
NASA Astrophysics Data System (ADS)
Iwanaga, Masanobu
2010-02-01
It has been experimentally and numerically shown that transmission at near infrared wavelengths is selectively controlled by polarizations in two-dimensional complementary plasmonic crystal slabs (2D c-PlCSs) of stacked unit cell. This feature is naturally derived by taking account of Babinet's principle. Moreover, the slight structural modification of the unit cell has been found to result in a drastic change in linear optical responses of stacked 2D c-PlCSs. These results substantiate the feasibility of 2D c-PlCSs for producing efficient polarizers with subwavelength thickness.
T-ray relevant frequencies for osteosarcoma classification
NASA Astrophysics Data System (ADS)
Withayachumnankul, W.; Ferguson, B.; Rainsford, T.; Findlay, D.; Mickan, S. P.; Abbott, D.
2006-01-01
We investigate the classification of the T-ray response of normal human bone cells and human osteosarcoma cells, grown in culture. Given the magnitude and phase responses within a reliable spectral range as features for input vectors, a trained support vector machine can correctly classify the two cell types to some extent. Performance of the support vector machine is deteriorated by the curse of dimensionality, resulting from the comparatively large number of features in the input vectors. Feature subset selection methods are used to select only an optimal number of relevant features for inputs. As a result, an improvement in generalization performance is attainable, and the selected frequencies can be used for further describing different mechanisms of the cells, responding to T-rays. We demonstrate a consistent classification accuracy of 89.6%, while the only one fifth of the original features are retained in the data set.
Classification of Microarray Data Using Kernel Fuzzy Inference System
Kumar Rath, Santanu
2014-01-01
The DNA microarray classification technique has gained more popularity in both research and practice. In real data analysis, such as microarray data, the dataset contains a huge number of insignificant and irrelevant features that tend to lose useful information. Classes with high relevance and feature sets with high significance are generally referred for the selected features, which determine the samples classification into their respective classes. In this paper, kernel fuzzy inference system (K-FIS) algorithm is applied to classify the microarray data (leukemia) using t-test as a feature selection method. Kernel functions are used to map original data points into a higher-dimensional (possibly infinite-dimensional) feature space defined by a (usually nonlinear) function ϕ through a mathematical process called the kernel trick. This paper also presents a comparative study for classification using K-FIS along with support vector machine (SVM) for different set of features (genes). Performance parameters available in the literature such as precision, recall, specificity, F-measure, ROC curve, and accuracy are considered to analyze the efficiency of the classification model. From the proposed approach, it is apparent that K-FIS model obtains similar results when compared with SVM model. This is an indication that the proposed approach relies on kernel function. PMID:27433543
Efficient feature selection using a hybrid algorithm for the task of epileptic seizure detection
NASA Astrophysics Data System (ADS)
Lai, Kee Huong; Zainuddin, Zarita; Ong, Pauline
2014-07-01
Feature selection is a very important aspect in the field of machine learning. It entails the search of an optimal subset from a very large data set with high dimensional feature space. Apart from eliminating redundant features and reducing computational cost, a good selection of feature also leads to higher prediction and classification accuracy. In this paper, an efficient feature selection technique is introduced in the task of epileptic seizure detection. The raw data are electroencephalography (EEG) signals. Using discrete wavelet transform, the biomedical signals were decomposed into several sets of wavelet coefficients. To reduce the dimension of these wavelet coefficients, a feature selection method that combines the strength of both filter and wrapper methods is proposed. Principal component analysis (PCA) is used as part of the filter method. As for wrapper method, the evolutionary harmony search (HS) algorithm is employed. This metaheuristic method aims at finding the best discriminating set of features from the original data. The obtained features were then used as input for an automated classifier, namely wavelet neural networks (WNNs). The WNNs model was trained to perform a binary classification task, that is, to determine whether a given EEG signal was normal or epileptic. For comparison purposes, different sets of features were also used as input. Simulation results showed that the WNNs that used the features chosen by the hybrid algorithm achieved the highest overall classification accuracy.
NASA Technical Reports Server (NTRS)
Anderson, B. H.
1983-01-01
A broad program to develop advanced, reliable, and user oriented three-dimensional viscous design techniques for supersonic inlet systems, and encourage their transfer into the general user community is discussed. Features of the program include: (1) develop effective methods of computing three-dimensional flows within a zonal modeling methodology; (2) ensure reasonable agreement between said analysis and selective sets of benchmark validation data; (3) develop user orientation into said analysis; and (4) explore and develop advanced numerical methodology.
Effect of finite sample size on feature selection and classification: a simulation study.
Way, Ted W; Sahiner, Berkman; Hadjiiski, Lubomir M; Chan, Heang-Ping
2010-02-01
The small number of samples available for training and testing is often the limiting factor in finding the most effective features and designing an optimal computer-aided diagnosis (CAD) system. Training on a limited set of samples introduces bias and variance in the performance of a CAD system relative to that trained with an infinite sample size. In this work, the authors conducted a simulation study to evaluate the performances of various combinations of classifiers and feature selection techniques and their dependence on the class distribution, dimensionality, and the training sample size. The understanding of these relationships will facilitate development of effective CAD systems under the constraint of limited available samples. Three feature selection techniques, the stepwise feature selection (SFS), sequential floating forward search (SFFS), and principal component analysis (PCA), and two commonly used classifiers, Fisher's linear discriminant analysis (LDA) and support vector machine (SVM), were investigated. Samples were drawn from multidimensional feature spaces of multivariate Gaussian distributions with equal or unequal covariance matrices and unequal means, and with equal covariance matrices and unequal means estimated from a clinical data set. Classifier performance was quantified by the area under the receiver operating characteristic curve Az. The mean Az values obtained by resubstitution and hold-out methods were evaluated for training sample sizes ranging from 15 to 100 per class. The number of simulated features available for selection was chosen to be 50, 100, and 200. It was found that the relative performance of the different combinations of classifier and feature selection method depends on the feature space distributions, the dimensionality, and the available training sample sizes. The LDA and SVM with radial kernel performed similarly for most of the conditions evaluated in this study, although the SVM classifier showed a slightly higher hold-out performance than LDA for some conditions and vice versa for other conditions. PCA was comparable to or better than SFS and SFFS for LDA at small samples sizes, but inferior for SVM with polynomial kernel. For the class distributions simulated from clinical data, PCA did not show advantages over the other two feature selection methods. Under this condition, the SVM with radial kernel performed better than the LDA when few training samples were available, while LDA performed better when a large number of training samples were available. None of the investigated feature selection-classifier combinations provided consistently superior performance under the studied conditions for different sample sizes and feature space distributions. In general, the SFFS method was comparable to the SFS method while PCA may have an advantage for Gaussian feature spaces with unequal covariance matrices. The performance of the SVM with radial kernel was better than, or comparable to, that of the SVM with polynomial kernel under most conditions studied.
Subsurface structures of buried features in the lunar Procellarum region
NASA Astrophysics Data System (ADS)
Wang, Wenrui; Heki, Kosuke
2017-07-01
The Gravity Recovery and Interior Laboratory (GRAIL) mission unraveled numbers of features showing strong gravity anomalies without prominent topographic signatures in the lunar Procellarum region. These features, located in different geologic units, are considered to have complex subsurface structures reflecting different evolution processes. By using the GRAIL level-1 data, we estimated the free-air and Bouguer gravity anomalies in several selected regions including such intriguing features. With the three-dimensional inversion technique, we recovered subsurface density structures in these regions.
Intelligent Control of a Sensor-Actuator System via Kernelized Least-Squares Policy Iteration
Liu, Bo; Chen, Sanfeng; Li, Shuai; Liang, Yongsheng
2012-01-01
In this paper a new framework, called Compressive Kernelized Reinforcement Learning (CKRL), for computing near-optimal policies in sequential decision making with uncertainty is proposed via incorporating the non-adaptive data-independent Random Projections and nonparametric Kernelized Least-squares Policy Iteration (KLSPI). Random Projections are a fast, non-adaptive dimensionality reduction framework in which high-dimensionality data is projected onto a random lower-dimension subspace via spherically random rotation and coordination sampling. KLSPI introduce kernel trick into the LSPI framework for Reinforcement Learning, often achieving faster convergence and providing automatic feature selection via various kernel sparsification approaches. In this approach, policies are computed in a low-dimensional subspace generated by projecting the high-dimensional features onto a set of random basis. We first show how Random Projections constitute an efficient sparsification technique and how our method often converges faster than regular LSPI, while at lower computational costs. Theoretical foundation underlying this approach is a fast approximation of Singular Value Decomposition (SVD). Finally, simulation results are exhibited on benchmark MDP domains, which confirm gains both in computation time and in performance in large feature spaces. PMID:22736969
NASA Astrophysics Data System (ADS)
Jaferzadeh, Keyvan; Moon, Inkyu
2016-12-01
The classification of erythrocytes plays an important role in the field of hematological diagnosis, specifically blood disorders. Since the biconcave shape of red blood cell (RBC) is altered during the different stages of hematological disorders, we believe that the three-dimensional (3-D) morphological features of erythrocyte provide better classification results than conventional two-dimensional (2-D) features. Therefore, we introduce a set of 3-D features related to the morphological and chemical properties of RBC profile and try to evaluate the discrimination power of these features against 2-D features with a neural network classifier. The 3-D features include erythrocyte surface area, volume, average cell thickness, sphericity index, sphericity coefficient and functionality factor, MCH and MCHSD, and two newly introduced features extracted from the ring section of RBC at the single-cell level. In contrast, the 2-D features are RBC projected surface area, perimeter, radius, elongation, and projected surface area to perimeter ratio. All features are obtained from images visualized by off-axis digital holographic microscopy with a numerical reconstruction algorithm, and four categories of biconcave (doughnut shape), flat-disc, stomatocyte, and echinospherocyte RBCs are interested. Our experimental results demonstrate that the 3-D features can be more useful in RBC classification than the 2-D features. Finally, we choose the best feature set of the 2-D and 3-D features by sequential forward feature selection technique, which yields better discrimination results. We believe that the final feature set evaluated with a neural network classification strategy can improve the RBC classification accuracy.
Guo, Xinyu; Dominick, Kelli C.; Minai, Ali A.; Li, Hailong; Erickson, Craig A.; Lu, Long J.
2017-01-01
The whole-brain functional connectivity (FC) pattern obtained from resting-state functional magnetic resonance imaging data are commonly applied to study neuropsychiatric conditions such as autism spectrum disorder (ASD) by using different machine learning models. Recent studies indicate that both hyper- and hypo- aberrant ASD-associated FCs were widely distributed throughout the entire brain rather than only in some specific brain regions. Deep neural networks (DNN) with multiple hidden layers have shown the ability to systematically extract lower-to-higher level information from high dimensional data across a series of neural hidden layers, significantly improving classification accuracy for such data. In this study, a DNN with a novel feature selection method (DNN-FS) is developed for the high dimensional whole-brain resting-state FC pattern classification of ASD patients vs. typical development (TD) controls. The feature selection method is able to help the DNN generate low dimensional high-quality representations of the whole-brain FC patterns by selecting features with high discriminating power from multiple trained sparse auto-encoders. For the comparison, a DNN without the feature selection method (DNN-woFS) is developed, and both of them are tested with different architectures (i.e., with different numbers of hidden layers/nodes). Results show that the best classification accuracy of 86.36% is generated by the DNN-FS approach with 3 hidden layers and 150 hidden nodes (3/150). Remarkably, DNN-FS outperforms DNN-woFS for all architectures studied. The most significant accuracy improvement was 9.09% with the 3/150 architecture. The method also outperforms other feature selection methods, e.g., two sample t-test and elastic net. In addition to improving the classification accuracy, a Fisher's score-based biomarker identification method based on the DNN is also developed, and used to identify 32 FCs related to ASD. These FCs come from or cross different pre-defined brain networks including the default-mode, cingulo-opercular, frontal-parietal, and cerebellum. Thirteen of them are statically significant between ASD and TD groups (two sample t-test p < 0.05) while 19 of them are not. The relationship between the statically significant FCs and the corresponding ASD behavior symptoms is discussed based on the literature and clinician's expert knowledge. Meanwhile, the potential reason of obtaining 19 FCs which are not statistically significant is also provided. PMID:28871217
Feature Selection and Classifier Development for Radio Frequency Device Identification
2015-12-01
adds important background knowledge for this research . 41 Four leading RF-based device identification methods have been proposed: Radio...appropriate level of dimensionality. Both qualitative and quantitative DRA dimensionality assessment methods are possible. Prior RF-DNA DRA research , e.g...Employing experimental designs to find optimal algorithm settings has been seen in hyperspectral anomaly detection research , c.f. [513–520], but not
NASA Technical Reports Server (NTRS)
Dasarathy, B. V.
1976-01-01
An algorithm is proposed for dimensionality reduction in the context of clustering techniques based on histogram analysis. The approach is based on an evaluation of the hills and valleys in the unidimensional histograms along the different features and provides an economical means of assessing the significance of the features in a nonparametric unsupervised data environment. The method has relevance to remote sensing applications.
Facial recognition using multisensor images based on localized kernel eigen spaces.
Gundimada, Satyanadh; Asari, Vijayan K
2009-06-01
A feature selection technique along with an information fusion procedure for improving the recognition accuracy of a visual and thermal image-based facial recognition system is presented in this paper. A novel modular kernel eigenspaces approach is developed and implemented on the phase congruency feature maps extracted from the visual and thermal images individually. Smaller sub-regions from a predefined neighborhood within the phase congruency images of the training samples are merged to obtain a large set of features. These features are then projected into higher dimensional spaces using kernel methods. The proposed localized nonlinear feature selection procedure helps to overcome the bottlenecks of illumination variations, partial occlusions, expression variations and variations due to temperature changes that affect the visual and thermal face recognition techniques. AR and Equinox databases are used for experimentation and evaluation of the proposed technique. The proposed feature selection procedure has greatly improved the recognition accuracy for both the visual and thermal images when compared to conventional techniques. Also, a decision level fusion methodology is presented which along with the feature selection procedure has outperformed various other face recognition techniques in terms of recognition accuracy.
Feature-based data assimilation in geophysics
NASA Astrophysics Data System (ADS)
Morzfeld, Matthias; Adams, Jesse; Lunderman, Spencer; Orozco, Rafael
2018-05-01
Many applications in science require that computational models and data be combined. In a Bayesian framework, this is usually done by defining likelihoods based on the mismatch of model outputs and data. However, matching model outputs and data in this way can be unnecessary or impossible. For example, using large amounts of steady state data is unnecessary because these data are redundant. It is numerically difficult to assimilate data in chaotic systems. It is often impossible to assimilate data of a complex system into a low-dimensional model. As a specific example, consider a low-dimensional stochastic model for the dipole of the Earth's magnetic field, while other field components are ignored in the model. The above issues can be addressed by selecting features of the data, and defining likelihoods based on the features, rather than by the usual mismatch of model output and data. Our goal is to contribute to a fundamental understanding of such a feature-based approach that allows us to assimilate selected aspects of data into models. We also explain how the feature-based approach can be interpreted as a method for reducing an effective dimension and derive new noise models, based on perturbed observations, that lead to computationally efficient solutions. Numerical implementations of our ideas are illustrated in four examples.
Lin, Xiaohui; Li, Chao; Zhang, Yanhui; Su, Benzhe; Fan, Meng; Wei, Hai
2017-12-26
Feature selection is an important topic in bioinformatics. Defining informative features from complex high dimensional biological data is critical in disease study, drug development, etc. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has shown its power in many applications. It ranks the features according to the recursive feature deletion sequence based on SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the feature rank of SVM-RFE. Meanwhile, to measure the feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporally screens out the samples lying in a heavy overlapping area in each iteration. The experiments on the eight public biological datasets show that the discriminative ability of the feature subset could be measured more accurately by combining the classification accuracy rate with the average overlapping degree of the samples compared with using the classification accuracy rate alone, and shielding the samples in the overlapping area made the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to define potential biomarkers from big biological data.
Hadoop neural network for parallel and distributed feature selection.
Hodge, Victoria J; O'Keefe, Simon; Austin, Jim
2016-06-01
In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel) allowing multiple feature selectors to be compared. We identify commonalities among the five features selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
Dai, Qiong; Cheng, Jun-Hu; Sun, Da-Wen; Zeng, Xin-An
2015-01-01
There is an increased interest in the applications of hyperspectral imaging (HSI) for assessing food quality, safety, and authenticity. HSI provides abundance of spatial and spectral information from foods by combining both spectroscopy and imaging, resulting in hundreds of contiguous wavebands for each spatial position of food samples, also known as the curse of dimensionality. It is desirable to employ feature selection algorithms for decreasing computation burden and increasing predicting accuracy, which are especially relevant in the development of online applications. Recently, a variety of feature selection algorithms have been proposed that can be categorized into three groups based on the searching strategy namely complete search, heuristic search and random search. This review mainly introduced the fundamental of each algorithm, illustrated its applications in hyperspectral data analysis in the food field, and discussed the advantages and disadvantages of these algorithms. It is hoped that this review should provide a guideline for feature selections and data processing in the future development of hyperspectral imaging technique in foods.
Wong, Gerard; Leckie, Christopher; Kowalczyk, Adam
2012-01-15
Feature selection is a key concept in machine learning for microarray datasets, where features represented by probesets are typically several orders of magnitude larger than the available sample size. Computational tractability is a key challenge for feature selection algorithms in handling very high-dimensional datasets beyond a hundred thousand features, such as in datasets produced on single nucleotide polymorphism microarrays. In this article, we present a novel feature set reduction approach that enables scalable feature selection on datasets with hundreds of thousands of features and beyond. Our approach enables more efficient handling of higher resolution datasets to achieve better disease subtype classification of samples for potentially more accurate diagnosis and prognosis, which allows clinicians to make more informed decisions in regards to patient treatment options. We applied our feature set reduction approach to several publicly available cancer single nucleotide polymorphism (SNP) array datasets and evaluated its performance in terms of its multiclass predictive classification accuracy over different cancer subtypes, its speedup in execution as well as its scalability with respect to sample size and array resolution. Feature Set Reduction (FSR) was able to reduce the dimensions of an SNP array dataset by more than two orders of magnitude while achieving at least equal, and in most cases superior predictive classification performance over that achieved on features selected by existing feature selection methods alone. An examination of the biological relevance of frequently selected features from FSR-reduced feature sets revealed strong enrichment in association with cancer. FSR was implemented in MATLAB R2010b and is available at http://ww2.cs.mu.oz.au/~gwong/FSR.
Dimensionality reduction in epidemic spreading models
NASA Astrophysics Data System (ADS)
Frasca, M.; Rizzo, A.; Gallo, L.; Fortuna, L.; Porfiri, M.
2015-09-01
Complex dynamical systems often exhibit collective dynamics that are well described by a reduced set of key variables in a low-dimensional space. Such a low-dimensional description offers a privileged perspective to understand the system behavior across temporal and spatial scales. In this work, we propose a data-driven approach to establish low-dimensional representations of large epidemic datasets by using a dimensionality reduction algorithm based on isometric features mapping (ISOMAP). We demonstrate our approach on synthetic data for epidemic spreading in a population of mobile individuals. We find that ISOMAP is successful in embedding high-dimensional data into a low-dimensional manifold, whose topological features are associated with the epidemic outbreak. Across a range of simulation parameters and model instances, we observe that epidemic outbreaks are embedded into a family of closed curves in a three-dimensional space, in which neighboring points pertain to instants that are close in time. The orientation of each curve is unique to a specific outbreak, and the coordinates correlate with the number of infected individuals. A low-dimensional description of epidemic spreading is expected to improve our understanding of the role of individual response on the outbreak dynamics, inform the selection of meaningful global observables, and, possibly, aid in the design of control and quarantine procedures.
Erus, Guray; Zacharaki, Evangelia I; Davatzikos, Christos
2014-04-01
This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a "target-specific" feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject's images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an "estimability" criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. Copyright © 2014 Elsevier B.V. All rights reserved.
Erus, Guray; Zacharaki, Evangelia I.; Davatzikos, Christos
2014-01-01
This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a “target-specific” feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject’s images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an “estimability” criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. PMID:24607564
Two-dimensional wavelet transform feature extraction for porous silicon chemical sensors.
Murguía, José S; Vergara, Alexander; Vargas-Olmos, Cecilia; Wong, Travis J; Fonollosa, Jordi; Huerta, Ramón
2013-06-27
Designing reliable, fast responding, highly sensitive, and low-power consuming chemo-sensory systems has long been a major goal in chemo-sensing. This goal, however, presents a difficult challenge because having a set of chemo-sensory detectors exhibiting all these aforementioned ideal conditions are still largely un-realizable to-date. This paper presents a unique perspective on capturing more in-depth insights into the physicochemical interactions of two distinct, selectively chemically modified porous silicon (pSi) film-based optical gas sensors by implementing an innovative, based on signal processing methodology, namely the two-dimensional discrete wavelet transform. Specifically, the method consists of using the two-dimensional discrete wavelet transform as a feature extraction method to capture the non-stationary behavior from the bi-dimensional pSi rugate sensor response. Utilizing a comprehensive set of measurements collected from each of the aforementioned optically based chemical sensors, we evaluate the significance of our approach on a complex, six-dimensional chemical analyte discrimination/quantification task problem. Due to the bi-dimensional aspects naturally governing the optical sensor response to chemical analytes, our findings provide evidence that the proposed feature extractor strategy may be a valuable tool to deepen our understanding of the performance of optically based chemical sensors as well as an important step toward attaining their implementation in more realistic chemo-sensing applications. Copyright © 2013 Elsevier B.V. All rights reserved.
Geraci, Joseph; Dharsee, Moyez; Nuin, Paulo; Haslehurst, Alexandria; Koti, Madhuri; Feilotter, Harriet E; Evans, Ken
2014-03-01
We introduce a novel method for visualizing high dimensional data via a discrete dynamical system. This method provides a 2D representation of the relationship between subjects according to a set of variables without geometric projections, transformed axes or principal components. The algorithm exploits a memory-type mechanism inherent in a certain class of discrete dynamical systems collectively referred to as the chaos game that are closely related to iterative function systems. The goal of the algorithm was to create a human readable representation of high dimensional patient data that was capable of detecting unrevealed subclusters of patients from within anticipated classifications. This provides a mechanism to further pursue a more personalized exploration of pathology when used with medical data. For clustering and classification protocols, the dynamical system portion of the algorithm is designed to come after some feature selection filter and before some model evaluation (e.g. clustering accuracy) protocol. In the version given here, a univariate features selection step is performed (in practice more complex feature selection methods are used), a discrete dynamical system is driven by this reduced set of variables (which results in a set of 2D cluster models), these models are evaluated for their accuracy (according to a user-defined binary classification) and finally a visual representation of the top classification models are returned. Thus, in addition to the visualization component, this methodology can be used for both supervised and unsupervised machine learning as the top performing models are returned in the protocol we describe here. Butterfly, the algorithm we introduce and provide working code for, uses a discrete dynamical system to classify high dimensional data and provide a 2D representation of the relationship between subjects. We report results on three datasets (two in the article; one in the appendix) including a public lung cancer dataset that comes along with the included Butterfly R package. In the included R script, a univariate feature selection method is used for the dimension reduction step, but in the future we wish to use a more powerful multivariate feature reduction method based on neural networks (Kriesel, 2007). A script written in R (designed to run on R studio) accompanies this article that implements this algorithm and is available at http://butterflygeraci.codeplex.com/. For details on the R package or for help installing the software refer to the accompanying document, Supporting Material and Appendix.
Kumar Myakalwar, Ashwin; Spegazzini, Nicolas; Zhang, Chi; Kumar Anubham, Siva; Dasari, Ramachandra R.; Barman, Ishan; Kumar Gundawar, Manoj
2015-01-01
Despite its intrinsic advantages, translation of laser induced breakdown spectroscopy for material identification has been often impeded by the lack of robustness of developed classification models, often due to the presence of spurious correlations. While a number of classifiers exhibiting high discriminatory power have been reported, efforts in establishing the subset of relevant spectral features that enable a fundamental interpretation of the segmentation capability and avoid the ‘curse of dimensionality’ have been lacking. Using LIBS data acquired from a set of secondary explosives, we investigate judicious feature selection approaches and architect two different chemometrics classifiers –based on feature selection through prerequisite knowledge of the sample composition and genetic algorithm, respectively. While the full spectral input results in classification rate of ca.92%, selection of only carbon to hydrogen spectral window results in near identical performance. Importantly, the genetic algorithm-derived classifier shows a statistically significant improvement to ca. 94% accuracy for prospective classification, even though the number of features used is an order of magnitude smaller. Our findings demonstrate the impact of rigorous feature selection in LIBS and also hint at the feasibility of using a discrete filter based detector thereby enabling a cheaper and compact system more amenable to field operations. PMID:26286630
Nankali, Saber; Miandoab, Payam Samadi; Baghizadeh, Amin
2016-01-01
In external‐beam radiotherapy, using external markers is one of the most reliable tools to predict tumor position, in clinical applications. The main challenge in this approach is tumor motion tracking with highest accuracy that depends heavily on external markers location, and this issue is the objective of this study. Four commercially available feature selection algorithms entitled 1) Correlation‐based Feature Selection, 2) Classifier, 3) Principal Components, and 4) Relief were proposed to find optimum location of external markers in combination with two “Genetic” and “Ranker” searching procedures. The performance of these algorithms has been evaluated using four‐dimensional extended cardiac‐torso anthropomorphic phantom. Six tumors in lung, three tumors in liver, and 49 points on the thorax surface were taken into account to simulate internal and external motions, respectively. The root mean square error of an adaptive neuro‐fuzzy inference system (ANFIS) as prediction model was considered as metric for quantitatively evaluating the performance of proposed feature selection algorithms. To do this, the thorax surface region was divided into nine smaller segments and predefined tumors motion was predicted by ANFIS using external motion data of given markers at each small segment, separately. Our comparative results showed that all feature selection algorithms can reasonably select specific external markers from those segments where the root mean square error of the ANFIS model is minimum. Moreover, the performance accuracy of proposed feature selection algorithms was compared, separately. For this, each tumor motion was predicted using motion data of those external markers selected by each feature selection algorithm. Duncan statistical test, followed by F‐test, on final results reflected that all proposed feature selection algorithms have the same performance accuracy for lung tumors. But for liver tumors, a correlation‐based feature selection algorithm, in combination with a genetic search algorithm, proved to yield best performance accuracy for selecting optimum markers. PACS numbers: 87.55.km, 87.56.Fc PMID:26894358
Nankali, Saber; Torshabi, Ahmad Esmaili; Miandoab, Payam Samadi; Baghizadeh, Amin
2016-01-08
In external-beam radiotherapy, using external markers is one of the most reliable tools to predict tumor position, in clinical applications. The main challenge in this approach is tumor motion tracking with highest accuracy that depends heavily on external markers location, and this issue is the objective of this study. Four commercially available feature selection algorithms entitled 1) Correlation-based Feature Selection, 2) Classifier, 3) Principal Components, and 4) Relief were proposed to find optimum location of external markers in combination with two "Genetic" and "Ranker" searching procedures. The performance of these algorithms has been evaluated using four-dimensional extended cardiac-torso anthropomorphic phantom. Six tumors in lung, three tumors in liver, and 49 points on the thorax surface were taken into account to simulate internal and external motions, respectively. The root mean square error of an adaptive neuro-fuzzy inference system (ANFIS) as prediction model was considered as metric for quantitatively evaluating the performance of proposed feature selection algorithms. To do this, the thorax surface region was divided into nine smaller segments and predefined tumors motion was predicted by ANFIS using external motion data of given markers at each small segment, separately. Our comparative results showed that all feature selection algorithms can reasonably select specific external markers from those segments where the root mean square error of the ANFIS model is minimum. Moreover, the performance accuracy of proposed feature selection algorithms was compared, separately. For this, each tumor motion was predicted using motion data of those external markers selected by each feature selection algorithm. Duncan statistical test, followed by F-test, on final results reflected that all proposed feature selection algorithms have the same performance accuracy for lung tumors. But for liver tumors, a correlation-based feature selection algorithm, in combination with a genetic search algorithm, proved to yield best performance accuracy for selecting optimum markers.
Dynamic Dimensionality Selection for Bayesian Classifier Ensembles
2015-03-19
learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very cometitive to Logistic Regression but much more...classifier, Generative learning, Discriminative learning, Naïve Bayes, Feature selection, Logistic regression , higher order attribute independence 16...discriminative learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very cometitive to Logistic Regression but
Wang, Guohua; Liu, Qiong
2015-01-01
Far-infrared pedestrian detection approaches for advanced driver-assistance systems based on high-dimensional features fail to simultaneously achieve robust and real-time detection. We propose a robust and real-time pedestrian detection system characterized by novel candidate filters, novel pedestrian features and multi-frame approval matching in a coarse-to-fine fashion. Firstly, we design two filters based on the pedestrians’ head and the road to select the candidates after applying a pedestrian segmentation algorithm to reduce false alarms. Secondly, we propose a novel feature encapsulating both the relationship of oriented gradient distribution and the code of oriented gradient to deal with the enormous variance in pedestrians’ size and appearance. Thirdly, we introduce a multi-frame approval matching approach utilizing the spatiotemporal continuity of pedestrians to increase the detection rate. Large-scale experiments indicate that the system works in real time and the accuracy has improved about 9% compared with approaches based on high-dimensional features only. PMID:26703611
Wang, Guohua; Liu, Qiong
2015-12-21
Far-infrared pedestrian detection approaches for advanced driver-assistance systems based on high-dimensional features fail to simultaneously achieve robust and real-time detection. We propose a robust and real-time pedestrian detection system characterized by novel candidate filters, novel pedestrian features and multi-frame approval matching in a coarse-to-fine fashion. Firstly, we design two filters based on the pedestrians' head and the road to select the candidates after applying a pedestrian segmentation algorithm to reduce false alarms. Secondly, we propose a novel feature encapsulating both the relationship of oriented gradient distribution and the code of oriented gradient to deal with the enormous variance in pedestrians' size and appearance. Thirdly, we introduce a multi-frame approval matching approach utilizing the spatiotemporal continuity of pedestrians to increase the detection rate. Large-scale experiments indicate that the system works in real time and the accuracy has improved about 9% compared with approaches based on high-dimensional features only.
Li, Ziyi; Safo, Sandra E; Long, Qi
2017-07-11
Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction, pattern recognition, and visualization of high dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks that are often represented by graphs. Recent work has shown that incorporating such biological information improves feature selection and prediction performance in regression analysis, but there has been limited work on extending this approach to PCA. In this article, we propose two new sparse PCA methods called Fused and Grouped sparse PCA that enable incorporation of prior biological information in variable selection. Our simulation studies suggest that, compared to existing sparse PCA methods, the proposed methods achieve higher sensitivity and specificity when the graph structure is correctly specified, and are fairly robust to misspecified graph structures. Application to a glioblastoma gene expression dataset identified pathways that are suggested in the literature to be related with glioblastoma. The proposed sparse PCA methods Fused and Grouped sparse PCA can effectively incorporate prior biological information in variable selection, leading to improved feature selection and more interpretable principal component loadings and potentially providing insights on molecular underpinnings of complex diseases.
Elucidating high-dimensional cancer hallmark annotation via enriched ontology.
Yan, Shankai; Wong, Ka-Chun
2017-09-01
Cancer hallmark annotation is a promising technique that could discover novel knowledge about cancer from the biomedical literature. The automated annotation of cancer hallmarks could reveal relevant cancer transformation processes in the literature or extract the articles that correspond to the cancer hallmark of interest. It acts as a complementary approach that can retrieve knowledge from massive text information, advancing numerous focused studies in cancer research. Nonetheless, the high-dimensional nature of cancer hallmark annotation imposes a unique challenge. To address the curse of dimensionality, we compared multiple cancer hallmark annotation methods on 1580 PubMed abstracts. Based on the insights, a novel approach, UDT-RF, which makes use of ontological features is proposed. It expands the feature space via the Medical Subject Headings (MeSH) ontology graph and utilizes novel feature selections for elucidating the high-dimensional cancer hallmark annotation space. To demonstrate its effectiveness, state-of-the-art methods are compared and evaluated by a multitude of performance metrics, revealing the full performance spectrum on the full set of cancer hallmarks. Several case studies are conducted, demonstrating how the proposed approach could reveal novel insights into cancers. https://github.com/cskyan/chmannot. Copyright © 2017 Elsevier Inc. All rights reserved.
Comparative study of feature selection with ensemble learning using SOM variants
NASA Astrophysics Data System (ADS)
Filali, Ameni; Jlassi, Chiraz; Arous, Najet
2017-03-01
Ensemble learning has succeeded in the growth of stability and clustering accuracy, but their runtime prohibits them from scaling up to real-world applications. This study deals the problem of selecting a subset of the most pertinent features for every cluster from a dataset. The proposed method is another extension of the Random Forests approach using self-organizing maps (SOM) variants to unlabeled data that estimates the out-of-bag feature importance from a set of partitions. Every partition is created using a various bootstrap sample and a random subset of the features. Then, we show that the process internal estimates are used to measure variable pertinence in Random Forests are also applicable to feature selection in unsupervised learning. This approach aims to the dimensionality reduction, visualization and cluster characterization at the same time. Hence, we provide empirical results on nineteen benchmark data sets indicating that RFS can lead to significant improvement in terms of clustering accuracy, over several state-of-the-art unsupervised methods, with a very limited subset of features. The approach proves promise to treat with very broad domains.
Stabilizing l1-norm prediction models by supervised feature grouping.
Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha
2016-02-01
Emerging Electronic Medical Records (EMRs) have reformed the modern healthcare. These records have great potential to be used for building clinical prediction models. However, a problem in using them is their high dimensionality. Since a lot of information may not be relevant for prediction, the underlying complexity of the prediction models may not be high. A popular way to deal with this problem is to employ feature selection. Lasso and l1-norm based feature selection methods have shown promising results. But, in presence of correlated features, these methods select features that change considerably with small changes in data. This prevents clinicians to obtain a stable feature set, which is crucial for clinical decision making. Grouping correlated variables together can improve the stability of feature selection, however, such grouping is usually not known and needs to be estimated for optimal performance. Addressing this problem, we propose a new model that can simultaneously learn the grouping of correlated features and perform stable feature selection. We formulate the model as a constrained optimization problem and provide an efficient solution with guaranteed convergence. Our experiments with both synthetic and real-world datasets show that the proposed model is significantly more stable than Lasso and many existing state-of-the-art shrinkage and classification methods. We further show that in terms of prediction performance, the proposed method consistently outperforms Lasso and other baselines. Our model can be used for selecting stable risk factors for a variety of healthcare problems, so it can assist clinicians toward accurate decision making. Copyright © 2015 Elsevier Inc. All rights reserved.
Fuzzy support vector machine for microarray imbalanced data classification
NASA Astrophysics Data System (ADS)
Ladayya, Faroh; Purnami, Santi Wulan; Irhamah
2017-11-01
DNA microarrays are data containing gene expression with small sample sizes and high number of features. Furthermore, imbalanced classes is a common problem in microarray data. This occurs when a dataset is dominated by a class which have significantly more instances than the other minority classes. Therefore, it is needed a classification method that solve the problem of high dimensional and imbalanced data. Support Vector Machine (SVM) is one of the classification methods that is capable of handling large or small samples, nonlinear, high dimensional, over learning and local minimum issues. SVM has been widely applied to DNA microarray data classification and it has been shown that SVM provides the best performance among other machine learning methods. However, imbalanced data will be a problem because SVM treats all samples in the same importance thus the results is bias for minority class. To overcome the imbalanced data, Fuzzy SVM (FSVM) is proposed. This method apply a fuzzy membership to each input point and reformulate the SVM such that different input points provide different contributions to the classifier. The minority classes have large fuzzy membership so FSVM can pay more attention to the samples with larger fuzzy membership. Given DNA microarray data is a high dimensional data with a very large number of features, it is necessary to do feature selection first using Fast Correlation based Filter (FCBF). In this study will be analyzed by SVM, FSVM and both methods by applying FCBF and get the classification performance of them. Based on the overall results, FSVM on selected features has the best classification performance compared to SVM.
Chen, Zhen; Zhao, Pei; Li, Fuyi; Leier, André; Marquez-Lago, Tatiana T; Wang, Yanan; Webb, Geoffrey I; Smith, A Ian; Daly, Roger J; Chou, Kuo-Chen; Song, Jiangning
2018-03-08
Structural and physiochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection, and dimensionality reduction algorithms, greatly facilitating training, analysis, and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit. http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/. jiangning.song@monash.edu; kcchou@gordonlifescience.org; roger.daly@monash.edu. Supplementary data are available at Bioinformatics online.
Classification of radiolarian images with hand-crafted and deep features
NASA Astrophysics Data System (ADS)
Keçeli, Ali Seydi; Kaya, Aydın; Keçeli, Seda Uzunçimen
2017-12-01
Radiolarians are planktonic protozoa and are important biostratigraphic and paleoenvironmental indicators for paleogeographic reconstructions. Radiolarian paleontology still remains as a low cost and the one of the most convenient way to obtain dating of deep ocean sediments. Traditional methods for identifying radiolarians are time-consuming and cannot scale to the granularity or scope necessary for large-scale studies. Automated image classification will allow making these analyses promptly. In this study, a method for automatic radiolarian image classification is proposed on Scanning Electron Microscope (SEM) images of radiolarians to ease species identification of fossilized radiolarians. The proposed method uses both hand-crafted features like invariant moments, wavelet moments, Gabor features, basic morphological features and deep features obtained from a pre-trained Convolutional Neural Network (CNN). Feature selection is applied over deep features to reduce high dimensionality. Classification outcomes are analyzed to compare hand-crafted features, deep features, and their combinations. Results show that the deep features obtained from a pre-trained CNN are more discriminative comparing to hand-crafted ones. Additionally, feature selection utilizes to the computational cost of classification algorithms and have no negative effect on classification accuracy.
IMMAN: free software for information theory-based chemometric analysis.
Urias, Ricardo W Pino; Barigye, Stephen J; Marrero-Ponce, Yovani; García-Jacas, César R; Valdes-Martiní, José R; Perez-Gimenez, Facundo
2015-05-01
The features and theoretical background of a new and free computational program for chemometric analysis denominated IMMAN (acronym for Information theory-based CheMoMetrics ANalysis) are presented. This is multi-platform software developed in the Java programming language, designed with a remarkably user-friendly graphical interface for the computation of a collection of information-theoretic functions adapted for rank-based unsupervised and supervised feature selection tasks. A total of 20 feature selection parameters are presented, with the unsupervised and supervised frameworks represented by 10 approaches in each case. Several information-theoretic parameters traditionally used as molecular descriptors (MDs) are adapted for use as unsupervised rank-based feature selection methods. On the other hand, a generalization scheme for the previously defined differential Shannon's entropy is discussed, as well as the introduction of Jeffreys information measure for supervised feature selection. Moreover, well-known information-theoretic feature selection parameters, such as information gain, gain ratio, and symmetrical uncertainty are incorporated to the IMMAN software ( http://mobiosd-hub.com/imman-soft/ ), following an equal-interval discretization approach. IMMAN offers data pre-processing functionalities, such as missing values processing, dataset partitioning, and browsing. Moreover, single parameter or ensemble (multi-criteria) ranking options are provided. Consequently, this software is suitable for tasks like dimensionality reduction, feature ranking, as well as comparative diversity analysis of data matrices. Simple examples of applications performed with this program are presented. A comparative study between IMMAN and WEKA feature selection tools using the Arcene dataset was performed, demonstrating similar behavior. In addition, it is revealed that the use of IMMAN unsupervised feature selection methods improves the performance of both IMMAN and WEKA supervised algorithms. Graphic representation for Shannon's distribution of MD calculating software.
Mohammed, Ameer; Zamani, Majid; Bayford, Richard; Demosthenous, Andreas
2017-12-01
In Parkinson's disease (PD), on-demand deep brain stimulation is required so that stimulation is regulated to reduce side effects resulting from continuous stimulation and PD exacerbation due to untimely stimulation. Also, the progressive nature of PD necessitates the use of dynamic detection schemes that can track the nonlinearities in PD. This paper proposes the use of dynamic feature extraction and dynamic pattern classification to achieve dynamic PD detection taking into account the demand for high accuracy, low computation, and real-time detection. The dynamic feature extraction and dynamic pattern classification are selected by evaluating a subset of feature extraction, dimensionality reduction, and classification algorithms that have been used in brain-machine interfaces. A novel dimensionality reduction technique, the maximum ratio method (MRM) is proposed, which provides the most efficient performance. In terms of accuracy and complexity for hardware implementation, a combination having discrete wavelet transform for feature extraction, MRM for dimensionality reduction, and dynamic k-nearest neighbor for classification was chosen as the most efficient. It achieves a classification accuracy of 99.29%, an F1-score of 97.90%, and a choice probability of 99.86%.
Li, Ke; Liu, Yi; Wang, Quanxin; Wu, Yalei; Song, Shimin; Sun, Yi; Liu, Tengchong; Wang, Jun; Li, Yang; Du, Shaoyi
2015-01-01
This paper proposes a novel multi-label classification method for resolving the spacecraft electrical characteristics problems which involve many unlabeled test data processing, high-dimensional features, long computing time and identification of slow rate. Firstly, both the fuzzy c-means (FCM) offline clustering and the principal component feature extraction algorithms are applied for the feature selection process. Secondly, the approximate weighted proximal support vector machine (WPSVM) online classification algorithms is used to reduce the feature dimension and further improve the rate of recognition for electrical characteristics spacecraft. Finally, the data capture contribution method by using thresholds is proposed to guarantee the validity and consistency of the data selection. The experimental results indicate that the method proposed can obtain better data features of the spacecraft electrical characteristics, improve the accuracy of identification and shorten the computing time effectively. PMID:26544549
Capela, Nicole A; Lemaire, Edward D; Baddour, Natalie
2015-01-01
Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.
2015-01-01
Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations. PMID:25885272
Improved classification accuracy by feature extraction using genetic algorithms
NASA Astrophysics Data System (ADS)
Patriarche, Julia; Manduca, Armando; Erickson, Bradley J.
2003-05-01
A feature extraction algorithm has been developed for the purposes of improving classification accuracy. The algorithm uses a genetic algorithm / hill-climber hybrid to generate a set of linearly recombined features, which may be of reduced dimensionality compared with the original set. The genetic algorithm performs the global exploration, and a hill climber explores local neighborhoods. Hybridizing the genetic algorithm with a hill climber improves both the rate of convergence, and the final overall cost function value; it also reduces the sensitivity of the genetic algorithm to parameter selection. The genetic algorithm includes the operators: crossover, mutation, and deletion / reactivation - the last of these effects dimensionality reduction. The feature extractor is supervised, and is capable of deriving a separate feature space for each tissue (which are reintegrated during classification). A non-anatomical digital phantom was developed as a gold standard for testing purposes. In tests with the phantom, and with images of multiple sclerosis patients, classification with feature extractor derived features yielded lower error rates than using standard pulse sequences, and with features derived using principal components analysis. Using the multiple sclerosis patient data, the algorithm resulted in a mean 31% reduction in classification error of pure tissues.
Dynamic adaptive learning for decision-making supporting systems
NASA Astrophysics Data System (ADS)
He, Haibo; Cao, Yuan; Chen, Sheng; Desai, Sachi; Hohil, Myron E.
2008-03-01
This paper proposes a novel adaptive learning method for data mining in support of decision-making systems. Due to the inherent characteristics of information ambiguity/uncertainty, high dimensionality and noisy in many homeland security and defense applications, such as surveillances, monitoring, net-centric battlefield, and others, it is critical to develop autonomous learning methods to efficiently learn useful information from raw data to help the decision making process. The proposed method is based on a dynamic learning principle in the feature spaces. Generally speaking, conventional approaches of learning from high dimensional data sets include various feature extraction (principal component analysis, wavelet transform, and others) and feature selection (embedded approach, wrapper approach, filter approach, and others) methods. However, very limited understandings of adaptive learning from different feature spaces have been achieved. We propose an integrative approach that takes advantages of feature selection and hypothesis ensemble techniques to achieve our goal. Based on the training data distributions, a feature score function is used to provide a measurement of the importance of different features for learning purpose. Then multiple hypotheses are iteratively developed in different feature spaces according to their learning capabilities. Unlike the pre-set iteration steps in many of the existing ensemble learning approaches, such as adaptive boosting (AdaBoost) method, the iterative learning process will automatically stop when the intelligent system can not provide a better understanding than a random guess in that particular subset of feature spaces. Finally, a voting algorithm is used to combine all the decisions from different hypotheses to provide the final prediction results. Simulation analyses of the proposed method on classification of different US military aircraft databases show the effectiveness of this method.
Intelligent feature selection techniques for pattern classification of Lamb wave signals
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hinders, Mark K.; Miller, Corey A.
2014-02-18
Lamb wave interaction with flaws is a complex, three-dimensional phenomenon, which often frustrates signal interpretation schemes based on mode arrival time shifts predicted by dispersion curves. As the flaw severity increases, scattering and mode conversion effects will often dominate the time-domain signals, obscuring available information about flaws because multiple modes may arrive on top of each other. Even for idealized flaw geometries the scattering and mode conversion behavior of Lamb waves is very complex. Here, multi-mode Lamb waves in a metal plate are propagated across a rectangular flat-bottom hole in a sequence of pitch-catch measurements corresponding to the double crossholemore » tomography geometry. The flaw is sequentially deepened, with the Lamb wave measurements repeated at each flaw depth. Lamb wave tomography reconstructions are used to identify which waveforms have interacted with the flaw and thereby carry information about its depth. Multiple features are extracted from each of the Lamb wave signals using wavelets, which are then fed to statistical pattern classification algorithms that identify flaw severity. In order to achieve the highest classification accuracy, an optimal feature space is required but it’s never known a priori which features are going to be best. For structural health monitoring we make use of the fact that physical flaws, such as corrosion, will only increase over time. This allows us to identify feature vectors which are topologically well-behaved by requiring that sequential classes “line up” in feature vector space. An intelligent feature selection routine is illustrated that identifies favorable class distributions in multi-dimensional feature spaces using computational homology theory. Betti numbers and formal classification accuracies are calculated for each feature space subset to establish a correlation between the topology of the class distribution and the corresponding classification accuracy.« less
A face and palmprint recognition approach based on discriminant DCT feature extraction.
Jing, Xiao-Yuan; Zhang, David
2004-12-01
In the field of image processing and recognition, discrete cosine transform (DCT) and linear discrimination are two widely used techniques. Based on them, we present a new face and palmprint recognition approach in this paper. It first uses a two-dimensional separability judgment to select the DCT frequency bands with favorable linear separability. Then from the selected bands, it extracts the linear discriminative features by an improved Fisherface method and performs the classification by the nearest neighbor classifier. We detailedly analyze theoretical advantages of our approach in feature extraction. The experiments on face databases and palmprint database demonstrate that compared to the state-of-the-art linear discrimination methods, our approach obtains better classification performance. It can significantly improve the recognition rates for face and palmprint data and effectively reduce the dimension of feature space.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Honorio, J.; Goldstein, R.; Honorio, J.
We propose a simple, well grounded classification technique which is suited for group classification on brain fMRI data sets that have high dimensionality, small number of subjects, high noise level, high subject variability, imperfect registration and capture subtle cognitive effects. We propose threshold-split region as a new feature selection method and majority voteas the classification technique. Our method does not require a predefined set of regions of interest. We use average acros ssessions, only one feature perexperimental condition, feature independence assumption, and simple classifiers. The seeming counter-intuitive approach of using a simple design is supported by signal processing and statisticalmore » theory. Experimental results in two block design data sets that capture brain function under distinct monetary rewards for cocaine addicted and control subjects, show that our method exhibits increased generalization accuracy compared to commonly used feature selection and classification techniques.« less
Kesharaju, Manasa; Nagarajah, Romesh
2015-09-01
The motivation for this research stems from a need for providing a non-destructive testing method capable of detecting and locating any defects and microstructural variations within armour ceramic components before issuing them to the soldiers who rely on them for their survival. The development of an automated ultrasonic inspection based classification system would make possible the checking of each ceramic component and immediately alert the operator about the presence of defects. Generally, in many classification problems a choice of features or dimensionality reduction is significant and simultaneously very difficult, as a substantial computational effort is required to evaluate possible feature subsets. In this research, a combination of artificial neural networks and genetic algorithms are used to optimize the feature subset used in classification of various defects in reaction-sintered silicon carbide ceramic components. Initially wavelet based feature extraction is implemented from the region of interest. An Artificial Neural Network classifier is employed to evaluate the performance of these features. Genetic Algorithm based feature selection is performed. Principal Component Analysis is a popular technique used for feature selection and is compared with the genetic algorithm based technique in terms of classification accuracy and selection of optimal number of features. The experimental results confirm that features identified by Principal Component Analysis lead to improved performance in terms of classification percentage with 96% than Genetic algorithm with 94%. Copyright © 2015 Elsevier B.V. All rights reserved.
An opinion formation based binary optimization approach for feature selection
NASA Astrophysics Data System (ADS)
Hamedmoghadam, Homayoun; Jalili, Mahdi; Yu, Xinghuo
2018-02-01
This paper proposed a novel optimization method based on opinion formation in complex network systems. The proposed optimization technique mimics human-human interaction mechanism based on a mathematical model derived from social sciences. Our method encodes a subset of selected features to the opinion of an artificial agent and simulates the opinion formation process among a population of agents to solve the feature selection problem. The agents interact using an underlying interaction network structure and get into consensus in their opinions, while finding better solutions to the problem. A number of mechanisms are employed to avoid getting trapped in local minima. We compare the performance of the proposed method with a number of classical population-based optimization methods and a state-of-the-art opinion formation based method. Our experiments on a number of high dimensional datasets reveal outperformance of the proposed algorithm over others.
Efficient Iris Recognition Based on Optimal Subfeature Selection and Weighted Subregion Fusion
Deng, Ning
2014-01-01
In this paper, we propose three discriminative feature selection strategies and weighted subregion matching method to improve the performance of iris recognition system. Firstly, we introduce the process of feature extraction and representation based on scale invariant feature transformation (SIFT) in detail. Secondly, three strategies are described, which are orientation probability distribution function (OPDF) based strategy to delete some redundant feature keypoints, magnitude probability distribution function (MPDF) based strategy to reduce dimensionality of feature element, and compounded strategy combined OPDF and MPDF to further select optimal subfeature. Thirdly, to make matching more effective, this paper proposes a novel matching method based on weighted sub-region matching fusion. Particle swarm optimization is utilized to accelerate achieve different sub-region's weights and then weighted different subregions' matching scores to generate the final decision. The experimental results, on three public and renowned iris databases (CASIA-V3 Interval, Lamp, andMMU-V1), demonstrate that our proposed methods outperform some of the existing methods in terms of correct recognition rate, equal error rate, and computation complexity. PMID:24683317
Efficient iris recognition based on optimal subfeature selection and weighted subregion fusion.
Chen, Ying; Liu, Yuanning; Zhu, Xiaodong; He, Fei; Wang, Hongye; Deng, Ning
2014-01-01
In this paper, we propose three discriminative feature selection strategies and weighted subregion matching method to improve the performance of iris recognition system. Firstly, we introduce the process of feature extraction and representation based on scale invariant feature transformation (SIFT) in detail. Secondly, three strategies are described, which are orientation probability distribution function (OPDF) based strategy to delete some redundant feature keypoints, magnitude probability distribution function (MPDF) based strategy to reduce dimensionality of feature element, and compounded strategy combined OPDF and MPDF to further select optimal subfeature. Thirdly, to make matching more effective, this paper proposes a novel matching method based on weighted sub-region matching fusion. Particle swarm optimization is utilized to accelerate achieve different sub-region's weights and then weighted different subregions' matching scores to generate the final decision. The experimental results, on three public and renowned iris databases (CASIA-V3 Interval, Lamp, and MMU-V1), demonstrate that our proposed methods outperform some of the existing methods in terms of correct recognition rate, equal error rate, and computation complexity.
Wavelength selection by dielectric-loaded plasmonic components
NASA Astrophysics Data System (ADS)
Holmgaard, Tobias; Chen, Zhuo; Bozhevolnyi, Sergey I.; Markey, Laurent; Dereux, Alain; Krasavin, Alexey V.; Zayats, Anatoly V.
2009-02-01
Fabrication, characterization, and modeling of waveguide-ring resonators and in-line Bragg gratings for wavelength selection in the telecommunication range are reported utilizing dielectric-loaded surface plasmon-polariton waveguides. The devices were fabricated by depositing subwavelength-sized polymer ridges on a smooth gold film using industrially compatible large-scale UV photolithography. We demonstrate efficient and compact wavelength-selective filters, including waveguide-ring resonators with an insertion loss of ˜2 dB and a footprint of only 150 μm2 featuring narrow bandwidth (˜20 nm) and high contrast (˜13 dB) features in the transmission spectrum. The performance of the components is found in good agreement with the results obtained by full vectorial three-dimensional finite element simulations.
Absolute cosine-based SVM-RFE feature selection method for prostate histopathological grading.
Sahran, Shahnorbanun; Albashish, Dheeb; Abdullah, Azizi; Shukor, Nordashima Abd; Hayati Md Pauzi, Suria
2018-04-18
Feature selection (FS) methods are widely used in grading and diagnosing prostate histopathological images. In this context, FS is based on the texture features obtained from the lumen, nuclei, cytoplasm and stroma, all of which are important tissue components. However, it is difficult to represent the high-dimensional textures of these tissue components. To solve this problem, we propose a new FS method that enables the selection of features with minimal redundancy in the tissue components. We categorise tissue images based on the texture of individual tissue components via the construction of a single classifier and also construct an ensemble learning model by merging the values obtained by each classifier. Another issue that arises is overfitting due to the high-dimensional texture of individual tissue components. We propose a new FS method, SVM-RFE(AC), that integrates a Support Vector Machine-Recursive Feature Elimination (SVM-RFE) embedded procedure with an absolute cosine (AC) filter method to prevent redundancy in the selected features of the SV-RFE and an unoptimised classifier in the AC. We conducted experiments on H&E histopathological prostate and colon cancer images with respect to three prostate classifications, namely benign vs. grade 3, benign vs. grade 4 and grade 3 vs. grade 4. The colon benchmark dataset requires a distinction between grades 1 and 2, which are the most difficult cases to distinguish in the colon domain. The results obtained by both the single and ensemble classification models (which uses the product rule as its merging method) confirm that the proposed SVM-RFE(AC) is superior to the other SVM and SVM-RFE-based methods. We developed an FS method based on SVM-RFE and AC and successfully showed that its use enabled the identification of the most crucial texture feature of each tissue component. Thus, it makes possible the distinction between multiple Gleason grades (e.g. grade 3 vs. grade 4) and its performance is far superior to other reported FS methods. Copyright © 2018 Elsevier B.V. All rights reserved.
2015-01-01
Background Investigations into novel biomarkers using omics techniques generate large amounts of data. Due to their size and numbers of attributes, these data are suitable for analysis with machine learning methods. A key component of typical machine learning pipelines for omics data is feature selection, which is used to reduce the raw high-dimensional data into a tractable number of features. Feature selection needs to balance the objective of using as few features as possible, while maintaining high predictive power. This balance is crucial when the goal of data analysis is the identification of highly accurate but small panels of biomarkers with potential clinical utility. In this paper we propose a heuristic for the selection of very small feature subsets, via an iterative feature elimination process that is guided by rule-based machine learning, called RGIFE (Rule-guided Iterative Feature Elimination). We use this heuristic to identify putative biomarkers of osteoarthritis (OA), articular cartilage degradation and synovial inflammation, using both proteomic and transcriptomic datasets. Results and discussion Our RGIFE heuristic increased the classification accuracies achieved for all datasets when no feature selection is used, and performed well in a comparison with other feature selection methods. Using this method the datasets were reduced to a smaller number of genes or proteins, including those known to be relevant to OA, cartilage degradation and joint inflammation. The results have shown the RGIFE feature reduction method to be suitable for analysing both proteomic and transcriptomics data. Methods that generate large ‘omics’ datasets are increasingly being used in the area of rheumatology. Conclusions Feature reduction methods are advantageous for the analysis of omics data in the field of rheumatology, as the applications of such techniques are likely to result in improvements in diagnosis, treatment and drug discovery. PMID:25923811
Drug-target interaction prediction using ensemble learning and dimensionality reduction.
Ezzat, Ali; Wu, Min; Li, Xiao-Li; Kwoh, Chee-Keong
2017-10-01
Experimental prediction of drug-target interactions is expensive, time-consuming and tedious. Fortunately, computational methods help narrow down the search space for interaction candidates to be further examined via wet-lab techniques. Nowadays, the number of attributes/features for drugs and targets, as well as the amount of their interactions, are increasing, making these computational methods inefficient or occasionally prohibitive. This motivates us to derive a reduced feature set for prediction. In addition, since ensemble learning techniques are widely used to improve the classification performance, it is also worthwhile to design an ensemble learning framework to enhance the performance for drug-target interaction prediction. In this paper, we propose a framework for drug-target interaction prediction leveraging both feature dimensionality reduction and ensemble learning. First, we conducted feature subspacing to inject diversity into the classifier ensemble. Second, we applied three different dimensionality reduction methods to the subspaced features. Third, we trained homogeneous base learners with the reduced features and then aggregated their scores to derive the final predictions. For base learners, we selected two classifiers, namely Decision Tree and Kernel Ridge Regression, resulting in two variants of ensemble models, EnsemDT and EnsemKRR, respectively. In our experiments, we utilized AUC (Area under ROC Curve) as an evaluation metric. We compared our proposed methods with various state-of-the-art methods under 5-fold cross validation. Experimental results showed EnsemKRR achieving the highest AUC (94.3%) for predicting drug-target interactions. In addition, dimensionality reduction helped improve the performance of EnsemDT. In conclusion, our proposed methods produced significant improvements for drug-target interaction prediction. Copyright © 2017 Elsevier Inc. All rights reserved.
Pattern selection in solidification
NASA Technical Reports Server (NTRS)
Langer, J. S.
1984-01-01
Directional solidification of alloys produces a wide variety of cellular or lamellar structures which, depending upon growth conditions, may be reproducibly regular or may behave chaotically. It is not well understood how these patterns are selected and controlled or even whether there ever exist sharp selection mechanisms. A related phenomenon is the spatial propagation of a pattern into a system which has been caused to become unstable against pattern-forming deformations. This phenomenon has some features in common with the propagation of sidebranching modes in dendritic solidification. In a class of one-dimensional models, the nonlinear system can be shown to select the propagating mode in which the leading edge of the pattern is just marginally stable. This stability principle, when applicable, predicts both the speed of propagation and the geometrical characteristics of the pattern which forms behind the moving front. A boundary-layer model for fully two or three dimensional solidification problems appears to exhibit similar mathematical behavior.
Selecting relevant 3D image features of margin sharpness and texture for lung nodule retrieval.
Ferreira, José Raniery; de Azevedo-Marques, Paulo Mazzoncini; Oliveira, Marcelo Costa
2017-03-01
Lung cancer is the leading cause of cancer-related deaths in the world. Its diagnosis is a challenge task to specialists due to several aspects on the classification of lung nodules. Therefore, it is important to integrate content-based image retrieval methods on the lung nodule classification process, since they are capable of retrieving similar cases from databases that were previously diagnosed. However, this mechanism depends on extracting relevant image features in order to obtain high efficiency. The goal of this paper is to perform the selection of 3D image features of margin sharpness and texture that can be relevant on the retrieval of similar cancerous and benign lung nodules. A total of 48 3D image attributes were extracted from the nodule volume. Border sharpness features were extracted from perpendicular lines drawn over the lesion boundary. Second-order texture features were extracted from a cooccurrence matrix. Relevant features were selected by a correlation-based method and a statistical significance analysis. Retrieval performance was assessed according to the nodule's potential malignancy on the 10 most similar cases and by the parameters of precision and recall. Statistical significant features reduced retrieval performance. Correlation-based method selected 2 margin sharpness attributes and 6 texture attributes and obtained higher precision compared to all 48 extracted features on similar nodule retrieval. Feature space dimensionality reduction of 83 % obtained higher retrieval performance and presented to be a computationaly low cost method of retrieving similar nodules for the diagnosis of lung cancer.
Building Facade Reconstruction by Fusing Terrestrial Laser Points and Images
Pu, Shi; Vosselman, George
2009-01-01
Laser data and optical data have a complementary nature for three dimensional feature extraction. Efficient integration of the two data sources will lead to a more reliable and automated extraction of three dimensional features. This paper presents a semiautomatic building facade reconstruction approach, which efficiently combines information from terrestrial laser point clouds and close range images. A building facade's general structure is discovered and established using the planar features from laser data. Then strong lines in images are extracted using Canny extractor and Hough transformation, and compared with current model edges for necessary improvement. Finally, textures with optimal visibility are selected and applied according to accurate image orientations. Solutions to several challenge problems throughout the collaborated reconstruction, such as referencing between laser points and multiple images and automated texturing, are described. The limitations and remaining works of this approach are also discussed. PMID:22408539
High-dimensional cluster analysis with the Masked EM Algorithm
Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.
2014-01-01
Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster member-ship of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694
NASA Astrophysics Data System (ADS)
Carlone, Pierpaolo; Astarita, Antonello; Rubino, Felice; Pasquino, Nicola; Aprea, Paolo
2016-12-01
In this paper, a selective laser post-deposition on pure grade II titanium coatings, cold-sprayed on AA2024-T3 sheets, was experimentally and numerically investigated. Morphological features, microstructure, and chemical composition of the treated zone were assessed by means of optical microscopy, scanning electron microscopy, and energy dispersive X-ray spectrometry. Microhardness measurements were also carried out to evaluate the mechanical properties of the coating. A numerical model of the laser treatment was implemented and solved to simulate the process and discuss the experimental outcomes. Obtained results highlighted the key role played by heat input and dimensional features on the effectiveness of the treatment.
Structure Size Enhanced Histogram
NASA Astrophysics Data System (ADS)
Wesarg, Stefan; Kirschner, Matthias
Direct volume visualization requires the definition of transfer functions (TFs) for the assignment of opacity and color. Multi-dimensional TFs are based on at least two image properties, and are specified by means of 2D histograms. In this work we propose a new type of a 2D histogram which combines gray value with information about the size of the structures. This structure size enhanced (SSE) histogram is an intuitive approach for representing anatomical features. Clinicians — the users we are focusing on — are much more familiar with selecting features by their size than by their gradient magnitude value. As a proof of concept, we employ the SSE histogram for the definition of two-dimensional TFs for the visualization of 3D MRI and CT image data.
Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji
2017-01-01
We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data. PMID:29049392
NASA Astrophysics Data System (ADS)
Ashbach, Jason A.
Periodic metallodielectric frequency selective surface (FSS) designs have historically seen widespread use in the microwave and radio frequency spectra. By scaling the dimensions of an FSS unit cell for use in a nano-fabrication process, these concepts have recently been adapted for use in optical applications as well. While early optical designs have been limited to wellunderstood geometries or optimized pixelated screens, nano-fabrication, lithographic and interconnect technology has progressed to a point where it is possible to fabricate metallic screens of arbitrary geometries featuring curvilinear or even three-dimensional characteristics that are only tens of nanometers wide. In order to design an FSS featuring such characteristics, it is important to have a robust numerical solver that features triangular elements in purely two-dimensional geometries and prismatic or tetrahedral elements in three-dimensional geometries. In this dissertation, a periodic finite element method code has been developed which features prismatic elements whose top and bottom boundaries are truncated by numerical integration of the boundary integral as opposed to an approximate representation found in a perfectly matched layer. However, since no exact solution exists for the calculation of triangular elements in a boundary integral, this process can be time consuming. To address this, these calculations were optimized for parallelization such that they may be done on a graphics processor, which provides a large increase in computational speed. Additionally, a simple geometrical representation using a Bezier surface is presented which provides generality with few variables. With a fast numerical solver coupled with a lowvariable geometric representation, a heuristic optimization algorithm has been used to develop several optical designs such as an absorber, a circular polarization filter, a transparent conductive surface and an enhanced, optical modulator.
A novel feature extraction approach for microarray data based on multi-algorithm fusion
Jiang, Zhu; Xu, Rong
2015-01-01
Feature extraction is one of the most important and effective method to reduce dimension in data mining, with emerging of high dimensional data such as microarray gene expression data. Feature extraction for gene selection, mainly serves two purposes. One is to identify certain disease-related genes. The other is to find a compact set of discriminative genes to build a pattern classifier with reduced complexity and improved generalization capabilities. Depending on the purpose of gene selection, two types of feature extraction algorithms including ranking-based feature extraction and set-based feature extraction are employed in microarray gene expression data analysis. In ranking-based feature extraction, features are evaluated on an individual basis, without considering inter-relationship between features in general, while set-based feature extraction evaluates features based on their role in a feature set by taking into account dependency between features. Just as learning methods, feature extraction has a problem in its generalization ability, which is robustness. However, the issue of robustness is often overlooked in feature extraction. In order to improve the accuracy and robustness of feature extraction for microarray data, a novel approach based on multi-algorithm fusion is proposed. By fusing different types of feature extraction algorithms to select the feature from the samples set, the proposed approach is able to improve feature extraction performance. The new approach is tested against gene expression dataset including Colon cancer data, CNS data, DLBCL data, and Leukemia data. The testing results show that the performance of this algorithm is better than existing solutions. PMID:25780277
A novel feature extraction approach for microarray data based on multi-algorithm fusion.
Jiang, Zhu; Xu, Rong
2015-01-01
Feature extraction is one of the most important and effective method to reduce dimension in data mining, with emerging of high dimensional data such as microarray gene expression data. Feature extraction for gene selection, mainly serves two purposes. One is to identify certain disease-related genes. The other is to find a compact set of discriminative genes to build a pattern classifier with reduced complexity and improved generalization capabilities. Depending on the purpose of gene selection, two types of feature extraction algorithms including ranking-based feature extraction and set-based feature extraction are employed in microarray gene expression data analysis. In ranking-based feature extraction, features are evaluated on an individual basis, without considering inter-relationship between features in general, while set-based feature extraction evaluates features based on their role in a feature set by taking into account dependency between features. Just as learning methods, feature extraction has a problem in its generalization ability, which is robustness. However, the issue of robustness is often overlooked in feature extraction. In order to improve the accuracy and robustness of feature extraction for microarray data, a novel approach based on multi-algorithm fusion is proposed. By fusing different types of feature extraction algorithms to select the feature from the samples set, the proposed approach is able to improve feature extraction performance. The new approach is tested against gene expression dataset including Colon cancer data, CNS data, DLBCL data, and Leukemia data. The testing results show that the performance of this algorithm is better than existing solutions.
Three-dimensional biofilm structure quantification.
Beyenal, Haluk; Donovan, Conrad; Lewandowski, Zbigniew; Harkin, Gary
2004-12-01
Quantitative parameters describing biofilm physical structure have been extracted from three-dimensional confocal laser scanning microscopy images and used to compare biofilm structures, monitor biofilm development, and quantify environmental factors affecting biofilm structure. Researchers have previously used biovolume, volume to surface ratio, roughness coefficient, and mean and maximum thicknesses to compare biofilm structures. The selection of these parameters is dependent on the availability of software to perform calculations. We believe it is necessary to develop more comprehensive parameters to describe heterogeneous biofilm morphology in three dimensions. This research presents parameters describing three-dimensional biofilm heterogeneity, size, and morphology of biomass calculated from confocal laser scanning microscopy images. This study extends previous work which extracted quantitative parameters regarding morphological features from two-dimensional biofilm images to three-dimensional biofilm images. We describe two types of parameters: (1) textural parameters showing microscale heterogeneity of biofilms and (2) volumetric parameters describing size and morphology of biomass. The three-dimensional features presented are average (ADD) and maximum diffusion distances (MDD), fractal dimension, average run lengths (in X, Y and Z directions), aspect ratio, textural entropy, energy and homogeneity. We discuss the meaning of each parameter and present the calculations in detail. The developed algorithms, including automatic thresholding, are implemented in software as MATLAB programs which will be available at site prior to publication of the paper.
Stoffer, Philip W.
2008-01-01
This is a set of two sheets of 3D images showing geologic features of many National Parks. Red-and-cyan viewing glasses are need to see the three-dimensional effect. A search on the World Wide Web will yield many sites about anaglyphs and where to get 3D glasses. Red-blue glasses will do but red-cyan glasses are a little better. This publication features a photo quiz game: Name that park! where you can explore, interpret, and identify selected park landscapes. Can you identify landscape features in the images? Can you explain processes that may have helped form the landscape features? You can get the answers online.
An audiovisual emotion recognition system
NASA Astrophysics Data System (ADS)
Han, Yi; Wang, Guoyin; Yang, Yong; He, Kun
2007-12-01
Human emotions could be expressed by many bio-symbols. Speech and facial expression are two of them. They are both regarded as emotional information which is playing an important role in human-computer interaction. Based on our previous studies on emotion recognition, an audiovisual emotion recognition system is developed and represented in this paper. The system is designed for real-time practice, and is guaranteed by some integrated modules. These modules include speech enhancement for eliminating noises, rapid face detection for locating face from background image, example based shape learning for facial feature alignment, and optical flow based tracking algorithm for facial feature tracking. It is known that irrelevant features and high dimensionality of the data can hurt the performance of classifier. Rough set-based feature selection is a good method for dimension reduction. So 13 speech features out of 37 ones and 10 facial features out of 33 ones are selected to represent emotional information, and 52 audiovisual features are selected due to the synchronization when speech and video fused together. The experiment results have demonstrated that this system performs well in real-time practice and has high recognition rate. Our results also show that the work in multimodules fused recognition will become the trend of emotion recognition in the future.
Soguero-Ruiz, Cristina; Hindberg, Kristian; Rojo-Alvarez, Jose Luis; Skrovseth, Stein Olav; Godtliebsen, Fred; Mortensen, Kim; Revhaug, Arthur; Lindsetmo, Rolv-Ole; Augestad, Knut Magne; Jenssen, Robert
2016-09-01
The free text in electronic health records (EHRs) conveys a huge amount of clinical information about health state and patient history. Despite a rapidly growing literature on the use of machine learning techniques for extracting this information, little effort has been invested toward feature selection and the features' corresponding medical interpretation. In this study, we focus on the task of early detection of anastomosis leakage (AL), a severe complication after elective surgery for colorectal cancer (CRC) surgery, using free text extracted from EHRs. We use a bag-of-words model to investigate the potential for feature selection strategies. The purpose is earlier detection of AL and prediction of AL with data generated in the EHR before the actual complication occur. Due to the high dimensionality of the data, we derive feature selection strategies using the robust support vector machine linear maximum margin classifier, by investigating: 1) a simple statistical criterion (leave-one-out-based test); 2) an intensive-computation statistical criterion (Bootstrap resampling); and 3) an advanced statistical criterion (kernel entropy). Results reveal a discriminatory power for early detection of complications after CRC (sensitivity 100%; specificity 72%). These results can be used to develop prediction models, based on EHR data, that can support surgeons and patients in the preoperative decision making phase.
Liu, Zhenqiu; Sun, Fengzhu; McGovern, Dermot P
2017-01-01
Feature selection and prediction are the most important tasks for big data mining. The common strategies for feature selection in big data mining are L 1 , SCAD and MC+. However, none of the existing algorithms optimizes L 0 , which penalizes the number of nonzero features directly. In this paper, we develop a novel sparse generalized linear model (GLM) with L 0 approximation for feature selection and prediction with big omics data. The proposed approach approximate the L 0 optimization directly. Even though the original L 0 problem is non-convex, the problem is approximated by sequential convex optimizations with the proposed algorithm. The proposed method is easy to implement with only several lines of code. Novel adaptive ridge algorithms ( L 0 ADRIDGE) for L 0 penalized GLM with ultra high dimensional big data are developed. The proposed approach outperforms the other cutting edge regularization methods including SCAD and MC+ in simulations. When it is applied to integrated analysis of mRNA, microRNA, and methylation data from TCGA ovarian cancer, multilevel gene signatures associated with suboptimal debulking are identified simultaneously. The biological significance and potential clinical importance of those genes are further explored. The developed Software L 0 ADRIDGE in MATLAB is available at https://github.com/liuzqx/L0adridge.
Song, Sutao; Zhan, Zhichao; Long, Zhiying; Zhang, Jiacai; Yao, Li
2011-01-01
Background Support vector machine (SVM) has been widely used as accurate and reliable method to decipher brain patterns from functional MRI (fMRI) data. Previous studies have not found a clear benefit for non-linear (polynomial kernel) SVM versus linear one. Here, a more effective non-linear SVM using radial basis function (RBF) kernel is compared with linear SVM. Different from traditional studies which focused either merely on the evaluation of different types of SVM or the voxel selection methods, we aimed to investigate the overall performance of linear and RBF SVM for fMRI classification together with voxel selection schemes on classification accuracy and time-consuming. Methodology/Principal Findings Six different voxel selection methods were employed to decide which voxels of fMRI data would be included in SVM classifiers with linear and RBF kernels in classifying 4-category objects. Then the overall performances of voxel selection and classification methods were compared. Results showed that: (1) Voxel selection had an important impact on the classification accuracy of the classifiers: in a relative low dimensional feature space, RBF SVM outperformed linear SVM significantly; in a relative high dimensional space, linear SVM performed better than its counterpart; (2) Considering the classification accuracy and time-consuming holistically, linear SVM with relative more voxels as features and RBF SVM with small set of voxels (after PCA) could achieve the better accuracy and cost shorter time. Conclusions/Significance The present work provides the first empirical result of linear and RBF SVM in classification of fMRI data, combined with voxel selection methods. Based on the findings, if only classification accuracy was concerned, RBF SVM with appropriate small voxels and linear SVM with relative more voxels were two suggested solutions; if users concerned more about the computational time, RBF SVM with relative small set of voxels when part of the principal components were kept as features was a better choice. PMID:21359184
Song, Sutao; Zhan, Zhichao; Long, Zhiying; Zhang, Jiacai; Yao, Li
2011-02-16
Support vector machine (SVM) has been widely used as accurate and reliable method to decipher brain patterns from functional MRI (fMRI) data. Previous studies have not found a clear benefit for non-linear (polynomial kernel) SVM versus linear one. Here, a more effective non-linear SVM using radial basis function (RBF) kernel is compared with linear SVM. Different from traditional studies which focused either merely on the evaluation of different types of SVM or the voxel selection methods, we aimed to investigate the overall performance of linear and RBF SVM for fMRI classification together with voxel selection schemes on classification accuracy and time-consuming. Six different voxel selection methods were employed to decide which voxels of fMRI data would be included in SVM classifiers with linear and RBF kernels in classifying 4-category objects. Then the overall performances of voxel selection and classification methods were compared. Results showed that: (1) Voxel selection had an important impact on the classification accuracy of the classifiers: in a relative low dimensional feature space, RBF SVM outperformed linear SVM significantly; in a relative high dimensional space, linear SVM performed better than its counterpart; (2) Considering the classification accuracy and time-consuming holistically, linear SVM with relative more voxels as features and RBF SVM with small set of voxels (after PCA) could achieve the better accuracy and cost shorter time. The present work provides the first empirical result of linear and RBF SVM in classification of fMRI data, combined with voxel selection methods. Based on the findings, if only classification accuracy was concerned, RBF SVM with appropriate small voxels and linear SVM with relative more voxels were two suggested solutions; if users concerned more about the computational time, RBF SVM with relative small set of voxels when part of the principal components were kept as features was a better choice.
Geometric mean for subspace selection.
Tao, Dacheng; Li, Xuelong; Wu, Xindong; Maybank, Stephen J
2009-02-01
Subspace selection approaches are powerful tools in pattern classification and data visualization. One of the most important subspace approaches is the linear dimensionality reduction step in the Fisher's linear discriminant analysis (FLDA), which has been successfully employed in many fields such as biometrics, bioinformatics, and multimedia information management. However, the linear dimensionality reduction step in FLDA has a critical drawback: for a classification task with c classes, if the dimension of the projected subspace is strictly lower than c - 1, the projection to a subspace tends to merge those classes, which are close together in the original feature space. If separate classes are sampled from Gaussian distributions, all with identical covariance matrices, then the linear dimensionality reduction step in FLDA maximizes the mean value of the Kullback-Leibler (KL) divergences between different classes. Based on this viewpoint, the geometric mean for subspace selection is studied in this paper. Three criteria are analyzed: 1) maximization of the geometric mean of the KL divergences, 2) maximization of the geometric mean of the normalized KL divergences, and 3) the combination of 1 and 2. Preliminary experimental results based on synthetic data, UCI Machine Learning Repository, and handwriting digits show that the third criterion is a potential discriminative subspace selection method, which significantly reduces the class separation problem in comparing with the linear dimensionality reduction step in FLDA and its several representative extensions.
Amis, Gregory P; Carpenter, Gail A
2010-03-01
Computational models of learning typically train on labeled input patterns (supervised learning), unlabeled input patterns (unsupervised learning), or a combination of the two (semi-supervised learning). In each case input patterns have a fixed number of features throughout training and testing. Human and machine learning contexts present additional opportunities for expanding incomplete knowledge from formal training, via self-directed learning that incorporates features not previously experienced. This article defines a new self-supervised learning paradigm to address these richer learning contexts, introducing a neural network called self-supervised ARTMAP. Self-supervised learning integrates knowledge from a teacher (labeled patterns with some features), knowledge from the environment (unlabeled patterns with more features), and knowledge from internal model activation (self-labeled patterns). Self-supervised ARTMAP learns about novel features from unlabeled patterns without destroying partial knowledge previously acquired from labeled patterns. A category selection function bases system predictions on known features, and distributed network activation scales unlabeled learning to prediction confidence. Slow distributed learning on unlabeled patterns focuses on novel features and confident predictions, defining classification boundaries that were ambiguous in the labeled patterns. Self-supervised ARTMAP improves test accuracy on illustrative low-dimensional problems and on high-dimensional benchmarks. Model code and benchmark data are available from: http://techlab.eu.edu/SSART/. Copyright 2009 Elsevier Ltd. All rights reserved.
Dimensionality reduction for the quantitative evaluation of a smartphone-based Timed Up and Go test.
Palmerini, Luca; Mellone, Sabato; Rocchi, Laura; Chiari, Lorenzo
2011-01-01
The Timed Up and Go is a clinical test to assess mobility in the elderly and in Parkinson's disease. Lately instrumented versions of the test are being considered, where inertial sensors assess motion. To improve the pervasiveness, ease of use, and cost, we consider a smartphone's accelerometer as the measurement system. Several parameters (usually highly correlated) can be computed from the signals recorded during the test. To avoid redundancy and obtain the features that are most sensitive to the locomotor performance, a dimensionality reduction was performed through principal component analysis (PCA). Forty-nine healthy subjects of different ages were tested. PCA was performed to extract new features (principal components) which are not redundant combinations of the original parameters and account for most of the data variability. They can be useful for exploratory analysis and outlier detection. Then, a reduced set of the original parameters was selected through correlation analysis with the principal components. This set could be recommended for studies based on healthy adults. The proposed procedure could be used as a first-level feature selection in classification studies (i.e. healthy-Parkinson's disease, fallers-non fallers) and could allow, in the future, a complete system for movement analysis to be incorporated in a smartphone.
An Optimization-Based Method for Feature Ranking in Nonlinear Regression Problems.
Bravi, Luca; Piccialli, Veronica; Sciandrone, Marco
2017-04-01
In this paper, we consider the feature ranking problem, where, given a set of training instances, the task is to associate a score with the features in order to assess their relevance. Feature ranking is a very important tool for decision support systems, and may be used as an auxiliary step of feature selection to reduce the high dimensionality of real-world data. We focus on regression problems by assuming that the process underlying the generated data can be approximated by a continuous function (for instance, a feedforward neural network). We formally state the notion of relevance of a feature by introducing a minimum zero-norm inversion problem of a neural network, which is a nonsmooth, constrained optimization problem. We employ a concave approximation of the zero-norm function, and we define a smooth, global optimization problem to be solved in order to assess the relevance of the features. We present the new feature ranking method based on the solution of instances of the global optimization problem depending on the available training data. Computational experiments on both artificial and real data sets are performed, and point out that the proposed feature ranking method is a valid alternative to existing methods in terms of effectiveness. The obtained results also show that the method is costly in terms of CPU time, and this may be a limitation in the solution of large-dimensional problems.
NASA Astrophysics Data System (ADS)
Georganos, Stefanos; Grippa, Tais; Vanhuysse, Sabine; Lennert, Moritz; Shimoni, Michal; Wolff, Eléonore
2017-10-01
This study evaluates the impact of three Feature Selection (FS) algorithms in an Object Based Image Analysis (OBIA) framework for Very-High-Resolution (VHR) Land Use-Land Cover (LULC) classification. The three selected FS algorithms, Correlation Based Selection (CFS), Mean Decrease in Accuracy (MDA) and Random Forest (RF) based Recursive Feature Elimination (RFE), were tested on Support Vector Machine (SVM), K-Nearest Neighbor, and Random Forest (RF) classifiers. The results demonstrate that the accuracy of SVM and KNN classifiers are the most sensitive to FS. The RF appeared to be more robust to high dimensionality, although a significant increase in accuracy was found by using the RFE method. In terms of classification accuracy, SVM performed the best using FS, followed by RF and KNN. Finally, only a small number of features is needed to achieve the highest performance using each classifier. This study emphasizes the benefits of rigorous FS for maximizing performance, as well as for minimizing model complexity and interpretation.
2-DE combined with two-layer feature selection accurately establishes the origin of oolong tea.
Chien, Han-Ju; Chu, Yen-Wei; Chen, Chi-Wei; Juang, Yu-Min; Chien, Min-Wei; Liu, Chih-Wei; Wu, Chia-Chang; Tzen, Jason T C; Lai, Chien-Chen
2016-11-15
Taiwan is known for its high quality oolong tea. Because of high consumer demand, some tea manufactures mix lower quality leaves with genuine Taiwan oolong tea in order to increase profits. Robust scientific methods are, therefore, needed to verify the origin and quality of tea leaves. In this study, we investigated whether two-dimensional gel electrophoresis (2-DE) and nanoscale liquid chromatography/tandem mass spectroscopy (nano-LC/MS/MS) coupled with a two-layer feature selection mechanism comprising information gain attribute evaluation (IGAE) and support vector machine feature selection (SVM-FS) are useful in identifying characteristic proteins that can be used as markers of the original source of oolong tea. Samples in this study included oolong tea leaves from 23 different sources. We found that our method had an accuracy of 95.5% in correctly identifying the origin of the leaves. Overall, our method is a novel approach for determining the origin of oolong tea leaves. Copyright © 2016 Elsevier Ltd. All rights reserved.
Khotanlou, Hassan; Afrasiabi, Mahlagha
2012-10-01
This paper presents a new feature selection approach for automatically extracting multiple sclerosis (MS) lesions in three-dimensional (3D) magnetic resonance (MR) images. Presented method is applicable to different types of MS lesions. In this method, T1, T2, and fluid attenuated inversion recovery (FLAIR) images are firstly preprocessed. In the next phase, effective features to extract MS lesions are selected by using a genetic algorithm (GA). The fitness function of the GA is the Similarity Index (SI) of a support vector machine (SVM) classifier. The results obtained on different types of lesions have been evaluated by comparison with manual segmentations. This algorithm is evaluated on 15 real 3D MR images using several measures. As a result, the SI between MS regions determined by the proposed method and radiologists was 87% on average. Experiments and comparisons with other methods show the effectiveness and the efficiency of the proposed approach.
Derivative component analysis for mass spectral serum proteomic profiles.
Han, Henry
2014-01-01
As a promising way to transform medicine, mass spectrometry based proteomics technologies have seen a great progress in identifying disease biomarkers for clinical diagnosis and prognosis. However, there is a lack of effective feature selection methods that are able to capture essential data behaviors to achieve clinical level disease diagnosis. Moreover, it faces a challenge from data reproducibility, which means that no two independent studies have been found to produce same proteomic patterns. Such reproducibility issue causes the identified biomarker patterns to lose repeatability and prevents it from real clinical usage. In this work, we propose a novel machine-learning algorithm: derivative component analysis (DCA) for high-dimensional mass spectral proteomic profiles. As an implicit feature selection algorithm, derivative component analysis examines input proteomics data in a multi-resolution approach by seeking its derivatives to capture latent data characteristics and conduct de-noising. We further demonstrate DCA's advantages in disease diagnosis by viewing input proteomics data as a profile biomarker via integrating it with support vector machines to tackle the reproducibility issue, besides comparing it with state-of-the-art peers. Our results show that high-dimensional proteomics data are actually linearly separable under proposed derivative component analysis (DCA). As a novel multi-resolution feature selection algorithm, DCA not only overcomes the weakness of the traditional methods in subtle data behavior discovery, but also suggests an effective resolution to overcoming proteomics data's reproducibility problem and provides new techniques and insights in translational bioinformatics and machine learning. The DCA-based profile biomarker diagnosis makes clinical level diagnostic performances reproducible across different proteomic data, which is more robust and systematic than the existing biomarker discovery based diagnosis. Our findings demonstrate the feasibility and power of the proposed DCA-based profile biomarker diagnosis in achieving high sensitivity and conquering the data reproducibility issue in serum proteomics. Furthermore, our proposed derivative component analysis suggests the subtle data characteristics gleaning and de-noising are essential in separating true signals from red herrings for high-dimensional proteomic profiles, which can be more important than the conventional feature selection or dimension reduction. In particular, our profile biomarker diagnosis can be generalized to other omics data for derivative component analysis (DCA)'s nature of generic data analysis.
The life-cycle of upper-tropospheric jet streams identified with a novel data segmentation algorithm
NASA Astrophysics Data System (ADS)
Limbach, S.; Schömer, E.; Wernli, H.
2010-09-01
Jet streams are prominent features of the upper-tropospheric atmospheric flow. Through the thermal wind relationship these regions with intense horizontal wind speed (typically larger than 30 m/s) are associated with pronounced baroclinicity, i.e., with regions where extratropical cyclones develop due to baroclinic instability processes. Individual jet streams are non-stationary elongated features that can extend over more than 2000 km in the along-flow and 200-500 km in the across-flow direction, respectively. Their lifetime can vary between a few days and several weeks. In recent years, feature-based algorithms have been developed that allow compiling synoptic climatologies and typologies of upper-tropospheric jet streams based upon objective selection criteria and climatological reanalysis datasets. In this study a novel algorithm to efficiently identify jet streams using an extended region-growing segmentation approach is introduced. This algorithm iterates over a 4-dimensional field of horizontal wind speed from ECMWF analyses and decides at each grid point whether all prerequisites for a jet stream are met. In a single pass the algorithm keeps track of all adjacencies of these grid points and creates the 4-dimensional connected segments associated with each jet stream. In addition to the detection of these sets of connected grid points, the algorithm analyzes the development over time of the distinct 3-dimensional features each segment consists of. Important events in the development of these features, for example mergings and splittings, are detected and analyzed on a per-grid-point and per-feature basis. The output of the algorithm consists of the actual sets of grid-points augmented with information about the particular events, and of the so-called event graphs, which are an abstract representation of the distinct 3-dimensional features and events of each segment. This technique provides comprehensive information about the frequency of upper-tropospheric jet streams, their preferred regions of genesis, merging, splitting, and lysis, and statistical information about their size, amplitude and lifetime. The presentation will introduce the technique, provide example visualizations of the time evolution of the identified 3-dimensional jet stream features, and present results from a first multi-month "climatology" of upper-tropospheric jets. In the future, the technique can be applied to longer datasets, for instance reanalyses and output from global climate model simulations - and provide detailed information about key characteristics of jet stream life cycles.
Hyperspectral image classification based on local binary patterns and PCANet
NASA Astrophysics Data System (ADS)
Yang, Huizhen; Gao, Feng; Dong, Junyu; Yang, Yang
2018-04-01
Hyperspectral image classification has been well acknowledged as one of the challenging tasks of hyperspectral data processing. In this paper, we propose a novel hyperspectral image classification framework based on local binary pattern (LBP) features and PCANet. In the proposed method, linear prediction error (LPE) is first employed to select a subset of informative bands, and LBP is utilized to extract texture features. Then, spectral and texture features are stacked into a high dimensional vectors. Next, the extracted features of a specified position are transformed to a 2-D image. The obtained images of all pixels are fed into PCANet for classification. Experimental results on real hyperspectral dataset demonstrate the effectiveness of the proposed method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nadeau, Jeremy S.; Wright, Bob W.; Synovec, Robert E.
2010-04-15
A critical comparison of methods for correcting severely retention time shifted gas chromatography-mass spectrometry (GC-MS) data is presented. The method reported herein is an adaptation to the Piecewise Alignment Algorithm to quickly align severely shifted one-dimensional (1D) total ion current (TIC) data, then applying these shifts to broadly align all mass channels throughout the separation, referred to as a TIC shift function (SF). The maximum shift varied from (-) 5 s in the beginning of the chromatographic separation to (+) 20 s toward the end of the separation, equivalent to a maximum shift of over 5 peak widths. Implementing themore » TIC shift function (TIC SF) prior to Fisher Ratio (F-Ratio) feature selection and then principal component analysis (PCA) was found to be a viable approach to classify complex chromatograms, that in this study were obtained from GC-MS separations of three gasoline samples serving as complex test mixtures, referred to as types C, M and S. The reported alignment algorithm via the TIC SF approach corrects for large dynamic shifting in the data as well as subtle peak-to-peak shifts. The benefits of the overall TIC SF alignment and feature selection approach were quantified using the degree-of-class separation (DCS) metric of the PCA scores plots using the type C and M samples, since they were the most similar, and thus the most challenging samples to properly classify. The DCS values showed an increase from an initial value of essentially zero for the unaligned GC-TIC data to a value of 7.9 following alignment; however, the DCS was unchanged by feature selection using F-Ratios for the GC-TIC data. The full mass spectral data provided an increase to a final DCS of 13.7 after alignment and two-dimensional (2D) F-Ratio feature selection.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, H; Liu, T; Xu, X
Purpose: There are clinical decision challenges to select optimal treatment positions for left-sided breast cancer patients—supine free breathing (FB), supine Deep Inspiration Breath Hold (DIBH) and prone free breathing (prone). Physicians often make the decision based on experiences and trials, which might not always result optimal OAR doses. We herein propose a mathematical model to predict the lowest OAR doses among these three positions, providing a quantitative tool for corresponding clinical decision. Methods: Patients were scanned in FB, DIBH, and prone positions under an IRB approved protocol. Tangential beam plans were generated for each position, and OAR doses were calculated.more » The position with least OAR doses is defined as the optimal position. The following features were extracted from each scan to build the model: heart, ipsilateral lung, breast volume, in-field heart, ipsilateral lung volume, distance between heart and target, laterality of heart, and dose to heart and ipsilateral lung. Principal Components Analysis (PCA) was applied to remove the co-linearity of the input data and also to lower the data dimensionality. Feature selection, another method to reduce dimensionality, was applied as a comparison. Support Vector Machine (SVM) was then used for classification. Thirtyseven patient data were acquired; up to now, five patient plans were available. K-fold cross validation was used to validate the accuracy of the classifier model with small training size. Results: The classification results and K-fold cross validation demonstrated the model is capable of predicting the optimal position for patients. The accuracy of K-fold cross validations has reached 80%. Compared to PCA, feature selection allows causal features of dose to be determined. This provides more clinical insights. Conclusion: The proposed classification system appeared to be feasible. We are generating plans for the rest of the 37 patient images, and more statistically significant results are to be presented.« less
Learning what matters: A neural explanation for the sparsity bias.
Hassall, Cameron D; Connor, Patrick C; Trappenberg, Thomas P; McDonald, John J; Krigolson, Olave E
2018-05-01
The visual environment is filled with complex, multi-dimensional objects that vary in their value to an observer's current goals. When faced with multi-dimensional stimuli, humans may rely on biases to learn to select those objects that are most valuable to the task at hand. Here, we show that decision making in a complex task is guided by the sparsity bias: the focusing of attention on a subset of available features. Participants completed a gambling task in which they selected complex stimuli that varied randomly along three dimensions: shape, color, and texture. Each dimension comprised three features (e.g., color: red, green, yellow). Only one dimension was relevant in each block (e.g., color), and a randomly-chosen value ranking determined outcome probabilities (e.g., green > yellow > red). Participants were faster to respond to infrequent probe stimuli that appeared unexpectedly within stimuli that possessed a more valuable feature than to probes appearing within stimuli possessing a less valuable feature. Event-related brain potentials recorded during the task provided a neurophysiological explanation for sparsity as a learning-dependent increase in optimal attentional performance (as measured by the N2pc component of the human event-related potential) and a concomitant learning-dependent decrease in prediction errors (as measured by the feedback-elicited reward positivity). Together, our results suggest that the sparsity bias guides human reinforcement learning in complex environments. Copyright © 2018 Elsevier B.V. All rights reserved.
Recognizing Banknote Fitness with a Visible Light One Dimensional Line Image Sensor
Pham, Tuyen Danh; Park, Young Ho; Kwon, Seung Yong; Nguyen, Dat Tien; Vokhidov, Husan; Park, Kang Ryoung; Jeong, Dae Sik; Yoon, Sungsoo
2015-01-01
In general, dirty banknotes that have creases or soiled surfaces should be replaced by new banknotes, whereas clean banknotes should be recirculated. Therefore, the accurate classification of banknote fitness when sorting paper currency is an important and challenging task. Most previous research has focused on sensors that used visible, infrared, and ultraviolet light. Furthermore, there was little previous research on the fitness classification for Indian paper currency. Therefore, we propose a new method for classifying the fitness of Indian banknotes, with a one-dimensional line image sensor that uses only visible light. The fitness of banknotes is usually determined by various factors such as soiling, creases, and tears, etc. although we just consider banknote soiling in our research. This research is novel in the following four ways: first, there has been little research conducted on fitness classification for the Indian Rupee using visible-light images. Second, the classification is conducted based on the features extracted from the regions of interest (ROIs), which contain little texture. Third, 1-level discrete wavelet transformation (DWT) is used to extract the features for discriminating between fit and unfit banknotes. Fourth, the optimal DWT features that represent the fitness and unfitness of banknotes are selected based on linear regression analysis with ground-truth data measured by densitometer. In addition, the selected features are used as the inputs to a support vector machine (SVM) for the final classification of banknote fitness. Experimental results showed that our method outperforms other methods. PMID:26343654
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marney, Luke C.; Siegler, William C.; Parsons, Brendon A.
Two-dimensional (2D) gas chromatography coupled with time-of-flight mass spectrometry (GC × GC – TOFMS) is a highly capable instrumental platform that produces complex and information-rich multi-dimensional chemical data. The complex data can be overwhelming, especially when many samples (of various sample classes) are analyzed with multiple injections for each sample. Thus, the data must be analyzed in such a way to extract the most meaningful information. The pixel-based and peak table-based algorithmic use of Fisher ratios has been used successfully in the past to reduce the multi-dimensional data down to those chemical compounds that are changing between classes relative tomore » those that are not (i.e., chemical feature selection). We report on the initial development of a computationally fast novel tile-based Fisher-ratio software that addresses challenges due to 2D retention time misalignment without explicitly aligning the data, which is a problem for both pixel-based and peak table- based methods. Concurrently, the tile-based Fisher-ratio software maximizes the sensitivity contrast of true positives against a background of potential false positives and noise. To study this software, eight compounds, plus one internal standard, were spiked into diesel at various concentrations. The tile-based F-ratio software was able to discover all spiked analytes, within the complex diesel sample matrix with thousands of potential false positives, in each possible concentration comparison, even at the lowest absolute spiked analyte concentration ratio of 1.06.« less
A malware detection scheme based on mining format information.
Bai, Jinrong; Wang, Junfeng; Zou, Guozhong
2014-01-01
Malware has become one of the most serious threats to computer information system and the current malware detection technology still has very significant limitations. In this paper, we proposed a malware detection approach by mining format information of PE (portable executable) files. Based on in-depth analysis of the static format information of the PE files, we extracted 197 features from format information of PE files and applied feature selection methods to reduce the dimensionality of the features and achieve acceptable high performance. When the selected features were trained using classification algorithms, the results of our experiments indicate that the accuracy of the top classification algorithm is 99.1% and the value of the AUC is 0.998. We designed three experiments to evaluate the performance of our detection scheme and the ability of detecting unknown and new malware. Although the experimental results of identifying new malware are not perfect, our method is still able to identify 97.6% of new malware with 1.3% false positive rates.
A Malware Detection Scheme Based on Mining Format Information
Bai, Jinrong; Wang, Junfeng; Zou, Guozhong
2014-01-01
Malware has become one of the most serious threats to computer information system and the current malware detection technology still has very significant limitations. In this paper, we proposed a malware detection approach by mining format information of PE (portable executable) files. Based on in-depth analysis of the static format information of the PE files, we extracted 197 features from format information of PE files and applied feature selection methods to reduce the dimensionality of the features and achieve acceptable high performance. When the selected features were trained using classification algorithms, the results of our experiments indicate that the accuracy of the top classification algorithm is 99.1% and the value of the AUC is 0.998. We designed three experiments to evaluate the performance of our detection scheme and the ability of detecting unknown and new malware. Although the experimental results of identifying new malware are not perfect, our method is still able to identify 97.6% of new malware with 1.3% false positive rates. PMID:24991639
3D reconstruction based on light field images
NASA Astrophysics Data System (ADS)
Zhu, Dong; Wu, Chunhong; Liu, Yunluo; Fu, Dongmei
2018-04-01
This paper proposed a method of reconstructing three-dimensional (3D) scene from two light field images capture by Lytro illium. The work was carried out by first extracting the sub-aperture images from light field images and using the scale-invariant feature transform (SIFT) for feature registration on the selected sub-aperture images. Structure from motion (SFM) algorithm is further used on the registration completed sub-aperture images to reconstruct the three-dimensional scene. 3D sparse point cloud was obtained in the end. The method shows that the 3D reconstruction can be implemented by only two light field camera captures, rather than at least a dozen times captures by traditional cameras. This can effectively solve the time-consuming, laborious issues for 3D reconstruction based on traditional digital cameras, to achieve a more rapid, convenient and accurate reconstruction.
Enhancement of the CAVE computer code
NASA Astrophysics Data System (ADS)
Rathjen, K. A.; Burk, H. O.
1983-12-01
The computer code CAVE (Conduction Analysis via Eigenvalues) is a convenient and efficient computer code for predicting two dimensional temperature histories within thermal protection systems for hypersonic vehicles. The capabilities of CAVE were enhanced by incorporation of the following features into the code: real gas effects in the aerodynamic heating predictions, geometry and aerodynamic heating package for analyses of cone shaped bodies, input option to change from laminar to turbulent heating predictions on leading edges, modification to account for reduction in adiabatic wall temperature with increase in leading sweep, geometry package for two dimensional scramjet engine sidewall, with an option for heat transfer to external and internal surfaces, print out modification to provide tables of select temperatures for plotting and storage, and modifications to the radiation calculation procedure to eliminate temperature oscillations induced by high heating rates. These new features are described.
Support Vector Machines for Hyperspectral Remote Sensing Classification
NASA Technical Reports Server (NTRS)
Gualtieri, J. Anthony; Cromp, R. F.
1998-01-01
The Support Vector Machine provides a new way to design classification algorithms which learn from examples (supervised learning) and generalize when applied to new data. We demonstrate its success on a difficult classification problem from hyperspectral remote sensing, where we obtain performances of 96%, and 87% correct for a 4 class problem, and a 16 class problem respectively. These results are somewhat better than other recent results on the same data. A key feature of this classifier is its ability to use high-dimensional data without the usual recourse to a feature selection step to reduce the dimensionality of the data. For this application, this is important, as hyperspectral data consists of several hundred contiguous spectral channels for each exemplar. We provide an introduction to this new approach, and demonstrate its application to classification of an agriculture scene.
Kavakiotis, Ioannis; Samaras, Patroklos; Triantafyllidis, Alexandros; Vlahavas, Ioannis
2017-11-01
Single Nucleotide Polymorphism (SNPs) are, nowadays, becoming the marker of choice for biological analyses involving a wide range of applications with great medical, biological, economic and environmental interest. Classification tasks i.e. the assignment of individuals to groups of origin based on their (multi-locus) genotypes, are performed in many fields such as forensic investigations, discrimination between wild and/or farmed populations and others. Τhese tasks, should be performed with a small number of loci, for computational as well as biological reasons. Thus, feature selection should precede classification tasks, especially for Single Nucleotide Polymorphism (SNP) datasets, where the number of features can amount to hundreds of thousands or millions. In this paper, we present a novel data mining approach, called FIFS - Frequent Item Feature Selection, based on the use of frequent items for selection of the most informative markers from population genomic data. It is a modular method, consisting of two main components. The first one identifies the most frequent and unique genotypes for each sampled population. The second one selects the most appropriate among them, in order to create the informative SNP subsets to be returned. The proposed method (FIFS) was tested on a real dataset, which comprised of a comprehensive coverage of pig breed types present in Britain. This dataset consisted of 446 individuals divided in 14 sub-populations, genotyped at 59,436 SNPs. Our method outperforms the state-of-the-art and baseline methods in every case. More specifically, our method surpassed the assignment accuracy threshold of 95% needing only half the number of SNPs selected by other methods (FIFS: 28 SNPs, Delta: 70 SNPs Pairwise FST: 70 SNPs, In: 100 SNPs.) CONCLUSION: Our approach successfully deals with the problem of informative marker selection in high dimensional genomic datasets. It offers better results compared to existing approaches and can aid biologists in selecting the most informative markers with maximum discrimination power for optimization of cost-effective panels with applications related to e.g. species identification, wildlife management, and forensics. Copyright © 2017 Elsevier Ltd. All rights reserved.
An ecosystem model for tropical forest disturbance and selective logging
Maoyi Huang; Gregory P. Asner; Michael Keller; Joseph A. Berry
2008-01-01
[1] A new three-dimensional version of the Carnegie-Ames-Stanford Approach (CASA) ecosystem model (CASA-3D) was developed to simulate regional carbon cycling in tropical forest ecosystems after disturbances such as logging. CASA-3D has the following new features: (1) an alternative approach for calculating absorbed photosynthetically active radiation (APAR) using new...
n-SIFT: n-dimensional scale invariant feature transform.
Cheung, Warren; Hamarneh, Ghassan
2009-09-01
We propose the n-dimensional scale invariant feature transform (n-SIFT) method for extracting and matching salient features from scalar images of arbitrary dimensionality, and compare this method's performance to other related features. The proposed features extend the concepts used for 2-D scalar images in the computer vision SIFT technique for extracting and matching distinctive scale invariant features. We apply the features to images of arbitrary dimensionality through the use of hyperspherical coordinates for gradients and multidimensional histograms to create the feature vectors. We analyze the performance of a fully automated multimodal medical image matching technique based on these features, and successfully apply the technique to determine accurate feature point correspondence between pairs of 3-D MRI images and dynamic 3D + time CT data.
Wang, Yong; Wu, Qiao-Feng; Chen, Chen; Wu, Ling-Yun; Yan, Xian-Zhong; Yu, Shu-Guang; Zhang, Xiang-Sun; Liang, Fan-Rong
2012-01-01
Acupuncture has been practiced in China for thousands of years as part of the Traditional Chinese Medicine (TCM) and has gradually accepted in western countries as an alternative or complementary treatment. However, the underlying mechanism of acupuncture, especially whether there exists any difference between varies acupoints, remains largely unknown, which hinders its widespread use. In this study, we develop a novel Linear Programming based Feature Selection method (LPFS) to understand the mechanism of acupuncture effect, at molecular level, by revealing the metabolite biomarkers for acupuncture treatment. Specifically, we generate and investigate the high-throughput metabolic profiles of acupuncture treatment at several acupoints in human. To select the subsets of metabolites that best characterize the acupuncture effect for each meridian point, an optimization model is proposed to identify biomarkers from high-dimensional metabolic data from case and control samples. Importantly, we use nearest centroid as the prototype to simultaneously minimize the number of selected features and the leave-one-out cross validation error of classifier. We compared the performance of LPFS to several state-of-the-art methods, such as SVM recursive feature elimination (SVM-RFE) and sparse multinomial logistic regression approach (SMLR). We find that our LPFS method tends to reveal a small set of metabolites with small standard deviation and large shifts, which exactly serves our requirement for good biomarker. Biologically, several metabolite biomarkers for acupuncture treatment are revealed and serve as the candidates for further mechanism investigation. Also biomakers derived from five meridian points, Zusanli (ST36), Liangmen (ST21), Juliao (ST3), Yanglingquan (GB34), and Weizhong (BL40), are compared for their similarity and difference, which provide evidence for the specificity of acupoints. Our result demonstrates that metabolic profiling might be a promising method to investigate the molecular mechanism of acupuncture. Comparing with other existing methods, LPFS shows better performance to select a small set of key molecules. In addition, LPFS is a general methodology and can be applied to other high-dimensional data analysis, for example cancer genomics.
2012-01-01
Background Acupuncture has been practiced in China for thousands of years as part of the Traditional Chinese Medicine (TCM) and has gradually accepted in western countries as an alternative or complementary treatment. However, the underlying mechanism of acupuncture, especially whether there exists any difference between varies acupoints, remains largely unknown, which hinders its widespread use. Results In this study, we develop a novel Linear Programming based Feature Selection method (LPFS) to understand the mechanism of acupuncture effect, at molecular level, by revealing the metabolite biomarkers for acupuncture treatment. Specifically, we generate and investigate the high-throughput metabolic profiles of acupuncture treatment at several acupoints in human. To select the subsets of metabolites that best characterize the acupuncture effect for each meridian point, an optimization model is proposed to identify biomarkers from high-dimensional metabolic data from case and control samples. Importantly, we use nearest centroid as the prototype to simultaneously minimize the number of selected features and the leave-one-out cross validation error of classifier. We compared the performance of LPFS to several state-of-the-art methods, such as SVM recursive feature elimination (SVM-RFE) and sparse multinomial logistic regression approach (SMLR). We find that our LPFS method tends to reveal a small set of metabolites with small standard deviation and large shifts, which exactly serves our requirement for good biomarker. Biologically, several metabolite biomarkers for acupuncture treatment are revealed and serve as the candidates for further mechanism investigation. Also biomakers derived from five meridian points, Zusanli (ST36), Liangmen (ST21), Juliao (ST3), Yanglingquan (GB34), and Weizhong (BL40), are compared for their similarity and difference, which provide evidence for the specificity of acupoints. Conclusions Our result demonstrates that metabolic profiling might be a promising method to investigate the molecular mechanism of acupuncture. Comparing with other existing methods, LPFS shows better performance to select a small set of key molecules. In addition, LPFS is a general methodology and can be applied to other high-dimensional data analysis, for example cancer genomics. PMID:23046877
Evolutionary Algorithm Based Feature Optimization for Multi-Channel EEG Classification.
Wang, Yubo; Veluvolu, Kalyana C
2017-01-01
The most BCI systems that rely on EEG signals employ Fourier based methods for time-frequency decomposition for feature extraction. The band-limited multiple Fourier linear combiner is well-suited for such band-limited signals due to its real-time applicability. Despite the improved performance of these techniques in two channel settings, its application in multiple-channel EEG is not straightforward and challenging. As more channels are available, a spatial filter will be required to eliminate the noise and preserve the required useful information. Moreover, multiple-channel EEG also adds the high dimensionality to the frequency feature space. Feature selection will be required to stabilize the performance of the classifier. In this paper, we develop a new method based on Evolutionary Algorithm (EA) to solve these two problems simultaneously. The real-valued EA encodes both the spatial filter estimates and the feature selection into its solution and optimizes it with respect to the classification error. Three Fourier based designs are tested in this paper. Our results show that the combination of Fourier based method with covariance matrix adaptation evolution strategy (CMA-ES) has the best overall performance.
Non-specific filtering of beta-distributed data.
Wang, Xinhui; Laird, Peter W; Hinoue, Toshinori; Groshen, Susan; Siegmund, Kimberly D
2014-06-19
Non-specific feature selection is a dimension reduction procedure performed prior to cluster analysis of high dimensional molecular data. Not all measured features are expected to show biological variation, so only the most varying are selected for analysis. In DNA methylation studies, DNA methylation is measured as a proportion, bounded between 0 and 1, with variance a function of the mean. Filtering on standard deviation biases the selection of probes to those with mean values near 0.5. We explore the effect this has on clustering, and develop alternate filter methods that utilize a variance stabilizing transformation for Beta distributed data and do not share this bias. We compared results for 11 different non-specific filters on eight Infinium HumanMethylation data sets, selected to span a variety of biological conditions. We found that for data sets having a small fraction of samples showing abnormal methylation of a subset of normally unmethylated CpGs, a characteristic of the CpG island methylator phenotype in cancer, a novel filter statistic that utilized a variance-stabilizing transformation for Beta distributed data outperformed the common filter of using standard deviation of the DNA methylation proportion, or its log-transformed M-value, in its ability to detect the cancer subtype in a cluster analysis. However, the standard deviation filter always performed among the best for distinguishing subgroups of normal tissue. The novel filter and standard deviation filter tended to favour features in different genome contexts; for the same data set, the novel filter always selected more features from CpG island promoters and the standard deviation filter always selected more features from non-CpG island intergenic regions. Interestingly, despite selecting largely non-overlapping sets of features, the two filters did find sample subsets that overlapped for some real data sets. We found two different filter statistics that tended to prioritize features with different characteristics, each performed well for identifying clusters of cancer and non-cancer tissue, and identifying a cancer CpG island hypermethylation phenotype. Since cluster analysis is for discovery, we would suggest trying both filters on any new data sets, evaluating the overlap of features selected and clusters discovered.
Feature Selection Methods for Robust Decoding of Finger Movements in a Non-human Primate
Padmanaban, Subash; Baker, Justin; Greger, Bradley
2018-01-01
Objective: The performance of machine learning algorithms used for neural decoding of dexterous tasks may be impeded due to problems arising when dealing with high-dimensional data. The objective of feature selection algorithms is to choose a near-optimal subset of features from the original feature space to improve the performance of the decoding algorithm. The aim of our study was to compare the effects of four feature selection techniques, Wilcoxon signed-rank test, Relative Importance, Principal Component Analysis (PCA), and Mutual Information Maximization on SVM classification performance for a dexterous decoding task. Approach: A nonhuman primate (NHP) was trained to perform small coordinated movements—similar to typing. An array of microelectrodes was implanted in the hand area of the motor cortex of the NHP and used to record action potentials (AP) during finger movements. A Support Vector Machine (SVM) was used to classify which finger movement the NHP was making based upon AP firing rates. We used the SVM classification to examine the functional parameters of (i) robustness to simulated failure and (ii) longevity of classification. We also compared the effect of using isolated-neuron and multi-unit firing rates as the feature vector supplied to the SVM. Main results: The average decoding accuracy for multi-unit features and single-unit features using Mutual Information Maximization (MIM) across 47 sessions was 96.74 ± 3.5% and 97.65 ± 3.36% respectively. The reduction in decoding accuracy between using 100% of the features and 10% of features based on MIM was 45.56% (from 93.7 to 51.09%) and 4.75% (from 95.32 to 90.79%) for multi-unit and single-unit features respectively. MIM had best performance compared to other feature selection methods. Significance: These results suggest improved decoding performance can be achieved by using optimally selected features. The results based on clinically relevant performance metrics also suggest that the decoding algorithm can be made robust by using optimal features and feature selection algorithms. We believe that even a few percent increase in performance is important and improves the decoding accuracy of the machine learning algorithm potentially increasing the ease of use of a brain machine interface. PMID:29467602
Feature selection and classification of multiparametric medical images using bagging and SVM
NASA Astrophysics Data System (ADS)
Fan, Yong; Resnick, Susan M.; Davatzikos, Christos
2008-03-01
This paper presents a framework for brain classification based on multi-parametric medical images. This method takes advantage of multi-parametric imaging to provide a set of discriminative features for classifier construction by using a regional feature extraction method which takes into account joint correlations among different image parameters; in the experiments herein, MRI and PET images of the brain are used. Support vector machine classifiers are then trained based on the most discriminative features selected from the feature set. To facilitate robust classification and optimal selection of parameters involved in classification, in view of the well-known "curse of dimensionality", base classifiers are constructed in a bagging (bootstrap aggregating) framework for building an ensemble classifier and the classification parameters of these base classifiers are optimized by means of maximizing the area under the ROC (receiver operating characteristic) curve estimated from their prediction performance on left-out samples of bootstrap sampling. This classification system is tested on a sex classification problem, where it yields over 90% classification rates for unseen subjects. The proposed classification method is also compared with other commonly used classification algorithms, with favorable results. These results illustrate that the methods built upon information jointly extracted from multi-parametric images have the potential to perform individual classification with high sensitivity and specificity.
Combined data mining/NIR spectroscopy for purity assessment of lime juice
NASA Astrophysics Data System (ADS)
Shafiee, Sahameh; Minaei, Saeid
2018-06-01
This paper reports the data mining study on the NIR spectrum of lime juice samples to determine their purity (natural or synthetic). NIR spectra for 72 pure and synthetic lime juice samples were recorded in reflectance mode. Sample outliers were removed using PCA analysis. Different data mining techniques for feature selection (Genetic Algorithm (GA)) and classification (including the radial basis function (RBF) network, Support Vector Machine (SVM), and Random Forest (RF) tree) were employed. Based on the results, SVM proved to be the most accurate classifier as it achieved the highest accuracy (97%) using the raw spectrum information. The classifier accuracy dropped to 93% when selected feature vector by GA search method was applied as classifier input. It can be concluded that some relevant features which produce good performance with the SVM classifier are removed by feature selection. Also, reduced spectra using PCA do not show acceptable performance (total accuracy of 66% by RBFNN), which indicates that dimensional reduction methods such as PCA do not always lead to more accurate results. These findings demonstrate the potential of data mining combination with near-infrared spectroscopy for monitoring lime juice quality in terms of natural or synthetic nature.
NASA Astrophysics Data System (ADS)
Qu, Haicheng; Liang, Xuejian; Liang, Shichao; Liu, Wanjun
2018-01-01
Many methods of hyperspectral image classification have been proposed recently, and the convolutional neural network (CNN) achieves outstanding performance. However, spectral-spatial classification of CNN requires an excessively large model, tremendous computations, and complex network, and CNN is generally unable to use the noisy bands caused by water-vapor absorption. A dimensionality-varied CNN (DV-CNN) is proposed to address these issues. There are four stages in DV-CNN and the dimensionalities of spectral-spatial feature maps vary with the stages. DV-CNN can reduce the computation and simplify the structure of the network. All feature maps are processed by more kernels in higher stages to extract more precise features. DV-CNN also improves the classification accuracy and enhances the robustness to water-vapor absorption bands. The experiments are performed on data sets of Indian Pines and Pavia University scene. The classification performance of DV-CNN is compared with state-of-the-art methods, which contain the variations of CNN, traditional, and other deep learning methods. The experiment of performance analysis about DV-CNN itself is also carried out. The experimental results demonstrate that DV-CNN outperforms state-of-the-art methods for spectral-spatial classification and it is also robust to water-vapor absorption bands. Moreover, reasonable parameters selection is effective to improve classification accuracy.
Maximum entropy PDF projection: A review
NASA Astrophysics Data System (ADS)
Baggenstoss, Paul M.
2017-06-01
We review maximum entropy (MaxEnt) PDF projection, a method with wide potential applications in statistical inference. The method constructs a sampling distribution for a high-dimensional vector x based on knowing the sampling distribution p(z) of a lower-dimensional feature z = T (x). Under mild conditions, the distribution p(x) having highest possible entropy among all distributions consistent with p(z) may be readily found. Furthermore, the MaxEnt p(x) may be sampled, making the approach useful in Monte Carlo methods. We review the theorem and present a case study in model order selection and classification for handwritten character recognition.
Three-dimensional multigrid Navier-Stokes computations for turbomachinery applications
NASA Astrophysics Data System (ADS)
Subramanian, S. V.
1989-07-01
The fully three-dimensional, time-dependent compressible Navier-Stokes equations in cylindrical coordinates are presently used, in conjunction with the multistage Runge-Kutta numerical integration scheme for solution of the governing flow equations, to simulate complex flowfields within turbomechanical components whose pertinent effects encompass those of viscosity, compressibility, blade rotation, and tip clearance. Computed results are presented for selected cascades, emphasizing the code's capabilities in the accurate prediction of such features as airfoil loadings, exit flow angles, shocks, and secondary flows. Computations for several test cases have been performed on a Cray-YMP, using nearly 90,000 grid points.
Kumar, Girijesh; Gupta, Rajeev
2013-10-07
The present work shows the utilization of Co(3+) complexes appended with either para- or meta-arylcarboxylic acid groups as the molecular building blocks for the construction of three-dimensional {Co(3+)-Zn(2+)} and {Co(3+)-Cd(2+)} heterobimetallic networks. The structural characterizations of these networks show several interesting features including well-defined pores and channels. These networks function as heterogeneous and reusable catalysts for the regio- and stereoselective ring-opening reactions of various epoxides and size-selective cyanation reactions of assorted aldehydes.
Le, Trang T; Simmons, W Kyle; Misaki, Masaya; Bodurka, Jerzy; White, Bill C; Savitz, Jonathan; McKinney, Brett A
2017-09-15
Classification of individuals into disease or clinical categories from high-dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross-validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, for domains such as bioinformatics, where the number of features is much larger than the number of observations p≫n , these differential privacy methods are susceptible to overfitting. We introduce private Evaporative Cooling, a stochastic privacy-preserving machine learning algorithm that uses Relief-F for feature selection and random forest for privacy preserving classification that also prevents overfitting. We relate the privacy-preserving threshold mechanism to a thermodynamic Maxwell-Boltzmann distribution, where the temperature represents the privacy threshold. We use the thermal statistical physics concept of Evaporative Cooling of atomic gases to perform backward stepwise privacy-preserving feature selection. On simulated data with main effects and statistical interactions, we compare accuracies on holdout and validation sets for three privacy-preserving methods: the reusable holdout, reusable holdout with random forest, and private Evaporative Cooling, which uses Relief-F feature selection and random forest classification. In simulations where interactions exist between attributes, private Evaporative Cooling provides higher classification accuracy without overfitting based on an independent validation set. In simulations without interactions, thresholdout with random forest and private Evaporative Cooling give comparable accuracies. We also apply these privacy methods to human brain resting-state fMRI data from a study of major depressive disorder. Code available at http://insilico.utulsa.edu/software/privateEC . brett-mckinney@utulsa.edu. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Rodriguez-Rivera, Veronica; Weidner, John W.; Yost, Michael J.
2016-01-01
Tissue scaffolds play a crucial role in the tissue regeneration process. The ideal scaffold must fulfill several requirements such as having proper composition, targeted modulus, and well-defined architectural features. Biomaterials that recapitulate the intrinsic architecture of in vivo tissue are vital for studying diseases as well as to facilitate the regeneration of lost and malformed soft tissue. A novel biofabrication technique was developed which combines state of the art imaging, three-dimensional (3D) printing, and selective enzymatic activity to create a new generation of biomaterials for research and clinical application. The developed material, Bovine Serum Albumin rubber, is reaction injected into a mold that upholds specific geometrical features. This sacrificial material allows the adequate transfer of architectural features to a natural scaffold material. The prototype consists of a 3D collagen scaffold with 4 and 3 mm channels that represent a branched architecture. This paper emphasizes the use of this biofabrication technique for the generation of natural constructs. This protocol utilizes a computer-aided software (CAD) to manufacture a solid mold which will be reaction injected with BSA rubber followed by the enzymatic digestion of the rubber, leaving its architectural features within the scaffold material. PMID:26967145
Rodriguez-Rivera, Veronica; Weidner, John W; Yost, Michael J
2016-02-12
Tissue scaffolds play a crucial role in the tissue regeneration process. The ideal scaffold must fulfill several requirements such as having proper composition, targeted modulus, and well-defined architectural features. Biomaterials that recapitulate the intrinsic architecture of in vivo tissue are vital for studying diseases as well as to facilitate the regeneration of lost and malformed soft tissue. A novel biofabrication technique was developed which combines state of the art imaging, three-dimensional (3D) printing, and selective enzymatic activity to create a new generation of biomaterials for research and clinical application. The developed material, Bovine Serum Albumin rubber, is reaction injected into a mold that upholds specific geometrical features. This sacrificial material allows the adequate transfer of architectural features to a natural scaffold material. The prototype consists of a 3D collagen scaffold with 4 and 3 mm channels that represent a branched architecture. This paper emphasizes the use of this biofabrication technique for the generation of natural constructs. This protocol utilizes a computer-aided software (CAD) to manufacture a solid mold which will be reaction injected with BSA rubber followed by the enzymatic digestion of the rubber, leaving its architectural features within the scaffold material.
Liu, Xiao; Shi, Jun; Zhou, Shichong; Lu, Minhua
2014-01-01
The dimensionality reduction is an important step in ultrasound image based computer-aided diagnosis (CAD) for breast cancer. A newly proposed l2,1 regularized correntropy algorithm for robust feature selection (CRFS) has achieved good performance for noise corrupted data. Therefore, it has the potential to reduce the dimensions of ultrasound image features. However, in clinical practice, the collection of labeled instances is usually expensive and time costing, while it is relatively easy to acquire the unlabeled or undetermined instances. Therefore, the semi-supervised learning is very suitable for clinical CAD. The iterated Laplacian regularization (Iter-LR) is a new regularization method, which has been proved to outperform the traditional graph Laplacian regularization in semi-supervised classification and ranking. In this study, to augment the classification accuracy of the breast ultrasound CAD based on texture feature, we propose an Iter-LR-based semi-supervised CRFS (Iter-LR-CRFS) algorithm, and then apply it to reduce the feature dimensions of ultrasound images for breast CAD. We compared the Iter-LR-CRFS with LR-CRFS, original supervised CRFS, and principal component analysis. The experimental results indicate that the proposed Iter-LR-CRFS significantly outperforms all other algorithms.
Variations in lithospheric thickness on Venus
NASA Technical Reports Server (NTRS)
Johnson, C. L.; Sandwell, David T.
1992-01-01
Recent analyses of Magellan data have indicated many regions exhibiting topograhic flexure. On Venus, flexure is associated predominantly with coronae and the chasmata with Aphrodite Terra. Modeling of these flexural signatures allows the elastic and mechanical thickness of the lithosphere to be estimated. In areas where the lithosphere is flexed beyond its elastic limit the saturation moment provides information on the strength of the lithosphere. Modeling of 12 flexural features on Venus has indicated lithospheric thicknesses comparable with terrestrial values. This has important implications for the venusian heat budget. Flexure of a thin elastic plate due simultaneously to a line load on a continuous plate and a bending moment applied to the end of a broken plate is considered. The mean radius and regional topographic gradient are also included in the model. Features with a large radius of curvature were selected so that a two-dimensional approximation could be used. Comparisons with an axisymmetric model were made for some features to check the validity of the two-dimensional assumption. The best-fit elastic thickness was found for each profile crossing a given flexural feature. In addition, the surface stress and bending moment at the first zero crossing of each profile were also calculated. Flexural amplitudes and elastic thicknesses obtained for 12 features vary significantly. Three examples of the model fitting procedures are discussed.
Liu, Aiming; Liu, Quan; Ai, Qingsong; Xie, Yi; Chen, Anqi
2017-01-01
Motor Imagery (MI) electroencephalography (EEG) is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA) can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA) to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP) and local characteristic-scale decomposition (LCD) algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA) classifier. Both the fourth brain–computer interface competition data and real-time data acquired in our designed experiments were used to verify the validation of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain–computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain–computer interface systems. PMID:29117100
Liu, Aiming; Chen, Kun; Liu, Quan; Ai, Qingsong; Xie, Yi; Chen, Anqi
2017-11-08
Motor Imagery (MI) electroencephalography (EEG) is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA) can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA) to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP) and local characteristic-scale decomposition (LCD) algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA) classifier. Both the fourth brain-computer interface competition data and real-time data acquired in our designed experiments were used to verify the validation of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain-computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain-computer interface systems.
The Analysis of Dimensionality Reduction Techniques in Cryptographic Object Code Classification
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jason L. Wright; Milos Manic
2010-05-01
This paper compares the application of three different dimension reduction techniques to the problem of locating cryptography in compiled object code. A simple classi?er is used to compare dimension reduction via sorted covariance, principal component analysis, and correlation-based feature subset selection. The analysis concentrates on the classi?cation accuracy as the number of dimensions is increased.
Zviagin, V N; Galitskaia, O I; Negasheva, M A
2012-01-01
We have determined absolute dimensions of the head and the relationship between the dimensions of its selected parts. The study enrolled adult subjects (mostly of Russian ethnicity) at the age from 17 to 22 years (1108 men and 1153 women). We calculated the normal values for the estimation of real dimensional characteristics and the frequency of their occurrence in the population. The proposed approach makes it possible to reliably identify the dimensional features of human appearance in terms of the quantitative verbal description (categories 1-5) and to reveal its most characteristic features. The results of this biometric study of the heads of unrecognized corpses obtained by the specially developed technology may be used in operational and search investigations, in the procedure of corpse identification, and forensic medical personality identification of a missing subject.
NASA Technical Reports Server (NTRS)
Rathjen, K. A.; Burk, H. O.
1983-01-01
The computer code CAVE (Conduction Analysis via Eigenvalues) is a convenient and efficient computer code for predicting two dimensional temperature histories within thermal protection systems for hypersonic vehicles. The capabilities of CAVE were enhanced by incorporation of the following features into the code: real gas effects in the aerodynamic heating predictions, geometry and aerodynamic heating package for analyses of cone shaped bodies, input option to change from laminar to turbulent heating predictions on leading edges, modification to account for reduction in adiabatic wall temperature with increase in leading sweep, geometry package for two dimensional scramjet engine sidewall, with an option for heat transfer to external and internal surfaces, print out modification to provide tables of select temperatures for plotting and storage, and modifications to the radiation calculation procedure to eliminate temperature oscillations induced by high heating rates. These new features are described.
Recovery of sparse translation-invariant signals with continuous basis pursuit
Ekanadham, Chaitanya; Tranchina, Daniel; Simoncelli, Eero
2013-01-01
We consider the problem of decomposing a signal into a linear combination of features, each a continuously translated version of one of a small set of elementary features. Although these constituents are drawn from a continuous family, most current signal decomposition methods rely on a finite dictionary of discrete examples selected from this family (e.g., shifted copies of a set of basic waveforms), and apply sparse optimization methods to select and solve for the relevant coefficients. Here, we generate a dictionary that includes auxiliary interpolation functions that approximate translates of features via adjustment of their coefficients. We formulate a constrained convex optimization problem, in which the full set of dictionary coefficients represents a linear approximation of the signal, the auxiliary coefficients are constrained so as to only represent translated features, and sparsity is imposed on the primary coefficients using an L1 penalty. The basis pursuit denoising (BP) method may be seen as a special case, in which the auxiliary interpolation functions are omitted, and we thus refer to our methodology as continuous basis pursuit (CBP). We develop two implementations of CBP for a one-dimensional translation-invariant source, one using a first-order Taylor approximation, and another using a form of trigonometric spline. We examine the tradeoff between sparsity and signal reconstruction accuracy in these methods, demonstrating empirically that trigonometric CBP substantially outperforms Taylor CBP, which in turn offers substantial gains over ordinary BP. In addition, the CBP bases can generally achieve equally good or better approximations with much coarser sampling than BP, leading to a reduction in dictionary dimensionality. PMID:24352562
Knauer, Uwe; Matros, Andrea; Petrovic, Tijana; Zanker, Timothy; Scott, Eileen S; Seiffert, Udo
2017-01-01
Hyperspectral imaging is an emerging means of assessing plant vitality, stress parameters, nutrition status, and diseases. Extraction of target values from the high-dimensional datasets either relies on pixel-wise processing of the full spectral information, appropriate selection of individual bands, or calculation of spectral indices. Limitations of such approaches are reduced classification accuracy, reduced robustness due to spatial variation of the spectral information across the surface of the objects measured as well as a loss of information intrinsic to band selection and use of spectral indices. In this paper we present an improved spatial-spectral segmentation approach for the analysis of hyperspectral imaging data and its application for the prediction of powdery mildew infection levels (disease severity) of intact Chardonnay grape bunches shortly before veraison. Instead of calculating texture features (spatial features) for the huge number of spectral bands independently, dimensionality reduction by means of Linear Discriminant Analysis (LDA) was applied first to derive a few descriptive image bands. Subsequent classification was based on modified Random Forest classifiers and selective extraction of texture parameters from the integral image representation of the image bands generated. Dimensionality reduction, integral images, and the selective feature extraction led to improved classification accuracies of up to [Formula: see text] for detached berries used as a reference sample (training dataset). Our approach was validated by predicting infection levels for a sample of 30 intact bunches. Classification accuracy improved with the number of decision trees of the Random Forest classifier. These results corresponded with qPCR results. An accuracy of 0.87 was achieved in classification of healthy, infected, and severely diseased bunches. However, discrimination between visually healthy and infected bunches proved to be challenging for a few samples, perhaps due to colonized berries or sparse mycelia hidden within the bunch or airborne conidia on the berries that were detected by qPCR. An advanced approach to hyperspectral image classification based on combined spatial and spectral image features, potentially applicable to many available hyperspectral sensor technologies, has been developed and validated to improve the detection of powdery mildew infection levels of Chardonnay grape bunches. The spatial-spectral approach improved especially the detection of light infection levels compared with pixel-wise spectral data analysis. This approach is expected to improve the speed and accuracy of disease detection once the thresholds for fungal biomass detected by hyperspectral imaging are established; it can also facilitate monitoring in plant phenotyping of grapevine and additional crops.
NASA Astrophysics Data System (ADS)
Peetermans, S.; Bopp, M.; Vontobel, P.; Lehmann, E. H.
Common neutron imaging uses the full polychromatic neutron beam spectrum to reveal the material distribution in a non-destructive way. Performing it with a reduced energy band, i.e. energy-selective neutron imaging, allows access to local variation in sample crystallographic properties. Two sample categories can be discerned with different energy responses. Polycrystalline materials have an energy-dependent cross-section featuring Bragg edges. Energy-selective neutron imaging can be used to distinguish be- tween crystallographic phases, increase material sensitivity or penetration, improve quantification etc. An example of the latter is shown by the examination of copper discs prior to machining them into linear accelerator cavity structures. The cross-section of single crystals features distinct Bragg peaks. Based on their pattern, one can determine the orientation of the crystal, as in a Laue pattern, but with the tremendous advantage that the operation can be performed for each pixel, yielding crystal orientation maps at high spatial resolution. A wholly different method to investigate such samples is also introduced: neutron diffraction imaging. It is based on projections formed by neutrons diffracted from the crystal lattice out of the direct beam. The position of these projections on the detector gives information on the crystal orientation. The projection itself can be used to reconstruct the crystal shape. A three-dimensional mapping of local Bragg reflectivity or a grain orientation mapping can thus be obtained.
Matrix-Assisted Three-Dimensional Printing of Cellulose Nanofibers for Paper Microfluidics.
Shin, Sungchul; Hyun, Jinho
2017-08-09
A cellulose nanofiber (CNF), one of the most attractive green bioresources, was adopted for construction of microfluidic devices using matrix-assisted three-dimensional (3D) printing. CNF hydrogels can support structures printed using CAD design in a 3D hydrogel environment with the appropriate combination of rheological properties between the CNF hydrogel and ink materials. Amazingly, the structure printed freely in the bulky CNF hydrogels was able to retain its highly resolved 3D features in an ultrathin two-dimensional (2D) paper using a simple drying process. The dimensional change in the CNF hydrogels from 3D to 2D resulted from simple dehydration of the CNFs and provided transparent, stackable paper-based 3D channel devices. As a proof of principle, the rheological properties of the CNF hydrogels, the 3D structure of the ink, the formation of channels by evacuation of the ink, and the highly localized selectivity of the devices are described.
Wang, Jing-Jing; Wu, Hai-Feng; Sun, Tao; Li, Xia; Wang, Wei; Tao, Li-Xin; Huo, Da; Lv, Ping-Xin; He, Wen; Guo, Xiu-Hua
2013-01-01
Lung cancer, one of the leading causes of cancer-related deaths, usually appears as solitary pulmonary nodules (SPNs) which are hard to diagnose using the naked eye. In this paper, curvelet-based textural features and clinical parameters are used with three prediction models [a multilevel model, a least absolute shrinkage and selection operator (LASSO) regression method, and a support vector machine (SVM)] to improve the diagnosis of benign and malignant SPNs. Dimensionality reduction of the original curvelet-based textural features was achieved using principal component analysis. In addition, non-conditional logistical regression was used to find clinical predictors among demographic parameters and morphological features. The results showed that, combined with 11 clinical predictors, the accuracy rates using 12 principal components were higher than those using the original curvelet-based textural features. To evaluate the models, 10-fold cross validation and back substitution were applied. The results obtained, respectively, were 0.8549 and 0.9221 for the LASSO method, 0.9443 and 0.9831 for SVM, and 0.8722 and 0.9722 for the multilevel model. All in all, it was found that using curvelet-based textural features after dimensionality reduction and using clinical predictors, the highest accuracy rate was achieved with SVM. The method may be used as an auxiliary tool to differentiate between benign and malignant SPNs in CT images.
The Cutplane - A tool for interactive solid modeling
NASA Technical Reports Server (NTRS)
Edwards, Laurence; Kessler, William; Leifer, Larry
1988-01-01
A geometric modeling system which incorporates a new concept for intuitively and unambiguously specifying and manipulating points or features in three dimensional space is presented. The central concept, the Cutplane, consists of a plane that moves through space under control of a mouse or similar input device. The intersection of the plane and any object is highlighted, and only this highlighted section can be selected for manipulation. Selection is accomplished with a crosshair that is constrained to remain within the plane, so that the relationship between the crosshair and the feature of interest is immediately evident. Although the idea of a section view is not new, previously it has been used as a way to reveal hidden structure, not as a means of manipulating objects or indicating spatial position, as is proposed here.
Dense mesh sampling for video-based facial animation
NASA Astrophysics Data System (ADS)
Peszor, Damian; Wojciechowska, Marzena
2016-06-01
The paper describes an approach for selection of feature points on three-dimensional, triangle mesh obtained using various techniques from several video footages. This approach has a dual purpose. First, it allows to minimize the data stored for the purpose of facial animation, so that instead of storing position of each vertex in each frame, one could store only a small subset of vertices for each frame and calculate positions of others based on the subset. Second purpose is to select feature points that could be used for anthropometry-based retargeting of recorded mimicry to another model, with sampling density beyond that which can be achieved using marker-based performance capture techniques. Developed approach was successfully tested on artificial models, models constructed using structured light scanner, and models constructed from video footages using stereophotogrammetry.
Self-Assembled Polysaccharide Nanotubes Generated from β-1,3-Glucan Polysaccharides
NASA Astrophysics Data System (ADS)
Numata, Munenori; Shinkai, Seiji
β-1,3-Glucans act as unique natural nanotubes, the features of which are greatly different from other natural or synthetic helical polymers. The origin mostly stems from their strong helix-forming nature and reversible interconversion between single-strand random coil and triple-strand helix. During this interconversion process, they can accept functional polymers, molecular assemblies and nanoparticles in an induced-fit manner to create water-soluble one-dimensional nanocomposites, where individual conjugated polymers or molecular assemblies can be incorporated into the one-dimensional hollow constructed by the helical superstructure of β-1,3-glucans. The advantageous point of the β-1,3-glucan hosting system is that the selective modification of β-1,3-glucans leads to the creation of various functional one-dimensional nanocomposites in a supramolecular manner, being applicable toward fundamental nanomaterials such as sensors or circuits. Furthermore, the composites with functional surfaces can act as one-dimensional building blocks toward further hierarchical self-assemblies, leading to the creation of two- or three-dimensional nanoarchitectures.
Valley excitons in two-dimensional semiconductors
Yu, Hongyi; Cui, Xiaodong; Xu, Xiaodong; ...
2014-12-30
Monolayer group-VIB transition metal dichalcogenides have recently emerged as a new class of semiconductors in the two-dimensional limit. The attractive properties include: the visible range direct band gap ideal for exploring optoelectronic applications; the intriguing physics associated with spin and valley pseudospin of carriers which implies potentials for novel electronics based on these internal degrees of freedom; the exceptionally strong Coulomb interaction due to the two-dimensional geometry and the large effective masses. The physics of excitons, the bound states of electrons and holes, has been one of the most actively studied topics on these two-dimensional semiconductors, where the excitons exhibitmore » remarkably new features due to the strong Coulomb binding, the valley degeneracy of the band edges, and the valley dependent optical selection rules for interband transitions. Here we give a brief overview of the experimental and theoretical findings on excitons in two-dimensional transition metal dichalcogenides, with focus on the novel properties associated with their valley degrees of freedom.« less
NASA Astrophysics Data System (ADS)
Specht, Judith F.; Knorr, Andreas; Richter, Marten
2015-04-01
The linear and two-dimensional coherent optical spectra of Coulomb-coupled quantum emitters are discussed with respect to the underlying coupling processes. We present a theoretical analysis of the two different resonance energy transfer mechanisms between coupled nanostructures: Förster and Dexter interaction. Our investigation shows that the features visible in optical spectra of coupled quantum dots can be traced back to the nature of the underlying coupling mechanism (Förster or Dexter). Therefore, we discuss how the excitation transfer pathways can be controlled by choosing particular laser polarizations and mutual orientations of the quantum emitters in coherent two-dimensional spectroscopy. In this context, we analyze to what extent the delocalized double-excitonic states are bound to the optical selection rules of the uncoupled system.
NASA Astrophysics Data System (ADS)
Hoell, Simon; Omenzetter, Piotr
2018-02-01
To advance the concept of smart structures in large systems, such as wind turbines (WTs), it is desirable to be able to detect structural damage early while using minimal instrumentation. Data-driven vibration-based damage detection methods can be competitive in that respect because global vibrational responses encompass the entire structure. Multivariate damage sensitive features (DSFs) extracted from acceleration responses enable to detect changes in a structure via statistical methods. However, even though such DSFs contain information about the structural state, they may not be optimised for the damage detection task. This paper addresses the shortcoming by exploring a DSF projection technique specialised for statistical structural damage detection. High dimensional initial DSFs are projected onto a low-dimensional space for improved damage detection performance and simultaneous computational burden reduction. The technique is based on sequential projection pursuit where the projection vectors are optimised one by one using an advanced evolutionary strategy. The approach is applied to laboratory experiments with a small-scale WT blade under wind-like excitations. Autocorrelation function coefficients calculated from acceleration signals are employed as DSFs. The optimal numbers of projection vectors are identified with the help of a fast forward selection procedure. To benchmark the proposed method, selections of original DSFs as well as principal component analysis scores from these features are additionally investigated. The optimised DSFs are tested for damage detection on previously unseen data from the healthy state and a wide range of damage scenarios. It is demonstrated that using selected subsets of the initial and transformed DSFs improves damage detectability compared to the full set of features. Furthermore, superior results can be achieved by projecting autocorrelation coefficients onto just a single optimised projection vector.
Robust evaluation of time series classification algorithms for structural health monitoring
NASA Astrophysics Data System (ADS)
Harvey, Dustin Y.; Worden, Keith; Todd, Michael D.
2014-03-01
Structural health monitoring (SHM) systems provide real-time damage and performance information for civil, aerospace, and mechanical infrastructure through analysis of structural response measurements. The supervised learning methodology for data-driven SHM involves computation of low-dimensional, damage-sensitive features from raw measurement data that are then used in conjunction with machine learning algorithms to detect, classify, and quantify damage states. However, these systems often suffer from performance degradation in real-world applications due to varying operational and environmental conditions. Probabilistic approaches to robust SHM system design suffer from incomplete knowledge of all conditions a system will experience over its lifetime. Info-gap decision theory enables nonprobabilistic evaluation of the robustness of competing models and systems in a variety of decision making applications. Previous work employed info-gap models to handle feature uncertainty when selecting various components of a supervised learning system, namely features from a pre-selected family and classifiers. In this work, the info-gap framework is extended to robust feature design and classifier selection for general time series classification through an efficient, interval arithmetic implementation of an info-gap data model. Experimental results are presented for a damage type classification problem on a ball bearing in a rotating machine. The info-gap framework in conjunction with an evolutionary feature design system allows for fully automated design of a time series classifier to meet performance requirements under maximum allowable uncertainty.
Evaluation of variable selection methods for random forests and omics data sets.
Degenhardt, Frauke; Seifert, Stephan; Szymczak, Silke
2017-10-16
Machine learning methods and in particular random forests are promising approaches for prediction based on high dimensional omics data sets. They provide variable importance measures to rank predictors according to their predictive power. If building a prediction model is the main goal of a study, often a minimal set of variables with good prediction performance is selected. However, if the objective is the identification of involved variables to find active networks and pathways, approaches that aim to select all relevant variables should be preferred. We evaluated several variable selection procedures based on simulated data as well as publicly available experimental methylation and gene expression data. Our comparison included the Boruta algorithm, the Vita method, recurrent relative variable importance, a permutation approach and its parametric variant (Altmann) as well as recursive feature elimination (RFE). In our simulation studies, Boruta was the most powerful approach, followed closely by the Vita method. Both approaches demonstrated similar stability in variable selection, while Vita was the most robust approach under a pure null model without any predictor variables related to the outcome. In the analysis of the different experimental data sets, Vita demonstrated slightly better stability in variable selection and was less computationally intensive than Boruta.In conclusion, we recommend the Boruta and Vita approaches for the analysis of high-dimensional data sets. Vita is considerably faster than Boruta and thus more suitable for large data sets, but only Boruta can also be applied in low-dimensional settings. © The Author 2017. Published by Oxford University Press.
Thomaz, Ricardo de Lima; Carneiro, Pedro Cunha; Bonin, João Eliton; Macedo, Túlio Augusto Alves; Patrocinio, Ana Claudia; Soares, Alcimar Barbosa
2018-05-01
Detection of early hepatocellular carcinoma (HCC) is responsible for increasing survival rates in up to 40%. One-class classifiers can be used for modeling early HCC in multidetector computed tomography (MDCT), but demand the specific knowledge pertaining to the set of features that best describes the target class. Although the literature outlines several features for characterizing liver lesions, it is unclear which is most relevant for describing early HCC. In this paper, we introduce an unconstrained GA feature selection algorithm based on a multi-objective Mahalanobis fitness function to improve the classification performance for early HCC. We compared our approach to a constrained Mahalanobis function and two other unconstrained functions using Welch's t-test and Gaussian Data Descriptors. The performance of each fitness function was evaluated by cross-validating a one-class SVM. The results show that the proposed multi-objective Mahalanobis fitness function is capable of significantly reducing data dimensionality (96.4%) and improving one-class classification of early HCC (0.84 AUC). Furthermore, the results provide strong evidence that intensity features extracted at the arterial to portal and arterial to equilibrium phases are important for classifying early HCC.
Particle Swarm Optimization approach to defect detection in armour ceramics.
Kesharaju, Manasa; Nagarajah, Romesh
2017-03-01
In this research, various extracted features were used in the development of an automated ultrasonic sensor based inspection system that enables defect classification in each ceramic component prior to despatch to the field. Classification is an important task and large number of irrelevant, redundant features commonly introduced to a dataset reduces the classifiers performance. Feature selection aims to reduce the dimensionality of the dataset while improving the performance of a classification system. In the context of a multi-criteria optimization problem (i.e. to minimize classification error rate and reduce number of features) such as one discussed in this research, the literature suggests that evolutionary algorithms offer good results. Besides, it is noted that Particle Swarm Optimization (PSO) has not been explored especially in the field of classification of high frequency ultrasonic signals. Hence, a binary coded Particle Swarm Optimization (BPSO) technique is investigated in the implementation of feature subset selection and to optimize the classification error rate. In the proposed method, the population data is used as input to an Artificial Neural Network (ANN) based classification system to obtain the error rate, as ANN serves as an evaluator of PSO fitness function. Copyright © 2016. Published by Elsevier B.V.
Nayak, Deepak Ranjan; Dash, Ratnakar; Majhi, Banshidhar
2017-01-01
This paper presents an automatic classification system for segregating pathological brain from normal brains in magnetic resonance imaging scanning. The proposed system employs contrast limited adaptive histogram equalization scheme to enhance the diseased region in brain MR images. Two-dimensional stationary wavelet transform is harnessed to extract features from the preprocessed images. The feature vector is constructed using the energy and entropy values, computed from the level- 2 SWT coefficients. Then, the relevant and uncorrelated features are selected using symmetric uncertainty ranking filter. Subsequently, the selected features are given input to the proposed AdaBoost with support vector machine classifier, where SVM is used as the base classifier of AdaBoost algorithm. To validate the proposed system, three standard MR image datasets, Dataset-66, Dataset-160, and Dataset- 255 have been utilized. The 5 runs of k-fold stratified cross validation results indicate the suggested scheme offers better performance than other existing schemes in terms of accuracy and number of features. The proposed system earns ideal classification over Dataset-66 and Dataset-160; whereas, for Dataset- 255, an accuracy of 99.45% is achieved. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Unsupervised Feature Selection Based on the Morisita Index for Hyperspectral Images
NASA Astrophysics Data System (ADS)
Golay, Jean; Kanevski, Mikhail
2017-04-01
Hyperspectral sensors are capable of acquiring images with hundreds of narrow and contiguous spectral bands. Compared with traditional multispectral imagery, the use of hyperspectral images allows better performance in discriminating between land-cover classes, but it also results in large redundancy and high computational data processing. To alleviate such issues, unsupervised feature selection techniques for redundancy minimization can be implemented. Their goal is to select the smallest subset of features (or bands) in such a way that all the information content of a data set is preserved as much as possible. The present research deals with the application to hyperspectral images of a recently introduced technique of unsupervised feature selection: the Morisita-Based filter for Redundancy Minimization (MBRM). MBRM is based on the (multipoint) Morisita index of clustering and on the Morisita estimator of Intrinsic Dimension (ID). The fundamental idea of the technique is to retain only the bands which contribute to increasing the ID of an image. In this way, redundant bands are disregarded, since they have no impact on the ID. Besides, MBRM has several advantages over benchmark techniques: in addition to its ability to deal with large data sets, it can capture highly-nonlinear dependences and its implementation is straightforward in any programming environment. Experimental results on freely available hyperspectral images show the good effectiveness of MBRM in remote sensing data processing. Comparisons with benchmark techniques are carried out and random forests are used to assess the performance of MBRM in reducing the data dimensionality without loss of relevant information. References [1] C. Traina Jr., A.J.M. Traina, L. Wu, C. Faloutsos, Fast feature selection using fractal dimension, in: Proceedings of the XV Brazilian Symposium on Databases, SBBD, pp. 158-171, 2000. [2] J. Golay, M. Kanevski, A new estimator of intrinsic dimension based on the multipoint Morisita index, Pattern Recognition 48(12), pp. 4070-4081, 2015. [3] J. Golay, M. Kanevski, Unsupervised feature selection based on the Morisita estimator of intrinsic dimension, arXiv:1608.05581, 2016.
Hyperspectral data discrimination methods
NASA Astrophysics Data System (ADS)
Casasent, David P.; Chen, Xuewen
2000-12-01
Hyperspectral data provides spectral response information that provides detailed chemical, moisture, and other description of constituent parts of an item. These new sensor data are useful in USDA product inspection. However, such data introduce problems such as the curse of dimensionality, the need to reduce the number of features used to accommodate realistic small training set sizes, and the need to employ discriminatory features and still achieve good generalization (comparable training and test set performance). Several two-step methods are compared to a new and preferable single-step spectral decomposition algorithm. Initial results on hyperspectral data for good/bad almonds and for good/bad (aflatoxin infested) corn kernels are presented. The hyperspectral application addressed differs greatly from prior USDA work (PLS) in which the level of a specific channel constituent in food was estimated. A validation set (separate from the test set) is used in selecting algorithm parameters. Threshold parameters are varied to select the best Pc operating point. Initial results show that nonlinear features yield improved performance.
Analysis of the GRNs Inference by Using Tsallis Entropy and a Feature Selection Approach
NASA Astrophysics Data System (ADS)
Lopes, Fabrício M.; de Oliveira, Evaldo A.; Cesar, Roberto M.
An important problem in the bioinformatics field is to understand how genes are regulated and interact through gene networks. This knowledge can be helpful for many applications, such as disease treatment design and drugs creation purposes. For this reason, it is very important to uncover the functional relationship among genes and then to construct the gene regulatory network (GRN) from temporal expression data. However, this task usually involves data with a large number of variables and small number of observations. In this way, there is a strong motivation to use pattern recognition and dimensionality reduction approaches. In particular, feature selection is specially important in order to select the most important predictor genes that can explain some phenomena associated with the target genes. This work presents a first study about the sensibility of entropy methods regarding the entropy functional form, applied to the problem of topology recovery of GRNs. The generalized entropy proposed by Tsallis is used to study this sensibility. The inference process is based on a feature selection approach, which is applied to simulated temporal expression data generated by an artificial gene network (AGN) model. The inferred GRNs are validated in terms of global network measures. Some interesting conclusions can be drawn from the experimental results, as reported for the first time in the present paper.
Intrinsic feature-based pose measurement for imaging motion compensation
Baba, Justin S.; Goddard, Jr., James Samuel
2014-08-19
Systems and methods for generating motion corrected tomographic images are provided. A method includes obtaining first images of a region of interest (ROI) to be imaged and associated with a first time, where the first images are associated with different positions and orientations with respect to the ROI. The method also includes defining an active region in the each of the first images and selecting intrinsic features in each of the first images based on the active region. Second, identifying a portion of the intrinsic features temporally and spatially matching intrinsic features in corresponding ones of second images of the ROI associated with a second time prior to the first time and computing three-dimensional (3D) coordinates for the portion of the intrinsic features. Finally, the method includes computing a relative pose for the first images based on the 3D coordinates.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, R; Aguilera, T; Shultz, D
2014-06-15
Purpose: This study aims to develop predictive models of patient outcome by extracting advanced imaging features (i.e., Radiomics) from FDG-PET images. Methods: We acquired pre-treatment PET scans for 51 stage I NSCLC patients treated with SABR. We calculated 139 quantitative features from each patient PET image, including 5 morphological features, 8 statistical features, 27 texture features, and 100 features from the intensity-volume histogram. Based on the imaging features, we aim to distinguish between 2 risk groups of patients: those with regional failure or distant metastasis versus those without. We investigated 3 pattern classification algorithms: linear discriminant analysis (LDA), naive Bayesmore » (NB), and logistic regression (LR). To avoid the curse of dimensionality, we performed feature selection by first removing redundant features and then applying sequential forward selection using the wrapper approach. To evaluate the predictive performance, we performed 10-fold cross validation with 1000 random splits of the data and calculated the area under the ROC curve (AUC). Results: Feature selection identified 2 texture features (homogeneity and/or wavelet decompositions) for NB and LR, while for LDA SUVmax and one texture feature (correlation) were identified. All 3 classifiers achieved statistically significant improvements over conventional PET imaging metrics such as tumor volume (AUC = 0.668) and SUVmax (AUC = 0.737). Overall, NB achieved the best predictive performance (AUC = 0.806). This also compares favorably with MTV using the best threshold at an SUV of 11.6 (AUC = 0.746). At a sensitivity of 80%, NB achieved 69% specificity, while SUVmax and tumor volume only had 36% and 47% specificity. Conclusion: Through a systematic analysis of advanced PET imaging features, we are able to build models with improved predictive value over conventional imaging metrics. If validated in a large independent cohort, the proposed techniques could potentially aid in identifying patients who might benefit from adjuvant therapy.« less
Feature Screening in Ultrahigh Dimensional Cox's Model.
Yang, Guangren; Yu, Ye; Li, Runze; Buu, Anne
Survival data with ultrahigh dimensional covariates such as genetic markers have been collected in medical studies and other fields. In this work, we propose a feature screening procedure for the Cox model with ultrahigh dimensional covariates. The proposed procedure is distinguished from the existing sure independence screening (SIS) procedures (Fan, Feng and Wu, 2010, Zhao and Li, 2012) in that the proposed procedure is based on joint likelihood of potential active predictors, and therefore is not a marginal screening procedure. The proposed procedure can effectively identify active predictors that are jointly dependent but marginally independent of the response without performing an iterative procedure. We develop a computationally effective algorithm to carry out the proposed procedure and establish the ascent property of the proposed algorithm. We further prove that the proposed procedure possesses the sure screening property. That is, with the probability tending to one, the selected variable set includes the actual active predictors. We conduct Monte Carlo simulation to evaluate the finite sample performance of the proposed procedure and further compare the proposed procedure and existing SIS procedures. The proposed methodology is also demonstrated through an empirical analysis of a real data example.
Pérez-Beteta, Julián; Luque, Belén; Arregui, Elena; Calvo, Manuel; Borrás, José M; López, Carlos; Martino, Juan; Velasquez, Carlos; Asenjo, Beatriz; Benavides, Manuel; Herruzo, Ismael; Martínez-González, Alicia; Pérez-Romasanta, Luis; Arana, Estanislao; Pérez-García, Víctor M
2016-01-01
Objective: The main objective of this retrospective work was the study of three-dimensional (3D) heterogeneity measures of post-contrast pre-operative MR images acquired with T1 weighted sequences of patients with glioblastoma (GBM) as predictors of clinical outcome. Methods: 79 patients from 3 hospitals were included in the study. 16 3D textural heterogeneity measures were computed including run-length matrix (RLM) features (regional heterogeneity) and co-occurrence matrix (CM) features (local heterogeneity). The significance of the results was studied using Kaplan–Meier curves and Cox proportional hazards analysis. Correlation between the variables of the study was assessed using the Spearman's correlation coefficient. Results: Kaplan–Meyer survival analysis showed that 4 of the 11 RLM features and 4 of the 5 CM features considered were robust predictors of survival. The median survival differences in the most significant cases were of over 6 months. Conclusion: Heterogeneity measures computed on the post-contrast pre-operative T1 weighted MR images of patients with GBM are predictors of survival. Advances in knowledge: Texture analysis to assess tumour heterogeneity has been widely studied. However, most works develop a two-dimensional analysis, focusing only on one MRI slice to state tumour heterogeneity. The study of fully 3D heterogeneity textural features as predictors of clinical outcome is more robust and is not dependent on the selected slice of the tumour. PMID:27319577
Molina, David; Pérez-Beteta, Julián; Luque, Belén; Arregui, Elena; Calvo, Manuel; Borrás, José M; López, Carlos; Martino, Juan; Velasquez, Carlos; Asenjo, Beatriz; Benavides, Manuel; Herruzo, Ismael; Martínez-González, Alicia; Pérez-Romasanta, Luis; Arana, Estanislao; Pérez-García, Víctor M
2016-07-04
The main objective of this retrospective work was the study of three-dimensional (3D) heterogeneity measures of post-contrast pre-operative MR images acquired with T 1 weighted sequences of patients with glioblastoma (GBM) as predictors of clinical outcome. 79 patients from 3 hospitals were included in the study. 16 3D textural heterogeneity measures were computed including run-length matrix (RLM) features (regional heterogeneity) and co-occurrence matrix (CM) features (local heterogeneity). The significance of the results was studied using Kaplan-Meier curves and Cox proportional hazards analysis. Correlation between the variables of the study was assessed using the Spearman's correlation coefficient. Kaplan-Meyer survival analysis showed that 4 of the 11 RLM features and 4 of the 5 CM features considered were robust predictors of survival. The median survival differences in the most significant cases were of over 6 months. Heterogeneity measures computed on the post-contrast pre-operative T 1 weighted MR images of patients with GBM are predictors of survival. Texture analysis to assess tumour heterogeneity has been widely studied. However, most works develop a two-dimensional analysis, focusing only on one MRI slice to state tumour heterogeneity. The study of fully 3D heterogeneity textural features as predictors of clinical outcome is more robust and is not dependent on the selected slice of the tumour.
Caravaca, Juan; Soria-Olivas, Emilio; Bataller, Manuel; Serrano, Antonio J; Such-Miquel, Luis; Vila-Francés, Joan; Guerrero, Juan F
2014-02-01
This work presents the application of machine learning techniques to analyse the influence of physical exercise in the physiological properties of the heart, during ventricular fibrillation. To this end, different kinds of classifiers (linear and neural models) are used to classify between trained and sedentary rabbit hearts. The use of those classifiers in combination with a wrapper feature selection algorithm allows to extract knowledge about the most relevant features in the problem. The obtained results show that neural models outperform linear classifiers (better performance indices and a better dimensionality reduction). The most relevant features to describe the benefits of physical exercise are those related to myocardial heterogeneity, mean activation rate and activation complexity. © 2013 Published by Elsevier Ltd.
Zhao, Weixiang; Sankaran, Shankar; Ibáñez, Ana M; Dandekar, Abhaya M; Davis, Cristina E
2009-08-04
This study introduces two-dimensional (2-D) wavelet analysis to the classification of gas chromatogram differential mobility spectrometry (GC/DMS) data which are composed of retention time, compensation voltage, and corresponding intensities. One reported method to process such large data sets is to convert 2-D signals to 1-D signals by summing intensities either across retention time or compensation voltage, but it can lose important signal information in one data dimension. A 2-D wavelet analysis approach keeps the 2-D structure of original signals, while significantly reducing data size. We applied this feature extraction method to 2-D GC/DMS signals measured from control and disordered fruit and then employed two typical classification algorithms to testify the effects of the resultant features on chemical pattern recognition. Yielding a 93.3% accuracy of separating data from control and disordered fruit samples, 2-D wavelet analysis not only proves its feasibility to extract feature from original 2-D signals but also shows its superiority over the conventional feature extraction methods including converting 2-D to 1-D and selecting distinguishable pixels from training set. Furthermore, this process does not require coupling with specific pattern recognition methods, which may help ensure wide applications of this method to 2-D spectrometry data.
NASA Astrophysics Data System (ADS)
Awad, Joseph; Krasinski, Adam; Spence, David; Parraga, Grace; Fenster, Aaron
2010-03-01
Carotid atherosclerosis is the major cause of ischemic stroke, a leading cause of death and disability. This is driving the development of image analysis methods to quantitatively evaluate local arterial effects of potential treatments of carotid disease. Here we investigate the use of novel texture analysis tools to detect potential changes in the carotid arteries after statin therapy. Three-dimensional (3D) carotid ultrasound images were acquired from the left and right carotid arteries of 35 subjects (16 treated with 80 mg atorvastatin and 19 treated with placebo) at baseline and after 3 months of treatment. Two-hundred and seventy texture features were extracted from 3D ultrasound carotid artery images. These images previously had their vessel walls (VW) manually segmented. Highly ranked individual texture features were selected and compared to the VW volume (VWV) change using 3 measures: distance between classes, Wilcoxon rank sum test, and accuracy of the classifiers. Six classifiers were used. Using texture feature (L7R7) increases the average accuracy and area under the ROC curve to 74.4% and 0.72 respectively compared to 57.2% and 0.61 using VWV change. Thus, the results demonstrate that texture features are more sensitive in detecting drug effects on the carotid vessel wall than VWV change.
Sultan, Mohammad M; Kiss, Gert; Shukla, Diwakar; Pande, Vijay S
2014-12-09
Given the large number of crystal structures and NMR ensembles that have been solved to date, classical molecular dynamics (MD) simulations have become powerful tools in the atomistic study of the kinetics and thermodynamics of biomolecular systems on ever increasing time scales. By virtue of the high-dimensional conformational state space that is explored, the interpretation of large-scale simulations faces difficulties not unlike those in the big data community. We address this challenge by introducing a method called clustering based feature selection (CB-FS) that employs a posterior analysis approach. It combines supervised machine learning (SML) and feature selection with Markov state models to automatically identify the relevant degrees of freedom that separate conformational states. We highlight the utility of the method in the evaluation of large-scale simulations and show that it can be used for the rapid and automated identification of relevant order parameters involved in the functional transitions of two exemplary cell-signaling proteins central to human disease states.
NASA Astrophysics Data System (ADS)
Bhardwaj, Kaushal; Patra, Swarnajyoti
2018-04-01
Inclusion of spatial information along with spectral features play a significant role in classification of remote sensing images. Attribute profiles have already proved their ability to represent spatial information. In order to incorporate proper spatial information, multiple attributes are required and for each attribute large profiles need to be constructed by varying the filter parameter values within a wide range. Thus, the constructed profiles that represent spectral-spatial information of an hyperspectral image have huge dimension which leads to Hughes phenomenon and increases computational burden. To mitigate these problems, this work presents an unsupervised feature selection technique that selects a subset of filtered image from the constructed high dimensional multi-attribute profile which are sufficiently informative to discriminate well among classes. In this regard the proposed technique exploits genetic algorithms (GAs). The fitness function of GAs are defined in an unsupervised way with the help of mutual information. The effectiveness of the proposed technique is assessed using one-against-all support vector machine classifier. The experiments conducted on three hyperspectral data sets show the robustness of the proposed method in terms of computation time and classification accuracy.
Ho, Sirikit; Lukacs, Zoltan; Hoffmann, Georg F; Lindner, Martin; Wetter, Thomas
2007-07-01
In newborn screening with tandem mass spectrometry, multiple intermediary metabolites are quantified in a single analytical run for the diagnosis of fatty-acid oxidation disorders, organic acidurias, and aminoacidurias. Published diagnostic criteria for these disorders normally incorporate a primary metabolic marker combined with secondary markers, often analyte ratios, for which the markers have been chosen to reflect metabolic pathway deviations. We applied a procedure to extract new markers and diagnostic criteria for newborn screening to the data of newborns with confirmed medium-chain acyl-CoA dehydrogenase deficiency (MCADD) and a control group from the newborn screening program, Heidelberg, Germany. We validated the results with external data of the screening center in Hamburg, Germany. We extracted new markers by performing a systematic search for analyte combinations (features) with high discriminatory performance for MCADD. To select feature thresholds, we applied automated procedures to separate controls and cases on the basis of the feature values. Finally, we built classifiers from these new markers to serve as diagnostic criteria in screening for MCADD. On the basis of chi(2) scores, we identified approximately 800 of >628,000 new analyte combinations with superior discriminatory performance compared with the best published combinations. Classifiers built with the new features achieved diagnostic sensitivities and specificities approaching 100%. Feature construction methods provide ways to disclose information hidden in the set of measured analytes. Other diagnostic tasks based on high-dimensional metabolic data might also profit from this approach.
Online 3D Ear Recognition by Combining Global and Local Features.
Liu, Yahui; Zhang, Bob; Lu, Guangming; Zhang, David
2016-01-01
The three-dimensional shape of the ear has been proven to be a stable candidate for biometric authentication because of its desirable properties such as universality, uniqueness, and permanence. In this paper, a special laser scanner designed for online three-dimensional ear acquisition was described. Based on the dataset collected by our scanner, two novel feature classes were defined from a three-dimensional ear image: the global feature class (empty centers and angles) and local feature class (points, lines, and areas). These features are extracted and combined in an optimal way for three-dimensional ear recognition. Using a large dataset consisting of 2,000 samples, the experimental results illustrate the effectiveness of fusing global and local features, obtaining an equal error rate of 2.2%.
Online 3D Ear Recognition by Combining Global and Local Features
Liu, Yahui; Zhang, Bob; Lu, Guangming; Zhang, David
2016-01-01
The three-dimensional shape of the ear has been proven to be a stable candidate for biometric authentication because of its desirable properties such as universality, uniqueness, and permanence. In this paper, a special laser scanner designed for online three-dimensional ear acquisition was described. Based on the dataset collected by our scanner, two novel feature classes were defined from a three-dimensional ear image: the global feature class (empty centers and angles) and local feature class (points, lines, and areas). These features are extracted and combined in an optimal way for three-dimensional ear recognition. Using a large dataset consisting of 2,000 samples, the experimental results illustrate the effectiveness of fusing global and local features, obtaining an equal error rate of 2.2%. PMID:27935955
NASA Astrophysics Data System (ADS)
Shah, Syed Muhammad Saqlain; Batool, Safeera; Khan, Imran; Ashraf, Muhammad Usman; Abbas, Syed Hussnain; Hussain, Syed Adnan
2017-09-01
Automatic diagnosis of human diseases are mostly achieved through decision support systems. The performance of these systems is mainly dependent on the selection of the most relevant features. This becomes harder when the dataset contains missing values for the different features. Probabilistic Principal Component Analysis (PPCA) has reputation to deal with the problem of missing values of attributes. This research presents a methodology which uses the results of medical tests as input, extracts a reduced dimensional feature subset and provides diagnosis of heart disease. The proposed methodology extracts high impact features in new projection by using Probabilistic Principal Component Analysis (PPCA). PPCA extracts projection vectors which contribute in highest covariance and these projection vectors are used to reduce feature dimension. The selection of projection vectors is done through Parallel Analysis (PA). The feature subset with the reduced dimension is provided to radial basis function (RBF) kernel based Support Vector Machines (SVM). The RBF based SVM serves the purpose of classification into two categories i.e., Heart Patient (HP) and Normal Subject (NS). The proposed methodology is evaluated through accuracy, specificity and sensitivity over the three datasets of UCI i.e., Cleveland, Switzerland and Hungarian. The statistical results achieved through the proposed technique are presented in comparison to the existing research showing its impact. The proposed technique achieved an accuracy of 82.18%, 85.82% and 91.30% for Cleveland, Hungarian and Switzerland dataset respectively.
Developing a radiomics framework for classifying non-small cell lung carcinoma subtypes
NASA Astrophysics Data System (ADS)
Yu, Dongdong; Zang, Yali; Dong, Di; Zhou, Mu; Gevaert, Olivier; Fang, Mengjie; Shi, Jingyun; Tian, Jie
2017-03-01
Patient-targeted treatment of non-small cell lung carcinoma (NSCLC) has been well documented according to the histologic subtypes over the past decade. In parallel, recent development of quantitative image biomarkers has recently been highlighted as important diagnostic tools to facilitate histological subtype classification. In this study, we present a radiomics analysis that classifies the adenocarcinoma (ADC) and squamous cell carcinoma (SqCC). We extract 52-dimensional, CT-based features (7 statistical features and 45 image texture features) to represent each nodule. We evaluate our approach on a clinical dataset including 324 ADCs and 110 SqCCs patients with CT image scans. Classification of these features is performed with four different machine-learning classifiers including Support Vector Machines with Radial Basis Function kernel (RBF-SVM), Random forest (RF), K-nearest neighbor (KNN), and RUSBoost algorithms. To improve the classifiers' performance, optimal feature subset is selected from the original feature set by using an iterative forward inclusion and backward eliminating algorithm. Extensive experimental results demonstrate that radiomics features achieve encouraging classification results on both complete feature set (AUC=0.89) and optimal feature subset (AUC=0.91).
NASA Astrophysics Data System (ADS)
Wang, Dong
2016-03-01
Gears are the most commonly used components in mechanical transmission systems. Their failures may cause transmission system breakdown and result in economic loss. Identification of different gear crack levels is important to prevent any unexpected gear failure because gear cracks lead to gear tooth breakage. Signal processing based methods mainly require expertize to explain gear fault signatures which is usually not easy to be achieved by ordinary users. In order to automatically identify different gear crack levels, intelligent gear crack identification methods should be developed. The previous case studies experimentally proved that K-nearest neighbors based methods exhibit high prediction accuracies for identification of 3 different gear crack levels under different motor speeds and loads. In this short communication, to further enhance prediction accuracies of existing K-nearest neighbors based methods and extend identification of 3 different gear crack levels to identification of 5 different gear crack levels, redundant statistical features are constructed by using Daubechies 44 (db44) binary wavelet packet transform at different wavelet decomposition levels, prior to the use of a K-nearest neighbors method. The dimensionality of redundant statistical features is 620, which provides richer gear fault signatures. Since many of these statistical features are redundant and highly correlated with each other, dimensionality reduction of redundant statistical features is conducted to obtain new significant statistical features. At last, the K-nearest neighbors method is used to identify 5 different gear crack levels under different motor speeds and loads. A case study including 3 experiments is investigated to demonstrate that the developed method provides higher prediction accuracies than the existing K-nearest neighbors based methods for recognizing different gear crack levels under different motor speeds and loads. Based on the new significant statistical features, some other popular statistical models including linear discriminant analysis, quadratic discriminant analysis, classification and regression tree and naive Bayes classifier, are compared with the developed method. The results show that the developed method has the highest prediction accuracies among these statistical models. Additionally, selection of the number of new significant features and parameter selection of K-nearest neighbors are thoroughly investigated.
Use of upscaled elevation and surface roughness data in two-dimensional surface water models
Hughes, J.D.; Decker, J.D.; Langevin, C.D.
2011-01-01
In this paper, we present an approach that uses a combination of cell-block- and cell-face-averaging of high-resolution cell elevation and roughness data to upscale hydraulic parameters and accurately simulate surface water flow in relatively low-resolution numerical models. The method developed allows channelized features that preferentially connect large-scale grid cells at cell interfaces to be represented in models where these features are significantly smaller than the selected grid size. The developed upscaling approach has been implemented in a two-dimensional finite difference model that solves a diffusive wave approximation of the depth-integrated shallow surface water equations using preconditioned Newton–Krylov methods. Computational results are presented to show the effectiveness of the mixed cell-block and cell-face averaging upscaling approach in maintaining model accuracy, reducing model run-times, and how decreased grid resolution affects errors. Application examples demonstrate that sub-grid roughness coefficient variations have a larger effect on simulated error than sub-grid elevation variations.
Two-Dimensional-Material Membranes: A New Family of High-Performance Separation Membranes.
Liu, Gongping; Jin, Wanqin; Xu, Nanping
2016-10-17
Two-dimensional (2D) materials of atomic thickness have emerged as nano-building blocks to develop high-performance separation membranes that feature unique nanopores and/or nanochannels. These 2D-material membranes exhibit extraordinary permeation properties, opening a new avenue to ultra-fast and highly selective membranes for water and gas separation. Summarized in this Minireview are the latest ground-breaking studies in 2D-material membranes as nanosheet and laminar membranes, with a focus on starting materials, nanostructures, and transport properties. Challenges and future directions of 2D-material membranes for wide implementation are discussed briefly. © 2016 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Hypergraph-based anomaly detection of high-dimensional co-occurrences.
Silva, Jorge; Willett, Rebecca
2009-03-01
This paper addresses the problem of detecting anomalous multivariate co-occurrences using a limited number of unlabeled training observations. A novel method based on using a hypergraph representation of the data is proposed to deal with this very high-dimensional problem. Hypergraphs constitute an important extension of graphs which allow edges to connect more than two vertices simultaneously. A variational Expectation-Maximization algorithm for detecting anomalies directly on the hypergraph domain without any feature selection or dimensionality reduction is presented. The resulting estimate can be used to calculate a measure of anomalousness based on the False Discovery Rate. The algorithm has O(np) computational complexity, where n is the number of training observations and p is the number of potential participants in each co-occurrence event. This efficiency makes the method ideally suited for very high-dimensional settings, and requires no tuning, bandwidth or regularization parameters. The proposed approach is validated on both high-dimensional synthetic data and the Enron email database, where p > 75,000, and it is shown that it can outperform other state-of-the-art methods.
Analyzing multicomponent receptive fields from neural responses to natural stimuli
Rowekamp, Ryan; Sharpee, Tatyana O
2011-01-01
The challenge of building increasingly better models of neural responses to natural stimuli is to accurately estimate the multiple stimulus features that may jointly affect the neural spike probability. The selectivity for combinations of features is thought to be crucial for achieving classical properties of neural responses such as contrast invariance. The joint search for these multiple stimulus features is difficult because estimating spike probability as a multidimensional function of stimulus projections onto candidate relevant dimensions is subject to the curse of dimensionality. An attractive alternative is to search for relevant dimensions sequentially, as in projection pursuit regression. Here we demonstrate using analytic arguments and simulations of model cells that different types of sequential search strategies exhibit systematic biases when used with natural stimuli. Simulations show that joint optimization is feasible for up to three dimensions with current algorithms. When applied to the responses of V1 neurons to natural scenes, models based on three jointly optimized dimensions had better predictive power in a majority of cases compared to dimensions optimized sequentially, with different sequential methods yielding comparable results. Thus, although the curse of dimensionality remains, at least several relevant dimensions can be estimated by joint information maximization. PMID:21780916
NASA Astrophysics Data System (ADS)
Li, Na; Gong, Xingyu; Li, Hongan; Jia, Pengtao
2018-01-01
For faded relics, such as Terracotta Army, the 2D-3D registration between an optical camera and point cloud model is an important part for color texture reconstruction and further applications. This paper proposes a nonuniform multiview color texture mapping for the image sequence and the three-dimensional (3D) model of point cloud collected by Handyscan3D. We first introduce nonuniform multiview calibration, including the explanation of its algorithm principle and the analysis of its advantages. We then establish transformation equations based on sift feature points for the multiview image sequence. At the same time, the selection of nonuniform multiview sift feature points is introduced in detail. Finally, the solving process of the collinear equations based on multiview perspective projection is given with three steps and the flowchart. In the experiment, this method is applied to the color reconstruction of the kneeling figurine, Tangsancai lady, and general figurine. These results demonstrate that the proposed method provides an effective support for the color reconstruction of the faded cultural relics and be able to improve the accuracy of 2D-3D registration between the image sequence and the point cloud model.
Image Recommendation Algorithm Using Feature-Based Collaborative Filtering
NASA Astrophysics Data System (ADS)
Kim, Deok-Hwan
As the multimedia contents market continues its rapid expansion, the amount of image contents used in mobile phone services, digital libraries, and catalog service is increasing remarkably. In spite of this rapid growth, users experience high levels of frustration when searching for the desired image. Even though new images are profitable to the service providers, traditional collaborative filtering methods cannot recommend them. To solve this problem, in this paper, we propose feature-based collaborative filtering (FBCF) method to reflect the user's most recent preference by representing his purchase sequence in the visual feature space. The proposed approach represents the images that have been purchased in the past as the feature clusters in the multi-dimensional feature space and then selects neighbors by using an inter-cluster distance function between their feature clusters. Various experiments using real image data demonstrate that the proposed approach provides a higher quality recommendation and better performance than do typical collaborative filtering and content-based filtering techniques.
Takahama, Sachiko; Saiki, Jun
2014-01-01
Information on an object's features bound to its location is very important for maintaining object representations in visual working memory. Interactions with dynamic multi-dimensional objects in an external environment require complex cognitive control, including the selective maintenance of feature-location binding. Here, we used event-related functional magnetic resonance imaging to investigate brain activity and functional connectivity related to the maintenance of complex feature-location binding. Participants were required to detect task-relevant changes in feature-location binding between objects defined by color, orientation, and location. We compared a complex binding task requiring complex feature-location binding (color-orientation-location) with a simple binding task in which simple feature-location binding, such as color-location, was task-relevant and the other feature was task-irrelevant. Univariate analyses showed that the dorsolateral prefrontal cortex (DLPFC), hippocampus, and frontoparietal network were activated during the maintenance of complex feature-location binding. Functional connectivity analyses indicated cooperation between the inferior precentral sulcus (infPreCS), DLPFC, and hippocampus during the maintenance of complex feature-location binding. In contrast, the connectivity for the spatial updating of simple feature-location binding determined by reanalyzing the data from Takahama et al. (2010) demonstrated that the superior parietal lobule (SPL) cooperated with the DLPFC and hippocampus. These results suggest that the connectivity for complex feature-location binding does not simply reflect general memory load and that the DLPFC and hippocampus flexibly modulate the dorsal frontoparietal network, depending on the task requirements, with the infPreCS involved in the maintenance of complex feature-location binding and the SPL involved in the spatial updating of simple feature-location binding. PMID:24917833
Takahama, Sachiko; Saiki, Jun
2014-01-01
Information on an object's features bound to its location is very important for maintaining object representations in visual working memory. Interactions with dynamic multi-dimensional objects in an external environment require complex cognitive control, including the selective maintenance of feature-location binding. Here, we used event-related functional magnetic resonance imaging to investigate brain activity and functional connectivity related to the maintenance of complex feature-location binding. Participants were required to detect task-relevant changes in feature-location binding between objects defined by color, orientation, and location. We compared a complex binding task requiring complex feature-location binding (color-orientation-location) with a simple binding task in which simple feature-location binding, such as color-location, was task-relevant and the other feature was task-irrelevant. Univariate analyses showed that the dorsolateral prefrontal cortex (DLPFC), hippocampus, and frontoparietal network were activated during the maintenance of complex feature-location binding. Functional connectivity analyses indicated cooperation between the inferior precentral sulcus (infPreCS), DLPFC, and hippocampus during the maintenance of complex feature-location binding. In contrast, the connectivity for the spatial updating of simple feature-location binding determined by reanalyzing the data from Takahama et al. (2010) demonstrated that the superior parietal lobule (SPL) cooperated with the DLPFC and hippocampus. These results suggest that the connectivity for complex feature-location binding does not simply reflect general memory load and that the DLPFC and hippocampus flexibly modulate the dorsal frontoparietal network, depending on the task requirements, with the infPreCS involved in the maintenance of complex feature-location binding and the SPL involved in the spatial updating of simple feature-location binding.
Sampling and Visualizing Creases with Scale-Space Particles
Kindlmann, Gordon L.; Estépar, Raúl San José; Smith, Stephen M.; Westin, Carl-Fredrik
2010-01-01
Particle systems have gained importance as a methodology for sampling implicit surfaces and segmented objects to improve mesh generation and shape analysis. We propose that particle systems have a significantly more general role in sampling structure from unsegmented data. We describe a particle system that computes samplings of crease features (i.e. ridges and valleys, as lines or surfaces) that effectively represent many anatomical structures in scanned medical data. Because structure naturally exists at a range of sizes relative to the image resolution, computer vision has developed the theory of scale-space, which considers an n-D image as an (n + 1)-D stack of images at different blurring levels. Our scale-space particles move through continuous four-dimensional scale-space according to spatial constraints imposed by the crease features, a particle-image energy that draws particles towards scales of maximal feature strength, and an inter-particle energy that controls sampling density in space and scale. To make scale-space practical for large three-dimensional data, we present a spline-based interpolation across scale from a small number of pre-computed blurrings at optimally selected scales. The configuration of the particle system is visualized with tensor glyphs that display information about the local Hessian of the image, and the scale of the particle. We use scale-space particles to sample the complex three-dimensional branching structure of airways in lung CT, and the major white matter structures in brain DTI. PMID:19834216
Microscale patterning of thermoplastic polymer surfaces by selective solvent swelling.
Rahmanian, Omid; Chen, Chien-Fu; DeVoe, Don L
2012-09-04
A new method for the fabrication of microscale features in thermoplastic substrates is presented. Unlike traditional thermoplastic microfabrication techniques, in which bulk polymer is displaced from the substrate by machining or embossing, a unique process termed orogenic microfabrication has been developed in which selected regions of a thermoplastic surface are raised from the substrate by an irreversible solvent swelling mechanism. The orogenic technique allows thermoplastic surfaces to be patterned using a variety of masking methods, resulting in three-dimensional features that would be difficult to achieve through traditional microfabrication methods. Using cyclic olefin copolymer as a model thermoplastic material, several variations of this process are described to realize growth heights ranging from several nanometers to tens of micrometers, with patterning techniques include direct photoresist masking, patterned UV/ozone surface passivation, elastomeric stamping, and noncontact spotting. Orogenic microfabrication is also demonstrated by direct inkjet printing as a facile photolithography-free masking method for rapid desktop thermoplastic microfabrication.
Musumeci, Matias A.; Lozada, Mariana; Rial, Daniela V.; ...
2017-04-09
The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer-Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemo- selectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putativemore » monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. As a result, this work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Musumeci, Matias A.; Lozada, Mariana; Rial, Daniela V.
The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer-Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemo- selectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putativemore » monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. As a result, this work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments.« less
Musumeci, Matías A; Lozada, Mariana; Rial, Daniela V; Mac Cormack, Walter P; Jansson, Janet K; Sjöling, Sara; Carroll, JoLynn; Dionisi, Hebe M
2017-04-09
The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer-Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemo- selectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putative monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. This work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments.
Musumeci, Matías A.; Lozada, Mariana; Rial, Daniela V.; Mac Cormack, Walter P.; Jansson, Janet K.; Sjöling, Sara; Carroll, JoLynn; Dionisi, Hebe M.
2017-01-01
The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer–Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemo- selectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putative monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. This work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments. PMID:28397770
NASA Astrophysics Data System (ADS)
Haitjema, Henk M.
1985-10-01
A technique is presented to incorporate three-dimensional flow in a Dupuit-Forchheimer model. The method is based on superposition of approximate analytic solutions to both two- and three-dimensional flow features in a confined aquifer of infinite extent. Three-dimensional solutions are used in the domain of interest, while farfield conditions are represented by two-dimensional solutions. Approximate three- dimensional solutions have been derived for a partially penetrating well and a shallow creek. Each of these solutions satisfies the condition that no flow occurs across the confining layers of the aquifer. Because of this condition, the flow at some distance of a three-dimensional feature becomes nearly horizontal. Consequently, remotely from a three-dimensional feature, its three-dimensional solution is replaced by a corresponding two-dimensional one. The latter solution is trivial as compared to its three-dimensional counterpart, and its use greatly enhances the computational efficiency of the model. As an example, the flow is modeled between a partially penetrating well and a shallow creek that occur in a regional aquifer system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lopez, Juan; Liefer, Nathan C.; Busho, Colin R.
Here, the need for improved Critical Infrastructure and Key Resource (CIKR) security is unquestioned and there has been minimal emphasis on Level-0 (PHY Process) improvements. Wired Signal Distinct Native Attribute (WS-DNA) Fingerprinting is investigated here as a non-intrusive PHY-based security augmentation to support an envisioned layered security strategy. Results are based on experimental response collections from Highway Addressable Remote Transducer (HART) Differential Pressure Transmitter (DPT) devices from three manufacturers (Yokogawa, Honeywell, Endress+Hauer) installed in an automated process control system. Device discrimination is assessed using Time Domain (TD) and Slope-Based FSK (SB-FSK) fingerprints input to Multiple Discriminant Analysis, Maximum Likelihood (MDA/ML)more » and Random Forest (RndF) classifiers. For 12 different classes (two devices per manufacturer at two distinct set points), both classifiers performed reliably and achieved an arbitrary performance benchmark of average cross-class percent correct of %C > 90%. The least challenging cross-manufacturer results included near-perfect %C ≈ 100%, while the more challenging like-model (serial number) discrimination results included 90%< %C < 100%, with TD Fingerprinting marginally outperforming SB-FSK Fingerprinting; SB-FSK benefits from having less stringent response alignment and registration requirements. The RndF classifier was most beneficial and enabled reliable selection of dimensionally reduced fingerprint subsets that minimize data storage and computational requirements. The RndF selected feature sets contained 15% of the full-dimensional feature sets and only suffered a worst case %CΔ = 3% to 4% performance degradation.« less
Bearing Fault Diagnosis Based on Statistical Locally Linear Embedding
Wang, Xiang; Zheng, Yuan; Zhao, Zhenzhou; Wang, Jinping
2015-01-01
Fault diagnosis is essentially a kind of pattern recognition. The measured signal samples usually distribute on nonlinear low-dimensional manifolds embedded in the high-dimensional signal space, so how to implement feature extraction, dimensionality reduction and improve recognition performance is a crucial task. In this paper a novel machinery fault diagnosis approach based on a statistical locally linear embedding (S-LLE) algorithm which is an extension of LLE by exploiting the fault class label information is proposed. The fault diagnosis approach first extracts the intrinsic manifold features from the high-dimensional feature vectors which are obtained from vibration signals that feature extraction by time-domain, frequency-domain and empirical mode decomposition (EMD), and then translates the complex mode space into a salient low-dimensional feature space by the manifold learning algorithm S-LLE, which outperforms other feature reduction methods such as PCA, LDA and LLE. Finally in the feature reduction space pattern classification and fault diagnosis by classifier are carried out easily and rapidly. Rolling bearing fault signals are used to validate the proposed fault diagnosis approach. The results indicate that the proposed approach obviously improves the classification performance of fault pattern recognition and outperforms the other traditional approaches. PMID:26153771
Discrete Biogeography Based Optimization for Feature Selection in Molecular Signatures.
Liu, Bo; Tian, Meihong; Zhang, Chunhua; Li, Xiangtao
2015-04-01
Biomarker discovery from high-dimensional data is a complex task in the development of efficient cancer diagnoses and classification. However, these data are usually redundant and noisy, and only a subset of them present distinct profiles for different classes of samples. Thus, selecting high discriminative genes from gene expression data has become increasingly interesting in the field of bioinformatics. In this paper, a discrete biogeography based optimization is proposed to select the good subset of informative gene relevant to the classification. In the proposed algorithm, firstly, the fisher-markov selector is used to choose fixed number of gene data. Secondly, to make biogeography based optimization suitable for the feature selection problem; discrete migration model and discrete mutation model are proposed to balance the exploration and exploitation ability. Then, discrete biogeography based optimization, as we called DBBO, is proposed by integrating discrete migration model and discrete mutation model. Finally, the DBBO method is used for feature selection, and three classifiers are used as the classifier with the 10 fold cross-validation method. In order to show the effective and efficiency of the algorithm, the proposed algorithm is tested on four breast cancer dataset benchmarks. Comparison with genetic algorithm, particle swarm optimization, differential evolution algorithm and hybrid biogeography based optimization, experimental results demonstrate that the proposed method is better or at least comparable with previous method from literature when considering the quality of the solutions obtained. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Feeling form: the neural basis of haptic shape perception.
Yau, Jeffrey M; Kim, Sung Soo; Thakur, Pramodsingh H; Bensmaia, Sliman J
2016-02-01
The tactile perception of the shape of objects critically guides our ability to interact with them. In this review, we describe how shape information is processed as it ascends the somatosensory neuraxis of primates. At the somatosensory periphery, spatial form is represented in the spatial patterns of activation evoked across populations of mechanoreceptive afferents. In the cerebral cortex, neurons respond selectively to particular spatial features, like orientation and curvature. While feature selectivity of neurons in the earlier processing stages can be understood in terms of linear receptive field models, higher order somatosensory neurons exhibit nonlinear response properties that result in tuning for more complex geometrical features. In fact, tactile shape processing bears remarkable analogies to its visual counterpart and the two may rely on shared neural circuitry. Furthermore, one of the unique aspects of primate somatosensation is that it contains a deformable sensory sheet. Because the relative positions of cutaneous mechanoreceptors depend on the conformation of the hand, the haptic perception of three-dimensional objects requires the integration of cutaneous and proprioceptive signals, an integration that is observed throughout somatosensory cortex. Copyright © 2016 the American Physiological Society.
ERIC Educational Resources Information Center
Chan, Louis K. H.; Hayward, William G.
2009-01-01
In feature integration theory (FIT; A. Treisman & S. Sato, 1990), feature detection is driven by independent dimensional modules, and other searches are driven by a master map of locations that integrates dimensional information into salience signals. Although recent theoretical models have largely abandoned this distinction, some observed…
Ma, Li; Fan, Suohai
2017-03-14
The random forests algorithm is a type of classifier with prominent universality, a wide application range, and robustness for avoiding overfitting. But there are still some drawbacks to random forests. Therefore, to improve the performance of random forests, this paper seeks to improve imbalanced data processing, feature selection and parameter optimization. We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that the combination of Clustering Using Representatives (CURE) enhances the original synthetic minority oversampling technique (SMOTE) algorithms effectively compared with the classification results on the original data using random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, the hybrid RF (random forests) algorithm has been proposed for feature selection and parameter optimization, which uses the minimum out of bag (OOB) data error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms, hybrid genetic-random forests algorithm, hybrid particle swarm-random forests algorithm and hybrid fish swarm-random forests algorithm can achieve the minimum OOB error and show the best generalization ability. The training set produced from the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise. Thus, better classification results are produced from this feasible and effective algorithm. Moreover, the hybrid algorithm's F-value, G-mean, AUC and OOB scores demonstrate that they surpass the performance of the original RF algorithm. Hence, this hybrid algorithm provides a new way to perform feature selection and parameter optimization.
Oded, Meirav; Kelly, Stephen T.; Gilles, Mary K.; ...
2016-07-05
The combination of block copolymer templating with electrostatic self-assembly provides a simple and robust method for creating nano-patterned polyelectrolyte multilayers over large areas. The deposition of the first polyelectrolyte layer provides important insights on the initial stages of multilayer buildup. Here, we focus on two-dimensionally confined “dots” patterns afforded by block copolymer films featuring hexagonally-packed cylinders that are oriented normal to the substrate. Rendering the cylinder caps positively charged enables the selective deposition of negatively charged polyelectrolytes on them under salt-free conditions. The initially formed polyelectrolyte nanostructures adopt a toroidal (“doughnut”) shape, which results from retraction of dangling polyelectrolyte segmentsmore » into the “dots” upon drying. With increasing exposure time to the polyelectrolyte solution, the final shape of the deposited polyelectrolyte transitions from a doughnut to a hemisphere. In conclusion, these insights would enable the creation of patterned polyelectrolyte multilayers with increased control over adsorption selectivity of the additional incoming polyelectrolytes.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oded, Meirav; Kelly, Stephen T.; Gilles, Mary K.
The combination of block copolymer templating with electrostatic self-assembly provides a simple and robust method for creating nano-patterned polyelectrolyte multilayers over large areas. The deposition of the first polyelectrolyte layer provides important insights on the initial stages of multilayer buildup. Here, we focus on two-dimensionally confined “dots” patterns afforded by block copolymer films featuring hexagonally-packed cylinders that are oriented normal to the substrate. Rendering the cylinder caps positively charged enables the selective deposition of negatively charged polyelectrolytes on them under salt-free conditions. The initially formed polyelectrolyte nanostructures adopt a toroidal (“doughnut”) shape, which results from retraction of dangling polyelectrolyte segmentsmore » into the “dots” upon drying. With increasing exposure time to the polyelectrolyte solution, the final shape of the deposited polyelectrolyte transitions from a doughnut to a hemisphere. In conclusion, these insights would enable the creation of patterned polyelectrolyte multilayers with increased control over adsorption selectivity of the additional incoming polyelectrolytes.« less
NASA Technical Reports Server (NTRS)
Wilson, R. B.; Banerjee, P. K.
1987-01-01
This Annual Status Report presents the results of work performed during the third year of the 3-D Inelastic Analysis Methods for Hot Sections Components program (NASA Contract NAS3-23697). The objective of the program is to produce a series of computer codes that permit more accurate and efficient three-dimensional analyses of selected hot section components, i.e., combustor liners, turbine blades, and turbine vanes. The computer codes embody a progression of mathematical models and are streamlined to take advantage of geometrical features, loading conditions, and forms of material response that distinguish each group of selected components.
Shahidi, Shoaleh; Bahrampour, Ehsan; Soltanimehr, Elham; Zamani, Ali; Oshagh, Morteza; Moattari, Marzieh; Mehdizadeh, Alireza
2014-09-16
Two-dimensional projection radiographs have been traditionally considered the modality of choice for cephalometric analysis. To overcome the shortcomings of two-dimensional images, three-dimensional computed tomography (CT) has been used to evaluate craniofacial structures. However, manual landmark detection depends on medical expertise, and the process is time-consuming. The present study was designed to produce software capable of automated localization of craniofacial landmarks on cone beam (CB) CT images based on image registration and to evaluate its accuracy. The software was designed using MATLAB programming language. The technique was a combination of feature-based (principal axes registration) and voxel similarity-based methods for image registration. A total of 8 CBCT images were selected as our reference images for creating a head atlas. Then, 20 CBCT images were randomly selected as the test images for evaluating the method. Three experts twice located 14 landmarks in all 28 CBCT images during two examinations set 6 weeks apart. The differences in the distances of coordinates of each landmark on each image between manual and automated detection methods were calculated and reported as mean errors. The combined intraclass correlation coefficient for intraobserver reliability was 0.89 and for interobserver reliability 0.87 (95% confidence interval, 0.82 to 0.93). The mean errors of all 14 landmarks were <4 mm. Additionally, 63.57% of landmarks had a mean error of <3 mm compared with manual detection (gold standard method). The accuracy of our approach for automated localization of craniofacial landmarks, which was based on combining feature-based and voxel similarity-based methods for image registration, was acceptable. Nevertheless we recommend repetition of this study using other techniques, such as intensity-based methods.
Novel gene sets improve set-level classification of prokaryotic gene expression data.
Holec, Matěj; Kuželka, Ondřej; Železný, Filip
2015-10-28
Set-level classification of gene expression data has received significant attention recently. In this setting, high-dimensional vectors of features corresponding to genes are converted into lower-dimensional vectors of features corresponding to biologically interpretable gene sets. The dimensionality reduction brings the promise of a decreased risk of overfitting, potentially resulting in improved accuracy of the learned classifiers. However, recent empirical research has not confirmed this expectation. Here we hypothesize that the reported unfavorable classification results in the set-level framework were due to the adoption of unsuitable gene sets defined typically on the basis of the Gene ontology and the KEGG database of metabolic networks. We explore an alternative approach to defining gene sets, based on regulatory interactions, which we expect to collect genes with more correlated expression. We hypothesize that such more correlated gene sets will enable to learn more accurate classifiers. We define two families of gene sets using information on regulatory interactions, and evaluate them on phenotype-classification tasks using public prokaryotic gene expression data sets. From each of the two gene-set families, we first select the best-performing subtype. The two selected subtypes are then evaluated on independent (testing) data sets against state-of-the-art gene sets and against the conventional gene-level approach. The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. Novel gene sets defined on the basis of regulatory interactions improve set-level classification of gene expression data. The experimental scripts and other material needed to reproduce the experiments are available at http://ida.felk.cvut.cz/novelgenesets.tar.gz.
Cinelli, Mattia; Sun, Yuxin; Best, Katharine; Heather, James M; Reich-Zeliger, Shlomit; Shifrut, Eric; Friedman, Nir; Shawe-Taylor, John; Chain, Benny
2017-04-01
Somatic DNA recombination, the hallmark of vertebrate adaptive immunity, has the potential to generate a vast diversity of antigen receptor sequences. How this diversity captures antigen specificity remains incompletely understood. In this study we use high throughput sequencing to compare the global changes in T cell receptor β chain complementarity determining region 3 (CDR3β) sequences following immunization with ovalbumin administered with complete Freund's adjuvant (CFA) or CFA alone. The CDR3β sequences were deconstructed into short stretches of overlapping contiguous amino acids. The motifs were ranked according to a one-dimensional Bayesian classifier score comparing their frequency in the repertoires of the two immunization classes. The top ranking motifs were selected and used to create feature vectors which were used to train a support vector machine. The support vector machine achieved high classification scores in a leave-one-out validation test reaching >90% in some cases. The study describes a novel two-stage classification strategy combining a one-dimensional Bayesian classifier with a support vector machine. Using this approach we demonstrate that the frequency of a small number of linear motifs three amino acids in length can accurately identify a CD4 T cell response to ovalbumin against a background response to the complex mixture of antigens which characterize Complete Freund's Adjuvant. The sequence data is available at www.ncbi.nlm.nih.gov/sra/?term¼SRP075893 . The Decombinator package is available at github.com/innate2adaptive/Decombinator . The R package e1071 is available at the CRAN repository https://cran.r-project.org/web/packages/e1071/index.html . b.chain@ucl.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
NASA Astrophysics Data System (ADS)
Heus, Thijs; Jonker, Harm J. J.; van den Akker, Harry E. A.; Griffith, Eric J.; Koutek, Michal; Post, Frits H.
2009-03-01
In this study, a new method is developed to investigate the entire life cycle of shallow cumuli in large eddy simulations. Although trained observers have no problem in distinguishing the different life stages of a cloud, this process proves difficult to automate, because cloud-splitting and cloud-merging events complicate the distinction between a single system divided in several cloudy parts and two independent systems that collided. Because the human perception is well equipped to capture and to make sense of these time-dependent three-dimensional features, a combination of automated constraints and human inspection in a three-dimensional virtual reality environment is used to select clouds that are exemplary in their behavior throughout their entire life span. Three specific cases (ARM, BOMEX, and BOMEX without large-scale forcings) are analyzed in this way, and the considerable number of selected clouds warrants reliable statistics of cloud properties conditioned on the phase in their life cycle. The most dominant feature in this statistical life cycle analysis is the pulsating growth that is present throughout the entire lifetime of the cloud, independent of the case and of the large-scale forcings. The pulses are a self-sustained phenomenon, driven by a balance between buoyancy and horizontal convergence of dry air. The convective inhibition just above the cloud base plays a crucial role as a barrier for the cloud to overcome in its infancy stage, and as a buffer region later on, ensuring a steady supply of buoyancy into the cloud.
Latent feature decompositions for integrative analysis of multi-platform genomic data
Gregory, Karl B.; Momin, Amin A.; Coombes, Kevin R.; Baladandayuthapani, Veerabhadran
2015-01-01
Increased availability of multi-platform genomics data on matched samples has sparked research efforts to discover how diverse molecular features interact both within and between platforms. In addition, simultaneous measurements of genetic and epigenetic characteristics illuminate the roles their complex relationships play in disease progression and outcomes. However, integrative methods for diverse genomics data are faced with the challenges of ultra-high dimensionality and the existence of complex interactions both within and between platforms. We propose a novel modeling framework for integrative analysis based on decompositions of the large number of platform-specific features into a smaller number of latent features. Subsequently we build a predictive model for clinical outcomes accounting for both within- and between-platform interactions based on Bayesian model averaging procedures. Principal components, partial least squares and non-negative matrix factorization as well as sparse counterparts of each are used to define the latent features, and the performance of these decompositions is compared both on real and simulated data. The latent feature interactions are shown to preserve interactions between the original features and not only aid prediction but also allow explicit selection of outcome-related features. The methods are motivated by and applied to, a glioblastoma multiforme dataset from The Cancer Genome Atlas to predict patient survival times integrating gene expression, microRNA, copy number and methylation data. For the glioblastoma data, we find a high concordance between our selected prognostic genes and genes with known associations with glioblastoma. In addition, our model discovers several relevant cross-platform interactions such as copy number variation associated gene dosing and epigenetic regulation through promoter methylation. On simulated data, we show that our proposed method successfully incorporates interactions within and between genomic platforms to aid accurate prediction and variable selection. Our methods perform best when principal components are used to define the latent features. PMID:26146492
NASA Astrophysics Data System (ADS)
Wisesty, Untari N.; Warastri, Riris S.; Puspitasari, Shinta Y.
2018-03-01
Cancer is one of the major causes of mordibility and mortality problems in the worldwide. Therefore, the need of a system that can analyze and identify a person suffering from a cancer by using microarray data derived from the patient’s Deoxyribonucleic Acid (DNA). But on microarray data has thousands of attributes, thus making the challenges in data processing. This is often referred to as the curse of dimensionality. Therefore, in this study built a system capable of detecting a patient whether contracted cancer or not. The algorithm used is Genetic Algorithm as feature selection and Momentum Backpropagation Neural Network as a classification method, with data used from the Kent Ridge Bio-medical Dataset. Based on system testing that has been done, the system can detect Leukemia and Colon Tumor with best accuracy equal to 98.33% for colon tumor data and 100% for leukimia data. Genetic Algorithm as feature selection algorithm can improve system accuracy, which is from 64.52% to 98.33% for colon tumor data and 65.28% to 100% for leukemia data, and the use of momentum parameters can accelerate the convergence of the system in the training process of Neural Network.
Recursive feature elimination for biomarker discovery in resting-state functional connectivity.
Ravishankar, Hariharan; Madhavan, Radhika; Mullick, Rakesh; Shetty, Teena; Marinelli, Luca; Joel, Suresh E
2016-08-01
Biomarker discovery involves finding correlations between features and clinical symptoms to aid clinical decision. This task is especially difficult in resting state functional magnetic resonance imaging (rs-fMRI) data due to low SNR, high-dimensionality of images, inter-subject and intra-subject variability and small numbers of subjects compared to the number of derived features. Traditional univariate analysis suffers from the problem of multiple comparisons. Here, we adopt an alternative data-driven method for identifying population differences in functional connectivity. We propose a machine-learning approach to down-select functional connectivity features associated with symptom severity in mild traumatic brain injury (mTBI). Using this approach, we identified functional regions with altered connectivity in mTBI. including the executive control, visual and precuneus networks. We compared functional connections at multiple resolutions to determine which scale would be more sensitive to changes related to patient recovery. These modular network-level features can be used as diagnostic tools for predicting disease severity and recovery profiles.
Feature extraction and classification algorithms for high dimensional data
NASA Technical Reports Server (NTRS)
Lee, Chulhee; Landgrebe, David
1993-01-01
Feature extraction and classification algorithms for high dimensional data are investigated. Developments with regard to sensors for Earth observation are moving in the direction of providing much higher dimensional multispectral imagery than is now possible. In analyzing such high dimensional data, processing time becomes an important factor. With large increases in dimensionality and the number of classes, processing time will increase significantly. To address this problem, a multistage classification scheme is proposed which reduces the processing time substantially by eliminating unlikely classes from further consideration at each stage. Several truncation criteria are developed and the relationship between thresholds and the error caused by the truncation is investigated. Next an approach to feature extraction for classification is proposed based directly on the decision boundaries. It is shown that all the features needed for classification can be extracted from decision boundaries. A characteristic of the proposed method arises by noting that only a portion of the decision boundary is effective in discriminating between classes, and the concept of the effective decision boundary is introduced. The proposed feature extraction algorithm has several desirable properties: it predicts the minimum number of features necessary to achieve the same classification accuracy as in the original space for a given pattern recognition problem; and it finds the necessary feature vectors. The proposed algorithm does not deteriorate under the circumstances of equal means or equal covariances as some previous algorithms do. In addition, the decision boundary feature extraction algorithm can be used both for parametric and non-parametric classifiers. Finally, some problems encountered in analyzing high dimensional data are studied and possible solutions are proposed. First, the increased importance of the second order statistics in analyzing high dimensional data is recognized. By investigating the characteristics of high dimensional data, the reason why the second order statistics must be taken into account in high dimensional data is suggested. Recognizing the importance of the second order statistics, there is a need to represent the second order statistics. A method to visualize statistics using a color code is proposed. By representing statistics using color coding, one can easily extract and compare the first and the second statistics.
Using Interactive 3D PDF for Exploring Complex Biomedical Data: Experiences and Solutions.
Newe, Axel; Becker, Linda
2016-01-01
The Portable Document Format (PDF) is the most commonly used file format for the exchange of electronic documents. A lesser-known feature of PDF is the possibility to embed three-dimensional models and to display these models interactively with a qualified reader. This technology is well suited to present, to explore and to communicate complex biomedical data. This applies in particular for data which would suffer from a loss of information if it was reduced to a static two-dimensional projection. In this article, we present applications of 3D PDF for selected scholarly and clinical use cases in the biomedical domain. Furthermore, we present a sophisticated tool for the generation of respective PDF documents.
Thomas Edwin Beechem; McDonald, Anthony E.; Ohta, Taisuke; ...
2015-10-26
Oxidation of exfoliated gallium selenide (GaSe) is investigated through Raman, photoluminescence, Auger, and X-ray photoelectron spectroscopies. Photoluminescence and Raman intensity reductions associated with spectral features of GaSe are shown to coincide with the emergence of signatures emanating from the by-products of the oxidation reaction, namely, Ga 2Se 3 and amorphous Se. Furthermore, photoinduced oxidation is initiated over a portion of a flake highlighting the potential for laser based patterning of two-dimensional heterostructures via selective oxidation.
The Design and Performance Characteristics of a Cellular Logic 3-D Image Classification Processor.
1981-04-01
34 AGARD Proc. No. 94 on Artificiel Intelligence , 217: 1-13 (1971) 7. Golay, Marcel J. E. "Hexagonal Parallel Pattern Transformations." IEEE Trans. on...nonrandom nature of the data and features must be understood in order to intelligently select a reasonable three-dimensional noise filter. This completes...tactical targets which are located hundreds of meters away and are controlled and disguised by equally intelligent human beings, the difficulty of the
Detecting multiple moving objects in crowded environments with coherent motion regions
Cheriyadat, Anil M.; Radke, Richard J.
2013-06-11
Coherent motion regions extend in time as well as space, enforcing consistency in detected objects over long time periods and making the algorithm robust to noisy or short point tracks. As a result of enforcing the constraint that selected coherent motion regions contain disjoint sets of tracks defined in a three-dimensional space including a time dimension. An algorithm operates directly on raw, unconditioned low-level feature point tracks, and minimizes a global measure of the coherent motion regions. At least one discrete moving object is identified in a time series of video images based on the trajectory similarity factors, which is a measure of a maximum distance between a pair of feature point tracks.
Development of automatic body condition scoring using a low-cost 3-dimensional Kinect camera.
Spoliansky, Roii; Edan, Yael; Parmet, Yisrael; Halachmi, Ilan
2016-09-01
Body condition scoring (BCS) is a farm-management tool for estimating dairy cows' energy reserves. Today, BCS is performed manually by experts. This paper presents a 3-dimensional algorithm that provides a topographical understanding of the cow's body to estimate BCS. An automatic BCS system consisting of a Kinect camera (Microsoft Corp., Redmond, WA) triggered by a passive infrared motion detector was designed and implemented. Image processing and regression algorithms were developed and included the following steps: (1) image restoration, the removal of noise; (2) object recognition and separation, identification and separation of the cows; (3) movie and image selection, selection of movies and frames that include the relevant data; (4) image rotation, alignment of the cow parallel to the x-axis; and (5) image cropping and normalization, removal of irrelevant data, setting the image size to 150×200 pixels, and normalizing image values. All steps were performed automatically, including image selection and classification. Fourteen individual features per cow, derived from the cows' topography, were automatically extracted from the movies and from the farm's herd-management records. These features appear to be measurable in a commercial farm. Manual BCS was performed by a trained expert and compared with the output of the training set. A regression model was developed, correlating the features with the manual BCS references. Data were acquired for 4 d, resulting in a database of 422 movies of 101 cows. Movies containing cows' back ends were automatically selected (389 movies). The data were divided into a training set of 81 cows and a test set of 20 cows; both sets included the identical full range of BCS classes. Accuracy tests gave a mean absolute error of 0.26, median absolute error of 0.19, and coefficient of determination of 0.75, with 100% correct classification within 1 step and 91% correct classification within a half step for BCS classes. Results indicated good repeatability, with all standard deviations under 0.33. The algorithm is independent of the background and requires 10 cows for training with approximately 30 movies of 4 s each. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Convolutional neural network features based change detection in satellite images
NASA Astrophysics Data System (ADS)
Mohammed El Amin, Arabi; Liu, Qingjie; Wang, Yunhong
2016-07-01
With the popular use of high resolution remote sensing (HRRS) satellite images, a huge research efforts have been placed on change detection (CD) problem. An effective feature selection method can significantly boost the final result. While hand-designed features have proven difficulties to design features that effectively capture high and mid-level representations, the recent developments in machine learning (Deep Learning) omit this problem by learning hierarchical representation in an unsupervised manner directly from data without human intervention. In this letter, we propose approaching the change detection problem from a feature learning perspective. A novel deep Convolutional Neural Networks (CNN) features based HR satellite images change detection method is proposed. The main guideline is to produce a change detection map directly from two images using a pretrained CNN. This method can omit the limited performance of hand-crafted features. Firstly, CNN features are extracted through different convolutional layers. Then, a concatenation step is evaluated after an normalization step, resulting in a unique higher dimensional feature map. Finally, a change map was computed using pixel-wise Euclidean distance. Our method has been validated on real bitemporal HRRS satellite images according to qualitative and quantitative analyses. The results obtained confirm the interest of the proposed method.
NASA Technical Reports Server (NTRS)
Hoffman, R. N.; Leidner, S. M.; Henderson, J. M.; Atlas, R.; Ardizzone, J. V.; Bloom, S. C.; Atlas, Robert (Technical Monitor)
2001-01-01
In this study, we apply a two-dimensional variational analysis method (2d-VAR) to select a wind solution from NASA Scatterometer (NSCAT) ambiguous winds. 2d-VAR determines a "best" gridded surface wind analysis by minimizing a cost function. The cost function measures the misfit to the observations, the background, and the filtering and dynamical constraints. The ambiguity closest in direction to the minimizing analysis is selected. 2d-VAR method, sensitivity and numerical behavior are described. 2d-VAR is compared to statistical interpolation (OI) by examining the response of both systems to a single ship observation and to a swath of unique scatterometer winds. 2d-VAR is used with both NSCAT ambiguities and NSCAT backscatter values. Results are roughly comparable. When the background field is poor, 2d-VAR ambiguity removal often selects low probability ambiguities. To avoid this behavior, an initial 2d-VAR analysis, using only the two most likely ambiguities, provides the first guess for an analysis using all the ambiguities or the backscatter data. 2d-VAR and median filter selected ambiguities usually agree. Both methods require horizontal consistency, so disagreements occur in clumps, or as linear features. In these cases, 2d-VAR ambiguities are often more meteorologically reasonable and more consistent with satellite imagery.
Feature extraction with deep neural networks by a generalized discriminant analysis.
Stuhlsatz, André; Lippel, Jens; Zielke, Thomas
2012-04-01
We present an approach to feature extraction that is a generalization of the classical linear discriminant analysis (LDA) on the basis of deep neural networks (DNNs). As for LDA, discriminative features generated from independent Gaussian class conditionals are assumed. This modeling has the advantages that the intrinsic dimensionality of the feature space is bounded by the number of classes and that the optimal discriminant function is linear. Unfortunately, linear transformations are insufficient to extract optimal discriminative features from arbitrarily distributed raw measurements. The generalized discriminant analysis (GerDA) proposed in this paper uses nonlinear transformations that are learnt by DNNs in a semisupervised fashion. We show that the feature extraction based on our approach displays excellent performance on real-world recognition and detection tasks, such as handwritten digit recognition and face detection. In a series of experiments, we evaluate GerDA features with respect to dimensionality reduction, visualization, classification, and detection. Moreover, we show that GerDA DNNs can preprocess truly high-dimensional input data to low-dimensional representations that facilitate accurate predictions even if simple linear predictors or measures of similarity are used.
NASA Astrophysics Data System (ADS)
Liu, Wanjun; Liang, Xuejian; Qu, Haicheng
2017-11-01
Hyperspectral image (HSI) classification is one of the most popular topics in remote sensing community. Traditional and deep learning-based classification methods were proposed constantly in recent years. In order to improve the classification accuracy and robustness, a dimensionality-varied convolutional neural network (DVCNN) was proposed in this paper. DVCNN was a novel deep architecture based on convolutional neural network (CNN). The input of DVCNN was a set of 3D patches selected from HSI which contained spectral-spatial joint information. In the following feature extraction process, each patch was transformed into some different 1D vectors by 3D convolution kernels, which were able to extract features from spectral-spatial data. The rest of DVCNN was about the same as general CNN and processed 2D matrix which was constituted by by all 1D data. So that the DVCNN could not only extract more accurate and rich features than CNN, but also fused spectral-spatial information to improve classification accuracy. Moreover, the robustness of network on water-absorption bands was enhanced in the process of spectral-spatial fusion by 3D convolution, and the calculation was simplified by dimensionality varied convolution. Experiments were performed on both Indian Pines and Pavia University scene datasets, and the results showed that the classification accuracy of DVCNN improved by 32.87% on Indian Pines and 19.63% on Pavia University scene than spectral-only CNN. The maximum accuracy improvement of DVCNN achievement was 13.72% compared with other state-of-the-art HSI classification methods, and the robustness of DVCNN on water-absorption bands noise was demonstrated.
Information Gain Based Dimensionality Selection for Classifying Text Documents
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dumidu Wijayasekara; Milos Manic; Miles McQueen
2013-06-01
Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase the classification accuracy and reduce the computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel, genetic algorithm based methodology, for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexitymore » is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic algorithm based dimensionality selection. The results show an improvement of 3% in the true positives and 1.6% in the true negatives over conventional dimensionality selection methods.« less
Design and analysis of compound flexible skin based on deformable honeycomb
NASA Astrophysics Data System (ADS)
Zou, Tingting; Zhou, Li
2017-04-01
In this study, we focused at the development and verification of a robust framework for surface crack detection in steel pipes using measured vibration responses; with the presence of multiple progressive damage occurring in different locations within the structure. Feature selection, dimensionality reduction, and multi-class support vector machine were established for this purpose. Nine damage cases, at different locations, orientations and length, were introduced into the pipe structure. The pipe was impacted 300 times using an impact hammer, after each damage case, the vibration data were collected using 3 PZT wafers which were installed on the outer surface of the pipe. At first, damage sensitive features were extracted using the frequency response function approach followed by recursive feature elimination for dimensionality reduction. Then, a multi-class support vector machine learning algorithm was employed to train the data and generate a statistical model. Once the model is established, decision values and distances from the hyper-plane were generated for the new collected data using the trained model. This process was repeated on the data collected from each sensor. Overall, using a single sensor for training and testing led to a very high accuracy reaching 98% in the assessment of the 9 damage cases used in this study.
NASA Astrophysics Data System (ADS)
Mustapha, S.; Braytee, A.; Ye, L.
2017-04-01
In this study, we focused at the development and verification of a robust framework for surface crack detection in steel pipes using measured vibration responses; with the presence of multiple progressive damage occurring in different locations within the structure. Feature selection, dimensionality reduction, and multi-class support vector machine were established for this purpose. Nine damage cases, at different locations, orientations and length, were introduced into the pipe structure. The pipe was impacted 300 times using an impact hammer, after each damage case, the vibration data were collected using 3 PZT wafers which were installed on the outer surface of the pipe. At first, damage sensitive features were extracted using the frequency response function approach followed by recursive feature elimination for dimensionality reduction. Then, a multi-class support vector machine learning algorithm was employed to train the data and generate a statistical model. Once the model is established, decision values and distances from the hyper-plane were generated for the new collected data using the trained model. This process was repeated on the data collected from each sensor. Overall, using a single sensor for training and testing led to a very high accuracy reaching 98% in the assessment of the 9 damage cases used in this study.
FastProject: a tool for low-dimensional analysis of single-cell RNA-Seq data.
DeTomaso, David; Yosef, Nir
2016-08-23
A key challenge in the emerging field of single-cell RNA-Seq is to characterize phenotypic diversity between cells and visualize this information in an informative manner. A common technique when dealing with high-dimensional data is to project the data to 2 or 3 dimensions for visualization. However, there are a variety of methods to achieve this result and once projected, it can be difficult to ascribe biological significance to the observed features. Additionally, when analyzing single-cell data, the relationship between cells can be obscured by technical confounders such as variable gene capture rates. To aid in the analysis and interpretation of single-cell RNA-Seq data, we have developed FastProject, a software tool which analyzes a gene expression matrix and produces a dynamic output report in which two-dimensional projections of the data can be explored. Annotated gene sets (referred to as gene 'signatures') are incorporated so that features in the projections can be understood in relation to the biological processes they might represent. FastProject provides a novel method of scoring each cell against a gene signature so as to minimize the effect of missed transcripts as well as a method to rank signature-projection pairings so that meaningful associations can be quickly identified. Additionally, FastProject is written with a modular architecture and designed to serve as a platform for incorporating and comparing new projection methods and gene selection algorithms. Here we present FastProject, a software package for two-dimensional visualization of single cell data, which utilizes a plethora of projection methods and provides a way to systematically investigate the biological relevance of these low dimensional representations by incorporating domain knowledge.
Who Needs 3D When the Universe Is Flat?
ERIC Educational Resources Information Center
Eriksson, Urban; Linder, Cedric; Airey, John; Redfors, Andreas
2014-01-01
An overlooked feature in astronomy education is the need for students to learn to extrapolate three-dimensionality and the challenges that this may involve. Discerning critical features in the night sky that are embedded in dimensionality is a long-term learning process. Several articles have addressed the usefulness of three-dimensional (3D)…
Analysis of Big Data in Gait Biomechanics: Current Trends and Future Directions.
Phinyomark, Angkoon; Petri, Giovanni; Ibáñez-Marcelo, Esther; Osis, Sean T; Ferber, Reed
2018-01-01
The increasing amount of data in biomechanics research has greatly increased the importance of developing advanced multivariate analysis and machine learning techniques, which are better able to handle "big data". Consequently, advances in data science methods will expand the knowledge for testing new hypotheses about biomechanical risk factors associated with walking and running gait-related musculoskeletal injury. This paper begins with a brief introduction to an automated three-dimensional (3D) biomechanical gait data collection system: 3D GAIT, followed by how the studies in the field of gait biomechanics fit the quantities in the 5 V's definition of big data: volume, velocity, variety, veracity, and value. Next, we provide a review of recent research and development in multivariate and machine learning methods-based gait analysis that can be applied to big data analytics. These modern biomechanical gait analysis methods include several main modules such as initial input features, dimensionality reduction (feature selection and extraction), and learning algorithms (classification and clustering). Finally, a promising big data exploration tool called "topological data analysis" and directions for future research are outlined and discussed.
Biologically-inspired data decorrelation for hyper-spectral imaging
NASA Astrophysics Data System (ADS)
Picon, Artzai; Ghita, Ovidiu; Rodriguez-Vaamonde, Sergio; Iriondo, Pedro Ma; Whelan, Paul F.
2011-12-01
Hyper-spectral data allows the construction of more robust statistical models to sample the material properties than the standard tri-chromatic color representation. However, because of the large dimensionality and complexity of the hyper-spectral data, the extraction of robust features (image descriptors) is not a trivial issue. Thus, to facilitate efficient feature extraction, decorrelation techniques are commonly applied to reduce the dimensionality of the hyper-spectral data with the aim of generating compact and highly discriminative image descriptors. Current methodologies for data decorrelation such as principal component analysis (PCA), linear discriminant analysis (LDA), wavelet decomposition (WD), or band selection methods require complex and subjective training procedures and in addition the compressed spectral information is not directly related to the physical (spectral) characteristics associated with the analyzed materials. The major objective of this article is to introduce and evaluate a new data decorrelation methodology using an approach that closely emulates the human vision. The proposed data decorrelation scheme has been employed to optimally minimize the amount of redundant information contained in the highly correlated hyper-spectral bands and has been comprehensively evaluated in the context of non-ferrous material classification
Guo, Zhiqiang; Wang, Huaiqing; Yang, Jie; Miller, David J
2015-01-01
In this paper, we propose and implement a hybrid model combining two-directional two-dimensional principal component analysis ((2D)2PCA) and a Radial Basis Function Neural Network (RBFNN) to forecast stock market behavior. First, 36 stock market technical variables are selected as the input features, and a sliding window is used to obtain the input data of the model. Next, (2D)2PCA is utilized to reduce the dimension of the data and extract its intrinsic features. Finally, an RBFNN accepts the data processed by (2D)2PCA to forecast the next day's stock price or movement. The proposed model is used on the Shanghai stock market index, and the experiments show that the model achieves a good level of fitness. The proposed model is then compared with one that uses the traditional dimension reduction method principal component analysis (PCA) and independent component analysis (ICA). The empirical results show that the proposed model outperforms the PCA-based model, as well as alternative models based on ICA and on the multilayer perceptron.
Guo, Zhiqiang; Wang, Huaiqing; Yang, Jie; Miller, David J.
2015-01-01
In this paper, we propose and implement a hybrid model combining two-directional two-dimensional principal component analysis ((2D)2PCA) and a Radial Basis Function Neural Network (RBFNN) to forecast stock market behavior. First, 36 stock market technical variables are selected as the input features, and a sliding window is used to obtain the input data of the model. Next, (2D)2PCA is utilized to reduce the dimension of the data and extract its intrinsic features. Finally, an RBFNN accepts the data processed by (2D)2PCA to forecast the next day's stock price or movement. The proposed model is used on the Shanghai stock market index, and the experiments show that the model achieves a good level of fitness. The proposed model is then compared with one that uses the traditional dimension reduction method principal component analysis (PCA) and independent component analysis (ICA). The empirical results show that the proposed model outperforms the PCA-based model, as well as alternative models based on ICA and on the multilayer perceptron. PMID:25849483
Al-Qazzaz, Noor Kamal; Ali, Sawal; Ahmad, Siti Anom; Escudero, Javier
2017-07-01
The aim of the present study was to discriminate the electroencephalogram (EEG) of 5 patients with vascular dementia (VaD), 15 patients with stroke-related mild cognitive impairment (MCI), and 15 control normal subjects during a working memory (WM) task. We used independent component analysis (ICA) and wavelet transform (WT) as a hybrid preprocessing approach for EEG artifact removal. Three different features were extracted from the cleaned EEG signals: spectral entropy (SpecEn), permutation entropy (PerEn) and Tsallis entropy (TsEn). Two classification schemes were applied - support vector machine (SVM) and k-nearest neighbors (kNN) - with fuzzy neighborhood preserving analysis with QR-decomposition (FNPAQR) as a dimensionality reduction technique. The FNPAQR dimensionality reduction technique increased the SVM classification accuracy from 82.22% to 90.37% and from 82.6% to 86.67% for kNN. These results suggest that FNPAQR consistently improves the discrimination of VaD, MCI patients and control normal subjects and it could be a useful feature selection to help the identification of patients with VaD and MCI.
Watanabe, Takanori; Kessler, Daniel; Scott, Clayton; Angstadt, Michael; Sripada, Chandra
2014-01-01
Substantial evidence indicates that major psychiatric disorders are associated with distributed neural dysconnectivity, leading to strong interest in using neuroimaging methods to accurately predict disorder status. In this work, we are specifically interested in a multivariate approach that uses features derived from whole-brain resting state functional connectomes. However, functional connectomes reside in a high dimensional space, which complicates model interpretation and introduces numerous statistical and computational challenges. Traditional feature selection techniques are used to reduce data dimensionality, but are blind to the spatial structure of the connectomes. We propose a regularization framework where the 6-D structure of the functional connectome (defined by pairs of points in 3-D space) is explicitly taken into account via the fused Lasso or the GraphNet regularizer. Our method only restricts the loss function to be convex and margin-based, allowing non-differentiable loss functions such as the hinge-loss to be used. Using the fused Lasso or GraphNet regularizer with the hinge-loss leads to a structured sparse support vector machine (SVM) with embedded feature selection. We introduce a novel efficient optimization algorithm based on the augmented Lagrangian and the classical alternating direction method, which can solve both fused Lasso and GraphNet regularized SVM with very little modification. We also demonstrate that the inner subproblems of the algorithm can be solved efficiently in analytic form by coupling the variable splitting strategy with a data augmentation scheme. Experiments on simulated data and resting state scans from a large schizophrenia dataset show that our proposed approach can identify predictive regions that are spatially contiguous in the 6-D “connectome space,” offering an additional layer of interpretability that could provide new insights about various disease processes. PMID:24704268
Tamilvanan, Thangaraju; Hopper, Waheeta
2014-01-01
Yersinia pestis, a Gram negative bacillus, spreads via lymphatic to lymph nodes and to all organs through the bloodstream, causing plague. Yersinia outer protein H (YopH) is one of the important effector proteins, which paralyzes lymphocytes and macrophages by dephosphorylating critical tyrosine kinases and signal transduction molecules. The purpose of the study is to generate a three-dimensional (3D) pharmacophore model by using diverse sets of YopH inhibitors, which would be useful for designing of potential antitoxin. In this study, we have selected 60 biologically active inhibitors of YopH to perform Ligand based pharmacophore study to elucidate the important structural features responsible for biological activity. Pharmacophore model demonstrated the importance of two acceptors, one hydrophobic and two aromatic features toward the biological activity. Based on these features, different databases were screened to identify novel compounds and these ligands were subjected for docking, ADME properties and Binding energy prediction. Post docking validation was performed using molecular dynamics simulation for selected ligands to calculate the Root Mean Square Deviation (RMSD) and Root Mean Square Fluctuation (RMSF). The ligands, ASN03270114, Mol_252138, Mol_31073 and ZINC04237078 may act as inhibitors against YopH of Y. pestis.
NASA Astrophysics Data System (ADS)
Kakkos, I.; Gkiatis, K.; Bromis, K.; Asvestas, P. A.; Karanasiou, I. S.; Ventouras, E. M.; Matsopoulos, G. K.
2017-11-01
The detection of an error is the cognitive evaluation of an action outcome that is considered undesired or mismatches an expected response. Brain activity during monitoring of correct and incorrect responses elicits Event Related Potentials (ERPs) revealing complex cerebral responses to deviant sensory stimuli. Development of accurate error detection systems is of great importance both concerning practical applications and in investigating the complex neural mechanisms of decision making. In this study, data are used from an audio identification experiment that was implemented with two levels of complexity in order to investigate neurophysiological error processing mechanisms in actors and observers. To examine and analyse the variations of the processing of erroneous sensory information for each level of complexity we employ Support Vector Machines (SVM) classifiers with various learning methods and kernels using characteristic ERP time-windowed features. For dimensionality reduction and to remove redundant features we implement a feature selection framework based on Sequential Forward Selection (SFS). The proposed method provided high accuracy in identifying correct and incorrect responses both for actors and for observers with mean accuracy of 93% and 91% respectively. Additionally, computational time was reduced and the effects of the nesting problem usually occurring in SFS of large feature sets were alleviated.
Topic modeling for cluster analysis of large biological and medical datasets
2014-01-01
Background The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. Results In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three various methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Conclusion Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets. PMID:25350106
Topic modeling for cluster analysis of large biological and medical datasets.
Zhao, Weizhong; Zou, Wen; Chen, James J
2014-01-01
The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three various methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hajian, Amir; Alvarez, Marcelo A.; Bond, J. Richard, E-mail: ahajian@cita.utoronto.ca, E-mail: malvarez@cita.utoronto.ca, E-mail: bond@cita.utoronto.ca
Making mock simulated catalogs is an important component of astrophysical data analysis. Selection criteria for observed astronomical objects are often too complicated to be derived from first principles. However the existence of an observed group of objects is a well-suited problem for machine learning classification. In this paper we use one-class classifiers to learn the properties of an observed catalog of clusters of galaxies from ROSAT and to pick clusters from mock simulations that resemble the observed ROSAT catalog. We show how this method can be used to study the cross-correlations of thermal Sunya'ev-Zeldovich signals with number density maps ofmore » X-ray selected cluster catalogs. The method reduces the bias due to hand-tuning the selection function and is readily scalable to large catalogs with a high-dimensional space of astrophysical features.« less
2009-04-01
Capabilities Co ns tr uc tio n En gi ne er in g R es ea rc h La bo ra to ry Kathy L. Simunich, Timothy K. Perkins, David M. Bailey, David Brown, and...inversion height in convective condition is estimated with a one- dimensional model of the atmospheric boundary layer based on the Drie- donks slab model...tool and its capabilities. Installation geospatial data, in CAD format, were obtained for select buildings, roads, and topographic features in
The programming language HAL: A specification
NASA Technical Reports Server (NTRS)
1971-01-01
HAL accomplishes three significant objectives: (1) increased readability, through the use of a natural two-dimensional mathematical format; (2) increased reliability, by providing for selective recognition of common data and subroutines, and by incorporating specific data-protect features; (3) real-time control facility, by including a comprehensive set of real-time control commands and signal conditions. Although HAL is designed primarily for programming on-board computers, it is general enough to meet nearly all the needs in the production, verification and support of aerospace, and other real-time applications.
Multiphoton writing of three-dimensional fluidic channels within a porous matrix.
Lee, Jyh-Tsung; George, Matthew C; Moore, Jeffrey S; Braun, Paul V
2009-08-19
We demonstrate a facile method for fabricating novel 3D microfluidic channels by using two-photon-activated chemistry to locally switch the interior surface of a porous host from a hydrophobic state to a hydrophilic state. The 3D structures can be infilled selectively with water and/or hydrophobic oil with a minimum feature size of only a few micrometers. We envision that this approach may enable the fabrication of complex microfluidic structures that cannot be easily formed via current technologies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rios Velazquez, E; Parmar, C; Narayan, V
Purpose: To compare the complementary value of quantitative radiomic features to that of radiologist-annotated semantic features in predicting EGFR mutations in lung adenocarcinomas. Methods: Pre-operative CT images of 258 lung adenocarcinoma patients were available. Tumors were segmented using the sing-click ensemble segmentation algorithm. A set of radiomic features was extracted using 3D-Slicer. Test-retest reproducibility and unsupervised dimensionality reduction were applied to select a subset of reproducible and independent radiomic features. Twenty semantic annotations were scored by an expert radiologist, describing the tumor, surrounding tissue and associated findings. Minimum-redundancy-maximum-relevance (MRMR) was used to identify the most informative radiomic and semantic featuresmore » in 172 patients (training-set, temporal split). Radiomic, semantic and combined radiomic-semantic logistic regression models to predict EGFR mutations were evaluated in and independent validation dataset of 86 patients using the area under the receiver operating curve (AUC). Results: EGFR mutations were found in 77/172 (45%) and 39/86 (45%) of the training and validation sets, respectively. Univariate AUCs showed a similar range for both feature types: radiomics median AUC = 0.57 (range: 0.50 – 0.62); semantic median AUC = 0.53 (range: 0.50 – 0.64, Wilcoxon p = 0.55). After MRMR feature selection, the best-performing radiomic, semantic, and radiomic-semantic logistic regression models, for EGFR mutations, showed a validation AUC of 0.56 (p = 0.29), 0.63 (p = 0.063) and 0.67 (p = 0.004), respectively. Conclusion: Quantitative volumetric and textural Radiomic features complement the qualitative and semi-quantitative radiologist annotations. The prognostic value of informative qualitative semantic features such as cavitation and lobulation is increased with the addition of quantitative textural features from the tumor region.« less
NASA Astrophysics Data System (ADS)
Tan, Maxine; Leader, Joseph K.; Liu, Hong; Zheng, Bin
2015-03-01
We recently investigated a new mammographic image feature based risk factor to predict near-term breast cancer risk after a woman has a negative mammographic screening. We hypothesized that unlike the conventional epidemiology-based long-term (or lifetime) risk factors, the mammographic image feature based risk factor value will increase as the time lag between the negative and positive mammography screening decreases. The purpose of this study is to test this hypothesis. From a large and diverse full-field digital mammography (FFDM) image database with 1278 cases, we collected all available sequential FFDM examinations for each case including the "current" and 1 to 3 most recently "prior" examinations. All "prior" examinations were interpreted negative, and "current" ones were either malignant or recalled negative/benign. We computed 92 global mammographic texture and density based features, and included three clinical risk factors (woman's age, family history and subjective breast density BIRADS ratings). On this initial feature set, we applied a fast and accurate Sequential Forward Floating Selection (SFFS) feature selection algorithm to reduce feature dimensionality. The features computed on both mammographic views were individually/ separately trained using two artificial neural network (ANN) classifiers. The classification scores of the two ANNs were then merged with a sequential ANN. The results show that the maximum adjusted odds ratios were 5.59, 7.98, and 15.77 for using the 3rd, 2nd, and 1st "prior" FFDM examinations, respectively, which demonstrates a higher association of mammographic image feature change and an increasing risk trend of developing breast cancer in the near-term after a negative screening.
Gene/protein name recognition based on support vector machine using dictionary as features.
Mitsumori, Tomohiro; Fation, Sevrani; Murata, Masaki; Doi, Kouichi; Doi, Hirohumi
2005-01-01
Automated information extraction from biomedical literature is important because a vast amount of biomedical literature has been published. Recognition of the biomedical named entities is the first step in information extraction. We developed an automated recognition system based on the SVM algorithm and evaluated it in Task 1.A of BioCreAtIvE, a competition for automated gene/protein name recognition. In the work presented here, our recognition system uses the feature set of the word, the part-of-speech (POS), the orthography, the prefix, the suffix, and the preceding class. We call these features "internal resource features", i.e., features that can be found in the training data. Additionally, we consider the features of matching against dictionaries to be external resource features. We investigated and evaluated the effect of these features as well as the effect of tuning the parameters of the SVM algorithm. We found that the dictionary matching features contributed slightly to the improvement in the performance of the f-score. We attribute this to the possibility that the dictionary matching features might overlap with other features in the current multiple feature setting. During SVM learning, each feature alone had a marginally positive effect on system performance. This supports the fact that the SVM algorithm is robust on the high dimensionality of the feature vector space and means that feature selection is not required.
Valley-selective Landau-Zener oscillations in semi-Dirac p -n junctions
NASA Astrophysics Data System (ADS)
Saha, K.; Nandkishore, R.; Parameswaran, S. A.
2017-07-01
We study transport across p -n junctions of gapped two-dimensional semi-Dirac materials: nodal semimetals whose energy bands disperse quadratically and linearly along distinct crystal axes. The resulting electronic properties—relevant to materials such as TiO2/VO2 multilayers and α -(BEDT-TTF)2I3 salts—continuously interpolate between those of mono- and bilayer graphene as a function of propagation angle. We demonstrate that tunneling across the junction depends on the orientation of the tunnel barrier relative to the crystalline axes, leading to strongly nonmonotonic current-voltage characteristics, including negative differential conductance in some regimes. In multivalley systems, these features provide a natural route to engineering valley-selective transport.
Advanced supersonic propulsion system technology study, phase 2
NASA Technical Reports Server (NTRS)
Allan, R. D.
1975-01-01
Variable cycle engines were identified, based on the mixed-flow low-bypass-ratio augmented turbofan cycle, which has shown excellent range capability in the AST airplane. The best mixed-flow augmented turbofan engine was selected based on range in the AST Baseline Airplane. Selected variable cycle engine features were added to this best conventional baseline engine, and the Dual-Cycle VCE and Double-Bypass VCE were defined. The conventional mixed-flow turbofan and the Double-Bypass VCE were on the subjects of engine preliminary design studies to determine mechanical feasibility, confirm weight and dimensional estimates, and identify the necessary technology considered not yet available. Critical engine components were studied and incorporated into the variable cycle engine design.
NASA Astrophysics Data System (ADS)
Lai, Chunren; Guo, Shengwen; Cheng, Lina; Wang, Wensheng; Wu, Kai
2017-02-01
It's very important to differentiate the temporal lobe epilepsy (TLE) patients from healthy people and localize the abnormal brain regions of the TLE patients. The cortical features and changes can reveal the unique anatomical patterns of brain regions from the structural MR images. In this study, structural MR images from 28 normal controls (NC), 18 left TLE (LTLE), and 21 right TLE (RTLE) were acquired, and four types of cortical feature, namely cortical thickness (CTh), cortical surface area (CSA), gray matter volume (GMV), and mean curvature (MCu), were explored for discriminative analysis. Three feature selection methods, the independent sample t-test filtering, the sparse-constrained dimensionality reduction model (SCDRM), and the support vector machine-recursive feature elimination (SVM-RFE), were investigated to extract dominant regions with significant differences among the compared groups for classification using the SVM classifier. The results showed that the SVM-REF achieved the highest performance (most classifications with more than 92% accuracy), followed by the SCDRM, and the t-test. Especially, the surface area and gray volume matter exhibited prominent discriminative ability, and the performance of the SVM was improved significantly when the four cortical features were combined. Additionally, the dominant regions with higher classification weights were mainly located in temporal and frontal lobe, including the inferior temporal, entorhinal cortex, fusiform, parahippocampal cortex, middle frontal and frontal pole. It was demonstrated that the cortical features provided effective information to determine the abnormal anatomical pattern and the proposed method has the potential to improve the clinical diagnosis of the TLE.
Scene segmentation of natural images using texture measures and back-propagation
NASA Technical Reports Server (NTRS)
Sridhar, Banavar; Phatak, Anil; Chatterji, Gano
1993-01-01
Knowledge of the three-dimensional world is essential for many guidance and navigation applications. A sequence of images from an electro-optical sensor can be processed using optical flow algorithms to provide a sparse set of ranges as a function of azimuth and elevation. A natural way to enhance the range map is by interpolation. However, this should be undertaken with care since interpolation assumes continuity of range. The range is continuous in certain parts of the image and can jump at object boundaries. In such situations, the ability to detect homogeneous object regions by scene segmentation can be used to determine regions in the range map that can be enhanced by interpolation. The use of scalar features derived from the spatial gray-level dependence matrix for texture segmentation is explored. Thresholding of histograms of scalar texture features is done for several images to select scalar features which result in a meaningful segmentation of the images. Next, the selected scalar features are used with a neural net to automate the segmentation procedure. Back-propagation is used to train the feed forward neural network. The generalization of the network approach to subsequent images in the sequence is examined. It is shown that the use of multiple scalar features as input to the neural network result in a superior segmentation when compared with a single scalar feature. It is also shown that the scalar features, which are not useful individually, result in a good segmentation when used together. The methodology is applied to both indoor and outdoor images.
Extraction and LOD control of colored interval volumes
NASA Astrophysics Data System (ADS)
Miyamura, Hiroko N.; Takeshima, Yuriko; Fujishiro, Issei; Saito, Takafumi
2005-03-01
Interval volume serves as a generalized isosurface and represents a three-dimensional subvolume for which the associated scalar filed values lie within a user-specified closed interval. In general, it is not an easy task for novices to specify the scalar field interval corresponding to their ROIs. In order to extract interval volumes from which desirable geometric features can be mined effectively, we propose a suggestive technique which extracts interval volumes automatically based on the global examination of the field contrast structure. Also proposed here is a simplification scheme for decimating resultant triangle patches to realize efficient transmission and rendition of large-scale interval volumes. Color distributions as well as geometric features are taken into account to select best edges to be collapsed. In addition, when a user wants to selectively display and analyze the original dataset, the simplified dataset is restructured to the original quality. Several simulated and acquired datasets are used to demonstrate the effectiveness of the present methods.
Feature Selection for Wheat Yield Prediction
NASA Astrophysics Data System (ADS)
Ruß, Georg; Kruse, Rudolf
Carrying out effective and sustainable agriculture has become an important issue in recent years. Agricultural production has to keep up with an everincreasing population by taking advantage of a field’s heterogeneity. Nowadays, modern technology such as the global positioning system (GPS) and a multitude of developed sensors enable farmers to better measure their fields’ heterogeneities. For this small-scale, precise treatment the term precision agriculture has been coined. However, the large amounts of data that are (literally) harvested during the growing season have to be analysed. In particular, the farmer is interested in knowing whether a newly developed heterogeneity sensor is potentially advantageous or not. Since the sensor data are readily available, this issue should be seen from an artificial intelligence perspective. There it can be treated as a feature selection problem. The additional task of yield prediction can be treated as a multi-dimensional regression problem. This article aims to present an approach towards solving these two practically important problems using artificial intelligence and data mining ideas and methodologies.
Microscale Patterning of Thermoplastic Polymer Surfaces by Selective Solvent Swelling
Rahmanian, Omid; Chen, Chien-Fu; DeVoe, Don L.
2012-01-01
A new method for the fabrication of microscale features in thermoplastic substrates is presented. Unlike traditional thermoplastic microfabrication techniques, in which bulk polymer is displaced from the substrate by machining or embossing, a unique process termed orogenic microfabrication has been developed in which selected regions of a thermoplastic surface are raised from the substrate by an irreversible solvent swelling mechanism. The orogenic technique allows thermoplastic surfaces to be patterned using a variety of masking methods, resulting in three-dimensional features that would be difficult to achieve through traditional microfabrication methods. Using cyclic olefin copolymer as a model thermoplastic material, several variations of this process are described to realize growth heights ranging from several nanometers to tens of microns, with patterning techniques include direct photoresist masking, patterned UV/ozone surface passivation, elastomeric stamping, and noncontact spotting. Orogenic microfabrication is also demonstrated by direct inkjet printing as a facile photolithography-free masking method for rapid desktop thermoplastic microfabrication. PMID:22900539
Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection
Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Fotouhi, Farshad
2014-01-01
Feature rankings are often used for supervised dimension reduction especially when discriminating power of each feature is of interest, dimensionality of dataset is extremely high, or computational power is limited to perform more complicated methods. In practice, it is recommended to start dimension reduction via simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While benefiting from capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study on the bias and stability of such feature ranking method. We study whether the classifiers influence the SVC rankings or the discriminative power of features themselves has a dominant impact on the final rankings. We show the common intuition of using the same classifier for feature ranking and final classification does not always result in the best prediction performance. We then study if heterogeneous classifiers ensemble approaches provide more unbiased rankings and if they improve final classification performance. Furthermore, we calculate an empirical prediction performance loss for using the same classifier in SVC feature ranking and final classification from the optimal choices. PMID:25177107
Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection.
Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Fotouhi, Farshad
2014-11-01
Feature rankings are often used for supervised dimension reduction especially when discriminating power of each feature is of interest, dimensionality of dataset is extremely high, or computational power is limited to perform more complicated methods. In practice, it is recommended to start dimension reduction via simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While benefiting from capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study on the bias and stability of such feature ranking method. We study whether the classifiers influence the SVC rankings or the discriminative power of features themselves has a dominant impact on the final rankings. We show the common intuition of using the same classifier for feature ranking and final classification does not always result in the best prediction performance. We then study if heterogeneous classifiers ensemble approaches provide more unbiased rankings and if they improve final classification performance. Furthermore, we calculate an empirical prediction performance loss for using the same classifier in SVC feature ranking and final classification from the optimal choices.
Li, Yiqing; Wang, Yu; Zi, Yanyang; Zhang, Mingquan
2015-10-21
The various multi-sensor signal features from a diesel engine constitute a complex high-dimensional dataset. The non-linear dimensionality reduction method, t-distributed stochastic neighbor embedding (t-SNE), provides an effective way to implement data visualization for complex high-dimensional data. However, irrelevant features can deteriorate the performance of data visualization, and thus, should be eliminated a priori. This paper proposes a feature subset score based t-SNE (FSS-t-SNE) data visualization method to deal with the high-dimensional data that are collected from multi-sensor signals. In this method, the optimal feature subset is constructed by a feature subset score criterion. Then the high-dimensional data are visualized in 2-dimension space. According to the UCI dataset test, FSS-t-SNE can effectively improve the classification accuracy. An experiment was performed with a large power marine diesel engine to validate the proposed method for diesel engine malfunction classification. Multi-sensor signals were collected by a cylinder vibration sensor and a cylinder pressure sensor. Compared with other conventional data visualization methods, the proposed method shows good visualization performance and high classification accuracy in multi-malfunction classification of a diesel engine.
Li, Yiqing; Wang, Yu; Zi, Yanyang; Zhang, Mingquan
2015-01-01
The various multi-sensor signal features from a diesel engine constitute a complex high-dimensional dataset. The non-linear dimensionality reduction method, t-distributed stochastic neighbor embedding (t-SNE), provides an effective way to implement data visualization for complex high-dimensional data. However, irrelevant features can deteriorate the performance of data visualization, and thus, should be eliminated a priori. This paper proposes a feature subset score based t-SNE (FSS-t-SNE) data visualization method to deal with the high-dimensional data that are collected from multi-sensor signals. In this method, the optimal feature subset is constructed by a feature subset score criterion. Then the high-dimensional data are visualized in 2-dimension space. According to the UCI dataset test, FSS-t-SNE can effectively improve the classification accuracy. An experiment was performed with a large power marine diesel engine to validate the proposed method for diesel engine malfunction classification. Multi-sensor signals were collected by a cylinder vibration sensor and a cylinder pressure sensor. Compared with other conventional data visualization methods, the proposed method shows good visualization performance and high classification accuracy in multi-malfunction classification of a diesel engine. PMID:26506347
Synthesis and Structural Studies of Calcium and Magnesium Phosphinate and Phosphonate Compounds
NASA Astrophysics Data System (ADS)
Bampoh, Victoria Naa Kwale
The work presented herein describes synthetic methodologies leading to the design of a wide array of magnesium and calcium based phosphinate and phosphonates with possible applications as bone scaffolding materials or additives to bone cements. The challenge to the chemistry of the alkaline earth phosphonate target compounds includes poor solubility of compounds, and poorly understood details on the control of the metal's coordination environment. Hence, less is known on phosphonate based alkaline earth metal organic frameworks as compared to transition metal phosphonates. Factors governing the challenges in obtaining crystalline, well-defined magnesium and calcium solids lie in the large metal diameters, the absence of energetically available d-orbitals to direct metal geometry, as well as the overall weakness of the metal-ligand bonds. A significant part of this project was concerned with the development of suitable reaction conditions to obtain X-ray quality crystals of the reaction products to allow for structural elucidation of the novel compounds. Various methodologies to aid in crystal growth including hydrothermal methods and gel crystallization were employed. We have used phosphinate and phosphonate ligands with different number of phosphorus oxygen atoms as well as diphosphonates with different linker lengths to determine their effects on the overall structural features. An interesting correlation is observed between the dimensionality of products and the increasing number of donor oxygen atoms in the ligands as we progress from phosphinic acid to the phosphorous acids. As an example, monophosphinate ligand only yielded one-dimensional compounds, whereas the phosphonates crystallize as one and two-dimensional compounds, and the di- and triphosphonate based compounds display two or three-dimensional geometries. This thesis provides a selection of calcium and magnesium compounds with one-dimensional geometry, as represented in a calcium phosphinate to novel two-dimensional sheets of magnesium and pillared calcium phosphonates. The preparation of these novel compounds has led to the establishment of synthetic protocols that allow for the direct preparation of compounds with defined structural features.
Wu, Zhi-fang; Lei, Yong-hua; Li, Wen-jie; Liao, Sheng-hui; Zhao, Zi-jin
2013-02-01
To explore an effective method to construct and validate a finite element model of the unilateral cleft lip and palate(UCLP) craniomaxillary complex with sutures, which could be applied in further three-dimensional finite element analysis (FEA). One male patient aged 9 with left complete lip and palate cleft was selected and CT scan was taken at 0.75mm intervals on the skull. The CT data was saved in Dicom format, which was, afterwards, imported into Software Mimics 10.0 to generate a three-dimensional anatomic model. Then Software Geomagic Studio 12.0 was used to match, smoothen and transfer the anatomic model into a CAD model with NURBS patches. Then, 12 circum-maxillary sutures were integrated into the CAD model by Solidworks (2011 version). Finally meshing by E-feature Biomedical Modeler was done and a three-dimensional finite element model with sutures was obtained. A maxillary protraction force (500 g per side, 20° downward and forward from the occlusal plane) was applied. Displacement and stress distribution of some important craniofacial structures were measured and compared with the results of related researches in the literature. A three-dimensional finite element model of UCLP craniomaxillary complex with 12 sutures was established from the CT scan data. This simulation model consisted of 206 753 individual elements with 260 662 nodes, which was a more precise simulation and a better representation of human craniomaxillary complex than the formerly available FEA models. By comparison, this model was proved to be valid. It is an effective way to establish the three-dimensional finite element model of UCLP cranio-maxillary complex with sutures from CT images with the help of the following softwares: Mimics 10.0, Geomagic Studio 12.0, Solidworks and E-feature Biomedical Modeler.
Pacharawongsakda, Eakasit; Theeramunkong, Thanaruk
2013-12-01
Predicting protein subcellular location is one of major challenges in Bioinformatics area since such knowledge helps us understand protein functions and enables us to select the targeted proteins during drug discovery process. While many computational techniques have been proposed to improve predictive performance for protein subcellular location, they have several shortcomings. In this work, we propose a method to solve three main issues in such techniques; i) manipulation of multiplex proteins which may exist or move between multiple cellular compartments, ii) handling of high dimensionality in input and output spaces and iii) requirement of sufficient labeled data for model training. Towards these issues, this work presents a new computational method for predicting proteins which have either single or multiple locations. The proposed technique, namely iFLAST-CORE, incorporates the dimensionality reduction in the feature and label spaces with co-training paradigm for semi-supervised multi-label classification. For this purpose, the Singular Value Decomposition (SVD) is applied to transform the high-dimensional feature space and label space into the lower-dimensional spaces. After that, due to limitation of labeled data, the co-training regression makes use of unlabeled data by predicting the target values in the lower-dimensional spaces of unlabeled data. In the last step, the component of SVD is used to project labels in the lower-dimensional space back to those in the original space and an adaptive threshold is used to map a numeric value to a binary value for label determination. A set of experiments on viral proteins and gram-negative bacterial proteins evidence that our proposed method improve the classification performance in terms of various evaluation metrics such as Aiming (or Precision), Coverage (or Recall) and macro F-measure, compared to the traditional method that uses only labeled data.
NASA Technical Reports Server (NTRS)
Feng, Hui-Yu; VanderWijngaart, Rob; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2001-01-01
We describe the design of a new method for the measurement of the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. The method involves the solution of a stylized heat transfer problem on an unstructured, adaptive grid. A Spectral Element Method (SEM) with an adaptive, nonconforming mesh is selected to discretize the transport equation. The relatively high order of the SEM lowers the fraction of wall clock time spent on inter-processor communication, which eases the load balancing task and allows us to concentrate on the memory accesses. The benchmark is designed to be three-dimensional. Parallelization and load balance issues of a reference implementation will be described in detail in future reports.
Incipient Fault Detection for Rolling Element Bearings under Varying Speed Conditions.
Xue, Lang; Li, Naipeng; Lei, Yaguo; Li, Ningbo
2017-06-20
Varying speed conditions bring a huge challenge to incipient fault detection of rolling element bearings because both the change of speed and faults could lead to the amplitude fluctuation of vibration signals. Effective detection methods need to be developed to eliminate the influence of speed variation. This paper proposes an incipient fault detection method for bearings under varying speed conditions. Firstly, relative residual (RR) features are extracted, which are insensitive to the varying speed conditions and are able to reflect the degradation trend of bearings. Then, a health indicator named selected negative log-likelihood probability (SNLLP) is constructed to fuse a feature set including RR features and non-dimensional features. Finally, based on the constructed SNLLP health indicator, a novel alarm trigger mechanism is designed to detect the incipient fault. The proposed method is demonstrated using vibration signals from bearing tests and industrial wind turbines. The results verify the effectiveness of the proposed method for incipient fault detection of rolling element bearings under varying speed conditions.
Incipient Fault Detection for Rolling Element Bearings under Varying Speed Conditions
Xue, Lang; Li, Naipeng; Lei, Yaguo; Li, Ningbo
2017-01-01
Varying speed conditions bring a huge challenge to incipient fault detection of rolling element bearings because both the change of speed and faults could lead to the amplitude fluctuation of vibration signals. Effective detection methods need to be developed to eliminate the influence of speed variation. This paper proposes an incipient fault detection method for bearings under varying speed conditions. Firstly, relative residual (RR) features are extracted, which are insensitive to the varying speed conditions and are able to reflect the degradation trend of bearings. Then, a health indicator named selected negative log-likelihood probability (SNLLP) is constructed to fuse a feature set including RR features and non-dimensional features. Finally, based on the constructed SNLLP health indicator, a novel alarm trigger mechanism is designed to detect the incipient fault. The proposed method is demonstrated using vibration signals from bearing tests and industrial wind turbines. The results verify the effectiveness of the proposed method for incipient fault detection of rolling element bearings under varying speed conditions. PMID:28773035
Kopp, Bruno; Wessel, Karl
2010-05-01
In the present study, event-related potentials (ERPs) were recorded to investigate cognitive processes related to the partial transmission of information from stimulus recognition to response preparation. Participants classified two-dimensional visual stimuli with dimensions size and form. One feature combination was designated as the go-target, whereas the other three feature combinations served as no-go distractors. Size discriminability was manipulated across three experimental conditions. N2c and P3a amplitudes were enhanced in response to those distractors that shared the feature from the faster dimension with the target. Moreover, N2c and P3a amplitudes showed a crossover effect: Size distractors evoked more pronounced ERPs under high size discriminability, but form distractors elicited enhanced ERPs under low size discriminability. These results suggest that partial perceptual-motor transmission of information is accompanied by acts of cognitive control and by shifts of attention between the sources of conflicting information. Selection negativity findings imply adaptive allocation of visual feature-based attention across the two stimulus dimensions.
Arif, Muhammad
2012-06-01
In pattern classification problems, feature extraction is an important step. Quality of features in discriminating different classes plays an important role in pattern classification problems. In real life, pattern classification may require high dimensional feature space and it is impossible to visualize the feature space if the dimension of feature space is greater than four. In this paper, we have proposed a Similarity-Dissimilarity plot which can project high dimensional space to a two dimensional space while retaining important characteristics required to assess the discrimination quality of the features. Similarity-dissimilarity plot can reveal information about the amount of overlap of features of different classes. Separable data points of different classes will also be visible on the plot which can be classified correctly using appropriate classifier. Hence, approximate classification accuracy can be predicted. Moreover, it is possible to know about whom class the misclassified data points will be confused by the classifier. Outlier data points can also be located on the similarity-dissimilarity plot. Various examples of synthetic data are used to highlight important characteristics of the proposed plot. Some real life examples from biomedical data are also used for the analysis. The proposed plot is independent of number of dimensions of the feature space.
Registration algorithm of point clouds based on multiscale normal features
NASA Astrophysics Data System (ADS)
Lu, Jun; Peng, Zhongtao; Su, Hang; Xia, GuiHua
2015-01-01
The point cloud registration technology for obtaining a three-dimensional digital model is widely applied in many areas. To improve the accuracy and speed of point cloud registration, a registration method based on multiscale normal vectors is proposed. The proposed registration method mainly includes three parts: the selection of key points, the calculation of feature descriptors, and the determining and optimization of correspondences. First, key points are selected from the point cloud based on the changes of magnitude of multiscale curvatures obtained by using principal components analysis. Then the feature descriptor of each key point is proposed, which consists of 21 elements based on multiscale normal vectors and curvatures. The correspondences in a pair of two point clouds are determined according to the descriptor's similarity of key points in the source point cloud and target point cloud. Correspondences are optimized by using a random sampling consistency algorithm and clustering technology. Finally, singular value decomposition is applied to optimized correspondences so that the rigid transformation matrix between two point clouds is obtained. Experimental results show that the proposed point cloud registration algorithm has a faster calculation speed, higher registration accuracy, and better antinoise performance.
Twelve inequivalent Dirac cones in two-dimensional ZrB2
NASA Astrophysics Data System (ADS)
Lopez-Bezanilla, Alejandro
2018-01-01
Theoretical evidence of the existence of 12 inequivalent Dirac cones at the vicinity of the Fermi energy in monolayered ZrB2 is presented. Two-dimensional ZrB2 is a mechanically stable d - and p -orbital compound exhibiting a unique electronic structure with two Dirac cones out of high-symmetry points in the irreducible Brillouin zone with a small electron-pocket compensation. First-principles calculations demonstrate that while one of the cones is insensitive to lattice expansion, the second cone vanishes for small perturbation of the vertical Zr position. Internal symmetry breaking with external physical stimuli, along with the relativistic effect of spin-orbit coupling, is able to remove selectively the Dirac cones. A rational explanation in terms of d - and p -orbital mixing is provided to elucidate the origin of the infrequent Dirac cones in a flat structure. The versatility of transition-metal d orbitals combined with the honeycomb lattice provided by the B atoms yields particular features in a two-dimensional material.
NASA Astrophysics Data System (ADS)
Millan, Jaime; McMillan, Janet; Brodin, Jeff; Lee, Byeongdu; Mirkin, Chad; Olvera de La Cruz, Monica
Programmable DNA interactions represent a robust scheme to self-assemble a rich variety of tunable superlattices, where intrinsic and in some cases non-desirable nano-scale building blocks interactions are substituted for DNA hybridization events. Recent advances in synthesis has allowed the extension of this successful scheme to proteins, where DNA distribution can be tuned independently of protein shape by selectively addressing surface residues, giving rise to assembly properties in three dimensional protein-nanoparticle superlattices dependent on DNA distribution. In parallel to this advances, we introduced a scalable coarse-grained model that faithfully reproduces the previously observed co-assemblies from nanoparticles and proteins conjugates. Herein, we implement this numerical model to explain the stability of complex protein-nanoparticle binary superlattices and to elucidate experimentally inaccessible features such as protein orientation. Also, we will discuss systematic studies that highlight the role of DNA distribution and sequence on two-dimensional protein-protein and protein-nanoparticle superlattices.
Twelve inequivalent Dirac cones in two-dimensional ZrB 2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lopez-Bezanilla, Alejandro
Theoretical evidence of the existence of 12 inequivalent Dirac cones at the vicinity of the Fermi energy in monolayered ZrB 2 is presented. Two-dimensional ZrB 2 is a mechanically stable d- and p-orbital compound exhibiting a unique electronic structure with two Dirac cones out of high-symmetry points in the irreducible Brillouin zone with a small electron-pocket compensation. First-principles calculations demonstrate that while one of the cones is insensitive to lattice expansion, the second cone vanishes for small perturbation of the vertical Zr position. Internal symmetry breaking with external physical stimuli, along with the relativistic effect of spin-orbit coupling, is ablemore » to remove selectively the Dirac cones. A rational explanation in terms of d- and p-orbital mixing is provided to elucidate the origin of the infrequent Dirac cones in a flat structure. In conclusion, the versatility of transition-metal d orbitals combined with the honeycomb lattice provided by the B atoms yields particular features in a two-dimensional material.« less
Spectral Regression Discriminant Analysis for Hyperspectral Image Classification
NASA Astrophysics Data System (ADS)
Pan, Y.; Wu, J.; Huang, H.; Liu, J.
2012-08-01
Dimensionality reduction algorithms, which aim to select a small set of efficient and discriminant features, have attracted great attention for Hyperspectral Image Classification. The manifold learning methods are popular for dimensionality reduction, such as Locally Linear Embedding, Isomap, and Laplacian Eigenmap. However, a disadvantage of many manifold learning methods is that their computations usually involve eigen-decomposition of dense matrices which is expensive in both time and memory. In this paper, we introduce a new dimensionality reduction method, called Spectral Regression Discriminant Analysis (SRDA). SRDA casts the problem of learning an embedding function into a regression framework, which avoids eigen-decomposition of dense matrices. Also, with the regression based framework, different kinds of regularizes can be naturally incorporated into our algorithm which makes it more flexible. It can make efficient use of data points to discover the intrinsic discriminant structure in the data. Experimental results on Washington DC Mall and AVIRIS Indian Pines hyperspectral data sets demonstrate the effectiveness of the proposed method.
Three-dimensional couette flow of dusty fluid with heat transfer in the presence of magnetic field
NASA Astrophysics Data System (ADS)
Gayathri, R.; Govindarajan, A.; Sasikala, R.
2018-04-01
This paper is focused on the mathematical modelling of three-dimensional couette flow and heat transfer of a dusty fluid between two infinite horizontal parallel porous flat plates in the presence of an induced magnetic field. The problem is formulated using a continuum two-phase model and the resulting equations are solved analytically. The lower plate is stationary while the upper plate is undergoing uniform motion in its plane. These plates are, respectively subjected to transverse exponential injection and its corresponding removal by constant suction. Due to this type of injection velocity, the flow becomes three dimensional. The closed-form expressions for velocity and temperature fields of both the fluid and dust phase are obtained by solving the governing partial differentiation equations using the perturbation method. A selective set of graphical results is presented and discussed to show interesting features of the problem. It is found that the velocity profiles of both fluid and dust particles decrease due to the increase of (magnetic parameter) Hartmann number.
Twelve inequivalent Dirac cones in two-dimensional ZrB 2
Lopez-Bezanilla, Alejandro
2018-01-29
Theoretical evidence of the existence of 12 inequivalent Dirac cones at the vicinity of the Fermi energy in monolayered ZrB 2 is presented. Two-dimensional ZrB 2 is a mechanically stable d- and p-orbital compound exhibiting a unique electronic structure with two Dirac cones out of high-symmetry points in the irreducible Brillouin zone with a small electron-pocket compensation. First-principles calculations demonstrate that while one of the cones is insensitive to lattice expansion, the second cone vanishes for small perturbation of the vertical Zr position. Internal symmetry breaking with external physical stimuli, along with the relativistic effect of spin-orbit coupling, is ablemore » to remove selectively the Dirac cones. A rational explanation in terms of d- and p-orbital mixing is provided to elucidate the origin of the infrequent Dirac cones in a flat structure. In conclusion, the versatility of transition-metal d orbitals combined with the honeycomb lattice provided by the B atoms yields particular features in a two-dimensional material.« less
Elyasigomari, V; Lee, D A; Screen, H R C; Shaheed, M H
2017-03-01
For each cancer type, only a few genes are informative. Due to the so-called 'curse of dimensionality' problem, the gene selection task remains a challenge. To overcome this problem, we propose a two-stage gene selection method called MRMR-COA-HS. In the first stage, the minimum redundancy and maximum relevance (MRMR) feature selection is used to select a subset of relevant genes. The selected genes are then fed into a wrapper setup that combines a new algorithm, COA-HS, using the support vector machine as a classifier. The method was applied to four microarray datasets, and the performance was assessed by the leave one out cross-validation method. Comparative performance assessment of the proposed method with other evolutionary algorithms suggested that the proposed algorithm significantly outperforms other methods in selecting a fewer number of genes while maintaining the highest classification accuracy. The functions of the selected genes were further investigated, and it was confirmed that the selected genes are biologically relevant to each cancer type. Copyright © 2017. Published by Elsevier Inc.
An object-based visual attention model for robotic applications.
Yu, Yuanlong; Mann, George K I; Gosine, Raymond G
2010-10-01
By extending integrated competition hypothesis, this paper presents an object-based visual attention model, which selects one object of interest using low-dimensional features, resulting that visual perception starts from a fast attentional selection procedure. The proposed attention model involves seven modules: learning of object representations stored in a long-term memory (LTM), preattentive processing, top-down biasing, bottom-up competition, mediation between top-down and bottom-up ways, generation of saliency maps, and perceptual completion processing. It works in two phases: learning phase and attending phase. In the learning phase, the corresponding object representation is trained statistically when one object is attended. A dual-coding object representation consisting of local and global codings is proposed. Intensity, color, and orientation features are used to build the local coding, and a contour feature is employed to constitute the global coding. In the attending phase, the model preattentively segments the visual field into discrete proto-objects using Gestalt rules at first. If a task-specific object is given, the model recalls the corresponding representation from LTM and deduces the task-relevant feature(s) to evaluate top-down biases. The mediation between automatic bottom-up competition and conscious top-down biasing is then performed to yield a location-based saliency map. By combination of location-based saliency within each proto-object, the proto-object-based saliency is evaluated. The most salient proto-object is selected for attention, and it is finally put into the perceptual completion processing module to yield a complete object region. This model has been applied into distinct tasks of robots: detection of task-specific stationary and moving objects. Experimental results under different conditions are shown to validate this model.
CFS-SMO based classification of breast density using multiple texture models.
Sharma, Vipul; Singh, Sukhwinder
2014-06-01
It is highly acknowledged in the medical profession that density of breast tissue is a major cause for the growth of breast cancer. Increased breast density was found to be linked with an increased risk of breast cancer growth, as high density makes it difficult for radiologists to see an abnormality which leads to false negative results. Therefore, there is need for the development of highly efficient techniques for breast tissue classification based on density. This paper presents a hybrid scheme for classification of fatty and dense mammograms using correlation-based feature selection (CFS) and sequential minimal optimization (SMO). In this work, texture analysis is done on a region of interest selected from the mammogram. Various texture models have been used to quantify the texture of parenchymal patterns of breast. To reduce the dimensionality and to identify the features which differentiate between breast tissue densities, CFS is used. Finally, classification is performed using SMO. The performance is evaluated using 322 images of mini-MIAS database. Highest accuracy of 96.46% is obtained for two-class problem (fatty and dense) using proposed approach. Performance of selected features by CFS is also evaluated by Naïve Bayes, Multilayer Perceptron, RBF Network, J48 and kNN classifier. The proposed CFS-SMO method outperforms all other classifiers giving a sensitivity of 100%. This makes it suitable to be taken as a second opinion in classifying breast tissue density.
Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning
2014-01-01
X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed 'PredPPCrys' using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys.
2010-01-01
Background Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. Implemented in Matlab and supported on Linux and MS Windows. PMID:21034480
Blöchliger, Nicolas; Caflisch, Amedeo; Vitalis, Andreas
2015-11-10
Data mining techniques depend strongly on how the data are represented and how distance between samples is measured. High-dimensional data often contain a large number of irrelevant dimensions (features) for a given query. These features act as noise and obfuscate relevant information. Unsupervised approaches to mine such data require distance measures that can account for feature relevance. Molecular dynamics simulations produce high-dimensional data sets describing molecules observed in time. Here, we propose to globally or locally weight simulation features based on effective rates. This emphasizes, in a data-driven manner, slow degrees of freedom that often report on the metastable states sampled by the molecular system. We couple this idea to several unsupervised learning protocols. Our approach unmasks slow side chain dynamics within the native state of a miniprotein and reveals additional metastable conformations of a protein. The approach can be combined with most algorithms for clustering or dimensionality reduction.
Combining Feature Extraction Methods to Assist the Diagnosis of Alzheimer's Disease.
Segovia, F; Górriz, J M; Ramírez, J; Phillips, C
2016-01-01
Neuroimaging data as (18)F-FDG PET is widely used to assist the diagnosis of Alzheimer's disease (AD). Looking for regions with hypoperfusion/ hypometabolism, clinicians may predict or corroborate the diagnosis of the patients. Modern computer aided diagnosis (CAD) systems based on the statistical analysis of whole neuroimages are more accurate than classical systems based on quantifying the uptake of some predefined regions of interests (ROIs). In addition, these new systems allow determining new ROIs and take advantage of the huge amount of information comprised in neuroimaging data. A major branch of modern CAD systems for AD is based on multivariate techniques, which analyse a neuroimage as a whole, considering not only the voxel intensities but also the relations among them. In order to deal with the vast dimensionality of the data, a number of feature extraction methods have been successfully applied. In this work, we propose a CAD system based on the combination of several feature extraction techniques. First, some commonly used feature extraction methods based on the analysis of the variance (as principal component analysis), on the factorization of the data (as non-negative matrix factorization) and on classical magnitudes (as Haralick features) were simultaneously applied to the original data. These feature sets were then combined by means of two different combination approaches: i) using a single classifier and a multiple kernel learning approach and ii) using an ensemble of classifier and selecting the final decision by majority voting. The proposed approach was evaluated using a labelled neuroimaging database along with a cross validation scheme. As conclusion, the proposed CAD system performed better than approaches using only one feature extraction technique. We also provide a fair comparison (using the same database) of the selected feature extraction methods.
Wang, Hong-ping; Chen, Chang; Liu, Yan; Yang, Hong-Jun; Wu, Hong-Wei; Xiao, Hong-Bin
2015-11-01
The incomplete identification of the chemical components of traditional Chinese medicinal formula has been one of the bottlenecks in the modernization of traditional Chinese medicine. Tandem mass spectrometry has been widely used for the identification of chemical substances. Current automatic tandem mass spectrometry acquisition, where precursor ions were selected according to their signal intensity, encounters a drawback in chemical substances identification when samples contain many overlapping signals. Compounds in minor or trace amounts could not be identified because most tandem mass spectrometry information was lost. Herein, a molecular feature orientated precursor ion selection and tandem mass spectrometry structure elucidation method for complex Chinese medicine chemical constituent analysis was developed. The precursor ions were selected according to their two-dimensional characteristics of retention times and mass-to-charge ratio ranges from herbal compounds, so that all precursor ions from herbal compounds were included and more minor chemical constituents in Chinese medicine were identified. Compared to the conventional automatic tandem mass spectrometry setups, the approach is novel and can overcome the drawback for chemical substances identification. As an example, 276 compounds from the Chinese Medicine of Yi-Xin-Shu capsule were identified. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
SoFoCles: feature filtering for microarray classification based on gene ontology.
Papachristoudis, Georgios; Diplaris, Sotiris; Mitkas, Pericles A
2010-02-01
Marker gene selection has been an important research topic in the classification analysis of gene expression data. Current methods try to reduce the "curse of dimensionality" by using statistical intra-feature set calculations, or classifiers that are based on the given dataset. In this paper, we present SoFoCles, an interactive tool that enables semantic feature filtering in microarray classification problems with the use of external, well-defined knowledge retrieved from the Gene Ontology. The notion of semantic similarity is used to derive genes that are involved in the same biological path during the microarray experiment, by enriching a feature set that has been initially produced with legacy methods. Among its other functionalities, SoFoCles offers a large repository of semantic similarity methods that are used in order to derive feature sets and marker genes. The structure and functionality of the tool are discussed in detail, as well as its ability to improve classification accuracy. Through experimental evaluation, SoFoCles is shown to outperform other classification schemes in terms of classification accuracy in two real datasets using different semantic similarity computation approaches.
NASA Astrophysics Data System (ADS)
Funane, Tsukasa; Hou, Steven S.; Zoltowska, Katarzyna Marta; van Veluw, Susanne J.; Berezovska, Oksana; Kumar, Anand T. N.; Bacskai, Brian J.
2018-05-01
We have developed an imaging technique which combines selective plane illumination microscopy with time-domain fluorescence lifetime imaging microscopy (SPIM-FLIM) for three-dimensional volumetric imaging of cleared mouse brains with micro- to mesoscopic resolution. The main features of the microscope include a wavelength-adjustable pulsed laser source (Ti:sapphire) (near-infrared) laser, a BiBO frequency-doubling photonic crystal, a liquid chamber, an electrically focus-tunable lens, a cuvette based sample holder, and an air (dry) objective lens. The performance of the system was evaluated with a lifetime reference dye and micro-bead phantom measurements. Intensity and lifetime maps of three-dimensional human embryonic kidney (HEK) cell culture samples and cleared mouse brain samples expressing green fluorescent protein (GFP) (donor only) and green and red fluorescent protein [positive Förster (fluorescence) resonance energy transfer] were acquired. The results show that the SPIM-FLIM system can be used for sample sizes ranging from single cells to whole mouse organs and can serve as a powerful tool for medical and biological research.
Zhai, Hong Lin; Zhai, Yue Yuan; Li, Pei Zhen; Tian, Yue Li
2013-01-21
A very simple approach to quantitative analysis is proposed based on the technology of digital image processing using three-dimensional (3D) spectra obtained by high-performance liquid chromatography coupled with a diode array detector (HPLC-DAD). As the region-based shape features of a grayscale image, Zernike moments with inherently invariance property were employed to establish the linear quantitative models. This approach was applied to the quantitative analysis of three compounds in mixed samples using 3D HPLC-DAD spectra, and three linear models were obtained, respectively. The correlation coefficients (R(2)) for training and test sets were more than 0.999, and the statistical parameters and strict validation supported the reliability of established models. The analytical results suggest that the Zernike moment selected by stepwise regression can be used in the quantitative analysis of target compounds. Our study provides a new idea for quantitative analysis using 3D spectra, which can be extended to the analysis of other 3D spectra obtained by different methods or instruments.
NASA Astrophysics Data System (ADS)
McNitt-Gray, Michael F.; Hart, Eric M.; Goldin, Jonathan G.; Yao, Chih-Wei; Aberle, Denise R.
1996-04-01
The purpose of our study was to characterize solitary pulmonary nodules (SPN) as benign or malignant based on pattern classification techniques using size, shape, density and texture features extracted from HRCT images. HRCT images of patients with a SPN are acquired, routed through a PACS and displayed on a thoracic radiology workstation. Using the original data, the SPN is semiautomatically contoured using a nodule/background threshold. The contour is used to calculate size and several shape parameters, including compactness and bending energy. Pixels within the interior of the contour are used to calculate several features including: (1) nodule density-related features, such as representative Hounsfield number and moment of inertia, and (2) texture measures based on the spatial gray level dependence matrix and fractal dimension. The true diagnosis of the SPN is established by histology from biopsy or, in the case of some benign nodules, extended follow-up. Multi-dimensional analyses of the features are then performed to determine which features can discriminate between benign and malignant nodules. When a sufficient number of cases are obtained two pattern classifiers, a linear discriminator and a neural network, are trained and tested using a select subset of features. Preliminary data from nine (9) nodule cases have been obtained and several features extracted. While the representative CT number is a reasonably good indicator, it is an inconclusive predictor of SPN diagnosis when considered by itself. Separation between benign and malignant nodules improves when other features, such as the distribution of density as measured by moment of inertia, are included in the analysis. Software has been developed and preliminary results have been obtained which show that individual features may not be sufficient to discriminate between benign and malignant nodules. However, combinations of these features may be able to discriminate between these two classes. With additional cases and more features, we will be able to perform a feature selection procedure and ultimately to train and test pattern classifiers in this discrimination task.
A wavelet-based technique to predict treatment outcome for Major Depressive Disorder.
Mumtaz, Wajid; Xia, Likun; Mohd Yasin, Mohd Azhar; Azhar Ali, Syed Saad; Malik, Aamir Saeed
2017-01-01
Treatment management for Major Depressive Disorder (MDD) has been challenging. However, electroencephalogram (EEG)-based predictions of antidepressant's treatment outcome may help during antidepressant's selection and ultimately improve the quality of life for MDD patients. In this study, a machine learning (ML) method involving pretreatment EEG data was proposed to perform such predictions for Selective Serotonin Reuptake Inhibitor (SSRIs). For this purpose, the acquisition of experimental data involved 34 MDD patients and 30 healthy controls. Consequently, a feature matrix was constructed involving time-frequency decomposition of EEG data based on wavelet transform (WT) analysis, termed as EEG data matrix. However, the resultant EEG data matrix had high dimensionality. Therefore, dimension reduction was performed based on a rank-based feature selection method according to a criterion, i.e., receiver operating characteristic (ROC). As a result, the most significant features were identified and further be utilized during the training and testing of a classification model, i.e., the logistic regression (LR) classifier. Finally, the LR model was validated with 100 iterations of 10-fold cross-validation (10-CV). The classification results were compared with short-time Fourier transform (STFT) analysis, and empirical mode decompositions (EMD). The wavelet features extracted from frontal and temporal EEG data were found statistically significant. In comparison with other time-frequency approaches such as the STFT and EMD, the WT analysis has shown highest classification accuracy, i.e., accuracy = 87.5%, sensitivity = 95%, and specificity = 80%. In conclusion, significant wavelet coefficients extracted from frontal and temporal pre-treatment EEG data involving delta and theta frequency bands may predict antidepressant's treatment outcome for the MDD patients.
GENIE: a hybrid genetic algorithm for feature classification in multispectral images
NASA Astrophysics Data System (ADS)
Perkins, Simon J.; Theiler, James P.; Brumby, Steven P.; Harvey, Neal R.; Porter, Reid B.; Szymanski, John J.; Bloch, Jeffrey J.
2000-10-01
We consider the problem of pixel-by-pixel classification of a multi- spectral image using supervised learning. Conventional spuervised classification techniques such as maximum likelihood classification and less conventional ones s uch as neural networks, typically base such classifications solely on the spectral components of each pixel. It is easy to see why: the color of a pixel provides a nice, bounded, fixed dimensional space in which these classifiers work well. It is often the case however, that spectral information alone is not sufficient to correctly classify a pixel. Maybe spatial neighborhood information is required as well. Or maybe the raw spectral components do not themselves make for easy classification, but some arithmetic combination of them would. In either of these cases we have the problem of selecting suitable spatial, spectral or spatio-spectral features that allow the classifier to do its job well. The number of all possible such features is extremely large. How can we select a suitable subset? We have developed GENIE, a hybrid learning system that combines a genetic algorithm that searches a space of image processing operations for a set that can produce suitable feature planes, and a more conventional classifier which uses those feature planes to output a final classification. In this paper we show that the use of a hybrid GA provides significant advantages over using either a GA alone or more conventional classification methods alone. We present results using high-resolution IKONOS data, looking for regions of burned forest and for roads.
Liu, Jianli; Lughofer, Edwin; Zeng, Xianyi
2015-01-01
Modeling human aesthetic perception of visual textures is important and valuable in numerous industrial domains, such as product design, architectural design, and decoration. Based on results from a semantic differential rating experiment, we modeled the relationship between low-level basic texture features and aesthetic properties involved in human aesthetic texture perception. First, we compute basic texture features from textural images using four classical methods. These features are neutral, objective, and independent of the socio-cultural context of the visual textures. Then, we conduct a semantic differential rating experiment to collect from evaluators their aesthetic perceptions of selected textural stimuli. In semantic differential rating experiment, eights pairs of aesthetic properties are chosen, which are strongly related to the socio-cultural context of the selected textures and to human emotions. They are easily understood and connected to everyday life. We propose a hierarchical feed-forward layer model of aesthetic texture perception and assign 8 pairs of aesthetic properties to different layers. Finally, we describe the generation of multiple linear and non-linear regression models for aesthetic prediction by taking dimensionality-reduced texture features and aesthetic properties of visual textures as dependent and independent variables, respectively. Our experimental results indicate that the relationships between each layer and its neighbors in the hierarchical feed-forward layer model of aesthetic texture perception can be fitted well by linear functions, and the models thus generated can successfully bridge the gap between computational texture features and aesthetic texture properties.
Defect-Repairable Latent Feature Extraction of Driving Behavior via a Deep Sparse Autoencoder
Taniguchi, Tadahiro; Takenaka, Kazuhito; Bando, Takashi
2018-01-01
Data representing driving behavior, as measured by various sensors installed in a vehicle, are collected as multi-dimensional sensor time-series data. These data often include redundant information, e.g., both the speed of wheels and the engine speed represent the velocity of the vehicle. Redundant information can be expected to complicate the data analysis, e.g., more factors need to be analyzed; even varying the levels of redundancy can influence the results of the analysis. We assume that the measured multi-dimensional sensor time-series data of driving behavior are generated from low-dimensional data shared by the many types of one-dimensional data of which multi-dimensional time-series data are composed. Meanwhile, sensor time-series data may be defective because of sensor failure. Therefore, another important function is to reduce the negative effect of defective data when extracting low-dimensional time-series data. This study proposes a defect-repairable feature extraction method based on a deep sparse autoencoder (DSAE) to extract low-dimensional time-series data. In the experiments, we show that DSAE provides high-performance latent feature extraction for driving behavior, even for defective sensor time-series data. In addition, we show that the negative effect of defects on the driving behavior segmentation task could be reduced using the latent features extracted by DSAE. PMID:29462931
Min, Jian-Liang; Chou, Kuo-Chen
2013-01-01
With the features of extremely high selectivity and efficiency in catalyzing almost all the chemical reactions in cells, enzymes play vitally important roles for the life of an organism and hence have become frequent targets for drug design. An essential step in developing drugs by targeting enzymes is to identify drug-enzyme interactions in cells. It is both time-consuming and costly to do this purely by means of experimental techniques alone. Although some computational methods were developed in this regard based on the knowledge of the three-dimensional structure of enzyme, unfortunately their usage is quite limited because three-dimensional structures of many enzymes are still unknown. Here, we reported a sequence-based predictor, called “iEzy-Drug,” in which each drug compound was formulated by a molecular fingerprint with 258 feature components, each enzyme by the Chou's pseudo amino acid composition generated via incorporating sequential evolution information and physicochemical features derived from its sequence, and the prediction engine was operated by the fuzzy K-nearest neighbor algorithm. The overall success rate achieved by iEzy-Drug via rigorous cross-validations was about 91%. Moreover, to maximize the convenience for the majority of experimental scientists, a user-friendly web server was established, by which users can easily obtain their desired results. PMID:24371828
NASA Astrophysics Data System (ADS)
Djallel Dilmi, Mohamed; Mallet, Cécile; Barthes, Laurent; Chazottes, Aymeric
2017-04-01
The study of rain time series records is mainly carried out using rainfall rate or rain accumulation parameters estimated on a fixed integration time (typically 1 min, 1 hour or 1 day). In this study we used the concept of rain event. In fact, the discrete and intermittent natures of rain processes make the definition of some features inadequate when defined on a fixed duration. Long integration times (hour, day) lead to mix rainy and clear air periods in the same sample. Small integration time (seconds, minutes) will lead to noisy data with a great sensibility to detector characteristics. The analysis on the whole rain event instead of individual short duration samples of a fixed duration allows to clarify relationships between features, in particular between macro physical and microphysical ones. This approach allows suppressing the intra-event variability partly due to measurement uncertainties and allows focusing on physical processes. An algorithm based on Genetic Algorithm (GA) and Self Organising Maps (SOM) is developed to obtain a parsimonious characterisation of rain events using a minimal set of variables. The use of self-organizing map (SOM) is justified by the fact that it allows to map a high dimensional data space in a two-dimensional space while preserving as much as possible the initial space topology in an unsupervised way. The obtained SOM allows providing the dependencies between variables and consequently removing redundant variables leading to a minimal subset of only five features (the event duration, the rain rate peak, the rain event depth, the event rain rate standard deviation and the absolute rain rate variation of order 0.5). To confirm relevance of the five selected features the corresponding SOM is analyzed. This analysis shows clearly the existence of relationships between features. It also shows the independence of the inter-event time (IETp) feature or the weak dependence of the Dry percentage in event (Dd%e) feature. This confirms that a rain time series can be considered by an alternation of independent rain event and no rain period. The five selected feature are used to perform a hierarchical clustering of the events. The well-known division between stratiform and convective events appears clearly. This classification into two classes is then refined in 5 fairly homogeneous subclasses. The data driven analysis performed on whole rain events instead of fixed length samples allows identifying strong relationships between macrophysics (based on rain rate) and microphysics (based on raindrops) features. We show that among the 5 identified subclasses some of them have specific microphysics characteristics. Obtaining information on microphysical characteristics of rainfall events from rain gauges measurement suggests many implications in development of the quantitative precipitation estimation (QPE), for the improvement of rain rate retrieval algorithm in remote sensing context.
Keshtkaran, Mohammad Reza; Yang, Zhi
2017-06-01
Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.
NASA Astrophysics Data System (ADS)
Keshtkaran, Mohammad Reza; Yang, Zhi
2017-06-01
Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.
Dashtban, M; Balafar, Mohammadali
2017-03-01
Gene selection is a demanding task for microarray data analysis. The diverse complexity of different cancers makes this issue still challenging. In this study, a novel evolutionary method based on genetic algorithms and artificial intelligence is proposed to identify predictive genes for cancer classification. A filter method was first applied to reduce the dimensionality of feature space followed by employing an integer-coded genetic algorithm with dynamic-length genotype, intelligent parameter settings, and modified operators. The algorithmic behaviors including convergence trends, mutation and crossover rate changes, and running time were studied, conceptually discussed, and shown to be coherent with literature findings. Two well-known filter methods, Laplacian and Fisher score, were examined considering similarities, the quality of selected genes, and their influences on the evolutionary approach. Several statistical tests concerning choice of classifier, choice of dataset, and choice of filter method were performed, and they revealed some significant differences between the performance of different classifiers and filter methods over datasets. The proposed method was benchmarked upon five popular high-dimensional cancer datasets; for each, top explored genes were reported. Comparing the experimental results with several state-of-the-art methods revealed that the proposed method outperforms previous methods in DLBCL dataset. Copyright © 2017 Elsevier Inc. All rights reserved.
MO-AB-BRA-10: Cancer Therapy Outcome Prediction Based On Dempster-Shafer Theory and PET Imaging
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lian, C; University of Rouen, QuantIF - EA 4108 LITIS, 76000 Rouen; Li, H
2015-06-15
Purpose: In cancer therapy, utilizing FDG-18 PET image-based features for accurate outcome prediction is challenging because of 1) limited discriminative information within a small number of PET image sets, and 2) fluctuant feature characteristics caused by the inferior spatial resolution and system noise of PET imaging. In this study, we proposed a new Dempster-Shafer theory (DST) based approach, evidential low-dimensional transformation with feature selection (ELT-FS), to accurately predict cancer therapy outcome with both PET imaging features and clinical characteristics. Methods: First, a specific loss function with sparse penalty was developed to learn an adaptive low-rank distance metric for representing themore » dissimilarity between different patients’ feature vectors. By minimizing this loss function, a linear low-dimensional transformation of input features was achieved. Also, imprecise features were excluded simultaneously by applying a l2,1-norm regularization of the learnt dissimilarity metric in the loss function. Finally, the learnt dissimilarity metric was applied in an evidential K-nearest-neighbor (EK- NN) classifier to predict treatment outcome. Results: Twenty-five patients with stage II–III non-small-cell lung cancer and thirty-six patients with esophageal squamous cell carcinomas treated with chemo-radiotherapy were collected. For the two groups of patients, 52 and 29 features, respectively, were utilized. The leave-one-out cross-validation (LOOCV) protocol was used for evaluation. Compared to three existing linear transformation methods (PCA, LDA, NCA), the proposed ELT-FS leads to higher prediction accuracy for the training and testing sets both for lung-cancer patients (100+/−0.0, 88.0+/−33.17) and for esophageal-cancer patients (97.46+/−1.64, 83.33+/−37.8). The ELT-FS also provides superior class separation in both test data sets. Conclusion: A novel DST- based approach has been proposed to predict cancer treatment outcome using PET image features and clinical characteristics. A specific loss function has been designed for robust accommodation of feature set incertitude and imprecision, facilitating adaptive learning of the dissimilarity metric for the EK-NN classifier.« less
Single trial detection of hand poses in human ECoG using CSP based feature extraction.
Kapeller, C; Schneider, C; Kamada, K; Ogawa, H; Kunii, N; Ortner, R; Pruckl, R; Guger, C
2014-01-01
Decoding brain activity of corresponding highlevel tasks may lead to an independent and intuitively controlled Brain-Computer Interface (BCI). Most of today's BCI research focuses on analyzing the electroencephalogram (EEG) which provides only limited spatial and temporal resolution. Derived electrocorticographic (ECoG) signals allow the investigation of spatially highly focused task-related activation within the high-gamma frequency band, making the discrimination of individual finger movements or complex grasping tasks possible. Common spatial patterns (CSP) are commonly used for BCI systems and provide a powerful tool for feature optimization and dimensionality reduction. This work focused on the discrimination of (i) three complex hand movements, as well as (ii) hand movement and idle state. Two subjects S1 and S2 performed single `open', `peace' and `fist' hand poses in multiple trials. Signals in the high-gamma frequency range between 100 and 500 Hz were spatially filtered based on a CSP algorithm for (i) and (ii). Additionally, a manual feature selection approach was tested for (i). A multi-class linear discriminant analysis (LDA) showed for (i) an error rate of 13.89 % / 7.22 % and 18.42 % / 1.17 % for S1 and S2 using manually / CSP selected features, where for (ii) a two class LDA lead to a classification error of 13.39 % and 2.33 % for S1 and S2, respectively.
Wexler, Eliezer J.
1992-01-01
Analytical solutions to the advective-dispersive solute-transport equation are useful in predicting the fate of solutes in ground water. Analytical solutions compiled from available literature or derived by the author are presented for a variety of boundary condition types and solute-source configurations in one-, two-, and three-dimensional systems having uniform ground-water flow. A set of user-oriented computer programs was created to evaluate these solutions and to display the results in tabular and computer-graphics format. These programs incorporate many features that enhance their accuracy, ease of use, and versatility. Documentation for the programs describes their operation and required input data, and presents the results of sample problems. Derivations of selected solutions, source codes for the computer programs, and samples of program input and output also are included.
Direct observation of vibrational energy dispersal via methyl torsions.
Gardner, Adrian M; Tuttle, William D; Whalley, Laura E; Wright, Timothy G
2018-02-28
Explicit evidence for the role of methyl rotor levels in promoting energy dispersal is reported. A set of coupled zero-order vibration/vibration-torsion (vibtor) levels in the S 1 state of para -fluorotoluene ( p FT) are investigated. Two-dimensional laser-induced fluorescence (2D-LIF) and two-dimensional zero-kinetic-energy (2D-ZEKE) spectra are reported, and the assignment of the main features in both sets of spectra reveals that the methyl torsion is instrumental in providing a route for coupling between vibrational levels of different symmetry classes. We find that there is very localized, and selective, dissipation of energy via doorway states, and that, in addition to an increase in the density of states, a critical role of the methyl group is a relaxation of symmetry constraints compared to direct vibrational coupling.
Three-Dimensional Structures Self-Assembled from DNA Bricks
Ke, Yonggang; Ong, Luvena L.; Shih, William M.; Yin, Peng
2013-01-01
We describe a simple and robust method to construct complex three-dimensional (3D) structures using short synthetic DNA strands that we call “DNA bricks”. In one-step annealing reactions, bricks with hundreds of distinct sequences self-assemble into prescribed 3D shapes. Each 32-nucleotide brick is a modular component; it binds to four local neighbors and can be removed or added independently. Each 8-base-pair interaction between bricks defines a voxel with dimensions 2.5 nanometers by 2.5 nanometers by 2.7 nanometers, and a master brick collection defines a “molecular canvas” with dimensions of 10 by 10 by 10 voxels. By selecting subsets of bricks from this canvas, we constructed a panel of 102 distinct shapes exhibiting sophisticated surface features as well as intricate interior cavities and tunnels. PMID:23197527
Long-range prediction of Indian summer monsoon rainfall using data mining and statistical approaches
NASA Astrophysics Data System (ADS)
H, Vathsala; Koolagudi, Shashidhar G.
2017-10-01
This paper presents a hybrid model to better predict Indian summer monsoon rainfall. The algorithm considers suitable techniques for processing dense datasets. The proposed three-step algorithm comprises closed itemset generation-based association rule mining for feature selection, cluster membership for dimensionality reduction, and simple logistic function for prediction. The application of predicting rainfall into flood, excess, normal, deficit, and drought based on 36 predictors consisting of land and ocean variables is presented. Results show good accuracy in the considered study period of 37years (1969-2005).
Study of large hemispherical photomultiplier tubes for the ANTARES neutrino telescope
NASA Astrophysics Data System (ADS)
Aguilar, J. A.; Albert, A.; Ameli, F.; Amram, P.; Anghinolfi, M.; Anton, G.; Anvar, S.; Ardellier-Desages, F. E.; Aslanides, E.; Aubert, J.-J.; Bailey, D.; Basa, S.; Battaglieri, M.; Becherini, Y.; Bellotti, R.; Beltramelli, J.; Bertin, V.; Billault, M.; Blaes, R.; Blanc, F.; de Botton, N.; Boulesteix, J.; Bouwhuis, M. C.; Brooks, C. B.; Bradbury, S. M.; Bruijn, R.; Brunner, J.; Burgio, G. F.; Cafagna, F.; Calzas, A.; Capone, A.; Caponetto, L.; Carmona, E.; Carr, J.; Cartwright, S. L.; Castorina, E.; Cavasinni, V.; Cecchini, S.; Charvis, P.; Circella, M.; Colnard, C.; Compère, C.; Coniglione, R.; Cooper, S.; Coyle, P.; Cuneo, S.; Damy, G.; van Dantzig, R.; Deschamps, A.; de Marzo, C.; Denans, D.; Destelle, J.-J.; de Vita, R.; Dinkelspiler, B.; Distefano, C.; Drogou, J.-F.; Druillole, F.; Engelen, J.; Ernenwein, J.-P.; Falchini, E.; Favard, S.; Feinstein, F.; Ferry, S.; Festy, D.; Flaminio, V.; Fopma, J.; Fuda, J.-L.; Gallone, J.-M.; Giacomelli, G.; Girard, N.; Goret, P.; Graf, K.; Hallewell, G.; Hartmann, B.; Heijboer, A.; Hello, Y.; Hernández-Rey, J. J.; Herrouin, G.; Hößl, J.; Hoffmann, C.; Hubbard, J. R.; Jaquet, M.; de Jong, M.; Jouvenot, F.; Kappes, A.; Karg, T.; Karkar, S.; Karolak, M.; Katz, U.; Keller, P.; Kooijman, P.; Korolkova, E. V.; Kouchner, A.; Kretschmer, W.; Kuch, S.; Kudryavtsev, V. A.; Lafoux, H.; Lagier, P.; Lahmann, R.; Lamare, P.; Languillat, J.-C.; Laschinsky, H.; Laubier, L.; Legou, T.; Le Guen, Y.; Le Provost, H.; Le van Suu, A.; Lo Nigro, L.; Lo Presti, D.; Loucatos, S.; Louis, F.; Lyashuk, V.; Marcelin, M.; Margiotta, A.; Maron, C.; Massol, A.; Masullo, R.; Mazéas, F.; Mazure, A.; McMillan, J. E.; Migneco, E.; Millot, C.; Milovanovic, A.; Montanet, F.; Montaruli, T.; Morel, J.-P.; Morganti, M.; Moscoso, L.; Musumeci, M.; Naumann, C.; Naumann-Godo, M.; Nezri, E.; Niess, V.; Nooren, G. J.; Ogden, P.; Olivetto, C.; Palanque-Delabrouille, N.; Papaleo, R.; Payre, P.; Petta, C.; Piattelli, P.; Pineau, J.-P.; Poinsignon, J.; Popa, V.; Potheau, R.; Pradier, T.; Racca, C.; Raia, G.; Randazzo, N.; Real, D.; van Rens, B. A. P.; Réthoré, F.; Riccobene, G.; Rigaud, V.; Ripani, M.; Roca-Blay, V.; Rolin, J.-F.; Romita, M.; Rose, H. J.; Rostovtsev, A.; Ruppi, M.; Russo, G. V.; Sacquin, Y.; Salesa, F.; Salomon, K.; Saouter, S.; Sapienza, P.; Shanidze, R.; Schuller, J.-P.; Schuster, W.; Sokalski, I.; Spurio, M.; Stolarczyk, T.; Stubert, D.; Taiuti, M.; Thompson, L. F.; Tilav, S.; Valdy, P.; Valente, V.; Vallage, B.; Vernin, P.; Virieux, J.; de Vries, G.; de Witt Huberts, P.; de Wolf, E.; Zaborov, D.; Zaccone, H.; Zakharov, V.; Zornoza, J. D.; Zúñiga, J.
2005-12-01
The ANTARES neutrino telescope, to be immersed depth in the Mediterranean Sea, will consist of a three-dimensional matrix of 900 large area photomultiplier tubes housed in pressure-resistant glass spheres. The selection of the optimal photomultiplier was a critical step for the project and required an intensive phase of tests and developments carried out in close collaboration with the main manufacturers worldwide. This paper provides an overview of the tests performed by the collaboration and describes in detail the features of the photomultiplier tube chosen for ANTARES.
NASA Astrophysics Data System (ADS)
Dobner, Sven; Fallnich, Carsten
2014-02-01
We present the hyperspectral imaging capabilities of in-line interferometric femtosecond stimulated Raman scattering. The beneficial features of this method, namely, the improved signal-to-background ratio compared to other applicable broadband stimulated Raman scattering methods and the simple experimental implementation, allow for a rather fast acquisition of three-dimensional raster-scanned hyperspectral data-sets, which is shown for PMMA beads and a lipid droplet in water as a demonstration. A subsequent application of a principle component analysis displays the chemical selectivity of the method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cover, Keith S.; Lagerwaard, Frank J.; Senan, Suresh
2006-03-01
Purpose: Four-dimensional computerized tomography scans (4DCT) enable intrafractional motion to be determined. Because more than 1500 images can be generated with each 4DCT study, tools for efficient data visualization and evaluation are needed. We describe the use of color intensity projections (CIP) for visualizing mobility. Methods: Four-dimensional computerized tomography images of each patient slice were combined into a CIP composite image. Pixels largely unchanged over the component images appear unchanged in the CIP image. However, pixels whose intensity changes over the phases of the 4DCT appear in the CIP image as colored pixels, and the hue encodes the percentage ofmore » time the tissue was in each location. CIPs of 18 patients were used to study tumor and surrogate markers, namely the diaphragm and an abdominal marker block. Results: Color intensity projections permitted mobility of high-contrast features to be quickly visualized and measured. In three selected expiratory phases ('gating phases') that were reviewed in the sagittal plane, gating would have reduced mean tumor mobility from 6.3 {+-} 2.0 mm to 1.4 {+-} 0.5 mm. Residual tumor mobility in gating phases better correlated with residual mobility of the marker block than that of the diaphragm. Conclusion: CIPs permit immediate visualization of mobility in 4DCT images and simplify the selection of appropriate surrogates for gated radiotherapy.« less
Automated visual inspection for polished stone manufacture
NASA Astrophysics Data System (ADS)
Smith, Melvyn L.; Smith, Lyndon N.
2003-05-01
Increased globalisation of the ornamental stone market has lead to increased competition and more rigorous product quality requirements. As such, there are strong motivators to introduce new, more effective, inspection technologies that will help enable stone processors to reduce costs, improve quality and improve productivity. Natural stone surfaces may contain a mixture of complex two-dimensional (2D) patterns and three-dimensional (3D) features. The challenge in terms of automated inspection is to develop systems able to reliably identify 3D topographic defects, either naturally occurring or resulting from polishing, in the presence of concomitant complex 2D stochastic colour patterns. The resulting real-time analysis of the defects may be used in adaptive process control, in order to avoid the wasteful production of defective product. An innovative approach, using structured light and based upon an adaptation of the photometric stereo method, has been pioneered and developed at UWE to isolate and characterize mixed 2D and 3D surface features. The method is able to undertake tasks considered beyond the capabilities of existing surface inspection techniques. The approach has been successfully applied to real stone samples, and a selection of experimental results is presented.
Firefly Mating Algorithm for Continuous Optimization Problems
Ritthipakdee, Amarita; Premasathian, Nol; Jitkongchuen, Duangjai
2017-01-01
This paper proposes a swarm intelligence algorithm, called firefly mating algorithm (FMA), for solving continuous optimization problems. FMA uses genetic algorithm as the core of the algorithm. The main feature of the algorithm is a novel mating pair selection method which is inspired by the following 2 mating behaviors of fireflies in nature: (i) the mutual attraction between males and females causes them to mate and (ii) fireflies of both sexes are of the multiple-mating type, mating with multiple opposite sex partners. A female continues mating until her spermatheca becomes full, and, in the same vein, a male can provide sperms for several females until his sperm reservoir is depleted. This new feature enhances the global convergence capability of the algorithm. The performance of FMA was tested with 20 benchmark functions (sixteen 30-dimensional functions and four 2-dimensional ones) against FA, ALC-PSO, COA, MCPSO, LWGSODE, MPSODDS, DFOA, SHPSOS, LSA, MPDPGA, DE, and GABC algorithms. The experimental results showed that the success rates of our proposed algorithm with these functions were higher than those of other algorithms and the proposed algorithm also required fewer numbers of iterations to reach the global optima. PMID:28808442
Firefly Mating Algorithm for Continuous Optimization Problems.
Ritthipakdee, Amarita; Thammano, Arit; Premasathian, Nol; Jitkongchuen, Duangjai
2017-01-01
This paper proposes a swarm intelligence algorithm, called firefly mating algorithm (FMA), for solving continuous optimization problems. FMA uses genetic algorithm as the core of the algorithm. The main feature of the algorithm is a novel mating pair selection method which is inspired by the following 2 mating behaviors of fireflies in nature: (i) the mutual attraction between males and females causes them to mate and (ii) fireflies of both sexes are of the multiple-mating type, mating with multiple opposite sex partners. A female continues mating until her spermatheca becomes full, and, in the same vein, a male can provide sperms for several females until his sperm reservoir is depleted. This new feature enhances the global convergence capability of the algorithm. The performance of FMA was tested with 20 benchmark functions (sixteen 30-dimensional functions and four 2-dimensional ones) against FA, ALC-PSO, COA, MCPSO, LWGSODE, MPSODDS, DFOA, SHPSOS, LSA, MPDPGA, DE, and GABC algorithms. The experimental results showed that the success rates of our proposed algorithm with these functions were higher than those of other algorithms and the proposed algorithm also required fewer numbers of iterations to reach the global optima.
NASA Astrophysics Data System (ADS)
Li, Shuai; Wang, Chen; Zheng, Shi-Han; Wang, Rui-Qiang; Li, Jun; Yang, Mou
2018-04-01
The impurity effect is studied in three-dimensional Dirac semimetals in the framework of a T-matrix method to consider the multiple scattering events of Dirac electrons off impurities. It has been found that a strong impurity potential can significantly restructure the energy dispersion and the density of states of Dirac electrons. An impurity-induced resonant state emerges and significantly modifies the pristine optical response. It is shown that the impurity state disturbs the common longitudinal optical conductivity by creating either an optical conductivity peak or double absorption jumps, depending on the relative position of the impurity band and the Fermi level. More importantly, these conductivity features appear in the forbidden region between the Drude and interband transition, completely or partially filling the Pauli block region of optical response. The underlying physics is that the appearance of resonance states as well as the broadening of the bands leads to a more complicated selection rule for the optical transitions, making it possible to excite new electron-hole pairs in the forbidden region. These features in optical conductivity provide valuable information to understand the impurity behaviors in 3D Dirac materials.
Cinelli, Mattia; Sun, , Yuxin; Best, Katharine; Heather, James M.; Reich-Zeliger, Shlomit; Shifrut, Eric; Friedman, Nir; Shawe-Taylor, John; Chain, Benny
2017-01-01
Abstract Motivation: Somatic DNA recombination, the hallmark of vertebrate adaptive immunity, has the potential to generate a vast diversity of antigen receptor sequences. How this diversity captures antigen specificity remains incompletely understood. In this study we use high throughput sequencing to compare the global changes in T cell receptor β chain complementarity determining region 3 (CDR3β) sequences following immunization with ovalbumin administered with complete Freund’s adjuvant (CFA) or CFA alone. Results: The CDR3β sequences were deconstructed into short stretches of overlapping contiguous amino acids. The motifs were ranked according to a one-dimensional Bayesian classifier score comparing their frequency in the repertoires of the two immunization classes. The top ranking motifs were selected and used to create feature vectors which were used to train a support vector machine. The support vector machine achieved high classification scores in a leave-one-out validation test reaching >90% in some cases. Summary: The study describes a novel two-stage classification strategy combining a one-dimensional Bayesian classifier with a support vector machine. Using this approach we demonstrate that the frequency of a small number of linear motifs three amino acids in length can accurately identify a CD4 T cell response to ovalbumin against a background response to the complex mixture of antigens which characterize Complete Freund’s Adjuvant. Availability and implementation: The sequence data is available at www.ncbi.nlm.nih.gov/sra/?term¼SRP075893. The Decombinator package is available at github.com/innate2adaptive/Decombinator. The R package e1071 is available at the CRAN repository https://cran.r-project.org/web/packages/e1071/index.html. Contact: b.chain@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28073756
Multistage classification of multispectral Earth observational data: The design approach
NASA Technical Reports Server (NTRS)
Bauer, M. E. (Principal Investigator); Muasher, M. J.; Landgrebe, D. A.
1981-01-01
An algorithm is proposed which predicts the optimal features at every node in a binary tree procedure. The algorithm estimates the probability of error by approximating the area under the likelihood ratio function for two classes and taking into account the number of training samples used in estimating each of these two classes. Some results on feature selection techniques, particularly in the presence of a very limited set of training samples, are presented. Results comparing probabilities of error predicted by the proposed algorithm as a function of dimensionality as compared to experimental observations are shown for aircraft and LANDSAT data. Results are obtained for both real and simulated data. Finally, two binary tree examples which use the algorithm are presented to illustrate the usefulness of the procedure.
An Optimal Strategy for Accurate Bulge-to-disk Decomposition of Disk Galaxies
NASA Astrophysics Data System (ADS)
Gao, Hua; Ho, Luis C.
2017-08-01
The development of two-dimensional (2D) bulge-to-disk decomposition techniques has shown their advantages over traditional one-dimensional (1D) techniques, especially for galaxies with non-axisymmetric features. However, the full potential of 2D techniques has yet to be fully exploited. Secondary morphological features in nearby disk galaxies, such as bars, lenses, rings, disk breaks, and spiral arms, are seldom accounted for in 2D image decompositions, even though some image-fitting codes, such as GALFIT, are capable of handling them. We present detailed, 2D multi-model and multi-component decomposition of high-quality R-band images of a representative sample of nearby disk galaxies selected from the Carnegie-Irvine Galaxy Survey, using the latest version of GALFIT. The sample consists of five barred and five unbarred galaxies, spanning Hubble types from S0 to Sc. Traditional 1D decomposition is also presented for comparison. In detailed case studies of the 10 galaxies, we successfully model the secondary morphological features. Through a comparison of best-fit parameters obtained from different input surface brightness models, we identify morphological features that significantly impact bulge measurements. We show that nuclear and inner lenses/rings and disk breaks must be properly taken into account to obtain accurate bulge parameters, whereas outer lenses/rings and spiral arms have a negligible effect. We provide an optimal strategy to measure bulge parameters of typical disk galaxies, as well as prescriptions to estimate realistic uncertainties of them, which will benefit subsequent decomposition of a larger galaxy sample.
Driving behavior recognition using EEG data from a simulated car-following experiment.
Yang, Liu; Ma, Rui; Zhang, H Michael; Guan, Wei; Jiang, Shixiong
2018-07-01
Driving behavior recognition is the foundation of driver assistance systems, with potential applications in automated driving systems. Most prevailing studies have used subjective questionnaire data and objective driving data to classify driving behaviors, while few studies have used physiological signals such as electroencephalography (EEG) to gather data. To bridge this gap, this paper proposes a two-layer learning method for driving behavior recognition using EEG data. A simulated car-following driving experiment was designed and conducted to simultaneously collect data on the driving behaviors and EEG data of drivers. The proposed learning method consists of two layers. In Layer I, two-dimensional driving behavior features representing driving style and stability were selected and extracted from raw driving behavior data using K-means and support vector machine recursive feature elimination. Five groups of driving behaviors were classified based on these two-dimensional driving behavior features. In Layer II, the classification results from Layer I were utilized as inputs to generate a k-Nearest-Neighbor classifier identifying driving behavior groups using EEG data. Using independent component analysis, a fast Fourier transformation, and linear discriminant analysis sequentially, the raw EEG signals were processed to extract two core EEG features. Classifier performance was enhanced using the adaptive synthetic sampling approach. A leave-one-subject-out cross validation was conducted. The results showed that the average classification accuracy for all tested traffic states was 69.5% and the highest accuracy reached 83.5%, suggesting a significant correlation between EEG patterns and car-following behavior. Copyright © 2017 Elsevier Ltd. All rights reserved.
An Optimal Strategy for Accurate Bulge-to-disk Decomposition of Disk Galaxies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gao Hua; Ho, Luis C.
The development of two-dimensional (2D) bulge-to-disk decomposition techniques has shown their advantages over traditional one-dimensional (1D) techniques, especially for galaxies with non-axisymmetric features. However, the full potential of 2D techniques has yet to be fully exploited. Secondary morphological features in nearby disk galaxies, such as bars, lenses, rings, disk breaks, and spiral arms, are seldom accounted for in 2D image decompositions, even though some image-fitting codes, such as GALFIT, are capable of handling them. We present detailed, 2D multi-model and multi-component decomposition of high-quality R -band images of a representative sample of nearby disk galaxies selected from the Carnegie-Irvine Galaxymore » Survey, using the latest version of GALFIT. The sample consists of five barred and five unbarred galaxies, spanning Hubble types from S0 to Sc. Traditional 1D decomposition is also presented for comparison. In detailed case studies of the 10 galaxies, we successfully model the secondary morphological features. Through a comparison of best-fit parameters obtained from different input surface brightness models, we identify morphological features that significantly impact bulge measurements. We show that nuclear and inner lenses/rings and disk breaks must be properly taken into account to obtain accurate bulge parameters, whereas outer lenses/rings and spiral arms have a negligible effect. We provide an optimal strategy to measure bulge parameters of typical disk galaxies, as well as prescriptions to estimate realistic uncertainties of them, which will benefit subsequent decomposition of a larger galaxy sample.« less
Folded concave penalized learning in identifying multimodal MRI marker for Parkinson’s disease
Liu, Hongcheng; Du, Guangwei; Zhang, Lijun; Lewis, Mechelle M.; Wang, Xue; Yao, Tao; Li, Runze; Huang, Xuemei
2016-01-01
Background Brain MRI holds promise to gauge different aspects of Parkinson’s disease (PD)-related pathological changes. Its analysis, however, is hindered by the high-dimensional nature of the data. New method This study introduces folded concave penalized (FCP) sparse logistic regression to identify biomarkers for PD from a large number of potential factors. The proposed statistical procedures target the challenges of high-dimensionality with limited data samples acquired. The maximization problem associated with the sparse logistic regression model is solved by local linear approximation. The proposed procedures then are applied to the empirical analysis of multimodal MRI data. Results From 45 features, the proposed approach identified 15 MRI markers and the UPSIT, which are known to be clinically relevant to PD. By combining the MRI and clinical markers, we can enhance substantially the specificity and sensitivity of the model, as indicated by the ROC curves. Comparison to existing methods We compare the folded concave penalized learning scheme with both the Lasso penalized scheme and the principle component analysis-based feature selection (PCA) in the Parkinson’s biomarker identification problem that takes into account both the clinical features and MRI markers. The folded concave penalty method demonstrates a substantially better clinical potential than both the Lasso and PCA in terms of specificity and sensitivity. Conclusions For the first time, we applied the FCP learning method to MRI biomarker discovery in PD. The proposed approach successfully identified MRI markers that are clinically relevant. Combining these biomarkers with clinical features can substantially enhance performance. PMID:27102045
NASA Technical Reports Server (NTRS)
Stoutemyer, D. R.
1977-01-01
The computer algebra language MACSYMA enables the programmer to include symbolic physical units in computer calculations, and features automatic detection of dimensionally-inhomogeneous formulas and conversion of inconsistent units in a dimensionally homogeneous formula. Some examples illustrate these features.
Yilmaz, E; Kayikcioglu, T; Kayipmaz, S
2017-07-01
In this article, we propose a decision support system for effective classification of dental periapical cyst and keratocystic odontogenic tumor (KCOT) lesions obtained via cone beam computed tomography (CBCT). CBCT has been effectively used in recent years for diagnosing dental pathologies and determining their boundaries and content. Unlike other imaging techniques, CBCT provides detailed and distinctive information about the pathologies by enabling a three-dimensional (3D) image of the region to be displayed. We employed 50 CBCT 3D image dataset files as the full dataset of our study. These datasets were identified by experts as periapical cyst and KCOT lesions according to the clinical, radiographic and histopathologic features. Segmentation operations were performed on the CBCT images using viewer software that we developed. Using the tools of this software, we marked the lesional volume of interest and calculated and applied the order statistics and 3D gray-level co-occurrence matrix for each CBCT dataset. A feature vector of the lesional region, including 636 different feature items, was created from those statistics. Six classifiers were used for the classification experiments. The Support Vector Machine (SVM) classifier achieved the best classification performance with 100% accuracy, and 100% F-score (F1) scores as a result of the experiments in which a ten-fold cross validation method was used with a forward feature selection algorithm. SVM achieved the best classification performance with 96.00% accuracy, and 96.00% F1 scores in the experiments in which a split sample validation method was used with a forward feature selection algorithm. SVM additionally achieved the best performance of 94.00% accuracy, and 93.88% F1 in which a leave-one-out (LOOCV) method was used with a forward feature selection algorithm. Based on the results, we determined that periapical cyst and KCOT lesions can be classified with a high accuracy with the models that we built using the new dataset selected for this study. The studies mentioned in this article, along with the selected 3D dataset, 3D statistics calculated from the dataset, and performance results of the different classifiers, comprise an important contribution to the field of computer-aided diagnosis of dental apical lesions. Copyright © 2017 Elsevier B.V. All rights reserved.
Binary classification of items of interest in a repeatable process
Abell, Jeffrey A; Spicer, John Patrick; Wincek, Michael Anthony; Wang, Hui; Chakraborty, Debejyo
2015-01-06
A system includes host and learning machines. Each machine has a processor in electrical communication with at least one sensor. Instructions for predicting a binary quality status of an item of interest during a repeatable process are recorded in memory. The binary quality status includes passing and failing binary classes. The learning machine receives signals from the at least one sensor and identifies candidate features. Features are extracted from the candidate features, each more predictive of the binary quality status. The extracted features are mapped to a dimensional space having a number of dimensions proportional to the number of extracted features. The dimensional space includes most of the passing class and excludes at least 90 percent of the failing class. Received signals are compared to the boundaries of the recorded dimensional space to predict, in real time, the binary quality status of a subsequent item of interest.
NASA Astrophysics Data System (ADS)
Wang, Yongzhi; Ma, Yuqing; Zhu, A.-xing; Zhao, Hui; Liao, Lixia
2018-05-01
Facade features represent segmentations of building surfaces and can serve as a building framework. Extracting facade features from three-dimensional (3D) point cloud data (3D PCD) is an efficient method for 3D building modeling. By combining the advantages of 3D PCD and two-dimensional optical images, this study describes the creation of a highly accurate building facade feature extraction method from 3D PCD with a focus on structural information. The new extraction method involves three major steps: image feature extraction, exploration of the mapping method between the image features and 3D PCD, and optimization of the initial 3D PCD facade features considering structural information. Results show that the new method can extract the 3D PCD facade features of buildings more accurately and continuously. The new method is validated using a case study. In addition, the effectiveness of the new method is demonstrated by comparing it with the range image-extraction method and the optical image-extraction method in the absence of structural information. The 3D PCD facade features extracted by the new method can be applied in many fields, such as 3D building modeling and building information modeling.
Feature Screening for Ultrahigh Dimensional Categorical Data with Applications.
Huang, Danyang; Li, Runze; Wang, Hansheng
2014-01-01
Ultrahigh dimensional data with both categorical responses and categorical covariates are frequently encountered in the analysis of big data, for which feature screening has become an indispensable statistical tool. We propose a Pearson chi-square based feature screening procedure for categorical response with ultrahigh dimensional categorical covariates. The proposed procedure can be directly applied for detection of important interaction effects. We further show that the proposed procedure possesses screening consistency property in the terminology of Fan and Lv (2008). We investigate the finite sample performance of the proposed procedure by Monte Carlo simulation studies, and illustrate the proposed method by two empirical datasets.
Locally Linear Embedding of Local Orthogonal Least Squares Images for Face Recognition
NASA Astrophysics Data System (ADS)
Hafizhelmi Kamaru Zaman, Fadhlan
2018-03-01
Dimensionality reduction is very important in face recognition since it ensures that high-dimensionality data can be mapped to lower dimensional space without losing salient and integral facial information. Locally Linear Embedding (LLE) has been previously used to serve this purpose, however, the process of acquiring LLE features requires high computation and resources. To overcome this limitation, we propose a locally-applied Local Orthogonal Least Squares (LOLS) model can be used as initial feature extraction before the application of LLE. By construction of least squares regression under orthogonal constraints we can preserve more discriminant information in the local subspace of facial features while reducing the overall features into a more compact form that we called LOLS images. LLE can then be applied on the LOLS images to maps its representation into a global coordinate system of much lower dimensionality. Several experiments carried out using publicly available face datasets such as AR, ORL, YaleB, and FERET under Single Sample Per Person (SSPP) constraint demonstrates that our proposed method can reduce the time required to compute LLE features while delivering better accuracy when compared to when either LLE or OLS alone is used. Comparison against several other feature extraction methods and more recent feature-learning method such as state-of-the-art Convolutional Neural Networks (CNN) also reveal the superiority of the proposed method under SSPP constraint.
Computing and visualizing time-varying merge trees for high-dimensional data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oesterling, Patrick; Heine, Christian; Weber, Gunther H.
2017-06-03
We introduce a new method that identifies and tracks features in arbitrary dimensions using the merge tree -- a structure for identifying topological features based on thresholding in scalar fields. This method analyzes the evolution of features of the function by tracking changes in the merge tree and relates features by matching subtrees between consecutive time steps. Using the time-varying merge tree, we present a structural visualization of the changing function that illustrates both features and their temporal evolution. We demonstrate the utility of our approach by applying it to temporal cluster analysis of high-dimensional point clouds.
Hosseini, Seyyed Abed; Khalilzadeh, Mohammad Ali; Naghibi-Sistani, Mohammad Bagher; Homam, Seyyed Mehran
2015-01-01
Background: This paper proposes a new emotional stress assessment system using multi-modal bio-signals. Electroencephalogram (EEG) is the reflection of brain activity and is widely used in clinical diagnosis and biomedical research. Methods: We design an efficient acquisition protocol to acquire the EEG signals in five channels (FP1, FP2, T3, T4 and Pz) and peripheral signals such as blood volume pulse, skin conductance (SC) and respiration, under images induction (calm-neutral and negatively excited) for the participants. The visual stimuli images are selected from the subset International Affective Picture System database. The qualitative and quantitative evaluation of peripheral signals are used to select suitable segments of EEG signals for improving the accuracy of signal labeling according to emotional stress states. After pre-processing, wavelet coefficients, fractal dimension, and Lempel-Ziv complexity are used to extract the features of the EEG signals. The vast number of features leads to the problem of dimensionality, which is solved using the genetic algorithm as a feature selection method. Results: The results show that the average classification accuracy is 89.6% for two categories of emotional stress states using the support vector machine (SVM). Conclusion: This is a great improvement in results compared to other similar researches. We achieve a noticeable improvement of 11.3% in accuracy using SVM classifier, in compared to previous studies. Therefore, a new fusion between EEG and peripheral signals are more robust in comparison to the separate signals. PMID:26622979
Hosseini, Seyyed Abed; Khalilzadeh, Mohammad Ali; Naghibi-Sistani, Mohammad Bagher; Homam, Seyyed Mehran
2015-07-06
This paper proposes a new emotional stress assessment system using multi-modal bio-signals. Electroencephalogram (EEG) is the reflection of brain activity and is widely used in clinical diagnosis and biomedical research. We design an efficient acquisition protocol to acquire the EEG signals in five channels (FP1, FP2, T3, T4 and Pz) and peripheral signals such as blood volume pulse, skin conductance (SC) and respiration, under images induction (calm-neutral and negatively excited) for the participants. The visual stimuli images are selected from the subset International Affective Picture System database. The qualitative and quantitative evaluation of peripheral signals are used to select suitable segments of EEG signals for improving the accuracy of signal labeling according to emotional stress states. After pre-processing, wavelet coefficients, fractal dimension, and Lempel-Ziv complexity are used to extract the features of the EEG signals. The vast number of features leads to the problem of dimensionality, which is solved using the genetic algorithm as a feature selection method. The results show that the average classification accuracy is 89.6% for two categories of emotional stress states using the support vector machine (SVM). This is a great improvement in results compared to other similar researches. We achieve a noticeable improvement of 11.3% in accuracy using SVM classifier, in compared to previous studies. Therefore, a new fusion between EEG and peripheral signals are more robust in comparison to the separate signals.
NASA Astrophysics Data System (ADS)
Sebastian Mannoor, Manu
Direct multidimensional integration of functional electronics and mechanical elements with viable biological systems could allow for the creation of bionic systems and devices possessing unique and advanced capabilities. For example, the ability to three dimensionally integrate functional electronic and mechanical components with biological cells and tissue could enable the creation of bionic systems that can have tremendous impact in regenerative medicine, prosthetics, and human-machine interfaces. However, as a consequence of the inherent dichotomy in material properties and limitations of conventional fabrication methods, the attainment of truly seamless integration of electronic and/or mechanical components with biological systems has been challenging. Nanomaterials engineering offers a general route for overcoming these dichotomies, primarily due to the existence of a dimensional compatibility between fundamental biological functional units and abiotic nanomaterial building blocks. One area of compelling interest for bionic systems is in the field of biomedical sensing, where the direct interfacing of nanosensors onto biological tissue or the human body could stimulate exciting opportunities such as on-body health quality monitoring and adaptive threat detection. Further, interfacing of antimicrobial peptide based bioselective probes onto the bionic nanosensors could offer abilities to detect pathogenic bacteria with bio-inspired selectivity. Most compellingly, when paired with additive manufacturing techniques such as 3D printing, these characteristics enable three dimensional integration and merging of a variety of functional materials including electronic, structural and biomaterials with viable biological cells, in the precise anatomic geometries of human organs, to form three dimensionally integrated, multi-functional bionic hybrids and cyborg devices with unique capabilities. In this thesis, we illustrate these approaches using three representative bionic systems: 1) Bionic Nanosensors: featuring bio-integrated graphene nanosensors for ubiquitous sensing, 2) Bionic Organs: featuring 3D printed bionic ears with three dimensionally integrated electronics and 3) Bionic Leaves: describing ongoing work in the direction of the creation of a bionic leaf enabled by the integration of plant derived photosynthetic functional units with electronic materials and components into a leaf-shaped hierarchical structure for harvesting photosynthetic bioelectricity.
Performance Evaluation of Multimodal Multifeature Authentication System Using KNN Classification.
Rajagopal, Gayathri; Palaniswamy, Ramamoorthy
2015-01-01
This research proposes a multimodal multifeature biometric system for human recognition using two traits, that is, palmprint and iris. The purpose of this research is to analyse integration of multimodal and multifeature biometric system using feature level fusion to achieve better performance. The main aim of the proposed system is to increase the recognition accuracy using feature level fusion. The features at the feature level fusion are raw biometric data which contains rich information when compared to decision and matching score level fusion. Hence information fused at the feature level is expected to obtain improved recognition accuracy. However, information fused at feature level has the problem of curse in dimensionality; here PCA (principal component analysis) is used to diminish the dimensionality of the feature sets as they are high dimensional. The proposed multimodal results were compared with other multimodal and monomodal approaches. Out of these comparisons, the multimodal multifeature palmprint iris fusion offers significant improvements in the accuracy of the suggested multimodal biometric system. The proposed algorithm is tested using created virtual multimodal database using UPOL iris database and PolyU palmprint database.
Performance Evaluation of Multimodal Multifeature Authentication System Using KNN Classification
Rajagopal, Gayathri; Palaniswamy, Ramamoorthy
2015-01-01
This research proposes a multimodal multifeature biometric system for human recognition using two traits, that is, palmprint and iris. The purpose of this research is to analyse integration of multimodal and multifeature biometric system using feature level fusion to achieve better performance. The main aim of the proposed system is to increase the recognition accuracy using feature level fusion. The features at the feature level fusion are raw biometric data which contains rich information when compared to decision and matching score level fusion. Hence information fused at the feature level is expected to obtain improved recognition accuracy. However, information fused at feature level has the problem of curse in dimensionality; here PCA (principal component analysis) is used to diminish the dimensionality of the feature sets as they are high dimensional. The proposed multimodal results were compared with other multimodal and monomodal approaches. Out of these comparisons, the multimodal multifeature palmprint iris fusion offers significant improvements in the accuracy of the suggested multimodal biometric system. The proposed algorithm is tested using created virtual multimodal database using UPOL iris database and PolyU palmprint database. PMID:26640813
Heimann, David C.; Kelly, Brian P.; Studley, Seth E.
2015-01-01
Additional information in this report includes maps of simulated stream velocity for an 8.2-mile, two-dimensional modeled reach of the Blue River and a Wetland Restoration Suitability Index (WRSI) generated for the study area that was based on hydrologic, topographic, and land-use digital feature layers. The calculated WRSI for the selected flood-plain area ranged from 1 (least suitable for possible wetland mitigation efforts) to 10 (most suitable for possible wetland mitigation efforts). A WRSI of 5 to 10 is most closely associated with existing riparian wetlands in the study area. The WRSI allows for the identification of lands along the Blue River and selected tributaries that are most suitable for restoration or creation of wetlands. Alternatively, the index can be used to identify and avoid disturbances to areas with the highest potential to support healthy sustainable riparian wetlands.
NASA Astrophysics Data System (ADS)
Kohno, Masanori
2018-05-01
The single-particle spectral properties of the two-dimensional t-J model with next-nearest-neighbor hopping are investigated near the Mott transition by using cluster perturbation theory. The spectral features are interpreted by considering the effects of the next-nearest-neighbor hopping on the shift of the spectral-weight distribution of the two-dimensional t-J model. Various anomalous features observed in hole-doped and electron-doped high-temperature cuprate superconductors are collectively explained in the two-dimensional t-J model with next-nearest-neighbor hopping near the Mott transition.
NASA Astrophysics Data System (ADS)
Zhu, Feng; Macdonald, Niall; Skommer, Joanna; Wlodkowic, Donald
2015-06-01
Current microfabrication methods are often restricted to two-dimensional (2D) or two and a half dimensional (2.5D) structures. Those fabrication issues can be potentially addressed by emerging additive manufacturing technologies. Despite rapid growth of additive manufacturing technologies in tissue engineering, microfluidics has seen relatively little developments with regards to adopting 3D printing for rapid fabrication of complex chip-based devices. This has been due to two major factors: lack of sufficient resolution of current rapid-prototyping methods (usually >100 μm ) and optical transparency of polymers to allow in vitro imaging of specimens. We postulate that adopting innovative fabrication processes can provide effective solutions for prototyping and manufacturing of chip-based devices with high-aspect ratios (i.e. above ration of 20:1). This work provides a comprehensive investigation of commercially available additive manufacturing technologies as an alternative for rapid prototyping of complex monolithic Lab-on-a-Chip devices for biological applications. We explored both multi-jet modelling (MJM) and several stereolithography (SLA) processes with five different 3D printing resins. Compared with other rapid prototyping technologies such as PDMS soft lithography and infrared laser micromachining, we demonstrated that selected SLA technologies had superior resolution and feature quality. We also for the first time optimised the post-processing protocols and demonstrated polymer features under scanning electronic microscope (SEM). Finally we demonstrate that selected SLA polymers have optical properties enabling high-resolution biological imaging. A caution should be, however, exercised as more work is needed to develop fully bio-compatible and non-toxic polymer chemistries.
Converging shock flows for a Mie-Grüneisen equation of state
NASA Astrophysics Data System (ADS)
Ramsey, Scott D.; Schmidt, Emma M.; Boyd, Zachary M.; Lilieholm, Jennifer F.; Baty, Roy S.
2018-04-01
Previous work has shown that the one-dimensional (1D) inviscid compressible flow (Euler) equations admit a wide variety of scale-invariant solutions (including the famous Noh, Sedov, and Guderley shock solutions) when the included equation of state (EOS) closure model assumes a certain scale-invariant form. However, this scale-invariant EOS class does not include even simple models used for shock compression of crystalline solids, including many broadly applicable representations of Mie-Grüneisen EOS. Intuitively, this incompatibility naturally arises from the presence of multiple dimensional scales in the Mie-Grüneisen EOS, which are otherwise absent from scale-invariant models that feature only dimensionless parameters (such as the adiabatic index in the ideal gas EOS). The current work extends previous efforts intended to rectify this inconsistency, by using a scale-invariant EOS model to approximate a Mie-Grüneisen EOS form. To this end, the adiabatic bulk modulus for the Mie-Grüneisen EOS is constructed, and its key features are used to motivate the selection of a scale-invariant approximation form. The remaining surrogate model parameters are selected through enforcement of the Rankine-Hugoniot jump conditions for an infinitely strong shock in a Mie-Grüneisen material. Finally, the approximate EOS is used in conjunction with the 1D inviscid Euler equations to calculate a semi-analytical Guderley-like imploding shock solution in a metal sphere and to determine if and when the solution may be valid for the underlying Mie-Grüneisen EOS.
NASA Astrophysics Data System (ADS)
Aytaç Korkmaz, Sevcan; Binol, Hamidullah
2018-03-01
Patients who die from stomach cancer are still present. Early diagnosis is crucial in reducing the mortality rate of cancer patients. Therefore, computer aided methods have been developed for early detection in this article. Stomach cancer images were obtained from Fırat University Medical Faculty Pathology Department. The Local Binary Patterns (LBP) and Histogram of Oriented Gradients (HOG) features of these images are calculated. At the same time, Sammon mapping, Stochastic Neighbor Embedding (SNE), Isomap, Classical multidimensional scaling (MDS), Local Linear Embedding (LLE), Linear Discriminant Analysis (LDA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Laplacian Eigenmaps methods are used for dimensional the reduction of the features. The high dimension of these features has been reduced to lower dimensions using dimensional reduction methods. Artificial neural networks (ANN) and Random Forest (RF) classifiers were used to classify stomach cancer images with these new lower feature sizes. New medical systems have developed to measure the effects of these dimensions by obtaining features in different dimensional with dimensional reduction methods. When all the methods developed are compared, it has been found that the best accuracy results are obtained with LBP_MDS_ANN and LBP_LLE_ANN methods.
Privacy preserving data publishing of categorical data through k-anonymity and feature selection.
Aristodimou, Aristos; Antoniades, Athos; Pattichis, Constantinos S
2016-03-01
In healthcare, there is a vast amount of patients' data, which can lead to important discoveries if combined. Due to legal and ethical issues, such data cannot be shared and hence such information is underused. A new area of research has emerged, called privacy preserving data publishing (PPDP), which aims in sharing data in a way that privacy is preserved while the information lost is kept at a minimum. In this Letter, a new anonymisation algorithm for PPDP is proposed, which is based on k-anonymity through pattern-based multidimensional suppression (kPB-MS). The algorithm uses feature selection for reducing the data dimensionality and then combines attribute and record suppression for obtaining k-anonymity. Five datasets from different areas of life sciences [RETINOPATHY, Single Proton Emission Computed Tomography imaging, gene sequencing and drug discovery (two datasets)], were anonymised with kPB-MS. The produced anonymised datasets were evaluated using four different classifiers and in 74% of the test cases, they produced similar or better accuracies than using the full datasets.
Clustering molecular dynamics trajectories for optimizing docking experiments.
De Paris, Renata; Quevedo, Christian V; Ruiz, Duncan D; Norberto de Souza, Osmar; Barros, Rodrigo C
2015-01-01
Molecular dynamics simulations of protein receptors have become an attractive tool for rational drug discovery. However, the high computational cost of employing molecular dynamics trajectories in virtual screening of large repositories threats the feasibility of this task. Computational intelligence techniques have been applied in this context, with the ultimate goal of reducing the overall computational cost so the task can become feasible. Particularly, clustering algorithms have been widely used as a means to reduce the dimensionality of molecular dynamics trajectories. In this paper, we develop a novel methodology for clustering entire trajectories using structural features from the substrate-binding cavity of the receptor in order to optimize docking experiments on a cloud-based environment. The resulting partition was selected based on three clustering validity criteria, and it was further validated by analyzing the interactions between 20 ligands and a fully flexible receptor (FFR) model containing a 20 ns molecular dynamics simulation trajectory. Our proposed methodology shows that taking into account features of the substrate-binding cavity as input for the k-means algorithm is a promising technique for accurately selecting ensembles of representative structures tailored to a specific ligand.
The Gamma-Ray Burst ToolSHED is Open for Business
NASA Astrophysics Data System (ADS)
Giblin, Timothy W.; Hakkila, Jon; Haglin, David J.; Roiger, Richard J.
2004-09-01
The GRB ToolSHED, a Gamma-Ray Burst SHell for Expeditions in Data-Mining, is now online and available via a web browser to all in the scientific community. The ToolSHED is an online web utility that contains pre-processed burst attributes of the BATSE catalog and a suite of induction-based machine learning and statistical tools for classification and cluster analysis. Users create their own login account and study burst properties within user-defined multi-dimensional parameter spaces. Although new GRB attributes are periodically added to the database for user selection, the ToolSHED has a feature that allows users to upload their own burst attributes (e.g. spectral parameters, etc.) so that additional parameter spaces can be explored. A data visualization feature using GNUplot and web-based IDL has also been implemented to provide interactive plotting of user-selected session output. In an era in which GRB observations and attributes are becoming increasingly more complex, a utility such as the GRB ToolSHED may play an important role in deciphering GRB classes and understanding intrinsic burst properties.
Dimensional control of die castings
NASA Astrophysics Data System (ADS)
Karve, Aniruddha Ajit
The demand for net shape die castings, which require little or no machining, is steadily increasing. Stringent customer requirements are forcing die casters to deliver high quality castings in increasingly short lead times. Dimensional conformance to customer specifications is an inherent part of die casting quality. The dimensional attributes of a die casting are essentially dependent upon many factors--the quality of the die and the degree of control over the process variables being the two major sources of dimensional error in die castings. This study focused on investigating the nature and the causes of dimensional error in die castings. The two major components of dimensional error i.e., dimensional variability and die allowance were studied. The major effort of this study was to qualitatively and quantitatively study the effects of casting geometry and process variables on die casting dimensional variability and die allowance. This was accomplished by detailed dimensional data collection at production die casting sites. Robust feature characterization schemes were developed to describe complex casting geometry in quantitative terms. Empirical modeling was utilized to quantify the effects of the casting variables on dimensional variability and die allowance for die casting features. A number of casting geometry and process variables were found to affect dimensional variability in die castings. The dimensional variability was evaluated by comparisons with current published dimensional tolerance standards. The casting geometry was found to play a significant role in influencing the die allowance of the features measured. The predictive models developed for dimensional variability and die allowance were evaluated to test their effectiveness. Finally, the relative impact of all the components of dimensional error in die castings was put into perspective, and general guidelines for effective dimensional control in the die casting plant were laid out. The results of this study will contribute to enhancement of dimensional quality and lead time compression in the die casting industry, thus making it competitive with other net shape manufacturing processes.
A wavelet-based technique to predict treatment outcome for Major Depressive Disorder
Xia, Likun; Mohd Yasin, Mohd Azhar; Azhar Ali, Syed Saad
2017-01-01
Treatment management for Major Depressive Disorder (MDD) has been challenging. However, electroencephalogram (EEG)-based predictions of antidepressant’s treatment outcome may help during antidepressant’s selection and ultimately improve the quality of life for MDD patients. In this study, a machine learning (ML) method involving pretreatment EEG data was proposed to perform such predictions for Selective Serotonin Reuptake Inhibitor (SSRIs). For this purpose, the acquisition of experimental data involved 34 MDD patients and 30 healthy controls. Consequently, a feature matrix was constructed involving time-frequency decomposition of EEG data based on wavelet transform (WT) analysis, termed as EEG data matrix. However, the resultant EEG data matrix had high dimensionality. Therefore, dimension reduction was performed based on a rank-based feature selection method according to a criterion, i.e., receiver operating characteristic (ROC). As a result, the most significant features were identified and further be utilized during the training and testing of a classification model, i.e., the logistic regression (LR) classifier. Finally, the LR model was validated with 100 iterations of 10-fold cross-validation (10-CV). The classification results were compared with short-time Fourier transform (STFT) analysis, and empirical mode decompositions (EMD). The wavelet features extracted from frontal and temporal EEG data were found statistically significant. In comparison with other time-frequency approaches such as the STFT and EMD, the WT analysis has shown highest classification accuracy, i.e., accuracy = 87.5%, sensitivity = 95%, and specificity = 80%. In conclusion, significant wavelet coefficients extracted from frontal and temporal pre-treatment EEG data involving delta and theta frequency bands may predict antidepressant’s treatment outcome for the MDD patients. PMID:28152063
A new time-frequency method for identification and classification of ball bearing faults
NASA Astrophysics Data System (ADS)
Attoui, Issam; Fergani, Nadir; Boutasseta, Nadir; Oudjani, Brahim; Deliou, Adel
2017-06-01
In order to fault diagnosis of ball bearing that is one of the most critical components of rotating machinery, this paper presents a time-frequency procedure incorporating a new feature extraction step that combines the classical wavelet packet decomposition energy distribution technique and a new feature extraction technique based on the selection of the most impulsive frequency bands. In the proposed procedure, firstly, as a pre-processing step, the most impulsive frequency bands are selected at different bearing conditions using a combination between Fast-Fourier-Transform FFT and Short-Frequency Energy SFE algorithms. Secondly, once the most impulsive frequency bands are selected, the measured machinery vibration signals are decomposed into different frequency sub-bands by using discrete Wavelet Packet Decomposition WPD technique to maximize the detection of their frequency contents and subsequently the most useful sub-bands are represented in the time-frequency domain by using Short Time Fourier transform STFT algorithm for knowing exactly what the frequency components presented in those frequency sub-bands are. Once the proposed feature vector is obtained, three feature dimensionality reduction techniques are employed using Linear Discriminant Analysis LDA, a feedback wrapper method and Locality Sensitive Discriminant Analysis LSDA. Lastly, the Adaptive Neuro-Fuzzy Inference System ANFIS algorithm is used for instantaneous identification and classification of bearing faults. In order to evaluate the performances of the proposed method, different testing data set to the trained ANFIS model by using different conditions of healthy and faulty bearings under various load levels, fault severities and rotating speed. The conclusion resulting from this paper is highlighted by experimental results which prove that the proposed method can serve as an intelligent bearing fault diagnosis system.
van Gemert, Jan C; Veenman, Cor J; Smeulders, Arnold W M; Geusebroek, Jan-Mark
2010-07-01
This paper studies automatic image classification by modeling soft assignment in the popular codebook model. The codebook model describes an image as a bag of discrete visual words selected from a vocabulary, where the frequency distributions of visual words in an image allow classification. One inherent component of the codebook model is the assignment of discrete visual words to continuous image features. Despite the clear mismatch of this hard assignment with the nature of continuous features, the approach has been successfully applied for some years. In this paper, we investigate four types of soft assignment of visual words to image features. We demonstrate that explicitly modeling visual word assignment ambiguity improves classification performance compared to the hard assignment of the traditional codebook model. The traditional codebook model is compared against our method for five well-known data sets: 15 natural scenes, Caltech-101, Caltech-256, and Pascal VOC 2007/2008. We demonstrate that large codebook vocabulary sizes completely deteriorate the performance of the traditional model, whereas the proposed model performs consistently. Moreover, we show that our method profits in high-dimensional feature spaces and reaps higher benefits when increasing the number of image categories.
Fuzzy Performance between Surface Fitting and Energy Distribution in Turbulence Runner
Liang, Zhongwei; Liu, Xiaochu; Ye, Bangyan; Brauwer, Richard Kars
2012-01-01
Because the application of surface fitting algorithms exerts a considerable fuzzy influence on the mathematical features of kinetic energy distribution, their relation mechanism in different external conditional parameters must be quantitatively analyzed. Through determining the kinetic energy value of each selected representative position coordinate point by calculating kinetic energy parameters, several typical algorithms of complicated surface fitting are applied for constructing microkinetic energy distribution surface models in the objective turbulence runner with those obtained kinetic energy values. On the base of calculating the newly proposed mathematical features, we construct fuzzy evaluation data sequence and present a new three-dimensional fuzzy quantitative evaluation method; then the value change tendencies of kinetic energy distribution surface features can be clearly quantified, and the fuzzy performance mechanism discipline between the performance results of surface fitting algorithms, the spatial features of turbulence kinetic energy distribution surface, and their respective environmental parameter conditions can be quantitatively analyzed in detail, which results in the acquirement of final conclusions concerning the inherent turbulence kinetic energy distribution performance mechanism and its mathematical relation. A further turbulence energy quantitative study can be ensured. PMID:23213287
Well-balanced compressible cut-cell simulation of atmospheric flow.
Klein, R; Bates, K R; Nikiforakis, N
2009-11-28
Cut-cell meshes present an attractive alternative to terrain-following coordinates for the representation of topography within atmospheric flow simulations, particularly in regions of steep topographic gradients. In this paper, we present an explicit two-dimensional method for the numerical solution on such meshes of atmospheric flow equations including gravitational sources. This method is fully conservative and allows for time steps determined by the regular grid spacing, avoiding potential stability issues due to arbitrarily small boundary cells. We believe that the scheme is unique in that it is developed within a dimensionally split framework, in which each coordinate direction in the flow is solved independently at each time step. Other notable features of the scheme are: (i) its conceptual and practical simplicity, (ii) its flexibility with regard to the one-dimensional flux approximation scheme employed, and (iii) the well-balancing of the gravitational sources allowing for stable simulation of near-hydrostatic flows. The presented method is applied to a selection of test problems including buoyant bubble rise interacting with geometry and lee-wave generation due to topography.
Bühnemann, Claudia; Li, Simon; Yu, Haiyue; Branford White, Harriet; Schäfer, Karl L; Llombart-Bosch, Antonio; Machado, Isidro; Picci, Piero; Hogendoorn, Pancras C W; Athanasou, Nicholas A; Noble, J Alison; Hassan, A Bassim
2014-01-01
Driven by genomic somatic variation, tumour tissues are typically heterogeneous, yet unbiased quantitative methods are rarely used to analyse heterogeneity at the protein level. Motivated by this problem, we developed automated image segmentation of images of multiple biomarkers in Ewing sarcoma to generate distributions of biomarkers between and within tumour cells. We further integrate high dimensional data with patient clinical outcomes utilising random survival forest (RSF) machine learning. Using material from cohorts of genetically diagnosed Ewing sarcoma with EWSR1 chromosomal translocations, confocal images of tissue microarrays were segmented with level sets and watershed algorithms. Each cell nucleus and cytoplasm were identified in relation to DAPI and CD99, respectively, and protein biomarkers (e.g. Ki67, pS6, Foxo3a, EGR1, MAPK) localised relative to nuclear and cytoplasmic regions of each cell in order to generate image feature distributions. The image distribution features were analysed with RSF in relation to known overall patient survival from three separate cohorts (185 informative cases). Variation in pre-analytical processing resulted in elimination of a high number of non-informative images that had poor DAPI localisation or biomarker preservation (67 cases, 36%). The distribution of image features for biomarkers in the remaining high quality material (118 cases, 104 features per case) were analysed by RSF with feature selection, and performance assessed using internal cross-validation, rather than a separate validation cohort. A prognostic classifier for Ewing sarcoma with low cross-validation error rates (0.36) was comprised of multiple features, including the Ki67 proliferative marker and a sub-population of cells with low cytoplasmic/nuclear ratio of CD99. Through elimination of bias, the evaluation of high-dimensionality biomarker distribution within cell populations of a tumour using random forest analysis in quality controlled tumour material could be achieved. Such an automated and integrated methodology has potential application in the identification of prognostic classifiers based on tumour cell heterogeneity.
Numerical Investigation of Dual-Mode Scramjet Combustor with Large Upstream Interaction
NASA Technical Reports Server (NTRS)
Mohieldin, T. O.; Tiwari, S. N.; Reubush, David E. (Technical Monitor)
2004-01-01
Dual-mode scramjet combustor configuration with significant upstream interaction is investigated numerically, The possibility of scaling the domain to accelerate the convergence and reduce the computational time is explored. The supersonic combustor configuration was selected to provide an understanding of key features of upstream interaction and to identify physical and numerical issues relating to modeling of dual-mode configurations. The numerical analysis was performed with vitiated air at freestream Math number of 2.5 using hydrogen as the sonic injectant. Results are presented for two-dimensional models and a three-dimensional jet-to-jet symmetric geometry. Comparisons are made with experimental results. Two-dimensional and three-dimensional results show substantial oblique shock train reaching upstream of the fuel injectors. Flow characteristics slow numerical convergence, while the upstream interaction slowly increases with further iterations. As the flow field develops, the symmetric assumption breaks down. A large separation zone develops and extends further upstream of the step. This asymmetric flow structure is not seen in the experimental data. Results obtained using a sub-scale domain (both two-dimensional and three-dimensional) qualitatively recover the flow physics obtained from full-scale simulations. All results show that numerical modeling using a scaled geometry provides good agreement with full-scale numerical results and experimental results for this configuration. This study supports the argument that numerical scaling is useful in simulating dual-mode scramjet combustor flowfields and could provide an excellent convergence acceleration technique for dual-mode simulations.
A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data
Vinaixa, Maria; Samino, Sara; Saez, Isabel; Duran, Jordi; Guinovart, Joan J.; Yanes, Oscar
2012-01-01
Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed in order to discover those features significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to that from the altered feature of interest in the research sample, metabolites can be then unequivocally identified. This paper reports on a comprehensive overview of a workflow for statistical analysis to rank relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel on all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating mathematical assumptions on which univariate statistical test rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, assumption of normality and homocedasticity, or correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples. PMID:24957762
A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data.
Vinaixa, Maria; Samino, Sara; Saez, Isabel; Duran, Jordi; Guinovart, Joan J; Yanes, Oscar
2012-10-18
Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed in order to discover those features significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to that from the altered feature of interest in the research sample, metabolites can be then unequivocally identified. This paper reports on a comprehensive overview of a workflow for statistical analysis to rank relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel on all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating mathematical assumptions on which univariate statistical test rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, assumption of normality and homocedasticity, or correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples.
Saha, Sourabh K.; Divin, Chuck; Cuadra, Jefferson A.; ...
2017-05-12
Two-photon polymerization (TPP) is a laser writing process that enables fabrication of millimeter scale three-dimensional (3D) structures with submicron features. In TPP, writing is achieved via nonlinear two-photon absorption that occurs at high laser intensities. Thus, it is essential to carefully select the incident power to prevent laser damage during polymerization. Currently, the feasible range of laser power is identified by writing small test patterns at varying power levels. Here in this paper, we demonstrate that the results of these tests cannot be generalized, because the damage threshold power depends on the proximity of features and reduces by as muchmore » as 47% for overlapping features. We have identified that this reduction occurs primarily due to an increase in the single-photon absorptivity of the resin after curing. We have captured the damage from proximity effects via X-ray 3D computed tomography (CT) images of a non-homogenous part that has varying feature density. Part damage manifests as internal spherical voids that arise due to boiling of the resist. We have empirically quantified this proximity effect by identifying the damage threshold power at different writing speeds and feature overlap spacings. In addition, we present a first-order analytical model that captures the scaling of this proximity effect. Based on this model and the experiments, we have identified that the proximity effect is more significant at high writing speeds; therefore, it adversely affects the scalability of manufacturing. The scaling laws and the empirical data generated here can be used to select the appropriate TPP writing parameters.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saha, Sourabh K.; Divin, Chuck; Cuadra, Jefferson A.
Two-photon polymerization (TPP) is a laser writing process that enables fabrication of millimeter scale three-dimensional (3D) structures with submicron features. In TPP, writing is achieved via nonlinear two-photon absorption that occurs at high laser intensities. Thus, it is essential to carefully select the incident power to prevent laser damage during polymerization. Currently, the feasible range of laser power is identified by writing small test patterns at varying power levels. Here in this paper, we demonstrate that the results of these tests cannot be generalized, because the damage threshold power depends on the proximity of features and reduces by as muchmore » as 47% for overlapping features. We have identified that this reduction occurs primarily due to an increase in the single-photon absorptivity of the resin after curing. We have captured the damage from proximity effects via X-ray 3D computed tomography (CT) images of a non-homogenous part that has varying feature density. Part damage manifests as internal spherical voids that arise due to boiling of the resist. We have empirically quantified this proximity effect by identifying the damage threshold power at different writing speeds and feature overlap spacings. In addition, we present a first-order analytical model that captures the scaling of this proximity effect. Based on this model and the experiments, we have identified that the proximity effect is more significant at high writing speeds; therefore, it adversely affects the scalability of manufacturing. The scaling laws and the empirical data generated here can be used to select the appropriate TPP writing parameters.« less
Wexler, Eliezer J.
1989-01-01
Analytical solutions to the advective-dispersive solute-transport equation are useful in predicting the fate of solutes in ground water. Analytical solutions compiled from available literature or derived by the author are presented in this report for a variety of boundary condition types and solute-source configurations in one-, two-, and three-dimensional systems with uniform ground-water flow. A set of user-oriented computer programs was created to evaluate these solutions and to display the results in tabular and computer-graphics format. These programs incorporate many features that enhance their accuracy, ease of use, and versatility. Documentation for the programs describes their operation and required input data, and presents the results of sample problems. Derivations of select solutions, source codes for the computer programs, and samples of program input and output also are included.
Mu, Jiuke; Wang, Gang; Yan, Hongping; Li, Huayu; Wang, Xuemin; Gao, Enlai; Hou, Chengyi; Pham, Anh Thi Cam; Wu, Lianjun; Zhang, Qinghong; Li, Yaogang; Xu, Zhiping; Guo, Yang; Reichmanis, Elsa; Wang, Hongzhi; Zhu, Meifang
2018-02-09
The ability to achieve simultaneous intrinsic deformation with fast response in commercially available materials that can safely contact skin continues to be an unresolved challenge for artificial actuating materials. Rather than using a microporous structure, here we show an ambient-driven actuator that takes advantage of inherent nanoscale molecular channels within a commercial perfluorosulfonic acid ionomer (PFSA) film, fabricated by simple solution processing to realize a rapid response, self-adaptive, and exceptionally stable actuation. Selective patterning of PFSA films on an inert soft substrate (polyethylene terephthalate film) facilitates the formation of a range of different geometries, including a 2D (two-dimensional) roll or 3D (three-dimensional) helical structure in response to vapor stimuli. Chemical modification of the surface allowed the development of a kirigami-inspired single-layer actuator for personal humidity and heat management through macroscale geometric design features, to afford a bilayer stimuli-responsive actuator with multicolor switching capability.
NASA Astrophysics Data System (ADS)
Thangjam, Guneshwar; Nathues, Andreas; Mengel, Kurt; Schäfer, Michael; Hoffmann, Martin; Cloutis, Edward A.; Mann, Paul; Müller, Christian; Platz, Thomas; Schäfer, Tanja
2016-03-01
We introduce an innovative three-dimensional spectral approach (three band parameter space with polyhedrons) that can be used for both qualitative and quantitative analyzes improving the characterization of surface compositional heterogeneity of (4) Vesta. It is an advanced and more robust methodology compared to the standard two-dimensional spectral approach (two band parameter space). The Dawn Framing Camera (FC) color data obtained during High Altitude Mapping Orbit (resolution ∼ 60 m/pixel) is used. The main focus is on the howardite-eucrite-diogenite (HED) lithologies containing carbonaceous chondritic material, olivine, and impact-melt. The archived spectra of HEDs and their mixtures, from RELAB, HOSERLab and USGS databases as well as our laboratory-measured spectra are used for this study. Three-dimensional convex polyhedrons are defined using computed band parameter values of laboratory spectra. Polyhedrons based on the parameters of Band Tilt (R0.92μm/R0.96μm), Mid Ratio ((R0.75μm/R0.83μm)/(R0.83μm/R0.92μm)) and reflectance at 0.55 μm (R0.55μm) are chosen for the present analysis. An algorithm in IDL programming language is employed to assign FC data points to the respective polyhedrons. The Arruntia region in the northern hemisphere of Vesta is selected for a case study because of its geological and mineralogical importance. We observe that this region is eucrite-dominated howarditic in composition. The extent of olivine-rich exposures within an area of 2.5 crater radii is ∼12% larger than the previous finding (Thangjam, G. et al. [2014]. Meteorit. Planet. Sci. 49, 1831-1850). Lithologies of nearly pure CM2-chondrite, olivine, glass, and diogenite are not found in this region. Although there are no unambiguous spectral features of impact melt, the investigation of morphological features using FC clear filter data from Low Altitude Mapping Orbit (resolution ∼ 18 m/pixel) suggests potential impact-melt features inside and outside of the crater. Our spectral approach can be extended to the entire Vestan surface to study the heterogeneous surface composition and its geology.
Age and gender classification in the wild with unsupervised feature learning
NASA Astrophysics Data System (ADS)
Wan, Lihong; Huo, Hong; Fang, Tao
2017-03-01
Inspired by unsupervised feature learning (UFL) within the self-taught learning framework, we propose a method based on UFL, convolution representation, and part-based dimensionality reduction to handle facial age and gender classification, which are two challenging problems under unconstrained circumstances. First, UFL is introduced to learn selective receptive fields (filters) automatically by applying whitening transformation and spherical k-means on random patches collected from unlabeled data. The learning process is fast and has no hyperparameters to tune. Then, the input image is convolved with these filters to obtain filtering responses on which local contrast normalization is applied. Average pooling and feature concatenation are then used to form global face representation. Finally, linear discriminant analysis with part-based strategy is presented to reduce the dimensions of the global representation and to improve classification performances further. Experiments on three challenging databases, namely, Labeled faces in the wild, Gallagher group photos, and Adience, demonstrate the effectiveness of the proposed method relative to that of state-of-the-art approaches.
Methods and devices for fabricating three-dimensional nanoscale structures
Rogers, John A.; Jeon, Seokwoo; Park, Jangung
2010-04-27
The present invention provides methods and devices for fabricating 3D structures and patterns of 3D structures on substrate surfaces, including symmetrical and asymmetrical patterns of 3D structures. Methods of the present invention provide a means of fabricating 3D structures having accurately selected physical dimensions, including lateral and vertical dimensions ranging from 10s of nanometers to 1000s of nanometers. In one aspect, methods are provided using a mask element comprising a conformable, elastomeric phase mask capable of establishing conformal contact with a radiation sensitive material undergoing photoprocessing. In another aspect, the temporal and/or spatial coherence of electromagnetic radiation using for photoprocessing is selected to fabricate complex structures having nanoscale features that do not extend entirely through the thickness of the structure fabricated.
Visuospatial, visuoperceptual, and visuoconstructive abilities in congenital hypothyroidism.
Simic, Nevena; Khan, Sarah; Rovet, Joanne
2013-11-01
Individuals with congenital hypothyroidism (CH), even those diagnosed and treated early, experience selective cognitive deficits, the most striking of which involves the visuocognitive domain. However, the range and nature of their visuocognitive disturbances is not fully understood. We assessed a range of higher-order visuocognitive abilities in 19 children and adolescents with CH and 19 age- and sex-matched typically developing peers (TD) using a battery of neuropsychological tests and a novel self-report measure of sense of direction. CH scored lower than TD on direct tests of visuocognitive function (judging line orientation, parts-to-whole localization, copying three-dimensional block towers, discriminating designs, and matching unfamiliar faces in ¾ profile-view) as well as on self-reported problems in spatial ability. Visuocognitive problems were not global as CH and TD did not differ at copying two-dimensional block designs, mentally rotating and matching abstract shapes, or at matching unfamiliar front-view faces, design features, or designs that engaged either figure-ground segregation, visual constancy, or closure. Early and concurrent thyroid stimulating hormone (TSH) levels were associated with visuocognitive ability, although attention and working memory were not. Individuals with CH exhibit selective visuocognitive weaknesses, some of which are related to early and concurrent TSH levels.
Töllner, Thomas; Müller, Hermann J; Zehetleitner, Michael
2012-07-01
Visual search for feature singletons is slowed when a task-irrelevant, but more salient distracter singleton is concurrently presented. While there is a consensus that this distracter interference effect can be influenced by internal system settings, it remains controversial at what stage of processing this influence starts to affect visual coding. Advocates of the "stimulus-driven" view maintain that the initial sweep of visual processing is entirely driven by physical stimulus attributes and that top-down settings can bias visual processing only after selection of the most salient item. By contrast, opponents argue that top-down expectancies can alter the initial selection priority, so that focal attention is "not automatically" shifted to the location exhibiting the highest feature contrast. To precisely trace the allocation of focal attention, we analyzed the Posterior-Contralateral-Negativity (PCN) in a task in which the likelihood (expectancy) with which a distracter occurred was systematically varied. Our results show that both high (vs. low) distracter expectancy and experiencing a distracter on the previous trial speed up the timing of the target-elicited PCN. Importantly, there was no distracter-elicited PCN, indicating that participants did not shift attention to the distracter before selecting the target. This pattern unambiguously demonstrates that preattentive vision is top-down modifiable.
Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval.
Wei, Xiu-Shen; Luo, Jian-Hao; Wu, Jianxin; Zhou, Zhi-Hua
2017-06-01
Deep convolutional neural network models pre-trained for the ImageNet classification task have been successfully adopted to tasks in other domains, such as texture description and object proposal generation, but these tasks require annotations for images in the new domain. In this paper, we focus on a novel and challenging task in the pure unsupervised setting: fine-grained image retrieval. Even with image labels, fine-grained images are difficult to classify, letting alone the unsupervised retrieval task. We propose the selective convolutional descriptor aggregation (SCDA) method. The SCDA first localizes the main object in fine-grained images, a step that discards the noisy background and keeps useful deep descriptors. The selected descriptors are then aggregated and the dimensionality is reduced into a short feature vector using the best practices we found. The SCDA is unsupervised, using no image label or bounding box annotation. Experiments on six fine-grained data sets confirm the effectiveness of the SCDA for fine-grained image retrieval. Besides, visualization of the SCDA features shows that they correspond to visual attributes (even subtle ones), which might explain SCDA's high-mean average precision in fine-grained retrieval. Moreover, on general image retrieval data sets, the SCDA achieves comparable retrieval results with the state-of-the-art general image retrieval approaches.
Automatic seed selection for segmentation of liver cirrhosis in laparoscopic sequences
NASA Astrophysics Data System (ADS)
Sinha, Rahul; Marcinczak, Jan Marek; Grigat, Rolf-Rainer
2014-03-01
For computer aided diagnosis based on laparoscopic sequences, image segmentation is one of the basic steps which define the success of all further processing. However, many image segmentation algorithms require prior knowledge which is given by interaction with the clinician. We propose an automatic seed selection algorithm for segmentation of liver cirrhosis in laparoscopic sequences which assigns each pixel a probability of being cirrhotic liver tissue or background tissue. Our approach is based on a trained classifier using SIFT and RGB features with PCA. Due to the unique illumination conditions in laparoscopic sequences of the liver, a very low dimensional feature space can be used for classification via logistic regression. The methodology is evaluated on 718 cirrhotic liver and background patches that are taken from laparoscopic sequences of 7 patients. Using a linear classifier we achieve a precision of 91% in a leave-one-patient-out cross-validation. Furthermore, we demonstrate that with logistic probability estimates, seeds with high certainty of being cirrhotic liver tissue can be obtained. For example, our precision of liver seeds increases to 98.5% if only seeds with more than 95% probability of being liver are used. Finally, these automatically selected seeds can be used as priors in Graph Cuts which is demonstrated in this paper.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Knogler, Thomas; El-Rabadi, Karem; Weber, Michael
2014-12-15
Purpose: To determine the diagnostic performance of three-dimensional (3D) texture analysis (TA) of contrast-enhanced computed tomography (CE-CT) images for treatment response assessment in patients with Hodgkin lymphoma (HL), compared with F-18-fludeoxyglucose (FDG) positron emission tomography/CT. Methods: 3D TA of 48 lymph nodes in 29 patients was performed on venous-phase CE-CT images before and after chemotherapy. All lymph nodes showed pathologically elevated FDG uptake at baseline. A stepwise logistic regression with forward selection was performed to identify classic CT parameters and texture features (TF) that enable the separation of complete response (CR) and persistent disease. Results: The TF fraction of imagemore » in runs, calculated for the 45° direction, was able to correctly identify CR with an accuracy of 75%, a sensitivity of 79.3%, and a specificity of 68.4%. Classical CT features achieved an accuracy of 75%, a sensitivity of 86.2%, and a specificity of 57.9%, whereas the combination of TF and CT imaging achieved an accuracy of 83.3%, a sensitivity of 86.2%, and a specificity of 78.9%. Conclusions: 3D TA of CE-CT images is potentially useful to identify nodal residual disease in HL, with a performance comparable to that of classical CT parameters. Best results are achieved when TA and classical CT features are combined.« less
Bi-level Multi-Source Learning for Heterogeneous Block-wise Missing Data
Xiang, Shuo; Yuan, Lei; Fan, Wei; Wang, Yalin; Thompson, Paul M.; Ye, Jieping
2013-01-01
Bio-imaging technologies allow scientists to collect large amounts of high-dimensional data from multiple heterogeneous sources for many biomedical applications. In the study of Alzheimer's Disease (AD), neuroimaging data, gene/protein expression data, etc., are often analyzed together to improve predictive power. Joint learning from multiple complementary data sources is advantageous, but feature-pruning and data source selection are critical to learn interpretable models from high-dimensional data. Often, the data collected has block-wise missing entries. In the Alzheimer’s Disease Neuroimaging Initiative (ADNI), most subjects have MRI and genetic information, but only half have cerebrospinal fluid (CSF) measures, a different half has FDG-PET; only some have proteomic data. Here we propose how to effectively integrate information from multiple heterogeneous data sources when data is block-wise missing. We present a unified “bi-level” learning model for complete multi-source data, and extend it to incomplete data. Our major contributions are: (1) our proposed models unify feature-level and source-level analysis, including several existing feature learning approaches as special cases; (2) the model for incomplete data avoids imputing missing data and offers superior performance; it generalizes to other applications with block-wise missing data sources; (3) we present efficient optimization algorithms for modeling complete and incomplete data. We comprehensively evaluate the proposed models including all ADNI subjects with at least one of four data types at baseline: MRI, FDG-PET, CSF and proteomics. Our proposed models compare favorably with existing approaches. PMID:23988272
Bi-level multi-source learning for heterogeneous block-wise missing data.
Xiang, Shuo; Yuan, Lei; Fan, Wei; Wang, Yalin; Thompson, Paul M; Ye, Jieping
2014-11-15
Bio-imaging technologies allow scientists to collect large amounts of high-dimensional data from multiple heterogeneous sources for many biomedical applications. In the study of Alzheimer's Disease (AD), neuroimaging data, gene/protein expression data, etc., are often analyzed together to improve predictive power. Joint learning from multiple complementary data sources is advantageous, but feature-pruning and data source selection are critical to learn interpretable models from high-dimensional data. Often, the data collected has block-wise missing entries. In the Alzheimer's Disease Neuroimaging Initiative (ADNI), most subjects have MRI and genetic information, but only half have cerebrospinal fluid (CSF) measures, a different half has FDG-PET; only some have proteomic data. Here we propose how to effectively integrate information from multiple heterogeneous data sources when data is block-wise missing. We present a unified "bi-level" learning model for complete multi-source data, and extend it to incomplete data. Our major contributions are: (1) our proposed models unify feature-level and source-level analysis, including several existing feature learning approaches as special cases; (2) the model for incomplete data avoids imputing missing data and offers superior performance; it generalizes to other applications with block-wise missing data sources; (3) we present efficient optimization algorithms for modeling complete and incomplete data. We comprehensively evaluate the proposed models including all ADNI subjects with at least one of four data types at baseline: MRI, FDG-PET, CSF and proteomics. Our proposed models compare favorably with existing approaches. © 2013 Elsevier Inc. All rights reserved.
Reorienting in Images of a Three-Dimensional Environment
ERIC Educational Resources Information Center
Kelly, Debbie M.; Bischof, Walter F.
2005-01-01
Adult humans searched for a hidden goal in images depicting 3-dimensional rooms. Images contained either featural cues, geometric cues, or both, which could be used to determine the correct location of the goal. In Experiment 1, participants learned to use featural and geometric information equally well. However, men and women showed significant…
Model-Free Conditional Independence Feature Screening For Ultrahigh Dimensional Data.
Wang, Luheng; Liu, Jingyuan; Li, Yong; Li, Runze
2017-03-01
Feature screening plays an important role in ultrahigh dimensional data analysis. This paper is concerned with conditional feature screening when one is interested in detecting the association between the response and ultrahigh dimensional predictors (e.g., genetic makers) given a low-dimensional exposure variable (such as clinical variables or environmental variables). To this end, we first propose a new index to measure conditional independence, and further develop a conditional screening procedure based on the newly proposed index. We systematically study the theoretical property of the proposed procedure and establish the sure screening and ranking consistency properties under some very mild conditions. The newly proposed screening procedure enjoys some appealing properties. (a) It is model-free in that its implementation does not require a specification on the model structure; (b) it is robust to heavy-tailed distributions or outliers in both directions of response and predictors; and (c) it can deal with both feature screening and the conditional screening in a unified way. We study the finite sample performance of the proposed procedure by Monte Carlo simulations and further illustrate the proposed method through two real data examples.
Geometric Representations of Condition Queries on Three-Dimensional Vector Fields
NASA Technical Reports Server (NTRS)
Henze, Chris
1999-01-01
Condition queries on distributed data ask where particular conditions are satisfied. It is possible to represent condition queries as geometric objects by plotting field data in various spaces derived from the data, and by selecting loci within these derived spaces which signify the desired conditions. Rather simple geometric partitions of derived spaces can represent complex condition queries because much complexity can be encapsulated in the derived space mapping itself A geometric view of condition queries provides a useful conceptual unification, allowing one to intuitively understand many existing vector field feature detection algorithms -- and to design new ones -- as variations on a common theme. A geometric representation of condition queries also provides a simple and coherent basis for computer implementation, reducing a wide variety of existing and potential vector field feature detection techniques to a few simple geometric operations.
Fault Detection of Bearing Systems through EEMD and Optimization Algorithm
Lee, Dong-Han; Ahn, Jong-Hyo; Koh, Bong-Hwan
2017-01-01
This study proposes a fault detection and diagnosis method for bearing systems using ensemble empirical mode decomposition (EEMD) based feature extraction, in conjunction with particle swarm optimization (PSO), principal component analysis (PCA), and Isomap. First, a mathematical model is assumed to generate vibration signals from damaged bearing components, such as the inner-race, outer-race, and rolling elements. The process of decomposing vibration signals into intrinsic mode functions (IMFs) and extracting statistical features is introduced to develop a damage-sensitive parameter vector. Finally, PCA and Isomap algorithm are used to classify and visualize this parameter vector, to separate damage characteristics from healthy bearing components. Moreover, the PSO-based optimization algorithm improves the classification performance by selecting proper weightings for the parameter vector, to maximize the visualization effect of separating and grouping of parameter vectors in three-dimensional space. PMID:29143772
McFarland, Dennis J; Krusienski, Dean J; Wolpaw, Jonathan R
2006-01-01
The Wadsworth brain-computer interface (BCI), based on mu and beta sensorimotor rhythms, uses one- and two-dimensional cursor movement tasks and relies on user training. This is a real-time closed-loop system. Signal processing consists of channel selection, spatial filtering, and spectral analysis. Feature translation uses a regression approach and normalization. Adaptation occurs at several points in this process on the basis of different criteria and methods. It can use either feedforward (e.g., estimating the signal mean for normalization) or feedback control (e.g., estimating feature weights for the prediction equation). We view this process as the interaction between a dynamic user and a dynamic system that coadapt over time. Understanding the dynamics of this interaction and optimizing its performance represent a major challenge for BCI research.
Texture analysis based on the Hermite transform for image classification and segmentation
NASA Astrophysics Data System (ADS)
Estudillo-Romero, Alfonso; Escalante-Ramirez, Boris; Savage-Carmona, Jesus
2012-06-01
Texture analysis has become an important task in image processing because it is used as a preprocessing stage in different research areas including medical image analysis, industrial inspection, segmentation of remote sensed imaginary, multimedia indexing and retrieval. In order to extract visual texture features a texture image analysis technique is presented based on the Hermite transform. Psychovisual evidence suggests that the Gaussian derivatives fit the receptive field profiles of mammalian visual systems. The Hermite transform describes locally basic texture features in terms of Gaussian derivatives. Multiresolution combined with several analysis orders provides detection of patterns that characterizes every texture class. The analysis of the local maximum energy direction and steering of the transformation coefficients increase the method robustness against the texture orientation. This method presents an advantage over classical filter bank design because in the latter a fixed number of orientations for the analysis has to be selected. During the training stage, a subset of the Hermite analysis filters is chosen in order to improve the inter-class separability, reduce dimensionality of the feature vectors and computational cost during the classification stage. We exhaustively evaluated the correct classification rate of real randomly selected training and testing texture subsets using several kinds of common used texture features. A comparison between different distance measurements is also presented. Results of the unsupervised real texture segmentation using this approach and comparison with previous approaches showed the benefits of our proposal.
Fault Diagnosis for Rolling Bearings under Variable Conditions Based on Visual Cognition
Cheng, Yujie; Zhou, Bo; Lu, Chen; Yang, Chao
2017-01-01
Fault diagnosis for rolling bearings has attracted increasing attention in recent years. However, few studies have focused on fault diagnosis for rolling bearings under variable conditions. This paper introduces a fault diagnosis method for rolling bearings under variable conditions based on visual cognition. The proposed method includes the following steps. First, the vibration signal data are transformed into a recurrence plot (RP), which is a two-dimensional image. Then, inspired by the visual invariance characteristic of the human visual system (HVS), we utilize speed up robust feature to extract fault features from the two-dimensional RP and generate a 64-dimensional feature vector, which is invariant to image translation, rotation, scaling variation, etc. Third, based on the manifold perception characteristic of HVS, isometric mapping, a manifold learning method that can reflect the intrinsic manifold embedded in the high-dimensional space, is employed to obtain a low-dimensional feature vector. Finally, a classical classification method, support vector machine, is utilized to realize fault diagnosis. Verification data were collected from Case Western Reserve University Bearing Data Center, and the experimental result indicates that the proposed fault diagnosis method based on visual cognition is highly effective for rolling bearings under variable conditions, thus providing a promising approach from the cognitive computing field. PMID:28772943
An ensemble method for extracting adverse drug events from social media.
Liu, Jing; Zhao, Songzheng; Zhang, Xiaodi
2016-06-01
Because adverse drug events (ADEs) are a serious health problem and a leading cause of death, it is of vital importance to identify them correctly and in a timely manner. With the development of Web 2.0, social media has become a large data source for information on ADEs. The objective of this study is to develop a relation extraction system that uses natural language processing techniques to effectively distinguish between ADEs and non-ADEs in informal text on social media. We develop a feature-based approach that utilizes various lexical, syntactic, and semantic features. Information-gain-based feature selection is performed to address high-dimensional features. Then, we evaluate the effectiveness of four well-known kernel-based approaches (i.e., subset tree kernel, tree kernel, shortest dependency path kernel, and all-paths graph kernel) and several ensembles that are generated by adopting different combination methods (i.e., majority voting, weighted averaging, and stacked generalization). All of the approaches are tested using three data sets: two health-related discussion forums and one general social networking site (i.e., Twitter). When investigating the contribution of each feature subset, the feature-based approach attains the best area under the receiver operating characteristics curve (AUC) values, which are 78.6%, 72.2%, and 79.2% on the three data sets. When individual methods are used, we attain the best AUC values of 82.1%, 73.2%, and 77.0% using the subset tree kernel, shortest dependency path kernel, and feature-based approach on the three data sets, respectively. When using classifier ensembles, we achieve the best AUC values of 84.5%, 77.3%, and 84.5% on the three data sets, outperforming the baselines. Our experimental results indicate that ADE extraction from social media can benefit from feature selection. With respect to the effectiveness of different feature subsets, lexical features and semantic features can enhance the ADE extraction capability. Kernel-based approaches, which can stay away from the feature sparsity issue, are qualified to address the ADE extraction problem. Combining different individual classifiers using suitable combination methods can further enhance the ADE extraction effectiveness. Copyright © 2016 Elsevier B.V. All rights reserved.
Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning
2014-01-01
X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed ‘PredPPCrys’ using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys. PMID:25148528
NASA Astrophysics Data System (ADS)
Jiang, Li; Xuan, Jianping; Shi, Tielin
2013-12-01
Generally, the vibration signals of faulty machinery are non-stationary and nonlinear under complicated operating conditions. Therefore, it is a big challenge for machinery fault diagnosis to extract optimal features for improving classification accuracy. This paper proposes semi-supervised kernel Marginal Fisher analysis (SSKMFA) for feature extraction, which can discover the intrinsic manifold structure of dataset, and simultaneously consider the intra-class compactness and the inter-class separability. Based on SSKMFA, a novel approach to fault diagnosis is put forward and applied to fault recognition of rolling bearings. SSKMFA directly extracts the low-dimensional characteristics from the raw high-dimensional vibration signals, by exploiting the inherent manifold structure of both labeled and unlabeled samples. Subsequently, the optimal low-dimensional features are fed into the simplest K-nearest neighbor (KNN) classifier to recognize different fault categories and severities of bearings. The experimental results demonstrate that the proposed approach improves the fault recognition performance and outperforms the other four feature extraction methods.
Feature theory and the two-step hypothesis of Müllerian mimicry evolution.
Balogh, Alexandra Catherine Victoria; Gamberale-Stille, Gabriella; Tullberg, Birgitta Sillén; Leimar, Olof
2010-03-01
The two-step hypothesis of Müllerian mimicry evolution states that mimicry starts with a major mutational leap between adaptive peaks, followed by gradual fine-tuning. The hypothesis was suggested to solve the problem of apostatic selection producing a valley between adaptive peaks, and appears reasonable for a one-dimensional phenotype. Extending the hypothesis to the realistic scenario of multidimensional phenotypes controlled by multiple genetic loci can be problematic, because it is unlikely that major mutational leaps occur simultaneously in several traits. Here we consider the implications of predator psychology on the evolutionary process. According to feature theory, single prey traits may be used by predators as features to classify prey into discrete categories. A mutational leap in such a trait could initiate mimicry evolution. We conducted individual-based evolutionary simulations in which virtual predators both categorize prey according to features and generalize over total appearances. We found that an initial mutational leap toward feature similarity in one dimension facilitates mimicry evolution of multidimensional traits. We suggest that feature-based predator categorization together with predator generalization over total appearances solves the problem of applying the two-step hypothesis to complex phenotypes, and provides a basis for a theory of the evolution of mimicry rings.
Kernel and divergence techniques in high energy physics separations
NASA Astrophysics Data System (ADS)
Bouř, Petr; Kůs, Václav; Franc, Jiří
2017-10-01
Binary decision trees under the Bayesian decision technique are used for supervised classification of high-dimensional data. We present a great potential of adaptive kernel density estimation as the nested separation method of the supervised binary divergence decision tree. Also, we provide a proof of alternative computing approach for kernel estimates utilizing Fourier transform. Further, we apply our method to Monte Carlo data set from the particle accelerator Tevatron at DØ experiment in Fermilab and provide final top-antitop signal separation results. We have achieved up to 82 % AUC while using the restricted feature selection entering the signal separation procedure.
Learning with imperfectly labeled patterns
NASA Technical Reports Server (NTRS)
Chittineni, C. B.
1979-01-01
The problem of learning in pattern recognition using imperfectly labeled patterns is considered. The performance of the Bayes and nearest neighbor classifiers with imperfect labels is discussed using a probabilistic model for the mislabeling of the training patterns. Schemes for training the classifier using both parametric and non parametric techniques are presented. Methods for the correction of imperfect labels were developed. To gain an understanding of the learning process, expressions are derived for success probability as a function of training time for a one dimensional increment error correction classifier with imperfect labels. Feature selection with imperfectly labeled patterns is described.
Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji
2015-01-01
A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations which are noisy, or occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problem is formally known as partially observable reinforcement learning (PORL) problems. It provides a generalization of reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach.
Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji
2015-01-01
A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations which are noisy, or occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problem is formally known as partially observable reinforcement learning (PORL) problems. It provides a generalization of reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach. PMID:25734662
NASA Astrophysics Data System (ADS)
Kim, Jeongwoo; Wu, Ruqian
2018-03-01
Despite the superiority of two-dimensional (2D) topological insulators (TIs) over their three-dimensional (3D) counterparts in various aspects and the essential distinction between them in structural symmetry, the variation of the topological one-dimensional (1D) edge states upon magnetic interaction and their application for spintronic devices have not been sufficiently illuminated. Here, we reveal that 1D edge states of 2D TIs have a unique magnetic response never observed in 2D surface states of 3D TIs, and using this exotic nature we propose a way to utilize the spin-polarized channel for spintronic applications. We investigate the effects of width and magnetic decoration on the 1D topological edge state of Bi bilayer nanoribbons (BNRs). Through the Zak phase, we find that the zero-energy states are enforced at the magnetic domain boundaries in the Cr-decorated BNR and directly examine their robustness using short-range magnetic domain structures. We also demonstrate that 1D edge states of BNRs can be selectively and reversibly controlled by the combination of magnetic reorientation and electric field without compromising their structural integrity. Our work provides a fundamental understanding of 1D topological edge states and shows the opportunity of using these features in spintronic devices.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stein, Joshua S.; Rautman, Christopher Arthur
2004-02-01
The geologic model implicit in the original site characterization report for the Bayou Choctaw Strategic Petroleum Reserve Site near Baton Rouge, Louisiana, has been converted to a numerical, computer-based three-dimensional model. The original site characterization model was successfully converted with minimal modifications and use of new information. The geometries of the salt diapir, selected adjacent sedimentary horizons, and a number of faults have been modeled. Models of a partial set of the several storage caverns that have been solution-mined within the salt mass are also included. Collectively, the converted model appears to be a relatively realistic representation of the geologymore » of the Bayou Choctaw site as known from existing data. A small number of geometric inconsistencies and other problems inherent in 2-D vs. 3-D modeling have been noted. Most of the major inconsistencies involve faults inferred from drill hole data only. Modem computer software allows visualization of the resulting site model and its component submodels with a degree of detail and flexibility that was not possible with conventional, two-dimensional and paper-based geologic maps and cross sections. The enhanced visualizations may be of particular value in conveying geologic concepts involved in the Bayou Choctaw Strategic Petroleum Reserve site to a lay audience. A Microsoft WindowsTM PC-based viewer and user-manipulable model files illustrating selected features of the converted model are included in this report.« less
Defining an essence of structure determining residue contacts in proteins.
Sathyapriya, R; Duarte, Jose M; Stehr, Henning; Filippis, Ioannis; Lappe, Michael
2009-12-01
The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this "structural essence" has remained elusive so far: no algorithmic strategy has been devised to-date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Ca RMSD). It is not only of theoretical interest (i.e., for design of advanced statistical potentials) to identify the number and nature of essential native contacts-such a subset of spatial constraints is very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test-set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed "cone-peeling" that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 A Ca RMSD with as little as 8% of the native contacts (Ca-Ca and Cb-Cb). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This "structural essence" opens new avenues in the fields of structure prediction, empirical potentials and docking.
Defining an Essence of Structure Determining Residue Contacts in Proteins
Sathyapriya, R.; Duarte, Jose M.; Stehr, Henning; Filippis, Ioannis; Lappe, Michael
2009-01-01
The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this “structural essence” has remained elusive so far: no algorithmic strategy has been devised to-date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Ca RMSD). It is not only of theoretical interest (i.e., for design of advanced statistical potentials) to identify the number and nature of essential native contacts—such a subset of spatial constraints is very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test-set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed “cone-peeling” that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Ca RMSD with as little as 8% of the native contacts (Ca-Ca and Cb-Cb). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This “structural essence” opens new avenues in the fields of structure prediction, empirical potentials and docking. PMID:19997489
One-dimensional conduction through supporting electrolytes: two-scale cathodic Debye layer.
Almog, Yaniv; Yariv, Ehud
2011-10-01
Supporting-electrolyte solutions comprise chemically inert cations and anions, produced by salt dissolution, together with a reactive ionic species that may be consumed and generated on bounding ion-selective surfaces (e.g., electrodes or membranes). Upon application of an external voltage, a Faraday current is thereby established. It is natural to analyze this ternary-system process through a one-dimensional transport problem, employing the thin Debye-layer limit. Using a simple model of ideal ion-selective membranes, we have recently addressed this problem for moderate voltages [Yariv and Almog, Phys. Rev. Lett. 105, 176101 (2010)], predicting currents that scale as a fractional power of Debye thickness. We address herein the complementary problem of moderate currents. We employ matched asymptotic expansions, separately analyzing the two inner thin Debye layers adjacent to the ion-selective surfaces and the outer electroneutral region outside them. A straightforward calculation following comparable singular-perturbation analyses of binary systems is frustrated by the prediction of negative ionic concentrations near the cathode. Accompanying numerical simulations, performed for small values of Debye thickness, indicate a number unconventional features occurring at that region, such as inert-cation concentration amplification and electric-field intensification. The current-voltage correlation data of the electrochemical cell, obtained from compilation of these simulations, does not approach a limit as the Debye thickness vanishes. Resolution of these puzzles reveals a transformation of the asymptotic structure of the cathodic Debye layer. This reflects the emergence of an internal boundary layer, adjacent to the cathode, wherein field and concentration scaling differs from those of the Gouy-Chapman theory. The two-scale feature of the cathodic Debye layer is manifested through a logarithmic voltage scaling with Debye thickness. Accounting for this scaling, the complied current-voltage data collapses upon a single curve. This curve practically coincides with an asymptotically calculated universal current-voltage relation.
Application of Fourier analysis to multispectral/spatial recognition
NASA Technical Reports Server (NTRS)
Hornung, R. J.; Smith, J. A.
1973-01-01
One approach for investigating spectral response from materials is to consider spatial features of the response. This might be accomplished by considering the Fourier spectrum of the spatial response. The Fourier Transform may be used in a one-dimensional to multidimensional analysis of more than one channel of data. The two-dimensional transform represents the Fraunhofer diffraction pattern of the image in optics and has certain invariant features. Physically the diffraction pattern contains spatial features which are possibly unique to a given configuration or classification type. Different sampling strategies may be used to either enhance geometrical differences or extract additional features.
The identification of two unusual types of homemade ammunition.
Lee, Hsieh-Chang; Meng, Hsien-Hui
2012-07-01
Illegal homemade ammunition is commonly used by criminals to commit crimes in Taiwan. Two unusual types of homemade ammunition that most closely resembling genuine ammunition are studied here. Their genuine counterparts are studied as the control samples for the purpose of comparison. Unfired ammunition is disassembled, and the morphological, dimensional, and compositional features of the bullet and cartridge case are examined. Statistical tests are employed to distinguish the dimensional differences between homemade and genuine ammunitions. Manufacturing marks on head stamps of the cartridge case are carefully examined. Compositional features of propellant powders, primer mixtures, and gunshot residues are also analyzed. The results reveal that the morphological, dimensional, and compositional features of major parts of the ammunition can be employed to differentiate homemade cartridges from genuine ones. Among these features, tool marks on the head stamps left by the bunter can be used to trace the origin of ammunition. © 2012 American Academy of Forensic Sciences.
Automatic identification of abstract online groups
Engel, David W; Gregory, Michelle L; Bell, Eric B; Cowell, Andrew J; Piatt, Andrew W
2014-04-15
Online abstract groups, in which members aren't explicitly connected, can be automatically identified by computer-implemented methods. The methods involve harvesting records from social media and extracting content-based and structure-based features from each record. Each record includes a social-media posting and is associated with one or more entities. Each feature is stored on a data storage device and includes a computer-readable representation of an attribute of one or more records. The methods further involve grouping records into record groups according to the features of each record. Further still the methods involve calculating an n-dimensional surface representing each record group and defining an outlier as a record having feature-based distances measured from every n-dimensional surface that exceed a threshold value. Each of the n-dimensional surfaces is described by a footprint that characterizes the respective record group as an online abstract group.
Blended particle filters for large-dimensional chaotic dynamical systems
Majda, Andrew J.; Qi, Di; Sapsis, Themistoklis P.
2014-01-01
A major challenge in contemporary data science is the development of statistically accurate particle filters to capture non-Gaussian features in large-dimensional chaotic dynamical systems. Blended particle filters that capture non-Gaussian features in an adaptively evolving low-dimensional subspace through particles interacting with evolving Gaussian statistics on the remaining portion of phase space are introduced here. These blended particle filters are constructed in this paper through a mathematical formalism involving conditional Gaussian mixtures combined with statistically nonlinear forecast models compatible with this structure developed recently with high skill for uncertainty quantification. Stringent test cases for filtering involving the 40-dimensional Lorenz 96 model with a 5-dimensional adaptive subspace for nonlinear blended filtering in various turbulent regimes with at least nine positive Lyapunov exponents are used here. These cases demonstrate the high skill of the blended particle filter algorithms in capturing both highly non-Gaussian dynamical features as well as crucial nonlinear statistics for accurate filtering in extreme filtering regimes with sparse infrequent high-quality observations. The formalism developed here is also useful for multiscale filtering of turbulent systems and a simple application is sketched below. PMID:24825886
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ye, Tianyu; Mani, Ramesh G.; Wegscheider, Werner
2013-12-04
We present the results of a concurrent experimental study of microwave reflection and transport in the GaAs/AlGaAs two dimensional electron gas system and correlate observed features in the reflection with the observed transport features. The experimental results are compared with expectations based on theory.
Maxwell, James L; Rose, Chris R; Black, Marcie R; Springer, Robert W
2014-03-11
Microelectronic structures and devices, and method of fabricating a three-dimensional microelectronic structure is provided, comprising passing a first precursor material for a selected three-dimensional microelectronic structure into a reaction chamber at temperatures sufficient to maintain said precursor material in a predominantly gaseous state; maintaining said reaction chamber under sufficient pressures to enhance formation of a first portion of said three-dimensional microelectronic structure; applying an electric field between an electrode and said microelectronic structure at a desired point under conditions whereat said first portion of a selected three-dimensional microelectronic structure is formed from said first precursor material; positionally adjusting either said formed three-dimensional microelectronic structure or said electrode whereby further controlled growth of said three-dimensional microelectronic structure occurs; passing a second precursor material for a selected three-dimensional microelectronic structure into a reaction chamber at temperatures sufficient to maintain said precursor material in a predominantly gaseous state; maintaining said reaction chamber under sufficient pressures whereby a second portion of said three-dimensional microelectronic structure formation is enhanced; applying an electric field between an electrode and said microelectronic structure at a desired point under conditions whereat said second portion of a selected three-dimensional microelectronic structure is formed from said second precursor material; and, positionally adjusting either said formed three-dimensional microelectronic structure or said electrode whereby further controlled growth of said three-dimensional microelectronic structure occurs.
Teeter, Matthew G; Kopacz, Alexander J; Nikolov, Hristo N; Holdsworth, David W
2015-01-01
Additive manufacturing continues to increase in popularity and is being used in applications such as biomaterial ingrowth that requires sub-millimeter dimensional accuracy. The purpose of this study was to design a metrology test object for determining the capabilities of additive manufacturing systems to produce common objects, with a focus on those relevant to medical applications. The test object was designed with a variety of features of varying dimensions, including holes, cylinders, rectangles, gaps, and lattices. The object was built using selective laser melting, and the produced dimensions were compared to the target dimensions. Location of the test objects on the build plate did not affect dimensions. Features with dimensions less than 0.300 mm did not build or were overbuilt to a minimum of 0.300 mm. The mean difference between target and measured dimensions was less than 0.100 mm in all cases. The test object is applicable to multiple systems and materials, tests the effect of location on the build, uses a minimum of material, and can be measured with a variety of efficient metrology tools (including measuring microscopes and micro-CT). Investigators can use this test object to determine the limits of systems and adjust build parameters to achieve maximum accuracy. © IMechE 2014.
NASA Astrophysics Data System (ADS)
Dieckman, Eric Allen
In order to perform useful tasks for us, robots must have the ability to notice, recognize, and respond to objects and events in their environment. This requires the acquisition and synthesis of information from a variety of sensors. Here we investigate the performance of a number of sensor modalities in an unstructured outdoor environment, including the Microsoft Kinect, thermal infrared camera, and coffee can radar. Special attention is given to acoustic echolocation measurements of approaching vehicles, where an acoustic parametric array propagates an audible signal to the oncoming target and the Kinect microphone array records the reflected backscattered signal. Although useful information about the target is hidden inside the noisy time domain measurements, the Dynamic Wavelet Fingerprint process (DWFP) is used to create a time-frequency representation of the data. A small-dimensional feature vector is created for each measurement using an intelligent feature selection process for use in statistical pattern classification routines. Using our experimentally measured data from real vehicles at 50 m, this process is able to correctly classify vehicles into one of five classes with 94% accuracy. Fully three-dimensional simulations allow us to study the nonlinear beam propagation and interaction with real-world targets to improve classification results.
CAFÉ-Map: Context Aware Feature Mapping for mining high dimensional biomedical data.
Minhas, Fayyaz Ul Amir Afsar; Asif, Amina; Arif, Muhammad
2016-12-01
Feature selection and ranking is of great importance in the analysis of biomedical data. In addition to reducing the number of features used in classification or other machine learning tasks, it allows us to extract meaningful biological and medical information from a machine learning model. Most existing approaches in this domain do not directly model the fact that the relative importance of features can be different in different regions of the feature space. In this work, we present a context aware feature ranking algorithm called CAFÉ-Map. CAFÉ-Map is a locally linear feature ranking framework that allows recognition of important features in any given region of the feature space or for any individual example. This allows for simultaneous classification and feature ranking in an interpretable manner. We have benchmarked CAFÉ-Map on a number of toy and real world biomedical data sets. Our comparative study with a number of published methods shows that CAFÉ-Map achieves better accuracies on these data sets. The top ranking features obtained through CAFÉ-Map in a gene profiling study correlate very well with the importance of different genes reported in the literature. Furthermore, CAFÉ-Map provides a more in-depth analysis of feature ranking at the level of individual examples. CAFÉ-Map Python code is available at: http://faculty.pieas.edu.pk/fayyaz/software.html#cafemap . The CAFÉ-Map package supports parallelization and sparse data and provides example scripts for classification. This code can be used to reconstruct the results given in this paper. Copyright © 2016 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Liu, Jie; Hu, Youmin; Wang, Yan; Wu, Bo; Fan, Jikai; Hu, Zhongxu
2018-05-01
The diagnosis of complicated fault severity problems in rotating machinery systems is an important issue that affects the productivity and quality of manufacturing processes and industrial applications. However, it usually suffers from several deficiencies. (1) A considerable degree of prior knowledge and expertise is required to not only extract and select specific features from raw sensor signals, and but also choose a suitable fusion for sensor information. (2) Traditional artificial neural networks with shallow architectures are usually adopted and they have a limited ability to learn the complex and variable operating conditions. In multi-sensor-based diagnosis applications in particular, massive high-dimensional and high-volume raw sensor signals need to be processed. In this paper, an integrated multi-sensor fusion-based deep feature learning (IMSFDFL) approach is developed to identify the fault severity in rotating machinery processes. First, traditional statistics and energy spectrum features are extracted from multiple sensors with multiple channels and combined. Then, a fused feature vector is constructed from all of the acquisition channels. Further, deep feature learning with stacked auto-encoders is used to obtain the deep features. Finally, the traditional softmax model is applied to identify the fault severity. The effectiveness of the proposed IMSFDFL approach is primarily verified by a one-stage gearbox experimental platform that uses several accelerometers under different operating conditions. This approach can identify fault severity more effectively than the traditional approaches.
Ensemble based on static classifier selection for automated diagnosis of Mild Cognitive Impairment.
Nanni, Loris; Lumini, Alessandra; Zaffonato, Nicolò
2018-05-15
Alzheimer's disease (AD) is the most common cause of neurodegenerative dementia in the elderly population. Scientific research is very active in the challenge of designing automated approaches to achieve an early and certain diagnosis. Recently an international competition among AD predictors has been organized: "A Machine learning neuroimaging challenge for automated diagnosis of Mild Cognitive Impairment" (MLNeCh). This competition is based on pre-processed sets of T1-weighted Magnetic Resonance Images (MRI) to be classified in four categories: stable AD, individuals with MCI who converted to AD, individuals with MCI who did not convert to AD and healthy controls. In this work, we propose a method to perform early diagnosis of AD, which is evaluated on MLNeCh dataset. Since the automatic classification of AD is based on the use of feature vectors of high dimensionality, different techniques of feature selection/reduction are compared in order to avoid the curse-of-dimensionality problem, then the classification method is obtained as the combination of Support Vector Machines trained using different clusters of data extracted from the whole training set. The multi-classifier approach proposed in this work outperforms all the stand-alone method tested in our experiments. The final ensemble is based on a set of classifiers, each trained on a different cluster of the training data. The proposed ensemble has the great advantage of performing well using a very reduced version of the data (the reduction factor is more than 90%). The MATLAB code for the ensemble of classifiers will be publicly available 1 to other researchers for future comparisons. Copyright © 2017 Elsevier B.V. All rights reserved.
Moteghaed, Niloofar Yousefi; Maghooli, Keivan; Garshasbi, Masoud
2018-01-01
Background: Gene expression data are characteristically high dimensional with a small sample size in contrast to the feature size and variability inherent in biological processes that contribute to difficulties in analysis. Selection of highly discriminative features decreases the computational cost and complexity of the classifier and improves its reliability for prediction of a new class of samples. Methods: The present study used hybrid particle swarm optimization and genetic algorithms for gene selection and a fuzzy support vector machine (SVM) as the classifier. Fuzzy logic is used to infer the importance of each sample in the training phase and decrease the outlier sensitivity of the system to increase the ability to generalize the classifier. A decision-tree algorithm was applied to the most frequent genes to develop a set of rules for each type of cancer. This improved the abilities of the algorithm by finding the best parameters for the classifier during the training phase without the need for trial-and-error by the user. The proposed approach was tested on four benchmark gene expression profiles. Results: Good results have been demonstrated for the proposed algorithm. The classification accuracy for leukemia data is 100%, for colon cancer is 96.67% and for breast cancer is 98%. The results show that the best kernel used in training the SVM classifier is the radial basis function. Conclusions: The experimental results show that the proposed algorithm can decrease the dimensionality of the dataset, determine the most informative gene subset, and improve classification accuracy using the optimal parameters of the classifier with no user interface. PMID:29535919
Phase space interrogation of the empirical response modes for seismically excited structures
NASA Astrophysics Data System (ADS)
Paul, Bibhas; George, Riya C.; Mishra, Sudib K.
2017-07-01
Conventional Phase Space Interrogation (PSI) for structural damage assessment relies on exciting the structure with low dimensional chaotic waveform, thereby, significantly limiting their applicability to large structures. The PSI technique is presently extended for structure subjected to seismic excitations. The high dimensionality of the phase space for seismic response(s) are overcome by the Empirical Mode Decomposition (EMD), decomposing the responses to a number of intrinsic low dimensional oscillatory modes, referred as Intrinsic Mode Functions (IMFs). Along with their low dimensionality, a few IMFs, retain sufficient information of the system dynamics to reflect the damage induced changes. The mutually conflicting nature of low-dimensionality and the sufficiency of dynamic information are taken care by the optimal choice of the IMF(s), which is shown to be the third/fourth IMFs. The optimal IMF(s) are employed for the reconstruction of the Phase space attractor following Taken's embedding theorem. The widely referred Changes in Phase Space Topology (CPST) feature is then employed on these Phase portrait(s) to derive the damage sensitive feature, referred as the CPST of the IMFs (CPST-IMF). The legitimacy of the CPST-IMF is established as a damage sensitive feature by assessing its variation with a number of damage scenarios benchmarked in the IASC-ASCE building. The damage localization capability, remarkable tolerance to noise contamination and the robustness under different seismic excitations of the feature are demonstrated.
Quantum-enhanced feature selection with forward selection and backward elimination
NASA Astrophysics Data System (ADS)
He, Zhimin; Li, Lvzhou; Huang, Zhiming; Situ, Haozhen
2018-07-01
Feature selection is a well-known preprocessing technique in machine learning, which can remove irrelevant features to improve the generalization capability of a classifier and reduce training and inference time. However, feature selection is time-consuming, particularly for the applications those have thousands of features, such as image retrieval, text mining and microarray data analysis. It is crucial to accelerate the feature selection process. We propose a quantum version of wrapper-based feature selection, which converts a classical feature selection to its quantum counterpart. It is valuable for machine learning on quantum computer. In this paper, we focus on two popular kinds of feature selection methods, i.e., wrapper-based forward selection and backward elimination. The proposed feature selection algorithm can quadratically accelerate the classical one.
Deep neural networks for texture classification-A theoretical analysis.
Basu, Saikat; Mukhopadhyay, Supratik; Karki, Manohar; DiBiano, Robert; Ganguly, Sangram; Nemani, Ramakrishna; Gayaka, Shreekant
2018-01-01
We investigate the use of Deep Neural Networks for the classification of image datasets where texture features are important for generating class-conditional discriminative representations. To this end, we first derive the size of the feature space for some standard textural features extracted from the input dataset and then use the theory of Vapnik-Chervonenkis dimension to show that hand-crafted feature extraction creates low-dimensional representations which help in reducing the overall excess error rate. As a corollary to this analysis, we derive for the first time upper bounds on the VC dimension of Convolutional Neural Network as well as Dropout and Dropconnect networks and the relation between excess error rate of Dropout and Dropconnect networks. The concept of intrinsic dimension is used to validate the intuition that texture-based datasets are inherently higher dimensional as compared to handwritten digits or other object recognition datasets and hence more difficult to be shattered by neural networks. We then derive the mean distance from the centroid to the nearest and farthest sampling points in an n-dimensional manifold and show that the Relative Contrast of the sample data vanishes as dimensionality of the underlying vector space tends to infinity. Copyright © 2017 Elsevier Ltd. All rights reserved.
Military personnel recognition system using texture, colour, and SURF features
NASA Astrophysics Data System (ADS)
Irhebhude, Martins E.; Edirisinghe, Eran A.
2014-06-01
This paper presents an automatic, machine vision based, military personnel identification and classification system. Classification is done using a Support Vector Machine (SVM) on sets of Army, Air Force and Navy camouflage uniform personnel datasets. In the proposed system, the arm of service of personnel is recognised by the camouflage of a persons uniform, type of cap and the type of badge/logo. The detailed analysis done include; camouflage cap and plain cap differentiation using gray level co-occurrence matrix (GLCM) texture feature; classification on Army, Air Force and Navy camouflaged uniforms using GLCM texture and colour histogram bin features; plain cap badge classification into Army, Air Force and Navy using Speed Up Robust Feature (SURF). The proposed method recognised camouflage personnel arm of service on sets of data retrieved from google images and selected military websites. Correlation-based Feature Selection (CFS) was used to improve recognition and reduce dimensionality, thereby speeding the classification process. With this method success rates recorded during the analysis include 93.8% for camouflage appearance category, 100%, 90% and 100% rates of plain cap and camouflage cap categories for Army, Air Force and Navy categories, respectively. Accurate recognition was recorded using SURF for the plain cap badge category. Substantial analysis has been carried out and results prove that the proposed method can correctly classify military personnel into various arms of service. We show that the proposed method can be integrated into a face recognition system, which will recognise personnel in addition to determining the arm of service which the personnel belong. Such a system can be used to enhance the security of a military base or facility.
Vijayan, R S K; Ghoshal, Nanda
2008-10-01
Given the heterogeneity of GABA(A) receptor, the pharmacological significance of identifying subtype selective modulators is increasingly being recognized. Thus, drugs selective for GABA(A) alpha(3) receptors are expected to display fewer side effects than the drugs presently in clinical use. Hence we carried out 3D QSAR (three-dimensional quantitative structure-activity relationship) studies on a series of novel GABA(A) alpha(3) subtype selective modulators to gain more insight into subtype affinity. To identify the 3D functional attributes required for subtype selectivity, a chemical feature-based pharmacophore, primarily based on selective ligands representing diverse structural classes was generated. The obtained pseudo receptor model of the benzodiazepine binding site revealed a binding mode akin to "Message-Address" concept. Scaffold hopping was carried out across multi-conformational May Bridge database for the identification of novel chemotypes. Further a focused data reduction approach was employed to choose a subset of enriched compounds based on "Drug likeness" and "Similarity-based" methods. These results taken together could provide impetus for rational design and optimization of more selective and high affinity leads with a potential to have decreased adverse effects.
Titaley, Ivan A; Ogba, O Maduka; Chibwe, Leah; Hoh, Eunha; Cheong, Paul H-Y; Simonich, Staci L Massey
2018-03-16
Non-targeted analysis of environmental samples, using comprehensive two-dimensional gas chromatography coupled with time-of-flight mass spectrometry (GC × GC/ToF-MS), poses significant data analysis challenges due to the large number of possible analytes. Non-targeted data analysis of complex mixtures is prone to human bias and is laborious, particularly for comparative environmental samples such as contaminated soil pre- and post-bioremediation. To address this research bottleneck, we developed OCTpy, a Python™ script that acts as a data reduction filter to automate GC × GC/ToF-MS data analysis from LECO ® ChromaTOF ® software and facilitates selection of analytes of interest based on peak area comparison between comparative samples. We used data from polycyclic aromatic hydrocarbon (PAH) contaminated soil, pre- and post-bioremediation, to assess the effectiveness of OCTpy in facilitating the selection of analytes that have formed or degraded following treatment. Using datasets from the soil extracts pre- and post-bioremediation, OCTpy selected, on average, 18% of the initial suggested analytes generated by the LECO ® ChromaTOF ® software Statistical Compare feature. Based on this list, 63-100% of the candidate analytes identified by a highly trained individual were also selected by OCTpy. This process was accomplished in several minutes per sample, whereas manual data analysis took several hours per sample. OCTpy automates the analysis of complex mixtures of comparative samples, reduces the potential for human error during heavy data handling and decreases data analysis time by at least tenfold. Copyright © 2018 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Castillo, Richard; Castillo, Edward; Fuentes, David; Ahmad, Moiz; Wood, Abbie M.; Ludwig, Michelle S.; Guerrero, Thomas
2013-05-01
Landmark point-pairs provide a strategy to assess deformable image registration (DIR) accuracy in terms of the spatial registration of the underlying anatomy depicted in medical images. In this study, we propose to augment a publicly available database (www.dir-lab.com) of medical images with large sets of manually identified anatomic feature pairs between breath-hold computed tomography (BH-CT) images for DIR spatial accuracy evaluation. Ten BH-CT image pairs were randomly selected from the COPDgene study cases. Each patient had received CT imaging of the entire thorax in the supine position at one-fourth dose normal expiration and maximum effort full dose inspiration. Using dedicated in-house software, an imaging expert manually identified large sets of anatomic feature pairs between images. Estimates of inter- and intra-observer spatial variation in feature localization were determined by repeat measurements of multiple observers over subsets of randomly selected features. 7298 anatomic landmark features were manually paired between the 10 sets of images. Quantity of feature pairs per case ranged from 447 to 1172. Average 3D Euclidean landmark displacements varied substantially among cases, ranging from 12.29 (SD: 6.39) to 30.90 (SD: 14.05) mm. Repeat registration of uniformly sampled subsets of 150 landmarks for each case yielded estimates of observer localization error, which ranged in average from 0.58 (SD: 0.87) to 1.06 (SD: 2.38) mm for each case. The additions to the online web database (www.dir-lab.com) described in this work will broaden the applicability of the reference data, providing a freely available common dataset for targeted critical evaluation of DIR spatial accuracy performance in multiple clinical settings. Estimates of observer variance in feature localization suggest consistent spatial accuracy for all observers across both four-dimensional CT and COPDgene patient cohorts.
NASA Technical Reports Server (NTRS)
Nakazawa, S.
1987-01-01
This Annual Status Report presents the results of work performed during the third year of the 3-D Inelastic Analysis Methods for Hot Section Components program (NASA Contract NAS3-23697). The objective of the program is to produce a series of new computer codes that permit more accurate and efficient three-dimensional analysis of selected hot section components, i.e., combustor liners, turbine blades, and turbine vanes. The computer codes embody a progression of mathematical models and are streamlined to take advantage of geometrical features, loading conditions, and forms of material response that distinguish each group of selected components. This report is presented in two volumes. Volume 1 describes effort performed under Task 4B, Special Finite Element Special Function Models, while Volume 2 concentrates on Task 4C, Advanced Special Functions Models.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Katherine L.; Ambler, Catherine M.; Anderson, David R.
Through fragment-based drug design focused on engaging the active site of IRAK4 and leveraging three-dimensional topology in a ligand-efficient manner, a micromolar hit identified from a screen of a Pfizer fragment library was optimized to afford IRAK4 inhibitors with nanomolar potency in cellular assays. The medicinal chemistry effort featured the judicious placement of lipophilicity, informed by co-crystal structures with IRAK4 and optimization of ADME properties to deliver clinical candidate PF-06650833 (compound 40). This compound displays a 5-unit increase in lipophilic efficiency from the fragment hit, excellent kinase selectivity, and pharmacokinetic properties suitable for oral administration.
Natural extension of fast-slow decomposition for dynamical systems
NASA Astrophysics Data System (ADS)
Rubin, J. E.; Krauskopf, B.; Osinga, H. M.
2018-01-01
Modeling and parameter estimation to capture the dynamics of physical systems are often challenging because many parameters can range over orders of magnitude and are difficult to measure experimentally. Moreover, selecting a suitable model complexity requires a sufficient understanding of the model's potential use, such as highlighting essential mechanisms underlying qualitative behavior or precisely quantifying realistic dynamics. We present an approach that can guide model development and tuning to achieve desired qualitative and quantitative solution properties. It relies on the presence of disparate time scales and employs techniques of separating the dynamics of fast and slow variables, which are well known in the analysis of qualitative solution features. We build on these methods to show how it is also possible to obtain quantitative solution features by imposing designed dynamics for the slow variables in the form of specified two-dimensional paths in a bifurcation-parameter landscape.
Learning discriminative features from RGB-D images for gender and ethnicity identification
NASA Astrophysics Data System (ADS)
Azzakhnini, Safaa; Ballihi, Lahoucine; Aboutajdine, Driss
2016-11-01
The development of sophisticated sensor technologies gave rise to an interesting variety of data. With the appearance of affordable devices, such as the Microsoft Kinect, depth-maps and three-dimensional data became easily accessible. This attracted many computer vision researchers seeking to exploit this information in classification and recognition tasks. In this work, the problem of face classification in the context of RGB images and depth information (RGB-D images) is addressed. The purpose of this paper is to study and compare some popular techniques for gender recognition and ethnicity classification to understand how much depth data can improve the quality of recognition. Furthermore, we investigate which combination of face descriptors, feature selection methods, and learning techniques is best suited to better exploit RGB-D images. The experimental results show that depth data improve the recognition accuracy for gender and ethnicity classification applications in many use cases.
Reliability of numerical wind tunnels for VAWT simulation
NASA Astrophysics Data System (ADS)
Raciti Castelli, M.; Masi, M.; Battisti, L.; Benini, E.; Brighenti, A.; Dossena, V.; Persico, G.
2016-09-01
Computational Fluid Dynamics (CFD) based on the Unsteady Reynolds Averaged Navier Stokes (URANS) equations have long been widely used to study vertical axis wind turbines (VAWTs). Following a comprehensive experimental survey on the wakes downwind of a troposkien-shaped rotor, a campaign of bi-dimensional simulations is presented here, with the aim of assessing its reliability in reproducing the main features of the flow, also identifying areas needing additional research. Starting from both a well consolidated turbulence model (k-ω SST) and an unstructured grid typology, the main simulation settings are here manipulated in a convenient form to tackle rotating grids reproducing a VAWT operating in an open jet wind tunnel. The dependence of the numerical predictions from the selected grid spacing is investigated, thus establishing the less refined grid size that is still capable of capturing some relevant flow features such as integral quantities (rotor torque) and local ones (wake velocities).
Pichler, Verena; Heffeter, Petra; Valiahdi, Seied M.; Kowol, Christian R.; Egger, Alexander; Berger, Walter; Jakupec, Michael A.; Galanski, Markus; Keppler, Bernhard K.
2014-01-01
Eight novel mononuclear and two dinuclear platinum(IV) complexes were synthesized and characterized by elemental analysis, one- and two-dimensional NMR spectroscopy, mass spectrometry, and reversed-phase HPLC (log kw) and in one case by X-ray diffraction. Cytotoxicity of the compounds was studied in three human cancer cell lines (CH1, SW480, and A549) by means of the MTT assay, featuring IC50 values to the low micromolar range. Furthermore a selected set of compounds was investigated in additional cancer cell lines (P31 and P31/cis, A2780 and A2780/cis, SW1573, 2R120, and 2R160) with regard to their resistance patterns, offering a distinctly different scheme compared to cisplatin. To gain further insights into the mode of action, drug uptake, DNA synthesis inhibition, cell cycle effects, and induction of apoptosis were determined for two characteristic substances. PMID:23194425
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yuan, Rui; Singh, Sudhanshu S.; Chawla, Nikhilesh
2016-08-15
We present a robust method for automating removal of “segregation artifacts” in segmented tomographic images of three-dimensional heterogeneous microstructures. The objective of this method is to accurately identify and separate discrete features in composite materials where limitations in imaging resolution lead to spurious connections near close contacts. The method utilizes betweenness centrality, a measure of the importance of a node in the connectivity of a graph network, to identify voxels that create artificial bridges between otherwise distinct geometric features. To facilitate automation of the algorithm, we develop a relative centrality metric to allow for the selection of a threshold criterionmore » that is not sensitive to inclusion size or shape. As a demonstration of the effectiveness of the algorithm, we report on the segmentation of a 3D reconstruction of a SiC particle reinforced aluminum alloy, imaged by X-ray synchrotron tomography.« less
Schönweiler, R; Wübbelt, P; Tolloczko, R; Rose, C; Ptok, M
2000-01-01
Discriminant analysis (DA) and self-organizing feature maps (SOFM) were used to classify passively evoked auditory event-related potentials (ERP) P(1), N(1), P(2) and N(2). Responses from 16 children with severe behavioral auditory perception deficits, 16 children with marked behavioral auditory perception deficits, and 14 controls were examined. Eighteen ERP amplitude parameters were selected for examination of statistical differences between the groups. Different DA methods and SOFM configurations were trained to the values. SOFM had better classification results than DA methods. Subsequently, measures on another 37 subjects that were unknown for the trained SOFM were used to test the reliability of the system. With 10-dimensional vectors, reliable classifications were obtained that matched behavioral auditory perception deficits in 96%, implying central auditory processing disorder (CAPD). The results also support the assumption that CAPD includes a 'non-peripheral' auditory processing deficit. Copyright 2000 S. Karger AG, Basel.
Zhang, Cunji; Yao, Xifan; Zhang, Jianming; Jin, Hong
2016-05-31
Tool breakage causes losses of surface polishing and dimensional accuracy for machined part, or possible damage to a workpiece or machine. Tool Condition Monitoring (TCM) is considerably vital in the manufacturing industry. In this paper, an indirect TCM approach is introduced with a wireless triaxial accelerometer. The vibrations in the three vertical directions (x, y and z) are acquired during milling operations, and the raw signals are de-noised by wavelet analysis. These features of de-noised signals are extracted in the time, frequency and time-frequency domains. The key features are selected based on Pearson's Correlation Coefficient (PCC). The Neuro-Fuzzy Network (NFN) is adopted to predict the tool wear and Remaining Useful Life (RUL). In comparison with Back Propagation Neural Network (BPNN) and Radial Basis Function Network (RBFN), the results show that the NFN has the best performance in the prediction of tool wear and RUL.
NASA Astrophysics Data System (ADS)
Teixidor, D.; Ferrer, I.; Ciurana, J.
2012-04-01
This paper reports the characterization of laser machining (milling) process to manufacture micro-channels in order to understand the incidence of process parameters on the final features. Selection of process operational parameters is highly critical for successful laser micromachining. A set of designed experiments is carried out in a pulsed Nd:YAG laser system using AISI H13 hardened tool steel as work material. Several micro-channels have been manufactured as micro-mold cavities varying parameters such as scanning speed (SS), pulse intensity (PI) and pulse frequency (PF). Results are obtained by evaluating the dimensions and the surface finish of the micro-channel. The dimensions and shape of the micro-channels produced with laser-micro-milling process exhibit variations. In general the use of low scanning speeds increases the quality of the feature in both surface finishing and dimensional.
Two-photon calcium imaging during fictive navigation in virtual environments.
Ahrens, Misha B; Huang, Kuo Hua; Narayan, Sujatha; Mensh, Brett D; Engert, Florian
2013-01-01
A full understanding of nervous system function requires recording from large populations of neurons during naturalistic behaviors. Here we enable paralyzed larval zebrafish to fictively navigate two-dimensional virtual environments while we record optically from many neurons with two-photon imaging. Electrical recordings from motor nerves in the tail are decoded into intended forward swims and turns, which are used to update a virtual environment displayed underneath the fish. Several behavioral features-such as turning responses to whole-field motion and dark avoidance-are well-replicated in this virtual setting. We readily observed neuronal populations in the hindbrain with laterally selective responses that correlated with right or left optomotor behavior. We also observed neurons in the habenula, pallium, and midbrain with response properties specific to environmental features. Beyond single-cell correlations, the classification of network activity in such virtual settings promises to reveal principles of brainwide neural dynamics during behavior.
Designing perturbative metamaterials from discrete models.
Matlack, Kathryn H; Serra-Garcia, Marc; Palermo, Antonio; Huber, Sebastian D; Daraio, Chiara
2018-04-01
Identifying material geometries that lead to metamaterials with desired functionalities presents a challenge for the field. Discrete, or reduced-order, models provide a concise description of complex phenomena, such as negative refraction, or topological surface states; therefore, the combination of geometric building blocks to replicate discrete models presenting the desired features represents a promising approach. However, there is no reliable way to solve such an inverse problem. Here, we introduce 'perturbative metamaterials', a class of metamaterials consisting of weakly interacting unit cells. The weak interaction allows us to associate each element of the discrete model with individual geometric features of the metamaterial, thereby enabling a systematic design process. We demonstrate our approach by designing two-dimensional elastic metamaterials that realize Veselago lenses, zero-dispersion bands and topological surface phonons. While our selected examples are within the mechanical domain, the same design principle can be applied to acoustic, thermal and photonic metamaterials composed of weakly interacting unit cells.
Kurosumi, M; Mizukoshi, K
2018-05-01
The types of shape feature that constitutes a face have not been comprehensively established, and most previous studies of age-related changes in facial shape have focused on individual characteristics, such as wrinkle, sagging skin, etc. In this study, we quantitatively measured differences in face shape between individuals and investigated how shape features changed with age. We analyzed three-dimensionally the faces of 280 Japanese women aged 20-69 years and used principal component analysis to establish the shape features that characterized individual differences. We also evaluated the relationships between each feature and age, clarifying the shape features characteristic of different age groups. Changes in facial shape in middle age were a decreased volume of the upper face and increased volume of the whole cheeks and around the chin. Changes in older people were an increased volume of the lower cheeks and around the chin, sagging skin, and jaw distortion. Principal component analysis was effective for identifying facial shape features that represent individual and age-related differences. This method allowed straightforward measurements, such as the increase or decrease in cheeks caused by soft tissue changes or skeletal-based changes to the forehead or jaw, simply by acquiring three-dimensional facial images. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Jiang, Li; Shi, Tielin; Xuan, Jianping
2012-05-01
Generally, the vibration signals of fault bearings are non-stationary and highly nonlinear under complicated operating conditions. Thus, it's a big challenge to extract optimal features for improving classification and simultaneously decreasing feature dimension. Kernel Marginal Fisher analysis (KMFA) is a novel supervised manifold learning algorithm for feature extraction and dimensionality reduction. In order to avoid the small sample size problem in KMFA, we propose regularized KMFA (RKMFA). A simple and efficient intelligent fault diagnosis method based on RKMFA is put forward and applied to fault recognition of rolling bearings. So as to directly excavate nonlinear features from the original high-dimensional vibration signals, RKMFA constructs two graphs describing the intra-class compactness and the inter-class separability, by combining traditional manifold learning algorithm with fisher criteria. Therefore, the optimal low-dimensional features are obtained for better classification and finally fed into the simplest K-nearest neighbor (KNN) classifier to recognize different fault categories of bearings. The experimental results demonstrate that the proposed approach improves the fault classification performance and outperforms the other conventional approaches.
Chan, Louis K H; Hayward, William G
2009-02-01
In feature integration theory (FIT; A. Treisman & S. Sato, 1990), feature detection is driven by independent dimensional modules, and other searches are driven by a master map of locations that integrates dimensional information into salience signals. Although recent theoretical models have largely abandoned this distinction, some observed results are difficult to explain in its absence. The present study measured dimension-specific performance during detection and localization, tasks that require operation of dimensional modules and the master map, respectively. Results showed a dissociation between tasks in terms of both dimension-switching costs and cross-dimension attentional capture, reflecting a dimension-specific nature for detection tasks and a dimension-general nature for localization tasks. In a feature-discrimination task, results precluded an explanation based on response mode. These results are interpreted to support FIT's postulation that different mechanisms are involved in parallel and focal attention searches. This indicates that the FIT architecture should be adopted to explain the current results and that a variety of visual attention findings can be addressed within this framework. Copyright 2009 APA, all rights reserved.
NASA Astrophysics Data System (ADS)
van de Moortele, Tristan; Nemes, Andras; Wendt, Christine; Coletti, Filippo
2016-11-01
The morphological features of the airway tree directly affect the air flow features during breathing, which determines the gas exchange and inhaled particle transport. Lung disease, Chronic Obstructive Pulmonary Disease (COPD) in this study, affects the structural features of the lungs, which in turn negatively affects the air flow through the airways. Here bronchial tree air volume geometries are segmented from Computed Tomography (CT) scans of healthy and diseased subjects. Geometrical analysis of the airway centerlines and corresponding cross-sectional areas provide insight into the specific effects of COPD on the airway structure. These geometries are also used to 3D print anatomically accurate, patient specific flow models. Three-component, three-dimensional velocity fields within these models are acquired using Magnetic Resonance Imaging (MRI). The three-dimensional flow fields provide insight into the change in flow patterns and features. Additionally, particle trajectories are determined using the velocity fields, to identify the fate of therapeutic and harmful inhaled aerosols. Correlation between disease-specific and patient-specific anatomical features with dysfunctional airflow patterns can be achieved by combining geometrical and flow analysis.
Tensor Rank Preserving Discriminant Analysis for Facial Recognition.
Tao, Dapeng; Guo, Yanan; Li, Yaotang; Gao, Xinbo
2017-10-12
Facial recognition, one of the basic topics in computer vision and pattern recognition, has received substantial attention in recent years. However, for those traditional facial recognition algorithms, the facial images are reshaped to a long vector, thereby losing part of the original spatial constraints of each pixel. In this paper, a new tensor-based feature extraction algorithm termed tensor rank preserving discriminant analysis (TRPDA) for facial image recognition is proposed; the proposed method involves two stages: in the first stage, the low-dimensional tensor subspace of the original input tensor samples was obtained; in the second stage, discriminative locality alignment was utilized to obtain the ultimate vector feature representation for subsequent facial recognition. On the one hand, the proposed TRPDA algorithm fully utilizes the natural structure of the input samples, and it applies an optimization criterion that can directly handle the tensor spectral analysis problem, thereby decreasing the computation cost compared those traditional tensor-based feature selection algorithms. On the other hand, the proposed TRPDA algorithm extracts feature by finding a tensor subspace that preserves most of the rank order information of the intra-class input samples. Experiments on the three facial databases are performed here to determine the effectiveness of the proposed TRPDA algorithm.
PDC-SGB: Prediction of effective drug combinations using a stochastic gradient boosting algorithm.
Xu, Qian; Xiong, Yi; Dai, Hao; Kumari, Kotni Meena; Xu, Qin; Ou, Hong-Yu; Wei, Dong-Qing
2017-03-21
Combinatorial therapy is a promising strategy for combating complex diseases by improving the efficacy and reducing the side effects. To facilitate the identification of drug combinations in pharmacology, we proposed a new computational model, termed PDC-SGB, to predict effective drug combinations by integrating biological, chemical and pharmacological information based on a stochastic gradient boosting algorithm. To begin with, a set of 352 golden positive samples were collected from the public drug combination database. Then, a set of 732 dimensional feature vector involving biological, chemical and pharmaceutical information was constructed for each drug combination to describe its properties. To avoid overfitting, the maximum relevance & minimum redundancy (mRMR) method was performed to extract useful ones by removing redundant subsets. Based on the selected features, the three different type of classification algorithms were employed to build the drug combination prediction models. Our results demonstrated that the model based on the stochastic gradient boosting algorithm yield out the best performance. Furthermore, it is indicated that the feature patterns of therapy had powerful ability to discriminate effective drug combinations from non-effective ones. By analyzing various features, it is shown that the enriched features occurred frequently in golden positive samples can help predict novel drug combinations. Copyright © 2017 Elsevier Ltd. All rights reserved.
A Novel Multi-Class Ensemble Model for Classifying Imbalanced Biomedical Datasets
NASA Astrophysics Data System (ADS)
Bikku, Thulasi; Sambasiva Rao, N., Dr; Rao, Akepogu Ananda, Dr
2017-08-01
This paper mainly focuseson developing aHadoop based framework for feature selection and classification models to classify high dimensionality data in heterogeneous biomedical databases. Wide research has been performing in the fields of Machine learning, Big data and Data mining for identifying patterns. The main challenge is extracting useful features generated from diverse biological systems. The proposed model can be used for predicting diseases in various applications and identifying the features relevant to particular diseases. There is an exponential growth of biomedical repositories such as PubMed and Medline, an accurate predictive model is essential for knowledge discovery in Hadoop environment. Extracting key features from unstructured documents often lead to uncertain results due to outliers and missing values. In this paper, we proposed a two phase map-reduce framework with text preprocessor and classification model. In the first phase, mapper based preprocessing method was designed to eliminate irrelevant features, missing values and outliers from the biomedical data. In the second phase, a Map-Reduce based multi-class ensemble decision tree model was designed and implemented in the preprocessed mapper data to improve the true positive rate and computational time. The experimental results on the complex biomedical datasets show that the performance of our proposed Hadoop based multi-class ensemble model significantly outperforms state-of-the-art baselines.
Feature Selection for Chemical Sensor Arrays Using Mutual Information
Wang, X. Rosalind; Lizier, Joseph T.; Nowotny, Thomas; Berna, Amalia Z.; Prokopenko, Mikhail; Trowell, Stephen C.
2014-01-01
We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set, and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study using a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near optimal features for chemical sensor arrays. PMID:24595058
Post-SELEX optimization of aptamers.
Gao, Shunxiang; Zheng, Xin; Jiao, Binghua; Wang, Lianghua
2016-07-01
Aptamers are functional single-stranded DNA or RNA oligonucleotides, selected in vitro by SELEX (Systematic Evolution of Ligands by Exponential Enrichment), which can fold into stable unique three-dimensional structures that bind their target ligands with high affinity and specificity. Although aptamers show a number of favorable advantages such as better stability and easier modification when compared with the properties of antibodies, only a handful of aptamers have entered clinical trials and only one, pegaptanib, has received US Food and Drug Administration approval for clinical use. The main reasons that limit the practical application of aptamers are insufficient nuclease stability, bioavailability, thermal stability, or even affinity. Some aptamers obtained from modified libraries show better properties; however, polymerase amplification of nucleic acids containing non-natural bases is currently a primary drawback of the SELEX process. This review focuses on several post-SELEX optimization strategies of aptamers identified in recent years. We describe four common methods in detail: truncation, chemical modification, bivalent or multivalent aptamer construction, and mutagenesis. We believe that these optimization strategies should improve one or more specific properties of aptamers, and the type of feature(s) selected for improvement will be dependent on the application purpose.
Aphinyanaphongs, Yin; Fu, Lawrence D; Aliferis, Constantin F
2013-01-01
Building machine learning models that identify unproven cancer treatments on the Health Web is a promising approach for dealing with the dissemination of false and dangerous information to vulnerable health consumers. Aside from the obvious requirement of accuracy, two issues are of practical importance in deploying these models in real world applications. (a) Generalizability: The models must generalize to all treatments (not just the ones used in the training of the models). (b) Scalability: The models can be applied efficiently to billions of documents on the Health Web. First, we provide methods and related empirical data demonstrating strong accuracy and generalizability. Second, by combining the MapReduce distributed architecture and high dimensionality compression via Markov Boundary feature selection, we show how to scale the application of the models to WWW-scale corpora. The present work provides evidence that (a) a very small subset of unproven cancer treatments is sufficient to build a model to identify unproven treatments on the web; (b) unproven treatments use distinct language to market their claims and this language is learnable; (c) through distributed parallelization and state of the art feature selection, it is possible to prepare the corpora and build and apply models with large scalability.
Biodynamic digital holography of chemoresistance in a pre-clinical trial of canine B-cell lymphoma.
Choi, Honggu; Li, Zhe; Sun, Hao; Merrill, Dan; Turek, John; Childress, Michael; Nolte, David
2018-05-01
Biodynamic digital holography was used to obtain phenotypic profiles of canine non-Hodgkin B-cell lymphoma biopsies treated with standard-of-care chemotherapy. Biodynamic signatures from the living 3D tissues were extracted using fluctuation spectroscopy from intracellular Doppler light scattering in response to the molecular mechanisms of action of therapeutic drugs that modify a range of internal cellular motions. The standard-of-care to treat B-cell lymphoma in both humans and dogs is a combination CHOP therapy that consists of doxorubicin, prednisolone, cyclophosphamide and vincristine. The proportion of dogs experiencing durable cancer remission following CHOP chemotherapy was 68%, with 13 out of 19 dogs responding favorably to therapy and 6 dogs failing to have progression-free survival times greater than 100 days. Biodynamic signatures were found that correlate with inferior survival times, and biomarker selection was optimized to identify specific Doppler signatures related to chemoresistance. A machine learning classifier was constructed based on feature vector correlations and linear separability in high-dimensional feature space. Hold-out validation predicted patient response to therapy with 84% accuracy. These results point to the potential for biodynamic profiling to contribute to personalized medicine by aiding the selection of chemotherapy for cancer patients.
Clustering Molecular Dynamics Trajectories for Optimizing Docking Experiments
De Paris, Renata; Quevedo, Christian V.; Ruiz, Duncan D.; Norberto de Souza, Osmar; Barros, Rodrigo C.
2015-01-01
Molecular dynamics simulations of protein receptors have become an attractive tool for rational drug discovery. However, the high computational cost of employing molecular dynamics trajectories in virtual screening of large repositories threats the feasibility of this task. Computational intelligence techniques have been applied in this context, with the ultimate goal of reducing the overall computational cost so the task can become feasible. Particularly, clustering algorithms have been widely used as a means to reduce the dimensionality of molecular dynamics trajectories. In this paper, we develop a novel methodology for clustering entire trajectories using structural features from the substrate-binding cavity of the receptor in order to optimize docking experiments on a cloud-based environment. The resulting partition was selected based on three clustering validity criteria, and it was further validated by analyzing the interactions between 20 ligands and a fully flexible receptor (FFR) model containing a 20 ns molecular dynamics simulation trajectory. Our proposed methodology shows that taking into account features of the substrate-binding cavity as input for the k-means algorithm is a promising technique for accurately selecting ensembles of representative structures tailored to a specific ligand. PMID:25873944
Good Practices for Learning to Recognize Actions Using FV and VLAD.
Wu, Jianxin; Zhang, Yu; Lin, Weiyao
2016-12-01
High dimensional representations such as Fisher vectors (FV) and vectors of locally aggregated descriptors (VLAD) have shown state-of-the-art accuracy for action recognition in videos. The high dimensionality, on the other hand, also causes computational difficulties when scaling up to large-scale video data. This paper makes three lines of contributions to learning to recognize actions using high dimensional representations. First, we reviewed several existing techniques that improve upon FV or VLAD in image classification, and performed extensive empirical evaluations to assess their applicability for action recognition. Our analyses of these empirical results show that normality and bimodality are essential to achieve high accuracy. Second, we proposed a new pooling strategy for VLAD and three simple, efficient, and effective transformations for both FV and VLAD. Both proposed methods have shown higher accuracy than the original FV/VLAD method in extensive evaluations. Third, we proposed and evaluated new feature selection and compression methods for the FV and VLAD representations. This strategy uses only 4% of the storage of the original representation, but achieves comparable or even higher accuracy. Based on these contributions, we recommend a set of good practices for action recognition in videos for practitioners in this field.
Zhang, Bo; Chen, Zhen; Albert, Paul S
2012-01-01
High-dimensional biomarker data are often collected in epidemiological studies when assessing the association between biomarkers and human disease is of interest. We develop a latent class modeling approach for joint analysis of high-dimensional semicontinuous biomarker data and a binary disease outcome. To model the relationship between complex biomarker expression patterns and disease risk, we use latent risk classes to link the 2 modeling components. We characterize complex biomarker-specific differences through biomarker-specific random effects, so that different biomarkers can have different baseline (low-risk) values as well as different between-class differences. The proposed approach also accommodates data features that are common in environmental toxicology and other biomarker exposure data, including a large number of biomarkers, numerous zero values, and complex mean-variance relationship in the biomarkers levels. A Monte Carlo EM (MCEM) algorithm is proposed for parameter estimation. Both the MCEM algorithm and model selection procedures are shown to work well in simulations and applications. In applying the proposed approach to an epidemiological study that examined the relationship between environmental polychlorinated biphenyl (PCB) exposure and the risk of endometriosis, we identified a highly significant overall effect of PCB concentrations on the risk of endometriosis.
Castro, Jorge
2017-07-11
This paper reviews the main modeling techniques for stone columns, both ordinary stone columns and geosynthetic-encased stone columns. The paper tries to encompass the more recent advances and recommendations in the topic. Regarding the geometrical model, the main options are the "unit cell", longitudinal gravel trenches in plane strain conditions, cylindrical rings of gravel in axial symmetry conditions, equivalent homogeneous soil with improved properties and three-dimensional models, either a full three-dimensional model or just a three-dimensional row or slice of columns. Some guidelines for obtaining these simplified geometrical models are provided and the particular case of groups of columns under footings is also analyzed. For the latter case, there is a column critical length that is around twice the footing width for non-encased columns in a homogeneous soft soil. In the literature, the column critical length is sometimes given as a function of the column length, which leads to some disparities in its value. Here it is shown that the column critical length mainly depends on the footing dimensions. Some other features related with column modeling are also briefly presented, such as the influence of column installation. Finally, some guidance and recommendations are provided on parameter selection for the study of stone columns.
2017-01-01
This paper reviews the main modeling techniques for stone columns, both ordinary stone columns and geosynthetic-encased stone columns. The paper tries to encompass the more recent advances and recommendations in the topic. Regarding the geometrical model, the main options are the “unit cell”, longitudinal gravel trenches in plane strain conditions, cylindrical rings of gravel in axial symmetry conditions, equivalent homogeneous soil with improved properties and three-dimensional models, either a full three-dimensional model or just a three-dimensional row or slice of columns. Some guidelines for obtaining these simplified geometrical models are provided and the particular case of groups of columns under footings is also analyzed. For the latter case, there is a column critical length that is around twice the footing width for non-encased columns in a homogeneous soft soil. In the literature, the column critical length is sometimes given as a function of the column length, which leads to some disparities in its value. Here it is shown that the column critical length mainly depends on the footing dimensions. Some other features related with column modeling are also briefly presented, such as the influence of column installation. Finally, some guidance and recommendations are provided on parameter selection for the study of stone columns. PMID:28773146
The study of integration about measurable image and 4D production
NASA Astrophysics Data System (ADS)
Zhang, Chunsen; Hu, Pingbo; Niu, Weiyun
2008-12-01
In this paper, we create the geospatial data of three-dimensional (3D) modeling by the combination of digital photogrammetry and digital close-range photogrammetry. For large-scale geographical background, we make the establishment of DEM and DOM combination of three-dimensional landscape model based on the digital photogrammetry which uses aerial image data to make "4D" (DOM: Digital Orthophoto Map, DEM: Digital Elevation Model, DLG: Digital Line Graphic and DRG: Digital Raster Graphic) production. For the range of building and other artificial features which the users are interested in, we realize that the real features of the three-dimensional reconstruction adopting the method of the digital close-range photogrammetry can come true on the basis of following steps : non-metric cameras for data collection, the camera calibration, feature extraction, image matching, and other steps. At last, we combine three-dimensional background and local measurements real images of these large geographic data and realize the integration of measurable real image and the 4D production.The article discussed the way of the whole flow and technology, achieved the three-dimensional reconstruction and the integration of the large-scale threedimensional landscape and the metric building.
Rough sets and Laplacian score based cost-sensitive feature selection
Yu, Shenglong
2018-01-01
Cost-sensitive feature selection learning is an important preprocessing step in machine learning and data mining. Recently, most existing cost-sensitive feature selection algorithms are heuristic algorithms, which evaluate the importance of each feature individually and select features one by one. Obviously, these algorithms do not consider the relationship among features. In this paper, we propose a new algorithm for minimal cost feature selection called the rough sets and Laplacian score based cost-sensitive feature selection. The importance of each feature is evaluated by both rough sets and Laplacian score. Compared with heuristic algorithms, the proposed algorithm takes into consideration the relationship among features with locality preservation of Laplacian score. We select a feature subset with maximal feature importance and minimal cost when cost is undertaken in parallel, where the cost is given by three different distributions to simulate different applications. Different from existing cost-sensitive feature selection algorithms, our algorithm simultaneously selects out a predetermined number of “good” features. Extensive experimental results show that the approach is efficient and able to effectively obtain the minimum cost subset. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms. PMID:29912884
Rough sets and Laplacian score based cost-sensitive feature selection.
Yu, Shenglong; Zhao, Hong
2018-01-01
Cost-sensitive feature selection learning is an important preprocessing step in machine learning and data mining. Recently, most existing cost-sensitive feature selection algorithms are heuristic algorithms, which evaluate the importance of each feature individually and select features one by one. Obviously, these algorithms do not consider the relationship among features. In this paper, we propose a new algorithm for minimal cost feature selection called the rough sets and Laplacian score based cost-sensitive feature selection. The importance of each feature is evaluated by both rough sets and Laplacian score. Compared with heuristic algorithms, the proposed algorithm takes into consideration the relationship among features with locality preservation of Laplacian score. We select a feature subset with maximal feature importance and minimal cost when cost is undertaken in parallel, where the cost is given by three different distributions to simulate different applications. Different from existing cost-sensitive feature selection algorithms, our algorithm simultaneously selects out a predetermined number of "good" features. Extensive experimental results show that the approach is efficient and able to effectively obtain the minimum cost subset. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms.
Computer-aided teniae coli detection using height maps from computed tomographic colonography images
NASA Astrophysics Data System (ADS)
Wei, Zhuoshi; Yao, Jianhua; Wang, Shijun; Summers, Ronald M.
2011-03-01
Computed tomographic colonography (CTC) is a minimally invasive technique for colonic polyps and cancer screening. Teniae coli are three bands of longitudinal smooth muscle on the colon surface. They are parallel, equally distributed on the colon wall, and form a triple helix structure from the appendix to the sigmoid colon. Because of their characteristics, teniae coli are important anatomical meaningful landmarks on human colon. This paper proposes a novel method for teniae coli detection on CT colonography. We first unfold the three-dimensional (3D) colon using a reversible projection technique and compute the two-dimensional (2D) height map of the unfolded colon. The height map records the elevation of colon surface relative to the unfolding plane, where haustral folds corresponding to high elevation points and teniae to low elevation points. The teniae coli are detected on the height map and then projected back to the 3D colon. Since teniae are located where the haustral folds meet, we break down the problem by first detecting haustral folds. We apply 2D Gabor filter banks to extract fold features. The maximum response of the filter banks is then selected as the feature image. The fold centers are then identified based on piecewise thresholding on the feature image. Connecting the fold centers yields a path of the folds. Teniae coli are finally extracted as lines running between the fold paths. Experiments were carried out on 7 cases. The proposed method yielded a promising result with an average normalized RMSE of 5.66% and standard deviation of 4.79% of the circumference of the colon.
Barbosa, Daniel C; Roupar, Dalila B; Ramos, Jaime C; Tavares, Adriano C; Lima, Carlos S
2012-01-11
Wireless capsule endoscopy has been introduced as an innovative, non-invasive diagnostic technique for evaluation of the gastrointestinal tract, reaching places where conventional endoscopy is unable to. However, the output of this technique is an 8 hours video, whose analysis by the expert physician is very time consuming. Thus, a computer assisted diagnosis tool to help the physicians to evaluate CE exams faster and more accurately is an important technical challenge and an excellent economical opportunity. The set of features proposed in this paper to code textural information is based on statistical modeling of second order textural measures extracted from co-occurrence matrices. To cope with both joint and marginal non-Gaussianity of second order textural measures, higher order moments are used. These statistical moments are taken from the two-dimensional color-scale feature space, where two different scales are considered. Second and higher order moments of textural measures are computed from the co-occurrence matrices computed from images synthesized by the inverse wavelet transform of the wavelet transform containing only the selected scales for the three color channels. The dimensionality of the data is reduced by using Principal Component Analysis. The proposed textural features are then used as the input of a classifier based on artificial neural networks. Classification performances of 93.1% specificity and 93.9% sensitivity are achieved on real data. These promising results open the path towards a deeper study regarding the applicability of this algorithm in computer aided diagnosis systems to assist physicians in their clinical practice.
Three Dimensional Imaging with Multiple Wavelength Speckle Interferometry
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bernacki, Bruce E.; Cannon, Bret D.; Schiffern, John T.
2014-05-28
We present the design, modeling, construction, and results of a three-dimensional imager based upon multiple-wavelength speckle interferometry. A surface under test is illuminated with tunable laser light in a Michelson interferometer configuration while a speckled image is acquired at each laser frequency step. The resulting hypercube is Fourier transformed in the frequency dimension and the beat frequencies that result map the relative offsets of surface features. Synthetic wavelengths resulting from the laser tuning can probe features ranging from 18 microns to hundreds of millimeters. Three dimensional images will be presented along with modeling results.
Fingerprints selection for topological localization
NASA Astrophysics Data System (ADS)
Popov, Vladimir
2017-07-01
Problems of visual navigation are extensively studied in contemporary robotics. In particular, we can mention different problems of visual landmarks selection, the problem of selection of a minimal set of visual landmarks, selection of partially distinguishable guards, the problem of placement of visual landmarks. In this paper, we consider one-dimensional color panoramas. Such panoramas can be used for creating fingerprints. Fingerprints give us unique identifiers for visually distinct locations by recovering statistically significant features. Fingerprints can be used as visual landmarks for the solution of various problems of mobile robot navigation. In this paper, we consider a method for automatic generation of fingerprints. In particular, we consider the bounded Post correspondence problem and applications of the problem to consensus fingerprints and topological localization. We propose an efficient approach to solve the bounded Post correspondence problem. In particular, we use an explicit reduction from the decision version of the problem to the satisfiability problem. We present the results of computational experiments for different satisfiability algorithms. In robotic experiments, we consider the average accuracy of reaching of the target point for different lengths of routes and types of fingerprints.
Lepre, Jorge; Rice, J Jeremy; Tu, Yuhai; Stolovitzky, Gustavo
2004-05-01
Despite the growing literature devoted to finding differentially expressed genes in assays probing different tissues types, little attention has been paid to the combinatorial nature of feature selection inherent to large, high-dimensional gene expression datasets. New flexible data analysis approaches capable of searching relevant subgroups of genes and experiments are needed to understand multivariate associations of gene expression patterns with observed phenotypes. We present in detail a deterministic algorithm to discover patterns of multivariate gene associations in gene expression data. The patterns discovered are differential with respect to a control dataset. The algorithm is exhaustive and efficient, reporting all existent patterns that fit a given input parameter set while avoiding enumeration of the entire pattern space. The value of the pattern discovery approach is demonstrated by finding a set of genes that differentiate between two types of lymphoma. Moreover, these genes are found to behave consistently in an independent dataset produced in a different laboratory using different arrays, thus validating the genes selected using our algorithm. We show that the genes deemed significant in terms of their multivariate statistics will be missed using other methods. Our set of pattern discovery algorithms including a user interface is distributed as a package called Genes@Work. This package is freely available to non-commercial users and can be downloaded from our website (http://www.research.ibm.com/FunGen).
Schädler, Marc René; Kollmeier, Birger
2015-04-01
To test if simultaneous spectral and temporal processing is required to extract robust features for automatic speech recognition (ASR), the robust spectro-temporal two-dimensional-Gabor filter bank (GBFB) front-end from Schädler, Meyer, and Kollmeier [J. Acoust. Soc. Am. 131, 4134-4151 (2012)] was de-composed into a spectral one-dimensional-Gabor filter bank and a temporal one-dimensional-Gabor filter bank. A feature set that is extracted with these separate spectral and temporal modulation filter banks was introduced, the separate Gabor filter bank (SGBFB) features, and evaluated on the CHiME (Computational Hearing in Multisource Environments) keywords-in-noise recognition task. From the perspective of robust ASR, the results showed that spectral and temporal processing can be performed independently and are not required to interact with each other. Using SGBFB features permitted the signal-to-noise ratio (SNR) to be lowered by 1.2 dB while still performing as well as the GBFB-based reference system, which corresponds to a relative improvement of the word error rate by 12.8%. Additionally, the real time factor of the spectro-temporal processing could be reduced by more than an order of magnitude. Compared to human listeners, the SNR needed to be 13 dB higher when using Mel-frequency cepstral coefficient features, 11 dB higher when using GBFB features, and 9 dB higher when using SGBFB features to achieve the same recognition performance.
Titanate-based adsorbents for radioactive ions entrapment from water.
Yang, Dongjiang; Liu, Hongwei; Zheng, Zhanfeng; Sarina, Sarina; Zhu, Huaiyong
2013-03-21
This feature article reviews some titanate-based adsorbents for the removal of radioactive wastes (cations and anions) from water. At the beginning, we discuss the development of the conventional ion-exchangeable titanate powders for the entrapment of radioactive cations, such as crystalline silicotitanate (CST), monosodium titanate (MST), peroxotitanate (PT). Then, we specially emphasize the recent progress in the uptake of radioactive ions by one-dimensional (1D) sodium titanate nanofibers and nanotubes, which includes the synthesis and phase transformation of the 1D nanomaterials, adsorption ability (capacity, selectivity, kinetics, etc.) of radioactive cations and anions, and the structural evolution during the adsorption process.
PATTERNS IN BIOMEDICAL DATA-HOW DO WE FIND THEM?
Basile, Anna O; Verma, Anurag; Byrska-Bishop, Marta; Pendergrass, Sarah A; Darabos, Christian; Lester Kirchner, H
2017-01-01
Given the exponential growth of biomedical data, researchers are faced with numerous challenges in extracting and interpreting information from these large, high-dimensional, incomplete, and often noisy data. To facilitate addressing this growing concern, the "Patterns in Biomedical Data-How do we find them?" session of the 2017 Pacific Symposium on Biocomputing (PSB) is devoted to exploring pattern recognition using data-driven approaches for biomedical and precision medicine applications. The papers selected for this session focus on novel machine learning techniques as well as applications of established methods to heterogeneous data. We also feature manuscripts aimed at addressing the current challenges associated with the analysis of biomedical data.
NASA Astrophysics Data System (ADS)
Banon, J.-P.; Hetland, Ø. S.; Simonsen, I.
2018-02-01
By the use of both perturbative and non-perturbative solutions of the reduced Rayleigh equation, we present a detailed study of the scattering of light from two-dimensional weakly rough dielectric films. It is shown that for several rough film configurations, Selényi interference rings exist in the diffusely scattered light. For film systems supported by dielectric substrates where only one of the two interfaces of the film is weakly rough and the other planar, Selényi interference rings are observed at angular positions that can be determined from simple phase arguments. For such single-rough-interface films, we find and explain by a single scattering model that the contrast in the interference patterns is better when the top interface of the film (the interface facing the incident light) is rough than when the bottom interface is rough. When both film interfaces are rough, Selényi interference rings exist but a potential cross-correlation of the two rough interfaces of the film can be used to selectively enhance some of the interference rings while others are attenuated and might even disappear. This feature may in principle be used in determining the correlation properties of interfaces of films that otherwise would be difficult to access.
Attentional Bias in Human Category Learning: The Case of Deep Learning.
Hanson, Catherine; Caglar, Leyla Roskan; Hanson, Stephen José
2018-01-01
Category learning performance is influenced by both the nature of the category's structure and the way category features are processed during learning. Shepard (1964, 1987) showed that stimuli can have structures with features that are statistically uncorrelated (separable) or statistically correlated (integral) within categories. Humans find it much easier to learn categories having separable features, especially when attention to only a subset of relevant features is required, and harder to learn categories having integral features, which require consideration of all of the available features and integration of all the relevant category features satisfying the category rule (Garner, 1974). In contrast to humans, a single hidden layer backpropagation (BP) neural network has been shown to learn both separable and integral categories equally easily, independent of the category rule (Kruschke, 1993). This "failure" to replicate human category performance appeared to be strong evidence that connectionist networks were incapable of modeling human attentional bias. We tested the presumed limitations of attentional bias in networks in two ways: (1) by having networks learn categories with exemplars that have high feature complexity in contrast to the low dimensional stimuli previously used, and (2) by investigating whether a Deep Learning (DL) network, which has demonstrated humanlike performance in many different kinds of tasks (language translation, autonomous driving, etc.), would display human-like attentional bias during category learning. We were able to show a number of interesting results. First, we replicated the failure of BP to differentially process integral and separable category structures when low dimensional stimuli are used (Garner, 1974; Kruschke, 1993). Second, we show that using the same low dimensional stimuli, Deep Learning (DL), unlike BP but similar to humans, learns separable category structures more quickly than integral category structures. Third, we show that even BP can exhibit human like learning differences between integral and separable category structures when high dimensional stimuli (face exemplars) are used. We conclude, after visualizing the hidden unit representations, that DL appears to extend initial learning due to feature development thereby reducing destructive feature competition by incrementally refining feature detectors throughout later layers until a tipping point (in terms of error) is reached resulting in rapid asymptotic learning.
NASA Astrophysics Data System (ADS)
Aviles, Angelica I.; Alsaleh, Samar; Sobrevilla, Pilar; Casals, Alicia
2016-03-01
Robotic-Assisted Surgery approach overcomes the limitations of the traditional laparoscopic and open surgeries. However, one of its major limitations is the lack of force feedback. Since there is no direct interaction between the surgeon and the tissue, there is no way of knowing how much force the surgeon is applying which can result in irreversible injuries. The use of force sensors is not practical since they impose different constraints. Thus, we make use of a neuro-visual approach to estimate the applied forces, in which the 3D shape recovery together with the geometry of motion are used as input to a deep network based on LSTM-RNN architecture. When deep networks are used in real time, pre-processing of data is a key factor to reduce complexity and improve the network performance. A common pre-processing step is dimensionality reduction which attempts to eliminate redundant and insignificant information by selecting a subset of relevant features to use in model construction. In this work, we show the effects of dimensionality reduction in a real-time application: estimating the applied force in Robotic-Assisted Surgeries. According to the results, we demonstrated positive effects of doing dimensionality reduction on deep networks including: faster training, improved network performance, and overfitting prevention. We also show a significant accuracy improvement, ranging from about 33% to 86%, over existing approaches related to force estimation.
The Specificity of Sound Symbolic Correspondences in Spoken Language.
Tzeng, Christina Y; Nygaard, Lynne C; Namy, Laura L
2017-11-01
Although language has long been regarded as a primarily arbitrary system, sound symbolism, or non-arbitrary correspondences between the sound of a word and its meaning, also exists in natural language. Previous research suggests that listeners are sensitive to sound symbolism. However, little is known about the specificity of these mappings. This study investigated whether sound symbolic properties correspond to specific meanings, or whether these properties generalize across semantic dimensions. In three experiments, native English-speaking adults heard sound symbolic foreign words for dimensional adjective pairs (big/small, round/pointy, fast/slow, moving/still) and for each foreign word, selected a translation among English antonyms that either matched or mismatched with the correct meaning dimension. Listeners agreed more reliably on the English translation for matched relative to mismatched dimensions, though reliable cross-dimensional mappings did occur. These findings suggest that although sound symbolic properties generalize to meanings that may share overlapping semantic features, sound symbolic mappings offer semantic specificity. Copyright © 2016 Cognitive Science Society, Inc.
Semiconductor Nanomembrane Tubes: Three-Dimensional Confinement for Controlled Neurite Outgrowth
Yu, Minrui; Huang, Yu; Ballweg, Jason; Shin, Hyuncheol; Huang, Minghuang; Savage, Donald E.; Lagally, Max G.; Dent, Erik W.; Blick, Robert H.; Williams, Justin C.
2013-01-01
In many neural culture studies, neurite migration on a flat, open surface does not reflect the three-dimensional (3D) microenvironment in vivo. With that in mind, we fabricated arrays of semiconductor tubes using strained silicon (Si) and germanium (Ge) nanomembranes and employed them as a cell culture substrate for primary cortical neurons. Our experiments show that the SiGe substrate and the tube fabrication process are biologically viable for neuron cells. We also observe that neurons are attracted by the tube topography, even in the absence of adhesion factors, and can be guided to pass through the tubes during outgrowth. Coupled with selective seeding of individual neurons close to the tube opening, growth within a tube can be limited to a single axon. Furthermore, the tube feature resembles the natural myelin, both physically and electrically, and it is possible to control the tube diameter to be close to that of an axon, providing a confined 3D contact with the axon membrane and potentially insulating it from the extracellular solution. PMID:21366271
Multiscale Anomaly Detection and Image Registration Algorithms for Airborne Landmine Detection
2008-05-01
with the sensed image. The two- dimensional correlation coefficient r for two matrices A and B both of size M ×N is given by r = ∑ m ∑ n (Amn...correlation based method by matching features in a high- dimensional feature- space . The current implementation of the SIFT algorithm uses a brute-force...by repeatedly convolving the image with a Guassian kernel. Each plane of the scale
Ranjanomennahary, P; Ghalila, S Sevestre; Malouche, D; Marchadier, A; Rachidi, M; Benhamou, Cl; Chappard, C
2011-01-01
Hip fracture is a serious health problem and textural methods are being developed to assess bone quality. The authors aimed to perform textural analysis at femur on high-resolution digital radiographs compared to three-dimensional (3D) microarchitecture comparatively to bone mineral density. Sixteen cadaveric femurs were imaged with an x-ray device using a C-MOS sensor. One 17 mm square region of interest (ROI) was selected in the femoral head (FH) and one in the great trochanter (GT). Two-dimensional (2D) textural features from the co-occurrence matrices were extracted. Site-matched measurements of bone mineral density were performed. Inside each ROI, a 16 mm diameter core was extracted. Apparent density (Dapp) and bone volume proportion (BV/TV(Arch)) were measured from a defatted bone core using Archimedes' principle. Microcomputed tomography images of the entire length of the core were obtained (Skyscan 1072) at 19.8 microm of resolution and usual 3D morphometric parameters were computed on the binary volume after calibration from BV/TV(Arch). Then, bone surface/bone volume, trabecular thickness, trabecular separation, and trabecular number were obtained by direct methods without model assumption and the structure model index was calculated. In univariate analysis, the correlation coefficients between 2D textural features and 3D morphological parameters reached 0.83 at the FH and 0.79 at the GT. In multivariate canonical correlation analysis, coefficients of the first component reached 0.95 at the FH and 0.88 at the GT. Digital radiographs, widely available and economically viable, are an alternative method for evaluating bone microarchitectural structure.
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1980-01-01
A generalized three dimensional perspective software capability was developed within the framework of a low cost computer oriented geographically based information system using the Earth Resources Laboratory Applications Software (ELAS) operating subsystem. This perspective software capability, developed primarily to support data display requirements at the NASA/NSTL Earth Resources Laboratory, provides a means of displaying three dimensional feature space object data in two dimensional picture plane coordinates and makes it possible to overlay different types of information on perspective drawings to better understand the relationship of physical features. An example topographic data base is constructed and is used as the basic input to the plotting module. Examples are shown which illustrate oblique viewing angles that convey spatial concepts and relationships represented by the topographic data planes.
Inter- and Intra-Dimensional Dependencies in Implicit Phonotactic Learning
ERIC Educational Resources Information Center
Moreton, Elliott
2012-01-01
Is phonological learning subject to the same inductive biases as learning in other domains? Previous studies of non-linguistic learning found that intra-dimensional dependencies (between two instances of the same feature) were learned more easily than inter-dimensional ones. This study compares implicit learning of intra- and inter-dimensional…
Using Three-Dimensional Interactive Graphics To Teach Equipment Procedures.
ERIC Educational Resources Information Center
Hamel, Cheryl J.; Ryan-Jones, David L.
1997-01-01
Focuses on how three-dimensional graphical and interactive features of computer-based instruction can enhance learning and support human cognition during technical training of equipment procedures. Presents guidelines for using three-dimensional interactive graphics to teach equipment procedures based on studies of the effects of graphics, motion,…
Walters, Glenn D; Diamond, Pamela M; Magaletta, Philip R
2010-03-01
Three indicators derived from the Personality Assessment Inventory (PAI) Alcohol Problems scale (ALC)-tolerance/high consumption, loss of control, and negative social and psychological consequences-were subjected to taxometric analysis-mean above minus below a cut (MAMBAC), maximum covariance (MAXCOV), and latent mode factor analysis (L-Mode)-in 1,374 federal prison inmates (905 males, 469 females). Whereas the total sample yielded ambiguous results, the male subsample produced dimensional results, and the female subsample produced taxonic results. Interpreting these findings in light of previous taxometric research on alcohol abuse and dependence it is speculated that while alcohol use disorders may be taxonic in female offenders, they are probably both taxonic and dimensional in male offenders. Two models of male alcohol use disorder in males are considered, one in which the diagnostic features are categorical and the severity of symptomatology is dimensional, and one in which some diagnostic features (e.g., withdrawal) are taxonic and other features (e.g., social problems) are dimensional.
Narin, B; Ozyörük, Y; Ulas, A
2014-05-30
This paper describes a two-dimensional code developed for analyzing two-phase deflagration-to-detonation transition (DDT) phenomenon in granular, energetic, solid, explosive ingredients. The two-dimensional model is constructed in full two-phase, and based on a highly coupled system of partial differential equations involving basic flow conservation equations and some constitutive relations borrowed from some one-dimensional studies that appeared in open literature. The whole system is solved using an optimized high-order accurate, explicit, central-difference scheme with selective-filtering/shock capturing (SF-SC) technique, to augment central-diffencing and prevent excessive dispersion. The sources of the equations describing particle-gas interactions in terms of momentum and energy transfers make the equation system quite stiff, and hence its explicit integration difficult. To ease the difficulties, a time-split approach is used allowing higher time steps. In the paper, the physical model for the sources of the equation system is given for a typical explosive, and several numerical calculations are carried out to assess the developed code. Microscale intergranular and/or intragranular effects including pore collapse, sublimation, pyrolysis, etc. are not taken into account for ignition and growth, and a basic temperature switch is applied in calculations to control ignition in the explosive domain. Results for one-dimensional DDT phenomenon are in good agreement with experimental and computational results available in literature. A typical shaped-charge wave-shaper case study is also performed to test the two-dimensional features of the code and it is observed that results are in good agreement with those of commercial software. Copyright © 2014 Elsevier B.V. All rights reserved.
Unique Chiral Interpenetrating d-f Heterometallic MOFs as Luminescent Sensors.
Wu, Zhi-Lei; Dong, Jie; Ni, Wei-Yan; Zhang, Bo-Wen; Cui, Jian-Zhong; Zhao, Bin
2015-06-01
One novel three-dimensional (3D) 3d-4f metal-organic framework (MOF), [TbZn(L)(CO3)2(H2O)]n (1) [HL = 4'-(4-carboxyphenyl)-2,2':6',2″-terpyridine], has been successfully synthesized and structurally characterized. Structural analysis shows that compound 1 features a unique chiral interpenetrating 3D framework for the first time. The resulting crystals of 1 are composed of enantiomers 1a (P41) and 1b (P43), as was clearly confirmed by the crystal structure and the corresponding circular dichroism (CD) analyses of eight randomly selected crystals. The investigations on CD spectra based on every single crystal clearly assigned the Cotton effect signals. The powder X-ray diffraction measurement of 1 after being immersed in common solvents reveals that 1 possess excellent solvent stability. Furthermore, luminescent studies imply that 1 displays highly selective luminescent sensing of aldehydes, such as formol, acetaldehyde, and propanal.
D GIS for Flood Modelling in River Valleys
NASA Astrophysics Data System (ADS)
Tymkow, P.; Karpina, M.; Borkowski, A.
2016-06-01
The objective of this study is implementation of system architecture for collecting and analysing data as well as visualizing results for hydrodynamic modelling of flood flows in river valleys using remote sensing methods, tree-dimensional geometry of spatial objects and GPU multithread processing. The proposed solution includes: spatial data acquisition segment, data processing and transformation, mathematical modelling of flow phenomena and results visualization. Data acquisition segment was based on aerial laser scanning supplemented by images in visible range. Vector data creation was based on automatic and semiautomatic algorithms of DTM and 3D spatial features modelling. Algorithms for buildings and vegetation geometry modelling were proposed or adopted from literature. The implementation of the framework was designed as modular software using open specifications and partially reusing open source projects. The database structure for gathering and sharing vector data, including flood modelling results, was created using PostgreSQL. For the internal structure of feature classes of spatial objects in a database, the CityGML standard was used. For the hydrodynamic modelling the solutions of Navier-Stokes equations in two-dimensional version was implemented. Visualization of geospatial data and flow model results was transferred to the client side application. This gave the independence from server hardware platform. A real-world case in Poland, which is a part of Widawa River valley near Wroclaw city, was selected to demonstrate the applicability of proposed system.
Makarem, Mohamadamin; Sawada, Daisuke; O'Neill, Hugh M.; ...
2017-04-21
Vibrational sum frequency generation (SFG) spectroscopy can selectively detect not only molecules at two-dimensional (2D) interfaces but also noncentrosymmetric domains interspersed in amorphous three-dimensional (3D) matrixes. However, the SFG analysis of 3D systems is more complicated than 2D systems because more variables are involved. One such variable is the distance between SFG-active domains in SFG-inactive matrixes. In this study, we fabricated control samples in which SFG-active cellulose crystals were uniaxially aligned in an amorphous matrix. Assuming uniform separation distances between cellulose crystals, the relative intensities of alkyl (CH) and hydroxyl (OH) SFG peaks of cellulose could be related to themore » intercrystallite distance. The experimentally measured CH/OH intensity ratio as a function of the intercrystallite distance could be explained reasonably well with a model constructed using the theoretically calculated hyperpolarizabilities of cellulose and the symmetry cancellation principle of dipoles antiparallel to each other. In conclusion, this comparison revealed physical insights into the intercrystallite distance dependence of the CH/OH SFG intensity ratio of cellulose, which can be used to interpret the SFG spectral features of plant cell walls in terms of mesoscale packing of cellulose microfibrils.« less
Deng, Changjian; Lv, Kun; Shi, Debo; Yang, Bo; Yu, Song; He, Zhiyi; Yan, Jia
2018-06-12
In this paper, a novel feature selection and fusion framework is proposed to enhance the discrimination ability of gas sensor arrays for odor identification. Firstly, we put forward an efficient feature selection method based on the separability and the dissimilarity to determine the feature selection order for each type of feature when increasing the dimension of selected feature subsets. Secondly, the K-nearest neighbor (KNN) classifier is applied to determine the dimensions of the optimal feature subsets for different types of features. Finally, in the process of establishing features fusion, we come up with a classification dominance feature fusion strategy which conducts an effective basic feature. Experimental results on two datasets show that the recognition rates of Database I and Database II achieve 97.5% and 80.11%, respectively, when k = 1 for KNN classifier and the distance metric is correlation distance (COR), which demonstrates the superiority of the proposed feature selection and fusion framework in representing signal features. The novel feature selection method proposed in this paper can effectively select feature subsets that are conducive to the classification, while the feature fusion framework can fuse various features which describe the different characteristics of sensor signals, for enhancing the discrimination ability of gas sensors and, to a certain extent, suppressing drift effect.
Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods.
Qu, Kaiyang; Han, Ke; Wu, Song; Wang, Guohua; Wei, Leyi
2017-09-22
DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the features extraction method. Therefore, using an efficient feature representation method is important to enhance the classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely, K-Skip-N-Grams, Information theory, and Sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. In addition, the classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors, which are obtained from a combination of three feature extractions, show the best performance in 10-fold cross-validation both under non-dimensional reduction and dimensional reduction by max-relevance-max-distance. Moreover, the reduced mixed feature method performs better than the non-reduced mixed feature technique. The feature vectors, which are a combination of SSF and K-Skip-N-Grams, show the best performance in the test set. Among these methods, mixed features exhibit superiority over the single features.
An Autonomous Star Identification Algorithm Based on One-Dimensional Vector Pattern for Star Sensors
Luo, Liyan; Xu, Luping; Zhang, Hua
2015-01-01
In order to enhance the robustness and accelerate the recognition speed of star identification, an autonomous star identification algorithm for star sensors is proposed based on the one-dimensional vector pattern (one_DVP). In the proposed algorithm, the space geometry information of the observed stars is used to form the one-dimensional vector pattern of the observed star. The one-dimensional vector pattern of the same observed star remains unchanged when the stellar image rotates, so the problem of star identification is simplified as the comparison of the two feature vectors. The one-dimensional vector pattern is adopted to build the feature vector of the star pattern, which makes it possible to identify the observed stars robustly. The characteristics of the feature vector and the proposed search strategy for the matching pattern make it possible to achieve the recognition result as quickly as possible. The simulation results demonstrate that the proposed algorithm can effectively accelerate the star identification. Moreover, the recognition accuracy and robustness by the proposed algorithm are better than those by the pyramid algorithm, the modified grid algorithm, and the LPT algorithm. The theoretical analysis and experimental results show that the proposed algorithm outperforms the other three star identification algorithms. PMID:26198233
Luo, Liyan; Xu, Luping; Zhang, Hua
2015-07-07
In order to enhance the robustness and accelerate the recognition speed of star identification, an autonomous star identification algorithm for star sensors is proposed based on the one-dimensional vector pattern (one_DVP). In the proposed algorithm, the space geometry information of the observed stars is used to form the one-dimensional vector pattern of the observed star. The one-dimensional vector pattern of the same observed star remains unchanged when the stellar image rotates, so the problem of star identification is simplified as the comparison of the two feature vectors. The one-dimensional vector pattern is adopted to build the feature vector of the star pattern, which makes it possible to identify the observed stars robustly. The characteristics of the feature vector and the proposed search strategy for the matching pattern make it possible to achieve the recognition result as quickly as possible. The simulation results demonstrate that the proposed algorithm can effectively accelerate the star identification. Moreover, the recognition accuracy and robustness by the proposed algorithm are better than those by the pyramid algorithm, the modified grid algorithm, and the LPT algorithm. The theoretical analysis and experimental results show that the proposed algorithm outperforms the other three star identification algorithms.
Geist; Dauble
1998-09-01
/ Knowledge of the three-dimensional connectivity between rivers and groundwater within the hyporheic zone can be used to improve the definition of fall chinook salmon (Oncorhynchus tshawytscha) spawning habitat. Information exists on the microhabitat characteristics that define suitable salmon spawning habitat. However, traditional spawning habitat models that use these characteristics to predict available spawning habitat are restricted because they can not account for the heterogeneous nature of rivers. We present a conceptual spawning habitat model for fall chinook salmon that describes how geomorphic features of river channels create hydraulic processes, including hyporheic flows, that influence where salmon spawn in unconstrained reaches of large mainstem alluvial rivers. Two case studies based on empirical data from fall chinook salmon spawning areas in the Hanford Reach of the Columbia River are presented to illustrate important aspects of our conceptual model. We suggest that traditional habitat models and our conceptual model be combined to predict the limits of suitable fall chinook salmon spawning habitat. This approach can incorporate quantitative measures of river channel morphology, including general descriptors of geomorphic features at different spatial scales, in order to understand the processes influencing redd site selection and spawning habitat use. This information is needed in order to protect existing salmon spawning habitat in large rivers, as well as to recover habitat already lost.KEY WORDS: Hyporheic zone; Geomorphology; Spawning habitat; Large rivers; Fall chinook salmon; Habitat management
Selective Audiovisual Semantic Integration Enabled by Feature-Selective Attention.
Li, Yuanqing; Long, Jinyi; Huang, Biao; Yu, Tianyou; Wu, Wei; Li, Peijun; Fang, Fang; Sun, Pei
2016-01-13
An audiovisual object may contain multiple semantic features, such as the gender and emotional features of the speaker. Feature-selective attention and audiovisual semantic integration are two brain functions involved in the recognition of audiovisual objects. Humans often selectively attend to one or several features while ignoring the other features of an audiovisual object. Meanwhile, the human brain integrates semantic information from the visual and auditory modalities. However, how these two brain functions correlate with each other remains to be elucidated. In this functional magnetic resonance imaging (fMRI) study, we explored the neural mechanism by which feature-selective attention modulates audiovisual semantic integration. During the fMRI experiment, the subjects were presented with visual-only, auditory-only, or audiovisual dynamical facial stimuli and performed several feature-selective attention tasks. Our results revealed that a distribution of areas, including heteromodal areas and brain areas encoding attended features, may be involved in audiovisual semantic integration. Through feature-selective attention, the human brain may selectively integrate audiovisual semantic information from attended features by enhancing functional connectivity and thus regulating information flows from heteromodal areas to brain areas encoding the attended features.